What ChatGPT brings us for item writing

Even those who do not consider themselves technology buffs have heard of ChatGPT. Today, everybody is talking about ChatGPT, a modern megastar of the artificial intelligence (AI) and machine learning (ML) industries. This is how ChatGPT introduces itself:

“I am an AI language model called ChatGPT. My purpose is to assist and communicate with users through text-based conversations, and to provide helpful and informative responses to their questions and inquiries.”

Its purpose was clear to me straight away, but I still decided to work with ChatGPT more closely while writing items for a Math test. I opened this link and started a new conversation: dead simple. Below I share my observations of ChatGPT's features, classified as strengths and imperfections. I will then describe my experience of item writing with ChatGPT.

Strengths of ChatGPT

ChatGPT is an intelligent chatbot that

• is polite,
• declines inadequate requests,
• generates responses very quickly,
• allows the user to provide follow-up adjustments,
• remembers what the user said earlier in the conversation,
• can significantly reduce item-writing time for subject matter experts.

Imperfections of ChatGPT

General ones

• The free version of ChatGPT caps the length of each response at roughly 2,000 characters. Therefore, if, like me, you need to construct a whole test, you have to give the chatbot its tasks in portions or keep asking it to finish the assignment. Your interaction with ChatGPT will be more efficient if you plan the assignment in advance.
• ChatGPT has limited knowledge of the world after 2021. Keep this in mind when designing a prompt.
• ChatGPT can generate inaccurate responses. Several examples follow below.

Math Item Writing

ChatGPT’s Math skills are far from excellent. Below are some telling examples:

• ChatGPT “thinks” that 11/2 equals 2 and 6/5 equals 1;
• ChatGPT presents a single root as the solution of a quadratic equation that has two distinct roots, even while listing both roots among the answer options;
• ChatGPT makes mistakes when naming the correct response out of the options it has generated;
• ChatGPT makes up distractors (incorrect answers in multiple-choice questions) without considering the mistakes test takers are actually likely to make;
• ChatGPT works out a correct solution to a question but does not include it among the answer options;
• ChatGPT cannot differentiate between levels of difficulty, even when the prompt gives it a sample.
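Fraction slips like the first one are trivial to expose with exact arithmetic. A minimal sketch in Python (my own illustration, not part of any ChatGPT exchange):

```python
from fractions import Fraction

# Exact rational arithmetic shows the values ChatGPT misstated:
print(Fraction(11, 2), "=", float(Fraction(11, 2)))  # 11/2 = 5.5, not 2
print(Fraction(6, 5), "=", float(Fraction(6, 5)))    # 6/5 = 1.2, not 1
```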

Let’s look at a couple of Math questions that ChatGPT wrote from my prompts.

Prompt 1: ChatGPT, please, write 1 multiple-choice Math question of low difficulty for 16-17 year old high school American students for the following topic in Algebra: Linear equations in two variables. Multiple-choice questions must have only one correct answer (key) and three incorrect answers (distractors) based on test takers’ incorrect reasoning. Please, provide an explanation to each distractor and a solution to the question.

Response 1: Which point satisfies the equation y = -2x + 7?

A) (5, 1)
B) (-1, 5)
C) (2, 5)
D) (1, -5)

Explanation: To find a point that satisfies the equation, we can plug in values for x and y that make the equation true. For example, if we let x = 1, then y = -2(1) + 7 = 5. So, the point (1, 5) satisfies the equation.

Comments: ChatGPT did not include the correct response among the answer options, even though it solved the question correctly.

Prompt 2: ChatGPT, please, write 1 student-produced response Math question of medium difficulty for 16-17 year old high school American students for the following topic in Algebra: Systems of two linear equations in two variables. Student-produced response questions must have only one correct answer as a real number. Please, provide a solution to this question.

Response 2: Solve the system of equations: 2x + 3y = 8, x – y = 1.

Explanation: To solve the system of equations, we can use either substitution or elimination method. Using substitution, we can solve one equation for one variable and substitute it into the other equation. For example, we can solve the second equation for x and get x = y + 1. Then, we substitute x = y + 1 into the first equation and get 2(y + 1) + 3y = 8. Simplifying this equation gives us 5y + 2 = 8, which gives us y = 1. Substituting y = 1 into x = y + 1 gives us x = 2. Therefore, the solution to the system of equations is (2, 1).

Comments: Solving 5y + 2 = 8 gives y = 6/5, or 1.2, not y = 1. Therefore, the point (2, 1) is not a correct solution of this system of equations.
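Both flaws are easy to confirm mechanically. A quick check in Python (the items are ChatGPT’s; the variable names and the checking code are my own):

```python
from fractions import Fraction

# Item 1: which of the offered points satisfies y = -2x + 7?
options = {"A": (5, 1), "B": (-1, 5), "C": (2, 5), "D": (1, -5)}
correct = [k for k, (x, y) in options.items() if y == -2 * x + 7]
print(correct)  # [] -- no option works; the key (1, 5) is missing

# Item 2: solve 2x + 3y = 8, x - y = 1 by substitution (x = y + 1):
# 2(y + 1) + 3y = 8  =>  5y + 2 = 8  =>  y = 6/5, so x = 11/5
y = Fraction(6, 5)
x = y + 1
assert 2 * x + 3 * y == 8 and x - y == 1  # (11/5, 6/5) checks out
assert 2 * 2 + 3 * 1 != 8                 # ChatGPT's (2, 1) does not
```

Simple checks like these are worth running on every generated item before it goes into a test bank.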

English Item Writing

ChatGPT’s English skills are not as polished as I expected from a (presumably) native speaker of English. I have had some experience with ChatGPT, having it design several tasks to test students’ English skills. As expected, ChatGPT is more productive with chunks of text. I asked it to write some Reading and Writing questions for the SAT and IELTS, and the results were much better than those for Math. Still, there were shortcomings: grammatical errors and poor lexical choices. For instance, ChatGPT suggested option A as the correct response, which is arguable:

Which of the following options best replaces the underlined phrase in the sentence: “The company’s profits have increased by 20% over the last quarter.”

A) The company’s profits have risen by 20% over the last quarter.

B) The company’s profits have grown by 20% over the last quarter.

C) The company’s profits have expanded by 20% over the last quarter.

D) The company’s profits have developed by 20% over the last quarter.

Russian Item Writing

I was curious how proficient ChatGPT was in Russian. Well, it knows how many letters the Russian alphabet contains and some other facts, but when I asked ChatGPT to write a question testing grammar, it let me down: ChatGPT gave a wrong answer when asked to identify an adverbial participle (деепричастие). In fact, none of the four options it generated is one.

Какое из перечисленных слов является деепричастием? [Which of the listed words is an adverbial participle?]

A) Зеленеющий

B) Шумно

C) Летящий

D) Бежавший

Правильный ответ: C) Летящий – это деепричастие настоящего времени. Ошибочные ответы: A) – прилагательное, B) – наречие, D) – причастие прошедшего времени. [Correct answer: C) Летящий is a present-tense adverbial participle. Incorrect answers: A) is an adjective, B) is an adverb, D) is a past-tense participle.]

Summary

Since AI-enabled tools do not have real brains and only regurgitate information based on the billions of data points they “learned” from, it is only to be expected that there will be some inaccuracies in their responses, even though those responses look human-like.

In its own introduction, ChatGPT announced straight away that it was intended to ASSIST us, humans, not to DO things instead of us. ChatGPT can provide helpful responses but is not capable of independent thought or emotions. Like any AI, ChatGPT is vulnerable to multiple issues, including bias and potential inaccuracies.

I would like to re-emphasize the point I made in the previous post: any AI-powered tool, no matter how “good” it is, needs a smart user to operate it. Otherwise, it is going to be a complete disaster! However, if users provide ChatGPT with a well-structured question and the right guidance, there is a high chance of obtaining an accurate response.

One more thing to consider is that Math is a way of thinking. Therefore, I would not expect AI-powered chatbots to be super-efficient in this field, at least in the near future.

Another important consideration is the legal situation. Current opinion in the field is this: Copyright laws are designed to protect creative development of new things by humans, but if ChatGPT is doing the heavy lifting, then it is not human-created and therefore not fully protected by copyright law. If the items are stolen, you might not have legal recourse. However, this topic is, of course, quite new and continues to evolve.

In the end, is it worth using ChatGPT for item writing? Absolutely. Even though there are errors, such as two correct answers to a question or no key specified, they are quickly found and easily fixed. The average time to develop new items can be significantly reduced. Moreover, ChatGPT will only keep getting more powerful! So, stay tuned!

Laila is an experienced educator and an Educational Measurement specialist with expertise in item and test development, setting standards, analyzing, interpreting, and presenting data based on Classical Test Theory (CTT) and Item Response Theory (IRT). As a professional, Laila is primarily interested in employing IRT methodology and AI technologies to educational improvement.
