Tomorrow's Teaching and Learning
The posting below, a bit longer than most, gives some great pointers on constructing multiple choice test items. It is from Chapter 4 – Writing Multiple Choice Test Items, in the book Connecting the Dots: Developing Student Learning Outcomes and Outcomes-Based Assessments, Second Edition, by Ronald S. Carriveau. Published by Stylus Publishing, LLC, 22883 Quicksilver Drive, Sterling, Virginia 20166-2102. https://sty.presswarehouse.com/books/features.aspx Copyright © 2016 by the University of North Texas and Ronald S. Carriveau. All rights reserved. Reprinted with permission.
----------- 3,111 words ----------
General Guidelines for Writing MC (Multiple Choice) Test Items
Box 4.3: General Guidelines for Writing MC Test Items
1. What is specified in the learning outcomes will drive the content of the test items. What the SLOs state is what the items should measure.
2. Generally, higher level thinking should be emphasized over recall, but this decision will depend to a great extent on the learning outcome statements.
3. A general rule for writing items is to avoid measuring overly specific and overly general knowledge, but this too will be influenced by the SLO statements.
4. The intent of your SLO statement may require a student to give a written opinion (for example, a position on an argument), in which case you could use the terms you and your in the instructions and test question. However, when SR (Selected Response) test items are used, the second person (you or your) should not be used, because it invites personal opinion, and then any answer choice could be correct in the student’s opinion.
5. Use vocabulary that is appropriate for the group being tested. Keep it consistent with SLO expectations and simple enough that all students will be able to understand it.
6. Edit and proof all items, prompts, and instructions to make sure grammar, punctuation, capitalization, and spelling are correct.
7. Be conservative in your use of words. Don’t use excess verbiage. Minimize as much as possible.
8. Make sure the item does not assume racial, class, or gender values or suggest stereotypes.
9. Ensure the item does not contain wording that would offend any group.
10. Make sure minority interests are considered and are well represented.
Source: Adapted from Haladyna (1999)
Writing Test Items
Writing high-quality MC items is not an easy task. In addition to the following item-writing guidelines that are considered to be best practices, the item writer must match the item to the SLO statement. If the outcome statement is lacking in terms of measurability, then writing items that address the intent of the outcome statement can become very problematic. This is true whether you are assessing with an MC format or with a scoring rubric for a written response.
Many taxonomies and lists of what are termed concepts, goals, objectives, and outcomes have been published (e.g., Bloom, 1956; Anderson & Krathwohl, 2001) and are helpful for writing SLOs. These outcome writing aids are typically in the form of lists, diagrams, and matrices. The idea behind these aids is that what the student is expected to know and be able to do can be related to desired cognitive tasks, difficulty levels, dimensions, and types of knowledge – all of which will help the outcome writer and, ultimately, the item writer.
My recommendation is that writing outcomes, drafting possible items to measure the outcomes, and developing the test plan will work best when done as one integrated process. The reason for this recommendation is that test-item content must match the outcome statement, and construction of the item that is used to measure the outcome is dependent on the degree of measurability of the outcome statement. For example, a particular outcome may appear to be better measured with a CR (Constructed Response) item rather than an SR (MC) item. Making a decision whether to change the outcome statement to accommodate an MC format or to measure the outcome with a different format on a different test than the MC test is accomplished more efficiently and effectively when all three components can be manipulated at the same time.
To get an accurate measure of what students know and are able to do in terms of your learning outcome statements, you will want to write items that produce scores that give you the most accurate estimate possible of what you are measuring. Since no particular item will function perfectly, and random things (e.g., a pencil breaks or a computer malfunctions) can occur, there will always be some amount of error in the score. Rules and guidelines have been developed by measurement experts to help minimize the error related to how the test item is written. The following guidelines for writing MC items are common practice in the field. The purpose of these guidelines is to help you develop a test that will produce meaningful and useful scores. As was stated in the acknowledgements, Haladyna (1997, 1999), Osterlind (1998), and Linn and Gronlund (1995, 2000) are the main sources I use for item-development guidelines.
As was discussed in chapter 2 and shown in Figure 2.1, the test item includes the question or statement (as in a sentence completion format) plus all the answer choices and any special instructions or directions that are needed. The question or statement is also called the stem (or, formally, the stimulus). The answer choices are also called answer options (or, formally, the response alternatives). The answer choices that are not the correct answer are referred to as distractors or foils.
Guidelines for Writing the Item Stem
1. Write the stem as a question. An incomplete sentence format is seldom a better choice than a question for an item stem. The question-and-answer format works best because it is the normal way we communicate when we ask for information, and it doesn’t involve short-term memory issues with having to reconstruct an incomplete sentence for each option (Statman, 1988).
2. Make the stem as clear as possible so the student knows exactly what is being asked. Include only the information that is necessary. Don’t use excessive verbiage.
3. Place any directions that you use to accompany text or a graphic above the text or graphic. Typically, the text or graphic comes before the question. Any directions or instructions included in the item should be unambiguous and offer clear guidance for the student.
4. Put the main or central idea of what you want to ask in the stem, not in the answer choices. The choices may be longer than the stem, but the stem should contain all of what you want to ask, and the answer choices should only be the answers (i.e., not words that should be in the questions).
5. Word the question positively. Avoid negatives such as not or except. Using negatives can be confusing to students and can also give an advantage to test-wise students. Using not requires students to keep reconstructing each answer choice in their minds, trying to figure out what is not correct as well as what is correct. Often, students will read right through the not and forget to use the reverse logic needed to answer the question. You may determine that not in a particular stem is absolutely needed because otherwise you would have to write too many additional positive items, but keep the number of negative stems in the test to a minimum. (Osterlind, 1998, suggests around 5% at most.) If you use a negative, use caps and/or bold font to highlight it.
6. Make sure that something in the stem doesn’t give a clue (cue) that will help the student choose the correct answer. For example, if you use child in the question and children for one or two of the incorrect answer choices, you will be giving a clue that the choices with children are probably not the correct answer. Clueing can happen within an item and between items. Between items, a clue can run from the stem of one item to the stem of another, from an answer choice of one item to an answer choice of another, or from the stem of one item to an answer choice of another.
7. Don’t make the item opinion based. That is, don’t ask, “In your opinion…?” as this would make any answer choice a possibly correct answer.
8. Don’t write trick questions. The purpose of a test item is not to trick the student or measure how a student deals with a trick question.
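The cap on negative stems in guideline 5 can be monitored with a rough sketch like the one below. It assumes the stems are available as plain strings and treats any stem containing the word not or except as negative, which is only an approximation of what a human reviewer would flag. The function name and the sample stems are illustrative, not from the source.

```python
import re

# Assumption: a stem counts as "negative" if it contains the word
# "not" or "except" (case-insensitive). A reviewer may judge differently.
NEGATIVE_WORDS = re.compile(r"\b(not|except)\b", re.IGNORECASE)

def negative_stem_share(stems):
    """Return the fraction of stems that contain a negative word."""
    if not stems:
        return 0.0
    flagged = [stem for stem in stems if NEGATIVE_WORDS.search(stem)]
    return len(flagged) / len(stems)

# Hypothetical stems for illustration only.
stems = [
    "Which gas do plants absorb?",
    "Which of these is NOT a mammal?",
    "What is 2 + 2?",
    "Which planet is largest?",
]
print(negative_stem_share(stems))  # 0.25, well above a 5% target
```

A share above roughly .05 suggests rewording some stems positively before the test form is assembled.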
Guidelines for Writing Answer Choices
1. Traditionally, four (or more) answer choices are used, but in most cases, three options will work better. Typically, the fourth answer choice (distractor) is the most difficult and time-consuming to write, and, statistically, contributes very little to the information you want from the student’s response to the item (Haladyna, 1997, 1999; Rodriguez, 2005).
2. Make sure that only one of the answer choices is the absolutely correct answer. Make the other answer choices plausible but incorrect. That is, the student shouldn’t be able to easily reject distractors because they obviously lack plausibility. A good strategy is to use typical errors that students make as the distractors.
3. Ideally, the distractors should be equal in plausibility. However, it usually becomes increasingly difficult to make each added distractor as plausible as the preceding distractor. Traditionally, it has been common practice to use three distractors and one correct answer. Since in most cases very little additional information is achieved when adding a third distractor, it is usually adequate to stop after two distractors (see #1).
4. Use a separate page for each item when writing draft answer choices and put the correct answer in the first (A) position, the most plausible incorrect answer (distractor) in the next (B) position, and the next most plausible in the next (C) position. Ideally, the distractors would be of equal difficulty, but this is difficult to achieve. When you get ready to assemble a test form, you will need to reorder the answer options so that no one position usually has the right answer. A general rule is that no more than two or three of the same letter should appear consecutively. Put an asterisk after the correct answer choice on the one-page item draft for tracking purposes.
5. Place the answer choices vertically in logical or numerical order. For example, order a sequence of events from first to last, and order numbers from lowest in value to highest. When making the test form, you may need to reverse this order to be able to vary the position of the correct answers so you don’t have more than three (or two if a short test) of any answer choice letter in a row.
6. Ideally, keep the length of answer choices about equal. Sometimes this is not possible, but no answer choice should be significantly longer or shorter than the rest of the choices, because the student may be influenced by the length of the answer choice. On the final test form, answer choices should be ordered where possible from shortest to longest or longest to shortest.
7. Avoid using the choice “none of the above” or “all of the above.” Using these answer choices conflicts with the guideline to have one absolutely correct answer. Although there is not a consensus from research or assessment experts that these choices should be eliminated entirely, it is obvious that identifying two choices as being correct will clue the student to choose all of the above. Additionally, the all of the above answer choice may have the effect of increasing the difficulty of the item. The student will probably be using a different strategy to address the all of the above answer choices than with items that do not offer this choice. You will be on firm ground if you decide to avoid these answer choices. If you decide that none of the above or all of the above is needed for a particular item, then use caution and think carefully about the other answer choices you create for that item.
8. Avoid the choice “I don’t know.” This can’t be considered a plausible distractor because the student isn’t given the choice to be distracted and is instead given the option to miss an item that the student may have gotten correct by using partial knowledge or logic. Some experts suggest that this choice may produce a bias in test scores for higher achieving students over lower achieving students (Haladyna, 1997), so it makes sense to eliminate an answer choice that has the potential for introducing bias.
9. Phrase answer choices positively as much as possible. The use of words such as not, except, doesn’t, didn’t, couldn’t, and wouldn’t will be less problematic in the answer choices compared to their use in the stem, and may work fine (and in some instances make the most sense) if the syntax is well crafted, but the recommended strategy is to consider positive phrasing first.
10. Avoid giving clues to the right answer in the item options. This clueing can be within the item and between items. Avoid using terms such as always, never, none, totally, absolutely, and completely because they set extreme limits and thus can be a clue that they are less likely (or appear to be more likely) to be the correct answer. Similarly, terms such as often, sometimes, generally, usually, and typically also qualify the answer choice and should be used with caution as they are clues and are more often true.
11. Using a stem that asks for the “best” answer requires careful wording for the distractors as they all may have a degree of correctness (thus the term best) but the correct answer has to be the best choice. It is good to get another expert’s opinion on what is the best choice and also to try out the item on a few students prior to finalizing the item. Then what you decide is the correct answer choice will more likely prove to be so (based on an analysis of the item statistics after the item is administered).
12. Don’t make a distractor humorous. Such a distractor can be a serious distraction to the real intent of the outcome you are measuring, and has no place in a test for which the results are taken seriously. Students are likely to eliminate the humorous choice and thus reduce the number of choices.
13. Don’t overlap choices. This applies primarily to ranges in numerical problems. For example, if choice A is 10-20, then B should not be 20-30 but rather should be 21-30.
14. Keep the content of the answer choices as homogeneous and parallel as possible. For example, if two of the answer choices are phrases that start with a verb and cover similar content, you would need to make the third choice the same in terms of content and structure. If the third choice is a complete sentence, then it would not be parallel. If the third choice addresses different content, then it would not be homogeneous. However, the answer choices are not limited to the exact same content. For example, three complete-sentence answer choices could have parallel construction, and they could address three different but related pieces of information. One choice could be related to mixtures, one choice could be related to solutions, and one choice could be related to compounds, but all three would be complete sentences with similar syntax. For another example, each of three choices could relate to different art periods, each plausible because of the way each choice is worded and because each is a complete sentence.
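Two of the guidelines above are mechanical enough to check with a short script: the limit on the same answer letter appearing consecutively (guidelines 4 and 5) and the ban on overlapping numerical ranges (guideline 13). The sketch below is one possible way to do this, assuming the answer key is a string of letters and each numerical choice is a (low, high) pair with inclusive endpoints. The function names are illustrative.

```python
from itertools import groupby

def longest_run(answer_key):
    """Length of the longest streak of the same letter in the answer key."""
    return max((len(list(group)) for _, group in groupby(answer_key)), default=0)

def overlapping_ranges(ranges):
    """Return neighboring (low, high) pairs, after sorting, that share a value.

    Endpoints are treated as inclusive, so (10, 20) and (20, 30) overlap.
    """
    ordered = sorted(ranges)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]

# A key with three Bs in a row reaches the upper end of the rule of thumb.
print(longest_run("ABCABBBC"))                      # 3

# The example from guideline 13: 10-20 and 20-30 overlap, 10-20 and 21-30 do not.
print(overlapping_ranges([(10, 20), (20, 30)]))     # [((10, 20), (20, 30))]
print(overlapping_ranges([(10, 20), (21, 30)]))     # []
```

If `longest_run` returns more than three (or more than two on a short test), reorder the options on some items until the streak is broken.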
Guidelines for Item Format
1. Choose an item format style and follow it consistently. That is, be consistent in the way the stem and answer choices are laid out. If you decide to use capital letters at the beginning of lines (even if not a complete sentence) and periods at the end of lines, then be consistent.
2. Avoid true-false and complex MC formats (where more than one answer choice is correct). Haladyna (1999) lists several reasons why the complex MC format is inferior, including the influence of test-taking skills and lower reliability.
3. List the answer choices vertically rather than horizontally. This makes them easier to read.
4. Use three-option MC items. Statistical probabilities for guessing support this. The probability of getting a three-option item correct by random guessing is .33. However, most guessing is not random but rather applies some form of logical strategy, such as eliminating the least likely answer and making choices between the other two answer choices in terms of a best guess of what makes the most sense. The probability of getting several three-option test items all correct by random guessing alone falls quickly as the number of items grows. For example:
2 items correct = .11
3 items correct = .04
4 items correct = .01
5 items correct = .004
The probability of obtaining 70% correct on a well-developed MC test through random guessing alone = .0000356.
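These figures can be reproduced with a short calculation, sketched below under the assumption that every guess is independent and has a 1-in-3 chance of being correct. The source does not state the test length behind the .0000356 figure; the 30-item test used here is an assumption, chosen because exactly 21 correct out of 30 (70%) yields that value. The function names are illustrative.

```python
from math import comb

def p_all_correct(n_items, n_options=3):
    """Probability of guessing every one of n_items correctly."""
    return (1 / n_options) ** n_items

def p_exactly(k, n_items, n_options=3):
    """Binomial probability of exactly k correct guesses out of n_items."""
    p = 1 / n_options
    return comb(n_items, k) * p**k * (1 - p) ** (n_items - k)

def p_at_least(k, n_items, n_options=3):
    """Probability of k or more correct guesses out of n_items."""
    return sum(p_exactly(j, n_items, n_options) for j in range(k, n_items + 1))

# Matches the list above: about .11, .04, .01, and .004.
for n in (2, 3, 4, 5):
    print(n, round(p_all_correct(n), 3))

# Assumed 30-item test: exactly 21 correct (70%) is about .0000356.
print(p_exactly(21, 30))

# Scoring 70% or better is slightly more likely than scoring exactly 70%.
print(p_at_least(21, 30))
```

Under the same assumptions, the chance of reaching 70% by guessing shrinks further as the test gets longer, which is one reason test length matters alongside the number of options.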
“The tradition of using four or five options for multiple-choice items is strong, despite the research evidence suggesting that it is nearly impossible to create selected-response test items with more than about three functional options” (Downing, 2006, p. 292). A study by Haladyna and Downing (1993) shows that three options are typically sufficient, because even in very well-developed tests it is rare that more than three options are statistically functional.
A meta-analysis of 80 years of published empirical studies on the appropriate number of options to use for MC concludes: “MC items should consist of three options, one correct option and two plausible distractors. Using more options does little to improve item and test score statistics and typically results in implausible distractors” (Rodriguez, 2005, p. 11).
If test items are reasonably well constructed and tests have a sufficient number of total items that are appropriately targeted in difficulty to the examinees’ ability, test developers can confidently use three-option MC items for most tests of achievement or ability. “Because of their extensive research base, well-written selected-response items can be easily defended against challenges and responses to validity” (Downing, 2006, p. 289).
Box 4.4: Quality Assurance Checklist for MC Items
1. Item addresses the content and task specified in the outcome statement.
2. Item is written at the stated cognitive level.
3. Question is not unnecessarily wordy.
4. There is only one clearly correct answer.
5. The correct answer is not clued by the question.
6. Negatives are avoided except where absolutely necessary.
7. Question does not contain misleading, ambiguous, or tricky language.
8. Question contains all the information necessary for a response.
9. Options are independent of one another (no clueing).
10. Options do not contain misleading, ambiguous, or tricky language.
11. Options are parallel in structure.
12. Options are of similar length.
13. Options avoid repetitious wording.
14. Distractor options are plausible and reasonable.
15. Options are in logical order.
16. There are no specific determiners, such as always and all in only one option.
17. There is no option that has the same meaning as another option.
18. There are no all-inclusive options (all of the above, none of the above).
19. There are no unnecessarily wordy options.
20. Item is free of grammatical errors.
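Most checklist items need human judgment, but a few (items 12, 16, and 18) can be screened automatically before review. The sketch below is one rough way to do that and is not part of the source. The determiner word list, the length-ratio threshold of 2.0, and the function name are all assumptions that should be tuned to your own items.

```python
import re

# Assumed word list for "specific determiners" (checklist item 16).
DETERMINERS = re.compile(r"\b(always|never|all|none)\b", re.IGNORECASE)
ALL_INCLUSIVE = {"all of the above", "none of the above"}

def check_options(options, max_length_ratio=2.0):
    """Return a list of checklist problems found in one item's options."""
    problems = []
    cleaned = [opt.strip().rstrip(".").lower() for opt in options]

    # Item 18: no all-inclusive options.
    if any(opt in ALL_INCLUSIVE for opt in cleaned):
        problems.append("all-inclusive option")

    # Item 16: a specific determiner that appears in only one option.
    with_determiner = [
        opt for opt, low in zip(options, cleaned)
        if low not in ALL_INCLUSIVE and DETERMINERS.search(opt)
    ]
    if len(with_determiner) == 1:
        problems.append("specific determiner in only one option")

    # Item 12: options of similar length (threshold is an assumption).
    lengths = [len(opt) for opt in options]
    if lengths and min(lengths) > 0 and max(lengths) / min(lengths) > max_length_ratio:
        problems.append("options differ widely in length")

    return problems

print(check_options(["It always rises.", "It often falls.", "It stays level."]))
# ['specific determiner in only one option']
```

An empty list means only that these three mechanical checks passed; the remaining checklist items still require a reader.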
References
Anderson, L.W., & Krathwohl, D.R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York, NY: Addison Wesley Longman.
Bloom, B.S. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York, NY: David McKay Company.
Downing, S.M. (2006). Selected-response item formats in test development. In S.M. Downing & T.M. Haladyna (Eds.), Handbook of test development. Mahwah, NJ: Erlbaum.
Haladyna, T.M. (1997). Writing test items to evaluate higher order thinking. Boston, MA: Allyn & Bacon.
Haladyna, T.M. (1999). Developing and validating multiple-choice test items (2nd ed.). Mahwah, NJ: Erlbaum.
Haladyna, T.M., & Downing, S.M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53, 999-1010.
Linn, R.L., & Gronlund, N.E. (1995). How to write and use instructional objectives (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Osterlind, S.J. (1998). Constructing test items: Multiple-choice, constructed-response, performance, and other formats. Norwell, MA: Kluwer Academic.
Rodriguez, M.C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3.
Statman, S. (1988). Ask a clear question and get a clear answer: An inquiry into the question/answer and sentence completion formats of multiple-choice items. System, 16(3), 367-376.