CHAPTER VI – TEST CONSTRUCTION
Test Construction Objectives

After reading this chapter, the student should be able to:
1. Construct an essay test
2. Construct a true and false test
3. Construct a multiple-choice test
4. Construct a short answer and completion test
5. Construct a matching test
6. Carry out an item analysis
7. Develop an item discrimination index
Key Terms

Supply items: Item in which the student must supply the correct information.
Select item: Item in which the student must select the correct information.
Stem: The stem is the initial question or incomplete statement in a multiple-choice question.
Distracter: The distracters are the incorrect options in a multiple-choice question.
Premise: The premises are the items in the first column of a matching test.
Responses: The answers in the second column of a matching test are the responses.
Item analysis: Item analysis is the process of looking at the item-by-item responses of a test.
Item Discrimination Index: The item discrimination index is a measure of how well an item is able to distinguish between examinees who are knowledgeable and those who are not, or between masters and non-masters.
Test Construction

Now here is one of those blinding flashes of the obvious…a test should correspond to the material covered in the class. At least you would think it was a blinding flash of the obvious, but the way some teachers test their students you have to stand back and wonder. I have heard numerous students say, “Where in the heck did he come up with those questions…it is certainly nothing that was in the book or what we covered in class.” I have also heard comments like, “He emphasized so and so and then tested us on inconsequential trivia.” Let me warn you, nothing will make a student go “postal” faster than when he prepares hard for a month, masters the material, and then the teacher gives him a test that doesn’t resemble anything that was covered in the book or class. In all honesty, you can’t blame them. Tests are designed for a number of purposes and one of those purposes is not to cheat your students. That is exactly what teachers are doing when they give students a test that is not relevant or valid.

The ultimate purpose of testing is to improve student learning. As you construct a test, keep in mind the extent to which it is likely to contribute, directly or indirectly, toward this end. Obviously, the primary objective of any assessment is to ascertain the student’s knowledge of the subject matter and to determine what they can and cannot do. It also provides feedback on your teaching effectiveness, your teaching skills, and the validity of the testing instrument. Well-constructed tests should increase both the quantity and the quality of student learning. A test should yield important information as to whether students are making progress toward their learning goals. In short, the main goal of testing is to obtain valid, reliable, and useful information concerning student achievement and teaching success. Obviously, then, tests should correspond to the material covered in the class.
One way to ensure that your test matches the material covered in the class is to outline the actual course content that the test will cover. A convenient way of accomplishing this is to take 10 minutes following each class to list on an index card the important concepts covered in class and in the assigned reading for that day. These cards can then be used later on as a source of test items. An even more conscientious approach would be to construct the test items themselves after each class. The advantage of either of these approaches is that the resulting test is likely to be a better representation of the course material than if the test were constructed before the course or after the course, when we usually have only a fond memory or optimistic syllabus to draw from. When we are satisfied that we have an accurate description of the content areas, then all that remains is to construct test items that represent specific content areas.
Constructing True and False Questions

In the most basic setup, true and false questions are those in which a statement is presented and the student indicates whether the statement is true or false. This type of test item is referred to as a "selection" item, in contrast to "supply" items in which the student must supply the correct information. Another term applied to these items is "forced choice" because the student must choose between two possible answers. True and false questions are well suited for testing a student’s recall or comprehension. Students can generally respond to numerous questions, covering a lot of content, in a fairly short amount of time using true and false questions. From the teacher's perspective, these questions can be written quickly and are easy to score. Because true and false items are objectively scored, the scores are more reliable than for items such as short essays that are at least partially dependent on the teacher's judgment.

Scores on true and false items tend to be high because of the ease of guessing correct answers when the answer is not known. With only two choices (true or false) the student could expect to guess correctly on half of the items for which correct answers are not known. Thus, if a student knows the correct answers to 10 questions out of 20 and guesses on the other 10, the student could expect a score of 15 (the 10 known plus half of the 10 guessed). The teacher can anticipate scores ranging from approximately 50% for a student who did absolutely nothing but guess on all items to 100% for a student who knew the material. Obviously, that is not a good thing, since we are in the business of rewarding excellence, not deductive reasoning or luck.

Since true and false questions are in the form of statements, there is a tendency for lazy teachers to take quotations from the text, expecting the student to identify a correct excerpt or to recognize a minor change in the wording of a passage. There may also be a tendency to include trivial or insignificant material from the text. Both of these practices are no-no’s! Don’t you dare do it! Remember that student postal thingy!

A good use of true and false questions is for the student to demonstrate understanding or simple common sense. These questions can also be used effectively in stating cause and effect relationships, established by the use of "because" in the statement.

1. The Harvard Step Test is a valid test because it correlates .64 with maximum oxygen consumption.

Unless an item is intended to show cause and effect, it should contain only one idea.

2. The Harvard Step Test is NOT a valid test of cardiovascular fitness.

If more than one idea is contained, one part of the statement may be true while the other part is false, leaving the student confused as to how to answer. Again, you are not trying to trick, hoodwink, fool, or deceive the student; you are trying to assess what he knows and what he doesn’t know.

3. The Harvard Step Test is NOT a valid test of muscular endurance.

If the statement is an opinion, rather than a fact, it should be attributed to someone. A good indicator that a statement is an opinion is the use of "should" or similar language in the statement.

4. According to the President of the AAHPER, fitness tests should be given at the beginning of each year.

One suggested method for developing true and false items is to write a set of true statements that cover the content, and then convert approximately half of them to false statements. When changing items to false, as well as when writing the true statements initially, it is best to keep items stated positively, avoiding negatives or double negatives. If negatives (such as the word "not") are used, there should be some way of calling attention to them: putting them in italics, bold type, or capital letters, or underlining them. For example, question 5 starts with a positive statement, then shows possible changes to address the content while making it a false statement. While 5(c) is an improvement over 5(b), 5(d) is the best question to use.

5. Maximum Oxygen Consumption is a valid method for assessing cardiovascular fitness. (original true statement)
5(b). Maximum Oxygen Consumption is not a valid method for assessing cardiovascular fitness. (awkward false statement containing "not")
5(c). Maximum Oxygen Consumption is NOT a valid method for assessing cardiovascular fitness. (improved false statement)
5(d). Maximum Oxygen Consumption is an invalid method for assessing cardiovascular fitness. (improved false statement)
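
The guessing arithmetic described earlier in this section (a student who knows 10 of 20 answers and guesses on the rest can expect a score of 15) can be sketched in a few lines of Python. This is only an illustration; the function name and its parameters are assumptions made for this example and do not come from any standard scoring package.

```python
def expected_score(known, total, options=2):
    """Expected number correct when a student knows `known` answers
    and guesses blindly among `options` choices on the remaining items."""
    if not 0 <= known <= total:
        raise ValueError("known must be between 0 and total")
    guessed = total - known
    return known + guessed / options


# The chapter's example: 10 known, 10 guessed on a 20-item true and false test.
print(expected_score(10, 20))  # 15.0

# A student who guesses on all 20 items still expects half of them right.
print(expected_score(0, 20))   # 10.0
```

The same function shows why a raw score of 50% on a true and false test tells you very little about what the student actually knows.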
General Guidelines for Writing True and False Questions

Other guidelines suggested by some authors for writing true-false items include the following:

1. The statements should be relatively short and simple. There is an old saying, “Keep it short and simple, stupid.” This cliché certainly applies here…not the stupid part.
2. True statements should be about the same length as false statements. There is a tendency to add details in true statements to make them more precise.
3. The answers should not be obvious to students who don't know the material. That’s right, you want these Bozos to have to guess when they don’t know the material.
4. Don’t use sweeping, broad, general statements or absolutes such as all, always, never, none, or only, since the student needs only to think of a single incident in which the statement is untrue to mark it false. For example, a statement like the following is a poor test item:
   A. Football players always score high on strength tests.
5. A similar situation occurs with the use of "can" in a true-false statement. If the student knows of a single case in which something could be done, the statement would be true.
6. Ambiguous or vague statements and terms, such as "large," "long time," "regularly," "some," and "usually," are best avoided in the interest of clarity. Some terms have more than one meaning and may be interpreted differently by individuals.
   B. A nickel is larger than a dime.
   That is true if we are talking about diameter, false if we are talking about monetary value. Get the picture? If not, try this one: Strength is an excellent measure of physical fitness. It is false if you define fitness as cardiovascular fitness, but true if you define fitness as muscular strength, muscular endurance, and cardiovascular endurance.
7. While a number of researchers recommend having about the same number of true and false statements, other researchers suggest having a larger number of false statements. A good guideline is to vary the ratio of true to false statements from test to test or quiz to quiz so that students do not depend on previous tests for cues or hints as to the balance of true and false questions.
8. If students are to write in a "T" or "F" to indicate answers, their handwriting can cause errors in marking. This can be avoided by having them circle or underline their answers ("T" or "F," "true" or "false"), which would be typed beside each question. Another option is to have them spell out their answer. You have to watch everything with these kids…it is a war out there!
9. It is a good idea to arrange the statements so that there is no discernible pattern of answers (such as T, F, T, F, T, F or T, T, F, F, T, T, F, F).
10. Be sure to include directions that tell students how and where to mark their responses…not that half of the students will read the directions anyway. Still, by giving unambiguous, spelled-out directions you will have your butt covered when 90% of the class complains that they didn’t know what to do.
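
Guidelines 7 and 9, together with the earlier advice to write true statements and then convert roughly half to false, can be sketched as a small helper that decides which answers will be true or false and in what order. This is a minimal sketch under stated assumptions: the names `build_answer_key` and `is_strictly_alternating` and the `false_share` parameter are hypothetical, and a random shuffle is just one convenient way to avoid a pattern that students can spot.

```python
import random


def build_answer_key(n_items, false_share=0.5, seed=None):
    """Return a shuffled list of 'T'/'F' answers.

    `false_share` is the proportion of statements to convert to false;
    vary it from test to test so students cannot rely on past ratios.
    """
    rng = random.Random(seed)
    n_false = round(n_items * false_share)
    key = ["F"] * n_false + ["T"] * (n_items - n_false)
    rng.shuffle(key)
    return key


def is_strictly_alternating(key):
    """True for keys such as T, F, T, F, ... where no answer repeats in a row."""
    return len(key) > 1 and all(a != b for a, b in zip(key, key[1:]))


key = build_answer_key(20, false_share=0.6, seed=1)
print(key.count("F"), key.count("T"))  # 12 8
print(is_strictly_alternating(key))    # False
```

The statements marked "F" in the key are the ones to rewrite as false; the check only catches the simplest pattern, so it is still worth reading the final key over by eye.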
True and False Correction Items

Another variation of the true and false question is the true and false correction question. Statements are presented, and each statement contains a key word or brief phrase that is underlined. It is not enough that a student correctly identifies a statement as being false. To receive credit for a statement labeled false, the student must also supply the correct word or phrase which, when used to replace the underlined part of the statement, makes the statement a true one. This type of item is more thorough in determining whether students actually know the information that is presented in the false statements. While a student might correctly guess that a statement is false, no credit would be given unless the student could change the statement to a true one by writing a word or words to replace the underlined word(s).

The teacher decides what word or phrase can be changed in the sentence. If students were instructed only to make the statement a true statement, they would have the liberty of completely rewriting it, so that the teacher might not be able to determine whether or not the student understood what was wrong with the original statement. If, however, the underlined word or phrase is one that can be changed to its opposite (as shown in the example below), the item loses its advantage over the simpler true and false question because all the student has to know is that the statement is false and change "is" to "is not."

True or False_________ The National Collegiate Athletic Association (NCAA) _is_ the organization responsible for certifying the academic eligibility for practice, competition, and financial aid of all prospective student-athletes for Division I and Division II.

If the objective is for the student to know the National Collegiate Athletic Association and its respective functions, the question might be better presented as shown in the example below.

True or False_________ The _National Collegiate Athletic Association (NCAA)_ is the organization responsible for certifying the academic eligibility for practice, competition, and financial aid of all prospective student-athletes for Division I and Division II.
Constructing Multiple-choice Questions

Most people would think that writing a multiple-choice question is pretty easy. Actually, though, it's not; writing a good multiple-choice question requires serious thought and a range of skills and knowledge. The objective here is to set out some conventional wisdom for the construction of multiple-choice tests, which are one of the most common forms of teacher-constructed tests. Multiple-choice questions are selection-type items. Students are given three or more possible answers and are asked to choose the correct answer or the "best" answer.

I guess before going any further it would be a good idea to establish our terminology for discussing multiple-choice items. The stem is the initial question or incomplete statement at the beginning of each item, and this is followed by the options. The options consist of the answer (the correct option) and the distracters (the incorrect but hopefully rather persuasive options).

Multiple-choice questions can be used to measure knowledge recall as well as higher-order thinking. They are appropriate for use with objectives that call for the students to do such tasks as recognize, distinguish between, select, estimate, infer, predict, relate, and categorize. Multiple-choice items can be scored easily and rapidly. They can be scored by machine and are frequently used for standardized tests. It is possible to sample a lot of content with multiple-choice items. Although multiple-choice tests are sometimes called "multiple guess" tests, there is less chance of guessing the correct answer than with true and false questions. For instance, if you have a correct option and four distractors, someone like Mr. Potato Head, who has no clue as to what the answer is, would have only a 20% chance of getting the question right by guessing. It should also be noted that higher-order thinking can be assessed with multiple-choice items.
For instance, the student can be asked to apply a rule or principle such as:

1. For a test to be valid it has to be
   A. highly reliable
   B. highly objective
   C. easily administered
   D. easily scored
   E. all of the above

The student can also be asked to show understanding of cause and effect:

2. As cardiovascular fitness increases
   A. stroke volume decreases
   B. heart rate increases
   C. cardiac output increases
   D. residual volume increases

Last but not least, the student can be asked to identify the reasoning behind a particular choice of action:

3. Why did the experimenter use a non-parametric test to analyze his data?
   A. There are less strict assumptions
   B. There are more strict assumptions
   C. There are no assumptions
   D. The sample strictly mimics the population

Although a multiple-choice test may be referred to as an "objective test," no test is truly objective. The instructor or someone else subjectively determines what content is included in the test, the amount of emphasis placed on various topics, and the type(s) of questions used. Tests containing selection-type questions (true-false, multiple-choice, matching, etc.), however, can be scored objectively because the scorer is not called upon to use his or her judgment when scoring the questions. Also, the questions can be scored fairly quickly using an answer key, or machine scoring if it is available.

While students may be able to guess or use deductive reasoning, questions of this type require the students to make more complex distinctions than if each item were written as a true and false statement. Instead of choosing between two options in a true and false question, there are more options and less chance of guessing correctly with a multiple-choice question. For example:

True or False_______ Maximum oxygen consumption is the most valid method for assessing cardiovascular fitness.

Which test is the most valid method for assessing cardiovascular fitness?
   A. Maximum oxygen consumption
   B. Balke treadmill test
   C. Ohio State Step test
   D. Harvard Step test

Of course, nothing is perfect, except Bo Derek. Yeah, she is ninety years old and she is still perfect. As you probably guessed, there are disadvantages for both the teacher and the student in using multiple-choice questions.
It is more time-consuming for the teacher to construct good multiple-choice items than true and false or completion items. One reason for this is the difficulty of finding suitable distractors that are plausible. To be plausible, a distractor must have the potential for being selected as the correct answer. Two distractors are as effective as three if one of the three is not plausible. In other words, if you have a distracter that is highly unlikely to be selected, you might just as well use two distracters. For example, in the question below Mr. Potato Head could determine that answer B is not credible…hell, Mike Tyson could figure that out.

Which test is the most valid method for assessing cardiovascular fitness?
   A. Maximum oxygen consumption
   B. 1-RM bench press
   C. Ohio State Step test
   D. Harvard Step test

Reading level and reading speed of the students must be considered when constructing the items. If the language is too difficult or if there are slow readers among the students, some of them may not be able to complete the test. Consequently, these students may earn a lower grade because they don’t finish the test, rather than because they don’t know the material. Each multiple-choice question typically involves more reading than a true and false, short answer, or completion question.

Best answer items (measuring understanding, application, and interpretation) are usually more difficult for the students than "correct" answer items. For example, you might ask the student to determine the "MOST IMPORTANT" consideration. This presupposes that your instruction has included activities in which students are called upon to develop skills of comparing and evaluating information. However, if the students have been instructed that a particular purpose is the most important one, the question measures only knowledge or recall of information. For example, see the question below.

Which of the following was the most important consideration in selecting a test for measuring body composition?
   A. Objectivity of the test
   B. Reliability of the test
   C. Validity of the test
   D. Consistency of the test
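
The guessing odds discussed in this section (a 20% chance with one answer and four distractors, and the point that an implausible distractor does no work) can be illustrated with a short Python sketch. The function name and the `n_implausible` parameter are assumptions made for this example; the model simply treats a student as guessing evenly among whatever options cannot be ruled out.

```python
def guess_probability(n_options, n_implausible=0):
    """Chance that a blind guess is correct once the student has
    discarded `n_implausible` distractors as obviously wrong."""
    remaining = n_options - n_implausible
    if remaining < 1:
        raise ValueError("at least the correct answer must remain")
    return 1 / remaining


# One answer plus four distractors, all plausible.
print(guess_probability(5))      # 0.2

# Four options, but one distractor (such as the bench press option) is not credible.
print(guess_probability(4, 1))   # same odds as a three-option item

# A true and false item for comparison.
print(guess_probability(2))      # 0.5
```

Under this simple model, a four-option item with one throwaway distractor protects against guessing no better than a three-option item, which is the point made above about plausibility.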
General Guidelines for Writing Multiple-Choice Questions

Other guidelines suggested for writing multiple-choice items include the following:

1. Before writing the stem, identify the one point to be tested by that item. In general, the stem should not pose more than one problem, although the solution to that problem may require more than one step.
2. Include as much information in the stem and as little in the options as possible. For example, if the point of an item is to associate a term with its definition, the preferred format would be to present the definition in the stem and several terms as options rather than to present the term in the stem and several definitions as options.
3. The stem should be relatively short and simple, and it should provide a definite problem.
4. The stem should be stated in a positive manner if at all possible. If negatives are included in the stem, they should be highlighted or italicized.
5. The stem should not include phrases such as “What do you think” or “In the opinion of.” With a stem using phrases like that, any answer could be considered correct.
6. Try to avoid using terms such as not, never, except, or only. If using such terms is necessary, call attention to them by putting them in capital letters or bolding them.
7. Have only one correct answer, or clearly one best answer, among the options.
8. Provide three to five options, including the correct answer to the question. Generally, the minimal improvement to the item due to that hard-to-come-by fifth option is not worth the effort to construct it. Indeed, all else being the same, a test of 10 items each with four options is likely a better test than a test with nine items of five options each.
9. All of the distracters should be plausible.
10. Construct distractors that are comparable in length, complexity, and grammatical form to the answer, avoiding the use of such words as "always," "never," and "all." Adherence to this rule avoids some of the more common sources of biased cueing. For example, we sometimes find ourselves increasing the length and specificity of the answer (relative to the distractors) in order to ensure its truthfulness. This, however, becomes an easy-to-spot clue for the test-wise student. Remember you are at war with these little nerds. They are not as dumb as they look. Understand that they will spend hours trying to figure out how to cheat on a test and ten minutes studying for it.
11. Options which read "none of the above," "both a. and e. above," "all of the above," etc., should be avoided when the students have been instructed to choose "the best answer," which implies that the options vary in degree of correctness. On the other hand, "none of the above" is acceptable if the question is factual and is probably desirable if computation yields the answer. "All of the above" is never desirable, as one recognized distractor eliminates it and two recognized answers identify it.
12. If possible, have a colleague with expertise in the content area of the exam review the items for possible ambiguities, redundancies, or other structural difficulties.
Constructing Matching Test Items

Matching test items, along with true and false and multiple-choice, are selection items. The matching test item format provides a way for learners to connect a word, sentence or phrase in one column to a corresponding word, sentence or phrase in a second column. One matching item can replace several true and false or short answer items (and require less reading for the students). Matching items are generally easy to write and score when the test content and objectives are suitable for matching questions. Possible difficulties in using matching items may arise due to poor student handwriting or printing, or students' being able to guess correct answers through the process of elimination.

In developing matching items, there are two columns of material (Example 1). The items in the column on the left (Column A) are usually called premises and assigned numbers (1, 2, 3, etc.). Those in the column on the right (Column B) are called responses and designated by capital letters, as in Example 1. Capital letters are used rather than lower case letters in case some students have reading problems. Also there are apt to be fewer problems in scoring the student's handwritten responses if capital letters are used.

Column A              Column B
_____ 1. ΣX           A. add up all the Y scores
_____ 2. ΣY           B. square each X score
_____ 3. x²           C. add up all the X scores
_____ 4. y²           D. square each Y score
_____ 5. X₁²          E. number of subjects
_____ 6. N₁           F. square each value of X₁

The student reads a premise (Column A) and finds the correct response from among those in Column B. The student then prints the letter of the correct response in the blank beside the premise in Column A. An alternative is to have the student draw a line from the correct response to the premise, but this is more time consuming to score and it can cause considerable confusion. In the example above, the student only has to know five of the six answers to get them all correct. Since each answer in Column B can be used only once, the one remaining after the five known answers have been recorded is the answer for the sixth premise. One way to reduce the possibility of guessing correct answers is to list a larger number of responses (Column B) than premises (Column A), as is done in the example below.

Column A              Column B
_____ 1. ΣX           A. add up all the Y scores
_____ 2. ΣY           B. square each X score
_____ 3. x²           C. add up all the X scores
_____ 4. y²           D. square each Y score
_____ 5. X₁²          E. number of subjects
_____ 6. N₁           F. square each value of X₁
                      G. square root of Σy²
                      H. square root of Σx²

It is suggested there be no more than five to eight premises (Column A) in one set. For each premise, the student has to read through the entire list of responses (or those still unused) to find the matching response. For this reason, the shorter elements should be in Column B, rather than Column A, to minimize the amount of reading needed for each item. Although there is little difference in the length of items in the two columns in the previous examples, note the improvement in the example below when the items in the two columns are reversed.

Column A                              Column B
_____ 1. add up all the Y scores      A. ΣX
_____ 2. square each X score          B. ΣY
_____ 3. add up all the X scores      C. x²
_____ 4. square each Y score          D. y²
_____ 5. number of subjects           E. N₁

General Guidelines for Writing Matching Test Questions

Other guidelines suggested for writing matching test items include the following:

1. In an ideal world, you should present more responses than premises, so the remaining responses don’t work as hints to the correct answer. Of course, we don’t live in an ideal world, and this may not always be possible, especially when templates are used.
2. Put the items with more words in Column A.
3. Arrange items in Column B in either a logical or natural order, or alphabetically if there is no apparent organizational basis.
4. Use numbers to identify items in Column A, and capital letters to identify responses in Column B.
5. Do NOT list premises in the same order as responses, and there should NOT be a pattern to the correct answers.
6. There should NOT be keywords appearing in both a premise and a response, providing a clue to the correct answer.
7. All of the responses and premises for a matching item should appear on the same page.
8. The items should all be part of a common set. It should NOT be possible to subdivide the premises and responses into two or more discrete subsets.
9. All responses in Column B should be plausible answers to the premises in Column A. Otherwise, the test loses some of its validity because some answers will be “giveaways.”
10. Ensure your premises don’t include hints through grammar (like implying the answer must be plural) or hints from word choice (like using the term itself in a definition).
11. Limit premises to a reasonable number. Due to the capacity limitations of working memory, avoid a long list of premises in the first column. Experts (you know, the guys with all those letters behind their names) in the field of test construction recommend that you keep the list down to six items. Even fewer might be better, depending on the characteristics of your audience.
12. Use only one correct answer. Every premise should have only one correct response. Obvious, but triple-check to make sure each response can only work for one premise.
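
Several of the guidelines above (more responses than premises, responses in alphabetical order, numbered premises with capital-letter responses, and no answer pattern that simply follows the column order) can be sketched in Python. This is an illustrative sketch, not a prescribed procedure: the function names are hypothetical, and the sample premises paraphrase this chapter's own key terms.

```python
import string


def build_matching_item(pairs, extra_responses=()):
    """Assemble a matching item from (premise, response) pairs.

    Responses, plus any extra distractor responses, are sorted
    alphabetically and lettered A, B, C, ...; premises are numbered
    in the order given. Returns (premises, responses, key).
    """
    responses = [resp for _, resp in pairs] + list(extra_responses)
    if len(set(responses)) != len(responses):
        raise ValueError("each response must be unique")
    if len(responses) > len(string.ascii_uppercase):
        raise ValueError("too many responses to letter A-Z")
    ordered = sorted(responses, key=str.casefold)
    lettered = dict(zip(string.ascii_uppercase, ordered))
    letter_of = {resp: letter for letter, resp in lettered.items()}
    premises = {n: prem for n, (prem, _) in enumerate(pairs, start=1)}
    key = {n: letter_of[resp] for n, (_, resp) in enumerate(pairs, start=1)}
    return premises, lettered, key


def key_follows_column_order(key):
    """True when the answers simply run A, B, C, ... down the page."""
    letters = [key[n] for n in sorted(key)]
    return letters == list(string.ascii_uppercase[:len(letters)])


pairs = [
    ("initial question or incomplete statement", "stem"),
    ("incorrect option in a multiple-choice question", "distracter"),
    ("item in the first column of a matching test", "premise"),
]
premises, responses, key = build_matching_item(pairs, extra_responses=["response"])
print(responses)                      # {'A': 'distracter', 'B': 'premise', 'C': 'response', 'D': 'stem'}
print(key)                            # {1: 'D', 2: 'A', 3: 'B'}
print(key_follows_column_order(key))  # False
```

If `key_follows_column_order` returns True, reordering the premises is the simplest fix, since Column B is already fixed by the alphabetical rule.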
Pros and Cons of Matching Tests

The matching test item format allows you to cover more content in one question than you can with multiple-choice. That’s why they are excellent for intermittent knowledge checks. They are also a very efficient approach to testing and can provide an excellent objective measurement. In addition, they provide a way to add some variety to your activities.

A disadvantage is the tendency to use this format for the simple recall of information. Adult learners often require practice and testing of higher-order thinking skills, such as problem solving. Don’t limit your use of this format to recall of knowledge alone. Rather, try to find ways to use matching for application and analysis too, such as presenting a short scenario and asking for the best solution.

When using matching test items in an assessment, you’ll need to identify the specifics of how they will be scored. Some prefer to give partial credit when some, but not all, of the responses are correct. Often, the authoring tool determines the approach, but if you do have control, it’s an issue you’ll need to explore.
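
The scoring choice mentioned above, partial credit versus all-or-nothing, can be made concrete with a short Python sketch. The function name, the `partial_credit` flag, and the sample answers are assumptions made for illustration; an authoring tool may score differently.

```python
def score_matching(key, answers, partial_credit=True):
    """Score one matching item as a fraction between 0 and 1.

    `key` and `answers` map premise numbers to response letters.
    With partial credit, each correct match earns an equal share;
    without it, the item counts only if every match is correct.
    """
    correct = sum(1 for premise, letter in key.items()
                  if answers.get(premise) == letter)
    if partial_credit:
        return correct / len(key)
    return 1.0 if correct == len(key) else 0.0


key = {1: "E", 2: "B", 3: "D", 4: "C", 5: "A"}
answers = {1: "E", 2: "B", 3: "C", 4: "D", 5: "A"}  # premises 3 and 4 swapped

print(score_matching(key, answers))                        # 0.6
print(score_matching(key, answers, partial_credit=False))  # 0.0
```

The same five responses earn 60% under one rule and nothing under the other, which is why the scoring rule should be decided, and stated in the directions, before the test is given.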
Constructing Short Answer and Completion Questions

Short answer and completion items are both forms of supply items in which students have to provide the response, rather than selecting a response from among several provided in the test. Supply items are frequently used for recall of information and for problem solving in statistics where the student is asked to supply the answer to a calculation or the result of a formula. How appropriate is that?

These types of questions have some advantages. Like true and false questions, short answer and completion items can be written fairly easily. Students can complete a large number of items in a fairly short time (unless they involve working complex statistical problems), thus sampling a lot of content. Since the student has to generate the answers, the possibility of guessing the correct answers to these questions is greatly reduced when compared with true and false questions. While these items can be easy to score, poor student handwriting poses a potential problem. I am sure you have heard someone say, “He writes like chicken scratch.” Well, some students write so badly not even a chicken would claim the writing.

Completion items are those in which a statement is written with blanks substituted for one or more words which the student is to fill in. Generally the student writes the answer directly on the blank that is put in the sentence. For example:

1. In statistical research the ____________ is denoted in symbols as H₀: M₁ = M₂.

However, scoring can be made easier by providing the answer blank in a column along either the right or left side of the paper. For example:

_________ 1. In statistical research the ____________ is denoted in symbols as H₀: M₁ = M₂.

Short answer questions are similar to completion items except that a question is written in its entirety, with the student supplying a correct response of one word or a short phrase. The use of a short answer question may be preferable to the completion item in Example 1 if it makes the question more specific and leads to the one answer that you are seeking. Also, it may be easier for younger children to respond to a question than to fill in a blank completing a sentence.

1. The symbols H₀: M₁ = M₂ denote what statistical principle? __________

A potential problem with both short answer and completion items is that unless the items are well written, students may give an answer that is not the one you wanted when you wrote the item, but one which is also correct. Along the same lines, students may give you an incorrect answer if the item is not well written and argue that it is right. You have heard it said that to err is human…to admit it is not. That is true of most students…only more so. It takes careful attention to write the item with enough specificity that the answer you are seeking is the only correct one, but it is essential that you do just that.
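
One practical way to handle the problem described above is to write down, before scoring, every answer you are prepared to accept for an item, and then compare responses against that list while ignoring capitalization and stray spaces. The Python sketch below assumes this approach; the function names are hypothetical, and the accepted list for the H₀ item is the teacher's call, shown here with the standard answer "null hypothesis."

```python
def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.casefold().split())


def is_acceptable(response, accepted_answers):
    """True when the response matches any answer the teacher listed."""
    return normalize(response) in {normalize(a) for a in accepted_answers}


accepted = ["null hypothesis"]
print(is_acceptable("  Null   Hypothesis ", accepted))     # True
print(is_acceptable("alternative hypothesis", accepted))   # False
```

If the accepted list for an item keeps growing as you grade, that is usually a sign the item itself was not specific enough.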
General Guidelines for Writing Short Answer and Completion Questions

Other guidelines suggested for writing short answer and completion items include the following:

1. In completion items, only key or important words should be replaced by blanks.
2. The statements should be relatively short and simple.
3. The embedded blanks in each statement should be about the same length.
4. The embedded blanks should be near the end of the sentence rather than at the beginning, so that the student has an opportunity to formulate a framework before encountering the missing word or phrase.
5. Statements should not be taken directly from the text.
6. There should be only one blank in an item unless the terms are part of a series.
7. The requested answer should be brief and straightforward.
8. If the answer is a number, indicate the unit of measurement (pounds, cents, dollars, etc.) and the degree of specificity (for example, three decimal places) that you require.
9. Don’t use sweeping, broad, general statements or absolutes such as all, always, never, none, or only, since a single incident in which the statement is untrue is enough to make it false.