‘Testing a test’ – Evaluating our Assessment Tools
Eddy White, Ph.D.
Assessment Coordinator, Center for English as a Second Language, University of Arizona
Targets
1. My background
2. Classroom-based assessment
3. Tests – purposes/functions
4. The ‘cardinal criteria’ for evaluating a test
5. Conclusions
Tokyo Woman’s Christian University (1994–2009)
Classroom-based Assessment
• Assessment of Learning
• Assessment for Learning
Vancouver
The goal of assessment is to . . .
The goal of assessment has to be, above all, to support the improvement of learning and teaching. (Frederiksen & Collins, 1989)
Definition: Classroom Assessment
[diagram: an assessment cycle – Planning → Collecting → Analyzing → Reporting]
ESL Assessment – Purposes
• identify strengths and weaknesses of individual students,
• adjust instruction to build on students’ strengths and alleviate weaknesses,
• monitor the effectiveness of instruction,
• provide feedback to students (sponsors, parents, etc.), and
• make decisions about the advancement of students to the next level of the program.
(Source: ESL Senior High Guide to Implementation, 2002)
Consider
• Research suggests that teachers spend from one-quarter to one-third of their professional time on assessment-related activities.
• Almost all do so without the benefit of having learned the principles of sound assessment. (Stiggins, 2007)
Teachers learn how to teach without learning much about how to assess. (Heritage, 2007)
Assessment literacy
• the kinds of assessment know-how and understanding that teachers need to assess their students effectively
• Assessment-literate educators should have knowledge and skills related to the basic principles of quality assessment practices. (SERVE Center, University of North Carolina, 2004)
Assessment Literacy: the know-how and understanding teachers need to assess students effectively and maximize learning
Importance of classroom assessment
• We may not like it, but students can and do ignore our teaching;
• however, if they want to get a qualification, they have to participate in the assessment processes we design and implement.
(Brown, S. 2004. Assessment for learning. Learning and Teaching in Higher Education, 1, 81–89)
Who are the assessment ‘deciders’ at your institution?
Classroom-Based Assessment: Challenges, Choices, and Consequences
Assessment Frameworks
Assessment framework
• the series of assessment tools (exams, tasks, projects, etc.) that are scored and used to arrive at a summative grade for a course
• it should be skills-based and knowledge-based (i.e., students demonstrate what they know about and can do with English)
• based on learning outcomes
• The spirit and style of student assessment define the de facto curriculum. (Rowntree, 1987)
de facto = existing in fact, actual, whether intended or not
Quiz time!
Assessing an English articles quiz
Context:
• Conversation class (listening & speaking)
• high-beginner level
What is a fundamental problem with this quiz?
Answer
What is a test?
A test . . .
• is a method of measuring a person’s ability, knowledge, or performance in a given domain.
• is an instrument – a set of techniques, procedures, or items – that requires performance on the part of the test-taker.
Tests – measuring function
A test must measure
• Some tests measure general ability, while others focus on very specific competencies or objectives.
Examples:
• A multi-skill proficiency test measures general ability;
• a quiz on recognizing correct use of definite articles measures very specific knowledge.
• A test measures performance, . . .
• but the results imply the test-taker’s ability, or competence.
• Performance-based tests sample the test-taker’s actual use of language,
• but from those samples the test administrator infers general competence.
• A well-constructed test is an instrument that provides an accurate measure of a test-taker’s ability within a particular domain.
• Constructing a good test is a complex task.
Your assessment practices?
Think about what is happening in your context and your assessment practices.
Your assessment practices?
• True–False Item
• Multiple Choice
• Completion
• Short Answer
• Essay
• Practical Exam
• Papers/Reports
• Projects
• Questionnaires
• Presentations
• Inventories
• Checklists
• Peer Rating
• Self Rating
• Journals
• Portfolios
• Observations
• Discussions
• Interviews
For you, which of the four skills are more/less challenging to test?
Quiz time!
2010
• Exploring how principles of language assessment can and should be applied to formal tests.
• These principles apply to assessment of all kinds.
• How to use these principles to design a good test.
• What are the ‘five cardinal criteria’ that can be used to design and evaluate all types of assessment?
Q. How do you know if a test is effective, appropriate, useful, or, in down-to-earth terms, a “good” test?
Five key assessment principles?
• Discuss
• 3 minutes
• Hint (five nouns)
Five key assessment principles
• Practicality
• Reliability
• Validity
• Authenticity
• Washback
Key Assessment Principles
• These questions provide excellent criteria for evaluating the tests we design and use.
1. Practicality
• Is the procedure relatively easy to administer?
Practicality considerations
• the logistical and administrative issues involved in making, giving, and scoring an assessment instrument
• the amount of time it takes to construct and administer
• the ease of scoring
• the ease of interpreting/reporting the results
An effective test is practical. This means that it:
• is not excessively expensive,
• stays within appropriate time constraints,
• is relatively easy to administer, and
• has a scoring/evaluation procedure that is specific and time-efficient.
The value and quality of a test sometimes hinge on such nitty-gritty practical considerations.
• In classroom-based testing, _________ is almost always a crucial practical factor for busy teachers.
2. Reliability
• Is all work being consistently marked to the same standard?
• A reliable test is consistent and dependable.
• If you give the same test to the same student, or matched students, on two different occasions, the test should yield similar results.
What factors contribute to the unreliability of a test?
Test unreliability – contributing factors
• Student-related reliability
• Rater reliability (inter-rater, intra-rater)
• Test administration reliability
• Test reliability
Q. What is one key way to increase reliability?
A. Use rubrics.
• Rubrics are scoring guidelines.
• They provide a way to make judgments fair and sound when assessing performance.
• A uniform set of precisely defined criteria or guidelines is set forth to judge student work.
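The rater-reliability idea above can be made concrete with a quick calculation. The sketch below is not from the talk: the function name and the scores are hypothetical, and it uses one common estimate of inter-rater reliability, the Pearson correlation between two raters' scores for the same students. Values near 1.0 suggest the raters are applying the scoring criteria (e.g., a shared rubric) consistently; low values flag a rater-reliability problem.

```python
from statistics import mean

def inter_rater_correlation(scores_a, scores_b):
    """Pearson correlation between two raters' scores for the same students."""
    ma, mb = mean(scores_a), mean(scores_b)
    # Covariance of the two score lists (unnormalized)
    cov = sum((a - ma) * (b - mb) for a, b in zip(scores_a, scores_b))
    # Sum of squared deviations for each rater
    var_a = sum((a - ma) ** 2 for a in scores_a)
    var_b = sum((b - mb) ** 2 for b in scores_b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical essay scores (0-10) from two raters for five students
rater_1 = [7, 5, 9, 6, 8]
rater_2 = [6, 5, 9, 7, 8]
print(round(inter_rater_correlation(rater_1, rater_2), 2))  # → 0.9
```

A correlation of 0.9 here would suggest the two raters are marking to broadly the same standard; what counts as "acceptable" agreement is a judgment call for the program.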
3. Validity
• Does the assessment measure what we really want to measure?
– the most complex criterion
– the most important principle
Validity – definition
• ‘The extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.’ (Gronlund, 1998, p. 226)
A valid test of reading ability . . .
• actually measures reading ability –
• not math skills,
• nor previous knowledge in a subject,
• nor writing skills,
• nor some other variable of questionable relevance.
How is the validity of a test established?
1. Content validity
2. Face validity
Content validity
• If a test requires the test-taker to perform the behavior that is being measured, . . .
• it can claim content-related evidence of validity (content validity).
• e.g., a test of a person’s ability to speak an L2 requires the student to actually speak within some sort of authentic context.
• A paper-and-pencil multiple-choice test requiring grammatical judgments does not achieve content validity.
Another way of understanding content validity is to consider the difference between direct and indirect testing (e.g., testing oral production of syllable stress):
• direct testing – involves the test-taker in actually performing the target task
• indirect testing – students do not perform the task itself, but a related task
To achieve content validity in classroom assessment, try to test performance directly.
How is the validity of a test established?
1. Content validity
2. Face validity
Face validity
• The extent to which students view the assessment as: 1. fair, 2. relevant, 3. useful for improving learning.
• Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure.
High face validity: the test . . .
• is well constructed, with an expected format and familiar tasks
• is clearly doable within the allotted time
• has items that are clear and uncomplicated
• has directions that are crystal clear
• has tasks related to course work (content validity)
• has a difficulty level that presents a reasonable challenge
• Validity is the most significant cardinal principle of assessment evaluation.
• If validity is not established, all other considerations may be rendered useless.
4. Authenticity
• Are students asked to perform real-world tasks?
Test task authenticity
• tasks represent, or closely approximate, real-world tasks
• the task is likely to be enacted in the “real world”
• not contrived or artificial
Authenticity checklist
• Is the language in the test as natural as possible?
• Are topics as contextualized as possible rather than isolated?
• Are topics and situations interesting, enjoyable, and/or humorous?
• Is some thematic organization provided, such as through a story line or episode?
• Do tasks represent, or closely approximate, real-world tasks?
5. Washback
• Does the assessment have positive effects on learning and teaching?
Washback = the effect of testing on teaching and learning
– positive washback
– negative washback
Washback
• Classroom assessment: the effects of an assessment on teaching and learning prior to the assessment itself (preparation).
• Another form of washback: the information that ‘washes back’ to students in the form of useful diagnoses of strengths and weaknesses.
• Formal tests provide no washback if students receive only a simple letter grade or a single overall numerical score.
A test that provides beneficial washback . . .
• positively influences what and how teachers teach
• positively influences what and how students learn
• offers learners a chance to adequately prepare
• gives learners feedback that enhances their language development
• provides conditions for peak performance by the learner
Teachers’ challenge
• to create classroom tests that serve as learning tools through which washback is achieved
Q. How do you know if a test is effective, appropriate, useful, or, in down-to-earth terms, a “good” test?
Answer. A ‘good’ test:
• can be given within appropriate administrative constraints (practicality),
• is dependable (reliability),
• accurately measures what you want it to measure (validity),
• uses language representative of real-world language use (authenticity), and
• provides information that is useful for the learner (washback).
• These principles will help you make accurate judgments about the English competence of your students.
• They provide useful guidelines for evaluating existing tests, and for designing your own.
Assessment Literacy: the know-how and understanding teachers need to assess students effectively and maximize learning
• There is no getting away from the fact that most of the things that go wrong with assessment are our fault,
• the result of poor assessment design – and not the fault of our students. (Race et al., 2005)
• Improving student learning implies improving the assessment system.
• Teachers often assume that it is their teaching that directs student learning.
• In practice, assessment directs student learning, because it is the assessment system that defines what is worth learning. (Havnes, 2004, p. 1)
• There is substantial evidence that assessment, rather than teaching, has the major influence on students’ learning.
• It directs attention to what is important, acts as an incentive for study, and has a powerful effect on students’ approaches to their work.
(Boud & Falchikov, 2007, Rethinking Assessment in Higher Education)
“We owe it to ourselves and our students to devote at least as much energy to ensuring that our assessment practices are worthwhile as we do to ensuring that we teach well.”
– Dr. David Boud, University of Technology, Sydney, Australia
Thank you for your time and participation. Best wishes with your assessment practices.
The End