How to create a new English test – practical process
National ELT Conference, Bogotá, 2011
Chris Hurling, chrishurling@gmail.com
A new test that has had far-reaching benefits
• The need for change
• Terms of reference
• Item writing
• Piloting
• Statistical analysis
• Documenting the change
• Change management/backwash
Terms of Reference
The need for change
• English Graduation Exam
• About 600 tests taken every year
• A ‘high stakes test’
• Issues with old-style test:
– Content validity
– Construct validity
– Criterion validity
– Reliability
Test specification
• Test skills: reading, writing, speaking
• Reading & writing 2 hr, speaking 20 min
• Pass level = mid CEF C1
• Rubrics to assess writing and speaking
• Wider range of reading item types
• Paper based
Getting around constraints
Constraints
• Knowledge
• Experience
• Resources
Solutions
• Background reading
• Benchmarking
• Voluntary participation
Simple project structure
• Sponsor (e.g. Director)
• Consultative Group
• Project Manager
• Project Team
Valid sample of writing
• Test only writing ability
• More than one sample
• Authentic tasks (genre)
• Restrict candidates/no choice of tasks
• Long enough samples
Valid sample of speaking
• Interactive
– Transactional or Interpersonal
• Plan and structure the test carefully
• Non-sensitive & non-academic topics
The more scores, the more reliable the test for a candidate*
Holistic rubric
• Example: TOEFL
• Impressionistic
• Quick to do
• Reliable (4 different raters)
• Sub-skills not rated
• No analysis for beneficial backwash
Analytical rubric
• Example: IELTS
• Sum of the parts
• Takes more time
• 4 or 5 scores per text
• A score for each sub-skill
• Beneficial backwash for teaching
* Hughes A, 2003
Item writing
Define rules for selection and editing of the reading texts
Text 1
• 500 – 700 words
• Limit challenging words
• Title
• 10 Cloze questions
Text 2
• 700 – 900 words
• Limit challenging words
• Title
• Max 12 paragraphs
• 15 items
• Specified range of items
Genre of text influences content validity
Don’t
• underestimate the time
• use texts for special genre – e.g. internet
• select topics that will date quickly
• select sensitive topics
Do
• use electronic versions
• newspapers, specialist magazines
• select texts longer than the final text length
Deciding item types (reading)
Item type | Skill tested | Ease of question | Students prepared | Ease to write | Total score | IELTS/TOEFL
Sentence completion | Locate & understand information | 3 | 4 | 3 | 10 | IELTS
Short answer | Locate & understand information | 3 | 3 | 4 | 10 | TOEFL
Referencing (multiple choice) | Cohesion of ideas | 3 | 3 | 4 | 10 | TOEFL
Negative stem m/choice | Scanning and reading for detail | 4 | 0 | 1 | 5 | IELTS
Moderating test items
First item writer
• Write items
• 2 x required number
• Reject items
Second item writer
• Attempt item
• Amend/reject
• Feedback to first item writer
Consultative group
• Attempt items
• Amend/reject
• Feedback to item writers
Pilot test(s)
Pre-testing on real test-takers: 85 students
[Chart: number of students in each pilot group (IELTS 1, IELTS 2, Grad Exam, TOEFL 1, TOEFL 2, Control); data labels: 16, 25, 16, 8, 8, 13]
Analyse results
How do you feel about statistics?
Which items work best?
Item Facility (IF) - difficulty
$$\mathrm{IF} = \frac{\text{number of Ss answering the item correctly}}{\text{total number of Ss responding to that item}}$$
Item Discrimination (ID) - differentiation
$$\mathrm{ID} = \frac{(\text{high group \# correct}) - (\text{low group \# correct})}{\tfrac{1}{2}\,(\text{total \# Ss in the high and low groups})}$$
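The two formulas can be sketched in Python as follows. This is a minimal illustration rather than anything shown in the talk: the function names `item_facility` and `item_discrimination` are assumptions chosen for readability. The first call uses invented numbers; the second uses the high- and low-group counts reported for item 26 in the pilot data, assuming 13 students in each group (an inference from the reported ID values, not a figure stated on the slides).

```python
def item_facility(num_correct: int, num_responding: int) -> float:
    """IF: proportion of responding students who answered the item correctly."""
    if num_responding <= 0:
        raise ValueError("num_responding must be positive")
    return num_correct / num_responding


def item_discrimination(high_correct: int, low_correct: int,
                        total_high_low: int) -> float:
    """ID: (high group correct - low group correct) divided by
    half the combined size of the high and low groups."""
    if total_high_low <= 0:
        raise ValueError("total_high_low must be positive")
    return (high_correct - low_correct) / (0.5 * total_high_low)


if __name__ == "__main__":
    # Invented example: 18 of 24 responding students answered correctly.
    print(f"IF = {item_facility(18, 24):.2f}")
    # Item 26 counts from the pilot (13 high, 5 low); the group size of
    # 13 + 13 = 26 is an assumption inferred from the reported ID values.
    print(f"ID = {item_discrimination(13, 5, 26):.2f}")
```

An item that everyone answers correctly has IF = 1 and cannot discriminate, which is why the two statistics are read together.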
Selecting the best items
Item Facility – Cloze questions
Item | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
IF | .54 | .96 | .98 | .61 | .88 | .71 | .93 | .89 | .76 | .48 | .71 | .88 | .60 | .74 | .84
Item Facility and Item Discrimination compared – Summary completion
Item | High group correct answer | Low group correct answer | Item Discrimination (ID) | Item Facility (IF)
23 | 1 | 0 | .08 | .08
24 | 12 | 9 | .23 | .88
25 | 12 | 8 | .31 | .83
26 | 13 | 5 | .62 | .75
27 | 8 | 8 | .00 | .70
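One way to read the two statistics together when choosing items is sketched below. ID is recomputed from the high- and low-group counts for items 23 to 27, assuming 13 students in each group (inferred from the reported ID values, not stated in the talk). The cut-offs `MIN_IF`, `MAX_IF` and `MIN_ID` are hypothetical values picked only for illustration; the talk does not say which thresholds were applied.

```python
# Summary-completion pilot figures for items 23-27:
# item number -> (high group correct, low group correct, reported IF)
ITEMS = {
    23: (1, 0, 0.08),
    24: (12, 9, 0.88),
    25: (12, 8, 0.83),
    26: (13, 5, 0.75),
    27: (8, 8, 0.70),
}

GROUP_SIZE = 13  # assumption: students per group, inferred from reported IDs

# Hypothetical cut-offs, for illustration only.
MIN_IF, MAX_IF = 0.30, 0.90
MIN_ID = 0.30


def discrimination(high_correct, low_correct, group_size=GROUP_SIZE):
    """ID = (high - low) / (half of the combined high + low group size)."""
    return (high_correct - low_correct) / (0.5 * (2 * group_size))


def select_items(items):
    """Return item numbers whose IF and ID both fall inside the cut-offs."""
    kept = []
    for number, (high, low, facility) in sorted(items.items()):
        if MIN_IF <= facility <= MAX_IF and discrimination(high, low) >= MIN_ID:
            kept.append(number)
    return kept


if __name__ == "__main__":
    for number, (high, low, facility) in sorted(ITEMS.items()):
        print(number, f"IF={facility:.2f}", f"ID={discrimination(high, low):.2f}")
    print("Items kept:", select_items(ITEMS))
```

Under these illustrative cut-offs, item 23 is rejected as too difficult, and items 24 and 27 are rejected because the high and low groups perform too similarly.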
Acceptable concurrent validation
[Chart: pilot results compared with IELTS/TOEFL mock tests; categories:]
• Met standard in both tests
• Failed to meet standard in both tests
• Passed TOEFL/IELTS, failed pilot
• Failed TOEFL/IELTS, passed pilot
• No correlative data
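The comparison behind these categories can be sketched as a simple cross-tabulation of each candidate's outcome on the pilot test against their outcome on an IELTS/TOEFL mock test. The records below are invented purely to show the mechanics and are not results from the pilot. The names `classify`, `tally_outcomes` and `agreement_rate` are assumptions, as is the agreement measure itself (the share of candidates with the same outcome on both tests, among those with data for both), since the talk does not say how agreement was quantified.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

BOTH_MET = "Met standard in both tests"
BOTH_FAILED = "Failed to meet standard in both tests"
EXTERNAL_ONLY = "Passed TOEFL/IELTS, failed pilot"
PILOT_ONLY = "Failed TOEFL/IELTS, passed pilot"
NO_DATA = "No correlative data"


def classify(passed_pilot: bool, passed_external: Optional[bool]) -> str:
    """Place one candidate in a category; None means no mock-test result."""
    if passed_external is None:
        return NO_DATA
    if passed_pilot and passed_external:
        return BOTH_MET
    if not passed_pilot and not passed_external:
        return BOTH_FAILED
    return EXTERNAL_ONLY if passed_external else PILOT_ONLY


def tally_outcomes(records: Iterable[Tuple[bool, Optional[bool]]]) -> Counter:
    """Count candidates per category."""
    return Counter(classify(pilot, external) for pilot, external in records)


def agreement_rate(counts: Counter) -> float:
    """Share of comparable candidates with the same outcome on both tests."""
    compared = sum(counts.values()) - counts[NO_DATA]
    if compared == 0:
        return float("nan")
    return (counts[BOTH_MET] + counts[BOTH_FAILED]) / compared


if __name__ == "__main__":
    # Invented records: (passed pilot, passed IELTS/TOEFL mock or None).
    invented_records = [
        (True, True), (True, True), (False, False),
        (False, True), (True, False), (True, None),
    ]
    counts = tally_outcomes(invented_records)
    for category, n in counts.items():
        print(f"{category}: {n}")
    print(f"Agreement: {agreement_rate(counts):.2f}")
```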
Document new test
Document the new test
• Create test-writers' manual
– Guidelines for writing items
– Specify item instructions for candidates
• Create instructions for the examiners
– Include grade reporting
• Create test-takers' information
– Sample exam with answers
– How to prepare for the new exam
Change Management
Manage the change
• Training for teachers and test-writers
– 1 day course
– Awareness of new test
– How to write some of the items
– How to score the new test
• Devise examiner calibration session
• Socialise the change to test-takers
Beneficial backwash – curriculum
• Highest level course from CEF C1 to B2+
• ‘Best in class’ material selection
• Revision of syllabus for each level
• Exam training built into curriculum
• Exam preparation course refreshed
Beneficial backwash – assessment
• New style items incorporated into progress and summative tests
• Exam specifications written for all levels
• Exam-writers trained
• Teachers trained how to score tests
• Teachers trained to do more valid formative testing
Beneficial backwash – teaching methodology
• Product and process approaches to teaching writing skills
• Teachers trained on how to teach other skills
• Upgrade in teacher training sessions
A practical process & a new test with beneficial backwash
ToR → Specification → Item Writing → Pilot Testing → Analyse results → Document → Manage change
Backwash
References:
Brown D, 2004. Language Assessment. New York: Pearson Education Ltd
Fulcher G and Davidson F, 2007. Language Testing and Assessment. Abingdon: Routledge
Gear J and Gear R, 2006. Cambridge Preparation for the TOEFL Test, 4th Edn. Cambridge: Cambridge University Press
Hughes A, 2003. Testing for Language Teachers. Cambridge: Cambridge University Press
IELTS official website, www.IELTS.org
ETS official website, www.ets.org