The Center for Educational Testing & Evaluation at the University of Kansas is pleased to host the 41st annual IAEA conference.
THE INTERNATIONAL ASSOCIATION FOR EDUCATIONAL ASSESSMENT CONFERENCE
11–16 OCTOBER 2015
THE UNIVERSITY OF KANSAS // LAWRENCE, KANSAS, USA // WWW.IAEA2015.ORG // #IAEA2015
The Three Most Important Considerations in Testing: Validity, Validity, Validity
CONTENTS
IAEA 2016 CONFERENCE
THE INTERNATIONAL ASSOCIATION FOR EDUCATIONAL ASSESSMENT
WELCOME MESSAGE BY THE PRESIDENT OF IAEA
WELCOME MESSAGE BY NEAL KINGSTON OF AAI
KEYNOTE SPEAKERS
CONFERENCE VENUE
EVENING EVENTS
TRANSPORTATION
GET CONNECTED
NEED-TO-KNOW
CONFERENCE PROGRAM OUTLINE
FULL AGENDA
ABSTRACTS
CONFERENCE ATTENDEES
SPONSORS
THE INTERNATIONAL ASSOCIATION FOR EDUCATIONAL ASSESSMENT
The International Association for Educational Assessment (IAEA), founded in 1975, is a not-for-profit, non-governmental association of educational measurement agencies around the world. The general purpose of IAEA is to improve the quality of education by assisting educational agencies in the development and appropriate application of educational assessment techniques. IAEA believes that this is best achieved through international cooperation and is committed to facilitating the development of closer ties among the relevant agencies and individuals around the world. IAEA believes that such international cooperation can help nations learn from each other with respect for their cultural autonomy.

PRIMARY OBJECTIVES:
»»To improve communication among organizations involved in educational assessment by sharing professional expertise through conferences and publications, and by providing a framework in which cooperative research, training, and projects involving educational assessment can be undertaken.
»»To make expertise in assessment techniques more readily available for the solution of educational problems.
»»To cooperate with other agencies with complementary interests.
»»To engage in activities for the improvement of assessment techniques and their appropriate use by educational agencies around the world.
IAEA has consultative status with UNESCO in the achievement of mutual goals.

MEMBERS SHALL BE ADMITTED TO THE ASSOCIATION IN ONE OF SIX CAPACITIES:
»»Full Institutional members
»»Full Individual members
»»Affiliate Institutional members
»»Affiliate Individual members
»»Associate members
»»Honorary members

EXPLANATION OF THE TYPES OF MEMBERSHIP:
Full Institutional membership (a) is meant for not-for-profit organizations whose primary objective is educational assessment. For-profit organizations are not admitted as Full Institutional members but as Affiliate Institutional members (c); this category is meant for organizations whose work relies in major part on educational assessment techniques, and for financial agencies allocating a large part of their budgets to projects involving educational assessment. The Board of Trustees has the discretion to recommend the admission of for-profit organizations as Affiliate Institutional members, taking care that such recommendations are not excessively restrictive and that candidates for membership have a genuine interest in educational assessment, technical and professional competence in the field, and a willingness to cooperate actively towards the achievement of the Association’s objectives. Full Individual membership (b) is meant for individuals with a professional interest in educational assessment and for active professionals in the field of educational assessment. Affiliate Individual membership (d) is meant for students or pensioners.
THE BENEFITS OF IAEA MEMBERSHIP ARE:
»»Access to all papers presented at the annual conferences of the IAEA from 2006 onwards.
»»Free subscription to the journal Assessment in Education, except for Affiliate Individual members.
»»Reduced conference registration fee (± 10%).
»»Access to relevant membership data through our website, enabling members to find and contact colleagues all over the world.
»»Full members can publish job postings and press releases through the IAEA website.
»»Full members have voting rights at the general meeting of members at the annual conference.

MEMBERSHIP FEES AND BENEFITS

FEES
                                FULL            FULL          AFFILIATE       AFFILIATE
                                INSTITUTIONAL   INDIVIDUAL    INSTITUTIONAL   INDIVIDUAL
ANNUAL FEE                      $450            $160          $500            $40
2 YEARS                         $850            $310          $950            $80
3 YEARS                         $1,200          $450          $1,350          $120

BENEFITS
                                FULL            FULL          AFFILIATE       AFFILIATE
                                INSTITUTIONAL   INDIVIDUAL    INSTITUTIONAL   INDIVIDUAL
REDUCED CONFERENCE FEE (±10%)   YES             YES           YES             YES
FREE SUBSCRIPTION TO
ASSESSMENT IN EDUCATION         1 COPY          1 COPY        1 COPY          NO
VOTING RIGHTS                   YES             YES           NO              NO
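Note that the multi-year rates in the table include a modest discount for most categories: for example, the three-year Full Institutional rate of $1,200 saves $150 against three annual payments of $450 (3 × $450 = $1,350), and the three-year Full Individual rate of $450 saves $30 against $480; the Affiliate Individual multi-year rates are exact multiples of the annual fee, with no discount.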
All applications for membership are screened by the Board of Trustees, which may grant provisional membership. Applications for permanent membership are decided by the Full members of IAEA at the first annual meeting following the application. Questions of eligibility for membership are resolved by the Board of Trustees.

OFFICE HOURS: TREASURER / MEMBERSHIP SECRETARY
During the conference the IAEA Treasurer (Mr. Jan Wiegers) is available for membership issues (membership applications, payment of membership fees). The treasurer can be found in the International Room of the Kansas Union.

MONDAY, OCTOBER 12        3:30 – 5:00
TUESDAY, OCTOBER 13       10:30 – 11:00
WEDNESDAY, OCTOBER 14     10:30 – 11:00
THURSDAY, OCTOBER 15      10:30 – 11:00
WELCOME MESSAGE BY THE PRESIDENT OF IAEA

It is my pleasure on behalf of the Board of Trustees to welcome you to the annual conference of the International Association for Educational Assessment. This year’s conference marks the first time in over a decade that the conference has been held in the United States. That location is significant because new curriculum standards have been introduced in many US states, new assessment systems have been fielded to measure those standards, and new controversies have arisen about the standards, the assessments, and education generally. For me, the conference theme “Validity, Validity, Validity” gets to the heart of the issue: How do we design, deliver, evaluate, and use assessments so that they affect teaching and learning positively? That end goal is, after all, what the IAEA is about. It has been forty-one years of IAEA conferences bringing together educators from across the globe. Let us continue our tradition this year of striving to improve educational assessment and, through it, education for all. We thank you for coming from far and wide to attend the conference and express our deepest appreciation to the Center for Educational Testing and Evaluation at the University of Kansas for hosting this event.

LENORE DELA LLANA-DECENTECEO
President, IAEA
CONFERENCE THEME
THE THREE MOST IMPORTANT CONSIDERATIONS IN TESTING: VALIDITY, VALIDITY, VALIDITY
Validity – the extent to which inferences drawn from test scores are appropriate – is by far the most important technical characteristic of a test. But because it is much more challenging to establish validity than other test desiderata, it has gotten short shrift in most testing programs. How can advances in technology, test development, psychometrics, and score reporting improve the validity of our testing programs?

CONFERENCE SUB-THEMES

USING TECHNOLOGY TO IMPROVE VALIDITY
Technology makes feasible new item types, gaming, simulations, work samples, and tracking of all test taker interactions with the computer, but will this improve the quality of the inferences we want to make?

IMPROVING TEST-DEVELOPMENT PROCEDURES TO IMPROVE VALIDITY
Evidence-centered design, assessment engineering, and other modern test-development procedures hold much promise, but what is the evidence that tests developed using these procedures yield more useful scores?

SCORE REPORTING: THE CORNERSTONE OF VALIDITY
Without understandable, actionable score reports helping users make better inferences, tests are of very limited use. Can we design score reports to enhance validity?
WELCOME MESSAGE BY NEAL KINGSTON, DIRECTOR OF THE UNIVERSITY OF KANSAS ACHIEVEMENT AND ASSESSMENT INSTITUTE

It gives me great pleasure to welcome IAEA 2015 Conference delegates to Lawrence, Kansas, USA, where we host the IAEA’s 41st annual conference. We celebrate IAEA’s forty-one years as an international testing organization and the important role it has had in establishing strong ties among agencies and institutions that influence and serve educational systems throughout the world. The University of Kansas also has a long history in educational testing. In fact, this year marks the 100th anniversary of the first use of selected response questions in a large-scale standardized assessment, the Kansas Silent Reading Test of 1915. This test was developed by Frederick Kelly, a student of E.L. Thorndike and the third dean of the University of Kansas School of Education. In the last century we have built on that early work, and through the Center for Educational Testing and Evaluation (one of four centers within the Achievement and Assessment Institute) we design and implement testing programs in our home state of Kansas and 17 other US states. The theme for the 2015 IAEA Conference, “The three most important considerations in testing: validity, validity, validity,” is an important reminder that while other characteristics of tests (such as scales, scale maintenance, and reliability) may be required to support valid inferences and intended consequences, these technical characteristics are not an end in and of themselves. Without validity the testing enterprise has nothing. In the recent past there have been many advances in validity theory and practice. This year we will hear from representatives from many countries as they present their experiences and views regarding producing more valid results from their testing programs. I wish all delegates a fruitful and enriching time of learning, sharing, and networking. For our friends from across the globe, I hope you enjoy this conference and your time in Lawrence, Kansas.

NEAL KINGSTON
Director of the Achievement and Assessment Institute
University of Kansas
HOST

CETE is an internationally recognized research center housed within the Achievement & Assessment Institute at the University of Kansas. CETE specializes in large-scale assessment and online test delivery systems. For more than 30 years CETE researchers have developed cutting-edge testing programs and technology tools for all students. CETE is proud to host the IAEA 2015 Conference.
KEYNOTE SPEAKERS
HANS ROSLING, STOCKHOLM, SWEDEN
“A Fact-Based Worldview”
Dr. Rosling is a professor of global health at Sweden’s Karolinska Institute and Edutainer of the Gapminder Foundation. He is a Swedish medical doctor, academic, statistician, and public speaker. Using his animations of global trends, Dr. Rosling makes trend data on economic, social, and environmental changes in the world understood. His award-winning lectures and videos have been labeled “humorous, yet deadly serious.” His main message is that there are no longer two types of countries in the world: the old division into Developed and Developing countries has been replaced by countries on a continuum of social and economic development.

LORRIE SHEPARD, BOULDER, COLORADO, USA
“Designing, Linking, and Evaluating Validity for Formative and Large-Scale Assessments” Shepard is a Distinguished Professor of Research & Evaluation Methodology and Dean of the School of Education at the University of Colorado Boulder. Her research focuses on psychometrics and the use and misuse of tests in educational settings. In her keynote presentation, Dr. Shepard will summarize what should be the same and what should be different in the design of classroom-level and large-scale assessment, focusing on the specific examples of formative assessment in classrooms and then on large-scale accountability testing. She will propose strategies for designing linkages between the two so that learning gains in classrooms, enabled by formative assessment, will also lead to authentic improvements in performance on large-scale assessments. To fit the conference’s theme, Shepard’s examples will illustrate why validity evaluations, like test design, must be focused on the intended use of the test. Considerations of similarity between the two will be presented, starting with domain specifications or learning progressions, but, ultimately, she will demonstrate why much must be different.
VENUE
The IAEA 2015 Conference will be held on levels 5 and 6 of the Kansas Union on the campus of the University of Kansas. The University of Kansas is the heart of Lawrence, a community of about 90,000 located in the forested hills of northeastern Kansas.

[Floor maps: Kansas Union Level 5 and Level 6]
EVENING EVENTS
MONDAY, OCTOBER 12, 2015
Behind-the-Scenes Tour
KU Natural History Museum & Biodiversity Institute
1345 Jayhawk Boulevard, Lawrence, Kansas
Time: 6:15 PM
Cost: $35
www.naturalhistory.ku.edu

Enjoy a selection of heavy appetizers while mingling and viewing the Museum’s world-famous Panorama, an American cultural treasure: a 360-degree-view exhibit that was part of the official Kansas Pavilion in the 1893 World’s Columbian Exposition in Chicago. The evening features small-group behind-the-scenes tours of the Biodiversity Institute’s Vertebrate Paleontology and Herpetology laboratories. The KU Herpetology Laboratory’s actively growing collections began in 1900 and, with more than 350,000 entries, are among the largest research collections of reptile and amphibian specimens in the world. Topics of research include global diversity, evolution, geography, genomics, morphology, conservation, ecology and behavior. The KU Vertebrate Paleontology Laboratory’s actively growing research collections began in 1890, now numbering about 150,000 prehistoric vertebrate specimens and their associated data. Topics of research include global diversity, phylogeny, macro-evolutionary patterns, historical biogeography, morphology, paleoecology and behavior.

MONDAY, OCTOBER 12, 2015
Specialty Wine Dinner
The Oread Hotel
1200 Oread Avenue, Lawrence, Kansas
Time: 6:30 PM
Cost: $75
www.theoread.com

Enjoy a four-course dinner with wine pairings, served on the 5th-Floor Terrace of The Oread Hotel, offering spectacular views of the city and valley below. In case of inclement weather, the dinner will move inside to the All Seasons Den.

Menu
Passed Appetizer: Caprese Terrine with Basil Oil & Balsamic Reduction, paired with a Sauvignon Blanc.
Salad: House Salad with Mixed Greens, Cucumber, Tomato, Crouton, Parmesan, Raspberry Vinaigrette, paired with a Chardonnay.
Entrée: Herb-Encrusted Salmon over Carrot-Infused Risotto, with Blistered Green Beans and Lemon Chili Vinaigrette, paired with a Pinot Noir.
Dessert: Chocolate Bread Pudding with Crème Anglaise, paired with a Merlot.
TUESDAY, OCTOBER 13, 2015
An Evening at the American Jazz Museum
The American Jazz Museum
1616 East 18th Street, Kansas City, Missouri
Time: 5:45 – 10:45 PM
Cost: $85
www.americanjazzmuseum.org

Since the birth of jazz, America’s only indigenous art form, and its subsequent journey across the globe, certain cities have put distinctive stamps on its sound, history and development. Kansas City is one of the cradles and greatest purveyors of jazz, and it continues to offer fertile ground for the music to thrive, with a museum befitting that stature. Located in the Historic 18th & Vine Jazz District, the American Jazz Museum is the only museum in the world solely focused on the preservation, exhibition and advancement of jazz music, showcasing the sights and sounds of jazz through interactive exhibitions, films, and live performances. The event price includes transportation, a barbecue dinner, a guided tour of the Museum, and a live jazz band. A cash bar will be available. Transportation departs from the Kansas Union at 5:45 PM and returns to Lawrence by 10:45 PM. Upon return, guests will be dropped off at their hotels.
TRANSPORTATION
Transportation will be provided during the conference to and from the following hotels, beginning at 5 p.m. on Sunday. Specific pick-up times will be emailed to attendees before the conference so they can plan accordingly.

Best Western Hotel, 2309 Iowa Street, Lawrence, KS 66046
Hampton Inn, 2300 W 6th, Lawrence, KS 66049
The Oread Hotel, 1200 Oread Avenue, Lawrence, KS 66044
Holiday Inn Conference Center, 200 McDonald Drive, Lawrence, KS 66044
Eldridge Hotel, 701 Massachusetts Street, Lawrence, KS 66044
SpringHill Suites by Marriott, 1 Riverfront Plaza, Lawrence, KS 66044
TownePlace Suites by Marriott, 900 New Hampshire Street, Lawrence, KS 66044
GET CONNECTED

NAME BADGES
Name badges are not only a distinctive fashion statement; they are also an important way for conference staff to identify conference attendees. Please wear your name badge in all sessions and excursions.

SOCIAL MEDIA
Use the conference hashtag #IAEA2015 on Twitter, Instagram, and Facebook to share conference photos! Connect with the conference host on Twitter @CETEmedia.
www.iaea.info
NEED-TO-KNOW
Local Time
»»Kansas is in the Central Time Zone (UTC -5:00 during daylight saving time, which is in effect during the conference; UTC -6:00 otherwise).

Electricity
»»Electricity in the US is 110-120 volts, Type B North American NEMA 5-15 standard.
»»Does your device need power? Ask a staffer, and we’ll direct you to a charging station.

Currency
»»The currency in the US is the US Dollar (USD). We recommend using the local currency, although most expenses in the US may be paid with major credit cards. Most ATMs accept all major credit cards, such as MasterCard, Visa, and American Express.

Climate
»»Weather in Lawrence in October is usually pleasant. Temperatures range from 50 to 75° F (10 to 24° C).

Language
»»The official language of the US and the conference is English.

Telephones
»»The country code for the US is +1; the area code for Lawrence, KS is 785.

Liability and Insurance
»»The Conference Organizers cannot accept liability for personal injuries sustained, or for loss or damage of property belonging to conference participants (or their accompanying persons), either during, or as a result of, the conference. Participants are advised to obtain their own personal health and travel insurance for their trip.
CONFERENCE PROGRAM OUTLINE
Sunday, October 11
All Day                  IAEA Executive Committee Meeting, Oread Gathering Rm 2
9:00 a.m. – noon         Pre-conference workshop I, Kansas Room: Development and validation of diagnostic reporting in student monitoring systems (CITO)
1:00 – 4:00 p.m.         Pre-conference workshop II, Kansas Room: Fundamentals of diagnostic classification modeling (CETE)
5:00 – 7:00 p.m.         Registration, 5th floor lobby
7:00 – 8:30 p.m.         Opening reception, Ballroom

Monday, October 12
7:00 a.m.                Registration opens, 5th floor lobby
7:30 – 8:30 a.m.         Buffet breakfast, Ballroom
8:30 – 9:15 a.m.         Opening ceremony, Woodruff Auditorium
9:15 – 10:45 a.m.        Opening keynote: Lorrie Shepard, Woodruff Auditorium
10:45 – 11:15 a.m.       Coffee break, Big 12/Jayhawk
11:15 a.m. – 12:30 p.m.  Session I
12:30 – 2:00 p.m.        Lunch, Ballroom
2:00 – 3:30 p.m.         Session II
3:30 – 4:00 p.m.         Coffee break, Big 12/Jayhawk
4:00 – 5:30 p.m.         Session III
6:15 – 9:30 p.m.         Evening activities

Tuesday, October 13
8:00 – 9:00 a.m.         Buffet breakfast, Ballroom
9:00 – 10:30 a.m.        Plenary session, Woodruff Auditorium: Testing as a Positive Force: Changing the Reality and the Perception
10:30 – 11:00 a.m.       Coffee break, Big 12/Jayhawk
11:00 a.m. – 12:30 p.m.  Session I
12:30 – 2:00 p.m.        Lunch, Ballroom
2:00 – 3:30 p.m.         Session II
3:30 – 4:00 p.m.         Coffee break, Big 12/Jayhawk
4:00 – 5:30 p.m.         Session III
5:45 – 10:45 p.m.        Evening activities
Wednesday, October 14
8:00 – 9:00 a.m.         Buffet breakfast, Ballroom
9:00 – 10:30 a.m.        Session I
10:30 – 11:00 a.m.       Coffee break, Big 12/Jayhawk
11:00 a.m. – 12:30 p.m.  Session II
12:30 – 2:00 p.m.        Lunch, Ballroom
2:00 – 3:30 p.m.         Session III
3:30 – 4:00 p.m.         Coffee break, Big 12/Jayhawk
4:00 – 5:00 p.m.         Session IV
5:05 – 6:20 p.m.         IAEA business meeting, Pine

GALA DINNER
7:00 – 7:30 p.m.         Cocktails and hors d’oeuvres, Ballroom
7:30 – 9:00 p.m.         Gala dinner
9:00 – 10:15 p.m.        Performance by Quixotic, Woodruff Auditorium
10:15 – 10:45 p.m.       Refreshments

Thursday, October 15
8:00 – 9:00 a.m.         Buffet breakfast, Ballroom
9:00 – 10:30 a.m.        Keynote: Hans Rosling, Woodruff Auditorium
10:30 – 11:00 a.m.       Coffee break, Big 12/Jayhawk
11:00 a.m. – noon        Closing Ceremony, Woodruff Auditorium
Noon – 1:30 p.m.         Lunch, Ballroom
1:30 – 4:30 p.m.         Post-conference workshop I, Kansas Room: Test development in a learning maps environment (CETE)

Friday, October 16
9:00 a.m. – noon         Post-conference workshop II, Jayhawk Room: Use of learning maps to support formative assessment in mathematics (CETE)
FULL AGENDA
MONDAY, OCTOBER 12: SESSION I (11:15 A.M. – 12:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Validity and Standards, Kansas Room
CHAIR: WILLIAM SKORUPSKI
PAPER 1: Validation of the Standard of Hong Kong Diploma of Secondary Education Examinations (HKDSE)
Guanzhong Luo, Hong Kong Examinations and Assessment Authority
Computer-based Testing and Validity, Malott Room
CHAIR: GAIL TIEMANN
PAPER 1: Teachers’ Perception and Understanding of E-testing as a Means of Enhancing the Validity of Assessment of Students’ Academic Achievement
Prof. Uwakwe O. Igbokwe, Abia State University, Uturu, Nigeria
PAPER 2: Using Computer Based Test as an Assessment Tool to Improve Validity in E-learning: A Study of National Open University of Nigeria
Nwamaka Patricia Ibemen, Ph.D., National Open University of Nigeria
Technology and its Impact, Centennial Room
CHAIR: JONATHAN TEMPLIN
PAPER 1: Staff Perception of the Adoption of ICT for the Management of Data on Examination Malpractice
Mrs. Olayinka Omoladun Ajibade, Deputy Registrar, West African Examinations Council, Nigeria
PAPER 2: Transitioning to E-marking to Maximise Operational Efficiencies: A Caribbean Examinations Council Case Study
Jonathan Hale, RM Results, Milton, Oxfordshire, United Kingdom
MONDAY, OCTOBER 12: SESSION II (2:00 – 3:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Validity Evidence at the Item Level, Kansas Room
CHAIR: STUART SHAW
PAPER 1: Judging the Quality of National Learning Assessments in Trinidad and Tobago: Exploring Item and Test Validity
Jerome De Lisle, The University of the West Indies at St. Augustine, Trinidad and Tobago
PAPER 2: The Evolution of Validity and Modern Psychometrics: Do We Need to Revisit Item Validity?
Charles Secolsky, Mississippi Department of Education

Frameworks and Models to Help Improve Validity, Malott Room
CHAIR: JAY WIEGERS
PAPER 1: From Raw Score to a Multi-Faceted Rasch Modeling Approach: Raters, Ratings and Scales
Che Yee Lye, School of Education, University of Adelaide, Australia
PAPER 2: Evaluating the Impact of the Bahrain National Examinations
Basma Alsadeq, National Authority for Qualifications and Quality Assurance of Education and Training
PAPER 3: A Framework for Providing Evidence-Based Validity and Reliability of the Unified Tertiary Matriculation Examination (UTME)
Ann Momoh, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria

Validity in a Formative Environment, Centennial Room
CHAIR: LUCKY DIAUNYANE
PAPER 1: “Teachers in the Dark.” Why Do They Not Find Coherence Between Large-Scale Assessment Programs and Classroom-Based Assessment?
Serafina Pastore, Ph.D., University of Bari Aldo Moro, Italy
PAPER 2: What Teachers Know about Validity of Classroom Tests: Evidence from a University in Nigeria
Christiana Ugodulunwa, University of Jos, Nigeria

Validity of Inferences from Computer-Based Tests, English Room
CHAIR: MOSES TJIRARE
PAPER 1: Evaluation of Validity of Computer-Based Test Items in National Open University of Nigeria
Charity Akuadi Okonkwo, National Open University of Nigeria
PAPER 2: Trialing Adaptive Comparative Judgment with Long Essay Type Responses
Antony Furlong, International Baccalaureate
PAPER 3: Assessing ICT Literacy via Computer
Ray Philpot, Australian Council for Educational Research (ACER)
MONDAY, OCTOBER 12: SESSION III (4:00 – 5:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Broad Questions Related to Validity, Kansas Room
CHAIR: JO-ANNE BAIRD
PAPER 1: Utility vis-à-vis Validity
Charlie DePascale, Center for Assessment, Dover, New Hampshire, United States
PAPER 2: Is it Valid to Treat Assessment Grades from Different Subjects the Same?
Dennis Opposs, Office of Qualifications and Examinations Regulation (Ofqual)

Using Technology to Improve Validity, Malott Room
CHAIR: RUSSELL SWINBURNE ROMINE
PAPER 1: The Impact of Technology on the Validity of Assessment in Large Scale Public Examinations: The West African Examinations Council Experience
C.M. Eguridu, West African Examinations Council (WAEC), Nigeria
PAPER 2: Towards Improving Validity of Examinations in Developing Economies: Application of Technology in Kenya’s Context
Kennedy Ondara Okemwa, The Kenya National Examinations Council
PAPER 3: How Do You Use Technology to Improve Flexibility in Assessment and Maintain High Standards of Validity?
Steve Harrington, RM Results, Milton, Oxfordshire, United Kingdom

Philosophical Issues Regarding Validity, Centennial Room
CHAIR: ELIZABETH OBADE
PAPER 1: What Makes for a Sound Validity Argument? Exploring Criteria for Evaluating the Strength of Validation Evidence
Stuart Shaw, Cambridge International Examinations, Cambridge Assessment
PAPER 2: Regulating for Validity: Laying the Foundation
Paul Newton, Ph.D., Office of Qualifications and Examinations Regulation (Ofqual)

Validity: From Planning to Implementation, English Room
CHAIR: NNEKA UMEZULIKE
PAPER 1: On Assessment Literacy
Damian Betebenner, Center for Assessment
PAPER 2: A Preoperative Index for Construct Validity
Ali Baykal, Bahcesehir University, Istanbul, Turkey
TUESDAY, OCTOBER 13: SESSION I (11:00 A.M. – 12:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Interactions Between Testing Programs and Cultural and Political Environments, Kansas Room
CHAIR: NGONDI KAMATUKA
PAPER 1: Conceptualization and Implementation of Continuous Assessment in Tanzania: Fit for the Purpose?
Joyce L. Ndalichako, Aga Khan University, Institute for Educational Development, East Africa
PAPER 2: Legal Precepts and Validity of Assessment Outcomes: A Case Study of the Joint Admissions and Matriculation Board (JAMB), Nigeria
Barr. Edward Mojiboye, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
PAPER 3: Opt Out: An Analysis of Issues
Randy Bennett, Educational Testing Service (ETS), Princeton, New Jersey, United States

Test Development Methods, Malott Room
CHAIR: HENK MOELANDS
PAPER 1: Improving Validity of Tests Through Improved Test Development Procedures
I.E. Anyanwu, Ph.D., Quality Assurance Dept., Minna, Niger State, Nigeria
PAPER 2: Measuring Change in the Unified Tertiary Matriculation Examination (UTME) Use of English through Pre- and Post-Test Analysis
Patrick Onyeneho, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
PAPER 3: Using Technology to Improve Validity of Test Items Development Procedures in E-assessment in NOUN
Charity Akuadi Okonkwo, National Open University of Nigeria

Validation Practices, Centennial Room
CHAIR: FIONA ANDERSON
PAPER 1: Trainer Practice Review to Validate the Certificate
Venera Mussarova, AEO Nazarbayev Intellectual Schools
PAPER 2: Validation of an Instrument for Assessing Users’ Perception about the Use of E-resources in South East University Libraries in Nigeria
Chukwudi Patrick Mensah, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
PAPER 3: University Matriculation Examination and Post-University Matriculation Examination Scores as Predictors of University Students’ Achievement in First Year University Degree Examinations in Nigeria
Ismail Junaidu, Nigerian Educational Research and Development Council (NERDC)
TUESDAY, OCTOBER 13: SESSION II (2:00 – 3:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Predictive Evidence of Validity, Kansas Room
CHAIR: MATHEUS HANGO
PAPER 1: Predictive Validity of JAMB University Matriculation Examination and Post University Matriculation Examination Scores on Final Grade Point Average in Universities in South East Nigeria
Ngozi Akanwa, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
PAPER 2: Impact of Resource Deficiency on Predictive Validity: A Study of Factors Affecting Validity of Results in Selected Universities in Malawi
Madalitso Mukiwa, Exploits University, Lilongwe, Malawi

Improving Test Development, Malott Room
CHAIR: SARAH MAUGHAN
PAPER 1: Assessment of Nursing and Midwifery Students in Uganda: The UNMEB Experience
Prof. Wilton Kezala, Uganda Nurses and Midwives Examinations Board (UNMEB)
PAPER 2: The Factor Structure of the Giftedness Assessment Instrument (GAI) as an Identification Measure of Giftedness
Chidimma Adamma Anya, Federal University Gusau, Zamfara State, Nigeria
PAPER 3: Ensuring the Validity of Outcomes of Technical Skill Assessment in Technology Education Institutions in Nigeria
Prof. Nkechi Patricia-Mary Esomonu, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria
Science Assessment, Centennial Room
CHAIR: HELEN SIDIROPOULOS
PAPER 1: Validity Issues in the Reform of Practical Science Assessment: An English Case Study
Neil Wade, OCR
PAPER 2: Assessment of Valid Science Practical Skills for Nigerian Secondary Schools: Teachers’ Practices and Militating Factors
Omaze Afemikhe, University of Benin, Benin City, Nigeria
PAPER 3: Development and Validation of Biology Achievement Test (BAT) for Assessment of Students in Enugu State
Collens Ikechukwu Odo, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
TUESDAY, OCTOBER 13: SESSION III (4:00 – 5:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Content Evidence of Validity, Kansas Room
CHAIR: RAY PHILPOT
PAPER 1: An Evaluation of the Content Validity of the Test Items for Assessing Students’ Achievement in General Study Courses in National Open University of Nigeria
Prof. Uche Mercy Okonkwo, National Open University of Nigeria
PAPER 2: Content Validation and Rubric Development of Language Tools Tested in Multi-State Multi-Language Project: Challenges Faced
Gayatri Vaidya, Large Scale Assessments, Educational Initiatives, Ahmedabad, Gujarat, India

Use of Item Analysis, Malott Room
CHAIR: EMMANUEL SIBANDA
PAPER 1: Empirical Analysis of Item Difficulty Indices and Discrimination Power of Multiple Choice Biology Items
Adekunle Thomas Olutola, Ph.D., Federal University, Dutsin-Ma, Katsina State, Nigeria
PAPER 2: Improving Validity of Test Items through Credible and Robust Trial Testing Exercise: NECO Approach
Moses Oladipupo, National Examinations Council, Nigeria

Using Technology, Centennial Room
CHAIR: WILSON MUGISHA
PAPER 1: Integrating Technology into Mathematics Teachers’ Design and Use of Authentic Assessments
Colleen Parks, Calgary Girls’ School and University of Calgary
PAPER 2: Validity of Nigeria’s Unified Tertiary Matriculation Examination, Physics Computer-Based Tests: Threats and Opportunities
Patience Agommuoh, Ph.D., Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
PAPER 3: Comparison of the Scores of Electronic and Manual Markings in Agricultural Science and Biology Practical in the West African Senior School Certificate Examination in Nigeria
Dr. Iyi Uwadiae, West African Examinations Council, Accra, Ghana

Broader Implications of Validity, English Room
CHAIR: CATHY SCHULTZ
PAPER 1: Validation in the Assessment and Certification System of School Teachers’ Professional Development: Experience of Kazakhstan
Venera Mussarova, AEO Nazarbayev Intellectual Schools
PAPER 2: Evaluation of Sexuality Education Programme in the University of Ibadan: Implications for Test and Programme Validity
Francisca Chika Anyanwu, Ph.D., Department of Human Kinetics and Health Education, University of Ibadan, Nigeria
PAPER 3: Constructing an Interpretive Argument for the Secondary Entrance Assessment in the Republic of Trinidad and Tobago
Jerome De Lisle, The University of the West Indies at St. Augustine, Trinidad and Tobago
WEDNESDAY, OCTOBER 14: SESSION I (9:00 – 10:30 A.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Language and Validity, Kansas Room
CHAIR: ANNE OBERHOLZER
PAPER 1: Language Rich: Insights from Multilingual Schools
Stuart Shaw, Cambridge International Examinations, Cambridge Assessment
PAPER 2: Assessment of the Validity of Testing Results of Entrance Examination to the Magister and Doctoral Studies in Foreign Languages
Turakty Intymakov, National Testing Center
PAPER 3: Validity of the Assessment Approach in the Monitoring System for Languages
Anara Dyussenova, AEO Nazarbayev Intellectual Schools

Validity Evidence for Specific Examinations, Malott Room
CHAIR: FRANS KLEINTJES
PAPER 1: The Credibility of Institutional-Based Examinations in Nigerian Universities
Prof. Nkechi Patricia-Mary Esomonu, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria
PAPER 2: Ascertaining the Credibility of Assessment Instruments through the Application of Item Response Theory: Perspective on the 2015 UTME Physics Test
Francis R. Ojo, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria

Improving Test Development to Improve Validity, Centennial Room
CHAIR: PAM MUNRO-SMITH
PAPER 1: Improving Test Development Procedures to Improve Validity: A Look at the Kenya National Examinations Council Test Development Procedures
Edith Leah Ngugi, The Kenya National Examinations Council
PAPER 2: Evaluation of Pen on Paper Examinations Development Procedures to Improve Validity in Open and Distance Education
Prof. P. E. Eya, National Open University of Nigeria
PAPER 3: Can We Build on What We Report? Development of Valid Reporting in a Student Monitoring System for Mathematics
Rustam Abilov, AEO Nazarbayev Intellectual Schools

Validity and the Classroom, English Room
CHAIR: KENNEDY OKEMWA
PAPER 1: Validity in the Teaching-Learning Process: A Call for Curriculum Reforms in Nigeria
Prof. Nneka Umezulike, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
PAPER 2: Nigerian Teachers’ Utilization of Test Construction Procedures for Validity Improvement of Achievement Tests
Omaze Afemikhe, University of Benin, Benin City, Nigeria
WEDNESDAY, OCTOBER 14: SESSION II (11:00 A.M. – 12:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Test Development Practices, Kansas Room
CHAIR: BRUCE FREY
PAPER 1: Test Quality Assurance: The Effectiveness of Psychometric Controls
Francis R. Ojo, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
PAPER 2: Teachers’ Perceived Influence of Team Teaching Method on Students’ Performance in Mathematics in Secondary Schools in Ikeduru LGA, Imo State
Ada Ike Eucharia, Abia State University, Uturu, Nigeria
Presented by Dr. Agomuo, Nigeria
PAPER 3: Best Practice in Handling of Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data
Patrick Onyeneho, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria

Teacher Perceptions and Attitudes and Validity, Malott Room
CHAIR: DENNIS OPPOSS
PAPER 1: What Might Teachers’ Conceptions of Assessment Mean for Validity in High Stakes, School-Based Assessment?
Cathy Schultz, South Australian Certificate of Education (SACE) Board of South Australia
PAPER 2: Survey of Teachers’ Attitude to the Validity of Instruments Used for Continuous Assessment of Basic Education in North Central Geo-Political Zone of Nigeria
Prof. Charles M. Anikweze, Nasarawa State University, Keffi, Nigeria

Factor Structure, Centennial Room
CHAIR: DAVID MENSAH
PAPER 1: Factorial Validation of an Academic Environment Scale for Research Methods Education Students in Jos, Nigeria
Christiana Ugodulunwa, University of Jos, Nigeria
PAPER 2: Using Itemised Data Capture Reports as an Additional Lens in Evaluating the Use of Cognitive Taxonomies in Determining the Difficulty of Physical Sciences Examinations
Dr. Helen Sidiropoulos, Independent Examinations Board, South Africa

Systemic Issues, English Room
CHAIR: PETER HERMANS
PAPER 1: Challenges to Systemic Validity in Examination Systems
Jo-Anne Baird, Oxford University
PAPER 2: Improving Test Development Procedures to Improve Validity among ECDE Trainers in Kenya
Elizabeth A. Obade, The Kenya National Examinations Council
WEDNESDAY, OCTOBER 14: SESSION III (2:00 – 3:30 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Score Reporting and Validity, Kansas Room
CHAIR: OLATUNDE AWORANTI
PAPER 1: Improving the Validity of Grading and Reporting Practices of Competency-Based Assessment (CBA) in the Ghanaian Vocational Education and Training (VET) System
Dr. Peter Boahin, National Board for Professional & Technician Examinations (NEBPTEX)
PAPER 2: Analytical Report as a Tool in Ensuring the Validity of Examinations
Nazgul Tuitina, AEO Nazarbayev Intellectual Schools
PAPER 3: Progress in the Validation of High Stakes Assessment: The Case of Selection Test for Nazarbayev Intellectual Schools in Kazakhstan
Zamira Rakhymbayeva, AEO Nazarbayev Intellectual Schools

Teachers and Testing, Malott Room
CHAIR: CHIDIMMA ANYA
PAPER 1: The Role of Testing in Assessment of Teachers’ Professional Skills
Saltanat Abdildina, National Testing Center of the Ministry of Education and Science of the Republic of Kazakhstan
PAPER 2: Validity Concerns in Assessment and the Competence of the Classroom Teacher: An Overview of the Uganda Classroom Teacher’s Competence
Dan Kyagaba, Uganda National Examinations Board
PAPER 3: An Evaluation of the Awareness of the Need to Ensure Validity in the Continuous Assessment Component of Examination by Some Lecturers of Kaduna Polytechnic
Martha Ada Onjewu, Kaduna Polytechnic

Application of Item Response Models, Centennial Room
CHAIR: CHE YEE LYE
PAPER 1: Towards Application of Item Response Theory to Data from India’s Graduate Aptitude Test in Engineering (GATE)
Devlina Chatterjee, Indian Institute of Technology, Kanpur, India
PAPER 2: Evaluating the Power of Xcalibre and ConQuest Software in Item Parameter Recovery: A Comparative Analysis
Omokunmi Popoola, Ph.D., Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
PAPER 3: An Application of Rasch Model to Illustrate the Validity of Professional Teacher Test from Different Perspectives
Dr. Muhammoud, National Center for Professional Testing, Kingdom of Saudi Arabia

Improving Test Development, English Room
CHAIR: JENNY FLINDERS
PAPER 1: Examining the Utility of the Music Student Motivation Scale (MSMS) in Higher Education using the Rasch Rating Scale Model
Pey Shin Ooi, University of Adelaide, Australia
PAPER 2: Differential Item Functioning of 2010 Junior Secondary School Certificate Mathematics Examination in Southern Educational Zone of Cross River State, Nigeria
Beatrice Ndifon, Calabar, Cross River State, Nigeria
WEDNESDAY, OCTOBER 14: SESSION IV (4:00 – 5:00 P.M.) //////////////////////////////////////////////////////////////////////////////////////////////////
Validity Theory and Practice, Kansas Room
CHAIR: JOYCE NDALICHAKO
PAPER 1: Critiquing Kane’s Argument-Based Approach to Validation
Stuart Shaw, Cambridge International Examinations, Cambridge Assessment
PAPER 2: Validity in Practice: What Is it Realistic to Expect from Examination Bodies?
Sarah Maughan, AlphaPlus Consultancy

Opinions as Validity Evidence, Malott Room
CHAIR: EVELYN MWAPASA
PAPER 1: Dismantling Face Validity: Why the Concept Must Live On, but the Term Must Finally Die
Tzur Karelitz, National Institute for Testing and Evaluation
PAPER 2: Nazarbayev Intellectual Schools Graduates’ Opinions as a Factor of Validity in the Development of Educational Processes and Assessment of Learning Outcomes
Olga Mozhayeva, AEO Nazarbayev Intellectual Schools

Computer-Based Test Delivery, Centennial Room
CHAIR: DAN KYAGABA
PAPER 1: Thin Client Technology, Innovation and Simultaneous Examination of Large Population in Multiple Locations in the National Open University of Nigeria
Ukoha Igwe, National Open University of Nigeria
ABSTRACTS
MONDAY, OCTOBER 12: SESSION I //////////////////////////////////////////////////////////////////////////////////////////////////
»»Validation of the Standard of Hong Kong Diploma of Secondary Education Examinations (HKDSE)
Guanzhong Luo, Director of Assessment Technology and Research, Hong Kong Examinations and Assessment Authority

The Hong Kong Diploma of Secondary Education (HKDSE) Examination was implemented for the first time in 2012. The results of the HKDSE are reported using the standards-referencing approach. Level descriptors, sample scripts, and cut scores of different levels are the major components of the standards-referencing system. Maintaining the standard of the levels of the subjects is a critical task in upholding the quality of the HKDSE. This paper first describes the approach to standard maintenance for the core and elective subjects of the HKDSE. It then points out that maintaining the standards of the core subjects (Chinese Language, English Language, Mathematics, and Liberal Studies) sets the baseline for the stability of the standards of the elective subjects. The validation of the standards in the core subjects was conducted with data collected from 2012 to 2014. The results of the validation are reported in this paper, with the conclusion that the standards of the HKDSE subjects have been stable across years since 2012 in terms of the cut logits produced using the Rasch model.

»»Teachers’ Perception and Understanding of E-testing as a Means of Enhancing the Validity of Assessment of Students’ Academic Achievement
Prof. Uwakwe O. Igbokwe, Abia State University, Uturu, Nigeria; Catherine C. Igbokwe

E-testing has lately come to be accepted as a means of assessing academic achievement, especially in public examinations and for the purpose of determining those to be admitted into tertiary institutions in Nigeria and some other African countries. Higher institutions have also used modified forms of these tests in the semester examinations of their programmes. The use of e-testing may be even more useful in the classroom for evaluating teaching and learning. The question, however, may be asked: How much knowledge do teachers have about e-testing? Do they understand its usefulness to them in enhancing the validity of their assessment of students’ academic achievement as well as teaching and learning? This study, therefore, intends to determine whether teachers: i. understand what e-testing is; ii. understand the terminologies associated with e-testing; iii. perceive the advantages associated with e-testing; iv. perceive the need for maintaining standards of quality in assessment; v. understand the areas of assessment amenable to e-testing; and vi. perceive e-testing as useful in enhancing the validity of classroom assessment. The survey design will be adopted in this study, and a researcher-constructed questionnaire will be used. Data will be analyzed using means, standard deviations, ranking, and the z-statistic.

»»Using Computer Based Test as an Assessment Tool to Improve Validity in E-Learning: A Study of National Open University of Nigeria
Nwamaka Patricia Ibemen, Ph.D., National Open University of Nigeria

National Open University of Nigeria (NOUN) has leveraged advances in technology to improve students’ success rates and e-learning ability. The Computer Based Test (CBT), as an assessment tool, helps measure students’ intelligence and memory retention. In the case study (the NOUN e-examination), this assessment tool is used to gather systematic evidence about students in order to judge their level of reasoning and their application of the knowledge acquired in school, and how relevant it is or will be to the courses they have chosen. NOUN uses this evidence to judge whether students have learned what they are expected to learn, by securing valid and reliable information through the CBT examination assessment method as against the pen-on-paper assessment method. The assessment methods chosen are in practical, oral, and electronic formats which cover various students’ courses and an expanded world view. Tests were administered to students using two methods (pen on paper and computer-based test). Student scores were then analyzed to determine whether student performance was affected by the test designs described above. Group means were calculated for both tests. Because the sample took both tests on the same day, there was no sample deviation. The study observed that students who used the electronic format (CBT) had a higher success rate than those who wrote pen-on-paper assessment tests, and that a number of factors were responsible for this. We conclude that the Computer Based Test enhances student performance in examinations and recommend that most e-learning examinations be conducted using computer-based tests.
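The comparison described in this abstract is a paired design: the same students took both the paper and computer-based forms on the same day, so each student serves as his or her own control. As a minimal sketch of such an analysis in Python, assuming hypothetical score arrays (the abstract does not reproduce its data), the group means and a paired-samples t-test could be computed as follows:

import numpy as np
from scipy import stats

# Hypothetical scores for ten students who took both test formats
cbt_scores   = np.array([68, 74, 71, 80, 65, 77, 73, 69, 82, 75])
paper_scores = np.array([63, 70, 69, 74, 60, 71, 70, 66, 78, 72])

# Group means for each administration mode
print("CBT mean:  ", cbt_scores.mean())
print("Paper mean:", paper_scores.mean())

# A paired t-test is appropriate because the same students took both
# tests on the same day (no sampling difference between the groups)
t_stat, p_value = stats.ttest_rel(cbt_scores, paper_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

The paired test removes between-group sampling variation, which is exactly the property the abstract relies on when it notes that "there was no sample deviation."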
»»Staff Perception of the Adoption of ICT for the Management of Data on Examination Malpractice
Mrs. Olayinka Omoladun Ajibade, Deputy Registrar, West African Examinations Council, Nigeria; Ayodele Peters Oyejide

Examination boards often prescribe severe sanctions for engagement in examination malpractice because such actions undermine the integrity and validity of assessment. Errors in the data on examination malpractice would therefore have serious consequences for any candidate inadvertently listed among those who committed breaches, and the examination board would be exposed to litigation and loss of credibility. The West African Examinations Council (WAEC) conducts the West African Senior School Certificate Examination (WASSCE) in the English-speaking West African states and attaches great importance to the proper handling of data on malpractice. In Nigeria, the high prevalence of malpractice in the WASSCE, coupled with the attendant challenges of manually processing the voluminous data relating to the cases detected, prompted WAEC (Nigeria) from 2011 onwards to adopt ICT-driven approaches to handling malpractice cases in order to improve data quality and processing efficiency. Using staff perception as a quality and service improvement tool, this paper sought to provide valid and reliable information on the impact of the initiative and on what needs to be changed in order to fully achieve its objectives, and to enable staff to have a clearer and shared vision of it. A questionnaire was designed, developed, and used in conjunction with focus group discussions to collect data for the study. The respondents were 25 staff of the department that handles reports on malpractice and 275 staff of other departments who participated in examination administration and detected malpractice cases. The data collected were analyzed quantitatively and qualitatively. The findings are discussed and various recommendations made to enrich the system and achieve positive shifts in the opinions and attitudes of staff.

»»Transitioning to E-marking to Maximise Operational Efficiencies: A Caribbean Examinations Council Case Study
Jonathan Hale, RM Results, Milton, Oxfordshire, United Kingdom

The Caribbean Examinations Council (CXC) provides assessment services across 19 territories in the Caribbean. Until recently, the approach to marking was for examiners to work together in the same physical location. Relocating them for the marking period brought with it significant traveling and accommodation costs, which led CXC to investigate the adoption of onscreen marking to mitigate these expenses and operational challenges. In this session, RM Results, on behalf of CXC, will step through their approach to vendor selection and the pilot projects they conducted before making the decision to implement onscreen marking as an integral part of their assessment process. Delegates will gain an understanding, through CXC’s firsthand experience, of the cost savings they could realise as a result of e-marking, other key benefits from the implementation, and important points to consider when embarking on the journey to e-assessment. It is anticipated this session will be particularly instructive for other awarding organisations considering e-marking of paper exam scripts.

MONDAY, OCTOBER 12: SESSION II //////////////////////////////////////////////////////////////////////////////////////////////////
»»Judging the Quality of National Learning Assessments in Trinidad and Tobago: Exploring Item and Test Validity
Jerome De Lisle, The University of the West Indies at St. Augustine, Trinidad and Tobago; Peter Smith, Division of Educational Research & Evaluation, Trinidad and Tobago Ministry of Education

The validity of test-score interpretations is dependent upon the quality of the items in a test (Downing & Haladyna, 1997). This is logical because the test item is the building block of the test. In institutions responsible for test development in the global south, insufficient attention is paid to item-development processes. This might be because of a lack of resources and expertise or of sustained commitment to testing standards. The issue of resource stringency as a constraint on testing processes is especially acute in small island states in the Anglophone Caribbean (Bray, 1998). This study focuses upon evaluating the quality of items used in the national tests of Trinidad and Tobago for the period 2011 to 2014. Quality is judged using item and subscale statistics and parameters from both classical test theory (CTT) and item response theory (IRT). In Trinidad and Tobago, national tests function as national learning assessments administered to students in standards 1 and 3 in primary school. Data are used to monitor learning standards by school and district and to judge system progress. National tests are low stakes for students but medium to high stakes for schools and education districts, because data-use policies categorize schools and districts, which may necessitate an institutional response. The national tests consist of both dichotomous and polytomous constructed-response items organized into subscales called strands. Administration is an annual census with all students at ages 7-8 and 9-10 targeted. Item analysis data were generated using the Lertap, PARSCALE, and Xcalibre software programs. We relate judgments of item quality across administrations to changes in test-development structures and processes. We reflect on the challenge of improving test development in the context of ministries of education in small states.
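This abstract and the one that follows both lean on item statistics from CTT and IRT. As a quick reference, two standard textbook forms (not taken from either paper) illustrate the contrast. In CTT, a common discrimination index for a dichotomous item is the point-biserial correlation between item score and total score,

r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\sqrt{p(1-p)},

where \bar{X}_1 and \bar{X}_0 are the mean total scores of examinees who answered the item correctly and incorrectly, s_X is the standard deviation of total scores, and p is the proportion answering correctly. In IRT, the two-parameter logistic (2PL) model gives the probability of a correct response to item i as

P_i(\theta) = \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}},

with item discrimination a_i and difficulty b_i; the Rasch model referred to elsewhere in this program is the special case in which all a_i are fixed at 1.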
»»The Evolution of Validity and Modern Psychometrics: Do We Need to Revisit Item Validity?
Charles Secolsky, Mississippi Department of Education; William Buchanan; Walt Drane

Early validity studies were concerned with the discriminative capacity of items: were examinees with high total scores answering items correctly, and were examinees with low total scores answering items incorrectly? The statistic used was the biserial correlation. The school headed by Lindquist questioned item validity studied this way, since a subjective, human element was needed to offset the possibility that ambiguous or structurally deficient items could be missed more often by higher-ability examinees. Cronbach’s contribution to validity theory, that it is the interpretation of test scores rather than the test per se that is validated, has evolved into Kane’s interpretive argument. Item invalidity or ambiguity was no longer the emphasis, since the focus of validity was on the interpretation. Here we demonstrate that interpretation of item-level data that incorporates subjective components has far-reaching implications for seriously revisiting item validity. Items may be ambiguous or biased according to item-level statistics, which flag items for review by experts and examinees, or the items may lack technical quality. Ebel stated that experts and examinees have different ideas about what an item measures, whereas Cronbach stated that a few faulty items did not affect a test’s validity. These two opposing views lie at the heart of the argument for the existence of item validity. With high-stakes testing and increased emphasis on cut-off scores, a focus on items again becomes critical. In addition, advances in psychometrics since the Ebel-Cronbach division (e.g., the 3PL’s c-parameter and IRT approaches to DIF analysis) are now comparable to classical approaches for understanding the problem. Eighth-grade Mississippi Competency Test item-level data in Reading and Mathematics are analyzed classically and with IRT to determine whether there are differences in the detection of faulty items and whether faulty items can be scored without loss in interpreting scores using IRT approaches.

»»From Raw Score to a Multi-Faceted Rasch Modeling Approach: Raters, Ratings and Scales
Che Yee Lye, School of Education, The University of Adelaide, Australia

The traditional educational assessment framework based on Classical Test Theory (CTT) is commonly used to measure students’ performance, and these assessment results are also widely used for the purpose of making selections or comparisons. A critical question, however, is how the fairness and objectivity of an assessment that seeks to measure the same content can be ensured when it is administered to different cohorts of students and rated by different raters. This study therefore explored the capability of a multi-faceted Rasch modelling approach, based on Item Response Theory (IRT), to examine rater effects and provide estimates of students’ abilities after the calibration of raters, test items, and student competence using ConQuest 4. The participants were 41 Senior 1 English language teachers from the Malaysian Independent Chinese Secondary Schools (MICSS). They were requested to rate the English language performance of their students (N = 1155) based on the Assessment Checklist. A total of 139 students were randomly selected to assess the inter-rater reliability among the raters. When compared to the results from the raw scores, the multi-faceted Rasch analysis identified different severe and lenient raters. The multi-faceted Rasch analysis also showed the capability of detecting rater errors such as rater severity, the halo effect, central tendency, restriction of range, and inter-rater disagreement. Students’ abilities in the English language assessments after the calibration were also obtained through the multi-faceted Rasch modelling approach. As evidenced by the findings of this study, the multi-faceted Rasch modelling approach has the capability to address rater effects and provide estimates of students’ abilities by bringing raters, test items, and student competence onto a common scale, independent of one another. This work could potentially enhance the validity of score development and assist in refining test instruments.

»»Evaluating the Impact of the Bahrain National Examinations
Basma Alsadeq, National Authority for Qualifications and Quality Assurance of Education and Training; Wafa Al-Yaqoobi

The Directorate of National Examinations (DNE) is part of the National Authority for Qualifications and Quality Assurance of Education and Training (QQA), an independent organization established in 2008. The DNE is responsible for conducting National Examinations to assess the performance levels of students against the national curriculum at key education stages (Grades 3, 6, 9) and at international levels for Grade 12 in the Kingdom of Bahrain. The first administration of the grade 3 and grade 6 National Examinations took place in 2009. During 2014 the DNE delivered the sixth cycle of the grade 3 and grade 6 National Examinations, the fifth cycle of the grade 9 National Examinations, and the second cycle of the grade 12 examinations. As part of the commitment of the QQA to the ongoing review of the impact and quality of its work, collaborative research with its international partner Cambridge International Examinations (CIE) has been undertaken in order to investigate the impact of the National Examinations on the educational landscape in the Kingdom of Bahrain. QQA has identified the need to monitor the effects of the National Examinations on a diverse range of stakeholders by eliciting their views, perspectives, and attitudes.
The impact of an assessment will always have some bearing on teaching and learning, as well as on other stakeholders outside the classroom. It is important, therefore, that any impact study takes account of the perceptions of its stakeholders, because their attitudes towards the tests may be relevant to the tests’ validity. This paper reports on the findings from a longitudinal and iterative impact research program designed to elicit data from a range of educational beneficiaries on their attitudes, experiences, and perceptions of the National Examinations.
»»A Framework for Providing Evidence-Based Validity and Reliability of the Unified Tertiary Matriculation Examination (UTME) Ann Momoh, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Patrick Onyeneho; Barr. Aminat Egberongbe Shafiyi
As an educational assessment body, the Joint Admissions and Matriculation Board (JAMB) from time to time strategizes on measures geared towards meeting the demands of validity. Being a high-stakes examination for the selection of suitably qualified candidates for admission into tertiary institutions in Nigeria, the Unified Tertiary Matriculation Examination (UTME) is strictly monitored to ensure compliance with test quality-assurance measures. The approach to validity discussed in this paper takes as central the appropriateness of the inferences made on the basis of assessment results. The paper describes the development of a systematic approach to the collection of evidence that can support claims about validity for general qualifications. An operational framework adapted from Kane (2006) was developed. The framework involves a list of inferences to be justified, as indicated by a number of linked validation questions. For each question, various data could be gathered to provide ‘evidence for validity’ across five important sources: test content, examinee response processes, internal test structure, external relationships, and consequences of test use, as well as identifying any ‘threats to validity.’ Issues bordering on scoring and the reliability of assessment results are also discussed. The paper describes the development of the proposed framework and the types of methods to be used to gather relevant evidence. Finally, it explores how distinct sources of evidence can be integrated into an overall validity argument.
»»“Teachers in the Dark”: Why Do They Not Find Coherence Between Large-Scale Assessment Programs and Classroom-Based Assessment? Serafina Pastore, Ph.D., University of Bari Aldo Moro, Italy
Following the global education reform trend, the role of large-scale assessment in Italian public education has grown with the implementation of assessment and accountability requirements. The accountability trend has impacted heavily on the educational and scholastic systems, putting remarkable pressure on teachers, who are now more conscious of the need to use data and evidence for decision making in the classroom context (Hanushek & Woessmann, 2011; Almond, 2010; Darling-Hammond, 2010). Several research studies have already highlighted that teachers do not practice data-driven decision making, or use data only in limited ways (Miller, Linn, & Gronlund, 2009; Wayman, 2005; Herman & Gribbons, 2001).
Even though there is great attention on national and international large-scale assessment systems, how teachers use data-driven decision making in a responsive and effective way remains a neglected research field: “It’s clear, from the expanding literature base on data literacy, that many educators do not have the data literacy skills necessary to use data effectively” (Mandinach & Gummer, 2012, p. 3). Although the scientific debate about literacy assessment and standards in schooling has grown, Italian teachers continue to perceive large-scale assessment as an improper form of assessment. Neither international (OCSE-PISA; IEA-PIRLS and IEA-TIMSS) nor national (INVALSI) large-scale assessment results are used in a functional way to support the school system and teaching practice. In view of the above, the current study examines what perceptions primary and middle school teachers have about different assessment goals and about the relationship between the testing programs and their classroom assessment. The study is qualitative in nature: semi-structured interviews with 15 teachers were used, and ATLAS.ti software was used to analyse the collected data. Results indicate a great level of confusion about assessment; several suggestions are discussed for further improvements in the Italian school system.
»»What Teachers Know about Validity of Classroom Tests: Evidence from a University in Nigeria Christiana Ugodulunwa, University of Jos, Nigeria; Sayita Wakjissa
There is an increased desire for school effectiveness and improvement. The central role of assessment in the school system for improving teaching and learning therefore demands that classroom tests be valid and reliable measures of students’ real knowledge and skills, not of test-wiseness or test-taking abilities. Teachers in Nigeria are required to adopt a continuous assessment mode of evaluation, and the most commonly used techniques are written tests, performance tests, and projects. These assessment techniques are expected to be valid
and reliable measures of abilities. The extent to which teachers are able to develop and use valid assessment instruments depends on how knowledgeable they are about the validity of classroom tests. Although literature abounds on the types of validity and their use in determining the quality of assessment tools, there is a dearth of empirical evidence on how knowledgeable teachers are about what validity means and how to establish it in the test development process. This study therefore sought to find out teachers’ knowledge of the content and predictive validity of classroom tests. A sample of 100 teachers was selected from five departments in the Faculty of Education of a university in Nigeria. A 30-item Teachers’ Validity Knowledge Questionnaire (TVK-Q) was developed and validated for data collection. The data are being analysed using means, standard deviations, t-tests, and ANOVA. The results will profile teachers’ knowledge of issues relating to validity and identify significant differences in their responses due to gender, rank, and academic discipline. Their opinions on areas where they need capacity building will also be identified. The implications of the results will be discussed and recommendations made for capacity building.
»»Evaluation of Validity of Computer-Based Test Items in National Open University of Nigeria Charity Akuadi Okonkwo, National Open University of Nigeria
Multiple-choice items (MCI) are among the most commonly used computer-based assessment (CBA) instruments for assessing students in educational settings, especially in open and distance learning (ODL) with large class sizes. The MCI making up an assessment instrument need to be examined for quality, which depends on their difficulty index (DIF 1), discrimination index (DI), and distractor efficiency (DE), if they are to contribute meaningfully to the validity of students’ examination scores. Such quality characteristics are amenable to examination by item analysis. Hence, the objective of this study is to evaluate the quality of MCI used for CBA in the National Open University of Nigeria (NOUN) as formative assessment measures, employing an ex post facto research design. Two foundation courses in the University’s School of Education will be used for the study. The aim is to develop a pool of valid items by assessing the items’ DIF 1, DI, and DE, and to store, revise, or discard items based on the results obtained. In this cross-sectional study, 320 MCI taken in four (4) sets of CBA per semester per course in the 2012–2014 academic years are to be analysed. The data will be entered and analysed in MS Excel 2007; simple proportions, means, and standard deviations will be calculated, and an unpaired t-test will be applied. The results will identify items with “good to excellent” DIF 1 and DI, and will report mean DE and non-functional distractors (NFD). Mean DI will also be established, with items showing poor or negative DI indicating problems in how those items were framed. This study emphasizes the selection of quality MCI that truly assess levels of student learning and correctly differentiate students of different abilities in NOUN, thereby contributing to improving the validity of the test items.
»»Trialing Adaptive Comparative Judgment with Long Essay Type Responses Antony Furlong, International Baccalaureate; Matthew Glanville
A number of subjects within the International Baccalaureate (IB) Diploma Programme prove challenging to mark reliably.
The process of adaptive comparative judgement (ACJ) (Pollitt, 2012) potentially offers a radical solution to this challenge. Previous trials have reported reliability values of over 0.90 (e.g., Whitehouse & Pollitt, 2012); however, owing to the requirement for each piece of work to be viewed multiple times, concerns remain about whether the process is feasible within the short marking window in which the IB operates. A trial using 700 English A Literature scripts was undertaken by the IB assessment research and design team, supported by TAG Assessment. The analysis of the trial (Pollitt, 2015) suggested that each script needed to be viewed eight times on average, leading to an estimated total judging time of around 28 minutes per candidate. This compares unfavourably to an estimated average of 15 to 17 minutes to mark a script in the traditional sense; however, the comparison is not straightforward, owing to examiner attrition rates in a traditional marking session and the different quality control methods and available time periods for the two processes. This paper discusses the results of the trial, the issues the IB would need to address if this approach were adopted in a live examination session, and areas where further research is necessary.
»»Assessing ICT Literacy via Computer Ray Philpot, Australian Council for Educational Research (ACER)
ICT literacy is about using computer technology to access, manage, evaluate, and create information, and to communicate it appropriately. The last few years have seen dramatic changes in computer technology and how it is used, particularly by young people: think of tablets and smartphones, “apps,” cloud computing, wireless connectivity, social media, crowd-sourcing, and so on. Is there a valid ICT literacy construct that remains stable and relevant despite all these changes? This paper describes a computer-based test instrument that was recently developed and used to measure the ICT ability of 10,000 Australian students in Grades 6 and 10. The instrument consists of a suite of modules containing authentic simulated environments with which students interact online to demonstrate their ICT skills. The modules attempt to take into account the recent changes in technology. The instrument is contended to have a high degree of validity; the construction process, the contents of the modules, and an IRT analysis of the response data are discussed in support of this contention.
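The classical item statistics referred to in several abstracts in this session, such as the NOUN evaluation of computer-based test items above, are straightforward to compute. The sketch below is illustrative only: the response data are invented, and the 5% threshold for flagging a non-functional distractor is a common convention assumed here, not one stated by the authors.

```python
# Illustrative item analysis for one multiple-choice item (invented data).
# DIF 1 = proportion answering correctly; DE treats a distractor chosen by
# fewer than 5% of examinees as non-functional (an assumed convention).
responses = ["A", "C", "A", "B", "A", "D", "A", "A", "C", "A"]  # hypothetical
key = "A"

n = len(responses)
difficulty = sum(r == key for r in responses) / n  # DIF 1 (p-value)

distractors = {opt for opt in "ABCD" if opt != key}
nfd = [d for d in distractors
       if sum(r == d for r in responses) / n < 0.05]  # non-functional
de = 100 * (1 - len(nfd) / len(distractors))  # distractor efficiency, %

print(f"DIF 1 = {difficulty:.2f}, NFD = {nfd}, DE = {de:.0f}%")
```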
MONDAY, OCTOBER 12: SESSION III //////////////////////////////////////////////////////////////////////////////////////////////////
»»Utility Vis-à-Vis Validity Charlie DePascale, Center for Assessment, Dover, New Hampshire, United States; Damian Betebenner
As this year’s IAEA conference theme affirms, test validity is the most accepted framework used to investigate test quality. Just as tests have taken on more and more prominent social uses, the concept of validity and the enterprise of test validation have expanded to accommodate consideration of those uses: “valid for what purpose.” In our own work, we often find discussions of test validity extending to issues not traditionally included under the test validation framework. In particular, as tests have become essential components of high-stakes accountability systems, discussions about test validity often intersect with discussions of test utility (e.g., usefulness for improving instruction). In this paper we discuss test validation and its relationship to test utility for a given purpose or set of purposes. Specifically, we argue that utility is not a concept that fits well within the current treatment of “validity” and validation, but is better served when treated as a separate characteristic to be considered alongside validity. Although Messick (1993) laid out a broad framework for considering the utility of a test, current practice tends to limit the concept of utility to a relatively narrow consideration of intended and unintended consequences of the use and interpretation of test scores; and even this limited view of utility as consequential validity is not without controversy. The broad view of utility proposed by Messick includes appraising the proposed test use against alternative approaches and counterproposals. To motivate our discussion, we borrow concepts from the field of genomic testing and the ACCE model for evaluating genomic tests, which addresses ‘analytic validity,’ ‘clinical validity,’ ‘clinical utility,’ and ‘ethical, legal, and social implications.’ We argue that such a separation of concerns is well suited to understanding test quality and naturally extends the validity studies that often accompany test development.
»»Is it Valid to Treat Assessment Grades from Different Subjects the Same? Dennis Opposs, Office of Qualifications and Examinations Regulation (Ofqual)
Some say that it is meaningless to compare standards in different subjects. But what if those standards are actually different yet become a common currency, so that, for example, a grade A in mathematics is treated in the same way as a grade A in geography? Is that interpretation and use of the grades valid? In England, universities mostly treat the same A-level grade from different subjects as being of equal value for entrance purposes. That might dissuade students from choosing A-level subjects in which they perceive it to be hardest to achieve the highest grades, such as sciences and languages. It might lead universities to choose the wrong students for their courses. If it is having either or both of these impacts, action may be justified. As part of a wider consideration of comparability between subjects, Ofqual is trying to better understand how other jurisdictions deal with comparability between subjects in assessments used for university entrance. England is unusual in that pre-university students have an almost free choice of subjects and take few subjects in total, typically just three.
In jurisdictions where the curriculum is constrained, or where all those competing for the same university places take a very similar assessment, comparability between subjects may be of little interest. There are some jurisdictions with more curriculum choice in which an important feature of the university entrance system is statistical adjustment of subject outcomes, even though such adjustments may be controversial. With a view to informing current discussions in England, this paper explores how different jurisdictions make, or do not make, adjustments to grades across subjects. It is hoped that the session at the conference will provide an opportunity for those from different countries to share their knowledge and experiences in this area.
»»The Impact of Technology on the Validity of Assessment in Large Scale Public Examinations: The West African Examinations Council Experience C.M. Eguridu, West African Examinations Council (WAEC), Nigeria, Head of National Office; Dr. O.F. Dacosta
With a track record in public examinations spanning over six decades, the West African Examinations Council (WAEC) presents some of the challenges and innovations in the conduct of high-stakes examinations in Nigeria. The major challenges identified were standardization of assessment tools, issues with item banking and item selection, uniformity and fairness in scoring, effective supervision and invigilation of examinations, the prevalence of fake results and certificates, and the menace of examination malpractice. WAEC conducts large-scale achievement testing covering seventy-seven (77) subjects for an average of over two million (2,000,000) candidates annually. The paper describes various technologies adopted to continually improve the validity of the Council’s examinations in view of the stated challenges. These include computer solutions for online registration, incorporating an offline registration module for rural schools, online validation of candidates’ examination details, online result verification, adaptation of Item Response Theory
(IRT) for item analysis and e-item banking, remote e-marking of essay scripts using dongles and internet data cards, photo embossment and QR codes on certificates, and the Candidates’ Identity Verification, Attendance, Malpractice and Post Examination Management System (CIVAMPEMS). An analysis of the impact of these solutions shows a steady improvement in the reliability and validity of WAEC examinations. The deployment of these innovations, especially CIVAMPEMS, has engendered a steady reduction in the examination malpractice cases recorded in the various diets of the examination.
»»Towards Improving Validity of Examinations in Developing Economies: Application of Technology in Kenya’s Context Kennedy Ondara Okemwa, The Kenya National Examinations Council
Examinations at every level of education are central to gauging the suitability of a prospective candidate for the next level of the academic ladder or for other walks of life. While an examination plays a central role as a measurement tool with predictive validity, its content validity remains equally critical: for an examination to be valid it must not only measure what it is supposed to measure (content validity) but also yield scores that relate to the candidate’s performance in future placement. Validity is not a property of the tool itself, but of the interpretation or specific purpose of the assessment tool with particular settings and learners. The validity of examinations has attracted a lot of interest worldwide, and the concern now is how technology can be used to improve the validity of examination results. That is the main purpose of this paper: to examine how technology can be used to improve the validity of examinations in Kenya’s context. To address this objective, the paper reviews the following key areas: the validity of examinations; the use of technology in test development; automation of the printing, packing, centering, and dispatch of examinations; monitoring of the field administration of examinations using technology; e-marking of candidates’ scripts and electronic capture of candidates’ marks; and the processing and release of candidates’ examination results. A critical review of the literature is expected to form the basis for drawing conclusions on best practices in applying technology to improve examination validity. The paper is expected to inform further empirical research on the application of technology in the validation of examinations.
»»How Do You Use Technology to Improve Flexibility in Assessment and Maintain High Standards of Validity? Steve Harrington, RM Results, Milton, Oxfordshire, United Kingdom; Sarah Corcoran
This is the challenge that ACCA, the global body for professional accountants, faces on its journey to offer candidates worldwide a more flexible and authentic experience in their accountancy exams whilst maintaining the high standards of assessment that ACCA is known for throughout the accountancy community. To test the performance of candidates in a way that accurately reflects real-world working, it is essential for ACCA to move towards computer-based examinations (CBE) that better simulate the workplace. The inherent technological challenges are considerable and far-ranging. This is a key change programme for ACCA, the aim of which is to develop a robust, end-to-end e-assessment process, from developing assessments and sitting assessments through to e-marking and delivering fair and valid results.
ACCA’s strategy to deliver high-stakes assessments via a new CBE solution will provide a more authentic and valid assessment experience, for example by using spreadsheets as part of the assessment, as well as enabling greater flexibility around when candidates choose to sit their exams. This flexibility means that candidates will no longer all face the same assessment at the same time, as they currently do on paper. Achieving this fairly requires a robust test security model, which ACCA has now devised and will implement. Further implications for ACCA, supported by RM Results, are wide-ranging and include compressing the exam session by 40% whilst maintaining high standards of quality, reliability, and validity. A new psychometric model has also been constructed to deal with multiple exams delivered over a set period that contain both dichotomous items and constructed-response questions requiring expert marking. In this session we hear how ACCA’s e-assessment journey will transform candidates’ assessment experience by responding to the needs of students and employers for valid, rigorous, and yet flexible accountancy examinations fit for the 21st century.
»»What Makes for a Sound Validity Argument? Exploring Criteria for Evaluating the Strength of Validation Evidence Stuart Shaw, Cambridge International Examinations, Cambridge Assessment
The current version of the Standards for Educational and Psychological Testing (2014) adopts a practical stance with regard to approaches to validation and endorses, at least implicitly, an argument-based approach, which requires a clear statement of the proposed interpretations and uses of test scores as a starting point for any validation. The extent to which the Standards advocates an argument-based approach is
illustrated in the following passage: “A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses” (AERA, APA & NCME, 2014, p. 21) [emphasis added]. But what makes for a sound validity argument, one that offers a conclusion based on solid reasoning? And what counts as evidence of validity in practice? How much of what kind of empirical evidence and/or logical analysis is required in order to stake a claim to validity? These questions, more than any others, have framed debate on validity theory and validation practice over the past century. Evaluating the strength of validity arguments is fundamental to the process of staking a claim of validity: if the argument is judged to be sufficiently strong, then the interpretation of test scores for the intended uses can be considered valid and fair. The strength of a validity argument might be judged in relation to general criteria for establishing the strength of any informal argument. Whilst no universal criteria exist for conceptualising the adequacy or strength of an argument, a number of evaluative criteria have been proposed. This presentation considers ways in which the ‘soundness’ of a validation argument can be evaluated. Several evaluation criteria are first summarised and then discussed.
»»Regulating for Validity: Laying the Foundation Paul Newton, Ph.D., Office of Qualifications and Examinations Regulation (Ofqual)
In England, organisations which provide high-profile qualifications are regulated by an independent body, to help establish confidence in the standard of their qualifications and in the efficiency of the qualifications system overall. Recently, the Office of Qualifications and Examinations Regulation (Ofqual) announced a new approach to regulation, with validity at its heart. This change of approach was motivated by a recognition that, for regulations to be effective in upholding the quality and value of regulated qualifications, they need to be understood through the lens of validity. Although the transition to the new approach was more evolution than revolution, and although it seemed almost unquestionably the right thing to do, it still raised all sorts of challenging questions, related to: reaching consensus over a definition of validity and related concepts; the lack of a precise and comprehensive framework for classifying qualification purposes; the need to justify the new approach and to explain exactly what it would mean in practice; the tension between the need for regulatory (non-)compliance decisions to be clear-cut and the fact that overall evaluative judgements concerning (in)sufficient validity are typically far from clear-cut; very substantial differences between qualification types, both technical and social; systemic risks associated with viewing tens of thousands of qualifications through this new (or newly polished) lens; and so on. The presentation will explain how Ofqual has laid the foundation for its new approach to regulation by addressing challenges such as these.
»»On Assessment Literacy Damian Betebenner, Center for Assessment; Charlie DePascale, Center for Assessment; Luciana Conchado, University of Wisconsin, Milwaukee; Gretchen Anderson, University of Kansas
Broadly, the function of assessment reporting (i.e., score reporting) is to inform stakeholders of the outcomes of student assessments.
As the theme of this year’s IAEA conference indicates, score reporting is critical to the effort of enhancing both the (consequential) validity and the utility of assessments by transforming the data in the report into information that stakeholders can act upon. Unfortunately, score reports often fall short of this and are criticized for creating more confusion than clarity. In this session we argue that the creation of score reports is part of the larger effort of assessment literacy. To this end, we present a utility/validity-based framework that embeds score reporting within that effort. That is, the goal of reporting is not solely to communicate data but to make users (e.g., teachers) literate consumers of both the data and the valid uses to which the data can be applied. Accomplishing this is easier in theory than in practice. One of the largest impediments to communicating test details that some users find daunting is the traditional paper-based reporting of results. Paper-based reports often require placing both big ideas and minute details on the same page, quickly leading to user information overload. Recognizing this in our own work on creating modules on assessment literacy topics, we introduce an open-source, GitHub-based platform for the creation of web-based explorable explanations (http://explorableexplanations.com/) directed toward assessment literacy. The platform is built so that anyone with content-area expertise and the desire can build high-fidelity assessment literacy materials that are easily shared and modified by others. Examples presented will include explorable score reports, explorable modules on assessment topics (e.g., measurement error), and projects based on assessment literacy internship work.
»»A Preoperative Index for Construct Validity Ali Baykal, Bahcesehir University, Istanbul, Turkey
Judgmental procedures fall short of restoring the desirable attributes of a test after it has been administered; preventive strategies must replace such after-the-fact remediation. Perfect key reliability, for instance, can always be ensured during the construction of the test, before it is used.
Apparently, attributes such as inter-subject reliability, predictive validity, concurrent validity, etc. cannot be predicted before obtaining empirical data. So far as construct validity is concerned, however, there are some a priori aspects independent of the responses given by participants. The inclusion error (irrelevant impurities diffused into the items) can be detected and removed before subjects are exposed to the test; likewise, the exclusion error (misrepresented intent) can be identified and essential content supplied in advance. To describe the correspondence between the intent of the test maker and the effect as distinguished by the expert(s), this study introduces a numerical index based on Shannon’s concept of entropy. Items in the instrument are tallied into categories (taxonomical levels, sub-constructs, etc.) as intended by the test maker. The same set of items is then categorized as distinguished by the expert(s). Observed frequencies are cross-tabulated in a contingency table to compute entropy values. The preoperative construct validity of the instrument is defined as the uncertainty removed by the observed distribution over the total uncertainty in the distribution. The implications of the index will be demonstrated on a questionnaire named “Personal Predispositions Perceived (PPP).” PPP uses a 9-point Likert scale and contains 80 items representing all sub-constructs equally. The items were judged independently by 49 experts and refined through iterative use of the proposed procedure. The finalized items were administered online, in randomized sequence, to 2,149 educators. Benefits and shortcomings of the index for the improvement of questionnaires will be discussed, and criticisms will be collected in return.
TUESDAY, OCTOBER 13: SESSION I //////////////////////////////////////////////////////////////////////////////////////////////////
»»Conceptualization and Implementation of Continuous Assessment in Tanzania: Fit for the Purpose? Joyce L. Ndalichako, Aga Khan University, Institute for Educational Development, East Africa
Continuous assessment in Tanzania aims at ensuring that students’ learning is continually assessed and incorporated into the final grade attained at the end of schooling. This study explored the conceptualization and implementation of continuous assessment in secondary schools in Tanzania. A questionnaire developed by the researcher was used to collect data from a total of 4,160 secondary school teachers who participated in the marking of the Certificate for Secondary School Examinations in 2013. Findings revealed that traditional methods of assessment, such as tests, class exercises, homework, and quizzes, dominate the implementation of continuous assessment. Statistically significant differences were found in teachers’ frequency of use of assessment methods by the type of subject taught. In contrast, no statistically significant differences were found in the frequency of use of assessment methods by teacher qualification. Furthermore, there were no statistically significant differences in the methods of continuous assessment used by teachers by the number of students in a class, possibly because of the confounding effect of teachers’ workload: in some schools, teachers with a small number of students had a heavy teaching workload, outweighing the benefits of a small class for conducting effective assessment. The study concludes that the conceptualization of continuous assessment in Tanzania is limited mainly to the administration of tests that are not even constructed in the schools. This practice raises the question of whether the conceptualization and implementation of continuous assessment in secondary schools fulfills the purpose it was meant to serve. It is essential to re-conceptualize continuous assessment in line with assessment for learning, so that the implementation of continuous assessment in schools contributes to supporting students’ learning. Accordingly, sustained professional development for teachers is necessary to enable them to use assessment to improve learning outcomes.
»»Legal Precepts and Validity of Assessment Outcomes: A Case Study of the Joint Admissions and Matriculation Board (JAMB), Nigeria Barr. Edward Mojiboye, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Barr. Aminat Egberongbe Shafiyi
Experience over the years has shown that the higher the stakes involved in a public examination, the greater the risk of security breaches and the greater the pressure on the examination body to maintain security. Examination bodies, be they public bodies, professional bodies, or tertiary institutions, assure the quality of their examinations by applying punitive measures to perpetrators of examination malpractice to deter such unwholesome conduct. These include non-release or outright cancellation of examination results, blacklisting, expulsion of candidates, and the like, which have given rise to litigation against examination bodies by candidates who feel they do not deserve such sanctions, even when the evidence against them is overwhelming. The Joint Admissions and Matriculation Board, with the mandate to select only qualified candidates for Nigerian tertiary institutions, has had its own fair share of such litigation.
This paper highlights the mandate of JAMB, Nigerian Examination Malpractice Acts, precedents, and court judgements, citing some cases which have particularly challenged the Board. Cases with nationwide geographical spread spanning the last ten years were collated, and outstanding cases sampled. Inferences on the strengths and failures of the Board’s legal defense are highlighted to serve as a guide for other examination bodies encountering similar cases, especially in this era of enhanced ingenuity in examination breaches engendered by information technology. Recommendations on ways to counter the antics of candidates within the ambit of the law are offered in order to minimize the frequency and intensity of such legal tussles.
»»Opt Out: An Analysis of Issues Randy Bennett, Educational Testing Service (ETS), Princeton, New Jersey, United States
Media reports have recently given significant attention to the “opt-out” movement, an organized effort to refuse to take standardized tests, including the newly created Common Core State Assessments. Whereas the narrative often told in press accounts is one of a large grassroots effort led by parents concerned about large amounts of lost instructional time and about children made extremely anxious by the high stakes associated with state-mandated tests, the reality is considerably more complicated. This presentation will examine the opt-out movement in more depth in an attempt to better understand the dynamics behind it. A careful examination is critical because sensible solutions are unlikely to emerge without a more complete appreciation of the root causes. Several topics will be explored, including the attention given the movement by the media in proportion to the movement’s size; the social-class divide that appears to characterize proponents vs. opponents; the impetus given the movement by test uses that many in the measurement community have openly opposed (in particular, the use of student test performance for teacher evaluation); the amount of time actually spent on state-mandated tests and the actual nature of the stakes; and the role of those tests as contributors to negative, instead of positive, impact on teaching and learning.
»»Improving Validity of Tests Through Improved Test Development Procedures I.E. Anyanwu, Ph.D., Quality Assurance Dept., Minna, Niger State, Nigeria; F.I. Williams Onwuakpa, Ph.D.
Psychometricians around the world are of the view that the validity and reliability of test items are critical to quality assurance in testing. Of the two attributes of a good test, however, validity is the more important, because it concerns how well scores reflect true scores and the relevance and appropriateness of the test items. This paper presents definitions of test validity, its types, and its importance in the quality assurance of testing procedures. Effort is made to identify test development procedures that improve the validity of tests, among them providing clear instructions, avoiding difficult vocabulary, arranging items appropriately, and setting an appropriate test length. The paper opines that when these test development procedures are adopted, the validity of the test can be expected to improve.
»»Measuring Change in the Unified Tertiary Matriculation Examination (UTME) Use of English through Pre- and Post-Test Analysis Patrick Onyeneho, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Omokunmi Popoola; Chidinma Ifewulu
Administering different forms of the same test to different groups of examinees is common practice in computer-based tests (CBT), to guard against test item exposure. In CBT examinations, parallel test forms are administered to examinees with the aim of achieving fairness irrespective of which form an examinee takes. Drift occurs when the common items fail to perform identically, in terms of precision and difficulty level, for a target population across repeated use. Item parameter drift (IPD) is worrisome especially when items whose estimates exhibit differential behaviour across test administrations are used as anchors for test equating.
The purpose of this study is to examine the item parameters of the common items used as anchors across the four different test forms (C1, C2, D1, D2) used in the 2013 Unified Tertiary Matriculation Examination Use of English (UOE). The study employed a causal-comparative design. Twenty (20) items were repeated as common items across the four test forms of the UOE, using a random sample of 8,000 examinees’ responses. The items were calibrated using a 3-PL Item Response Theory model. Results revealed no significant differences in the item parameter estimates across the anchor items of the test forms. Only one item each in test forms C1 and D2 exhibited IPD, and this had no significant impact on the resulting ability scores of the examinees.
»»Using Technology to Improve Validity of Test Items Development Procedures in E-assessment in NOUN Charity Akuadi Okonkwo, National Open University of Nigeria
The use of technology in assessment is now well accepted by the National Open University of Nigeria (NOUN) community of assessment practitioners. The NOUN e-assessment is an end-to-end electronic assessment process in which computers are used to present test items and record responses. The test item formats consist of multiple-choice and fill-in-the-blank items, which are well suited to computerization. So far, NOUN’s use of technology in assessment has remained at this basic level. But, in order
to improve the validity of the test items, more attention needs to be focused on the use of technology in test item data analysis. This study, which adopted a descriptive research design and is theoretical in approach, explores options available to NOUN and other institutions using technology in assessment for validating item-development procedures, conducting item analysis using technology, and setting conditions for the selection of test items for banking. A standard item banking system not only provides a set of tools to facilitate the writing, review, editing, and selection of test items but also provides the automation, standardization, and scalability (Prometric, 2012) essential to developing and maintaining effective and responsive assessment, which invariably improves validity. The study is guided by three research questions tailored around three major issues, and the discussion centers on improving item development procedures to improve validity. The aim is to stir up the NOUN practitioner community’s interest in using technology for test item analysis, and to leapfrog the University’s practice in trialling test items, banking quality items, and reviewing, reconstructing, or deleting items as necessary. The study report is based on good practices, enabled by technology and open to practitioners, for improved item development, validation, and use for assessment purposes.
»»Trainer Practice Review to Validate the Certificate Venera Mussarova, AEO Nazarbayev Intellectual Schools; Saule Vildanova; Aigerim Aitbayeva
This article examines the experience of the Centre for Pedagogical Measurements, a branch of the autonomous educational organization Nazarbayev Intellectual Schools, in awarding the Trainer certificate within Kazakhstan’s in-service teacher training system and in validating that certification. Empirical data, various quality factors, and expert conclusions are analyzed. Interim research outcomes are presented on post-certification review of trainer practice as a tool for validating trainer qualifications, and the article sets out the steps and outcomes of the research.
»»Validation of an Instrument for Assessing Users’ Perception about the Use of E-resources in South East University Libraries in Nigeria Chukwudi Patrick Mensah, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
A vast amount of information is available on the internet in the form of e-resources; there is therefore a need to develop and validate instruments for assessing users’ perception of these e-resources, since users determine whether the resources obtained from libraries meet their academic needs. This study developed and validated an instrument for assessing users’ perception of the use of e-resources in university libraries in South East Nigeria. The data for the survey were collected through a checklist and questionnaires. The questionnaires were given to 300 students, and the instrument was validated by three experts in Library and Information Science. The results revealed that e-resources are paramount for students’ academic success but are underutilized by the students. The study also revealed that slow internet access, power outages, and the non-availability of e-resources relevant to students’ information needs, among other challenges, hinder the use of e-library resources.
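For expert-panel validations of the kind described in the abstract above, judgments are often quantified with a content validity index (CVI); this is a common convention offered here for readers, not necessarily the method the author used. A minimal sketch, with invented ratings from three experts:

```python
# Hypothetical illustration of a content validity index (CVI).
# ratings[e][i] = expert e's relevance rating of item i on a 1-4 scale;
# a rating of 3 or 4 counts as "relevant" (an assumed convention).
ratings = [
    [4, 3, 2, 4],  # expert 1 (invented data)
    [3, 4, 2, 4],  # expert 2
    [4, 4, 3, 3],  # expert 3
]

n_experts = len(ratings)
n_items = len(ratings[0])

# Item-level CVI: proportion of experts rating the item 3 or 4.
i_cvi = [sum(r[i] >= 3 for r in ratings) / n_experts for i in range(n_items)]
# Scale-level CVI: average of the item-level values.
s_cvi = sum(i_cvi) / n_items

print("I-CVI per item:", [round(v, 2) for v in i_cvi])  # [1.0, 1.0, 0.33, 1.0]
print("S-CVI (average):", round(s_cvi, 2))
```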
»»University Matriculation Examination and Post-University Matriculation Examination Scores as Predictors of University Students’ Achievement in First Year University Degree Examinations in Nigeria Ismail Junaidu, Nigerian Educational Research and Development Council (NERDC); Kate Nwufo; Moses Salau; David Omole
This study (i) investigated the predictive validity of University Matriculation Examination (UME) and Post-UME (PUME) scores with respect to students’ achievement in First Year Degree Examinations, and (ii) developed a structural model for predicting students’ academic achievement in first year university degree examinations based on performance in both UME and PUME. Records of performance in both UME and PUME for a random sample of 2,637 students admitted into eight core faculties at fourteen randomly selected Nigerian universities were examined. The data were analysed using arithmetic means, percentages, correlation, and forward-inclusion multiple linear regression analysis. The postulated hypotheses were tested at the 0.05 significance level. The study revealed significant relationships between the two predictor variables (UME and PUME) and first-year Cumulative Grade Point Average (CGPA), with correlations of 0.44 and 0.36, respectively. However, the correlation between UME and PUME themselves was very low, suggesting that the two examinations may not have much in common. Based on candidates’ scores in UME and PUME, the following model was obtained for predicting students’ academic achievement: CGPA = 0.5 + 0.32 UME + 0.30 PUME. This result further showed that both UME and PUME scores significantly influenced students’ first-year CGPA in all the universities sampled. The paper concludes with some policy-driven recommendations and suggestions for further study.
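As a reading aid only, the reported prediction equation can be applied directly. The abstract does not state the scale on which UME and PUME scores enter the model, so the rescaling assumed below is illustrative, not part of the study:

```python
# Applies the prediction equation reported above:
# CGPA = 0.5 + 0.32*UME + 0.30*PUME.
# Assumption (not stated in the abstract): predictor scores are first
# rescaled, e.g., standardized, before the coefficients are applied.

def predict_cgpa(ume_scaled: float, pume_scaled: float) -> float:
    """Predicted first-year CGPA from rescaled UME and PUME scores."""
    return 0.5 + 0.32 * ume_scaled + 0.30 * pume_scaled

# Example: a candidate one unit above average on both predictors.
print(predict_cgpa(1.0, 1.0))  # 0.5 + 0.32 + 0.30 = 1.12
```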
TUESDAY, OCTOBER 13: SESSION II //////////////////////////////////////////////////////////////////////////////////////////////////
»»Predictive Validity of JAMB University Matriculation Examination and Post University Matriculation Examination Scores on Final Grade Point Average in Universities in South East Nigeria Ngozi Akanwa, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria
Universities in Nigeria introduced the Post University Matriculation Examination (Post UME) because they believe the Joint Admissions and Matriculation Board (JAMB) University Matriculation Examination (UME) has not predicted performance well, owing to issues related to public examinations in Nigeria. This study examined the relationship between JAMB UME and Post UME scores and final cumulative grade point average. A sample of 3,280 of the 12,332 graduates admitted into three government-owned conventional universities in 2005/2006 was used. The design is a longitudinal study: these students, being the first such cohort, were followed from 100 level through to 500 level, depending on the length of programme. The null hypotheses were tested with Pearson’s product-moment correlation coefficient, and a t-test was used to test for significance. Stepwise regression analysis was used to identify the better predictor. Results show that Post UME is the better predictor of performance and should be retained.
»»Impact of Resource Deficiency on Predictive Validity: A Study of Factors Affecting Validity of Results in Selected Universities in Malawi Madalitso Mukiwa, Dean of Commerce, Exploits University, Lilongwe, Malawi
As suppliers of skilled labor to industry, universities in Malawi are facing serious human, financial, and physical resource deficiencies. As a result, the predictive validity of examination results is being questioned. This paper concentrates in particular on physical resource deficiencies and their impact on results. Can final results from a heavily resource-constrained institution lead examiners to predict the career success of a graduate? It should be noted that physical resource deficiencies are not only a threat to quality but also a contributor to academic ineffectiveness. Knowledge acquired in a highly resource-constrained environment, as an output of university education, calls a college finalist’s capabilities into question. It is envisaged that a Bachelor of Commerce degree holder in Malawi, for example, should be at par in knowledge, skills, abilities, and behavior with the holder of a similar qualification anywhere in the world. Yet their counterparts in developed economies pursued similar qualifications while enjoying unlimited access to adequate physical resources such as books and journals, the internet, and comfortable lecture halls. Malawi, as a member of the global community, is slowly attracting foreign direct investment; hence global skills, knowledge, abilities, and behavior are in great demand. Yet graduates in Malawi generally pursue their university education in institutions with physical resource deficiencies such as high student-per-book ratios, high student-per-computer ratios, and lecture rooms too small to accommodate all students, to the extent that some institutions either erect tents or let students sit near the window of a lecture room to learn. Internet costs have also led universities to maintain traditional over innovative assessments.
»»Assessment of Nursing and Midwifery Students in Uganda: The UNMEB Experience Prof. Wilton Kezala, Uganda Nurses and Midwives Examinations Board (UNMEB); Helen Mukakarisa Kataratambi (Executive Secretary), UNMEB; Agnes N.
Wadda (PRO), UNMEB
Nursing and midwifery assessment in Uganda is defined in the Business, Technical, Vocational Education and Training (BTVET) Act No. 12 of 2008 and operationalized by Statutory Instrument No. 4 of 2009, which gave a legal basis for the establishment of the Uganda Nurses and Midwives Examinations Board (UNMEB). UNMEB’s cardinal objective is to streamline, regulate, coordinate, and conduct credible national examinations and to award diplomas and certificates to qualified nurses and midwives. Prior to the establishment of UNMEB, assessment of nurses and midwives was hospital-based and conducted by the Uganda Nurses and Midwives Council, the agency responsible for the registration and regulation of nursing and midwifery practice after qualification. During assessment, the cases presented to students for skills assessment were based on the chance appearance of ill patients in hospital wards. The validity of the mode of assessment, the skills attained, the examination results, and the quality of the award lacked objectivity, owing to the non-uniformity of cases, time lag, high resource requirements, and the attitudes of patients, students, and examiners at the time of assessment. Since the establishment of UNMEB, validity and reliability have become the core of assessment and the basis for the development of all operational frameworks, including the establishment of a vibrant secretariat, corporate governance structures, implementing partnerships and stakeholder networks, the introduction of OSPE/OSCE assessment methodologies, examination management information systems, grading formulae, human resource management policies, and regulations for the conduct of examinations, among others. Despite the above, the validity of skills and awards remained an issue, especially for those examined before 2005.
This paper will present the strategic interventions engaged to address the validity of skills, examination results, awards, and examination processes with objectivity, practicability, and relevance. It is hoped that the conference discussions will enrich UNMEB’s validity strategies and provide a learning and sharing platform for benchmarking against international best practices.
»»The Factor Structure of the Giftedness Assessment Instrument (GAI) as an Identification Measure of Giftedness Chidimma Adamma Anya, Federal University Gusau, Zamfara State, Nigeria; M.L. Mayanchi, Usmanu Danfodiyo University, Sokoto, Nigeria
The study investigated the factor structure of the Giftedness Assessment Instrument (GAI) as an identification measure of the giftedness construct among primary six children in Lagos, Nigeria. Three component factors with independent attributes were obtained through factor analysis using principal components with varimax rotation. A sample of 600 elementary school children (275 boys and 325 girls) from both public and private schools completed the test items. The results show that the two existing instruments for selecting pupils into gifted schools do not cover all the attributes of giftedness; instead, a three-component structure covering all the attributes of the ‘real’ gifted child was obtained. Further validation also yielded a significant discriminant measure, with standardized convergent and divergent measures of giftedness. The instrument was therefore recommended for use as a quick test for the selection of pupils into gifted schools.
»»Ensuring the Validity of Outcomes of Technical Skill Assessment in Technology Education Institutions in Nigeria Prof. Nkechi Patricia-Mary Esomonu, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria; Engr. Martins Ndibem Esomonu; Gbenga Kayode Oluwatayo
A prominent drawback to the economic advancement of Nigeria is the lack of the right and adequate technical manpower to drive the technological sector of the economy. This situation presents obvious challenges to the technological institutions in the country, one factor being the outcomes of classroom assessment. Assessment at the classroom level places great tasks on the shoulders of teachers and evaluators. These tasks are especially evident in technology education, where the scores awarded should of necessity translate to the technical skills acquired. Though a recent study has shown that the products of Nigerian Certificate in Education (NCE) technology education programmes possess knowledge and skills that enable them to perform creditably in their workplaces, there still seems to be a gap between the grades they earn at graduation and the skills they possess. The need to make technology education graduates in Nigeria internationally acceptable in terms of skill acquisition remains a problem. Hence the study set out to ascertain activities that will ensure the validity of skill assessment scores in technology education. Four research questions guided the study. The population of the study consists of 104 technology education teachers in tertiary institutions in South East Nigeria; because the population was not large, no sampling was done and all members were used for the study. The instrument for data collection was a questionnaire.
The results showed that certain item construction skills, teacher preparation and development activities, content delivery practices, and facilities for teaching and conducting examinations will ensure valid interpretation of skill assessment outcomes. It was recommended, among other things, that government and implementers of technology education be encouraged to put the identified conditions in place in technology education institutions.
»»Validity Issues in the Reform of Practical Science Assessment: An English Case Study Neil Wade, OCR; Ian Abrahams
Assessments in national examinations are frequently modified; radical change, such as change to the overall form and style of assessment, is infrequent. However, as part of a wider reform of A-levels (qualifications taken by 18-year-olds in England as preparation for university study), the assessment of practical skills and techniques in A-level biology, chemistry, and physics will change radically. Currently, practical skills at A-level are assessed using tasks set externally but marked by teachers, and these tasks contribute to the overall grade awarded for the subject. Central to the reform is the separation of what we refer to as the direct assessment of practical skills (DAPS), assessed in the classroom by the teacher, from the indirect assessment of practical skills (IAPS), assessed in written examination. Additionally, the change involves giving a separate reported grade (pass/fail) for the practical skills and techniques demonstrated in class, alongside the grade for the written examination. The intention is to increase the validity of the resulting grade whilst raising the level of practical skills and techniques that students can demonstrate at university. In this paper we explore the factors which threaten the validity of the practical assessments currently used at A-level and the impact of the reforms. In order to do so, we trace the development of DAPS and IAPS (Abrahams, Reiss, & Sharpe, 2013): their initial conception; their discussion within the Department for Education in England and Ofqual (a non-ministerial government department which regulates qualifications and assessments); and finally the interpretation of those policy requirements by the awarding organisations who provide the A-level examinations (Evans & Wade, 2015). The paper will comment on the stages of development and link these to improvements in assessment and in the teaching and learning of A-level students.
»»Assessment of Valid Science Practical Skills for Nigerian Secondary Schools: Teachers’ Practices and Militating Factors Omaze Afemikhe, University of Benin, Benin City, Nigeria; Sylvanus Yakubu Imobekhai; Theresa Ogbuanya
The teaching of science in Nigerian secondary schools has always assumed an important position because of the belief that it can accelerate technological development. Practical work in science has been, and continues to be, given much emphasis, as it is felt that science teaching should use a pedagogic approach that integrates theory and experiments. Good science instruction involves an interplay of experiments, observation, and theoretical inference. For experiments to be used, there should be a laboratory with basic equipment and consumables. The paucity of such equipment and consumables in most schools, coupled with a shortage of qualified and experienced teachers, has made it difficult to assess science practical work; hence its formative function, of assisting in understanding science and how scientific ideas are developed, has not been achieved. Despite this, experiments still need to be conducted and the assessment of practical work carried out as an important component of grades in certification examinations at the completion of secondary school. Teachers play an important role in preparing candidates for certification examinations, in delivering theory lessons, and in conducting practical exercises. The question to be answered, then, is what assessment practices are used in preparing students for the certification examination. In addition, the problems which confront the proper integration of science practicals are explored. Towards this end, the study uses a survey method with science teachers in secondary schools in Edo State, Nigeria. A sample of three hundred science teachers will be used for data generation. The data will be collected using a researcher-designed questionnaire, and the validity and reliability of the scores obtained will be determined. The data will be analyzed using mean scores and an interpretative norm. It is anticipated that the typology of practices and militating factors will vary between teachers by school ownership, location, and science subject taught. Relevant recommendations to enhance the role of science teaching using experiments will be made.
»»Development and Validation of Biology Achievement Test (BAT) for Assessment of Students in Enugu State Collens Ikechukwu Odo, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria
This study sought to develop and validate a Biology Achievement Test (BAT) for the accurate assessment of senior secondary school students. Two research questions guided the study. A multiple-choice test of 100 items was developed based on the biology syllabus and administered to a randomly drawn sample of 600 students. To determine the validity and reliability of the test, item analysis was conducted and facility and discrimination indices were calculated. The test has a reliability coefficient of 0.88, established through the use of Kuder-Richardson formula 20. The test is valid and reliable for assessing students internally and for preparing students for external examinations. Based on the results, some recommendations were made.
TUESDAY, OCTOBER 13: SESSION III //////////////////////////////////////////////////////////////////////////////////////////////////
»»An Evaluation of the Content Validity of the Test Items for Assessing Students' Achievement in General Study Courses in the National Open University of Nigeria
Prof. Uche Mercy Okonkwo, National Open University of Nigeria
General study courses are compulsory courses offered at the undergraduate level by every Nigerian university. The National Open University of Nigeria initially adopted the essay format for examining these and other first- and second-year courses but changed to the objective mode because of the challenges encountered in marking essay answers. For both formative and summative assessment, the Access & General Studies Centre (AGSC) sets multiple-choice and fill-in-the-blank questions for assessing students' achievement in the general courses. These are administered on an e-platform. But since 2013, when the objective mode was adopted, the questions have never been evaluated. The assumption was that the test-item writing workshop attended by the course coordinators had adequately prepared them for the task of setting valid and reliable test items. Although this could be the case, there is no empirical basis to support it. The purpose of this study is therefore to evaluate the validity of samples of test items developed for assessing students' achievement in the six general courses offered in the University. The study will involve an in-depth content analysis of the items set for each course. Specifically, the study will assess each set of questions for each course in terms of:
• adequacy of coverage of the content of the modules in each course, using a table of specification
• adequacy of coverage of levels of learning, using a taxonomy of educational objectives
• adequacy of instructions
• adequacy of distractors
The outcome of this study will provide an empirical basis for improving test-development procedures, and consequently test validity, in the Access & General Studies Centre. Course coordinators at the Centre will be able to affirm what they are doing right and improve or change what they are not. The results should also act as an impetus for improving test-development procedures in the other programmes of the university.
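A table of specification of the kind listed above can be read as a content-by-cognitive-level grid against which an item bank is audited. The minimal sketch below illustrates the idea in Python; the module names, level labels, counts, blueprint shares, and tolerance are all hypothetical, not the AGSC's actual grid.

```python
from collections import Counter

# Hypothetical classification of each test item: (module, cognitive level).
items = [
    ("Module 1", "knowledge"), ("Module 1", "comprehension"),
    ("Module 2", "knowledge"), ("Module 2", "application"),
    ("Module 3", "comprehension"), ("Module 3", "application"),
    ("Module 3", "application"), ("Module 1", "knowledge"),
]

# Hypothetical blueprint: intended share of items in each cell of the grid.
blueprint = {
    ("Module 1", "knowledge"): 0.25, ("Module 1", "comprehension"): 0.125,
    ("Module 2", "knowledge"): 0.125, ("Module 2", "application"): 0.25,
    ("Module 3", "comprehension"): 0.125, ("Module 3", "application"): 0.125,
}

observed = Counter(items)
n = len(items)
for cell, target in blueprint.items():
    actual = observed[cell] / n  # Counter returns 0 for empty cells
    flag = "" if abs(actual - target) <= 0.05 else "  <-- outside tolerance"
    print(f"{cell}: target {target:.1%}, actual {actual:.1%}{flag}")
```

Cells flagged as outside tolerance would point reviewers at under- or over-represented content before the paper is finalised.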
»»Content Validation and Rubric Development of Language Tools Tested in a Multi-State, Multi-Language Project: Challenges Faced
Gayatri Vaidya, Large Scale Assessments, Educational Initiatives, Ahmedabad, Gujarat, India
Validity of tools is a key issue in large-scale assessments. For an assessment to have valid impact, the inferences and decisions made on its basis must be well founded. To ensure this, the content, constructs, testing procedures, and analysis procedures have to be validated. Validity becomes still more complex when the tools have to be tested in multiple states in various vernacular languages. While mathematical tools require relatively simple adaptation and precise translation into other languages, tools developed for language testing must also be validated for cultural and linguistic purposes. For questions where the construct includes skills like identifying letters and sounds, word usage, grammar, and comprehension, mere translation into the vernacular language may not suffice; what is needed is trans-creation and adaptation of the tools in that language while maintaining the difficulty levels. This requires various validity checks. The paper focuses on a case study of a project funded by an international organization working on children's education and welfare. As part of the project, tools were developed and administered for grades 2 and 3 in 7 different states of India in 5 different languages. Various validity checks, including harmonization of tools, constructs, and difficulty levels, and pilot testing, were employed to ensure the uniformity and efficacy of the assessment. The paper lists the challenges faced and the remedies worked out to ensure that the objective of the assessment was achieved.
»»Empirical Analysis of Item Difficulty Indices and Discrimination Power of Multiple-Choice Biology Items
Adekunle Thomas Olutola, Ph.D., Department of Educational Foundations, Faculty of Science and Education, Federal University, Dutsin-Ma, Katsina State, Nigeria
This study presents an empirical analysis of the item difficulty indices and discrimination power of multiple-choice biology items. It obtained empirical data on the difficulty and discrimination of the senior school certificate examination (SSCE) multiple-choice biology tests used by the West African Examinations Council (WAEC) and the National Examinations Council (NECO). The difficulty indices and discrimination power of the 2008 WAEC and NECO SSCE multiple-choice Biology tests were determined. A survey research design was employed. The sample consisted of 1,450 Senior Secondary Three (SS III) students, 758 male and 692 female, drawn from 20 randomly selected secondary schools in Ekiti State, Nigeria. The instruments were the 2008 NECO and WAEC multiple-choice Biology test papers. Classical test theory methods involving frequency counts and percentages were used to obtain the difficulty index and discrimination power. Findings showed that the 2008 WAEC Biology multiple-choice test had a mean difficulty index of 0.42, higher than the NECO Biology test's mean difficulty index of 0.40. In addition, the 2008 WAEC Biology test had a mean discriminating power of 0.43, higher than NECO's mean discriminating power of 0.39. It was therefore recommended that NECO and WAEC improve the psychometric properties of their multiple-choice items so as to improve students' performance in Biology SSCE papers, and that both bodies evaluate the effectiveness of their multiple-choice items regularly.
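For context on the indices reported above: in classical test theory an item's difficulty index is the proportion of examinees answering it correctly, and a common discrimination index is the difference in that proportion between upper and lower scoring groups (conventionally the top and bottom 27%). A minimal sketch with simulated responses, not the WAEC or NECO data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 0/1 response matrix: 200 examinees x 5 items.
responses = (rng.random((200, 5)) < 0.55).astype(int)

total = responses.sum(axis=1)          # each examinee's total score
order = np.argsort(total)
k = len(total) * 27 // 100             # conventional 27% tails
lower, upper = responses[order[:k]], responses[order[-k:]]

difficulty = responses.mean(axis=0)                        # p-value per item
discrimination = upper.mean(axis=0) - lower.mean(axis=0)   # D index per item

for i, (p, d) in enumerate(zip(difficulty, discrimination), 1):
    print(f"Item {i}: difficulty p = {p:.2f}, discrimination D = {d:.2f}")
```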
»»Improving Validity of Test Items Through a Credible and Robust Trial Testing Exercise: The NECO Approach
Moses Oladipupo, National Examinations Council, Nigeria
Whether an item tests what it is supposed to test, and whether it satisfies the constructor's intention, is at the heart of test-item validity. Individuals and examining bodies tend to adopt the most convenient and widely accepted methods of ensuring the validity of test items. The National Examinations Council (NECO), Nigeria, has over the years conducted trial testing of its test items with a view to establishing their difficulty index, discrimination index, reliability, and validity. The exercise integrates the psychometric properties of test items and presents standardized items in multiple-choice form to ensure robust content coverage. The security network through which these sensitive materials pass is tighter than that for normal examinations, and supervision of the exercise is solely the responsibility of the examining body, to the extent that the selected schools are not permitted access to the items before, during, or after the exercise. The trial testing exercise conducted by NECO satisfies the guiding principles of improving validity: (a) reliable measurement of each facet through the use of multiple, alternate-form items; (b) accurate articulation of facets within and between content domains; (c) examination of incremental validity; (d) empirical examination of whether there is a broad construct or a combination of separate constructs; and (e) use of items that represent single facets rather than combinations of facets. The paper examines the procedure, process, pattern, personnel, prospects, and problems of trial testing in Nigerian schools, with a view to establishing the relevance of trial testing for improving the validity and reliability of test items in the assessment industry.
»»Integrating Technology into Mathematics Teachers' Design and Use of Authentic Assessments
Colleen Parks, Calgary Girls' School and University of Calgary; Dr. Kim Koh; Judi Hadden
In an era of competency-based curriculum and outcome-based reporting, building teachers' capacity to design and use authentic assessments that support students' learning and development of 21st-century competencies is a central priority in teacher education and professional development programs. Developing students' competencies in 21st-century classrooms further calls for teachers to leverage technology effectively and meaningfully in their assessments, creating authentic learning environments for students. This paper reports preliminary findings of an action research project that aimed to build mathematics teachers' capacity to design and use reliable and valid authentic assessments incorporating technology, in order to capture students' learning of mathematics and development of competencies such as critical thinking, complex problem solving, communication, and self-directed learning. The specific objectives of the research were threefold: (1) to examine the mathematics teachers' conceptions of authentic assessment and assessment for learning before and after their engagement in designing authentic assessments; (2) to examine the effects of the mathematics authentic assessment tasks on students' learning of mathematics and development of competencies; and (3) to examine the benefits and challenges of integrating technology into the mathematics authentic assessments. Five teachers who taught Grade 6 mathematics participated in a school-based professional learning community over a six-month period. They employed the criteria for authentic intellectual quality, the patchwork text assessment strategy, and the Structure of the Observed Learning Outcome (SOLO) taxonomy to design authentic assessment tasks. To further enhance student learning, the teachers adopted the Substitution Augmentation Modification Redefinition (SAMR) model to infuse digital learning into the authentic assessment tasks. Data sources included teacher focus group interviews, analyses of the mathematics authentic assessment tasks and students' work, and one-on-one interviews with the teachers and a selected sample of students.
»»Validity of Nigeria's Unified Tertiary Matriculation Examination Physics Computer-Based Tests: Threats and Opportunities
Patience Agommuoh, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria; Ursula Akanwa
The Joint Admissions and Matriculation Board (JAMB) in Nigeria has adopted computer-based testing (CBT) for its Unified Tertiary Matriculation Examination (UTME) for prospective tertiary education students. This study sought to identify the threats and opportunities encountered in this examination by students taking Physics. The study adopted a descriptive survey design. The population consisted of all Senior Secondary two and three (SS 2 and SS 3) students who took, or will take, Physics in the JAMB UTME in the Umuahia Education Zone of Abia State, Nigeria. The sample was made up of two hundred and fifty (250) students selected by a combination of purposive and cluster sampling techniques. Three research questions and three null hypotheses guided the study. The instrument for data collection was a researcher-developed structured questionnaire of the four-point Likert type.
Research questions were answered with means and standard deviations, while the hypotheses were tested with a t-test. Recommendations were made based on the findings of the study.
»»Comparison of the Scores of Electronic and Manual Marking in Agricultural Science and Biology Practical Papers in the West African Senior School Certificate Examination in Nigeria
Dr. Iyi Uwadiae, West African Examinations Council, Accra, Ghana; Modupe Oke; Collins Uduh
The West African Examinations Council has continued to facilitate its services through the use of information and communication technology (ICT). This has led to e-marking of scripts for the West African Senior School Certificate Examination (WASSCE), which started on a pilot scale in 2013 with some papers. One of the challenges of any innovation is stakeholders' skepticism, and some are likely to be skeptical about the reliability of scores generated through e-marking. This study compared the scores from electronic and manual marking of the agricultural science and biology practical papers for the November/December 2014 WASSCE. Three hundred and fifty candidates for each paper were randomly selected, and seven experienced examiners per paper were each allotted 50 candidates' scripts to mark manually. The scores for each question and the total scores for each candidate generated through electronic and manual marking were obtained and correlated using Pearson's product-moment correlation. The results showed that the scores for each paper were significantly correlated. Based on the findings, it was recommended that the electronic marking of WASSCE scripts be sustained.
»»Validation in the Assessment and Certification System of School Teachers' Professional Development: Experience of Kazakhstan
Venera Mussarova, AEO Nazarbayev Intellectual Schools; Aigul Jandarova; Dinara Ulanova
This article describes experience gained in forming an assessment system within in-service teacher training in Kazakhstan. The role and place of validation in the assessment system is analyzed both as a process (affirmation, through assessment, of evidence of compliance with set standards) and as a result (certification). The general construction and arguments of validation, combinations of various assessment components (such as portfolios, teachers' presentations, and testing) that contrast the results of different assessment procedures, and the relationship between the validity and reliability of assessment are considered in the context of real practice in achieving validity in the assessment system and the wider goals of teacher training.
»»Evaluation of a Sexuality Education Programme in the University of Ibadan: Implications for Test and Programme Validity
Francisca Chika Anyanwu, Ph.D., Department of Human Kinetics and Health Education, University of Ibadan, Nigeria; Sylvester Reuben Okeke
The goal of health education is to surmount, through behaviour change, the health challenges militating against the health and well-being of individuals and groups. Adolescents and young adults constitute an important population group requiring health interventions that address the health challenges caused by their behavioural peculiarities. For such an intervention to be responsive, however, it must address what it is designed to address, and a valid test and package are a sine qua non for realizing this goal. Designing a valid test to measure sexual behaviour before and after the intervention is therefore instrumental to ascertaining the extent to which set objectives have been realized. This study examined the validity of the sexuality intervention programme designed to regulate risky sexual behaviour among undergraduates in the University of Ibadan. An ex post facto research design was used, and a sample of 1,440 respondents was drawn across the faculties using a multi-stage sampling procedure. The intervention programme was examined for content and context validity. Primary and secondary data were collected using a self-structured questionnaire and the intervention programme, respectively. Content analysis, frequency counts, and percentages were used for data analysis. Results showed deficiencies in the content of the programme: while the intervention improved knowledge of key reproductive health issues, it did not result in behaviour change. Designing, implementing, and evaluating interventions targeted at behaviour change, the overall goal of health education, was recommended.
»»Constructing an Interpretive Argument for the Secondary Entrance Assessment in the Republic of Trinidad and Tobago
Jerome De Lisle, The University of the West Indies at St. Augustine, Trinidad and Tobago
According to Kane (2006), an interpretive argument (IA) specifies the proposed interpretations of test scores by laying out a network of inferences and assumptions leading from student performance to score-based interpretations. The IA is the first stage of argument-based validation (ABV), to be followed by the validity argument, in which evidence is gathered. More recently, Kane (2013) has called for the development of an interpretation/use argument (IUA), in order to give equal billing to uses: interpretations relate to the claims made about an assessment, and uses relate to the score-based decisions. There is great value in explicating an IA for the high-stakes secondary school entrance examinations still in use in the education systems of the Anglophone Caribbean.
In Trinidad and Tobago, this examination is called the Secondary Entrance Assessment (SEA), with scores used for multiple purposes, such as (1) assigning students to a secondary school of their choice, (2) retaining students in the primary school, and (3) assigning students to a remedial classroom or school. The SEA has changed over time, often incorporating several new claims and purposes; for example, the post-2013 version includes a continuous assessment component (CAC) with a stated formative purpose. In explicating both intended and unintended outcomes, ABV represents a significant advance in judging the quality of assessments. In this paper, (1) eight propositions explicated in Kane's (2013) paper are first used to interrogate validity; (2) a theory of action for the pre- and post-2013 versions of the SEA is constructed; and (3) the veracity and popularity of claims and uses among stakeholders (teachers, parents, and civil society) are assessed using Q methodology, a step hinted at by Koch and De Luca (2012). Establishing an IUA can help policymakers in the Global South better reflect on the utility and plausibility of assertions.
WEDNESDAY, OCTOBER 14: SESSION I //////////////////////////////////////////////////////////////////////////////////////////////////
»»Language Rich: Insights from Multilingual Schools
Stuart Shaw, Cambridge International Examinations, Cambridge Assessment; Helen Imam; Sarah Hughes
Increasingly, international awarding bodies are developing and delivering programmes of learning and assessments worldwide, in a wide range of subjects, through the medium of English. These assessments are taken in a variety of multilingual and educational contexts by many candidates whose first language is not necessarily English, and increasingly in 'bilingual education' contexts. This raises interesting questions about challenges and opportunities for language development, validity, and fairness. The international context poses both a potential threat and a potential opportunity. On the one hand, the international quest for English-medium education can cause anxieties about achievement through the second language (English) as well as maintenance of the first language. On the other hand, bilingual education, in which two languages are used as the media of instruction for non-language content subjects, is a fast-developing practice that could be the future direction of language learning in schools (Mehisto, Marsh & Frigols, 2008). In order to explore this potential concern and potential opportunity, and to better understand and support bilingualism and bilingual education, Cambridge International Examinations (Cambridge) administered and distributed two online global questionnaires to schools where the Cambridge curriculum is known to be delivered and assessed. Insights from the surveys, which constitute the focus of this presentation, reveal the hidden richness of bilingualism in schools as well as emerging practices of bilingual education. Cambridge has already produced considerable support materials for bilingual education, for training bilingual teachers, and for teaching bilingual learners.
»»Assessment of the Validity of Testing Results of the Entrance Examination to Master's and Doctoral Studies in Foreign Languages
Turakty Intymakov, National Testing Center; Maral Aliakparova; Saltanat Abdildina
This article analyzes the results of the 2014 language entrance examinations for master's and doctoral studies. The language entrance examination for master's and doctoral studies takes the form of a test in three blocks: listening, lexical-grammatical, and reading. The analysis was carried out using statistical processing under the Rasch model for dichotomous scoring (correct answer / wrong answer). Such analysis makes it possible to evaluate the quality of distractor functioning and the match between the difficulty of the test and the readiness of applicants. The statistical analysis showed that ordering test tasks by difficulty level had a positive impact on the validity and objectivity of the assessment of applicants' knowledge, and yielded an objective distribution of applicants by level of language proficiency (A1, A2, B1, B2). Validity here characterizes the suitability of the test for measuring applicants' levels of language proficiency. To investigate the convergent validity of the test, test data were compared with expert judgements of applicants' proficiency in the foreign language. The results of the analysis suggest that the test correctly identifies levels of foreign-language proficiency.
»»Validity of the Assessment Approach in the Monitoring System for Languages
Anara Dyussenova, AEO Nazarbayev Intellectual Schools; Rustam Abilov, Assem Issabekova, NIS, Kazakhstan; Jesse Koops, Cito, The Netherlands
Education in Kazakhstan faces the challenge of implementing a trilingual policy at school level, so that three languages (Kazakh, Russian, English) will be in fluent use in the future. Nazarbayev Intellectual Schools (NIS) has been developing opportunities for students, teachers, and parents to reach this aim since 2008. This new stream of NIS schools has brought the development of new curricula and of analysis, research, and monitoring tools that draw on best international and national practice in education. The monitoring system for languages in NIS started in 2013; to carry it out, one of the leading organizations in this field, the Cito Institute for Educational Measurement in the Netherlands, was chosen.
The aim of the long-term project is to develop a monitoring tool that helps obtain reliable information about students' present performance in four language skills. Reading and listening are tested with multiple-choice questions administered via computer, while writing and speaking are much more difficult to test: their administration is time-consuming and, most importantly, their assessment rests fully on the teachers themselves. Teachers evaluate students' speaking ability via face-to-face interview, with two teachers evaluating while one teacher interviews the students; for writing, students' written work is assessed. This scheme has been developed to ensure the validity of the assessment system, the principle being that teachers' marks should be broadly consistent. Another essential issue is the validity of marks received across skills: substantial differences in scores across the two groups of skills (reading-listening versus writing-speaking) would lead to unreliable results and threaten the validity of the assessment system. The aim of the paper is to research the validity of the system of assessment of writing and speaking skills.
»»The Credibility of Institution-Based Examinations in Nigerian Universities
Prof. Nkechi Patricia-Mary Esomonu, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria; Gbenga Kayode Oluwatayo
The credibility of institution-based examinations in Nigerian universities has often been questioned, and some experts have questioned the quality of the instruments used in such examinations. However, the lack of empirical evidence on the quality of these examinations, in terms of the validity and reliability of the tests, is a problem. The main purpose of the study was to determine the validity of the instruments used to conduct institution-based examinations in Nigerian universities. The research was delimited to Faculties of Education in federal universities in Nigeria, and four research questions guided the study. An incidental sampling technique was used to select 100 multiple-choice question papers and 300 essay papers. The multiple-choice tests were administered to samples of students similar to those for whom the questions had been prepared, and item analysis was carried out; the indices determined were difficulty, discrimination, and distractor indices. Experts were used to determine the validity of the essay questions. The validation indicated, among other findings, that many questions are invalid. Conclusions and recommendations were made.
»»Ascertaining the Credibility of Assessment Instruments through the Application of Item Response Theory: Perspective on the 2015 UTME Physics Test
Francis R. Ojo, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Chidinma Ifewulu
Item Response Theory (IRT) is a paradigm for the design, analysis, and scoring of test items and similar instruments for measuring abilities, attitudes, or other variables. It is based on a set of fairly strong assumptions which, if not met, could compromise the validity and reliability of the test. The purpose of this study is to evaluate the impact of IRT on the test production processes of the Joint Admissions and Matriculation Board (JAMB), focusing on how the use of IRT has helped maintain the quality and credibility of the Unified Tertiary Matriculation Examination (UTME). The paper demonstrates the basic IRT assumptions using real-data examples from the 2015 UTME Physics test and shows how IRT procedures can be used to evaluate items, as an example of the type of information that checking the IRT assumptions provides. The study used an ex post facto and descriptive research design with a random sample of candidates' responses from the 2015 UTME Physics test. The results of the analysis were used to verify the basic IRT assumptions of unidimensionality, local independence, the form of the item characteristic curve (ICC), and parameter invariance. Findings revealed that the 2015 UTME Physics test met all the assumptions of IRT, thereby upholding the integrity of the test. The paper thus recommends the application of IRT in the calibration and analysis of items in large-scale assessments.
»»Improving Test Development Procedures to Improve Validity: A Look at the Kenya National Examinations Council Test Development Procedures
Edith Leah Ngugi, The Kenya National Examinations Council
Validity is a reflection of the extent to which test scores actually measure what they were meant to measure. It is a crucial characteristic of good assessment: valid assessment information can help stakeholders make good educational decisions, and without validity an assessment may not be of much use. In order to produce valid learners' scores, sound practice in the development and continual improvement of the evaluation tools must be ensured.
The tools must evaluate the knowledge and skills learned in such a way as to ensure that the desired competency levels are achieved. The parameters of a well-developed examination can be divided into two broad categories: the general parameters of a good test as an assessment tool (validity, reliability, usability, efficiency, and equity and fairness) and the specific parameters of a particular type of test. This paper sets out, in detail, how the Kenya National Examinations Council (KNEC) continues to improve its test development procedures so that validity in its examinations is not just maintained but improved. In ensuring best practice in developing the tests it offers, KNEC has put in place sound and robust measures in its test development procedures. The methods used to ensure and improve these procedures will be delineated, and the role each procedure plays in ensuring that validity is not just maintained but improved will be shown. These procedures range from the handling and adoption of a new syllabus to the production of examination results feedback for schools in particular and the public in general. Various KNEC documents and other published materials are drawn on in this paper.
»»Evaluation of Pen-on-Paper Examination Development Procedures to Improve Validity in Open and Distance Education
Prof. P.E. Eya, National Open University of Nigeria; Dr. A.C. Ukwueze
In educational institutions all over the world, examinations are a major tool for assessing instructional objectives, students' progress and success, and certification. In open and distance education institutions, examinations may be web-based, pen on paper, or both. Whatever the platform, the usefulness of examinations depends largely on validity considerations during testing. This study is therefore designed to evaluate pen-on-paper examination development procedures with a view to improving validity in open and distance education. The study will specifically examine the appropriateness, meaningfulness, and usefulness of pen-on-paper examinations in open and distance education in Nigeria. The pen-on-paper examination, as an instrument for assessing students' performance, needs to be examined for quality in order to achieve its purpose. The study will employ a survey research design, reaching a large number of stakeholders in open and distance education to ascertain the validity of the procedures typically used to formulate objectives, construct a table of specification or blueprint, and write and organize tests. The National Open University of Nigeria (NOUN) will be the focal point of the study. A self-structured questionnaire will be constructed and validated for data collection, and the data will be analyzed using descriptive and inferential statistics. From the expected results, emphasis will be placed on valid procedures for improving pen-on-paper examination development and administration in NOUN.
»»Can We Build on What We Report? Development of Valid Reporting in a Student Monitoring System for Mathematics
Rustam Abilov, AEO Nazarbayev Intellectual Schools, CPM, Kazakhstan; Sjoerd Crans; Nico Dieteren; Frans Kleintjes; Frans Kamphuis, Cito, The Netherlands
The development of the student monitoring system for mathematics started in 2011, at the same time as a new curriculum was introduced in Nazarbayev Intellectual Schools (NIS). It was believed that implementation of the new curriculum would be strengthened by the introduction of such an extensive assessment program. The goal was to develop a system that provides reliable information on students' performance and progress. One of the main features, indeed a cornerstone, of the system is the reporting of student performance against level descriptors, allowing teachers to determine the most appropriate action in teaching and learning. This is the 'diagnostic purpose' of a monitoring system and reflects the very high ambition of this project: score reports should tell more than 'just a performance result.' In developing this diagnostic usefulness of the monitoring system, a number of issues need to be addressed. Among them is the validity of the level descriptors of the reporting categories, considered from the very beginning: not only do the descriptions need to be evaluated, possibly aligned, and validated, but the performance levels preliminarily set for each domain and administration moment also require alignment and validation. The results are needed to further develop a valid student monitoring system. The paper reports on this development.
»»Validity in the Teaching-Learning Process: A Call for Curriculum Reforms in Nigeria
Prof. Nneka A. Umezulike, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria; Dr. Idowu O. Eluwa
The content an everyday classroom teacher must learn to be effective has increased exponentially under the pressure of global change; as such, calls for change on issues of curriculum and instruction, to ensure validity as well as sustainability in the educational system, cannot be overlooked. Nor can the call to align everyday classroom instruction with curriculum goals and objectives, so as to guarantee valid outputs, be ignored. This study employed a survey research design and qualitatively examined the issue of validity in the teaching-learning process of some tertiary institutions in South East Nigeria, to ascertain the validity of instructional service delivery. A validity gap was found between the curriculum content and the instructional process.
As a result, proactive curriculum reforms were recommended, among other measures, to improve the validity of instructional service delivery in Nigeria.
»»Nigerian Teachers' Utilization of Test Construction Procedures for Validity Improvement of Achievement Tests
Omaze Afemikhe, University of Benin, Benin City, Nigeria; Sylvanus Yakubu Imobekhai
Teachers are very important stakeholders in the education system, as they drive whatever takes place within the classrooms of the school system. Apart from teaching, they generate assessments which are used to evaluate teaching efficacy and learning achievement in their classrooms. The validity of these assessments is of the essence if we are to have confidence in the interpretations and uses to which they are put, and to ensure that validity, appropriate test construction procedures should be utilized. The main question addressed in this study is whether teachers use appropriate construction procedures which confer on tests the requisite validity. A survey approach was applied; the population comprised teachers in primary and secondary schools in the Benin metropolis in Nigeria, from which a sample of five hundred teachers, two hundred and fifty each from the primary and secondary school levels, was selected. A questionnaire focusing on the steps in constructing a valid achievement test was designed, with response options on a three-point scale of 'all the time,' 'sometimes,' and 'not at all.' The validity evidence for the questionnaire was established using a jury of experts in measurement and evaluation, who determined the adequacy, comprehensiveness, and suitability of the items. The reliability of the scores from the instrument will be determined using Cronbach alpha. The data collected will be analyzed using means and standard deviations and an interpretative norm. It is anticipated that the results will indicate that teachers are not knowledgeable about the issues normally considered in constructing valid achievement tests. Recommendations on enhancing teachers' knowledge and skills in the procedures for constructing valid achievement tests will therefore be made.
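Cronbach's alpha, as mentioned above, is computed from the item variances and the variance of the total scores: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals), where k is the number of items. A minimal sketch with invented questionnaire data, not the study's instrument:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses on a three-point scale (0-2): 50 respondents x 10 items,
# built so that items are positively correlated within respondents.
rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(50, 1))
data = np.clip(base + rng.integers(-1, 2, size=(50, 10)), 0, 2)
print(f"alpha = {cronbach_alpha(data):.2f}")
```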
WEDNESDAY, OCTOBER 14: SESSION II //////////////////////////////////////////////////////////////////////////////////////////////////
»»Test Quality Assurance: The Effectiveness of Psychometric Controls
Francis R. Ojo, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Frans Kleintjes; Barr. Aminat Egberongbe Shafiyi
Several researchers have continued to underscore the importance of testing in the teaching and learning process. There is no doubt that effective testing provides a realistic appraisal of students' progress in key knowledge areas and skills that are essential for building successful careers, and good test development is central to all levels of educational attainment. Psychometrics, the field of study concerned with the theory and techniques of test measurement, addresses the theories behind the construction and validity of measurement instruments and their evaluation. Psychometricians are therefore tasked with making assessments not only valid but reliable and fair. To accomplish this, three major psychometric controls are implemented by the Joint Admissions and Matriculation Board (JAMB), Nigeria, in its Unified Tertiary Matriculation Examination (UTME) computer-based testing (CBT), with initial Cito tutelage: (1) content control, that is, techniques adopted to achieve consistency with content and construct validity, scoring methods, and passing standards; (2) difficulty-level control, that is, adherence to three-parameter logistic (3PL) item response theory (IRT) analysis and equating methods; and (3) item exposure control, that is, the use of automated processes in the creation of parallel forms to achieve consistency among test versions. All of this is achieved through strong teamwork among the subject-matter experts (SMEs), the psychometrics department, and the information and technology services department, in order to build and maintain secure and sophisticated examination content that is not only valid but cost-effective. This study highlights the functions and procedures that support best practice in test refinement and production, as exemplified by JAMB.
»»Teachers' Perceived Influence of the Team Teaching Method on Students' Performance in Mathematics in Secondary Schools in Ikeduru LGA, Imo State
Ada Ike Eucharia, Abia State University, Uturu, Nigeria
This study will investigate teachers' perceived influence of the team teaching method on students' performance in mathematics in secondary schools in Ikeduru LGA of Imo State, compared with the conventional teaching method. The design will be a pretest-posttest non-equivalent group control design. Simple random sampling will be used to select 100 Junior Secondary School (JSS 3) students from 10 JSS schools in the LGA. A well-constructed questionnaire will be developed by the researcher, face-validated, and used for data collection; its reliability coefficient has been established at 0.73. The research will be guided by two research questions and two hypotheses, to be tested at the 0.05 significance level. The research questions will be analyzed using means, while analysis of covariance (ANCOVA) will be used to test the hypotheses. Findings will be reported after the study is concluded, and recommendations made alongside them on the teaching method that should be adopted for mathematics in secondary schools in Nigeria.
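An ANCOVA of the kind proposed above treats the pretest as a covariate and the teaching method as a factor. The sketch below uses simulated data with an assumed treatment effect, and assumes statsmodels as the analysis engine; it illustrates the procedure only, not the study's results:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated pretest/posttest scores for two hypothetical groups of 50 students.
rng = np.random.default_rng(2)
n = 50
pretest = rng.normal(50, 10, 2 * n)
group = np.repeat(["team", "conventional"], n)
effect = np.where(group == "team", 5.0, 0.0)   # assumed treatment effect
posttest = 0.8 * pretest + effect + rng.normal(0, 5, 2 * n)

df = pd.DataFrame({"pretest": pretest, "group": group, "posttest": posttest})

# ANCOVA: posttest ~ pretest (covariate) + group (factor), tested at alpha = 0.05.
model = ols("posttest ~ pretest + C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```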
»»Best Practice in Handling Cases of Missing or Incomplete Values in Data Analysis: A Guide against Eliminating Other Important Data
Patrick Onyeneho, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Chidinma Ifewulu
A common issue encountered in data analysis is the presence of missing values in the dataset. Modern statistical techniques require complete data, but statistical packages often default to the least desirable options for handling missing data (exclude cases listwise, exclude cases pairwise, exclude cases analysis by analysis, etc.). Allowing software packages to remove incomplete data most often eliminates a good deal of other important data that contribute to the overall analysis, and incomplete or missing data affect the precision and validity of the resulting estimates, depending on the extent of 'missingness.' Various methods are available for handling incomplete values before data analysis. This study compares the results of analyzing complete data with those obtained under missing completely at random (MCAR) and missing at random (MAR) handling, mean substitution (MS), and multiple imputation (MI). A random sample of 3,500 examinee responses in the UTME Physics subject was extracted and analyzed, and ten percent of the data were deleted to simulate an MCAR situation. Four different statistical methods were used to deal with the missing values. Scores obtained from the analysis of the original dataset showed a significant relationship with the discipline applied for by the examinees; the estimates of this relationship based on complete-case analysis, MAR handling, and MS were biased and insignificant, while MI yielded the closest significant estimate of the relationship (p < 0.05). It is recommended that the MI method be used in handling missing or incomplete data, because it produces unbiased estimates of relationships in MAR situations. Missing or incomplete data should not simply be disregarded in analysis, because this will produce biased estimates.
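The contrast the abstract draws can be illustrated in code. The sketch below simulates MCAR missingness in invented data and compares listwise deletion, mean substitution, and a simple multiple-imputation loop; scikit-learn's IterativeImputer is assumed as the imputation engine, not the software used in the study. (Under pure MCAR, listwise deletion is inefficient rather than severely biased; the abstract's stronger claims concern MAR settings.)

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 3500
x = rng.normal(50, 10, n)            # hypothetical subject score
y = 0.5 * x + rng.normal(0, 5, n)    # hypothetical correlated outcome
data = np.column_stack([x, y])

# Simulate MCAR: delete 10% of x values completely at random.
mask = rng.random(n) < 0.10
data_mcar = data.copy()
data_mcar[mask, 0] = np.nan

def corr(a: np.ndarray) -> float:
    """Pearson r between the two columns."""
    return np.corrcoef(a[:, 0], a[:, 1])[0, 1]

# 1) Listwise deletion: keep only complete cases.
complete = data_mcar[~np.isnan(data_mcar[:, 0])]
# 2) Mean substitution: replace missing values with the observed mean.
ms = data_mcar.copy()
ms[mask, 0] = np.nanmean(data_mcar[:, 0])
# 3) Multiple imputation: average r over several stochastic imputations.
rs = [corr(IterativeImputer(sample_posterior=True,
                            random_state=seed).fit_transform(data_mcar))
      for seed in range(5)]

print(f"full data r     = {corr(data):.3f}")
print(f"listwise r      = {corr(complete):.3f}")
print(f"mean-subst r    = {corr(ms):.3f}")
print(f"multiple-imp r  = {np.mean(rs):.3f}")
```

Running the sketch shows the attenuation mean substitution induces in the correlation, while imputation-based estimates stay close to the full-data value.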
»»What Might Teachers' Conceptions of Assessment Mean for Validity in High-Stakes, School-Based Assessment?
Cathy Schultz, South Australian Certificate of Education (SACE) Board of South Australia
The South Australian Certificate of Education (SACE) is the single senior secondary certificate in South Australia, providing pathways to university, vocational education and training, and the workplace. The certificate was first introduced in 1992 and subsequently reviewed. Following the release of Success for All: SACE Review Final Report in 2006, a comprehensive agenda of reform in senior secondary curriculum, assessment, and certification was undertaken in South Australia. One of the key reforms was to place greater reliance on teacher judgement in this high-stakes assessment context. The expanded use of teacher judgement in high-stakes assessment was based on a belief that "validity and reliability are increased because teacher-led assessment enables a wide range of skills, concepts, processes and understandings to be demonstrated" (Crafter et al., 2006, p. 129). The conference presentation will draw on research undertaken for a Master of Education thesis, South Australian senior secondary teachers' conceptions of assessment: An analysis of how and why SACE teachers assess as they do. Using focus groups as the primary research methodology, the research captures South Australian senior secondary teachers' beliefs, attitudes, and approaches to assessment during, and immediately following, the implementation of the revised SACE (2009-2012). The qualitative research design enables the richness and diversity of teachers' individual voices to emerge. Analysis of the data explores the relationships among pre- and in-service learning, teacher accountability in a high-stakes assessment context, and the impact of these on educational assessment. A framework has been developed that illustrates SACE teachers' cognitive understandings, beliefs and attitudes, approaches, and actions in relation to assessment. The framework will be presented, and the issues of validity in high-stakes assessment raised by these conceptions will be explored.
»»Survey of Teachers' Attitudes to the Validity of Instruments Used for Continuous Assessment of Basic Education in the North Central Geo-Political Zone of Nigeria
Prof. Charles M. Anikweze, Dept. of Educational Foundations, Nasarawa State University, Keffi, Nigeria
In the Nigerian school system, continuous assessment (CA) is applied, in part, to determine the overall achievement of learners, particularly at the basic education levels. Current policy provides that CA contributes 60%, and the end-of-term examination 40%, of the cumulative scores that determine a learner's annual achievement. At the federal level, attempts were made to harmonize the implementation strategies for CA, and particularly test development procedures, to improve the validity of the instruments used by teachers for assessing learning, but actualization remains a matter of concern. This survey explored the attitudes of basic education teachers towards establishing the validity of the instruments applied for the continuous assessment of learners. Three research questions and two hypotheses guided the study. A multistage stratified random sampling procedure was adopted to select a sample of 1,500 teachers from three states of the north central geopolitical zone plus the Federal Capital Territory.
A researcher-developed 20-item structured questionnaire of the Likert type was validated through expert consensus-based appraisal and used for data collection. Data were analysed using descriptive statistics to answer the research questions and chi-square to test the hypotheses at the 0.05 alpha level. The results show that most professionally qualified teachers are sensitive to the relevance of validity for any test that is to produce useful results. Nonetheless, only a negligible proportion of teachers, irrespective of sex and level of educational qualification, actually bothered to establish the validity of the instruments they use for obtaining CA scores. Findings from the study reflect the gap between policy prescription and its implementation: the dynamics of politics, coupled with the diversity of ethnic nationalities and the associated multilingual confederacy, seem to have acted as stumbling blocks to policy implementation. It is recommended that the Federal Ministry of Education intensify efforts to harmonize the implementation strategies for CA, particularly in ensuring the validity of the instruments used by teachers for assessing learning.
»»Factorial Validation of an Academic Environment Scale for Research Methods Education Students in Jos, Nigeria
Christiana Ugodulunwa, University of Jos, Nigeria; Amos Adeyemo
This study developed and validated an academic-environment assessment scale for undergraduate education students taking Research Methods in the Faculty of Education of a university in Nigeria. A dearth of reliable academic-environment assessment scales, and students' persistently poor performance in the Research Methods course over the years, prompted the study. Academic variables such as students' personal dispositions and their interactions with the course lecturers, which are intertwined with stress, tension, and frustration, are assumed to contribute to success or failure in the course at the undergraduate level. A sample of 310 students was drawn from a population of 790 final-year undergraduate education students in the 2013/2014 academic session, using a stratified sampling technique. A 42-item academic environment scale (AES), developed and validated by the researchers to elicit students' perceptions of their academic environment, was used for data collection. The data were analysed using exploratory factor analysis and the Cronbach alpha method. The data passed the Kaiser-Meyer-Olkin (KMO) test for adequacy of the sample and Bartlett's test of sphericity for the suitability of the data for factor analysis. Kaiser's rule (eigenvalue ≥ 1) and the scree plot were used to extract five factors as the underlying structure of the instrument. Based on the items that loaded on them, the factors were labeled personal attention to students, respect for students, relationship with lecturers, freedom of learning, and commitment to academic work. A reliability coefficient of 0.85 was also established for the instrument. The factor analysis and reliability results provided clear evidence of the factorial validity and reliability of the instrument. It was recommended, based on the findings, that the instrument be used for exploring students' perceptions of personal and social academic-environment variables and as a valid diagnostic tool for providing guidance and counseling support to students.
»»Using Itemised Data Capture Reports as an Additional Lens in Evaluating the Use of Cognitive Taxonomies in Determining the Difficulty of Physical Sciences Examinations
Dr. Helen Sidiropoulos, Independent Examinations Board, South Africa
In response to the need for evidence on candidate performance at different cognitive levels, the Independent Examinations Board (IEB) captures the Grade 12 National Senior Certificate (NSC) Examination results question by question. These results are reported in what is known as the Itemised Data Capture (IDC) Report. This report provides a unique opportunity for the IEB to monitor the performance of candidates on questions set on different content areas and at different cognitive levels. This empirically grounded information informs the setting of future examination papers, by comparing the IDC data with the original intention of the questions as recorded in the examination taxonomy design grids. It also influences the teaching and learning process through the dissemination of all the data to the teaching community. Results from the IDC Report for NSC Physical Sciences in 2014 reveal that candidate performance was at its lowest over a seven-year cycle, even though the examination papers, according to both internal and external moderators, were set in line with the cognitive taxonomy requirements of the assessment framework. Inferences from these reports suggest that cognitive level alone does not determine question difficulty. This paper provides a snapshot of how an examination paper set at the required cognitive taxonomy levels was insufficient to predict the desired performance outcomes of Physical Sciences candidates. The data generated from the IDC Reports afford teachers and examiners an additional, and possibly more appropriate, tool that can be used to inform more valid assessments. Concentrated efforts should be directed at building assessment capacity in teachers and examiners that develops their understanding of the relationship between cognitive levels, difficulty levels, and candidate performance, including an understanding of the inferences that can be made from the IDC Reports.
»»Challenges to Systemic Validity in Examination Systems
Jo-Anne Baird, Oxford University; Therese N. Hopfenbeck
High-stakes tests and examinations in the final years of school education are evident in almost every country internationally.
Even as increasing numbers of countries engage in curriculum and assessment reform in an attempt to create curricular and pedagogical experiences fit for the 21st century, the challenge of creating assessment systems that are fit for a variety of purposes remains. This presentation documents some challenges currently faced by examination systems: (1) crises of knowledge, (2) spiraling reform, (3) globalisation, (4) performativity, and (5) grade inflation. The implications of these issues for curriculum, test development, and pedagogy at multiple levels of the education system will be explored. There are tensions running deeply through education systems, to the point that there are power struggles over what constitutes knowledge, who can be trusted to impart knowledge, and by what criteria we credit knowledge with a qualification. In part, this presentation serves to 'call out' these issues, but we also argue that tackling the negative consequences of some of these trends would require a different approach to curriculum and assessment than the short-term political models currently permit. Systemic validity is threatened by the culture that has developed around assessment, such that many question whether educational outcomes are worthwhile or are in decline. We discuss the reasons for this and the tough challenges it raises for test developers. Unless we engage properly with these issues, the educational project of assessment systems is undermined.
»»Improving Test Development Procedures to Improve Validity among ECDE Trainers in Kenya
Elizabeth A. Obade, The Kenya National Examinations Council
Validity is an important quality to consider when constructing or selecting a test; test validity can be interpreted as usefulness for the purpose, and a test with high content validity for one curriculum may not be valid for another. Classroom tests can be designed by the teacher to determine or monitor the progress of learners in a classroom, but most trainers have limited assessment training and even less time to develop and evaluate their own tests, and they routinely develop tests without considering validity issues. The Kenya National Examinations Council is mandated to assess the learning objectives of Early Childhood Development Education (ECDE) trainees at Certificate and Diploma level and to award certificates to successful candidates. However, the weak average performance of some candidates in national examinations is, according to chief examiners, attributable to poor syllabus coverage. The purpose of the study is to establish the factors influencing validity in classroom test construction. The study objectives are therefore to explore the content and face validity of ECDE classroom examinations, to examine the effect of trainers' experience on test construction, to establish the use of tables of specification and Bloom's taxonomy in classroom tests to improve validity, and to examine the challenges facing trainers in classroom test construction. The study will adopt a case study research design and will be conducted in ECDE training institutions in Nairobi County. Data will be collected from the ECDE trainers through questionnaires and structured interviews. Data analysis will use descriptive methods, and the information will be presented as figures and percentages in tables.
WEDNESDAY, OCTOBER 14: SESSION III //////////////////////////////////////////////////////////////////////////////////////////////////
»»Improving the Validity of Grading and Reporting Practices of Competency-Based Assessment (CBA) in the Ghanaian Vocational Education and Training (VET) System
Dr. Peter Boahin, National Board for Professional & Technician Examinations (NABPTEX)
Since the introduction of competency-based training (CBT) in the Ghanaian vocational education and training (VET) system in 2006, the area presenting the most contentious issues in its implementation has been the assessment and reporting of learning outcomes. Competency-Based Assessment (CBA) may be defined as a process of judging competency against prescribed standards of performance (Argüelles & Gonczi, 2000). This is based on the premise that a competency standard defines only one level of performance criterion, which can either be demonstrated or not (Allais, 2003). Furthermore, CBA results are not graded; the assessment process is based on criterion-referencing (CR) rather than norm-referencing (NR), and learners are judged as either 'competent' or 'not yet competent' (Wolf, 1993; Schofield & McDonald, 2004). Critics of non-grading and its associated binary reporting techniques in CBA argue that the system encourages mediocrity rather than excellence in the learning process (Gills & Griffin, 2005). In their view, competency is a developmental continuum that lends itself to higher levels of performance. Wolf (1993) argues that CR testing produces levels of performance, of which 'competent/not yet competent' is only one. In reality, a purely CR or NR assessment does not exist; rather, CR testing may be capable of yielding NR information (Thompson et al., 1996). The use of 'competent/not yet competent' results from CBA seems to disadvantage a number of VET applicants seeking employment or further education, partly owing to difficulties in: (a) transferring credits between institutions and programmes; (b) selection procedures; (c) converting VET results into scores comparable with other graded results; (d) equating 'competent' in ungraded results with 'pass' results in graded subjects; and (e) interpreting proficiency-based transcripts from VET applicants. As a result of these challenges, the National Board for Professional and Technician Examinations (NABPTEX), the main examining body for VET in Ghana, has been inundated with letters from industries, organisations, and universities seeking clarification, interpretation, and equivalences of ungraded results, and a review of CBA grading and reporting levels.
»»The Analytical Report as a Tool for Ensuring the Validity of Examinations
Nazgul Tuitina, AEO Nazarbayev Intellectual Schools; Indira Ismailova
It is crucial for educational organizations, especially those whose main focus is testing and assessment quality, to ensure that students achieve the results stipulated by subject programmes. Nazarbayev Intellectual Schools are implementing external summative assessment (ESA), which differs entirely from the national testing system in structure and content and is aimed at assessing 21st-century skills. The present research is based on the first ESA results, which are processed and summarized in the Analytical Report (2013-2014). It was important to see how students were succeeding and what improvements should be made. For example, it was evident that students demonstrated higher-order thinking skills; on the other hand, the results showed lower performance in some subjects where functional reading skills were assessed. The aim of this study is to examine the report and to evaluate the validity of the assessment.
»»Progress in the Validation of High-Stakes Assessment: the Case of the Selection Test for Nazarbayev Intellectual Schools in Kazakhstan
Miras Baimyrza, AEO Nazarbayev Intellectual Schools; Rustam Abilov; Nico Dieteren; Anara Dyussenova; Frans Kamphuis; Frans Kleintjes; Dariya Kulbayeva; Zamira Rakhymbayeva

Nowadays, teaching and learning in many countries are oriented towards students acquiring the competences they need to function as citizens in modern society. In Kazakhstan, Nazarbayev Intellectual Schools (NIS) were initiated with the aim of preparing the intellectual capital of the nation, focusing on the representatives of the new generation who will contribute to the prosperity of the country in the (near) future. To achieve this goal, a new curriculum has been developed jointly with Cambridge University. Implementing a new curriculum calls for an appropriate selection procedure, to ensure that selected NIS students are able to study the natural and mathematical sciences. The selection procedure has therefore been changed so that it selects those students who will be able to acquire the skills and knowledge indicated by the curriculum. This means a thorough change in the type of questions asked in the selection test: more application of knowledge and higher-order skills rather than (just) knowledge. The selection test should aim at assessing potential rather than recall. The first selection test was administered in 2013, and several large-scale assessment sessions have since been organised. Applicants have been selected and appointed to the highly valued scholarships. Public and political interest in the test and its results is high. In the paper we will focus on the validation research we are currently developing to answer some essential questions about the quality of the test:
• Did we select the right students?
• How well do the test items and the candidates' results predict future success in NIS schools?
• Are the composition, structure, and administration of the test appropriate for its purpose?

»»The Role of Testing in the Assessment of Teachers' Professional Skills
Saltanat Abdildina, National Testing Center of the Ministry of Education and Science of the Republic of Kazakhstan; Aktoty Kametova; Maral Aliakparova

The quality of education depends on teachers' professional competences. In the Republic of Kazakhstan, qualification testing of teaching staff (QTTS) is organised with a view to professional training and the assessment of teachers' functional literacy. The new testing format consists of multiple-choice questions with one correct answer, questions with one or several correct answers, and situational tasks. This paper deals with innovations in the QTTS testing process, as well as a comparative analysis of the old and new QTTS formats using the RUMM program. During the testing process a questionnaire was administered to teachers to determine their understanding of the content of the testing. The analysis considered the face validity, internal validity, reliability, and intercorrelation of the tests. Another purpose of the testing was to determine the compliance of the new-format tests with the levels of teachers' functional literacy. The results and conclusions presented in the paper are intended for further discussion and for the improvement of items in the new format.
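Both abstracts above rest on empirical validation evidence; for the NIS selection test, the bullet asking how well results predict future success in NIS schools is, at its simplest, a predictive-validity correlation. The following is a minimal sketch of that check, with invented score arrays standing in for real admission and follow-up data (all numbers here are illustrative assumptions, not the authors' data):

```python
import numpy as np

# Hypothetical data: selection-test scores at admission and a later
# outcome (e.g., a first-year grade average) for the same students.
selection_scores = np.array([62, 75, 58, 90, 71, 84, 66, 79])
later_outcomes = np.array([2.9, 3.4, 2.6, 3.9, 3.1, 3.6, 2.8, 3.5])

# Pearson correlation as a simple predictive-validity coefficient.
r = np.corrcoef(selection_scores, later_outcomes)[0, 1]
print(f"Predictive validity (Pearson r): {r:.2f}")
```

In practice such estimates also need a correction for range restriction, since only admitted applicants contribute outcome data; the sketch shows only the basic quantity being estimated.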
»»Validity Concerns in Assessment and the Competence of the Classroom Teacher: An Overview of the Uganda Classroom Teacher's Competence
Dan Kyagaba, Uganda National Examinations Board

Validity is one of the critical indicators of the quality of an assessment instrument. It is, essentially, the measure of the extent to which an assessment instrument measures what it was designed to measure, shown by the degree of relationship between the curriculum standards and the instrument. Assessment instruments are tools regularly designed and employed by a classroom teacher to measure learners' progress and, eventually, to determine learners' achievement over time. To achieve this, the instrument must deliver effectively, that is, it must measure exactly what it was tasked to measure. For a classroom teacher, the assessment instrument should satisfy at least content and construct validity. The validity of an instrument is catered for during the development of that very instrument. A classroom teacher regularly develops and uses test instruments to assess the teaching and learning that go on in his or her class. This presupposes that the teacher possesses the skills needed to produce a good-quality instrument: the competence to cater for the validity of an assessment instrument, among its other qualities, and, for a classroom teacher, mostly its content and construct validity. The development of an effective assessment instrument therefore requires the developer to have at least basic competence in the principles of assessment. The paper sets out to give an overview of the classroom teacher's competence to attend to validity concerns during assessment.
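A common tool for building the content validity discussed here into a classroom test is a table of specifications: a blueprint crossing content areas with cognitive levels, against which the drafted items are checked. A minimal sketch follows, using an invented two-topic blueprint (the topics, levels, and counts are assumptions for illustration only):

```python
from collections import Counter

# Hypothetical blueprint: planned number of items per content area
# and cognitive level (invented for illustration).
blueprint = {
    ("Fractions", "Remember"): 3,
    ("Fractions", "Apply"): 5,
    ("Geometry", "Remember"): 2,
    ("Geometry", "Apply"): 4,
}

# Items as drafted by the teacher, tagged with area and level.
drafted_items = [
    ("Fractions", "Remember"), ("Fractions", "Apply"), ("Fractions", "Apply"),
    ("Geometry", "Apply"), ("Geometry", "Remember"),
]

counts = Counter(drafted_items)
for cell, planned in blueprint.items():
    actual = counts.get(cell, 0)
    status = "OK" if actual == planned else f"off by {actual - planned:+d}"
    print(f"{cell[0]:<10} {cell[1]:<9} planned={planned} drafted={actual} ({status})")
```

A test whose drafted items drift from the blueprint weights is exactly the content-validity failure the abstract describes.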
»»An Evaluation of the Awareness of the Need to Ensure Validity in the Continuous Assessment Component of Examinations by Some Lecturers of Kaduna Polytechnic
Martha Ada Onjewu, Kaduna Polytechnic

Marks emanating from continuous assessment (CA), in most cases forty (40) per cent of each subject or course, form an integral part of the final results that students earn at the point of graduation at all levels of education the world over. At the tertiary level in particular, while care is taken to ensure that examination question setting and administration are carefully managed to ensure validity, among other considerations, the management of CA is left entirely to the dictates of individual lecturers. This paper evaluates the awareness of the importance of CA among lecturers of Kaduna Polytechnic, and draws their attention to the need to ensure validity in CA, by means of a set of questionnaires. The data obtained were analysed using frequency counts and content analysis. The findings suggest that the lecturers do not attach the desired importance to CA compared with examinations and do not go out of their way to ensure the validity of the items in their CA. To improve the situation, the paper suggests the organisation of a workshop to discuss the subject matter, alongside strategies, such as peer collaboration, for achieving validity in CA as much as in examinations.

»»Towards the Application of Item Response Theory to Data from India's Graduate Aptitude Test in Engineering (GATE)
Devlina Chatterjee, Indian Institute of Technology, Kanpur, India; Anindya Chatterjee

In India, many important examinations include multiple-choice questions. In engineering, admission to graduate programs is partly based on candidates' scores on the Graduate Aptitude Test in Engineering (GATE), taken by about a million candidates annually. GATE is administered in various engineering disciplines, such as Aerospace, Civil, Electrical, and Mechanical. In GATE 2015, the number of candidates for Mechanical Engineering exceeded 170,000, while for Aerospace Engineering it was slightly under 4,000. Item response theory (IRT) has so far not been used for GATE. Exploring the GATE data with a view to possibly incorporating IRT in future, we have studied the 2015 Aerospace Engineering exam. This exam contained questions with numerical answers to be keyed in, as well as 36 multiple-choice questions with 4 options each (one correct). In the actual exam, negative marks were given for wrong answers, zero for omitted questions, and full marks for correct answers. In the GATE, many candidates simultaneously face identical questions. The questions are notionally all new, set annually by different professors in a confidential operation coordinated by a small committee. Many institutes are stakeholders in this exam, which has traditionally used classical scoring methods as above. Consequently, the adoption of modern methods like IRT for this exam must be preceded by case studies such as the one we begin here. For our polytomous data, we have used the Generalized Partial Credit Model (GPCM) implemented in R (package: ltm). The z scores obtained are approximately normally distributed. The discrimination parameters for the various questions mostly show useful variation, but a few have tiny negative values. In ongoing work, we will investigate those specific questions in consultation with domain experts to understand possible reasons for such tiny negative values.
By the time of the conference, we will have more detailed results, especially comparing the ranks obtained from the usual scoring with those from IRT.

»»Evaluating the Power of Xcalibre and ConQuest Software in Item Parameter Recovery: A Comparative Analysis
Omokunmi Popoola, Joint Admissions and Matriculation Board (JAMB), Abuja, Nigeria; Francis R. Ojo; Patrick Onyeneho; Chidinma Ifewulu

The quest for valid and reliable assessment results has compelled measurement experts to re-examine their choice of software, as well as the item response theory (IRT) model applied in calibrating items. This choice, however, should keep pace with current developments. Calibration software is expected to estimate parameters accurately irrespective of the IRT model used. The accuracy of parameter recovery can be shown by computing the bias and root mean square error (RMSE) statistics for each of the estimated parameters. The purpose of this paper is to evaluate and compare the parameter recovery and classification accuracy of the Xcalibre and ConQuest software packages. The study utilised data extracted from candidates' responses to the 2014 UTME Physics test. The sample data were classified into groups by sample size, test length, and model (i.e., 3 test lengths × 4 sample sizes × 3 models). The results show that the interaction between item combination, IRT model, and sample size affected the root mean square error of theta recovery and the classification accuracy. The two software packages also produced comparable results, and the algorithms used recovered the IRT parameters to a great extent. However, it appears that the accuracy of parameter estimation, in RMSE terms, decreased as sample size and test length were reduced.
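The bias and RMSE statistics referred to here have simple closed forms: for a true parameter value and its estimates, bias is the mean of (estimate minus truth), and RMSE is the square root of the mean squared difference. A minimal sketch with invented "true" and "recovered" item difficulties (illustrative numbers only, not the study's data):

```python
import numpy as np

# Hypothetical generating ("true") item difficulties and the values
# recovered by a calibration run; numbers invented for illustration.
true_b = np.array([-1.2, -0.4, 0.0, 0.6, 1.3])
estimated_b = np.array([-1.1, -0.5, 0.1, 0.7, 1.1])

errors = estimated_b - true_b
bias = errors.mean()                  # systematic over- or under-estimation
rmse = np.sqrt((errors ** 2).mean())  # overall recovery accuracy

print(f"bias = {bias:+.3f}, RMSE = {rmse:.3f}")
```

Computed over many simulated replications, these two statistics are what allow calibration packages such as Xcalibre and ConQuest to be compared on equal terms.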
»»An Application of the Rasch Model to Illustrate the Validity of a Professional Teacher Test from Different Perspectives
Dr. Muhammad, National Center for Professional Testing, Kingdom of Saudi Arabia; Dr. Abdullah Al Sadaawi

The validity and reliability of test scores are important concerns in educational settings, and there are numerous procedures for demonstrating these crucial aspects of any high-stakes standardised test. In the present study, a number of explanatory models within a Rasch measurement framework will be used to assess the accuracy and validity of scores on educational assessments. In particular, systematic relationships between student and item characteristics and achievement differences will be explored using differential item functioning (DIF), differential group functioning (DGF), and differential person functioning (DPF). These techniques will enable test developers to better understand subgroup performance, rather than relying only on bias or sensitivity reviews. While the item analyses based on classical test theory and item response theory that are already in place to examine reliability, validity, and fairness, together with studies of the dimensionality, factor structure, and invariance of the measurement model across gender and teaching-experience groups, are sufficient for examining the psychometric quality of professional tests, the proposed analyses will serve as additional tools that supplement the routine analyses. They will examine whether selected characteristics (gender, experience, and study background) influence DIF, and whether subgroups function differentially on different item subsets. High-stakes tests developed by the National Center for Assessment in Higher Education as an assessment tool for teacher certification will be used to illustrate these perspectives.

»»Examining the Utility of the Music Student Motivation Scale (MSMS) in Higher Education Using the Rasch Rating Scale Model
Pey Shin Ooi, School of Education, the University of Adelaide, Australia

Motivation plays a key role in students' instrumental music learning, which requires students to undertake extensive independent practice in addition to their formal one-to-one tuition. It is therefore important to ensure that students remain motivated so that they persist and engage in their music learning. Although there have been many research studies within the primary and secondary school sectors, research into students' motivation in the field of music performance in higher education is still lacking. Thus, the focus of this study is to develop an instrument that conforms to the required measurement properties, in terms of validity and reliability, for examining music students' motivation within the context of higher education. The instrument is a survey questionnaire consisting of the Music Student Motivation Scale. It is vital to ensure the validity and reliability of the items used in the instrument in order to produce meaningful and useful inferences. The survey was carried out in Malaysian higher education institutions that offer music programmes and was completed by 375 bachelor's degree students enrolled in music courses requiring them to undertake a music performance assessment at the end of the semester. The data collected have been validated with the Rasch rating scale model (RSM) using ConQuest software; employing the RSM allows examination of model fit at the item level. The results and their implications for research on student motivation, particularly in music, are discussed.
This study has been conducted in the hope of laying a foundation for the future expansion of the instrument and of promoting research in higher music education.
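The DIF analyses proposed in the professional-teacher-test abstract above are run in dedicated software (RUMM, ConQuest), but the underlying idea can be shown in a few lines. The sketch below is a deliberately crude screen, comparing a logit difficulty proxy for one item across two subgroups on invented response data; a proper DIF analysis conditions on ability rather than comparing raw proportions:

```python
import numpy as np

# Hypothetical scored responses (1 = correct) to one item for two
# subgroups of comparable ability; data invented for illustration.
group_a = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
group_b = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])

def logit_difficulty(responses):
    # Rasch-style difficulty proxy: log odds of an incorrect response.
    p = responses.mean()
    return np.log((1 - p) / p)

dif = logit_difficulty(group_a) - logit_difficulty(group_b)
print(f"Difficulty difference (logits): {dif:+.2f}")
# A large absolute difference flags the item for expert review.
```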
WEDNESDAY, OCTOBER 14: SESSION IV //////////////////////////////////////////////////////////////////////////////////////////////////

»»Critiquing Kane's Argument-Based Approach to Validation
Stuart Shaw, Cambridge International Examinations, Cambridge Assessment

Over the past thirty years, and particularly through the accumulated scholarship of Michael Kane, an argument-based approach to validation has gained widespread attention within the educational and psychological measurement and assessment community. Kane's approach, which resonates with concepts in program evaluation, aspires to make validation a more tractable endeavour for practitioners by decomposing the big question of construct validity into meaningful, manageable chunks. Building on the work of Cronbach, Kane has provided a practical structure for validation against which others can model their own validation efforts, depending on the intended purposes of the assessment and the intended interpretations of assessment outcomes. Though not without its detractors, an argument-based approach to validity inquiry is the current preferred conception within the literature. Yet, despite the promise of the argument-based approach to provide a realistic and pragmatic "technology" (Kane, 2004, p. 136) for evaluating the interpretations and uses of test scores, there are relatively few examples of argument-based validation studies reported in the literature (though an increasing number of validation studies across a number of disciplines are appearing in measurement and assessment journals). After more than a quarter of a century, it remains to be seen whether the argument-based approach can fulfil its potential and revolutionise validation practice on a large scale. This presentation considers some of the merits and demerits of Kane's argument-based approach to validation. By way of illustration, a warranted defeasible validity argument, grounded in Toulmin's logic of informal argumentation and taking as its focus a generic scoring inference based on Kane's Interpretation/Use Argument, is first assembled and then critiqued.
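Toulmin's scheme gives such arguments a regular shape: a claim, supported by data, licensed by a warrant with its backing, and open to defeat by rebuttals. Purely as an illustrative sketch (a generic scoring inference, not the presenter's actual argument), that structure might be recorded like this:

```python
from dataclasses import dataclass, field

@dataclass
class ToulminArgument:
    claim: str      # what we infer from the evidence
    data: str       # the observed grounds for the claim
    warrant: str    # why the data license the claim
    backing: str    # support for the warrant itself
    rebuttals: list = field(default_factory=list)  # defeating conditions

# Hypothetical scoring inference, invented for illustration.
scoring_inference = ToulminArgument(
    claim="The observed score reflects performance on the assessed tasks.",
    data="Responses were marked against the published rubric.",
    warrant="Trained markers applying the rubric score accurately.",
    backing="Marker-agreement statistics and routine moderation of scripts.",
    rebuttals=[
        "The rubric is misapplied under marking time pressure.",
        "Atypical responses fall outside what the rubric anticipates.",
    ],
)

# A 'defeasible' argument stands only while its rebuttals are ruled out.
for rebuttal in scoring_inference.rebuttals:
    print("unless:", rebuttal)
```

Laying the inference out this way is what makes critique tractable: each warrant and rebuttal becomes a discrete target for evidence.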
»»Validity in Practice: What Is It Realistic to Expect from Examination Bodies?
Sarah Maughan, AlphaPlus Consultancy

Theoretical perspectives on validity have developed extensively over recent years, and these have been translated into proposed models for validation studies. This paper considers the reality of the situation in England, where regulating validity is the statutory objective of the qualifications regulator, and validity auditing is now the primary mechanism for the regulator to evaluate awarding organisations' compliance with their conditions of recognition. Rather than going into the theoretical arguments again in detail (that will be left to those more expert than the author), the paper considers the interface between theoretical perspectives and emerging practice. The qualifications market in England could be categorised, simplistically, into:
• general qualifications: those designed for use by learners in school-based settings following an academic route;
• technical and vocational qualifications: those designed for use by learners, whether in schools, technical or Further Education colleges, or adult educational settings, following a more vocational or applied route;
• professional qualifications: those designed for use by adults entering or progressing in a particular professional career.
Although there is some overlap, in many cases different organisations are involved in designing, developing, and awarding these different qualifications, and in many cases the organisations follow different day-to-day processes. The way in which validation takes place in each of these types of organisation is also variable. I will put forward a suggested categorisation of what the organisations ought to be doing, given the theoretical developments and the context in which they operate, and then contrast this with the reality of current practice (where this is known). The categorisation will be based on experience of working with a wide range of awarding organisations in England on validity and other assessment development or research projects.

»»Dismantling Face Validity: Why the Concept Must Live On, But the Term Must Finally Die
Tzur Karelitz, National Institute for Testing and Evaluation; Charles Secolsky

Traditionally, face validity (FV) refers to whether a test, on the face of it, seems to measure what it aims to measure. We concur that FV is not validity at all: without proper evidence, such a claim cannot be taken seriously. Consequently, there has been little conceptual development of FV, as opposed to modern conceptions of validity. According to the validity-as-argument framework (Kane, 2013), validators need to evaluate evidence supporting or refuting the chain of assumptions and inferences underlying the interpretation and uses of test scores. This is not a simple yes/no question; it becomes an elaborate argument. Validators often use the perceptions of experts to evaluate validity claims, but are expert perceptions enough? We propose that FV is a type of perception-based evidence (PBE) that can be used to evaluate validity claims. For laypersons, opinions about a test are more likely motivated by satisfaction with test outcomes than by test reliability. If test users are not convinced of a test's utility, or examinees feel the test is unfair or too expensive, the test may perish, making the study of its validity irrelevant.
Therefore, PBE should include foci on various stakeholders' perceptions, including those of non-professionals. PBE indicates the extent to which interpretive arguments seem plausible, clear, and coherent to examinees, users, and decision-makers. This perspective is valuable for validation and is directly relevant to the definition of validity. In order to evaluate the plausibility of arguments, validators should juxtapose claims against various alternatives. Studying non-expert perceptions of a test informs alternative inferential networks and helps explore the negative perceptions that create validity threats. The measurement community, by eradicating uses of FV, has unintentionally crippled the development of a literature on appropriately using perception-based evidence for validation.

»»Nazarbayev Intellectual Schools Graduates' Opinions as a Factor of Validity in the Development of Educational Processes and the Assessment of Learning Outcomes
Olga Mozhayeva, AEO Nazarbayev Intellectual Schools; Aidana Shilibekova; Zhanat Bazarbekova

The opinions of school leavers and higher education graduates have the potential to provide institutions with significant qualitative data with which to reflect on the relative success of their students' learning journeys, both in terms of the quality of their education and the validity of the methods used to assess learning outcomes. Analysing data relating to students' educational preparation for university and their subsequent experience of higher education will provide:
1. An accumulation of detailed information relating to students' pedagogical experiences.
2. An insight into the quality of predictions for students' school-based examination results.
3. An indication of the relevance of these results to students' subsequent performance in higher education and preparation for future careers.
The case study presented here examines the experience of students who studied at six Nazarbayev Intellectual Schools (NIS) in the Republic of Kazakhstan; at least 300 responses have been received from undergraduates enrolled on degree programmes at different universities across the Republic of Kazakhstan. Each was questioned on their satisfaction with, and the perceived relevance of, the education received at their NIS school and the education they are currently receiving at their chosen university. The criteria used for the design of each student's questionnaire included: recent academic and extracurricular achievements at their chosen university; specific factors characterising the individual's academic experience at their chosen university; the NIS results used to obtain their place at university; satisfaction with their NIS education; examination results obtained at their NIS school; and the perceived relevance of their NIS education. The questionnaire was delivered by means of an online survey.

»»Thin Client Technology, Innovation, and the Simultaneous Examination of a Large Population in Multiple Locations at the National Open University of Nigeria
Ukoha Igwe, National Open University of Nigeria

Examinations are very important in universities, and indeed in all institutions of learning: conducting credible, reliable examinations and producing valid results remains central to virtually every examination. The National Open University of Nigeria (NOUN), the only dual-mode tertiary institution in Nigeria, has a student population of over 400,000, of whom about 50% are very active and write examinations regularly. The university also currently has about 70 study centres in every corner of Nigeria, where students write examinations simultaneously, and the number of centres is growing geometrically. This study examines the challenges of conducting examinations in the centres: the introduction of computer-based testing (CBT) for 100/200-level students, inconsistencies in the results from different centres, systems failures, the astronomical cost of conducting each examination, the loss of some results, and several other challenges. The study then examines the introduction of thin client technology and the innovation that has remarkably changed the conduct of the simultaneous examination of a large population in multiple locations at the National Open University of Nigeria. The thin clients are used as a PC-replacement technology. Cost, security, manageability, scalability, and the ability to produce valid results in all the centres at the same time are benefits that recommend thin clients as a tested technology that helps improve the validity of examination results.
CONFERENCE ATTENDEES

NAME | ORGANIZATION | COUNTRY | EMAIL
Abdallah Khamir | Uganda Allied Health and Examinations Board | Uganda | akhamir2000@yahoo.com
Abdulrashid Garba | National Examinations Council | Nigeria | emenimf@yahoo.com
Abubakar Gana Mohammed | National Examinations Council | Nigeria | emenimf@yahoo.com
Abubakarr S A Rahman | West African Examinations Council | Sierra Leone | rabubakarr@hotmail.com
Adedibu Ojerinde | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Aderemi Adeniyi | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Adesina Abraham Adetona | National Examinations Council | Nigeria | emenimf@yahoo.com
Agnes Wadda | Uganda Nurses and Midwives Examinations Board | Uganda | info@unmeb.go.ug
Aidana Shilibekova | AEO Nazarbayev Intellectual Schools | Kazakhstan | aidanashil@gmail.com
Aigerim Aitbayeva | AEO Nazarbayev Intellectual Schools | Kazakhstan | aitbayeva_a@cpi.nis.edu.kz
Alexander Kamwesiga | Uganda Allied Health and Examinations Board | Uganda | katoakimoga@gmail.com
Anat Ben-Simon | National Institute of Testing and Evaluation | Israel | anat@nite.org.il
Anne Oberholzer | Independent Examinations Board | South Africa | oberholzera@ieb.co.za
Antony Furlong | International Baccalaureate | Netherlands | antony.furlong@ibo.org
Asadullah Almani | Merit Testing Service | Pakistan | aqeelbhai_2007@yahoo.com
Ashirudeen Maliki | West African Examinations Council | Nigeria | sobodaniels@yahoo.com
Ayokunle Akinniran | Fleet Technologies Ltd | Nigeria | tayoko@fleettechltd.com
Babunde Sinatra Aina | National Examinations Council | Nigeria | emenimf@yahoo.com
Badayi Ahmed | National Examinations Council | Nigeria | emenimf@yahoo.com
Badrul Hisham bin Abdullah | Malaysian Examinations Council | Malaysia | badrul@mpm.edu.my
Bert IJsveld | Cito | Netherlands | bert.ijsveld@cito.nl
Bolanle Ojeleye | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Brandon Craig | Polk County School Board | United States | brandon.craig@pol-fl.net
Brian Stecher | RAND Corporation | United States | stecher@rand.org
Cathy Schultz | SACE Board of South Australia | Australia | cathy.schultz@sa.gov.au
Charles Anikweze | Nasarawa State University | Nigeria | anikweze@yahoo.com
Charles Eguridu | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Charles Enock Msonde | National Examinations Council of Tanzania | Tanzania | paes@necta.go.tz
Charles Secolsky | Mississippi Department of Education | United States | csecolsky@gmail.com
Che Yee Lye | University of Adelaide | Australia | cheyee.lye@adelaide.edu.au
Chidimma Anya | Federal University Gusau | Nigeria | chidijudeanya@yahoo.com
Christiana Ugodulunwa | University of Jos | Nigeria | ugochr@unijos.edu.ng
Chukwudi Mensah | Michael Okpara University of Agriculture, Umudike | Nigeria | mensahchukwudi@gmail.com
Claus Jensen | Danish Ministry of Education | Denmark | claus.jensen@stil.dk
Colleen Parks | University of Calgary / Calgary Girls' School | Canada | colleen.parks@calgarygirlsschool.com
Collens Odo | Michael Okpara University of Agriculture, Umudike | Nigeria | collens.ikechukwu@gmail.com
Collins Uduh | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Dale Gbotoe | West African Examinations Council | Liberia | dggbotoe@liberiawaec.org
Damian Betebenner | Center for Assessment | United States | dbetebenner@nciea.org
Danda Crimelda Garcia | Rex Bookstore Inc. | Philippines | danda_garcia@yahoo.com
David Nii Djan Mensah | West African Examinations Council | Ghana | imaniimani@yahoo.com
Deborah Rukop | National Examinations Council | Nigeria | deborahrukop@gmail.com
Dennis Opposs | Office of Qualifications and Examinations Regulation | United Kingdom | dennis.opposs@ofqual.gov.uk
Dinara Ulanova | AEO Nazarbayev Intellectual Schools | Kazakhstan | ulanova_d@cpi.nis.edu.kz
Ebikibina John Ogborodi | National Examinations Council | Nigeria | emenimf@yahoo.com
Edmund Mazibuko | Examinations Council of Swaziland | Swaziland | registrar@examscouncil.org.sz
Eliza Dirnu | RM Education | United Kingdom | edirnu@rm.com
Elizabeth Obade | Kenya National Examinations Council | Kenya | eobade@knec.ac.ke
Ellen Omvlee | Cito | Netherlands | ellen.omvlee@cito.nl
Emeka Ajero | National Examinations Council | Nigeria | emenimf@yahoo.com
Emmanuel Sibanda | Umalusi | South Africa | emmanuel.sibanda@umalusi.org.za
Evangeline Alvarez-Encabo | Rex Bookstore Inc. | Philippines | banggita@gmail.com
Evans Okosodo | Fleet Technologies Ltd | Nigeria | tayoko@fleettechltd.com
Evelyn Mwapasa | Institute of Chartered Accountants in Malawi | Malawi | evelyn.mwapasa@icam.mw
Fiona Brigette Anderson | Namibia Training Authority | Namibia | fanderson@nta.com.na
Francis R. Ojo | Joint Admissions and Matriculation Board | Nigeria | olranti51@gmail.com
Frans Kleintjes | Cito | Netherlands | frans.kleintjes@cito.nl
Guanzhong Luo | Hong Kong Examinations and Assessment | Hong Kong | gluo@hkeaa.edu.hk
Heather Wright | Polk County School Board | United States | heather.wright@polk-fl.net
Helen Mukakarisa Kataratambi | Uganda Nurses and Midwives Examinations Board | Uganda | info@unmeb.go.ug
Helen Sidiropoulos | Independent Examinations Board | South Africa | sidiropoulosh@ieb.co.za
Henk Moelands | Cito | Netherlands | henk.moelands@cito.nl
Ikechukwu Emmanuel Anyanwu | National Examinations Council | Nigeria | emenimf@yahoo.com
Indira Ismailova | AEO Nazarbayev Intellectual Schools | Kazakhstan | ismailova_i@nis.edu.kz
Iris Lark Dizer | Center for Educational Measurement, Inc. | Philippines | baddy.montalbo@gmail.com
Iyi Uwadiae | West African Examinations Council | Nigeria | jpsogie@yahoo.com
Jabbar Ali | Jathol Law Associates | Pakistan | jaalicap@gmail.com
Jacqueline van Hagen | Cito | Netherlands | jacqueline.vanhagen@cito.nl
James Adesoka Ojebode | National Examinations Council | Nigeria | emenimf@yahoo.com
Jamil bin Adimin | Malaysian Examinations Council | Malaysia | ceo@mpm.edu.my
Jan Wiegers | Cito | Netherlands | jan.wiegers@cito.nl
Jenny Flinders | Center for Educational Testing and Evaluation | United States | jflinders@ku.edu
Jerome De Lisle | University of the West Indies | Trinidad & Tobago | jeromedelisle@yahoo.com
Jo-Anne Baird | University of Oxford | United Kingdom | jo-anne.baird@education.ox.ac.uk
John Asuwe | National Examinations Council | Nigeria | emenimf@yahoo.com
John Gayvolor, Sr. | West African Examinations Council | Liberia | johngayvolor2008@yahoo.com
John Okonkwo-Uwandulu | Joint Admissions and Matriculation Board | Nigeria | olranti51@gmail.com
John David Volmink | Umalusi | South Africa | jvolmink@iafrica.com
Jonathan Hale | RM Education | United Kingdom | marcia.evans@rm.com
Joy Iyamu | National Institute for Public Policy and Administration | Nigeria | unitprotocol@gmail.com
Joyce Ebal Awor | Uganda National Examinations Board | Uganda | joyebal@yahoo.co.uk
Joyce Ndalichako | Aga Khan University | Tanzania | joyce.ndalichako@aku.edu
Juliet Christine Kiberu | Uganda National Examinations Board | Uganda | julietkiberu@yahoo.co.uk
Julius Kamwesiga | United Bank of Africa | Uganda | uaheb09@gmail.com
Kehinde Akinsanya | Dolbib Integrated Services (WA) Ltd | Nigeria | kehindeakinsanya@yahoo.com
Kennedy Ondara Okemwa | Kenya National Examinations Council | Kenya | kokemwa@knec.ac.ke
Kingsley Ugochukwu Anyim | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Ko Mei Yee | Police College | China | cip-jpo-eac@police.gov.hk
Kwai Fan Chu | Education Bureau | Hong Kong | eortd7@edb.gov.hk
Kweku Essel-Amoah | West African Examinations Council | Ghana | kesselamoah@yahoo.com
Dan Kyagaba | Uganda National Examinations Board | Uganda | dankya62@gmail.com
Lai Yi Tam | Education Bureau | Hong Kong | louisatam@edb.gov.hk
Laila Nurakayeva | AEO Nazarbayev Intellectual Schools | Kazakhstan | nurakayeva_l@cpi.nis.edu.kz
Lana Elramly | American University in Cairo | Egypt | lelramly@aucegypt.edu
Lay Choo Tan | Singapore Examinations and Assessment Board | Singapore | tan_lay_choo@seab.gov.sg
Lenore Decenteceo | Center for Educational Measurement, Inc. | Philippines | baddy.montalbo@gmail.com
Lorena Garelli | Universidad Anáhuac México Sur | Mexico | lorena.garelli@anahuac.mx
Mabel Agbebaku | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Mafu Solomon Rakometsi | Umalusi | South Africa | mafu.rakometsi@umalusi.org.za
Martha Ada Onjewu | Kaduna Polytechnic | Nigeria | monjewu@yahoo.com
Matheus Hango | Namibia Training Authority | Namibia | mhango@valombola.vtc.org.na
Mei Ling Kang | Singapore Examinations and Assessment Board | Singapore | kang_mei_ling@seab.gov.sg
Michael Aigbe | Fleet Technologies Ltd | Nigeria | tayoko@fleettechltd.com
Michael Olajide | Sidmach Technologies Nigeria Ltd | Nigeria | maolajide@sidmach.com
Modupe Oke | West African Examinations Council | Nigeria | modupeoke2000@yahoo.com
Mohd Fauzi bin Datuk Haji Mohd Kassim | Malaysian Examinations Council | Malaysia | fauzi@mpm.edu.my
Moses Tijare | Namibia Training Authority | Namibia | mtjirare@nta.com.na
Muhammadu Bello Udu | National Examinations Council | Nigeria | emenimf@yahoo.com
Murali Krishna Komomduri | IBS | India | muralikrishna.k1976@gmail.com
Nazgul Tuitina | AEO Nazarbayev Intellectual Schools | Kazakhstan | ismailova_i@nis.edu.kz
Neil Wade | OCR (Cambridge Assessment) | United Kingdom | neil.wade@ocr.org.uk
Ngondi Kamatuka | University of Kansas | United States | kamatuka@ku.edu
Nneka A. Umezulike | Michael Okpara University of Agriculture, Umudike | Nigeria | neksiems@yahoo.com
Odutayo Olufolajimi Odukoya | National Examinations Council | Nigeria | emenimf@yahoo.com
Olatunde Aworanti | National Business and Technical Examinations Board | Nigeria | aworantio@yahoo.com
Olayinka Ajibade | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Olusanya Francis Dacosta | West African Examinations Council | Nigeria | hnowaeclagos@waecnigeria.org
Oluwasegun Abiodun | Dolbib Integrated Services (WA) Ltd | Nigeria | segunabiodun@yahoo.com
Omokunmi Popoola | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Onyemaechi Eke | National Examinations Council | Nigeria | onyemaechieke@gmail.com
Pamela Munro-Smith | Australian Council for Educational Research | Australia | pam.munro-smith@acer.edu.au
Pateh D.M.K. Bah | West African Examinations Council | Gambia | patehb@hotmail.com
Patience Agommuoh | Michael Okpara University of Agriculture, Umudike | Nigeria | agomuohchinyere@yahoo.com
Patrick Hamilton | West African Examinations Council | Sierra Leone | waecfreetown@yahoo.com
Patrick B.M. Ndulu | West African Examinations Council | Sierra Leone | patndulu@yahoo.co.uk
Patrick Onyeneho | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Paul Newton | Office of Qualifications and Examinations Regulation | United Kingdom | paul.newton@ofqual.gov.uk
Peter Arogundade | Sidmach Technologies Nigeria Ltd | Nigeria | apsarogundade@sidmach.com
Peter Hermans | Cito | Netherlands | peter.hermans@cito.nl
Poh Guan Toh | Singapore Examinations and Assessment Board | Singapore | toh_poh_guan@seab.gov.sg
Primbetova Gulzhan | National Testing Center of the Ministry of Education and Science | Kazakhstan | fire-guljan@mail.ru
Promise Nwachukwu Okpala | University of Ibadan | Nigeria | emenimf@yahoo.com
Randy Bennett | Educational Testing Service | United States | rbennett@ets.org
Ray Philpot | Australian Council for Educational Research | Australia | ray.philpot@acer.edu.au
Roderic Gillespie | Cambridge International Examinations | United Kingdom | gillespie.r@cie.org.uk
Saltanat Abdildina | National Testing Center of the Ministry of Education and Science | Kazakhstan | saltanat_abdildina@mail.ru
Sam Nii Nmai Ollennu | West African Examinations Council | Ghana | snnollennu@waecgh.org
Samuel David | West African Examinations Council | Liberia | swdavid2002@yahoo.com
Sarah Maughan | AlphaPlus Consultancy | United Kingdom | sarah.maughan@alphaplusconsultancy.co.uk
Sayita Wakjissa | University of Jos | Nigeria | wsayita@yahoo.com
Serafina Pastore | University of Bari Aldo Moro | Italy | serafina.pastore@uniba.it
Shalewa Adetimehin | Joint Admissions and Matriculation Board | Nigeria | oluranti51@gmail.com
Shoadi Ezekiel Ditaunyane | Umalusi | South Africa | lucky.ditaunyane@umalusi.org.za
Simon Reusch | Danish Ministry of Education | Denmark | simon.reusch@stil.dk
Sosseh Jagne-Gillen | West African Examinations Council | Gambia | sosabou@hotmail.com
Steve Harrington | RM Education | United Kingdom | marcia.evans@rm.com
Stuart Atkins | Department for Education | United Kingdom | stuart.atkins@education.gsi.gov.uk
Stuart Shaw | Cambridge Assessment | England | shaw.s@cie.org.uk
Subindra Jwarchan | Eden Educational Consultancy | Nepal | info.eeducon@gmail.com
Tifapi Jere | Institute of Chartered Accountants in Malawi | Malawi | tifapi_jere@yahoo.com
Turakty Intymakov | National Testing Center of the Ministry of Education and Science | Kazakhstan | t.intymakov@ncgsot.kz
Tzur Karelitz | National Institute for Testing and Evaluation | Israel | tzur@nite.org.il
Uchenna Ofong | National Examinations Council, Minna | Nigeria | ofongu@yahoo.com
Ursula Akanwa | Michael Okpara University of Agriculture, Umudike | Nigeria | sisngoakanwa@yahoo.com
Uvanguapi Kamberipa | Namibia Training Authority | Namibia | ukamberipa@nta.com.na
Veronica Asante | West African Examinations Council | Ghana | vasante17@yahoo.com
Vincent Ligt | Cito | Netherlands | vincent.ligt@cito.nl
Vincent Ado Tenebe | National Open University of Nigeria | Nigeria | miyere-ebojele@noun.edu.ng
Wilson Rwandembo | Uganda Allied Health and Examinations Board | Uganda | rwandembo@yahoo.co.uk
Wilton Kezala | Uganda Nurses and Midwives Examinations Board | Uganda | info@unmeb.go.ug
Zamira Rakhymbayeva | AEO Nazarbayev Intellectual Schools | Kazakhstan | rakhymbayeva_z@nis.edu.kz
THANK YOU TO OUR SPONSORS
Our sponsors have helped make this event possible. If you haven't had the chance yet, please stop by the exhibitor booths in the Big 12/Jayhawk room.
ChangePond - Gold
Cito - Silver
Cambridge Assessment - Silver
Educational Testing Service (ETS) - Silver
edCount - Gold
CONFERENCE ORGANIZER:
www.cete.ku.edu