ETHICAL TEST PREPARATION IN THE CLASSROOM
Our Study

We began our study by identifying a representative group of 8,804 items from state, national, and international large-scale assessments. These included the ACT (American College Testing), the SAT (Scholastic Aptitude Test), the National Assessment of Educational Progress (NAEP), state and international science tests, and assessments from the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC), the two state-led consortia that developed assessments aligned to the Common Core State Standards (CCSS). We selected assessments that not only meet high standards of validity and reliability but also reflect what most students in the United States will encounter over the course of their K–12 education. The tests we studied serve a variety of purposes and occur at different times and frequencies throughout a student's education, but collectively they cover grades 3–12, the typical span during which students take large-scale standardized assessments. Within this sample, we analyzed the characteristics of each item: not just what content it required students to know, but how it asked them to demonstrate that knowledge. The analysis was somewhat different for each of the three content areas (ELA, mathematics, and science), as the assessments approached each area in different ways (see Table I.1).

Table I.1: Assessment Items Analyzed
Subject Area   Number of Items   Assessments Analyzed
ELA            1,684             PARCC, SBAC, NAEP, ACT, SAT
Mathematics    2,629             PARCC, SBAC, NAEP, ACT, SAT
Science        4,491             State science tests, NAEP, ACT, Trends in International Mathematics and Science Study (TIMSS), Program for International Student Assessment (PISA)
Total          8,804
The primary limitations of our study are related to an uneven sampling of items among assessments and the diversity of the assessments themselves. While released items from PARCC and SBAC were abundantly available, the SAT and ACT each make only one practice test available at any one time, so the study included very few items from those assessments. For the science component of the study, sampling from every state with publicly available items created a large and diverse sample, but not all states provided items, and the number of items available varied widely from state to state. The international assessments used in the study also provided comparatively few items. Additionally, the purposes and forms of the assessments we analyzed vary. For example, the SAT and ACT are designed to measure individual students' preparedness for college, while NAEP ("The Nation's Report Card") is meant to assess the overall growth and condition of U.S. students as a group. While the various assessments share some characteristics, such as a high frequency of selected-response