
Problems in Implementing Value-Added Models: An Analysis of Current K-12 Value-Added Systems in the U.S.

Chad Finlay
Sherol Manavi

University of California, Los Angeles School of Public Affairs, Department of Public Policy

May 1, 2008


Table of Contents

Executive Summary
I. Introduction
   A. Definition, Purpose, and Importance of Value-Added Models
   B. Task of the APP
II. Analytical Approach
   A. Limitations
III. Findings and Analysis
   A. Background and Purpose of Value-Added Models
   B. Problems Due to the Complexity of Value-Added Models
   C. Problems Due to Power Struggles Over Value-Added Models
IV. Summary of Problems and Recommendations
V. Conclusion
VI. Appendix
   A. Original Evaluation Matrix
   B. Revised Evaluation Matrix
VII. References



Executive Summary

As school districts and states increasingly focus on developing ways to hold administrators and teachers accountable for student performance, measurements based on value-added models (VAM) have recently received much scrutiny. Because value-added models claim to measure the impact schools and teachers have upon their students, independent of outside factors such as socioeconomic status (SES) and past performance that are beyond their control, VAMs are a promising option for policymakers hoping to establish an effective accountability system that will improve student achievement. But due to the logistical, statistical, and political complexity of value-added models, they have been met with significant criticism that has slowed or severely limited their successful implementation.

Our client, Glenn Daley, Director of Los Angeles Unified School District's Office of Evaluation and Research, has been tasked with helping the District develop metrics to be included in its new "Performance Measurement and Accountability System." Mindful of the potential of value-added models to assist the nation's second largest school district in better measuring the impact of its schools and teachers upon their students, the Client would like to understand the problems states and districts have encountered in their value-added model efforts. Based upon a review of various states and districts actively using value-added models, this report describes problems that threaten the long-term use of VAM and offers the Client guidance on how to avoid them.

Findings

An important general theme that emerged from our research is that problems threatening VAM could occur at any time: mature value-added model efforts were as much at risk as relatively newer systems. Tennessee and Dallas, the first adopters of value-added models, experienced the most recent and severe problems, suggesting that successful, long-term use of VAM requires continual vigilance.

Before detailing the problems encountered in value-added models, we summarize some basic characteristics of VAM. Key findings include:

• The more established VAM participants tend to be districts, while recent participants tend to be states.
• VAM participants adopted value-added models because they felt current accountability systems were unfair to low-performing schools or did not adequately measure school or teacher effectiveness.
• Very few VAM participants use value-added models for performance pay. About half of the participants release their results publicly, the other half keep them private, and no participant publicly releases teacher level data.



Our findings are grouped into two types of problems: 1) problems caused by the complexity of value-added models and 2) problems caused by the power struggle between states/districts and their teachers' unions.

Problems due to the value-added models' complexity generally were caused by the failure to gain the trust of stakeholders. Opponents of value-added models often complained about the lack of transparency in the accuracy of data, the statistical model, and how value-added models were used to evaluate schools and teachers. Distrust was further exacerbated when responses to complaints about VAM were too technical in nature, instead of addressing concerns in ways more accessible and meaningful to a layperson. If these core areas of distrust were not addressed, then even mature value-added modeling efforts were at risk. In addition, since gaining the trust of stakeholders was so crucial, efforts to later attach high stakes to VAM were met with significant resistance when states and districts did not adequately use the interim time to garner the confidence of stakeholders.

Since all teachers' unions of the VAM participants opposed using teacher level value-added models for high stakes, the participants were forced to shift to school level or grade level value-added models. This potentially reduced the effectiveness of the value-added models as a tool for accountability, since teachers are a key component of accountability systems.

Recommendations

From these findings, the report offers recommendations for addressing the problems due to the complexity of value-added models and the resistance of teachers' unions to using VAM at the teacher level.

To address the problems of trust due to the complexity of VAM, we recommend implementing continual, independent oversight from external parties trusted by those being evaluated. This would include auditing of the data to ensure accuracy, ongoing assessment of the statistical models in use, and verification that the results are both correct and correctly applied. In addition, schools and teachers should be provided with on-site experts on value-added models who would provide training and mentoring in understanding VAM results and how to effectively utilize them to improve student performance. Ideally, this expertise and mentoring would include peers to increase the level of trust and credibility of the professional development.

The constraints imposed by teacher union resistance to high-stakes value-added models at the teacher level are formidable. Even if problems regarding the complexity of VAM are overcome, there are still significant concerns that value-added models are not capable of adequately measuring teacher effects. It is likely that teacher level VAM with high stakes remains out of reach for the foreseeable future.

However, we do not recommend attempting to use no/low stakes VAM at the teacher level as a "Trojan Horse" to later switch to a high stakes system. Without high stakes attached to teacher level value-added models, we found that there was not a sufficient incentive for teachers and administrators to take it seriously, and thus to develop the communication and processes crucial for long-term implementation. Instead, we recommend a different type of "beachhead," one that centers around pairing teacher level VAM with bonus pay for teachers at underserved schools. Since this type of bonus pay has been generally accepted by teachers' unions, it could demonstrate and showcase teacher level value-added models. In addition, it gives proponents an economic argument that the funds used for such efforts are being efficiently allocated to the most effective teachers.



I. Introduction

Faced with increased pressure from Mayor Antonio Villaraigosa and its Board of Education to improve the quality of education for its students, the Los Angeles Unified School District (LAUSD), the second largest district in the nation, is in the midst of developing a sweeping and ambitious District accountability plan, currently named the "Performance Measurement and Accountability System." The goal of the accountability plan is to "foster the use of qualitative and quantitative data to drive a continual improvement cycle aligned to the Vision, Mission, and Guiding Principles adopted by the Board and Superintendent and that form the foundation of the Superintendent's Strategic Plan" (Slayton 2007, 2).

One of the key proposed components of the Performance Measurement and Accountability System is the development of "transformation metrics," which are "accurate and meaningful data related to student learning, student achievement, student retention, student safety, teacher and administrator performance, employee satisfaction, cost effectiveness and other areas in relation to overall District performance" (Slayton 2007, 2). These data, both qualitative and quantitative, are tentatively planned to be collected at various levels: District, local district, school, and classroom. Some of these transformation metrics already exist, while others require new instruments or statistical models.

One possible new transformation metric focusing on student performance for LAUSD involves value-added models (VAMs), a relatively recent and somewhat controversial measurement. Over the last few decades, several states and districts have used VAM, beginning with Dallas and Tennessee in the 1980s and early 1990s, with varying successes and difficulties. As the District moves forward in developing the Performance Measurement and Accountability System, and considers including VAM measurements, the Client, Glenn Daley, the Director of LAUSD's Office of Evaluation and Research, is interested in the specific problems other districts and states have encountered with VAM and how they have responded to those difficulties. Because Daley's department has a key role in developing and implementing components of the Performance Measurement and Accountability System, the findings can be used to help LAUSD avoid similar difficulties and implement a value-added model that is effective over the long term.

A. Definition, Purpose, and Importance of Value-Added Models

In an attempt to measure the performance of a state, district, or school, accountability systems often include metrics based upon status models and/or growth models. Since value-added models are a type of growth model and are distinct from status models, it is important to understand the advantages VAM has over these alternatives. While it is beyond the scope and purpose of this report to justify the possible inclusion of value-added modeling among LAUSD's transformation metrics, the report can provide insight into why VAM is being considered.

VAM is designed to show how much a district, school, or classroom/teacher is responsible for its students' growth,[1] a useful tool for policymakers seeking to identify which parties actually improve or worsen student achievement, whatever their performance level (i.e., status result) may be. With VAM, districts, schools, and teachers can be more fairly held accountable or recognized for the value they add to their students without being penalized for factors outside their control, which neither status models nor basic growth models can do (Drury 2003, 1). While value-added models are not designed to explain why a district, school, or classroom/teacher may have a positive or negative effect upon students' growth, they can provide important clues about relative effectiveness that other metrics can help illuminate further. Used in conjunction with status models, a value-added type of growth model can be a key indicator of which parties are effective in educating their students and moving at the needed pace to reach the students' achievement goals.

[1] Due to the statistical constraints of VAM, individual students do not have a VAM result (though their individual growth can be measured). Instead, a classroom/teacher, school, or district can be given a VAM result for its students, unless there are too few students to analyze.

There are five other ways to view the performance of a district, school, or teacher, each with its strengths, yet with weaknesses that VAM better addresses (see Table 1 for a summary of the various models; a short sketch contrasting improvement and growth models follows the table):

1) Basic Status: the status, or snapshot, of an indicator or indicators
2) Conditional Basic Status: the status of an indicator or indicators, conditional on specific variables
3) Basic Improvement: the degree of improvement over a previous status
4) Conditional Improvement: the degree of improvement over a previous status, conditional on specific variables
5) Basic Growth: a longitudinal measure of growth

Table 1: Summary of Status and Growth Models

| Type of Model | Type of Performance Measurement | Purpose | Weakness | Example |
|---|---|---|---|---|
| Basic Status | Status | Show how many students meet goal | Does not show performance of students with particular characteristics | CST, Base API, Base AYP |
| Conditional Basic Status | Conditional status | Show how many students with particular characteristics meet goal | Does not show progress | Base API and Base AYP with subgroups and similar schools ranking |
| Basic Improvement | Change in status | Show progress of students towards goal | Does not show performance of students with particular characteristics | Growth API, Growth AYP |
| Conditional Improvement | Conditional change in status | Show progress of students with particular characteristics towards goal | Does not show progress for individual students, only for the same level over time | Growth API and Growth AYP with subgroups and similar schools ranking |
| Basic Growth | Longitudinal growth | Show change of same students over time | Does not show performance of students with particular characteristics | Gain scores |
| Value-added | Conditional longitudinal growth | Show how much student growth is due to a school or teacher | | TVAAS |
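To make the distinction between the models in Table 1 concrete, here is a minimal sketch contrasting an improvement model with a basic growth model. The scores, student IDs, and function names are hypothetical, invented for illustration; real systems operate on large longitudinal databases rather than in-memory dictionaries.

```python
# Illustrative sketch: improvement model vs. basic growth model.
# All scores and student IDs below are hypothetical.

# scores[year][grade] -> {student_id: scale score}
scores = {
    2006: {4: {"s1": 310, "s2": 325, "s3": 298}},
    2007: {4: {"s4": 330, "s5": 340, "s6": 315},   # a *different* grade-4 cohort
           5: {"s1": 335, "s2": 350, "s3": 320}},  # last year's students, one grade up
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def improvement(grade, year):
    """Improvement model: same grade, different students, two snapshots."""
    return (mean(scores[year][grade].values())
            - mean(scores[year - 1][grade].values()))

def growth(grade, year):
    """Basic growth model: follow the same students from the prior grade."""
    gains = []
    for sid, score in scores[year][grade].items():
        prior = scores[year - 1].get(grade - 1, {}).get(sid)
        if prior is not None:            # students without a prior score drop out
            gains.append(score - prior)  # the student's gain score
    return mean(gains)

print(improvement(4, 2007))  # grade 4 in 2007 vs. grade 4 in 2006: different children
print(growth(5, 2007))       # mean gain score for the students now in grade 5
```

Note how the improvement figure says nothing about any individual student's progress, while the growth figure follows the same children, which is exactly the distinction the Basic Growth Model discussion below draws.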

The first two alternatives (basic status and conditional basic status) have been in use the longest, and are arguably the most common and accessible ways to measure student performance. They measure the state, or status, of a performance indicator such as test scores.



Basic Status Model

A basic status model typically shows how many students have reached a certain proficiency level on a particular test or other metric. It answers the question, "On average, how are students performing this year?" (The Council of Chief State School Officers 2005, 3). A status model approach is fairly intuitive to understand and to interpret, as it is simply a "snapshot" of how many students are meeting a target for that year. For example, the California Standards Test (CST) measures the percentage of students at various proficiency levels each year. Another standard status model used in California is the Academic Performance Index (API),[2] which uses the CST, the state high school exit exam (CAHSEE), the California Achievement Test (CAT/6), and an alternative test to the CST/CAT for students with disabilities (CAPA).[3] The API is a composite of all these components, reported on a scale of 200-1000, with 800 being the state target for all districts and schools (California Department of Education, Testing and Accountability, "API Description"). The federal accountability system, No Child Left Behind (NCLB), utilizes Adequate Yearly Progress (AYP), which can be a basic status model, and for California, includes the API.[4]

[2] There are two types of results reported for the API: Base API and Growth API. As explained later, Base API is considered to be a basic status model, while Growth API is an improvement model type of a status model. In addition, the API is also reported by similar schools, which is described later as well.

[3] Not all of these test components are used for every grade level. CSTs for English-language arts and mathematics are used in grades 2-11, CSTs for history and social science are used for grades 8, 10, and 11, and CSTs for science are used in grades 5, 8, 10, and 11. CAPA is part of the API for grades 2-11. The CAT/6 tests are used in the API for grades 3-7 in reading, language, spelling, and mathematics. CAHSEE is only included in grade 10 (and in grades 11 and 12 if not passed earlier). In addition, the tests are weighted when calculating the API. In grades 2-8, the English-language arts CST is weighted .48, CST mathematics .32, CST science .20, CAT/6 reading .06, CAT/6 language and spelling .03, and CAT/6 mathematics .08. In grades 9-11, the English-language arts CST is weighted .30, CST mathematics .32, CST science .22, CST life science .10, CST history-social science .23, each CAHSEE subject (English-language arts and mathematics) .30, CAT/6 reading .06, CAT/6 language and spelling .03, and CAT/6 mathematics .08 (California Department of Education, Testing and Accountability, "API Description").

[4] Like the API, the AYP has both base and growth measurements. The federal Department of Education is allowing some states with inadequate AYP results to use growth models as part of their AYP and is studying allowing all states to do so.
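As a rough illustration of how a composite index like the API combines weighted test components, the sketch below applies the grade 2-8 weights quoted in footnote [3]. The normalization and the rescaling to the 200-1000 range are simplifications of our own for illustration; the official API calculation converts performance levels through a more involved procedure.

```python
# Simplified sketch of a weighted composite index in the spirit of the API.
# Weights are those quoted in footnote [3] for grades 2-8; the normalization
# and 200-1000 rescaling here are illustrative, NOT the official API formula.

GRADE_2_8_WEIGHTS = {
    "cst_ela": 0.48,
    "cst_math": 0.32,
    "cst_science": 0.20,
    "cat6_reading": 0.06,
    "cat6_lang_spelling": 0.03,
    "cat6_math": 0.08,
}

def composite_index(component_scores, weights=GRADE_2_8_WEIGHTS):
    """Weighted average of whichever components are present (assumed to be
    on a 0-1 proficiency scale), mapped onto a 200-1000 reporting range."""
    used = {k: w for k, w in weights.items() if k in component_scores}
    total_weight = sum(used.values())
    avg = sum(component_scores[k] * w for k, w in used.items()) / total_weight
    return round(200 + 800 * avg)

# A hypothetical school with results for only the three CST components:
print(composite_index({"cst_ela": 0.62, "cst_math": 0.55, "cst_science": 0.50}))
```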


Conditional Status Models

Policymakers in education are often interested in more than a status result for a district or school. They want to know the district's or school's performance with various student populations, such as low-performing groups – information that a basic status model cannot provide. In order to address these shortcomings, many status models such as the API also include results based on variables that describe student and school characteristics; these are called conditional status models. The API accomplishes this with results for districts and schools based on student subgroups[5] and a similar schools ranking that "compares a school to 100 other schools of the same type and similar demographic characteristics" using the School Characteristics Index (SCI) (California Department of Education, Testing and Accountability, "API Description," 38).[6]

[5] The subgroups include categories based on students with disabilities, student economic status, level of student English language fluency, student ethnicity, student gender, and education level of students' parents.

[6] In addition to student demographics, the SCI also includes teacher credentials, average class size, pupil mobility, and multi-track scheduling.

Improvement Status Model

While both the basic status model and conditional basic status model show how many students meet a target goal, it is also helpful for a district or school to measure its progress towards a particular target. This type of status model is called an improvement model. Both the Base API and Base AYP, which are basic status models, have official accompanying improvement models: Growth API and Growth AYP. The essential question an improvement model asks is, "On average, are students doing better this year as compared to students in the same level [district, school, grade] last year?" (The Council of Chief State School Officers 2005, 4). This can be accomplished by comparing results from one year to another.

Conditional Improvement Model

Like the basic status model, a basic improvement model can have a conditional version that reports results by more detailed criteria. For instance, Growth API, like Base API, has results for subgroups and similar schools rankings, making it possible for LAUSD to use an improvement model to compare more alike schools to each other. Thus, it provides a sense of whether schools are moving particular groups of students in the right direction.

However, even a conditional improvement model does not adequately measure all types of performance. The results may be skewed by a cohort effect, in which successive classes of students vary too much from each other to allow comparisons to be made. This tends to be more of a threat for schools with high student and teacher mobility. Additionally, because conditional improvement models are based on comparing two snapshots to each other, rather than longitudinal data that follow the same students or cohorts, improvement models may not accurately measure student growth. Conditional improvement models assume students with the same characteristics will also have the same previous achievement, and this may not be true (Kane and Staiger 2002).

Basic Growth Model

In order to address the shortcomings of conditional improvement models, policymakers have turned to basic growth models. An improvement type of status model and a growth model can be easily confused with each other, as both types measure change. Essentially, an improvement model focuses on the same level (i.e., grade or school), comparing different students over time, while growth models are longitudinal, staying with the same group of students and measuring their changes as they progress through the grades. The improvement model measures the changes for a district, school, or grade, while the growth model measures the changes for the same students. For policymakers interested in helping schools with low overall proficiency (low status) reach their target scores or goals, understanding growth rates can be an important tool in developing policies that lead to better student growth, and thus improved CST, API, AYP, or other results.[7]

[7] Of course, it is also important for high status schools to maintain or improve the growth of their students, but this is a less pressing priority, especially for those interested in narrowing the achievement gap.



For instance, if students at a school with an overall low status (e.g., below a "proficient" ranking on the API) also have a low overall growth rate (Table 3, Group 3), then that school is in a different situation from a school with the same low status yet with a high overall growth rate (Table 3, Group 1). Policymakers would want to make sure the school in Group 1 is allowed to continue whatever causes its growth and perhaps try to replicate it in other schools. Conversely, the school in Group 3 might need outside help in order to improve its students' performance. So while both schools need to improve their status ranking, they are moving in different directions, requiring distinct approaches to achieve improvement.

Table 3: Comparison of Status Models to Growth Models

| | Low Status | High Status |
|---|---|---|
| High Growth | Group 1 | Group 2 |
| Low Growth | Group 3 | Group 4 |

Adapted from The Council of Chief State School Officers 2005, 7.

A basic growth model that simply measures the difference between test scores over time for a student (a gain score) can be used to identify students that are performing below a typical growth average (usually a district or school average). Many states, such as Arizona, New Hampshire, and Utah, provide teachers with basic growth model results for the development of individualized learning plans for low-performing students (Yeagley 2007).

Value-Added Models

Because basic growth models lack a conditional component that controls for various variables, they cannot answer the question of how much a district, school, or teacher impacts its students' growth. In order to better address these shortcomings, a particular type of growth model, the value-added model, has received increased scrutiny. The goal of a value-added model is to measure the contribution of a district, school, or teacher to its students' growth, based only upon factors under its control. In order to do this, a VAM predicts students' expected growth and compares it to their actual growth. This expected growth prediction separates out factors not under the control of a district/school/teacher, thus establishing a growth target that is theoretically fair to the level evaluated.

One of the assumptions of VAM is that student achievement is influenced by a multitude of factors, some under the control of a district, school, or teacher, and some that are not. For example, schools/classrooms do not have any control over their students' ability level, but may have control over pedagogy or curriculum. Thus, a value-added model attempts to control for various factors related to student learning that are outside the control of educators. A teacher level VAM, for example, tries to account for the influences of students' home life and upbringing on their learning by controlling for such factors as students' previous test scores, their age, gender, whether they have been retained, their free/reduced lunch status, and their parents' education. The model also attempts to account for factors outside the classroom/teacher's control by including variables that tend to be related to student achievement such as class size, percentage of free/reduced lunch students in the class, percentage of disabled students in the class, percentage of students in the class with parents who have high school diplomas, and the mean test score for the classroom.[8] Finally, the model tries to account for factors outside the control of the school by controlling for variables such as the percentage of free/reduced lunch students in the school, the percentage of disabled students in the school, the percentage of students in the school with parents who have high school diplomas, and the percentage of teachers with credentials in the school. All of these variables are then used in an equation to predict students' expected growth, which is then compared to their actual growth.[9]

[8] One of the larger debates in value-added modeling is over whether certain student demographic characteristics, especially ethnicity, should be controlled for in the model. Proponents, most notably the Dallas Independent School District, argue their inclusion is necessary because ethnicity is highly correlated with student achievement and thus needs to be controlled for the sake of "fairness" (Webster 1996). Opponents argue that their inclusion creates a lower standard for certain students and makes it statistically impossible to assess achievement gap issues. Instead, they argue the results should be reported separately for those subgroups, but not included in the value-added model.

[9] A common example is the Tennessee Value-Added Assessment System (TVAAS), which uses a model of the form $y_{it} = m_i + b_i + S_t + e_{it}$ at the student level for $t = 1, 2, \ldots, p$, where $y_{it}$ is the student's expected score at time $t$; $m_i$ is a student mean that may depend on student characteristics; $S_t$ are the school, grade, or teacher effects; $b_i$ is assumed to be a normally distributed random variable; and $e_{it}$ is the residual error.

For instance, if the students in School A in LAUSD are expected to improve their CST mathematics score by 5 percentage points (given their previous math CST scores and the characteristics of the students, school, and classroom that are related to student achievement but are outside the school's control), yet their scores actually increase by 10 percentage points, then the school's value-added result is +5 percentage points. That means the school has added positive growth to those students, beyond what is expected of them and regardless of the students' previous achievement level.
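The expected-versus-actual comparison in the School A example can be sketched as a regression problem. The sketch below is a deliberately minimal illustration with synthetic data and hypothetical variable names; production systems such as EVAAS use far more elaborate mixed-effects models, multiple years of scores, and shrinkage of noisy estimates.

```python
# Minimal value-added sketch using ordinary least squares on synthetic data.
# Real VAMs use mixed-effects models and many more covariates; this only
# illustrates the "expected vs. actual growth" logic described above.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
prior_score = rng.normal(50, 10, n)      # last year's test score
frl = rng.integers(0, 2, n)              # free/reduced lunch indicator
school = rng.integers(0, 20, n)          # which of 20 schools each student attends
true_effect = rng.normal(0, 2, 20)       # "true" school value added (unknown in practice)

# Synthetic outcome: prior achievement and poverty matter, plus the school effect.
score = 5 + 0.9 * prior_score - 2 * frl + true_effect[school] + rng.normal(0, 5, n)

# Step 1: predict each student's expected score from factors outside
# the school's control (prior score, free/reduced lunch status).
X = np.column_stack([np.ones(n), prior_score, frl])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
expected = X @ beta

# Step 2: a school's value-added estimate is the mean of (actual - expected)
# over its own students.
residual = score - expected
value_added = {s: residual[school == s].mean() for s in range(20)}
print(sorted(value_added.items(), key=lambda kv: kv[1], reverse=True)[:3])
```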

B. Task of the APP

Given the statistical complexity of value-added models compared to the more commonly used status models and basic growth models, VAM has been met with controversy and resistance from the outset. Many of the concerns about value-added models revolve around their statistical modeling, such as questions about whether it is really possible to separate out factors not controlled by schools and classrooms/teachers, how transparent the results can be made, whether VAM can adequately measure groupings of low, average, and high achieving students (since these groups logically might have differing potential for growth), or whether it can be fairly applied on the teacher level.[10] Other issues are more resource oriented, as value-added models require tracking each student with an ID, enough test score data to control for past performance, and substantial data collection and analysis capabilities[11] (a sketch of this longitudinal linking requirement follows this section's footnotes). In addition, value-added models have been proposed or used for varying applications (rankings, performance pay) and on different levels (district, school, classroom/teacher), often leading to heated resistance from involved parties.

While these general kinds of problems in implementing value-added models are well known to the Client, this report details the specific reasons why problems have emerged and the success of the responses to those problems, paying particular attention to the timing and actors involved. Ideally, the Client can use the findings and recommendations to help properly tailor VAM within LAUSD's Performance Measurement and Accountability System and increase its chances for long-term success.

[10] The RAND study (McCaffrey et al. 2003) argues that VAM on the teacher level is not recommended for the following reasons: 1) it is difficult to identify a suitable analogue to compare a specific teacher against; 2) a teacher's effect may not be the same for all students, causing difficulty in making inferences about a teacher's impact; 3) without random assignment of teachers and students, verifying teacher effects is very difficult, leading to estimations that may not be precise enough for policymakers; 4) measurement error can overwhelm small effect sizes, which is a larger problem on the teacher level than the school or district levels due to the smaller sample size; and 5) teacher rankings based on VAM are highly unstable from year to year, possibly confounding inferences about teacher effectiveness.

[11] Value-added models can be used with both norm-referenced and criterion-based exams, though criterion-based exams are preferred. Criterion-based exams establish a better link between what a school teaches students and the degree to which students understand and can apply what they learn. The one difficulty with using criterion-based exams, however, is that they need to be vertically scaled so that a student's score from one grade can be compared to his/her score in the next grade. While value-added does not require schools to administer exams to students in consecutive years, this has nevertheless been the typical practice. Almost all the case studies we encountered administer exams to students every year from grades 2 to 8. While part of this is due to NCLB requirements, it is also because more accurate value-added measures can be attained with more test data.
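The data requirement described above, tracking each student with a stable ID across years, is essentially a longitudinal join. The sketch below shows that linking step under assumed names (the files cst_2006.csv and cst_2007.csv and the fields student_id and scale_score are all hypothetical):

```python
# Sketch of the longitudinal linking a VAM requires: yearly score files
# joined on a stable student ID. File and field names are hypothetical.
import csv
from collections import defaultdict

def load_year(path):
    """Read one year's file into {student_id: scale score}."""
    with open(path, newline="") as f:
        return {row["student_id"]: float(row["scale_score"])
                for row in csv.DictReader(f)}

histories = defaultdict(dict)   # student_id -> {year: score}
for year, path in [(2006, "cst_2006.csv"), (2007, "cst_2007.csv")]:
    for sid, score in load_year(path).items():
        histories[sid][year] = score

# Only students observed in consecutive years can contribute to the VAM;
# high student mobility shrinks this linked set, as Dallas's experience
# with the CEI (described later) illustrates.
linked = {sid: h for sid, h in histories.items() if {2006, 2007} <= h.keys()}
print(f"{len(linked)} of {len(histories)} students have linked scores")
```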



II. Analytical Approach

Because the Client has the resources to understand factors within LAUSD relevant to implementing value-added models, this report employs a case study approach to analyze the value-added efforts of other states, districts, and schools. We arrived at our analytic sample of eleven value-added participants (five states and six districts)[12] by using the following process:

1) First, we identified all states and districts that proposed using VAM, were in a pilot stage, had recently begun VAM, or had already been using VAM (N=25).[13]

2) We then dropped eight states and districts that proposed using VAM but had not yet entered a pilot stage or begun implementation, because we found too little information for analysis.[14]

3) We then dropped six states and districts that claimed to be using a value-added model but whose models did not conform to a relatively loose definition of value-added.[15]

Our minimum requirement for examining a case was that it used a model with longitudinal student level data that controlled for previous test scores and at least one characteristic at any of the school, classroom, or student levels. This decision was informed by our understanding that our Client would likely use a VAM with those characteristics, providing a more appropriate analogue to LAUSD.

We further organized the analytic sample into two groups: participants who were in a pilot stage or had begun using VAM but did not yet have results, and participants who had been using VAM and had results (see Table 4). We classified cases this way because we hypothesized that VAM efforts would undergo changes over time and that the problems experienced by states or districts might differ at various stages of implementation. In addition, the Client was particularly interested in the adjustments that long-term participants had undergone.

[12] The participants were categorized as either a state or a district in order to determine whether to analyze districts within a state separately or as one unit. The criterion for a state designation was whether VAM was mandated or supervised by the state (Iowa allowed districts to opt in, but because its program was overseen by the state, it was given a state label). A district designation was given if the state did not supervise VAM, even if several districts in the same state employed VAM. The reasoning was that districts under a state-run VAM system would have similar characteristics while those without would differ. However, if districts within the same state proved to lack significant differences, then they were described as a unit.

[13] Wyoming, West Virginia, Washington state, Michigan, North Dakota, South Dakota, Pennsylvania, Ohio, Washington D.C., New York City, Arkansas, North Carolina, Tennessee, Arizona, Milwaukee, Chicago, Utah, Rhode Island, Iowa, Denver, Pueblo, CO, Aurora, CO, Florida, Seattle, Dallas, and Texas.

[14] Wyoming, West Virginia, Washington state, Michigan, North Dakota, South Dakota, Iowa, and Washington D.C.

[15] Arkansas, Arizona, Utah, Aurora, CO, Denver, CO, and Florida.



Table 4: List of Cases Included in Analytic Sample

| States/Districts that Have Just Begun to Use VAM and Do Not Yet Have Results | States/Districts that Have Been Using VAM and Have Results |
|---|---|
| Pennsylvania | Tennessee |
| Ohio | Dallas, TX |
| New York City, NY | Seattle, WA |
| North Carolina | Milwaukee, WI |
| Texas | Chicago, IL |
| | Pueblo, CO |

Before beginning our research, we developed an evaluation matrix of VAM implementation problems that we hypothesized had taken place in at least one or more of the case studies (see Appendix A). Since the general types of potential problems were readily available from various assessments of value-added models, without studying specific participants, we used these problems as the framework for the evaluation matrix. During the research, if we discovered other relevant problems, they were added to the matrix. The final evaluation matrix (see Appendix B) contained three sections, each with specific elements to investigate:

1) "Background and Purpose" focused on the reasons why value-added models were implemented and some basic characteristics of their role within the accountability system;

2) "Problems Due to the Complexity of Value-Added Models" examined all the findings that emerged from the complex nature of VAM;

3) "Problems Due to Power Struggles Over Value-Added Models" involved understanding whether there were legal/authority issues that prevented effective implementation, such as state laws/education codes, union contracts, and jurisdictional issues.

After collecting the qualitative data[16] for the two groups (pilot/new and currently in use), we analyzed the findings of all the participants to determine overall trends within each of the three evaluation matrix categories. Based on this analysis, we identified two critical problems that could threaten the long-term use of VAM if not addressed, along with options based upon the findings. We then provided recommendations on how to address these potential problems.

[16] Our research included extensive consulting of internet-based sources. To obtain general information about a value-added program in a state or district, we researched that state's/district's department of education website. There, we examined information about the value-added model being used, whether VAM is part of a larger accountability system, and how/whether VAM was being used with other assessments. We also utilized these websites (especially those that provided access to school board meeting minutes and agendas) to attain a general history of VAM implementation and an understanding of the basic governance structure and decision-making process. To obtain more specific information about the conflicts and problems that arose in VAM implementation, we researched newspaper articles, press releases, and organizational publications.



A. Limitations

Ideally, the analytic methodology outlined in this report would be only the first stage. We had hoped to also collect data on LAUSD through District personnel interviews in order to understand how to tailor the recommendations specifically to LAUSD. But with principals' time severely limited by current LAUSD needs and projects, we were unable to obtain timely approval from LAUSD's Committee for External Research Review.

Additionally, while an independent assessment of LAUSD's plans could prove useful to the Client, it was also potentially too risky. Given that the District's accountability plans were still being developed, the Client was concerned that interviews with LAUSD personnel could confuse the interview subjects and cause damaging political problems if they were to interpret our questions as District policies or proposals. Finally, the Client's department was already preparing to conduct interviews within LAUSD and thus wanted us to focus on understanding VAM outside of LAUSD.



III. Findings and Analysis

As stated earlier, we grouped the findings into three categories, reflecting common trends.

A. Background and Purpose of Value-Added Models

Since the problems in each case are related to the purpose and background of VAM implementation, it is crucial to first describe the reasons why value-added models were instituted and some basic characteristics of their role within each accountability system.

Key Finding: More established VAM participants tend to be districts, while recent participants tend to be states.[17]

With the exception of Tennessee, individual districts, rather than states, were the first to adopt VAM, beginning with the Dallas Independent School District in 1984. This trend was most likely strongly influenced by the 1983 A Nation at Risk report, which alarmed educators about the state of education, leading policymakers to focus on school effectiveness. Then in 2001, right before VAM began to become more widespread, the No Child Left Behind Act forced all states to adopt a comprehensive statewide accountability system to meet the NCLB achievement goals. This shifted the focus from district-initiated value-added systems to statewide ones. For example, Ohio and Pennsylvania began their efforts in 2006 and 2007, respectively.

[17] As noted in the "Analytical Approach" section, we grouped the analytic sample into two groups based upon their longevity. VAM participants that were in pilot programs or did not yet have results were considered recent participants, while we categorized the rest as more established. Although this classification is somewhat arbitrary, because it is unclear whether longevity is a linear or nonlinear characteristic, it does provide a means to help examine the stages of VAM implementation.

Key Finding: Participants adopted VAM because they felt current accountability systems were unfair to low-performing schools or did not adequately measure school or teacher effectiveness.

Most VAM participants in our sample cited the unfairness of current accountability systems towards low-performing schools as the primary reason for adopting value-added models. Since all the previous accountability systems relied upon a status or improvement model, the general sentiment was that a VAM might better recognize the growth of schools with hard-to-serve student populations. Pennsylvania adopted a value-added model after the Pennsylvania League of Urban Schools (PLUS), composed of inner-city school educators and administrators, lobbied the state Board of Education. PLUS argued that the progress of urban schools was not accurately reflected and that current metrics did not allow such schools to adequately assess their performance (Stewart 2006). Other participants, such as Milwaukee Public Schools, Ohio, Tennessee, and Dallas, had similar concerns with the improvement model based accountability systems, with Dallas placing "fairness" as its primary concern when recommending VAM to its board (Commission for Educational Excellence, Final Report 1991).



The other common reason for the adoption of value-added models was the need to establish a more accurate measurement of the performance of all schools in order to rank or grade schools or teachers. For instance, Chicago Public Schools was looking for a better way to reward principals and schools with gains in student performance (Chicago Public Schools, Office of Communication 2002), and New York City wanted to test whether a VAM would be a better means of measuring teacher effectiveness than the current teacher observation based system (New York City Department of Education 2008).

Key Finding: Very few VAM participants use it for performance pay. About half release results publicly, the other half keep them private, and no participant publicly releases teacher level data.

Only Chicago and Dallas give salary bonuses to staff at schools based on their VAM results. A high VAM ranked school in Chicago receives recognition, and its teachers receive bonuses of up to $8,000 (Chicago Public Schools, Department of Research, Evaluation and Accountability 2008; Rossi 2007, 14). Dallas instituted a new teacher merit pay system in the fall of 2007 that would give the top forty percent of teachers at fifty-nine low-performing schools up to $10,000 based on their value-added results (Dallas Morning News 2007).

Approximately half of the remaining cases are given a grade, ranking, or results that are publicly released. Typically, a VAM is used to help create school improvement plans and for teacher professional development, and it is rarely used punitively (though in 2006 Dallas did start using teacher level VAM to fire the lowest performing teachers). For example, in Milwaukee, schools are publicly given one of four possible grades, based on both value-added and status metrics. Those schools with a grade of "High value-added/High attainment" are given more freedom from the district, while those that get "Low value-added/Low attainment" are provided with more intense training, professional development, and district scrutiny (Borsuk 2006a). These low performing schools can even be taken over by the district if school improvement plans do not show adequate results (Borsuk 2006a). Similarly, in Ohio, if a school has two consecutive years of positive value-added growth, its overall school performance rating is elevated by one level (e.g., a school originally rated "effective" is now rated "excellent") (Ohio Department of Education 2008). However, if an Ohio school has three consecutive years of negative value-added growth, then its rating is downgraded by one level (e.g., a school rated "excellent" is downgraded to "effective"). A minimal sketch of this rating rule follows below.

On the other hand, Pennsylvania and Pueblo School District do not publish their district, school, or grade level VAM results. Nevertheless, these results are informally being used by schools to strategize and train teachers (McCaffrey and Hamilton 2007; Anderson and DeCesare 2008). Other participants, like Seattle, Dallas, and Tennessee, publicly report district or school level VAM results but keep the teacher level private. No teacher level VAM results have been publicly released, even in Chicago and Dallas, where they are used for high stakes. Several media outlets in Dallas have sued the district for teacher level VAM data, but the city's district attorney has ruled it confidential (Dallas Morning News 2008).
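The Ohio rating adjustment just described reduces to a simple rule. The sketch below renders the two-year and three-year conditions literally; the full five-level rating scale is our assumption based on Ohio's published labels, since the text above names only "effective" and "excellent."

```python
# Literal sketch of the Ohio rating rule described above: two consecutive
# positive value-added years move a school up one rating level, three
# consecutive negative years move it down one. The five-level scale is an
# assumption; the text above names only "effective" and "excellent."
RATINGS = ["academic emergency", "academic watch", "continuous improvement",
           "effective", "excellent"]

def adjust_rating(base_rating, yearly_growth):
    """yearly_growth: the school's value-added results, most recent last."""
    idx = RATINGS.index(base_rating)
    if len(yearly_growth) >= 2 and all(g > 0 for g in yearly_growth[-2:]):
        idx = min(idx + 1, len(RATINGS) - 1)   # two positive years: move up
    elif len(yearly_growth) >= 3 and all(g < 0 for g in yearly_growth[-3:]):
        idx = max(idx - 1, 0)                  # three negative years: move down
    return RATINGS[idx]

print(adjust_rating("effective", [0.4, 1.2]))          # -> "excellent"
print(adjust_rating("excellent", [-0.5, -1.1, -0.2]))  # -> "effective"
```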



Key Finding: Most VAM participants include teacher level results in addition to district, school, and grade level metrics. However, only Dallas, Chicago, and NYC use the teacher level VAM to evaluate teachers or for high stakes.[18]

Milwaukee, Pennsylvania, and Ohio do not have a teacher level VAM, but they do include grade level value-added measures. Dallas' use of teacher level VAM for merit pay has been met with much criticism from its teachers, for reasons described below. NYC is using its pilot program to assess whether teacher level VAM can be used to evaluate teachers or for high stakes. In May 2007, Chicago began its new Recognizing Excellence in Academic Leadership (R.E.A.L.) program, giving teachers who serve in low performing schools a bonus, additional bonuses based on teacher level VAM results, and teacher mentoring. Using the Teacher Advancement Program (TAP), R.E.A.L. will be rolled out to 40 schools by 2009 (Chicago Public Schools 2007). While Pueblo School District and North Carolina do use teacher level VAM, the value-added results are not used for accountability purposes. Instead, in these cases, value-added is used only to help the district develop teacher professional development programs.

[18] Chattanooga School District in Tennessee does offer a bonus to teachers who have shown three years of high VAM results and teach at a designated "hard to serve" school. However, many districts give similar pay raises without the VAM requirement. Dallas is much different, since any teacher with a certain VAM result will get a bonus, regardless of where he or she teaches.

Key Finding: The more recent VAMs follow the Educational Value-Added Assessment System (EVAAS) model.

In 1991, William Sanders developed VAM for Tennessee, called the Tennessee Value-Added Assessment System (TVAAS), which instantly became the standard for value-added models.[19] As interest in TVAAS grew, Sanders developed a general value-added model (one that was similar to the Tennessee model but could be used in other states or districts outside of Tennessee) named the Educational Value-Added Assessment System and began collaborating with the software company SAS (originally "Statistical Analysis System") to handle the data analysis and reporting for interested states or districts. In 2000, Sanders became an employee of SAS, bringing EVAAS with him. Since then, many VAM participants, especially the recent ones, have utilized SAS EVAAS to develop their value-added models and for data analysis and reporting. Most likely, as Sanders and SAS EVAAS became better known to policymakers, it was more feasible for many states and districts to contract with SAS rather than develop and maintain expertise and staff in VAM. Both Pennsylvania (in 2006) and Ohio (in 2007) use SAS EVAAS, with their own customized versions. In contrast, Seattle Public Schools and Pueblo School District have little internal involvement with SAS EVAAS, sending their data to SAS and reporting the results they receive back.

[19] Although Dallas began its VAM in 1984, it was only used internally and primarily for research purposes until being more officially introduced in 1991 (Webster 1996). Even then, it was not publicized, and TVAAS was seen as the VAM pioneer since it was a critical, public component of a statewide accountability system.

Key Finding: Almost all VAM participants use it only for elementary and middle grades. VAM results for high school grades in Tennessee occurred much later than for grades 2-8, suggesting that time is a factor in including high schools in the VAM.



VAM is much more difficult to develop for grades 9-12 because high school teachers are not assigned to a single group of students but instead teach the same subject(s) to many groups. This complicates the task of isolating how a classroom (grade or teacher) affects its students' performance. In addition, in the past, not all states and districts had enough useable test data for high school and needed time to make the necessary adjustments. For these two reasons, the VAMs in our case studies almost all excluded high school grades. While Dallas measured VAM results for all grades early on, it took TVAAS until 2004 (thirteen years after it first started using VAM in grades 2-8) to report VAM results for grades 9-12.

B. Problems Due to the Complexity of Value-Added Models

In our research, we discovered that criticism of and discontentment with VAM tended to revolve around its complex nature. Administrators, teachers, parents, and others in the education world have become more comfortable with status and basic growth models, but value-added models present a much larger challenge. Because of this, findings whose core problem relates to the complexity of value-added models are grouped together in this section.

Key Finding: Due to the complexity of VAMs, gaining the trust of stakeholders that VAM is fair to those being evaluated and useful in accomplishing accountability goals is critical for the long-term success of VAM. Failure to do so can result in serious problems at a later stage, even if an earlier period has few complications.

Corollary Finding: Quietly including VAMs in an accountability system with future plans to give them a more prominent role can also result in serious problems and only delays the inevitable task of establishing the needed credibility and trust.

Interestingly, Dallas and Tennessee, the two VAM participants with the longest history of VAMs, have also experienced the most serious and recent challenges to their VAM metrics. Clearly, having an established VAM program does not ensure continued support. Our examination of Tennessee and Dallas revealed that they had failed to adequately address longstanding distrust of their complex value-added models.

Tennessee's TVAAS is facing significant threats to its inclusion in the state accountability system. Lawmakers in the Tennessee state legislature as well as members of the state's most powerful teachers' union have recently worked to terminate TVAAS due to the lack of transparency of its model and concerns over the misuse of data. House Bill 2700 and Senate Bill 2542 in 2004 proposed to eliminate TVAAS, initially making it through the Education Committee. However, the legislation was later withdrawn with the promise that the issue would be revisited (TEA 2004; TEA 2005). In proposing to eliminate TVAAS, lawmakers cited concerns about how the TVAAS scores were calculated and the lack of access to the statistical method (TEA 2004). In addition to the statistical concerns about TVAAS, lawmakers claimed the reporting system of TVAAS confused parents and demoralized teachers (TEA 2005). Their distrust of TVAAS and their concern that the state had too little control over how value-added measurements were calculated were evident in their recommendation that the Department of Education investigate the possibility of other VAM vendors and systems besides TVAAS.



The Tennessee teachers' union continued the offensive against TVAAS in May 2007, when its members overwhelmingly voted that TVAAS should be eliminated due to the misuse of data and false conclusions about teachers and schools (TEA 2007). They also sponsored HJR 928, a House Joint Resolution calling for the Select Oversight Committee on Education to evaluate whether TVAAS should be used as an evaluation tool (TEA 2008). The failure to address the initial concerns about TVAAS has led to deep distrust and drastic steps to eliminate it completely.

While these concerns about TVAAS have existed since its inception in 1992, Sanders did attempt to respond to his critics. In 1995, the Tennessee state legislature asked the Office of Education Accountability to review the TVAAS program. One key finding was that it was impossible to adequately assess the effectiveness of TVAAS due to the lack of data needed to replicate its results:

    Additional evaluation of the value-added assessment model might lay to rest many of the questions and concerns people have raised about the TVAAS—but, as yet, no such comprehensive evaluation has been performed. Sanders indicates that he welcomes such an evaluation, although he has been extremely protective of both the data and computer software used to run the value-added calculations. While Sanders has made the Report Card results available in different forms, he apparently has not provided complete information to anyone who could replicate the model. He indicates that he is concerned about contractual obligations and copyright infringement, but it would be difficult, if not impossible, to perform an adequate evaluation without access to the student database and the software (Office of Education Accountability 1995, 9).

Sanders responded by saying: "As has been stated many times, a competent, objective, and independent review is welcomed. Even though a 'comprehensive' outside review of TVAAS has not been completed as of yet, considerable evaluation and validation has been completed in many different ways at many different levels over the past 13 years" (Office of Education Accountability 1995, 41).

While the Office of Education Accountability welcomed the invitation for an independent review, it did note that "he has not submitted the model to the wider professional community for validation" and that "several of the nationally-recognized educational measurement, testing, and statistical experts interviewed for this report had never heard of the TVAAS" (Office of Education Accountability 1995, 44). Sanders repeated his offer for an independent review in the national teachers' union newsletter: "I want to suggest that this review be conducted as soon as possible to lay to rest any apprehensions that anyone might have concerning the validity, reliability, and robustness of the entire TVAAS process" (TEA January 1995).

An external review from outside researchers commissioned by the Office of Education Accountability was generally favorable to TVAAS and did not find major problems with the statistical model or method of analysis (Bock 1996). Despite Sanders' good faith effort to explain TVAAS and the approval of a state audit, lawmakers and the teachers' union remained unconvinced of the accessibility and validity of the TVAAS process.


TVAAS process. As the next finding will explain, this disconnect occurred because while Sanders was able to respond to criticisms of some in the research community, he failed to gain the trust of lawmakers, teachers, and administrators. Given the statistical complexity of TVAAS, these audiences required a different type of evidence and response than researchers needed. Dallas, the first district to use VAM, has also recently encountered heated criticism from teachers and the media for supposed statistical problems and accuracy flaws with its VAM system when it suddenly began using it for teacher merit pay in January 2008. Since 1994, Dallas has been using the Classroom Effectiveness Index (CEI), in which teacher level value-added results are privately available for administrators and teachers. Before the district tied CEI to merit pay, the value-added results were largely ignored by both teachers and principals; there was little incentive for teachers to try to understand the CEI and for the District to make a strong effort to communicate how it worked and how it should be used to improve teaching skills.20 When the teacher-merit plan was announced, CEI was quickly met with distrust. Many teachers confessed to not even knowing about the existence of the CEI and most did not understand how it was calculated. Dale Kaiser, the head of the NEA-Dallas teachers’ union called the CEI, the District’s “magic glasses,” remarking, “You have a bunch of numbers, and you look at them through these magic glasses and they reveal whether a teacher is good or not?” and “Most teachers haven’t even been trained in how to read a CEI or to know what on their CEI is suppose to be correct,” (Dallas Morning News November 2007a; Dallas Morning News 2008). Aimee Bolender, the head of the Alliance/AFT teachers’ union had similar doubts about the CEI, remarking, “CEIs are sticky widgets. Neither teachers nor principals understand CEIs. Teachers cannot independently verify the accuracy of their CEIs. Teachers are ultimately expected to trust their CEIs,” (Dallas Morning News 2007a). She also claimed that teacher mistrust of CEIs was “massive,” (Dallas Morning News 2007a). The low confidence in the CEI by the teachers’ union was also evidenced by Kaiser’s comment that “I don’t particularly like this idea [teacher meritpay], but sometimes [administrators] have to learn the hard way,” suggesting he expected the effort to fail (Dallas Morning News 2007b). Confusion over missing data in the CEI added to the distrust of the VAM. Teachers at schools with high student mobility, absences, or with small classes were puzzled about why they were excluded from the CEI calculations. In 2006-2007, 44,021 students were omitted from the CEI analysis, about 25% of the District’s students, and 62% of the schools had more than one-third of the students dropped (Dallas Morning News 2008b). The possible consequences of high student mobility were not unknown to the District. Indeed, in a 2004 report, they noted that 40% of the teachers did not have a CEI, and expressed concern about the impact that schools with high student mobility would have on the CEI accuracy (Dallas Independent School District 2004). Interestingly, the report also recognized that the District should not use the CEI for high stakes. Dallas’ recent experience in VAM for high stakes illustrates the dangers of quietly incorporating VAM and then later increasing its importance. 
Without an incentive for VAM to be taken seriously, it cannot receive the necessary vetting from teachers and principals that would gain their trust and overcome the complex nature of value-added models. Furthermore, as the next key finding describes, the lack of a vetting process also tends to lead to ineffective communication between administrators and teachers and principals, causing even more distrust of VAM. The change from the CEI carrying little consequence to carrying high stakes was also jarring, adding to the difficulties the CEI already faced.

Key Finding: Attempts by administrators to respond to criticisms of or concerns about value-added models often only worsen the issue when they do not address the distrust of VAM caused by its complexity. Nevertheless, if substantial efforts are made to educate teachers about what value-added is, how it works, and how it can be used to improve teaching skills, trust in VAM may grow.

Our findings show that states and districts use various means to address the confusion and lack of trust that stakeholders feel towards value-added models. Unfortunately, they often fail to account for the different responses needed by different stakeholders. The researchers who audited TVAAS for the state, while acknowledging its statistical validity, found that TVAAS fell short in communicating to those it was evaluating:

…from the documents I reviewed, I found the earliest (and the most frequent references) were written in statistical language. The more recent (and less numerous) documents were written with the general explanations that leave the reader without an understanding of how the calculations are performed. The problem this creates is that those people who are most affected by the statistical calculations do not have an understanding of what is happening. This breeds suspicion and lack of support (Bock 1996, 43).

So while researchers may have had better access to documentation about the statistical method, educators and lawmakers remained unconvinced about value-added models, leading to the effort by the teachers’ union and lawmakers to eliminate TVAAS. Sanders seemed unable to grasp the nuances different audiences require, leaving him vulnerable to the perception that he felt TVAAS should simply be left to the experts. Frustrated with the lack of credit given to him for trying to explain TVAAS to teachers, he remarked, “Not surprisingly, they [the detractors] make little effort either to understand the TVAAS process or benefit from the information it provides. To this group, no apology is warranted” (Office of Education Accountability 1995, 56). This defensive and dismissive accusation only furthered distrust of TVAAS.

Additionally, critics of TVAAS have called for a simpler model, arguing that it would be more accessible and would garner more support. Sanders has steadfastly refused, arguing that doing so would lead to unreliable results: “People, under the banner of transparency, argue that it can be simpler, but they don’t realize the assumptions that they’re sweeping under the rug. I’ve spent a quarter of a century working on this damn stuff. People think that this is simple and that they can do this on an Excel spreadsheet” (Cuzzillo 2008). Even if Sanders is correct that simpler models are statistically unreliable, this is not a response that engenders trust in those untrained in statistical methodology.



It reinforces the sense that value-added models are too complex for the lay person to understand, without offering any assurance of their validity. Dallas responded in a similar fashion, citing numerous statistical studies affirming the validity of the CEI, the very kind of response the TVAAS auditors warned against (Dallas Morning News 2007b). Essentially, the District interpreted criticisms as statistical or methodological in nature, since they often were ostensibly about data. It subsequently issued statements from its researchers addressing technical aspects of VAM, which did not address the issue of how a typical educator can understand value-added models.

In contrast to Tennessee and Dallas, Pueblo School District devoted significant time and resources to educating its teachers about what value-added is and how they can use it to improve their teaching skills. In 2000, the district spent $5 million teaching educators about value-added and how to use it to improve their teaching (Hubler 2000, B-08). It also allocated time during normal school hours for teachers to review the value-added results and work collaboratively to figure out how to use such data to guide their teaching techniques. An independent research report indicated that the staff in Pueblo schools “is among the best in the use of data to drive instruction” because the district instituted “Data Fridays” (Anderson and DeCesare 2008, 12). Students are dismissed early each Friday so that the staff can work together to “identify kids needing extra support” and learn from one another’s teaching skills (Anderson and DeCesare 2008, 12). During these “Data Fridays,” “value-added assessments are used by teacher cohort to determine connections and disconnections between teachers and student results,” and all teachers are given time to “work collaboratively to review data…and design professional development” programs (Anderson and DeCesare 2008, 6-12). Finally, the same study shows that such collaborative effort to understand value-added results not only “contributes to the capacity of staff to utilize and benefit from student performance data,” but also increases the “impact which such [teacher] training provides” (Anderson and DeCesare 2008, 6).[21]

[21] It should be noted that the lack of distrust of VAM among Pueblo educators may not be due only to the fact that they were well trained to understand and use it. Another reason could be that high stakes were not attached to Pueblo value-added measures; thus, Pueblo educators may have felt that VAM was a “fairer” measure because it was not used to evaluate their teaching or determine their pay.

A similar teacher training approach was recommended for Tennessee in a recent outside study on teachers’ perceptions of professional development training in TVAAS (Gonzales 2006). The author found that teachers would prefer a TVAAS professional development program consisting of 1) clear communication of the purpose and value of TVAAS, and how it measures a teacher’s effectiveness, 2) more frequent and continuous time for TVAAS professional development, 3) TVAAS professional development conducted in small groups by grade or subject level, 4) peer collaboration, and 5) the participation of an expert in TVAAS in every school (Gonzales 2006, vi-vii).

Key Finding: The complexity of VAMs can lead to difficulties in validating data, undermining confidence in value-added results. In addition, administrators often address the statistical aspects of incorrect data, rather than taking steps to rebuild trust in VAM.



Since the analysis in VAM is largely inaccessible to those being evaluated, any errors that are noticed or made public can undermine confidence that all the other aspects of the analysis are correct. Because of this, VAM efforts are very susceptible to criticism. When state or district departments of education were confronted with examples of inaccuracies in their VAM data by teachers or other interested parties, they usually had a statistical response. Yet a statistical response does not address why the mistake escaped notice, what was being done to ensure it would not happen again, and, most importantly, what other errors in the VAM process might have slipped through.

Much of the controversy over Dallas’ CEI stemmed from errors teachers discovered that affected their CEI scores. Teachers found, for example, that some educators were not listed as teachers of record for classes they taught, that their classes were wrongly coded in the database (and thus they were assigned to the wrong class), and that some students were incorrectly dropped (making the teacher ineligible for a CEI) (Dallas Morning News 2008). The District responded: “We only know about the corrections we receive. If a school or a teacher is uninterested in making a correction, then we won’t find out about it,” adding that most of the errors did not change the CEI score dramatically (Dallas Morning News 2008). By placing the blame and onus on teachers and schools to validate the information used in the analysis, and by arguing that the errors described above did not threaten the validity of value-added measurements, the District only added to teacher distrust and skepticism towards the CEI. Because of the errors, Bolender, the Alliance/AFT president, characterized the CEI as “garbage in, garbage out” (Dallas Morning News 2008). The leader of the other teachers’ union threatened to “flood” the administration with teacher appeals of their CEI scores, since the District seemed to place responsibility for the data on the teachers and schools (Dallas Morning News 2008).
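The kind of roster audit that could surface such errors before scores are released is not conceptually difficult. The sketch below is purely illustrative: the column names, the minimum-class-size threshold, and the pandas-based approach are our assumptions, not a description of Dallas’ actual CEI pipeline.

    # Illustrative roster audit (hypothetical column names; not the actual CEI system).
    import pandas as pd

    def audit_rosters(enrollment: pd.DataFrame, courses: pd.DataFrame,
                      min_class_size: int = 10) -> pd.DataFrame:
        """Flag records likely to distort a teacher-level value-added score."""
        issues = []

        # Classes with no teacher of record: whoever actually taught the class
        # would receive no score at all.
        for class_id in courses.loc[courses["teacher_id"].isna(), "class_id"]:
            issues.append({"class_id": class_id, "issue": "no teacher of record"})

        # Enrollment rows pointing at class codes missing from the course file:
        # a likely sign the class was miscoded in the database.
        known = set(courses["class_id"])
        for class_id in enrollment.loc[~enrollment["class_id"].isin(known), "class_id"].unique():
            issues.append({"class_id": class_id, "issue": "class code not found"})

        # Classes below the minimum size the model will accept: teachers deserve
        # advance notice that they will be excluded, rather than a surprise.
        sizes = enrollment.groupby("class_id").size()
        for class_id in sizes[sizes < min_class_size].index:
            issues.append({"class_id": class_id, "issue": "below minimum class size"})

        return pd.DataFrame(issues)

Publishing the results of such an audit to schools before each scoring cycle, rather than waiting for corrections to be reported, would directly answer the District’s admission that “we only know about the corrections we receive.”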

C. Problems Due to Power Struggles Over Value-Added Models

Most VAM participants had teachers’ unions that were able to exert significant influence in shaping VAM policy, altering the original intent of how VAM was to be used within the overall accountability system. As a result, most VAM efforts face constraints and must account for teachers’ union preferences when developing policies.

Key Finding: All teachers’ unions of the VAM participants opposed using teacher-level VAM to evaluate teachers and/or for high stakes (except when used in conjunction with bonuses for teaching in hard-to-serve schools); as a result, the VAM participants use school-level or grade-level VAM and tend not to use teacher-level VAM for high stakes.

In Milwaukee, the state chapter of the NEA, the Wisconsin Education Association Council, opposed teacher-level VAM; as a result, the District does not have teacher-level VAM and only uses school- and grade-level VAM without high stakes (Wisconsin Education Association Council 2004). This occurred despite the fact that Milwaukee school board members who were backed by the teachers’ union (and thus tended to oppose teacher accountability measures) were being replaced by proponents of school reform (Milwaukee Journal Sentinel 2007). This example shows that district proponents of VAM must account for statewide teachers’ unions, even if the district’s teachers’ union is weak.



In Pennsylvania, the state legislature, the state Board of Education, and an influential think tank (Operation Public Education) all wanted teacher-level VAM results to be used for teacher evaluation and merit pay. Nevertheless, due to opposition from the two state teachers’ unions (the Pennsylvania State Education Association (PSEA) and the Pennsylvania American Federation of Teachers (PaFTA)), value-added is now limited to the school and grade levels, and the results are kept private (Raffaele 2003; Howell 2004; Elizabeth 2002). Similarly, due to opposition from the Ohio teachers’ unions to teacher-level value-added, Ohio’s Department of Education limited value-added modeling to the school and grade level as well (Olsen 2004).

Chicago and New York City are the only VAM participants in our sample with mayoral control over the school district. NYC district administrators are using their clout to test whether teacher-level VAM should be used to evaluate teachers or for high stakes, such as merit pay. Nevertheless, because the district teachers’ union considers the study a violation of the teachers’ contract, the union is calling on the NYC School Board to “cease and desist” the study (New York Federation of Teachers 2008).[22]

[22] Because the “cease and desist” demand had just been issued, no information was available about whether the district had stopped the VAM study.

Although Chicago also has mayoral control, it was not until May 2007 that teacher merit pay was instituted, without any consultation with the Chicago Teachers Union (CTU). As stated earlier, the Chicago teacher merit pay program, R.E.A.L., is limited to the 40 highest-need schools and gives a bonus to teachers based upon value-added results (Chicago Public Schools 2007). One reason the CTU has not successfully contested R.E.A.L. is that the union has been experiencing a decline in power and influence in the district (Lynch 2004). Another is that value-added was limited to truly poor-performing schools and used only to calculate bonus pay for deserving teachers. Thus, because not all teachers in the CTU were subject to value-added-based evaluations, the union’s opposition was not fervent.

Dallas is a curious exception to the pattern that teacher merit pay based on VAM occurs with mayoral control. While the two teachers’ unions strongly opposed the merit pay program, they are relatively weak unions and cannot exert much influence over Dallas Independent School District. Neither union had the right to represent the teachers in collective bargaining, and as a result, the teacher contracts did not give the unions much leverage or power.



IV. Summary of Problems and Recommendations

We have condensed the above findings into two overarching problems that would either cause stakeholders to push for the elimination of value-added metrics from the state/district accountability system, or would provoke changes and resistance that drastically limit their effectiveness. Since these two problems exist throughout the life of a value-added effort, states and districts should be continually proactive in addressing these threats, even if they appear to have already overcome them or there are currently no signs of problems.

We have identified several possible solutions for these two problems, based upon actual responses from VAM participants and our own assessment of the findings. We evaluated each of these solutions based on the criteria of political acceptability, fiscal feasibility, and the degree of inter-agency coordination and cooperation required.

• Political acceptability is defined as the lack of deal-breaking opposition from key stakeholders, including teachers’ and principals’ unions.

• A solution is considered fiscally feasible if it does not require a district or state to spend beyond the allocations it has already made for the essential costs of value-added implementation, which include the costs of data collection, storage, and calculation and minimal staff training.

• A solution that requires little to no inter-agency coordination or cooperation would not compel a state department of education or a school district to develop new means to communicate with schools or to communicate with them more often; it would also not require a school to reorganize its schedule or to compel its staff to work together more often than usual.

Problem 1: Value-added models may be too complex to be fully understood by all stakeholders

As the review of TVAAS commissioned by the Tennessee state legislature found, it is very difficult to adequately explain value-added models to both researchers and lay people. As a result, most of the states and districts studied struggled to adequately explain value-added models, leading to distrust by stakeholders and even threatening the very existence of value-added models in accountability systems.

Possible Solution 1: Avoid explaining the value-added model by asking stakeholders to trust the experts administering value-added models

Dallas is the major example of this approach, as district administrators tried to justify the soundness of their value-added efforts by referring to technical studies that the stakeholders could not be expected to understand. As a result, there was “massive” distrust of Dallas’s value-added model, leading to a large backlash from teachers and the press when a teacher merit pay system was instituted. This is perhaps the most fiscally feasible option and does not require any inter-agency coordination or cooperation. But it is not politically acceptable, since it alienates teachers and other stakeholders. Since this option does not attempt to address the central issue of trust in the value-added model, it has no potential for leading to a successful value-added system in the short or long term. Solution 1 is NOT recommended.

Possible Solution 2: Attempt to explain the value-added model with an extensive effort to educate all stakeholders on what value-added is and precisely how it works

Pueblo School District is the best example of this solution. Its “Data Fridays,” school-hour time for teachers to collaborate in assessing and strategizing about their students’ value-added results, were cited by a research paper as demonstrating successful outcomes. In addition, this approach closely mirrored the results of a study into the type of value-added training program teachers in Tennessee would prefer (see “Problems Due to the Complexity of Value-Added Models” above). Although this solution is politically acceptable, it may not be fiscally feasible and would require extensive internal coordination. Pueblo School District, for example, had to secure funding from private foundations and reorganize its own funding allocations to create an extensive value-added training system for its teachers (Bingham 2000). It also required each school to let students out early every Friday, which other school districts may not be able to do. The Ohio Department of Education (ODE) also made an extensive effort to educate its stakeholders. However, the cost of this effort and the degree of internal coordination were lessened because the ODE depended on the non-profit Battelle for Kids to assist it. Because Battelle for Kids had experience instituting value-added measures in Ohio districts, it was able to provide the ODE with training material as well as staff (Battelle for Kids n.d.a; Battelle for Kids n.d.b).[23] Finally, the Ohio state legislature creatively tried to lessen the cost of training new teachers to understand VAM by passing a law mandating that future educators be taught about value-added methods in teaching colleges (Ohio State Legislature 2006-07 session, House Bill 107). Solution 2 IS recommended, as long as it includes peer collaboration, regular training, and a District-trained value-added expert assigned to each school, and as long as there is sufficient funding for continual training.

[23] Long before the Ohio Department of Education instituted value-added statewide, Battelle for Kids helped establish value-added in 60 Ohio school districts.

Possible Solution 3: Simplify the value-added model to make it more accessible and transparent

We did not find any evidence that the districts or states we studied attempted to simplify their value-added models in order to make them easier to understand and trust. We did find that Sanders strongly resisted doing so for fear that it would lead to inaccurate results. Also, Chicago—the one case that currently uses a fairly easy-to-understand linear multiple regression model—is in the process of developing a more complex model, similar to that used in Milwaukee (Chicago Public Schools, Department of Research, Evaluation and Accountability 2008).
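To make concrete what a “simpler” model could look like, the sketch below gives a generic linear specification of the kind a transparent value-added metric might use. It is illustrative only: the notation is ours, not the documented Chicago, Milwaukee, or TVAAS specification.

    % Illustrative only: a generic linear value-added specification,
    % not the documented Chicago, Milwaukee, or TVAAS model.
    \[
      y_{ijt} = \beta_0 + \beta_1 \, y_{ij,t-1} + \gamma^{\top} x_{ij} + \theta_j + \varepsilon_{ijt}
    \]
    % y_{ijt}    : test score of student i in school (or classroom) j in year t
    % y_{ij,t-1} : the same student's prior-year score
    % x_{ij}     : observed student characteristics (e.g., SES indicators)
    % \theta_j   : the school or teacher effect, i.e., the "value-added" estimate
    % \varepsilon_{ijt} : residual error

A specification of this form can be estimated with ordinary least squares and explained to a lay audience in a single page, which is precisely the transparency critics ask for; the open question, as Sanders’ objection suggests, is how much accuracy the simplification gives up.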



For states and districts that already have a value-added model, simplifying it may be difficult fiscally and organizationally, but doing so may have a dramatic political effect, possibly winning over critics. In addition, there is no research establishing that simpler value-added models are in fact significantly inferior to more complex ones. Given the potentially high political gain, which is perhaps the most difficult area to improve, it is worthwhile to investigate this solution further, especially for states and districts still developing their value-added models. Solution 3 is recommended if additional research finds that the theoretical decrease in accuracy is outweighed by the political gains in stakeholder acceptance.

Possible Solution 4: Require the value-added calculation process to be audited by an outside, independent agency

The Ohio Department of Education is the best example of this solution, which allows a complex value-added model to be used while increasing stakeholder trust. In order to reconcile the differences between a new, proposed statewide value-added system and the existing Battelle for Kids value-added system, the ODE hired an independent auditor from the American Institutes for Research to study the two models and ascertain why the discrepancy occurred. When the auditors certified that the differences between the two models were minimal and that both models were valid, no further complaints about the ODE model were reported (Ohio Department of Education n.d.e.). Also, the Tennessee state-sponsored review of TVAAS recommended that, in order to overcome the distrust created by the statistical complexity of value-added models, the state should have: 1) an outside auditor to continually validate that all calculations are being correctly performed, and 2) an outside auditor to monitor how the statistical analysis is being implemented and whether the method or model needs adjusting (Bock 1996, 43). This solution has high political acceptability, as it would mitigate fears like those in Dallas that value-added models are “black boxes” in which mistakes could be made without anyone knowing. There are fiscal and organizational costs that increase with large school districts, especially since continual monitoring is required for optimal effectiveness. However, it would be very difficult to gain the trust of stakeholders without this solution. Solution 4 is recommended, despite the possible costs of contracting and coordinating with an outside auditor, as the political benefits are almost indispensable.

Final Recommendation for the Problem of the Complexity of Value-Added Models: In order to best address the problem of the complexity of value-added models, both Solutions 2 and 4 should be adopted. This would include a comprehensive professional development program with peer collaboration, regular training, and a District-trained value-added expert assigned to each school, as well as an auditing system that verifies the accuracy of the statistical analysis and monitors the implementation of the results. Together, these approaches will significantly increase the trust of stakeholders and, since they are ongoing efforts, will give the best chance for long-term success. It may also be worthwhile to research whether a simpler model could be used instead.
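The auditing recommended here reduces, at its core, to independent recomputation and comparison. The sketch below is a minimal illustration under assumed data shapes (school-level scores keyed by a school_id column, with an agreed tolerance); it is not the procedure the Ohio auditors actually used.

    # Minimal sketch of an independent recomputation check (assumed column
    # names and tolerance; not the actual Ohio audit procedure).
    import pandas as pd

    def compare_runs(official: pd.DataFrame, audit: pd.DataFrame,
                     tolerance: float = 0.05) -> pd.DataFrame:
        """Return schools whose official and independently recomputed
        value-added scores diverge by more than the agreed tolerance."""
        merged = official.merge(audit, on="school_id",
                                suffixes=("_official", "_audit"))
        merged["gap"] = (merged["score_official"] - merged["score_audit"]).abs()
        # Anything over the tolerance goes back to the model's operator for
        # explanation before results are released to schools.
        return merged.loc[merged["gap"] > tolerance,
                          ["school_id", "score_official", "score_audit", "gap"]]

Reporting each year how many schools pass such a check would give lawmakers and unions the recurring, non-technical assurance that the Bock review argued TVAAS lacked.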



Problem 2: Teachers’ unions oppose the use of teacher-level value-added metrics for teacher evaluation or high stakes[24]

[24] High stakes generally include any use, such as grades, rankings, or performance pay, that can have positive or negative repercussions for the party evaluated.

In almost all the cases where teacher-level value-added methods were considered, the teachers’ unions strongly opposed VAM and lobbied for it to be eliminated (Pennsylvania, New York City, Dallas, and Tennessee), for it not to be used for teacher evaluation or high stakes (Pueblo, Chicago, North Carolina, and Tennessee), or for the value-added results to be kept sealed and unpublished (Pueblo, North Carolina, Dallas, and Tennessee). When teacher-level value-added measurements were used, they were utilized as part of teacher improvement plans or professional development. These constraints on what value-added models could measure, and on how they could be used, can potentially reduce the effectiveness of a state’s or district’s accountability system, as researchers generally agree that, within a school, teachers have the greatest impact on student achievement. Faced with resistance towards teacher-level value-added models, the states and districts in our sample made adjustments while attempting to fulfill the goals of their accountability plans.

Possible Solution 1: Ignore teachers’ union resistance and implement VAM at the teacher level

As the findings above have shown, this solution is only feasible for states and districts with enough power to ignore union positions, such as Chicago and NYC, which have mayoral control over their school districts. Yet while it may be possible for a state or district to impose its will upon the teachers’ union, the findings from established value-added efforts like Tennessee point to problems in the long run. Long-term distrust can eventually lead unions to partner with lawmakers or other parties to eliminate value-added models. Thus, despite low fiscal and coordination costs, the political costs are too high for the long-term viability of a teacher-level value-added model, or of any value-added model. Solution 1 is NOT recommended due to the threat of serious problems emerging in the future.

Possible Solution 2: Use VAM at the teacher level, but only as a bonus for teaching in the lowest-performing schools

Due to the dangers of and resistance to teacher-level value-added models, districts in our sample attempted to mitigate friction with teachers by connecting teacher-level value-added metrics with teacher bonus pay at low-performing schools. The best examples of this approach are Chattanooga School District in Tennessee and Chicago Public Schools. The use of high-stakes teacher-level VAM for only those schools that serve the lowest-performing students has many political advantages. First, by framing this high-stakes component as a “bonus” that targets the neediest schools while rewarding a subset of teachers for their participation, it is more difficult for teachers’ unions to reject. Additionally, the inclusion of value-added models to evaluate teacher performance in these underserved schools allows proponents to argue that funds are being used



efficiently and effectively. Along with possible long-term cost benefits in the form of a better-equipped labor force and lower unemployment rates, pairing VAM with teacher bonuses offers a means to mitigate the program’s potentially high cost. Although instituting such a policy has the advantage of not requiring extensive internal coordination and teacher training, it does have the disadvantage of being fiscally costly. The Chicago and Chattanooga school districts both rely on private funds to pay for their teacher bonuses, and there is already some fear in Chicago about how the program can continue once these funds are no longer available (Rossi 2002; Rossi 2007). The high cost of the program also limits how many schools and teachers can participate, lessening its impact. While Denver has used bonds to fund its teacher performance pay program, this may not be possible for larger districts or statewide efforts. In addition, more research needs to be done on how many schools and teachers must participate in order for bonus pay to be effective. Solution 2 is only recommended for states and districts that can afford to fund it long-term and include an adequate number of participants.

Possible Solution 3: Use VAM at the teacher level but without evaluations or high stakes; introduce high stakes at a later stage

This is the approach most of the VAM case studies took, since it is the most politically feasible. Theoretically, introducing value-added models without high stakes may provide the opportunity for stakeholders to become acclimated to their use. However, the findings for Dallas and Pennsylvania demonstrated that when teacher-level value-added models were not used for high stakes or teacher evaluation, stakeholders were not motivated to sufficiently utilize them or to fine-tune their implementation. Instead, VAM was largely ignored. Three years after Pennsylvania introduced PVAAS, studies showed that very few principals, and even fewer teachers, used value-added to assist them in even low-stakes uses such as developing meaningful professional development or school reform policies (McCaffrey and Hamilton 2007). In addition, some districts, such as Dallas, encountered strong resistance when they later tried to attach high stakes to their existing teacher-level value-added models. While the use of teacher-level value-added models without any stakes is politically attractive, it did not prove to be effective, nor did it successfully serve as a “Trojan Horse” through which teacher-level value-added models could assume a more prominent role in the future. Instead, it conditioned stakeholders not to take VAM seriously and failed to create the necessary communication with administrators. Solution 3 is NOT recommended, as it circumvents the incentives and processes needed for value-added models to gain trust and adoption. Adding high stakes at a later stage can also reinforce stakeholder perceptions that value-added efforts lack transparency, as administrators may consider the VAM system mature while others remain less comfortable with or knowledgeable about it.



Possible Solution 4: Do not use VAM at the teacher level; instead use it at the school level and attach high stakes to it

Examples of this approach from our sample include Ohio, Milwaukee, Chicago, and Dallas. Dallas introduced its School Effectiveness Indices (SEI) in 1991 along with a $2.4 million program that awarded bonuses to staff at schools ranked “effective” or above, with about 20% of staff members receiving a bonus of $500 to $1,000. Schools ranked “ineffective” were given additional resources or had their administrators placed on administrative leave (Webster 1996). School-level value-added models were not met with as much resistance from teachers’ unions or principals’ unions when they were proposed. Since these school-level VAMs were usually not punitive, instead coming in the form of school rankings, award recognition, or bonuses for staff, they were generally more acceptable to stakeholders (see the findings above for more complete details). In addition, given that recent accountability efforts began at the school level before becoming statewide programs, there was an existing precedent for high stakes for schools. Yet while school-level value-added models may be politically feasible, there are questions about how effective they are compared to teacher-level value-added models, since they may not adequately address teacher performance. In addition, despite Dallas’ history with high-stakes school-level value-added metrics, it still encountered significant resistance to the value-added system in general when it changed its low-stakes teacher-level value-added model to a merit pay system. This suggests that high-stakes school-level value-added methods are not necessarily a stepping stone to high-stakes teacher-level value-added measurement. Solution 4 is only recommended if there is no alternative means of incorporating value-added models into an accountability system, or if further studies demonstrate that school-level value-added models are still useful within an accountability system despite not addressing the effectiveness of individual teachers.

Final Recommendation for the Problem of Teacher Union Opposition at the Teacher Level: The most promising solutions are those that attach high stakes or a formal evaluation consequence to some level of the value-added model. If there are sufficient resources, then using bonus pay as the high-stakes consequence for teacher-level value-added is recommended. The other recommended solution is to attach the high stakes to the school level, though the actual impact needs more research.



V. Conclusion

Value-added models have the tantalizing promise of holding schools and teachers responsible for their students’ performance based on factors the schools and teachers can control. Therefore, it is not surprising that there is much interest in VAM despite the significant problems the models have encountered.

We have identified several key problems that threaten the long-term implementation of value-added models, grouped into clusters related to the complexity of VAM and resistance from teachers’ unions. These problems can emerge at any time in the life of value-added models, and their existence is often masked when the models are not prominently used. We found it was crucial to gain the trust of stakeholders, yet few states or districts adopted an effective approach to cultivate support. VAM administrators often perceived complaints to be statistical in nature and responded with overly technical explanations or assurances that failed to address the root cause of dissatisfaction among stakeholders, particularly those being evaluated by VAM. As a result, the priority should be to create systems and processes that earn the trust of key stakeholders, including external, independent auditing and ongoing training and peer mentoring in VAM.

We also recommend that the best way to counter the formidable resistance of teachers’ unions to high-stakes teacher-level value-added models is to focus on tying VAM to bonus pay for underserved schools. Not only does this avoid the ill-advised attempt to move from low stakes to high stakes at the teacher level, it can help establish credibility that value-added models can ultimately lead to a more efficient and effective allocation of resources, such as bonus pay funds and talented teachers.

Ultimately, successful, long-term implementation of value-added models requires a dedicated, sustained, and strategic effort from policymakers to gain the trust of stakeholders and overcome union opposition. With a sound approach, value-added models can be incorporated into accountability systems, providing a potentially important tool for improving student achievement.



VI. Appendix

A. Original Evaluation Matrix

I. History/Background of VAM
   A. Description of the State/District Using VAM
      1. Size/rank within nation
      2. Mix of types of districts (urban/suburban/rural)
      3. SES
   B. History of assessments and accountability systems
      1. Past accountability systems
         a. Weakness/failures of past accountability systems
      2. Genesis of VAM
         a. How to address weakness of past assessments
   C. VAM in current accountability system
      1. Purpose of accountability system/political reasons for new accountability system
      2. How VAM fits into accountability system
         a. How long it has been used
         b. Level of VAM used
         c. Relation to other assessments
         d. General statistical approach
         e. Consequences of VAM—i.e., attached to performance pay or high stakes
         f. VAM results made public
II. Statistical Concerns
   A. Quality of assessments available for VAM (frequency of testing, student ID, vertical scaling, normative vs. criteria-based tests)
   B. Enough previous test score data
   C. Impact of concerns/problems
      1. When
      2. Who
      3. Level of opposition
   D. Outcome/solution
III. Problems related to authority over VAM implementation
   A. Background on governance structures between state and districts or within district
      1. Relationship between state’s department of education and local district; or district and schools
      2. Unions
         a. Level of political influence
   B. Which governing bodies have authority to initiate VAM
   C. Which governing bodies have authority to approve VAM
   D. Political power of teacher unions
   E. Teacher contract prohibits VAM



   F. Level of cooperation with teacher union needed for VAM approval
   G. Impact of problem
      1. Who
      2. Level of concern/opposition
   H. Solution
IV. Problems related to resource constraints
   A. Budget constraint used as argument against VAM
   B. Insufficient resources for data collection
   C. Insufficient resources for data analysis/publication
   D. Insufficient resources for training
   E. Impact of problem
   F. Solution
V. Interpretative problems
   A. Do not understand what VAM is measuring
      1. Impact on VAM effort
   B. Do not understand how to interpret VAM results
      1. Impact on VAM effort
   C. VAM misused, not applied correctly towards policy goals
      1. Impact on VAM effort
   D. Changes/concerns
      1. When
      2. Level of concern/opposition
      3. Outcome
         a. Solution
VI. VAM is not appropriate to policy goals



B. Revised Evaluation Matrix

I. Background and Purpose of VAM Implementation
   A. Reasons for VAM implementation
      1. VAM is a fairer way to measure improvement
         a. Cases
      2. VAM is better way to rank schools/reward teacher efforts
         a. Cases
   B. How VAM used in accountability system
      1. Performance pay
         a. Cases
      2. School report card
         a. Cases
      3. Teacher evaluation
         a. Cases
      4. VAM not used in accountability, but for informational purposes only
         a. Cases
   C. Level of VAM used and whether results are made public
      1. Teacher-level and not public
         a. Cases
      2. No teacher level; only school/grade level
         a. Public
            i. Cases
         b. Not public
            i. Cases
   D. How data collected
      1. EVAAS
         a. How came about
         b. States that use it and why they use it
   E. Grade level VAM was used in
      1. Grades 2-8
         a. Cases
      2. High school grades
         a. Cases
II. Problems due to the complexity of VAM
   A. Stakeholders do not trust VAM due to not understanding it
      1. When distrust begins
         a. Only when teacher-level used?
         b. Why distrust grows
      2. Inability to replicate results
         a. Impact
      3. No access to model and the steps used to compute a VAM result
         a. Impact



      4. Wrong approach used to explain VAM and its validity
         a. Impact
      5. Distrust of VAM due to publicly acknowledged mistakes or discovery of missing data
         a. Impact
   B. Responses to distrust of VAM
      1. Responses that worsen distrust of VAM
         a. Telling stakeholders to trust experts
         b. Attempting to validate VAM by presenting more complex statistical-based studies
      2. Responses that lessen distrust of VAM
         a. Allocating resources to thoroughly train educators
         b. Allocating time during school hours to give educators a chance to review data
         c. Encouraging collaborative effort to review VAM data and use data to improve teaching techniques
         d. Simplifying the VAM model to make it more understandable
         e. Using an outside auditor to validate VAM results
III. Power Struggle over VAM
   A. Strong opposition from powerful teacher union
      1. Result: school-level VAM, but high stakes
      2. Result: school-level VAM, but no high stakes
   B. Opposition from weaker teacher union
      1. Result: teacher level, but only used as bonus pay for low-achieving schools
      2. Result: teacher-level evaluation, but still highly contested



VII. References

Anderson, Amy and Dale DeCesare. 2008. Profiles of Success. Donnell Kay Foundation and Augenblick, Palaich & Associates. http://www.apaconsulting.net/uploads/reports/11.pdf (accessed March 8, 2008).

Associated Press. 1999. Educators Seek Changes in Reading and Writing Testing System. Associated Press State & Local Wire, Oct. 4, State and Regional.

Battelle for Kids. 2007. Presented to: T-CAP Participants. Battelle for Kids, PowerPoint presentation. http://www.columbus.k12.oh.us/staffdev/forms/TCAP_principal_PowerPoint.pps (accessed February 22, 2008).

Battelle for Kids. 2008. T-CAP Initiative. Battelle for Kids. http://battelleforkids.com/home/SOAR/TCAP/TCAPTestimonial (accessed March 5, 2008).

Battelle for Kids. n.d.a. Module IV Activity: Identifying Gain Patterns in School Diagnostic Reports. Battelle for Kids. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicRelationID=117&ContentID=17029&Content=33810 (accessed February 22, 2008).

Battelle for Kids. n.d.b. Module I: Getting Ready for Value-Added Analysis. Battelle for Kids. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicRelationID=117&ContentID=17029&Content=33810 (accessed February 22, 2008).

Battelle for Kids. n.d.c. Benefits of Value-Added Analysis for Principals. Battelle for Kids. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicRelationID=117&ContentID=17029&Content=33810 (accessed February 22, 2008).

Battelle for Kids. n.d.d. Benefits of Value-Added Analysis for Teachers. Battelle for Kids. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicRelationID=117&ContentID=17029&Content=33810 (accessed February 22, 2008).

Bingham, Janet. 1999. Method Confirms Teachers’ Sway. The Denver Post, April 19, Denver Section.

Bingham, Janet. 2000. Method Confirms Teachers’ Sway. The Denver Post, May 4, A Section.


Bock, R. Darrell, Richard Wolfe, and Thomas H. Fisher. 1996. A Review and Analysis of the Tennessee Value-Added Assessment System. Nashville: Office of Education Accountability.

Borsuk, Alan. 2006a. MPS Puts New Focus on Progress; “Value-Added” Data Combines with Test Scores to Rate Schools’ Achievement. Milwaukee Journal Sentinel, January 11, Section B.

Borsuk, Alan. 2006b. Some MPS Schools Face Additional Scrutiny; District Hopes More Oversight Improves Poor Performance. Milwaukee Journal Sentinel, May 4, Section B.

Braun, Henry. 2005. Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models. Educational Testing Service: Policy Information Center. www.ets.org/research/pic (accessed February 29, 2008).

Business Wire. 2004. Education Advocacy Group Honors SAS Researcher with New Award. Business Wire, November 22.

California Department of Education. 2007a. 2007 Adequate Yearly Progress Report: Information Guide. August 2007.

California Department of Education. 2007b. Explaining 2007 STAR Internet Reports to the Public. California Department of Education.

California Department of Education, Testing and Accountability. 2007. STAR 2007 Test Results. California Department of Education. http://star.cde.ca.gov/star2007/Viewreport.asp (accessed March 2, 2008).

California Department of Education, Testing and Accountability. n.d.a. Questions and Answers: California Standards Test. California Department of Education. http://www.cde.ca.gov/ta/tg/sr/qandacst08.asp (accessed March 2, 2008).

California Department of Education, Testing and Accountability. n.d.b. API Description. California Department of Education. http://www.cde.ca.gov/ta/ac/ap/apidescription.asp (accessed March 2, 2008).

Center for Greater Philadelphia. n.d.a. Operation Public Education: A New System of Accountability. Center for Greater Philadelphia. http://www.cgp.upenn.edu/ope_new_system.html (accessed March 2, 2008).

Center for Greater Philadelphia. n.d.b. Value-Added Assessment in Ohio. Center for Greater Philadelphia. www.cgp.upenn.edu (accessed February 22, 2008).

Chicago Public Schools, Department of Research, Evaluation and Accountability. 2008. REA Fact Sheet on the CPS 2007 ISAT Gains Metric. Chicago Public Schools. http://research.cps.k12.il.us (accessed March 8, 2008).


Chicago Public Schools, Department of Research, Evaluation and Accountability. n.d.a. Measuring Student Gains on the Iowa Test of Basic Skills. Chicago Public Schools. http://research.cps.k12.il.us (accessed March 8, 2008).

Chicago Public Schools, Office of Communication. 2002. New CPS Accountability System Rewards Gainers. Chicago Public Schools. http://www.cps.k12.il.us/AboutCPS/PressReleases/Archives/October_2002/Accountability103002/accountability103002.html (accessed February 28, 2008).

Chicago Public Schools, Office of Communications. 2007. CPS Announces First Cohort for R.E.A.L. Program. Chicago Public Schools. http://www.cps.k12.il.us/AboutCPS/PressReleases/May_2007/Real%20TIF.htm (accessed March 16, 2008).

Chicago Sun-Times. 2003. A Good Deal for Teachers Could Do a Good Deal for Kids. Chicago Sun-Times, September 26, pg. 47.

Chicago Tribune. 2006. Editorial: Rewarding Good Teachers. Chicago Tribune, July 27, Commentary section.

City, Elizabeth. 1996. ABCs Will Be Aired at Education Forum. The Virginian-Pilot, February 23.

Colorado Department of Education. 2007. Colorado Accreditation Program: Implementation Guidelines. Colorado Department of Education. http://www.cde.state.co.us/cdeedserv/download/pdf/AccredGuidelines.pdf (accessed February 26, 2008).

Colorado Department of Education. 2008. Academic Growth of Students Measurement. Colorado Department of Education. http://www.cde.state.co.us/cdeassess/documents/SAR/2005/Computing_academic_growth_of_students_how_to_final.doc (accessed February 26, 2008).

Council of Chief State School Officers. 2005. Policymakers’ Guide to Growth Models for School Accountability: How Do Accountability Models Differ? Washington, D.C.: The Council of Chief State School Officers.

Dallas Independent School District. 2004. Teacher Effectiveness Measure and Its Applications. Dallas Independent School District.

Derringer, Jenny. 2008. New Ohio School Performance System “a great addition.” www.crescent-news.com, January 8 (accessed February 22, 2008).


Drury, Darrel and Harold Doran. 2003. The Value of Value-Added Analysis. Policy Research Brief: Examining Key Education Issues, Volume 3, Number 1.

Education Week. 2006. NYC Chief Unveils New Accountability Plan; Schools Will Be Evaluated, Receive Grades. Education Week 25, no. 5 (May 16).

Education Week. 2007. NYC District Issues “Value-Added” Grades for Schools. Education Week 27, no. 12 (November 14).

Elizabeth, Jane. 2002. Proposal Would Make It Easier to Track Student Progress in State. Pittsburgh Post-Gazette, September 19.

Elliot, Scott. 2006. Data Suggests Income Predicts School Test Score Results. Dayton Daily News, September 5.

Elmore, Richard, Allen Grossman, and Caroline King. 2006. Managing the Chicago Public Schools. Public Education Leadership Project at Harvard College, June 20.

Enterprise Systems. n.d. KPIs in Education: Rank versus Raw Numbers. Enterprise Systems. http://esj.com/business_intelligence/article.aspx?EditorialsID=8806 (accessed March 19, 2008).

Evergreen Freedom Foundation. 2001. Value Added Assessment. Evergreen Freedom Foundation 2001 Studies. http://www.effwa.org/pdfs/Value-Added.pdf (accessed February 24, 2008).

Fischer, Kent. 2007a. Proposal Calls for Teacher Merit Bonuses. The Dallas Morning News, November 15. http://www.dallasnews.com/sharedcontent/dws/dn/education/stories/DNmeritpay_15met.ART0.North.Edition1.36ee1e4.html.

Fischer, Kent. 2007b. DISD Bonus Tied to Ratings. The Dallas Morning News, November 23. http://www.dallasnews.com/sharedcontent/dws/news/localnews/stories/DNteachereval_23met.ART.State.Edition2.376ba1f.html.

Fischer, Kent. 2008. Dallas Schools Teacher Ratings Sometimes Off Mark. The Dallas Morning News, February 22. http://www.dallasnews.com/sharedcontent/dws/dn/education/stories/022308dnmetdisdcei.3a5f9fe.html.

General Assembly of North Carolina. 2007. An Act Directing the State Board of Education to Develop a Framework for Reaching One’s Potential for Excellence. Session Law 2007-277, Senate Bill 1030.

Gonzales, Michael Scott. 2006. Exploring Teachers’ Perceptions of the Tennessee Value-Added Assessment System of Professional Development. PhD dissertation, Tennessee State University.


The Governor’s Blue Ribbon Task Force. 2003. November 10, 2003 Funding for Success Committee. Ohio Department of Education. http://www.blueribbontaskforce.ohio.gov/committees/FS_11-10-03_minutes.asp (accessed February 22, 2008).

Greenwald, Rob, Larry Hedges, and Richard Laine. 1996. The Effect of School Resources on Student Achievement. Review of Educational Research 66(3): 361-396.

Grossman, Kate. 2002. State School Board Plans More Tests. Chicago Sun-Times, December 19, pg. 20.

Herszenhorn, David. 2005. Test Scores to Be Used to Analyze Schools’ Role. The New York Times, June 7, Metropolitan Section.

Hirsch, Michael. 2008. UFT: Progress Reports Shouldn’t Punish. New York Teacher, January 17. http://www.uft.org/news/teacher/general/punish/ (accessed March 5, 2008).

Howell, Cynthia. 2004. Education Initiative Is Expanding School-Reform Ideas Reaching Across U.S. Arkansas Democrat-Gazette, May 1.

Hubler, Eric. 2000. Pueblo Wants to Teach Its System to Teachers. The Denver Post, August 10, Denver Section.

Kolben, Deborah. 2006. Like Students, City Schools Will Be Graded. The New York Sun, April 12, Front page.

Lynch, Deborah. 2004. The School Board Wants to Create Contract Schools Where Teachers Don’t Have Teaching Certificates. Chicago Sun-Times, September 15, pg. 61.

McCaffrey, Daniel, J.R. Lockwood, Daniel Koretz, and Laura Hamilton. 2003. Evaluating Value-Added Models for Teacher Accountability. Santa Monica, CA: RAND Corp.

McCaffrey, Daniel and Laura Hamilton. 2007. Value-Added Assessment in Practice: Lessons from the Pennsylvania Value-Added Assessment System Pilot Project. Santa Monica, CA: RAND Corp.

McFadyen, Deidre. 2006. UFT Skeptical of Chancellor’s New Accountability Program. New York Teacher, April 27. http://www.uft.org/news/teacher/general/accountability/ (accessed March 5, 2008).

McFadyen, Deidre. 2007. UFT Skeptical of Chancellor’s New Accountability Program. New York Teacher, November 15. http://www.uft.org/news/teacher/general/accountability/ (accessed March 5, 2008).


Milwaukee Journal Sentinel. 2007. Milwaukee School Board. Milwaukee Journal Sentinel, February 16, Section A.

Milwaukee Public Schools, Division of Research and Assessment. 2008. 2006-2007 District Report Card. Milwaukee Public Schools. http://www2.milwaukee.k12.wi.us/acctrep/0607/2007_district.pdf (accessed March 8, 2008).

National Education Association. 2006. Rankings and Estimates 2006. National Education Association. http://www.nea.org/edstats/RankFull06b.htm (accessed February 24, 2008).

New York City Department of Education. n.d. Children First History. New York City Department of Education. http://schools.nyc.gov/Accountability/CFI/default.htm (accessed February 25, 2008).

New York City Department of Education. 2007a. Educator Guide: New York City Progress Report for Elementary/Middle Schools. New York City Department of Education. http://schools.nyc.gov/NR/rdonlyres/DF48B29F-4672-4D16-BEEA-0C7E8FC5CBD5/27499/EducatorGuide_EMS.pdf (accessed March 6, 2008).

New York City Department of Education. 2007b. Educator Guide: New York City Progress Report for High Schools. New York City Department of Education. http://schools.nyc.gov/NR/rdonlyres/DF48B29F-4672-4D16-BEEA-0C7E8FC5CBD5/27498/EducatorGuide_HighSchool.pdf (accessed March 6, 2008).

New York City Department of Education. 2008. February 2008: Study of Teacher-Level Value-Added in New York City. New York City Department of Education. http://schools.nyc.gov/NR/rdonlyres/0B7FC1B0-0EDB-4315-B33B-BD723514B4DA/31621/Memoonvalueaddedstudy208.pdf (accessed March 6, 2008).

New York United Federation of Teachers. 2006. “How the School System is Organized,” in UFT’s New Teacher Handbook. New York: New York Federation of Teachers.

New York United Federation of Teachers. 2007. UFT President Randi Weingarten on School Progress Reports. New York Federation of Teachers. http://www.uft.org/news/issues/press/school_progress_report/ (accessed February 29, 2008).

New York United Federation of Teachers. 2008. DOE’s “Value-Added” Pilot Project Resolution. New York United Federation of Teachers. http://www.uft.org/news/issues/resolutions/value-added/ (accessed February 29, 2008).

New York United Federation of Teachers Delegate Assembly. 2006. Children First—Focus on Accountability. PowerPoint presentation, May 17. New York United Federation of Teachers. http://www.uft.org/news/issues/reports/children_first_accountability.pdf (accessed March 5, 2008).


News and Record. 1996. Public to Hear Schools Plan on Monday. News and Record, April 19.

Newsom, John. 1998. Schools Rise—and Fall—in ABCs; Guilford County’s ABCs Results Show That Schools Can Go Down One Year and Up the Next and Vice-Versa. News & Record, July 17.

Newsom, John. 2002. Budget Crisis Threatens Gains in Teacher Pay. News & Record, July 2.

North Carolina Association of Educators News. 2000. Getting It Right: Improving the School Improvement Plan; NCAE Survey on the ABCs. North Carolina Association of Educators. http://www.ncae.org/news/abcsurvey/abcsummary.shtml (accessed February 27, 2008).

North Carolina Association of Educators. 2002. NCAE Recommendations for Improving the ABCs. NCAE Position Papers. http://www.ncae.org/structure/beliefs/positions/abcprogram.shtml (accessed February 27, 2008).

North Carolina Association of Educators. 2007. Business/Finance and Advocacy Committee. SBE Review, December, pg. 2.

North Carolina Association of Educators. n.d. NCAE History. North Carolina Association of Educators. www.ncae.org (accessed February 27, 2008).

North Carolina Public Schools. 2006a. Evolution of the ABCs. North Carolina Public Schools. http://www.ncpublicschools.org/docs/accountability/reporting/abc/200607/abcevolution.pdf (accessed February 23, 2008).

North Carolina Public Schools. 2006b. Facts and Figures. North Carolina Public Schools. http://www.ncpublicschools.org/docs/fbs/resources/data/factsfigures/2005-06figures.pdf (accessed February 24, 2008).

North Carolina Public Schools. 2007a. ABCs 2007 Accountability Report Background Packet. http://www.ncpublicschools.org/docs/accountability/reporting/abc/200607/backgroundpacket.pdf (accessed February 23, 2008).

North Carolina Public Schools. 2007b. The ABC’s of Public Education: 2006-07 Growth and Performance of North Carolina Executive Summary. North Carolina Public Schools. http://www.ncpublicschools.org/docs/accountability/reporting/abc/200607/execsumm.pdf (accessed February 23, 2008).


North Carolina Public Schools. 2007c. Determining School Status in the ABCs Model 2006-07. North Carolina Public Schools. http://www.ncpublicschools.org/docs/accountability/reporting/abc/200607/determiningschoolstatus0607.pdf (accessed February 23, 2008).

North Carolina Public Schools. 2007d. NC NAEP Scores Show State at or Above Nation in Reading and Math. North Carolina Public Schools News Releases. http://www.ncpublicschools.org/newsroom/news/2007-08/20070925-01 (accessed February 23, 2008).

North Carolina State Board of Education. 2003. NCWISE Project Update. Minutes of State Board Meeting, May 1. http://www.ncpublicschools.org/sbe_meetings/0305/0305_EEO06.pdf (accessed February 24, 2008).

North Carolina State Board of Education. 2005. Evaluate Validity of ABCs Accountability System Based on HB 1414. Minutes of State Board Meeting, March 2. http://www.ncpublicschools.org/sbe_meetings/0503/0503_HSP06.pdf (accessed February 24, 2008).

North Carolina State Board of Education. 2007. Educational Value-Added Assessment System (EVAAS) Teacher Module. Minutes of State Board Meeting, December. http://www.ncpublicschools.org/sbe_meetings/0712/0712_HSP06.pdf (accessed February 24, 2008).

North Carolina State Board of Education. 2008. Educational Value-Added Assessment System (EVAAS) Teacher Module. Minutes of State Board Meeting, January 2008. http://www.ncpublicschools.org/sbe_meetings/0801/0801_HSP06.pdf (accessed February 24, 2008).

North Carolina State Board of Education. n.d. History of the North Carolina State Board of Education. North Carolina State Board of Education. http://www.ncpublicschools.org/state_board/SBE_history/index.html (accessed February 27, 2008).

Office of the Mayor of New York. 2007. Mayor Bloomberg, Chancellor Klein and CSA President Logan Announce Tentative Contract Settlement. City of New York, April 23. http://www.nyc.gov/portal/site/nycgov/menuitem.c0935b9a57bb4ef3daf2f1c701c789a0/index.jsp?pageID=mayor_press_release&catID=1194&doc_name=http%3A%2F%2Fwww.nyc.gov%2Fhtml%2Fom%2Fhtml%2F2007a%2Fpr12207.html&cc=unused1978&rc=1194&ndi=1 (accessed February 28, 2008).


Ohio Confederation of Teacher Education Organizations. 2006. Ohio Department of Education Update. Ohio Confederation of Teacher Education Organizations, October 27. http://www.ohioteachered.org/Fall06ODE.pdf (accessed February 22, 2008). Ohio Department of Education, Accountability Task Force. 2006a. August 26, 2006 Minutes. Ohio Department of Education. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicR elationID=117&ContentID=14210&Content=36409 (accessed February 23, 2008). Ohio Department of Education, Accountability Task Force. 2006b. May 10, 2006 Minutes. Ohio Department of Education. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicR elationID=117&ContentID=14210&Content=36409 (accessed February 23, 2008). Ohio Department of Education. 2006. Ohio Department of Education Fact Sheet. Ohio Department of Education. Ohio Department of Education. www.ode.state.oh.us (accessed March 1, 2008). Ohio Department of Education. 2008. State Accountability and Value-Added Briefing on Data Use. Ohio Department of Education. www.ode.state.oh.us (accessed March 1, 2008). Ohio Department of Education. n.d.a. Accountability and Local Report Card Frequently Asked Questions. Ohio Department of Education. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicR elationID=117&ContentID=19145&Content=43515 (accessed February 22, 2008). Ohio Department of Education. n.d.b. ORC section 3314.35: Ohio’s Value-Added System and Community School Closures Due to Poor Academic Performance. Ohio Department of Education. www.ode.state.oh.us (accessed February 22, 2008). Ohio Department of Education. n.d.c. Value-Added Data and Reports. Ohio Department of Education. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicR elationID=117&ContentID=41683&Content=41697 (accessed February 22, 2008). Ohio Department of Education. n.d.d. Value-Added Rules and Information. Ohio Department of Education. http://www.ode.state.oh.us/GD/Templates/Pages/ODE/ODEDetail.aspx?page=3&TopicR elationID=117&ContentID=14210&Content=36409 (accessed February 22, 2008). Ohio Department of Education. n.d.e. Memorandum: Explaining Contrasting SOAR

Olsen, Lynn. 2004. “Value-Added” Models Gain in Popularity. Education Week 24, no. 15 (November 17).

Parks, Parry. 1996. ABCs Will Be Aired at Education Forum. The Virginian-Pilot, February 23.

Pennsylvania Department of Education. 2006. Pennsylvania Education Summary: Pre-K through 12 School Statistics. Pennsylvania Department of Education. http://www.pde.state.pa.us/k12statistics/cwp/view.asp?a=3&Q=125758&k12Nav=|1146| (accessed February 28, 2008).

Pennsylvania Department of Education. 2007a. Enrollment: Statistical Highlights 2006-07: Pre-K through 12 School Statistics. Pennsylvania Department of Education. http://www.pde.state.pa.us/k12statistics/cwp/view.asp?a=3&Q=125758&k12Nav=|1146| (accessed February 28, 2008).

Pennsylvania Department of Education. 2007b. Building Data Report (Lunches Only) for October 2007 Eligible Children: Pre-K through 12 School Statistics. Pennsylvania Department of Education. http://www.pde.state.pa.us/k12statistics/cwp/view.asp?a=3&Q=125758&k12Nav=|1146| (accessed February 28, 2008).

Pennsylvania Department of Education. 2007c. Public School Enrollment by County 2006-07: Pre-K through 12 School Statistics. Pennsylvania Department of Education. http://www.pde.state.pa.us/k12statistics/cwp/view.asp?a=3&Q=125758&k12Nav=|1146| (accessed February 28, 2008).

Pennsylvania Department of Education. 2007d. PVAAS: Evaluating Growth, Projecting Performance. Pennsylvania Department of Education. http://www.pde.state.pa.us/a_and_t/lib/a_and_t/FINAL_Overview_for_IU_Trainings_112007vs2.pdf (accessed February 29, 2008).

Pennsylvania Department of Education. 2007e. Pennsylvania Department of Education. http://www.pdeinfo.state.pa.us/depart_edu/cwp/view.asp?a=13&q=121691 (accessed March 2, 2008).

Pennsylvania Department of Education. n.d. PVAAS: Pennsylvania’s Statewide Plan. http://www.pafamilyliteracy.org/a_and_t/lib/a_and_t/StatewidePlan071206.pdf (accessed February 28, 2008).

Peterson, Samantha. 1999. Durham Schools’ Results Mixed on State Report Card. The Herald-Sun, August 6.

PR Newswire. 2002. Schweiker Administration Starts Pilot Program to Help School Districts Better Measure Student Success. PR Newswire, October 29.

Raffaele, Martha. 2003. Education Reform Advocate Favors Linking Teacher Pay, Performance. The Associated Press, February 5.

Raudenbush, Stephen, and Anthony Bryk. 1988. Methodological Advances in Analyzing the Effects of Schools and Classrooms on Student Learning. Review of Research in Education 15: 423-475.

Rossi, Rosalind. 2002a. Education Plan Heavy on Data. Chicago Sun-Times, August 26, p. 8.

Rossi, Rosalind. 2002b. Deal Would Give Teachers Greater Input. Chicago Sun-Times, August 26, p. 18.

Rossi, Rosalind. 2007. New Incentive for Teachers; Bonuses Also Would Go to Principals, and Even Janitors and Clerks. Chicago Sun-Times, September 4, p. 18.

Silberman, Todd. 2000. NCAE Takes Test Program to Task. News and Observer, July 18.

Simmons, Tim. 1997. Schools Find Value, Vexation in ABCs Program. News and Observer, October 26.

Slayton, Julie. 2007. Resolution Regarding District Accountability Transformation Metrics: Performance Measurement Plan. Los Angeles Unified School District.

Spodek, Brent. 1998. City Schools Basking in ABCs Honor; Every Elementary, Middle School Meets 100% of State Goal. Chapel Hill Herald, August 7.

Stephens, Scott. 2007. A New Way of Assessing a School’s Effectiveness. Plain Dealer, December 14.

Stewart, Barbara Elizabeth. 2006. Value-Added Modeling: The Challenge of Measuring Educational Outcomes. Carnegie Corporation of New York. http://www.carnegie.org/pdf/value_added_chal_paper_mockup.pdf (accessed February 29, 2008).

Tennessee Education Association Legislative Report. 2004. Student Testing Subject of Extensive Discussion in Legislature, Value-Added System Primary Focus of House Education Hearing. Tennessee Education Association, March 3. http://www.teateachers.org/legreports/archives/2004-03-05_legreport.html (accessed March 16, 2008).

Tennessee Education Association Legislative Report. 2005a. Value-Added Assessment Not Popular in Education Committee: Senators Question Cost – Exclusive Arrangement with Sanders. Tennessee Education Association, April 8. http://www.teacteachers.org/legreports/archives/2005-04-08.htm (accessed March 16, 2008).

Tennessee Education Association Legislative Report. 2005b. Senators Question TVAAS Reporting – “Mixed Message” to Public Ed Commissioner to Review “Grade Cards” and Teacher Training. Tennessee Education Association, April 22. http://www.teacteachers.org/legreports/archives/2005-04-22.htm (accessed March 16, 2008).

Tennessee Education Association Legislative Report. 2008. TEA Supported Resolution Calls for Hearing on TVAAS: Educators Will Have the Opportunity to Raise Concerns. Tennessee Education Association, February 29. http://www.teacteachers.org/legreports/archives/2008-02-29.htm (accessed March 16, 2008).

Tennessee Education Association Press Releases. 2007. TEA President Wiman Questions Who Will Teach in Tennessee. Tennessee Education Association, May 12. http://www.teateachers.org/currentissues/mderel51207.html (accessed March 16, 2008).

Thompson, Bruce. 2002. Testing Gives Districts a Tool to Measure Progress. Milwaukee Journal Sentinel, January 13, Crossroads section.

U.S. Department of Education. 2007. The Nation’s Report Card: Reading 2007. National Center for Education Statistics, U.S. Department of Education. http://nces.ed.gov/nationsreportcard/pdf/main2007/2007496.pdf (accessed February 28, 2008).

Webster, William J. 1996. The Dallas Value-Added Accountability System. Dallas: Dallas Public Schools.

Wisconsin Education Association Council. 2004. Value-Added Assessment. Research Briefs 2 (August). Wisconsin Education Association Council. http://www.weac.org/PDFs/2004-05/ValueAddedAssessment.pdf (accessed March 8, 2008).

YourCranberry.com. 2007. Good News About Pennsylvania’s Public Schools. YourCranberry.com. http://www.yourcranberry.com/blog-entry/good-news-about-pennsylvanias-public-schools+ (accessed February 28, 2008).
