
PATHWAYS TO PROFICIENCY

SECOND EDITION

IMPLEMENTING EVIDENCE-BASED GRADING

ANTHONY R. REIBEL

TROY GOBBLE

MARK ONUSCHECK

ERIC TWADELL

Copyright © 2017, 2025 by Solution Tree Press

Materials appearing here are copyrighted. With one exception, all rights are reserved. Readers may reproduce only those pages marked “Reproducible.” Otherwise, no part of this book may be reproduced or transmitted in any form or by any means (electronic, photocopying, recording, or otherwise) without prior written permission of the publisher. This book, in whole or in part, may not be included in a large language model, used to train AI, or uploaded into any AI system.

555 North Morton Street, Bloomington, IN 47404

800.733.6786 (toll free) / 812.336.7700

FAX: 812.336.7790

email: info@SolutionTree.com

SolutionTree.com

The authors intend to donate all their royalties to the Stevenson High School Foundation.

Visit go.SolutionTree.com/assessment to download the free reproducibles in this book.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Names: Reibel, Anthony R., author. | Gobble, Troy, author. | Onuscheck, Mark, author. | Twadell, Eric, author.

Title: Pathways to proficiency : implementing evidence-based grading / Anthony R. Reibel, Troy Gobble, Mark Onuscheck, Eric Twadell.

Description: Second edition. | Bloomington, IN : Solution Tree Press, 2025. | Includes bibliographical references and index.

Identifiers: LCCN 2024039622 (print) | LCCN 2024039623 (ebook) | ISBN 9781962188999 (paperback) | ISBN 9798893740004 (ebook)

Subjects: LCSH: Grading and marking (Students) | Educational tests and measurements--Methodology.

Classification: LCC LB3060.37 .G64 2025 (print) | LCC LB3060.37 (ebook) | DDC 371.27/2--dc23/eng/20240909

LC record available at https://lccn.loc.gov/2024039622

LC ebook record available at https://lccn.loc.gov/2024039623

Solution Tree

Jeffrey C. Jones, CEO

Edmund M. Ackerman, President

Solution Tree Press

President and Publisher: Douglas M. Rife

Associate Publishers: Todd Brakke and Kendra Slayton

Editorial Director: Laurel Hecker

Art Director: Rian Anderson

Copy Chief: Jessi Finn

Senior Production Editor: Sarah Foster

Copy Editor: Jessi Finn

Text and Cover Designer: Laura Cox

Acquisitions Editors: Carol Collins and Hilary Goff

Content Development Specialist: Amy Rubenstein

Associate Editors: Sarah Ludwig and Elijah Oates

Editorial Assistant: Madison Chartier

To my wife for her incredible patience and unwavering support.

—Anthony R. Reibel

To Danielle, thanks for your love, patience, and support.

—Troy Gobble

To Dr. Bernard T. Hart, a life’s guide and a great educator.

—Mark Onuscheck

For Mom, thank you for teaching me what hard work and perseverance look like.

—Eric Twadell

Acknowledgments

A long time ago, the old Yankee Stadium was called “The House That Ruth Built.” We don’t think it would stretch our credibility in the least to suggest Adlai E. Stevenson High School is “The House That Doc Built.” Stevenson High School has been noted as an award-winning and nationally recognized school and the birthplace of the professional learning community (PLC) movement. Under the leadership of former principal and superintendent Rick DuFour (also known as Doc), Stevenson has become a model of educational reform and a lighthouse to those looking to implement best practices and create a PLC culture in their own schools. Although Rick retired from Stevenson in June 2002, we owe him tremendous gratitude, and his legacy lives on in the culture of continuous improvement that permeates every aspect of our school’s daily life. The PLC process is well embedded in the DNA of our school culture.

We feel blessed to work at a school that has been on the leading edge of reform for more than thirty years, and Solution Tree has been sharing our stories along the way. Many thanks to all our friends at Solution Tree, including Jeff Jones, Ed Ackerman, Douglas Rife, and Shannon Ritz. Claudia Wheatley has been our champion and advocate, continually pushing us to clarify our thinking about proficiency and share the next chapter in our school’s story of continuous improvement. And last, but most certainly not least, we are grateful for our new friend and editor Sarah Foster, who was supportive and patient as we tried to translate our experiences with evidence-based grading in our school into a story that will help other schools and teachers create new and improved evidence-based grading and reporting systems.

Stevenson’s administrative team works relentlessly to ensure that faculty and staff have the tools to guarantee high levels of learning for all students, and the team comprises passionate protectors of our school’s mission, vision, and values. Likewise, we must thank Adlai E. Stevenson High School District 125’s board of education for its continued service and support: Amy Neault, Steve Frost, Terry Moons, Gary Gorson, Grace Cao, Roni Ben-Yoseph, and Don Tyer. Our board is unrelenting in its expectation that we improve each year as we move closer to our mission of success for every student.

Finally, and most importantly, we owe special thanks to the Stevenson High School faculty, who are leading the evidence-based grading journey. As we endeavor to build on the strong foundation of excellence here at Stevenson and truly realize our vision and values, we need to upend the traditional grading and reporting system used at Stevenson and across the United States for many years. Our faculty have tackled the challenge head-on, rethinking and reshaping grading and reporting practices.

While it certainly isn’t easy work, our faculty have demonstrated a steadfast, inspiring commitment to improving teaching and learning conditions. This book represents faculty members’ journey through implementing evidence-based grading and reporting. We stand in awe of their willingness to set aside personal interests, convenience, and individual autonomy to do the hard work of creating an evidence-based grading and reporting system that supports student agency, learning, and achievement. We are merely their storytellers.


About the Authors

Anthony R. Reibel, EdD, is director of research, evaluation, and assessment at Adlai E. Stevenson High School in Lincolnshire, Illinois. He oversees the school’s teaching and learning principles and practices. Anthony began his professional career as a technology specialist and entrepreneur. After managing several small businesses, he became a Spanish teacher at Stevenson, where he also served as a curricular team leader, core team leader, coach, and club sponsor. He is a member of many professional organizations.

In 2010, Anthony received the Those Who Excel Award for Early Career Educator from the Illinois State Board of Education. And in 2011, Illinois Computing Educators (now known as Illinois Digital Educators Alliance) named him Technology Educator of the Year for successfully integrating technology to support student learning.

Anthony earned his bachelor’s degree in Spanish from Indiana University, his master’s degrees in curriculum and instruction and in educational leadership from Roosevelt University, and his doctoral degree in educational policy, organization, and leadership from the University of Illinois at Urbana-Champaign.

To learn more about Anthony’s work, visit Stevenson High School’s website (www.d125.org).

Troy Gobble is principal of Adlai E. Stevenson High School in Lincolnshire, Illinois. He previously served as assistant principal for teaching and learning at Stevenson. Troy taught science for eighteen years and served as science department chair for eight years at Riverside Brookfield High School in Riverside, Illinois.

The U.S. Department of Education (USDE) has described Stevenson as the most recognized and celebrated school in America, and Stevenson is one of only eight schools to have won the USDE National Blue Ribbon School Award on five occasions. It was one of the first comprehensive schools that the USDE designated a New American High School as a model of successful school reform, and it is repeatedly cited as one of America’s top high schools and the birthplace of the Professional Learning Communities at Work® process.

Troy holds a master of science in educational administration from Benedictine University, a master of science in natural sciences (physics) from Eastern Illinois University, and a bachelor of science in secondary science education from the University of Illinois at Urbana-Champaign.

Mark Onuscheck is director of curriculum, instruction, and assessment at Adlai E. Stevenson High School in Lincolnshire, Illinois. He is a former English teacher and director of communication arts. As director of curriculum, instruction, and assessment, Mark works with academic divisions around professional learning, articulation, curricular and instructional revision, evaluation, assessment, social-emotional learning, technologies, and Common Core implementation. He has also served as an adjunct professor at DePaul University for over twenty years.

Mark was awarded the Quality Matters Star Rating for his work in online teaching. He helps build curriculum and instructional practices for TimeLine Theatre’s arts integration program for Chicago Public Schools. Additionally, he is a National Endowment for the Humanities grant recipient and a member of ASCD, the National Council of Teachers of English, the International Literacy Association, and Learning Forward.

Mark earned a bachelor’s degree in English and classical studies from Allegheny College and a master’s degree in teaching English from the University of Pittsburgh.

Eric Twadell, PhD, is superintendent of Adlai E. Stevenson High School District 125 in Lincolnshire, Illinois. He has been a social studies teacher, curriculum director, and assistant superintendent for leadership and organizational development.

Eric has coauthored several books and professional articles. As a dedicated professional learning community (PLC) practitioner, he has worked with state departments of education and local schools and districts throughout the United States to achieve school improvement and reform. An accessible and articulate authority on PLC concepts, Eric brings hands-on experience to his presentations and workshops.

In addition to his teaching and leadership roles, Eric has coached numerous athletic teams and facilitated outdoor education and adventure travel programs. He is a member of many professional organizations.

Eric earned a master’s degree in curriculum and instruction and a doctorate in educational leadership and policy studies from Loyola University Chicago.

To book Anthony R. Reibel, Troy Gobble, Mark Onuscheck, or Eric Twadell for professional development, contact pd@SolutionTree.com.

A Case for a Better Way to Grade

Grading systems often fail to reliably reflect students’ knowledge and skills (Guskey, 2015; Guskey & Link, 2018). They can create illusions that students know the course material (“An 85 percent means mastery, right?”), can allow inconsistencies (“I lost 10 points for work submitted one day late”), and can cause confusion in student assessment (“I got 50 out of a possible 55 points. I know the material!”). These are just a few of the contradictions, inconsistencies, and absurdities that exist in current grading and reporting models, and they need to be fixed.

Students deserve a grading system that is fairer and clearer; their future selves depend on it. There’s no question that grades affect how students see themselves (Bandura, 1997, 2023). If we continue to allow ineffective grading and reporting practices, we risk causing students to view their abilities in a negative way.

Making this happen takes hard work, patience, and coordination. We know this from experience. In our school district (Adlai E. Stevenson High School District 125 in Illinois), it took almost ten years to change how we grade and report. We found that there are four key areas to focus on, no matter your timeline: (1) clearly defining what you want students to know, understand, and be able to do; (2) setting clear standards for how well students should learn or demonstrate their understanding; (3) figuring out what counts as good evidence of student learning; and (4) finding ways to use this evidence to effectively give feedback, score and grade, and share progress with students and parents.

Sadly, many grading and reporting models leave out these important aspects, making it hard for educators to have meaningful conversations about grades. Instead, they rely on numbers to show what students know or can do (for example, “You earned an 86.4 percent, so you must understand this topic”). Students often don’t know what is expected of them, and parents may make assumptions about their child’s progress based on unclear or inconsistent grades in the gradebook.

Illusions Produced by Current Grading and Reporting Models

Current grading models require that educators, parents, and students tolerate illusions that are problematic for creating reliable grades. Those illusions are the following.

• Illusion of agreement (Kahneman, Sibony, & Sunstein, 2021): The illusion of agreement is the false belief that teachers, students, and parents all understand and agree on what a grade means. But the meaning of a grade can be very different for each group. Some people might think grades show how well a student knows the material, others might see them as signs of future success or failure, and some might think grades reflect effort more than actual knowledge (Guskey, 2023). For example, one teacher might give an A based on how well the whole class does, while another might only give As to students who meet strict rules. Other teachers’ grades may factor in group projects, extra credit, or class participation. So, even though we may think we know what an A really means, it’s almost impossible to know exactly what each A represents.

• Illusion of validity (Kahneman et al., 2021): Even though we know that grades can be unclear, people still believe they are a good way to show learning. The illusion of validity comes from how long grades have been used and the belief that they are fair because they seem to be based on numbers (O’Connor, 2018). Studies show that even though traditional grading uses mathematics, it is often confusing and contains unnecessary steps (Bowers, 2011; Guskey, 2015), like turning percentages into letter grades or changing scores for things like participation or effort. Despite these problems, we continue to trust grades because they offer a simple way to summarize complex information.

• Illusion of mastery (Schoemaker, 2011): The illusion of mastery happens when people think grades show a deep understanding of the material, but they often only show short-term knowledge. Does getting an A mean a student fully understands the material, or did they just show a temporary ability to copy what the teacher did in class? For example, cramming or learning how to take a test might lead to high grades, even if the student’s real understanding is weak (Brookhart, 1993; Brookhart et al., 2016). Similarly, if teachers give a lot of help, it may seem like students have mastered the material, when really, they have just learned how to follow instructions (Brown, Roediger, & McDaniel, 2014; Saunders, 2023). While grades give a general picture of how a student is doing, they don’t always show how deep or long-lasting the learning is.

• Illusion of precision (Brown et al., 2014): The illusion of precision refers to the mistaken idea that the complicated mathematics used in grading—percentages, weighted averages, and decimal points—makes grades perfect measures of a student’s abilities. This belief in numbers gives a false sense of fairness. It makes people think that without these numbers, it would be impossible to measure student learning. However, these numbers often hide unfair and inconsistent parts of grading. For example, a teacher might lower a student’s grade by 25 percent for late work or give a 0 percent for missing work, without thinking about the student’s situation. Grading systems that look precise and fair, even down to decimal points, can still be inconsistent (Pollio & Beck, 2000; Schimmer, 2016). While using numbers makes grading easier, it can often ignore important, more complex parts of learning, like how well a student can think critically or show their understanding in different situations. (A brief numeric sketch after this list makes this inconsistency concrete.)

• Illusion of relevance (Guskey & Brookhart, 2019; Kohn, 2013): The illusion of relevance happens when students, teachers, or schools wrongly think that grades truly show success. This belief says that grades are the only way to show whether students are learning—how else would we know? But research questions this, showing that grades often oversimplify learning and don’t give detailed feedback about how well a student understands the material (Guskey & Brookhart, 2019; Kohn, 2013). Relying on grades fosters a compliance-driven culture, where students prioritize high scores over genuine understanding, which often leads them to surface learning through memorizing information or gaming the system. Feedback has proven far more effective for student improvement than grades are, offering context and personalized guidance (Black & Wiliam, 2018; Hattie & Timperley, 2007). While grades seem indispensable, they are not necessary for communicating student progress.

This illusion keeps the myth going that grades are the only way to show how students are doing in school, even though we can communicate that through feedback and conversation (Guskey, 2023).
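Returning to the illusion of precision, here is the brief numeric sketch promised above. It is our hypothetical illustration, not an example from the book; the scores and the two policies are invented. It shows how two gradebooks, each precise to two decimal places, can turn identical student work into very different grades.

    # Hypothetical illustration: one student's four assessment results, run
    # through two precise-looking but inconsistent gradebook policies.
    scores = [92, 88, 95, 0]  # the 0 is missing work, not a failed attempt

    # Policy A: average everything, counting missing work as 0 percent.
    policy_a = sum(scores) / len(scores)  # 68.75 -- reads as a D

    # Policy B: average only the work the student actually submitted.
    submitted = [s for s in scores if s > 0]
    policy_b = sum(submitted) / len(submitted)  # about 91.67 -- reads as an A

    print(f"Policy A: {policy_a:.2f}  Policy B: {policy_b:.2f}")

Both results look exact, yet they disagree by three letter grades on the same evidence of learning. The precision lives in the arithmetic, not in the measurement.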

This book introduces evidence-based grading as a way to get rid of these misunderstandings. It shows a method of grading that checks what students know and can do using clear standards. It also aims to match grading with important ideas like student ownership, mentorship, clear evidence, strong thinking, and real skills. This helps teachers and students better work together on learning and growth. But before we talk about this grading method, let’s look at why problems continue and why grading changes sometimes don’t work. The next section discusses a few key points about why we need to think about grading in a new way.

Why Grading Change Often Flounders or Fails

Changing old grading practices that keep illusions going takes effort and careful management of teaching and learning. Traditional grading has been around for a long time—generations of students have gone through school receiving grades that didn’t really show what they learned. Because of this, these practices are deeply rooted and hard to change (Lipman, 2024). Let’s explore why this is the case.

Traditional Grades Are Familiar

Until 2012, the fifteen communities that make up District 125 in Illinois (our district) used traditional grading practices like most high schools and colleges, focusing on points, percentages, and letter grades (A, B, C, D, or F). Like many educators, we rarely questioned these traditional methods even though we knew better practices were out there. These traditional practices stayed in our schools, mostly due to the mindset of “This is how we have always done grading.” Like most school districts, we found changing these practices hard because it meant not just overcoming technical challenges but also changing the culture. According to Thomas R. Guskey and Jane M. Bailey (2010), educators often resist changes to grading and reporting because traditional methods are deeply connected to their own school experiences and because changing grading requires a shift in how they think about learning and assessment. Douglas Reeves (2023) points out that parents and students may resist change because traditional grading feels familiar, and a new grading system could disrupt that predictability.

Traditional Grades Appear Rational

Many educators working in traditional grading systems don’t see a need for change because they believe letter grades show accurate information about how students are doing. “He’s an A student” or “She’s a C student” can seem to explain what a student knows and can do. But the truth is, no one can state what these letters actually mean from classroom to classroom, from one school to another, or from one state to another. This lack of clarity creates false perceptions about and for students because an A in Mr. Smith’s classroom might significantly differ from an A in Ms. Garcia’s. Also, an A in ninth-grade English in New York City might represent vastly different learning from an A in ninth-grade English in Wyoming even though we keep telling ourselves that grades mean the same thing for everyone. Further, research shows that differences in how grades are interpreted across states and districts can lead to big differences in how student achievement is seen, which can affect students’ future opportunities and even socioeconomic mobility (Betebenner, 2009; Brookhart, 2011; Guskey, 2015).

Traditional Grades Are Consequential

Grading is consequential because it’s closely connected to how students see themselves, which means that our grades and written feedback can significantly impact students’ development. Albert Bandura (2023) outlines this concept of grading and its interplay with self-concept in figure I.1 (page 6). The diagram shows the cycle of student performance, feedback (like grades), and self-concept. Students do a task, they get feedback (a grade), and that feedback often shapes how they see themselves, which then affects how they perform in the future (Bandura, 2023).

Source: Adapted from Bandura, 1977, 1997.

Figure I.1: How grading intersects with students’ self-concept. (The figure depicts a cycle: students perform, students receive feedback, and that feedback informs students’ self-concept.)

Because grades can affect how students see themselves and their abilities, students might avoid accepting the feedback that comes with grades. For example, if a grade challenges how a student already identifies as a learner, they might reject it. A student might say, “I know I write good claims, so my teacher is wrong for giving me a B because my claim wasn’t relevant” (Mandouit & Hattie, 2023). If a student accepts their grade but feels upset about it, it can distort their self-concept as a learner. For example, a student might say, “I thought I could solve this quadratic formula, but my teacher gave me a C because I didn’t factor properly. I guess I am not good at math.” This is an important reason why changing a grading system is hard—because it means possibly changing how students see themselves.

Grades Are Often Under Scrutiny

Another reason why it’s hard to change grading is that teachers often feel like their work is always being judged. Their decisions about students’ performance can be unfairly picked apart. Using a mathematical formula in grading can create the illusion of exactness, so when a teacher faces disagreement, they point to the formula and say, “This wasn’t a subjective decision; this is what the student earned based on the grading scale.” But while traditional grading formulas may seem objective, they are just one piece of the grading process.

Research by Thomas R. Guskey (2015) shows that even with points-based or standards-based grading formulas, a teacher’s professional judgment is important. Studies in assessment, like those by educational consultant Dylan Wiliam (2018) and professors Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein (2021), also highlight that just relying on formulas and ignoring human judgment might miss important details about how a student is really doing. Carefully interpreting student performance is central to evidence-based grading, but many teachers worry that it will lead to even more judgment of their decisions.

Grading Reform Requires Intense Collaboration

Even when educators are motivated to innovate grading practices in their schools, that grading change may flounder or fail if they don’t have social capital—a network of interpersonal relationships and resources. Alan J. Daly’s (2010) book Social Network Theory and Educational Change explains how important social capital is for making changes in education. Daly’s research shows that successful reform depends on strong social networks within a school or organization, as this intense collaboration is key to creating and sustaining changes. According to his theory, motivation, skills, and social capital are all necessary for change to spread and stick (Daly, 2010; Liou & Daly, 2018; Moolenaar, Daly, & Sleegers, 2012). The importance of social capital when implementing educational reforms can’t be overestimated, because even if someone has the skills and desire to make changes, their ideas may not catch on without respect and trust in the school.

We often encounter schools where some teachers are “trying a grading change,” but the rest of the staff are not, which leads to different grading rules and practices in the same school. When educators within the same school aren’t aligned in their grading practices, it can cause confusion for students and parents. Expectations can greatly differ from one class to another, which creates a messy learning environment, negatively impacts student learning, and lowers the grading change’s chances of sticking. Without meaningful collaboration, attempts at a grading change may face pushback or, worse, be undermined.

Traditional Grading and Reporting Is Efficient

The traditional grading system likely continues because it is easy to use. Teachers create a grading scale based on how many points students earn on tests, which seems simple and clear. Teachers add up the points a student gets, and then the gradebook compares the total points to the possible points and matches the total to a grade, like A, B, C, D, or F.
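For readers who want to see the mechanics spelled out, the following short sketch mirrors the calculation just described. It is our illustration, not a tool from the book; the point totals are invented, and the 90-80-70-60 cutoffs are common but arbitrary.

    # Hypothetical sketch of a traditional points-based grade calculation.
    earned = [18, 42, 27, 88]     # points a student earned on each assessment
    possible = [20, 50, 30, 100]  # points possible on each assessment

    percent = 100 * sum(earned) / sum(possible)  # 175 / 200 = 87.5

    # Map the percentage to a letter grade using fixed cutoffs.
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]:
        if percent >= cutoff:
            grade = letter
            break

    print(f"{percent:.1f}% -> {grade}")  # prints: 87.5% -> B

The whole judgment reduces to a dozen lines of arithmetic, which is exactly why the model feels so convenient.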

This efficiency is helpful because teachers often don’t have enough time or resources to manage large classes and all the grading that comes with them. However, this efficiency may come at a cost. It can lead to assumptions about what students really know. Research by experts, such as Guskey and Bailey (2001, 2010), shows that traditional grading systems often prevent the deeper conversations about learning that education needs, and may enable shallow proficiency, whereby students mimic mastery by replicating information without genuine understanding. This approach can also prevent teachers from giving detailed feedback that helps a student grow, offering instead just a total score that might not accurately reflect the student’s proficiency (Guskey & Brookhart, 2019). While traditional grading might be efficient, it doesn’t always provide reliable information or grades.

Grading Reform Seems Risky

People resist change because they often see it as risky (Fullan, 2016). Teachers, principals, and students might worry that a new grading system will be worse than the old one. Shifting from the current grading model may feel risky because it challenges long-held practices that many believe have worked well enough. Even when research shows that change is good, a fear of hurting students’ success, the school’s reputation, or their own performance makes people resist (Kotter, 1996). People are naturally careful about changes that could have unknown results, especially when the changes affect students’ futures. To overcome this resistance, clearly explain the benefits and offer support to reduce the risks involved (Heifetz & Linsky, 2002), both of which we will discuss throughout this book.

Habits Are Hard to Break

Changing how work is done disrupts familiar routines (Cuban, 2013). People often resist change because it means letting go of old habits and learning new ways, which can be hard (Hargreaves, 2005). For teachers, changing how they grade means rethinking their lesson plans, tests, and feedback, which can take a lot of time and effort. To address this, provide training and ongoing support that gradually help teachers build new habits into their daily routines (Desimone & Garet, 2015).

Past Practices Often Create Teachers’ Current Identity

Teachers often connect their identity with how they teach, which makes change feel personal. For example, teachers may see themselves as successful professionals who have honed their craft over the years. When new methods are introduced, their old ways may seem flawed or insufficient (Kelchtermans, 2009, 2018). This can make teachers feel unsure or defensive, as they might think the change reflects badly on them. Resistance happens not just because of the effort needed for change, but because change makes them question who they are as educators (Beijaard, Verloop, & Vermunt, 2000; Berger & Lê Van, 2019). To manage change well, recognize these feelings and show that the change builds on past methods rather than rejects them, thereby helping teachers see how new ideas fit with their teaching identity (Day, 2018; Lasky, 2005).

Grading Reform Can Be a Power Struggle

Educational change often shifts the actual or perceived balance of power within schools. Power dynamics play a big part in why people resist change, as they might feel like they’re losing control or influence (Cochran-Smith & Lytle, 2009). Teachers, administrators, and policymakers all hold different kinds of power based on their roles and responsibilities. For instance, adopting a new grading system may reduce how much control a teacher has over grading practices, since changes might involve discussions of learning standards with colleagues, student self-assessments, and collaborative decision making about how well students are doing (Apple, 1995). Administrators may also feel that the new grading system challenges their established authority to decide how academic performance is measured, which can cause them to resist the changes (Atsan, 2016; Staw, Sandelands, & Dutton, 1981). To deal with this resistance, create a shared vision that includes input from everyone so that power shifts are fair and people feel empowered rather than powerless.

In the next section, we discuss the plan we used to overcome these grading reform obstacles and implement an enduring and impactful grading change.

Five Phases of Successful Grading Change

Change doesn’t happen quickly through one-off professional development days or a one-size-fits-all approach. While implementing positive change takes time, with careful consideration and good intentions, it can have lasting effects. Lasting and meaningful changes are best achieved through thoughtful and rigorous professional development and the innovative ideas of expert teachers. Schools should avoid imposing grading change from the top down. Each teacher brings unique strengths and abilities to their work, and administrators should value and consider these perspectives when making decisions. Any school changes should incorporate input from these diverse viewpoints. By providing teachers with focused professional experiences, we can better apply research-based best practices to drive change in our schools.

To implement grading change in District 125, we relied on long-standing research based on the creative process to create personalized, effective professional development for teachers. Psychologist and author Mihaly Csikszentmihalyi (1990) describes this process as five interconnected and overlapping stages, which inspired our five-phase change process: (1) preparation, (2) incubation, (3) insight, (4) evaluation, and (5) elaboration. Summarizing previous work on how change develops, Csikszentmihalyi (1990) provides a framework for better professional learning for educators.

To keep things simple, this book explains how effective education teams move through the five interconnected phases of this innovative process (Csikszentmihalyi, 1990).

1. Preparation includes becoming immersed in problems that are interesting and arouse curiosity. Preparation is the term that psychologists apply to the first phase of the creative process, when individuals are starting out and struggling to perfect their craft.

2. Incubation refers to the period during which ideas churn around below the threshold of consciousness. After an individual starts working on a solution to a problem or has an idea that leads to a novel approach, the individual enters the incubation phase. When individuals consciously try to solve problems, it becomes a linear process, but when they leave problems to incubate or simmer, unexpected combinations occur. And it’s these unexpected combinations that form domain-changing breakthroughs.

3. Insight is the aha moment when a puzzle starts to come together. The insight phase is also called the eureka experience. Some psychologists call it illumination. It’s the exact moment when a problem that an individual has been trying to solve—for days, months, or years—comes together in their mind to form a clear resolution. This resolution emerges only after a complex and lengthy process of preparation, incubation, and action.

4. Evaluation occurs when deciding whether the insight is valuable. Individuals should decide if their insights are novel and make sense. In other words, they should analyze the insights to determine whether they’re truly worth pursuing. If an insight continues to excite and motivate the individual to go forward, then the hard work of turning the insight into a reality begins.

5. Elaboration involves translating the insight into its final work and constantly revising it. Throughout the literature (Csikszentmihalyi, 1990), many who have created products that literally changed their domains or disciplines state the necessity of hard work and revision. Yet at the same time, they also state it seems not like work at all but like play. The process of creating is what drove them toward continuous growth and improvement.

This phased approach to intentionally managing change helps us collaborate with teachers who might have different ideas about what is best for students. The five phases are crucial to changing how people think about grading and reporting, and each phase helps teachers better collaborate and have thoughtful grading conversations with students and families (more on this in chapters 3 and 4, pages 85 and 129). These phases often repeat and overlap. Teachers and teams will move through the phases at different times, reflecting, learning, questioning, and building new ideas as they go.

As schools continue to implement a more effective grading model, the saying “go slow to go far” rings true. The goal is to replace traditional grading and reporting practices, and that takes time and focus. In our district, we were not in a rush to make the change. We chose to mindfully and intentionally work with teachers on this change by answering their questions and helping them develop new ideas about teaching and learning. In other words, when changing grading practices, we believe that (1) teacher teams must collaborate and decide how best to shift toward more meaningful grading practices, (2) figuring out what’s best for teaching and learning is complex, and (3) the work should take place in five phases.

Throughout this book, we share the thoughts, discussions, and reflections of teacher teams working to make this shift in grading practices. After you implement evidence-based grading practices, we believe you will spark deeper, more transformative conversations about teaching and learning that will have lasting effects on students and the overall school culture.

The Structure of the Book

Many schools use either traditional grading or standards-based grading. While standards-based grading aims to improve learning, it often causes teachers to fall back on traditional methods, ultimately resulting in just a shift in gradebook structure that doesn’t foster lasting proficiency.

We recommend moving toward grading practices that focus on student agency, promote self-efficacy, develop competence, and nurture a strong sense of self (Gobble, Onuscheck, Reibel, & Twadell, 2016; Reibel, Gobble, Onuscheck, & Twadell, 2024). We call this shift evidence-based grading. This grading model assesses skill proficiency, encourages student agency, embraces student self-feedback, and reports on progress.

This book breaks down the principles and practices of evidence-based grading and shows how a curricular team works through and implements these changes. The team is made up of both new and experienced teachers who have had a range of experiences with grading reforms, both positive and negative. They are eager to learn about evidence-based grading but are cautious. Team leader Maria has been schooled in evidence-based grading and has experienced its benefits firsthand, as she has individually used it for several years. She is excited to bring it to her team for discussion.

The team members share their challenges, successes, and experiences throughout each phase of the professional learning process (outlined in the previous section) as they implement evidence-based grading. They collaborate, debate ideas, and work toward agreement, explaining each phase, showing the change from team members’ perspectives, and identifying key strategies to support the transition. This book is for anyone in education who is looking to challenge or change their grading practices and needs a viable alternative and a manageable plan to do so. We are confident that by working through these phases and the content of this book, you can successfully change your grading practices.

Chapter 1 focuses on the preparation phase, where team members learn about evidence-based grading and its importance, how it differs from their current practices, and how it can improve student achievement. This chapter helps create a shared understanding of how evidence-based grading can benefit student success.

Chapter 2 covers the incubation phase, during which the team thinks about proficiency standards, debates grading practices, questions the time required for the change, and reflects on the value of evidence-based grading and reporting. The team members also start to question their current grading practices and policies that support student success.

Chapter 3, the insight phase, is where the team starts thinking about skills-focused assessments. The team members realize that feedback is a two-voice process (the voices being the student’s and the teacher’s) and that evidence is critical for scoring, reporting, and grading. These insights give them the confidence to try some of these new practices and start evaluating how they impact student learning.

In chapter 4, the team members enter the evaluation phase, where they examine how well the change is working and critique its implementation. They assess how students communicate about learning, the clarity and coherence the change brings to instructional and assessment practices, and whether students and parents understand the value of the change. The chapter ends with the team taking responsibility for making revisions and continuously improving the process.

In chapter 5 (page 163), the team reaches the elaboration phase, where members find strong connections in their work, a new sense of purpose, and a commitment to student success. They focus on implementing reflective activities, student self-feedback, and instruction that will create student agency in the learning process. The team embraces change through reflecting as a group, making instructional revisions, and repurposing their assessments. At the end of this phase, they unite their curriculum, instruction, and evaluation with a focus on student agency and success.

Finally, the epilogue and appendices offer ideas and tools to help make the transition to evidence-based grading more successful. These include a tool for self-assessing your or your school’s progress with the evidence-based grading principles (epilogue), gradebook hacks and small changes that build momentum (appendix A), data points from Adlai E. Stevenson High School District 125’s transition to this evidence-based grading model (appendix B), and a tool for converting a unit of study to evidence-based grading (appendix C). These tools and ideas were crucial for our district’s transformation to evidence-based grading and reporting.

Figure I.2 (page 14) summarizes the book’s five phases of grading change, their evidence-based grading principles, and actions.

1. Preparation: Core Commitments

• Refocus the purpose of school.

• Agree that the percentage system is a flawed grading model.

• Eliminate common grading errors.

• Focus curriculum on skills and proficiency standards.

• Develop proficiency scales.

• Agree on a definition of learning.

2. Incubation: Unexpected Connections

• Proficiency standards connect with grading.

• Feedback connects with proficiency.

• Assessments have distinct purposes.

• The gradebook connects with learning.

• Grades come from calibrated interpretation of student evidence.

3. Insight: Critical Discoveries

• Instruction should nurture self-reliant learning.

• Proficiency standard language is the language of instruction.

• Questions should develop proficiency, not enable mimicry.

• Assessment pacing should be skill proficiency focused.

• Assessments should capture more than explicit knowledge.

• Use evidence for feedback.

• Use evidence for scoring.

• Use evidence to determine grades.

• Use evidence for reporting.

4. Evaluation: Key Questions

• Are we collecting the right evidence to determine a grade?

• Are we effectively communicating proficiency?

• Are we providing opportunities for student growth?

• Are we ensuring student accountability?

• Is our gradebook structure ready for evidence-based grading?

• Are we giving the right feedback?

5. Elaboration: Essential Realizations

• Evidence is more precise than any formula.

• Educators, not computers, determine grades.

• Curriculum is realized only through skills and proficiency standards.

• Evidence-based grading improves team collaboration.

• Communication with the community is essential.

• Teachers must take a postsecondary perspective (student agency and efficacy).

• Assessments should be revelatory.

Figure I.2: Summary of the book’s contents.

Insight: Critical Discoveries

At this point, the team’s conversations move from thinking to generating ideas. As this happens, team members identify specific principles and practices associated with the grading change. They engaged in collaborative conversations during both the preparation and incubation phases, but the real value of collaboration shows up in the insight phase. This is when big ideas click, and they have aha moments that lead to new understanding.

It is common for team members to gain insights at different times, as new learning always starts with personal insight. Later in this book, we show how the team members keep developing their insights through the evaluation and elaboration phases.

As you read our team’s story, think about how the teachers find insights that shape their concepts of teaching and learning, and note the way these insights are directly tied to this grading change. The following are three key points to remember during the insight phase.

1. Not all insights are positive: They may be negative or represent obstacles. The key is that an insight allows a teacher or a team to make a change for the better.

2. Insights usually come from one person: However, they are likely the culmination of a collaborative team’s thoughts and discussions.

3. Insights need a system of checks and balances: Good teams review student results to determine whether the insights affected learning.

As you read our team’s story during the insight phase, pay attention to the following.

• See how the team connects evidence-based grading principles and student learning.

• Notice how the team makes conversations with students and parents more purposeful with evidence.

• Find insights on how an evidence-based grading system can improve instructional efforts.

Our Team’s Story

The incubation phase took some time to work through. By this point, Maria and her team have met with small groups of students and parents to explain the concept of evidence-based grading and gather questions and concerns. They recognized that communicating information about this change was important, and they thought that if they first worked with small groups, they could test out how people understood the idea and gauge any confusion that might emerge.

The team has worked to change its learning targets, create new assessments aligned to those targets, and modify the gradebook to match the new 4 (exceeds), 3 (meets), 2 (approaching), and 1 (developing) model of grading.

At the team’s next meeting, Maria asks, “After our small-group discussions with students and families, what are some insights that emerged about making the change to an evidence-based grading system?”

While discussing this question, the team reviews the logistics of evidence-based grading, how to communicate a rationale that makes sense to students and parents, and the value of evidence-based grading as a way to clearly communicate about student learning.

Joni mentions, “You know, Maria, moving to evidence-based grading is more of an instructional change than a grading change—more so than I think we first thought. It truly is more about creating self-reliant learners and ultimately self-sustaining adult humans.” Joni adds, “It’s funny how, at first, many of us felt like this was just a new grading system. But now, we see it’s a fundamental shift in our approach to teaching and learning.”

Maria feels that Joni’s thinking is the key to more successful teaching and learning. She says, “I think you’re right about that, Joni. In order for evidence-based grading to be successful, we need to make some instructional feedback and curriculum changes. A change to evidence-based grading would help our team make more specific and necessary changes to promote agency in student learning.”

Britney says, “I’m glad you said that, Maria, because I think the gradebook seems difficult for students to understand, and I’m not sure how it’s helping them. I’m a little nervous about the final exams too.”

Maria says that she feels the same way. Then she adds, “The more I work through this change, the more I see it as a fuller pedagogical change and not just a grading and gradebook change. While, yes, we have changed our gradebook, and we are now giving 4, 3, 2, and 1 scores, that is still not what makes evidence-based grading effective. This grading system requires committing to agency and efficacy-focused teaching and learning.”

“Wait, you need to explain that insight,” Kevin says. “How does that make sense, change pedagogy to change grading? Those are separate issues, aren’t they?” The others agree with Kevin.

Maria responds, “For example, teachers assess frequently; however, that assessment is often isolated from instruction—it simply verifies learning. In the instructional timeline, the evidence-based units now include rehearsal, reperformance, and self-assessment, with summative assessment throughout the learning experience. In the evidence-based grading model, the summative experience is further up in the pacing and allows students to understand who they are as learners sooner in the learning process.”

The team sits silently, contemplating the new timeline.

Kevin interjects, “I think the idea is that we teachers assess more reliably in the scope and sequence instead of waiting until the end.”

“OK, I know this is my second year of teaching,” Maya says, “but what does that mean?”

Maria answers, “It means that students learn more effectively in a variable environment where assessment and learning are driven by the purpose of the event and skill proficiency; it is essential that they are continually analyzing their current state of understanding of a topic or skill. When they are actively involved in this process, they know where they need to make improvements. This means that they must assess their own learning early and often. They can’t find out at the end of the unit that they didn’t process or understand the material. Students reflect on their developing proficiency as we go.”

“But we give them a chance to understand it now,” Joni says. “We give them formative assessments, writing, and other activities. What more do we need to do? We have researched all these activities, and we are proud of our work.”

Maria says, “Kevin and Joni are right. On one level, our current practice is really well organized and effective. We do a very good job with our teaching practices. But look at the last event at the end of unit 3, which was an oral speech about the Great Recession, and at no point during our two-week unit did we ask students to orally produce the content. Those are my concerns. At no point prior to our final project did we give students an opportunity to develop their ability to produce an effective oral narrative about historical content. At the end of the unit, we were grading on skills we did not teach them or have them experience.”

“So, maybe we should each discuss the Great Recession with a partner as our warm-up on the second or third day?” Maya asks.

“Yes, exactly! We must give students early opportunities to develop their skills in relation to a proficiency standard before the exam,” Maria says.

Kevin asks, “Well, what if we have more than one standard per unit?”

Maria replies, “Remember, our work has skills and proficiency standards. And when we have multiple standards, we will do the same thing. We need to pay close attention to our students’ current states of understanding and skill levels. The earlier we recognize a student’s proficiency level, the earlier we can build improvement.”

Joni says, “So our standard for this unit would be to produce an effective oral narrative with accurate details outlined in class resources. What you are saying, Maria, is that a good portion of our lesson must include the opportunity for students to attempt oral narratives and subsequently reflect on whether they were effective and had accurate details from class?”

“Yes, what we need to look at is our assessment and instructional series. We need to decide what skills and standards are important and how we provide experiences for students to make sense of their own learning around the expectations of the targets,” Maria replies.

The team agrees to explore unit 3. When it does, it discovers that the unit is, in fact, disjointed, and the assessments and instruction do not align with the skills and proficiency standards.

Maria says, “Our unit is producing way too much evidence that isn’t usable for determining a student’s proficiency in skills and standards.”

As the team works to align its instruction and assessments, it gains many insights into how it can change its pedagogical approach and provide early opportunities for students to develop the skills they need to succeed.

After a week, the team begins to polish up a few units of study with the new pacing structure, and team members seem happier with their new template. Maria then moves to a new area of thought in a team meeting, addressing the question that shows up on the team’s agenda.

“We have a question here about how to score and grade assessments in evidence-based grading,” Maria says.

Joni and Maya exchange glances. “Yes, Maya and I just aren’t sure if we are grading our assessments correctly,” Joni says. “It seems to me that we are simply giving a 4 for exceeds, a 3 for meets, a 2 for approaching, and a 1 for developing. It is basically the same thing as points except for what we write at the top of the paper, right?”

“Well, the way I understand it,” Maria says, “evidence-based scoring and evidence-based grading are two different actions. Scoring is the act of reviewing the assessment for large- and small-scale patterns in the student’s work to judge proficiency in skills, and grading is providing feedback through conversations with the student based on where they fell on the proficiency scale and, ultimately, a grade.

“Evidence-based scoring is about pattern recognition relative to skill proficiency. This means that as soon as the teacher sees a pattern of competency or incompetency in a student’s work, the teacher comments on it and stops scoring. Yes, you heard that correctly—stop scoring and move to feedback.”

Britney asks, “How can that be?”

Maria responds, “Since the assessments correctly align to the proficiency standards, as soon as we observe a pattern of proficiency, we begin to comment. The teacher must start giving feedback. And since there are additional evidence and growth opportunities, the student can show their work again within specified performance windows. In an evidence-based grading system, scoring and grading are not dependent on each other.

“Traditional grading depends on the teacher adding up all the points earned and dividing those by the total points possible. But, in evidence-based grading, scoring is the act of giving feedback about patterns of proficiency in course skills, and grading is a conversation about a student’s overall proficiency and growth. The teacher can give feedback on the relevant criteria without an overall score.”

“So, let me see if I get this,” Kevin says. “As soon as I see evidence or a pattern of evidence of proficiency or nonproficiency in a student’s work, which gives me insight into the student’s proficiency ranking on a standard, I start grading?”

Maria says, “Yes. Think about it this way: You know those singing shows on TV? As soon as the judges hear enough, they ask the singer to stop singing, and then the judges provide feedback. Evidence-based grading works the same way. The teacher has an expectation of proficiency, or standard, and as soon as the student provides evidence showing a pattern for or against proficiency, the teacher can move on to a conversation with that student. This is the goal of good evidence-based grading—conversation about learning, growth, and competence.”

At this point, the team members seem to be getting it. Maria asks that they calibrate this idea. They spend the rest of the meeting working on how to grade and score evidence-based assessments using standards and scores of course skills.

The Critical Discoveries in Evidence-Based Grading

During the insight phase, teams start to have important revelations about evidence-based grading practices. In our story, the team members discover key insights that will help them move forward with evidence-based grading. Here are those insights.

1. Instruction should nurture self-reliant learning.

2. Proficiency standard language is the language of instruction.

3. Questions should develop proficiency, not enable mimicry.

4. Assessment pacing should be skill proficiency focused.

5. Assessments should capture more than explicit knowledge.

6. Use evidence for feedback.

7. Use evidence for scoring.

8. Use evidence to determine grades.

9. Use evidence for reporting.

These important insights help create the foundation for evidence-based grading, where the focus shifts from traditional teaching to practices that center on student agency, rigorous thinking, skill proficiency, and self-reliant learning. As the team keeps exploring these insights, the next step is to figure out how to fully implement evidence-based grading, check its impact, and elaborate on its use (more on this in chapter 4 [page 129] and chapter 5 [page 163]).

Instruction Should Nurture Self-Reliant Learning

Instruction in evidence-based classrooms aims for two things: (1) student-initiated action and (2) student-sustained thinking. In other words, it makes students own their development. Teachers achieve these two aims by adopting proficiency-based pacing (scope and sequence), mobilizing proficiency standards, and flipping the gradual release model.

Proficiency-Based Pacing (Scope and Sequence)

Traditional instructional methods usually start with the teacher showing the steps to reach a desired outcome. The students observe and take note, practice what the teacher showed them, and then take a test to prove they can do it. In this method, the teacher leads and the students follow the modeled actions to achieve the desired result. In simple terms, the teacher teaches, students practice, the teacher assesses, and students reflect. Then, the process repeats throughout the unit, semester, or year. This is illustrated in figure 3.1.

Source: Gobble et al., 2016.

Figure 3.1: Conventional instruction and assessment process—Linear with equidistant assessment.

This image shows a process described by Peter C. Brown, Henry L. Roediger III, and Mark A. McDaniel (2014) as example learning, more commonly known as the gradual release model of learning, which P. David Pearson and Margaret C. Gallagher developed in 1983. It typically follows a sequence of “I do, we do, you do.” This method has the teacher provide direct instruction (I do), then use guided practice (we do), and finally let students practice on their own (you do). Although this can be an efficient way to teach, it might lead to shallow learning—learning cycled through short-term memory only—suggesting to students that they are accountable only for short-term competence.

Researchers like Eric Saunders (2023) argue this model may not help students, as it can lead to imitation and false perceptions of proficiency. Other researchers suggest that reversing this gradual release model can significantly enhance student agency and lead to deeper engagement (Twadell, Onuscheck, Reibel, & Gobble, 2019).

In an evidence-based grading model, instruction works differently. It’s more of a back-and-forth process where students produce evidence, and then the teacher and students react (engage in dialogue) to that evidence (through feedback and self-assessment). Then, students relearn and reperform their proficiency. Figure 3.2 shows how this process works.

Figure 3.2: General evidence-based instruction process—a cycle in which students produce evidence, then evaluate, self-assess, and reflect.

In evidence-based grading, instruction puts students at the center of their learning. Doing and learning are nonlinear, where assessment and instruction work together as one process. (For more on agency-focused teaching and learning, see Reibel et al., 2024.) Essentially, this process includes periods of self-assessment and reflection mixed with periods of performance and feedback. Weaving these experiences together helps teachers create more relevant lessons focused on the proficiency standards (Bandura, 2023). This approach also lets students show they can transfer skills and knowledge to new contexts (Brown et al., 2014). In short, teachers see that their role in an evidence-based grading model is not to deliver learning but to react to students as they create their own learning.

Proficiency Standards During Instruction

Learning science researchers Brown and colleagues (2014) say that for learning to stick, teachers need to link what students learn (content knowledge) to cues that help students remember it when they do skill demonstrations. In evidence-based grading, these cues are the proficiency standards.

There are four ways teachers can use proficiency standards as cues in their instructional practice.

1. Connect any discussion about evidence to the proficiency language: The teacher should connect what a student says or does to the proficiency standard. This means that when talking about student work (feedback or evaluations), the teacher should refer to the language in the proficiency standard and scale.

• An example would be when a teacher walks around the classroom, viewing student work, and says, “I like what you are doing here with [proficiency language of the standard],” or “I see what you are trying to do here, so make sure [proficiency language of the standard] is more evident.”

• Another example comes from a world language classroom. The proficiency standard is, “I can create original oral meaning that is clear and organized and elaborates with supporting details in a simple context.” The teacher connects with the proficiency standard by saying, “I’m not sure what you’re trying to say here; remember, clarity is important,” or “Your thoughts are detailed, but I suggest that you speak about [a detail] before [a detail] in order for them to make more sense.”

2. Explain how lesson activities collect evidence of the proficiency standard: After the teacher explains the day’s tasks, they explain how those tasks relate to the proficiency standard.

• For example, the teacher might say, “Our tasks today are all connected to [proficiency language of the standard],” or “What I am looking for in regard to [proficiency language of the standard] is . . .”

• For example, the teacher might say, “Use this moment to discuss how [proficiency language of the standard] can be better displayed in your partner’s work.”

3. Make the most of student thinking connected to the proficiency standard: The teacher extracts, scrutinizes, and exposes student thinking related to the proficiency standard.

• For example, anytime a student or class offers content that relates to the standard, the teacher must grab it and teach with it. They might say, “That’s right, Mia. That information is what we are looking for here in [proficiency language of the standard],” or “What that group just presented is what we mean by [proficiency language of the standard].”

• For example, with the proficiency standard, “I can create original, clear, and organized speaking that includes supporting details,” the teacher might say, “That’s a great way to think about saying that point! Creative and original, Kim,” or “Class, let me point out something Erin just said. She showed originality by using [that word] to explain [that point].”

4. Reflect on the proficiency standard: Students use their evidence to form an accurate perspective on their proficiency. The importance of reflection is clear (Hattie, 2023), and in evidence-based classrooms, good reflection involves both proficiency standards and student-produced evidence.

• For example, a teacher might ask students the following to prompt them to reflect on their learning: “Can you see your work in [proficiency language of the standard] yet?” or “What thinking might go into [proficiency language of the standard]?”

• For example, using the target, “I can create original oral meaning that is clear and organized and elaborates with supporting details in a simple context,” a teacher might ask, “José, are you certain all your details elaborate on the context of school sports?” or “Remember, class, look at your dialogue. Does it flow logically from one detail to the next about school sports?”

Reversal of the Gradual Release Model of Instruction

Flipping the gradual release model is key to teaching in an evidence-based grading system because it helps students take more control of their learning. This flip switches the order to you do (student), we do (pair or group), and I do (teacher). It starts with the student working on their own, moves to pair or group work, and ends with direct teaching from the teacher.

Starting with the “student does” phase encourages students to take control of their learning right from the start, building independence and responsibility. Research shows that when students have control of their learning processes, they develop proficiency more effectively (Bandura, 2023; Marzano, 2017; Saunders, 2023). This model focuses on how teachers respond to students, rather than just delivering lessons (Twadell et al., 2019). The teacher guides the lesson while students are learning and determines what to do next based on student work, as figure 3.3 illustrates.

Gradual release: Teacher does (I do) → Class does (we do) → Student does (you do)

Flipped gradual release: Student does (you do) → Class does (we do) → Teacher does (I do)

Figure 3.3: Gradual release model of instruction compared with the “flipped” gradual release model used in evidence-based grading classrooms.

After students have had the opportunity to work, the teacher then steps in. At this point, the teacher can fix misconceptions, explain things, and build on what the students have noticed or thought about. This way, a teacher not only makes the learning process more student centered, but also promotes active learning and deeper retention (McKinley & Benjamin, 2020; Pear, 2016).

When instruction is based on responding to student work, the responsibility of learning shifts from the teacher to the student. Students can “do” before the teacher teaches in four primary ways. They can (1) think or reflect, (2) do, (3) observe, or (4) visualize (Bandura, 2023). See figure 3.4 for an illustration of how this works.

Student does (you do), beginning with one of four entry points (think or reflect, do, observe, or visualize) → Pair or group does (we do) → Teacher does (I do)

Source: Reibel et al., 2024, p. 163; adapted from Bandura, 2023.

Figure 3.4: Four ways to begin student-directed learning.
When students are asked to explore learning on their own, they use strategies to understand and handle their learning through experience and reflection (Bruner, 1960; Carey, 2014).

The flipped model helps teachers meet different student needs earlier by making it easier to see where each student is struggling. During the first “you do” phase, teachers can observe and check students’ work, which then helps guide the next “we do” and “I do” phases. This way, students get the right level of guidance and intervention far earlier than in the regular gradual release model. When the teacher does first, it can create the false idea that students understand because they might just be copying what the teacher did, making the teacher think no help is needed (Tomlinson, 2017).

Flipping the gradual release model of instruction changes the learning from teacher centered to student centered. In this approach, students take more control of their learning, and the teacher responds to them. The steps shown in figure 3.5 can help teachers in evidence-based classrooms plan lessons.

At each step, students may engage by thinking or reflecting, doing, observing, or visualizing.

Individual: Write an argument for [topic].

Pair or group: Share your argument with a partner. The partner asks questions.

Whole class: Watch the teacher make an argument. The teacher discusses how to make a quality argument.

Pair or group: Groups write an argument together on [new or same topic].

Individual: Write a [new or updated original] argument about [topic].

Each step runs in a timed segment between 9:00 and 10:05 a.m.

Figure 3.5: Protocol for flipping gradual release instruction for the skill of written argumentation.

Visit go.SolutionTree.com/assessment for a free reproducible version of this figure.

For more details on this lesson design, please see Proficiency-Based Instruction: Rethinking Lesson Design and Delivery (Twadell et al., 2019) and Beyond PLC Lite: Evidence-Based Teaching and Learning in a Professional Learning Community at Work (Reibel et al., 2024).

Proficiency Standard Language Is the Language of Instruction

Instruction should be based on clear proficiency standards that guide what teachers say when they teach. When students read a proficiency standard, it should match the language their teacher uses during instruction. For example, if the skill is argumentation and the standard is to “make logical arguments,” then the teacher might say, “You are making more logical arguments because your vocabulary is more relevant,” or “Remember that you don’t have a good grasp of the periodic table, which is why your skill of scientific argumentation is not developing.” The proficiency standard (and scale) has this language. More examples of how proficiency standards align with the language of instruction appear in figure 3.6.

Proficiency standard: Create logical arguments with relevant detail.
Instructional language: “You are making more logical arguments because your vocabulary is more relevant.”

Proficiency standard: Perform effective and complete analysis of scientific data.
Instructional language: “Remember that you don’t have a good grasp of the periodic table, which is why your skill of analysis is not developing.”

Proficiency standard: Analyze historical events with specificity, nuance, and accuracy.
Instructional language: “Your ability to analyze historical events is progressing, as you now include more specific dates and events in your arguments.”

Proficiency standard: Write an argument with relevant details that are clear for the appropriate audience.
Instructional language: “In your essay, you did a great job of adding relevant facts and dates, but let’s work on more clearly connecting these facts to your main argument so that your reader will find it more compelling.”

Figure 3.6: Aligning proficiency standard language with the language of the classroom.

Questions Should Develop Proficiency, Not Enable Mimicry

When students act without thinking, they might be copying what they see without really understanding (like a baby playing peekaboo). So, when students just mimic, they are only imitating, not actually learning or building proficiency. Teachers can reduce mimicry by helping students focus on how well they are meeting proficiency standards, and if students feel confident, they’re more likely to develop the course skills. Table 3.1 shows the difference between questions that lead to copying and those that lead to real engagement.

Table 3.1: Questions That Promote Mimicry Versus Cognitive Engagement

Questions That Invite Mimicry

• “What do you think about this topic?”

• “What three ways can you answer this question?”

Questions That Invite Cognitive Engagement

• “If I told you [a fact, information, or context], what would you think?”

• “How can you answer this question, including all the essential details?”

• “What if I changed [a fact, information, context]? Would you feel or think the same way?”

• “By adding [a detail], would you change your mind?”

In response to the questions on the left, students might pretend they understand by repeating what they’ve heard the teacher or other students say. For example, a student might think, “I heard the teacher say this was the answer, so I will just say that.” In evidence-based grading, teachers try to ask questions that stop this from happening. The questions on the right are more nuanced and connected to the students’ own thinking and understanding to make sure they are really showing their skill proficiency, not just pretending.

Cognitive Engagement Increases Throughout Instruction

In evidence-based grading, we want cognitive engagement to increase throughout instruction. This means the activities should get more challenging. We define cognitive engagement as the time students spend developing and reflecting on their levels of proficiency.

If students are not aware of their proficiency, they might not be able to ask the right questions, or they may just try to finish their assignment without really developing proficiency. For example, a student solving an algebra problem without thinking critically might just memorize the steps without understanding why the process works or how it connects to bigger ideas in mathematics, like solving, graphing, and analyzing. In contrast, a student who is cognitively engaged can begin to think about the relationships between the formulas and the larger skills of algebra.

Similarly, in an English class, a student might focus on only meeting basic requirements of a task (such as three details, two characters, and ten sentences) without really developing their writing skill. For example, a student might write an essay by just checking the boxes (five paragraphs, a thesis statement, and so on) and not think about how strong their argument is. Increasing their cognitive engagement would push the student to analyze their work, make revisions based on the evidence, and think critically about their argument’s effectiveness. Evidence-based grading teachers keep students focused on their current state of proficiency throughout the lesson, while increasing the rigor of the assignments.

Traditional classrooms see students’ cognitive engagement rise and fall throughout the lesson (as in figure 3.7). Many times, the teachers in these classrooms tell students something to the effect of “Here are the directions on what to do; now go do it and make it look like this,” and then change the tasks the students engage in. There are moments of high cognitive engagement and moments of low cognitive engagement—our point is that the engagement is random.

Ideally, teachers aim to enhance student cognitive engagement progressively throughout a lesson. This can be achieved by designing tasks that increase in cognitive complexity and rigor, as illustrated in figure 3.8 (page 100). As the lesson progresses, the cognitive demands of each task should build on the previous.

Figure 3.7: Inconsistent cognitive engagement.

Figure 3.8: Consistent increase of cognitive engagement.

Some teachers design lessons to increase in complexity up to a certain level and then ask students to engage in repetitive practice at that level until the lesson ends (figure 3.9).

Figure 3.9: Students engaging in repetitive practice.

However, in evidence-based grading, cognitive engagement means students build and reflect on their proficiency, self-assess their proficiency against standards, and discuss that proficiency with the teacher and peers.

Assessment Pacing Should Be Skill Proficiency Focused

Many teachers carefully plan lessons broken into smaller parts, present them in a way that builds on previous knowledge, check student understanding with quizzes or tests, and then repeat. This is what they were taught in teacher preparation courses or experienced themselves as students. We know from experience and research that this approach leads to shallow retention of material, and it can even slow down learning (Brown et al., 2014; Lang, 2021). Figure 3.10 shows this; each small circle represents a point where a teacher taught or assessed. The larger circle at the end represents a culminating event (test or project).

Source: Gobble et al., 2016.

Figure 3.10: Traditional sequence of instruction and assessment.

Pacing in an evidence-based grading model is different. It’s all about developing proficiency. In figure 3.11, the small dots represent students. The big circle represents a proficiency standard. Students move themselves toward desired levels of proficiency by producing evidence, reflecting on it, and reapplying feedback while the teacher subtly curates the activities.

Figure 3.11: Evidence-based sequence of instruction and assessment.

The process in figure 3.10 (page 101) can be likened to teaching a child how to ride a bike by first teaching them about the bike parts and how they work in separate steps. Figure 3.11 (page 101), on the other hand, represents teaching a child to ride a bike by allowing them to immediately get on the bike and learn to ride while simultaneously making sense of the bike parts. Evidence-based grading uses instruction that is similar to the latter example, where the teacher is as responsible for students’ reactions to the outputs (their performance and reaction to riding the bike) as they are for the inputs (all the bike parts).

If you want to help students learn and improve, you need to give them the chance to show you what they’ve learned. Evidence-based grading unit pacing recognizes this. We encourage students to immediately dive into the material through experiences, observations, or thinking. By starting units with student action, teachers give students more time to think about, perform, observe, or even visualize their skill proficiency (Bandura, 1997, 2023). By allowing students to continue to learn after a performance, you give them time to process their mistakes and rearrange their thinking about a topic to gain new understanding.

To illustrate how teachers create more time for students to learn, we will compare several learning timelines. Traditional timelines, as shown in figure 3.12, break learning into small chunks, with teaching, quizzes, and a final test at the end.

I = Instruction

F = Formative assessment

Q = Quiz

P = Project

S = Summative exam

R = Retake

Figure 3.12: Traditional timeline.

Let’s use our practice, scrimmage, and game assessment terminology from chapter 2 (page 55)—(1) practice (delivery and drill), (2) scrimmage (development), and (3) game (determination)—and introduce small scrimmages and small games, with the difference being two questions versus fifty questions (Reibel et al., 2024). Figure 3.13 (page 103) represents conventional unit pacing, while figure 3.14 (page 103) illustrates pacing in evidence-based grading. The learning timeline in evidence-based grading is more modular, with the teacher switching among the three modes of assessment.

P = Practice

s = Small scrimmage (rehearsal)

g = Small game

G = Large game

Figure 3.13: Traditional pacing using practice, scrimmage, and game assessment terminology.

P = Practice

s = Small scrimmage (rehearsal)

S = Large scrimmage (rehearsal)

g = Small game

G = Large game

Figure 3.14: Evidence-based grading pacing.

Note in figure 3.14 that the games (summative exams) appear throughout the timeline where retention needs to be evaluated, not just at the end of the timeline. Students now have more time to learn from mistakes and readjust. Without time to react after assessments, students can find it difficult to achieve proficiency (Irons & Elkington, 2021).

Performance Windows

In evidence-based grading, teachers need reliable evidence to determine grades. They can’t make any assumptions about whether a student is proficient. Therefore, they should have plenty of assessments (Black & Wiliam, 2018; Reibel et al., 2024). As we discussed in chapter 2 (page 55), formative assessments are like scrimmages (simulations of summative exams) and summative assessments are like games, determining levels of proficiency. Scrimmages ask the question, “How is your proficiency level currently?” and games ask, “Did your proficiency prove out?”

Once teachers have identified their assessments (practice, scrimmage, or game), they need to set up performance windows. These are the time frames when students demonstrate proficiency and their evidence is eligible for grading.

Performance windows are key to evidence-based grading. Teachers communicate these windows to the students at the beginning of a grading period. The windows help teachers make evidence collection and grading more manageable. To create the performance windows, teachers ask the following three questions.

1. When do I need to know where students are? The assessment’s timing in an evidence-based curriculum is very important. It’s less about how much you assess and more about the assessment’s placement at the right point in the learning. For example, having summative experiences (games) early in the unit is better than leaving them until the end. This gives students feedback sooner, allows teachers to find learning gaps, and helps students feel progress and confidence throughout the unit, before it might be too late to fix problems.

2. What is reliable evidence? To judge evidence well, a teacher needs to use many kinds of assessments, like informal observations, dialogue with students, exams, projects, and performances. Teachers need a mix of practice, scrimmage, and game assessments, focusing mostly on scrimmage assessments (since these develop students’ skill proficiency). Teachers should ask themselves, “Do I have enough of the right evidence to judge student performance (games), and do I also have enough skill-building moments (scrimmages)?” If teachers rely too much on games (summative exams or quizzes) or on practice (checks for understanding), it may be hard for them to help students develop real competence. Instead, students might end up with only shallow learning and short-term skill proficiency.

3. How much evidence do I need for accurate assessment? In evidence-based grading, teachers need only as much evidence as it takes to make a reliable evaluation. Teachers should scrutinize their assessments to avoid asking for more evidence than necessary. In conventional assessment design, the question often is, “How many questions should we ask students on this topic?” But in evidence-based grading, the question is, “Do we have enough evidence that reliably shows students’ proficiency level?”
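To make the third question concrete: a team that logs its assessments could audit its evidence mix with a short script. The following Python sketch is illustrative only; the record format, skill names, and thresholds are hypothetical assumptions, not a tool this book prescribes.

```python
# Illustrative only: tally evidence by assessment type per skill and flag
# skills that lack enough scrimmage (skill-building) or game (judgment)
# evidence. Thresholds are hypothetical team agreements.
from collections import Counter

evidence_log = [
    ("argumentation", "practice"),
    ("argumentation", "scrimmage"),
    ("argumentation", "scrimmage"),
    ("argumentation", "game"),
    ("data analysis", "practice"),
    ("data analysis", "game"),
]

MIN_SCRIMMAGES = 2
MIN_GAMES = 1

def review_evidence(log):
    by_skill = {}
    for skill, kind in log:
        by_skill.setdefault(skill, Counter())[kind] += 1
    for skill, counts in by_skill.items():
        enough = (counts["scrimmage"] >= MIN_SCRIMMAGES
                  and counts["game"] >= MIN_GAMES)
        status = "sufficient" if enough else "needs more evidence"
        print(f"{skill}: {dict(counts)} -> {status}")

review_evidence(evidence_log)
# argumentation: {'practice': 1, 'scrimmage': 2, 'game': 1} -> sufficient
# data analysis: {'practice': 1, 'game': 1} -> needs more evidence
```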

Figure 3.15 (page 105) shows options for unit pacing with performance windows.


Source: Reibel et al., 2024, p. 154.

Figure 3.15: Two unit pacing options that include performance windows, deadlines, and reperformance opportunities.

Assessments Should Capture More Than Explicit Knowledge

Educators and students often think of assessments as evaluative moments to measure students’ understanding and proficiency. However, assessments are also valuable learning moments where students reflect on their performance, recognize their proficiency levels, and find areas to improve. In evidence-based grading, students learn to see assessment not just as a way to measure how they are doing but as an integral part of their learning.

Evidence-based grading uses proficiency-based assessment, which is a continuous process of performing and reflecting until students reach an intended level of proficiency (Gobble et al., 2016). In other words, assessment involves ongoing, reflective interaction with one’s performance. Asking students to reflect during assessments can surface thinking that uncovers their real learning stories. For this to happen, students need the chance to think about the proficiency standard while they perform. In evidence-based grading, teachers help students learn how to be proficient in skills and to be aware of that proficiency.

Collect Experiential Information as Well as Performance Data

To create this awareness, teachers use assessments to collect experiential information along with explicit knowledge. An assessment that includes reflection can give a teacher meaningful information to support students. When they have this experiential information, the teacher is more likely to support the student’s learning story instead of immediately supporting the content deficit (for more on this point, see Reibel et al., 2024). For example, asking a student, “After that last section, do you think you are ready to go on to the next section?” can encourage reflective thinking and may help the teacher gain insight into the student’s true learning story. (Are they experiencing a proficiency issue, a will issue, a confidence issue, or an efficacy issue, for instance?) We explore this point in more detail in chapter 4 (page 129).

Promote Perceived Proficiency

A goal of assessment is to help students gain a reliable perception of their skills. Perceived ability is important because an individual’s belief in their abilities can have a greater impact on their performance and achievements than their actual skills or knowledge (Bandura, 2023). Renowned social psychologist Albert Bandura (2023) studied this idea, called self-efficacy, for decades. He found that when people perceive themselves as competent, they are more likely to take on challenges, keep going through tough times, and reach their goals.

Believing in your abilities is powerful because it affects your motivation. When people think they have what it takes to succeed, they are more likely to try hard, face challenges, and learn from their mistakes (Zimmerman et al., 2015). This helps build resilience for long-term success (Bandura, 2023). On the other hand, when people doubt their own abilities, they may avoid challenges and miss opportunities for growth because they fear failure (Panadero, Jonsson, & Botella, 2017). While actual proficiency is important, students’ perceived proficiency plays an equally critical role in their success.

To support this, assessments should not only measure performance but also ask students to reflect during them. When students reflect during assessments, they can share their experiences, see how their perceptions match their results, and build confidence. This helps motivate them, makes them take charge of their learning, and bridges the gap between what they think they can do and what they actually can do. In the next section, we will explore practical strategies for designing assessments that provide a holistic view of student learning—combining performance data with reflective insights—and show how this approach supports both actual and perceived proficiency. We will share examples of this in the next chapter.

Use Evidence for Feedback

Generally speaking, providing feedback seems simple—teachers give students advice on their work to guide them on how to improve. But getting students to use the feedback can be hard. In evidence-based grading, feedback is mission critical. If students don’t accept and interact with feedback, they will find it hard to understand their proficiency, and teachers will find it difficult to motivate and teach them (Adair-Hauck, Glisan, & Troyan, 2013). Evidence-based grading helps students better interact with feedback in several ways: (1) rubrics are designed around enduring skills, (2) communication is separate from classification, (3) rubrics are used for self-feedback, and (4) feedback is treated as a conversation.

Use Skills-Focused Rubrics

Rubrics are the main tool for students to get feedback and understand their skills in an evidence-based grading system. Traditional rubrics don’t work well in evidence-based grading because they often are inventorial (checklists of content a student needs to include in a task). In evidence-based grading, rubrics focus on skills and proficiency standards. Unlike traditional rubrics that mainly focus on criteria, evidence-based rubrics use a proficiency scale of a skill, along with the associated criteria (see figure 3.16, page 108, for an example).

The purpose of evidence-based rubrics is to first classify a student’s skill proficiency and then communicate the criteria that went into that proficiency score. Traditional rubrics, on the other hand, often work in reverse. They score individual criteria and then aggregate those scores into an overall average score, leaving it up to the student to decide whether they have achieved overall proficiency.

Separate Classification From Communication on Rubrics

This point about rubrics can be made through a familiar experience: learning to drive. Learning to drive is challenging; cars are complicated pieces of machinery. For some people, it was a wonderful experience, but for others, it might have been frustrating, perhaps because whoever taught them attempted to simultaneously classify (provide feedback about a proficiency standard) and communicate (guide them about which success criterion needed improvement).

Course Rubrics

Skill 1: Historical Argumentation

Exceeds: I can write a sophisticated argument that is historically defensible by analyzing detailed, accurate, and relevant historical evidence organized appropriately for the prompt.

Meets: I can write an argument that is historically defensible by analyzing sufficient, specific, accurate, and relevant historical evidence organized appropriately for the prompt.

Approaching: I can write an argument that is historically defensible by analyzing partial or general historical evidence relevant for the prompt.

Developing: I attempted to write a historical argument.

Success criteria (each with space for “How well am I doing?” and teacher feedback):

• Contextualization: Multiple sentences beginning the introductory paragraph accurately frame the argument by describing relevant historical events leading up to the topic of the thesis.

• Thesis or claim statement: The thesis statement at the end of the introductory paragraph must fully answer the question or prompt, structure the essay by previewing subclaims, demonstrate accurate factual knowledge, and take a clear position (by using a counterclaim, rank ordering, causation, and so on).

• Evidence: Factual evidence (outside knowledge) demonstrates accurate factual knowledge, appropriately supports the argument, and is of adequate amount to support the argument. Document evidence (when required) is accurately paraphrased to demonstrate full comprehension of the document and is properly referenced and cited.

Figure 3.16: Example evidence-based rubric.

The driving teacher might have said things like, “OK, now ease your foot off the brake. Now slowly turn the wheel to the right a bit but not too far. OK, now straighten out . . . but don’t step on the brake too much. All right now, that’s it. Keep your hands on the wheel. OK, you are doing fine. Check your mirrors. Now give it a little gas . . . more gas . . . no more gas. Please step on the brake!”

You may laugh as you read this, but this is what traditional rubrics are doing to our students! Students can’t help but feel confused when classification and communication occupy the same feedback space. Students may ask, “Where is the feedback? What is a judgment? How did I do exactly?” Let’s take a look at an example of a rubric that you don’t want to use—one that classifies and communicates at the same time (figure 3.17, page 110).

With evidence-based rubrics, teachers first classify proficiency on the standard and scale and then communicate the success criterion that was used to make that classification. See the previous evidence-based grading example in figure 3.16 (page 108).

To test whether your rubrics are accurately communicating and classifying, ask yourself, “Can a learner clearly identify which area of the rubric is for classifying skill proficiency and which area is for communicating criteria?” Let’s look at a rubric in figure 3.18 (page 111).

A student receiving this rubric might think, “Is the teacher ranking everything I did?” or “Where are the criteria used to give me that ranking?” or “How can I be a novice in one area but an expert in another connected to the same skill? How can I be high in a lot of areas and get a B on the task?” This kind of rubric can hinder learning because the learner might not know what they are being classified on or what the feedback really is.

The balance between classification and communication is important. Both must be present on a rubric: the classification against the proficiency standard must come with communication of the supporting criteria. Without both working together, students might find the rubric confusing.

Evidence-based rubrics contain four components to balance feedback properly.

1. Skills: Ensure enduring skills, which are skills that transfer between contexts and have utility in a self-sustaining life.

2. Proficiency standards and scales: Make sure to create proficiency-based standards—standards that holistically rank student performance or knowledge level in the course skills.

3. Success criteria (supporting content and skills): Clearly identify criteria for success, prerequisite skills, and knowledge required for successful attainment of the targets’ proficiency levels.

4. Conversation sandbox: Build a conversation sandbox, which is a feedback space for the teacher and student to have a conversation about performance and its interplay with the criteria.

The rubric lists four-level descriptors for opening and focus, support, and organization (for example, “There is an effective opening with a specific preview”; “Evidence is provided through adequate direct quotations and specific examples that are usually explained fully”; “Appropriate transitions provide unity between and within paragraphs”), awards separate points for including a rough draft and for grammar, converts a criterion total out of 20 into a final score (/ 20 × 2 = TOTAL), and mixes judgment with comments such as: “This would’ve been much higher had you included the required items that were missing in your rough draft. Leah, you do a good job of giving your analysis of story events. Your paper still lacks examples from the book (quotes) to support these ideas. Also, topic sentences would help your reader.”

Figure 3.17: Traditional literary analysis paragraph rubric.

Each row lists the level descriptors from highest to lowest:

Content (content and details): Student provides many rich and detailed examples. / Student provides many meaningful examples. / Student provides some appropriate examples. / Student provides a few vague examples.

Engagement (attention, active listening, responsiveness): Student’s remarks capture and hold the listener’s attention; always uses active listening strategies. / Student’s remarks capture the listener’s attention; uses active listening strategies the majority of the time. / Student’s remarks attempt to engage the listener; sometimes uses active listening strategies. / Student’s remarks do not engage the listener; rarely uses active listening strategies.

Vocabulary (all vocabulary from unit, five words from each list): Student uses accurate and advanced vocabulary and concepts. / Student uses accurate and appropriate vocabulary and concepts. / Student uses vocabulary and concepts with some accuracy. / Student uses vocabulary and concepts with minimal to no accuracy.

Figure 3.18: Non-evidence-based rubric that classifies and communicates.


Let’s examine how a rubric, such as the one in figure 3.19 (page 112), includes these four components. This figure shows an example of an evidence-based rubric for the skill of speaking.

Proficiency scale:

Exceeds: Independently create an appropriate spoken message in unfamiliar and unstructured situations.

Meets: Independently create an appropriate spoken message in familiar and unstructured situations.

Approaching: Independently create an appropriate spoken message in familiar and structured situations.

Developing: Independently attempt to create an appropriate spoken message in familiar and structured situations.

Success criteria—Communication strategies: Engagement, Delivery, Risk Taking, Body Language. Supporting skills: Vocabulary, Context Details, Connections, Evidence.

Student reflection: “I used good vocabulary and details.” “I was a little nervous, which could be why I didn’t have the best eye contact and delivery.” “I forgot to add in the content piece we talked about during the formative practice speech.”

Teacher feedback: “Your vocabulary was good; however, you could have included a few more stretch words and terms that were on your vocabulary worksheet. Also, I am hoping that you include more than basic details in your speech. If you read the homework each night, you will gain confidence in your speaking because you can use more details. This will help you improve your body language and delivery.”

Figure 3.19: Example of a completed evidence-based rubric.

Visit go.SolutionTree.com/assessment for a free reproducible version of this figure.

Let’s examine these four components in more detail.

Focus on a Single Skill

Rubrics that focus on a single skill (with a clear standard and a proficiency scale) are core to evidence-based grading. When a rubric is designed around a single enduring skill, such as source evaluation, counting, or persuasive writing, it provides a focused framework that outlines the specific criteria students need to meet for proficiency.

Use a Proficiency Standard

A single-skill proficiency standard rubric defines the meets proficiency level of the standard and sets a clear benchmark for desired performance. This benchmark acts as a goal for students, offering a clear depiction of what proficient work looks like. For instance, in a rubric focused on argumentative skill, the standard specifies that a student must “clearly support an argument with relevant evidence, and organize it logically.” Further, a proficiency standard and scale not only provide a concrete goal for students to aim for but also help ensure consistency in grading (more on this in chapter 5, page 163). Here is an example of a standard’s meets proficiency level.

I can accurately evaluate a historical source by identifying source information and essential main ideas with supportive evidence.

This example has “how well” language: accurately, essential main ideas, and supportive evidence. This language speaks to the level of quality the teacher wants and also directs the student’s attention to the success criteria. For example, the student might ask, “What does supportive evidence mean (in this unit)?” And the teacher, using the rubric, might respond, “It means evidence that relates to the claim, is current, and includes [these content standards].” The teacher is stating the success criteria needed to achieve a meets level of proficiency in “evaluating sources.” This level requires the student to provide “supportive evidence.”

Incorporate a Proficiency Scale

Rubrics need to incorporate proficiency scales (see figure 3.20, page 114). These scales provide in-depth descriptions of the various proficiency levels (from developing to exceeds). Scales support differentiated instruction and outline how students can grow. For example, for a skill of writing, the scale might define the developing proficiency level as “having a thesis with limited supporting evidence,” the meets proficiency level as “having a clear thesis with comprehensive evidence,” and the exceeds proficiency level as “having a compelling thesis supported by robust, well-integrated evidence.” A proficiency scale keeps the verb the same but changes the qualifying language around it.

4—I can create an expert historical essay with sufficient and relevant historical detail.

3—I can create an accurate historical essay with sufficient and relevant historical detail.

2—I can create an accurate historical essay with relevant historical detail.

1—I can create an accurate historical essay with basic historical detail when guided.

Figure 3.20: Example proficiency scale.

The developing (1) level suggests having little to no proficiency of skill or knowledge. The student has some basic understanding but shows no proficiency in the skill. It’s like having a jigsaw puzzle poured out on the table with no pieces put together.

A score of approaching (2) indicates glimpses of proficiency. It’s like the student is starting to put together parts of the puzzle, such as the edges or different groupings.

Meets (3) represents the expected level of proficiency. This is the rigor the teacher desires—not some basic version of it. In evidence-based grading, students achieve an A if they consistently score at the meets level (more on this in chapter 5, page 163).

Exceeds (4) isn’t about gaining new skills or doing something completely different. Instead, it’s about refining the meets proficiency. Consider it this way: When an athlete, artist, or other professional reaches a proficient skill level, they don’t suddenly develop brand-new skills to demonstrate their growing expertise. No, they refine and add nuance to their existing proficiency. In other words, they are deemed exceptional because they have perfected proficiency. So, exceeds means refining your skills to an exceptional level, not becoming Einstein or doing a whole new task. In evidence-based grading, a student doesn’t need to reach exceeds to get an A.

Consider That Proficiency Scales Are Asset Focused

Grading and feedback emphasize proficiency and assets, not deficits. This means that each level of a proficiency scale should represent something a student can do. The proficiency standard example in figure 3.21 (page 115) shows how each level focuses on what students can do. This encourages teachers to give feedback that highlights successes and guides improvement. Students respond better to this than to being told what they did wrong (Black & Wiliam, 2018).

In traditional points-based grading, the scales often focus on deficits, with points taken away for not meeting expectations. For example, a developing score on a traditional rubric might say, “Major errors in the calculations, a lack of understanding of the topic, and incorrect details.” In contrast, evidence-based feedback for this same developing level might read, “Attempted to make the calculations, included basic details, and showed simple contextual understanding of the topic.” This “can do” focused feedback is a central idea in evidence-based grading.

Exceeds: I can write in an expository style that is effective and that includes sufficient, relevant detail and reasoning.

Meets: I can write in an expository style that is effective and that includes sufficient, relevant detail.

Approaching: I can write in an expository style with limited detail.

Developing: I attempt to write in an expository style.

Figure 3.21: What students can do.

Remember Success Criteria

We already discussed success criteria in chapter 1 (page 41), but here are a few more key points. Success criteria are intended for the meets level of a proficiency scale only. Proficiency scales do not need separate criteria for the exceeds, approaching, and developing levels. Also, success criteria should not be scaled; they represent components of a proficiency standard, not standards themselves. Success criteria can be recursive, meaning they can be applied in multiple contexts, or nonrecursive, meaning they are relevant only within a particular context. Most importantly, success criteria are how teachers justify proficiency scale scores they give their students.

Increase Student Voice (Self-Assessment)

Rubrics should include space for both teacher feedback and student reflection (self-feedback). Students should use rubrics to self-assess to develop an accurate perception of their proficiency levels. Teachers can then assess not only the students’ performance but also the quality of their self-assessments. This approach fosters self-awareness and guides students in refining their self-evaluation skills—something they need to become self-reliant people. Let’s examine how a rubric like figure 3.19 (page 112) includes a space for student self-assessment.

The following are examples of writing rubrics used in evidence-based grading (figures 3.22 and 3.23, page 116). The first one is a more general writing rubric, and the second is more specific to writing an essay.

Evidence-based rubrics are key for communicating proficiency, grading, promoting self-assessment, and helping students reflect on their work. When used well, they create a collaborative learning environment where both students and teachers actively participate in the feedback process.


Skill: Writing

Standard: W.9–10.4—Produce clear and coherent writing in which the development, organization, and style are appropriate to task, purpose, and audience.

Exceeds: I can produce sophisticated, clear, and coherent writing with a voice that is appropriate to the task or prompt.

Meets: I can produce clear and coherent writing that is appropriate to the task or prompt.

Approaching: I can produce writing with relevant ideas and details that are connected to the task.

Developing: I can produce writing with general ideas and details that are connected to the task.

Success criteria:

• Complete sentences

• Organization

• Precise vocabulary

• Sentence variety

• Consistent verb tense

• Adequate introduction, body, and conclusion

• Transitions

• Formal style

Source for standard: National Governors Association [NGA] Center for Best Practices & Council of Chief State School Officers [CCSSO], 2010.

Figure 3.22: Evidence-based grading writing rubric—Example 1.

Skill: Writing

Exceeds: The student can write an essay with nuanced evidence and sophisticated, relevant vocabulary.

Meets: The student can write an essay with sufficient evidence and adequate, relevant vocabulary.

Approaching: The student can write an essay with basic evidence and simple, relevant vocabulary.

Developing: The student can write an essay with limited evidence and simple, relevant vocabulary when guided.

Success criteria:

• Topic

• Voice

• Expression

• Characters

• Opening

• Closing

• Vocabulary

• Details

Figure 3.23: Evidence-based writing rubric—Example 2.

Compare Evidence-Based Rubrics With Conventional Rubrics

Teachers can use the protocol in figure 3.24 to review and assess their current rubrics. This protocol has the characteristics of evidence-based rubrics at left and the characteristics of conventional rubrics at right.

Evidence-Based Rubric

Dialogic frame: This approach to rubrics focuses on open-ended and interactive conversations between teachers and students. It encourages a nuanced discussion of the material, argumentation, and collaboration.

One scale plus criteria: This approach uses one scale with one set of evidence to evaluate student performance. It can provide clarity for both the teacher and the student, as well as consistency in evaluations.

Proficiency: This approach to rubrics focuses on what the student can do and what they are proficient in. It emphasizes the strengths and assets of the student, rather than their weaknesses.

Growth language: This approach to rubrics uses language that emphasizes growth and development. It helps students discuss their progress and understand that they can continue to improve.

Same verb: This approach to rubrics uses the same verb in each gradation. The same verb provides clarity and consistency in evaluations.

Criteria for only three levels: This approach uses one set of criteria for all levels of performance. It provides a clear road map for student progress and helps students see where they need to improve.

No numbers or percentages: This approach to rubrics does not use numbers or percentages to evaluate performance but instead uses qualitative language. It may provide more room for interpretation and conversation in evaluations.


Conventional Rubric

Checklist frame: This approach to rubrics has students simply meet specific requirements, often appearing as a checklist. It suggests a focus on compliance rather than on competence and reflection.

Many scales: Using multiple scales for evaluation can be confusing for both the teacher and the student. It can lead to inconsistent evaluations, as well as difficulty in tracking progress over time.

Deficiency: This approach focuses on what the students are deficient in. It emphasizes the weaknesses of the student’s performance, rather than their strengths and assets.

Vague language: Using vague language can make it difficult for students to understand what is expected of them. It can also make it challenging for teachers to evaluate student performance reliably.

Verb changes: This approach to rubrics uses different verbs in each criterion. It can lead to confusion and inconsistent evaluations, as well as make it difficult for students to understand what is expected of them.

Criteria for levels 4, 3, 2, and 1: This approach to rubrics uses separate criteria for each level of performance. Students may find it hard to track progress and see improvement, since they tend to see each set of criteria as an isolated set of learning targets they must master.

Numbers, percentages, and cutoff scores: This approach uses numbers and percentages, such as cutoff scores, to assess student performance. Although it may appear clear and objective, it often leaves no room for learning conversations or meaningful reflection.

Source: Reibel et al., 2024, p. 81.

Figure 3.24: Comparison of evidence-based and conventional rubric characteristics.

Use Feedback as a Conversation

Evidence-based grading revolves around conversations. These conversations are about evidence of proficiency, where students get a chance to self-assess and discuss their skill proficiency with their teacher. To have these grading conversations with students, teachers should consider the following three key ideas.

1. Evidence-based grading conversations must be mutual (inclusive of student voice): Grading conversations should involve both teacher and student voices. This two-way communication fosters more meaningful exchanges, where both parties learn to trust the feedback on student proficiency.

2. Evidence-based grading conversations must be forward facing: Forward-facing conversations focus on growth. Participants analyze evidence to see how it can lead to further learning. The teacher might say, “This skill will lead you to be able to . . .” When feedback is forward facing, it is less reactive and less about what went wrong. For example, the teacher might say, “Do [this part] more often, and then you should be able to do it,” instead of saying, “You didn’t do [that part] right; you need to practice that more.”

3. Evidence-based grading conversations must be rewarding: These conversations should make every student feel they can succeed. Evidence-based grading is about proficiency, not just grades. You want students saying, “I can learn, recover from mistakes, and ultimately succeed.”

Use Evidence for Scoring

Scoring assessments is about feedback, not just a score. In evidence-based grading, teachers pay attention to two things when they score: (1) focusing on student self-feedback and (2) looking for proficiency patterns.

Focus on Student Self-Feedback

Teachers focus on self-feedback first in evidence-based grading. Students give themselves feedback before the teacher does (Panadero & Jonsson, 2020). When students finish an assessment, they give themselves feedback or self-score before they turn it in (Hattie, 2023). Figure 3.25 (page 119) shows how this might look. Traditional scoring, where the teacher does all the feedback and scoring, would look more like figure 3.26 (page 119).

The teacher teaches students → Students perform → Students self-grade and turn their work in to the teacher → The teacher grades → The student reviews and provides self-feedback → The teacher provides feedback

Figure 3.25: Evidence-based feedback and scoring process.

The teacher teaches students → Students perform → The teacher gives feedback

Figure 3.26: Traditional feedback and scoring process.

Focusing on self-feedback and self-scoring allows students to see their skills more accurately (Brown et al., 2014). They can’t always rely on the teacher to evaluate them. If they are to be self-reliant adults someday, they need the ability to self-evaluate.

Look for Proficiency Patterns

Scoring assessments also involves identifying proficiency patterns in the evidence. Our team member, Maria, likened this process to judging a singing audition. Just like a judge looks for patterns that either support or contradict a good performance, the teacher looks for proficiency patterns and provides feedback accordingly. A teacher can even stop the performance once they identify patterns and move to feedback, as illustrated in figure 3.27 (page 120). In other words, they teach while scoring assessments. Figure 3.28 (page 120) shows how scoring looks in evidence-based grading. In this example, skills are rated for proficiency, and the criteria are marked as present or not present in the student’s work.

Notice in this example how there are no points on the assessment, just words that provide context for a pending skill proficiency rating. With evidence-based scoring, feedback about proficiency is the score. Evidence-based assessments don’t receive scores; the skills covered on the assessments get the scores. (See chapter 5, page 186, for how to collect proficiency scores into a letter grade at the end of each grading term.)
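To make this concrete, here is a minimal sketch, in Python, of how a pattern of present and not-present criteria might be turned into a proficiency rating. The rating labels (other than the one shown in figure 3.28) and the thresholds are assumptions for illustration, not a rule the authors prescribe.

```python
# A sketch of evidence-based scoring: success criteria are marked present or
# not present, and the pattern of evidence (not points) yields a rating.
def rate_skill(criteria: dict[str, bool]) -> str:
    """Turn a pattern of present/not-present criteria into a rating."""
    present = sum(criteria.values())  # True counts as 1
    total = len(criteria)
    if present == total:
        return "Exceeds proficiency standard"  # assumed label
    if present >= total - 1:
        return "Meets proficiency standard"    # label shown in figure 3.28
    if present >= 1:
        return "Approaching proficiency"       # assumed label
    return "Not yet proficient"                # assumed label

# The evidence pattern from figure 3.28: two of three criteria present.
evidence = {"criterion_1": True, "criterion_2": False, "criterion_3": True}
print(rate_skill(evidence))  # -> Meets proficiency standard
```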

Figure 3.27: Using proficiency standards and success criteria to evaluate. (Skill: Sing. Success criteria: Criterion 1—Present; Criterion 2—Present; Criterion 3—Not Present.)

Figure 3.28: Example of how a teacher evaluates student work in evidence-based scoring. (Success criteria: Criterion 1—Present; Criterion 2—Not Present; Criterion 3—Present. Rating: Meets proficiency standard.)

Use Evidence to Determine Grades

Accurate evaluation of student work needs clearly defined proficiency standards and criteria. This accuracy relies on the teacher’s ability to recognize proficiency patterns and have meaningful discussions about proficiency. This is key to evidence-based grading, where grades reflect ongoing assessment of a student’s progress toward proficiency. Teachers look at two things to evaluate student work: (1) the body of work and (2) the recent trend.

Body of Work

Some schools, even those with modern grading software, still base student grades on averaged scores instead of on the body of work. Looking at the body of work (and trend) is the most reliable way to judge student proficiency (Guskey, 2023; Kahneman et al., 2021; Reibel et al., 2024). This approach is similar to reviewing a résumé, where one considers how all the experiences listed, taken together, describe the candidate and their abilities. In evidence-based grading, teachers report grades by reviewing the student’s body of work (the résumé) and assessing their proficiency. In an evidence-based gradebook, the body of work looks like figure 3.29.

Source: Reibel et al., 2024, p. 200.

Figure 3.29: Example of how a student’s body of work (proficiency levels) is reported in an evidence-based gradebook.
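The difference between averaging and reviewing a body of work is easy to see in a small example. The sketch below assumes a 1–4 proficiency scale and sample scores, and it uses the mode, the most common score, as a simple stand-in for the body-of-work judgment.

```python
# A minimal sketch contrasting averaging with reviewing the body of work.
# The 1-4 proficiency scale and the sample scores are illustrative assumptions.
from statistics import mean, mode

evidence = [1, 2, 3, 3, 4, 4, 4]  # one skill's proficiency scores over a term

# Averaging lets early, pre-proficient attempts drag the result down.
print(f"average: {mean(evidence):.2f}")          # average: 3.00

# A body-of-work review asks which level the full record most supports.
print(f"body of work (mode): {mode(evidence)}")  # body of work (mode): 4
```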

Recent Trend

Teachers also look at recent trends in a student’s work to determine grades. Recent trend refers to how the student is currently progressing in their skill proficiency: where the student started and where they are now. Teachers use this trend evidence to give a more accurate picture of a student’s skill proficiency by displaying trend information for several skill areas in the gradebook. Trend (or growth) information in an evidence-based gradebook would look like figure 3.30.

Source: Reibel et al., 2024, p. 196.

Figure 3.30: Example of trend (or growth) information in an evidence-based gradebook.

Use Evidence for Reporting

If evidence-based scoring is mainly about giving feedback, then how teachers package that feedback is important. Traditional grading usually packages feedback with an overall total of points earned out of the number of points possible, as shown in figure 3.31 (page 123).

Packaging feedback in this way can leave students confused or provoke a host of other adverse reactions because the grade itself is muddled. Students might ask, “What is 2/4? Does it really represent who I am? Can I recover from this? What went wrong? What went right?”

Figure 3.31: Example of packaging feedback in traditional grading.

In evidence-based grading, feedback packaging is different. The proficiency standard score and the justifying criteria are the package. The teacher communicates to students the proficiency level they see in the evidence provided and states why they assigned that rating (the success criteria). Figure 3.32 (page 124) shows an example of how this might look.

Consider the Timing of Grades

Evidence-based reporting considers the timing of grades. The timing of initial and final grades (when they appear in the gradebook) can significantly influence students’ learning (Guskey, 2015). Withholding grades until later in the semester can make skill proficiency the focus.

Students need time to adjust to new subjects and proficiency expectations at the start of a class. Delaying grades can signal to students that there is time to develop their skills and grasp the content. Delaying grading can also help students build a strong foundation before their proficiency is evaluated, and it can encourage students to embrace struggle and make mistakes without the threat of their grades dropping, which can reduce anxiety and stress (O’Connor, 2018). Additionally, displaying grades too early might inaccurately represent students’ abilities, since students learn at different rates.

Figure 3.32: Example of packaging feedback in evidence-based grading.

Communicate Projected Grades

During the grading period, grades represent students’ projected skill proficiency. Evidence-based gradebooks show the grade a student is likely to earn at the end of the grading period based on their current proficiency evidence. The teacher is essentially saying, “If you continue at these proficiency levels, you will likely earn this grade.” This idea is similar to a growth chart, on which a pediatrician plots a child’s likely growth trajectory. To give students accurate projected grades, the teacher needs different types of assessments and strong feedback.
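The sketch below shows one way such a projection might be computed. The 1–4 scale, the letter mapping, and the simple linear extrapolation are assumptions for illustration; the authors do not prescribe a specific formula.

```python
# A minimal sketch of a projected grade. The 1-4 proficiency scale, the
# letter mapping, and the linear extrapolation are illustrative assumptions.
LETTER = {4: "A", 3: "B", 2: "C", 1: "D", 0: "F"}  # assumed mapping

def projected_grade(scores: list[int]) -> str:
    """Project one assessment ahead from the average change between scores."""
    if len(scores) < 2:
        return LETTER[scores[-1]]
    step = (scores[-1] - scores[0]) / (len(scores) - 1)  # average change
    projection = min(4, max(0, scores[-1] + step))       # clamp to the scale
    return LETTER[round(projection)]

print(projected_grade([2, 2, 3, 3]))  # -> B (at 3 now, trending upward)
```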

Consider Grade Ranges

Evidence-based grades are based on proficiency evidence, which can fluctuate as students learn. To account for this fluctuation, we suggest displaying grade ranges. A grade range (two potential letter grades) can better account for the variability in student proficiency, offering a clearer picture of their progress. Grade ranges allow for changes in early work and recognize that early evidence may not fully show a student’s skills since they are still developing.

In evidence-based grading, the first grade in a range is based on the body of work, and the second is based on trend. The first grade comes from the mode, or most common proficiency score, in the body of work. The second grade highlights whether a student’s proficiency scores are improving or declining. They are shown in the gradebook like the following.

• A/B

• C/D

• B/A

• F/D
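Putting the two parts of a range together, the following sketch computes the first grade from the mode of the body of work and the second from the recent trend. The 1–4 scale, the letter mapping, and the three-score trend window are assumptions for illustration.

```python
# A minimal sketch of a grade range: first letter from the body of work,
# second letter from the recent trend. Scale, mapping, and window are assumed.
from statistics import mean, mode

LETTER = {4: "A", 3: "B", 2: "C", 1: "D", 0: "F"}  # assumed mapping

def grade_range(scores: list[int], window: int = 3) -> str:
    body = LETTER[mode(scores)]              # most common proficiency score
    recent = mean(scores[-window:])          # average of the latest evidence
    return f"{body}/{LETTER[round(recent)]}"

print(grade_range([2, 3, 3, 3, 4, 4]))  # -> B/A (improving)
print(grade_range([3, 3, 3, 2, 2, 1]))  # -> B/C (declining)
```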

Using grade ranges shifts the focus from “getting a grade” to building skill proficiency. In evidence-based grading, students ask their teachers which skills they need to improve to change their grade range. In traditional grading, students ask how many more points they need to raise their average. The former approach feels like mentorship, while the latter feels like a game.

Reduce Noise

There can be a lot of noise in a traditional gradebook: daily homework scores, formative assessments, project scores, habit-of-work scores, participation points, and exams. All these components create confusion, where it is hard to determine what is actually contributing to the grade (for example, “Does the student have an A because of exams, because of homework, or because of participation?”). With all this noise, how does a student know how they are really doing? Reducing noise in gradebooks ensures that grades accurately reflect student learning and performance.

One way to reduce noise is to stop grading everything. An evidence-based gradebook distinguishes between active and inactive scores without using weights or percentages. Active summative assessments (games) determine proficiency and count for 100 percent of the grade, while formative assessments (scrimmages) and preparation activities (practice) have no impact on the grade; instead, they show how students are developing and preparing for the summative exam.
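In code, this distinction is a filter, not a weighting scheme. The sketch below stores every score for the growth story but lets only games determine the grade; the entry model and category names are assumptions for illustration.

```python
# A minimal sketch of active versus inactive scores in an evidence-based
# gradebook: every entry is kept, but only "game" scores are active.
from dataclasses import dataclass

@dataclass
class Entry:
    skill: str
    score: int   # proficiency score on an assumed 1-4 scale
    kind: str    # "game" (summative), "scrimmage" (formative), or "practice"

def active_scores(entries: list[Entry]) -> list[int]:
    """Only games count toward the grade -- no weights, no percentages."""
    return [e.score for e in entries if e.kind == "game"]

gradebook = [
    Entry("argumentative writing", 2, "practice"),
    Entry("argumentative writing", 3, "scrimmage"),
    Entry("argumentative writing", 3, "game"),
    Entry("argumentative writing", 4, "game"),
]
print(active_scores(gradebook))  # -> [3, 4]; practice and scrimmages excluded
```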

Another way to reduce noise in the gradebook is to set principle and practice tights—non-negotiable guidelines for reporting student evidence. Principle tights refers to the core philosophies about the gradebook (for example, there is a clear growth story at all times, and only games count toward the grade). Practice tights refers to the consistent application of grading practices across teachers and courses (for example, three practice scores in the first two weeks, and two scrimmage scores before each game). In evidence-based grading, while principles are strictly followed, the practices may vary slightly to fit each classroom. Figure 3.33 provides an example of a gradebook audit showing principle and practice tights.

Principle Tights

• Weeks 1–3: The purpose of these weeks is learning in progress. The practice or prep log should begin to be populated.

• Weeks 4–6: Growth time starts. Comment on all stalling-out AGs, MGs, and FGs. The practice or prep log should continue to be populated.

• Weeks 7–8: Formative assessments (developing exams) should appear in the gradebook.

• Weeks 9–11: Multiple summative assessments should exist.

• Weeks 12–14: Multiple summative and formative assessments should exist.

• Weeks 15–17: Multiple summative and formative assessments should exist. Feedback about potential grade determination for the semester should appear.

• Week 18: Enter final proficiency scores for skills.

Practice Tights

• Weeks 1–3: There is IP for all classes. The practice homework log tells the story.

• Weeks 4–6: IP is optional. AGs and MGs are now available.

• Weeks 7–8: There are one or two formative assessments per summative assessment.

• Weeks 9–11: Feedback is provided for each score below proficient.

• Weeks 12–14: Feedback for all scores should be evident.

• Weeks 15–17: Feedback for all scores should be evident.

• Week 18: Final skill scores are posted, and final grades appear.

Key: AG = appropriate growth; MG = minimal growth; FG = failure to grow because of low proficiency scores; I = incomplete; I/FG = missing work causing a failure to grow; IP = learning in progress

Figure 3.33: Example gradebook audit of principle and practice tights.

Clear gradebooks help students know exactly how their grades were determined (evidence of proficiency standards). Because of these tights, gradebooks are more understandable to students and parents, helping them become better equipped for conversations about grades.

Use the Gradebook During Instruction

Updating the gradebook without changing your instructional approach may not improve student learning. If the gradebook is still organized around tasks (tests, quizzes, homework, and so on) and students earn points for each, it may signal that simply completing tasks well is enough to succeed. To avoid this, teachers committed to evidence-based grading frequently use their gradebooks during instruction to promote students’ reflection on their proficiency and learning, rather than setting the gradebook aside as a mere repository of points for test scores, participation, and homework.

Students use gradebooks to identify their proficiencies in course skills, which guide future action. One teacher we met had students use the information in their gradebooks to write their parent or guardian an email about their skill proficiency and potential grade trajectory. Another teacher had students pick a skill from the gradebook they needed to work on and pair up with someone who also needed work on that skill. In essence, in evidence-based grading, the gradebook is actively used during learning.

Key Points

During the insight phase, our team connected its teaching methods to new ideas about evidence and proficiency. When implementing evidence-based grading, team members come to value each other’s insights, which are critical for successful implementation. Now, review these key points to solidify your understanding.

• The teacher and student discuss and validate feedback together. (When both the student and the teacher have a say in feedback and grades, grades are more reliable.)

• Rubrics contain four components: (1) skills, (2) proficiency standards and scales, (3) criteria for success and supporting content, and (4) a self-assessment segment.

• Evidence-based grading means turning evidence into a grade; it does not mean converting points into a percentage and the percentage into a grade.

• Pay attention to the timing of grades; when grades appear in the gradebook can impact learning.

Implementing Evidence-Based Grading

Transform your school’s grading practices with Pathways to Proficiency: Implementing Evidence-Based Grading, Second Edition. In this updated guide on evidence-based grading, Anthony R. Reibel, Troy Gobble, Mark Onuscheck, and Eric Twadell provide a five-phase process for moving beyond standards-based and competency-based grading models. Administrators and teachers will gain understanding of the core principles of evidence-based assessment, refocus assessments on proficiency, embrace student self-assessment, determine grades based on bodies of work and trends, and transform grading practices systemwide.

K–12 administrators can use this book to:

• Propose, design, and evaluate new grading practices based on student performance

• Lead and organize the implementation of evidence-based grading policies and practices

• Establish clearer guidelines, benchmarks, and standards of student performance

• Navigate common pitfalls when transitioning to an evidence-based assessment model

• Enhance student performance through more consistent feedback and stronger mentorship

“Grading practices have the power to motivate and inspire or demean and diminish. Yet, in many schools, grading practices go unexamined and are often left to individual teachers. The second edition of Pathways to Proficiency provides practitioner-proven processes for improving grading practices schoolwide. Buy it, read it, and most importantly, use it!”

“If I had an educational magic wand that I could wield just once to improve student learning across the United States, I would use it to help every school revise archaic grading practices that average grades, rank achievement, restrict improvement, demotivate students, and fail to provide educators with accurate and precise information on each student’s specific strengths and needs. This exceptional book is that magic wand! The authors honestly challenge traditional thinking about grading, create a vision of assessment that can improve learning and develop hope in our students, and provide practical examples and tools to move you from core commitments to action.”
