The science of Star Phonics by ETC Educational Technology Connection (HK) Ltd

The Science Behind Star Phonics

TECHNICAL PAPER | SEPTEMBER 2022

Reports and screens are regularly reviewed and may vary from those shown as enhancements are made.

All logos, designs, and brand names for Renaissance’s products and services are trademarks of Renaissance Learning, Inc., and its subsidiaries, registered, common law, or pending registration in the United States and other countries. All other product and company names should be considered the property of their respective companies and organizations.

©Copyright 2022 by Renaissance Learning, Inc. All rights reserved. Printed in the United States of America. This publication is protected by U.S. and international copyright laws. It is unlawful to duplicate or reproduce any copyrighted material without authorization from the copyright holder. For more information, contact:

RENAISSANCE

P.O. Box 8036 Wisconsin Rapids, WI 54495-8036

(800) 338-4204 www.renaissance.com

09/22

Table of Contents

Introduction
Content and Item Development
Test Blueprint Characteristics
Reliability
Classification accuracy
Validity
Putting it all together: Star Phonics Reports
Conclusion

Introduction

Star Phonics: Screening and Diagnostic Assessment

Research has documented that mastering phonics skills is essential for learning to read, and phonics is a central element of the Science of Reading. Closely monitoring how students’ phonics skills are developing gives teachers the insight they need to guide phonics instruction. Star Phonics assists with reading instruction by quickly and efficiently screening 12 of the most critical phonics categories while also providing diagnostics on 102 specific phonics skills. The assessment is designed for all students in grades 1-6 and for older students who continue to struggle in reading.

Purpose of Star Phonics

Star Phonics is both a screening and a diagnostic phonics assessment grounded in the Science of Reading. It is an ideal complement to other screening tools that broadly assess multiple domains or skill areas. Unlike these tools, Star Phonics focuses only on phonics skills. The diagnostic is designed to further pinpoint specific phonics skills for those students who require additional instruction or intervention to learn phonics.

Star Phonics is reliable, valid, and also extremely efficient because it uses nonsense words to screen and diagnose phonics skills and requires only 2-5 minutes per student to administer.1 Research shows that the best way to measure students' cognitive processing of phonics is to have them read unfamiliar (e.g., nonsense) words aloud. This requires students to use their knowledge of phonemes (sounds) and graphemes (letters) to read the words, and it eliminates the chance that they have already seen or learned the word. Assessing with nonsense words also helps to eliminate cultural biases that can occur when students have more or less familiarity with real words.


1 Nonsense words are used to assess all phonics categories except contractions.

Figure: Star Phonics Assesses with Student Device

Content and Item Development

Based on the Science of Reading

The Science of Reading embodies extensive, research-proven ideas about how reading develops and about instructional practices for teaching reading. It draws on research on reading from developmental and educational psychology, cognitive science, and cognitive neuroscience. This research has important implications for helping students to succeed in reading. The Science of Reading confirms that students need explicit and systematic phonics instruction to succeed in reading.

We know from the Science of Reading that most students who experience reading difficulties struggle at the word level. In fact, 70-80 percent of struggling readers have difficulty reading words because of a deficit in phonological processing.2

“I recommend Star Phonics, which is based on years of research, because it is the best phonics screener and diagnostic assessment available. It provides valuable information about phonics skills essential for instruction, especially for those students struggling to read words, including students with dyslexia.”

Dr. Louisa Moats • a leading author of LETRS, consultant, and literacy expert

The role of word recognition, and the importance of foundational decoding/phonics skills, are illustrated in Scarborough’s Rope (see Figure A). Summarizing a large, consistent, and growing body of research, Scarborough (2005) illustrates the separate and essential nature of word recognition and language comprehension in the path to skilled reading. The strands of word recognition (phonological awareness, decoding/phonics, and sight recognition) are distinct sets of skills that students

2 Moats, L., & Tolman, C. (2019). Excerpted from Language Essentials for Teachers of Reading and Spelling (LETRS), 3rd Edition. Dallas: Voyager Sopris Learning, Inc.

Star Phonics provides a direct measure of phonics skills by screening the 12 most common phonics categories and providing the option to diagnose 102 specific phonics skills. The data from Star Phonics help teachers identify whether a student has deficits in phonics and, more importantly, where those deficits are. As a result, teachers save time by focusing on the specific phonics skills students need to become proficient readers.

Administrators also benefit by reviewing district data on phonics skills to determine how all students are progressing in phonics. The data also shows which phonics categories have been learned and which need more attention at a district, grade and classroom level.

Test Blueprint Characteristics

Phonics Categories and Target Skills

Star Phonics assesses 12 of the most critical phonics categories, with diagnostics covering 102 specific phonics skills. The screening component has three forms, one for each screening period (fall, winter, spring). Each screening form includes 36 nonsense words,3 with three words from each of the 12 categories. There are 12 separate diagnostic assessments, one for each of the 12 phonics categories included on the screener.

3 Nonsense words are used to assess all phonics categories except contractions.

Item Development

Star Phonics was carefully designed and rigorously evaluated. Developers used Item Response Theory (IRT) to evaluate the difficulty of individual items and advanced statistical techniques to assure that the collection of items focused only on phonics skills assessment.

The item development team initially generated test items for each assessment and then used a two-parameter Item Response Theory (IRT) model to select the best words within each phonics category (see Table 1). IRT calibration places item difficulty and student ability on the same scale. The relationship between them can be represented graphically as an item response function, which describes the probability of answering an item correctly as a function of the student’s ability and the item's parameters (in the two-parameter model, its difficulty and its discrimination). The phonics categories in Table 1 are ordered from easiest to hardest based on the mean difficulty parameter.
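As an illustration of the item response function described above, here is a minimal sketch of the standard two-parameter logistic form. The paper does not give the exact parameterization used for Star Phonics, so the logistic form without a scaling constant is an assumption, and the function name is hypothetical. The example values are the category means from Table 1, used only to show how difficulty and discrimination shape the curve.

```python
import math


def item_response_probability(ability, discrimination, difficulty):
    """Two-parameter logistic item response function.

    Returns the modeled probability of a correct response for a student
    with the given ability. Assumes the standard logistic form; the
    source does not state which parameterization was used.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Mean parameters for two categories, taken from Table 1 for illustration.
cvc = {"discrimination": 2.00, "difficulty": -0.82}
prefix = {"discrimination": 2.30, "difficulty": 0.24}

for ability in (-1.0, 0.0, 1.0):
    p_cvc = item_response_probability(ability, **cvc)
    p_prefix = item_response_probability(ability, **prefix)
    print(f"ability={ability:+.1f}  CVC={p_cvc:.2f}  Prefix={p_prefix:.2f}")
```

When ability equals an item's difficulty, the modeled probability is 0.5. A larger discrimination value makes the curve steeper around that point, so the item separates students just below and just above that ability level more sharply.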

Once the items were finalized, the three words with the highest discrimination parameters in each category were selected for the screener. Students who read these items correctly are likely to be able to read most words within that phonics category correctly as well.
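The selection rule described above can be sketched as a simple ranking step. The words and parameter values below are invented for illustration and are not actual Star Phonics items; the function name is also hypothetical.

```python
def select_screener_items(items, count=3):
    """Return the `count` items with the highest discrimination parameter."""
    ranked = sorted(items, key=lambda item: item["discrimination"], reverse=True)
    return ranked[:count]


# Hypothetical calibrated items for one phonics category (invented values).
candidate_items = [
    {"word": "zim", "discrimination": 1.62, "difficulty": -0.95},
    {"word": "vab", "discrimination": 2.41, "difficulty": -0.80},
    {"word": "pog", "discrimination": 2.05, "difficulty": -0.71},
    {"word": "tud", "discrimination": 1.88, "difficulty": -1.10},
    {"word": "kef", "discrimination": 2.27, "difficulty": -0.64},
]

screener_words = [item["word"] for item in select_screener_items(candidate_items)]
print(screener_words)
```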

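| Skill | Discrimination Parameter: Mean | Discrimination Parameter: Standard Deviation | Difficulty Parameter: Mean | Difficulty Parameter: Standard Deviation |
|---|---|---|---|---|
| Contraction | 2.16 | 0.80 | -1.77 | 0.64 |
| CVC | 2.00 | 0.37 | -0.82 | 0.40 |
| CVCC | 1.81 | 0.19 | -0.66 | 0.30 |
| Digraph | 2.08 | 0.37 | -0.62 | 0.48 |
| Blend | 1.82 | 0.25 | -0.50 | 0.35 |
| R-controlled Vowel | 2.14 | 0.49 | -0.49 | 0.43 |
| Vowel Team | 1.85 | 0.40 | -0.43 | 0.57 |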

Table 1. Item Response Theory Two Parameter Model

Culturally Representative Content

Star Phonics uses nonsense words to assess phonics skills and to eliminate biases that may arise from using real words that carry cultural associations. This levels the playing field and ensures that each student is evaluated exclusively on their knowledge of letter-sound correspondences.

Test Design

Star Phonics includes screening and diagnostic components. The screening component has three forms, one for each screening period. Each screening form includes 36 nonsense words,1 with three words from each of the 12 phonics categories. The words on the screening forms are the same but appear in a different order. After the screening assessment, the software provides a report recommending which diagnostic assessments to administer (see the Diagnostic Recommendations Report).

The diagnostic component includes 12 forms, one for each phonics category measured on the screener. The words on each form do not change, but their order is randomized each time the diagnostic is given. The diagnostic component can be given as many times as the teacher chooses. The five most recent administrations of the diagnostic assessment are graphed on the diagnostic bar graph so teachers can gauge improvement in students' phonics skills over time.

Renaissance recommends administering no more than three diagnostic assessments to a student at any one time. In addition, we advise administering the diagnostic measures only if a teacher is planning to work with the student on that skill in the immediate future. This ensures the data accurately represent the student's skills at the time of instruction.

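Table 1 (continued)

| Skill | Discrimination Parameter: Mean | Discrimination Parameter: Standard Deviation | Difficulty Parameter: Mean | Difficulty Parameter: Standard Deviation |
|---|---|---|---|---|
| CVCe | 2.23 | 0.40 | -0.32 | 0.27 |
| Short Vowel Suffix | 2.04 | 0.34 | -0.09 | 0.39 |
| CVCCVC | 1.98 | 0.34 | 0.22 | 0.39 |
| Prefix | 2.30 | 0.41 | 0.24 | 0.55 |
| Long Vowel Suffix | 1.73 | 0.58 | 0.90 | 0.67 |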

Star Phonics is a criterion-referenced measure that assesses mastery of phonics skills. Student responses are scored at the phoneme (sound) level to capture specific error patterns. Star Phonics does not produce a “words correct” metric. Students are given three seconds to decode each nonsense word. Limiting response time helps capture automaticity of word reading. It also helps ensure the assessment is completed within 2-3 minutes for the screener and 2-5 minutes for the diagnostics.
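To show why phoneme-level scoring carries more information than a single words-correct mark, here is a minimal sketch. The paper does not describe how responses are recorded, so the list-of-sounds representation, the position-by-position comparison, the function name, and the example word are all assumptions made for illustration.

```python
def score_phonemes(target_phonemes, response_phonemes):
    """Score a response sound by sound against the target, position by position.

    Returns one (target, produced, is_correct) tuple per target phoneme.
    A missing sound is recorded as None and counted as incorrect.
    """
    results = []
    for index, target in enumerate(target_phonemes):
        produced = response_phonemes[index] if index < len(response_phonemes) else None
        results.append((target, produced, produced == target))
    return results


# Hypothetical nonsense word "shup" and a response that misreads the digraph.
target = ["sh", "u", "p"]
response = ["s", "u", "p"]

scored = score_phonemes(target, response)
errors = [(expected, produced) for expected, produced, ok in scored if not ok]
correct_count = sum(1 for _, _, ok in scored if ok)

print(f"{correct_count} of {len(target)} sounds correct; errors: {errors}")
```

A words-correct score would mark this response simply as wrong. The sound-level record shows that the vowel and final consonant were read correctly and that the error is confined to the digraph.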

Reliability

It’s critical to evaluate the reliability, or the trustworthiness of scores, before selecting and using any assessment. Reliability can be measured in different ways, but coefficients almost always range from 0.00 to 1.00, with values closer to 1.00 indicating greater reliability. The National Center on Intensive Intervention and many other professional groups recommend reliability coefficients at or above .70 for low-stakes assessments and at or above .80 for assessments used to make more important decisions about individual students.

Renaissance measures three types of reliability coefficients for Star Phonics:

• Internal consistency reliability measures how highly each item on a test correlates with the other items on the test. If high internal consistency reliability exists, high-ability students would tend to pass each item administered, while low-ability students fail each item.

• Test-retest reliability measures the consistency of test results with repeated test administration under the same conditions. Retest reliability also may be referred to as alternate test reliability or alternate forms reliability.

• Inter-rater reliability measures the extent to which two or more examiners would score the same student’s performance the same way. This is important because there must be consistency across test administrations to establish confidence that each student’s score is based solely on the student’s actual performance.
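For readers who want to see how an internal consistency coefficient is obtained, the sketch below computes Cronbach's alpha, the statistic named in the Table 2 column header. The response matrix is invented for illustration, and the function name is hypothetical; it does not reproduce any Star Phonics data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    `scores` holds one row per student and one column per item.
    """
    item_count = len(scores[0])
    item_variances = [
        pvariance([row[i] for row in scores]) for i in range(item_count)
    ]
    total_variance = pvariance([sum(row) for row in scores])
    return (item_count / (item_count - 1)) * (
        1 - sum(item_variances) / total_variance
    )


# Invented responses: 4 students x 3 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when students who do well on one item also tend to do well on the others, which is the pattern described in the first bullet above.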

Star Phonics exceeds industry standards for acceptable reliability. On average, the internal consistency of Star Phonics is 0.91; it ranges from 0.857 to 0.963 across the 12 phonics categories. Test-retest reliability is 0.864 on average; it ranges from 0.748 to 0.951 across the categories. Retesting was done within a two-week window. See Table 2 for the data for each category. These coefficients show that Star Phonics measures students’ phonics mastery with a high degree of consistency.

To evaluate inter-rater reliability, a separate study was conducted. The resulting coefficient is 0.97, which provides strong evidence that scoring is consistent across raters.

| Phonics category | Internal consistency (alpha) | Retest |
|---|---|---|
| Contraction | 0.857 | 0.909 |
| CVC | 0.877 | 0.796 |
| CVCC | 0.869 | 0.844 |
| Digraph | 0.912 | 0.899 |
| Blend | 0.952 | 0.906 |
| R-controlled Vowel | 0.891 | 0.842 |
| Vowel Team | 0.959 | 0.951 |
| CVCe | 0.931 | 0.855 |
| Short Vowel Suffix | 0.963 | 0.933 |
| CVCCVC | 0.872 | 0.783 |
| Prefix | 0.951 | 0.902 |
| Long Vowel Suffix | 0.88 | 0.748 |

Table 2. Star Phonics reliability
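The averages and ranges quoted in the Reliability section can be recomputed directly from the Table 2 values. The short check below does this and also compares each coefficient with the .80 guideline for decisions about individual students; the variable names are illustrative only.

```python
from statistics import mean

# (internal consistency, retest) per phonics category, copied from Table 2.
table_2 = {
    "Contraction": (0.857, 0.909),
    "CVC": (0.877, 0.796),
    "CVCC": (0.869, 0.844),
    "Digraph": (0.912, 0.899),
    "Blend": (0.952, 0.906),
    "R-controlled Vowel": (0.891, 0.842),
    "Vowel Team": (0.959, 0.951),
    "CVCe": (0.931, 0.855),
    "Short Vowel Suffix": (0.963, 0.933),
    "CVCCVC": (0.872, 0.783),
    "Prefix": (0.951, 0.902),
    "Long Vowel Suffix": (0.88, 0.748),
}

alphas = [alpha for alpha, _ in table_2.values()]
retests = [retest for _, retest in table_2.values()]

print(f"alpha:  mean={mean(alphas):.3f}  min={min(alphas)}  max={max(alphas)}")
print(f"retest: mean={mean(retests):.3f}  min={min(retests)}  max={max(retests)}")

# Categories whose retest coefficient falls below the .80 guideline.
below_080 = [name for name, (_, retest) in table_2.items() if retest < 0.80]
print("retest below .80:", below_080)
```

Every internal consistency coefficient is above .80. Three retest coefficients (CVC, CVCCVC, and Long Vowel Suffix) fall between .70 and .80.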

Classification accuracy

Accuracy for Identifying Students At-Risk for Word Reading Difficulties

In addition to providing information to teachers on students’ phonics mastery, Star Phonics can be used to identify students considered “at risk” for reading difficulties, including characteristics of dyslexia, who therefore require additional instruction and diagnostic assessment. In such cases, correlation coefficients are of lesser interest than classification accuracy statistics, such as overall accuracy of classification and area under the curve.

Area under the ROC curve (AUC) is a summary measure of diagnostic accuracy. The National Center on Intensive Intervention has set an AUC of 0.80 or higher as convincing evidence that an assessment can accurately distinguish between students with satisfactory and unsatisfactory reading performance.
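One way to read an AUC value is as the probability that a randomly chosen student from the satisfactory group scores higher on the screener than a randomly chosen student from the at-risk group. The sketch below computes AUC from that pairwise definition. The scores are invented and the function name is hypothetical; this is not the procedure the developers report using.

```python
def area_under_roc_curve(at_risk_scores, not_at_risk_scores):
    """AUC from pairwise comparisons between the two groups.

    Counts how often a not-at-risk student outscores an at-risk student,
    with ties counted as half, divided by the number of pairs.
    """
    wins = 0.0
    for low in at_risk_scores:
        for high in not_at_risk_scores:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(at_risk_scores) * len(not_at_risk_scores))


# Invented screener scores, grouped by status on a criterion measure.
at_risk = [2, 4, 5, 7]
not_at_risk = [6, 8, 9, 9]

auc = area_under_roc_curve(at_risk, not_at_risk)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 means the screener separates the groups no better than chance, and 1.0 means every not-at-risk student outscores every at-risk student.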

To evaluate classification accuracy, students’ scores on Star Phonics were compared to their performance on the Woodcock Johnson Reading Mastery test (WJRM). Using the WJRM cut score as the criterion for identifying “at risk” performance, Star Phonics developers calculated the AUC for each Star Phonics category. Coefficients range from 0.823 to 0.969 (average 0.92), demonstrating high agreement between the results of these two measures (see Table 3). This indicates that Star Phonics does a very good job of discriminating between students who performed satisfactorily and unsatisfactorily on the Woodcock Johnson assessment.

| Skill | Area Under the Curve (AUC) |
|---|---|
| Contraction | 0.933 |
| CVC | 0.926 |
| CVCC | 0.942 |
| Digraph | 0.933 |
| Blend | 0.936 |
| R-controlled Vowel | 0.890 |
| Vowel Team | 0.946 |

Table 3. Area under the ROC curve

Validity

Test validity was long described as the degree to which a test measures what it is intended to measure. A more current description is that a test is valid to the extent that there are evidentiary data to support specific claims as to what the test measures, the interpretation of its scores, and the uses for which it is recommended or applied. Evidence of test validity is often indirect and incremental, consisting of a variety of findings that in the aggregate are consistent with the theory that the test measures the intended construct(s), or is suitable for its intended uses and interpretations of its scores. Determining the validity of a test involves the use of data and other information both internal and external to the test instrument itself. Star Phonics assessments meet validity expectations on all accounts.

Criterion validity (or criterion-related validity) measures how well one measure predicts an outcome on another measure. A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future). The first measure is sometimes called the predictor variable or the estimator. The second measure is called the criterion variable, provided that it is known to be a valid tool for measuring similar outcomes or skills. Star Phonics was evaluated using concurrent validity, in which the predictor and criterion data are collected at the same time (see Table 4).
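Concurrent validity evidence is commonly reported as a correlation between scores on the predictor and scores on the criterion measure. The paper does not say which coefficient was used for Table 4, so the sketch below assumes a Pearson correlation; the paired scores and the function name are invented for illustration.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented paired scores for five students, collected at the same time.
predictor_scores = [1, 2, 3, 4, 5]   # e.g., a phonics category score
criterion_scores = [2, 1, 4, 3, 5]   # e.g., a score on the criterion measure

r = pearson_correlation(predictor_scores, criterion_scores)
print(f"r = {r:.2f}")
```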

Table 3 (continued)

| Skill | Area Under the Curve (AUC) |
|---|---|
| CVCe | 0.923 |
| Short Vowel Suffix | 0.969 |
| CVCCVC | 0.929 |
| Prefix | 0.940 |
| Long Vowel Suffix | 0.823 |

| Star Phonics Skill | Woodcock Johnson: Word Attack | Woodcock Johnson: Word ID | Woodcock Johnson: Basic Reading | DIBELS CBM-R |
|---|---|---|---|---|
| Contraction | 0.585 | 0.756 | 0.724 | 0.679 |

Table 4. Concurrent validity data

Table 4 (continued)

| Star Phonics Skill | Woodcock Johnson: Word Attack | Woodcock Johnson: Word ID | Woodcock Johnson: Basic Reading | DIBELS CBM-R |
|---|---|---|---|---|
| CVC | 0.616 | 0.632 | 0.661 | 0.625 |
| CVCC | 0.608 | 0.641 | 0.664 | 0.601 |
| Digraph | 0.683 | 0.656 | 0.707 | 0.619 |
| Blend | 0.691 | 0.651 | 0.708 | 0.627 |
| R-controlled Vowel | 0.600 | 0.669 | 0.679 | 0.700 |
| Vowel Team | 0.679 | 0.737 | 0.755 | 0.718 |
| CVCe | 0.719 | 0.645 | 0.714 | 0.648 |
| Short Vowel Suffix | 0.702 | 0.738 | 0.767 | 0.693 |
| CVCCVC | 0.637 | 0.654 | 0.687 | 0.654 |
| Prefix | 0.597 | 0.644 | 0.663 | 0.640 |
| Long Vowel Suffix | 0.422 | 0.534 | 0.517 | 0.660 |

Putting it all together—Star Phonics Reports

Star Phonics includes screening and diagnostic components. After the screening assessment, the software provides a report with recommendations of which diagnostic assessments to administer. The report is called “Diagnostic Recommendations.”

Diagnostic Recommendations Report

Class Screener Matrix

Star Phonics produces immediate reports that show teachers where to focus instruction by illustrating which phonics categories are secure and which need additional attention. The assessment can be customized to align with the scope and sequence of the school’s phonics curriculum.

Student Screener Analysis Report

Immediate, easy-to-read reports provide skill-level insights, patterns of error, and instructional focus that teachers and specialists can use to personalize instruction. Bar graph reports are automatically produced at a district, grade, class, and student level to identify how many students need help in phonics and exactly which phonics categories they need help in. This saves time by identifying where to focus time and resources for phonics instruction.

Conclusion

Star Phonics is reliable, valid, and extremely efficient. The assessment includes screening and diagnostic components for the most critical phonics skills students need to master. After screening, the software provides specific recommendations of which diagnostics to administer. Each measure takes 2 to 5 minutes and provides in-depth insight into a student’s acquisition of phonics skills. With customized alignment to your school’s phonics curriculum, Star Phonics helps educators uncover each student’s instructional strengths and gaps. With reporting at the student, group, class, grade, and district levels, administrators and teachers can make efficient and informed curriculum and intervention decisions on phonics to help all students be successful readers.

Research-based, research proven

Developed by expert educators for teachers

Michelle K. Hosp, Ph.D., is Director of Foundational Literacy at Renaissance Learning and an Associate Professor of Special Education at the University of Massachusetts, Amherst. A nationally known trainer and speaker on problem solving and the use of progress monitoring data, she has worked as the Director of the Iowa Reading Research Center and as a trainer with the National Center on Progress Monitoring and the National Center on Response to Intervention, and she is currently on the technical review committee for the National Center on Intensive Intervention. Her research focus is on reading and on MTSS/RTI in relation to CBM and CBE. Dr. Hosp has published numerous articles, book chapters, and books related to reading and effective decision-making practices.

Over 15 years ago, Dr. Hosp set out to create a better assessment that would help teachers guide phonics instruction for their students. Her research included an analysis of reading curriculums to ensure her new assessment focused on categories that commonly occur in English words. Additionally, she solicited input from national experts, such as Dr. Louisa Moats, and from educators across the country, and she tested the assessment with hundreds of students.

The resulting assessment, first released in 2016 under the name KeyPhonics, is now used in schools in all 50 states. This technology-based screening and diagnostic assessment provides a fast, easy way to help students become proficient readers using data that informs which phonics categories to teach at a classroom, small group, and individual level.
