Grambling State University Standard Five Compendium 4
Reliability and Validity of EPP Assessments
Alignment to National Standard: Standard 5: Quality Assurance System and Continuous Improvement
The provider maintains a quality assurance system that consists of valid data from multiple measures and supports continuous improvement that is sustained and evidence-based. The system is developed and maintained with input from internal and external stakeholders. The provider uses the results of inquiry and data collection to establish priorities, enhance program elements, and highlight innovations.
CAEP R5.2 Data Quality: The provider’s quality assurance system from R5.1 relies on relevant, verifiable, representative, cumulative, and actionable measures to ensure interpretations of data are valid and consistent.
How Alignment is Assured: The Assessment Coordinator, in consultation with Program/Discipline Chairs, aligns the evaluation measures and assessment tasks with CAEP, InTASC, and the appropriate technology standards, and maintains alignment and adherence to multiple Louisiana state laws and policy regulations. All standards are housed in Watermark Taskstream; the Assessment Coordinator maintains this standards database so that alignments can accommodate updates to standards, program competencies, courses, or assessments.
Evidence Overview
Evidence for this compendium is presented in the following manner: (1) the process for designing and developing assessments, (2) presentation of reliability evidence, and (3) presentation of validity evidence. The evidence documents that EPP-created assessments have met the minimum of 80% (.80) or above to establish content validity and 75% (.75) or above to establish inter-rater reliability (agreement).
Evidence and Analysis
Process for Designing and Developing Assessments
Once an evaluation measure has been established, program leads work with a team of subject matter experts (SMEs) to create the individual activities, assessment prompts, and associated rubric, all aligned with the SPA standards. The SME team is a crucial component of this process because its subject matter expertise ensures that content validity is embedded in the final design.
After completing the work instructions, the team concentrates on the rubric. Although the content determines the precise criteria for each level under a particular rubric element, GSU has basic rules for how each rubric level should be constructed. These definitions are listed in Table 1.
Table 1: Performance Indicators - Descriptions

Novice: This rating is equivalent to having emerging performance skills/content knowledge that can be enriched with additional coursework.
Effective: Emerging: This rating is equivalent to having the performance skills/content knowledge needed to move forward into student teaching; however, additional remediation might be needed to hone the candidate's performance.
Effective: Proficient (Target): This rating is equivalent to having the performance skills/content knowledge needed to be an effective student teacher where additional skills will be practiced.
Highly Effective: This rating is equivalent to having the performance skills/content knowledge needed as a highly effective first-year teacher.
Each of the evidence items tagged to Standards 1 and 4 includes a short section labeled "Assurance of Reliability and Validity" that presents information about GSU-created assessments. In addition, the Continuous Improvement/Actionability of Outcomes section of Standard Five Compendium 3 underscores how data insights are made actionable at GSU. The EPP takes a systematic approach to data quality: validity reviews are scheduled every three years unless substantive changes are made to an instrument, and reliability is likewise examined every three years unless the evaluators/instructors in the relevant courses change (Data Quality Review Table).
Program leads and faculty (SMEs) participate in training and calibration exercises to ensure that evaluators use and interpret rubrics consistently, which is necessary for inter-rater reliability in evaluating candidate performance on assessments (IRR and Norming Training). During calibration, all evaluators in a given area use the scoring rubric to evaluate a selected candidate submission. To promote consistency among raters, evaluators receive personalized feedback showing where they converge with and diverge from the broader team.
Faculty members are also occasionally chosen to take part in a formal inter-rater reliability study, in which members of the faculty independently score the same pre-selected work sample from a course that they actively teach. Internal and external subject-matter and content experts are invited to participate in content validity studies of common, EPP-created key assessments on a three-year cycle or following instrument or description revisions.
Formal content validity and reliability studies are conducted electronically via Google Forms surveys using the format presented by Drs. Monaco and Horne at CAEPCon Spring 2022 (Monaco & Horne, 2022). Reliability study forms (Sample - ED 402: Technology-Infused Unit Plan Reliability Study Form) provide student work samples to teams of faculty members along with assessment rubrics and assignment directions. Percent agreement is calculated from the faculty members' scores to evaluate the degree of inter-rater reliability; GSU seeks 75% or higher agreement (Sample - ED 402: Technology-Infused Unit Plan Percentage of Agreement Worksheet). Newly revised assessments are piloted upon completion of the reliability and validity studies and are adopted following review by the QAS Review Panel.
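To illustrate the percent-agreement calculation described above, the sketch below compares two raters' rubric scores element by element against the 75% target. The rater scores and the percent_agreement helper are hypothetical illustrations, not GSU data or tooling (GSU's actual calculations are performed in an Excel worksheet).

```python
# Minimal sketch of a percent-agreement check between two raters.
# The scores below are hypothetical; GSU's actual worksheets are in Excel.

def percent_agreement(rater_a, rater_b):
    """Return the share of rubric elements on which two raters agree exactly."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Raters must score the same rubric elements.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical scores on an 8-element rubric (1 = Novice ... 4 = Highly Effective).
rater_a = [3, 4, 2, 3, 3, 4, 3, 2]
rater_b = [3, 4, 2, 3, 2, 4, 3, 2]

agreement = percent_agreement(rater_a, rater_b)
print(f"Percent agreement: {agreement:.1%}")  # 87.5% (7 of 8 elements match)
print("Meets GSU threshold" if agreement >= 0.75 else "Below 75% threshold")
```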
References
Monaco, M., & Horne, E. T. (2022, March 9). Data quality: Deconstructing CAEP R5.2 [Conference presentation]. CAEP.
Reliability of Assessments (Initial Programs)

Compendium: Standard R1, Compendium 1 (The Learner and Learning); Compendium 3 (Instructional Practices)
CAEP Standard: R1.1, R1.3
Assessment: Danielson-Aligned Lesson Plan Template
Inter-Rater Reliability: All faculty have been through two iterations of Danielson Group train-the-trainer professional development, in which they participated in norming activities after thoroughly investigating and discussing the theoretical framework and cross-cutting themes to which the Framework for Teaching is aligned. During this time, they discussed the interrelationships of all components and identified the look-fors and attributes that aligned with the core themes. Mentor teachers and other PK-12 stakeholders were also encouraged to participate in an effort to gain their insight and perspectives. The University Supervisor or Residency Coordinator, in consultation with the Mentor Teacher (ED 452/453), reviews the lesson plan, provides feedback on it, observes the teaching event, provides additional feedback based on the candidate's facilitation of the lesson, and scores the lesson plan. Reliability study results show the following percent of agreement across all programs: AY 2022-23: 100%.

Compendium: Standard R1, Compendium 1 (The Learner and Learning); Compendium 4 (Professional Responsibility)
CAEP Standard: R1.1, R1.4
Assessment: Showcase Portfolio
Inter-Rater Reliability: Reliability study results show the following percent of agreement across all programs: AY 2022-23: 85%.
Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Technology Infused Unit Plan
Inter-Rater Reliability: Reliability study results show the following percent of agreement across all programs: AY 2022-23: 87.5%.

Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Teacher Toolkit
Inter-Rater Reliability: Reliability study results show the following percent of agreement across all programs: AY 2022-23: 75%.

Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Case Study Project
Inter-Rater Reliability: Reliability study results show the following percent of agreement across all programs: AY 2022-23: 87.5%.
Compendium: Standard R1, Compendium 2 (Application of Content); Compendium 3 (Instructional Practices)
CAEP Standard: R1.2, R1.3
Assessment: Praxis II (Proprietary)
Inter-Rater Reliability: Proprietary assessment scored outside of GSU by ETS.

Compendium: Standard R1, Compendium 2 (Application of Content); Compendium 4 (Professional Responsibility)
CAEP Standard: R1.2, R1.4
Assessment: Evaluation of Year-Long Residency - Framework for Teaching Evaluation Instrument (Proprietary)
Inter-Rater Reliability: In keeping with FFT Danielson Group training, numerous observers participate in scoring the candidates and providing feedback. The University Supervisor and Mentor Teacher jointly review the lesson plan for each of the formal observations, provide feedback on the lesson plan, observe the teaching event, provide additional feedback based on the candidate's facilitation of and reflection on the lesson, score the teaching event, and meet with the candidate after the lesson. This collaborative approach is employed to ensure inter-rater reliability and to provide candidates with more detailed feedback. Reliability study results show the following percent of agreement across all programs: AY 2022-23: 100%.

Compendium: Standard R1, Compendium 4 (Professional Responsibility)
CAEP Standard: R1.4
Assessment: Educator Disposition Assessment
Inter-Rater Reliability: The Educator Disposition Assessment was presented at the CAEP Conference (September 17-19, 2015, Washington, D.C.) in a session entitled "Educator Disposition Assessment: A Research-Based Measure of Teacher Dispositional Behaviors" by the University of Tampa, which indicated that the instrument has already gone through validity processes.

Compendium: Standard R1, Compendium 4 (Professional Responsibility)
CAEP Standard: R1.4
Assessment: Classroom Management Plan
Inter-Rater Reliability: Reliability study results show the following percent of agreement across all programs: AY 2022-23: 90%.
Validity Evidence
CAEP recommends establishing content validity using Lawshe's approach. To determine the content validity of EPP-created assessments, GSU uses a panel of subject matter experts (SMEs) to determine how well the elements included within an assessment align with its intended outcomes. Using the Lawshe method, SMEs are provided with a copy of the assessment's directions and rubric and are asked to rate each element as essential, useful but not essential, or not necessary (Sample - ED 402: Technology-Infused Unit Plan Content Validity Study Form). The content validity ratio (CVR) is calculated for each element, and the content validity index (CVI) is calculated for the instrument, using an Excel worksheet (Sample - ED 402: Technology-Infused Unit Plan CVR and CVI Outcomes) formatted with the following formulas:
CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the element "essential" and N is the total number of panelists. The instrument-level CVI is the mean of the element CVRs.
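As a concrete illustration of these formulas, the following minimal Python sketch computes Lawshe's CVR for each rubric element and the instrument-level CVI. The panel size and "essential" counts are hypothetical, and the element names are borrowed from the Danielson-aligned lesson plan rubric purely for illustration; GSU's actual calculations are performed in the Excel worksheet referenced above.

```python
# Minimal sketch of Lawshe's content validity calculations.
# Panel size and "essential" counts below are hypothetical.

def cvr(n_essential, n_panelists):
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 4 SMEs rating each rubric element as
# essential / useful but not essential / not necessary.
N = 4
essential_counts = {
    "Demonstrating Knowledge of Content and Pedagogy": 4,
    "Demonstrating Knowledge of Students": 4,
    "Demonstrating Knowledge of Resources": 3,
}

cvrs = {element: cvr(n_e, N) for element, n_e in essential_counts.items()}
for element, value in cvrs.items():
    print(f"{element}: CVR = {value:.2f}")

# The instrument-level CVI is the mean of the element CVRs.
cvi = sum(cvrs.values()) / len(cvrs)
print(f"CVI = {cvi:.2f}")
```

With this hypothetical panel, an element rated essential by all four SMEs earns a CVR of 1.00, while one rated essential by three of four earns .50, mirroring the pattern of values reported in the tables below.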
Feedback from the experts is reviewed by program leads and faculty members, who discuss what modifications and updates might be necessary, particularly for items or instruments that fail to meet the acceptable CAEP Sufficiency of Evidence Standards.
Validity of GSU Assessments - Initial Programs (ITP)

Compendium: Standard R1, Compendium 1 (The Learner and Learning); Compendium 3 (Instructional Practices)
CAEP Standard: R1.1, R1.3
Assessment: Danielson-Aligned Lesson Plan Template
Content Validity: Demonstrating Knowledge of Content and Pedagogy, 1.00; Demonstrating Knowledge of Students, 1.00; Setting Instructional Outcomes, 1.00; Demonstrating Knowledge of Resources, .50; Designing Coherent Instruction, 1.00; Designing Student Assessments, 1.00. CVI = .92. Item 4 does not meet content validity (CVR = .50).

Compendium: Standard R1, Compendium 1 (The Learner and Learning); Compendium 4 (Professional Responsibility)
CAEP Standard: R1.1, R1.4
Assessment: Showcase Portfolio
Content Validity: Standard 1: Learner Development, 1.00; Standard 2: Learning Differences, .50; Standard 3: Learning Environments, 1.00; Standard 4: Content Knowledge, .50; Standard 5: Innovative Applications of Content, 1.00; Standard 6: Assessment, 1.00; Standard 7: Planning for Instruction, 1.00; Standard 8: Instructional Strategies, 1.00; Standard 9: Reflection and Continuous Growth, .50; Standard 10: Collaboration, .50. CVI = .80. Items 2, 4, 9, and 10 do not meet content validity (CVR = .50).

Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Technology Infused Unit Plan
Content Validity: Instructional Design & Strategies, 1.00; Classroom Culture & Student Engagement, 1.00; Communication and Collaboration, .60; Assessment Strategies, 1.00. CVI = .93. Item 3 does not meet content validity (CVR = .60).

Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Teacher Toolkit
Content Validity: Evidence of Planning, 1.00; Instructional Delivery, 1.00; Classroom Management, 1.00; Technology, .33; Formal Assessment, 1.00. CVI = .87. Item 4 does not meet content validity (CVR = .33).

Compendium: Standard R1, Compendium 1 (The Learner and Learning)
CAEP Standard: R1.1
Assessment: Case Study Project
Content Validity: Demographic Data, 1.00; Anecdotal Notes, 1.00; Quantitative and Qualitative Assessment Data, 1.00; Intervention(s) (Plan of Action Applications), 1.00; Summarization of Positive Effects of Intervention(s), 1.00; Collaboration, 1.00; Reflection, 1.00; Conventions, 1.00. CVI = 1.00.

Compendium: Standard R1, Compendium 2 (Application of Content); Compendium 3 (Instructional Practices)
CAEP Standard: R1.2, R1.3
Assessment: Praxis II (Proprietary)
Content Validity: Proprietary assessment: link to ETS validity study.
Compendium: Standard R1, Compendium 2 (Application of Content); Compendium 4 (Professional Responsibility)
CAEP Standard: R1.2, R1.4
Assessment: Evaluation of Year-Long Residency - Framework for Teaching Evaluation Instrument (Proprietary)
Content Validity: Proprietary assessment: link to Framework for Teaching validity study.
Compendium: Standard R1, Compendium 4 (Professional Responsibility)
CAEP Standard: R1.4
Assessment: Disposition Survey
Content Validity: Questions or topics are explicitly aligned with aspects of the EPP's mission as well as CAEP, InTASC, national/professional, and state standards. Individual items address a single subject, and language is unambiguous. Leading questions are avoided. Items are stated in terms of behaviors or practices rather than opinions whenever possible. Disposition surveys make clear to candidates how the survey is related to effective teaching.

Compendium: Standard R1, Compendium 4 (Professional Responsibility)
CAEP Standard: R1.4
Assessment: Educator Disposition Assessment
Content Validity: The Educator Disposition Assessment was presented at the CAEP Conference (September 17-19, 2015, Washington, D.C.) in a session entitled "Educator Disposition Assessment: A Research-Based Measure of Teacher Dispositional Behaviors" by the University of Tampa, which indicated that the instrument has already gone through validity processes.

Compendium: Standard R1, Compendium 4 (Professional Responsibility)
CAEP Standard: R1.4
Assessment: Classroom Management Plan
Content Validity: Theoretical Introduction, 1.00; Classroom Management Philosophy, 1.00; Instructional and Assessment Strategies, 1.00; Motivation, 1.00; Vision, 1.00; Essential Knowledge, 1.00. CVI = 1.00.
Continuous Improvement
Overall, the EPP-created assessments met the minimum thresholds of 80% (.80) or above for content validity and 75% (.75) or above for inter-rater reliability or agreement. However, several criteria within the instruments failed to meet the established threshold. In response, GSU faculty will review the comments from SME respondents and draft changes to the assessments. SMEs and other stakeholders will be invited to provide feedback and assist in co-creating revised directions and rubrics. After the revised assessments have been approved by the QAS Review Panel, they will be piloted during the 2023-2024 academic year. The current data quality review cycle will be amended to allow for an additional reliability and validity review following completion of the pilot.