Interleaved problem sets: Testing for maths learning methods that work for all pupil performance levels
Date: July 2019
Team: Academics, NewGlobe
Acknowledgments: Special thanks to:
Leaders in Learning. https://newglobe.education/
Contents
1. Abstract
2. Introduction
3. Review of the Literature
4. Methods
5. Results
6. Discussion
7. References
1. Abstract
Interleaved problem sets include problems aligned with the daily objective as well as problems that map back to previously taught material. They offer a compelling alternative to blocked problem sets, in which every problem aligns with the daily objective. A growing body of lab-based and small-scale classroom-based research suggests that interleaved problem sets lead to better learning outcomes by providing more retrieval practice and more opportunities for strategy selection. This study explores the impact of interleaved problem sets among Primary 5 pupils attending 61 schools in Lagos and Osun states, Nigeria. Pupils in the control group received problem sets that were 100% blocked and aligned with the daily objective. Pupils in the treatment group received problem sets that were 100% interleaved, combining problems from the current lesson with problems from previous lessons and units. At the midline of the study, we find that pupils who completed interleaved problem sets during the first term of the school year made significantly larger gains in content mastery (0.41 standard deviations) than pupils who completed blocked problem sets. The effects were particularly pronounced among the lowest-performing pupils.
2. Introduction
The global education community has made significant progress towards improving access to primary education, particularly in developing countries. But progress on educational quality has not kept pace with the rapid expansion of access: 330 million children are attending school but are not learning. Improving educational quality is of paramount importance for the future of young people and for the economic growth of developing countries. It is conceptually simple, although logistically challenging, to build and open new schools. Quality, however, is a far more nebulous construct, both to define and to achieve. The work of educational theorists and cognitive scientists has increasingly converged in an attempt both to define quality education and to identify ways to accelerate learning among pupils. Several concrete instructional methodologies have emerged from this literature. These approaches are rooted in cognitive science and educational theory but can be applied in practical settings to boost learning and retention.
One such practical approach is interleaving. In an interleaving approach, the questions or problems assigned during independent practice include content aligned with that day's objective but also encompass content from previous days, units, and even terms. Interleaved problem sets offer a compelling alternative to blocked problem sets, in which all content is aligned with the day's objective. A growing body of lab-based literature supports the use of interleaved problem sets. However, because it is difficult to evaluate such a granular pedagogical approach across multiple schools, the evidence base from classroom research remains thin. It is crucial to explore interventions such as interleaving in a classroom setting in order to understand how they compare with more conventional approaches to problem set development.

The purpose of this study is to analyse the impact of interleaved problem sets on termly learning gains compared with the impact of blocked problem sets. The study was conducted in 61 schools operated by Bridge International Academies in Lagos and Osun states, Nigeria, and focused on Primary 5 Mathematics pupils. Approximately half of the schools were provided with instructional materials that used blocked problem sets; the other half were provided with instructional materials that used interleaved problem sets. This paper presents preliminary midline results collected after the first term of the study. The full study will last three terms and will include termly pretest/posttest analyses as well as a cumulative assessment measuring long-term retention. This research is crucial to advancing the conversation around improving educational quality in the developing world. While the conversation increasingly focuses on educational technologies as a panacea for low-quality instruction, these technologies are challenging for classrooms in low-resource settings to adopt. Low-cost, easily adoptable instructional approaches offer a universally accessible way to drive learning gains in core subject areas across all classrooms.
3. Review of the Literature
Interleaving
Interleaving draws upon theories of learning that emphasise spacing and opportunities for retrieval practice (Roediger and Pyc, 2012). This framework argues that humans learn and build long-term memory through the act of retrieving information. In the act of retrieving past information, neurological connections are strengthened and information is stored in long-term memory. To promote long-term memory storage, classroom instruction must therefore provide ample opportunity for memory retrieval. This traditionally occurs through the strategic use of formative and summative assessments. On these assessments, pupils must retrieve information from the previous chapter or term and demonstrate mastery through recognition (as in multiple choice) or production (as in open response). Too often, however, these assessments are given only at the end of a chapter, unit, or term. Unless daily assessments also encourage retrieval practice, relying on traditional assessments as the sole retrieval opportunity misses a significant chance to strengthen long-term storage of information every day. Interleaving offers one way to harness retrieval practice for long-term storage. Interleaving is a strategy in which content from previous days, chapters, units, or terms is interwoven with content aligned with the lesson's learning objective. Interleaving promotes two valuable learning processes: retrieval practice through spacing, and strategy selection. First, the spacing between learning the material and completing the problems forces pupils to retrieve information, concepts, or ideas learned during previous lessons. The act of retrieval increases the likelihood that the pupil will be able to recall that information in the future. Second, particularly in mathematics, pupils must select from a wide range of solution strategies.
For instance, pupils might need to decide whether to apply the formula for the circumference or the area of a circle when both types of problems are included in a single lesson's problem set. Not only does this improve pupils' ability to select the appropriate strategy; it also more accurately reflects a typical exam, in which content is cumulative and mixed.
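The difference between the two orderings can be made concrete with a short Python sketch. It assumes a hypothetical pool of problems grouped by topic (the labels `C1`, `A1`, etc. are placeholders for circumference and area problems) and uses a simple round-robin rule to interleave them; this is only one of many possible interleaving schemes, not the scheme used in any of the studies discussed here. Both functions return the same problems; only their order differs.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for topics that run out of problems

def blocked_order(problems_by_topic):
    """Blocked practice: every problem on one topic before moving to the next."""
    return [p for topic in problems_by_topic for p in problems_by_topic[topic]]

def interleaved_order(problems_by_topic):
    """Interleaved practice (round-robin): take one problem from each topic in turn."""
    rounds = zip_longest(*problems_by_topic.values(), fillvalue=_MISSING)
    return [p for rnd in rounds for p in rnd if p is not _MISSING]

# Hypothetical problem pool: circumference (C) and area (A) problems for a circle.
problems = {
    "circumference": ["C1", "C2", "C3"],
    "area": ["A1", "A2", "A3"],
}

print(blocked_order(problems))      # ['C1', 'C2', 'C3', 'A1', 'A2', 'A3']
print(interleaved_order(problems))  # ['C1', 'A1', 'C2', 'A2', 'C3', 'A3']
```

In the interleaved order, pupils must decide at each step which formula applies, which is the strategy-selection demand described above.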
A mixture of lab-based and classroom-based studies supports the notion that interleaving can improve content acquisition and memory retention. The lab-based research focuses largely on university students. Rickard, Lau, and Pashler (2008) conducted a study of 39 students at the University of California, San Diego. During this 7-day study, students in the control group completed 8 blocked problem sets of 3 problems each, while students in the treatment group completed 2 interleaved problem sets of 12 problems each. By the end of the study, both groups had completed the same 24 problems, only in different orders and arrangements. Students took a pretest before completing the problem sets and a posttest 7 days after the last problem set. The authors sought to understand whether blocked practice improved short-term memorisation and whether interleaved practice improved calculation skills and long-term retention. Students who completed the larger interleaved problem sets made more errors and had longer response times during the training sessions, but they significantly outperformed students who completed blocked problem sets on the delayed posttest. Mayfield and Chase (2002) focused specifically on whether interleaving better supports learning and practising new skills than a blocked approach. Their sample included 33 university students with poor mathematical skills attending West Virginia University. Both groups of students took a pretest measuring content knowledge at the outset. During the 25-day study, students in the control group learned 5 algebraic rules using worksheets that provided an explanation of each new rule, worked examples of the rule, and aligned practice problems. Students in the treatment group followed the same process to learn the same 5 algebraic rules.
The instruction and practice opportunities, however, used an interleaved rather than a blocked structure. Both groups then took an endline exam measuring their understanding, retention, and ability to solve problems using the algebraic rules. Students who received interleaved instruction and problem sets scored significantly higher on the endline assessment (97%) than students who received blocked instruction and problem sets (85%). Taylor and Rohrer (2010) explored the effect of interleaved problem sets in a small sample of 4th grade pupils attending a Florida elementary school. In this study, 12 pupils in the control group classroom watched a tutorial and completed two blocked problem sets, while 12 pupils in the treatment group watched the same tutorial and completed two interleaved problem sets. All pupils took a pretest before the tutorial and a posttest one day after the problem sets. Taylor and Rohrer examined how blocked versus interleaved problem sets affected performance in the short term (on the practice problem sets) and in the long term (on the delayed posttest). They found that while interleaved practice impaired pupil performance on the practice problem sets, treatment pupils scored roughly twice as high on the posttest (77%, compared with 38% among control group pupils). The authors attributed the benefit of interleaving mainly to pupils' improved ability to pair each problem with the appropriate strategy, rather than to the effects of spacing. Rohrer, Dedrick, and Burgess (2014) conducted a larger classroom-based study of 140 7th grade pupils attending a public middle school in Florida. The study lasted 11 weeks and compared the effects of blocked and interleaved problem sets on pupil performance in mathematics. Pupils in the control group completed 9 weeks of blocked mathematics problem sets, while pupils in the treatment group completed 9 weeks of interleaved problem sets.
In total, all pupils completed the same problems; the major difference was the order in which they encountered each problem. Mean test scores were significantly higher among pupils who learned material using interleaved problem sets (72%) than among pupils in the control group (38%), with an effect size of 1.05. Some literature does suggest that spacing and interleaving may fail to help, or may even hinder, skill learning and implicit memory: Greene (1990) and Perruchet (1989) found that spacing effects were not robust for perceptual identification and word-fragment completion tasks. Overall, however, a substantial body of evidence supports the notion that interleaving can improve content acquisition and long-term memory storage through the mechanisms of memory retrieval, spacing, and strategy selection. But the most convincing evidence comes from lab-based research among university students, while the existing classroom-based research lacks the sample sizes needed to make a convincing case for the widespread adoption of interleaving in classroom practice.
4. Methods
Participants
This study was conducted in schools in Nigeria operated by Bridge International Academies (Bridge). Since its founding in 2009, Bridge has educated over 500,000 children at more than 1,000 schools across Africa and Asia. Initially, Bridge operated community-based private schools. More recently, Bridge has begun to partner with governments to provide technical support and expertise in instructional design, teacher training, and programme evaluation in government schools. In both private and public schools, Bridge provides teacher training, teacher technology, lesson guides aligned with the national syllabus, and data-driven professional development. Bridge operates 63 community-based private schools in Lagos and Osun states in Nigeria. This study includes the 61 of those schools that offer a Primary 5 classroom; all 61 were included to maximise statistical power in the pilot. At the beginning of the study, 1,050 Primary 5 pupils attended these schools. The unit of randomisation was the academy: approximately half of the schools were assigned to the treatment group and approximately half to the control group. Stratified randomisation was used to increase power and ensure balance between the control and treatment groups. Strata were constructed using baseline pupil performance on internal assessments (historical midterm and end-of-term exams). External partners from Harvard University conducted the randomisation to ensure that the process was free of bias.
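The assignment procedure can be sketched in Python as stratified cluster randomisation: schools are ranked by a baseline score, grouped into strata of similar schools, and then split roughly in half at random within each stratum. The actual randomisation was carried out by the Harvard team, and the paper does not specify the number of strata or how they were cut; the equal-size strata, the `n_strata` parameter, and the school identifiers and baseline values below are hypothetical.

```python
import random

def stratified_assignment(school_baselines, n_strata, seed):
    """Randomise schools (the clusters) to treatment or control within baseline strata.

    school_baselines: dict mapping a school ID to its mean baseline score.
    Returns a dict mapping each school ID to "treatment" or "control".
    """
    rng = random.Random(seed)
    # Rank schools by baseline performance (ties keep insertion order).
    ranked = sorted(school_baselines, key=school_baselines.get)
    stratum_size = -(-len(ranked) // n_strata)  # ceiling division
    assignment = {}
    for start in range(0, len(ranked), stratum_size):
        stratum = ranked[start:start + stratum_size]
        rng.shuffle(stratum)
        n_treat = len(stratum) // 2
        # In an odd-sized stratum, a coin flip decides which arm gets the extra school.
        if len(stratum) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for k, school in enumerate(stratum):
            assignment[school] = "treatment" if k < n_treat else "control"
    return assignment

# Hypothetical data: 61 academies with made-up baseline means.
schools = {f"academy_{k:02d}": (k * 7) % 30 for k in range(61)}
groups = stratified_assignment(schools, n_strata=6, seed=2018)
n_treatment = sum(g == "treatment" for g in groups.values())
print(n_treatment, len(groups) - n_treatment)
```

Because assignment happens at the school level, every pupil in a school shares its arm, which is why the analysis clusters standard errors by academy.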
Procedure
All teachers at Bridge are equipped with a personal teacher tablet. Daily lesson guides are uploaded via the mobile network to each school leader's smartphone. Each teacher syncs their tablet with the smartphone each morning, which gives teachers direct access to each day's lessons. The lesson guides are designed by Bridge instructional design experts based in Lagos, Kenya, and the United States. Each lesson is aligned with the national curriculum and maps back from daily objectives.
Because digital lessons are synced to individual teacher tablets, Bridge is able to design alternative streams of lesson guides that use a new instructional approach. The original, incumbent lesson guides can be programmed to sync to the tablets of control group teachers, while lessons using the new intervention can be programmed to sync to the tablets of treatment group teachers. This ensures that lessons are identical except for the instructional approach being evaluated. In addition, all other aspects of a teacher's day (timetable, content and structure of other lessons, training and professional development, etc.) are held constant. This provides a unique opportunity to isolate the impact of a very specific approach to instructional design on learning outcomes. The control group lessons follow Bridge's incumbent approach to Mathematics instruction, in which pupils receive two mathematics lessons each day. During the first lesson, pupils learn a new concept aligned with the Nigerian national syllabus. The teacher provides two worked examples to demonstrate the new concept or strategy, and after each demonstration pupils complete a blocked problem set aligned with the lesson's daily objective. During the second lesson, pupils receive additional instruction on variations of the concept or strategy and then complete two additional blocked problem sets aligned with the daily objective. Each problem set consists of approximately 8-12 problems that build in rigour within and across problem sets. During each session of independent practice, the teacher provides corrective feedback to individual pupils. Treatment group lessons retain the same overall structure: pupils receive targeted instruction and opportunities for practice during the first lesson, and additional worked examples and practice opportunities during the second lesson.
Lessons in both groups share identical daily objectives, worked examples, and lesson guide structure. The only difference between control group and treatment group lessons is the selection of problems for each problem set. In the treatment group, problems are interleaved using a consistent system. In a 10-question problem set, two questions are aligned with the daily objective. Three questions spiral back to the previous three lessons (n-1, n-2, n-3), and three questions spiral back to the previous three units (u-1, u-2, u-3). This system uses spacing to offer ample opportunities for memory retrieval alongside aligned practice. The final question requires a synthesis of multiple concepts to encourage strategy selection. All interleaved problems are drawn from the blocked problem sets used in previous lessons. Aside from the synthesis question, no new questions are introduced in the treatment group; only the order in which problems appear is adjusted.

Baseline knowledge was assessed using a pretest at the start of the term. The pretest was delivered via the teacher tablet. Teachers were notified of the assessment in advance and received it on the morning of the pretest. To reduce the bias that could result from teachers administering the assessment to their own classes, a teacher exchange was required: the Primary 5 teacher exchanged tablets and classrooms with the Primary 4 teacher, who was responsible for administering the assessment, marking the tests, and entering the data. The assessment included 30 problems, each an equation that pupils were required to solve. Problems were written on the board, and pupils solved them in their exercise books, writing their answers in open-response format. Pupils were given 80 minutes to complete the assessment. The teacher was given time to mark and enter the scores during the subsequent lesson. Once a teacher entered the scores and synced with the school leader's smartphone, the scores were automatically uploaded to Bridge's central server, where they are available for download. The study lasts one year (three terms); this paper presents outcomes from the first term. During this term, pupils received 50 instructional days in total, 4 of which were devoted to assessment and marking.
The remaining 46 days used either the interleaved or the blocked problem sets. At the end of the term, pupils took an endline assessment written as a mirror of the pretest. It was administered in the same way and included 30 questions. All questions map onto the pretest questions and cover the same content; only the numbers used and the order of the questions differ between pretest and posttest, to reduce the likelihood of test sensitisation. A similar pretest/posttest design will be used for each remaining term of the 2018-2019 academic year. A final cumulative assessment will be administered at the end of term 3, 2019 to measure long-term retention.
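The interleaving system described above (problems from the current lesson, one from each of lessons n-1 to n-3, one from each of units u-1 to u-3, and a synthesis question) can be sketched in Python as follows. The problem bank, the lesson-to-unit mapping, the random choice within each source, and the parameter names are hypothetical; the paper does not say how individual problems were picked. As listed in the text, the components sum to nine questions, so the counts are left as parameters (`n_current`, `lookback`).

```python
import random

def build_interleaved_set(bank, unit_of, lesson, synthesis,
                          n_current=2, lookback=3, seed=0):
    """Assemble one interleaved problem set for lesson n (hypothetical helper).

    bank: dict mapping lesson index -> problems from that lesson's blocked sets.
    unit_of: dict mapping lesson index -> unit index.
    synthesis: a question combining several concepts.
    """
    rng = random.Random(seed)
    used = set()

    def pick(pool):
        # Choose a problem that is not already in the set.
        options = [p for p in pool if p not in used]
        if not options:
            raise ValueError("no unused problems left in this source")
        choice = rng.choice(options)
        used.add(choice)
        return choice

    # Problems aligned with the current lesson's objective.
    current = [pick(bank[lesson]) for _ in range(n_current)]
    # One problem from each previous lesson: n-1, n-2, ...
    recent = [pick(bank[lesson - k]) for k in range(1, lookback + 1)]
    # One problem from each previous unit: u-1, u-2, ...
    u = unit_of[lesson]
    earlier = []
    for k in range(1, lookback + 1):
        pool = [p for lsn, probs in bank.items() if unit_of[lsn] == u - k for p in probs]
        earlier.append(pick(pool))
    return current + recent + earlier + [synthesis]

# Hypothetical bank: 4 units of 5 lessons, 8 problems per lesson.
unit_of = {lsn: lsn // 5 for lsn in range(20)}
bank = {lsn: [f"L{lsn}-Q{q}" for q in range(1, 9)] for lsn in range(20)}
problem_set = build_interleaved_set(bank, unit_of, lesson=17, synthesis="SYNTHESIS")
print(problem_set)
```

Every problem except the synthesis question comes from an earlier blocked set, matching the study's rule that interleaving changes which problems appear when rather than introducing new ones.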
Analytic strategy
We estimate the impact of interleaved problem sets using ordinary least squares regression, controlling for baseline assessment results, teacher quality (as measured by lesson completion percentage), and pupil attendance. A lesson counts as completed when the teacher delivers at least 80% of the lesson guide on the day for which the guide was assigned; each teacher's lesson completion percentage summarises this over the full duration of the experiment. Effective teachers in the Bridge system consistently deliver each lesson and are able to do so within the time allocated. Pupil attendance is calculated by averaging the overall attendance for a classroom over time. Our model estimates the endline assessment score for student $i$ in school $j$ at time period $t$ as a function of treatment assignment ($\mathrm{Treatment}_{j,t}$), the baseline assessment score ($Y_{i,j,t-1}$), lesson completion ($\mathrm{Completion}_{j}$), attendance rate ($\mathrm{Attendance}_{j}$), and an error term ($\varepsilon_{i,j}$):

$$Y_{i,j,t} = \beta_0 + \beta_1\,\mathrm{Treatment}_{j,t} + \beta_2\,Y_{i,j,t-1} + \beta_3\,\mathrm{Completion}_{j} + \beta_4\,\mathrm{Attendance}_{j} + \varepsilon_{i,j}$$

Our coefficient of interest is $\beta_1$, which estimates the impact of the interleaved problem sets on pupil performance on termly content knowledge. All estimates cluster standard errors at the academy level.
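To make the estimation concrete, the Python sketch below fits the model above by ordinary least squares and computes cluster-robust (CR1) standard errors by academy with NumPy. The data are simulated, not the study's: the 61 academies, 17 pupils per academy, and the true treatment effect of 2.5 points are assumptions chosen for illustration. A statistics package would normally be used instead; the explicit sandwich formula is written out only to make the clustering step visible.

```python
import numpy as np

def ols_cluster_se(y, X, clusters):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum of outer products of per-cluster score vectors.
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)
    n_groups = len(groups)
    # Small-sample correction used by common CR1 implementations.
    correction = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    vcov = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))

# Simulated data (hypothetical): treatment is assigned at the academy level.
rng = np.random.default_rng(0)
n_schools, per_school = 61, 17
school = np.repeat(np.arange(n_schools), per_school)
treat_school = rng.permutation(np.arange(n_schools) % 2)
completion = rng.uniform(0.7, 1.0, n_schools)
attendance = rng.uniform(0.8, 1.0, n_schools)
school_effect = rng.normal(0, 1, n_schools)

baseline = rng.normal(8, 4, school.size)
treatment = treat_school[school]
posttest = (3 + 2.5 * treatment + 0.8 * baseline
            + 2 * completion[school] + 2 * attendance[school]
            + school_effect[school] + rng.normal(0, 3, school.size))

# Columns: intercept, Treatment, baseline Y, Completion, Attendance.
X = np.column_stack([np.ones(school.size), treatment, baseline,
                     completion[school], attendance[school]])
beta, se = ols_cluster_se(posttest, X, school)
print(f"beta_1 = {beta[1]:.2f} (clustered SE {se[1]:.2f})")
```

Expressing $\beta_1$ in standard-deviation units, as in the 0.41 reported in the Results, requires dividing by some standard deviation of the outcome. The paper does not state which one was used, so a choice such as the control-group SD, `beta[1] / posttest[treatment == 0].std(ddof=1)`, is an assumption.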
5. Results
Impact on pupil achievement
No significant difference was found between the groups' pretest scores (p=0.41). Significant differences were, however, observed on other baseline characteristics. Pupil attendance was significantly higher among control group pupils (2.6%; p<0.05), while lesson completion percentage was significantly higher among treatment group teachers (2.8%; p<0.01). The analysis uses a restricted sample that includes only pupils with both a pretest and a posttest score. As a result, the counts of both pupils and schools are somewhat lower than in the full sample. We present the counts for the restricted sample in the table below. The imbalance in the counts reflects an imbalance in data entry: treatment group schools entered scores more often than control group schools did.
Below, we present a table of descriptive statistics comparing pretest and posttest outcomes (raw scores on the 30-item assessment) for control group and treatment group pupils. There was no significant difference in pretest scores between control group and treatment group pupils. There was, however, a significant difference in posttest scores between the two groups: the difference, 2.36 points, was significant at the 0.001 level.
Additionally, we conduct a simple gain-score analysis. Below, we present the results (raw gains between the pretest and the posttest, each a 30-item assessment). Treatment group pupils also made larger gains between baseline and endline (2.81 points), and the difference was significant at the 0.001 level.
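A gain-score comparison of this kind can be sketched in Python with the standard library. The pupil records are hypothetical (pretest, posttest) pairs of raw scores, and the Welch-style t statistic is an illustrative choice: the paper does not say which test produced its p-values. This statistic also treats pupils as independent and ignores clustering by academy, so it overstates precision relative to the clustered regression.

```python
from math import sqrt
from statistics import mean, variance

def gain_score_difference(treatment_scores, control_scores):
    """Difference in mean gains (posttest - pretest) and a Welch-style t statistic.

    Each argument is a list of (pretest, posttest) raw scores on the 30-item test.
    """
    gains_t = [post - pre for pre, post in treatment_scores]
    gains_c = [post - pre for pre, post in control_scores]
    diff = mean(gains_t) - mean(gains_c)
    se = sqrt(variance(gains_t) / len(gains_t) + variance(gains_c) / len(gains_c))
    return diff, diff / se

# Hypothetical pupils.
treatment_scores = [(5, 10), (6, 12), (4, 9)]
control_scores = [(5, 7), (6, 9), (4, 6)]
diff, t_stat = gain_score_difference(treatment_scores, control_scores)
print(round(diff, 2), round(t_stat, 2))
```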
We present the results of the ordinary least squares regression below. At each stage of the model, treatment assignment explains a significant share of the variance in posttest scores, controlling for pretest scores, lesson completion, and pupil attendance. The effects were significant at the 0.001 level. In the final model, the effect size of the treatment was 0.41 standard deviations.
Differential response by achievement levels
Further analysis explored the differential effects of interleaved problem sets across ability groups, defined by baseline score on the 30-item pretest. Among the lowest-performing pupils (baseline = 0-4), treatment group pupils gained significantly more (3.17 points; p<0.01). Among the second-lowest-performing pupils (baseline = 5-9), treatment group pupils also gained significantly more (3.50 points; p<0.001). Among middle-performing pupils (baseline = 10-19), there was no significant difference in gain scores. Too few pupils fell within the highest performance band (20-30) to meaningfully evaluate differences between control group and treatment group pupils.
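The band-by-band comparison can be sketched in Python as follows. The band edges come from the text; the pupil records and the `min_per_group` threshold for treating a band as too small are hypothetical, and the sketch computes only descriptive gaps in mean gains, not the significance tests reported above.

```python
from statistics import mean

# Baseline bands on the 30-item pretest, as defined in the text.
BANDS = [(0, 4), (5, 9), (10, 19), (20, 30)]

def band_of(score):
    """Return the (low, high) baseline band containing a raw pretest score."""
    for low, high in BANDS:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} is outside 0-30")

def gain_gap_by_band(pupils, min_per_group=1):
    """Mean treatment gain minus mean control gain within each baseline band.

    pupils: iterable of (group, pretest, posttest), group being "treatment" or "control".
    Bands where either group has fewer than min_per_group pupils are skipped.
    """
    gains = {}
    for group, pre, post in pupils:
        by_group = gains.setdefault(band_of(pre), {"treatment": [], "control": []})
        by_group[group].append(post - pre)
    return {
        band: mean(g["treatment"]) - mean(g["control"])
        for band, g in gains.items()
        if len(g["treatment"]) >= min_per_group and len(g["control"]) >= min_per_group
    }

# Hypothetical pupils.
pupils = [
    ("treatment", 2, 9), ("treatment", 3, 8), ("control", 2, 5), ("control", 4, 6),
    ("treatment", 12, 15), ("control", 11, 14), ("treatment", 25, 28),
]
print(gain_gap_by_band(pupils))
print(gain_gap_by_band(pupils, min_per_group=2))
```

The top band is dropped here because it has no control pupils, which mirrors the sample-size problem described for the 20-30 band.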
Non-experimental evidence from lesson observations
To monitor the fidelity of implementation of the interleaved problem sets, Bridge relies on academic field officers, who regularly observe lessons at different academies within a particular region. These lesson observations capture granular details, such as overall lesson quality, the extent to which pupils achieved the academic objective of the lesson, the precision of section-by-section timing, the clarity and accuracy of the lesson guide, and the ratio of independent practice to total classroom time. An academic field officer observed 20 lessons across the control and treatment groups during the term. Below, we present the results of these qualitative lesson observations, disaggregated by control and treatment group lessons. It is important to note that these results reflect differences in the way lessons were delivered, not in the way lessons were designed.
Based on qualitative lesson observations, lesson-to-lesson academic achievement did not differ substantially between control group and treatment group lessons; in both cases, pupils answered approximately 80% of questions correctly. The teacher ratings also suggest that the quality of teachers in control and treatment schools did not vary. Differences were observed on only four factors: timing (treatment group lessons were slightly less precise), the number of confusing lines (control group lessons were slightly less clear for teachers to deliver), completion rate (fewer treatment group lessons were delivered to completion), and the ratio of independent practice to total lesson minutes (treatment group lessons afforded more time for independent practice). These differences may point to moderating variables that help explain the size of the effect.
6. Discussion
The purpose of this research is to explore the impact of interleaved problem sets on pupil achievement compared with blocked problem sets. The underlying question is whether spacing, which facilitates memory retrieval, and mixed problem sets, which facilitate strategy selection, result in improved long-term content retention. We found that pupils who completed interleaved problem sets achieved significantly larger gains than pupils who completed blocked problem sets. We also found heterogeneous effects: pupils at the lower end of the performance distribution accounted for a large share of the gains observed in the total sample. Within the limitations of this study, it is not possible to isolate the separate contributions of spacing and strategy selection to the observed outcomes. It may be that one of the two mechanisms explains substantially more of the variance in the outcome. For practical purposes, however, the distinction matters less: effective interleaved problem sets should integrate both spacing and strategy selection in order to achieve the results described here. An additional explanation of these results could be that pupils who completed interleaved problem sets were better prepared for an assessment whose structure more closely resembles an interleaved problem set than a blocked one. In this case, interleaving may improve exam performance without necessarily improving long-term memory retention. Additional research is needed to understand the impact of interleaving on memory as opposed to exam preparation.
These results suggest that interleaving is a low-cost, low-tech instructional approach that can significantly accelerate learning growth in mathematics. The strategy can be applied at all levels of an instructional model, from print materials to lesson guides to supplementary learning materials, and does not require apps, tablets, or even textbooks. Furthermore, the impact appears to be greatest among the lowest-performing pupils, which makes interleaving an opportunity to target struggling pupils and reduce the achievement gap within the classroom. Qualitative lesson observations provide a useful glimpse into the implementation of blocked versus interleaved lesson guides. Aside from the differences noted above, the lesson observation notes do not reveal any major difference in the minute-to-minute experience of pupils and teachers. Teachers were equally able to deliver each type of lesson, even though they received no additional training or ongoing professional development. This suggests that interleaving may be widely applicable among teachers of all ability levels and in all settings. There are several important limitations of this research. First, the sample size limited the statistical power of the study: only 61 schools were included in the analysis, and it was not possible to randomise at the pupil level. Second, the sample included only upper-primary pupils. It is unclear whether lower-primary pupils would benefit similarly from interleaved practice, or whether the reduced amount of daily practice aligned with the objective might in fact harm their achievement. Third, the study focuses on mathematics, which is a natural setting for interleaved practice. The effects of interleaving might be different in other subjects, such as science or social studies.
Future research into the effects of interleaving should focus on lower-primary pupils, ideally in a larger sample of schools. Future research should also explore the effects of interleaving in other core content areas. Finally, future interventions could combine interleaved instruction and problem sets with more frequent assessment in order to strengthen the opportunities for retrieval practice. This research contributes to the literature in several important ways. First, the results affirm the outcomes reported in several previous studies of interleaving. Second, the strategy was evaluated in a classroom setting rather than in a controlled, lab-based setting; moreover, the study included classrooms across a large school network and evaluated interleaved and blocked practice in comparable instructional settings. Third, this study explores the impact of interleaving in a developing context. While interleaving has the potential to improve classroom instruction around the world, its effects could be most profound in low-resource settings in developing contexts. Finally, and perhaps most importantly, this study reinforces the notion that learning science can offer the educational community enormous insight into how to accelerate learning among pupils in all settings and of all ability levels.
7. References
Greene, Robert L. (1990). “Spacing Effects on Implicit Memory Tests.” Journal of Experimental Psychology: Learning, Memory, and Cognition 16(6): 1004-1011.
Mayfield, Kristin H., and Phillip N. Chase (2002). “The Effects of Cumulative Practice on Mathematics Problem Solving.” Journal of Applied Behavior Analysis 35: 105-123.
Perruchet, Pierre (1989). “The Effects of Spaced Practice on Explicit and Implicit Memory.” British Journal of Psychology 80(1): 113-130.
Rickard, Timothy C., Jonas Sin-Heng Lau, and Harold Pashler (2008). “Spacing and the Transition from Calculation to Retrieval.” Psychonomic Bulletin & Review 15(3): 656-661.
Roediger, Henry L., and Mary A. Pyc (2012). “Inexpensive Techniques to Improve Education: Applying Cognitive Psychology to Enhance Educational Practice.” Journal of Applied Research in Memory and Cognition 1(4): 242-248.
Rohrer, Doug, Robert F. Dedrick, and Kaleena Burgess (2014). “The Benefit of Interleaved Mathematics Practice Is Not Limited to Superficially Similar Kinds of Problems.” Psychonomic Bulletin & Review 21(5): 1323-1330.
Taylor, Kelli, and Doug Rohrer (2010). “The Effects of Interleaved Practice.” Applied Cognitive Psychology 24: 837-848.