Creating systems to support experimentation in international education
Date: February 2018
Team: Academics, Bridge International Academies
Acknowledgments: Special thanks to:
Leaders in Learning. https://newglobe.education/
Contents

1. Abstract
2. Introduction
3. Methodology
4. Case-Study: Leveled Reading in Lagos
5. Discussion
6. References
1. Abstract

Educational organisations rely increasingly on digital platforms and learning technologies to deliver quality education. As a result, randomised A/B testing has become a viable pathway for internal innovation, in addition to its traditional role in academic research and more formal evaluations. Several prominent education technology organisations already rely on A/B testing to improve the quality and reach of their products. But educational providers have not yet embraced A/B testing as a mechanism through which they can hone and improve classroom instruction. This paper outlines an innovative methodology for conducting randomised A/B testing across schools operated by Bridge International Academies. The platform relies on digital dissemination of learning materials and teacher-driven assessment and data entry to execute and evaluate new interventions in the classroom. After describing the process itself, the paper briefly discusses a case study in which this platform is being used to evaluate a leveled reading approach in Lagos and Osun States, Nigeria. Finally, the paper concludes with a discussion of the unique elements of this platform, the implications for instructional quality at Bridge, and the lateral influence that these evaluations could have on other educational providers, schools, teachers, and researchers.
2. Introduction

The field of Mind, Brain, and Education seeks to bridge the gap between cognitive learning science and practical classroom strategies. It draws on exciting new lines of inquiry, such as brain scanning, to better understand the neurological processes at work during learning and memory formation. The synergy between cognitive learning science and opportunities for its practical application offers an extraordinarily powerful avenue to improve educational quality around the world.

But learning science research often focuses on highly specific pedagogical interventions or learning processes. This level of specificity typically necessitates exploration in a lab-based setting rather than in the classrooms where those practices would ultimately be applied. As a result, educational providers (teachers, school systems, school networks, and instructional design organisations) often struggle to determine the practical applicability of those interventions in a classroom setting. Even after successful lab-based evaluation of a new idea, providers are left with a perilous choice: either adopt the new approach based on lab-based research and anecdotal field-testing, or miss out on a potentially powerful opportunity to improve instructional quality and boost pupil learning.

Some organisations that operate at sufficient scale have adopted A/B testing platforms to evaluate the impact of particular insights from learning and behavioural science. Khan Academy, a widely used online library of learning videos and practice opportunities, uses split testing of videos to explore the impact on user outcomes. Coursera, an online platform that delivers university-level courses at scale, uses A/B testing of inputs such as videos, PowerPoint slides, and assessments to measure impact on course completion and pupil achievement. Duolingo, a foreign language learning platform, uses A/B testing to explore the effects of delayed sign-ups, streaks encouraging regular behaviours, badges providing rewards, and the use of the in-app coaching option.

All of these systems, however, are tech-based learning platforms. While their approach to innovation is admirable and enjoys a wide reach to learners around the world, their A/B testing does not evaluate the impact of new instructional methods or learning incentives in a classroom setting. Nor does it investigate practical instructional methodologies that all teachers can adopt, including those without access to these tech platforms. Finally, the outcomes of this research certainly inform the development of a better product, which in turn affects the learning outcomes of its users. But they do not typically have implications for broader classroom practice among teachers of diverse ability levels and in a wide range of resource settings.

The purpose of this article is to illustrate a platform for conducting short-cycle, randomised controlled trials at scale to explore the impact of instructional methodologies and learning materials with universal applicability for classroom practice. This platform was designed by the Learning Innovation department at NewGlobe, an organisation that operates or supports approximately 1,500 schools across six countries in Africa and Asia. It allows NewGlobe to evaluate the effects of new approaches to teaching, instructional design, and parental communication. This paper will discuss a particular methodology that is available to organisations with sufficient scale.
It will demonstrate the importance of scale, which enables organisations both to ask and to answer questions about how new approaches impact pupil learning. In addition, this paper explores the concrete ways in which NewGlobe's approach to learning innovation contributes to the global conversation about best practices in instructional design, school management, and parental engagement. Finally, I make the case that organisations must be data-driven and focus on learning outcomes, rather than intermediate measures that may or may not lead to learning gains.
3. Methodology

NewGlobe

NewGlobe was founded in 2007 and has since educated over 500,000 children at more than 1,000 schools across Africa and Asia. Initially, NewGlobe operated community-based private schools. More recently, it has begun to partner with governments to provide technical support and expertise in instructional design, teacher training, and programme evaluation in government schools. In both private and public schools, NewGlobe provides teacher training, teacher technology, lesson guides aligned to the national syllabus, and data-driven professional development.

One distinctive element of the NewGlobe instructional model is the method for distributing teacher guides. All teachers supported by NewGlobe are equipped with a personal teacher tablet. Daily lesson guides are uploaded via mobile network to each school leader's smartphone. Each morning, teachers sync their tablets with the smartphone, giving them direct access to that day's lessons. Once a teacher has digitally retrieved the day's lesson guides, they are prepared to deliver each lesson.

The lesson guides themselves are designed by NewGlobe instructional design experts based in Lagos, Kenya, and the United States. Each lesson is aligned to the national curriculum and maps back from daily objectives. Lessons are stored on a cloud-based management platform called Xyleme, which converts lesson guides from their Word format into a format compatible with the teacher tablets.
Because digital lessons are synced to individual teacher tablets, NewGlobe is able to design alternative streams of lesson guides using a new instructional approach. Original, incumbent lesson guides can be programmed to sync to the tablets of control group teachers, while lessons using the new intervention can be programmed to sync to the tablets of treatment group teachers. This ensures that lessons are identical except for the instructional approach being evaluated. In addition, all other aspects of a teacher's day (timetable, content and structure of other lessons, training and professional development, etc.) are held constant. This creates a unique opportunity to isolate the impact of a very specific approach to instructional design on learning outcomes.

The same system that enables the outward digital distribution of lesson guides to teachers also works in reverse: teachers are able to enter data into their teacher tablets, which makes pupil data immediately accessible at the organisational level. To begin, teachers administer a test in the classroom. This can be a paper-based test like a midterm or endterm exam, but the assessment can also be embedded in the digital lesson guide. This digitisation of assessments significantly reduces the cost and complexity of printing assessments for thousands of pupils across hundreds of schools. In addition, these assessments can be tweaked and changed at the last minute, while printed tests are static. After administering the assessment, teachers mark the tests and enter scores for each pupil into the teacher tablet. After syncing with the school leader's smartphone, the scores automatically sync to NewGlobe's central reports server. NewGlobe analysts are then able to download the assessment scores for analysis.
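To make the routing concrete, the sketch below shows how teacher tablets might be mapped to content streams by study arm. This is a minimal illustration in Python; the function names, file layout, and stream identifiers are assumptions, since NewGlobe's actual content platform is not public.

```python
import csv

def load_assignments(path):
    """Read teacher-to-arm assignments from a CSV with columns:
    teacher_id, school_id, arm ('control' or 'treatment')."""
    with open(path, newline="") as f:
        return {row["teacher_id"]: row["arm"] for row in csv.DictReader(f)}

def guide_stream_for(arm):
    """Route each study arm to its lesson-guide stream on the
    content platform (stream names here are hypothetical)."""
    return {
        "control": "p3_reading_incumbent",  # existing lesson guides
        "treatment": "p3_reading_leveled",  # guides with the new approach
    }[arm]

if __name__ == "__main__":
    assignments = load_assignments("teacher_assignments.csv")
    for teacher_id, arm in assignments.items():
        print(teacher_id, "->", guide_stream_for(arm))
```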
Procedure

The first stage of NewGlobe's A/B testing process is to collect new data and analyse existing data on pupil learning. This data typically includes midterm and endterm exam data, EGRA and EGMA data, and internally designed literacy and numeracy assessment data. Often, new tests are given to answer a specific question about learning in a country context. Reading widely in the academic literature and initiating conversations with experts in learning science and classroom instruction supplement this data collection. The goal of the analysis is to identify the most pressing problems confronting pupil learning. Before moving forward with any idea for testing, it is crucial to demonstrate a strong evidence base for the problem. Once the problem has been clearly defined, we explore possible evidence-based solutions to the problem in the academic literature.

The next stage of the process is to create and evaluate a research proposal. Proposals thoroughly map out the evidence supporting the problem, the academic literature supporting the proposed intervention, and a detailed description of the intervention itself. This description should clearly identify the minute-to-minute experience of the control and treatment groups and include a sample lesson guide for each. A thoughtful measurement strategy must also be devised to illustrate how outcomes will be evaluated and what the associated costs will be. Finally, these lesson guides and any newly developed measurement instruments should be thoroughly field-tested by our academic field team. Each proposal is evaluated at a process called 'Gateway', in which the person proposing the idea makes their case and a panel of stakeholders asks questions, critiques the model, and makes a final decision about whether to approve funding and resources for the experiment's execution.

Next comes the design and execution phase. To design the intervention, a separate stream of teacher guides must be created as an alternative to the incumbent (control) stream of teacher guides. This new stream should incorporate the proposed pedagogical approach and reflect feedback
from an extensive period of field-testing and piloting. In addition, a pretest and posttest must be finalised based on previous field-testing. These assessments should be proven to accurately measure the intended construct and to avoid floor or ceiling effects. Once the measurement instrument and the control and treatment guides have been finalised, they are uploaded to the content management platform and made available for sync by teachers in each group.

Experiments typically commence with a pretest. While pretests can be printed, they are more frequently delivered digitally via the teacher guide. In these cases, teachers write the problems or questions on the board and pupils answer in their exercise books. To reduce any potential bias from a teacher administering the assessment to their own pupils, NewGlobe requires a teacher exchange, so that a different classroom teacher administers, marks, and enters the test data. After the pretest, the intervention begins and continues for the intended duration of the experiment. At the end of the study, the same process is used to administer a posttest. Posttests are typically mirror exams, in which the order and wording of questions are adjusted but the core content of the assessment remains identical.

After a study has concluded, NewGlobe analysts are able to pull pupil-level data directly from the reports server. In addition, data on school-level, teacher-level, and pupil-level characteristics can be downloaded from existing reports and used as control variables in the regression model. Once all data has been pulled from the reports server and cleaned, analysts conduct the analysis and draw conclusions about the outcome of the study.
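As a minimal sketch of that final analysis step, the Python snippet below estimates the treatment effect with an ANCOVA-style regression, clustering standard errors at the school level (the level of randomisation). The file name and column names are illustrative assumptions, not NewGlobe's actual schema.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Pupil-level extract; columns are illustrative assumptions.
df = pd.read_csv("study_data.csv").dropna(
    subset=["posttest", "pretest", "treated", "school_id"]
)

# Regress the posttest on treatment status, controlling for the pretest;
# standard errors are clustered at the school (randomisation) level.
model = smf.ols("posttest ~ treated + pretest", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]}
)
print(model.summary())

# Express the treatment coefficient in standard-deviation units using
# the control group's posttest standard deviation.
sd_control = df.loc[df["treated"] == 0, "posttest"].std()
print("Effect size (SD units):", model.params["treated"] / sd_control)
```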
Response to Evaluation Outcomes

The final, and perhaps most complex, phase of the process involves NewGlobe's response to the outcomes of the study. Three potential options emerge depending on the results of the analysis.

In the event of a successful treatment outcome, NewGlobe explores opportunities to apply the successful intervention at scale. This might involve continued use of the intervention in that particular grade and subject, but also the adoption of the intervention for other grade/subject combinations in the country. If the results are particularly compelling, we consider integrating the intervention into our instructional approach for other countries. Depending on capacity, we may also decide to retroactively update existing materials with the newly proven approach.

In the event of a neutral outcome, three possibilities emerge. First, we might conduct the study again under different design conditions, such as a longer duration, a different population, or a different subject/grade combination. Second, we might adjust the treatment condition itself in order to amplify the effects of the intervention in some way. Finally, we may decide to abandon the intervention altogether, particularly if the costs of future testing and implementation are too high.

An unsuccessful treatment outcome does not represent failure. Rather, it reaffirms the quality of our incumbent approach to instructional design in that particular area, even after robust pressure testing. Upon conclusion of an unsuccessful intervention, we maintain our existing approach to instructional design, but with bolstered confidence in the methodology and lesson design. Most likely, future consideration of that intervention will be abandoned unless a compelling reason exists to continue testing in some other form.

We draw heavily on the work of Kraft (2018) to interpret effect sizes in experiment outcomes, particularly when the effect size is positive and significant. Kraft challenges traditional interpretations
of effect sizes using benchmarks proposed by Cohen. Kraft recognises the potential impact for practitioners of effect sizes that Cohen's standards consider small. He also encourages consideration of study features, programme costs, and scalability when interpreting the effect sizes of a study. Kraft offers a compelling schema for interpreting effect sizes from causal studies with achievement outcomes. The matrix factors in effect size and cost per pupil in order to classify successful interventions as 'Easy to Scale', 'Reasonable to Scale', or 'Hard to Scale'. Studies with small effect sizes and moderate to high costs should not be considered for scaling.
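A Kraft-style classification could be encoded as a simple decision rule. The sketch below is a hedged illustration: the cost and effect-size cut-offs are placeholders, not Kraft's published thresholds, and would need to be replaced with whatever values a team actually adopts.

```python
def classify_scalability(effect_size_sd, cost_per_pupil_usd,
                         cost_cutoffs=(50.0, 500.0), min_effect=0.05):
    """Classify an intervention in the spirit of Kraft's effect-size/
    cost matrix. All thresholds here are illustrative placeholders."""
    low, moderate = cost_cutoffs
    # Small effects at moderate-to-high cost are not worth scaling.
    if effect_size_sd < min_effect and cost_per_pupil_usd > low:
        return "Do not scale"
    if cost_per_pupil_usd <= low:
        return "Easy to Scale"
    if cost_per_pupil_usd <= moderate:
        return "Reasonable to Scale"
    return "Hard to Scale"

print(classify_scalability(0.17, 12.0))   # -> Easy to Scale
print(classify_scalability(0.04, 900.0))  # -> Do not scale
```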
External Collaboration

In an effort to connect the inquiry of the academic research community with the work of educational organisations, NewGlobe has pursued extensive collaboration with external partners as a central component of this process. This collaboration has three major objectives. First, we aim to foster productive dialogue with experts in the field. These conversations are typically informal and serve to guide our thinking about current approaches and shape the direction of future proposals. Second, we solicit specific feedback from experts on current proposals and the associated instructional design materials. This formative feedback guides the substance of an intervention planned for evaluation. Finally, we enter into more formal partnerships with academics who have expressed interest in both thought-partnership and genuine collaboration.

In these collaborative partnerships, academic experts and NewGlobe offer different and complementary contributions. These experts often come from a background in development economics or educational development. They typically provide technical assistance on the study design, offering unique insights into ways that we can increase statistical power, more accurately measure the outcome variable, and more thoughtfully structure treatment arms to inform the outcomes of interest. In addition, they provide advice on the execution of the study, specifically in terms of how that execution reflects the initial intentions of the study design. Partners also offer more concrete contributions. They conduct the randomisation in order to ensure that NewGlobe is not involved in any way that could bias treatment assignments. In addition, they lend their analytic expertise to analyse and interpret the data. Finally, partners generate ideas for future inquiry; often, these ideas are grounded in their own research but also align with NewGlobe's organisational priorities and focus on achievement.

NewGlobe brings an entirely separate set of contributions to this collaboration. Typically, academic experts at NewGlobe generate the vast majority of research proposals, since they more fully understand NewGlobe's instructional model and organisational structure. NewGlobe's master teachers also carry out the design and creation of instructional materials, with feedback and guidance from academic partners with expertise in instructional design. Of course, NewGlobe leads on the execution of the experiment, which includes the syncing of teacher guides, the delivery of lessons by NewGlobe teachers, the assessment of pupils through a teacher exchange, and the final entry of data. NewGlobe then provides blinded data to those partners for analysis.

These unique contributions combine to form two powerful outcomes. NewGlobe learns more about how to improve its own academic offerings from thought-leaders in academia. These evidence-based outcomes either confirm our incumbent practices through intense pressure testing, or they highlight new opportunities for improvement that are proven to accelerate pupil learning in our educational setting. Our research partners gain the opportunity to work hand-in-hand with a large-scale educational provider. They are able to explore the impact of innovative approaches to education
in a large network of schools with a comparable instructional setting. And finally, those partners are able to publish the results, if the outcomes add to the literature in some way. In this way, NewGlobe and its partners are able to contribute to the global discourse regarding educational quality, and particularly to the conversation about how learning science can inform actual classroom practice at scale.
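Because partners conduct the randomisation, a reproducible assignment script is a natural artefact of the collaboration. The sketch below shows one way this might look: school-level assignment, stratified so that arms stay balanced across regions, with a fixed seed so the assignment can be audited. The column names and the stratification variable are assumptions for illustration.

```python
import pandas as pd

def randomise_schools(schools, stratum_col="region", seed=20180201):
    """Within each stratum, shuffle schools with a fixed seed and
    alternate treatment/control so the arms stay balanced."""
    assigned = []
    for _, group in schools.groupby(stratum_col):
        shuffled = group.sample(frac=1.0, random_state=seed)
        arms = ["treatment" if i % 2 == 0 else "control"
                for i in range(len(shuffled))]
        assigned.append(shuffled.assign(arm=arms))
    return pd.concat(assigned).sort_index()

schools = pd.DataFrame({
    "school_id": ["S01", "S02", "S03", "S04", "S05", "S06"],
    "region": ["Lagos", "Lagos", "Lagos", "Osun", "Osun", "Osun"],
})
print(randomise_schools(schools))
```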
4. Case-Study: Leveled Reading in Lagos

In the final section of this paper, I briefly describe a case study to illustrate the use of this methodology to better understand a new approach to instructional design. In this study, we explore the impact of a within-classroom leveled reading approach on fluency and decoding skills. The study was conducted across 63 schools in the Primary 3 Electronic Reading Programme in Lagos and Osun states in Nigeria.

We began the process by exploring the data on literacy levels in Lagos. One key finding was a wide range of literacy levels within a single grade level, as indicated by a large standard deviation in EGRA and midterm/endterm assessment scores. This called into question our current approach in the Electronic Reading Programme, which assigns a single text for all pupils to read.
We then turned to the literature to explore evidence-based ways to address wide-ranging literacy levels among learners. Duflo, Dupas, and Kremer (2011) lay the groundwork for an approach to across-class ability grouping, finding that students grouped by ability had positive and significantly
higher literacy post-test scores (0.17 SD) compared with students who were grouped randomly. Furthermore, the effects of ability grouping were persistent (0.16 SD one year later). Reis, McCoach, Little, and Muller (2011) build on this research through an exploration of within-class, as opposed to across-class, ability grouping. Their study compared traditional direct instruction against a balanced literacy approach that included independent reading of on-level texts. They found that the differentiated reading programme resulted in significantly higher fluency and comprehension scores (d = 0.33 and 0.10) compared with a traditional direct instruction programme. These effects were particularly pronounced among pupils attending high-poverty urban schools (d = 0.27).

Once we had identified a pressing problem affecting pupil learning and an evidence-based solution to address it, we set out to design an intervention. We selected the Lagos Primary 3 Electronic Reading Programme for two reasons. First, we found particularly wide-ranging ability levels in Primary 3. Second, the Electronic Reading Programme, in which pupils read independently using e-readers, lends itself particularly well to an intervention where leveled texts are assigned based on pupil literacy levels.

To create a second stream of treatment lesson guides, we began by conducting a survey of our incumbent approach. In the control group, all pupils are assigned four different stories to read independently on their e-readers. In the treatment group, we designed lesson guides in which the teacher assigns pupils to one of two groups according to literacy level, based on baseline literacy assessment scores. Once assigned, pupils in each group read four stories aligned to that group's literacy level: the lower-performing group reads less challenging texts, while the higher-performing group reads more challenging texts. All other classroom processes (lesson guide structure, etc.) remain the same. Treatment teacher guides were extensively field-tested by our academic field team to improve the lesson design and ensure that any confusing instructions were resolved before the formal submission of lesson guides for execution.

Pupils will be assessed at the baseline, midline, and endline of the study in order to measure literacy gains. In each case, the teacher administers a one-to-one literacy assessment. The assessment is a 40-word reading list modelled on Fountas and Pinnell and aligned to predetermined reading levels (as indicated by state-wide EGRA testing in 2016). The assessment measures decoding and fluency skills: pupils read as many words as possible in one minute, and teachers note how many words were read correctly. Again, this measurement instrument was field-tested to assess teachers' ability to effectively deliver a one-to-one assessment using only the guidance in the lesson guide. The data was analysed to ensure that floor and ceiling effects were not present.

At present, only the baseline has been administered. The study is ongoing, with a planned conclusion in July 2019. A team of researchers from Harvard University, Utrecht University, and Oxford University has supported this study. These partners provided technical expertise on study design, conducted the randomisation, and will analyse the final data in July.
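The floor/ceiling check described above is straightforward to run on the baseline scores. Below is a minimal sketch; the file name, column name, and the 15% flagging threshold are assumptions for illustration, with the 40-word maximum taken from the assessment described in the text.

```python
import pandas as pd

scores = pd.read_csv("baseline_fluency.csv")  # one row per pupil
MAX_SCORE = 40  # words in the reading list, per the assessment above

floor_share = (scores["words_correct"] == 0).mean()
ceiling_share = (scores["words_correct"] == MAX_SCORE).mean()

print(f"Share at floor (0 correct): {floor_share:.1%}")
print(f"Share at ceiling ({MAX_SCORE} correct): {ceiling_share:.1%}")

# Flagging rule: an assumption for this sketch, not NewGlobe policy.
for name, share in [("floor", floor_share), ("ceiling", ceiling_share)]:
    if share > 0.15:
        print(f"Warning: possible {name} effect in the instrument")
```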
5. Discussion

This randomised A/B testing platform offers significant benefits for NewGlobe academics. First, it presents an opportunity to empirically validate new ideas before scaling up an intervention. Historically, these decisions were made based on substantial but ultimately anecdotal field-testing. Now, decisions to scale or phase out an approach are based on data analysis from a randomised
study. This process has also enabled more strategic cost-benefit decision-making. Because evaluations result in a measurable effect size, the effect can be quantified and compared with other possible uses of resources to determine which approach will yield larger learning gains for a given cost (a stylised comparison appears in the sketch below). Even when results are unsuccessful, this approach provides a mechanism to pressure-test our existing approach in an intellectually honest and rigorous way. Rather than continuing with a particular methodology because 'it's the way that we've always done things', we now make decisions only on the basis of an approach's impact on pupil achievement. Historically, NewGlobe's academic decision-making has been firmly rooted in a single north star: pupil learning. Now, we have a way to guide us more effectively towards that north star and to better understand when we have moved closer to that goal.

There are also substantial outward-facing benefits of this platform. The results of this inquiry will contribute to the global conversation around how to improve educational quality and equity. In addition to simply adding more research on the relationship between school inputs and learning outcomes, NewGlobe's evaluations are also quite different from traditional research and uniquely contribute to the literature. First, we are able to evaluate interventions in an actual classroom environment that have previously only been tested in controlled settings. Furthermore, the scale of our school operations, combined with the comparable instructional setting resulting from structured lesson guides, allows us to hold aspects of the learning experience constant and isolate the impact of a specific intervention. The outcomes of this research will also build a compelling case that low-cost, low-tech instructional approaches can generate meaningful learning gains. Finally, and most importantly, our partners will ensure that this work, and the clearer understanding of best practices that emerges, is communicated to teachers, trainers, and providers around the world through academic publications. It is our sincere hope that those audiences learn from our evaluations, as we collaboratively seek to improve the quality of education worldwide.
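The cost-benefit comparison referenced above can be as simple as ranking interventions by learning gain per unit cost. The sketch below uses invented numbers purely for demonstration; none of the figures describe actual NewGlobe studies.

```python
# All names, effect sizes, and costs below are invented for illustration.
interventions = [
    {"name": "Intervention A", "effect_sd": 0.15, "cost_per_pupil": 2.0},
    {"name": "Intervention B", "effect_sd": 0.10, "cost_per_pupil": 8.0},
    {"name": "Intervention C", "effect_sd": 0.30, "cost_per_pupil": 40.0},
]

# Rank by standard deviations of learning gained per dollar per pupil.
for item in interventions:
    item["sd_per_dollar"] = item["effect_sd"] / item["cost_per_pupil"]

for item in sorted(interventions, key=lambda x: x["sd_per_dollar"],
                   reverse=True):
    print(f"{item['name']}: {item['sd_per_dollar']:.4f} SD per USD per pupil")
```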
There are several unique elements of this platform that are relevant to note. First, it enables us to execute large-scale RCTs at a fraction of the cost of a traditional study. Teacher-administered assessments, teacher-entered data, and cloud-based dissemination of instructional materials allow all data collection, training, communication, and execution to be handled digitally at zero additional cost to the organisation. Second, NewGlobe has the necessary school network and technology infrastructure to accommodate such a large-scale evaluation. The school buildings already exist. They are staffed with teachers trained by NewGlobe. They are equipped with technology that supports teachers and allows NewGlobe to better understand metrics such as attendance and lesson delivery. And finally, NewGlobe's field team is uniquely situated to field-test, pilot, and monitor the outcomes of a new study.

The unique instructional context across classrooms also warrants further discussion. In a traditional group of 300 classrooms, we would expect significant variation in the minute-to-minute experience of children and teachers. Even if an intervention were faithfully delivered, many other parts of the classroom experience would depend on the teacher and their individual approach to lesson design and delivery. At NewGlobe, all teachers in the same grade are provided with the same detailed lesson guide. This ensures that teachers approach each lesson using a shared lesson design and skills learned at a shared training experience. As a result, we can assume (and measure, through lesson completion) that pupils across these schools have a reasonably similar classroom experience, although there are inevitably variations in teacher quality that affect a pupil's development.

This platform does have several limitations that are important to note. First, a system that relies on teacher-entered data cannot be 100% validated. While we select schools to audit and employ external auditors to verify our manually entered data, we would still expect some issues to affect the integrity of the data. There might be bias in the administration or marking of assessments, despite our best efforts to eliminate this bias through a teacher exchange. Second, assessments delivered via the teacher tablet save costs, but can also result in teachers not administering the assessment or entering the scores due to technical difficulties with lesson access or data entry. Those same digital assessments also limit the types of questions that can be asked: since questions must be written on the board, multiple-choice questions are challenging, and more than 30 questions in an assessment is often overwhelming for the teacher. Finally, interventions must be relatively simple to execute and limited in scope in order to minimise the necessary training and support. While it is possible to provide such training within NewGlobe's organisational capacity, this would significantly increase the costs of executing an RCT using this platform.

Future work in this setting will focus on testing new interventions. But we will also broaden the scope of interventions to include questions around teacher training, social-emotional learning, and closing the information gap between parents and the school. We will also use the platform to evaluate the impact of learning technologies. But the lens through which we evaluate these new ideas will focus solely on learning outcomes, not on issues of access or usage.
These intermediate variables are only interesting insofar as they contribute to learning growth. Finally, we will explore new ways to design studies in order to better understand the impact of the treatment on our outcome variable.

There are some elements of this work that are unique to NewGlobe. Most other organisations do not rely on teacher tablets to communicate instructional design materials. Other organisations may also lack NewGlobe's substantial technological infrastructure, which allows constant monitoring of operational health and a wide range of school-based outcomes. Finally, no other organisation in the educational space operates at the kind of scale that NewGlobe has achieved. But with that said,
NewGlobe's approach to internal evaluation could inform learning innovation work at other organisations. First, evaluation must be rooted in comparison with an incumbent approach rather than simply piloted for feasibility. Second, collaborating with scholars in academia can offer significant benefits and insights as organisations seek to design more effective evaluations. NewGlobe's experience also highlights the importance of investing time and resources in an internal R&D process to make informed, strategic decisions that factor in both learning outcomes and the costs of implementation. This process also exemplifies the importance of focusing on learning gains as the most important metric of success. While statistics on access are often tempting to tout as success, all intermediate outcomes matter only insofar as they contribute to improved learning for pupils. This is our obligation to pupils and parents. We must remain focused on that desire to accelerate learning growth if we are to make informed choices about the best possible ways to support pupils and parents in their educational journey.
6. References

Duflo, Esther, Pascaline Dupas, and Michael Kremer (2011). "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya." American Economic Review 101(5): 1739-1774.

Kraft, Matthew A. (2018). "Interpreting Effect Sizes of Education Interventions." Brown University Working Paper.

Reis, Sally M., D. Betsy McCoach, Catherine A. Little, and Lisa M. Muller (2011). "The Effects of Differentiated Instruction and Enrichment Pedagogy on Reading Achievement in Five Elementary Schools." American Educational Research Journal 48(2): 462-501.