LITERACY AS NUMBERS
Researching the Politics and Practices of International Literacy Assessment

Edited by Mary Hamilton, Bryan Maddox and Camilla Addey
University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

Information on this title: education.cambridge.org/

© Cambridge University Press 2015

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2015

Printed in the United Kingdom by Printondemand-worldwide, Peterborough

A catalogue record for this publication is available from the British Library

ISBN 978-1-107-52517-7 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate. Information regarding prices, travel timetables, and other factual information given in this work is correct at the time of first printing but Cambridge University Press does not guarantee the accuracy of such information thereafter.
CONTENTS

Notes on contributors  v
Series Editors’ Preface (Michael Evans and Colleen McLaughlin)  ix
Foreword (Gita Steiner-Khamsi)  xi
Introduction (Mary Hamilton, Bryan Maddox and Camilla Addey)  xiii

PART ONE: DEFINITIONS AND CONCEPTUALISATIONS

Chapter 1. Assembling a Sociology of Numbers (Radhika Gorur)  1
Chapter 2. New Literacisation, Curricular Isomorphism and the OECD’s PISA (Sam Sellar and Bob Lingard)  17
Chapter 3. Transnational Education Policy-making: International Assessments and the Formation of a New Institutional Order (Sotiria Grek)  35
Chapter 4. Interpreting International Surveys of Adult Skills: Methodological and Policy-related Issues (Jeff Evans)  53

PART TWO: PROCESSES, EFFECTS AND PRACTICES

Chapter 5. Disentangling Policy Intentions, Educational Practice and the Discourse of Quantification: Accounting for the Policy of “Payment by Results” in Nineteenth-Century England (Gemma Moss)  75
Chapter 6. Adding New Numbers to the Literacy Narrative: Using PIAAC Data to Focus on Literacy Practices (JD Carpentieri)  93
Chapter 7. How Feasible is it to Develop a Culturally Sensitive Large-scale, Standardised Assessment of Literacy Skills? (César Guadalupe)  111
Chapter 8. Inside the Assessment Machine: The Life and Times of a Test Item (Bryan Maddox)  129
Chapter 9. Participating in International Literacy Assessments in Lao PDR and Mongolia: A Global Ritual of Belonging (Camilla Addey)  147
Chapter 10. Towards a Global Model in Education? International Student Literacy Assessments and their Impact on Policies and Institutions (Tonia Bieber, Kerstin Martens, Dennis Niemann and Janna Teltemann)  165
Chapter 11. From an International Adult Literacy Assessment to the Classroom: How Test Development Methods are Transposed into Curriculum (Christine Pinsent-Johnson)  187
Chapter 12. Counting ‘What you Want Them to Want’: Psychometrics and Social Policy in Ontario (Tannis Atkinson)  207
NOTES ON CONTRIBUTORS
Camilla Addey is a researcher in international educational assessments and global educational policy. She recently completed her PhD, which focused on the rationales for participation in international literacy assessments in Mongolia and Laos. Her current research enquires into PISA for Development from a governance perspective in lower- and middle-income countries. Her research has established International Assessment Studies as a field of enquiry. Dr Addey previously worked at UNESCO in the Literacy and Non-Formal Education section and taught English as a foreign language at the British Council in Rome and Paris. She is one of the directors of the Laboratory of International Assessment Studies. She is author of Readers and Non-Readers.

Tannis Atkinson completed her PhD at OISE, University of Toronto, after several decades’ experience in adult literacy, including as editor of the journal Literacies. Currently a Postdoctoral Fellow in the Department of Sociology and Legal Studies at the University of Waterloo, her research focuses on the governing effects of literacy statistics in Canada, particularly how they are mobilised in social policies, and on how educators both comply with and resist policy imperatives. She is currently working on a book tentatively titled Obliged to Read: Literacy, Coercion and Advanced Liberalism in Canada.

Tonia Bieber is Postdoctoral Fellow in the Kolleg-Forschergruppe project ‘The Transformative Power of Europe’ at the Freie Universität Berlin. Previously she was a Senior Researcher in the research project ‘Internationalization of Education Policy’ within the TranState Research Center 597 ‘Transformations of the State’ at the University of Bremen. Specialising in international relations and comparative public policy, she has published widely in the field of European integration and internationalisation processes in social policy, especially education policy, in Western democracies. In particular, she is interested in policy diffusion and convergence research, as well as empirical research methods in this field. Tonia holds a PhD in Political Sciences from the University of Bremen and Jacobs University Bremen.
JD Carpentieri is a Lecturer in Adult Education at the Institute of Education, London, where he conducts research for the NRDC (National Research and Development Centre for Adult Literacy and Numeracy) and teaches a Masters module on international adult literacy policy. In addition to research and lecturing, he has contributed to a number of policy forums, including serving as Rapporteur for the European Union High Level Group of Experts on Literacy, a pan-European expert group charged with investigating and improving literacy policy.

Jeff Evans is Emeritus Reader in Adults’ Mathematical Learning in the School of Science and Technology, Middlesex University, London. His research interests include adult numeracy; mathematical thinking and emotion; images of mathematics in popular culture; and the public understanding of statistics. He has a lifelong commitment to teaching numeracy, mathematics and statistics to adults, with a focus on the range of methodologies used in social and educational research. From 2008 to 2013, he was a member of the Numeracy Expert Group, which was responsible for the design of the items used to measure adult numeracy in PIAAC. His recent activity includes talks, webinars and articles that consider the relation of international surveys of adults to alternative policy discourses, and that aim to facilitate access to, and critical engagement with, the results by researchers, practitioners and policy-makers.

Radhika Gorur is a Senior Research Fellow at the Victoria Institute, Victoria University, Australia. Her research has sought, broadly, to understand how some ideas and practices cohere, stabilise, gain momentum and make their way in the world. Her current research focuses on the ways in which numbers – particularly international comparative data – are produced, validated, contested and used in contemporary education policy. Her research is driven by an impulse to engage in productive critique, going beyond deconstruction to create arenas in which diverse policy actors can engage in seeking ways to move forward. She uses assemblage and other concepts from Science and Technology Studies and Actor-Network Theory as the main analytical and methodological approaches in her research. She is one of the directors of the Laboratory of International Assessment Studies.

Sotiria Grek is a Lecturer in Social Policy at the School of Social and Political Science, University of Edinburgh. She works in the area of Europeanisation of education policy and governance, with a particular focus on transnational policy learning, knowledge and governance. She has recently co-authored (with Martin Lawn) Europeanising Education: Governing a New Policy Space (Symposium, 2012) and co-edited (with Joakim Lindgren) Governing by Inspection (Routledge, 2015). She is currently writing a monograph on ‘Educating Europe: EU Government, Knowledge and Legitimation’, to be published by Routledge in 2015.

César Guadalupe completed his Education Doctorate and MA in Social and Political Thought at Sussex University. Dr Guadalupe is a lecturer and researcher at the Universidad del Pacífico (Peru) and non-Resident Fellow at the Brookings Institution (USA). Between 1992 and 2012 he worked on establishing connections between policy questions and research design, and between research results and decision-making
processes in civil service institutions both in his home country (Peru) and at UNESCO. From 2007 to 2012 he led UNESCO’s Literacy Assessment and Monitoring Programme (LAMP), conducting field tests in eight countries (ten languages) as well as four full implementations of the programme.

Mary Hamilton is Professor of Adult Learning and Literacy in the Department of Educational Research at Lancaster University. She is Associate Director of the Lancaster Literacy Research Centre and a founder member of the Research and Practice in Adult Literacy group. Her current research is in literacy policy and governance, socio-material theory, academic literacies, digital technologies and change. Her most recent book is Literacy and the Politics of Representation, published by Routledge in 2012. Her co-authored publications include Local Literacies (with David Barton); More Powerful Literacies (with Lynn Tett and Jim Crowther); and Changing Faces of Adult Literacy, Language and Numeracy: A Critical History of Policy and Practice (with Yvonne Hillier). She is one of the directors of the Laboratory of International Assessment Studies.

Bob Lingard is a Professorial Research Fellow in the School of Education at the University of Queensland in Brisbane, Australia, and is a Fellow of the Academy of Social Sciences in Australia. He researches the sociology of education and education policy. He is an editor of the journal Discourse: Studies in the Cultural Politics of Education and co-editor, with Greg Dimitriadis, of the Routledge New York book series Key Ideas and Education. His most recent book (2014) is Politics, Policies and Pedagogies in Education (Routledge).

Bryan Maddox is a Senior Lecturer in Education and International Development at the University of East Anglia. He specialises in ethnographic and mixed methods research on globalised literacy assessments and the literacy practices of non-schooled adults. He has conducted ethnographic research on literacy assessment in Nepal, Bangladesh, Mongolia and Slovenia. With Esposito and Kebede he combined methods from ethnography and economics to develop new measures of functional adult literacy assessment and the assessment of literacy values. His ethnographies of assessment provide accounts of testing situations, and of how standardised tests travel and are received across diverse cultural settings. His recent research collaborations ‘inside the assessment machine’ combine ethnographic accounts of assessment practice with large-scale psychometric data. He is one of the directors of the Laboratory of International Assessment Studies.

Kerstin Martens is Associate Professor of International Relations at the University of Bremen, Germany. Her research interests include theories of international relations, international organisations, global governance, and global public policy, in particular education and social policy. She heads the research project ‘Internationalisation of Education Policy’ located at the University of Bremen. She is co-editor of several books, including Internationalization of Education Policy? (Palgrave Macmillan, 2014) and Education in Political Science: Discovering a Neglected Field (Routledge, 2009). She holds a PhD in Social and Political Sciences from the European University Institute, Florence, Italy.

Gemma Moss is Professor of Education at the University of Bristol. Her main research interests include literacy policy; gender and literacy; the shifting relationships between
policy-makers, practitioners and stakeholders that are re-shaping the literacy curriculum; and the use of research evidence to support policy and practice. She specialises in the use of qualitative methods in policy evaluation, and in innovative mixed methods research designs. She has recently co-edited a Special Issue of the journal Comparative Education with Harvey Goldstein. Other publications include ‘Policy and the Search for Explanations for the Gender Gap in Literacy Attainment’ and Literacy and Gender: Researching Texts, Contexts and Readers.

Dennis Niemann is Research Fellow in the project ‘Internationalization of Education Policy’ within the TranState Research Center at the University of Bremen. His research interests include the internationalisation of education policy and the role of international organisations in global governance. In his recently completed PhD thesis, he analysed the soft governance influence of international organisations on domestic policy-making, using the example of the OECD’s PISA study and its impact on German education policy. He has published on recent internationalisation processes in education policy, with a special focus on secondary and higher education reforms in Germany.

Christine Pinsent-Johnson recently completed her PhD at the Faculty of Education of the University of Ottawa, where she carried out a comprehensive analysis of the curricular and policy changes instituted at both the federal and provincial levels using the OECD’s international adult literacy assessment. She also has approximately two decades of experience working in adult literacy programmes in Ontario, Canada. Building on initial findings from her doctoral research, which indicate unevenly distributed impacts on learners, educators and access to meaningful and relevant educational opportunities, she is currently leading a study that further examines assessment practices in Ontario adult literacy programmes shaped by OECD testing methods.

Sam Sellar is a Postdoctoral Research Fellow in the School of Education at the University of Queensland. Dr Sellar is currently working on research projects investigating the measurement of subjective well-being and non-cognitive skills in large-scale assessments, the development of new accountabilities in schooling, and the aspirations of young people. Sam is Associate Editor of Critical Studies in Education and of Discourse: Studies in the Cultural Politics of Education.

Janna Teltemann is a Senior Researcher in the project ‘Internationalization of Education Policy’ within the TranState Research Center at the University of Bremen. She holds a PhD in Sociology from the University of Bremen. Her research interests are quantitative methods, migration, integration and education. In her PhD project she analysed the impact of institutions on the educational achievement of immigrants, using data from the OECD PISA Study. She has published several papers on determinants of educational inequality, on results and methodological implications of the PISA Study, and on OECD activities in the field of migration.
SERIES EDITORS’ PREFACE
The manifold dimensions of the field of teacher education are increasingly attracting the attention of researchers, educators, classroom practitioners and policy-makers, while awareness has also emerged of the blurred boundaries between these categories of stakeholders in the discipline. One notable feature of contemporary theory, research and practice in this field is consensus on the value of exploring the diversity of international experience for understanding the dynamics of educational development and the desired outcomes of teaching and learning. A second salient feature has been the view that theory and policy development in this field need to be evidence-driven and attentive to diversity of experience.

Our aim in this series is to give space to in-depth examination and critical discussion of educational development in context, with a particular focus on the role of the teacher and of teacher education. While significant, disparate studies have appeared in relation to specific areas of enquiry and activity, the Cambridge Education Research Series provides a platform for contributing to international debate by publishing within one overarching series monographs and edited collections by leading and emerging authors tackling innovative thinking, practice and research in education.

The series consists of three strands of publication representing three fundamental perspectives. The Teacher Education strand focuses on a range of issues and contexts, providing a re-examination of aspects of national and international teacher education systems or analyses of contextual examples of innovative practice in initial and continuing teacher education programmes in different national settings. The International Education Reform strand examines global and country-specific moves to reform education and
particularly teacher development, which is now widely acknowledged as central to educational systems development. Books published in the Language Education strand address the multilingual context of education in different national and international settings, critically examining, among other phenomena, the first, second and foreign language ambitions of different national settings, innovative classroom pedagogies, and language teacher education approaches that take account of linguistic diversity.

Literacy as Numbers is a timely critical analysis of the current dominant political reliance on international comparative measures of literacy, based largely on quantifiable evidence, in the context of school and adult education. As the apparent paradox in the title suggests, the contributors to this volume focus on the prevailing ideology of literacy assessment and provide a critical but balanced evaluation of this perspective, while challenging readers, educationalists and policy-makers to reconsider the educational and sociopolitical assumptions of global measurements of literacy education. As such, the book fits very well within the framework of this series.
Michael Evans and Colleen McLaughlin
FOREWORD
As the use of numbers has become ubiquitous in the educational sector, the study of knowledge-based regulation, governance by numbers, and evidence-based policy planning has boomed over the past two decades. Many scholars in the social sciences, comparative education and policy studies have scratched at the façade of precision, rationality or objectivity associated with these terms and uncovered a process that is steeped in political agendas and financial gains. They have analysed how indicators and statistics are produced or manufactured as a new policy tool to mobilise social agreement and political support as well as financial resources. Before our eyes, international surveys from the 1980s and 1990s have taken on, in the form of PISA, TIMSS, IALS, PIAAC and others, a monumental role in agenda-setting and policy formulation. Nowadays the demand for such surveys is so great that they are administered in short sequence, recurrently, and in an ever-increasing number of subjects, grade levels and countries – in fact so much so that we are starting to see critique and resistance emerging through scholarly publications and the media.

This book moves beyond describing, analysing and criticising the success story of international measurement and surveys. The focus of several chapters is on measurement as a performative act, that is, on a new mode of regulation that produces new truths, new ways of seeing and new realities. It is an all-encompassing mode of regulation that permeates every domain in the education system, from pedagogy and curriculum to governance and education finance. As examined by Michel Foucault, the 19th-century obsession with measurement was a project of the modern nation-state and an attempt to count, scrutinise and ‘normalise’ its citizens. Analogously, the
21st-century preoccupation with measurement in education is a project of global governance, constituted by internationally oriented governments and multilateral organisations that use the ‘semantics of globalisation’ to generate reform pressure on national educational systems.

Without any doubt, the declared purpose of international surveys is lesson-drawing, emulation or policy borrowing from league leaders. Whether selective import really occurs, why it does or does not happen, under what circumstances, and how the ‘travelling reform’ is translated in the new context are different matters altogether, and in fact are objects of intense intellectual investigation among policy borrowing and lending researchers. Nevertheless, there is pressure on national policy actors to borrow ‘best practices’, ‘world class education’ or ‘international standards’; all broadly defined and elusive terms with a tendency for inflationary usage. Every standardisation implies de-contextualisation or, in terms of educational systems, a process of de-territorialisation, de-nationalisation and globalisation. Reform packages may be catapulted from one corner of the world to the other, given the comparability of systems construed through international indicators and tests. Unsurprisingly, then, international measurements, surveys and standards have become good (for) business, for the global education industry and for international organisations alike, because the same curriculum, textbook, student assessment or teacher education module may be sold (global education industry) or disseminated (international organisations), respectively, to educational systems as varied as Qatar, Mongolia and Indonesia. Among other factors, it is this great interest in the homogenisation or standardisation of education that kicked off the perpetuum mobile of measurement and keeps the test machinery in motion, enabling OECD and IEA-type international surveys to expand into new areas and territories, as evidenced most recently in the creation of ‘PISA for Development’.

The co-editors of this timely volume have gathered a group of noted social researchers, policy analysts, psychometricians, statisticians and comparativists to reflect on this numerical turn in education. They bring to bear a powerful array of theories and empirical data to examine in depth how international assessment practices are reconfiguring our knowledge of literacy in policy and practice around the world.
Gita Steiner-Khamsi (Teachers College, Columbia University, New York)
INTRODUCTION
Mary Hamilton (Lancaster University), Bryan Maddox (University of East Anglia), Camilla Addey (Laboratory of International Assessment Studies)
The contemporary scale of international assessments of literacy makes them a rich field for scholarly enquiry. Transnational organisations, testing agencies, and regional and national governments invest heavily to produce internationally comparable statistics on literacy through standardised assessments such as the Organisation for Economic Co-operation and Development (OECD) Programme for International Student Assessment (PISA) and the Programme for the International Assessment of Adult Competencies (PIAAC). The scope of these assessments is rapidly widening. While North American and European institutions developed and promoted these measurement programmes as policy-informing instruments for higher income countries, they increasingly incorporate low- and middle-income countries in their frames of reference and comparison (see appendix for a brief historical overview). The growth of international assessments is not driven by international organisations and high-income countries alone.

These networks of comparative measurement involve processes of globalised policy-making, policy borrowing and comparison, whose dynamics and implications are only just beginning to be appreciated and understood (Steiner-Khamsi 2003; Olssen et al. 2004; Zajda 2005; Lawn 2013). It has been argued that the growth of international literacy assessments is a global response to neoliberal demands for ‘a measure of fluctuating human capital stocks’ (Sellar and Lingard 2013, 195), a way to allocate dominant values and to stimulate global performance competitiveness (Rizvi and Lingard 2009). The findings from international surveys have also been described as a way of ensuring mutual accountability and democratic transparency, despite the patently non-transparent nature of many of the programmes themselves (Nóvoa and Yariv-Mashal 2003).
Acts of encoding literacy as internationally comparable numbers thus raise profound questions about the ambitions, power and rationality of large-scale assessment regimes, their institutional and technical characteristics, their role in ‘governance by data’ (Fenwick et al. 2014), and associated questions about the validity of cross-cultural comparisons, institutional accountability and transparency. The products of large-scale assessment programmes operate as particularly influential artefacts, in which numbers are mobilised to make public impacts in the mass media and to substantiate policy agendas. Considerable work and resources, much of this invisible to the public, go into producing these numbers and sustaining their credibility and standing in the public imagination. Enquiring into these invisible practices leads to questions of assessment methodology and the validity of cross-cultural comparison, which are important emerging areas of enquiry dealt with by contributors to this book. This volume examines in detail the commensurative practices (Espeland and Stevens 2008) that transform diverse literacy practices and competencies into measurable facts, and explores their policy implications.

Numerical measures of literacy are not essential for describing outcomes for individuals and populations. The emancipatory and moral discourses of literacy invoking human rights and religious principles that motivate educators in many countries may rely on different yardsticks of success (Hamilton 2012, 4–5), but the international numbers are compelling, especially given the social power that is currently mobilised behind developing and promoting them.

The numbers generated through tests and surveys rest upon assumed models of literacy. The model embedded in the OECD’s programme is an information-processing theory of functional literacy (Murray et al. 1998; OECD 2012). The model encompasses a broad set of skills (mainly assessed through reading comprehension) that are used to define threshold levels of competence. This view contrasts with an alternative relational view of literacy as part of everyday situated practices (Barton 2007; Brandt 2009; Cope and Kalantzis 2000; Street and Lefstein 2007; Reder 2009). This perspective offers a strong challenge to literacy survey measurement, since it holds that individual performance in literacy cannot be lifted from the context and social relations that constitute it without fundamentally changing its meaning.

One of the powerful aspects of literacy as numbers is that the evidence produced through quantification seems to offer certainty and closure on issues of what literacy is, and who it is for. However, debates about the nature
of literacy and how to account for the diversity of everyday practices are far from resolved, as can be seen from the different assumptions outlined above and discussed in this volume. In fact, these debates are more fascinating and challenging than ever before. The meanings and practices of contemporary literacy are woven into increasingly complex and rapidly moving mixtures of languages and cultures, named by some as ‘superdiversity’ (Blommaert and Rampton 2011). Literacies are migrating into the new ‘virtual’ spaces created by digital technologies (Barton and Lee 2013; Selfe and Hawisher 2004). In these processes, the nature of literacy is being transformed in unpredicted and as yet unclassified ways. Nevertheless, the diverse character of literacy practices is transformed by assessment experts into a rational set of competencies set out in assessment frameworks, enabling teachers to focus on the technical business of addressing the ‘skills gap’ of the millions of adults and students deemed to be underperforming.

The contributors to this book start from an engagement with this complexity to examine a set of key themes covering the politics and practices of literacy measurement. All are concerned with how to respond to and understand the rise of literacy as numbers and its effects on policy and practice. One response is to view such measurement regimes as a growth of ignorance, since test construction inevitably reduces diverse socially and institutionally situated literacy practices, languages and scripts to a set of comparable numbers. Certain aspects of literacy practices (typically vernacular reading and all forms of writing) are excluded from standardised tests simply because they are difficult to test and compare. Similarly, some uses of literacy are excluded on the grounds of avoiding test bias and undue controversy (see Educational Testing Service 2009). Literacy assessment regimes have also prompted robust critique on the basis of their ideological agendas, for example in the way that assessment frameworks, test items and survey reports support neoliberal competitiveness agendas (Darville 1999). Similar criticism might be made of how ‘data shock’ acts as a potent resource to make claims on resources and to legitimise radical policy change (see, for example, Waldow 2009).

In editing this book, however, it has become clear to us that a rejectionist critique offers an inadequate response to a complex and influential sociological phenomenon (see Gorur, this volume). Large-scale literacy assessment regimes demand to be theorised, researched and understood. We have to understand the ontologies of assessment regimes, to develop intimate knowledge of their technical and institutional characteristics, and of how they operate in processes of national and transnational governance. Such knowledge
is clearly required for robust academic encounters with assessment regimes, and to support informed policy and practitioner engagement.

The chapters in this collection benefit from having been presented and discussed at a thematically designed international symposium, Literacy as Numbers, held in London in June 2013, which brought together leading academics in this rapidly developing field, along with representatives from key policy and literacy assessment institutions, to reflect on large-scale literacy assessment regimes and their methods as a significant topic for academic enquiry. The chapters use a variety of ethnographic, documentary and historical research methods to gain insights into contemporary practice and to demonstrate the complexity of local interpretations and responses. They are theoretically rich, drawing on critical policy studies and theories of literacy as social practice. Themes of globalisation and Europeanisation in educational policy are explored using world institutional theories (Meyer 2010). Socio-material theories (Latour 2005; Fenwick et al. 2011; Denis and Pontille 2013) are particularly useful for the core purpose of the book, enabling authors to trace the networks and flows of ideas, information and resources, to follow the enrolment or assembly of national agents in international alliances, and to delve into the intricate and often invisible processes of translation whereby ‘matters of concern’ are transformed into ‘matters of fact’ (Latour 2004). Dorothy Smith’s institutional ethnography (Smith 2005) is used to illuminate the ways in which international assessments come to act as regulatory frames for literacy practices.

A distinctive feature of this book is that many chapters involve investigations inside assessment regimes, whether through archival research, interviews with policy-makers or ethnographic observation. This extends the reach of critical policy discussion to include themes that were previously considered off-limits as the domain of technical specialists, such as the development and use of test items, the development of protocols and politics of test adaptation in cross-cultural assessment, and the work of policy actors. The chapters also discuss the implications of the statistical procedures involved in large-scale assessments, notably the demands and character of Item Response Theory (IRT), the modelling approach sketched at the end of this section.

The book marks the emergence of ‘International Assessment Studies’ as a field of enquiry (Addey 2014) at an historical moment when assessment regimes are reaching out in ambitious acts of big data generation and globalisation, integrating and presenting data on an international scale and from across the life-course. The vitality of this emerging field can be seen from the variety of publications, research projects, fellowships, networks, seminars
and conferences gathering around it. International Assessment Studies incorporates themes such as the sociology of measurement, the politics of assessment regimes, and policy networks, actors and impacts, in order to analyse the conceptual and methodological challenges and affordances of large-scale psychometric data. It also considers the implications of globalisation as assessment regimes attempt the difficult (some would say impossible) task of recognising and integrating diverse cultures, contexts and relational literacy practices into international assessment (see Sellar and Lingard, this volume). These challenges are particularly pertinent to the UNESCO Literacy Assessment and Monitoring Programme (LAMP) (Guadalupe and Cardoso 2011) and, more recently, to the OECD’s initiative on ‘PISA for Development’ (P4D), which reach out to greater diversity (Bloem 2013). These frontier projects illustrate the complexity of international assessments, involving multiple foci and levels of analysis – from questions of international goals, networks and relations to the intimate and technical details of test-item development, adaptation and performance.
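To give a concrete sense of the machinery behind the IRT mentioned above, a common formulation is the two-parameter logistic model (this is a generic textbook sketch, not drawn from any particular assessment’s technical documentation), in which the probability that respondent $j$ answers item $i$ correctly is modelled as

$$
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\bigl(a_i(\theta_j - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_j - b_i)\bigr)},
$$

where $\theta_j$ is the respondent’s latent proficiency, $b_i$ the difficulty of the item and $a_i$ its discrimination. Placing persons and items on a single latent scale in this way is precisely what allows scores to travel across populations and countries – and is also where assumptions about the cross-cultural equivalence of items are built in.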
Literacy assessment across the life-course

Global policy discourses increasingly make connections between initial education, employment and adult skills. Agencies such as the OECD are concerned to bring the literacy practices and learning opportunities of adults into measurable relation with school-based knowledge, and the recent development of PIAAC realises this ambition. Hence this book approaches literacy across the life-course. Rather than taking the more limited perspective of either child or adult literacy, it covers surveys of literacy in childhood and youth as well as in adult life.

Adult literacy is presented in the book as a contested territory, which is currently being subjected to new forms of codification and institutionalisation, parallel to the ways in which children’s literacy has long been organised. This makes it a particularly important arena for exploring the processes whereby the diversity of everyday experiences and practices gives way to an ordered field of measurement (see Pinsent-Johnson, this volume). We can observe how the field of adult literacy is being re-positioned in the discourse of large-scale assessments, and trace the links that are established with themes such as employability, citizenship and opportunity. For many countries this is a subtle but important shift, as lifelong learning is institutionally reframed within assessment discourse and practice (see Grek, this volume). Adults
are re-entering policy discourse within the narrower frame of reference of human resource development and competitiveness. The OECD PIAAC assessment exerts its technology to frame adult skills in relation to employability, much as the World Bank incorporates adults into ‘learning for all’ (World Bank Group 2011), but with a clear orientation to the labour market. Assessment practices are at the heart of this discursive framing and policy positioning.

For adult literacy practitioners, this involves a significant shift in power relations and in the way that values and resources are distributed. In many contexts where standardised assessments are used, individual adults and their tutors are no longer at liberty to appraise their own abilities and make decisions on the content and goals of their literacy learning (principles of andragogy that have been cherished by many adult literacy programmes). This shift in the locus of control is described in this book as sobering for adult literacy advocates who have sought to obtain new resources for adult literacy programmes in response to the survey results (see Atkinson, this volume).

Several of the chapters offer a more optimistic perspective: namely, that for those who are able to access large-scale assessment data and to understand its statistical methods, such data can provide evidence and insights that inform policy and support evidence-based claims on resources (see, for example, Evans, this volume). This includes analysing particular age cohorts, locating them in specific epochs of educational policy and provision, and drawing connections between the literacy practices of adults and their assessment performance (Carpentieri, this volume). As testing agencies make data sets available online, secondary data analysis allows for greater understanding of the comparative knowledge produced, and of how this is mediated by the content of test items, statistical procedures and the decisions of testing agencies (e.g. decisions about sampling, weighting, and the setting of levels and thresholds). Contributors to the book highlight both the opportunities afforded by large-scale data and the complex demands that it makes on educationalists, policy-makers, the media and civil society organisations (see Bieber et al., this volume).
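The demands of such secondary analysis are real but learnable. As a minimal sketch of what responsible re-analysis involves, the fragment below estimates a population mean from a survey that reports proficiency as plausible values alongside a final sample weight. The column names follow the PIAAC public-use file conventions as we understand them (PVLIT1–PVLIT10 for literacy plausible values, SPFWT0 for the final weight); treat them as assumptions and verify against the codebook of the actual file.

```python
import numpy as np
import pandas as pd


def weighted_pv_mean(df: pd.DataFrame, pv_cols: list, weight_col: str) -> float:
    """Point estimate of a population mean from plausible values.

    Following the usual large-scale-assessment convention, a weighted mean
    is computed separately for each plausible value and the results are
    averaged. (Proper variance estimation, via replicate weights and
    Rubin's combining rules, is omitted here for brevity.)
    """
    w = df[weight_col]
    per_pv_means = [(df[pv] * w).sum() / w.sum() for pv in pv_cols]
    return float(np.mean(per_pv_means))


# Hypothetical usage with PIAAC-style variable names; check the public-use
# file codebook before relying on these names.
# df = pd.read_csv("piaac_public_use_file.csv")
# literacy_mean = weighted_pv_mean(
#     df, [f"PVLIT{i}" for i in range(1, 11)], "SPFWT0"
# )
```

Treating any single plausible value as an observed score, or ignoring the weights, produces biased estimates – one concrete way in which the ‘decisions of testing agencies’ shape what secondary analysts can legitimately claim.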
Reframing literacy

A distinctive theme of this book is the way that researchers and practitioners – those who are creating and conducting assessments, interpreting them for policy and research purposes, or dealing with their uses and consequences
in educational practice – grapple with the demands and validity of reframing literacy as numbers. The project of reframing literacy around globalised regimes of standardised literacy assessment is clearly still in progress. While the testing agencies acting as ‘centres of calculation’, in the sense described by Latour and Woolgar (1979), are motivated to present assessment programmes and their methods as unproblematic and routinised procedures, it appears to us that their black boxes are not yet sealed. In these processes of innovation and expansion, debates are necessarily raised about the efficacy, merits and validity of transnational assessment programmes, about their new institutional architecture, and about their governance and accountability. The insider approach taken by contributors to this book reveals that such debates are happening at all levels of the production and use of the survey findings, though much of the discussion remains inaccessible to the public gaze.

The chapters in this book also approach literacy assessments and statistical knowledge as forms of technology with historical roots and entanglements (see Moss, this volume). The worlds of politics and statistics have always been closely connected (Hacking 1990; Porter 1996). From the 1800s the institutional framing of literacy as numbers (for example in the form of census data and marriage registers) took place around a dichotomy of literacy and illiteracy that was used to produce powerful discourses in the processes of governance (Graff 1979; Vincent 2014; Collins and Blot 2003). Such literacy surveys highlight the Janus-faced nature of many literacy statistics. Good data is a fundamental support to the administration and politics of liberal democracies, yet it also offers unprecedented powers to control and define – as in colonial administrations (Mitchell 2002). Literacy assessments have clearly supported repressive state technologies of governance, negative ideological representations and attempts to legitimise inequalities (see Lin and Martin 2005; Maddox 2007). As Street argues, the category of ‘illiteracy’ has long been associated with forms of prejudice and has supported ‘great divide’ theories of literacy and development (Street 1995). On the other hand, literacy statistics provide a resource for humanistic and social justice projects to identify and challenge distributional inequalities (e.g. Sen 2003), and this was the rationale used by UNESCO in the second half of the 20th century to collect self-reported literacy statistics from governments across the world (Jones 1990).

The clear inadequacy of the dichotomous paradigm, and dissatisfaction with the self-report measures associated with it, led to the rise in the 1990s of new psychometric programmes of literacy assessment such as the National Adult Literacy Survey (NALS) in the United States and the
International Adult Literacy Survey (IALS) – the methodological precursors to today’s large-scale assessment programmes (see appendix). These new approaches, developed by the US-based Educational Testing Service (ETS) and Statistics Canada, rejected dichotomous models of literacy and replaced them with the notion of a continuum of literacy abilities, externally defined by ‘objective’ measurement that combined advances in psychometrics, reading theory and large-scale assessment with household survey methodologies (Thorn 2009, 5). The IALS offered telling insights into how the potentials and challenges of large-scale psychometric approaches to literacy assessment were perceived, as it produced its own categories of innovators, champions and rejectionists. Critics argued that the IALS framed literacy in narrow terms around agendas of economic competitiveness, rather than those of human development or social justice (e.g. see Darville 1999; Hamilton and Barton 2000; Blum et al. 2001). However, IALS and its successor programmes gained international influence and momentum, as governments sought to use IALS data strategically (and with some success) to promote increased resources for adult literacy programmes and to innovate in nationally organised assessment programmes (Moser 1999; St Clair et al. 2010).

The present collection can be viewed as a new wave of responses, as researchers and practitioners develop new frameworks for critique and scholarly engagement, and examine the extent to which evolving assessment technologies can be used as resources to promote contrasting ideological projects and agendas. It contributes to a growing literature on the globalisation of assessment practice and its impacts on educational policy borrowing and convergence (e.g. Lingard and Ozga 2007; Jakobi and Martens 2010; Steiner-Khamsi and Waldow 2011; Lawn 2013). Two notable collections focus on PISA (Meyer and Benavot 2013; Pereyra et al. 2011), and other publications deal specifically with Europeanisation and its intersection with global agendas (see Lawn and Grek 2012; Dale and Robertson 2009). While publications to date address issues of governance and the role of data and standardising measures, this volume is the first to focus specifically on literacy and its central role in the global assessment project.
The power of numbers

In 1991, Nikolas Rose put forward the concept of ‘policy as numbers’ to indicate the increased reliance on numbers in governing. Numbers have come to be seen as
the most objective and scientific form of information, with an ‘intrinsic force of persuasion’ (Pons and Van Zanten 2007, 112) within policy processes. Rose further argues that citizens become complicit in measuring themselves against others, developing what he terms ‘the calculating self’ (Rose 1998). Since then, there has been extensive scholarly debate on the role of numbers in policy processes and global educational policy discourse. This trend has been discussed in terms of the rise of an audit society (Power 1999), ‘governance by ranking and rating’ (Lehmkuhl 2005), ‘governance by comparison’ (Martens 2007), ‘governance by numbers’ (Grek 2009) and ‘governance by data’ (Hamilton 2012). All these are ways of describing governments’ growing reliance on international statistical indicators to inform and frame ‘evidence-based’ literacy policy and to justify policy change (Rizvi and Lingard 2009). Fenwick et al. (2014) go further, arguing that these number-based technologies and the comparative knowledge they produce have not only changed forms of educational governance but have themselves become a process of governing. This highlights a shift in scholarly attention towards an interest in how large-scale assessments are themselves governed and held accountable.

How can we understand the particular power that numbers hold over the public imagination? What is it about numbers that makes this form of symbolic representation so useful to projects of social ordering in the policy sphere? The work of social semioticians such as Lemke (1995), Van Leeuwen (2008) and O’Halloran (2008) can illuminate the processes involved (see discussion in Hamilton 2012, 33–40). Firstly, numbers help to create the categories and classification systems that fix and align points within the flux of social activity (Bowker and Star 2000). They set clear though often arbitrary and spuriously accurate boundaries around our knowledge and experience of reading and writing. These categories can be manipulated and ordered into levels, and they generate new languages and shorthands for talking about literacy (such as ‘IALS Level 3’ or ‘Document Literacy’). The appearance of accuracy, and the way in which the arbitrary and fuzzy categories that numbers rest upon become naturalised in discourse, are powerful assets for policy and research. Secondly, numbers enable us to deal with things and people as de-contextualised instances of classes and groups rather than as embodied individuals, thus enabling them to be counted and measured for audit purposes. Discourse analysts like Van Leeuwen understand such mechanisms of depersonalisation to be very important in ordering social life (Van Leeuwen 2008, 47).
The creation of clear-cut classifications and mechanisms for allocating people to groups and categories in turn enables comparisons to be made more easily across incomparable spaces of time and place. People can be ordered in hierarchical levels, and causal relationships can be posited between literacy and other quantifiable variables, such as income, age and educational qualifications. International networks and new technologies make such comparisons and relationships easier and quicker. Dense and succinct cross-referencing can be made between statements of relationship and their referents, which are often each arrived at through complex processes of definition and argument. These processes are encoded in the numbers but also, as O’Halloran puts it, these operations are black-boxed for ease of access (O’Halloran 2008).

Numbers are thus particularly useful for aligning national and international policy, for standardising qualifications in the global marketplace and, in the process, for legitimising what come to be seen as purely technical facts and marginalising other ways of imagining literacy. Numbers facilitate a particular way of imagining literacy as a thing, a commodity or a resource that can be owned by an individual, exchanged and given a value in the educational marketplace. In turn, this view of literacy as a commodity appears to distance it from moral values and personal experience while embedding the values of the market within it, thus turning literacy into a key variable in a human resource model of educational progress.
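The arbitrariness of such category boundaries is easy to see once written down. The sketch below collapses a continuous proficiency score into the five IALS reporting levels, using the cut-scores as we recall them from the IALS documentation (quoted for illustration only; consult the survey’s technical reports for the authoritative values):

```python
# Minimal sketch: turning a continuous proficiency score (0-500 scale)
# into an IALS reporting level. A one-point difference (225 vs 226) moves
# a person across a boundary that then circulates in policy discourse as
# a qualitative distinction between kinds of readers.
IALS_LEVELS = [
    (225, "Level 1"),
    (275, "Level 2"),
    (325, "Level 3"),
    (375, "Level 4"),
    (500, "Level 5"),
]


def ials_level(score: float) -> str:
    """Map a 0-500 proficiency score to its IALS level label."""
    for upper_bound, label in IALS_LEVELS:
        if score <= upper_bound:
            return label
    raise ValueError("score outside the 0-500 reporting scale")


assert ials_level(225) == "Level 1"
assert ials_level(226) == "Level 2"  # one point apart, a different 'kind' of reader
```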
An emerging research agenda

Since technologies of assessment and modes of representing literacy as numbers are now such pervasive features of the policy landscape, discussions about how such data is produced, for what purpose, and under what systems of transparency and accountability must also become commonplace and integrated into the workings of democratic institutions. Such themes go beyond conventional concerns about the validity of assessments to speak to wider concerns with power, resource flows, and the accountability of the state and of transnational organisations. This collection is intended as a substantial contribution to such discussions and as a springboard for future research.

The contributions we outline in the following sections illuminate the amount of (often invisible) work that goes on behind the scenes in producing the tests and the policies. They offer ample evidence that the challenges faced in this area are matters of value as well as technical matters, and that asymmetric power relations suffuse the
field. They begin to reveal the politics of reception as international test results travel through national policies and practice, often taken up in ways that were never intended by the test-designers and translated into dilemmas for pedagogical practice. All these themes urgently demand further investigation.
Overview of chapters

The book is organised into two main parts: the first presents the general framings and definitions that underpin discussions about literacy as numbers; the second explores, through specific insider examples, the processes of producing the international tests and how they impact on policy and practice.

Part 1, ‘Definitions and Conceptualisations’, begins with Chapter 1, Assembling a Sociology of Numbers, by Radhika Gorur. Gorur argues that while existing sociological critiques of quantification are legitimate, they are, in Latour’s term, ‘running out of steam’. Merely showing that numbers are reductive, that they are political, that they are misused and that they are an instrument of governance is not enough. The chapter suggests that a move towards a sociology of numbers in education, inspired by the conceptual tools and methodologies of Science and Technology Studies (STS), may provide useful ways to develop new forms of critique. Such a sociology would involve moving from a representational to a performative idiom, and empirically tracing the socio-material histories and lives of numbers, focusing both on the processes by which they translate the world and on the ways in which they make their way through the world. Gorur invites a collective exploration of the forms that this sociology might take, and of the scope it might offer for productive interference in policy processes.

In Chapter 2, New Literacisation, Curricular Isomorphism and the OECD’s PISA, Sam Sellar and Bob Lingard problematise the underlying model of literacy used in international surveys. They elaborate a concept of ‘literacisation’ and place this within an analysis of the growing significance of education within the OECD’s policy work and within global educational governance. They draw on the approaches of Common World Educational Culture and the Globally Structured Agenda for Education in order to understand the underpinning assumptions and effects of international tests such as PISA. The authors discuss definitions of literacy in relation to the theory of multiliteracies, arguing that the increasingly broad and amorphous definition of literacy used in PISA contributes to the intensification of the human capital framing of global educational policy. It does this by enabling
international tests to encompass more diverse aspects of education and to strengthen an underlying assumption of curricular isomorphism in international contexts that is at odds with a rhetoric of situated relevance.

In Chapter 3, Transnational Education Policy-making: International Assessments and the Formation of a New Institutional Order, Sotiria Grek takes the development of the Programme for the International Assessment of Adult Competencies (PIAAC) as a specific case through which the extent of policy learning and teaching between two significant international players, the European Commission and the OECD, may be scrutinised and evaluated. The chapter discusses the processes of problematisation and normalisation of the notions of ‘skills and competences’ by the two organisations and examines the ways both concepts have turned into a significant policy problem, in need of soft governance through new data, standards and new policy solutions. The chapter focuses on the nature of the problem, its contours, characteristics and shifting qualities. It discusses the ways that policy problems can be transformed into public issues with all-pervasive and all-inclusive effects. Grek suggests that in order to understand the ‘problem’, one has to move behind it, since the very process of its creation already carries the seeds of its solution.

In Chapter 4, Interpreting International Surveys of Adult Skills: Methodological and Policy-related Issues, Jeff Evans takes a detailed and critical view of the production of numbers in international assessments. The chapter follows in the tradition established by Blum, Goldstein and Guérin-Pace (2001) with its review of the statistical methods used in PIAAC, and the associated questions of validity. Evans uses a discussion of numeracy statistics in PIAAC to provide methodological insights into how PIAAC data should and should not be used, and what we can legitimately infer from the assessment results. His critical yet optimistic perspective suggests that by struggling with design and methodological issues, researchers, practitioners and policy-makers are able to make more of survey data as powerful knowledge, and to develop new and empowering agendas in applied research.

Part 2 of the book, ‘Processes, Effects and Practices’, begins with an historical perspective on literacy and data by Gemma Moss. In Chapter 5, Disentangling Policy Intentions, Educational Practice and the Discourse of Quantification: Accounting for the Policy of ‘Payment by Results’ in Nineteenth-century England, Moss focuses on the collection and use of literacy attainment data in the 1860s, and the dilemmas and uncertainties that the data created in policy and in practice. The case raises questions about how we generalise about quantitative practice and the uses to which it can be put in education. By considering how numbers were mobilised, displayed and
interpreted in policy and in practice in the 1860s, a more nuanced account of the role numerical data play in the formation of educational discourse is proposed.

In Chapter 6, Adding New Numbers to the Literacy Narrative: Using PIAAC Data to Focus on Literacy Practices, JD Carpentieri takes the context of recent adult literacy policy in the UK, specifically the English Skills for Life initiative, to show how quantitative data produced by international assessments is used and misused by politicians. He documents the shortcomings of narrowly defined and evaluated literacy programmes and argues that, to date, quantitative data about literacy proficiency has had only limited success in guiding interventions in adult literacy. He suggests that practice engagement theory (which considers the everyday uses of reading as well as skills) can offer a sounder basis for making policy decisions in this field. Background data on reading practices collected by PIAAC offer good empirical, quantitative evidence for the usefulness of this approach and could be the basis of a robust pragmatic argument acceptable to policy-makers.

Chapters 7, 8 and 9 focus on the UNESCO Literacy Assessment and Monitoring Programme (LAMP) to discuss issues of cross-cultural validity and the motivations of policy actors. César Guadalupe (Chapter 7) examines an increasingly central question in globalised programmes of assessment: How Feasible is it to Develop a Culturally Sensitive Large-scale, Standardised Assessment of Literacy Skills? His account describes his experience as director of LAMP, as the LAMP team struggled to reconcile the use of standardised assessment items, many of which derive from a North American experience, with their use in countries that are geographically and culturally distant from such origins. As Guadalupe’s chapter illustrates, an institutional and managerial commitment to culturally sensitive assessment of literacy goes at least some way to resolving such tensions (e.g. through protocols for test-item production and adaptation).

In Chapter 8, Inside the Assessment Machine: The Life and Times of a Test Item, Bryan Maddox continues the discussion of cross-cultural validity, using Actor-Network Theory (ANT) and Science and Technology Studies (STS) to examine the role of test items in LAMP. Maddox argues that the study and critique of large-scale assessment programmes from the outside provides only partial insights into their character. His insider perspective on LAMP offers an intimate ethnographic account of the production of statistical knowledge and the challenges of cross-cultural testing. The chapter tells the story of the test item, from its initial development to the production of statistical data. We are introduced to various characters along the way –
the ‘framework’, Mongolian camels, Item Response Theory and statistical artefacts.

In Chapter 9, Participating in International Literacy Assessments in Lao PDR and Mongolia: A Global Ritual of Belonging, Camilla Addey explores what lies behind the growth of international assessments in lower- and middle-income countries, as part of international educational policy processes. Although there is consensus that the emergence of international assessments is in part a response to the recent shift towards policy as numbers and a shift towards international educational goals measured by international performance benchmarks, the research presented here suggests the rationales of lower- and middle-income countries’ participation are more complex. Using Actor-Network Theory to analyse a multiple qualitative case study of LAMP in Lao PDR and Mongolia, this chapter argues that countries benefit from both scandalising and glorifying (Steiner-Khamsi 2003) through international assessment numbers, and from ‘a global ritual of belonging’. The complex picture that emerges from the data contributes to our understanding of the politics of data reception generally as well as illuminating how international assessment data shape (or do not shape) policy processes and enter governance in lower- and middle-income countries.

In different ways, the final three chapters explore the effects of international assessments on policy and practice and the challenges these pose for those implementing educational reforms. In Chapter 10, Towards a Global Model in Education? International Student Literacy Assessments and their Impact on Policies and Institutions, Tonia Bieber and her colleagues demonstrate the heterogeneity of school policy reforms across different countries in response to PISA results and OECD guidance. The authors juxtapose careful analyses of two case-study countries, Germany and Switzerland, with descriptive quantitative data across all participating countries to show the variety of changes in school policies between 2000 and 2012 relating to increased school autonomy, accountability and educational standards framed in terms of literacy.

In Chapter 11, From an International Adult Literacy Assessment to the Classroom: How Test Development Methods are Transposed into Curriculum, Christine Pinsent-Johnson uses the theoretical tools of institutional ethnography to analyse the ways in which international testing methodologies act as regulatory frames for understanding adult literacy. She shows how the use of tests is being inappropriately extended from their original purpose as summary benchmark statements for adult populations to screening tests of individual capability and curriculum frameworks.
In Chapter 12, Counting ‘What you Want them to Want’: Psychometrics and Social Policy in Ontario, Tannis Atkinson uses governmentality analytics to examine the statistical indicators of adult literacy promoted by the OECD and first employed in the International Adult Literacy Survey (IALS). She argues that using numerical operations to dissect interactions with text and to describe capacities of entire populations represents a new way of knowing and acting upon ‘adult literacy’. Drawing on empirical data from one jurisdiction in Canada – Ontario – she considers how constituting literacy as a labour market problem has individualised responsibility for structural changes in the economy and naturalised gendered and racialised inequalities. Atkinson outlines how policies based on rendering literacy calculable in this way are coercing and punishing those who are poor or unemployed; she also shares findings about how the emphasis on ‘employability’ is diminishing teaching and learning. The chapter’s conclusion urges researchers to attend to the dilemmas and dangers produced when literacy is offered as the simple, calculable solution to complex social and macroeconomic problems.
NOTES
1 On LAMP, see chapters by Addey, Guadalupe and Maddox in this volume.
REFERENCES
Addey, C. (2014). ‘Why do Countries Join International Literacy Assessments? An Actor-Network Theory Analysis with Case Studies from Lao PDR and Mongolia’. PhD thesis. Norwich: School of Education and Lifelong Learning, University of East Anglia.
Barton, D. (2007). Literacy: An Introduction to the Ecology of Written Language (second edition). Oxford: Wiley-Blackwell.
Barton, D. and Lee, C. (2013). Language Online: Investigating Digital Texts and Practices. London: Routledge.
Bloem, S. (2013). PISA in Low and Middle Income Countries. Paris: OECD Publishing.
Blommaert, J. and Rampton, B. (2011). ‘Language and Superdiversity’. Diversities 13 (2).
Blum, A., Goldstein, H. and Guérin-Pace, F. (2001). ‘International Adult Literacy Survey (IALS): An Analysis of International Comparisons of Adult Literacy’. Assessment in Education: Principles, Policy and Practice 8 (2), 225–46.
Bowker, G.C. and Star, S.L. (2000). Sorting Things Out: Classification and its Consequences. Cambridge, MA: MIT Press.
Brandt, D. (2009). Literacy and Learning: Reflections on Writing, Reading, and Society. Chichester: John Wiley & Sons.
Collins, J. and Blot, R.K. (2003). Literacy and Literacies: Texts, Power, and Identity (22). Cambridge University Press.
Cope, B. and Kalantzis, M., eds (2000). Multiliteracies: Literacy Learning and the Design of Social Futures. London: Routledge.
Dale, R. and Robertson, S. (2009). Globalisation and Europeanisation in Education. Oxford: Symposium Books.
Darville, R. (1999). ‘Knowledges of Adult Literacy: Surveying for Competitiveness’. International Journal of Educational Development 19 (4–5), 273–85.
Denis, J. and Pontille, D. (2013). ‘Material Ordering and the Care of Things’. International Journal of Urban and Regional Research 37 (3), 1035–52.
Educational Testing Service (2009). ‘ETS Guidelines for Fairness Review of Assessments’. Princeton, NJ: ETS.
Espeland, W.N. and Stevens, M.L. (2008). ‘A Sociology of Quantification’. European Journal of Sociology 49 (3), 401–36.
Esposito, L., Kebede, B. and Maddox, B. (2014). ‘The Value of Literacy Practices’. Compare: A Journal of Comparative and International Education (forthcoming), 1–18.
Fenwick, T., Edwards, R. and Sawchuk, P. (2011). Emerging Approaches to Educational Research: Tracing the Socio-Material. London: Routledge.
Fenwick, T., Mangez, E. and Ozga, J. (2014). Governing Knowledge: Comparison, Knowledge-Based Technologies and Expertise in the Regulation of Education. London: Routledge.
Graff, H.J. (1979). The Literacy Myth: Literacy and Social Structure in the Nineteenth-Century City. New York: Academic Press.
Grek, S. (2009). ‘Governing by Numbers: The PISA “Effect” in Europe’. Journal of Education Policy 24 (1), 23–37.
Guadalupe, C. and Cardoso, M. (2011). ‘Measuring the Continuum of Literacy Skills among Adults: Educational Testing and the LAMP Experience’. International Review of Education 57 (1–2), 199–217.
Hacking, I. (1990). The Taming of Chance (vol. 17). Cambridge University Press.
Hamilton, M. (2012). Literacy and the Politics of Representation. London: Routledge.
— (2013). ‘Imagining Literacy through Numbers in an Era of Globalised Social Statistics’. Knowledge and Numbers in Education. Seminar Series: Literacy and the Power of Numbers. London: Institute of Education.
Hamilton, M. and Barton, D. (2000). ‘The International Adult Literacy Survey: What Does it Really Measure?’ International Review of Education 46 (5), 377–89.
Jakobi, A.P. and Martens, K. (2010). ‘Introduction: The OECD as an Actor in International Politics’. In A.P. Jakobi and K. Martens, eds, Mechanisms of OECD Governance: International Incentives for National Policy-Making. Oxford University Press, 163–79.
Jones, P.W. (1990). ‘UNESCO and the Politics of Global Literacy’. Comparative Education Review, 41–60.
Latour, B. (2004). ‘Why Has Critique Run Out of Steam? From Matters of Fact to Matters of Concern’. Critical Inquiry 30 (2), 225–48.
— (2005). Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press.
Latour, B. and Woolgar, S. (1979). Laboratory Life: The Construction of Scientific Facts. London: Sage Library of Social Research.
Lawn, M., ed. (2013). The Rise of Data in Education Systems: Collection, Visualization and Use. Oxford: Symposium Books.
Lawn, M. and Grek, S. (2012). Europeanizing Education: Governing a New Policy Space. Oxford: Symposium Books.
Lehmkuhl, D. (2005). ‘Governance by Rating and Ranking’. In Annual Meeting of the International Studies Association (ISA), Honolulu, 2–6.
Lemke, J. (1995). Textual Politics: Discourse and Social Dynamics. London: Taylor and Francis.
Lin, A.M. and Martin, P.W., eds (2005). Decolonisation, Globalisation: Language-in-Education Policy and Practice (3). Clevedon: Multilingual Matters.
Lingard, B. and Ozga, J., eds (2007). The RoutledgeFalmer Reader in Education Policy and Politics. London: Routledge.
Maddox, B. (2007). ‘Secular and Koranic Literacies in South Asia: From Colonisation to Contemporary Practice’. International Journal of Educational Development 27, 661–8.
Martens, K. (2007). ‘How to Become an Influential Actor: The “Comparative Turn” in OECD Education Policy’. In K. Martens, A. Rusconi and K. Leuze, eds, New Arenas in Education Governance. New York: Palgrave Macmillan.
Meyer, H.D. and Benavot, A., eds (2013). PISA, Power, and Policy: The Emergence of Global Educational Governance. Oxford: Symposium Books.
Meyer, J.W. (2010). ‘World Society, Institutional Theories, and the Actor’. Annual Review of Sociology 36, 1–20.
Mitchell, T. (2002). Rule of Experts: Egypt, Techno-Politics, Modernity. Berkeley and Los Angeles: University of California Press.
Moser, S.C. (1999). Improving Literacy and Numeracy: A Fresh Start. London: DfEE Publications.
Murray, T.S., Kirsch, I.S. and Jenkins, L.B. (1998). Adult Literacy in OECD Countries: Technical Report on the First International Adult Literacy Survey. Washington, DC: US Government Printing Office. www.files.eric.ed.gov/fulltext/ED445117.pdf (retrieved December 2014).
Nóvoa, A. and Yariv-Mashal, T. (2003). ‘Comparative Research in Education: A Mode of Governance or a Historical Journey?’ Comparative Education 39 (4), 423–38.
OECD (2012). Literacy, Numeracy and Problem Solving in Technology-Rich Environments: Framework for the OECD Survey of Adult Skills. Paris: OECD.
— (2013). OECD Skills Outlook 2013: First Results from the Survey of Adult Skills. Paris: OECD.
O’Halloran, K.L. (2008). Mathematical Discourse: Language, Symbolism and Visual Images. London and New York: Continuum.
Olssen, M., Codd, J.A. and O’Neill, A.M. (2004). Education Policy: Globalization, Citizenship and Democracy. London: Sage.
Pereyra, M.A., Kotthoff, H.G. and Cowen, R. (2011). PISA under Examination. Rotterdam: Sense Publishers.
Pons, X. and Van Zanten, A. (2007). ‘Knowledge Circulation, Regulation and Governance’. Knowledge and Policy in Education and Health Sectors. Literature Review, part 6 (June). Louvain: EU Research Project. www.knowandpol.eu/IMG/pdf/lr.tr.pons_vanzanten.eng.pdf (retrieved December 2014).
Porter, T.M. (1996). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton University Press.
Power, M. (1999). The Audit Society: Rituals of Verification. Oxford University Press.
Reder, S. (2009). ‘Scaling up and Moving in: Connecting Social Practices Views to Policies and Programs in Adult Education’. Literacy and Numeracy Studies 16 (2), 35–50.
Rizvi, F. and Lingard, B. (2009). ‘The OECD and Global Shifts in Education Policy’. International Handbook of Comparative Education. Springer Netherlands, 437–53.
Rose, N. (1991). ‘Governing by Numbers: Figuring out Democracy’. Accounting, Organizations and Society 16 (7), 673–92.
— (1998). Inventing Our Selves: Psychology, Power, and Personhood. Cambridge: Cambridge University Press.
Selfe, C.L. and Hawisher, G.E. (2004). Literate Lives in the Information Age: Narratives of Literacy from the United States. London: Routledge.
Sellar, S. and Lingard, B. (2013). ‘PISA and the Expanding Role of the OECD in Global Education Governance’. In H.D. Meyer and A. Benavot, eds, PISA, Power, and Policy: The Emergence of Global Educational Governance. Oxford: Symposium Books.
Sen, A. (2003). ‘Reflections on Literacy’. In C. Robinson, ed., Literacy as Freedom. Paris: UNESCO, 20–30.
Smith, D.E. (2005). Institutional Ethnography: A Sociology for People. Walnut Creek: AltaMira Press.
St Clair, R., Maclachlan, K. and Tett, L. (2010). Scottish Survey of Adult Literacies 2009: Research Findings. Edinburgh: Scottish Government.
Steiner-Khamsi, G. (2003). ‘The Politics of League Tables’. JSSE – Journal of Social Science Education 2 (1). www.jsse.org/index.php/jsse/article/view/470 (retrieved December 2014).
Steiner-Khamsi, G. and Waldow, F., eds (2011). Policy Borrowing and Lending in Education. London: Routledge.
Street, B. (1995). Social Literacies: Critical Approaches to Literacy in Development, Ethnography and Education. London: Longman.
Street, B. and Lefstein, A. (2007). Literacy: An Advanced Resource Book for Students. London: Routledge.
Thorn, W. (2009). ‘International Adult Literacy and Basic Skills Surveys in the OECD Region’. OECD Education Working Papers 26. Paris: OECD.
Van Leeuwen, T. (2008). Discourse and Practice: New Tools for Critical Discourse Analysis. Oxford: Oxford University Press.
Vincent, D. (2014). ‘The Invention of Counting: The Statistical Measurement of Literacy in Nineteenth-Century England’. Comparative Education 50 (3), 266–81.
Waldow, F. (2009). ‘What PISA Did and Did Not Do: Germany after the “PISA-shock”’. European Educational Research Journal 8 (3), 476–83.
World Bank Group (2011). Learning for All: Investing in People’s Knowledge and Skills to Promote Development. Washington, DC: World Bank, Education Strategy 2020.
Zajda, J.I., ed. (2005). International Handbook on Globalisation, Education and Policy Research: Global Pedagogies and Policies. London: Springer.
8 INSIDE THE ASSESSMENT MACHINE: The Life and Times of a Test Item

Bryan Maddox (University of East Anglia)
Introduction International assessments offer compelling opportunities for comparative ranking and normative policy prescriptions (Hamilton 2012). In the process, literacy is transformed from a diverse social practice into a set of numbers. But how does this transformation take place? The technically demanding procedures of knowledge production are often black-boxed and routinised – out of sight and beyond public debate (Latour 1987). The reports of large-scale assessment programmes provide little insight into the internal controversies and debates that are involved in knowledge production (Hamilton 2001). Furthermore, as Latour and Woolgar (1979) argued, the published outputs of ‘laboratory life’ may ‘systematically misrepresent’ (page 28) scientific activity, and lead to inaccurate and stereotypical portrayals of what the process of knowledge production actually entails. Bruno Latour (1987) therefore suggests that researchers who wish to understand the everyday life of scientific and technical processes are compelled to enter the laboratory to observe fact-making ‘machines’ from the inside: ‘We will enter facts and machines while they are in the making; we will carry with us no preconceptions of what constitutes knowledge.’ (page 13). This chapter takes such an approach based on ethnographic fieldwork conducted with the UNESCO Literacy Assessment and Monitoring Programme (LAMP) in Mongolia. Instead of attempting to piece together a speculative, post hoc account of what assessment projects might do and involve in practice, we
enter the assessment machine, and examine the production of literacy assessment data from the inside. We shall start with the Camel test item – and take it from there. We trace ‘Ariadne’s thread’ (Latour 1993) to reassemble a series of moments, places and scales of analysis.1 The chapter will identify various components in the assessment machine, and follow the transformations of data, from individual assessment events, to large-scale psychometric artefacts. The story of the Camel test item introduces various characters who come together as an ‘assemblage’ (Latour 2005), and who bring contrasting personalities and ontological positions. These include Mongolian camels, nomadic herders, LAMP testers, and some complex non-human and non-animal actors – the ‘Framework’, the ‘Mongolian Camel’ assessment item, Item Response Theory and its related algorithms, and Item Characteristic Curves (ICCs). The story of the test item takes us to Mongolia, Morocco, Montreal, Brussels, Princeton and my office in Norfolk. It incorporates testing agencies, families, educational departments, funding agencies, psychometricians, nomadic herders, policy researchers and government officials.
Inside the Assessment Machine

LAMP has all the characteristics of a demanding and innovative ‘technical project’, driven by a set of attractive and desirable ideas about what might be, or must be made possible in international assessments (see Guadalupe in this volume). Its task was to manage chaos – to take multiple and diverse populations and their many different languages, scripts and literacy traditions, and somehow convert them into an orderly set of numbers and graphs. Technically, the project was not original, though it might have claimed to be original and indispensable (Guadalupe and Cardoso 2011). It inherited a set of constructs, people, ‘Item Response Theory’ (IRT) algorithms, procedures, frameworks, standards, components, and test stimuli and test items from previous assessment projects (NALS, IALS, ALL), which continue to be used by other programmes. Many of the processes and tools were already prepared before the project could exist – routinised and ‘black boxed’, to be used without further discussion. The project didn’t have to reinvent the procedures of competency-based psychometric assessment, or the principles of Item Response Theory and common scales. It didn’t have to reinvent ‘literacy’ and ‘numeracy’ – they had already been invented and made real by NALS and IALS (Kirsch 2001). Nevertheless, as a project,
a way of connecting various allies, it was original – a UN response to a set of powerful desires and aspirations for a reliable set of internationally comparable numbers.

Before we introduce the characters, let me say more about the ‘assessment machine’ as it is informed by the work of Latour (1987). The idea suggests machine-like processes: standardisation and routines, inputs and outputs of data and resources. Metaphors of machines and ‘machine building’ abound in international assessment regimes – with their various components and specifications and their technical expertise. Those perspectives draw our attention to the production of norms and standardisation – making testing technology durable (Latour 1991) and governable. Science and Technology Studies describes machines as social phenomena combining human actors and material artefacts (e.g. Latour 1991; 1994), where the agency of non-human actors must be considered (Law 1991), including their role in ‘configuring the user’ (Woolgar 1991). In describing an ‘assessment machine’, I refer to the way that assessment projects involve an assemblage of enrolled actors and resources, as a ‘set of allies’ (Latour 1987, 128), both human and non-human, that must somehow be bound together and work in concert. The assemblage and network of actors and components must be connected and kept in check, so that ‘none can fly apart from the group’ (1987, 128).

What forces hold the international assessment machine and its networks together? Arguments may convince actors and supporters that assessment projects are ‘indispensable’ (Callon 1986). But arguments alone are insufficient to create and sustain a large-scale literacy assessment programme. International assessments respond to the various ‘interests’ (Latour 1987) of their participants and supporters – such as accumulation of prestige, status group membership, support for political and ideological projects, financial reward, career-building opportunities – all of which may be significant and additional to their stated goals (Addey, this volume). Those factors are critical to the existence of the project, and ultimately adjudicate on whether an assessment machine works, and whether its findings are true (Latour 1987). On a day-to-day basis the ‘assessment machine’ involves inherently technical procedures of governance – the protocols, standards, frameworks and technical artefacts of the laboratory. Those ‘insider’ techniques and characters are essential to the running of the machine, the adjudication of conflict, and the production of data. In the workings of the assessment machine ‘non-human actors’ play a central role.2
The Mongolian camel

… now I’ll tell you about the legend of the Camel.3
Test items have history. The sociology and anthropology of numbers involve an obligation to piece together that story. It is commonplace for large assessment projects to ‘release’ some of their test items for open access. They give us a glimpse at what is going on. But those test items, without the stories of production, movement, reception and statistical performance, are like empty shells – or fossils, whose analysis even the most insightful of researchers must acknowledge as inadequate. Test items are fundamental to assessment projects. They are like the teeth in the cog – the gear that enables tests to replicate their code, to make real the conceptual domains of ‘prose’ and ‘document’ literacy and numeracy.

By the time we pick up the story the Mongolian Camel already exists – or at least it does as a young prototype. In 2005 LAMP asked their country teams to suggest nationally relevant test items. Those would fulfil two important purposes. They would help to introduce nationally meaningful test items (to represent the participating countries), and form a bank of international test items.4 We can already anticipate some challenges. Testing programmes like the idea of ‘realistic’, contextually meaningful test items. So into the black box go any unruly questions about how locally or nationally specific test items would be integrated into an international comparative project. The LAMP guidelines instructed that in order to retain the ‘psychometric properties of the test items’, the national teams should ‘keep modifications to a minimum’.5

The Camel is iconic in the imagination of the Mongolian nation, as a reminder of its nomadic past and present. The domesticated Bactrian Camel is widely used by nomadic herders, especially in the Gobi region. Bactrian Camels have numerous adaptations that make them well suited to the harsh Mongolian climate. I am no expert in camels – but from observations, books and various internet sources the following can be found: Bactrian Camels have black skin to protect them from the sun, and long, dense hair that insulates every inch of their bodies. They can close their nostrils, and see through their eyelids. Their fatty humps enable them to store energy for a long time, and their body is particularly good at conserving water (they rarely sweat and have very concentrated urine and dung). The Camel can drink huge quantities of water in one sitting, with specially adapted blood cells that tolerate such rapid rehydration. There is no doubt more that can be said. Information about camels that nomadic herders
might want to share – about their diet, and about their resilience and special relationship with herders, the uses of their wool and milk, and how they appear in Mongolian folklore.

When we first meet the Mongolian Camel as a prototype test item it had already left Mongolia and was attending a workshop on test item development in Marrakech, Morocco, in February 2005. We meet it in its first English-language incarnation as ‘The camel “living dinosaur”’. It consists of a picture (a stimulus) of a Bactrian camel with a short section of text describing Bactrian camels, and two questions related to information in the text: (1) How many camels or ‘living dinosaurs’ are in our country? (2) How many litres of water drinks the Camel in a day?

The ‘living dinosaur’ was accompanied to Morocco by other colleagues and foreign relatives. The Mongolian team arrived with other prototypes: ‘Sheepskin Processing’, ‘Cashmere Cost’, the ‘Workplace’, ‘Electricity Consumption’ and the ‘Map’. Other countries also arrived with their test items. They were introduced to some more experienced ‘international’ test items – who had worked in previous international assessment projects (IALS and ALL).6 Those international role models appeared with their ‘projected difficulty level’, their item type (expository, form, injunctive, table, graph) and their ‘context’ labels (e.g. community, citizenship, work, consumer economics, leisure/recreation). In comparison, the national test items seemed inexperienced and provincial.

After the initial workshop, the camel was selected, and went on to be modified and developed. It became cosmopolitan, had the attention of global experts, and received a budget that enabled it to travel globally. It visited the UNESCO Institute of Statistics in Montreal, spent some time receiving revisions at ETS in Princeton and adaptations at cApSTan (Linguistic Quality Control) in Brussels. It received an adapted title (‘Mongolia’s Bactrian Camel’), revised text and questions. The camel had started its life speaking Mongolian, was translated into English for the Moroccan workshop and later (with the help of professional experts) translated back into Mongolian language and Cyrillic script. How did the process of test item selection and modification take place? To explain that we need to introduce our second character.

The ‘Framework’

Despite participants’ well-ordered reconstructions and rationalisations, actual scientific practice entails the confrontation and negotiation of utter confusion. The solution adopted by scientists is the imposition of various frameworks by
which the extent of background noise can be reduced and against which an apparently coherent signal can be presented. (Latour and Woolgar 1979, 37)
Throughout the LAMP documents there are references to the ‘Framework’. It can be found in The International Adult Literacy Survey (IALS): Understanding What Was Measured (2001), written for Adult Literacy and Life Skills (ALL) by Irwin Kirsch at Educational Testing Service, based on the experience of IALS, and in a shorter academic chapter (Kirsch 2003). The Framework took on a life of its own, as it became referred to and operationalised in LAMP. Its existence represents a moment of ‘closure’ on a series of conceptual and operational debates and challenges, making ‘routine’ what was previously innovative and experimental (see Latour 1987). The Framework drew on the prior experience of large-scale assessment projects (the National Adult Literacy Survey; NALS) in the United States, and later, the experience of the OECD International Adult Literacy Survey. Kirsch (2001; 2003) locates the origins of the Framework in the experience of institutions based in the United States, Canada and Australia. The Framework offers a desirable ‘statistical solution’ to problems of international comparability (2001, 26), by adopting a probabilistic approach to measurement of literacy and numeracy competencies, with a common scale and ‘levels’ of competence.

The Framework defines and ‘black boxes’ a series of concepts, procedures and experiences, putting them beyond debate and holding the project together. It can define literacy and manage its unruly diversity (2001, 4–6). The Framework remembers the role of an NALS committee, funded by the United States National Center for Education Statistics (NCES), in defining literacy (as ‘using printed and written information to function in society, to achieve one’s goals, and to develop one’s knowledge and potential’, page 6). That vague and almost relativist definition of literacy is reworked by the Framework to become a set of specific domains, categories and procedures. We get an insight into the character and ontology of the Framework, how it views and deals with literacy. The plurality of literacy practices becomes a narrow set of tasks and texts related to reading of printed and public documents (see Hamilton 2001). We begin to understand that the Framework is uncomfortable with too much chaotic diversity, and prefers an orderly approach to comparison. The Framework describes literacy in terms of the application of skills ‘for specific purposes in specific contexts’ (page 6), but appears narrow in its understanding of global diversity.

LAMP owes everything to the Framework, and the Programme can be viewed as an attempt to further internationalise the globalising ambitions
that the Framework is only too keen to promote (page 1). Yet it distances itself from the Framework’s applications such as IALS, as one might with a slightly embarrassing and ethnocentric friend (UNESCO 2009). Nevertheless, there is love there, and if we look inside, at the private workings of the assessment machine, we find that the Framework is usually in charge. This can be illustrated by a description of the chance encounter in a Moroccan workshop, between the North American ‘Framework’ and the prototype test items.

The Mongolian Camel meets the Framework
In an unlikely turn of events, the Mongolian camel meets the Framework at an international workshop in Marrakech. They are not alone. There are other prototype test items present. Most have not travelled before, and are unfamiliar with this kind of international meeting. As far as the Framework is concerned, many of the test items should not have been there at all. They are too local, ‘too country specific’ to be of much use to the Programme. Their English is poor, and they might not travel well to other countries.7 The Framework has very firm views about this, and has the support of Experts and the programme team. It is concerned about how the prototype test items will support ‘comparability’ to ‘provide data or measures that are commensurable across time periods and across populations of interest’ (page 1). It wants to ensure that comparisons can be made between the participating countries. Many of the prototype test items are much too local, too parochial (‘culturally specific’) to cope with international travel. Some did not even ‘fit the construct’. As a result, of all the test items that attended the Marrakech workshop, about two-thirds were told that they were not suitably qualified for international travel, and played no further role in the programme.8 To some embarrassment and tension for the workshop hosts, they were told that most of the Moroccan test items (even ‘building a Mosque’) were unsuitable.9 There was also some disappointing news for the Mongolian team. ‘Sheepskin processing’ was rejected for cultural reasons, despite being an esteemed part of Mongolian life. Other prototype test items that did not make it beyond the workshop included the ‘Wrestling Star’ of Niger, Palestinian ‘Travel and Tourism’, and the ‘Female Students’ of Kenya.

However, when the Framework met the Mongolian camel a relationship began that would transform the test item into a cosmopolitan and internationally recognised part of the Programme. Of course, from the Framework’s point of view there was some work to be done. There were some cosmetic improvements, modifications and adaptations. But the Camel could clearly fulfil the requirements of national and international audiences. This
test item would travel – working on its presentation, attending private meetings and international workshops in Canada, the United States and Belgium. It would learn to speak in the multiple languages and scripts of the participating countries. It is fair to say that by the time it returned to Mongolia in 2010 to take part in the national assessment, it was no longer just a Mongolian Camel, but a polished international actor.

The Mongolian Gobi: a glorious return
We pick up on the story as the Mongolian Camel test item returns to Mongolia in 2010, and is reunited with the nomadic Camel-herding populations of the Gobi Desert. After spending some time in the national capital, the Camel travelled south by Land Cruiser to the Gobi Desert. It was accompanied by the Mongolian testing team and various foreign observers, including an anthropologist who knew little of Mongolian life, spent a lot of time writing notes in his notebook, and asked naive questions that the assessment team were obliged to politely answer. It was autumn, and the weather was getting cold. During the assessment events the nomadic gers were usually crowded, with much discussion, drinking and eating around the stove. This unruly informality was not what the test item was told to expect by the training manual and the other technical documents.

Nevertheless, when it was the turn of the Mongolian Camel test item to perform, it was usually greeted with great fondness and familiarity by the nomadic herders. Many of the herders said that compared with some of the other test items, which had appeared out of place and foreign in the desert, they had especially liked the Camel test item. The nomads smiled and joked with it, and treated it with the familiarity of an old friend. This informality was not always welcomed by the test item. It was sometimes a bit brusque and formal in its treatment of their answers – especially when they were unscientific or subjective, and where they had departed from the printed text that the test item provided. It seemed that after its foreign travel the test item had somehow changed. It was no longer a Mongolian Camel, no longer quite one of them, but a representative of some other place. Things got a little tense when some of the nomads, for some inexplicable reason, had supplied incorrect and unsuitable answers to the Camel test item’s questions. This unanticipated behaviour surprised the psychometrician visiting from UNESCO, who later commented that it was peculiar that so many of the nomadic Camel herders got the questions wrong, when ‘the answer is right there in the text’. But despite some disagreements with the
herders about the characteristics of camels, the meeting between them and the Camel test item was accompanied by the good humour, hospitality and drinking of vodka that one might expect from such an encounter.

The Laboratory

Whilst the technical literature is accessible in libraries, archives, patent offices or corporate documentation centres, it is much less easy to sneak into the few places where the chapters are written and follow the construction of facts in their most intimate details. (Latour 1987, 63)
We continue the story as the assessment data enters the laboratory. We can follow its trail, as the answer booklets in bundles and boxes are carried from the desert to the office in Ulan Bator. The answer booklets are carried by Land Cruiser along bumpy desert tracks and on to the roads that lead back to the capital. There, they are warmly greeted and a whole office is dedicated to housing the mass of information they contain. Each answer booklet represents an assessment event – an encounter between respondents, the testers and test items. They still resonate with the memories of emotions, puzzles, good-humoured questions and discussions, and the work of the testing team.

In the first of a series of transformations (Latour and Woolgar 1979; Latour 1987), a team goes painstakingly through each answer booklet. They read the detailed notes that the enumerators have added about the circumstances and conditions of each assessment event. They adjudicate on anomalies and incomplete data. They clean and prepare the data so that it can be understood by Des – the Data Entry Software. Here they encounter a problem. Des has only just arrived from Canada, and is not working. There is some heated debate between humans. Des must work, or must be made to work. It is eventually agreed that Des is probably useless and will not work. They have to ask for a replacement, and that will take some time. In the meantime, the answer booklets have to wait, tied up in their bundles and boxes, until the replacement arrives.

Several months pass, and we hear nothing about the plight of the answer booklets. Then we hear some news. The replacement Des has arrived and is working. The data is cleaned and sent from Mongolia to the ‘centre of calculation’ (Latour 1987), the Institute of Statistics in Montreal. But somehow, in the process, a tragedy occurs. Despite their critical role in the assessment process, once Des has finished with them the answer booklets are no longer required. They are taken out and burned.10 They have served their purpose and are now considered worthless (Latour and Woolgar
1979). On hearing this, the ethnographer cannot believe this turn of events. After everything they had been through – the answer booklets are destroyed? After some desperate searching a few answer booklets from the pilot stage are discovered that had avoided being burned, by hiding in some drawers in an upstairs office. The ethnographer gets emotional and cannot be consoled. He thinks that it was probably Des (no doubt supported by the Framework) who committed this criminal act against his qualitative insights. Qualitative data (any contextual residue of the assessment events) is treated like dirt and removed. The evidence is there in the project documents. Des was only ever interested in a narrow set of codes, collapsed into binary scores: 1 = correct answer, 0 = question refused or not done (treated as the incorrect answer), and 7 = any other response (treated as the incorrect answer) – a recoding sketched at the end of this section.

The data is sent to the team of technicians at the laboratory in Montreal. It is not clear if they are aware of the acts that have been committed. The laboratory in Montreal looks something like this. A series of bright and airy glass-walled offices are home to a team of specialists. Often you can see the humans at their desks with their computers and papers. Sometimes they meet with other specialists and advisers who congregate in the meeting room and coffee room, especially when there are difficult and complicated discussions about complex technical procedures. Sometimes there are heated arguments about unspeakable things. The data has to be treated carefully. The information from each country has to be combined in a way that satisfies not only the technicians and software programs, but also pleases their supporters in the country teams and the institute directors. There are complex, demanding and lengthy processes of weighting, establishment of scales and levels, ‘DIF’ checks for test item performance and comparison, and the production of graphs and tables. This takes months. The data is continuously refined. Sometimes things are faulty and have to be reworked. In all of this process the Framework is present and is regularly consulted if there is any question about the overall purpose.
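To make Des’s appetite concrete, here is a minimal sketch of that scoring pass, written in Python. It is an illustration only: the 1/0/7 codes come from the project documents, but the function and the sample booklet are invented, not LAMP’s actual data entry software or file formats.

    # Hypothetical illustration of the scoring pass described above.
    # Raw answer-booklet codes: 1 = correct, 0 = refused/not done,
    # 7 = any other response. Everything except 1 is scored incorrect,
    # so all contextual residue of the assessment event is discarded.

    def score_response(raw_code: int) -> int:
        """Collapse a raw response code into the binary score that the
        psychometric scaling consumes."""
        return 1 if raw_code == 1 else 0

    # One (invented) respondent's codes for five test items:
    booklet = [1, 7, 0, 1, 7]
    print([score_response(code) for code in booklet])  # -> [1, 0, 0, 1, 0]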
The Project Director

Despite the presence of a team of human actors, the assessment programme seems to be led by a series of powerful algorithms that has ultimate authority in technical matters and the production of data. It is known formally as ‘Item Response Theory’, but is more like an algorithm or an approach.11 We shall discuss IRT here with the friendly name of ‘Ira’. If you ask some
psychometricians to describe Ira, they may present him as one or more algorithms. That always makes me smile. It would be a bit like describing the human team members as a set of genetic material! Of course, in a sense he is an algorithm, but in terms of day-to-day interactions between Ira and the technicians he is much more than that. If you ask Ira’s colleagues, many would say that without Ira the whole project would not exist. In fact, the more you look inside the laboratory the clearer it becomes that the Framework and Ira have been directing operations all along! In comparison, the humans, for all their emotional commitment to the project, have limited capacity alone to understand the world, and are replaceable. Many people have written academic articles about Ira, and Ira is so valuable to the project and so respected as to be protected at all times. So much so that some may feel uneasy about me writing about him in this way.

What kind of character is Ira? If you ask the technicians, many would consider Ira ‘pretty cool’, with his astute mathematical and perceptive abilities.12 Ira can produce tremendous mathematical insights, beautiful statistical artefacts and is always on hand to reassure people about the rigour and significance of the data. I think it’s fair to say that some people in the project team have come to love Ira. How about Ira’s ontology?13 How does Ira view the world? Well, most would agree that Ira, despite his astute mathematical abilities, tends to be a bit discriminatory, and views things in one- or two-dimensional terms. Super-rational, some might say. Others might add that he tends to treat people like data, is not interested in ‘culture’ and does not seem to care too much about individuals. ‘All for the greater good’ Ira might reply! Perhaps Ira is a little too sure of himself? Few people inside the project would feel comfortable criticising Ira openly. Of course, there are those who noticed that when he accompanied the project overseas, Ira suffered culture shock, and did not cope well with the diversity of languages and culture there, or the apparent sameness of people’s abilities. I am not sure if that is true, or if it is just the way people have treated Ira in the field – after all, you can’t blame Ira for what people ask him to do.14 With that in mind, the story of Ira’s meeting with the Mongolian Camel test item is worth recounting. But first, for readers who have never met Ira in mathematical form, a brief sketch of the kind of curve he draws for every test item may help.
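The following is a minimal sketch in Python, assuming the two-parameter logistic (2PL) model, one common form of IRT; the parameter values are invented for illustration, and LAMP’s operational scaling models and item parameters are not reproduced here.

    import numpy as np

    def icc_2pl(theta, a, b):
        # Two-parameter logistic item characteristic curve: the probability
        # of a correct response given latent ability theta, where a is the
        # item's discrimination and b its difficulty on the same scale.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # Evaluate an idealised S-shaped curve across the ability scale,
    # using hypothetical parameters (a = 1.2, b = 0.5).
    theta = np.linspace(-3, 3, 7)
    print(np.round(icc_2pl(theta, a=1.2, b=0.5), 2))

In this idiom, Ira’s job is to estimate a and b for every test item, and theta for every respondent, so that all items and all respondents can be placed on one common scale.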
The story of Ira and the Camel provides some interesting insights into how Ira responds to questions of cultural difference, challenging contexts, and analysis at the level of local group characteristics. Ira and the technician do a DIF test using Mantel–Haenszel (M–H) procedures, and they say that that
there was no problem with the Camel test item.15 It seemed to perform just fine. The test item had not introduced any significant bias due to factors such as gender, age and urban/rural location. As a result, there was no need to refer the test item to an expert committee to adjudicate on its performance (a fate that befell some of the other test items). The ethnographer asked Ira to look in more depth, to drill down in the data to look at the performance of the nomadic herding populations in the Gobi. About one-third of the respondents in that area own camels. They are the real Camel experts, so it seemed like a good place to investigate the test item performance.

As we go ‘local’ things get a bit complicated. Ira, it seems, prefers to work with large sample sizes. It was not exactly that Ira stopped communicating with us. But, as the technician admitted – when you go very local, with small subgroup sizes, Ira tends to get a bit ‘unstable’. Admittedly, the problem was not entirely of Ira’s making. The procedures of randomised sampling meant that only a few hundred Camel owners were asked to complete the Camel test item. In any case, it seemed that if Ira is asked to produce something very local, then Ira can produce an image of the local – even if Ira was never there! At this point the ethnographer (whose legitimacy is produced with the claim that ‘I was there’) got anxious, and tried to brand the whole enterprise as a case of ‘statistical fiction’. In response, Ira accused the ethnographer of being ethnocentric and said that he should consider other ontological positions, including those that are adopted by algorithms (a criticism that the ethnographer reluctantly had to accept).

Ira and the technicians then produced some graphs, which would help the ethnographer to appreciate the test item performance. The ethnographer viewed the graphs excitedly – as ‘insider artefacts of laboratory life’. He was happier seeing the graphs than he had been when confronted by an apparently chaotic mass of tables and data (which he did not seem to understand) – and he did not even object when the technicians applied ‘smoothing’ techniques to produce curves that were aesthetically more pleasing (less ‘jagged’) and ‘visually easier to study’ (see the graphs reproduced at the end of this chapter). Unfortunately, even with data-smoothing procedures, the test item graphs did not match the beautiful idealised S-shaped curves that psychometricians working with Ira might have expected. Familiarity with camels did not help the nomads to obtain the correct answers and their literacy abilities were not a good predictor of performance on the Camel test item. On seeing this, Ira was silent, and was able to offer no plausible explanation.
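For readers curious about what the technician actually computed, here is a minimal sketch of the classic Mantel–Haenszel common odds ratio that underlies such DIF checks. It is an illustration under simplified assumptions – the variable names and data are invented, and the modified MH-DIF procedure behind the graphs at the end of this chapter involves further refinements not shown here.

    import numpy as np

    def mantel_haenszel_odds_ratio(correct, group, total_score):
        # Mantel-Haenszel DIF check: match respondents on total test score,
        # then compare the odds of answering the item correctly between the
        # reference group (group == 0, e.g. non-herders) and the focal group
        # (group == 1, e.g. herders) within each score stratum.
        correct = np.asarray(correct)
        group = np.asarray(group)
        total_score = np.asarray(total_score)
        num, den = 0.0, 0.0
        for s in np.unique(total_score):
            stratum = total_score == s
            a = np.sum((group[stratum] == 0) & (correct[stratum] == 1))
            b = np.sum((group[stratum] == 0) & (correct[stratum] == 0))
            c = np.sum((group[stratum] == 1) & (correct[stratum] == 1))
            d = np.sum((group[stratum] == 1) & (correct[stratum] == 0))
            n = a + b + c + d
            if n > 0:
                num += a * d / n
                den += b * c / n
        return num / den if den > 0 else float('nan')

A common odds ratio close to 1 is read as ‘no DIF’: conditional on overall ability, herders and non-herders are equally likely to answer the item correctly. The chapter’s point, of course, is that this matching on total score is exactly what makes Ira ‘unstable’ when the focal group shrinks to a few hundred camel owners.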
This is not a Camel – the treachery of images

Pursuing its quarry by two paths, the calligram sets the most perfect trap.
(Foucault 2008, 22)
The final part of the story takes us back into the Gobi, to consider the ‘testing situation’16 and literacy assessment events (see Maddox 2014). The nomads of the Gobi can unequivocally be described as experts on Bactrian camels. They live with and depend on the camels for their livelihoods and survival. Why is it, then, that so many of the nomads provided incorrect answers to the test item questions? The answer seems to lie in the political ecology of international testing – and what the assessment Framework demands in terms of comparability.

To paraphrase the Surrealist artist Magritte and his ‘treachery of images’ (Ceci n’est pas une pipe): the test item is not a Camel, but a representation of a Camel. The respondents are invited to draw on their expert knowledge of camels – but the realistic content of the test item invites them to become unnecessarily familiar. The test item as ‘calligram’ (Foucault 2008) contains an image of the Camel (the stimulus), and an associated text. Magritte alerts us to the possibilities of this ambiguous relationship of image and text – that ‘between words and objects one can create new relations’. Foucault (2008) argues that the relationship between image and text establishes a hierarchical order, ‘running from the figure to discourse or discourse to figure’ (page 33). In this case, the assessment events in the Gobi momentarily create spaces for competing orders. For the testers and the Framework, the answer is in the text and the associated cultural hierarchy is orientated to the written text.17 The nomadic herders depart from the text to look instead in the direction of the referent (their experiences of Bactrian camels).

The test item invited misrecognition. Meeting the test item, the nomadic herders smiled and joked. They relaxed and talked freely about the camels. They could describe the resilience of camels, their suitability and adaptations to desert life and the harsh Mongolian winter. But despite its familiar appearance, ‘Mongolia’s Bactrian Camel’ was not a local actor. It was no longer the idealistic young prototype that ventured to the workshop in Morocco. As a test item its loyalty was to the Framework, Ira and the technicians, and their international programme of testing and comparison. The background knowledge of the herders made little impression on the test item, which treated all responses that departed from the answer booklet as incorrect.
Conclusion

We have to lay continuous connections leading from one local interaction to the other places, times and agencies, through which a local site is made to do something. (Latour 2005, 173)
This story has taken us inside the assessment machine to look at the life and times of a test item. This has taken us on a journey, from the Mongolian Gobi, to the global ‘centres of calculation’ (Latour 1987) in North America and Europe, and back again. The assessment machine holds together an assemblage of globally networked actors and components who are the characters of the story. There are the humans who desire internationally comparable statistics, and the technicians and their country teams, who make their living from assessing the literacy of others. There are the non-human agents at the centre of the machine, without whom the assessment could not take place: test items, answer booklets, instruction documents, and the powerful Framework and Ira (Item Response Theory), who direct the assessment, and whose ontology and views about the world give the assessment programme its distinctive character.

The Camel test item has given us a glimpse into the ‘black boxed’ components and inner workings of the assessment machine – test item development, assessment events, data collection and cleaning, and procedures for establishing test validity. We have seen how the Camel test item travelled globally and was transformed from a ‘living dinosaur’ into a cosmopolitan character, and a key player in the assessment programme. Tracing Ariadne’s thread of globalised assessment brings us to a strange assemblage in the desert that connects the local space with multiple places, ambitions, moments, resources, algorithms, documents and people. Out of a Land Cruiser in the Gobi Desert in 2010 appears an unlikely set of connected people and objects. They bring a set of test items that contain texts from other times and places – newspaper cuttings, menus, charts and tables, tide tables, advertisements. They have all been translated into Mongolian, but they look peculiar and foreign. ‘Mongolia’s Bactrian Camel’ test item is among them. What better candidate could there be to mediate in this unusual local-global encounter?
[Figure: Smoothed non-parametric graphs illustrating modified Mantel–Haenszel DIF analysis (MH-DIF) for ‘Mongolian Camel’ Items 1, 2 and 3. Each panel plots the proportion correct against the estimated total score (0–40) for herders and non-herders, with smoothing parameters between 0.3 and 0.6 as noted per panel. Source: UNESCO Institute of Statistics. With permission.]
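The ‘smoothing parameter’ noted on each panel controls how aggressively the raw proportion-correct points are averaged into a readable curve. As a rough illustration of the general idea, here is a minimal Nadaraya–Watson kernel smoother in Python; this is an assumption about the technique, not the specific non-parametric procedure used by the UNESCO Institute of Statistics, whose parameter likely refers to a lowess-style span rather than this Gaussian bandwidth.

    import numpy as np

    def smoothed_proportion_correct(scores, correct, grid, bandwidth=4.0):
        # Nadaraya-Watson estimate of P(item correct | total score).
        # scores: respondents' estimated total scores; correct: 0/1 item
        # responses; grid: score values at which to evaluate the curve;
        # bandwidth: larger values give smoother (less 'jagged') curves.
        scores = np.asarray(scores, dtype=float)
        correct = np.asarray(correct, dtype=float)
        curve = np.empty(len(grid))
        for i, x in enumerate(grid):
            weights = np.exp(-0.5 * ((scores - x) / bandwidth) ** 2)
            curve[i] = np.sum(weights * correct) / np.sum(weights)
        return curve

Plotting such a curve separately for herders and non-herders against the same score axis reproduces the kind of panel shown above: if the two curves coincide, the item behaves the same for both groups at any given ability level.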
NOTES
1 On the application of Latour’s Ariadne’s thread in literacy research see Kell (2011).
2 On non-human actors in social life and education see the extensive writing of Bruno Latour (cited above) and Actor-Network theorists. For secondary literature see Fenwick and Edwards (2010).
3 Quotation from the film The Story of the Weeping Camel (2003), directed by Luigi Falorni and Byambasuren Davaa.
4 The desire to build an International Item Bank was mentioned in the UIS ‘Translation and Adaptation Guidelines’ (2009).
5 The insights on adaptation are drawn from discussions with project staff and from the LAMP documents ‘Translation and Adaptation Procedures’ (2005) and the ‘Translation and Adaptation Guidelines’ (2009).
6 At the Marrakech workshop in 2005 IALS and ALL test items were classified as ‘International’ test items. The prototype national test items (provided by participating countries) were described as ‘LAMP’ test items (suitable for cross-country assessment) or as ‘National’ (country specific) test items that did not comply with the demand of the Framework for international use (International workshop on the implementation of LAMP, Marrakech, February 2005). It is likely that the test items were reviewed in accordance with the ‘International Guidelines for Fairness Review of Assessments’ (Educational Testing Service 2009).
7 This analysis is informed by the test item review form from the Marrakech workshop (7 December 2005).
8 Several participants at the Marrakech workshop voiced concerns about the preponderance of IALS and ALL (‘International’) test items compared with the ‘LAMP’ items (International workshop on the implementation of LAMP, Marrakech, February 2005).
9 The Moroccan team at the Marrakech workshop were ‘very vocal’ about the rejection of their test items and on the perceived unsuitability of IALS test items in their country context (source: Guadalupe, in this volume, and memo, 19 March 2006, from ETS to LAMP).
10 The whole set of 4000 answer booklets was burned. ‘It was destroyed … all of … at dump collecting site, one time. The institute kept some empty booklets for archive. Suddenly, couple of booklets for pilot survey were found and attached herein. Please note that there were not used at main survey’ (pers. comm., 29 March 2013).
11 Critics of IRT have argued that it is not a theory, but an approach or model. See for example Blum, Goldstein and Guérin-Pace (2001).
12 On Item Response Theory being ‘cool’, see the excellent series of YouTube presentations by Karon Cook at Northwestern University on ‘A Conceptual Introduction to Item Response Theory’.
13 On interactions between humans and non-humans, and the ontology of non-human actants, see Latour (1996), 239.
14 Thanks to Ron Hambleton for this insight. We should distinguish between the characteristics and strengths of IRT and the constructs and uses to which it is put. A weak construct is not necessarily a fault of IRT.
15 On DIF (Differential Item Functioning) see Zumbo (2007).
16 On the importance of considering the ‘testing situation’ in psychometric analysis see Zumbo (2007).
17 On logocentrism see Derrida (1976), and on the significance of written texts as sources of authority see, for example, Riles (2006).
REFERENCES
Blum, A., Goldstein, H. and Guérin-Pace, F. (2001). ‘International Adult Literacy Survey (IALS): An Analysis of International Comparisons of Adult Literacy’. Assessment in Education 8 (2), 225–46.
Callon, M. (1986). ‘Some Elements of a Sociology of Translation: The Domestication of Scallops and the Fishermen of St. Brieuc Bay’. In J. Law, ed., Power, Action and Belief: A New Sociology of Knowledge. London: Routledge and Kegan Paul, 196–233.
Derrida, J. (1976). Of Grammatology. London and Baltimore, MD: Johns Hopkins University Press.
Educational Testing Service (2009). International Principles for Fairness Review of Assessments: A Manual for Developing Locally Appropriate Fairness Review Guidelines in Various Countries. Princeton, NJ: Educational Testing Service.
Fenwick, T. and Edwards, R. (2010). Actor-Network Theory in Education. London and New York: Routledge.
Foucault, M. (2008). This is Not a Pipe. Quantum Books.
Guadalupe, C. and Cardoso, M. (2011). ‘Measuring the Continuum of Literacy Skills among Adults: Educational Testing and the LAMP Experience’. International Review of Education 57 (1–2), 199–217.
Hamilton, M. (2001). ‘Privileged Literacies: Policy, Institutional Process and the Life of the IALS’. Language and Education 15 (2–3), 178–96.
— (2012). Literacy and the Politics of Representation. London: Routledge.
Kell, C. (2011). ‘Inequalities and Crossings: Literacy and the Spaces-in-between’. International Journal of Educational Development 31 (6), 606–13.
Kirsch, I. (2001). ‘The International Adult Literacy Survey (IALS): Understanding What Was Measured’. ETS Research Report RR-01-25. Princeton, NJ: Educational Testing Service.
— (2003). ‘Measuring Literacy in IALS: A Construct-Centred Approach’. International Journal of Educational Research 39, 181–90.
Latour, B. (1987). Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.
— (1991). ‘Technology is Society made Durable’. In J. Law, ed., A Sociology of Monsters: Essays on Power, Technology and Domination. London: Routledge, 103–31.
— (1993). We Have Never Been Modern. Cambridge, MA: Harvard University Press.
— (1994). ‘Where Are the Missing Masses? The Sociology of a Few Mundane Artefacts’. In W. Bijker and J. Law, eds, Shaping Technology/Building Society. Cambridge, MA: MIT Press, 225–58.
— (1996). ‘On Interobjectivity’. Mind, Culture and Activity 3 (4), 228–45.
— (2005). Reassembling the Social: An Introduction to Actor-Network Theory. Oxford: Oxford University Press.
Latour, B. and Woolgar, S. (1979). Laboratory Life: The Construction of Scientific Facts. Princeton, NJ: Princeton University Press.
Law, J. (1991). ‘Introduction: Monsters, Machines and Societal Relations’. In J. Law, ed., A Sociology of Monsters. London: Routledge, 1–23.
Maddox, B. (2014). ‘Globalising Assessment: An Ethnography of Literacy Assessment, Camels and Fast Food in the Mongolian Gobi’. Comparative Education 50 (4), 474–89.
Riles, A., ed. (2006). Documents: Artifacts of Modern Knowledge. Ann Arbor: University of Michigan Press.
UNESCO Institute of Statistics (2009). ‘The Next Generation of Literacy Statistics: Implementing the Literacy Assessment and Monitoring Programme (LAMP)’. Technical Paper No. 1. Montreal: UIS.
Woolgar, S. (1991). ‘Configuring the User: The Case of Usability Trials’. In J. Law, ed., A Sociology of Monsters. London: Routledge, 57–102.
Zumbo, B.D. (2007). ‘Three Generations of Differential Item Functioning (DIF) Analysis: Considering Where it is Now, and Where it is Going’. Language Assessment Quarterly 4, 223–33.