Pearson Global Scale of English

INTRODUCING THE GLOBAL SCALE OF ENGLISH

Executive summary

The Global Scale of English (GSE) offers a numeric transformation of the levels defined in the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). It ranges from 10 to 90 and provides an instrument for more granular and detailed measurement of learners' level and progress than is possible with the CEFR itself, with its limited number of wide levels. In the spirit of the CEFR, GSE values may be attributed to component or enabling skills (e.g. listening, spoken interaction, vocabulary range, phonological control) as well as being combined to indicate an overall level. The intention is to encourage the profiling of learners, rather than merely classifying them by level.

The GSE is intended to serve:

a) as the metric underlying all Pearson English learning, teaching and assessment products
b) as a benchmark for English achievement which can be related to other tests, to national examinations, and to instructional materials
c) as a benchmark of the level of English needed for a range of professional and academic purposes.
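As an illustration of profiling rather than level-labelling, a learner's standing can be recorded as one GSE value per enabling skill. The skill names in the Python sketch below are the four mentioned above; the numbers, and the use of a simple mean for the overall figure, are invented for the example (the document does not say how Pearson combines skill values):

    # One GSE value per enabling skill, rather than a single level label.
    profile = {
        "listening": 52,
        "spoken interaction": 45,
        "vocabulary range": 58,
        "phonological control": 40,
    }

    # A single overall figure can still be derived when one is needed;
    # a plain average is used here purely for illustration.
    overall = sum(profile.values()) / len(profile)
    print(f"Overall GSE (illustrative mean): {overall:.0f}")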

Origins: GSE and PTE Academic

The Global Scale of English has its origins in the reporting scale developed for PTE Academic, a high-stakes academic English test for students wishing to study at institutions where English is the language of instruction. The scale was designed to report test takers' results as an integer from 10 to 90 as well as locating them in a CEFR level. In the process of aligning PTE Academic scores to the CEFR, the reporting scale became in effect a generic linear transformation of the set of CEFR levels. The alignment was accomplished in four phases, following the procedure suggested by the Council of Europe for building a case to relate test scores to CEFR levels (Council of Europe, 2009; Martyniuk, 2011):

1. Familiarisation. All individuals involved in the development of the test received intensive prior training in understanding and applying the CEFR.

2. Specification. The selection and definition of language tasks to be included was closely linked to the CEFR, using a checklist to determine for each criterion mentioned in the CEFR publication (a) whether it was applicable and (b) whether it had been applied.

3. Standardisation. Raters and reviewers underwent rigorous training and had to pass exams evaluating their consistency and their agreement with pre-established standards.

4. Empirical validation. This has two aspects: (a) the test itself should meet requirements for reliability and validity as a measurement instrument, and (b) the relation with the CEFR must be supported by statistical data. Reliability was assessed by a number of means, e.g. by computing test takers' ability estimates from separate calibrations of all odd and all even items: the high correlation (0.90) found between the scores based on these test halves suggested a reliability of 0.95 for the as yet unselected field tests. Validity was likewise determined by several means, including comparing scores obtained by native and non-native speakers of various age groups. Finally, the statistical link between test scores and CEFR levels was established by correlating the results of test taker-centred and item-centred approaches: both test taker responses and items were assessed by teams of expert raters, and the resulting ratings were Rasch analysed.¹
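The step from the 0.90 split-half correlation to the 0.95 reliability figure is not spelled out above; it is consistent with the standard Spearman-Brown correction for doubling test length (an inference, not a statement from the original):

    reliability ≈ 2r / (1 + r) = (2 × 0.90) / (1 + 0.90) ≈ 0.95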

¹ For further information see research papers on: http://pearsonpte.com/research/Pages/ResearchSummaries.aspx


Once this task had been completed and the test launched, it quickly became evident that the scale would have useful applications beyond the scoring of PTE Academic. Additional research into the lower levels was conducted, and a project was initiated to offer the scale to the field as an instrument for measuring student progress and proficiency, alternative and complementary to the CEFR. As such, it became known as the Global Scale of English (GSE).

GSE and CEFR

Each level of the CEFR corresponds to a range of values on the GSE, which extends from Tourist (below A1) to C2 level, or from 10 to 90. Measuring below 10 or above 90 is not meaningful: below 10, learners may know just a few isolated words but are unable to use the language for communication, while above 90 virtually any communication is bound to be successful. The relation between the GSE and the CEFR is summarised in the diagram below.

[Diagram: GSE values from 10 to 90 aligned with the CEFR levels from below A1 to C2]

The CEFR levels are not equal in width: A2, B1 and B2 are about twice as wide as A1 and C1. This corresponds to the inequalities in width between the levels observed in our item response theory analysis for PTE Academic, as well as in Brian North's original research (North, 2000).² Within the wider levels the CEFR distinguishes 'plus levels', as shown in the diagram above. These indicate a higher probability (≥ 75%) that tasks at these levels will be performed successfully.³ The following table is based on the calibration and scaling of PTE Academic and other Pearson tests of English.

CEFR   Delta range (logits)   GSE score range
A1     -1.366 to -1.154       22 - 29
A2     -1.155 to -0.495       30 - 42
B1     -0.496 to 0.273        43 - 58
B2     0.274 to 1.104         59 - 75
C1     1.105 to 1.553         76 - 84
C2     > 1.554                > 85
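The delta values are Rasch difficulties expressed in logits (see footnote 2). Assuming the standard dichotomous Rasch model, the probability that a learner of ability θ succeeds on a task of difficulty δ is

    P(success) = e^(θ − δ) / (1 + e^(θ − δ))

so the 50% mastery criterion of footnote 3 corresponds to θ = δ, and the ≥ 75% criterion for the plus levels corresponds to θ − δ ≥ ln 3 ≈ 1.1 logits. On this basis the table lends itself to a simple score-to-level lookup. The Python sketch below is illustrative only: the band edges are taken from the table, the C2 band is read as 85-90 because the scale tops out at 90, and scores of 10-21 are mapped to the below-A1 Tourist range.

    import math

    # GSE score bands per CEFR level, taken from the table above.
    # The table gives C2 as "> 85"; since the scale runs to 90, the band
    # is read here as 85-90 (an assumption, not an official statement).
    GSE_BANDS = [
        ("A1", 22, 29),
        ("A2", 30, 42),
        ("B1", 43, 58),
        ("B2", 59, 75),
        ("C1", 76, 84),
        ("C2", 85, 90),
    ]

    def cefr_level(gse_score: int) -> str:
        """Return the CEFR level corresponding to a GSE score (10-90)."""
        if not 10 <= gse_score <= 90:
            raise ValueError("GSE scores run from 10 to 90")
        for level, low, high in GSE_BANDS:
            if low <= gse_score <= high:
                return level
        return "below A1 (Tourist)"  # scores 10-21

    def rasch_success_probability(theta: float, delta: float) -> float:
        """P(success) under the dichotomous Rasch model, in logits."""
        return 1.0 / (1.0 + math.exp(delta - theta))

    print(cefr_level(67))                                 # -> B2
    print(round(rasch_success_probability(1.1, 0.0), 2))  # -> 0.75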

GSE and Pearson Syllabuses

As part of the GSE project, Pearson is creating an associated set of Pearson Syllabuses, initially for General, Professional, Academic and Young Learners (to be published in 2013). These will help to link instructional content with assessments and to create a reference for authoring, instruction and testing. In the process of creating these syllabuses, Pearson experts have continued and extended the work of the Council of Europe by adding to the descriptor set and scaling the new descriptors to CEFR levels and to GSE values, based on ratings by groups of in-house experts and independent teachers. It was found necessary to extend the descriptor set for three main reasons:

² The logit values are based on the descriptor ratings published in Brian North (2000).
³ By definition, a learner is at a given level if he or she can successfully perform at least 50% of the tasks at that level.


i) In the original CEFR, descriptors are unequally distributed across the skills, with many more for speaking than for any of the other skills.
ii) Descriptors are also unequally distributed across levels, with too few at A1, C1 and C2.
iii) Some key descriptors are lacking, relating to standard communicative tasks that are universally taught and that are included, for example, in the Threshold Level specifications.

The starting point was the set of Common European Framework descriptors in Brian North's dissertation (North, 2000), which contains the same descriptors as the Council of Europe publication (2001) plus the difficulty ratings computed from the judgements of panels of teachers. Pearson 'sourced' new descriptors (a) by identifying common syllabus items that seemed to be missing from the CEFR and (b) by looking at descriptors and syllabus specifications widely used in the USA. These descriptors were rewritten following the format of CEFR descriptors, and were then rated on the GSE by selected in-house experts: publishers and experienced development editors from 11 different Pearson publishing centres around the world. Following the protocol established for the Common European Framework (Council of Europe, 2001), this process fell into four stages:

i) familiarising the teams with the GSE and its relation to the CEFR
ii) guided hands-on practice at rating descriptors of known difficulty
iii) assessment by trained individuals of randomly assorted groups of descriptors, including anchor items from the original CEFR document
iv) tabulation and Rasch analysis of the results.

A second phase of analysis was carried out using Survey Monkey, in which the same randomised groups of descriptors were rated online according to CEFR levels by several hundred teachers, self-described as "very familiar with the CEFR", from over 50 different countries. These results were also Rasch analysed and compared with those from the in-house experts. Once a very small proportion of outliers had been removed, the degree of correspondence was very high: a correlation of 0.96 (see the illustrative sketch below).

Besides the GSE-rated descriptors (new and pre-existing) already referred to, the Pearson Syllabuses also contain grammar and vocabulary inventories. These are expressed in the form of can-do statements with suggested sample exponents, rather than as the prescriptive lists found in more traditional syllabuses. In addition, they include guidelines, also in the form of can-do statements, to enable less experienced teachers to assess their students' performance in speaking and writing in terms of what is to be expected at given GSE scale values.⁴
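The second-phase comparison just described (Rasch-analysing two sets of ratings, removing a small number of outliers, then correlating the results) can be sketched as follows. The trimming rule and the numbers are hypothetical, since the document does not specify the outlier criterion actually used; statistics.correlation requires Python 3.10+.

    from statistics import correlation, mean, stdev

    def trim_outliers(xs, ys, z_max=3.0):
        """Drop descriptor pairs whose rating difference lies more than
        z_max standard deviations from the mean difference (a hypothetical
        rule, for illustration only)."""
        diffs = [x - y for x, y in zip(xs, ys)]
        m, s = mean(diffs), stdev(diffs)
        kept = [(x, y) for (x, y), d in zip(zip(xs, ys), diffs)
                if abs(d - m) <= z_max * s]
        return [x for x, _ in kept], [y for _, y in kept]

    # xs: difficulty estimates from in-house experts;
    # ys: estimates from the online teacher survey (invented values)
    xs = [21.0, 34.5, 43.2, 55.0, 61.8, 70.1, 83.0]
    ys = [22.5, 33.0, 44.1, 54.2, 63.0, 69.5, 82.2]
    x_kept, y_kept = trim_outliers(xs, ys)
    print(round(correlation(x_kept, y_kept), 2))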

Summary: scope and purpose

The GSE is intended to answer commonly asked questions, and to resolve the related alignment issues, in seven main areas:

1. How good is my English? (Assessing proficiency)
2. What is the learning goal I want to achieve? (Defining target levels)
3. What do I need to learn to get there? (Diagnosing learning needs)
4. How do different courses compare? (Alignment to standards across different providers)
5. What level course should I study? (Placement)
6. How can I measure my progress? (Demonstrating learning outcomes)
7. How will people recognise how good my English is? (Certification aligned to recognised standards)

⁴ See Pearson General Syllabus of English (in preparation).


No currently available reference scale or set of benchmarks resolves all these issues. CEFR levels are too wide to provide the granular differentiation needed in language learning, teaching and assessment, and the definition of the levels is open to interpretation and misinterpretation. As the GSE becomes embedded in Pearson's approach to English learning, teaching and assessment, the company is offering a whole range of syllabuses, learning materials and assessment tools to cater for as wide a range of requirements as possible.

References

Council of Europe (2001). The Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.

Council of Europe (2009). A Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Strasbourg: Council of Europe.

Martyniuk, Waldemar (ed.) (2011). Aligning Tests with the CEFR: Reflections on Using the Council of Europe's Draft Manual. Studies in Language Testing. Cambridge: Cambridge University Press.

North, Brian (2000). The Development of a Common Framework Scale of Language Proficiency (PhD thesis). New York, NY: Peter Lang.

