IDL - International Digital Library Of Management & Research Volume 1, Issue 3, Mar 2017
Available at: www.dbpublications.org
International e-Journal For Management And Research-2017
Concurrent Validity and Stability of the Maze Task in a Sample of College Students
James M. Kuterbach
Dept. of Human Development and Family Studies, Penn State DuBois, DuBois, PA, USA
jmk110@psu.edu
Abstract:
This paper examines the concurrent validity and test-retest reliability of the maze task in a sample of college students. The maze task is a form of curriculum-based measurement that is typically used to assess reading comprehension in elementary and secondary students but has not been used with college students. Three versions of the maze task (one-, two-, and three-minute probes) were created and compared with students' scores on the Nelson-Denny Reading Test (NDRT), students' self-reported GPA, and scores on the SAT-Reading, -Math, and -Writing tests. The one-minute probe was found to have the best psychometric properties, with high correlations with the NDRT, GPA, and SAT-Reading, as well as divergent validity with the SAT-Writing test. Implications for use with college students are discussed.
Keywords: Reading assessment, maze task, college students, learning disabilities
1. INTRODUCTION
With the large influx of students with a learning disability into colleges and universities, administrators and evaluators need tools that will aid them in determining the need for special accommodations for student populations. While current assessments do a good job, they are generally lengthy and may be a deterrent to students seeking out assistance. What is needed is a brief screening tool that is both a valid and reliable measure of a student’s academic skills. The purpose of this study is to evaluate the reliability and validity of the maze task, a commonly used reading
comprehension curriculum-based assessment tool, in a sample of college students.
Students with a learning disability make up the disability category with the largest proportion of students continuing on to postsecondary education (Horn, Berktold, & Bobbitt, 1999). They are also the fastest-growing population of college students with a disability (Henderson, 2001), with more than twenty-seven percent of high school students diagnosed with a specific learning disability continuing on to postsecondary education (Wagner, Newman, Cameto, Garza, & Levine, 2005). This increase in students with a learning disability at the postsecondary level is creating new challenges for educators, evaluators, and administrators. College disability services administrators report that students with a learning disability tend to want the same accommodations at the postsecondary level that they received in high school (e.g., oral essay exams, no foreign language requirement, extended time on tests), even when the documentation of their disability may be shaky at best (McGuire, 2000). One study reports that the overall number of requests for specific accommodations rose by more than 160% between 2000 and 2001 (Ofiesh, Mather, & Russell, 2005).
The required type and comprehensiveness of assessments at the secondary and postsecondary levels, as well as the documentation needed to demonstrate a history of services, vary greatly from institution to institution (Gregg, Coleman, Davis, Lindstrom, & Hartwig, 2006). Some students are required to produce documentation on
their own, while others are required to complete new evaluations. Because a comprehensive evaluation can last six to eight hours, some students find the process daunting (Canter, 2004).
The largest segment of students with a learning disability is those with a reading disorder (Lorry, 2000). Both typically achieving students and students with a learning disability are often overwhelmed by the amount of reading required at the college level (Du Boulay, 1999). Even though reading ability has been found to be a significant predictor of college freshman grades (Wood, 1982), reading is generally not directly assessed beyond the secondary level. Instead, the results of reading are assessed indirectly throughout a student's college career (Du Boulay, 1999). Lack of reading skills, even for students without a diagnosed disability, is one of the biggest problems in postsecondary education, but reading problems at this level are generally not identified until they manifest themselves in the classroom (Du Boulay, 1999). Reading disabilities are also the most likely of all learning disabilities to serve as the basis for an accommodations claim in higher education (Lorry, 2000). Educators and administrators working with postsecondary students with reading problems need a quick, easy method to assess a student's reading ability.
A test currently in use in colleges around the country is the Nelson-Denny Reading Test (NDRT; Brown, Fishco, & Hanna, 1993). The NDRT is a widely used test of reading that includes a vocabulary section and a comprehension section, as well as a reading fluency measure. The NDRT has been used with college students in research on reading comprehension (Nicaise & Gettinger, 1995; Bell & Perfetti, 1994; Onwuegbuzie & Collins, 2002), as well as for diagnostic and decision-making purposes (Norman, Kemper, & Kynette, 1992).
The NDRT has been found to be a good predictor of college freshman grades (Wood, 1982) and has been used as a criterion measure for other tests of reading (Hannon & Daneman, 2006; Wood, Nemeth, & Brooks, 1985), memory, and cognitive processes (Carver, 1992; Davis, Bardos, & Woodward, 2006; Millis, Magliano, & Todaro, 2006). However, the NDRT takes a total of 35 minutes to administer, with the Comprehension section alone taking 20 minutes.
An alternative to norm-based testing is curriculum-based measurement (CBM). Curriculum-based measurement has been defined by Deno (1987) as “any set of measurement procedures that use direct observation and recording of a student’s performance … as a basis for gathering information to make instructional decisions” (in Marston, 1989, p. 62). Curriculum-based measurement has a long history in the research literature (Shapiro, Keller, Lutz, Santoro, & Hintze, 2006). Test-retest reliability coefficients for reading CBM probes range from .82 to .96, with interrater reliability of .99 and reliability coefficients for parallel forms ranging from .84 to .96 (Marston, 1989). Validity studies for CBM in reading have found correlation coefficients ranging from .73 to .91, with most coefficients above .80 (Marston, 1989).
One example of a reading CBM is the maze task. The maze task is a multiple-choice cloze task that students complete while silently reading a short passage rated at their grade level (Fuchs & Fuchs, 1992). The first sentence of the passage is left unchanged, but thereafter every seventh word is replaced with a three-word forced choice in parentheses. Two of the words are distractors and one is the correct word for the sentence. The number of correct words chosen during the testing time is the student’s score. The maze task is used to determine a student’s functioning level, but it can also be used as a progress-monitoring tool because it can produce multiple data points with which to chart growth across the school year. Another advantage of the maze task is that it can be given to several students at the same time (Madelaine & Wheldall, 2004).
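The construction and scoring rules just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names, the `(a / b / c)` display format, the fixed random seed, and the idea of drawing distractors at random from a supplied word pool are all hypothetical, since the text does not say how distractors are chosen.

```python
import random
import re


def build_maze_probe(passage, distractor_pool, step=7, seed=0):
    """Build a maze probe and return (probe_text, answer_key).

    Assumption: distractors are drawn at random from `distractor_pool`;
    the source does not specify how distractors are selected.
    """
    rng = random.Random(seed)  # fixed seed so the same probe can be regenerated
    # Leave the first sentence intact: split once after the first ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", passage.strip(), maxsplit=1)
    first_sentence = parts[0]
    remaining_words = parts[1].split() if len(parts) > 1 else []

    probe_words, answer_key = [], []
    for position, word in enumerate(remaining_words, start=1):
        core = word.rstrip(".,;:!?")  # the word without trailing punctuation
        if position % step != 0 or not core:
            probe_words.append(word)
            continue
        tail = word[len(core):]  # punctuation to re-attach after the choice
        candidates = [d for d in distractor_pool if d.lower() != core.lower()]
        choices = [core] + rng.sample(candidates, 2)  # one correct, two distractors
        rng.shuffle(choices)
        probe_words.append("(" + " / ".join(choices) + ")" + tail)
        answer_key.append(core)
    return " ".join([first_sentence] + probe_words), answer_key


def score_probe(answer_key, responses):
    """Score = number of correct words chosen among the items attempted."""
    return sum(1 for key, resp in zip(answer_key, responses) if key == resp)


if __name__ == "__main__":
    text = ("Reading is central to college coursework. Students who read slowly "
            "often fall behind because every course assigns long chapters each week.")
    probe, key = build_maze_probe(text, ["table", "green", "quickly", "seven"])
    print(probe)
    print(key)
```

Because `zip` stops at the shorter list, items the student did not reach before time ran out simply contribute nothing to the score, which matches the timed administration described above.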
Timed tests have been used to distinguish college students with a learning disability from those who are typically achieving (Ofiesh, Mather, & Russell, 2005), and another study found similar results with adults in a clinic setting (Lesaux, Pearson, & Siegel, 2006). Lack of reading skills is one of the biggest problems in postsecondary education, but reading problems at the postsecondary level are generally not identified until they manifest themselves in the classroom (Du Boulay, 1999). Students are often overwhelmed by the amount of reading required at the college level (Du Boulay, 1999). Some researchers consider reading the most important skill in college (Onwuegbuzie & Collins, 2002).
2. OBJECTIVES
The purpose of this study is to evaluate the reliability and validity of the maze task, a commonly used reading comprehension curriculum-based assessment tool, in a sample of college students. The maze task is a quick, easy reading comprehension test that can be used both for determining a student’s functioning level and for progress monitoring. While the maze task is commonly used in both primary and secondary schools, it has not been used with college students. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) dictate that when a test is used in a way for which it has not been validated, new evidence should be collected to justify the new use. Consequently, this study investigates the concurrent validity of the maze task in a college sample.
3. METHODOLOGY
Participants
Participants were 141 undergraduate college students who were enrolled in two upper-level human development classes. There were 16 male participants and 125 female participants, between the ages of 18 and 50, with a median age of 21. Of those participants reporting a race, 78 percent were white, 7.8 percent were African-American, 2.1 percent were Asian-American, 2.1 percent were Latino, and 3.7 percent indicated they were multi-racial. Participants’ semester standing ranged from first semester to ninth semester, with a median standing of seventh semester. Seventy-seven percent of participants were Human Development and Family Studies majors, with the remaining participants majoring in other social sciences, including Psychology and Communication Sciences and Disorders. Four participants (2.8 percent) reported having received special education services at some time during their education.
Materials
The Nelson-Denny Reading Test. The Nelson-Denny Reading Test Form G (NDRT; Brown, Fishco, & Hanna, 1993) is a group-administered standardized reading test that assesses a student’s vocabulary, reading comprehension, and reading rate. The NDRT is a commonly used measure of silent reading comprehension (Cirino, Israelian, Morris, & Morris, 2005), with reported test-retest reliability coefficients ranging from .76 to .81 (Brown et al., 1993), and has established correlations between .60 and .69 with other measures of reading (Murphy, 1995). The NDRT Reading Comprehension section consists of seven reading passages and 38 multiple-choice comprehension questions. The administration time limit for the NDRT Reading Comprehension section is 20 minutes (Brown et al., 1993).
The maze task. The maze task is a commonly used curriculum-based measurement (CBM) for assessing student reading comprehension at the elementary and secondary levels of education. Fuchs and Fuchs (1992) reported correlations between the maze task and the Reading Comprehension subtest of the Stanford Achievement Test of .77 in elementary students, while Espin and Foegen (1996) report correlations between the maze task and three different measures of
comprehension to be between .56 and .62 in secondary students. The creation and administration of the maze task followed the procedure described by Fuchs and Fuchs (1992). The maze task requires the participant to read a passage that has been prepared so that, following the first sentence, every seventh word is replaced with a forced choice among three possible words (Espin et al., 2001). This study used three different passages to create probes of one, two, and three minutes in length. The passages were taken from a textbook used in a survey of Human Development course (Dacey & Travers, 2004), and each had a Flesch-Kincaid Grade Level of 12.0.
Procedure
Participants completed a questionnaire that included demographic information, educational experiences, and recollection of grade point average (GPA) and scores on the Scholastic Aptitude Test (SAT). Both the maze task and the NDRT were administered in a group setting. Participants were given the maze task probes first and were timed by the investigator for one minute, two minutes, and three minutes, respectively. The NDRT Reading Comprehension section was then administered with a 20-minute time limit. Approximately three weeks later, students from one class (68 participants) completed a retest involving the three maze task probes.
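For readers unfamiliar with the readability index reported for the passages under Materials, the standard Flesch-Kincaid Grade Level formula is shown below. The text does not say which tool computed the value of 12.0, so this is the conventional definition rather than a description of the author's procedure.

```latex
\[
\text{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}}
            + 11.8\,\frac{\text{total syllables}}{\text{total words}}
            - 15.59
\]
```

A score of 12.0 corresponds roughly to text readable at the end of secondary school under this index.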
Analysis
Pearson product-moment correlations were computed between the three maze task probes and the NDRT. In addition, the maze task probes were correlated with participants’ recollection of their GPA and SAT scores. Finally, test-retest reliability was evaluated by correlating the one-minute, two-minute, and three-minute maze task probes from the initial administration with those from the retest administration. The significance level was initially set at α = .05; however, the Bonferroni correction for multiple comparisons was employed.
4. OUTCOMES
Independent-samples t-tests showed no significant difference between male and female participants in their scores on the NDRT or on the one-minute, two-minute, or three-minute maze task probes. Correlations above .50 are considered large (Cohen, 1988). Table 1 displays the correlations between the maze task probes and the criterion measures. All correlations between the maze task probes and the NDRT were significant at the α = .003 level, with the strongest correlation found between the one-minute probe and the NDRT (r = .606). In addition, both the one-minute and two-minute probes correlated moderately with participants’ recollection of their GPA (r = .390 and .341, respectively). Finally, only the one-minute maze task probe correlated significantly with participants’ recollection of their SAT reading score (r = .502). None of the maze task probes correlated significantly with participants’ recollection of their SAT mathematics or SAT writing scores.
Test-retest reliabilities can be found in Table 2. All three maze task probes had good test-retest reliability. The one-minute probe showed the highest reliability, with a correlation of .952, which is suitable reliability for making diagnostic decisions (Salvia & Ysseldyke, 2003).
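The two computations that underlie the analysis, a Pearson product-moment correlation and a Bonferroni-adjusted significance threshold, can be sketched as follows. The function names and the example scores are hypothetical and are not the study's data. The count of 15 comparisons (three probes by five criterion measures) is an assumption; the paper does not state the number, although .05 / 15 is consistent with the reported α = .003.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold after a Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Hypothetical one-minute maze scores at test and retest (not study data).
    test_scores = [12, 15, 9, 20, 17, 11]
    retest_scores = [13, 16, 10, 19, 18, 12]
    print(round(pearson_r(test_scores, retest_scores), 3))
    # Assumed: 3 probes x 5 criterion measures = 15 comparisons.
    print(round(bonferroni_alpha(0.05, 15), 4))
```

The same `pearson_r` function serves both the concurrent-validity correlations (probe against criterion measure) and the test-retest reliabilities (probe against its own retest), since both are plain correlations between paired scores.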
CONCLUSION
This research demonstrates that the maze task has the psychometric properties necessary to be used to determine reading comprehension ability in college students. This study examined one-minute, two-minute, and three-minute probes and found that the one-minute probe had the best psychometric properties, with the highest test-retest reliability; the highest correlations with the NDRT, self-reported GPA, and SAT-Reading scores; and good divergent validity with SAT-Writing scores. While the divergent validity between the maze task and the SAT-Math score was not as robust, the correlation would not be described as high (Cohen, 1988). These results support the idea that the maze task could be used for determining reading comprehension levels in college students, as well as for progress monitoring of college students with a reading disability.
REFERENCES
Bell, L. C., & Perfetti, C. A. (1994). Reading skill: Some adult comparisons. Journal of Educational Psychology, 86, 244-255.
Brown, J. I., Fishco, V. V., & Hanna, G. (1993). Nelson-Denny Reading Test: Manual for scoring and interpretation. Chicago, IL: Riverside.
Canter, A. (2004). A problem-solving model for improving student achievement. Principal Leadership Magazine, 5. Retrieved from http://www.naspcenter.org/principals/nassp_probsolve.html
Carver, R. P. (1992). Reliability and validity of the speed of thinking test. Educational and Psychological Measurement, 52, 125-134.
Cirino, P. T., Israelian, M. K., Morris, M. K., & Morris, R. D. (2005). Evaluation of the double-deficit hypothesis in college students referred for learning difficulties. Journal of Learning Disabilities, 38, 29-44.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davis, A. S., Bardos, A. N., & Woodward, K. M. (2006). Concurrent validity of the general ability measure for adults (GAMA) with sudden-onset neurological impairment. International Journal of Neuroscience, 116, 1215-1221.
Du Boulay, D. (1999). Argument in reading: What does it involve and how can students become better critical readers? Teaching in Higher Education, 4, 147-162.
Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring student reading progress. School Psychology Review, 21, 45-58.
Gregg, N., Coleman, C., Davis, M., Lindstrom, W., & Hartwig, J. (2006). Critical issues for the diagnosis of learning disabilities in the adult population. Psychology in the Schools, 43, 889-899.
Hannon, B., & Daneman, M. (2006). What do tests of reading comprehension ability such as the VSAT really measure?: A componential analysis. In A. V. Mittel (Ed.), Focus on educational psychology (pp. 105-146). Hauppauge, NY: Nova Science Publishers.
Henderson, C. (2001). College freshman with disabilities: A statistical profile. HEATH Resource Center, American Council on Education, U.S. Department of Education.
Horn, L., Berktold, J., & Bobbitt, L. (1999). Students with disabilities in postsecondary education: A profile of preparation, participation, and outcomes. Retrieved from http://nces.ed.gov/pubs99/1999187.pdf
Lesaux, N. K., Pearson, M. R., & Siegel, L. S. (2006). The effects of timed and untimed testing conditions on the reading comprehension performance of adults with reading disabilities. Reading and Writing, 19, 21-48.
Lorry, B. J. (2000). Language-based learning disabilities. In M. Gordon & S. Keiser (Eds.), Accommodations in higher education under the Americans with Disabilities Act (ADA): A no-nonsense guide for clinicians, educators, administrators, and lawyers (pp. 20-45). New York, NY: The Guilford Press.
Madelaine, A., & Wheldall, K. (2004). Curriculum-based measurement of reading: Recent advances. International Journal of Disability, Development and Education, 51, 57-82.
Marston, D. B. (1989). A curriculum-based measurement approach to assessing academic performance: What is it and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18-78). New York, NY: The Guilford Press.
McGuire, J. (2000). Educational accommodations: A university administrator’s view. In M. Gordon & S. Keiser (Eds.), Accommodations in higher education under the Americans with Disabilities Act (ADA): A no-nonsense guide for clinicians, educators, administrators, and lawyers (pp. 20-45). New York, NY: The Guilford Press.
Millis, K., Magliano, J., & Todaro, S. (2006). Measuring discourse-level processes with verbal protocols and latent semantic analysis. Scientific Studies of Reading, 10, 225-240.
Murphy, S. (1995). An analysis of the construct and predictive validity of the CPT-R and Nelson Denny tests. Unpublished manuscript, Rose State College, Midwest City, Oklahoma.
Nicaise, M., & Gettinger, M. (1995). Fostering reading comprehension in college students. Reading Psychology, 16, 283-337.
Norman, S., Kemper, S., & Kynette, D. (1992). Adults’ reading comprehension: Effects of syntactic complexity and working memory. Journal of Gerontology, 47, 258-265.
Ofiesh, N., Mather, N., & Russell, A. (2005). Using speeded cognitive, reading, and academic measures to determine the need for extended test time among university students with learning disabilities. Journal of Psychoeducational Assessment, 23, 35-52.
Onwuegbuzie, A. J., & Collins, K. M. (2002). Reading comprehension among graduate students. Psychological Reports, 90, 879-882.
Salvia, J., & Ysseldyke, J. E. (2003). Assessment in special and inclusive education (9th ed.). Boston, MA: Houghton Mifflin Company.
Shapiro, E. S., Keller, M. A., Lutz, J. G., Santoro, L. E., & Hintze, J. M. (2006). Curriculum-based measures and performance on state assessment and standardized tests: Reading and math performance in Pennsylvania. Journal of Psychoeducational Assessment, 24, 19-35.
Wagner, M., Newman, L., Cameto, R., Garza, N., & Levine, P. (2005). After high school: A first look at the postschool experiences of youth with disabilities. A report from the National Longitudinal Transition Study-2. Menlo Park, CA: SRI International.
Wood, P. H. (1982). The Nelson-Denny Reading Test as a predictor of college freshman grades. Educational and Psychological Measurement, 42, 575-583.
Wood, P. H., Nemeth, J. S., & Brooks, C. C. (1985). Criterion-related validity of the Degrees of Reading Power Test (Form CP-1A). Educational and Psychological Measurement, 45, 965-969.
Copyright@IDL-2017