The Normal Curve
Did I quote Mike Tyson yet? No book worth a couple hundred dollars is complete without a quote from Mike Tyson. Let’s see… here is a good one… “You better understands what a normal curve be or I wull wip out you heart and eat your children.” You got that? Well then, you better get this. The normal curve is a theoretical distribution of scores based on probability. The basis for most statistical methods is the assumption of a normal distribution. The normal curve is bell shaped and symmetrical (see Figure 2-5) and based on an infinite number of cases. Statistical methods that assume the characteristics of the normal probability curve are called parametric methods. When data cannot be assumed to conform to a normal distribution, nonparametric statistical techniques are used. We will talk about this later in the book.
Figure 2-5 The normal curve
Many human traits have a normal distribution because they occur according to the laws of chance. Heredity, environment, social conditions, and other factors produce these traits. Actually, very few distributions are perfectly normal. However, if the number of cases in a distribution is sufficiently large, many of the traits measured occur in a frequency that follows the shape of the normal curve quite closely. The concept of what is normal, or average, is based on this phenomenon. For example, in a population of 10-year-old boys of a particular race and location, some boys will be very short, some very tall, and most about the same size. The same is true for strength, speed, neuromuscular skills, mental abilities, and other traits. In many instances a set of scores will not have the normal curve shape. Because of the laws of chance, some samples can be expected to have a disproportionately large number of high scores or low scores (or both). When the scores are clustered at one or the other end of the distribution, the curve is said to be skewed.
Figure 2-6 illustrates positive skewness, in which the scores are clustered at the lower end of the scale. In Figure 2-7, the scores are predominantly at the upper end of the distribution, and thus the curve is negatively skewed. If the tail of the curve is to the right, the skew is positive; if the tail points to the left, it is negative. In Figure 2-6 the performance of 10-year-old boys in pull-ups forms a positively skewed curve, since most boys that age cannot do many pull-ups (30% cannot do any and 55% cannot do more than one). In fact, pull-up performance tends to be positively skewed (scores cluster at the lower end of the scale) at any age. However, Figure 2-7 also represents pull-up performance but with a select population (competitive gymnasts) who typically possess tremendous arm and shoulder strength relative to their body weight. Consequently, their scores cluster at the upper end of the scale, and the distribution is said to be negatively skewed.
Figure 2-7 Distribution negatively skewed: pull-up performance of competitive gymnasts (horizontal axis: number of pull-ups; vertical axis: frequency)
It should be emphasized that an actual set of scores will not graph exactly into a normal curve, since the normal curve is a theoretical distribution. But if your data involve a large number of subjects, they will usually form a distribution that approximates the normal curve. Therefore, in working with groups of scores, knowledge of the normal curve and its characteristics will help you to make predictions and interpretations regarding your data. The total area under the normal curve represents 100% of the scores or frequencies. The mean and mode coincide; hence, the mean is also the most frequent score (highest point on the curve). Furthermore, the mean and median coincide, so 50% of the scores (or 50% of the area) fall above the mean and 50% below it.
(See Figure 2-5.)
How the normal curve can be used to approximate the distribution of a large set of data can be illustrated by the following example. Suppose the Physical Fitness Index (PFI) is administered to 200,000 high school students in a large metropolitan area. The mean is expected to be 100 and the standard deviation, 10. Since N = 200,000, we expect the distribution of scores on the PFI to approximate a normal curve. So we expect 100,000 scores (50% of 200,000) to be greater than the mean score, 100, and 100,000 scores to be less than 100.
Figure 2-6 Distribution positively skewed: pull-up performance of 10-year-old boys (horizontal axis: number of pull-ups; vertical axis: frequency)
It is known that, for a normal curve, 34.13% of the area under the curve falls between the mean (M) and the score 1 standard deviation above the mean (M + 1SD). The area between the mean plus one standard deviation and the mean plus two standard deviations (M + 1SD and M + 2SD) is 13.59%. Since we are using a normal curve to approximate the distribution of PFI scores, we expect about 68,260 scores (34.13% of 200,000) to fall between the mean and the score 1 standard deviation above the mean. In this case, the mean equals 100 and the standard deviation equals 10; consequently, one standard deviation above the mean equals 110 (M = 100 and SD = 10, so M + 1SD = 110). We expect about 27,180 scores (13.59% of 200,000) to fall between 110 (M + 1SD) and 120 (M + 2SD). See Figure 2-8 for an illustration.
Figure 2-8 PFI scores for 200,000 high school students
Since we know that 50 percent of the area under the curve is found to the right of the mean, and 47.72% (34.13% + 13.59%) of the area under the curve is found between the mean and the mean plus two standard deviations, by subtraction we find that 2.28% of the area under the normal curve is located more than two standard deviations above the mean. This means that only about 2.28% of the frequencies of a large set of data will be more than two standard deviations above the mean. Furthermore, of this 2.28%, 2.14% is located between the mean plus two standard deviations and the mean plus three standard deviations (M + 2SD and M + 3SD), so that only about .14% (2.28% - 2.14% = .14%) of the scores in a distribution are more than three standard deviations above the mean. Therefore, for the example using PFI scores, we expect about 4,280 scores (2.14% of 200,000) to fall between 120 (M + 2SD) and 130 (M + 3SD) and only about 280 scores (.14% of 200,000) to be larger than 130 (M + 3SD). Since the normal curve is symmetrical, the area under the curve below the mean is divided in the same way as that above the mean. Similarly, a complete distribution of the PFI scores for 200,000 high school students can be estimated by using a normal curve approximation. (See Figure 2-8.) Notice that the sum of all the percent areas is 100%. When we use the normal curve to approximate a distribution curve, the total area under the curve represents the sum of all the frequencies in the distribution.
Okay, I am going to give you a little quiz. Let’s say I gave you and your classmates a 100-question examination on the material we just covered and you got a 32 on the test.
The mean for the test was 52 and the standard deviation was 8. What kind of grade do you think you deserve? Well, if I am grading on the normal curve you would most likely get an F for the simple reason I can’t give you a G. Why? Because 32 is more than two standard deviations below the mean (two and a half, to be exact), meaning more than 98% of the students in the class scored higher than you did. Now take Johnny, the guy sitting right next to you. He scored a 72. What does that tell you? It tells you that you should have been copying off him because he scored higher than 98% of the students in the class. How do I know that? Simple: he is more than two standard deviations above the mean. For future reference, that is definitely the guy you want to cheat off of. Of course, I know you would never cheat because cheating is a sin and you would never want to commit a sin. Consequently, you will probably be taking this class over about six or seven times…just kidding…I am sure you will pass it after repeating it just two or three times. Look on the bright side though. We are bonding here. It’s a wonderful feeling, isn’t it, like something significant is happening between us, and that things are going to be very good for us. Moving right along…
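The PFI arithmetic above can be checked with a short Python sketch. It uses the standard library's `statistics.NormalDist`; the function name `expected_counts` is made up for this illustration. Because the text rounds the areas to two decimals (34.13%, 13.59%, 2.14%, .14%), the counts printed here may differ from the text by a few scores.

```python
from statistics import NormalDist

def expected_counts(mean, sd, n, max_sd=3):
    """Expected number of scores in each 1-SD band above the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    bands = {}
    for k in range(max_sd):
        lo, hi = mean + k * sd, mean + (k + 1) * sd
        # Area between lo and hi, scaled up to n scores.
        bands[(lo, hi)] = n * (dist.cdf(hi) - dist.cdf(lo))
    # Everything more than max_sd standard deviations above the mean.
    top = mean + max_sd * sd
    bands[(top, float("inf"))] = n * (1 - dist.cdf(top))
    return bands

# PFI example from the text: M = 100, SD = 10, N = 200,000.
pfi = expected_counts(mean=100, sd=10, n=200_000)
for (lo, hi), count in pfi.items():
    print(f"{lo} to {hi}: about {count:,.0f} scores")

# The quiz: mean 52, SD 8. What share of the class scored below 32? Below 72?
quiz = NormalDist(mu=52, sigma=8)
print(f"Below 32: {quiz.cdf(32):.2%}   Below 72: {quiz.cdf(72):.2%}")
```

By symmetry, the same counts apply to the bands below the mean.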
Scales
Standard Scores
Many of the measurements taken by physical educators are in different units. For example, scores may be recorded in seconds, as in the 50-yard dash; in feet, as in the softball throw; or in repetitions, as in sit-ups. Moreover, the physical educator measures strength in pounds, records the number of times a student volleys a tennis ball against a wall, scores the vertical jump in inches, charts the zone in which a golf ball lands, and uses written test scores, ratings, game scores, and various other units of measurement. The combining of scores from separate tests has often posed a difficult problem for teachers who lack the knowledge (or the desire, or both) to transform raw scores into some form of standard score. Well, I am going to give you the knowledge of how to make this transformation, and I expect you to supply the motivation and desire to get it right.
z Scores
z scores are scores expressed in terms of standard deviations from the mean. The mean of z scores is zero; thus, scores below the mean are expressed as negative values and scores above the mean as positive values. The formula for computing z scores is:
$$z\ \text{score} = \frac{\text{raw score} - \text{mean of scores}}{\text{standard deviation}}$$
If, for example, the mean for the vertical jump is 16 inches and the standard deviation is 7 inches, what is the z score for a jump of 14 in.? If you said -.29 you get a red star. Keep saving those red stars; they will come in handy later on.
$$z\ \text{score} = \frac{14 - 16}{7} = -0.29$$
Using the formula, a jump of 9 inches would be a z score of -1.0; a jump of 26.5 inches would equal a z score of 1.5, and so on. What if you jumped 44 inches? What would that tell you? It tells you that you are a jumping fool, because that is a z score of 4, which is 4 standard deviations above the mean. Plotted on a normal curve, it means that you are better than 99% of the students who took that test. Congratulations…now you are cooking!
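The vertical jump examples can be reproduced with a few lines of Python. This is only a sketch of the z score formula given above; the function name `z_score` is chosen for illustration.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd

# Vertical jump: mean = 16 in., SD = 7 in. (values from the text).
for jump in (14, 9, 26.5, 44):
    print(f"{jump} in. -> z = {z_score(jump, mean=16, sd=7):.2f}")
```

A negative result simply means the jump was below the class mean.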
Specific scores on different tests may be readily and meaningfully compared by simply using z scores (assuming that the test distributions are similar). If a girl had a score of 15 seconds on the flexed-arm hang and ran 1840 yards in the 9-minute run, how can we compare these performances? If the means are 10 sec and 1560 yd, respectively, the girl was above average on both tests, but how much above? Was she better on one test than the other? z scores can be used to answer these questions. For example, if the mean is 10 seconds and the standard deviation is 5 seconds for the flexed-arm hang, her score of 15 seconds represents a z score of 1.00.
$$z\ \text{score} = \frac{15 - 10}{5} = 1.00$$
On the 9-minute run, with a mean of 1560 yd and a standard deviation of 280 yd, we see that her score of 1840 yd is also a z score of 1.00.
$$z\ \text{score} = \frac{1840 - 1560}{280} = 1.00$$
Consequently, the performances are similar in that each score is one standard deviation above the mean. If we refer back to Table 1-18, the normal distribution, we can see that both scores are better than 84% of the scores (50% + 34.13%). Thus, a tester can determine the percentage of scores above and below a particular score by converting the raw score to a z score and consulting the table. He or she can also see what percentage of a population lies between certain scores and can determine the probability that a particular score will occur. Now, knowing what I just told you, which tells you more: “I did 69 sit-ups on a fitness test” or “I made a standard score (z) of 1.3 on sit-ups”? Obviously the latter gives you more information. This illustrates one use of a z score.
Standard scores can also be used in setting goals. For example, a boy scored 1 standard deviation above the mean (X̄ + 1SD) on a sit-up test at the beginning of the year. He wants to know how many sit-ups he needs to do to increase his score to two standard deviations above the mean (X̄ + 2SD). To answer his question, all he has to do is follow the procedure below, remembering that a score of X̄ + 2SD is equivalent to a z-score of 2.0.
Predicted score = z x SD + X̄
If the z is 2, the SD is 13, and the mean is 52, you simply multiply 2 by the SD (13) and add the result to the mean (52) to get your predicted score of 78. See, it’s as easy as eating cheesecake… assuming you are not lactose intolerant, that is.
It might be noted that most testers do not use z scores for norms simply because they are awkward to deal with: the numbers are usually small, involve decimals, and are expressed in both positive and negative values. It might also be noted that although standard scores may be very useful to a teacher, they may confuse the students…a task easily performed. A student usually thinks of test scores in terms of 100 as a perfect score, so a z-score of 2.5 sounds low. In truth it is a great score…right!
Also, many students are unable to interpret the negative values…"You mean I did worse than zero?"
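As a rough sketch, the two uses of z scores described above (finding the percent of scores below a given z, and turning a target z back into a raw score) could look like this in Python. The function names are illustrative, and `statistics.NormalDist` stands in for the printed normal distribution table.

```python
from statistics import NormalDist

def percent_below(z):
    """Percent of a normal distribution falling below a given z score."""
    return 100 * NormalDist().cdf(z)

def predicted_score(z, mean, sd):
    """Raw score corresponding to a target z score: z x SD + mean."""
    return z * sd + mean

# Flexed-arm hang and 9-minute run: both performances were z = 1.00.
print(f"z = 1.00 beats about {percent_below(1.0):.1f}% of scores")

# Sit-up goal from the text: z = 2, SD = 13, mean = 52.
print("Sit-ups needed for z = 2:", predicted_score(2, mean=52, sd=13))
```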
To overcome these shortcomings and still retain the advantages of a standard score, various scales (Hull, Sigma, T, and percentile, for example) have been developed. Since the T-scale and the percentile scale are most often used in physical education literature, they will be presented here in more detail.
Figure 2-9 Relating z-scores, T-scores, and raw data to the normal curve
T-Scores
The T-scale converts raw scores into normalized standard scores with a mean of 50 and a standard deviation of 10. The T-scale has a range of 0 to 100. Figure 2-9 shows a normal curve with the baseline units given in z-scores, T-scores and raw sit-up data. Figure 2-9 illustrates the relationship between a z-score and a T-score. You will notice that the mean z-score is always zero, while the mean T-score is 50. Since we know that the standard deviation of the z-scale is 1 and the standard deviation of the T-scale is always 10, the point corresponding to 1 on the z-scale is 50 + 10 = 60 on the T-scale. Further, we can convert any set of raw data to T-scale units. For instance, in the sit-up example (X̄ = 52, SD = 13) the score 1 standard deviation above the mean (52 + 13) corresponds to a T-score of 60. If you go back and look at Chapter 1 Table 1-2 (or you can just take my word for it), you will see that the maximum number of sit-ups performed by any student was 80, which corresponds to a T-score only slightly above T=70. It is very difficult for a student to increase his score from T=70 to T=80. How many sit-ups would he have to do? If you said 91 or a truckload, you are right on both counts. I would prefer you used the former terminology. So does the dean, but who is asking him?
What percent of any group would you expect to score more than three standard deviations above the mean (X̄ + 3SD)? If you said less than a percent, you get another one of those red stars. You can see that since plus and minus 3 standard deviations encompass more than 99% of the scores (99.73%) in the normal distribution, most scores will fall between T-scores of 80 and 20. In fact, many norm tables present only this range of T-scores. T-scores represent equivalent points in the distribution; thus, they are comparable for different tests since the reference is always to a standard scale of 100 units that is based on the normal curve.
On the other hand, the fact that few people score above T=80 or below T=20 can be an advantage in motivation. A student who scores in the 98th percentile may become quite complacent. The equivalent T-score of 70, however, leaves considerable room for improvement. Conversely, the student who scores in the second percentile may become discouraged, whereas his T-score of 30 does not seem quite so low. Of course, we are just playing with statistics here, but if it helps the student, why not? The government does it all the time to convince poor people… wait, I am not going there.
An instructor is often interested in combining the results from several fitness tests in order to get a composite score. Consider the scores of the three tests which are reported in Table 2-3. If a “Fitness Score” for each student is obtained by finding the mean of his 3 raw scores, Student 1 has a Fitness Score of 30 and Student 2 has a Fitness Score of 31.3. On this basis, you would say that the two are about equally fit. If, however, we convert each boy's scores to T-scores and then find a mean score, the difference in fitness is quite apparent. (See Table 2-4.) Since the two boys scored the same on two of the three tests, but 4 standard deviations apart on the pull-up test, which set of Fitness Scores seems to reflect their relative fitness most accurately? Tell me you know the answer to that…thank you!
Table 2-3 Raw scores on 3 fitness tests
              Pullups   Pushups   Situps
Student 1        8        30        52
Student 2       12        30        52
Class Mean      11        33        50
Class SD         1        10         5
Table 2-4 T-scores on 3 fitness tests
                  Student 1   Student 2
Pull-ups             20          60
Push-ups             47          47
Sit-ups              54          54
Mean T score         40.3        53.7
Mean Raw score       30          31.3
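To see where the composite scores come from, here is a minimal Python sketch that recomputes both kinds of "Fitness Score" from the raw scores, class means, and class standard deviations in Table 2-3. The dictionary layout and function names are assumptions made for this illustration.

```python
def t_score(raw, mean, sd):
    """T = 50 + 10z, where z = (raw - mean) / sd."""
    return 50 + 10 * (raw - mean) / sd

# Values taken from Table 2-3.
class_mean = {"pull-ups": 11, "push-ups": 33, "sit-ups": 50}
class_sd = {"pull-ups": 1, "push-ups": 10, "sit-ups": 5}
students = {
    "Student 1": {"pull-ups": 8, "push-ups": 30, "sit-ups": 52},
    "Student 2": {"pull-ups": 12, "push-ups": 30, "sit-ups": 52},
}

def composite(raw_scores):
    """Return (mean raw score, mean T-score) for one student."""
    t_scores = [
        t_score(raw, class_mean[test], class_sd[test])
        for test, raw in raw_scores.items()
    ]
    mean_raw = sum(raw_scores.values()) / len(raw_scores)
    mean_t = sum(t_scores) / len(t_scores)
    return mean_raw, mean_t

for name, raw_scores in students.items():
    mean_raw, mean_t = composite(raw_scores)
    print(f"{name}: mean raw = {mean_raw:.1f}, mean T = {mean_t:.1f}")
```

The raw means are nearly identical, while the T-score means are more than 13 points apart, which reflects the 4 standard deviation gap on pull-ups.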
Short Method for Converting Raw Data to a T-Score
T-scores can be constructed by any of several methods. We are often concerned with T-scores in round numbers (40, 45, 50, etc.). In such cases, it is easy to find the corresponding raw score. Consider a test with a mean of 32 and standard deviation of 8. What score on this test is equivalent to a T-score of 70? First, you must realize that T = 70 is the score two standard deviations above the mean (X̄ + 2SD). So, for this test, the mean plus two standard deviations equals 48 (X̄ + 2SD = 32 + 2 x 8 = 48). Thus a raw score of 48 is equivalent to a T-score of 70. After a little practice, you can often find T-score equivalents by this method in your head. Table 2-5 provides additional examples.
Table 2-5 Short method for finding T-score equivalents
Example 1:
Test X̄ = 83, SD = 10; find the score equivalent to T = 40
Solution to 1:
T = 40 is X̄ - 1SD, so T40 = 83 - (1)(10), which equals 73
Example 2:
Test X̄ = 52, SD = 7; find the score equivalent to T = 65
Solution to 2:
T = 65 is X̄ + 1.5SD, so T65 = 52 + (1.5)(7), which equals 62.5
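The short method amounts to asking how many standard deviations a T-score sits from 50 and moving that far from the test mean. A minimal Python sketch, with a function name chosen only for illustration:

```python
def raw_from_t(t, mean, sd):
    """Raw score equivalent to a T-score: mean + ((T - 50) / 10) * SD."""
    sds_from_mean = (t - 50) / 10
    return mean + sds_from_mean * sd

print(raw_from_t(70, mean=32, sd=8))    # worked example in the text
print(raw_from_t(40, mean=83, sd=10))   # Table 2-5, Example 1
print(raw_from_t(65, mean=52, sd=7))    # Table 2-5, Example 2
```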
Formula Methods
The formula for converting from a z-score to a T-score (and vice versa) is: T = 50 + 10z. In Table 1-19 compare the z-scale and T-scale to see why "50" and "10" are used. When you are working with raw data and do not need a z-score, the following formula may be used:
$$T = 50 + 10\left(\frac{x - \bar{X}}{SD}\right)$$
How do the two formulas differ? Note that both for the short method and for this formula you must know the mean and standard deviation of the test before you find T-scores. Table 2-6 illustrates the procedure for finding the T-score of the boy who scored 42 sit-ups (z = -.77). For the data in Figure 2-9, which would be easier for the student (and his parents) to understand: a z-score of -.77 or a T-score of 42? If you said a T-score of 42, you get another red star. Before you know it you are going to have a whole book full of red stars, and if you get enough of them, I will even throw in a Donald Duck lunch pail for you.
Table 2-6 Computing the T-Score
Converting from a z-score      Using raw data
T = 50 + 10z                   T = 50 + 10(x - X̄)/SD
T = 50 + 10(-.77)              T = 50 + 10(42 - 52)/13
T = 50 - 7.7                   T = 50 - 7.7
T = 42.3                       T = 42.3
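Both routes in Table 2-6 can be written as one-line functions. The sketch below uses the sit-up figures from the text (score 42, mean 52, SD 13); the function names are illustrative.

```python
def t_from_z(z):
    """T-score from a z-score: T = 50 + 10z."""
    return 50 + 10 * z

def t_from_raw(x, mean, sd):
    """T-score straight from a raw score: T = 50 + 10(x - mean)/SD."""
    return 50 + 10 * (x - mean) / sd

print(f"From z = -.77:        T = {t_from_z(-0.77):.1f}")
print(f"From raw score of 42: T = {t_from_raw(42, mean=52, sd=13):.1f}")
```

The second function simply folds the z-score step into the T-score formula, so the two agree up to rounding of z.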
Percentiles
When you take a test and get a score back of 81%, it tells you how many questions you got right. But your test score doesn’t tell you how well you did compared to other people who took the test: how many students (or what percent of the class) scored above you, or, perhaps more consolingly, how many were below you. Let’s say you scored 167 on a fitness test. You are less concerned about the score itself than with how that score compares with the scores of highly fit individuals. There are still other occasions in which we are interested in the position of a particular score rather than the relative status of an individual who happens to get a given score. Colleges, for instance, require that students have a minimum score on an intellectual classification test. You may have heard of it…the SAT. Determination of the percentage of men and women who equal or exceed this score may be of importance in establishing the number of potential candidates for entrance into college in the future.
There are, in other words, many situations in which our primary concern is the relative position of an individual (or of a specific score) in a group rather than the performance of the group as a whole. Implicit in this statement is not only the fact that our concern is with an individual rather than a group, but also that a score, a numerical representation of some characteristic, is seldom meaningful in and of itself. In order to interpret a particular measure we must know something about the scores of other items or individuals. For instance, what if we were told that Tom Terrific had scored 120 on a physical fitness test? Taken alone, this bit of information would be completely useless. The statement would take on more significance, however, if we knew something about the scores of others taking the test (and, it should be added, the composition of the group as well). Thus, for any measurement, particularly one that measures a fitness characteristic (e.g., a strength test, cardiovascular test, or a body composition test) that has no meaning by itself, our interpretation of the measure or score depends wholly or partially on a knowledge of its relative position in a group.
The need for designating the particular position a specific score occupies in the group from which it is drawn in order to interpret its significance should now be quite obvious. The next step is to find a method that will enable us to state with numerical precision the relative position of any specific measure. The simplest procedure would be to count up the number of individuals above (or below) a given score and express that score in terms of numerical rank. For instance, Tom Terrific’s score of 120 on his physical fitness test ranks 102nd out of 200…not too terrific. If you read the example carefully, you can make it more meaningful to yourself by understanding that just about 50 percent of the group scored below 120 on the test. This process of translating a numerical rank into percentage terms is one of the most common ways of expressing relative position. The result of such a conversion is known as a percentile rank.
Percentiles can be used to compare values in any set of data that is ordered. In fact, one of the more common ways of presenting norms is by percentiles. The physical educator frequently has occasion to use percentiles, since many standardized tests (e.g., the AAHPERD Youth Fitness Test, the AAHPERD Health Related Physical Fitness Test, and the AAHPERD Sports Skills Test) present scores in this manner. Generally speaking, students are fairly well oriented in the interpretation of percentiles. I am hoping you are one of those individuals… you know, one that understands something. Just in case you are not, I will continue on.
The percentile rank of any specific score is a value indicating the percent of cases in a distribution falling at or below this score. A percentile rank therefore defines, in percentage terms, the position of that score in its distribution. If, for example, 83 percent of a group score 101 or lower, the score 101 would have a percentile rank of 83. If only 10 percent of the group obtained a score of 34 or lower, then 34 has a percentile rank of 10. Conversely, if a given score has a percentile rank of 54, we would know that 46 percent of the group falls above it.
When we are obtaining a percentile rank, we are given or start out with a score and find the percent of cases falling at or below it. However, we can also reverse the procedure, starting out with a given percent in mind and finding the score corresponding to it. To illustrate, a college registrar might be forced to reject 30 percent of the applicants for the freshman class and decide to make use of a scholastic aptitude test as the selective device. His problem is to find the score on this test below which 30 percent of the applicants fall. This score would be called the 30th percentile. Thus, a percentile is the score at or below which a given percent of the cases lie.
The percentile scale, derived from percentages, is one that contains 100 units. A number of points along this scale are given special names. The most commonly used of these are the quartiles. The first quartile (known as Q1) is the special name for the 25th percentile, the second quartile (Q2, also called the median) is equal to the 50th percentile, and the third quartile (Q3) to the 75th percentile. Occasionally certain percentiles are referred to as deciles: the first decile (or 10th percentile), the second decile (20th percentile) and so forth up to the tenth decile (100th percentile).
“At” or “In”?
Students occasionally say: "This score is in the second quartile" or "in the 71st percentile".
These statements are inaccurate because as I already pointed out, a quartile (or decile, or percentile) is a score, a specific point, and therefore no other score can fall "within" that point. A score, then, is, or falls at, a percentile, quartile, or decile, not "in." We could,
however, make a statement of this sort: X falls between the 2nd and 3rd quartiles (or between two percentiles) since this would indicate a position somewhere between two points.
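The definition of a percentile rank given above (the percent of cases falling at or below a score) translates directly into code. The sketch below is only an illustration; the ten scores are made-up data, not taken from any table in this book.

```python
def percentile_rank(scores, score):
    """Percent of cases in the distribution falling at or below `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical fitness test scores for a group of ten students.
scores = [34, 48, 55, 61, 67, 70, 74, 80, 88, 101]

print(percentile_rank(scores, 70))   # 6 of the 10 scores are 70 or lower
print(percentile_rank(scores, 34))   # only the lowest score is 34 or lower
```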
Obtaining Percentile Ranks
Again, a percentile score informs students what proportion of individuals of the same population scored below them and how many scored above them. For example, a percentile rank of 80 on a test means that 80% of the people taking the test had lower scores, and 20% had higher scores. With percentiles (as with other norms), scores from different tests can be compared to show how a student performed in relation to the other students taking each test. In most statistics and tests and measurement books, percentiles are calculated from a frequency distribution. However, I am going to show you a much simpler technique… only because I like you. Once you are familiar with the steps, you will be able to construct percentile norms for a couple of hundred scores in less than a half hour. For most classroom purposes, this method is sufficiently accurate and it has the advantage of being easily computed.
First put your raw scores into an array from largest to smallest. Then list the percentile ranks in order from 99 to 1. Now comes the tough part…NOT! Find the raw scores that correspond to each percentile rank. For instance, for the 99th percentile, record your highest score and for the 1st percentile, record your lowest score. To find any other percentile, simply multiply N by the percent and count that many scores from the bottom up. For example, let’s say you have 125 raw scores (N = 125). To find the 10th percentile, simply multiply 125 by 10%. If you did that correctly you get 12.5 (125 x .10 = 12.5). Then all you do is count 12.5 scores from the bottom up (that is, to a point between the 12th and 13th scores) and you will know what score corresponds to the 10th percentile.
One advantage of a percentile rank is that it is generally easy to interpret and communicate. It shows an individual’s standing in reference to some group. Percentile ranks are also useful for comparing an individual's performance on several tests where the characters of the raw scores differ.
For example, a tensiometer score (pounds), sit-up score (number completed) and 100-yard dash score (seconds) can be compared when they are transformed into percentile ranks. There is, however, a limitation associated with the interpretation of percentile ranks. Two or more percentile ranks cannot be combined in order to find a meaningful composite score. In other words, percentile ranks are not proportional to raw scores. For instance, let’s say you have two students and you test them on a 1-RM bench press. The first student bench presses 200 pounds, which gives him a percentile rank of 40. The other student bench presses 300 pounds, which gives him a percentile rank of 90. Now let’s say that after months of training the first student increases his bench press to 220 pounds, which ups his ranking to the 70th percentile. On the other hand, student two after months of training increases his bench press to 320 pounds, which now gives him a percentile rank of 92. Now, from this example it doesn’t take Stephen Hawking to tell you that percentile ranks are not proportional to raw scores. Student one jumped 30 percentile ranks by increasing his raw score 20 pounds, while student two raised his percentile rank only 2 spots by adding the same amount of weight.
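The short counting method described in this section can be sketched in Python as follows. Two things are assumptions of this sketch rather than part of the method as stated: a fractional count such as 12.5 is rounded up to the next whole score, and the 125 scores are made-up data (the whole numbers 1 through 125) so the counting is easy to follow.

```python
import math

def short_method_percentile(scores, pct):
    """Score at a given percentile using the short counting method."""
    ordered = sorted(scores)          # lowest score first
    n = len(ordered)
    if pct >= 99:
        return ordered[-1]            # 99th percentile: highest score
    if pct <= 1:
        return ordered[0]             # 1st percentile: lowest score
    position = n * pct / 100          # e.g. 125 x .10 = 12.5
    # Assumption: round a fractional position up to the next score.
    index = max(1, math.ceil(position))
    return ordered[index - 1]         # count from the bottom up

# Hypothetical data: 125 raw scores, the whole numbers 1 through 125.
scores = list(range(1, 126))
for pct in (99, 50, 10, 1):
    print(f"{pct}th percentile -> {short_method_percentile(scores, pct)}")
```

Averaging the 12th and 13th scores instead of rounding up would be an equally reasonable reading of "count 12.5 scores".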
Other Scales: Sigma and Hull
Okay, now we are almost done here for real. I will make this brief…PROMISE! There is just one more thing I want to tell you. It has been noted that scores on a T-scale generally range from 20 to 80. It is considered desirable to have scores range from 0 to 100 yet retain the proportionality between raw scores and standard scores that is lacking in a percentile scale. Two other scales are used which have a range of 0 to 100 and proportionality. The sigma scale is designed with a mean of 50. The scale score corresponding to X̄ + 3SD is 100 and that corresponding to X̄ - 3SD is 0. The formula for converting raw scores to sigma scores is:
$$\text{sigma score} = 50 + \frac{50(x - \bar{X})}{3\,SD}$$
Occasionally sigma scores greater than 100 or less than zero occur. Not often but they do occur. To reduce this possibility, the Hull Scale includes all scores from M-3.5SD to M+3.5SD in its 0 to 100 range. The formula for converting raw scores to Hull scores is:
$$\text{Hull score} = 50 + \frac{50(x - \bar{X})}{3.5\,SD}$$
For the data in Table 1-24, a raw score of 139 corresponds to a T-score of 70. The equivalent sigma score is:
$$\text{sigma score} = 50 + \frac{50(139 - 83)}{3(28)} = 83.3$$
The equivalent Hull score is:
$$\text{Hull score} = 50 + \frac{50(139 - 83)}{3.5(28)} = 78.6$$
Table 2-7 gives a comparison of T, Hull and sigma scores for the tensiometer strength scores ( X =83, SD=28).
The T-scale and percentile scale have been discussed in greater detail in this text because they occur most frequently in the literature. The author (that’s me) feels, however, that either the Hull or sigma scale may be preferable for constructing norm scales. Of course, that is just my opinion and you know what they say about opinions… they are like noses, everyone has one… even you.
Table 2-7 Comparison of T, Sigma, and Hull scales using Mean (83) and standard deviation (28) of tensiometer strength scores (Figure 1-24)
Raw Score     T     Sigma   Hull
   181        85     108     100
   167        80     100      93
   153        75      92      86
   139        70      83      79
   125        65      75      71
   111        60      67      64
    97        55      58      57
    83        50      50      50
    69        45      42      43
    55        40      33      36
    41        35      25      29
    27        30      16      21
    13        25       8      14
              20       0       7
              15      -8       0
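The three scales differ only in how many scale points are allotted to each standard deviation. As a rough sketch, the formulas above could be coded like this (function names are illustrative), using the tensiometer mean of 83 and standard deviation of 28:

```python
def t_scale(raw, mean, sd):
    """T-score: mean 50, 10 points per standard deviation."""
    return 50 + 10 * (raw - mean) / sd

def sigma_scale(raw, mean, sd):
    """Sigma score: 0 to 100 spans mean - 3SD to mean + 3SD."""
    return 50 + 50 * (raw - mean) / (3 * sd)

def hull_scale(raw, mean, sd):
    """Hull score: 0 to 100 spans mean - 3.5SD to mean + 3.5SD."""
    return 50 + 50 * (raw - mean) / (3.5 * sd)

# Tensiometer strength scores: mean = 83, SD = 28 (from the text).
for raw in (181, 139, 83, 27):
    print(
        f"raw {raw}: T = {t_scale(raw, 83, 28):.1f}, "
        f"sigma = {sigma_scale(raw, 83, 28):.1f}, "
        f"Hull = {hull_scale(raw, 83, 28):.1f}"
    )
```

Rounding the unrounded values may differ by a point from an entry in a printed norm table.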