What Good Is A Correlation Coefficient? Now I know exactly what you are thinking; I always know what you are thinking. “All that freaken work and all those calculations just to end up with a single number: 0.89. How absurd is that?” Seems kind of like a waste of time, huh? Well, guess again! It is actually very cool! “Yeah, right!” you say. Now, just be quit and let me explain. First of all a correlation, that single number, can tell you the direction of a relationship. If your correlation coefficient is a negative number, you can tell, just by looking at it, that there is a negative relationship between the two variables. As you may recall, a negative relationship means that as values on one variable increase (go up) the values on the other variable tend to decrease (go down) in a predictable manner. If your correlation coefficient is a positive number, then you know that you have a positive relationship. This means that as one variable increases (or decreases) the values of the other variable tend to go in the same direction. If one increases, so does the other. If one decreases, so does the other in a predictable manner. Let me make this crystal clear for you. All correlation coefficients range from -1.00 to +1.00. A correlation coefficient of -1.00 tells you that there is a perfect negative relationship between the two variables. This means that as values on one variable increase there is a perfectly predictable decrease in values on the other variable. In other words, as one variable goes up, the other goes in the opposite direction (it goes down). A correlation coefficient of +1.00 tells you that there is a perfect positive relationship between the two variables. This means that as values on one variable increase there is a perfectly predictable increase in values on the other variable. In other words, as one variable goes up so does the other. A correlation coefficient of 0.00 tells you that there is a zero correlation, or no relationship, between the two variables. In other words, as one variable changes (goes up or down) you can’t really say anything about what happens to the other variable. Sometimes the other variable goes up and sometimes it goes down. However, these changes are not predictable. Most correlation coefficients (assuming there really is a relationship between the two variables you are examining) tend to be somewhat lower than plus or minus 1.00 (meaning that they are not perfect relationships) but are somewhat above 0.00. Remember that a correlation coefficient of 0.00 means that there is no relationship between your two variables based on the data you are looking at. The closer a correlation coefficient is to 0.00, the weaker the relationship is and the less able you are to tell exactly what happens to one variable based on knowledge of the other variable. The closer a correlation coefficient approaches plus or minus 1.00 the stronger the relationship is and the more accurately you are able to predict what happens to one variable based on the knowledge you have of the other variable. See the following, for a mathematical representation of what I am talking about. +.80 +.60 +.40 +.20 0
to +1.00 to +.80 to +.60 to +.40 to +.20
or or or or or
-.80 -.60 -.40 -.20 0
to -1.00 - high to -.80 - moderately high to -.60 - moderate to -.40 - low to -.20 - little or none
The reader should be wary in using any guidelines since they can not reflect the factors important for an individual situation. Guidelines do not consider the nature of the variables, the number of subjects included or the use to which the correlation coefficient will be put.
Let’s look at this another way. Suppose that you read a research study that reported a correlation of r = .75 between strength and speed. What could you say about the relationship between these two variables and what could you do with the information? You could tell, just by looking at the correlation coefficient that there is a positive relationship between strength and speed. This means that people with more strength tend to have greater speed. Similarly, people with low levels of strength tend to be slower speed wise. The relationship between strength and speed seems to be pretty strong. You can tell this because the correlation coefficient is much closer to 1.00 than it is to 0.00. Notice, however, that the correlation coefficient is not 1.00. Therefore, it is not a perfect relationship. As a result, if you were examining a coaches training practices, you could use the knowledge that strength is related speed to “model” the coaches training practices. This means that you can generate predictions of how fast an individual should be based on his or her level of strength. Let me give you another example. Let’s say that you are interested in finding out if body fat is related to cardiovascular fitness. You compute a correlation coefficient to evaluate the relationship between body fat and cardiovascular fitness and get an r = -.35. What could you say based on this correlation coefficient? You could tell, just by looking at the correlation coefficient that there is a negative relationship between body fat and cardiovascular fitness. This means that people with more body fat tend to have less cardiovascular fitness. Similarly, people with less body fat tend have greater cardiovascular fitness. In this example the relationship between body fat and cardiovascular fitness to be moderately strong. You can tell this because the correlation coefficient is about halfway between 0.00 and 1.00. Okay, so how would you use this information? Since you have established that a relationship exists between these two variables, you can use the knowledge that body fat and cardiovascular fitness has an inverse relationship to investigate further. What you would do is look at those individuals who had high body fat ratings to see if there is some valid reason for their poor cardiovascular performance (e.g., higher-workload, weaker heart, etc.) Finally, you could use this information to identify how much weight loss needs to be increased to eliminate the relationship. In addition to telling you A. Whether two variables are related to one another, B. Whether the relationship is positive or negative and C. How large the relationship is, The correlation coefficient tells you one more important bit of information--it tells you exactly how much variation in one variable is related to changes in the other variable. Many students who are
new to the concept of correlation coefficients make the mistake of thinking that a correlation coefficient is a percentage. They tend to think that when r = .90 it means that 90% of the changes in one variable are accounted for or related to the other variable. Even worse, some think that this means that any predictions you make will be 90% accurate. This is not correct! A correlation coefficient is a “ratio” not a percent. However it is very easy to translate the correlation coefficient into a percentage. All you have to do is “square the correlation coefficient” which means that you multiply it by itself. So, if the symbol for a correlation coefficient is “r”, then the symbol for this new statistic is simply “r2” which can be called “r squared”. There is a name for this new statistic—the Coefficient of Determination.
Coefficient of Determination Again, the coefficient of determination is simply r2, the square of the correlation coefficient. The coefficient of determination gives the proportion of variance in one variable that is associated with the variance of the other variable. For example, suppose the right boomerang and shuttle run yield an r =.85. The coefficient of determination (r2) then is .72. (.85 X .85=.72) We interpret this by stating that 72% of the variance in the shuttle run scores is explained by predicting from the variance in the right boomerang scores. In other words, some common factor, perhaps agility, accounts for 72% of the association between the two items. Now suppose the correlation between the right boomerang and the quadrant jump is .40. In this case, r2=.16 (.40 X .40= .16) so only 16% of the variance of the two tests is in common and 84% of the variance must be accounted for by other factors.
Making Statistical Inferences Okay you calculated a correlation coefficient. Good for you, but the next step is critical. You need to ascertain whether the results you got are “real” or if you got them purely by chance. Remember shit happens. There is always the possibility that the sample you selected is really a group of weirdoes and derelicts…you know the kind of people you hang with… and that the results you got were a product of chance. Consequently, you need to find out if your results can be generalized back to the population from which you selected your sample. This process is called making a “statistical inference.” Before you can say that your correlation coefficient—which tells you very accurately about what is happening in your sample—would be equally true for the general population, you need to ask yourself the important question. Could I have a correlation coefficient as large as I found simply by chance?” Knowing you that is a real possibility…just kidding. In other words, is it possible that there really is no relationship between these two variables but that somehow you just happened to select a sample of people whose scores made it seem like there was a relationship? How do you determine whether or not your correlation is simply a chance occurrence or if it really is true of the population at large? All you have to do is look your correlation up in a table and compare it to a number that will find there and you will get your answer. How easy is that? And you thought statistics was hard. Hold on though I am going to discuss how to do that in greater detail later in this textbook, but for now I am going to explain to you what you need to do to answer this important question. That’s right I am going to make something that requires little effort to a task that will strain your brain a little. Look at it as mental strength training. When we get done here you are
going to have a massively strong brain that you will be able to flex for your friends. Now are you excited? Don’t answer that. Okay, you will need three things in order to determine whether you can make a generalization from the relationship you found in your sample to a larger population. You will be pleased as punch to know that most statistical programs will compute this for you. Don’t get too happy because we are going to do all the work by hand. Remember, we are pumping up your brain here. Okay, the steps below will show you how to do it manually. 1. The Correlation Coefficient that you calculated (for example, r = .65) 2. Something called the “degrees of freedom” which is simply the number of pairs of data in your sample minus 2. For example, if you had 10 people in the data set you used to calculate the correlation coefficient that means you have 10 pairs of data (each person has two scores one on the X variable and one on the Y variable). Therefore: a. N = 10 b. The number of pairs of data would be 10 c. number of pairs would be 2 would be equal to 8. 3. The table you will find below which lists “The Critical Values of the Correlation Coefficient has this information. Let me give you an example that should make clear what you need to do. 1. Suppose you have calculate a correlation coefficient of r = .89. You already did this remember…go back and look at Figure 2-4 2. Your correlation coefficient was based on 10 people. Yes, it was an n of 10….go ahead go back and look. If you have an n of 10 your degrees of freedom would be what? If you said 8 you get a red star. Again, you have 10 pairs of data, and each person has two scores one on the X variable and one on the Y variable. The “degrees of freedom” is the number of pairs of data in your sample minus 2 (10 – 2= 8) 3. If you look at the table in the back of nearly every statistics book which should be called something like “Critical Values of r” or “Critical Values of the Correlation Coefficient” you will see something like the following:
The Correlation Coefficient Critical Values of the Correlation Coefficient Two Tailed Test Type 1 Error Rate Degrees of Freedom 1 2 3 4 5 6 7 8
.05 0.997 0.950 0.878 0.811 0.754 0.707 0.666 0.632
.01 0.999 0.990 0.959 0.917 0.874 0.834 0.798 0.765
9 10 11 Etc…
0.602 0.576 0.553 Etc…
0.735 0.708 0.684 Etc…
Under the heading of “Type 1 Error Rate” you will see two columns. One of the columns has a “.05” at the top and the other column has a “.01”. These numbers refer to the chance you would have of making a Type 1 Error. A Type 1 error refers to the chance you would have of finding results by chance. In other words, the “.05” tells you that you would have a 5% chance of making a Type 1 Error. Still, another way of saying it is If you conducted this same study 100 times using different people in your sample, you would only find a correlation coefficient as large as you did about 5 times purely by chance even if a relationship does not exist. The “.01” tells you that If you conducted this same study 100 times using different people in your sample, you would only find a correlation coefficient as large as you did one time purely by chance even if a relationship does not exist.” Once you have looked down the column for Degrees of Freedom in the table named “Critical Values of the Correlation Coefficient” look across to the number listed under .05. Do you see that the number listed there is 0.632? This number is called “the critical value of r”. Why is 0.632 called the “critical value of r”? Because when you compare the correlation coefficient that you found for your sample, ignoring the negative or positive sign, with this number, if your correlation coefficient is equal to or bigger than this critical value, you can say with reasonable confidence that you would have found a relationship this strong no more than 5% of the time by chance. In other words, the evidence suggests that there really is a relationship between your two variables and that you most likely (5 out of 100 times) would not have found such a large relationship by chance. Of course, you could still be wrong—but you can actually say something about how big of a chance that is! This explanation is overly simplistic but it is good enough to give you the general idea. We will discuss decision errors and statistical significance in greater detail in the chapter on “Hypothesis Testing.” I know you can’t wait. Now look at the number listed under the heading of .01. Notice that the number that is listed for 8 degrees of freedom is 0.765. This tells you that if your correlation coefficient is equal to or larger than this critical value, then you could expect to find a correlation that large when no relationship really exists—only 1% of the time. In other words, your risk of making a Type 1 Error is less than when you used the number listed under the heading of .05 the Correlation Coefficient. Okay, we are almost done here. 1. The last step therefore is to compare your correlation coefficient with the critical value listed in the table. In our study investigating the relationship between hours spent conditioning and the final fitness level of 10 students we found a correlation coefficient of.89. Go ahead and look Bozo the work was in Figure 2-4. Since our correlation coefficient of .89 and the critical value listed under the heading of .05 with 8 degrees of freedom is 0.632, you can see that our correlation coefficient is larger than the critical value listed in the table. This means that if we accept our findings as being “true” meaning that we believe there is in fact a relationship between these two variables in the general population, we run only about a 5% chance of being wrong. Here is a question for you is our correlation coefficient significant at the .01 level? To be significant our correlation coefficient of .89 would have to be large than the critical value listed under the heading of
.01 with 8 degrees of freedom. That value is 0.765, consequently correlation coefficient of .89 is significant at both the .05 level and the .01 level. 2. When your correlation coefficient is equal to or larger than the critical value, you can say it is “statistically significant”. Whenever you hear that a research finding is statistically significant, it tells you how much confidence you can have in those findings and you can tell just how much chance there is that they may be wrong! Several fallacies and limitations should be considered in interpreting the meaning of a coefficient of correlation. A correlation does not imply a cause effect relationship between the variables. Many people confuse correlation and causation. A high correlation does not necessarily mean that a change in one variable causes a change in the other. For example, suppose you found a high correlation (r = .80) between pushup scores and 100 yard dash scores. Do you think that doing well in pushups caused an individual to do well on the dash or vice versa? On the other hand, suppose there is a high positive correlation (r = .80) between the number of errors committed by baseball teams and the number of losses per team. Do you think there might be a causal relationship there? In other words, the criterion for determining causality should be the nature of the two variables involved rather than the coefficient itself. Let me give you a better example. I bet if you did a research study to determine if there is a relationship between individuals who carry lighters and lung cancer you would most likely get a pretty high relationship. If you don’t believe me go ahead and do the study. You will see that I am right…I am always right. Now, do you think that carrying a lighter causes cancer? NO! It is probably because people who carry lighters also smoke which causes lung cancer. These are examples of the post-hoc fallacy. They are symptoms of a different cause, and do not represent a cause-effect relationship. Although these coefficients are statistically genuine, and may be explained by other factors, it would be spurious analysis to interpret one as the cause of the other. Nor could any useful conclusions be drawn by correlating the decline in corporal punishment in schools and the increase of juvenile delinquency. Technological and social change may be underlying factors in this relationship. Before a causal relationship between variables can be accepted, it must be established by logical and empirical analysis. The process of computing the coefficient of correlation only quantifies the relationship that has been previously established. A relationship of a particular type may exist only within certain limits, and must be tested to determine whether it is linear or curvilinear. Logical analysis must also be applied to determine what other factors may be a significant part of the pattern.
THE END!...not quite …THERE IS MORE! Ascertaining Reliability, Validly and Objectivity Three additional uses of the correlation coefficient are common and should be mentioned here. 1. Reliability: The reliability of a test or a test item indicates how likely a student is to make about the same score if the test is repeated. It could be called the “repeatability” or “consistency” of the test. The most common way of determining the reliability of a test is the “test-retest” method. For this, a test is given to a group of subjects and then the same test, under the same conditions, is repeated at a later date, usually the next day. The scores from the first test are correlated with those of the re-test and the correlation coefficient obtained is
the reliability of the test for those conditions. For example, a teacher wanted to know the reliability of a softball throw for distance. The teacher administered the test to the class and repeated the test on the following day. In general, a reliability of .90 or greater is considered high and from .80 to .90, moderate. Reliability below .80 leaves some question as to whether the test will give consistent results when it is repeated. One point which should be noted is that reliability is computed for a test under a specific set of circumstances. A test which has been reported to be highly reliable can be an extremely unreliable measure if the test administration is not carefully done. On the other hand, through careful administration, the reliability of a test which has been reported to have only moderate reliability can often be improved. Remember, a reported reliability is for a particular type group under specific conditions. For example, knowing that a test has been found reliable for high school boys does not imply that it will be reliable for girls, boys of another age level, or even another group of high school boys under different conditions 2
Objectivity: A test is considered to be objective if different administrators can give it to the same set of students and get about the same results. The directions for administering and scoring the test are important factors in its objectivity. To determine the objectivity of a test, you have to have two different administrators or instructors give the test to the same group under similar conditions and compute a correlation coefficient between the two sets of test scores. Objectivity coefficients can be interpreted in about the same way as reliability coefficients.
3
Validity: The validity of a test is an indication of how well it measures what it is supposed to measure. It is sometimes called the “honesty” of the test. There are several ways commonly used to determine the validity of a test. We will talk about these methods in detail later on in the text.
Stress Can Kill You…Like Dead Even Now that you've gotten over the biggest hump, don't fool yourself. It's not over yet. This is a very big camel we have to hump. Did that come out right? Anywho, the stress that keeps you from going to the next chapter, which in all honesty is the most challenging chapter, might well keep you from performing at your best once you are there. Here is something you need to know. You can make stress work for you. But first it helps to understand it. Not all stress is bad. Stress is what keeps us interested and motivated in life. Anything that gets your adrenaline going, whether it's an important business deal or being in love, is considered stress. Whenever you feel particularly gratified by a situation, it's the result of working with stress. The bad part comes when you go through too much stress. It begins to take its toll. It can leave you tired and depressed, burned out. In reaction to stress, your body releases hormones that increase the heart rate, blood sugar, and blood pressure while slowing down digestion and constricting blood vessels. Not a pretty sight. And if you don't do anything to stop it, you could be headed for some bad territory…headaches, ulcers, heart attacks, strokes. All reasons to run to the registers office and drop this freaken class. Don’t do that…there is good news about all your stress.
Lying dormant in your body is a complete defense system designed to counteract stress. Endorphins, opiate like substances released in the brain and bloodstream, can decrease your sensitivity to pain, slow your breathing, and even lower your blood pressure. Forget Valium: Endorphins come free with everyone’s body…a good reason to be born. There's a little secret here, though. Endorphins are most effective through exercise, but can be released through mental calisthenics too. They produce a sort of natural high that makes you feel so good you want to shout. It's that pumped-up feeling you get after a good workout or after mastering a complex cerebral quandary. You'll hear people say, "I really got my adrenaline flowing on that one." That's right. But more importantly, they got their endorphins flowing. A warning though! Endorphins can be addictive. Seriously! Compulsive thinkers like Einstein, Hawking and Nietzsche often go through withdrawal and depression if forced to lay off awhile. Don't sweat it though. Healthy thinking should be the worst thing you are ever addicted to. And don’t worry I am not going to let you experience any withdraws. That is your cue to turn to the next chapter.