TM-3-1


CHAPTER III – CORRELATION

Computing Correlation Coefficients

Objectives

After reading this chapter, the student should be able to:

1. Define correlation
2. Understand the difference between positive correlation and negative correlation
3. Explain the relationship between correlation and causation
4. Understand the problems with interpreting a correlation coefficient
5. Define coefficient of determination
6. List some of the common uses of correlation
7. Define reliability and objectivity
8. Explain how to determine reliability and objectivity
9. Interpret reliability coefficients and objectivity
10. Define validity
11. Identify six common ways to determine validity
12. Interpret validity coefficients
13. Define criterion
14. Explain the major precaution in regard to criterion
15. Explain what is meant by a positive and negative correlation, and interpret the meaningfulness of a correlation in terms of percentage of variation

Key Terms

Correlation (r): The simultaneous change in value of two numerically valued random variables.

Positive Correlation (r): Direct association between two variables. As one variable becomes large, the other also becomes large, and vice versa. Example: the positive correlation between cigarette smoking and the incidence of lung cancer. Positive correlation is represented by correlation coefficients greater than 0.

Negative Correlation (r): Inverse association between two variables. As one variable becomes large, the other becomes small. Example: the negative correlation between age and normal vision. Negative correlation is represented by correlation coefficients less than 0.

Correlation Coefficient (r): A measure of the interdependence of two random variables that ranges in value from −1 to +1, indicating perfect negative correlation at −1, absence of correlation at zero, and perfect positive correlation at +1. Also called a coefficient of correlation.

Causation: The belief that events occur in predictable ways and that one event leads to another.

Coefficient of Determination (r²): The coefficient of determination is simply r², the square of the correlation coefficient.

Statistical Inference: Making a generalization or inference back to the population from which a sample was drawn.

Statistically Significant (usually expressed as p=.05): In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.

Reliability: The reliability of a test or a test item indicates how likely a student is to make about the same score if the test is repeated.

Objectivity: A test is considered to be objective if different administrators can give it to the same set of students and get about the same results.

Validity: The validity of a test is an indication of how well it measures what it is supposed to measure.

Regression: In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional; the relationship is the critical aspect.

Regression Equation: The equation representing the relation between selected values of one variable (x) and observed values of the other (y); it permits the prediction of the most probable values of y.


Measuring Relationships

We are often interested in measuring the relationship or correlation between two variables. For example, you may believe that persons who spend more hours participating in a conditioning program will be in better shape. Suppose you select a group of volunteers to spend varying amounts of time in a conditioning program and then measure their fitness level.

The fitness test we are going to use is the “Ultimate Fitness Test” which was developed by the author this very moment just so I could present these scores to you. In other words, use your imagination here. The scores on the “Ultimate Fitness Test” range from 0 to 200 with a score of 0 being the worst score you could get and a score of 200 being the best score you can get. There were ten students in the study. Table 3-1 shows the number of hours each student spent conditioning and his level of fitness at the end of the conditioning program.

Table 3-1 Hours spent conditioning and the final fitness level of 10 students

Student                     1    2    3    4    5    6    7    8    9   10
Hours Conditioning (hrs)   12   10   25   20    5   13   18   20   12   15
Fitness Level             120  110  145  135   80   90  120  140  100  110

As usual I did the research for you. After this class is over you will probably owe me your firstborn child… No! I don’t pay child support.

To get a picture of any trend, you might graph this data as shown in Figure 3-1. Note that each dot on the graph represents the hours of conditioning and fitness level of a student.

Figure 3-1 Graphical representation of the hours of conditioning and fitness data in Table 3-1
[Scatter plot: Hours of Conditioning on the horizontal axis (0 to 30), Fitness Level on the vertical axis (70 to 150)]

From Figure 3-1 and Table 3-1, would you say there is a relationship between hours spent in conditioning and fitness level? From the graph it would appear that in general the more hours you train, the higher your fitness level. Therefore, we would conclude that the two factors or variables are correlated.

Now let’s see how smart you are. Examine each graph in Figure 3-2 to see if there appears to be a relationship between the two variables in each diagram. Graph A of Figure 3-2 is similar to the graph in Figure 3-1. In each case, as the measurements of one variable increase, the measurements of the other also increase. This is called positive correlation. Graph B shows no real pattern or relationship between the number of pushups and IQ. In such a case we say there is no correlation or zero correlation. Graph C illustrates a negative correlation. As the measurements of one variable increase, the measurements of the other tend to decrease.

Figure 3-2 Graphical representation of three sets of data
[Chart A: Gallons of Ice Cream Sold plotted against Outside Temperature]
[Chart B: IQ plotted against Number of Pushups]
[Chart C: Football Games Won plotted against Number of Injuries]

Since we do not wish to be limited to judging correlations from a graph, we can use our data to calculate a correlation coefficient. A correlation coefficient is a more objective way of expressing the relationship between the two variables. Its values range from -1.0 to +1.0. A correlation coefficient of zero indicates that there is no relationship between the two variables. The closer the value of the coefficient is to +1.0 or -1.0, the stronger the relationship. Thus, a coefficient of .60 would indicate a stronger relationship than a coefficient of .30. It is also important that you understand that a correlation coefficient of -.85 indicates a relationship that is as strong as the relationship represented by a correlation coefficient of +.85. The minus (-) sign simply indicates the kind of relationship…a negative correlation. Note: It is impossible for a correlation coefficient to be greater than +1.0 or less than -1.0. We are going to talk about all this in greater depth later on.
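The two ideas in the paragraph above (the sign gives the kind of relationship, the distance from zero gives its strength) can be put into a few lines of Python. This is only a rough sketch; the function names `describe_correlation` and `stronger` are made up for illustration and do not come from any statistics package.

```python
def describe_correlation(r):
    """Return the kind of relationship a correlation coefficient r indicates."""
    # A coefficient outside -1.0 to +1.0 is impossible, so reject it.
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.0 and +1.0")
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "zero"


def stronger(r1, r2):
    """Return the coefficient showing the stronger relationship, or None if tied."""
    # Strength depends only on the distance from zero, not on the sign.
    if abs(r1) == abs(r2):
        return None
    return r1 if abs(r1) > abs(r2) else r2


print(describe_correlation(0.60))   # positive
print(describe_correlation(-0.85))  # negative
print(stronger(0.60, 0.30))         # 0.6
print(stronger(-0.85, 0.85))        # None, because the two are equally strong
```

Comparing absolute values is what makes -.85 and +.85 come out as a tie.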

Computation of a Correlation Coefficient

Now we are going to learn how to compute a correlation coefficient. Are you ready for this? You know you have the right to say, “No.” This is where common sense comes in. Learning to say “No” is not only the best method of birth control, but it is also one sure way of avoiding “intellectual constipation”. You can say “No” to your classmates, or you can say “No” to your mother. When is it a good time to say “NO”? Your brain will tell you when. You need to learn to listen to your brain.

When learning, there will be times when you feel an unaccustomed throbbing, pain, or twinge in your grey matter. When you feel this, your subconscious will start to analyze and compute your cerebral workload. It will quickly calculate the degree of brain discomfort (the mathematical difference between “Ouch” and “Oh, my God”) and try to compare it with some other discomfort you felt at some other time in your life…perhaps when you were studying nuclear physics. You know, the prerequisite for this chapter. Understand that the mind works in strange and wondrous ways. If you are not careful, before you know it, things start sinking into your brain subliminally. At that point, you are learning new concepts, cramming your brain with more and more data, and a new burst of cranial discomfort has you considering every bit of mental activity you’ve engaged in for the preceding 48 hours. Still you are going at it…learning, and developing…taking yourself to new galaxies. If you are not careful, you’ll end up with indeterminable “brain” damage.

You don’t want that, so let me ask you again, “Are you ready to learn how to compute a correlation coefficient?” Of course you are, because I am not taking “No” for an answer. Like I said, you can say “No” to your classmates, and you can say “No” to your mother, but you can’t say “No” to me. Just for the record, I don’t care if you do develop “brain” damage. Remember, no pain, no gain. Going for the burn is not just a physical thing, but it can also be mental. It’s true, it’s true. The only way to strengthen your muscles and your brain is to work hard. While you are spending time with me you are going to work hard. As Joe Willie Namath used to say, “Score, if you’re going to play, Baby.” So let’s get started…you want to score, don’t you? I mean here in the classroom, not in the dorm. Get your mind in the right place…think correlation. And don’t give me that “No” crap either.

Now, the Pearson Product-Moment Correlation Coefficient (r), or correlation coefficient for short, is a measure of the degree of linear relationship between two variables, usually labeled X and Y.
While in regression the emphasis is on predicting one variable from the other, in correlation the emphasis is on the degree to which a linear model may describe the relationship between two variables. In regression the interest is directional: one variable is predicted and the other is the predictor. In correlation the interest is non-directional; the relationship is the critical aspect. I am sure that is crystal clear to you. If not, just make believe for now. I will explain it all again later so even Joe Willie could understand it.

The formula for Pearson's correlation takes on many forms. A commonly used formula is shown below.

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

Here the lowercase x and y are deviation scores, that is, each score minus the mean of its column (x = X − Mx and y = Y − My).

Is that scary looking or what? I know it scared the living heck right out of you…didn’t it? Now you really feel intimidated, don’t you? Is your heart pounding, your palms getting a little sweaty and are you starting to feel a little faint? You wimp! Okay, just take a deep breath and chill. Then take a close look at the formula. Can you tell me how many separate things you really need to calculate in order to work this mother out? Think it through. The formula looks a bit complicated, but taken step by step as shown in the numerical example, it is really quite simple. Think of it like a football play. Just like a football play, it is made up of nothing more than a series of Xs and Os that have been mixed together and then worked to accomplish a process (blocking, hitting a designated hole and scoring a touchdown), so it is with this formula. Look for the elements. They are listed below:

ΣX - This simply tells you to add up all the X scores
ΣY - This tells you to add up all the Y scores
Σx² - This tells you to square each X score and then add them up
Σy² - This tells you to square each Y score and then add them up
ΣXY - This tells you to multiply each X score by its associated Y score and then add the resulting products together (these are called “cross-products”)
N - This refers to the number of “pairs” of data you have.

These are the basics you need in order to score your touchdown. The rest is simply a matter of adding them, subtracting them, dividing them, multiplying them, and finally taking a square root. All of this is easy stuff with your calculator.

Let’s work through an example. I am going to use the same data we used in Table 3-1 when we were interested in seeing if there was a relationship between hours spent conditioning and the final fitness level of 10 students. Get ready, get set…hike! Here are the steps in computing our correlation coefficient.

1. Form subject, X, and Y columns. List each subject and their score on each variable
2. Find the mean of the X-column (Mx)
3. Find the mean of the Y-column (My)
4. Form the x-column (X - Mx). Subtract Mx from each score in the X-column
5. Form the y-column (Y - My). Subtract My from each score in the Y-column
6. As a check, if you have not rounded off, Σx = 0 and Σy = 0
7. Form the x²-column. Square each entry in the x-column
8. Find Σx². Sum the x²-column
9. Form the y²-column. Square each entry in the y-column
10. Find Σy². Sum the y²-column
11. Form the xy-column. Multiply the corresponding entries in the x- and y-columns
12. Find Σxy. Sum the xy-column
13. Use the formula to compute the correlation coefficient

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
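For readers who would rather let a computer do the button pushing, the thirteen steps above can be sketched in Python roughly as follows. The variable names are illustrative only, and the scores are the ones used in the chapter's worked computation (hours conditioning as X, fitness level as Y).

```python
import math

# Step 1: the X and Y columns, one entry per student.
X = [12, 10, 25, 20, 5, 13, 18, 20, 12, 15]              # hours conditioning
Y = [120, 110, 145, 135, 80, 90, 120, 140, 100, 110]     # fitness level

# Steps 2-3: means of the X and Y columns.
m_x = sum(X) / len(X)
m_y = sum(Y) / len(Y)

# Steps 4-5: deviation columns x = X - Mx and y = Y - My.
x = [score - m_x for score in X]
y = [score - m_y for score in Y]

# Step 6: as a check, the deviations should sum to zero.
assert abs(sum(x)) < 1e-9 and abs(sum(y)) < 1e-9

# Steps 7-10: square each deviation and sum the squares.
sum_x_sq = sum(d * d for d in x)
sum_y_sq = sum(d * d for d in y)

# Steps 11-12: multiply paired deviations and sum the products.
sum_xy = sum(dx * dy for dx, dy in zip(x, y))

# Step 13: plug the three sums into the formula.
r = sum_xy / math.sqrt(sum_x_sq * sum_y_sq)

print(f"Sum of x squared = {sum_x_sq:.0f}")
print(f"Sum of y squared = {sum_y_sq:.0f}")
print(f"Sum of xy = {sum_xy:.0f}")
print(f"r = {r:.2f}")
```

Only three sums actually enter the formula; the means and deviation columns are just the scaffolding needed to get them.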


Table 3-2 Procedures for computing the correlation coefficient using data from Table 3-1

X = Hrs Condit., Y = Fitness Lvl., x = X - Mx, y = Y - My

Student    X      Y      x      y      x²     y²     xy
1          12     120    -3     5      9      25     -15
2          10     110    -5     -5     25     25     25
3          25     145    10     30     100    900    300
4          20     135    5      20     25     400    100
5          5      80     -10    -35    100    1225   350
6          13     90     -2     -25    4      625    50
7          18     120    3      5      9      25     15
8          20     140    5      25     25     625    125
9          12     100    -3     -15    9      225    45
10         15     110    0      -5     0      25     0

N = 10
ΣX = 150, Mx = 15
ΣY = 1150, My = 115
Σx = 0, Σy = 0
Σx² = 306, Σy² = 4100
Σxy = 995

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{995}{\sqrt{(306)(4100)}} = \frac{995}{1120.1} = .89$$
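As a cross-check, the raw elements listed earlier (ΣX, ΣY, the sums of squares, ΣXY, and N) can be dropped straight into the raw-score form of Pearson's formula, which is algebraically equivalent to the deviation-score form used in this chapter. The raw-score form is not the one printed above; it is brought in here only as a second route to the same answer, and the variable names are illustrative.

```python
import math

# Scores as they appear in the worked computation.
hours = [12, 10, 25, 20, 5, 13, 18, 20, 12, 15]              # X
fitness = [120, 110, 145, 135, 80, 90, 120, 140, 100, 110]   # Y

n = len(hours)                                           # N, the number of pairs
sum_x = sum(hours)                                       # sum of the X scores
sum_y = sum(fitness)                                     # sum of the Y scores
sum_x_sq = sum(v * v for v in hours)                     # sum of squared raw X scores
sum_y_sq = sum(v * v for v in fitness)                   # sum of squared raw Y scores
sum_xy = sum(a * b for a, b in zip(hours, fitness))      # sum of raw cross-products

# Raw-score form: r = (N*SXY - SX*SY) / sqrt((N*SXX - SX^2) * (N*SYY - SY^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x_sq - sum_x ** 2) * (n * sum_y_sq - sum_y ** 2))
r_raw = numerator / denominator

print(f"N = {n}, sum X = {sum_x}, sum Y = {sum_y}")
print(f"r = {r_raw:.2f}")
```

Both routes land on the same coefficient, so use whichever one your calculator (or your patience) prefers.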

The correlation coefficient obtained, .89, indicates a strong positive relationship between hours spent conditioning and fitness level. In other words, people who spend more time conditioning tend, generally, to be more fit.
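The Key Terms define the coefficient of determination as r², the square of the correlation coefficient. A minimal sketch of that one extra step, starting from the three sums in the worked example, might look like this. Reading r² as a percentage of shared variation is the usual interpretation rather than something worked out in this section, so treat the last line of output as a preview.

```python
import math

# The three sums taken from the worked example.
sum_xy = 995
sum_x_sq = 306
sum_y_sq = 4100

# Correlation coefficient from the deviation-score formula.
r = sum_xy / math.sqrt(sum_x_sq * sum_y_sq)

# Coefficient of determination: simply r squared.
r_squared = r ** 2

print(f"r = {r:.2f}")                                # r = 0.89
print(f"r squared = {r_squared:.2f}")                # r squared = 0.79
print(f"about {100 * r_squared:.0f}% of variation")  # about 79% of variation
```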

