• The most popular correlation coefficient. • Used to determine the correlation between two variables under three conditions: 1. Both variables must be interval or ratio measures. 2. The relationship between the two variables must be linear. 3. Both variables are normally distributed.
• Should use a large scale for variables (larger sample size) in this analysis. Pharmaceutical Biostatistics: Correlation & Regression
In Words
In Symbols
1. Sum all the X-values.
∑X
2. Sum all the Y-values.
∑Y ∑ XY
3. Multiply each X-value by its corresponding Y-value and find the sum.
2 ∑ X 4. Square each X-value and find the sum. ∑Y 2 5. Square each Y-value and find the sum. 6. Use these five sums to calculate the n ∑ XY − ∑ X ∑Y r= correlation coefficient. 2&# 2& # 2 2 %n ∑ X − ∑ X (%n ∑Y − ∑Y ( $ '$ '
( )( )
( )
Pharmaceutical Biostatistics: Correlation & Regression
( )
Example: Calculate the correlation coefficient r for the following data.
x 1 2 3 4 5
• Formula:
r=
y –3 –1 0 1 2 n ∑ XY − ∑ X ∑Y
( )( )
2&# 2& # 2 2 n ∑ X − ∑ X n ∑ Y − ∑ Y % (% ( $ '$ '
( )
( )
Pharmaceutical Biostatistics: Correlation & Regression
Example: Calculate the correlation coefficient r for the following data.
x 1 2 3 4 5
y –3 –1 0 1 2
∑ x = 15 ∑ y = −1 r=
n ∑ XY − ∑ X ∑Y
( )( )
2&# 2& # 2 2 %n ∑ X − ∑ X (%n ∑Y − ∑Y ( $ '$ '
( )
xy –3 –2 0 4 10
x2 1 4 9 16 25
∑ xy = 9
∑ x 2 = 55
=
( )
y2 9 1 0 1 4 ∑ y 2 = 15
( )( )
5(9) − 15 −1
"5(55) − 152$"5(15) − −1 2$ ' # %&# %
( )
=
60 50 * 74
≈ 0.986
There is a strong positive linear correlation between x and y.
Pharmaceutical Biostatistics: Correlation & Regression
Exercise: The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. a.) Display the scatter plot. b.) Calculate the correlation coefficient r.
Hours, x
0
1
2
3
3
5
5
5
6
7
7
10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
Continued. Pharmaceutical Biostatistics: Correlation & Regression
Answer: Hours, x
0
1
2
3
3
5
5
5
6
7
7
10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50 y
Test score
100 80 60 40 20 x 2
4 6 8 10 Hours watching TV
Continued.
Answer:
Hours, x
0
1
2
3
3
5
5
5
6
7
7
10
Test score, y 96
85
82
74
95
68
76
84
58
65
75
50
xy x2
85 1
164 222 285 340 380 420 348 455 525 500 4 9 9 25 25 25 36 49 49 100
0 0
y2
9216 7225 6724 5476 9025 4624 5776 7056 3364 4225 5625 2500 ∑ x = 54
∑ y = 908
∑ xy = 3724 =
∑ y 2 = 70836
∑ x 2 = 332
( )(
)
12(3724) − 54 908
"12(332) − 542$"12(70836) − 908 2$ ' # %&# %
(
)
There is a strong negative linear correlation. As the number of hours spent watching TV increases, the test scores tend to decrease. Pharmaceutical Biostatistics: Correlation & Regression
≈ −0.831
• Alternative procedure: Using
n, ∑ X , ∑Y ,∑ X 2 , ∑Y 2 , ∑ XY
compute
2
SYY
X) ∑ ( =∑X − n Y) ∑ ( = ∑Y −
S XY
X ) (∑Y ) ∑ ( = ∑ XY −
S XX
2
2
2
n
r can be shown equal to
n
r=
S XY S XX SYY
One way to determine this is to use Table Pearson’s Correlation Coefficient r (Critical Values). x
1
2
3
4
5
y
-3
-1
0
1
2
r = 0.986
If |r| is greater than the critical value, there is enough evidence to decide that the correlation coefficient ρ is significant.
df= (n-2) 2 3 4 5
α = 0.05 0.950 0.878 0.811 0.754
α = 0.01 0.990 0.959 0.971 0.875
For a sample of size n = 6, ρ is significant at the 5% significance level, if |r| > 0.811.
In Words
In Symbols
1. Determine the number of pairs of data in the sample.
Determine n.
2. Specify the level of significance.
Identify α.
3. Find the critical value. 4. Decide if the correlation is significant. 5. Interpret the decision in the context of the original claim.
Use Table Pearson’s r (Critical Values) If |r| > critical value, the correlation is significant. Otherwise, there is not enough evidence to support that the correlation is significant.
Pharmaceutical Biostatistics: Correlation & Regression
Example: The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. The correlation coefficient r ≈ -0.831. Hours, x
0
1
2
3
3
5
5
5
6
7
7
10
Test score, y 96 85 82 74 95 68
76
84 58 65 75
50
Is the correlation coefficient significant at Îą = 0.01? Continued. Pharmaceutical Biostatistics: Correlation & Regression
r ≈ -0.831 n = 12
n 4 5 6
α = 0.05 0.950 0.878 0.811
α = 0.01 0.990 0.959 0.917
10 11 12 13
0.632 0.602 0.576 0.553
0.765 0.735 0.708 0.684
α = 0.01
|r| > 0.708
Because, the population correlation is significant, there is enough evidence at the 1% level of significance to conclude that there is a significant linear correlation between the number of hours of television watched during the weekend and the scores of each student who took a test the following Monday. Pharmaceutical Biostatistics: Correlation & Regression
• A hypothesis test can also be used to determine whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant at a specified level of significance. A hypothesis test can be one tailed or two tailed.
H0: ρ ≥ 0 (no significant negative correlation) Ha: ρ < 0 (significant negative correlation)
Left-tailed test
H0: ρ ≤ 0 (no significant positive correlation) Ha: ρ > 0 (significant positive correlation)
Right-tailed test
H0: ρ = 0 (no significant correlation) Ha: ρ ≠ 0 (significant correlation)
Two-tailed test
The t-Test for the Correlation Coefficient A t-test can be used to test whether the correlation between two variables is significant. The test statistic is r and the standardized test statistic follows a tdistribution with n – 2 degrees of freedom.
r
n − 2 t= =r 2 2 1−r 1−r n−2 In this text, only two-tailed hypothesis tests for ρ are considered.
Pharmaceutical Biostatistics: Correlation & Regression
Using the t-Test for the Correlation Coefficient ρ In Words
In Symbols
1. State the null and alternative State H0 and Ha. hypothesis. 2. Specify the level of significance.
Identify α.
3. Identify the degrees of freedom.
d.f. = n – 2
4. Determine the critical value(s) and rejection region(s).
Use t-Table
Pharmaceutical Biostatistics: Correlation & Regression
Using the t-Test for the Correlation Coefficient ρ In Words
In Symbols
5. Find the standardized test statistic.
t = r n − 22 1−r
6. Make a decision to reject or fail to reject the null hypothesis.
If t is in the rejection region, reject H0. Otherwise fail to reject H0.
7. Interpret the decision in the context of the original claim.
Pharmaceutical Biostatistics: Correlation & Regression
Example: The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday. The correlation coefficient r â&#x2030;&#x2C6; -0.831. Hours, x
0
5
5
5
7
10
Test score, y
96 85 82 74 95 68
76
84 58 65 75
50
1
2
3
3
6
7
Test the significance of this correlation coefficient significant at Îą = 0.01? Continued. Pharmaceutical Biostatistics: Correlation & Regression
Example continued: H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (significant correlation)
The level of significance is α = 0.01. Degrees of freedom are d.f. = 12 – 2 = 10. The critical values are -t0 = -3.169 and t0 = 3.169. The standardized test statistic is The test statistic falls in the rejection region, so H0 is rejected.
12 − 2 t = r n − 22 = −0.831 1−r 1 − (−0.831)2
≈ −4.72. -t0 = -3.169
0
t0 = 3.169
t
At the 1% level of significance, there is enough evidence to conclude that there is a significant linear correlation between the number of hours of TV watched over the weekend and the test scores on Monday morning. Pharmaceutical Biostatistics: Correlation & Regression