A correlation is a relationship between two variables. The data can be represented by ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.

Example:

x    1    2    3    4    5
y   –4   –2   –1    0    2
A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.
• If we are interested only in determining whether a relationship exists, we employ correlation analysis.
• Example: students' height and weight.
[Figure: four scatter plots titled "Plot of Height vs Weight", with Weight (100 to 260) on the horizontal axis and Height on the vertical axis]
• No distinction is made between the explanatory (x) and response (y) variable.
• Both variables must be quantitative (continuous); categorical or nominal variables are not allowed.
• Both variables should be normally distributed. If one or both are not, either transform the variables to near normality or use the non-parametric alternative, Spearman's rank correlation.
Pharmaceutical Biostatistics: Correlation & Regression
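As a rough illustration of the Spearman alternative mentioned above, the sketch below ranks each variable (ties get the average rank) and then applies the ordinary correlation formula to the ranks. The function names and the data (y = x cubed) are made up for illustration and are not taken from the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: perfectly monotonic but not linear (y = x^3).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(spearman_rho(x, y))        # 1.0
print(pearson_r(x, y) < 1.0)     # True
```

Because Spearman's coefficient depends only on the ordering of the values, it reaches 1 for any strictly increasing relationship, while r stays below 1 unless the points lie exactly on a line.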
[Figure: scatter plots of Y against X illustrating strong relationships, weak relationships, and no relationship]
• Positive correlation: the values of the two variables (x, y) deviate in the same direction.
• That is, an increase (or decrease) in the values of one variable is accompanied, on average, by a corresponding increase (or decrease) in the values of the other variable.
• Examples:
  1. Education level and salary potential.
  2. Depression and suicidal tendencies.
  3. Household income and expenditure.
• In statistics, a perfect positive correlation is represented by the value +1.00.
• The points lie close to a straight line with a positive gradient.
• Negative correlation: the values of the two variables (x, y) deviate in opposite directions.
• That is, an increase (or decrease) in the values of one variable is accompanied, on average, by a corresponding decrease (or increase) in the values of the other variable.
• Examples:
  1. Price and demand of goods.
  2. Depression and self-esteem.
  3. Amount of exercise and percentage of body fat.
• In statistics, a perfect negative correlation is represented by the value -1.00.
[Figure: four scatter plots of y against x. Negative linear correlation: as x increases, y tends to decrease. Positive linear correlation: as x increases, y tends to increase. No correlation. Nonlinear correlation.]
• The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
• r neither measures nor describes curved (nonlinear) association, no matter how strong.
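The formula for r can be checked by hand or with a few lines of code. The sketch below applies it to the five (x, y) pairs from the example at the start of this section, and then to a made-up curved data set (y = x squared, not from the lecture) to show that r can be zero even when y is completely determined by x. The function name `pearson_r` is only an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r from the computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Example data from the start of the section.
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]
r_linear = pearson_r(x, y)

# Made-up data with a perfect but curved relationship, y = x^2.
x_curved = [-2, -1, 0, 1, 2]
y_curved = [v * v for v in x_curved]
r_curved = pearson_r(x_curved, y_curved)

print(round(r_linear, 3))  # 0.99
print(r_curved)            # 0.0
```

For the example data the sums are n = 5, ∑X = 15, ∑Y = -5, ∑XY = -1, ∑X² = 55 and ∑Y² = 25, so r = 70 / √5000 ≈ 0.99, a strong positive linear correlation.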
• The hypothesis test asks whether r is significantly different from zero (0); the null hypothesis is that there is no linear correlation in the population.
• Like the mean and SD, r is strongly affected by outliers and by sample size.
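One common way to carry out this test, which the slides do not spell out, is the statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below assumes that approach and compares |t| with 3.182, the two-sided 5% critical value for 3 degrees of freedom from a standard t table. It then adds a single made-up outlier, (6, -10), to the example data to show how strongly one point can change r. All function and variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Example data from the start of the section (n = 5, so 3 degrees of freedom).
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]
r = pearson_r(x, y)
t_stat = t_statistic(r, len(x))

T_CRITICAL = 3.182  # assumption: two-sided 5% level, 3 df, from a t table
print(round(t_stat, 2), abs(t_stat) > T_CRITICAL)  # 12.12 True

# Effect of one made-up outlier appended to the same data.
r_with_outlier = pearson_r(x + [6], y + [-10])
print(round(r_with_outlier, 2))  # -0.22
```

Here the null hypothesis of zero correlation is rejected for the original five points, yet a single extreme observation is enough to move r from about 0.99 to about -0.22.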
• r is unit-less and always lies between –1 and 1.
• The closer r is to 1, the stronger the positive linear relationship (positive r).
• The closer r is to –1, the stronger the negative linear relationship (negative r).
• The closer r is to 0, the weaker any positive or negative linear relationship.
• The extreme values +1 and -1 indicate a perfect linear relationship (the points lie exactly along a straight line).
• Graded interpretation of r (in absolute value): 0.1-0.3 = weak; 0.4-0.7 = moderate; 0.8-1.0 = strong correlation.
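The graded interpretation can be written as a small lookup. The sketch below is one possible reading, not a fixed rule: the stated bands leave gaps (for example between 0.3 and 0.4) and give no label below 0.1, so the cut-offs at 0.4 and 0.8 and the "negligible" label are assumptions made here. The function name `grade_correlation` is hypothetical.

```python
def grade_correlation(r):
    """Return a verbal label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.1:
        return "negligible"  # assumption: the text gives no label below 0.1
    if size < 0.4:           # assumption: values in the 0.3-0.4 gap count as weak
        strength = "weak"
    elif size < 0.8:         # assumption: values in the 0.7-0.8 gap count as moderate
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for r in (-0.91, 0.88, 0.32, 0.07):
    print(r, grade_correlation(r))
# -0.91 strong negative
# 0.88 strong positive
# 0.32 weak positive
# 0.07 negligible
```

A label of this kind only describes the linear part of a relationship; a value near zero does not rule out a strong curved association.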
[Figure: four scatter plots of y against x. r = -0.91: strong negative correlation. r = 0.88: strong positive correlation. r = 0.32: weak positive correlation. r = 0.07: nonlinear correlation.]