Simple Linear Correlation: An Introduction


A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.

Example:

x     1    2    3    4    5
y    –4   –2   –1    0    2

A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.

[Figure: scatter plot of the example data, y against x]


• If we are interested only in determining whether a relationship exists, we employ correlation analysis.
• Example: Student's height and weight.

[Figure: four scatter plots, each titled "Plot of Height vs Weight" (x-axis: Weight, 100 to 260; y-axis: Height)]


•  No distinction between explanatory (x) and response (y) variable.
•  Requires both variables to be quantitative or continuous variables (no categorical or nominal variables).
•  Both variables must be normally distributed. If one or both are not, either transform the variables to near normality or use Spearman's rank correlation, a non-parametric alternative.
Pharmaceutical Biostatistics: Correlation & Regression
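The Spearman alternative mentioned above can be sketched as follows: replace each value by its rank, then compute the ordinary correlation coefficient on the ranks. This is a minimal illustration, not a full statistical test; the function names and the data are hypothetical, and tied values are assumed to share the mean of their ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Correlation coefficient computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical skewed data: y doubles with each step of x.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 16, 32, 64]

print(round(pearson_r(x, y), 3))     # below 1, because the trend is curved
print(round(spearman_rho(x, y), 3))  # 1.0, because the ordering agrees exactly
```

Because y always increases with x here, the ranks agree perfectly even though the raw relationship is not a straight line.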


[Figure: scatter plots of Y against X illustrating strong relationships, weak relationships, and no relationship]


•  Positive correlation: the values of the two variables (x, y) deviate in the same direction.
•  That is, an increase (or decrease) in the values of one variable results, on average, in a corresponding increase (or decrease) in the values of the other variable.


•  Examples:
   1.  Education level and salary potential.
   2.  Depression and suicidal tendencies.
   3.  Household income and expenditure.
•  In statistics, a perfect positive correlation is represented by the value +1.00.
•  The points lie close to a straight line, which has a positive gradient.


•  Negative correlation: the values of the two variables (x, y) deviate in opposite directions.
•  That is, an increase (or decrease) in the values of one variable results, on average, in a corresponding decrease (or increase) in the values of the other variable.


•  Examples:
   1.  Price and demand of goods.
   2.  Depression and self-esteem.
   3.  Amount of exercise and percentage of body fat.
•  In statistics, a perfect negative correlation is represented by the value -1.00.
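The idea of two variables "deviating in the same direction" or "in opposite directions" can be made concrete by summing the products of each pair's deviations from the two means: the sum is positive when the deviations mostly agree in sign and negative when they mostly disagree. The sketch below is only an illustration; the function names and all data values are invented for the example.

```python
def co_deviation_sum(x, y):
    """Sum of products of deviations from the means of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

def direction(x, y):
    """Label the direction of the association from the sign of the sum."""
    s = co_deviation_sum(x, y)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

# Hypothetical data, invented for illustration only.
income = [20, 30, 40, 50, 60]
expenditure = [18, 25, 31, 40, 44]
price = [1, 2, 3, 4, 5]
demand = [90, 70, 65, 40, 30]

print(direction(income, expenditure))  # positive
print(direction(price, demand))        # negative
```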


[Figure: four scatter plots of y against x. Negative Linear Correlation: as x increases, y tends to decrease. Positive Linear Correlation: as x increases, y tends to increase. No Correlation. Nonlinear Correlation.]


•  The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$ r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} $$
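As a worked check of the formula for r, the sketch below applies it to the five (x, y) pairs from the example table at the start of these notes. The function name `pearson_r` is just a label chosen for this illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient from the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Example data from the table: sums are 15, -5, -1, 55 and 25.
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]

r = pearson_r(x, y)
print(round(r, 2))  # 0.99
```

Here the numerator is 5(-1) - (15)(-5) = 70 and the denominator is the square root of (50)(100) = 5000, so r is about 0.99, a strong positive linear correlation.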

•  r neither measures nor describes curved or nonlinear association, no matter how strong.
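A small illustration of this point, using made-up data: y is determined exactly by x through y = x squared, yet r comes out as zero because the relationship is not linear. The function name is again only a label for this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient from the computational formula."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    spread_x = n * sum(a * a for a in x) - sum(x) ** 2
    spread_y = n * sum(b * b for b in y) - sum(y) ** 2
    return numerator / sqrt(spread_x * spread_y)

# Hypothetical data: a perfect parabola, symmetric about x = 0.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # [4, 1, 0, 1, 4]

r_curve = pearson_r(x, y)
print(r_curve)  # 0.0, despite the exact (nonlinear) dependence of y on x
```

This is why a scatter plot should be inspected alongside r: a value near zero rules out a linear trend, not every kind of relationship.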


•  The null hypothesis is that the population correlation is zero; the test assesses whether r is significantly different from zero (0).
•  Like the mean and SD, r is strongly affected by outliers and sample size.
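One common way to carry out this test, assumed here because the slides do not name a specific procedure, is the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below only computes the statistic; the function name is hypothetical and the p-value would still have to be read from a t table or statistical software.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero.

    Assumes the usual t test for a correlation, with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# r for the five-point example table (x = 1..5, y = -4, -2, -1, 0, 2),
# obtained from the sums in the formula for r: 70 / sqrt(5000).
r_example = 70 / sqrt(5000)
t = t_statistic(r_example, n=5)
print(round(t, 2))  # 12.12

# The same r with more data gives a larger t, showing the effect of sample size.
print(t_statistic(0.5, n=10) < t_statistic(0.5, n=50))  # True
```

With 3 degrees of freedom the two-sided 5% critical value is about 3.18, so the example correlation would be judged significantly different from zero under this test.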



•  Unit-less and always between –1 and 1.
•  The closer to 1, the stronger the positive linear relationship (positive r).
•  The closer to –1, the stronger the negative linear relationship (negative r).
•  The closer to 0, the weaker any positive/negative linear relationship.
•  The extreme values +1 and –1 indicate a perfect linear relationship (points lie exactly along a straight line).
•  Graded interpretation of the size of r (ignoring its sign): 0.1-0.3 = weak; 0.4-0.7 = moderate; 0.8-1.0 = strong correlation.
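The graded interpretation above can be turned into a small helper. Two things in this sketch are assumptions rather than part of the slides: the published bands leave gaps (between 0.3 and 0.4, and between 0.7 and 0.8), which are closed here by cutting at 0.4 and 0.8, and sizes below 0.1 are labelled "negligible". The function name is also hypothetical.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r.

    Assumed cutoffs: below 0.1 negligible, below 0.4 weak,
    below 0.8 moderate, otherwise strong (applied to the absolute value).
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.1:
        strength = "negligible"
    elif size < 0.4:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

for value in (-0.91, 0.88, 0.32, 0.07):
    print(value, describe_r(value))
```

Such labels are conventions for describing r, not a replacement for looking at the scatter plot.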


[Figure: four scatter plots of y against x. Strong negative correlation: r = -0.91. Strong positive correlation: r = 0.88. Weak positive correlation: r = 0.32. Nonlinear Correlation: r = 0.07.]

