Statistics formulas

Page 1

Basic statistics formulas Creating a list of stats formulas used on your course is a good aid for exam revision. We have compiled a list of commonly used formulas in an introductory course involving statistics which is in pdf format for printing. A more comprehensive sheet is available for instant download (for a fee) from our solutions library at www.ahmetrics.com.

Sets

Measures of Location

De Morgan’s Law (AUB)c = Ac ∩ Bc & (A∩B)c = Ac U Bc

Sample mean x=

Commutativity A U B = B U A and A ∩ B = B ∩ A Associativity (A U B) U C = A U (B U C) (A ∩ B) ∩ C = A ∩ (B ∩ C) Distributivity (A U B) ∩ C = (A ∩ C) U (B ∩ C) (A ∩ B) U C = (A U C) ∩ (B U C)

(n+1)/2 value if n is odd; Mean of n/2th and (n+1)/2th values if n is even.

Measures of spread Sample variance 2

∑(x − x ) =

• • •

p(A) = p(A|B1)p(B1)+ p(A|B2)p(B2)+...+p(A|Bk)p(Bk)

n −1

Sample standard deviation,s s = var iance Range Largest value – smallest value Interquartile range (IQR) Upper quartile – lower quartile Coefficient of variation s/x

© www.ahmetrics.com

2

i

Probability p(A) = 1 - p(Ac) p(A U B) = p(A) + p(B) - p(A ∩ B) If events A and B are mutually independent then p(A ∩ B) = p(A)p(B) p(A|B) = p(A ∩ B)/p(B) so long as p(B)>0 p(A ∩ B) = p(A|B)p(B) = p(B|A)p(A) so long as p(A)>0 and p(B)>0 If {B1, B2...,Bk} is a set of mutually excusive and exhaustive events, then

n

Median (for raw data i.e list of numbers ungrouped) List the numbers in ascending order. Median is:

s

• • •

∑x


Expectation, variance, and Normal density function covariance The normal density function (aka normal distribution) has 2 parameters: mean and variance. For the normal distribution:

If random variable X is discrete: E ( X ) = ∑ xi p( xi )

90% of data falls within ± 1.65σ

which is calculated over all possible values of X.

95% of data falls within ± 1.95σ Standardizing (Z-score)

Let g(X) denote a function of discrete X, then:

E ( g ( X )) = ∑ g ( xi ) p(xi )

z=

x−µ

σ

Sampling distribution of the sample mean

Expection rules

Let X, Y, Z denote random variables; a, b denote constants E1. E(a) = a

Suppose X ∼ N ( µ , σ 2 ) , then the distribution of the sample mean X is X ∼ N (µ ,

E2. E(aX) = a E(X)

σ2 n

)

E3. E(X+Y) = E(X) + E(Y) The standard error of X is

Variance rules

σ2 V1. var(a) = 0 V2. var(aX) = a2var(X) V3. var(X

Y) = var(X) + var(Y)

Covariance rules

n This result is true if X does not follow the normal distribution but n is large (and then 2*Cov(X,Y) the result follows because of the Central Limit Theorem)

C1. Cov(a,X) = 0 C2. Cov(aX,bY) = ab*Cov(X,Y) C3. Cov(X+Y,Z) = Cov(X,Z) + Cov(Y,Z)

© www.ahmetrics.com


Confidence intervals for the mean Parameter Mean µ

Assumptions

Formula

Data normally distributed or n is large (n>30);

x ± zα /2

σ n

σ 2 known Data normally distributed;

x ± tα /2,n−1

s n

n small;

σ 2 unknown Difference in means µ X − µY Case of 2 independent distributions

Data are normally distributed;

σ 2 σ 2  ( x − y ) ± zα /2  X + Y  nY   nX

σ X 2 , σ Y 2 are known Variances unknown; Large samples

Data are normally distributed;

σ X 2 , σ Y 2 are unknown but σ X 2 = σ Y 2

s 2 s 2 ( x − y ) ± zα /2  X + X  nX   nX  1 1  sp2  +   n X nY  Where the estimate of the pooled variance is

( x − y ) ± tα /2,n

sp2 =

© www.ahmetrics.com

X

+ nY − 2

(nX − 1) s X 2 + (nY − 1) sY 2 nX + nY − 2


Hypothesis test for the mean Hypothesis

Assumption

Testing a single mean equals a value “a”

Data normally distributed, or large sample;

Ho : µ = a

σ 2 known Data normally distributed

σ 2 unknown

Testing the difference between 2 means equals a number “a” (which includes the case of a=0 which is a test for no difference between means)

Data normally distributed, or large sample; Independent samples;

σ X 2 , σ Y 2 are known

H o : µ X − µY = a

x −a σ/ n Z-table for critical value x −a s/ n t-table with df = n-1 for critical value

x − y −a  σ X 2 σY 2  +   nY   nX Z-table for critical value

Data normally distributed, or large sample; Case of 2 independent distributions

Test equation

Independent samples;

σ X 2 , σ Y 2 are unknown Data are normally distributed;

σ X 2 , σ Y 2 are unknown but σ X 2 = σY 2

x − y −a  s X 2 sY 2  +   nY   nX Z-table for critical value

x − y −a  1 1  sp2  +   nX nY  Where the estimate of the pooled variance is sp2 =

© www.ahmetrics.com

(nX − 1) s X 2 + (nY − 1) sY 2 nX + nY − 2


Covariance and correlation Parameter

Population formula

Covariance between variables X and Y

COV(X,Y) = E(XY) – E(X)E(Y)

Sample formula ∑ ( x − x )( y − y ) i

i

n −1

Pearson’s correlation

∑ xy − nxy

Cov ( X , Y )

σ XσY

s X sY

Simple linear regression Model: for i = 1,2,…,n yi = α + β xi + ui OLS estimators Intercept

Slope

β=

α = y − β x σ 2 ∑ x 2i

var(α ) =

n

(∑ x

i

2

− nx 2

2

2

i

⌢ var( β ) =

)

σ2

∑x

i

Estimator for variance of error term u is ⌢ 2 ∑ ( yi − yi )

n−2

© www.ahmetrics.com

∑ xy − nxy ∑ x − nx 2

− nx 2


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.