Basic statistics formulas

Creating a list of the statistics formulas used on your course is a good aid for exam revision. We have compiled a list of formulas commonly used in an introductory statistics course, available in PDF format for printing. A more comprehensive sheet is available for instant download (for a fee) from our solutions library at www.ahmetrics.com.
© www.ahmetrics.com

Sets

De Morgan's Laws: $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$
Commutativity: $A \cup B = B \cup A$ and $A \cap B = B \cap A$
Associativity: $(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$
Distributivity: $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$ and $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$

Probability

• $p(A) = 1 - p(A^c)$
• $p(A \cup B) = p(A) + p(B) - p(A \cap B)$
• If events A and B are independent, then $p(A \cap B) = p(A)\,p(B)$
• $p(A|B) = p(A \cap B)/p(B)$, so long as $p(B) > 0$
• $p(A \cap B) = p(A|B)\,p(B) = p(B|A)\,p(A)$, so long as $p(A) > 0$ and $p(B) > 0$
• If $\{B_1, B_2, \ldots, B_k\}$ is a set of mutually exclusive and exhaustive events, then
  $p(A) = p(A|B_1)p(B_1) + p(A|B_2)p(B_2) + \ldots + p(A|B_k)p(B_k)$

Measures of Location

Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

Median (for raw data, i.e. an ungrouped list of numbers): list the numbers in ascending order. The median is:
• the $\frac{n+1}{2}$th value if n is odd;
• the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th values if n is even.

Measures of spread

Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$
Sample standard deviation, s: $s = \sqrt{\text{variance}} = \sqrt{s^2}$
Range: largest value − smallest value
Interquartile range (IQR): upper quartile − lower quartile
Coefficient of variation: $s/\bar{x}$
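The location and spread measures can be checked numerically. The following is a minimal Python sketch rather than part of the original sheet: the function names and the data are illustrative assumptions, and the quartiles rely on the default convention of `statistics.quantiles`, which may differ slightly from the one used on your course.

```python
import math
import statistics


def sample_mean(xs):
    """Sample mean: sum of the values divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Median of raw (ungrouped) data."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # the (n+1)/2-th value, counting from 1
    return (s[mid - 1] + s[mid]) / 2  # mean of the n/2-th and (n/2 + 1)-th values


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def spread_summary(xs):
    """Standard deviation, range, IQR and coefficient of variation."""
    s = math.sqrt(sample_variance(xs))
    # Assumption: quartiles follow the default method of statistics.quantiles.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "sd": s,
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "cv": s / sample_mean(xs),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
print(sample_mean(data), median(data), sample_variance(data))
print(spread_summary(data))
```

Writing the mean, median and variance by hand keeps each function close to its formula; in practice `statistics.mean`, `statistics.median` and `statistics.variance` compute the same quantities.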
Expectation, variance, and covariance

If random variable X is discrete:
$E(X) = \sum_i x_i \, p(x_i)$
which is calculated over all possible values of X.

Let g(X) denote a function of discrete X, then:
$E(g(X)) = \sum_i g(x_i) \, p(x_i)$

Expectation rules
Let X, Y, Z denote random variables; a, b denote constants.
E1. $E(a) = a$
E2. $E(aX) = a\,E(X)$
E3. $E(X+Y) = E(X) + E(Y)$

Variance rules
V1. $\mathrm{var}(a) = 0$
V2. $\mathrm{var}(aX) = a^2\,\mathrm{var}(X)$
V3. $\mathrm{var}(X \pm Y) = \mathrm{var}(X) + \mathrm{var}(Y) \pm 2\,\mathrm{Cov}(X,Y)$

Covariance rules
C1. $\mathrm{Cov}(a,X) = 0$
C2. $\mathrm{Cov}(aX,bY) = ab\,\mathrm{Cov}(X,Y)$
C3. $\mathrm{Cov}(X+Y,Z) = \mathrm{Cov}(X,Z) + \mathrm{Cov}(Y,Z)$

Normal density function

The normal density function (aka normal distribution) has 2 parameters: mean $\mu$ and variance $\sigma^2$. For the normal distribution:
• about 90% of data falls within $\pm 1.65\sigma$ of the mean
• about 95% of data falls within $\pm 1.96\sigma$ of the mean

Standardizing (Z-score)
$z = \frac{x - \mu}{\sigma}$

Sampling distribution of the sample mean

Suppose $X \sim N(\mu, \sigma^2)$, then the distribution of the sample mean $\bar{X}$ is $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
The standard error of $\bar{X}$ is $\sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$.
This result also holds approximately if X does not follow the normal distribution but n is large (the result then follows from the Central Limit Theorem).
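As a rough numerical check of the expectation and variance rules, the normal coverage figures, the Z-score and the standard error, here is a short Python sketch. The fair die, the helper names and the use of $E((X - E(X))^2)$ as the definition of variance are assumptions made for illustration; they are not taken from the sheet.

```python
import math
from statistics import NormalDist


def expectation(values, probs, g=lambda x: x):
    """E(g(X)) = sum of g(x_i) * p(x_i) over all possible values of X."""
    return sum(g(x) * p for x, p in zip(values, probs))


def variance(values, probs):
    """var(X) as E((X - E(X))^2); this definition is an assumption of the sketch."""
    mu = expectation(values, probs)
    return expectation(values, probs, lambda x: (x - mu) ** 2)


# Illustrative discrete variable: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
a = 2

e_x = expectation(faces, probs)
var_x = variance(faces, probs)

# Rule E2: E(aX) = a E(X); rule V2: var(aX) = a^2 var(X).
e_ax = expectation([a * x for x in faces], probs)
var_ax = variance([a * x for x in faces], probs)

std_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def z_score(x, mu, sigma):
    """Standardized value z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


print(e_x, e_ax, var_x, var_ax)
print(coverage(1.65), coverage(1.96))
```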
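Confidence intervals for the mean

Parameter: Mean $\mu$

Assumptions: Data normally distributed or n is large (n > 30); $\sigma^2$ known
Formula: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Assumptions: Data normally distributed; n small; $\sigma^2$ unknown
Formula: $\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}}$

Parameter: Difference in means $\mu_X - \mu_Y$ (case of 2 independent distributions)

Assumptions: Data are normally distributed; $\sigma_X^2$, $\sigma_Y^2$ are known
Formula: $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$

Assumptions: Variances unknown; large samples
Formula: $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}$

Assumptions: Data are normally distributed; $\sigma_X^2$, $\sigma_Y^2$ are unknown but $\sigma_X^2 = \sigma_Y^2$
Formula: $(\bar{x} - \bar{y}) \pm t_{\alpha/2,\,n_X+n_Y-2} \sqrt{s_p^2 \left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}$
where the estimate of the pooled variance is
$s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}$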
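The interval formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the z critical value comes from the standard library's `NormalDist`, and the t critical value is passed in as `t_crit` because it is assumed to be read from a t-table.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """CI for the mean with sigma known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


def t_interval(xbar, s, n, t_crit):
    """CI with sigma unknown; t_crit = t_{alpha/2, n-1} is taken from a t-table."""
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half


def pooled_variance(sx2, nx, sy2, ny):
    """Pooled variance estimate s_p^2 from two sample variances."""
    return ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)


def pooled_diff_interval(xbar, ybar, sx2, nx, sy2, ny, t_crit):
    """CI for mu_X - mu_Y with equal variances; t_crit has nx + ny - 2 df."""
    sp2 = pooled_variance(sx2, nx, sy2, ny)
    half = t_crit * math.sqrt(sp2 * (1 / nx + 1 / ny))
    diff = xbar - ybar
    return diff - half, diff + half


# Made-up numbers: xbar = 50, sigma = 10, n = 100, 95% confidence.
print(z_interval(50, 10, 100))
# Made-up numbers: s = 10, n = 16, with t_{0.025, 15} = 2.131 from a t-table.
print(t_interval(50, 10, 16, 2.131))
```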
Hypothesis test for the mean

Hypothesis: Testing a single mean equals a value “a”; $H_0: \mu = a$

Assumption: Data normally distributed, or large sample; $\sigma^2$ known
Test equation: $\frac{\bar{x} - a}{\sigma/\sqrt{n}}$; Z-table for critical value

Assumption: Data normally distributed; $\sigma^2$ unknown
Test equation: $\frac{\bar{x} - a}{s/\sqrt{n}}$; t-table with df = n − 1 for critical value

Hypothesis: Testing the difference between 2 means equals a number “a” (which includes the case a = 0, a test for no difference between means); $H_0: \mu_X - \mu_Y = a$; case of 2 independent distributions

Assumption: Data normally distributed, or large sample; independent samples; $\sigma_X^2$, $\sigma_Y^2$ are known
Test equation: $\frac{\bar{x} - \bar{y} - a}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}}$; Z-table for critical value

Assumption: Data normally distributed, or large sample; independent samples; $\sigma_X^2$, $\sigma_Y^2$ are unknown
Test equation: $\frac{\bar{x} - \bar{y} - a}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}}$; Z-table for critical value

Assumption: Data are normally distributed; $\sigma_X^2$, $\sigma_Y^2$ are unknown but $\sigma_X^2 = \sigma_Y^2$
Test equation: $\frac{\bar{x} - \bar{y} - a}{\sqrt{s_p^2 \left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}}$; t-table with df = $n_X + n_Y - 2$ for critical value
where the estimate of the pooled variance is
$s_p^2 = \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}$
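The test statistics can be computed with a few lines of Python. The sketch below is only illustrative: the function names, the two-sided rejection rule at level `alpha` and the example numbers are assumptions, and the z critical value is obtained from `NormalDist` instead of a printed Z-table.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, a, sigma, n, alpha=0.05):
    """Test H0: mu = a with sigma known; returns (z, critical value, reject?)."""
    z = (xbar - a) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test assumed
    return z, z_crit, abs(z) > z_crit


def t_stat_mean(xbar, a, s, n):
    """Statistic for H0: mu = a with sigma unknown; compare with t-table, df = n - 1."""
    return (xbar - a) / (s / math.sqrt(n))


def z_stat_two_means(xbar, ybar, a, var_x, nx, var_y, ny):
    """Statistic for H0: mu_X - mu_Y = a with independent samples.

    var_x and var_y are the known variances, or the sample variances
    when the variances are unknown.
    """
    return (xbar - ybar - a) / math.sqrt(var_x / nx + var_y / ny)


def pooled_t_stat(xbar, ybar, a, sx2, nx, sy2, ny):
    """Statistic for H0: mu_X - mu_Y = a with equal unknown variances.

    Compare with a t-table using nx + ny - 2 degrees of freedom.
    """
    sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)
    return (xbar - ybar - a) / math.sqrt(sp2 * (1 / nx + 1 / ny))


# Made-up numbers: xbar = 52, H0: mu = 50, sigma = 10, n = 100.
z, z_crit, reject = z_test_mean(52, 50, 10, 100)
print(z, z_crit, reject)
```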
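Covariance and correlation

Parameter: Covariance between variables X and Y
Population formula: $\mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$
Sample formula: $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{n-1}$

Parameter: Pearson’s correlation
Population formula: $\frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
Sample formula: $\frac{\text{sample covariance of } X \text{ and } Y}{s_X s_Y}$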
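A small Python sketch of the sample covariance and Pearson's correlation follows. The function names and data are made up for illustration, and the shortcut form $\left(\sum x_i y_i - n\bar{x}\bar{y}\right)/(n-1)$ is included only to show that it agrees with the definition.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance: sum of (x_i - xbar)(y_i - ybar), divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def sample_cov_shortcut(xs, ys):
    """Equivalent computational form: (sum of x_i*y_i - n*xbar*ybar) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / (n - 1)


def pearson_r(xs, ys):
    """Sample correlation: sample covariance divided by s_X * s_Y."""
    sx = math.sqrt(sample_cov(xs, xs))  # the covariance of X with itself is s_X^2
    sy = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sx * sy)


xs = [1, 2, 3, 4, 5]  # made-up illustrative data
ys = [1, 3, 2, 5, 4]
print(sample_cov(xs, ys), sample_cov_shortcut(xs, ys), pearson_r(xs, ys))
```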
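Simple linear regression

Model: $y_i = \alpha + \beta x_i + u_i$ for $i = 1, 2, \ldots, n$

OLS estimators

Slope: $\hat{\beta} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$
Intercept: $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

$\mathrm{var}(\hat{\alpha}) = \frac{\sigma^2 \sum x_i^2}{n\left(\sum x_i^2 - n\bar{x}^2\right)}$

$\mathrm{var}(\hat{\beta}) = \frac{\sigma^2}{\sum x_i^2 - n\bar{x}^2}$

Estimator for variance of error term u is
$\hat{\sigma}^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n-2}$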
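The OLS formulas for simple linear regression can be sketched in Python as below. The function name, the returned dictionary and the data are illustrative assumptions. Because $\sigma^2$ is unknown in practice, the sketch plugs the estimate $\hat{\sigma}^2$ into the variance formulas, which is a choice made here and not stated on the sheet.

```python
def ols_fit(xs, ys):
    """Simple linear regression y_i = alpha + beta * x_i + u_i fitted by OLS."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sum_x2 = sum(x * x for x in xs)
    sxx = sum_x2 - n * xbar ** 2                                 # sum x_i^2 - n*xbar^2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar   # sum x_i*y_i - n*xbar*ybar

    beta = sxy / sxx                  # slope
    alpha = ybar - beta * xbar        # intercept

    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)

    # Assumption: sigma^2 in the variance formulas is replaced by sigma2_hat.
    var_beta = sigma2_hat / sxx
    var_alpha = sigma2_hat * sum_x2 / (n * sxx)

    return {
        "alpha": alpha,
        "beta": beta,
        "sigma2_hat": sigma2_hat,
        "var_alpha": var_alpha,
        "var_beta": var_beta,
    }


xs = [1, 2, 3, 4, 5]  # made-up illustrative data
ys = [1, 3, 2, 5, 4]
fit = ols_fit(xs, ys)
print(fit)
```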