Virtual University of Pakistan Lecture No. 32 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah
IN THE LAST LECTURE, YOU LEARNT
• Sampling Distribution of $\bar{X}$
• Mean and Standard Deviation of the Sampling Distribution of $\bar{X}$
• Central Limit Theorem
TOPICS FOR TODAY
• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$
You will recall that, in the last lecture, we discussed the sampling distribution of $\bar{X}$. We discussed the mean and the standard deviation of the sampling distribution, and, towards the end of the lecture, we considered the very important theorem known as the Central Limit Theorem.
Let us now consider the real-life application of this concept with the help of an example:
EXAMPLE A construction company has 310 employees who have an average annual salary of Rs.24,000. The standard deviation of annual salaries is Rs.5,000.
Suppose that the employees of this company demand that the government should institute a law by which their average salary should be at least Rs. 24,500, and suppose that the government decides to check the validity of this demand by drawing a random sample of 100 employees of this company and acquiring information regarding their present salaries.
What is the probability that, in a random sample of 100 employees, the average salary will exceed Rs. 24,500 (so that the government decides that the demand of the employees of this company is unfounded, and hence does not pay attention to the demand, although, in reality, it was justified)?
SOLUTION The sample size (n = 100) is large enough to assume that the sampling distribution of $\bar{X}$ is approximately normal with the following mean and standard deviation:
$$\mu_{\bar{x}} = \mu = \text{Rs. } 24{,}000$$
and
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{5000}{\sqrt{100}} \sqrt{\frac{310-100}{310-1}} = \text{Rs. } 412.20$$
NOTE: Here we have used the finite population correction factor (fpc), because the sample size n = 100 is greater than 5 percent of the population size N = 310.
Since $\bar{X}$ is approximately N(24000, 412.20), therefore
$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - 24000}{412.20}$$
is approximately N(0, 1).
We are required to evaluate $P(\bar{X} > 24{,}500)$. At $\bar{x} = 24{,}500$, we find that
$$z = \frac{24500 - 24000}{412.20} = 1.21$$
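[Figure: normal curve with the $\bar{X}$-axis marked at 24000 and 24500 and the corresponding Z-axis marked at 0 and 1.21.]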
Using the table of areas under the standard normal curve, we find that the area between z = 0 and z = 1.21 is 0.3869.
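[Figure: the same normal curve with the area 0.3869 shaded between $\bar{X}$ = 24000 and 24500, i.e. between Z = 0 and Z = 1.21.]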
Hence, $P(\bar{X} > 24{,}500) = P(Z > 1.21) = 0.5 - P(0 < Z < 1.21) = 0.5 - 0.3869 = 0.1131$.
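[Figure: the same normal curve with the area 0.3869 between Z = 0 and Z = 1.21 and the upper-tail area 0.1131 beyond Z = 1.21 ($\bar{X}$ = 24500).]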
Hence, the chances are only 11% that, in a random sample of 100 employees from this particular construction company, the average salary will exceed Rs. 24,500. In other words, the chances are 89% that, in such a sample, the average salary will not exceed Rs. 24,500.
Hence, the chances are considerably high that the government might pay attention to the employees' demand.
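The arithmetic of this example can be checked with a short Python sketch. It is only an illustration: the helper name `standard_error_mean` is our own, and the standard library's `statistics.NormalDist` stands in for the printed area table. As in the lecture, z is rounded to two decimals before the area is looked up.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean (hypothetical helper).

    When a finite population size N is supplied, the finite
    population correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


# Figures taken from the construction-company example.
se = standard_error_mean(sigma=5000, n=100, N=310)
z = (24500 - 24000) / se
z_table = round(z, 2)  # the area table is read at two decimals

# Upper-tail area P(Z > z_table) under the standard normal curve.
p_exceed = 1 - NormalDist().cdf(z_table)

print(f"standard error  = {se:.2f}")
print(f"z (rounded)     = {z_table}")
print(f"P(mean > 24500) = {p_exceed:.4f}")
```

Passing `N=None` drops the correction factor, which corresponds to sampling with replacement or from a very large population.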
Next, we consider the SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION:
In this regard, the first point to be noted is that, whenever the elements of a population can be classified into two categories, technically called “success” and “failure”, we may be interested in the proportion of “successes” in the population.
If X denotes the number of successes in the population, then the proportion of successes in the population is given by
$$p = \frac{X}{N}.$$
Similarly, if we draw a sample of size n from the population, the proportion of successes in the sample is given by
$$\hat{p} = \frac{X}{n},$$
where X represents the number of successes in the sample.
It is interesting to note that X is a binomial random variable and the binomial parameter p is being called a proportion of successes here.
The sample proportion $\hat{p}$ has different values in different samples. It is obviously a random variable and has a probability distribution.
This probability distribution of the proportions of successes in all possible random samples of size n is called the sampling distribution of $\hat{p}$.
We illustrate this sampling distribution with the help of the following examples:
EXAMPLE-1 A population consists of six values 1, 3, 6, 8, 9 and 12. Draw all possible samples of size n = 3 without replacement from the population and find the proportion of even numbers in each sample.
Construct the sampling distribution of sample proportions and verify that
i) $\mu_{\hat{p}} = p$
ii) $\mathrm{Var}(\hat{p}) = \frac{pq}{n} \cdot \frac{N-n}{N-1}$
where $q = 1 - p$, and $\hat{p}$ and $p$ are the sample and population proportions respectively.
SOLUTION The number of possible samples of size n = 3 that could be selected without replacement from a population of size N = 6 is $\binom{6}{3} = 20$.
Let $\hat{p}$ represent the proportion of even numbers in the sample. Then the 20 possible samples and the proportion of even numbers are given as follows:
Sample No.    Sample Data    Sample Proportion ($\hat{p}$)
 1            1, 3, 6        1/3
 2            1, 3, 8        1/3
 3            1, 3, 9        0
 4            1, 3, 12       1/3
 5            1, 6, 8        2/3
 6            1, 6, 9        1/3
 7            1, 6, 12       2/3
 8            1, 8, 9        1/3
 9            1, 8, 12       2/3
10            1, 9, 12       1/3
11            3, 6, 8        2/3
12            3, 6, 9        1/3
13            3, 6, 12       2/3
14            3, 8, 9        1/3
15            3, 8, 12       2/3
16            3, 9, 12       1/3
17            6, 8, 9        2/3
18            6, 8, 12       1
19            6, 9, 12       2/3
20            8, 9, 12       2/3
The sampling distribution of sample proportion is given below:
Sampling Distribution of $\hat{p}$:
$\hat{p}$    No. of Samples    Probability $f(\hat{p})$    $\hat{p} f(\hat{p})$    $\hat{p}^2 f(\hat{p})$
0            1                 1/20                        0                       0
1/3          9                 9/20                        3/20                    1/20
2/3          9                 9/20                        6/20                    4/20
1            1                 1/20                        1/20                    1/20
$\Sigma$     20                1                           10/20                   6/20
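Now
$$\mu_{\hat{p}} = \sum \hat{p} f(\hat{p}) = \frac{10}{20} = 0.5,$$
and
$$\sigma^2_{\hat{p}} = \sum \hat{p}^2 f(\hat{p}) - \left[\sum \hat{p} f(\hat{p})\right]^2 = \frac{6}{20} - \left(\frac{10}{20}\right)^2 = \frac{1}{20} = 0.05.$$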
To verify the given relations, we first calculate the population proportion p. Thus:
$$p = \frac{X}{N},$$
where X represents the number of even numbers in the population. In other words,
$$p = \frac{3}{6} = 0.5.$$
Hence, we find that $\mu_{\hat{p}} = 0.5 = p$, and
$$\frac{pq}{n} \cdot \frac{N-n}{N-1} = \frac{0.25}{3} \cdot \frac{6-3}{6-1} = \frac{0.25}{5} = 0.05 = \mathrm{Var}(\hat{p}).$$
Hence, two properties of the sampling distribution of $\hat{p}$ are verified.
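The enumeration in Example-1 can be reproduced with a brief Python sketch. This is only an illustrative check: exact fractions from the standard library are used so that the comparison with the formula involves no rounding, and all variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 6, 8, 9, 12]
N = len(population)
n = 3

# All samples of size n drawn without replacement (order ignored).
samples = list(combinations(population, n))

# Proportion of even numbers in each sample, kept as an exact fraction.
proportions = [Fraction(sum(x % 2 == 0 for x in s), n) for s in samples]

# Sampling distribution: how many samples give each value of p-hat.
counts = Counter(proportions)
total = len(samples)

mean = sum(p_hat * c for p_hat, c in counts.items()) / total
var = sum(p_hat**2 * c for p_hat, c in counts.items()) / total - mean**2

# Population proportion and the variance formula with the fpc.
p = Fraction(sum(x % 2 == 0 for x in population), N)
q = 1 - p
var_formula = (p * q / n) * Fraction(N - n, N - 1)

for p_hat in sorted(counts):
    print(p_hat, counts[p_hat])
print("mean:", mean, " variance:", var, " formula:", var_formula)
```

Because the sampling here is without replacement from a small population, the factor (N - n)/(N - 1) is essential; without it the formula would give pq/n = 1/12 rather than 1/20.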
The sampling distribution of $\hat{p}$ has the following important properties:
PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\hat{p}$
Property No. 1: The mean of the sampling distribution of proportions, denoted by $\mu_{\hat{p}}$, is equal to the population proportion p, that is, $\mu_{\hat{p}} = p$.
Property No. 2: The standard deviation of the sampling distribution of proportions, called the standard error of $\hat{p}$ and denoted by $\sigma_{\hat{p}}$, is given as:
a) $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, when the sampling is performed with replacement;
b) $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$, when sampling is done without replacement from a finite population.
(As in the case of the sampling distribution of $\bar{X}$, $\sqrt{\frac{N-n}{N-1}}$ is known as the finite population correction factor (fpc).)
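As a worked illustration of Property No. 2, the two formulas can be evaluated for the figures of Example-1 (p = q = 0.5, n = 3, N = 6). The decimal values are rounded to four places.

```latex
% (a) sampling with replacement
\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.5)(0.5)}{3}} \approx 0.2887

% (b) sampling without replacement from the finite population (N = 6)
\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}
                 = \sqrt{\frac{0.25}{3}} \sqrt{\frac{6-3}{6-1}}
                 = \sqrt{0.05} \approx 0.2236
```

The second value is the square root of the variance 0.05 obtained in Example-1, where the samples were drawn without replacement.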
Property No. 3: SHAPE OF THE DISTRIBUTION:
The sampling distribution of $\hat{p}$ is the binomial distribution. However, for sufficiently large sample sizes, the sampling distribution of $\hat{p}$ is approximately normal.
As n → ∞, the sampling distribution of $\hat{p}$ approaches normality, with
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}.$$
As a rule of thumb, the sampling distribution of $\hat{p}$ will be approximately normal whenever both np and nq are equal to or greater than 5.
Let us apply this concept to a real-world situation:
EXAMPLE-2 Ten percent of the 1-kilogram boxes of sugar in a large warehouse are underweight. Suppose a retailer buys a random sample of 144 of these boxes. What is the probability that at least 5 percent of the sample boxes will be underweight?
SOLUTION Here the statistic is the sample proportion $\hat{p}$. The sample size (n = 144) is large enough to assume that the sample proportion $\hat{p}$ is approximately normally distributed, with
Mean of the sampling distribution of $\hat{p}$: $\mu_{\hat{p}} = p = 0.10$
and
Standard error of $\hat{p}$: $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.10)(0.90)}{144}} = 0.025$
Therefore, the sampling distribution of $\hat{p}$ is approximately N(0.10, 0.025), and hence
$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{\hat{p} - 0.10}{0.025}$$
is approximately N(0, 1).
We are required to find the probability that the proportion of underweight boxes in the sample is equal to or greater than 5%, i.e., we require $P(\hat{p} \ge 0.05)$.
In this regard, a very important point to be noted is that, just as we use a continuity correction of $\pm \frac{1}{2}$ whenever we consider the normal approximation to the binomially distributed random variable X, in this situation, since $\hat{p} = X/n$, we need to use a continuity correction of $\pm \frac{1}{2n}$ in the case of the sampling distribution of $\hat{p}$.
Applying the continuity correction in this problem, we have:
$$P(\hat{p} \ge 0.05) \Rightarrow P\left(\hat{p} \ge 0.05 - \frac{1}{(2)(144)}\right) = P\left(\hat{p} \ge 0.05 - \frac{1}{288}\right)$$
$$= P\left(\frac{\hat{p} - 0.10}{0.025} \ge \frac{(0.05 - 1/288) - 0.10}{0.025}\right) = P(Z \ge -2.14)$$
$$= P(-2.14 \le Z \le 0) + P(0 \le Z < \infty) = 0.4838 + 0.5 = 0.9838$$
(using the area table of the standard normal distribution)
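[Figure: normal curve of $\hat{p}$ centred at 0.10 (Z = 0), with the area 0.4838 between Z = -2.14 and Z = 0 and the area 0.5 to the right of Z = 0.]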
Hence, the probability that at least 5% of the sample boxes are underweight is as high as 98%!
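The steps of Example-2 can be sketched in Python as follows. This is an illustrative check only: `statistics.NormalDist` replaces the printed area table, the variable names are our own, and the continuity correction of 1/(2n) is applied exactly as in the lecture.

```python
from math import sqrt
from statistics import NormalDist

p = 0.10   # population proportion of underweight boxes
n = 144    # sample size
q = 1 - p

# Rule of thumb: the normal approximation is reasonable
# when both np and nq are at least 5.
normal_ok = n * p >= 5 and n * q >= 5

# Standard error of p-hat (large warehouse, so no fpc is applied).
se = sqrt(p * q / n)

# Continuity-corrected cutoff for the event p-hat >= 0.05.
cutoff = 0.05 - 1 / (2 * n)
z = (cutoff - p) / se

# P(Z >= z) under the standard normal curve.
prob = 1 - NormalDist().cdf(z)

print(f"np = {n * p:.1f}, nq = {n * q:.1f}, normal approximation ok: {normal_ok}")
print(f"standard error = {se:.3f}")
print(f"z = {z:.2f}")
print(f"P(p-hat >= 0.05) = {prob:.4f}")
```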
The sampling distributions of $\bar{X}$ and $\hat{p}$ pertain to the situation when we are drawing all possible samples of a particular size from one particular population.
Next, we will discuss the case when we are dealing with all possible samples drawn from two populations, such that the samples from the two populations are independent. In this regard, we will consider the sampling distributions of $\bar{X}_1 - \bar{X}_2$ and $\hat{p}_1 - \hat{p}_2$.
We begin with the sampling distribution of $\bar{X}_1 - \bar{X}_2$:
SAMPLING DISTRIBUTION OF DIFFERENCES BETWEEN MEANS:
Suppose we have two distinct populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.
Let independent random samples of sizes $n_1$ and $n_2$ be selected from the respective populations, and the differences $\bar{x}_1 - \bar{x}_2$ between the means of all possible pairs of samples be computed.
Then, a probability distribution of the differences $\bar{X}_1 - \bar{X}_2$ can be obtained. Such a distribution is called the sampling distribution of the differences of sample means $\bar{X}_1 - \bar{X}_2$.
We illustrate the sampling distribution of $\bar{X}_1 - \bar{X}_2$ with the help of the following example:
EXAMPLE Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 4, 6, 8. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting of 1, 2, 3.
a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
SOLUTION:
Whenever we are sampling with replacement from a finite population, the total number of possible samples is $N^n$ (where N is the population size, and n is the sample size).
Hence, in this example, there are $3^2 = 9$ possible samples which can be drawn with replacement from each population. These two sets of samples and their means are given below:
From Population 1                            From Population 2
Sample No.   Sample Value   $\bar{x}_1$      Sample No.   Sample Value   $\bar{x}_2$
1            4, 4           4                1            1, 1           1.0
2            4, 6           5                2            1, 2           1.5
3            4, 8           6                3            1, 3           2.0
4            6, 4           5                4            2, 1           1.5
5            6, 6           6                5            2, 2           2.0
6            6, 8           7                6            2, 3           2.5
7            8, 4           6                7            3, 1           2.0
8            8, 6           7                8            3, 2           2.5
9            8, 8           8                9            3, 3           3.0
a) Since there are 9 samples from the first population as well as 9 from the second, there are 81 possible combinations of $\bar{x}_1$ and $\bar{x}_2$. The 81 possible differences $\bar{x}_1 - \bar{x}_2$ are presented in the following table (rows are values of $\bar{x}_1$, columns are values of $\bar{x}_2$):
$\bar{x}_1$ \ $\bar{x}_2$   1.0   1.5   2.0   1.5   2.0   2.5   2.0   2.5   3.0
4                           3.0   2.5   2.0   2.5   2.0   1.5   2.0   1.5   1.0
5                           4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
5                           4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                           6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                           6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
8                           7.0   6.5   6.0   6.5   6.0   5.5   6.0   5.5   5.0
b) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is as follows:
$d = \bar{x}_1 - \bar{x}_2$   f     Probability $f(d)$   $d\,f(d)$   $d^2 f(d)$
1.0                           1     1/81                 1/81        1.0/81
1.5                           2     2/81                 3/81        4.5/81
2.0                           5     5/81                 10/81       20.0/81
2.5                           6     6/81                 15/81       37.5/81
3.0                           10    10/81                30/81       90.0/81
3.5                           10    10/81                35/81       122.5/81
4.0                           13    13/81                52/81       208.0/81
4.5                           10    10/81                45/81       202.5/81
5.0                           10    10/81                50/81       250.0/81
5.5                           6     6/81                 33/81       181.5/81
6.0                           5     5/81                 30/81       180.0/81
6.5                           2     2/81                 13/81       84.5/81
7.0                           1     1/81                 7/81        49.0/81
Total                         81    1                    324/81      1431/81
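Thus the mean and the variance are
$$\mu_{\bar{x}_1 - \bar{x}_2} = \sum (\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2) = \sum d\,f(d) = \frac{324}{81} = 4,$$
and
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sum d^2 f(d) - \left[\sum d\,f(d)\right]^2 = \frac{1431}{81} - \left(\frac{324}{81}\right)^2 = \frac{53}{3} - 16 = \frac{5}{3} = 1.67.$$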
c) In order to verify the properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, we first need to compute the mean and variance of each population.
The mean and variance of the first population are:
$$\mu_1 = \frac{4 + 6 + 8}{3} = 6, \qquad \sigma_1^2 = \frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3} = \frac{8}{3}.$$
The mean and variance of the second population are:
$$\mu_2 = \frac{1 + 2 + 3}{3} = 2, \qquad \sigma_2^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}.$$
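Now $\mu_{\bar{x}_1 - \bar{x}_2} = 4 = 6 - 2 = \mu_1 - \mu_2$, and
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{4}{3} + \frac{1}{3} = \frac{5}{3} = 1.67 = \sigma^2_{\bar{x}_1 - \bar{x}_2}.$$
Hence, two properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are satisfied.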
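The 81 differences and the two verified properties can be reproduced with a short Python sketch. It is offered only as an illustration: the helper names `sample_means` and `mean_var` are our own, and exact fractions are used so that 5/3 is compared without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pop1, pop2 = [4, 6, 8], [1, 2, 3]
n1 = n2 = 2


def sample_means(population, n):
    """Means of all ordered samples of size n drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(population, repeat=n)]


def mean_var(values):
    """Mean and variance of equally likely values (divisor = count)."""
    m = Fraction(sum(values)) / len(values)
    v = Fraction(sum((x - m) ** 2 for x in values)) / len(values)
    return m, v


means1 = sample_means(pop1, n1)   # 9 sample means from population 1
means2 = sample_means(pop2, n2)   # 9 sample means from population 2

# Every pairing of a mean from population 1 with one from population 2.
diffs = [a - b for a in means1 for b in means2]
dist = Counter(diffs)             # frequency f of each difference d

mu_d, var_d = mean_var(diffs)
mu1, var1 = mean_var(pop1)
mu2, var2 = mean_var(pop2)

for d in sorted(dist):
    print(float(d), dist[d])
print("mean of differences:", mu_d, "=", mu1 - mu2)
print("variance of differences:", var_d, "=", var1 / n1 + var2 / n2)
```

Enumerating ordered samples with `product` mirrors the lecture's listing, in which (4, 6) and (6, 4) are counted as separate samples.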
The sampling distribution of the differences $\bar{X}_1 - \bar{X}_2$ has the following properties:
PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\bar{X}_1 - \bar{X}_2$
Property No. 1: The mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, denoted by $\mu_{\bar{X}_1 - \bar{X}_2}$, is equal to the difference between population means, that is
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2.$$
Property No. 2: In case of sampling with or without replacement from two infinite populations, the standard deviation of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ (i.e. the standard error of $\bar{X}_1 - \bar{X}_2$), denoted by $\sigma_{\bar{X}_1 - \bar{X}_2}$, is given by
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
The above expression for the standard error of $\bar{X}_1 - \bar{X}_2$ also holds for finite populations when sampling is performed with replacement.
In case of sampling without replacement from a finite population, the formula for the standard error of $\bar{X}_1 - \bar{X}_2$ will be suitably modified.
Property No. 3: Shape of the distribution:
a) If the POPULATIONS are normally distributed, the sampling distribution of $\bar{X}_1 - \bar{X}_2$, regardless of sample sizes, will be normal with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
In other words, the variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
is normally distributed with zero mean and unit variance.
b) If the POPULATIONS are non-normal and if both sample sizes are large (i.e., greater than or equal to 30), then the sampling distribution of the differences between means is approximately a normal distribution by the Central Limit Theorem.
In this case too, the variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
will be approximately normally distributed with mean zero and variance one.
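A minimal Python sketch of how this standardised variable might be computed is given below. The function name `z_diff_means` and all the numerical inputs are hypothetical, chosen only to illustrate the formula; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_diff_means(xbar1, xbar2, mu1, mu2, var1, var2, n1, n2):
    """Standardise an observed difference of two sample means.

    Assumes independent samples and the standard error
    sqrt(var1/n1 + var2/n2) (infinite populations, or sampling
    with replacement from finite ones).
    """
    se = sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - (mu1 - mu2)) / se


# Hypothetical inputs, for illustration only (not from the lecture).
z = z_diff_means(xbar1=52.0, xbar2=49.0,
                 mu1=50.0, mu2=48.0,
                 var1=36.0, var2=64.0,
                 n1=36, n2=64)

# Probability of a difference at least this large, using N(0, 1).
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}")
print(f"P(Z > z) = {upper_tail:.4f}")
```

With both sample sizes at least 30, as in these hypothetical inputs, the normal approximation of part (b) is the one being relied upon.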
IN TODAY’S LECTURE, YOU LEARNT
• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$
IN THE NEXT LECTURE, YOU WILL LEARN
• Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Point Estimation
• Desirable Qualities of a Good Point Estimator
– Unbiasedness
– Consistency
– Efficiency
Methods of Point Estimation:
• The Method of Moments,