Virtual University of Pakistan Lecture No. 35 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah
IN THE LAST LECTURE, YOU LEARNT
•Desirable Qualities of a Good Point Estimator:
  •Efficiency
•Methods of Point Estimation:
  •The Method of Moments
  •The Method of Least Squares
  •The Method of Maximum Likelihood
•Interval Estimation:
  •Confidence Interval for µ
TOPICS FOR TODAY
•Confidence Interval for µ (continued)
•Confidence Interval for µ₁ − µ₂
In the last lecture, we discussed the construction of the 95% confidence interval for the mean of a population, i.e. µ.
Let us now apply this concept to an example:
EXAMPLE-1:
Consider a car assembly plant employing something over 25,000 men. In planning its future labour requirements, the management wants an estimate of the number of days lost per man each year due to illness or absenteeism. A random sample of 500 employment records shows the following situation:

Number of Days Lost    Number of Employees
None                    48
1 or 2                  43
3 or 4                  90
5 or 6                 186
7 or 8                  78
9 to 12                 34
13 to 20                21
Total                  500
Construct a 95% confidence interval for the mean number of days lost per man each year due to illness or absenteeism.
SOLUTION
1. The point estimate of µ is $\bar{X}$, which in this example comes out to be $\bar{X} = 5.38$ days.
2. In order to construct a confidence interval for µ, we need to compute s, which in this example comes out to be s = 3.53 days.
Hence, the 95% confidence interval for µ comes out to be
$$5.38 - \frac{1.96 \times 3.53}{\sqrt{500}}, \quad 5.38 + \frac{1.96 \times 3.53}{\sqrt{500}}$$
or 5.38 ± 0.31 days = 5.07 days to 5.69 days.
In other words, we can say that the mean number of days lost per man each year due to illness or absenteeism lies somewhere between 5.07 days and 5.69 days, and this statement is being made on the basis of 95% confidence.
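The calculation above can be sketched in Python. This is only an illustration: the lecture does not state how the mean and s were computed from the grouped table, so the class midpoints used here (0 for "None", 1.5 for "1 or 2", and so on up to 16.5 for "13 to 20") are an assumption, as are the function and variable names. With these midpoints the mean comes out at about 5.38 and s at roughly 3.52, very close to the lecture's 3.53.

```python
import math
from statistics import NormalDist

# Assumed class midpoints for "None", "1 or 2", ..., "13 to 20" (not stated in the lecture)
midpoints = [0.0, 1.5, 3.5, 5.5, 7.5, 10.5, 16.5]
employees = [48, 43, 90, 186, 78, 34, 21]


def grouped_mean_sd(values, freqs):
    """Return (n, mean, sample standard deviation) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return n, mean, math.sqrt(sum_sq / (n - 1))


n, xbar, s = grouped_mean_sd(midpoints, employees)

# z-value leaving 2.5% in each tail (approximately 1.96)
z = NormalDist().inv_cdf(0.975)
half_width = z * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"95% C.I.: {lower:.2f} to {upper:.2f} days")
```

Rounded to two decimal places, the limits agree with the interval of 5.07 to 5.69 days obtained above.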
A very important point to be noted here is that we should be very careful regarding the interpretation of confidence intervals:
When we set 1 − α = 0.95, it means that the probability is 95% that the interval from
$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$
will actually contain the true population mean µ.
In other words, if we construct a large number of intervals of this type, corresponding to the large number of samples that we can draw from any particular population, then out of every 100 such intervals, on average 95 will contain the true population mean µ whereas 5 will not.
The above statement pertains to the overall situation in repeated sampling. Once a sample has actually been chosen from a population, $\bar{X}$ computed, and the interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ constructed, then this interval either contains µ, or does not contain µ.
So, the probability that our interval, corresponding to sample values that have actually occurred, contains µ is either one (i.e. 100 per cent) or zero. The statement of 95% probability is valid before any sample has actually materialized.
In other words, we can say that our procedure of interval estimation is such that, in repeated sampling, 95% of the intervals will contain µ.
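The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is not part of the lecture: the population (normal with mean 5.4 and standard deviation 3.5), the sample size, the number of trials and the function name `coverage` are all hypothetical choices made only for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # Each individual interval either contains mu or it does not
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


rate = coverage(mu=5.4, sigma=3.5, n=50, trials=2000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

Each simulated interval either contains µ or does not; it is only the long-run proportion of intervals containing µ that should come out close to 0.95.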
The above example pertained to the 95% confidence interval for µ.
In general, the lower and upper limits of the confidence interval for µ are given by
$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$$
where the value of zα/2 depends on how much confidence we want to have in our interval estimate.
[Figure: standard normal curve (Z) with central area 1 − α between −zα/2 and zα/2, and area α/2 in each tail.]
The above situation leads to the (1 − α)100% C.I. for µ.
If (1 − α) = 0.95, then zα/2 = 1.96,
whereas if (1 − α) = 0.99, then zα/2 = 2.58,
and if (1 − α) = 0.90, then zα/2 = 1.645.
(The above values of zα/2 are easily obtained from the area table of the standard normal distribution.)
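As an alternative to the area table, these values can be obtained from the inverse of the standard normal distribution function. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_value` is only an illustrative choice.

```python
from statistics import NormalDist


def z_value(confidence):
    """Return z_(alpha/2) for a given confidence level 1 - alpha."""
    alpha = 1 - confidence
    # The area to the left of z_(alpha/2) is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"1 - alpha = {level:.2f}  ->  z = {z_value(level):.3f}")
```

Rounded to the precision used in the lecture, these agree with 1.645, 1.96 and 2.58.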
An important point to note is that, as indicated earlier, the above formula for the confidence interval is valid when we are sampling from an infinite population in such a way that the sample size n is large.
How large should n be in a practical situation? The rule of thumb in this regard is that whenever n ≥ 30, we can use the above formula.
Confidence Interval for µ, the Mean of an Infinite Population:
For large n (n ≥ 30), the confidence interval is given by
$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$$
where
$$\bar{x} = \frac{\sum x}{n}$$
is the sample mean, and
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
is the sample standard deviation.
Let us consolidate the idea by looking at a few more examples:
EXAMPLE-1
The Punjab Highway Department is studying the traffic pattern on the G.T. Road near Lahore. As part of the study, the department needs to estimate the average number of vehicles that pass the Ravi bridge each day.
A random sample of 64 days gives $\bar{X} = 5410$ and s = 680. Find the 90 per cent confidence interval estimate for µ, the average number of vehicles per day.
SOLUTION
The 90% confidence interval for µ is
$$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}},$$
where $\bar{x} = 5410$, s = 680, n = 64 and $z_{0.05} = 1.645$.
Substituting these values, we obtain
$$5410 \pm (1.645) \frac{680}{\sqrt{64}}$$
or 5410 ± (1.645)(85)
or 5410 ± 139.8
or 5270.2 to 5549.8
or, rounding the above two figures correct to the nearest whole number, we have: 5270 to 5550.
Hence, we can say that the average number of vehicles that pass the Ravi bridge each day lies somewhere between 5270 and 5550, and this statement is being made on the basis of 90% confidence.
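The same arithmetic can be checked with a short Python sketch that works directly from the summary figures (sample mean, s and n). The function name `mean_ci` is an illustrative assumption, and the z-value is taken from the standard normal distribution rather than from a table.

```python
import math
from statistics import NormalDist


def mean_ci(xbar, s, n, confidence):
    """Large-sample confidence interval for a mean: xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Ravi bridge traffic example: xbar = 5410, s = 680, n = 64, 90% confidence
lower, upper = mean_ci(5410, 680, 64, 0.90)
print(f"90% C.I.: {lower:.1f} to {upper:.1f} vehicles per day")
```

Rounded to the nearest whole number, the limits are 5270 and 5550, as obtained above.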
EXAMPLE-2
Suppose a car rental firm wants to estimate the average number of miles traveled per day by each of its cars rented in one particular city.
A random sample of 110 cars rented in this particular city reveals that the mean travel distance per day is 85.5 miles, with a standard deviation of 19.3 miles.
Compute a 99% confidence interval to estimate µ.
SOLUTION
Here, n = 110, $\bar{X}$ = 85.5, and S = 19.3. For a 99% level of confidence, a z-value of 2.575 is obtained.
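The confidence interval is
$$\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}$$
$$85.5 - 2.575 \frac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575 \frac{19.3}{\sqrt{110}}$$
$$85.5 - 4.7 \le \mu \le 85.5 + 4.7$$
$$80.8 \le \mu \le 90.2$$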
The point estimate indicates that the average number of miles traveled per day by a rental car in this particular city is 85.5. With 99% confidence, we estimate that the population mean is somewhere between 80.8 and 90.2 miles per day.
Next, we consider a very interesting and important way of interpreting a confidence interval:
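An Important Way of Interpreting a Confidence Interval:
Because of the fact that $\sigma_{\bar{x}}$ is equal to $\sigma/\sqrt{n}$, the expression
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
is equal to
$$\bar{x} \pm z_{\alpha/2}\, \sigma_{\bar{x}}$$
(where $\sigma_{\bar{x}}$ represents the standard error of $\bar{X}$).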
Hence:
The C.I. for µ can be defined as $\bar{X}$ ± a certain number of standard errors of $\bar{X}$.
Defining a Confidence Interval as: “A point estimate plus/minus a few times the standard error of that estimate”, the question arises: “How many times?”
The answer is: That depends on the level of confidence that we wish to have.
In the case of 99% confidence, zα/2 ≈ 2.5 (so that, in this case, we can say that our confidence interval is $\bar{x} \pm 2\tfrac{1}{2}\, \sigma_{\bar{x}}$);
Similarly, in the case of 95% confidence, zα/2 ≈ 2 (so that, in this case, we can say that our confidence interval is $\bar{x} \pm 2\, \sigma_{\bar{x}}$);
and so on.
Another important point to be noted is that:
It is a matter of common sense that, in any situation, the narrower our confidence interval, the better. (Ideally, the width of a confidence interval should be zero --- i.e. we should simply have a point estimate.)
It would be quite unwise to say: “I am 99.999% confident that the mean height of the adult males of this particular city lies somewhere between 4 feet and 12 feet.”
The important question is :
How do we achieve a narrow confidence interval with a high level of confidence?
To answer this question, we should have a closer look at the expression of the confidence interval :
$$\bar{x} \pm z_{\alpha/2}\, \sigma_{\bar{x}}$$
This expression shows clearly that if the quantity $z_{\alpha/2}\, \sigma_{\bar{x}}$ is small, we will achieve a narrow confidence interval. This quantity will be small if either $\sigma_{\bar{x}}$ is small or $z_{\alpha/2}$ is small.
Now, $\sigma_{\bar{x}}$ is equal to $\sigma/\sqrt{n}$, and hence $\sigma_{\bar{x}}$ will be small if the sample size n is large.
On the other hand, $z_{\alpha/2}$ will be small if the level of confidence 1 − α is relatively low.
As far as the first point, that of n being large, is concerned, it should be noted that, in many real-life situations, due to practical constraints, we cannot increase the sample size beyond a certain limit.
(We may not have the resources to be able to draw a relatively large sample --- our budget may be limited, the time period at our disposal may be short, etc.)
As far as the second point, that of fixing a relatively low level of confidence, is concerned, this is in our own hands, and we can fix our level of confidence as low as we wish --- but, obviously, it will not make much sense to say:
“I have estimated that the mean height of adult males of this particular city lies somewhere between 5 feet, 6 inches and 5 feet, 7 inches, and I am saying this with 20% confidence.”
The gist of the above discussion is that, in any real-life situation, given a particular sample size, we need to strike a compromise between how low a level of confidence we can tolerate and how wide an interval we can tolerate.
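The compromise can be made concrete by tabulating the half-width zα/2 · s/√n for a few sample sizes and confidence levels. The sketch below is illustrative only: it borrows s = 3.53 from the car assembly plant example, and the sample sizes, the confidence levels and the function name `half_width` are arbitrary assumptions.

```python
import math
from statistics import NormalDist


def half_width(s, n, confidence):
    """Half-width of the large-sample C.I. for a mean: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / math.sqrt(n)


s = 3.53  # sample standard deviation borrowed from the first example
levels = (0.90, 0.95, 0.99)

print("     n " + "".join(f"{level:>9.0%}" for level in levels))
for n in (125, 500, 2000):
    row = "".join(f"{half_width(s, n, level):9.3f}" for level in levels)
    print(f"{n:6d} {row}")
```

Reading across a row, a higher level of confidence widens the interval; reading down a column, quadrupling the sample size only halves the width, which is why a narrow interval with very high confidence can be expensive to obtain.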
Next, we consider the confidence interval for the difference between two population means, i.e. µ₁ − µ₂:
Confidence Interval for the difference between the means of two Populations (i.e. µ₁ − µ₂):
For large samples drawn independently from two populations, the C.I. for µ₁ − µ₂ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where subscript 1 denotes the first population, and subscript 2 denotes the second population.
We illustrate this concept with the help of a few examples:
EXAMPLE-1
The means and variances of the weekly incomes in rupees of two samples of workers are given in the following table, the samples being randomly drawn from two different factories:

Factory    Sample Size    Mean     Variance
A          160            12.80    64
B          220            11.25    47
Calculate the 90% confidence interval for the real difference in the incomes of the workers from the two factories.
SOLUTION
1. If both n1 and n2 are large, the confidence limits are given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
2. We know that zα/2 = 1.645 for 90% confidence.
[Figure: standard normal curve (Z) with central area 0.90 between −zα/2 = −1.645 and zα/2 = 1.645, and area 0.05 in each tail.]
3. Hence, substituting the values in the formula, we obtain
$$(12.80 - 11.25) \pm 1.645 \sqrt{\frac{64}{160} + \frac{47}{220}}$$
or $1.55 \pm 1.645 \sqrt{0.4 + 0.21}$
or $1.55 \pm 1.645 \sqrt{0.61}$
or 1.55 ± 1.28
or 0.27 and 2.83
Hence we can say that we are 90% confident that, on the average, the difference in the incomes of the workers from the two factories lies somewhere between Rs.0.27 and Rs.2.83.
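A short Python sketch of the same calculation is given below. The function name `diff_ci` and its argument order are illustrative assumptions. Note that the code carries the intermediate terms at full precision, whereas the working above rounds 47/220 to 0.21, so the computed limits may differ from 0.27 and 2.83 by about 0.01.

```python
import math
from statistics import NormalDist


def diff_ci(x1, var1, n1, x2, var2, n2, confidence):
    """Large-sample C.I. for mu1 - mu2, given sample means, variances and sizes."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - z * std_error, diff + z * std_error


# Factory A: n = 160, mean = 12.80, variance = 64
# Factory B: n = 220, mean = 11.25, variance = 47
lower, upper = diff_ci(12.80, 64, 160, 11.25, 47, 220, 0.90)
print(f"90% C.I. for the difference: Rs. {lower:.2f} to Rs. {upper:.2f}")
```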
EXAMPLE-2
Suppose a study is conducted in a developed country to estimate the difference between middle-income shoppers and low-income shoppers in terms of the average amount saved on grocery bills per week by using coupons.
Random samples of 60 middle-income shoppers and 80 low-income shoppers are taken, and their purchases are monitored for 1 week. The average amounts saved with coupons, as well as sample sizes and sample standard deviations, are given below:
                             Middle-Income Shoppers    Low-Income Shoppers
Sample size                  n1 = 60                   n2 = 80
Sample mean                  X̄1 = $5.84                X̄2 = $2.67
Sample standard deviation    S1 = $1.41                S2 = $0.54
Use this information to construct a 98% confidence interval to estimate the difference between the mean amounts saved with coupons by middle-income shoppers and low-income shoppers.
SOLUTION
The value of zα/2 associated with a 98% level of confidence is 2.33.
[Figure: standard normal curve (Z) with central area 0.98 between −zα/2 = −2.33 and zα/2 = 2.33, and area 0.01 in each tail.]
Using this value, we can determine the confidence interval as follows:
$$(5.84 - 2.67) - 2.33 \sqrt{\frac{1.41^2}{60} + \frac{0.54^2}{80}} \le \mu_1 - \mu_2 \le (5.84 - 2.67) + 2.33 \sqrt{\frac{1.41^2}{60} + \frac{0.54^2}{80}}$$
$$3.17 - 0.45 \le \mu_1 - \mu_2 \le 3.17 + 0.45$$
$$2.72 \le \mu_1 - \mu_2 \le 3.62$$
Hence, the 98% confidence interval for the difference between the mean amounts saved with coupons by middle-income shoppers and low-income shoppers is: ($2.72, $3.62)
The point estimate for the difference in mean savings is $3.17. Note that a zero difference in the population means of these two groups is unlikely, because the number zero is not in the 98% range.
The data seems to provide a strong indication that, on the average, the middle income shoppers are saving a little more than the low income shoppers.
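The remark that zero lies outside the interval can also be checked programmatically. The sketch below is only an illustration with hypothetical names (`diff_ci_from_sd`, `zero_inside`); it takes the sample standard deviations as inputs and uses the z-value from the standard normal distribution (about 2.326) in place of the tabulated 2.33.

```python
import math
from statistics import NormalDist


def diff_ci_from_sd(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample C.I. for mu1 - mu2 from sample means and standard deviations."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * std_error, diff + z * std_error


# Middle-income: n = 60, mean = 5.84, s = 1.41; low-income: n = 80, mean = 2.67, s = 0.54
lower, upper = diff_ci_from_sd(5.84, 1.41, 60, 2.67, 0.54, 80, 0.98)
zero_inside = lower <= 0 <= upper

print(f"98% C.I.: (${lower:.2f}, ${upper:.2f})")
print("Zero lies inside the interval" if zero_inside else "Zero lies outside the interval")
```

Since both limits are positive, a zero difference between the two population means is not among the values supported by the interval.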
IN TODAY’S LECTURE, YOU LEARNT
•Confidence Interval for µ (continued)
•Confidence Interval for µ₁ − µ₂
IN THE NEXT LECTURE, YOU WILL LEARN
•Large Sample Confidence Intervals for p and p1 − p2
•Determination of Sample Size (with reference to Interval Estimation)
•Hypothesis-Testing (An Introduction)