Statistics Sampling Distribution and the Central Limit Theorem
A very important part of statistics is making conclusions about an entire population based on a relatively small sample.
A sampling distribution is a probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means. The sampling distribution of the mean is a very important distribution.
If you compute the mean of a sample of 10 numbers, the value you obtain will probably not equal the population mean exactly. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean, some would be lower, and some would be spot on.
But the mean of these sample means would be very close to the population mean.
Properties of Sampling Distributions of Sample Means
1. The mean of the sample means \( \mu_{\bar{x}} \) is equal to the population mean \( \mu \): \( \mu_{\bar{x}} = \mu \).
2. The standard deviation of the sample means \( \sigma_{\bar{x}} \) is equal to the population standard deviation \( \sigma \) divided by the square root of n: \( \sigma_{\bar{x}} = \sigma / \sqrt{n} \).
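The two properties above can be checked empirically. The following is a minimal simulation sketch and not part of the original notes: the uniform population, the seed, the number of samples, and the names `population`, `sample_means`, `mean_of_means`, and `sd_of_means` are all illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# ASSUMPTION: a hypothetical population of 100,000 values, uniform between 0 and 100.
population = [random.uniform(0, 100) for _ in range(100_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 10               # sample size, as in the "10 numbers" discussion
num_samples = 5_000  # how many samples of size n to draw

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean      = {mu:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}")
```

With these settings the first two printed values should nearly agree, as should the last two. The agreement is approximate because only a finite number of samples is drawn.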
The Central Limit Theorem describes the relationship between the sampling distribution of sample means and the population that the samples are taken from.
1. If samples of size n, where \( n \geq 30 \), are drawn from any population with a mean \( \mu \) and a standard deviation \( \sigma \), then the sampling distribution of sample means approximates a normal distribution. The greater the sample size, the better the approximation.
2. If the population itself is normally distributed, the sampling distribution of sample means is normally distributed for any sample size n.
In either case, the sampling distribution of sample means has a mean equal to the population mean: \( \mu_{\bar{x}} = \mu \).
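The claim that the approximation improves as the sample size grows can be illustrated with a small simulation. This is a sketch under stated assumptions: the exponential population (strongly right-skewed), the seed, and the helper names `skewness` and `simulate_sample_means` are hypothetical choices, and skewness is used only as a rough indicator of how far a distribution is from the symmetric normal shape.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def skewness(values):
    """Mean of cubed standardized deviations; 0 for a perfectly symmetric data set."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, num_samples=5_000):
    """Means of num_samples samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

# Skewness of the simulated sampling distribution for several sample sizes.
skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (2, 10, 30, 100)}
for n, skew in skew_by_n.items():
    print(f"n = {n:3d}: skewness of sample means = {skew:.3f}")
```

The printed skewness should shrink toward 0 as n increases, which is consistent with the sample means looking more and more normal even though the population itself is far from normal.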
The sampling distribution of sample means has a variance equal to 1/n times the variance of the population and a standard deviation equal to the population standard deviation divided by the square root of n:
\[ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
The standard deviation of the sampling distribution of the sample means is also called the standard error of the mean.
Probability and the Central Limit Theorem
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
Example. In mountain country, major highways sometimes use tunnels instead of long, winding roads over high passes. However, too many vehicles in a tunnel at the same time can cause a hazardous situation. Traffic engineers are studying a long tunnel in Colorado. If x represents the time for a vehicle to go through the tunnel, it is known that the x distribution has mean 12.1 minutes and standard deviation 3.8 minutes under ordinary traffic conditions. From data, it is found that the x distribution is approximately normal.
Engineers have calculated that, on average, vehicles should spend from 11 to 13 minutes in the tunnel. If the time is less than 11 minutes, traffic is moving too fast for safe travel. If the time is more than 13 minutes, there is a problem of bad air (too much carbon monoxide and other pollutants). Under ordinary conditions, there are about 50 vehicles in the tunnel at one time. What is the probability that the mean time for 50 vehicles in the tunnel will be from 11 to 13 minutes?
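Because the sample size is greater than 30, the \( \bar{x} \) distribution is considered approximately normal, with
\[ \mu_{\bar{x}} = \mu = 12.1 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.8}{\sqrt{50}} \approx 0.5374 \]
\[ P(11 \le \bar{x} \le 13) = P\left( \frac{11 - 12.1}{0.5374} \le z \le \frac{13 - 12.1}{0.5374} \right) = P(-2.0469 \le z \le 1.6747) \approx 0.93 \]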
It would seem that about 93% of the time there should be no safety hazard for average traffic flow.
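The tunnel calculation can be reproduced numerically. This is a minimal sketch that uses `statistics.NormalDist` from the Python standard library in place of a printed z-table; the variable names are illustrative assumptions, and the numbers are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.1, 3.8, 50      # values given in the example
standard_error = sigma / sqrt(n)  # standard deviation of the sample mean

# Convert the limits 11 and 13 minutes to z-scores.
z_low = (11 - mu) / standard_error
z_high = (13 - mu) / standard_error

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
p_safe = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"standard error = {standard_error:.4f}")
print(f"z runs from {z_low:.4f} to {z_high:.4f}")
print(f"P(11 <= mean time <= 13) = {p_safe:.4f}")
```

The result should agree with the figure of about 93% quoted above, up to rounding.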
Examples. Credit card balances are normally distributed, with a mean of $2870 and a standard deviation of $900.
1. What is the probability that a randomly selected credit card holder has a credit card balance of less than $2500? In this case, we are finding the probability for an individual member of a population, not a sample.
\[ P(x < 2500) = P\left( z < \frac{2500 - 2870}{900} \right) = P(z < -0.4111) \approx 0.34 \]
2. We randomly select 25 credit card holders. What is the probability that their mean credit card balance is less than $2500?
\[ P(\bar{x} < 2500) = P\left( z < \frac{2500 - 2870}{900 / \sqrt{25}} \right) = P(z < -2.0556) \approx 0.02 \]
3. Compare the probabilities from 1 and 2. There is a 34% chance that an individual will have a balance less than $2500, but only a 2% chance that the mean balance of a sample of 25 will be less than $2500. An individual balance is more likely to differ from the population mean by a given amount than the mean of a group of 25 is. If the sample had fewer members, its mean would be more likely to differ from the population mean by that amount.
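The comparison between an individual balance and a sample mean can be sketched in a few lines. The mean, standard deviation, and threshold come from the example; the extra sample sizes in the loop and the variable names are assumptions added only to show how the probability changes with n.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 2870, 900  # population mean and standard deviation from the example
threshold = 2500

# Part 1: one individual balance, so the full population standard deviation applies.
p_individual = NormalDist(mu, sigma).cdf(threshold)

# Part 2: the mean of 25 balances, so the standard error sigma / sqrt(25) applies.
p_mean_25 = NormalDist(mu, sigma / sqrt(25)).cdf(threshold)

print(f"P(x < 2500)               = {p_individual:.4f}")
print(f"P(mean of 25 < 2500)      = {p_mean_25:.4f}")

# ASSUMPTION: these other sample sizes are illustrative, not from the example.
for n in (1, 4, 9, 25, 100):
    p = NormalDist(mu, sigma / sqrt(n)).cdf(threshold)
    print(f"n = {n:3d}: P(mean balance < 2500) = {p:.4f}")
```

The loop shows the probability falling as n grows: smaller samples leave more room for the sample mean to land far from $2870.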
Example. A manufacturer of automobile batteries claims that the distribution of the lengths of life of its best battery has a mean of 54 months and a standard deviation of 6 months. Suppose a consumer group decides to check the claim by purchasing a sample of 50 of these batteries and subjecting them to tests that determine battery life.
1. Assuming that the manufacturer's claim is true, describe the sampling distribution of the mean lifetime of a sample of 50 batteries.
Even though we have no information about the shape of the probability distribution of the lives of the batteries, we can use the Central Limit Theorem to deduce that the sampling distribution for a sample mean lifetime of 50 batteries is approximately normally distributed. The mean of this sampling distribution is the same as the mean of the sampled population, which is \( \mu = 54 \) months according to the manufacturer's claim. The standard deviation of the sampling distribution is
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{50}} \approx 0.8485 \text{ months} \]
2. Assuming that the manufacturer's claim is true, what is the probability the consumer group's sample has a mean life of 52 or fewer months?
\[ P(\bar{x} \le 52) = P\left( z \le \frac{52 - 54}{0.8485} \right) = P(z \le -2.3571) \approx 0.0092 \]
The probability the consumer group will observe a sample mean of 52 or less is only .0092 if the manufacturer's claim is true. If the 50 tested batteries do exhibit a mean of 52 or fewer months, the consumer group will have strong evidence that the manufacturer's claim is untrue, because such an event is very unlikely to occur.
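The battery calculation leans on the Central Limit Theorem because the shape of the lifetime distribution is unknown. The sketch below first repeats the normal-approximation calculation and then, purely as an illustration, simulates samples from a hypothetical gamma-shaped population that has the claimed mean and standard deviation. The gamma shape, the seed, and all names are assumptions and do not come from the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the run is repeatable

mu, sigma, n = 54, 6, 50  # manufacturer's claim and sample size from the example

# Normal approximation justified by the Central Limit Theorem.
p_normal = NormalDist(mu, sigma / sqrt(n)).cdf(52)

# ASSUMPTION: a gamma-shaped lifetime population, chosen only for illustration.
# These parameters give mean shape * scale = 54 and variance shape * scale**2 = 36.
shape = (mu / sigma) ** 2
scale = sigma ** 2 / mu

num_samples = 10_000
hits = 0
for _ in range(num_samples):
    sample_mean = fmean([random.gammavariate(shape, scale) for _ in range(n)])
    if sample_mean <= 52:
        hits += 1
p_simulated = hits / num_samples

print(f"normal approximation: P(mean <= 52) = {p_normal:.4f}")
print(f"simulated fraction:   P(mean <= 52) = {p_simulated:.4f}")
```

Under this assumed population the simulated fraction should land close to the normal-approximation value, though the two need not match exactly because the simulation is random and the population is not normal.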
Example. Make a decision. A machine used to fill gallon-sized paint cans is regulated so that the amount of paint dispensed has a mean of 128 ounces and a standard deviation of 0.20 ounce. You randomly select 40 cans and carefully measure the contents. The sample mean of the cans is 127.9 ounces. Does the machine need to be reset? Explain your reasoning.
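One possible way to approach this decision is to ask how unusual a sample mean of 127.9 ounces would be if the machine were set correctly. The sketch below computes only that quantity and leaves the decision to the reader. It assumes the Central Limit Theorem applies because the sample size of 40 is at least 30; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 128, 0.20, 40  # machine settings and sample size from the example
sample_mean = 127.9           # observed mean of the 40 cans

standard_error = sigma / sqrt(n)         # standard deviation of the sample mean
z = (sample_mean - mu) / standard_error  # z-score of the observed sample mean

# Probability of a sample mean this low or lower if the machine is set correctly.
p_at_or_below = NormalDist().cdf(z)

print(f"standard error         = {standard_error:.4f}")
print(f"z-score                = {z:.4f}")
print(f"P(sample mean <= 127.9) = {p_at_or_below:.5f}")
```

The same reasoning used in the battery example can then be applied: the smaller this probability, the harder it is to attribute the observed mean to chance alone.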