Introduction to Probability and Statistics
For any help regarding Probability and Statistics Exam Help, visit https://www.liveexamhelper.com/, email info@liveexamhelper.com, or call +1 678 648 4277.
Questions

1 Topics

• Statistics: data, MLE (pset 5)
• Bayesian inference: prior, likelihood, posterior, predictive probability, probability intervals (psets 5, 6)
• Frequentist inference: NHST (psets 7, 8)
2 Using the probability tables
You should become familiar with the probability tables at the end of these notes.

1. Use the standard normal table to find the following values. In all the problems Z is a standard normal random variable.
(a) (i) P(Z < 1.5)  (ii) P(Z > 1.5)  (iii) P(−1.5 < Z < 1.5)  (iv) P(Z ≤ 1.625)
(b) (i) The right-tail critical value with probability α = 0.05. (ii) The two-sided rejection region with probability α = 0.2. (iii) Find the range for the middle 50% of probability.

2. The t-tables are different. They give the right critical values corresponding to probabilities. To save space we only give critical values for p ≤ 0.5. You need to use the symmetry of the t-distribution to get them for p > 0.5. That is, t_{df, p} = −t_{df, 1−p}, e.g. t_{5, 0.975} = −t_{5, 0.025}. Use the t-table to estimate the following values. In all the problems T is a random variable drawn from a t-distribution with the indicated number of degrees of freedom.
(a) (i) P(T > 1.6), with df = 3  (ii) P(T < 1.6), with df = 3  (iii) P(−1.68 < T < 1.68), with df = 49  (iv) P(−1.6 < T < 1.6), with df = 49
(b) (i) The critical value for probability α = 0.05 for 8 degrees of freedom. (ii) The two-sided rejection region with probability α = 0.2 for 16 degrees of freedom. (iii) Find the range for the middle 50% of probability with df = 20.

3. The chi-square tables are different. They give the right critical values corresponding to probabilities.
Use the chi-square table to find the following values. In all the problems X² is a random variable drawn from a χ²-distribution with the indicated number of degrees of freedom.
(a) (i) P(X² > 1.6), with df = 3  (ii) P(X² > 20), with df = 16
(b) (i) The right critical value for probability α = 0.05 for 8 degrees of freedom. (ii) The two-sided rejection region with probability α = 0.2 for 16 degrees of freedom.
3 Data

4. The following data is from a random sample: 5, 1, 3, 3, 8. Compute the sample mean, sample standard deviation and sample median.

5. The following data is from a random sample: 1, 1, 1, 2, 3, 5, 5, 8, 12, 13, 14, 14, 14, 14, 18, 100. Find the first, second and third quartiles.
4 MLE

6. (a) A coin is tossed 100 times and lands heads 62 times. What is the maximum likelihood estimate for θ, the probability of heads?
(b) A coin is tossed n times and lands heads k times. What is the maximum likelihood estimate for θ, the probability of heads?

7. Suppose the data set y_1, ..., y_n is drawn from a random sample consisting of i.i.d. discrete uniform distributions with range 1 to N. Find the maximum likelihood estimate of N.

8. Suppose data x_1, ..., x_n is drawn from an exponential distribution exp(λ). Find the maximum likelihood estimate for λ.

9. Suppose x_1, ..., x_n is a data set drawn from a geometric(1/a) distribution. Find the maximum likelihood estimate of a. Here, geometric(p) means the probability of success is p and we run trials until the first success and report the total number of trials, including the success. For example, the sequence FFFFS is 4 failures followed by a success, which produces x = 5.

10. You want to estimate the size of an MIT class that is closed to visitors. You know that the students are numbered from 1 to n, where n is the number of students. You call three random students out of the classroom and ask for their numbers, which turn out to be 1, 3, 7. Find the maximum likelihood estimate for n. (Hint: the student numbers are drawn from a discrete uniform distribution.)
5 Bayesian updating: discrete prior, discrete likelihood
11. Twins. Suppose 1/3 of twins are identical and 2/3 of twins are fraternal. If you are pregnant with twins of the same sex, what is the probability that they are identical?

12. Dice. You have a drawer full of 4, 6, 8, 12 and 20-sided dice. You suspect that they are in proportion 1:2:10:2:1. Your friend picks one at random and rolls it twice, getting 5 both times.
(a) What is the probability your friend picked the 8-sided die?
(b) (i) What is the probability the next roll will be a 5? (ii) What is the probability the next roll will be a 15?

13. Sameer has two coins: one fair coin and one biased coin which lands heads with probability 3/4. He picks one coin at random (50-50) and flips it repeatedly until he gets a tails. Given that he observes 3 heads before the first tails, find the posterior probability that he picked each coin.
(c) What are the prior and posterior odds for the fair coin?
(d) What are the prior and posterior predictive probabilities of heads on the next flip? Here prior predictive means prior to considering the data of the first four flips.
6 Bayesian updating: continuous prior, discrete likelihood

14. Peter and Jerry disagree over whether 18.05 students prefer Bayesian or frequentist statistics. They decide to pick a random sample of 10 students from the class and get Shelby to ask each student which they prefer. They agree to start with a prior f(θ) ∼ beta(2, 2), where θ is the percent that prefer Bayesian.
(a) Let x_1 be the number of people in the sample who prefer Bayesian statistics. What is the pmf of x_1?
(b) Compute the posterior distribution of θ given x_1 = 6.
(c) Use R to compute 50% and 90% probability intervals for θ. Center the intervals so that the leftover probability in both tails is the same.
(d) The maximum a posteriori (MAP) estimate of θ (the peak of the posterior) is given by θ̂ = 7/12, leading Jerry to concede that a majority of students are Bayesians. In light of your answer to part (c) does Jerry have a strong case?
(e) They decide to get another sample of 10 students and ask Neil to poll them. Write down in detail the expression for the posterior predictive probability that the majority of the second sample prefer Bayesian statistics. The result will be an integral with several terms. Don't bother computing the integral.
7 Bayesian updating: discrete prior, continuous likelihood
15. Suppose that Alice is always X hours late to class and X is uniformly distributed on [0, θ]. Suppose that a priori, we know that θ is either 1/4 or 3/4, both equally likely. If Alice arrives 10 minutes late, what is the most likely value of θ? What if she had arrived 30 minutes late?
8 Bayesian updating: continuous prior, continuous likelihood

16. Suppose that you have a cable whose exact length is θ. You have a ruler with known error normally distributed with mean 0 and variance 10^(−4). Using this ruler, you measure your cable, and the resulting measurement x is distributed as N(θ, 10^(−4)).
(a) Suppose your prior on the length of the cable is θ ∼ N(9, 1). If you then measure x = 10, what is your posterior pdf for θ?
(b) With the same prior as in part (a), compute the total number of measurements needed so that the posterior variance of θ is less than 10^(−6).

17. Gamma prior. Customer waiting times (in hours) at a popular restaurant can be modeled as an exponential random variable with parameter λ. Suppose that a priori we know that λ can take any value in (0, ∞) and has density function f(λ) = (1/4!) λ^4 e^(−λ). Suppose we observe 5 customers, with waiting times x_1 = 0.23, x_2 = 0.80, x_3 = 0.12, x_4 = 0.35, x_5 = 0.5. Compute the posterior density function of λ. (Hint: ∫_0^∞ y^(a−1) e^(−by) dy = (a−1)!/b^a.)
18. Exponential censoring. [Information Theory, Inference, and Learning Algorithms by David J. C. Mackay.] Unstable particles are emitted from a source and decay at a distance X ∼ exp(λ), where λ is unknown. Scientists are interested in finding the mean decay distance, given by 1/λ. Their equipment is such that decay events can be observed only if they occur in a window extending from x = 1 cm to x = 20 cm.
(a) Let Z(λ) be the probability that an emitted particle decays in the window of detection. Find Z(λ) in terms of λ.
(b) A decay event is observed at location x. Find the likelihood f(x|λ). Hint: this is the probability that an observed decay event occurs at location x, given λ. Use (a).
(c) Suppose that based on earlier experiments, scientists believe that the mean decay distance 1/λ is equally likely to be anywhere between 5 cm and 30 cm. By transforming random variables, this corresponds to a prior for λ of f_Λ(λ) = 1/(25λ²) on [1/30, 1/5]. Over the course of a new experiment, 4 decay events are observed at locations {5, 11, 13, 14}. Find the posterior odds that the mean decay distance is greater than 10 cm (i.e., λ ≤ 1/10). Express your answer as a ratio of two integrals (you do not need to evaluate these integrals; in practice you would hand them to a computer).
9 NHST

19. z-test. Suppose we have 49 data points with sample mean 6.25 and sample variance 12. We want to test the following hypotheses:
H_0: the data is drawn from a N(4, 10²) distribution.
H_A: the data is drawn from N(µ, 10²) where µ ≠ 4.
(a) Test for significance at the α = 0.05 level. Use the tables at the end of this file to compute p-values.
(b) Draw a picture showing the null pdf, the rejection region and the area used to compute the p-value.

20. t-test. Suppose we have 49 data points with sample mean 6.25 and sample variance 36. We want to test the following hypotheses:
H_0: the data is drawn from N(4, σ²), where σ is unknown.
H_A: the data is drawn from N(µ, σ²) where µ ≠ 4.
(a) Test for significance at the α = 0.05 level. Use the t-table to find the p-value.
(b) Draw a picture showing the null pdf, the rejection region and the area used to compute the p-value for part (a).

21. There are lots of good NHST problems in psets 7 and 8 and the reading, including two-sample t-test, chi-square, ANOVA, and F-test for equal variance.

22. Probability, MLE, goodness of fit. There was a multicenter test of the rate of success for a certain medical procedure. At each of the 60 centers the researchers tested 12 subjects and reported the number of successes.
(a) Assume that θ is the probability of success for one patient and let x be the data from one center. What is the probability mass function of x?
(b) Assume that the probability of success θ is the same at each center and the 60 centers produced data x_1, x_2, ..., x_60. Find the MLE for θ. Write your answer in terms of x̄.

Parts (c)-(e) use the following table, which gives counts from 60 centers, e.g. x = 2 occurred in 17 out of 60 centers.

x:       0   1   2   3   4   5
counts:  4  15  17  10   8   6

(c) Compute x̄, the average number of successes over the 60 centers.
(d) Assuming the probability of success at each center is the same, show that the MLE for θ is θ̂ = 0.1958.
(e) Do a χ² goodness of fit test of the assumption that the probability of success is the same at each center. Find the p-value and use a significance level of 0.05. In this test the number of degrees of freedom is the number of bins − 2.
Standard normal table of left-tail probabilities

In each pair of rows below, a row of z values is followed by the row of corresponding Φ(z) values, in the same order.

z -4.00 -3.95 -3.90 -3.85 -3.80 -3.75 -3.70 -3.65 -3.60 -3.55 -3.50 -3.45 -3.40 -3.35 -3.30 -3.25 -3.20 -3.15 -3.10 -3.05 -3.00 -2.95 -2.90 -2.85 -2.80 -2.75 -2.70 -2.65 -2.60 -2.55 -2.50 -2.45 -2.40 -2.35 -2.30 -2.25 -2.20 -2.15 -2.10 -2.05
Φ(z) 0.0000 0.0000 0.0000 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0002 0.0002 0.0003 0.0003 0.0004 0.0005 0.0006 0.0007 0.0008 0.0010 0.0011 0.0013 0.0016 0.0019 0.0022 0.0026 0.0030 0.0035 0.0040 0.0047 0.0054 0.0062 0.0071 0.0082 0.0094 0.0107 0.0122 0.0139 0.0158 0.0179 0.0202
z -2.00 -1.95 -1.90 -1.85 -1.80 -1.75 -1.70 -1.65 -1.60 -1.55 -1.50 -1.45 -1.40 -1.35 -1.30 -1.25 -1.20 -1.15 -1.10 -1.05 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05
Φ(z) 0.0228 0.0256 0.0287 0.0322 0.0359 0.0401 0.0446 0.0495 0.0548 0.0606 0.0668 0.0735 0.0808 0.0885 0.0968 0.1056 0.1151 0.1251 0.1357 0.1469 0.1587 0.1711 0.1841 0.1977 0.2119 0.2266 0.2420 0.2578 0.2743 0.2912 0.3085 0.3264 0.3446 0.3632 0.3821 0.4013 0.4207 0.4404 0.4602 0.4801
z 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95
Φ(z) 0.5000 0.5199 0.5398 0.5596 0.5793 0.5987 0.6179 0.6368 0.6554 0.6736 0.6915 0.7088 0.7257 0.7422 0.7580 0.7734 0.7881 0.8023 0.8159 0.8289 0.8413 0.8531 0.8643 0.8749 0.8849 0.8944 0.9032 0.9115 0.9192 0.9265 0.9332 0.9394 0.9452 0.9505 0.9554 0.9599 0.9641 0.9678 0.9713 0.9744
z 2.00 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95
Φ(z) 0.9772 0.9798 0.9821 0.9842 0.9861 0.9878 0.9893 0.9906 0.9918 0.9929 0.9938 0.9946 0.9953 0.9960 0.9965 0.9970 0.9974 0.9978 0.9981 0.9984 0.9987 0.9989 0.9990 0.9992 0.9993 0.9994 0.9995 0.9996 0.9997 0.9997 0.9998 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 1.0000 1.0000
Φ(z ) = P (Z ≤ z ) for N(0, 1). (Use interpolation to estimate z values to a 3rd decimal place.)
Table of Student t critical values (right-tail)

The table shows t_{df, p} = the 1 − p quantile of t(df). We only give values for p ≤ 0.5. Use symmetry to find the values for p > 0.5, e.g. t_{5, 0.975} = −t_{5, 0.025}. In R notation, t_{df, p} = qt(1 − p, df).

Each row below gives one column of the table: the critical values for the stated p, listed in the order of the df row.

df:        1 2 3 4 5 6 7 8 9 10 16 17 18 19 20 21 22 23 24 25 30 31 32 33 34 35 40 41 42 43 44 45 46 47 48 49
p = 0.005: 63.66 9.92 5.84 4.60 4.03 3.71 3.50 3.36 3.25 3.17 2.92 2.90 2.88 2.86 2.85 2.83 2.82 2.81 2.80 2.79 2.75 2.74 2.74 2.73 2.73 2.72 2.70 2.70 2.70 2.70 2.69 2.69 2.69 2.68 2.68 2.68
p = 0.010: 31.82 6.96 4.54 3.75 3.36 3.14 3.00 2.90 2.82 2.76 2.58 2.57 2.55 2.54 2.53 2.52 2.51 2.50 2.49 2.49 2.46 2.45 2.45 2.44 2.44 2.44 2.42 2.42 2.42 2.42 2.41 2.41 2.41 2.41 2.41 2.40
p = 0.015: 21.20 5.64 3.90 3.30 3.00 2.83 2.71 2.63 2.57 2.53 2.38 2.37 2.36 2.35 2.34 2.33 2.32 2.31 2.31 2.30 2.28 2.27 2.27 2.27 2.27 2.26 2.25 2.25 2.25 2.24 2.24 2.24 2.24 2.24 2.24 2.24
p = 0.020: 15.89 4.85 3.48 3.00 2.76 2.61 2.52 2.45 2.40 2.36 2.24 2.22 2.21 2.20 2.20 2.19 2.18 2.18 2.17 2.17 2.15 2.14 2.14 2.14 2.14 2.13 2.12 2.12 2.12 2.12 2.12 2.12 2.11 2.11 2.11 2.11
p = 0.025: 12.71 4.30 3.18 2.78 2.57 2.45 2.36 2.31 2.26 2.23 2.12 2.11 2.10 2.09 2.09 2.08 2.07 2.07 2.06 2.06 2.04 2.04 2.04 2.03 2.03 2.03 2.02 2.02 2.02 2.02 2.02 2.01 2.01 2.01 2.01 2.01
p = 0.030: 10.58 3.90 2.95 2.60 2.42 2.31 2.24 2.19 2.15 2.12 2.02 2.02 2.01 2.00 1.99 1.99 1.98 1.98 1.97 1.97 1.95 1.95 1.95 1.95 1.95 1.94 1.94 1.93 1.93 1.93 1.93 1.93 1.93 1.93 1.93 1.93
p = 0.040: 7.92 3.32 2.61 2.33 2.19 2.10 2.05 2.00 1.97 1.95 1.87 1.86 1.86 1.85 1.84 1.84 1.84 1.83 1.83 1.82 1.81 1.81 1.81 1.81 1.80 1.80 1.80 1.80 1.79 1.79 1.79 1.79 1.79 1.79 1.79 1.79
p = 0.050: 6.31 2.92 2.35 2.13 2.02 1.94 1.89 1.86 1.83 1.81 1.75 1.74 1.73 1.73 1.72 1.72 1.72 1.71 1.71 1.71 1.70 1.70 1.69 1.69 1.69 1.69 1.68 1.68 1.68 1.68 1.68 1.68 1.68 1.68 1.68 1.68
p = 0.100: 3.08 1.89 1.64 1.53 1.48 1.44 1.41 1.40 1.38 1.37 1.34 1.33 1.33 1.33 1.33 1.32 1.32 1.32 1.32 1.32 1.31 1.31 1.31 1.31 1.31 1.31 1.30 1.30 1.30 1.30 1.30 1.30 1.30 1.30 1.30 1.30
p = 0.200: 1.38 1.06 0.98 0.94 0.92 0.91 0.90 0.89 0.88 0.88 0.86 0.86 0.86 0.86 0.86 0.86 0.86 0.86 0.86 0.86 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85 0.85
p = 0.300: 0.73 0.62 0.58 0.57 0.56 0.55 0.55 0.55 0.54 0.54 0.54 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53
p = 0.400: 0.32 0.29 0.28 0.27 0.27 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.26 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25
p = 0.500: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Table of χ² critical values (right-tail)

The table shows c_{df, p} = the 1 − p quantile of χ²(df). In R notation, c_{df, p} = qchisq(1 − p, df).

Each row below gives one column of the table: the critical values for the stated p, listed in the order of the df row.

df:        1 2 3 4 5 6 7 8 9 10 16 17 18 19 20 21 22 23 24 25 30 31 32 33 34 35 40 41 42 43 44 45 46 47 48 49
p = 0.010: 6.63 9.21 11.34 13.28 15.09 16.81 18.48 20.09 21.67 23.21 32.00 33.41 34.81 36.19 37.57 38.93 40.29 41.64 42.98 44.31 50.89 52.19 53.49 54.78 56.06 57.34 63.69 64.95 66.21 67.46 68.71 69.96 71.20 72.44 73.68 74.92
p = 0.025: 5.02 7.38 9.35 11.14 12.83 14.45 16.01 17.53 19.02 20.48 28.85 30.19 31.53 32.85 34.17 35.48 36.78 38.08 39.36 40.65 46.98 48.23 49.48 50.73 51.97 53.20 59.34 60.56 61.78 62.99 64.20 65.41 66.62 67.82 69.02 70.22
p = 0.050: 3.84 5.99 7.81 9.49 11.07 12.59 14.07 15.51 16.92 18.31 26.30 27.59 28.87 30.14 31.41 32.67 33.92 35.17 36.42 37.65 43.77 44.99 46.19 47.40 48.60 49.80 55.76 56.94 58.12 59.30 60.48 61.66 62.83 64.00 65.17 66.34
p = 0.100: 2.71 4.61 6.25 7.78 9.24 10.64 12.02 13.36 14.68 15.99 23.54 24.77 25.99 27.20 28.41 29.62 30.81 32.01 33.20 34.38 40.26 41.42 42.58 43.75 44.90 46.06 51.81 52.95 54.09 55.23 56.37 57.51 58.64 59.77 60.91 62.04
p = 0.200: 1.64 3.22 4.64 5.99 7.29 8.56 9.80 11.03 12.24 13.44 20.47 21.61 22.76 23.90 25.04 26.17 27.30 28.43 29.55 30.68 36.25 37.36 38.47 39.57 40.68 41.78 47.27 48.36 49.46 50.55 51.64 52.73 53.82 54.91 55.99 57.08
p = 0.300: 1.07 2.41 3.66 4.88 6.06 7.23 8.38 9.52 10.66 11.78 18.42 19.51 20.60 21.69 22.77 23.86 24.94 26.02 27.10 28.17 33.53 34.60 35.66 36.73 37.80 38.86 44.16 45.22 46.28 47.34 48.40 49.45 50.51 51.56 52.62 53.67
p = 0.500: 0.45 1.39 2.37 3.36 4.35 5.35 6.35 7.34 8.34 9.34 15.34 16.34 17.34 18.34 19.34 20.34 21.34 22.34 23.34 24.34 29.34 30.34 31.34 32.34 33.34 34.34 39.34 40.34 41.34 42.34 43.34 44.34 45.34 46.34 47.34 48.33
p = 0.700: 0.15 0.71 1.42 2.19 3.00 3.83 4.67 5.53 6.39 7.27 12.62 13.53 14.44 15.35 16.27 17.18 18.10 19.02 19.94 20.87 25.51 26.44 27.37 28.31 29.24 30.18 34.87 35.81 36.75 37.70 38.64 39.58 40.53 41.47 42.42 43.37
p = 0.800: 0.06 0.45 1.01 1.65 2.34 3.07 3.82 4.59 5.38 6.18 11.15 12.00 12.86 13.72 14.58 15.44 16.31 17.19 18.06 18.94 23.36 24.26 25.15 26.04 26.94 27.84 32.34 33.25 34.16 35.07 35.97 36.88 37.80 38.71 39.62 40.53
p = 0.900: 0.02 0.21 0.58 1.06 1.61 2.20 2.83 3.49 4.17 4.87 9.31 10.09 10.86 11.65 12.44 13.24 14.04 14.85 15.66 16.47 20.60 21.43 22.27 23.11 23.95 24.80 29.05 29.91 30.77 31.63 32.49 33.35 34.22 35.08 35.95 36.82
p = 0.950: 0.00 0.10 0.35 0.71 1.15 1.64 2.17 2.73 3.33 3.94 7.96 8.67 9.39 10.12 10.85 11.59 12.34 13.09 13.85 14.61 18.49 19.28 20.07 20.87 21.66 22.47 26.51 27.33 28.14 28.96 29.79 30.61 31.44 32.27 33.10 33.93
p = 0.975: 0.00 0.05 0.22 0.48 0.83 1.24 1.69 2.18 2.70 3.25 6.91 7.56 8.23 8.91 9.59 10.28 10.98 11.69 12.40 13.12 16.79 17.54 18.29 19.05 19.81 20.57 24.43 25.21 26.00 26.79 27.57 28.37 29.16 29.96 30.75 31.55
p = 0.990: 0.00 0.02 0.11 0.30 0.55 0.87 1.24 1.65 2.09 2.56 5.81 6.41 7.01 7.63 8.26 8.90 9.54 10.20 10.86 11.52 14.95 15.66 16.36 17.07 17.79 18.51 22.16 22.91 23.65 24.40 25.15 25.90 26.66 27.42 28.18 28.94
Solutions

1 Topics

• Statistics: data, MLE (pset 5)
• Bayesian inference: prior, likelihood, posterior, predictive probability, probability intervals (psets 5, 6)
• Frequentist inference: NHST (psets 7, 8)
2 Using the probability tables
You should become familiar with the probability tables at the end of these notes.

1. (a) (i) The table gives this value as P(Z < 1.5) = 0.9332.
(ii) This is the complement of the answer in (i): P(Z > 1.5) = 1 − 0.9332 = 0.0668. Or by symmetry we could use the table for −1.5.
(iii) We want P(Z < 1.5) − P(Z < −1.5) = P(Z < 1.5) − P(Z > 1.5). This is the difference of the answers in (i) and (ii): 0.8664.
(iv) A rough estimate is the average of P(Z < 1.6) and P(Z < 1.65). That is, P(Z < 1.625) ≈ (P(Z < 1.6) + P(Z < 1.65))/2 = (0.9452 + 0.9505)/2 = 0.9479.
(b) (i) We are looking for the table entry with probability 0.95. This is between the table entries for z = 1.65 and z = 1.60 and very close to that of z = 1.65. Answer: the region is [1.64, ∞). (R gives the 'exact' lower limit as 1.644854.)
(ii) We want the table entry with probability 0.1. The table probabilities for z = −1.25 and z = −1.30 are 0.1056 and 0.0968. Since 0.1 is about 1/2 way from the first to the second we take the left critical value as −1.275. Our region is (−∞, −1.275) ∪ (1.275, ∞). (R gives qnorm(0.1, 0, 1) = −1.2816.)
(iii) This is the range from q_0.25 to q_0.75. With the table we estimate q_0.25 is about 1/2 of the way from −0.65 to −0.70, i.e. ≈ −0.675. So the range is [−0.675, 0.675].

2. (a) (i) The question asks which p-value goes with t = 1.6 when df = 3. We look in the df = 3 row of the table and find that 1.64 goes with p = 0.100. So P(T > 1.6 | df = 3) ≈ 0.1. (The true value is a little bit greater.)
(ii) P(T < 1.6 | df = 3) = 1 − P(T > 1.6 | df = 3) ≈ 0.9.
(iii) Using the df = 49 row of the t-table we find P(T > 1.68 | df = 49) = 0.05. Now, by symmetry P(T < −1.68 | df = 49) = 0.05 and P(−1.68 < T < 1.68 | df = 49) = 0.9.
(iv) Using the df = 49 row of the t-table we find P(T > 1.68 | df = 49) = 0.05 and P(T > 1.30 | df = 49) = 0.1. We can do a rough interpolation: P(T > 1.6 | df = 49) ≈ 0.06. Now, by symmetry P(T < −1.6 | df = 49) ≈ 0.06 and P(−1.6 < T < 1.6 | df = 49) ≈ 0.88. (R gives 0.8839727.)
(b) (i) This is a straightforward lookup: the p = 0.05, df = 8 entry is 1.86.
(ii) For a two-sided rejection region we need 0.1 probability in each tail. The critical value at p = 0.1, df = 16 is 1.34. So (by symmetry) the rejection region is (−∞, −1.34) ∪ (1.34, ∞).
(iii) This is the range from q_0.25 to q_0.75, i.e. from the critical value t_0.75 to t_0.25. The table only gives critical values for p = 0.2 and 0.3. For df = 20 these are 0.86 and 0.53. We average these to estimate the 0.25 critical value as 0.7. Answer: the middle 50% of probability is approximately between t-values −0.7 and 0.7. (If we took into account the bell shape of the t-distribution we would estimate the 0.25 critical value as slightly closer to 0.53 than 0.86. Indeed R gives the value 0.687.)

3. (a) (i) Looking in the df = 3 row of the chi-square table we see that 1.6 is about 1/5 of the way between the values for p = 0.7 and p = 0.5. So we approximate P(X² > 1.6) ≈ 0.66. (The true value is 0.6594.)
(ii) Looking in the df = 16 row of the chi-square table we see that 20 is about 1/4 of the way between the values for p = 0.2 and p = 0.3. We estimate P(X² > 20) ≈ 0.25. (The true value is 0.220.)
(b) (i) This is in the table in the df = 8 row under p = 0.05. Answer: 15.51.
(ii) We want the critical values for p = 0.9 and p = 0.1 from the df = 16 row of the table. Answer: the rejection region is [0, 9.31] ∪ [23.54, ∞).
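These table lookups can be double-checked with R's distribution functions; a minimal sketch (the commented values are the approximate answers worked out above):

pnorm(1.5)                            # P(Z < 1.5)           ~ 0.9332
1 - pnorm(1.5)                        # P(Z > 1.5)           ~ 0.0668
pnorm(1.5) - pnorm(-1.5)              # P(-1.5 < Z < 1.5)    ~ 0.8664
qnorm(0.95)                           # right-tail critical value, alpha = 0.05 ~ 1.645
qnorm(c(0.25, 0.75))                  # middle 50% of N(0,1) ~ [-0.674, 0.674]
1 - pt(1.6, df = 3)                   # P(T > 1.6), df = 3   ~ 0.104
pt(1.6, df = 49) - pt(-1.6, df = 49)  # P(-1.6 < T < 1.6), df = 49 ~ 0.884
qt(0.95, df = 8)                      # t critical value, alpha = 0.05, df = 8 ~ 1.86
1 - pchisq(1.6, df = 3)               # P(X^2 > 1.6), df = 3 ~ 0.659
qchisq(c(0.10, 0.90), df = 16)        # two-sided chi-square region, alpha = 0.2 ~ 9.31, 23.54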
3 Data

4. Sample mean = 20/5 = 4.
Sample variance = (1^2 + (−3)^2 + (−1)^2 + (−1)^2 + 4^2)/(5 − 1) = 28/4 = 7.
Sample standard deviation = √7.
Sample median = 3.

5. The first quartile is the value where 25% of the data is below it. We have 16 data points so this is between the 4th and 5th points, i.e. between 2 and 3. It is reasonable to take the midpoint and say 2.5. The second quartile is between 8 and 12; we say 10. The third quartile is 14.
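A quick check in R (note that R's quantile() interpolates, so its quartiles can differ slightly from the midpoint convention used above):

x <- c(5, 1, 3, 3, 8)
mean(x); var(x); sd(x); median(x)        # 4, 7, sqrt(7) ~ 2.646, 3
y <- c(1, 1, 1, 2, 3, 5, 5, 8, 12, 13, 14, 14, 14, 14, 18, 100)
quantile(y, probs = c(0.25, 0.5, 0.75))  # ~ 2.75, 10, 14 with R's default interpolation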
4 MLE
6. (a) The likelihood function is
p(data | θ) = C(100, 62) θ^62 (1 − θ)^38 = c θ^62 (1 − θ)^38.
To find the MLE we take the derivative of the log-likelihood and set it to 0:
ln(p(data | θ)) = ln(c) + 62 ln(θ) + 38 ln(1 − θ),
d/dθ ln(p(data | θ)) = 62/θ − 38/(1 − θ) = 0.
The algebra leads to the MLE θ̂ = 62/100.
(b) The computation is identical to part (a). The likelihood function is
p(data | θ) = C(n, k) θ^k (1 − θ)^(n−k) = c θ^k (1 − θ)^(n−k).
To find the MLE we take the derivative of the log-likelihood and set it to 0:
ln(p(data | θ)) = ln(c) + k ln(θ) + (n − k) ln(1 − θ),
d/dθ ln(p(data | θ)) = k/θ − (n − k)/(1 − θ) = 0.
The algebra leads to the MLE θ̂ = k/n.

7. If N < max(y_i) then the likelihood p(y_1, ..., y_n | N) = 0. So the likelihood function is
p(y_1, ..., y_n | N) = 0 if N < max(y_i), and (1/N)^n if N ≥ max(y_i).
This is maximized when N is as small as possible. Since N ≥ max(y_i) the MLE is N̂ = max(y_i).
8. The pdf of exp(λ) is p(x | λ) = λ e^(−λx). So the likelihood and log-likelihood functions are
p(data | λ) = λ^n e^(−λ(x_1 + ··· + x_n)),   ln(p(data | λ)) = n ln(λ) − λ Σ x_i.
Taking a derivative with respect to λ and setting it equal to 0:
d/dλ ln(p(data | λ)) = n/λ − Σ x_i = 0  ⇒  λ = n/Σ x_i = 1/x̄.
So the MLE is λ̂ = 1/x̄.

9. For one data point, P(x_i | a) = (1 − 1/a)^(x_i − 1) (1/a) = ((a − 1)/a)^(x_i − 1) (1/a). So the likelihood function is
P(data | a) = ((a − 1)/a)^(Σ x_i − n) (1/a)^n.
The log likelihood is
ln(P(data | a)) = (Σ x_i − n)(ln(a − 1) − ln(a)) − n ln(a).
Taking the derivative and setting it to 0:
d/da ln(P(data | a)) = (Σ x_i − n)(1/(a − 1) − 1/a) − n/a = 0  ⇒  a = Σ x_i / n.
The maximum likelihood estimate is â = x̄.

10. If there are n students in the room then for the data 1, 3, 7 (occurring in any order) the likelihood is
p(data | n) = 0 for n < 7, and 3!/(n(n − 1)(n − 2)) for n ≥ 7.
Maximizing this does not require calculus. It clearly has a maximum when n is as small as possible. Answer: n̂ = 7.
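As a numerical sanity check, the closed-form MLEs can be reproduced in R by maximizing the log-likelihoods directly; a sketch (the exponential sample below is made up purely for illustration):

# Problem 6(a): 62 heads in 100 tosses.
loglik_coin <- function(theta) 62 * log(theta) + 38 * log(1 - theta)
optimize(loglik_coin, interval = c(0.001, 0.999), maximum = TRUE)$maximum   # ~ 0.62

# Problem 8: exponential data; the MLE should equal 1 / mean(x).
x <- c(0.5, 1.2, 0.3, 2.1)            # made-up sample for illustration
loglik_exp <- function(lam) length(x) * log(lam) - lam * sum(x)
optimize(loglik_exp, interval = c(0.001, 20), maximum = TRUE)$maximum
1 / mean(x)                           # should agree with the line above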
5 Bayesian updating: discrete prior, discrete likelihood
11. This is a Bayes' theorem problem. The likelihoods are
P(same sex | identical) = 1,    P(different sex | identical) = 0,
P(same sex | fraternal) = 1/2,  P(different sex | fraternal) = 1/2.
The data is 'the twins are the same sex'. We find the answer with an update table:

hypothesis   prior   likelihood   unnorm. post.   posterior
identical    1/3     1            1/3             1/2
fraternal    2/3     1/2          1/3             1/2
Total        1                    2/3             1
So P(identical | same sex) = 1/2.

12. (a) The data is 5. Let H_n be the hypothesis that the die is n-sided. Here is the update table:

hyp.   prior   likelihood   unnorm. post.   posterior
H_4    1       0            0               0
H_6    2       (1/6)^2      2/36            0.243457
H_8    10      (1/8)^2      10/64           0.684723
H_12   2       (1/12)^2     2/144           0.060864
H_20   1       (1/20)^2     1/400           0.010956
Tot.   16                   0.22819         1
So P(H_8 | data) = 0.685.
(b) We are asked for posterior predictive probabilities. Let x be the value of the next roll. We have to compute the total probability
p(x | data) = Σ p(x | H) p(H | data) = Σ likelihood × posterior,
where the sum is over all hypotheses. We can organize the calculation in a table where we multiply the posterior column by the appropriate likelihood column. The total probability (the posterior predictive) is the sum of the product column.

hyp.   posterior to data   likelihood (i) x = 5   lik. × post.   likelihood (ii) x = 15   lik. × post.
H_4    0                   0                      0              0                        0
H_6    0.243457            1/6                    0.04058        0                        0
H_8    0.684723            1/8                    0.08559        0                        0
H_12   0.060864            1/12                   0.00507        0                        0
H_20   0.010956            1/20                   0.00055        1/20                     0.00055
Tot.   1                                          0.13179                                 0.00055
So, (i) p(x = 5 | data) = 0.13179 ≈ 0.132 and (ii) p(x = 15 | data) = 0.00055.
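A minimal R sketch reproducing this update and the two predictive probabilities:

sides  <- c(4, 6, 8, 12, 20)
prior  <- c(1, 2, 10, 2, 1)                      # unnormalized prior weights
lik    <- ifelse(sides >= 5, (1 / sides)^2, 0)   # likelihood of rolling 5 twice
post   <- prior * lik / sum(prior * lik)         # 0, 0.243, 0.685, 0.061, 0.011
sum(post * ifelse(sides >= 5,  1 / sides, 0))    # (i)  P(next roll = 5)  ~ 0.132
sum(post * ifelse(sides >= 15, 1 / sides, 0))    # (ii) P(next roll = 15) ~ 0.00055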
13. (a) The solution to (a) is given with part (b).
(b) Let θ be the probability of the selected coin landing on heads. Given θ, we know that the number of heads observed before the first tails, X, is a geometric(θ) random variable. We have the updating table:

Hyp.      Prior   Likelihood       Unnorm. post.           Posterior
θ = 1/2   1/2     (1/2)^3 (1/2)    1/2^5 = 16/512          16/43
θ = 3/4   1/2     (3/4)^3 (1/4)    3^3/(2 · 4^4) = 27/512  27/43
Total     1       –                43/512                  1

(c) The prior odds for the fair coin are 1; the posterior odds are 16/27.
(d) The prior predictive probability of heads is 0.5 · (1/2) + 0.75 · (1/2) = 0.625. The posterior predictive probability of heads is 0.5 · (16/43) + 0.75 · (27/43) ≈ 0.657.
6 Bayesian updating: continuous prior, discrete likelihood
14. (a) x_1 ∼ Bin(10, θ).
(b) We have prior f(θ) = c_1 θ(1 − θ) and likelihood p(x_1 = 6 | θ) = c_2 θ^6 (1 − θ)^4, where c_2 = C(10, 6). The unnormalized posterior is f(θ) p(x_1 | θ) = c_1 c_2 θ^7 (1 − θ)^5. So the normalized posterior is
f(θ | x_1) = c_3 θ^7 (1 − θ)^5.
Since the posterior has the form of a beta(8, 6) distribution it must be a beta(8, 6) distribution. We can look up the normalizing coefficient c_3 = 13!/(7! 5!).
(c) The 50% interval is [qbeta(0.25, 8, 6), qbeta(0.75, 8, 6)] = [0.48330, 0.66319]. The 90% interval is [qbeta(0.05, 8, 6), qbeta(0.95, 8, 6)] = [0.35480, 0.77604].
(d) If the majority prefer Bayes then θ > 0.5. Since the 50% interval includes θ < 0.5 and the 90% interval covers a lot of θ < 0.5 we don't have a strong case that θ > 0.5. As a further test we compute P(θ < 0.5 | x_1) = pbeta(0.5, 8, 6) = 0.29053. So there is still a 29% posterior probability that the majority prefers frequentist statistics.
(e) Let x_2 be the result of the second poll. We want p(x_2 > 5 | x_1). We can compute this using the law of total probability:
p(x_2 > 5 | x_1) = ∫_0^1 p(x_2 > 5 | θ) p(θ | x_1) dθ.
The two factors in the integral are
p(x_2 > 5 | θ) = C(10, 6) θ^6 (1 − θ)^4 + C(10, 7) θ^7 (1 − θ)^3 + C(10, 8) θ^8 (1 − θ)^2 + C(10, 9) θ^9 (1 − θ) + C(10, 10) θ^10
and
p(θ | x_1) = (13!/(7! 5!)) θ^7 (1 − θ)^5.
This can be computed exactly or numerically in R using the integrate() function. The answer is P(x_2 > 5 | x_1 = 6) = 0.5521.
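A short R sketch of the computations in parts (c) and (e):

qbeta(c(0.25, 0.75), 8, 6)   # 50% probability interval ~ [0.483, 0.663]
qbeta(c(0.05, 0.95), 8, 6)   # 90% probability interval ~ [0.355, 0.776]
pbeta(0.5, 8, 6)             # P(theta < 0.5 | x1 = 6)  ~ 0.291

# Posterior predictive for part (e): integrate P(x2 > 5 | theta) against
# the beta(8, 6) posterior.
integrand <- function(theta) (1 - pbinom(5, 10, theta)) * dbeta(theta, 8, 6)
integrate(integrand, 0, 1)   # ~ 0.5521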
7 Bayesian updating: discrete prior, continuous likelihood

15. For a fixed θ the likelihood is
f(x | θ) = 1/θ for x ≤ θ, and 0 for x > θ.
If Alice arrived 10 minutes late (x = 1/6), we have the table:

Hypothesis   Prior   Likelihood for x = 1/6   Unnorm. post.   Posterior
θ = 1/4      1/2     4                        2               3/4
θ = 3/4      1/2     4/3                      2/3             1/4
Total        1       –                        8/3             1

In this case the most likely value of θ is 1/4. If Alice arrived 30 minutes late (x = 1/2), we have the table:

Hypothesis   Prior   Likelihood for x = 1/2   Unnorm. post.   Posterior
θ = 1/4      1/2     0                        0               0
θ = 3/4      1/2     4/3                      2/3             1
Total        1       –                        2/3             1
In this case the most likely value of θ is 3/4.
8 Bayesian updating: continuous prior, continuous likelihood

16. (a) We have µ_prior = 9, σ²_prior = 1 and σ² = 10^(−4). The normal-normal updating formulas are
a = 1/σ²_prior,   b = n/σ²,   µ_post = (a µ_prior + b x̄)/(a + b),   σ²_post = 1/(a + b).
So we compute a = 1/1 = 1, b = 10000, σ²_post = 1/(a + b) = 1/10001 and
µ_post = (a µ_prior + b x̄)/(a + b) = 100009/10001 ≈ 9.9999.
So we have posterior distribution f(θ | x = 10) ∼ N(9.9999, 1/10001), i.e. posterior variance ≈ 10^(−4).
(b) We have σ²_prior = 1 and σ² = 10^(−4). The posterior variance of θ given observations x_1, ..., x_n is
1/(1/σ²_prior + n/σ²) = 1/(1 + n · 10^4).
We wish to find n such that the above quantity is less than 10^(−6). It is not hard to see that n = 100 is the smallest value such that this is true.
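A small R sketch of the update in part (a) and the search in part (b):

mu_prior <- 9; var_prior <- 1; var_meas <- 1e-4
a <- 1 / var_prior
b <- 1 / var_meas                     # one measurement, x = 10
(a * mu_prior + b * 10) / (a + b)     # posterior mean     ~ 9.9999
1 / (a + b)                           # posterior variance ~ 1e-4

# Part (b): smallest n with posterior variance 1 / (1 + n * 1e4) below 1e-6.
which(1 / (1 + (1:200) * 1e4) < 1e-6)[1]   # 100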
17. We have likelihood function
f(x_1, ..., x_5 | λ) = ∏_{i=1}^5 λ e^(−λ x_i) = λ^5 e^(−λ(x_1 + x_2 + ··· + x_5)) = λ^5 e^(−2λ).
So our posterior density is proportional to
f(λ) f(x_1, ..., x_5 | λ) ∝ λ^9 e^(−3λ).
The hint allows us to compute the normalizing factor. (Or we could recognize this as the pdf of a Gamma random variable with parameters 10 and 3.) Thus, the posterior density is
f(λ | x_1, ..., x_5) = 3^10 λ^9 e^(−3λ) / 9!.
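A quick R check that the normalizing factor matches the hint, i.e. that the posterior is the Gamma(10, 3) density:

unnorm <- function(lam) lam^9 * exp(-3 * lam)
c(integrate(unnorm, 0, Inf)$value, factorial(9) / 3^10)   # the two values should agree
# The normalized posterior is then dgamma(lambda, shape = 10, rate = 3).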
18. (a) Let X be a random decay distance. Then
Z(λ) = P(detection | λ) = P(1 ≤ X ≤ 20 | λ) = ∫_1^20 λ e^(−λx) dx = e^(−λ) − e^(−20λ).
(b) Fully specifying the likelihood (remember detection only occurs for 1 ≤ x ≤ 20):
likelihood = f(x | λ, detected) = f(x and detected | λ) / P(detected | λ) = λ e^(−λx) / Z(λ) for 1 ≤ x ≤ 20, and 0 otherwise.
(c) Let Λ be the random variable for λ and let X̄ = 1/Λ be the random variable for the mean decay distance. We are given that X̄ is uniform on [5, 30], so f_X̄(x) = 1/25. First we find the pdf f_Λ(λ) by finding and then differentiating the cdf F_Λ(λ):
F_Λ(λ) = P(Λ ≤ λ) = P(1/X̄ ≤ λ) = P(X̄ ≥ 1/λ) =
  0 for 1/λ > 30 (i.e. λ < 1/30),
  (30 − 1/λ)/25 for 5 < 1/λ < 30 (i.e. 1/30 < λ < 1/5),
  1 for 1/λ < 5 (i.e. λ > 1/5).
Taking the derivative we get
f_Λ(λ) = F'_Λ(λ) = 1/(25λ²) on 1/30 < λ < 1/5, and 0 otherwise.
From part (b) the likelihood of one observation is f(x_i | λ) = λ e^(−λ x_i) / Z(λ). So the likelihood of the data {5, 11, 13, 14} is
f(data | λ) = λ^4 e^(−λ Σ x_i) / Z(λ)^4 = λ^4 e^(−43λ) / Z(λ)^4.
Now we have the prior and likelihood so we can do a Bayesian update:

Hypothesis   prior                                 likelihood              posterior
λ            1/(25λ²) for 1/30 < λ < 1/5, else 0   λ^4 e^(−43λ) / Z(λ)^4   c λ² e^(−43λ) / Z(λ)^4

So the posterior is proportional to λ² e^(−43λ) / Z(λ)^4 on (1/30, 1/5), and
Odds(λ ≤ 1/10) = P(λ ≤ 1/10) / P(λ > 1/10) = [∫_{1/30}^{1/10} λ² e^(−43λ) / Z(λ)^4 dλ] / [∫_{1/10}^{1/5} λ² e^(−43λ) / Z(λ)^4 dλ].
Using the R function integrate() we computed Odds ≈ 10.1.
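A sketch of that integrate() computation:

Z <- function(lam) exp(-lam) - exp(-20 * lam)
post_unnorm <- function(lam) lam^2 * exp(-43 * lam) / Z(lam)^4
num   <- integrate(post_unnorm, 1/30, 1/10)$value   # proportional to P(lambda <= 1/10)
denom <- integrate(post_unnorm, 1/10, 1/5)$value    # proportional to P(lambda >  1/10)
num / denom                                         # odds ~ 10.1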
9 NHST

19. (a) Our z-statistic is
z = (x̄ − µ)/(σ/√n) = (6.25 − 4)/(10/7) = 1.575.
Under the null hypothesis z ∼ N(0, 1). The two-sided p-value is
p = 2 × P(Z > 1.575) = 2 × 0.0576 = 0.1152.
The probability was computed from the z-table; we interpolated between z = 1.57 and z = 1.58. Because p > α we do not reject H_0.
(b) The null pdf is standard normal. The rejection region is |z| > 1.96 (critical values z_0.975 = −1.96 and z_0.025 = 1.96); the area over it, used to compute significance, is shown in red, and the area used to compute the p-value, beyond |z| = 1.575, is shown with blue stripes. The z-statistic is outside the rejection region, which corresponds to the blue completely covering the red. [Figure: f(z | H_0) ∼ N(0, 1), with the rejection region beyond ±1.96 and the observed z = 1.575 marked.]

20. (a) Our t-statistic is
t = (x̄ − µ)/(s/√n) = (6.25 − 4)/(6/7) = 2.625.
Under the null hypothesis t ∼ t_48. Using the t-table (the df = 48 critical values are 2.41 for p = 0.010 and 2.68 for p = 0.005), the two-sided p-value satisfies
2 × 0.005 < p = 2 × P(t > 2.625) < 2 × 0.010, i.e. 0.01 < p < 0.02.
Because p < α we reject H_0.
(b) The null pdf is a t-distribution. The rejection region is |t| > 2.011 (critical values t_0.975 = −2.011 and t_0.025 = 2.011, looked up in the table); the area over it, used to compute significance, is shown in red, and the area used to compute the p-value, beyond |t| = 2.625, is shown with blue stripes. The t-statistic is inside the rejection region, which corresponds to the red completely covering the blue. [Figure: f(t | H_0) ∼ t_48, with the rejection region beyond ±2.011 and the observed t = 2.625 marked.]
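The same p-values and critical values from R (a quick sketch):

z <- (6.25 - 4) / (10 / sqrt(49)); z           # 1.575
2 * (1 - pnorm(z))                             # two-sided p-value ~ 0.115
qnorm(0.975)                                   # z rejection cutoff ~ 1.96

tstat <- (6.25 - 4) / (6 / sqrt(49)); tstat    # 2.625
2 * (1 - pt(tstat, df = 48))                   # two-sided p-value ~ 0.012
qt(0.975, df = 48)                             # t rejection cutoff ~ 2.011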
21. See psets 7 and 8.
22. Probability, MLE, goodness of fit.
(a) This is a binomial distribution. Let θ be the Bernoulli probability of success in one test. Then
p(x = k) = C(12, k) θ^k (1 − θ)^(12−k), for k = 0, 1, ..., 12.
(b) The likelihood function for the combined data from all 60 centers is
p(x_1, x_2, ..., x_60 | θ) = C(12, x_1) θ^(x_1) (1 − θ)^(12−x_1) · C(12, x_2) θ^(x_2) (1 − θ)^(12−x_2) ··· C(12, x_60) θ^(x_60) (1 − θ)^(12−x_60) = c θ^(Σ x_i) (1 − θ)^(Σ(12 − x_i)).
To find the maximum we use the log likelihood. At the same time we make the substitution 60x̄ for Σ x_i:
ln(p(data | θ)) = ln(c) + 60x̄ ln(θ) + 60(12 − x̄) ln(1 − θ).
Now we set the derivative to 0:
d/dθ ln(p(data | θ)) = 60x̄/θ − 60(12 − x̄)/(1 − θ) = 0.
Solving for θ we get θ̂ = x̄/12.
(c) The sample mean is
x̄ = Σ(count × x) / Σ(counts) = (4 · 0 + 15 · 1 + 17 · 2 + 10 · 3 + 8 · 4 + 6 · 5)/60 = 2.35.
(d) Just plug x̄ = 2.35 into the formula from part (b): θ̂ = x̄/12 = 2.35/12 = 0.1958.
(e) There were 60 trials in all. Our hypotheses are:
H_0: the probability of success is the same at all centers. (This determines the probabilities of the counts in each cell of our table.)
H_A: the probabilities for the cell counts can be anything as long as they sum to 1, i.e. x follows an arbitrary multinomial distribution.
Using the value for θ̂ in part (d) we have the following table. The probabilities are computed using R; the expected counts are just the probabilities times 60. The components of X² are computed using the formula X²_i = (E_i − O_i)²/E_i.

x          0        1        2        3        4        5
p(x)       0.0731   0.2137   0.2863   0.2324   0.1273   0.0496
Observed   4        15       17       10       8        6
Expected   4.3884   12.8241  17.1763  13.9428  7.6396   2.9767
X²_i       0.0344   0.3692   0.0018   1.1149   0.0170   3.0707
The χ² statistic is X² = Σ X²_i = 4.608. There are 6 cells, so 4 degrees of freedom. The p-value is
p = 1 − pchisq(4.608, 4) = 0.3299.
With this p-value we do not reject H_0. The reason the degrees of freedom is two less than the number of cells is that there are two constraints on assigning cell counts assuming H_A but consistent with the statistics used to compute the expected counts: the total number of observations = 60, and the grand mean x̄ = 2.35.
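A short R sketch of this goodness-of-fit computation:

x        <- 0:5
observed <- c(4, 15, 17, 10, 8, 6)
theta    <- sum(observed * x) / (60 * 12)       # 0.1958
expected <- 60 * dbinom(x, size = 12, prob = theta)
X2 <- sum((observed - expected)^2 / expected)   # ~ 4.61
1 - pchisq(X2, df = length(x) - 2)              # p-value ~ 0.33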