CHAPTER 1
Significance: How Strong Is the Evidence
e. No, 0.50 is just a plausible (reasonable) explanation for the data. Other explanations are possible (e.g., the author’s long-run proportion of wins could be 0.55).
c. Yes, the simulation analysis gives strong evidence that the woman is not simply guessing. If she were guessing she’d rarely get 8 out of 8 correct.
f. Yes, it means that there were special circumstances when the author played these 20 games and so these 20 games may not be a good representation of the author’s long-run proportion of wins in Minesweeper.
d. Statistically significant evidence that she is not guessing
1.1.14 a. 100 b. Each dot represents the number of times out of 10 attempts the toast lands buttered side down when the probability that the toast lands buttered side down is 0.50. c. 5, because that is what will happen on average if the toast is dropped 10 times and 50% of the drops it lands buttered side down. d. No, we are not convinced that the long-run proportion of times the toast lands buttered side down is above 0.50 because 7 is a fairly typical outcome for the number of times landing buttered side down out of 10 drops of toast when the long-run proportion of times it lands buttered side down is 0.50. Stated another way, 0.50 is a plausible value for the long-run proportion of times that the toast lands buttered side down based on getting 7 times landing buttered side down in 10 drops. e. No, 0.50 is just a plausible (reasonable) explanation for the data. Other explanations are possible (e.g., the long-run proportion of times the toast lands buttered side down could be 0.60). 1.1.15 a. Statistic
1.1.18 a. The conclusion you’ve drawn is incorrect, because 5 out of 8 is a likely result if someone is just guessing. In particular, when you do a simulation with probability of success = 0.5, sample size (n) = 8, getting 5 heads happens quite frequently. b. No, this does not prove that you cannot tell the difference. It’s plausible (believable) you are not guessing, but we haven’t proven it. c. Applet inputs are: probability of success (π) = 0.5, sample size (n) = 16, number of samples = 1000. Applet output suggests that 14 out of 16 is a fairly unlikely result (~2 out of 1000 times). Thus, this result also provides strong evidence that the person actually has ability better than random guessing. The applet value for π stays the same because 0.50 still represents guessing, and n = 16 now because there are 16 cups of tea. 1.1.19 a. The long-run proportion of times that Zwerg chooses the correct object b. Zwerg is just guessing or Zwerg is choosing the correct object because she understands the cue. c. 37 out of 48 attempts seems fairly unlikely to happen by chance, because 24 out of 48 is what we would expect to happen in the long run.
b. Parameter
d. 50%
c. Yes, it is possible to get 17 out of 20 first serves in if Mark was just as likely to make his first serve as to miss it.
1.1.20 a. 37 times out of 48 attempts
d. Getting 17 out of 20 first serves in if Mark was just as likely to make the serve as to miss it is like flipping a coin 20 times and getting heads 17 times. This is fairly unlikely, so 17 out of 20 first serves in is not a very plausible outcome if Mark is just as likely to make his first serve as to miss it.
b. Applet input: probability of success is 0.5, sample size is 48, number of samples is 1,000.
1.1.16
a. Observational unit is each cup, variable is whether the tea or milk was poured first. b. The long-run proportion of times the woman correctly identifies a cup c. 8, p̂ = 1.0 d. Yes, it’s possible she could get 8 out of 8 correct if she was just randomly guessing with each cup. e. Getting 8 out of 8 correct if she was randomly guessing is like flipping a coin 8 times and getting heads every time, a fairly unlikely result. Thus, 8 out of 8 seems unlikely. 1.1.17 a. Toss a coin 8 times to represent the 8 cups of tea. Heads represents a correct identification of what was poured first, tea or milk, and tails represents an incorrect identification of what was poured first. Count the number of heads in the 8 tosses; this represents the number of correct identifications of what was poured first out of the 8 cups. Repeat this process many times (1,000). You will end up with a distribution of the number of correct identifications out of 8 cups when the chance of a correct identification is 50%. If 8 correct out of 8 cups rarely occurs, then it is unlikely that the woman was just guessing as to what was poured first. b. Using the applet shows that 8 out of 8 occurs rarely by chance (~4 times out of 1,000), confirming that 8 out of 8 is quite unlikely to occur just by chance.
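The coin-tossing procedure in 1.1.17 can also be sketched in code. The following is a minimal illustration, not the textbook applet: the function name `simulate_null_counts`, the seed, and the use of Python's `random` module are assumptions made for this sketch.

```python
import random

def simulate_null_counts(n_cups=8, n_reps=1000, p_guess=0.5, seed=0):
    """Return the number of 'heads' (correct guesses) in each simulated set of tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    return [
        sum(rng.random() < p_guess for _ in range(n_cups))  # heads in one set of 8 tosses
        for _ in range(n_reps)
    ]

counts = simulate_null_counts()

# Proportion of simulated sets in which all 8 cups were identified correctly
prop_all_correct = sum(c == 8 for c in counts) / len(counts)

# Exact chance of 8 heads in 8 fair tosses, for comparison with the simulation
exact = 0.5 ** 8

print(f"simulated proportion of 8 out of 8: {prop_all_correct:.4f}")
print(f"exact probability of 8 out of 8:    {exact:.4f}")
```

The simulated proportion varies from run to run (and with the seed), which is why the solution reports only an approximate count out of 1,000.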
c01Solutions.indd 6
[Figure: simulated null distribution; horizontal axis: Number of successes, 10 to 36]
c. Yes, it appears as if the chance model is wrong, as it is highly unlikely to obtain a value as large as 37 when there is a 50% chance of picking the correct object. d. We have strong evidence that Zwerg can correctly follow this type of direction more than 50% of the time. e. The results are statistically significant because we have strong evidence that the chance model is incorrect. 1.1.21 a. Zwerg is just guessing or Zwerg is picking up on the experimenter cue to make a choice. b. 26 out of 48 seems like the kind of thing that could happen just by chance because 24 out of 48 is what we would expect on average in the long run.
10/16/20 9:15 PM
Solutions to Problems
c. 50%
b. 17 out of 30 seems like the kind of thing that could happen just by chance because 15 out of 30 is what we would expect on average in the long run.
1.1.22 a. 26 times out of 48 attempts b. Applet input: probability of success is 0.5, sample size is 48, number of samples is 1,000. This distribution is centered at 24.
c. 50% 1.1.26 a. 17 times out of 30 attempts b. Applet input: probability of success is 0.5, sample size is 30, number of samples is 1,000. The distribution is centered at 15.
c. We cannot conclude the chance model is wrong because a value as large as or larger than 17 is fairly likely.
d. We do not have strong evidence that Janine can land the majority of her serves in-bounds when serving right-handed.
e. The chance model (Janine lands 50% of her serves in-bounds when serving right-handed) is a plausible explanation for the observed data (17 out of 30).
[Figure: simulated null distribution; horizontal axis: Number of heads, 12 to 36, with 26 marked]
c. We cannot conclude the chance model is wrong because a value as large as or larger than 26 is fairly likely.
f. This does not prove that Janine lands 50% of her right-handed short-serves in bounds. This is just one plausible explanation for Janine’s performance. We cannot rule out a 0.50 long-run proportion of serves in bounds as an explanation for Janine landing 17 out of 30 serves in bounds.
d. We do not have strong evidence that Zwerg can correctly follow this type of direction more than 50% of the time.
1.1.27
e. The chance model (Zwerg guessing) is a plausible explanation for the observed data (26 out of 48), because the observed outcome was likely to occur under the chance model.
b. 20
f. Less convincing evidence that Zwerg can correctly follow this type of direction more than 50% of the time. We could have anticipated this because 26 out of 48 is closer to 24 out of 48 than is 37 out of 48. g. This does not prove that Zwerg is just guessing. Guessing is just one plausible explanation for Zwerg’s performance in this experiment. We cannot rule out guessing as an explanation for Zwerg getting 26 out of 48 correct. 1.1.23 a. The long-run proportion of times that Janine’s short serve lands in bounds when serving left-handed b. Janine has a 50-50 chance of landing in-bounds and so 23 out of 30 happened by chance; Janine’s chance of landing her serve in bounds is greater than 50%. c. 23 out of 30 seems somewhat unlikely to occur if she has a 50-50 chance of landing the serve in-bounds d. 50% 1.1.24 a. 23 out of 30 attempts b. Applet input: probability of success is 0.5, sample size is 30, number of samples is 1,000. The distribution is centered at about 15. c. Yes, it appears as if the chance model is wrong, as it is highly unlikely to obtain a value as large as 23 when there is a 50% chance of getting the serve in-bounds.
a. 0.50 c. 1,000 (or some large number) d. 12 out of 20 is a fairly likely value because it occurred frequently in the simulated data. 1.1.28 a. 0.50 b. 100 c. 1,000 (or some large number) d. 60 out of 100 is somewhat unlikely because it occurred somewhat infrequently in the simulated data. e. The sample size was different (20 serves vs. 100 serves). 1.1.29 B. 1.1.30 a. 12 coin flips b. It is unlikely, because at least 11 correct (heads) happened very rarely on 12 coin flips. c. Yes, we have very strong evidence that Milne does better than just guessing because 11 correct out of 12 is very unlikely to happen by just guessing. d. Even stronger evidence, because 12 correct is even more extreme than 11 correct. e. Compared to 11, 12 is farther away in the tail and farther away from 6 (which is ½ of 12).
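The sample-size contrast in 1.1.27 and 1.1.28 (12 out of 20 versus 60 out of 100, both a sample proportion of 0.60) can be checked with exact binomial probabilities instead of the applet. This is a sketch under that substitution: the helper name `upper_tail` is hypothetical, and the exact calculation stands in for the simulated dot plots used in the solutions.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Exact probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Same sample proportion (0.60), different sample sizes
tail_20 = upper_tail(12, 20)    # 12 or more out of 20
tail_100 = upper_tail(60, 100)  # 60 or more out of 100

print(f"P(12 or more out of 20)  = {tail_20:.3f}")
print(f"P(60 or more out of 100) = {tail_100:.3f}")
```

With the larger sample the same proportion sits farther out in the tail, which matches the reasoning in 1.1.28 e.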
d. We have strong evidence that Janine can land the majority of her serves in bounds.
1.1.31
e. The results are statistically significant because we have strong evidence that the chance model is incorrect.
b. Yes, because 16 heads out of 20 coin flips is very rare.
1.1.25 a. Janine has a 50-50 chance of landing in-bounds and so 17 out of 30 happened by chance; Janine’s chance of landing her serve in-bounds is greater than 50%.
a. 20 times c. Yes, because 16 heads in 20 tosses is very rare, so we have strong evidence that Mercury’s choices are not random. 1.1.32 a. 20 times
b. No, because 11 heads in 20 coin flips is not unusual. c. No, we do not have evidence that Panzee’s choices are that different from what could have happened if Panzee was just randomly choosing containers. 1.1.33 a. We would flip 40 coins where one side (say heads) represents Donaghy’s foul calls favoring the team that received heavier betting and the other side (say tails) represents Donaghy’s foul calls not favoring the team that received heavier betting.
d. Because 4 of the 100 simulated outcomes gave a result of 7 or more, the p-value is 0.04. e. We have strong evidence that Sarah is not simply guessing, because 7 out of 8 rarely occurs by chance (if just guessing). f. If Sarah doesn’t understand how to solve problems and is just guessing at which picture to select, the probability she would get 7 or more correct out of 8 is 0.04. g. A single dot represents the number of times Sarah would choose the correct picture (out of 8) if she were just guessing.
b. Because 28 out of 40 is way out in the upper tail of the chance distribution, it is very unlikely that Donaghy’s foul calls would favor the team that received heavier betting 28 out of 40 times if the chance model is true.
1.2.17
1.1.34
b. H0: π = 0.50, Ha: π > 0.50
a. We are assuming that Buzz’s probability of choosing the correct button does not change and that previous trials don’t influence future guesses.
c. 0.23 (23 dots are 0.60 or larger)
b. The parameter is his actual probability of pushing the correct button.
e. 0.70
a. Null: The long-run proportion of times Hope will go to the correct object is 0.50, Alternative: The long-run proportion of times that Hope will go to the correct object is more than 0.50
d. No, the approximate p-value is 0.23, which provides little to no evidence that Hope understands pointing.
1.2.1 D.
f. i. 1.2.18 Researcher A has stronger evidence against the null hypothesis because his p-value is smaller.
1.2.2 C.
1.2.19
1.2.3 A.
a. Roll a die 20 times, and keep track of how many times ‘one’ is rolled. Repeat this many times.
Section 1.2
1.2.4 B.
b. Using a set of five black cards and one red card, shuffle the cards and choose a card. Note the color of the card and return it to the deck. Shuffle and choose a card 20 times keeping track of how many times the red card is selected. Repeat this many times.
1.2.5 A. 1.2.6 D. 1.2.7 C. 1.2.8 B.
c. Roll 30 times, then repeat.
1.2.9 B.
d. Shuffle and choose a card 30 times, then repeat.
1.2.10 C.
e. Roll a die 20 times, and keep track of how many times a ‘one, two, three, or four’ is rolled. Repeat this many times.
1.2.11 B.
f. Using a set of one black card and two red cards, shuffle the cards and choose a card. Shuffle and choose a card 20 times keeping track of how many times the red card is selected. Repeat this many times.
1.2.12 A. 1.2.13 a. 0.25 b. 25 (because 0.25 × 100 = 25) 1.2.14 a. H0 = Null hypothesis b. Ha = Alternative hypothesis
c. p̂ = sample proportion
d. π = long-run proportion (parameter) e. n = sample size
1.2.15 p̂ is the value of the observed statistic, while the p-value is the probability of obtaining the observed statistic or something more extreme if the null hypothesis is true; the p-value is a measure of the strength of the evidence. 1.2.16 a. The long-run proportion of times that Sarah chooses the correct photo, π b. 7/8 = 0.875. c. Null: The long-run proportion of times Sarah chooses the correct photo is 0.50. Alternative: The long-run proportion of times Sarah chooses the correct photo is more than 0.50. H0: π = 0.50, Ha: π > 0.50
1.2.20 a. Observational units: 40 heterosexual couples who agreed on their response to which person was the first to say “I love you.” Variable: Whether the man or woman said “I love you” first; this is a categorical variable b. Null: The proportion of all couples where the male said “I love you” first is 0.50. Alternative: The proportion of all couples where the male said “I love you” first is greater than 0.50 c. π is the proportion of all couples where the male said “I love you” first d. 28/40 = 0.7 is the sample proportion; we use the symbol p̂ to denote this quantity. e. Flip a coin 40 times and keep track of the number of heads. Repeat the 40 coin flips 1,000 times. Calculate the proportion of sets of 40 coin flips where 28 or more heads were obtained. That proportion is the p-value. f. Applet: π = 0.50, n = 40, number of samples = 1,000. To find the p-value, we find the proportion of times a value greater than or equal to 28 is observed. The p-value is approximately 0.008. g. The p-value is the probability of observing a value of 28 or greater, assuming that for 50% of couples the man said “I love you” first.
h. The small p-value gives us strong evidence that for more than 50% of couples the man said “I love you” first. 1.2.21 a. Obs units = university students, Variable = male or female said “I love you” first b. Null: The long-run proportion of university student relationships in which the male says “I love you” first is 0.50, Alternative: The long-run proportion of university student relationships in which the male says “I love you” first is more than 0.50.
c. 59/96 = 0.61 is the sample proportion; we use the symbol p̂ to denote this quantity. d. We could flip a coin 96 times and keep track of the number of heads. Then do many, many more sets of 96 coin flips, keeping track of the number of heads each time.
e. Probability of heads: 0.5, number of tosses: 96, number of repetitions: 1,000. To find the p-value, we find the proportion of times a value is greater than or equal to 0.61. The p-value is approximately 0.016. f. The p-value (0.016) is the probability of 0.61 or larger assuming the null hypothesis is true. g. We have strong evidence that the long-run proportion of university student relationships in which the male says “I love you” first is more than 0.50. 1.2.22 a. Obs units: each of the 40 monkeys. Variable: correct choice or not (categorical)
d. Null hypothesis: The long-run proportion of times that rhesus monkeys make the correct choice when the researcher looks towards the correct box is 0.50 (just guessing). Alternative hypothesis: The long-run proportion of times that rhesus monkeys make the correct choice is more than 0.50. H0: π = 0.50, Ha: π > 0.50 e. Flip a coin 40 times and record the number of heads. Repeat this process 999 more times, yielding a set of 1,000 values of the number of heads received in 40 coin tosses. Compute the p-value as the proportion of times 31 or larger was obtained by chance in the 1,000 sets of 40 coin tosses. If the p-value is small (indicating 31 or larger rarely occurs by chance), then this is convincing evidence that rhesus monkeys can interpret human gestures better than by random chance. f. The approximate p-value from the applet (using π = 0.50, n = 40, number of samples = 1,000) is 0.001 (probability of 31 or greater). This small p-value means that 31 out of 40 is strong evidence that the rhesus monkeys are not guessing, which may lead us to believe that rhesus monkeys can interpret gestures to indicate which box to choose. 1.2.24 The p-value is approximately 0.25. We don’t have strong evidence that the author’s long-run proportion of wins in Minesweeper is greater than 0.50. The null hypothesis (long-run proportion of wins is 0.50) is a plausible explanation for her winning 12 out of 20 games. 1.2.25 The p-value is approximately 0.134. We don’t have strong evidence that the author’s long-run proportion of wins in Spider Solitaire is greater than 0.50. The null hypothesis (long-run proportion of wins is 0.50) is a plausible explanation for him winning 24 out of 40 games. 1.2.26
b. The long-run proportion of times that a monkey will make the correct choice, π
a. The long-run proportion of times a spun penny lands heads
d. Null hypothesis: The long-run proportion of times that rhesus monkeys make the correct choice when observing the researcher jerk their head is 0.50 (just guessing). Alternative hypothesis: The long-run proportion of times that rhesus monkeys make the correct choice is more than 0.50.
c. Null would be the same, Alternative would be > 0.50. To calculate the p-value, find the probability that 29 or larger (0.58 or larger) occurred.
c. 30/40 = 0.75. Statistic. We use the symbol p̂ to denote this quantity.
H0: π = 0.50, Ha: π > 0.50 e. Flip a coin 40 times and record the number of heads. Repeat this process 999 more times, yielding a set of 1,000 values of the number of heads received in 40 coin tosses. Compute the p-value as the proportion of times 30 or larger was obtained by chance in the 1,000 sets of 40 coin tosses. If the p-value is small (indicating 30 or larger rarely occurs by chance), then this is convincing evidence that rhesus monkeys can interpret human gestures better than by random chance. f. The approximate p-value from the applet (using π = 0.50, n = 40, number of samples = 1,000) is 0.001 (probability of 30 or greater). This small p-value means that 30 out of 40 is strong evidence that the rhesus monkeys are not guessing, which may lead us to believe that rhesus monkeys may be able to understand a head jerk to indicate which box to choose. 1.2.23 a. Obs units: each of the 40 monkeys. Variable: correct choice or not (categorical) b. The long-run proportion of times that a monkey will make the correct choice, π
c. 31/40 = 0.775. Statistic. We use the symbol p̂ to denote this quantity.
b. The p-value = 0.16. There is little to no evidence that a spun penny lands heads less than 50% of the time.
1.2.27 a. Null: π = 0.50, Alternative: π > 0.50 b. The p-value = 0.31. There is little to no evidence that a coin that starts out heads will land heads more than 50% of the time. c. No, the p-value does not prove the null hypothesized value (0.50) is correct, just that it is a plausible value for the parameter. d. In the long run it will land heads 51% of the time, but in any particular set of 100 flips it is likely the coin won’t land heads exactly 51 out of 100 times. 1.2.28 A simulation analysis using a null hypothesis probability of 0.75 yields a p-value of 0.10, meaning that the set of 20 free throws by your friend (and making 12/20 of them) provides little to no evidence that your friend’s long-run proportion of free throws made is worse than the NBA average. 1.2.29 A simulation analysis using a null hypothesis probability of 0.75 yields a p-value of 0.02, meaning that the set of 40 free throws by your friend (and making 24/40 of them) provides strong evidence that your friend’s long-run proportion of free throws made is worse than the NBA average. 1.2.30 a. Null: Mercury will randomly choose between two containers versus Alternative: Mercury will choose the container with more bananas more often than the other container. b. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50 c. 0.02
d. Yes, because very rarely (p-value = 0.02) would Mercury pick the container with more bananas at least 16 times in 20 trials by just randomly choosing. e. Answers may vary depending on what you consider “strong evidence.” For a p-value to be at most 0.05, the sample proportion would have to be at least 0.75; so, 15 or more out of 20.
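The cutoff stated in part e (15 or more out of 20 for a p-value of at most 0.05) can be reproduced with an exact binomial calculation. The sketch below assumes a null value of 0.50 and a one-sided (greater than) alternative, as in 1.2.30 b; the function names are hypothetical, and the exact calculation is used here in place of the simulation applet.

```python
from math import comb

def upper_tail(k, n, p=0.5):
    """Exact probability of k or more successes in n trials under the null value p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def smallest_significant_count(n, alpha=0.05, p=0.5):
    """Smallest count whose one-sided p-value is at most alpha (None if there is none)."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None

cutoff = smallest_significant_count(20)
print(f"smallest count with p-value <= 0.05: {cutoff} out of 20")
print(f"p-value at the cutoff:   {upper_tail(cutoff, 20):.4f}")
print(f"p-value one count lower: {upper_tail(cutoff - 1, 20):.4f}")
```

As the solution notes, a different threshold for "strong evidence" would move the cutoff; changing `alpha` shows how.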
1.3.11 a. –3.47 (100 out of 400; 25%), –3.80 (20 out of 120; 16.7%), –4.17 (65 out of 300; 21.7%). Because of the random nature of obtaining the SDs of the null distributions, these standardized statistics can vary by as much as about plus or minus 0.2.
1.2.31
b. 65 out of 300 is the strongest evidence, 100 out of 400 is the least strong evidence.
a. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50
1.3.12
b. About 0.06 c. We have only moderate evidence because the p-value is fairly small, but not small enough to be considered strong evidence. 1.2.32 a. Null, H0: π = 0.25 versus Alternative, Ha: π > 0.25 b. About 0.003 c. We have very strong evidence because the p-value is very small; it is unlikely that 42 correct matches out of 113 would happen by just guessing. 1.2.33 a. Null, H0: π = 0.25 versus Alternative, Ha: π > 0.25 b. About 0.09 c. We have only moderate evidence because the p-value is somewhat small; it is somewhat unlikely that 35 correct matches out of 113 would happen by just guessing, but not unlikely enough to give us strong evidence. 1.2.34 When there are many outcomes, the probability of a single outcome, even if it is close to what we would expect, can be very small. We need to look at more outcomes to get a better understanding of the likeliness or unlikeliness of an outcome. 1.2.35 a. He claims he can flip a coin and make it come up heads every single time. b. Five
1.3.2 1.3.3 because the standard deviation is smaller 1.3.4 D. Even though C is farther away from 0 (z = 3), because a positive standardized statistic puts the observed statistic in the right tail, this means the observed statistic was a number (much) larger than 0.25, which is not evidence for the alternative hypothesis that π < 0.25. 1.3.5 1.3.6 B. 1.3.7 C. 1.3.8 A. 1.3.9 D. 1.3.10 a. 0.06 b. No, greater, 0.05 c. 1.72 d. No, less, 2
c. Friend D because this is more evidence against the null hypothesis d. Friend D because a smaller standard deviation leads to a larger standardized statistic 1.3.13 Friend G because the value of their statistic (30 out of 40) is larger than Friend F (15 out of 40) and thus will be farther in the tail of the distribution. 1.3.14 Simulation yields a standard deviation of the null distribution of 0.112, and a standardized statistic of approximately 0.89, which provides little or no evidence that the author’s long-run proportion of wins in Minesweeper is higher than 0.50. 1.3.15 Simulation yields a standard deviation of the null distribution of 0.125, and a standardized statistic of approximately (0.9375 − 0.50)/0.125 = 3.5, which provides very strong evidence that the long-run proportion of Buzz pushing the correct button is higher than 0.50. 1.3.16 Simulation yields a standard deviation of the null distribution of approximately 0.094, and a standardized statistic of approximately 0.76, which provides little to no evidence that the long-run proportion of times Buzz pushes the correct button is higher than 0.50. 1.3.17 a. The long-run proportion of all couples that lean their heads to the right while kissing, π. c. 80/124 = 0.645 = p̂
1.3.1 C.
b. True
b. Friend D because this is more evidence against the null hypothesis
b. Null: π = 0.50, Alternative: π > 0.50
Section 1.3
a. False
a. Friend D because they played more games
c. False d. False
d. (0.645 − 0.5)/0.045 = 3.22 = z e. The observed proportion of couples leaning their heads to the right while kissing is 3.22 standard deviations away from the null hypothesized parameter value of 0.5. f. We have strong evidence that the proportion of couples that lean their heads to the right while kissing is more than 0.50. 1.3.18 a. (0.645 − 0.60)/0.044 = 1.02 b. The standardized statistic is smaller. This makes sense because the null hypothesis is now closer to the observed statistic (less extreme). 1.3.19 a. Null: The long-run proportion of all couples that have the male say “I love you” first is 0.50. Alternative: The long-run proportion is more than 0.50. b. z = (0.70 − 0.50)/0.079 = 2.53 c. The observed proportion of couples where the male says “I love you” first is 2.53 standard deviations above the null hypothesized parameter value of 0.50. d. We have strong evidence that the proportion of couples for which the male says “I love you” first is more than 0.50.
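The standardized statistics in 1.3.17 through 1.3.19 all follow the same pattern: (observed proportion − null value) / (SD of the null distribution). The sketch below restates that arithmetic. The solutions take the SD from simulation; the formula in `theoretical_null_sd` (square root of π(1 − π)/n) is a swapped-in approximation used here only for comparison, and both function names are hypothetical.

```python
from math import sqrt

def standardized_statistic(p_hat, pi0, sd):
    """z = (observed proportion - null value) / SD of the null distribution."""
    return (p_hat - pi0) / sd

def theoretical_null_sd(pi0, n):
    """Theory-based SD of a sample proportion under the null (stands in for the simulated SD)."""
    return sqrt(pi0 * (1 - pi0) / n)

# 1.3.17 d: 80 of 124 couples, null value 0.50, simulated SD 0.045
print(round(standardized_statistic(0.645, 0.50, 0.045), 2))
# 1.3.18 a: same data, null value 0.60, simulated SD 0.044
print(round(standardized_statistic(0.645, 0.60, 0.044), 2))
# 1.3.19 b: 28 of 40 couples, null value 0.50, simulated SD 0.079
print(round(standardized_statistic(0.70, 0.50, 0.079), 2))

# Theory-based SDs for comparison with the simulated values above
print(round(theoretical_null_sd(0.50, 124), 3), round(theoretical_null_sd(0.50, 40), 3))
```

Simulated SDs vary slightly from run to run, so a standardized statistic computed from a fresh simulation may differ in the second decimal place.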
1.3.20 a. Null: The long-run proportion of times that rhesus monkeys choose the correct box is 0.50. Alternative: The long-run proportion of times that rhesus monkeys choose the correct box is greater than 0.50. b. z = (0.75 – 0.5)/0.079 = 3.16 c. The observed proportion of rhesus monkeys that chose the box the experimenter gestured towards is 3.16 standard deviations away from the null hypothesized parameter value of 0.5. d. We have strong evidence that the long-run proportion of times that rhesus monkeys choose the correct box is greater than 0.50. 1.3.21 a. The long-run proportion of times the lady correctly identifies which was poured first, π b. Null: π = 0.50, Alternative: π > 0.50 c. 8/8 = 1 = p̂
d. 0.50, because that is the value of the parameter if the null hypothesis is true. The standard deviation will be positive because the standard deviation must be at least 0, and is only equal to zero if there is no variability in the values (there will be variability in the simulated statistics). e. z = (1 − 0.50)/0.177 = 2.82 f. The observed proportion of times the lady correctly identified which was poured first is 2.82 standard deviations away from the null hypothesized parameter value of 0.50. g. We have strong evidence that the long-run proportion of times that the lady makes the correct identification is greater than 0.50.
g. We have little to no evidence that the long-run proportion of times that Zwerg makes the correct choice when using a marker is greater than 0.50. 1.3.24 a. The long-run proportion of times that 10-month-olds choose the helper toy, π b. Null: π = 0.50, Alternative: π > 0.50 c. 14/16 = 0.875 = p̂ d. z = (0.875 − 0.50)/0.125 = 3 e. The observed proportion of times the 10-month-old babies chose the helper toy is 3 standard deviations above the null hypothesized parameter value of 0.50. f. We have strong evidence that the long-run proportion of times that 10-month-old babies choose the helper toy is greater than 0.50. 1.3.25 a. Yes, the p-value will be small because the standardized statistic is large b. A p-value of approximately 0.002. The p-value is the probability of observing 14/16 or larger assuming the null hypothesis is true. c. We have strong evidence that the long-run proportion of times that 10-month-old babies choose the helper toy is greater than 0.50. d. Yes, because both the p-value and the standardized statistic are measuring the strength of evidence (how far out in the tail the observed value is), and so should lead to the same conclusion. 1.3.26
1.3.22
a. The long-run proportion of people that choose the number 3, π
a. The long-run proportion of times that Zwerg makes the correct choice when the object is pointed at, π
b. Null: π = 0.25, Alternative: π > 0.25
b. Null: π = 0.50, Alternative: π > 0.50
d. The mean = 0.248 and SD = 0.076
c. 37/48 = 0.77 = p̂ d. 0.5, because that is the value of the parameter if the null hypothesis is true. The standard deviation will be positive because the standard deviation must be at least 0, and is only equal to zero if there is no variability in the values (there will be variability in the simulated statistics). e. (0.77 – 0.5)/0.073 = 3.70 f. The observed proportion of times Zwerg made the correct choice is 3.70 standard deviations away from the null hypothesized parameter value of 0.5. g. We have strong evidence that the long-run proportion of times that Zwerg makes the correct choice is greater than 0.50. 1.3.23 a. The long-run proportion of times that Zwerg makes the correct choice when using a marker, π b. Null: π = 0.50, Alternative: π > 0.50
c. 14/33 = 0.42 = p̂ e. (0.42 – 0.248)/0.076 = 2.26 f. The observed proportion of people that chose the number 3 is 2.26 standard deviations above the null hypothesized parameter value of 0.25. g. We have strong evidence that the long-run proportion of people that will choose the number 3 is greater than 0.25. 1.3.27 a. Yes, because the standardized statistic is far from zero. b. The p-value is approximately 0.02, and is the probability of observing 14/33 or larger assuming the null hypothesis is true. c. We have strong evidence that the long-run proportion of people that will choose the number 3 is greater than 0.25. d. Yes, because both the p-value and the standardized statistic are measuring the strength of evidence (how far out in the tail the observed value is), and so should lead to the same conclusion.
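The mean and SD reported in 1.3.26 d come from a simulated null distribution of sample proportions. A rough way to reproduce numbers of that kind is sketched below, assuming a null value of 0.25 and n = 33 as in the solution; the function name, the seed, and the 10,000 repetitions are choices made for this sketch rather than the applet's settings.

```python
import random
import statistics

def simulate_null_proportions(pi0, n, n_reps=10_000, seed=1):
    """Simulate sample proportions for n trials when the long-run proportion is pi0."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    return [sum(rng.random() < pi0 for _ in range(n)) / n for _ in range(n_reps)]

props = simulate_null_proportions(pi0=0.25, n=33)
null_mean = statistics.mean(props)   # should land near the null value 0.25
null_sd = statistics.pstdev(props)   # SD of the simulated null distribution

# Standardized statistic for the observed 14 out of 33
z = (14 / 33 - 0.25) / null_sd

print(f"mean = {null_mean:.3f}, SD = {null_sd:.3f}, z = {z:.2f}")
```

The result will not match the 2.26 in part e exactly, because that calculation uses the rounded proportion 0.42 and the simulated mean 0.248, and because each simulation gives a slightly different SD.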
c. 26/48 = 0.54 = p̂
1.3.28
d. 0.5, because that is the value of the parameter if the null hypothesis is true. The standard deviation will be positive because the standard deviation must be at least 0, and is only equal to zero if there is no variability in the values (there will be variability in the simulated statistics).
b. Null: π = 0.50, Alternative: π > 0.50
e. (0.54 – 0.5)/0.073 = 0.55 f. The observed proportion of times Zwerg made the correct choice is 0.55 standard deviations away from the null hypothesized parameter value of 0.5.
a. The long-run proportion of people that choose a big number, π c. 19/33 = 0.58 = p̂ d. The mean = 0.50 and the SD = 0.084 e. (0.58 − 0.5)/0.084 = 0.952 f. We have little to no evidence that the long-run proportion of people that will choose a “big number” is greater than 0.50.
1.3.29
1.3.34
a. No, because the standardized statistic is not far from zero.
a. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50
b. The p-value is approximately 0.153, and is the probability of observing 19/33 or larger assuming the null hypothesis is true.
b. z = (0.8667 – 0.50)/0.129 = 2.84; because the standardized statistic is greater than 2, we have strong evidence that dogs are more likely to approach the positive donor.
c. We have little to no evidence that the long-run proportion of people that will choose a big number is greater than 0.50. d. Yes, because both the p-value and the standardized statistic are measuring the strength of evidence (how far out in the tail the observed value is), and so should lead to the same conclusion. 1.3.30 a. 0.50 b. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50 c. 0.144 d. 2.89 e. Because the standardized statistic is much larger than 2, we have convincing evidence that Milne’s answers are better than random guesses. f. Stronger evidence, because the standardized statistic is 3.46 which is farther from 0 than is 2.89.
c. p-value = 0.003; because this is less than 0.05, we again have strong evidence that dogs are more likely to approach the positive donor. 1.3.35 a. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50 b. z = (0.6667 – 0.50)/0.129 = 1.29; because the standardized statistic is less than 2, we do not have strong evidence that dogs are more likely to approach the positive donor. c. p-value = 0.150; because this is greater than 0.05, we again do not have strong evidence that dogs are more likely to approach the positive donor.
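The two dog results in 1.3.34 and 1.3.35 show the standardized statistic and the p-value pointing the same way. The sketch below recomputes both under stated assumptions: the sample size of 15 dogs (13 and 10 successes) is inferred from the proportions 0.8667 and 0.6667 and the SD of 0.129, not stated in the solutions, and the exact binomial tail and theory-based SD stand in for the simulated values. The function names are hypothetical.

```python
from math import comb, sqrt

def exact_upper_p_value(successes, n, pi0=0.5):
    """Exact one-sided p-value: probability of at least `successes` out of n under pi0."""
    return sum(comb(n, k) * pi0**k * (1 - pi0)**(n - k) for k in range(successes, n + 1))

def z_statistic(successes, n, pi0=0.5):
    """Standardized statistic using the theory-based SD of the sample proportion."""
    return (successes / n - pi0) / sqrt(pi0 * (1 - pi0) / n)

N_DOGS = 15  # assumption: inferred from 0.8667 = 13/15 and 0.6667 = 10/15

for successes in (13, 10):
    z = z_statistic(successes, N_DOGS)
    p = exact_upper_p_value(successes, N_DOGS)
    verdict = "strong evidence" if (z > 2 and p < 0.05) else "not strong evidence"
    print(f"{successes}/{N_DOGS}: z = {z:.2f}, p-value = {p:.4f} -> {verdict}")
```

In both cases the z > 2 guideline and the p-value < 0.05 guideline agree, which is the point made in part c of each solution.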
Section 1.4 1.4.1 D. 1.4.2 A.
1.3.31
1.4.3 D.
a. Null, H0: GY just guesses, π = 0.50 versus Alternative, Ha: GY is correct more often than just guessing, π > 0.50, where π represents the chance that GY guesses correctly on any trial.
1.4.4 B.
b. Sample proportion; SD of the sample proportion is about 0.034. c. 6.03 d. Very strong evidence that GY is more likely than random chance to correctly identify whether or not the square was present because the standardized statistic is so large. e. The p-value is < 0.001, which is very small; yes, this leads to the same conclusion as the standardized statistic. 1.3.32 a. Null, H0: GY just chooses between the two wagers randomly, π = 0.50 versus Alternative, Ha: GY chooses the higher wager more often than the lower wager, π > 0.50, where π represents the chance that GY chooses the higher wager. b. Sample proportion; SD of the sample proportion is about 0.043. c. –0.58 d. No evidence that GY is more likely to take the high wager than the low wager because the standardized statistic is close to zero. e. The p-value is about 0.75, which is not small at all; yes, this leads to the same conclusion as the standardized statistic. f. Because the sample proportion, 67/141 = 0.475, is less than 0.5, which is in the opposite direction from that conjectured by the alternative hypothesis. 1.3.33 a. Null, H0: π = 0.50 versus Alternative, Ha: π > 0.50 b. 0.003; if Paul were just guessing, there would be about a 0.3% chance of getting all 8 predictions correct.
1.4.5 C. 1.4.6 A. 1.4.7 a. Larger, because the sample proportion is closer to the hypothesized value (0.50) of the long-run proportion. b. Smaller, because it is less likely to get extreme values of the statistic from a larger sample. c. Larger, because now including at least as extreme values of the sample proportion in both tails of the null distribution. 1.4.8 a. Smaller, as in, closer to 0, because the sample proportion is closer to the hypothesized value (0.5) of the long-run proportion. b. Larger, as in, farther from 0, because it is less likely to get extreme values of the statistic from a larger sample. c. Same, because the sample proportion is still the same distance away from 0 in the null distribution. 1.4.9 a. Null, H0: Students randomly choose between the two cookies, π = 0.50 versus Alternative, Ha: Students have a genuine preference for Chips Ahoy over Chipsters, π > 0.50, where π represents the long-run proportion of students that choose Chips Ahoy over Chipsters. b. 0.03 c. Null, H0: π = 0.50 versus Alternative, Ha: π ≠ 0.50 d. 0.08 1.4.10
c. Based on the very small p-value, we have very strong evidence that Paul is doing better than just guessing.
a. 0.05
d. Based on the small p-value, it is surprising that an animal would correctly choose the winning team in 8 of 8 competitions. This is a fairly small sample size, however, and it would be interesting to see whether Paul could sustain this accuracy through a much larger number of competitions.
1.4.11
b. 0.11 (0.30 or less and 0.70 or more) a. Smaller b. Smaller c. Larger
1.4.12
c. The p-value (the probability of obtaining 0.60 or larger when the true chance that Krieger chooses the correct object is 0.50) is approximately 0.38.
a. True b. False 1.4.13 a. Stronger. The statistic (18/20 = 0.90) is much farther away from the null hypothesized value (0.50) than before (12/20 = 0.60). b. Stronger. The statistic is the same (0.60) but the sample size is much larger (100 vs. 20). c. No, 12/30 = 0.40 is less than the null hypothesis value of 0.50; thus this is not evidence that the long-run proportion of wins in Minesweeper is more than 0.50. 1.4.14 a. The long-run proportion of male births is greater than 0.50. b. The long-run proportion of male births c. Null: The long-run proportion of male births is 0.50. Null: π = 0.50.
d. We do not have strong evidence that Krieger will choose the correct object more than 50% of the time. It is plausible that Krieger will choose the correct object 50% of the time. e. Stronger 1.4.26 a. Values of the long-run proportion less than 0.50 b. Increase, a two-sided p-value will be approximately twice as big as the corresponding one-sided p-value. c. The two-sided p-value will approximately double to 0.75. d. Stronger 1.4.27 a. Decrease, larger sample size. b. The p-value decreased to 0.244, yes it did behave as predicted.
d. Alternative: One-sided. He wanted to demonstrate that male births outnumbered female births.
c. Stronger
1.4.15 Probably weak because they are quite close (0.516 and 0.50) 1.4.16 Strong; the sample size is extremely large (n = 938,223).
a. The long-run proportion of times Krieger makes the correct choice when the experimenter leans towards the object, π.
1.4.17 Twice
b. 50%
1.4.18
c. Null: The long-run proportion of times Krieger makes the correct choice when the experimenter leans towards the object is 0.50; Alternative: The long-run proportion of times Krieger makes the correct choice when the experimenter leans towards the object is more than 0.50.
a. Tiny b. Huge c. One-sided
1.4.28
1.4.19
d. Decrease, 9 out of 10 is farther out in the tail of the null distribution than 6 out of 10.
a. Change: sample size. Stay the same: distance, one- or two-sided.
1.4.29
b. Stronger
a. 9 out of 10 (0.90)
1.4.20
b. Mean = 0.50, SD = 0.16
a. Change: distance. Stay the same: sample size, one- or two-sided.
c. 0.01
b. Weaker
d. Approximately 0.80
1.4.21
1.4.30
a. Double it. The alternative would now be two-sided.
a. z = (0.90 − 0.50)/0.16 = 2.5
b. Weaker
b. Using both the p-value and the standardized statistic, we have strong evidence that the long-run proportion of times that Krieger makes the correct choice is more than 0.50.
1.4.22 a. Sample size changes; distance and one-sided are the same. b. Stronger 1.4.23 a. Distance changes; sample size and one-sided are the same.
c. Yes, the p-value got smaller (evidence got stronger). 1.4.31 a.
b. Weaker
Analysis method
1.4.24 a. The long-run proportion of times that Krieger chooses the correct object, π.
b. 50%
Exercises            Sample size n    Null value π0    Value of p̂
A: 1.4.8 – 1.4.12    938,223          0.50             0.516
B: 1.4.25            82               0.50             1.00
c. Null: The long-run proportion of times that Krieger chooses the correct object is 0.50; Alternative: The long-run proportion of times that Krieger chooses the correct object is more than 0.50.
b. Analysis Method A provides strong evidence, but Method B is overwhelming.
1.4.25
1.4.32
a. 6 out of 10 for a proportion of 0.60
a. One-sided
b. Mean = 0.50, SD = 0.16
b. Moderate. 6 out of 7 is not that strong.
c. p-value = 0.0547 + 0.0078 = 0.0625. We have moderate evidence that individuals living in the country have healthier lungs than those of individuals living in cities. 1.4.33
d.
Alternative hypothesis
1.4.34
c. Always lies above
c. Probability of heads: 0.5, number of tosses: 26, approximate p-value = 0.0047. d. We have very strong evidence against the null and in support of the taller candidate winning the race more often than would be predicted by random chance. e. It is somewhat arbitrary to only look at 20th and 21st century elections.
Increasing
p̂ = 0 or p̂ = 1
π ≠ 0.50
1.4.40
b. Null: The long-run proportion of races where the taller candidate wins in U.S. presidential elections is 0.5. Alternative: The long-run proportion of races where the taller candidate wins in U.S. presidential elections is larger than 0.5. Using symbols: H0: π = 0.5, Ha: π > 0.5; where π is the long-run proportion of races where the taller candidate won.
Decreasing
p̂ = 0
π < 0.50
c. p-value = 0.1094 + 0.0313 + 0.0039 = 0.1446. We have little to no evidence that bees are more likely to sting a target that has already been stung compared to a pristine target. a. In a race for U.S. president, is the taller candidate more likely to win? Alternatively, is π > 0.5.
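The p-value for the bee-sting data is built by adding the null probabilities of 6, 7, and 8 successes in 8 trials. As a rough check, the sketch below sums those binomial terms directly; `binomial_upper_tail` is a hypothetical helper name, and only Python's standard library is assumed.

```python
from math import comb


def binomial_upper_tail(n, k, pi):
    """P(X >= k) for X ~ Binomial(n, pi), summed term by term."""
    return sum(comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(k, n + 1))


# 6 or more of 8 targets, each with chance 0.5 under the null hypothesis.
bees = binomial_upper_tail(8, 6, 0.5)
# 6 or more of 7, the other small-sample sum that appears in these solutions.
lungs = binomial_upper_tail(7, 6, 0.5)
print(f"{bees:.4f} {lungs:.4f}")
```

The unrounded sum for the bee data is 37/256, so it can differ in the fourth decimal place from a total obtained by adding already-rounded terms, as in 0.1094 + 0.0313 + 0.0039.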
Shape of curve
p̂ = 1
π > 0.50
a. One-sided b. Six out of 8 seems fairly likely to occur by chance; thus it is plausible that bees are just as likely to sting a target that has already been stung as they are to sting a target that is pristine.
Strongest evidence
Up-down
a. Lower b. Larger d. Less steep 1.4.41
            Increasing    Decreasing    Up-down
Steeper     B             A             F
Flatter     C             D             E
1.4.42 a. That would be considered cheating. The hypotheses are always statements about the parameter(s) of interest and are stated before we begin to explore the data. We use the data to test our hypotheses, not to form them.
1.4.35
b. Two-sided tests always yield larger p-values than one-sided tests, making it harder to find strong evidence in support of your research conjecture.
a. A: p-value = 0.3036, B: p-value = 0.0047, C: p-value = 0.9616, D: p-value = 0.1400
Section 1.5
b. Looking at the set of p-values suggests there is little evidence that taller candidates are more likely to win. In particular, looking at all presidential elections since 1796 yields a p-value of 0.1400. Looking at an arbitrary subset of presidential elections (Exercise 1.4.34) suggested a potentially significant result, but looking at more data suggested otherwise. 1.4.36 D = Sample size is n = 25, alternative hypothesis is right-sided: π > ½. A = Sample size is n = 225, alternative hypothesis is right-sided: π > ½. C = Sample size is n = 25, alternative hypothesis is left-sided: π < ½. B = Sample size is n = 225, alternative hypothesis is left-sided: π < ½. E = Sample size is n = 25, alternative hypothesis is two-sided: π ≠ ½. F = Sample size is n = 225, alternative hypothesis is two-sided: π ≠ ½. 1.4.37 a. Increasing: B,C b. Decreasing: A,D c. Up-down: E,F 1.4.38 a. 100% b. 50% c. 0% 1.4.39 a. Smaller, decreasing, A,D b. Larger, increasing, B,C c. Up-down, E,F
1.5.1 C. 1.5.2 No, the maximum standard deviation is always obtained at 0.5, regardless of sample size. 1.5.3 D. 1.5.4 C. 1.5.5 D. 1.5.6 A. 1.5.7 C. 1.5.8 The predicted value of the standard deviation of a null distribution of sample proportions. The prediction is accurate when the sample size is large. 1.5.9 The value of the standardized statistic, z. 1.5.10 a. Decrease b. 0.50 c. 0 or 1 1.5.11 The standardized statistic; a measurement of how many standard deviations from the mean the observed statistic is on the null distribution 1.5.12 a. The p-value from option 1 is more valid. b. The validity conditions are not met for this test because the light was green only 4 times (which is less than 10). We can also see this is a problem in the applet because the normal overlay does not match up nicely with the skewed null distribution.
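Several of the Section 1.5 answers rest on two checks: the theory-based prediction of the null standard deviation, √(π(1 − π)/n), and the validity condition of at least 10 successes and 10 failures. The following sketch shows both under the assumption that plain Python is acceptable; the function names `null_sd` and `validity_conditions_met` are illustrative, not part of any applet.

```python
from math import sqrt


def null_sd(pi, n):
    """Theory-based SD of the null distribution of sample proportions."""
    return sqrt(pi * (1 - pi) / n)


def validity_conditions_met(successes, failures, minimum=10):
    """True when the sample has at least `minimum` successes and failures."""
    return successes >= minimum and failures >= minimum


# Values that appear elsewhere in these solutions.
print(round(null_sd(0.15, 361), 3))       # SD for pi = 0.15, n = 361
print(validity_conditions_met(8, 2))      # 8 successes, 2 failures
print(validity_conditions_met(106, 223))  # 106 successes, 223 failures

# For a fixed n, the SD is largest at pi = 0.5 (compare 1.5.2).
print(null_sd(0.5, 100) > null_sd(0.3, 100))
```

When the validity check fails, the solutions fall back on the simulation-based p-value rather than the normal approximation.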
1.5.13 a. Null: The long-run proportion of times that a penny lands heads when spun is 0.20. Alternative: The long-run proportion of times that a penny lands heads when spun is greater than 0.20. b. The simulation-based p-value of 0.077, because the validity conditions are not met. There are not at least 10 times where the penny landed heads and at least 10 times where the penny landed tails in the sample. c. No, we do not have strong evidence that a penny will land heads more than 20% of the time in the long run, because the p-value is only 0.077 (but there is moderate evidence). 1.5.14 a. The symbol π represents the long-run proportion of times of the coin landing heads up. b. Null: π = 0.50. Alternative: π > 0.50 c. If the result was heads 52% of the time out of 1000, then 520 must have been heads and 480 tails. Both of these are greater than 10. d. A standardized statistic of 1.26 means that our observed proportion of 0.52 is 1.26 standard deviations above 0.50 in the null distribution. e. We have little to no evidence that it is more likely for a coin to land the same side up as it started than not. 1.5.15 a. 0.152; this is very close to the hypothesized parameter value of 0.15. b. No, the validity conditions are not met. There are not at least 10 successes and 10 failures in the data (only 8 and 2). 1.5.16 a. 0.15; this is the hypothesized parameter value of 0.15
b. 0.019; √(π(1 − π)/n) = √(0.15(1 − 0.15)/361) = 0.019 c. Yes d. Because the validity conditions are met for this dataset (larger sample size) 1.5.17 a. The long-run proportion of times that a person identifies the correct image b. Null: The long-run proportion of times that a person identifies the correct image equals 0.25. Alternative: The long-run proportion of times that a person identifies the correct image is greater than 0.25. c. Approximately 0.25 will identify the correct image if they have no psychic ability. This is the null hypothesis. d. See graph below. [Figure: simulated null distribution of the proportion of successes; mean = 0.248, SD = 0.023]
e. The p-value is approximately 0.002 because we only got a result 0.322 or larger 2 out of 1000 times by chance. Thus, there is strong evidence that the long-run proportion of correct guesses is larger than 0.25.
f. Using the Theory-based inference applet (which uses the normal approximation for the null distribution) yields a standardized statistic of 3.02 and p-value of 0.0012, again showing strong evidence that the proportion of correct guesses is larger than 0.25. It is not surprising that the two approaches give similar results as the sample size is large (in particular, there are 106 successful guesses and 223 unsuccessful guesses; both values are much larger than 10). 1.5.18 a. 32/97 = 0.330 (Morris et al.) vs. 106/329 = 0.322 (Bem and Honorton). The proportions are about the same (Morris is just slightly more). b. The p-value for Morris will be larger because the sample size is smaller. c. The simulation p-value is approximately 0.048; this is strong evidence that the Ganzfeld receivers’ choices are better than just chance. d. The p-value = 0.0346. It is similar to what we got from simulation, which is not surprising because the validity conditions are met (32 successes and 65 failures). e. They are more similar in the previous study because the sample size is larger and, even though the validity conditions are met here, the p-values will continue to get closer and closer together as the sample size increases. 1.5.19 a. Null: The long-run proportion of times that the male says “I love you” first is 0.50. Alternative: The long-run proportion of times that the male says “I love you” first is more than 0.50. b. 0.0057 c. We have very strong evidence that the long-run proportion of times that the male says “I love you” first is more than 0.50. d. z = 2.53. The observed proportion of males that say “I love you” first is 2.53 standard deviations above the mean of the null distribution. e. 0.0057 × 2 = 0.0114 1.5.20 a. Null: The long-run proportion of rhesus monkeys that choose the correct box is 0.50. Alternative: The long-run proportion of rhesus monkeys that choose the correct box is more than 0.50. Null: π = 0.50 Alternative: π > 0.50 b. 0.0003 c. We have very strong evidence that the long-run proportion of rhesus monkeys that choose the correct box is more than 0.50. d. z = 3.48. The observed proportion of rhesus monkeys that chose the correct box is 3.48 standard deviations above the mean of the null distribution. e. 0.0003 × 2 = 0.0006 1.5.21 a. Null: The long-run proportion of times that a player starts with scissors is 0.333. Alternative: The long-run proportion of times that a player starts with scissors is different than 0.333. Null: π = 0.333 Alternative: π ≠ 0.333 b. p-value = 0
c. We have very strong evidence that the long-run proportion of times that a player starts with scissors is different than 0.333.
1.5.27
1.5.22
b. Null: The long-run proportion of times a six is rolled is 0.167.
a. Null: The long-run proportion of times that a player starts with rock is 0.333. Alternative: The long-run proportion of times that a player starts with rock is different than 0.333. Null: π = 0.333 Alternative: π ≠ 0.333 b. p-value = 0 c. We have very strong evidence that the long-run proportion of times that a player starts with rock is different than 0.333. 1.5.23 a. Null: The long-run proportion of times that people assign the name Tim to the face on the left is 0.50. Alternative: The long-run proportion of times that people assign the name Tim to the face on the left is more than 0.50. b. 0.0721 c. We have moderate evidence that the long-run proportion of times that people assign the name Tim to the face on the left is more than 0.50. d. z = 1.46. The proportion of the sample that chose Tim is 1.46 standard deviations above the mean of the null distribution. 1.5.24 a. Null: The long-run proportion of times that the most competent-looking candidate wins is 0.50. Alternative: The long-run proportion of times that the most competent-looking candidate wins is more than 0.50. b. n = 279, p̂ = 0.677 c. 0 d. We have very strong evidence that the most competent-looking candidate wins more than 50% of the time. 1.5.25 a. Null: The long-run proportion of matches that the red uniform wins is 0.50. Alternative: The long-run proportion of matches that the red uniform wins is not 0.50. b. 0.0681 c. We have moderate evidence that the long-run proportion of matches the red uniform wins is not 0.50. d. z = 1.82. The proportion of the sample in which red won is 1.82 standard deviations above the mean of the null distribution. 1.5.26 a. Null: The long-run proportion of times that the red uniform wins a boxing match is 0.50. Alternative: The long-run proportion of times that the red uniform wins a boxing match is not 0.50. b. 0.0896 c. We have moderate evidence that the proportion of times the red uniform wins a boxing match is not 0.50. d. z = 1.70. The proportion of the sample in which red won a boxing match is 1.70 standard deviations above the mean of the null distribution.
a. The long-run proportion of times a six is rolled. Alternative: The long-run proportion of times a six is rolled is more than 0.167. c. 0.1497 d. We have little to no evidence that the long-run proportion of times a six is rolled is more than 0.167. 1.5.28 a. The long-run proportion of times a one is rolled. b. Null: The long-run proportion of times a one is rolled is 0.167. Alternative: The long-run proportion of times a one is rolled is less than 0.167. c. 0.0840 d. We have moderate evidence that the long-run proportion of times a one is rolled is less than 0.167. 1.5.29 a. The long-run proportion of times that Mario wins b. Null: The long-run proportion of times Mario wins is 0.50. Alternative: The long-run proportion of times Mario wins is not 0.50. c. p-value = 0.2733 d. We have little to no evidence that the long-run proportion of times Mario wins is not 0.50. e. 100 games gives z = 2 (p = 0.0455). So, if out of 100 games Mario wins 60, this would be strong evidence that Mario’s long-run proportion of times he wins is not 0.50. 1.5.30 a. Null, H0: Students randomly choose a number between 1 and 10, π = 0.50 versus Alternative, Ha: Students have a genuine preference for numbers greater than 5, π > 0.50, where π represents the long-run proportion of students that choose a number greater than 5, when asked to choose a number between 1 and 10. b. 0.61 c. Yes, more than 10 successes (62) and 10 failures (39) d. 0.011 e. We have strong evidence (based on the small p-value = 0.011) that students have a genuine preference for numbers greater than 5. 1.5.31 a. Null, H0: Cardiac arrests are just as likely on Mondays as on either Wednesdays or Fridays, π = 0.33 versus Alternative, Ha: Cardiac arrests are more likely on Mondays than on either Wednesdays or Fridays, π > 0.33, where π represents the long-run proportion of cardiac arrests among dialysis patients that happen on Mondays. b. 0.454 c. Yes, more than 10 successes (93) and 10 failures (112) d. < 0.001 e. 0.0001 f. Very strong evidence that more than a third of cardiac arrests among dialysis patients happen on Mondays (compared to Wednesdays or Fridays) 1.5.32 a. Null, H0: Cardiac arrests are just as likely on Tuesdays as on either Thursdays or Saturdays, π = 0.33 versus Alternative, Ha: Cardiac
arrests are more likely on Tuesdays than on either Thursdays or Saturdays, π > 0.33, where π represents the long-run proportion of cardiac arrests among dialysis patients that happen on Tuesdays.
b. 0.342
[Figure: histogram of the simulated number of successes; mean = 2.44, SD = 1.37]
c. Yes, more than 10 successes (65) and 10 failures (125) d. 0.384 e. 0.3613 f. No evidence that more than a third of cardiac arrests among dialysis patients happen on Tuesdays (compared to Thursdays or Saturdays) 1.5.33 a. Null, H0: The probability a spun coin will land on heads is 0.50, π = 0.50 versus Alternative, Ha: The probability a spun coin will land on heads is not 0.50, π ≠ 0.50, where π represents the probability the spun coin will land on heads. b. The sample statistic symbol is p̂. Its value will vary.
c. Answers will vary. d. Answers will vary. e. Answers will vary. 1.5.34 a. 0.05525 b. 0.0289 c. The simulation-based p-value is more valid because the validity conditions for the normal approximation are not met here.
d. The p-value is the probability of observing 7 or more successes out of 12 attempts when each attempt has a 20% chance of being correct. e. The theory-based approach is not appropriate here because the resulting simulated distribution of statistics is not normal. This is because the sample size is not large enough. In particular, there are only 7 successes and 5 failures, instead of at least 10 of each. 1.CE.4 Step 1: Ask a research question Jamie and Adam wanted to investigate which side buttered toast prefers to land on when it falls through the air.
d. 0.0547, closer to the simulation-based p-value
Step 2: Design a study and collect data
1.5.35
They set up a specially designed rig and dropped 48 pieces of toast from the roof of the Mythbusters’ headquarters. They wish to test the following null and alternative hypotheses:
a. Simulation-based p-value = 0.168, normal-approx.-based p-value = 0.103, exact binomial p-value = 0.1719; the simulation-based and exact binomial p-values are close to each other. b. Simulation-based p-value = 0.378, normal-approx.-based p-value = 0.2635, exact binomial p-value = 0.377; the simulation-based and exact binomial p-values are close to each other.
End of Chapter 1 Exercises
Null: There is no preference for which side the buttered toast lands on; both sides are equally likely (have a 50% chance of landing face down). Alternative: One of the sides tends to land face down more than the other.
1.CE.1 Null
They recorded which side landed down (buttered or not buttered side) for each of the 48 attempts.
1.CE.2 Probability value in the null hypothesis
Step 3: Explore the data
1.CE.3
In 19 out of 48 attempts the buttered side landed down.
a. Null: The probability the statistics professor wins is 0.20, Alternative: The probability the statistics professor wins is larger than 0.20.
Step 4: Draw inferences
b. Start with 5 playing cards: one red and four black. Shuffle the cards. Randomly choose one card, record if it is red or not, and then place the card back in the deck. Shuffle and randomly choose cards until 12 cards have been selected. Record the number of red cards selected out of the 12 selections. Repeat this entire process 999 more times to generate a distribution of counts of red cards. If 7 or more red cards out of 12 rarely happened in the 1,000 repetitions, then this would be convincing evidence that 7 out of 12 was unlikely to have occurred by chance. c. The observed data (7 wins out of 12 attempts) provide convincing evidence that the statistics professor’s probability of winning in one week was larger than would be expected if the 5 competitors were equally likely to win, because 7 out of 12 rarely happens by chance when everyone is equally likely to win; in particular, the p-value is 0.005.
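The card-drawing procedure described in part (b) can be mimicked in software. The sketch below is one possible translation, assuming Python's `random` module stands in for shuffling and drawing with replacement; the function name `simulate_red_counts`, the seed, and the deck encoding are all illustrative choices rather than anything prescribed by the text.

```python
import random


def simulate_red_counts(reps=1000, draws=12, seed=1):
    """Repeat the 12-draw experiment `reps` times; return red-card counts."""
    rng = random.Random(seed)  # fixed seed only so reruns give the same result
    deck = ["red", "black", "black", "black", "black"]  # one red, four black
    counts = []
    for _ in range(reps):
        # Drawing with replacement: each draw is red with probability 1/5.
        reds = sum(rng.choice(deck) == "red" for _ in range(draws))
        counts.append(reds)
    return counts


counts = simulate_red_counts()
# Approximate p-value: share of repetitions with 7 or more red cards.
p_value = sum(c >= 7 for c in counts) / len(counts)
print(f"approximate p-value: {p_value:.3f}")
```

Different seeds or more repetitions will give slightly different approximate p-values, which is expected of any simulation-based result.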
Statistic: 19/48 = 0.396 Simulation: Used applet to generate a distribution (using probability = 0.5, sample size = n = 48, number of samples = 1000), to generate a two-sided p-value of 0.19 Strength of evidence: We do not have strong evidence that one side of buttered toast tends to fall face down more often than the other. Step 5: Formulate conclusions We don’t know if the results can be generalized to other situations (different bread? inside vs. outside? device used to drop bread?) Step 6: Look back and ahead While no evidence was found that one side falls to the ground more than the other, further studies are needed to ensure that the results apply to all bread, inside, and when a person drops it instead of a machine.
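The two-sided p-value in Step 4 came from the applet's simulation. As a cross-check, an exact binomial version can be sketched by adding the null probabilities of every count at least as far from the expected 24 as the observed 19. The helper name `two_sided_binomial_p_value` is hypothetical, and counting "at least as far from the expected count" is only one of several conventions for two-sided exact tests; with π = 0.5 the distribution is symmetric, so the conventions agree.

```python
from math import comb


def two_sided_binomial_p_value(n, observed, pi=0.5):
    """Sum null probabilities of counts at least as far from n*pi as observed."""
    expected = n * pi
    distance = abs(observed - expected)
    return sum(
        comb(n, x) * pi**x * (1 - pi) ** (n - x)
        for x in range(n + 1)
        if abs(x - expected) >= distance
    )


# 19 buttered-side-down landings in 48 drops, null probability 0.5.
p_value = two_sided_binomial_p_value(48, 19)
print(f"exact two-sided p-value: {p_value:.3f}")
```

The result should land close to the simulated value reported above, supporting the conclusion that the evidence is not strong.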
1.CE.5
π = 0.50
a. Even though 34.6% (the percentage of players suspended for PED use who were from the United States) is less than the percentage of all baseball players born in the United States (57.3%), it’s possible that this could have happened by chance (just like it’s possible to flip a coin and get heads 8 times out of 8); the question is how likely it is that this (34.6%) would happen just by chance. If it is quite unlikely then we say that the result is statistically significant, meaning that there is something about U.S. baseball players that makes them less likely to be suspended for PED use.
π < 0.50
b. Having only 34.6% of 595 suspended baseball players be from the United States, when the percentage of all baseball players who are from the United States is 57.3%, is extremely unlikely; in other words, 34.6% is (statistically) significantly less than 57.3%. c. z = –11.19. This tells us that the observed proportion (0.346) is 11.19 SDs less than the mean of the null distribution, confirming that we have extremely strong evidence that the null hypothesized value of the parameter is incorrect. 1.CE.6 a. Even though 61.8% (the percentage of players suspended for PED use who were from Latin America) is more than the percentage of all baseball players born in Latin America (34.6%), it’s possible that this could have happened by chance (just like it’s possible to flip a coin and get heads 8 times out of 8); the question is how likely it is that this (61.8%) would happen just by chance. If it is quite unlikely then we say that the result is statistically significant, meaning that there is something about Latin American baseball players that makes them more likely to be suspended for PED use. b. Having 61.8% of 595 suspended baseball players be from Latin America, when the percentage of all baseball players who are from Latin America is 34.6%, is extremely unlikely; in other words, 61.8% is (statistically) significantly more than 34.6%. c. z = 13.97. This tells us that the observed percentage (61.8%) is 13.97 SDs above the mean of the null distribution, confirming that we have extremely strong evidence that the null hypothesized value of the parameter is incorrect.
b. The p-value is approximately 0, with a standardized statistic of –6.45. This is extremely strong evidence that the long-run proportion of times that New Zealand students associate the name Bob with the face on the left is less than 0.50. c. Yes, because there are at least 10 successes (30 Bob on left) and at least 10 failures (105 Tim on left). d. p-value = 0; z = –6.45; This is extremely strong evidence that the long-run proportion of times that New Zealand students associate the name Bob with the face on the left is less than 0.50. 1.CE.9 Not necessarily. A larger sample size yields a smaller p-value if the value of the statistic is the same; there is no guarantee the value of the statistic (proportion of heads) will be the same in Jose and Roberto’s separate samples. 1.CE.10 A is the only correct answer. 1.CE.11 Roll a die. If it comes up 1 or 2 then call it a “success,” otherwise “failure.” Repeat the process 50 times keeping track of the total number of successes out of 50. Then, repeat sets of 50 rolls keeping track of the number of successes within the 50 rolls. 1.CE.12 Flip two coins. If they both come up heads call it a “success,” otherwise “failure.” Repeat the process 25 times keeping track of the total number of successes out of 25. Then, repeat sets of 25 pairs of flips keeping track of the number of successes within the 25 paired flips. 1.CE.13 a. The long-run proportion of times that Rick makes a free throw underhanded b. Null: The long-run proportion of times that Rick makes a free throw underhanded is 0.90. Alternative: The long-run proportion of times that Rick makes a free throw underhanded is more than 0.90. Null: π = 0.90 Alternative: π > 0.90
1.CE.7
1.CE.14
a. Null: The long-run proportion of times that New Zealand students associate the name Tim with the face on the left is 0.50.
a. The long-run proportion of times that Lorena makes a 10-foot putt
Alternative: The long-run proportion of times that New Zealand students associate the name Tim with the face on the left is more than 0.50.
b. Null: The long-run proportion of times that Lorena makes a 10-foot putt is 0.60.
Null: π = 0.50
Alternative: The long-run proportion of times that Lorena makes a 10-foot putt is more than 0.60.
Alternative: π > 0.50
Null: π = 0.60
b. The p-value is approximately 0, with a standardized statistic of 6.45. This is extremely strong evidence that the long-run proportion of times that New Zealand students associate the name Tim with the face on the left is more than 0.50.
Alternative: π > 0.60
c. Yes, because there are at least 10 successes (105 Tim on left) and at least 10 failures (30 Bob on left). d. p-value = 0; z = 6.45; This is extremely strong evidence that the long-run proportion of times that New Zealand students associate the name Tim with the face on the left is more than 0.50. 1.CE.8 a. Null: The long-run proportion of times that New Zealand students associate the name Bob with the face on the left is 0.50. Alternative: The long-run proportion of times that New Zealand students associate the name Bob with the face on the left is less than 0.50.
1.CE.15 a. The long-run proportion of times someone chooses an odd number b. Null: The long-run proportion of times that someone chooses an odd number is 0.50. Alternative: The long-run proportion of times that someone chooses an odd number is more than 0.50. c. 1029/1770 = 0.581 d. Yes, the validity conditions are met. There are 1029 successes and 741 failures, both well above the minimum of 10. e. 6.85 f. p-value = 0 g. We have very strong evidence that the long-run proportion of times someone chooses an odd number is more than 0.50.
1.CE.16 a. The long-run proportion of times someone chooses 7 b. Null: The long-run proportion of times that someone chooses 7 is 0.10. Alternative: The long-run proportion of times that someone chooses 7 is more than 0.10. c. 503/1770 = 0.284 d. Yes, the validity conditions are met. There are 503 successes and 1267 failures, both well above the minimum of 10. e. 25.83 f. p-value ≈ 0 g. We have very strong evidence that the long-run proportion of times someone chooses 7 is more than 0.10. 1.CE.17 a. We don’t have strong evidence that the probability a spun tennis racket lands with the label up is different from 0.50. b. If the probability that a spun tennis racket lands with the label up is actually 0.50, then it is quite likely to get 46 spins out of 100 with the label up; thus, a reasonable (plausible) explanation for the author’s data (46 out of 100) is that the tennis racket is fair (0.50 chance of label landing up). 1.CE.18 Null hypothesis probability = 0.50 Statistic = 0.46 p-value = 0.484 1.CE.19 Answers will vary. Mean = 0.243, SD = 0.081
4. Null: The probability that students choose the right front tire is 0.25 (π = 0.25). Alternative: The probability that students choose the right front tire is more than 0.25 (π > 0.25). 5. In this sample, 14 out of 28 or 50% of the students selected the right front tire. This is more than we would expect if students choose randomly and equally among the four tires (25% of the time in the long run). 6. Yes, anything is possible, although some outcomes will be less likely or believable when the null hypothesis is true. 7. Our statistic is p̂ = 0.50.
8. Probability of success (π) = 0.25, sample size (n) = 28, number of samples = 1,000 9. The center is located at 0.25. Yes, it makes sense that this is the center because 0.25 is the specified null hypothesis proportion. 10. a. The p-value from the simulation should be around 0.004. In other words, in a large number of samples (from a process with π = 0.25), roughly 0.4% of those samples should have a sample proportion of 0.50 or larger. b. The standard deviation of the simulated sample proportions should be around 0.082, so the standardized statistic is approximately z = (0.50 – 0.25)/0.082 = 3.05. (This number may vary a bit because the SD of the simulated null distribution will vary a bit.) Therefore, a sample proportion of 0.50 is more than 3 standard deviations above the hypothesized process probability of 0.25. c. The One Proportion applet reports a theory-based p-value of 0.0011. The validity conditions are met because there are 14 “successes” (choose right front) and 14 “failures” (choose something else), both of which exceed 10. 11. Yes, all three methods in question 10 give strong evidence against the null hypothesis; the p-values are quite small and the standardized statistic is large (e.g., above 2).
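The three quantities in question 10 can be approximated without the applet. The sketch below is not the applet's method: instead of 1,000 simulated samples it computes the exact binomial tail probability (which a long-run simulation should approach), and it uses the normal approximation for the theory-based p-value. The function names are hypothetical.

```python
import math

def binomial_upper_tail(n, pi, k):
    """Exact P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(math.comb(n, x) * pi**x * (1 - pi)**(n - x)
               for x in range(k, n + 1))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, pi, k = 28, 0.25, 14                  # 14 of 28 students chose right front
p_hat = k / n
sd_null = math.sqrt(pi * (1 - pi) / n)   # SD of sample proportions under the null
z = (p_hat - pi) / sd_null               # standardized statistic
exact_p = binomial_upper_tail(n, pi, k)  # stands in for the simulation p-value
theory_p = normal_upper_tail(z)          # theory-based (normal) p-value

print(f"SD = {sd_null:.3f}, z = {z:.2f}")
print(f"exact binomial p-value = {exact_p:.4f}, theory-based p-value = {theory_p:.4f}")
```

The exact and theory-based p-values differ noticeably here, which fits the later remark that n = 28 is still a fairly small sample.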
12. We have strong evidence that students pick the right front tire more than 25% of the time because we would rarely have 50% of a sample of 28 students choose the right front tire if the long-run proportion of students choosing the right front tire is 0.25.
[Figure: distribution of simulated sample proportions; horizontal axis: Proportion of successes.]
1.CE.20 It would be more surprising for the sample proportion to be 0.70 in a sample of 60 compared to a sample of 30 students, if the population proportion is 0.60. This is because it is more unusual to see extreme results in a larger sample (p-value = 0.0719) rather than a small sample (p-value = 0.1763).
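The comparison in 1.CE.20 can be reproduced with an exact binomial calculation. This is a sketch under an assumption: the solution does not say how its two p-values were obtained, so the code simply computes P(p̂ ≥ 0.70) when π = 0.60 for each sample size (21 of 30 and 42 of 60). The helper name is hypothetical.

```python
import math

def binomial_upper_tail(n, pi, k):
    """Exact P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(math.comb(n, x) * pi**x * (1 - pi)**(n - x)
               for x in range(k, n + 1))

# A sample proportion of 0.70 means 21 of 30 students or 42 of 60 students.
p30 = binomial_upper_tail(30, 0.60, 21)
p60 = binomial_upper_tail(60, 0.60, 42)

print(f"n = 30: p-value = {p30:.4f}")
print(f"n = 60: p-value = {p60:.4f}")
```

The larger sample gives the smaller p-value, which is the point of the solution: the same sample proportion is more surprising when n is larger.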
Chapter 1 Investigation 1. The observational units are the 28 students. 2. The variable recorded is which tire each student indicates (right front, right rear, left front, left rear). This is a categorical, non-binary variable. We could also define the variable to be “right front” or “not right front” as that is the primary outcome of interest in our research question. 3. The parameter is the long-term proportion of students who will pick the right front tire.
13. Hard to say. The question is: “Who are these students?” Will they perform similarly to other people in the same situation? It’s hard to say that these students will necessarily act like people in general. We can probably say that we can infer these results to people similar to those that were in the study. 14. Answers will vary. Some things to consider include selecting a broader representation of students to participate in the study, examining exactly how the question is posed to students and whether that impacts their choices, considering whether there might be gender differences or a tendency for different responses among individuals who have recently had a flat tire, and whether this tendency is similar across cultures (including countries where motorists drive on the other side of the road). 15. We would expect to find weaker evidence because the sample size is smaller and the statistic (0.50) has stayed the same. 16. The p-value should be around 0.04, which is larger than we got before and hence weaker evidence, as expected. 17. Null: The probability that students choose the right front tire is 0.25 (π = 0.25). Alternative: The probability that students choose the right front tire is different than 0.25 (π ≠ 0.25). 18. The p-value is 0.0023 for the two-sided test, about twice as large as before.
19. Yes, we have strong evidence the probability is different than one-fourth because the p-value of 0.0023 is still small enough to be considered strong evidence against the null hypothesis. It is interesting to note here that even though we pass the technical conditions, the sample size of 28 is still pretty small and the theory-based method is not in close agreement with the simulation approach.
Chapter 1 Research Article 1. The researchers are examining the nature and development of attitudes toward similar and dissimilar others in human infancy. 2. (two options, among others) (a) Dissimilar others are perceived as unkind, untrustworthy, and unintelligent (Brewer 1979, etc.) (b) Humans may engage in, support, or ignore violence directed towards individuals who differ from themselves (Prentice and Miller, 1999) 3. 16 4. To figure out what kind of food the babies preferred (graham crackers or green beans) so that information could be used later in the study 5. Food preference: green beans or graham crackers, categorical with 2 outcomes (green beans or graham crackers) 6. To have babies establish which rabbit is similar to them, and which is dissimilar 7. No 8. The researchers needed to make sure that the babies understood what they were seeing (one puppy being nice to the rabbit, and one puppy being mean to the rabbit). 9. Puppy preference: Harmful or Helpful 10. 12/16 chose helper when viewing activities involving the rabbit similar to them, compared to 4/16 who chose the harmful puppy when viewing activities involving the rabbit similar to them. 11. 100% similar chose helper; 0% dissimilar chose helper 12. Fifty-three percent is fairly close to 50% (the null hypothesis) and the sample size (36) is not large.
13. To modify the experiment so that babies have a “neutral” option to provide strong comparisons between groups
14. Null hypothesis: The long-run proportion of times an infant chooses the harmer dog is 0.50 when the dog interacts with the dissimilar rabbit; Alternative hypothesis: The long-run proportion of times an infant chooses the harmer dog is not 0.50 when the dog interacts with the dissimilar puppet. 15. Answers will vary. Sixty-three percent of 14-month-olds in the study chose graham crackers over green beans when given a choice between the two. 16. Answers will vary. Null hypothesis: The long-run proportion of times that a 14-month-old chooses the helper character instead of the neutral character is 0.50 when the dog interacts with the similar rabbit; Alternative hypothesis: The long-run proportion of times that a 14-month-old chooses the helper character instead of the neutral character is not 0.50 when the dog interacts with the similar rabbit. The p-value is 0.08, meaning that there is moderate evidence that the long-run proportion of times that a 14-month-old chooses the helper character instead of the neutral character is not 0.50. 17. If the infants in the study are special in some way (e.g., particularly developmentally advanced or not; different ethnicity, socio-economic status, etc.) then the results from this study may not generalize to all infants. 18. Answers will vary. (a) Select babies to represent different ethnicities/SES in order to improve the ability to generalize the results. (b) Give babies different foods to choose from initially to ensure that there is no impact of green beans/graham crackers in particular. 19. If babies generally liked the graham cracker rabbit/dog and didn’t like the green bean rabbit/dog, then the researchers’ conclusions about similarity/dissimilarity would be invalidated, because the differences between the similar and dissimilar conditions would be better explained by a different variable (green beans/graham crackers). 20. The researchers refer to prior research that links adult and child similarity preferences and group psychology, which, they argue, suggests that their results are more inborn (nature) rather than the result of accumulated experiences (nurture).
CHAPTER 2
Generalization: How Broadly Do the Results Apply? Section 2.1
2.1.1 B. 2.1.2 A. 2.1.3 A. 2.1.4 C. 2.1.5 C. 2.1.6 C. 2.1.7 D. 2.1.8 D. 2.1.9 A. 2.1.10 A. 2.1.11 B.
2.1.12 A. False B. True C. True
2.1.13 A. 2.1.14 B.
2.1.15 a. F. b. C. c. A.
2.1.16 a. SD = √(0.41(0.59)/30) = 0.090 b. The SD should be about 0.085, which is a bit smaller than that from part (a). It is different because the population is less than 10 times the sample size. It should be at least 20 times the sample size. c. The SD should be about 0.090, which is what we got in part (a). The larger population size helped.
2.1.17 The graph of the most recent sample represents whether or not a word was short, a categorical variable. However, in the graph of the proportions, the horizontal axis represents the proportion of short words in a sample, a quantitative variable.
2.1.18 a. Mean = 0.25, and SD = 0.068 b. Mean = 0.25, and SD = 0.022
2.1.19 a. Mean = 0.20, SD = √(0.20(0.80)/1,414) = 0.0106 b. (271/1,414 − 0.200)/0.0106 = −0.7873. So the sample proportion is 0.7873 standard deviations below the mean. c. It is not very unlikely to get a sample proportion of 271/1,414 because it is within 2 standard deviations of the mean.
2.1.20 Using a phone call as the method of asking this question is probably a biased method. Those answering a person on a phone call were much less likely to say that they exercise less than once per week. Having an interaction with a person probably makes some people not give the socially undesirable answer.
2.1.21 a. Likely representative, because the distribution of blood type is probably not different among those that eat at the cafeteria compared to that of the U.S. population. b. Likely not representative because those in the cafeteria may eat most of their meals in the cafeteria and not regularly eat fast food c. Perhaps representative because the proportion with brown hair is probably not too different among those that eat at the cafeteria compared to that of all the students at the school. d. This may not be representative. Those in the cafeteria (as well as those at the school) could differ quite a bit racially from the U.S. population and thus would differ in the proportion that has brown hair.
2.1.22 Although some of these could be argued the other way, all of the samples would likely not be representative. a. If you have food that finches like and other birds do not, you would overestimate the proportion of finches. b. You could have food that finches do not like, and you would rarely see more than one eating at a time. c. The proportion of male birds could be species-dependent, and the type of food you have could affect the type of species and hence affect the proportion of male birds that come to your feeder.
d. This proportion could then be different than the proportion of males in your area as well as those that typically visit feeders.
2.1.23 a. The population is all the likely voters in the city. b. Because it is a random sample, I would think the proportion that favor the incumbent in the sample is similar to that for the population.
2.1.24 a. The proportion of all city voters that plan to vote for the incumbent b. The proportion of those in the sample that plan to vote for the incumbent (0.65).
2.1.25 a. The variable is who they plan to vote for or whether or not they plan to vote for the incumbent. b. Categorical c. Proportion d. Bar graph
2.1.26 a. Null: 50% of all the city voters plan to vote for the incumbent. Alternative: A majority of all the city voters plan to vote for the incumbent. b. The proportion of all city voters that plan to vote for the incumbent, 0.65.
2.1.27 a. p-value ≈ 0 b. We have strong evidence that a majority of all the city voters plan to vote for the incumbent. c. We can infer these results to all likely city voters because they came from a random sample. d. Using theory-based methods is appropriate and we also obtain a p-value of approximately 0.
2.1.28 a. The population is all adults in the United States b. It is perhaps greater than the population. People were allowed to self-select themselves to be part of the sample. This method will often overestimate the population proportion because people that really care about the issue will be the ones to respond.
2.1.29 a. The proportion of U.S. adults that are unhappy with the verdict. b. The proportion of the sample (0.82) that are unhappy with the verdict.
2.1.30 a. Whether or not someone is unhappy with the verdict b. Categorical c. Proportion d. Bar graph
2.1.31 a. Null: The proportion of U.S. adults that are unhappy with the verdict is 0.75. Alternative: The proportion of U.S. adults that are unhappy with the verdict is greater than 0.75. b. 0.82
2.1.32 a. p-value ≈ 0 b. We have very strong evidence that the proportion of all U.S. adults that are opposed to the verdict is greater than 0.75. c. Not comfortable generalizing to all U.S. adults. Instead, comfortable generalizing to a population like the one that participated in the survey: watchers of the TV news program who were motivated enough to participate. d. Theory-based is appropriate because there are at least 10 successes and 10 failures in the data; z = 3.83 and p-value = 0.0001.
2.1.33 a. The population is all the sharks at the zoo. b. The proportion should be similar to the population because it came from a random sample.
2.1.34 a. The parameter is the proportion of sharks in the zoo that have the disease. b. The statistic is the proportion of sharks in the sample (0.20) that have the disease.
2.1.35 a. The variable measured is whether or not they have the disease. b. Categorical c. Proportion d. Bar graph
2.1.36 a. Null: The proportion of all sharks at the zoo that have the disease is 0.25. Alternative: The proportion of all sharks at the zoo that have the disease is less than 0.25. b. 3/15 = 0.20
2.1.37 a. p-value = 0.46 b. We do not have any evidence that the proportion of sharks in the zoo that have the disease is less than 0.25. c. The sharks at the zoo d. A theory-based approach using the normal distribution is not reasonable to use because there were only three sharks with the disease. We need at least 10.
2.1.38 a. All the customers of the store b. The 100 people asked to fill out the survey c. The proportion of all customers that visit the store because of the sale on coats d. The proportion of the sample that said they visited the store because of the sale on coats (0.40)
2.1.39 It may not be representative because it was not a random sample.
2.1.40 a. The new proportion of 1,523/2,216 = 0.687 is closer to the proportion that actually voted. b. SD = √(0.592(0.408)/2,216) = 0.0104 c. (0.687 – 0.592)/0.0104 = 9.13. Because the sample proportion is 9.13 SD above 0.592, it is significantly larger. Results like this are very unlikely to happen by chance.
2.1.41 The question is awkwardly phrased. “Does it seem possible or impossible … it never happened?” The latter part (“impossible it never happened”) involves a double negative.
2.1.42 This question is more clearly phrased; they got rid of the double negative.
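Several solutions in this section (2.1.16a, 2.1.19, 2.1.40) use the same formula for the standard deviation of a sample proportion, SD = √(π(1 − π)/n), followed by a standardized statistic. The following sketch recomputes those values; the function names are hypothetical, and the formula assumes a population much larger than the sample.

```python
import math

def sd_sample_proportion(pi, n):
    """Predicted SD of sample proportions: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

def standardized(p_hat, pi, n):
    """Number of SDs the sample proportion lies from pi."""
    return (p_hat - pi) / sd_sample_proportion(pi, n)

sd_16 = sd_sample_proportion(0.41, 30)          # 2.1.16a
sd_19 = sd_sample_proportion(0.20, 1414)        # 2.1.19a
z_19 = standardized(271 / 1414, 0.20, 1414)     # 2.1.19b
sd_40 = sd_sample_proportion(0.592, 2216)       # 2.1.40b
z_40 = standardized(1523 / 2216, 0.592, 2216)   # 2.1.40c

print(f"2.1.16a SD = {sd_16:.3f}")
print(f"2.1.19  SD = {sd_19:.4f}, z = {z_19:.3f}")
print(f"2.1.40  SD = {sd_40:.4f}, z = {z_40:.2f}")
```

Because the code carries the unrounded SD into the standardized statistic, its values can differ in the third decimal place from the hand calculations above, which divide by the rounded SD.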
2.1.43 Yes, assistance to the poor likely elicits a more favorable response toward programs than the term welfare. 2.1.44 Probably the question that did not give two options will yield a higher percentage of affirmative responses. 2.1.45 A sample of 1,500 might not have many or any members of a rare subpopulation, but that does not matter because members of a rare subpopulation are such a small part of the entire population and hence do not really affect an overall population proportion. 2.1.46 False. Increasing the sample size will not affect bias but will only affect sampling variability. 2.1.47 In the Doris and Buzz example, Dr. Bastian randomly determined whether the light would flash or shine steadily. 2.1.48 Randomness occurs in the chance model by flipping a coin to determine if Buzz would push the correct button if he was just guessing (a constant probability). In reality, Buzz may not have a constant probability of choosing the correct button. He may be learning along the way, he may get tired, or his stomach may get full of fish.
Section 2.2 2.2.1 B. 2.2.2 B. 2.2.3 A. 2.2.4 D. 2.2.5 B. 2.2.6 C. 2.2.7 B. 2.2.8 B. 2.2.9 D. 2.2.10 C. 2.2.11 A. 2.2.12 A, D. 2.2.13 E. 2.2.14 Graph (a) is a distribution of sample means from samples of size 30; we know this because it is the distribution with the smaller SD. 2.2.15 a. False b. False c. False 2.2.16 The horizontal axis of the graph of the most recent sample is the length of the words, but the horizontal axis of the graph of the statistic is the mean length of the words in a sample of 10 words. Both of these are quantitative. 2.2.17 a. Because the distribution is skewed to the left, the mean will be to the left of the median; hence, 65.86°F is the mean and 67.50°F is the median. b. Mean: Larger; Median: Larger; Standard deviation: Smaller 2.2.18 Because your score of 84 is below the median of 87, more students had exam scores higher than yours than below. 2.2.19 a. The students in the class b. The variable is the number of states visited and it is quantitative. c. The graph is centered about 8, most of the data is between 2 and 16 (and skewed to the right a bit) with outliers 25, 30, and 43.
d. 7.5 e. The mean would be larger because the distribution is skewed to the right. f. The mean would be smaller, the median would stay the same, and the standard deviation would be smaller. 2.2.20 a. The distribution is skewed to the right. b. Because the distribution is skewed to the right, we should expect the mean to be higher than the median. c. The median is $35, and the mean is $45.68. The mean is higher, as expected. d. If a $150 haircut is changed to $300 we should expect the median to stay the same (because $150 or $300 is just another larger value), but the mean should increase (because the total of all haircut costs will increase). When the change is made, we can see the median does stay the same and the mean increases to $48.68. 2.2.21 a. E. b. C. c. B. 2.2.22
a. SD = 2.119/√30 = 0.387 b. The SD is about 0.366. It is smaller than what is predicted because the population size is too small. The population size should be at least 20 times the sample size and it is less than 10 times. c. The SD is about 0.386 or 0.387, much closer to what was predicted in (a).
2.2.23 a. Mean = 10, and SD = 0.8 b. Mean = 10, and SD = 0.4
2.2.24 a. Decreases b. Increases c. 100
2.2.25 The sample size must be four times as large.
2.2.26 a. Mean = 100 points, SD = 3.35 points b. 80 c. 320
2.2.27 a. i. Mean = 8 hours, SD = 0.47 hours, and bell-shaped; ii. Yes b. i. Mean = 8 hours, SD = 0.47 hours, and slightly skewed to the right; ii. Yes c. i. Mean = 8 hours, SD = 0.47 hours, and bell-shaped; ii. Yes d. Regardless of the shape of the population distribution, the mean and SD were about 8 and 0.47, respectively. When the population had a symmetric shape, the sampling distribution had a bell shape. But when the population had a skewed distribution, the sampling distribution was also skewed.
2.2.28 a. Students at her school b. I would guess that the average number of hours students in a statistics class watch TV per day is similar to that of students in the entire school.
2.2.29 a. The parameter is the average number of hours students at the school watch TV per day. b. The statistic is the average number of hours students in the sample watch TV per day (1.2 hours).
2.2.30 a. The variable is the number of hours of TV watched per day. b. Quantitative c. Mean or median d. Dotplot
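The predicted standard deviation in Solution 2.2.22a and the sample-size claim in Solution 2.2.25 both follow from SD of the sample mean = σ/√n. A minimal sketch, with a hypothetical function name and assuming the population is much larger than the sample:

```python
import math

def sd_sample_mean(sigma, n):
    """Predicted SD of sample means: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# 2.2.22a: population SD 2.119, samples of size 30
sd_30 = sd_sample_mean(2.119, 30)
print(f"SD of sample means (n = 30): {sd_30:.3f}")

# 2.2.25: quadrupling the sample size halves the SD of the sample mean
sd_120 = sd_sample_mean(2.119, 4 * 30)
print(f"SD with n = 120: {sd_120:.3f} (half of {sd_30:.3f})")
```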
2.2.31 a. Whether a student watched more than 10 minutes of TV yesterday b. Categorical c. Proportion d. Bar graph
2.2.32 a. Null: 50% of the students at the school watched at least 10 minutes of TV yesterday. Alternative: More than 50% of the students at the school watched at least 10 minutes of TV yesterday. b. The proportion (or percentage) of all students that watched at least 10 minutes of TV yesterday c. 21/30 = 0.70
2.2.33 a. 0.0214 b. We have strong evidence that students watched more than 10 minutes of TV yesterday. c. Concerned because not a random sample of students at the school, although that is the population of interest; generalize to this population with caution. d. A theory-based approach is not reasonable because there are not at least 10 students in the sample who watched less than 10 minutes of TV yesterday.
2.2.34
Student      Hours per day    Watched TV yesterday
Alejandra    2                no
Ben          4                yes
Cassie       0.5              no
…            …                …
2.2.35 The study was not done using a random sample. To take a random sample of 30 students at the school, we first need to obtain a list of all the students at the school and randomly choose from that list. One way to do this is to assign every student a number and then have a random number generator give you 30 random numbers. The students’ names that match the numbers chosen will be your random sample.
2.2.36 a. How long people spend reading or learning about local politics b. Quantitative c. Mean or median d. Dotplot
2.2.37
Voter    Time spent reading/learning about local politics    Voting for incumbent?
#1       0                                                    Yes
#2       0                                                    Yes
#3       60                                                   No
…        …                                                    …
2.2.38 The study was done using a random sample. It could have been done by obtaining a list of all the voters in the city and then assigning every voter a number. Then have a random number generator give 267 random numbers. The voters’ names that match the numbers chosen will be the random sample.
2.2.39 a. The variable is the time spent reading or watching news coverage about the trial in the last 3 days. b. Quantitative c. Mean or median d. Dotplot
2.2.40
Respondent    Time spent reading/watching about the trial    Not happy with verdict?
#1            240                                            Not happy
#2            90                                             Not happy
#3            30                                             Happy
…             …                                              …
2.2.41 The study was not done using a random sample. Because national polls like this don’t have available lists of all U.S. adults from which to sample, typically random-digit dialing is done. Phone numbers are randomly generated. This could have been done for this poll.
2.2.42 a. The shark’s blood oxygen content b. Quantitative c. Mean or median d. Dotplot
2.2.43
Shark    Has disease?    Blood oxygen content
#1       Yes             1.2%
#2       No              5.6%
#3       No              6.2%
…        …               …
2.2.44 The study was done using a random sample. It could have been done by obtaining a list of all the sharks at the zoo and assigning each a number. Then have a random number generator give 15 random numbers. The sharks that match the numbers chosen will be the random sample.
2.2.45 a. All full-time students at the school b. The 150 students in the sample c. The average daily study time for all full-time students at the school d. The average daily study time for the students in the sample (2.23 hours)
2.2.46 Because it is a random sample it should be representative of the population.
2.2.47 a. If one coffee bar opens earlier than the other, the lines may be different at the bars when they first open (if one opens before first class starts it may have a longer line than one that opens after first class starts). b. A more representative sample of students might be observed at different times of the day and on different days of the week.
2.2.48 a. Maybe overstate, as students may tend to think they are getting more sleep than they are actually getting b. Maybe overstate, as students may tend to think they are volunteering more than they are actually volunteering c. Maybe overstate, as students may tend to think they are attending church more often than they actually are attending d. Maybe overstate, as students may tend to think they are studying more than they actually are studying e. Maybe overstate, as students may tend to think they are wearing a seat belt more often than they actually wear a seat belt
Section 2.3
2.3.1 B. 2.3.2 C. 2.3.3 D. 2.3.4 B. 2.3.5 C. 2.3.6 C.
2.3.7 a. The number of hours Cal Poly students watch TV per day; it is quantitative. b. μ = the average number of hours Cal Poly students watch TV per day c. H0: μ = 2.84 hrs. Ha: μ ≠ 2.84 hrs.
2.3.8 a. Statistic b. x̄ = 3.01 c. You would have to fabricate a large dataset to represent the population of times for all Cal Poly students with the variability similar to that of the sample data and a mean of 2.84. From that data you would take a sample of 100 and find its mean. Repeat this at least 1,000 times to develop a null distribution. To determine the p-value, determine the proportion of simulated statistics that are at least as far away from 2.84 as 3.01 is. d.
Simulation        Real study
One repetition  = A random sample of 100 students
Null model      = Population mean hours of TV watching is 2.84 hours
Statistic       = The average TV watching time in the sample
2.3.9 a. If the mean daily TV watching time for Cal Poly students is 2.84 hours, the probability we would get a sample mean as extreme as 3.01 from a random sample of 100 students is 0.16. b. Because this is a one-sided test, Dr. Elliot’s p-value should be about half that of Dr. Sameer’s.
2.3.10 a. s = 1.97 is a statistic. b. (3.01 − 2.84)/(1.97/√100) = 0.86 c. Because the standardized statistic is 0.86 (and that is less than 2) we do not have strong evidence that the average time Cal Poly students watch TV is different than 2.84 hours.
2.3.11 a. Yes, because the sample size is large b. The standardized statistic is 0.86 and the p-value is 0.3903. c. See Solution 2.3.11c. Because the p-value is greater than 0.05 we do not have strong evidence that the average time Cal Poly students watch TV is different than 2.84 hours.
[Solution 2.3.11c: Theory-based inference applet output, scenario One mean. Inputs: n = 100, sample mean x̄ = 3.01, sample SD s = 1.97; H0: μ = 2.84, Ha: μ ≠ 2.84. Null distribution: Mean = 2.84, SD = 0.197, df = 99. Standardized statistic t = 0.86, p-value = 0.3903.]
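The standardized statistic and two-sided p-value reported for 2.3.10 and 2.3.11 can be approximated in plain Python. This is a sketch, not the applet's implementation: the t-distribution tail area is obtained by integrating the t density numerically with Simpson's rule, and the function names are hypothetical.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """Standardized statistic t = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|.

    The area on [0, |t|] is computed with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return 1 - central_area

# Cal Poly TV study: n = 100, sample mean 3.01, sample SD 1.97, null mean 2.84
t = one_sample_t(3.01, 2.84, 1.97, 100)
p = t_two_sided_p(t, df=99)
print(f"t = {t:.2f}, two-sided p-value = {p:.4f}")
```

For a one-sided alternative, half of this two-sided value applies when the sample mean falls on the side the alternative predicts.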
2.3.12 The standardized statistic should be more than 2. Because a t-distribution has more area (probability) in the tail, we would have to move the standardized statistic out farther to reduce the tail probability to what you would find beyond 2 in a normal distribution.
2.3.13 a. The SPF value for sunscreen used by students at a certain school b. μ = mean SPF of sunscreens used by all students at this school. c. H0: μ = 30 versus Ha: μ > 30 d. n = 48, x̄ = 35.29, s = 17.19 e. No, it just came from students in her class f. They probably don’t differ much from the students as a whole on this issue. g. It may not be representative for students at a Midwestern college where it is very cloudy.
2.3.14 a. You would have to fabricate a large dataset to represent the population of SPF numbers of sunscreens for all students at the school with the variability similar to that of the sample data and a mean of 30. From that data you would take a sample of 48 and find its mean. Repeat this at least 1,000 times to develop a null distribution. To determine the p-value, determine the proportion of simulated statistics that are at or more than 35.29. b.
Simulation        Real study
One repetition  = A sample of 48 students
Null model      = Population mean is 30
Statistic       = Average SPF number in the sample
2.3.15 a. Yes, because the sample size is larger than 20 b. The standardized statistic: t = 2.13 and the p-value = 0.0191 (applet output is shown in Solution 2.3.15b). c. If the mean SPF for all students at the school is 30, the probability we would get a sample mean as large or larger than 35.29 from a random sample of 48 is 0.0191. d. We have strong evidence that the average SPF used by students at the school is more than 30.
[Solution 2.3.15b: Theory-based inference applet output, scenario One mean. Inputs: n = 48, sample mean x̄ = 35.29, sample SD s = 17.19; H0: μ = 30, Ha: μ > 30. Null distribution: Mean = 30.00, SD = 2.481. Standardized statistic t = 2.13, p-value = 0.0191; 95% confidence interval: (30.299, 40.281).]
2.3.16 a. The diameter of needles; it is quantitative. b. H0: μ = 1.65 mm Ha: μ ≠ 1.65 mm c. n = 35, x̄ = 1.64, s = 0.07 d. See Solution 2.3.16d. The mean should be about 1.65 and the standard deviation of the distribution should be about 0.07/√35 ≈ 0.01.
[Solution 2.3.16d: Null distribution of sample means; Mean = 1.65, SD = 0.01; horizontal axis: Sample mean diameter (mm), from 1.62 to 1.68.]
2.3.17 a. You would have to fabricate a large dataset to represent the population of needles with the variability similar to that of the sample data and a mean of 1.65. From that data you would take a sample of 35 and find its mean. Repeat this at least 1,000 times to develop a null distribution. To determine the p-value, determine the proportion of simulated statistics that are at least as extreme as 1.64. b.
Simulation        Real study
One repetition  = A random sample of 35 needles
Null model      = Population mean is 1.65 mm
Statistic       = Average diameter of needles in the sample
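The simulation recipe described in Solutions 2.3.14a and 2.3.17a (build a null population, draw a sample, record its mean, repeat at least 1,000 times) can be sketched as follows for the SPF study. The null population here is an assumption made only for illustration: a normal distribution with mean 30 and SD 17.19 stands in for the fabricated dataset, which would normally be built from the sample data. The function name and seed are also hypothetical.

```python
import random
import statistics

def simulate_null_means(mu0, sigma, n, reps, seed=0):
    """Simulate `reps` sample means of size n from an assumed null population.

    Assumption: the null population is Normal(mu0, sigma); the real
    procedure would sample from a dataset shaped like the observed data.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# SPF study: null mean 30, sample SD 17.19, n = 48, observed mean 35.29
null_means = simulate_null_means(mu0=30, sigma=17.19, n=48, reps=1000)
observed = 35.29

# One-sided p-value: proportion of simulated means at or above the observed mean
p_value = sum(m >= observed for m in null_means) / len(null_means)

print(f"null distribution mean = {statistics.fmean(null_means):.2f}")
print(f"null distribution SD   = {statistics.stdev(null_means):.3f}")
print(f"simulated p-value      = {p_value:.3f}")
```

The simulated p-value varies from run to run (and with the seed) but should be of the same order as the theory-based value in Solution 2.3.15b. For the two-sided needle test, the count would instead include means at least as far from 1.65 as 1.64 in either direction.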
2.3.18 a. The standardized statistic is –0.85 and the p-value is 0.4039. b. Because the p-value is much greater than 0.05, we do not have strong evidence that the average diameter of the population of needles is different than 1.65 mm.
2.3.19 a. The distribution is fairly symmetric. b. The mean and the median will be about the same because the distribution of temperatures is fairly symmetric. c. The mean is 98.105 and the median is 98.100. Yes, they are very close. d. The actual standard deviation is 0.699.
2.3.20 a. Null: The average body temperature for males is 98.6°F (μ = 98.6°F), Alternative: The average body temperature for males is not 98.6°F (μ ≠ 98.6°F). b. The standardized statistic is −5.71 and the p-value is approximately 0. c. Because the p-value is less than 0.05 we have strong evidence that the average male body temperature is not 98.6°F. d. Any generalization should be done with caution, but we can probably generalize it to healthy male adults similar to those that were in the study.
2.3.21 a. Null hypothesis: The average body temperature for females is 98.6°F (μ = 98.6°F). Alternative hypothesis: The average body temperature for females is not 98.6°F (μ ≠ 98.6°F). b. Hard to tell, there is a lot of variability in the data c. The standardized statistic is −2.24 and the p-value = 0.0289. d. Because the p-value is less than 0.05 we have strong evidence that the average body temperature for females is different than 98.6°F.
2.3.22 a. Null hypothesis: The average commute time in the California city is 27.5 minutes. Alternative hypothesis: The average commute time in the California city is different than 27.5 minutes. b. The standardized statistic is –2.23 and the p-value is 0.0308. c. Because the p-value is less than 0.05 we have strong evidence that the average commute time in the California city is different than 27.5 minutes. d. Perhaps we can generalize just to the people like those that she and her parents know. They could easily not be representative of the city residents as a whole.
2.3.23 a. Rejecting a true null hypothesis is finding an innocent person guilty. b. Not rejecting a false null hypothesis is finding a guilty person not guilty. c. Our judicial system is supposed to be set up to make it difficult to find an innocent person guilty, the one described in part (a).
2.3.24 a. We have strong evidence the patient has the disease, when in fact they are healthy. The consequence is potentially frightening a patient into thinking they are diseased when, in fact, they are not. b. We do not have evidence the patient has the disease, when in fact they are diseased. The consequence is giving a patient a false/untrue sense of security. c. Although both mistakes are problematic, telling someone they are healthy when they are not could have serious, negative, long-term personal and community health impacts (e.g., they could give others the disease).
2.3.25 a. We have strong evidence the subject is not telling the truth. b. It is plausible the subject is telling the truth (we cannot rule out the fact that they are telling the truth). c. We have strong evidence the subject is not telling the truth, when in fact they are telling the truth. d. We do not have strong evidence the subject is not telling the truth (it’s plausible they are telling the truth), when in fact they are lying.
2.3.26 a. We have strong evidence the incoming message is not legitimate. b. It is plausible the incoming message is legitimate. c. We have strong evidence the incoming message is not legitimate, but it is legitimate. d. It is plausible the incoming message is legitimate, when it is actually not legitimate.
2.3.27 a. We have strong evidence the new treatment is better, when it actually is not. b. We do not have evidence the new treatment is better, when it actually is.
2.3.28 a. We find strong evidence that Buzz is not guessing, but he is guessing. b. We do not have evidence that Buzz is not guessing (guessing is plausible), when Buzz is actually not guessing.
2.3.29 A Type I error is possible here, which means that we conclude there is strong evidence Buzz is not guessing, when in fact Buzz is guessing.
2.3.30 a. The true, long-run, average needle diameter (μ). b. Null: The long-run average needle diameter is 1.65 mm (μ = 1.65 mm). Alternative: The long-run average needle diameter is different than 1.65 mm (μ ≠ 1.65 mm). c. We find evidence that the long-run average needle diameter is different than 1.65 mm, when it is actually 1.65 mm. The consequence is that production may stop when it should not have. d. We do not have evidence that the long-run average needle diameter is different from 1.65 mm, when it is actually different than 1.65 mm. The consequence is producing needles with the wrong diameter and selling them in the marketplace, which ultimately may be bad for business.
2.3.31 Type I error would be the producer’s risk because they risk shutting down manufacturing when they should not, and Type II error would be the consumer’s risk because they risk using needles which (purportedly) have an average diameter of 1.65 mm when the true average may not be 1.65 mm.
2.3.32
gets more skewed to the low numbers, the mean will get smaller than the median and (mean – median) will get smaller. When we divide by the SD we standardize this difference, which makes this statistic a good measure of skewness.
2.3.33 Deviations from the average always sum to zero. So if you know n – 1 deviations, you can figure out the other one. This means that only n – 1 of the numbers of a dataset are free. The last one is determined by the rest. The more degrees of freedom for a t-distribution, the more it looks like a normal distribution.
2.3.34 P-values of 0.04999 and 0.050001 are very similar and come from very similar results. Nothing magical happens when a p-value drops below 0.05. p-values around 0.05 are really all about the same, just as p-values close to any other number are all about the same. Therefore, we should not have dramatically different conclusions for p-values of 0.04999 and 0.050001.
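The degrees-of-freedom argument in 2.3.33 can be checked numerically. The sketch below is only an illustration; the five data values are hypothetical and were chosen for easy arithmetic, not taken from any exercise.

```python
from statistics import mean

# Hypothetical data, chosen only for illustration (not from the exercises).
data = [3.0, 7.0, 8.0, 12.0, 15.0]

x_bar = mean(data)
deviations = [x - x_bar for x in data]

# Deviations from the mean sum to zero (up to floating-point rounding).
total = sum(deviations)

# Knowing the first n - 1 deviations determines the last one,
# because all n deviations must add up to zero.
recovered_last = -sum(deviations[:-1])

print(total, recovered_last, deviations[-1])
```

Only n – 1 of the deviations can vary freely, which is where the n – 1 degrees of freedom come from.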
Section 2.4 2.4.1 B. 2.4.2 A. 2.4.3 B. 2.4.4 B. 2.4.5 C. 2.4.6 B. 2.4.7 C. 2.4.8 D. 2.4.9 B. 2.4.10 A. 2.4.11 B. 2.4.12 A. 2.4.13 A. 2.4.14 B.
2.4.15 a. Get larger b. Stay the same c. Get larger d. Get larger
2.4.16 a. False b. True c. False d. True
2.4.17 a. True b. False c. True d. False
2.4.18 a. The mean and median would increase by 5 points; the standard deviation would stay the same because the entire distribution would move up but the variability stays the same. b. The mean would increase because 5 would be added to the total of the scores. The median would stay the same because the high score remains the high score. The standard deviation would increase because there would be more variability with the larger number. c. The mean would increase because 5 would be added to the total of the scores. We cannot really say for certain how or whether the median or standard deviation would change. It depends on the original distribution of scores. For example, adding the 5 points to the lowest score could still keep it the lowest score or could change it to the highest score.
2.4.19 No, the distribution of means is not symmetric but is skewed right. Therefore, a t-distribution will not model this well.
2.4.20 Randomly choose a coin out of your collection, note its year, and put it back. Do this same thing 99 more times so you have a sample of 100 years. Find the mean of those 100 years. Repeat this process many, many times to develop a bootstrap sampling distribution.
2.4.21 a. (3, 3, 3), (3, 3, 6), (3, 6, 6), (3, 3, 9), (3, 9, 9), (6, 6, 6), (6, 6, 9), (6, 9, 9), (9, 9, 9), (3, 6, 9) b. There are seven possible means: 3, 4, 5, 6, 7, 8, 9. c. There are only three possible medians: 3, 6, 9.
2.4.22 a. H0: The mean body temperature for males is 98.6°F (μ = 98.6°F). Ha: The mean body temperature for males is not 98.6°F (μ ≠ 98.6°F). b. x̄ = 98.105°F, s = 0.699 c. SD ≈ 0.086 d. (98.105 – 98.6)/0.086 = –5.76. Because the sample mean 98.105°F is more than 5 SDs below the hypothesized mean of 98.6°F, we have very strong evidence that the mean body temperature for males is different (less) than 98.6°F. e. As the p-value < 0.0001, we come to the same conclusion. f. We can probably generalize to all healthy adult males in the United States between the ages of 18 and 40.
2.4.23 98.105 ± 2(0.086) = 97.933°F to 98.277°F
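The arithmetic in 2.4.22 d and 2.4.23 can be reproduced with a short sketch. The function names here are illustrative, not part of any applet; the inputs are the sample mean, hypothesized mean, and bootstrap SD quoted in the solutions.

```python
def standardized_statistic(observed, hypothesized, sd):
    """Number of SDs the observed statistic lies from the hypothesized value."""
    return (observed - hypothesized) / sd


def two_sd_interval(observed, sd):
    """Approximate 95% interval: observed statistic plus or minus 2 SDs."""
    margin = 2 * sd
    return (observed - margin, observed + margin)


x_bar = 98.105    # sample mean body temperature for males (2.4.22 b)
mu_0 = 98.6       # hypothesized mean
sd_boot = 0.086   # SD of the bootstrap sampling distribution (2.4.22 c)

z = standardized_statistic(x_bar, mu_0, sd_boot)
low, high = two_sd_interval(x_bar, sd_boot)

print(round(z, 2), round(low, 3), round(high, 3))
```

Rounded to the same precision, these values agree with the –5.76 and the 97.933°F to 98.277°F interval given above.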
2.4.24 a. The standardized statistic is t = –5.71. It is very similar to what was found using a bootstrap sampling distribution. b. The p-value is < 0.001. It is very similar to what was found using a bootstrap sampling distribution.
2.4.25 a. H0: The mean body temperature for females is 98.6°F (μ = 98.6°F). Ha: The mean body temperature for females is not 98.6°F (μ ≠ 98.6°F). b. x̄ = 98.394°F, s = 0.743 c. SD ≈ 0.092 d. (98.394 – 98.6)/0.092 = –2.24. Because the sample mean 98.394°F is more than 2 SDs below the hypothesized mean of 98.6°F, we have strong evidence that the mean body temperature for females is different (less) than 98.6°F. e. As the p-value ≈ 0.022, yes, we come to the same conclusion.
2.4.26 a. The standardized statistic is t = –2.24. It is exactly what was found using a bootstrap sampling distribution. b. The p-value is 0.0289. It is very similar to what was found using a bootstrap sampling distribution.
2.4.27 a. H0: The mean increase in the laugh rating is 0 (μ = 0). Ha: The mean increase in the laugh rating is greater than 0 (μ > 0). b. x̄ = 0.295, s = 0.427 c. SD ≈ 0.067
d. As the p-value < 0.0001, we have strong evidence that the mean increase in rating is greater than 0 (or jokes are funnier with a laugh track).
2.4.28 a. H0: The median increase in the laugh rating is 0. Ha: The median increase in the laugh rating is greater than 0. b. Median = 0.330, s = 0.427 c. SD ≈ 0 d. As the p-value < 0.0001, we have strong evidence that the median increase in rating is greater than 0 (or jokes are funnier with a laugh track).
2.4.29 a. H0: Students’ actual scores are the same as predicted on average or the mean difference is 0 (μ = 0). Ha: The students’ actual scores are lower than predicted on average or the mean difference is less than 0 (μ < 0). b. x̄ = −2.481, s = 8.635 c. p-value ≈ 0.07 d. We do not have strong evidence that the mean difference in scores is less than 0, or we do not have strong evidence that students’ actual scores tend to be less than the predicted in the long run.
2.4.30 a. H0: The median difference in scores is 0. Ha: The median difference in scores is less than 0. b. Median = –1.000 c. p-value ≈ 0.28 d. We do not have strong evidence that the median difference score is less than 0 in the long run (or we do not have strong evidence that students tend to predict lower scores than their actual score).
2.4.31 a. H0: People tend to pick their own face, on average (μ = 0). Ha: People tend to pick a face that is different than theirs, on average (μ ≠ 0). b. x̄ = 6.296, s = 12.449 c. SD ≈ 2.346 d. (6.296 – 0)/2.346 = 2.68. Because the standardized statistic is more than 2, we have strong evidence that the mean score is different (greater) than 0, or people, on average, pick faces that are better looking than their own in the long run. e. As the p-value ≈ 0.01, we come to the same conclusion.
2.4.32 a. H0: People tend to pick their own face, on average (μ = 0). Ha: People tend to pick a face that is different than theirs, on average (μ ≠ 0). b. x̄ = 12.000, s = 19.712 c. SD ≈ 4.93 d. (12 – 0)/4.93 = 2.43. Because the standardized statistic is more than 2, we have strong evidence that the mean score is different (greater) than 0, or people, on average, pick faces that are better looking than their own in the long run when looking at mirror images of their faces.
2.4.33 a. 3.8 cups per week b. SD ≈ 1.4, p-value ≈ 0.02 c. 5.133 cups per week, SD ≈ 2.5, p-value ≈ 0.50 d. The p-value increased. This happened because the sample mean moved closer to the value hypothesized under the null. The SD of the bootstrap sampling distribution also increased. This would also cause the p-value to increase. e. For the original data, the median is 2, the SD of the sampling distribution is ≈ 1.3, and the p-value ≈ 0.004. For the data where the 20 changes to a 40, the median is still 2, the SD of the sampling distribution is still about 1.3, and the p-value is still about 0.004. f. The observed median did not change because 40 is just a high number just like the 20 was. The SD of the null did not change much because again, having the 20 or 40 (or even multiples of these) in your bootstrap sample does not affect the median of that sample. Therefore, a very similar sampling distribution will be obtained. Because the observed median and the SD of the sampling distribution did not change much, the p-value will not change much.
End of Chapter 2 Exercises
2.CE.1 a. No, all games during a certain period were selected, instead of randomly choosing some. b. Number of runs is likely more representative of the population (all games) as attendance fluctuates dramatically during the year due to weather, opponent, and timing of the games during the season.
2.CE.2 a. No, you did not make a list of all students and sample from the list. b. Answers will vary. One possibility is blood type; it is unlikely that blood type is associated with whether or not you are likely to walk in front of the library. c. Answers will vary. One possibility is GPA; students walking near the library may be more likely to have higher GPAs.
2.CE.3 a. No, the instructor did not list all games in the 2010 season and randomly choose some. b. Null hypothesis: 75% of all major league baseball games have a “big bang” (π = 0.75). Alternative hypothesis: Less than 75% of all games have a “big bang” (π < 0.75). c. p̂ = 21/45 = 0.467 d. Using applet with 0.75 = probability of success, n = 45, and number of samples = 1000, calculate probability of 0.467 or less. The p-value is approximately 0. e. The data provide strong evidence that the true proportion of all major league baseball games with a big bang is less than 0.75, because it’s extremely unlikely that in a sample of 45 games only 21 would have a big bang if the true proportion of all games with a big bang was 0.75.
2.CE.4 a. 263/499 = 0.527 b. Statistic, because it is based on the sample. The appropriate symbol is p̂. c. Bar chart with two bars: Do read (0.5271) and Do not read (0.4729). d. p-value is 0.2444 (probability of heads = 0.5, number of tosses = 499, number of reps = 1,000, as extreme as 263, two-sided). We do not have evidence that the proportion of all Israelis that read while using the toilet differs from one half. e. Although this is not a random sample and we should be cautious generalizing, generalizing to all Israelis seems reasonable if we believe that individuals sampled in public gathering areas are like those who aren’t in public gathering areas. It is a judgment call as to whether generalizing to the United States or another country is reasonable because the behavior may be quite different in different cultures.
2.CE.5 a. The observational units are the 600 brides in the sample. The variable is whether or not they kept their own name.
b. The population is the U.S. brides in 2001 to 2005. The sample is the 600 brides with wedding announcements in the NY Times. c. The proportion of all U.S. brides that keep their own names d. Yes, concerned because people with wedding articles in NY Times are likely to be different than “typical” U.S. adults. Probably reasonable to generalize to other brides who could/would have their wedding announcement in the NY Times.
2.CE.6 With a p-value of 0.0396, we have strong evidence that the proportion of brides (who could/would have their wedding announcement in the NY Times from 2001 to 2005) that kept their name is different (or greater than) 0.15.
2.CE.7 a. Nonbiased b. Biased; another population would be “students who visit the library” c. Biased; another population would be “students who visit the student center” d. Biased; another population would be “students who go to basketball games” e. Biased; another population would be students who live on campus f. Biased; another population would be students who have a car on campus
2.CE.8 a. Library b. Basketball game, cars c. Library, student center d. Dorms
2.CE.9 a. The sample size is large, but it would be good to know that the data was not strongly skewed to feel better about running a theory-based test on the data. b. Null: The average adult body temperature is 98.6°F. Alternative: The average adult body temperature is not 98.6°F. p-value is 0.0000, we have strong evidence that average body temperature is different than 98.6°F. c. t-statistic of 6.32 means our sample mean is more than 6 standard deviations from 98.6. d. Yes, the t-statistic is very different than 0 (in particular > 3) which means there is very strong evidence against the null hypothesis.
2.CE.10 a. Skewed to the left; most students have reasonably good GPAs, except a few who are struggling. b. i. 3.29, as expected; ii. 0.172, as expected; iii. Slightly skewed to the left c. i. 3.29, as expected; ii. 0.082, as expected; iii. Fairly bell-shaped
2.CE.11 a. 86.7, as expected; close to the population mean 86.736 b. 2.106, as expected; close to 9.608/√20 = 2.148 c. The distribution is almost symmetric; it is very slightly skewed to the left. d. Mean ≈ 86.7, SD ≈ 1.36, nearly bell-shaped
2.CE.12 a. Smaller b. Larger c. Smaller d. Larger e. Larger f. Smaller
2.CE.13 a. Smaller b. Larger c. Larger d. Smaller e. Larger f. Smaller
2.CE.14 a. Smaller b. Larger c. Smaller d. Stays the same e. Larger f. Larger g. Smaller
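The applet simulation described in 2.CE.3 d (probability of success 0.75, n = 45, 1,000 samples) can be approximated with the sketch below. It is a stand-in written with Python's standard library, not the applet itself; the function name and the fixed seed are assumptions made so the example is reproducible.

```python
import random


def simulate_null_proportions(pi, n, reps, seed=0):
    """Simulate `reps` sample proportions of size n when the true proportion is pi."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    proportions = []
    for _ in range(reps):
        successes = sum(rng.random() < pi for _ in range(n))
        proportions.append(successes / n)
    return proportions


# Inputs quoted in 2.CE.3: null value 0.75, 45 games, 1,000 simulated samples.
null_props = simulate_null_proportions(pi=0.75, n=45, reps=1000)

observed = 21 / 45  # sample proportion of games with a "big bang"

# One-sided p-value: how often a simulated proportion is 0.467 or less.
p_value = sum(p <= observed for p in null_props) / len(null_props)

print(round(observed, 3), p_value)
```

Individual runs will differ with the seed, but simulated proportions as low as 0.467 should be extremely rare, in line with the "approximately 0" reported in the solution.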
Chapter 2 Investigation 1. It was stated that it was a random sample, so therefore it is an unbiased sampling method. 2. Telling the respondents that a very large proportion of people fake phone calls might increase the proportion of the respondents that would admit to faking a phone call. 3. Yes, because it says it was a random sample of all cell phone users 4. It was stated that it was a random sample, so therefore it is an unbiased sampling method. However, our population now would be U.S. college students instead of all adult Americans. 5. Telling the respondents many college students use apps to help them make fake phone calls might increase the proportion of the respondents that would admit to faking a phone call. 6. No, because it was not a random sample of all cell phone users. We can generalize to the college student population that was being sampled. 7. It was not a random sample, so therefore we cannot assume that it is an unbiased sampling method.
8. Telling the respondents that a very small proportion of people fake phone calls might decrease the proportion of the respondents that would admit to faking a phone call. 9. No, hard to generalize because not a random sample 10. Have more than 1 in 10 cell phone users faked cell phone calls within the last 30 days? 11. Each of the cell phone users in the sample (1,858 in the Pew sample) 12. Whether or not the person has faked a cell phone call in the past 30 days 13. The population proportion of all American cell phone users who have faked a cell phone call within the past 30 days 14. Null: The population proportion of all American cell phone users who have faked a cell phone call within the past 30 days is 0.10. Alternative: The population proportion of all American cell phone users who have faked a cell phone call within the past 30 days is more than 0.10. 15. Yes, every sample may yield somewhat different results because they are randomly taken. 16. Statistic, because it is based on the sample 17. 13% of the people in the study report faking a cell phone call within the past 30 days. 18. Yes, because any result for the sample statistic is possible 19. Probability of success = 0.10, sample size = 1,858, number of samples = 1,000 20. 0.10 because that’s the value in the population we are simulating samples from. It makes sense that, on average, our sample proportions are equal to the population proportion. 21. Yes, we don’t get exactly 0.10 every time. What we get varies. 22. Sample proportions between 0.08 and 0.12 are typical values of the sample statistic if the population proportion is 0.10. 23. Because the sample actually yielded 0.13 this suggests that the population proportion is not 0.10. This is convincing evidence that the population proportion of people who faked a cell phone call in the past 30 days is more than 0.10. 24. The approximate p-value is 0. This is the proportion of times we obtained 0.13 or larger when assuming the population proportion was 0.10. 25. This conclusion does not hold for people in general, but it does hold for all American cell phone users because we took a random sample of all American cell phone users. Because we took a random sample of the population of all American cell phone users, we know that the sample is representative of that population. 26. A random sample of American cell phone users gave us strong evidence that more than 10% of all American cell phone users have faked a cell phone call within the past 30 days. Further research might investigate particular demographic groups who are more/less likely to fake cell phone calls and to pursue popular reasons why people are faking cell phone calls.
Chapter 2 Research Article 1. a. Will infants prefer a helping toy vs. a hindering toy given a choice between the two? b. Will infants look at interactions between the climbing toy and the hindering toy longer than between the climbing toy and the helping toy? c. Will infants choose the pushing up toy more than the pushing down toy? d. Will infants choose the helper toy more than the neutral toy? e. Will infants choose the neutral toy more than the hinderer toy? f. Will infants look at interactions between the climbing toy and the helping toy longer than between the climbing toy and the neutral toy? g. Will infants look at interactions between the climbing toy and the neutral toy longer than between the climbing toy and the hinderer toy? 2. Researchers are interested in learning whether preverbal infants (6-month-olds and 10-month-olds) assess individuals based on their interactions with others. 3. Ten-month-old babies and 6-month-old babies. All from the greater New Haven, CT, area. Little other information is available. 4. Ethnic background, socioeconomic status, socialization experiences, and so on may be helpful. 5. The infants were “recruited” (see Methods), meaning that they probably used lists of new babies from the hospital or newspaper and contacted people asking them to participate. Detailed information on the recruiting strategy, however, is not provided in the article. 6. Experiment #1: Choose helper toy or hinderer toy? Categorical, 2 outcomes (helper, hinderer). Looking time at hindering toy (quantitative), looking time at helping toy (quantitative). Experiment #2: Choose pusher-up toy or pusher-down toy? Categorical, 2 outcomes (pusher-up, pusher-down). Experiment #3: Choose neutral or hinderer toy? Categorical, 2 outcomes (neutral, hinderer). Choose neutral or helper toy? Categorical, 2 outcomes (neutral, helper). Looking time neutral toy (when also shown helper; quantitative), looking time helper toy (quantitative). Looking time neutral toy (when also shown hinderer; quantitative), looking time hinderer toy (quantitative). 7. 10-month-olds: 14/16, 6-month-olds: 12/12 prefer helper vs. hinderer 8. 10-month-olds: average length of look at helper toys: 3.82 s, average length of look at hinderer toys: 4.96 s; 6-month-olds: average length of look at helper toys: 6.7 s, average length of look at hinderer toys: 5.7 s 9. 10-month-olds: 6/12, 6-month-olds: 4/12 prefer pusher up vs. pusher down 10. 10-month-olds: 7/8, 6-month-olds: 7/8 prefer helper vs. neutral 11. 10-month-olds: 7/8, 6-month-olds: 7/8 prefer neutral vs. hinderer 12. The p-value is 0.002. We have strong evidence that (in the long run) 10-month-old infants prefer the helper toy over the hinderer toy. 13. The p-value is 0.0002. We have strong evidence that (in the long run) 6-month-old infants prefer the helper toy over the hinderer toy. 14. The validity conditions are not met. In particular, there were only two 10-month-old infants who chose the hinderer (not 10 or more), and there were no 6-month-old infants who chose the hinderer (not 10 or more). 15. Quantitative 16. We have strong evidence that the average difference in looking times of the 10-month-old infants between the helper and hinderer was different than zero. 17. With a p-value of 0.44, we do not have strong evidence that the average difference in looking times is different than zero for the 6-month-olds. 18. No, they make it sound like they’ve proven the null hypothesis is true (average difference in looking times is zero).
19. Infants like those in the study (similar background, etc.); however, because we don’t know much about the infants in the study, it’s difficult to be more specific 20. If the characteristics of the helper/hinderer (color/shape) are not counter-balanced, then you don’t know if the preference of the infants is for the color/shape or for the helper/hinderer. It is ideal to counter-balance as many possibly important characteristics of the experiments as possible to rule them out as possibly explaining the significant preference of the infants. 21. The first sentence of the final paragraph of the paper summarizes two key findings well: “Our findings indicate that humans engage in social evaluation far earlier in development than previously thought, and support the view that the capacity to evaluate individuals on the basis of their social interactions is universal and unlearned.” The first aspect of this conclusion seems reasonable given the prior research cited in this article and our belief that the study was conducted in a fairly sound manner. However, the second conclusion may be a bit of a stretch based on this experiment alone and would require evidence from other studies to support the statement more fully; though, this final point is certainly open to debate. 22. The demographic profile of the infants used (ethnicity; socioeconomic); the supposition that the choice of toys will translate into choices of partners, friends, and so on, among others 23. Two ideas (there are many more) are to: a. Try the study on infants with different demographic profiles to argue that it is universal behavior, and not only observed among a particular ethnic or socioeconomic strata. b. Use live subjects (young children) instead of inanimate objects and see whether the preferences persist.
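The p-values reported in items 12 and 13 of the Research Article can be checked with an exact binomial calculation. This sketch assumes a one-sided test of π = 0.5 (no preference between helper and hinderer), which the solutions do not state explicitly; the function name is illustrative.

```python
from math import comb


def upper_tail(n, k, pi=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k, n + 1))


# Counts from item 7: 14 of 16 ten-month-olds and 12 of 12 six-month-olds
# chose the helper. The null value 0.5 is an assumption (no preference).
p_10_month = upper_tail(16, 14)  # (120 + 16 + 1) / 2**16
p_6_month = upper_tail(12, 12)   # 1 / 2**12

print(round(p_10_month, 4), round(p_6_month, 4))
```

Under that assumption the exact tail probabilities are 137/65,536 and 1/4,096, which round to the 0.002 and 0.0002 quoted above.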
CHAPTER 3
Estimation: How Large Is the Effect? Section 3.1 3.1.1 B. 3.1.2 C. 3.1.3 A. 3.1.4 C. 3.1.5 D. 3.1.6 B. 3.1.7 D. 3.1.8 A. 3.1.9 B. 3.1.10 B. 3.1.11 A. 3.1.12 A. 3.1.13 A. 3.1.14 C. 3.1.15 a. ii. b. ii. 3.1.16 0.05 3.1.17 0.01 3.1.18 The proportion of all American adults that think a college education is very important. 3.1.19 The proportion of all American adults that are in favor of free tuition at a community college for anyone who wants to attend. 3.1.20 a. (0.48, 0.56) b. (0.47, 0.57) 3.1.21 a. (0.48, 0.56) b. (0.46, 0.59) 3.1.22 p-values will differ for each student’s simulation; one possible example follows.
Null                 p-value    Null                 p-value
Proportion = 0.45    0.018      Proportion = 0.53    0.720
Proportion = 0.46    0.045      Proportion = 0.54    0.483
Proportion = 0.47    0.094      Proportion = 0.55    0.299
Proportion = 0.48    0.139      Proportion = 0.56    0.163
Proportion = 0.49    0.302      Proportion = 0.57    0.083
Proportion = 0.50    0.510      Proportion = 0.58    0.047
Proportion = 0.51    0.760      Proportion = 0.59    0.016
Proportion = 0.52    1.000      Proportion = 0.60    0.005
The 95% confidence interval is (0.47, 0.57).
3.1.23 p-values will differ for each student’s simulation; one possible example follows.
Null                 p-value    Null                 p-value
Proportion = 0.45    0.001      Proportion = 0.53    0.640
Proportion = 0.46    0.006      Proportion = 0.54    0.316
Proportion = 0.47    0.010      Proportion = 0.55    0.130
Proportion = 0.48    0.053      Proportion = 0.56    0.058
Proportion = 0.49    0.140      Proportion = 0.57    0.014
Proportion = 0.50    0.342      Proportion = 0.58    0.007
Proportion = 0.51    0.667      Proportion = 0.59    0.001
Proportion = 0.52    1.000      Proportion = 0.60    0.000
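The tables for 3.1.22 and 3.1.23 turn into an interval of plausible values by keeping every null proportion whose two-sided p-value exceeds the cutoff. The sketch below applies that rule to the example p-values listed above; the function name and the 0.05 default are illustrative choices, and it assumes the plausible values form one unbroken run.

```python
def plausible_interval(p_values, alpha=0.05):
    """Smallest and largest null values whose p-value exceeds alpha."""
    kept = sorted(null for null, p in p_values.items() if p > alpha)
    return (kept[0], kept[-1])


# Example p-values copied from the two tables above (one student's simulation).
table_3_1_22 = {
    0.45: 0.018, 0.46: 0.045, 0.47: 0.094, 0.48: 0.139,
    0.49: 0.302, 0.50: 0.510, 0.51: 0.760, 0.52: 1.000,
    0.53: 0.720, 0.54: 0.483, 0.55: 0.299, 0.56: 0.163,
    0.57: 0.083, 0.58: 0.047, 0.59: 0.016, 0.60: 0.005,
}
table_3_1_23 = {
    0.45: 0.001, 0.46: 0.006, 0.47: 0.010, 0.48: 0.053,
    0.49: 0.140, 0.50: 0.342, 0.51: 0.667, 0.52: 1.000,
    0.53: 0.640, 0.54: 0.316, 0.55: 0.130, 0.56: 0.058,
    0.57: 0.014, 0.58: 0.007, 0.59: 0.001, 0.60: 0.000,
}

interval_22 = plausible_interval(table_3_1_22)
interval_23 = plausible_interval(table_3_1_23)

print(interval_22, interval_23)
```

For the first table the rule keeps 0.47 through 0.57, matching the interval stated for 3.1.22; for the second table the same 0.05 cutoff keeps 0.48 through 0.56.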
3.1.24 a. The symbol π represents the proportion of all American adults that drink at least one cup of coffee per day. b. p-value < 0.001 c. No, 0.50 does not appear to be a plausible value for π. Because we have a small p-value, we have strong evidence that π ≠ 0.50, so 0.50 is not plausible. d. The two-sided p-value is larger than a one-sided p-value. e. A 95% confidence interval for π is 0.622 to 0.657.
f. We are 95% confident that the value for π (the proportion of all American adults that drink at least one cup of coffee per day) is between 0.622 and 0.657. 3.1.25 A 99% confidence interval is (0.616, 0.663). 3.1.26 a. The symbol π represents the proportion of all American independent voters that support same-sex marriage. b. p-value < 0.001 c. No, 0.50 does not appear to be a plausible value for π. Because we have a small p-value, we have strong evidence against π = 0.50, so 0.5 is NOT a plausible value.
3.2.11 0.38 ± 0.11 3.2.12 a. The proportion of all U.S. adults who will correctly answer the question. b. There are more than 10 successes (975 > 10), and more than 10 failures (1,170 – 975 = 195 > 10). c. We are 95% confident that the proportion of all U.S. adults who will correctly answer the question is between 0.8120 and 0.8547. d. Yes, because 0.85 is in the 95% confidence interval
d. A 95% confidence interval for π is 0.602 to 0.697.
3.2.13 We are 90% confident that the proportion of all U.S. adults who will correctly answer the question is between 0.8154 and 0.8513; same midpoint but smaller width.
e. We are 95% confident that the value for π (the proportion of all American independent voters that support same-sex marriage) is between 0.602 and 0.697.
3.2.14 a. The proportion of all U.S. adults who think that the United States will become a cashless society in their lifetimes.
3.1.27 A 99% confidence interval is (0.587, 0.709).
b. There are more than 10 successes (635 > 10), and more than 10 failures (1,024 – 635 = 389 > 10).
3.1.28 a. The proportion of all Lee University students who would rather break a bone than lose their phone. b. (0.07, 0.18) 3.1.29 a. The proportion of all male climbers headed to the summit of Mont Blanc who take diuretics. b. (0.19, 0.27) 3.1.30 Answers will vary. Answers could include: Males that use urinals may not include all men hiking to the summit. Some that used the urinals may not be intending to climb all the way to the summit. 3.1.31 a. The proportion of all male climbers headed to the summit of Mont Blanc who take hypnotic drugs. b. (0.09, 0.17) or (0.09, 0.18) 3.1.32 a. The long-run proportion of times when the taller person will pass through first in a same-sex pair of individuals. b. (0.55, 0.78) 3.1.33 B. 3.1.34 False 3.1.35 True
Section 3.2 3.2.1 3.2.2 3.2.3 3.2.4 3.2.5 3.2.6 C. 3.2.7 D. 3.2.8 C. 3.2.9
c. We are 95% confident that the proportion of all U.S. adults who think that the United States will become a cashless society in their lifetimes is between 0.5904 and 0.6498. d. Yes, because 0.65 is not included in the 95% confidence interval 3.2.15 We are 99% confident that the proportion of all U.S. adults who think that the United States will become a cashless society in their lifetimes is between 0.5810 and 0.6592. Although there was strong evidence, there is not very strong evidence that the proportion of Americans that think the United States would become a cashless society in their lifetime is different than 0.65, because 0.65 is included in the 99% confidence interval. The 99% interval has the same midpoint but is wider. 3.2.16 Manny; a small sample size will result in more variability and hence a wider interval. 3.2.17 a. We need to know the sample size. b. The sample size of 4,000 will produce a confidence interval with the smallest width, the sample size of 1,000 will produce a confidence interval with a width a bit wider, and the sample size of 250 will produce a confidence interval with the widest width. This is because as sample size increases, the variability of the statistic (or the variability in the null distribution) will decrease and hence the confidence intervals will be narrower. c. For a sample size of 250, we got a standard deviation of 0.031 and a confidence interval of 0.39 ± 0.062. For a sample size of 1,000 we got a standard deviation of 0.015 and a confidence interval of 0.39 ± 0.030. For a sample size of 4,000 we got a standard deviation of 0.008 and a confidence interval of 0.39 ± 0.016. d. The midpoints of the intervals are all the same. e. As the sample size increases, the widths of the intervals decrease. f. Four times as much g. 0.39 ± 0.02; yes 3.2.18 a. π is the proportion of all U.S. teenagers who have some level of hearing loss. b. p-value is 0.2079
b. 0.0135
c. Our p-value is not small, so we only have weak evidence against the hypothesis that the proportion of all U.S. teens having hearing loss is different from 0.20.
3.2.10 0.45 ± 0.15
d. The 95% confidence interval is (0.1698, 0.2062).
a. 0.8715
c03Solutions.indd 34
10/19/20 6:23 PM
35
Solutions to Problems e. 0.0182
3.2.34 0.35 ± 0.0296, or (0.3204, 0.3796)
f. Yes, 0.20 is a plausible value for the proportion of the population that has hearing loss as 0.20 is contained in the 95% confidence interval.
3.2.35 0.65 ± 0.0418, or (0.6082, 0.6918)
g. Yes, 0.20 is a plausible value for the proportion of the population that has hearing loss because our p-value tells us that the probability we would have seen study results of 0.188 or more extreme is 0.21 assuming the population proportion truly is 0.20.
3.2.19
____________
____ ____ 1.96(0.5)√1/n is about √1/n
3.2.37 a. 278 b. 1,111 c. 10,000
a. 0.60 ± 2(0.070), or (0.46, 0.74) b. No, because values less than 0.50 are contained in the confidence interval of plausible values. 3.2.20
3.2.38 Answers will depend on the results of your sample size and proportion of heads for your spun coins. 3.2.39 a. 3
a. (0.4642, 0.7358)
b. 1
b. (0.2642, 0.5358)
c. 2
3.2.21
3.2.40 False
a. 0.72 ± 2(0.09) is (0.54, 0.90) b. Yes, because 50% is not a plausible value (based on confidence interval not including 0.50). c. There are not at least 10 successes and 10 failures (only 9 failures).
3.2.41 True 3.2.42 Range of plausible values, 2 SD method, theory-based oneproportion z-interval.
3.2.22
3.2.43 B.
a. 0.68 ± 2(0.03) is (0.62, 0.74)
Section 3.3
b. (0.62, 0.73) c. Similar because validity conditions are met 3.2.23
c. The p-value for the test is less than 0.01. 3.2.24 0.667 ± 2(0.063) = 0.667 ± 0.126, (0.541, 0.793). Similar because validity conditions are met. 3.2.25 all U.S. adults who consider
b. 0.05 ± 2(0.007), or (0.036, 0.064) c. Narrower d. The 95% confidence interval is (0.037, 0.064) and for 99% is (0.033, 0.068). 3.2.26 a. H0: π = 0.33; Ha: π > 0.33; p-value = 0.48, which offers weak evidence against the null, so we can say that it is plausible that the long-run probability of correctly identifying the cup with Pepsi is 0.33. b. 0.344 ± 2(0.058), or (0.228, 0.460) c. Yes. Because our p-value was larger than 0.05 the value under the null was found to be plausible; thus it would be contained in an approximate 95% confidence interval of plausible values. a. (0.2274, 0.4602) b. 0.1164 3.2.28 (0.0376, 0.2098) 3.2.29 (0.0251, 0.1923) 3.2.30 (0.4267, 0.6071) 3.2.31 (0.1036, 0.2528) 3.2.32 (0.4275, 0.5855) 3.2.33 (0.2856, 0.4602)
3.3.2 D. 3.3.4 B.
b. 99% (0.5137, 0.8197)
3.2.27
3.3.1 D. 3.3.3 E.
a. 95% (0.5503, 0.7831)
a. π is the proportion of all U.S. adults who consider themselves vegetarians.
3.2.36 1.96√(0.5(1 − 0.5)/n) = 1.96√(0.5(0.5)/n) =
3.3.5 False 3.3.6 False 3.3.7 False 3.3.8 a. Male rattlesnakes, all male rattlesnakes, 21 male rattlesnakes b. Sample size is greater than 20 and data are not strongly skewed c. Average age of all male rattlesnakes at this single site d. False; the parameter is fixed and is either in the interval or not. 3.3.9 No, the data seem to come from a population that has a skewed distribution, and the sample size of 16 is not large enough. 3.3.10 Yes, even though the sample size of 16 is not large enough, the sample data seem to come from a population that has a symmetric distribution. 3.3.11 a. The average number of siblings for all students at this school b. Though the data seem to come from a population that has a skewed distribution, the sample size of 122 is large enough. c. We are 95% confident that the average number of siblings that students at this school have is between 1.9340 and 2.4100. 3.3.12 a. The average reaction time of all student athletes at this school b. Though the data seem to come from a population that has a skewed distribution, the sample size of 42 is large enough. c. We are 99% confident that the average reaction time of all student athletes at this school is between 0.2799 and 0.3141 seconds. 3.3.13 a. The average guess for the number of beans in the bag by students at this school
b. Though the data seem to come from a population that has a skewed distribution, the sample size of 40 is large enough.
3.3.20
c. We are 95% confident that the average guess for the number of beans in the bag by students at this school is between 850.42 and 1,075.38 beans.
b. SPF, quantitative
d. Yes, we have strong evidence that, on average, students cannot accurately estimate the number of beans in the bag because the 1470 is not in the 95% confidence interval. 3.3.14 a. (1.37, 11.22) b. Yes, we have strong evidence that, on average, people tend to pick a face that is more attractive than their own when they are asked to identify their own face because the 95% confidence interval for the average score only has positive numbers. 3.3.15 a. The average number of hours per day U.S. adults watch television b. The sample size 1,555 is large enough. c. We are 95% confident that the average number of hours per day U.S. adults watch television is in between 2.80 and 3.08. 3.3.16 a. The average age of U.S. adults at birth of their first child b. The sample size 1,666 is large enough. c. We are 95% confident that the average age at which U.S. adults have their first child is in between 24.02 and 24.57. 3.3.17 a. The average length of time students guess has passed when played a 17-second song b. The sample appears to come from a symmetric distribution and the sample size 24 is reasonably large. c. We are 95% confident that the average length of time students guess has passed when played a 17-second song snippet is in between 11.56 and 16.04 seconds. d. Yes, we have strong evidence that, on average, students cannot accurately estimate the length of the 17-second song snippet because the 95% confidence interval does not contain 17, but rather numbers less than 17. It appears that students tend to underestimate how much time has passed, on average. 3.3.18 We are 99% confident that the average length of time students guess has passed when played a 17-second song is in between 10.77 and 16.83 seconds. Yes, we have very strong evidence that, on average, students cannot accurately estimate the length of the 17-second song snippet because the 99% confidence interval does not contain 17, but rather numbers less than 17. 
It appears that students tend to underestimate how much time has passed, on average. The midpoint is the same for both intervals, but the 99% confidence interval is wider. 3.3.19 a. The distribution consists of two groups with a large gap between each group. Eleven numbers are centered at approximately 106 dBA and 11 at approximately 85 dBA. The older-type models are probably not as loud, and volumes are represented in the lower group while the more intense dryers have volumes represented by the upper group. b. 95.391 dBA c. 90.5223 dBA to 100.2597 dBA d. None of the measurements are contained in the confidence interval. This makes sense because the interval is an estimate for the mean and the mean is between the two groups and not really close to any of the actual measurements.
a. Students c. (30.2986, 40.2815). We are 95% confident that the mean SPF level of sunscreen used by all students is between 30.2986 and 40.2814. d. Yes, the confidence interval is completely above 30. e. Yes, the sample size is greater than 20 and the data were not strongly skewed. 3.3.21 a. Students b. Hours/week studying statistics outside of class, quantitative c. (7.1114, 9.2886) d. Yes, the sample size is greater than 20 and the data were not strongly skewed. e. We are 95% confident that the average hours/week all statistics students think they will spend studying is between 7.11 and 9.29. 3.3.22 a. (98.2059, 98.5741) b. 98.6°F is not a plausible population average female body temperature because it is not contained in the 95% confidence interval. From the interval we can see that the average temperature is in fact significantly less than 98.6°F. c. Yes, the sample size was greater than 20 and the data were not strongly skewed. 3.3.23 a. (97.9318, 98.2782) b. 98.6°F is not a plausible population average male body temperature because it is not contained in the 95% confidence interval. From the interval we can see that the average temperature is in fact significantly less than 98.6°F. c. Yes, the sample size was greater than 20, and the dotplot of the data was bell-shaped and symmetric. 3.3.24 a. Students b. Number of U.S. states visited, quantitative c. Probably, because statistics students wouldn’t be different on the number of states visited compared to all students at the school d. H0: 𝜇 = 12, Ha: 𝜇 ≠ 12
e. The midpoint of the interval is 15.04. 15.04 ± 2(9.26/√136) = 15.04 ± 1.588 = (13.45, 16.63). 3.3.25 a. (13.47, 16.61) b. Yes, 12 is not in the confidence interval. c. Yes, the sample size was greater than 20, and the data were not strongly skewed. 3.3.26 a. (13.80, 16.28) b. 2SD method is a rough approximation for a 95% confidence interval. It will not work for other levels of confidence. 3.3.27 a. Invalid b. Valid c. Invalid d. Valid
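The 2SD arithmetic in Solution 3.3.24e can be reproduced with a few lines. This is a minimal sketch; the function name `two_sd_interval_for_mean` is hypothetical, and the inputs (mean 15.04, SD 9.26, n = 136) are taken from the solution itself.

```python
import math

def two_sd_interval_for_mean(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Approximate 95% interval: mean ± 2 × (sample SD / sqrt(n))."""
    margin = 2 * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Values from Solution 3.3.24e (number of U.S. states visited).
low, high = two_sd_interval_for_mean(15.04, 9.26, 136)
print(f"({low:.2f}, {high:.2f})")
```

As Solution 3.3.26b notes, the multiplier 2 is only a rough stand-in for a 95% interval, so this function should not be reused for other confidence levels.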
e. Invalid
3.3.36
f. Invalid
a. The sample size is greater than 20 and the data are not strongly skewed.
3.3.28 a. Needles b. Diameter of needle, quantitative c. The midpoint of the confidence interval would be 1.64. 1.64 ± 2(0.07/√35) = 1.64 ± 0.024 = (1.616, 1.664). 3.3.29 a. The sample size is greater than 20 and the data come from a bell-shaped distribution. b. (1.6160, 1.6640) c. There is not evidence that the average diameter of needles produced by the manufacturing process is different from 1.65 because 1.65 is contained in the interval. 3.3.30 a. (1.6111, 1.6689). We are 98% confident that the average diameter of all hypodermic needles produced is between 1.6111 mm and 1.6689 mm. b. 2SD method is a rough approximation for a 95% confidence interval. It will not work for other levels of confidence. 3.3.31 a. Invalid b. Invalid c. Invalid d. Valid e. Invalid f. Invalid 3.3.32 a. Students b. Number of Facebook friends, quantitative c. The sample size is greater than 20 and the data are not strongly skewed. d. (453.605, 624.795). We are 95% confident that the population average number of Facebook friends for students at this school is between 453.605 and 624.795. 3.3.33 a. Invalid b. Valid c. Invalid 3.3.34 a. Textbooks b. Price, quantitative c. The sample standard deviation is very large compared to the mean. d. The sample size is greater than 20 and the data are not strongly skewed.
b. (0.3241, 0.3839). We are 95% confident that the population average ppm of mercury in Yellowfin tunas is between 0.3241 and 0.3839. 3.3.37 a. The sample standard deviation is very large compared to the mean. b. The sample size is greater than 20 and the data are not strongly skewed. c. (34.33, 57.03). We are 95% confident that the population average cost of a haircut for college students (including tip) is between $34.33 and $57.03. 3.3.38 a. The sample standard deviation is very large compared to the mean. b. The sample size is greater than 20 and the data are not strongly skewed. c. (2.68, 3.06). We are 95% confident that the population average number of times per week adult Americans contact their closest friend is between 2.68 and 3.06. d. No, because 3 is in the interval
Section 3.4 3.4.1 3.4.2 B, D. 3.4.3 3.4.4 3.4.5 3.4.6 A, F, H. 3.4.7 E. 3.4.8 C. 3.4.9 B. 3.4.10 A. 3.4.11 A. 3.4.12 D. 3.4.13 Decrease 3.4.14 Increase 3.4.15 Increase 3.4.16 Decrease 3.4.17 95% 3.4.18 99% 3.4.19 A. 3.4.20 D. 3.4.21 Decrease 3.4.22 3.4.23
e. (45.82, 84.22). We are 95% confident that the population average cost of a textbook at Cal Poly is between $45.82 and $84.22.
a. Remain same
3.3.35
3.4.24 Increase
a. Albacore tunas
3.4.25 Decrease
b. Mercury level in parts per million, quantitative
3.4.26 95%
c. The sample size is greater than 20 and the data are not strongly skewed.
3.4.27 99%
d. (0.3155, 0.4005). We are 95% confident that the population average ppm of mercury in Albacore tunas is between 0.3155 and 0.4005.
a. p̂ is the sample proportion (statistic) and π is the population proportion (parameter).
b. Increase
3.4.28
b. p̂
d. False
c. Standard deviation of p̂ (or the standard error)
3.4.35 Two
d. Margin of error
3.4.36 Decreases, slowly
3.4.29
3.4.37
a. No effect on the midpoint
a. We want margin of error to equal 0.01, so we want SE = 0.005. Thus, 0.005 = √((1/6)(1 − 1/6)/n). Solving for n gives 5,556.
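The algebra in Solution 3.4.37a rearranges SE = √(π(1 − π)/n) into n = π(1 − π)/SE². A minimal sketch of that calculation follows; the function name `sample_size_for_se` is hypothetical, and rounding up to the next whole observation is an assumption about how 5,556 was obtained.

```python
import math

def sample_size_for_se(pi: float, target_se: float) -> int:
    """Smallest n with sqrt(pi * (1 - pi) / n) <= target_se."""
    return math.ceil(pi * (1 - pi) / target_se ** 2)

margin_of_error = 0.01
target_se = margin_of_error / 2  # 2SD rule: margin of error = 2 × SE
n = sample_size_for_se(1 / 6, target_se)
print(n)
```

Because the target SE enters as a square, halving the desired margin of error quadruples the required sample size.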
b. Increases the margin of error c. Decreases the margin of error 3.4.30 The sample size, the level of confidence, and the value of p̂. The value of p̂ is hardest to control. 3.4.31 The sample size, the level of confidence, and the value of the sample SD. The value of the sample SD is hardest to control. 3.4.32 a. 90%: (0.6376, 0.6624); 95%: (0.6352, 0.6648); 99%: (0.6306, 0.6694) b. Midpoints are all the same.
b. When π = 0 or π = 1, as it is impossible to have any observational units with a second category in these cases c. When π = 0.50 the value of SD(p̂) will be maximized. See graph (Solution 3.4.37c). 3.4.38 a. Four-sided die b. Because π is largest (0.25) in this case
c. 90% is narrowest, 99% is widest; as confidence level increases width of interval increases
d. 900
d. No, because all of the intervals include 0.64
a.
3.4.39
e. Yes, because there are at least 10 people who said they play video games and at least 10 said they didn’t play video games 3.4.33 a. Narrower b. Wider 3.4.34 a. False b. True
c. √(π(1 − π)/n) = √(0.1(1 − 0.1)/25) = 0.06
n        SD of p̂
10       0.158
20       0.112
40       0.079
100      0.050
500      0.022
1,000    0.016
b. See graph (Solution 3.4.39b)
c. False
c. Decreases
[Graph, Solution 3.4.37c: SD of p̂ versus π]
[Graph, Solution 3.4.39b: SD of p̂ versus n]
d. Less
of an improvement. It’s about equally likely to get 33 or fewer, which is plausible if the batting average hasn’t improved (is still 0.250).
3.4.40 a. Rodgers. The benefit of the additional 25 flips decreases as the sample size increases. b. They will go down similar amounts because both Rodgers and Hammerstein are doubling the number of flips they are doing. 3.4.41 1,068, rounded up 3.4.42 1,844, rounded up 3.4.43 a. (0.5689, 0.5711). We are 99.9% confident that the percentage of all registered voters who intend to vote for the Republican candidate is between 56.89% and 57.11%.
3.4.46 a. Less overlap, suggesting some ability to distinguish a .333 hitter from a .250 hitter in a sample of 100 at-bats b. At least 32 hits to reject null hypothesis c. 63% power d. An OK chance. In 100 at-bats the player may get at least 34 hits, which is how many are needed to provide convincing evidence of an improvement. It’s less likely to get 31 or fewer, which is plausible if the batting average hasn’t improved (is still 0.250). 3.4.47
b. The sample size is so large, 2.4 million.
a. Little overlap, suggesting good ability to distinguish a .400 hitter from a .250 hitter in a sample of 100 at-bats
c. 0.365 is not contained in the confidence interval and the interval doesn’t come close to it.
b. At least 32 hits to reject null hypothesis c. 98% power
d. It can be blamed on a nonrepresentative sample because the sampling method was biased.
d. A good chance. In 100 at-bats the player will get at least 32 hits, which is how many are needed to provide convincing evidence of an improvement. It’s not very likely to get 30 or fewer, which is plausible if the batting average hasn’t improved (is still 0.250).
In particular, the sampling frame only included those who had cars or phones. These people would be more affluent (especially during the Great Depression) and traditionally more affluent voters are Republicans.
3.4.48 True 3.4.49 B.
e. He did better than the Literary Digest because he used a random sample and so his method wasn’t biased.
End of Chapter 3 Exercises
3.4.44
a. 0.115 to 0.229
a. Lots of overlap, suggesting it’s difficult to distinguish a .333 hitter from a .250 hitter in a sample of 20 at-bats b. At least 9 hits to reject null hypothesis c. Approximately 0.18 d. No, in 20 at-bats the player likely won’t get at least 9 hits, which is how many are needed to provide convincing evidence of an improvement. It’s more likely to get 8 or fewer, which is plausible if the batting average hasn’t improved (is still 0.250). e. Could increase sample size (> 20) or increase significance level (> 0.05)
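The rejection region and power in Solution 3.4.44 can be approximated with exact binomial probabilities. The sketch below swaps in an exact calculation for whatever simulation produced the printed answers, so small differences from "approximately 0.18" are expected. The function names are hypothetical, and the 0.05 significance level is taken from part (e).

```python
from math import comb

def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def critical_hits(n: int, p_null: float, alpha: float = 0.05):
    """Smallest hit count whose upper-tail probability under the null is at most alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p_null) <= alpha:
            return k
    return None  # no rejection region exists at this alpha

n_at_bats = 20
k = critical_hits(n_at_bats, 0.25)      # null: still a .250 hitter
power = upper_tail(n_at_bats, k, 0.333)  # alternative: a .333 hitter
print(f"reject with at least {k} hits; power is about {power:.2f}")
```

Low power here reflects the heavy overlap described in part (a): with only 20 at-bats, a .333 hitter usually lands below the cutoff.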
3.CE.1 b. We can be 95% confident that between 11.5% and 22.9% of all water specimens from aircraft carrying domestic and international passengers will test positive for the presence of bacteria, thus failing to meet federal safety standards. 3.CE.2 a. 0.457 to 0.583 b. These data do not provide convincing evidence that a majority of 16- to 17-year-old cell phone users talk on the phone while driving, as part of the interval is below 0.50. c. 0.417 to 0.543
3.4.45
d. You would need about 400 people.
a. Less overlap, suggesting some ability to distinguish a .333 hitter from a .250 hitter in a sample of 100 at-bats
3.CE.4
b. At least 34 hits to reject null hypothesis c. 48% power
d. Not a very good chance. In 100 at-bats the player may get at least 34 hits, which is how many are needed to provide convincing evidence
3.CE.3 3.CE.5 a. 100: 0.1, 400: 0.05, 500: 0.045, 1000: 0.032, 2000: 0.022, 8000: 0.011, 9000: 0.011 b. The margin of error decreases as the sample size increases. (See graph for Solution 3.CE.5b.)
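The margins listed in Solution 3.CE.5a are consistent with the shortcut margin of error ≈ 1/√n. That shortcut is inferred from the printed numbers and is not stated in the solution, so treat the sketch below as an assumption-laden check; `approx_margin` is a hypothetical name.

```python
import math

def approx_margin(n: int) -> float:
    """Rough 95% margin of error for a proportion, using the 1/sqrt(n) shortcut (assumed)."""
    return 1 / math.sqrt(n)

for n in (100, 400, 500, 1000, 2000, 8000, 9000):
    print(f"n = {n}: margin of error is about {approx_margin(n):.3f}")
```

Going from 100 to 400 halves the margin, while going from 8,000 to 9,000 changes nothing at three decimal places, which matches the diminishing returns the solution describes.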
[Graph, Solution 3.CE.5b: margin of error versus n (thousands)]
c. To cut the margin of error in half, the sample size has to increase by a factor of 4.
b. Decrease
d. Increasing the sample size from 100 to 500 has a much bigger impact. This increase in sample size reduced the margin of error by more than half. The increase from 8,000 to 9,000 didn’t even change it when rounded to three decimal places.
a. American adult Internet users
3.CE.6 a. It is unlikely that our sample proportion will be the same as the population proportion, but if the sample was taken randomly, we can be fairly sure that it is close and that closeness can be represented through an interval estimate. b. Country B will have the widest interval because it had the smallest sample size. Country C will have the narrowest because it had the largest sample size. c. The interval for country B will contain 0.20. d. The p-value for this test for country B will be large (above 0.05) because the interval contained 0.20. The p-value for this test on the other two countries will be small (less than 0.05) because their intervals did not contain 0.20 and so 0.20 is not plausible. 3.CE.7 a. A: z = 8.16, B: –1.44, C: 3.95 b. Researcher A has strongest evidence. Researcher B has least evidence.
3.CE.15 b. Proportion of American adult Internet users that use social networking sites c. (0.6998, 0.7402) d. We are 95% confident that the proportion of American adult Internet users that use social networking sites is between 0.6998 and 0.7402. e. Yes, because 75% is not in the interval 3.CE.16 a. for 18- to 29-year-olds (0.8591, 0.9209), for 30- to 49-year-olds (0.7451, 0.8149), for 50- to 64-year-olds (0.5592, 0.6408), for 65 + (0.3786, 0.4814) b. These margins of error are larger because the sample size in these subgroups is smaller than the overall sample size. 3.CE.17 a. Statistics, because they are based on samples b. (0.2863, 0.3057). We are 95% confident that the proportion of Americans who reported that the state of the U.S. economy would affect their Halloween spending is between 28.63% and 30.57%. c. The sample standard deviation
3.CE.8
3.CE.18
a. 0.2565 to 0.3305
a. (55.67, 56.95). We are 95% sure that the average amount Americans will spend on Halloween is between $55.67 and $56.95.
b. Values between 0.2565 and 0.3305 should not be rejected at the 0.01 significance level because they are plausible values for the population proportion. 3.CE.9 a. Null hypothesis: One-third of all American households include a pet cat. Alternative hypothesis: A proportion different than one-third of all American households include a pet cat z = –4.29 and p-value < 0.0001. We can conclude that the proportion of American households that include a cat is not 1/3. Because our sample proportion is less than 1/3, we can also conclude that the population proportion is also less than 1/3. b. The p-value turned out to be so small because the sample size is huge.
b. (55.25, 57.37). The interval is wider. c. The distribution of expected Halloween spending amounts should not be strongly skewed. 3.CE.19 a. Each game is an observational unit; variables are goals per game and margin of victory, both quantitative. b. Get a list of all games from the current NHL season and randomly choose 44 to be part of the sample. c. The graph is fairly symmetric.
c. 0.318 to 0.330 d. Yes, one third (or about 0.3333) is not included in the interval. e. The interval is very close to 1/3. 3.CE.10 This is not an appropriate use for a confidence interval. The instructor is not selecting a random sample but has information about the entire population. Therefore, he can give an exact value for the proportion and does not need to give an interval estimate. 3.CE.11 99.99% confidence intervals will often be extremely wide and uninformative. 3.CE.12 The difference in width between an interval based on a sample of 1,000 Americans vs. 1,000,000 Americans may not be that practically different, but the sample of 1,000,000 Americans will cost substantially more to obtain. 3.CE.13 70% is not a very high confidence level and researchers often want to be more confident in their conclusion than just 70% confident. 3.CE.14 a. Increase
[Dotplot of total goals per game (n = 44)]
d. Yes, there are at least 20 observations and the data are fairly symmetric. e. Mean (x̄) is 6.114 and standard deviation (s) is 1.728 f. (5.59, 6.64) g. We are 95% confident that the true average number of goals scored in an NHL game is between 5.59 and 6.64 goals per game.
3.CE.20
3.CE.25
c. Distribution is somewhat right skewed
a. The random sample, because the 4.5% of women who returned the questionnaire are likely quite different from the 95.5% of women who didn’t. b. (0.3938, 0.4862) c. We are 95% confident that the percentage of all American women who believe that they give more emotional support than they receive is between 39.38% and 48.62%. d. The Hite survey would have a smaller margin of error because the sample size is larger.
Chapter 3 Investigation
[Dotplot of margin of victory]
d. OK because even though our distribution is skewed to the right, the sample size is larger than 20 e. Mean (x̄) is 2.159 and standard deviation (s) is 1.328 f. (1.76, 2.56) g. We are 95% confident that the true average margin of victory in an NHL game is between 1.76 and 2.56 goals. 3.CE.21 a. The distribution of difference in ages is fairly symmetric and centered slightly above zero, indicating a typical difference of approximately 1 to 2 years in male and female ages (males slightly older), though the overall distribution ranged from –7 to 18, with a slight tail to the right indicating a handful of marriages where the male is more than 10 years older than the female. b. Yes, data are not strongly skewed and sample size is more than 20 c. (0.5944, 3.2456) d. We are 99% confident that the true mean difference in male and female ages of soon to be married couples in Cumberland County is between 0.59 and 3.25 years. e. Yes, because the entire interval is positive, we have convincing evidence that husbands are older, on average, than their wives. f. Soon to be married couples in Cumberland County 3.CE.22 a. 27 out of 100 (0.27) b. Yes, because there are at least 10 couples where the wife is older than the husband (27) and at least 10 couples where the husband is at least as old as the wife (73). c. (0.197, 0.343); We are 90% confident that the population proportion of married couples for whom the wife is older than the husband is between 0.197 and 0.343. d. Yes, 50% is not in the interval, and the entire interval is below 50%. 3.CE.23 a. Invalid b. Invalid c. Invalid d. Valid 3.CE.24 No. Because you have the entire population (census), you do not need to do a confidence interval for the parameter of interest. You can find the parameter value directly.
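The interval in Solution 3.CE.22c, along with the validity check in part (b), can be sketched as a theory-based one-proportion z-interval. The function name is hypothetical, and using the standard normal quantile for the multiplier is an assumption about how the printed interval was produced.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes: int, n: int, confidence: float):
    """Theory-based interval for a proportion; requires at least 10 successes and 10 failures."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("validity condition not met: need at least 10 successes and 10 failures")
    p_hat = successes / n
    # Two-sided multiplier, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Solution 3.CE.22: 27 of 100 couples have the wife older than the husband.
low, high = one_proportion_z_interval(27, 100, 0.90)
print(f"({low:.3f}, {high:.3f})")
```

Raising an error when the condition fails mirrors the reasoning in Solution 3.2.21c, where only 9 failures made the theory-based approach questionable.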
1. The majority of drivers aged 16 to 17 have talked on a cell phone while driving. 2. 242 cell phone users ages 16 to 17 years old 3. Talked on a cell phone while driving (yes/no) 4. The proportion of all 16- to 17-year-old drivers who have talked on a cell phone while driving 5. Null: The proportion of all 16- to 17-year-old drivers who have talked on a cell phone while driving is 0.50. Alternative: The proportion of all 16- to 17-year-old drivers who have talked on a cell phone while driving is larger than 0.50. 6. 0.52 × 242 = 126 7. 0.52 8. Yes 9. 0.50 10. Yes, all sample outcomes are possible. Some are more likely than others. 11. See graph for Investigation 11 12. The center is 0.500. This makes sense because that is the null hypothesis probability of success. 13. Values between 0.43 and 0.57 seem typical. Atypical values are less than 0.43 and greater than 0.57. 14. The actual study result (0.52) is a “typical” value when simulating from a null hypothesis proportion of 0.50. Thus, it’s not convincing evidence that the true proportion is larger than 0.50, because if it is 0.50 you often will see a value like 0.52 in your sample. 15. The approximate p-value is 0.259. This is the probability of obtaining 0.52 or larger when the long-run probability of talking on a cell phone while driving is 0.50. 16. We do not have evidence that the proportion of all 16- to 17-year-old drivers who have talked on cell phones while driving is greater than 0.50; 0.50 is a plausible value for the proportion of all 16- to 17-year-old drivers who have talked on cell phones while driving. This is because 0.52 is a typical value obtained when simulating from a population where the proportion of 16- to 17-year-old drivers who have talked on the cell phone while driving is 0.50. 17. No, other values are also plausible. 18. (0.46 to 0.58); using a two-sided test p-value of 0.05 or higher for plausible values 19. a. 2SD = 2 × 0.031 = 0.062.
So, the interval is 0.52 ± 0.062, or (0.458, 0.582). b. They are similar—this makes sense because they are two different approaches to yield the same thing (95% CI).
[Applet output, Investigation 11: probability of heads = 0.5; number of tosses = 242; number of repetitions = 1000. Null distribution of the proportion of heads: mean 0.500, SD 0.031. Proportion of repetitions with a proportion of heads as extreme as 0.52: 259/1000 = 0.2590.]
c. Do 1 minus the values of the confidence interval in 19(a). This yields (0.418, 0.542).
8. Each sample was noted as to whether a drug was present or not. This is a categorical variable.
20.
9. Each sample measured the ng/mL of the drug present in the sample. This is a quantitative variable.
a. (0.457, 0.583) b. Similar. Because the sample size is large, the theory-based interval is a good prediction of the simulation-based interval. 21. Confidence interval gets smaller (0.489, 0.551) 22. (0.44, 0.60); interval gets wider 23. Assuming that the 242 16- to 17-year-olds in the sample are a good representation of all 16- to 17-year-olds in the United States, the conclusion may generalize to all 16- to 17-year-olds. Will likely not generalize well to all adults, because there are only 16- to 17-year-olds in the sample. 24. 52% of a sample of 242 16- to 17-year-olds report having ever talked on a cell phone while driving. This is not evidence that the majority of all 16- to 17-year-olds who have ever talked on a cell phone while driving is more than 50%. We’re 95% sure that the true proportion of all 16- to 17-year-olds who have ever talked on a cell phone while driving is between 0.46 and 0.58. Further studies are needed to investigate how frequently cell phone use occurs with this age group and if this result generalizes to other age groups.
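The coin-tossing simulation behind items 11 through 15 of the Investigation can be imitated without the applet. The sketch below is an assumption-based stand-in: the seed and the function name `simulate_null_counts` are hypothetical, and because the results are random, the approximate p-value and SD will be near, but not identical to, the applet's 0.259 and 0.031.

```python
import random
import statistics

def simulate_null_counts(n_tosses: int, n_reps: int, prob_heads: float = 0.5, seed: int = 1):
    """Number of heads in each of n_reps simulated samples of n_tosses coin tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < prob_heads for _ in range(n_tosses))
        for _ in range(n_reps)
    ]

n_tosses = 242
observed_count = 126  # 0.52 × 242, rounded (item 6)
counts = simulate_null_counts(n_tosses, n_reps=1000)

# Approximate p-value: share of repetitions at least as extreme as the observed result.
p_value = sum(c >= observed_count for c in counts) / len(counts)
# SD of the simulated proportions, used for the 2SD interval in item 19a.
sd = statistics.stdev(c / n_tosses for c in counts)
print(f"approximate p-value: {p_value:.3f}, SD of null distribution: {sd:.3f}")
```

Counting heads rather than comparing proportions avoids floating-point ties at exactly 0.52 when deciding which repetitions are "as extreme as" the observed result.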
Chapter 3 Research Article 1. What proportions of climbers on Mont Blanc use drugs and what types of drugs are these? Specifically, is there a difference in the usage of drugs to prevent altitude illness compared to those used to enhance physical or psychological capacities? 2. The practice of taking multiple drugs may be unsafe in a remote alpine environment. There is not good data on drug use of climbers for Mont Blanc. 3. 430 4. Selection bias (high proportions of drug users who want to hide their drug use and not be included in the study) 5. Automatic urine collection systems were installed in urinals in two locations (Gouter and Cosmiques huts) with the capacity to collect 24 samples a day. These systems were not visible to the human eye, but signs were posted on the hut doors in both French and English that urine samples were being randomly collected. 6. The urine collection system couldn’t be adapted to fit toilets where water was systematically mixed with the urine. 7. Between July and September of 2013 over 14 consecutive days in the Gouter hut and 21 consecutive days in the Cosmiques hut. A maximum of 24 samples per day could be collected.
10. 35.8% of the samples were positive, 48.8% of the samples were negative, and 15.3% of the samples were categorized as contaminated. 11. The main substances detected were the diuretic Acetazolamide and the hypnotic Zolpidem. 12. (16.5, 24.7) 13. Male climbers of Mont Blanc 14. The population proportion of male climbers of Mont Blanc who have used the diuretic Acetazolamide 15. We are 95% confident that the population percentage of male climbers of Mont Blanc who have used the diuretic Acetazolamide is between 16.5% and 24.7%. 16. No, because 25% is included in the interval 17. 9 ± 2(23) = (–37, 55) 18. Male climbers of Mont Blanc 19. The average concentration of the hypnotic Zolpidem in the population of male climbers of Mont Blanc 20. We are 95% confident that the mean concentration of the hypnotic Zolpidem in the population of male climbers of Mont Blanc is between –37 ng/mL and 55 ng/mL. 21. No. Because zero is in the confidence interval it is plausible that the mean concentration of the hypnotic Zolpidem in the population is zero. 22. Random sampling was not used in this study. While signs on the hut doors said that samples were being randomly collected, the first 24 to urinate in the urinal with the collection system were included in the sample. This means that it will be difficult to generalize findings. 23. Answers will vary. 24. The percentage of climbers of Mount Kilimanjaro who tested positive for the diuretic Acetazolamide was 33%. This is above the 95% confidence interval for the percentage of climbers that showed the diuretic Acetazolamide in the sample from Mont Blanc. 25. Limited to males, anonymous collection so no demographics of climber for each sample, elimination of all possibly contaminated samples from the analysis, repeated use of urinal by the same person (double samples), climbers who didn’t use the urinals fitted with the collection system didn’t have a chance to be included in the study.
CHAPTER 4
Causation: Can We Say What Caused the Effect? Section 4.1
4.1.9
4.1.1 B.
a. C.
4.1.2 D.
b. B.
4.1.3 E.
c. Quantitative
4.1.4 B.
d. D.
4.1.5
e. Quantitative
a. B.
4.1.10
b. A.
a. B. b. C.
c. A.
c. Yes, the proportions are very different.
d. Categorical
4.1.11 John needs to add that children of smokers may also be more likely to use candy cigarettes.
e. Quantitative 4.1.6 B. 4.1.7 In short, students who pull all-nighters are more likely to make other poor lifestyle choices (e.g., poor diet, lack of exercise, etc.) and those lifestyle choices may have a negative impact on GPA. (The diagram is shown in Solution 4.1.7.) 4.1.8 A confounding variable may explain this relationship: wealth of the country (e.g., GDP). In particular, wealthier countries have better health care and longer life expectancy and also can afford to buy more televisions.
4.1.12 a. 2,622 adults in the survey b. Explanatory: generation (categorical); Response: whether someone believes marriage is obsolete (categorical) c. Yes, the proportions are different between the groups, suggesting the different generations tend to view marriage differently. 4.1.13 Snow shoveling is a potential confounding variable. (The diagram is shown in Solution 4.1.13.)
Solution 4.1.7 (diagram): Student → Has pulled an all-nighter (other poor lifestyle choices) / Hasn’t pulled an all-nighter (other good lifestyle choices) → GPA
Solution 4.1.13 (diagram): People → Dec/Jan (more snow shoveling) / Not Dec/Jan (less snow shoveling) → Heart attack risk
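Solution 4.1.14 (confounding diagram) Observational units: People. Groups: Mediterranean Diet (Ethnicity: e.g., Italian) vs. No Mediterranean Diet (Ethnicity: e.g., Not Italian). Response: Memory; cognition
Solution 4.1.15 (confounding diagram) Observational units: Women. Groups: Hormone replacement therapy (Higher income) vs. No hormone replacement therapy (Lower income). Response: CHD risk
Solution 4.1.16 (confounding diagram) Observational units: People. Groups: Eat breakfast (Get enough sleep) vs. Don’t eat breakfast (Don’t get enough sleep). Response: Maintain weight loss
Solution 4.1.17 (confounding diagram) Observational units: Teens. Groups: Dinner as family (Good family life) vs. No dinner as family (Poor family life). Response: Drug use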
4.1.14 a. Exp: Mediterranean diet (Y/N); Resp: memory and cognitive skills b. Ethnicity (The diagram is shown in Solution 4.1.14.)
4.1.15 a. Exp: hormone replacement therapy; Resp: coronary heart disease b. Income (The diagram is shown in Solution 4.1.15.)
4.1.16 a. Explanatory: whether or not eat breakfast; Response: whether able to maintain weight loss b. Whether or not getting enough sleep (The diagram is shown in Solution 4.1.16.)
4.1.17 a. Teenagers b. Exp: whether eat dinner as a family; Resp: whether use drugs c. Quality of family life (The diagram is shown in Solution 4.1.17.)
4.1.18 a. Pregnant women b. Whether or not smoke, categorical c. Weight of baby at birth, quantitative d. People of lower socioeconomic status are more likely to smoke and also more likely to have a poorer diet (and other factors) that may lead to lower-weight babies.
4.1.19 a. Exp: whether or not spanked as a child, categorical b. Resp: IQ, quantitative c. Socioeconomic status
4.1.20 Because 46% of people who are overweight have overweight friends, compared to only 30% of people who are not overweight, there is an association; the proportions are different.
4.1.21 a. Whether or not have children, categorical b. Age at death (years), quantitative c. Yes, men with children tended to live longer as there is a higher proportion of men over age 60 in this group. d. Whether or not the man is married
4.1.22 a. Income level, categorical b. Happiness level, categorical c. The proportions are different between the groups, showing evidence of association. d. Can’t conclude cause and effect. A potential confounding variable is education level.
4.1.23 a. Whether male or female, categorical b. Whether want to stay at same weight or change weight, categorical c. Yes, there is evidence of an association because the proportions are different.
4.1.24 a. Exp: political party (categorical); Resp: whether or not believe that humans evolved (categorical) b. Yes, the proportions are different
4.1.25 a. Education level, categorical b. Whether or not know that census is required, categorical c. Yes, the percentages are different in the two groups.
4.1.26 a. Whether or not attend religious service weekly, categorical b. Blood pressure, quantitative c. Overall health/diet could be associated with church attendance and blood pressure.
4.1.27 a. Whether or not exercise regularly, categorical b. Whether or not had a cold during last week, categorical c. Other better health behaviors (e.g., handwashing)
4.1.28 a. Political party, categorical b. Whether or not think that states should ignore federal gun laws, categorical c. Yes, the percentages are different between the different political parties.
4.1.29 a. Whether or not eat red meat regularly, categorical b. Whether or not died from heart disease, categorical c. Alcohol consumption
4.1.30 a. Countries b. Exp: chocolate consumption; Resp: number of Nobel Prizes c. Wealth of nation
4.1.31 a. Exp: ethnicity, categorical; Resp: whether or not have breast cancer (yes/no), categorical b. The proportions are different, suggesting an association. c. Diet
4.1.32 To make sure that the effect of age was not confounded with the effect of species
4.1.33 a. The 347 students b. Explanatory: time of day, categorical; response: whether or not using phone, categorical c. Afternoon: 48/232 = 0.207, evening: 45/115 = 0.391 d. Yes, a higher proportion of students in the evening were observed using their phones.
4.1.34 a. The 884 moviegoers b. Explanatory: movie type (2D or 3D), categorical; response: whether or not experienced nausea, categorical c. Yes, the proportion who reported experiencing nausea among 3D movie viewers compared to 2D movie viewers was higher (0.461 > 0.215). d. Answers will vary, but could include genre (cartoon, mystery, etc.) and time of viewing (late evening, early evening, etc.).
Section 4.2
4.2.1 A.
4.2.2 C.
4.2.3 A.
4.2.4 B.
4.2.5 A.
4.2.6 E.
4.2.7 B.
4.2.8 C.
4.2.9 Random sampling
4.2.10 Random sampling
4.2.11 Random assignment
4.2.12 Random sampling
4.2.13 Random assignment
4.2.14 “Tend to” means that there is an inclination/leaning toward a particular outcome; importantly, the phrase does not suggest cause and effect.
4.2.15 Yes, a study can have both random sampling and random assignment. In this case researchers can generalize to the population and infer cause and effect.
4.2.16 Because researchers often don’t know what the potential confounding variables are or the variables can be difficult to measure
4.2.17 No, not always. On average random assignment balances variables between two groups, so there is a tendency for balance, but
confounding variables won’t be exactly balanced every time due to the randomness in the assignment.
4.2.18 a. No cause and effect b. Yes, cause and effect; listening to music improves scores on math test c. No cause and effect
4.2.19 a. Yes, cause and effect; type of video changed quiz score b. No cause and effect c. No cause and effect
4.2.20 Random sampling is how you obtain observational units from the population of interest, impacting the ability to generalize to that population. Random assignment is how observational units are assigned values of the explanatory variable, impacting the ability to draw cause-and-effect conclusions.
4.2.21 a. Observational studies b. No c. Babies
4.2.22 To conduct a randomized experiment you would have to randomly assign some mothers to smoke and others not to. This would be unethical, especially if we feared potential negative health outcomes for the baby.
4.2.23 a. Observational b. No c. Babies d. Diet, health of parents e. Sex of baby
4.2.24 a. Observational b. No c. Children d. Whether or not classical music was played when in womb or when infant e. Intelligence test score
4.2.25 Randomly assign 30 students to take notes by hand (with pen and paper) and 30 to take notes on their computer (by typing) while watching the video. After the video, give them the 10-question quiz, grade their quizzes, and compile the scores.
4.2.26 There is no control group to compare the effects of the new drug to.
4.2.27 Random assignment of subjects to either new drug or placebo
4.2.28 Double blinding. Evaluators of headache severity should not know whether the subject was on drug or placebo.
4.2.29 The missing component was random assignment of order in which flavors were administered to each subject. The order in which the subjects taste the candy could have an effect on their ability to identify which flavor is which. For example, it could be easy to tell what the first one is but hard for the second.
4.2.30 a. No, association does not imply causation. There could be possible confounding variables that have not been accounted for. b. Yes, because subjects were randomly assigned to which name they saw for the exact same map. Any difference in responses can reasonably be attributed to the difference in name.
4.2.31 a. Experiment, because researchers controlled who received which pill b. In a double-blind study, participants as well as evaluators are unaware of who received which treatment. This tries to eliminate personal bias and psychological effects from changing the response. c. A placebo-controlled study uses a control group that receives a placebo, to eliminate psychological bias. That way all subjects receive a treatment, either active or inactive. d. We should randomly assign about a quarter of the 22,071 subjects to a twice-daily low-dose aspirin or beta carotene, or a placebo version of one of these.
4.2.32 a. Experiment, because researchers controlled who received which pill b. In a double-blind study, participants as well as evaluators are unaware of who received which treatment. This tries to eliminate personal bias and psychological effects from changing the response. c. A placebo-controlled study uses a control group that receives a placebo, to eliminate psychological bias. That way all subjects receive a treatment, either active or inactive. d. We should randomly assign about an eighth of the 14,641 to a twice-daily beta-carotene, twice-daily vitamin E, daily vitamin C, daily multivitamin, or a placebo version of one of these.
4.2.33 a. Experiment, some received incentive and some didn’t b. Households c. Whether or not received an incentive (categorical; explanatory); whether or not responded to survey (categorical; response) d. Yes, can draw conclusions to population e. Yes, incentive is what is causing the increase in participation rate. f. Yes, because of the random assignment cause-and-effect conclusions are possible.
4.2.34
a. This is an experiment because of the random assignment to receive a friend request from either a male or female student. b. Students who received friend requests on Facebook c. Sex of person giving friend request (categorical; explanatory); whether or not friend request was accepted (categorical, response) d. The study did not use random sampling, so the sample might not be representative of all students at the college, making generalizing to that larger group difficult. e. Yes, individuals were randomly assigned to receive the friend request from a male or female. The advantage of having done the random assignment is that we may be able to conclude causation. f. Yes, because of the random assignment
4.2.35 a. Type of picture (categorical) b. Number of bubbles popped (quantitative) c. Randomly assigned people to which picture they saw
4.2.36 a. This is an experiment because researchers determined who would be in which pose. b. Participants (42 of them) c. Type of pose to be held (categorical; explanatory); whether or not took bet (categorical; response) d. No random sampling—participants were volunteers; yes, random assignment—researchers randomly assigned who would be in which pose e. Yes, because the random assignment suggests cause and effect is possible
4.2.37 a. This is an observational study because there was no random assignment of the explanatory variable (stuffy nose or healthy). b. Students c. Whether student was healthy or had a stuffy or runny nose (categorical; explanatory). Whether the student picked the correct flavor (categorical; response). d. Neither. Participants were recruited at the cafeteria and were not randomly assigned to a group. e. No, because no random assignment was used.
4.2.38 a. The purpose is to show that the two treatment groups were alike on aspects such as age, proportion of males and females, how long subjects had had the disease for, and so on, thus eliminating these as being possible confounding variables. b. We would like the p-values to be large so that there is no evidence of differences between the two groups on any other variables except on the explanatory variable of interest.
4.2.39 a. Randomly assign 6 of the 12 bananas to a group where the stems are wrapped and the other 6 to a group where the stems are not wrapped. b. Randomly assign 3 of the 6 organic bananas to a group where the stems are wrapped and the other 3 to a group where the stems are not wrapped. Do the same for the 6 nonorganic bananas. c. If organic and nonorganic bananas ripen at different rates, it is important that there is the same number of organic (and nonorganic) in the two groups (wrapped and unwrapped). If one of the groups has more nonorganic bananas than the other, you don’t know if the difference in ripening is due to the type of banana or the wrapping treatment. With such small sample sizes here, the groups could be very different in terms of the proportion of nonorganic bananas if blocking were not used.
4.2.40 a. Block on the location of the disease. b. Randomly assign 60 of those with the disease in just their small intestine, 40 of those with the disease in just their colon, and 50 with the disease in both areas to the treatment (drug) group and the others to the control (placebo) group. c. A randomized block design guarantees that half of each disease group are in each treatment group. A completely randomized design may not do this exactly.
4.2.41 a. The SDs for both are about 0.20, so they are quite similar. Therefore, it does not appear that a person’s sex is associated with the balance gene. b. The SDs for both are about 4.1, so they are quite similar. Therefore, it does not appear that a person’s sex is associated with the factor x.
4.2.42 a. It is possible to design observational studies that protect against confounding using more advanced statistical methods. These are not always easy to do and often many different types of studies need to be done to give strong evidence of causation. b. Answers will vary. For example, if a group of healthy volunteers are recruited to be in a study to determine the safety and effectiveness of a vaccine, we can probably generalize the results to a population that has similar characteristics (age, sex, race, etc.) to the subjects in the study.
End of Chapter 4 Exercises
4.CE.1 A.
4.CE.2 a. No b. No
4.CE.3 a. Type of visit b. Heart rate, blood pressure, anxiety levels, etc. c. Randomized experiment because patients were randomly assigned to one of the three types of experimental groups d. To potentially draw a cause-and-effect conclusion e. No, they were hopefully controlled for by the random assignment, eliminating them as potential confounding variables.
4.CE.4 a. Experiment b. Type of footwear (explanatory; categorical); confidence at walking down hill (response; categorical) c. Random assignment
4.CE.5 a. Explanatory: price of pill (categorical); Response: whether or not experienced a reduction in pain, yes or no (categorical) b. Randomized means subjects were randomly assigned which price group they were in. c. Double-blind means neither the participant nor the researchers interacting with the participants knew which group the participants were in. d. The results are statistically significant, meaning that there is evidence of a potential cause-and-effect relationship between the price of the pill and perceived reduction in symptoms. Because there is not random sampling, this result should be generalized with caution, and only to individuals with characteristics similar to those of the volunteers.
4.CE.6 a. Exp: whether money was called rebate or bonus (categorical); Res: how much person spent (quantitative) b. Experiment, because students were randomly assigned to one of two groups, indicating what the $ was for (rebate or bonus) c. Random assignment, but not random sampling d. Yes, because random assignment was used e. No, because the sample was students from a single class at Harvard
4.CE.7 a. Exp: whether male or female; Resp: whether or not satisfied with attractiveness b. Observational study. There is no random assignment of the explanatory variable (a person’s sex). c. Random sampling, to generalize to all U.S. adults, but not random assignment because you can’t randomly assign people to be male or female
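The blocked assignment described in 4.2.39 b (shuffle within each banana type, then split each type evenly between the two treatments) can be sketched as follows. The banana labels, the function name, and the seed are hypothetical choices made for illustration; only the 6 organic / 6 nonorganic split and the wrapped / unwrapped treatments come from the answer.

```python
import random


def blocked_assignment(blocks, treatments=("wrapped", "unwrapped"), seed=None):
    """Within each block, shuffle the units and split them evenly between treatments."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for unit in shuffled[:half]:
            assignment[unit] = treatments[0]
        for unit in shuffled[half:]:
            assignment[unit] = treatments[1]
    return assignment


# Hypothetical labels for the 12 bananas: 6 organic and 6 nonorganic.
bananas = {
    "organic": [f"organic_{i}" for i in range(1, 7)],
    "nonorganic": [f"nonorganic_{i}" for i in range(1, 7)],
}

groups = blocked_assignment(bananas, seed=1)
for banana, treatment in sorted(groups.items()):
    print(banana, treatment)
```

Because the shuffle happens separately inside each block, every run places exactly 3 organic and 3 nonorganic bananas in each treatment group, which is the guarantee that 4.2.39 c and 4.2.40 c attribute to blocking.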
4.CE.8 a. Observational study, because it’s unlikely researchers randomly assigned students to live on or off campus as part of the study b. No, no random assignment was used. c. No, no random assignment was used.
4.CE.9 a. Experiment because visitors to Google were assigned to see one ad or the other b. Explanatory: name of book on ad; Response: Did person click through? Yes or no c. Yes, assuming Google visitors were randomly assigned which ad to see
4.CE.10 a. No, this is not an experiment; it is an observational study. b. You would be forcing some people to spank their children but not others.
4.CE.11 a. Exp: whether teenager is from the U.S. or UK (categorical); Resp: number of Harry Potter books read (quantitative) b. Random sampling because you want to generalize to U.S. and UK teenage populations; random assignment makes less sense because you would have to randomly assign teenagers to live in the U.S. or UK.
4.CE.12 Over a month-long period, randomly assign each day to either classical music or not classical music and then keep track of price of average meals ordered at the restaurant each day.
4.CE.13 Randomly assign student essays to be graded by a professor with a blue or red pen and then keep track of the scores given on the essays.
4.CE.14 Randomized experiment is not really feasible because you would have to randomly assign some people to have children and others not.
4.CE.15 Randomized experiment is not really feasible because you would have to randomly assign some people to go to church regularly and others not.
Chapter 4 Investigation
1. Observational study. There is no random assignment of subjects (men) to groups (which bridge they crossed).
2. Each of the 34 men in the study
3. Expl: bridge type, Resp: whether the subject called the researcher
4. Fill in the 2-by-2 table with the data from the study.
                                   Suspension bridge   Wooden bridge   Totals
Subject called interviewer                  9                 2           11
Subject did not call interviewer            9                14           23
Totals                                     18                16           34
5. 9/18 = 0.50
6. 2/16 = 0.125
7. We have strong evidence that the long-run proportion of phone calls to the researcher from men in the study is statistically significantly larger in the group of men crossing the Capilano suspension bridge.
8. It would have approximately doubled to 0.046.
9. We have strong evidence that the long-run proportion of phone calls to the researcher from men in the study is statistically significantly different between the two groups.
10. Because the confidence interval does not include 0, 0 is not a plausible value for the true difference between the long-run proportions of the two conditions, meaning we have evidence of a difference between the long-run proportions of calling by men under the two conditions. This is the same conclusion we reached in #9.
11. Because this is not a random sample, there is no obvious population to which we can generalize our conclusion. Certainly not all people and not all men.
12. Because random assignment was not used, we cannot conclude that the type of bridge crossed is what made the difference in the proportions of men calling between the two groups.
13. They are trying to address the potential confounding variable that men who choose to cross the suspension bridge may be of a different personality (more outgoing?) than those who choose to cross the wooden bridge (more reserved?) and that it is this personality difference that explains the difference in the proportion of “call-backs” received by the researcher.
14. 13/20 = 0.65, 7/23 = 0.30
15. We have strong evidence that the long-run proportion of men calling the researcher for men who are on a suspension bridge is larger than that among men who have recently crossed a suspension bridge.
16. No, they still do not have a random sample.
17. No, they still have not randomly assigned subjects. They have addressed one potential confounding variable, but there still may be more.
Chapter 4 Research Article
1. The researchers were trying to understand the role that color plays in impacting people’s aggression and willingness to pay in consumer auction and purchase situations.
2. (Answers will vary; here are two examples.) Colors have a significant effect on emotions (Hemphill, 1996). Bellizi and Hite (1992) find that red induces more negative outcomes (decreases purchase incidence, increases purchase postponement, decreases browsing and search) relative to blue.
3. The bid jump is the difference between the new bid price and the old high bid.
4. The mean is larger than the median, suggesting that the data are right skewed.
5. … strong evidence …
6. No, bid jumps were not randomly assigned to treatment groups. Thus, no cause-effect conclusions can be made from this study.
7. To control as many possible confounding variables as possible
8. Confounding variables
9. We have strong evidence that students bidding on Wii’s on the red background yielded higher bid jumps on average than students bidding on Wii’s on the blue background.
10. The 39 students in the red group should be similar to the 39 students in the blue group.
11. Cause and effect
12. They were undergraduate students from a class. This means that the conclusions about the impact of color on bidding behavior may not generalize beyond similar undergraduate students.
13. We have strong evidence that the red and blue groups bid different amounts.
14. They should be similar due to the random assignment of people to red-blue groups
15. Cause-effect conclusion on the effect of color on bid price is possible.
16. A convenience sample from the Internet, meaning that we may not be able to generalize the cause-effect conclusions about the impact of color on bid amount beyond similar individuals.
17. Answers will vary. (1) As is often the case in randomized experiments, the subjects were not randomly obtained. Follow-up studies should consider alternative ways of recruiting more diverse samples to ensure that the findings transfer to alternative groups of individuals. (2) Follow-up studies could expand the scope of experiments to consider alternative products and alternative purchase situations (e.g., store-front displays) as the current studies are limited in scope.
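For the Chapter 4 Investigation, the conditional proportions in items 5 and 6 and the strength-of-evidence statements in items 7 and 8 can be checked from the 2-by-2 table in item 4. The sketch below is an assumption-laden stand-in for whatever simulation produced the original p-value: instead of simulating shuffles, it counts exactly (via the hypergeometric distribution) how often a random re-labeling of the 34 men would put at least 9 of the 11 callers in the suspension-bridge group. The function name is illustrative.

```python
from math import comb


def shuffle_p_value(successes_a, n_a, successes_b, n_b):
    """One-sided exact p-value: chance that randomly re-assigning group labels
    puts at least `successes_a` of all successes into group A."""
    total_successes = successes_a + successes_b
    total = n_a + n_b
    max_in_a = min(total_successes, n_a)
    favorable = sum(
        comb(total_successes, k) * comb(total - total_successes, n_a - k)
        for k in range(successes_a, max_in_a + 1)
    )
    return favorable / comb(total, n_a)


# Counts from the 2-by-2 table in item 4.
p_suspension = 9 / 18  # item 5
p_wooden = 2 / 16      # item 6
one_sided = shuffle_p_value(9, 18, 2, 16)

print(round(p_suspension - p_wooden, 3))  # 0.375
print(round(one_sided, 3))                # 0.023
```

The one-sided value of about 0.023 is consistent with item 8, where the two-sided version is described as approximately double, 0.046; a simulation-based p-value would vary slightly around the exact figure.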
CHAPTER 5
Comparing Two Proportions
Section 5.1
5.1.1 B.
5.1.2 B.
5.1.3 C.
5.1.4 D.
5.1.5 C.
5.1.6 B.
5.1.7 a. Yes, there are more regular coffee drinkers than non–coffee drinkers in the sample because the bar for coffee drinkers is wider than the bar for non–coffee drinkers. b. Yes, there are more nappers than non-nappers in both the regular coffee group and the non–coffee group; the proportion of the bar that represents nappers is greater than 50%. c. There is a larger proportion of nappers among those who regularly drink coffee, because the graph shows roughly 60% nappers in the regular coffee group compared to roughly 55% nappers in the non–coffee group.
5.1.8 a. There were more observations in 2002, because the width of the bar for that year is wider than that for 2018. b. Approximately 70% of the occupants were wearing seat belts in 2002 and 90% were wearing seat belts in 2018. c. There appears to be an association between year and seat belt use, because the percentages wearing seat belts differ by quite a bit between the years 2002 and 2018.
5.1.9 a. Tables 1 and 4 have the same pair of conditional proportions, namely, 0.50 of A’s are Yes and 0.50 of B’s are Yes in both tables. b. The difference in conditional proportions is largest in Table 3, where 0.667 of A’s are Yes and 0.333 of B’s are Yes. c. The difference in conditional proportions is smallest in Tables 1, 2, and 4, where the difference equals zero because the proportion of A’s that are Yes equals the proportion of B’s that are Yes.
5.1.10 a. A: 0.10, B: 0.02, C: 0.60, D: 0.52 b. A vs. B: 0.08, C vs. D: 0.08 c. A vs. B: 5, C vs. D: 1.15 d. A vs. B e. Relative risk because the differences in proportions are the same, but the relative risks are different
5.1.11 See the following table.
Played sports in high school   Male   Female   Total
Yes                              83     117      200
No                               67      93      160
Total                           150     210      360
5.1.12 See the following table.
                                                  Member of Frat. or Sor.
Has one or more alcoholic drinks per week          Yes      No     Total
Yes                                                 17     135      152
No                                                  25     205      230
Total                                               42     340      382
5.1.13 a. The observational units are the college students. b. The response variable is whether or not the student is wearing clothing that displays the college name or logo on a particular day. Categorical. c. Yes, random sampling can be used. Obtain a numbered list of the students at each college, and use software to randomly select the number that you want for the sample size. d. No, random assignment cannot be used. You cannot randomly assign students to attend one college or the other. e. A 2 × 2 table suitable for summarizing the data is shown below.
                                 ESU   MSU   Total
Wearing school name/logo
Not wearing school name/logo
Total
5.1.14 a. The response variable is whether or not the claim was judged to be fully justified or an overpayment. The explanatory variable is the size of the claim: small or medium. b. The observational unit is the claim made to Medicare or Medicaid. c. To address the question “Does the chance that a claim is judged to be an overpayment depend on the size of the claim?” the relevant comparison is 14/30 versus 8/30.
5.1.15 a. More rattlesnakes were caught at Site G than at Site B, so looking at conditional proportions (rather than counts) takes this into account. b. The explanatory variable is site, so we calculate the conditional proportions of males/females at each site. c. The proportion of rattlesnakes that are female are 11/21 ≈ 0.524 at Site B and 12/33 ≈ 0.364 at Site G. Reasoning informally, these proportions appear to be different enough to provide evidence of an association between site and sex of rattlesnakes.
5.1.16 We need to know at least one of the following four numbers: number of male Republicans, number of female Republicans, number of male Democrats, or number of female Democrats.
5.1.17 a.
          Republicans   Democrats   Total
Female          8           17        25
Male           45           30        75
Total          53           47       100
b. Yes (53/100 > 47/100) c. No (17/47 < 30/47) d. Yes (17/25 > 8/25)
5.1.18 a. See graph in Solution 5.1.18. b. Yes, the bars show different distributions of sex of senator (higher percentage of females in Democratic party as opposed to Republican party).
5.1.19 Answers will vary. One possible solution is as follows and shown in the Solution 5.1.19 graph.
          Republicans   Democrats   Total
Female         10           10        20
Male           40           40        80
Total          50           50       100
There is no association between party and sex of senator in these data because the percentage of females is the same in both parties (20%). These data were made by finding the overall percentage of female senators (20%), assuming a 50/50 split of Republicans and Democrats, and then multiplying 50 times 20% to find the number of female senators in each party.
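The recipe in 5.1.19 (apply one overall proportion to every group so that the conditional proportions match) and the comparison in 5.1.17 can be sketched in a few lines. The function names and the dictionary layout are illustrative assumptions; the counts are the ones given in the two tables, with "yes" standing for female.

```python
def no_association_table(group_sizes, overall_proportion):
    """Build counts in which every group has the same proportion of 'yes' outcomes."""
    table = {}
    for group, size in group_sizes.items():
        yes = round(size * overall_proportion)
        table[group] = {"yes": yes, "no": size - yes}
    return table


def conditional_proportions(table):
    """Proportion of 'yes' outcomes within each group."""
    return {group: c["yes"] / (c["yes"] + c["no"]) for group, c in table.items()}


# 5.1.19: 50 senators per party and 20% female overall.
made_up = no_association_table({"Republicans": 50, "Democrats": 50}, 0.20)
print(made_up)
print(conditional_proportions(made_up))

# 5.1.17 a: counts where the conditional proportions differ between parties.
actual = {
    "Republicans": {"yes": 8, "no": 45},
    "Democrats": {"yes": 17, "no": 30},
}
print(conditional_proportions(actual))
```

Equal conditional proportions in the first table correspond to "no association" in these data, while the second table gives roughly 0.15 versus 0.36, the kind of difference 5.1.18 b describes.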
Solution 5.1.18 (segmented bar graph: percentage of male and female senators, Republican vs. Democrat)
Solution 5.1.19 (segmented bar graph: percentage of male and female senators, Republican vs. Democrat)
5.1.20 a. Observational study because researchers did not assign respondents to attend church in differing amounts b. Explanatory: level of church attendance; response: smoke (yes/no) c. See graph for Solution 5.1.20c. d. Because the percentages are different, this is evidence of an association between regularity of church attendance and smoking e. No, because this is an observational study.
Solution 5.1.20c (segmented bar graph: percentage who smoke and don’t smoke by level of church attendance: at least once per week, almost every week, about once a month, seldom, never)
5.1.21 a. 0.30/0.12 = 2.5. People who never attend church are 2.5 times as likely as those who attend church weekly to be smokers. b. Seldom vs. at least once: 2.08; about once a month vs. at least once: 1.83; almost every week vs. at least once: 1.17
5.1.22 a. There are different numbers of males and females in the sample (more males). b. Find the proportion of people who’ve broken up digitally for males and females c. 52/289 = 0.18, 55/364 = 0.15 d. Females are more likely to report having broken up digitally, suggesting a potential association between sex of respondent and likelihood to have broken up digitally.
5.1.23 people aged 18–29 compared to 47/363 = 0.129 people aged 30–64 have broken up with someone digitally, suggesting an association between age and having broken up with someone using digital means. (Segmented bar graph is shown in Solution 5.1.23.)
Solution 5.1.23 (segmented bar graph: percentage who have and have not broken up by digital means, age 18–29 vs. age 30–64)
5.1.24 a. Experiment b. Explanatory: receive AZT yes/no Response: baby born with HIV yes/no c.
                AZT   No AZT   Total
HIV positive     13      40       53
HIV negative    151     120      271
Total           164     160      324
d. 0.079 (AZT) vs. 0.250 (no AZT) e. 3.16
f. (Segmented bar graph: percentage HIV positive and HIV negative for the AZT and No AZT groups)
g. Due to the difference in conditional proportions there appears to be an association between taking AZT and a reduction in the chance that the child of an HIV-positive woman will be HIV positive.
5.1.25 a. See graph for Solution 5.1.25a. b. Type of pet owner (dog or cat), CPR or no CPR c. The number of dog owners and cat owners
Solution 5.1.25a (segmented bar graph: percentage who would and would not give CPR to pet, dog owners vs. cat owners)
5.1.26 a. See graph for Solution 5.1.26a. b. Sex of respondent (male/female), CPR or no CPR c. The number of men and women in the sample
Solution 5.1.26a (segmented bar graph: percentage who would and would not give CPR to pet, women vs. men)
5.1.27 a.
             Male   Female   Total
Coffee          5       6      11
No coffee       5      10      15
Total          10      16      26
b. Males: 0.50; females: 0.375 c. (Segmented bar graph: percentage coffee and no coffee, male vs. female) d. A mosaic plot also shows there were more females in the sample than males. (Mosaic plot: coffee yes/no by sex) e. Because the proportion of coffee drinkers among men is different from that among women, this is some evidence of an association between sex of respondent and coffee drinking.
5.1.28 a.
         Males   Females   Total
R            6         8     14
L            4         5      9
Total       10        13     23
b. Proportion of people choosing R: males 0.60, females 0.62 c. (Segmented bar graph: percentage choosing letter R and letter L, male vs. female) d. A mosaic plot also shows there were more females in the sample than males. (Mosaic plot: letter by sex) e. The proportions of men and women choosing R are quite similar, suggesting little association between sex of respondent and letter choice.
5.1.29 a.
         Coffee   No coffee   Total
R             9           5     14
L             2           7      9
Total        11          12     23
b. Proportion of people who chose R: coffee 0.82, no coffee 0.42 c. (Segmented bar graph: percentage choosing letter R and letter L, coffee vs. no coffee) d. A mosaic plot also shows there were slightly fewer coffee drinkers than not. (Mosaic plot: letter by coffee) e. There is an association between letter choice and coffee drinking because the proportions are quite different.
5.1.30 a. ~52% b. ~35% c. No d. Yes, the proportions are not similar. e. You would need to know the proportion of conservatives and liberals that took the survey.
5.1.31 a. Also need one of the cell counts, for example, how many preterm and Caesarean
Solutions to Problems b. See the following table.
55
5.2.2 D.
Preterm
Term
Total
Caesarean
19
22
41
Vaginal
2
7
9
Total
21
29
50
c. Preterm: 0.905, term: 0.759 d. 1.19 e. There is some evidence (although difficult to comment on strength of evidence without a p-value) that the chance of a caesarean birth is 1.19 times for a preterm birth compared to for a term birth. 5.1.32
5.2.3 D. 5.2.4 A. 5.2.5 A. 5.2.6 a. False; as sample size increases, everything else remaining the same, the standard deviation of the statistic decreases because statistics from larger samples vary less than statistics from smaller samples. b. True 5.2.7 a. B.
a. Cryotherapy: 0.48; duct tape: 0.462
b. A.
b. Back of heel: 0.12, 0.12; plantar: 0.28, 0.31; palmar: 0.08, 0.08; other: 0.04, 0.04
5.2.8 a. A.
c. To eliminate location of wart as a possible source of variation in response, and thus rule it out as a confounding variable
b. B.
5.1.33
a. C.
a. A: 0.8; B: 0.9. Hospital B saved more patients overall. b. 590 + 210 = 800, 870 + 30 = 900, 190 + 10 = 200, and 70 + 30 = 100. c. Hospital A, fair: 590/600 = 0.983; Hospital B, fair: 870/900 = 0.967. Hospital A is better. d. Hospital A, poor: 210/400 = 0.525; Hospital B, poor: 30/100 = 0.30. Hospital A is better. e. Even though Hospital B looks better overall, Hospital A is better when the data are broken out by patient condition. The overall number is misleading because Hospital A treats many more poor condition patients than Hospital B. f. Rather go to Hospital A because survival is better for both patient condition groups. 5.1.34 a. Male acceptance rate: 533/1198 = 0.445; female acceptance rate: 113/449 = 0.252. Males have a higher acceptance rate overall. b. Males: 511/825 = 0.619; females: 89/108 = 0.824. Females have a higher acceptance rate. c. Males: 22/373 = 0.059; females: 24/341 = 0.070. Females have a higher acceptance rate. d. Yes, even though the overall proportion of males accepted is higher, females have a higher acceptance rate within each group. e. Program F is highly competitive, but Program A is not. However, the majority of female applicants applied to Program F, while the majority of males applied to Program A. Thus, the overall proportion of males accepted was higher. 5.1.35 Person-month: # hits/# at bats = batting average Amy-June: 24/80 = 0.300
5.2.9 b. D. c. B, D. d. C. e. C, D. f. 30 in each pile, representing 30 medium claims and 30 small claims (explanatory variable) g. Difference in conditional proportions or relative risk h. Find how often the observed statistic value or more extreme occurred in the simulation (both tails because two-sided). 5.2.10 a. 0, because if the null hypothesis is true there is no difference in the two probabilities of overpayment b. See the following graph.
[Graph: simulated null distribution of the difference in proportions; 1000 repetitions, mean = 0.005, SD = 0.125; horizontal axis from −0.48 to 0.32]
Barb-July: 33/80 = 0.413
c. There is a 0.184 chance of getting a difference in proportions of 0.20 (the value of the statistic) or larger, or −0.20 or smaller, if the probability a claim is judged to be an overpayment is the same for small and medium claims.
Amy-overall: 34/100 = 0.340
d. There is little to no evidence against the null hypothesis.
Barb-overall: 35/100 = 0.350
e. (0.20 − 0)/0.125 = 1.6. The statistic is approximately 1.6 standard deviations from the mean of the null distribution.
Section 5.2
f. They give similar indications about evidence—there is not much evidence against the null hypothesis. Note: The standardized statistic
Barb-June: 2/20 = 0.100 Amy-July: 10/20 = 0.500
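The batting averages in 5.1.35 show the same reversal as the hospital and admissions data: one player is ahead in every month but behind overall. A small sketch that checks this from the hits and at-bats listed in the solution (the dictionary layout and helper names are assumptions made for illustration):

```python
# (hits, at bats) for each player and month, as listed in 5.1.35.
records = {
    "Amy": {"June": (24, 80), "July": (10, 20)},
    "Barb": {"June": (2, 20), "July": (33, 80)},
}


def average(hits, at_bats):
    """Batting average = hits / at bats."""
    return hits / at_bats


def overall(player):
    """Pool a player's months before dividing."""
    hits = sum(h for h, _ in records[player].values())
    at_bats = sum(n for _, n in records[player].values())
    return average(hits, at_bats)


# Amy is ahead within each month ...
amy_wins_each_month = all(
    average(*records["Amy"][month]) > average(*records["Barb"][month])
    for month in ("June", "July")
)
# ... yet Barb is ahead once the months are combined.
barb_wins_overall = overall("Barb") > overall("Amy")

print(amy_wins_each_month, barb_wins_overall)
```

The reversal arises because most of Amy's at-bats fall in the month where both players hit worse, while most of Barb's fall in the month where both hit better.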
5.2.1 D.
(1.6) is near the “cutoff” of moderate evidence, but this modest difference isn’t a problem: Neither gives “strong” evidence. g. 0.20 ± 0.125 × 2 = (−0.05, 0.45). h. We are 95% confident that the true difference in probability of overpaid claims between the two groups is between −0.05 and 0.45. i. Yes, 0 is in the interval, meaning that it’s plausible that there is no difference in the probabilities.
c. We do not have strong evidence against the null hypothesis. d. Z = −0.027/0.019 = −1.42. Our data are 1.42 SDs from the mean when the null hypothesis is true. e. Yes f. –0.027 ± 2 × 0.019 = (−0.065, 0.011). We are 95% sure that the true difference in survival rates is between 0.065 lower and 0.011 higher with surgery.
j. We have little to no evidence that the probability of overpaid claims is different between the two groups. We’re 95% confident that the true difference in the probability of overpaid claims is between −0.05 and 0.45. This is a random sample, and so this conclusion can be generalized to other Medicare/Medicaid claims at the company being audited; however, this is not a randomized experiment and so cause-and-effect conclusions cannot be drawn.
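The standardized statistic and the 2SD interval used in 5.2.10 follow directly from the observed difference and the SD of the simulated null distribution. A minimal sketch, with function names chosen here only for illustration:

```python
def standardized_statistic(observed, null_mean, null_sd):
    """Number of null-distribution SDs between the statistic and the null mean."""
    return (observed - null_mean) / null_sd


def two_sd_interval(observed, null_sd):
    """Approximate 95% interval: statistic plus or minus 2 SDs."""
    margin = 2 * null_sd
    return observed - margin, observed + margin


# Values from 5.2.10: observed difference 0.20, null SD 0.125.
z = standardized_statistic(0.20, 0.0, 0.125)
low, high = two_sd_interval(0.20, 0.125)

# Zero inside the interval means "no difference" remains plausible.
zero_is_plausible = low <= 0 <= high

print(round(z, 2), round(low, 2), round(high, 2), zero_is_plausible)
```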
g. Yes, 0 is in the interval.
5.2.11
5.2.15
a. C.
h. We do not have strong evidence that surgery will improve the survival rate compared to observation in the treatment of prostate cancer (p = 0.10; 95% CI: −0.065, 0.011). This result cannot be generalized to all men with prostate cancer because it is not a random sample. A cause-and-effect conclusion would potentially be possible here because of random assignment. a. C.
b. D.
b. C.
c. 24% −16% = 8 percentage points or 0.24/0.16 = 1.5
c. A.
d. Sample size in each group
d. B.
e. Large sample sizes in each group
e. Categorical
f. Small sample sizes in each group
f. Categorical
5.2.12
g. A, D.
a. B.
h. The difference happened by chance, because there is no difference in survival rate between the CC and CPR techniques, or there is an actual difference in the survival rate between the two techniques.
b. A. c. C.
5.2.16
d. B.
a. A.
e. A.
b. B.
f. Categorical
c. 3,031
g. Categorical
d. Two different colors, 389 of one color and 2,642 of the other color
h. C and D. i. There is actually a difference in death rates between the two groups or there isn’t a real difference, and the observed difference occurred by chance. 5.2.13 a. A.
e. Two piles, 1,500 in one pile, 1,531 in the other pile, because that’s how many people receive the two techniques in the actual study f. The difference in the proportion of one of the colors of cards between the two groups subtracting the group with 1,531 from the group with 1,500 (or use ratio of proportions) g. Find the proportion of times that the actual difference in the study (0.024) or larger occurred in the simulated statistics.
b. B. c. 731
h. C.
d. Two colors, 52 of one color and 679 of the other
5.2.17
e. Two piles, 364 in one pile and 367 in the other pile f. Difference in proportions or relative risk g. Find how often the observed statistic or more extreme occurred in the simulations. h. C. 5.2.14
         Surgery   Observation   Total
No          21         31          52
Yes        343        336         679
Total      364        367         731
b. p-value = 0.10. Differences in the two conditional proportions smaller than −0.027 happen in about 10% of simulations when the null hypothesis (no difference in survival rates) is true.
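The card-shuffling procedure described in 5.2.13 can be mimicked in code. The sketch below is an assumption-laden illustration, not the applet used for the solutions: the function name, the seed, and the choice of 1000 repetitions are all made up here. It uses the "No" row of the 5.2.14 table (21 of 364 for surgery, 31 of 367 for observation), which gives the observed difference of about −0.027.

```python
import random


def simulate_null_differences(yes_1, n_1, yes_2, n_2, reps=1000, seed=0):
    """Shuffle all outcomes, re-deal them into two piles, and record
    the difference in pile proportions for each repetition."""
    rng = random.Random(seed)
    total_yes = yes_1 + yes_2
    total_n = n_1 + n_2
    # One "card" per person: 1 for the outcome of interest, 0 otherwise.
    cards = [1] * total_yes + [0] * (total_n - total_yes)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        pile_1, pile_2 = cards[:n_1], cards[n_1:]
        diffs.append(sum(pile_1) / n_1 - sum(pile_2) / n_2)
    return diffs


# Observed statistic: surgery proportion minus observation proportion.
observed = 21 / 364 - 31 / 367

diffs = simulate_null_differences(21, 364, 31, 367)

# One-sided p-value: share of shuffles at or below the observed difference.
p_value = sum(d <= observed for d in diffs) / len(diffs)

print(round(observed, 3), p_value)
```

Because the result depends on the random shuffles, the approximate p-value will vary from run to run; it can be compared with the 0.10 reported in part b.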
b. We have strong evidence against the null hypothesis. c. 0.024/0.012 = 2. The observed statistic is 2 SDs from the mean of the null distribution. d. Yes, both suggest strong evidence against the null hypothesis.
a. See the following table.
Survived?
a. p-value is approximately 0.025. We obtained statistics 0.024 or larger in 2.5% of simulations.
e. 0.024 ± 2 × 0.012 = (0, 0.048). We are 95% confident that the survival rate is between 0 and 0.048 higher with CC than with CPR. f. Yes, more or less. Zero is (barely) in the interval and the p-value rejects zero. Note that the test is one-sided and the interval is two-sided, so the strength of evidence will change a bit depending on which approach you take. g. We have strong evidence that the true survival probability is greater for CC than for traditional CPR (p-value = 0.025; 95% CI: 0, 0.048). Because this is a randomized experiment, this suggests that the use of CC is causing the survival rate to increase. There is no information provided as to how the sample was gathered, but it was likely not a random sample of calls for help but instead all calls during certain time periods in one or more cities. Thus, this result may not generalize to other locations with different types of people who may respond differently to different types of CPR.
[Graph for Solution 5.2.19d: segmented bar graph of the percentage of LGA-Yes and LGA-No babies (0% to 100%) in the dietary intervention and control groups]
5.2.18
a. The difference in the long-run proportion of babies with LGA in the dietary intervention group (πdiet) vs. the control group (πcontrol) b. Null: The long run proportion of babies with LGA in the dietary intervention group is equal to the proportion in the control group. Alternative: The long run proportions are different. c. Null: πdiet = πcontrol; Alternative: πdiet ≠ πcontrol
a. 0.042 b. 0.024 c. The Hallstrom et al. study had a larger difference in proportions. d. The one-sided p-value is 0.10. This is the probability of obtaining 0.042 or larger in the simulated data assuming the null to be true. e. We have little to no evidence against the null hypothesis. f. The Hallstrom et al. study provides less evidence than the meta-analysis. g. The sample size is smaller. h. The 95% CI for the Hallstrom study is approximately −0.016 to 0.099. The Hallstrom study has a wider interval than the meta-analysis (0.0006, 0.0482) because its sample size is smaller. The Hallstrom study CI is centered on 0.042, while the meta-analysis CI is centered on 0.024. CIs are centered on the statistic. 5.2.19 a. Experiment because women were randomly assigned to one of two treatments b. Each woman c. Explanatory = treatment group (categorical); response = had baby with LGA (categorical) d. See table below and graph in Solution 5.2.19d.
LGA?     Dietary intervention   Control   Total
Yes               29               68       97
No               456              405      861
Total            485              473      958
The bar chart shows a higher proportion of babies with LGA in the control group than in the dietary intervention group.
5.2.20
d. Get 958 cards, 97 red and 861 black, and shuffle and deal into two stacks of 485 and 473, representing the diet and control groups, respectively. Find the proportion of LGA-Yes babies in each group, then compute the difference in those proportions. Repeat many times. Compare the actual difference in the proportions (−0.084) to the simulated and find the proportion of times differences in proportions less than −0.084 or greater than 0.084 occurred in the simulations, that is, the p-value. 5.2.21 a. p-value is <0.001, meaning that differences larger than 0.084 or smaller than −0.084 did not occur in 1000 simulations assuming the null hypothesis is true. b. We have very strong evidence against the null hypothesis. c. −0.084/0.02 = −4.2. The observed statistic is 4.2 SDs from the mean of the simulated null distribution. d. Yes, both suggest very strong evidence against the null hypothesis. e. −0.084 ± 0.02 × 2 = (−0.124, −0.044). We are 95% confident that the long-run proportion of LGA is between 0.124 and 0.044 less in the dietary intervention group than in the control group. f. Yes, the 95% CI is consistent with the p-value because the interval does not include 0. g. We have very strong evidence that the dietary intervention reduces the proportion of LGA infants (p < 0.001; 95% CI −0.124, −0.044). This is a potential cause-and-effect relationship because the women were randomly assigned to the two treatment groups. Further information is needed about who the women who volunteered represent, as they were not obtained by taking a random sample. 5.2.22 a. 2.4 (or 0.416)
e. 29/485 − 68/473 = −0.084
b. Null: The relative risk is 1. Alternative: The relative risk is not 1.
f. There is actually an effect of the dietary intervention in lowering the proportion of babies born with LGA or the result occurred by chance.
c. The p-value is <0.001, meaning that we never got a relative risk of 2.4 or more extreme when simulating the null hypothesis 1000 times.
d. Yes, the p-values are both very small. This makes sense because we haven’t changed the study, just the way we’ve summarized the data (the statistic). 5.2.23 a. 3/20 − 18/22 = −0.668 b. There is an actual difference in the likelihood of taking the double or nothing bet based on pose or this result occurred merely by chance. 5.2.24 a. The parameter is the difference in probabilities of selecting the high-risk option between those who held a low-power pose and those who held a high-power pose. This parameter can be denoted as πlow − πhigh. b. The null hypothesis is that the probability of selecting the high-risk option is the same, regardless of whether the person holds a high-power pose or a low-power pose. The alternative hypothesis is that the probability of selecting the high-risk option is lower for a person who holds a low-power pose than for a person who holds a high-power pose. c. H0: πlow = πhigh; Ha: πlow < πhigh d. Use 21 red cards to represent the people who chose the risky (double-or-nothing) option, and use 21 black cards to represent the people who chose the safe option. Shuffle the 42 cards thoroughly. Deal out 20 cards to represent the people assigned to the low-power pose, with the remaining 22 cards then representing the people assigned to the high-power pose. Count how many red cards (representing the high-risk choice) are in each group. Calculate the proportion of red cards in each group and subtract those proportions, taking the low-power pose group’s proportion minus the high-power pose group’s proportion. Repeat this process a large number of times, say 1,000. The p-value will be approximated by the proportion of repetitions in which this difference is 3/20 − 18/22 ≈ −0.668 or smaller. 5.2.25 a. 
The approximate p-value is 0.000, which means that a difference in group proportions who choose the risky option of −0.668 or smaller would almost never occur by random assignment alone if there were really no difference in the probabilities of selecting the risky option between the two groups. b. The very small p-value indicates that the experimental data provide extremely strong evidence that people who hold a high-power pose are more likely to choose the risky, double-or-nothing option than people who hold a low-power pose. c. −0.668/0.157 = −4.25. The observed statistic is 4.25 SDs below the mean of the null distribution. d. Yes, both give very strong evidence against the null hypothesis.
5.2.26 a. 5.46 (or 0.183) b. Null: The relative risk is 1. Alternative: The relative risk is greater than 1. c. The p-value is < 0.001, meaning that the chances of getting a relative risk as or more extreme than the one observed (5.46) is < 0.001 d. The conclusions are the same because the data are not changing, just our way of summarizing the data. 5.2.27 a. Observational study: Neither variable was assigned by researchers. b.
Stay?    Men   Women   Total
Yes      242    172     414
No       320    305     625
Total    562    477    1039
c. 242/562 − 172/477 = 0.431 − 0.361 = 0.07 d. The proportion of men and women in the U.S. who actually want change is the same in the population and the observed difference is due to chance only or there is actually a difference in the population. 5.2.28 a. Null: There is no association between sex of respondent and wanting to stay at current body weight or change in the population of all U.S. adults. Alternative: There is an association between sex of respondent and wanting to stay at current body weight or change in the population of all U.S. adults. b. Null: πmen = πwomen, πmen = proportion of men in population of all U.S. adults who want to stay at their current weight. Alternative: πmen ≠ πwomen. c. Use 414 red cards to represent the people who want to stay and 625 black cards to represent the people who want to change. Shuffle the 1039 cards thoroughly. Deal out 562 cards to represent the men with the remaining 477 cards representing the women. Count how many red cards are in each group. Calculate the proportion of red cards in each group and subtract those proportions, taking the male proportion minus the female proportion. Repeat this process a large number of times, say 1000. The p-value will be approximated by the proportion of repetitions in which this difference is 242/562 − 172/477 = 0.07 or larger or −0.07 or smaller. 5.2.29 a. p-value = 0.019. In 1000 shuffles to simulate the null hypothesis, we obtained values of 0.07 or larger or −0.07 or smaller 1.9% of the time.
e. −0.668 ± 2 × 0.157 = (−0.982, −0.354). We are 95% confident that the group in the low-power pose takes double or nothing between 0.354 and 0.982 less often than the group in the high-power pose.
b. We have strong evidence against the null hypothesis.
f. Yes, 0 is not in the interval.
d. Yes, both suggest strong evidence against the null hypothesis.
g. We have very strong evidence that the pose yields different proportions of people taking the bet (p-value < 0.001). We are 95% confident that the difference in actual probability of choosing the risky option is between 0.354 and 0.982 lower for people who hold a low-power pose than for people who hold a high-power pose. Because this was a randomized experiment, we conclude that the high-power pose causes an increased probability of selecting the risky option. We are not told how the participants were chosen, so it’s not clear how broadly we can reasonably generalize this conclusion to other people beyond those involved in this experiment.
c. (0.07 − 0)/0.031 = 2.26. The observed data are 2.26 standard deviations from the mean of the simulated null distribution. e. 0.07 ± 2 × 0.031 = (0.008, 0.132). We are 95% sure that the difference in the proportion of men and women who want to remain at their current weight is between 0.008 and 0.132 (men are more likely to want to stay). f. Yes, 0 is not in the confidence interval. g. We have strong evidence that the proportion of U.S. men who want to stay the same weight is different than the proportion of US women (p-value = 0.019). In particular, men are more likely to want to stay the same weight (95% CI: 0.008, 0.132). This result generalizes
to all U.S. adults, but does not prove a cause-and-effect relationship between sex of respondent and desire to stay the same weight.
5.2.33
a. Proportion of the men fired = 0
b. Proportion of the women fired = 15/18 = 0.83
c. 0.83 − 0 = 0.83
5.2.30
a. See the following table.
Knows Census required by law   College degree   No college degree   Total
Yes                                 195               271             466
No                                  331               702           1,033
Total                               526               973           1,499
b. 195/526 − 271/973 = 0.092 c. The proportion of people who know that participating in the Census is required by law is the same for those with a college degree as for those without. The observed difference is due to chance alone. Or there is actually a difference in the two proportions in the population. d. Observational study; the researchers have not assigned the values of either variable to the observational units. 5.2.31 a. Null: There is no association between education level and knowledge that the Census is required by law. Alternative: There is an association between education level and knowledge that the Census is required by law. b. Null: πcollege = πno college, πcollege = proportion of people with college degree in population of all U.S. adults who know participating in the Census is required by law. Alternative: πcollege ≠ πno college. c. Use 466 red cards to represent the people who know Census is required and 1,033 black cards to represent the people who don’t. Shuffle the 1,499 cards thoroughly. Deal out 526 cards to represent the college group, with the remaining 973 cards representing the no-college group. Count how many red cards are in each group. Calculate the proportion of red cards in each group and subtract those proportions, taking the college degree proportion minus the no-college degree proportion. Repeat this process a large number of times, say 1,000. The p-value will be approximated by the proportion of repetitions in which this difference is −0.092 or smaller or 0.092 or larger.
d. Use 15 red cards to represent the people who were fired and 15 black cards to represent the people who weren’t fired. Shuffle the 30 cards thoroughly. Deal out 18 cards to represent the women, with the remaining 12 cards representing the men. Count how many red cards are in each group. Calculate the proportion of red cards in each group and subtract those proportions, taking the female proportion minus the male proportion. Repeat this process a large number of times, say 1000. The p-value will be approximated by the proportion of repetitions in which this difference is (15/18 − 0/12) 0.83 or larger. 5.2.34 a. 0, because if the null hypothesis is true there is no difference in the two groups b. 0.83 c. 0, shade in values greater than 0.83 and less than −0.83 (there are none on the graph!) d. Yes, the p-value is very small. e. No, this is not a randomized experiment. Maybe all of the women work on a manufacturing line for a product that is no longer being manufactured. f. The confidence interval would be estimating the difference in long-run proportions of males and females fired, but this doesn’t make sense because this was a one-time event (the firings) not repeatable or representative (necessarily) of any other company. 5.2.35 a. Experiment, because some households were assigned to receive incentive and others not b. See the following table.
Participated?   Incentive yes   Incentive no   Total
Yes                 286             245         531
No                   82             122         204
Total               368             367         735
5.2.32
c. 286/368 − 245/367 = 0.11
a. The p-value is approximately 0, meaning that we did not observe any simulated differences in proportions larger than 0.092 or smaller than −0.092.
d. There is no effect of the incentive, and the difference we saw was due to random chance or there is an effect of the incentive.
b. We have very strong evidence against the null hypothesis.
a. Null: There is no association between whether or not the household received the incentive and participation. Alternative: Incentives tend to improve response rates.
c. 0.092/0.025 = 3.68. The observed statistic is 3.68 standard deviations from the mean of the simulated null distribution. d. Yes, both give very strong evidence against the null hypothesis. e. 0.092 ± 0.05 = (0.042, 0.142). We are 95% confident that the difference in the proportions of people who know the Census is required by law when comparing people with a college degree and people without is between 0.042 and 0.142. f. Yes, 0 is not in the interval. g. We have very strong evidence of an association between education level and knowledge that the Census is required, with between 4.2 and 14.2 percentage points more people with a college education knowing the Census is required when compared to people without a college education. This result generalizes to the population of all U.S. adults but does not suggest a cause-and-effect relationship between having a college degree and knowledge of the Census law.
5.2.36
b. Null: πincentive = πno incentive; Alternative: πincentive > πno incentive, where π is the long-run proportion of households that will participate c. Use 531 red cards to represent the households that participated and 204 black cards to represent the households that didn’t. Shuffle the 735 cards thoroughly. Deal out 368 cards to represent the households receiving the incentive and 367 to represent the households that didn’t. Count how many red cards are in each group. Calculate the proportion of participators in each group and subtract those proportions. Repeat this process a large number of times, say 1,000. The p-value will be approximated by the proportion of repetitions in which this difference is 0.11 or larger. 5.2.37 a. 0, none of the simulated statistics were 0.11 or larger.
b. We have very strong evidence against the null hypothesis. c. 0.11/0.033 = 3.33, the observed statistic is 3.33 standard deviations from the mean of the null distribution. d. Yes, both give very strong evidence against the null hypothesis. e. 0.11 ± 2 × 0.033 = (0.044, 0.176). We are 95% confident that the proportion of homes that will participate when given a financial incentive is between 0.044 and 0.176 higher than when no incentive is given. f. Yes, 0 is not in the interval. g. We have very strong evidence that the probability of participating households is higher when a financial incentive is used than when no incentive is used (0.044 to 0.176 higher). This is a cause-and-effect relationship that generalizes to all households in the nation. 5.2.38 a. Experiment, because researchers assigned participants to whether or not they would listen to romantic lyrics b. So that participants wouldn’t know that the real purpose of the study was to see if they would give their phone numbers and so that the male confederate wouldn’t unintentionally change his approach with the participants c. Each of the 87 female students d. Romantic lyrics? Yes or no (explanatory; categorical); gave phone number? Yes or no (response; categorical) e. See the following table.
Gave phone number?   Romantic song   Neutral song   Total
Yes                       23              12          35
No                        21              31          52
Total                     44              43          87
f. 23/44 − 12/43 = 0.244
g. There is an effect of playing romantic lyrics or there is no effect and the observed difference is due to chance alone.
b. We have strong evidence of an association between the song played and the likelihood that the participant gives out her phone number. c. 0.244/0.104 = 2.35. The observed data are 2.35 SDs from the mean of the simulated null distribution. d. Yes, both give strong evidence. e. 0.244 ± 0.208 = (0.036, 0.452). We are 95% confident that the proportion of female participants who will give their number after hearing romantic lyrics is between 0.036 and 0.452 higher than when a neutral song is played. f. Yes, 0 is not in the interval. g. We have strong evidence that the proportion of female participants that will give out their phone number is larger (p = 0.027; 95% CI: 0.036 to 0.452) after hearing a romantic song than after hearing a neutral song. This result suggests a cause-and-effect relationship between the type of song and giving the number, though this result may not be generalizable to a larger population. 5.2.41 a. Null, H0: the proportion of students on their phones during the afternoon is the same as that in the evening, versus Alternative, Ha: the proportion of students on their phones during the afternoon is different than in the evening. b. 0.184 c. 0.184/0.051 = 3.61; p-value < 0.001 d. There is very strong evidence (based on the large standardized statistic and the extremely small p-value) that the proportion of students on their phones during the afternoon is different than in the evening. 5.2.42 a. Null, H0: the probability of Cesarean deliveries is the same for term and preterm births, πpreterm = πterm versus Alternative, Ha: the probability of Cesarean deliveries is different for term and preterm births, πpreterm ≠ πterm. b. See the following table.
            Preterm   Term   Total
Caesarean      19      22     41
Vaginal         2       7      9
Total          21      29     50
5.2.39
a. πromantic = the long-run proportion of female participants who will give their phone number to the male confederate after listening to the romantic lyrics; πnonromantic = the long-run proportion of female participants who will give their phone number to the male confederate after listening to the nonromantic lyrics
c. Preterm: 0.905; term: 0.759 d. p-value ≈ 0.278, standardized statistic = 1.28
b. Null: There is no association between the type of music played and the likelihood of the participant giving her phone number. Alternative: There is an association between the type of music played and the likelihood of the participant giving her phone number.
e. (−0.082, 0.374)
c. Null: πromantic = πnonromantic; Alternative: πromantic ≠ πnonromantic
5.2.43
f. There is no evidence that the proportion of Cesarean deliveries is different for term and preterm births among births like the ones in the study.
d. Use 35 red cards to represent the participants that gave their phone number and 52 black cards to represent those that didn’t. Shuffle the 87 cards thoroughly. Deal out 44 cards to represent the participants who heard the romantic lyrics and 43 to represent the participants that didn’t. Count how many red cards are in each group. Calculate the proportion of participants in each group and subtract those proportions. Repeat this process a large number of times, say 1000. The p-value will be approximated by the proportion of repetitions in which this difference is 0.244 or larger or −0.244 or smaller.
a. Null, H0: The probability of a wart being cured is the same for the two treatments, πcryo = πduct, versus Alternative, Ha: The probabilities of a wart being cured are different for the two treatments, πcryo ≠ πduct
5.2.40 a. 0.027; 27 out of 1000 times the simulated statistic was 0.244 or larger or −0.244 or smaller.
b. See the following table.
            Cryotherapy   Duct tape   Total
Cured            15           22        37
Not cured        10            4        14
Total            25           26        51
c. Cryotherapy: 0.60; duct tape: 0.846 d. Difference = −0.246; standardized statistic = −1.98; p-value ≈ 0.054
e. (−0.494, 0.002)
5.3.3 A.
f. We have moderate evidence (p-value ≈ 0.054) that the population proportions of warts cured between the two treatments are different, and that the type of treatment has an effect on cure because the study was a randomized experiment with moderately significant results.
5.3.4 A.
5.2.44 Relative risk = 1.41; standardized statistic = 2.34; p-value ≈ 0.059. The approximate confidence interval is (1.06, 1.76). We have moderate evidence (p-value ≈ 0.059) that the population proportions of warts cured between the two treatments are different, and that the type of treatment has an effect on cure because the study was a randomized experiment with moderately significant results. We are pretty confident that the probability of curing warts using duct tape is between 1.06 and 1.76 times that using cryotherapy. 5.2.45 a. Null, H0: Those practicing meditation will be just as likely as those not practicing meditation to be compassionate, πmeditation = πnot, versus Alternative, Ha: Those practicing meditation will be more likely than those not practicing meditation to be compassionate, πmeditation > πnot. b. Difference = 0.232 c. Standardized statistic = 1.98, p-value ≈ 0.045 d. We have strong evidence (p-value ≈ 0.045) that those practicing meditation will be more likely than those not practicing meditation to be compassionate, and that the type of practice has an effect on whether or not a person shows compassion because the study was a randomized experiment with significant results. 5.2.46 a. 6 b. 4 c. 11 d. 9/15 − 4/15 = 0.333 e. (9/15)/(4/15) = 2.25 f. No. g. The null distributions for the difference in proportions and relative risk will have the same number of possible outcomes. For example, 0.333 is a possible outcome for the difference in proportions while 2.25 is a possible outcome for relative risk. There will be the same number of dots in each column of outcomes for the two null distributions. While the outcomes will be different, there will be a correspondence between outcomes like with 0.333 and 2.25. Both of these represent the same simulated two-way table. 5.2.47 a. We can rule out chance or coincidence as the explanation. Therefore we can conclude there was a cause, but we cannot conclude exactly what that cause was. b. 
Random assignment allows us to determine the cause of an observed difference because we try to make our two groups identical in every way except for the one thing (like swimming with dolphins versus swimming without dolphins). So if we see a difference, we know the only thing that could have caused that difference was that one thing. c. You can have random assignment when testing a single proportion. As was explained in the FAQ, whether the light blinked or not was randomly assigned in the Buzz and Doris study.
Section 5.3 5.3.1 A. 5.3.2 B.
5.3.5 B. 5.3.6 A. 5.3.7 (0.0015 + 0.0467)/2 = 0.0241. Therefore, we know that the 2005–2006 sample proportion with hearing loss was 0.0241 larger than the 1988–1994 sample proportion. 5.3.8 The 95% confidence interval has a margin of error of 0.0241. This means 0.0241 = 1.96(SD of the statistic) or SD of the statistic = 0.0241/1.96 = 0.0123. This makes the margin of error for the 99% confidence interval equal to 2.576(0.0123) or the width equal to 2(2.576)(0.0123) = 0.0636. 5.3.9 Need at least 10 successes and at least 10 failures in each group. These validity conditions are met because there are 90 successes and 10 failures in the statistics class group, and 85 successes and 15 failures in the nonstatistics class group. 5.3.10 There were only 8 “successes” in one of the two samples. The theory-based approach works well if all of the observed counts in the two-way table are at least 10. So this count is a little small for us to feel comfortable using the theory-based approach. 5.3.11 a. The parameters of interest are the population proportion of all American households that would respond to the survey when offered a monetary incentive (πincentive) and the population proportion of all American households that would respond to the survey when not offered a monetary incentive (πnone). b. The null hypothesis is that the population proportion of all American households that would respond to the survey is the same whether offered a monetary incentive or not. The alternative hypothesis is that the population proportion of all American households that would respond to the survey is greater when offered a monetary incentive than when not offered a monetary incentive. c. H0: πincentive = πnone, Ha: πincentive > πnone d. All four values in the two-way table (286, 245, 82, and 122) are larger than 10, so the theory-based approach should be valid. e.
i. The standardized statistic is z = 3.32. ii. The theory-based one-sided p-value is 0.0005. iii. The 95% confidence interval is (0.0453, 0.1739).
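The standardized statistic and one-sided p-value in part e can be checked by hand. The following is a minimal sketch, assuming the group sizes implied by the two-way table counts in part d (286 + 82 = 368 households offered the incentive, 245 + 122 = 367 not offered it). The function name `two_proportion_z` is illustrative and is not part of the applet.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(count1, n1, count2, n2):
    """Return (difference, z) for H0: pi1 = pi2 using the pooled standard error."""
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1 - p2, (p1 - p2) / se


# Incentive group first, no-incentive group second (sizes are an assumption
# derived from the four table counts quoted in part d).
diff, z = two_proportion_z(286, 368, 245, 367)

# One-sided p-value for Ha: pi_incentive > pi_none.
p_one_sided = 1 - NormalDist().cdf(z)
print(round(diff, 4), round(z, 2), p_one_sided)
```

Under these assumptions the difference and standardized statistic should come out close to the 0.1096 and 3.32 reported above, with a one-sided p-value that rounds to about 0.0005.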
f. See output shown for Solution 5.3.11f. g. The p-value says that if there were really no difference in the population proportions who would respond to the survey between those offered a monetary incentive and those not offered such an incentive, then there’s only a 0.0005 probability of obtaining sample proportions as far apart, in the direction of favoring the incentive group, as was observed in this study (namely, a difference of 0.1096). h. We are 95% confident that the proportion who will respond to a telephone survey is between 0.045 and 0.174 higher after receiving an incentive. i. The p-value is small enough to indicate that the sample data provide very strong evidence that offering a monetary incentive does help to increase the response proportion to the survey. This is a randomized experiment, so we can conclude that the monetary incentive caused the higher response proportion. Furthermore, the households were randomly selected, so we can generalize this conclusion to all households in the U.S. The confidence interval reveals that the
[Applet output for Solution 5.3.11f. Scenario: Two proportions. Group 1: n = 368, count = 286, sample p̂ = 0.7772. Group 2: n = 367, count = 245, sample p̂ = 0.6676.]
magnitude of the increase in response percentage is between 4.5 and 17.4 percentage points higher when the household is offered a monetary incentive than when no such incentive is offered. 5.3.12 a. This is an experiment because the women were randomly assigned to the drug. b. The explanatory variable is whether they received tamoxifen or raloxifene (categorical). The response variable is whether they developed invasive breast cancer (categorical). c. See the following table.
               Tamoxifen   Raloxifene   Total
Developed BC   163         168          331
Didn’t         9563        9577         19,140
Total          9726        9745         19,471
d. p̂ (tamoxifen) = 163/9726 ≈ 0.016759; p̂ (raloxifene) = 168/9745 ≈ 0.017240; difference = −0.00048 e. H0: There is no difference in the probability of developing breast cancer between those that receive tamoxifen and those that receive raloxifene. Ha: There is a difference in the probability of developing breast cancer between these two drugs. (There was no indication in the stem that the researchers believed in advance one treatment would be superior to the other.)
f. We can use a theory-based approach because we have more than 10 in each cell of our two-way table (smallest is 163).
g. Using the Theory-Based Inference applet: i. The standardized statistic equals z = −0.26 and ii. the theory-based two-sided p-value is 0.7954. (See Solution 5.3.12g.)
h. In this study, we find a rather large p-value (around 0.80), which doesn’t give us convincing evidence against the null hypothesis that the probability of developing invasive breast cancer is the same for both treatments. We had the potential to draw a cause-and-effect conclusion if we did think there was a difference based on the random assignment used in the study. We do not have much information about how the women were sampled other than “postmenopausal” so we should be cautious in generalizing these conclusions without additional information about how representative these women are. The 95% confidence interval is (−0.004, 0.0032), reflecting the lack of evidence of a significant difference between the two groups. 5.3.13 a. The parameters of interest can be defined as πtam, the probability of developing invasive breast cancer with tamoxifen, and πral, the probability of developing invasive breast cancer with raloxifene. b. We can use a theory-based approach because we have more than 10 in each cell of our two-way table (smallest is 163).
c. 99% confidence interval: (−0.0052, 0.0043)
d. We are 99% confident that the probability of developing invasive breast cancer with tamoxifen is 0.0052 lower to 0.0043 higher than with raloxifene.
e. The interval is pretty narrow. This is because the study involved rather large sample sizes.
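The 95% and 99% intervals quoted for the tamoxifen and raloxifene groups can be approximated directly from the table counts. This is a sketch under the usual theory-based assumption of an unpooled standard error; `diff_ci` is a hypothetical helper name, and the applet may round or choose multipliers slightly differently.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(count1, n1, count2, n2, confidence=0.95):
    """Theory-based interval for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = count1 / n1, count2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Multiplier from the standard normal distribution (about 1.96 for 95%).
    multiplier = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p1 - p2
    return diff - multiplier * se, diff + multiplier * se


# Tamoxifen first, raloxifene second, using the counts from the two-way table.
for level in (0.95, 0.99):
    low, high = diff_ci(163, 9726, 168, 9745, confidence=level)
    print(level, round(low, 4), round(high, 4))
```

Both intervals should land within rounding of the values reported above, and both contain zero, which matches the large two-sided p-value. The intervals are narrow because each group has nearly 10,000 women.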
[Applet output for Solution 5.3.12g. Scenario: Two proportions. Group 1: n = 9726, count = 163, sample p̂ = 0.0168. Group 2: n = 9745, count = 168, sample p̂ = 0.0172.]
[Applet output for Solution 5.3.15e. Two-way table of sample data. Group A: 25 successes, 192 failures, total 217. Group B: 74 successes, 370 failures, total 444. Totals: 99 successes, 562 failures, 661 overall.]
5.3.14 a. If we use sample proportions of 0.0168 (tamoxifen) and 0.0172 (raloxifene), then we have a relative risk of 0.0172/0.0168 ≈ 1.024. b. We believe the number 1 will be in the confidence interval because that would correspond to the population probabilities being equal, which our earlier analyses indicated was a plausible possibility.
5.3.15
a. These data arise from sampling from two processes.
b. Let πA represent the long-run proportion of times Author A wins and πB the long-run proportion of times Author B wins. c. The null hypothesis would be that the authors have the same probability of winning. The alternative hypothesis is that Author A has a different probability of winning than Author B. d. H0: πA = πB, Ha: πA ≠ πB e. Using the Two Proportions applet, we find a p-value of approximately 0.0858. (See the applet output for Solution 5.3.15e.)
f. It would be okay to use a theory-based approach here because all four cell counts (25, 74, 192, 370) are larger than 10.
g. Using the Theory-Based Inference applet, we find
i. a standardized statistic of z = −1.74, ii. a p-value of 0.0817, and iii. a confidence interval of (−0.1063, 0.0034).
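The simulation-based p-value in part e can be imitated by shuffling the 661 game outcomes between the two authors. This is a sketch only: the function name, the number of repetitions, and the seed are illustrative choices, and the applet's own simulation will give slightly different values from run to run.

```python
import random


def simulate_p_value(succ_a, n_a, succ_b, n_b, reps=2000, seed=1):
    """Two-sided simulation-based p-value from shuffling outcomes between groups."""
    rng = random.Random(seed)
    total_successes = succ_a + succ_b
    # One entry per game: 1 for a win, 0 for a loss.
    outcomes = [1] * total_successes + [0] * (n_a + n_b - total_successes)
    observed = succ_a / n_a - succ_b / n_b
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        sim_a = sum(outcomes[:n_a])          # wins re-assigned to Author A
        sim_b = total_successes - sim_a      # remaining wins go to Author B
        sim_diff = sim_a / n_a - sim_b / n_b
        # Count shuffles at least as extreme as the observed difference.
        if abs(sim_diff) >= abs(observed) - 1e-12:
            extreme += 1
    return extreme / reps


# Author A: 25 wins in 217 games; Author B: 74 wins in 444 games.
p_value = simulate_p_value(25, 217, 74, 444)
print(p_value)
```

With a few thousand shuffles the estimate should fall in the same neighborhood as the applet's 0.0858 and the theory-based 0.0817, though the exact value depends on the seed.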
h. The p-value tells us that approximately 8.17% of random samples from processes with the same probability of success would yield a difference in sample proportions as small as −0.051 or smaller or as large as 0.051 or larger by chance alone (assuming the null hypothesis is true).
i. While we have moderate evidence against the null hypothesis, at the 5% significance level there is not enough evidence to reject the null hypothesis.
j. A 95% confidence interval for the difference in their winning probabilities (πA − πB) is (−0.1063, 0.0034), indicating that A’s win probability could be as much as 0.1063 smaller than B’s or as much as 0.0034 higher than B’s. This analysis depends on assuming that the sample of games is essentially a random sample for each player. k. Zero is inside the 95% confidence interval, meaning that it’s plausible there is no difference in the proportions. l. The data provide moderate evidence (but not strong) that the performance of Author B is different than that of Author A (p-value = 0.0817; 95% CI: −0.1063 to 0.0034). These results cannot be generalized to the typical performances of Authors A and B, unless the games in the sample are representative of their typical performance. There is no potential for a cause-and-effect conclusion because this is not a randomized experiment. 5.3.16 a. The parameters are the population proportion of all women who would give their phone numbers if they listened to romantic music (call this proportion πrom) and the population proportion of all women who would give their phone numbers if they listened to neutral music (call this proportion πneu). b. The hypotheses to be tested are H0: πrom = πneu versus Ha: πrom > πneu. c. There are more than 10 successes (women who gave their phone number) and 10 failures (women who did not give their phone number) in each group, so the use of theory-based methods is valid. d.
i. The standardized statistic is 2.32, with ii. a p-value of 0.0103. iii. The 95% confidence interval is (0.0422, 0.4430).
e. The p-value of 0.0103 means that if there were no difference in the population proportions of women who would give their phone number whether listening to romantic or neutral music, there’s only a 1.03% chance of obtaining sample proportions that differ by as much or more as in this study (0.2436 or more). f. The p-value is small enough to indicate that the data provide strong evidence that listening to romantic music does increase how likely a woman is to provide her phone number. We are 95% confident that the probability is between 0.0422 and 0.4430 larger. Because this was a randomized experiment, we can conclude that the romantic music is the cause of this increase. We do not know how the women in the experiment were selected, so we should be cautious about generalizing this conclusion beyond the women used in the study. g. We are 95% confident that the probability a woman will give her number when listening to romantic music is 0.0422–0.4430 higher than when listening to neutral music.
h. $\hat{p} = (23 + 12)/(44 + 43) = 0.4023$; $z = \dfrac{0.5227 - 0.2791}{\sqrt{0.4023\,(1 - 0.4023)\,(1/44 + 1/43)}} = 2.32$
i. $(0.5227 - 0.2791) \pm 1.96\sqrt{\dfrac{0.5227\,(1 - 0.5227)}{44} + \dfrac{0.2791\,(1 - 0.2791)}{43}} = 0.2436 \pm 0.1994$, which gives the interval (0.044, 0.443), as found by the applet. j. The p-value will equal 2 × 0.0103 = 0.0206 because this would now be a two-sided alternative hypothesis.
5.3.17
a. The parameters are the probabilities of men that have localized prostate cancer dying of the disease if they have surgery (call this probability πs) and if they are just observed (call this probability πo).
b. Null: The probability for men with localized prostate cancer and who have surgery dying of the disease is equal to the probability for men with localized prostate cancer and are just observed dying of the disease. Alternative: The probability for men with localized prostate cancer and who have surgery dying of the disease is less than the probability for men with localized prostate cancer and are just observed dying of the disease.
c. H0: πs = πo; Ha: πs < πo d. We can use the theory-based approach to test the hypotheses because at least 10 men died and 10 men survived in each group. e. Our p-value is 0.0794 and the standardized statistic is −1.41. f. We can be 99% confident that the probability of dying of prostate cancer for those that get surgery is between 0.0757 lower and 0.0221 higher than those that are just observed for 10 years. g. Because our p-value is large, we don’t have strong evidence that the probability for men with localized prostate cancer and who have surgery dying of the disease is less than the probability for men with localized prostate cancer and who are just observed dying of the disease (99% CI: −0.0757, 0.0221). If we did have significance, we could determine causation as it was an experiment. The ability to generalize this conclusion to a broader population is suspect as this study is not based on a random sample. 5.3.18 a. The parameters are the probabilities of surviving a heart attack after being given chest compression (call this proportion πCC) and if they are given CPR (call this proportion πCPR). b. Null: The probability of surviving a heart attack if given chest compression only is the same as the probability of surviving a heart attack if given CPR. Alternative: The probability of surviving a heart attack if given chest compression only is greater than the probability of surviving a heart attack if given CPR. c. H0: πCC = πCPR; Ha: πCC > πCPR d. We can use the theory-based approach to test the hypotheses because at least 10 people died and 10 people survived in each group. e. Our p-value is 0.0223 and the standardized statistic is 2.01. f. We can be 95% confident that the probability of surviving a heart attack is between 0.0006 and 0.0482 higher for those receiving just chest compressions compared to those receiving CPR. g. 
Because our p-value is small, we have strong evidence that the probability of surviving a heart attack if given chest compression only is greater than the probability of surviving a heart attack if given CPR, with survival rates between 0.0006 and 0.0482 higher for those receiving compressions only. We can say the method of resuscitation is causing the difference in survival rates because this was a randomized experiment; however, generalizing these results to a broader population should be done with caution. 5.3.19 a. The parameters are, for women diagnosed with mild gestational diabetes, the long-term probabilities of having a baby with LGA after receiving dietary intervention and after receiving normal prenatal care. We will call these two probabilities πDI and πPNC. b. Null: The probability of having a baby with LGA after receiving dietary intervention is the same as the probability of having a baby with LGA after receiving normal prenatal care. Alternative: The probability of having a baby with LGA after receiving dietary intervention is different than the probability of having a baby with LGA after receiving normal prenatal care. c. H0: πDI = πPNC; Ha: πDI ≠ πPNC d. We can use the theory-based approach to test the hypotheses because at least 10 babies had LGA and 10 babies did not in each group. e. Our p-value is 0 and the standardized statistic is 4.31. f. We can be 95% confident that the probability of having a baby with LGA is between 0.0460 and 0.1220 lower for those receiving dietary intervention compared to those receiving normal prenatal care. g. Because our p-value is very small, we have strong evidence that the probability of having a baby with LGA after receiving dietary intervention is different than the probability of having a baby with LGA after receiving normal prenatal care. In fact, the probability of having a baby with LGA is between 0.046 and 0.122 lower for those receiving dietary intervention. We can say that dietary intervention caused the difference because there was random assignment to the two groups; however, generalizing should be done with caution because this is not a random sample. h. Rounding your final answer to two decimal places, you should get 4.31, the same standardized statistic as that in the applet. i. Rounding your final answer to four decimal places, you should get (0.0460, 0.1220), the same confidence interval as that in the applet. 5.3.20 a. The relative risk is 2.40. b. Because we had a very small p-value and thus results to show there is really a difference, 1 will not lie within this interval. 5.3.21 5.3.22 a. The parameters are the proportion of all U.S. male adults that want to stay at the current weight (πM) and the proportion of all U.S. female adults that want to stay at their current weight (πF). b. Null: The proportion of all U.S. male adults that want to stay at the current weight is the same as the proportion of all U.S. 
female adults that want to stay at their current weight. Alternative: The proportion of all U.S. male adults that want to stay at the current weight is different than the proportion of all U.S. female adults that want to stay at their current weight. c. H0: πM = πF; Ha: πM ≠ πF d. Theory-based methods will work because we have at least 10 men and 10 women that both want to stay at their current weight and those that don’t. e. The p-value is 0.0216 and the standardized statistic is 2.30. f. We are 95% confident that the proportion of men that want to stay at their current weight is between 0.0106 and 0.1294 higher than that for women. g. We have strong evidence that the proportion of all U.S. male adults that want to stay at their current weight is different than the proportion of all U.S. female adults that want to stay at their current weight (between 0.0106 and 0.1294 higher for men than for women). Because this was a random sample from the population, we can generalize it back to the entire U.S. adult population. 5.3.23 a. The parameters are the proportion of all U.S. adults with some college or less that knew responding to the Census is required by law
(πSC) and the proportion of all U.S. adults with a college degree or more that knew responding to the Census is required by law (πCD). b. Null: The proportion of all U.S. adults with some college or less that knew responding to the Census is required by law is the same as the proportion of all U.S. adults with a college degree or more that knew responding to the Census is required by law. Alternative: The proportion of all U.S. adults with some college or less that knew responding to the Census is required by law is different than the proportion of all U.S. adults with a college degree or more that knew responding to the Census is required by law. c. H0: πSC = πCD;
Ha: πSC ≠ πCD
d. Theory-based methods are valid because we have at least 10 people that knew and 10 people that didn’t know responding to the Census is required by law for both groups. e. The p-value is 0.0002 and the standardized statistic is −3.68. f. We can be 95% confident that the proportion of U.S. adults with a college degree or more that knew that responding to the Census is required by law is 0.0422 to 0.1422 higher than for those with some college or less. g. We have strong evidence that the proportion of all U.S. adults with some college or less that knew responding to the Census is required by law is different than the proportion of all U.S. adults with a college degree or more that knew responding to the Census is required by law (0.0422 to 0.1422 higher among those with a college degree). We can generalize this to the population of all U.S. adults because it was a random sample of all U.S. adults, but this does not provide cause and effect between education level and Census knowledge because it is not a randomized experiment. 5.3.24 It is not valid to use a theory-based approach because they did not fire 3 women and fired 0 men. Both these numbers need to be at least 10. 5.3.25 a. The parameters are the probabilities of someone with cardiac disease dying of that disease for those with depression (πD) and for those without depression (πN). b. Null: The probability of a depressed person with cardiac disease dying of that disease is the same as the probability of a nondepressed person with cardiac disease dying of that disease. Alternative: The probability of a depressed person with cardiac disease dying of that disease is different than the probability of a nondepressed person with cardiac disease dying of that disease. c. H0: πD = πN;
Ha: πD ≠ πN
d. Theory-based methods are valid because we have at least 10 people that died and 10 people that didn’t in both groups. e. The p-value is 0.0263 and the standardized statistic is 2.22. f. We can be 95% confident that the probability of dying for those with cardiac disease and depression is between 0.0039 and 0.2091 higher than those with cardiac disease and no depression. g. Because we have a small p-value, we have strong evidence that the probability of a depressed person with cardiac disease dying of that disease is different than the probability of a nondepressed person with cardiac disease dying of that disease, with depressed individuals’ risk of death between 0.0039 and 0.2091 higher. Because it is an observational study we can’t conclude depression caused the deaths. We can probably generalize this to people similar to those that were in the study. 5.3.26 It is not valid to use theory-based methods because there were only 6 people taking the regular-priced pill that did not have pain reduction. This number needs to be at least 10.
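The validity condition used throughout this section (at least 10 successes and at least 10 failures in each group) is easy to express as a check on the four cells of the two-way table. The helper below is a hypothetical sketch; the last example uses made-up counts apart from the cell of 6, which echoes the situation described in 5.3.26.

```python
def theory_based_ok(successes1, failures1, successes2, failures2, minimum=10):
    """True when every cell of the 2x2 table has at least `minimum` observations."""
    return min(successes1, failures1, successes2, failures2) >= minimum


# Counts quoted in 5.3.9: 90 successes and 10 failures, 85 successes and 15 failures.
print(theory_based_ok(90, 10, 85, 15))

# Counts quoted in 5.3.11d: 286 and 82 in one group, 245 and 122 in the other.
print(theory_based_ok(286, 82, 245, 122))

# Hypothetical table with one cell of only 6 (all other counts invented for illustration).
print(theory_based_ok(30, 6, 25, 12))
```

The first two calls return True and the third returns False, which mirrors the reasoning in 5.3.10, 5.3.24, and 5.3.26: a single small cell is enough to make the theory-based approach questionable.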
5.3.27 a. This is an experiment because the physicians were randomly assigned to treatment groups. b. The explanatory variable is taking aspirin or placebo. The response variable is whether they suffered a heart attack. Both of these are categorical. c. Double blind means neither the tester nor the subject knows which treatment is being received. This helps control for any bias that could develop, affecting the behavior of either the tester or the subject. d.
Heart attack?   Placebo   Aspirin   Total
Yes             189       104       293
No              10,845    10,933    21,778
Total           11,034    11,037    22,071
e. −0.0077
f. Null: The probability of a heart attack for those taking aspirin is the same as for those taking the placebo. Alternative: The probability of a heart attack for those taking aspirin is different than that for those taking the placebo.
g. We get a p-value of 0. h. Because our p-value is very small, we have very strong evidence that the probability of a heart attack for those taking aspirin is different (and less) than that for those taking the placebo. i. We can use a theory-based approach as there are more than 10 deaths and nondeaths in each group. j.
i. the standardized statistic is 5.00 and ii. The p-value is 0
k. Simulation-based and theory-based p-values are the same. 5.3.28 a. The parameters are the probability of a heart attack for those taking aspirin (πA) and the probability of a heart attack for those not taking aspirin (πP). b. H0: πA = πP; Ha: πA ≠ πP
c. We can use a theory-based approach because there are more than 10 deaths and nondeaths in each group. d. We can be 99% confident that the probability of having a heart attack is between 0.0037 and 0.0117 higher for those not taking aspirin compared to those that do take aspirin. e. We get a p-value of 0.
f. There is very strong evidence of a difference in the probability of heart attacks between aspirin takers and non–aspirin takers because our p-value is so small.
g. We have strong evidence that the probability of having a heart attack is higher for those not taking aspirin compared to those that do (between 0.0037 and 0.0117 higher). We can conclude aspirin is causing this difference as this was a randomized experiment. We can probably infer these results to middle-aged to older men similar to those in the study.
h. Relatively speaking, the 99% confidence interval is narrow because we had such large sample sizes. i. It all depends on your perspective if you think this is a very large difference or not. If you just look at the difference in sample proportions, it doesn’t look that much different. However, if you looked at relative risk the difference in probabilities is quite large. If you think you are likely to have a heart attack, no matter what you look at, this difference might seem huge. 5.3.29 a. Rounding your final answer to two decimal places, you should get 5.00, the same standardized statistic as that in the applet. b. Rounding your final answer to four decimal places, you should get 0.0037 to 0.0117, the same confidence interval as that in the applet. 5.3.30 a. The relative risk is 1.82.
b. Because our p-value was very small, we should not expect 1 to be contained in the confidence interval.
5.3.31
a. The explanatory variable is taking aspirin or placebo. The response variable is whether they developed an ulcer or not. Both of these are categorical.
b.
Ulcer?   Placebo   Aspirin   Total
Yes      138       169       307
No       10,896    10,868    21,764
Total    11,034    11,037    22,071
c. 0.0028
d. Null: The probability of developing an ulcer for those taking aspirin is the same as for those taking the placebo. Alternative: The probability of developing an ulcer for those taking aspirin is different than that for those taking the placebo. e. We get a p-value of about 0.10. f. As our p-value is not small, we do not have very strong evidence that the probability of developing an ulcer for those taking aspirin is different than that for those taking the placebo. g. We can use a theory-based approach because there are more than 10 ulcers and nonulcers in each group. h.
i. the standardized statistic is −1.78 and ii. The p-value is 0.0757
i. Simulation-based and theory-based p-values are similar. 5.3.32 a. The parameters are the probability of developing an ulcer for those taking aspirin (πA) and the probability for those not taking aspirin (πP).
b. H0: πA = πP; Ha: πA ≠ πP
c. We can use a theory-based approach because there are more than 10 ulcers and nonulcers in each group.
d. (−0.0059, 0.0003)
e. Yes, the 95% confidence interval contains 0. We should expect this, because we got a p-value of more than 0.05 in the previous exercise. f. We can be 95% confident that the probability of developing an ulcer is between 0.0059 lower and 0.0003 higher for those not taking aspirin compared to those that do take aspirin. g. There is not strong evidence of a difference in the probability of developing an ulcer between aspirin takers and non–aspirin takers because our interval contains 0. h. Relatively speaking, the 95% confidence interval is narrow because we had such large sample sizes.
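The contrast drawn in 5.3.28i between a difference in proportions and a relative risk can be made concrete with the counts from the heart attack and ulcer tables. This is a small sketch; `risk_summary` is an illustrative name, and the group placed first in each call is chosen so the relative risk is above 1, matching the values reported in 5.3.30 and 5.3.34.

```python
def risk_summary(events_a, n_a, events_b, n_b):
    """Return (difference, relative risk) comparing group A with group B."""
    risk_a, risk_b = events_a / n_a, events_b / n_b
    return risk_a - risk_b, risk_a / risk_b


# Heart attacks (5.3.27 table): placebo compared with aspirin.
heart_diff, heart_rr = risk_summary(189, 11034, 104, 11037)

# Ulcers (5.3.31 table): aspirin compared with placebo.
ulcer_diff, ulcer_rr = risk_summary(169, 11037, 138, 11034)

print(round(heart_diff, 4), round(heart_rr, 2))
print(round(ulcer_diff, 4), round(ulcer_rr, 2))
```

The differences are tiny in absolute terms (under one percentage point), yet the heart attack risk in the placebo group is roughly 1.8 times that in the aspirin group, which is why the choice of summary changes how large the effect appears.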
5.3.33 a. Rounding your final answer to two decimal places, you should get −1.78, the same standardized statistic as that in the applet. b. Rounding your final answer to four decimal places, you should get (−0.0059, 0.0003), the same confidence interval as that in the applet. 5.3.34 a. The relative risk is 1.22. b. Because we obtained a p-value larger than 0.05 in the previous exercise, we would expect 1 to be contained in the confidence interval. 5.3.35 a. π Florida − π NewYork = the difference in proportions of students in the populations of Florida and New York who prefer winter
b. Yes, because there are 149 and 76 successes, and 319 and 382 failures, which are all greater than 10 c. (0.098, 0.207) d. We are 95% confident that the proportion of Florida students who like winter is larger than the proportion of New York students who like winter by an amount that’s at least 0.098 and at most 0.207. 5.3.36 a. π 1974 − π 2014 = the difference in proportions of married people in the populations in 1974 and in 2014
b. Yes, because there are at least 10 successes and at least 10 failures in each year c. (0.285, 0.360)
d. We are 99% confident that the proportion of the population that’s married was higher in 1974 than in 2014 by an amount between 0.285 and 0.360.
5.3.37
a. Null, H0: Unbanded penguins are just as likely to be living after 4.5 years as banded penguins, π unbanded = π banded, versus Alternative, Ha: Unbanded penguins are more likely to be living after 4.5 years than banded penguins, π unbanded > π banded. b. Yes, because there are at least 10 successes and at least 10 failures in each group, banded and unbanded c. Standardized statistic = 3.01; p-value = 0.0013 d. We have very strong evidence (p-value = 0.0013) that unbanded penguins are more likely to be living after 4.5 years than banded penguins. Because this is a randomized experiment, we can determine that the bands caused the difference. We should be able to generalize to penguins that are like those in the study but should be cautious about generalizing further because this wasn’t a random sample from some larger population. 5.3.38 a. The null tells us that we need to fix the color of the cards and randomize which pile they are placed in. b. Under the null it doesn’t matter which label each observational unit gets, so we can shuffle and randomize which pile the cards are placed in.
End of Chapter 5 Exercises 5.CE.1 a. Yes, the sample sizes in the two groups can be different. While this makes it unreasonable to compare the number of successes in each group, computing the conditional proportions between the two groups allows for meaningful comparisons to be made.
b. Yes, small sample sizes alone are not enough to prevent comparisons being made. The challenge, of course, will be that you will need to have a very large difference in the groups to be convinced. Consider a sample size of 5 in each group, where all the observations in Group A are successes and all in Group B are failures. This is fairly compelling evidence of a difference in the groups (p-value = 0.008). 5.CE.2 a. Not small at all. There won’t be convincing evidence of a difference in the groups b. Very small because the difference in group proportions will be 1. 5.CE.3
5.CE.4 a. Explanatory: praised for intelligence or effort; response: lied or did not lie b. p̂ intelligence = 0.38 and p̂ effort = 0.13
c. Get 59 cards, 15 red and 44 black, shuffle and deal into two stacks of 29 and 30 representing children who were praised for intelligence and praised for effort, respectively. Find the proportion of red cards in each group, then compute the difference in those proportions. Repeat many times. Compare the actual difference in the proportions (0.246) to the simulated differences and find the proportion of times differences in proportions less than or equal to −0.246 or greater than or equal to 0.246 occurred in the simulations; that is the p-value.
d.
p-value is 0.071 e. There is a 0.071 chance of obtaining a value of 0.246 or larger (or −0.246 or smaller) by chance when the null hypothesis is true. f. We have moderate evidence that whether a child was praised for intelligence or effort is associated with the likelihood that they misrepresent their score. Because this is a randomized experiment, this association may be cause and effect. 5.CE.5
a. Observational study
b. Explanatory variable: minority group; response: type of coach c. See the following table.
First base   Minority   Nonminority   Total
Yes          20         10            30
No           7          23            30
Total        27         33            60
d. p-value is 0.001
e. We have strong evidence that minority coaches are more likely to be first base coaches than nonminorities. f. No, cause-and-effect conclusions are not possible. This is an observational study, not a randomized experiment. 5.CE.6 a. Experiment b. See the following table.
Sex of respondent   Lozenges   Placebo   Total
Male                197        184       381
Female              262        274       536
Total               459        458       917
c. See graph for Solution 5.CE.6c.
[Figure for Solution 5.CE.6c: segmented bar chart of sex of respondent (female, male) by treatment group (lozenges, placebo), vertical axis 0% to 100%.]
d. Not a strong association e. z = 0.84, p-value = 0.3982. We don’t have strong evidence of an association between treatment group and sex of respondent f. Yes, they do not want differences between the groups on sex of respondent. 5.CE.7 a. See the following table.
Abstained from smoking   Lozenges   Placebo   Total
Yes                      82         44        126
No                       377        414       791
Total                    459        458       917
b. See graph for Solution 5.CE.7b
[Figure for Solution 5.CE.7b: segmented bar chart of smoking outcome (failed to abstain, successfully abstained) by treatment group (lozenges, placebo), vertical axis 0% to 100%.]
10/16/20 9:11 PM
69
Solutions to Problems c. 0.083 d. 1.86
d. None of these intervals include zero, meaning that even at the 1% significance level there is evidence of a difference in the group proportions.
e. z = 3.63, p-value = 0.0003. We have strong evidence that the probability that someone will successfully abstain from smoking is higher if they receive nicotine lozenges compared to those that receive a placebo.
5.CE.14 a. We say we have very strong evidence of a difference in the proportion of students that own a credit card between 2001 and 2004 when really there is no difference.
f. Yes, people who took the lozenges were more likely to have abstained from smoking.
b. We don’t find evidence of a difference in the proportion of students who own credit cards between 2001 and 2004, but there really is a difference.
g. Yes, cause and effect is reasonable because this was a randomized experiment.
5.CE.15 a. The probability we find strong evidence of a difference when there is, in fact, a difference.
5.CE.8 a. (0.0383, 0.1267) b. People who took the lozenges were between 0.0383 and 0.1267 more likely to successfully abstain from smoking than those taking the placebo. 5.CE.9 a. (0.1436, 0.2136) b. Not great, only a 14% to 21% chance of successfully abstaining 5.CE.10 a. Random sampling only; random assignment of sex of respondent not possible
c. Decrease. Decreasing significance level decreases the Type I error rate, which raises the Type II error rate and decreases power. d. Increases power because it increases the strength of evidence. 5.CE.16 The percentage of males who consider life to be exciting in the sample is 52.8%; for women the percentage is only 48.6%, as shown in the accompanying bar chart.
a. z = 1.66, p-value = 0.0978 b. z = 3.7, p-value = 0.0002
This difference (0.042) is not statistically significant (p-value = 0.13; 95% CI on difference: (−0.013, 0.098)), meaning that we don’t have evidence of a difference in the proportion of men and women who think that life is exciting. 5.CE.17 The percentage of males who consider life too dull in the sample is 5.9%; for women the percentage is 4.6% as shown in the following segmented bar chart.
c. As sample size increases and the difference in proportions remains the same, the strength of evidence against the null hypothesis increases.
5.CE.12
b. z = 4.61, p-value < 0.0001. We have strong evidence of a difference in the proportion of undergraduate students that owned a credit card in 2004 and 2001.
a. Null hypothesis: There is no difference in the proportion of students that owned a credit card in 2004 and 2001. Alternative: There is a difference.
a. (0.0451, 0.0949) b. (0.0403, 0.0997) c. (0.031, 0.109)
5.CE.11
d. The sample size in each group
c. Null: The proportion of men who are satisfied with their attractiveness is the same as the proportion of women. Alternative: The proportions are different.
5.CE.13
b.
b. Larger sample size because it will increase the strength of evidence against a false null hypothesis.
This difference (0.013) is not statistically significant (p-value = 0.30; 95% CI on difference: (−0.012, 0.038)), meaning that we don’t have evidence of a difference in the proportion of men and women who think that life is dull. 5.CE.18 a. One-proportion z-test because we are looking to see whether the proportion of those who like one type of cereal (say, name brand) is different from 0.50 b. Two-proportion z-test because we are looking to see whether the proportion of females who like one type of cereal (say, name brand) is different from the proportion of males who like that type of cereal 5.CE.19 a. Two-proportion z-test because we are looking to see whether the proportion of children who like one type of chocolate (say, dark chocolate) is different from the proportion of adults who like that type of chocolate
                 LP   Restart   Total
Nonstuttering    65      65      130
Stuttering       20      26       46
Total            85      91      176
c. Relative risk = 0.765/0.714 = 1.07 d. The p-value for the test is 0.4468. e. 1.071 ± 2(0.09) or (0.891, 1.251) With a p-value of 0.4468 as well as a 95% confidence interval for relative risk containing 1 as a plausible value, we have weak evidence against the null hypothesis and conclude that it is plausible that there is no difference in the proportion of preschool children in the population who are non-stuttering between the two treatments (LP & RESTART). 5.CE.23
5.CE.20
a. H0: πLP − πRESTART = 0, there is no difference in the proportions of preschool children who are non-stuttering in the LP treatment group compared to the RESTART treatment group; Ha: πLP − πRESTART ≠ 0, there is a difference in the proportions of preschool children who are non-stuttering in the LP treatment group compared to the RESTART group.
a. π2014 − π1994 = difference in the proportions in the population who read the paper every day in 2014 compared to 1994
c. See table. p̂LP − p̂RESTART = 0.765 − 0.714 = 0.051
b. One-proportion z-test, because we are looking to see whether the proportion of those who like one type of chocolate (say, dark chocolate) is different from 0.50
b. Yes, the number of successes and the number of failures are both much larger than 10, with the smallest being 417. c. (−0.1781, −0.1410) d. We are 90% confident that the proportion of people in 2014 who read the newspaper every day is smaller than the proportion of people in 1994 who read the newspaper every day by an amount that is between 0.1410 and 0.1781. 5.CE.21 a. H0: π large − π small = 0, there is no difference in the proportions of students who take more than one piece when there is a large amount versus a small amount of candy in the bowl; Ha: π large− π small≠ 0, there is a difference in the proportions of students who take more than one piece when there is a large amount versus a small amount of candy in the bowl.
b. p̂LP = 0.765; p̂RESTART = 0.714
d. The p-value = 0.4931 e. 0.051 ± 2(0.067) or (−0.083, 0.185). With a p-value of 0.4931 as well as a 95% confidence interval for the difference in the proportion of nonstuttering preschool children in the LP group vs the RESTART group containing 0 as a plausible value, we have weak evidence against the null hypothesis and conclude that it is plausible that there is no difference in the proportion of preschool children in the population who are nonstuttering between the two treatments (LP & RESTART).
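The 2SE interval for the LP vs RESTART difference can be approximated from the table counts. This is a sketch under an assumption: it uses the unpooled formula-based standard error, whereas the 0.067 quoted in the solution presumably comes from the simulated null distribution, so the endpoints agree only approximately.

```python
import math

def diff_in_proportions_2se(x1, n1, x2, n2):
    """Return (difference, standard error, 2SE interval) for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error (an assumption; the solution's SE is slightly larger).
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - 2 * se, diff + 2 * se)

# Nonstuttering counts from the table: 65 of 85 (LP), 65 of 91 (RESTART).
diff, se, (lower, upper) = diff_in_proportions_2se(65, 85, 65, 91)
print(round(diff, 3), round(se, 3), round(lower, 3), round(upper, 3))
```

Because the interval contains 0, it leads to the same conclusion as the solution: no difference between the treatments remains plausible.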
                 LP   Restart   Total
Nonstuttering    65      65      130
Stuttering       20      26       46
Total            85      91      176
b. Yes, the number of successes and the number of failures are both larger than 10, with the smallest being 39.
Chapter 5 Investigation
c. z = −1.28; p-value = 0.2007
2. Experiment, because the Vitamin C and placebo pills are randomly assigned without the recipient or the physician knowing which is which
d. There is weak evidence (p-value = 0.2007) that the amount of candy in the bowl will affect whether people take more than one piece of candy as instructed. Because we don’t have significant results, we can’t determine cause and effect. This does not appear to be a random sample, so we can't really describe an appropriate population to generalize to. 5.CE.22 a. H0: πLP − πRESTART = 0, there is no difference in the proportions of preschool children who are nonstuttering in the LP treatment group compared to the RESTART treatment group; Ha: πLP − πRESTART ≠ 0, there is a difference in the proportions of preschool children who are nonstuttering in the LP treatment group compared to the RESTART group. b. p̂LP = 0.765; p̂RESTART = 0.714. See table.
1. Does Vitamin C help prevent colds?
3. The skiers 4. Whether they received Vitamin C or placebo and whether or not they contracted a cold. 5. πvitamin C = probability of contracting a cold while taking a daily Vitamin C (or population proportion contracting a cold while taking a daily Vitamin C); πplacebo = probability of contracting a cold while taking a daily placebo pill (or population proportion contracting a cold while taking a daily placebo pill) 6. WORDS The probability of contracting a cold while taking a daily Vitamin C tablet is the same as the probability of contracting a cold while taking a daily placebo pill.
The probability of contracting a cold while taking a daily Vitamin C tablet is less than the probability of contracting a cold while taking a daily placebo pill.
where π is the probability of contracting a cold. 7.
b. Double blind: Neither the subject nor the administrator knows which treatment the subject is receiving. c. Placebo controlled: One of the treatment groups receives a placebo (made from inert ingredients) to have a base improvement to compare the other treatment groups to. Any improvements made beyond those of the placebo group are what the study is interested in. 19. The incidence rates are 0.084 (Vitamin C) and 0.086 (placebo).
           Vitamin C   Placebo   Total
No cold       122        109      231
Cold           17         31       48
Total         139        140      279
The probability of a major CV event while on Vitamin C is the same as the probability of a major CV event while on a placebo. The probability of a major CV event while on Vitamin C is less than the probability of a major CV event while on a placebo.
8. Vitamin C incidence is 17/139 = 0.122; placebo incidence is 31/140 = 0.221.
9. Difference in incidence rates is 0.099. This is large enough that it may signify that Vitamin C helps reduce colds.
where π is the probability of a major CV event.
10. 0.099
20. Because the participants were male physicians aged 50 and older, that is the population to which we can generalize.
11. Used the Two Proportions applet to carry out a test of significance.
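The applet's randomization test can be imitated with a short simulation. The sketch below is an assumption about how such a test works in general (pool the outcomes, shuffle, re-deal into groups of 139 and 140), not a description of the applet's internals; the function name, seed, and number of repetitions are arbitrary choices.

```python
import random

def randomization_p_value(colds_a, n_a, colds_b, n_b, reps=2000, seed=1):
    """One-sided p-value: proportion of shuffles with a difference <= observed."""
    rng = random.Random(seed)
    total_colds = colds_a + colds_b
    # 1 = contracted a cold, 0 = no cold, pooled across both groups.
    outcomes = [1] * total_colds + [0] * (n_a + n_b - total_colds)
    observed = colds_a / n_a - colds_b / n_b
    count = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        sim_a = sum(outcomes[:n_a])      # colds dealt to the first group
        sim_b = total_colds - sim_a      # the rest go to the second group
        if sim_a / n_a - sim_b / n_b <= observed + 1e-12:
            count += 1
    return observed, count / reps

# Vitamin C: 17 colds of 139; placebo: 31 colds of 140.
observed, p_value = randomization_p_value(17, 139, 31, 140)
print(round(observed, 3), p_value)
```

The simulated p-value varies with the seed and the number of shuffles, so it should land near, but not exactly on, the 0.026 reported in item 14.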
Chapter 5 Research Article
12. You are simulating the null hypothesis. 13. The null distribution is bell-shaped and symmetric. Its center is located at zero. This makes sense because it is a distribution of differences in proportions that have been calculated through a simulation that assumes the null hypothesis is true. When the null is true the proportions are the same, hence the difference would be zero. 14. The p-value is 0.026. This offers strong evidence against the null and in support of the alternative that the probability of contracting a cold while taking a daily Vitamin C is less than the probability of contracting a cold while taking a daily placebo. 15. We can’t generalize our conclusion to the entire human population because the 279 skiers are not a random sample from the entire population, nor are they representative of the entire population. These are skiers at a resort, who are probably more physically fit than the general population, are probably from a higher socioeconomic class, from which follows that they have better health care, and are more likely to be Euro-Americans as skiing is not as popular outside of Europe and North America. 16. We can make a cause-and-effect conclusion because the Vitamin C and placebo tablets were randomly assigned to the skiers. 17. We wondered whether taking a daily Vitamin C tablet would prevent colds better than just taking a daily placebo tablet. In a double-blind experiment, 279 skiers were randomly assigned to take a daily placebo tablet or Vitamin C tablet and it was recorded whether or not they onset with a cold. In the Vitamin C group 12.2% onset with a cold, and 22.1% in the placebo group onset with a cold, for a difference of 9.9 percentage points. A test of significance using a randomization method was used to compare the proportions. 
With a p-value of 0.026 we can say that we have strong evidence against the null and in support of the alternative hypothesis that daily Vitamin C tablets prevent colds better than daily placebo tablets. We can say that the cause of the lower cold incidence in the Vitamin C group was due to the daily Vitamin C as it was a randomized experiment. We can’t generalize this to the entire population as the subjects were not representative of the entire population. Further studies should look at a random sample of the population of interest. Another possibility would be to run the experiment at different times of the year to see if the results are the same in different seasons. 18. a. Randomized: Subjects are randomly assigned to treatment groups.
1. The researchers are investigating potential relationships between career success, cause of death, and length of life. 2. Some prior research has shown high performers in various fields tend to have shorter life spans (Redelmeier and Kwong, 2004), whereas other prior research has argued that performance-based success and fame usually translate into health advantages (Redelmeier and Singh, 2001). 3. 999, after dropping one duplicate in their initial sample of 1000 4. Occupation/career (explanatory) and cause of death (response) 5. Male: 813/999 or 81.4%; female: 186/999 or 18.6% 6. Average at death for males: 80.35; for females: 78.80 7. Null: The probability that a male whose obituary is in the NY Times dies before age 70 is the same as the probability that a female dies before age 70. Alternative: The probability that a male whose obituary is in the NY Times dies before age 70 is different than the probability that a female dies before age 70. Because the p-value is < 0.02, there is strong evidence that the probability a male whose obituary is in the NY Times dies before age 70 is different than the corresponding probability for females. 8. Null: The probability that a male whose obituary is in the NY Times had a career in performance/sports is the same as the probability that a female had a career in performance/sports. Alternative: The probability that a male whose obituary is in the NY Times had a career in performance/sports is different than the probability that a female had a career in performance/sports. Because the p-value is < 10−5, there is strong evidence that the probability a male whose obituary is in the NY Times had a career in performance/sports is different than the corresponding probability for females. 9. Null: The probability that a male whose obituary is in the NY Times died from lung cancer is the same as the probability that a female died from lung cancer. 
Alternative: The probability that a male whose obituary is in the NY Times died from lung cancer is different than the probability that a female died from lung cancer. Because the p-value is 0.005, there is strong evidence that the probability a male whose obituary is in the NY Times died from lung cancer is different than the corresponding probability for females. 10. The p-value is likely below 0.05.
11. No, the sample is not a random sample. They selected all obituaries between 2009 and 2011. This is not a random sample of all NY Times obituaries, or even recent NY Times obituaries. It is a convenience sample.
13. Females were significantly overrepresented in the NY Times performance/sports category. People in this category tend to have shorter lifespans. Females were underrepresented in longer-lived fields of NY Times interest such as professionals/academics.
12. The researchers could have expanded the scope of their analysis to different eras in time to see if the trends hold up in different eras. Another area of improvement would be to look at obituaries of “common” individuals (say in a typical local paper) and compare the patterns of association between the two different types of obituaries/individuals.
14. The researchers are essentially saying that cause–effect conclusions cannot be drawn from their study because of potential confounding variables (e.g., complex variables that predispose to risk taking) which are present because this is an observational study.
CHAPTER 6
Comparing Two Means
e. Raleigh will be larger because, visually, the data are much more spread out.
Section 6.1 6.1.1 B.
6.1.14
6.1.2 A. (The upper quartile could equal the lower quartile.)
a. SF: 57, Raleigh: 59.5. With regards to median average monthly temperature, the two cities are quite similar.
6.1.3 B. 6.1.4 A, D, and E.
b. Raleigh: lower quartile 46.5, upper quartile 72.5, IQR 72.5 – 46.5 = 26; SF: lower quartile 52.5, upper quartile 62.5, IQR 10. With regards to IQR, the variability in monthly average temperature in Raleigh is much larger than in San Francisco.
6.1.5 A. 6.1.6 B, D. 6.1.7 B.
6.1.15 The shape of the female study hours distribution is right skewed, while the distribution for males is more symmetric. In particular, no males report more than 20 hours of studying per week, while a number of females report more than 20 (as many as 45 hours) of studying per week. Thus, the spread of the female distribution is quite a bit larger than for males. Finally, the center of the female study hour distribution is closer to 15–20 hours per week, compared to only approximately 10 hours per week for males.
6.1.8 B, D. 6.1.9 a. All five numbers will increase by 5; the IQR will not change because it is a measure of variability—the difference in the 3rd and 1st quartiles will not change. b. The maximum will change; there will be little to no impact on the other four values in the five-number summary or on the IQR. c. The minimum will change; there will be little to no impact on the other four values in the five-number summary or on the IQR.
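The claims in 6.1.9 can be checked numerically. The sketch below assumes quartiles are computed as the medians of the lower and upper halves of the sorted data (one common textbook convention; software may differ), and the data values are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, the overall median is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

def iqr(values):
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1

heights = [60, 63, 64, 65, 66, 66, 67, 67, 68, 72]   # hypothetical data
shifted = [h + 5 for h in heights]                    # part a: add 5 to every value
bigger_max = heights[:-1] + [720]                     # part b: change only the maximum

print(five_number_summary(heights), iqr(heights))
print(five_number_summary(shifted), iqr(shifted))
print(five_number_summary(bigger_max), iqr(bigger_max))
```

Adding a constant moves all five numbers by that constant and leaves the IQR alone, while changing only the largest value moves only the maximum.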
6.1.16
6.1.10 a. 66
b. Males: 11, females: 14. The variability in study hours per week is larger for females.
b. Q1 = 64.5, Q3 = 67
c. See the following boxplots and dotplots
a. Males: 10, females: 15. Females tend to study more than males.
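c. IQR = 2.5 d. See the following boxplot and dotplot
[Boxplot and dotplot for Solution 6.1.10d; axis: Observation, 60 to 72]
[Boxplots and dotplots for Solution 6.1.16c, males: M (n = 23), median = 10, IQR = 15 – 4 = 11]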
6.1.11 a. 25th value is 7 and 26th value is 8, so median is 7.5 b. Lower quartile is 6; upper quartile is 12 c. Larger because the data are right-skewed 6.1.12 Mean: smaller, median: same, SD: smaller, IQR: same 6.1.13 a. The months b. Average monthly temperature; quantitative c. Yes, the data are fairly symmetric. d. Yes
[Boxplots and dotplots for Solution 6.1.16c, females: F (n = 47), median = 15, IQR = 20 – 6 = 14; axis: Observation, 0 to 50]
6.1.17 Minimum, 25th percentile, median, 75th percentile, and maximum 6.1.18 IQR 6.1.19 a. 3, 10, 30, 87.5, 250 b. Skewed right 6.1.20 There are no low outliers, but two high outliers: 200 and 250. 6.1.21 a. Right skewed b. Mean larger than median c. Median = 4
d. Mean is larger, confirming the earlier answer e. First quartile = 3, third quartile = 5, so the five-number summary is: Min = 1, Q1 = 3, median = 4, Q3 = 5, and Max = 11
c. Median: 30 in each group; not as much because medians are so similar 6.1.30
6.1.22
a. Right
a. Min = −0.50, first quartile = 0.50, median = 1.25, third quartile = 2, Max = 4
b. The space between the five numbers in the five-number summary increases as you move from min towards max.
b. 2 − 0.50 = 1.50
6.1.31 a. Explanatory: shelf level, response: sugar content
c. No change to IQR; would increase standard deviation 6.1.23 a. The Mounds candy bars have a lower center and are less variable compared to PayDay. b. Mounds has lower mean and lower standard deviation. c. Yes, the distributions are different. 6.1.24 a. Mounds: Min = 19.1, Q1 = 20, median = 20.2, Q3 = 20.55, Max = 21.5; PayDay: Min = 19.5, Q1 = 21.25, median = 22.2, Q3 = 22.8, Max = 23.3 b. See boxplots for Solution 6.1.24b. 6.1.25
b. The distributions suggest less sugar content on high shelves (an association). c. Mean, low: 11.925; mean, high: 9.625—supports an association because they are different d. Median, low: 13; median, high: 9—supports an association because they are different e. The medians are farther apart than the means are. 6.1.32 a. Not much, the distributions look quite similar b. One peak represents the male heights and the other peak represents the female heights. c. Mean = 67.657 (no); mean = 66.688 (yes)
a. 85 seconds is Wendy’s, 173 seconds is Hot ’n Now. b. The IQR for Hot ’n now is 116.5 seconds, for Wendy’s it is 75 seconds. c. For both restaurants the mean will be larger than the median because the data are right skewed. 6.1.26 a. Hot ’n Now mean is 203 seconds, Wendy’s is 93.7 seconds. b. SD of Hot ’n Now is 89.6 seconds, for Wendy’s it is 46.7 seconds. 6.1.27 a. Females b. Median is smaller due to right-skew. c. Males have larger SD but smaller IQR because there are some extreme values for the male data but not for the female data. 6.1.28
d. No, median is 67 in both groups e. No, can’t draw cause-and-effect conclusions from an observational study
6.1.33 a. Christina: 1, 2, 2.5, 4, 7; Christopher: 1, 1.25, 2, 3, 6 b. Yes, the median evacuation score for Christopher is lower than the median evacuation score for Christina. 6.1.34 a. Median = 10; Q1 = 4.5; Q3 = 14; IQR = 9.5 The middle 50% of the number of cups of coffee consumed in a week spans a range of 9.5 cups. b. There are no outliers according to the 1.5 outlier rule. c. See boxplot of cups of coffee.
a. Right b. Mean larger c. No because the distributions look fairly similar 6.1.29
a. Yes, maybe, because the distribution of exercise seems more skewed for the no-car-crash group than the car-crash group b. Mean: 57.037 (no crashes), 32.375 (crashes); yes, the means are different
[Boxplot of cups of coffee for Solution 6.1.34c; axis: Cups]
d. The distribution of cups of coffee per week is slightly skewed to the larger numbers (right).
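The 1.5 IQR outlier rule used in 6.1.34b reduces to two fences computed from the quartiles. A minimal sketch, using the quartiles reported in part a (the function name is illustrative):

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Values below the first fence or above the second are flagged as outliers."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

low, high = outlier_fences(4.5, 14)   # Q1 = 4.5, Q3 = 14 from 6.1.34a
print((low, high))                    # (-9.75, 28.25)
```

Only a weekly count above 28.25 cups would be flagged, since the lower fence is negative; this is consistent with the solution's statement that there are no outliers.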
[Boxplots for Solution 6.1.24b: Mounds (n = 20) and PayDay (n = 20); axis: Weight (g)]
6.1.35 a. Median number of flip-flops for males is 1 and median number of flip-flops for females is 6. b. There are a large number of males who have only one pair of flip-flops, thus the minimum value, first quartile, and median value are all equal to 1.
students who took notes on their computer compared to students who took notes on paper. Scores were compact and symmetric about the median for students who took notes on paper and the scores for those who took notes on a computer were in two clumps (bimodal) with one unusually low score.
c. Using the 1.5 IQR rule, there are two males who are outliers at 8 and 10. There are no outliers in the female flip-flop data. 6.1.36 a. Females have a higher median haircut cost at $35 compared to males at $19.
b. Females have more variability in haircut costs. c. The median and third quartile values for males are so close, $19 and $20 respectively, that it is hard to distinguish these two lines in the boxplot. d. Using the 1.5 IQR rule, the males have outlier haircut costs below $3.75 and above $29.75; and females only have high outliers above $93.50. 6.1.37 a. Experiment b. Explanatory variable is the presence of music, it is categorical. c. Response variable is the number of words memorized; it is quantitative. d. See graphs below. The median is higher for the no music group, 8.5, compared to the music group, 7. The standard deviations for both groups are roughly the same; however, the data for the no music group is very compact about the median with a few low outliers and one unusually high outlier, whereas the data for the music group is fairly evenly spread.
[Dotplots for Solution 6.1.38d: Computer (n = 20) and Paper (n = 20); axis: Quiz score, 0 to 10]
e. Students who took notes on paper tended to have higher quiz scores compared to those who took notes on their computer, so there is preliminary evidence to support the students’ conjecture that there is a difference in scores for the different note-taking methods. 6.1.39 When the data are strongly skewed (long tails) or when you care about the center of the distribution. 6.1.40 D. 6.1.41 True 6.1.42 IQR
Section 6.2 6.2.1 A, B, C, and D are all appropriate. 6.2.2 A.
6.2.3 B, C. 6.2.4 D. 6.2.5 a. Observational study; there is no randomization of treatment.
[Dotplots for Solution 6.1.37d: No (n = 18) and Yes (n = 18); axis: Number of words]
e. As the median number of words memorized is higher for the no music group, there is preliminary evidence for the students’ conjecture that music would interfere with the memorization task. 6.1.38 a. Experiment b. Explanatory variable is type of note-taking; it is categorical. c. Response variable is the quiz score; it is quantitative. d. See graphs below. Students who took notes on paper had a higher median score, 7, compared to students who took notes on their computer, median was 5. There was more variability in the scores from
b. Each student c. Explanatory: sex (categorical); response: study hours (quantitative) d. B. e. E. 6.2.6 a. Observational study; there is no randomization of treatment. b. Each student c. Explanatory: gender (categorical); response: number of Facebook friends (quantitative) d. B. e. A. f. B, C.
g. C.
b. Statistics students
h. See the following graph
c. Explanatory: sex, (categorical); response: number of flip-flops (quantitative)
[Null distribution for Solution 6.2.6h: shuffled differences in means; Mean = 0.170, SD = 97.891]
d. B. e. A.
f. B, C. g. C. h. See the following graph
[Null distribution for Solution 6.2.11h: shuffled differences in means; Mean = −0.010, SD = 1.214]
6.2.7 b. There is moderate to strong evidence against the null hypothesis. 6.2.8 a. The probability of obtaining a difference in the average number of Facebook friends for men and women of 188.7 or larger or −188.7 or smaller by chance (if the null is true) is approximately 0.052. b. There is moderate to strong evidence against the null hypothesis. 6.2.9
a. 3.68/1.214 = 3.03. The mean of 3.68 is 3.03 standard deviations above the mean of the null distribution. b. We have very strong evidence against the null hypothesis. 6.2.13 a. The probability of obtaining a difference in the average number of flip-flops for men and women of 3.68 or larger or −3.68 or smaller by chance (if the null is true) is approximately 0.001. b. We have very strong evidence against the null hypothesis.
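The standardized statistic in 6.2.12a, and the 2SD interval that 6.2.14a builds from the same two numbers, amount to the following arithmetic. This is a small sketch with illustrative function names; the inputs are the observed difference (3.68) and the SD of the null distribution (1.214) from the solutions.

```python
def standardized_statistic(observed, null_sd, hypothesized=0.0):
    """Number of null-distribution SDs between the observed and hypothesized values."""
    return (observed - hypothesized) / null_sd

def two_sd_interval(observed, null_sd):
    """Approximate 95% interval: observed statistic plus or minus 2 SDs."""
    margin = 2 * null_sd
    return observed - margin, observed + margin

z = standardized_statistic(3.68, 1.214)
lower, upper = two_sd_interval(3.68, 1.214)
print(round(z, 2), round(lower, 2), round(upper, 2))
```

A standardized statistic above 3 in absolute value is what the solution describes as very strong evidence against the null hypothesis.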
a. 188.7 ± 2 × 97.89 = 188.7 ± 195.8 = (−7.1, 384.5) b. We are 95% confident that women have between −7.1 and 384.5 more Facebook friends on average than men do. 6.2.10 a. See following graph; differences in sample medians, which is 532 − 485 = 47
[Null distribution for Solution 6.2.10a: shuffled differences in medians; Mean = −15.007, SD = 101.096]
6.2.14 a. 3.68 ± 2 × 1.214 = 3.68 ± 2.43 = (1.25, 6.11) b. We are 95% confident that females own between 1.25 and 6.11 more flip-flops on average than males in the population. c. Yes, the confidence interval does not include 0. 6.2.15 a. See the following graph
[Null distribution for Solution 6.2.15a: shuffled differences in medians; Mean = 0.014, SD = 2.348]
6.2.12
a. 188.7/97.89 = 1.93
b. 47/101.098 = 0.46 6.2.11 a. Observational study; no randomization of sex
b. 5/2.348 = 2.13 6.2.16 a. Students b. Explanatory: anchor (Chicago or Green Bay, categorical); response: guess of the population size of Milwaukee, WI (quantitative) c. Experiment; randomly assigned anchoring city. d. No, just students in an introductory statistics class
e. Yes, randomly assigned to one of the two anchor groups f.
             Sample size   Sample mean   Sample SD
Chicago          35          1,357.3       802.2
Green Bay        34            271.4       371
g. 1357.3 − 271.4 = 1,085.9 h. The difference happened just by chance and the true average guesses in each anchor group are the same. There is actually a difference between the average guesses between the two groups.
thousand larger with the Chicago anchor than with the Green Bay anchor. This demonstrates a potential cause and effect due to the random assignment, but caution should be taken when generalizing this result beyond the study participants as a random sample was not taken. 6.2.20 a. The observational units are the obituaries, the explanatory variable is whether or not they had children, and the response variable is the age of death. b. Observational study because the researcher did not assign variable values to the observational units. c.
                Sample size   Sample mean   Sample SD
Had children        70           78.43        14.36
No children         20           63.9         25.81
6.2.17 a. The long-run average guess of the population size of Milwaukee if given the Green Bay anchor (μGreen Bay); the long-run average guess of the population size of Milwaukee if given the Chicago anchor (μChicago) b. Null: The long-run average guess of the population size of Milwaukee when given the Green Bay anchor is the same as with the Chicago anchor. Alternative: The long-run average guess of the population size of Milwaukee when given the Green Bay anchor is smaller than when given the Chicago anchor. c. Null: μGreen Bay = μChicago; Alternative: μGreen Bay < μChicago d. Write out all 69 student guesses on 69 separate slips of paper. Shuffle the papers and deal into two stacks (35 and 34, respectively). Find the difference in the averages of the two stacks. Repeat many times and count what proportion of the time values less than −1,085.9 occur in the simulated data. 6.2.18 a. The p-value is 0, meaning that in 1000 simulations we never got a value of −1085.9 or smaller. b. There is very strong evidence against the null hypothesis. c. −1,085.9/202.7 = −5.36; the observed difference in the average guesses for Milwaukee’s population between the Green Bay anchored group and the Chicago anchored group is 5.36 standard deviations below the hypothesized difference of 0. d. −1,085.9 ± 2 × 202.7 = −1,085.9 ± 405.4 = (−1,491.3, −680.5) e. Yes, 0 is not in the interval. f. We have very strong evidence that the long-run average guess of the population size of Milwaukee, Wisconsin is between 680.5 and 1491.3 thousand larger with the Chicago anchor than with the Green Bay anchor. This demonstrates a potential cause and effect due to the random assignment, but caution should be taken when generalizing this result beyond the study participants because a random sample was not taken. 6.2.19 a. The p-value is 0, meaning that in 1,000 simulations we never got a value of −1,351 or smaller. b. There is very strong evidence against the null hypothesis. c. 
−1,351/359 = −3.76; the observed difference in the median guesses for Milwaukee’s population between the Green Bay anchored group and the Chicago anchored group is 3.76 standard deviations below the hypothesized difference of 0. d. −1,351 ± 2 × 359 = −1351 ± 718 = (−2,069, −633) e. Yes, 0 is not in the interval. f. We have very strong evidence that the long-run median guess of the population size of Milwaukee, Wisconsin is between 633 and 2,069
d. 78.43 − 63.9 = 14.53 e. There is a difference in the population mean lifespans of the two groups. There isn’t a difference in the population mean lifespans, and the observed difference happened by chance. 6.2.21 a. The population average lifespan of men with children (μhad); the population average lifespan of men without children (μnone) b. Null: The population average lifespan of men with children is the same as without. Alternative: The population average lifespan of men with children is longer than without. c. Null: μhad = μnone ; Alternative: μhad > μnone d. Write out all 90 lifespans on 90 separate slips of paper. Shuffle the papers and deal into two stacks (70 and 20, respectively). Find the difference in the averages of the two stacks. Repeat many times and count what proportion of the time values greater than 14.53 occur in the simulated data. 6.2.22 a. The p-value is approximately 0, meaning that we never got values of 14.529 or larger in the simulation. b. We have very strong evidence against the null hypothesis. c. 14.529/4.7 = 3.1; the observed difference in the average lifespan of men with children and men without children is 3.1 standard deviations above the hypothesized difference of 0. d. 14.529 ± 9.4 = (5.1, 23.9). We are 95% confident that the men with children lived on average between 5.1 and 23.9 years longer than those without children. e. Yes, 0 is not in the interval. f. We have strong evidence that the average life span of men with children is longer than those without; between 5.1 and 23.9 years longer on average. This is not a cause-and-effect conclusion (necessarily) but can be generalized to men with obituaries in the San Luis Obispo Tribune in 2012. 6.2.23 a. The p-value is approximately 0.009, meaning that we rarely got values of 16.5 or larger in the simulation. b. We have very strong evidence against the null hypothesis (that the medians are the same in the two groups). c. 
16.5∕6 = 2.75; the observed difference in the median lifespan of men with children and men without children is 2.75 standard deviations above the hypothesized difference of 0.
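The slips-of-paper procedure described in 6.2.21d (pool the values, shuffle, deal into two stacks, record the difference in averages, repeat) can be sketched in code. The raw lifespans are not given in these solutions, so the two lists below are hypothetical, as are the function name, seed, and number of repetitions.

```python
import random
from statistics import mean

def shuffle_test_means(group_a, group_b, reps=2000, seed=1):
    """One-sided shuffle test: how often is a shuffled difference >= observed?"""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    shuffled_diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)   # re-deal the slips into two stacks
        shuffled_diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    p_value = sum(d >= observed for d in shuffled_diffs) / reps
    return observed, shuffled_diffs, p_value

# Hypothetical lifespans (years), not the data from the exercise.
had_children = [82, 75, 90, 68, 79, 85, 88, 71, 77, 93]
no_children = [60, 45, 72, 58, 66, 80]
observed, diffs, p_value = shuffle_test_means(had_children, no_children)
print(round(observed, 1), round(mean(diffs), 2), p_value)
```

The list of shuffled differences plays the role of the null distribution: its center should sit near 0, and its SD is the number the solutions divide by to get a standardized statistic.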
d. 16.5 ± 2 × 6 = (4.5, 28.5). We are 95% confident that the difference in population medians is between 4.5 and 28.5.
6.2.27
e. Yes, 0 is not in the interval.
b. Any value between 0 and infinity
f. We have very strong evidence of a difference in the medians; in fact, men with children have a median lifespan between 4.5 and 28.5 years longer than men without children. Although not necessarily a cause-and-effect relationship, the result can be generalized to men with obituaries in the San Luis Obispo Tribune in 2012. 6.2.24 a. Observational study: Researchers did not assign variable values to the fish. b. Each fish c. Explanatory: type of tuna (categorical), response: mercury level (quantitative)
a. Yes, the SD for carbon is higher. c. 6.25/4.89 = 1.28 d. Write the 56 commute times on 56 slips of paper. Shuffle and deal into two piles of 26 and 30 each. Compute the SD of each pile and then compute the ratio of the two SDs (SD of pile of 26 over SD of pile of 30). Repeat many times and see how often a ratio of 1.28 or larger is obtained in the simulations. e. 1, because having the SDs be equal is what happens if the null hypothesis is true f. See the following graph
d. The average mercury level in albacore tuna (μalbacore); the average mercury level in yellowfin tuna (μyellowfin) e. Null: The average mercury level in albacore tuna is the same as in yellowfin tuna (μalbacore = μyellowfin). Alternative: The average mercury level in albacore tuna is different than in yellowfin tuna (μalbacore ≠ μyellowfin). f. Difference in sample means (0.357 − 0.354 = 0.003) g. The p-value is 0.94. We have little to no evidence of a difference in average mercury levels between the two types of tuna. h. 0.003 ± 2 × 0.036 = (−0.069, 0.075). We are 95% confident that the difference in average mercury levels between the two types of tuna is between −0.069 and 0.075. i. We have little to no evidence of a difference in average mercury levels between yellowfin and albacore tuna (95% CI: −0.069, 0.075). This result can generalize to the populations of fish being sampled, because random samples were used. A cause-and-effect conclusion would not have been possible here because this was not a randomized experiment. 6.2.25
0.6
0.8
1.0
1.2 Ratio
1.4
1.6
1.8
g. B. h. D. 6.2.28
f. Difference in medians
a. The explanatory variable is the note-taking method (categorical) and the response is the quiz score (quantitative).
g. The p-value is 0.16. We have little to no evidence of a difference in medians between the two types of tuna.
b. Since the note-taking method was randomly assigned, it is an experiment.
h. 0.049 ± 2 × 0.034 = 0.049 ± 0.068 = (−0.019, 0.117)
c. Null: There is no association between the note-taking method and the quiz score (μpaper = μcomputer). Alternative: Taking notes on paper will result in higher quiz scores, on average (μpaper > μcomputer). _ _ d. xpaper = 6.92; xcomputer = 5.50
i. We have little to no evidence of a difference in median mercury levels between the two types of tuna (95% CI: −0.019, 0.117). This result can generalize to the populations of fish being sampled because random samples were used. A cause-and-effect conclusion would not have been possible here because this was not a randomized experiment. 6.2.26 a. The 56 bike trips b. The explanatory variable is the bike (carbon or steel), which is categorical, and the response is the commute time, which is quantitative. c. Null: There is no association between the type of bike and the commute time (μcarbon = μsteel). Alternative: There is an association between the type of bike and the commute time (μcarbon ≠ μsteel). _ _ d. xcarbon = 108.34 min; xsteel = 107.81 sec; the steel frame bike is faster, on average e. p-value ≈ 0.72 f. We do not have strong evidence that there is difference in mean commute times for the two types of bikes, in the long run.
c06Solutions.indd 78
e. p-value ≈ 0.003 f. We have strong evidence that taking notes on paper will result in higher quiz scores, on average. We can determine the note-taking method caused the difference because it was a randomized experiment. We can generalize to students like those that were in the study, but is difficult to be more broad than that. 6.2.29 a. 1.425 ± 2(0.550) = (0.325, 2.525) b. Taking notes on paper will result in quiz scores that are 0.325 to 2.525 points higher, on average, in the long run. c. Yes there is strong evidence of a mean difference in quiz scores between the two types of note-taking groups because 0 is not included in the interval. 6.2.30 a. 1.425/0.550 = 2.59
10/16/20 9:09 PM
b. The difference between the average quiz scores (taking notes on paper – taking notes on computer) of 1.425 is 2.59 standard deviations above the mean of the null distribution. c. Yes, there is strong evidence of a difference in the mean quiz scores between the two types of note-taking groups because the standardized statistic is more than 2.
6.2.31 a. The explanatory variable is the presence of music (categorical) and the response variable is the number of words memorized (quantitative). b. It is an experiment because the subjects were randomly assigned to listen to music or not. c. Null: There is no association between the presence of music and the number of words memorized (μmusic = μno music). Alternative: The presence of music hinders the number of words memorized (μmusic < μno music). d. x̄music = 7.00 and x̄no music = 8.61 e. p-value ≈ 0.11 f. We do not have strong evidence that the presence of music hinders the number of words memorized, in the long run.
6.2.32 a. –1.611 ± 2(1.266) = (–4.143, 0.921) b. The number of words memorized with music is between 4.143 lower and 0.921 higher, on average, than when memorizing without music. c. No, there is not strong evidence of a difference in the mean number of words memorized between the music and no music groups because 0 is included in the interval.
6.2.33 a. –1.611/1.266 = –1.27 b. The difference between the average number of words memorized (with music – without music) of –1.611 is 1.27 standard deviations below the mean of the null distribution. c. No, there is not strong evidence of a difference in the mean number of words memorized between the music and no music groups because the standardized statistic is between –2 and 2.
6.2.34 a. The explanatory variable is the type of elephant (categorical) and the response is the distance traveled (quantitative).
b. This is an observational study because there was no random assignment of the type of elephant. c. Null: There is no association between the type of elephant and the distance traveled (μAfrican = μAsian). Alternative: There is an association between the type of elephant and the distance traveled (μAfrican ≠ μAsian). d. x̄African = 5.40 km and x̄Asian = 5.30 km e. p-value ≈ 0.89. We do not have strong evidence that there is an association between the type of elephant and distance traveled, in the long run.
6.2.35 a. 0.099 ± 2(0.663) = (–1.227, 1.425) b. The distance traveled by African elephants is between 1.227 km per day less and 1.425 km per day more, on average, than Asian elephants in the long run. c. No, there is not strong evidence of a difference in the mean distance walked between the Asian and African elephants, because 0 is included in the interval.
6.2.36 a. 0.099/0.663 = 0.15 b. The difference between the average distance traveled between elephants (African – Asian) of 0.099 is 0.15 standard deviations above the mean of the null distribution. c. No, there is not strong evidence of a difference in the mean distance walked between the Asian and African elephants because the standardized statistic is less than 2.
6.2.37 a. The explanatory variable is the type of animal (categorical) and the response is the time it takes to get to the food (quantitative). b. This is an observational study because there was no random assignment of the type of animal. c. Null: There is no association between the type of animal and time it takes to get to the food (μdog = μwolf). Alternative: There is an association between the type of animal and the time it takes to get to the food (μdog ≠ μwolf). d. x̄dog = 33.27 sec; x̄wolf = 22.41 sec e. p-value ≈ 0.12 f. We do not have strong evidence that there is an association between the type of animal and the time it takes to get to the food, in the long run.
6.2.38 a. Null: There is no association between the type of animal and the time it spends at the fence looking at the food (μdog = μwolf). Alternative: There is an association between the type of animal and the time it spends at the fence looking at the food (μdog ≠ μwolf). b. x̄dog = 19.12 sec; x̄wolf = 10.01 sec c. p-value ≈ 0.07; Although we have moderate evidence, we do not have strong evidence that there is an association between the type of animal and the time it spends at the fence looking at the food, in the long run. d. 9.104 ± 2(5.096) = (–1.088, 19.296); yes, because the confidence interval contains 0
6.2.39 True
Section 6.3
6.3.1 B.
6.3.2 D.
6.3.3 A.
6.3.4 A.
6.3.5 a. The difference in the population mean number of hours spent online daily, in particular how many more hours are spent by females than males b.
i. No change. ii. No change. iii. Will change in sign (negative to positive or vice versa). iv. No change. v. Will change in sign. vi. No change. vii. Will change in sign. viii. Both will change in sign. ix. No change.
c. Yes, the exact same conclusion will be reached about strength of evidence against the null hypothesis. The CI will be different because
it is estimating a different parameter but will mean the same thing in terms of whether males or females are, on average, spending more hours online each day. 6.3.6 a. Incorrect b. Incorrect c. Incorrect d. Incorrect e. Incorrect f. Correct g. Incorrect h. Incorrect i. Correct 6.3.7 B and E. 6.3.8 Null: The population average VDAS score for early birds is the same as the population average VDAS score for night owls. Alternative: The population averages are different for the two groups. As the sample sizes in each group are over 20 and the data within each group are not strongly skewed, a theory-based approach is appropriate here. The p-value for this test is < 0.0001, with a t-statistic of −5.06, meaning that we have very strong evidence that the population averages are different. In particular, the early bird group has between 0.5298 and 1.2102 lower VDAS scores on average (95% confidence) than the night owl group. We cannot infer cause and effect from this study because students were not randomly assigned to a sleep group. Furthermore, we cannot generalize to a broader population because this is not a random sample—more information about the characteristics of the students in the sample is needed to infer more broadly. 6.3.9 The test statistic (t) would have been even farther from 0, the p-value would be even smaller (stronger evidence against null), and the confidence interval would be centered on the same value but narrower. 6.3.10 The test statistic (t) would have been closer to 0, the p-value would be larger (weaker evidence against the null), and the confidence interval would be centered at a different value but would be the same width. 6.3.11 The test statistic (t) would have been closer to 0, the p-value would be larger (weaker evidence against the null), and the confidence interval would be centered at the same value but would be wider. 6.3.12 Null: The population average fear score is the same for early birds as it is for night owls. 
Alternative: The population average fear scores are different in the two groups. As the sample sizes are both over 20 and the data within each group are not strongly skewed, a theory-based approach is appropriate here. The p-value for this test is 0.1722, with a t-statistic of −1.37, meaning that we have little to no evidence that the population averages are different. A 95% confidence interval indicates that the early bird group average fear scores are between 0.6594 lower and 0.1194 higher than the night owl group. If the result was statistically significant, we could not infer cause and effect from this study because students were not randomly assigned to a sleep group. Furthermore, we cannot generalize to a broader population because this is not a random sample—more information about the characteristics of the students in the sample is needed to infer more broadly. 6.3.13 a. The difference in the long-run average guess of the population size of Milwaukee when given the Green Bay anchor (μ Green Bay) minus the average when given the Chicago anchor (μChicago)
b. Null: The long-run average guess of the population size of Milwaukee when given the Green Bay anchor is the same as with the Chicago anchor. Alternative: The long-run average guess of the population size of Milwaukee when given the Green Bay anchor is smaller than when given the Chicago anchor. (in symbols) Null: μGreen Bay = μChicago; Alternative: μGreen Bay < μChicago c. The theory-based validity conditions are met because the sample sizes are both over 20 and the data are not strongly skewed. There may be an outlier in the Green Bay group, but removing it makes the evidence of a difference even stronger, so it does not (substantively) change the conclusion. d. t-statistic = −7.25, p-value < 0.0001 e. We have very strong evidence that the long-run average guess of the population size of Milwaukee, Wisconsin is smaller with the anchor of Green Bay than with the anchor of Chicago. f. We are 95% confident that the students given the Chicago anchor have average guesses between 784.83 and 1387.10 thousand higher than when given the Green Bay anchor. g. Cause and effect is possible because students were randomly assigned to groups. The sample, however, was not taken randomly and so generalizing to a broader population should be done with caution once more information on the students in the sample is obtained. 6.3.14 Null: The long-run average MRT score for men is the same as for women. Alternative: The long-run average MRT scores are different between the two groups. The validity conditions are met because the sample sizes are over 20 in each group and there is not strong skewness. The p-value < 0.0001 (t = 5.57) means that there is very strong evidence that the long-run averages are different. A 95% confidence interval indicates that men have an average MRT score between 2.92 and 6.16 higher than women. 
This does not suggest cause and effect because it is not a randomized experiment, and the results should be generalized to other men and women with caution because this was not a random sample and little information is given about the characteristics of the men and women in the study. 6.3.15 a. The long-run average difference in the time it took participants to walk down the icy path (Intervention minus control) b. Null: The long-run average time it took in the intervention group is the same as in the control group. Alternative: The long-run average time it took in the intervention group is less than the control group. c. Two-sample t-test d. Yes, because, even though the sample sizes are less than 20, the data are symmetrically distributed in each group e. t = −0.49; p-value = 0.3149 f. (−9.9021, 6.1021) g. We do not have strong evidence that the average walking time is significantly less for the intervention group (95% CI: 9.9021 seconds less to 6.1021 seconds more for intervention group). Because our results are not significant, we can’t determine cause and effect or generalize to a broader population. 6.3.16 a. To see whether the price of the pill was associated with its perceived effectiveness (measured via pain tolerance) b. Null: The average maximum tolerance when taking the regular-price pill is the same as when taking the discount-price pill. Alternative: The average maximum tolerance for the regular-price pill is different than for the discount-price pill. c. Two-sample t-test
d. At least 20 people are in each group and the data are not strongly skewed in either group, so the validity conditions are met. e. p-value = 0.51; t = −0.66 f. Getting a t-statistic of −0.66 or smaller or 0.66 or larger by chance if the null hypothesis is true is quite likely (it happens about 51% of the time). g. (−12.39, 6.19) h. We have little to no evidence of a difference in the average maximum tolerance between the regular-price and discount-price pill (95% CI: −12.39, 6.19; p-value = 0.51). This study allows a potential cause-and-effect conclusion because random assignment was used, but the results should be generalized to a larger population only with caution because the participants were not a random sample.
6.3.17 a. Each of the 47 students b. Explanatory: bonus or rebate group (categorical), response: amount of money the student spent (quantitative) c. Null: The long-run average amount spent by students in the bonus group is the same as in the rebate group. Alternative: The long-run averages are different in the two groups. d. See Solution 6.3.17d table. e. As the standard deviation increases, the t-statistic decreases, the p-value increases, and thus the strength of evidence weakens. Similarly, as the sample sizes get more unequally balanced between the groups, the t-statistic decreases, the p-value increases, and the strength of evidence weakens.
6.3.18 a. Null: The average spatial score for men is the same as for women. Alternative: The average spatial score for men is different than for women. b. The sample sizes are at least 20 in both groups, and the data are not strongly skewed within either group. c. p-value = 0.0007. The probability of obtaining a t-statistic of 3.57 or larger or −3.57 or smaller is 0.0007 if the null hypothesis is true. d. 95% CI: (1.055, 3.745). Men have an average spatial score between 1.055 and 3.745 higher than women. e. We have strong evidence that the average spatial score is significantly higher for men (95% confident between 1.055 and 3.745 higher) than for women. This is not necessarily a cause-and-effect conclusion, nor can the result be confidently generalized to a broader population because the sample was not obtained randomly from a larger population. f. Null: The average verbal score for men is the same as for women. Alternative: The average verbal score for men is different than for women. The sample sizes are at least 20 in both groups and the data are not strongly skewed within either group, so the theory-based test on these data is valid; p-value < 0.0001. The probability of obtaining a t-statistic of 5.07 or larger or −5.07 or smaller is less than 0.0001 if the null hypothesis is true; 95% CI: (−5.86, −2.54). Men have average verbal scores between 2.54 and 5.86 lower than women. We have strong evidence that the average verbal score is significantly higher for women (95% confident between 2.54 and 5.86 higher) than for men. This is not necessarily a cause-and-effect conclusion, nor can the result be confidently generalized to a broader population as the sample was not obtained randomly from a larger population.
6.3.19 a. The difference in the average spatial test scores between men who breathe through their left nostril as compared to their right nostril b. Null: The average spatial score for men who breathe through their left nostril is the same as for men who breathe through their right nostril. Alternative: The average spatial scores are different for the two groups. c. Two-sample t-test d. Because the sample size is less than 20 in both groups, the data should be distributed symmetrically in both groups, and it is. e. p-value is 0.035 (t = −2.30) f. We have strong evidence that the average spatial score for men is different based on whether they breathe through their left or right nostril (0.2 to 4.8 higher on average for those breathing through the right nostril). This suggests a cause-and-effect relationship between spatial score and nostril due to the random assignment; however, this result should be generalized with caution because the sample was not obtained randomly. g. p-value will get smaller h. No change i. p-value would get smaller
6.3.20 a. We don’t have enough evidence of a difference at the 1% significance level because 0.0162 is not less than 0.01. b. There would be enough evidence, because 0.0162 < 0.05. c. B and C are both true; A and D are both false.
6.3.21 The sample sizes in each group are above 20 (29 and 27, respectively) and the data are not strongly skewed, so a two-sample t-test is appropriate here. Null: The population average wait time at Wendy’s is the same as at Hot N' Now. Alternative: The population average wait times are different. t = 5.78, p-value < 0.0001. We have very strong evidence that the population average wait times are different between the two restaurants.
Solution 6.3.17d
Scenario   Group    Sample size   Sample mean   Sample SD   t-statistic   p-value
1          Bonus    24            22            5           8.22          < 0.0001
           Rebate   23            10            5
2          Bonus    24            22            10          4.11          0.0002
           Rebate   23            10            10
3          Bonus    30            22            5           7.91          < 0.0001
           Rebate   17            10            5
4          Bonus    30            22            10          3.95          0.0004
           Rebate   17            10            10
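The t-statistics in the Solution 6.3.17d table can be checked from the summary statistics alone. The sketch below assumes the unpooled form t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2); because the two SDs are equal within every scenario, a pooled version would give the same values. The function name `two_sample_t` is an illustrative choice, not something taken from the text.

```python
from math import sqrt

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unpooled two-sample t-statistic computed from summary statistics."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# (bonus n, mean, SD, rebate n, mean, SD) for scenarios 1-4 of the table
scenarios = [
    (24, 22, 5, 23, 10, 5),
    (24, 22, 10, 23, 10, 10),
    (30, 22, 5, 17, 10, 5),
    (30, 22, 10, 17, 10, 10),
]

t_values = [round(two_sample_t(*s), 2) for s in scenarios]
for number, t in enumerate(t_values, start=1):
    print(f"Scenario {number}: t = {t}")
```

Doubling both SDs halves the t-statistic, and moving from 24/23 to 30/17 lowers it slightly, which matches the pattern described in 6.3.17e.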
6.3.22 We are 95% confident that the average wait time for Hot N’ Now is between 71.16 and 147.4 seconds longer than Wendy’s. The interval does not include 0, which is consistent with the test of significance in the previous question.
6.3.23 The sample sizes in each group are above 20 (40 in each group) and the data are not strongly skewed, so a two-sample t-test is appropriate here. Null: The population average time in the bathroom for men is the same as for women. Alternative: The population average time in the bathroom is different. t = 1.68, p-value = 0.097. We have moderate evidence that the population average restroom times are different between the two sexes.
6.3.24 We are 95% confident that the average time in the bathroom is 5 seconds less to 58.7 seconds longer for women than men. The interval does include 0, which is consistent with the test of significance in the previous question.
6.3.25 a. Null: The average score assigned to the essay when using a red pen is the same as with a blue pen. Alternative: The average score is lower when using the red pen. b. Not totally comfortable; would like to know that there are not large outliers or strong skewness in the scores in either group c. The sample size in each group d. Assuming 64 in each group; t = −1.97; p-value = 0.026 e. We have strong evidence that the average score students assign to the essay is lower when a red pen is used compared to a blue pen. f. Yes, because random assignment was used to assign pen colors to students g. Shouldn’t generalize too far because these were introductory psych students grading the papers h. Yes, would be stronger with different population, but using undergraduate students is easy/convenient
6.3.26 a. Randomized experiment because randomness was used to determine whether or not the waitress gave her name b. Explanatory: gave name yes/no, response: amount of tip c. Null: The average tip amount when the waitress gives her name is the same as when she doesn’t give her name. Alternative: The average tip amount is larger when she gives her name. (in symbols) Null: μname = μno name; Alternative: μname > μno name. d. nname = 20, x̄name = $5.44, sname = $1.75, nno name = 20, x̄no name = $3.49, sno name = $1.13. e. Yes, sample size is 20 in both groups, without strong skewness. f. t = 4.19; p-value = 0.0002 g. We have strong evidence that the average tip amount is higher when she states her name. This means the waitress will probably start stating her name every time to boost tip amounts on average.
6.3.27 a. 95% CI: 1.00 to 2.90 b. We are 95% confident that tip amounts are between $1 and $2.90 larger on average when she states her name. c. Yes, 0 is not in the interval. d. We have very strong evidence that the waitress receives larger tip amounts, on average, when she states her name (between $1.00 and $2.90 more). This is a cause-and-effect relationship due to the random assignment, but the waitress should be cautious in generalizing because this was only with the 40 customers she had on a particular Sunday morning for brunch.
6.3.28 a. Midpoint same, wider interval b. (0.81, 3.09), as predicted
6.3.29 a. t-statistic farther from 0, p-value smaller, and narrower confidence interval b. t = 5.92; p-value < 0.0001; 95% CI: (1.29, 2.61), as predicted
6.3.30 a. t-statistic closer to 0, p-value larger, and wider confidence interval b. t = 3.08; p-value = 0.0019; and 95% CI: (0.67, 3.23), as predicted
6.3.31 a. Null: There is no association between the name of the hurricane and the perceived danger (or evacuation rating) (μmale = μfem). Alternative: There is an association between the name of the hurricane and the perceived danger (or evacuation rating) (μmale ≠ μfem). b. The validity conditions are met because we have large sample sizes (nmale = nfem = 71) and the datasets are not highly skewed. c. x̄fem = 5.01; x̄male = 5.57 d. t = 2.87; p-value = 0.0048 e. We have strong evidence that the average rating of whether they would evacuate for the hurricane with the female name is different (less) than that of a hurricane with a male name.
6.3.32 a. (0.1747, 0.9450); we can be 95% confident that, in the long run, the average hurricane rating for the hurricane named Christopher is 0.1747 to 0.9450 points higher than the hurricane named Christina. b. Because the confidence interval does not include 0, we do have significant results. Because the interval lies entirely above zero and our direction of subtraction was (male – female), the Christopher-named hurricane was rated significantly higher, meaning, on average, people are more likely to think they would evacuate for a hurricane named Christopher than a hurricane named Christina.
6.3.33 a. Null: There is no association between what people imagine themselves as (librarian or poet) and their creativity as measured by the number of different uses they can come up with (μlib = μpoet). Alternative: There is an association between what people imagine themselves as (librarian or poet) and their creativity as measured by the number of different uses they can come up with (μlib ≠ μpoet). b.
The validity conditions are met because we have large sample sizes (nlib = npoet = 32) and the datasets are not highly skewed. c. x̄lib = 60.34; x̄poet = 92.16 d. t = −4.07; p-value = 0.0002 e. We have strong evidence that the average number of uses developed by the “eccentric poets” is different, that is, more than that for the “rigid librarians.” f. Because this test did not involve comparing real poets and real librarians, we cannot conclude that real poets are more creative than real librarians.
6.3.34 a. The theater majors generated more uses on average, 85.71 compared to 69.96.
b. (–34.64, 3.14) We can be 95% confident that, in the long run, the average number of uses generated by biology majors is between 34.64 lower and 3.14 higher than that for theater majors. c. No, because the confidence interval includes 0.
6.3.35 a. Null: There is no association between eating breakfast daily and GPA (μbreakfast = μno breakfast). Alternative: There is an association between eating breakfast daily and GPA (μbreakfast ≠ μno breakfast). b. Because our sample sizes are both more than 20 and we have fairly symmetric data, the validity conditions are met. c. x̄breakfast = 3.56; x̄no breakfast = 3.41 d. t = 1.94; p-value = 0.0564 e. We have moderate evidence (but not strong) that there is an association between eating breakfast and a student’s GPA in this population.
6.3.36 a. Null: There is no association between fitness level and mathematics achievement (μhigh = μlow). Alternative: Children with a high fitness level tend to have higher mathematics achievement than those with a low fitness level (μhigh > μlow). b. x̄high = 117.46; x̄low = 108.92 c. t = 1.98; p-value = 0.0276 d. We have strong evidence (on a one-sided test) that children with a high fitness level tend to have higher mathematics achievement than those with a low fitness level. Since this is an observational study, we cannot say that higher fitness causes higher math scores. e. A child’s sex could be a confounding variable because a child’s sex appears to be associated with the fitness level since there were 58.3% boys in the high-fit group and only 33% in the low-fit group. We would also need to know whether a child’s sex is related to mathematical achievement to confirm that it is confounding.
6.3.37 The bigger the sample size you have and the less skewness you have, the better the theory-based and simulation-based results will match up. This means “20” and “not strong skewness” are just some rough guidelines, and “not strong skewness” is a judgment call.
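To illustrate the point in 6.3.37, the sketch below compares the SD of a shuffled null distribution with the theory-based standard error for two moderately sized, roughly symmetric groups. The data are made-up numbers generated for the illustration (not data from any exercise), and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def shuffled_differences(group1, group2, repetitions, rng):
    """Differences in means after re-dealing the pooled values at random."""
    pooled = list(group1) + list(group2)
    differences = []
    for _ in range(repetitions):
        rng.shuffle(pooled)
        first, second = pooled[:len(group1)], pooled[len(group1):]
        differences.append(mean(first) - mean(second))
    return differences

def theory_standard_error(group1, group2):
    """Unpooled standard error of the difference in sample means."""
    return sqrt(stdev(group1) ** 2 / len(group1) + stdev(group2) ** 2 / len(group2))

rng = random.Random(1)
# Hypothetical, roughly symmetric samples of 25 values each
group_a = [rng.gauss(10, 3) for _ in range(25)]
group_b = [rng.gauss(12, 3) for _ in range(25)]

null_sd = stdev(shuffled_differences(group_a, group_b, 2000, rng))
theory_se = theory_standard_error(group_a, group_b)
observed = mean(group_a) - mean(group_b)

print(f"SD of simulated null distribution:   {null_sd:.3f}")
print(f"Theory-based standard error:         {theory_se:.3f}")
print(f"Standardized statistic (simulation): {observed / null_sd:.2f}")
print(f"t-statistic (theory):                {observed / theory_se:.2f}")
```

With groups of this size and shape the two measures of spread should be close, so the standardized statistic and the t-statistic tell a similar story; shrinking the groups or making the data strongly skewed is a way to see the agreement loosen.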
End of Chapter 6 Exercises 6.CE.1 a. The difference in the population mean of number of Harry Potter books read by girls vs. boys, in particular how many more books are read by girls compared to boys b.
i. No change. ii. No change. iii. Will change in sign (negative to positive or vice versa). iv. No change. v. Will change in sign. vi. No change. vii. Will change in sign. viii. Both will change in sign. ix. No change.
c. Yes, the exact same conclusion will be reached about strength of evidence against the null hypothesis. The CI will be different because it is estimating a different parameter but will mean the same thing in terms of whether boys or girls are, on average, reading more Harry Potter books.
6.CE.2 a. Is one sex more flexible than the other? b. Observational study c. Null: The average flexibility of females after a physical wellness class is the same as for males. Alternative: The average flexibility is different. d. The sample sizes are large in both groups (more than 20) without strong skewness, so theory-based methods are fine. e. p-value = 0.002; 95% CI: −1.362 ± 0.82 f. t = –3.46; p-value = 0.0007; 95% CI: (−2.14, −0.58) g. p-values and confidence intervals are very similar, which makes sense because validity conditions are met. h. No, 0 is not in the interval, which makes sense because the p-values are less than 0.05.
6.CE.3 a. Each game is an observational unit. The variables are explanatory (categorical: regular vs. replacement) and response (duration, quantitative). b. Neither c. The dotplots suggest that games with replacements tended to be longer. d. Null: The long-run average game duration is the same for replacement referees and regular referees. Alternative: The long-run average game durations are different. e. p-value = 0.009 f. A difference of 8.035 or larger (or −8.035 or smaller) in the sample means occurs only about 0.9% of the time if there is no difference in the long-run average durations. g. 8.035 ± 2 × 3.05, (1.94, 14.14) h. We are 95% confident that games with replacement referees take between 1.94 and 14.14 minutes longer on average than games with regular referees. i. We have strong evidence of a difference in the long-run average length of games officiated by replacement referees, with such games taking between 1.94 and 14.14 minutes longer on average than games officiated by regular referees. This is not a cause-and-effect relationship, nor can this result, necessarily, be generalized to a broader set of NFL games.
6.CE.4 a. Observational units are the 44 pieces of balsa wood, the explanatory variable is immersed in water yes/no, and the response variable is how far the piece of wood projects a dime into the air. b. Experiment, because pieces of wood were randomly assigned to be immersed or not c. Null: The long-run average elasticity is the same for immersed wood and nonimmersed wood. Alternative: The long-run average elasticity is less for immersed wood than for nonimmersed wood. d. The elasticity of the treated wood tends to be less than that for the nontreated wood (7.63 inches vs. 11.78 inches on average; range: 6–18 inches for untreated wood, 3–12 inches for treated wood). Both distributions are fairly symmetric.
6.CE.5 a. Write the 44 different elasticity values on 44 separate slips of paper. Shuffle the slips of paper and deal into two stacks of 22 each. Find the mean of each group and then the difference in those means. Repeat many times. The p-value is the proportion of simulated differences that are 4.159 or larger. b. The p-value is < 0.001.
c. The p-value is the probability of obtaining 4.159 or larger when simulating the null hypothesis. d. We have very strong evidence that the long-run mean elasticity for the immersed wood is less than for the non-immersed wood. 6.CE.6 a. t = −5.50; p-value < 0.0001 b. Validity conditions are fine because sample size in each group is larger than 20 and the data are not strongly skewed in either group. c. Yes, they are quite similar. d. We have very strong evidence of a difference in the average elasticity of the two types of wood. 6.CE.7 a. 95% CI: −5.69 to −2.63. We are 95% confident that treated wood is between 2.63 and 5.69 inches less elastic than untreated wood. b. Yes, the confidence interval does not include 0. c. Cause and effect is possible because random assignment was used, but the wood was not a random sample, so generalizing to all balsa wood should be done with caution. 6.CE.8 a. Confession: Min = 2, Q1 = 9, median = 13, Q3 = 17, Max = 39, Boone: Min = 3, Q1 = 11, median = 14, Q3 = 20, Max = 47 b. See boxplots for 6.CE.8b. c. Based on the five-number summaries and boxplots, there does not appear to be much of a difference in the sentence lengths between the two books. 6.CE.9 a. Null: The average sentence length in Confession is the same as in Boone. Alternative: The average sentence lengths are longer in Confession. b. Boone: mean = 15.931, SD = 9.075, Confession: mean =13.982, SD = 8.027. c. The average sentence lengths are longer in the sample from Boone, so there will not be evidence they are longer in Confession! 6.CE.10 a. A two-sample test has two groups that are being compared; a two-sided test has to do with the alternative hypothesis. b. Yes, the previous question (6.CE.9) is a one-sided, two-sample test. c. Yes, testing to see whether there is a competitive advantage to uniform colors in the Olympics, by testing (Alternative hypothesis) whether the winning proportion for competitors wearing red uniforms is different than 0.50. 6.CE.11 a. 
Null hypothesis: Average SAT score improvement with coaching is the same as without coaching. Alternative hypothesis: Average SAT score improvement with coaching is more than without coaching. t = 4.28, p-value < 0.0001. We have strong evidence that SAT scores improve more with coaching than without coaching. b. 99% CI: (0.72, 2.88). Average SAT improvement scores are between 0.72 and 2.88 higher in the coaching group than the without-coaching group. c. Yes, the CI does not include 0. d. Yes, the p-value is very small. e. No, the CI only indicates a small (0.72 to 2.88) improvement on average. f. Just because something is statistically significant doesn’t necessarily mean that it is practically important. 6.CE.12 a. All major league baseball teams b. It is not a random sample, because data were taken from one year for all teams and it is not a randomized experiment because teams were not randomly assigned to the National or American League. c. Yes, this will indicate if the observed difference is beyond what would happen by chance and, to the extent that teams this year are like other years, may illustrate meaningful differences in average runs scored between the leagues.
Chapter 6 Investigation 1. Can people better memorize letters (i.e., memorize more letters on average) when they are presented in recognizable groupings? 2. This is a randomized experiment because the instructor decided which students received which type of grouping (recognizable or not recognizable) by random assignment. 3. The 51 introductory statistics students are the observational units. 4. Two variables: (1) grouping of letters (recognizable or not recognizable) and (2) number of letters memorized before the first mistake 5. Grouping of letters is a binary categorical explanatory variable; number of letters is a quantitative response variable. 6. Null: The long-run average number of letters memorized in the recognizable groupings group is the same as in the not recognizable groupings group. Alternative: The long-run average number of letters memorized in the recognizable groupings group is greater than that of the not recognizable groupings group. 7. Mean recog = 14.32; SD recog = 8.52; Mean not recog = 11.15; SD not recog = 6.58; Diff = 14.32 − 11.15 = 3.17 difference in average scores. Yes, the difference is in the conjectured direction, namely, that people in the recognizable groupings group memorized more letters on average. 8. Count the number of simulated differences in mean scores that are 3.17 or larger and divide this by 1,000. The p-value from a simulation analysis is approximately 0.074.
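The counting step in item 8 can be sketched in Python. This is only an illustration of the shuffling idea, not the applet used for the solution: the function name `shuffle_p_value` and the two short score lists are hypothetical (the investigation's raw data are not reproduced here), so the resulting p-value will not match 0.074.

```python
import random
from statistics import mean

def shuffle_p_value(group_a, group_b, reps=1000, seed=0):
    """One-sided simulation p-value for mean(group_a) - mean(group_b).

    Each repetition shuffles the pooled responses, re-splits them into
    groups of the original sizes, and records the difference in means.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        sim_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if sim_diff >= observed:  # "as large or larger" than observed
            count += 1
    return observed, count / reps

# Hypothetical scores, for illustration only (not the study data).
recog = [21, 14, 9, 18, 25, 12, 16, 8]
not_recog = [10, 12, 7, 15, 9, 11, 13, 6]
observed, p_value = shuffle_p_value(recog, not_recog)
print(observed, p_value)
```

Dividing the count by the number of repetitions (1,000 here) gives the approximate p-value, exactly as described in item 8.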
[Boxplots for Solution 6.CE.8b: sentence length (axis 0 to 50). Boone (n = 29): median = 14, IQR = 20 − 11 = 9. Confession (n = 55): median = 13, IQR = 17 − 9 = 8.]
9. The data provide moderate evidence that memorizing letters in recognizable groupings improves ability to memorize (higher average number of letters memorized).
d. Basketball throw; quantitative; subject (who is sitting on floor with legs straight and back against wall) uses a chest pass to push basketball as far as possible.
10. SD of the simulated differences in sample means is 2.14; thus, the 95% CI is 3.17 ± 2 × 2.14, which is 3.17 ± 4.28, which is the interval (−1.11, 7.45).
5. The children are on average 12.3 years old and consist of 360 boys (189 urban and 171 rural) and 247 girls (125 urban and 122 rural). All children in the study had no history of disease.
11. We’re 95% confident that the difference in long-run average number of letters memorized is between −1.11 and 7.45 letters, meaning that the recognizable groupings group could memorize as many as 7.45 more letters on average or as many as 1.11 fewer letters on average, as compared to the nonrecognizable groupings group.
6.
12. Yes, a cause–effect conclusion would be justified here if the difference between the groups had turned out to be significant because this is a randomized experiment. 13. This is not a random sample, so it’s not valid to generalize to a broader population. 14. Yes, because the sample sizes in each group are greater than 20 and not strongly skewed. 15. The p-value from theory-based t-test is 0.073, and the confidence interval for estimating the difference in long-run averages is −1.14 to 7.47. Both the p-value and confidence interval are similar to those obtained from simulation—as expected because the validity conditions were met. 16. We designed a study to investigate whether putting letters into recognizable groupings would improve average number of letters memorized. We found modest evidence of a higher average number of letters memorized. The sample of students was not a random sample from any population, so generalizations are not easily made. Follow-up studies could do testing on different populations (e.g., elderly, adult, high school). Because moderate evidence was found in the direction of the alternative, a follow-up study should use more people to increase the strength of evidence.
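The 2SD interval from item 10 of the investigation is plain arithmetic, sketched below. The helper name `two_sd_interval` is an assumption for illustration; the inputs 3.17 and 2.14 are the observed difference and the SD of the simulated differences quoted in item 10.

```python
def two_sd_interval(statistic, sd_null, multiplier=2):
    """Approximate 95% interval: statistic +/- multiplier * SD of the null distribution."""
    margin = multiplier * sd_null
    return statistic - margin, statistic + margin

low, high = two_sd_interval(3.17, 2.14)
print(round(low, 2), round(high, 2))  # -1.11 7.45
```

Because the interval contains 0, it agrees with the simulation p-value of about 0.074 being larger than 0.05.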
Variables    Urban         Rural
BMI          21 (3.4)      20.5 (3.4)
SR           14.3 (6.1)    14.5 (6.0)
BT           4.7 (0.9)     4.6 (0.8)
7. Null hypothesis: The average BMI of urban boys in Greece is the same as the average BMI of rural boys in Greece. Alternative hypothesis: The average BMI is different between urban and rural boys in Greece. Because the p-value is > 0.05, we do not have evidence of a difference in the average BMI of urban and rural boys in Greece. 8. Null hypothesis: The average SR of urban boys in Greece is the same as the average SR of rural boys in Greece. Alternative hypothesis: The average SR is different between urban and rural boys in Greece. Because the p-value is > 0.05, we do not have evidence of a difference in the average SR of urban and rural boys in Greece. 9. Null hypothesis: The average BT of urban boys in Greece is the same as the average BT of rural boys in Greece. Alternative hypothesis: The average BT is different between urban and rural boys in Greece. Because the p-value is > 0.05, we do not have evidence of a difference in the average BT of urban and rural boys in Greece. 10. The sample size is large in each group (189 and 171, well above the 20 recommended); assuming there is no strong skewness, the validity conditions are met for the t-tests being run in this study. 11. All three 95% confidence intervals will include zero, because none of the corresponding tests of significance have p-values < 0.05.
2.
12. The students were not randomly selected; they were recruited from randomly selected schools, and the students themselves were recruited, so we do not have any guarantee that they represent the schools or all urban or rural children in Greece. The researchers certainly hope that they are representative of the population of all urban and rural children in Greece, however.
a. Some data indicate that urban children have more body fat than their rural counterparts, while other data disagree (Mamalakis et al., 2000; McMurray 1999).
13. No, you cannot infer causation. This is an observational study, not a randomized experiment so cause-effect conclusions are not possible from this study.
b. Greek children are more obese than children in other countries (Mamalakis et al., 1996).
14. The sample is not actually random and thus does not necessarily represent Greek children. A future study could work harder to obtain a random sample. Further studies might look at other countries to see if similar/different patterns are true in those locations.
Chapter 6 Research Article 1. The researchers are interested in learning if there are significant differences between urban and rural children in terms of fitness.
3. Children were recruited from randomly selected schools in Trikala, Greece, or surrounding villages. Recruited probably means that students were contacted and given some incentive to participate in the study. We don’t know for sure because details are not provided. 4. a. Urban vs. rural; categorical: urban means from the city of Trikala (more than 70,000 inhabitants); rural means from a surrounding village (fewer than 2,000 inhabitants) b. BMI; quantitative; weight/(height)² c. Sit and reach; quantitative; subject leans forward while sitting on the floor and distance (cm) they can reach past their toes is recorded (negative values for cm above toes).
15. False-positive findings; Type I errors. There is really no difference in the means of the response variables when comparing the populations of all urban and rural Greek children (the null hypothesis is true for all 14 tests). 16. a. All students in school (rural or urban) participate in the same physical education program so differences in physical activity may not be that different. b. If fitness is largely determined by genetics, then we wouldn’t expect to see much in the way of differences when looking at urban vs. rural children.
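A short calculation shows why false positives are a concern when many tests are run, as in item 15. This is a sketch under two assumptions that are not stated in the answer itself: each of the 14 tests uses a 0.05 significance level, and the tests are independent (measurements on the same children are unlikely to be fully independent, so treat the result as a rough guide).

```python
alpha = 0.05   # assumed significance level for each test
n_tests = 14   # number of tests mentioned in item 15

# Expected number of Type I errors if every null hypothesis is true.
expected_false_positives = n_tests * alpha

# Chance of at least one Type I error, assuming independent tests.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(round(expected_false_positives, 2), round(p_at_least_one, 3))  # 0.7 0.512
```

Under these assumptions there is roughly a 51% chance of at least one significant result even when no real differences exist.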
CHAPTER 7
Paired Data: One Quantitative Variable Chapter 7 Section 7.1 7.1.1 A. 7.1.2 D. 7.1.3 A. 7.1.4 C. 7.1.5 a. False b. True c. True d. True 7.1.6 a. False b. False c. False d. True 7.1.7 a. Paired b. Not paired c. Paired 7.1.8 a. Not paired b. Paired c. Not paired 7.1.9 a. Appropriate b. Not appropriate
7.1.12 a. Have six punters kick the ball with helium and six kick the ball without helium. Compare the average distance kicked. b. Have each kicker kick each ball with the ball they kick first determined randomly; find the difference in the distances kicked for each punter. c. Part (b) is better because it uses pairing. It is better because some punters will tend to kick farther than others and the paired design accounts for that. 7.1.13 a. Have 15 people play with Brand A and 15 people play with Brand B. Compare the average distances between the two groups. b. Have each player play with each ball, with the order determined randomly; find the difference in distance hit for each player. c. The design in part (b) is better because it uses pairing. It is better because some players will tend to hit the ball farther than others and the paired design accounts for that. 7.1.14 a. Observational study, not paired b. Experiment, paired c. Experiment, paired 7.1.15 a. Experiment, paired b. Experiment, not paired c. Observational study, not paired 7.1.16 a. Have each student randomly assigned which hand (dominant or nondominant) they will use first. Each student does both hands. Find the difference in reaction times for each student.
7.1.10
b. Yes, because random assignment is used to determine which hand people used first.
a. Appropriate
7.1.17
b. Not appropriate
a. Randomly assign people to use either dominant or nondominant hand (10 in each group), have each person do the reaction time game once, and compare average reaction times between the two groups.
7.1.11 When using repeated measures, the same observational units are measured twice. When using matching similar (but not the same) observational units are each measured once.
b. Paired with repeated measures, because there is probably quite a bit of person-to-person variability in reaction times.
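The random ordering used in the repeated-measures designs above (for example, which hand is used first in 7.1.16) can be sketched as follows. The function name `assign_orders`, the participant labels, and the condition labels are all hypothetical and only illustrate the idea of a per-person coin flip.

```python
import random

def assign_orders(participants, conditions=("dominant", "nondominant"), seed=0):
    """Randomly decide, separately for each participant, which condition comes first."""
    rng = random.Random(seed)
    orders = {}
    for person in participants:
        order = list(conditions)
        rng.shuffle(order)  # equivalent to a coin flip when there are two conditions
        orders[person] = tuple(order)
    return orders

# Hypothetical participant labels.
orders = assign_orders(["P1", "P2", "P3", "P4"])
for person, order in orders.items():
    print(person, "->", order)
```

Every participant still completes both conditions; only the order is randomized, which is what keeps the design paired.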
7.1.18 a. Have each person play the game once to get a baseline reaction time score. Pair up people who have similar reaction times based on baseline time. Randomly assign one member of each pair to use dominant hand and other to use nondominant hand. Find difference in reaction times within each pair of people. b. Less variation within pairs when using the paired with repeated measures design, so that one is more appropriate. 7.1.19 a. Randomly assign people to sit or exercise first. Have each person do the memorization activity while doing each activity. Find the difference in performances for each person. b. Yes, because random assignment is used to determine which activity people did first. 7.1.20 a. Randomly assign each of the 20 people to two groups of 10, with one group exercising and the other sitting. Compare average performance between the two groups. b. Paired with repeated measures, because there is probably quite a bit of person-to-person variability in memorization ability.
b. Paired with repeated measures, because there is probably quite a bit of person-to-person variability in running ability. 7.1.27 a. Have each person run the 5K once to get a baseline time. Pair up people who have similar times. Randomly assign one member of each pair to run the 5K having taken the supplement and the other member not. Find difference in scores within each pair of people. b. Less variation within pairs when using the paired with repeated measures design, so that one is more appropriate. 7.1.28 Make a set of pictures of 10 objects and a set of words of 10 different objects. Call this set A. For the other set (called set B) have words of the pictures from set A and pictures of the words from set A. Randomly assign 40 participants to one of the sets and the order in which they will complete the task (some with pictures first and some with words first, that is, some will see set A first and then set B, and the remaining vice versa). Have the participants look at pictures of the 10 objects for a certain amount of time and then have them write down all that they can remember. Do the same with the words. Their score will be the difference in the number of items they can remember with the pictures to the number of items they can remember with the words.
7.1.21 a. Have each person take the test once to get a baseline memorization score. Pair up people who have similar scores. Randomly assign one member of each pair to exercise and the other to sit. Find difference in scores within each pair of people. b. Less variation within pairs when using the paired with repeated measures design, so that one is more appropriate. 7.1.22 a. Randomly assign people to chew or not chew first. Have each person do the memorization activity twice: once with gum and once without. Find the difference in performances for each person. b. Yes, because random assignment is used to determine which activity people did first.
FAQ 7.1.29 a. Each participant would try both blood pressure medicines but we would randomize the order in which they take them. b. I suspect that current weight is more related to blood pressure than current height, so we would rather compare the effects of the blood pressure medicine on two people that are similar in weight. c. Answers will vary. It seems feasible to do the repeated measures so that would guarantee the two observations were “matched” on all variables, not just weight.
7.1.23
Section 7.2
a. Randomly assign each of the 20 people to two groups of 10, with one group chewing gum and the other group not. Compare average performance between the two groups.
7.2.2 C.
b. Paired with repeated measures, because there is probably quite a bit of person-to-person variability in memorization ability. 7.1.24 a. Have each person take the test once to get a baseline memorization score. Pair up people who have similar scores. Randomly assign one member of each pair to chew gum and the other not. Find difference in scores within each pair of people. b. Less variation within pairs when using the paired with repeated measures design, so that one is more appropriate.
7.2.1 B. 7.2.3 A. 7.2.4 C. 7.2.5 D. 7.2.6 a. 1 b. 3 c. 4 d. 5
7.1.25
7.2.7
a. Randomly assign people to first use caffeine supplement or not. Have each person run the 5K twice: once with supplement and once without. Find the difference in performances for each person.
a. 2 b. 5
b. Yes, because random assignment is used to determine which activity people did when.
d. 3
7.1.26 a. Randomly assign each of the 30 people to two groups of 15, with one group getting caffeine supplement and the other group not. Compare average performance between the two groups.
c. 4 7.2.8 a. Null: There is no association between a person’s sex and body flexibility. Alternative: There is an association between a person’s sex and body flexibility such that females are more flexible than males. b. Explanatory: sex, response: flexibility
7.2.15 a. x̄d = 2.5
c. Independent groups 7.2.9 a. Null: The impatience of older children is the same as for younger children. Alternative: The impatience of older children is higher than for younger children.
b. The p-value does not appear to be very small, because 2.5 is not very far out in the tail of the distribution.
b. Explanatory: older/younger, response: impatience level
d. There is not strong evidence that, on average, the predicted scores are higher than the actual scores because the standardized statistic is less than 2, indicating that the observed mean difference is within 2 SDs of the 0 difference.
c. Paired 7.2.10 Differences, because that is what is actually being tested. 7.2.11 The coin flip is rerandomizing the values of the response variable within each pair to simulate the null hypothesis of no association between the group and the response variable value but maintaining the pairing in the study design. 7.2.12 The standard deviation of the differences is quite a bit smaller than the standard deviation of the response variable (weight) within each group (Fresh/Soph).
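The coin-flip rerandomization described in 7.2.11 can be sketched in Python. This is an illustrative version, not the applet's implementation: the function name `paired_sim_p_value` and the paired measurements are hypothetical. Swapping the two responses within a pair simply changes the sign of that pair's difference, which is what the code does.

```python
import random
from statistics import mean

def paired_sim_p_value(first, second, reps=1000, seed=0):
    """One-sided simulation p-value for the mean of paired differences (first - second)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(first, second)]
    observed = mean(diffs)
    count = 0
    for _ in range(reps):
        # Coin flip per pair: heads keeps the order, tails swaps the two
        # responses, which flips the sign of the difference.
        sim = [d if rng.random() < 0.5 else -d for d in diffs]
        if mean(sim) >= observed:
            count += 1
    return observed, count / reps

# Hypothetical paired measurements, for illustration only.
condition_1 = [12, 15, 9, 14, 11, 16, 13, 10]
condition_2 = [10, 14, 9, 11, 12, 13, 11, 8]
observed, p_value = paired_sim_p_value(condition_1, condition_2)
print(observed, p_value)
```

Because the flips happen within pairs, the pairing in the study design is preserved in every simulated result.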
c. t = 2.5/1.694 = 1.48
7.2.16 a. The explanatory variable is type of rebound (categorical) and the response variable is the number of shots made (quantitative). b. Randomized experiment c. x̄d = 0.875
7.2.13
d. The p-value appears to be very small, because 0.875 is quite far out in the tail of the null distribution.
a. Husband and wife marriage ages tend to be similar.
e. t = 0.875/0.336 = 2.60
b. Husband and wife marriage ages are taken from spouses, so each husband in the sample can be matched with exactly one wife. c. The population mean difference in marriage ages between husbands and wives d. Null: The population mean difference in marriage ages is 0. Alternative: The population mean difference in marriage ages is greater than 0 (husband minus wife). e. There are 24 pairs of observations. The analysis shows that husbands tend to be older (average age 35.7) than their wives (33.8) when getting married. This is also reflected in the fact that most of the lines connecting husband to wife marriage age are leaning from right to left. See graphs for Solution 7.2.13e. f. p-value = 0.03
f. Because the standardized statistic is more than 2, there is strong evidence that number of shots made differs between rebounding methods, on average. 7.2.17 a. 0.875 ± 2(0.336) = 0.203 to 1.547 b. Yes, because the entire interval is positive (or 0 is not contained in the interval), there is strong evidence against the null hypothesis and in support of, on average, more shots being made when someone else is rebounding for the shooter. 7.2.18 a. The explanatory variable is type of water (categorical) and the response variable is the taste rating (quantitative). b. Randomized experiment c. x̄d = 0.032
g. The probability of 1.875 or larger when the null hypothesis is true h. We have strong evidence that the average marriage age of men is larger than the average marriage age of women. Because these data were not necessarily a random sample, the conclusion should be generalized with caution.
d. The p-value appears to be not small (and close to 0.5), because 0.032 is fairly close to the middle of the null distribution. e. t = 0.032/0.349 = 0.09
a. Becomes –1.875
f. Because the standardized statistic is not more than 2, there is not strong evidence that people will rate the bottled water higher, on average, compared to tap water.
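The standardized statistics in 7.2.15, 7.2.16, and 7.2.18 all follow the same recipe: observed mean difference divided by the SD of the null distribution, compared against 2. A minimal sketch, with the helper name `standardized_statistic` assumed for illustration and the inputs taken from those three answers:

```python
def standardized_statistic(observed, sd_null, hypothesized=0.0):
    """Number of null-distribution SDs between the observed statistic and the hypothesized value."""
    return (observed - hypothesized) / sd_null

# (mean difference, SD of null distribution) as quoted in the answers above.
examples = {
    "7.2.15": (2.5, 1.694),
    "7.2.16": (0.875, 0.336),
    "7.2.18": (0.032, 0.349),
}
results = {k: round(standardized_statistic(m, s), 2) for k, (m, s) in examples.items()}
for problem, t in results.items():
    verdict = "strong evidence" if abs(t) > 2 else "not strong evidence"
    print(problem, t, verdict)
```

Only 7.2.16 exceeds 2 in absolute value, matching the conclusions stated in the answers.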
b. Becomes less than 0
7.2.19
c. No change
a. 0.032 ± 2(0.349) = –0.666 to 0.730
7.2.14
d. No change
[Graphs for Solution 7.2.13e: marriage ages (axis 18 to 72) for Husband (mean = 35.708, SD = 14.582) and Wife (mean = 33.833, SD = 13.560), and the differences (axis −15 to 15; mean = 1.875, SD = 4.812).]
b. No, because the interval contains negative numbers as well as positive numbers (or 0 is contained in the interval), it is plausible that the ratings for bottled water are the same as the ratings for tap water in the population. On average, tap water’s rating could be higher than bottled water’s by as much as 0.666 point, or lower by as much as 0.730 point. 7.2.20 a. Music (yes/no; categorical) b. Number of words memorized (quantitative) c. 2.3 d. Yes, 2.3 is in the tail of the distribution. e. 2.3/0.974 = 2.36 f. Yes, the standardized statistic is more than 2. 7.2.21 a. 2.3 ± 2 × 0.974 = 2.3 ± 1.948, (0.352, 4.248) b. Yes, because the confidence interval does not include 0. 7.2.22 a. −2.3 b. μd < 0 c. No change d. (−4.248, −0.352) 7.2.23 a. Time of day, categorical b. Reaction time, quantitative c. –0.062 d. Yes, –0.062 is in the tail of the distribution. e. –0.062/0.023 = –2.7 f. Yes, strong evidence because –2.7 is less than –2 7.2.24 a. –0.062 ± 0.046 or (–0.108, –0.016) b. Yes, 0 is not in the interval. 7.2.25
c. Yes, most of the lines lean to the right, and the mean for no delay is 10.55, compared to 8.7 for with delay. d. 1.85 is the average difference in number of words memorized. e. p-value = 0.0077 f. We have very strong evidence that a short delay hinders the memorization process. 7.2.28 a. 1.85/0.804 = 2.30 b. Yes, the standardized statistic is larger than 2. 7.2.29 a. 1.85 ± 2 × 0.804 = 1.85 ± 1.608 or (0.242, 3.46) b. We are 95% confident that the true long-run average difference in number of words memorized is between 0.242 and 3.46 higher without the delay. c. Yes, 0 is not in the interval. 7.2.30 a. Explanatory: dominant vs. nondominant hand, response: reaction time b. Null: The long-run average difference in reaction times between hands (dominant − nondominant) is 0. Alternative: The long-run average difference is less than 0. c. Yes, the average difference is −0.026 and most people’s average differences are less than 0. d. −0.026, the observed average difference in reaction times e. p-value = 0.009 f. Yes, there is strong evidence that reaction times are slower when people use their nondominant hand. 7.2.31 a. −0.026/0.011 = −2.36 b. Yes, the standardized statistic is less than −2. 7.2.32 a. −0.026 ± 0.022 = (−0.048, −0.004)
b. No change (still two-sided)
b. We are 95% confident that the long-run average difference in reaction times is between 0.004 and 0.048 seconds faster with the dominant hand.
c. No change
c. Yes, 0 is not in the interval.
d. (0.016, 0.108)
7.2.33
7.2.26
a. The explanatory variable is the type of media being looked at, Facebook or Instagram (categorical), and the response variable is the brake reaction time (quantitative).
a. 0.062
a. Explanatory: exercise or sit down, response: number of words memorized b. Null: The long-run average difference in words memorized is 0. Alternative: The long-run average difference in words memorized is not 0. c. is the average difference in number of words memorized between exercising and sitting down. d. p-value is approximately 0.78. e. We do not have strong evidence that exercising while trying to memorize words helps or hinders the process. 7.2.27 a. Explanatory: delay/no delay, response: number of words memorized b. Null: The long-run average difference in words memorized with and without delay is 0. Alternative: The long-run average difference in words memorized (no delay minus with delay) is greater than 0.
b. Let μd represent the mean difference in braking reaction time between Facebook and Instagram; H0: μd = 0; Ha: μd ≠ 0. c. x̄d = 0.287. The mean difference in braking reaction time between using Facebook and Instagram is 0.287 seconds in the sample. d. p-value ≈ 0.0007 e. We have very strong evidence that there is a difference in average braking reaction times between drivers who are looking at Facebook on their phones compared to drivers who are looking at Instagram on their phones, with the longer reaction times, on average, for those looking at Facebook. 7.2.34 a. t = 0.287/0.115 = 2.50 b. Yes, because the standardized statistic is more than 2, there is strong evidence of a difference in braking reaction times for drivers
who are looking at Facebook on their phones and drivers who are looking at Instagram on their phones, on average.
present and when it is not. When music is not present, on average, more social incidences occur than when it is present.
7.2.35
7.2.41
a. 0.287 ± 2(0.115) = 0.057 to 0.517
a. The mean aggression rate for the music condition (0.777) was higher than the no music condition (0.257). b. x̄d = 0.520
b. Yes, because the entire interval is positive (or doesn’t contain 0) there is strong evidence that the average braking reaction time is higher when drivers are looking at Facebook on their phones compared to when they are looking at Instagram on their phones. 7.2.36 a. The explanatory variable is the media type, Facebook or Instagram (categorical), and the response variable is the time headway variability (quantitative). b. Let μd represent the mean difference in time headway variability between Facebook and Instagram; H0: μd = 0; Ha: μd ≠ 0. c. x̄d = 0.030. The mean difference in time headway variability between Facebook and Instagram is 0.030 seconds in the sample. d. p-value ≈ 0.004 e. We have very strong evidence that there is a difference in average time headway variability between when drivers are looking at Facebook compared to when they are looking at Instagram, with the larger time headway variability for when looking at Facebook. 7.2.37 a. t = 0.030/0.011 = 2.73 b. Yes, because the standardized statistic is more than 2, there is strong evidence that, on average, the time headway variability is larger when drivers are looking at Facebook compared to when they are looking at Instagram. 7.2.38 a. 0.030 ± 2(0.011) = 0.008 to 0.052 b. Yes, because the entire confidence interval is positive (or doesn’t contain 0) there is strong evidence that, on average, the time headway variability is larger when drivers are looking at Facebook compared to when they are looking at Instagram.
c. p-value ≈ 0.12 d. Because the p-value is greater than 0.10, we do not have strong evidence that, on average, there is a difference in the aggression rates of such chimpanzees between when music is present and when it is not. 7.2.42 a. The data should be paired because the researchers measured the sweat rate from the tattooed area as well as the sweat rate from the nontattooed area on the opposite side of the same person’s body. b. The mean sweat rate for the nontattooed area (0.935) was larger than for the tattooed area (0.922). c. x̄d = –0.013 d. p-value ≈ 0.71 e. Because the p-value is larger than 0.10, we do not have strong evidence that, on average, there is a difference in the sweat rates between tattooed areas and nontattooed areas. 7.2.43 a. The mean difference of –83.105 shows that the seagulls got to the food 83.105 seconds faster, on average, when they were not being watched compared to when they were being stared at. We get a p-value ≈ 0.005, so there is very strong evidence that seagulls get to the food faster, on average, when they are not being watched compared to when they are being stared at. b. 14 out of the 19 times the seagulls took longer when stared at; thus, p̂ = 14/19 = 0.737.
7.2.39
c. We would need to run a single proportion test with H0: π = 0.50 and Ha: π > 0.50. p̂ = 14/19 = 0.737 and p-value ≈ 0.03, giving us strong evidence that more than half the seagulls, in the long run, will get to the food faster when they are not watched.
a. It should be treated as paired data because there are two measurements from each dryer (with and without hands).
d. Although we come to the same conclusion in both (a) and (c), the evidence in (c) is not quite as strong as that in (a).
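The single-proportion test in 7.2.43(c) can be checked with an exact binomial tail probability. This is a sketch using the exact binomial distribution; the source does not say whether its p-value came from simulation or theory, so small differences from the quoted value are expected. The function name `binom_upper_tail` is an assumption for illustration.

```python
from math import comb

def binom_upper_tail(successes, n, pi=0.5):
    """P(X >= successes) for X ~ Binomial(n, pi)."""
    return sum(
        comb(n, k) * pi**k * (1 - pi) ** (n - k)
        for k in range(successes, n + 1)
    )

p_hat = 14 / 19
p_value = binom_upper_tail(14, 19)
print(round(p_hat, 3), round(p_value, 3))  # 0.737 0.032
```

The exact tail probability of about 0.032 is consistent with the p-value ≈ 0.03 reported in part (c), and it is larger than the 0.005 from the paired analysis in part (a), which matches the comment in part (d).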
b. Mean with hands = 95.391 dBA, mean without hands = 91.832 dBA; with hands is louder.
Section 7.3
c. 3.559 dBA
7.3.1 A.
d. p-value ≈ 0.0003
7.3.2 E.
e. Yes, because the p-value is less than 0.01, there is very strong evidence that, on average, there is a difference in the dBA with hands under the dryer compared to without hands under the dryer. The dBA is significantly higher, on average, when hands are under the dryer.
7.3.3 B.
7.2.40 a. The data should be paired because the same chimpanzees are being compared under two conditions: the music condition and the no music condition. b. The mean for the no music condition (16.333) was larger than for the music condition (8.000). c. x̄d = –8.333 d. e. Because the p-value is less than 0.01, we have very strong evidence that, on average, there is a difference in the number of incidences of social behavior among such chimpanzees between when music is
7.3.4 B. 7.3.5 a. Do students perform better on multiple-choice or short-answer tests? b. Each student will be tested twice: once with a short-answer test and once with a multiple-choice test. c. Explanatory: multiple-choice or short-answer test, response: score on tests d. Random assignment used for order that tests are administered and random sampling will be used to select student participants from all students at our college. e. Null: The average difference in test scores is the same for multiple-choice and short-answer tests. Alternative: The average difference in test scores is different.
Solutions to Problems f. If there are at least 20 students in the sample and the differences in their two test scores are not strongly skewed 7.3.6 a. The average difference in husband and wife marriage ages in Cumberland County; μd b. Null: The average difference in husband and wife marriage ages in Cumberland County is 0. Alternative: The average difference in husband and wife marriage ages in Cumberland County (male–female) is greater than 0 (husbands tend to be older than their wives). c. Null: μd = 0; Alternative: μd > 0 d. The validity conditions are met because there are 24 pairs, and distribution of paired differences in ages is not strongly skewed as shown in the dotplot for Solution 7.3.6d. e. t= 1.91, p-value = 0.0344 f. We will obtain differences as extreme as the one we actually observed (>1.875) approximately 3.44% of the time (shuffling ages within pairs) when the null hypothesis is true. We have strong evidence for a one-sided test that the average difference in ages is greater than 0 (husbands tend to be older than their wives) g. (−0.157, 3.91). We are 95% confident that, on average, males are between 0.157 younger and 3.91 years older than females when they get married. h. With a one-sided p-value of 0.0344 we have strong evidence that, on average, husbands are older than their wives. We are also 95% confident that males are between 0.157 younger and 3.91 years older than females when they get married. (Because we were using a one-sided p-value we can have an instance like this where we get a small p-value and a confidence interval that contains zero.) This is not a randomized experiment so no cause-and-effect conclusion is possible. Furthermore, the sample may only be representative of Cumberland County, PA, and should not be generalized further. 7.3.7 a. A.
d. We have found a very significant difference in the actual and estimated average heights of youngest children by their parents, with parents, on average, estimating the height of their youngest child between 5.17 and 9.83 cm less than actual height. This result neither can be generalized to all parents, as this was not a random sample, nor demonstrates cause and effect, as this is an observational study. e. That there is not strong skewness in the distribution of differences in actual and estimated heights 7.3.9 a. t= 0.44, p-value = 0.6623 (two-sided test) b. (–1.44, 2.24). We are 95% confident that parents, on average, misjudge the height of eldest children by between 1.44 cm (underestimate) and 2.24 cm (overestimate). c. We have found a nonsignificant difference in the actual and estimated average heights of eldest children by their parents, on average, estimating the height of their oldest child between 1.44 less than and 2.24 cm more than actual height. This result neither can be generalized to all parents, as this was not a random sample, nor demonstrates cause and effect, as this is an observational study. d. That there is not strong skewness in the distribution of differences in actual and estimated heights 7.3.10 a. Each infant is paired with itself. b. Whether the helper or hinderer is being watched c. Time looking at the approach d. Null: The average time looking at the approach with the helper is the same as with the hinderer. Alternative: The average time looking at the approach with the hinderer is longer than with the helper. e. You need the data for each infant and it is not provided. 7.3.11 a. The long-run average difference in times looking at the hinderer and the helper condition ( μd) b. Null: μd = 0 and Alternative: μd > 0
b. C.
c. Because the sample size is less than 20, the distribution of differences should be fairly symmetric—and it is (stated as such in the problem).
c. A. d. D. e. We do not have the 39 pairs of observations (actual and estimated) heights; these would be needed to rerandomize actual and estimated heights within pairs using a coin flip. f. Validity conditions are met (sample size is >20 and the differences are not strongly skewed). 7.3.8 a. t = 6.51, p-value < 0.0001 b. We obtain values of 7.5 cm or larger by chance less than 0.0001 of the time if parents, in the long run, are estimating the heights of the children accurately on average. c. (5.17, 9.83). We are 95% confident that, on average, parents underestimate the height of their youngest child by between 5.17 and 9.83 cm.
d. t= 2.60, p-value = 0.01 e. The probability of obtaining an average difference in the sample of 1.14 seconds or larger, if the average difference in the population is 0, is 1%. f. The long-run average difference in looking times is between 0.21 and 2.07 seconds longer when infants are looking at the hinderer as opposed to the helper. g. We have strong evidence that the average difference in looking times is larger than 0 (infants, on average, look at the hinderer toy between 0.21 and 2.07 seconds longer than the helper toy). This suggests a cause-and-effect relationship between condition and looking time because random assignment was used, but the result should be generalized to other infants with caution because this was not a random sample. Mean = 1.875 SD = 4.812
−15
−12
−9
−6
−3
0 Differences
3
6
9
12
15
Solution 7.3.6d
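The theory-based numbers in Solution 7.3.6 (24 pairs, mean difference 1.875, SD 4.812) can be reproduced from the summary statistics alone. The following is a minimal sketch, not the applet's implementation: the function name `paired_t_summary` is hypothetical, and the multiplier 2.069 is assumed to be the standard 95% t-multiplier for 23 degrees of freedom taken from a t-table.

```python
import math

def paired_t_summary(mean_diff, sd_diff, n, multiplier=2.0):
    """Return (t, (low, high)) for paired data given summary statistics.

    multiplier defaults to the rough 2-SE rule; pass a t-multiplier
    looked up elsewhere for a theory-based interval.
    """
    se = sd_diff / math.sqrt(n)      # standard error of the mean difference
    t = (mean_diff - 0) / se         # the null value for the mean difference is 0
    margin = multiplier * se
    return t, (mean_diff - margin, mean_diff + margin)

# Summary values from Solution 7.3.6; 2.069 is an assumed table lookup (23 df).
t_stat, (low, high) = paired_t_summary(1.875, 4.812, 24, multiplier=2.069)
print(round(t_stat, 2), round(low, 3), round(high, 2))  # prints: 1.91 -0.157 3.91
```

The p-value is left out on purpose, since it needs a t-distribution table or a statistics library; the t-statistic and interval are enough to check parts e and g.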
7.3.12 a. Smaller b. Larger c. Same d. Smaller
7.3.13 a. Smaller b. Larger c. Larger d. Same
7.3.14 a. Smaller b. Larger c. Same d. Smaller
7.3.15 Null: The average difference in cholesterol levels between Day 2 and Day 4 is 0. Alternative: The average difference is not 0. t = 3.22, p-value = 0.0033. 95% CI: (8.44, 38.13). We have strong evidence that the average difference in cholesterol levels is not 0. Cholesterol levels are between 8.44 and 38.13 higher on Day 2, on average, as compared to Day 4. Little information is given to know whether we can generalize this conclusion to all heart attack patients, and no cause and effect can be concluded because random assignment was not used. The theory-based approach is appropriate here because the sample size is greater than 20 (28 pairs) and the distribution of differences is not strongly skewed.
7.3.16 Null: The average difference in cholesterol levels between Day 2 and Day 14 is 0. Alternative: The average difference is not 0. t = 3.29, p-value = 0.0041. 95% CI: (13.72, 62.28). We have strong evidence that the average difference in cholesterol levels is not 0. Cholesterol levels are between 13.72 and 62.28 higher on Day 2, on average, as compared to Day 14. Little information is given to know whether we can generalize this conclusion to all heart attack patients, and no cause and effect can be concluded because random assignment was not used. The theory-based approach is appropriate here because even though the sample size is less than 20 (19 pairs), the distribution of differences is fairly symmetric.
7.3.17 We do not have strong evidence that the average difference in cholesterol levels is not 0. Cholesterol levels are between 6.56 lower and 25.29 higher on Day 4, on average, as compared to Day 14. Little information is given to know whether we can generalize this conclusion to all heart attack patients, and no cause and effect can be concluded because random assignment was not used. The theory-based approach is appropriate here because even though the sample size is less than 20 (19 pairs), the distribution of differences is fairly symmetric.
7.3.18 a. Overestimate, the sample average is greater than 0. b. Because each person has two measurements of their kcal burning and we want to know whether the average difference is different than 0. c. Null: The long-run average difference in estimated and actual calories burned is 0. Alternative: The long-run average difference in estimated and actual calories burned is not 0. d. With a reasonably large sample size of 80, we just need to assume that the population distribution of the differences is not strongly skewed. e. t = 2.93, p-value = 0.0044 f. (41.4, 216.6) We are 95% confident that men such as those in this study tend to overestimate their EE by 41.4 to 216.6 kcal on average. g. Confidence interval does not include 0 and p-value is less than 0.05. h. We have very strong evidence that the average difference in estimated and actual kcal burned is different than 0, with men tending to overestimate calories burned by between 41.4 and 216.6 kcal on average. This is not a cause-and-effect relationship because no random assignment was used, and this result should be generalized with caution because these men are not necessarily representative of all men with regard to their ability to estimate calories burned.
7.3.19
a. $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n}} = \dfrac{129}{393.5/\sqrt{80}} = 2.93$ b. $\bar{x}_d \pm \text{multiplier} \times \dfrac{s_d}{\sqrt{n}}$. If we assume a multiplier of 2, we have $129 \pm 2(393.5/\sqrt{80}) = (41.0, 217.0)$. Or with 80 − 1 = 79 degrees of freedom, the actual t-multiplier is 1.990 and $129 \pm 1.990(393.5/\sqrt{80}) = (41.45, 216.55)$.
7.3.20 a. t = 2.27, p-value = 0.0257 b. (16.05, 241.95) c. The p-value is larger now, and the t-statistic smaller now (with wrong analysis). The confidence interval is centered at the same value but is now wider. d. This suggests pairing was effective at reducing variability and improving strength of evidence against the null hypothesis.
7.3.21 a. Larger b. Smaller c. Same d. Larger
7.3.22 a. Larger b. Smaller c. Smaller d. Same
7.3.23 a. Smaller b. Larger c. Same d. Smaller
7.3.24 a. t = 4.00, p-value = 0.0007 b. (0.036, 0.114) c. We have strong evidence that the average difference in running time comparing narrow to wide angle is not 0, with longer running times for the narrow angle (on average between 0.036 and 0.114 seconds longer). This result suggests a cause-and-effect relationship between running time and choice of angle but does not necessarily generalize to all baseball players because little is known about how representative the baseball player samples are of the running characteristics of all players.
7.3.25 a. $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n}} = \dfrac{0.075}{0.088/\sqrt{22}} = 4.00$; The sample mean difference is 4.00 standard deviations above the mean of the null distribution. b. $\bar{x}_d \pm \text{multiplier} \times \dfrac{s_d}{\sqrt{n}}$: $0.075 \pm 2(0.088/\sqrt{22}) = (0.037, 0.113)$. With df = 21, the t-multiplier is 2.08, and we find $0.075 \pm 2.08(0.088/\sqrt{22}) = (0.036, 0.114)$.
7.3.26 a. H0: μnew = 0 versus Ha: μnew > 0 b. p-value = 0.1172; no change in p-value from changing the order of subtraction c. (−7.78, 29.54). The signs of both endpoints of the interval will change due to changing the order of subtraction.
7.3.27 a. The long-run average difference in standing heart rate for tomoxetine compared to placebo (μd) b. Null: μd = 0; Alternative: μd > 0 (there is an increase, on average, in standing heart rate when on tomoxetine compared to when on the placebo) c. The sample size is less than 20 (18) but the differences in heart rates are approximately symmetric. d. t = 3.10, p-value = 0.0032 e. The probability of obtaining a difference of 11 or larger if the long-run average difference is 0 is 0.32%. f. (3.52, 18.48) g. We have strong evidence that the long-run average difference is greater than 0 (between 3.52 and 18.48 more beats per minute, on average, when taking tomoxetine). This result suggests a cause-and-effect relationship between tomoxetine and an increase in heart rate, but the result should be generalized with caution because little is known about how representative the volunteers in the sample are of any populations of interest.
7.3.28 a. The long-run average difference in RMET scores between oxytocin and placebo (μd) b. Null: μd = 0; Alternative: μd > 0 (people tend to score higher on the RMET with oxytocin) c. The sample size is more than 20 and the score differences are not strongly skewed. d. t = 2.18, p-value = 0.0188 e. The probability of obtaining a difference of 3 or larger if the long-run average difference is 0 is 1.88%. f. (0.1846, 5.8154) g.
We have strong evidence that the long-run average difference is greater than 0 (scoring between 0.18 and 5.82 higher on the RMET test, on average, after oxytocin). This result suggests a cause-and-effect relationship between oxytocin and performance on the RMET, but the result should be generalized with caution because little is known about how representative the volunteers in the sample are of any populations of interest.
7.3.29 a. $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n}} = \dfrac{3}{7.54/\sqrt{30}} = 2.18$; The sample mean difference is 2.18 standard deviations above the mean of the null distribution. b. $\bar{x}_d \pm \text{multiplier} \times \dfrac{s_d}{\sqrt{n}}$. If we assume a multiplier of 2, we have $3 \pm 2(7.54/\sqrt{30}) = (0.25, 5.75)$. With 30 − 1 = 29 degrees of freedom, the t-multiplier is 2.045 and $3 \pm 2.045(7.54/\sqrt{30}) = (0.18, 5.82)$.
7.3.30 a. The music condition had a faster average completion time. b. Yes, the vast majority of differences are positive, which means that most subjects are completing the test more quickly with the music. c. H0: μd = 0; Ha: μd ≠ 0, where μd is the long-run mean of the difference in the times it takes people to take a Stroop test with and without music. d. Because the t-statistic is greater than 2 (t = 3.84) and because this was a randomized experiment, we can conclude that there is strong evidence that music affects the completion time. e. Based on a p-value of 0.0005 and because this was a randomized experiment, we can conclude that we have very strong evidence that listening to music does have an effect on completing a Stroop test and that, on average, it will decrease the time for people to complete the test. (This direction was a surprise to the student researchers because they thought the music would increase the time it took to complete the test.)
7.3.31 a. sd = 2.657, nd = 38
b. (0.781, 2.527) c. We can be 95% confident that the average time to complete the Stroop test takes between 0.781 and 2.527 seconds longer, on average, for those not listening to music. (Perhaps opposite of what you might have thought!) d. Because the confidence interval does not include 0, and because this was a randomized experiment, we can conclude that we do have strong evidence that listening to music decreases the average time needed to take the test.
7.3.32 a. Black and white had a larger mean number of objects remembered. b. No, the distribution of differences appears to be almost centered on 0. c. H0: μd = 0; Ha: μd ≠ 0, where μd is the long-run mean of the difference in the number of objects people can remember based on pictures in color and pictures in black and white. d. Because the t-statistic is less than 2 (t = 0.81), there is not strong evidence of a difference in objects remembered (color versus black and white pictures), on average. e. With a p-value of 0.4219, we do not have strong evidence that there is a difference, on average, in the number of objects people can recall when shown pictures in color compared to when shown pictures in black and white.
7.3.33 a. Instrumental music had a larger number of words memorized. b. Yes, the distribution of differences is centered well above 0. c. H0: μd = 0; Ha: μd ≠ 0, where μd is the long-run mean of the difference in the number of words people can remember when music without lyrics is playing compared to when music with lyrics is playing. d. Yes, because the t-statistic (t = 2.27) is greater than 2 e. With a p-value of 0.0306, there is strong evidence of a difference, on average, in the number of words memorized between the two conditions. People memorized significantly more words, on average, when instrumental music was playing.
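As a consistency check on Solutions 7.3.30 and 7.3.31, the reported interval (0.781, 2.527) and the summary statistics sd = 2.657, nd = 38 can be tied back to the t-statistic. The sketch below assumes the interval is symmetric about the sample mean difference; the midpoint, margin of error, and implied multiplier are derived here and are not quoted from the solution.

```latex
\begin{align*}
\bar{x}_d &\approx \frac{0.781 + 2.527}{2} = 1.654, &
\text{margin of error} &\approx \frac{2.527 - 0.781}{2} = 0.873,\\
SE &= \frac{s_d}{\sqrt{n_d}} = \frac{2.657}{\sqrt{38}} \approx 0.431, &
\text{multiplier} &\approx \frac{0.873}{0.431} \approx 2.03,\\
t &= \frac{\bar{x}_d - 0}{SE} \approx \frac{1.654}{0.431} \approx 3.84.
\end{align*}
```

The implied multiplier of about 2.03 is close to the rule-of-thumb value of 2, and the resulting t of about 3.84 agrees with the value quoted in Solution 7.3.30d.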
7.3.34 a. The same hurricanes had a higher mean intensity rating when they were given male names, compared to when they were given female names. b. H0: μd = 0; Ha: μd ≠ 0, where μd is the long-run mean of the difference in intensity ratings of hurricanes when they are given male names versus female names. c. Yes, because the t-statistic (t = 4.54) is greater than 2 d. With p-value < 0.0001, there is very strong evidence of a difference, on average, in the intensity ratings between male- and female-named hurricanes, with male hurricane names rated more intense.
7.3.35 a. x̄d = 0.226, sd = 0.925, nd = 346 b. (0.1282, 0.3238) c. We can be 95% confident that the intensity rating for a hurricane will be between 0.1282 and 0.3238 points higher, on average, for male-named hurricanes than for female-named hurricanes. d. Because the confidence interval does not include 0, we do have strong evidence that there is a difference, on average, in the intensity ratings between male- and female-named hurricanes, with ratings for the male-named hurricanes being larger. e. Although there is very strong evidence of a difference, there is not evidence of a strong effect because the sample mean of the differences is only 0.226, and the confidence interval shows this difference in the population could be as low as 0.1282 (where the ratings could go anywhere from 1 to 7). We get an interval that is completely positive (and corresponding small p-value) because the sample size is so large (n = 346).
7.3.36 a. Null: The type of laughter has no effect on the rating of the joke, on average. Alternative: The type of laughter does have an effect on the rating of the joke, on average.
b. The mean for the spontaneous laughter is 3.245, for posed laughter is 3.010, and for the difference between spontaneous and posed laughter is 0.235. c. t = 3.55, p-value = 0.0010 d. Yes, based on the small p-value and because the study was a randomized experiment, we have strong evidence that the type of laughter (spontaneous vs. posed) makes a difference in how funny people think the jokes are, with spontaneous laughter getting higher ratings than posed laughter, on average.
End of Chapter 7 Exercises
7.CE.1 a. i. b. iv. c. iii. d. ii.
7.CE.2 a. No, the p-value was not small, so even though random assignment was used, we cannot draw a cause-and-effect conclusion here. b. They should analyze the difference (for each person) in number of M&Ms taken.
7.CE.3 a. Incorrect because the hypotheses are about statistics instead of parameters. b. Incorrect, the value in the null hypothesis (e.g., 40) should match the value in the alternative hypothesis. c. Correct
7.CE.4 a. Paired b. Paired c. Unpaired d. Paired
7.CE.5 a. Have children, separately, use both smoke detectors, record the time needed to leave the house both times (randomly decide which detector to use first). b. Some children (e.g., older) will be able to leave much faster than others (e.g., younger). c. The first time, children may learn what they need to do and be faster the second time around, regardless of which smoke detector is used.
7.CE.6 a. Explanatory: type of music (categorical), response: amount spent (quantitative) b. Experiment: values of the explanatory variable are randomly assigned. c. Plan 3 d. People may tend to spend more on certain nights of the week (e.g., Friday) compared to other days (e.g., Monday) for reasons other than the type of music being played.
7.CE.7 a. The husbands’ ages in the sample are not connected to the wives’ ages in the sample. b. Collect both husband and wife ages from each marriage license. c. Because there is likely quite a bit of variation in ages at marriage but less variation in difference in husband and wife ages
7.CE.8 a. Randomly assign boots (waterproof or not) to volunteers to walk around and then rate how waterproof they are. b. Randomly assign one waterproof boot and one regular boot to either the left or right foot of one person.
7.CE.9 a. Null: Average pretest flexibility is the same for males and females. Alternative: Average pretest flexibility is different for males and females. Sample sizes are 98 (F) and 81 (M), without strong skewness in either pretest distribution, meaning that a two-sample t test is valid. t = 3.73, p-value = 0.0003. We have very strong evidence of a difference in the average flexibility of males and females (females are between 0.748 and 2.43 inches more flexible than males; 95% CI). b. Matched pairs test to evaluate whether there is a significant change in flexibility; this is different than an independent samples t-test to compare male and female flexibility. c. Null: Average difference in flexibility pre vs. post is 0. Alternative: Average difference is less than 0 (pre minus post). The distribution of flexibility change scores is not strongly skewed (see graph) and there are more than 20 (actually 179) scores, so a one-sample t-test (paired t-test) is appropriate. We have very strong evidence (t = −4.76, p-value < 0.0001) that the average difference (pre minus post) is less than 0. A 95% confidence interval finds an
average increase in flexibility between 0.27 and 0.65 inches. This result generalizes to all students taking the class around this time but does not demonstrate cause and effect because random assignment was not used.
[Figure: dotplot of the flexibility change scores; horizontal axis "Differences" from −8 to 8; Mean = 0.462, SD = 1.299]
d. Explanatory: sex, response: change score. Use a two-sample t-test. e. First, create the difference scores for each person, then use the Multiple Means applet to look at the distribution of change scores for males and females. (See graph in Solution 7.CE.9e.)
[Figure, Solution 7.CE.9e: dotplots of change scores by gender (Female, Male); horizontal axis "Change"]
Summary statistics:
          n     Mean   SD
female    98    0.36   1.33
male      81    0.59   1.25
pooled    179   0.46   1.30
7.CE.10 a. Weight: pre-weight mean: 67.009 kg (SD = 12.12 kg), post-weight mean: 67.668 kg (SD = 12.243 kg), average change in weight = −0.659 kg (SD = 2.214 kg). Sample size is 180. Distribution of weight changes looks fairly symmetric. Null: The average weight change is 0. Alternative: The average weight change is not 0. t = −3.99, p-value = 0.0001. We have strong evidence that the average weight change is not 0.
[Figure: dotplot of weight changes; horizontal axis "Differences" from −10 to 10; Mean = −0.659, SD = 2.214]
Water weight: pre-weight mean: 2.559 kg (SD = 1.329 kg), post-weight mean: 2.746 kg (SD = 1.321 kg), average change in water weight: −0.187 kg (SD = 0.357 kg). Sample size is 178. Distribution of water weight changes looks fairly symmetric. Null: The average water weight change is 0. Alternative: The average water weight change is not 0. t = −6.99, p-value < 0.0001. We have very strong evidence that the average water weight change is not 0.
[Figure: dotplot of water weight changes; horizontal axis "Differences" from −1.60 to 1.60; Mean = −0.187, SD = 0.357]
Body fat: pre-body-fat mean: 19.79 (SD = 7.19), post-body-fat mean: 18.64 (SD = 6.97), average change in body fat = 1.155 (SD = 3.08). Sample size is 177. Distribution of body fat changes looks fairly symmetric. Null: The average body fat change is 0. Alternative: The average body fat change is not 0. t = 4.99, p-value < 0.0001. We have strong evidence that the average body fat change is not 0.
[Figure: dotplot of body fat changes; horizontal axis "Differences" from −12 to 12; Mean = 1.155, SD = 3.078]
b. Weight: 95% CI (−0.98, −0.33). We are 95% confident that students, on average, increase their weight by between 0.33 and 0.98 kilograms over the course of the semester.
Water weight: 95% CI (−0.24, −0.13). Because the changes are computed as pre minus post, we are 95% confident that students, on average, increase their water weight by between 0.13 and 0.24 kilograms over the course of the semester.
Body fat: 95% CI (0.70, 1.61). We are 95% confident that students, on average, decrease in body fat between 0.70 and 1.61 percentage points over the course of the semester. Overall, it appears that although weight is increasing, body fat is decreasing, suggesting that fat is being replaced by muscle.
7.CE.11 a. Null: The average difference in the percentage of times chimps open the gate is 0 (percentage of times open gate when food platform width is wide minus narrow). Alternative: The average difference is greater than 0.
b. The fact that the data are paired. This may yield a p-value larger than you could get if you analyzed the data (correctly) as paired. c. The p-value from a simulation using the Matched Pairs applet gives a p-value of about 0.005. We have strong evidence that chimps are more likely to open the gate in the collaborative condition than the solo condition.
7.CE.12 a. p-value = 0.371. We do not have evidence of a difference in the average running times between the two methods. b. We do not have enough evidence this time, whereas before we did. c. Matching reduces the variability and improves power.
7.CE.13 a. Null: The average difference in running times is 0. Alternative: The average difference in running times is not 0; t = −4, p-value = 0.0007. We have very strong evidence that the average difference in running times is not 0. b. t = 0.93, p-value = 0.36. In this case we would not have evidence that the average difference in running time differs from zero. c. The answers are quite different, indicating a substantial effect of matching.
7.CE.14 a. Using the unpaired approach gives (−0.237, 0.087), whereas the paired approach gives (−0.114, −0.036). b. The midpoints of the intervals are the same: −0.075. The difference is in the widths. In one case (paired data) the width of the interval is quite a bit smaller than the other case because it is the pairing that reduces the variability in the null distribution and, hence, the confidence interval width (margin of error).
7.CE.15 a. Each cat had negative interactions recorded before and after exposure to catnip. b. Null: There is no association between exposure to catnip and negative interactions. Alternative: There is an association between exposure to catnip and negative interactions (more negative interactions after catnip). c. p-value = 0.004. The probability of obtaining a mean difference in the number of negative interactions of −1 or less is 0.004. d. We have strong evidence against the null hypothesis. In other words, we would conclude there is an association between exposure to catnip and an increase in the number of negative interactions (on average, more negative interactions after exposure to catnip).
7.CE.16 a. Although there are only 15 pairs (cats) in the study, the distribution of their difference in negative interaction scores is not strongly skewed. b. t = −3.24, p-value = 0.003 c. We have strong evidence against the null hypothesis, in favor of the alternative of more negative interaction scores, on average, with the catnip.
7.CE.17 a. −1.0 ± 2 × 0.396 = −1.0 ± 0.792, (−1.79, −0.21). We are 95% confident that, on average, cats have between 0.21 and 1.79 more negative interactions after catnip than before. b. (−1.66, −0.34). We are 95% confident that, on average, cats have between 0.34 and 1.66 more negative interactions after catnip than before.
7.CE.18 a. Null: The average number of negative interactions before catnip is the same as after. Alternative: The average number of negative interactions before catnip is less than after. t = −1.34, p-value = 0.096. We have moderate evidence that the number of negative interactions before catnip is less than after catnip. b. The p-value has gotten quite a bit larger (0.003 compared to 0.096). c. Yes, the p-value is much smaller, indicating that, because of the pairing, the study yields a statistically significant result, compared to the same study without pairing, which would not have yielded a statistically significant result.
Chapter 7 Investigation
Part I
1. The observational units are the filters.
2. Escherichia coli count on just filtered water and E. coli count on water filtered the day before
3. These variables are quantitative, but we can also consider the difference in E. coli counts for each day as a quantitative response variable.
4. The samples of the first E. coli count are dependent on the samples of the second E. coli count because they came from the same filter. If the filter is functioning well, we would expect both E. coli counts to be low. If the filter is not functioning well, we would expect both E. coli counts to be higher. So we will perform a matched pairs test for the mean difference.
5. The validity conditions are questionable with a sample size of 14. It might be reasonable to use the theory-based t-test if the distribution of the differences is reasonably symmetric.
6. H0: μd = 0 and Ha: μd ≠ 0, where μd is the long-run average of the differences (Day 1 − Day 2) of E. coli counts.
7. Averages: Day 1: 43.48, Day 2: 94.79, increased on average from Day 1 to Day 2; standard deviations: Day 1: 64.19, Day 2: 82.46 (per 100 mL)
8. Average of the difference (Day 1 − Day 2) = −51.307. The sign does correspond to the answer in #7. On average, Day 2 had larger E. coli counts, so if the average of the differences is negative, then it must be that most of the differences (Day 1 − Day 2) were negative, meaning Day 2 had a larger count. Standard deviation for the difference in E. coli counts is 58.687.
9. p-value = 0. The 95% confidence interval: −51.306 ± 2(20.396) = (−92.098, −10.514). Note: Answers will vary depending on SD of simulated null.
10. There is strong evidence against the null and in support of the alternative that there is a genuine difference between the mean E. coli on Day 1 and Day 2. On average there is between 10.514/100 mL and 92.098/100 mL more E. coli on Day 2 than on Day 1.
11. We can generalize to filters yet to be made as this is being treated as pilot data. We cannot draw a cause-and-effect conclusion as this was an observational study and there are many possible confounding variables.
12. Researchers were wondering whether there was a difference in E. coli counts from just filtered water compared to water filtered the day before. Data were gathered on 14 filters in a village in Cameroon.
The average E. coli count of just filtered water was 43.48/100 mL of water and 94.79/100 mL of water for the water filtered the day before. A matched pairs test was run on the data and resulted in a p-value of 0, which offered very strong evidence against the null and in support of the alternative that there is a difference in the average E. coli counts between Day 1 and Day 2 of filtered water. We can generalize these results to the population of filters yet to be built but are not able to draw any causal inference as this was an observational study, not a randomized experiment. Further research could be done on newer filters, also looking at more than 1-day-old water.
[Figure, Solution 18: dotplot of flow rates (mL/min) with theory-based output. Sample data: n = 23; sample mean x̄ = 913.56; sample SD s = 582.88]
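The matched pairs simulation behind the Part I p-value and the "observed ± 2 × SD of the simulated null" interval can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Matched Pairs applet itself: the function name and the eight pairs of counts are hypothetical (the 14 filter measurements are not given in the solution), and the two-sided count of extreme results is one reasonable reading of Ha: μd ≠ 0.

```python
import random
import statistics

def matched_pairs_simulation(first, second, reps=10000, seed=0):
    """Simulate the null distribution of the mean paired difference.

    Returns (observed mean difference, two-sided p-value,
    observed +/- 2 * SD of the simulated null).
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    observed = sum(diffs) / n
    sims = []
    for _ in range(reps):
        # A fair coin flip per pair decides whether the two values swap,
        # which flips the sign of that pair's difference.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        sims.append(total / n)
    extreme = sum(1 for m in sims if abs(m) >= abs(observed))
    null_sd = statistics.stdev(sims)
    interval = (observed - 2 * null_sd, observed + 2 * null_sd)
    return observed, extreme / reps, interval

# Hypothetical counts per 100 mL for eight filters (illustration only).
day1 = [12, 5, 30, 8, 22, 15, 9, 40]
day2 = [25, 9, 41, 20, 30, 33, 12, 52]
obs, p_value, ci = matched_pairs_simulation(day1, day2)
print(obs, p_value, ci)
```

Because the interval uses the SD of the simulated null distribution, its endpoints shift slightly from run to run unless the seed is fixed, which matches the note in item 9 that answers will vary.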
Part II
13. Explanatory variable is amount of sand and response is E. coli counts. Amount of sand is categorical (>2 inches or <2 inches) and E. coli counts is a quantitative variable.
14. Average E. coli counts for >2 inches = 38.84, for <2 inches = 80.60. Counts are higher on average with lower amounts of sand. Standard deviations for >2 inches = 70.01, for <2 inches = 96.83.
15. Independent. The filters that have >2 inches of sand are different filters than those with <2 inches of sand. Independent two-sample test for means.
16. No, the sample size is not large enough to conduct a theory-based test.
17. −41.764 ± 2(39.56) = (−120.884, 37.356). We have weak evidence against the null and so conclude it is plausible that the average E. coli counts are the same for filters with >2 inches of sand and filters with <2 inches of sand.
Part III
18. Similarities: Both are looking at a quantitative response variable, and both are a test on a single mean. Differences: The first question was a matched pairs analysis on two dependent measures; this is comparing the quantitative response to a parameter value that makes contextual sense.
19. Dotplot of the flow rates shows two clumps of data. (See Solution 18 output.)
20. Theory-based output shown. We have weak evidence against the null and so it is plausible that on average the filters are flowing at 1000 mL/min.
21. Recalling what the dotplot looked like when we explored our data, it appears that roughly half of the filters are filtering too quickly and the other half are filtering too slowly. This does make the average where we want it, but none of the filters individually are doing a good job of filtering, at least from a flow standpoint. Exploring the data is a very important part of the six-step method.
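The theory-based output mentioned in item 20 is not reproduced in the text, but the test statistic can be sketched from the Solution 18 sample data (n = 23, sample mean 913.56, sample SD 582.88), assuming the null value is the 1000 mL/min flow rate named in item 20. The rounded values below are computed here and are not quoted from the output.

```latex
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
  = \frac{913.56 - 1000}{582.88/\sqrt{23}}
  \approx \frac{-86.44}{121.54}
  \approx -0.71
\]
```

A sample mean less than one standard error below the null value is consistent with the "weak evidence" conclusion in item 20, while the large standard deviation reflects the two clumps described in items 19 and 21.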
Chapter 7 Research Article 1. The researchers wish to evaluate claims that the direction people look (up and to the right or up and to the left) is predictive of whether they are lying.
2. NLP is a commonly accepted psychological technique which claims to be able to gain useful information about whether someone is lying based on their eye movement (Gray, 1991); however, some of its claims may be difficult to replicate in controlled settings (Vrij and Lochun, 1997).
3. There were 32 participants, mainly undergraduate students recruited through contacts of the researchers.
4. They all had to be right handed.
5. Paired
6. 50 participants, recruited through contacts of the experimenters (their friends, acquaintances)
7. 0.040
8. 0.029
9. 0.019
10. 0.017
11. Null: The difference in frequency of upper left glances is the same on average whether participants are telling lies or not telling lies. Alternative: The difference in frequency of upper left glances is different on average depending on whether participants are telling lies or not telling lies. There is not convincing evidence (at the 5% level of significance, though there would be at the 10% level of significance) that the difference in frequency of upper left glances is different on average depending on whether participants are telling lies.
12. The sample size is 32, which is more than 20, so the validity conditions are met as long as the distributions of the differences are not strongly skewed.
13. Null: The average number of correct identifications for subjects who have NLP training is the same as the average number of correct identifications for subjects who don’t have NLP training (control). Alternative: The average number of correct identifications for subjects who have NLP training is different than the average number of correct identifications for subjects who don’t have NLP training (control). There is not convincing evidence (p-value = 0.81) that the average number of correct identifications is different depending on whether subjects have NLP training.
14. The sample sizes are 21 and 29 in the two groups, both of which are larger than 20, so the validity conditions are met as long as the samples don’t have any strong skewness.
15. We should be hesitant to generalize these conclusions very far given the fact that the subjects were recruited from an undergraduate class and via the researchers’ acquaintances. This is clearly not a random sample and it is unclear who this sample is representative of.
16. Answers will vary. One idea would be to work harder to obtain a random sample, or at least a potentially representative sample with less sampling bias toward the researchers’ acquaintances. Another idea would be to create a semicontrolled experiment where participants are incentivized to lie, to create a more realistic situation where lying might occur as opposed to telling participants to lie or not lie.
CHAPTER 8
Comparing More Than Two Proportions
Section 8.1
8.1.1 A, B, E.
8.1.2 C.
8.1.3 C.
8.1.4 B.
8.1.5 All observed proportions are the same.
8.1.6 $\dfrac{|0.25 - 0.30| + |0.25 - 0.35| + |0.30 - 0.35|}{3} = \dfrac{0.05 + 0.10 + 0.05}{3} = \dfrac{0.20}{3} = 0.067$
8.1.7 $\dfrac{|0.66 - 0.45| + |0.66 - 0.73| + |0.66 - 0.44| + |0.45 - 0.73| + |0.45 - 0.44| + |0.73 - 0.44|}{6} = \dfrac{0.21 + 0.07 + 0.22 + 0.28 + 0.01 + 0.29}{6} = 0.18$
8.1.8 Automatic zeros. Recall Example 8.1: Coming to a stop. The differences were: $\hat{p}_S - \hat{p}_L = -0.047$, $\hat{p}_F - \hat{p}_S = -0.082$, and the third difference, $\hat{p}_L - \hat{p}_F$. If you add these without taking absolute values first, you get zero. This is not a coincidence but an algebraic fact. Check that $(\hat{p}_S - \hat{p}_L) + (\hat{p}_F - \hat{p}_S) + (\hat{p}_L - \hat{p}_F) = 0$. Ambiguity. If you take absolute values, order doesn’t matter: $|\hat{p}_S - \hat{p}_L| = |\hat{p}_L - \hat{p}_S|$. If you don’t take absolute values, order does matter: $(\hat{p}_S - \hat{p}_L) + (\hat{p}_F - \hat{p}_S) + (\hat{p}_L - \hat{p}_F) = 0$ and $(\hat{p}_L - \hat{p}_S) + (\hat{p}_F - \hat{p}_S) + (\hat{p}_L - \hat{p}_F) = 2(\hat{p}_L - \hat{p}_S)$.
8.1.9 a. Rep: 125/358 = 0.349, Dem: 201/357 = 0.563, Ind: 178/362 = 0.492
Solution 8.1.12
c08Solutions.indd 99
b. ((0.563 − 0.349) + (0.563 − 0.492) + (0.492 − 0.349))/3 = (0.214 + 0.071 + 0.143)/3 = 0.428/3 = 0.143 8.1.10 a. Random sampling. Random assignment is not possible—you can’t use random numbers to tell a person which party to belong to. Random sampling is both possible and important to avoid sampling bias. (Even if you can’t get access to voter registration lists, you can take a random sample of individuals and ask their party affiliation.) b. You would need to know the sample sizes—the number of Democrats, Republicans, and Independents in the sample. 8.1.11 a. B. The observed proportions for row B show the largest variation, with a maximum of 0.40 and a minimum of 0.15. The larger the variation, the stronger the evidence against the null hypothesis. The stronger the evidence, the smaller the p-value. b. A. The observed proportions for row A show the smallest variation, with a maximum of 0.30 and a minimum of 0.22. The smaller the variation, the weaker the evidence against the null hypothesis. The weaker the evidence, the larger the p-value. 8.1.12 a. Incentive: 34/38 = 0.89; no incentive: 30/42 = 0.71
99
10/16/20 8:05 PM
100
C HA PTE R 8
Comparing More Than Two Proportions
b. Mean Group Diff = 0.89 – 0.71 = 0.18 c. In this case the Mean Group Diff is just the absolute value of the observed statistic. This will be true when you have binary explanatory and binary response variables. d. See graphs for Solution 8.1.12. In distribution of the Mean Group Diff statistic looks like the distribution of the difference in proportions, except all of the negative values have been made positive, so it’s right skewed and always positive, instead of bell-shaped and centered at 0. e. The distribution of Mean Group Diff statistics is the distribution of difference in proportions “folded over” a vertical line at 0, so all the negative values become positive. Therefore, there will be lots of values close to 0 and will decrease as you move to the right. 8.1.13 a. Proportion of “Yes obsolete,” Mill: 0.44, GenX: 0.429, Boomers: 0.35, Over 65: 0.322. Because these proportions are different there is preliminary evidence of an association between generation and opinion about marriage.
e. The null hypothesis is that all four conditional probabilities are equal. More abstractly, there is no association between generation (the explanatory variable) and view of marriage (the response.) The alternative hypothesis is that at least two conditional probabilities differ. f. H0: πM = πX = πB = π65+. Ha: at least one of the probabilities of yes is different from the others. g. The Mean Group Diff = 0.072, with p-value = 0.0010 (based on 10,000 repetitions): The result is highly statistically significant. The evidence of an association in the population is extremely strong. Samples are random, so generalization to the population is justified. Conditions were not randomly assigned, so inference about cause and effect is not justified. Conclusion: There is very strong evidence that attitudes toward marriage vary from one age group to the next and that this pattern holds true for the population as a whole. 8.1.15
(1,018/2,622 = 0.388). Multiply the overall proportion by the total number of individuals in each group.
b.
Generation
c. See the graph.
Millennial Gen X Boomers Total (ages (ages (ages Age 18–29) 30–45) 46–65) 65+
Sample data
Yes
60
Yes
Percentage
Yes
80
Yes
100
445
82 1,018
328
446
701
129 1,604
Total
536
729
1146
211 2,622
i. (331 + 221)/771 so 71.50% ii. 100% − 71.50% = 28.50 percentage points GenX Boomers Generation
No
No
No
No Millennial
Over 65
8.1.14 a. Individual people in the survey. b. The explanatory variable is generation, which is categorical. (The categories are ordered, because they are based on age, which is quantitative.) The response is opinion of marriage, which is categorical (in fact binary). c. For the yawning study subjects were not chosen by random sampling; for this study they were. For the yawning study, categories of the explanatory variable were randomly assigned; for this study they were not. d. For each generation (each category of the explanatory variable), let π denote the conditional probability that a person answers Yes, marriage is obsolete. These conditional probabilities (population proportions) are the parameters. There are four, one for each generation: πB = probability of yes for Boomers and π65+ = probability of yes for 65+.
c08Solutions.indd 100
283
a. Control group:
20
208
8.1.16
40
0
Marriage Yes obsolete? No
b. Those with heart disease: i. little or no baldness (251+165)/663 so 62.75% ii. some or much baldness 100% − 62.75% = 37.25 percentage points c. At this stage, there does seem to be an association between baldness and heart disease. Among those with heart disease, (higher levels of) baldness is more common (37%) than among those without heart disease (29%). d. In words, the null hypothesis is that there is no association between baldness and heart disease. More specifically, the probability of heart disease does not differ among individuals with “none,” “little,” “some,” or “much” baldness. The alternative is that at least one of the four probabilities is different. In symbols, H0: πnone = πlittle = πsome = πmuch, where π is denotes the probability heart disease. e. The value of the Mean Group Diff is 0.099, with a null distribution that is unimodal and skewed to the right. The simulated p-value based on 10,000 repetitions is 0.001. The value of the chi-square statistic is 14.51, with a null distribution that is highly skewed to the right. The p-value, based on 10,000 repetitions, is 0.0023. Both tests give very strong evidence against the null hypothesis, which should be rejected. The conclusion is that there is in fact an association between degree of baldness and presence of heart disease. Degree of baldness could not be randomly assigned, so inference about cause and effect is not supported. The subjects were a sample chosen from the 22,000 doctors in the Physicians Health Study. If the sample was chosen by random sampling, then the results of the study generalize to all the physicians in the population.
10/16/20 8:05 PM
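The Mean Group Diff calculations in 8.1.6, 8.1.7, and 8.1.9 all follow one recipe: average the absolute differences over every pair of group proportions. The sketch below is a minimal illustration of that recipe together with a shuffle-based p-value in the spirit of the simulations referenced in these solutions. The function names `mean_group_diff` and `shuffle_p_value`, the number of repetitions, and the seed are assumptions made for illustration; this is not the applet's implementation.

```python
import random
from itertools import combinations


def mean_group_diff(proportions):
    """Average absolute difference over all pairs of group proportions."""
    pairs = list(combinations(proportions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def shuffle_p_value(successes, sizes, reps=1000, seed=1):
    """Approximate p-value by re-dealing the pooled responses to the groups.

    successes[i] is the number of "yes" responses in group i and
    sizes[i] is that group's sample size (hypothetical argument names).
    """
    observed = mean_group_diff([s / n for s, n in zip(successes, sizes)])
    pooled = [1] * sum(successes) + [0] * (sum(sizes) - sum(successes))
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        start, props = 0, []
        for n in sizes:
            props.append(sum(pooled[start:start + n]) / n)
            start += n
        # Count shuffles at least as extreme as the observed statistic
        if mean_group_diff(props) >= observed:
            hits += 1
    return observed, hits / reps


# Hand calculations from 8.1.6 and 8.1.7
print(round(mean_group_diff([0.25, 0.30, 0.35]), 3))
print(round(mean_group_diff([0.66, 0.45, 0.73, 0.44]), 2))

# Counts from 8.1.9a: Rep 125/358, Dem 201/357, Ind 178/362
obs, p = shuffle_p_value([125, 201, 178], [358, 357, 362])
print(round(obs, 3), p)
```

Because the statistic uses absolute values, it is never negative, which is why the simulated null distribution is right skewed, as described in 8.1.12.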
8.1.17
a. Amount of light in infants' bedrooms at night (darkness, night light, room light), categorical
b. Whether or not child is nearsighted, categorical
c. Null: There is no association between the amount of light in infants' bedrooms at night and whether or not the child is nearsighted (the population proportion of nearsighted children in each group is the same). Alternative: There is an association between the amount of light in infants' bedrooms at night and whether or not the child is nearsighted (at least one of the population proportions is different from the others).
d. Dark: 18/172 = 0.105, night light: 78/232 = 0.336, room light: 41/75 = 0.547
e. 0.295
f. It is less than 0.001.
g. There is very strong evidence of an association between light in room and nearsightedness.
8.1.18
a. See table.

             No sign   Yellow sign   Red sign   Total
Left door    13        14            25         52
Right door   42        40            28         110
Total        55        54            53         162

b. Sign condition (categorical)
c. Which door was used (categorical)
d. H0: The proportion of people that use the left door will be the same for all conditions, in the long run. Ha: The proportion of people that use the left door will not be the same for all conditions, in the long run.
e. No sign = 0.236; yellow = 0.259; red = 0.472
f. Mean Group Diff = 0.157
g. Because 0.157 is far out in the tail, the p-value appears to be quite small (less than 0.05).
h. We have strong evidence that the proportion of people that use the left door is not the same for all conditions. It appears that people are significantly more likely to use the left door when the red sign is present compared to when the yellow sign is present, or when no sign is present.
8.1.19
a. Time of day, categorical
b. Whether or not subject yawned, categorical
c. Null: There is no association between time of day and whether or not the subject yawns (the long-run proportion of yawners in each population is the same, regardless of time of day). Alternative: There is an association between time of day and whether or not the subject yawns.
d. 5–8 p.m. (29/44 = 0.659)
e. 2–5 p.m. (15/33 = 0.455)
f. 0.113
g. p-value = 0.22
h. There is not strong evidence of an association between time of day and proportion of yawners.
8.1.20
… only use students with similar characteristics, and eliminate either the morning or evening time slot to emphasize other potential group differences more. Narrow the time intervals. For example, compare 8–9 a.m. to 4–5 p.m.
8.1.21
a. Explanatory is time of day (categorical)
b. Response is whether the student was on the phone (categorical)
c. See table.

               Morning   Afternoon   Evening   Total
On phone       44        48          45        137
Not on phone   177       184         70        431
Total          221       232         115       568

d. Null: There is no association between time of day and phone use by students on campus (πM = πA = πE). Alternative: There is an association between time of day and phone use by students on campus (at least one πi is different).
e. Morning: 44/221 = 0.199; afternoon: 48/232 = 0.207; evening: 45/115 = 0.391
f. Mean Group Diff = 0.128
g. p-value ≈ 0.0001
h. We have very strong evidence that there is an association between time of day and phone use by students on campus. As it is not a randomized experiment, we cannot conclude a cause-and-effect relationship. It was not a random sample, so generalization is difficult. However, this could very well be a representative sample, so we could generalize to all students on this campus. Many other campuses would probably find similar results, while some might not.
8.1.22 You would combine morning and afternoon into one group and compare it to the evening results. In doing so you would have two proportions (day: 92/453 = 0.203; evening: 45/115 = 0.391) and could run a one-sided two-proportion test as follows. Null: Students are just as likely to be on their cell phones in the evening as they walk across campus as they are during the daytime. Alternative: Students are more likely to be on their cell phones in the evening as they walk across campus than they are during the daytime. The p-value is 0.0000 with a theory-based test, so we have very strong evidence that students are more likely to be on their cell phones in the evening as they walk across campus than they are during the daytime.
8.1.23
a. White: 971/1,011 = 0.960, Black: 139/142 = 0.979, Hispanic: 204/213 = 0.958
b. Not much of an association; the proportions are quite close together.
c. Null: There is no association between race and owning a cell phone (the population proportions are the same in each population). Alternative: There is an association between race and owning a cell phone (at least one of the population proportions is different).
d. Mean Group Diff = 0.014
e. f. We don't have strong evidence of an association between race and whether or not someone owns a cell phone in the population.
8.1.24
…portion of Blacks and Whites that own a cell phone. Note: We used Whites, but using Hispanics is also fine as the White and Hispanic proportions are the same in this sample.
8.1.25
a. White: 829/1,011 = 0.820, Black: 114/142 = 0.803, Hispanic: 168/213 = 0.789
b. Not much of an association; the proportions are quite close together.
c. Null: There is no association between race and owning a smartphone (the population proportions are the same in each group). Alternative: There is an association between race and owning a smartphone (at least one of the population proportions is different).
d. Mean Group Diff = 0.031
e. p-value ≈ 0.65
f. We don't have strong evidence of an association between race and whether or not someone owns a smartphone in the population of U.S. adults.
8.1.26
a. Smartphone ownership among cell phone owners:

Own a smartphone?   White   Black   Hispanic   Total
Yes                 829     114     168        1,111
No                  142     25      36         203
Total               971     139     204        1,314

b. White: 829/971 = 0.854, Black: 114/139 = 0.820, Hispanic: 168/204 = 0.824
8.1.27
a. The Mean Group Diff statistic is 0.067.
b. The p-value is about 0.26 and the standard deviation is about 0.026.
c. The Mean Group Diff statistic is 0.067.
d. The p-value is about 0 and the standard deviation is about 0.008.
e. The Mean Group Diff statistics are the same because each survey had the same three proportions, 0.3, 0.25, and 0.2.
f. Survey B has a much larger sample size; hence we have more information or more evidence against the null hypothesis. Intuitively, this should result in a smaller p-value.
g. When we increase sample size, we reduce variability in the null distribution. This can be seen in the smaller standard deviation in the null distribution for survey B. This means our Mean Group Diff statistic of 0.067 is farther (or more standard deviations) out in the tail of the null distribution for survey B when compared to that for survey A. This results in the smaller p-value for survey B.
8.1.28
a. Null: The population proportion of cars that stop when leading is the same as when following. Alternative: The population proportions are different. The Max-Min statistic is 0.905 − 0.776 = 0.129. The p-value from simulation is approximately 0.089. We have moderate evidence that the probability that cars come to a complete stop when leading is different than when following.
b. The null distributions are both right skewed and always positive, but the null distribution of the Mean Group Diff statistic is less variable (SD = 0.026) than the null distribution of the Max-Min statistic (SD = 0.039).
c. The strength of evidence is similar between the two approaches.
8.1.29
a. Null: The probability of donating with the opt-in survey is the same as with the opt-out survey. Alternative: At least one of the probabilities differs. The Max-Min statistic is 0.820 − 0.418 = 0.402. The p-value is < 0.001. We have strong evidence that there is a difference in the probability of organ donation depending on the default option.
b. The null distributions are both right skewed and always positive, but the null distribution of the Mean Group Diff statistic is less variable (SD = 0.038) than the null distribution of the Max-Min statistic (SD = 0.057).
c. The strength of evidence is similar between the two approaches.
8.1.30
a. H0: The probability of answering yes is the same across the interview modes. Ha: The probability of answering yes about exercising is not all the same across the interview modes.
b. Human text: 34/158 = 0.215, auto text: 46/157 = 0.293, human voice: 21/160 = 0.131, auto voice: 20/159 = 0.126
c. Mean Group Diff = 0.098
d. p-value ≈ 0.0001
e. We have very strong evidence that the probability of answering yes to the exercising question is not the same across the interview modes. Because random assignment was used, we can conclude that the survey delivery method caused the difference in responses. There is no indication as to who the subjects were, so generalization is difficult.
8.1.31
a. See Solution 8.1.31 table.

Round?   Human text   Automated text   Human voice   Automated voice   Total
Yes      27           19               39            29                114
No       131          138              121           130               520
Total    158          157              160           159               634
Solution 8.1.31

b. H0: The probability of rounding an answer is the same across the interview modes. Ha: The probability of rounding an answer is not the same across the interview modes.
c. Mean Group Diff = 0.063
d. p-value ≈ 0.04
e. We have strong evidence that the probability of rounding an answer is not the same across the interview modes. Because random assignment was used, we can conclude that the survey delivery method caused the difference in responses. There is no indication as to who the subjects were, so generalization is difficult.
8.1.32
a. Inflating the overall probability of a Type I error (at least one of the tests falsely finding a statistically significant difference)
b. The adjusted Type I error rate would need to be smaller than 5%.
8.1.33
a. There are 4 generation categories, which corresponds to 6 different pairwise tests.
b. The probability of making at least one Type I error is one minus the probability of making no Type I errors: $1 - 0.94^6 = 0.31$, or about 31%. This is called the familywise error rate.
c. We want to increase the 0.94 value until we reduce the 0.31 back down to 0.05. Trial and error gives us a value of 0.992, indicating that we need to make the individual error rate equal to 1 − 0.992 = 0.008, or 0.8%.
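The arithmetic in 8.1.33 can be checked with a short sketch. It assumes the pairwise tests are independent, which is what the calculation in part b implies, and it takes the per-test error rate to be 0.06 because the solution uses 0.94 as the probability of no error on a single test. The function names are hypothetical, and the closed-form solution replaces the trial-and-error search described in part c.

```python
from math import comb


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def per_test_alpha(target_rate, n_tests):
    """Per-test error rate that yields the target familywise rate."""
    return 1 - (1 - target_rate) ** (1 / n_tests)


n_tests = comb(4, 2)  # 4 generation categories give 6 pairwise comparisons
print(n_tests)
print(round(familywise_error_rate(0.06, n_tests), 2))   # part b
print(round(familywise_error_rate(0.008, n_tests), 3))  # part c, using 0.992
print(per_test_alpha(0.05, n_tests))                    # exact solution
```

The exact per-test rate comes out slightly above 0.008, so the rounded value found by trial and error keeps the familywise rate a little below 5%.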
Section 8.2
8.2.1 D.
8.2.2 D.
8.2.3 B.
8.2.4 A.
8.2.5 C.
8.2.6
a. SIGNIFICANTLY LESS
b. SIGNIFICANTLY LESS
c. NOT SIGNIFICANTLY DIFFERENT
8.2.7
a. The statistic can never be negative. The statistic is a sum of terms of the form $n_i(\hat{p}_i - \hat{p})^2$, all divided by $\hat{p}(1 - \hat{p})$. All of the individual terms are nonnegative, and the overall divisor is positive.
b. The chi-square can be zero only if all the individual conditional proportions $\hat{p}_i$ are equal to each other (and thus equal to $\hat{p}$).
c. ii.
d. The long right tail comes from the fact that there is a "wall" at zero. The statistic can take on any positive value but can never be negative. [More detail: In the formula for chi-square, the values of the $n_i$ and $\hat{p}$ are fixed by the marginal totals; only the values of the squared terms $(\hat{p}_i - \hat{p})^2$ vary as you rerandomize. For some randomizations, just by chance, one of these terms will be unusually large.]
8.2.8
a. Experiment, because subjects were randomly assigned to one of the three groups
b. Each adolescent
c. Explanatory: prevention program (categorical), response: condom use (categorical)
d. The long-run probability of adolescents reporting consistent condom use in the abstinence group (πabstinence), the long-run probability of adolescents reporting consistent condom use in the safer-sex group (πsafer-sex), and the long-run probability of adolescents reporting consistent condom use in the control group (πcontrol)
e. Null: There is no association between prevention group and condom use (all three long-run probabilities are the same). Alternative: There is an association (at least one long-run probability is different).
f. See Solution 8.2.8f table.

Consistent condom use?   Abstinence intervention   Safer-sex intervention   Control   Total
Yes                      17.5                      16.4                     21.1      55
No                       16.5                      15.6                     19.9      52
Total                    34                        32                       41        107
Solution 8.2.8f

g. Get 107 playing cards, 55 black and 52 red. Shuffle and deal the cards into three piles of 34, 32, and 41, respectively. Create a two-way table of the shuffled cell counts and use the shuffled counts to compute the chi-square statistic. Repeat this many times. Determine the proportion of times that the chi-square statistics from simulated tables are at least as large as the observed chi-square statistic (original table); this is the p-value.
8.2.9
a. Null: There is no association between HIV intervention program and condom use. Alternative: There is an association between HIV intervention program and condom use.
b. The cell counts are above 10 in each cell of the table.
c. Chi-square statistic = 3.0, p-value = 0.223
d. Based on our p-value of 0.223, we have little to no evidence of an association between HIV intervention programs and condom use.
e. A cause-and-effect conclusion would be possible here because random assignment was used, but there isn't evidence of an effect (p-value is not small). Generalization should be done with caution because random sampling was not used.
f. Because there is not evidence of an association, (at least one of the) confidence intervals on the differences will include zero; confidence intervals are useful in establishing the size and location of an effect, but we don't have evidence of one in this case.
8.2.10
\[
\chi^2 = \sum_{i=1}^{k} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\]
where $k$ is the total number of cells in the two-way table.
\[
\chi^2 = \frac{(14 - 17.5)^2}{17.5} + \frac{(20 - 16.5)^2}{16.5} + \frac{(20 - 16.4)^2}{16.4} + \frac{(12 - 15.6)^2}{15.6} + \frac{(21 - 21.1)^2}{21.1} + \frac{(20 - 19.9)^2}{19.9}
\]
\[
= 0.69 + 0.73 + 0.77 + 0.81 + 0.00 + 0.00 = 3.0
\]
8.2.11
a. Increase
b. Increase
c. Decrease
d. Stronger, because a larger sample size with the same statistics will yield stronger evidence
8.2.12
a. Null: There is no association between generation and marriage views. Alternative: There is an association.
b. All cell counts are larger than 10.
c. Chi-square = 22.26, p-value = 0.0001
d. We have very strong evidence of an association between generation and marriage views.
e. A cause-and-effect conclusion is not possible here because this is not a randomized experiment; generalizing these findings to all adult Americans is possible because a random sample was used.
f. Mill − Gen X (−0.0444, 0.0662), Mill − Boom (0.0401, 0.1407)*, Mill − 65 (0.0422, 0.1938)*, Gen X − Boom (0.0341, 0.1248)*, Gen X − 65 (0.0345, 0.1797)*, Boom − 65 (−0.0412, 0.0965). The confidence intervals show that younger generations tend to report more often that marriage is obsolete. Millennials and Gen X are both significantly more likely than Boomers and 65+ to report marriage being obsolete.
8.2.13
a. Observational study, subjects were not assigned to groups
b. Each person in the study; there are 893
c. Explanatory: severity of sleep apnea (categorical), response: severity of hypertension (categorical)
d. The proportion of individuals in the population who have severe hypertension in each of the sleep apnea severity populations (e.g., πnot)
e. Null: πnot = πsomewhat = πsevere = πvery. Alternative: At least one of the π is different.
f. See table below.

Severity of sleep apnea
Severe hypertension?   Not     Somewhat   Severe   Very   Total
Yes                    58.0    157.3      40.9     20.8   277
No                     129.0   349.7      91.1     46.2   616
Total                  187     507        132      67     893

g. Get 893 cards: 277 red, 616 black. Shuffle the cards and deal them into four stacks of 187, 507, 132, and 67, respectively. Calculate the chi-square statistic for each shuffle. Repeat many times. The p-value is the proportion of shuffles that generate a chi-square statistic at least as large as the observed chi-square statistic (from the observed data).
8.2.14
a. Null: There is no association between severity of sleep apnea and severe hypertension. Alternative: There is an association between severity of sleep apnea and severe hypertension.
b. The cell counts are larger than 10 for each cell.
c. Chi-square = 62.03, p-value < 0.0001
d. The likelihood of getting a chi-square statistic of 62.03 or larger if the null hypothesis was true is < 0.0001.
e. We have strong evidence of a genuine association between severity of sleep apnea and severe hypertension. This is not a cause-and-effect relationship because random assignment was not used. Furthermore, this was likely not a random sample, so generalizing to a larger population should be done with caution.
f. Here is the set of intervals from the Multiple Proportions applet. They reveal significant differences between all the pairs except for severe vs. very, with the "not" population (no sleep apnea) having a lower probability of severe hypertension than those with sleep apnea and the "somewhat" population having a lower probability of severe hypertension than the higher levels of sleep apnea.
8.2.15
\[
\chi^2 = \sum_{i=1}^{k} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\]
where $k$ is the total number of cells in the two-way table.
\[
\chi^2 = \frac{(32 - 58.0)^2}{58.0} + \frac{(142 - 157.3)^2}{157.3} + \frac{(63 - 40.9)^2}{40.9} + \frac{(40 - 20.8)^2}{20.8} + \frac{(155 - 129)^2}{129} + \frac{(365 - 349.7)^2}{349.7} + \frac{(69 - 91.1)^2}{91.1} + \frac{(27 - 46.2)^2}{46.2}
\]
\[
= 11.7 + 1.5 + 11.9 + 17.8 + 5.2 + 0.7 + 5.3 + 8.0 = 62.03
\]
8.2.16
a. Increase
b. Decrease
c. Increase
d. Stronger, because the sample size is increasing but the proportions are not changing
8.2.17
a. Each person in the study
b. The population proportion of men with heart disease in each baldness group (π)
c. Null: There is no association between baldness and heart disease. Alternative: There is an association between baldness and heart disease.
d. Null: πnone = πlittle = πsome = πmuch. Alternative: At least one population proportion is different from the others.
e. Chi-square = 14.51, p-value = 0.0023. We have strong evidence of an association between heart disease and baldness, but this is not evidence of cause and effect because it is an observational study. Furthermore, the results should be generalized with caution because the sample was not a random sample.
8.2.18
a. None: 0.431, little: 0.427, some: 0.513, much: 0.598
b. Two largest: much vs. none and much vs. little; smallest: none vs. little
c. Much vs. none: p-value = 0.0036, (−0.277, −0.056)*; much vs. little: p-value = 0.004, (−0.285, −0.056)*; none vs. little: p-value = 0.9067, (−0.06, 0.07)
d. Yes, there are at least 10 with heart disease and 10 without heart disease in each group.
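The hand calculations in 8.2.10 and 8.2.15 can be reproduced from the observed counts alone. The sketch below computes expected counts from the row and column totals and then sums the (observed − expected)²/expected terms. The function names are hypothetical, and the observed tables are read off the numerators in those two solutions. The p-value line relies on the fact that, for a chi-square distribution with 2 degrees of freedom, the upper-tail probability is exp(−x/2); it is a theory-based value for the 2 × 3 table only, not the simulation-based approach.

```python
import math


def expected_counts(observed):
    """Expected cell counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over every cell of the two-way table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows are Yes / No; columns follow the tables in 8.2.8f and 8.2.13f
condom = [[14, 20, 21], [20, 12, 20]]
apnea = [[32, 142, 63, 40], [155, 365, 69, 27]]

stat = chi_square(condom)
p_value_2df = math.exp(-stat / 2)  # valid only for 2 degrees of freedom

print([round(e, 1) for e in expected_counts(condom)[0]])
print(round(stat, 1), round(p_value_2df, 3))
print(round(chi_square(apnea), 2))
```

Working from unrounded expected counts explains why the printed totals can differ slightly from a sum of individually rounded terms.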
8.2.19
a. Observational study because subjects were not assigned to a group
b. Each of the 3,454 people in the study
c. Explanatory: activity level, response: healthy aging. Both variables are categorical.
d. Null: There is no association between activity level and healthy aging. Alternative: There is an association.
e. Get 3,454 cards: 665 black and 2,789 red. Shuffle and deal into three piles of 653, 1,692, and 1,109, respectively. Compute the chi-square statistic. Repeat many times. The proportion of times that the simulated chi-square statistics are at least as large as the observed statistic is the p-value.
8.2.20
a. All cell counts are at least 10.
b. Chi-square = 66.05, p-value < 0.0001
c. The probability of getting a chi-square statistic of 66.05 or larger if the null hypothesis is true is less than 0.0001.
d. We have very strong evidence of an association between activity level and healthy aging. This result does not allow a cause-and-effect conclusion to be drawn because it is not a randomized experiment, nor can this result be generalized because a random sample was not used.
e. Inactive vs. mod (−0.148, −0.091)*, inactive vs. vigorous (−0.19, −0.12)*, mod vs. vigorous (−0.07, −0.004)*. Each level of additional activity is associated with significantly more healthy aging.
8.2.21
\[
\chi^2 = \sum_{i=1}^{k} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\]
where $k$ is the total number of cells in the two-way table.
\[
\chi^2 = \frac{(55 - 125.7)^2}{125.7} + \frac{(345 - 325.8)^2}{325.8} + \frac{(265 - 213.5)^2}{213.5} + \frac{(598 - 527.3)^2}{527.3} + \frac{(1347 - 1366.2)^2}{1366.2} + \frac{(844 - 895.5)^2}{895.5}
\]
\[
= 39.8 + 1.1 + 12.4 + 9.5 + 0.3 + 3.0 = 66.05
\]
8.2.22
a. Increase. Smaller sample sizes mean the differences are more likely to occur by random chance alone.
b. Smaller; the same differences will seem less extreme with the smaller sample sizes.
c. Weaker, because a smaller sample size (with the same differences) means less evidence against the null hypothesis (larger p-value).
8.2.23
a. Observational study, individuals were not assigned values of either variable
b. Each participant in the survey (1,883)
c. Explanatory: political leaning, response: favor tax? Both are categorical.
d. The population proportion that favor a tax on unhealthy food/soda within each political group (π)
e. Null: πRep = πDem = πIndep. Alternative: At least one of the population proportions is different.
f. Null: There is no association between political leaning and whether or not someone favors a tax on food/soda. Alternative: There is an association.
g. There are at least 10 individuals in each cell of the table.
h. Chi-square = 56.41, p-value < 0.001
i. The likelihood of getting a chi-square statistic of 56.41 or larger if the null hypothesis is true is < 0.0001.
j. We have very strong evidence of an association between political party and opinion on the tax. This does not, however, suggest a cause-and-effect relationship between political leaning and the tax (not randomly assigned to political leaning), but the result does generalize to all U.S. adults (random sampling).
k. Rep vs. Dem (−0.26, −0.16)*, Rep vs. Ind (−0.14, −0.04)*, Dem vs. Ind (0.07, 0.17)*. All groups are significantly different from each other, with Democrats most in favor, followed by Independents, and Republicans showing the lowest support for the tax.
8.2.24
a. Observational study, individuals were not assigned to outcomes for either variable
b. Each participant in the survey (6,272)
c. Explanatory: amount of fish, response: prostate cancer? Both are categorical.
d. The population proportion that get prostate cancer within each fish consumption group (π)
e. Null: πlg = πmod = πsmall = πnone. Alternative: At least one of the π is different.
f. Null: There is no association between fish consumption and prostate cancer. Alternative: There is an association.
g. There are at least 10 individuals in each cell of the table.
h. Chi-square = 3.68, p-value = 0.30
i. The likelihood of getting a chi-square statistic of 3.68 or larger if the null hypothesis is true is approximately 0.30.
j. With a p-value of 0.30, we do not have evidence of an association between fish consumption and prostate cancer. This could not lead to a cause-and-effect conclusion in any case because it is not a randomized experiment. Too little information is available about the sample to know to what population generalization would be possible.
k. Because there is not convincing evidence of an association, some of the confidence intervals on the differences will include zero; confidence intervals are useful in establishing the size and location of an effect, but we don't have evidence of one in this case.
8.2.25
a. Automated text was most likely to get a yes (0.293), and automated voice the least likely (0.126).
b. Yes, all the cell counts are at least 10.
c. H0: There is no association between the interview mode and answering yes to the exercise question. Ha: There is an association between the interview mode and answering yes to the exercise question.
d. p-value ≈ 0.0002
e. We have very strong evidence that there is an association between the interview mode and answering yes to the exercise question. The human text and the automated text were both significantly more likely to get a yes answer than both the human voice and automated voice.
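The card-shuffling procedure described in 8.2.19e can be sketched in code. This is a minimal illustration, not the applet's implementation: the function names, the 500 repetitions, and the seed are assumptions. The "black card" counts per pile (55, 345, 265) are read off the first three numerators in 8.2.21, and the statistic is computed in the form given in 8.2.7a, which applies when the response is binary.

```python
import random


def chi_square_binary(successes, sizes):
    """Chi-square for a binary response: sum of n_i (p_i - p)^2 over p (1 - p)."""
    p_hat = sum(successes) / sum(sizes)
    total = sum(n * (s / n - p_hat) ** 2 for s, n in zip(successes, sizes))
    return total / (p_hat * (1 - p_hat))


def card_shuffle_p_value(successes, sizes, reps=500, seed=1):
    """Shuffle the 'cards', deal them into piles, and compare to the observed statistic."""
    observed = chi_square_binary(successes, sizes)
    n_black = sum(successes)
    cards = [1] * n_black + [0] * (sum(sizes) - n_black)  # 1 = black, 0 = red
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        start, dealt = 0, []
        for n in sizes:
            dealt.append(sum(cards[start:start + n]))
            start += n
        # Count shuffles whose statistic is at least as large as the observed one
        if chi_square_binary(dealt, sizes) >= observed:
            hits += 1
    return observed, hits / reps


# 3,454 cards (665 black), dealt into piles of 653, 1,692, and 1,109
observed, p_value = card_shuffle_p_value([55, 345, 265], [653, 1692, 1109])
print(round(observed, 2), p_value)
```

With only a few hundred shuffles the estimated p-value can only be reported as smaller than roughly 1/reps; more repetitions give a finer estimate at the cost of run time.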
106
C HA PTE R 8
Comparing More Than Two Proportions Human Automated Human Automated text text voice voice Total Round?
Yes
27
19
39
29
114
No
131
138
121
130
520
158
157
160
159
634
Total Solution 8.2.26a
8.2.26
a. See table in Solution 8.2.26a.
b. Yes, because the count in each cell is at least 10.
c. H0: There is no association between the interview mode and rounding an answer to the movie question. Ha: There is an association between the interview mode and rounding the answer to the movie question.
d. p-value = 0.0419
e. We have strong evidence that there is an association between the interview mode and whether people would round their answer to the question of how many movies they have seen in the past year. The human voice is significantly more likely to get a rounded answer than is the automated text.
8.2.27
a. 2019 (56.0% compared to 46.0%)
b. You could not use the Mean Group Diff because the response variable is not binary.
c. H0: There is no association between the year and how often people use Facebook. Ha: There is an association between the year and how often people use Facebook.
d. p-value = 0.0000
e. We have very strong evidence that there is an association between the year and how often people use Facebook. In 2019 they were significantly more likely to use Facebook several times a day and significantly less likely to use Facebook only a few times a week than in 2013.
8.2.28
a. Observational study. Students were not randomly assigned outcomes for either variable.
b. Each of the 50 students
c. Explanatory: sex (male/female), response: eat breakfast frequency. Both are categorical.
d. Null: There is no association between a person’s sex and breakfast eating frequency. Alternative: There is an association.
e. Some of the cell counts are less than 10.
f. Get 50 cards: 26 blue, 11 red, 10 green, and 3 black. Shuffle and deal into two stacks of 37 and 13, respectively. Compute the chi-square statistic. Repeat many times. The p-value is the proportion of times the observed chi-square statistic or larger occurred in the simulations.
g.
h. We obtain a chi-square value of 1.4 or larger approximately 79% of random shuffles when the null hypothesis is true.
i. We have little to no evidence of an association between sex and whether or not someone eats breakfast. This result should be generalized with caution (not random sampling), and if statistical significance was established, a cause-and-effect conclusion would not be possible (not random assignment).
8.2.29
a. Is pulling an all-nighter before a test worth it?
b. Experiment. Students will be randomly assigned to stay up all night studying, cram the day before but get a good night’s sleep, or no intervention (control group) prior to taking a practice version of the GRE. Students will be invited to participate (volunteer sample).
c. Each student volunteer
d. Explanatory: up all night or good night sleep or control, response: got above 150, got below 150. Both are categorical.
e. Null: There is no association between pulling an all-nighter before a test and getting above 150 on the test. Alternative: There is an association.
f. When there are at least 10 people in each of the six cells of the table
8.2.30
a. See the table.
Relapse?   Desipramine HCl   Lithium carbonate   Placebo   Total
Yes              10                 18              20       48
No               14                  6               4       24
Total            24                 24              24       72
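The chi-square statistic and p-value reported for this relapse table in 8.2.30 can be checked directly from the counts. The following is a minimal sketch, not the textbook applet: the function names `chi_square` and `shuffle_p_value`, the number of shuffles, and the seed are assumptions chosen for illustration. The theory-based p-value uses the fact that, with 2 degrees of freedom, the chi-square upper-tail probability is exp(−x/2); the simulated p-value mimics the card-shuffling description (48 "relapse" cards and 24 "no relapse" cards dealt into three stacks of 24).

```python
import math
import random

def chi_square(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: relapse yes / no; columns: desipramine, lithium, placebo.
observed = [[10, 18, 20], [14, 6, 4]]
stat = chi_square(observed)

# With (2 - 1) * (3 - 1) = 2 degrees of freedom, the chi-square
# upper-tail probability has the closed form exp(-x / 2).
theory_p = math.exp(-stat / 2)

def shuffle_p_value(observed, reps=2000, seed=1):
    """Simulated p-value: reshuffle the outcome 'cards' across the groups."""
    rng = random.Random(seed)
    group_sizes = [sum(col) for col in zip(*observed)]
    cards = [1] * sum(observed[0]) + [0] * sum(observed[1])  # 1 = relapse
    target = chi_square(observed)
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        yes_counts, start = [], 0
        for size in group_sizes:
            yes_counts.append(sum(cards[start:start + size]))
            start += size
        table = [yes_counts, [s - y for s, y in zip(group_sizes, yes_counts)]]
        # Small tolerance guards against floating-point ties with the target.
        if chi_square(table) >= target - 1e-9:
            hits += 1
    return hits / reps

sim_p = shuffle_p_value(observed)
print(round(stat, 2), round(theory_p, 4), sim_p)
```

The first two printed values reproduce the chi-square statistic of 10.50 and the theory-based p-value of 0.0052; the simulated p-value varies with the seed and the number of shuffles but should also be small.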
b. Experiment. Subjects were assigned to the three drug groups.
c. Each of the 72 individuals in the study
d. Explanatory: type of medicine, response: did patient relapse again? Both variables are categorical.
e. Null: There is no association between type of medicine and relapse. Alternative: There is an association.
f. Two of the cell counts are less than 10.
g. Get 72 playing cards: 48 black, 24 red. Shuffle and deal into three stacks of 24 each. Compute the chi-square statistic and repeat.
h. Chi-square = 10.50, p-value = 0.0052
i. The chance of getting a chi-square statistic of 10.50 or larger if the null hypothesis is true is 0.0052.
j. We have strong evidence of an association between treatment group and chance of relapse. This is a cause-and-effect conclusion because random assignment was used. The results cannot be generalized easily, as little is known about the participants.
8.2.31
a. Null: There is no association between happiness and income. Alternative: There is an association between happiness and income.
b. All cell counts are larger than 10.
c. Chi-square = 73.35, p-value < 0.0001
d. The probability of obtaining a chi-square statistic of 73.35 or larger by chance if the null hypothesis is true is < 0.0001.
e. We have strong evidence (p-value < 0.0001) of an association between happiness level and income category among all U.S. adults.
Cause and effect would not be possible here because this is an observational study. Generalizing is possible because a random sample of U.S. adults was used.
8.2.32
a. Experiment. People are randomly assigned to a group.
b. Null: There is no association between treatment group and race. Alternative: There is an association.
c. Some of the cell counts are less than 10.
d. Chi-square = 3.97, p-value = 0.46
e. Shows that random assignment “worked” in the sense that it kept the two treatment groups roughly the same in terms of race distribution (this is what the p-value tells you).
8.2.33
a. Null: There is no association between dietary restraint and ability to estimate calorie expenditure. Alternative: There is an association.
b. One of the cell counts is less than 10.
c. Chi-square = 6.86, p-value = 0.034
d. We will obtain chi-square statistics of 6.86 or larger by chance in about 3.4% of simulated random shuffles when the null hypothesis is true.
e. We have strong evidence (p-value = 0.034 < 0.05) of an association between dietary restraint and ability to estimate calorie expenditure. We cannot draw a cause-and-effect conclusion because this was an observational study and we don’t know how the subjects were recruited so it is risky to generalize this conclusion to a larger population.
8.2.34
a. Null: There is no association between dietary restraint and ability to estimate calorie expenditure. Alternative: There is an association.
b. Two of the cell counts are less than 10.
c. Chi-square = 2.41, p-value = 0.30
d. We will obtain chi-square statistics of 2.41 or larger by chance in about 30% of simulated random shuffles when the null hypothesis is true. With this p-value, we do not have strong evidence of an association between dietary restraint and ability to estimate energy intake. However it would be risky to generalize these results to a larger population until we know more about how the sample was selected. Even if the result had been statistically significant we would not draw a cause-and-effect conclusion from this observational study.
8.2.35
8.2.36
8.2.37
ier to compare the distributions across the three years. The segments for Seminole and Pierce are much smaller than the segments for the other three lakes: a smaller proportion of the alligators came from Pierce and Seminole in all three years. Whereas the counts were more similar across the lakes in 2000, we see the decrease in Seminole and Pierce with increases in Blue and Hatch, the next two years.
8.2.38
[Figure: segmented bar chart; x-axis: Year (2000, 2005, 2010); y-axis: Percentage (0 to 100); labeled segments: Pierce, Seminole.]
8.2.39
8.2.40 The corresponding p-value is nowhere near significant (p-value > 0.8). The conclusion is that for the three larger lakes there is no association between lake and year. Put differently, all three of the larger lakes show the same time trend in terms of numbers of alligators harvested.
8.2.41
8.2.42
8.2.43
8.2.44
8.2.45
8.2.46
Section 8.3
8.3.1 D.
8.3.2 B.
8.3.3 B.
8.3.4 B.
8.3.5 C.
8.3.6
8.3.7 a. 2.667 b. 2.741
8.3.8 a. 1 b. 3.143
8.3.9
a. Both are the same, 1:1.5:1 (or 2:3:2).
b. Just based on the ratios, you might think the strength of evidence against the null hypothesis is the same. But, when you look at the numbers, a difference between 100 and 150 is much greater than the difference between 10 and 15, so you might think that dataset B would provide stronger evidence. We also know that larger sample sizes provide stronger evidence against the null hypothesis, all else the same.
c. The chi-square statistic for A is 1.429 and for B is 14.286.
d. There is much stronger evidence against the null that π1 = π2 = π3 = 1/3 (that is, a 1:1:1 ratio) with dataset B, as its chi-square statistic is much larger (with the same number of categories). This makes sense because we increased the sample size and, as we have seen, when we increase the sample size and everything else stays the same, we have stronger evidence against the null hypothesis.
8.3.10
a. Each is 25.
b. H0: Each season is equally likely to be a favorite (π1 = 1/4, …, π4 = 1/4). Ha: Each season is not equally likely to be a favorite (at least one π differs from 1/4).
c. χ² = 20.72 and the p-value = 0.0001
d. We have strong evidence that each season is not equally likely to be a favorite (or the distribution of favorite seasons in the population is not equally distributed).
8.3.11
a. Each is 20.
b. H0: Each value on the die is equally likely to occur (π1 = 1/6, …, π6 = 1/6). Ha: Each value on the die is not equally likely to occur (at least one π differs from 1/6).
c. χ² = 6.099 and the p-value = 0.2967
d. We don’t have strong evidence that each value on the die is not equally likely to occur (in other words we don’t have evidence that this is not a fair method of rolling a die).
8.3.12
a. Each is 50/5 = 10.
b. H0: Each value is equally likely to occur (π1 = 0.20, …, π5 = 0.20). Ha: Each value is not equally likely to occur (at least one π differs from 1/5).
c. χ² = 2 and the p-value = 0.7358
d. We don’t have strong evidence that each value is not equally likely to occur (in other words we don’t have evidence that the phone is not picking random numbers).
e. A large p-value is not strong evidence for the null, so we do not have strong evidence that it is truly random; we would just say we don’t have strong evidence against the process being random.
f. We could combine some categories. For example, we could test whether values 1, 2, and 3 occur significantly differently from 60% of the time and values 4 and 5 significantly differently from 40% of the time.
8.3.13
a. Blue: 286, brown: 84, other: 30
b. H0: Female UK models’ eye colors are distributed in the same way as all female UK residents (πblue = 0.715, πbrown = 0.21, πother = 0.075). Ha: Female UK models’ eye colors are not distributed in the same way as all female UK residents.
c. χ² = 182.0 and the p-value < 0.0001
d. We have very strong evidence that female UK models’ eye colors are not distributed in the same way as all female UK residents. They have fewer blue-eyed models than expected and more brown-eyed and other than expected.
8.3.14
a. Blue: 32, brown: 296, other: 72
b. H0: Female Brazilian models’ eye colors are distributed in the same way as all female Brazil residents (πblue = 0.08, πbrown = 0.74, πother = 0.18). Ha: Female Brazilian models’ eye colors are not distributed in the same way as all female residents in Brazil.
c. χ² = 110.0 and the p-value < 0.0001
d. We have very strong evidence that female Brazilian models’ eye colors are not distributed in the same way as all female residents in Brazil. They have fewer brown-eyed models than expected and more blue-eyed and other than expected.
8.3.15
a. See Solution 8.3.15a table.
b. H0: The leading digits in the textbook are distributed according to Benford’s law (π1 = 0.301, …, π9 = 0.046). Ha: The leading digits in the textbook are not distributed according to Benford’s law.
Leading digit        1        2        3      4        5        6       7       8       9
Expected frequency   39.732   23.232   16.5   12.804   10.428   8.844   7.656   6.732   6.072
Solution 8.3.15a
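The expected frequencies in the Solution 8.3.15a table are the sample size times the Benford probabilities. The sketch below is illustrative only: the sample size of 132 is not stated in the solution and is an assumption inferred from 39.732 / 0.301, and the probabilities are computed from the standard Benford formula log10(1 + 1/d), rounded to three decimals to match the two values quoted in part (b).

```python
import math

# Benford's law: P(leading digit = d) = log10(1 + 1/d), rounded to
# three decimals as in the hypotheses (0.301 for digit 1, 0.046 for digit 9).
benford = [round(math.log10(1 + 1 / d), 3) for d in range(1, 10)]

n = 132  # assumed sample size, inferred from 39.732 / 0.301

# Expected count for each leading digit under the null hypothesis.
expected = [round(n * p, 3) for p in benford]
for digit, count in enumerate(expected, start=1):
    print(digit, count)
```

Under that assumed sample size, the nine printed values agree with the expected-frequency row of the table.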
c. χ² = 11.849 and the p-value = 0.1581
d. We don’t have strong evidence that the leading digits in the textbook are not distributed according to Benford’s law.
8.3.16
a. H0: The plants producing pea colors and shapes as described in the table are distributed in a 9:3:3:1 ratio. Ha: The plants producing pea colors and shapes as described in the table are not distributed in a 9:3:3:1 ratio.
b. χ² = 0.47, p-value = 0.9254
c. We do not have strong evidence that the plants producing pea colors and shapes as described in the table are not distributed in a 9:3:3:1 ratio.
8.3.17
a. H0: The plants producing pea colors and shapes as described in the table are distributed in a 1:2:2:4 ratio. Ha: The plants producing pea colors and shapes as described in the table are not distributed in a 1:2:2:4 ratio.
b. χ² = 1.517, p-value = 0.6784
c. We do not have strong evidence that the plants producing pea colors and shapes as described in the table are not distributed in a 1:2:2:4 ratio.
8.3.18
a. H0: The purple-flowered plants and the white-flowered plants are distributed in a 3:1 ratio. Ha: The purple-flowered plants and the white-flowered plants are not distributed in a 3:1 ratio.
b. χ² = 0.391, p-value = 0.5319
c. We do not have strong evidence that the purple-flowered plants and the white-flowered plants are not distributed in a 3:1 ratio.
8.3.19
a. H0: For dialysis patients receiving treatment on Monday, Wednesday and Friday, the probability of cardiac arrest is equal for Monday, Wednesday, and Friday (1/3 on each day). Ha: For dialysis patients receiving treatment on Monday, Wednesday and Friday, the probability of cardiac arrest is not 1/3 for each day.
b. χ² = 13.473, p-value = 0.0012
c. We have strong evidence that for dialysis patients receiving treatment on Monday, Wednesday and Friday, the probability of cardiac arrest is not 1/3 for each day. In the sample, we saw almost twice as many deaths on Monday compared to Wednesday and Friday.
8.3.20
a. H0: For dialysis patients receiving treatment on Tuesday, Thursday, and Saturday, the probability of cardiac arrest is equal to 1/3 for each day. Ha: For dialysis patients receiving treatment on Tuesday, Thursday, and Saturday, the probability of cardiac arrest is not 1/3 on each day.
b. χ² = 1.021, p-value = 0.6002
c. We do not have strong evidence that for dialysis patients receiving treatment on Tuesday, Thursday, and Saturday, the probability of cardiac arrest is not distributed evenly on those days.
8.3.21
a. H0: Heart attacks are distributed evenly across the days of the week for employed people. Ha: Heart attacks are not distributed evenly across the days of the week for employed people.
b. χ² = 18.254, p-value = 0.0056
c. We have strong evidence that heart attacks are not distributed evenly across the days of the week for employed people.
d. Looking at the observed counts, we see a higher number of heart attacks on Monday compared to the other days of the week, with the lowest on Sunday. It makes sense that Monday will be different for those who have weekends/days of rest on Saturday and Sunday compared to the rest of the week.
8.3.22
a. H0: Heart attacks are distributed evenly across the days of the week for people who are not employed. Ha: Heart attacks are not distributed evenly across the days of the week for people who are not employed.
b. χ² = 2.227, p-value = 0.8977
c. We do not have strong evidence that heart attacks are not distributed evenly across the days of the week for people that are not employed.
d. We no longer see evidence of a “Monday effect” for individuals who do not have the work day/weekend day distinction.
8.3.23
a. H0: πM = πT = πW = πR = πF = 1/5. Ha: Not all the values of πi are the same (at least one π differs from 1/5).
b. 4
c. Any chi-square statistic from 9.41 to 9.60 gives the same p-value of about 0.0479 which appears to be as close to 0.05 as we can get. Therefore a rejection region is any chi-square at or above 9.41 and the probability of a Type I error is about 0.0479.
d. They appear to be coming from the distribution in the alternative.
Solution 8.3.23e
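The Type I error rate in part (c) of 8.3.23 and the power values discussed in the later parts can be approximated by simulation. What follows is a rough sketch under stated assumptions rather than the applet used in the solutions: the function names, the 4,000 repetitions, and the seed are hypothetical choices, while the sample size of 50, the 9.41 cutoff, and the two alternative distributions (0.26 or 0.32 on Monday and Friday, 0.16 or 0.12 on the other days) are the ones quoted in the solution.

```python
import random

DAYS = 5
N = 50          # sample size quoted in the solution
CUTOFF = 9.41   # rejection region from part (c): chi-square >= 9.41

def chi_square_equal(counts):
    """Goodness-of-fit chi-square statistic against equal proportions."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def rejection_rate(probs, reps=4000, seed=0):
    """Share of simulated samples whose statistic falls in the rejection region."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        draws = rng.choices(range(DAYS), weights=probs, k=N)
        counts = [draws.count(day) for day in range(DAYS)]
        if chi_square_equal(counts) >= CUTOFF:
            rejections += 1
    return rejections / reps

null = [0.20] * DAYS
alt_small = [0.26, 0.16, 0.16, 0.16, 0.26]   # Monday through Friday
alt_large = [0.32, 0.12, 0.12, 0.12, 0.32]   # Monday through Friday

type_one = rejection_rate(null)        # estimates the Type I error rate
power_small = rejection_rate(alt_small)
power_large = rejection_rate(alt_large)
print(type_one, power_small, power_large)
```

The estimates depend on the seed and the number of repetitions, so they should land near, not exactly on, the values of about 0.0479, 0.2370, and 0.8010 reported in the solution.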
Solution 8.3.23g
e. There are 2,370 out of 10,000 simulated statistics in the rejection region.
f. If the null hypothesis is false and in fact the true proportion of calling in sick is 0.26 on Monday and Friday and 0.16 on the rest of the days, the probability a sample of size 50 would give significant results (a p-value less than 0.05) is about 0.2370.
g. The power is larger because the distribution in the alternative has proportions farther away from that under the null. The power of this test is now about 0.8010; that is, if the null hypothesis is false and in fact the true proportion of calling in sick is 0.32 on Monday and Friday and 0.12 on the rest of the days, the probability a sample of size 50 would give significant results (a p-value less than 0.05) is about 0.8010.
h. Using the MAD statistic our rejection region is a MAD statistic equal to or greater than 3.61 (this gave us a p-value of 0.0436). The power of this test is about 0.2215, a little bit smaller than if we would use the chi-square statistic.
i. Using the Max-Min statistic our rejection region is a Max-Min statistic equal to or greater than 0.25 (this gave us a p-value of 0.0338). The power of this test is about 0.1917, a little bit smaller than if we would use the chi-square statistic. Both the MAD and the Max-Min statistics give distributions that have gaps and are not as continuous as a chi-square distribution would be. Therefore, we could not get as close to 0.05 to determine a cutoff value. For example using the Max-Min our cutoff value of 0.25 only gave a p-value of 0.0338. With this smaller p-value (more difficult to reject the null) we would expect the power of the test to decrease because this also relies on rejecting the null.
End of Chapter 8 Exercises
8.CE.1 B.
8.CE.2
a. Sex and coffee consumption; both variables are categorical (sex = 2 categories, coffee consumption = 3 categories).
b. It could involve random sampling (give the survey to a random sample) but not random assignment unless you told people how much coffee they have to drink.
c. No, the data are not cross-tabulated (e.g., number of men, every day).
8.CE.3
a. The following answer gives the closest possible counts (with whole numbers) to having no association.
               Men   Women   Total
Every day       12     10      22
Sometimes       12     10      22
Almost never    14     12      26
Total           38     32      70
b. Answers will vary. Here we see no men in the “almost never” category (but lots of women) and no women in the “every day” category (and lots of men).
               Men   Women   Total
Every day       22      0      22
Sometimes       16      6      22
Almost never     0     26      26
Total           38     32      70
[Figure, Solution 8.CE.3b: segmented bar chart of Percentage by Coffee consumption (Every day, Sometimes, Almost never) and sex (Men, Women).]
8.CE.4
a. [Figure: segmented bar chart of Percentage by Coffee consumption (Every day, Sometimes, Almost never) and sex (Men, Women).]
b. [Figure, Solution 8.CE.4b: segmented bar chart of Percentage by Coffee consumption (Every day, Sometimes, Almost never) and sex (Men, Women).]
c. Because 0.02 < 0.05, we have strong evidence of an association between sex and coffee.
d. Although we have strong evidence of an association (e.g., with women drinking coffee more often than men), this result cannot generalize because a random sample wasn’t taken, nor does this suggest a cause-and-effect relationship because random assignment was not used.
8.CE.5
8.CE.6
a. None
b. Smaller
c. Stronger
8.CE.7
a. Liberal: 0.115, moderate: 0.074, conservative: 0.115
b. Mean Group Diff = 0.027
c. Null: There is no association between political viewpoint and giving blood.
d. See the graph for Solution 8.CE.7d.
[Figure: Solution 8.CE.7d]
e. The p-value is 0.149. The probability of obtaining a Mean Group Diff statistic of 0.027 or larger if the null hypothesis is true is 0.149.
f. No, we do not have enough evidence to reject the null hypothesis at the 0.05 significance level. We do not have evidence to conclude that there is an association between political viewpoint and giving blood.
8.CE.8 Chi-square test is valid because all cell counts are at least 10. The chi-square statistic is 5.18, yielding a p-value of 0.0751, giving us moderate evidence of an association between political viewpoint and giving blood. Note that the chi-square statistic yields a p-value a fair bit lower than the p-value from simulation using the Mean Group Diff statistic in the previous question.
8.CE.9
a. The appropriate analysis here is a one-proportion test. Null: The percentage of all Americans that gave blood last year is 10%. Alternative: The percentage of all Americans that gave blood last year is more than 10%. The sample size is 1218 and 122 gave blood last year (p̂ = 122/1218 = 0.100). Because the sample proportion is 0.10, little further work is needed to see that the sample will not provide evidence that the population proportion is larger than 0.10. The p-value from simulation or theory is approximately 0.50 (theory is z = 0.02, p-value = 0.4907). Note that the theory-based p-value is valid because, in the sample, more than 10 people gave blood and more than 10 people didn’t. We do not have evidence that the proportion of all Americans that gave blood last year was more than 0.10.
b. A 95% confidence interval (theory-based) is (0.08, 0.12), meaning that we are 95% confident that the percentage of all Americans that gave blood last year was between 8% and 12%.
8.CE.10
a. Explanatory: educational degree received, response: generally trust people
b. Less than HS: 0.138, HS: 0.272, JC: 0.308, Bach: 0.494, Grad: 0.634. See the graph for Solution 8.CE.10b.
[Figure, Solution 8.CE.10b: segmented bar chart of sample data; x-axis: Highest degree (Less, High, JC, Bach., Grad.); y-axis: Percentage; segments: T, N.]
c. Null: There is no association between educational degree and trusting others. Alternative: There is an association between educational degree and trusting others.
d. Mean Group Diff = 0.243
e. The approximate p-value is 0. See the graph for Solution 8.CE.10e.
[Figure: Solution 8.CE.10e]
f. The probability of obtaining a Mean Group Diff statistic of 0.243 or larger if the null hypothesis is true is approximately 0.
g. Yes h. We have strong evidence of an association between educational degree attained and trust. 8.CE.11
(−0.136 to 0.065).
8.CE.12 Chi-square = 156.67, p-value < 0.0001. We have strong evidence of an association between educational level attained and trust level, with trust increasing with educational level. There is little difference between this analysis and the one with the two-category response variable in the previous exercises. (We proceeded with the chi-square test here even though one cell had a count less than 10. This seems valid since the count is very close to 10 and since the other cell counts were quite large.)
8.CE.13
[Figure: segmented bar chart of sample data; x-axis: Where live? (On, Off, OffP); y-axis: Percentage; segments: Ab, Lt, Hvy.]
college, these results generalize to the student body at this college but will not necessarily represent the patterns at other colleges and universities. Because this is an observational study cause-and-effect conclusions are not possible.
8.CE.14
          Boy    Girl   Total
Grades    117    130    247
Popular    50     91    141
Sports     60     30     90
Total     227    251    478

          Boy     Girl    Total
Grades    0.515   0.518   0.517
Popular   0.220   0.363   0.295
Sports    0.264   0.120   0.188
Total     1.000   1.000   1.000
a. 117/227 = 0.515 of boys think getting good grades is most important.
b. 130/251 = 0.518 of girls think getting good grades is most important.
c. 50/227 = 0.220 of boys think being popular is most important.
d. 91/251 = 0.363 of girls think being popular is most important.
e. 60/227 = 0.264 of boys think excelling in sports is most important.
f. 30/251 = 0.120 of girls think excelling in sports is most important.
g. Overall, 247/478 = 0.517 of boys and girls think getting good grades is most important.
h. Overall, 141/478 = 0.295 of boys and girls think being popular is most important. i. Overall, 90/478 = 0.188 of boys and girls think excelling in sports is most important. j. On balance, it looks like boys and girls do have differing goals. Boys and girls seem to be the same in terms of the proportion who think getting good grades is most important, but a higher proportion of girls think being popular is most important and a higher proportion of boys think excelling in sports is more important. 8.CE.15
A 95% confidence interval for the difference in the population proportion of boys and girls that think getting good grades is most important is −0.092 to 0.087; thus, because zero is in this interval, we find there is not strong evidence that the population proportion choosing good grades is different for boys and girls. A 95% confidence interval for the difference in the population proportion of boys and girls that think being popular is most important is −0.223 to −0.062; thus we find there is strong evidence that the population proportion that think being popular is most important is lower among boys than girls. A 95% confidence interval for the difference in the population proportion of boys and girls that think sports is most important is 0.075 to 0.215; thus we find there is strong evidence that the population proportion of boys that think sports is most important is higher than girls.
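The three intervals in 8.CE.15 can be reproduced from the counts in the 8.CE.14 table. This is a minimal sketch assuming the usual theory-based (Wald) interval for a difference in two proportions with a multiplier of 1.96; the function name `diff_ci` is hypothetical, and the solution does not state which software produced its intervals.

```python
import math

def diff_ci(x1, n1, x2, n2, z=1.96):
    """Theory-based (Wald) interval for p1 - p2 with multiplier z."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

boys, girls = 227, 251
# (boys choosing the goal, girls choosing the goal), from the 8.CE.14 table
counts = {"Grades": (117, 130), "Popular": (50, 91), "Sports": (60, 30)}

intervals = {}
for goal, (b, g) in counts.items():
    lower, upper = diff_ci(b, boys, g, girls)
    intervals[goal] = (round(lower, 3), round(upper, 3))
    print(goal, intervals[goal])
```

Taking boys minus girls, the popularity interval lies entirely below zero and the sports interval entirely above zero, which is what supports the "lower among boys" and "higher among boys" conclusions; the grades interval straddles zero.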
8.CE.16
a. The conditional proportions of a collision are 0.391 for a male experimenter and 0.090 for a female experimenter. Here are the segmented bar charts:
[Figure: segmented bar chart of sample data; x-axis: Experimenter (Male, Female); y-axis: Percentage; segments: Yes, No.]
b. A simulation-based test can use either the Mean Group Diff or chi-square (or the difference in proportions) as a test statistic. (For 2 × 2 tables, the results will be exactly the same.) The value of the Mean Group Diff is 0.301, and the value of chi-square is 22.34. The simulated p-value will be essentially zero for both. The null hypothesis is that the conditional probability of a collision is the same whether the experimenter is male or female. The alternative is that the two conditional probabilities are different. There is strong evidence against the null hypothesis and suggesting that the sex of the experimenter makes a difference. We are not told exactly how the conditions (male or female experimenter) were assigned, but we may cautiously draw a cause-and-effect conclusion here. We are also not told how the subjects were chosen, so there is no basis for generalizing to a larger population.
8.CE.17
a. The null hypothesis is that the conditional probability of a collision given the location is the same whether the location is inside (hallway in a building) or outside (in a park). The alternative hypothesis is that the two conditional probabilities differ.
b. Both simulation-based and theory-based tests are appropriate here, and the results and conclusions are essentially the same. A simulation-based test can use either the Mean Group Diff or the chi-square statistic. For 2 × 2 tables, the simulated p-value will be the same regardless of choice. The observed Mean Group Diff is 0.085. The observed chi-square is 1.78. Based on 10,000 repetitions, an estimated p-value is 0.2232. The theory-based p-value is 0.182. (We might have expected more agreement between the simulation-based and theory-based approaches here because all observed counts are at least 10.) The conclusion is that there is no evidence in the data to support an inference that the chance of a collision differs according to location. The scope of inference is the same as in Exercise 8.CE.16, part (b).
8.CE.18
a. The null hypothesis is that the conditional probability of a collision given the speed of the experimenter is the same whether fast or slow. The alternative hypothesis is that the two conditional probabilities differ.
b. Both simulation-based and theory-based tests are appropriate here, and the results and conclusions are essentially the same. A simulation-based test can use either the Mean Group Diff or the chi-square statistic. For 2 × 2 tables, the simulated p-value will be the same regardless of choice. The observed Mean Group Diff is 0.048. The observed chi-square is 0.57. Based on 10,000 repetitions, an estimated p-value is 0.48. The theory-based p-value is 0.45. (The theory-based test is valid because all observed values are at least 10.) The conclusion is that there is no evidence in the data to support an inference that the chance of a collision differs according to speed of the experimenter. The scope of inference is the same as in Exercise 8.CE.16, part (b).
8.CE.19
a. The observational units are the 47 Harvard undergraduates.
b. The (categorical) explanatory variable is whether the student received a bonus or a rebate.
c. The (categorical) response variable is whether the student spent or saved the money.
d. Abstractly, the null hypothesis is that there is no association between the explanatory and response variables. More concretely, in the context of the study, the null hypothesis is that the conditional probability of saving the money does not depend on whether it was called a rebate or a bonus. The alternative hypothesis is that the conditional probabilities differ.
8.CE.20
a. The number in the upper left cell is 9. The difference in conditional probabilities is 9/25 − 16/22 = −0.3673. Both statistics lead to the same two-sided p-value of 0.012.
b. For 2 × 2 tables, the Mean Group Diff and chi-square statistics are equivalent. They both lead to the same simulated p-value. For this dataset, the Mean Group Diff is 0.367 and the chi-square statistic is 6.34. Based on 10,000 repetitions, both statistics lead to a p-value of about 0.02. (The theory-based p-value is 0.012.)
c. The two p-values are quite close and lead to the same conclusion. Mathematically, it can be proven that for 2 × 2 tables of counts you always get the same p-value from a given set of shuffles, regardless of whether you use the count in the upper left cell, the difference in conditional proportions, the Mean Group Diff, or the chi-square statistic.
8.CE.21
a. The observed value of the Mean Group Diff is 0.128. The simulated null distribution is very bimodal: See the graph for Solution 8.CE.21a.
114
C HA PTE R 8
b. The null distribution for Mean Group Diff using four degrees of baldness has only one mode and is only moderately skewed. Most striking is the difference in simulated p-values, 0.27 with five categories, 0.003 with four categories. The difference is due to the small number of men (only three) with extreme baldness. Where they are placed by a random shuffle has a big impact on the value of the Mean Group Diff. c. Using four categories is much more trustworthy. The five-category analysis is much too influenced by a tiny percentage of the data, namely, the three men (out of 1,435) with extreme baldness. d. The observed value of chi-square is 14.57 on 4 df. The simulation-based null distribution is unimodal and right skewed. The estimated p-value from 10,000 repetitions is about 0.004. e. The two distributions have very different shapes. The one for Mean Group Diff is distinctly bimodal. That for chi-square is not. f. The p-values for the two simulation-based tests are very different. Using the Mean Group Diff gives a p-value of 0.27, which is very far from significant. Using chi-square gives a p-value of 0.004, which is highly significant. The two conclusions are at odds. The chi-square statistic, whose null distribution is not sensitive to where the shuffles locate the three extremely bald men, is much more trustworthy. g. For a theory-based test using the chi-square statistic, the observed value is the same as for the simulation-based test, namely 1.02 + 1.85 + 0.86 + 2.77 + 0.23 + 1.19 + 2.15 + 1.00 + 3.23 + 0.27 = 14.57. The theory-based p-value is 0.0057, which is close to the simulation-based value of 0.004. h. The theory-based test is not trustworthy, because the two observed cell counts for extreme baldness are both tiny, nowhere near 10. Nevertheless, the theory-based p-value is essentially the same as the simulation-based p-value. (The validity condition is meant to warn you away from situations where the theory-based test might be misleading. Sometimes, as here, the test gives a reasonable estimate of the p-value even though the conditions are not satisfied.) i. The test based on four degrees of baldness is more informative and more persuasive. The test statistic based on five categories is very sensitive to where a random shuffle locates the three cases of extreme baldness. A test statistic that flops back and forth based on 3 out of 1,435 cases is not to be trusted. j. With a total of more than 1,400 cases, it shouldn’t matter much what we do with just 3 of them. The analysis that omits the 3 cases of extreme baldness and the analysis that lumps them together with the men with much baldness should (and do) give similar results. k. For this dataset, a test based on the chi-square statistic is much more informative than one based on the Mean Group Diff, because the value of Mean Group Diff is so sensitive to where reshuffles locate the three extremely bald men. The lesson from this example is that when sample sizes are tiny it is worth trying more than one analysis, rather than choosing one and putting all your faith in that one.
8.CE.22 a. The overconfident group; 0.789 compared to 0.657 b. You could not use the Mean Group Diff because the response variable is not binary.
c. H0: There is no association between confidence level and how many questions one gets correct. Ha: There is an association between confidence level and how many questions one gets correct. d. e. We have strong evidence that there is an association between confidence level and how many questions one gets correct in the population. Overconfident people are significantly more likely to get both questions correct than are those that are not overconfident, and overconfident people are significantly less likely to get neither question correct than are those that are not overconfident.
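As a rough check on part (g) of the baldness exercise, the ten cell contributions can be summed and the theory-based p-value recomputed. The sketch below uses the closed-form upper-tail probability that holds for a chi-square distribution with an even number of degrees of freedom. The function name is ours (it is not the applet's method), and the 2 x 5 table layout is an assumption inferred from the ten cells and 4 df quoted in the solution.

```python
import math

def chi_square_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Cell contributions quoted in part (g) of the baldness exercise
contributions = [1.02, 1.85, 0.86, 2.77, 0.23, 1.19, 2.15, 1.00, 3.23, 0.27]
chi_square = sum(contributions)

# Assumed layout: 2 groups x 5 baldness categories -> (2 - 1) * (5 - 1) = 4 df
p_value = chi_square_upper_tail_even_df(chi_square, df=4)
print(round(chi_square, 2), round(p_value, 4))
```

The result should agree with the 14.57 and 0.0057 reported in the solution, up to rounding of the cell contributions.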
Chapter 8 Investigation 1. Does the class status of a driver (as determined by the car they drive) affect whether they yield to a pedestrian in a crosswalk? 2. The observational units are the 152 cars that were observed at the intersection. 3. Observational study, as neither the drivers’ status nor whether or not they yielded was assigned by the experimenters. Rather they were merely observed for each car. 4. Variables: class status (1–5 with 1 being the lowest class status and 5 being the highest) and whether or not the driver yielded to the pedestrian 5. Explanatory variable is the class status and response variable is whether or not the driver yielded. Class status is categorical (multiple categories) and whether or not yielded is categorical (binary). 6. H0: π1 = π2 = π3 = π4 = π5. Ha: At least one π is different. 7. 8. Mean Group Diff = (|0.00 – 0.29| + |0.00 – 0.33| + |0.00 – 0.44| + |0.00 – 0.46| + |0.29 – 0.33| + |0.29 – 0.44| + |0.29 – 0.46| + |0.33 – 0.44| + |0.33 – 0.46| + |0.44 – 0.46|)/10 = 0.216
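The arithmetic in item 8 can be sketched in a few lines. The helper name below is ours, and the inputs are the rounded proportions printed in the solution. With those rounded inputs the average comes out at about 0.214, so the 0.216 reported above was presumably computed from unrounded proportions (an assumption; the raw counts are not shown here).

```python
from itertools import combinations

def mean_group_diff(values):
    """Average absolute difference over all pairs of group statistics."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Rounded conditional proportions for class statuses 1-5, as printed in item 8
proportions = [0.00, 0.29, 0.33, 0.44, 0.46]

# Five groups give 5 * 4 / 2 = 10 pairwise differences
print(round(mean_group_diff(proportions), 3))
```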
9. b. The observed statistic is somewhat out in the tail of this null distribution. c. The p-value is approximately 0.09; thus the study provides moderate evidence against the null hypothesis. It may be that if there were more data we might find there is a difference in at least one of the long-run probabilities of yielding for the different class status groups. This is our conclusion because the probability we would have gotten a statistic from our observed data, or one more extreme, assuming all the long-run probabilities of yielding were the same, was between 0.05 and 0.10. 10. Validity conditions are not met to complete a theory-based chi-square test. Not all cell counts are at least 10. The theory-based p-value = 0.2186. This p-value is much larger than the one found using the Mean Group Diff statistic. (The simulation-based p-value using the chi-square statistic is also about 0.22 so the chi-square distribution is not a bad approximation even though the validity conditions are not met.) 11. Validity conditions are not met for performing follow-up analyses using theory-based techniques, because not all cell counts are at least 10. 12. The applet will give theory-based 95% confidence intervals for the difference in each pair of proportions. As validity conditions were not met, these confidence intervals should only be taken as an indication of trends and more data should be gathered to get a valid representation of differences in long-run probabilities. The first column is theory-based and the second is the 2SD method from randomization-based:
13. This study did not use random assignment to equalize confounding variables and as such any one of these variables could be the causal explanation for failure to yield. So no, we cannot say that class status is the direct cause of any differences in failure to yield between the class statuses. 14. As this was not a random sample, caution needs to be taken in generalizing results. Possibly we could generalize results to cars in California that frequently drive through the particular intersection where the study was conducted.
15. Researchers wondered whether there were ethical differences between different class statuses. Specifically they looked at whether or not different class statuses were more or less likely to yield to pedestrians in the crosswalk at an intersection in California. The researchers found moderate evidence of a difference in the long-run probabilities of yielding between the classes. Specifically, those in the lowest class status were more likely to yield to pedestrians than any of the four higher class statuses. Because both the lowest class status and the highest class status groups had small numbers, it would be good to gather more data to increase sample sizes in those groups.
Chapter 8 Research Article 1. There is a significant gap between the number of organ donors and the demand for organ donors. The researchers are investigating members of the public, health professionals, and individuals affected by kidney disease regarding their views towards the acceptability of strategies which provide financial compensation to organ donors and potential willingness to donate a kidney under different financial incentives. 2. The demand for transplantation continues to exceed the supply (Axelrod et al., 2010), and a number of alternative strategies have been proposed which include financial incentives for donors (Gaston et al., 2006). 3. a. Public: n = 2004, obtained via the Ipsos polling company using an electronic invitation. b. Health professionals: n = 339, obtained via direct e-mail to all members of professional societies. c. People affected by kidney disease: n = 268, via volunteer sample recruited from Kidney Foundation of Canada website and Facebook page. 4. Ethnicity and marital status have similar distributions. Willingness to donate, education, gender, age, employment, and income all have fairly different distributions. 5. Null hypothesis: The probability of people supporting at least one of the financial incentive programs is the same in each of the three groups. Alternative hypothesis: At least one of the probabilities is different. 6. The general public supports financial incentives significantly more than health professionals, with some evidence this group is also more supportive than individuals impacted by kidney disease. There is not a significant difference in the support proportions between people impacted by kidney disease and health professionals. 7. The sample size of the general public is much, much larger (2004) than the other two samples. Thus, a similar difference between sample proportions is much more significant when comparisons involve the general public. 8.
The p-value will be smaller because the sample sizes are the same, but the differences in the percentages are larger in all cases (16% vs. 32% is a larger difference than 71% vs. 62%; 16% vs. 23% is a larger difference than 71% vs. 66%; 23% vs. 32% is a larger difference than 66% vs. 62%); thus the overall chi-square test statistic will be larger and, consequently, the p-value will be smaller.
9. The p-value is likely larger than 0.05.
10. Null: The population proportion of individuals with lower income who find a monetary payment to living donors acceptable is the same as the population proportion of individuals with higher income. Alternative: The population proportions are different. The p-value of 0.03 gives us strong evidence that the population proportion of individuals with lower income who find monetary payment to donors acceptable is different than among individuals with higher income. 11. The article says that the people associated with kidney donation were recruited from visitors to the Kidney Foundation of Canada website and Facebook page. Because it is a volunteer sample we should be cautious about generalizing even to visitors to these two sites.
12. No, this is not a randomized experiment. Someone’s income is not randomly assigned, so although income may be the cause of this change in attitude, this observational study cannot tell us that conclusively. Confounding variables are possible.
13. Consider better techniques for sampling (random samples) and consider pilot testing programs that move from “what I might do” to “what are people actually doing” to see whether the hypothesized effects (increased organ donation) occur.
14. The hope is that, even though the general public sample is not a random sample of all Canadian adults between the ages of 18 and 59, it is still representative. The authors indicate that the proportions of different education levels and employment statuses in the sample are similar to census data, which is good, but other variables still may differ between the sample and the population (all Canadian adults between 18 and 59).
CHAPTER 9
Comparing More Than Two Means Section 9.1 9.1.1 D. 9.1.2 A. 9.1.3 D. 9.1.4 D. 9.1.5 A. 9.1.6 D. 9.1.7 a. The max – min and Mean Group Diff statistic will always be the same. b. The difference in means may take negative values based on which mean is subtracted from which. If the difference in means is negative, the Mean Group Diff and max – min will just be the absolute value of the difference in means. 9.1.8 No association implies that the means are the same in each
class level. 9.1.9
a. Largest: C, smallest: B b. Largest: D, smallest: C c. C, right skewed d. D e. Unable to tell from a boxplot what the sample size is 9.1.10 If you don’t take absolute values there are two potential problems, automatic zeros and ambiguity. (1) Automatic zeros: Finding the sum of the differences could result in zero if you subtracted in a certain direction. (2) Ambiguity: If you take absolute values, the order you subtract doesn’t matter, but if you don’t take absolute values, order does matter. 9.1.11 Square the differences before adding. 9.1.12 [|5 − 4| + |10 − 5| + |10 − 4|]/3 = 12/3 = 4 9.1.13 [|8 − 7| + |8 − 5| + |8 − 2| + |7 − 5| + |7 − 2| + |5 – 2 |]/6 = 20/6 = 3.33 9.1.14 a. The observational units are the teams, the response is the total payroll, and the explanatory variable is the league and division. b. For plots like the one above:
i. Each point represents a unit.
ii. Each horizontal cluster corresponds to a value of the explanatory variable.
iii. Values along the horizontal axis represent values of the response.
c. B. d. B. 9.1.15 a. H0: There is no association between location in California and squirrel body length. Ha: There is an association between location in California and squirrel body length. b. Mean Group Diff = 14.617 mm c. The p-value appears to be very small because 14.617 is way out in the tail of the distribution. d. It appears there is strong evidence that there is an association between location in California and squirrel body length. We cannot conclude cause and effect, because this is an observational study. 9.1.16 a. Explanatory: class level, response: monthly hours in extracurricular activities. b. Null: There is no association between class level and hours in extracurricular activities. Alternative: There is an association between class level and hours in extracurricular activities. c. [|43.54 − 33.83| + |43.54 − 28.63| + |43.54 − 26.24| + |33.83 − 28.63| + |33.83 − 26.24| + |28.63 − 26.24|]/6 = (9.71 + 14.91 + 17.3 + 5.2 + 7.59 + 2.39)/6 = 57.1/6 = 9.52 hrs d. The p-value will not be small because 9.52 is near the middle of the simulated null distribution of Mean Group Diff statistics. e. We do not have strong evidence of an association between hours in extracurricular activities per month and class level when using the Mean Group Diff statistic. 9.1.17 a. The explanatory variable is the amount of fish consumption (categorical), and the response variable is the level of omega-3 fatty acids in the blood (as a percentage of the total fatty acids in the blood) (quantitative). b. H0: There is no association between fish consumption and level of omega-3 fatty acids in the blood. Ha: There is an association between fish consumption and the amount of omega-3 fatty acids in the blood.
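The hand calculations in 9.1.12, 9.1.13, and 9.1.16(c) all follow the same recipe: average the absolute differences over every pair of group means. A minimal sketch of that recipe follows; the function name and the two-group example values are ours, chosen only for illustration.

```python
from itertools import combinations

def mean_group_diff(means):
    """Average of |mean_i - mean_j| over every pair of groups."""
    diffs = [abs(a - b) for a, b in combinations(means, 2)]
    return sum(diffs) / len(diffs)

print(mean_group_diff([5, 4, 10]))                              # means from 9.1.12
print(round(mean_group_diff([8, 7, 5, 2]), 2))                  # means from 9.1.13
print(round(mean_group_diff([43.54, 33.83, 28.63, 26.24]), 2))  # means from 9.1.16(c)

# With only two groups there is a single pair, so the statistic
# reduces to max - min, as noted in 9.1.7(a).
two_groups = [6.0, 9.5]  # hypothetical group means
assert mean_group_diff(two_groups) == max(two_groups) - min(two_groups)
```

Because absolute values are taken, the order in which the means are listed does not affect the result, which is the point made in 9.1.10.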
c. x̄A = 3.77%, x̄B = 4.08%, x̄C = 5.10%, x̄D = 5.65%, x̄E = 5.95%; yes, those who eat the most fish had the highest omega-3 fatty acid levels, and those who eat the least fish had the lowest omega-3 fatty acid levels.
ily, be generalized to a larger population because random sampling was not used to obtain the sample.
d. Mean Group Diff = 1.187 percentage points
a. Explanatory: diet, response: weight loss in kg
e. p-value = 0.005, which provides very strong evidence against the null hypothesis and support for an association between amount of fish consumption and omega-3 fatty acid blood levels. There is strong evidence that those with higher amounts of fish consumption tend to have higher omega-3 fatty acids in the blood, on average. 9.1.18 a. Explanatory: video watched, response: emotional state rating b. Null: The average emotional rating is the same for all three videos. Alternative: At least one of the average emotional ratings is different for one of the videos. c. The emotional state ratings seem lower for the sad group than for the other two groups. d. Mean Group Diff = 2.467. The average of the absolute differences in mean emotional states is 2.467 when comparing the three groups. e. The p-value is less than 0.001. f. Because the p-value is so small, we have very strong evidence of an association between video watched and emotional state rating. This suggests a cause-and-effect relationship between video watched and emotional state rating because this was a randomized experiment. However, the result cannot, necessarily, be generalized to a larger population because random sampling was not used to obtain the sample. 9.1.19 a. Explanatory: video watched, response: mood rating b. Null: The average mood rating is the same for all three videos. Alternative: At least one of the average mood ratings is different for one of the videos. c. The mood ratings seem lower for the sad group than for the other two groups. d. Mean Group Diff = 2.489. The average of the absolute differences in mean mood states is 2.489 when comparing the three groups. e. The p-value is < 0.001. f. Because the p-value is so small, we have very strong evidence of an association between video watched and mood. This suggests a cause-and-effect relationship between video watched and mood because this was a randomized experiment.
However, the result cannot, necessarily, be generalized to a larger population because random sampling was not used to obtain the sample. 9.1.20 a. Explanatory: video watched, response: stress level b. Null: The average stress level is the same for all three videos. Alternative: At least one of the average stress levels is different for one of the videos.
9.1.21 b. The 93 subjects c. Null: The long-run average weight loss for the four diets is the same. Alternative: At least one of the long-run average weight losses is different for one of the diets compared to the others. d. The dotplots don’t show dramatic differences in weight loss between the diets. e. Mean Group Diff = 1.369 kg f. The p-value is 0.61. g. We do not have convincing evidence of difference in weight loss amounts between the four diets. If the result was significant we would have been able to conclude a cause-and-effect relationship because random assignment was used. Generalizing this result to other individuals should be done with caution because the sample was not obtained randomly from a larger population. 9.1.22 a. The null hypothesis is that there is no association between the number of kills (response) and the level of the game (explanatory variable). More specifically, the mean number of kills (the long-run expected value) is the same for all three levels. The alternative hypothesis is that at least one mean differs from the others. b. Let μi be the mean number of kills for level i, with i = 1, 2, or 3. c. The null hypothesis is H0: μ1 = μ2 = μ3. The alternative is Ha: μi ≠ μj for at least one pair (i, j). d. The simulation is based on shuffling, which in turn requires that all 90 observed values be interchangeable. In more detail, suppose that there were three different players and that each played 10 games at each level. Then all the games played by a given person would be interchangeable and could be shuffled, but games played by two different people would not be. e. The Mean Group Diff has value 0.306 kills. f. A set of 10,000 shuffles of the response resulted in 9,551 with a Mean Group Diff of 0.306 or more, for an estimated p-value of 0.9551. g. The p-value is very far from being significant. It is extremely likely to get a Mean Group Diff of 0.306 just by random chance. 
We do not have convincing evidence that number of kills is associated with level of the game. The exercise tells nothing about how the levels were assigned, so inference about cause is not supported. The game players were not a random sample, and generalization to some larger population is not supported. 9.1.23 Deaths: Mean Group Diff = 3.054 deaths, p-value = 0.016. We have strong evidence of an association between deaths and level played. 9.1.24 Assists: Mean Group Diff = 1.785 assists, p-value = 0.13. We do not have evidence of an association between assists and level played.
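The shuffling procedure described in 9.1.22(d) through (f) can be sketched as follows. This is a minimal illustration and not the applet's implementation: the function names, the seed, and the data are all hypothetical, since the exercise's 90 observed values are not reproduced here.

```python
import random
from itertools import combinations

def mean_group_diff(groups):
    """Average absolute difference between all pairs of group means."""
    means = [sum(g) / len(g) for g in groups]
    diffs = [abs(a - b) for a, b in combinations(means, 2)]
    return sum(diffs) / len(diffs)

def shuffle_p_value(groups, reps=10_000, seed=0):
    """Estimate P(Mean Group Diff >= observed) by reshuffling responses."""
    rng = random.Random(seed)
    observed = mean_group_diff(groups)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # Deal the shuffled responses back into groups of the original sizes
        reshuffled, start = [], 0
        for n in sizes:
            reshuffled.append(pooled[start:start + n])
            start += n
        if mean_group_diff(reshuffled) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / reps

# Hypothetical kill counts for three game levels (not the exercise's data)
levels = [
    [3, 5, 4, 6, 2, 5, 4, 3],
    [4, 4, 5, 3, 6, 2, 5, 4],
    [5, 3, 4, 4, 6, 3, 2, 5],
]
observed, p_value = shuffle_p_value(levels, reps=2_000)
print(round(observed, 3), p_value)
```

Pooling all responses before shuffling is what encodes the interchangeability assumption in part (d): if games were played by different people, shuffling across players would not be justified.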
c. The stress levels seem lower for the sad group than for the no video group and higher for the happy video group.
9.1.25 Medals: Mean Group Diff = 1.652 medals, p-value = 0.15. We do not have evidence of an association between medals and level played.
d. Mean Group Diff = 2.000. The average of the absolute differences in mean stress levels is 2.000 when comparing the three groups.
9.1.26 a. Explanatory: major, response: time to complete Sudoku puzzle
e. The estimated p-value is 0.002.
b. Null: There is no association between major and time to complete Sudoku puzzle. Alternative: There is an association between major and time to complete puzzle.
f. Because the p-value is so small, we have very strong evidence of an association between video watched and stress level. This suggests a causeand-effect relationship between video watched and stress level because this was a randomized experiment. However, the result cannot, necessar-
c. There doesn’t appear to be much of a difference in the groups, but there is an outlier in the as group.
d. Mean Group Diff = 20.883 seconds
9.2.3 D.
e. p-value = 0.24. There is not convincing evidence of an association between major and time to complete the puzzle.
9.2.4 A, C.
9.1.27 The Mean Group Diff statistic would likely be even smaller because the medians are likely even closer together.
9.2.6
9.1.28 a. Mean Group Diff = 16.08 sec, p-value = 0.20 b. There is less of an average difference in the groups. c. The removal of the outlier changed the variability in the null distribution, making the smaller Mean Group Diff statistic closer to the tail of the null distribution. 9.1.29 a. Means: A = 5, B = 6, C = 7, SD = 2.74 for all three groups; Mean Group Diff = 1.33 b. Mean is 1.05, SD of null distribution is 0.544, p-value is 0.31 c. Means: A = 5, B = 6, C = 7, SD = 1.22 for all three groups; Mean Group Diff = 1.33 d. p-value = 0.005; Mean is 0.55, SD = 0.28 e. The means of the three groups are the same. f. In Study 2 the SDs within each group are smaller, and so the null distribution has a lower mean and lower SD, meaning that the same Mean Group Diff statistic is unlikely in one case but not the other. g. No, Mean Group Diff statistics are not “standardized” so, they are not comparable across studies. 9.1.30 a. H0: There is no association between condition and difference in rating scores for bottled and tap water. Ha: There is an association between condition and difference in rating scores for bottled and tap water.
9.2.5 A, D, E. a. C. b. A. c. C. d. A. e. A. 9.2.7 An ANOVA test is used to test a null hypothesis that all group means are equal. In more detail: We have response values on observational units, with each unit belonging to exactly one of two or more groups. We assume as part of our model that each group has a mean response value. ANOVA is used to test the null hypothesis that all group means are the same. 9.2.8 Using ANOVA controls the overall chance of a false alarm. In more detail: Each test of a null hypothesis has a false alarm rate, the chance of wrongly declaring the null hypothesis to be false, typically 5%. If you do several tests, the overall (“family wise”) false alarm rate can be much higher than the individual error rate, e.g., much higher than 5%. ANOVA is a method that ensures that the family wise false alarm rate is below the level you choose, e.g., 5%. 9.2.9 a. More likely b. Less likely c. More likely d. More likely 9.2.10
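To see how quickly the family wise false alarm rate in 9.2.8 can grow, the sketch below computes it for all pairwise comparisons among 3, 4, and 5 groups. It assumes the individual tests are independent, which pairwise comparisons on shared groups are not exactly, so the numbers are only a rough guide. The function names are ours.

```python
def familywise_rate(alpha, num_tests):
    """Chance of at least one false alarm across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def num_pairwise_comparisons(num_groups):
    """Number of distinct pairs among the groups."""
    return num_groups * (num_groups - 1) // 2

for groups in (3, 4, 5):
    tests = num_pairwise_comparisons(groups)
    # Each individual test is run at the 5% level
    print(groups, tests, round(familywise_rate(0.05, tests), 3))
```

Under this simplification, ten pairwise tests at 5% each already carry roughly a 40% chance of at least one false alarm, which is why a single overall test is run first.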
b. Mean Group Diff = 1.093
a. If the variability between groups increases, the F-statistic will increase and the p-value will decrease.
c. The p-value appears to be quite small because 1.093 is far out in the tail of the null distribution.
b. If the variability within groups increases, the F-statistic will decrease and the p-value will increase.
d. There is strong evidence that there is an association between condition (or what they are told) and difference in rating scores between bottled water and tap water. We can conclude that the condition is causing this difference, since this was a randomized experiment.
9.2.11
9.1.31
a. The F-statistic will increase, because the differences between the groups increased, and so the p-value will decrease. b. The F-statistic will decrease, because the variability within each group increased, and so the p-value will increase.
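The two effects in 9.2.10 and 9.2.11 can be checked numerically. The sketch below computes a one-way ANOVA F-statistic by hand and then changes the within-group and between-group variability in turn. The data are hypothetical (three groups with means 5, 6, and 7), and the helper names are ours.

```python
def f_statistic(groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def spread_out(group, factor):
    """Scale deviations from the group mean, leaving the mean unchanged."""
    m = sum(group) / len(group)
    return [m + factor * (x - m) for x in group]

# Hypothetical data: three groups with means 5, 6, 7
base = [[3, 4, 5, 6, 7], [4, 5, 6, 7, 8], [5, 6, 7, 8, 9]]
# Same means, doubled within-group variability
wider = [spread_out(g, 2.0) for g in base]
# Same within-group variability, means pushed farther apart (3, 6, 9)
farther = [[x + shift for x in g] for g, shift in zip(base, (-2, 0, 2))]

print(f_statistic(base), f_statistic(wider), f_statistic(farther))
```

Doubling the spread within groups cuts F to a quarter of its original value, while moving the means farther apart raises it, matching parts (a) and (b).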
a. H0: There is no association between breed of dog or wolves and how often the raw turkey will be chosen. Ha: There is an association between breed of dog or wolves and how often the raw turkey will be chosen.
9.2.12 a. The null and alternative hypotheses are the same as in Exercise 9.1.21, part (c).
b. Short-nosed dogs had the lowest mean score, at 13.42 times, and dogs bred for scent had the highest mean score, at 16.79 times.
c. The p-value is 0.6587, which is not even close to significant.
c. Mean Group Diff = 2.046 times d. p-value = 0.048 e. Mean Group Diff = 2.011 times; p-value = 0.01
b. F = 0.54 d. The answer here is the same as for Exercise 9.1.21, part (g). e. The requirements are that distributions within groups are roughly symmetric and that group SDs are roughly equal. Here the distributions are roughly symmetric with no outliers, and the SDs are in fact roughly equal: Using the theory-based approach is reasonable.
f. The overall variability in the scores decreased after the two low outliers were removed, and this decreased the variability in the simulated Mean Group Diffs, so even though the observed Mean Group Diff decreased, it was farther out in the tail of the simulated Mean Group Diffs, which resulted in a smaller p-value.
f. A follow-up analysis here is not appropriate because there is not convincing evidence of differences among the group means.
Section 9.2
9.2.13
9.2.1 C.
a. The null hypothesis is that the mean heart rates are the same for men and women. The alternative hypothesis is that the mean heart rates are different between men and women.
9.2.2 D.
g. The results here are essentially the same as for the simulation-based test.
b. The value of the t-statistic is 0.63, with an associated p-value of 0.5299, a value that is not even close to significant. There is not convincing evidence that average heart rates differ for men and women. Sex could not be randomly assigned, so there could well be hidden factors that have influenced the heart rates in the study. Subjects were not chosen by random sampling from a larger population, so generalization beyond the observed sample is not automatic.
b. We have evidence of an association between deaths and level.
c. The value of the F-statistic is 0.399 with an associated p-value of 0.5287. The p-value is far from being significant. The conclusion and its scope are the same as in part (b) above.
a. F = 2.35, p-value = 0.101
d. The conclusions and their scope are the same. e. ANOVA requires that the group standard deviations be equal. This is not a requirement for the t-test. 9.2.14 a. The explanatory variable is the Type of Fish. The response is the Mercury Level in ppm. b. The observational units are the individual fish. c. Yes, validity conditions are satisfied: The sample sizes are above 20 in each of the groups and the standard deviations are within a factor of 2 (0.11 ppm, the smallest SD, is within a factor of 2 of the largest SD, 0.20 ppm). d. The F-statistic equals 326.44, with a p-value of 0.0000. There is overwhelming evidence that average mercury levels are not the same for all four kinds of fish. Type of fish is observational rather than experimental, so the study by itself cannot tell us what it is about the types that leads to differences in mercury levels. Although the fish were not chosen by random sampling, it may be reasonable to think that they were caught in a way that mimics random sampling. The statistics alone cannot justify generalizing to all fish of the same type. Here are confidence intervals from the applet, Theory-based Inference: Perch – halibut: –0.13 to –0.07 ppm Perch – bass: –0.0517 to 0.0105 ppm Perch – orange roughy: –0.466 to –0.404 ppm Halibut – bass: 0.047 to 0.109 ppm Halibut – orange roughy: –0.368 to –0.306 ppm Bass – orange roughy: –0.446 to –0.384 ppm All differences except the difference between perch and bass are significant. e. “Does It Matter What Kind of Fish You Eat?” A recent scientific study compared mercury levels in 912 fish of four different kinds and found average mercury concentrations that ranged from 0.15 to 0.59 parts per million. A statistical test shows that the differences between kinds of fish are too large to be dismissed as mere chance-like variation. Here are the average levels reported by the study: perch 0.15 ppm, bass 0.17 ppm, halibut 0.25 ppm, and orange roughy 0.59 ppm.
A follow-up analysis finds that the difference between perch and bass is too small to regard as proven, but the other differences can be regarded as scientifically established. In short, perch and bass have the lowest levels, with halibut distinctly higher, and orange roughy distinctly higher than halibut. 9.2.15 a. F = 0.05, p-value = 0.9558 b. We do not have evidence of an association between kills and level. 9.2.16 a. F = 4.18, p-value = 0.0185
9.2.17 a. F = 1.88, p-value = 0.1592 b. We do not have evidence of an association between assists and level. 9.2.18 b. We do not have evidence of an association between medals and level. 9.2.19 Chi-square = 4.79, p-value = 0.0914. We have moderate evidence of an association between winning and losing and level played. 9.2.20 a. Experiment because participants were assigned to a group b. Each of the participants c. Explanatory: which group (ENews, Facebook, or control; categorical), response: self-esteem score (quantitative) d. Null: There is no association between group and self-esteem score. Alternative: There is an association between group and self-esteem score. e. Null: μENews = μFacebook = μcontrol. Alternative: At least one of the three means is different than the others. μ indicates the long-run mean self-esteem score for the group. f. Mean Group Diff = 2.395, p-value = 0.005 g. F = 7.08, p-value = 0.0015 h. We have strong evidence that at least one of the group means is different than the others. A cause-and-effect conclusion is possible due to the randomization in the experimental design, but generalizing to a larger population should be done with caution because the sample was not obtained randomly. i. The sample size is larger than 20 in each group, and there is not strong skewness in any of the groups. The SDs are within a factor of 2 of each other (3.43 × 2 = 6.86 > 4.30). Thus, the validity conditions for ANOVA are met. j. Follow-up analyses are appropriate. 95% confidence intervals indicate that Facebook users had significantly higher self-esteem scores than those who looked at the news (1.42–5.76 higher) as well as the control group (1.35–5.69 higher). 9.2.21 a. Larger b. Smaller c. Stronger d. F-statistics increase as the sample sizes increase, all else being equal. This will make the p-value smaller and increase the strength of evidence. 9.2.22 No, there would be two categorical variables and a chi-square test would be more appropriate. 9.2.23 a.
Each student b. Explanatory: group (CA, KS, or ME; categorical), response: $ willing to donate (quantitative) c. The long-run average amount willing to donate in each of the three groups (μCA, μKS, μME) d. Null: μCA = μKS = μME. Alternative: At least one of the three means is different than the others.
e. Mean Group Diff = $3.988, p-value = 0.84 f. F = 0.19, p-value = 0.83 g. We do not have evidence of a difference in the long-run average amount willing to donate between the three groups. Cause and effect would be possible due to the random assignment, but generalizing should be done with caution due to the fact that the sample was not obtained by taking a random sample. h. The ANOVA/F-test may not be valid due to the small sample size combined with strong skewness/outliers in the data. i. No follow-up analysis is needed because there is not strong evidence of an association. j. Not only the states were different, but the type of disaster was also different, so if there had been evidence of an association we couldn’t attribute it necessarily to state; it could have been type of disaster. 9.2.24 a. Larger b. Smaller c. Stronger d. F-statistics increase as the sample sizes increase, all else being equal. This will make the p-value smaller and increase the strength of evidence. 9.2.25 No, there would be two categorical variables and a chi-square test would be more appropriate. 9.2.26 a. Each drop of a helicopter b. Observational study. While the researcher did make two different helicopters, the observational units (each drop) were not randomly assigned to different wing lengths. c. Explanatory: wing length (2 or 3 cm; categorical), response: flight time (seconds; quantitative) d. Null: There is no association between wing length and flight time. Alternative: There is an association. e. Null: μ2 = μ3, Alternative: μ2 ≠ μ3, where μ is the long-run average flight time for the helicopter with a particular wing length. f. 2 cm: mean = 2.402 sec, SD = 0.346 sec, n = 17, 3 cm: mean = 2.904 sec, SD = 0.538 sec, n = 15, t = 3.09, p-value = 0.0051 g. F = 10.09, p-value = 0.0034 h. The p-values are about the same. i. We have strong evidence of an association between wing length and flight time.
This is not a cause-and-effect conclusion, and the results cannot necessarily be generalized beyond these particular helicopters. j. The sample sizes are less than 20, but the data in each group are reasonably symmetric. There is one possible outlier for the 3-cm helicopter, but it isn’t dramatically different than the rest of the data. The SDs are within a factor of 2 (0.35 × 2 = 0.70 > 0.54). Thus, the validity conditions are met. k. A 95% confidence interval suggests that the 3-cm helicopter flies for between 0.167 and 0.838 seconds longer on average compared to the 2-cm helicopter. l. So that the difference in flying time can be attributed to wing length and not other factors. Any other differences in the helicopters could also be explaining the difference in flying times. 9.2.27 a. No, there would be more than two groups.
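The t and F values in 9.2.26(f) and (g) can be approximately recovered from the reported summary statistics alone. The sketch below is an illustration under assumptions: the function names are ours, the inputs are the rounded means and SDs from part (f), and we assume the reported t does not pool the SDs (consistent with 9.2.13(e), and with the fact that the unpooled version matches 3.09).

```python
import math

def unpooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t-statistic without assuming equal SDs."""
    return (m2 - m1) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t-statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def anova_f(summaries):
    """One-way ANOVA F-statistic from (mean, SD, n) triples."""
    k = len(summaries)
    n_total = sum(n for _, _, n in summaries)
    grand_mean = sum(m * n for m, _, n in summaries) / n_total
    ms_between = sum(n * (m - grand_mean) ** 2 for m, _, n in summaries) / (k - 1)
    ms_within = sum((n - 1) * s ** 2 for _, s, n in summaries) / (n_total - k)
    return ms_between / ms_within

two_cm = (2.402, 0.346, 17)    # mean (sec), SD (sec), n from 9.2.26(f)
three_cm = (2.904, 0.538, 15)

t_unpooled = unpooled_t(*two_cm, *three_cm)
t_pooled = pooled_t(*two_cm, *three_cm)
f_stat = anova_f([two_cm, three_cm])
print(round(t_unpooled, 2), round(f_stat, 2), round(t_pooled ** 2, 2))
```

With two groups, F equals the square of the pooled t-statistic, which is why the two p-values in part (h) are close but not identical: the reported t-test does not pool the SDs, whereas ANOVA does. Small discrepancies from the printed 10.09 are expected because the inputs are rounded.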
b. Yes, ANOVA works when there are multiple groups (different wing lengths) and a quantitative response (flight time). 9.2.28 a. Null hypothesis: The long-run average distance reached is the same for each of the three groups. Alternative: At least one of the long-run averages is different than the others. b. Null: μyoga = μwalking = μcontrol. Alternative: At least one of the three means is different than the others. μ = long-run average distance reached. c. We have strong evidence (based on the small p-value) that at least one of the three means is different than the others. This suggests a cause-and-effect relationship between group and distance reached. Generalizing this result to a larger group should be done with caution because the sample was not obtained randomly. d. The sample sizes are above 20 in each group, so we are assuming that the SDs are within a factor of 2 of each other and that there is not strong skewness/outliers in the data. 9.2.29 a. Null hypothesis: The long-run average reaction time is the same for each of the three groups. Alternative: At least one of the long-run averages is different than the others. b. Null: μyoga = μwalking = μcontrol. Alternative: At least one of the three means is different than the others. μ = long-run average reaction time. c. We do not have strong evidence (based on the large p-value) that at least one of the three means is different than the others. Without an association, we cannot determine a cause-and-effect relationship between group and reaction time. Generalizing this result to a larger group should be done with caution because the sample was not obtained randomly. d. The sample sizes are above 20 in each group, so we are assuming that the SDs are within a factor of 2 of each other and that there is not strong skewness/outliers in the data. 9.2.30 a. Null hypothesis: The long-run average words recalled is the same for each of the three groups. 
Alternative: At least one of the long-run averages is different than the others. b. Null: μyoga = μwalking = μcontrol. Alternative: At least one of the three means is different than the others. μ = long-run average words recalled. c. We do not have strong evidence (based on the large p-value) that at least one of the three means is different than the others. Without an association, we cannot determine a cause-and-effect relationship between group and words recalled. Generalizing this result to a larger group should be done with caution because the sample was not obtained randomly. d. The sample sizes are above 20 in each group, so we are assuming that the SDs are within a factor of 2 of each other and that there is not strong skewness/outliers in the data.
9.2.31 a. Null hypothesis: The long-run average perception of health is the same for each of the three groups. Alternative: At least one of the long-run averages is different than the others. b. Null: μyoga = μwalking = μcontrol. Alternative: At least one of the three means is different than the others. μ = long-run average perception of health. c. We have strong evidence (based on the small p-value) that at least one of the three means is different than the others. This suggests a cause-and-effect relationship between group and perception of health. Generalizing this result to a larger group should be done with caution because the sample was not obtained randomly.
d. The sample sizes are above 20 in each group, so we are assuming that the SDs are within a factor of 2 of each other and that there is not strong skewness/outliers in the data.
9.2.32 a. Short-nosed dogs had the lowest mean score, at 13.42, and dogs bred for scent had the highest mean score, at 16.79. b. Mean Group Diff = 2.046 with a p-value of 0.048; F = 2.699 with a p-value of 0.053 c. New Mean Group Diff = 2.011 with a p-value of 0.01; the new F = 4.339 with a p-value of 0.008 d. After deleting the two low outliers from the two groups, the group means were closer together, so the Mean Group Diff decreased. The numerator of the F-statistic decreased as the variability between the group means decreased; however, the denominator of the F also decreased as the variability within the groups was greatly decreased by removing the two low outliers. This made the F-ratio larger.
9.2.33 a. The explanatory variable is the imagined stereotype (control, librarian, poet) (categorical), and the response variable is the creativity score (quantitative). b. H0: There is no association between imagined stereotype and creativity score. Ha: There is an association between imagined stereotype and creativity score. c. x̄control = 77.88, x̄librarian = 60.34, x̄poet = 92.16; those who were told that they were poets had the highest mean, while librarians had the lowest mean. d. Yes, validity conditions are met for a theory-based ANOVA test, as there are more than 20 in each group, the distributions of creativity scores for each sample are not strongly skewed, and the sample standard deviations are all within a factor of 2 of each other. e. F = 8.146 and the p-value = 0.0006, which provides very strong evidence against the null hypothesis and in support of an association between imagined stereotype and creativity score. f. Confidence interval for μlibrarian − μpoet: (−47.49, −16.13); confidence interval for μlibrarian − μcontrol: (−33.21, −1.85); confidence interval for μpoet − μcontrol: (−1.40, 29.96). Poets and controls had significantly higher mean creativity scores than librarians, but there was no significant difference between poets and controls. g. R² = 16,248.90/109,003.83 = 0.149; 14.9% of the variation in creativity scores can be explained by the stereotype group.
9.2.34 a. The explanatory variable is major (categorical), and the response variable is the creativity score (quantitative). b. H0: There is no association between major and creativity score. Ha: There is an association between major and creativity score. c. x̄art = 77.71, x̄physics = 73.96, x̄theater = 85.71, x̄biology = 69.96; theater majors had the highest mean, and biology majors had the lowest mean. d. Yes, validity conditions are met for a theory-based ANOVA test, as there are more than 20 in each group, the distributions of creativity scores for each sample are not strongly skewed, and the sample standard deviations are all within a factor of 2 of each other. e. F = 0.938 and the p-value = 0.4257, which provides weak evidence against the null hypothesis; thus, it is plausible that there is no association between major and creativity score. f. Results are not significant, so no follow-up confidence intervals are needed. g. R² = 3,241.50/109,183.33 = 0.0297; 2.97% of the variation in creativity scores can be explained by major.
9.2.35 a. The explanatory variable is exercise program (categorical), and the response variable is the change in functional reach in cm (quantitative). b. H0: There is no association between exercise program and change in functional reach. Ha: There is an association between exercise program and change in functional reach. c. x̄stretching = 0.86 cm, x̄resistance = 2.34 cm, x̄tai chi = 4.89 cm; the tai-chi exercise program had the highest mean, and the stretching exercise program had the lowest mean. d. Yes, validity conditions are met for a theory-based ANOVA test, as there are more than 20 in each group and the distribution of the change in functional reach for each sample is not strongly skewed; also, the sample standard deviations are all within a factor of 2 of each other. e. F = 11.097 and the p-value < 0.0001, which provides very strong evidence against the null hypothesis and in support of an association between exercise program and change in functional reach. f. Confidence interval for μtai chi − μresistance: (0.8440, 4.2637) cm; confidence interval for μtai chi − μstretching: (2.33, 5.75) cm; confidence interval for μresistance − μstretching: (−0.2268, 3.1929) cm. The tai-chi exercise group has a higher average change in functional reach than either the resistance group or the stretching group. There was no significant difference in the average change in functional reach between the resistance group and the stretching group. g. R² = 542.07/5,231.38 = 0.104; 10.4% of the variation in functional reach can be explained by the exercise program used.
9.2.36 Using the F-statistic allows us to compare multiple groups at once and to control the overall risk of finding significant results when they don’t exist (Type I errors).
9.2.37 Using the t-statistic we can do either one- or two-sided tests. We use t-statistics for creating confidence intervals. The t-statistic is also easier to learn, so it is a better place to start when comparing two groups than the F-statistic.
9.2.38 a. F-statistic b. F-statistic c. F-statistic
9.2.39 a. The variability between the groups. b. The variability within the groups. c. Greater than 1
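The R² values reported in part (g) of Exercises 9.2.33 through 9.2.35 are each the sum of squares for groups divided by the total sum of squares. A minimal Python sketch of that arithmetic follows, using only the sums of squares quoted in those solutions; the function name `r_squared` and the dictionary labels are illustrative choices, not taken from the text or the applet.

```python
def r_squared(ss_groups: float, ss_total: float) -> float:
    """Proportion of total variation explained by group membership."""
    if ss_total <= 0:
        raise ValueError("total sum of squares must be positive")
    return ss_groups / ss_total


# (SS for groups, SS total) as quoted in the part (g) solutions.
reported = {
    "9.2.33": (16248.90, 109003.83),
    "9.2.34": (3241.50, 109183.33),
    "9.2.35": (542.07, 5231.38),
}

for exercise, (ss_groups, ss_total) in reported.items():
    print(f"{exercise}: R^2 = {r_squared(ss_groups, ss_total):.4f}")
```

Rounded to the precision used in the solutions, the three ratios agree with the reported 0.149, 0.0297, and 0.104.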
End of Chapter 9 Exercises 9.CE.1 a. Smaller b. Larger c. Smaller 9.CE.2 Geraldine is right. Josephine is thinking about the number of categories (sometimes called levels) of a categorical variable and mistakenly calling this the number of variables.
9.CE.3 μ is the population (or long-run) average; it is not observed but is what we are trying to learn about. x̄ is the sample mean; it is observed.
9.CE.4 The subscripts are used to indicate that the mean (population or sample) is for a particular subgroup (of the population or sample).
9.CE.5 The overall average will equal the average of the group averages when the size of each group is the same. For example, if each of four sections had 25 students, then the average of the four section averages and the overall average of all 100 students would be the same. At the other extreme, if one section had 97 students and the other three “sections” had one student each, then averaging the four section averages won’t necessarily be the same as the overall average.
9.CE.6 Answers will vary. Here is one possibility. a. Is there a difference between three different versions of a test? b. Students in a large class c. Which version of the test the students take d. 3 e. The scores on the exam f. Experimental g. Random assignment, not random sampling h. There is no association between student scores on the exam and test version. i. There is an association between student scores on the exam and test version.
9.CE.7 a. Study A: (0 + 20 + 20 + 20 + 20 + 0)/6 = 80/6 = 13.3, Study B: (10 + 10 + 20 + 0 + 10 + 10)/6 = 60/6 = 10 b. Study A, because on average the means are more different from each other in Study A. In particular, 40 is farther away from 20 than 30 is from 20; thus, the Mean Group Diff statistic is larger for Study A (13.3) as compared to Study B (10). c. The denominators of the F-statistic will be the same for both studies. However, the group means in Study A are more spread out than in Study B. So, just like Study A had a larger Mean Group Diff statistic, Study A will also have a larger F-statistic.
9.CE.8 a. Study C: small sample size and larger SD b. Study B: larger sample size and smaller SD
9.CE.9 a. [Figure: boxplots of earnings ($K, axis from 0 to 200) by degree (1:Some, 2:Associate, 3:Bachelor, 4:Master, 5:Doctorate).] b. Sample size is 50 for each of the five groups. Doc: x̄ = $97.4K, SD = $40.5K; Mas: x̄ = $66.0K, SD = $38.4K; Bac: x̄ = $55.2K, SD = $32.2K; Assoc: x̄ = $36.8K, SD = $28.5K; Some: x̄ = $32.51K, SD = $20.80K c. The distribution of incomes is right skewed within each group, with a few extreme values in each group. The means and medians, however, follow a trend so that each additional “higher”-level degree is associated with higher earnings. d. The Mean Group Diff statistic is $31.799K, which yields a p-value < 0.001. There is very strong evidence of an association between degree received and individual yearly earnings. e. F = 31.53, p-value < 0.0001. There is very strong evidence of an association between degree received and individual yearly earnings. f. Compute 95% confidence interval(s)
1:Some − 2:Associate: (−17.23, 8.65)
1:Some − 3:Bachelor: (−35.64, −9.75)*
1:Some − 4:Master: (−46.44, −20.55)*
1:Some − 5:Doctorate: (−77.84, −51.95)*
2:Associate − 3:Bachelor: (−31.35, −5.46)*
2:Associate − 4:Master: (−42.15, −16.26)*
2:Associate − 5:Doctorate: (−73.55, −47.66)*
3:Bachelor − 4:Master: (−23.74, 2.15)
3:Bachelor − 5:Doctorate: (−55.14, −29.26)*
4:Master − 5:Doctorate: (−44.35, −18.46)*
g. Higher education is strongly associated with higher income. Average yearly income ranges from $32.5K for individuals with only some higher education (but no degree), up to $97.4K for individuals with a doctorate. A doctorate is associated with an average yearly income more than $30K higher than that for a master’s degree.
9.CE.10 a. [Figure: boxplots of height (inches, axis from 63 to 77) by part (bass, tenor, alto, soprano).]
i. The bass part has the largest median height, 71 inches. ii. The tallest singer is a tenor (76”).
iii. The soprano part has the smallest IQR (approximately 2.5 inches: 65 − 62.5).
iv. The smallest lower quartile is for the sopranos: 62.5 inches.
b. Bass: n = 39, mean = 70.72 in, SD = 2.36 in, tenor: n = 20, mean = 69.15 in, SD = 3.22 in, alto: n = 35, mean = 64.89 in, SD = 2.79 in, soprano: n = 36, mean = 64.25 in, SD = 1.87 in
i. The bass part has the most singers (39).
ii. The bass part also has the largest mean height (70.72 inches).
iii. The largest SD is for tenors (SD = 3.22 inches).
c. Null: There is no association between height and singing part. Alternative: There is an association between height and singing part.
Source       DF    SS         MS        F        p-value
Treatment      3   1058.53    352.84    55.80    < 0.0001
Error        126    796.74      6.32
Total        129   1855.27
F = 55.80, p-value < 0.0001. We have very strong evidence of an association between height and singing part. d. Validity conditions are met: Sample sizes are above 20 without strong skewness. Furthermore, the SDs are within a factor of 2 of each other (1.87 × 2 = 3.74 > 3.22) e. Compute 95% confidence interval(s)
soprano − alto: (−1.82, 0.55)
soprano − tenor: (−6.29, −3.51)*
soprano − bass: (−7.62, −5.32)*
alto − tenor: (−5.66, −2.87)*
alto − bass: (−6.99, −4.67)*
tenor − bass: (−2.94, −0.20)*
All groups have significantly different average heights than every other group except for sopranos and altos. Bass is the tallest on average, followed by tenors and then sopranos/altos.
9.CE.11 a. There is a right-skewed distribution of the extent of head injury on the dummy’s head within each of the three groups, with at least one outlier in the “other” group. The average head injury score in the other group is 1,138, compared to only 865.5 and 811.5 in the four- and two-door groups, respectively. See boxplot below.
[Figure: boxplots of extent of head injury (axis from 0 to 4000) by doors (other, four, two).]
b. No, there is an outlier and severe skew, especially in the “other” group. c. The Mean Group Diff statistic is 217.82. The p-value is < 0.0001. There is very strong evidence of an association between number of doors and extent of head injury.
9.CE.12 a. The distribution of the head injury measurement is fairly symmetric within each of the three groups, with the highest mean (3.01) for the other group. The technical conditions now look to be met because the data are not strongly skewed within each group and the SDs are close (they are all approximately 0.20).
[Figure: boxplots of log head injury (axis from 2.4 to 3.6) by doors (other, four, two).]
b. F = 11.96, p-value < 0.0001. Yes, there is strong evidence that head injury measurements differ based on the number of doors of the vehicle.
c. The vehicles in the “other” group have significantly higher average head injury scores than the other two groups [95% CIs: two vs. other (−0.21, −0.09), four vs. other (−0.18, −0.06)]. There is not a significant difference in the average head injury scores between the two- and four-door groups. (See output for Solution 9.CE.12c.)
95% CI(s) for difference in means
two − four: (−0.0788, 0.0186)
two − other: (−0.2097, −0.0871)*
four − other: (−0.1764, −0.0603)*
9.CE.13 a. The close friends group looks to have a higher number correct based on the fact that the dotplot appears to have a larger mean than the other two groups.
[Figure: dotplots of Number_correct (axis from 3.0 to 8.0) by relationship (close friends, acquaintance, strangers).]
b. Informally, the null hypothesis is that the capacity to detect lies does not depend on the relationship between the researcher and the person who hears the lies. More formally, let μF be the mean number of correct answers for friends, μA be the mean number of correct answers for acquaintances, and μS be the mean number of correct answers for strangers. Then the null hypothesis is H0: μF = μA = μS. The alternative is H1: μi ≠ μj for at least one pair (i, j). The ANOVA gives an F of 6.48, with a highly significant p-value of 0.0024. The evidence against the null hypothesis is extremely strong. It should be rejected.
c. Here are follow-up confidence intervals for pairwise differences:
Comparison                 Lower    Upper
Acquaintance − friend      −2.11    −0.50
Stranger − friend          −2.08    −0.43
Stranger − acquaintance    −0.74     0.85
Taking the results at face value, we find that there is not convincing evidence that the capacity to recognize lies differs between acquaintances and strangers but that friends are detectably better than either acquaintances or strangers.
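The reading of the intervals above rests on one rule: a pairwise difference is flagged as detectable when its confidence interval excludes zero. A small Python sketch applying that rule to the three intervals from part (c) follows; the helper name `excludes_zero` is hypothetical and is not part of the applet.

```python
def excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# Endpoints copied from the follow-up intervals in part (c).
intervals = {
    "Acquaintance - friend": (-2.11, -0.50),
    "Stranger - friend": (-2.08, -0.43),
    "Stranger - acquaintance": (-0.74, 0.85),
}

significant = {name: excludes_zero(lo, hi) for name, (lo, hi) in intervals.items()}

for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'}")
```

Both comparisons involving friends exclude zero, while the stranger versus acquaintance interval does not, which matches the conclusion stated above.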
9.CE.14 a. The observational unit is a location. The response is the noise level in decibels. The explanatory variable is the area of New York City. b. B. c. The scope of inference is extremely limited, because it is impossible to separate the effect on noise level from three different influences: location within New York, type of location (store, restaurant, etc.), and time of day.
9.CE.15 a.
i. True. The observational unit is the same (the snake) for both plots.
ii. False. The response variable is not the same (age in one, length in the other) for both plots.
iii. True. The explanatory variable is the same (a combination of sex and site) for both plots.
b. The plot for length shows the stronger evidence of differences between groups. c. The average difference for sex is larger. d. The mean absolute pairwise difference in age to the nearest 5 years is 0. (The largest difference is 2.12 yrs, the smallest is 0.47 yrs, so the average is about 1.3 yrs.) e. The p-value is 0.230. f. 37.5 is the average for Site 1, Female; 40.1 is for Site 2, Female; 43.7 is for Site 1, Male; 49.5 is for Site 2, Male. g. On average, Site 2 has the longer snakes. On average, males are longer. h. The mean absolute pairwise difference in length to the nearest half foot (6 inches) is closest to 6. The largest difference is 12, the smallest is 2.6, with an average of 7.3. i. The p-value of 0.0005 is approximately correct using a theory-based F-test. j. Effect of scale change
i. B.
ii. B.
iii. A.
iv. C.
v. C.
vi. A.
vii. A.
9.CE.16 a.
Music type    Mean (£)    SD (£)
None          21.69       3.38
Pop           21.91       2.73
Classical     24.13       2.30
b. The response is the amount spent. The explanatory variable is the type of music. The experimental units are the customers. c. See dotplots.
[Figure: dotplots of amount spent (£, axis from 14.3 to 28.9) by music type (none, pop, classical).]
d. The null hypothesis is that there is no association between the type of music and the mean amount spent by customers. The alternative hypothesis is that the mean amount spent is associated with the type of music. More formally, define parameters: μN = mean amount spent by a customer when there is no music playing; μP = mean amount spent by a customer when there is pop music playing; μC = mean amount spent by a customer when there is classical music playing. Then the null hypothesis is H0: μN = μP = μC and the alternative is that at least one of the μ’s is different from the rest. A simulation-based test using the observed value of the Mean Group Diff (1.623) has an estimated p-value of 0.0000 using 10,000 repetitions. This is highly significant, and so we reject the null hypothesis and conclude that there is in fact an association between type of music and size of the check. Because the type of music was randomly assigned, we can conclude that the type of music is the cause of the difference. However, because customers were not chosen using a random sample, we cannot automatically generalize to a larger population. e. The distributions are roughly symmetric, with no outliers. The group SDs are about the same, and the sample sizes are substantial. In short, a theory-based test is justified. f. The theory-based test uses the observed value of the F-statistic, equal to 27.82. The p-value is 0.0001, which agrees with the simulation-based test. The conclusion is the same as the one stated in part (d). g. Here are confidence intervals from the Theory-based Inference applet:
Classical − pop: (1.52, 2.91) £
Classical − none: (1.73, 3.14) £
Pop − none: (−0.46, 0.90) £
The intervals tell us that there is no detectable difference in average amount paid whether there is pop music or no music but that when classical music is playing, the average check amount is distinctly higher. h. The results in parts (d) and (f) are essentially the same.
i. See answer to part (d).
9.CE.17 a. The observational unit is a state. Region is the explanatory variable. b. The means for attorneys and unemployment are between 8 and 9; the means for poverty rate and percent not covered by health insurance are between 13 and 14. c. (8.5, 9.3) is for unemployment; (11.9, 8.9) is for attorneys; (11.2, 13.8) is for poverty; (11.0, 16.8) is for percentage not covered by health insurance. d. The largest mean absolute pairwise deviation is for percentage not covered; the smallest is for unemployment. e. Random assignment is not possible, so inference about cause cannot be based on a test of significance alone. Random sampling is irrelevant: we have the entire population, so generalizing beyond the sample makes no sense. What we can conclude is this: The pattern is too strong to occur by chance alone. f. The evidence of a difference is strong for poverty, attorneys, and not covered. There is not convincing evidence of differences in unemployment. g. The p-value of 0.00002 is for not covered; 0.0004 is for poverty; 0.03 is for attorneys; and 0.33 is for unemployment.
9.CE.18 a. There is strong evidence of an association only for year. There is some evidence for waste sites. b. The p-values are ordered year > waste > payday > McDs. (The strength of evidence of a difference goes in the opposite order.)
9.CE.19 a. The two smallest Mean Group Diffs are for HS Grad and Ave Temp. The largest Mean Group Diff is for Precip. Cremation is in between. b. The two smallest variabilities belong to HS Grad and Ave Temp. The two largest belong to precipitation rate and cremation rate. c. Refer to the answer for Exercise 9.CE.17, part (e). d. D.C. has nearly 300 attorneys per 100,000 people. Not even one of the 50 states has as many as 25. Where a scramble puts D.C. has a huge effect on Mean Group Diff. e. A. f. A. g. It is hard to tell the effect on the p-value. The p-value depends not just on the value of the Mean Group Diff but also on the null distribution. Without knowing the null distribution, there is no way to judge whether the observed value is extreme or not. h. There are two things that are responsible for the three modes. First, the value of 276.7 for D.C. is so large that by comparison all the other values are essentially the same. The value of the Mean Group Diff for a shuffle pretty much depends only on which region D.C. ends up in. Second, although there are four regions, there are only three sample sizes: 9 for New England, 17 for the South, and 12 each for the Midwest and West. If a shuffle puts D.C. in New England, that adds about 270/9 = 30 to the mean for that region, and the resulting Mean Group Diff is about 15, which corresponds to the right-most peak of the null distribution. If instead a shuffle puts D.C. in the South, that adds only about 270/17, or 16, to the mean for the South, and the resulting Mean Group Diff is about 9, which corresponds to the left-most peak of the null distribution. If a shuffle puts D.C. in either the Midwest or West, that adds about 270/12, or about 22.5, to the mean for that region, and the resulting Mean Group Diff is about 10 or 11, which corresponds to the middle peak of the null distribution.
Chapter 9 Investigation
1. Do levels of aggression differ for those who are better off or worse off?
2. Randomized experiment because the three different treatment groups were randomly assigned by the experimenter.
3. The experimental units are the 72 French female college students.
4. Variables: treatment group (downward, upward, control) and aggression score
5. Explanatory variable is treatment group and response variable is aggression score. Treatment group is categorical and aggression score is quantitative.
6. H0: μupward = μdownward = μcontrol. Ha: At least one μ is different.
7. Mean Group Diff = (|0.78 − (−0.52)| + |0.78 − (−0.27)| + |−0.52 − (−0.27)|)/3 = 0.87
8. a. [Figure: null distribution of 1,000 shuffled Mean Group Diffs (mean = 0.413, SD = 0.212). Count of samples greater than 0.87: 35/1000 (0.0350).]
b. The observed statistic is out in the tail of this null distribution. c. The p-value = 0.035. Thus the experiment provides strong evidence against the null hypothesis and we are able to conclude that there is a significant difference in at least one of the average aggression scores for the three experimental groups. This is our conclusion because the probability of getting a statistic as extreme as or more extreme than the one observed, assuming all the mean aggression scores were the same, is so small.
9. Validity conditions are met to complete a theory-based ANOVA test. There are more than 20 experimental units in each of the three treatment groups. The theory-based p-value = 0.0258.
10. Because the overall test is significant and the validity conditions are met (there are more than 20 experimental units in each of the three treatment groups), we can perform follow-up analyses using theory-based techniques.
11. Upward vs. downward (−2.051, −0.061), upward vs. control (−0.75, 1.24), downward vs. control (0.31, 2.3). There is a significant difference in the average aggression scores between the upward and downward treatment groups and between the downward and control groups.
12. Because this is a randomized experiment, confounding variables have been equalized or neutralized between the treatment groups so we can conclude the different treatments are the cause of the different average aggression scores.
13. Perhaps to French female college students, because that was the population from which the experimental units were taken; however, it is not a random sample, so we should be cautious when generalizing.
14. Researchers wondered whether there were different levels of aggression depending on whether the aggressor felt superior or inferior to the person they were aggressive against. A randomized experiment was designed to assign 72 French female college students to one of three treatment groups, an upward, downward, or neutral group. Several variables were measured, including an aggression score. The average aggression score for the upward group was −0.27, for the downward group was 0.78, and for the neutral group was −0.52. An ANOVA test was performed to see whether at least one of the population average aggression scores was significantly different from the others. The test resulted in a p-value of 0.0258, which gives strong evidence against the null and in favor of the alternative that indeed at least one of the population average aggression scores is different from the rest. Follow-up analysis of pairwise confidence intervals for the difference in population averages showed that the downward group scored significantly higher than the upward group on their average aggression score. A 95% confidence interval for the difference in population averages was (−2.051, −0.061). So we are 95% confident that the downward group scored on average 0.061 to 2.051 higher than the upward group. 
Follow-up analysis also showed the downward group scored significantly higher than the control group on their average aggression score. A 95% confidence interval for the difference in population averages was (0.31, 2.3). So we are 95% confident that the downward group scored on average 0.31–2.3 higher than the control group. No significant difference was found between the upward and control groups. Because random assignment was used, we can say that the cause of the different aggression scores was the treatment group that was assigned. We did not have a random sample, so generalizations are limited. Possibly we could generalize results to French female college students. Further studies could include males and other nationalities.
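The Mean Group Diff in item 7 and the shuffle-based p-value in item 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the applet's implementation: the function names `mean_group_diff` and `shuffle_p_value` are hypothetical, and the `toy` scores are made up because the study's raw data are not reproduced in these solutions. Only the three group means (0.78, −0.52, −0.27) come from the investigation.

```python
import random
from itertools import combinations
from statistics import mean


def mean_group_diff(group_means):
    """Average absolute difference over all pairs of group means."""
    pairs = list(combinations(group_means, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Group means reported in the investigation: downward, control, upward.
observed = mean_group_diff([0.78, -0.52, -0.27])
print(round(observed, 2))


def shuffle_p_value(groups, reps=1000, seed=0):
    """Estimate a p-value by reshuffling responses across the groups."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    stat = mean_group_diff([mean(g) for g in groups])
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # break any link between group and response
        start, shuffled_means = 0, []
        for n in sizes:
            shuffled_means.append(mean(pooled[start:start + n]))
            start += n
        if mean_group_diff(shuffled_means) >= stat:
            hits += 1
    return hits / reps


# Made-up scores for illustration only (assumption, not the study data).
toy = [[1.2, 0.4, 0.9, 0.6], [-0.8, -0.3, -0.6, -0.4], [-0.1, -0.5, 0.0, -0.3]]
print(shuffle_p_value(toy))
```

With the reported means the first call reproduces the 0.87 from item 7. In the investigation itself, 35 of the 1,000 shuffled statistics were at least 0.87, which is where the p-value of 0.035 comes from.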
Chapter 9 Research Article 1. The researchers wished to evaluate the extent to which awe was associated with perceived time availability, impatience, volunteerism, and other factors. 2. Many people experience “time famine” (Perlow, 1999) and this feeling is related to undesirable side effects (Lehto, 1998).
3. 63 students 4. 86 students 5. 105 members of a national panel 6. Average awe = 6.06 for awe condition participants and 3.84 for happiness condition participants 7. Average happiness = 4.84 for awe condition participants and 5.61 for happiness condition participants 8. Null: The average perceived time availability for the awe condition participants is the same as for happiness condition participants. Alternative: The average perceived time availability is different for awe condition participants as for happiness condition participants. The p-value (0.01) means that there is strong evidence that the average perceived time availability is different for awe condition participants as compared to happiness condition participants. 9. The sample sizes are approximately 31 in each group (63 total, randomly assigned) and so over the 20 p/group limit. Furthermore, the SDs are similar (1.49 and 1.20, within a factor of 2). 10. Null: The average impatience for awe condition participants is the same as for happiness condition participants. Alternative: The average impatience is different for awe condition participants as for happiness condition participants. The p-value (0.03) means that there is strong evidence that the average impatience is different for awe condition participants as compared to happiness condition participants. 11. The sample sizes are approximately 43 in each group (86 total, randomly assigned) and so over the 20 p/group limit. Furthermore, the SDs are similar (1.57 and 1.48, within a factor of 2). 12. Null: The average perceived time availability for awe condition participants is the same as for neutral condition participants. Alternative: The average perceived time availability is different for awe condition participants than for neutral condition participants. The p-value (< 0.01) means that there is strong evidence that the average time availability is different for awe condition participants as compared to neutral condition participants. 13. 
The sample sizes are approximately 52 in each group (105 total, randomly assigned) and so over the 20 p/group limit. Furthermore, the SDs are similar (1.22 and 1.38, within a factor of 2). 14. The researchers could use independent sample t-tests instead. 15. Yes, because there is random assignment in the study, suggesting a potential cause–effect conclusion. 16. Consider obtaining subjects from different sources to make sure that results hold in different parts of the population. Consider running a more intensive/longer study time that consistently initiated “awe” events in people’s lives to see whether the impact is sustainable over time or if the impact lessens due to normalizing the behavior.
CHAPTER 10
Two Quantitative Variables
Section 10.1
10.1.1 D. There may be some nonlinear relationship.
10.1.2 A is true, provided the change of units is linear, as in changing inches to feet or minutes to seconds. Nonlinear changes in units such as pH to hydrogen ion concentration will change the correlation coefficient.
10.1.3 A. 10.1.4 A. 10.1.5 B. 10.1.6 D. 10.1.7 A. 10.1.8 D.
10.1.9 a. 1 b. 4 c. 2 d. 3
10.1.10 a. 3 b. 1 c. 4 d. 2
10.1.11 The association will be negative: Larger distances go with lower exam scores and vice-versa.
10.1.12 The association will be positive: Higher temperatures go with larger amounts of ice cream sold and vice-versa.
10.1.13 The correlation coefficient will be exactly 1, because there is a perfect linear relationship with positive slope: Exam2 = Exam1 – 5.
10.1.14 a. S: You can sometimes tell whether there is a relationship between x and y: If r = 1 or r = –1, you know there is a perfect linear relationship between x and y. If r is near those extreme values, you know there is a nearly linear relationship. But if r is not near –1 or 1, there may be no relationship, but there might be a nonlinear systematic relationship. b. S: You can sometimes tell whether the relationship between x and y is linear or curved, as explained in part (a). c. Outliers S; Influential points N. You can sometimes tell whether there are outliers: If r is close to –1 or 1, there cannot be outliers unless the sample size is very large compared to the number of outliers. You can never tell whether there are influential points.
10.1.15 In a scatterplot, the points are observational units, the x-axis represents the values of the explanatory variable, and the y-axis represents the values of the response variable.
10.1.16 The correlation coefficient measures the strength of the linear relationship between two quantitative variables x and y. For the curved relationship in the plot, no line comes close to all the points. There is a very strong relationship, but not a strong linear relationship. The near-zero value of r reflects this absence of a linear relationship.
10.1.17 a. The two correlations must be the same. If one is –0.56 the other one must be –0.56 as well. b. Whether they have a dog is a categorical variable. Correlation is a measurement between two quantitative variables. c. Correlation has no units so it can’t be measured in kg/inches. d. Each of these denotes just as strong a linear relationship.
10.1.18 a. Positive b. Moderate c. There appear to be two clouds of data.
10.1.19 a. Positive b. Weak c. The data appear to be in lines corresponding to ages in whole years.
10.1.20 a. Moderate positive linear association between glycemic load and mean addictive rating b. Observational units are the 35 foods. c. Glycemic load = 22; it is not the highest glycemic load. d. 3.22
10.1.21 a. Observational study b. The 20 houses
c. The explanatory variable is square footage and the response is the price. Both of these are quantitative. d. There is a fairly strong positive linear association between square footage and price with no unusual observations. See the scatterplot below.
[Scatterplot: Price ($) versus Sq ft]
e. The correlation coefficient is 0.780 and this backs up the description given in part (d) as it is a number fairly close to 1.
10.1.22 a. Observational b. The 129 roller coasters c. The explanatory variable is the height and the response is speed. Both of these are quantitative. d. There is a strong positive linear association between height and speed with three unusual observations that are both unusually high and unusually fast. See the scatterplot.
[Scatterplot: Speed versus Height]
e. The correlation coefficient is 0.895 and this backs up the description given in part (d) as it is a number close to 1.
10.1.23 a. Observational b. The 34 people in the study c. The explanatory variable is finger length and the response is height. Both of these are quantitative. d. There is a moderate positive linear association between finger length and height with no unusual observations. See the scatterplot below.
[Scatterplot: Height (in) versus Finger (cm)]
e. The correlation coefficient is 0.474 and this backs up the description given in part (d) as it is just a little bit below 0.5.
10.1.24 a. The point (5,70) is circled in the scatterplot. It is a bit unusual as it has the shortest finger length, but it is around the average height.
[Scatterplot: Height (in) versus Finger (cm), with the point (5,70) circled]
b. r = 0.474 c. r = 0.566 d. The correlation coefficient increased. This makes sense because the point removed did not nicely fall into the positive linear pattern of the data, so removing it would make the data better fit that overall pattern.
10.1.25 a. The point (10.16,78) is circled in the scatterplot. It is a bit unusual as it represents the longest finger and the greatest height.
[Scatterplot: Height (in) versus Finger (cm), with the point (10.16,78) circled]
b. r = 0.474 c. r = 0.353 d. The correlation coefficient decreased. This makes sense because the point removed very nicely fell into the positive linear pattern of the data (and was extreme on the x-axis), so removing it would make our measure of the overall association weaker.
10.1.26 a. Observational b. The 26 students c. The explanatory variable is mother’s height and response is child’s height. Both of these are quantitative. d. There is a weak positive linear association between mother’s and child’s heights with no unusual observations. See the scatterplot.
[Scatterplot: Height versus Mom height]
e. The correlation coefficient is 0.200 and this backs up the description given in part (d) because it is fairly close to 0.
10.1.27 a. Observational b. The 25 students c. The explanatory variable is father’s height and response is child’s height. Both of these are quantitative. d. There is almost no association between father’s and child’s heights with no unusual observations. See the scatterplot below.
[Scatterplot: Height versus Dad height]
e. The correlation coefficient is 0.061 and this backs up the description given in part (d) as it is close to 0.
10.1.28 a. r = 0.061 b. r = 0.200 c. There is a stronger association between mother’s and child’s heights as the correlation is farther away from 0.
10.1.29 a. Observational b. The 16 students c. The explanatory variable is name length and response is Scrabble score. Both of these are quantitative. d. There is a moderate positive association between name length and Scrabble score. There are a couple of names that have fairly few letters but give high Scrabble scores. See scatterplot.
[Scatterplot: Scrabble score versus Letters]
e. The correlation coefficient is 0.476 and this backs up the description given in part (d) as it is around 0.5.
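The reasoning in 10.1.24 and 10.1.25 (removing a point that does not fit the pattern raises r, while removing an extreme point that fits the pattern lowers r) can be checked numerically. The sketch below uses small made-up datasets, not the finger-length data from the exercises, and the helper names `pearson_r`, `r_all`, and `r_without` are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_all(points):
    """Correlation using every (x, y) point."""
    return pearson_r([p[0] for p in points], [p[1] for p in points])


def r_without(points, index):
    """Correlation after dropping the point at position `index`."""
    kept = [p for i, p in enumerate(points) if i != index]
    return r_all(kept)


# Hypothetical data: four points on a line plus one point off the pattern.
off_pattern = [(1, 1), (2, 2), (3, 3), (4, 4), (0, 3)]
# Hypothetical data: a cloud with no linear trend plus one extreme point
# that lies far out in the upper right.
extreme_fit = [(1, 2), (2, 1), (3, 2), (2, 3), (10, 10)]

print(r_all(off_pattern), r_without(off_pattern, 4))  # r rises after removal
print(r_all(extreme_fit), r_without(extreme_fit, 4))  # r falls after removal
```

The same `pearson_r` helper could be applied to the actual exercise data if it were available.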
10.1.30 a. There is a fairly strong positive linear (perhaps a bit concave downward) association between the two variables. See scatterplot.
[Scatterplot: Scrabble score versus Ratio]
b. The correlation coefficient is 0.782. This supports the answer in part (a) as it is fairly close to 1.
10.1.31 a. The linear association is stronger between the Scrabble points and the ratio as the correlation is closer to 1. b. This makes sense because the Scrabble ratio will take into account the score of each letter, whereas the name length doesn’t. These individual letter scores can be quite high or low so they have a great impact on the total word score.
10.1.32 a. Observational b. The 22 countries c. There is a fairly strong positive linear (perhaps a bit concave downward) association between the two variables with no unusual observations. See the scatterplot.
[Scatterplot: Life expectancy versus TVs per thousand]
d. The correlation coefficient is 0.743 and this backs up the description given in part (c) as it is fairly close to 1. e. No. Because this is an observational study there could be many potential confounding variables here so we can’t determine causation from this association.
10.1.33 a. The direction is positive, the form is linear, and the association is fairly strong. b. The correlation for just the homes on the lake is larger than 0.678. The same is true for homes not on the lake.
10.1.34 a. The direction is positive, the form is linear, and the association is strong. b. The correlation will not change much when you look at just females or just males. It will be a bit smaller in both cases.
10.1.35 a. The direction is positive, the form is linear, and the association is fairly strong. b. Yes, the correlations will change quite a bit. The correlation for just the Karschi crickets is larger than 0.678. The same is true for Fultoni crickets.
Section 10.2
10.2.1 D. 10.2.2 C. 10.2.3 A. 10.2.4 C.
10.2.5 a. B. b. With Lisa’s larger sample we have more information and thus the strength of evidence is stronger.
10.2.6 The correlation coefficient, r, tells whether there is a strong linear association between two quantitative variables; values closer to 1 and –1 denote stronger linear associations. The p-value tells whether there is strong evidence of an association between two quantitative variables; values closer to 0 constitute stronger evidence of an association.
10.2.7 a. Invalid: The p-value is computed assuming there is no association. It does not tell the chance of no association. b. Invalid, for the same reason as in (a) above. c. Invalid, but close: See the answer to part (d). d. Valid: If there were no association between height and hand span, the p-value (0.022) is the probability of observing the association observed in the sample data or an even stronger association in a sample of 10 students. e. Invalid, for the same reason as in part (a).
10.2.8 Null: There is no association between height and length of index finger. Alternative: There is an association between height and length of index finger.
10.2.9 Put the 34 lengths of fingers on 34 index cards and the 34 heights on 34 other index cards. Shuffle the height index cards and randomly match them up with the 34 lengths of finger cards. From this dataset, compute the correlation. That simulated correlation will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlations that are at least as extreme as the observed correlation from the sample data. This proportion is the (estimated) p-value. 10.2.10 The p-value is determined by adding the areas of two regions: the portion of the null distribution at 0.474 and above and the portion that is –0.474 and below.
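The index-card procedure in 10.2.9 and the two-sided tail count in 10.2.10 can be sketched in a few lines of code. This is only an illustration under stated assumptions: the finger and height values below are made up (the study data are not reproduced here), and the function names `pearson_r` and `permutation_p_value` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, reps=1000, seed=0):
    """Shuffle ys against xs `reps` times; return (observed r, estimated p-value).

    The p-value is two-sided: it counts shuffles whose correlation is at
    least as far from 0 as the observed correlation, in either direction.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)  # the "response" cards that get reshuffled
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # randomly re-pair responses with xs
        if abs(pearson_r(xs, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Hypothetical finger lengths (cm) and heights (in), for illustration only.
fingers = [6.0, 6.5, 7.0, 7.2, 7.5, 8.0, 8.3, 9.0]
heights = [62, 66, 64, 70, 67, 69, 72, 71]
r, p = permutation_p_value(fingers, heights)
print(f"observed r = {r:.3f}, estimated p-value = {p:.3f}")
```

Each pass through the loop corresponds to one dot in the null distribution; more repetitions give a more stable estimate of the p-value.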
10.2.11 a. Observational b. The 90 Honda Civics c. The explanatory variable is age and the response is price. They are both quantitative. d. There is no association between age and price of used Honda Civics. e. There is an association between age and price of used Honda Civics. f. There is a fairly strong negative association between age and price of the cars. Not much unusual except perhaps the oldest and cheapest car. See scatterplot.
[Scatterplot: Price ($) versus Age (yrs)]
g. The correlation of –0.820 backs up the answer to part (f) because it is close to –1. h. Put the 90 ages on 90 index cards and the 90 prices on 90 other index cards. Shuffle the price index cards and randomly match them up with the 90 age cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are –0.820 or below along with the simulated correlation coefficients that are 0.820 or above. This proportion will be the estimated p-value.
10.2.12 a. The p-value is about 0. b. If there is no association between age and price of used Honda Civics, the probability we would get a sample correlation coefficient at least as extreme as –0.820 is about 0. c. We have extremely strong evidence (small p-value) that there is an association between a car’s price and its age in the population of all used Honda Civics advertised online. Because our sample correlation coefficient is negative, we can add that the association is negative in the population. Because this was an observational study, we are not drawing any causal conclusions. The data are a sample of used Honda Civics listed for sale online in July 2006 so we probably can’t generalize to more recent populations.
10.2.13 a. Null: There is no association between area and price of homes. Alternative: There is an association between area and price of homes. b. Put the 20 areas on 20 index cards and the 20 prices on 20 other index cards. Shuffle the price index cards and randomly match them up with the 20 area cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation coefficient from the sample data.
10.2.14 a. The p-value is about 0. b. If there is no association between area and price of homes, the probability we would get a sample correlation coefficient at least as extreme as 0.780 is about 0. c. We have extremely strong evidence (small p-value) that there is an association between area and price of homes in Arroyo Grande in 2006 (but can’t generalize to any other populations). Because our sample correlation coefficient is positive, we can go further and say the association is also positive in the population. We are not drawing any causal conclusions from this observational study.
10.2.15 a. Null: There is no association between maximum speed and maximum height of roller coasters. Alternative: There is an association between maximum speed and maximum height of roller coasters. b. Put the 129 heights on 129 index cards and the 129 speeds on 129 other index cards. Shuffle the speed index cards and randomly match them up with the 129 height cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation coefficient from the sample data.
10.2.16 a. The p-value is about 0. b. If there is no association between maximum height and maximum speed for roller coasters, the probability we would get a sample correlation coefficient at least as extreme as 0.895 is about 0. c. We have extremely strong evidence (small p-value) that there is an association between maximum height and maximum speed for roller coasters in the United States. Because our sample correlation coefficient is positive, we can go further and say the association is also positive in the population.
10.2.17 a. Null: There is no association between heights of mothers and their children. Alternative: There is an association between heights of mothers and their children. b. Put the 26 mothers’ heights on 26 index cards and the 26 children’s heights on 26 other index cards. Shuffle the children’s heights index cards and randomly match them up with the 26 mothers’ heights cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation coefficient from the sample data.
10.2.18 a. The p-value is about 0.33. b. If there is no association between heights of mothers and their children, the probability we would get a sample correlation coefficient at least as extreme as 0.200 is about 0.33. c. We do not have strong evidence that there is a linear association between heights of mothers and their children.
10.2.19 a. Null: There is no association between heights of fathers and their children. Alternative: There is an association between heights of fathers and their children. b. Put the 25 fathers’ heights on 25 index cards and the 25 children’s heights on 25 other index cards. Shuffle the children’s heights index cards and randomly match them up with the 25 fathers’ heights cards. From this dataset, compute the correlation. That simulated correlation will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlations that are at least as extreme as the observed correlation from the sample data.
10.2.20 a. The p-value is about 0.77. b. If there is no association between heights of fathers and their children, the probability we would get a sample correlation coefficient at least as extreme as 0.061 is about 0.77. c. Because our p-value is not small, we do not have strong evidence that there is an association between heights of fathers and their children. This conclusion can be generalized to statistics students like those in the sample.
10.2.21 a. Null: There is no association between length of names and their Scrabble word scores. Alternative: There is an association between length of names and their Scrabble word scores. b. Put the 16 name lengths on 16 index cards and the 16 word scores on 16 other index cards. Shuffle the word score cards and randomly match them up with the 16 name length cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation from the sample data.
10.2.22 a. The p-value is about 0.06. b. If there is no association between name lengths and their scores, the probability we would get a sample correlation coefficient at least as extreme as 0.476 is about 0.06. c. We do not have strong evidence (p-value > 0.05) that there is an association between name length and its Scrabble word score, but we do have moderate evidence (p-value < 0.10) of such an association and it is positive. We may cautiously generalize these results to statistics students like those in the study but we are not drawing a cause-and-effect conclusion from this observational study.
10.2.23 a. Null: There is no association between Scrabble points per letter of names and their Scrabble score. Alternative: There is an association between Scrabble points per letter of names and their Scrabble score. b. Put the 16 points per letter on 16 index cards and the 16 name scores on 16 other index cards. Shuffle the name score cards and randomly match them up with the 16 points per letter cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation coefficient from the sample data.
10.2.24 a. The p-value is about 0.0006.
b. If there is no association between Scrabble points per letter of names and their Scrabble scores, the probability we would get a sample correlation coefficient at least as extreme as 0.782 is about 0.0006. c. We have strong evidence (p-value < 0.001) that there is an association between Scrabble points per letter of names and their Scrabble score and that the relationship is positive. We might cautiously generalize these results to statistics students like those in this study but we are not drawing any cause-and-effect conclusions from this observational study.
10.2.25 a. Null: There is no association between number of televisions per capita and life expectancy for countries around the world. Alternative: There is an association between number of televisions per capita and life expectancy for countries around the world. b. Put the 22 numbers of televisions per thousand people on 22 index cards and the 22 life expectancies on 22 other index cards. Shuffle the life expectancy cards and randomly match them up with the 22 television cards. From this dataset, compute the correlation coefficient. That simulated correlation coefficient will be one dot in the null distribution. Repeat this many, many times to create a null distribution. Find the proportion of simulated correlation coefficients that are at least as extreme as the observed correlation coefficient from the sample data.
10.2.26 a. The p-value is about 0.0001. b. If there is no association between number of televisions per thousand people and life expectancy in countries around the world, the probability we would get a sample correlation at least as extreme as 0.743 is about 0.0001. c. We have very strong evidence (p-value < 0.001) that there is an association between number of televisions per thousand people and life expectancy in countries around the world and that association is positive.
These countries were not a random sample so we may not be able to generalize these results to all countries and we certainly aren’t drawing a cause-and-effect conclusion from this observational study. 10.2.27 a. H0: There is no association between a person’s height and reaction time. Ha: There is an association between a person’s height and reaction time b. r = −0.052 c. Negative association because r < 0 d. p-value = 0.72 e. There is weak evidence of an association between height and reaction time. f. Stand. stat. = (–0.052 – 0)/0.145 = –0.359, which also provides weak evidence against the null hypothesis. Yes, this makes sense. 10.2.28 a. H0: There is no association between a person’s height and their magical height. Ha: There is a positive association between a person’s height and magical height. b. r = 0.521 c. Positive association because r > 0 d. If the correlation were negative, then, on average, short females would want to be magically taller and tall females would want to be magically shorter. e. p-value = 0.00, or rather < 0.001 f. Yes, there is strong evidence of a positive association between height and magical height.
g. Because a person’s height is not randomly assigned, we cannot make any causal conclusions. And, because the sample was not randomly selected, we can only generalize to female college students similar to those in the study.
10.2.29 a. H0: There is no association between the number of piercings people have and their number of skipped classes. Ha: There is an association between the number of piercings people have and their number of skipped classes. b. r = 0.223 c. Positive linear association because r > 0 d. p-value = 0.025 e. Stand. stat. = (0.223 – 0)/0.098 = 2.276 f. Yes, there is strong evidence of an association between number of body piercings and number of classes skipped. The p-value is less than 0.05, and the standardized statistic is greater than 2. g. Because the number of body piercings a person has is not randomly assigned, we cannot make any causal conclusions. And because the sample was not random, we can only generalize to college students similar to those in the study.
10.2.30 a. H0: There is no association between people’s height and arm span. Ha: There is an association between people’s height and arm span. b. r = 0.827 c. There is strong evidence of a linear association between height and arm span. The p-value is less than 0.001, and the standardized statistic is greater than 3. Because a person’s height is not randomly assigned, we cannot make any causal conclusions. And because the sample was random, we can generalize to U.S. high school students.
10.2.31 No. To identify one numerical measure of the long-run association we would have to make a lot more assumptions.
Section 10.3
10.3.1 B. 10.3.2 B. 10.3.3 C.
10.3.4 a. B, D b. A, C, E c. D d. E
10.3.5 a. The correlation coefficient is a unitless measurement that is bounded by –1 and 1. b. The correlation coefficient and slope will always have the same sign.
10.3.6 a. Yes, the mean of the residuals is calculated as the sum of the residuals divided by the number of residuals. If the sum is zero, then zero divided by a positive number is also zero. b. No, suppose that the least squares regression line does not go through any of the points on the scatterplot. Then none of the residuals would be zero, and if the 50th percentile fell on a residual it would necessarily be either negative or positive.
10.3.7 a. Yes b. Yes c. Yes
10.3.8 a. See applet output for Solution 10.3.8a and for an example of a negative slope coefficient.
[Applet output, Solution 10.3.8a. Sample data (Explanatory, Response): x = 1, 2, 3, 1, 2, 3; y = 6, 5, 4, 3, 2, 1; n = 6. Regression line: ŷ = 5.50 + (−1.00) × x]
[Applet output, Solution 10.3.8b. Sample data (Explanatory, Response): x = 1, 2, 3, 1, 2, 3; y = 2, 4, 6, 8, 10, 12; n = 6. Regression line: ŷ = 3.00 + 2.00 × x]
[Applet output, Solution 10.3.8c. Sample data (Explanatory, Response): x = 1, 2, 3, 1, 2, 3; y = 5, 5, 5, 10, 10, 10; n = 6. Regression line: ŷ = 7.50 + 0.00 × x]
[Applet output, Solution 10.3.9a. Sample data (Explanatory, Response): x = 1, 2, 3, 1, 2, 3; y = −10, −10, −10, 5, 5, 5; n = 6. Regression line: ŷ = −2.50 + 0.00 × x]
b. See the applet output in Solution 10.3.8b for an example of a slope coefficient greater than 1. c. See the applet output in Solution 10.3.8c for an example of a slope coefficient of zero.
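The regression lines reported by the applet for Solutions 10.3.8a through 10.3.8c can be reproduced by hand with the least squares formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope × x̄). The sketch below is a minimal check of those three datasets as read from the applet output; the function name `least_squares` is illustrative and not part of the applet.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


# Explanatory values shared by the three applet examples.
x = [1, 2, 3, 1, 2, 3]
# Response values as listed in the applet output for each solution.
examples = {
    "10.3.8a": [6, 5, 4, 3, 2, 1],        # negative slope
    "10.3.8b": [2, 4, 6, 8, 10, 12],      # slope greater than 1
    "10.3.8c": [5, 5, 5, 10, 10, 10],     # slope of zero
}

for label, y in examples.items():
    intercept, slope = least_squares(x, y)
    print(f"{label}: predicted y = {intercept:.2f} + {slope:.2f} * x")
```

Because the fitted line always passes through the point of averages, the intercept follows directly once the slope is known.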
10.3.9 For Exercise 10.3.7: a. Yes b. Yes c. Yes
[Applet output, Solution 10.3.9b. Sample data (Explanatory, Response): x = 1, 2, 3, 1, 2, 3; y = 2, 4, 6, 8, 10, 12; n = 6. Regression line: ŷ = 3.00 + 2.00 × x]
[Applet output, Solution 10.3.9c. Sample data (Explanatory, Response): x = 1, 2, 3, 4, 5, 6; y = 1, 2, 3, 4, 5, 6; n = 6. Regression line: ŷ = 0.00 + 1.00 × x]
For Exercise 10.3.8:
a. Negative y-intercept b. y-intercept greater than 1 c. Zero y-intercept
10.3.10 a. There are physical limitations, and the same linear trend will not continue forever. They have extrapolated beyond the observed range of the data. b. 2636
10.3.11 a. Explanatory: number of pieces, response: price b. The scatterplot shows a strong positive linear association with no unusual observations.
[Scatterplot: Price ($) versus Pieces]
c. r = 0.974 d. r² = 0.949. This means that 94.9% of the variation in price is explained by the linear association with number of pieces.
10.3.12 a. predicted price = 4.86 + 0.105 × pieces b. For each additional Lego piece, the predicted price increases by 0.105 dollars. c. If there are no pieces in the Lego set, the predicted price is $4.86. Taken literally and in isolation, this interpretation is not meaningful, because no one would order zero pieces. (The large intercept suggests a substantial “fixed cost” to place an order regardless of the number of pieces.)
10.3.13 a. Correct but incomplete b. Incorrect c. Correct d. Correct but incomplete e. Correct f. Correct
10.3.14 a. Incorrect b. Correct c. Incorrect d. Incorrect
10.3.15 a. $57.36 b. $162.36 c. $105 [1,500 is 1,000 more pieces than 500 and 1,000 times the slope (0.105) gives $105] d. $529.86 e. The data range from 34 pieces to 3,803 pieces. However, 5,000 pieces is beyond the range of the data and it is not clear that the linear relationship will hold beyond the range of the data, so there is not a lot of confidence in the predicted price for a 5,000-piece set.
10.3.16 a. $48.435 b. 49.99 – 48.44 = 1.55 c. The observed price is $1.55 above the predicted price for the set. d. Above, the residual is positive
10.3.17 a. It depends on which Lego set is offered for $0. If it is a Lego set that already doesn’t cost very much, then no, the least squares line would not be impacted very much. If it is a Lego set that originally was very expensive, then yes, it would affect the least squares regression line. b. Either of the sets that are $399.99. Choose the one with 3,803 pieces as this is the farthest away from the range of values for pieces.
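The predictions in 10.3.15 and the residual in 10.3.16 all come from plugging values into the fitted line from 10.3.12. The sketch below reproduces that arithmetic. The function names are illustrative, and the piece count of 415 used for the residual is an assumption inferred from the stated prediction of $48.435; it is not given in the solutions.

```python
# Fitted line from 10.3.12: predicted price = 4.86 + 0.105 * pieces
INTERCEPT = 4.86
SLOPE = 0.105
# Range of piece counts in the data, as stated in 10.3.15e.
MIN_PIECES, MAX_PIECES = 34, 3803


def predicted_price(pieces):
    """Predicted price in dollars for a set with the given number of pieces."""
    return INTERCEPT + SLOPE * pieces


def residual(observed_price, pieces):
    """Residual = observed price minus predicted price."""
    return observed_price - predicted_price(pieces)


def is_extrapolation(pieces):
    """True when the piece count lies outside the range of the data."""
    return pieces < MIN_PIECES or pieces > MAX_PIECES


for pieces in (500, 1500, 5000):
    note = " (extrapolation)" if is_extrapolation(pieces) else ""
    print(f"{pieces} pieces: ${predicted_price(pieces):.2f}{note}")

# Assumed piece count of 415, inferred from the $48.435 prediction in 10.3.16a.
print(f"residual: {residual(49.99, 415):.3f}")
```

A positive residual means the observed price lies above the line, which matches the answer to 10.3.16d.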
c. predicted price = 15.03 + 0.0756(pieces), r² = 0.636; yes, they have changed quite a bit.
10.3.18 a. Right: chirps per minute is on the x-axis, which is the explanatory variable that explains or predicts changes in the response variable. The response variable is on the y-axis, in this case it is temperature. b. A: Equation C has a negative slope and the slope is clearly positive; Equation B has a slope of 3.8, so for each increase of 10 chirps per minute, the temperature is predicted to increase 38 degrees, but from the graph it looks more like an increase of 3 degrees. c. 60.78 degrees d. For each additional chirp per minute, the predicted temperature increases 0.25 degrees.
10.3.19 a. Moderately strong negative linear association b. predicted takeoff velocity = 394.47 − 0.0122 (body mass) c. For each additional gram of body mass the cat’s predicted takeoff velocity decreases 0.0122 cm/sec. d. For a cat whose body mass is zero the takeoff velocity is 394.47 cm/sec; this doesn’t make sense in the context of the study because a cat wouldn’t have a body mass of zero. e. 24.6% of the variability seen in takeoff velocity is explained by the linear association with the body mass of the cat.
10.3.20 a. 333.47 cm/sec b. 272.47 cm/sec. A 10,000-gram cat is beyond the range of the data and the linear relationship may not hold outside of this range. c. A 7930-gram cat has observed takeoff velocity of 286.3 and predicted takeoff velocity of 297.72. The residual is −11.4. The observed takeoff velocity is 11.4 cm/sec lower than the predicted takeoff velocity.
10.3.21 a. The cat with body mass 2,660 grams b. The cat with body mass 7,930 grams c. The cat with body mass 5,600 grams d. The cat with body mass 3,550 grams
10.3.22 a. Percent body fat has the strongest association with takeoff velocity. It is a strong negative linear relationship. b. predicted takeoff velocity = 397.65 − 1.95(body fat) c. For each 1 percentage point increase in body fat the predicted takeoff velocity decreases 1.95 cm/sec. d. r² = 0.424; 42.4% of the variability in takeoff velocities can be explained by the linear relationship with percent body fat.
10.3.23 a. The observational units are the cars. b. The scatterplot (below) reveals a fairly strong negative, linear association between price and age.
[Scatterplot: Price ($) versus Age (yrs)]
c. The least squares line is: predicted price = 18,785.31 – 1,397.47 × age, and is graphed below.
[Scatterplot with least squares line: Price ($) versus Age]
d. The slope coefficient is –1,397, indicating that the predicted price of a used Honda Civic decreases by $1,397 for each additional year of age. e. The value of r² is 0.673, so 67.3% of the variability in cars’ prices is explained by the least squares line with cars’ age.
10.3.24 a. The explanatory variable is the number of pages in the textbook. The response variable is the price of the textbook. b. The equation of the least squares line is: predicted price = −3.42 + 0.1473(pages). c. The predicted price of a 500-page textbook is: −3.42 + 0.1473 × 500 ≈ 70.23 dollars. The predicted price of a 1,500-page textbook is: −3.42 + 0.1473 × 1,500 ≈ 217.53 dollars. The first prediction is more believable, because 500 pages is within the range of the sample data, but no textbooks in the sample had close to 1,500 pages. d. The slope coefficient of 0.1473 means that for each additional page in a textbook the predicted price increases by about 0.1473 dollars, which is almost 15 cents. e. The proportion of variability in textbook prices that is explained by knowing the number of pages is the square of the correlation coefficient, which is r² = 0.677.
10.3.25 a. 31.48 b. For each additional mile hiked, the predicted time of the hike increases 31.48 min. c. 124.65 min d. A 4-mile hike, because 12 miles is beyond the range of the data gathered and the linear relationship may not hold. e. The variability in hike times is explained by the linear relationship with the distance of the hike.
10.3.26 a. 0.06182 b. For each additional foot in elevation, the predicted hike time increases 0.06182 min. c. 130.95 min d. An 800-foot elevation gain because a 2,800-foot gain is beyond the range of the data gathered and the linear relationship may not hold. e. The variability in hike time is explained by the linear relationship with elevation.
10.3.27 a. Slope = 0.916(63.79/1.856) = 31.48, y-intercept = 102.08 – (31.48)(3.283) = –1.27 b. Slope = 0.344(63.79/355.5) = 0.0617, y-intercept = 102.08 – (0.0617)(333.2) = 81.52 c. 102.08 d. 102.08 e. They are both equal to the mean of the response (time of hike).
10.3.28 a. The answer to both questions is yes. The least squares line always goes through the “point of averages.” If the fitted slope is positive, then values of x above the mean correspond to fitted values for y above the mean, and vice-versa.
c10Solutions.indd 138
b. If someone scores 79.8 on Test 2, the predicted score for Test 3 is 84.8. If they actually get 90 on Test 3, the residual is 90 − 84.8 = 5.2. c. If someone scores 10 points above the mean on Test 2, the expected score on Test 3 is (0.3708)(10) = 3.708 points above the mean. d. If someone scores 10 points below the mean on Test 2, the expected score on Test 3 is also 3.708 points below the mean. 10.3.29 a. 75.594, or 76 b. 90.426, or 90 c. The student who scored 55 on Test 2 is predicted to achieve a higher score on Test 3; the student who scored 95 on Test 2 is predicted to achieve a lower score on Test 3. 10.3.30
a. predicted mean rating = 2.19 + 0.0349(glycemic load) b. For every one unit increase in glycemic load, the predicted mean addictive rating will increase by 0.0349. c. Positive residuals have actual mean addictive ratings that are greater than their predicted addictive ratings. d. The glycemic index for cookies is 7, and the mean addictive rating is 3.71. e. The glycemic index for brown rice is 20, and the mean addictive rating is 1.74. 10.3.31 a. As the square footage increases by one, the price is predicted to increase by $212.74.
b. R² = 0.678² = 0.460, meaning 46.0% of the variation in selling price can be explained by the linear association with square footage. c.
i. The slope of just the homes on the lake has increased, and the slope of just the homes not on the lake has decreased compared to when they were aggregated.
ii. The value of R² has increased when looking at just the homes on the lake. The same is true for the homes not on the lake.
10.3.32 a. As the height increases by one inch, the magical height is predicted to increase by 0.9471 inch. b. R² = 0.842² = 0.709, meaning 70.9% of the variation in magical height can be explained by the linear association with height. c.
i. B. Females are represented by equation B because that has a smaller y-intercept and the blue line will be below the orange line when height equals 0.
ii. predicted magical height = 38.17 + 0.5084(70) = 73.8, so taller than 70 inches.
iii. predicted magical height = 24.41 + 0.6436(70) = 69.5, so shorter than 70 inches.
10.3.33 a. As the temperature increases by one degree, the mean chirps per second is predicted to increase by 0.0646. b. R² = 0.618² = 0.382, meaning 38.2% of the variation in chirping rate can be explained by the linear association with temperature. c.
i. B. Fultoni crickets are equation B, because that has a larger y-intercept and the orange line will be above the blue line when the temperature equals 0.
ii. The value of R² has increased quite a bit when looking at just the Fultoni species. The same is true for the Karschi species.
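The hand calculations in 10.3.27 (slope and intercept from the correlation, standard deviations, and means) and the R² values in 10.3.31 through 10.3.33 can be checked with a few lines of Python. This is only a sketch: the function name `line_from_summary` is illustrative and does not come from the text, and the numbers are the summary values quoted in the answers above.

```python
def line_from_summary(r, sd_x, sd_y, mean_x, mean_y):
    """Least squares slope and intercept computed from summary statistics.

    slope = r * (sd_y / sd_x); intercept = mean_y - slope * mean_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Summary values quoted in 10.3.27a (hike time versus distance).
slope, intercept = line_from_summary(0.916, 1.856, 63.79, 3.283, 102.08)

# Predicting at the mean of x returns the mean of y (the "point of averages").
at_mean = intercept + slope * 3.283

# R^2 is the square of the correlation coefficient (10.3.31b to 10.3.33b).
r_squared = [round(r ** 2, 3) for r in (0.678, 0.842, 0.618)]

print(round(slope, 2), round(at_mean, 2), r_squared)
```

Carrying the unrounded slope forward gives an intercept near −1.28 rather than the −1.27 obtained in 10.3.27a, which uses the slope already rounded to 31.48; the difference is rounding only.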
10.3.34 One main reason to prefer the sum of squared deviations is that you will be able to find a unique line that minimizes the sum of squared deviations.
Section 10.4
10.4.1 A. 10.4.2 B. 10.4.3 Write the heights on 10 different slips of paper and foot lengths on 10 different slips of paper. Lay the 10 slips of paper with the heights written on them in a line on a flat surface. Shuffle the 10 slips of paper with the foot lengths on them and deal one out to each of the 10 slips of paper with the heights on them. Calculate the least squares regression equation for these 10 pairs of data and record the slope. Repeat this procedure 1,000 times to get 1,000 simulated slopes which make up the null distribution. 10.4.4 The estimated p-value from the simulation-based test will be zero as none of the least squares regression lines from the simulated data are as steep as the least squares regression line from the original data. 10.4.5 Lay the 20 slips of paper with the heights written on them in a line on a flat surface. Shuffle the 20 slips of paper with the vertical leap distances on them and deal one out to each of the 20 slips of paper with the heights on them. Calculate the least squares regression equation for these 20 pairs of data and record the slope. Repeat this procedure 1,000 times to get 1,000 simulated slopes which make up the null distribution. 10.4.6 The scatterplot shows a moderately strong, linear form, positive association.
10.4.7 a. Null: There is no association between textbook price and number of pages in the population. Alternative: There is an association between textbook price and number of pages in the population. b. i. 2.4 standard deviations
ii. Yes, a standardized statistic between 2 and 3 standard deviations from the mean under the null provides strong evidence against the null.
10.4.8 a. For each additional page in a textbook the predicted price increases 0.13 dollars. b. The predicted price of a book with no pages is $14.11. This is extrapolation, because there are no books in the sample with no pages! 10.4.9 a. Hours of sleep the previous night b. Seconds needed to complete a paper and pencil maze c.
i. 1
ii. No, strong evidence is found when the observed statistic is 2 or more SDs away from the mean under the null; the observed slope of −7.76 is only 1 SD away.
10.4.10 a. For each additional hour of sleep per night the predicted number of seconds needed to complete the maze decreases by 7.76. b. 198.33 is the predicted amount of time for someone who didn’t sleep the previous night. This is extrapolation because no one in the sample pulled an all-nighter. 10.4.11 a. 39 subjects b. i. 1.55 ii. No, strong evidence is found when the observed statistic is 2 or more SDs away from the mean under the null; 0.9658 is 1.55 SDs away.
10.4.12 a. For each additional unit increase in BMI, the predicted total cholesterol increases 0.9658.
b. The predicted cholesterol level of someone with a BMI of zero is 162.56. This is extrapolation because there is no one in the sample with a BMI of 0. 10.4.13 a. H0: No association between year of manufacture and price and Ha: There is an association between year of manufacture and price. b. Count the number of simulated slopes that are at least as extreme as the observed slope and divide by 1,000. The p-value is 0. c. With a p-value of 0 we have very strong evidence against the null and in favor of the alternative that the slope describing the association between the number of years before 2006 a Honda Civic was manufactured and the sale price is different from zero. Based on our simulation, which assumed there was no association between number of years before 2006 a Honda Civic was manufactured and its sale price, not once did we find a simulated slope that was as steep as the one from the original data. 10.4.14 a. predicted price = $18,785.31 − 1,397.47(age) b. Slope: For every additional year of age the price drops $1,397.47. Intercept: A car from 2006 (current model year) has a predicted price of $18,785.31.
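The slip-of-paper shuffling described in 10.4.3 and 10.4.5, together with the p-value count in 10.4.13b, can be sketched in Python. The data below are invented purely for illustration (they are not the textbook datasets), and the function names `slope` and `shuffle_test` are hypothetical; the applet used in the text may differ in details such as how "at least as extreme" is counted.

```python
import random


def slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def shuffle_test(xs, ys, reps=1000, seed=0):
    """Approximate two-sided p-value for the slope by shuffling the y values."""
    rng = random.Random(seed)
    observed = slope(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # re-deal the response slips to the fixed x slips
        if abs(slope(xs, shuffled)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps


# Made-up heights (inches) and foot lengths (cm) for ten people.
heights = [60, 62, 64, 65, 66, 68, 69, 70, 72, 74]
foot_lengths = [22.1, 23.0, 23.4, 24.2, 24.0, 25.1, 25.6, 26.0, 26.8, 27.5]

observed, p_value = shuffle_test(heights, foot_lengths)
print(observed, p_value)
```

Keeping the x values fixed and shuffling only the y values mirrors dealing one set of slips onto the other: every shuffle breaks any real pairing, so the simulated slopes show what the null hypothesis of no association would produce.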
10.4.15 a. Null: There is no association between weight and haircut price in the population of students. Alternative: There is an association between weight and haircut price in the population of students. b. p-value = 0
c. With a p-value of 0 we have very strong evidence against the null and in favor of the alternative that the slope describing the association between weight and amount of last haircut is different from zero. It appears to be a negative association (heavier individuals tend to spend less on haircuts).
d. No, this is an observational study and there are many potential confounding variables (e.g., whether the person is male or female) that may explain the variability in haircut costs.
10.4.16 a. predicted haircut price = 101.51 − 0.476(weight) b. Slope: For every additional pound heavier someone is their predicted haircut price drops 47.6 cents. Intercept: For someone who weighs 0 pounds, the predicted price of their last haircut is $101.51 (though we can’t expect our relationship to extend to such small weights). 10.4.17 a. Null: There is no association between height and haircut price in the population of students. Alternative: There is an association between height and haircut price in the population of students. b. p-value = 0.007 c. With a p-value of 0.007 we have very strong evidence against the null and in favor of the alternative that the slope describing the association between height and amount of last haircut is different from zero. It appears to be a negative association (taller people tend to spend less on haircuts). d. No, this is an observational study and there are many potential confounding variables (like whether the person is male or female) that may explain the variability in haircut costs. 10.4.18 a. predicted haircut price = 160.66 − 1.92(height) b. Slope: For every additional inch taller someone is, their predicted haircut price drops $1.92. Intercept: For someone who is 0 inches tall, the predicted price of their last haircut is $160.66 (although this is extrapolation). 10.4.19 a. H0: There is no association between BMI and age in the population. Ha: There is an association between BMI and age in the population. b. p-value = 0.068 c. With a p-value of 0.068 we have moderate evidence against the null and in favor of the alternative that there is an association between age and BMI. There is moderate evidence of a positive association that as age increases so does BMI. 10.4.20 a. p-value = 0.068 b. p-value for test of significance of correlation is 0.069 c. They are almost identical (rounding).
10.4.21 a. H0: There is no association between missing class and GPA in the population of students and Ha: There is an association between missing class and GPA in the population of students. b. p-value = 0
c. With a p-value of 0 we have very strong evidence against the null and in favor of the alternative that the slope describing the association between number of classes skipped in the past three weeks and GPA is different from zero. There is a negative association; that is, people missing more classes tend to have lower GPAs.
d. No, this is an observational study and there are many potential confounding variables that may explain the variability in GPA.
10.4.22 a. r = 0.083; there does not appear to be a strong association between sleep the previous night and GPA.
b. b = 0.0242. For each additional hour of sleep the previous night, the predicted GPA increases 0.0242.
c. H0: No association between sleep and GPA in the population of students and Ha: Association between sleep and GPA in the population of students. d. p-value = 0.017 e. The p-value of 0.017 offers strong evidence against the null and in support of there being an association between hours of sleep the previous night and GPA. f. Large sample size increases strength of evidence. 10.4.23 a. H0: There is no association between age and the time it takes to read a list of 20 color words. Ha: There is an association between age and the time it takes to read a list of 20 color words. b. b = 0.2383. For each additional year it is predicted to take 0.2383 sec longer to read the 20 color words. c. p-value = 0.003 d. A p-value of 0.003 provides very strong evidence against the null and in favor of there being an association between age and the time in seconds it takes for someone to read 20 color words. The association is positive; older people tend to take longer to read the 20 color words. 10.4.24 a. p-value = 0.188. Results are no longer significant. b. p-value = 0. Yes, it is a significant result. c. For ages 18−80, there is very strong evidence of an association between age and time to read 20 color words. As one moves beyond the range of the data, the linear model may not be the best model to describe the data and predictions using the linear model beyond the 18−80 age range may not be good predictions. 10.4.25 a. H0: There is no association between people’s height and their reaction time. Ha: There is an association between people’s height and their reaction time. b. predicted reaction time = 0.6485 − 0.001(height) c. Negative linear association because b < 0 d. p-value = 0.727. There is weak evidence of a linear association between height and reaction time. e. Standardized statistic, t = (−0.001 − 0)/0.003 = −0.333, which also provides weak evidence against the null hypothesis, because the observed slope is within 1.5 standard deviations of the mean of the null distribution, 0, which is the null hypothesized value for the population slope. 10.4.26 a. H0: There is no association between a person’s height and his or her magical height. Ha: There is a positive association between a person’s height and magical height.
b. predicted magical height = 33.11 + 0.5062(height) c. Positive association because b > 0 d. If the slope was negative, then as heights increased, magical heights would tend to decrease. e. p-value = 0.00 f. Yes, there is strong evidence of a positive association between height and magical height. g. Because a person’s height is not randomly assigned, we cannot make any causal conclusions. Because the sample was not random, we can only generalize to female college students similar to those in the study. 10.4.27 a. H0: There is no association between the number of piercings people have and their number of skipped classes. Ha: There is an association between the number of piercings people have and their number of skipped classes. b. predicted skipped classes = 1.10 + 0.1654(piercings) c. Positive association because b > 0 d. p-value = 0.023 e. Stand. stat. = (0.1654 − 0)/0.073 = 2.266
f. Yes, there is strong evidence of an association between number of body piercings and number of classes skipped. The p-value is less than 0.05, and the standardized statistic is greater than 2.
g. Because the number of body piercings a person has is not randomly assigned, we cannot make any causal conclusions. And because the sample was not random, we can only generalize to college students similar to those in the study.
10.4.28 a. H0: There is no association between people’s height and arm span. Ha: There is an association between people’s height and arm span.
b. predicted arm span = −3.04 + 1.04(height)
c. There is strong evidence of a linear association between height and arm span. The p-value is less than 0.001, and the standardized statistic is greater than 3. Because a person’s height is not randomly assigned, we cannot make any causal conclusions. And because the sample was random, we can generalize to U.S. high school students.
10.4.29 a. point of averages
b. Since the same values for the x variable and the same values for the y variable are shuffled each time, their averages don’t change like they might if a new sample were taken.
Section 10.5
10.5.1 A. 10.5.2 C. 10.5.3 A. 10.5.4 D. 10.5.5 A. 10.5.6 C. 10.5.7 C. 10.5.8 a. No, a test of significance can never provide evidence for the null hypothesis.
b. No, because the p-value is smaller than the cutoff value of 0.10 (corresponding to a 90% confidence interval) we have evidence against the null, so the value under the null for slope (zero) will not be in the 90% confidence interval. c. The p-value will be half as big: p-value = 0.03. 10.5.9 To see whether you get a different conclusion 10.5.10 a. No. The distribution of residuals is highly skewed to the right, with almost all the residuals small and negative, apart from two large positive residuals. Either there are two extreme outliers or else the distribution of random errors is highly skewed to the right. b. Validity conditions met c. Validity conditions met d. No, the fan shape indicates that as x increases the variation in y increases, which violates the requirement of constant variation. 10.5.11 a. predicted time in seconds = 198.33 − 7.76(hours sleep) b. 0.3052 c. 0.1526
10.5.12 a. In the population, β is the average amount that glucose level changes when BMI increases by one unit.
b. H0: β = 0, Ha: β ≠ 0
c. Yes, the three validity conditions (linear trend, similar distributions, and equal spread around the line) are all reasonably well met for this dataset.
d. p-value = 0.0033 is the probability of getting a slope as extreme or more extreme than 0.6339 for the least squares regression line assuming there is no linear association between BMI and glucose levels. e. A p-value of 0.003 provides very strong evidence against the null and in support of there being an association between BMI and glucose levels. f. p-value = 0.00165 10.5.13 a. (0.2249, 1.0429). We are 95% confident that in the population, for each 1 unit increase in BMI, on average glucose levels increase between 0.2249 and 1.0429.
b. Yes, the entire confidence interval is greater than zero.
10.5.14 a. In the population, β is the average amount the guessed weight changes when the guessed height increases by 1 inch.
b. H0: β = 0, Ha: β > 0
c. Yes, the three validity conditions (linear trend, similar distributions, and equal spread around the line) are all reasonably well met for this dataset.
d. p-value = 0.00795 provides very strong evidence against the null and in support of there being an association between the guessed height and guessed weight for the professor.
10.5.15 a. In the population, β is the average amount BMI changes for each 1 year increase in age. b. H0: β = 0, Ha: β ≠ 0 c. Yes, the three validity conditions (linear trend, similar distributions, and equal spread around the line) are all reasonably well met for this dataset.
d. p-value = 0.0712 e. The p-value of 0.0712 provides moderate evidence against the null and in support of there being a linear association between age and BMI.
f. The p-value would be half as big: 0.0356.
10.5.16 a. (−0.0273, 0.6297). We are 95% confident that, in the population, for each additional year increase the average BMI could decrease by as much as 0.0273 or increase by as much as 0.6297.
b. No, the association could be negative, zero, or positive. c. Yes, the one-sided p-value is 0.0356, which is less than a cut-off value of 0.05.
d. The two-sided test matches up with the confidence interval; it is more conservative than a one-sided test, that is, it is harder to find evidence against the null with a two-sided test than it is with a one-sided test.
10.5.17 a. In the population of all Honda Civics, the slope tells how much the price changes on average for each year older the car is; β is the symbol for the population slope.
b. H0: β = 0, Ha: β ≠ 0
c. Yes, the three validity conditions (linear trend, similar distributions, and equal spread around the line) are all reasonably well met for this dataset. d. p-value = 0 e. With a p-value of 0 we have very strong evidence against the null and in favor of the alternative that the population slope describing the association between the number of years before 2006 a Honda Civic was manufactured and the sale price is different from zero. 10.5.18 a. (−1,603.93, −1,191.02). We are 95% confident that in the population of all Honda Civics for each additional year in age of a Honda Civic, the average sale price decreases between $1,191.02 and $1,603.93. b. Yes, the entire interval is negative indicating a negative association (so as one variable increases (age of car) the other tends to decrease (sale price)). 10.5.19 a. predicted height = 95.20 + 2.96(foot length) b. For each one centimeter increase in foot length, average predicted height increases 2.96 centimeters. c. Residual = 168 − 166.24 = 1.76 10.5.20 a. In the population, slope tells us how much BMI changes on average for each additional inch in height; β is the symbol for the population slope. b. H0: β = 0, Ha: β ≠ 0 c. Yes, the three validity conditions (linear trend, similar distributions, and equal spread around the line) are all reasonably well met for this dataset. d. p-value = 0.0417 e. The p-value provides strong evidence against the null and in support of there being an association between BMI and height. 10.5.21 a. (−0.3033, −0.0060). We are 95% confident that, in the population, for each additional inch in height the average BMI decreases by between 0.0060 and 0.3003.
b. Yes, zero is not in the confidence interval and thus is not a plausible value for the population slope.
10.5.22 a. Slope is 0.0470, which means that for each 1 day increase in the gestational period the predicted life expectancy increases 0.047 years.
b. H0: β = 0, Ha: β ≠ 0
c. p-value = 0.0666. This provides moderate evidence against the null and in support of there being an association between days gestation and life expectancy.
d. It is very possible that a Type II error was made; if there were more data the strength of evidence may be increased to the point where we would have a significant result.
10.5.23 a. In the population the slope, β, is how much height in inches changes on average for each additional hour of sleep.
b. H0: β = 0, Ha: β < 0
c. p-value = 0.0328 provides us with strong evidence against the null and in support of there being a negative association between amount of sleep per night in hours and height in inches.
d. The new p-value = 0.3622. Now we do not have strong evidence of a negative association between hours slept and height in inches. 10.5.24 a. As GDP per capita increases, mortality rate decreases. The relationship does not appear to be linear. b. This regression line will not give a good prediction of infant mortality rate as the data do not fit the line very well. c. The relationship between logarithm of infant mortality and logarithm of GDP per capita is a fairly strong negative linear relationship. d. This regression line will be a good prediction of the logarithm of infant mortality because the data are closely clustered about the line. 10.5.25
a. predicted score = −12.7731 + 0.4612(temperature) b. Yes, the data points in the scatterplot tend to follow a linear pattern. c. Slope is 0.4612. For each one degree Celsius increase in body temperature, average predicted warmth score increases 0.4612. d. 12.1% of the variation in warmth score is explained by the linear relationship with body temperature. e. 4.2913 f. Residual = 4.4 − 4.2913 = 0.1087; the regression line underpredicts this person’s warmth score by 0.1087 point. 10.5.26
a. predicted AQ score = 10.82 + 0.1416(PIT score) b. Yes, the data points in the scatterplot tend to follow a linear pattern. c. Slope is 0.1416. For each one point increase in PIT score, average predicted AQ score increases 0.1416. d. 24.8% of the variation in AQ score is explained by the linear relationship with PIT score. e. 19.5992 f. Residual = 17 − 19.5992 = −2.5992; the regression line overpredicts this person’s AQ score by 2.5992 points. 10.5.27 a. predicted WPPSI = 79.39 + 0.2339(DAPIQ)
b. Yes, the data points in the scatterplot tend to follow a linear pattern.
c. Slope is 0.2339. For each one point increase in DAPIQ score, average predicted WPPSI score increases 0.2339 point. d. 9.0% of the variation in WPPSI scores is explained by the linear relationship with DAPIQ scores. e. 102.78 f. Residual = 82 − 102.78 = −20.78; the predicted WPPSI score is 20.78 points above the child’s observed WPPSI score. g. No, because DAPIQ score does not seem to explain much of the variability in the WPPSI scores (only 9%).
10.5.28 a. predicted percent accuracy = 62.97 + 0.5002(DCQ)
b. Slope is 0.5002. For each one point increase in DCQ score, average predicted percent accuracy of face matching increases 0.5002.
c. Yes, the p-value for the test of slope is 0.0022, which provides strong evidence against the null hypothesis that the slope in the population is zero. d. The DCQ score does not seem to explain much of the variability in percent accuracy (only 12.2%) and the correlation coefficient of 0.35 displays only a moderately strong correlation.
10.5.29 a. predicted percent accuracy = 65.49 + 0.2932(DCQ)
b. Slope is 0.2932. For each one point increase in DCQ score, average predicted percent accuracy of body matching increases 0.2932. c. Yes, the p-value for the test of slope is 0.0033, which provides strong evidence against the null hypothesis that the slope in the population is zero. d. The DCQ score does not seem to explain much of the variability in the percent accuracy (only 11.4%) and the correlation coefficient of 0.337 displays only a moderately strong correlation. 10.5.30 If you don’t assume linearity, then you are testing the null hypothesis against lots of different possibilities. It will be harder to distinguish no association from any form of association. If you correctly assume linearity then you have narrowed down the question. This creates a smaller standard deviation of the slopes which gives you more power.
End of Chapter 10 Exercises
10.CE.1 A. 10.CE.2 a. r = 1. As midterm scores increase so do final exam scores and every final exam score is the same amount higher than the midterm score. b. r = 1. As midterm scores increase so do final exam scores and every final exam score is the same amount lower than the midterm score. c. r = 1. As midterm scores increase so do final exam scores and every final exam score is the same factor higher than the midterm score. 10.CE.3 a. No
b. Yes, the line will be different if you switch the roles of the variables. c. Yes, you always calculate the vertical distance. 10.CE.4 Answers will vary. The scatterplots shown for Solutions 10.CE.4a–10.CE.4e give examples of correct answers where age is in years and distance is in miles from school.
a. See scatterplot. [Figure: scatterplot of distance (mi) versus age (yrs)]
b. See scatterplot. [Figure: scatterplot of distance (mi) versus age (yrs)]
c. See scatterplot. [Figure: scatterplot of distance (mi) versus age (yrs)]
d. See scatterplot. [Figure: scatterplot of distance (mi) versus age (yrs)]
e. See scatterplot. [Figure: scatterplot of distance (mi) versus age (yrs)]
10.CE.5 a. (i)&(iii), (i)&(iv), (i)&(vi), (iii)&(iv), (iii)&(vi), (iv)&(vi) b. (i)&(ii), (i)&(v), (ii)&(iii), (ii)&(iv), (ii)&(v), (ii)&(vi), (iii)&(v), (iv)&(v), (v)&(vi)
10.CE.6 a. predicted left % = −14.81 + 3.26(age)
b. Yes, a theory-based test can be used because validity conditions are met: linear relationship, symmetry, and equal variance.
c. The slope is 3.26. For each one year increase in age, the predicted average left brain percent increase is 3.26.
d. 4.6% is the percentage of variability in the percent left brain that is explained by the linear association with age.
e. 43.87%
f. Residual = 31% − 43.87% = −12.87 percentage points. The actual left-brain percentage for this subject is 12.87 less than the regression equation predicts. g. Because age only explains 4.6% of the variability seen in left-brain percentages, it doesn’t seem to be a very useful predictor. 10.CE.7 a. As seen in the scatterplot, there is a weak negative linear association between the time to take the test and the score on the test.
[Figure: scatterplot of score (%) versus time (min)]
b. r = −0.035 c. p-value = 0.8420. See screenshot of null distribution.
[Figure: null distribution of 1,000 shuffled correlations, mean = 0.004, SD = 0.179; count beyond the observed correlation (−0.035) = 842/1000 (0.8420)]
d. t = −0.19, p-value = 0.8481 e. Both simulation-based and theory-based tests of significance offer weak evidence against the null and so it is plausible that there is no association between time to take a test and score on the test.
10.CE.8 a. predicted score = 73.74 − 0.0604(time)
b. For each additional minute needed to take the test, the predicted test score drops 0.0604 points.
c. If you take 0 min to take the exam, your predicted score is 73.74. This doesn’t make sense in the context of the problem.
d. r² = 0.001 so 0.1% of the variation in test scores can be explained by the linear association with time to take the test.
10.CE.9 a. t = −0.19, p-value = 0.8481 b. (−0.6978, 0.5771) c. We are 95% confident that, in the population, for each additional minute taken on the test the average score decreases by as much as 0.6978 points or increases by as much as 0.5771 points. d. This is consistent with the test result because 0 is a plausible value for the population slope coefficient and we found weak evidence against the null of no association between time to take test and score on test.
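The standardized statistics in 10.4.25e and 10.4.27e, and the link in 10.CE.9d between a confidence interval that contains zero and weak evidence against the null, can be checked with a short Python sketch. The helper names below are illustrative only; the inputs are the values quoted in those answers.

```python
def standardized_slope(b, se, null_value=0.0):
    """Standardized statistic t = (observed slope - null value) / standard error."""
    return (b - null_value) / se


def contains(interval, value=0.0):
    """True if value lies inside the (low, high) interval."""
    low, high = interval
    return low <= value <= high


t_reaction = standardized_slope(-0.001, 0.003)   # values from 10.4.25e
t_piercings = standardized_slope(0.1654, 0.073)  # values from 10.4.27e

interval = (-0.6978, 0.5771)                     # interval from 10.CE.9b
zero_plausible = contains(interval)              # zero is inside the interval
midpoint = sum(interval) / 2                     # close to the slope in 10.CE.8a

print(round(t_reaction, 3), round(t_piercings, 3), zero_plausible, round(midpoint, 4))
```

The midpoint of the interval sits near the fitted slope of −0.0604 because a theory-based interval for the slope is centered on the observed slope; the small gap comes from rounding in the reported endpoints.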
Solutions to Problems 10.CE.10 a. There is a strong, positive, fairly linear association between distance from the sun and orbital period. b. Yes, correlation measures the direction and strength of a linear relationship between two quantitative variables. The closer r is to 1 or –1, the stronger the linear association. 0.988 is very close to 1. 10.CE.11 Only within the range of the data. It looks like there may be a stronger exponential relationship, so extrapolating beyond the range of the data may give erroneous predictions. 10.CE.12 a. H0: β = 0, Ha: β ≠ 0, t = 19.05, p-value = 0. We have very strong evidence against the null and in support of an association between distance in miles of a hike and time in minutes the hike takes, specifically the more miles in a hike, the longer it will take on average. b. (28.18,34.77). We are 95% confident that on average each additional mile in a hike increases the time it takes to make the hike by between 28.18 and 34.77 minutes. 10.CE.13 a. H0: β = 0, Ha: β ≠ 0, t = 3.07, p-value = 0.003. We have very strong evidence against the null and in support of an association between elevation gain in feet of a hike and time in minutes the hike takes, specifically the higher the elevation gain of a hike, the longer it will take on average. b. (0.0217,0.1020). We are 95% confident that in the long run each additional foot gain in elevation in a hike increases the time it takes to make the hike by between 0.0217 and 0.102 minutes on average. 10.CE.14 0: β = 0, Ha: β > 0, b = 4.31, simulation-based p-value = 0, t = 22.32, H theory-based p-value = 0. Both simulation-based and theory-based analyses give very strong evidence against the null and in support of a positive association between height and distance walked before veering off. 
The 95% confidence interval is (3.91, 4.70) so we are 95% confident that, in the population, for each extra inch of height the average person will walk between 3.91 and 4.70 yards farther before veering off course.
d. 0.107 < p-value < 0.155 e. We have little evidence against the null and in support of the alternative that there is a positive correlation between the time a professor arrives for class and the average time the students arrive for class. f. Ha: ρ ≠ 0 and 0.190 < p-value < 0.304 g. No, this is an observational study and there are many possible confounding variables present so cause-and-effect conclusions can’t be made.
10.CE.16 Line 1 sum of squared residuals is 2. Line 2 sum of squared residuals is 2. Line 3 sum of squared residuals is 3/2. Line 4 sum of squared residuals is 9/4.
10.CE.17 Corr/Slope 1 2 3 4 5 6 Scatterplot F C D A E B
10.CE.18 a. A and E b. C, D, and F c. B
10.CE.19 a. For restaurants, the correlation between average noise level and time becomes weaker if you exclude the two lunchtime measurements. b. For stores, the noise level is not related to time of day. c. For the “other” locations, the suggestion of a possible relationship between noise level and time of day becomes stronger if you exclude the gym measured at 6:30 p.m. d. On average, restaurants are more noisy than stores. e. On average, decibel levels measured later in the day tend to be higher than those measured earlier in the day. f. Time of day is confounded with type of location.
h. If you go by the fitted line for all restaurants, the predicted decibel level at 5 p.m. is closest to 80 dB to the nearest 10.
c. Professor’s arrival time in minutes early (or late) is the explanatory variable and students’ average arrival time in minutes early (or late) is the response variable.
g. For restaurants the fitted noise level increases by more than 10 dB during the three hours from 7 to 10 p.m.
10.CE.20
a. The obstacle to reaching a conclusion about the effect of time is that location and time of day are confounded. For example, most restaurants were observed late in the day and were loud. Most stores were observed early in the day and were not loud. There is no way— based on this badly planned dataset—to disentangle the effects of location from effects of time of day.
[Scatterplot for 10.CE.14: yardline vs. height (in), with fitted regression line: predicted yardline = –231.65 + 4.31 × height]
10.CE.15
a. H0: ρ = 0 b. Ha: ρ > 0
b. A better design would measure noise levels at different times for each location.
10.CE.21 a. At the y-intercept the decibel level is about 55 dB. (Between 12 and midnight the fitted level goes up 20, so from 12 back to zero the fitted level goes down 20, from 75 to 55.) b. The fitted level of 55 cannot be correct, because 0 corresponds to midnight (24), for which the fitted level is 95 dB.
10.CE.22 The first histogram is for Plot A, all locations. The second is for Plot B, restaurants. The third is for Plot C, stores. All histograms are centered at 0 (the null hypothesis of no association), but the histograms differ in the size of their variability. The first (black) histogram is least spread out, with almost all outcomes between –0.50 and 0.50. The last histogram is the most spread out, with the smallest percentage of outcomes between –0.5 and 0.5. The size of variability is negatively correlated with sample size: The larger the sample, the smaller the variability. Thus the first histogram is for the largest of the three samples, and the last histogram is for the smallest of the three.
10.CE.23 A correlation of zero means there is no linear relationship. Most likely for this dataset, there is no relationship of any sort between childhood coffee consumption and adult height. (In principle, there could be a nonlinear relationship, but it is hard to imagine such a relationship between these two particular variables.)
10.CE.24 a. Correlation should be moderate (or strong) and positive. As the rate of gun ownership increases, the rate of gun deaths will also increase. b. Correlation will be positive and very strong. A state’s electoral vote total is roughly proportional to its population, so states with large populations will have large numbers of electoral votes, and vice-versa. (In more detail, a state’s number of electoral votes is equal to its number of senators plus number of representatives. Each state has two senators and its number of representatives is proportional to its population, so (electoral vote) ≈ 2 + k × (population size), where k is a positive constant.) c. Correlation will be strong and positive, because both numbers are related to population size. States with larger populations will have more attorneys and more McDonald’s restaurants. States with smaller populations will have fewer of both. d. Correlation will be strong and positive because, as a rule, graduation from high school is a requirement for applying to college.
States with high rates of graduation from high school will tend to have higher percentages of college graduates. e. Correlation will be strong and negative. In fact the correlation will equal –1 because there is a perfect linear relationship with slope –1: (year of statehood) + (years as a state) = (current year). f. Correlation will be positive and moderate. The U.S. was settled from east to west, which means that as a rule the year of statehood is earlier (lower) for eastern states and later (higher) for western states. Longitude also increases as you go from east to west in the United States. Thus both quantitative variables tend to increase together as you go from east to west. g. Correlation will be near zero, because there is no systematic relationship between a state’s north-south location and its east-west location. h. Correlation will be negative and moderate. As a rule, southern states (lower latitudes) tend to have warmer (larger) average annual temperatures, whereas northern states (higher latitudes) tend to have colder (lower) annual temperatures.
10.CE.25 a. The association between these two variables is negative. This means that as VO2 max increases, the predicted thickness of the superior frontal grey matter decreases. b. Predicted superior frontal lobe = 4.08 − 0.0063 (VO2 max)
e. Theory-based p-value = 0.0024 f. Because the p-value is less than 0.05, there is strong evidence against the null hypothesis and in support of the alternative that there is an association between VO2 max and superior frontal lobe matter thickness. Specifically, as VO2 max increases, the thickness of the superior frontal lobe decreases. Because VO2 max scores were not randomly assigned, no causal conclusions can be made, and because random sampling was not used, the results can only be generalized to 8-, 9-, and 10-year-olds similar to those in the study.
10.CE.26 If there truly is no association between VO2 max and areas of the brain, then because nine tests were run in the same study, the chance of finding at least one significant result is quite high, well above the 0.05 level of significance used for a single test. (In this case, it is roughly 37%.)
10.CE.27 a. The association between these two variables is positive. This means that as VO2 max increases, the predicted mathematics ability score also increases. b. Yes, a theory-based test can be used because validity conditions are met: linear relationship, symmetry, and equal variance. c. H0: There is no linear association between VO2 max and math score. Ha: There is a linear association between VO2 max and math score. d. Theory-based p-value = 0.0077 e. Because the p-value is less than 0.05, there is strong evidence against the null hypothesis and in support of the alternative that there is an association between VO2 max and math ability score. Specifically, as VO2 max increases, so does predicted math score. Because VO2 max scores were not randomly assigned, no causal conclusions can be made, and because random sampling was not used, the results can only be generalized to 8-, 9-, and 10-year-olds similar to those in the study.
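The "roughly 37%" in 10.CE.26 can be checked with a short calculation. This is only a sketch: it assumes the nine tests are independent and each uses a 0.05 significance level (the independence assumption and the function name are ours, not part of the solution).

```python
# Chance of at least one "significant" result among k tests when every
# null hypothesis is true. Assumes the tests are independent (our assumption).
def familywise_error(alpha: float, k: int) -> float:
    # P(no false positives) = (1 - alpha)^k, so take the complement.
    return 1 - (1 - alpha) ** k

rate = familywise_error(0.05, 9)
print(round(rate, 4))  # about 0.37
```

With one test the function returns 0.05, so the jump to about 0.37 comes entirely from running nine tests.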
Chapter 10 Investigation 1. Is there an association between hand span and the number of Tootsie Rolls one is able to grab? 2. 45 students 3. Observational study. The researchers didn’t randomly assign any of the variables. 4. Hand span in cm and number of Tootsie Rolls grabbed 5. Hand span is quantitative explanatory variable and number of Tootsie Rolls is quantitative response variable. 6. Null: No association between hand span in cm and the number of tootsie rolls grabbed. Alternative: There is an association between hand span and the number of Tootsie Rolls grabbed. 7.
c. Yes, a theory-based test can be used because validity conditions are met: linear relationship, symmetry, and equal variance.
d. H0: There is no linear association between VO2 max and the thickness of the superior frontal lobe of the brain. Ha: There is a linear association between VO2 max and the thickness of the superior frontal lobe of the brain.
        Hand span (cm)   Tootsie Rolls
Mean    20.86            17.11
SD      1.8              3.66
r = 0.574
8. [Scatterplot: number of Tootsie Rolls vs. hand span (cm)]
16. No causal conclusion can be made as this was an observational study and there are many possible confounding variables not controlled for.
17. This was not a random sample, so possibly one could generalize to college students, but not to a larger group.
9. There appears to be a positive, moderately strong linear relationship between hand span and number of Tootsie Rolls.
10. No, the observations seem to follow the general trend.
11. Predicted Tootsie Rolls = –7.25 + 1.17 (hand span)
12. For each 1-cm increase in hand span, the predicted number of Tootsie Rolls grabbed increases by 1.17.
13. a. See graph. Mean = –0.011, SD = 0.312, Num shuffles = 1000
18. Data were gathered on 45 college students to see whether there was an association between hand span and the number of Tootsie Rolls grabbed. Strong evidence was found against the null and in support of the alternative that there is a genuine association between hand span and number of Tootsie Rolls grabbed. The 95% confidence interval for the population slope parameter to describe the association between hand span and number of Tootsie Rolls was (0.6549,1.6804). This means we are 95% confident that for each 1 cm increase in hand span the average number of Tootsie Rolls grabbed increases by 0.65 to 1.68. Random sampling would have helped the generalizability of this study. Further studies might explore a wider range of ages and hence a wider range of hand spans. One might also look at different sizes/shapes of candy, or extend the study to objects other than candy.
Chapter 10 Research Article 1. The researchers are investigating whether early adolescents’ preferences for non-mainstream types of popular music are predictive of current and future delinquency. 2. Music is an important medium for adolescents to enhance mood, cope with problems, and develop social identity (ter Bogt et al., 2011), and adolescents tend to have friends with similar musical tastes (Selfhout et al., 2009). 3. 309 adolescents were in the study.
4. All adolescents were in their first year of high school (mean age = 12).
5. Chart pop (mean = 4.12) and R&B (3.51) were the two most liked genres of music at age 12.
[Histogram of shuffled slopes, horizontal axis from –1.200 to 1.200. Count of samples greater than 1.17: 0/1000 (0.0000)]
b. The observed statistic of 1.17 is out in the tail; it is an unlikely result assuming the null is true. c. p-value is approximately 0. We have strong evidence against the null and in support of there being an association between hand span and number of Tootsie Rolls grabbed. The observed slope of 1.17 from the least squares regression never occurred as a simulated slope when it was assumed there was no association between hand span and number of Tootsie Rolls. 14. The number of Tootsie Rolls grabbed appears to be distributed symmetrically above and below the regression line for each value of hand span AND the spread of the number of Tootsie Rolls grabbed appears to be the same for each value of hand span. The theory-based p-value is 0. This matches our simulation. 15. Confidence interval for β is (0.6549,1.6804). Zero is not in the interval, which makes sense because we found strong evidence against the null of zero slope, or no association.
6. Punk (mean = 1.90) and techno/hardhouse (mean = 1.68) were the least liked forms of music at age 12. 7. Chart pop (mean = 3.68) and R&B (mean = 3.27) were still the most liked genres of music at age 16. 8. Gothic (mean = 1.68) and techno/hardhouse (mean = 1.98) were the least liked forms of music at age 16. 9. a. 4.12 is the average preference score for chart pop at age 12 (when age – 12 = 0). b. –0.12 is the average decrease in preference for chart pop music each year. c. Null: There is no relationship between preference for chart pop music and age (slope = 0). Alternative: There is a relationship between preference for chart pop music and age (slope not equal to 0). The p-value (< 0.05) gives strong evidence that there is a relationship between preference for chart pop music and age. 10. a. 3.27 is the average preference score for hip hop music at age 12. b. 0.10 is the average increase in preference for hip hop music each year. c. Null: There is no relationship between preference for hip hop music and age (slope = 0). Alternative: There is a relationship between preference for hip hop music and age (slope not equal to 0). The p-value (> 0.05) means we do not have evidence that there is a relationship between preference for hip hop music and age.
11. Null: There is no relationship between preference for chart pop music and R&B music at age 12. Alternative: There is a relationship between preference for chart pop music and R&B music at age 12. Based on the p-value of < 0.01, there is strong evidence of a relationship between preference for chart pop music and R&B music at age 12. 12. Null: There is no relationship between preference for classical music and hip-hop music at age 12. Alternative: There is a relationship between preference for classical music and hip hop music at age 12. Based on the p-value of < 0.01, there is strong evidence of a relationship between preference for classical music and hip hop music at age 12. 13. Null: There is no relationship between preference for metal music at age 12 and delinquency at age 16. Alternative: There is a relationship between preference for metal music at age 12 and delinquency at age 16. Based on the p-value of < 0.01, there is strong evidence of a relationship between preference for metal music at age 12 and delinquency at age 16.
14. Null: There is no relationship between preference for chart pop music at age 12 and delinquency at age 16. Alternative: There is a relationship between preference for chart pop music at age 12 and delinquency at age 16. Based on the p-value of > 0.05, we do not have evidence of a relationship between preference for chart pop music at age 12 and delinquency at age 16.
15. This is not a random sample of adolescents in the Netherlands, so it may be representative of adolescents in the Netherlands, but not necessarily.
16. No, this is an observational study so a cause-and-effect conclusion is not possible.
17. Look to see whether similar results hold in other countries. Investigate what is going into early adolescents’ decision to prefer certain types of music.
CHAPTER 11
Modeling Randomness Section 11.1
11.1.13
11.1.1 C.
a. As the number of dice increases, the probability of getting at least one 5 or 6 will also increase. (Imagine rolling 100 dice. It is almost a sure thing that you will get at least one 5 or 6.) Therefore the probability that the price will be 50 cents or less will decrease as the number of dice increases. Also, rolling additional dice will never drop a price from above 50 cents to below it, but it can move it from below 50 cents to above it.
11.1.2 B. 11.1.3 C. 11.1.4 E. 11.1.5 A. 11.1.6 a. Yes b. Yes c. No d. No e. Yes f. Yes (but not too much longer using a computer) 11.1.7 Number the index cards 1, 2, and 3 (they each represent a phone). Shuffle them and place them on the table in spots labeled 1, 2, and 3. If all the numbers on the cards fail to match all the numbered spots on the table call that a success; anything else call a failure. Repeat this 1,000 times. The number of successes divided by 1,000 is an estimate for the probability none of them would get their own phones. 11.1.8 a. ABC, ACB, BAC, BCA, CAB, CBA b. BCA, CAB
b. Because getting a “large” number is more likely with the more dice, the price is more likely to be higher and so the mean price will also increase as the number of dice increases. c. If you can choose one die (and you don’t have to multiply the outcome by 10) then you will always be able to afford the ice cream cone! However, if you have to use more than one die, you should choose two dice because the probability of an outcome of 50 cents or less is the greatest. 11.1.14 a. 1/6(1/6)(2) + 1/6(2/6)(4) = 10/36 b. (1,2) (1,3) (1,4) (1,5) (1,6) (2,1) (2,3) (2,4) (2,5) (2,6) (3,1) (3,2) (3,4) (3,5) (3,6) (4,1) (4,2) (4,3) (4,5) (4,6) (5,1) (5,2) (5,3) (5,4) (5,6) (6,1) (6,2) (6,3) (6,4) (6,5) P(consecutive) = 10/30 c. The cards. This makes sense because while the numerator is the same, six pairs were eliminated from the denominator when we switched to the cards, therefore that probability increased in value.
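The counts in 11.1.14 can be confirmed by listing the outcomes. A minimal sketch, assuming the "cards" version means drawing two of six cards numbered 1 to 6 without replacement (as the 30 listed pairs suggest); the variable and function names are ours.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
dice = list(product(faces, repeat=2))         # 36 ordered rolls, doubles allowed
cards = [(a, b) for a, b in dice if a != b]   # 30 ordered draws, no repeats

def p_consecutive(outcomes):
    # Fraction of equally likely outcomes whose two numbers differ by exactly 1.
    hits = sum(abs(a - b) == 1 for a, b in outcomes)
    return Fraction(hits, len(outcomes))

p_dice = p_consecutive(dice)     # 10 favorable out of 36
p_cards = p_consecutive(cards)   # 10 favorable out of 30
print(p_dice, p_cards)
```

The numerator is 10 in both cases; only the denominator changes, which is the point made in part (c).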
c. 2/6 = 1/3
11.1.15
d. If the three executives dropped their phones and randomly picked them up many, many times, about 1/3 of the time nobody would get their own phone.
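The index-card simulation described in 11.1.7 and the exact answer from 11.1.8 can be sketched as follows. The function names, the seed, and the use of Python's `random` module are illustrative choices of ours, not part of the solution.

```python
import random
from itertools import permutations

def exact_no_match(n: int) -> float:
    # Enumerate every ordering and count those where no item is in its own spot.
    perms = list(permutations(range(n)))
    hits = sum(all(p[i] != i for i in range(n)) for p in perms)
    return hits / len(perms)

def simulate_no_match(n: int, reps: int, seed: int = 1) -> float:
    # Shuffle the "cards" repeatedly; a success is a shuffle with no matches.
    rng = random.Random(seed)
    cards = list(range(n))
    successes = 0
    for _ in range(reps):
        rng.shuffle(cards)
        if all(cards[i] != i for i in range(n)):
            successes += 1
    return successes / reps

exact = exact_no_match(3)             # 2 of the 6 orderings
estimate = simulate_no_match(3, 1000)
print(exact, estimate)
```

The simulated proportion should land near 1/3, with the usual run-to-run variation of a 1,000-repetition simulation.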
a. B1B2B3, G1B2B3, B1G2B3, B1B2G3, G1G2B3, B1G2G3, G1B2G3, G1G2G3 b. B1B2B3, G1B2B3, B1G2B3, B1B2G3
11.1.9
11.1.16
a. 1/2 b. 0 c. 1/6 d. 2/3 11.1.10
c. 2/8 = 1/4 a. B1B2B3B4, G1B2B3B4, B1G2B3B4, B1B2G3B4, B1B2B3G4, G1G2B3B4, G1B2G3B4, G1B2B3G4, B1G2G3B4, B1G2B3G4, B1B2G3G4, B1G2G3G4, G1B2G3G4, G1G2B3G4, G1G2G3B4, G1G2G3G4 b. B1B2B3B4, G1B2B3B4, B1G2B3B4, B1B2G3B4, B1B2B3G4 c. 2/16 = 1/8 d. A 3 to 1 breakdown is more likely because this probability is 8/16 whereas two boys and two girls is 6/16. 11.1.17
11.1.11
11.1.12
a. Write “Male” on four cards and “Female” on two cards. Shuffle the six cards and randomly choose two of them. If they are the two Female cards call this a success. Repeat this for a total of 1,000 times. The proportion of successes out of 1,000 is an estimate for the probability. b. If you do it with dice, you can think each number represents a different person. However, you could roll doubles (e.g. (3,3)) and
10/16/20 8:03 PM
you can’t pick the same person twice, so this is not a good model for the simulation.
d. Yes because P(A and B) = 0 e. [Venn diagram of disjoint events A and B; region probabilities shown: 0.2, 0.7, 0.1]
11.1.18
a. AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF b. 1/15
c. If you repeatedly choose two people from the committee at random, about 1 in every 15 times you would choose both of the women.
d. 9/15 11.1.19
11.2.9
a. H6H7; H6T7H8; T6H7H8; T6T7H8H9; T6H7T8H9; H6T7T8H9; T6T7T8; H6T7T8T9; T6H7T8T9; T6T7H8T9
a. HHH, HHT, HTH, THH, TTH, THT, HTT, TTT
b. Heather wins on the first 6 listed in part (a) and Tom wins on the last 4 listed. c. No, as they are different length sequences the probabilities won’t all be the same. 11.1.20 a. Flip four coins and count the number of heads. Repeat this until you have done it 1,000 times. Count how many times 2 or more heads occurred. Divide that number by 1,000 and that is an estimate for the probability that Heather would win. b. 0.687163
b.
i. 1/8 ii. 3/8 iii. 7/8 iv. 1/8
11.2.10 a. 0.68 b. 0.21 c. 0.20 d. 0.69 11.2.11
c. 0.750627
a. [Venn diagram: FB only 0.41, both FB and IG 0.27, IG only 0.01, neither 0.31]
11.1.21
a. I would expect that Heather has a smaller probability of winning. Heather’s one-point advantage is a bigger deal in the short term than in the long run. b. Because Heather’s one-point advantage is almost nothing if they have to go all the way to 100, her probability of winning should be very slightly above 1/2.
b. 0.28 c. 0.27
Section 11.2
d. 0.69
11.2.1 A and D.
e. 0.31
11.2.2 D.
11.2.12
11.2.3 C. 11.2.4 E.
11.2.13
11.2.5 E.
a.
11.2.6 A. 11.2.7
a. 0.6
         Yes     No      Total
Yes      0.14    0.14    0.28
No       0.07    0.65    0.72
Total    0.21    0.79    1.00
b. 0.5
c. 0.7 d. No because P(A and B) ≠ 0
Total
e.
[Venn diagram: A only 0.2, both A and B 0.2, B only 0.3, neither 0.3]
b. 0.07
c. 0.14 d. 0.35
11.2.14 The events are not complements. To be complements there would need to be no one that uses both Twitter and Instagram and everyone uses one or the other.
11.2.8
11.2.15
a. 0.3
a. 1 − (0.42 + 0.10 + 0.04) = 0.44
b. 0.8
b. 0.42 + 0.44 = 0.86
c. 0.9
c. 1 − 0.42 = 0.58
Total
11.2.16
b. 0.2/0.4 = 0.5
a. P(Y) = 336/1,864 ≈ 0.1803 and it is the probability a randomly chosen student contracted the norovirus.
c. 0.4 + 0.5 – 0.2 = 0.7
b. P(F) = 1,261/1,864 ≈ 0.6765 and it is the probability a randomly chosen student is a female.
e.
d. Yes because P(A and B) = P(A)P(B) = 0.2.
c. P(F^c) = 603/1,864 ≈ 0.3235 and it is the probability a randomly chosen student is a male.
d. P(Y and F) = 212/1,864 ≈ 0.1137 and it is the probability a randomly chosen student is both female and has contracted the norovirus. e. P(Y or F) = (212 + 124 + 1,049)/1,864 = 1,385/1,864 ≈ 0.7430 and it is the probability a randomly chosen student has either contracted the norovirus or is female.
11.2.17 a. (Y or F)^c means the event is neither Yes nor Female, so it means that it is the males that did not get the norovirus so it is the same as (N and M). That probability is 479/1,864 = 0.2570.
b. (Y^c and F^c) means the event is not Yes (so No) and not Female (so Male), so it means that it is the males that did not get the norovirus so it is the same as (N and M). That probability is 479/1,864 = 0.2570 as in part (a).
[Venn diagram: A only 0.2, both A and B 0.2, B only 0.3, neither 0.3]
11.3.8 a. 0.4/0.5 = 0.8 b. 0.4/0.8 = 0.5 c. 0.8 + 0.5 – 0.4 = 0.9 d. Yes because P(A and B) = P(A)P(B). e. [Venn diagram: A only 0.4, both A and B 0.4, B only 0.1, neither 0.1]
11.2.18 a. 1,955/3,222 ≈ 0.6068
b. 789/3,222 ≈ 0.2449
c. 457/3,222 ≈ 0.1418 d. e. Because it is measuring those that are not female and not sophomores it is measuring those males that are freshmen, juniors, seniors or non-degree which is 935/3,222 ≈ 0.2902. f. Because it is measuring those that are neither female nor sophomores it is measuring those males that are freshmen, juniors, seniors, or non-degree which is 935/3,222 ≈ 0.2902, the same as the previous question. 11.2.19 a. 1,267/3,222 ≈ 0.3932 b. 264/3,222 ≈ 0.0819 c. (1,267 + 445)/3,222 = 1,712/3,222 ≈ 0.5313 d. 264/1,267 ≈ 0.2084 e. 264/709 ≈ 0.3724 11.2.20 a. GGG, GGB, GBG, BGG, BBG, BGB, GBB, BBB b. 1/8 (the outcome is GGB) c. 3/8 (the outcomes are BGG or GBG or GGB)
Section 11.3 11.3.1 A. 11.3.2 C.
11.3.9 a. TTT, TTH, THT, HTT, HHT, HTH, THH, HHH b.
i. 7/8 ii. 3/4 iii.
H: 3/7
11.3.10 a. 1/7 b. 1/4 c. 1/4 d. 1/2 11.3.11 a. 1/8 b. 3/8 11.3.12 a. 0.20/0.21 = 0.95 b. 0.20/0.68 = 0.29 11.3.13 a.
[Venn diagram: FB only 0.41, both FB and IG 0.27, IG only 0.01, neither 0.31]
11.3.3 B.
11.3.4 A.
11.3.5 B, D, G. 11.3.6 and because A and B are independent, P(A and B) = P(A)P(B). Therefore, P(A and B) ≠ 0 which implies that A and B are not disjoint.
11.3.7
b. 0.27/0.28 = 0.96
a. 0.2/0.5 = 0.4
c. 0.27/0.68 = 0.40
d. 1 – 0.31 = 0.69
11.3.22
11.3.14
a. 0.56 + 0.35 = 0.91
a. 0.65 = P(I |T) = P(I and T)/0.21 so P(I and T ) = 0.21 × 0.65 = 0.14.
b. 0.35/0.91 = 0.38
                  Twitter
                  Yes     No      Total
Instagram  Yes    0.14    0.14    0.28
           No     0.07    0.65    0.72
Total             0.21    0.79    1.00
11.3.23
a. P(C) = 0.077; P(C|S) = 0.244
b. No because P(C) ≠ P(C|S)
11.3.24
a.
b. 0.14/0.28 = 0.50 c. 0.14 d. 0.65 11.3.15
P(T) = P(T |I). 11.3.16 a. The probability that a randomly chosen student got the norovirus is 336/1,864 = 0.180. b. The probability that a randomly chosen female got the norovirus is 212/1,261 = 0.168.
            Banded   Unbanded
Survived    24       24
Died        26       26
Total       50       50
Section 11.4
d. No, the male rate was 124/603 = 0.206 whereas the female rate was 0.168.
11.4.2 D.
11.3.17 a. 789/3,222 = 0.245 b. 1955/3,222 = 0.607
11.4.1 C. 11.4.3 C. 11.4.4 B. 11.4.5 B. 11.4.6 a.
c. 457/1,955 = 0.234 d. 457/789 = 0.579 e. No they are not independent because P(So) ≠ P(So|F) [or P(F) ≠ P(F|S)]. 11.3.18 a. (0.61)(0.40) = 0.24 b. (0.39)(0.30) = 0.12 c. 0.24 + 0.12 = 0.36
100
b. If there is no association, the two conditional proportions of survival must be the same. The only way they can be the same is if they are the same as the overall proportion of survival. So we would need all three values to equal 0.48 in the penguin study. If surviving is independent of being banded then the probability of surviving given banded needs to be the same as the probability of surviving. This is what we said had to happen if there was no association; we used the idea of proportion instead of probability. Again, these are both 0.48 in the penguin study.
c. No, they are not quite independent, as the probability a female got the norovirus is different (lower) than the overall probability that someone got the norovirus.
e. Yes. If the male rate was the same as the female rate then both of these would have to be the same as the overall rate.
Total
x       0     1     2     3
p(x)    1/8   3/8   3/8   1/8
b. The event when 2 coins land heads up. c. The probability exactly 2 of the 3 coins land heads up is 3/8. d. The probability 2 or fewer of the 3 coins land heads up is 7/8. 11.4.7
11.3.19
a. (1,1) (1,2) (1,3) (1,4) (2,1) (2,2) (2,3) (2,4) (3,1) (3,2) (3,3) (3,4) (4,1) (4,2) (4,3) (4,4)
a. 0.42^2 = 0.18
b.
b. 0.58^2 = 0.34 c. (0.42)(0.58) + (0.58)(0.42) = 0.49 (or 1 – (0.18 + 0.34) = 0.48). Note that the two answers are a bit different because of rounding error.
11.3.20 a. 0.42^5 = 0.013 b. 0.58^5 = 0.066 c. 1 – 0.066 = 0.934
11.3.21 a. 0.1^10 b. 0.9^10 = 0.35 c. 1 – 0.35 = 0.65
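The rounding discrepancy noted in 11.3.19(c) disappears when the products are carried at full precision. A small sketch, using generic names `p` and `q` of our own for the two complementary probabilities 0.42 and 0.58:

```python
p, q = 0.42, 0.58   # complementary probabilities used in 11.3.19 and 11.3.20

both_p = p ** 2          # 0.1764, reported as 0.18
both_q = q ** 2          # 0.3364, reported as 0.34
one_each = 2 * p * q     # 0.4872, the unrounded answer to part (c)

five_p = p ** 5          # about 0.013
five_q = q ** 5          # about 0.066
at_least_one_p = 1 - five_q

print(round(both_p, 4), round(both_q, 4), round(one_each, 4))
```

Both 0.49 and 0.48 are roundings of the same value, 0.4872; the three two-trial probabilities sum to 1 exactly.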
x       2      3      4      5      6      7      8
p(x)    1/16   2/16   3/16   4/16   3/16   2/16   1/16
c. The event where the sum of the numbers on the dice is more than 3. d. The probability that the sum is more than 3 when the two dice are thrown is 1 – P(X = 2) – P(X = 3) = 1 – (1/16) – (2/16) = 13/16. The only outcomes not included are (1,1), (1,2), and (2,1). Keep in mind that with discrete random variables, this is not the same as P(X ≥ 3). 11.4.8 a. The probability that a randomly chosen family household has exactly 4 members is 0.1936.
b. The probability that a randomly chosen family household has more than 4 members is 0.0906 + 0.0341 + 0.0192 = 0.1439. c. The probability that a randomly chosen family household has 4 or more members is 0.1936 + 0.1439 = 0.3375.
11.4.9
b. The expected value is 1. This means that if you were to pick two puppies at random from this litter many, many times, you should expect to get 1 male puppy, on average, each time. 11.4.18 a.
a. The probability that a randomly chosen nonfamily household has 3 or 4 members is 0.0323. b. The probability that a randomly chosen nonfamily household has 3 or fewer members is 0.9888. c. The probability that a randomly chosen nonfamily household has fewer than 4 members (which is the same as 3 or fewer) is 0.9888. 11.4.10
x       0       6      10
p(x)    25/36   5/36   1/6
b. The expected value is 90/36 = 2.50, meaning that you can expect to win about $2.50 per game, on average, in the long run. c. You would have to pay $2.50 per game. 11.4.19
a. 9/24
a.
b. 15/24 c. If 3 people draw their own names then the only name left to draw must be that of the person who hasn’t drawn. So if we know 3 people have drawn their own names, we also know all four of them must have.
11.4.11
x       0      1       2      3
p(x)    8/27   12/27   6/27   1/27
b. 7/27 11.4.20
a. 3.1149
a. 0.477
b. 1.2633
b. 0.523
11.4.12
c. 3.441
a. 1.2393
d. 2.462
b. 0.5698 11.4.13
Section 11.5
a. 1
11.5.1 B.
b. 1
11.5.2 A.
11.4.14
11.5.3 C.
a.
11.5.4 D. 11.5.5 D.
x       0     1     2     3
p(x)    2/6   3/6   0     1/6
b. 1
11.5.6 C. 11.5.7 a. 7
c. 1
b. 2.4152
d. They are the same.
c. Mean of 9 and SD of 2.4152
11.4.15 a. The expected payout of wheel B is $2.33, the same as that of wheel A. b. The SD of wheel B is $1.1785, much smaller than that of wheel A. c. Answers will vary. Both will give the same average outcome in the long run, but you are just spinning it once. Those that like taking chances may tend to pick wheel A, as there is a chance to win a larger amount of money than anything on wheel B. Those that are more cautious may pick wheel B, as there is a greater chance that you will win something. 11.4.16
d. Mean of 14 and SD of 4.8304 11.5.8 a.
x       1     2     3     4     5     6
p(x)    1/6   1/6   1/6   1/6   1/6   1/6
b. Mean of 3.5 and SD of 1.7078
c. Mean: 3.5 + 3.5 = 7, SD = √(1.7078^2 + 1.7078^2) = 2.4152. The mean and the standard deviation are the same.
a. $1.0017 b. For wheel A the SD is $6.8475 and the average distance is $3.775. For wheel B the SD is $1.1785 and the average distance is $1.0017. So for wheel A they are not similar (because the $25 payout is an outlier), but for wheel B they are similar because there are no outliers. 11.4.17 a.
x       0     1     2
p(x)    1/5   3/5   1/5
d. 2X is describing rolling a fair six-sided die and then doubling the result. It is not the same as the sum of the numbers on two dice. For example, you can get a 3 when you are finding the sum, but that can’t be obtained when doubling the numbers. e. The distribution of 2X has a mean of 2(3.5) = 7 (the same as that of X + X) and a SD of 2(1.7078) = 3.4156 (not the same as that of X + X). So in this case, X + X ≠ 2X! The variability is larger with 2X, because doubling a single roll doubles every deviation from the mean, whereas with X + X the two independent rolls can partly offset each other.
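The means and standard deviations quoted in 11.5.8 for the sum of two dice and for a doubled single die can be reproduced by enumerating equally likely outcomes. A minimal sketch; the helper `mean_sd` is our own name.

```python
from itertools import product
from math import sqrt

def mean_sd(values):
    # Mean and SD of a list of equally likely outcomes (divide by N, not N - 1).
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, sqrt(var)

faces = range(1, 7)
sum_two = [a + b for a, b in product(faces, repeat=2)]  # two independent rolls
doubled = [2 * a for a in faces]                        # one roll, doubled

mean_sum, sd_sum = mean_sd(sum_two)
mean_dbl, sd_dbl = mean_sd(doubled)
print(mean_sum, round(sd_sum, 4))
print(mean_dbl, round(sd_dbl, 4))
```

Both means come out to 7, while the SDs are about 2.415 for the sum and about 3.416 for the doubled roll.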
11.5.9
11.6.5
a. 16/36
A. would be better modeled with a binomial distribution because the probability of success (picking a female) is much closer to the same value for each student picked because the population size is large compared to the sample size.
b. 20/36 c. E(X) = (1)(16/36) + (–1)(20/36) = −4/36. You can expect to lose $0.11, on average, per game. 11.5.10 a. −2/38 = −$0.0526 (you are expected to lose about 5 cents per game, in the long run) b. $0.9986
11.6.6 a. 0.2201 b. 0.3499 c. 0.9127 d. Not necessarily. The probability that someone has a college degree on a certain street in a certain town could be much different than what it is nationally.
c. −$0.1052 d. $1.412 11.5.11 a. −2/38 = −$0.0526 (you are expected to lose about 5 cents) b. $5.7626
11.6.7 a. 0.0548 b. 0.2516
c. −$0.1578
c. 0.7712
d. $9.9811
d. Not necessarily. The probability that a household in a single neighborhood has Internet access could be much different than what it is nationally.
11.5.12 a.
x       −2       0        2
p(x)    0.2770   0.4986   0.2244
b. −$0.1052
11.6.8 a. 0.00098 b. 0.00098
c. $1.412
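The mean and SD reported for 11.5.12 follow directly from the probability table. A short sketch using the tabled values as given (the dictionary name `dist` is ours):

```python
from math import sqrt

# Probability distribution from the 11.5.12 table: outcome -> probability.
dist = {-2: 0.2770, 0: 0.4986, 2: 0.2244}

# Expected value: weight each outcome by its probability.
mean = sum(x * p for x, p in dist.items())

# SD: square root of the probability-weighted squared deviations from the mean.
sd = sqrt(sum((x - mean) ** 2 * p for x, p in dist.items()))

print(round(mean, 4), round(sd, 3))
```

The same two lines work for any discrete distribution written as an outcome-to-probability mapping, such as the table in 11.5.13.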
c. when π = 0.50, the binomial distribution is symmetric and P(X = 10) = P(X = 0)
11.5.13
d. 0.3770
a.
11.6.9
x       −2       34        70
p(x)    0.9481   0.05125   0.000693
b. −$0.1052 c. $8.150
a.
i. 1 − 0.357 = 0.643 ii. 0.643^4 = 0.1709 iii. 1 − 0.1709 = 0.8291
b. 0.8291^56 = 0.0000276
11.5.14
c. The estimate should be larger, as there are many sets of 56 consecutive games in 1941 not just one.
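The chain of calculations in 11.6.9 can be laid out step by step. This sketch assumes, as the exponents in the solution imply, 4 independent at-bats per game and independent games; the variable names are ours.

```python
p_hit_at_bat = 0.357                  # chance of a hit in one at-bat
p_out_at_bat = 1 - p_hit_at_bat       # chance of no hit in one at-bat
p_hitless_game = p_out_at_bat ** 4    # no hit in any of 4 at-bats (assumed count)
p_hit_game = 1 - p_hitless_game       # at least one hit in a game
p_streak = p_hit_game ** 56           # a hit in each of 56 specified games

print(round(p_hitless_game, 4), round(p_hit_game, 4))
print(p_streak)
```

The final value is on the order of 0.00003; it applies to one specified block of 56 games, which is why part (c) says the chance of a streak somewhere in the season is larger.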
11.5.15
11.6.10
11.5.16 11.5.17
a. X follows a binomial distribution with n = 4 and π = 0.51.
11.5.18 a. If we let S be a score, then the first method is S/2 – 1 and the second method is (S – 1)/2 = S/2 – 1/2. We can see by our equations that the second method will give scores (and thus a mean) that is a half point higher than the first method.
11.5.19
Section 11.6
x       0        1        2        3        4
p(x)    0.0576   0.2400   0.3747   0.2600   0.0677
b. X follows a binomial distribution with n = 4 and π = 0.49
b. Because both methods divide by 2 (or multiply by 1/2) they will give identical standard deviations (when the shift occurs does not impact the standard deviation). 11.5.20
x       0        1        2        3        4
p(x)    0.0677   0.2600   0.3747   0.2400   0.0576
11.6.11 The mean number of boys is 0(0.0576) + 1(0.2400) + 2(0.3747) + 3(0.2600) + 4(0.0677) = 4(0.51) = 2.04 and the SD is √(4(0.51)(0.49)) ≈ 0.9998. The mean number of girls is 1.96 and the SD is 0.9998.
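The table in 11.6.10(a) and the mean and SD in 11.6.11 can be regenerated from the binomial formula. A minimal sketch using only the standard library; `binom_pmf` is our own helper name.

```python
from math import comb, sqrt

def binom_pmf(n: int, p: float) -> list:
    # P(X = k) for k = 0..n under a binomial(n, p) model.
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 4, 0.51
pmf = binom_pmf(n, p)

# Mean and SD computed from the table, to compare with n*p and sqrt(n*p*(1-p)).
mean = sum(k * pk for k, pk in enumerate(pmf))
sd = sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)))

print([round(pk, 4) for pk in pmf])
print(round(mean, 2), round(sd, 4))
```

Calling `binom_pmf(4, 0.49)` gives the mirrored table in part (b).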
11.6.1 A.
11.6.12
11.6.2 C.
a. Let X represent the number of no-shows. X follows a binomial probability distribution with n = 44 and π = 0.05. We want to find P(X < 2) = P(X ≤ 1) = 0.3471.
11.6.3 In C, the probability of a yes response presumably differs in the two locations. 11.6.4 C.
b. P(X ≥ 2) = 0.6529 c. E(X) = 44(0.95) = 41.8 passengers
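The no-show probabilities in 11.6.12 can be reproduced by summing the binomial formula directly. The helper `binom_cdf` below is a hypothetical name introduced for this sketch; it is not a library function.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p_no_show = 44, 0.05

p_at_most_1 = binom_cdf(1, n, p_no_show)   # part a: P(X < 2) = P(X <= 1)
p_at_least_2 = 1 - p_at_most_1             # part b: complement
expected_show = n * (1 - p_no_show)        # part c: expected passengers who show up

print(round(p_at_most_1, 4), round(p_at_least_2, 4), round(expected_show, 1))
```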
11.6.13
11.7.3 D.
a. Let X represent the number of left-handers. X follows a geometric distribution with π = 0.10. We want P(X = 5) = $(0.9)^4(0.1)$ = 0.06561.
11.7.4 B.
b. P(X ≤ 5) = 0.40951
11.7.6 E.
c. We want to find k so P(X ≤ k) is greater than or equal to 0.5. P(X ≤ 7) = 0.5217, so need 7 picks. 11.6.14 a. Let X represent the number of winners in 3 cups. X follows a binomial distribution with n = 3 and π = 0.1667. P(X = 1) = 0.3473. b. 1 − P(X = 0) = 0.4214 c. Let Y represent the number of cups before the first winner. Y follows a geometric distribution with π = 0.1667. P(Y = 3) = 0.1158. d. P(Y ≤ 3) = 0.4214 11.6.15
11.7.5 A. 11.7.7 a. 95% b. 2.5% c. 2.5% 11.7.8 a. 68% b. 95% c. 95%/2 = 47.5% d. 47.5% + 34% = 81.5% 11.7.9
a. 0.4219
a. 0.0228 or about 2.5%
b. 0.6836
b. 0.0646
c. 0.1055
c. 119
d. 0.6836 e. No, a 1 in 4 chance does not guarantee that one in every four boxes is a winner. It means that if you buy infinitely many boxes, the proportion of winners will approach 0.25. Though the chance of a winner in four boxes is 0.6836, there is still about a 32% chance you won’t receive a winner in the first four boxes. 11.6.16 a. E(X) = 4(0.25) = 1
b. SD(X) = $\sqrt{4(0.25)(0.75)}$ = 0.8660 c. 1/0.25 = 4 11.6.17 a. 0.4043 b. 0.7293 c. 0.0809 d. 11.6.18 a. 1.15 b. 4.348
11.7.10 a. 0.00135 b. 130.8 and above c. No 11.7.11 a. About 0.0131 b. About 2,760 g and below 11.7.12 a. 0.01066 b. About 4,547 g and above 11.7.13 11.7.14 a. 82nd (.8233) b. 1,339.1 or above 11.7.15
11.6.19
a. 95th
11.6.20
11.7.16
a. 18/38
a. 0.9
b. 0.2762, 0.9233
b. 66.3″
c. 0.9233
11.7.17
11.6.21 X takes the values 1 (with probability 18/38) and –1 (with probability 20/38). E(X) = 18/38 – 20/38 = –2/38. You can expect to lose about $0.0526 per game in the long run.
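The expected value 18/38 – 20/38 can be evaluated exactly with fractions. This sketch assumes the two outcomes are +1 and –1 dollars, which is inferred from the form of the calculation; the dictionary name is illustrative.

```python
from fractions import Fraction

# Assumed outcomes: +1 with probability 18/38, -1 with probability 20/38
outcomes = {1: Fraction(18, 38), -1: Fraction(20, 38)}

# E(X) is the probability-weighted sum of the outcomes
expected = sum(x * p for x, p in outcomes.items())

print(expected, round(float(expected), 4))
```

Keeping the result as a fraction shows that the expected loss is exactly 2/38 = 1/19 of a dollar per game.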
a. 0
11.6.22 a.
i. 0.2373 ii. 0.9990
b.
i. 0.0117 ii. P(X < 4) = 0.9844
Section 11.7 11.7.1 B. 11.7.2 D.
b. 27.92 or above
b. 0.1265 c. P(X = 3) = $10(0.1265)^3(0.8735)^2$ = 0.0154 d. P(X ≥ 3) = 0.0166 11.7.18 a. 0.1841 b. P(X = 3) = $4(0.1841)^3(0.8159)^1$ = 0.0204 c. P(X = 3) + P(X = 4) = 0.0204 + $(0.1841)^4$ = 0.0215 11.7.19 a. Mean of 133.1″, SD of 4.1″ b. Mean of 266.2″, SD of 5.8″
c. Mean of 335.5″, SD of 6.5″
11.8.15
11.7.20
a. Answers will vary, but it should be close to 0.058. See figure for Solution 11.8.15a.
a. 0.3821 b. 38th
b. 0.0579
Section 11.8
11.8.16
11.8.1 B.
a. 0.8491
11.8.2 A.
b. 0.9982
11.8.3 B.
11.8.17
11.8.4 C.
a. 0.8342
11.8.5 B.
b. 5
11.8.6
11.8.18
11.8.7
a. 0.1265
11.8.8
b. 0.0053 (with SD = 1.252)
11.8.9
11.8.19
11.8.10
a. 0.1841
a. 0.7458
b. 0.0359 (with SD = 1.5)
b. Assuming a larger percentage, the distribution will shift to the right. Then the peak of the distribution will no longer be in the middle of our interval, so the area under the curve between 0.17 and 0.21 will decrease (assuming not much change in SD).
11.8.20
c. 0.0535
11.8.11
a. Kayla makes a larger proportion of her shots compared to Jose. b. Yes, for each player we expect more than 10 successes (60, 70) and more than 10 failures (40, 30). c. Mean = 0.10; SD = 0.067
a. 0.4564 b. The probability is smaller because the region of interest is no longer centered at the mean of the distribution. 11.8.12
d. –1.49 e. 0.932 11.8.21 a. It would increase.
a. 0.5708
b. It would increase
b. 0.8253 c. The probability in part (b) is larger because the interval of interest is still centered at the mean of the distribution but is now wider. 11.8.13 a. 0.2525 b. 0.0175 (with SD = 4.743) 11.8.14
11.8.22 a. The mean time for Karen to grade the questions is larger than that for John. b. Yes, because the times for each individual follow a normal distribution.
c. Mean = –30 seconds; SD = 11.40 seconds d. 2.63
a. 0.1333 b. 0.0065 (with SD = 201.25 g)
e. 0.004
Solution 11.8.15a
11.8.23 Let $\bar{X}_k$ represent the sample mean time (in seconds) for Karen to grade 5 exams and let $\bar{X}_j$ represent the sample mean time (in seconds) for John to grade 4 exams. Then $\bar{X}_k$ has a normal distribution with mean 90 and SD $20/\sqrt{5} \approx 8.944$, and $\bar{X}_j$ has a normal distribution with mean 120 and SD $30/\sqrt{4} = 15$. Then $5\bar{X}_k$ is the total time for Karen to grade 5 exams, which has a normal distribution with mean 5(90) = 450 and SD 5(8.944) = 44.721. Also $4\bar{X}_j$ is the total time for John to grade 4 exams, which has a normal distribution with mean 4(120) = 480 and SD 4(15) = 60. We want $P(5\bar{X}_k < 4\bar{X}_j)$, which is also $P(5\bar{X}_k - 4\bar{X}_j < 0)$. Now, $5\bar{X}_k - 4\bar{X}_j$ has a normal distribution with mean 450 – 480 = –30 and SD $\sqrt{44.721^2 + 60^2} \approx 74.833$. With that distribution, $P(5\bar{X}_k - 4\bar{X}_j < 0) \approx 0.656$.
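The steps in 11.8.23 can be followed numerically with Python's standard `statistics.NormalDist`. This is a sketch under the same assumption the solution makes when it adds the variances, namely that Karen's and John's grading times are independent; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Per-exam means and SDs (seconds) and exam counts, as used in Solution 11.8.23
karen_mean, karen_sd, karen_n = 90, 20, 5
john_mean, john_sd, john_n = 120, 30, 4

# SD of each total, following the solution: n times the SD of the sample mean
karen_total_sd = karen_n * karen_sd / sqrt(karen_n)   # about 44.721
john_total_sd = john_n * john_sd / sqrt(john_n)       # 60

# Difference of totals (Karen minus John); variances add under independence
diff_mean = karen_n * karen_mean - john_n * john_mean   # -30
diff_sd = sqrt(karen_total_sd**2 + john_total_sd**2)    # about 74.833

# Probability that Karen's total time is less than John's
p_karen_faster = NormalDist(diff_mean, diff_sd).cdf(0)

print(round(diff_sd, 3), round(p_karen_faster, 3))
```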
             1st Red   1st Green   Total
2nd Red      0.25      0.10        0.35
2nd Green    0.20      0.45        0.65
Total        0.45      0.55        1.00
b. The complement to at least one red light is no red lights (or both green). The probability is 0.45. 11.CE.8 a. No, you have to subtract off the intersection (the proportion of homes that own both) from the sum. b. 36.5%
End of Chapter 11 Exercises
c. All the cat owners were also dog owners.
11.CE.1 C.
11.CE.9
11.CE.2 A.
a. 0.53
11.CE.3 A.
b. 0.47
11.CE.4 D.
c. 0.17/0.42 = 0.405
11.CE.5 Each index card would represent a different prize. Shuffle the cards and randomly draw one. Note which prize it is. Repeat this process until you have drawn all four prizes. The number of drawings it took you to get all four prizes is a value in the distribution. Repeat this process many, many times (like 1,000 or 10,000 total times) to get other values in the distribution. An estimate for the probability that it would take at least 10 purchases to get all four prizes is the proportion of numbers in your distribution that are 10 or more.
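The index-card procedure in 11.CE.5 can also be run on a computer. The sketch below assumes the four prizes are equally likely and that the drawn card goes back into the deck before the next shuffle; the function names, the seed, and the 10,000 repetitions are illustrative choices.

```python
import random

def draws_to_collect_all(num_prizes, rng):
    """Draw prizes at random (with replacement) until every prize has appeared."""
    seen = set()
    draws = 0
    while len(seen) < num_prizes:
        seen.add(rng.randrange(num_prizes))  # one shuffled draw
        draws += 1
    return draws

def estimate_at_least(threshold=10, reps=10_000, seed=1):
    """Proportion of simulated collections needing at least `threshold` draws."""
    rng = random.Random(seed)
    results = [draws_to_collect_all(4, rng) for _ in range(reps)]
    return sum(r >= threshold for r in results) / reps

estimate = estimate_at_least()
print(estimate)
```

Each call to `draws_to_collect_all` produces one value in the distribution, and the estimate is the proportion of those values that are 10 or more.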
d. No, because P(Speed|Park) = 0.405 ≠ P(Speed) = 0.28
11.CE.6 a. B1B2B3B4, G1B2B3B4, B1G2B3B4, B1B2G3B4, B1B2B3G4, G1G2B3B4, G1B2G3B4, G1B2B3G4, B1G2G3B4, B1G2B3G4, B1B2G3G4, B1G2G3G4, G1B2G3G4, G1G2B3G4, G1G2G3B4, G1G2G3G4
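e. No, because P(Speed and Park) ≠ 0 f. P(Both|At Least One) = 0.17/0.53 = 0.321 11.CE.10 a. P(Rain Sat) = 0.20, P(Rain Sun|Rain Sat) = 0.80, P(Rain Sun|Rain Sat$^c$) = 0.10 b. See tree diagram below
[Tree diagram: Sat Rain (0.20) branches to Sun Rain (0.80) and Sun No Rain (0.20); Sat No Rain (0.80) branches to Sun Rain (0.10) and Sun No Rain (0.90)]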
b. 2/16 = 1/8
c. A 3 to 1 breakdown is more likely because this probability is 8/16 whereas two boys and two girls is 6/16.
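The counts behind 11.CE.6 can be confirmed by listing all 16 outcomes. In this sketch the count of 2 is for families with all boys or all girls, which is an assumed reading of part (b); the variable names are illustrative.

```python
from itertools import product

# All 2^4 equally likely boy/girl sequences for four children
outcomes = list(product("BG", repeat=4))

def count_boys(outcome):
    """Number of boys in one four-child sequence."""
    return outcome.count("B")

all_same = sum(count_boys(o) in (0, 4) for o in outcomes)    # all boys or all girls
three_one = sum(count_boys(o) in (1, 3) for o in outcomes)   # 3 to 1 breakdown
two_two = sum(count_boys(o) == 2 for o in outcomes)          # two boys, two girls

print(len(outcomes), all_same, three_one, two_two)  # prints: 16 2 8 6
```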
11.CE.7
[Venn diagram: 1st Red only 0.20, both red 0.25, 2nd Red only 0.10, neither red 0.45]
a. P(1st Red or 2nd Red) = P(1st Red) + P(2nd Red) – P(Both Red) = 0.45 + 0.35 – 0.25 = 0.55 by the addition rule. You can also see this in the Venn diagram and table by adding up 0.20, 0.25, and 0.10.
c. 0.20 × 0.80 = 0.16 d. 0.16 + 0.08 = 0.24 e. P(Rain Sat|Rain Sun) = 0.16/0.24 = 0.667
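Parts (c) through (e) of 11.CE.10 follow the multiplication rule, the total probability of rain on Sunday, and then the definition of conditional probability. A minimal sketch, with illustrative variable names:

```python
# Given probabilities from part (a)
p_rain_sat = 0.20
p_rain_sun_given_rain_sat = 0.80
p_rain_sun_given_dry_sat = 0.10

# Part c: rain on both days (multiplication rule)
p_both = p_rain_sat * p_rain_sun_given_rain_sat

# Part d: rain on Sunday, summed over both Saturday branches
p_dry_then_rain = (1 - p_rain_sat) * p_rain_sun_given_dry_sat
p_rain_sun = p_both + p_dry_then_rain

# Part e: P(Rain Sat | Rain Sun) by the definition of conditional probability
p_rain_sat_given_rain_sun = p_both / p_rain_sun

print(round(p_both, 2), round(p_rain_sun, 2), round(p_rain_sat_given_rain_sun, 3))
```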