SOLUTIONS MANUAL
CHAPTER 1
Sources of Variation Section 1.1
1.1.1 B.
1.1.2 B & C.
1.1.3 A.
1.1.4 C.
1.1.5 E.
1.1.6 B.
1.1.7 predicted number of uses for items = {60.34 if rigid librarian; 92.19 if eccentric poet}.
1.1.8
a. The inclusion criteria are having a clinical diagnosis of mild to moderate depression without any treatment four weeks prior to and during the study.
b. The purpose of randomly assigning subjects to the groups is to make the groups very similar except for the one variable (swimming with dolphins or not) that the researchers impose. Volunteering for a group could introduce a confounding variable.
c. It was important that the subjects in the control group swim every day without dolphins so that this control group does everything (including swimming) that the experimental group does, except that when they swim they don’t do it in the presence of dolphins. Without this we wouldn’t know whether just swimming causes the difference in the reduction of depression symptoms.
d. Yes, this is an experiment because the subjects were randomly assigned to the two groups.
1.1.9
Observed variation in: d. substantial reduction in depression symptoms
Sources of explained variation: a. swimming with dolphins or not
Inclusion criteria:
• b. mild to moderate depression
• c. no use of antidepressant drugs or psychotherapy four weeks prior to the study
Design:
• e. swimming
• f. staying on an island for two weeks during the study
Sources of unexplained variation:
• g. problems in the personal lives of the subjects during the study
• h. illness of subjects during the study
1.1.10 Color of a sign is the explanatory variable, with white, yellow, and red being the levels.
1.1.11
Observed variation in: f. whether the student obeyed the sign
Inclusion criteria:
• c. time of day
• e. age of subject
Sources of explained variation: a. color of the sign
Sources of unexplained variation:
• b. whether the subject was left-handed or right-handed
• d. attitude of student
• e. age of subject
1.1.12
a. The value 6.21 represents the overall mean quiz score, 5.50 represents the group mean quiz score for people who used computer notes, and 6.92 represents the group mean score for people who used paper notes.
b. We look to see how far 6.92 and 5.50 are from one another or from the overall mean of 6.21 to determine whether the note-taking method might affect the score.
c. The number 1.76 represents the typical deviation of an observation from the expected value, in this case, from the overall mean. The number 1.61 represents the typical deviation of an observation after creating a model that takes into account whether the person is using computer or paper notes.
d. Because the standard deviation of the residuals represents the leftover variation, we can see that after including the type of notes as an explanatory variable in our model the unexplained variation has been reduced (down to 1.61 from 1.76). This tells us that knowing the type of note-taking method enables us to better predict scores.
1.1.13 Random assignment should make the two groups very similar with regard to variables like intelligence, previous knowledge, or any other variable and thus likely eliminate possible confounding variables.
1.1.14
a. This table shows us possible confounding variables but then shows that subjects in the two groups are quite similar with regard to these characteristics, thus ruling out these possible confounding variables.
b. We would want the p-values to be large, so we could say that we have little to no evidence that there is a difference in mean age, proportion of males, etc. between the two groups. We want our groups to be very similar going into the study, so a causal conclusion is possible if we find a small p-value after applying the treatment(s).
c01InstructorSolutions.indd 3
16/10/20 6:49 PM
1.1.15 It is likely that 3- to 5-year-olds might have different preferences when it comes to toy or candy than 12- to 14-year-olds. The older group is probably much more likely to prefer the candy over the toy and the opposite could be true with the younger group. We would not see this difference if the results of all the ages are combined together.
Section 1.2
1.2.1 B. 1.2.2 A, D. 1.2.3 C. 1.2.4 A. 1.2.5 C. 1.2.6 D. 1.2.7 B.
1.2.8 Using the effects model, because 4.48 + 0.65 = 5.13 (the mean of the scent group) and 4.48 − 0.65 = 3.83 (the mean of the non-scent group), the models are equivalent.
1.2.9 a. SSModel. b. SSError.
1.2.10 a. R² = SSModel/SSTotal = 0.4651. b. R² = 1 − SSError/SSTotal = 0.7111.
1.2.11 a. 8. b. 6 − 8 = −2, 10 − 8 = 2. c. 74. d. 40. e. 34. f. 0.5405.
1.2.12
a. The explanatory variable is the type of testing environment; it is categorical.
b. The response variable is the test score; it is quantitative.
c. The two levels are quiet environment and distracting environment.
1.2.13
a. SSTotal would probably be larger with these 10 subjects because with the wide variety of ages there would probably be more variability in the test scores.
b. SSModel would probably be the same because it would still represent the difference between testing environments.
c. SSError would probably be larger because there would probably be more variability in the test scores within each group due to the variability in ages.
1.2.14 The variance of the scores in one testing environment is 2.5 and the variance of the scores in the other is 6. The square root of the average of these two variances is √4.25 = 2.06. The SSError is 34, so the standard error of the residuals is √(34/8) = 2.06.
1.2.15
a. The explanatory variable is whether the name of the hurricane is male or female and the response is the perceived risk level.
b. The effect of naming the hurricane Christina is 5.01 − 5.29 = −0.28 and the effect of naming the hurricane Christopher is 5.57 − 5.29 = 0.28. The SSModel is 142(0.28²) = 11.1328.
c. R² = 11.1328/199.62 = 0.0558. We can interpret this by saying that 5.58% of the variation in the perceived level of risk is explained by whether the name of the hurricane is male or female.
d. SSError = 199.62 − 11.13 = 188.49.
e. √(188.4872/140) = 1.16.
f. predicted hurricane risk rating = 5.29 + {0.28 if male name; −0.28 if female name}, SE of residuals = 1.16.
1.2.16
a. The explanatory variable is the note-taking method and the response variable is the quiz score.
b. The effect of taking notes on paper is 0.71 and the effect of taking notes on the computer is −0.71.
c. SSModel = 40 × (0.71²) = 20.164.
d. R² = 20.164/120.92 = 0.16675. We can interpret it by saying that 16.675% of the variation of quiz score is explained by the note-taking method.
e. 120.92 − 20.164 = 100.756.
f. √(100.756/38) = 1.628.
g. predicted quiz score = 6.21 + {0.71 if using paper notes; −0.71 if using computer notes}.
1.2.17
a. Because the sample sizes of each group are the same, the sample size of each group is just half of the total sample size.
b. The average of the two group variances is
$$\frac{1}{2}\left[\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2}{\frac{n}{2}-1}+\frac{\sum_{\text{all obs}}(y_i-\bar{y})^2}{\frac{n}{2}-1}\right]=\frac{1}{2}\left[\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{\frac{n}{2}-1}\right]=\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{n-2}.$$
Taking the square root, we get
$$\sqrt{\frac{\sum_{\text{all obs}}(x_i-\bar{x})^2+\sum_{\text{all obs}}(y_i-\bar{y})^2}{n-2}}.$$
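The sums-of-squares arithmetic in 1.2.15 can be checked with a short Python sketch. The inputs (142 observations, an effect of 0.28, SSTotal = 199.62) are taken from that solution; the function and variable names are illustrative assumptions, and the sketch assumes two equal-sized groups so that SSModel = n × effect².

```python
import math

def two_group_summary(n_total, effect, ss_total):
    """Effects-model summary for two groups of equal size (an assumption)."""
    ss_model = n_total * effect ** 2                 # every observation contributes effect^2
    ss_error = ss_total - ss_model                   # leftover (unexplained) variation
    r_squared = ss_model / ss_total                  # proportion of variation explained
    se_resid = math.sqrt(ss_error / (n_total - 2))   # two group means were estimated
    return ss_model, ss_error, r_squared, se_resid

ss_model, ss_error, r2, se = two_group_summary(142, 0.28, 199.62)
print(round(ss_model, 4), round(ss_error, 4), round(r2, 4), round(se, 2))
```

Calling the same function with the values from 1.2.16 (40, 0.71, 120.92) gives the corresponding quantities for the note-taking study.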
Section 1.3
1.3.1 D. 1.3.2 A. 1.3.3 D. 1.3.4 A. 1.3.5 A.
1.3.6 The validity conditions are not met because the male sample size is small and the distribution of the number of flip-flops owned by the males is quite skewed to the right.
1.3.7
a. √((24.38² + 36.99²)/2) = 31.33.
b. t = (92.16 − 60.34)/(31.33 × √(1/32 + 1/32)) = 4.06.
c. Yes, there is strong evidence that average creativity is different between “rigid librarians” and “eccentric poets” because the t-statistic is larger than 2.
1.3.8
a. √((24.24² + 38.78²)/2) = 32.34.
b. t = (69.97 − 85.71)/(32.34 × √(1/24 + 1/24)) = −1.69.
c. There is not strong evidence that the average creativity measure is different between biology and theater majors because the absolute value of the t-statistic is smaller than 2.
1.3.9 Yes, there is strong evidence that the long-run average game duration differs between replacement and regular referees because the difference in mean game length is 8.03 minutes and that value is far out in the right tail of the null distribution.
1.3.10
a. t = (196.50 − 188.47)/(14.47 × √(1/43 + 1/48)) = 2.64.
b. Yes, there is strong evidence that the long-run average game duration differs between replacement and regular referees because the t-statistic is larger than 2.
1.3.11
a. We would need 10 cards.
b. We would write the 10 scores on the cards.
c. After the cards are shuffled, randomly sort them into two piles of 5, labeling one pile D and the other pile Q. Calculate the mean of the numbers on the cards in each pile and find and record the difference in means (e.g., D − Q). Repeat this process many, many times to construct a null distribution of the difference in means.
1.3.12
a. Christopher mean x̄_Christopher = 5.57, Christina mean x̄_Christina = 5.01, so Christopher tends to be perceived as the riskier name.
b. predicted hurricane risk = 5.29 + {−0.28 if Christina; 0.28 if Christopher}, SE of residuals = 1.16.
c. Let μChristopher be the population average risk rating for hurricanes given the name Christopher, and similarly for μChristina. The hypotheses are H0: μChristopher − μChristina = 0, that is, mean perceived risk ratings are the same regardless of whether the hurricane is named Christopher or Christina, versus Ha: μChristopher − μChristina ≠ 0, that is, mean perceived risk ratings differ based on whether the hurricane is named Christopher or Christina.
d. The applet shows t = 2.87. Because the t-statistic is greater than 2, it looks like the difference in observed mean perceived risk ratings is statistically significant.
e. The t-statistic is far out in the right tail of the simulated null distribution.
f. Simulation-based p-value ≈ 0.006; theory-based p-value = 0.0048.
g. We have very strong evidence that the perceived hurricane threat for the name Christopher is different (more specifically, larger) than the perceived hurricane threat for the name Christina.
1.3.13
a. We are 95% confident that the mean perceived threat rating for the name Christopher is between 0.1747 and 0.9450 points higher than that for the name Christina, in the long run.
b. Yes, because the entire interval (for Christopher minus Christina) is positive it shows the observed mean rating for Christopher is statistically significantly larger than that for Christina.
1.3.14
a. The paper method mean is 6.92 points and the computer method mean is 5.50 points, so the paper method tends to give a higher score.
b. predicted quiz score = 6.21 + {−0.71 if computer; 0.71 if paper}, SE of residuals = 1.63.
c. Let μcomputer be the population mean quiz score when notes are taken using a computer, and similarly for μpaper. The hypotheses are H0: μcomputer − μpaper = 0, that is, the long-run mean scores will be the same for both methods of note taking, versus Ha: μcomputer − μpaper ≠ 0, that is, the mean scores will not be the same for the two methods of note taking.
d. t = (6.92 − 5.50)/(1.63 × √(1/20 + 1/20)) = 2.76. Because this t-statistic is greater than 2, it appears there is a statistically significant difference in the mean quiz scores between the two studying methods.
e. The t-statistic is far in the right tail of the null distribution.
f. Simulation-based p-value ≈ 0.006; theory-based p-value = 0.0086.
g. We have very strong evidence that there is a difference in the mean scores on this quiz between taking notes on computer and paper, with the paper method having a higher mean score in the long run.
1.3.15
a. We are 95% confident that the mean score for the paper note-taking method is between 0.3832 and 2.4668 points higher than the computer note-taking method in the long run.
b. Yes. Because the interval is completely positive we have evidence that in the long run the paper-based method population mean is larger than the computer-based method population mean.
1.3.16
a. Let μMusicYes be the population mean memory score when people are listening to music and similarly for μMusicNo. The hypotheses are H0: μMusicYes − μMusicNo = 0, that is, mean memory scores will be the same regardless of whether or not people are listening to music, versus Ha: μMusicYes − μMusicNo < 0, that is, mean memory scores will be lower for people who are listening to music compared to those who aren’t.
b. There is a lot of overlap between the distributions of the scores in the two groups. It looks like the difference in sample means might not be significant.
c. t = −1.28. With |t| < 2, there does not appear to be a statistically significant difference in the mean scores between the two groups.
d. The t-statistic is not in the tail of the distribution.
e. Simulation-based p-value ≈ 0.111; theory-based p-value = 0.1046.
f. We do not have strong evidence that listening to music tends to hinder people’s abilities to memorize words.
1.3.17
a. Whereas t-statistics and differences in means can be positive or negative, the values of R² are never negative. The larger the value of R², the bigger the difference between the two samples. Therefore, when we want to find R² values that are as extreme as our observed, we always look at those that are equal to or larger than the observed R².
b. Using R² as the statistic automatically does a two-sided test even though we are looking just in one direction. Therefore, the p-value is about twice as large as it should be for testing whether music tends to hinder people’s ability to memorize, and we should divide it by 2.
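The pooled standard deviation and t-statistic computed by hand in 1.3.7 can be reproduced with a few lines of Python. This is a sketch under the equal-sample-size assumption used in that solution (32 per group); the function name and argument order are illustrative, and the result does not depend on which SD belongs to which group.

```python
import math

def pooled_t(mean1, mean2, sd1, sd2, n):
    """t-statistic for two groups that each have n observations."""
    # With equal group sizes the pooled variance is the plain average of the two variances.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    se_diff = pooled_sd * math.sqrt(1 / n + 1 / n)   # standard error of the difference in means
    return pooled_sd, (mean1 - mean2) / se_diff

pooled_sd, t = pooled_t(92.16, 60.34, 36.99, 24.38, 32)
print(round(pooled_sd, 2), round(t, 2))
```

The same helper applied to the values in 1.3.8 (69.97, 85.71, 24.24, 38.78, 24) gives the negative t-statistic reported there.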
1.3.18
a. Let μneutral be the population average amount of chili sauce used by those who play the neutral video game and similarly for μviolent. The hypotheses are H0: μneutral − μviolent = 0, that is, in the long run the average amount of chili sauce used will be the same regardless of which video game is played, versus Ha: μneutral − μviolent < 0, that is, those who play the neutral video game will select less chili sauce on average than those who play the violent video game.
b. Yes, the violent condition has some very large chili sauce amounts compared to the neutral condition and its mean is 16.12 vs. a mean of 9.06 for the neutral group.
c. t = −2.96. Because |t| > 2 there appears to be a significant difference in the amount of chili sauce used by the two groups.
d. The observed t-statistic is far out in the left tail.
e. Simulation-based p-value ≈ 0.004; theory-based p-value = 0.0019.
f. We have very strong evidence that people tend to put more chili sauce into the recipe (and thus be more aggressive) after they play a violent video game than when they play a non-violent one.
1.3.19
a. The SD should be around 0.37, which is a bit larger than 0.32.
b.
i. x̄_noscent = 4.52; x̄_scent − x̄_noscent = 0.04.
ii. x̄_noscent = 3.96; x̄_scent − x̄_noscent = 1.04.
iii. If the mean of the scent group is unusually large, the mean of the no scent group should be unusually small and the difference in means should be unusually large.
c. If we are forcing some of the simulated differences in means to be unusually large (either positive or negative), we are making the variability of the null distribution (or the SD of the null) a bit larger than it should be compared to what we should get when we are sampling from independent populations.
d. The SD should be around 0.31, which is very close to 0.32.
e.
i. Through shuffling, you should get two groups that are typically quite similar and hence should have similar means, on average. The difference in these two similar means should then be zero, on average. Therefore, this type of null distribution should be centered on zero.
ii. If we are sampling from two independent populations, we should get two means that are typically close to the two population means. Because our sample means are being used as the estimates for the population means, on average, we should get our two sample means back when we resample. The difference in these should be the difference in our two sample means, on average, or 1.292.
1.3.20
a. Only one combination would produce a result as extreme as −83.77: placing the nine largest times in one group and the nine smallest times in the other group.
b. C(18,9) = 18!/(9!)² = 48,620.
c. 1/48,620 ≈ 0.0000206.
d. The simulation-based, theory-based, and exact p-values are all quite similar as the p-values are all extremely small.
Section 1.4
1.4.1 C. 1.4.2 E. 1.4.3 B. 1.4.4 D. 1.4.5 A. 1.4.6 D.
1.4.7 a. C. b. A. d. B. e. A.
1.4.8 B. 1.4.9 A. 1.4.10 B. 1.4.11 B. 1.4.12 C.
1.4.13
a. The F-statistic will increase and the p-value will decrease.
b. The F-statistic will decrease and the p-value will increase.
1.4.14 a. 4. b. 93. c. 0.018. d. 0.536.
1.4.15 The F-statistic is much larger than 4, so there is strong evidence that the groups are significantly different.
Source of Variation   DF   Sum of Squares   Mean Squares   F
Model                  2        35.05           17.53      10.01
Error                 54        94.53            1.75
Total                 56       129.58           19.28
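The mean squares and F-statistic in the 1.4.15 ANOVA table follow directly from the sums of squares and degrees of freedom. A minimal Python sketch, using the table's values as inputs (the function name is an illustrative assumption):

```python
def anova_f(ss_model, df_model, ss_error, df_error):
    """Return (MSModel, MSError, F) from sums of squares and degrees of freedom."""
    ms_model = ss_model / df_model   # variability between groups, per degree of freedom
    ms_error = ss_error / df_error   # variability within groups, per degree of freedom
    return ms_model, ms_error, ms_model / ms_error

ms_model, ms_error, f_stat = anova_f(35.05, 2, 94.53, 54)
print(ms_model, round(ms_error, 2), round(f_stat, 2))
```

The same call with the 1.4.16 entries (227.63, 2, 1253.26, 78) recovers that table's F-statistic as well.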
1.4.16 a. There were 3 groups. b. The total sample size was 81.
Source   DF   Sums of squares   Mean squares   F
Model     2        227.63          113.81      7.08
Error    78      1,253.26           16.07
Total    80      1,480.89          129.88
1.4.17
a. The response variable is the amount of money spent on meals and the explanatory variable is the type of music playing. The experimental units are the customers eating at the restaurant during the study.
b. To compute the effects, we compare the group means to the LS mean: (21.69 + 21.91 + 24.13)/3 = 22.576. The effect for no music is −£0.886, for pop music is −£0.666, and for classical music is £1.554. These numbers tell us how much each group mean is above or below the overall mean.
c. predicted amount of money spent = £22.58 + {−£0.886 if no music; −£0.666 if pop music; £1.554 if classical music}.
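The effects in 1.4.17 are each group mean minus the least squares (LS) mean, which is the unweighted average of the group means. The sketch below recomputes them from the three group means given above; the dictionary keys and variable names are illustrative. Because it keeps the LS mean unrounded, the last digit can differ by 0.001 from the hand calculation, which rounds the LS mean to 22.576 first.

```python
group_means = {"no music": 21.69, "pop music": 21.91, "classical music": 24.13}

# LS mean: plain average of the group means, ignoring group sizes
ls_mean = sum(group_means.values()) / len(group_means)

# Effect of each group: how far its mean sits above or below the LS mean
effects = {group: mean - ls_mean for group, mean in group_means.items()}

for group, effect in effects.items():
    print(f"{group}: {effect:+.3f}")
```

By construction the effects sum to zero, which is a quick check on the arithmetic.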
1.4.18
a. To compute the sum of squares for the model, we compare the group means to the overall mean: SSModel = 131(21.69 − 22.52)² + 142(21.91 − 22.52)² + 120(24.13 − 22.52)² = 454.14 (computer: 451.95); this is a measure of variability between the groups.
b. SSTotal = SSModel + SSError = 454.14 + 3167.62 = 3,621.74 (computer: 3619.57).
c. R² = 454.14/3,621.74 = 0.125; 12.5% of the variation in spending can be attributed to the type of music playing.
d. F = (454.14/2)/(3,167.6/390) = 28.0 (computer: 27.82). This is the ratio of the variation between the groups to the variation within the groups. Because the F-statistic is much larger than 4, these results are statistically significant.
1.4.19
a. Let μn, μp, and μc represent the population average amount spent by diners at this restaurant when listening to no, popular, or classical music, respectively. The hypotheses are H0: μn = μp = μc versus Ha: At least one μ differs from the others.
b. The validity conditions are met because the groups are independent, the sample distributions are fairly symmetric, the sample sizes are each very large, and the SDs are all close to each other, easily within a factor of 2.
c. F = 27.822.
d. Both simulation-based and theory-based p-values are about 0.
e. We have strong evidence that at least one population mean amount differs from the others or that the type of music played has an effect on the amount of money spent at the restaurant.
f. We can make a cause-and-effect conclusion because this was an experiment. We can probably generalize to restaurants like the one that was used with customers like those involved in the experiment. It would be difficult to generalize much beyond that.
1.4.20
a. The response variable is the number of uses generated for the items. The explanatory variable is whether they imagined themselves as rigid librarians, eccentric poets, or neither. The experimental units are the 96 subjects involved in the experiment.
b. The effect is −16.45 for the rigid librarians, 15.37 for the eccentric poets, and 1.09 for the control group. These numbers tell us how much each group mean is above or below the overall mean of 76.79.
c. predicted number of uses = 76.79 + {−16.45 if rigid librarian; 15.37 if eccentric poet; 1.09 if control}.
1.4.21
a. SSModel = 32(16.45²) + 32(15.37²) + 32(1.09²) = 16,256.88; this is a measure of the variability between the groups.
b. SSTotal = SSModel + SSError = 109,240.9.
c. R² = SSModel/SSTotal = 0.149. This tells us that 15% of the variation in the number of uses for the items can be explained by what the subject imagines themselves as.
d. F = (16,256.88/2)/(92,984.01/93) = 8.13. This is the ratio of the variation between the groups to the variation within the groups. Because this F-statistic is much larger than 4, we have very strong evidence that at least one of the population mean number of uses for these items is different from the others. (More specifically, the average number of uses for those who imagine themselves eccentric poets is larger than the averages for the librarian and control groups.)
1.4.22
a. From the graphs in the applets, the means of the groups appear roughly the same and there is a lot of overlap between the four groups so there does not appear to be strong evidence that at least one group mean differs from the rest.
b. F = 0.536, R² = 0.018. Although R² is very small, it is not a standardized statistic so we can best see that there are not significant results based on the F-statistic, which is much smaller than 4.
c. The simulated p-value using either the F-statistic or R² is about 0.66. This confirms that there is not much evidence of a difference in the population means.
d. A large p-value is not strong evidence for the null so it is not strong evidence that all the means are the same. It just means we do not have strong evidence that there is at least one mean that is different.
1.4.23
a.
i. x̄_A = 3.77% (SD_A = 0.83).
ii. x̄_B = 4.08% (SD_B = 0.52).
iii. x̄_C = 5.10% (SD_C = 0.87).
iv. x̄_D = 5.65% (SD_D = 0.45).
v. x̄_E = 5.95% (SD_E = 1.94).
b. Group E (>2 times per week) contained the high omega-3 value.
c. The larger mean for group E increases the variability between the groups (thus increasing F). The larger SD of group E will increase the variability within the groups (thus decreasing F). Because the addition of this value will both increase and decrease the F-statistic, it might be hard to determine which will have a greater effect. The new F is 4.467, which is less than the one from Example 1.4, so the increased SD had the greater effect.
d. The new p-value should be about 0.006 and should be a little bit larger than the one from Example 1.4.
e. No, it is not valid to perform a theory-based test because the standard deviations of the different groups are not all within a factor of 2 of each other. In particular, SD_E/SD_D ≈ 4.31.
f. The theory-based p-value is 0.0081. It is similar to the simulation-based p-value.
g. The high omega-3 value did not make a difference in the conclusions.
1.4.24
a. R² = SSModel/SSTotal and 1 − R² = 1 − (SSModel/SSTotal) = (SSTotal − SSModel)/SSTotal, so
$$\left[\frac{R^2}{1-R^2}\right]\times\left[\frac{n-k}{k-1}\right]=\left[\frac{\text{SSModel}/\text{SSTotal}}{(\text{SSTotal}-\text{SSModel})/\text{SSTotal}}\right]\times\left[\frac{n-k}{k-1}\right].$$
b.
$$\left[\frac{\text{SSModel}/\text{SSTotal}}{(\text{SSTotal}-\text{SSModel})/\text{SSTotal}}\right]\times\left[\frac{n-k}{k-1}\right]=\left[\frac{\text{SSModel}}{\text{SSTotal}-\text{SSModel}}\right]\times\left[\frac{n-k}{k-1}\right]=\left[\frac{\text{SSModel}}{\text{SSError}}\right]\times\left[\frac{n-k}{k-1}\right]=\frac{\text{SSModel}/(k-1)}{\text{SSError}/(n-k)}.$$
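The identity derived in 1.4.24 can be checked numerically with the sums of squares from 1.4.21 (SSModel = 16,256.88, SSError = 92,984.01, n = 96 subjects, k = 3 groups). This is only a sketch; the function name `f_from_r2` is an illustrative assumption.

```python
def f_from_r2(r2, n, k):
    """F-statistic written in terms of R^2, sample size n, and number of groups k."""
    return (r2 / (1 - r2)) * ((n - k) / (k - 1))

ss_model, ss_error = 16256.88, 92984.01
n, k = 96, 3

r2 = ss_model / (ss_model + ss_error)                   # SSTotal = SSModel + SSError
f_direct = (ss_model / (k - 1)) / (ss_error / (n - k))  # MSModel / MSError
f_via_r2 = f_from_r2(r2, n, k)

print(round(r2, 3), round(f_direct, 2), round(f_via_r2, 2))
```

Both routes give the same F-statistic, as the algebra predicts.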
1.4.25 For these data, MSModel is 40/1 = 40 and MSError = 34/8 = 4.25, so the F-statistic = 40/4.25 = 9.41. The t-statistic = (10 − 6)/(√4.25 × √(1/5 + 1/5)) = 3.068 and 3.068² = 9.41.
1.4.26
a. If we can show that $\text{MSModel}=\dfrac{(\bar{x}_1-\bar{x}_2)^2}{\frac{1}{n_1}+\frac{1}{n_2}}$, then
$$t^2=\frac{(\bar{x}_1-\bar{x}_2)^2}{\left(s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\right)^2}=\frac{(\bar{x}_1-\bar{x}_2)^2}{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}=\frac{(\bar{x}_1-\bar{x}_2)^2}{\text{MSError}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}=\frac{\text{MSModel}}{\text{MSError}}=F.$$
b.
$$\begin{aligned}
\text{MSModel}&=n_1(\bar{x}_1-\bar{x})^2+n_2(\bar{x}_2-\bar{x})^2\\
&=n_1\left(\bar{x}_1-\frac{n_1\bar{x}_1+n_2\bar{x}_2}{n_1+n_2}\right)^2+n_2\left(\bar{x}_2-\frac{n_1\bar{x}_1+n_2\bar{x}_2}{n_1+n_2}\right)^2\\
&=n_1\left(\frac{(n_1+n_2)\bar{x}_1}{n_1+n_2}-\frac{n_1\bar{x}_1+n_2\bar{x}_2}{n_1+n_2}\right)^2+n_2\left(\frac{(n_1+n_2)\bar{x}_2}{n_1+n_2}-\frac{n_1\bar{x}_1+n_2\bar{x}_2}{n_1+n_2}\right)^2\\
&=n_1\left(\frac{n_1\bar{x}_1+n_2\bar{x}_1-n_1\bar{x}_1-n_2\bar{x}_2}{n_1+n_2}\right)^2+n_2\left(\frac{n_1\bar{x}_2+n_2\bar{x}_2-n_1\bar{x}_1-n_2\bar{x}_2}{n_1+n_2}\right)^2\\
&=n_1\left(\frac{n_2\bar{x}_1-n_2\bar{x}_2}{n_1+n_2}\right)^2+n_2\left(\frac{n_1\bar{x}_2-n_1\bar{x}_1}{n_1+n_2}\right)^2\\
&=n_1n_2^2\left(\frac{\bar{x}_1-\bar{x}_2}{n_1+n_2}\right)^2+n_1^2n_2\left(\frac{\bar{x}_2-\bar{x}_1}{n_1+n_2}\right)^2\\
&=\left(\frac{\bar{x}_1-\bar{x}_2}{n_1+n_2}\right)^2\left(n_1n_2^2+n_1^2n_2\right)=\left(\frac{\bar{x}_1-\bar{x}_2}{n_1+n_2}\right)^2(n_1n_2)(n_1+n_2)\\
&=\frac{(\bar{x}_1-\bar{x}_2)^2}{n_1+n_2}\,(n_1n_2)=\frac{(\bar{x}_1-\bar{x}_2)^2}{\left(\frac{n_1+n_2}{n_1n_2}\right)}=\frac{(\bar{x}_1-\bar{x}_2)^2}{\frac{1}{n_1}+\frac{1}{n_2}}.
\end{aligned}$$
Section 1.5
1.5.1 D. 1.5.2 C. 1.5.3 C. 1.5.4 C. 1.5.5 A, F, H.
1.5.6 The margin of error is based on a prediction interval. The rangers are not trying to predict the mean time for all future eruptions but are trying to predict the time of the next eruption so that visitors have a high probability of seeing the eruption if they are present during the entire interval.
1.5.7
a. Mean = 7.321 hrs and SD = 1.490 hrs.
b. An approximate prediction interval is 7.321 ± 2 × 1.49 × √(1 + 1/100) ≈ 4.326 to 10.316 hrs; the validity conditions are met because the data are quite symmetric and have no obvious outliers.
c. Ninety-three percent of these data lie within the 95% prediction interval. This is reasonably close to the 95% that we would expect.
1.5.8
a. A 95% confidence interval for the population average score is 6.3 ± 2 × (12.45/√27) ≈ 6.3 ± 4.79 = (1.51, 11.09); the validity conditions are met because the sample size is fairly large.
b. Yes, because the interval is completely positive, there is strong evidence that, on average, people tend to pick a face that is more attractive than their own when they are asked to identify their own face.
1.5.9
a. A 95% prediction interval is 6.3 ± 2 × (12.45) × √(1 + 1/27) ≈ 6.3 ± 25.36 = (−19.06, 31.66); the validity conditions are met because we were told the distribution of the results was fairly symmetric.
b. The prediction interval is trying to capture 95% of the individual results in the long run while the confidence interval is trying to capture the average result in the long run.
1.5.10
a. The applet reports a p-value of 0.0000, so there is strong evidence that at least one type of background music results in a different long-run mean amount of money spent; the validity conditions are met because the sample sizes are fairly large, and all three groups have similar SD values.
b. The 95% confidence intervals are Classical − Pop: (£1.52, £2.91), Classical − None: (£1.73, £3.14), and Pop − None: (−£0.4590, £0.8985).
c. We can be 95% confident that, on average, customers will spend between £1.52 and £2.91 more per evening meal when classical music is playing than when pop music is playing at the restaurant.
d. The mean meal cost when classical music is playing is significantly greater than when either pop or no music is playing.
e. Letters plot:
Music       Group Mean   Letters
Classical   £24.13       a
Pop         £21.91       b
None        £21.69       b
1.5.11
a. A 95% confidence interval for the long-run mean cost of a meal when no music is playing is £21.69 ± 2 × £3.38/√131 ≈ (£21.10, £22.28); the validity conditions are met because the sample size is fairly large.
b. A 95% prediction interval for the long-run cost of a meal when no music is playing is £21.69 ± 2 × £3.38 × √(1 + 1/131) ≈ (£14.90, £28.48); the validity conditions are met because the histogram of these data is fairly symmetric and bell-shaped.
c. The prediction interval looks like it contains about 95% of the data (it actually contains 92%), whereas the confidence interval contains a much smaller percentage of the actual data.
1.5.12
a. The p-value = 0.0087, so there is strong evidence that at least one mean is different from the others; the validity conditions are met because the sample sizes are fairly large, and the sample SDs are similar in value.
b. The 95% confidence intervals are Lie − Truth: (−2.68, −0.61), Lie − Control: (−1.97, 0.10), and Truth − Control: (−0.3267, 1.7460).
c. We can be 95% confident that, in the long run, the mean rating for the lie condition is between 0.61 and 2.68 points lower than that for the truth condition.
d. The lie condition has a mean that is significantly less than the truth condition. Nothing else is significantly different.
earnings; the validity conditions are met because the sample size is fairly large and the SDs are all within a factor of 2 of each other.
e. Letters plot:
c. Letters plot:
Condition Lie
Group mean −0.90
Control
0.03
Truth
0.74
Letters
Estimated group mean
Letters
A
Education level
AB
Doctorate
$97.40K
A
B
Master’s
$66.00K
B
Bachelor’s
$55.20K
B
1.5.13
Associate
$36.80K
C
Some College
$32.51K
C
a. A 95% confidence interval for the long run difference in mean rat_ ings between bottled and tap water is 0.03 ± 2 × 1.975/√31 ≈ 0.03 ± 0.71 = (−0.679, 0.739); the validity conditions are met because the sample size is fairly large.
1.5.16
b. A 95% prediction interval for the___________ difference in bottled and tap water ratings is 0.03 ± 2 × 1.975 × √1 + 1/31 ≈ 0.03 ± 4.01 = (−3.98, 4.04); the validity conditions are met because the dotolot of these data is fairly symmetric and bell-shaped.
a. A 95% confidence interval for the mean amount earned by those _ with doctorates is $97.4K ± 2 × $40.5K/√50 ≈ $97.4K ± $11.455K = ($85.94K, $108.86K); the validity conditions are met because the sample size is fairly large.
c. The prediction interval looks like it contains about 95% of the data (it actually contains 30/31 = 96.8%), whereas the confidence interval contains a much smaller percentage.
b. A 95% prediction interval for the amount ___________earned by those with doctorates is $97.4K ± 2 × $40.5K × √1 + 1/50 ≈ $97.4K ± $81.81K = ($15.59K, $179.21K); the validity conditions may not be met in this case because the distribution appears to be skewed to the right with a few large outliers.
1.5.14 a. predicted quiz score =
5.50 if computer , SE of residuals = 1.63. {6.92 if paper
b. A 95% confidence interval for the long-run mean score using paper _ notes is 6.92 ± 2 × 1.07/√20 ≈ 6.92 ± 0.384 = (6.44 to 7.40). −0.713 if computer , { 0.713 if paper SE of residuals = 1.63.
c. predicted quiz score = 6.213 +
d. A 95% confidence interval for the long-run mean effect when using _ paper notes is 0.71 ± 2 × 1.07/√20 = 0.71 ± 0.384 = (0.23, 1.19). We can use the same standard deviation because the distribution of effects is the same as the distribution of scores but slid down 6.21 units. 1.5.15
c. There are 46/50 or 92% of those with doctoral degrees in this sample contained in the prediction interval. 1.5.17 a. A 95% prediction interval for the mean amount earned by those ___________ with associate degrees is $36.8K ± 2 × $28.5K × √1 + 1/50 ≈ $36.8K ± $57.57K = (−$20.77K, $94.37K); the validity conditions are suspect because the distribution looks skewed right. b. There are 49/50 or 98% of these data within the prediction interval. c. This is such a bad fit because the distribution of salaries is highly skewed to the right. This method is only valid when we have a bellshaped distribution. d. The concerns aren’t as great for a confidence interval. Even though the distribution is skewed, the associated sampling distribution should be quite symmetric with a sample size as large as 50.
a.
[Figure: earnings ($K, axis from 0 to 200) by education level (Some, Asso, Bach, Mast, Doct)]
Education level   Group mean   Group SD
Doctorate         $97.40K      $40.50K
Master’s          $66.00K      $38.40K
Bachelor’s        $55.20K      $32.20K
Associate         $36.80K      $28.50K
Some College      $32.51K      $20.80K
b. The F-statistic is 31.534 and the p-value is < 0.0001, so there is strong evidence of an association between the levels of education and
1.5.18
a. Each margin of error is 2s/√n, so ȳ2 − ȳ1 = 2 × 2s/√n = 4s/√n.
b. The margin of error is 2s √(1/n + 1/n) = 2s √(2/n) = 2√2 s/√n.
c. With 4 > 2√2, the answer to part (a) is larger than part (b). Both of the answers represent ȳ2 − ȳ1 but only the confidence interval for the difference in means uses the correctly pooled SE in the margin of error expression. If the individual means were just a tiny bit closer together the single mean intervals from part (a) would overlap, however the difference in means interval from part (b) would be completely positive.
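The interval arithmetic in 1.5.16, 1.5.17, and 1.5.18 can be checked with a short script. This is only a sketch: the function names are illustrative and not part of the text, and the multiplier of 2 is the rough 95% value used in these solutions rather than an exact t critical value.

```python
import math

def prediction_interval(mean, sd, n, multiplier=2.0):
    """Rough 95% prediction interval: mean ± multiplier × sd × sqrt(1 + 1/n)."""
    margin = multiplier * sd * math.sqrt(1 + 1 / n)
    return mean - margin, mean + margin

def confidence_interval(mean, sd, n, multiplier=2.0):
    """Rough 95% confidence interval for a mean: mean ± multiplier × sd / sqrt(n)."""
    margin = multiplier * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Group means and SDs (in $K) from the education table, n = 50 per group.
doctorate_pi = prediction_interval(97.4, 40.5, 50)
associate_pi = prediction_interval(36.8, 28.5, 50)
print([round(x, 2) for x in doctorate_pi])
print([round(x, 2) for x in associate_pi])

# 1.5.18: two single-mean margins (4 s/√n) versus one pooled margin (2√2 s/√n).
two_single_margins = 4.0
pooled_margin = 2 * math.sqrt(2)
print(two_single_margins > pooled_margin)
```

The prediction interval is far wider than the confidence interval for the same group because its margin is driven by the spread of individual observations rather than by the spread of the sample mean.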
Section 1.6 1.6.1 A, C, D. 1.6.2 A. 1.6.3 C. 1.6.4 a. Increase the alpha level. b. Increase the sample size. c. Decrease the number of groups being compared. d. Decrease the variability within each group.
1.6.5
a. Just under 20.
b. Just under 40.
c. Just under 50.
d. Just under 60.
e. Just over 75.
f. As sample size per group increases, power of the test increases linearly at first, but then plateaus. (The relationship looks logarithmic, or like a power between 0 and 1.)
1.6.6
a. This will decrease the power of the test.
b. Power = 0.73.
c. The power would be very close to 1.
d. The relationship between power and sample size looks linear for most values of the sample size, with no plateauing visible even with each sample being as large as 120.
1.6.7
a. 1 ≤ SD ≤ 3.
b. About 5.
c. 4 ≤ SD ≤ 4.5.
d. A little more than 5.5.
e. As SD increases, power decreases. For very small SDs the power of the test is one, then it begins to decrease somewhat linearly as the SDs increase.
1.6.8
a. 81%.
b. 59.1%.
c. 0.10 < α < 0.15.
d. As the level of significance increases, the power of the test increases. Power doesn’t increase linearly, but in steps.
1.6.9 A difference between 15 and 20 mL/d.
1.6.10
a. The differences in the group means and overall mean are the same in Scenarios 1 and 2.
b. There is greater variability within the groups in Scenario 1.
c. Scenario 2 will have the larger F-statistic.
d. Scenario 2 will be more likely to have a statistically significant result.
e. As the variability within the groups decreases, the F-statistic increases, as does the power of the test.
1.6.11
a. The rejection region is any difference in means of 5.9 or greater.
b. A difference in mean heart rates of 7 bpm will be in the rejection region so you would conclude that the two treatment means are significantly different from each other.
c. A difference of 4 bpm is not in the rejection region, so you would conclude it is plausible that the two treatment means do not differ from each other.
d. P(Type I error) = 0.05.
e. ≈ 0.395.
f. ≈ 0.870.
g. As the effect size (difference in mean heart rates) increases, power of the test increases.
Difference in mean heart rates (bpm)   5       10      15      20      25
Power                                  0.395   0.870   0.995   1.000   1.000
1.6.12
a. Now the rejection region should be a difference in means of about 8.2 or more. In this case, the power is roughly 0.189.
b. Using a significance level of 0.10 would increase the power.
c. Now the rejection region is about ≈ 0.503.
d. As level of significance increases, the power of the test increases.
Level of significance   0.001   0.01    0.05    0.10    0.15
Power                   0.028   0.189   0.431   0.503   0.611
1.6.13
a. About 0.238.
b. Increase power.
c. About 0.510.
d. As number per group increases, power of the test increases.
Sample size per group   5       10      15      20      25
Power                   0.238   0.395   0.510   0.644   0.672
1.6.14
a. About 0.88.
b. Power will decrease.
c. About 0.253.
d. As SD within each group increases, power of the test decreases.
Standard deviation per group   4       8       12      16      20
Power                          0.880   0.427   0.253   0.201   0.153
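The qualitative patterns in Section 1.6 (power rises with sample size, effect size, and significance level, and falls with within-group SD) can be illustrated with a normal approximation. This is a sketch under stated assumptions, not the simulation applet used for the answers: `approx_power` is a hypothetical helper, the inputs below are made-up values, and the results will not reproduce the tabled powers.

```python
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, multiplier=2.0):
    """Normal-approximation power for comparing two group means.

    The test rejects when |observed difference| > multiplier × SE, where
    SE = sd × sqrt(2 / n_per_group). A smaller multiplier corresponds to a
    larger significance level.
    """
    se = sd * (2 / n_per_group) ** 0.5
    shift = diff / se  # true difference measured in SE units
    z = NormalDist()
    return (1 - z.cdf(multiplier - shift)) + z.cdf(-multiplier - shift)

# Hypothetical inputs: true difference 5, within-group SD 8.
for n in (5, 10, 15, 20, 25):
    print(n, round(approx_power(5, 8, n), 3))
```

Changing one argument at a time (for example doubling `sd`, or lowering `multiplier` from 2 to 1.645) shows the same directions of change as the answers above.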
End of Chapter 1 Exercises 1.CE.1. a. H0: μregular = μfilled and Ha: μregular ≠ μfilled; p-value = 0.003. Because the p-value is less than 0.01, there is very strong evidence against the null and for the alternative that there is an association between the type of soup bowl and the amount of soup consumed with the secretly refilled soup bowl resulting in a higher average consumption of soup (oz). b. The samples are independent of each other (randomly assigned to bowl type), sample sizes are greater than 20 and there is no strong skewness in the data. c. Theory-based two-sided p-value is 0.0032, which is very close to the simulation-based p-value. This is expected as validity conditions were met to perform the theory-based test.
d. A 95% confidence interval for μregular – μfilled is (–10.27, –2.20). We are 95% confident that soup eaters who have secretly refilled bowls eat on average 2.2 oz to 10.27 oz more than those eating from the regular soup bowls.
e. The average of the two sample standard deviations is about 7.2 oz. So, the effect size could be reported as 6.233/7.2 ≈ 0.866. Some would consider this a meaningful effect size.
1.CE.2.
a. Yes: 8.44/6.12 = 1.38 < 2.
b. p-value from ANOVA table is 0.0031, essentially the same as the p-value from the unpooled two-sample t-test.
1.CE.3.
a. –4 was a plausible difference because the confidence interval from question 1.CE.1, part d contained –4. We can draw a cause and effect conclusion because this was a randomized experiment.
b.
i. To model the refilled bowl subjects consuming four more ounces on average, we could subtract four from all of the observations. Then any differences between the groups are by chance alone. So we then rerandomize the responses, and any response assigned to the refilled group is given the +4.
ii. The applet is counting how many of the differences are at least as far from −4 as the observed −6.261. The p-value is not small, indicating, as the confidence interval did, that −4 is a plausible value for the long-run difference in means (regular − refilled).
1.CE.4.
a. The p-value is about 0.003, the same as the shuffled p-value and the theory-based pooled t-test p-value. The bootstrapped null distribution has the same bell shape and is centered at 0 like the re-randomized t-test null distribution and the SDs are comparable (2.174 for shuffle and 2.117 for bootstrapped).
[Figure: bootstrapped null distribution of sampled differences in means; Mean = –0.050, SD = 2.117; count of samples beyond –6.233 = 3/1000 (0.0030)]
b. The new bootstrapped null is centered at –6.261, essentially what we assumed the difference in population means to be, and SD = 1.972. So there is also less variation. This is because we are focused on the variation within each group separately, rather than looking at all the data values together (with the shift from the treatment effect).
[Figure: distribution of sampled differences in means; Total Samples = 10000, Mean = 6.240, SD = 1.972]
c. –6.23 ± 2(1.972) is equivalent to (−10.174, −2.280), which is similar to the 95% t-confidence interval.
1.CE.5
a. The MAD will be a larger positive number; the difference in means will be a larger number either positive or negative depending on the direction of the difference; the numerator of the t-statistic will be larger, so the t-statistic will be larger; the p-value will be smaller; the confidence interval will be the same width, but the midpoint will change, shifting in the direction of the difference in means.
b. The MAD and difference in means won’t change as the distance between the sample means hasn’t changed, but the t-statistic will get larger because the sample sizes make the denominator smaller, so the p-value will get smaller and the width of the confidence interval will get smaller as larger sample sizes are less variable (smaller SD of null).
c. The MAD and difference in means won’t change as the distance between the sample means hasn’t changed, but the t-statistic will get smaller because the increase in SDs makes the denominator larger, so the p-value will get larger and the width of the confidence interval will get larger as more variability in the data means more variability in the null distribution.
d. Changing the confidence level from 95% to 99% will only increase the multiplier of the SD of the null, thus the margin of error of the confidence interval will get larger. Nothing else will be affected by this change.
1.CE.6.
a. Set #2 will have the larger MAD as the means are farther apart.
b. Set #2 will have the larger F-statistic because the variability within the groups is the same in set #1 as it is in set #2, but the variability between the groups is larger in set #2 than it is in set #1.
c. The F-statistic for Set #1 should be between 0 and 1 as all the group means are very close to the overall mean which would make the MSGroups close to zero.
1.CE.7
a. Yes, the validity conditions are met because the groups are independent as the treatments were randomly assigned, and although there are fewer than 20 observations in each group, the corn weights are fairly symmetric, and the SDs are all within a factor of 2.
b. Step 1: Ask a research question: Can organic methods be used to control harmful insects and limit their effect on sweet corn growth?
Step 2: Design a study and collect data: A total of 60 plots were used in the study. In 12 plots of corn a beneficial soil nematode was introduced. In another 12 plots a parasitic wasp was used. Another 12 plots were treated with both the nematode and the wasp. In a fourth set of 12 plots a bacterium was used. Finally, a fifth set of 12 plots of corn acted as a control in which no special treatment was applied. The plots were all randomly assigned to the treatment conditions. Twenty-five ears of corn from each plot were randomly sampled and each was weighed (in ounces). H0: All population mean weights of corn are equal. Ha: At least one population mean weight of corn is different.
Step 3: Explore the data: Largest mean weight of corn in ounces is found in the control group and the smallest mean weight of corn in ounces is found in the wasp group. It appears that the control group and possibly the nematode group might have significantly larger mean weights of corn yield than the other treatment groups.
Step 4: Draw inferences: With F = 4.49 (df = 4, 55) and p-value = 0.0033, we have strong evidence (e.g., at the 5% level of significance) that the treatment means are not all equal. This conclusion applies to all plots of sweet corn grown under the same conditions as the experimental plots in this study.
Step 5: Formulate conclusions: Because random assignment was used, we can say that the treatment was the cause of the differences seen. We can only generalize results to sweet corn in the environments in which it was grown.
Step 6: Look back and ahead: Answers may vary but should suggest follow-up questions or suggest what can be changed if this study were to be run again.
c. The average weight is largest (best) for control (13.2), then nematode (11.6), bacterium (11.1), nematode + wasp (10.3), and smallest (worst) for wasp (8.5).
d. The control is significantly higher than nematode + wasp and wasp. Wasp also looks significantly lower than bacterium and nematode. The control does not appear to differ from nematode or bacterium. (We can almost separate into two groups: group (1) with control, nematode, and bacterium, and group (2) with nematode + wasp and wasp.) See the following letter plot for summary.
Treatment         Mean      Letters
Control           13.2083   a
Nematode          11.5822   ab
Bacterium         11.125    ab
Nematode + Wasp   10.3333   bc
Wasp              8.5       c
e. The overall mean was 10.95 which is not captured in the first and last intervals. So we can say control had a significantly larger than average weight, and wasp had significantly lower than average weights, on average.
Treatment         Mean      95% CI
Control           13.2083   (11.58, 14.84)
Nematode          11.5822   (9.95, 13.22)
Bacterium         11.1250   (9.49, 12.76)
Nematode + Wasp   10.3333   (8.70, 11.97)
Wasp              8.5000    (6.87, 10.13)
Chapter 1 Investigation
1. This was a randomized experiment and it is advantageous because causation may be concluded from this type of study.
2. The experimental units are the Parkinson’s disease patients participating in the study.
3. Inclusion criteria are Parkinson’s patients (in stages 1, 2, 3, or 4) that have stable medication use and the ability to stand unaided and walk without assistance.
4. The explanatory variable is the type of therapy (tai chi, resistance training, or stretching). The therapy lasted 24 weeks.
5. Functional reach is assessed as the maximal distance (in cm) a participant could reach forward beyond arm’s length while standing.
6. Other sources of variation could include a person’s sex, age, genetics, prior activity and fitness, and how long they have had Parkinson’s.
7.
a. The sources of variation that were not allowed to change are how long the subject participated in the study, having stage 1, 2, 3, or 4 Parkinson’s disease, stable medication use, and the ability to stand unaided and walk without assistance.
b. Sources of variation accounted for include the type of therapy used on each patient (tai chi, resistance training, or stretching).
c. Unexplained variation could include person’s sex, age, genetics, prior or current activity levels, fitness level, duration of Parkinson’s.
d. See below.
Observed variation in: Functional Reach
Inclusion criteria • Parkinson’s patients (in stages 1, 2, 3, or 4) • stable medication use • ability to stand unaided and walk without assistance
Design • 24 weeks of therapy
Sources of explained variation • Therapy type (tai chi, resistance training, stretching)
Sources of unexplained variation • person’s sex • age • genetics • prior or current activity levels • fitness level • duration of disease
8. Overall mean = 2.697 cm, Overall SD = 5.193 cm, SSTotal = 5231.38. Predicted change in functional reach = 2.697 cm, SE of residuals = 5.193 cm.
9. predicted change in func. reach = { 4.89 cm if tai chi; 2.34 cm if resistance; 0.86 cm if stretching }, SE of residuals = 4.943. The SE of residuals has decreased, making this model better at making predictions about the change in functional reach.
10. The effects are: tai chi = 2.197, resistance training = −0.357, stretching = −1.840.
11. predicted change in func. reach = 2.697 + { 2.193 cm if tai chi; −0.357 cm if resistance; −1.837 cm if stretching }, SE of resid. = 4.94.
12. SSModel = 542.07 and SSError = 4,689.31.
13. R² = 542.07/5,231.38 = 0.104. This means that about 10.4% of the variation in change in functional reach can be attributed to the type of exercise used.
14. This is somewhat subjective. The maximum difference in means is about 4 cm and the SE of the residuals is a bit larger than this, so this difference may not be enough to seem practically significant. However, 4 cm or even less may be the difference between being able to reach down to tie your shoes and not. That would be very practically significant.
15. H0: There is no association between the type of exercise and change in functional reach. μTC = μR = μS. Ha: There is an association between the type of exercise and change in functional reach. At least one μi is different from the rest.
16. The F-statistic is 11.097 (you could have used other statistics). In 1,000 shuffles, an F of 11.097 never occurred so the p-value is less than 0.001. Thus we have strong evidence of an association between type of exercise and change in functional reach among Parkinson’s patients.
17. The F-statistic is 11.097 and the theory-based p-value in the applet is given as 0.0000. With such large sample sizes and SDs that are fairly similar (within a factor of 2), the validity conditions are met. Thus we have strong evidence of an association between type of exercise and change in functional reach among Parkinson’s patients.
18.
Source   DF    Sums of squares   Mean squares   F        p-value
Groups   2     542.07            271.03         11.097   0.0000
Error    192   4689.31           24.42
Total    194   5231.38
19. Doing an overall test, like we have done, allows us to keep the type I error rate at 5%. 20. Tai chi – Resistance (0.844, 4.2637), Tai chi – Stretching (2.33, 5.75), Resistance – Stretching (−0.2268, 3.1929); We are 95% confident that the true average increase in functional reach is between 2.33 cm to 5.75 cm larger for those that do tai chi than for those that do stretching. Similarly, we are 95% confident the true average increase in functional reach is between 0.844 cm to 4.26 cm larger for Parkinson’s patients who do tai chi than for those who do resistance training. 21.
Exercise type   Sample mean   Letters
Tai Chi         4.89 cm       a
Resistance      2.34 cm       b
Stretching      0.86 cm       b
22. Tai chi has the largest effect of 4.89 cm and stretching has the least at 0.86 cm. It is reasonable to conclude that tai chi significantly increases functional reach because the standardized statistic is 4.89/(4.33/√65) = 9.1. As that is much greater than 2, there is very strong evidence against the null of no effect of treatment on functional reach.
23. We can be 95% confident that doing tai chi will increase the true average functional reach of Parkinson’s patients by between 3.68 cm and 6.10 cm. We can be 95% confident that doing resistance training will increase the true average functional reach of Parkinson’s patients by between 1.13 cm and 3.55 cm. We can be 95% confident that stretching will change the true average functional reach of Parkinson’s patients between a decrease of 0.35 cm and an increase of 2.07 cm.
24. We predict that 95% of the Parkinson’s patients doing tai chi will change their functional reach between a decrease of 4.93 cm and an increase of 14.72 cm. We predict that 95% of the Parkinson’s patients doing resistance training will change their functional reach between a decrease of 7.48 cm and an increase of 12.16 cm. We predict that 95% of the Parkinson’s patients stretching will change their functional reach between a decrease of 8.97 cm and an increase of 10.68 cm.
25. With a p-value of about 0, there is strong evidence of an association between exercise type and change in functional reach. Even though R² was only 0.104 and the maximum difference in means was a bit less than the SE of the residuals, the results would probably be considered practically significant because even a small change in functional reach could be a great benefit.
26. Answers will vary. The addition of a control group would be nice in order to compare the change in functional reach of Parkinson’s patients in each treatment group to Parkinson’s patients with no intervention. Does their functional reach tend to stay the same, increase, or decrease on average?
Follow-up studies could look at other forms of exercise or combinations of exercise. They could also look to see what sort of dose-response there might be. For example, if tai chi were done more frequently each week, would we get better results?
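The ANOVA quantities reported in items 12, 13, and 18 of the investigation all follow from the two sums of squares and their degrees of freedom. The sketch below recomputes them; `anova_from_ss` is an illustrative helper name, not something defined in the text, and it assumes a one-variable model so that SSTotal = SSModel + SSError.

```python
def anova_from_ss(ss_model, ss_error, df_model, df_error):
    """Build the one-variable ANOVA summary from sums of squares and DF."""
    ss_total = ss_model + ss_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total,
        "r_squared": ss_model / ss_total,   # proportion of variation explained
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f": ms_model / ms_error,
        "se_residuals": ms_error ** 0.5,    # typical leftover deviation
    }

# Values from item 12 and the ANOVA table in item 18.
reach = anova_from_ss(ss_model=542.07, ss_error=4689.31, df_model=2, df_error=192)
for name, value in reach.items():
    print(name, round(value, 3))
```

Rounded to the precision used in the solutions, this gives the same R², F-statistic, and SE of residuals as the answers above.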
CHAPTER 2
Controlling Additional Sources of Variation Section 2.1
2.1.1 B. 2.1.2 D. 2.1.3 B. 2.1.4 B. 2.1.5 C. 2.1.6 A.
2.1.7
a. See the following table.
Source   DF   Sum of Squares   Mean Squares   F
Method   1    30               30             15
Person   10   50               5
Error    10   20               2
Total    21   100
b. 0.80.
c. 50%.
d. The F-statistic is the most helpful. As that number is much greater than 4, there is strong evidence of an association.
2.1.8
a.
i.
ii.
b. Because there doesn’t appear to be any linear pattern in the data and the R² value is so small, there is not much of an association between words memorized when exercising and not, so pairing did not help much.
2.1.9
a.
i. Because most of the lines connecting the points appear to be slanted in a particular direction (upper right to lower left), the pairing does appear to help.
ii.
b. As there does appear to be a linear pattern in the data and the R² value is very large, there is an association between the times for the narrow and wide-angle paths, so pairing did help.
2.1.10
a. 3.2.
b. SSTotal = 61.2.
c. 4 − 3.2 = 0.8, 2.4 − 3.2 = −0.8.
d. 10(0.8)² + 10(−0.8)² = 12.8.
e. SSperson = 40.2.
f. SSError = 61.2 − 12.8 − 40.2 = 8.2.
g. See table below; R² = (12.8 + 40.2)/61.2 = 0.867, so 86.7%.
Source      DF   Sums of squares   Mean squares   F
Condition   1    12.8              12.8           14.1
Person      9    40.2              4.47           4.91
Error       9    8.2               0.91
Total       19   61.2
2.1.11
a. 8.
b. 74.
c. –2, 2.
d. 40.
e. 34.
f. The percentage of total variation in the scores that is left unexplained is 34/74 ≈ 45.9%. See the following table.
Source        DF   Sum of Squares   Mean Squares   F
Environment   1    40               40             9.412
Error         8    34               4.25
Total         9    74
g. See the following table.
Person              1     2      3     4     5
Distracting Score   6     5      8     4     7
Quiet Score         10    8      14    8     10
Average Score       8     6.5    11    6     8.5
Person Effect       0     –1.5   3     –2    0.5
h. 4% of the variation is left unexplained. See the following table.
Source        DF   Sum of Squares   Mean Squares   F
Environment   1    40               40             53.33
Person        4    31               7.75
Error         4    3                0.75
Total         9    74
2.1.12
2.1.13
a. 4.0485.
b. 158.855.
c. 0.0161, −0.0161.
d. 0.01613.
e. 50.177 × 2 = 100.355.
f. 58.48.
g. See the following table.
Source      DF   Sum of Squares   Mean Squares   F
Condition   1    0.01613          0.01613        0.0083
Person      30   100.355          3.345
Error       30   58.339           1.944
Total       61   158.855
h. 0.01%.
i. 63.17%.
j. No, not even close. The F-statistic is almost 0, nowhere near being larger than 4.
2.1.14
a. See Table 2.1.14a.
b. 16.5% explained by media and 83.5% unexplained.
c. See Table 2.1.14c.
d. 68.2% and 31.8%.
Table 2.1.14a
Source   DF   Sums of squares   Mean squares   F
Media    1    0.74              0.74           6.721
Error    34   3.74              0.11
Total    35   4.48
Table 2.1.14c
Source   DF   Sums of squares   Mean squares   F
Media    1    0.74              0.74           8.81
Person   17   2.31              0.136          1.62
Error    17   1.43              0.084
Total    35   4.48
2.1.15
a. H0: μd = 0, Ha: μd ≠ 0, where μd is the long-term mean difference in headway variability between using Facebook and Instagram.
b. 0.030.
c. Yes, there does seem to be a significant difference because 14 of the 18 differences are positive.
d. The p-value should be approximately 0.003.
e. t = (0.03 − 0)/(0.037/√18) ≈ 3.44.
f. p-value = 0.0031.
g. We have strong evidence that the long-run mean difference in headway variability is not zero, with Facebook having the significantly larger variability.
2.1.16
a. (0.0828, 0.4907).
b. Yes, there is strong evidence that the mean time headway variability differs between drivers scrolling through Facebook and Instagram because the 95% confidence interval is completely positive.
2.1.17
a. H0: μd = 0, Ha: μd ≠ 0, where μd is the long-run mean difference in Stroop test time.
b. 1.654.
c. Yes, there does seem to be a significant difference as 32 of the 38 differences are positive.
d. The p-value should be approximately 0.0002.
e. t = (1.654 − 0)/(2.657/√38) ≈ 3.84.
f. p-value = 0.0005.
g. We have strong evidence that the long-run mean difference in Stroop test time is not zero, with the no music environment giving significantly larger scores.
2.1.18
a. (0.7803, 2.5268).
b. Yes, there is strong evidence that there is a difference in mean Stroop test time because the 95% confidence interval is completely positive.
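The sums of squares in 2.1.11 can be recomputed directly from the five pairs of scores listed in part (g). The sketch below is one way to do that bookkeeping for a paired (block) design with two conditions; the function and variable names are illustrative and not taken from the text.

```python
def block_decomposition(cond_a, cond_b):
    """Split total variation for a two-condition paired design into
    condition, person (block), and error sums of squares."""
    n = len(cond_a)                      # number of people (blocks)
    scores = cond_a + cond_b
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((y - grand_mean) ** 2 for y in scores)

    # Condition effects: each condition mean minus the grand mean.
    mean_a, mean_b = sum(cond_a) / n, sum(cond_b) / n
    ss_condition = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)

    # Person effects: each person's average minus the grand mean (2 scores each).
    person_means = [(a + b) / 2 for a, b in zip(cond_a, cond_b)]
    ss_person = 2 * sum((m - grand_mean) ** 2 for m in person_means)

    ss_error = ss_total - ss_condition - ss_person
    f_stat = ss_condition / (ss_error / (n - 1))  # error DF = (2 - 1)(n - 1)
    return ss_total, ss_condition, ss_person, ss_error, f_stat

distracting = [6, 5, 8, 4, 7]
quiet = [10, 8, 14, 8, 10]
ss_total, ss_env, ss_person, ss_error, f_stat = block_decomposition(distracting, quiet)
print(ss_total, ss_env, ss_person, ss_error, round(f_stat, 2))
```

Leaving `ss_person` out of the subtraction reproduces the unblocked analysis in part (f), where the person-to-person variation stays in the error term and the F-statistic is much smaller.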
Section 2.2
2.2.1 B. 2.2.2 C. 2.2.3 D. 2.2.4 A. 2.2.5 D. 2.2.6 A. 2.2.7 D.
2.2.8
a. Sources of Variation Diagram:
Observed variation in: Racing Speed
Inclusion criteria • Racing dogs
Design • Each dog received each diet in a random order
Sources of explained variation • Diets • Dog (breed, metabolism, sleep, race experience, etc.)
Sources of unexplained variation • Track conditions • Weather conditions • etc.
b.
i. False.
ii. True.
iii. False.
iv. True.
v. True.
2.2.9 The null distribution of F-statistics created by shuffling all the responses (completely randomized) tends to have a larger standard deviation.
2.2.10
Source   DF   Sums of squares   Mean squares   F
Pan      1    4.5               4.5            22.5
Brand    1    2                 2              10
Error    5    1                 0.2
Total    7    7.5
2.2.11 The F-statistic is lower because the brand effect is smaller than the pan effect.
Source   DF   Sums of squares   Mean squares   F
Brand    1    2                 2              10
Pan      1    4.5               4.5            22.5
Error    5    1                 0.2
Total    7    7.5
2.2.12
a. Randomly put 8 males in each group and 4 females in each group.
b. Randomly assign 3 of the 6 women over 40 to each group, and 3 of the 6 men over 40 to each group.
                       Treatment A   Treatment B
Male     40 or older   3             3
         Under 40      5             5
Female   40 or older   3             3
         Under 40      1             1
2.2.13
a. See below.
Source      DF   Sums of squares   Mean squares   F
Detergent   1    0.28              0.28           0.085
Error       6    19.97             3.33
Total       7    20.25
b. 1.38%; F-statistic is 0.085.
c. 1.5625, −1.5625.
d. Polyester.
e. 19.531.
f. See below.
Source      DF   Sums of squares   Mean squares   F
Detergent   1    0.28              0.28           3.182
Material    1    19.53             19.53
Error       5    0.44              0.088
Total       7    20.25
g. 96.44%.
h. No, the percentage of variation in stain ratings explained by detergent type has not changed; Yes, the F-statistic has increased.
i. A large proportion of the total variation can be explained by material type which reduces the amount of unexplained variation (SSError). Because the denominator (MSError) is now so much smaller, the F-statistic is larger.
2.2.14
2.2.15
a. See the following table.
Source      DF   Sum of Squares   Mean Squares   F
Treatment   1    553.52           553.52         1.537
Error       10   3600.71          360.07
Total       11   4154.23
b. 13.3%; the F-statistic is 1.537.
c. –15.21, 15.21.
d. Non-organic.
e. 2775.521.
f. See the following table.
Source      DF   Sum of Squares   Mean Squares   F
Treatment   1    553.52           553.52         6.04
Type        1    2775.52          2775.52
Error       9    825.39           91.71
Total       11   4154.23
g. 66.8%.
h.
i. A large proportion of the total variation can be explained by banana type and this reduces the amount of unexplained variation (SSError). Because the denominator (MSError) is now so much smaller, the F-statistic is larger.
2.2.16
2.2.17
a. All the cards are being shuffled and 12 are randomly placed in each group. The graph displays a dot that is the proportion of males in group 1 minus the proportion of males in group 2.
b. The mean should be about 0 and the SD should be about 0.20. A mean of zero makes sense because, on average, you should have the same proportion of males in each group, so the difference in group proportions should be 0.
c. The mean should be about 0 (it will not be quite as close to 0 as the mean of the previous distribution) and the SD should be about 1.38.
d. Now the applet shuffles the 16 blue (male) cards and randomly puts 8 in each group, then randomly shuffles the 8 red (female) cards and randomly puts 4 in each group. This results in 8 blue and 4 red cards in each group after every shuffle.
e. The mean and SD are both 0. Because we are blocking on sex, we are forcing the proportion of males to be the same in each group so that all the differences in the proportions of males are 0.
f. The SD is about 0.85 and this is smaller than that from part (c) when we weren’t blocking on sex.
g. Height is associated with sex because males, on average, are taller than females.
2.2.18
a.
i.
Source   DF   Sums of squares   Mean squares   F       p-value
Model    5    749.50            149.90         0.038   0.9991
Error    18   71285.00          3960.28
Total    23   72034.50
ii.
iii. about 99%.
iv.
b.
i.
Source   DF   Sums of squares   Mean squares   F        p-value
Model    3    47852.83          15950.94       13.193   0.0001
Error    20   24181.67          1209.08
Total    23   72034.50
ii.
iii.
iv.
c.
i. See Table 2.2.18c.i. This is the same ANOVA table as question 2.2.18(a)i.
ii. See Table 2.2.18c.ii.
iii. The sum of squares for treatment is the same as it was when ignoring the block, and the sum of squares for block is the same as when ignoring the treatment. SSModel = 749.50 + 47852.83 = 48602.33 and 48602.33/72034.50 = 0.6747, about 67%.
iv. about 33%.
v.
vi. The apple orchard farmer should find out the conditions in block #4 where the apple production was highest and see whether those conditions can be replicated throughout the orchard. The type of ground cover did not have a significant effect on the apple production, so use whatever method was least expensive.
Table 2.2.18c.i
Source       DF   Sums of squares   Mean squares   F      p-value
Treatments   5    749.50            149.90         0.04   0.9991
Error        18   71285.00          3960.28
Total        23   72034.50
Table 2.2.18c.ii
Source       DF   Sums of squares   Mean squares   F       p-value
Treatments   5    749.50            149.90         0.10    0.9915
Blocks       3    47852.83          15950.94       10.21   0.0007
Error        15   23432.17          1562.14
Total        23   72034.50
2.2.19
a. Experimental units: the bananas; explanatory variable: type of treatment; response variable: hours until over-ripened.
b. See the following table.
Observed variation in: Ripening time Inclusion criteria Bananas from same store
2.3.10
Sources of explained variation • Treatment: bag, wrap, both, neither
Sources of unexplained variation • Type of banana • How long in store
Design Randomly assign treatment to each type of banana
c. 5511.21/8854.65 ≈ 0.62 so 62% of the variability in time until overso ripened is explained by the treatment and 38% is still left unexplained. d. e. p-value ≈ 0.06 (theory-based p-value = 0.0417).
a. B. b. D. c. B + C + D. d. C. e. A. 2.3.11 a. No, age is not associated with Breed because each breed in the sample has 5 young and 5 mature cows. b. There is not strong evidence of an association (p-value = 0.5276). SSModel = 0.21; 0.00414. c. There is strong evidence of an association (p-value < 0.0001). SSModel = 34.32; 0.677. d.
Source
DF
Adjusted SS
Adjusted MS
Adjusted F
f. The average hours until over-ripe for the 4 organic bananas is 53.3 hours. The average hours until over-ripe for all the bananas is 73.9 hours. The effect of the organic bananas is
Breed
4
34.321
8.580
49.907
Age
1
0.207
0.207
1.204
Error
94
16.161
0.172
g.
Total
99
50.689
i. The F-statistic has increased to 17.3. This makes sense because by adjusting the data by the effects for each banana type, the values are less variable which decreases the SSError, but the treatment means haven’t changed, so the SStreatment will stay the same; ii. p-value = 0.002 (theory-based p-value = 0.0023); iii.
h.
i. F = 4.4; ii. iii.
e. Yes, the adjusted SS match the SSModel numbers from parts (b) and (c). f. Yes, the adjusted SS for breed, age, and error sum to SSTotal because we have the same proportion of mature or young cows in each breed and therefore are balanced. g. SScovariation = 0 h.
i.
iv.
Section 2.3 2.3.1 B. 2.3.2 A.
ii.
2.3.3 A. and B. 2.3.4 D. 2.3.5 A. 2.3.6 a. False. If there is covariation then that will also be part of the total sum of squares. b. False; If there is covariation then that will also be part of the sum of squares model. c. False. If variable 2 is not associated with the response variable, adding it won’t change the p-value. d. False. If there is covariation between variables 1 and 2, then the sum of squares for variable 1 will change. e. True. 2.3.7 SSrace is the sum of squares for race when education is also in the model. 2.3.8 A + B − C. 2.3.9 C − (A + B)
c02InstructorSolutions.indd 18
2.3.12 a.
i. ii.
b. R2 = 0.681. c. The R2 values are rather similar because age is not very strongly associated with butterfat percentage. d. The mean butterfat percentages are the same whether age is part of the model or not. This is because we have the same number of cows in each age–breed group. Everything is balanced just like it would be in an experiment. 2.3.13 a. There is not strong evidence of an association (p-value = 0.1532). SSModel = 198.35; 0.0244.
16/10/20 6:59 PM
Solutions to Exercises b. There is strong evidence of an association (p-value = 0.0000). SSModel = 2429.83; 0.299.
ii. iii. No, p-value is 0.13, so there is not a significant difference between the GPAs of those who ate breakfast and those who didn’t.
c.
Source
DF
Adj SS
Adj MS
Adj F
Sex
1
2389.35
2389.35
35.41
Allergy
1
157.88
157.88
2.340
Error
82
5533.56
67.48
Total
84
8121.26
d. No, the adjusted SS do not match the SSModel numbers found in parts (b) and (c).
19
iv. v. c. The R2 values are quite different. R2 increased from about 0.055 to 0.135 by adding sex to the model. This means that sex is associated with eating breakfast which we can see in how different the percentages that ate breakfast are (70.0% and 46.7%).
f. SScovariation = 8121.26 − 5533.56 − 157.88 − 2389.35 = 40.47.
d. The means got closer together when adjusted for a person’s sex. The no breakfast mean increased because most of the males (with lower GPAs) were in that group so when it was adjusted for sex the mean increased. The breakfast mean decreased because most of the females (with higher GPAs) were in that group, so when it was adjusted for sex the mean decreased.
g. No. 2.44% + 29.9% ≠ 31.86%.
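The arithmetic behind parts (f) and (g) of 2.3.13 can be reproduced with a short sketch. The variable names are illustrative and do not come from the text; the sums of squares are copied from the adjusted ANOVA table and from parts (a) and (b).

```python
# Sums of squares copied from the 2.3.13 solutions; names are illustrative.
ss_total = 8121.26
ss_error = 5533.56
adj_ss_sex = 2389.35
adj_ss_allergy = 157.88

# Whatever the adjusted pieces and the error do not account for
# is attributed to covariation between sex and allergy.
ss_covariation = ss_total - ss_error - adj_ss_allergy - adj_ss_sex

# R-squared for the two-variable model.
r2_two_variable = 1 - ss_error / ss_total

# R-squared values from the two one-variable models in parts (a) and (b).
r2_allergy_alone = 198.35 / ss_total
r2_sex_alone = 2429.83 / ss_total

print(round(ss_covariation, 2))
print(round(r2_two_variable, 4))
print(round(r2_allergy_alone + r2_sex_alone, 4))
```

Because the two explanatory variables covary, the one-variable R2 values add up to more than the two-variable R2, which is the point of part (g).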
2.3.18
2.3.14
a. 1.42.
a. 1 − (5533.56/8121.26) ≈ 0.3186 so 31.86%.
b. 1.42/10.46 × 100% = 13.58%.
b. 2389.35/ 8121.26 ≈ 0.2942 so 29.42%.
c. 0.83/10.46 × 100% = 7.93%.
e. No, because we do not have the same proportion of males and females with allergies, the adjusted SSsex, SSallergy, and SSError do not sum to SSTotal.
c. 157.88/8121.26 ≈ 0.0194 so 1.94%. d. 0.3186 − 0.2942 − 0.0194 = 0.005, so 0.50%. e. 5533.56/8121.26 ≈ 0.6814 so 68.14%. f. Yes, 29.42% + 1.94% + 0.5% + 68.14% = 100%. 2.3.15 a.
i. R2 = 0.0244; ii.
b.
i. R2 = 0.319; ii.
c. The R2 is much larger when sex is added to the model because height is strongly associated with a person’s sex. d. The means from part (b) when sex is added to the model are lower than those from part (a). There were more males in the sample than females so when average height is adjusted for a person’s sex these means decrease. 2.3.16 a.
d. 0.28/10.46 × 100% = 2.68%. e. (1.42 − 0.28 − 0.83)/10.46 × 100% = 2.96%. f. 9.04/10.46 × 100% = 86.42%. g. Close; they sum to 99.99% because of a little bit of rounding error. 2.3.19 a. Females have a higher mean GPA (3.55 vs. 3.30). b. Those who ate breakfast have a higher mean GPA (3.53 vs. 3.35). c. Yes, 35/50 or 70% of the females ate breakfast and only 14/30 or 46.7% of the males ate breakfast. d. When the GPAs are adjusted for sex, the red dots move up and the blue dots move down. Because male GPAs were lower, on average, they increase and the female GPAs decrease when the sex effects are subtracted from each GPA. e. The adjusted line is closer to horizontal. The slope of this (purple) line is smaller than the slope of the unadjusted line because the female GPAs were adjusted downward, and there are many more female dots in the breakfast column (more females ate breakfast than didn’t) than male dots, so the column moves downward, on average. 2.3.20 a. Those who played a high school sport are taller in the sample than those who didn’t (67.58 inches vs. 66.15 inches.) The p-value for testing this is 0.1393 so there is not strong evidence of an association. b.
b. For these data, a person’s sex is strongly associated with both GPA and eating breakfast. This means there is a fairly large covariation so when a person’s sex is accounted for, we don’t have strong evidence of an association between eating breakfast and GPA. 2.3.17 a.
i. When the heights are adjusted for sex, the blue dots (female heights) move up and the red dots (male heights) move down. The mean heights both went up because there are more females than males in the sample, but now both are close to the same (68.33 in vs. 68.06 in);
i. ii. iii.
b.
i.
c. The HS_Sport p-value is now 0.7047. By adjusting for sex, the mean heights were closer together, so the p-value increased from what it was when sex was not in the model.
2.3.21 a.
iii.
i.
ii.
b.
i. The average number of cigarettes for those less than 40 is 10.80 and for those 40 or more is 6.93;
iii.
ii. iii.
b.
i.
c.
i.
ii. iii. c.
i.

                           Females       Males
History of depression      137 (40.9%)   84 (20.7%)
No history of depression   198 (59.1%)   321 (79.3%)
Total                      335           405

                       Less than 40 years old   40 or older
Finished High School   47 (92.16%)              49 (56.32%)
Didn’t Finish HS       4 (7.84%)                38 (43.68%)
Total                  51                       87

ii.
ii.
iii. d.
i.
iii. d.
i.

Mean Alcohol Consumption   Females   Males
History of depression      41.11     106.94
No history of depression   50.83     103.29

Mean cigarette use     Less than 40 years old   40 or older
Finished High School   10.87                    7.24
Didn’t Finish HS       10.00                    6.53

ii.
ii.
iii. iv. v.
iii. iv. Adjusted
vi. It was 0.0088 in part (a), so it is smaller now.
v. vi.
End of Chapter 2 Exercises 2.CE.1.
2.3.22 a.
i. The average number of cigarettes smoked for those who didn’t finish high school is 6.86 and for those who did is 9.02 for a difference of 2.16 cigarettes; ii.
a. Five paints (explanatory variable) were randomly assigned in eight different locations. Because random assignment to paints was carried out in each location, location is a blocking variable and hence this is a randomized block design. The response variable is the combined measure of wear. b. The means differ (especially for paint 4) but there is still a lot of overlap in the distributions. The within-group standard deviations are actually rather similar to the overall standard deviation, indicating we have not explained much of the variation in wear.
[Dotplots of wear (0 to 60) for paints 1 through 5]

Paint   n    Mean    SD
5       8    21.13   10.15
4       8    29.38   13.39
3       8    19.00   11.34
2       8    23.63   12.59
1       8    20.50   11.72
Resid   40   0.00    11.89
c. The treatment means have not changed, but the standard deviations are much smaller (better to look at standard deviations than standard errors for this comparison). We see that paint type is statistically significant (F = 30.42, p-value < 0.0001) after adjusting for location.

[Dotplots of location-adjusted measure of wear (10 to 60) by treatment group]

Treatment   n   Mean     SD
Group 5     8   21.125   2.305
Group 4     8   29.375   1.832
Group 3     8   19       1.855
Group 2     8   23.625   2.280
Group 1     8   20.500   0.392

Source     DF   Sum of Squares   Mean Squares   F        p-value
Paint      4    531.35           132.84         30.42    0.0000
Location   7    4826.37          689.48         157.92   0.0000
Error      28   122.25           4.37
Total      39   5479.97

d. We have random assignment within each location; the “location-adjusted” distributions still look reasonably well behaved (so we will assume normality in the treatment populations). However, Paint 1 does have a much smaller standard deviation than the others, perhaps indicating that after adjusting for location the variability is not the same across the paint types.
e. Paint type 1 (the control) and Paint type 3 have significantly below average wear, and Paint type 4 has significantly above average wear. Clearly, Paint type 4 is best to maximize wear. We might not have detected this if we hadn’t adjusted for location. And, of course, not adjusting for location would have been an incorrect analysis here.
2.CE.2
a. Observational units = 6 sections of a psychology class; Blocking variable = teacher; Factor = study method (3 levels).
b. Block effect for teacher A = 70 − 77.5 = −7.5 and block effect for teacher B = 80 − 77.5 = 7.5. Yes, we see that for all three study methods, teacher A has a lower average than teacher B. The teacher A sections score 7.5 points lower than the overall average, whereas the teacher B sections score 7.5 points higher than the overall average.
c. Yes, if we compute the three means for the Study Methods, we get 72.5, 82.5, 77.5. This would indicate the Small group + Individual group has the highest mean, and we see that it is higher than the other two groups for both teachers (75 > 70 > 65, and 90 > 85 > 80). Yes, if the scores are adjusted for the teacher effects, the study method means will stay the same. Adjusting for the block effects will reduce the variability of the scores within each study method, but it will not change the means. Because there will be less within-study-method variation in scores, the observed differences in means will actually be more statistically significant than before adjusting for the effects of the teachers.
d. See the following table.

Source         DF
Teacher        2 − 1 = 1
Study method   3 − 1 = 2
Error          2
Total          5
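The mean squares and F-statistics in the 2.CE.1 paint ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them; the dictionary names are illustrative and the input numbers are copied from that table.

```python
# Sums of squares and degrees of freedom copied from the 2.CE.1 ANOVA table.
# Dictionary names are illustrative, not from the text.
sums_of_squares = {"Paint": 531.35, "Location": 4826.37, "Error": 122.25}
degrees_of_freedom = {"Paint": 4, "Location": 7, "Error": 28}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {
    source: sums_of_squares[source] / degrees_of_freedom[source]
    for source in sums_of_squares
}

# Each F-statistic compares a mean square with the error mean square.
f_paint = mean_squares["Paint"] / mean_squares["Error"]
f_location = mean_squares["Location"] / mean_squares["Error"]

# In this balanced block design the pieces add up to the total sum of squares.
ss_total = sum(sums_of_squares.values())

print(round(f_paint, 2), round(f_location, 2), round(ss_total, 2))
```

The large F for location is what makes blocking worthwhile here: most of the variation in wear is between locations, not between paints.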
2.CE.3. a. The experimental units are the platoons (or the commanders), because the treatment (type of commander) was applied to the platoon as a whole, not randomly assigned to individual soldiers. (Consequently, we computed the average to get one value per platoon.) b. Explanatory Variable = Pygmalion or control; Blocking variable = company; so, you can think of this as two treatments with a blocking factor. There is a bit of replication because there are three platoons in most of the blocks, so there are two platoons getting the control treatment. c. Null hypothesis: There is no difference in average score between these two treatments; Alternative hypothesis: There is a difference in the long-run average score. Test-statistic, F = 3.52, and p-value = 0.0934; After adjusting for the effect of company, there is moderate (but not strong) evidence (p-value = 0.0934) that there is a Pygmalion effect. (See the following ANOVA table.)

Source       DF   Sum of Squares   Mean Squares   F      p-value
Treatments   1    241.51           241.51         3.52   0.0934
Blocks       9    424.04           47.12          0.69   0.7080
Error        9    617.76           68.64
Total        19   1283.32
d. Using software, the 95% confidence interval for difference in mean score between the Pygmalion and control group (P – C) is: (1.80, 12.64). We are 95% confident that the mean score for the Pygmalion treatment will be 1.8 to 12.6 points higher than the mean score for the Control treatment, after controlling for company.
2.CE.4 a. No, the percentage of crews of size 4 is 50% for both crew leaders A & B. Similarly, the percentage of crews of size 6 is 50% for both crew leaders A & B. b. R2 = 0.549.

Source     DF   Sums of squares   Mean squares   F      p-value
CrewSize   1    231.13            231.13         7.30   0.0355
Error      6    189.88            31.65
Total      7    421.00

The effect of crew size 6 is +5.38 and the effect of crew size 4 is −5.38. This is statistically significant at the 5% level with p-value 0.0355.
c. R2 = 0.955.

Source        DF   Adjusted SS   Adjusted MS   Adjusted F   p-value
Crew leader   1    171.13        171.13        45.63        0.0011
Crew size     1    231.12        231.12        61.63        0.0005
Error         5    18.75         3.75
Total         7    421.0

The effects (± 5.38) and sum of squares for crew size stay the same. The R2 value for the two-variable ANOVA is much higher than for the one-variable ANOVA. This is because crew leader explained additional variability in productivity. Even though the degrees of freedom for error decreased when the blocking variable, crew leader, was added, the SSError decreased by such a large amount that crew size adjusted for crew leader is now an even more significant predictor of productivity (p-value = 0.0005).
Chapter 2 Investigation
1. This is an observational study, because the researchers did not manipulate or control any of the explanatory variables.
2. Each individual who was recruited in 1971.
3. Response variable: total cholesterol level – quantitative. Explanatory variable: whether or not the person takes cholesterol medication regularly – categorical.
4.
a. Adult children of the 1948 cohort, recruited in 1971, and attended the 8th examination cycle in 2005–2008.
b. There are no sources of variation accounted for by the study design.
c. Age, family history, what other medication they take regularly, other medical conditions, etc.
d. See the following table.

Observed Variation in: Total cholesterol level
Inclusion criteria
• Adult children of the 1948 cohort,
• recruited in 1971, and
• attended the 8th examination cycle in 2005–2008.
Sources of explained variation
• Whether or not take cholesterol medication regularly
Sources of unexplained variation
• Age
• Family history
• What other medication they take regularly
• Other medical conditions
• Unknown

5. Overall mean = 185.40, Overall SD = 40.62, SSTotal = 4,777,736.7; predicted total cholesterol level = 185.40, SE of residuals = 40.62.
6.
$$\text{predicted total cholesterol level} = \begin{cases} 166.96 & \text{if cholesterol meds} \\ 199.29 & \text{if not} \end{cases}$$
SE of residuals = 37.33. Because the SE of residuals is reduced from 40.62 to 37.33 the predictions are more accurate if knowledge of whether or not someone is taking cholesterol medications regularly is used.
7. Least Squares mean estimate of cholesterol level = 183.12, Effect of taking cholesterol medication regularly = −16.17;
$$\text{predicted total cholesterol level} = 183.12 + \begin{cases} -16.17 & \text{if cholesterol meds} \\ 16.17 & \text{if not} \end{cases}$$
SE of residuals = 37.33.
8. SSModel = 742,431.0; SSError = 4,035,305.7; R2 = 0.155.
9. The observed difference in cholesterol levels between taking meds and not taking meds regularly is 32.34 mg/dL. This seems to be borderline significant with regard to practical significance.
10. Null hypothesis: There is no association between taking cholesterol meds regularly and cholesterol level; μmeds = μno meds. Alternative hypothesis: There is an association between taking cholesterol meds regularly and cholesterol level; μmeds ≠ μno meds, where μmeds represents the mean cholesterol level in the population of those who take cholesterol meds regularly and μno meds represents the mean cholesterol level in the population of those who don’t take cholesterol meds regularly.
11.
Source of Variation   DF     Sum of Squares   Mean Squares   F Ratio   Prob > F
Model                 1      742431.0         742431         532.6     < .0001*
Error                 2895   4035305.7        1394
Total                 2896   4777736.7
The validity conditions are met because the sample sizes in each group (1245 and 1646) are large, and the sample standard deviations for each group are similar (neither is more than twice as large as the other).
12.

Cholesterol Medication   Difference   SE Diff   Confidence interval
No − Yes                 32.34        1.40      (29.59, 35.08)

We are 95% confident that the mean total cholesterol level is between 29.59 and 35.08 mg/dL higher among those who do not take cholesterol meds regularly compared to those who do take cholesterol meds regularly.
13. Yes, we have found evidence that total cholesterol level tends to be lower, on average, for those who regularly take cholesterol medication compared to those who don’t because the ANOVA resulted in a large F-statistic (532.63) and a small p-value (< 0.0001), and the 95% confidence interval did not contain 0.
14. No, we cannot conclude causation from an observational study. Possible confounding variables include use of other medications, family history, etc.
15. We can either make a mosaic plot (such as the one shown for Solution 15) or a two-way table. There does appear to be an association between aspirin use and the use of cholesterol medication.

          Cholesterol Medicine
Aspirin   Yes    No     Total
Yes       778    512    1290
No        468    1141   1609
Total     1246   1653   2899

16.

Analysis of Variance
Source         DF     Sum of Squares   Mean Squares   F       p-value
Take_aspirin   1      253946.4         253946         162.5   < .0001
Error          2895   4523790.3        1563
Total          2896   4777736.7

The validity conditions are met because the sample sizes in each group (1290 and 1607) are large, and the sample standard deviations (39.07 and 39.90) for each group are similar (neither is more than twice as large as the other).
17. Yes, because the proportion of aspirin takers is higher among those who take cholesterol medication regularly, compared to those who do not take cholesterol meds regularly, aspirin use is a confounding variable. And, taking aspirin has a statistically significant association with total cholesterol level (p-value < 0.001).
18.
Source                       DF     Sum of Squares   Mean Squares   F       p-value
Cholesterol medication use   1      549013.8         549013.8       399.7   < 0.0001
Aspirin use                  1      60529.2          60529.2        44.1    < 0.0001
Error                        2894   3974776.5        1373
Total                        2896   4777736.7
The validity conditions are met because the sample sizes are very large and the standard deviations of each group are close together.
19. The SScholesterol is now smaller than before. The effect of taking cholesterol medication is –14.6, closer to zero, so a smaller effect in absolute value. Both the R2 (0.115) and the F-statistic (399.7) are now smaller. The SE of residuals = 37.06. The changes make sense because after adjusting for the effects of taking aspirin (which we saw has a statistically significant association with lower cholesterol levels), the effect of taking cholesterol medication on total cholesterol level is smaller. But, because use of aspirin and use of cholesterol medication jointly explain more variation than just cholesterol medication usage itself, the SE of the residuals is smaller.

[Solution 15: mosaic plot of Take aspirin (No, Yes; proportion 0.0 to 1.0) by Take cholesterol medicine (No, Yes)]

20.
$$\text{predicted total chol.} = 182.81 + \begin{cases} -14.64 & \text{if chol. med} \\ 14.64 & \text{if not} \end{cases} + \begin{cases} -4.84 & \text{if aspirin} \\ 4.84 & \text{if not} \end{cases}$$
SE of residuals = 37.06.
21.

Cholesterol Medication   Difference   Std Err Diff   Confidence interval
No – Yes                 29.29        1.46           (26.41, 32.16)

We are 95% confident that the mean aspirin-use adjusted total cholesterol level is between 26.41 and 32.16 mg/dL higher among those who do not take cholesterol meds regularly compared to those who do take cholesterol meds regularly.
22. We have strong evidence that after adjusting for use of aspirin those who take cholesterol medication regularly tend to have lower cholesterol levels, on average, compared to those who do not take cholesterol medication regularly; the mean aspirin-use adjusted total cholesterol level is between 26.41 and 32.16 mg/dL lower among cholesterol medication takers.
23.
a. The SScholesterol is now even smaller than before. Both the R2 (0.106) and the F-statistic (397) are now slightly smaller. The 95% CI is narrower than before, involving smaller effects. The changes make sense because after adjusting for the effects of taking aspirin (which we saw has a statistically significant association with lower cholesterol levels) and sex, the effect of taking cholesterol meds on total cholesterol level is smaller.
b. The effects are slightly smaller which is consistent with the narrower confidence interval.
c. 182.19 − 14.13 − 3.37 − 9.99 = 154.7 mg/dL.
d. We are 95% confident that on average, females who take both aspirin and cholesterol medication will have a total cholesterol level between 152.19 and 157.22 mg/dL.
e. We are 95% confident that a female who takes both aspirin and cholesterol medication will have a total cholesterol level between 84.54 and 224.86 mg/dL.
f. A 95% confidence interval for the population mean response is narrower than a 95% prediction interval for an individual’s response because the confidence interval has a smaller margin of error: the standard error of the sample mean is smaller than that of a single predicted response.
g. Based on the three-variable ANOVA, we should adjust for the effects of aspirin use and whether the person is male or female (p-values < 0.0001), because these both turned out to have a statistically significant association with total cholesterol level. Thus, we have strong evidence (p-value < 0.0001) that after adjusting for use of aspirin and whether the person is male or female, those who take cholesterol meds regularly tend to have lower cholesterol levels, on average, compared to those who do not take cholesterol meds regularly, with the mean adjusted total cholesterol level being between 25.47 and 31.04 mg/dL lower among cholesterol med takers. We still cannot conclude causation because there are other possible confounding variables such as age, family history, general health, etc. for which we have not yet made any adjustments.
24. Answers may vary; but possible follow-up questions to pursue include what other variables can be adjusted for to investigate the relationship between use of cholesterol medication and cholesterol level. We can also propose/conduct a randomized experiment that will help us isolate the effect of the cholesterol medication usage.
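Two of the Chapter 2 Investigation calculations are simple enough to check by hand or with a few lines of code: the additive prediction in item 23(c) and the confidence interval in item 12. The sketch below is only illustrative. The function names are hypothetical, and the multiplier 1.96 is an assumed large-sample approximation, not necessarily the multiplier the software in the solutions used.

```python
# Illustrative check of two Chapter 2 Investigation calculations.
# Function names are hypothetical; the numbers are copied from the solutions.

def additive_prediction(overall_mean, effects):
    """Add one effect per explanatory variable to the overall mean."""
    return overall_mean + sum(effects)

# Item 23(c): a female who takes both cholesterol medication and aspirin.
predicted = additive_prediction(182.19, [-14.13, -3.37, -9.99])

def approximate_interval(difference, standard_error, multiplier=1.96):
    """Difference +/- multiplier * SE; 1.96 is an assumed approximation."""
    margin = multiplier * standard_error
    return (difference - margin, difference + margin)

# Item 12: No - Yes difference in mean total cholesterol and its SE.
ci_low, ci_high = approximate_interval(32.34, 1.40)

print(round(predicted, 2))
print(round(ci_low, 2), round(ci_high, 2))
```

The interval agrees with the reported (29.59, 35.08) to within rounding of the standard error.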
CHAPTER 3
Multi-factor Studies and Interactions Section 3.1
3.1.1 D.
3.1.2 A. 3.1.3 C. 3.1.4 A.
3.1.5
a. SSModel is the SSA + SSB or 0.25 + 4 = 4.25.
b. More of the total variability is explained when both variables are in the model so there is less unexplained variability (SSError is smaller).
c. Because more variability is explained when both variables are in the model, the SSError is changed. Also, the DF for the error is different when both variables are in the model. Both of these things change the MSError and thus change the F-statistic which is MSModel/MSError.
3.1.6 There are 12 treatments: (back, 3, yes), (back, 3, no), (back, 4, yes), (back, 4, no), (middle, 3, yes), (middle, 3, no), (middle, 4, yes), (middle, 4, no), (front, 3, yes), (front, 3, no), (front, 4, yes), (front, 4, no).
3.1.7
a. There are 20 treatments: (D, E, DI), (D, NE, DI), (D, E, ACE), (D, NE, ACE), (D, E, BB), (D, NE, BB), (D, E, CCB), (D, NE, CCB), (D, E, NDT), (D, NE, NDT), (ND, E, DI), (ND, NE, DI), (ND, E, ACE), (ND, NE, ACE), (ND, E, BB), (ND, NE, BB), (ND, E, CCB), (ND, NE, CCB), (ND, E, NDT), (ND, NE, NDT).
b. The subjects should be randomly assigned to the treatment groups to make the groups very similar or as alike as possible on all aspects except for which treatment they undergo.
c. There should be 300/20 = 15 in each treatment group in order for the design to be balanced.
d. 300/2 = 150.
e. 300/2 = 150.
f. 300/5 = 60.
3.1.8
a. In a factorial design you will have more people assigned to each factor.
b. In this method you don’t have anyone on any combination of treatments which may result in better outcomes or worse. For example, suppose one of the diets interacts with the effectiveness of one of the drugs. You wouldn’t find that out in this design.
3.1.9
a. The two factors are the microwave power settings and the minutes the bag is cooked.
b. The response is the number of bad kernels.
c. The 24 experimental units are the bags of popcorn.
d. There will be six treatment groups so there will be four bags in each treatment in a factorial design study.
3.1.10
a. The optimal microwave setting is low power for 5 minutes.
b. See the graph that follows. I would disagree with the statement. In the high power setting the number of bad kernels increases as time increases and in the low power setting the number of bad kernels decreases as time increases.

[Graph: number of bad kernels (20 to 120) versus time (3 to 5 min), with one line for High-Power and one for Low-Power]

c. The variable “number of kernels in bag” is dealt with through random assignment of bags to treatments. This should level out this variability in total number of kernels per bag.
3.1.11
a. The high setting is optimal based on these results.
b. Based on the table, the optimal time is 3 minutes.
c. No, 3 minutes at a high setting is not optimal. A low setting and 5 minutes was the real optimal setting/time combination.
3.1.12
a. There are four treatment groups.
b. Randomly assign four strawberries to each treatment group.
c.
$$\text{predicted rating} = 2.125 + \begin{cases} 0.625 & \text{if 34°F} \\ -0.625 & \text{if 42°F} \end{cases} + \begin{cases} 0.25 & \text{if open container} \\ -0.25 & \text{if closed container} \end{cases}$$
SE of residuals = 0.81.
3.1.13
a. R2 = 1/15.75 = 0.0635.
b. See the following table.

Source      DF   Sum of Squares   Mean Squares   F
Container   1    1.00             1.00           0.949
Error       14   14.75            1.05
Total       15   15.75

c. See the following table. R2 = 7.25/15.75 = 0.4603.

Source        DF   Sum of Squares   Mean Squares   F
Temperature   1    6.25             6.25           9.559
Container     1    1.00             1.00           1.529
Error         13   8.50             0.654
Total         15   15.75

d. The DF and SS for the first two rows from part (c) were added together to obtain DF and SS in the first row of the table. The MS value of the first row was calculated as 7.25/2 (SSModel/Model DF). The F-statistic was recalculated as 3.625/0.654.
3.1.14
a. The difference in means is 0.5. With a two-sided simulation p-value of about 0.47 there is not strong evidence that the container type has an effect on strawberry rating.
b. The difference in means is −1.25. With a two-sided simulation p-value of about 0.02 there is strong evidence that temperature has an effect on strawberry rating with the colder temperature giving a higher rating.
3.1.15
a. There are 2 factors and 4 treatments in this experiment.
b. Yes, the design is balanced; there are 12 pancake ratings in each treatment.
c. See the following table.

           Temp = 4   Temp = 6   Average
Time = 2   2.583      3.583      3.083
Time = 3   3.250      3.500      3.375
Average    2.9165     3.5415     3.229

d. No, increasing the time does not have the same effect on the quality ratings for both temperatures. For temperature 4, increasing the time from 2 to 3 minutes seemed to make the quality of the pancakes higher. However, at temperature 6 the quality went down slightly in the sample when the time was increased.
3.1.16
a.
$$\text{predicted rating} = 3.229 + \begin{cases} -0.3125 & \text{if temp 4} \\ 0.3125 & \text{if temp 6} \end{cases} + \begin{cases} -0.146 & \text{if 2 min} \\ 0.146 & \text{if 3 min} \end{cases}$$
SE of residuals = 0.904.
b. 11.0% of the total variation is explained by temperature, 2.4% by time, and 86.6% is unexplained.
c. With a p-value of 0.0208 there is strong evidence of a temperature effect. With a p-value of 0.2696 there is not strong evidence of a time effect.
3.1.17
a. We can be 95% confident that the mean quality rating at temperature setting 6 is 0.099423 to 1.15058 points higher than at temperature setting 4. Because this interval is completely positive, we have strong evidence of a temperature effect.
b. We can be 95% confident that the mean quality rating for a cooking time of 3 minutes is between 0.2339107 points lower and 0.817224 points higher than for a cooking time of 2 minutes. Because this interval includes 0, we do not have strong evidence of a cooking time effect.
3.1.18
a. There are two factors and four treatments.
b. This is a balanced design because there are the same number of distances in each treatment.
c. See the following table.

           Standing (and stepping) throw   Running up to throw   Average
Overhand   63.80                           65.43                 64.615
Sidearm    59.53                           50.34                 54.935
Average    61.665                          57.885                59.775

d. The effect is in the same direction (overhand is better in both standing and running). However, for running, the overhand approach seems to have a greater effect than for standing.
3.1.19
a.
$$\text{predicted distance} = 59.775 + \begin{cases} 1.89 & \text{if standing} \\ -1.89 & \text{if running} \end{cases} + \begin{cases} 4.84 & \text{if overhand} \\ -4.84 & \text{if sidearm} \end{cases}$$
SE of residuals = 23.0.
b. 0.7% of the total variation is explained by approach, 4.8% of the total variation is explained by the throw, and 94.5% of the total variation is unexplained.
c. With a p-value of 0.6913 we do not have strong evidence that the approach has an effect on distance thrown. With a p-value of 0.3144 we do not have strong evidence that the throw type has an effect on distance thrown.
3.1.20
a. We can be 95% confident that the mean distance for standing is between 15.75 yards less and 23.32 yards more than for running. Because this interval includes 0, we do not have strong evidence of an approach effect.
b. We can be 95% confident that the mean distance for sidearm throwing is between 29.22 yards less and 9.85 yards more than for overhand throwing. Because this interval includes 0, we do not have strong evidence of a throwing effect.
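The main-effects model in 3.1.16(a) can be recovered from the table of cell means in 3.1.15(c). The sketch below assumes a balanced design (stated in 3.1.15(b)), so that each effect is simply a marginal mean minus the overall mean; the names `cell_means` and `marginal_mean` are illustrative.

```python
# Mean quality ratings from the 3.1.15(c) table, keyed by (time, temp).
# Names are illustrative; a balanced design is assumed.
cell_means = {
    (2, 4): 2.583, (2, 6): 3.583,
    (3, 4): 3.250, (3, 6): 3.500,
}

def marginal_mean(position, level):
    """Average the cell means that share one level of one factor.

    position 0 selects on time, position 1 selects on temperature.
    """
    values = [mean for key, mean in cell_means.items() if key[position] == level]
    return sum(values) / len(values)

overall_mean = sum(cell_means.values()) / len(cell_means)

# Effect of a level = its marginal mean minus the overall mean.
temp6_effect = marginal_mean(1, 6) - overall_mean
time3_effect = marginal_mean(0, 3) - overall_mean

print(round(overall_mean, 3), round(temp6_effect, 4), round(time3_effect, 3))
```

With only two levels per factor, the effects of the other levels (temp 4 and 2 min) are the negatives of these values.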
Section 3.2
3.2.1 B.
3.2.2 C.
3.2.3 A. 3.2.4 B. 3.2.5 A.
3.2.6 A.
3.2.7
a. 4. b. Any value between 1 and 7, other than 4.
3.2.8 The missing mean ratings should be 4 for the bottom rack and 5 for the top rack.
3.2.9
a. Adding conversation gives a bigger bump in tips. Going from no crouching to crouching increases the tips by two percentage points but going from no conversation to conversation increases the tips by four percentage points.
b. See the following graph.

[Graph: mean tip (%), 15 to 20, for No crouching and Crouching, with lines for Conversation and No conversation]

c. The vertical distance between the two lines (conversation and no conversation) is greater than the “slopes” of the lines (crouching and no crouching).
d. There is no interaction because the lines are parallel.
e. Difference of the differences = 0.
3.2.10
a. See the next graph.

[Graph: mean tip (%), 15 to 20, for No crouching and Crouching, with lines for Conversation and No conversation]

b. We can see a small interaction because the lines are not quite parallel.
c. Difference of the differences = 1 or −1.
3.2.11
a. See Solution 3.2.11a graph.

[Solution 3.2.11a: graph of mean rating (2.5 to 3.5) versus time (2 and 3 min), with lines for Temp = 4 and Temp = 6]

b. No. Increasing the time greatly increases the rating at a temperature setting of 4, but it decreases the rating slightly at temperature setting of 6 so there is an interaction here because the effect of time is different for the two different temperature settings.
c. See Solution 3.2.11c table.

Predicted Means
           Temp = 4   Temp = 6
Time = 2   2.7705     3.3955
Time = 3   3.0625     3.6875
Solution 3.2.11c

d. See Solution 3.2.11d graph.

[Solution 3.2.11d: graph of predicted mean rating (2.5 to 3.5) versus time (2 and 3 min), with lines for Temp = 4 and Temp = 6]

e. Increasing time from 2 to 3 minutes has the same effect on ratings (it increases by 0.292) so there is no interaction in this model.
f. See the table below.

Difference (Observed − Predicted)
           Temp = 4   Temp = 6
Time = 2   −0.1875    0.1875
Time = 3   0.1875     −0.1875
g. See equation for Solution 3.2.11g.
Solution 3.2.11g
h. See the table below. These predicted ratings are exactly the same as the observed.

Predicted Ratings with Interaction
           Temp = 4   Temp = 6
Time = 2   2.583      3.583
Time = 3   3.250      3.500

3.2.12
a. H0: There is no interaction between cooking time and temperature. Ha: There is an interaction between cooking time and temperature.
b.
i. 0.25.
ii. 1.
iii. −0.75.
c. The p-value should be approximately 0.203.
d. We do not have strong evidence of an interaction between time and temperature with this p-value.
3.2.13
a.

[Graph: mean distance (ft.), 50 to 65, for Running and Standing approaches, with lines for Overhand and Sidearm]

b. There seems to be an interaction because the lines are not even close to being parallel.
c. Predicted Means (ft).

           Standing approach   Running up approach
Overhand   66.505              62.725
Sidearm    56.825              53.045

d.

[Graph: predicted mean distance (ft.), 50 to 65, for Running and Standing approaches, with lines for Overhand and Sidearm]

e. Going from standing to running has the same effect on the type of throw (it decreases each by 3.78 yd) because there is no interaction in this model.
f. Difference (Observed – Predicted).

           Standing approach   Running up approach
Overhand   −2.705              2.705
Sidearm    2.705               −2.705

g.
h. Predicted Ratings with Interaction.

           Standing approach   Running up approach
Overhand   63.80               65.43
Sidearm    59.53               50.34

The values in this table are the same as the observed means.
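The tables in 3.2.13(c) and (f) and the difference of differences reported in 3.2.14(b) can all be derived from the four observed cell means. The sketch below assumes a balanced design, so effects are marginal means minus the overall mean; every variable name is illustrative and not taken from the text.

```python
# Observed mean distances from the 3.1.18(c) table, keyed by (throw, approach).
# All names are illustrative; a balanced design is assumed.
observed = {
    ("overhand", "standing"): 63.80, ("overhand", "running"): 65.43,
    ("sidearm", "standing"): 59.53, ("sidearm", "running"): 50.34,
}
throws = ("overhand", "sidearm")
approaches = ("standing", "running")

overall = sum(observed.values()) / len(observed)

# Main effects: marginal mean minus overall mean.
throw_effect = {
    t: sum(observed[t, a] for a in approaches) / len(approaches) - overall
    for t in throws
}
approach_effect = {
    a: sum(observed[t, a] for t in throws) / len(throws) - overall
    for a in approaches
}

# Predictions from the main-effects (no interaction) model.
predicted = {
    (t, a): overall + throw_effect[t] + approach_effect[a]
    for t in throws for a in approaches
}

# Observed minus predicted: what the no-interaction model misses.
residual = {key: observed[key] - predicted[key] for key in observed}

# Difference of differences: overhand minus sidearm, standing versus running.
diff_standing = observed["overhand", "standing"] - observed["sidearm", "standing"]
diff_running = observed["overhand", "running"] - observed["sidearm", "running"]
diff_of_diffs = diff_standing - diff_running

print(round(predicted["overhand", "standing"], 3))
print(round(residual["overhand", "standing"], 3))
print(round(diff_of_diffs, 2))
```

In a 2 × 2 design all four residuals share the same magnitude, which is one quarter of the absolute difference of differences (10.82/4 = 2.705).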
3.2.14 a. H0: There is no interaction between the approach and type of throw in terms of distance. Ha: There is an interaction between the approach and type of throw in terms of distance.
16/10/20 7:45 PM
29
Solutions to Exercises b.
i. 4.27;
seeds because the predicted outcomes are not close to the observed outcomes.
ii. 15.09; iii. −10.82.
Mean number of seeds germinated
c. The p-value should be approximately 0.57. d. Based on this p-value, we do not have strong evidence of an interaction between approach and type of throw. 3.2.15 a. See interaction plot below. Because the lines are not close to being parallel, there is evidence of an interaction.
2
Growth mean (lb)
No
0.16
Predicted uncovered
60
Predicted covered
55 50
Observed covered High
Low Water
3.2.17 a. H0: There is no interaction between the water level and covering in terms of the number of seeds germinated; Ha: There is an interaction between the water level and covering in terms of the number of seeds germinated.
No
i. −19.67; ii. 29.71;
0 Yes
iii. They are not the same; the difference is −49.38.
Vitamins
c. If there were no interaction, the difference of the differences would be 0. d. Yes.
Solution for 3.2.15a
b. One way to do this is (1.19 − 1.03) − (1.23 − 1.54) = 0.47. i. There is no underlying interaction between the antibiotic and the vitamin B12 in pig growth. Ha: There is an interaction between the antibiotic and the vitamin B12 in pig growth. ii. The p-value should be about 0.035. iii. Based on this p-value, we do have strong evidence of an interaction. 3.2.16 a. There are two factors and four treatments. b. Yes, this is a balanced design because there are the same number of experimental units in each treatment. − 2.51 if low water c. predicted germinated seeds = 57.68 + { 2.51 if high water +
−2.845 if covered . { 2.845 if uncovered
e. The p-value is < 0.001 so the observed results are extremely unlikely to happen if the null is true. f. We have strong evidence of an interaction between the water level and covering in terms of the number of seeds germinated. 3.2.18 The mean number of seeds germinated with low watered in a covered container is significantly higher than low water uncovered, and high water covered. The mean number of seeds germinated with high water uncovered is significantly higher than low water uncovered and high water covered. The letters plot is shown next.
Treatment                          Mean    Letter
Uncovered container, High water    75.38   a
Covered container, Low water       64.67   a
Uncovered container, Low water     45.67   b
Covered container, High water      45.00   b
3.2.19 a. R2 = 0.124.
d. Predicted means.
                       Low water   High water
Uncovered container    58.015      63.035
Covered container      52.325      57.345
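The predicted means in this table follow directly from the main-effects prediction equation in 3.2.16c (overall mean 57.68, water effect ±2.51, covering effect ±2.845). A minimal sketch, with function and variable names chosen here for illustration only:

```python
# Effects taken from the main-effects prediction equation in 3.2.16c.
overall_mean = 57.68
water_effect = {"low": -2.51, "high": 2.51}
cover_effect = {"covered": -2.845, "uncovered": 2.845}

def predicted(cover, water):
    """Predicted mean germination under the additive (no-interaction) model."""
    return overall_mean + water_effect[water] + cover_effect[cover]

for cover in ("uncovered", "covered"):
    for water in ("low", "high"):
        print(f"{cover:9s} {water:4s} {predicted(cover, water):.3f}")

# The additive model forces the same low-to-high change for both coverings.
change_uncovered = predicted("uncovered", "high") - predicted("uncovered", "low")
change_covered = predicted("covered", "high") - predicted("covered", "low")
```

Because the model is additive, the low-to-high change is the same (2 × 2.51) for both coverings, which is why it cannot reproduce observed means whose changes differ in sign.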
e. See the following graph. No, the main effects model does not do a good job of explaining the variability in the number of germinated
b.
c.
Source               DF   Sums of squares   Mean squares   F
Amount of protein    1    1,299.60          1,299.60       5.39
Error                38   9,153.90          240.89
Total                39   10,453.50
CHAPTER 3
Multi-factor Studies and Interactions
b. R2 = 0.021.
Source               DF   Sums of squares   Mean squares   F
Source of protein    1    220.90            220.90         0.82
Error                38   10,232.6          269.28
Total                39   10,453.50
h. The histogram of residuals is a bit skewed but is not far from bell shaped, so we can consider the normality condition met. The spread of the residuals at each of the four predicted means looks reasonably similar, so we can consider the equal variance condition met as well. i. The model that includes the interaction seems to be the better one. The R2 value is quite a bit larger (0.230 compared to 0.145). The SSError is smaller (8,049.4 compared to 8,933.0). The p-value for testing the interaction is 0.0545, making the interaction moderately (and almost strongly) statistically significant, so it should be included in the model.
c. SSModel = 1520.50 = 1299.60 + 220.90. R2 = 0.145. The two-variable model is preferred because the SSError is smaller and the R2 is larger, but not by a lot. The model with only amount of protein is almost as good.
Source               DF   Sums of squares   Mean squares   F
Model                2    1,520.50          760.25         3.15
Source of protein    1    220.90            220.90         0.91
Amount of protein    1    1,299.60          1,299.60       5.38
Error                37   8,933.00          241.43
Total                39   10,453.50
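The R2 and F values in these ANOVA tables can be recomputed from the sums of squares alone. The sketch below assumes the usual definitions (R2 = SSModel/SSTotal, F = MSModel/MSError); the helper name `anova_summary` is hypothetical and introduced only for this illustration.

```python
# Totals shared by all of the 3.2.19 tables.
SS_TOTAL = 10453.50
DF_TOTAL = 39

def anova_summary(ss_model, df_model):
    """Return R2, SSError, MSError, and the model F-statistic."""
    ss_error = SS_TOTAL - ss_model
    df_error = DF_TOTAL - df_model
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "R2": ss_model / SS_TOTAL,
        "SSError": ss_error,
        "MSError": ms_error,
        "F": ms_model / ms_error,
    }

protein_only = anova_summary(1299.60, df_model=1)           # part a
two_variable = anova_summary(1299.60 + 220.90, df_model=2)  # part c

for name, res in (("amount only", protein_only), ("two-variable", two_variable)):
    print(name, round(res["R2"], 3), round(res["MSError"], 2), round(res["F"], 2))
```

Comparing the two summaries shows how little R2 and SSError change when source of protein is added, which is the point made in part c.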
Section 3.3 3.3.1 B. 3.3.2 C. 3.3.3 A. 3.3.4 B, C. 3.3.5 A, B. 3.3.6 B. 3.3.7 C.
d. The histogram of residuals looks reasonably bell shaped, so we can consider the normality condition met. The spread of the residuals at each of the four predicted means looks reasonably similar, so we can consider the equal variance condition met as well.
3.3.8 The parallel part tells us there is no interaction and the horizontal part tells us that the main effect of one of the explanatory variables is 0. 3.3.9
e. R2 = 0.230. See the following table.
Source               DF   Sums of squares   Mean squares   F
Model                3    2,404.10          801.37         3.58
Source of protein    1    220.90            220.90         0.99
Amount of protein    1    1,299.60          1,299.60       5.81
Interaction          1    883.60            883.60         3.95
Error                36   8,049.40          223.59
Total                39   10,453.50
[Interaction plot for 3.3.9, with labels A1, A2, B1, and B2.]
3.3.10 a. 3 × 2 × 2 = 12.
b. 36/12 = 3. e. Yes, there does appear to be a three-way interaction because the two-way interaction plots show at least one brand is different. The interaction plots are somewhat similar for Great Value and Minute Maid with the near horizontal lines for the frozen juice and decreasing lines for the fresh. However, for the Meijer brand the lines are switched; the near horizontal line is for the fresh juice and the decreasing line is for the frozen.
f. The sum of squares for the amount of protein and source of protein are the same in both tables. Because the interaction was added, the sum of squares for the model is larger in the table with the interaction and the sum of squares for the error is smaller. g. The sum of squares for the interaction came out of the sum of squares for the error.
c.
Brand         Source   4°C    24°C
Great Value   Fresh    2.26   1.79
Great Value   Frozen   2.39   2.39
Meijer        Fresh    2.15   2.17
Meijer        Frozen   2.39   1.94
Minute Maid   Fresh    2.15   1.82
Minute Maid   Frozen   1.99   1.97
d.
[Interaction plots: vitamin C (mmol/L) by temperature (4°C, 24°C), with Fresh and Frozen lines, one panel each for Great Value, Meijer, and Minute Maid.]
Solutions 3.3.10c and d.
3.3.11 a. No, there does not appear to be much of an interaction between source and temperature. For Frozen there are two near parallel lines and one decreasing line. Combining these we would get a line that decreases slightly. For Fresh, there are two decreasing lines and one parallel line. Combining these we would get a decreasing line. It might decrease more than the frozen one, but not by a lot. b. There seems to be some interaction between Brand and Source. The average of the vitamin C contents for the two temperatures for Fresh and Frozen are about the same for Minute Maid. The same is true for Meijer. However, for Great Value the average vitamin C content for Frozen is quite a bit higher than that of Fresh. c. There does not appear to be much of an interaction between Brand and Temperature. If you combine the two lines (Frozen and Fresh) for Great Value, you will get a decreasing line. The same is going to be true for the other two brands. 3.3.12 a. Yes, the p-value for Brand is 0.0027, for Source is 0.0227, and for Temperature is 0.0003. The R2 = 0.5412. b. The R2 is now 0.9632. Yes, all of these interactions should be included because this model explains a much larger percentage of the variation in the response than the model without any interactions. All but one of the interactions (brand × temperature) have p-values that are 0.004 or smaller. 3.3.13 b. See the following graph.
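[Interaction plot for 3.3.13b: mean thickness (mm) by liquid (Water, Milk), with one line per brand (Albertsons, Krusteaz, Quaker).]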
3.3.16 a. R2 = 0.6423. b. R2 = 0.8399. c. The model R2 increases from 0.6423 to 0.8399, so the interaction does explain quite a bit more of the variation in average failure load. The p-value of 0.0042 from the ANOVA table provides strong evidence of an interaction effect. 3.3.17 a. The means and interaction plots are the same in the two datasets. b.
i. 0. ii. There would be eight categories for the interaction portion of the prediction equation. This is the same number as the number of observations. Therefore each observation can be predicted exactly. This would give a SE of residuals of 0. Therefore, the SSError must be 0.
3.3.18 a. As salinity increases, the number of beans that sprouted tended to decrease. b. There is no clear trend. As the temperature increased, the number of beans that sprouted decreased for salinity 8, increased slightly then decreased for the salinity 0, and decreased then increased for salinity 4.
d. Based on the small p-value, salinity is the only variable that has a significant effect on the number of sprouts. 3.3.19 a. As salinity increased, the biomass tended to decrease. b. In general, as temperature increased, the biomass also increased. Under two of the conditions the biomass decreased as the temperature went from 34 to 36°, however everything else increased.
c. There doesn’t appear to be much of an interaction between temperature and salinity.
d. Based on the small p-values, both salinity and temperature have a significant effect on biomass, however the interaction is not significant.
c. Yes. The Albertsons brand produces pancakes that are much thinner when water is used than when milk is used. The other two brands don’t change as drastically. 3.3.14 a. Yes, the p-value is < 0.0001 for both brand and liquid. R2 = 0.7568. b. The R2 is now 0.8864. Yes, this interaction should be included because it greatly increases the R2 and the p-value corresponding to the interaction is < 0.0001. 3.3.15 a. The sisal rope did substantially worse in both the control and heat than the nylon rope. However, in the other two conditions the sisal rope did a little bit better than the nylon rope. i. 37.50/591.33 × 100% = 6.34%. ii. 342.33/591.33 × 100% = 57.89%. iii. 116.83/591.33 × 100% = 19.76%.
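The percentages in 3.3.15 and the R2 values in 3.3.16 come from the same sums of squares, so one can be checked against the other. In this sketch the keys "i", "ii", and "iii" simply mirror the answer labels; treating i and ii as the two main effects and iii as the interaction is an assumption inferred from the R2 values, not something the text states.

```python
ss_total = 591.33
# Sums of squares in the order they appear in parts i, ii, iii of 3.3.15.
ss_parts = {"i": 37.50, "ii": 342.33, "iii": 116.83}

# Percentage of total variation attributed to each part.
percent = {label: 100 * ss / ss_total for label, ss in ss_parts.items()}
for label, pct in percent.items():
    print(label, f"{pct:.2f}%")

# Assumed reading: i and ii are main effects, iii is the interaction.
r2_main_effects = (ss_parts["i"] + ss_parts["ii"]) / ss_total
r2_with_interaction = sum(ss_parts.values()) / ss_total
print(round(r2_main_effects, 4), round(r2_with_interaction, 4))
```

Under that reading, the two printed R2 values line up with parts a and b of 3.3.16, and their gap is the share of variation carried by part iii.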
c. Nylon, because its average failure load over the different environments is higher than that for the sisal rope.
c. There doesn’t appear to be much of an interaction between temperature and salinity.
a. Generalized block design.
b.
3.3.20 a. As temperature increases, battery life also increases quite dramatically. b. No. The brand that lasted the longest in the cold, Brand C, actually lasted the shortest amount of time at room temperature. The brand that lasted for the shortest time in the cold, Brand B, actually lasted the longest amount of time at room temperature. c. There does appear to be an interaction based on the graph and answers to the previous questions. d. The interaction does have a significant effect on battery life as the p-value is 0.0032. 3.3.21 a. Null: There is no interaction between Brand and Temperature on vitamin C (or, the difference between vitamin C levels for different temperatures is the same for each brand). Alt: There is an interaction between Brand and Temperature on vitamin C (or the difference between vitamin C levels for different temperatures is different for at least one brand).
b. 1. One possible statistic is the sum of the differences (Refrigerator – Room temp) = (2.38 − 2.32) + (2.34 − 1.86) + (2.01 − 2.01) = 0.54; 2. Re-randomize the two cups of each brand of juice to the two treatment groups for each of the three brands. Compute the sum of the differences. Repeat this many, many times; 3. Compare the sum of the differences for the real data with the values obtained when re-randomizing. c. Two ways for each brand so 2 × 2 × 2 = 8 ways total. d. The eight statistics for the sum of the differences are 0.54, 0.54, −0.42, −0.42, 0.42, 0.42, −0.54, −0.54. e. Smallest possible two-sided p-value here is 0.50. The p-value for this dataset is 0.50. f. If Room temp for Minute Maid is 2.00, then our observed statistic would be 0.55 and the possible statistics would be 0.55, −0.55, 0.41, −0.41, 0.53, −0.53, 0.43, −0.43, so the smallest possible p-value would be 0.25, and that would be the p-value for the dataset. g. Increase the sample size (more cups within each brand).
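The eight re-randomizations in 3.3.21 are few enough to enumerate exactly. The sketch below assumes, as in part b, that swapping the two cups within a brand just flips the sign of that brand's difference; the function names are illustrative and not from the text.

```python
from itertools import product

# Refrigerator minus room-temperature differences for the three brands (part b).
observed_diffs = [2.38 - 2.32, 2.34 - 1.86, 2.01 - 2.01]

def randomization_distribution(diffs):
    """All possible sums of differences when each brand's pair may be swapped."""
    return [
        round(sum(sign * d for sign, d in zip(signs, diffs)), 2)
        for signs in product((1, -1), repeat=len(diffs))
    ]

def two_sided_p_value(diffs):
    """Share of re-randomizations at least as extreme as the observed sum."""
    observed = round(sum(diffs), 2)
    stats = randomization_distribution(diffs)
    extreme = sum(abs(s) >= abs(observed) for s in stats)
    return extreme / len(stats)

print(sorted(randomization_distribution(observed_diffs)))
print(two_sided_p_value(observed_diffs))

# Part f: the third brand's room-temperature value becomes 2.00.
modified_diffs = [2.38 - 2.32, 2.34 - 1.86, 2.01 - 2.00]
print(two_sided_p_value(modified_diffs))
```

A brand whose two values are tied contributes nothing under either ordering, so the distribution collapses to four distinct values, each appearing twice. That is why the p-value cannot drop below 0.50 for the original data, and why breaking the tie in part f lowers the floor to 0.25.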
Section 3.4 3.4.1 D.
3.4.9 a. The Model DF is made up of the two main effects and the interaction. The variable Sex will contribute 2 − 1 = 1 to the total, the variable time of day will contribute 3 − 1 = 2, and the interaction will contribute 2 × 3 − 1 − (1 + 2) = 2. b.
Source   DF   Sum of Squares   Mean Squares   F
Model    5    55,232           11,046.4       5.54
Error    57   113,563          1,992.3
Total    62   168,795
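The degrees-of-freedom bookkeeping in 3.4.9a and the F-statistic in the table can be reproduced as follows. This is a sketch under the assumption that the total DF of 62 corresponds to 63 observations; the variable names are illustrative.

```python
# Number of levels for each explanatory variable (3.4.9a).
levels_sex = 2
levels_time_of_day = 3
n_observations = 63  # assumed from Total DF = 62

df_sex = levels_sex - 1
df_time = levels_time_of_day - 1
df_groups = levels_sex * levels_time_of_day - 1
df_interaction = df_groups - (df_sex + df_time)

df_model = df_sex + df_time + df_interaction
df_total = n_observations - 1
df_error = df_total - df_model

# Sums of squares from the ANOVA table in part b.
ss_model, ss_error = 55232, 113563
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error

print(df_sex, df_time, df_interaction, df_model, df_error)
print(round(ms_model, 1), round(ms_error, 1), round(f_stat, 2))
```

The interaction DF also equals the product of the two main-effect DFs, (2 − 1) × (3 − 1) = 2, which is the form used in the table for 3.4.8c.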
3.4.10 a. An interaction would mean that the difference in average height between males that played high school sports and males that didn’t is not the same as the difference in average height between females that played high school sports and females that didn’t. b. See the following interaction plot. The lines are not parallel and show that the average height of males who played high school sports is larger than that of males who didn’t play high school sports, but the difference is reversed for females and not as big, so it seems like there is an interaction.
3.4.2 A. 3.4.3 D.
3.4.4 A. 3.4.5 a. D. b. A. c. B.
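[Interaction plot for 3.4.10b: height mean (in) by played HS sport (No, Yes), with lines for Male and Female; marked differences (female minus male) of −3.45 (No) and −6.33 (Yes).]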
d. C. 3.4.6 There would be three lines in each interaction plot, one for males, one for females, and one for other. 3.4.7
b. One of the main effects is very small.
c. The sample means show the same pattern as the interaction plot with means being as follows: 71.56 inches for “male, sport,” 69.00 for “male, no sport,” 65.23 for “female, sport,” and 65.55 for “female, no sport.”
c. The interaction effect is very small. 3.4.8 a.
                    Greek    Non-Greek
College dorm        0.05     −0.05
College apartment   −0.06    0.06
Off campus          0.01     −0.01
b. There are two degrees of freedom. We can see it in the table because if we have two values in the table, the rest can all be determined. We couldn’t do that with only one value. Alternatively, the groups have (3 × 2 − 1) = 5 DF. Dorm has 3 − 1 = 2 DF and Greek has 1 DF leaving 2 DF for the interaction. c.
Source                   DF
Residence                3 − 1 = 2
Greek life               2 − 1 = 1
Residence × Greek life   2 × 1 = 2
Error                    49 − 5 = 44
Total                    50 − 1 = 49
a. One of the main effects is very small.
3.4.11 H0: There is not an underlying interaction between playing high school sports and a person’s sex on height. Ha: There is an underlying interaction between playing high school sports and a person’s sex on height. The difference of differences = −2.88 and the (two-sided) p-value from the simulation should be approximately 0.24. Because the p-value is larger than 0.05 we do not have strong evidence of an interaction between playing high school sports and a person’s sex on height. 3.4.12
\[
\text{predicted height (inches)} = 67.835 +
\begin{cases} 2.445 & \text{if male} \\ -2.445 & \text{if female} \end{cases}
+ \begin{cases} 0.56 & \text{if sport} \\ -0.56 & \text{if no sport} \end{cases}
+ \begin{cases} 0.72 & \text{if male, sport} \\ -0.72 & \text{if male, no sport} \\ -0.72 & \text{if female, sport} \\ 0.72 & \text{if female, no sport} \end{cases}
\]
where SE of residuals = 2.90 inches.
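With the interaction term included, the prediction equation in 3.4.12 reproduces the four sample means from 3.4.10c exactly, and the difference of differences from 3.4.11 falls out of those means. A minimal sketch, with names chosen here for illustration:

```python
# Effects from the prediction equation in 3.4.12.
overall_mean = 67.835
sex_effect = {"male": 2.445, "female": -2.445}
sport_effect = {"sport": 0.56, "no sport": -0.56}
interaction_effect = {
    ("male", "sport"): 0.72,
    ("male", "no sport"): -0.72,
    ("female", "sport"): -0.72,
    ("female", "no sport"): 0.72,
}

def predicted_height(sex, sport):
    """Predicted mean height (inches) for one sex-by-sport group."""
    return (overall_mean + sex_effect[sex] + sport_effect[sport]
            + interaction_effect[(sex, sport)])

for sex in ("male", "female"):
    for sport in ("sport", "no sport"):
        print(sex, sport, round(predicted_height(sex, sport), 2))

# Female-minus-male gap among those who played, minus the gap among those who did not.
gap_sport = predicted_height("female", "sport") - predicted_height("male", "sport")
gap_no_sport = predicted_height("female", "no sport") - predicted_height("male", "no sport")
diff_of_diffs = gap_sport - gap_no_sport
print(round(diff_of_diffs, 2))
```

Note that the sign of the difference of differences depends on the order of subtraction; the order used here is the one that matches the −2.88 reported in 3.4.11.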
3.4.13 a.
i. See the forthcoming interaction plot. There appears to be an interaction because the lines aren’t close to parallel and, in fact, cross; ii.
b. H0: There is not an interaction between a person’s sex and amount of sleep on their accuracy score. Ha: There is an interaction between a person’s sex and amount of sleep on their accuracy score. The difference of differences = 1.36 and the (two-sided) p-value is approximately 0.068. Because the p-value is a bit larger than 0.05 we do not have strong evidence of an interaction between a person’s sex and amount of sleep on their accuracy score, but we do have moderate evidence of an interaction.
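[Interaction plot: mean score by stimulus (Visual, Auditory), with lines for Played instrument and Did not play instrument; marked differences of −2.99 and 2.83.]
3.4.16 a.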
ii. After adjusting for person’s sex, there is strong evidence of an association between sleep and accuracy score based on the small p-value (< 0.0001).
iii. After adjusting for person’s sleep level, there is some evidence of an association between person being a male or female and accuracy score based on the somewhat moderate p-value (0.09).
b. H0: There is not an interaction between playing an instrument and the stimulus type on score. Ha: There is an interaction between playing an instrument and the stimulus type on score. The difference of differences = 5.83 and the (two-sided) p-value is approximately 0.053. Because the p-value is a bit larger than 0.05 we do not have strong evidence of an interaction between playing an instrument and stimulus type on score, but we do have moderate evidence of an interaction.
b.
iii. The model with the interaction term explains somewhat more of the variation in the accuracy scores than the model without the interaction, as is evidenced by the somewhat larger R2 = 0.4468 > 0.3783. The F-statistic decreased, however the degrees of freedom changed so you can’t compare it directly with the other F-statistic. Because the p-value was so small, less than 0.0001, we could not see whether that changed.
i. R2 = 0.0255, F = 0.371, p-value = 0.5885; ii. After adjusting for playing an instrument, there is no evidence of an association between stimulus type and accuracy (p-value = 0.346);
iv.
\[
\text{predicted accuracy score} = 0.84 +
\begin{cases} -0.23 & \text{if male} \\ 0.23 & \text{if female} \end{cases}
+ \begin{cases} -0.73 & \text{if above avg. sleep} \\ 0.73 & \text{if below avg. sleep} \end{cases}
+ \begin{cases} 0.34 & \text{if male, above avg. sleep} \\ -0.34 & \text{if female, above avg. sleep} \\ 0.34 & \text{if female, below avg. sleep} \\ -0.34 & \text{if male, below avg. sleep} \end{cases}
\]
where SE of residuals = 0.99.
iii. After adjusting for stimulus type, there is no evidence of an association between playing an instrument and accuracy (p-value = 0.643). b.
i. R2 = 0.1114, F = 1.672, p-value = 0.1884; ii. There is moderate evidence of an interaction of stimulus type and playing an instrument, because the p-value = 0.056; iii. The model with the interaction explains more of the variation in accuracy than the model without the interaction, as is evidenced by the larger R2 for the model with the interaction (0.1114 > 0.0255), F, and a smaller p-value.
3.4.15 a.
i.
i. R2 = 0.4468, F = 12.117, p-value < 0.0001. ii. There is strong evidence of an interaction because the corresponding p-value is quite small (0.0226).
3.4.14 a.
i. R2 = 0.378, F = 13.996, p-value < 0.0001.
3.4.17 a. The explanatory variables are a person’s sex and their genotype. Both of these are categorical. The response variable is the gratitude score which is quantitative. b. There are two levels for a person’s sex and three levels for genotype. c.
Observed variation in: Gratitude score
Inclusion criteria
• Students at a midwestern college
Design
• The gratitude questionnaire
• Had an offense perpetrated against them
Sources of explained variation
• Sex
• Genotype (AA, AG, GG)
Sources of unexplained variation
• The student’s upbringing
• Past episodes of being offended
• How a student feels on the day they fill out the gratitude questionnaire
• Cultural influences
[Interaction plot for 3.4.13a: score mean by sleep (Below, Above), with lines for Male and Female; marked differences of 1.14 and −0.21.]
3.4.13 a. ii. The average score for the females with below average sleep was 2.14 higher than for those with above average sleep (2.14 vs. 0.0), whereas the average score for males with below average sleep was only 0.79 higher than for those with above average sleep (1.0 vs. 0.21).
d. Yes, the p-value is 0.0006; average gratitude score was significantly higher for females than males. The R2 = 0.062.
e. Yes, the p-value is 0.0173 and R2 = 0.043. Only the AA and GG genotypes had significantly different average gratitude scores from each other. f. Yes, there is strong evidence that genotype is a significant predictor (p-value = 0.0099) and strong evidence that sex is a significant predictor (p-value = 0.0003); overall, R2 = 0.1072. g.
e. No, the p-value is 0.1131. R2 = 0.0232. f. Yes, there is moderate evidence that genotype is a significant predictor (p-value = 0.0506) and strong evidence that sex is a significant predictor (p-value < 0.0001); overall, R2 = 0.1872. g.
ii. No, the lines are fairly close to parallel so there does not appear to be an interaction between gender and genotype.
i. See the interaction plot; ii. No, the lines are close to parallel so there does not appear to be an interaction between gender and genotype; iii. The female line is consistently above the male line; iv. Both lines slope up as you go from genotype AA to AG to GG.
h. The interaction should not be part of the model because the p-value for the interaction is 0.982. Also, adding the interaction to the model doesn’t noticeably change the R2; it is 0.1074.
iv. Both lines slope up as you go from genotype AA to AG to GG. h. The interaction should not be part of the model because the p-value for the interaction is 0.8128. Also, adding the interaction to the model doesn’t change the R2 much; it increases slightly to 0.1890.
[Interaction plot: mean gratitude by genotype (AA, AG, GG), with lines for Female and Male.]
iii. The female line is consistently above the male line.
i. See plot for Solution 3.4.19g.i.
[Interaction plot: empathy mean by genotype (AA, AG, GG), with lines by sex (Female, Male).]
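3.4.18
\[
\text{predicted gratitude score} = 37.082 +
\begin{cases} -1.437 & \text{if AA} \\ 0.138 & \text{if AG} \\ 1.298 & \text{if GG} \end{cases}
+ \begin{cases} 1.098 & \text{if female} \\ -1.098 & \text{if male} \end{cases}
\]
SE of residuals = 4.045.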
3.4.19 a. The explanatory variables are a person’s sex and genotype. Both of these are categorical. The response is the empathy score which is quantitative. b. There are two levels for a person’s sex and three for genotype. c.
Observed variation in: Empathy score
Inclusion criteria
• Students at a midwestern college
Design
• The gratitude questionnaire
• Had an offense perpetrated against them?
Sources of explained variation
• Sex
• Genotype (AA, AG, GG)
Sources of unexplained variation
• The student’s upbringing
• Past episodes of being offended
• How a student feels on the day they fill out the gratitude questionnaire
• Cultural influences
d. Yes, the p-value is less than 0.0001; average empathy score was significantly higher for females than males. R2 = 0.1605.
Solution 3.4.19g.i
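3.4.20
\[
\text{predicted empathy score} = 52.25 +
\begin{cases} -2.01 & \text{if AA} \\ 0.126 & \text{if AG} \\ 1.88 & \text{if GG} \end{cases}
+ \begin{cases} -3.22 & \text{if male} \\ 3.22 & \text{if female} \end{cases}
\]
SE of residuals = 7.22.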
End of Chapter 3 Exercises 3.CE.1. a. This is a randomized experiment; being a randomized experiment will allow us to (potentially) conclude causation. b. Explanatory variables: level of physical exercise, and whether or not they were told that exercise would have harmful outcomes; Response variable: airflow resistance (measured in cmH2O/liter/second). c. There were four treatments: (40 W, not harmful), (80 W, not harmful), (40 W, harmful), and (80 W, harmful). d. It was important to keep the cycling time the same for all participants so that amount of time spent cycling would not be a source of variation in airflow resistance. e.
Source                                       DF
Physical exercise level                      2 − 1 = 1
Expected outcome                             2 − 1 = 1
Physical exercise level × Expected outcome   1
Error                                        28
Total                                        32 − 1 = 31
3.CE.2.
e. 19.54 cmH2O/L/sec.
a. The increase/decrease in airflow resistance from cycling at 40 watts to cycling at 80 watts depends on whether a person is told that there will be harmful outcomes.
f. 4.31 cmH2O/L/sec. g. (−0.80, 7.83) cmH2O/L/sec. 3.CE.3
b. See the interaction plot.
a. This study is a randomized experiment. b. The purpose of randomizing the run order and randomly assigning oranges to treatments is to eliminate run order as a possible source of variation, and orange characteristics (size, age, etc.) as a possible source of variation/confounding variable.
c. Explanatory variables: preparation method, and microwave time; Response variable: Amount of juice extracted (mL).
d. There are 6 treatments in this study: (rolled, 0 seconds) (rolled, 15 seconds) (rolled, 30 seconds) (not rolled, 0 seconds) (not rolled, 15 seconds) (not rolled, 30 seconds).
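[Plots for 3.CE.2: interaction plot of mean airflow resistance (cmH2O/liter/second) by ergo load (40 W, 80 W), with lines for expected harm (No, Yes); histogram of residuals; residuals versus fitted values.]
d.
\[
\text{predicted airflow resistance} = 17.82 +
\begin{cases} 2.73 & \text{if load} = 40\ \text{W} \\ -2.73 & \text{if load} = 80\ \text{W} \end{cases}
+ \begin{cases} 6.29 & \text{if expect no harm} \\ -6.29 & \text{if expect harm} \end{cases}
+ \begin{cases} -5.28 & \text{if 40, no harm} \\ 5.28 & \text{if 80, no harm} \\ 5.28 & \text{if 40, harm} \\ -5.28 & \text{if 80, harm} \end{cases}
\]
where SE of residuals = 5.95 cmH2O/L/sec.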
3.CE.2 c. H0: There is no interaction between level of physical exercise and outcome expectations; Ha: There is an interaction between level of physical exercise and outcome expectations; F = 25.187 and p-value < 0.0001. There is very strong evidence of an interaction, so do not remove the term from the model; a histogram of the residuals shows enough skewness to raise concerns about the residuals not being normally distributed, and there seems to be one unusually large residual in the residual plot.
3.CE.3 e. See the following table.
Source           DF
Prep method      2 − 1 = 1
Microwave time   3 − 1 = 2
Method × Time    2
Error            12
Total            18 − 1 = 17
3.CE.4
a. The increase/decrease in amount of juice extracted when the orange is rolled compared to not rolled depends on how much time the orange was microwaved.
b. See the following graph.
[Interaction plot for 3.CE.4b: mean amount of juice extracted (mL) by rolled (No, Yes), with one line per microwave time (0, 15, 30 sec).]
c. H0: There is no underlying interaction between prep method and microwave time versus Ha: There is an interaction between prep method and microwave time; F = 4.26 and p-value = 0.0401. There is strong evidence of an interaction, so do not remove the term from the model; a histogram of the residuals appears to have a fairly normal distribution, so there are no concerns about the residuals not being normally distributed, and the standard deviations appear to be similar for the different treatments. See graphs for Solution 3.CE.4. A histogram of the residuals is reasonably normal so we may consider this validity condition met. However, the spread of the residuals is not quite the same at all conditions, so we might not consider the equal variance condition to be met. d. The mean amount of juice is highest when an orange has not been rolled and not been microwaved, though it is not significantly higher than when rolled and microwaved for either 15 or 30 seconds.
[Residual plots: histogram of residuals (frequency) and residuals versus fitted values.]
Solution 3.CE.4c
3.CE.5. a. Yes, blocks were used; each student was a block. b. Randomizing the run order of the treatments for each student attempts to eliminate run order or time as a potential confounding variable. c. Explanatory variables: whether typing on laptop or phone, and type of distraction (music, conversation, none), Response variable: number of characters correctly typed in a minute. d. There were six treatments: (laptop, no distraction), (phone, no distraction), (laptop, music), (phone, music), (laptop, conversation), and (phone, conversation). e. Some controlled sources of variation were that all participants used same phone, same laptop, same headphones, same radio station, and same person for conversation. f.
Source                 DF
Person                 7 − 1 = 6
Device                 2 − 1 = 1
Person × Device        6
Distraction            3 − 1 = 2
Person × Distraction   12
Error                  14
Total                  7 × 6 − 1 = 41
g. We cannot include the three-variable interaction of type of distraction, type of device, and person in our analysis because there were not enough replicates. 3.CE.6. a. The change in typing speed and accuracy associated with the type of distraction differs from one person to another. b. The change in typing speed and accuracy associated with the type of device differs from one person to another. c. The change in typing speed and accuracy associated with a combination of distraction and device differs from one person to another. d. See output for Solution 3.CE.6d. Neither the interaction person × device (p-value = 0.0952) nor person × distraction (p-value = 0.4077) is significant at a 5% significance level. We are justified in removing these interactions from our analysis.
e. The residual plots show no reason to be concerned about the residuals not being from a normal distribution nor about the standard deviations being different for the various treatments.
3.CE.7
a. Explanatory variables: removed contact point (dominant hand, non-dominant hand, dominant foot, non-dominant foot), and difficulty of climb (easy or hard), which is based on the Hueco Bouldering Scale; Response variable: time to complete climb (seconds).
b. There were eight treatments: (dominant hand removed, easy), (dominant hand removed, hard), (non-dominant hand removed, easy), (non-dominant hand removed, hard), (dominant foot removed, easy), (dominant foot removed, hard), (non-dominant foot removed, easy), and (non-dominant foot removed, hard).
Source                 DF   Sum of Squares   Mean Squares   F       p-value
Person                 6    107,696.48       17,949.71      20.58   < 0.0001
Device                 1    18,480.02        18,480.02      21.61   0.00056
Person × Device        6    12,195.81        2,032.64       2.38    0.0952
Distraction            2    9,439.19         4,719.60       5.52    0.0200
Person × Distraction   12   11,777.81        981.48         1.15    0.4077
Error                  14   12,209.67        872.12
Total                  41   171,798.98
Solution 3.CE.6d
3.CE.10.
a. See the following graph. Whereas the difference in mean bury times seems to be similar between males and females when the sand is either dry or barely saturated, the difference in mean bury times seems to be much smaller between males and females when the sand is wet. There may be a reason to suspect an interaction.
c. Answers may vary depending on whether or not the interactions with the person block are of interest. The following table shows the result when those interactions are not of interest.
Source                  DF
Person                  3 − 1 = 2
Difficulty              2 − 1 = 1
Handicap                4 − 1 = 3
Difficulty × Handicap   3
Error                   14
Total                   23
3.CE.8
a. The change in climb time associated with the type of handicap differs depending on difficulty of climb.
b. See output for Solution 3.CE.8b. There is strong evidence (p-value = 0.0014) of an interaction between difficulty and handicap. We should retain the interaction term in our model. Person was also a significant source of variation (p-value = 0.0002).
c. The histogram of the residuals causes some concern that the residuals may not be from a normal distribution, and the residual plot shows a curved pattern.
3.CE.9. a. Explanatory variable: sand wetness level (dry, medium, wet); response variable: time to bury (seconds). b. There were three treatments: dry sand, barely saturated sand, and oversaturated sand. c. Sex of the crab serves as a blocking variable, and five crabs of each sex were assigned to a treatment; this will allow us to test for an interaction of the explanatory variable and the blocking variable. d. There is an interaction between sand wetness and sex of the crab if the change(s) in time to bury from one level of sand wetness to another are different between the male and the female crabs.
[Interaction plot for 3.CE.10a: mean bury time (s) by sand type (Dry, Medium, Wet), with lines for Female and Male.]
3.CE.10 b. The p-value for the interaction term is 0.5817, providing no evidence for an interaction of sex of crab and sand wetness. It is okay to drop the interaction term from the model.
c. In the additive model the main effect of sand wetness (p-value = 0.0039) and the main effect of sex of crab (p-value = 0.0133) are significant.
d. Because both main effects are significant, we can look at pairwise comparisons for both. We can be 95% confident that the mean bury time is higher for similar female sand crabs compared to similar male sand crabs by between 0.41 and 3.29 seconds. Also, we found that sand crabs took a significantly higher average time to bury in dry sand, but the mean bury times for the oversaturated and barely saturated sand were not significantly different from each other.
e.
Source               DF
Sand wetness         3 − 1 = 2
Sex                  2 − 1 = 1
Sand wetness × Sex   2
Error                24
Total                30 − 1 = 29
e.
where SE of residuals = 2.12 seconds. f. The histogram of the residuals causes some concern that the residuals may not be from a normal distribution, and the residual plot shows (except for one large residual for a small predicted bury time)
Sums of squares
Mean squares
F
p-value
Person
222.27
111.14
16.32
0.0002
Difficulty
834.14
834.14
122.48
< 0.0001
Handicap
147.52
49.17
7.22
0.0037
Difficulty × Handicap
184.63
61.54
9.04
0.0014
6.81
Source
DF
Error
14
95.35
Total
23
1,483.91
Solution 3.CE.8b
c03InstructorSolutions.indd 37
16/10/20 7:45 PM
C HA PTER 3
6 5 Female
4 3
Multi-factor Studies and Interactions
Male
2
the difference in means between study methods is 10 (for small group Wet + individual and individual) and −5 (for study session + individual and small group + individual).
1 about the variability changsome fanning which might raise concerns Dry Medium ing across the different treatment groups. Sand type 12
5 4 3 2
9 Female
6 3
Male
0
–2
1 Dry
b. See the following graph.
6 Residual
6 Frequency
Mean bury time (s)
7
0
Medium
2
Wet
4
4
100
2
90
0
80
–2
70 2
6
Residuals
Sand type
3
60
4 5 50 Fitted 40 value
6
30
6 Residual
Mean score
38
Mean bury time (s)
7
4
20
TL
10
LBD
0
2
Ind
0
SG + Ind
SS + Ind
Study method
–2 2
4
2
6
Residuals
3
4
5
c. See the table.
3.CE.11 These data show no evidence of an interaction, because for both the traditional lecture and learn by doing teaching methods, the difference in means between study methods is 10 (for small group + individual and individual) and −5 (for study session + individual and small group + individual). 3.CE.12 a. Yes, now there is evidence of an interaction, because for the traditional lecture the difference in means between study methods is 25 (for small group + individual and individual) and −5 (for study session + individual and small group + individual) but for learn by doing
Source              DF
Study method        3 − 1 = 2
Teaching method     2 − 1 = 1
Study × Teaching    2
Error               114
Total               120 − 1 = 119
3.CE.13. a. There would be no interaction present if the “Pygmalion effect” is the same in each company. b. See the graph in Solution 3.CE.13b.
[Graph: score (55 to 95) under the Control and Pygmalion treatments, with one line per company (Comp1 to Comp10)]
Solution 3.CE.13b
3.CE.14 a. The interaction is not significant (p-value = 1.0). b. The interaction is statistically significant (p-value = 0.0145); we have the same bonus and crew values, but we changed the crew productivity values so that now the effect of bonus depends on whether crew is 4 or 6. Crew and Bonus are still not associated but because of the changes in the productivity values, they do interact. c. The interaction is not statistically significant (p-value = 0.876). Here the variables don’t interact, but they are associated because we put more observations in just one of the cells. So now the adjusted estimates of the group effects will differ from the unadjusted effects.
Chapter 3 Investigation 1. Each student is an experimental unit. 2. This was an experiment; the advantage of the experimental design is that it allows for potentially concluding a cause-and-effect relationship between type of curriculum and science score. 3. Inclusion criteria: students from the third, fourth, and fifth grade classes of the teachers who volunteered to participate in the study, from seven elementary schools. 4. a. All students were subject to the same curriculum guidelines (as appropriate for their grade level), same grade-appropriate test given at the end of the school year; all tests were graded electronically, and raw test scores were converted to a 100-point scale. b. The students were blocked by grade levels—so similar tests, curriculum guidelines applied within each grade level. c. Instructor differences, class/section differences, time of class meeting differences, other unknown sources. d. Possible Sources of Variation diagram.
Observed variation in: Science score
Inclusion criteria
• Students whose teachers agreed to participate
Design
• Same (grade-appropriate) curriculum guidelines and tests
Sources of explained variation
• Curriculum
• Grade level
Sources of unexplained variation
• Instructor differences
• Class/section differences
• Time of class meeting differences
• Unknown
5. Overall mean = 51.65 points, Overall SD = 18.99 points, SSTotal = 232,947.9, predicted score = 51.65 points, SE of residuals = 18.99 points. 6. The change in average score from one type of curriculum to another type of curriculum is different for at least one of the grade levels.
7. See the following graph.
c. H0: There is no interaction between company and treatment vs. Ha: There is an interaction between company and treatment. With an F-ratio of 0.67 and a p-value of 0.722 > 0.05, we do not have convincing evidence of a genuine interaction between treatment and company. It would be reasonable to remove the interaction from the analysis (we won’t lose a significant amount of R²).
[Graph: mean science score (45 to 60) by treatment (Garden, Control), with one line per grade (Third, Fourth, Fifth)]
8. Yes, there is evidence of an interaction because it appears that for both the third and fifth grades, the mean score is higher with the garden curriculum but for the fourth grade the mean score is higher for the traditional curriculum compared to the garden curriculum. 9. Two-variable ANOVA table without interaction:
Source         DF    Sum of Squares   Mean Squares   F       p-value
Curriculum     1     5308.03          5308.03        15.61   < 0.0001
Grade level    2     7089.05          3544.53        10.43   < 0.0001
Error          643   218,617.61       340.00
Total          646   232,947.90
10. Two-variable ANOVA table with interaction:
Source         DF    Sum of Squares   Mean Squares   F       p-value
Curriculum     1     4926.99          4926.99        14.79   0.0001
Grade level    2     4402.86          2201.43        6.61    0.0014
Interaction    2     5128.60          2564.30        7.70    0.0005
Error          641   213,489.01       333.06
Total          646   232,947.90
11. See the following table.
                              Model with interaction   Model without interaction   Comparison
SSTotal                       232,947.9                232,947.9                   Same
SSError                       213,489.01               218,617.61                  Larger without interaction
SSModel                       19,458.9                 12,397.08                   Smaller without interaction
R² pertaining to the model    0.0835                   0.0615                      Smaller without interaction
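The R² values in the comparison table can be reproduced from SSTotal and SSError alone. The following is a minimal sketch using R² = 1 − SSError/SSTotal; the function and variable names are ours, chosen for illustration.

```python
# SSTotal and SSError as reported in the two ANOVA tables above.
ss_total = 232_947.90
ss_error = {
    "with interaction": 213_489.01,
    "without interaction": 218_617.61,
}


def r_squared(sse, sst):
    """Proportion of total variation explained: 1 - SSError / SSTotal."""
    return 1 - sse / sst


r2 = {model: r_squared(sse, ss_total) for model, sse in ss_error.items()}

for model, value in r2.items():
    print(f"R^2 {model}: {value:.4f}")
```

Because SSTotal is the same for both models, the model with the smaller SSError necessarily has the larger R².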
These comparisons imply that the model is more useful with the interaction. 12. The two-variable model with the interaction allows for the effect of curriculum type to differ by grade level and explains more of the variation in the scores than does the model without the interaction. 13. predicted science achievement score
where SE of residuals = 18.25 points. 14. Null hypothesis: There is no interaction between type of curriculum and grade level. Alternative hypothesis: There is an interaction between type of curriculum and grade level. 15. There is strong evidence (F = 7.70, p-value = 0.0005) that there is an interaction between type of curriculum and grade level with regard to their effect on science scores. The distribution of residuals does not exhibit signs of skewness, but the plot of residuals vs. predicted response does show that variability of the response in the fifth-grade garden treatment group somewhat exceeds that of the other treatment groups; we need to be cautious about this. 16. We are 95% confident that the mean science score for fifth grade students using the garden curriculum exceeds that of third grade students using the traditional curriculum by an amount that is between 9.52 and 23.9 points.
17. It appears that fifth-grade students using the garden curriculum scored highest on average, significantly higher than all other groups except for fourth-grade students using the traditional curriculum. Even though the fourth-grade traditional curriculum group has a slightly larger mean than the fourth-grade garden group, it has a much smaller sample size than the latter, so the fourth-grade traditional mean is not significantly different from the third-grade traditional mean (they share the letter C). However, because the fourth-grade garden curriculum group has a much larger sample size, its mean is significantly different from that of the third-grade traditional curriculum group. 18. We are 95% confident that the average score for third graders using the gardening curriculum will be between 45.25 and 52.89 points. 19. A 95% prediction interval will be wider because the margin of error increases when we try to predict how an individual will perform, instead of what the average for the population will be. To interpret the prediction interval, we will use language such as, “We are 95% confident that the score for a third grader using the gardening curriculum will be between …points.” 20. Based on the small p-value and random assignment from this quasi-experiment, we have strong evidence that there is an interaction effect of type of curriculum and grade level on the science scores of elementary school students like those in our study. We can conclude causation because the type of curriculum was randomly assigned, and we can generalize to other students from schools similar to these schools in Texas, with similar volunteer teachers. It appears that the benefit of the garden curriculum is largest for fifth graders and smallest for fourth graders.
The 95% confidence intervals comparing mean science scores between the two curricula (garden and traditional) within each grade show that the mean was statistically significantly higher for the garden curriculum in both the fifth and third grades, but not in the fourth grade. 21. Answers may vary; randomly assign students, instead of classrooms, to curriculum type to better isolate the effects of curriculum type. Account for incoming GPA, whether the student is male or female, etc. See whether the garden curriculum is effective at lower grades in the elementary schools. Possibly look at other nontraditional curricula.
CHAPTER 4
Including a Quantitative Explanatory Variable Section 4.1 4.1.1 C. 4.1.2 C. 4.1.3 D. 4.1.4 D. 4.1.5 A. 4.1.6 B. 4.1.7 C.
4.1.8 a. Yes, a 62" tall female has a predicted magical height of 64.4". b. No, a 72" tall female has a predicted magical height of 69.8". c. To find out where the switch happens, set the predicted magical height equal to the actual height. Solving x = 30.89 + 0.5408x, we get x = 67.27".
4.1.9 Because there is no variation in the residuals (SE = 0), the regression line must perfectly fit the data. This means all the variation in the response is explained by the linear relationship with the explanatory variable, so R² must equal 1.00; that is, 100% of the variation in the response is explained by the model.
4.1.10 a. Points B and D have positive residuals. b. Points A, C, and E have negative residuals. c. Point D has the largest residual. d. Point E has the residual closest to 0.
4.1.11 a. An increase in foot length of 1 centimeter predicts an increase in height of 2.96 centimeters. b. The residual for this person is 168 − (95.20 + 2.96(24)) = 1.76 cm. The least squares regression line underestimates the individual’s height by 1.76 cm.
4.1.12 a. They are extrapolating, and the same linear relationships most likely won’t continue into the future in the same way. b. Solving, we get 2634.48, so the year would be 2636 (the next multiple of 4).
4.1.13 Because the association between the explanatory and response variables in graph A is not linear, a separate-means model would be best. Because the association in graph B is linear, a regression model would be best.
4.1.14 a. i. 48.41. ii. 16.00. b. i. 18: 44.62, 19: 45.45, 20: 49.12, 21: 57.00. ii. 15.68; this is a bit of improvement over the single-means model. iii. R² = 0.0733, so 7.33%.
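The arithmetic in Solutions 4.1.8 and 4.1.11b can be checked with a few lines of code. This is a minimal sketch that uses only the coefficients quoted in those solutions; the function names are ours.

```python
# Solution 4.1.8: predicted magical height = 30.89 + 0.5408(height), in inches.
INTERCEPT, SLOPE = 30.89, 0.5408


def predicted_magical_height(height_in):
    """Predicted magical height (inches) for a given actual height (inches)."""
    return INTERCEPT + SLOPE * height_in


# Switch point: predicted equals actual, so x = 30.89 + 0.5408x,
# which rearranges to x = intercept / (1 - slope).
crossover = INTERCEPT / (1 - SLOPE)


def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted


# Solution 4.1.11b: observed height 168 cm, foot length 24 cm.
foot_residual = residual(168, 95.20 + 2.96 * 24)

print(f"62 in -> {predicted_magical_height(62):.1f} in")
print(f"72 in -> {predicted_magical_height(72):.1f} in")
print(f"switch point: {crossover:.2f} in")
print(f"residual: {foot_residual:.2f} cm")
```

A positive residual means the observed value lies above the line, which is why the line underestimates this person's height.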
4.1.15 a. predicted left % = −33.31 + 4.19(age). b. 18: 42.11, 19: 46.30, 20: 50.49, 21: 54.68; the regression predictions are a bit low for the 18- and 21-year-olds and a little high for the other two groups. c. 15.58. d. 6.26%. 4.1.16 a. This is an observational study. b. Explanatory variables: height and diameter (both are quantitative); response variable: volume (quantitative). c. Mean = 19.58, SD = 5.93; the single-mean model is predicted volume = 19.58 ft³, SE of residuals = 5.93 ft³. d. e. predicted volume = −22.06 + 3.77(diameter), SE of residuals = 2.895, R² = 0.776.
f. Diameter is a better predictor than height because the SE of residuals is smaller and the R² is larger. 4.1.17 a. This is an experiment because the treatments were randomly assigned.
b. Mean number of seeds = 57.17, SD = 19.24; single-mean model is predicted seeds germinated = 57.17, SE of residuals = 19.24.
41
c04InstructorSolutions.indd 41
16/10/20 7:56 PM
Model            Low (1)   LowMed (2)   Med (3)   MedHigh (4)   High (5)   Residual SE
Actual means     33.50     60.63        71.38     65.00         55.38
Single-mean      57.17     57.17        57.17     57.17         57.17      19.24
Separate-means   33.50     60.63        71.38     65.00         55.38      14.86
Regression       47.55     52.36        57.17     61.98         66.79      18.20
Solution 4.1.17d.ii

Regression Model
Source   DF   Sums of squares
Model    1    1,852.81
Error    38   12,588.96
Total    39   14,441.78

Separate-Means Model
Source   DF   Sums of squares
Model    4    6,708.15
Error    35   7,733.63
Total    39   14,441.78
Solution 4.1.17e
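The R² values and residual standard errors quoted for Exercise 4.1.17 follow directly from the two sums-of-squares tables. Here is a small sketch of that calculation, using R² = SSModel/SSTotal and SE of residuals = √(SSError/DF error); the dictionary layout and names are ours.

```python
import math

# Values from the Solution 4.1.17e tables.
ss_total = 14441.78
models = {
    "regression": {"ss_model": 1852.81, "ss_error": 12588.96, "df_error": 38},
    "separate-means": {"ss_model": 6708.15, "ss_error": 7733.63, "df_error": 35},
}

summary = {}
for name, m in models.items():
    r2 = m["ss_model"] / ss_total                    # proportion of variation explained
    se = math.sqrt(m["ss_error"] / m["df_error"])    # SE of residuals
    summary[name] = (r2, se)
    print(f"{name}: R^2 = {r2:.3f}, SE of residuals = {se:.2f}")
```

Both measures point the same way here: the separate-means model has the larger R² and the smaller SE of residuals.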
c.
i. Low: mean = 33.50, SD = 10.06; LowMed: mean = 60.63, SD = 17.38; Med: mean = 71.38, SD = 9.35; MedHigh: mean = 65.00, SD = 15.62; High: mean = 55.38, SD = 19.24. There appears to be an association between water level and number of seeds germinated because the means are quite different from each other.
The separate-means model is a better predictor than the single-means model because the SE of residuals is smaller. d.
i. SE residuals = 18.20. ii. See table for Solution 4.1.17d.ii. The separate-means model seems to be the best because it has the smallest standard error of residuals.
e. See Solution 4.1.17e. The separate-means model seems to be the best because it has the smaller SSError and the larger R² (0.464 compared to 0.128). f. A linear regression model is not appropriate for these data because the relationship is not linear: as the water amount increases, the number of seeds that germinated increases and then decreases. 4.1.18 a. Yes, a linear regression model seems appropriate because there appears to be a linear relationship between body temperature and social warmth. b. c. The slope is 0.4612, which means that as temperature increases by 1 degree, the social warmth score is predicted to increase by 0.4612 points. d. R² = 0.121. e. 4.3. f. 4.4 − 4.3 = 0.1, which means the actual score of 4.4 is 0.1 points higher than predicted.
b. c. The slope is 0.2339 which means as the DAPIQ score increases by 1, the WPPSI score is predicted to increase by 0.2339. d. R2 = 0.090 so just 9%. e. 102.8. f. 82 − 102.8 = −20.8, which means the actual score is 20.8 points below the predicted score. 4.1.20 a. Yes, a linear regression model is appropriate because there appears to be a strong linear relationship between the frequency of a cricket’s chirps and outside temperature. ˆ b. c. The slope is 4.31 which means that as the temperature increases by 1°F the chirps per minute are predicted to increase by 4.31. d. R2 = 0.961. e. The regression equation changes to 2 R stays the same at 0.961.
Section 4.2 4.2.1 A. 4.2.2 C. 4.2.3 C. 4.2.4 A. 4.2.5 D. 4.2.6 B. 4.2.7 B. 4.2.8 Write the x-values on 15 different slips of paper and the y-values on 15 different slips of paper. Lay the 15 slips with the x-values in a line on a flat surface. Shuffle the 15 slips with the y-values and deal one out to each of the 15 x-value slips. Calculate the least-squares regression equation for these 15 shuffled pairs of x-values and y-values and record the slope. Repeat this procedure 999 more times to get 1,000 simulated slopes, which can be plotted to construct the simulated null distribution.
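The slip-shuffling procedure in Solution 4.2.8 can be sketched in code. This is only an illustration: the 15 data pairs are hypothetical values invented for the example (the exercise's data are not reproduced here), and the function names and the fixed seed are our own choices.

```python
import random


def ls_slope(xs, ys):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_null_slopes(xs, ys, reps=1000, seed=0):
    """Shuffle the y-values, re-pair them with the fixed x-values, record each slope."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    shuffled = list(ys)
    slopes = []
    for _ in range(reps):
        rng.shuffle(shuffled)              # "shuffle the y slips and deal them out"
        slopes.append(ls_slope(xs, shuffled))
    return slopes


# Hypothetical data: 15 (x, y) pairs made up for illustration only.
xs = list(range(1, 16))
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.7, 11.2, 12.1, 12.8, 14.2, 15.1]

observed = ls_slope(xs, ys)
null_slopes = simulate_null_slopes(xs, ys)

# Two-sided simulation-based p-value: share of shuffled slopes at least as extreme.
p_value = sum(abs(s) >= abs(observed) for s in null_slopes) / len(null_slopes)
print(f"observed slope = {observed:.3f}, simulated p-value = {p_value:.3f}")
```

Shuffling breaks any link between x and y, so the simulated slopes center near 0 and show how large a slope could arise by chance alone.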
4.1.19
4.2.9 The F-statistic is the square of the t-statistic, that is, F = t2.
a. Yes, a linear regression model seems appropriate because there appears to be a linear relationship between WPPSI and DAPIQ.
4.2.10
a.
[Graphs: histogram of residuals (mean = 0.00, SE = 0.384, df = 52) and plot of residuals vs. predicted values]
Solution 4.2.14a ˆ b.
b. p-value < 0.0001. c. For each one-inch increase in actual height, magical height is predicted to increase by 0.55 inches. 4.2.11 ˆ a.
c.
H0: There is no association between physical body temperature and social warmth. Ha: There is a linear association between physical body temperature and social warmth. ii. iii. (Both p-values should be the same.).
b. 0.3052.
iv.
c. ½(0.3052) = 0.1526. d. 1 − 0.1526 = 0.8474. 4.2.12
4.2.14
a. Because the response variable is the same in both cases, the SSTotal, which quantifies the variation in the response variable, stays the same, regardless of the model.
a. See graphs for Solution 4.2.14a. Because the graph of the residuals is fairly symmetric with no large outliers, and the graph of the residuals vs. the predicted values does not show any strong evidence of curvature or patterns and has a fairly constant width, the validity conditions are met.
b. For the separate-means model, the sum of squares is the sum of the squared distance each response value is from its group mean. For the regression model, the sum of squares is the sum of the squared distance each response value is from its predicted value using the regression equation. The regression model gives the larger sum of squares error. c. For the separate-means model, R² = 1,594.86/21,762.76 = 0.0733. For the regression model, R² = 1,362.33/21,762.76 = 0.0626. The separate-means model gives the larger R². d. The model degrees of freedom are smaller for the regression model (1 DF compared to 3 DF for the separate-means model). Therefore, a fairly similar model sum of squares is divided by only 1 (compared to 3), so the mean square for the regression model is much larger and therefore the F-statistic is much larger. 4.2.13 a. Yes, from the scatterplot, there appears to be a linear relationship between body temperature and warmth score.
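The comparison in Solution 4.2.12c and d can be made concrete with a short calculation. This is a sketch under a stated assumption: the sample size is not given in the solution, so n = 86 is assumed here (back-calculated from SSTotal and a response standard deviation of about 16); only the R² values are quoted in the solution itself.

```python
# Quantities quoted in Solution 4.2.12c.
ss_total = 21762.76
models = {
    "separate-means": {"ss_model": 1594.86, "df_model": 3},
    "regression": {"ss_model": 1362.33, "df_model": 1},
}

# ASSUMPTION: n is not stated in the solution; 86 is inferred for illustration.
n = 86


def summarize(ss_model, df_model):
    """Return (R^2, F) where F = (SSModel / DF model) / (SSError / DF error)."""
    ss_error = ss_total - ss_model
    df_error = n - 1 - df_model
    r2 = ss_model / ss_total
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return r2, f_stat


results = {name: summarize(**m) for name, m in models.items()}

for name, (r2, f_stat) in results.items():
    print(f"{name}: R^2 = {r2:.4f}, F = {f_stat:.2f}")
```

Under this assumption the regression F-statistic is more than twice the separate-means F-statistic, even though its R² is slightly smaller, because its model sum of squares is divided by 1 rather than 3.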
b. H0: β1 = 0, Ha: β1 ≠ 0; b1 = 0.4612, t = 2.68, p-value = 0.0099. Based on the very small p-value, we have strong evidence of a (positive) linear association between physical warmth and social warmth. c. A 95% confidence interval for the population slope is (0.1153, 0.8071). As physical body temperature increases by 1 degree, the social warmth score is predicted to increase by between 0.1153 and 0.8071 points. d. Yes, because the confidence interval does not contain 0, 0 is not a plausible value for the population slope, which agrees with the conclusion that there is a linear association between physical warmth and social warmth obtained using the p-value. 4.2.15 a. See graph below. A line fits okay, but points tend to start out below the line and as you move to the right they tend to be above the line, a curved model rather than a linear model might work better.
[Scatterplots: warmth score vs. temperature (ºC), and PIT vs. AQ]
c.
i. H0: There is no association between PIT and AQ. Ha: There is a linear association between PIT and AQ.
c. From the theory-based regression table, the SE = 0.54 and t = 3.24. The regression table SE is a bit smaller making the t-statistic a bit larger than the values found through shuffling.
ii.
4.2.18
iii. (both p-values should be the same).
a. See graphs for Solution 4.2.18a. Because the graph of the residuals is fairly symmetric with no large outliers and the graph of the residuals vs. the predicted values does not show any strong evidence of curvature or patterns and has a fairly constant width, the validity conditions are met.
b. predicted PIT = 8.43 + 1.75(AQ).
Based on the very small p-value, there is strong evidence of a linear association between PIT and AQ. Based on the positive slope, the association is positive. 4.2.16
b. H0: β1 = 0, Ha: β1 ≠ 0; b1 = 0.2339, t = 3.10, p-value = 0.0025. Based on the very small p-value, we have very strong evidence of a (positive) linear association between WPPSI scores and DAPIQ.
a. See graphs for Solution 4.2.16a. The graph of the residuals is fairly symmetrical with no large outliers. The graph of the residuals vs. the predicted values has a fairly constant width; however there appears to be a downward trend of the points so the validity condition associated with linearity may not be met.
c. A 95% confidence interval for the population slope is (0.0844, 0.3834). We are 95% confident that as DAPIQ increases by 1, the mean WPPSI score increases by between 0.0844 and 0.3834 points.
b. H0: β1 = 0, Ha: β1 ≠ 0; b1 = 1.75, t = 3.24, p-value = 0.0028. Based on the very small p-value, we have very strong evidence of a (positive) linear association between PIT and AQ.
d. Yes, because the confidence interval does not contain 0, 0 is not a plausible value for the population slope which agrees with the conclusion obtained using the p-value.
c. A 95% confidence interval for the slope of the population regression line is (0.6508, 2.8455). As AQ increases by one point, the PIT is predicted to increase by between 0.6508 and 2.8455 points.
4.2.19 a. See graphs for Solution 4.2.19a. Because the graph of the residuals is fairly symmetrical with no large outliers and the graph of the residuals vs. the predicted values does not show any strong evidence of curvature or patterns and has a fairly constant width, the validity conditions are met.
d. Yes, because the confidence interval does not contain 0, zero is not a plausible value for the population slope which agrees with the conclusion that there is a linear association between PIT and AQ that we obtained using the p-value.
b. H0: β1 = 0, Ha: β1 ≠ 0; b1 = 0.7178, t = 5.51, p-value < 0.0001. Based on the very small p-value (and the positive slope), we have very strong evidence of a (positive) linear association between height and volume.
4.2.17 a. SD ≈ 0.614. b. standardized statistic = 1.75/0.614 = 2.85.
[Graphs: histogram of residuals (mean = 0.00, SE = 21.244, df = 32) and plot of residuals vs. predicted values]
Solution 4.2.16a
[Graphs: histogram of residuals (mean = 0.00, SE = 10.981, df = 98) and plot of residuals vs. predicted values]
Solution 4.2.18a
[Graphs: histogram of residuals (mean = 0.00, SE = 3.592, df = 16) and plot of residuals vs. predicted values]
Solution 4.2.19a
c. A 95% confidence interval for the population regression line slope is (0.4572, 0.9784) using 2SD from the applet, or (0.4415, 0.9940) using statistical software. We are 95% confident that as height increases by one foot, the volume increases by between 0.4415 and 0.9940 cubic feet on average.
4.3.5 C.
d. Yes, because the confidence interval does not contain 0, zero is not a plausible value for the population slope which agrees with the conclusion obtained using the p-value.
ˆ b.
4.3.6 C. 4.3.7 B. 4.3.8 ˆ a. 4.3.9
4.2.20
a. ˆ b.
a. See graphs for Solution 4.2.20a. Because the graph of the residuals is fairly symmetrical with no large outliers and the graph of the residuals vs. the predicted values does not show any strong evidence of curvature or patterns and has a fairly constant width, the validity conditions are met.
4.3.10 a. Among high school seniors with the same index finger length, the predicted height for females is 14.44 centimeters less than the predicted height for males.
b. H0: β1 = 0, Ha: β1 ≠ 0; b1 = 3.77, t = 7.44, p-value < 0.0001. Based on the very small p-value (and the positive slope), we have very strong evidence of a (positive) linear association between diameter and volume.
b. As index finger length increases by 1 millimeter, the predicted height increases by 0.1816 centimeters, for male as well as female high school seniors.
c. A 95% confidence interval for the population regression line slope is (2.76, 4.78) using 2SD from applet or (2.70, 4.84) using statistical software. We are 95% confident that as diameter increases by 1 inch, the volume increases by between 2.70 and 4.84 cubic feet on average.
4.3.11 a. predicted score = 6.21 + 0.7125(method), where method = 1 for paper and method = −1 for computer. The intercept of 6.21 is the average score for both methods combined, and the slope coefficient of 0.7125 is how much higher the predicted score is, on average, for the paper method than the overall average score.
d. Yes, because the confidence interval does not contain 0, zero is not a plausible value for the population slope which agrees with the conclusion obtained using the p-value.
4.3.3 B.
b. predicted score = 5.50 + 1.43(method), where method = 1 for paper and method = 0 for computer. The intercept of 5.50 is the average score for the computer method, and the slope coefficient of 1.43 is how much higher the predicted score is, on average, for the paper method than for the computer method.
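The two codings in Solution 4.3.11 describe the same fitted group means. A minimal sketch, using only the intercepts and slope coefficients quoted in parts (a) and (b), shows that both give (up to rounding) the same predictions; the function names are ours.

```python
def predict_effect_coding(method):
    """Part (a): method = +1 for paper, -1 for computer."""
    return 6.21 + 0.7125 * method


def predict_indicator_coding(method):
    """Part (b): method = 1 for paper, 0 for computer."""
    return 5.50 + 1.43 * method


effect = {
    "paper": predict_effect_coding(1),
    "computer": predict_effect_coding(-1),
}
indicator = {
    "paper": predict_indicator_coding(1),
    "computer": predict_indicator_coding(0),
}

for group in ("paper", "computer"):
    print(f"{group}: effect coding {effect[group]:.2f}, "
          f"indicator coding {indicator[group]:.2f}")
```

With ±1 coding the slope is half the gap between the groups (2 × 0.7125 ≈ 1.43), while with 0/1 coding the slope is the whole gap; only the interpretation of the coefficients changes.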
4.3.4 D.
c. 16.8%.
Section 4.3 4.3.1 A. 4.3.2 C.
[Graphs: histogram of residuals (mean = 0.00, SE = 1.086, df = 16) and plot of residuals vs. predicted values]
Solution 4.2.20a
d. Yes, with a p-value of 0.0086 and because this was a randomized experiment, there is strong evidence that the note-taking method has an effect on the quiz score, with the paper method giving higher scores on average than the computer method. 4.3.12 a.
iii.
b.
ˆ i.
ii. 60.795. The slope can be interpreted as follows: as sleep time increases by 1 hour, the predicted reaction time decreases by 10.55 milliseconds. b. A person’s sex does seem to have an association with reaction time because the mean reaction time for females is 366.50 milliseconds while for males it is noticeably lower at 306.29 milliseconds. c. i. predicted adjusted reaction time = 396.88 − 9.33(sleep). ii. The SE of residuals is 53.775 after adjusting for a person’s sex. This is smaller than the residual SE before adjusting.
iv.
i. predicted GPA = 2.82 + 0.0787(sleep hours) + 0.1105(nap), where nap = 1 for didn’t take a nap and nap = 0 for took a nap.
ii.
iii.
4.3.17 a.
i. predicted chirps/sec = 0 + 0.1028(air temp).
ii. iii. For an increase of 1 °C in the air temperature, the chirp rate is predicted to increase by 0.1028 chirps per second. b. i. predicted chirps/sec = −0.4737 + 1.0142(species) + 0.1353(air temp), where species = −1 for Karschi crickets and species = 1 for Capensis crickets.
The slope can be interpreted as follows: after adjusting for a person’s sex, as sleep time increases by 1 hour, the predicted reaction time decreases by 9.33 milliseconds.
ii. R2 = 0.9356. iii.
4.3.13
ˆ a. with sex = 0, for female and sex = 1, for male.
4.3.18
b. R2 = 0.291.
a.
ii. R2 = 0.321.
c. The reaction time for males is predicted to be 58.14 milliseconds faster (smaller) than that for females with the same amount of sleep. d. 354.98 milliseconds. 4.3.14
iii. For each increase of 1 hour of sleep, the score is predicted to decrease by 0.5457 points. b.
a. i. A 72-inch-tall person is predicted to have a taller magical height of 73.70 inches. This is probably not true for both males and females, because tall females (say, 72 inches) probably don’t generally wish to be taller. b.
i.
i. predicted score = 4.48 − 0.5457(sleep).
i. ii. iii.
4.3.19
ii. A 72-inch-tall person is predicted to have a shorter magical height of 70.73 inches. This is probably not true for both males and females; a 72-inch-tall male would probably be much less likely to want to be shorter than would a 72-inch-tall female.
a.
iii.
b.
i. predicted cost = −3.30 + 12.84(carat).
ii. iii. For each increase of 1 carat, the price is predicted to increase by $12,840. i. predicted cost = −2.9346 − 0.5341(clarity) + 12.7369(carat), where clarity = 1 for good and clarity = 0 for excellent.
ii. iii. 4.3.15 a.
c.
ii.
ii.
iii. b.
i. predicted magical height = 32.33 − 4.46(sex) + 0.5904(height), with sex = 1 for female and sex = 0 for male.
ii. iii. The magical height for females is predicted to be 4.46 inches shorter than that for a male of the same actual height. 4.3.16 a.
i. ii.
i. predicted cost = −4.0412 + 1.2624(color) + 13.0625(carat), where color = 1 for clear and color = 0 for yellow. iii. iv. Color results in a larger increase in the R² value.
4.3.20
a. i. predicted cost = −3.30 + 12.84(carat). ii. R² = 0.865. b. R² = 0.872. c.
i. ii. R2 values increased each time a variable was added to the model.
iii. predicted cost = −3.5779 − 0.8087(clarity) + 1.4181(color) + 12.9296(carat), where clarity = 1 for good and 0 for excellent, and color = 1 for clear and 0 for yellow. For the same weight and color, a diamond rated good in terms of clarity is predicted to cost $808.70 less than one rated excellent. For the same weight and clarity, a colorless diamond is predicted to cost $1,418.10 more than a yellow one. For the same color and clarity, for each additional carat a diamond weighs, the cost is predicted to increase by $12,929.60.
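The three interpretations in Solution 4.3.20c.iii can be verified by changing one variable at a time in the fitted equation. This is a minimal sketch; it assumes cost is measured in thousands of dollars (which is what the dollar amounts in the solution imply), and the function and argument names are ours.

```python
def predicted_cost(carat, clarity_good, color_clear):
    """Predicted cost, assumed to be in thousands of dollars.

    clarity_good: 1 for good clarity, 0 for excellent.
    color_clear:  1 for clear (colorless), 0 for yellow.
    """
    return (-3.5779
            - 0.8087 * clarity_good
            + 1.4181 * color_clear
            + 12.9296 * carat)


# Change one variable at a time while holding the other two fixed.
clarity_effect = (predicted_cost(1.0, 1, 0) - predicted_cost(1.0, 0, 0)) * 1000
color_effect = (predicted_cost(1.0, 0, 1) - predicted_cost(1.0, 0, 0)) * 1000
carat_effect = (predicted_cost(2.0, 0, 0) - predicted_cost(1.0, 0, 0)) * 1000

print(f"good vs. excellent clarity: {clarity_effect:+.2f} dollars")
print(f"clear vs. yellow color:     {color_effect:+.2f} dollars")
print(f"one additional carat:       {carat_effect:+.2f} dollars")
```

Because this model has no interaction terms, each difference is the same no matter which values the other two variables are held at.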
Section 4.4 4.4.1 B.
c. predicted GPA = 3.06 − 0.7822(nap) + 0.0430(sleep hours) + 0.1265(nap × sleep hours), SE of residuals = 0.334, R² = 0.104, model p-value = 0.0022. d. The SE of residuals and the p-value decreased and the R² value increased. 4.4.13 predicted GPA = 3.06 − 0.7822(nap) + 0.0430(sleep hours) + 0.1265(nap × sleep hours). Higher GPAs are predicted for those who take naps if they sleep less than about 6.25 hours. However, for those who sleep more than about 6.25 hours, having a nap is associated with a lower predicted GPA. This difference between GPAs continues to grow as sleep hours increase. (To find the 6.25 hours, solve 0.7822 = 0.1265x.)
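The break-even point in Solution 4.4.13 is where the two fitted lines cross. The sketch below solves 0.7822 = 0.1265x with the rounded coefficients from the equation and checks the sign of the gap on either side; with these rounded values the crossing comes out a little below the "about 6.25 hours" quoted in the solution. Function names are ours.

```python
def predicted_gpa(sleep_hours, nap):
    """Fitted interaction model; nap = 1 for no nap, nap = 0 for a nap."""
    return 3.06 - 0.7822 * nap + 0.0430 * sleep_hours + 0.1265 * nap * sleep_hours


def no_nap_minus_nap(sleep_hours):
    """Predicted GPA gap (no nap minus nap) at a given amount of sleep."""
    return predicted_gpa(sleep_hours, 1) - predicted_gpa(sleep_hours, 0)


# The gap is -0.7822 + 0.1265 * sleep, so it is zero where 0.7822 = 0.1265x.
crossover = 0.7822 / 0.1265

print(f"lines cross at about {crossover:.2f} hours of sleep")
print(f"gap at 5 hours: {no_nap_minus_nap(5):+.3f}")  # negative: nappers predicted higher
print(f"gap at 8 hours: {no_nap_minus_nap(8):+.3f}")  # positive: non-nappers predicted higher
```

The gap changes by 0.1265 GPA points per extra hour of sleep, which is why the difference keeps growing as sleep hours increase.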
4.4.14 predicted cost = −2.9003 − 1.1270(color) + 11.5173(carat) + 3.3294(color × carat). For diamonds between 0.3 and 0.4 carats, color has little effect on the relationship between carat and cost. However, as carat increases, the predicted price of colorless diamonds starts to increase by an increasingly large amount.
4.4.2 C. 4.4.3 B. 4.4.4 C. 4.4.5 D. 4.4.6 D. 4.4.7
4.4.8 a. For each 1-centimeter increase in foot length, the height of a male is predicted to increase by 1.27 centimeters. b. The y-intercept of the regression line for predicting female height is 24.06 centimeters less than the y-intercept of the regression line for predicting male height. c. The slope of the regression line for predicting female height is 0.47 more than the slope of the regression line for predicting male height. 4.4.9
4.4.15 ˆ
For diamonds that weigh about 0.5 carats, clarity has very little effect on the relationship between carat and cost. For diamonds above 0.5 carats, there is an increasing difference in price with diamonds rated as excellent clarity costing more than diamonds rated as good clarity. Below 0.5 carats there is an opposite effect (at least with the lines, not necessarily the data) as diamonds that are rated as good clarity are predicted to cost more than those rated as excellent clarity. 4.4.16
ˆ a. , where species = 1 for Karschi and species = 0 for Capensis. b.
a. A = 171.13 and B = 0.09.
ii. The y-intercept of the regression line for Karschi is 1.3789 chirps per second higher than the y-intercept of the regression line for Capensis.
b. C = 144.04 − 171.13 = −27.09. c. D = 0.24 − 0.09 = 0.15.
iii. The slope of the regression line predicting the chirp rate for Karschi is 0.1685 less than the slope of the regression line predicting the chirp rate for Capensis.
4.4.10 a. For each 1 millimeter increase in finger length, predicted height increases by B centimeters for males. b. For each 1 millimeter increase in finger length, predicted height increases by B + D centimeters for females.
i. For each 1 degree increase in temperature, the predicted chirps per minute of the Capensis species increase by 0.2865.
4.4.17 a. R2 = 0.952 and the SE of residuals = 0.240.
c. For males with a finger length of 0 millimeters, the predicted height is A centimeters.
b. There is a significant interaction between temperature and species because the t = −5.04 and the p-value < 0.0001.
d. For females with a finger length of 0 millimeters, the predicted height is A + C centimeters.
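The way the four coefficients combine in Solutions 4.4.9 and 4.4.10 can be sketched as follows. This assumes the model has the form predicted height = A + B(finger length) + C(sex) + D(sex × finger length) with sex = 1 for female and 0 for male, which is how 4.4.10 reads the coefficients; applying the numerical values from 4.4.9 to that reading is our assumption, and the function names are ours.

```python
# Values from Solution 4.4.9.
A, B = 171.13, 0.09     # intercept and slope of the baseline (sex = 0) line
C = 144.04 - 171.13     # difference in intercepts
D = 0.24 - 0.09         # difference in slopes


def predicted_height(finger_mm, female):
    """Assumed model form: A + B*x + C*sex + D*sex*x, with sex = 1 for female."""
    sex = 1 if female else 0
    return A + B * finger_mm + C * sex + D * sex * finger_mm


def group_line(female):
    """Return (intercept, slope) of the fitted line for one group."""
    sex = 1 if female else 0
    return A + C * sex, B + D * sex


male_intercept, male_slope = group_line(female=False)
female_intercept, female_slope = group_line(female=True)

print(f"C = {C:.2f}, D = {D:.2f}")
print(f"male line:   {male_intercept:.2f} + {male_slope:.2f}(finger length)")
print(f"female line: {female_intercept:.2f} + {female_slope:.2f}(finger length)")
```

The indicator shifts the intercept by C and the interaction term shifts the slope by D, so a single equation encodes two separate lines.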
4.4.18
4.4.11 a. The y-intercept of the regression equation for predicting male height is A + C. b. The y-intercept of the regression equation for predicting female height is A − C.
ˆ ˆ c. a. for Capensis. b.
where species = 1 for Karschi and species = 0
i. For each 1 degree increase in chirp rate, predicted temperature increases by 2.60 degrees for Capensis.
c. The slope of the regression equation for predicting male height is B + D.
ii. The y-intercept of the regression line for Karschi is 3.51 degrees higher than the y-intercept of the regression line for Capensis.
d. The slope of the regression equation for predicting female height is B − D.
iii. The slope of the regression line predicting the temperature for Karschi is 0.99 more than the slope of the regression line predicting temperature for Capensis.
4.4.12 a. ˆ GPA = 2.81 + 0.0851(sleep hours), SE of residuals = 0.340, R2 = 0.056, p-value = 0.0054. b. ˆ GPA = 2.82 + 0.1105(nap) + 0.0787(sleep hours), SE of residuals = 0.337, R2 = 0.079, p-value = 0.0041, nap = 1 for no nap, and nap = 0 for a nap.
c04InstructorSolutions.indd 47
4.4.19
ˆ a. where sex = 1 for male and sex = 0 for female. b.
i. For each 1-hour increase in sleep hours, the reaction time for females is predicted to decrease by 16 milliseconds.
16/10/20 7:56 PM
CHAPTER 4
Including a Quantitative Explanatory Variable
ii. For those with 0 hours of sleep, the predicted reaction time for males is 136.73 milliseconds faster (smaller) than the predicted reaction time for females. The slope of the regression equation for males is 11.45 more than the slope of the regression equation for females. 4.4.20 a. R2 = 0.313 and the SE of the residuals = 53.450. b. There is not a significant interaction between sleep time and a person’s sex because t = 1.28 and the p-value is 0.2070. c. Although the interaction explains some of the variation in reaction times (so R2 increases) it is not significant, so R2 does not increase very much. Even though more variation is explained with the interaction, removing the (non-significant) interaction term from the model reduces the number of degrees of freedom for the model. Without the interaction term in the model, the F-statistic for the model increases from 7.76 to 10.70 and the p-value decreases from 0.0002 to 0.0001.
a. 162.44 cm. b. 182.99 cm. c. Those whose left foot is longer are predicted to be tallest and those whose right foot is longer are predicted to be the shortest. 4.5.10 a. Each 1-centimeter increase in the left foot length of a person whose left foot is longer than her/his right foot results in a 2.98-centimeter predicted increase in height. b. The predicted height of a person whose left and right feet are both 0 centimeters long is 130.8 centimeters. c. Each 1-centimeter increase in the left foot length of a person whose right foot is longer than her/his left foot results in a 2.61-centimeter predicted increase in height. 4.5.11 ˆ a.
ˆ b. ˆ c.
i. For each 1-inch increase in height, the magical height for males is predicted to increase by 0.5084 inches.
4.5.12 ˆ a.
4.4.21
ˆ a. b.
4.5.9
ii. For those with a height of 0 inches, the predicted magical height for females is 13.7674 inches less than the predicted magical height for males. iii. The slope of the regression equation for females is 0.1352 more than the slope of the regression equation for males. 4.4.22 a. R2 = 0.814 and the SE of residuals = 2.103. b. There is not a significant interaction between a person’s sex and their height; t = 1.09 and the p-value is 0.2758.
b.
i. 3.66 − 32.89 = −29.23. ii. iii. iv.
ˆ c. 4.5.13 a. A + C.
c. Removing the (non-significant) interaction term from the model slightly reduces the amount of variation in magical height that is explained, but the F-statistic for the model increases from 186.43 to 278.62 because the degrees of freedom for the model decrease from 3 to 2 when the interaction term is removed.
b. A.
Section 4.5
b. B + F − G − H.
4.5.1 B.
4.5.15
4.5.2 B, C.
a.
4.5.3 C. 4.5.4 C. 4.5.5 C. 4.5.6 D. 4.5.7 A. 4.5.8 a. For those whose left feet are longer, for each 1-centimeter increase in left foot length, their heights are predicted to increase by 2.12 centimeters. b. People whose right and left feet are the same length are predicted to be 4.19 centimeters shorter than those whose left foot is longer than their right foot. c. People whose right foot is longer than their left foot are predicted to be 5.71 centimeters shorter than those whose left foot is longer than their right foot.
c. B + E. d. B. 4.5.14 a. A + C − D − E.
b. Yes, D-graded diamonds are predicted to be the most expensive because grade D has the largest coefficient and the coefficients get smaller as you go to grades E, F, and down to H which has the smallest coefficient. Because grade I is the reference category, if you included a term for grade I in the equation, it would have a coefficient of 0 which is even smaller than the coefficient for grade H. c. Grade I is the reference category. d. For grade-I diamonds, for each 1-carat increase in weight, the cost is predicted to increase by $13,275.40. e. For diamonds that are 0 carats (or any number of carats but the same), a diamond of grade D is predicted to cost $3,850.30 more than a diamond of grade I. 4.5.16 a.
d. People who have longer left feet than right feet, with a left foot length of 0 centimeters are predicted to have a height of 119.39 centimeters.
Solutions to Exercises
predicted student arrival time = −2.5572 + 0.0572(prof arrival) + 1.8405(100 level) − 1.6901(200 level) − 0.1505(300 level) + 0.2759(100 level × prof arrival) − 0.3742(200 level × prof arrival) + 0.0983(300 level × prof arrival).
b. The interactions between carat and groups D, E, and F are statistically significant at the 5% level. c. There tends to be a greater difference in the costs of the different colors of diamonds as the number of carats increases. d. The applet shows: The intercept is calculated by adding the intercept and the coefficient for grade D together (−3.6374 − 1.8373 = −5.4747) and the slope is calculated by adding the coefficients on the carat term and the (D × carat) term (11.6848 + 7.4578 = 19.1426).
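The arithmetic in part (d) can be sketched in a few lines of Python. This is only an illustration of how a group's line is read off an indicator-coded interaction model; the function name group_line and the variable names are assumptions made for this sketch, and the four coefficients are the ones quoted in part (d).

```python
# Coefficients quoted in part (d) of the solution.
intercept_ref = -3.6374        # intercept for the reference grade
grade_d_shift = -1.8373        # coefficient on the grade-D indicator
slope_ref = 11.6848            # coefficient on the carat term
grade_d_slope_shift = 7.4578   # coefficient on the (D x carat) term


def group_line(intercept, slope, d_intercept=0.0, d_slope=0.0):
    """Return (intercept, slope) for one group in an indicator-coded
    interaction model: the group's own shifts are added to the
    reference-category intercept and slope. (Hypothetical helper.)"""
    return intercept + d_intercept, slope + d_slope


d_intercept, d_slope = group_line(
    intercept_ref, slope_ref, grade_d_shift, grade_d_slope_shift
)
print(round(d_intercept, 4), round(d_slope, 4))
```

Calling group_line with no shifts returns the reference category's own line, which is why the reference grade needs no indicator term.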
4.5.19 a.
i. Yes, the scatterplot has an upward trend with a positive correlation and slope of the regression line which indicates higher resting heart rates are associated with higher post-exercise heart rates. See the scatterplot for Solution 4.5.19a. ii. There is strong evidence of a positive linear relationship between resting heart rate and heart rate. F = 10.46, p-value = 0.0031, and the slope is positive.
4.5.17
ˆ a.
iii. There is strong evidence of an association between stepping heart rate and heart rate. F = 6.00, and p-value = 0.0070.
i. For each 1°C increase in temperature, the Sycamorus cricket’s chirp rate is predicted to increase by 0.7277 chirps per second. ii. For a temperature of 0°C, the Karschi cricket is predicted to have a chirp rate of 5.0789 chirps per minute higher than that of the Sycamorus cricket. iii. d. The intercept is and the slope is ˆ 0.7277 − 0.6096 = 0.1181, so the equation is
b.
i. There is strong evidence of an association between resting heart rate and heart rate after adjusting for stepping rate. F = 20.32, p-value < 0.0001. ii. There is not strong evidence of an interaction between stepping rate and resting heart rate on heart rate. F = 1.35, p-value = 0.2794.
c. Because the interactions were not significant, we leave these out of the model. ˆ Heart rate (bpm) after stepping
b. The reference category is Sycamorus. c.
4.5.18 a.
i.
ii.
[Figure: plot of average class arrival (min) versus professor arrival (min).]
[Figure: Solution 4.5.19a. Scatterplot with resting heart rate (bpm) on the horizontal axis.]
4.5.20
a. There is strong evidence of a positive linear relationship between bill amount and tip amount. F = 114.1, p-value < 0.0001, and the slope is positive.
b. i. Adding the variable age does not improve the model significantly because the partial F = 1.81 with a p-value of 0.1754.
ii. Adding the interactions does not improve the model significantly because F = 0.38 and p-value = 0.6854.
i. There is not strong evidence (although it is moderate) of an association between average student arrival time after adjusting for class level and professor arrival time because F = 3.37 and the p-value is 0.0785. ii. There is strong evidence of an interaction between average student arrival time and class level because F = 7.35 and the p-value is 0.0034.
c.
iii. There is not strong evidence of an association between class level and professor arrival time because F = 0.77 and p-value = 0.4711.
b.
ˆ arrival time = student
c. i. Adding conversation improves the model because F = 6.44 and p-value = 0.0146. ii. The interaction is not significant. F = 1.59, p-value = 0.2131. Because adding age was not significant nor were interactions between conversation and bill amount, we leave those out of the model. We end up with a model of predicted tip amount = −0.1068 + 0.1786(bill amount) + 0.8859(conversation) where conversation = 1 when a conversation was had with the customer and conversation = 0 when no conversation was had with the customer. 4.5.21 a. There is strong evidence of a positive linear relationship between heart rate and temperature. R2 = 0.064, F = 8.80, and p-value = 0.0036, and the slope is positive.
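The final tip model in 4.5.20c can be written as a small function. This is a minimal sketch, assuming the coefficients quoted in the solution; the function name predicted_tip and the example bill amount of 20 are hypothetical and chosen only for illustration.

```python
def predicted_tip(bill_amount, conversation):
    """Predicted tip from the model in 4.5.20c.

    conversation is 1 if the server had a conversation with the
    customer and 0 otherwise.
    """
    return -0.1068 + 0.1786 * bill_amount + 0.8859 * conversation


# Hypothetical bill amount of 20, used only for illustration.
with_chat = predicted_tip(20.0, 1)
without_chat = predicted_tip(20.0, 0)

# With no interaction term, the gap between the two groups is the
# conversation coefficient at every bill amount.
print(round(with_chat - without_chat, 4))
```

Because the interaction was left out, the two prediction lines are parallel, so the difference does not depend on the bill amount chosen.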
b.
i. Adding the variable sex does help improve the model. R2 = 0.098. The partial F is 4.78 and the p-value is 0.0307.
ii. Adding the interaction does not improve the model because F = 0.03 and p-value = 0.8690.
c. Because the interaction was not significant we leave this out of the model and end up with
End of Chapter 4 Exercises
4.CE.1
a. See Solution 4.CE.1a for a histogram and summary statistics from JMP; the average GPM is 4.33 gallons per 100 miles, with standard deviation 1.156 gallons per 100 miles.
[Figure: Solution 4.CE.1a. Histogram of GPM with summary statistics: Mean = 4.330, Std Dev = 1.156, Std Err Mean = 0.187, Upper 95% Mean = 4.710, Lower 95% Mean = 3.950, n = 38.]
b. See graphs in Solution 4.CE.1b.
[Figure: Solution 4.CE.1b. Scatterplots involving GPM, WT, DIS, NC, HP, and ACC; the panels report r = 0.926, r = 0.888, r = 0.841, r = 0.823, and r = 0.033.]
c. Weight appears to have the strongest linear relationship. The absolute correlation coefficient is largest between weight and GPM.
d. predicted GPM = −0.0061 + 1.5148(weight); If the car has 0 lb. weight, we predict −0.0061 GPM; for each 1000-pound increase in weight, we predict an increase of 1.515 in GPM.
e. R2 = 42.4219/49.4446 = 0.858 so 85.8% of the variation in GPM is explained by the linear association with weight.
f. (See Solution 4.CE.1f.) Based on the residuals vs. predicted values plot, there may be a hint of a pattern of negative, then positive, then negative residuals, indicating a possible violation of the linearity condition, but there is no reason to suspect that the variability of residuals changes as the predicted response increases or decreases; a histogram of residuals behaves like a normal distribution.
[Figure: Solution 4.CE.1f. Histogram of residuals (Mean = 0.00, SE = 0.442, df = 36) and plot of residuals versus predicted values.]
g. A 95% confidence interval for the population regression line slope is (1.31, 1.72); I’m 95% confident that for each 1,000-pound increase in weight there is an average increase of 1.31 to 1.72 gallons per 100 miles.
h. Yes, because zero is not contained inside the confidence interval there is a significant linear relationship between GPM and weight for cars in the population.
4.CE.2
a. The ANOVA table is shown next; predicted GPM = 3.967 + 0.245(ACC), with SE of residuals = 1.171.
Source   DF   Sum of Squares   Mean Squares   F        p-value
Model     1    0.0541           0.0541         0.0394   0.8437
Error    36   49.3905           1.3720
Total    37   49.4446
b. i. SSTotal is the same because the response variable is the same in both analyses.
ii. SSError is larger and R2 is smaller in this ACC analysis because the residuals are so much larger; we see larger residuals because the linear association between GPM and ACC is much weaker than the linear association between GPM and WT.
iii. With such a weak relationship, we expected a small t-statistic and large p-value because the data do not provide convincing evidence that the observed slope could not have just happened by random chance alone.
c. From the scatterplot for GPM and WT shown in Graph 4.CE.1(b) and the fact that r = 0.926, we can see that there is a strong positive linear association between HP and WT. Yes, HP is a confounding variable.
d. WT and ACC might be the best two variables to use because there isn’t an association between WT and ACC.
4.CE.3 a. Mean for secretly filled bowls = 14.69 ounces and mean for regular bowls = 8.45 ounces. b.
Source   DF   Sums of squares   Mean squares   F      p-value
Group     1    524.54            524.54         9.65   0.0031
Error    52   2826.26             54.35
Total    53   3350.80
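The entries in an ANOVA table like the one for 4.CE.3b are tied together by a few divisions, which the following Python sketch makes explicit. The helper name anova_summary is an assumption for this illustration; the sums of squares and degrees of freedom are the ones in the table, and the t-statistic of 3.11 is the value quoted in part (e).

```python
import math


def anova_summary(ss_model, df_model, ss_error, df_error):
    """Derive mean squares, F, R-squared and SE of residuals from the
    sums of squares and degrees of freedom. (Hypothetical helper.)"""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f": ms_model / ms_error,
        "r2": ss_model / (ss_model + ss_error),
        "se_residuals": math.sqrt(ms_error),
    }


soup = anova_summary(ss_model=524.54, df_model=1, ss_error=2826.26, df_error=52)
print(round(soup["f"], 2), round(soup["se_residuals"], 3))

# With one model degree of freedom, F is the square of the t-statistic,
# up to rounding of the reported t.
t_reported = 3.11
print(round(t_reported ** 2, 2))
```

The squared t-statistic differs from F only in the second decimal place because the reported t has been rounded.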
c. predicted ounces consumed = 8.45 + 6.23(secretcode); Intercept: 8.45 is the predicted ounces consumed with the regularly refilled bowls; Slope: 6.23 is the increase in predicted ounces consumed moving from regular bowls to secretly refilled bowls.
d. The intercept is the mean for the regularly filled bowls, and the slope is the increase in mean ounces consumed from regularly to secretly filled bowls.
e. The F-statistic = 9.65 is the square of the t-statistic = 3.11; the p-value (0.0031) is the same because they both test for whether the type of bowl is a significant source of variation in the amount of soup consumed.
f. The residuals versus predicted ounces plot shows no reason to believe that the assumption of equal variance has been violated. Furthermore, a histogram of the residuals does not show any severe skewness of distribution; therefore, the normality assumption is valid. (See graph for Solution 4.CE.3f.)
g. We are 95% confident that, on average, secretly refilled bowls increase the ounces consumed by 2.21 to 10.25 ounces.
h. Yes, because 0 is not in the confidence interval for the population slope, there is a significant relationship between the amount of soup consumed and the bowl type.
4.CE.4
a. predicted glucose = 76.82 + 1.06(BMI); F = 174.74 and t = 13.22; p-value < 0.0001. There is very strong evidence of a positive linear association between BMI and blood glucose level.
b. After adjusting for prevalence of diabetes, there seems to still be strong evidence (p-value < 0.0001) of a positive linear association between BMI and blood glucose level.
c. R2 = 0.394. About 39% of the observed variation in blood glucose levels is explained by BMI and prevalence of diabetes.
d. predicted blood glucose = 109.26 + 0.3859(BMI) + {19.60 if have diabetes; −19.60 if don’t have diabetes}, SE of residuals = 18.53 mg/dL.
After adjusting for BMI, having diabetes is associated with an increase of 19.60 mg/dL in the predicted blood glucose level from the overall mean blood glucose level of 109.26 mg/dL. e. Yes, after adjusting for the presence or absence of diabetes, BMI is still a significant predictor of blood glucose because the p-value is still < 0.0001. 4.CE.5 a. For diabetics predicted blood glucose = 128.86 + 0.3859(BMI); for non-diabetics predicted blood glucose = 89.66 + 0.3859(BMI); the slope coefficient corresponding to BMI has stayed the same, but the intercept has changed corresponding to whether having diabetes has a positive or negative effect on the predicted blood glucose level. For diabetics, the intercept is 109.26 + 19.60 = 128.86 and for nondiabetics the intercept is 109.26 − 19.60 = 89.66. b. R2 and residual SE stay the same.
[Figure: Solution 4.CE.3f. Histogram of residuals (Mean = 0.00, SE = 7.372, df = 52) and plot of residuals versus predicted values.]
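The effect-coded model in 4.CE.4d and the two intercepts worked out in 4.CE.5a can be checked with a short Python sketch. The constant and function names are assumptions made for this illustration; the three coefficients are the ones quoted in the solutions.

```python
# Coefficients quoted in 4.CE.4d (effect coding: +effect / -effect).
OVERALL_INTERCEPT = 109.26
BMI_SLOPE = 0.3859
DIABETES_EFFECT = 19.60


def predicted_glucose(bmi, has_diabetes):
    """Predicted blood glucose (mg/dL) from the effect-coded model."""
    effect = DIABETES_EFFECT if has_diabetes else -DIABETES_EFFECT
    return OVERALL_INTERCEPT + effect + BMI_SLOPE * bmi


# Setting BMI to 0 isolates each group's intercept.
intercept_diabetic = predicted_glucose(0, True)
intercept_non_diabetic = predicted_glucose(0, False)
print(round(intercept_diabetic, 2), round(intercept_non_diabetic, 2))
```

Under effect coding the two group lines sit the same distance above and below the overall intercept, so the gap between them is twice the diabetes effect at every BMI.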
c. We are 95% confident that the mean blood glucose level for similar diabetic people with BMI = 30 kg/m2 is somewhere in the interval 138.70 to 142.17 mg/dL. We are 95% confident that a diabetic individual with BMI = 30 kg/m2 will have a blood glucose level between 104.07 and 176.80 mg/dL.
4.CE.6
a. For diabetics predicted blood glucose = 141.40 − 0.0125(BMI); for non-diabetics predicted blood glucose = 86.5853 + 0.4971(BMI).
b. The interaction plot is shown next. For those who do not have diabetes, an increase in BMI appears to be associated with an increase in blood glucose level; however, for those who already have diabetes, we don’t see much of an association between BMI and blood glucose level.
[Figure: interaction plot of blood glucose level versus BMI, with separate lines for Diabetes = 0 and Diabetes = 1.]
c. The R2 = 0.396 is slightly larger than when the interaction term was not included, and SE of residuals = 18.497 is slightly smaller than before.
d. The interaction is statistically significant, F = 10.08, p-value = 0.0015. We should not drop the interaction term from the model.
e. We are 95% confident that the mean blood glucose level for similar diabetic people with BMI = 30 kg/m2 is somewhere in the interval 139.26 to 142.80 mg/dL. We are 95% confident that a diabetic individual with BMI = 30 kg/m2 will have a blood glucose level between 104.72 and 177.34 mg/dL.
f. The intervals have only very slightly changed from before when the interaction was not included; this is not surprising given that R2 and the SE of residuals changed so little.
g. The plot of residuals versus predicted values does show some pattern that suggests the variability in the response variable does not stay the same across different BMI values for diabetics and non-diabetics.
[Figure: plot of residuals versus fitted values.]
h. We have strong evidence that there is an association between blood glucose and BMI, and that the nature of the association differs based on whether or not someone is diabetic, with a higher BMI being a greater disadvantage for non-diabetics. However, because not all validity conditions were met, our p-value, confidence interval, and prediction interval calculations may not be valid.
4.CE.7
a.
b. R2 = 0.081. About 8.1% of the observed variation in blood glucose levels is explained by BMI and whether someone takes cholesterol medications regularly.
c. For those who take cholesterol meds regularly predicted blood glucose = 83.44 + 0.9696(BMI); for those who do not take cholesterol meds regularly predicted blood glucose = 76.18 + 0.9696(BMI).
d. Yes, even after adjusting for cholesterol meds, the p-value corresponding to BMI is very small (< 0.0001).
e. We have strong evidence that even after adjusting for cholesterol meds, there is a strong positive association between blood glucose levels and BMI.
4.CE.8
a. For those who take cholesterol medications regularly, predicted blood glucose = 80.07 + 1.0856(BMI); for those who do not take cholesterol medications regularly, predicted blood glucose = 78.27 + 0.8943(BMI).
b. See the interaction plot; there does not appear to be much of an interaction between BMI and taking cholesterol meds because the increase in blood glucose level associated with increase in BMI appears to be similar in both cases.
[Figure: interaction plot of blood glucose level versus BMI, with separate lines for "Don't take cholesterol medication" and "Take cholesterol medication".]
c. R2 is still 0.081 and SE of residuals stayed at 22.82 mg/dL.
d. No, the interaction term does not appear significant (p-value = 0.2415), so we are justified in dropping the interaction from the model.
e. Using the model without the interaction term: We are 95% confident that, for similar people who take cholesterol medications regularly and have BMI = 30 kg/m2, the mean blood glucose level is somewhere between 111.33 and 113.93 mg/dL. We are 95% confident that an individual with BMI = 30 kg/m2 and who take cholesterol
f. Residual plots (see graphs for Solution 4.CE.8f) show no reason to suspect that any of the validity conditions have been violated.
[Figure: Solution 4.CE.8f. Histograms of residuals and fitted values.]
g. We have strong evidence that even after adjusting for the effect of regularly taking cholesterol medications, there is a positive association between blood glucose and BMI, with a higher BMI being associated with a higher predicted blood glucose level.
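The interaction model in 4.CE.6a gives each group its own line, and the intervals in 4.CE.6e should be centered on the diabetic line's prediction at BMI = 30. A minimal Python sketch of that check follows; the function names are assumptions for this illustration, and the coefficients and interval endpoints are the ones quoted in 4.CE.6a and 4.CE.6e.

```python
def diabetic_line(bmi):
    """Predicted blood glucose (mg/dL) for diabetics, from 4.CE.6a."""
    return 141.40 - 0.0125 * bmi


def non_diabetic_line(bmi):
    """Predicted blood glucose (mg/dL) for non-diabetics, from 4.CE.6a."""
    return 86.5853 + 0.4971 * bmi


def midpoint(lower, upper):
    """Center of an interval."""
    return (lower + upper) / 2


prediction = diabetic_line(30)
ci_center = midpoint(139.26, 142.80)   # confidence interval from 4.CE.6e
pi_center = midpoint(104.72, 177.34)   # prediction interval from 4.CE.6e
print(round(prediction, 3), round(ci_center, 2), round(pi_center, 2))
```

Both intervals share the same center because they are built around the same predicted value; only their widths differ.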
4.CE.9
a. The regression equation is
The association is statistically significant, t = 4.58, F = 21.02, p-value < 0.0001. We have strong evidence of an association between finish time and age for the population of runners represented by this sample. Note: The R2 value is only 0.015! There is enough consistency in the direction of the association we observe in this sample that we don’t think it could have happened by chance, but the model does not explain much of the variation in the finish times.
b. The 95% confidence interval is 129.55 ± 1.75; I’m 95% confident that the mean finish time of the population of all 48-year-old runners is between 127.8 and 131.3 minutes. The 95% prediction interval is 129.54 ± 51.84; I’m 95% confident that a randomly selected 48-year-old runner from this population will finish between 77.8 and 181.27 minutes.
c. predicted finish time = 116.33 + 0.3495(age) + {9.70 if female; −9.70 if male}
d. The R2 value has increased from 0.015 to about 0.14 and the SE of residuals has decreased from 26.36 minutes to 24.64 minutes.
e. After adjusting for age (e.g., comparing a male and a female of the same age), females are predicted to take 9.7 minutes longer on average than the overall average.
f. Yes, with t = −13.98 and p-value < 0.0001, the sex of a runner is a significant predictor of finish time.
g. The 95% confidence interval is 136.2 ± 1.9; I’m 95% confident that the mean finish time of the population of all 48-year-old female runners is between 134.3 and 138.1 minutes. The 95% prediction interval is 136.2 ± 48.4; I’m 95% confident that a randomly selected 48-year-old female runner from this population will take between 87.8 and 184.6 minutes to finish the race. The intervals have gotten a bit narrower. Their midpoints have also shifted to larger values. The larger midpoints are due to older people tending to have longer finishing times. The narrower width is because of the increased precision due to having more information in the model (the smaller SE of residuals value). This is especially noticeable with the prediction interval which is much more driven by the SE of residuals.
4.CE.10
a. Males: predicted finish time = 105.19 + 0.3853(age); Females: predicted finish time = 127.02 + 0.3229(age).
b. See the scatterplot that follows; it appears that the difference in speeds for the males and females is pretty similar at every age and that the increase in time per year of age is about the same for males and for females. Therefore, we have no reason to believe that there is an interaction between age and sex of respondent.
[Figure: regression plot of time (min) versus age (years), with separate lines for Female and Male.]
c. According to the applet, neither R2 (0.14) nor s (24.64) has changed.
d. No, the interaction term is not statistically significant. t = 0.57 and p-value = 0.5668 for the interaction effect, which doesn’t really surprise us based on the nearly parallel lines and the lack of noticeable improvement in R2 and SE of residuals.
4.CE.11
a. The regression equation is
With t = 1.88, F = 3.52, p-value = 0.0655, we have some evidence of an association between median salary 10 years after graduation and tuition for schools similar to the ones represented by this sample. Note: The R2 value is only 0.057. SE of residuals = 11,134.71.
b. + {−10,008.25 if private; 10,008.25 if public}, SE of residuals = 9,740.05.
c. R2 = 0.291; about 29.1% of the observed variation in median salary is explained by tuition and whether the school is private or public.
d. Yes, the F-statistic (4.76) and small p-value (< 0.0001) provide strong evidence that tuition is a significant predictor of median salary after adjusting for type of school.
4.CE.12
a. Private: predicted median salary = 24,439.49 + 0.6483(tuition); Public schools: predicted median salary = 42,026.89 + 0.8929(tuition)
b. See the scatterplot for Solution 4.CE.12b; it appears that the increase in median salaries corresponding to increases in tuition is about the same for private and public schools. Therefore, we have no reason to believe that there is an interaction between tuition and type of school.
[Figure: Solution 4.CE.12b. Scatterplot of median salary ($) versus tuition ($).]
c. R2 has increased to 0.292 (from 0.291) and SE of residuals also increased to 9,740.55. Both have increased a tiny bit.
d. We are justified in dropping the interaction term because the p-value corresponding to it is not small (0.7404), t = −0.33.
e. There is some fanning in the residual versus predicted plot (Solution 4.CE.12e) giving us reason to be concerned that the validity condition pertaining to equal variances in the response across values of the explanatory variable(s) is not met.
[Figure: Solution 4.CE.12e. Histogram of residuals (Mean = 0.00, SE = 9816.917, df = 56) and plot of residuals versus predicted values.]
f. We have strong evidence that even after adjusting for the effect of type of school, there is a positive association between tuition and median salary 10 years after graduating with a bachelor’s degree from schools such as the ones in the sample (F = 4.40, p-value = 0.0469). However, our p-value calculations might not be valid because not all the validity conditions are met.
4.CE.13
a. We need three indicator variables.
b. Large city: predicted median salary = 43,873.49 + 0.15(tuition); Mid-sized city: predicted median salary = 48,139.26 + 0.15(tuition); Small city: predicted median salary = 40,528.40 + 0.15(tuition); Rural: predicted median salary = 37,577.93 + 0.15(tuition). SE of residuals = 10,912.01.
c. About 14.1% of the observed variation in median salary is explained by tuition and the size of the city in which the school is located. The R2 value has increased from 0.057.
d. No, based on the p-value (0.1181), after accounting for city size, tuition does not appear to be a significant predictor of median salary. This is possibly because there is a relationship between the size of the city and tuition, and the effect of the city size is possibly confounded with the effect of tuition, so that when both variables are included, neither looks significant.
e. Yes, it is justifiable to drop city size from the model because of the possible confounding of the effects of city size and tuition, and because the effect of city size is not statistically significant (p-value = 0.1584) after we have accounted for tuition.
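The count in 4.CE.13a (three indicator variables for four city sizes) follows from indicator coding with a reference category, which can be sketched in Python as below. The function name indicator_row and the choice of "Rural" as the reference level are assumptions for this illustration only; the solution does not say which category the applet used as the reference.

```python
CITY_SIZES = ["Large city", "Mid-sized city", "Small city", "Rural"]


def indicator_row(level, levels, reference):
    """Code one categorical value as k - 1 zero/one indicators.

    The reference level gets all zeros, so its effect is absorbed
    into the intercept. (Hypothetical helper.)
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    return [1 if level == other else 0 for other in levels if other != reference]


# "Rural" as the reference is an assumption made for this sketch.
for size in CITY_SIZES:
    print(size, indicator_row(size, CITY_SIZES, reference="Rural"))
```

A fourth indicator would be redundant: once the other three are known to be 0, the school must be in the reference category.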
Chapter 4 Investigation
1. The observational units are the schools.
2. This was an observational study.
3. Random sampling was used so the results can be generalized to four-year California universities. No random assignment was used so no causal conclusions can be made.
4. The response variable is the annual tuition and fees and it is quantitative. If you are a student, low values are desirable for the response.
5. The inclusion criterion for the sample was four-year universities in California.
6.
Observed variation in: Four-year graduation rates
Inclusion criteria: • Four-year universities in California
Sources of explained variation: • Tuition and fees • Private/Public
Sources of unexplained variation: • Endowment • Student Faculty ratio • Degrees offered
7. Overall mean = 41.98, Overall SD = 24.388, SSTotal = 29,142.98. predicted four-year graduation rate = 41.98, SE of residuals = 24.388.
8. There is a moderately strong positive linear association between tuition and graduation rate as seen in the scatterplot. There appear to be three institutions with low tuition and high four-year graduation rates: UCLA, UC Berkeley, and UC Irvine. A linear model does seem appropriate.
[Figure: scatterplot of graduation rate versus tuition ($).]
9. predicted four-year graduation rate = 18.58 + 0.0009(tuition); R2 = 0.359 and the SE of residuals for the model = 19.734. The percentage of variation in four-year graduation rates that is explained by tuition and fees is 35.9%. The SE of residuals is smaller in this model compared to the single-mean model, thus the predicted four-year graduation rates are more accurate using this model that includes tuition.
10. For each one dollar increase in tuition, the predicted four-year graduation rate increases by 0.0009.
11. The F-statistic = 26.84 and p-value < 0.0001. (See ANOVA table in Solution 11 output.) This provides strong evidence of an association between four-year graduation rate and tuition. The histogram of residuals appears normal and the plot of residuals vs predicted values appears to have equal variation. (See residual graphs in Solution 11.)
Source    DF   Sum of Squares   Mean Squares   F       p-value
Tuition    1   10,450.87        10,450.87      26.84   < 0.0001
Error     48   18,692.11           389.42
Total     49   29,142.98
[Figure: Solution 11. Histogram of residuals (Mean = 0.00, SE = 19.734, df = 48) and plot of residuals versus predicted values.]
12. Next are side-by-side histograms of the graduation rates by type of institution.
[Figure: histograms of graduation rate for Public and Private institutions.]
13. predicted four-year graduation rate = {48.69 if private; 30.06 if public}, SE of residuals = 22.887, R2 = 0.137.
13.7% of the variation in four-year graduation rates is explained by type of institution. The SE of residuals is slightly higher for this model than for the one-variable model using tuition to predict four-year graduation rates.
14. The F-statistic (7.63) is significant (p-value = 0.0081) which leads us to conclude there is a significant association between type of school (public or private) and graduation rate. (See ANOVA table.) In this case, private schools have a higher graduation rate on average than do public schools. There are only 18 public schools in the sample and while their
graduation rates are slightly skewed to the right, the distribution of the residuals appears to be normal. (See residual graph.)
Source                DF   Sum of Squares   Mean Squares   F      p-value
Type of institution    1    3999.16         3999.16        7.63   0.0081
Error                 48   25,143.82         523.83
Total                 49   29,142.98
[Figure: histogram of residuals (Mean = 0.00, SE = 22.887, df = 48).]
15. predicted graduation rate = 18.63 + 0.0013(tuition) − 18.56(type), where type = 1 if Private, and type = 0 if Public. SE of residuals = 19.237, R2 = 0.403. The R2 value has increased and the SE of residuals has decreased compared to each of the one-variable models. The value 0.0013 tells us that for each one dollar increase in tuition, the graduation rate is predicted to increase 0.0013 regardless of institution type. The slope (for tuition) has changed from the one-variable model, because adjusting for the type of institution changes the association between tuition and graduation rate.
16. An interaction between tuition and type of institution would mean that the association between graduation rate and tuition is different for different types of institutions (private and public).
17. Yes, it does look like there is an interaction between type of institution and tuition. As tuition increases, the graduation rate for public schools increases at a faster rate than it does for private schools. (See the following graph.)
[Figure: interaction plot of graduation rate versus tuition ($).]
18. 50.8% of the variability in graduation rates is explained by the model. The predicted graduation rate for a public university that costs $0 in tuition is −19.40%. For a private university, the increase in graduation rates is 0.0045 less for each 1 dollar increase in tuition compared to public universities. We can see this in our interaction plot because the slope of the prediction equation for graduation rates is less steep for private universities than for public institutions.
Source        DF   Sum of Squares   Mean Squares   F       p-value
Model          3   14,812.39        4937.46        15.85   < 0.0001
Tuition        1   6997.09          6997.09        22.46   < 0.0001
Type           1   714.58           714.58         2.29    0.1367
Interaction    1   3063.00          3063.00        9.83    0.0030
Error         46   14,330.59        311.53
Total         49   29,142.98
19. private predicted graduation rate = 5.84 + 0.0012(tuition); public predicted graduation rate = −19.40 + 0.0057(tuition). The intercept for the private universities is positive but for the public schools it is negative. The slope for the private universities is smaller than the slope for the public universities. 20. H0: The slope coefficient of the interaction between tuition and type of institution is 0. Ha: The slope coefficient of the interaction between tuition and type of institution is not 0. 21. There is convincing evidence of an interaction (F = 9.83, p-value = 0.003). The distribution of residuals is normal, however, there appears to be some fanning in the plot of the residuals vs. predicted values which is an indication of unequal variation (Solution 21).
[Solution 21: histogram of residuals (Mean = 0.00, SE = 17.650, df = 46) and plot of residuals vs. predicted values]
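The entries of the ANOVA table for the interaction model can be checked by hand: each mean square is a sum of squares divided by its degrees of freedom, and each F-statistic is that mean square divided by the error mean square. The following is a minimal Python sketch of that arithmetic. The dictionary layout and variable names are illustrative assumptions, and only the sums of squares and degrees of freedom come from the solution.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
rows = {
    "Model": (3, 14812.39),
    "Tuition": (1, 6997.09),
    "Type": (1, 714.58),
    "Interaction": (1, 3063.00),
}
df_error, ss_error = 46, 14330.59
ss_total = 29142.98

ms_error = ss_error / df_error            # mean square error
se_residuals = ms_error ** 0.5            # SE of residuals
r_squared = rows["Model"][1] / ss_total   # proportion of variation explained

# F = (SS / df) / MSError for each row of the table.
f_stats = {name: (ss / df) / ms_error for name, (df, ss) in rows.items()}

for name, f in f_stats.items():
    print(f"{name:12s} F = {f:.2f}")
print(f"SE of residuals = {se_residuals:.3f}, R2 = {r_squared:.3f}")
```

The recomputed values should agree with the reported F-statistics, the SE of residuals of 17.650, and the 50.8% of variability explained by the model.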
22. 65.84%. 23. The confidence interval is (1.6, 33.3) and the prediction interval is (−21.5, 56.4). A prediction interval is for an individual school, and individuals are always more variable than averages. Therefore, the prediction interval is wider than the confidence interval. 24. We have strong evidence of an interaction between type of university (private or public) and tuition on the predicted four-year graduation rate for four-year universities in California. As tuition increases, the four-year graduation rate increases faster for public schools compared to private schools. No cause-and-effect conclusions can be drawn because this was an observational study. We can generalize these results to four-year universities in California because the schools were randomly selected from both types of schools in California. 25. It would be beneficial to sample more public universities because the observed graduation rates of the 18 in this study were skewed to the right. As the data currently stand, it looks as if validity conditions are not met to carry out a theory-based test. It would be interesting to see if these trends are similar in other states besides California. We might also gather other variables on each school that deal with size of endowment and amount of scholarships given.
CHAPTER 5
Multiple Quantitative Explanatory Variables Section 5.1
5.1.1 B, D. 5.1.2 C. 5.1.3 C. 5.1.4 A. 5.1.5 A. 5.1.6 B, C. 5.1.7 A, B, C. 5.1.8 C.
5.1.9 There was no covariation because there was no association between the explanatory variables and covariation is variation in the response that is attributed to two associated explanatory variables.
5.1.10 a. which is the same value as the slope coefficient. b. predicted std peroxide = −0.3318 × std air; r = −0.3318 which is the same value as the slope coefficient. c. When the outcomes of a variable are standardized, their standard deviation is 1. This means that sy/sx = 1/1 = 1, so b = r.
5.1.11 a. i. As temp increases, temp × air is predicted to increase, hence an association exists between these two variables. The p-value for this model is < 0.0001. ii. As air increases, temp × air is predicted to increase, hence an association exists between these two variables. The p-value for this model is < 0.0001. b. i. The slope of the regression model relating std temp and std temp × std air is 0, the p-value for this model is 1, and the R2 value is 0. All these tell us there is no association between std temp and std temp × std air. ii. The slope of the regression model relating std air and std temp × std air is 0, the p-value for this model is 1, and the R2 value is 0. All these tell us there is no association between std air and std temp × std air.
5.1.12 a. Temp and (temp−75) × (air−2) are not associated because the slope of the regression model relating the two variables is 0, the p-value for this model is 1, and the R2 value is 0. b. Air and (temp−75) × (air−2) are not associated because the slope of the regression model relating the two variables is 0, the p-value for this model is 1, and the R2 value is 0. c. By subtracting off the means, large values of temp will go with positive values of (temp−75) × (air−2) when air is above 2 mph, 0 when air is at its mean, and negative values when air is less than 2 mph. Similarly, small values of temp (below 75°F) will correspond to negative values of (temp−75) × (air−2) when air is above 2 mph, 0 when air is at its mean, and positive values when air is less than 2 mph. Because both large and small values of temp correspond to positive, negative and 0 values of (temp−75) × (air−2), there is not an association between temp and (temp−75) × (air−2).
5.1.13 predicted peroxide = 26.77 − 0.29(temp) − 9.27(air) + 0.11(temp × air).
5.1.14 a. It uses a factorial design because all the possible combinations of temperature (325°F and 350°F) and time (8, 11, 14 min) are used. It is balanced because there are the same number of observations (8) for each of the 6 treatments. b. c. d. Yes, this R2 value is the sum of those from the two one-variable models. e. Solving for time, we get 12.52 minutes.
5.1.15 a.
b. As the time increases by one standard deviation, the rating is predicted to increase by 2.0886, holding temperature constant.
5.2.3 A.
c. The time has a stronger effect because the coefficient for standardized time is larger than that for standardized temperature. d. ˆ
5.2.5 D.
5.1.16 a. The standardized value for 325°F is −0.9896 and for 350°F it is 0.9896. b. Solving 5 = 4.6667 + 0.9685(−0.9896) + 2.0886(std time) + 0.8077(−0.9896 × std time) for std time, we get 1.0019. Converting this standardized time to a time, we solve 1.0019 = (time − 11)/2.475 to get a time of 13.48 minutes. c. Solving 5 = 4.6667 + 0.9685(0.9896) + 2.0886(std time) + 0.8077(0.9896 × std time) for std time, we get −0.2165. Converting this standardized time to a time, we solve −0.2165 = (time − 11)/2.475 to get a time of 10.46 minutes.
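The algebra in 5.1.16 can be sketched in a few lines of Python. Because the interaction model is linear in std time once std temp is fixed, solving for std time is a single division, followed by converting back to minutes. The coefficients, the time mean of 11 and the time SD of 2.475 are taken from the solution; the function and variable names are hypothetical.

```python
# Fitted model from 5.1.16(b), in standardized units:
# rating = B0 + B_TEMP*z_temp + B_TIME*z_time + B_INT*(z_temp*z_time)
B0, B_TEMP, B_TIME, B_INT = 4.6667, 0.9685, 2.0886, 0.8077
TIME_MEAN, TIME_SD = 11, 2.475  # used to convert std time back to minutes


def time_for_rating(target, z_temp):
    """Solve the model for std time, then convert to minutes."""
    z_time = (target - B0 - B_TEMP * z_temp) / (B_TIME + B_INT * z_temp)
    return z_time, TIME_MEAN + TIME_SD * z_time


z325, t325 = time_for_rating(5, -0.9896)  # 325°F
z350, t350 = time_for_rating(5, 0.9896)   # 350°F
print(round(z325, 4), round(t325, 2))
print(round(z350, 4), round(t350, 2))
```

The same function would work for any target rating, as long as the denominator B_TIME + B_INT * z_temp is not zero.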
5.2.4 A, B, C. 5.2.6 B. 5.2.7 a. Holding income constant, for every 1°F increase in temperature, predicted ice cream consumption increases by 0.0567 ounces per capita. b. No. Based on the coefficients of the unstandardized variables you cannot tell which variable has a stronger effect. 5.2.8 a. predicted consumption = −1.8111 + 0.5946(855/100) + 0.0567(70) = 7.242 ounces per capita per week. b. 3.5 − 7.242 = −3.742. 5.2.9 a.
ii. p-value < 0.0001. b.
iii. p-value < 0.0001. iv. The answers are the same as those in part (a). This makes sense because changing the units of the explanatory variable doesn’t change the association between the explanatory variable and the response variable.
R2 values from the one-variable models.
5.1.18 ˆ a.
c.
b. As time increases by 1 standard deviation, the predicted rating will increase by 0.5864 when temperature is held constant.
iii. The missing variation is the variation in price explained by both sqft and sqm changing together and can’t be explained by one without the other.
e. The p-value on the interaction term is 0.0129, so adding this term improves the model significantly.
iv. When both sqft and sqm are put into the same model, the SSModel is only slightly larger than it was in either of the one-variable models. The degrees of freedom is larger in the two-variable model, but because the SSModel is only slightly larger than it was in either of the one-variable models, the MSModel actually goes down. In addition, the degrees of freedom for the Error went down, but the small reduction in the SSError actually results in an increase to the MSError. The decrease in the MSModel and the increase in the MSError make the F-statistic smaller and p-value larger (but still very significant).
5.1.19 a. It uses a factorial design because all the possible combinations of angle and stretch are used. It is balanced because there are the same number of observations for each of the nine treatments. ˆ b.
c. predicted distance = 219.91 + 6.12(angle); R2 = 0.156. d. which is the sum of the R2 values from the one-variable models.
b. As the angle increases by 1 standard deviation the predicted distance increases by 37.87 cm when centimeters stretched is held constant. c. Stretch has a stronger effect because its coefficient is larger. d. predicted distance = 326.93 + 80.87(std stretch) + 37.87(std angle) − 7.02(std stretch × std angle); R2 = 0.874. Adding the interaction does not significantly improve the model because the p-value on the interaction is 0.1992.
Section 5.2 5.2.1 B. 5.2.2 C.
i. approx 0.61 (the applet shows slightly different values depending on which variable you enter first because the two explanatory variables are perfectly linearly related which causes numerical problems in the estimation method). ii. The sum of squares for each is 294 and these are much lower than those from the one-variable models (and neither is significant), after adjusting for the other variable in the model.
c. Time has a stronger effect because its coefficient is larger. ˆ d.
5.1.20 ˆ a.
i. R2 = 0.608. ii. Sum of Squares for sqm= 73,611.
ˆ c. ˆ d.
e. Solving we get 10.24 minutes.
i. R2 = 0.608. ii. Sum of Squares for sqft = 73,611.
5.1.17 a. There is not an association between temperature and time because this was a balanced factorial design experiment. ˆ b.
5.2.10 a.
i. R2 = 0.293. ii. Sum of squares for age = 62.59. iii. p-value = 0.0024; the association is positive.
b.
i. R2 = 0.063. ii. Sum of squares for resp = 13.54. iii. p-value = 0.1879; the association is negative.
c.
This is larger than the R2 value for age in the one-variable model. This shows that resp, with its negative association with length of stay, was masking some of the association between age and length of stay.
ii. The R2 for age is 0.384, for resp is 0.155, and for the model is 0.448. Both age and resp explain more variation after adjusting for the other variable than each explained alone in their respective one-variable models. iii. This shows an added benefit of having both variables in the model. In particular, the negative association between resp and length of stay was hiding some of the association between age and length of stay. 5.2.11 a. predicted length = 3.37 − 2.03(resp) + 0.0844(age); R2 = 0.448.
b.
I would not keep the interaction in the model because it does not significantly improve the model (R2 = 0.452, p-value = 0.6821), and it increases the p-value for the overall model. 5.2.12 a.
i. There is no evidence that body type is associated with length of stay (p-value = 0.9874), R2 = 0.00. ii. Body type does not improve any of the other associations with length of stay. No p-value is lowered and no R2 value increases significantly.
b.
i. There is not strong evidence that physical status is associated with length of stay (p-value = 0.2293, R2 = 0.053). ii. Physical status doesn’t improve the association with age and length of stay (p-value does not decrease). However, it does improve the associations between resp and length of stay, and between body and length of stay. The p-values decrease and the R2 values increase with the addition of physical, but nothing is significant.
5.2.13 a. Length: R2 = 0.161, p-value = 0.1967, SE of residuals = 2.396; Width: R2 = 0.128, p-value = 0.2538, SE of residuals = 2.442. b. R2 = 0.326 which is less than the sum of the other two R2 values; model p-value = 0.1693, SE of residuals = 2.263. c. The R2 value is reasonable, but the sample size of 12 is quite small, thus making it difficult to get a small p-value. d. The interaction does not help the model significantly. The p-value on the interaction term is 0.6661, the residual SE = 2.37, and R2 = 0.343. e. With just the interaction term in the model we get residual SE = 2.17, and an R2 of 0.314 which is almost as large as that from part (b), but because the model is much simpler (fewer degrees of freedom) the p-value is much smaller at 0.0579. This makes sense because the volume of an egg (a three-dimensional measurement) should come from some product of 3 one-dimensional measurements. We only have two measurements here, but their product seems to give a pretty accurate prediction of volume.
5.2.14 a. There should be a positive association between weight and BMI because the more body fat someone has, the heavier they should be. There should not be an association between height and BMI because taller people probably don’t tend to have a higher or lower percentage of body fat than shorter people. b. For a fixed weight, as height increases, BMI should decrease because it is the ratio of weight to height; that same amount of weight for a taller person will correspond to less body fat, bones make up a higher percentage of the weight etc. c. The sign on weight should be positive because if height is held constant, as weight goes up so does BMI. The sign on height should be negative because if weight is held constant, as height goes up BMI should go down. d. predicted BMI = 50.70 + 0.1412(weight) − 0.7188(height). e. Adding the interaction does improve the model significantly as the p-value on the interaction term is less than 0.0001.
5.2.15 a. 0.292. b. Within about 2 × 0.292 = 0.584 kg/m2.
5.2.16 a. There should be a positive association between abdomen and BMI because the larger their belly, the more body fat they will tend to have. There should not be an association between height and BMI because taller people probably don’t tend to have a higher or lower percentage of body fat than shorter people. b. For a fixed abdomen circumference, as height increases, BMI should decrease because there should be a reduction in body fat percentage if we are assuming the individual has the same “thickness” but spread over a larger body size. c. The sign on abdomen should be positive because if height is held constant, as abdomen increases so does BMI. The sign on Height should be negative because if abdomen circumference is held constant, as height goes up BMI should go down. d. predicted BMI = 9.49 + 0.3216(abdomen) − 0.1966(height). e. Adding the interaction does not improve the model significantly as the p-value on the interaction term is 0.9143.
5.2.17 a. b. Yes, the validity conditions seem to be met. There is no evidence of a pattern or curvature in the Residuals vs. Predicted plot, the residuals form a fairly symmetric bell-shaped distribution, and the variability of the residuals is fairly consistent in the residuals vs. predicted values plot. It is not mentioned, but we will assume the observations are independent.
[Plot for 5.2.17b: residuals vs. predicted values]
c. Sugar has a stronger association because its t-statistic is larger than that of fat (5.13 vs. 4.13). This can also be seen in the ANOVA table because the sum of squares for sugar is larger as well. d. When sugar is held constant, as fat increases by 1 gram per serving, calories are predicted to increase by 7.19 calories per serving. e. For a cereal with 0 fat and 0 sugar, the predicted calories per serving is 85.49. f. When sugar is held constant, as fat increases, so do calories. When fat is held constant, as sugar increases, so do calories. g. SSsugar (5689.39) + SSfat (3681.63) = 9371.02 < SSModel (12,862.32). This means there is some variation in calories per serving explained by sugar and fat together that cannot be separated from one variable or the other. h. Adding the interaction term does not improve the model significantly. The p-value for the interaction term is 0.4400.
5.2.18 a. There is a positive association between calories and rating, but it is not significant (two-sided p-value = 0.2369). b. R2 = 0.051. c. There is a slight positive association between fat and rating, but it is not significant (two-sided p-value = 0.6140). d. i. R2 = 0.172. This is much larger than before adjusting for fat (0.051). ii. For a fixed value for fat, ratings are predicted to go up for an increase in calories because the coefficient on calories is positive (0.7522). iii. For a fixed value for calories, ratings are predicted to go down for an increase in fat because the coefficient on fat is negative (–4.5104). e. The calories in our dataset go from 280 to 412 per serving. On the low end, if calories = 280 then the slope on fat is 4.5237 + (–0.0242)(280) = –2.2523. On the high end, if calories = 412 then the slope on fat is 4.5237 + (–0.0242)(412) = –5.4467. So, the slope on fat varies between –2.2523 and –5.4467 in our data range, so it is always negative, just like it was in part (d)(iii).
5.2.19 a. As distance increases by 1 kilometer, time is predicted to increase by 5.81 minutes. b. Holding climb constant, for each 1-kilometer increase in distance, time is predicted to increase by 4.0420 minutes. c. In a race where the climb is 0 meters, for each 1-kilometer increase in distance, time is predicted to increase by 3.46 minutes. d. Distance has the strongest evidence of an effect on time because it has the largest t-statistic of 19.04.
5.2.20 a. b. c. Based on the residuals vs. predicted plot the variability of the residuals increases as the predicted values increase. The validity condition states that this variability should be constant.
[Plot for 5.2.20: residuals vs. predicted values]
Section 5.3 5.3.1 A. 5.3.2 D. 5.3.3 B. 5.3.4 A, B.
5.3.5 a. predicted %rural = 56.74 − 0.3727(year). While the linear model fits quite well, R2 = 0.961, there is a pattern to the residuals (see graph in Solution 5.3.5a) that indicates a quadratic model might fit better (positive, then negative, then positive residuals). b. year = 56.74/0.3727 ≈ 152. The %rural is predicted to be 0 in 2052. c. Adding the squared term does improve the model significantly. The p-value on the squared term is 0.0018 and R2 has increased from 0.961 to 0.988. It does appear the quadratic model fits better because there is not such a strong pattern in the residual plot (Solution 5.3.5c). d. The answer is positive. This makes sense because the rate at which the percent rural is decreasing is slowing down. You can see that in the data toward the (upper) end.
[Solution 5.3.5a: residuals vs. predicted values]
[Solution 5.3.5c: residuals vs. predicted values]
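The calculation in 5.3.5(b) sets the linear prediction to zero and solves for year. A small Python sketch of that step follows. It assumes that year is measured in years since 1900, which is what the step from 152 to 2052 in the solution implies; the constant and function names are hypothetical.

```python
# Linear model from 5.3.5(a): %rural = 56.74 - 0.3727 * year.
INTERCEPT, SLOPE = 56.74, -0.3727
BASE_YEAR = 1900  # assumption: year is counted from 1900 (152 -> 2052)


def predicted_pct_rural(years_since_base):
    return INTERCEPT + SLOPE * years_since_base


# Set the prediction to zero and solve for year.
years_to_zero = -INTERCEPT / SLOPE
calendar_year = BASE_YEAR + round(years_to_zero)
print(round(years_to_zero, 1), calendar_year)
```

Since part (d) notes that the decline is slowing, this zero crossing is best read as a property of the straight-line model rather than a forecast.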
5.3.6 a. b. c. The one using year from part (a) was not at all accurate. d. Using the year from part (a) the prediction changed greatly. It is now 28.18 which is much closer to the actual number. Using year since 1900 in part (b) the prediction just changed a little bit to 21.17. e. Using std year we won’t have the rounding problem because squaring a std year will not result in a large number.
5.3.7 a. See graph for Solution 5.3.7a, R2 = 1.00. b. See graph for Solution 5.3.7b, R2 = 0.928. c. See graph for Solution 5.3.7c, R2 = 0.0. d. i. SSyear = 59.62, SSyear2 = 54.36, SSModel = 2,040.63. ii. SSyearsince1900 = 362.17, SSYS19002 = 54.36, SSModel = 2,040.63. iii. SSstdyear = 1983.26, SSstdyear2 = 54.36, SSModel = 2,040.63. e. The sum of squares for the model and the squared term don’t change, but the sum of squares for the linear term increase as you go from year, to year since 1900 to std year. This corresponds to decreasing the linear association between the linear and squared term.
[Solution 5.3.7a: year squared vs. year]
[Solution 5.3.7b: (years since 1900) squared vs. year since 1900]
[Solution 5.3.7c: std year squared vs. std year]
5.3.8 a. while it fits fairly well (R2 = 0.934 and SE of the residuals = 0.636), there is a clear curve to the data. This is obvious in the residual versus predicted values plot as follows:
[Plot for 5.3.8a: residuals vs. predicted values]
b. this is a much better fit. R2 has increased to 0.997 and SE of the residuals has decreased to 0.154. However, the residual versus predicted plot still shows some substantial problems (up/down/up pattern). c. even though the data do not show more than one turn, this is the best fit. R2 has increased to almost 1.0 and SE of the residuals has decreased to 0.032.
5.3.9 a. although it fits fairly well (R2 = 0.953 and SE of the residuals = 0.524), there is a clear curve to the data. This is most obvious in the residuals versus predicted values plot as follows.
[Plot for 5.3.9a: residuals vs. predicted values]
b. this is a much better fit. R2 has increased to 0.998 and SE of the residuals has decreased to 0.116. c. even though the data do not show more than one turn, this is the best fit. R2 has increased to 1.00 and SE of the residuals has decreased to 0.028. The residual versus predicted plot no longer shows a clear pattern.
5.3.10 a. SSage = 7.566, SSage2 = 2.310, SSModel = 36.420. b. SSstdage = 34.110, SSstdage2 = 2.310, SSModel = 36.420. c. Without standardizing, the SSage + SSage2 was much less than SSModel, showing that much of the variability in weight couldn’t be directly attributed to either age or age2, but to the combination of the two. Hence, there had to be a linear association between the two. After standardizing SSstdage + SSstdage2 = SSModel so that linear association completely disappears. d. SSstdage = 3.5572, SSstdage2 = 2.3101, SSstdage3 = 0.1146, SSModel = 36.5347. e. There is a linear association between std age and std age3, R2 = 0.858.
[Plot for 5.3.10e: std age3 vs. std age]
5.3.11 a. SSage = 6.190, SSage2 = 1.581, SSModel = 35.332. b. SSstdage = 33.751, SSstdage2 = 1.581, SSModel = 35.332. c. Without standardizing, the SSage + SSage2 was much less than SSModel showing that much of the variability in height couldn’t be directly attributed to either age or age2, but to the combination of the two. Hence, there had to be a linear association between the two. After standardizing SSstdage + SSstdage2 = SSModel so that linear association completely disappears. d. SSstdage = 3.8191, SSstdage2 = 1.5815, SSstdage3 = 0.0637, SSModel = 35.3957. e. There is a linear association between std age and std age3, R2 = 0.858.
[Plot for 5.3.11e: std age3 vs. std age]
5.3.12 a. b. c. The R2 and SE of residuals did not change. The sum of squares did change: SSyear = 3.66 while SSstdyear = 27.27. Because the linear association between year and year2 was removed by standardizing, we can see that much more variability in area could be attributed directly to year (rather than year2) when we use std year.
[Plot for 5.3.12: residuals vs. predicted values]
5.3.13 a. The data are clearly in the shape of a parabola, so a quadratic model looks like an excellent fit.
[Plot for 5.3.13a: Height (cm) vs. Time (s)]
b.
c. i. Yes, the intercept of 0.0489 is very close to 0. ii. Half of –981 is –490.5, which is close to the –499.8950 on the squared term. iii. which is close to the linear term of 386.4653.
5.3.14 a. See graph. Debt has mostly been increasing at a faster and faster rate, but not always. In the 1990s, the rate of increase slowed down until it looks like there was no increase at all.
[Plot for 5.3.14a: Debt vs. Year]
b. A cubic model might be better than a quadratic model because it looks like there is more than one turn in the graph. c. R2 = 0.978 and SE of residuals = 1,088.357. d. R2 = 0.992 and SE of residuals = 640.534. e. The cubic model is significantly better. In addition to increasing R2 and decreasing SE of residuals, the cubic term is significant.
5.3.15 a. See the graph that follows. Debt per person has been mostly increasing at a faster and faster rate, but not always. In the 1990s, the rate of increase slowed down until it looks like there was no increase at all.
[Plot for 5.3.15a: Debt/person vs. Year]
b. A cubic model might be better than a quadratic model because it looks like there is more than one turn in the graph. c. R2 = 0.981 and SE of residuals = 3.013. d. R2 = 0.990 and SE of residuals = 2.216. e. The cubic model is significantly better. In addition to the increasing R2 and decreasing SE of residuals, the cubic term is significant (p-value < 0.0001).
5.3.16 a. GDP is almost always increasing at a faster and faster rate. There is a very short-term dip around 2008. See the Solution 5.3.16a graph.
[Solution 5.3.16a: GDP (billion $) vs. Year]
b. The dip is so short a cubic might not make this a better model than a quadratic. c. R2 = 0.998 and SE of the residuals = 274.900. d. R2 = 0.998 and SE of the residuals = 272.685. e. The quadratic model is a better model. Adding the cubic term did not change R2, made little change in the SE of the residuals, and the cubic term is not significant (p-value = 0.1666).
5.3.17 a. predicted population = 175.59 + 2.62(year). b. The slope means that for every additional year, the population is predicted to increase by 2.62 million. The intercept means that in 1900 the predicted population was 175.59 million. c. While the linear model is a great fit, there are a couple of small curves that would indicate a cubic model may work better. Compared to a linear model, the cubic model increases R2 from 0.997 to 0.999 and reduces the SE of the residuals from 2.604 to 1.687.
5.3.18 a. See the graph that follows. It looks like a quadratic function will be the best fit.
[Plot for 5.3.18a: Seeds vs. Water]
b. and the SE of residuals = 10.792. c. R2 = 0.892 and the SE of residuals = 7.454. This is a substantial improvement. d. The predicted number is −12.90 + 40.1143(5) − 4.5357(5^2) = 74.3. The residuals are 79 − 74.3 = 4.7, 68 − 74.3 = −6.3, 74 − 74.3 = −0.3, and 70 − 74.3 = −4.3. e. −12.90 + 40.1143(6) − 4.5357(6^2) = 64.5. f. The residual would be −64.5 in all four cases. This is fairly large because we are extrapolating the model out to 6. Sometimes extrapolating, like in this case, will not give accurate estimates.
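The predictions and residuals in 5.3.18(d) and (e) follow from evaluating the quadratic model at water = 5 and water = 6. The sketch below reproduces that arithmetic in Python. The coefficients and the four observed counts come from the solution, while the function and variable names are illustrative assumptions.

```python
# Quadratic model from 5.3.18(d): seeds = B0 + B1*water + B2*water^2
B0, B1, B2 = -12.90, 40.1143, -4.5357


def predict_seeds(water):
    return B0 + B1 * water + B2 * water ** 2


observed_at_5 = [79, 68, 74, 70]   # observed counts quoted in part (d)
pred_5 = predict_seeds(5)

# Residual = observed - predicted, using the prediction rounded to 1 decimal
# as the solution does.
residuals_5 = [round(y - round(pred_5, 1), 1) for y in observed_at_5]

pred_6 = predict_seeds(6)          # extrapolation beyond the data, part (e)
print(round(pred_5, 1), residuals_5, round(pred_6, 1))
```

Evaluating `predict_seeds` outside the observed range of water, as in part (e), is exactly the extrapolation that part (f) warns about.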
5.3.19 a. See the following graph. A cubic model might be the best fit. The data look like they have one obvious turn down and then a more subtle turn up toward the right side.
[Plot for 5.3.19a: Attractiveness vs. VHI l/m2 (liters per meters squared)]
b. R2 = 0.195 and the SE of the residuals = 14.902. c. R2 = 0.409 and the SE of the residuals = 12.934; both are significant improvements over the linear model (p-value for quadratic term = 0.0008). d. R2 = 0.535 and the SE of the residuals = 11.640; both are significant improvements over the quadratic model (p-value for cubic term = 0.0036). e. The cubic model is significantly better than the other two.
5.3.20 a. See the following graph. A quadratic model might be the best fit. There only appears to be one turn in the data on the left side.
[Plot for 5.3.20a: Attractiveness vs. VHI (l/m2)]
b. R2 = 0.411 and the SE of the residuals = 9.050. c. R2 = 0.591 and the SE of the residuals = 7.653; both are significant improvements over the linear model (p-value for quadratic term is 0.0006). d. R2 = 0.604 and the SE of the residuals = 7.653; only the R2 improves slightly over the R2 from the quadratic model; the SE of the residuals remains the same for both the quadratic and cubic models (p-value for cubic term is 0.3253). e. The cubic model did not improve the SE of the residuals at all and made a fairly minor improvement in the R2 value. The quadratic model is the best.
Section 5.4 Note: All solutions in Section 5.4 use log base 10. 5.4.1 D. 5.4.2 B, C, E. 5.4.3 E. 5.4.4 B. 5.4.5 E. 5.4.6 E.
5.4.7 a. See the table that follows.
x          1       2       3       4       5
y = 2^x    2       4       8       16      32
log(2^x)   0.301   0.602   0.903   1.204   1.505
b. There is a constant slope because for each one unit increase in x, log(2^x) increases by 0.301.
5.4.8 a. See the table that follows.
x          1       2       3       4       5
y = x^2    1       4       9       16      25
log(x)     0       0.301   0.477   0.602   0.699
log(x^2)   0       0.602   0.954   1.204   1.398
b. This is a linear function because the slope between the first two points (0.602 – 0)/(0.301 – 0) = 2 is constant throughout. For example, the slope between the second two points (0.954 – 0.602)/(0.477 – 0.301) = 2 as well.
5.4.9 a. Yes, the graph looks quite linear.
[Plot for 5.4.9a: Log (Dice) vs. Toss]
b.
The final equation is close to the theoretical model dice remaining = 100(⅔)^toss.
c.
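The two tables in 5.4.7 and 5.4.8 illustrate why log transformations straighten curves: taking logs of an exponential function gives equal steps, and on log-log axes a power function becomes a line whose slope is the exponent. A short Python sketch, using only the standard library and the same x values 1 through 5, is given below; the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]

# 5.4.7: log10(2^x) rises by the same amount, log10(2), for each unit step in x.
log_exp = [math.log10(2 ** x) for x in xs]
steps = [b - a for a, b in zip(log_exp, log_exp[1:])]

# 5.4.8: on log-log axes, y = x^2 is a line whose slope is the exponent, 2.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(x ** 2) for x in xs]
slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
          for i in range(len(xs) - 1)]

print([round(v, 3) for v in log_exp])
print([round(s, 3) for s in steps])
print([round(s, 3) for s in slopes])
```

Every entry of `steps` equals log10(2), about 0.301, and every entry of `slopes` equals 2, matching parts (b) of both exercises.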
5.4.10 a. predicted log(coins) = 2.22 − 0.2755(tosses). b. predicted log(coins) = 2.22 − 0.2755(3) = 1.3935, 10^1.3935 = 24.75 ≈ 25. The predicted number of coins remaining after 3 tosses is a little bit higher than the actual number remaining, 23. c. Theoretically, there should be 200(0.5)(0.5)(0.5) = 25 coins remaining after 3 tosses.
5.4.11 a. Using gives the strongest association with an R2 of 1.000. b. c. d.
5.4.12 a. b. The temperature of the probe is predicted to be an additional 14.59° higher than room temperature, or 34.59°C. c.
5.4.13 a. has the strongest linear association with R2 = 1.000. b. c.
5.4.14 a. The log(DJIA) versus Year gives the strongest linear association. b. predicted log(DJIA) = −39.91 + 0.0218(year). It appears to be a reasonable fit with R2 = 0.954 and the p-value for the model < 0.0001. c. predicted log(DJIA) = −39.91 + 0.0218(2006) = 3.8208. This is a little smaller than the actual value of 4.0956. d. 10^3.8208 = 6,619.12. The predicted value is about half of what it should be, 12,463.2, so it is a very poor fit. e. The answer in part (c) is an exponent. Two exponents can be relatively close together, but when 10 is raised to each of these values, the results can be very different, like those in part (d).
5.4.15 a. log(life) vs. log(doctor) has the strongest linear association. b. The fit is pretty good with R2 = 0.649 and a model p-value < 0.0001. c. The actual value is 1.87795 so the prediction is a little bit low. d. 10^1.8607 = 72.6
5.4.16 a. log(life) vs. log(TV) has the strongest linear association. b. The fit is pretty good with R2 = 0.742, and a model p-value < 0.0001. c. The actual value is 1.87795 so the prediction is a little bit high. d. 10^1.8925 = 78.1 which is a little higher than the actual value of 75.5.
5.4.17 a. There is a slight curve to the relationship between MPG and weight. This curvature is easiest to see in the residual versus predicted plot shown here.
[Plot for 5.4.17a: residuals vs. predicted values]
b. Both log transformations should be used to get the regression equation with the highest R2. predicted log(MPG) = 4.45 − 0.8544 log(weight). c. d. Residual = 37 – 34 = 3 mpg.
5.4.18 a. There is a quite pronounced curve in the relationship between MPG and HP. This curvature is very obvious in the residual vs. predicted plot shown here.
[Plot for 5.4.18a: residuals vs. predicted values]
b. A log transformation on HP only, should be used to get the regression equation with the highest R2 (0.503); however, there is still a curve evident in the plot of MPG vs. log(HP), as well as in the residual vs. predicted plot, indicating some other transformation might be better. c. predicted MPG = 93.72 − 28.92 × log(124) = 33.2 mpg. d. Residual = 37 − 33.2 = 3.8 mpg.
5.4.19 a. These data are increasing and concave down, indicating raising Weight to a positive exponent less than 1 would make a good fit. b. Using an exponent of ½ gives the largest R2. c. The coefficients are smaller on the equations with exponents of ⅔ and ¾. In particular, the y-intercept is smallest on the equation with the exponent of ⅔, indicating that a simple power function (that has a y-intercept of 0) might be a nice simple model.
5.4.20 a. predicted log(wing) = 0.9018 + 0.8043 × log(weight). b. predicted wing = 10^0.9018 (weight^0.8043) ≈ 7.9763(weight^0.8043).
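Several answers in this section back-transform a base-10 log prediction, for example 5.4.10(b) and 5.4.20(b). The Python sketch below shows that step for both: raising 10 to the predicted log value, and rewriting a log-log model as a power function. The coefficients are taken from those solutions; the helper names and the weight of 50 are hypothetical and are used only to show that the two forms of the wing model agree.

```python
import math


def back_transform(log10_value):
    """Convert a base-10 log prediction back to the original scale."""
    return 10 ** log10_value


# 5.4.10(b): log10(coins) = 2.22 - 0.2755 * tosses, evaluated at 3 tosses.
coins_after_3 = back_transform(2.22 - 0.2755 * 3)

# 5.4.20: log10(wing) = 0.9018 + 0.8043 * log10(weight) is the power function
# wing = 10**0.9018 * weight**0.8043 on the original scale.
multiplier = back_transform(0.9018)


def wing_from_log_model(weight):
    return back_transform(0.9018 + 0.8043 * math.log10(weight))


def wing_from_power_model(weight):
    return multiplier * weight ** 0.8043


w = 50.0  # hypothetical weight, used only to compare the two forms
print(round(coins_after_3, 2), round(multiplier, 4))
print(wing_from_log_model(w), wing_from_power_model(w))
```

Because the back-transformation is exponential, small differences on the log scale can become large differences on the original scale, which is the point made in 5.4.14(e).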
End of Chapter 5 Exercises
5.CE.1
a. predicted speed = 34.09 + 0.1912(height); predicted speed = 34.09 + 0.1912(150) = 62.77 mph. A 95% confidence interval for the average maximal speed of all coasters with greatest height equal to 150 feet is (61.67, 63.90). The width is 2.23 mph.
b. predicted speed = 33.59 + 0.0758(height) + 0.1276(drop). Drop is a statistically significant addition to the model with p-value < 0.0001. predicted speed = 33.59 + 0.0758(150) + 0.1276(200) = 70.47 mph. The 95% confidence interval for the average maximal speed of all coasters with greatest height equal to 150 feet and maximum vertical drop of 200 feet is (67.53, 73.41). The width of this interval is 5.88 mph, which is wider than the width of the interval using only height to predict speed. This second interval is wider despite the addition of a highly significant variable to the model because the x-values (150 and 200) are a highly unusual combination of values, which leads to hidden extrapolation.
c. R2 = 0.881. This suggests that drop and height are highly correlated. They are explaining most of the same variation in the speed, so they aren’t both needed in the model.
d. Height and height–drop are not strongly associated as their R2 is 0.075.
e. Holding the height constant, for each 1-foot increase in height–drop (so less of the overall height is in the largest drop), the average speed is expected to decrease by 0.1276 mph.
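As a quick arithmetic check on 5.CE.1, the two fitted equations and the interval widths can be evaluated directly. This is only a sketch: the function names are illustrative, and the interval endpoints are copied from the solution rather than recomputed from data.

```python
def predict_speed_from_height(height):
    """One-variable model from 5.CE.1a (speed in mph, height in feet)."""
    return 34.09 + 0.1912 * height


def predict_speed_from_height_and_drop(height, drop):
    """Two-variable model from 5.CE.1b."""
    return 33.59 + 0.0758 * height + 0.1276 * drop


def interval_width(interval):
    """Width of an interval given as (low, high); same units as the response."""
    low, high = interval
    return high - low


speed_a = predict_speed_from_height(150)
speed_b = predict_speed_from_height_and_drop(150, 200)
width_a = interval_width((61.67, 63.90))   # interval quoted in part a
width_b = interval_width((67.53, 73.41))   # interval quoted in part b
```

Because the intervals are for average speed, their widths are in mph, and the second width is more than twice the first.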
5.CE.2
a. Temperature explains most of the variation in ice cream consumption because it has the strongest linear association with ice cream consumption.
b. With a p-value less than 0.0001, we have very strong evidence against the null hypothesis of no association between ice cream consumption and temperature. This model tells us that each 1° increase in temperature (in °F) predicts an increase of 0.0031 pints per capita in ice cream consumption.
c. and a 95% confidence interval for average consumption is (0.3741, 0.4124).
d. When the average temperature is 60°F, we are 95% confident that the consumption of ice cream in pints per capita is between 0.30 and 0.48 for a single 4-week period.
e. SE of the residuals = 0.065, R2 = 0.067. With a p-value of 0.166 we have weak evidence against the null hypothesis of no association between ice cream consumption and price. The model says that for each one dollar per pint increase in ice cream price, consumption of ice cream is predicted to decrease by 2.05 pints per capita.
f. The standard error of residuals is larger for the model with price as a predictor and smaller for the model with temperature as a predictor. The R2 for the model with temperature as a predictor is much higher than the R2 for the model with price as a predictor. This makes sense because temperature was a statistically significant predictor of consumption, and price was not.
5.CE.3
a. We have very strong evidence of both temperature and price providing a good model to predict ice cream consumption.
b. To standardize a variable, the mean of the variable is subtracted from each value of the variable, and the differences are each divided by the SD of the variable. This is important because standardized variables aren’t associated with each other and thus explain different parts of the variability in the response.
c.
d. These values are the same for both models. This makes sense because standardized variables are just linear combinations of the original variables, so they aren’t explaining any more or less of the variability in the response.
e. No, the strength of evidence of an association between the standardized explanatory variables and the response variables did not change. The p-values remained the same for both explanatory variables. There must not have been a strong association between the price and temperature variables.
5.CE.4
a. There is an interaction between price and temperature provided the effect of price on consumption of ice cream is different for different values of temperature.
b. predicted consumption = 0.3586 + 0.0496(std temp) − 0.0119(std price) − 0.0079(std price × std temp), SE of the residuals = 0.041, R2 = 0.644.
c. There is no evidence of a statistically significant interaction between temperature and price, as the p-value of the interaction coefficient is 0.3769.
d.
e. There is no evidence of a statistically significant interaction between temperature and income as the p-value of the interaction coefficient is 0.4128.
f.
g. There is evidence of a statistically significant interaction between temperature, price, and income as the p-value for this interaction coefficient is 0.0002.
h. Of these three models, model 3 with the three-way interaction best predicts ice cream consumption. This model has the lowest SE of the residuals and the highest R2.
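The standardization recipe in 5.CE.3b, and the claim in 5.CE.3d that R2 does not change, can be illustrated with a one-predictor sketch. The data values below are hypothetical (they are not the ice cream data set), and the helper names are illustrative.

```python
from statistics import mean, stdev


def standardize(values):
    """Subtract the mean from each value, then divide by the SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def r_squared(x, y):
    """R2 for a simple linear regression of y on x (squared correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical temperatures (°F) and consumption (pints per capita).
temp = [41, 56, 63, 68, 72, 33, 52]
consumption = [0.386, 0.374, 0.393, 0.425, 0.406, 0.288, 0.326]

std_temp = standardize(temp)
r2_raw = r_squared(temp, consumption)
r2_std = r_squared(std_temp, consumption)
```

The standardized predictor has mean 0 and SD 1, and the two R2 values agree up to floating-point error, since standardizing is only a linear rescaling of the predictor.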
[Solution 5.CE.5c: histogram of residuals (Mean = 0.00, SE = 49.281, DF = 98) and plot of residuals vs. predicted values.]
5.CE.5
a. Budget seems to explain most of the variation in U.S. gross because it has the strongest linear relationship with U.S. gross. (Note that none of these three variables has a very strong linear relationship with U.S. gross.)
b.
c. The histogram of residuals is slightly skewed to the right and the plot of residuals vs. predicted values looks to be quadratic (see Solution 5.CE.5c). It would probably be better to use a model with either a squared term or possibly a log transformation of the U.S. gross.
d.
5.CE.6
a. The model is a statistically significant predictor of U.S. gross as the p-value is less than 0.0001. Note that run time is not a significant predictor in this model (p-value = 0.5002), and that residuals show that the validity conditions may not be met for this model. (The residuals are skewed right and do not appear to have equal variance as seen in the plots for Solution 5.CE.6a.)
b. To standardize a variable, one subtracts the mean and then divides by the standard deviation of the variable for each value of the variable. This makes units easier to interpret with other variables in the model, and standardized variables aren’t associated with each other and so explain different parts of the variability in the response.
d. The values of R2 and SE residuals are the same in both models. This makes sense because standardized variables are just linear combinations of the original variables, so they aren’t explaining any more or less of the variability in the response.
e. The strength of evidence of an association between the standardized explanatory variables and response variable did not change; the p-values are still the same for each predictor variable. Therefore, there must not have been much of an association between budget and run time.
f. Yes, the magnitude of the coefficient for budget (44.69) is much larger than the magnitude of the coefficient for run time (3.44). This is consistent with the p-values of < 0.0001 for budget and 0.5002 for run time.
5.CE.7
a. There would be an interaction between budget and run time if the effect of budget on U.S. gross is different for different values of run time.
b. predicted U.S. gross = 64.75 + 45.31(std budget) + 5.15(std run time) − 3.35(std budget × std run time), SE of residuals = 49.511, R2 = 0.468.
c. There is no evidence of a statistically significant interaction between std budget and std run time because the p-value of the interaction coefficient is 0.4281.
d.
[Solution 5.CE.6a: histogram of residuals (Mean = 0.00, SE = 49.417, DF = 97) and plot of residuals vs. predicted values.]
e. There is no evidence of a statistically significant interaction between std budget and std rating because the p-value is 0.1309.
f.
g. There is no evidence of a statistically significant interaction between std budget, std run time, and std rating because the p-value for this interaction coefficient is 0.1290.
h. None of these models has significant interactions, so probably the best model is the model with just Budget as the predictor for U.S. gross, although the validity conditions were suspect on this model and suggested a quadratic model or a transformation of one of the variables.
5.CE.8
a. SE of residuals = 0.306, R2 = 0.382. The model indicates that for each one million-dollar increase in the budget of a movie, the log(U.S. gross) is predicted to increase by 0.0048. The residual plots in Solution 5.CE.8a indicate the validity conditions for the linear regression are met.
b. Exponentiate (base e) both sides of the equation to get 4.988 million for the predicted U.S. gross.
[Solution 5.CE.8a: histogram of residuals (Mean = 0.00, SE = 0.306, DF = 98) and plot of residuals vs. predicted values.]
5.CE.9
a. Hand span seems to explain more of the variation in height than does length of index finger because it appears to have a stronger linear association with height.
b. The model tells us that for each one cm increase in hand span, height is predicted to increase 0.8970 inches.
c. The histogram of residuals appears to be normal and the plot of the residuals versus predicted shows no pattern, so validity conditions are met for the linear regression. (See the plots for Solution 5.CE.9c.)
d. Because the span of one’s hand and length of one’s index finger are associated, these variables were standardized. Validity conditions are met for a linear regression as confirmed by residual plots. (See graphs of Solution 5.CE.9d.) The model is a significant predictor of height as the p-value is less than 0.0001.
e. There is no evidence of a statistically significant interaction between std hand span and std length of index finger because the p-value of the interaction coefficient is 0.5972.
[Solution 5.CE.9c: histogram of residuals (Mean = 0.00, SE = 3.005, df = 32) and plot of residuals vs. predicted values.]
[Solution 5.CE.9d: histogram of residuals (Mean = −0.00, SE = 2.745, df = 30).]
5.CE.10
a. predicted height = 69.9 + 1.46(std hand span) + 0.9638(std length of index finger) − 3.2035(sex), where sex = 1 if female and 0 if male. SE of the residuals = 2.380, R2 = 0.655. The validity conditions are met for a linear regression as confirmed by residual plots (see Solution 5.CE.10a output). The model is a significant predictor of height because the model p-value is less than 0.001.
b. First standardize the values of 22 centimeters and 8 centimeters: The mean female hand span is 19.4 centimeters and the SD is 1.873 centimeters, so (22 − 19.4)/1.873 = 1.388. The mean female length of index finger is 7.237 and the SD is 0.608. Then (8 − 7.237)/0.608 = 1.255. The predicted height is 69.9 + 1.46(1.388) + 0.9638(1.255) − 3.2035(1) = 69.93 inches.
c. First standardize the values of 22 centimeters and 8 centimeters for a male: the mean male hand span is 22.21 centimeters and the SD is 2.805 centimeters, so (22 − 22.21)/2.805 = −0.075. The mean male length of index finger is 7.97 and the SD is 1.243. Then (8 − 7.97)/1.243 = 0.0241. The predicted height is 69.9 + 1.46(−0.075) + 0.9638(0.0241) − 3.2035(0) = 69.81 inches.
d. The SE of the mean is 0.661 for females and for males it is 0.468. We expect the SE of mean for the females to be higher than that for males because it is more unusual to have a female student with a hand span of 22 centimeters and 8 centimeters length of index finger than it is for a male, according to the data.
[Solution 5.CE.10a: histogram of residuals (Mean = 0.00, SE = 2.380, df = 30) and plot of residuals vs. predicted values.]
5.CE.11
a. For each 1 centimeter increase in hand span, the predicted height increases more for males than for females.
b. The validity conditions are met for a linear regression as confirmed by residual plots. The model is a significant predictor of height as the model p-value is less than 0.0001.
c. There is not significant evidence of an underlying interaction between std hand span and sex of student because the p-value is 0.3819.
5.CE.12
a. predicted MPG = 48.707 − 8.3646(weight), SE of residuals = 2.85, R2 = 0.816.
b. Based on the curved pattern in the residual vs. fitted plot, a linear model does not seem appropriate for these data.
c. The residual vs. fitted value plot suggests a quadratic model would be a better model.
d. predicted MPG = 75.37 − 27.13(weight) + 3.12(weight × weight), SE of residuals = 2.521, R2 = 0.860.
e. Yes, the model in (d) is appropriate for these data; the residuals vs. predicted plot in Solution 5.CE.12e doesn’t show any pattern, and the histogram of the residuals is normally distributed.
f. The polynomial model has a smaller SE of residuals and a larger R2 than the linear model.
[Solution 5.CE.12e: histogram of residuals (Mean = 0.00, SE = 2.521, DF = 35) and plot of residuals vs. predicted values.]
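The standardize-then-predict steps in 5.CE.10b and 5.CE.10c can be sketched as follows. The coefficients, means, and SDs are the ones quoted in those parts; the function names are illustrative, and the sketch assumes (as the solution does) that each measurement is standardized with the mean and SD of the student's own sex.

```python
def standardize(value, mean, sd):
    """z-score of a single value."""
    return (value - mean) / sd


def predicted_height(std_hand_span, std_index_finger, sex):
    """Fitted model from 5.CE.10; sex = 1 for female, 0 for male."""
    return 69.9 + 1.46 * std_hand_span + 0.9638 * std_index_finger - 3.2035 * sex


# A student with a 22 cm hand span and an 8 cm index finger.
female_height = predicted_height(
    standardize(22, 19.4, 1.873),    # female hand span mean and SD
    standardize(8, 7.237, 0.608),    # female index finger mean and SD
    sex=1,
)
male_height = predicted_height(
    standardize(22, 22.21, 2.805),   # male hand span mean and SD
    standardize(8, 7.97, 1.243),     # male index finger mean and SD
    sex=0,
)
```

The two predictions come out at about 69.93 and 69.81 inches, matching parts b and c: the same raw measurements are far above average for a female student but close to average for a male student.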
Chapter 5 Investigation
1. Each bottle of Bordeaux wine is an observational unit.
2. This is an observational study.
3. This study used neither random sampling nor random assignment. This will make it difficult to generalize findings and no causal conclusions can be made from the data.
4. The response variable is price index; it is quantitative. High values are desirable for the response as that indicates the price for a case of Bordeaux from that year compares similarly to the price of a case of Bordeaux in the index year of 1961.
5. Possible Sources of Variation diagram for this study:
Observed variation in: Price index
Inclusion criteria: • Vintages from 1952–1980
Sources of explained variation: • Age • Summer temperature • Harvest rainfall • Winter rainfall
Sources of unexplained variation: • Harvest method • Pesticide use • Herbicide use • Fermenting process • Place of storage
6. Price index of the 1961 vintage is 1. Price indices of all of the other vintages are less than 1. This makes sense because the prices of the other vintages are indexed to the price in 1961.
7. Overall mean of price index = 0.288, SD = 0.210, SSTotal = 1.15.
8. predicted price index = 0.288, with SE of residuals = 0.21.
9. There is a positive association between price index and the age of a Bordeaux wine. The positive association looks to be somewhat linear although an exponential association would be an apt description as well. This seems to be a moderately strong association with the only unusual observation being the price index of 1 for the index vintage of 1961 (age 22).
[Scatterplot: price index vs. age.]
10. Points that fall above the regression line are index prices that are higher than would be predicted by age and points that fall below the regression line are index prices that are lower than would be predicted by age. Positive residuals come from the more expensive Bordeaux and negative residuals from the less expensive Bordeaux.
11. predicted price index = 0.1012 + 0.0115(age); SE of residuals = 0.191 and R2 = 0.204. The standard error of residuals is smaller than the standard error of residuals for the single mean model.
12. See graphs in Solution 12. The histogram of the residuals is normally distributed with one high outlier. The residuals vs. predicted graph appears to fan out for larger predicted values, which is a violation of the equal variation assumption.
[Solution 12: histogram of residuals (Mean = 0.00, SE = 0.191, DF = 25) and plot of residuals vs. predicted values.]
13. There is a positive linear association between log(price index), using the natural log, and the age of a Bordeaux wine. This seems to be a moderately strong association with no unusual observations.
[Scatterplot: log(price index) vs. age.]
14. predicted log(price index) = −2.03 + 0.0354(age). The slope coefficient is 0.0354: For each 1-year increase in age of wine, log(price index) is predicted to increase 0.0354. The y-intercept is −2.03: For a wine of age of 0, the log(price index) is predicted to be −2.03.
15. See graphs for Solution 15. Yes, the histogram of residuals is normal and the plot of residuals vs. predicted values shows equal variation across the predicted values.
[Solution 15: histogram of residuals (Mean = 0.00, SE = 0.574, DF = 25) and plot of residuals vs. predicted values.]
16. H0: No association between age of a wine and its log(price index) vs. Ha: There is an association between the age of a wine and its log(price index). The p-value for the test is 0.0157, which leads us to conclude there is strong evidence against the null and for the alternative that there is an association between age of a wine and its log(price index).
17. The percentage of the variation in the log(price index) that is explained by this regression model with age is 21.2%, and 78.8% is not explained by this regression model with age as the explanatory variable.
18. SE of residuals = 0.457 and the predicted log(price index) = −11.0498 + 0.5607(summer temp) + 0.0222(age). The residual plots meet all validity conditions (see Solution 18).
[Solution 18: histogram of residuals (Mean = 0.00, SE = 0.457, DF = 24) and plot of residuals vs. predicted values.]
19. The sign of the coefficient of summer temperature is positive. This is what we would expect as hot summers are conducive to good growth of grapes.
20. The coefficient of age did change from the one-variable model. This implies there is an association between age and summer temperature. Because the sign of the coefficient of age didn’t change, there is no evidence of Simpson’s Paradox with these data.
21. There would be an interaction between age and summer temperature if the effect of age on log(price index) changes for different values of summer temperature.
22. predicted log(price index) = −3.3739 − 0.05094(age) + 0.0952(summer temp) + 0.0320(age × summer temp). The coefficient for the interaction term is 0.032. This means that as the age of the wine increases, the predicted log(price index) increases by 0.032 for each 1 degree increase in summer temp. The interaction is not statistically significant after adjusting for age and summer temp (p-value = 0.0915), so we are justified in removing the interaction from the model.
23. predicted log(price index) = −12.1453 + 1.1668(winter rain) − 3.8606(harvest rain) + 0.6164(summer temp) + 0.0238(age). The p-value for winter rain coefficient is 0.0242, for harvest rain, p-value = 0.0001, for summer temp, p-value < 0.0001, for age, p-value = 0.0031; R2 = 0.828. All validity conditions are met as the histogram of residuals is normal and the plot of residuals vs. predicted values shows equal variation (see graphs in Solution 23).
[Solution 23: histogram of residuals (Mean = 0.00, SE = 0.287, DF = 22) and plot of residuals vs. predicted values.]
24. The p-value for the model is < 0.0001, so there is convincing evidence that this model is “useful” for predicting the log(price index).
25. For the model with only age, R2 = 0.212 and SE of residuals = 0.574. For the model with both age and summer temp, R2 = 0.522 and the standard error of the residuals is 0.457. For the model with all four explanatory variables in it, R2 = 0.828. It seems that winter rain may not be needed to accurately predict the log(price index) of wine. Without this variable, the total R2 only goes down to 0.782 and the SE of residuals only increases from 0.287 to 0.315.
26. We are 95% confident that after controlling for age of wine, the amount of winter rain, and summer temperature, for each 1-inch increase in harvest rainfall the average log(price index) decreases between 2.19 and 5.54.
27. Using age = 31 years, summer temperature = 17.11°C, harvest rain = 0.16 m, and winter rain = 0.6 m, (−1.02, −0.53) is a confidence interval for the mean log(price index), and (−1.42, −0.13) is a prediction interval for log(price index) because prediction intervals are always wider than confidence intervals. A 95% prediction interval for the price index under these conditions is (e^−1.42, e^−0.13) = (0.2417, 0.8781).
28. There is strong evidence that the age, summer temperature, and rainfall in both the winter and at harvest time aid in the prediction of the log(price index) of Bordeaux wines. The regression equation will work well to predict the log(price index) of Bordeaux wines whose grapes were harvested between 1952 and 1980. Generalizing to vineyards near the 13 in the study should be fine as the weather conditions don’t vary much across this area in France. No causal conclusions can be made as this was an observational study.
29. It would be nice to use the price of individual bottles of Bordeaux rather than cases. If possible, measuring more variables, such as fermenting procedures, barrels used, and herbicides or pesticides used, might prove to be interesting. It would be nice to expand the range of dates for the bottles used in the study.
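Item 27 of the investigation moves intervals from the log(price index) scale back to the price index scale by exponentiating the endpoints (natural log, as stated in item 13). A minimal sketch of that step, with an illustrative helper name and the two log-scale intervals copied from item 27:

```python
import math


def back_transform(interval):
    """Exponentiate both endpoints of an interval built on the natural-log scale."""
    low, high = interval
    return (math.exp(low), math.exp(high))


log_scale_confidence = (-1.02, -0.53)   # for the mean log(price index)
log_scale_prediction = (-1.42, -0.13)   # for an individual log(price index)

confidence_price_index = back_transform(log_scale_confidence)
prediction_price_index = back_transform(log_scale_prediction)
```

Exponentiating (−1.42, −0.13) gives approximately (0.2417, 0.8781), while (−1.02, −0.53) gives roughly (0.36, 0.59), so the wider of the two back-transformed intervals is the one that comes from the prediction interval.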
CHAPTER 6
Categorical Response Variable Section 6.1
6.1.1 C.
6.1.2 E.
6.1.3 A.
6.1.4 B, C.
6.1.5 A, D.
6.1.6 B.
6.1.7 D.
6.1.8 E.
6.1.9 A.
6.1.10
a. See the following table.

                 Afternoon   Evening   Total
On phone             48         45       93
Not on phone        184         70      254
Total               232        115      347

b. 1.891; The chance that a student was on their phone in the evening is 1.891 times as large as in the afternoon.
c. 2.464; The odds that a student was on their phone in the evening are 2.464 times as large as in the afternoon.
6.1.11
a. χ2 = 13.328.
b. Yes, there is very strong evidence that a different (or larger) proportion of students are using their cellphones while walking around campus in the evenings than during the afternoons because the χ2-statistic is much larger than 4.
6.1.12
a. See the table that follows.

                      Large   Small   Total
Took more than one      39      54      93
Took one                60      55     115
Total                   99     109     208

b. 1.258; The chance that a student took more than one candy from a small bowl is 1.258 times as large as from a large bowl.
c. 1.510; The odds that a student took more than one candy from a small bowl are 1.510 times as large as from a large bowl.
6.1.13
a. χ2 = 2.161.
b. No, there is not strong evidence that the amount of candy in the bowl will affect whether people are disobedient and take more than one piece as the χ2-statistic is less than 4.
6.1.14
a. See the following table.

         Child   Adult   Total
Yes         2       8      10
No         18      22      40
Total      20      30      50

b. df = 3 (You can fill in 3 of the cells in the top row, and then all the other table values are determined.)
c. df = 6.
d. df = (n − 1) × (m − 1).
6.1.15
a. H0: There is no association between banding and whether a penguin survives after 4.5 years. (πband − πno band = 0). Ha: There is an association between banding and whether a penguin survives after 4.5 years. (πband − πno band ≠ 0).
b. 1.9375; The “risk” that a penguin is alive after 4.5 years is 1.9375 times as large if the penguin is not banded as if it is banded.
c. χ2 = 9.0325.
d. The p-value should be around 0.004.
e. We have strong evidence that the survival rate of the penguins after 4.5 years is higher if they are un-banded than if they are banded. Because the banding was randomly assigned, we can conclude that the banding is causing this difference. Generalizing to populations beyond this particular type of penguin who are already tagged might be unwise.
6.1.16 H0: There is no association between banding and whether a penguin survives after 4.5 years. (πband − πno band = 0) vs. Ha: There is an association between banding and whether a penguin survives after 4.5 years. After 4.5 years, 32% of the penguins that were banded were alive, and 62% of the penguins that weren’t banded were alive. χ2 = 9.0325; p-value = 0.0027; we have strong evidence that the survival rate of the penguins after 4.5 years is higher if they are un-banded than if they are banded.
6.1.17
a. H0: There is no genuine association between the color of the sign and whether a student obeys it. (πyellow − πred = 0); Ha: There is an association between the color of the sign and whether a student obeys it. (πyellow − πred ≠ 0).
b. 2.365; The odds that students obey the red sign are 2.365 times as large as the odds that students obey the yellow sign.
c. χ2 = 4.376.
d. The p-value should be just under 0.05.
e. We have strong evidence that students, like those in this study, will obey the red sign at a higher rate than they will obey the yellow sign.
6.1.18 H0: There is no underlying association between the color of the sign and whether a student obeys it. (πyellow − πred = 0); Ha: There is an association between the color of the sign and whether a student obeys it. (πyellow − πred ≠ 0); 25.9% of the students obeyed the yellow sign and 45.3% of the students obeyed the red sign. χ2 = 4.376. The p-value = 0.0364, so we have strong evidence that students obey a red sign at a higher rate than they obey a yellow sign.
6.1.19
a. Yes, people who are judged to be overconfident are more likely to get both answers correct. Of those who were overconfident, 78.9% got both answers correct whereas only 65.7% of those that were not overconfident got both answers correct.
b. The relative risk is 1.201, meaning that the chance that overconfident people get both answers correct is 1.201 times as large as that for non-overconfident people.
c. χ2 = 6.192.
d. H0: There is no association between being overconfident and whether a person gets both answers correct. Ha: There is an association between being overconfident and whether a person gets both answers correct.
e. p-value = 0.0128. It is appropriate to use a theory-based test as validity conditions are met.
f. We have strong evidence that overconfident people are more likely to get both answers correct than people who are not overconfident. Because this was an observational study, we cannot conclude that being overconfident causes people to be more likely to get both answers correct. We can generalize the results only to people similar to those in the study.
6.1.20
a. Yes, based on the survey results, people are less likely to answer yes to a socially undesirable question if they are answering via a phone call compared to a text because 25.4% of the sample answered yes from the text message, whereas only 12.9% of the sample answered yes from the phone call.
b. See the following table.

          Text       Call      Total
Yes      60.118     60.882      121
No      254.882    258.118      513
Total   315        319          634

c. χ2 = 16.150.
d. H0: There is no underlying association between how the survey is administered and whether a person says they exercise less than once per week. (πtext − πphone call = 0); Ha: There is an association between how the survey is administered and whether a person says they exercise less than once per week. (πtext − πphone call ≠ 0).
e. p-value = 0.0001. It is appropriate to use a theory-based test because validity conditions are met.
f. We have strong evidence that those answering the survey with a phone call are less likely to say they exercise less than once per week than those who answer the survey from a text. We can conclude the type of survey is causing this difference because the survey method was randomly assigned. We can generalize to people similar to the study participants.
6.1.21
a. Yes, the proportion who answer yes for texts is noticeably different between the human and automated texts (21.5% for human and 29.3% for automated) but very similar for phone calls (13.1% for human and 12.6% for automated).
b. χ2 = 19.252; p-value = 0.0002.
c. We have strong evidence (small p-value) that at least one of the probabilities that someone will answer yes is different among the four categories of human text, automated text, human voice, and automated voice.
d. Both texting methods give significantly higher proportions of yes answers than either of the voice methods.
6.1.22
a. Yes, based on the survey results, people are less likely to round an answer if they are answering via a text because 14.6% of those answering by text in this sample rounded, but 21.3% rounded when answering from a phone call.
b. See the following table.

             Text      Call     Total
Round        56.64     57.36     114
Not round   258.36    261.64     520
Total       315       319        634

c. χ2 = 4.844.
d. H0: There is no association between how the survey is administered and whether a person rounds an answer. Ha: There is an association between how the survey is administered and whether a person rounds an answer. (πtext − πphone call ≠ 0).
e. p-value = 0.0278. It is appropriate to use a theory-based test because validity conditions are met.
f. We have strong evidence that those answering the survey with a text are less likely to round their answers than those answering from a phone call. We can conclude the type of survey is causing this difference, because the survey method was randomly assigned, and we can generalize to people similar to those in the study.
6.1.23
a. Yes, the percentages who round their answers seem to differ between human and automated interviewers for both text (17.1% for human and 12.1% for automated) and voice (24.4% for human and 18.2% for automated).
b. χ2 = 8.207; p-value = 0.0419.
c. We have strong evidence that at least one of the probabilities that someone will round is different among the four categories of human text, automated text, human voice, and automated voice.
d. The automated text method gives significantly lower rounding results than the human voice. This is the only pair where there is a significant difference (95% confidence interval: (−0.2066, −0.0389)*).
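The relative risk, odds ratio, and χ2 statistic reported for the cellphone data in 6.1.10 and 6.1.11 can be reproduced from the four observed counts. The sketch below uses plain Python and illustrative function names; the χ2 statistic is the usual sum of (observed − expected)^2/expected with no continuity correction, which is an assumption about how the manual's value was computed.

```python
def relative_risk(success_1, failure_1, success_2, failure_2):
    """Ratio of the success proportions, group 1 over group 2."""
    p1 = success_1 / (success_1 + failure_1)
    p2 = success_2 / (success_2 + failure_2)
    return p1 / p2


def odds_ratio(success_1, failure_1, success_2, failure_2):
    """Ratio of the odds of success, group 1 over group 2."""
    return (success_1 / failure_1) / (success_2 / failure_2)


def chi_square(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Counts from 6.1.10: rows are on phone / not on phone,
# columns are afternoon / evening.
phone_table = [[48, 45], [184, 70]]

rr = relative_risk(45, 70, 48, 184)      # evening vs. afternoon
odds = odds_ratio(45, 70, 48, 184)       # evening vs. afternoon
chi2 = chi_square(phone_table)
```

The same `chi_square` helper works for larger tables, since the expected count for each cell is always (row total × column total)/n, which is also how the expected-count tables in 6.1.20 and 6.1.22 are built.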
Section 6.2
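Many answers in this section convert between log(odds), odds, and probability. A minimal sketch of those conversions, using the coefficients that appear in Exercises 6.2.12 (−69.5 and 0.14) and 6.2.14 (0.48955); the function names are illustrative only.

```python
import math


def odds_from_log_odds(log_odds):
    """odds = e^(log odds)."""
    return math.exp(log_odds)


def prob_from_odds(odds):
    """prob = odds / (1 + odds), the identity derived in 6.2.7."""
    return odds / (1 + odds)


def admission_log_odds(score):
    """Fitted log(odds of admission) used in 6.2.12."""
    return -69.5 + 0.14 * score


odds_500 = odds_from_log_odds(admission_log_odds(500))
odds_501 = odds_from_log_odds(admission_log_odds(501))

ratio_of_odds = odds_501 / odds_500   # equals e^0.14 for any 1-point increase
ratio_of_probs = prob_from_odds(odds_501) / prob_from_odds(odds_500)

# Intercept-only prediction from 6.2.14: odds and probability of being alive.
penguin_odds = odds_from_log_odds(0.48955)
penguin_prob = prob_from_odds(penguin_odds)
```

The ratio of odds is the same constant wherever the one-point increase happens, whereas the ratio of probabilities depends on the starting score, which is the contrast 6.2.12 draws between parts a and b and parts c and d.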
c. predicted probability of alive = 1.6316/2.6316 = 0.62. This is the same as the proportion that were alive, 31/50 = 0.62.
6.2.1 C.
a. log(odds ˆ of alive) = 0.48955 − 1.24332(1) = − 0.75377; predicted odds of alive = e −0.75377 = 0.4706; predicted probability of alive = 0.4706/1.4706 = 0.32. This is the same as the proportion that were alive, 16/50 = 0.32. 6.2.15
6.2.2 C. 6.2.3 B. 6.2.4 A. 6.2.5 A, C. 6.2.6 D.
prob = _______ ; odds(1 − prob) = prob; odds − odds(prob) = 6.2.7 odds 1 − prob odds prob; odds = prob + odds(prob); odds = prob(1 + odds); prob = _______ 1 + odds 6.2.8 Whereas the log(odds of y) can take on values between –∞ and ∞, the odds of y can only vary between 0 and ∞ because a logarithm is an exponent on e and any power of e is positive. Because probability odds , the denominator is always larger than the numerator, = ________ 1 + odds meaning the probability can never be more than 1. Because the odds of y are positive, the probability of y must be between 0 and 1. 6.2.9 Letting
probability of y.
6.2.10 a. For a score of 25 the predicted log(odds of admission) = − 2 + 0.1
(25) = 0.5, so e 0.5 = odds of admission. For a score of 26 the
So the odds of admission increased by a factor of
b. i. and the number of bacteria after 4 hours is ii. log(number of bacteria)= log(100 × 2 n)= log(100)+ log(2 n)= log(100) + n × log(2). iii. The log(2) indicates a multiplicative increase of 2 because e log(2) = 2. 6.2.11
ˆ a. b. The odds of admission is so the probability of suc cessful admission is ( ) c. If the probability of successful admission is 0.50, then the odds are 1, meaning the log(odds) = log(1) = 0. So, if we solve score = 69.5/0.14 ≈ 496. 6.2.12 a. e −69.5+0.14(501) / e −69.5+0.14(500) = 1.8965 / 1.6487 = 1.150which is also just e 0.14. −69.5+0.14(511)
b. e /e is also just
−69.5+0.14(510)
= 7.6906 / 6.6859 = 1.150 which again
c. (1.8965/2.8965)/(1.6487/2.6487) = 1.052. d. (7.6906/8.6906)/(6.6859/7.6859) = 1.017. 6.2.13 a. 0.14 ± 2(0.04) or 0.06 to 0.22. b. As the MCAT score increases by 1, the odds increase by a factor of e 0.06 to e 0.22 or as the MCAT score increases by 1, the predicted odds will be 1.062 to 1.246 times as large. 6.2.14 a. b. predicted odds of alive= e 0.48955 = 1.6316.
c06InstructorSolutions.indd 76
b. 1.6316/0.4706 = 3.467. This can also be obtained using a reciprocal (since we are going from 1 to 0 instead of 0 to 1) involving the slope: 1/e^(−1.24332) = e^1.24332 = 3.467. Observe that this is the odds ratio found in Exercise 6.1.15.
c. 0.62/0.32 = 1.9375, which is the relative risk found in Exercise 6.1.15.
6.2.16 a. The proportion of shots made from 8 feet or less was 18/21 = 0.8571. The proportion of shots made from 9 feet or more was 7/21 = 0.3333. All the shots were made at distances of 5 and 6 feet. There were no distances where all the shots were missed.
b. c. There is strong evidence of an association between distance and shots made because the p-value for the slope is 0.0028.
d. 3 out of 6, or 50% made the shot from 10 feet in the sample so our estimate for the probability is a little lower than this.
e. A 50% probability is the same as odds of 1 and because log(1) = 0 we need to solve the following: doing so gives us distance = 9.46 feet.
6.2.17 a. −0.5499 ± 2(0.1842) or −0.9183 to −0.1815.
b. As the distance increases by 1 foot, the odds of making change by a factor of
6.2.18 a. 61.9% of the first-class passengers survived, 43.0% of the second-class passengers survived, and 25.5% of the third-class passengers survived.
b. predicted log(odds of surviving) = 1.2680 − 0.7790(class).
c. Yes, there is strong evidence of an association between passenger class and surviving because the p-value for the slope is less than 0.0001.
d. this is just slightly over the 0.619 we had for the proportion of first-class passengers that survived.
e. Second-class passengers: predicted log(odds of surviving) = 1.2680 − 0.7790(2) = −0.29; predicted odds of surviving = e^(−0.29) = 0.7483; predicted probability of surviving = 0.7483/1.7483 = 0.428, this is just slightly under the 0.430 we had for the proportion of second-class passengers that survived. Third-class passengers: predicted log(odds of surviving) = 1.2680 − 0.7790(3) = −1.069; predicted odds of surviving = e^(−1.069) = 0.3436; predicted probability of surviving = 0.3436/1.3436 = 0.256, this is just slightly over the 0.255 we had for the proportion of third-class passengers that survived.
6.2.19 a. 72.75% of the female passengers survived and 19.10% of the male passengers survived.
b.
c. Yes, there is strong evidence of an association between passenger sex and surviving because the p-value for the slope is less than 0.0001.
d. this is the same as 0.7275, the proportion of female passengers who survived.
e. this is exactly what we had for the proportion of male passengers who survived.
6.2.20 a. b. Yes, there is strong evidence of an association between a passenger’s fare and surviving because the p-value for the slope is less than 0.0001.
c. d. A 50% probability is the same as odds of 1 and because log(1) = 0 we need to solve the following: doing so gives us fare = 70.73 pounds.
6.2.21 a. b. Because the slope is positive, the higher the hematocrit value, the higher the likelihood of surviving.
c. Yes, there is strong evidence of an association between hematocrit level and surviving because the p-value for the slope = 0.00045.
d. A 50% probability is the same as odds of 1 and because log(1) = 0 we need to solve the following: Doing so gives us hematocrit = 13.77.
e. The hematocrit value should be above 13.77.
6.2.22 a. b. Because the slope is negative, the higher the lactic acid value, the lower the likelihood of surviving.
c. Yes, there is strong evidence of an association between lactic acid and surviving because the p-value for the slope is 0.003.
d. A 50% probability is the same as odds of 1 and because log(1) = 0 we need to solve the following: 0 = 1.7968 − 0.2738(lactic acid); doing so gives us lactic acid = 6.56.
e. The lactic acid value should be below 6.56.
6.2.23 a. The slope coefficient and the SE are both ten times larger for agedecade than for age.
b. The odds ratio of survival is e^(−1.135) = 0.3214, meaning that the odds of survival are 0.3214 times as much after 10 years.
c. e^(−1.135−2(0.072)) to e^(−1.135+2(0.072)) = 0.2783 to 0.3712. We can be 95% confident that in the population, for a one-decade increase in age, the odds of survival are predicted to be 0.2783 to 0.3712 times as much.
d. The endpoints of the confidence interval for the decade are the same as those for the year raised to the 10th power (with a little rounding error). For example, the lower endpoint for the year raised to the 10th power is 0.8799^10 = 0.2782.
e. Answers will vary.
Section 6.3
6.3.1 C.
6.3.2 C.
6.3.3 B.
6.3.4 B, D.
6.3.5 a. Temperature above 50°F (temp = 0), artificial turf (turf = 1), elevation ≥ 4,000 ft (elv = 1), no precipitation (precip = 0), and wind < 10 mph (wind = 0) all improve the probability of success.
b. The kicker.
c.
d. e^0.299 = 1.349 times larger.
e. e^((10)(0.106)) = 2.886.
6.3.6 a. Under the best conditions the equation simplifies to (Because the NFL record distance is 64 yards, the model doesn’t appear to work well for the best conditions.).
b. Under the worst conditions the equation simplifies to predicted log(odds of success) = 5.953 − 0.106(dist) − 0.341(1) + 0.299(0) + 0.694(0) − 0.280(1) − 0.140(1) = 5.192 − 0.106(dist). A probability = 0.50 is the same as odds = 1 or log(odds) = 0, so solving 0 = 5.192 − 0.106(dist), we get dist = 48.98 yards.
6.3.7 a. Chemistry: predicted log(odds of success) = −0.875 − 0.032(URM) + 0.131(sex); Not Chemistry: predicted log(odds of success) = 0.051 − 1.647(URM) − 0.641(sex).
b. A male, non-URM is more likely to publish in the non-chemistry departments because the intercept is larger for that equation in part (a). For a male, non-URM, all the inputs would be 0 so we only need to look at the intercepts.
c. The effect of being a URM on publishing is greatest in the non-chemistry departments because the coefficient on that term is larger in absolute value (−1.647 compared to −0.032). Because the coefficients are negative, being a URM decreases the probability of publishing.
d. The effect of being a female on publishing is greatest in the non-chemistry departments because the coefficient on that term is larger in absolute value (−0.641 compared to 0.131). Because the coefficient for non-chemistry departments is negative, being a female decreases the probability of publishing. The opposite is true in the chemistry department where being a female increases the probability of publishing.
e. A non-URM male who is not in the chemistry department has the greatest probability of publishing. The odds are e^0.051 = 1.0523 or a probability of 1.0523/2.0523 = 0.513. Being a URM male that is not in the chemistry department has the least probability of publishing. The odds are e^(0.051−1.647) = 0.2027 or a probability of 0.2027/1.2027 = 0.17.
6.3.8 a. Female: predicted log(odds of success) = 0.659 − 0.616(URM) + 0.204(chem) + 0.630(URM × chem); Male: predicted log(odds of success) = 0.713 − 0.616(URM) + 0.198(chem) + 0.630(URM × chem).
b. A non-URM, non-chemistry department member is more likely to present if male because the intercept is larger for that equation in part (a). For a non-chemist, non-URM, all the inputs would be 0 so we only need to look at the intercepts.
c. The effect of being a URM on presenting is the same between males and females because the coefficient on the URM terms is the same in both equations as is the coefficient on the (URM × chem) term. Because the coefficient is negative on the URM, being a URM decreases the probability of presenting for those not in the chemistry department. Because the sum of the coefficients on the URM and (URM × chem) is positive there is an increase in the probability of presenting for URM in the chemistry department.
d. The effect of being in the chemistry department is greater for females than for males because the coefficient on the chem term is larger in the equation for females and the coefficient is the same in both equations on the (URM × chem) term. Because all of these coefficients are positive, the effect of being in the chemistry department increases the probability of presenting whether a URM or not.
e. Being a URM male that is in the chemistry department has the greatest probability of presenting. The odds are e^(0.713−0.616+0.198+0.630) = 2.5219 or a probability of 2.5219/3.5219 = 0.72. Being a non-URM female that is not in the chemistry department has the least probability of presenting. The predicted odds are e^0.659 = 1.9329 or a predicted probability of 1.9329/2.9329 = 0.66.
6.3.9 a. b. c. The larger intercept in the equation using all the data means that for young people, it is predicting a higher probability of being alive. The smaller (more negative) slope means that the probability of being alive decreases faster as age increases.
d. Setting the equations equal to each other and solving for age we get: 7.708 − 0.125(age) = 8.409 − 0.137(age); 0.012(age) = 0.701; age = 58.4. So, the equation that includes the 77 oldest people will predict a lower probability of surviving for nonsmokers that are older than 58.4.
6.3.10 a. For nonsmokers: so the predicted odds of being alive is e^(−2.4825) = 0.0835 and the predicted probability of being alive is 0.0835/1.0835 = 0.077. For smokers: so the predicted odds of being alive is 0.1217 and the predicted probability of being alive is 0.1217/1.1217 = 0.1085.
b. It is just like the one shown.
c. Correct classification rate = (910 + 207)/1314 = 0.850.
6.3.11 a. Non-smokers:
b. A probability of 0.50 is the same as odds of 1 and log odds of 0 so solving 0 = 6.958 − 0.116(age) for age we get 60.
c. Solving for age we get 62.
d. From part (c) we see that all those 59.5 years old and younger have more than a 50% chance of survival so are predicted to survive whereas all others are not. In the tables that follow, the bold numbers in the Alive rows represent those that were predicted to be alive and were alive, the bold numbers in the Dead rows were predicted to die and did die. The non-bold numbers in the Alive row were predicted to die but did not. The non-bold numbers in the Dead rows were predicted to be alive but were not. Putting all those together we get the following prediction table. From this the correct classification rate is (910 + 130)/1237 = 0.841 which is the same as was obtained when the interaction was included in the model.

Smokers
Age     21    29.5   39.5   49.5   59.5   69.5
Alive   53    121    95     103    64     7
Dead    2     3      14     27     51     29
Total   55    124    109    130    115    36

Non-smokers
Age     21    29.5   39.5   49.5   59.5   69.5
Alive   61    152    114    66     81     28
Dead    1     5      7      12     40     101
Total   62    157    121    78     121    129

               Predicted alive   Predicted dead   Total
Actual alive   910               35               945
Actual dead    162               130              292
Total          1,072             165              1,237

6.3.12 a. Females in first-class had the highest survival rate and males in second-class had the lowest. See table.

                                      First class   Second class   Third class
Proportion of females that survived   96.53%        88.68%         49.07%
Proportion of males that survived     34.08%        14.62%         15.21%

b. predicted log(odds of survival) = 2.9633 − 0.8603(class) − 2.5150(sex).
c. For females in first class we get predicted log(odds of survival) = 2.9633 − 0.8603(1) − 2.5150(0) = 2.103; odds of survival = e^2.103 = 8.1907; probability of survival = 8.1907/9.1907 = 0.8912. All other categories were done similarly and are shown in the Solution 6.3.12.c table. Females in first-class had the highest probability of survival and males in third-class had the lowest.
d. The odds of survival for females are predicted to be e^2.5150 = 12.37 times larger than for males.
e. The odds of survival for first-class passengers are predicted to be e^0.8603 = 2.36 times larger than for second-class passengers. The same is true for second-class passengers compared to third. The odds of survival for first-class passengers are predicted to be e^(0.8603×2) = 5.59 times larger than for third-class passengers.
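The predicted probabilities in the Solution 6.3.12.c table and the odds ratios in 6.3.12 d and e all come from the coefficients shown in part c. The Python sketch below recomputes them, assuming class is coded 1, 2, or 3 and sex is coded 0 for female and 1 for male (the coding implied by the worked first-class female calculation). The names are illustrative.

```python
import math

# Coefficients quoted in 6.3.12 c; the variable coding is an assumption noted above.
INTERCEPT = 2.9633
CLASS_COEF = -0.8603
SEX_COEF = -2.5150


def survival_probability(passenger_class: int, sex: int) -> float:
    """Predicted probability of survival from the two-predictor logistic equation."""
    log_odds = INTERCEPT + CLASS_COEF * passenger_class + SEX_COEF * sex
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for sex, label in ((0, "female"), (1, "male")):
        row = [round(survival_probability(c, sex), 4) for c in (1, 2, 3)]
        print(label, row)
    # Odds ratios: females vs. males, and one class step up vs. the class below.
    print(round(math.exp(-SEX_COEF), 2))
    print(round(math.exp(-CLASS_COEF), 2))
    print(round(math.exp(-2 * CLASS_COEF), 2))
```

Because the model has no interaction term, the same class odds ratio applies to females and males, which is why part e can state one ratio for each class comparison.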
f. See prediction table for Solution 6.3.12.f. The correct classification rate is (339 + 682)/1,309 = 0.780.

Solution 6.3.12.c
                                   First class   Second class   Third class
Probability that females survive   0.8912        0.7760         0.5945
Probability that males survive     0.3984        0.2189         0.1060

Solution 6.3.12f
               Predicted alive   Predicted dead   Total
Actual alive   339               161              500
Actual dead    127               682              809
Total          466               843              1,309

6.3.13 a.
b. For females in first class we get All other categories were done similarly and are shown in the table below. Females in first class had the highest probability of survival and males in third class had the lowest.

                                   First class   Second class   Third class
Probability that females survive   0.9748        0.8609         0.4971
Probability that males survive     0.3089        0.2128         0.1406

c. See prediction table that follows. The correct classification rate is (233 + 792)/1,309 = 0.783.

               Predicted alive   Predicted dead   Total
Actual alive   233               267              500
Actual dead    17                792              809
Total          250               1,059            1,309

6.3.14 a. b. As fare increases the probability of survival also increases because the Fare/100 term has a positive coefficient.
c. The sex coefficient is negative meaning if we put males (which is sex = 1) into the equation it will reduce the predicted log odds of survival, so males are less likely to survive and hence females are more likely to survive.
d. The predicted odds that females will survive are times higher than for males.
e. The prediction table is as follows. The correct classification rate is (342 + 671)/1,309 = 0.774.

               Predicted alive   Predicted dead   Total
Actual alive   342               158              500
Actual dead    138               671              809
Total          480               829              1,309

6.3.15 a.
b. As fare increases the predicted probability of survival for females also increases because the fare term has a positive coefficient. Because the sum of the fare and interaction coefficients is positive, the predicted probability of survival also increases for males as fare increases.
c. The sex coefficient is negative meaning if we put males (sex = 1) into the equation it will reduce the predicted log odds of survival, so males are less likely to survive and hence females are more likely to survive.
d. The prediction table is below. The correct classification rate is (341 + 682)/1,309 = 0.782.

               Predicted alive   Predicted dead   Total
Actual alive   341               159              500
Actual dead    127               682              809
Total          468               841              1,309

6.3.16 a. b. Yes, the p-value on the hematocrit term is 0.0219 and on the lactic acid term is 0.0106.
c. Higher levels of hematocrit lead to survival because the coefficient is positive on that term.
d. Lower levels of lactic acid lead to survival because the coefficient is negative on that term.
e. Odds of survival for this turtle are e^(−0.6502+0.1532(22)−0.2850(3.3)) = 5.9281, so the predicted probability of survival is 5.9281/6.9281 = 0.856.
f. See the following prediction table.

               Predicted alive   Predicted dead   Total
Actual alive   21                4                25
Actual dead    5                 12               17
Total          26                16               42
6.3.17 No, the interaction term is not significant (p-value = 0.88). 6.3.18 a. 14/21 ⇒ (implies) 66.7% of the males agreed to the short survey and 3/19 ⇒ 15.8% agreed to the long survey.
b. 17/23 ⇒ 73.9% of the females agreed to the short survey and 11/22 ⇒ 50% agreed to the long survey.
c. d. The odds of agreeing for a female on a short survey are e^(0.4133−1.6243(0)+0.9624(1)) = 3.9578 for a probability of 3.9578/4.9578 = 0.798. This is a little above the actual 0.739. Similarly, for females with a long survey the probability is 0.438, which is lower than the actual 0.50. For males with a long survey the probability is 0.230, which is above the actual 0.158. For males with a short survey the probability is 0.602, which is below the actual proportion of 0.667.
e. The correct classification rate is (31 + 27)/85 = 0.682. See the following prediction table.
                   Predicted agree   Predicted not agree   Total
Actual agree       31                14                    45
Actual not agree   13                27                    40
Total              44                41                    85
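The correct classification rate in 6.3.18 e is the sum of the diagonal of the prediction table divided by the grand total. A small Python sketch of that arithmetic follows; the function names are illustrative, and the per-class breakdown is an addition that the exercise does not ask for.

```python
def correct_classification_rate(table):
    """table[i][j] = number of cases with actual class i and predicted class j."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / total


def per_class_rates(table):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(table)]


if __name__ == "__main__":
    # Rows: actual agree, actual not agree. Columns: predicted agree, predicted not agree.
    survey = [[31, 14], [13, 27]]
    print(round(correct_classification_rate(survey), 3))
    print([round(rate, 3) for rate in per_class_rates(survey)])
```

Looking at the per-class rates alongside the overall rate helps show whether a model is predicting only one category well.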
6.3.19 a. using indicator coding.
b. The correct classification rate is (0 + 4195)/4514 = 0.929. See the forthcoming prediction table. This seems like a good model as it correctly predicts almost 93% of the observed alcohol abuser statuses; however, it doesn’t predict anyone to be an alcohol abuser, as the 93% correct predictions are all of non-abusers.
c. The correct classification rate is (246 + 3276)/4514 = 0.780. See the forthcoming prediction table. Using the new cut-off, only 78% of the observed alcohol abuser statuses are correctly predicted, but 246 of the 319 actual alcohol abusers are now correctly identified, so the new cut-off does improve the prediction of actual alcohol abusers.
Prediction table using 0.50 cut-off:
b. The correct classification rate is now (249 + 3272)/4514 = 0.780. See the new forthcoming prediction table. This correct classification rate is not higher than the model without interactions.
c. The correct classification rate is now (246 + 3303)/4514 = 0.786. See new prediction table. This correct classification rate is slightly higher than the model without interactions.
d. The model with live Chernobyl and age interaction does better than the other, but not by enough to make it worth including the additional term in the model.
Prediction table using 0.071 cut-off:
                            Predicted alcohol abuser   Predicted not alcohol abuser   Total
Actual alcohol abuser       246                        73                             319
Actual not alcohol abuser   919                        3276                           4195
Total                       1165                       3349                           4514
Prediction table using 0.071 cut-off with interaction sex × live Chernobyl in model:
                            Predicted alcohol abuser   Predicted not alcohol abuser   Total
Actual alcohol abuser       249                        70                             319
Actual not alcohol abuser   923                        3272                           4195
Total                       1172                       3342                           4514
Prediction table using 0.071 cut-off with interaction age × live Chernobyl in model:
                            Predicted alcohol abuser   Predicted not alcohol abuser   Total
Actual alcohol abuser       246                        73                             319
Actual not alcohol abuser   892                        3303                           4195
Total                       1138                       3376                           4514
Prediction table using 0.50 cut-off:
                            Predicted alcohol abuser   Predicted not alcohol abuser   Total
Actual alcohol abuser       0                          319                            319
Actual not alcohol abuser   0                          4195                           4195
Total                       0                          4514                           4514
Prediction table using 0.071 cut-off:
                            Predicted alcohol abuser   Predicted not alcohol abuser   Total
Actual alcohol abuser       246                        73                             319
Actual not alcohol abuser   919                        3276                           4195
Total                       1165                       3349                           4514
6.3.21 a. The observed odds of pain cessation for the treatment group are (7/10)/(3/10) = 2.33.
b. The observed odds of pain cessation for the control group are (1/8)/(7/8) = 0.1429.
c. Odds ratio is (7/3)/(1/7) = 16.3333.
d. predicted log(odds of improvement) = −1.9459 + 2.7932(treatment). Treatment is a significant predictor of log odds of cessation in pain because the p-value is 0.0281. The predicted odds of pain cessation for those receiving the treatment are e^2.7932 = 16.33 times as large as for those not receiving treatment. This is the same as the observed odds ratio from part (c).
6.3.20 a. The correct classification rate is (246 + 3276)/4514 = 0.780. See the next prediction table.
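With a single 0/1 predictor, the fitted logistic equation reproduces the observed group odds exactly, which is why the model in 6.3.21 d matches the odds ratio in part c. The Python sketch below derives the intercept and slope directly from the counts in 6.3.21 a and b (7 of 10 treated and 1 of 8 controls with pain cessation). The function name is illustrative.

```python
import math


def two_group_logistic_coefficients(success_control, fail_control,
                                    success_treated, fail_treated):
    """Intercept and slope of a logistic model with one 0/1 predictor, from 2x2 counts."""
    odds_control = success_control / fail_control
    odds_treated = success_treated / fail_treated
    intercept = math.log(odds_control)            # log odds when predictor = 0
    slope = math.log(odds_treated / odds_control)  # log of the odds ratio
    return intercept, slope


if __name__ == "__main__":
    # Control: 1 cessation, 7 not. Treatment: 7 cessation, 3 not.
    intercept, slope = two_group_logistic_coefficients(1, 7, 7, 3)
    print(round(intercept, 4), round(slope, 4))
    print(round(math.exp(slope), 4))  # the odds ratio from part (c)
```

The same shortcut does not apply once a quantitative predictor or a second predictor enters the model, where the coefficients have to be estimated by software.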
e. The correct classification rate is (7 + 7)/18 = 0.78. See prediction table as follows.

                      Predicted improver   Predicted not improver   Total
Actual improver       7                    1                        8
Actual not improver   3                    7                        10
Total                 10                   8                        18
f. Age is not a significant predictor of improvement (p-value = 0.7200) and there is no change to the correct classification rate. Sex is not a significant predictor of improvement (p-value = 0.2173) and there is no change to the correct classification rate. Duration is not a significant predictor of improvement (p-value = 0.5337) and there is no change to the correct classification rate.
g. Only treatment is significant in the model with age, sex, monthsduration, and treatment. The correct classification rate does increase to 0.83. Because this rate is only the result of one more person being correctly classified, I would stay with the model that only involves treatment.

                      Predicted improver   Predicted not improver   Total
Actual improver       6                    2                        8
Actual not improver   1                    9                        10
Total                 7                    11                       18

End of Chapter 6 Exercises
6.CE.1 a. −0.036.
b. 1.071; women in the intervention group are 1.071 times as likely to exercise regularly as women in the control group.
c. 1.154; the odds of being a regular exerciser were 1.154 times as large for a woman in the intervention group compared to a woman in the control group.
d. H0: There is no association between type of care and exercising regularly vs. Ha: There is an association between type of care and exercising regularly; statistic = diff in proportions = −0.036 (answers may vary); estimated (two-sided) p-value = 0.815 (answers may vary); there is very weak evidence against the null, thus it is plausible that type of care is not associated with exercise habits.
e. H0: There is no association between type of care and exercising regularly vs. Ha: There is an association between type of care and exercising regularly; z-statistic = −0.28 or chi-square = 0.076; theory-based (two-sided) p-value = 0.7824; there is very weak evidence against the null, thus it is plausible that type of care is not associated with exercise habits. A theory-based test is valid because all observed cell counts are at least 10.
6.CE.2 a. −0.232.
b. 1.371; women in the intervention group are 1.371 times as likely to listen to audiotapes as women in the control group.
c. 3.6; the odds of listening to audiotapes were 3.6 times as large for a woman in the intervention group compared to a woman in the control group.
d. H0: There is no association between type of care and listening to audiotapes vs. Ha: There is an association between type of care and listening to audiotapes; statistic = diff in proportions = −0.232 (answers may vary); estimated (two-sided) p-value = 0.078 (answers may vary); there is moderate (but not strong) evidence that the type of care has an association with whether a woman listens to audiotapes for relaxation.
e. H0: There is no association between type of care and listening to audiotapes vs. Ha: There is an association between type of care and listening to audiotapes; chi-square = 4.115; theory-based (two-sided) p-value = 0.0425; there is strong evidence against the null and for the alternative that type of care is associated with whether a woman listens to audiotapes for relaxation. A theory-based test may not be valid because not all observed cell counts are at least 10.
6.CE.3 a. −0.201.
b. 2.286; women in the intervention group are 2.286 times as likely to have specific dietary habits as women in the control group.
c. 3.0; the odds of having specific dietary habits related to health were 3 times as large for a woman in the intervention group compared to a woman in the control group.
d. H0: There is no association between type of care and having specific dietary habits related to health vs. Ha: There is an association between type of care and having specific dietary habits related to health; statistic = diff in proportions = −0.201 (answers may vary); estimated (two-sided) p-value = 0.121 (answers may vary); there is little evidence that type of care affects whether a woman has specific dietary habits related to health.
e. H0: There is no association between type of care and having specific dietary habits related to health vs. Ha: There is an association between type of care and having specific dietary habits related to health; chi-square = 3.214; theory-based (two-sided) p-value = 0.073; there is moderate (but not strong) evidence that the type of care has an association with whether a woman has specific dietary habits related to health. A theory-based test may not be valid because not all observed cell counts are at least 10.
6.CE.4 a. The intercept = 0 tells us that the predicted odds of regular exercise for the control group should be e^0 = 1, which is the same as 16/16 = 1, as seen in the data. The slope tells us that the predicted odds ratio for the intervention group compared to the control group is e^0.143 = 1.154, which matches the odds ratio from the two-way table of counts.
b. predicted log odds(audiotapes) = 0.511 + 1.28 × care group (intervention); 0.511 = ln((20/32)/(12/32)); 1.28 = ln((24/28)/(4/28)) − 0.511.
c. predicted log odds(dietary habits) = −1.686 + 1.099 × care group (intervention).
6.CE.5 a. predicted log odds(displayed aggression towards male peer) = 2.303 − 1.609 × sex.
b. The intercept = 2.303 says that the predicted odds of a female preschooler displaying nonverbal aggressive behavior towards a male peer are e^2.303 ≈ 10.00, implying that the proportion of females displaying nonverbal aggressive behavior toward male peers was about 0.909.
c. The slope says that the predicted odds ratio for the male preschoolers compared to the female preschoolers is e^(−1.609) = 0.200, implying that
the odds of males displaying nonverbal aggressive behavior towards male peers are about 0.200 times as large as those of females.
d. H0: There is no difference between male and female preschoolers with regard to display of nonverbal aggression towards male peers, versus Ha: There is a difference. Yes, the data provide some evidence (odds ratio = 0.200, p-value = 0.0609) that there is a difference between male and female preschoolers with regard to display of nonverbal aggression towards male peers. Due to some of the small cell counts it is better to use a simulation-based p-value or Fisher’s Exact Test rather than a theory-based p-value.
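The three summary statistics asked for in 6.CE.1 through 6.CE.3 and the logistic coefficients in 6.CE.4 all come from the same two-way table. The Python sketch below uses the audiotape counts that appear in 6.CE.4 b (20 of 32 in the control group, 24 of 28 in the intervention group). The function name is illustrative, and taking the difference as control minus intervention is an assumption inferred from the negative sign of the reported values.

```python
import math


def two_by_two_summary(yes_control, n_control, yes_intervention, n_intervention):
    """Difference in proportions, relative risk, odds ratio, and logistic coefficients."""
    p_c = yes_control / n_control
    p_i = yes_intervention / n_intervention
    odds_c = p_c / (1 - p_c)
    odds_i = p_i / (1 - p_i)
    return {
        "diff_in_proportions": p_c - p_i,      # assumed order: control minus intervention
        "relative_risk": p_i / p_c,
        "odds_ratio": odds_i / odds_c,
        "logistic_intercept": math.log(odds_c),
        "logistic_slope": math.log(odds_i / odds_c),
    }


if __name__ == "__main__":
    summary = two_by_two_summary(20, 32, 24, 28)
    for name, value in summary.items():
        print(name, round(value, 3))
```

The relative risk and the odds ratio differ noticeably here (about 1.37 versus 3.6) because listening to audiotapes is common in both groups; the two measures are close only when the outcome is rare.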
6.CE.6 a. predicted log odds(displayed aggression towards female peer) = 1.504 + 1.631 × sex.
b. The intercept = 1.504 says that the predicted odds of a female preschooler displaying nonverbal aggressive behavior towards a female peer are e^1.504 ≈ 4.500, implying that the proportion of females displaying nonverbal aggressive behavior toward female peers was about 0.818.
c. The slope says that the predicted odds ratio for the male preschoolers compared to the female is e^1.631 = 5.109, implying that the odds of a male displaying nonverbal aggressive behavior towards female peers are 5.109 times as large as those of females.
d. H0: There is no difference between the male and female preschooler populations with regard to display of nonverbal aggression towards female peers, versus Ha: There is a difference. No, the data do not provide convincing evidence (odds ratio = 5.109, p-value = 0.1601) that there is a difference between male and female preschoolers with regard to display of nonverbal aggression towards female peers. Due to some of the small cell counts it is better to use a simulation-based p-value (or Fisher’s Exact Test) rather than a theory-based p-value.
6.CE.7 a. predicted log odds(diabetes) = −5.137 + 0.118 × BMI.
b. When BMI = 0, the predicted odds of diabetes are e^(−5.137) = 0.006; not practically meaningful.
c. For every 1 kg/m^2 increase in BMI, the predicted odds of diabetes increase by a factor of e^0.118 = 1.125.
d. H0: A person’s BMI is not a predictor of whether the person has diabetes, vs. Ha: A person’s BMI is a predictor of whether the person has diabetes. Yes, the data do provide convincing evidence (p-value < 0.0001) that a person’s BMI is a predictor of whether the person has diabetes.
e. See the next prediction table. Correct classification rate is 84.1%.

                      Predicted diabetic   Predicted not diabetic   Total
Actual diabetic       16                   428                      444
Actual not diabetic   21                   2363                     2384
Total                 37                   2791                     2828

6.CE.8 a. predicted log odds(diabetes) = −2.414 + 1.35 × chol med(yes).
b. The predicted odds of diabetes are e^(−2.414) = 0.090 for those who do not take cholesterol medications.
c. The predicted odds ratio for those who take cholesterol meds compared to those who don’t is e^1.35 = 3.86, implying that the odds of having diabetes for those who take cholesterol meds are 3.86 times as large as for those who don’t take cholesterol meds.
d. H0: There is no association between taking cholesterol meds and whether the person has diabetes vs. Ha: There is an association. Yes, the data do provide convincing evidence (p-value < 0.0001) that taking cholesterol meds is a predictor of whether the person has diabetes.
e. See the forthcoming prediction table. Correct classification rate is 84.3%.

                      Predicted diabetic   Predicted not diabetic   Total
Actual diabetic       0                    444                      444
Actual not diabetic   0                    2384                     2384
Total                 0                    2828                     2828

6.CE.9 a. predicted log odds(diabetes) = −5.77 + 1.288 × chol med(yes) + 0.115 × BMI.
b. For those who don’t take meds: predicted log odds(diabetes) = −5.77 + 0.115 × BMI; For those who take meds: predicted log odds(diabetes) = −4.482 + 0.115 × BMI.
c. Yes, both variables have very small p-values (< 0.0001), so the data suggest both are statistically significant predictors.
d. See the forthcoming prediction table. The correct classification rate is 85.1%.

                      Predicted diabetic   Predicted not diabetic   Total
Actual diabetic       43                   401                      444
Actual not diabetic   20                   2364                     2384
Total                 63                   2765                     2828

e. There would be an interaction between BMI and whether a person takes cholesterol medication if the relationship between BMI and diabetes status changes based on whether someone takes cholesterol medications.
f. There is no evidence of an interaction based on the p-value = 0.201. The correct classification rate is almost exactly the same at about 85.2%.
Chapter 6 Investigation
1. The observational units are 52 healthy females in Portugal and 64 women in Portugal newly diagnosed with breast cancer.
2. This was an observational study as all variables were merely observed; nothing was manipulated by the investigators.
3. The study didn’t make use of either random sampling or random assignment. This information is relevant because it determines how broadly we can generalize the results of the study and whether or not any causal conclusions can be made.
4. The response variable is disease status; it is binary categorical as each woman is categorized as having breast cancer or not.
Solutions to Exercises 83 5.
Observed Variation in:
Sources of explained variation
Disease status Inclusion criteria Women in Portugal either newly diagnosed with breast cancer, or healthy with no prior treatment for cancer
• Age • BMI • Glucose • Resistin
Sources of unexplained variation • Family history of breast cancer • SES • Diet • Exercise • Number of children • Stressors
6. ( 34 / 78) / (44 / 78) = 0.7727. 7. ( 30 / 38) / (8 / 38) = 3.75. 8. 3.75/0.7727 = 4.853. The odds of cancer are 4.8531 times greater for a woman with high blood glucose than for a woman with normal blood glucose. 9. Yes, it seems that a woman with high blood glucose levels is more likely (almost five times more likely) to have breast cancer than a woman with normal blood glucose levels. 10. Because roughly equal numbers of women with breast cancer and women free of breast cancer were recruited to the study, the proportion of women with cancer in different subgroups will not reflect the proportions in the population. 11. ˆ where blood glucose level
el is the better predictor of disease status. Based on the following table, the correct classification rate based on the model is (49 + 35)/116 = 0.724. So, this model would correctly predict the disease status of 72.4% of the sample.
Predicted with cancer
Predicted healthy
Total
Actual with cancer
49
15
64
Actual healthy
17
35
52
Total
66
50
116
17. ˆ For each one mmol/L increase in blood glucose level, the predicted log odds for breast cancer are 0.085 times larger. 18. ˆ
19. H 0 : βglucose = β age = β BMI = β resistin = 0 vs. Ha: At least one is different from the rest. The p-value for the model is < 0.0001. So, we have strong evidence against the null hypothesis and in support of the alternative that at least one of the slope coefficients is different from 0. 20. age: Chi-square = 2.46, p-value = 0.1165; BMI: Chi-square = 8.124, p-value = 0.0044; glucose: Chi-square = 18.342, p-value < 0.0001; resistin: Chi-square = 4.81, p-value = 0.0282.
= 1 if normal and 0 if high. The y-intercept of 1.322 implies that the predicted odds of breast cancer when glucose level group is 0 (high blood glucose) are e1.322 = 3.75. The slope is −1.580 which implies that the predicted odds of breast cancer for those with normal blood glucose are e−1.580 = 0.206 times those for women with high blood glucose.
21. We are 95% confident that the odds of breast cancer are between 1.05 and 1.15 times (or 5% to 15% higher) for each one mmol/L increase in blood glucose level. We are 95% confident the odds of breast cancer are between 5% lower and 1% higher for each additional year of age. We are 95% confident the odds of breast cancer decrease by 4.12% to 20.33% for each increase of 1 in the BMI. We are 95% confident the odds of breast cancer are between 0.7% and 13.5% higher for each increase of 1 ng/mL in the blood resistin level.
12. Using the upcoming table, the correct classification rate based on the model is (30 + 44)/116 = 0.638. So, this model would correctly predict the disease status of 63.8% of the sample.
22. If there were an interaction between glucose level and age, the log odds of breast cancer would change differently as blood glucose levels changed for different ages of women. 23. ˆ
                      Predicted with cancer   Predicted healthy   Total
Actual with cancer             30                    34             64
Actual healthy                  8                    44             52
Total                          38                    78            116
is significant.
24. It appears that glucose, resistin, and BMI measures are good predictors of breast cancer. As glucose and resistin increase, so do the odds of breast cancer, whereas the odds decrease as BMI increases. No causal connection can be made between these predictors and the incidence of breast cancer because this was an observational study and none of the predictors was randomly assigned. We can generalize these findings to women similar to those who participated in the study in Portugal.
13. There is evidence of an association between disease classification and glucose classification as the p-value for the model is 0.0003.
14. For a woman with a blood glucose level of 0, the predicted odds of breast cancer are e^(−7.2) = 0.00075. The predicted odds of breast cancer are e^0.079 = 1.08 times as large, or 8% higher, for each one mmol/L increase in blood glucose level.
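Both interpretations in this answer come from exponentiating the fitted coefficients. The short sketch below reproduces that step; the variable names are illustrative, and the coefficient values are simply the ones quoted in the answer above.

```python
import math

intercept = -7.2   # predicted log odds at a glucose level of 0
slope = 0.079      # change in predicted log odds per one-unit increase in glucose

# Exponentiating the intercept gives the predicted odds at glucose = 0.
odds_at_zero = math.exp(intercept)

# Exponentiating the slope gives the multiplicative change in odds
# for each one-unit increase in glucose (the odds ratio).
odds_ratio = math.exp(slope)
percent_change = (odds_ratio - 1) * 100

print(f"odds at zero glucose: {odds_at_zero:.5f}")
print(f"odds ratio per unit:  {odds_ratio:.2f}")
print(f"percent change:       {percent_change:.0f}%")
```

An odds ratio above 1 corresponds to a percentage increase in the odds and one below 1 to a percentage decrease, which is how the confidence intervals for the multiple-predictor model are worded.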
15. There is evidence of an association between disease classification and glucose classification as the p-value for the model is < 0.0001. 16. The p-value decreased and the percentage correctly classified increased when blood glucose was used as a quantitative predictor and not as a binary categorical predictor. This makes sense as quantitative variables are more informative than categorical variables. glucose lev-
that this interaction
25. To get a better idea of the incidence of breast cancer in the population, a cross-sectional sample could be explored, gathering information on demographic variables similar to those in the case-control study. It would be interesting to run randomized trials to see whether medications that control blood glucose levels in women at risk for breast cancer might be preventative in disease expression. This may pose difficulties, as women at risk may want to pursue more aggressive and proven treatments, so recruiting subjects for the study may be difficult.
PRELIMINARIES
Multivariable Thinking and Sources of Variation
P.A.1 C.
P.A.2 D.
P.A.3 a. The observational units are the 2856 people who answered the survey. b. The explanatory variable is whether the respondent was born in the U.S.; it is a binary categorical variable. c. The response variable is the size of the respondent’s household; it is quantitative.
P.A.4 a. The observational units are the 100 penguins in the study. b. The explanatory variable is whether the penguin was tagged with a metal band; it is binary categorical. c. The response variable is whether the penguin was still alive after 4.5 years; it is a binary categorical variable.
P.A.5 a. There are more right-handed students because the widths of the right-handed bars are greater than the widths of the left-handed bars. b. The left-handed group has a larger proportion with allergies. c. Whether someone is left- or right-handed is the explanatory variable because that variable is displayed on the horizontal axis.
P.A.6 a. There was a larger proportion of respondents born in the United States because the width of the Yes bars is much larger than that of the No (not born in the United States) bars. b. The proportion of those not born in the United States that strongly agreed was much larger (looks like almost double) than the proportion of those born in the United States that strongly agreed. We can see this in the heights of the bottom “strongly agree” bars in the graph. c. The proportion of those born in the United States that strongly disagreed was much larger (looks like almost double) than the proportion of those not born in the United States that strongly disagreed. We can see this in the heights of the top “strongly disagree” bars in the graph. d. The proportion of those not born in the United States that disagreed was about the same as the proportion of those born in the United States that disagreed. We can see this in the very similar heights of the “disagree” bars that are second from the top in the graph.
P.A.7 a. A larger proportion of the subjects did not take a college science course because the width of the No bars is larger than the width of the Yes bars. b. The proportion of those who had taken a college science class that rated their health as excellent is much larger (looks a little less than double) than the proportion of those who had not taken a college science class that rated their health as excellent. We can see this in the heights of the bottom bars in the graph. c. The proportion of those who had taken a college science class that rated their health as good is about the same as the proportion of those who had not taken a college science class that rated their health as good. We can see this in the very similar heights of the solid bars that are second from the bottom in the graph. d. The proportion of those who had not taken a college science class that rated their health as poor is much larger (roughly double) than the proportion of those who had taken a college science class that rated their health as poor. We can see this in the heights of the top bars in the graph.
P.A.8 a. Right-handers are more likely to have allergies. b. Left-handers are more likely to have allergies. c. Overall, left-handers are more likely to have allergies. Combining the percentages with allergies for the left-handers of each sex, it looks like at least 50% of them have allergies. (The average of 33.3% and 64.3% in the graph is 48.8%, but it looks like there are more males that are left-handed than females, so that percentage will be a bit larger.) And clearly less than 50% of the right-handers have allergies because the percentage for each sex is less than 50%.
P.A.9 a. Male proportion = 0.629, Female proportion = 0.680, so the Female proportion is higher. b. Male proportion = 0.337, Female proportion = 0.352, so the Female proportion is higher. c. See table.
Programs B and D   Male applicant   Female applicant   Total
Accepted                489               149            638
Not accepted            478               251            729
Total                   967               400          1,367
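The combined acceptance proportions for programs B and D follow directly from the counts in this table, and comparing them with the within-program proportions from parts a and b shows whether the direction of the comparison reverses. The sketch below is one way to check this; the dictionary layout and names are illustrative assumptions.

```python
# Counts from the combined Programs B and D table.
accepted = {"male": 489, "female": 149}
not_accepted = {"male": 478, "female": 251}


def acceptance_proportion(sex):
    """Proportion of applicants of the given sex who were accepted."""
    return accepted[sex] / (accepted[sex] + not_accepted[sex])


combined = {sex: acceptance_proportion(sex) for sex in ("male", "female")}

# Within-program proportions as reported in parts a and b.
within = [
    {"male": 0.629, "female": 0.680},
    {"male": 0.337, "female": 0.352},
]

# Simpson's Paradox: one group leads within every program,
# but the other group leads once the programs are combined.
female_higher_within = all(p["female"] > p["male"] for p in within)
male_higher_combined = combined["male"] > combined["female"]

print(combined)
print("Direction reverses on aggregation:",
      female_higher_within and male_higher_combined)
```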
d. Male proportion = 0.506, Female proportion = 0.373, so the Male proportion is higher.
e. Yes, this is an example of Simpson’s Paradox because the Females had the higher acceptance proportion in each of programs B and D, but when the data were aggregated, the Males had the higher overall acceptance proportion.
P.A.10 a. Male proportion = 0.369, Female proportion = 0.341, so the Male proportion is higher. b. Male proportion = 0.278, Female proportion = 0.242, so the Male proportion is higher. c. See the table that follows.
Programs C and E   Male applicant   Female applicant   Total
Accepted                173               297            470
Not accepted            343               689          1,032
Total                   516               986          1,502
d. Male proportion = 0.335, Female proportion = 0.301, so the Male proportion is higher. e. No, this is not an example of Simpson’s Paradox because the Males had higher proportions of acceptances in both programs C and E, and when the data were aggregated the Males still had the higher proportion of acceptances.
P.A.11 In both individual years, Justice had higher batting averages (1995: 104/411 = 0.253, and 1996: 45/140 = 0.321) than did Jeter (1995: 12/48 = 0.250 and 1996: 183/582 = 0.314). However, when the data for the two years are combined, we see that Justice had a lower average (149/551 = 0.270) than did Jeter (195/630 = 0.310). This reversal in which player has the better average when the data are aggregated shows that this is an example of Simpson’s Paradox. It happens here because in the year when they both had lower averages, Jeter was at bat just a few times (48), but in the year when they both had higher averages, Justice was at bat relatively few times (140).
P.B.1 A.
P.B.2 C.
P.B.3 C.
P.B.4 A.
P.B.5 D.
P.B.6 C.
P.B.7 a. Height. b. Arm span and sex. c. Ethnicity and Age (but see also Inclusion Criterion), unknown. d. H.S. seniors.
P.B.8 Graph A is residuals based on just temperature because it is centered at 0 and the standard deviation is larger than that in Graph C. Graph B is the actual chirp rates because it is not centered at 0. Graph C is residuals based on both temperature and species because it is centered at 0 and the standard deviation is the smallest of the three.
P.B.9
a. –0.7931 + 0.118(20) = 1.567 chirps per second for Karschi and –0.458 + 0.118(20) = 1.902 chirps per second for Fultoni; the Fultoni cricket is chirping at a higher rate. b. The Fultoni cricket will be chirping at a higher rate because that equation has the larger of the two y-intercepts and the slopes of the two equations are the same. c. 1.946 – 1.902 = 0.044. d. The slope of 0.118 indicates that as temperature increases by 1°C, the predicted chirp rate of both crickets increases by 0.118 chirps per second.
P.B.10 a. 40.03°F. b. The slope of 0.2229 indicates that as the chirp rate increases by 1 chirp per minute, the predicted temperature increases by 0.2229°F.
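The P.B.9 predictions come from two parallel lines that share a slope and differ only in their intercepts. The sketch below reproduces the part a arithmetic under one stated assumption: the Karschi intercept is taken to be –0.7931, the value consistent with the reported prediction of 1.567 at 20°C. The function and dictionary names are illustrative only.

```python
SLOPE = 0.118  # chirps per second per degree Celsius, shared by both species

# Intercepts by species. The Karschi value is an assumption chosen to
# match the reported prediction of 1.567 chirps per second at 20 degrees.
INTERCEPTS = {"Karschi": -0.7931, "Fultoni": -0.458}


def predicted_chirp_rate(species, temp_c):
    """Predicted chirps per second for a species at a temperature in Celsius."""
    return INTERCEPTS[species] + SLOPE * temp_c


for species in INTERCEPTS:
    print(species, round(predicted_chirp_rate(species, 20), 3))

# Because the slopes are equal, the gap between the two species is the
# difference in intercepts at every temperature.
gap_at_20 = predicted_chirp_rate("Fultoni", 20) - predicted_chirp_rate("Karschi", 20)
gap_at_30 = predicted_chirp_rate("Fultoni", 30) - predicted_chirp_rate("Karschi", 30)
print(round(gap_at_20, 4), round(gap_at_30, 4))
```

The constant gap is the reason part b can be answered from the intercepts alone, without choosing a temperature.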