CONTENTS
Part One: Gathering and Exploring Data

Chapter 1: Statistics: The Art and Science of Learning from Data
Section 1.1: Using Data to Answer Statistical Questions ..... 1
Section 1.2: Sample Versus Population ..... 1
Section 1.3: Using Calculators and Computers ..... 3
Chapter Problems: Practicing the Basics ..... 4
Chapter Problems: Concepts and Investigations ..... 5
Chapter Problems: Student Activities ..... 5

Chapter 2: Exploring Data with Graphs and Numerical Summaries
Section 2.1: Different Types of Data ..... 7
Section 2.2: Graphical Summaries of Data ..... 8
Section 2.3: Measuring the Center of Quantitative Data ..... 14
Section 2.4: Measuring the Variability of Quantitative Data ..... 16
Section 2.5: Using Measures of Position to Describe Variability ..... 20
Section 2.6: Recognizing and Avoiding Misuses of Graphical Summaries ..... 25
Chapter Problems: Practicing the Basics ..... 26
Chapter Problems: Concepts and Investigations ..... 35
Chapter Problems: Student Activities ..... 39

Chapter 3: Association: Contingency, Correlation, and Regression
Section 3.1: The Association Between Two Categorical Variables ..... 41
Section 3.2: The Association Between Two Quantitative Variables ..... 45
Section 3.3: Predicting the Outcome of a Variable ..... 49
Section 3.4: Cautions in Analyzing Associations ..... 55
Chapter Problems: Practicing the Basics ..... 62
Chapter Problems: Concepts and Investigations ..... 72
Chapter Problems: Student Activities ..... 75

Chapter 4: Gathering Data
Section 4.1: Experimental and Observational Studies ..... 77
Section 4.2: Good and Poor Ways to Sample ..... 79
Section 4.3: Good and Poor Ways to Experiment ..... 81
Section 4.4: Other Ways to Conduct Experimental and Nonexperimental Studies ..... 82
Chapter Problems: Practicing the Basics ..... 84
Chapter Problems: Concepts and Investigations ..... 89
Chapter Problems: Student Activities ..... 91

Part Two: Probability, Probability Distributions, and Sampling Distributions

Chapter 5: Probability in Our Daily Lives
Section 5.1: How Probability Quantifies Randomness ..... 93
Section 5.2: Finding Probabilities ..... 94
Section 5.3: Conditional Probability ..... 98
Section 5.4: Applying the Probability Rules ..... 101
Chapter Problems: Practicing the Basics ..... 105
Chapter Problems: Concepts and Investigations ..... 112
Chapter Problems: Student Activities ..... 115

Chapter 6: Probability Distributions
Section 6.1: Summarizing Possible Outcomes and Their Probabilities ..... 117
Section 6.2: Probabilities for Bell-Shaped Distributions ..... 120
Section 6.3: Probabilities When Each Observation Has Two Possible Outcomes ..... 124
Chapter Problems: Practicing the Basics ..... 128
Chapter Problems: Concepts and Investigations ..... 135
Chapter Problems: Student Activities ..... 138

Chapter 7: Sampling Distributions
Section 7.1: How Sample Proportions Vary Around the Population Proportion ..... 139
Section 7.2: How Sample Means Vary Around the Population Mean ..... 142
Chapter Problems: Practicing the Basics ..... 145
Chapter Problems: Concepts and Investigations ..... 148
Chapter Problems: Student Activities ..... 151

Part Three: Inferential Statistics

Chapter 8: Statistical Inference: Confidence Intervals
Section 8.1: Point and Interval Estimates of Population Parameters ..... 153
Section 8.2: Constructing a Confidence Interval to Estimate a Population Proportion ..... 154
Section 8.3: Constructing a Confidence Interval to Estimate a Population Mean ..... 157
Section 8.4: Choosing the Sample Size for a Study ..... 160
Section 8.5: Using Computers to Make New Estimation Methods Possible ..... 162
Chapter Problems: Practicing the Basics ..... 163
Chapter Problems: Concepts and Investigations ..... 170
Chapter Problems: Student Activities ..... 174

Chapter 9: Statistical Inference: Significance Tests About Hypotheses
Section 9.1: Steps for Performing a Significance Test ..... 175
Section 9.2: Significance Tests About Proportions ..... 175
Section 9.3: Significance Tests About Means ..... 179
Section 9.4: Decisions and Types of Errors in Significance Tests ..... 182
Section 9.5: Limitations of Significance Tests ..... 184
Section 9.6: The Likelihood of a Type II Error and the Power of a Test ..... 185
Chapter Problems: Practicing the Basics ..... 187
Chapter Problems: Concepts and Investigations ..... 193
Chapter Problems: Student Activities ..... 196

Chapter 10: Comparing Two Groups
Section 10.1: Categorical Response: Comparing Two Proportions ..... 197
Section 10.2: Quantitative Response: Comparing Two Means ..... 200
Section 10.3: Other Ways of Comparing Means, Including a Permutation Test ..... 206
Section 10.4: Analyzing Dependent Samples ..... 209
Section 10.5: Adjusting for the Effects of Other Variables ..... 213
Chapter Problems: Practicing the Basics ..... 214
Chapter Problems: Concepts and Investigations ..... 223
Chapter Problems: Student Activities ..... 226

Part Four: Analyzing Association and Extended Statistical Methods

Chapter 11: Analyzing the Association Between Categorical Variables
Section 11.1: Independence and Dependence (Association) ..... 227
Section 11.2: Testing Categorical Variables for Independence ..... 229
Section 11.3: Determining the Strength of the Association ..... 232
Section 11.4: Using Residuals to Reveal the Pattern of Association ..... 234
Section 11.5: Fisher's Exact and Permutation Tests ..... 235
Chapter Problems: Practicing the Basics ..... 236
Chapter Problems: Concepts and Investigations ..... 241
Chapter Problems: Student Activities ..... 244

Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis
Section 12.1: Modeling How Two Variables Are Related ..... 245
Section 12.2: Inference About Model Parameters and the Association ..... 247
Section 12.3: Describing the Strength of Association ..... 251
Section 12.4: How the Data Vary Around the Regression Line ..... 254
Section 12.5: Exponential Regression: A Model for Nonlinearity ..... 256
Chapter Problems: Practicing the Basics ..... 258
Chapter Problems: Concepts and Investigations ..... 261
Chapter Problems: Student Activities ..... 267

Chapter 13: Multiple Regression
Section 13.1: Using Several Variables to Predict a Response ..... 269
Section 13.2: Extending the Correlation and R² for Multiple Regression ..... 273
Section 13.3: Inferences Using Multiple Regression ..... 274
Section 13.4: Checking a Regression Model Using Residual Plots ..... 276
Section 13.5: Regression and Categorical Predictors ..... 280
Section 13.6: Modeling a Categorical Response ..... 282
Chapter Problems: Practicing the Basics ..... 285
Chapter Problems: Concepts and Investigations ..... 289
Chapter Problems: Student Activities ..... 292

Chapter 14: Comparing Groups: Analysis of Variance Methods
Section 14.1: One-Way ANOVA: Comparing Several Means ..... 293
Section 14.2: Estimating Differences in Groups for a Single Factor ..... 296
Section 14.3: Two-Way ANOVA ..... 298
Chapter Problems: Practicing the Basics ..... 301
Chapter Problems: Concepts and Investigations ..... 306
Chapter Problems: Student Activities ..... 309

Chapter 15: Nonparametric Statistics
Section 15.1: Compare Two Groups by Ranking ..... 311
Section 15.2: Nonparametric Methods for Several Groups and for Matched Pairs ..... 313
Chapter Problems: Practicing the Basics ..... 315
Chapter Problems: Concepts and Investigations ..... 318
Chapter 1: Statistics: The Art and Science of Learning from Data
Section 1.1 Using Data to Answer Statistical Questions

1.1 Aspirin and heart attacks
a) Aspects of the study that have to do with design include the sample of 22,000 physicians, the randomization of the halves of the sample to the two groups (aspirin and placebo), and the plan to obtain percentages of each group that have heart attacks.
b) Aspects having to do with description include the actual percentages of the people in the sample who have heart attacks (i.e., 0.9% for those taking aspirin and 1.7% for those taking placebo).
c) Aspects that have to do with inference include the use of statistical methods to conclude that taking aspirin reduces the risk of having a heart attack.

1.2 Poverty and race
a) The aspects referring to description are the percentages of the 68,000 households (18.0% of whites, 37.5% of blacks, and 13.4% of Asians) who had incomes below the poverty level.
b) The statistical method that predicted that the percentage of all black households in the United States that had income below the poverty level was between 35.6% and 39.4% is an example of inference.

1.3 GSS and heaven
Yes, definitely: 64.6%; Yes, probably: 20.8%; No, probably not: 8.7%; No, definitely not: 5.9%

1.4 GSS and heaven and hell
a) Yes, definitely: 64.3%; Yes, probably: 20.8%; No, probably not: 8.8%; No, definitely not: 6.0%
b) Yes, definitely: 52.6%; Yes, probably: 20.3%; No, probably not: 14.8%; No, definitely not: 12.3%. The percentage of "yes, definitely" responses was higher for belief in heaven in 2008.

1.5 GSS for subject you pick
The results for this item will be different depending on the topic that you chose.
Section 1.2 Sample Versus Population

1.6 Description and inference
a) With description, we are summarizing a group of numbers. We can use description with either samples or populations. With inference, we use data from samples to make conclusions or predictions about populations. For example, if we ask a sample of adults how many pets they own, and take the mean number of pets, that number is a description. If we use that number to predict the mean number of pets owned by the whole population, the predicted mean (or the predicted range for the mean) would be an inference.
b) Descriptive statistics would be useful to summarize data from a population. With a census, it would be unwieldy to examine everyone's ages, for example, but it would be useful to know a mean age. Inferential statistics are not needed, however, because we already have information about the population; we don't need to predict it.

1.7 Censorship
a) The sample is the 3077 people who responded.
b) The population is all adults in the United States.
c) The statistic is the 23% of respondents who said antireligious books should be removed.

1.8 Concerned about global warming?
a) The sample is the set of polled Floridians. The population is the set of all adult Florida residents.
b) The percentages quoted are statistics since they are summaries of the sample.

1.9 Graduate school information
a) Each student in the program is a subject.
b) The sample is the students identified for an interview from the given program.
c) The population is all students in the program.

1.10 Is globalization good?
a) The samples are those people selected from each country to participate in the survey. The populations are all adults in Africa and all adults in North America.
b) These are statistics because they represent a summary of the sample data.
1.11 Graduating seniors' salaries
a) These are descriptive statistics. They are summarizing data from a population – all graduating seniors at a given school.
b) These analyses summarize data on a population – all graduating seniors at a given school; thus, the numerical summaries are best characterized as parameters.

1.12 At what age did women marry?
a) The mean age of 24.1 years for this sample is descriptive.
b) The historian estimates the age for the whole population of brides in early 19th century New England, estimating the average age to fall between 23.5 and 24.7. This is inferential.
c) The inference refers to the population of all New England brides between the years of 1800 and 1820.
d) The average of 24.1 years is based on a sample and is therefore a statistic.

1.13 Age pyramids as descriptive statistics
a) The bar graph for 1750 shows shorter and shorter bars as age increases indicating that there were few Swedish people who were old in 1750.
b) For every age range, the bars are much longer for both men and women in 2010 than in 1750.
c) The bars for women in their 70s and 80s in 2010 are longer than those for men of the same age in the same year.
d) The first manned space flight took place in 1961 so that people born during this era would fall in the 45–49 year old category. This is the largest five-year group for both men and women.

1.14 Gallup polls
Responses to this exercise will differ depending on the studies that students choose.
a) The descriptive statistic will be a summary of data, without any prediction or population estimate. It might be a mean rating for a given attitude, for example.
b) The inferential statistical analysis will have some kind of prediction or estimation; for example, the inferential statistic might include the margin of error for a mean, indicating that the population mean likely falls somewhere in a given range.

1.15 National service
a) Yes, the populations are the same in the two studies. For both, it's all students at your school.
b) It is very unlikely that you will choose the same 20 students.
c) Although it is most likely that the sample proportions will not be the same, they should be close to each other.

1.16 Samples vary less with more data
a) It would be more surprising to flip a coin 500 times and observe all heads.
b) As the sample size increases, the amount by which sample proportions tend to vary decreases. The estimates from larger samples, therefore, tend to be more accurate than estimates from smaller samples. When the coin is flipped just 5 times, it's easy to see that we could get a sample with all heads. However, when the number of flips is increased to 500, it is much more likely that the sample proportion is near the population proportion of 0.5. It would be extremely unlikely to observe very few heads or almost all heads in 500 flips of a fair coin.

1.17 Comparing polls
a) (1/√n) × 100% = (1/√1000) × 100% = 0.0316 × 100% = 3.16% (rounds to 3.2%)
b) The first four polls are all within the margin of error; however, Rand favored Obama slightly, and Fox underestimated Obama's margin. Generally, the polls are fairly accurate.

1.18 Margin of error and n
a) (1/√n) × 100% = (1/√100) × 100% = 0.1 × 100% = 10%, which suggests that between 50% and 70% of Americans favored offshore drilling as a means of reducing U.S. dependence on foreign oil.
b) (1/√n) × 100% = (1/√400) × 100% = 0.05 × 100% = 5%, which suggests that between 55% and 65% of Americans favored offshore drilling as a means of reducing U.S. dependence on foreign oil.
c) (1/√n) × 100% = (1/√1600) × 100% = 0.025 × 100% = 2.5%, which suggests that between 57.5% and 62.5% of Americans favored offshore drilling as a means of reducing U.S. dependence on foreign oil. As n increases, the sample becomes a more accurate reflection of the population, and the margin of error decreases.

1.19 Smoking cessation
a) iii
b) Yes. Because the employees were assigned to treatments randomly, the study provides us with convincing evidence that the difference was due to the effect of the financial incentive.
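The approximate margin of error used in Exercises 1.17 and 1.18 (and again in 1.26 and 1.34) is (1/√n) × 100%. The text works these out by hand or with software; the short Python sketch below is simply one way to check the arithmetic — the function name and the list of sample sizes are illustrative choices, not part of the exercise.

```python
from math import sqrt

def approx_margin_of_error(n):
    """Approximate margin of error, in percentage points, for a sample of size n."""
    return 100 / sqrt(n)

# Sample sizes used in Exercises 1.17 and 1.18
for n in (100, 400, 1000, 1600):
    print(f"n = {n:>4}: margin of error is about {approx_margin_of_error(n):.1f}%")
# n =  100: about 10.0%    n =  400: about 5.0%
# n = 1000: about 3.2%     n = 1600: about 2.5%
```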
Section 1.3 Using Calculators and Computers

1.20 Data file for friends
The results for this exercise will be different for each person who does it. The data files, however, should all look like this:

Friend   Characteristic 1   Characteristic 2
1
2
3
4

For each friend, you'll have a number or label under Characteristics 1 and 2. For example, if you asked each friend for gender and hours of exercise per week, the first friend might have m (for male) under Characteristic 1, and 6 (for hours exercised per week) under Characteristic 2.

1.21 Shopping sales data file

Customer   Clothes   Sporting goods   Books   Music CDs
1          $49       $0               $0      $16
2          $0        $0               $0      $0
3          $0        $0               $0      $0
4          $0        $0               $92     $0
5          $0        $0               $0      $0

1.22 Sample with caution
A sample of individuals with children who read the Ann Landers column is not a random sample of individuals with children because every member of the population does not have the same chance of being in the sample. Many individuals with children may not read Ann Landers while others who do read the column may choose not to participate in the survey. The feelings of those who choose to participate usually are not representative of the general population. In general, one should not rely much on the information contained in such samples.

1.23 Create a data file with software
Your MINITAB data (from Exercise 1.21) will be in the following format, although it will reside in the cells of the MINITAB worksheet.

Customer   Clothes   Sporting goods   Books   Music CDs
1          49        0                0       16
2          0         0                0       0
3          0         0                0       0
4          0         0                92      0
5          0         0                0       0

1.24 Use a data file with software
See the solution for Exercise 1.21 for the format of the data in MINITAB.
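Exercises 1.21 and 1.23 build the same small worksheet in MINITAB. As an aside, an equivalent data file can be created in Python with pandas; the sketch below uses the values from Exercise 1.21, and the file name shopping.csv is only an example.

```python
import pandas as pd

# Each row is one customer; each column is the amount spent in a category.
shopping = pd.DataFrame(
    {
        "Customer": [1, 2, 3, 4, 5],
        "Clothes": [49, 0, 0, 0, 0],
        "Sporting goods": [0, 0, 0, 0, 0],
        "Books": [0, 0, 0, 92, 0],
        "Music CDs": [16, 0, 0, 0, 0],
    }
)
print(shopping)
shopping.to_csv("shopping.csv", index=False)  # save the worksheet for later use
```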
1.25 Simulate with the Sampling Distribution for the Sample Proportion web app
a) These will be different each time this exercise is completed.
b) Regardless of the specific graphs constructed in (a), you will see that the amounts by which sample percentages tend to vary get smaller as the sample size n gets larger.
c) The practical implication of this is that larger sample sizes tend to provide more accurate estimates of the true population percentage value.

1.26 Margin of error
a) Answers will vary.
b) (1/√n) × 100% = (1/√1000) × 100% = 0.0316 × 100% = 3.16% (rounds to 3%)
c) Answers will vary.
d) Answers will vary.

1.27 Ebola outbreaks
The answer to this problem is based on a random process. This leads to potentially different answers each time it is performed. The binomial distribution (see Section 6.3) says that 14 or fewer people who died should occur in only about 1 in 100 simulations, so most students will likely not see any of these situations.
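Exercises 1.25–1.27 rely on the text's web apps for simulation. A rough Python stand-in for the Exercise 1.25 simulation is sketched below; the population proportion of 0.5, the number of repetitions, and the sample sizes are arbitrary choices made only for illustration.

```python
import random

def simulate_sample_proportions(n, p=0.5, repetitions=10_000):
    """Simulate the sample proportion of successes for many samples of size n."""
    proportions = []
    for _ in range(repetitions):
        successes = sum(random.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

for n in (10, 100, 1000):
    props = simulate_sample_proportions(n)
    spread = max(props) - min(props)
    print(f"n = {n:>4}: simulated sample proportions ranged over about {spread:.3f}")
# The spread of the simulated sample proportions shrinks as n grows,
# which is the point of Exercise 1.25(b).
```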
Chapter Problems: Practicing the Basics 1.28 UW Student survey a) The population is the entire UW student body of 40,858. The sample is the 100 students who were asked to complete the questionnaire. b) This value would not necessarily equal the value for the entire population of UW students. It is quite possible that the sample of 100 is not exactly representative of the whole student body. This percentage is only an estimate of the percentage of all students who would respond this way. It is unlikely that any single sample of 100 would have a percentage that was exactly the percentage of the entire population. c) The numerical summary is a sample statistic because it only summarizes for a sample, not for a population. 1.29 Euthanasia a) The population is all American adults. b) The sample data are summarized by a proportion, 0.598. c) The population proportion who would commit suicide. 1.30 Sleep disorders among college students It is very likely that between 25% and 29% of students are at risk for at least one sleep disorder. 1.31 Breaking down Brown versus Whitman a) The results summarize sample data because not every voter in the 2010 California gubernatorial election was polled. b) The percentages reported here are descriptive in that they describe the exact percentages of the sample polled who were Democrat and voted for Brown, who were Republican and voted for Brown and who were Independent and voted for Brown. c) The inferential aspect of this analysis is that the exit poll results were used to predict what percentage of each of the three parties (Democrat, Republican and Independent) voted for Brown in the 2010 California gubernatorial election. The margins of error give a likely range for the population percentages for each of the three parties. 1.32 Online learning a) The sample is the 100 students surveyed. The population is all students in this school. b) (i) Descriptive statistics would give us information about the preferences of the 100 students in the sample. (ii) Inferential statistics allow us to draw a conclusion about the preferences of the student body.
1.33 Marketing study
For the study on the marketing of digital media, the population is all Facebook users, and the sample is the 1000 Facebook users to whom the ad was displayed. Example 5 suggests that we might determine that the average sales per person equaled $0.90. This would be a descriptive statistic in that it describes the average sales per person in the sample of 1000 potential customers. If one were to use this information to make a prediction about the population, this would be an inferential statistic.

1.34 Support of labor unions
a) (1/√n) × 100% = (1/√1540) × 100% = 0.025 × 100% = 2.5%.
b) Between 50.5% and 55.5%
c) ii. Inferential statistics

1.35 Multiple choice: Use of inferential statistics?
The best answer is (c).

1.36 True or false?
False: We often want to describe the sample AND make inferences about the population.
Chapter Problems: Concepts and Investigations 1.37 Statistics in the news If your article has numbers that summarize for a given group (sample or population), it’s using descriptive statistics. If it uses numbers from a sample to predict something about a population, it’s using inferential statistics. 1.38 What is statistics? Answers will vary. 1.39 Surprising suicide data? The likelihood of getting this result is extremely small. 1.40 Create a data file See solution for Exercise 1.23 for format of data in MINITAB.
Chapter Problems: Student Activities 1.41 Getting to know the class Answers will vary.
Chapter 2: Exploring Data with Graphs and Numerical Summaries
Section 2.1 Different Types of Data

2.1 Categorical/quantitative difference
a) Categorical variables are those in which observations belong to one of a set of categories, whereas quantitative variables are those on which observations are numerical.
b) An example of a categorical variable is religion. An example of a quantitative variable is temperature.

2.2 U.S. married-couple households
The variable summarized is categorical. The variable is type of U.S. married-couple households, and there are four types: traditional, dual-income with children, dual-income with no children, and other. These types are the categories.

2.3 Identify the variable type
a) quantitative
b) categorical
c) categorical
d) quantitative

2.4 Categorical or quantitative?
a) categorical
b) quantitative
c) categorical
d) quantitative

2.5 Discrete/continuous
a) A discrete variable is a quantitative variable for which the possible values are separate values such as 0, 1, 2, …. A continuous variable is a quantitative variable for which the possible values form an interval.
b) Example of a discrete variable: the number of children in a family (a given family can't have 2.43 children). Example of a continuous variable: temperature (we can have a temperature of 48.659).

2.6 Discrete or continuous?
a) continuous
b) discrete
c) continuous
d) discrete

2.7 Discrete or continuous 2
a) continuous
b) discrete
c) discrete
d) continuous

2.8 Number of children
a) The variable, number of children, is quantitative.
b) The variable, number of children, is discrete.
c)
No. children     0      1      2      3      4      5      6      7     8+
Count          521    323    524    344    160     77     30     19     22
Proportion   0.258  0.160  0.259  0.170  0.079  0.038  0.015  0.009  0.011
Percentage    25.8   16.0   25.9   17.0    7.9    3.8    1.5    0.9    1.1

2.9 Fatal Shark Attacks
a)
Location         Count   Proportion   Percentage
Florida              2        0.032          3.2
Hawaii               2        0.032          3.2
California           4        0.063          6.3
Australia           15        0.238         23.8
South Africa        13        0.206         20.6
Reunion Island       6        0.095          9.5
Brazil               4        0.063          6.3
Bahamas              6        0.095          9.5
Other               11        0.175         17.5
b) Australia is the modal category.
c) The regions with most frequent fatal shark attacks are Australia and South Africa.
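The proportions and percentages in the Exercise 2.9 table come from dividing each location's count by the total number of fatal attacks. A small Python sketch of that calculation, using the counts listed above, is shown here.

```python
# Counts of fatal shark attacks by location (from the Exercise 2.9 table)
counts = {
    "Florida": 2, "Hawaii": 2, "California": 4, "Australia": 15,
    "South Africa": 13, "Reunion Island": 6, "Brazil": 4,
    "Bahamas": 6, "Other": 11,
}
total = sum(counts.values())  # 63 fatal attacks in all

print(f"{'Location':<15}{'Count':>6}{'Proportion':>12}{'Percentage':>12}")
for location, count in counts.items():
    proportion = count / total
    print(f"{location:<15}{count:>6}{proportion:>12.3f}{100 * proportion:>12.1f}")
```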
Section 2.2 Graphical Summaries of Data

2.10 Generating Electricity
a) [Bar graph titled "Electricity Generation": Percent (0–40) of electricity generated by each Source — Coal, Natural Gas, Nuclear, Hydropower, Other Renewable, Other Nonrenewable — with Coal the tallest bar.]
b) Sketching a bar chart would be easier. Sketching the precise areas corresponding to the percentages is more challenging in a pie chart. c) It is straightforward to judge the relative sizes when comparing the bars corresponding to the percentages. d) Coal is the modal category. 2.11 What do alligators eat? a) Primary food choice is categorical. b) The modal category is “fish.” c) Approximately 43% of alligators ate fish as their primary food choice. d) This is an example of a Pareto chart, a chart that is organized from most to least frequent choice. 2.12 Weather stations a) The slices of the pie portray categories of a variable (i.e., regions). b) The first number is the frequency, the number of weather stations in a given region. The second number is the percentage of all weather stations that are in this region. c) It is easier to identify the modal category using a bar graph than using a pie chart because we can more easily compare the heights of bars than the slices of a piece of pie. For example, in this case, the slices for Midwest and West look very similar in size, but it would be clear from a bar graph that West was taller in height than Midwest. 2.13 France is most popular holiday spot a) Country visited is categorical. b) A Pareto chart would make more sense because it allows the viewer to easily locate the categories with the highest and lowest frequencies. c) A dot plot or stem-and-leaf plot do not make sense because the data are categorical; these two types of plots are used with quantitative data (and also with data that have relatively few observations).
2.14 Pareto chart for fatal shark attacks
[Two bar graphs of the percentage of shark fatalities by Location (Florida, Hawaii, California, Australia, South Africa, Reunion Island, Brazil, Bahamas, Other): (i) with the locations in alphabetical order and (ii) as a Pareto chart, with the bars ordered from most frequent to least frequent location.]
With a Pareto chart, it is straightforward to identify the few regions with the largest number of fatal shark attacks.

2.15 Sugar dot plot
a) The minimum sugar value is zero grams, and the maximum is 18 grams.
b) The sugar outcome that occurs most frequently is called the mode. For this data set there are five modes: three, four, eleven, twelve and fourteen grams.

2.16 Spring break hotel prices
a)
1 | 24677999
2 | 133445
3 | 1338
b)
1 | 24
1 | 677999
2 | 13344
2 | 5
3 | 133
3 | 8
The plot with split stems gives a clearer picture of the shape of the distribution.
c) Most hotels charge between $150 and $250 per night, with a few charging more. The distribution of prices is right-skewed.
[Histogram titled "Histogram of Hotel Price": Frequency (0–6) of hotel prices, with price intervals running from $100 to $400.]
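For Exercise 2.16(c), the text shows a MINITAB histogram. A comparable plot can be produced with matplotlib; the price values below are simply read off the stem-and-leaf display in part (a), under the assumption that stems are hundreds and leaves are tens of dollars.

```python
import matplotlib.pyplot as plt

# Hotel prices (in dollars) as read from the stem-and-leaf plot in part (a)
prices = [120, 140, 160, 170, 170, 190, 190, 190,
          210, 230, 230, 240, 240, 250,
          310, 310, 330, 380]

plt.hist(prices, bins=range(100, 450, 50), edgecolor="black")
plt.xlabel("Hotel Price ($)")
plt.ylabel("Frequency")
plt.title("Histogram of Hotel Price")
plt.show()  # right-skewed: most prices fall between roughly $150 and $250
```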
2.17 Graphing exam scores
a) There are 33 students in the class; the minimum score is 65 and the maximum is 98.
b) [Dot plot titled "Dotplot of Exam Scores," with the exam scores plotted along an axis running from 65 to 95.]
c) [Histogram titled "Histogram of Exam Scores": Frequency (0–14) of exam scores, with score intervals running from 60 to 100.]

2.18 Fertility rates
a)
1 | 3333445677778899
2 | 04
A disadvantage of this plot is that it is too compact, making it difficult to visualize where the data fall.
b)
1 | 333344
1 | 5677778899
2 | 04
c) [Histogram titled "Histogram of Fertility": Frequency (0–9) of fertility rates, with rate intervals running from about 1.1 to 2.6.]
2.19 Split Stems a) smallest 0 g, largest 18 g b) 10 g, 11 g, 11 g c) Six cereals have less than 5 grams of sugar with 0 g, 1 g, 3 g, 3 g, 4 g, and 4 g. 2.20 Histogram for sugar a) –1 to 1,1 to 3, 3 to 5, 5 to 7, 7 to 9, 9 to 11, 11 to 13, 13 to 15, 15 to 17, and 17 to 19 b) The distribution is bimodal; child cereals, on average, have more sugar than adult cereals have. c) The dot and stem-and-leaf plots allow us to see all the individual data points. d) The relative differences among bars would remain the same. 2.21 Shape of the histogram a) Assessed value of houses in a large city – skewed to the right (a long right tail) because of some very expensive homes. b) Number of times checking account overdrawn in the past year for the faculty at the local university – skewed to the right because of the few faculty who overdraw frequently. c) IQ for the general population – symmetric because most would be in the middle, with some higher and some lower; there is no reason to expect more to be higher or lower (particularly because IQ is constructed as a comparison to the general population’s “norms”). d) The height of female college students – symmetric because most would fall in the middle, going down to a few short students and up to a few tall students. 2.22 More shapes of histograms a) The scores of students (out of 100 points) on a very easy exam in which most score perfectly or nearly so, but a few score very poorly – skewed to the left because of the few who score poorly. b) The weekly church contribution for all members of a congregation, in which the three wealthiest members contribute generously each week – skewed to the right because of the few wealthy members’ contributions. c) Time needed to complete a difficult exam (maximum time is 1 hour) – skewed to the left because most take almost or all of the whole time, whereas a few finish very quickly. d) Number of music CDs (compact discs) owned, for each student in your school – skewed to the right because of a few students’ huge CD collections.
2.23 Gestational Period
a) [Histogram of gestational period: Frequency (0–9), with intervals running from 0 to 700 days.]
b) The elephant, with a gestational period of 624 days, is unusual.
c) The distribution is right-skewed.
d) Neither of the two histograms accurately summarizes the distribution. The one with 4 intervals is too coarse, the one with 30 intervals too fine. [Two additional histograms of gestational period: one with only 4 wide intervals (0 to 640 days) and one with 30 narrow intervals (0 to 700 days).]
2.24 How often do students read the newspaper?
a) This is a discrete variable because the value for each person would be a whole number. One could not read a newspaper 5.76 times per week, for example.
b) (i) The minimum response is zero. (ii) The maximum response is nine. (iii) Two students did not read the newspaper at all. (iv) The mode is three.
c) This distribution is unimodal and somewhat skewed to the right.

2.25 Blossom widths
a) The distribution is slightly right-skewed (or roughly symmetric). Most blossoms have a width between 3.2 and 3.6 in. There is one blossom with an unusually small width for that species of less than 2.4 in.
b) The distribution is left-skewed. Most blossoms have a width between 2.8 and 3.2 in.
c) (6 + 15 + 24)/50 = 0.90, or 90%
d) No. We don't know how many blossoms in the interval from 2.8 to 3.2 in. are actually wider than 3 in.
2.26 Central Park temperatures
a) The distribution is somewhat skewed to the left.
b) A time plot connects the data points over time to show time trends.
c) A histogram shows the number of observations at each level more easily than does the time plot. We also can see the shape of the distribution from the histogram but not from the time plot.

2.27 Is whooping cough close to being eradicated?
a) One can see in the time plot below that after an initial slight increase, there was a sharp and steady decrease in incidence of whooping cough starting around 1940. The decrease leveled off starting around 1960. These data suggest that the whooping cough vaccination was proving effective in reducing the incidence of whooping cough.
[Time plot (MINITAB scatterplot) of Rate per 100,000 (0–160) versus Year, 1920–1970.]
b) The incidence rate stayed low until about 2000, after which a sharp increase can be observed. No, the United States is not close to eradicating whooping cough. Potential reasons for this include fewer people deciding to get vaccinated and less efficient vaccinations.
c) A histogram would not address this question because it does not show the rates for each year; we would not be able to see changes over time.

2.28 Warming in Newnan, GA?
Overall, the time plot (below) does seem to show a decrease in temperature over time.
[Time plot (MINITAB scatterplot) of Temperature (58–66) versus Year, 1900–2000.]
Section 2.3 Measuring the Center of Quantitative Data

2.29 Median versus mean
a) Median (The distribution would be right-skewed.)
b) Median (The distribution would be left-skewed.)
c) Mean (The distribution would be symmetric.)

2.30 More median versus mean
a) Median (The distribution would be right-skewed.)
b) Mean (The distribution would be symmetric.)
c) Median (The distribution would be left-skewed.)

2.31 More on CO2 emissions
a) Mean: x̄ = Σx/n = (8.0 + 5.3 + 1.8 + 1.7 + 1.2 + 0.8 + 0.6 + 0.5 + 0.4 + 0.4)/10 = 20.7/10 = 2.07
Median: Find the middle value, in the (n + 1)/2 = (10 + 1)/2 = 5.5th position of the ordered data:
0.4, 0.4, 0.5, 0.6, 0.8, 1.2, 1.7, 1.8, 5.3, 8.0
The median is (0.8 + 1.2)/2 = 1.
b) Comparing absolute emission values for nations with different population sizes might be misleading because nations with larger populations tend to have larger total emissions. When viewed per capita, a different picture might emerge.

2.32 Resistance to an outlier
a) The median for all three data sets is ten. The values for all three sets of observations are already arranged in numerical order, and the middle number for each is 10.
b) Set 1: x̄ = Σx/n = (8 + 9 + 10 + 11 + 12)/5 = 50/5 = 10
Set 2: x̄ = (8 + 9 + 10 + 11 + 100)/5 = 138/5 = 27.6
Set 3: x̄ = (8 + 9 + 10 + 11 + 1000)/5 = 1038/5 = 207.6
c) As the highest value becomes more and more of an extreme outlier, the median is unaffected, whereas the mean increases as the outlier becomes more extreme.

2.33 Income and health insurance
The distributions for both will be skewed to the right because the mean is much larger than the median.

2.34 Labor dispute
Management would want to use the mean because it would be skewed right by the outliers – the few members of management who make a whole lot of money. The mean income would be higher because of the outliers. The workers would prefer the median because it is not affected by the large outliers. It is a more accurate measure of the actual typical income.

2.35 Cereal sodium
The moderate skewness to the left causes the mean to be lower than the median.

2.36 Center of plots
a) The mean and median would be the same for the dot plots to the middle and to the right because the distributions are symmetric.
b) The distribution to the left is skewed to the right, and the mean would be higher than the median would be. The mean would be pulled toward the higher, atypical values.
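As a quick check of the Exercise 2.31 arithmetic, the same mean and median can be computed with Python's statistics module; the per-capita CO2 values are the ten listed above.

```python
import statistics

co2 = [8.0, 5.3, 1.8, 1.7, 1.2, 0.8, 0.6, 0.5, 0.4, 0.4]

print(statistics.mean(co2))    # 2.07
print(statistics.median(co2))  # 1.0, the average of the 5th and 6th ordered values
```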
2.37 Public transportation – center
a) The mean is 2, the median is 0, and the mode is 0. Thus, the average score is 2, the middle score is 0 (indicating that the mean is skewed by outliers), and the most common score also is 0.
Mean: x̄ = Σx/n = (0 + 0 + 4 + 0 + 0 + 0 + 10 + 0 + 6 + 0)/10 = 20/10 = 2
Median: middle score of 0, 0, 0, 0, 0, 0, 0, 4, 6, 10
Mode: the most common score is zero.
b) Now the mean is 10, but the median is still 0.
Mean: x̄ = Σx/n = (0 + 0 + 4 + 0 + 0 + 0 + 10 + 0 + 6 + 0 + 90)/11 = 110/11 = 10
Median: middle score of 0, 0, 0, 0, 0, 0, 0, 4, 6, 10, 90
The median is not affected by the magnitude of the highest score, the outlier. Because there are so many zeros, even though we've added one score, the median remains zero. The mean, however, is affected by the magnitude of this new score, an extreme outlier.

2.38 Public transportation – outlier
a) The mean versus median applet confirms that the median is not affected by the magnitude of the highest score. Because there are so many zeros, even though we've added one score, the median remains zero. The mean, however, is affected by the magnitude of this new score, an extreme outlier.
b) The applet demonstrates that the outlier has a weaker effect when there are more scores near the original mean.

2.39 Baseball salaries
There are a few valuable players who receive exorbitant salaries, whereas the typical player is paid much less (although still a lot by most people's standards!). The very high salaries of the few affect the mean, but not the median.

2.40 More baseball salaries
Answers will vary.

2.41 European fertility
a) The median fertility rate is 1.7. Thus, about half of the countries listed have mean fertility rates at or below 1.7 with the remaining countries having fertility rates above 1.7.
b) The mean of the fertility rates is 1.65.
c) Since the population of adult women can vary greatly among the countries, it is necessary to calculate an overall fertility rate for the country in order to make comparisons. This rate is found by calculating the mean number of children per adult woman. The mean for a variable need not be one of the possible values for the variable. Although the number of children born to each adult woman is a whole number, the mean number of children born per adult woman need not be a whole number. For example, the mean number of children per adult woman is considerably higher in Mexico than in Canada.

2.42 Sex partners
a)
Number of partners   Number of respondents
0                    102
1                    233
2                    18
3                    9
4                    2
5                    1
Total                365

If the data are sorted from smallest to largest, the median is the number in the (365 + 1)/2 = 183rd position. Since 102 respondents answered 0 and 233 answered 1, the median is 1.
b) Mean: x̄ = Σxᵢ/n = [102(0) + 233(1) + 18(2) + 9(3) + 2(4) + 1(5)]/365 = 309/365 = 0.85
c) Since the total number of respondents is 365, the median is still the value in the 183rd place when the data are sorted from smallest to largest. Since 233 respondents gave an answer of 1, the median is still 1. However, the value of the mean changes:
Mean: x̄ = Σxᵢ/n = [0(0) + 233(1) + 18(2) + 9(3) + 2(4) + 103(5)]/365 = 819/365 = 2.2

2.43 Marriage statistics for 20–24-year-olds
a) Women: The mean is 0.274, the median is 0.
x̄ = Σxᵢ/n = [7350(0) + 2587(1) + 80(2)]/10,017 = 2747/10,017 = 0.274
The median is the middle score. With 10,017 scores, the median is the score in the 5009th position. Thus, the median is 0.
Men: The mean is 0.161, the median is 0.
x̄ = Σxᵢ/n = [8418(0) + 1594(1) + 10(2)]/10,022 = 1614/10,022 = 0.161
The median is the middle score. With 10,022 scores, the median is the score between the 5011th and 5012th positions. Thus, the median is 0.
b) Using the medians, it seems that there is no difference. Using the mean, in this age group, women have, on average, been married more often.

2.44 Knowing homicide victims
a) The mean is 0.16.
x̄ = Σxᵢ/n = [3944(0) + 279(1) + 97(2) + 40(3) + 23(4.5)]/4383 = 696.5/4383 = 0.16
b) The median is the middle score. With 4383 scores, the median is the score in the 2192nd position. Thus, the median is 0.
c) The median would still be 0, because there are still 2200 people who gave 0 as a response. The mean would now be 1.95.
x̄ = Σxᵢ/n = [2200(0) + 279(1) + 97(2) + 40(3) + 1767(4.5)]/4383 = 8544.5/4383 = 1.95
d) The median is the same for both because the median ignores much of the data. The data are discrete; hence, a high proportion of the data falls at only one or two values. The mean is better in this case because it uses the numerical values of all of the observations, not just the ordering.

2.45 Accidents
In this case, the mean is likely to be more useful because it uses the numerical values of all of the observations, not just the ordering. Because so many people would report 0 motor accidents, the median is not very useful. It ignores too much of the data.
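Exercises 2.42–2.44 all compute a mean from a frequency table by weighting each response value by its count. A short Python sketch of that calculation, using the Exercise 2.42 counts, follows.

```python
# Exercise 2.42(a): response value -> number of respondents
freq = {0: 102, 1: 233, 2: 18, 3: 9, 4: 2, 5: 1}

n = sum(freq.values())                                        # 365 respondents
total = sum(value * count for value, count in freq.items())   # 309
mean = total / n
print(f"n = {n}, sum = {total}, mean = {mean:.2f}")           # mean = 0.85
```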
Section 2.4 Measuring the Variability of Quantitative Data

2.46 Sick leave
a) The range is 6; this is the distance from the smallest to the largest observation. In this case, there are six days separating the fewest and most sick days taken (6 – 0 = 6).
b) The standard deviation is the typical distance of an observation from the mean (which is 1.25).
s² = Σ(x – x̄)²/(n – 1) = [(0 – 1.25)² + (0 – 1.25)² + … + (4 – 1.25)² + (6 – 1.25)²]/7 = 39.5/7 = 5.643
c) s = √s² = √5.643 = 2.38
The standard deviation of 2.38 indicates a typical number of sick days taken is 2.38 days from the mean of 1.25.
Redo (a) and (b).
a) The range is 60; this is the distance from the smallest to the largest observation. In this case, there are sixty days separating the fewest and most sick days taken (60 – 0 = 60).
b) The standard deviation is the typical distance of an observation from the mean (which is 8).
s² = Σ(x – x̄)²/(n – 1) = [(0 – 8)² + (0 – 8)² + … + (4 – 8)² + (60 – 8)²]/7 = 3104/7 = 443.43
s = √s² = √443.43 = 21.06
The standard deviation of 21.06 indicates a typical number of sick days taken is 21.06 days from the mean of 8. The range and mean both increase when an outlier is added.

2.47 Life expectancy
a) Upon examination of the data, the countries in Africa will have a larger standard deviation since the spread of the data is greater for this group than for the countries in Western Europe.
b) Western Europe:
x̄ = Σx/n = (81 + 80 + 80 + 81 + 80 + … + 82 + 82 + 83)/15 = 1220/15 = 81.33
s = √[Σ(x – x̄)²/(n – 1)] = √{[(81 – 81.33)² + … + (83 – 81.33)²]/(15 – 1)} = 1.05
Africa:
x̄ = Σx/n = (47 + 50 + 51 + 57 + … + 64 + 63 + 62 + 61)/16 = 914/16 = 57.125
s = √[Σ(x – x̄)²/(n – 1)] = √{[(47 – 57.125)² + … + (61 – 57.125)²]/(16 – 1)} = 5.18
Note that the standard deviation for the Western Europe group, 1.0 (rounded), is much smaller than for the Africa group, 5.2.

2.48 Life expectancy including Russia
We would expect the standard deviation to be larger since the value for Russia is significantly smaller than the rest of the group, adding additional spread to the data. The standard deviation including Russia is, in fact, 3.01.

2.49 Shape of home prices?
The most plausible value is $60,000. –$15,000 is not possible because a standard deviation cannot be negative. $1,000 and $1,000,000 are unlikely because they are too small or too big, respectively, for a typical deviation. One would not expect the typical deviation to be that far from the median for home prices.
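The sample standard deviations in Exercises 2.46 and 2.47 can be verified with Python's statistics module, which uses the same n – 1 divisor as the text. The eight sick-leave values below are inferred from the worked solution to Exercise 2.46 (six zeros plus 4 and 6); treat that list as an assumption rather than data quoted from the exercise itself.

```python
import statistics

# Sick-leave days for the eight employees in Exercise 2.46
# (six zeros plus 4 and 6, as implied by the worked solution)
days = [0, 0, 0, 0, 0, 0, 4, 6]

print(statistics.mean(days))      # 1.25
print(statistics.variance(days))  # 5.642857...  (sample variance, divisor n - 1)
print(statistics.stdev(days))     # 2.375...     (sample standard deviation)

# Replacing the largest value 6 with the outlier 60 inflates both measures:
days_outlier = [0, 0, 0, 0, 0, 0, 4, 60]
print(statistics.stdev(days_outlier))  # about 21.06
```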
2.50 Exam standard deviation
The most realistic value is 12. There are problems with all the others. –10: We can't have a negative standard deviation. 0: We know there is spread because the scores ranged from 35 to 98, so the standard deviation is not 0. 3: This standard deviation seems very small for this range. 63: This standard deviation is too large for a typical deviation. In fact, no score differed from the mean by this much.

2.51 Heights
a) According to the Empirical Rule, 68% of men would be within one standard deviation of the mean, between 71 – 1(3) = 68 and 71 + 1(3) = 74 inches. 95% of men would be within two standard deviations of the mean, between 71 – 2(3) = 65 and 71 + 2(3) = 77 inches. All or nearly all men would be within three standard deviations of the mean, between 71 – 3(3) = 62 and 71 + 3(3) = 80 inches.
b) The mean for women is lower than the mean for men. Because each gender's heights would tend to be closer to that gender's mean than to the overall mean, the standard deviation would be smaller when we compared them with the appropriate gender group than when we compared them to the overall group. We would not expect the overall distribution to be unimodal, but rather bimodal.

2.52 Histograms and standard deviation
a) (i) The sample on the right has the largest standard deviation since it is the most spread out. (ii) The sample in the middle has the smallest standard deviation since it has no spread.
b) The Empirical Rule is relevant only for the distribution on the left because that distribution is bell-shaped.

2.53 Female strength
According to the Empirical Rule, 68% of women would be able to lift within one standard deviation from the mean, between 79.9 – 1(13.3) = 66.6 and 79.9 + 1(13.3) = 93.2 pounds. 95% of women would be able to lift within two standard deviations from the mean, between 79.9 – 2(13.3) = 53.3 and 79.9 + 2(13.3) = 106.5 pounds. All or nearly all women would be able to lift within three standard deviations from the mean, between 79.9 – 3(13.3) = 40.0 and 79.9 + 3(13.3) = 119.8 pounds.

2.54 Female body weight
a) 95% of weights would fall within two standard deviations from the mean, between 133 – 2(17) = 99 and 133 + 2(17) = 167.
b) An athlete who is three standard deviations above the mean would weigh 133 + 3(17) = 184 pounds. This would be an unusual observation because typically all or nearly all observations fall within three standard deviations from the mean. In a bell-shaped distribution, this would likely be about the highest score one would obtain.

2.55 Shape of cigarette taxes
With a bell-shaped distribution, we expect scores to extend about three standard deviations from the mean in either direction. The lowest possible value of 0, however, is only (0 – 73)/48 = –1.52, or 1.52 standard deviations below the mean, and so the distribution likely is skewed to the right.

2.56 Empirical rule and skewed, highly discrete distribution
a)
x̄ = Σx/n = [8418(0) + 1594(1) + 10(2)]/10,022 = 1614/10,022 = 0.16
s = √[Σ(x – x̄)²/(n – 1)] = √{[8418(0 – 0.16)² + 1594(1 – 0.16)² + 10(2 – 0.16)²]/(10,022 – 1)} = 0.37
b)
                                                           Observations   Predicted by Empirical Rule
Within one standard deviation of the mean, between
0.16 – 1(0.37) = –0.21 and 0.16 + 1(0.37) = 0.53               84.0%              68%
Within two standard deviations of the mean, between
0.16 – 2(0.37) = –0.58 and 0.16 + 2(0.37) = 0.90               84.0%              95%
Within three standard deviations of the mean, between
0.16 – 3(0.37) = –0.95 and 0.16 + 3(0.37) = 1.27               99.9%              About 100%

There are more observations within one standard deviation of the mean and fewer within two standard deviations than would be predicted by the Empirical Rule.
c) The Empirical Rule is only valid when used with data from a bell-shaped distribution. This is not a bell-shaped distribution; rather, it is highly skewed to the right. Most observations have a value of 0, and hardly any have the highest value of 2.

2.57 How much TV?
These statistics suggest that this distribution is highly skewed toward the right for two main reasons. The mean is larger than the median, and the standard deviation is almost as large as the mean. In fact, the lowest possible value of 0 is only (0 – 3.09)/2.87 = –1.08, or 1.08 standard deviations below the mean.

2.58 How many friends?
a) The standard deviation is larger than the mean; in addition, the mean is higher than the median. In fact, the lowest possible value of 0 is only (0 – 7.4)/11.0 = –0.67, or 0.67 standard deviations below the mean. These situations occur when the mean and standard deviation are affected by an outlier or outliers. It appears that this distribution is skewed to the right.
b) The Empirical Rule does not apply to these data because they do not appear to be bell-shaped.

2.59 Judging skew using x̄ and s
The largest observation, 86, is less than one standard deviation above the mean of 70.4. Specifically, 86 is (86 – 70.4)/16.7 = 0.93 standard deviations above the mean. The smallest observation is 35, which is (35 – 70.4)/16.7 = –2.12, or 2.12 standard deviations below the mean. This distribution, therefore, is likely skewed to the left.
[Dot plot titled "Dotplot of Exam Scores," with the exam scores plotted along an axis running from 40 to 88.]
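The 84.0% and 99.9% figures in Exercise 2.56(b) come from counting how many of the 10,022 responses fall within each interval. A Python sketch of that count, using the frequency distribution from part (a), is shown below.

```python
# Exercise 2.56: response value -> count (8418 zeros, 1594 ones, 10 twos)
freq = {0: 8418, 1: 1594, 2: 10}
n = sum(freq.values())          # 10,022 responses
mean, s = 0.16, 0.37            # from part (a)

for k in (1, 2, 3):
    lower, upper = mean - k * s, mean + k * s
    inside = sum(count for value, count in freq.items() if lower <= value <= upper)
    print(f"within {k} standard deviation(s): {100 * inside / n:.1f}%")
# within 1: 84.0%   within 2: 84.0%   within 3: 99.9%
```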
2.60 Youth unemployment in the EU
a) [Bar graph of the youth unemployment rate (percent, axis from 0 to 30) for each country in the European Union; Greece and Spain show the highest rates.]
b) Mean = 11.1%, median = 10.1%, s = 5.6%
c) The distribution of the youth unemployment rate in the EU is skewed to the right, with two countries (Greece and Spain) showing an unemployment rate of more than 20%. The mean unemployment rate is 11.1% (median 10.1%). The variability in the unemployment rate is relatively large, with a standard deviation of 5.6%, but this may be inflated due to the two outliers and right-skewness of the distribution.

2.61 Create data with a given standard deviation
a) One possible answer: 30, 50, 80
b) One possible answer: 10, 50, 90.
c) The largest standard deviation results from two 0s and two 100s, with s = 57.74.
Section 2.5 Using Measures of Position to Describe Variability 2.62 Vacation days a) Median: Find the middle value of 13, 25, 26, 28, 34, 35, 37, 42. The median is 31, the average of the two middle values, 28 and 34. b) The first quartile is the median of 13, 25, 26, 28. The first quartile is 25.5, the average of the two middle values, 25 and 26. c) The third quartile is the median of 34, 35, 37, 42. The third quartile is 36, the average of the two middle values. d) 25% of countries have residents who take fewer than 25.5 vacation days, half of countries have residents who take fewer than 31 vacation days, and 75% of countries have residents who take fewer than 36 vacation days per year. The middle 50% of countries have residents who take an average of between 25.5 and 36 vacation days annually. 2.63 Youth unemployment a) The median is 10.15, the average of the 14th and 15th values, 10.0 and 10.3. In 2013, half of the European Union nations had an unemployment rate less than 10.15%. b) The first quartile is 7.15, the average of the 7th and 8th values, 7.0 and 7.3. In 2013, 75% of the European Union nations had an unemployment rate larger than 7.15% (or 25% had rate less than 7.15%). c) The third quartile is 13.05, the average of the 21st and 22nd values, 13.0 and 13.1. In 2013, the unemployment rate was larger than 13.05% for 25% of the European Union nations. d) The 10th percentile will be around 6% because Q1 = 7.15%.
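Exercise 2.62 finds the quartiles by splitting the ordered data at the median and taking the median of each half. That rule is easy to express directly in Python; note that library routines such as numpy.percentile use slightly different conventions and may not reproduce the textbook values exactly.

```python
import statistics

def quartiles_by_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves (textbook convention)."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[-half:]   # excludes the middle value when n is odd
    return statistics.median(lower), statistics.median(upper)

vacation_days = [13, 25, 26, 28, 34, 35, 37, 42]   # Exercise 2.62 data
q1, q3 = quartiles_by_halves(vacation_days)
print(statistics.median(vacation_days), q1, q3)    # 31.0 25.5 36.0
```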
2.64 Female strength
a) One fourth of the females had a maximum bench press less than 70 pounds, and one fourth had a maximum bench press greater than 90 pounds.
b) The mean and median are about the same, and the first and third quartiles are equidistant from the median. These are both indicators of a roughly symmetric distribution.
2.65 Female body weight
a) One quarter of the females had weight below 119 and one quarter had weight above 144.
b) The mean and median are about the same, and the first and third quartiles are approximately equidistant from the median. These are both indicators of a roughly symmetric distribution.
2.66 Ways to measure variability
a) The range is even more affected by an outlier than is the standard deviation. The standard deviation takes into account the values of all observations and not just the most extreme two.
b) With a very extreme outlier, the standard deviation will be affected both because the mean will be affected and because the deviation of the outlier (and its square) will be very large. The IQR would not be affected by such an outlier.
c) The standard deviation takes into account the values of all observations and not just the two marking 25% and 75% of observations.
2.67 Variability of cigarette taxes
a) (i) Q1, marking the lowest 25% of states, has a value of 36. Thus, 75% of states have cigarette taxes greater than 36 cents. (ii) Q3, marking the highest 25% of states, has a value of 100. Thus, 25% of states have cigarette taxes greater than $1.00.
b) The two values that demarcate the middle 50% are Q1 = 36 cents and Q3 = 100 cents (one dollar).
c) The interquartile range (IQR) is the difference between Q3 and Q1. IQR = Q3 – Q1 = 100 – 36 = 64 cents. For the middle 50% of state cigarette taxes, $0.64 is the distance between the largest and smallest cigarette tax amount.
d) With a bell-shaped distribution, we expect Q1 and Q3 to be roughly equidistant from the median, which is not the case here. The maximum value is also quite far from Q3. Thus, it appears that the distribution is skewed to the right.
2.68 Sick leave
a) The range is six; this is the distance from the smallest to the largest observation. In this case, there are six days separating the fewest and most sick days taken (6 – 0 = 6).
b) The interquartile range is the difference between Q3 and Q1. IQR = Q3 – Q1 = 2.
c) Redo (a) and (b). a) The range is sixty; this is the distance from the smallest to the largest observation. In this case, there are sixty days separating the fewest and most sick days taken (60 – 0 = 60). b) Q1, the median of all scores below the median, is still 0. Q3, the median of all scores above the median, is still 2 (the average of 0 and 4). The interquartile range remains the same: IQR = Q3 – Q1 = 2. The IQR is least affected by the outlier because it doesn't take the magnitudes of the two extreme scores into account at all, whereas the range and s do.
2.69 Infant mortality Africa
a) Q1 is the median of the lower half of the sorted data: 54, 63, 68, 76, 78, 79, 80. It is 76. Q3 is the median of the upper half of the sorted data: 81, 84, 96, 101, 110, 121, 154. It is 101.
b) IQR = Q3 – Q1 = 25. For the middle half of the infant mortality rates, the distance between the largest and smallest rates is 25.
2.70 Infant mortality Europe
Q1 is the median of the lower half of the sorted data: 3, 3, 3, 4, 4, 4, 4. It is 4. Q2, the median, is the average of the middle two data values, (4 + 4)/2 = 4. Q3 is the median of the upper half of the sorted data: 4, 4, 4, 4, 5, 5, 5. It is 4.
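For students checking Exercise 2.68 with software, the following Python sketch shows how an outlier affects the range and standard deviation but not the IQR. The data below are hypothetical sick-leave counts (not the textbook's actual data set); they are chosen only to be consistent with a minimum of 0 and a maximum of 6.

import statistics

days = [0, 0, 0, 1, 1, 2, 2, 4, 6]           # hypothetical original data
days_outlier = [0, 0, 0, 1, 1, 2, 2, 4, 60]  # same data with 6 replaced by 60

def summary(x):
    q1, q2, q3 = statistics.quantiles(x, n=4)   # quartiles
    return {"range": max(x) - min(x),
            "IQR": q3 - q1,
            "s": round(statistics.stdev(x), 2)}

print(summary(days))          # modest range, IQR, and s
print(summary(days_outlier))  # range and s jump; the IQR does not change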
2.71 Computer use
a) This five-number summary suggests that the distribution is skewed to the right. The distance between the minimum and the median is much smaller than the distance between the median and the maximum.
b) In this case, outliers would be those values more than 1273.5 points from the first and third quartiles: IQR = Q3 – Q1 = 1105 – 256 = 849 and 1.5(IQR) = 1.5(849) = 1273.5. The lower boundary: Q1 – 1.5(IQR) = 256 – 1273.5 = –1017.5. The upper boundary: Q3 + 1.5(IQR) = 1105 + 1273.5 = 2378.5. In the current example, the lowest score is 4, so there are no scores below –1017.5. The highest score, on the other hand, is 320,000, much higher than 2378.5. Thus, there are potential outliers according to this criterion.
2.72 Central Park temperature distribution revisited
a) We would expect it to be skewed to the left because the maximum is closer to the median than is the minimum.
b) Numbers are approximate: Minimum: 49.0, Q1: 52.5, Median: 53.5, Q3: 55.0, Maximum: 57.0. These approximations support the premise that the distribution is skewed to the left, if it is skewed at all. The median is closer to the maximum and Q3 than it is to the minimum and Q1.
2.73 Box plot for exam
The minimum, Q1, median, Q3, and maximum are used in the box plot. [Box plot of exam scores, with the score axis running from 60 to 100.]
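The 1.5(IQR) boundaries in Exercise 2.71 can be verified with a few lines of Python. The quartiles below are the ones reported in that solution.

q1, q3 = 256, 1105            # quartiles from Exercise 2.71
iqr = q3 - q1                 # 849
lower = q1 - 1.5 * iqr        # -1017.5
upper = q3 + 1.5 * iqr        # 2378.5
print(iqr, lower, upper)
print(320000 > upper)         # True: the maximum of 320,000 is a potential outlier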
2.74 Public transportation
a) Minimum: 0, Q1: 0, Median: 0, Q3: 4, Maximum: 10. [Box plot of miles per day, with the axis running from 0 to 10.]
2.74 (continued)
b) Q1 and the median share the same line in the box plot because so many employees have a score of zero that the middle score of the whole set of data is zero and the middle score of the lower half of the data also is zero.
c) There is no whisker because the minimum score also is zero. This situation resulted because there are so many people with the lowest score.
2.75 Energy statistics
a) Numbers are approximate: Minimum: 50, Q1: 130, Median: 160, Q3: 250, Maximum: 650. One country was a potential outlier, the one around 650.
b) We can know how far Italy was from the mean in terms of standard deviations by calculating its z-score. It is 0.47 standard deviations below the mean of 195. z = (x – x̄)/s = (139 – 195)/120 = –0.47
c) The U.S. is 1.16 standard deviations above the mean. z = (x – x̄)/s = (334 – 195)/120 = 1.16
2.76 European Union youth unemployment rates
a) In a box plot, Q1 = 7.15 and Q3 = 13.05 would be the outer edges of the box. 1.5(IQR) = 1.5(13.05 – 7.15) = 8.85. The whisker on the left would extend to the minimum 4.9, since it is larger than 7.15 – 8.85 = –1.7. The whisker on the right would extend to the value of 17.2 (Croatia), since it is the largest value below 13.05 + 8.85 = 21.9.
b) Greece (27.3) and Spain (26.4) have values larger than 21.9, so would be considered outliers.
c) Greece's score is 2.89 standard deviations above the mean, and thus, is not an outlier according to the three standard deviation criterion. z = (x – x̄)/s = (27.3 – 11.1)/5.6 = 2.89
d) A z-score of 0 indicates that the country's unemployment rate is zero standard deviations from the mean; hence, the unemployment rate is equal to the mean. In this case, a country with an unemployment rate of 11.1 would have a z-score of 0.
2.77 Air pollution
a) Finland's pollution is exactly one standard deviation above the mean pollution of all countries in the EU. z = (x – x̄)/s = (11.5 – 7.9)/3.6 = 1
b) Sweden's pollution is 0.64 standard deviation below the mean pollution of all countries in the EU. z = (x – x̄)/s = (5.6 – 7.9)/3.6 = –0.64
c) The United Kingdom's pollution is exactly equal to the mean pollution of all countries in the EU. z = (x – x̄)/s = (7.9 – 7.9)/3.6 = 0
2.78 Female heights
a) z = (x – x̄)/s = (56 – 65.3)/3.0 = –3.1
b) The negative sign indicates that the height of 56 inches is below the mean.
c) Because the height of 56 inches is more than three standard deviations from the mean, it is a potential outlier.
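The z-scores above follow one formula, so a short Python helper is enough to reproduce them. The numbers used here are the ones quoted in Exercise 2.75 (mean 195, standard deviation 120).

def z_score(x, mean, s):
    return (x - mean) / s

print(round(z_score(139, 195, 120), 2))  # Italy: -0.47
print(round(z_score(334, 195, 120), 2))  # U.S.:   1.16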
2.79 Hamburger sales
This z-score indicates that the sales for this day are more than three standard deviations above the mean, and thus, would be a potential outlier – in other words, an unusually good day. z = (x – x̄)/s = (2000 – 1165)/220 = 3.80
2.80 Florida students again
a) The distribution depicted in the box plot is skewed to the right. Most observations fall between about 0 and 15, but there are a few outliers representing very large values. Minimum: 0, Q1: 3, Median: 6, Q3: 10, Maximum: 37. [Box plot of TV hours, with the axis running from 0 to 40.]
b) Since IQR = 10 – 3 = 7 and 1.5(IQR) = 1.5(7) = 10.5, the 1.5(IQR) criterion would indicate that all data should fall between about 3 – 10.5 = –7.5 and 10 + 10.5 = 20.5. Because some data points fall beyond this range, it appears that there are potential outliers.
2.81 Females or males watch more TV?
Based on the Florida survey data, females tend to watch more TV. The median, Q1, and Q3 are higher for females than for males. [Side-by-side box plots of TV hours for females (f) and males (m).]
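A side-by-side box plot like the one described in Exercise 2.81 can be drawn with matplotlib. The two lists below are made-up illustrative values, not the Florida survey data; only the general approach is shown.

import matplotlib.pyplot as plt

tv_female = [2, 3, 5, 6, 8, 10, 14, 20, 37]   # hypothetical values
tv_male   = [0, 1, 2, 3, 5, 6, 8, 10, 25]     # hypothetical values

plt.boxplot([tv_female, tv_male])      # one box per group
plt.xticks([1, 2], ["f", "m"])         # label the groups on the x axis
plt.ylabel("TV hours per week")
plt.title("Boxplot of TV by gender")
plt.show()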
2.82 CO2 comparison
a) The two outliers for Central and South America have a value of roughly 12 metric tons.
b) The distributions would be skewed to the right. The median sits low in the box (pulled toward the first quartile), and the upper whisker stretches out more from the box to the maximum value than the lower whisker stretches to the minimum value.
2.82 (continued) c) The median CO2 emission is much larger in Europe than the median for Central and South America. The spread of the middle 50% of the distribution of emissions, as measured by the IQR, seems to be about the same for Europe and Central and South America. 75% of the distribution of emissions for Europe is higher than the lower 75% of the distribution of emissions for Central and South America. Overall, emissions are higher for Europe than for Central and South America.
Section 2.6 Recognizing and Avoiding Misuses of Graphical Summaries
2.83 Great pay (on the average)
a) The mean is $43,700 and the median is $9300.
b) It is misleading because the mean is so heavily influenced by the outlier (her own salary) that it is not a typical value. The median would be a much more accurate summary of these salaries.
2.84 Market share for food sales
a) One problem with this chart is that the percentages do not add up to 100. Second, the Tesco slice seems too large for 27.2%. A third problem is that contiguous colors are very similar. This increases the difficulty in easily reading this chart.
b) It would be easier to identify the mode with a bar graph because one would merely have to identify the highest bar.
2.85 Enrollment trends
a) This graph shows an overall decrease in enrollment in STEM majors at first, with what appears to be somewhat of a plateau toward the end of the time span. [Time series plot of the number of STEM majors, 2004–2012.]
b) This graph shows a gradual decrease over time in the percentage of students who are enrolled in STEM majors. [Time series plot of the percentage of enrolled students in STEM majors, 2004–2012.]
2.85 (continued)
c) The graphs in (a) and (b) tell us that although there are some fluctuations in the numbers of students enrolling in STEM majors over the years, there is a steady decrease in the percentage of enrolling students who are enrolled in STEM majors over the years. We cannot learn this from Figures 2.18 and 2.19.
2.86 Terrorism and war in Iraq
a) This graph is misleading. Because the vertical axis does not start at 0, it appears that six times as many people are in the "no, not" column than in the "yes, safer" column, when really it's not even twice as many.
b) With a pie chart, the area of each slice represents the percentage who fall in that category. Therefore, the relative sizes of the slices will always represent the relative percentages in each category.
2.87 BBC license fee
The 2013 projection is shown where the observation would be plotted for the year 2007, not 2013.
2.88 Federal government spending
The slices do not seem to have the correct sizes; for instance, the slice with 16% seems larger than the slice with 19%.
2.89 Bad graph
Answers will vary.
Chapter Problems: Practicing the Basics
2.90 Categorical or quantitative?
a) Number of children in family: quantitative
b) Amount of time in football game before first points scored: quantitative
c) Choice of major (English, history, chemistry, …): categorical
d) Preference for type of music (rock, jazz, classical, folk, other): categorical
2.91 Continuous or discrete?
a) Age of mother: continuous
b) Number of children in a family: discrete
c) Cooking time for preparing dinner: continuous
d) Latitude and longitude of a city: continuous
e) Population size of a city: discrete
2.92 Young non-citizens in the U.S.
a) Region of birth is categorical.
Noncitizens aged 18 to 24 in the United States
Region of Birth              Number (in thousands)   Percentage
Africa                       115                     115/2568 = 0.045 = 4.5%
Asia                         590                     590/2568 = 0.230 = 23.0%
Europe                       148                     148/2568 = 0.058 = 5.8%
Latin America & Caribbean    1666                    1666/2568 = 0.649 = 64.9%
Other                        49                      49/2568 = 0.019 = 1.9%
Total                        2568
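The percentages in the Exercise 2.92 table can be reproduced in Python directly from the counts given above.

counts = {"Africa": 115, "Asia": 590, "Europe": 148,
          "Latin America & Caribbean": 1666, "Other": 49}
total = sum(counts.values())                      # 2568
for region, n in counts.items():
    print(f"{region}: {n / total:.3f} ({100 * n / total:.1f}%)")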
2.92 (continued)
b) Mode. Most young noncitizens are from Latin America and the Caribbean.
c) Latin America and Caribbean, Asia, Europe, Africa, Other. One immediately sees that most young noncitizens are from two regions, Latin America and the Caribbean and Asia.
2.93 Cool in China
a) The variable being measured is the personality trait that defines "cool."
b) This is a categorical variable.
c) Because the data are categorical with unordered categories, we could use only the bar chart and the modal category.
2.94 Chad voting problems
a) We first locate the dot directly above 11.6% on the horizontal x axis. We then look at the vertical y axis across from this point to determine the label for that dot: optical scanning with a two-column ballot. This tells us that the over-vote was highest among those using optical scanning with a two-column ballot.
b) We first locate the dots above the lowest percentages on the x axis. We then determine the labels across from these dots on the y axis to determine the lowest two combinations: optical, one column, and votomatic, one column. Thus, the lowest over-voting occurred when voters had a ballot with only one column that was registered either using optical scanning or votomatic (manual punching of chads).
c) We could summarize these data further by using a bar for each combination: optical, one column; optical, two column; votomatic, one column; etc. For each bar, we could then plot the average over-vote of all counties in that category. To do this, we would need the exact percentages of each county in each category.
2.95 Number of children
a) A histogram would be most appropriate since the upper interval is 8+, which may contain values above 8.
b) The distribution is skewed to the right. [Histogram of number of children, 0 through 8, with frequencies up to about 600.]
2.96 Longevity
a)
0 |
0 | 57789
1 | 0011112234
1 | 9
2 | 23
2 |
3 | 0
3 | 5
2.96 (continued)
b) [Histogram of longevity, with intervals from about 4 to 36 years.]
c) The distribution of longevity is right-skewed. Most animals live to be between 5 and 15 years old.
2.97 Newspaper reading
a) [Dotplot of times per week reading a newspaper, 0 through 8.]
b)
0 | 00
1 | 0000
2 | 0000
3 | 00000000
4 | 0000
5 | 00000
6 | 00
7 | 0000
8 | 00
9 | 0
The leaf unit is identified above. The stems are the whole numbers, 0 through 9.
c) The median is the middle number. There are 36 numbers, so the median is between the 18th and 19th, which have the values 3 and 4, respectively. Thus, the median is 3.5.
d) The distribution is slightly skewed to the right.
2.98 Match the histogram
a) symmetric and bimodal
b) skewed to the right
c) skewed to the left
d) symmetric and unimodal
2.99 Sandwiches and protein
a)
0 | 8
1 |
1 | 7889
2 | 113
2 | 666
b) A stem-and-leaf plot allows one to see the individual amounts.
c) The protein amounts are mostly between 17 and 21 grams, with a few sandwiches having a higher protein value of 26 grams. There appears to be one outlier having only 8 grams of protein.
2.100 Sandwiches and cost
a) (The data values have been truncated.)
2 | 4
2 | 999
3 | 1444
3 | 688
b) A stem-and-leaf plot allows one to see the individual prices.
c) Most of the sandwiches cost between $2.90 and $3.89. The prices are skewed to the left, with one sandwich costing only $2.49.
2.101 What shape do you expect?
a) Number of times arrested in past year – skewed to the right because most values are at 0 but there are some large values.
b) Time needed to complete difficult exam (maximum time is 1 hour) – skewed to the left because most values are at 1 hour or slightly less, but some could be quite a bit less.
c) Assessed value of home – skewed to the right because there are some extremely large values.
d) Age at death – skewed to the left because most values are high, but some very young people die.
2.102 Sketch plots
NOTE: Plots will vary, but should have the following characteristics.
a) It would be skewed to the right, and the mean would be greater than the median because of a few mansions that sell for millions.
b) It would be skewed to the right, and the mean would be higher than the median. Most women do not give birth over age 40. Thus, the median would be zero. The mean, however, would be positive, because some women do give birth over the age of 40.
c) It would be skewed to the left, and the mean would be lower than the median. The mean would be pulled down by the outlier of 50. The standard deviation is only 10, so there probably aren't lots of low scores. Moreover, the highest possible score of 100 is only 12/10 = 1.2 standard deviations above the mean.
d) It would be skewed to the left, and the mean would likely be lower than the median. Most people with cars drive them every month, but a few drive them less, and some hardly or not at all. These outliers would pull the mean, but not the median, lower. The median and mode probably would be 12.
2.103 Median versus mean sales price of new homes
We would expect the mean sales price to have been higher due to the distribution being skewed to the right. A few very expensive homes will greatly affect the mean, but not the median sales price.
2.104 Household net worth
a) The distribution of these families' net worth is likely to be skewed to the right because relatively few families would have very high net worth, so that we expect the mean to be greater than the median.
b) When assets such as homes and retirement savings decline due to a recession, it is typical for the highest valued assets to be affected the most. Thus, we would expect the mean net worth to drop more than the median net worth.
2.105 Golfers' gains
a) The data for the 90 players would be skewed to the right, with the majority of the golfers earning between $1 and $3 million and a few earning over $3 million.
b) Since the data are skewed to the right, the mean would be the higher value of $2,090,012 and the median the lesser value of $1,646,853.
2.106 Hiking
The classification into easy, medium or hard is categorical and the length classification is quantitative.
2.107 Lengths of hikes
a) One example is 1, 2, 4, 6, 7. Both the mean and median are 4.
b) One example is 2, 2, 3, 5 and 6.
2.108 Central Park monthly temperatures
a) Both distributions are fairly symmetric and bell-shaped, with January having greater variability than July.
b) The mean temperature for January is around 32º and the mean temperature for July is around 76º. The average monthly temperature in January is approximately 44º less than the average monthly temperature in July.
c) The average monthly temperature in January is more variable than in July. The range of average temperatures for January is approximately 22º to 43º and the standard deviation is approximately 5º. The range of average temperatures for July is approximately 71º to 81º and the standard deviation is approximately 2º. It may be a bit surprising to see how much more variable the average monthly temperatures are in January than in July.
2.109 What does s equal?
a) Given the mean and range, the most realistic value is 12. –10 is not realistic because a standard deviation must be 0 or positive. Given that there is a large range, it is not realistic that there would be almost no spread; hence, the standard deviation of 1 is unrealistic. 60 is unrealistically large; the whole range is hardly any more than 60.
b) –20 is impossible because standard deviations must be nonnegative.
2.110 Female heights
a) According to the Empirical Rule, 95% of scores in a bell-shaped distribution fall within two standard deviations of the mean. x̄ – 2s = 65 – 2(3.5) = 58 and x̄ + 2s = 65 + 2(3.5) = 72. Thus, 95% of heights likely fall between 58 and 72 inches.
b) The height for a woman who is three standard deviations below the mean is 54.5: x̄ – 3s = 65 – 3(3.5) = 54.5. This is on the cusp of what would be considered an outlier according to the z-score criterion. Scores that are beyond three standard deviations from the mean are considered to be potential outliers. So, yes, this height is bordering on unusual.
2.111 Energy and water consumption
a) The distribution is likely skewed to the right because the maximum is much farther from the mean than the minimum is, and also because the lowest possible value of 0 is only 780/506 = 1.54 standard deviations below the mean.
b) The distribution is likely skewed to the right because the standard deviation is almost as large as the mean, and the smallest possible value is zero, only 1.15 standard deviations below the mean.
2.112 Hurricane damage
a) The distribution is skewed to the right.
b) The median should be used since the distribution is skewed to the right.
c) The values are correct.
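The Empirical Rule intervals in Exercise 2.110 are simple arithmetic, so a two-line Python check is enough; the mean and standard deviation below are the ones given in that exercise.

mean, s = 65, 3.5
two_sd   = (mean - 2 * s, mean + 2 * s)   # (58.0, 72.0): about 95% of heights
three_sd = (mean - 3 * s, mean + 3 * s)   # (54.5, 75.5): nearly all heights
print(two_sd, three_sd)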
2.112 (continued)
d) The distribution of hurricane damage is skewed to the right, with the damage for the costliest hurricane, Katrina (more than 100 billion), far exceeding all others. The median damage was 7.9 billion. The 25% most costly hurricanes had a cost of over 11.8 billion, whereas the 25% least damaging hurricanes cost no more than 5.7 billion.
2.113 More hurricane damage
a) The percentage differs from 68% because of the extreme right skew of the distribution.
b) The mean and standard deviation would get much smaller due to the removal of the extreme value. The median, IQR, and 10th percentile would not change much.
2.114 Student heights
a) For a bell-shaped distribution, such as the heights of all men, the Empirical Rule states that all or nearly all scores will fall within three standard deviations of the mean (x̄ ± 3s). In this case, that means that nearly all scores would fall between 62.2 and 79.6. In this example, almost all men's scores do fall between these values.
b) The center for women is about five inches less than the center for men. The variability, however, is very similar. These distributions are likely very similar in shape; they are just centered around different values.
c) The lowest score for men is 62. This would have a z-score of z = (x – x̄)/s = (62 – 70.9)/2.9 = –3.07. Thus, it falls 3.07 standard deviations below the mean.
2.115 Cigarette tax
a) [Histogram of state cigarette taxes, from 0 to 200 cents.]
The histogram shows a unimodal distribution that is skewed to the right. If there are any outliers, they would be the most extreme scores, such as the one around 200.
b) The mean is 72.85 and the median is 60. The mean is inflated relative to the median, as one would expect from the distribution depicted in the histogram that is skewed to the right. The few high scores would pull the mean higher, but not the median.
c) The standard deviation is 48.00. This indicates that the typical score falls about 48.0 cents from the mean.
2.116 Cereal sugar values
a) Numbers are approximate: Minimum: 0, Q1: 4, Median: 9.5, Q3: 13.5, Maximum: 18.
b) Because the median is closer to Q3 and the maximum than it is to Q1 or the minimum, it appears that this distribution is slightly skewed to the left.
c) This sugar value falls 1.64 standard deviations below the mean. z = (x – x̄)/s = (0 – 8.75)/5.32 = –1.64
2.117 NASDAQ stock prices
a) [Dotplot of price per share.]
b) The median is the average of the two middle numbers, 23 and 26. Thus, the median is 24.5. The first quartile is the median of all the numbers below the median: 3, 4, 4, 7, 7, 8, 9, 9, 13, and 23. Thus, the first quartile is 7.5. The third quartile is the median of all numbers above the median: 26, 26, 26, 37, 40, 52, 52, 60, 78, 87. Thus, the third quartile is 46.
c) The box plot does not show the gaps in the observations. Also, the individual data values cannot be reproduced from the box plot. [Box plot of price per share, from 0 to about 90.]
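The five-number summary in Exercise 2.117 can be checked in Python using the "median of each half" rule described in the solution; the share prices below are the twenty values listed in part (b).

from statistics import median

prices = [3, 4, 4, 7, 7, 8, 9, 9, 13, 23,
          26, 26, 26, 37, 40, 52, 52, 60, 78, 87]   # sorted share prices

n = len(prices)
med = median(prices)             # 24.5
q1  = median(prices[: n // 2])   # median of the lower half -> 7.5
q3  = median(prices[n // 2 :])   # median of the upper half -> 46
print(min(prices), q1, med, q3, max(prices))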
2.118 Temperatures in Central Park
a) The distribution appears to be fairly symmetric, but perhaps a little left-skewed. Most values lie between 35 and 45 degrees, but range from 30 to 51 degrees. The mean and median are almost the same; the typical March temperature is around 42 degrees. The spread is not large in comparison to the mean.
2.118 (continued) [Histogram of average March temperatures, from 30 to 52 degrees.]
b) Mean: 40.4; standard deviation: 4.2.
c) The mean average temperature is higher in November than in March, and the spread of average temperatures is less in November than in March. [Histogram of average November temperatures, from 36 to 54 degrees.]
d) As indicated by the histograms, the average monthly temperature is greater in November than in March and the standard deviation is less. The side-by-side box plot makes it easier to see the relative spreads of the data as well as the difference in the means. [Side-by-side box plots of March and November temperatures, from 30 to 55 degrees.]
2.119 Teachers' salaries
a) The range is the maximum minus the minimum. Range = 69,119 – 35,070 = 34,049. The interquartile range (IQR) equals Q3 – Q1. IQR = 55,820 – 45,840 = 9980. These statistics indicate that the salaries range across a $34,049 span, and that the middle 50% of salaries range across a $9,980 span.
b) (i) The values at the ends of the box would be 45,840 (Q1) and 55,820 (Q3). (ii) The line in the middle of the box would be the median, 48,630. (iii) The lower end of the left whisker would be the minimum, 35,070. (iv) The upper end of the right whisker would be the maximum, 69,119.
c) The minimum and Q1 are closer to the median than are Q3 and the maximum. This indicates that the data are likely skewed to the right.
d) The most realistic standard deviation would be 7000. 100 and 1,000 are too small for typical deviations given a range of 34,049. 25,000 is too big given that it is almost three-quarters of the range; the typical score could never be this far from the mean. 7000 is the only realistic value.
2.120 Health insurance
a) The distribution is probably skewed to the right because the distance of Q3 from the median and from Q3 to the maximum is longer than the distance of Q1 from the median and from Q1 to the minimum.
b) The most plausible value for the standard deviation of this distribution is 4. The middle 50% of scores fall within a range of 5.0%, making it plausible that the typical score would deviate four percentage points from the mean. We cannot have a negative percentage point, so –16 is not plausible. We know that there is variation, so 0 is not plausible. The whole range is not much more than 20; thus, 20 and 25 are implausibly large for the standard deviation of this distribution.
2.121 What box plot do you expect?
Box plots will vary, but should have the following characteristics.
a) The center of these data is closer to the maximum than the minimum. Although the mean is likely to be pulled by an outlier more than the median, this still indicates that the data might be skewed to the left, and that the box plot might have more distance between the median and both Q1 and the minimum than between the median and both Q3 and the maximum.
b) IQ scores are designed to be symmetric, and these data support that. The box plot, thus, would appear symmetric.
c) The mean is higher than the median, indicating that the data are skewed to the right. Thus, the box plot would have more distance between the median and both Q3 and the maximum than between the median and both Q1 and the minimum.
2.122 High school graduation rates
a) The range is the difference between the lowest and highest scores: 91.8 – 79.9 = 11.9. The interquartile range (IQR) is the difference between scores at the 25th and 75th percentiles: IQR = Q3 – Q1 = 89.8 – 84.0 = 5.8.
b) Potential outliers are 1.5(IQR) = 1.5(5.8) = 8.7 below Q1 or above Q3. This criterion suggests that potential outliers would be those scores less than 84.0 – 8.7 = 75.3 and greater than 89.8 + 8.7 = 98.5. There are no scores beyond these values, and so it does not indicate any potential outliers.
c) No, since the z-scores of the minimum and maximum values are both less than 3 in absolute value; no outliers are present. z = (79.9 – 86.9)/3.4 = –2.06 and z = (91.8 – 86.9)/3.4 = 1.44
2.123 SAT scores
a) Because the right whisker extends further than does the left whisker, and the line through the center of the box is left of center, the box plot suggests that the distribution is somewhat skewed to the right.
b) Numbers are approximate: Minimum: 1400, Q1: 1475, Median: 1550, Q3: 1700, Maximum: 1800. The lowest score is approximately 1400 and the highest is 1800. The score below which the lowest 25% fall is approximately 1475, and the score above which the highest 25% fall is approximately 1700. The middle score, that below which 50% of the scores fall, is 1550.
c) If only viewing the box plot, we would not see that the distribution may be bimodal.
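Both outlier checks used in Exercise 2.122 can be reproduced with a short Python sketch; the quartiles, mean, and standard deviation below are the ones stated in that exercise.

q1, q3 = 84.0, 89.8
iqr = q3 - q1                                            # 5.8
fences = (round(q1 - 1.5 * iqr, 1), round(q3 + 1.5 * iqr, 1))   # (75.3, 98.5)

mean, s = 86.9, 3.4
z_min = (79.9 - mean) / s                                # about -2.06
z_max = (91.8 - mean) / s                                # about  1.44
print(fences, round(z_min, 2), round(z_max, 2))
# Neither the 1.5*IQR criterion nor the |z| > 3 criterion flags an outlier.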
2.124 Blood pressure
a) A z-score of 1.19 indicates that a person with a blood pressure of 140, the cutoff for having high blood pressure, falls 1.19 standard deviations above the mean. z = (x – x̄)/s = (140 – 121)/16 = 1.19
b) About 95% of all values in a bell-shaped distribution fall within two standard deviations of the mean – in this case, 32. About 95% of systolic blood pressures fall between 121 – 2(16) = 89 and 121 + 2(16) = 153.
2.125 No cereal sodium
If a cereal has a sodium value of 0, it falls 2.16 standard deviations below the mean. z = (x – x̄)/s = (0 – 167)/77.3 = –2.16
2.126 Who was Roger Maris?
a) Neither the minimum nor the maximum score reaches the criterion for a potential outlier of being more than three standard deviations from the mean (i.e., having a z-score less than –3 or greater than 3). Thus, there are no potential outliers according to the three standard deviation criterion. z = (5 – 22.92)/15.98 = –1.12 and z = (61 – 22.92)/15.98 = 2.38
b) The maximum is much farther from the mean and median than is the minimum, an indicator that the distribution might not be bell-shaped. Moreover, the lowest possible value of 0 is only 22.92/15.98 = 1.43 standard deviations below the mean.
c) Based on the criteria noted above, this is not unusual. It does not even come close to meeting the three standard deviation criterion for a potential outlier and therefore is not an unusual number of home runs for Roger Maris. z = (13 – 22.92)/15.98 = –0.62
Chapter Problems: Concepts and Investigations
2.127 Baseball's great homerun hitters
The responses will be different for each student depending on the methods used.
2.128 How much spent on haircuts?
The responses will be different for each student depending on the methods used.
2.129 Controlling asthma
a) The distributions of children on both Formoterol (F) and Salbutamol (S) are skewed to the left. There is a data point that qualifies as an outlier, indicated by a dot to the far left, in the Salbutamol distribution. Children on Formoterol seem to be doing better, on average, than do children on Salbutamol. [Side-by-side box plots of the formoterol and salbutamol measurements, ranging from about 100 to 450.]
b) Here are the difference scores for each child. A positive difference indicates a higher score for Formoterol than for Salbutamol.
Child   Formoterol   Salbutamol   Difference
1       310          270          40
2       385          370          15
3       400          310          90
4       310          260          50
5       410          380          30
6       370          300          70
7       410          390          20
8       320          290          30
9       330          365          –35
10      250          210          40
11      380          350          30
12      340          260          80
13      220          90           130
[Box plot of the differences (formoterol – salbutamol), ranging from about –50 to 150.]
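The difference scores and their summary can be reproduced in Python directly from the table above.

from statistics import median

formoterol = [310, 385, 400, 310, 410, 370, 410, 320, 330, 250, 380, 340, 220]
salbutamol = [270, 370, 310, 260, 380, 300, 390, 290, 365, 210, 350, 260, 90]

diffs = [f - s for f, s in zip(formoterol, salbutamol)]
print(diffs)                      # [40, 15, 90, 50, 30, 70, 20, 30, -35, 40, 30, 80, 130]
print(median(diffs))              # 40: the center of the differences is well above 0
print(sum(d > 0 for d in diffs))  # 12 of the 13 children scored higher on Formoterol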
2.129 (continued)
If there were, on average, no difference between PEF levels for the two brands, the distribution of differences would be centered around 0, a score indicating no difference. The current difference scores appear skewed slightly to the right. The difference scores indicate a positive difference, on average. The center is well into the positive side, and the data points are quite spread out. Thus, children on Formoterol have higher scores, on average, than when they are on Salbutamol, although there is quite a bit of variation in the amount of improvement. Moreover, there appears to be one outlier, a child who responds more poorly on F than on S.
2.130 Google trend
The response to this exercise will be different for each student.
2.131 Youth unemployment by gender
The median unemployment rate is similar (at about 10%) for males and females. Also, for both genders, the unemployment rate varies among countries (except Greece and Spain) from roughly 6% to 16%. The middle 50% of the distribution for females (IQR about 5%) has less variability than the distribution for males (IQR about 7%). For the two countries with the highest unemployment rate (Greece and Spain, with rates larger than 20%), the unemployment rate for females is even higher than for males. Including the outliers, both the male and female distributions are right skewed.
2.132 You give examples
Answers will vary.
a) Approximately symmetric – number of letters that can be remembered in a memory task, or IQ.
b) Skewed to the right – number of alcoholic beverages consumed in a week (this would be skewed by a few extreme binge-drinkers) or distance traveled to work (skewed by a few with incredibly long commutes).
c) Skewed to the left – happiness levels on one's wedding day (most would be very happy, but there'd likely be a few who were sad) or score on an easy exam (skewed by a few who did poorly anyway).
d) Bimodal – writing ability in a university writing center (some would come because they need help, the rest would be skilled tutors, and there would be fewer in the middle) or income for a sample that includes people from the U.S. and people from a third world country (some would center on a very low income, and some around a much higher income).
e) Skewed to the right, with a mode and median of 0 but a positive mean – number of times students have eaten snake in their lives (most would never have eaten it, but a few would have tried it once, perhaps on Fear Factor, and an even smaller number would have had it several times) or number of times married for a sample of high school students (most would not have been married at all, but a few would have been married once, and an even smaller number would have been married more than once).
2.133 Political conservatism and liberalism
a) As seen in Example 12, one need not add up every separate number when calculating a mean. This would be unwieldy with the political conservatism and liberalism data. We would have to add up 69 ones, 240 twos, etc. (all the way up to 68 sevens), then divide by the 1933 people in the study. There's a far easier way. We can find the sum of all values in the study, Σx, by multiplying each possible value (1–7 in this case) by its frequency. x̄ = Σx/n = [69(1) + 240(2) + 221(3) + 740(4) + 268(5) + 327(6) + 68(7)]/1933 = 7950/1933 = 4.11
b) The mode, the most common score, is four.
c) The median would be the 967th score. In this case, that category is four.
2.134 Mode but not median and mean
We use the mode when we're interested in the category with the highest frequency, as opposed to merely finding the "center" of the data. To find a mean or median, we must have observations that measure a quantity. With unordered categories, observations do not do this. But, we can still find the most common outcome, so the mode is appropriate.
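The grouped-data mean in Exercise 2.133 is easy to verify in Python; the values and frequencies below are the ones given in that exercise.

values = [1, 2, 3, 4, 5, 6, 7]
freqs  = [69, 240, 221, 740, 268, 327, 68]

n = sum(freqs)                                     # 1933
total = sum(v * f for v, f in zip(values, freqs))  # 7950
print(round(total / n, 2))                         # 4.11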
2.135 Multiple choice - GRE scores
The best answer is (a). 127 and 129 are close, but 649 is relatively larger than 529.
2.136 Multiple choice - Facts about s
The best answer is (b); s can be zero if all observations hold the same value.
2.137 Multiple choice - Relative GPA
The best answer is (a). The standard deviation would allow her to calculate her z-score.
2.138 True or false
a) False; consider the following data set: 3, 3, 3, 3, 3. Note that the mean = median = mode = 3.
b) False; consider the following data set: 1, 2, 3, 4. The mean is 2.5, which is not one of the data points.
c) True; when n is odd, the median is the data point in the (n + 1)/2 th position of the sorted data.
d) True; by definition, the median is the second quartile, which is also known as the 50th percentile.
2.139 Bad statistic
The standard deviation was incorrectly recorded. The standard deviation represents a typical score's distance from the mean. For grades ranging between 26 and 100, a standard deviation of 76 is way too large.
2.140 Soccer true or false
False; the mean would be inflated by the salaries of the few players who earn exorbitant salaries, but the magnitudes of these salaries would not affect the median. Thus, the mean would be larger than the median.
♦♦2.141 Mean for grouped data
In Exercise 2.43 or 2.133, the mean could be expressed as a sum. Before, the mean was calculated by multiplying each score by its frequency, then summing these and dividing by the total number of subjects. Alternatively, we could first divide each frequency by the number of subjects, rather than dividing the sum by the number of subjects. Dividing the frequency for a given category by the total number of subjects would give us the proportion. We are just changing the order in which we perform the necessary operations to calculate the mean.
♦♦2.142 Male heights
a) The median falls at the 50th percentile. In this case, the 50th percentile falls in the group that is 70 inches or less (i.e., 54% of all subjects), but above 69 inches. Thus, the median is in the category "70 inches or less," but above 69 inches.
b) If the distribution is bell-shaped, the mean would fall in the middle, and be about 70 inches. Further, the Empirical Rule would apply, and nearly all scores would fall within three standard deviations of the mean. If nearly all scores fall within 10 inches from the mean (60 is 10 inches below the mean of 70, and 80 is ten inches above it), the standard deviation would be about 10 divided by 3, or about 3.3.
♦♦2.143 Range and standard deviation approximation
Based on the work of statisticians (the Empirical Rule), we know that most, if not all, data points fall within three standard deviations of the mean if we have a bell-shaped distribution, that is, within x̄ ± 3s. If the region from three standard deviations below the mean to three standard deviations above the mean encompasses just about everyone in the data set, we could add the section below the mean (3s) to the section above the mean (3s) to get everyone in the data set: 3s + 3s = 6s. Because the range spans just about everyone in the data set, we can say that the range is equal, approximately, to 6s.
♦♦2.144 Range the least resistant
There are only two observations that are taken into account by the range, the minimum and maximum scores. The range is the difference between these two, so the range increases exactly the same amount as one of these scores increases (or in the case of the minimum, decreases). The mean and standard deviation, however, take the magnitude of all observations into account. Although an extreme score would pull the mean in its direction, and would increase the standard deviation, this "pull" would be offset, at least to some degree, by the values of the other observations.
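The "range is roughly 6s" rule of thumb in Exercise 2.143 can be checked quickly with simulated bell-shaped data. This Python sketch uses 1000 artificial normal observations (any mean and standard deviation would do); it is only an illustration, not part of the exercise.

import random
from statistics import stdev

random.seed(1)
x = [random.gauss(50, 10) for _ in range(1000)]       # bell-shaped sample
data_range = max(x) - min(x)
print(round(data_range, 1), round(6 * stdev(x), 1))   # the two values are comparable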
♦♦2.145 Using MAD to measure variability
a) With greater variability, numbers tend to be further from the mean. Thus, the absolute values of their deviations from the mean would be larger. When we take the average of all these values, the overall MAD is larger than with distributions with less variability.
b) The MAD is more resistant than the standard deviation because, by squaring the deviations in the standard deviation formula, a large deviation has a greater effect.
♦♦2.146 Rescale the data
a) c = 20; new mean: 57 + 20 = 77; standard deviation: 20 (unchanged)
b) c = 1/2; new mean: $39,000/2 = 19,500 pounds; standard deviation: $15,000/2 = 7500 pounds
c) Linear transformations do not change the shape of the distribution.
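The behavior described in Exercise 2.146 can be seen numerically with a small Python sketch. The data below are made-up illustrative values (chosen to have mean 57, as in part a); only the general pattern matters.

from statistics import mean, stdev

x = [50, 55, 57, 60, 63]          # hypothetical data with mean 57

shifted = [xi + 20 for xi in x]   # adding c: the mean shifts by c, s is unchanged
scaled  = [xi / 2 for xi in x]    # multiplying by c: both mean and s scale by |c|

print(mean(x), round(stdev(x), 2))
print(mean(shifted), round(stdev(shifted), 2))   # mean + 20, same s
print(mean(scaled), round(stdev(scaled), 2))     # mean / 2, s / 2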
Chapter Problems: Student Activities
2.147 The average student
Answers will vary.
2.148 Create own data
Answers will vary.
2.149 GSS
Answers will vary.
Chapter 3: Association: Contingency, Correlation, and Regression
Section 3.1 The Association Between Two Categorical Variables
3.1 Which is response/explanatory?
a) The explanatory variable is carat and the response variable is price.
b) The explanatory variable is dosage and the response variable is severity of adverse event.
c) The explanatory variable is construction type and the response variable is top speed.
d) The explanatory variable is type of college and the response variable is graduation rate.
3.2 Sales and advertising
a) The two variables are amount spent on advertising and monthly sales.
b) Both variables are quantitative.
c) The explanatory variable is amount spent on advertising and the response variable is monthly sales.
3.3 Does higher income make you happy?
a) The response variable is happiness and the explanatory variable is income.
b) Using 2010 data:
                                       Happiness
Income           Not Too Happy      Pretty Happy       Very Happy         Total   n
Above Average    21/360 = 0.06      213/360 = 0.59     126/360 = 0.35     1.00    360
Average          96/850 = 0.11      506/850 = 0.60     248/850 = 0.29     1.00    850
Below Average    143/604 = 0.24     347/604 = 0.57     114/604 = 0.19     1.00    604
Total            260                1066               488                        1814
The proportion of people who are very happy is larger for those with above-average income (35%) compared to those with below-average income (19%), showing an association between these two variables. Also, the proportion of people who are not too happy is much larger (24%) for people with below-average income compared to people with average (11%) or above-average (6%) income.
c) Overall, the proportion of people who reported being very happy is 488/1814 = 0.27.
3.4 Diamonds
a)
                                     Clarity
Cut     IF              VVS             VS               SI               I               Total   n
Good    2/80 = 0.025    4/80 = 0.050    16/80 = 0.200    55/80 = 0.688    3/80 = 0.038    1.00    80
Fair    1/44 = 0.023    3/44 = 0.068    8/44 = 0.182     30/44 = 0.682    2/44 = 0.045    1.00    44
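The conditional proportions in the Exercise 3.4 table can be computed in Python from the counts shown above (2, 4, 16, 55, 3 for the Good cut and 1, 3, 8, 30, 2 for the Fair cut).

table = {"Good": [2, 4, 16, 55, 3],   # counts for IF, VVS, VS, SI, I
         "Fair": [1, 3, 8, 30, 2]}
clarity = ["IF", "VVS", "VS", "SI", "I"]

for cut, counts in table.items():
    n = sum(counts)
    props = {c: round(k / n, 3) for c, k in zip(clarity, counts)}
    print(cut, n, props)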
b) The conditional proportions for the two cuts are very similar. For both cuts, the majority of diamonds are rated as slightly included (69% for good cuts, 68% for fair cuts) followed by very slightly included (20% for good cuts, 18% for fair cuts).
3.4 (continued) [Side-by-side bar charts of the clarity counts (IF, VVS, VS, SI, I) for Good and Fair cuts.]
c) The conditional proportions are very similar for the two cuts. Based on these data, there appears to be no meaningful association between the cut of a diamond and its clarity rating.
3.5 Alcohol and college students
a) The response variable is binge drinking and the explanatory variable is gender. We wonder if a person's binge drinking status can be explained in part by their gender. (We don't wonder if a person's gender can be explained by their binge drinking!)
b) (i) There are 1908 male binge drinkers. (ii) There are 2854 female binge drinkers.
c) The counts in (b) cannot be used to answer the question about differences in proportions of male and female students who binge drink. These are not proportions of male and female students; these are counts. There are far more females than males in this study, so it's not surprising that there are more female than male binge drinkers. This doesn't mean that the percentage of women who binge drink is higher than the percentage of men. If we used these numbers, we might erroneously conclude that women are more likely than men to be binge drinkers.
d)
                        Binge Drinking Status
Gender    Binge Drinker        Non-Binge Drinker      Total   n
Male      1908/3925 = 0.49     2017/3925 = 0.51       1.00    3925
Female    2854/6979 = 0.41     4125/6979 = 0.59       1.00    6979
These data tell us that 49% of men are binge drinkers, whereas 51% are not. They also tell us that 41% of women are binge drinkers, whereas 59% are not.
e) It appears that men are more likely than women to be binge drinkers.
3.6 How to fight terrorism?
a) The response variable is opinion about how to best fight terrorism, with categories "let terrorists know we'll fight back aggressively" and "work to find an international solution." The explanatory variable is gender, with categories male and female.
b)
              Opinion on how to fight terrorism
Gender     Fight back          International        Total
Men        0.53(600) = 318     0.47(600) = 282      600
Women      0.36(400) = 144     0.64(400) = 256      400
Total      462                 538                  1000
3.6 (continued)
c)
              Opinion on how to fight terrorism
Gender     Fight back     International     Total
Men        0.53           0.47              1
Women      0.36           0.64              1
d) These proportions could be converted into percentages by multiplying by 100. The resulting percentages are conditional percentages because we are reporting the percentages of people who have a given opinion, given their gender.
e) [Side-by-side bar chart of the frequencies of each opinion (Fight Back, International) for men and women.]
f) For those that chose wanting to fight back, the difference is 0.53 – 0.36 = 0.17. The proportion of males is 17 percentage points larger than the proportion of females. The ratio is 0.53/0.36 = 1.47. The proportion of males wanting to fight back is 47% higher (or 1.47 times higher) than the proportion of females. For those that chose working with other nations, the difference is 0.64 – 0.47 = 0.17. The proportion of females is 17 percentage points larger than the proportion of males. The ratio is 0.64/0.47 = 1.36. The proportion of females wanting to work with other nations is 36% higher (or 1.36 times higher) than the proportion of males.
g) We'd need to have the same conditional percentages for men and women in each category, such as 57% for both male and female.
3.7 Heaven and hell
a) Each one could be the outcome of interest, and how it depends on the other could be studied.
b) Using 2008 data:
                              Do you believe in hell?
Do you believe in heaven?     Yes     No      Total
Yes                           955     162     1117
No                            9       188     197
Total                         964     350     1314
c)
                           Do you believe in heaven?
Do you believe in hell?    Yes                 No                  Total   n
Yes                        955/964 = 0.991     9/964 = 0.009       1.00    964
No                         162/350 = 0.463     188/350 = 0.537     1.00    350
Overall, it appears that if a person believes in hell, they are very likely to believe in heaven (99.1%); however, if a person does not believe in hell, the probability that they believe in heaven (46%) is much closer to the probability that they do not believe in heaven (54%).
3.7 (continued)
d)
                             Do you believe in hell?
Do you believe in heaven?    Yes                  No                    Total   n
Yes                          955/1117 = 0.855     162/1117 = 0.145      1.00    1117
No                           9/197 = 0.046        188/197 = 0.954       1.00    197
If a person believes in heaven, they are much more likely to believe in hell (85.5%) than to not believe in hell (14.5%). If a person does not believe in heaven, they are almost certain to not believe in hell (95%).
e) (i) 1117/1314 = 85.0% of the respondents believe in heaven. (ii) 964/1314 = 73.4% of the respondents believe in hell.
3.8 Surviving the Titanic
a) The percentage of children and female adult passengers who survived is 373/(373 + 161) = 69.9%; the percentage of male adults is 338/(338 + 1329) = 20.3%.
b) The difference between children and female adult passengers and male adult passengers is 69.9 – 20.3 = 49.6. The proportion of children and female adult passengers surviving the sinking of the Titanic is 49.6 percentage points higher than the proportion for male adult passengers.
c) The ratio between children and female adult passengers and male adult passengers is 69.9/20.3 = 3.4. The proportion of children and female adult passengers surviving the sinking of the Titanic is 3.4 times larger than the proportion for male adult passengers.
3.9 Gender gap in party ID
a) The response variable is party identification, and the explanatory variable is gender.
b) (i) Male and Republican: 89/892 = 0.10 (ii) Female and Republican: 95/892 = 0.11
c) (i) Male: 355/892 = 0.40 (ii) Republican: 184/892 = 0.21
Copyright © 2017 Pearson Education, Inc.
Chapter 3: Association: Contingency, Correlation, and Regression 45
Section 3.2 The Association Between Two Quantitative Variables 3.11 Used cars and direction of association a) We would expect a positive association because as cars age, they tend to have covered more miles. Higher numbers on one variable tend to associate with high numbers on the other variable (and low with low). b) We would expect a negative association because as cars age, they tend to be worth less. High numbers on one variable tend to associate with low numbers on the other. c) We would expect a positive association because older cars tend to have needed more repairs. d) We would expect a negative association. Heavier cars tend to travel fewer miles on a gallon of gas. e) We would expect a positive association; the heavier the car, the more fuel it will burn to move forward. 3.12 Broadband and GDP a) For countries with GDP less than $5000 billion, there is a clear trend in that countries with larger GDP have a larger number of broadband subscribers. Three countries stand out in terms of both their GDP and number of subscribers. Although Japan, the third-largest country in terms of GDP, seems to follow the trend, China (the second-largest country in terms of GDP) has by far the most broadband subscribers, whereas the United States (the country with the largest GDP by far) has fewer broadband subscribers than China. b) The country is China, with approximately x = 8000 billion GDP and y = 160 million broadband subscribers. c) r = 0.77; the positive sign indicates a positive association between GDP and number of broadband subscribers; as GDP increases, the number of broadband subscribers tends to increase as well. d) The United States has fewer than expected; China has more than expected. e) The correlation coefficient would not change because it does not depend on the units of measurement. 3.13 Economic development based on GDP a) Boxplot of GDP in billions of $US 18000 16000 GDP in billions of $US
14000 12000 10000 8000 6000 4000 2000 0
b) The distribution is skewed to the right with two clear outliers (China and United States). c) Nations with small GDP can have both large and small population sizes, resulting in both small and large per capita GDP and revealing no overall trend. d) These two variables are not measuring the same thing. If the GDP were divided by the same value for all nations (such as in standardizing when dividing by the standard deviation of GDP), the correlation between GDP and standardized GDP would be 1. Here, each nation’s GDP value is divided by a different value (the nation’s population size). 3.14 Politics and newspaper reading a) The association is weak. We know this because it is close to zero; zero indicates no relation. b) Religiosity has a stronger association with political ideology than does newspaper reading because the magnitude of the correlation is larger.
Copyright © 2017 Pearson Education, Inc.
46 Statistics: The Art and Science of Learning from Data, 4th edition 3.15 Internet use correlations a) Internet users and broadband subscribers have the strongest linear relationship. b) Facebook users and population have the weakest linear relationship. c) The correlation between Internet users and Facebook users does not take population size into account whereas the correlation between Internet use and Facebook use does. 3.16 Match the scatterplot with r 1) (c); strong negative association 3) (d); no linear association 2) (a); moderate negative association 4) (b); moderate positive association 3.17 What makes r = 1? a) Scatterplot of y vs x 16 15 14
y
13 12 11 10 9 8 7 3
4
5 x
6
7
b) It is the point (4, 13). c) The value of 13 would have to be changed to 10. 3.18 Gender and Chocolate Preference The correlation coefficient is only valid to measure the association between two quantitative variables, not between two categorical variables. 3.19 r = 0 Scatterplot of y vs x 14 12 10
y
8 6 4 2 0 1
2
3
4 x
5
Copyright © 2017 Pearson Education, Inc.
6
7
Chapter 3: Association: Contingency, Correlation, and Regression 47 3.20 Correlation inappropriate It is only appropriate to use this measure of correlation when an association is linear. If an association is curvilinear (e.g., U-shaped), we should not use this statistic. For example, in some situations people perform poorly at very low and very high levels of anxiety. They perform best with a moderate amount of performance-enhancing anxiety. This would form an upside-down U-curve, and it would not be appropriate to measure this association using the correlation you learned in this chapter. 3.21 Which mountain bike to buy? a) (i) The explanatory variable would be weight. (ii) The response variable would be price. b) Scatterplot of Price vs Weight 1200 1000
price
800 600 400 200 0 28
29
30
31
32 33 weight
34
35
36
37
The relation deviates from linearity in that the bikes with weights in the middle tend to cost the most, with those weighing less and more tending to cost less. c) The correlation is negative and fairly small. This indicates some relation between variables, such that as weight increases, price tends to decrease. Because, however, these variables deviate from linearity in their relation, this correlation coefficient is not an entirely accurate measure of the relation. 3.22 Prices and protein revisited a) Scatterplot of Protein (g) vs Cost ($)
protein (g)
25
20
15
10
$2.50
$2.75
$3.00
$3.25 cost ($)
$3.50
$3.75
$4.00
The association seems to be positive; however, the association is probably heavily influenced by the unusual data point representing the sandwich that costs $2.49 and has a protein content of 8 grams. b) The unusual data point is the only vegetarian sandwich on the list which reasonably explains why it has a lower protein content as well as a lower cost than the other sandwiches.
Copyright © 2017 Pearson Education, Inc.
48 Statistics: The Art and Science of Learning from Data, 4th edition 3.22 (continued) c) The correlation is 0.864. This is a strong positive correlation. It suggests that as the cost of the sandwich increases, the protein content of the sandwich increases as well. 3.23 Buchanan vote a) Boxplot of Buchanan
Boxplot of Gore 400000
3500 3000
300000
2000 gore
buchanan
2500
1500 1000
200000
100000
500 0
0
Both box plots indicate that the counts are skewed to the right with few counties in the high ranges of vote counts. b) Scatterplot of buchanan vs gore 3500 3000
buchanan
2500 2000 1500 1000 500 0 0
100000
200000 gore
300000
400000
The point close to 3500 on the variable “Buchanan” is a regression outlier; we were unable to make this comparison from the box plots because there were two separate depictions, one for each candidate. c) We would have expected Buchanan to get around 1000 votes. d) The box plot for Buchanan would be the same as in (a).
Copyright © 2017 Pearson Education, Inc.
Chapter 3: Association: Contingency, Correlation, and Regression 49 3.23 (continued) Boxplot of Bush
Scatterplot of Buchanan vs Bush
300000
3500 3000
250000
2500 Buchanan
bush
200000 150000
2000 1500 1000
100000
500 50000
0 0
0
50000
100000
150000 200000 250000
300000
Bush
As with the scatterplot with the data for Gore, the point close to 3500 on the variable “Buchanan” is an outlier.
Section 3.3 Predicting the Outcome of a Variable 3.24 Sketch plots of lines a)
b)
ŷ = 7 + 0.5x
ŷ = 7 + x
The y-intercept is 7 and the slope is 1.
20
20
15
15
10
10
y
y
The y-intercept is 7 and the slope is 0.5.
5
0
5
0
2
4
6
8
0
10
0
2
4
x
d)
ŷ = 7 – x
The y-intercept is 7 and the slope is –1.
8
10
ŷ = 7
The y-intercept is 7 and the slope is 0. 10
8 6
8
4 2
6
0
y
y
c)
6 x
4
-2 -4
2
-6 -8
0
2
4
6
8
10
0
0
2
4
6 x
x
Copyright © 2017 Pearson Education, Inc.
3.25 Sit-ups and the 40-yard dash
a) (i) ŷ = 6.71 – 0.024x = 6.71 – 0.024(10) = 6.47
(ii) ŷ = 6.71 – 0.024x = 6.71 – 0.024(40) = 5.75. The regression line would be the line that connects the points (10, 6.47) and (40, 5.75).
b) The y-intercept indicates that when a person cannot do any sit-ups, she/he would be predicted to run the 40-yard dash in 6.71 seconds. The slope indicates that every increase of one sit-up leads to a decrease in predicted running time of 0.024 seconds.
c) The slope indicates a negative correlation. The slope and the correlation based on the same data set always have the same sign.
3.26 Home selling prices
a) (i) ŷ = 9.2 + 77.0x = 9.2 + 77.0(2) = 163.2; we would, therefore, predict that a house with two thousand square feet would sell for $163,200. (ii) ŷ = 9.2 + 77.0x = 9.2 + 77.0(3) = 240.2; we would, therefore, predict that a house with three thousand square feet would sell for $240,200.
b) For every increase of one unit (a thousand square feet), home prices are predicted to increase by 77 units (that is, 77 thousand dollars). We see this from the value of the slope in the equation, and from the results in (a) by subtracting the predicted value for two units of area from three units of area (240.2 – 163.2 = 77.0).
c) The correlation between these variables is positive. There is a positive slope. Also, by putting in different values for x, we can see that as square footage increases, so do predicted home prices.
d) The predicted value is 240.2 (see part a), and the actual value is 300. The formula for the residual is y – ŷ. In this case, 300 – 240.2 = 59.8. The residual is a measure of error; thus, the error for this data point is 59.8; the selling price was $59,800 higher than what would be predicted by this equation.
3.27 Rating restaurants
a) (i) The predicted cost of a dinner in a restaurant that gets the lowest food quality rating of 21 is ŷ = –70 + 4.9(21) = $32.90.
(ii) The predicted cost of a dinner in a restaurant that gets the highest food quality rating of 28 is ŷ = –70 + 4.9(28) = $67.20. b) For every 1 point increase in food quality rating, the predicted price of the dinner increases by $4.90. c) The correlation between the cost of a dinner and the food quality rating is 0.68, which is a moderate positive correlation. This indicates that higher costs are associated with restaurants receiving higher food quality ratings.
d) The slope can be calculated using the formula b = r(sy/sx) = 0.68(14.92/2.08) = 4.9.
3.28 Predicting cost of meal from rating
Since the service rating has the highest absolute correlation with the cost of a dinner (0.69), it can be used to make the best predictions of the cost of a dinner.
3.29 Internet in Indonesia
a) The positive correlation means a positive association, i.e., nations with higher Internet use tend to have higher Facebook use, which results in a positive slope for the regression line. Also, because b = r(sy/sx) and r, sy and sx are all positive, b must be positive.
b)
ŷ = 7.90 + 0.439x = 7.90 + 0.439 (15.4) = 14.7, so Indonesia’s predicted Facebook use is 14.7%.
c)
From (b), the predicted value is 14.7%, and the actual value is 20.7%. The formula for the residual is y – ŷ. In this case, 20.7 – 14.7 = 6.0%. The residual is a measure of error; thus, the error for this data point is 6.0; it is 6.0 percentage points higher than what would be predicted by this equation.
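The prediction-and-residual arithmetic used here (and in the exercises that follow) is easy to script; a minimal sketch using the equation and values quoted in this exercise:

# Sketch: prediction and residual for Exercise 3.29, y-hat = 7.90 + 0.439x.
def predict(x, a=7.90, b=0.439):
    """Predicted Facebook use (%) for a given Internet use x (%)."""
    return a + b * x

x_indonesia = 15.4          # Internet use (%)
y_indonesia = 20.7          # actual Facebook use (%)
y_hat = predict(x_indonesia)
residual = y_indonesia - y_hat               # residual = y - y-hat
print(round(y_hat, 1), round(residual, 1))   # 14.7 and 6.0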
3.30 Broadband subscribers and population
a) The slope is 0.0761, a positive number; hence, this is a positive association. For every million person increase in population size, the predicted number of broadband subscribers increases by 76,100.
b) (i) ŷ = 5,530,203 + 0.0761(7,154,600) = 6,074,688; at the minimum population size of 7,154,600, we would predict there to be 6,074,688 broadband subscribers.
(ii) ŷ = 5,530,203 + 0.0761(1,350,695,000) = 108,318,093; at the maximum population size of 1,350,695,000, we would predict there to be 108,318,093 broadband subscribers.
c) ŷ = 5,530,203 + 0.0761(313,914,040) = 29,419,061. The predicted value is 29,419,061 and the actual value is 88,520,000. The formula for the residual is y – ŷ. In this case, 88,520,000 – 29,419,061 = 59,100,939. The residual is a measure of error; thus, the error for this data point is 59,100,939; it is 59,100,939 higher than what would be predicted by this equation. The U.S. has far more broadband subscribers than would be predicted from this equation.
3.31 SAT reading and math scores
a) ŷ = 18.1 + 0.975x = 18.1 + 0.975(501) = 506.6. California’s predicted average math score is 506.6.
b) The formula for the residual is y – ŷ. In this case, 516 – 506.6 = 9.4. The residual is a measure of error; thus, the error for this data point is 9.4; California’s average math score was 9.4 points higher than would be predicted using this equation.
c) r² = 95.5%, indicating that the prediction error using the regression line to predict y is about 96% smaller than the prediction error using the mean of y to predict y. Therefore, a state’s average reading score appears to be a reliable predictor of its average math score.
3.32 How much do seat belts help?
a) The slope is the amount that y is predicted to change for every unit increase in x. As seat belt usage (x) increases by 1 percentage point, the predicted number of deaths per year (y) decreases by 270. Thus, the slope is –270.
b) (i) ŷ = 28,910 – 270x = 28,910 – 270(0) = 28,910. If no one wears seat belts, the predicted number of annual deaths is 28,910.
(ii) ŷ = 28,910 – 270x = 28,910 – 270(73) = 9200. If 73% of people wear seat belts, the predicted number of annual deaths is 9200.
(iii) ŷ = 28,910 – 270x = 28,910 – 270(100) = 1910. If everyone wears seat belts, the predicted number of annual deaths is 1910.
3.33 Regression between cereal sodium and sugar
a) The software calculates the line for which the sum of squares of the residuals is a minimum.
b) No, any other line would have a larger sum of squares of the residuals.
c) The two rightmost bars represent two cereals whose predicted sodium contents were much less than their actual sodium contents. The cereals are Rice Krispies and Raisin Bran.
d) The amount of sugar is not a reliable predictor for the amount of sodium since r² is close to zero (0.2%).
3.34 Regression and correlation between cereal sodium and sugar
a) r² would be small because not much is gained from using the regression equation with sugar to predict sodium values.
b) Since r² = (–0.017)² = 0.03%, almost none of the variability in the sodium content of cereals can be explained by their sugar content.
c)
The slope can be calculated using the formula b = r(sy/sx) = (–0.017)(77.3/5.32) = –0.25.
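The formula b = r(sy/sx) used here and in Exercise 3.27(d) is a one-line computation; a small sketch that plugs in the summary values quoted in these solutions:

# Sketch: slope from the correlation and the two standard deviations.
def slope_from_summary(r, s_y, s_x):
    return r * (s_y / s_x)

# Cereal data (Exercise 3.34c): r = -0.017, sy = 77.3, sx = 5.32
print(round(slope_from_summary(-0.017, 77.3, 5.32), 2))   # about -0.25
# Restaurant data (Exercise 3.27d): r = 0.68, sy = 14.92, sx = 2.08
print(round(slope_from_summary(0.68, 14.92, 2.08), 1))    # about 4.9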
3.35 Advertising and sales
a) (Fitted line plot of Sales vs Advertising omitted; the fitted line is Sales = 4.000 + 2.000 Advertising.)
b) The correlation is 1.0. The equation for the regression line is ŷ = 4 + 2x. See (a).
c) Advertising: x̄ = Σx/n = (0 + 1 + 2)/3 = 3/3 = 1; s² = Σ(x – x̄)²/(n – 1) = [(0 – 1)² + (1 – 1)² + (2 – 1)²]/(3 – 1) = 2/2 = 1; s = √1 = 1.
Sales: ȳ = Σy/n = (4 + 6 + 8)/3 = 18/3 = 6; s² = Σ(y – ȳ)²/(n – 1) = [(4 – 6)² + (6 – 6)² + (8 – 6)²]/(3 – 1) = 8/2 = 4; s = √4 = 2.
d) b = r(sy/sx) = 1(2/1) = 2; a = ȳ – bx̄ = 6 – 2(1) = 4; ŷ = 4 + 2x
The y-intercept of 4 indicates that when there is no advertising, it is predicted that sales will be about $4000. The slope of 2 indicates that for each increase of $1000 in advertising, predicted sales increase by $2000. 3.36 Midterm–final correlation a)
(i) ŷ = 30+0.6x = 30+0.6(100) = 90 (ii) ŷ = 30+0.6x = 30+0.6(50) = 60 In both cases, the prediction for final exam score is closer to the mean of 75 than to the original midterm score.
b)
b = r(sy/sx), so 0.6 = r(10/10) = r(1), which gives r = 0.6.
(Note that when the spread for two variables is the same, the correlation equals the slope. In this instance, both variables (midterm and final) have a standard deviation of 10; thus, the correlation would be the same as the slope, 0.6.) This indicates that these two variables have a positive correlation. Those who scored higher on the midterm also tended to score higher on the final.
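The summary-statistic route to the regression line used in Exercises 3.35 and 3.36, b = r(sy/sx) and a = ȳ – bx̄, can be checked numerically from the three advertising/sales pairs given in Exercise 3.35; a short numpy sketch:

import numpy as np

x = np.array([0.0, 1.0, 2.0])   # advertising (thousands of dollars)
y = np.array([4.0, 6.0, 8.0])   # sales (thousands of dollars)

r = np.corrcoef(x, y)[0, 1]                 # correlation: 1.0 for these data
b = r * (y.std(ddof=1) / x.std(ddof=1))     # slope: 1 * (2/1) = 2
a = y.mean() - b * x.mean()                 # intercept: 6 - 2(1) = 4
print(r, b, a)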
3.37 Predict final exam from midterm
a) b = r(sy/sx) = 0.70(10/10) = 0.70; a = ȳ – bx̄ = 80 – (0.70)(80) = 24
The regression equation is ŷ = 24 + 0.70x.
b) The predicted final exam score for a student with an 80 on the midterm is 24 + 0.70(80) = 80. The predicted final exam score for a student with a 90 on the midterm is 24 + 0.70(90) = 87.
3.38 NL baseball
a) The slope indicates that for an increase in team batting average of 0.010, the predicted team scoring increases by 0.415. (Note that there is never a difference of 1 between any two teams given the range of the team batting averages, so it is not relevant to consider an increase of 1. For an increase of 0.01 in team batting average, the predicted increase is 0.42 runs per game.)
b) b = r(sy/sx); b = 0.900(0.3604/0.00782) = 41.5
An r² of 0.81 indicates 81% of the variability in runs per game is accounted for by variability in team batting averages.
3.39 Study time and college GPA
a), c) (Fitted line plot of GPA vs Study Time omitted; the fitted line is GPA = 2.625 + 0.04391 Study Time.)
The linear correlation between GPA and study time appears to be positive and fairly strong since the data points follow a positive linear trend. b) The correlation is 0.81. This indicates that the association between GPA and study time is strong and positive; longer study times are associated with higher GPAs. c) See (a); the prediction equation is ŷ = 2.625 + 0.0439x. (i) A student who studies 5 hours per week is predicted to have a GPA of 2.625 + 0.0439(5) = 2.84. (ii) A student who studies 25 hours per week is predicted to have a GPA of 2.625 + 0.0439(25) = 3.72.
3.40 Oil and GDP
a) (Fitted line plot of Oil Consumption vs GDP omitted; the fitted line is Oil Consumption = –0.105 + 0.5464 GDP.)
The linear correlation between oil consumption and GDP appears to be positive and fairly strong since the data points follow a positive linear trend. b) See (a); the prediction equation is given by ŷ = –0.105 + 0.5464x. c)
The correlation is 0.85. This indicates that the association between GDP and oil consumption is strong and positive; higher gross domestic products are associated with higher annual oil consumption per person.
d) Canada has a GDP value of 34; thus, the predicted annual oil consumption per person for Canada is –0.105 + 0.5464(34) = 18.5. This gives a residual value of 26 – 18.5 = 7.5.
3.41 Mountain bikes revisited
a) (Fitted line plot of Price vs Weight omitted; the fitted line is price = 1896 – 40.45 weight.)
b) See (a); the regression equation is price = 1896 – 40.5(weight). For every 1 unit increase in weight, the predicted price decreases by $40.50. Because it’s impossible for a bike to have 0 weight, the y-intercept has no contextual meaning here.
c) ŷ = 1896 – 40.5x = 1896 – 40.5(30) = 681; the predicted price is $681.
3.42 Mountain bike and suspension type
a) The relationship between weight and price seems to be linear among bikes with front end suspension, and it also seems to be linear among bikes with full suspension, but when all bikes are included, the relationship is not completely linear. Thus, the simple regression line is not the best way to fit the data. It’s better to calculate separate regression lines for each type of suspension.
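Fitting a separate line for each suspension type, as recommended here, is straightforward with a grouped fit; a sketch, in which the file name mountain_bikes.csv and the column names weight, price, and suspension are assumptions rather than names from the text:

import numpy as np
import pandas as pd

bikes = pd.read_csv("mountain_bikes.csv")      # assumed file name

for kind, grp in bikes.groupby("suspension"):  # e.g., front end vs. full suspension
    slope, intercept = np.polyfit(grp["weight"], grp["price"], deg=1)
    r = grp["weight"].corr(grp["price"])
    print(kind, round(intercept), round(slope, 1), round(r, 3))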
3.42 (continued)
b) Front end: The regression equation is price_FE = 2136 – 55.0weight_FE. Full: The regression equation is price_FU = 2432 – 44.0weight_FU. Compared to the slope calculated in Exercise 3.41 (which was –40.5), these slopes are larger in magnitude, an indication of a stronger relationship between variables when types of bike are looked at separately.
c) If the correlations for full and front end suspension bikes are found separately, I would expect that the correlations would be stronger for each type of bike. For front end suspension bikes, the correlation is –0.888, whereas for full suspension bikes, the correlation is –0.952. Thus, the correlations are still negative (i.e., heavier bikes cost less), but the magnitude of the relationship is far stronger when correlations are examined only among bikes of a certain suspension type.
d) You may justify your answer in numerous ways, such as: plotting the point on the scatterplot and seeing which cluster of points it most likely belongs to; plugging into the least squares regression equations, then taking predicted values and computing the residuals to see which suspension type has the smallest residual (see below for the solution using this method); or recalculating the correlation coefficients for each suspension type with the new point to see if the original correlation coefficients change.
For front end: ŷ = 2136 – 55.0x = 2136 – 55.0(28.5) = $568.50; Residual: y – ŷ = 700 – 568.5 = 131.5
For full: ŷ = 2432 – 44.0x = 2432 – 44.0(28.5) = $1178.00; Residual: y – ŷ = 700 – 1178 = –478
The error is smaller when the prediction is made with the formula for front end suspension bikes than with the formula for full suspension bikes; I would predict that the bike has a front end suspension.
3.43 Fuel Consumption
a) The scatterplot reveals a nonlinear (curved) pattern. The correlation coefficient measuring the strength of a linear relationship is meaningless for nonlinear relationships.
b) Due to the nonlinear relationship, the regression equation is not appropriate to model the relationship between driving speed and mpg and cannot be used for making predictions.
c) For the range from about 40 mph to 85 mph (or from 5 mph to 40 mph). Over each of these ranges, the relationship is approximately linear.
Section 3.4 Cautions in Analyzing Associations
3.44 Extrapolating murder
a) The x-value was approximately 14 for Utah and 30 for Mississippi.
b) ŷ = –8.25 + 0.56x = –8.25 + 0.56(0) = –8.25; this prediction makes no sense because we cannot have a murder rate that is less than 0! This occurs because we are extrapolating beyond our data, a statistically dangerous practice.
3.45 Men’s Olympic long jumps
a) The observation in the lower left of the scatterplot (1896) may influence the fit of the regression line. This observation was identified because it is well below the general trend of the data.
b) The prediction from the regression line would be more reasonable than would the prediction based on the mean because of the strong positive linear trend exhibited by the data.
c) No. Extrapolating predictions well beyond the range of the observed x values is unreliable. No one knows whether the linear trend continues so many years out.
3.46 U.S. average annual temperatures
a) The regression equation is Temperature = 27.304 + 0.01273Year. (Fitted line plot of Temperature vs Year omitted.)
The slope of 0.01273 indicates a predicted increase of 0.01273 degree for each increase of one year. b) (i) ŷ = 27.304 + 0.01273(2016) = 52.97 (ii) ŷ = 27.304 + 0.01273(2500) = 59.13 c)
I have more faith in the prediction made for 2016. It is dangerous to extrapolate to a year that is so far off – like 2500. The temperature trends might have changed drastically by that point (although they might have even changed by 2016).
3.47 Murder and education
a) x = 15%; ŷ = –3.1 + 0.33(15) = 1.85; x = 40%; ŷ = –3.1 + 0.33(40) = 10.1
b) x = 15%; ŷ = 8.0 – 0.14(15) = 5.9; x = 40%; ŷ = 8.0 – 0.14(40) = 2.4
c) D.C. is a regression outlier because it is well removed from the trend that the rest of the data follow.
d) Because D.C. is so high on both variables, it pulls the line upwards on the right and suggests a positive correlation, when the rest of the data (without D.C.) are negatively correlated. The relationship is best summarized after removing D.C.
3.48 Murder and poverty
a) Yes; D.C. has a large influence on this regression analysis. When it is included, the intercept decreases by 4.5, and the slope, although still positive, becomes over twice as large.
b) Based on this information, the poverty values of D.C. would be relatively large. A high poverty value, coupled with the high D.C. murder rate, would lead this to be a regression outlier that would pull the right hand side of the regression line upwards.
3.49 TV watching and the birth rate
a) The U.S. is an outlier on (i) x, (ii) y, and (iii) relative to the regression line for the other six observations.
b) The two conditions under which a single point can have such a dramatic effect on the slope: (1) the x value is relatively low or high compared to the rest of the data; (2) the observation is a regression outlier, falling quite far from the trend that the rest of the data follow. In this case, the observation for the U.S. is very high on x compared to the rest of the data. In addition, the observation for the U.S. is a regression outlier, falling far from the trend of the rest of the data. Specifically, TV watching in the U.S. is very high despite the very low birth rate.
3.49 (continued)
c) The association between birth rate and number of televisions is (i) very weak without the U.S. point because the six countries, although they vary in birth rates, all have very few televisions and these amounts don’t seem to relate to birth rate. The association is (ii) very strong with the U.S. point because the U.S. is so much higher in number of televisions and so much lower on birth rate that it makes the two variables seem related. A very high number of televisions does coincide with a very low birth rate in the U.S., whereas all the Asian countries are relatively high in birth rates and low in numbers of televisions.
d) The U.S. residual for the line fitted using that point is very small because that point has a large effect on pulling the line downward. There are no other data points near that line, and all other data points are in the far corner, so the line runs almost directly through the U.S. point.
3.50 Looking for outliers
a) (Fitted line plots omitted: with all data, single parent = 21.16 + 0.0889 college; without the District of Columbia, single parent = 28.12 – 0.2065 college; without Utah, single parent = 21.07 + 0.1002 college.) The point at x (college) of approximately 38 (District of Columbia) is quite a bit different from the other observations, and the point at y (single parent) of approximately 13 (Utah) is a little different from the other observations.
b) (i) single parent = 21.2 + 0.089college; see (a). (ii) single parent = 28.1 – 0.206college (iii) single parent = 21.1 + 0.100college
c)
The observation for the District of Columbia, with x about 38, has a pretty strong influence on the slope. When it is deleted, the previously positive slope becomes negative.
3.50 (continued)
d) ŷ = 21.2 + 0.089(38.3) = 24.61 (rounds to 24.6); ŷ = 28.1 – 0.206(38.3) = 20.21 (rounds to 20.2)
Yes, the predicted value for D.C. is about four points different depending on which equation is used.
3.51 Regression between cereal sodium and sugar
a) (Scatterplot of Sugar (g) vs Sodium (mg) omitted.)
The point (18, 340), which represents Raisin Bran, meets the two criteria in that it has an x value far from all the others and falls far from the trend that the rest of the data follow.
b) All data points: regression line SUGAR(g) = 8.949 – 0.00119SODIUM(mg); correlation –0.017.
All data points except Raisin Bran: regression line SUGAR(g) = 11.77 – 0.02221SODIUM(mg); correlation –0.30.
Raisin Bran lowers the intercept and makes the slope less steep; overall, the two variables appear less strongly associated when Raisin Bran is included. (Fitted line plots of Sugar (g) vs Sodium (mg), with and without Raisin Bran, omitted.)
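Refitting with and without a suspect point, as done above for Raisin Bran, can be automated; a sketch, in which the file name cereal.csv and the column names sodium_mg and sugar_g are assumptions:

import numpy as np
import pandas as pd

cereal = pd.read_csv("cereal.csv")             # assumed file name
x, y = cereal["sodium_mg"], cereal["sugar_g"]  # assumed column names

def fit(x, y):
    slope, intercept = np.polyfit(x, y, deg=1)
    return round(intercept, 3), round(slope, 5), round(np.corrcoef(x, y)[0, 1], 3)

print(fit(x, y))                       # all data points
keep = ~((x == 340) & (y == 18))       # drop the Raisin Bran point (sodium 340, sugar 18)
print(fit(x[keep], y[keep]))           # all data points except Raisin Bran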
3.52 Gestational period and life expectancy a) The scatterplot shows a strong positive correlation between the length of the gestational period and longevity. The hippo has a higher longevity than what would be expected based on the data from other animals. Although the elephant has by far the longest gestational period, its longevity is also the longest and follows the general trend. b) Statistical software will verify the results.
3.52 (continued)
c) The elephant and the hippo are outliers. The elephant has a gestational period that is far longer than that of the other animals but is not a regression outlier (because it follows the general trend). The hippo is unusual in its combination of average gestational length with an above-average longevity and is a regression outlier because it does not fit with the general trend. Neither seems too influential in that removing either will not change the slope much. See (d).
d) Removing the elephant: ŷ = 6.47 + 0.043x and r = 0.75. The slope is almost identical, but the correlation is much weaker, dropping from 0.86 to 0.75. Removing the hippo: ŷ = 6.02 + 0.043x and r = 0.92. The slope is almost identical, but the correlation is much stronger, increasing from 0.86 to 0.92. (Fitted line plots omitted: without the hippo, longevity = 6.019 + 0.04256 gestation; without the elephant, longevity = 6.468 + 0.04339 gestation.)
3.53 Height and vocabulary
a) There is not likely a causal relationship between height and vocabulary. Rather, it is more likely that both increase with age.
b) Values would be higher on both variables as age increases (e.g., ten-year-olds should be higher on both variables than are five-year-olds). At each age, there should be no overall trend (i.e., some first graders would be high on both, some would be low on both, and some would be low on one and high on the other). Age plays a role in the association because age predicts both height and vocabulary. Height and vocabulary are related because they have a common cause.
c) In the scatterplot (omitted) of vocabulary (assessed on a scale of 1-10) vs height (in inches), with points marked by grade (1, 6, and 12), we see an overall positive correlation between height and vocabulary if we ignore grade. However, if we look within each grade, we see roughly a horizontal trend and no particular association. It is age that predicts both height and vocabulary.
3.54 More firefighters cause worse fires?
a) No. Having more firefighters is likely a result of the fire being bad to start with. It is the bad fire that leads both to the increased number of firefighters and the increased damage. The two variables in this correlation (number of firefighters and amount of damage) have the common cause of severity of the fire.
b) As mentioned in (a), the severity of the fire is a possible third variable that could be considered a common cause of x and y. Each student’s hypothetical scatterplot will be different, but points that are high in firefighters and damage will be high in fire severity, and points that are low in firefighters and damage will be low in fire severity.
3.55 Anti-drug campaigns
a) Although there are several possible responses to this exercise, one possible lurking variable could be television watching. Kids who are home watching television are more likely to see these ads and are less likely to be out doing drugs.
b) Pot smoking (or the lack thereof) might be caused by anti-drug ads, but also might be caused by other variables, some of which could be associated with anti-drug ads. In addition to television watching, as mentioned above, other such causal variables could include regular school attendance (a place where students might see more anti-drug ads), parental influence, and neighborhood type.
3.56 What’s wrong with regression?
a) It’s dangerous to extrapolate far beyond one’s data.
b) Correlation is not causation. For example, there could be a third variable (e.g., income) that is related to both of these variables.
c) This point would be a regression outlier, and could greatly affect the regression equation. We should report the results without this regression outlier.
3.57 Education causes crime?
a) In Minitab, your columns should look similar to the following:
Education   Crime Rate   Rural/Urban
70          140          u
75          120          u
80          110          u
85          105          u
55          50           r
58          40           r
60          30           r
65          25           r
b) (Scatterplot of Crime Rate vs Education, with points marked r for rural and u for urban, omitted.)
c)
The correlation for all 8 data points is 0.73. This indicates a strong, positive linear correlation.
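Because the eight data points are listed in part a), the overall and within-group correlations in parts c) and d) can be verified directly; a short pandas sketch:

import pandas as pd

df = pd.DataFrame({
    "education":  [70, 75, 80, 85, 55, 58, 60, 65],
    "crime_rate": [140, 120, 110, 105, 50, 40, 30, 25],
    "area":       ["u", "u", "u", "u", "r", "r", "r", "r"],
})

print(round(df["education"].corr(df["crime_rate"]), 2))   # about 0.73 for all 8 counties
for area, grp in df.groupby("area"):
    # within the urban counties and within the rural counties the correlation is strongly negative
    print(area, round(grp["education"].corr(grp["crime_rate"]), 2))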
3.57 (continued)
d) (i) The correlation for the urban counties is –0.96, which is a very strong, negative linear correlation. (ii) The correlation for the rural counties is –0.95, which is also a very strong, negative linear correlation. Note that for each subset of data, a higher education rate is associated with a lower crime rate; however, because both the education and crime rates are so much higher for urban counties than for rural, the correlation appears positive when all of the data is considered together. This is a good example of why it is always important to look at a graphical display of your data to determine if a measure of linear correlation is appropriate.
3.58 Death penalty and race
a) When the victim was white:
                   Death penalty       No death penalty
White defendant    53/467 = 0.11       414/467 = 0.89
Black defendant    11/48 = 0.23        37/48 = 0.77
Black defendants were more likely than were white defendants to get the death penalty when the victim was white.
b) When the victim was black:
                   Death penalty       No death penalty
White defendant    0/16 = 0.00         16/16 = 1.00
Black defendant    4/143 = 0.03        139/143 = 0.97
Black defendants were more likely than were white defendants to get the death penalty when the victim was black.
c) Combining over victim's race:
                     Death Penalty
Defendant's Race     Yes     No      Total
White                53      430     483
Black                15      176     191
                   Death penalty       No death penalty
White defendant    53/483 = 0.11       430/483 = 0.89
Black defendant    15/191 = 0.08       176/191 = 0.92
These data indicate that white defendants were more likely than were black defendants to get the death penalty.
d) These data satisfy Simpson’s paradox, which occurs when the association between two variables changes after a third variable is included and the data are analyzed at separate levels of that variable. In this case, the race of the victim played a role. There were so many more white victims, and most people were killed by a member of their own race. Thus, there were more white killers to be put to death. Yet, the few blacks who killed white people were more likely to be put to death than were the many white people who killed white people, and the few white people who killed black people were never put to death.
e) We would call victim’s race a confounding variable. Victim’s race and defendant’s race both predict death penalty status. Confounding occurs when two explanatory variables are associated with a response variable, but also with each other. In such cases, it is difficult to determine whether either of them truly causes the response, because the variable’s effect could be at least partly due to its association with the other variable.
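The conditional proportions in parts a)–c), and the reversal that defines Simpson's paradox, can be reproduced by entering the counts and normalizing each row; a minimal pandas sketch using the counts given above:

import pandas as pd

counts = {
    "white victim": pd.DataFrame(
        [[53, 414], [11, 37]],
        index=["White defendant", "Black defendant"],
        columns=["Death penalty", "No death penalty"]),
    "black victim": pd.DataFrame(
        [[0, 16], [4, 139]],
        index=["White defendant", "Black defendant"],
        columns=["Death penalty", "No death penalty"]),
}
counts["combined"] = counts["white victim"] + counts["black victim"]

for label, table in counts.items():
    row_props = table.div(table.sum(axis=1), axis=0)   # proportions within each defendant row
    print(label)
    print(row_props.round(2))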
62 Statistics: The Art and Science of Learning from Data, 4th edition 3.59 NAEP scores a) The response variable is eighth grade math scores, and the explanatory variable is state. b) The third variable is race. Nebraska has the overall higher mean because the race ratio is quite different from that in New Jersey. There is a higher percentage of whites and a lower percentage of blacks in Nebraska than in New Jersey, and overall, whites tended to have higher math scores than blacks. 3.60 Age a confounder? a) The researchers wondered whether age was responsible for the association between regular exercise and serious illness. It is possible that those who exercised more were younger, and that youth, rather than exercise, was responsible for the lower rate of serious illness. b) If age were not actually measured, it would be a possible lurking variable. A lurking variable is one that might be present, but is not measured in the study. We do not know if it is a confounding variable, but it might be. If it were included in the study, and were found to be linked with both the explanatory and response variables, then it would become a confounding variable.
Chapter Problems: Practicing the Basics
3.61 Choose explanatory and response
a) The response variable is assessed value, and the explanatory variable is square feet.
b) The response variable is political party, and the explanatory variable is gender.
c) The response variable is income, and the explanatory variable is education.
d) The response variable is pounds lost, and the explanatory variable is type of diet.
3.62 Graphing data
a) (a) Both variables are quantitative. (b) Both variables are categorical. (c) Both variables are quantitative. (d) Pounds lost is quantitative, and diet is categorical.
b) (a) These data could be graphed with a scatterplot (or histogram for individual variables). (b) These data could be graphed with a bar graph with side-by-side bars for the two genders for each political party (or we could use two separate pie charts, one for each gender). (c) These data could be graphed with a scatterplot (or histogram for individual variables). (d) These data could be graphed with side-by-side box plots or histograms (one for each diet).
3.63 Life after death for males and females
a) Opinion about life after death
Gender   Yes                 No                  Total   n
Male     621/808 = 0.769     187/808 = 0.231     1.00    808
Female   834/979 = 0.852     145/979 = 0.148     1.00    979
b) 76.9% of males believe in life after death, as opposed to 85.2% of females. The difference in the proportions between females and males is 0.852 – 0.769 = 0.083. The proportion of females believing in life after death is about 8 percentage points higher than the one for males. The ratio of proportions is 0.852/0.769 = 1.108. The proportion of females believing in life after death is about 11% higher (or 1.1 times higher) than the one for males.
3.64 God and happiness
a)
                                          Happy
GOD                        1: VERY HAPPY   2: PRETTY HAPPY   3: NOT TOO HAPPY   Row Total
1: DONT BELIEVE                 13               60                 14                87
2: NO WAY TO FIND OUT           36               88                 19               143
3: SOME HIGHER POWER            77              190                 49               316
4: BELIEVE SOMETIMES            28               49                 13                90
5: BELIEVE BUT DOUBTS          108              252                 55               415
6: KNOW GOD EXISTS             520              752                190              1462
Column Total                   782             1391                340              2513
b)
                                          Happy
GOD                        1: VERY HAPPY        2: PRETTY HAPPY       3: NOT TOO HAPPY      Row Total
1: DONT BELIEVE            13/87 = 0.149        60/87 = 0.690         14/87 = 0.161         1.00
2: NO WAY TO FIND OUT      36/143 = 0.252       88/143 = 0.615        19/143 = 0.133        1.00
3: SOME HIGHER POWER       77/316 = 0.244       190/316 = 0.601       49/316 = 0.155        1.00
4: BELIEVE SOMETIMES       28/90 = 0.311        49/90 = 0.544         13/90 = 0.144         1.00
5: BELIEVE BUT DOUBTS      108/415 = 0.260      252/415 = 0.607       55/415 = 0.133        1.00
6: KNOW GOD EXISTS         520/1462 = 0.356     752/1462 = 0.514      190/1462 = 0.130      1.00
Column Total               0.311                0.554                 0.135                 1.00
Those who respond that they “know God exists” are most likely to report being “very happy.”
c) It is more informative to view the proportions in (b) because they allow us to see the proportion of people with a certain level of reported happiness given that they have a certain belief about God. The table in (a) doesn’t allow us to make these kinds of comparisons.
3.65 Degrees and income
a) The response variable is income. It is quantitative.
b) The explanatory variable is degree. It is categorical.
c) A bar graph could have a separate bar for each degree type. The height of each bar would correspond to mean salary level for a given category.
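Returning to Exercise 3.64, the row proportions in part b) are simply the counts in part a) divided by their row totals; a short pandas sketch using the counts above:

import pandas as pd

god_happy = pd.DataFrame(
    [[13, 60, 14], [36, 88, 19], [77, 190, 49],
     [28, 49, 13], [108, 252, 55], [520, 752, 190]],
    index=["DONT BELIEVE", "NO WAY TO FIND OUT", "SOME HIGHER POWER",
           "BELIEVE SOMETIMES", "BELIEVE BUT DOUBTS", "KNOW GOD EXISTS"],
    columns=["VERY HAPPY", "PRETTY HAPPY", "NOT TOO HAPPY"])

row_props = god_happy.div(god_happy.sum(axis=1), axis=0)   # conditional proportions by belief
print(row_props.round(3))   # e.g., the KNOW GOD EXISTS row is 0.356, 0.514, 0.130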
3.66 Bacteria in ground turkey
a) The difference in proportion of positive tests between ground turkey with no claim of antibiotic use and a claim of antibiotic use was 26/46 – 23/28 = 0.565 – 0.821 = –0.26; the percent of packages that tested positive for Enterococcus is 26 percentage points lower for conventional packages than for those claiming no use of antibiotics.
b) The ratio of the proportion of positive tests between ground turkey with no claim of antibiotic use and a claim of antibiotic use was 0.565/0.821 = 0.69; the proportion of packages testing positive for Enterococcus is 31% lower for conventional packages compared to those claiming no use of antibiotics.
3.67 Women managers in the work force
a) The response variable is gender, and the explanatory variable is type of occupation.
b) Percent of Total in Executive, Administrative, and Managerial Positions
Year   Female   Male    Total
1972   0.197    0.803   1.00
2002   0.459    0.541   1.00
c) Based on (b), it does seem that there is an association between these variables. Women made up a larger proportion of the executive work force in 2002 than in 1972.
d) The two explanatory variables shown are year and type of occupation.
3.68 RateMyProfessor.com
a) The easier a professor grades, the more likely he or she is to receive a higher quality rating.
b) We would expect the correlation to be closer to 0 if there was no association between quality rating and easiness of grading.
3.69 Women in government and economic life
a) (Fitted line plot of Women in Parliament (%) vs Female Economic Activity omitted; the fitted line is Women in Parliament (%) = –48.91 + 0.9186 Female Economic Activity.)
The correlation between women in parliament and female economic activity is 0.745. This correlation is supported by the positive linear trend evident in the scatterplot, but note this is largely driven by the point (for Japan) having female economic activity very low (65).
b) See (a): The regression equation is given by ŷ = –48.91 + 0.9186x. Since the y-intercept would correspond to an x-value of 0, the y-intercept is not meaningful in this case (Female economic activity = 0 is outside of the range of observed data).
c) The predicted value for the U.S. is –48.91 + 0.9186(81) = 25.5 with 15.0 – 25.5 = –10.5 as the corresponding residual. The regression equation overestimates the percentage of women in parliament by 10.5 percentage points for the U.S.
d) b = 0.56(9.8/7.7) = 0.7127 and a = 26.5 – 0.7127(76.8) = –28.24. Thus, the prediction equation is given by ŷ = –28.24 + 0.7127x.
3.70 African droughts and dust
a) (i) B, (ii) C, (iii) A
b) Dust and rainfall amounts are negatively related. As one increases, the other decreases.
3.71 Crime rate and urbanization
a) An increase of 100 is an increase of 100 times the slope = 0.56(100) = 56. As the urban nature of a county goes from 0 to 100, the predicted crime rate increases by 56%.
b) The correlation indicates a relatively strong, positive relationship.
c) The slope and correlation are related by the formula b = r(sy/sx); 0.56 = 0.67(28.3/34.0).
3.72 Gestational period and life expectancy revisited
a) Animals with a gestational period that is 100 days longer are predicted to live 0.045(100) = 4.5 years longer.
b) Leopards are predicted to live 6.28 + 0.045(98) = 10.7 years.
c) 73% of the variability in the longevity of animals can be explained by the linear relationship between longevity and gestational period.
d) 40 weeks is 280 days. The regression equation would predict an average longevity of 6.28 + 0.045(280) = 18.9 years for humans.
3.73 Height and paycheck
a) The response variable is salary, and the explanatory variable is height.
b) The slope of the regression equation is 789 when height is measured in inches and income in dollars. An increase of one inch predicts an increase in salary of $789.
c) An increase of seven inches (from 5 foot 5 to 6 feet) is worth a predicted $789 per inch, or $5523.
3.74 Predicting college GPA
a) (Scatterplot of college GPA vs high school GPA omitted.)
This equation is not realistic because it predicts an increase of seven in college GPA for an increase of one in high school GPA when GPA ends at 4.0! This would predict a college GPA of 28.5 when high school GPA is 4.0. b) ŷ = 0.5 + 0.7(3.0) = 2.6 ŷ = 0.5 + 0.7(4.0) = 3.3
As high school GPA goes up by 1.0 (from 3.0 to 4.0), predicted college GPA goes up by exactly the amount of the slope, 0.7 (from 2.6 to 3.3).
3.75 College GPA = high school GPA
The y-intercept would be zero (the line would cross the y-axis at zero when x was zero), and the slope would be one (an increase of one on x would mean an increase of one on y). In this case, the line matches up with the exact points on x and y (0.0 with 0.0, 3.5 with 3.5, etc.). This means that your predicted college GPA equals your high school GPA.
66 Statistics: The Art and Science of Learning from Data, 4th edition 3.76 What’s a college degree worth? a) The slope is the amount that y is predicted to increase when x increases by one unit. In this case, income (y) increases by 0.9 million when years of education (x) increases by four. It would follow that an increase of one would be ¼ of the four year increase: 0.9/4 = 0.225 million dollars (i.e., $225,000). b) Earnings per year would be 1/40 of earnings per 40 years. Thus, the slope would be 1/40 of the slope in (a): 0.005625 million dollars (that is, $5625). We also can calculate it in the same way we did (a), but first dividing each income by 40. 3.77 Car weight and gas hogs a) The slope indicates the change in y predicted for an increase of one in x. Thus, a 1000 increase in x would mean a predicted change in y of 1000 times the slope: (–0.0052)(1000) = –5.2 (poorer mileage). b) ŷ = 47.32 – 0.0052(6400) = 14.04. The actual mileage is 17; thus, the residual is 17 – 14.04 = 2.96. The Hummer gets 2.96 more miles to the gallon than one would predict from this regression equation. 3.78 Predicting Internet use from cell phone use a) (i) The response variable is Internet use, and the explanatory variable is cell-phone use. (ii) The scatterplot shows a positive association. (iii) There is little variability of internet use for cellular use below 30%; for cellular use above 30%, internet use is generally higher but also has higher variability. b) One nation that has less Internet use than one would expect, given its level of cell-phone use is the point with approximate x- and y-coordinates of 75 and 13, respectively, or (75, 13). c) As x increases from 0 to 90, predicted y increases from 1.3 to 44.0. This represents a positive association. ŷ = 1.27 + 0.475(0) = 1.3 ŷ = 1.27 + 0.475(90) = 44.0
d)
ŷ = 1.27 + 0.475(45.1) = 22.7; the predicted Internet use for the U.S. is 22.7%.
The residual (27.5) is the difference between the actual value of 50.15 and the predicted value of 22.7; 50.15 – 22.7 = 27.5. The large positive residual indicates that the U.S. has a much higher Internet use percentage than one would predict from this regression equation. 3.79 Income depends on education? a) For each increase of one percentage in x, we would expect an increase in the predicted value on y by 0.42. Thus, an increase in 10 would be 10 times the slope: 0.42(10) = 4.2 (or $4200).
b) The slope can be calculated using the formula b = r(sy/sx). Thus, 0.42 = r(4.69/8.86), so r = 0.42(8.86/4.69) = 0.79. (i) The positive sign indicates a positive relationship; as one variable goes up, the other goes up. As one goes down, the other goes down. (ii) A correlation of 0.79 indicates a strong relationship.
3.80 Fertility and GDP
a) Based on the plot, regression seems appropriate. (Fitted line plot of FERTILITY vs GDP omitted; the fitted line is FERTILITY = 3.534 – 0.07220 GDP.)
ŷ = 36.3 – 0.30(0) = 36.3 ŷ = 36.3 – 0.30(100) = 6.3
When women’s economic activity is 0%, predicted birth rate (36.3) is much higher than the predicted birthrate (6.3) when women’s economic activity is 100%. b) The correlation between birth rate and women’s economic activity is bigger in magnitude than the correlation between crude birth rate and nation’s GNP, indicating a larger association between birth rate and women’s economic activity. 3.82 Education and income a)
b r s y s x 0.50 16, 000 2 4000 , so the slope is 4000.
b) The correlation will not change because it is not dependent on which variable is considered the explanatory and which is considered the response. The slope will change in value as shown in the equation below.
b r s y s x 0.50 2 16, 000 0.000625
3.83 Income in euros a)
The intercept is –20,000 in dollars; thus, the intercept in euros is –$20,000 × (1 euro/$1.25) = –16,000 euros.
b) The slope of the regression equation is 4000 in dollars; thus, the slope in euros is $4000 × (1 euro/$1.25) = 3200 euros.
c) The correlation remains the same when income is measured in euros because correlation is not dependent on the units used – whether dollars or euros. It is still 0.50.
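The dollar-to-euro conversions in this exercise are a rescaling of y, so the intercept and slope are divided by the same factor while the correlation is left unchanged; a brief sketch of the arithmetic using the $1.25-per-euro rate given in the exercise:

# Sketch: rescaling the income (y) axis from dollars to euros (Exercise 3.83).
dollars_per_euro = 1.25

intercept_dollars = -20_000
slope_dollars = 4_000

intercept_euros = intercept_dollars / dollars_per_euro    # -16,000 euros
slope_euros = slope_dollars / dollars_per_euro            # 3,200 euros
r = 0.50   # the correlation does not depend on the units, so it stays 0.50
print(intercept_euros, slope_euros, r)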
3.84 Changing units for cereal data
a) SODIUM(mg) = 169 – 0.00025SUGAR(mg). The slope changes when sugar is converted from grams to milligrams: for every 1 milligram increase in sugar, we expect the sodium content to decrease by 0.00025 milligrams. The y-intercept does not change; it remains in milligrams.
b) If we change the unit of measurement for sugar from grams to ounces, a one-unit (one-ounce) increase in sugar corresponds to a 28.35-gram increase, so the slope is multiplied by 28.35: –0.25 × 28.35 ≈ –7.1. The new equation is SODIUM(mg) = 169 – 7.1SUGAR(oz); again, the y-intercept does not change.
3.85 Murder and single-parent families
a) The District of Columbia is the outlier to the far, upper right. This would have an effect on the regression analysis because it is a regression outlier; that is, it is an outlier on x and also is somewhat out of line with the trend of the rest of the data.
b) When the District of Columbia is included, the y-intercept decreases and the slope increases. The District of Columbia point pulls the regression line upwards on the right side.
3.86 Violent crime and college education
a) (Fitted line plot of violent crime rate vs college omitted; the fitted line is violent crime rate = 199.8 + 9.598 college.)
The point at approximately x = 38 might be influential in a regression analysis.
b) See (a); violent crime rate = 200 + 9.6college. This slope suggests that for every 1% increase in college educated people, there is a predicted increase of 9.6 in the violent crime rate.
c) violent crime rate = 525 – 4.2college (fitted line plot omitted)
3.86 (continued)
This slope indicates that for every increase of 1% in college educated people, there is a predicted decrease of 4.2 in the violent crime rate. The deletion of one point has changed the association from a positive one to a negative one.
3.87 Violent crime and high school education
a) (Fitted line plot of violent crime rate vs high school omitted; the fitted line is violent crime rate = 2545 – 24.61 high school.)
The point with a y value of around 1500 is furthest from the other data points.
b) See (a); violent crime rate = 2545 – 24.6 high school. The slope indicates that for each increase of one percent of people with a high school education, the predicted violent crime rate decreases by 24.6.
c) violent crime rate = 2268 – 21.6 high school (fitted line plot omitted)
The slope indicates that for each increase of one percent of people with a high school education, the predicted violent crime rate decreases by 21.6. This is similar to the slope in (b).
3.88 Crime and urbanization
a) (Boxplot of violent crime rate omitted.) Along with the box plot, the mean of 441.6 and standard deviation of 241.4 suggest that there might be some skew; the standard deviation is fairly large compared to the mean. Because the lowest possible value is 0, scores can only be as much as 1.8 standard deviations below the mean. Thus, we expect the distribution to be skewed to the right.
b) (Scatterplot of violent crime rate vs urban omitted.)
The observation at around y = 1500 appears to be a potentially influential observation, although it is only a regression outlier on one of the two criteria; it is outside of the trend of the rest of the data, but is not far from the rest of the data on x. Without this point, we might expect that the slope would be somewhat smaller. This point is very high on both x and y and so we would expect that it would pull the regression line upwards.
3.88 (continued)
c) violent crime rate = 111 + 4.56 metropolitan; the slope decreases from 5.93 to 4.56. (Fitted line plot omitted; fitted line: violent crime rate = 111.2 + 4.564 urban.)
3.89 High school graduation rates and health insurance
a) (Scatterplot of % without health insurance vs HS graduation rate omitted.)
The scatterplot suggests a negative relationship.
b) The correlation is –0.45. As does the scatterplot, this correlation indicates a negative association.
c) The regression equation is: Health Insurance = 49.2 – 0.42HS_Grad_Rate. The slope of –0.42 indicates that for each increase of one in the percentage who are high school graduates, the predicted percentage of individuals without health insurance goes down by 0.42. This summarizes the negative relationship between the variables.
3.90 Women’s Olympic high jumps
a) Women_Meters = –10.94 + 0.0065Year_Women
(i) For 2016: Women_Meters = –10.94 + 0.0065(2016) = 2.16 meters, or 7.1 feet.
(ii) For 3000: Women_Meters = –10.94 + 0.0065(3000) = 8.56 meters, or 28.1 feet.
b) Although 2016 is outside of the range of data, it is the next data point in the time sequence and the regression equation should be able to predict its value fairly well assuming there are no major changes to the sport; however, the year 3000 is too far beyond the range of the data to extrapolate.
3.91 Income and height
a) Men tend to be taller and make more money than women. Gender could be the common cause of both of these variables.
b) If gender had actually been measured, it would be a confounding variable. When measured, a lurking variable becomes a confounding variable.
3.92 More TV watching goes with fewer babies?
a) Correlation does not indicate causation. There are lots of other possible ways these two variables could be related.
b) There are several possible responses to this exercise. One possible lurking variable is GDP, because nations with higher GDP tend to have lower birth rates and higher television ownership.
3.93 More sleep causes death?
a) As people age, they might both sleep more and be more likely to die. This could be the lurking variable that influences both.
b) Subject’s age might be the common cause of both of the variables reported in this study. It might actually cause people to sleep more and cause them to be more likely to die.
3.94 Ask Marilyn
a) White Collar
Gender   Hired: Yes   No    Total
Male     30           170   200
Female   40           160   200
Blue Collar
Gender   Hired: Yes   No    Total
Male     300          100   400
Female   85           15    100
b) Male: 270/600 = 45%; Female: 175/300 = 58%
Gender   Hired: Yes   No    Total
Male     330          270   600
Female   125          175   300
c) This is an example of Simpson’s paradox, the fact that the direction of an association between two variables can change after we include a third variable and analyze the data at separate levels of that variable. When the third variable of type of job (white vs. blue collar) is included in the analysis, women fare better than do men, whereas when this variable is not included, women fare worse than do men.
Chapter Problems: Concepts and Investigations 3.95 NL baseball team ERA and number of wins The responses will be different for each student depending on the methods used. 3.96 Time studying and GPA The responses will be different for each student depending on the methods used.
3.97 Warming in Newnan, GA
The regression equation is: Temp = 119 – 0.029Year. The regression line indicates a very slight decrease over time. The Central Park data from Example 12 indicate the opposite, a very slight increase over time.
3.98 Regression for dummies
Answers will vary. Your answer should say something like: Regression allows us to use information we currently know to develop a way to predict in the future (as long as we don’t predict too far into the future because the trends might change!). For example, we know how many catalogs we mail in a given month, and we know our total sales for the next month. If we can look over the data for a whole year, we can get an idea of how well the numbers of catalogs mailed predicted total sales. The technique of regression allows us to develop an equation that we can use to do this kind of prediction. For example, we might find that the more catalogs we send out, the more sales we have. Then we can make predictions for future months so that we can have an idea of our sales.
3.99 Fluoride and AIDS
San Francisco could be higher than other cities on lots of variables, but that does not mean those variables cause AIDS, as association does not imply causation. Alternative explanations are that San Francisco has a relatively high gay population or relatively high intravenous drug use, and AIDS is more common among gays and IV drug users.
3.100 Fish fights Alzheimer’s
a) A lurking variable is a possible third variable that might affect the relationship between two other variables. In this example, those who eat fish had a lower risk of Alzheimer’s disease. There might be another variable, however, that’s related to both of these. For example, those who eat healthy foods like fish might also exercise; it might be exercise, rather than the fish, that leads to the lower rate of Alzheimer’s.
b) There can be multiple causes for any particular response variable. In the example in (a) of this exercise, exercise and fish might both cause lower rates of Alzheimer’s. A lurking variable (a third variable as described above) and the explanatory variable (the variable that we think is causing something else) might both cause the outcome of interest, in this case Alzheimer’s.
c) People should be skeptical when they read new research results such as in this story because the researcher might not have considered all possible explanations for the correlation.
3.101 Dogs make you healthier
Stress level, physical activity, wealth, and social contacts are all possible lurking variables. Any one of these variables may contribute to one’s physiological and psychological human health as well as be associated with whether or not a person owns a dog. For example, it may be that people who are more active are more likely to own a dog as well as being physically healthier. Thus, it is possible that one of these lurking variables is responsible for the perceived association between health and dog ownership and if they had been controlled in the study, the association would not be present.
3.102 Multiple choice: Correlate GPA and GRE The best answer is (d).
3.103 Multiple choice: Properties of r The best answer is (b).
3.104 Multiple choice: Interpreting r The best answer is (a).
3.105 Multiple choice: Correct statement about r The best answer is (d).
3.106 Multiple choice: Describing association between categorical variables The best answer is (b).
3.107 Multiple choice: Slope and correlation The best answer is (c).
74 Statistics: The Art and Science of Learning from Data, 4th edition 3.108 Multiple choice: Interpretation of r2 The best answer is (d). 3.109 True or false a) False, the weakest correlation is between y and x1. b) True, the slope and the correlation would have the same sign. c) True, the slope tells us that an increase of one year leads to a predicted increase of 0.4 thousands of dollars, which translates into $400. d) True, ten times the slope equals four. Income is in thousands of dollars and thus, this predicted increase is $4000. ♦♦3.110 Correlation does not depend on units a) If we convert income from British pounds to dollars, then each pound is now worth $2. In other words, we multiply each score by two. Thus, each y-value doubles, the mean of y is now doubled, and the distance of each score from the mean doubles. If this is all so, then the variability, as represented by the standard deviation, has now doubled. As an example, imagine two scores in pounds, 5 pounds and 10 pounds. If both are converted to dollars, they are now $10 and $20. The mean of the two was 7.5 pounds, and it is now $15. The distance of each of the two scores in pounds from the mean was 2.5, but now the distance of each of the two scores in dollars from the mean is $5. b) The correlation would not change in value, however, because the correlation is based on standardized versions of the measures, and is not affected by the measure used. The formula for the correlation uses z-scores, rather than raw scores. Thus, both pounds and dollars would be converted to z-scores and would lead to the same correlation. ♦♦3.111 When correlation = slope
If the two standard deviations are equal, sy/sx = 1, so the formula for the slope is b = (sy/sx)r = 1(r) = r. Thus, in cases where the standard deviations are equal, mathematically the slope must equal the correlation.
♦♦3.112 Center of the data
a) Algebraically, we can manipulate the formula a = ȳ – bx̄ to become ȳ = a + bx̄ by isolating ȳ. The latter formula is very similar to the regression equation, except the generic predicted y, ŷ, is replaced by the mean of y, ȳ. Similarly, the generic x is replaced by x̄. Thus, a score on any x that is at the mean will predict the mean for y.
b) Here are the algebraic steps to go from one formula to the other.
Step 1: Because we know that a = ȳ – bx̄, we can replace the a in the regression equation with this formula. This yields ŷ = ȳ – bx̄ + bx.
Step 2: We can now subtract ȳ from both sides. It cancels itself out on the right, and is now subtracted from the left. This yields ŷ – ȳ = –bx̄ + bx, or ŷ – ȳ = bx – bx̄ (if we switch the two parts of the right hand side of the equation).
Step 3: Finally, we can take the b on the right and put it outside parentheses to denote that it is to be multiplied by both variables within the parentheses. This yields ŷ – ȳ = b(x – x̄). This formula tells us that if we figure out how far from the mean our x is, we can multiply that deviation by the slope to figure out how far from the mean the predicted y is.
♦♦3.113 Final exam “regress toward mean” of midterm
a)
As we saw in (b) of Exercise 3.112, ŷ – ȳ = b(x – x̄) is mathematically equivalent to the usual equation we use for regression, ŷ = a + bx. We also know from Exercise 3.111 that when standard deviations are equal, the slope equals the correlation. If we fill in 0.70 for b, we obtain ŷ – ȳ = 0.70(x – x̄).
3.113 (continued) b) This means that the predicted difference between one’s final exam grade and the mean for the class is 70% of the difference between your midterm exam score and the mean for the class. For example, if the class mean were 80 and your score were 90, you’d be 10 points above the mean. If you multiplied that by 0.70, you’d predict that you’d deviate from the mean on y by seven points. If the mean on y also were 80, your predicted score would be 87. Thus, your predicted score is closer to the mean; it regresses, or comes back, toward the mean.
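The "regression toward the mean" form ŷ – ȳ = b(x – x̄) is easy to check numerically with the values used in this exercise (class means of 80 and slope 0.70); a brief sketch:

# Sketch: regression toward the mean with xbar = ybar = 80 and b = 0.70.
xbar, ybar, b = 80.0, 80.0, 0.70

def predict(x):
    return ybar + b * (x - xbar)

print(predict(90))   # 87: a midterm 10 points above the mean predicts a final only 7 above
print(predict(80))   # 80: a midterm at the mean predicts a final at the mean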
Chapter Problems: Student Activities 3.114 Analyze your data The responses to this exercise will vary for each class depending on the data files that each class constructed. 3.115 Activity: Effect of moving a point The responses to this exercise will vary depending on the data points provided by the instructor. 3.116 Activity: Guess the correlation and regression The responses to this exercise will vary depending on the randomly generated data points.
Chapter 4: Gathering Data
Section 4.1 Experimental and Observational Studies 4.1 Cell phones a) The response variable is a specific type of brain activity. The explanatory variable is whether or not the automated call was placed to the phone on the right ear. b) This was an experiment because researchers controlled via randomization whether the call to a given participant would be received during the first PET scan or the second and then measured the participant’s brain activity under both treatments. 4.2 High blood pressure and binge drinking a) This was an observational study because the experimenter did not assign subjects to treatments. b) The response variable is whether or not a subject dies from stroke or heart attack. The explanatory variable is whether the subject has high blood pressure and binge drinks even occasionally or was a teetotaler with normal blood pressure. c) This does not prove that a combination of high blood pressure and binge drinking causes an increased risk of death by heart attack or stroke. There could be a third variable associated with both high blood pressure/binge drinking and death by heart attack or stroke. 4.3 Low-fat versus low-carb diet? a) The response variable is weight loss (weight after 1 year minus weight before experiment). The explanatory variable is whether one is on a low-fat or low-carbohydrate diet. b) This was an experimental study since the subjects were randomized by the researchers into one of the two groups. c) No. This is an issue of generalizability. Because patients with heart disease or diabetes were excluded from the study, it is inappropriate to say that this applied to everyone. 4.4 Experiments versus observational studies An experiment is preferred over an observational study when either is feasible because it tells us something about cause and effect. In an experiment, the researcher has more control over the levels of the explanatory variable, and is thus able to reduce the possibility of lurking variables. When subjects are randomly assigned to treatments, the groups tend to be balanced in terms of possible lurking variables. For example, if we were interested in whether the use of an online study tool increased statistics grades, we could conduct either an observational study or an experiment. In an observational study, we would ask students whether they used an available online tool (the explanatory variable) and would assess their grades (the response variable). In an experiment, we would randomly assign students either to use the online tool or not to use the online tool. If our studies supported our hypothesis that the online tool increased student grades, the experiment would be a stronger finding because random assignment likely made the groups balanced in terms of other characteristics, such as hours of studying. With the observational study, it is possible that the students who chose to use the available extra tool were the students who tended to study more to begin with, and thus, received higher grades because of their higher overall levels of studying. However, it is not always possible for researchers to carry out a study in an experimental framework. Many factors can make experimentation impossible or next to impossible. One such factor is ethical concerns. For example, one could not study the effect of smoking during pregnancy by assigning one group of pregnant women to smoke during pregnancy and the other to not smoke. 
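As a supplement to the discussion in Exercise 4.4, here is a small Python simulation (not from the text) showing how random assignment tends to balance a lurking variable, here hypothetical weekly study hours, across the two groups; all numbers and the seed are made up.

import random

random.seed(1)
# Hypothetical weekly study hours for 200 students (the lurking variable).
students = [random.gauss(10, 3) for _ in range(200)]

random.shuffle(students)                     # random assignment to two groups of 100
online_tool, no_tool = students[:100], students[100:]

print(f"mean study hours, online-tool group: {sum(online_tool) / 100:.2f}")
print(f"mean study hours, no-tool group:     {sum(no_tool) / 100:.2f}")  # typically close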
4.5 School testing for drugs Although this study found similar levels of drug use in schools that used drug testing and schools that did not, lurking variables might have affected the results. For example, it is possible that schools that institute drug testing are those in higher crime areas than are those that did not choose to use drug testing. Perhaps the level of drug use in these higher crime communities would have been much higher without these programs than it was with them. 4.6 Hormone therapy and heart disease a) It is possible that a lurking variable such as health-consciousness caused the observed difference. Perhaps women who were higher in health-consciousness had better access to the hormonereplacement drug than did those lower in health-consciousness. In this case, it is possible that healthconsciousness and the better nutrition, better overall healthcare, and other health benefits that accompany it are the cause of the decreased risk of heart disease. (Other possible lurking variables are genetics and wealth.) Copyright © 2017 Pearson Education, Inc.
78 Statistics: The Art and Science of Learning from Data, 4th edition 4.6 (continued) b) Different types of studies can lead to different findings. An observational study is more susceptible to effects of lurking variables, such as health-consciousness, so its results could easily differ from those of a controlled experiment. An experiment would randomly assign subjects to treatments – hormonereplacement drugs or not – and thus would have more balanced groups. Levels of healthconsciousness and its associated benefits would be more evenly distributed between groups, and differences in the response variable would be more likely due to the explanatory variable. c) Randomized experiments, when feasible, are preferable to observational studies because they reduce the effects of lurking variables. Randomization into the treatment groups will in theory evenly spread out the effects of the lurking variables within all the treatment groups. 4.7 Speaking foreign languages a) The response variable is saving self versus sacrificing self. The explanatory variable was language used (native versus foreign). b) This study is an observational study. No assignment of treatments was made by the researchers; the researchers simply observed the language choice made by the subjects and the response. c) No, the reason someone chooses to speak a foreign language (i.e., his or her global awareness or higher education level) might be the true cause of the choice they made. 4.8 Breast-cancer screening a) This is an observational study since the women were not assigned to treatment groups. b) The response variable is whether or not the woman died from breast cancer during the time period of study. The explanatory variables are time period during which the woman was observed (1996 to 2005 or 1986 to 1995) and whether or not she was living in a country with mammography screening. c) The study does not prove that being offered mammography screening causes a reduction in death rates associated with breast cancer because observational studies cannot demonstrate causation. There could be other variables associated with mammography screening and breast cancer survival rates. 4.9 Experiment or observe? a) observational study (unethical to assign people to smoking condition) b) observational study (can’t assign people to SAT scores) c) experiment (can assign recipients to catalog condition) 4.10 Baseball under a full moon a) The comment is based on observational data. b) No, the Boston Brouhahas should not be concerned. It is more likely that it is mere coincidence or that there is a lurking variable affecting his observed finding. 4.11 Seat belt anecdote Anecdotal evidence cannot be expected to be representative of the whole population. The seat belt incident might be the exception, rather than what is typical. Death rates are in fact higher for those who do not wear seat belts. 4.12 Poker as a profession? No, Nick’s anecdotal evidence should not soothe the concerns of Tony’s mother. His friend’s success is an exception, most professional poker players do not obtain this kind of success. 4.13 What’s more to blame for obesity? a) This is an observational study since no assignment of treatments was made by the researchers; the researchers simply observed current habits of the subjects. b) The response variable is weight gain. The explanatory variables are exercise habits and caloric intake. c) No, it demonstrates a correlation, but observational studies cannot demonstrate causation. 
An experiment would be needed to show causation. d) Motherhood leads to less exercise, eating more, and a more sedentary lifestyle.
Chapter 4: Gathering Data 79 4.14 Census every 10 years? a) Censuses are extremely costly and time consuming. b) The census gives the government a count of the population, enabling it to make better plans for health, employment, education, etc. in the future. It also allows the government to re-apportion congressional seats. c) Answers will vary. Examination of the form will show that of the 10 questions on the form, Question 6 pertained to gender and Question 8 pertained to race.
Section 4.2 Good and Poor Ways to Sample 4.15 Choosing officers a) PV PS PT PA VS VT VA ST SA TA b) Given that there are ten combinations, there is a one in ten, or ten percent chance that a particular sample of size two will be drawn. c) The Activity Coordinator is in four of the ten combinations; thus, there is a four in ten, or forty percent chance that she/he will be chosen. 4.16 Simple random sample of students Answers will vary. To use a random number generator, select the minimum value as 1, the maximum value as 60, and the number to select to two. Using a random number table, select two-digit numbers until you have obtained two distinct values between 01 and 60. 4.17 Auditing accounts–app Answers will vary each time this is run. The accounts should be labeled from 01 to 60, then random twodigit numbers should be generated. Using a random number table, select the first ten two-digit numbers that fall between 01 and 60. Ignore duplicates. 4.18 Sampling from a directory Using a random number table, you would select five-digit random numbers, ignoring 00000, and numbers above 50,000, as well as duplicates until 10 numbers were found. Using a random number generator, you would select the minimum value as 1, the maximum value as 50,000, and the number to select to ten. Then you would find the names associated with those ten numbers. If, for example, you selected the number 13,050, you would turn to the 131st page (which would include 13,001 to 13,100), and then select the fiftieth name. You would continue until you had ten names. 4.19 Bias due to perceived race This example illustrates response bias because at least some subjects are not giving their true responses. 4.20 Confederates a) This is a leading question because it provides negative information within the question – “symbol of past slavery” and “supported by extremist groups.” This negative slant might lead to response bias. b) This is a better way to ask the question because there is neither explicitly negative nor explicitly positive information about the Confederate symbol within the question. 4.21 Instructor ratings Comment a. is valid. The ratings are a volunteer sample and likely not representative of the population. Comment b. is invalid. Taking a simple random sample of a biased sample will not provide you with an unbiased sample. 4.22 Job trends a) The population for this survey was employers. Theoretically, it would be all employers in the United States. b) The number of employers that were surveyed (the sampling frame) is required to calculate the nonresponse rate. c) Two potential sources of bias are nonresponse bias and voluntary response bias. 4.23 Gun control a) Someone who opposes gun control would prefer to quote the statistic that says that more than 75% of Americans say no when asked “Would you favor a law giving police the power to decide who may own a firearm?”. Copyright © 2017 Pearson Education, Inc.
80 Statistics: The Art and Science of Learning from Data, 4th edition 4.23 (continued) b) Both statements would be considered leading, and could increase response bias. The first mentions “illegal gun sales” and the second states “giving police the power to decide” – both statements might sway people’s responses, albeit in different directions. 4.24 Violent video games and family closeness a) The population is all children in Singapore. b) No assignment of treatments was made by researchers; the children were simply observed by the researchers. c) Parents who have a closer relationship also value education more. In families that value education more, children spend less time playing video games. 4.25 Fracking a) The population is all adults in the United States. b) It is almost impossible to ask all adults in the United States their opinions. A random sample gives a reliable estimate of this value with a much smaller set of people. c) Those who chose to respond might feel more strongly than those who did not respond. 4.26 Teens buying alcohol over Internet a) This study likely has sampling bias since the sampling method was not random. Not all teenagers are equally likely to respond to an Internet survey. b) It is possible that teenagers who have purchased alcohol over the Internet are unlikely to respond to the survey because they are fearful of getting caught. This would introduce nonresponse bias into the study. c) It is also possible that not all teenagers answer the survey question truthfully, particularly if they are fearful of getting in trouble for answering in the affirmative. 4.27 Cheating spouses and bias a) Those who don’t admit to being in an extramarital affair might spend more money to hide the fact. Therefore this estimate would be too low. b) If people lied about how much money they spent on an affair, it seems likely that they would underestimate the amount spent, to assuage their guilt. 4.28 Online dating a) It is likely that the study contains sampling bias due to undercoverage because it is unlikely that all men who go online are represented by the study. b) Since the sampling design did not consist of a random sample (participants of the study were selfselected), it is very likely that there is sampling bias due to the sampling design. c) Response bias is probable in this study since many men are unlikely to answer truthfully regarding the question of whether they go online to cheat. 4.29 Identify the bias a) Undercoverage occurs because not all parts of the population have representation, only those who have subscribed to this newspaper the longest. b) One problem with sampling design is that the newspaper does not even use random sampling among its subscribers; but instead the 1000 people who have subscribed the longest. c) Among those who are sent a questionnaire, not all will respond. It is possible that those who respond feel more strongly about the proposal. There was a high percentage of people who did not respond. d) The question is framed in a negative way, perhaps skewing people’s responses in a negative direction – a kind of response bias. 4.30 Types of bias a) A convenience sample, rather than a random sample, might lead to sampling bias. For example, if a researcher collected survey data on exercise at an expensive local gym, the sample would not likely reflect the general population of people who exercise.
Chapter 4: Gathering Data 81 4.30 (continued) b) Undercoverage occurs when the sampling frame (the list of subjects from which the sample is taken) does not include some part of the population. For example, a survey conducted by emailing people would omit from the sampling frame all people without email addresses. These people likely differ in important ways from those who complete the survey. c) Response bias might occur in a survey on drug use in a local high school because students might lie to give what they think the researchers view as an acceptable response. d) A survey about voting intentions in an upcoming election might suffer from nonresponse bias in that those who feel most strongly about the election might be more likely to respond, thus skewing the results.
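For students who prefer software to a random number table, the following Python sketch mimics the random-number approach described in Exercises 4.16–4.18 above; the population sizes match those exercises, and the seed is arbitrary.

import random

random.seed(2024)
# Exercise 4.16: select two of the 60 students in a class.
print("students:", random.sample(range(1, 61), k=2))
# Exercise 4.17: select ten of the 60 accounts; sample() never repeats a label.
print("accounts:", sorted(random.sample(range(1, 61), k=10)))
# Exercise 4.18: select ten names from a directory of 50,000.
print("directory entries:", sorted(random.sample(range(1, 50_001), k=10)))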
Section 4.3 Good and Poor Ways to Experiment 4.31 Smoking affects lung cancer? a) This is an experiment because subjects (students in your class) are randomly assigned to treatments (smoking a pack a day, or not ever smoking). Subjects do not choose whether or how much they smoke. b) One practical difficulty is that it is unethical to assign students to smoke, given that it could cause lung cancer. A second is that we would have no way to ensure that our subjects do as assigned and smoke or not smoke according to the assignment. A third is that we would not have results for fifty years – too long to wait for an answer. 4.32 Never leave home without duct tape a) The response variable was whether the wart was successfully removed, and the explanatory variable was type of treatment for removing warts. The experimental units were the 51 subjects between the ages of 3 and 22. The treatments were duct tape therapy and cryotherapy. b) You could label the 51 patients from 01 to 51. You could then pick two-digit random numbers until you had chosen 25 numbers between 01 and 51 and thus 25 patients to assign to one treatment. The rest would be assigned to the second treatment. You would disregard 00 and any number over 52, as well as duplicates. 4.33 More duct tape a) The response variable was whether the wart was successfully removed and the explanatory variable was the type of treatment for removing the wart (duct tape or the placebo). The experimental units were the 103 patients in the Netherlands. The treatments were duct tape therapy and the placebo. b) The difference between the number of patients whose warts were successfully removed using the duct tape method and those using the placebo was not large enough to attribute to the treatment type. In other words, the difference in the success rates could be attributed to random variation. 4.34 Vitamin B a) The response variable is whether or not the subject had a heart attack during the study period. b) The explanatory variable is the treatment type: placebo or one of three different doses of vitamin B. c) The experimental units were the subjects who had recently had a heart attack and were observed during the study period. d) The treatments were the placebo and each of the three different doses of vitamin B. e) The differences were not large enough to support that the observed effect was due to something other than ordinary random variation. 4.35 Facebook study a) This is an experiment because there was a control group and an experimental group and the researchers assigned the Facebook users to each group. b) The experimental units are Facebook users. c) The explanatory variable is manipulated versus not manipulated. The response variables are percentages of all words that were positive and percentage of all words that were negative. d) If the participants were informed, they would not have acted authentically. They were likely upset that they were not informed because that violates the requirement of informed consent.
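A possible computer version of the random assignment described in Exercise 4.32(b), labeling the 51 patients 1 to 51; this sketch is illustrative only, and a run with a different seed gives a different split.

import random

random.seed(7)
patients = list(range(1, 52))                         # labels 1-51
duct_tape = sorted(random.sample(patients, k=25))     # 25 assigned to duct tape therapy
cryotherapy = sorted(set(patients) - set(duct_tape))  # the remaining 26 get cryotherapy

print("duct tape  :", duct_tape)
print("cryotherapy:", cryotherapy)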
82 Statistics: The Art and Science of Learning from Data, 4th edition 4.36 Science faculty selection of grad students a) Is there an association (or relationship) between gender and employability in how science faculty selects lab managers? b) The explanatory variable is gender. The treatments are male or female. The response variables are competency, employability, and starting salary. The experimental units are faculty. c) If faculty always chose students of their gender, it would bias the results. 4.37 Pain reduction medication It is important to use a placebo so that the two treatment groups appear identical to the individuals in the study. It is also important to account for the placebo effect, people who take a placebo tend to respond better than those who take nothing at all. Without a placebo or a control comparison group, there is no way to separate the placebo effect from the actual effect of the medication. 4.38 Pain reduction medication, continued a) The second design is better for generalizing the results to the entire population. Under the first design, it is impossible to tell whether the results are due to the medication or gender. b) As long as the recruited individuals are representative of the population, the results can be generalized. 4.39 Pain reduction medication, yet again The researchers should be blinded to the treatment as well so that they don’t intentionally or unintentionally treat the subjects differently according to which treatment group they are in. 4.40 Colds and vitamin C a) (i) We would have two treatments: vitamin C and a placebo version of vitamin C. (ii) Subjects would be randomly assigned to the two treatments using random numbers. They would be asked to take a pill (either vitamin C or placebo) every day for an entire winter. The presence of vitamin C would be the explanatory variable and the presence of the common cold over the next winter would be the response variable. (iii) We could make the study double blind by not letting subjects know if the pill they were taking was vitamin C or a placebo. In addition, the people from the research team who contact participants to find out if they had had a cold would not know which treatment the subjects were in. b) People who regularly take vitamin C might also be more health conscious and have other healthy behaviors. They might exercise, get plenty of sleep, and eat nutritious foods that have all kinds of vitamins – not just vitamin C. We would not be able to isolate vitamin C from these other possible lurking variables. 4.41 Reducing high blood pressure a) We could design an experiment by recruiting volunteers with a history of blood pressure; these volunteers would be the experimental units. The volunteers could be randomly assigned to one of two treatments: the new drug or the current drug. In this experiment, the explanatory variable would be treatment type and the response variable would be blood pressure after the experimental period. b) To make the study double-blind, the two drugs would have to look identical so that neither the subjects nor the experimenters who have contact with subjects know what drug a particular subject is taking.
Section 4.4 Other Ways to Conduct Experimental and Nonexperimental Studies 4.42 Student loan debt This is a stratified sample because the sample sizes are fixed in each of the groups of interest, graduates of four-year public universities and graduates of four-year private universities. The two types of universities are called strata. Simple random samples of 100 graduates are then taken from each type of university. 4.43 Club officers again a) This sample is drawn by numbering the students from 1 to 5.Then, one-digit numbers are randomly picked, and the students selected are the first female student to have her number picked (1 to 3) and the first male student to have his number picked (4 or 5). Numbers beyond the range of 1 to 5 are ignored, as are duplicates. In addition, once a student has been chosen, we will ignore the numbers of other students of that gender. For example, if 2 is picked, we would then ignore 1 and 3, because the other student must be male. Copyright © 2017 Pearson Education, Inc.
4.43 (continued) b) This is not a simple random sample because every sample of size two does not have an equal chance of being selected. For example, samples of two women or two men are prohibited from being selected at all. Moreover, this means that each of the two men has a higher chance of being selected than does any of the three women. c) Since there are three females, each having an equally likely chance of being selected and one must be selected, the activity coordinator has a 1 in 3 chance of being selected. If the activity coordinator is male, he has a 1 in 2 chance of being selected since there are two males. 4.44 Security awareness training a) A stratified random sample can be used to obtain the desired random sample consisting of 0.25 × 20 = 5 employees from production, 0.40 × 20 = 8 employees from sales and marketing, and 0.35 × 20 = 7 employees from new product development. b) The answer to this problem is based on a random process. This leads to potentially different answers each time it is performed. First, label the employees from 001 to 400, where those numbered 001 to 100 work in production, those numbered 101 to 260 work in sales and marketing, and those numbered 261 to 400 work in new product development. We then select five numbers from 001 to 100, eight from 101 to 260, and seven from 261 to 400. As we select random three-digit numbers, we ignore those outside of our range, as well as duplicates. Additionally, once we have selected the required number from a stratum, we only choose employees from the remaining strata. (A short code sketch of this selection appears after Exercise 4.49 below.) 4.45 Teaching and learning model a) Label the 24 schools from 01 to 24. Randomly select schools until roughly 20% of the students (about 0.20 × 6057 ≈ 1212 students) are included in the sample. Note that how many schools are included in the sample will vary from one sample to the next since the schools have different numbers of students. b) Random samples will vary. c) Random samples will vary. d) No, it would not be possible to implement a stratified random sample in this case. The only possible strata are the schools themselves and, as mentioned previously, the new model must be implemented at the entire school; it cannot be implemented for a random sample of students selected from each school. 4.46 German mobile study a) A retrospective study is one in which subjects are asked to report on their past mobile phone use. b) Cases refer to subjects who had eye cancer, and controls refer to subjects who did not have eye cancer. c) (i) 16/118 = 0.14 (Proportion of subjects with eye cancer who used mobile phones.) (ii) 46/475 = 0.10 (Proportion of subjects who did not have eye cancer who used mobile phones.) 4.47 Smoking and lung cancer It is possible that people with lung cancer had a different diet than did those without. For example, these people might have eaten out at restaurants quite a bit, thus consuming more fat. The social aspect of eating out might also have made them more likely to smoke. However, it could have been the fat and not the smoking that caused lung cancer. 4.48 Smoking and death This study was a prospective study because subjects were followed for 20 years. They reported behaviors as they occurred, and outcomes were seen later. They were not reporting retrospectively on previous behaviors. 4.49 Baseball under a full moon a) Yes, in large databases it is possible to uncover many surprising trends. b) Yes, anticipated results stated in advance are generally more convincing than results that have already occurred.
c) In a prospective study, the researcher can gather the data he/she desires in the manner he/she desires. In a retrospective study, variables that the researcher is interested in may not be available and the data may not have been collected consistently or accurately. For these reasons, a prospective study will likely give more reliable results.
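The stratified selection outlined in Exercise 4.44(b) can be carried out in a few lines of Python; the sketch below uses the same labels (001–400) and stratum sample sizes (5, 8, 7), with an arbitrary seed.

import random

random.seed(11)
strata = {
    "production (001-100)": (range(1, 101), 5),
    "sales and marketing (101-260)": (range(101, 261), 8),
    "new product development (261-400)": (range(261, 401), 7),
}
# Draw a separate simple random sample, of the required size, within each stratum.
for name, (labels, k) in strata.items():
    print(name, "->", sorted(random.sample(list(labels), k=k)))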
84 Statistics: The Art and Science of Learning from Data, 4th edition 4.50 Two factors helpful? a) One factor is gender, and the other is type of diet. The response variable is amount of weight lost. b) If the study did a one-factor analysis of diet, it would have found a mean weight loss of six pounds for each diet, and would not have concluded that type of diet makes a difference. c) From the two-factor study, we can learn that the low-carb diet leads to the most weight loss for women, whereas the low fat diet leads to the most weight loss for men. We would not have observed that the type of diet makes a difference had we only looked at gender and not at type of diet. We would merely have seen that women and men each lost an average of six pounds. 4.51 Growth Mindset a) The explanatory variable is type of praise. The treatments are praised for effort versus praised for intelligence. The response variable is whether they chose a challenging task on the subsequent task. The experimental units are the study participants. b) This is a randomized experiment. 4.52 Allergy relief a) The blocks are the subjects themselves. This type of block design is called a matched design. b) If this study were double-blind, that would mean that neither the subjects nor the experimenters who have contact with the subjects knew whether the subjects were taking a low dose of the drug, high dose of the drug, or placebo on a given day. c) This study could incorporate randomization by randomly assigning subjects to a given order of treatments. For example, one subject might be randomly assigned to the low dose treatment first, then the placebo, then the high dose treatment. Another subject might be randomly assigned to start with placebo, then take the low dose, then the high dose, and so on. 4.53 Effect of partner smoking in smoking cessation study a) This is not a completely randomized design because the researchers are not randomly assigning subjects to living situation (living with another smoker versus not living with another smoker). b) The experiment has two blocks: those living with smokers and those not living with smokers. c) This is a randomized block design because randomization of units to treatments occurs within blocks.
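As a supplement to Exercise 4.52(c), this short Python sketch randomizes the order of the three treatments separately for each subject in the matched (block) design; the five subject IDs and the seed are hypothetical.

import random

random.seed(5)
treatments = ["placebo", "low dose", "high dose"]
for subject in range(1, 6):
    # A random permutation of the three treatments for this subject (block).
    order = random.sample(treatments, k=3)
    print(f"subject {subject}: " + " -> ".join(order))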
Chapter Problems: Practicing the Basics 4.54 Cell phones Since the outcome under study is brain cancer, it is more realistic to collect a random sample of subjects who have contracted brain cancer and then look at their past cell phone usage. An experiment would require the researcher to randomly assign subjects to the various treatments (varying amounts of cell phone use) and then observe these subjects long enough for a sufficient number to develop brain cancer. This is not practical and likely not even possible. 4.55 Observational versus experimental study In an observational study, we observe people in the groups they already are in. For example, we might compare cancer rates among smokers and nonsmokers. We do not assign people to smoke or not to smoke; we observe the outcomes of people who already smoke or do not smoke. In an experiment, we actually assign people to the groups of interest. Although it would be unethical, we could turn the above observational study into an experiment by assigning people either to smoke or not to smoke. We would not allow them to make this choice. (Of course, even if we ignored ethics and did this, our subjects might ignore our instructions!) The major weakness of an observational study is that we cannot control (such as by balancing through randomization) other possible factors that might influence the outcome variable. For example, in the smoking study, it could be that smokers also drink more, and drinking causes cancer. With the experiment, we randomly assign people to smoke or not to smoke; thus, we can assume that these groups are similar on a range of variables, including drinking. If the smokers still have higher cancer rates than the nonsmokers, we can assume it’s because of smoking, and not because of other associated variables such as drinking.
Chapter 4: Gathering Data 85 4.56 Unethical experimentation Examples will vary. We might be interested in whether combat experience leads to higher rates of anxiety disorders; however, we cannot randomly assign people to go off to war. We could, however, conduct an observational study comparing rates of anxiety disorders among combat veterans and among a similar sample of non-veterans. 4.57 Spinal fluid proteins and Alzheimer’s a) The explanatory variables are whether the individual has the two types of proteins in their spinal fluid and the state of the individual’s memory at the time the spinal fluid was analyzed: normal memory, memory problems or Alzheimer’s disease. The response variable is whether or not the individual developed Alzheimer’s within the next five years. b) This is a non-experimental study because individuals are not assigned to a specific memory classification. c) It would not be practically possible to design this study as an experiment because one cannot assign individuals to a specific memory category. 4.58 Fear of asbestos The friend should give more weight to the study than to the story, which is just anecdotal evidence. Something can be true on average, and yet there can still be exceptions, such as the teacher the friend knows about. The story of one person is anecdotal and not as strong evidence as a carefully conducted study with a much larger sample size. 4.59 NCAA men’s basketball poll a)
The approximate margin of error is (1/√n) × 100%.
Indiana: (1/√3300) × 100% ≈ 1.7%
Wisconsin: (1/√5600) × 100% ≈ 1.3%
b) The percentages varied so drastically because of home team support. The residents of Indiana and Wisconsin wanted their teams to win and were willing to overlook their chances of winning in predicting the outcome. c) One type of potential bias is from undercoverage, the only possible respondents are visitors to the website espn.com, not all residents of the state. Another is from this being a volunteer sample, visitors to the site chose whether or not to respond to the website’s survey. 4.60 Sampling your fellow students a) This would be an example of bias because some parts of the population are favored over others. Moreover, restricting our range in this way would not allow us to determine an association between these variables. b) There are different ways to select a sample that would yield useful information. Here are two. We could select a simple random sample. We could get a list of all students at the school, number them (e.g., 00001 to 10,000 for a school of ten thousand students), then select 20 random five-digit numbers in this range. Alternately, if we wanted to be sure you had equal numbers of first year students through seniors, you could stratify your sample, dividing the students into these categories, and selecting 5 students from each year. 4.61 Beware of Internet polling Respondents to such an internet poll might be those who feel most strongly about this topic. For example, gun owners who are concerned about restrictions, such as those who belong to a pro-gun association such as the National Rifle Association, might have felt more compelled to respond than non-gun owners. The lurking variable might be gun ownership. Members of such an association might be organized to cast votes with greater probability than others. 4.62 Comparing female and male students a) The students would be numbered from 0001 to 3500. Then we would choose random four-digit numbers, ignoring those outside the range of 0001 to 3500, and ignoring duplicates. We would select the first three students whose numbers matched these criteria. b) No; every sample is not equally likely; any possible sample with more than 40 males or fewer than 40 males has probability 0 of being chosen.
86 Statistics: The Art and Science of Learning from Data, 4th edition 4.62 (continued) c) This would be a stratified random sample. It offers the advantage of having the same numbers of men and women in the study, which would be unlikely if the population had a small proportion of one of these, and this is useful for making comparisons. 4.63 Football discipline a) Because this is a volunteer sample, there is the potential for sampling bias, both because the sample is not randomly selected (those who responded might have been those who felt the most strongly) and because of undercoverage (anyone without Internet access would not have been able to participate). There also is potential for response bias because the statements are leading. b) If the sample is biased due to undercoverage and lack of random sampling, it doesn’t matter how big the sample. It’s almost always better to have a small random sample than a large volunteer sample. 4.64 Obesity in metro areas a) No, we are not able to conclude that obesity causes a higher incidence of these conditions because this is an observational study, not an experiment. b) Answers will vary. Some possibilities are education level and income. 4.65 Voluntary sports polls a) No, it was done by voluntary response. b) They could send out a survey to all sports fans (perhaps using a cluster sampling method by choosing to sample from each sports stadium). 4.66 Video games mindless? a) The explanatory variable is history of playing video games, and the response variable is visual skills. b) This was an observational study because the men were not randomly assigned to treatment (played video games versus hadn’t played); those who already were in these groups were observed. c) One possible lurking variable is reaction time. Excellent reaction times might make it easier, and therefore more fun, to play video games, leading young men to be more likely to play. Excellent reaction times also might lead young men to perform better on tasks measuring visual skills. These young men might have performed well on tasks measuring visual skills regardless of whether they played video games. 4.67 Physicians’ health study a) This was (i) an experiment, and it was (ii) prospective. b) The response variable was presence/absence of myocardial infarction and the explanatory variable was treatment group (aspirin or placebo). 4.68 Aspirin prevents heart attacks? a) The response variable was whether they had a heart attack; the explanatory variable was treatment group (aspirin or placebo). b) This is an experiment because physicians were randomly assigned to treatment – either aspirin or placebo. c) Because the experiment is randomized, we can assume that the groups are fairly balanced with respect to exercise. Each group would have some physicians with low exercise and some with high. On average, they’d be similar. 4.69 Exercise and heart attacks a) 72.5% of physicians in the aspirin group exercised vigorously, and 72.0% of physicians in the placebo group exercised vigorously. These percentages are very similar. b) The percentages of physicians in the two groups who exercised vigorously are very similar. It does seem that the randomization process did a good job in achieving balanced treatment groups in terms of exercise. Heart attack response between the two groups should not be systematically influenced in one direction or the other due to exercise. 
4.70 Smoking and heart attacks a) Among those who were in the aspirin group, 49.3% never smoked, 39.7% smoked in the past, and 11.0% are current smokers. Among those in the placebo group, 49.8% never smoked, 39.1% smoked in the past, and 11.1% currently smoke. These proportions are very similar. Copyright © 2017 Pearson Education, Inc.
4.70 (continued) b) It does seem that the randomization process did a good job in achieving balanced treatment groups in terms of smoking status. Because there are similar proportions of physicians in both groups who report that they have never smoked, used to smoke, or currently smoke, this variable is not likely to be responsible for any differences in heart attack rates. The heart attack response between the two groups should not be systematically influenced in one direction or the other due to smoking status of the physicians. 4.71 Aspirin, beta-carotene, and heart attacks The two factors (Factor 1: aspirin, yes/no; Factor 2: beta-carotene, yes/no) cross to give four treatments:
Treatment 1: aspirin and beta-carotene
Treatment 2: no aspirin (placebo) and beta-carotene
Treatment 3: aspirin and no beta-carotene (placebo)
Treatment 4: no aspirin (placebo) and no beta-carotene (placebo)
4.72 Bupropion and nicotine patch study results a)
The approximate margin of error for each treatment is (1/√n) × 100%.
Nicotine patch only: (1/√244) × 100% ≈ 6.4 percentage points
Bupropion only: (1/√244) × 100% ≈ 6.4 percentage points
Nicotine patch with bupropion: (1/√245) × 100% ≈ 6.4 percentage points
Placebo only: (1/√160) × 100% ≈ 7.9 percentage points
It is believable that the true abstinence percentage falls anywhere within the range indicated by the margin of error. For example, the range for the nicotine patch only is 16.4% – 6.4% = 10% to 16.4% + 6.4% = 22.8%. These are all believable values for the abstinence percentage of those using the nicotine patch only. b) Yes, it does seem as if the treatments bupropion only and placebo only are different. The margin of error for bupropion only indicates that the low end of believable values is 30.3% – 6.4% = 23.9%, whereas the margin of error for placebo only indicates that the high end of believable values is 15.6% + 7.9% = 23.5%. Because there's no overlap, we can conclude that it's likely that these two percentages are significantly different from one another. c) No, it does not seem as if the treatments bupropion only and nicotine patch with bupropion are significantly different. There is substantial overlap between the ranges indicated by the margins of error. The range for bupropion only extends from 23.9% to 36.7%, and the range for nicotine patch with bupropion extends from 29.1% to 41.9%. Because 29.1% through 36.7% are believable values for both treatments, we cannot conclude that there are different abstinence percentages for these two groups. d) Using the results of (a) – (c), the results of the study suggest that two of the treatments, bupropion only and nicotine patch with bupropion, led to higher abstinence percentages than did either of the other two treatments, nicotine patch only or placebo only. However, there was not a statistically significant difference between bupropion only and nicotine patch with bupropion. One possible recommendation may be to use bupropion only as an aid for quitting smoking.
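The margin-of-error arithmetic in Exercise 4.72 can be reproduced with the rough rule margin ≈ (1/√n) × 100%. In the Python sketch below, the abstinence percentages 16.4, 30.3, and 15.6 are quoted in the solution above; the 35.5 for nicotine patch with bupropion is inferred as the midpoint of the quoted 29.1% to 41.9% range.

from math import sqrt

groups = {                      # (abstinence %, sample size)
    "nicotine patch only": (16.4, 244),
    "bupropion only": (30.3, 244),
    "nicotine patch with bupropion": (35.5, 245),
    "placebo only": (15.6, 160),
}
for name, (pct, n) in groups.items():
    moe = 100 / sqrt(n)         # approximate margin of error, in percentage points
    print(f"{name:31s}{pct:5.1f}% +/- {moe:.1f}  ->  {pct - moe:.1f}% to {pct + moe:.1f}%")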
88 Statistics: The Art and Science of Learning from Data, 4th edition 4.73 Prefer Coke or Pepsi? a) (i) If we wanted to use a completely randomized design, one could randomly assign subjects to one of the two treatments, Coke or Pepsi, and have them rate it on a scale such as 0 to 10 with higher numbers being a better rating. Neither the subject nor the experimenter would know which cola the subject was drinking. (For example, an experimenter with no contact with the subjects could put both types of colas into identical cups; a code could be used to later identify the type of cola.) (ii) If one wanted to use a matched-pairs design, we could have all subjects participate in both treatments. We would randomly assign each subject to drink either Coke or Pepsi first, then to drink the other one, and then to indicate which they prefer. As in the completely randomized design, neither the subject nor the experimenter administering the cola would know which type of cola the subject was drinking. b) There are advantages to each. The completely randomized design eliminates the possibility that drinking one cola would alter one’s preference for the second cola. The matched-pairs design decreases the possible effects of lurking variables because the two groups are made up of the same people. 4.74 Comparing gas brands a) The response variable is the gas mileage. The explanatory variable is the brand of gas. Its treatments are Brand A (the name brand) and Brand B (the independent brand). b) In a completely randomized design, 10 cars would be randomly assigned to Brand A and 10 cars to Brand B. c) In a matched-pairs design, each car would be a block. It would first use gas from one brand, and then from the other. d) A matched-pairs design would reduce the effects of possible lurking variables because the two groups would be identical. With a completely randomized design, it is possible that, just by chance, one group of cars gets better gas mileage to begin with than does the other group. 4.75 Samples not equally likely in a cluster sample? With a simple random sample, every possible sample of a given size has an equal chance of being selected. With a cluster random sample, even with equally-sized clusters, there are many samples that have no chance of being selected. As just one example, a sample that has one subject from each cluster would never be selected in a cluster random sample because cluster random sampling, by definition, includes all the subjects in the chosen clusters. For example, the cluster sample will not contain subjects from every cluster. 4.76 Nursing homes a) The nursing homes are clusters. b) The sample is not a simple random sample because each possible sample does not have an equal chance. For instance, there is no chance of a sample in which just one person from a particular nursing home is sampled. 4.77 Multistage health survey The first stage is the division of the U.S. into four regions – these are strata, and so the researcher is using stratification to ensure she has the same number of subjects from each region. She then takes a simple random sample of ten schools in each region. At this level, she is using cluster sampling, because she is identifying all possible clusters (schools), and randomly selecting a given number of them in each region. She then randomly samples three classrooms in each school. This also is cluster sampling because she is identifying all possible clusters (classrooms this time), and randomly selecting a given number of them in each selected school. 
Finally, she interviews all students in those classrooms. 4.78 Hazing This is cluster random sampling. The colleges are the clusters. 4.79 Marijuana and schizophrenia It seems that there are lurking variables that are responsible for at least some of the association. For example, it is possible that individuals who are genetically susceptible to schizophrenia are also predisposed to liking marijuana. Education level and socioeconomic status are also possible lurking variables.
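To make the contrast with simple random sampling in Exercises 4.75–4.78 concrete, here is a small Python sketch of one-stage cluster sampling with made-up nursing-home data: whole clusters are drawn, and everyone in a chosen cluster is included.

import random

random.seed(3)
# Ten hypothetical nursing homes (clusters), five residents each.
clusters = {f"home {i}": [f"resident {i}-{j}" for j in range(1, 6)] for i in range(1, 11)}

chosen = random.sample(sorted(clusters), k=3)          # sample whole clusters, not individuals
sample = [person for home in chosen for person in clusters[home]]
print("chosen clusters:", chosen)
print("sample size:", len(sample))                     # everyone in the chosen clusters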
Chapter 4: Gathering Data 89 4.80 Twins and breast cancer a) This was a retrospective study because it recruited patients based on breast cancer status now, and then asked questions about the past (e.g., age at puberty, age at breast cancer diagnosis). b) A randomized experiment would be unethical.
Chapter Problems: Concepts and Investigations 4.81 Cell phone use The answers to these questions will be different for each student, depending on the study that each student locates. 4.82 Read a medical journal The answers to these questions will be different for each student, depending on the study that each student locates. 4.83 Internet poll Regardless of the study found, the results should not be trusted due to the volunteer nature of the sample. 4.84 Search for an observational study The answers to these questions will be different for each student, depending on the study that each student locates. 4.85 Search for an experimental study The answers to these questions will be different for each student, depending on the study that each student locates. 4.86 Judging sampling design a) This is a volunteer survey, and so likely suffers from sampling bias. It is quite possible, for example, that those with the most need for these social programs have the least access to this Internet poll. This type of sampling bias is called undercoverage because some groups in the population are not in the sampling frame. b) Again, this is a volunteer sample. This is not a random sample. Those who are writing to the congresswomen are probably those who have the strongest opinions about this issue. c) This is a biased sample. Physical and social science majors who have chosen to take a course in Comparative Human Sexuality quite possibly already are more similar in terms of sexual attitudes than are physical and social science majors who have not chosen to take this course. d) This study suffers from nonresponse bias. It is quite possible that the very large percentage of people who did not respond are different in some important way from the smaller proportion who did respond. 4.87 More poor sampling designs a) The principal is attempting to use cluster sampling by listing all of her clusters (first-period classes), and taking a random sample of clusters. However, her sample includes only one cluster. She would need to choose several clusters in order to have something resembling a representative sample. b) Values might be higher than usual because the days sampled are at the start of the weekend. Sampling just Fridays is an example of sampling bias. She should take a simple random sample of all days in the past year. 4.88 Age for legal alcohol a) “Do you think it should be legal for people to drink at age 18 given that they can get married, go to war, drive a car, and buy cigarettes by this age?” This is biased because it pushes respondents to say yes. b) “Do you think that it should be legal or illegal for people to drink at age 18?” 4.89 Quota sampling This is not a random sampling method. People who approach the street corner are interviewed as they arrive (and as they agree to the interview!). Although researchers strive to obtain data from people from a number of backgrounds, the people within these backgrounds (e.g., Hispanic) who are surveyed on the street corner may not be representative of the general population of that kind of person. Although the quota leads to a diversity of people being surveyed, the choice of a given street corner likely constitutes sampling bias. Copyright © 2017 Pearson Education, Inc.
90 Statistics: The Art and Science of Learning from Data, 4th edition 4.90 Smoking and heart attacks a) This is an observational study. No subject was assigned to a treatment. b) There could have been numerous lurking variables that might explain this association. For example, perhaps the first six months included the summer and perhaps the population size in that part of Montana is much higher in the summer. 4.91 Issues in clinical trials a) Randomization is necessary because subjects would choose the treatment in which they have the most faith. Such a study would be a measure of how well a treatment works if patients believe in it, rather than how much a treatment works independent of subjects’ beliefs about its efficacy. b) Patients might be reluctant to be randomly assigned to one of the treatments because they might perceive it as inferior to another treatment. In this case, patients might perceive (even in the absence of the data that this study is trying to collect) that the new treatment will be an improvement, and might be reluctant to participate in the study without the guarantee that they can get that treatment. c) If the researcher thinks that the new treatment is better than the current standard, he or she might be reluctant to proceed because he or she might feel that all patients should get the new treatment, and not just those randomly assigned to it. 4.92 Compare smokers with nonsmokers? This study has equal numbers of people with and without lung cancer. The rates of lung cancer are not nearly so high in the general population; these samples were chosen specifically because of their lung cancer status. Thus, in this sample, we will find far more smokers and nonsmokers with lung cancer than in the general population, because we have explicitly chosen a sample with such a high rate of lung cancer. 4.93 Is a vaccine effective? Because the disease is so rare, it’s very unlikely that the 200 people randomly chosen to be in this study would have the disease, whether or not they get the vaccine. It would be more practical to find a certain number of people who already have the rare disease (the cases). We would compare the proportion of these people who had received the vaccine to the proportion in a group of controls who did not have the disease. 4.94 Distinguish helping and hindering among infants a) If the videos are always shown in the same order, it’s possible that the infant grows bored before the second video and simply chooses the toy he/she recognizes from the first video. b) Randomizing the order in which the videos are shown will minimize this type of response bias lending more credibility to the results. 4.95 Distinguish helping and hindering among infants, continued Answers will vary. Number the infants 01 to 16. Using the table of random digits, select distinct pairs of two digit numbers between 01 and 16 until eight have been chosen. To use a random number generator, select the minimum value as 1, the maximum value as 16, and the number to select to eight. These eight infants are assigned to watch the video with the helpful figure first. The remaining eight infants are assigned to watch the video with the hindering figure first. 4.96 Distinguish helping and hindering among infants, continued Answers will vary but are most likely to be either 0 or 1. The results should provide convincing evidence that infants actually tend to exhibit a preference. 4.97 Multiple choice: What’s a simple random sample? The best answer is (b). 
4.98 Multiple choice: Be skeptical of medical studies? The best answer is (b). 4.99 Multiple choice: Opinion and question wording The best answer is (a). 4.100 Multiple choice: Campaign funding The best answer is (b). 4.101 Multiple choice: Emotional health survey The best answer is (d). Copyright © 2017 Pearson Education, Inc.
4.102 Multiple choice: Sexual harassment The best answer is (a). 4.103 Multiple choice: Effect of response categories The best answer is (b). ♦♦4.104 Systematic sampling a) Although every subject is equally likely to be chosen – at least before the first subject is chosen – every possible sample of 100 is not equally likely. We would never, for example, have a sample that included subjects whose names were next to each other on the population list. b) The company would determine the first item using a randomly selected two-digit number between 01 and 50; the item that coincided with that number would be checked, and then every 50th item thereafter also would be checked. ♦♦4.105 Complex multistage GSS sample (Note: Because there are many aspects of this study, and sampling changes over the years of the study, there are several possible responses for each part of this exercise.) a) In some of the research, clustering was used. For example, the researchers randomly selected among all possible Standard Metropolitan Statistical Areas (SMSAs) or non-metropolitan counties. b) Much of the research used stratification. For example, within the clusters mentioned above, the researchers stratified participants based on region, age, and race. c) Simple random sampling also was used at times. The GSS website notes: “The full-probability GSS samples used since 1975 are designed to give each household an equal probability of inclusion in the sample.” Simple random sampling means that each subject (in this case, household) has an equal chance of being in the study. ♦♦4.106 Mean family size Consider the hint. The mean population family size is 6. If we choose families, the possible values are 2 and 10, each equally likely, and the sampling is not biased. If we choose individuals, the value 2 has probability 2/12 and the value 10 has probability 10/12. We are likely to overestimate the mean size. ♦♦4.107 Capture–recapture a) M is the 50 deer who were tagged initially, n is the 125 deer who were captured several weeks later, and R is 12, the number of tagged deer who showed up in the second sample. b) If one assumes that the sample proportion of tagged deer equals the population proportion of tagged deer, one can easily calculate the estimated population size of deer. We know the proportion of tagged deer in the sample is the number of tagged deer in the second sample (R), divided by the total number of deer in the second sample (n). We also know the total number of tagged deer in the population (M). That leaves only one variable to solve for, the estimated number of deer in the population (N): R/n = M/N, that is, 12/125 = 50/N. c) Solving gives N = nM/R = (125 × 50)/12 ≈ 520.8, or about 521 deer. d) Set up a two-way table in which the columns indicate whether a person was in the first sample (the census itself): Yes (returned form) or No (did not return form), and the rows indicate whether the person was in the second sample (the PES): Yes or No. The count in the cell for people in both samples is R, the row total for those in the second sample is n, the column total for those in the first sample is M, and the grand total is N.
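The capture–recapture estimate in Exercise 4.107(c) is a one-line calculation; the helper function below is just a convenience for experimenting with other counts and is not part of the original solution.

def capture_recapture(M, n, R):
    """Estimate the population size N from M tagged animals, a second sample of
    size n, and R tagged recaptures, using R/n = M/N, so N = n*M/R."""
    return n * M / R

print(capture_recapture(M=50, n=125, R=12))   # 520.83..., about 521 deer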
Chapter Problems: Student Activities 4.108 Munchie capture–recapture If the estimate is not close, one factor that could be responsible is the fairly small sample size. This could be a problem in real-life applications as well, particularly if the animal species of interest were endangered. 4.109 Activity: Sampling the states Answers will vary.
Chapter 5: Probability in Our Daily Lives
Section 5.1 How Probability Quantifies Randomness 5.1 Probability The long run relative frequency definition of probability refers to the probability of a particular outcome as the proportion of times that the outcome would occur in a long run of observations. 5.2 Testing a coin a) With a relatively short run, such as 10 flips of a coin, the cumulative proportion of heads can fluctuate a lot. b) We’d have to flip the coin many, many times. In the long run, the cumulative proportion approaches the actual probability of an outcome. 5.3 Vegetarianism No. In the short run, the proportion of a given outcome can fluctuate a lot. Only in the long run does a given proportion approach the actual probability of an outcome. 5.4 Airline accident deaths a) This would be considered a “long run” of trials. A long run refers to many, many trials. 825 million passengers per year certainly would qualify as many trials. b) The probability of dying on a particular flight is 265 / 825,000,000 0.00000032, or about 1 in 3 million. c) We generally assume that crashes are independent of each other. Therefore, the chance of dying stays constant throughout the entire year. 5.5 World Cup 2014 a) 35.9% is the subjective probability that Brazil will win the World Cup. It is not based on a frequency of events. b) Any event with nonzero probability could occur. Even Algeria could have won the World Cup. 5.6 Random digits (b) is not correct because in the short run, probabilities of each digit being generated can fluctuate a lot. 5.7 Polls and sample size You should disagree. With a biased sampling design, having a large sample does not remove problems from the sample not being selected to represent the entire population. 5.8 Heart transplant In the absence of pre-existing data, Dr. Barnard relied on the subjective definition of probability. He used his own judgment rather than objective information such as data. 5.9 Nuclear war We would be relying on our own judgment rather than objective information such as data, and so would be relying on the subjective definition of probability. 5.10 Simulate coin flips a) The ten outcomes will be different each time this exercise is completed. The outcomes will likely show a good deal of variation. b) The ten outcomes will be different each time this exercise is completed. c) The sample proportions will tend to vary less than the proportions based on n = 10 or n = 100. d) As the number of trials increases from 1 to 10 to 1000, the variability of the proportion decreases. The law of large numbers indicates that as the number of independent trials increases, the cumulative proportion approaches the actual probability of a given outcome. Each of the two outcomes will occur closer to 0.50 of the time as the number of trials increases. 5.11 Unannounced pop quiz a) The results will be different each time this exercise is conducted. b) We would expect to get about 50 questions correct simply by guessing. c) The results will depend on your answer to (a). d) 42% of the answers were “true.” We would expect this percentage to be 50%. They are not necessarily identical, because observed percentages of a given outcome can fluctuate in the short run. Copyright © 2017 Pearson Education, Inc.
94 Statistics: The Art and Science of Learning from Data, 4th edition 5.11 (continued) e) There are some groups of answers that appear nonrandom. For example, there are strings of five “trues” and eight “falses”, but this can happen by random variation. Typically, the longest strings of trues or falses that students will have will be much shorter than these. 5.12 Stock market randomness a) The students will probably see runs of consecutive Hs or Ts. b) It will probably not be unusual to see a run of about 5 Hs in a row. c) If you are a serious investor, you should not get too excited if you see a sequence of increases in the stock market over several days. This would not be a surprising outcome if the market’s direction (rising or falling) were randomly generated.
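The behavior described in Exercises 5.10–5.12 is easy to reproduce with a short simulation. The sketch below is illustrative only (it is not part of the printed solutions; the function name and seed are arbitrary choices): it flips a fair coin and tracks the cumulative proportion of heads, which fluctuates for small n but settles near 0.50 as n grows.

```python
import random

def cumulative_proportion_heads(n_flips, seed=None):
    """Flip a fair coin n_flips times and return the running proportions of heads."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as one head
        proportions.append(heads / i)
    return proportions

# The short run fluctuates a lot; the long run settles near 0.50 (law of large numbers).
for n in (10, 100, 10000):
    print(n, round(cumulative_proportion_heads(n, seed=1)[-1], 3))
```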
Section 5.2 Finding Probabilities
5.13 Student union poll a) There are 4 × 3 = 12 possible responses: great/in favor, great/opposed, great/no opinion, good/in favor, good/opposed, good/no opinion, fair/in favor, fair/opposed, fair/no opinion, poor/in favor, poor/opposed, poor/no opinion. b) [Tree diagram: a first branch for the rating (great, good, fair, poor), and from each rating a second branch for the opinion (in favor, opposed, no opinion), giving the 12 outcomes listed in (a).]
5.15 Pop quiz a) [Tree diagram: each of the four questions branches into correct (C) or incorrect (I), producing 2 × 2 × 2 × 2 = 16 possible sequences.] b) There are 2 × 2 × 2 × 2 = 16 possible outcomes; therefore, the probability of each possible individual outcome is 1/16 = 0.0625. c) Looking at the tree diagram, there are five outcomes in which the student would pass: CCCC, CCCI, CCIC, CICC, and ICCC. The probability of each of these outcomes is 0.0625. Thus, the probability that the student would pass is 0.0625 + 0.0625 + 0.0625 + 0.0625 + 0.0625 = 0.3125 (rounds to 0.313).
5.16 More true-false questions a) For each question, you could give one of two answers: “true” or “false,” and there are 10 questions; thus, the number of possible outcomes is 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2^10 = 1024. b) The complement of the event of getting at least one of the questions wrong is the event of getting none of the questions wrong. c) Only one of these outcomes is entirely correct; therefore, the probability of getting them entirely correct with random guessing is 1/1024 = 0.00098. The probability of getting at least one wrong is 1.0 – 0.00098 = 0.999.
5.17 Rolling two dice You should disagree. Not all events from the sample space are equally likely. Specifically, there are six ways of getting a 7 (1 on the first die and 6 on the second, 2 and 5, etc.) whereas there is only one way of getting a 2 (1 on both dice).
5.18 Two girls a) The sample space includes FF, FM, MF, and MM. b) One of these four possibilities is FF; thus the probability that the family has two girls is 1/4 = 0.25. c) If the chance of a girl is 0.49, then the probability that the family will have two girls is: P(A and B) = P(A) × P(B) = 0.49 × 0.49 = 0.24.
5.19 Three children a) The sample space for the possible genders of three children is BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG. b) There are 8 possible outcomes, each equally likely, so the probability of any particular outcome is 1/8.
96 Statistics: The Art and Science of Learning from Data, 4th edition 5.19 (continued) c) There are 3 outcomes with two girls and one boy so the probability is 3/8 = 0.375. d) There are only 2 outcomes, BBB and GGG, that do not have children of both genders so that the probability is 1– 2/8 = 6/8 = 3/4. 5.20 Wrong sample space Outcomes are not equally likely. This logic assumes that it is equally likely to have each of these outcomes. In fact, it is more likely to have 1, 2, or 3 girls than it is to have 0 or 4 girls. For example, there is only one possibility of all four children being girls, GGGG, whereas there are four possibilities of having 1 girl: GBBB, BGBB, BBGB, BBBG. 5.21 Insurance
20 heads has probability (1/2)^20, which is 1/1,048,576 ≈ 0.000001. The risk of a one in a million death is 1/1,000,000 = 0.000001.
5.22 Cell phone and case It is not possible to calculate this probability based on the information given because the events “purchasing a new cell phone” and “purchasing a cell phone protective case” are not independent.
5.23 Seat belt use and auto accidents a) The sample space of possible outcomes is YS; YD; NS; and ND. b) P(D) = 2111/577,006 = 0.004; P(N) = 164,128/577,006 = 0.284 c) The probability that an individual did not wear a seat belt and died is 1601/577,006 = 0.003. This is the probability that an individual will fall in both of these groups – those who did not wear seat belts and those who died. d) If the events N and D were independent, the answer would have been P(N and D) = P(N) × P(D) = (0.284)(0.004) = 0.001. In the context of these data, this means that more people died than one would expect if these two events were independent, since 0.001 is not equal to 0.003. This indicates that the chance of death depends on seat belt use.
5.24 Protecting the environment
                   GRNPRICE
GRNGROUP     Yes      No     Not Sure    Total
Yes          293      71        66        430
No          2211    1184      1386       4781
Total       2504    1255      1452       5211
a) (i) P(GRNGROUP) = 430/5211 = 0.083 (ii) P(GRNPRICE) = 2504/5211 = 0.481 b) P(GRNGROUP and GRNPRICE) = 293/5211 = 0.056 c) If the variables were independent, P(GRNGROUP and GRNPRICE) = P(GRNGROUP) × P(GRNPRICE) = (430/5211) × (2504/5211) = 0.040. This is smaller than the actual probability computed in (b). If a respondent is a member of an environmental group, it is more likely that they are also willing to pay higher prices to protect the environment. d) (i) P(GRNGROUP or GRNPRICE) = (293 + 71 + 66 + 2211)/5211 = 2641/5211 = 0.507 (ii) P(GRNGROUP or GRNPRICE) = P(GRNGROUP) + P(GRNPRICE) – P(GRNGROUP and GRNPRICE) = 430/5211 + 2504/5211 – 293/5211 = 2641/5211 = 0.507
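The independence check used in Exercises 5.23 and 5.24 compares the observed joint probability with the product of the marginal probabilities. A minimal illustrative sketch in Python, using the counts from the Exercise 5.24 table (variable names are ours, not the text's):

```python
# Counts from the GRNGROUP x GRNPRICE table in Exercise 5.24.
group_yes_price_yes = 293
group_yes = 430
price_yes = 2504
total = 5211

p_group = group_yes / total
p_price = price_yes / total
p_joint = group_yes_price_yes / total

# If the events were independent, the joint probability would equal the
# product of the marginal probabilities.
print(f"Observed P(GRNGROUP and GRNPRICE): {p_joint:.3f}")
print(f"P(GRNGROUP) * P(GRNPRICE) under independence: {p_group * p_price:.3f}")
```

Since the observed joint probability (about 0.056) exceeds the product (about 0.040), the two responses are not independent, matching the conclusion in part (c).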
5.25 Global warming and trees a) [Tree diagram: the first branch is whether the person believes global warming is happening (Yes, No); from each of those, the second branch is future fuel use (Less, About the same, More).] b) If A and B were independent events, P(A and B) = P(A) × P(B). Since P(A and B) > P(A) × P(B), A and B are not independent. Thus, whether or not a person plans to use less fuel in the future depends on whether they believe that global warming is happening. The probability of responding “yes” on global warming and “less” on future fuel use is higher than what is predicted by independence.
5.26 Newspaper sales a)
                  Weekday
Weekend      Yes      No     Total
Yes          0.25    0.20    0.45
No           0.05    0.50    0.55
Total        0.30    0.70    1.00
b) P(W) = 0.25 + 0.05 = 0.30; P(S) = 0.25 + 0.20 = 0.45 c) The event “W and S” means that they bought the paper on the weekday and weekend. P(W and S) = 0.25. d) If W and S were independent, we’d expect that P(W and S) = P(W) × P(S) = (0.30)(0.45) = 0.135, not 0.25. These are not the same, so the events are dependent. It makes sense that customers who buy on the weekend are more likely to buy on the weekday, because you already know that they like reading the newspaper.
5.27 Arts and crafts sales a) There are eight possible outcomes as seen in the tree diagram. [Tree diagram: each of the three customers either buys (Y) or does not buy (N), giving the outcomes YYY, YYN, YNY, YNN, NYY, NYN, NNY, NNN.] b) The probability of at least one sale to three customers is 0.488. Of the 8 outcomes, only one includes no sale to any customer. The probability that this would occur = (0.8)(0.8)(0.8) = 0.512. Thus, 1 – 0.512 = 0.488, the probability that at least one customer would buy. c) The calculations in (b) assumed that each event was independent. That outcome would be unrealistic if the customers are friends or members of the same family and encourage each other to buy or not to buy.
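The complement calculation in Exercise 5.27(b) — the same pattern reused later for holes in one and coincidences — can be packaged in a one-line function. A small illustrative sketch (the function name is ours):

```python
def prob_at_least_one(p_single, n_trials):
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** n_trials

# Exercise 5.27: each of three independent customers buys with probability 0.2.
print(round(prob_at_least_one(0.2, 3), 3))  # 0.488
```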
Section 5.3 Conditional Probability
5.28 Recidivism rates Using R for reincarcerated, B for blacks, and W for whites, the conditional probabilities are: P(R | B) = 0.81; P(R | W) = 0.73.
5.29 Spam a) P(B | S) b) P(B | S^c) c) P(S^c | B^c) d) P(S^c | B)
5.30 Audit and low income a) P(Audited | Income < $200,000) = P(Audited and Income < $200,000)/P(Income < $200,000) = 0.0085/(0.0085 + 0.9556) = 0.0088 b) P(Income < $200,000 | Audited) = P(Income < $200,000 and Audited)/P(Audited) = 0.0085/(0.0085 + 0.0009 + 0.0003) = 0.8763
5.31 Religious affiliation a) The probability that a randomly selected individual is identified as Christian is (57,199 + 36,148 + 16,834 + 11,366 + 51,855)/228,182 = 0.7599. b) P(Catholic | Christian) = P(Catholic and Christian)/P(Christian) = (57,199/228,182)/0.7599 = 0.3299 c) P(No Religion | Answered) = P(No Religion and Answered)/P(Answered) = (34,169/228,182)/[(228,182 – 11,815)/228,182] = 0.1497/0.9482 = 0.1579
5.32 Cancer deaths a) There are three possible events among deaths that are due to cancer. If cancer is C, tobacco is T, diet is D, and other causes is O, then the three possible events among cancer deaths are CT, CD, and CO. The latter three probabilities (of the deaths that are due to cancer, 30% are attributable to tobacco, 40% to diet, and 30% to other causes) are conditional probabilities. b) P(C and T) = P(T | C) × P(C) = (0.30)(0.25) = 0.075
5.33 Revisiting seat belts and auto accidents a) P(D) = 2111/577,006 = 0.004 b) P(D | wore seat belt) = 510/412,878 = 0.001; P(D | didn’t wear seat belt) = 1601/164,128 = 0.010 c) Neither P(D | wore seat belt) nor P(D | didn’t wear seat belt) equals P(D); specifically, 0.001 and 0.010 are different from 0.004. Thus, the events are not independent.
5.34 Go Celtics! a)
Free Throw Success       2nd free throw made    2nd free throw missed    Total
1st free throw made              251                      34              285
1st free throw missed             48                       5               53
Total                            299                      39              338
b) (i) P(made first) = 285/338 = 0.84 (ii) P(made second) = 299/338 = 0.88 c) P(made second | made first) = 251/285 = 0.88; it seems as if his success on the second shot depends hardly at all on whether he made the first. P(made second | made first) ≈ P(made second)
5.35 Identifying spam a)
                 Identified as Spam by ASG
Spam         Yes       No
Yes         7005      835
No            48
b) 7005/(7005 + 835) = 0.8935 c) 7005/(7005 + 48) = 0.9932
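Conditional probabilities such as those in Exercises 5.33–5.35 are just ratios of counts. A brief illustrative sketch using the free-throw counts from Exercise 5.34 (variable names are ours):

```python
# Counts from Exercise 5.34 (Celtics free throws).
made_first_and_second = 251
made_first = 285
made_second = 299
total_pairs = 338

p_second = made_second / total_pairs
p_second_given_first = made_first_and_second / made_first

# The two proportions are nearly equal, suggesting the second shot is
# roughly independent of whether the first was made.
print(round(p_second, 2), round(p_second_given_first, 2))  # 0.88 0.88
```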
5.36 Homeland security a)
                        Detected by Device
Radioactive Material       Yes     No
Yes                         a       b
No                          c       d
The false alarms the NYPD fears correspond to the cell containing “c”: the device signals a detection when no radioactive material is present. b) [Venn diagram: event B is drawn entirely inside event A.] c) Since the event A contains the event B, if B is known to have occurred, it must be that A has also occurred. Thus, P(A | B) = 1. However, knowing that A has occurred does not guarantee the occurrence of B since B is a subset of A. Thus, P(B | A) < 1.
5.37 Down syndrome again a) P(D | NEG) = 6/3927 = 0.0015 b) P(NEG | D) = 6/54 = 0.111; these probabilities are not equal because they are built on different premises. One asks us to determine what proportion of fetuses with negative tests actually has Down syndrome. This is a small number based on a very large pool of fetuses who had negative tests (3927). The other asks us to calculate what proportion of fetuses with Down syndrome had a negative test. This is a larger number because even though the number of false negatives is small, it’s based on a small pool of fetuses (just 54).
5.38 Obesity in America a) 35% and 9% are conditional. They are the chance of being obese given your race. b)
             Black     Asian      Other
Obese         8446       518     64,570
Not Obese   15,685      5234    177,894
Total       24,131      5752    242,464
c) [Tree diagram: the first branch is race (Black, Asian, Other); from each race, the second branch is Obese or Not Obese, with conditional probabilities 35%/65% for Black, 9%/91% for Asian, and 26.6%/73.4% for Other.]
100 Statistics: The Art and Science of Learning from Data, 4th edition 5.39 Happiness in relationship a) P(very happy) = 147/317 = 0.46 b) (i) P(very happy | male) = 69/146 = 0.47 (ii) P(very happy | female) = 78/171 = 0.46 c) From (a), P(very happy) = 0.46 and from (b), P(very happy | male) = 0.47. Since these values are very close, we can say the events of being happy and being male are independent. 5.40 Petra Kvitova serves a) P(1st serve good) = 28/41 = 0.68 b) P(double fault | first serve is a fault) = P(double fault)/P(first serve is a fault) = (3/41)/((41 – 28)/41) = 3/13 = 0.23 c) 3/41 = 0.07; She double faults on 7% of her service points. 5.41 Shooting free throws a) When two events are not independent: P(makes second and makes first) = P(makes second | makes first) P(makes first) = (0.60)(0.50) = 0.30. b) (i) P(misses second and makes first) = P(misses second | makes first) P(makes first) = (0.40)(0.50) = 0.20 P(makes second and misses first) = P(makes second | misses first) P(misses first) = (0.40)(0.50) = 0.20 P(making one of the two free throws) = 0.20+0.20 = 0.40 (ii) As shown in (a), P(makes both) = 0.30. P(misses both) = P(misses second | misses first) P(misses first) = (0.60)(0.50) = 0.30, P(makes both or none) = 0.30+0.30 = 0.60. Thus, P(makes only one) = 1 – 0.60 = 0.40. c) The results of the free throws are not independent because the probability that he will make the second shot depends on whether he made the first shot. 5.42 Drawing cards a) False, once one has been dealt one black card, there are only 25 out of 51, which is not one half. If one has been dealt two, there are now 24 out of 50, which also is not one half. The correct probability is P(first card black) P(second card black | first card black) P(third card black | first and second black) = (26/52)(25/51)(24/50) = 0.118 b) A and B are not independent because your chances of getting a red card on the second draw are lower if you got a red card on the first draw. On the first draw, there was a 0.50 probability of drawing a red card, but on the second draw, there’s now a 25/51 = 0.49 probability. c) redo (a) and (b). a) True because half the cards are black. b) Yes; because the cards are replaced, the first card drawn has no effect on the second card drawn. 5.43 Drawing more cards P(winning) = P(two diamonds) = P(first card is a diamond)×P(second card is a diamond | first card is a diamond) = (10/47)(9/46) = 0.0416 5.44 Big loser in Lotto P(0 winning numbers) = P(have none) P(have none | had none on first) P(have none | had none on first and second) P(have none | had none on first, second or third) P(have none | had none on first through fourth) P(have none | have none on first through fifth) = (43/49)(42/48)(41/47)(40/46)(39/45)(38/44) = 0.436 5.45 Family with two children a) P(C | A) = P(C and A)/P(A) = (1/4)/(1/2) = 1/2 b) These are not independent evens because P(C | A), 1/2, is not equal to P(C), 1/4. c) P(C | B) = P(C and B)/P(B) = (1/4)/(3/4) = 1/3
Chapter 5: Probability in Our Daily Lives 101 5.45 (continued) d) P(C | A) is the probability of both children being female given that the first is female. This is the same as the probability that the second child is female, 0.5. P(C | B) is the probability that both children are female given that one of the children is female. There are 3 possibilities where one of the children is female: FF, FM, MF. For only one of these are both children female so this probability is smaller, namely 1/3. 5.46 Checking independence a) P(A) = 0.50; P(B) = 0.50; P(C) = (0.5)(0.5) = 0.25; P(D) = (0.5)(0.5)(0.5) = 0.125 b) A and B are independent. The other sets would not be independent because if D (all three heads) occurs, we know the other three have occurred; thus, DC, DB, and DA are not independent. If C (heads on the first two) occurs, we know that A and B have occurred; thus, CB and CA are not independent.
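The card and Lotto answers in Exercises 5.42–5.44 multiply conditional probabilities as the pool of remaining outcomes shrinks. A small sketch of that pattern (the function name is ours, not the text's):

```python
from fractions import Fraction

def prob_all_favorable(favorable, pool, draws):
    """Probability that every one of `draws` draws without replacement is favorable."""
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(favorable - i, pool - i)
    return p

# Exercise 5.42(a): three black cards in a row from a standard 52-card deck.
print(round(float(prob_all_favorable(26, 52, 3)), 3))   # about 0.118
# Exercise 5.44: none of the six drawn numbers among your picks (43 "bad" of 49).
print(round(float(prob_all_favorable(43, 49, 6)), 3))   # about 0.436
```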
Section 5.4 Applying the Probability Rules 5.47 Birthdays of presidents It’s easiest to find the probability of no birthday matches among 44 people and then subtract from one. To do this one multiplies (364/365)(363/365)(362/365)…(322/365) = 0.0671; 1 – 0.0671 = 0.9329. The probability of finding at least one birthday match among 44 people is about 0.93. This is not highly coincidental. 5.48 Matching your birthday a) Because we’re noting a specific date rather than students with the same birthday on any date, the probability would be lower. b) There’s a 0.57 chance that two students out of 25 will share a birthday. However, if we are interested in finding the probability that one of the remaining 24 students also has a January first birthday, the probability is one minus the probability no one shares your birthday. Thus, it is 1 – (364/365)24 = 0.06. 5.49 Lots of pairs Each student can be matched with 24 other students, for a total of (25)(24) = 600 pairs. But this considers each pair twice (e.g., student 1 with student 2, and student 2 with student 1), so the answer is (25)(24)/2 = 300. 5.50 Holes in one at Masters a) P(no holes in one during a round of golf) = (1 – 0.0005)(1 – 0.0015)(1 – 0.0005)(1 – 0.0025) = 0.995 b) P(no holes in one during the next 20 rounds of golf) = (0.995)20 = 0.905 c) P(at least one hole in one during the next 20 rounds of golf) = 1 – P(no holes in one during the next 20 rounds of golf) = 1 – 0.905 = 0.095 5.51 Corporate bonds P(company does not default during the five-year bond term) = (1 – 0.05)(1 – 0.07)(1 – 0.07)(1 – 0.07)… (1 – 0.09) = 0.6954 5.52 Horrible 11 on 9/11 Because of the huge number of possible occurrences, combined with our tendency as humans to look for patterns, we are going to see coincidences fairly frequently. Coincidences, however, are not amazing when we consider them in the context of all the possible random occurrences at all times. The percentage that constitute coincidences is very small; we just pay more attention to coincidental than to non-coincidental occurrences. 5.53 Coincidence in your life The response will be different for each student. The explanation, however, will discuss the context of the huge number of the possible random occurrences that happen in one’s life, and the likelihood that at least some will happen (and appear coincidental) just by chance. 5.54 Monkeys typing Shakespeare If we assume that each time the monkey hits a key, it is independent of the other times he/she hits a key, we can use the multiplicative rule. (1/50)(1/50)(1/50)(1/50)(1/50)(1/50)(1/50) = 0.0000000000013
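The birthday calculations in Exercises 5.47 and 5.48 above follow the same recipe: find the probability of no match and subtract from one. A minimal illustrative sketch (the function name is ours):

```python
def prob_birthday_match(n_people):
    """P(at least two of n_people share a birthday), ignoring leap years."""
    p_no_match = 1.0
    for i in range(n_people):
        p_no_match *= (365 - i) / 365
    return 1 - p_no_match

print(round(prob_birthday_match(44), 2))  # about 0.93 (Exercise 5.47)
print(round(prob_birthday_match(25), 2))  # about 0.57 (Exercise 5.48)
```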
5.55 A true coincidence at Disneyworld a) The probability that the first will go multiplied by the probability that the second will go and so on for all 5.4 million, that is (1/5000)^5,400,000, which is zero to a huge number of decimal places. b) This solution assumes that each person decides independently of all others. This is not realistic because families and friends often make vacation plans together.
5.56 Rosencrantz and Guildenstern a) The sample space would be all possible combinations of coin flips. Thus, one could get 91 heads, or 90 heads in a row and then one tail, or 89 heads in a row then a tail then another head, and so on for 2 to the 91st power, which is 2.48 × 10^27 combinations! b) Guildenstern’s outcome for this sample space is 91 heads in a row: HHH…H (a string of 91 H’s). If the second flip were a tail, the outcome would be HTHH…H (an H, then a T, then 89 more H’s). c) The probability of the event of getting a head 91 times in a row is the probability of getting no tails, (1/2)^91, essentially 0. d) The probability of at least one tail is 1.0 minus the probability of no tails, which is essentially 1. e) Our probability model assumes that each coin flip is independent, and that each has the same probability of success.
5.57 Mammogram diagnostics a) [Tree diagram with intersection probabilities:
Breast Cancer? Yes (0.01) → Mammogram Positive (0.86): (0.01)(0.86) = 0.0086; Negative (0.14): (0.01)(0.14) = 0.0014
Breast Cancer? No (0.99) → Mammogram Positive (0.12): (0.99)(0.12) = 0.1188; Negative (0.88): (0.99)(0.88) = 0.8712]
b) P(POS) = P(S and POS) + P(S^c and POS) = 0.0086 + 0.1188 = 0.1274 c) P(S | POS) = P(POS and S)/P(POS) = 0.0086/0.1274 = 0.068 d) The frequencies on the branches are calculated by multiplying the proportion for each branch by the total number. For example, (0.01)(100) = 1, the frequency on the “yes” branch. Similarly, we can multiply that 1 by 0.86 to get 0.86, which rounds up to 1, the number on the “pos” branch. There are 13 positive tests in this example, only one of which indicates breast cancer. Thus, the proportion of positive tests with breast cancer is 1/(1 + 12) = 0.077 (rounds to 0.08).
5.58 More screening for breast cancer a) The four intersection probabilities are now 0.0009, 0.0001, 0.1200, and 0.8791. P(POS) = P(S and POS) + P(S^c and POS) = 0.0009 + 0.1200 = 0.121; P(S | POS) = P(POS and S)/P(POS) = 0.0009/0.121 = 0.007
5.58 (continued) [Tree diagram with frequencies for 1000 women:
Breast Cancer? Yes (1) → Mammogram Positive (1); Negative (0)
Breast Cancer? No (999) → Mammogram Positive (120); Negative (879)]
b) The tree diagram shows that only 1 out of 121 positive tests indicated actual breast cancer. 1/121 = 0.008. c) As can be seen from the two tree diagrams, 12 out of 13 were false positives for older women, whereas 120 out of 121 were false positives among younger women. Since it is less likely for a younger woman to have breast cancer, yet the specificity is the same for the two groups, the proportion of false positives will be greater for younger women.
5.59 Was OJ actually guilty? a) 45 is entered based on the proportion of 100,000 women who are murdered annually; it is the sum of women who are murdered by their partners and women who are murdered by someone other than their partners. 5 is entered because it is the number, out of 100,000, who are murdered by someone other than their partners. b) The blanks in the tree diagram will be 40 for those women who are murdered by their partners, and 99,955 for women who are not murdered. c) Of the women who suffer partner abuse and are murdered, 40/45, or 89%, are murdered by their partners. On the other hand, the probability that a woman is murdered by her partner given that she has suffered partner abuse is 40/100,000 or 0.0004. O.J.’s wife had been abused by him and murdered, so it is the 89% statistic that is relevant in this case. These statistics differ drastically because most women who suffer partner abuse are not murdered. However, those who are murdered are typically murdered by their partner.
5.60 Convicted by mistake a) We are given the following information: P(Convicted | Guilty) = 0.95, P(Acquitted | Innocent) = 0.95, P(Guilty) = 0.90. We are asked to find P(Innocent | Convicted). Based on these values we can fill in the table as follows:
              Guilty                   Innocent                  Total
Convicted     (0.95)(0.90) = 0.855     0.10 – 0.095 = 0.005      0.855 + 0.005 = 0.86
Acquitted     0.90 – 0.855 = 0.045     (0.95)(0.10) = 0.095      0.045 + 0.095 = 0.14
Total         0.90                     1 – 0.90 = 0.10
P(Innocent | Convicted) = P(Innocent and Convicted)/P(Convicted) = 0.005/0.86 = 0.0058
b) Making the change P(Guilty) = 0.50 we have the following:
              Guilty                   Innocent                  Total
Convicted     (0.95)(0.50) = 0.475     0.50 – 0.475 = 0.025      0.475 + 0.025 = 0.50
Acquitted     0.50 – 0.475 = 0.025     (0.95)(0.50) = 0.475      0.025 + 0.475 = 0.50
Total         0.50                     1 – 0.50 = 0.50
P(Innocent | Convicted) = P(Innocent and Convicted)/P(Convicted) = 0.025/0.50 = 0.05
5.60 (continued) c) We are given the following information: P(Convicted | Guilty) = 0.99, P(Acquitted | Innocent) = 0.75, and P(Guilty) = 0.90. We are asked to find P(Innocent | Convicted). Based on these values we can fill in the table as follows:
              Guilty                   Innocent                  Total
Convicted     (0.99)(0.90) = 0.891     0.10 – 0.075 = 0.025      0.891 + 0.025 = 0.916
Acquitted     0.90 – 0.891 = 0.009     (0.75)(0.10) = 0.075      0.009 + 0.075 = 0.084
Total         0.90                     1 – 0.90 = 0.10
P(Innocent | Convicted) = P(Innocent and Convicted)/P(Convicted) = 0.025/0.916 = 0.0273.
5.61 DNA evidence compelling? a) P(Innocent | Match) = P(Innocent and Match)/P(Match) = 0.0000005/0.4950005 = 0.000001
[Tree diagram with intersection probabilities:
Innocent? Yes (0.50) → DNA Match? Yes (0.000001): 0.0000005; No (0.999999): 0.4999995
Innocent? No (0.50) → DNA Match? Yes (0.99): 0.495; No (0.01): 0.005]
b) P(Innocent | Match) = P(Innocent and Match)/P(Match) = 0.00000099/0.00990099 = 0.0001. When the probability of being innocent is higher, there’s a bigger probability of being innocent given a match.
[Tree diagram with intersection probabilities:
Innocent? Yes (0.99) → DNA Match? Yes (0.000001): 0.00000099; No (0.999999): 0.98999901
Innocent? No (0.01) → DNA Match? Yes (0.99): 0.0099; No (0.01): 0.0001]
c) P(Innocent | Match) can be much different from P(Match | Innocent).
5.62 Triple Blood Test a) The prevalence is 54/5282 = 0.01. b) (i) The sensitivity is 48/54 = 0.89. (ii) The specificity is 3921/5228 = 0.75. c) (i) The positive predictive value is 48/1355 = 0.035. (ii) The negative predictive value is 3921/3927 = 0.998. d) There are four ways of describing the probability that a diagnostic test makes a correct decision. First, in (b-i), we see the probability that the test would be positive if one had Down syndrome. Second, in (b-ii), we see the probability that the test would be negative given that one does not have Down syndrome. In (c-i), we see the probability that one would have Down syndrome if the test is positive. Finally, in (c-ii), we see the probability that one would not have Down syndrome given a negative test.
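The tree-diagram calculations in Exercises 5.57–5.62 all combine a prevalence with a test's sensitivity and specificity to get the probability of the condition given a positive result. A compact illustrative sketch of that computation (the function and variable names are ours; the figures quoted are those used in Exercise 5.57):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test) via the tree-diagram (total probability) calculation."""
    p_pos_and_disease = prevalence * sensitivity
    p_pos_and_healthy = (1 - prevalence) * (1 - specificity)
    return p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy)

# Exercise 5.57: prevalence 0.01, sensitivity 0.86, specificity 0.88.
print(round(positive_predictive_value(0.01, 0.86, 0.88), 3))  # about 0.068
```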
Chapter 5: Probability in Our Daily Lives 105 5.63 Simulating donations to local blood bank a) The results of this exercise will be different each time it’s conducted. One would make the assumption of independence with respect to donors. b) We would multiply the chances of the first person not being an AB donor (19/20) by the chances of the second person not being an AB donor (19/20) by the chances of the third person not being an AB donor (19/20), etc. This results in multiplying (0.95) (0.95) (0.95) (0.95) – a grand total of twenty 0.95’s, or (0.95)20. This product of these probabilities is 0.36. 5.64 Probability of winning a) You would expect the probability of winning to be 0.65, but values will vary due to the simulation. b) You would expect the probability of winning to be 0.65, but values will vary due to the simulation. 5.65 Probability of winning a) Multiply the table results in Exercise 16 by 4 seconds to get the answers. b) Answers will vary depending on the simulation. c) Answers will vary depending on the simulation, but rapid succession is likely faster.
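Exercise 5.63 can also be simulated directly. The following sketch is one possible setup (it is illustrative only; the number of trials and the seed are arbitrary choices). The exact answer from the multiplication rule is (0.95)^20, about 0.36, so the simulated value should land close to that.

```python
import random

def simulate_no_ab_donor(n_donors=20, p_ab=0.05, n_trials=100_000, seed=2):
    """Estimate P(no AB donor among n_donors) by repeated simulation."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_trials):
        if all(rng.random() >= p_ab for _ in range(n_donors)):
            count += 1
    return count / n_trials

print(round(simulate_no_ab_donor(), 3))  # should be near 0.36
```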
Chapter Problems: Practicing the Basics 5.66 Peyton Manning completions a) No. What it means is that in 100 passes we expect to see about 65 completions, but the actual number may vary somewhat. b) If Manning is still at his typical playing level, it would be quite surprising if his completion percentage over a large number of passes differed significantly from 0.65. The more passes he throws, the closer the observed percentage should be to 0.65. 5.67 Due for a boy? The gender of each child is independent of the genders of the previous children. Thus, the chance that this child is a boy is still 1/2. 5.68 P(life after death) The relative frequency refers to the probability of an outcome as a long-run proportion; that is, we observe how many times a given event occurs out of a certain number of trials. Subjective definitions of probability refer to our degree of belief that a given outcome will occur. Because we cannot observe whether people have life after death, we can only give a subjective definition of (a) the probability of life after death. On the other hand, we could observe (b) how often we remember at least one dream that we had the previous night. Over the long-run, we could calculate the relative frequency of remembering a dream, and estimate the probability that this will occur.
5.69 Choices for lunch a) Given that all customers select one dish from each category, there are 2 × 3 × 3 × 1 = 18 possible meals. [Tree diagram: the main dish (beef or chicken) branches into the vegetable (corn, green beans, or potatoes), then into the beverage (cola, ice tea, or coffee), and finally into the dessert (apple pie), giving the 18 possible meals.]
b) In practice, it would not be sensible to treat all the outcomes in the sample space as equally likely for the customer selections we’d observe. Typically, some menu options are more popular than are others. 5.70 Caught doctoring the books a) If we were to randomly pick one of the digits between 1 and 9 using a random numbers table, the probability for each digit would be 1/9 = 0.111. b) The probability of a 5 or a 6 as the first digit using (i) Benford’s Law would be 0.08 + 0.07 = 0.15, and using (ii) random selection would be 0.111 + 0.111 = 0.222. 5.71 Life after death a) The estimated probability that a randomly selected adult in the U.S. believes in life after death is 1455/1787 = 0.8142. b) The probability that both subjects believe in life after death is 0.8142 0.8142 = 0.6629. c) The assumption used in the answer to (b) is that the responses of the two subjects are independent. This is probably unrealistic because married couples share many of the same beliefs. 5.72 Death penalty jury a) The probability of all 12 jurors being white is the probability that the first is white (0.90) multiplied by the probability that the second is white (0.90), and so on for all 12 jurors. 0.90 0.90 0.9 0.90 = 0.28.
Chapter 5: Probability in Our Daily Lives 107 5.72 (continued) b) This problem would be solved in the same manner as (a), substituting 0.50 for 0.90. The probability is now 0.0002. 5.73 Driver’s exam a) There are 2 2 2 = 8 possible outcomes. First Second Third Friend Friend Friend Pass Pass Fail Pass Pass Fail Fail Pass Pass Fail Fail Pass Fail Fail b) If the eight outcomes are equally likely, the probability that all three pass the exam is 1/8 = 0.125. This could also be calculated by multiplying the probability that the first would pass (0.5), by the probability that the second would pass (0.5), by the probability that the third would pass (0.5). 0.5 0.5 0.5 = 0.125. c) If the three friends were a random sample of their age group, the probability that all three would pass is 0.7 0.7 0.7 = 0.343. d) The probabilities that apply to a random sample are not likely to be valid for a sample of three friends because the three friends are likely to be similar on many characteristics that might affect performance on such a test (e.g., IQ). In addition, it is possible that they studied together. 5.74 Independent on coffee? a) These events are dependent since if the student has visited Europe in the past 12 months, it is likely that they flew there. b) One could explain that independent means that the first event has no bearing on the second event – that one could not predict the second event from the first. c) Dependent. Individuals who visit Europe are likely to visit more than one country, particularly ones that are close together. d) Independent. These two events seem to be unrelated. e) The pairs of events in (a) are most dependent, those in (d) the least. 5.75 Health insurance a) The probability that a patient does not have health insurance is 0.16. Thus, the probability that a patient has health insurance is 1 – 0.16 = 0.84. b) P(private | health insurance) = P(private and health insurance)/P(health insurance) = 0.59/0.84 = 0.70 5.76 Teens and drugs The second probability that is given, 26%, refers to a conditional probability. The event is conditional on the teen reporting that they go to clubs for music or dancing at least once a month. The probability refers to the event that a teen says that drugs were usually available at the club events. 5.77 Teens and parents a) The last two percentages, 31% and 1%, are conditional probabilities. The 31% is conditioned on the event that the teen says that parents are never present during the parties they attend. The 1% is conditioned on the event that the teen says that parents are present at the parties they attend. For both percentages, the event to which the probability refers is a teen reporting that marijuana is available at the parties they attend.
108 Statistics: The Art and Science of Learning from Data, 4th edition 5.77 (continued) b) Parents Present Marijuana available Yes No Yes 9 133 No 860 295 c) The probability that parents are present given that marijuana is not available at the party is P(parents are present and marijuana is not available)/P(marijuana is not available) = 860/(860 + 295) = 0.74. 5.78 Laundry detergent a) The probability that a randomly chosen consumer would have seen advertising for the new product and tried the product is 0.10. b) The probability that the person has tried the product given that the person has seen the product advertised is P(tried the product and seen product advertised)/P(seen product advertised) = 0.10/0.35 = 0.29. c) from (a): P(A and B), from (b): P(A | B) d) P(A) = 0.15; P(A | B) = 0.29. Because P(A) does not equal P(A | B), A and B are not independent. 5.79 Board games and dice a) The sample space of all possible outcomes for the two dice is as follows: (1,1); (1,2); (1,3); (1,4); (1,5); (1,6); (2,1); (2,2); (2,3); (2,4); (2,5); (2,6); (3,1); (3,2); (3,3); (3,4); (3,5); (3,6); (4,1); (4,2); (4,3); (4,4); (4,5); (4,6); (5,1); (5,2); (5,3); (5,4); (5,5); (5,6); (6,1); (6,2); (6,3); (6,4); (6,5); (6,6). b) The outcomes in A are: (1,1); (2,2); (3,3); (4,4); (5,5); and (6,6); the probability of this is 6/36 = 0.167. c) The outcomes in B are (1,6); (2,5); (3,4); (4,3); (5,2); and (6,1); the probability of this is 6/36 = 0.167. d) (i) There are no outcomes that include both A and B; that is, none of the doubles add up to seven. Thus, the probability of A and B is 0. (ii) The probability of A or B = 0.1667+0.1667 = 0.333. (iii) The probability of B given A is 0. If you roll doubles, it cannot add up to seven. e) A and B are disjoint. You cannot roll doubles that add up to seven. 5.80 Roll two more dice a) P(B and D) = 0.167 which also is P(B); when an event B is contained within an event D, P(B and D) = P(B) because all Bs also are Ds. The requirement that the event also be D does not constrain B anymore than it already is constrained. b) P(B or D) = 0.50 which also is P(D); when an event B is contained within an event D, P(B or D) = P(D) because B does not add anything to D. It already is part of it. 5.81 Conference dinner P(Dinner | Breakfast) = P(Dinner and Breakfast)/P(Breakfast) = 0.40/0.50 = 0.80 5.82 Waste dump sites a) Let A = violation at the first project and B = violation at the second project. Then, if the two projects are disjoint, P(A or B) = P(A) + P(B) = 0.30 + 0.25 = 0.55. c c c b) P(B | A ) = P(B and A )/P(A ) = 0.25/0.7 = 0.357. Note that if the projects A and B are disjoint, the c probability of (B and A ) is the same as the probability of B. c) (a) If independent, P(A or B) = P(A) + P(B) – P(A and B) = 0.30 + 0.25 – 0.075 = 0.475. (Note: If independent, P(A and B) = P(A) P(B) = 0.3 0.25 = 0.075.) c c c (b) If independent, P(B | A ) = P (B and A )/P(A ) = (0.175)/(0.70) = 0.25. (Note: If independent, P(B c c and A ) = P(A ) P(B) = 0.7 0.25 = 0.175.)
Chapter 5: Probability in Our Daily Lives 109 5.83 A dice game There are 36 possible combinations of dice. Of these, eight add up to seven or eleven [1,6; 2,5; 3,4; 4,3; 5,2; 6,1; 5,6; and 6,5], and four add up to two, three, or twelve [1,1; 1,2; 2,1; and 6,6]. Twenty-four add up to other sums. If you roll another sum, it doesn’t affect whether you win or lose. Given that you roll a winning or losing combination, there is an 8 in 12 chance of winning. Thus, there is a 0.67 chance of winning. This game would not be played in a casino because the game favors the player to win in the long run, not the house. 5.84 No coincidences a) The probability that we would never have a coincidence on these 100 topics is the probability that we would not have a coincidence on the first multiplied by the probability that we would not have a coincidence on the second multiplied by the probability that we would not have a coincidence on the third, and so on through 100 topics. P(disagree on first) P(disagree on second) P(disagree on third) P(disagree on 100th) = (0.98)100 = 0.13. b) The probability that we would have a coincidence on at least one topic is 1 – 0.13 = 0.87. 5.85 Amazing roulette run? a) This strategy is a poor one as the roulette wheel has no memory. The chance of an even slot or of an odd slot is the same on each spin of the wheel. b) (18/38)26, which is essentially 0. c) It would not be surprising if sometime over the past 100 years one of these wheels had 18 evens in a row. Events that seem highly coincidental are often not so unusual when viewed in the context of all the possible random occurrences at all times. 5.86 Death penalty and false positives a) One error would be a false positive, where an individual who is innocent is convicted. The other error would be a false negative where a guilty defendant acquitted. b) (i) The probability that all were truly guilty is the probability of the first being guilty multiplied by the probability of the second being guilty, and so on through the 1234th person. For 1234 people this equals (0.99)1234. The probability that all were truly guilty, therefore, is 0.00000411 or close to 0. (ii) The probability that at least one was actually innocent is 1 – 0.00000411 = 0.99999589, or close to 1. c) The answers in (b) become (i) essentially 0 and (ii) essentially 1. 5.87 Screening smokers for lung cancer False negatives would be when the helical computed tomography diagnostic test indicates that an adult smoker does not have lung cancer when he or she does have lung cancer. Conversely, a false positive would occur when this test indicates the presence of lung cancer when there is none. 5.88 Screening for heart attacks a) The sensitivity is the probability of a positive test given that someone actually has the condition. In this case, given that someone has AMI, there is a 37% chance that they have a positive CK test. b) The specificity is the probability of a negative test given that someone does not have the condition. In this case, given that the individual does not have AMI, there is an 87% chance that they have a negative CK test.
5.88 (continued) c) [Tree diagram with intersection probabilities:
AMI? Yes (0.25) → CK Test Result Positive (0.37): (0.25)(0.37) = 0.0925; Negative (0.63): (0.25)(0.63) = 0.1575
AMI? No (0.75) → CK Test Result Positive (0.13): (0.75)(0.13) = 0.0975; Negative (0.87): (0.75)(0.87) = 0.6525]
5.89 Screening for colorectal cancer a) [Tree diagram with frequencies for 10,000 people:
Colorectal Cancer? Yes (30) → Hemoccult Test Positive (15); Negative (15)
Colorectal Cancer? No (9970) → Hemoccult Test Positive (300); Negative (9670)]
b) 15/315 = 0.048 of those who have a positive hemoccult test actually have colorectal cancer. Because so few people have this cancer, most of the positive tests will be false positives. There are so many people without this cancer that even a low false positive rate will result in many false positives.
5.90 Color blindness a) Given that one is a man, one has a 0.05 probability of being color blind. Given that one is a woman, one has a 0.0025 probability of being color blind. b) If the population is half male and half female, the proportion of the population that is color blind is 525/20,000 = 0.026.
           Color Blind    Not Color Blind     Total
Male             500              9500       10,000
Female            25              9975       10,000
Total            525            19,475       20,000
c) Given that a randomly chosen person is color blind, the probability that the person is female is 25/525 = 0.048.
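The screening tables in Exercises 5.89–5.92 are built by applying the stated rates to a hypothetical population. A sketch of that bookkeeping (the function name is ours; the rates are read off the Exercise 5.89 tree, with specificity approximately 9670/9970, or about 0.97):

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected true/false positives and negatives for a screening test."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity
    false_neg = diseased - true_pos
    false_pos = healthy * (1 - specificity)
    true_neg = healthy - false_pos
    return true_pos, false_neg, false_pos, true_neg

tp, fn, fp, tn = expected_counts(10_000, 0.003, 0.5, 0.97)
print(round(tp), round(fn), round(fp), round(tn))  # roughly 15, 15, 300, 9670
print(round(tp / (tp + fp), 3))                    # about 0.048, as in part (b)
```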
5.91 HIV testing a) [Tree diagram with intersection probabilities:
HIV Positive? Yes (0.10) → Test Result Positive (0.999): 0.0999; Negative (0.001): 0.0001
HIV Positive? No (0.90) → Test Result Positive (0.0001): 0.00009; Negative (0.9999): 0.89991]
b)
           Positive     Negative     Total
HIV         0.0999       0.0001       0.1
No HIV      0.00009      0.89991      0.9
Total       0.09999      0.90001      1.0
c) Given that someone has a positive test result, the probability that this person is truly HIV positive is 0.0999/0.09999 = 0.999. d) A positive result is more likely to be in error when the prevalence is lower as relatively more of the positive results are for people who do not have the condition. With fewer people with HIV, the chances of a false positive are higher. The contingency tables below demonstrate that with a prevalence rate of 10%, there is likely to be one false positive out of 1000 positive tests, whereas with a prevalence rate of 1%, there is likely to be 1 false positive out of only 101 positive tests.
10% Rate    Positive     Negative     Total
HIV             999            1       1000
No HIV            1         8999       9000
Total          1000         9000     10,000
1% Rate     Positive     Negative     Total
HIV             100            0        100
No HIV            1         9899       9900
Total           101         9899     10,000
5.92 Prostate cancer a) If 10% of those who took the PSA test truly had prostate cancer, the probability that a man truly had prostate cancer, given that he had a positive test, is 0.086/0.689 = 0.125.
                      Positive     Negative     Total
Prostate cancer          0.086        0.014      0.10
No prostate cancer       0.603        0.297      0.90
Total                    0.689        0.311      1.00
b)
                      Positive     Negative     Total
Prostate cancer             86           14       100
No prostate cancer         603          297       900
Total                      689          311      1000
112 Statistics: The Art and Science of Learning from Data, 4th edition 5.92 (continued) c) If the cases increase for which a test is positive, the sensitivity will go up because more people will have positive tests – this group will include more of the people who actually have prostate cancer. On the other hand, because more people are testing positive for prostate cancer, the probability of a false positive is increasing as well. This means that the specificity, the probability that someone has a negative test given that they don’t have prostate cancer, will go down. An increase in false positives means a decrease in correct negatives. 5.93 U Win Answers will vary. To set up the simulation, one can assign the digit 0 to the letter U, assign the digits 1–3 to the letter W, assign the digits 4–6 to the letter I and the digits 7–9 to the letter N. Choose a row of the random number table to start and read off sets of 5 digits at a time. In this simulation, it is possible to receive the same letter more than once, so duplicate digits should not be discarded. 5.94 Win again Answers will vary. To set up the simulation, one can make the letter assignments as in Exercise 5.93 and then select random digits until one of each letter has been chosen. Record the number of random digits that were required. Repeat this process 20 times. To estimate the expected number of combo meals one would need to purchase in order to win the free shake, sum the required number of digits from the 20 repetitions and divide by 20.
Chapter Problems: Concepts and Investigations 5.95 Simulate law of large numbers a) The cumulative proportions for (i) through (iv) will differ for each student who conducts this exercise. Students will notice, however, that the cumulative proportion of heads approaches 0.50 with larger numbers of flips. This illustrates the law of large numbers and the long-run relative frequency definition of probability in that as the number of trials increases, the proportion of occurrences of any given outcome (in this case, of heads) approaches the actual proportion in the population “in the long run.” b) The outcome will be similar to that in (a), with the cumulative proportion of heads approaching one third with larger numbers of flips. 5.96 Illustrate probability terms with scenarios a) The sample space is the set of possible outcomes for a random phenomenon. (i) Examples will differ for each student. One example for a designed experiment might involve participants being randomly assigned to either listen to music or sit in silence while attempting to solve a puzzle, with the possibilities of success or failure. The sample space would consist of music and success, music and failure, silence and success, and silence and failure. (ii) Examples will differ for each student. One example for an observational study is what students choose for a drink and a snack from a vending machine. If students must choose one of each, and the drink choices are regular and diet soda, and the snack choices are cookies and chips, then the sample space includes regular soda and cookies, regular soda and chips, diet soda and cookies, and diet soda and chips. b) Disjoint events are events that do not share any outcomes in common. Examples will differ for each student. One example of two events that are disjoint are having your first-born child be a girl, and having your first-born child be a boy. c) A conditional probability occurs when one assesses the probability of one event occurring given that another event already has occurred. (i) Examples will differ for each student. One possible example of independent events from everyday life is the car in front of you on the highway. The car in front of you on your commute home is likely independent of the car in front of you on your commute to school. (ii) Examples will differ for each student. One possible example of dependent events from everyday life is meals. What you eat for dinner likely depends to some degree on what you ate for lunch.
5.97 Short term versus long run a) The cumulative proportion of heads would be 60/110 = 0.545. b) The cumulative proportion of heads is now 510/1010 = 0.505. c) The cumulative proportion of heads is now 5010/10,010 = 0.500. As n increases, the cumulative proportion tends toward 0.500.
5.98 Risk of space shuttle a) We often form our beliefs about probability based on subjective information rather than solely on objective information such as data. We assess the probability of an outcome by taking into account all available information. In this case, the available information is 20 safe missions, which leads NASA workers to trust the safety of the missions more than after just one mission. b) The problem with this belief is that each particular mission has no memory of the previous mission. A given mission has pretty much the same risk of disaster as another mission. Of course, missions are not entirely independent of other missions, so the risks might differ somewhat. With a coin flip, each event truly is independent of the previous one. Having 20 heads in a row gives us no information about the outcome of the next flip. Similarly, having 20 safe missions in a row doesn’t necessarily predict the outcome of the next mission.
5.99 Mrs. Test (1) 99% accurate could refer to sensitivity, meaning that if you are pregnant, you’ll have a positive test 99% of the time. (2) It could refer to specificity, meaning that if you’re not pregnant, you’ll have a negative test 99% of the time. (3) 99% chance that you are pregnant given a positive test. (4) 99% chance that you are not pregnant given a negative test.
5.100 Marijuana leads to heroin? The H set is very small compared to the M set and the H set is almost completely contained within the M set. [Venn diagram: a small circle H drawn nearly entirely inside a much larger circle M.]
5.101 Stay in school a) Use the extension of the multiplication rule with conditional probabilities. P(high school) × P(college | high school) × P(masters’ | high school and college) × P(Ph.D. | high school, college, and masters’) = (0.80)(0.50)(0.20)(0.30) = 0.024 b) We multiplied (1) the probability of getting a high school degree by (2) the probability of getting a college degree once you had a high school degree by (3) the probability of getting a masters’ degree once you had the earlier degrees by (4) the probability of getting a Ph.D. once you had the earlier degrees. c) Of those who finish college, 20% get a masters’ degree; of these, 30% get a Ph.D. (0.20)(0.30) = 0.06
5.102 How good is a probability estimate? a) The approximate margin of error is 1/√n = 1/√5282 ≈ 0.014; thus, the margin of error would lead to a predicted range of 0.257 – 0.014 = 0.243 to 0.257 + 0.014 = 0.271. b) The margin of error gets smaller and smaller (approaches zero) as n gets larger. The implication of this is that the estimate of probability is more accurate with a larger n.
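The approximate margin of error used in Exercise 5.102 is 1/√n. A minimal illustrative sketch of the calculation:

```python
from math import sqrt

def margin_of_error(n):
    """Approximate margin of error for an estimated probability, 1/sqrt(n)."""
    return 1 / sqrt(n)

m = margin_of_error(5282)
print(round(m, 3))                               # about 0.014
print(round(0.257 - m, 3), round(0.257 + m, 3))  # roughly 0.243 to 0.271
```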
114 Statistics: The Art and Science of Learning from Data, 4th edition 5.103 Protective bomb The fallacy of his logic is that the event of a person bringing a bomb is independent of the event of any other person bringing a bomb. Thus, if the probability of one person bringing a bomb on the plane is one in a million, that is true whether or not this person has a bomb on the plane. The probability of another person bringing a bomb given that this person has a bomb is the same as the probability of another person bringing a bomb given that this person does not have a bomb. 5.104 Streak shooter a) Over the course of a season, there will be many runs. Randomness produces runs. For example, if you flip a coin six times, it’s not unlikely that in the long run, you’ll have streaks of six heads. Using
Coselli’s logic, this would only have a (0.5) 6 = 0.016 chance, but over the long run, such streaks will occur just randomly. In fact, if you flip a balanced coin 2000 times, the longest run of heads you can expect during those flips is about 10! In the current example, you should think about whether this is like seeing six baskets in a row on the next six shots, or more like seeing six baskets in a row in a long series of shots (over a season, for example). b) This example demonstrates how things that look highly coincidental may not be so when viewed in a wider context. If you choose only the “streaks” and ignore the rest of the season, the sets of baskets in a row look like streaks. If you look at the context of the whole season, the “streaks” look like the kinds of runs that would occur if each basket were truly independent of the others. 5.105 Multiple choice Both (c) and (d) are correct. 5.106 Multiple choice The best answer is (d). 5.107 Multiple choice: Coin flip The best answer is (e). 5.108 Multiple choice: Dream come true The best answer is (b). 5.109 Multiple choice: Comparable risks The best answer is (b). 5.110 True or false a) False, any given sequence is equally likely to happen as any other possible sequence. b) True, there is only one sequence of ten flips resulting in ten heads but there are many combinations resulting in five heads. 5.111 True or false False, the sample space is TT, HT, TH, and HH. Thus, the probability of 0 heads is 0.25, of one head is 0.50, and 0.25 two heads. 5.112 Driving versus flying The reasoning behind this statement is that driving has a higher probability of death than does flying. Thus, the more people who switch to driving, the higher the rate of death. 5.113 Prosecutor’s fallacy Being not guilty is a separate event from the event of matching all the characteristics listed. It might be easiest to illustrate with a contingency table. Suppose there are 100,000 people in the population. We are given that the probability of a match is 0.001. So out of 100,000 people, 100 people would match all the characteristics. Now the question becomes, of those 100 people, what is the probability that a person would not be guilty of the crime? We don’t know that probability. However, let’s SUPPOSE in the population of 100,000, 5% of the people could be guilty of such a crime, 95% are not guilty. Here’s a contingency table that illustrates this concept.
5.113 (continued)
              Match     No Match     Total
Guilty                                5000
Not Guilty     ???                  95,000
Total           100      99,900    100,000
Thus, P(not guilty | match) = ???/100.
♦♦5.114 Generalizing the addition rule [Venn diagram: three non-overlapping circles A, B, and C.] As we can see from the Venn diagram, disjoint events don’t overlap. Thus, the probability of any one of them occurring is the sum of the probability of each one occurring.
♦♦5.115 Generalizing the multiplication rule When two events are not independent, P(A and B) = P(A) × P(B | A); if we think about (A and B) as one event, we can see that P[C and (A and B)] = P(A and B) × P(C | A and B). If we replace P(A and B) with its equivalent, P(A) × P(B | A), we see that: P(A and B and C) = P(A) × P(B | A) × P(C | A and B).
♦♦5.116 Bayes’s rule a) P(A | B) = P(A and B)/P(B) and P(B | A) = P(A and B)/P(A). If we rearrange the second of the above formulas, we get P(A and B) = P(A)P(B | A). Now, we can replace P(A and B) in the first formula above with the rearrangement of the second formula. We now get: P(A | B) = [P(A)P(B | A)]/P(B). b) The probability that B occurs is the sum of the probability that B occurs and A occurs, and the probability that B occurs and A does not occur. Thus, we’re adding the probabilities that B occurs in both the presence and the absence of A. c) If we replace P(B and A) with the rearrangement of the formula above, P(A)P(B | A), and P(B and A^c) with a similar rearrangement, P(A^c)P(B | A^c), we can rearrange P(B) = P(B and A) + P(B and A^c) to get the formula in the exercise: P(B) = P(A)P(B | A) + P(A^c)P(B | A^c). d) Based on the information in (a) through (c), we can replace parts of the following formula: P(A | B) = P(A and B)/P(B). The numerator P(A and B) can be replaced with the rearrangement from (a). The denominator P(B) can be replaced with the rearrangement from (c). P(A | B) = P(A)P(B | A) / [P(A)P(B | A) + P(A^c)P(B | A^c)]
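The formula derived in Exercise 5.116(d) translates directly into a short function. An illustrative sketch (the function name is ours), checked against the Exercise 5.60(a) numbers:

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(A)P(B|A) / [P(A)P(B|A) + P(A^c)P(B|A^c)]."""
    numerator = p_a * p_b_given_a
    return numerator / (numerator + (1 - p_a) * p_b_given_not_a)

# Exercise 5.60(a): A = innocent, B = convicted.
# P(A) = 0.10, P(B | A) = 0.05, P(B | A^c) = 0.95.
print(round(bayes(0.10, 0.05, 0.95), 4))  # about 0.0058
```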
Chapter Problems: Student Activities 5.117 Simulating matching birthdays a) The results will be different each time this exercise is conducted. b) The simulated probability should be close to 1. 5.118 Simulate table tennis a) The results will be different each time this exercise is conducted. b) The results will be different each time this exercise is conducted. 5.119 Which tennis strategy is better? a) The results will be different each time this exercise is conducted. b) The results will be different each time this exercise is conducted.
116 Statistics: The Art and Science of Learning from Data, 4th edition 5.120 Saving a business a) The results will be different each time this exercise is conducted. b) The results will be different each time this exercise is conducted. c) The results will be different each time this exercise is conducted.
Chapter 6: Probability Distributions
Section 6.1: Summarizing Possible Outcomes and Their Probabilities
6.1 Rolling dice a) Uniform distribution; P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6 b) The probabilities below correspond to the stems on the graph in this exercise. Each probability is calculated by counting how many rolls of the dice add up to a particular number. For example, there are three rolls that add up to four (1,3; 2,2; 3,1); thus, the probability of four is 3/36 = 0.083.
x    P(x)      x    P(x)
2    1/36      8    5/36
3    2/36      9    4/36
4    3/36     10    3/36
5    4/36     11    2/36
6    5/36     12    1/36
7    6/36
c) The probabilities in (b) satisfy the two conditions for a probability distribution: 0 ≤ P(x) ≤ 1 for each x, and ΣP(x) = 1.
6.2 Dental Insurance a) The value of the payout is determined by a random phenomenon, depending on whether you need major, minor, or no dental repair over the next 5 years. b) X is discrete, with possible values $0, $100, or $1000. c) P($0) 0.35, P($100) 0.60, and P($1000) 0.05 6.3 San Francisco Giants hitting a) The probabilities give a legitimate probability distribution because each one is between 0 and 1 and the sum of all of them is 1. b) 0P 0 1P 1 2P 2 3P 3 4P 4 0 0.7429 1 0.1704 2 0.0517 3 0.0055 4 0.0295 0.4083; The expected number of bases for a random time at bat for a San Francisco
Giants player is 0.4083. The mean is the expected value of X; that is, what we expect for the average in a long run of observations. Since the mean is a long term average of values, it doesn’t have to be one of the possible values for the random variable. 6.4 Best of three a) The probability distribution is X = 2 or 3, with P(2) = 0.58 and P(3) – 1 – 0.58 = 0.42. b) 2P 2 3P 3 2 0.58 3 0.42 2.42 c)
c)
The sample space is WW, LL, WLW, WLL, LWW, LWL. P(Series ends after two games) = P(WW or LL) = 0.7² + 0.3² = 0.49 + 0.09 = 0.58
6.5 Grade distribution
a) Final grade    Probability
   4              0.20
   3              0.40
   2              0.30
   1              0.10
b) μ = 1(0.10) + 2(0.30) + 3(0.40) + 4(0.20) = 2.70; The average final grade for the class is 2.7.
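Means of discrete distributions like those in Exercises 6.3–6.5 are probability-weighted sums, so they are easy to verify with a few lines of code. A minimal sketch using the grade distribution above (not part of the original solution):

```python
def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

grades = {4: 0.20, 3: 0.40, 2: 0.30, 1: 0.10}
print(expected_value(grades))   # 2.7, matching the hand calculation
```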
6.6 Selling houses
a) The probabilities are not the same for each outcome; that is, each x value (0, 1, 2, 3 and 4) does not carry the same weight.
b) μ = 0(0.68) + 1(0.19) + 2(0.09) + 3(0.03) + 4(0.01) = 0.5; We expect the agent to sell 0.5 houses per month.
6.7 Playing the lottery
a) Your 3-digit number only matches one of the 1000 options, so P($500) = 1/1000 = 0.001.
b) x      P(x)
   0      0.999
   500    0.001
c) μ = 0(0.999) + 500(0.001) = 0.5; Your expected return on a $1.00 bet is $0.50.
d) For Pick 4, your expected return is μ = 0(0.9999) + 5000(0.0001) = 0.5, or $0.50. Since both games have the same expected return, we are indifferent as to which we play.
6.8 Roulette
a) Answers will vary.
b) Expected profit for wager (1): 350(1/38) − 10(37/38) = −$0.53
   Expected profit for wager (2): 10(18/38) − 10(20/38) = −$0.53
In this sense, the two wagers are the same and both have a negative expected return. 6.9 More Roulette The wager on 23 will have the higher standard deviation since the winnings are, on average, further from the mean for this bet. 6.10 Ideal number of children a)
The mean for females is 0(0.01) + 1(0.03) + 2(0.55) + 3(0.31) + 4(0.11) = 2.50. The mean for males is 0(0.02) + 1(0.03) + 2(0.60) + 3(0.28) + 4(0.08) = 2.39. The means for the two
distributions are quite similar. b) Although the means are similar, the responses for males tend to be slightly closer to the mean than the responses for females. Thus, males seem to hold slightly more consistent views than females about ideal family size. 6.11 Profit and the weather a) x P(x) $80,000 0.70 $50,000 0.20 $20,000 0.10 b)
P(X ≤ $50,000) = P(X = $50,000 or X = $20,000) = 0.20 + 0.10 = 0.30
c)
μ = 80,000(0.70) + 50,000(0.20) + 20,000(0.10) = 68,000; The wheat farmer’s expected profit is $68,000.
d) x      $77,000   $47,000   $37,000
   P(x)   0.70      0.20      0.10
6.11 (continued)
μ = 77,000(0.70) + 47,000(0.20) + 37,000(0.10) = 67,000; With the insurance policy, the farmer’s expected profit is $67,000. The insurance policy actually results in a lower expected profit for the farmer.
6.12 Buying on eBay
a) and b)
Sample Space    Probability
WW              (0.1)(0.2) = 0.02
WL              (0.1)(0.8) = 0.08
LW              (0.9)(0.2) = 0.18
LL              (0.9)(0.8) = 0.72
c) x      P(x)
   $50    0.02
   $30    0.08
   $20    0.18
   $0     0.72
d) μ = 50(0.02) + 30(0.08) + 20(0.18) + 0(0.72) = 7, or $7
6.13 Selling at the right price
a) x      $90    $120   $130
   P(x)   0.50   0.20   0.30
   μ = 90(0.50) + 120(0.20) + 130(0.30) = 108; The expected selling price for the sale of a drill is $108.
b) μ = 90(0.30) + 110(0.40) + 130(0.30) = 110; The expected selling price for the sale of a drill under the new pricing strategy is $110. The new strategy will result in higher profits in the long run.
6.14 Uniform distribution
a) (graph of the uniform density: height 1 over the interval from 0 to 1)
b) The mean of this distribution is 0.50.
c) There is a 0.40 probability that this random variable falls between 0.35 and 0.75.
d) There is a 0.80 probability that this random variable falls below 0.80.
120 Statistics: The Art and Science of Learning from Data, 4th edition 6.15 TV watching a) TV watching is, in theory, a continuous random variable because someone could watch exactly one hour of TV or 1.835 hours or 2.07 hours of TV. b) Histograms were used because TV watching was measured to the nearest integer. The histograms would display the frequencies of the rounded (to the nearest integer) values obtained in the sample. c) The two smooth curves would represent the approximate (based on the histograms) shape of the probability distribution for TV watching in the population if we could measure it in a continuous manner. Then, the area under the curve above an interval would represent the probability of an observation falling in that interval.
Section 6.2: Probabilities for Bell-Shaped Distributions 6.16 Probabilities in tails a) The probability that an observation is at least one standard deviation above the mean is 0.159. b) The probability that an observation is at least one standard deviation below the mean is 0.159. c) The probability that an observation is within one standard deviation of the mean is 1 – 2(0.159) = 0.682. 6.17 Probability in graph a) The observation would fall 0.67 standard deviations above the mean, and thus, would have a z-score of 0.67. Looking up this z-score in Table A, we see that this corresponds to a cumulative probability of 0.749. The probability that an observation falls above this point (in the shaded region) is 1 – 0.749 = 0.251. b) The observation would fall between 0.50 standard deviations below the mean and 0.50 standard deviations above the mean. Looking up the z-score of 0.50 in Table A, we see that this corresponds to a cumulative probability of 0.6915. The probability that an observation falls above this point is 1 – 0.6915 = 0.3085. Because of the symmetry of the normal distribution, the probability of falling below the opposite of this z-score is also 0.3085. This means that the probability of falling more than 0.5 standard deviations away from the mean is 2(0.3085) = 0.6170. Therefore, the probability of falling within 0.5 standard deviations of the mean is 1 – 0.6170 = 0.383. 6.18 Empirical rule a) The cumulative probability for one standard deviation above the mean (z-score of 1) is 0.8413, and for one standard deviation below the mean (z-score of –1) is 0.1587. The difference, 0.8413 – 0.1587 = 0.6826 (rounds to 0.68), the probability of a normally distributed random variable falling within 1 standard deviation of the mean on either side. b) Similarly, we can look up a z-score of 2 to find that its cumulative probability is 0.9772. The cumulative probability for –2 is 0.0228. 0.9772 – 0.0228 = 0.9544 (rounds to 0.95). c) Finally, we can do the same for z-scores of 3 and –3, to get cumulative probabilities of 0.9987 and 0.0013, respectively. 0.9987 – 0.0013 = 0.9974, or roughly 1.00. 6.19 Central probabilities a) If we look up 1.64 on Table A, we see that the cumulative probability is 0.9495. The cumulative probability is 0.0505 for –1.64. 0.9495 – 0.0505 = 0.899, which rounds to 0.90. b) Using similar logic, we see 0.9951 and 0.0049 on the table for 2.58 and –2.58, respectively. 0.9951 – 0.0049 = 0.9902, which rounds to 0.99. c) Finally, we can use the same logic for z-scores of 0.67 and –0.67. We find cumulative probabilities of 0.7486 and 0.2514. The difference between these is 0.7486 – 0.2514 = 0.4972, which rounds to 0.50.
6.19 (continued)
d) (graph)
6.20 z-score for given probability in tails a) First we calculate the cumulative probability for the total probability of 0.02 in the tails. We divide 0.02 by two to determine the probability in each tail, 0.01. Then subtract the probability in the top tail from 1 to get 0.99. We can look up the closest cumulative probabilities to 0.01 and 0.99 in Table A to get z-scores of –2.33 and 2.33. b) The probability more than 2.33 standard deviations above the mean equals 0.01 both because it is half of 0.02, the probability that it would fall beyond this z-score in either direction, and because it is the probability above the cumulative probability of 0.99 associated with this z-score and below the cumulative probability of 0.01 associated with this z-score. c) 2.33 standard deviations above the mean is the 99th percentile because only 1% (0.01) of the population falls above this standard deviation. 6.21 Probability in tails for given z-score a) We have to divide this probability by two to find the amount in each tail, 0.01/2 = 0.005. We then subtract this from 1.0 to determine the cumulative probability associated with this z-score, 0.995. We can look up this probability on Table A to find the z-score of 2.58. b) For both (a) and (b), we divide the probability in half, subtract from one, and look it up on Table A (a) 1.96 (b) 1.645
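The Table A lookups used in Exercises 6.16–6.21 can be reproduced with any software that provides the standard normal cumulative distribution function. A short sketch using scipy, shown here only as a check on the table values (not part of the original solutions):

```python
from scipy.stats import norm

# Empirical rule: probability within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"P(|Z| < {k}) = {norm.cdf(k) - norm.cdf(-k):.4f}")   # about 0.68, 0.95, 1.00

# Central probabilities from Exercise 6.19.
for z in (1.64, 2.58, 0.67):
    print(f"P(-{z} < Z < {z}) = {norm.cdf(z) - norm.cdf(-z):.4f}")
```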
6.22 z-score for right-tail probability a) 0.20 is associated with a cumulative probability of 1 – 0.20 = 0.80. When we look up 0.80 in Table A, we find a z-score of 0.84.
6.22 (continued)
b) (i) 0.05 is associated with a cumulative probability of 1 – 0.05 = 0.95. When we look up 0.95 in Table A, we find a z-score of 1.645. (ii) 0.005 is associated with a cumulative probability of 1 – 0.005 = 0.995. When we look up 0.995 in Table A, we find a z-score of 2.58.
6.23 z-score and central probability
a) The middle 50% means that 25% of the scores fall between the mean and each z-score (negative and positive). This corresponds to a cumulative probability of 0.75. When looked up in Table A, we find a z-score of 0.67.
b) Using the same logic as in (a), we find a cumulative probability of 0.95, and a z-score of 1.645.
c) (graph)
6.24 Female heights
z = (x − μ)/σ = (62 − 65)/3.5 = −0.86; which, from Table A, corresponds to a cumulative probability of 0.1949 (rounds to 0.19). Thus, 0.19 of adult females in North America would not be tall enough to be a flight attendant.
6.25 Blood pressure
a) z = (x − μ)/σ = (140 − 121)/16 = 1.19
b) A z-score of 1.19 corresponds, from Table A, to a cumulative probability of 0.8830. The amount above this z-score would be 1 − 0.88 = 0.12.
c) z = (x − μ)/σ = (100 − 121)/16 = −1.31; which corresponds to a cumulative probability of 0.0951. If we subtract 0.0951 from 0.8830 (the cumulative probability for 140), we get 0.79 as the probability between 100 and 140.
d) The 90th percentile is associated with a cumulative probability of 0.90. When we look up 0.90 in Table A, we find a z-score of 1.28. Using x = μ + zσ, the 90th percentile is 121 + 1.28(16) = 141.5.
6.26 Coffee Machine
a) z = (x − μ)/σ = (12 − 13)/0.6 = −1.67; which corresponds to a cumulative probability of 0.0475. Therefore, 4.75% of cups will be filled with less than 12 ounces of coffee.
b) z = (x − μ)/σ = (12.5 − 13)/0.6 = −0.83; which corresponds to a cumulative probability of 0.2033. Therefore, 1 − 0.2033 = 0.7967, or about 80% of cups will be filled with more than 12.5 ounces of coffee.
c) Since half the cups will be filled with 13 ounces of coffee or less, the percentage of cups with between 12 and 13 ounces of coffee would be 50% − 4.8% = 45.2%.
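For problems like 6.24–6.26, the explicit z-score step can be skipped by giving the software the mean and standard deviation directly. A sketch with the coffee-machine numbers (mean 13 oz, standard deviation 0.6 oz), included only as a check on the hand calculations:

```python
from scipy.stats import norm

mu, sigma = 13, 0.6   # ounces dispensed per cup
print(norm.cdf(12, mu, sigma))                             # P(X < 12), about 0.048
print(norm.sf(12.5, mu, sigma))                            # P(X > 12.5), about 0.80 (sf = 1 - cdf)
print(norm.cdf(13, mu, sigma) - norm.cdf(12, mu, sigma))   # P(12 < X < 13), about 0.45
```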
6.27 Energy use
a) z = (x − μ)/σ = (1000 − 673)/556 = 0.59; which corresponds to a cumulative probability of 0.7224. Therefore, the probability that household electricity use was greater than 1000 kilowatt-hours is 1 − 0.7224, or 0.28.
b) No, the distribution of energy use doesn’t appear to be normal, since for households with no energy use, z = (0 − 673)/556 = −1.21. If the distribution were normal, it would place substantial probability below 0, implying that some houses have negative energy use, which is impossible.
6.28 Birth weight for boys
a) z = (x − μ)/σ = (2.5 − 3.41)/0.55 = −1.65; which corresponds to a cumulative probability of 0.049. Therefore, the proportion of baby boys born with low birth weight is 0.049.
b) z = (x − μ)/σ = (1.5 − 3.41)/0.55 = −3.47
c) z = (x − μ)/σ = (4.0 − 3.41)/0.55 = 1.07; which corresponds to a cumulative probability of 0.858. Thus, the probability that a baby boy is born with a weight between 2.5 kg and 4.0 kg is 0.858 − 0.049 = 0.809, or about 80.9%.
d) z = (x − μ)/σ = (3.6 − 3.41)/0.55 = 0.345; which corresponds to a cumulative probability of 0.635. Therefore, Matteo falls at the 63.5th percentile.
e) The 96th percentile is associated with a cumulative probability of 0.96, which corresponds to a z-score of 1.75. Using x = μ + zσ, Max weighs 3.41 + 1.75(0.55) = 4.37 kilograms.
6.29 MDI
a) (i) z = (x − μ)/σ = (120 − 100)/16 = 1.25; which corresponds to a cumulative probability of 0.894. This indicates that the proportion of children with MDI of at least 120 is 1 − 0.894 = 0.106.
(ii) z = (x − μ)/σ = (80 − 100)/16 = −1.25; which corresponds to a cumulative probability of 0.106. This indicates that the proportion of children with MDI of at least 80 is 1 − 0.106 = 0.894.
b) The 99th percentile for MDI is 137.3. The z-score corresponding to the 99th percentile is 2.33. To find the value of x, we calculate x = μ + zσ = 100 + 2.33(16) = 137.3.
c) The 1st percentile for MDI is 62.7. The z-score corresponding to the 1st percentile is −2.33. To find the value of x, we calculate x = μ + zσ = 100 − 2.33(16) = 62.7.
6.30 Quartiles and outliers
a) The z-score corresponding to the lower quartile (25%) of a normal distribution is −0.67.
b) For MDI, Q1 = 89.3. The z-score corresponding to the lower quartile, the 25th percentile, is −0.67, so x = μ + zσ = 100 − 0.67(16) = 89.3. Thus 89.3 is the MDI score that marks the lowest 25% of scores. For MDI, Q3 = 110.7. The z-score corresponding to the upper quartile, the 75th percentile, is 0.67, so x = μ + zσ = 100 + 0.67(16) = 110.7. Thus 110.7 is the MDI score that marks the highest 25% of scores.
c) IQR = Q3 − Q1 = 110.7 − 89.3 = 21.4
d) 1.5(21.4) = 32.1, 89.3 − 32.1 = 57.2, and 110.7 + 32.1 = 142.8; therefore, MDI scores lower than 57 or higher than 143 would be considered potential outliers.
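Percentile questions such as 6.28(e), 6.29(b)–(c), and 6.30 run Table A in reverse; the inverse normal (quantile) function does the same. The sketch below reproduces the MDI quartiles and the 99th percentile; it is not from the manual, and its answers differ very slightly from the hand values because the table z-scores above are rounded.

```python
from scipy.stats import norm

mu, sigma = 100, 16                 # MDI mean and standard deviation
q1 = norm.ppf(0.25, mu, sigma)      # lower quartile, about 89.2
q3 = norm.ppf(0.75, mu, sigma)      # upper quartile, about 110.8
p99 = norm.ppf(0.99, mu, sigma)     # 99th percentile, about 137.2
print(q1, q3, q3 - q1, p99)         # IQR is about 21.6
```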
6.31 April precipitation
a) z = (x − μ)/σ = (8.4 − 3.6)/1.6 = 3; If the distribution were roughly normal, this would be unusually high since 99.9% of observations fall below that level.
b) z = (x − μ)/σ = (4.5 − 3.6)/1.6 = 0.563; which corresponds to a cumulative probability of 0.713. Therefore, 4.5 inches of precipitation falls at the 71.3rd percentile.
c) The given percentages are similar to the percentages within 1, 2, or 3 standard deviations for the normal distribution, so it appears the distribution of April precipitation is approximately normal.
6.32 Tall enough to ride?
a) z = (x − μ)/σ = (56 − 54.5)/4.5 = 0.33; which corresponds to a cumulative probability of 0.6293. Thus, approximately 1 − 0.6293 = 0.3707, or about 37% of 10-year-old boys are tall enough to ride the roller coaster.
b) z = (x − μ)/σ = (50 − 54.5)/4.5 = −1; which corresponds to a cumulative probability of 0.1587. Thus, approximately 1 − 0.1587, or about 84% of 10-year-old boys are tall enough to ride the smaller roller coaster.
c) 0.6293 − 0.1587 = 0.4706, or about 47%
6.33 SAT versus ACT
Joe’s SAT score: z = (600 − 500)/100 = 1.0; Kate’s ACT score: z = (25 − 21)/4.7 = 0.85
The SAT of 600 is relatively higher than the ACT of 25 because it is further from the mean in terms of the number of standard deviations, so Joe did relatively better.
6.34 Relative height
Woman: z = (x − μ)/σ = (70 − 65)/3.5 = 1.43; Man: z = (x − μ)/σ = (75 − 70)/4.0 = 1.25
The woman’s height is relatively taller than the man’s height because it is further from the mean in terms of standard deviations. This is due to the fact that men’s heights are more spread out than women’s heights, so a slightly more extreme height is more likely for a man than for a woman.
Section 6.3: Probabilities When Each Observation Has Two Possible Outcomes
6.35 Kidney transplants
a) Sample Space   Probability
   SSS            (0.1)^3
   SSF            (0.1)^2(0.9)
   SFS            (0.1)^2(0.9)
   FSS            (0.1)^2(0.9)
   SFF            (0.1)(0.9)^2
   FSF            (0.1)(0.9)^2
   FFS            (0.1)(0.9)^2
   FFF            (0.9)^3
   Compatible Donors   Probability
   3                   (0.1)^3 = 0.001
   2                   3(0.1)^2(0.9) = 0.027
   1                   3(0.1)(0.9)^2 = 0.243
   0                   (0.9)^3 = 0.729
6.35 (continued)
b) The formula for the binomial distribution is P(x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x).
P(0) = [3!/(0!(3 − 0)!)](0.1)^0(1 − 0.1)^(3−0) = 0.729, P(1) = [3!/(1!(3 − 1)!)](0.1)^1(1 − 0.1)^(3−1) = 0.243,
P(2) = [3!/(2!(3 − 2)!)](0.1)^2(1 − 0.1)^(3−2) = 0.027, P(3) = [3!/(3!(3 − 3)!)](0.1)^3(1 − 0.1)^(3−3) = 0.001
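The binomial probabilities in part (b) are easy to verify, either from the formula directly or with scipy’s binomial distribution. A small sketch for n = 3 trials with success probability p = 0.1 (not part of the original solution):

```python
from math import comb
from scipy.stats import binom

n, p = 3, 0.1
for x in range(n + 1):
    by_formula = comb(n, x) * p**x * (1 - p)**(n - x)   # n!/(x!(n-x)!) p^x (1-p)^(n-x)
    print(x, round(by_formula, 3), round(binom.pmf(x, n, p), 3))
# Both columns give 0.729, 0.243, 0.027, 0.001.
```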
6.36 Compatible donors
a) A trial consists of selecting a donor from the registry and observing whether the donor is compatible (two possible outcomes, i.e., binary). There are 3 trials.
b) Yes, for each trial there is a 10% chance that the selected donor is compatible.
c) Whether one donor is compatible does not depend on some other donor being compatible, so the trials are independent.
6.37 Symmetric binomial
a) x    P(x)
   0    0.0625
   1    0.2500
   2    0.3750
   3    0.2500
   4    0.0625
   (graph of the distribution for n = 4, p = 0.5)
b) x    P(x)
   0    0.2401
   1    0.4116
   2    0.2646
   3    0.0756
   4    0.0081
   (graph of the distribution for n = 4, p = 0.3)
c) x    P(x)
   0    0.6561
   1    0.2916
   2    0.0486
   3    0.0036
   4    0.0001
   (graph of the distribution for n = 4, p = 0.1)
d) Only the graph in (a) is symmetric. The case n = 20 and p = 0.5 would be symmetric.
e) The graph in (c) is the most heavily skewed. The case n = 4 and p = 0.01 would exhibit more skewness than the graph in (c) since even more of the mass would be centered at 0.
6.38 Number of girls in family
a) The data are binary (boy, girl), there is the same probability of success for each trial (0.49), and the trials are independent (a previous child’s sex has no bearing on the next child’s sex).
b) For this distribution, n is four, and p is 0.49.
c) The probability that the family has 2 girls and 2 boys is 0.37.
P(2) = [4!/(2!(4 − 2)!)](0.49)^2(1 − 0.49)^(4−2) = 0.37
6.39 Bidding on eBay
a) Each bid can be thought of as a trial which can result in two outcomes (win or lose). The bids are independent, and for each bid there is a constant probability (25%) of winning.
b) The probability of winning exactly two bids is 0.211.
P(2) = [4!/(2!(4 − 2)!)](0.25)^2(1 − 0.25)^(4−2) = 0.211
c) The probability of winning at most two bids is 0.949.
P(X ≤ 2) = P(0) + P(1) + P(2) = [4!/(0!(4 − 0)!)](0.25)^0(1 − 0.25)^(4−0) + [4!/(1!(4 − 1)!)](0.25)^1(1 − 0.25)^(4−1) + [4!/(2!(4 − 2)!)](0.25)^2(1 − 0.25)^(4−2) = 0.949
d) The probability of winning more than two bids is 0.051. Using the result from (c): P(X > 2) = 1 − P(X ≤ 2) = 1 − 0.949 = 0.051.
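“At most” and “more than” questions like 6.39(c) and (d) are cumulative binomial probabilities, which software computes directly. A sketch with the eBay numbers (4 bids, 25% chance of winning each), shown only as a check:

```python
from scipy.stats import binom

n, p = 4, 0.25
print(binom.pmf(2, n, p))   # P(X = 2),  about 0.211
print(binom.cdf(2, n, p))   # P(X <= 2), about 0.949
print(binom.sf(2, n, p))    # P(X > 2),  about 0.051
```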
6.40 More eBay bidding
a) This is not a binomial distribution because the probability of success is not constant.
b) This is not a binomial distribution because X does not count the number of successes.
6.41 Passing by guessing
a) The probability she answers all four questions correctly is 0.0016.
P(4) = [4!/(4!(4 − 4)!)](1/5)^4(4/5)^0 = 0.0016
b) The probability she passes the quiz is 0.0272.
P(Passes quiz) = P(3) + P(4) = [4!/(3!(4 − 3)!)](1/5)^3(4/5)^1 + [4!/(4!(4 − 4)!)](1/5)^4(4/5)^0 = 0.0256 + 0.0016 = 0.0272
6.42 NBA shooting
a) We must assume that the data are binary (which they are – free throw made or missed), that there is the same probability of success for each trial (free throw), and that the trials are independent.
b) n = 10; p = 0.90
c) (i) P(Makes all 10 free throws) = 0.349
P(10) = [10!/(10!(10 − 10)!)](0.9)^10(1 − 0.9)^(10−10) = 0.3487
(ii) P(Makes 9 free throws) = 0.387
P(9) = [10!/(9!(10 − 9)!)](0.9)^9(1 − 0.9)^(10−9) = 0.3874
(iii) P(Makes more than seven free throws) = P(8) + P(9) + P(10) = 0.1937 + 0.3874 + 0.3487 = 0.9298
P(8) = [10!/(8!(10 − 8)!)](0.9)^8(1 − 0.9)^(10−8) = 0.1937
6.43 Season performance
a) μ = np = 400(0.90) = 360, σ = √[np(1 − p)] = √[400(0.9)(0.1)] = 6
b) By the Empirical Rule, we would expect the number to fall almost certainly in the range from 360 – 3(6) = 342 to 360 + 3(6) = 378. This is the case because the probability within 3 standard deviations of the mean is close to 1.0.
c) We can calculate the proportion indicated by each end of the range: 342/400 = 0.855, and 378/400 = 0.945.
6.44 Is the die balanced?
a) n = 60; p = 1/6 = 0.1667 (which rounds to 0.167)
b) μ = np = 60(1/6) = 10, σ = √[np(1 − p)] = √[60(1/6)(5/6)] = 2.887 (which rounds to 2.89). This indicates that if we sample 60 rolls from a population in which 1/6 are 6s, we would expect that about 10 in the sample are 6s. We also would expect a spread for this sample of about 2.887.
c) We would be skeptical because 0 is well over three standard deviations from the mean.
d) P(0) = [60!/(0!(60 − 0)!)](0.1667)^0(1 − 0.1667)^(60−0) = 0.0000177
6.45 Exit poll a) This scenario satisfies the three conditions needed to use the binomial distribution because 1) the data are binary (voted for proposition or not), 2) there is the same probability of success for each trial (i.e., 0.50), and the trials are independent (the vote of one individual likely does not affect the vote of the next individual; n < 10% of the population size). n = 3000; p = 0.50.
b) μ = np = 3000(0.50) = 1500, σ = √[np(1 − p)] = √[3000(0.50)(1 − 0.50)] = 27.4
c) We would expect X to fall almost certainly between 1500 – 3(27.4) = 1418 and 1500 + 3(27.4) = 1582. d) If the exit poll had x = 1706, this would suggest that the actual value of p is higher than 0.50.
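The mean, standard deviation, and “almost certain” interval used in 6.43–6.45 follow the same recipe: μ = np, σ = √[np(1 − p)], then μ ± 3σ. A minimal sketch with the exit-poll and season-performance numbers (not from the manual):

```python
from math import sqrt

def binomial_summary(n, p):
    """Mean, standard deviation, and mean +/- 3 SD interval for a binomial count."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu, sigma, (mu - 3 * sigma, mu + 3 * sigma)

print(binomial_summary(3000, 0.50))   # (1500, about 27.4, (about 1418, about 1582))
print(binomial_summary(400, 0.90))    # (360, 6.0, (342, 378))
```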
6.46 Jury duty
a) One can assume that X has a binomial distribution because 1) the data are binary (Hispanic or not), 2) there is the same probability of success for each trial (i.e., 0.40), and 3) each trial is independent of the other trials (whom you pick for the first juror is not likely to affect whom you pick for the other jurors, and n < 10% of the population size). n = 12; p = 0.40
b) The probability that no Hispanic is selected is 0.002.
P(0) = [12!/(0!(12 − 0)!)](0.40)^0(1 − 0.40)^(12−0) = 0.002
c) If no Hispanic is selected out of a sample of size 12, this does cast doubt on whether the sampling was truly random. There is only a 0.2% chance that this would occur if the selection were done randomly.
6.47 Poor, poor, Pirates
a) μ = np = 162(0.42) = 68.04
b) Using technology: P(X ≥ 81) = 0.0242. c) It is unlikely that the probability of winning remains constant from one trial to the next since the team may use different pitchers/players for different games. It is also possible that the trials are not independent. After losing several games in a row, the players may become discouraged and not play as well in future games. 6.48 Checking guidelines a) The sample is less than 10% of the size of the population size; thus the guideline about the relative sizes of the population and the sample was satisfied. b) The binomial distribution is not likely to have a bell shape because neither the expected number of success, np, nor the expected number of failures, n(1 – p), is at least 15. Both are 5. 6.49 Class sample X would not have a binomial distribution because the trials are not independent. n also needs to be less than 10% of the population size, but is 25% of the population size. 6.50 Binomial needs fixed n a) The formula for the probabilities for each possible outcome in a binomial distribution relies on a given number of trials, n. b) The binomial applies for X. We only have n for X, not Y. n = 3, p = 0.50 6.51 Binomial assumptions a) Families tend to make decisions about church attendance together; thus, a given family member’s decision is likely to be dependent on others’ decisions. The independence assumption is not plausible. b) The 100 votes are not randomly selected from the population. Rather, they are the 100 votes in the first precinct that reports. Thus, this is not a random sample. c) Four is not less than 10% of the population size of 20. We cannot assume independence.
Chapter Problems: Practicing the Basics 6.52 Grandparents a) This refers to a discrete random variable because there can only be whole numbers of grandparents. One can’t have 1.78 grandparents. b) The probabilities satisfy the two conditions for a probability distribution because they each fall between 0 and 1, and the sum of the probabilities of all possible values is 1. c) The mean of this probability distribution is 0(0.71) 1(0.15) 2(0.09) 3(0.03) 4(0.02) 0.50. 6.53 Straight or boxed? Your expected winnings under the straight play are 500 P($500) = 500(1/1000) = $0.50 since there are 1000 possible Pick-3 lottery numbers and the number chosen must match yours exactly in order for you to win. Your expected winnings under the boxed play are 80 P($80) = 80(6/1000) = $0.48 since there are 6 possibilities containing the numbers 5, 1 and 4. Note that both plays will cost you $1 so that your expected profit is negative (–$0.50 and –$0.52, respectively) in either case! If you must play, you are slightly better off playing straight but your best option is not to play.
Chapter 6: Probability Distributions 129 6.54 Auctioning paintings a) Letting W = wins the bid, and L = loses the bid, the sample space is WW, WL, LW, LL. b) No, the probability of winning the second bid changes with the outcome of the first bid, so the events are dependent. c) Outcome Probability WW (0.3)(0.8) = 0.24 WL (0.3)(0.2) = 0.06 LW (0.7)(0.1) = 0.07 LL (0.7)(0.9) = 0.63 d) x P(x) $5000 0.24 $3000 0.06 $2000 0.07 $0 0.63
e) μ = 0(0.63) + 2000(0.07) + 3000(0.06) + 5000(0.24) = 1520, or $1520
6.55 NJ Lottery a) In this lottery, each pick is independent, because digits can occur more than once. Thus, the probability of one of your numbers occurring is 3/10; if the first does occur, the probability of your second number occurring is 2/10; if the first two occur, the probability of your third number occurring is 1/10. Thus, the probability that all three will occur is 0.006. Subtracting from 1.0, we find that the probability that you will not get all three is 0.994. The mean is $0(0.994) + $45.50(0.006) = $0.27. Winnings Probability $0 0.994 $45.50 0.006 b) The probability that your first number will occur is 1/10, as it is for the second and third. Thus, the probability that all three will occur is 0.001, and the probability that all three will not occur is 0.999. The mean is $0(0.999) + $45.50(0.001) = $0.045. Winnings Probability $0 0.999 $45.50 0.001 c) The first strategy of picking three different digits is a better strategy because you have a higher chance of winning than with the second strategy. 6.56 Are you risk averse? a) The expected outcome for Program 1 is 200. The expected outcome for Program 2 is 0.6667(0) + 0.3333(600) = 200. Thus, the expected outcomes for the two programs are the same. b) Expected deaths with Program 3 = 400 Expected deaths with Program 4 = 0.3333(0) + 0.6667(600) = 400 Thus, the expected deaths are the same for both programs. c) Programs 1 and 3 are similar because the outcomes are known; these are risk-averse strategies. Programs 2 and 4 are similar because they involve risk-taking. 6.57 Flyers’ insurance a) Money Probability $0 0.999999 $100,000 0.000001
6.57 (continued)
b) μ = 0(0.999999) + 100,000(0.000001) = 0.10, or $0.10
c)
The company is very likely to make money in the long run because the return on each $1 spent per flyer averages to $0.10. 6.58 Normal probabilities a) From Table A, we see that 0.9750 falls below a z-score of 1.96 and 0.0250 falls below –1.96. Thus, 0.9750 – 0.0250 = 0.95 would fall within 1.96 standard deviations of the mean. b) From Table A, we see that 0.9901 falls below a z-score of 2.33 and 0.0099 falls below –2.33. Since 0.9901 – 0.0099 = 0.9802 would fall within 2.33 standard deviations of the mean, 1 – 0.9802 = 0.02 would fall more than 2.33 standard deviations from the mean. 6.59 z-scores a) First, divide 0.95 in half to determine the amount between the mean and the z-score. This amount is 0.475, which means that 0.975 falls below the z-score in which we’re interested. According to Table A, this corresponds to a z-score of 1.96. b) First, divide 0.99 in half to determine the amount between the mean and the z-score. This amount is 0.495, which means that 0.995 falls below the z-score in which we’re interested. According to Table A, this corresponds to a z-score of 2.58.
6.60 z-score and tail probability a) The z-score that is less than only 1% of the values would be greater than 99% of the values. If we look up 0.99 on Table A, we see that the z-score is 2.33.
b) (i) The z-score that is above 0.90 is 1.28. (ii) The z-score that is above 0.99 is 2.33. 6.61 Quartiles a) If the interval contains 50% of a normal distribution, then there is 25% between the mean and the positive z-score. Added to 50% below the mean, 75% of the normal distribution is below the positive z-score. If we look this up in Table A, we find a z-score of 0.67.
6.61 (continued)
b) The first quartile is at the 25th percentile, and the third is at the 75th percentile. We know that the 75th percentile has a z-score of 0.67. Because the normal distribution is symmetric, the z-score at the 25th percentile must be –0.67.
c) Q1 = μ − 0.67σ and Q3 = μ + 0.67σ. The interquartile range is Q3 − Q1 = (μ + 0.67σ) − (μ − 0.67σ) = 2(0.67)σ = 1.34σ.
6.62 Boys and girls birth weight
2.5 3.41 1.645; which corresponds to a cumulative probability of 0.049, so 0.55 2.5 kg falls at the 4.9th percentile for boys. x 2.5 3.29 1.519; which corresponds to a cumulative probability of 0.064, so For girls, z 0.52 2.5 kg falls at the 6.4th percentile for girls. b) Since 2.5 kg falls at a lower percentile for boys, it is a more extreme weight for boys. Also, a z-score of –1.645 is more extreme than a z-score of –1.519, so 2.5 kg is a more extreme weight for boys. 6.63 Cholesterol
a)
For boys, z
x
240 220 0.5; This z-score corresponds to a proportion of 0.691. With 0.69 below the high 40 risk demarcation, that leaves 1 – 0.69 = 0.31 above it. 6.64 Female heights z
x
60 65 1.43; According to Table A, 0.076 of women are below this z-score; so 0.076 3.5 of women are below five feet in height. x 72 65 2.0; 0.977 of women are below this z-score, which indicates that 1 – 0.977 = b) z 3.5 0.023 of women are beyond this z-score; that is, over 6 feet in height. c) 60 and 70 are equidistant from the mean of 65. If 0.076 of women are below 60 inches, that indicates that 0.076 women are above 70 inches, leaving 0.848 of women between 60 and 70 inches. d) For North American males: x 60 70 2.5; According to Table A, 0.006 of men are below this z-score; a) z 4.0 therefore, 0.006 of men are below five feet in height. x 72 70 0.5; 0.691 of men are below this z-score, which indicates that 1 – 0.691 = b) z 4.0 0.309 of men are beyond this z-score; that is, over 6 feet. c) 70 is the mean, and 60 has a cumulative probability of 0.006 (from part a). Thus, 0.50 – 0.006 = 0.494 of men are between 60 and 70 inches. 6.65 Cloning butterflies
a)
z
a)
z
cm.
x
x
89 1.33; According to Table A, 0.092 of the butterflies have a wingspan less than 8 0.75
10 9 1.33; 0.092 of the butterflies have a wingspan wider than 10 cm. 0.75 c) From (a) and (b), 1 – 2(0.092) = 0.816 of the butterflies have wingspans between 8 and 10 cm. d) From Table A, the 90th percentile is 1.28. Using x z , 10% of the butterflies have wingspan wider than 9 + 1.28(0.75) = 9.96 cm.
b)
z
132 Statistics: The Art and Science of Learning from Data, 4th edition 6.66 Gestation times x
258 281.9 2.10; According to Table A, 0.018 fall below this z-score; thus, 0.018 of 11.4 babies would be classified as premature. 6.67 Used car prices z
x
25,000 23,800 0.27; which corresponds to a cumulative probability of 0.608. Thus, 4380 1 – 0.608 = 0.392, about 39.2% of used Audi A4s cost more than $25,000. x 22, 000 23,800 0.41; which corresponds to a cumulative probability of 0.341 and b) z 4380 18,000 23,800 z 1.32; which corresponds to a cumulative probability of 0.093. Thus, 0.341 – 4380 0.093 = 0.248, about 24.8% of used Audi A4s cost between $18,000 and $25,000. c) The z-score that corresponds to the lowest 10% is –1.2816. Using x z , 10% of used Audi 4As cost less than 23,800 – 1.2816(4380) = 18,187, or about $18,187. 6.68 Used car deals a) According to Table A, a z-score of 1.5 indicates that 0.93 of the curve falls below this point. Therefore, 1 – 0.93 = 0.067, or 6.7% of used Audi 4As will be highlighted. b) 6.7% of used Honda Civics will be highlighted for the same reasons as in (a). Since the z-score is given, the mean and standard deviation are not needed. c) The percentage will most likely be larger because more cars will be priced at the lower end if distribution is right-skewed. 6.69 Global warming The third quartile indicates the point that 75% of observations fall below, and corresponds to a z-score of 0.67. We can solve for algebraically in the equation below; the mean weekly gas usage, therefore, would have to be reduced to 16.0. 0.67 20 / 6 20 4.02 16
a)
z
6.70 Fast food profits x
0 140 1.75; Table A indicates that 0.04 of observations fall below this z-score. The 80 probability that the restaurant loses money on a given day is 0.04. b) Probability = (1 – 0.04)7 = (0.96)7 = 0.75. For this calculation to be valid, each day’s take must be independent of every other day’s take. 6.71 Metric height a) The distribution would still be normal because it is the same distribution in another metric; just as converting to z-scores would give us the same distribution, converting to centimeters would give us the same distribution. b) In centimeters, the mean would be (72)(2.54) = 182.88 cm, and the standard deviation would be (4)(2.54) = 10.16 cm. x 200 182.88 1.69; On Table A, this z-score corresponds to a proportion of 0.95. Thus, c) z 10.16 0.95 fall below this height, and 1 –0.95 = 0.05 fall above it. 6.72 Manufacturing tennis balls
a)
a)
z
56.7 57.6 x 58.5 57.6 3.0; 58.5 grams: z 3.0 0.3 0.3 The proportion of observations below a z-score of 3.0 is 0.9987 and below –3.0 is 0.0013. 0.9987 – 0.0013 = 0.997 (essentially 1.0) is the probability that a ball manufactured with this machine satisfies the rules.
56.7 grams: z
x
Chapter 6: Probability Distributions 133 6.72 (continued) 56.7 57.6 x 558.5 57.6 1.5; 58.5 grams: z 1.5 0.6 0.6 The proportion of observations below a z-score of 1.5 is 0.933 and below –1.5 is 0.067. 0.933 – 0.067 = 0.866 (rounds to 0.87), which is the probability that a ball manufactured with this machine satisfies the rules. 6.73 Bride’s choice of surname If each marriage is independent of the others (as they would be in a random sample), the probability would be (0.9)(0.9)(0.9)(0.9) = 0.6561. 6.74 ESP a) Sample space: (SSS, SSF, SFS, SFF, FSS, FSF, FFS, FFF), where S = guessing correctly. The probability of each is (0.5)(0.5)(0.5) = 0.125. For each number of successes, the probability equals the number of outcomes with that number of successes multiplied by 0.125. For example, there is one outcome with zero successes, giving a probability of 0.125. x P(x) 0 0.125 1 0.375 2 0.375 3 0.125 b) The probability distribution will be the same as in (a). 3! 3! P 0 0.50 1 0.530 0.125, P 1 0.51 1 0.531 0.375, 0! 3 0! 1! 3 1!
b) 56.7 grams: z
P 2
x
3! 3! 0.52 1 0.532 0.375, P 3 0.53 1 0.533 0.125 2! 3 2 ! 3! 3 3!
6.75 More ESP a) For the analogy with coin flipping, heads would represent a correct guess and tails would represent an incorrect guess on any trial. b) It is sensible to assume the same probability for each trial because with random guessing, she has a constant probability of 0.2 of guessing the right number, for each of the three trials. c) It is sensible to assume independent trials because trials are not affected by the outcomes of previous trials. 6.76 Yale babies Yes, since the probability that 14 or more infants would choose the helpful figure randomly is so small (0.002), this can be considered evidence that the infants are exhibiting a preference for the helpful object. P (14) P (15) P(16) 14! 15! 16! 0.514 1 0.51614 0.515 (0.5)1 0.516 (0.5)0 14!16 14 ! 15!1! 16!0! 0.002
6.77 Weather a) If R = rain and D = dry, the possibilities for the weekend’s weather are RR, RD, DR and DD, all equally likely because P(rain) = 0.5. Because it rains on at least one of the days for 3 out of the 4 possibilities, the probability is 0.75. 2! b) 1 P(0) 1 0.50 (1 0.5)20 1 0.25 0.75 0!(2 0)!
134 Statistics: The Art and Science of Learning from Data, 4th edition 6.78 Dating success a) There are three conditions to use the binomial distribution. 1) The outcomes must be binary (e.g., yes or no as the only two options). 2) He must have the same probability of success for each call. 3) Each trial (phone call) must be independent of the others. b) If he calls the same girl five times, her responses are not likely to be independent of her other responses! c) n = 5; p = 0.60; np 5(0.60) 3 6.79 Canadian lottery a) Since P(winning) = 1/1,000,000, np 0.000001n.
b) For the mean to be 1, you’d have to play a million times. 1,000,000(0.000001) = 1. c) If you play a million times, the mean is 1. You can calculate your winnings as follows: 1(100,000) = $100,000. However, you would have spent a million to play, and so the profit is 100,000 – 1,000,000 = –900,000, or a loss of $900,000. 6.80 Likes on Facebook a) The data are binary (like message, not like message), there is the same probability of success (liking the message) for each trial, and the trials are independent (one user’s response will not affect another’s). n = 15,000,000, p = 0.00001 b)
np 15, 000, 000(0.00001) 150, np (1 p ) 15,000,000(0.00001) 1 0.00001 12.25
Using x z , 150 – 3(12.25) = 113.25 and 150 + 3(12.25) = 186.75, so the interval is [113.25, 186.75]. d) Users may not have the same probability of liking the message and responses may not be independent. (If you see your friend liking the message, you may be more inclined to like it, too, if you received it.) 6.81 Likes with online credit c)
a)
15, 000, 000(0.0005) 7500, np(1 p ) 15,000, 000(0.0005) 1 0.0005 86.6
b) Almost all values will fall within 3 standard deviations of the mean. Using x z , 7500 – 3(86.6) = 7240 and 7500 + 3(86.6) = 7760, so the interval is [7240, 7760]. c) Since with high probability at most 7760 users will like the message, the company should set aside $7760. (Answers may vary.) 6.82 Which distribution for sales? a) Since (i) each trial has two outcomes, (ii) the probability of a successful phone call is the same for each call (2%), and (iii) the trials are independent (the outcome of one call does not affect the outcome of another call) the distribution is binomial. b) c)
np 200(0.02) 4.0, np(1 p ) 200(0.02)(0.98) 2.0; The expected number of successful calls out of 200 is 4. 200 P 0 0.98200 0.02 0 0.018 0
Chapter Problems: Concepts and Investigations 6.83 Best of five a) Let A = team A wins (P(A) = 0.5) and B = Team B wins (P(B) = 0.5). Outcome x Probability Outcome x Probability AAA 3 (0.5)3 = 0.125 BAAA 4 (0.5)4 = 0.0625 AABA 4 (0.5)4 = 0.0625 BAABA 5 (0.5)5 = 0.03125 5 AABBA 5 (0.5) = 0.03125 BAABB 5 (0.5)5 = 0.03125 5 AABBB 5 (0.5) = 0.03125 BABAA 5 (0.5)5 = 0.03125 ABAA 4 (0.5)4 = 0.0625 BABAB 5 (0.5)5 = 0.03125 5 ABABA 5 (0.5) = 0.03125 BAAB 4 (0.5)4 = 0.0625 ABABB 5 (0.5)5 = 0.03125 BBAAA 5 (0.5)5 = 0.03125 5 ABBAA 5 (0.5) = 0.03125 BBAAB 5 (0.5)5 = 0.03125 5 ABBAB 5 (0.5) = 0.03125 BBAB 4 (0.5)4 = 0.0625 ABBB 4 (0.5)4 = 0.0625 BBB 3 (0.5)3 = 0.125 b) See table in (a) c) See table in (a) d) P(3) = 2(0.125) = 0.25, P(4) = 6(0.0625) = 0.375, P(5) = 12(0.031250) = 0.375 6.84 More best of five a) 3(0.25) 4(0.375) 5(0.375) 4.125
b) Let A = team A wins (P(A) = 0.8) and B = Team B wins (P(B) = 0.2). Outcome x Probability Outcome x 3 AAA 3 (0.8) = 0.512 BAAA 4 AABA 4 (0.8)3(0.2) = 0.1024 BAABA 5 3 2 AABBA 5 (0.8) (0.2) = 0.02048 BAABB 5 AABBB 5 (0.8)2(0.2)3 = 0.00512 BABAA 5 ABAA 4 (0.8)3(0.2) = 0.1024 BABAB 5 ABABA 5 (0.8)3(0.2)2 = 0.02048 BAAB 4 ABABB 5 (0.8)2(0.2)3 = 0.00512 BBAAA 5 ABBAA 5 (0.8)3(0.2)2 = 0.02048 BBAAB 5 ABBAB 5 (0.8)2(0.2)3 = 0.00512 BBAB 4 ABBB 4 (0.8)(0.2)3 = 0.0064 BBB 3
Probability (0.8)3(0.2) = 0.1024 (0.8)3(0.2)2 = 0.02048 (0.8)2(0.2)3 = 0.00512 (0.8)3(0.2)2 = 0.02048 (0.8)2(0.2)3 = 0.00512 (0.8)(0.2)3 = 0.0064 (0.8)3(0.2)2 = 0.02048 (0.8)2(0.2)3 = 0.00512 (0.8)(0.2)3 = 0.0064 (0.2)3 = 0.008
P(3) = 0.512 + 0.008 = 0.52, P(4) = 3(0.1024) + 3(0.0064) = 0.3264, P(5) = 6(0.02048) + 6(0.00512) = 0.1536; μ = 3(0.52) + 4(0.3264) + 5(0.1536) = 3.63
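The series-length distributions in 6.83 and 6.84 can also be obtained by enumerating game sequences in code, which avoids writing out the sample space by hand. A sketch (not from the manual) that works for any per-game win probability for team A:

```python
from itertools import product

def series_length_dist(p_a, wins_needed=3, n_games=5):
    """P(series length = k) for a best-of-five series; p_a = P(team A wins a game)."""
    dist = {}
    for seq in product("AB", repeat=n_games):
        # Probability of this full hypothetical 5-game sequence.
        prob = p_a ** seq.count("A") * (1 - p_a) ** seq.count("B")
        # The series actually stops once a team reaches 3 wins; record that length.
        a = b = 0
        for i, g in enumerate(seq, start=1):
            a += g == "A"
            b += g == "B"
            if a == wins_needed or b == wins_needed:
                dist[i] = dist.get(i, 0) + prob
                break
    return dist

print(series_length_dist(0.5))   # lengths 3, 4, 5 with probabilities 0.25, 0.375, 0.375
print(series_length_dist(0.8))   # about 0.52, 0.3264, 0.1536
```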
136 Statistics: The Art and Science of Learning from Data, 4th edition 6.88 Airline overbooking
a)
Using the binomial distribution with n = 190 and p = 0.8, μ = np = 190(0.8) = 152 and σ = √(npq) = √[190(0.8)(0.2)] = 5.5136. Since both np = 152 and n(1 – p) = 38 are greater than 15, the normal distribution can be used to approximate the binomial distribution. Thus, we expect nearly all counts to fall within 3 standard deviations of the mean, i.e., approximately between 152 – 3(5.5) = 135 and 152 + 3(5.5) = 169 seats. Since 170 falls outside of this range, the airline is justified in selling more tickets for the flight.
b) There might be situations where large groups buy tickets and travel together. This would violate the assumption of independent trials.
6.89 Babies in China
Based on the original birth rate and a sample size of 2000, we’d have μ = np = 2000(0.49) = 980 and
np(1 p ) 2000(0.49)(1 0.49) 22.4; we’d expect most samples to have the number of female babies fall within three standard deviations of the mean for a range of 980 – 3(22.4) = 913 to 980 + 3(22.4) = 1047. Because only 800 females were born in this particular year, we can conclude that the current probability of a female birth in this town does seem to be less than it used to be. This relies on an assumption of independence which might not hold true in a given town. 6.90 True or false? IQR for normal distribution False, as show in Exercise 6.30, Q1 and Q3 represent z-scores of –0.67 and 0.67, so the IQR would cover the interval from 0.67 , which is smaller that the interval of given in this exercise. 6.91 Multiple choice: Guess answers The best answer is (d). 6.92 Multiple choice: Terrorist coincidence? The best answer is (a). ♦♦6.93 SAT and ethnic groups 1200 1500 1; which corresponds to a cumulative probability of 0.16. The 300 proportion not admitted for ethnic group A is 0.16. 1200 1350 Group B: z 0.75; which corresponds to a cumulative probability of 0.23. 200 b) (0.16+0.23) = 0.39 so 0.23/0.39 = 0.59 of those not admitted are from ethnic group B. 600 1500 c) Group A: z 3; which corresponds to a cumulative probability of 0.0013. 300 600 1350 Group B: z 3.75; which corresponds to a cumulative probability of 0.00009. 200 Now, (0.00009)/(0.0013+0.00009) = 0.065 are from group B. ♦♦6.94 College acceptance a) Since the scores are close to continuous and likely to follow a bell-shaped distribution with most of the scores falling within 3 standard deviations of the mean, the appropriate distribution is the normal. b) The top 20% corresponds to a cumulative probability of 80%, which yields a z-score of 0.842. Using x z , the corresponding ACT score is 21.1 + 0.842(5.3) = 25.56.
a)
Group A: z
c)
5 0 P(0)= 0.2 (0.8)5 0.328 0
Chapter 6: Probability Distributions 137 ♦♦6.95 Standard deviation of a discrete probability distribution a)
4 5.82152 0.1250 7 5.82152 0.3215 1.02734 1.01
b) The standard deviation will be smaller since the observations are centered at 4 games, with very few series lasting longer than that. For the 99% chance scenario, 0.194. 6.96 Mean and standard deviation for a binary random variable
a)
0 1 p 1 p p
b)
2
0 p 2 1 p 1 p 2 p
p 2 p3 p3 2 p 2 p
p p2
p 1 p
♦♦6.97 Linear transformations: Taxes and fees a)
The distribution of the new prices will still be normal with μ = 1.06(23,800) = $25,228 and σ = 1.06(4380) = $4643.
b) The distribution of the new prices will still be normal with μ = 23,800 + 199 = $23,999 and σ = $4380.
♦♦6.98 Binomial probabilities
If we want to see the probability of multiple independent events occurring, we have to multiply the probabilities of each of those events. If the events have the same probabilities, we can use p^x rather than multiply p by itself x times. We also want to know the probability of the event NOT occurring multiple times. The probability of the event not occurring is 1 – p, and the number of times we’re interested in is all the times that the event doesn’t occur, or n – x. The logic for using (1 – p)^(n−x) is the same as the logic for using p^x.
♦♦6.99 Waiting time for doubles
a) The probability of rolling doubles is 1/6: whatever you get on the second die has a 1/6 chance of matching the first. To have doubles occur first on the second roll, you’d have to have no match on the first roll; there’s a 5/6 chance of that. You’d then have to have a match on the second roll, and there’s a 1/6 chance of that. The probability of both occurring is (5/6)(1/6). For no doubles until the third roll, both the first and the second rolls would not match, a 5/6 chance for each, followed by doubles on the third, a 1/6 chance. The probability of all three events is (5/6)²(1/6).
b) By the logic in (a), P(4) = (5/6)(5/6)(5/6)(1/6), or (5/6)³(1/6). By extension, we could calculate P(x) for any x by (5/6)^(x−1)(1/6).
6.100 Geometric mean
We would expect another 1/0.024 = 41.7 losing seasons, so their next win is expected in 2042 (given the assumptions hold, in particular, the constant probability of a win being 0.42).
138 Statistics: The Art and Science of Learning from Data, 4th edition 6.101 World series in baseball
a)
μ = 4(13/68) + 5(12/68) + 6(15/68) + 7(28/68) = 5.853
b) (histograms of the 68-year probabilities and the theoretical probabilities: probability vs. number of games, 4.0 to 7.0)
c)
The 68-year distribution is much more left-skewed, with a mode at 7 games. Among the reasons for the difference between the 68-year distribution and the theoretical distribution is the likely fact that win probabilities are not constant (home-field advantage). You would expect 68(0.3125) = 21.25 series to be decided in seven games.
Chapter Problems: Student Activities 6.102 Best of seven games The results will be different each time this exercise is conducted.
Chapter 7: Sampling Distributions
Section 7.1: How Sample Proportions Vary Around the Population Proportion 7.1 Simulating the exit poll a) Answers for the sample proportion will vary. Although the population proportion is 0.53, it is unlikely that exactly 53 out of 100 polled voters will vote yes. Sample proportions close to 0.53 will be more likely than those further from 0.53. b) Simulations will vary. The graph of the sample proportions should be close to bell-shaped and centered around 0.53. c)
The predicted standard deviation is √[0.53(1 − 0.53)/100] = 0.0499.
d) The graph should look similar but shifted so that it is centered around 0.70. The standard deviation changes to 0.70(1 0.70) 100 0.0458. 7.2 Simulate condo solicitations a) Answers for the sample proportion will vary. Although the population proportion is 0.10, it is unlikely that exactly 10% of the customers in the sample will accept the offer. Sample proportions close to 0.10 will be more likely than those further from 0.10. b) Simulations will vary. The graph of the 100 sample proportion values should be approximately bell shaped and centered around 0.10. Yes, as the simulation shows, almost all sample proportions will fall between 0.05 and 0.15. 7.3 Condo sample distribution The mean is p = 0.10 and the standard deviation is
p(1 p ) n
0.10(1 0.10) 0.0134. 500
b) The mean is p = 0.10 and the standard deviation is
p(1 p ) n
0.10(1 0.10) 0.0067. 2000
a)
p(1 p ) 0.10(1 0.10) 0.0268. n 125 As the sample size gets larger (i.e., from 500 to 2000), the standard deviation decreases (in fact, it is only half as large). As the sample size gets smaller (i.e., from 500 to 125), the standard deviation increases (in fact, it becomes twice as large). 7.4 iPhone apps a) The population distribution is the set of all x values for the population of people who own an iPhone, 25% of which are 1 (individual has the given app) and 75% of which are 0 (individual does not have the app). P(X = 1) = 0.25, P(X = 0) = 0.75 b) Don’t have the app: P(X = 0) = 0.40, Have the app: P(X = 1) = 0.60 c) The mean is p = 0.25.
c)
The mean is p = 0.10 and the standard deviation is
p(1 p ) 0.25(1 0.25) 0.061. n 50 e) The standard deviation describes how much the sample proportion varies from one sample (of size 50) to the next. Since the sampling distribution is approximately normal, most of the sample proportions will fall within 3(0.061) = 0.183 of the mean (0.25). 7.5 Other scenario for exit poll a) The binary variable is whether the voter voted for Whitman (1) or not (0). For each observation, P(1) = 0.409 and P(0) = 0.591.
d) The standard deviation is
b) The mean is p = 0.409 and the standard deviation is
p(1 p ) n
0.409(1 0.409) = 0.0079. 3889
7.6 Exit poll and n a) The interval of values within the sample proportion will almost certainly fall within three standard deviations of the mean: 0.409 – 3(0.008) = 0.385 to 0.409 + 3(0.008) = 0.433. b) Since 0.424 falls within the interval calculated in (a), it is one of the plausible values.
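The standard-deviation formula used throughout this section, √[p(1 − p)/n], and the “± 3 standard deviations” plausible range are easy to script. A minimal sketch with the exit-poll values from Exercises 7.5–7.6 (p = 0.409, n = 3889), included only as a check:

```python
from math import sqrt

def proportion_sampling_sd(p, n):
    """Standard deviation of the sample proportion for a random sample of size n."""
    return sqrt(p * (1 - p) / n)

p, n = 0.409, 3889
sd = proportion_sampling_sd(p, n)
print(round(sd, 4))                                   # about 0.0079
print(round(p - 3 * sd, 3), round(p + 3 * sd, 3))     # plausible range, about 0.385 to 0.433
```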
140 Statistics: The Art and Science of Learning from Data, 4th edition 7.7 Random variability in baseball p(1 p ) 0.30(1 0.30) 0.0205. The n 500 shape is approximately normal with a mean of 0.30 and standard deviation of 0.0205. 0.32 0.30 0.28 0.30 1.0 and 1.0; Since both 0.32 and 0.28 are about one standard deviation b) 0.020 0.020 from the mean, they would not be considered unusual for this player’s year-end batting average. 7.8 Relative frequency of heads a) Sample Proportion Probability 0 0.50 1 0.50 b) Sample Proportion Probability 0 0.25 ½ 0.50 1 0.25 c) Sample Proportion Probability 0 0.125 1/3 0.375 2/3 0.375 1 0.125 d) The distribution begins to take a bell shape. 7.9 Experimental medication a) P(0 out of 3) = 27/64 = 0.4219, P(1 out of 3) = 27/64 = 0.4219, P(2 out of 3) = 9/64 = 0.1406, P(3 out of 3) 1/64 = 0.0156
a)
The mean is equal to p or 0.30. The standard deviation is
b) n = 3: The mean is p = 0.25, the standard deviation is
p(1 p ) n
0.25(1 0.25) 0.25. 3
c)
p(1 p ) n
0.25(1 0.25) 0.137. 10
n = 10: The mean is p = 0.25, the standard deviation is
p(1 p ) 0.25(1 0.25) 0.043. n 100 The mean does not change as n increases but the standard deviation decreases. 7.10 Effect of n on sample proportion
n = 100: The mean is p = 0.25, the standard deviation is
a)
(i) The standard deviation is
p(1 p ) n
0.50(1 0.50) 0.05. 100
p(1 p ) 0.50(1 0.50) 0.016. n 1000 b) The sample proportion is likely to fall within three standard deviation of the mean. (i) With a mean of 0.50 and a standard deviation of 0.05, this would be between 0.50 – 3(0.05) = 0.35 and 0.50 + 3(0.05) = 0.65. (ii) With a mean of 0.50 and a standard deviation of 0.016, this would be between 0.50 –3(0.016) = 0.45 and 0.50 + 3(0.016) = 0.55.
(ii) The standard deviation is
Chapter 7: Sampling Distributions 141 7.10 (continued) c) When the sample size is larger, the standard deviation is smaller (a reflection of the fact that a larger representative sample is likely to be more accurate than a smaller representative sample). Three standard deviation from a larger sample, therefore, will be smaller than three standard deviation from a smaller sample. The interval will be smaller, an indication of a more precise estimate of the population proportion. 7.11 Syracuse full-time students a) The population distribution (see graph below left) is based on the x values of the 14,201 students, 95.1% of which are 1s and 4.9% of which are 0s, so P(0) = 0.049, P(1) = 0.951. Population Distribution
(bar graphs of the population distribution and the data distribution: proportion of part-time (0) and full-time (1) students)
b) The data distribution (see graph above right) is based on the x values in the sample of size 350, 330 of which are 1s and 20 of which are 0s, so the data distribution has P(0) = 20/350 = 0.0571 and P(1) = 330/350 = 0.9429. c)
The mean is p = 0.951 and the standard deviation is
p(1 p ) n 0.951(1 0.951) 350 0.0115.
The sampling distribution represents the probability distribution of the sample proportion of full-time students in a random sample of 100 students. In this case, the sampling distribution is bell shaped and centered at 0.951. d) The population and sampling distribution graphs will look the same. Data distribution graphs will vary, since they are based on a different sample. 7.12 Gender distributions a) We can set up a binary random variable by assigning a number, such as 1, to women, and another number, such as 0, to men. b) Using 1 for woman, P(1) = 0.60, P(0) = 0.40. (See graph below left.) Population Distribution
(bar graphs of the population distribution and the sample distribution: proportion of men (0) and women (1))
142 Statistics: The Art and Science of Learning from Data, 4th edition 7.12 (continued) c) Sample proportions of 0.52 for x = 1 (women) and 0.48 for x = 0 (men). (See graph above right.) d) The sampling distribution of the sample proportion of women in the sample is approximately a normal p(1 p ) 0.60(1 0.60) distribution. Its mean is 0.60 and its standard deviation is 0.069. n 50 e) The population and sampling distribution graphs will look the same. Data distribution graphs will vary, since they are based on a different sample. 7.13 Shapes of distributions a) With random sampling, the data distribution would more closely resemble the population distribution. In both cases, the distributions are based on individual scores, not means of samples. b) For a population proportion of 0.9 or 0.95, with n = 30, the sampling distribution does not look bell shaped but rather skewed to the left. c) It is inappropriate because we expect the bell shape to occur only when np and n(1 – p) are larger than 15, which is not the case here. 7.14 Student government election p(1 p ) 0.55(1 0.55) 0.035. n 200 b) It is reasonable to assume a normal distribution since np = 200(0.55) = 110 and n(1 – p) = 200(0.45) = 90 are both greater than 15. 0.50 0.55 c) For a sample proportion of 0.50, z 1.429, which corresponds to a cumulative 0.035 probability of 0.0776. Therefore, the probability she will not get a majority is 7.66%.
a)
The mean is p = 0.55 and the standard deviation is
d) For n = 1000, the standard deviation is
p(1 p ) n
0.55(1 0.55) 0.001573. For a sample 1000
0.50 0.55 3.179, which corresponds to a cumulative probability of 0.01573 0.00074. Therefore, the probability she will not get a majority is 0.074%.
proportion of 0.50, z
Section 7.2: How Sample Means Vary Around the Population Mean 7.15 Simulate taking midterms a) Although the mean of the population distribution is 70, because of random variability (as expressed by the population standard deviation), the sample mean of a sample of size 12 will sometimes be smaller or larger than 70. b) The simulated sampling distribution is bell shaped and centered at 70. Almost all sample means fall between a score of about 60 to 80, or within three sample standard deviations of 10 12 2.89. c)
It is still bell shaped and centered at 70 but now has smaller variability. Now, almost all sample means fall within about 65 to 75, or within three sample standard deviations of 5 12 1.44.
7.16 Education of the self-employed a) The random variable X is years of education of self-employed U.S. citizens. b) The mean is 13.6 and the standard deviation is n 3.0 100 0.30. The mean of the sampling distribution is 13.6, the same as the mean of the population. The standard deviation is smaller than the standard deviation of the population. It reflects the variability among all possible samples of a given size taken from this population. c)
The mean is 13.6 and the standard deviation is σ/√n = 3.0/√400 = 0.15. As n increases, the mean of the sampling distribution stays the same, but the standard deviation gets smaller.
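The σ/√n rule in Exercise 7.16 can also be seen by simulation: draw many samples, compute each sample mean, and look at the standard deviation of those means. The sketch below is not from the manual; it draws from a normal population purely for illustration, using the education example’s values μ = 13.6 and σ = 3.0.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 13.6, 3.0, 400

# Theoretical standard deviation of the sample mean.
print(sigma / np.sqrt(n))          # 0.15

# Simulation: 10,000 sample means from samples of size n.
samples = rng.normal(mu, sigma, size=(10_000, n))
sample_means = samples.mean(axis=1)
print(sample_means.mean().round(2), sample_means.std().round(3))  # about 13.6 and 0.15
```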
Chapter 7: Sampling Distributions 143 7.17 Rolling one die a) (i)
(ii) Sample Distribution of Mean of Two Die 0.20
0.18
0.18
0.16
0.16
0.14
0.14 Probability
Probability
Probability Distribution for One Die 0.20
0.12 0.10 0.08
0.12 0.10 0.08
0.06
0.06
0.04
0.04
0.02
0.02
0.00
1
2
3 4 Outcome
5
6
0.00
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
Mean
b) (i) The mean for n = 2 is 3.50 and the standard deviation is σ/√n = 1.71/√2 = 1.21.
(ii) The mean for n = 30 is 3.50 and the standard deviation is σ/√n = 1.71/√30 = 0.31.
As n increases, the sampling distribution becomes more normal in shape and less variable.

7.18 Playing roulette
a)
x     P(x)
0     37/38 = 0.973684
35    1/38 = 0.026316
b) The mean would be the same as the population mean of 0.921 and the standard deviation would be σ/√n = 5.603/√5040 = 0.079.
c) For a sample mean of $1, z = (1 – 0.921)/0.079 = 1, which corresponds to a cumulative probability of 0.8413. The probability of winning at least $1 is 1 – 0.8413 = 0.1587, or about 16%.

7.19 Simulate rolling dice
a) Judging by the histogram, the simulated sampling distribution is not bell shaped but, rather, has a triangular shape.
b) Values will vary. One run of the simulation yielded a mean of 3.49 and a standard deviation of 1.21 for the simulated sampling distribution of the 10,000 sample means. These are very close to the theoretical mean and standard deviation of the sampling distribution, which are 3.5 and σ/√n = 1.71/√2 = 1.21.
c) With n = 30, the histogram representing the sampling distribution is now bell shaped and shows a much smaller standard deviation compared to the case with n = 2.

7.20 Canada lottery
a) The mean would be 0.10 and the standard deviation would be σ/√n = 100/√1,000,000 = 0.10.
b) The z-score at $1 would be z = (1.00 – 0.10)/0.10 = 9.0. When we look this up in Table A, this z-score is not on the table. We find that an area of essentially 0 of the curve falls above this z-score. Thus, it's exceedingly unlikely that Joe's average winnings would exceed $1.
7.21 Shared family phone plan
a) Since σ (2.1) is almost as large as μ (2.8) and 0 is a lower bound that is only (0 – 2.8)/2.1 = –1.33, or 1.33 standard deviations below the mean, the distribution is right skewed.
b) With a random sample, the data distribution picks up the characteristics from the population distribution, so we anticipate it is right skewed.
c) With n = 45, the sampling distribution of the sample mean will be bell shaped by the central limit theorem.

7.22 Dropped from plan
a) The population distribution will be right-skewed with a mean of 2.8 minutes and a standard deviation of 2.1 minutes.
b) The data distribution has a mean of 3.40 and a standard deviation of 2.9.
c) The sampling distribution of the sample mean has a mean of 2.8 and a standard deviation of σ/√n = 2.1/√45 = 0.31.
d) For 3.4 minutes, z = (3.4 – 2.8)/0.31 = 1.936. Since the distribution is bell shaped, this is not unusually high since it is within 2 standard deviations of the mean.
e) For 3.5 minutes, z = (3.5 – 2.8)/0.31 = 2.258, which corresponds to a cumulative probability of 0.9881. The probability of the sample mean being larger than 3.5 is 1 – 0.9881 = 0.0119, or 1.2%.

7.23 Restaurant profit?
a) The mean would be 8.20 and the standard deviation would be σ/√n = 3/√100 = 0.30.
b) z = (8.95 – 8.20)/0.3 = 2.5; Table A tells us that 0.994 is the cumulative proportion below this z-score. Thus, the probability that the restaurant makes a profit that day is 0.994.

7.24 Survey accuracy
a) The standard deviation of the sample mean is σ/√n = 15/√100 = 1.5. z = 2/1.5 = 1.33; the proportion below this z-score is 0.91 (2 is used in the numerator of the z-score equation because it would be the difference between a given sample mean and the population mean no matter what these means actually were), and the proportion below –1.33 is 0.09. Thus, 0.91 – 0.09 = 0.82 is the proportion that falls within two years of the mean age.
b) If the population standard deviation were 10, the standard deviation of the sample mean would be 10/√100 = 1. Thus, the sampling distribution of the mean is less variable and the probability of being within two years of the mean age is higher.

7.25 Blood pressure
a) The mean would be 130 and the standard deviation would be σ/√n = 6/√3 = 3.46.
b) If the probability distribution of the blood pressure reading is normal, the sampling distribution of the sample mean would also be normal. If the population distribution is approximately normal, then the sampling distribution is approximately normal for all sample sizes.
c) z = (140 – 130)/3.46 = 2.89; Table A tells us that 0.998 of the sampling distribution falls below this z-score. The probability that the sample mean exceeds 140 is 1.00 – 0.998 = 0.002.

7.26 Household size
a) The random variable X, the number of people in a household consisting of both family and non-family members, is quantitative.
b) The center of the population distribution is the mean of the population, 4.43. The spread is the standard deviation of the population, 2.02. The distribution is skewed slightly to the right due to some very large households.
c) The center of the data distribution is 4.2. The spread is the standard deviation of 1.9. The shape is similar to the population distribution.
7.26 (continued)
d) The center of the sampling distribution of the sample mean is 4.43. The spread, or standard deviation, is σ/√n = 2.02/√225 = 0.135. The shape is approximately normal due to the Central Limit Theorem.

7.27 Average monthly sales
a) The center of the population distribution is the mean of the population, $74,550. The spread is the standard deviation of the population, $19,872. The shape is probably skewed to the right due to a few very large incomes.
b) The center of the data distribution is $75,207. The spread is the standard deviation of $18,901. The shape is similar to the shape of the population distribution.
c) The center of the sampling distribution of the sample mean is $74,550. The spread, or standard deviation, is σ/√n = 19,872/√100 = 1987.2. With a sample size of 100, this would have approximately a normal distribution due to the Central Limit Theorem.
d) It would not be unusual to observe an individual who earns more than $100,000 since this is only (100,000 – 74,550)/19,872 = 1.28 standard deviations above the mean. It would be highly unusual to observe a sample mean over $100,000 for a random sample of 100 people because this is more than (100,000 – 74,550)/1987.2 = 12.8 standard deviations above the mean.

7.28 Central limit theorem for uniform population
a) The shape is triangular (you may see this better by decreasing the bin size), although it is already pretty close to bell shaped.
b) The shape is already close to bell shaped. Compared to n = 2, the variability is smaller.
c) The shape is bell shaped, and the variability is much smaller compared to n = 2. Most sample means fall between 0.35 and 0.65. Results are similar to the ones indicated in Figure 7.11.
d) As the sample size increases, the sampling distribution increasingly resembles a normal distribution.

7.29 CLT for skewed population
a) The shape is decisively right skewed.
b) The shape is still right skewed but begins to resemble a bell shape. Compared to n = 2, the variability is a bit smaller because the extent of the right tail is smaller.
c) The shape is bell shaped, and the variability is much smaller compared to n = 2. Most sample means fall between about 5 and 10. Results are similar to the ones indicated in Figure 7.11.
d) As the sample size increases, the sampling distribution increasingly resembles a normal distribution.

7.30 Sampling distribution for normal population
The sampling distribution is normal even for n = 2. If the population distribution is approximately normal, the sampling distribution is approximately normal for all sample sizes.
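The pattern described in 7.28 through 7.30 can also be reproduced without the text's applet. The sketch below uses an exponential population purely for illustration; the population and its assumed mean of 8 are not the app's exact settings:

```python
import numpy as np

rng = np.random.default_rng(7)
pop_mean = 8.0  # assumed mean of the illustrative skewed (exponential) population

for n in (2, 10, 30):
    # 10,000 simulated sample means for each sample size
    means = rng.exponential(scale=pop_mean, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:2d}: mean of sample means = {means.mean():.2f}, "
          f"sd = {means.std():.2f}, theory = {pop_mean / np.sqrt(n):.2f}")

# A histogram of `means` loses its right skew and approaches a bell shape as n grows.
```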
Chapter Problems: Practicing the Basics

7.31 Exam performance
a) The mean is p = 0.70 and the standard deviation is √(p(1 – p)/n) = √(0.70(1 – 0.70)/50) = 0.0648.
b) Since n = 50, by the Central Limit Theorem we would expect the shape of the sampling distribution to be approximately normal with mean = 0.70 and standard deviation = 0.0648.
c) The z-score for 0.60 is z = (0.60 – 0.70)/0.0648 = –1.54, which corresponds to a cumulative probability of 0.06. It would not be too surprising to only get 60% of the answers correct.
7.32 Blue eyes
a) The mean is 1/6 = 0.1667, and the standard deviation is √(p(1 – p)/n) = √(0.1667(1 – 0.1667)/100) = 0.0373.
b) Yes, this would be a surprising result. The z-score for 0.50 is z = (0.50 – 0.1667)/0.0373 = 8.9. It would be very unusual to obtain a sample proportion that falls 8.9 standard deviations above the population proportion.
c) The population distribution is the set of 0s and 1s describing whether an American has blue eyes (1) or does not (0). The population distribution consists of roughly 17% 1s and 83% 0s. The sample distribution is the set of 50 0s and 50 1s describing whether the students in your class have blue eyes or not. The sampling distribution of the sample proportion is the probability distribution of the sample proportion. It has a mean of 0.1667 and a standard deviation of √(p(1 – p)/n) = √(0.1667(1 – 0.1667)/100) = 0.0373.

7.33 Alzheimer's
a) Since np = 200(1/9) = 22.2 and n(1 – p) = 200(8/9) = 177.8, which are both greater than 15, the shape of the sampling distribution is approximately normal with a mean of p = 1/9. For n = 800, the number of successes and failures will be even larger. For a sample size of 200, the standard deviation is √(p(1 – p)/n) = √((1/9)(1 – 1/9)/200) = 0.0222.
b) For a sample size of 800, the standard deviation is √(p(1 – p)/n) = √((1/9)(1 – 1/9)/800) = 0.0111.

7.34 Basketball shooting
a) The mean is p = 0.45 and the standard deviation is √(p(1 – p)/n) = √(0.45(1 – 0.45)/12) = 0.144.
b) Since (3/12 – 0.45)/0.144 = –1.39, this game's result is 1.39 standard deviations below the mean.
c) If the population proportion is 0.45, it would not be that unusual to obtain a sample proportion of 3/12 = 0.25 since this value is only 1.39 standard deviations below the mean.

7.35 Defective chips
a) The shape of the sampling distribution is approximately normal with a mean of p = 0.04 and a standard deviation of √(p(1 – p)/n) = √(0.04(1 – 0.04)/500) = 0.0088.
b) Since np = 500(0.04) = 20 and n(1 – p) = 500(0.96) = 480, which are both greater than 15, the shape of the sampling distribution is approximately normal.
c) The z-score for 5% is z = (0.05 – 0.04)/0.0088 = 1.136, which corresponds to a cumulative probability of 0.8721. The probability of a shipment being returned is 1 – 0.8721 = 0.1279, or 12.8%.

7.36 Returning shipment
a) Since np = 380(0.04) = 15.2 and n(1 – p) = 380(0.96) = 364.8, which are both greater than 15, the shape of the sampling distribution is approximately normal with a mean of p = 0.04 and a standard deviation of √(p(1 – p)/n) = √(0.04(1 – 0.04)/380) = 0.010. The z-score for 5% is z = (0.05 – 0.04)/0.010 = 1, which corresponds to a cumulative probability of 0.8413. The probability of a shipment being returned is 1 – 0.8413 = 0.1587, or about 15.9%.
b) Since np = 380(0.06) = 22.8 and n(1 – p) = 380(0.94) = 357.2, which are both greater than 15, the shape of the sampling distribution is approximately normal with a mean of p = 0.06 and a standard deviation of √(p(1 – p)/n) = √(0.06(1 – 0.06)/380) = 0.012. The z-score for 5% is z = (0.05 – 0.06)/0.012 = –0.833, which corresponds to a cumulative probability of 0.2023. The probability of a shipment being returned is 1 – 0.2023 = 0.7977, or 79.8%.
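The return probabilities in 7.35 and 7.36 all follow the same normal-approximation recipe for a sample proportion. A brief sketch follows; the helper function is illustrative, with the values taken from the exercises, and small differences from the answers above come only from rounding the standard error:

```python
from math import sqrt
from scipy.stats import norm

def prob_returned(p, n, cutoff=0.05):
    """P(sample proportion of defectives >= cutoff) under the normal approximation."""
    se = sqrt(p * (1 - p) / n)
    return norm.sf(cutoff, loc=p, scale=se)  # upper-tail probability

print(prob_returned(0.04, 500))  # about 0.13, as in 7.35(c)
print(prob_returned(0.04, 380))  # about 0.16, as in 7.36(a)
print(prob_returned(0.06, 380))  # about 0.79, as in 7.36(b)
```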
7.37 Aunt Erma's restaurant
a) The population distribution has a mean of $900 and a standard deviation of $300.
b) The data distribution has a mean of $980 and a standard deviation of $276. The standard deviation of the data distribution describes the spread of the daily sales values for this past week.
c) The mean of the sampling distribution of the sample mean is $900; the standard deviation is σ/√n = 300/√7 = 113.4 dollars. The standard deviation describes the spread of the sample means based on samples of seven daily sales.

7.38 Home runs
a) No. X is discrete, positive, and likely skewed to the right given the relative sizes of the mean and standard deviation.
b) Since n = 162 is greater than 30, the shape of the sampling distribution of the mean number of home runs the team will hit in its 162 games is approximately normal with a mean of 1.0 and a standard deviation of σ/√n = 1/√162 = 0.0786.
c) The z-score for 1.50 is z = (1.5 – 1)/0.0786 = 6.36. The probability of exceeding 1.5 home runs per game is practically 0.

7.39 Physicians' assistants
a) The mean would be $84,396. The standard deviation would be σ/√n = 21,975/√100 = $2197.50. The sampling distribution would have a bell shape because the sample size is greater than 30.
b) z = (80,000 – 84,396)/2197.5 = –2.00
c) The z-scores are –1.82 and 1.82, which enclose a probability of 0.93.

7.40 Bank machine withdrawals
The standard deviation of the sample mean is σ/√n = 50/√100 = 5; $10 is two standard deviations, so 0.9545 of the sample means will fall within two standard deviations ($10) of the mean.

7.41 PDI
a) z = (90 – 100)/15 = –0.67
b) z = (90 – 100)/(15/√225) = –10
c) For an individual PDI value, 90 is only 0.67 standard deviations below the mean and is therefore not surprising. However, it would be unusual for the mean of a sample of size 225 to be 10 standard deviations below the mean of its sampling distribution.

7.42 Number of sex partners
a) X does not likely have a normal distribution because the standard deviation is as big as the mean, an indication of skew. In fact, for the lowest possible value of 0, z = (0 – 1)/1 = –1, so it is only one standard deviation below the mean, an indication of right skew.
b) The sampling distribution would have a mean of 1.0 and a standard deviation of σ/√n = 1.0/√100 = 0.1. Because the sample size is greater than 30, the sampling distribution approximates a normal curve even though the population is not normally distributed.

♦♦7.43 Using control charts to assess quality
a) For a normal distribution, nearly every sample mean will fall within three standard deviations of the mean of the sampling distribution. Thus, there is a probability of 0.003 that this process will indicate a problem where none exists.
7.43 (continued)
b) The Empirical Rule says that 95% of the sampling distribution falls within two standard deviations of the mean. The probability of falsely indicating a problem would then be 5% if we used two standard deviations.
c) (i) The chance that any one sample mean would fall above the mean is 0.50. Thus, the probability that the next 9 means in a row will all fall above the mean is
P(9) = [9!/(9!(9 – 9)!)](0.50)^9(1 – 0.50)^(9 – 9) = 0.002.
(ii) In this scenario, it does not matter where the first observation falls, only that each succeeding observation falls on the same side as the first. No matter where the first falls, the chance that the next will be on the same side of the mean is 0.50. The probability is twice the probability found in (i), 2(0.002) = 0.004.

7.44 Too little or too much cola?
a) With a sample size of 4, the standard deviation would be σ/√n = 4/√4 = 2. The target line would be at the mean of 500, with upper and lower control limits at 506 and 494, respectively (each three standard deviations from the mean).
b) The standard deviation is now σ/√n = 6/√4 = 3. The z-score at the lower control limit would be z = (494 – 491)/3 = 1. The proportion falling below this z-score would be 0.84. The z-score at the upper control limit would be z = (506 – 491)/3 = 5. The proportion falling above this z-score is essentially 0. Thus, the probability that the next value plotted on the control chart indicates a problem with the process is 0.84. We assume a normally distributed population in making these calculations because we do not have a sample size of more than 30.
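The control-chart arithmetic in 7.44 can be reproduced in a few lines; the values 500, 4, 491, 6, and n = 4 are taken from the exercise, and the script is only a sketch of the calculation above:

```python
from math import sqrt
from scipy.stats import norm

n = 4
target, sigma_in_control = 500, 4
se = sigma_in_control / sqrt(n)                   # 2
lower, upper = target - 3 * se, target + 3 * se   # 494 and 506

# Part b: the process has drifted to mean 491 with standard deviation 6.
new_mean, new_sigma = 491, 6
se_new = new_sigma / sqrt(n)                      # 3
p_signal = norm.cdf(lower, new_mean, se_new) + norm.sf(upper, new_mean, se_new)
print(lower, upper, p_signal)                     # probability of a signal is about 0.84
```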
Chapter Problems: Concepts and Investigations

7.45 CLT for custom population
Answers will vary. However, as n increases, the sampling distribution will become more bell shaped and have less variability around the mean, no matter what distribution the student chose.

7.46 What is a sampling distribution?
Each time a poll is conducted, there will be a sample proportion that is calculated (How many of the 1000 polled Canadians think the prime minister is doing a good job?). The distribution of sample proportions is called a sampling distribution. The sample proportions fluctuate from one poll to the next around the population proportion value, with the degree of spread around the population proportion being smaller when the sample size is larger.

7.47 What good is a standard deviation?
The standard deviation of the sampling distribution describes how closely a sample statistic will be to the parameter it is designed to estimate, in this case how close the sample proportion will be to the unknown population proportion.

7.48 Purpose of sampling distribution
This standard deviation tells us how close a typical sample proportion is to the population proportion.
7.49 Sampling distribution for small and large n
a)
Sample   Number Who Prefer Pizza A   Proportion Who Prefer Pizza A
AAA      3                           1
AAD      2                           2/3
ADA      2                           2/3
DAA      2                           2/3
ADD      1                           1/3
DAD      1                           1/3
DDA      1                           1/3
DDD      0                           0
b) If the proportion of the population who favors Aunt Erma's pizza is 0.50, each of the eight outcomes listed in (a) is equally likely. Thus, the probability of obtaining a sample proportion of 0 is 1/8, a sample proportion of 1/3 has a (1 + 1 + 1)/8 = 3/8 chance, a sample proportion of 2/3 has a (1 + 1 + 1)/8 = 3/8 chance, and a sample proportion of 1 has a 1/8 chance. This describes the sampling distribution of the sample proportion for n = 3 with p = 0.50.
c) For p = 0.5 and n = 50, the mean of the sampling distribution is 0.5 and the standard deviation is √(p(1 – p)/n) = √(0.5(1 – 0.5)/50) = 0.0707.

7.50 Sampling distribution via the binomial
a) Let X = the number of people, out of 3, that prefer pizza A.
P(X = 1) = C(3, 1)(0.5)^1(0.5)^2 = 0.375 = 3/8
b) P(X = 0) = C(3, 0)(0.5)^0(0.5)^3 = 0.125 = 1/8, P(X = 2) = C(3, 2)(0.5)^2(0.5)^1 = 0.375 = 3/8,
P(X = 3) = C(3, 3)(0.5)^3(0.5)^0 = 0.125 = 1/8
c) Let X = the number of people, out of 4, that prefer pizza A.
P(X = 0) = C(4, 0)(0.5)^0(0.5)^4 = 0.0625 = 1/16
P(X = 1) = C(4, 1)(0.5)^1(0.5)^3 = 0.25 = 1/4
P(X = 2) = C(4, 2)(0.5)^2(0.5)^2 = 0.375 = 3/8
P(X = 3) = C(4, 3)(0.5)^3(0.5)^1 = 0.25 = 1/4
P(X = 4) = C(4, 4)(0.5)^4(0.5)^0 = 0.0625 = 1/16
[Graph: probability distribution of X for n = 4, with P(x) on the vertical axis (0 to 0.40) and bars of height 1/16, 1/4, 3/8, 1/4, and 1/16 at x = 0, 1, 2, 3, 4.]

7.51 Pizza preference with p = 0.6
a) Let X = the number of people, out of 3, that prefer pizza A.
P(X = 0) = C(3, 0)(0.6)^0(0.4)^3 = 0.064, P(X = 1) = C(3, 1)(0.6)^1(0.4)^2 = 0.288,
P(X = 2) = C(3, 2)(0.6)^2(0.4)^1 = 0.432, P(X = 3) = C(3, 3)(0.6)^3(0.4)^0 = 0.216
7.51 (continued)
b) The mean number of people preferring pizza A is np = 100(0.6) = 60.
c) The proportion of people preferring pizza A is 60/100 = 0.60.

7.52 Simulating pizza preference with p = 0.5
a) The mean of the sampling distribution is 0(1/8) + (1/3)(3/8) + (2/3)(3/8) + 1(1/8) = 1/2 = 0.5.
b) Values will vary. Yes, one run of the simulation yielded a mean of 0.496 and a standard deviation of 0.289 for the 10,000 simulated sample proportions, almost identical to the theoretical values.

7.53 Simulating pizza preference with p = 0.6
Values will vary. Yes, one run of the simulation yielded a mean of 0.60 and a standard deviation of 0.282 for the 10,000 simulated sample proportions. These are almost identical to the theoretical mean and standard deviation of the sampling distribution, which are 0.60 and 0.283.

7.54 Winning at roulette
a) In order to win at least $100, the average winnings per spin must be at least $100/40 = $2.50.
b) To win at least $100, you must win on at least 25 of the spins (note that if you win 25 of the spins, you lose 15 of the spins, so that your total winnings are 25($10) + 15(–$10) = $100). The sampling distribution of the sample proportion of winning spins has a mean of 18/38 and a standard deviation of √(p(1 – p)/n) = √((18/38)(1 – 18/38)/40) = 0.0789. Winning at least 25/40 = 5/8 of the time has a z-score of z = (5/8 – 18/38)/0.0789 = 1.92, which corresponds to a cumulative probability of 0.9726. The probability of winning at least $100 is 1 – 0.9726 = 0.0274, or about 2.74%.
c) You must win at least 25 of the spins.
d) P(X ≥ 25) = P(X = 25) + P(X = 26) + … + P(X = 40) = 0.0392. The estimate using the normal approximation is about 0.0118 smaller.

7.55 True or false
False; as the sample size increases, the denominator of the equation for the standard deviation increases. A larger denominator leads to a smaller solution. This reflects the fact that a larger sample size is likely to lead to a more accurate estimate of the population mean.

7.56 Multiple choice: Standard deviation
The best answer is (b).

7.57 Multiple choice: CLT
The best answer is (c).

7.58 Multiple choice: Sampling distribution of sample proportion
The best answer is (c). The mean of the sampling distribution is p = 0.5 and the standard deviation is √(p(1 – p)/n) = √(0.5(1 – 0.5)/150) = 0.0408, and 95% of sample proportions should be within two standard deviations of the mean.

7.59 Multiple choice: Sampling distribution
The best answer is (a).

♦♦7.60 Sample = population
a) If we sampled everyone in the population, the sample mean would always be the same as the population mean, and we would not have any variability of sample means.
b) The sampling distribution would look exactly like the population distribution with a sample size of one.

♦♦7.61 Standard deviation of a proportion
The standard deviation is σ/√n = √(p(1 – p))/√n = √(p(1 – p)/n).
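Both probabilities quoted in 7.54 can be computed directly. The sketch below is illustrative only; 18/38, 40 spins, and the cutoff of 25 wins come from the exercise:

```python
from math import sqrt
from scipy.stats import binom, norm

p_win, n_spins = 18 / 38, 40

exact = binom.sf(24, n_spins, p_win)              # P(at least 25 wins), about 0.039

se = sqrt(p_win * (1 - p_win) / n_spins)          # about 0.0789
approx = norm.sf(25 / 40, loc=p_win, scale=se)    # normal approximation, about 0.027

print(exact, approx, exact - approx)              # the approximation is about 0.012 too small
```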
♦♦7.62 Finite populations
a) The standard deviation is (σ/√n)√((N – n)/(N – 1)) = (σ/√n)√((30,000 – 300)/(30,000 – 1)) = (σ/√n)√(29,700/29,999) = 0.995(σ/√n), essentially the same as σ/√n.
b) The standard deviation is (σ/√n)√((30,000 – 30,000)/(30,000 – 1)) = (σ/√n)(0) = 0.
Chapter Problems: Student Activities

7.63 Simulate a sampling distribution
a) Answers will vary.
b) Answers will vary.
c) You would expect the mean to be close to the population mean of 196.
d) You would expect a standard deviation of about σ/√n = 57.4/√9 = 19.1, which is the standard deviation of the sampling distribution when n = 9.

7.64 Coin tossing distributions
a) The mean is 0(0.5) + 1(0.5) = 0.5.
x    P(x)
0    0.5
1    0.5
b)
x    P(x)
0    0.4
1    0.6
c) The results will be different each time this exercise is conducted but should be roughly bell-shaped around 0.5.
d) If we performed the experiment in (c) an indefinitely large number of times, we'd expect to get a mean that's very close to the population proportion, and we'd expect the standard error to approach the standard deviation we'd calculate using the formula √(p(1 – p)/n). The population proportion is 0.5 and the standard error is 0.158.

7.65 Sample versus sampling
a) The histograms will be different each time this exercise is conducted, but we should have a positively skewed distribution.
b) This is a sampling distribution, and with ten coins in each sample, the distribution should be closer to a normal distribution than that in (a). It should also be less spread out than the distribution in (a). This exercise illustrates the floor effect (no coin can have a score below 0, but the scores can go quite high). It also illustrates the Central Limit Theorem in that the increase in sample size makes the distribution somewhat closer to a normal distribution.
Section 8.1: Point and Interval Estimates of Population Parameters

8.1 Health care
a) This study will estimate the population proportion who have health insurance and the mean dollar amount spent on health insurance by the population.
b) The sample proportion and the sample mean can be used to estimate these parameters.

8.2 Video on demand
a) This study will estimate the proportion of U.S. adults watching content time-shifted and the mean amount of hours spent watching over the Internet.
b) The proportion in the sample watching content time-shifted and the mean number of hours watching over the Internet of those sampled can be used to estimate these parameters.
c) The sample mean number of hours watched online is the unbiased estimator of the mean number of hours watched online for the entire population of U.S. adults. The sampling distribution of the mean number of hours watched online is centered at the true mean number of hours watched online in the population.

8.3 Projecting winning candidate
a) The point estimate is 54.8%, or 0.548.
b) The interval estimate is given by the point estimate plus or minus the margin of error: 0.548 – 0.03 = 0.518 to 0.548 + 0.03 = 0.578.
c) A point estimate is one specific number, such as a proportion. An interval estimate is a range of numbers.

8.4 Believe in hell?
The point estimate is 698/1327 = 0.526, or 52.6%.

8.5 Government spying
a) The point estimate is the proportion of the sample: 900/1000 = 0.90, or 90%.
b) With a probability of 95%, the point estimate of 0.9 falls within a distance of 0.045 of the actual proportion of German citizens who find it unacceptable.

8.6 Game apps
a) The sample point estimate is the mean of these responses, (1.09 + 4.99 + 1.99 + 1.99 + 2.99)/5 = $2.61.
b) With a probability of 95%, the point estimate of $2.61 falls within a distance of $1.85 of the actual mean fee charged for paid games in the app store.

8.7 Nutrient effect on growth rate
a) The point estimate is the mean of the heights of the six tomato plants, 62.2.
b) The interval would be: 62.2 – 4.9 = 57.3 mm to 62.2 + 4.9 = 67.1 mm. It includes all heights within one margin of error on either side of the mean.
c) A point estimate alone may be highly inaccurate, especially with a small sample. An interval estimate gives us a sense of the accuracy of the point estimate.

8.8 Believe in heaven?
a) The margin of error for a 95% confidence interval would be 1.96 times the standard deviation of 0.01, which is 0.02. It is very likely that the population proportion is no more than 0.02 lower or 0.02 higher than the reported sample proportion.
b) The 95% confidence interval includes all points within the margin of error of the mean. Lower endpoint: 0.85 – 0.02 = 0.83; upper endpoint: 0.85 + 0.02 = 0.87. The confidence interval goes from 0.83 to 0.87. This is the interval containing the most believable values for the parameter.
8.9 Feel lonely often?
a) This solution uses the 1972–2010 Cumulative data file.
Response   Percentage
0          54.0
1          13.9
2          8.6
3          6.6
4          4.1
5          3.0
6          1.4
7          8.4
The mean is 1.5 and the standard deviation is 2.21. Most respondents said that they were never lonely, but on the average subjects were lonely 1.5 days a week.
b) The standard deviation of the sample mean, 0.06, refers to the standard deviation of the sampling distribution for samples of size 1450.

8.10 CI for loneliness
The confidence interval would range from 1.5 – 0.12 = 1.38 to 1.5 + 0.12 = 1.62. This is the range that includes the most believable values for the population mean.

8.11 Newspaper article
a) The article chosen, and the responses to (a), will be different for each student.
b) The specific interpretation will depend on the article chosen.
Section 8.2: Constructing a Confidence Interval to Estimate a Population Proportion

8.12 Putin
For a 95% confidence interval, multiply 1.96, the z-score for a 95% confidence interval, by the standard error. The margin of error is 1.96√(p̂(1 – p̂)/n) = 1.96√(0.83(1 – 0.83)/2000) = 0.016, or 1.6%.

8.13 Flu shot
a) The point estimate of the proportion of the population who were victims would be 24/3900 = 0.00615.
b) The standard error would be √(p̂(1 – p̂)/n) = √(0.00615(1 – 0.00615)/3900) = 0.00125.
c) The margin of error would be (1.96)(0.00125) = 0.00245.
d) The 95% confidence interval would include all proportions within one margin of error of the mean proportion of 0.00615, namely 0.00615 – 0.00245 = 0.0037 to 0.00615 + 0.00245 = 0.0086, or (0.0037, 0.0086). We are 95% confident that the proportion of people receiving the flu shot but still developing the flu is between 0.37% and 0.86%.
e) Yes. The upper limit of the confidence interval is 0.86%, which is less than 1%.

8.14 How green are you?
a) The point estimate is 344/1170 = 0.294.
b) The standard error is √(p̂(1 – p̂)/n) = √(0.294(1 – 0.294)/1170) = 0.013. The margin of error is (1.96)(0.013) = 0.025.
c) 0.294 – 0.025 = 0.27 and 0.294 + 0.025 = 0.32. The numbers represent the most believable values for the population proportion.
d) We must assume that the data are obtained randomly, and that a large enough sample size is used so that the number of successes and the number of failures both are greater than 15. Both seem to hold true in this case.

8.15 Make industry help environment?
a) The data must be obtained randomly, which the text assures us we can assume for the GSS. We also must assume that the number of successes and the number of failures both are greater than 15, also true in this case.
Chapter 8: Statistical Inference: Confidence Intervals 155 8.15 (continued) b) The point estimate is 1403/1497 = 0.937. The standard error is pˆ (1 pˆ ) / n 0.9372(1 0.9372) 1497 0.006. 0.937 – (1.96)(0.006) = 0.925 0.937 + (1.96)(0.006) = 0.949 We can be 95% confident that the proportion of the population who believe it should be the government’s responsibility to impose strict environmental laws is between 0.937 – 0.925 = 0.925 and 0.937 + 0.949 = 0.949. Since the lower limit is above 50%, we can conclude that a majority of the population would answer yes. 8.16 Favor death penalty a) We can obtain the value reported under “Sample p” by dividing the number of those in favor by the total number of respondents, 1183/1824. b) We can be 95% confident that the proportion of the population who are in favor of the death penalty is between 0.626665 and 0.670484, or rounding, (0.627, 0.670). c) 95% confidence refers to a probability that applies to the confidence interval method. If we use this method over and over for numerous samples, in the long run we make correct inferences (that is, the confidence interval contains the parameter) 95% of the time. d) We can conclude that more than half of all American adults were in favor because all the values in the confidence interval are above 0.50. 8.17 Oppose death penalty If 0.649 are in favor, then 1 – 0.649 = 0.351 are opposed. We can figure the margin of error by subtracting the proportion who favor the death penalty from one of the limits of the confidence interval. The margin of error, 0.649 – 0.627 = 0.022, can then be added to and subtracted from the proportion who are opposed to get a confidence interval of 0.351 – 0.022 = 0.329 to 0.351 + 0.021 = 0.373. These are also 1 minus the endpoints of the interval in the previous exercise. 8.18 Stem cell research a) The “Sample p”, 1521/2113 = 0.720, is the proportion of all respondents who believe that stem cell research has merit. The “95% CI” is the 95% confidence interval. We can be 95% confident that the population proportion falls between 0.701 and 0.739. b) The margin of error is (0.7390 0.7007)/2 = 0.019. 8.19 z-score and confidence level a) 1.645 b) 2.33 c) 3.29 8.20 Believe in ghosts We can be 95% confident that the population proportion of adults who believe in ghosts is between 0.40 and 0.44. 8.21 Stem cell research and religion The inference applies to the population of adults describing themselves as Republicans. We can be 95% confident that the population proportion of adult Republicans who believe that stem cell research has merit is between 0.55 and 0.61. 8.22 Fear of breast cancer a)
The sample proportion is 0.61 and the standard error is
pˆ (1 pˆ ) / n 0.61(1 0.61) 1000 0.015.
The 90% confidence is pˆ z.05 ( se) 0.51 1.645(0.015), or (0.585, 0.635). We can be 95% confident that the population proportion falls within this range. b) For the inference to be valid, the data must be obtained randomly. In addition, the number of successes and the number of failures both must be greater than 15, which they are.
8.23 Chicken breast
a) The sample proportion is 207/316 = 0.655. The standard error is √(p̂(1 – p̂)/n) = √(0.655(1 – 0.655)/316) = 0.0267. The 99% confidence interval is p̂ ± z.005(se) = 0.655 ± 2.58(0.0267), or (0.586, 0.724). At the 99% confidence level, a range of plausible values for the population proportion of chicken breasts that contain E. coli is 0.586 to 0.724. We can conclude that the population proportion exceeds 50% because 50% is below the lowest believable value of the confidence interval.
b) The 95% confidence interval would be narrower because the margin of error will decrease. The z-score will be 1.96 compared to 2.58 for the 99% confidence interval.

8.24 Same-sex marriage
The sample proportion is 0.51. The standard error is √(p̂(1 – p̂)/n) = √(0.51(1 – 0.51)/1504) = 0.0129. The 95% confidence interval is p̂ ± z.025(se) = 0.51 ± 1.96(0.0129), or (0.485, 0.535). At the 95% confidence level, a range of plausible values for the population proportion of people that support same-sex marriage is 0.485 to 0.535. We cannot conclude that the population proportion exceeds 50% because 50% is not below the lowest believable value of the confidence interval.

8.25 Exit poll predictions
a) The sample proportion is 660/1400 = 0.471. The standard error is √(p̂(1 – p̂)/n) = √(0.471(1 – 0.471)/1400) = 0.013. The 95% confidence interval is p̂ ± z.025(se) = 0.471 ± 1.96(0.013), or (0.446, 0.496). We could predict the winner because 0.50 falls outside of the confidence interval. It does not appear that the Democrat received more than half of the votes.
b) The 99% confidence interval is p̂ ± z.005(se) = 0.471 ± 2.58(0.013), or (0.437, 0.506). We now cannot predict a winner because it is plausible that the Democrat received more than half of the votes. The more confident we are, the wider the confidence interval.

8.26 Exit poll with smaller sample
a) The sample proportion is the same, 0.471. The standard error is √(p̂(1 – p̂)/n) = √(0.471(1 – 0.471)/140) = 0.042. The 95% confidence interval is p̂ ± z.025(se) = 0.471 ± 1.96(0.042), or (0.389, 0.553). We cannot predict the winner because 0.50 falls within this interval. It is possible that the Democrat received more than half of the votes.
b) Larger sample sizes increase the denominator, making the standard error smaller. A smaller standard error leads to smaller margins of error and narrower confidence intervals.

8.27 Simulating confidence intervals
The results of the simulation will be different each time it is conducted. The percentages we'd expect would be 95% and 99%, but the actual values may differ a bit because of sampling variability.

8.28 Simulating confidence intervals with poor coverage
a) The results of the simulation will be different each time it is conducted, but around 15% would fail to contain the true value.
b) We would expect 5% not to contain the true value given that we're using a 95% confidence interval. This suggests at least one requirement for calculating the 95% confidence interval has not been met.
c) The results of the simulation will be different each time it is conducted, but around 15% would still fail to contain the true value.
d) The results of the simulation will be different each time it is conducted, but the distribution will be right-skewed, suggesting the central limit theorem does not apply.
Section 8.3: Constructing a Confidence Interval to Estimate a Population Mean

8.29 Females' ideal number of children
a) The point estimate of the population mean is 2.56.
b) The standard error of the sample mean is s/√n = 0.84/√590 = 0.035.
c) We're 95% confident that the population mean falls between 2.49 and 2.62.
d) It is not plausible that the population mean is 2 because it falls outside of the confidence interval.

8.30 Males' ideal number of children
a) The point estimate of the population mean is 2.51 and the standard error of the sample mean is s/√n = 0.87/√530 = 0.038.
b) We can be 95% confident that the population mean falls between 2.43 and 2.59. If random samples of size 530 were repeatedly drawn under the same conditions and 95% confidence intervals were constructed for each sample, the proportion of these intervals that would contain the population mean would be about 0.95.
c) No. The sample means and standard deviations are very similar, and the two intervals almost overlap completely.

8.31 Using t table
a) 2.776
b) 2.145
c) 2.977

8.32 Anorexia in teenage girls
a) [Dotplot of Weight Change, in pounds, ranging from about –4 to 20.]
b) The mean and standard deviation can be verified on MINITAB, or other technology.
c) The standard error can be verified on MINITAB.
d) df = n – 1 = 17 – 1 = 16; a 95% confidence interval uses the t-score equal to 2.120 because that is the t-score for 16 degrees of freedom and a confidence interval of 95% (.025 beyond the t-score on either side).
e) The margin of error would be (2.120)(1.74) = 3.7; therefore, the confidence interval would be 7.29 – 3.7 = 3.6 to 7.29 + 3.7 = 11.0. The true mean weight change is likely positive because no negative scores fall in the confidence interval. The true mean weight change also could be very small, because the lower endpoint of 3.6 pounds is near 0.
8.33 Talk time on smartphones
a) [Dotplot of Talk Time, in minutes, ranging from about 300 to 1050.] The shape of the distribution is right-skewed. The assumptions are a random sample (fulfilled) and an approximately normal distribution (questionable because of the right skew, but the t-interval is robust to deviations from normality). The outlier at 1050 might make the validity of the results questionable.
b) (i) IQR = 650 – 420 = 230; 1.5 × IQR = (1.5)(230) = 345.
Q1 – 1.5 × IQR = 420 – 345 = 75 and Q3 + 1.5 × IQR = 650 + 345 = 995.
There is one potential outlier according to this criterion: 1050.
(ii) x̄ – 3s = 553 – 3(227) = –128 and x̄ + 3s = 553 + 3(227) = 1234, so there are no potential outliers according to this criterion.
c) The 95% confidence interval is x̄ ± t.025(se) = 553 ± 2.179(227/√13), or (416, 690). We can be 95% confident that the population mean talk time is between 416 and 690 minutes.
d) With 1050 removed: x̄ = 512, s = 178, df = 11. The 95% confidence interval is x̄ ± t.025(se) = 512 ± 2.201(178/√12), or (399, 625). The new confidence interval is narrower than the one using all of the data and centered at a lower value (512 instead of 553).

8.34 Heights of seedlings
a) From MINITAB:
Variable          N   Mean    StDev   SE Mean   95% CI
Seedling height   6   62.20   4.71    1.92      (57.26, 67.14)
b) We could increase the sample size or decrease the confidence level for the confidence interval.
c) From MINITAB:
Variable          N   Mean    StDev   SE Mean   99% CI
Seedling height   6   62.20   4.71    1.92      (54.45, 69.95)
It is wider because we have more assurance of a correct inference by using a higher confidence level, which gives a larger t-value for the margin of error.
d) The assumptions in (a) are that the data are produced randomly and that the population distribution is approximately normal. The former is more important than the latter. This method is robust in terms of the normal population assumption.
8.35 Buy it now
a) The assumptions are a random sample (fulfilled) and an approximately normal distribution of the Buy It Now price. The dotplot and box plot suggest a bell-shaped distribution except for an outlier indicated by the box plot that might make results questionable. However, its z-score is (675 – 630.8)/20.47 = 2.2, so it is within three standard deviations of the mean.
b) The standard error is s/√n = 20.47/√9 = 6.82.
c) We are 95% confident that the population mean Buy It Now price of the iPhone 5s on eBay is between $615 and $647.
d) Yes, all plausible values for the mean Buy It Now price are larger than the plausible values for the mean closing price of auctions.
e) No, the interval computed without $675 still lies above (569, 599).

8.36 Time spent on e-mail
a) The margin of error is t.025(s/√n) = 1.96(13.05/√1050) = 0.79. The margin of error can also be found using the confidence interval given in the exercise; (7.680 – 6.100)/2 = 0.79.
b) We have 95% confidence that the mean number of hours spent on e-mail per week is between 6.6 and 7.1 hours.
c) There is no concern because the sample size is large (n = 1050). The central limit theorem applies, and the sampling distribution of the sample mean is approximately normal.

8.37 Grandmas using e-mail
a) The mean is 3.07, the standard deviation is 3.38, and the standard error is s/√n = 3.38/√14 = 0.90.
b) The 90% confidence interval is x̄ ± t.05(se) = 3.07 ± 1.771(0.90), or (1.47, 4.67). We are 90% confident that the population mean number of hours per week spent sending and answering e-mail for women of at least age 80 is between 1.5 and 4.7 hours.
c) Many women of age 80 or older will spend 0 or 1 hour per week on e-mail, with only a few spending more time. This leads to a right-skewed distribution. However, the t-interval is robust against departures from the normal distribution assumption. Also, there are no outliers in the preceding sample, so the inference should still be valid.

8.38 Wage discrimination?
a) The confidence interval refers to means, not individual scores.
b) This should say, "if random samples of 9 women were repeatedly selected, then 95% of the time, the confidence interval would contain the population mean."
c) We know that x̄ falls in the confidence interval; this is the sample mean on which the confidence interval is based.
d) If we sample the entire population and take the mean, we're going to get the exact mean of the population.

8.39 How often read a newspaper?
a) It is not plausible that μ = 7. This value is well outside the confidence interval.
b) The sample mean is 4.1, and the standard error is s/√n = 3.0/√240 = 0.194. The confidence interval is x̄ ± t.025(se) = 4.1 ± 1.97(0.194), or (3.7, 4.5). If the sample size increased to 240, the margin of error would decrease.
c) The standard deviation is fairly large relative to the mean, an indication that the population distribution might be skewed. (The lowest possible value of 0 is only (0 – 4.1)/3.0 = –1.37, or 1.37 standard deviations below the mean.) This would not affect the validity of this analysis because the sample size is bigger than 30. With a large random sample size, the sampling distribution approaches a bell shape.
d) The term "robust" means that even if the normality assumption is not completely met, this analysis is still likely to produce valid results.
8.40 Political views
a) The 95% confidence interval is x̄ ± t.025(se) = 4.0624 ± 1.96(1.4529/√1874), or (4.00, 4.13).
b) We cannot conclude that the population mean is higher than 4.0 because values up to 4.13 (the upper bound) are plausible values for the mean.
c) (i) A 99% confidence interval would be wider than a 95% confidence interval. (ii) A smaller sample size would lead to a wider confidence interval than would a larger sample size.

8.41 Length of hospital stays
If a sample mean is expected to fall within 1.0 of the true mean about 95% of the time, that indicates that the confidence interval extends from 1.0 below the mean to 1.0 above the mean, or 5.3 – 1.0 = 4.3 to 5.3 + 1.0 = 6.3.

8.42 Effect of n
a) The standard error is s/√n = 100/√25 = 20 and the margin of error is (2.0639)(20) = 41.3.
b) The standard error is s/√n = 100/√100 = 10 and the margin of error is (1.9842)(10) = 19.8.
As the sample size increases, the margin of error decreases.

8.43 Effect of confidence level
i) The standard error is s/√n = 100/√25 = 20 and the margin of error is (2.0639)(20) = 41.3.
ii) The margin of error is (2.7969)(20) = 55.9.
The margin of error increases as the chosen confidence level increases.

8.44 Catalog mail-order sales
a) It is not plausible that the population distribution is normal because a large proportion are at the single value of 0. Because we are dealing with a sampling distribution of a sample greater than size 30, this is not likely to affect the validity of a confidence interval for the mean. Large random samples lead to sampling distributions of the sample mean that are approximately normal.
b) The sample mean is 10, and the standard error is s/√n = 10/√100 = 1. The 95% confidence interval is x̄ ± t.025(se) = 10 ± 1.984(1), or (8.0, 12.0). It does seem that the sales per catalog declined with this issue. $15 is not in the confidence interval and, therefore, is not a plausible population mean for the population from which this sample came.

8.45 Number of children
a) The standard error is s/√n = 1.67/√1971 = 0.04.
b) The 95% confidence interval is x̄ ± t.025(se) = 1.89 ± 1.96(0.04), or (1.81, 1.97). We can conclude that the population mean is less than 2.0 because the entire confidence interval is below 2.0.

8.46 Simulating the confidence interval
a) The results will differ each time this exercise is conducted.
b) We would expect 5% of the intervals not to contain the true value.
c) Close to 95% of the intervals contain the population mean even though the population distribution is quite skewed. This is so because with a large random sample size, the sampling distribution is approximately normal even when the population distribution is not. The assumption of a normal population distribution becomes less important as n gets larger.
Section 8.4: Choosing the Sample Size for a Study

8.47 South Africa study
n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.07)² = 196 people are needed to estimate the population proportion having at least a high school education to within 0.07 with 95% confidence.
8.48 Binge drinkers
n = p̂(1 – p̂)z²/m² = 0.44(1 – 0.44)(1.96)²/(0.05)² = 379

8.49 Abstainers
a) n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.05)² = 385
b) n = p̂(1 – p̂)z²/m² = 0.19(1 – 0.19)(1.96)²/(0.05)² = 237
c) Strategy (a) is inefficient if we are quite sure we'll get a sample proportion that is far from 0.50 because it overestimates the sample size by quite a bit. The first sample size would be more costly than needed.

8.50 How many businesses fail?
a) n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.10)² = 97
b) n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.05)² = 385
c) n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(2.575)²/(0.05)² = 664
d) As the margin of error decreases, we need a larger sample size to guarantee estimating the population proportion correctly within the given margin of error and at a given confidence level. As the confidence level increases, we also need a larger sample size to get the desired results.

8.51 Canada and the death penalty
n = p̂(1 – p̂)z²/m² = 0.48(1 – 0.48)(1.96)²/(0.025)² = 1535

8.52 Farm size
a) n = σ²z²/m² = (200)²(1.96)²/(25)² = 246
b) We can use the same formula as in (a): n = σ²z²/m², so 246 = (300)²(1.96)²/m². Solving for m gives m² = (300)²(1.96)²/246 and m = √[(300)²(1.96)²/246] = 37.5.

8.53 Income of Native Americans
We guess that the range of a bell-shaped distribution is about 6s. One sixth of the range of 120,000 is 20,000, a good guess for the standard deviation.
n = σ²z²/m² = (20,000)²(2.58)²/(1000)² = 2663
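The calculations in 8.47 through 8.53 use n = p̂(1 – p̂)z²/m² for a proportion and n = σ²z²/m² for a mean. A brief sketch follows (hypothetical helper functions, shown with a few of the exercise values; the result is rounded up to the next whole subject):

```python
def n_for_proportion(p_guess, margin, z=1.96):
    """Required n to estimate a proportion to within `margin` (round the result up)."""
    return p_guess * (1 - p_guess) * z**2 / margin**2

def n_for_mean(sigma_guess, margin, z=1.96):
    """Required n to estimate a mean to within `margin` (round the result up)."""
    return sigma_guess**2 * z**2 / margin**2

print(n_for_proportion(0.5, 0.07))       # about 196.0 -> n = 196 (8.47)
print(n_for_proportion(0.44, 0.05))      # about 378.6 -> n = 379 (8.48)
print(n_for_mean(20_000, 1000, z=2.58))  # about 2662.6 -> n = 2663 (8.53)
```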
8.54 Population variability
For a very diverse population, we'd have a wider range of observed values and, hence, a larger standard deviation. Larger standard deviations result in larger standard errors and wider confidence intervals. For a homogeneous population, we would have a smaller standard deviation and would not need the large sample size for the denominator of the standard error formula. When estimating the mean income for all medical doctors in the U.S., we'd have a fairly wide range of incomes: from the lower range of incomes of rural family doctors to the extremely high range of incomes of specialists at major teaching hospitals. A sample from this population would likely have a large standard deviation. For a population of McDonald's entry-level employees in the U.S., however, we'd have a much smaller range. They'd all likely be making minimum wage or slightly higher. The standard deviation of a sample from this population would be relatively small.
8.55 Web survey to get large n
They are better off with the random sample of 100 responses than with the website. The website does not produce the data randomly, one of the assumptions needed to make the inferences we've been discussing.

8.56 Do you like tofu?
a) The sample proportion is 5/5 = 1.0.
b) The standard error is √(p̂(1 – p̂)/n) = √(1(1 – 1)/5) = 0. The usual interpretation of the standard error does not make sense. The sampling distribution is likely, after all, to have some variability because the true probability (which determines the exact standard error) is positive.
c) The margin of error is (1.96)(0) = 0, and thus, the confidence interval is 1.0 to 1.0. It is not sensible to conclude that all students at the school like tofu because this method works poorly in this case.
d) It is not appropriate to use the large-sample confidence interval in (c) because we do not have more than 15 successes and 15 failures. It is more appropriate to use the small-sample method and add two successes and two failures to the results and repeat the process. This would give seven who said they liked it and two who said they did not, for a total of 9. The new sample proportion would be 7/9 = 0.78. The standard error is now √(p̂(1 – p̂)/n) = √(0.78(1 – 0.78)/9) = 0.138. The 95% confidence interval is p̂ ± z.025(se) = 0.78 ± 1.96(0.138), or (0.51, 1.00) because p cannot exceed 1. We can be 95% confident that the proportion of students who like tofu is within this interval.

8.57 Alleviate PMS?
a) We first add two to the numbers of successes and failures. We then have 9 successes and 5 failures, for a total of 14. The sample proportion would be 9/14 = 0.643. The standard error is √(p̂(1 – p̂)/n) = √(0.643(1 – 0.643)/14) = 0.128. The 95% confidence interval is p̂ ± z.025(se) = 0.64 ± 1.96(0.128), or (0.39, 0.89).
b) It is plausible that it's successful for only half the population. 0.50 is within the confidence interval.

8.58 Google Glass
a) The sample proportion is 0/500 = 0 and the standard error is √(p̂(1 – p̂)/n) = √(0(1 – 0)/500) = 0.
b) The margin of error is (1.96)(0) = 0, so the confidence interval is 0 – 0 = 0 to 0 + 0 = 0, or (0, 0). This is not sensible: the true proportion may be very small, but it is larger than zero. (Google sold some glasses.)
c) With a sample proportion of 0, we have 500(0) = 0 successes, which is well below the 15 required for the large-sample method to work. Adding two to the number of successes and failures, we have 2 successes and 502 failures, for a total of 504. The sample proportion is now 2/504 = 0.004 and the standard error is √(p̂(1 – p̂)/n) = √(0.004(1 – 0.004)/504) = 0.0028. The 95% confidence interval is p̂ ± z.025(se) = 0.004 ± 1.96(0.0028), or (0, 0.0095), since the lower proportion cannot be negative.
d) Yes, it is plausible, since the entire interval lies below 0.01.
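The small-sample adjustment used in 8.56 through 8.58 (add two successes and two failures, then compute the usual interval) is easy to script. A sketch follows; the helper is hypothetical, with the counts from these exercises:

```python
from math import sqrt

def plus_four_ci(successes, failures, z=1.96):
    """Add 2 successes and 2 failures, then use the usual interval formula."""
    x, n = successes + 2, successes + failures + 4
    p = x / n
    se = sqrt(p * (1 - p) / n)
    lower, upper = p - z * se, p + z * se
    return max(lower, 0), min(upper, 1)  # a proportion must stay in [0, 1]

print(plus_four_ci(5, 0))    # about (0.51, 1.00), tofu, 8.56(d)
print(plus_four_ci(0, 500))  # about (0, 0.0095), Google Glass, 8.58(c)
```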
Section 8.5: Using Computers to Make New Estimation Methods Possible

8.59 Why bootstrap?
One purpose of using the bootstrap method is to determine confidence intervals for a given point estimate when we do not have a standard error or confidence interval formula that works well.
8.60 Estimating variability
a) Sample with replacement from the 10 values, taking 10 observations, and find the standard deviation. Do this many, many times. The 95% confidence interval would be the values in the middle 95% of the standard deviation values.
b) Results will vary for each simulation. One simulation, using 10,000 resamples, resulted in a 95% confidence interval of (0.30, 0.63).

8.61 Bootstrap interval for the mean
a) The sample mean is 11.6, and the sample median is 6. [Histogram of the 1,399 values of hours: the distribution is strongly right-skewed, with most values below 20 hours and a tail extending past 100.]
b) The 95% confidence interval is x̄ ± t.025(se) = 11.6 ± 1.96(15/√1399), or (10.8, 12.4).
c) Sample, with replacement, the 1,399 values from the sample (where each value has a 1 in 1399 chance of being selected on each draw). Compute the mean for each such resample. Do this 10,000 times. The 2.5th and 97.5th percentiles of these 10,000 values are the lower and upper bounds for a confidence interval for the mean.
d) Results will vary for each simulation. One simulation, using 10,000 resamples, resulted in a 95% confidence interval of (10.8, 12.4).
e) The bootstrap distribution of x̄ looks approximately normal. This is not surprising, as the central limit theorem predicts that for large sample sizes (here 1399), the sampling distribution of x̄ will have this shape, regardless of the shape of the population distribution.

8.62 Bootstrap interval for the proportion
a) Results will vary for each simulation. One simulation resulted in a sample proportion of 0.014.
b) Results will vary for each simulation. One simulation resulted in a confidence interval of (0.04, 0.18).
c) Judging by the histogram, the sampling distribution is right-skewed and not bell shaped. Hence, the interval extends more to the right than to the left relative to the sample proportion.
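A minimal sketch of the bootstrap procedure described in 8.60 and 8.61 is shown below for a generic statistic; `my_hours_data` is a stand-in name, since the 1,399 observations are not reproduced in this manual:

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, reps=10_000, conf=0.95, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Resample with replacement, same size as the original sample, `reps` times.
    boot_stats = np.array([stat(rng.choice(data, size=data.size, replace=True))
                           for _ in range(reps)])
    alpha = (1 - conf) / 2
    return np.percentile(boot_stats, [100 * alpha, 100 * (1 - alpha)])

# Usage with a stand-in data set:
# bootstrap_ci(my_hours_data)           # interval for the mean, as in 8.61
# bootstrap_ci(my_hours_data, np.std)   # interval for the standard deviation, as in 8.60
```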
Chapter Problems: Practicing the Basics

8.63 Unemployed college grads
a) These are point estimates.
b) The information here is not sufficient to construct confidence intervals. We also need to know sample sizes.

8.64 Approval rating for president
We could tell someone who had not taken a statistics course that we do not know the exact percentage of the population who approve of the job Barack Obama is doing as president, but we are quite sure that it is within 3% of 42%, that is, between 39% and 45%.
8.65 British monarchy
The first sample proportion is 0.86. It has a standard error of √(p̂(1 – p̂)/n) = √(0.86(1 – 0.86)/1667) = 0.0085, and the margin of error is z.025(se) = (1.96)(0.0085) = 0.017. The second sample proportion is 0.73. It has a standard error of √(p̂(1 – p̂)/n) = √(0.73(1 – 0.73)/1667) = 0.0109, and the margin of error is z.025(se) = (1.96)(0.0109) = 0.021.

8.66 Born again
a) This is a point estimate because it gives only one point, or one number, as an estimate of what the true population percentage is, rather than an interval, or range of possible percentages.
b) This is only an estimate based on one sample of 2000 people. The true population percentage may not be exactly the same as the sample percentage.

8.67 Life after death
The 95% confidence interval is 0.81 – 0.018 = 0.792 to 0.81 + 0.018 = 0.828, or (0.792, 0.828). We can say that we are 95% confident that the population percentage of people who believe in life after death falls between 79.2% and 82.8%.

8.68 Female belief in life after death
X, 822, is the number of females who said that they believe in life after death out of N, 977, the total number of females in the sample, regardless of their response. "Sample p" is the proportion, 0.841, of females who said that they believe in life after death. "95.0% CI" is the 95% confidence interval; we can be 95% confident that the proportion of the population of females who believe in life after death is between 81.8% and 86.4%.

8.69 Vegetarianism
a) We must assume that the data were obtained randomly.
b) The standard error is √(p̂(1 – p̂)/n) = √(0.04(1 – 0.04)/10,000) = 0.002. The 99% confidence interval is p̂ ± z.005(se) = 0.04 ± 2.58(0.002), or (0.035, 0.045). The interval is so narrow, even though the confidence level is high, mainly because of the very large sample size. The very large sample size contributes to a small standard error by providing a very large denominator for the standard error calculation.
c) We can conclude that fewer than 10% of Americans are vegetarians because 10% falls above the highest believable value in the confidence interval.

8.70 Alternative therapies
This is not correct. A 95% confidence interval refers to an interval, not to a point estimate.

8.71 Population data
It doesn't make sense to construct a confidence interval because we have data for the entire population. We can actually know the proportion of vetoed bills; we don't have to estimate it.

8.72 Wife supporting husband
a) 122 said they strongly agree, and 359 said that they agree, for a total of 481. The sample size in 2008 was 1308.
b) From (a), the sample proportion is 481/1308 = 0.368 and the standard error is √(p̂(1 – p̂)/n) = √(0.368(1 – 0.368)/1308) = 0.0133.
c) The 99% confidence interval for the population proportion who would agree is p̂ ± z.005(se) = 0.368 ± 2.58(0.0133), or (0.33, 0.40). We can be 99% confident that the population proportion of women who agree with this statement falls within this interval.
8.73 Legalize marijuana?
a) 496 said "legal." 751 said "not legal." The sample proportions are 0.398 and 0.602, respectively.
b) The standard error is √(p̂(1 – p̂)/n) = √(0.398(1 – 0.398)/1247) = 0.0139. The 95% confidence interval is p̂ ± z.025(se) = 0.398 ± 1.96(0.0139), or (0.37, 0.43). We can conclude that a minority of the population supports legalization because 0.50 is above the upper endpoint of the 95% confidence interval.
c) It appears that the proportion favoring legalization is increasing over time.

8.74 Smoking
When the sample size is extremely large, the standard error is extremely small. This is because the standard error is √(p̂(1 – p̂)/n). When n increases, the standard error decreases and will be quite small for very large values of n. Since the confidence interval is computed by taking the sample proportion and adding and subtracting the appropriate multiple of the standard error, when n is extremely large the confidence interval will be quite narrow even for large confidence levels.

8.75 Streaming
a) The sample proportion is 0.43, and the standard error is √(p̂(1 – p̂)/n) = √(0.43(1 – 0.43)/2300) = 0.0103. The 95% confidence interval is p̂ ± z.025(se) = 0.43 ± 1.96(0.0103), or (0.41, 0.45). We are 95% confident that the percentage of U.S. adults regularly watching television shows via streaming is between 41% and 45%.
b) A 99% confidence interval would be wider. To have more confidence, we need a wider interval. Technically, the z-score changes from 1.96 to 2.58, resulting in a wider interval.
c) The confidence interval would be narrower. With 5000 adults, we have more information, resulting in a smaller standard error for the sample proportion. A smaller standard error leads to a narrower margin of error and thus a narrower interval.

8.76 Edward Snowden
a) True, this is the definition of a confidence interval.
b) False, a confidence interval is about a population parameter, not about the result of a sample.

8.77 More NSA spying
a) False, the true percentage may be as large as 52%.
b) True, because 76% is no longer a plausible value and is outside the confidence interval.

8.78 Grandpas using e-mail
a) The first result is a 90% confidence interval for the mean hours spent per week sending and answering e-mail for males of at least age 80. The sample mean, x̄, is listed as 3.000 hours. Thus, the estimated mean hours spent per week sending and answering e-mail for males of at least age 80 is 3.000 hours. The sample standard deviation is 4.093. This quantity estimates the population standard deviation, which tells us how far we can expect a typical observation to vary from the mean. These estimates are based on a sample of size 9.
b) The sample mean is 3.000, and the standard error is s/√n = 4.093/√9 = 1.3643. The 90% confidence interval is x̄ ± t.05(se) = 3.000 ± 1.8595(1.3643), or (0.463, 5.537). We can be 90% confident that the population mean number of hours spent per week sending and answering e-mail for males of at least age 80 is between 0.5 and 5.5 hours.
c) With only 9 observations, it is hard to determine the shape, but the distribution may be bell shaped or skewed right. There are no outliers in the sample. By the robustness of the t-interval, the results should be valid.

8.79 Travel to work
a) With a very large sample, the standard error is very small (because the denominator of the standard error formula is so large). With a small standard error, the margin of error also is small.
b) We would need to know a standard deviation for the mean travel time.
8.80 t-scores
a) The t-score for a 95% confidence interval with a sample size of 10 is 2.262. For a sample size of 20 the t-score is 2.093, and for a sample size of 30 it is 2.045. For an infinite sample size the t-score is 1.96 (the z-score for a 95% confidence interval).
b) The answer in (a) suggests that the t distribution approaches the standard normal distribution as the sample size gets larger.
8.81 Fuel efficiency
a) With only 10 observations, it is hard to determine the shape, but the distribution may be bell shaped or skewed right. There are no outliers in the sample.
[Dotplot and boxplot of Combined MPG for the 10 SUVs]
b) From technology, the sample mean is 22.7 and the standard deviation is 4.244. The standard error is se = s/√n = 4.244/√10 = 1.3421. The 95% confidence interval is x̄ ± t.025(se) = 22.7 ± 2.2622(1.3421), or (19.7, 25.7). We have 95% confidence that the mean combined mpg for SUVs manufactured from 2012 to 2015 is between 19.7 and 25.7 mpg.
8.82 Psychologists' income
a) The sample mean is 43,834 and the standard error is se = s/√n = 16,870/√31 = 3029.94. The 95% confidence interval is x̄ ± t.025(se) = 43,834 ± 2.042(3029.94), or (37,647, 50,021).
We can be 95% confident that the population mean income is between $37,647 and $50,021. b) It assumes an approximately normal population distribution. c) If the assumption about the shape of the population distribution is not valid, even with a small n the results aren’t necessarily invalidated. The method is robust in terms of the normal distribution assumption. However, if there were some extreme outliers, this might not hold true. 8.83 More psychologists The first column, “Variable,” refers to the variable in which we are interested, income. The second column, “N,” tells us that there were 190 psychologists with a doctorate but with less than 1 year of experience in this sample. The third column tells us that the mean income for these 190 psychologists was $49,411, and the fourth column tells us that the standard deviation associated with this mean is $15,440. The fifth column, “SE Mean,” tells us the standard error, the standard deviation of the sampling distribution for samples of size 190. Finally, the sixth column tells us the 95% confidence interval for income. We can be 95% confident that the population mean income is between $47,204 and $51,618. 8.84 How long lived in town? a) The population distribution is not likely normal because the standard deviation is almost as large as the mean. In fact, the lowest possible value of 0 is only (0 – 20.3)/18.2 = –1.1, or 1.1 standard deviations below the mean. Moreover, the mean is quite a bit larger than the median. Both of these are indicators of skew to the right.
8.84 (continued)
b) We can construct a 95% confidence interval, however, because the normal population assumption is much less important with such a large sample size. The sample mean is 20.3 and the standard error is se = s/√n = 18.2/√1415 = 0.484. The 95% confidence interval is x̄ ± t.025(se) = 20.3 ± 1.96(0.484), or (19.4, 21.2). We can be 95% confident that the population mean number of years lived in a given city, town, or community is between 19.4 and 21.2.
8.85 How often do women feel sad?
a)
The sample mean is 1.81 and the standard error is se = s/√n = 1.98/√816 = 0.069. The 95% confidence interval is x̄ ± t.025(se) = 1.81 ± 1.963(0.069), or (1.67, 1.95). We can be 95% confident that the population mean number of days women have felt sad over the past seven days is between 1.67 and 1.95.
b) This variable is not likely normally distributed given that the means for both women and men are much larger than their respective medians, and both standard deviations are larger than their respective means. In fact, the lowest possible score of 0 is (0 – 1.81)/1.98 = –0.9, or 0.9 standard deviations below the mean for women. Because the sample size is so large, however, there is no problem with the confidence interval method used in (a).
8.86 How often feel sad?
From technology, the sample mean is 1.4 and the standard deviation is 2.221. The standard error is se = s/√n = 2.221/√10 = 0.702. The 90% confidence interval is x̄ ± t.05(se) = 1.4 ± 1.833(0.702), or (0.1, 2.7). We can be 90% confident that the mean for the population of Wisconsin students is between 0.1 and 2.7. For this inference to apply to the population of all University of Wisconsin students, we must assume that the data are randomly produced and that we have an approximately normal population distribution.
8.87 Happy often?
a) We can verify these numbers using the GSS. The sample size is 1451.
b) The standard error for the sample mean is se = s/√n = 2.05/√1451 = 0.054, or about 0.05.
c)
We have to assume that the data are produced randomly (true for the GSS, although not actually a simple random sample), and we have to assume that the population distribution is approximately normal. Given our large sample size, the second assumption is met. The 95% confidence interval is x̄ ± t.025(se) = 5.27 ± 1.962(0.05), or (5.2, 5.4). Since the confidence interval lies completely above 5, we can conclude that the population mean is at least 5.0.
8.88 Revisiting mountain bikes
a) The sample mean is $628.30 and the standard error is se = s/√n = 341.4/√12 = $98.55. The 95% confidence interval is x̄ ± t.025(se) = 628.3 ± 2.2010(98.55), or (411.4, 845.3). We can be 95% confident that the population mean mountain bike price falls between $411 and $845.
b) To form the interval, we need to assume that the data are produced randomly and that the population distribution is approximately normal. The population does not seem to be distributed normally: the data seem to cluster at the low and high ends, with fewer in the middle. Unless there are extreme outliers, however, this probably does not have much of an effect on this inference. The method is fairly robust with respect to the normal distribution assumption.
8.89 eBay selling prices
a) We could estimate the parameter μ, which represents the mean Buy It Now selling price for the population of Samsung S5 16GB smartphones.
b) From technology, the point estimate of μ is the sample mean of $576.90.
8.89 (continued)
c) The standard deviation is $36.70 and the standard error is se = s/√n = 36.7/√12 = $10.60. The standard deviation of $36.70 describes how far the prices tend to fall from the sample mean of $576.90; the standard error of $10.60 describes how far the sample mean x̄ tends to fall from the population mean.
d) The 95% confidence interval is x̄ ± t.025(se) = 576.9 ± 2.201(10.6), or (554, 600). We can be 95% confident that the population mean Buy It Now selling price for the population of Samsung S5 16GB smartphones is between $554 and $600.
8.90 Income for families in public housing
a) [Boxplot of annual income (in hundreds of dollars) for the 29 families]
From the box plot, we can predict that the population distribution is skewed to the right. There does not seem to be an extreme outlier, so this should not affect the population inferences. This method is fairly robust with respect to the normal distribution assumption. b) From technology, the mean is 90.2 and the standard deviation is 20.99. c)
The 95% confidence interval is x̄ ± t.025(se) = 90.2 ± 2.0484(20.99/√29), or (82.26, 98.27). We can
be 95% confident that the mean income for the population of families living in public housing in Chicago is between $8,226 and $9,827. 8.91 Females watching TV a) We would not expect that TV watching has a normal distribution, but rather that it is right-skewed. One indication is that the standard deviation is almost as large as the mean. The other is that the lowest possible value is zero which is only (0 – 3.08)/2.7 = –1.14, or 1.14 standard deviations below the mean. b) The confidence interval is based on the assumptions that the data were produced randomly and that the population distribution is approximately normal. Although the distribution is not normal, the confidence interval is still valid since the method used to calculate the interval is robust to violations of the normality assumption. c) The interval refers to the mean number of hours of TV watched in a typical day for adult females. It does not refer to the range of possible hours of TV watched by a typical female 95% of the time. In repeated samples of adult females of size 698, we would expect 95% of the confidence intervals calculated to contain the population mean, the number of hours of TV watched by adult females in a typical day. 8.92 Males watching TV We can be 95% confident that the mean number of hours spent watching TV for the population of males falls between 2.67 and 3.08.
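The t-based intervals in 8.82 through 8.92 can all be scripted the same way. The sketch below is illustrative only and assumes SciPy is available for the t critical value; the function name t_ci is my own, and the numbers plugged in are the summary statistics from Exercise 8.82.

```python
from math import sqrt
from scipy import stats

def t_ci(xbar, s, n, conf=0.95):
    """Confidence interval for a population mean from summary statistics."""
    se = s / sqrt(n)                                  # standard error of the mean
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return (xbar - t_crit * se, xbar + t_crit * se)

# Exercise 8.82: n = 31 psychologists, sample mean 43,834, s = 16,870
print(t_ci(43834, 16870, 31))   # close to the (37,647, 50,021) reported above
```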
8.93 Working mother
a) This choice of scoring assumes that strongly agree and agree are the same distance apart as are strongly disagree and disagree. It also assumes that there is a larger difference between agree and disagree, almost as if there were another category in the middle (e.g., neutral).
b) Based on this scoring, we would interpret this sample mean as close to this middle "neutral" area, but slightly toward disagree.
c) We could make inferences about proportions for these data by looking at the proportion who responded in each way. For example, the proportion who responded "strongly agree" is 104/1308 = 0.0795.
8.94 Miami spring break
a) [Histogram and boxplot of the hotel prices for the sample of 18 hotels]
b) The sample mean is $230.80 and the standard error is se = s/√n = 70.8/√18 = $16.69. The 95% confidence interval is x̄ ± t.025(se) = 230.8 ± 2.1098(16.69), or (196, 266). With 95% confidence, the mean price for a double room in Miami over spring break is between $196 and $266.
c) You should not be too concerned; the distribution is not too far from normal, and the t-interval is robust with respect to the normal distribution assumption. The potential outlier of $386 is not too influential. Using technology, the interval without the potential outlier is (190, 253).
8.95 Sex partners in previous year
a) The standard error is se = s/√n = 1.22/√1766 = 0.029.
b) The distribution was probably highly skewed to the right because the standard deviation is larger than the mean. In fact, the lowest possible value of 0 is only (0 – 1.11)/1.22 = –0.91, or 0.91 standard deviations below the mean.
c) The skew need not cause a problem with constructing a confidence interval for the population mean, unless there are extreme outliers, because this method is robust with respect to the normal distribution assumption.
8.96 Men don't go to the doctor
n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.05)² ≈ 385
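The sample-size formula used in 8.96 (and again in 8.97 below) is easy to check directly. A small sketch (Python, standard library), assuming a 95% confidence level by default; the helper name is my own.

```python
from math import ceil

def n_for_proportion(p_guess, margin, z=1.96):
    """Sample size so a CI for a proportion has the requested margin of error."""
    return ceil(p_guess * (1 - p_guess) * z**2 / margin**2)

print(n_for_proportion(0.50, 0.05))   # 385 (Exercise 8.96)
print(n_for_proportion(0.20, 0.04))   # 385 (Exercise 8.97a); 0.50 gives 601 (8.97b)
```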
8.97 Driving after drinking
a) n = p̂(1 – p̂)z²/m² = 0.2(1 – 0.2)(1.96)²/(0.04)² ≈ 385
b) n = p̂(1 – p̂)z²/m² = 0.5(1 – 0.5)(1.96)²/(0.04)² ≈ 601. This is larger than the answer in (a). If we can make an educated guess about what to expect for the proportion, we can use a smaller sample size, saving possibly unnecessary time and money.
8.98 Changing views of United States
n = p̂(1 – p̂)z²/m² = 0.83(1 – 0.83)(1.96)²/(0.03)² ≈ 603. If we assume a 95% confidence interval, we can estimate the sample size to have been about 603.
8.99 Mean property tax
a) n = σ²z²/m² = (1000)²(1.96)²/(100)² ≈ 385. The solution makes the assumption that the standard deviation will be similar now.
b) The margin of error would be more than $100 because the standard error will be larger than predicted.
c) With a larger margin of error, the 95% confidence interval is wider; thus, the probability that the sample mean is within $100 (which is less than the margin of error from part b) of the population mean is less than 0.95.
8.100 Accept a credit card?
Since the number of successes is less than 15, add two to the successes and two to the failures. The sample proportion is now 2/104 = 0.019.
a) The standard error is se = √(p̂(1 – p̂)/n) = √(0.019(1 – 0.019)/104) = 0.013. The 95% confidence interval is p̂ ± z.025(se) = 0.019 ± 1.96(0.013), or (–0.006, 0.044), which should be reported as (0, 0.044) because 0 is the lowest possible proportion. They can conclude that fewer than 10% of their population would take the credit card.
8.101 Kicking accuracy
a) With the small sample size, we'd have to add two to each outcome. We'd then have 2 failures and 12 successes, for a sample proportion of 12/14 = 0.857. We can now use the large-sample method to find the confidence interval. The standard error is se = √(p̂(1 – p̂)/n) = √(0.857(1 – 0.857)/14) = 0.094. The 95% confidence interval is p̂ ± z.025(se) = 0.857 ± 1.96(0.094), or (0.67, 1.04), which should be reported as (0.67, 1.00) because 1 is the highest possible proportion.
b) The lowest value that is plausible for that probability is 0.67.
c) The random sample assumption might not be met. For example, under this anxiety-producing situation of trying out, the player might react well to a first success, increasing his or her chances of future successes. Alternatively, he or she might miss the first one, increasing anxiety and diminishing the chances of making later kicks.
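The "add two successes and two failures" adjustment used in 8.100 and 8.101 can be wrapped in a small helper. This is only a sketch of that adjustment (Python, standard library), with the interval truncated to [0, 1]; the function name is my own.

```python
from math import sqrt

def adjusted_prop_ci(successes, n, z=1.96):
    """Small-sample CI for a proportion: add 2 successes and 2 failures first."""
    p_tilde = (successes + 2) / (n + 4)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    lo, hi = p_tilde - z * se, p_tilde + z * se
    return max(lo, 0.0), min(hi, 1.0)       # a proportion cannot leave [0, 1]

print(adjusted_prop_ci(0, 100))   # about (0, 0.045); 8.100 reports (0, 0.044) after rounding
print(adjusted_prop_ci(10, 10))   # about (0.67, 1.00), as in 8.101
```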
Chapter Problems: Concepts and Investigations 8.102 Religious beliefs Each student’s one-page report will be different, but will explain the logic behind random sampling and the effect of sample size on margin of error.
8.103 TV watching and race
We can compare black and white subjects by creating a confidence interval for each group. The assumptions on which the confidence intervals are based are that the data were randomly produced and that the population distributions are approximately normal. As calculated below, the 95% confidence interval for white subjects is (2.84, 3.12) and for black subjects it is (3.87, 4.89). It seems that blacks watch more TV than do whites, on the average.
White subjects: The sample mean is 2.98 and the standard error is se = s/√n = 2.66/√1324 = 0.073. The 95% confidence interval is x̄ ± t.025(se) = 2.98 ± (1.96)(0.073), or (2.84, 3.12).
Black subjects: The sample mean is 4.38 and the standard error is se = s/√n = 3.58/√188 = 0.261. The 95% confidence interval is x̄ ± t.025(se) = 4.38 ± (1.97)(0.261), or (3.87, 4.89).
8.104 Housework and gender
We can compare men and women by creating a confidence interval for each gender. The assumptions on which the confidence interval is based are that the data were randomly produced and that the population distributions are approximately normal. As calculated below, the 95% confidence interval for men is (17.7, 18.5) and for women is (32.2, 33.0). It seems that women do more housework than do men, on the average.
Men: The sample mean is 18.1 and the standard error is se = s/√n = 12.9/√4252 = 0.198. The 95% confidence interval is x̄ ± t.025(se) = 18.1 ± (1.96)(0.198), or (17.7, 18.5).
Women: The sample mean is 32.6 and the standard error is se = s/√n = 18.2/√6764 = 0.221. The 95% confidence interval is x̄ ± t.025(se) = 32.6 ± (1.96)(0.221), or (32.2, 33.0).
8.105 Women's role opinions
Running the house:
The sample proportion is 275/1831 = 0.150 and the standard error is se = √(p̂(1 – p̂)/n) = √(0.15(1 – 0.15)/1831) = 0.008. The 95% confidence interval is p̂ ± z.025(se) = 0.150 ± 1.96(0.008), or (0.134, 0.166). Because 0.50 does not fall in this range, we can conclude that fewer than half of the population agrees with the statement that women should take care of running their homes and leave running the country up to the men.
Man as the achiever: The sample proportion is 627/1835 = 0.342 and the standard error is se = √(p̂(1 – p̂)/n) = √(0.342(1 – 0.342)/1835) = 0.011. The 95% confidence interval is p̂ ± z.025(se) = 0.342 ± 1.96(0.011), or (0.320, 0.364). Because 0.50 does not fall in this range, we can conclude that fewer than half of the population agrees with the statement that it is better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and the family.
Preschool child: The sample proportion is 776/1830 = 0.424 and the standard error is se = √(p̂(1 – p̂)/n) = √(0.424(1 – 0.424)/1830) = 0.012. The 95% confidence interval is p̂ ± z.025(se) = 0.424 ± 1.96(0.012), or (0.400, 0.448). Because 0.50 does not fall in this range, we can conclude that fewer than half of the population agrees with the statement that a preschool child is likely to suffer if her mother works.
The full one-page report will differ for each student, but should include the above findings, along with interpretations and descriptions, and information about assumptions.
172 Statistics: The Art and Science of Learning from Data, 4th edition 8.106 Types of estimates If we know the confidence interval of (4.0, 5.6), we know the mean falls in the middle because the confidence interval is calculated by adding and subtracting the same number from the mean. In this case, the mean equals 4.8. On the other hand, if we only knew the mean of 4.8, we could not know the confidence interval, and would have much less of an idea of how accurate this point estimate is likely to be. 8.107 Width of a confidence interval When we use larger confidence levels, we want to be able to be even more accurate, and thus, we have to have a wider interval. For example, if we try to guess someone’s age, we’re more likely to be accurate if we guess a wider range of ages. Mathematically, a higher confidence level gives us a higher t- or z-score. This score is what we multiply by the standard error to get the confidence interval; if it’s bigger, we have a bigger confidence interval. When we use a larger sample size, on the other hand, we end up with a narrower interval. This makes sense if we think about the likelihood of a larger sample being more accurate. Also, mathematically, the larger sample size in the denominator of the standard error calculation gives us a smaller standard error. Because standard error is part of the calculation of margin of error, a smaller standard error gives us a smaller margin of error, and hence, a smaller confidence interval. 8.108 99.9999% confidence An extremely large confidence level makes the confidence interval so wide as to have little use. 8.109 Need 15 successes and failures a) (0.5)(30) = 15 b) (0.3)(50) = 15 c) (0.1)(150) = 15 In all cases, if we have a smaller sample size, this number of successes mathematically cannot be 15. 8.110 Outliers and CI When the fifth observation is changed to 875, the 95% confidence interval becomes (588, 718), which is much wider than original interval. An outlier can dramatically affect the standard deviation (21 for original data, 84 for modified data) and hence the margin of error. 8.111 What affects n? a) An increase in the confidence level, leads to a higher z-score, which leads to a higher n because it increases the numerator. We would need a larger sample size to be more confident. b) A decrease in the margin of error, m, would lead to a higher n because it decreases the denominator. We would need a larger sample size to have less error. 8.112 Multiple choice: CI property The best answer is (a). 8.113 Multiple choice: CI property 2 The best answer is (b). 8.114 Multiple choice: Number of close friends Both (b) and (e) are correct. 8.115 Multiple choice: Why z? The best answer is (a). 8.116 Mean age at marriage a) The confidence interval refers to the population, not the sample, mean. b) The confidence interval is an interval containing possible means, not possible individual scores. c) x is the sample mean; we know exactly what it is. d) If we sampled the entire population even once, we would know the population mean exactly. 8.117 Interpret CI If we repeatedly took samples of 50 records from the population, approximately 95% of those intervals would contain the population mean age. 8.118 True or false False, it should be the population proportion. Copyright © 2017 Pearson Education, Inc.
Chapter 8: Statistical Inference: Confidence Intervals 173 8.119 True or false False, it is the sampling distribution that must be approximately normal, not the population distribution. 8.120 True or false False, a volunteer sample is not a random sample, thus violating one of the necessary assumptions. 8.121 True or false
True, since the denominator of the margin of error is n , quadrupling n doubles the value of the denominator, thereby halving the margin of error. 8.122 Women’s satisfaction with appearance False, margin of error depends on standard error. Standard error uses the sample proportion, which would differ for each of these responses, in its calculation. ♦♦8.123 Opinions over time about the death penalty a) When we say we have 95% confidence in an interval for a particular year, we mean that in the long-run (that is, if we took many random samples of this size from this population), the intervals based on these samples would capture the true population proportion 95% of the time. b) The probability of all intervals containing the population mean is 0.264. 26! P 26 0.9526 1 0.9526 26 0.264 26!(26 26)! c) The mean of the probability distribution of X is np = (26)(0.95) = 24.7. d) To make it more likely that all 26 inferences are correct, we could increase the confidence level, to 99% for example. ♦♦8.124 Why called “degrees of freedom”? The mean is the sum of all observations divided by the sample size. If we don’t know one observation, but we do know the mean, we can solve algebraically to find the observation. For example, if we know that two of three observations are 1 and 2, and we know the mean is 2, we can solve as follows: 1 2 x 2 6 1 2 x 6 3 x x 3 3 ♦♦8.125 An alternative interval for the population proportion 20 p 1.96 p(1 p ) 10 20 If we substitute p = 1, we get 0 on both sides of the equation. If we substitute p = 0.83887, we get 0.16113 on both sides of the equation. b) The confidence interval formed using the method in this exercise seems more believable because it forms an actual interval and contains more than just the value of 1. It seems implausible that all of the students have iPods. ♦♦8.126 m and n The z-score of 1.96 is approximately 2.0. So, the numerator of the formula for n is approximately 0.50(1 – 0.50)(2.0)(2.0), which is 1. ♦♦8.127 Median as point estimate If the population is normal, the standard error of the median is 1.25 times the standard error of the mean. A larger standard error means a larger margin of error, and therefore, a wider confidence interval. The sample mean tends to be a better estimate than the sample median because it is more precise.
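The binomial calculation in 8.123 (the probability that all 26 intervals capture their parameter, and the expected number that do) reduces to two one-line computations, shown here as a sketch in Python.

```python
p_all = 0.95 ** 26            # probability that all 26 independent intervals capture
mean_captures = 26 * 0.95     # expected number of intervals that capture
print(round(p_all, 3), mean_captures)   # 0.264 24.7
```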
In 8.125(a), the equation being solved is p̂ – p = 1.96√(p(1 – p)/n), which with p̂ = 20/20 = 1 becomes 1 – p = 1.96√(p(1 – p)/20).
Chapter Problems: Student Activities ♦♦8.128 Randomized response a) If a head is obtained on the first flip (probability = 0.5), P(HH) = 1/2 1/2 = 1/4 and P(HT) = 1/2 1/2 = 1/4. If a tail is obtained on the first flip (probability = 1/2), the student answers H with probability p for P(TH) = 1/2 p = p/2 and the student answers T with probability of 1 – p for P(TT) = (1 – p)/2. b) The proportion who report head for their second response is the sum of P(HH) and P(TH) = 0.25 + p/2. Thus we can estimate p as pˆ 2( qˆ 0.25) 2qˆ 0.5 . c) Answers will be different for each class. 8.129 GSS project The results will be different each time this exercise is conducted.
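The randomized-response estimator derived in 8.128, p̂ = 2q̂ – 0.5, can be checked by simulation. This is a rough sketch with a hypothetical true proportion p_true = 0.3; the function and parameter names are my own, not part of the exercise.

```python
import random

def simulate_randomized_response(p_true, n, seed=1):
    """Simulate the coin protocol of 8.128 and recover p from the reported 'heads'."""
    rng = random.Random(seed)
    reported_head = 0
    for _ in range(n):
        if rng.random() < 0.5:                      # first flip is a head:
            reported_head += rng.random() < 0.5     #   report the second flip
        else:                                       # first flip is a tail:
            reported_head += rng.random() < p_true  #   report the truthful answer
    q_hat = reported_head / n                       # proportion reporting "head"
    return 2 * q_hat - 0.5                          # estimator p_hat = 2(q_hat - 0.25)

print(simulate_randomized_response(p_true=0.3, n=100_000))   # should land near 0.3
```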
Chapter 9: Statistical Inference: Significance Tests About Hypotheses
Section 9.1: Steps for Performing a Significance Test 9.1 H0 or Ha? a) null hypothesis b) alternative hypothesis c) (a) H0: p = 0.50; Ha: p 0.50 (b) H0: p = 0.24; Ha: p < 0.24 9.2 H0 or Ha? a) alternative hypothesis b) null hypothesis c) alternative hypothesis 9.3 Burden of proof H0: The mean toxicity level equals the threshold. (“no effect”); H0 specifies a single value, the threshold, for the parameter. Ha: The mean toxicity level exceeds the threshold. 9.4 Financial aid Let be the mean dollar amount of financial aid granted to students admitted in 2016. The null hypothesis is H 0 : 44,000 and the alternative hypothesis is H a : 44,000. 9.5 Low-carbohydrate diet a) This is an alternative hypothesis because it has a range of parameter values. b) The relevant parameter is the mean weight change, . H0: = 0; this is a null hypothesis. 9.6 Examples of hypotheses The examples given by each student will differ. 9.7 Proper hypotheses? a) The null and alternative hypotheses are always about population parameters (e.g., p or ) and never about sample statistics such as p̂ or x . The correct hypotheses are: H0: p = 0.5, Ha: p > 0.5. b) The alternative hypothesis needs to specify a range of parameters. The correct hypotheses are: H0: = 100, Ha: > 100. c)
The range of values in the alternative hypothesis needs to be an “alternative’’ to the value specified in the null hypothesis, so it cannot include the null value. (p > 0.10 includes the null value 0.30.) The correct hypotheses are: H0: p = 0.30, Ha: p > 0.30 or H0: p = 0.10, Ha: p > 0.10. 9.8 z test statistic The data give strong evidence against the null hypothesis. Most scores fall within three standard errors of the mean, and this sample proportion falls over three standard errors from the null hypothesis value. 9.9 P-value a) This P-value does not give strong evidence against the null hypothesis. b) This extreme P-value does give strong evidence against the null hypothesis.
Section 9.2: Significance Tests About Proportions 9.10 Psychic The parameter of interest is the proportion, p, of correct predictions of coin flips by psychic. The null hypothesis is that the psychic will successfully predict the outcome of the flip of a coin 1/2 of the time, and the alternative hypothesis is that the psychic will predict the outcome more than 1/2 of the time. H0: p = 0.5; Ha: p > 0.5 or H0: p = 1/2; Ha: p > 1/2 9.11 Believe in astrology? The parameter of interest is, p, the true probability of a correct prediction. The null hypothesis is that the astrologer will successfully predict the personality profile 1/4 of the time and the alternative hypothesis is that the astrologer will successfully predict the personality profile more than 1/4 of the time. H0: p = 1/4 and Ha: p > 1/4 Copyright © 2017 Pearson Education, Inc.
176 Statistics: The Art and Science of Learning from Data, 4th edition 9.12 Get P-value from z a) 0.15 b) 0.30 c) 0.85 d) None of these P-values gives strong evidence against H0. All of them indicate that the null hypothesis is plausible. 9.13 Get more P-values from z a) (i) 0.006 (ii) 0.012 (iii) 0.994 b) Yes, the P-values in (i) and (ii) indicate that the test statistic is very extreme, strong evidence against H0. 9.14 Find test statistic and P-value a)
The standard error is √(p0(1 – p0)/n) = √(0.5(1 – 0.5)/100) = 0.05, so z = (0.35 – 0.50)/0.05 = –3.0.
b) The P-value is 0.001. c) If the null hypothesis were true, the probability would be 0.001 of getting a test statistic at least as extreme as the value observed. This does provide strong evidence against H0. This is a very small Pvalue; it does not seem plausible that p = 0.50. 9.15 Dogs and cancer a) H0: p = 1/5 b) Ha: p 1/5 c) Ha: p > 1/5 d) P-value 0.000. The probability of obtaining a sample proportion of 81 or more successes in 83 trials is essentially 0; thus, there is strong evidence that the probability of a correct selection is greater than with random guessing. 9.16 Religion important in your life? 1. The response is categorical with outcomes “yes” or “no” to the statement that young adults pray daily; p represents the probability of a yes response. The poll was a random sample of 1679 18-29-year-olds and np0 n(1 p0 ) 1679(0.5) 839.5 15. 2.
2. H0: p = 0.5; Ha: p ≠ 0.5
3. p̂ = 0.45, so z = (p̂ – p0)/√(p0(1 – p0)/n) = (0.45 – 0.5)/√(0.5(0.5)/1679) = –4.10. The sample proportion is 4.1 standard errors below the null hypothesis value.
4. The P-value ≈ 0 is the probability of obtaining a sample proportion at least as extreme as the one observed, if the null hypothesis is true.
5. Since the P-value is approximately 0, the sample data support the alternative hypothesis. There is very strong evidence that the percentage of 18-29-year-olds who pray daily is not 50%.
9.17 Another test of astrology
a) Let p be the proportion of adults who guess correctly. H0: p = 1/3; Ha: p > 1/3
b) p̂ = 28/83 = 0.337. The standard error is √(p0(1 – p0)/n) = √(0.333(1 – 0.333)/83) = 0.052, so z = (0.337 – 0.333)/0.052 = 0.08.
c) The P-value is 0.47. If the null hypothesis were true, the probability would be 0.47 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct. I would not conclude that people are more likely to select their "correct" horoscope than if they were randomly guessing.
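The one-proportion z test used throughout this section (for example in 9.16 and 9.17) follows a single pattern, sketched below. This is illustrative only and assumes SciPy is available for the normal tail probabilities; the helper name prop_z_test is my own.

```python
from math import sqrt
from scipy.stats import norm

def prop_z_test(successes, n, p0, alternative="two-sided"):
    """Large-sample z test for a proportion; the se uses the null value p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    if alternative == "greater":
        p_value = norm.sf(z)            # right-tail probability
    elif alternative == "less":
        p_value = norm.cdf(z)           # left-tail probability
    else:
        p_value = 2 * norm.sf(abs(z))   # two-sided
    return z, p_value

# Exercise 9.17: 28 of 83 adults pick their own horoscope, H0: p = 1/3, Ha: p > 1/3
print(prop_z_test(28, 83, 1/3, alternative="greater"))   # z about 0.08, P-value about 0.47
```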
Chapter 9: Statistical Inference: Significance Tests About Hypotheses 177 9.18 Opinion on fracking a year earlier a) Let p be the proportion of the population that opposes increased use of fracking. H 0 : p 0.5; H a : p 0.5 b)
pˆ 740 /1506 0.491; The standard error is
p0 (1 p0 ) n 0.50(1 0.50) 1506 0.0129, so
0.491 0.50 0.670; the sample proportion of 0.491 is less than one standard error less than the 0.0129 null hypothesis value of 0.50. c) The P-value is 0.251. Because the P-value is larger than the significance level of 0.05, we do not reject the null hypothesis. There is no evidence from the survey that in 2013 those opposing fracking are in the minority. The probability would be 0.251 of getting a test statistic at least as extreme as the value observed if the null hypothesis were true, and the population proportion were 0.50. d) np0 n(1 p0 ) 1506(0.5) 735 15; The sample of 1506 respondents must be a random sample representative of the U.S. population in 2013 (which it was). e) The P-value for the two-sided alternative is 2(0.251) = 0.502. 9.19 Testing a headache remedy z
a)
pˆ 22 / 30 0.733; The standard error is
p0 (1 p0 ) n 0.50(1 0.50) 30 0.091, so
0.733 0.50 2.56. 0.091 b) When we look up the P-value, we find that the proportion beyond that z-score is 0.005. Doubled to include both tails, the P-value is 0.01. If the null hypothesis were true, the probability would be 0.01 of getting a test statistic at least as extreme as the value observed. We have strong evidence that the population proportion of children who had more pain relief with the drug differs from 0.50. c) We have to meet three assumptions to use this test. The data must be categorical, the data must be obtained using randomization, and the sample size must be large enough that the sampling distribution of the sample proportion is approximately normal (i.e., expected successes and failures both at least fifteen under H0). np0 n(1 p0 ) 30(0.5) 15 15, But in this case, the data were not obtained using randomization, but rather from a convenience sample which might not be representative of the population. 9.20 Gender bias in selecting managers z
a)
Let p be the probability that the company selects a female (H 0 : p 0.4). No effect here means a probability of selecting females that is in accordance with the proportion of eligible females. (Note: It is also correct to let p be the probability of selecting a male (H 0 : p 0.6).
b) There is a gender bias if p is different (i.e., either smaller or larger) than 0.4, so we test H 0 : p 0.4 and H a : p 0.4. (Or, if p was selected as the probability of selecting a male in (a), then we have H 0 : p 0.6 and H a : p 0.6. ) c)
The large-sample analysis is justified because the expected successes and failures are both at least fifteen under H0, np0 40(0.4) 16 15 and n(1 p0 ) 40(0.6) 24 15. pˆ 12 / 40 0.30; The
0.3 0.4 1.29. 0.077 d) The P-value in the table refers to the alternative hypothesis (H a : p 0.4). (Note: If you chose p as the probability of selecting a male in (a) and tested the alternative (H a : p 0.6) , it will have the same Pvalue.) The P-value of 0.1967 is large. The sample proportion of 0.3 falls not too far (1.29 standard errors) from the hypothesized value of 0.4, indicating no unusual data in light of the null hypothesis. If the null-hypothesis were true, observing a test statistic (or sample proportion) this extreme or even more extreme is not that unlikely (19.7%). e) Because the P-value of 0.1967 is larger than the significance level, we would not reject the null hypothesis. There is insufficient evidence that the proportion of females selected for management training is different from the proportion of 40% eligible for that training.
standard error is
p0 (1 p0 ) n 0.40(1 0.40) 40 0.077, so z
178 Statistics: The Art and Science of Learning from Data, 4th edition 9.21 Gender discrimination Plausible values for probability of a female to be selected range from 0.16 to 0.44. This is in accordance with the decision reached in (e) of the previous exercise not to reject the null hypothesis (H 0 : p 0.4) in favor of the alternative hypothesis (H a : p 0.4) because 0.4 is a plausible value for that probability. 9.22 Garlic to repel ticks a) The relevant variable is whether garlic or placebo is more effective, and the parameter is the population proportion, p, those for whom garlic is more effective than placebo. b) H 0 : p 0.5 and H a : p 0.5; the sample size is adequate because there are at least 15 successes (37 with garlic more effective) and failures (29 with placebo more effective). c) pˆ 37 / 66 0.561; The standard error is p0 (1 p0 ) n 0.50(1 0.50) 66 0.062, so 0.561 0.50 0.984. 0.062 d) The P-value is 0.33. This P-value is not that extreme. If the null hypothesis were true, the probability would be 0.33 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct. Because the probability is 0.33 that we would observe our test statistic or one more extreme due to random variation, there is not strong evidence that the population proportion which would have fewer tick bites with garlic versus placebo differs from 0.50. 9.23 Exit-poll predictions a) The variable is whether someone voted for Brown. The parameter of interest is the proportion, p, among all California voters who voted for Brown b) H 0 : p 0.5 and H a : p 0.5; The first assumption is that the data are categorical; voting status for Brown is categorical. Second, the sample must be random; we will assume that the sample is random. Third, the sample must be sufficiently large to assume that the sampling distribution of the sample proportion is approximately normal: np0 n(1 p0 ) 3889(0.5) 1944.5 15. z
c)
z = 3.87 and the P-value = 0.0001. Assuming p = 0.50, the probability of obtaining a sample proportion where 53.1% or more of the voters voted for Brown or the other extreme, less than 46.9% (since two-sided alternative) is less than 0.1%. d) Since the P-value < 0.05, we reject the null hypothesis. There is very strong evidence that the proportion of voters voting for Brown is different from 50%. Our sample percentage of 53.1% indicated it is higher. We predict that Brown will win the election. 9.24 Which cola? a) The test statistic (“Z-Value”) is calculated by taking the difference between the sample proportion and the null proportion and dividing it by the standard error. b) We get the “P-value” by looking up the “Z-value” in Table A or using technology. We have to determine the two-tail probability from the standard normal distribution below –1.286 and above 1.286. The P-value of 0.20 tells us that if the null hypothesis were true, a proportion of 0.20 of samples would fall at least this far from the null hypothesis proportion of 0.50. This is not very extreme; it is plausible that the null hypothesis is correct, and that Coke is not preferred to Pepsi. c) It does not make sense to accept the null hypothesis. It is possible that there is a real difference in the population that we are not detecting in our test (perhaps because the sample size is not very large), and we can never accept a null hypothesis. A confidence interval shows that 0.50 is one of many plausible values. d) The 95% confidence interval tells us the range of plausible values, whereas the test merely tells us that 0.50 is plausible. 9.25 How to sell a burger 1) Assumptions: The data are categorical (higher sale with coupons versus higher sales with posters); we’ll assume the data are obtained randomly; the expected successes and failures are both at least fifteen under H0; np0 n(1 p0 ) 50(0.5) 25 15. 2) Hypotheses: H 0 : p 0.5; H a : p 0.5
Chapter 9: Statistical Inference: Significance Tests About Hypotheses 179 9.25 (continued) 3) Test Statistic: z
0.56 0.50 0.5(1 0.5) / 50
0.85
4) P-value: 0.40 5) Conclusion: If the null hypothesis were true, the probability would be 0.40 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct, and that coupons do not lead to higher sales than do posters. ♦♦9.26 A binomial headache This P-value gives strong evidence against the null hypothesis. It would be very unlikely to have a sample proportion of 1.00 if the actual population proportion were 0.50. ♦♦9.27 P-value for small samples a) This has the binomial distribution because there are two possible outcomes, each trial holds the same probability of success, and the n trials are independent. b) The P-value represents the probability of observing the test statistic x = 22, or a value even more extreme if the population proportion is 1/7. Because the random variable X is binomial, the P-value is P(x = 22) + P(x = 23)+ … + P(x = 54).
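For the small-sample situation in 9.27, the exact binomial P-value can be computed directly rather than approximated. A sketch, assuming SciPy is available; it uses the binomial setup described in the exercise.

```python
from scipy.stats import binom

# Exercise 9.27: X ~ Binomial(n = 54, p = 1/7) under H0; observed x = 22.
# The P-value for Ha: p > 1/7 is P(X >= 22) = P(22) + P(23) + ... + P(54).
p_value = binom.sf(21, n=54, p=1/7)    # sf(21) = P(X > 21) = P(X >= 22)
print(p_value)                         # an extremely small probability
```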
Section 9.3: Significance Tests About Means 9.28 Which t has P-value = 0.05? a) t = –2.145 or t = 2.145 b) t = 1.762 c) t = –1.762 9.29 Practice mechanics of a test a) 0.0026 or 0.03 b) 0.013 c) 0.987 9.30 Effect of n The P-value would be larger when t = 1.20 than when t = 2.40 because the t-value of 1.20 is less extreme. 9.31 Low-carbohydrate diet a) P-value 0 b) The P-value is interpreted as the proportion of samples that would have a test statistic value at least as extreme as –8.2 (in either direction), given that the null hypothesis is true. c) Yes, both the P-value and the 95% confidence interval lead to the same conclusion about H0, namely to reject in favor of the alternative. At approximately 0, the P-value is below any reasonable significance level for the test. The confidence interval does not contain the hypothesized value of 0, indicating that the null hypothesis should be rejected in favor of the alternative. 9.32 Female work week a) The relevant variable is the number of hours worked by females in the previous week; the parameter of interest is the mean number of hours, , worked by females in the United States in the previous week.
b)
H 0 : 40; H a : 40
t
37.0 40
4.8; P-value 0. The P-value is the probability of observing a test statistic this 15.1 583 extreme or even more extreme (equivalently, a sample mean this far away from the null value of 40 or even further away, in both directions) when the null hypothesis is true, the P-value is very small (less than 0.1%). d) Because the P-value is less than the significance level of 0.01, we have sufficient evidence to reject the null hypothesis and conclude that the mean working week for females in the United States is different from 40 hours.
c)
180 Statistics: The Art and Science of Learning from Data, 4th edition 9.33 Facebook friends a) The relevant variable is the number of friends on Facebook; the parameter of interest is the mean number of friends, , for students at Williams that use Facebook.
b)
H 0 : 100; H a : 100 t
x 0
2000 20002 1000 20002 3000 20002 2000 20002
s
4 1
se s
b)
t
122.7 100
1.14; The sample mean of 122.7 falls 1.14 standard errors above the null s n 71.8 13 hypothesis value of 100. d) P-value = 0.138. The P-value is the probability of observing a test statistic this large or larger (equivalently a sample mean that is this high or higher) when the null hypothesis is true. In this case, the P-value is not that small (about 14%). This does not provide evidence to reject the null hypothesis. There is insufficient evidence to conclude that the mean number of Facebook friends is larger than 100. 9.34 Lake pollution 2000 1000 3000 2000 8000 a) x 2000 4 4
c)
n 816.5
x 0 s
n
2, 000,000 816.5 3
4 408.25
2000 1000 816.5
4
2.45
c)
The P-value is 0.046 for a one-sided test. This is smaller than 0.05, so we have enough evidence to reject the null hypothesis at a significance level of 0.05. We have relatively strong evidence that the wastewater limit is being exceeded. d) The one-sided analysis in (b) implicitly tests the broader null hypothesis that 1000. We know this because if it would be unusual to get a sample mean of 2000 if the population mean were 1000, we know that it would be even more unusual to get this sample mean if the population mean were less than 1000. 9.35 Weight change for controls 1) Assumptions: Random sample on a quantitative variable having a normal population distribution. Here, the data are not likely produced using randomization. Population distribution may be skewed, but the test is two-sided so it is robust to a violation of the normal population assumption. 2) Hypotheses: H 0 : 0; H a : 0 3) Test statistic: t
x 0 s
n
0.5 0 8.0
26
0.32
4) P-value: 0.75 5) Conclusion: If the null hypothesis were true, the probability would be 0.75 of getting a test statistic at least as extreme as the value observed. Based on this large P-value, it is plausible that the null hypothesis is correct, and that there was no change in mean weight in the control group.
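The five-step t tests in this section (9.32 through 9.38) all reduce to the same computation from summary statistics. A sketch, assuming SciPy is available for the t tail probability; the numbers plugged in are those of Exercise 9.35, and the helper name is my own.

```python
from math import sqrt
from scipy.stats import t

def one_sample_t(xbar, s, n, mu0):
    """Two-sided one-sample t test from summary statistics."""
    se = s / sqrt(n)
    t_stat = (xbar - mu0) / se
    p_value = 2 * t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

# Exercise 9.35: control group, xbar = 0.5, s = 8.0, n = 26, H0: mu = 0
print(one_sample_t(0.5, 8.0, 26, 0))   # t about 0.32, P-value about 0.75
```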
Chapter 9: Statistical Inference: Significance Tests About Hypotheses 181 9.36 Crossover study a) The difference scores are 40, 15, 90, 50, 30, 70, 20, 30, –35, 40, 30, 80, 130. The sample size is small so we cannot tell too much from the plot, but there is no evidence of severe non-normality even though the distribution is right-skewed. Dotplot of Difference Between F and S
[Dotplot of the 13 differences between F and S, which range from –35 to 130]
b) 1) Assumptions: The data (PEF change scores) are randomly obtained from a normal population distribution. Here, the data are not likely produced using randomization, but are likely a convenience sample. The two-sided test is robust if the population distribution is not normal. 2) Hypotheses: H 0 : 0; H a : 0 3) Test statistic: t
x 0 s
n
45.4 0 40.6 / 13
4.03
4) P-value: 0.002 5) Conclusion: If the null hypothesis were true, the probability would be 0.002 of getting a test statistic at least as extreme as the value observed. There is strong evidence that PEF levels were lower with salbutamol than with formoterol. c) The assumption of random production of does not seem valid for this example. A convenience sample limits reliability in applying this inference to the population at large. 9.37 Too little or too much wine? 1) Assumptions: The data are produced using randomization, from a normal population distribution. 2) Hypotheses: H 0 : 5.1; H a : 5.1 3) Test statistic: t
5.065 5.1 0.0870
4
0.80
4) P-value: 0.48 5) Conclusion: If the null hypothesis were true, the probability would be 0.48 of getting a test statistic at least as extreme as the value observed. There is not enough evidence to support that the true mean differs from 5.1 ounces. 9.38 Selling a burger 1) Assumptions: The data are produced using randomization, from a normal population distribution. The two-sided test is robust for the normality assumption. 2) Hypotheses: H 0 : 0; H a : 0 3) Test statistic: t
x 0 s
n
3000 0 4000 / 10
2.37
4) P-value: 0.04 5) Conclusion: If the null hypothesis were true, the probability would be 0.04 of getting a test statistic at least as extreme as the value observed. Because the P-value of 0.04 < 0.05, there is sufficient evidence that the coupons led to higher sales than did the outside posters. Copyright © 2017 Pearson Education, Inc.
182 Statistics: The Art and Science of Learning from Data, 4th edition 9.39 Assumptions important? a) The confidence interval does not include 0 and also indicates that coupons led to higher sales than did the outside posters. b) A one-sided test might be problematic. If the population distribution is highly non-normal (such as very skewed) the method is not robust for a one-sided test. 9.40 Anorexia in teenage girls a) Most of the data fall between 4 and 14. The sample size is small so we cannot tell too much from the plot, but there is no evidence of severe non-normality. Dotplot of Weight Change
[Dotplot of the weight changes, on an axis from –4 to 20]
b) Technology verifies these statistics. c) 1) Assumptions: The data are quantitative and are produced randomly and the population distribution should be approximately normal. 2) Hypotheses: H 0 : 0; H a : 0 3) Test statistic: t
x 0 s
n
7.29 0 7.18 / 17
4.19
4) P-value: 0.001 5) This extreme P-value suggests that we have strong evidence against the null hypothesis that family therapy has no effect. If the null hypothesis were true, the probability would be only 0.001 of getting a test statistic at least as extreme as the value observed. 9.41 Sensitivity study From technology, after changing 20.9 to 2.9, the test statistic changes from 2.21 to 1.98, and the P-value changes from 0.04 to 0.06. The test statistic is less extreme, and the P-value is no longer smaller than 0.05. We can no longer reject the null hypothesis. The conclusion does depend on the single observation of 20.9. 9.42 Test and CI Results of 99% confidence intervals are consistent with results of two-sided tests with significance levels of 0.01. A confidence interval includes the most plausible values for the population mean to a 99% degree of confidence. If the test rejects the null hypothesis with significance level 0.01, then the 99% confidence interval does not contain the value in the null hypothesis.
Section 9.4: Decisions and Types of Errors in Significance Tests 9.43 Dr. Dog a) For the significance level of 0.05, we would reject the null hypothesis. We have strong evidence that dogs can detect urine from bladder cancer patients at a rate higher than would be expected by chance. b) If we made an error, it was a Type I error. A Type I error would indicate that we concluded that dogs could detect urine from bladder cancer patients, but they really were not able to do so any better than chance.
Chapter 9: Statistical Inference: Significance Tests About Hypotheses 183 9.44 Error probability a) The probability of Type I error would be 0.05. b) If this test resulted in a decision error, it was a Type I error. 9.45 Fracking errors a) A Type I error would occur if we concluded that those opposing fracking are in the minority when in fact they are not. b) A Type II error would occur if we failed to reject the null hypothesis when it was false. Thus, we determined that it was plausible that the null hypothesis was correct and said that there is no evidence of a minority of those who oppose fracking when, in fact, they are in the minority. 9.46 Anorexia errors a) A Type I error would occur if we rejected the null hypothesis when it was true. Thus, we concluded that the therapy had an effect when in fact it did not. b) A Type II error would occur if we failed to reject the null hypothesis when it was false. Thus, we determined that it was plausible that the null hypothesis was correct, that the therapy might have no effect, when in fact it does. 9.47 Anorexia decision a) We would decide to reject the null. We would have strong evidence that the population mean weight change post-therapy is greater than 0. b) If this decision were in error, it would be a Type I error. c) If the significance level were instead 0.01, we would decide not to reject the null hypothesis. If this decision were in error, it would be a Type II error. 9.48 Errors in the courtroom a) If H0 is rejected, we conclude that the defendant is guilty. b) A Type I error would result in finding the defendant guilty when he/she is actually innocent. c) If we fail to reject H0, the defendant is found not guilty. d) A Type II error would result in failing to convict a defendant who is actually guilty. 9.49 Errors in medicine a) If H0 is rejected, we conclude that the new drug is not safe. b) A Type I error would result in finding the new drug is not safe when it actually is safe. c) If we fail to reject H0, we conclude that the drug is safe. d) A Type II error would result in failing to find that the new drug is not safe when it actually is not safe. 9.50 Decision errors in medical diagnostic testing a) A Type I error is a false positive because we have rejected the null hypothesis that there is no disease, but we were wrong. The woman in fact does not have breast cancer. The consequence would be that the woman would have treatment, or at least further testing, when she did not need any. b) A Type II error is a false negative because we have failed to reject the null hypothesis that there is no disease, but we were wrong. The woman in fact does have breast cancer. The consequence would be failing to detect cancer and treat the cancer when it actually exists. c) The disadvantage of this tactic is that more women who do have breast cancer will have false negative tests and not receive necessary treatment. 9.51 Detecting prostate cancer a) A Type I error would occur if we diagnose prostate cancer when there is none. This would be that a man would have treatment, or at least further testing, when he did not need any. b) A Type II error would occur if we fail to diagnose prostate cancer when there is prostate cancer. This would mean that a man who had prostate cancer would not receive necessary treatment. c) The probability of 1 in 4 refers to the probability of a Type II error. d) The 2/3 refers to the probability that someone does not have prostate cancer given that he received a positive result. 
The probability of a Type I error, on the other hand, refers to the probability that someone will receive a positive result, given that he does not have prostate cancer.
184 Statistics: The Art and Science of Learning from Data, 4th edition 9.52 Which error is worse? a) When rejecting the null results in the death penalty, a Type I error is worse than a Type II error. With a Type II error, a guilty man or woman goes free, whereas with a Type I error, an innocent man or woman is put to death. b) When rejecting the null hypothesis results in treatment for breast cancer, a Type II error is worse than a Type I error. With a Type I error, someone might receive additional tests (e.g., biopsy) before ruling out breast cancer, but with a Type II error, someone might not receive life-saving treatment when they need it.
Section 9.5: Limitations of Significance Tests 9.53 Misleading summaries? a) Researcher A: P-value = 2P(Z > 2.0) = 0.046 b) Researcher B: P-value = 2P(Z > 1.90) = 0.057 c) Researcher A’s result has a P-value less than 0.05; thus, it is “statistically significant.” Researcher B’s P-value is not less than 0.05, and is not, therefore, “statistically significant.” Results that are not different from one another in practical terms might lead to different conclusions if based on statistical significance alone. d) If we do not see these two P-values, but merely know that one is statistically significant and one is not, we are not able to see that the P-values are so similar. e) For A, the 95% confidence interval is pˆ z.025 pˆ (1 pˆ ) n 0.550 1.96 0.550(1 0.550) 400, or
(0.501, 0.599). For B: the 95% confidence interval is pˆ z.025 pˆ (1 pˆ ) n 0.5475 1.96 0.5475(1 0.5475) 400, or (0.499, 0.596). This method shows the enormous amount of overlap between the two confidence intervals. The plausible values for the population proportions are very similar in the two cases, which we would not realize by merely reporting whether the null was rejected in a test. 9.54 Practical significance a)
Test statistic: t
x 0 s
n
498 500 100
25,000
3.16
b) P-value = 2P(Z < –3.16) = 0.0016 c) This result is statistically significant because the P-value is very small, but it is not practically significant because the sample mean of 498 is very close to the null hypothesis mean of 500. 9.55 Effect of n a)
Test statistic: t
x 0 s
n
4.09 4.0 1.43
25
0.31
b) This test statistic is associated with a P-value of 0.76. We cannot reject the null hypothesis; it is plausible that the null hypothesis is correct and that the population mean score is 4.0. c)
The 95% confidence interval is x t.025 s
n 4.09 2.0639 1.43
25 , or (3.5, 4.7).
d) (i) This illustrates that a decrease in sample size increases the P-value greatly; a finding that might be statistically significant with a large sample size might not be with a small sample size. (ii) In addition, confidence intervals become wider as sample sizes become smaller. 9.56 Fishing for significance This is misleading because, with a significance level of 0.05, we would expect 5% of tests to be significant just by chance if the null hypothesis is true, and for 60 tests this is 0.05(60) = 3 tests.
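The contrast drawn in 9.53 through 9.55 between statistical and practical significance is easy to see numerically. A sketch, assuming SciPy is available; it reproduces the two P-values discussed in 9.53.

```python
from math import sqrt
from scipy.stats import norm

def two_sided_p(p_hat, p0, n):
    """z statistic and two-sided P-value for a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, 2 * norm.sf(abs(z))

print(two_sided_p(0.5500, 0.5, 400))   # z = 2.0, P about 0.046 (Researcher A)
print(two_sided_p(0.5475, 0.5, 400))   # z = 1.9, P about 0.057 (Researcher B)
```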
Chapter 9: Statistical Inference: Significance Tests About Hypotheses 185 9.57 Selective reporting If we report only results that are “statistically significant,” these are the only ones of which the public becomes aware. There might be many studies on the same subject that did not reject the null and did not get published. The public is not able to identify situations in which only one of twenty or so studies on the same phenomenon had a significant finding. In such a case, the finding that did get published might be an example of a Type I error. 9.58 How many medical discoveries are Type I errors? The following tree diagram is based on 100 studies. Decision: Reject H0? True effect? Yes (14) Yes (20) No (6) 100
Yes (4) No (80)
No (76) The proportion of actual Type I errors (of cases where the null is rejected) would be about 4/(4 + 14) = 0.22. 9.59 Interpret medical research studies a) This does not mean the population proportion was exactly the same for those using Claritin and placebo. It just means that the difference was not big enough to conclude that it was statistically significant. We could tell someone who has not studied statistics that there could be a small difference between Claritin and placebo; if so, it’s possible that the small difference occurred just as a result of random variation, or it’s possible that it’s real. Until the difference is large enough based on a given sample size, we’re not willing to say that there is a significant difference. b) Research that suggests an impact of a therapy or drug tends to get the most media coverage if it’s a very large difference. It’s quite possible that studies that found a smaller difference did not get media coverage, or studies that failed to find a difference did not get published at all.
Section 9.6: The Likelihood of a Type II Error and the Power of a Test 9.60 Find P(Type II error) a) A one-tailed test would have a z-score of 1.645 at the cutoff. Here, the standard error would be pˆ (1 pˆ ) n 0.5(1 0.5) 100 0.050. The value 1.645 standard errors above 0.50 is 0.50 +
1.645(0.050) = 0.582. b)
b) z = (0.582 − 0.60)/√(0.60(1 − 0.60)/100) = −0.37; Table A tells us that a proportion of 0.36 falls below this z-score; thus, P(Type II error) = 0.36.
9.61 Gender bias in selecting managers
a) The cutoff z-score for a 0.05 significance level and a one-tailed test (in a negative direction) is –1.645. Here, the standard error would be √(p0(1 − p0)/n) = √(0.4(1 − 0.4)/50) = 0.069. The value 1.645 standard errors below 0.40 is 0.40 − 1.645(0.069) = 0.286.
b) The standard error would now be √(p(1 − p)/n) = √(0.2(1 − 0.2)/50) = 0.0566. If p = 0.20, the z-score for 0.286 in reference to 0.20 is z = (0.286 − 0.20)/0.0566 = 1.52. If we look this z-score up on a table, we find that the proportion of this curve that is not in the rejection area is 0.06. Thus, the Type II error has probability 0.06.
9.62 Balancing Type I and Type II errors
a) The cutoff for a 0.01 significance level and a one-tailed test is 2.33. Here, the standard error would be √(p0(1 − p0)/n) = √((1/3)(1 − 1/3)/116) = 0.0438. The value 2.33 standard errors above 0.333 is 0.333 + 2.33(0.0438) = 0.435.
b) If p = 0.50, the z-score for 0.435 in reference to 0.50 is z = (0.435 − 0.5)/√(0.5(1 − 0.5)/116) = −1.40. If we look this z-score up on a table, we find that the proportion of this curve that is not in the rejection area is 0.08.
9.63 P(Type II error) large when p close to H0
a) When p = 0.35, using the rejection cutoff 0.405 (= 1/3 + 1.645(0.0438) for a 0.05 significance level), z = (0.405 − 0.35)/√(0.35(1 − 0.35)/116) = 1.24. P(Type II error) = P(z < 1.24) = 0.89.
b) When the parameter value is close to the value in H0, there is not much difference between these two values. Thus, the chances of being able to reject the null hypothesis are smaller, and the chances of failing to reject are larger. If we fail to reject the null hypothesis, but the null hypothesis is not true, this is a Type II error. As the parameter value moves away from H0, the chances of rejecting the null hypothesis go up, and the chances of failing to reject (and therefore the chances of a Type II error) go down.
9.64 Type II error with two-sided Ha
a) The standard error is √(p0(1 − p0)/n) = √((1/3)(1 − 1/3)/116) = 0.0438. For Ha, a test statistic of z = 1.96 has a P-value (two-tail probability) of 0.05. We reject H0 when |p̂ − 1/3| ≥ 1.96(se) = 1.96(0.0438) = 0.086, hence we need p̂ ≥ 1/3 + 0.086 or p̂ ≤ 1/3 − 0.086, that is, p̂ ≥ 0.419 or p̂ ≤ 0.248. When H0 is false, a Type II error occurs if 0.248 < p̂ < 0.419.
b) We can calculate z-scores for each of these proportions. z = (0.248 − 0.50)/0.0464 = −5.43; the probability that p̂ is less than this z-score is 0. z = (0.419 − 0.50)/0.0464 = −1.75; the probability that p̂ is greater than this z-score is 0.96. The probability of a Type II error is the portion of the curve (for the parameter 0.50) that is not over the rejection area, so P(Type II error) = 1 – 0.96 = 0.04.
9.65 Power for infertility trial
a) If p = 0.2 is true, then with 60% probability, the proposed trial will lead to a rejection of the null hypothesis H0: p = 0.1 in favor of Ha: p > 0.1.
b) Not rejecting H0: p = 0.1 when, in fact, p > 0.1.
c) P(Type II error when p = 0.2) = 1 – P(not committing a Type II error when p = 0.2) = 1 – (Power when p = 0.2) = 1 – 0.60 = 0.4.
9.66 Exploring Type II errors a) As n increases, the probability of a Type II error decreases. b) The sample size is around n = 85. c) The probability will decrease. d) The power increases.
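The Type II error probabilities in Exercises 9.60–9.63 all follow the same recipe: find the rejection cutoff under H0, then find the probability of not crossing that cutoff under the true value of p. The following is a minimal Python sketch, assuming scipy is available; the numbers shown are those of Exercise 9.60.

```python
from math import sqrt
from scipy.stats import norm

def prob_type_II(p0, p_true, n, alpha=0.05):
    """P(Type II error) for a one-sided test of H0: p = p0 vs Ha: p > p0."""
    se0 = sqrt(p0 * (1 - p0) / n)              # standard error under H0
    cutoff = p0 + norm.ppf(1 - alpha) * se0    # reject H0 when p-hat exceeds this value
    se_true = sqrt(p_true * (1 - p_true) / n)  # standard error under the true p
    # A Type II error occurs when p-hat fails to cross the cutoff
    return norm.cdf((cutoff - p_true) / se_true)

print(round(prob_type_II(0.50, 0.60, 100), 2))  # 0.36, as in Exercise 9.60
```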
Chapter Problems: Practicing the Basics
9.67 H0 or Ha? a) null hypothesis b) alternative hypothesis c) alternative hypothesis d) null hypothesis
9.68 Write H0 and Ha
a) p is the proportion of fathers taking parental leave. H0: p = 0.15; Ha: p ≠ 0.15
b) μ is the mean CO2 emission from cars purchased last year. H0: μ = 160; Ha: μ ≠ 160
c) μ is the mean retirement age for female workers. H0: μ = 60; Ha: μ ≠ 60
9.69 ESP 1) Assumptions: The data are categorical (correct versus incorrect guesses) and are obtained randomly. The expected successes and failures are both fifteen under H0; np0 = n(1 − p0) = 30(0.5) = 15, so the sample size condition is met. 2) Hypotheses: H0: p = 0.5; Ha: p > 0.5, where p is the probability of a correct guess.
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (18/30 − 0.50)/√(0.5(1 − 0.5)/30) = 1.1
4) P-value: 0.137. 5) Conclusion: If the null hypothesis were true, the probability would be 0.137 of getting a test statistic at least as extreme as the value observed. Fail to reject H0, there is not strong evidence with this P-value that the probability of a correct guesses is higher than 0.50. 9.70 Free throw accuracy a) Assumptions: The data are categorical (make first only versus make second only) and are obtained randomly; the expected successes and failures are both at least fifteen under H0; np0 n(1 p0 ) 82(0.5) 41 15. Hypotheses: H 0 : p 0.5; H a : p 0.5, p is the proportion of pairs of shots in which only one shot was made in which the first shot went in. pˆ p0 0.415 0.50 1.55 b) Test statistic: z p0 (1 p0 ) n 0.5(1 0.5) 82 c)
The P-value is 0.12; If the null hypothesis were true, the probability would be 0.12 of getting a test statistic at least as extreme as the value observed. Fail to reject H0, it is plausible that the null hypothesis is correct and that the population proportion of first free shots made (out of all pairs in which only one shot was made) is 0.50. 9.71 Brown or Whitman? a) 1) Assumptions: The data are categorical (Brown or not Brown) and are obtained randomly; the expected successes and failures are both at least fifteen under H0; np0 n(1 p0 ) 650(0.5) 325 15. 2) Hypotheses: H 0 : p 0.5; H a : p 0.5, p is the population proportion of voters who prefer Brown.
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.554 − 0.5)/√(0.5(1 − 0.5)/650) = 2.75
4) P-value: 0.006 5) Conclusion: We can reject the null hypothesis at a significance level of 0.05; we have strong evidence that the population proportion of voters who chose Brown is different from 0.50. 0.56 0.5 b) If the sample size had been 50, the test statistic would have been z 0.849, and 0.5(1 0.5) 50 the P-value would have been 0.40. We could not have rejected the null hypothesis under these circumstances. c) The result of a significance test can depend on the sample size. As the sample size increases, the standard error decreases (because the sample size is the denominator of the standard error equation; dividing by a larger number leads to a smaller result). A smaller standard error leads to a larger z-score and a smaller P-value. 9.72 Protecting the environment? a) The assumptions are that the data are categorical, that they are obtained using randomization, and that the sample size is large enough that the sampling distribution of the sample proportion is approximately normal (i.e., expected successes and failures both at least fifteen under H0; np0 n(1 p0 ) 1085(0.5) 542.5 15. The data are categorical (yes, no), the text tells us that we can assume that the GSS data are randomly obtained, and the sample size is large enough. b) Hypotheses: H 0 : p 0.5; H a : p 0.5; the point estimate of p is pˆ 0.423. The value of the test statistic is –5.07. c) The P-value 0.000, indicating that p̂ is quite extreme. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. There is strong evidence that the population proportion who would answer yes is different from 0.50. d) It is not plausible that p = 0.50. The test statistic is too extreme for 0.50 to be plausible. e) The advantage of the confidence interval is that it provides a range of plausible values. 9.73 Majority supports gay marriage 1) Assumptions: The data are categorical (yes versus no); the sample is a random sample of 1690 adults; the expected number of yes and no responses are both at least 15 under H0; np0 n(1 p0 ) 1690(0.5) 854 15. 2) Hypotheses: H 0 : p 0.5; H a : p 0.5, p is the proportion of Americans agreeing that homosexuals should be able to marry. pˆ p0 955 /1690 0.5 5.35 3) Test statistic: z p0 (1 p0 ) n 0.5(1 0.5) 1690 4) P-value: 0.000 5) Conclusion: At the 0.05 significance level, we have strong evidence (P-value < 0.001) that the proportion of Americans agreeing with the statement that homosexuals should have the right to marry is different from 0.5. With a sample proportion of 0.565, a clear majority now supports gay marriage. 9.74 Plant inheritance 1) Assumptions: The data are categorical (green versus yellow) and are obtained randomly; the expected successes and failures are both at least fifteen under H a ; np0 1103(0.75) 827.25 15 and n(1 p0 ) 1103(0.25) 275.75 15. 2) Hypotheses: H 0 : p 0.75; H a : p 0.75, p is the proportion of green seedlings. 3) Test statistic: z
= (p̂ − p0)/√(p0(1 − p0)/n) = (854/1103 − 0.75)/√(0.75(1 − 0.75)/1103) = 1.86
4) P-value: 0.06 5) Conclusion: If the null hypothesis were true, the probability would be 0.06 of getting a test statistic at least as extreme as the value observed. Fail to reject H0, it is plausible that the null hypothesis is correct.
9.75 Ellsberg paradox
a) Hypotheses: H0: p = 0.50; Ha: p ≠ 0.50, where p is the proportion of people who would pick Box A.
b) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (36/40 − 0.5)/√(0.5(1 − 0.5)/40) = 5.06; P-value ≈ 0.000; there is very strong
evidence that the population proportion that chooses Box A is not 0.50. Rather, it appears to be much higher than 0.50. Box A seems to be preferred.
9.76 Start a hockey team 1) Assumptions: The data are categorical (male versus female) and are obtained randomly; the expected successes and failures are both at least fifteen under H0; np0 = 100(0.55) = 55 ≥ 15 and n(1 − p0) = 100(0.45) = 45 ≥ 15. 2) Hypotheses: H0: p = 0.55; Ha: p > 0.55, where p is the proportion of university students that are male.
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (80/100 − 0.55)/√(0.55(1 − 0.55)/100) = 5.0
4) P-value: 0.0000 5) Conclusion: If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. There is extremely strong evidence that the probability of selecting a male was higher than 0.55 and thus the sample was not random. 9.77 Interest charges on credit card The output first tells us that we are testing “p = 0.50 versus not p = 0.50.” That tells us that the null hypothesis is that the two cards are preferred equally and the alternative hypothesis is that one is preferred more than the other. The printout then tells us that X is 40. This is the number of people in the sample who preferred the card with the annual cost. 100 is the “N,” the size of the whole sample. The “Sample p” of 0.40000 (rounds to 0.40) is the proportion of the sample that preferred the card with the annual cost. The “95.0% CI” is the 95% confidence interval, the range of plausible values for the population proportion. The “Z-Value” is the test statistic. The sample proportion is 2.00 standard errors below the proportion as per the null hypothesis, 0.50. Finally, the “P-Value” tells us that if the null hypothesis were true, the proportion 0.0455 of samples would fall at least this far from the null hypothesis proportion of 0.50. This is barely extreme enough to reject the null hypothesis with a significance level of 0.05. We have evidence that the population proportion of people who prefer the card with the annual cost is below 0.50. The majority of the customers seem to prefer the card without the annual cost, but with the higher interest rate. 9.78 Jurors and gender a)
Hypotheses: H0: p = 0.53; Ha: p ≠ 0.53, where p is the proportion of jurors who are women.
b) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.125 − 0.53)/√(0.53(1 − 0.53)/40) = −5.1
c)
The P-value is 0.000. If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. d) This P-value is more extreme than the significance level of 0.01. We can reject the null hypothesis; we have strong evidence that women are not being selected in numbers proportionate to their representation in the jury pool. 9.79 Type I and Type II errors a) In the previous exercise, a Type I error would have occurred if we had rejected the null hypothesis, concluding that women were being passed over for jury duty, when they really were not. A Type II error would occur if we had failed to reject the null, but women really were being picked disproportionate to their representation in the jury pool.
b) If we made an error, it was a Type I error.
9.80 Levine = author? a) 1) Assumptions: The data are categorical (whereas versus not whereas) and are obtained randomly; the expected successes and failures are both at least fifteen under H0; np0 = 300(0.10) = 30 ≥ 15 and n(1 − p0) = 300(0.90) = 270 ≥ 15.
2) Hypotheses: H0: p = 0.10; Ha: p ≠ 0.10, where p is the proportion of sentences beginning with whereas. 3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.00 − 0.10)/√(0.10(1 − 0.10)/300) = −5.8
4) P-value: 0.000 5) Conclusion: If the population proportion is 0.10, we would expect a sample proportion this extreme almost none of the time. Reject H0, it seems unlikely that Levine wrote this document. b) The assumptions for this conclusion to be valid are in part (a)-(1). 9.81 Practice steps of test for mean a)
(i) x̄ = (3 + 7 + 3 + 3 + 0 + 8 + 1 + 12 + 5 + 8)/10 = 50/10 = 5.0
(ii) s = √[Σ(x − x̄)²/(n − 1)] = √[((3 − 5)² + (7 − 5)² + ⋯ + (5 − 5)² + (8 − 5)²)/(10 − 1)] = √(124/9) = 3.71
(iii) se = s/√n = 3.71/√10 = 1.17
(iv) t = (x̄ − μ0)/(s/√n) = (5.0 − 0)/(3.71/√10) = 4.26
(v) df = n − 1 = 10 − 1 = 9
b) The P-value of 0.002 is less than the significance level of 0.05. We can reject the null hypothesis. We have very strong evidence that the population mean is not 0. c) If we had used the one-tailed test, Ha: μ > 0, the P-value would be 0.002/2 = 0.001, also less than the significance level of 0.05. Again, we have very strong evidence that the population mean is positive. d) If we had used the one-tailed test, Ha: μ < 0, the P-value would be 1 – 0.001 = 0.999, far from the significance level of 0.05. It would be plausible that the null hypothesis is correct; we cannot conclude that the population mean is negative.
9.82 Two ideal children? a)
The test statistic value is t = (x̄ − μ0)/(s/√n) = (2.518 − 2)/(0.875/√1417) = 22.29.
b) The P-value is the probability of observing a t test statistic as extreme as the one observed (22.29) given the null hypothesis is true. The P-value is extremely small, so observing such a test statistic (and sample mean) is extremely unlikely if H0 is true. We reject H0 and conclude that the ideal number of children is different from two. 9.83 Hours at work a)
Hypotheses: H0: μ = 40; Ha: μ ≠ 40
b) (i) SE Mean, 0.445, is the standard error of the sampling distribution of the mean. (ii) T = 0.60 is the test statistic, the distance (measured in standard errors) of the sample mean of 40.27 from the null value of 40 hours. (iii) The P-value, 0.548, is the probability of observing a sample mean of 40.27 or more extreme (on either side) when the null hypothesis is true. This is rather large, so the sample mean is not unusually extreme if H0 is true. There is no evidence to reject the null hypothesis. c) The confidence interval shows that 40 is a plausible value for the hours in a workweek in the population. This is consistent with the result of the hypothesis test, which does not reject H0: μ = 40.
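The one-sample t tests worked by hand in Exercises 9.81–9.83 can be checked with software. Below is a minimal Python sketch (not part of the original solutions), assuming numpy and scipy are available, using the ten observations listed in Exercise 9.81.

```python
import numpy as np
from scipy import stats

# The ten observations from Exercise 9.81, testing H0: mu = 0
x = np.array([3, 7, 3, 3, 0, 8, 1, 12, 5, 8])

t_stat, p_two_sided = stats.ttest_1samp(x, popmean=0)
print(round(x.mean(), 1), round(x.std(ddof=1), 2))  # 5.0 and 3.71
print(round(t_stat, 2), round(p_two_sided, 3))      # 4.26 and 0.002
```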
9.84 Females liberal or conservative? 1) Assumptions: The data are quantitative, have been produced randomly, and have an approximate normal population distribution. 2) Hypotheses: H0: μ = 4.0; Ha: μ ≠ 4.0
3) Test statistic: t = (x̄ − μ0)/(s/√n) = (4.06 − 4.00)/(1.37/√1345) = 1.61 (Note: when using technology with the original data, a more precise value is t = 1.72.) 4) P-value: 0.11 (0.08 using technology.) 5) Conclusion: With a significance level of 0.05, we would not reject H0 (P-value = 0.11 > 0.05). We have insufficient evidence to support the claim that the mean rating for females has changed from 4.
9.85 Blood pressure a) 1) Assumptions: The data are quantitative, have been produced randomly, and have an approximate normal population distribution. 2) Hypotheses: H0: μ = 130; Ha: μ ≠ 130
3) Test statistic: t = (x̄ − μ0)/(s/√n) = (150 − 130)/(8.37/√6) = 5.85
4) P-value: 0.002 5) Conclusion: If the null hypothesis were true, the probability would be 0.002 of getting a test statistic at least as extreme as the value observed. There is very strong evidence that the population mean is different from 130; reject H0, we can conclude that Vincenzo Baronello’s blood pressure is not in control. b) The assumptions are outlined in Step 1 in (a). Blood pressure readings are quantitative data. These data are the last six times he monitored his blood pressure. This might be considered a random sample of possible readings for that point in time. We do not know whether the population distribution is normal, but the two-sided test is robust for violations of this assumption. 9.86 Increasing blood pressure H a : 130; the P-value is 0.002/2 = 0.001, which is more extreme. We can still conclude that the blood pressure is not in control. 9.87 Tennis balls in control? a) Technology indicates a test statistic of t = –5.5 and a P-value of 0.001. b) For a significance level of 0.05, we would conclude that the process is not in control. The machine is producing tennis balls that weigh less than they are supposed to. c) If we rejected the null hypothesis when it is in fact true, we have made a Type I error and concluded that the process is not in control when it actually is. 9.88 Catalog sales x 0 10 15 5.0; P-value 0.000 t s n 10 100 If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. There is strong evidence that mean sales for the catalog differed from the mean of $15 from past catalogs. 9.89 Wage claim false? 1) Assumptions: The data are quantitative. The data seem to have been produced using randomization. We also assume an approximately normal population distribution. 2) Hypotheses: H 0 : 500; H a : 500
3) Test statistic: t = (x̄ − μ0)/(s/√n) = (441.11 − 500)/(12.69/√9) = −13.9
4) P-value: 0.000
192 Statistics: The Art and Science of Learning from Data, 4th edition 9.89 (continued) 5) Conclusion: If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. Reject H0, there is extremely strong evidence that the population mean is different than 500; with a sample mean of 441.11, we can conclude that the mean income is less than $500 per week. 9.90 CI and test a) We can reject the null hypothesis for any of these significance levels (i.e., 0.10, 0.05, and 0.01). b) None of the intervals based on the three confidence levels (i.e., 0.90, 0.95, and 0.99) would contain 500. c) If a finding is statistically significant, then the confidence interval associated with that significance level will not include the value in H0. d) (i) A Type I error would occur if we conclude that the mean income for all senior-level assembly-line workers is different from $500 when it actually is not. (ii) A Type II error would occur if we fail to reject the null hypothesis that the mean income for all senior-level assembly-line workers is $500 when it actually is different from $500. 9.91 CI and test connection a) We can reject the null hypothesis. b) It would be a Type I error. c) A 95% confidence interval would not contain 100. When a value is rejected by a test at the 0.05 significance level, it does not fall in the 95% confidence interval. 9.92 Religious beliefs statistically significant? a) We could explain that there is a smaller than 5% chance that we would find a difference in religiosity score this extreme if there had been no change in religious beliefs. It is unlikely that there has been no change. b) It would have been informative to have the actual P-value because not only would we know that there is a smaller than 0.05 chance of finding a difference in score this extreme if there had been no change, but we would know the actual probability of finding a difference in score at least this extreme. c) We cannot include that a practically important change in religiosity has occurred because significance levels only tell us that a change has occurred beyond chance; it does not tell us the size of that change in practical terms. 9.93 How to reduce chance of error? a) The researcher can control the probability of a Type I error by choosing a smaller significance level. This will decrease the probability of a Type I error. b) If a researcher sets the probability equal to 0.00001, the probability of a Type I error is low, but it will be extremely difficult to reject the null hypothesis, even if the null hypothesis is not true. 9.94 Legal trial errors a) A Type I error in a trial setting would occur if we convicted a defendant who was not guilty. A Type II error would occur if we failed to convict a guilty defendant. b) To decrease the chance of a Type I error, we could decrease the significance level. In doing this, it is more difficult to reject the null hypothesis (i.e., find someone guilty). Thus, there will be more guilty people who are not found guilty, a Type II error. 9.95 P(Type II error) with smaller n
a)
The standard error is √(p0(1 − p0)/n) = √((1/3)(1 − 1/3)/60) = 0.061.
b) When P(Type I error) = significance level = 0.05, z = 1.645, and the value 1.645 standard errors above 1/3 is 1/3 + (1.645)(0.061) = 0.433.
c) The standard error is now √(p(1 − p)/n) = √(0.5(1 − 0.5)/60) = 0.0645, and z = (0.433 − 0.5)/0.0645 = –1.03.
The probability that p̂ falls below this z-score is 0.15. The Type II error is larger when n is smaller, because a smaller n results in a larger standard error and makes it more difficult to have a sample proportion fall in the rejection region. If we’re less likely to reject the null with a given set of proportions, we’re more likely to fail to reject the null when we should reject it.
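As a quick check of the point made in Exercise 9.95, the sketch below (assuming scipy is available; not part of the original solution) computes P(Type II error) for the same test of H0: p = 1/3 against a true p = 0.50 at both sample sizes. The probability is far larger for n = 60 than for n = 116.

```python
from math import sqrt
from scipy.stats import norm

def beta(p0, p_true, n, alpha=0.05):
    """P(Type II error) for a one-sided test of H0: p = p0 vs Ha: p > p0."""
    cutoff = p0 + norm.ppf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    return norm.cdf((cutoff - p_true) / sqrt(p_true * (1 - p_true) / n))

for n in (116, 60):
    print(n, round(beta(1/3, 0.50, n), 2))  # about 0.02 for n = 116, 0.15 for n = 60
```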
Chapter Problems: Concepts and Investigations 9.96 Student data a) The P-value for this significance test is 0.000. If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed; we have strong evidence that the population mean political ideology is not 4.0. b) The P-value for this significance test is 0.001. If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed; we have strong evidence that the population proportion favoring affirmative action differs from 0.50. The one page report will be different for each student. 9.97 Class data The report will be different for each student. 9.98 Gender of best friend The short reports will vary, but will report the following information: pˆ p0 0.1064 0.5 29.25, P-value = 0.000, 95% confidence interval: (0.08, 0.13) z p0 (1 p0 ) n 0.5(1 0.5) /1381
for the proportion having an opposite gender best friend. The confidence interval is more informative than the significance test. Not only does it tell us that the value at the null hypothesis (0.50) is implausible; it actually gives us a range of plausible values. 9.99 Baseball home team advantage a) These data give us a sense of what the probability would look like in the long run. If we look at just a few games, we don’t get to see the overall pattern, but when we look at a number of games over time, we start to see the long run probability of the home team winning (in this case 1359/2430). pˆ p0 1359 / 2430 0.5 5.82; P-value = 0.000. Since the P-value is b) (i) z p0 (1 p0 ) n 0.5(1 0.5) 2430 approximately 0, we reject the null hypothesis and conclude that there is a home team advantage. (ii) The 95% confidence for the population proportion is pˆ z.025 p0 (1 p0 ) n 0.5592 1.96 0.5592(1 0.5592) 2340, or (0.54, 0.58). The interval also supports the
alternative hypothesis since all of the values are greater than the hypothesized value of 0.5. The test merely indicates whether p = 0.50 is plausible whereas the confidence interval displays the range of plausible values. 9.100 Statistics and scientific objectivity We can examine experimentally the claims of quack scientists, such as astrologers. If we fail to find a statistically significant effect (and this is replicated over and over), we start to get information that a given effect does not exist. 9.101 Two-sided or one-sided? a) Once a researcher sees the data, he or she knows in which direction the results lie. At this point, it is “cheating” to decide to a do a one-tailed test. In this scenario, one has actually done a two-tailed test, then cut the P-value in half upon seeing the results, making it easier to reject the null hypothesis. The decision of what type of test to use must be made before seeing the data.
194 Statistics: The Art and Science of Learning from Data, 4th edition 9.101 (continued) b) A result that is statistically significant with a P-value of 0.049 is not greatly different from one that is not statistically significant with a P-value of 0.051. The decision to use such a cutoff is arbitrary, is dependent on sample size, and is dependent on the random nature of the sample. Such a policy leads to the inflation of significant findings in the public view. If there really is no effect, but many studies are conducted, eventually someone will achieve significance, and then the journal will publish a Type I error. 9.102 No significant change and P-value The mean change in glucose level that this particular study found is not unusual. If, in fact, there is no effect of using a cell phone on the glucose levels (i.e., if the true mean change is zero), observing such a mean change or an even more dramatic one happens 63% of the time. 9.103 Subgroup lack of significance The sample size (n) has an impact on the P-value. The subgroups have smaller sample size, so for a particular size of effect will have a smaller test statistic and a larger P-value. 9.104 Vitamin E and prostate cancer Given a significance level of 0.05, we are going to get a test statistic in the rejection region 5% of the time when the null hypothesis is true. Thus, in every twenty studies, we’re likely to have one that is a Type I error. Type I errors, particularly those with larger effects, tend to get more exposure both in peer-reviewed journals and by the media. Thus, the early published research in a given area is likely to be that that found a significant effect. Only after the public becomes interested in it, might we then hear about other studies with null results. 9.105 Overestimated effect The studies with the most extreme results will give the smallest P-values and be most likely to be statistically significant. If we could look at how results from all studies vary around a true effect, the most extreme results would be out in a tail, suggesting an effect much larger than it actually is. 9.106 Choosing
a)
We might prefer a smaller significance level because we want to diminish the chances of a Type I error. We don’t want to run the risk of taking people off a drug that works until we are sure that our drug works at least as well. b) The disadvantage of a smaller significance level is that it is more difficult to reject the null hypothesis. There’s a higher chance of making a Type II error, and failing to find support for a drug that actually is better. 9.107 Why not accept H0? When we do not reject H0, we should not say that we accept H0. Just because the sample statistic was not extreme enough to conclude that the value in H0 is unlikely doesn’t mean that the value in H0 is the actual value. As a confidence interval would demonstrate, there is a whole range of plausible values for the population parameter, not just the null value. 9.108 Report P-value Knowing the exact P-value is more informative and, often, less misleading. For example, a very small Pvalue, such as 0.0001, tells us that this significance test provided stronger evidence than if the P-value were just beyond the cutoff, such as 0.049. On the other hand, if one P-value is just beyond the cutoff and one just barely fails to fall beyond the cutoff, we know that in practical terms they are giving about the same amount of evidence, even though one allows us to reject the null hypothesis and one does not. 9.109 Significance Statistical significance means that we have strong evidence that the true parameter value is either above or below the value in H0; this need not indicate practical significance. Practical significance means that the true parameter is sufficiently different from the value in H0 to be important in practical terms. Examples will vary.
9.110 More doctors recommend a)
The company could conduct a significance test of the null hypothesis H 0 : p 0.75 vs. H a : p 0.75, and support their claim by quoting the P-value for this test. b) (i) If this claim is based on a random sample of 40 doctors, it is more impressive than if it is based on 4 doctors. A higher n means a smaller standard error, and therefore, a larger test statistic. A larger test statistic is more likely to fall in the rejection region. (ii) If this claim is based on a random sample of 40 doctors nationwide, it would be more impressive than if based on a sample of all 40 doctors who work in a particular hospital. In the former case, we are more able to generalize. If all of the doctors are in one hospital, we do not know if this finding would be true for the whole population. A more representative sample provides stronger evidence. 9.111 Medical diagnosis error With the probability of a false positive diagnosis being about 50% over the course of 10 mammograms, it would not be unusual for a woman to receive a false positive over the course of having had many mammograms. Likewise, when conducting many significance tests with a type I error of 0.05, it would not be unusual to have some show statistical significance (i.e., support the alternative hypothesis) even though the null hypothesis is in fact true. 9.112 Bad P-value interpretations Proper interpretation of the P-value: If the null hypothesis is correct (and the population mean is 100) there is a 0.057 chance that we would obtain a sample mean at least this far from the population mean (a sample mean at least as far from the population mean as 104 is). This is pretty unlikely, but not beyond the typical cutoff value of 0.05. a) 0.057 is not the probability that the null hypothesis is correct; it’s the probability that we would obtain a sample mean at least this extreme, if the null hypothesis is correct. We calculate probabilities for test statistic values – not for hypotheses about parameters. We actually never know whether the null hypothesis is or isn’t true. b) x is the sample mean. The probability that the sample mean equals 104, regardless of whether the null hypothesis is true, is 100%. The researchers would have calculated the actual sample mean from the actual sample data. This is not an inference. The probability that we would obtain a sample mean of at least as extreme as 104 if the null hypothesis is true is 0.057. c) This probability of 0.057 refers to the likelihood of getting a sample at least this extreme if the null hypothesis = 100, not if the null hypothesis does not equal 100. d) The probability of a Type I error is the probability to which we set . If we were to set to a level of 0.05, the most common level, the probability of a Type I error would be 0.05. e) It is never a good idea to accept Ha because, even though our P-value of 0.057 is not smaller than 0.05, it is very possible that the population mean is not 100. Remember – the null hypothesis contains only a single value! If we were to set up a 95% confidence interval around the sample mean of 104, we would see a lot of plausible values that the population mean could be. f) In order to reject Ha at the = 0.05 level, the P-value would have to be less than 0.05, but this P-value is greater than 0.05. 9.113 Interpret P-value The P-value tells us the probability of getting a test statistic this large if the value at the null hypothesis represents the true parameter. In this case, it is 0.057. 
We can reject the null hypothesis if the P-value falls at or below the significance level, that is, for any significance level of 0.057 or larger. If the significance level were any number below this, there would not be enough evidence.
9.114 Incorrectly posed hypotheses This notation is for sample statistics, not population parameters. Hypotheses are about populations, not samples, and therefore use parameters, not statistics. 9.115 Multiple choice: Small P-value The best answer is (b). 9.116 Multiple choice: Probability of P-value The best answer is (a).
9.117 Multiple choice: Pollution The best answer is (a). 9.118 Multiple choice: Interpret P(Type II error) The best answer is (c). 9.119 True or false False 9.120 True or false True 9.121 True or false False 9.122 True or false True 9.123 True or false False 9.124 True or false True 9.125 True or false False 9.126 True or false True
♦♦9.127 Standard error formulas If the sample probability is 0, then the standard error is 0, and the test statistic is infinity, which does not make sense. A significance test is conducted by supposing the null is true, so in finding the test statistic we should substitute the null hypothesis value, giving a more appropriate standard error.
♦♦9.128 Rejecting true H0? a) The distribution is binomial, with n = 100, p = 0.05. We would expect the researcher to reject the null hypothesis about np = 100(0.05) = 5 times. b) If she rejects the null hypothesis in five out of 100 tests, it is plausible that the null hypothesis is correct in every case. We’d expect about 5 rejections merely by chance when the null is true each time.
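Exercise 9.128’s binomial reasoning can also be illustrated by simulation. A hedged sketch, assuming numpy and scipy are available; the choice of 30 observations per study and a normal null population are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n_tests, n_per_test = 0.05, 100, 30  # n_per_test is an arbitrary illustrative choice

# Expected number of rejections when H0 is true in every test
print(n_tests * alpha)  # 5.0

# Simulate: each "study" samples from a null-true population and records whether
# a two-sided one-sample t test rejects H0 at the 5% level
rejections = 0
for _ in range(n_tests):
    sample = rng.normal(loc=0, scale=1, size=n_per_test)  # H0: mu = 0 is actually true
    t, p = stats.ttest_1samp(sample, popmean=0)
    rejections += (p < alpha)
print(rejections)  # typically close to 5, as the binomial(100, 0.05) reasoning predicts
```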
Chapter Problems: Student Activities 9.129 The results will be different for each class. 9.130 The results will be different for each class.
Chapter 10: Comparing Two Groups
Section 10.1: Categorical Response: Comparing Two Proportions 10.1 Unemployment rate a) The response variable is unemployment rate and the explanatory variable is race. b) The two groups that are the categories of the explanatory variable are white and black. c) The samples of white and black individuals were independent. No individual could be in both samples. 10.2 Sampling sleep a) The samples on weekdays and weekends should be treated as dependent samples because every person is in both samples. b) When compared with other people from another year, the samples should be treated as independent. No one person is in both samples. 10.3 Binge drinking a) The estimated difference between the population proportions in 2009 and 1999 is 0.212 – 0.422 = –0.21. The proportion of students who reported binge drinking at least 3 times within the past 2 weeks has apparently decreased between 1999 and 2009. b) The standard error of 0.0305 is the standard deviation of the sampling distribution of differences between the sample proportions. se
= √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = √(0.422(0.578)/334 + 0.212(0.788)/843) = 0.0305
p1 is the proportion of the student population in 1999 that binge drinks, p2 is the proportion of the student population in 2009 that binge drinks. d) The 95% confidence interval is ( pˆ1 pˆ 2 ) z.025 ( se) (0.212 0.422) 1.96(0.0305), or (–0.27, –0.15). We can be 95% confident that the population mean change in proportion is between –0.27 and –0.15. This confidence interval does not contain zero; thus, we can conclude that there was a decrease in the population proportion of UW students who reported binge drinking at least 3 times in the past 2 weeks between 1999 and 2009. e) The assumptions are that the data are categorical (reported binge drinking at least 3 times in the past 2 weeks versus did not), that the samples are independent and are obtained randomly, and that there are sufficiently large sample sizes (given), and that the two sample sizes are sufficiently large (both have at least 10 successes and 10 failures). 10.4 Less smoking now? a) The point estimate is 0.541 – 0.238 = 0.303. We estimate that the population percent of adults who never smoked is 30 percentage points higher in the group that did not have any lung obstruction compared to the group that did. b) We are 99% confident that the population percent of adults who never smoked is between 27 and 34 percentage points higher for adults with no lung obstruction compared to those with lung obstruction. c) The assumptions are that the response variable is categorical (whether someone never smoked) and observed in two groups (those with and without lung obstruction), that the two samples are independent (the responses from adults without lung obstruction are independent of the ones from adults with lung obstruction) and obtained randomly (given), and that the two sample sizes are sufficiently large (both have at least 10 successes and 10 failures). 10.5 Do you believe in miracles? a) Males: pˆ1 277 603 0.46; Females: pˆ 2 461 730 0.633 c)
b) se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = √(0.46(0.54)/603 + 0.63(0.37)/730) = 0.027
198 Statistics: The Art and Science of Learning from Data, 4th edition 10.5 (continued) The 95% confidence interval is ( pˆ1 pˆ 2 ) z.025 ( se) (0.459 0.632) 1.96(0.027), or (–0.22, – 0.11). We can be 95% confident that the population proportion for females falls between 0.11 and 0.22 higher than the population proportion for males. Because 0 does not fall in this interval, we can conclude that females are more likely than are males to say that they believe in religious miracles. The assumptions are that the data are categorical, that the samples are independent and obtained randomly, and that there are sufficiently large sample sizes. c) The confidence interval has a wide range of plausible values for the population mean difference in proportions, ranging from –0.22 which represents a fairly large difference to –0.12 which represents a more modest difference. 10.6 Aspirin and heart attacks in Sweden a) Each “Sample p” is obtained by taking the proportion of the people in that sample who had a heart attack. b) The “estimate for p(1) – p(2)” is obtained by subtracting the “Sample p” for the second sample from the “Sample p” from the first sample. There is a difference of 0.014. c) The confidence interval tells us that we can be 95% confident that the population difference in proportions is between –0.005 and 0.033. Because zero is in this interval, it is plausible that there is no difference between proportions. There may be no difference in proportions of heart attacks between the aspirin and placebo groups. d) The estimate for the difference would change in sign; it would be negative instead of positive. The endpoints of the confidence interval also would change in signs. They would be (–0.033, 0.005). The confidence interval still includes zero; there may be no difference in proportions of heart attacks between those who take aspirin and those who take placebo. 10.7 Swedish study test a)
H0: p1 = p2; Ha: p1 ≠ p2
b) The P-value of 0.14 tells us that, if the null hypothesis were true, we would obtain a difference between sample proportions at least this extreme 0.14 of the time. c) The bigger the sample size, the smaller the standard error and the bigger the test statistic. This study has smaller samples than the Physicians Health Study did. Therefore, its standard error was larger and its test statistic was smaller. A smaller test statistic has a larger P-value. d) The P-value would be 0.14/2 = 0.07. 10.8 Significance test for aspirin and heart attacks study a) pˆ (347 327) (11,535 14, 035) 674 25,570 0.026 se0
= √[p̂(1 − p̂)(1/n1 + 1/n2)] = √[0.026(1 − 0.026)(1/11,535 + 1/14,035)] = 0.002
b) z = [(p̂1 − p̂2) − 0]/se0 = (0.0301 − 0.0233)/0.002 = 3.4
c)
The P-value is 0.001. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have strong evidence that there is a difference in the proportion of cancer deaths between those taking placebo and those taking aspirin. 10.9 Drinking and unplanned sex a) Assumptions: Categorical response variable (whether someone has unplanned sex), two groups (students samples in 1999 and students samples in 2009), samples are random and independent (given), and must have at least 10 successes and 10 failures. Notation: p1 is the population proportion of students engaging in unplanned sex in 1999, p2 is the population proportion of students engaging in unplanned sex in 2009. Hypotheses: H 0 : p1 p2 ; H a : p1 p2
Chapter 10: Comparing Two Groups 199 10.9 (continued) b) The pooled proportion is pˆ (103 194) (334 843) 279 1177 0.252. This is the common value of p1 and p2 , estimated by the proportion of the total sample who reported that they had engaged in such activities. c)
se0 = √[p̂(1 − p̂)(1/n1 + 1/n2)] = √[0.252(0.748)(1/334 + 1/843)] = 0.028
In this case, the standard error is interpreted as the standard deviation of the estimates pˆ1 pˆ 2 from different randomized studies using these sample sizes. ( pˆ pˆ 2 ) 0 0.307 0.23 2.75 d) z 1 se0 0.028 The P-value is 0.006. If the null hypothesis were true, the probability would be 0.006 of getting a test statistic at least as extreme as the value observed. We have sufficient evidence to reject the null hypothesis; there is a difference in proportions of reports of engaging in unplanned sexual activities because of drinking between 1999 and 2009. 10.10 Comparing marketing commercials a) Here are the results from software: Sample X N Sample p 1 25 100 0.250000 2 20 100 0.200000 Difference = p (1) – p(2) Estimate for difference: 0.05 95% CI for difference: (–0.0655382, 0.165538) Test for difference = 0 (vs. not = 0): Z = 0.85 P-Value = 0.396 1) Assumptions: Each sample must have at least ten outcomes of each type. The data must be categorical, and the samples must be independent random samples. 2) H 0 : p1 p2 ; H a : p1 p2 3) z = 0.85 4) P-value: 0.40 5) If the null hypothesis were true, the probability would be 0.40 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct, and that there is no population difference in proportions of Group A and B who say they would buy the product. b) The manager’s conclusion is not supported by the data; we do not have strong enough evidence to make this conclusion. One limitation of this study is the volunteer nature of the sample. It is not a random sample. 10.11 Hormone therapy for menopause a) Assumptions: Each sample must have at least ten outcomes of each type. The data must be categorical, and the samples must be independent random samples. p is the probability that someone developed cancer. The hypotheses are H 0 : p1 p2 ; H a : p1 p2 . b) The test statistic is 1.03, and the P-value is 0.303 (rounds to 0.30). If the null hypothesis were true, the probability would be 0.30 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there are the results for the hormone therapy group are not different from the results for the placebo group. c) We cannot reject the null hypothesis.
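The pooled two-proportion z test of Exercise 10.9 can be verified directly from the counts (103 of 334 in 1999, 194 of 843 in 2009). A minimal Python sketch, assuming scipy is available; it is offered as a check, not as part of the original solution.

```python
from math import sqrt
from scipy.stats import norm

# Counts from Exercise 10.9
x1, n1 = 103, 334   # 1999 sample
x2, n2 = 194, 843   # 2009 sample

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                        # about 0.252
se0 = sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))   # about 0.028
z = (p1 - p2) / se0
p_value = 2 * norm.sf(abs(z))                           # two-sided P-value
print(round(z, 2), round(p_value, 3))  # 2.79 and 0.005; the hand calculation above
                                       # rounds the proportions first and gets 2.75 and 0.006
```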
10.12 Obama A/B testing
a) [Bar chart titled “Button Selection,” showing the sample proportions of visitors who clicked the button for the two versions of the website (samples 1 and 2).]
b) 1) Assumptions: Categorical response (whether visitor clicked button), two groups (two versions of the website), independent and random samples, at least 10 successes and 10 failures in each group. 2) Hypotheses: H 0 : p1 p2 ; H a : p1 p2 , where p1 and p2 are the population proportion of visitors who clicked the button on the original and alternative version of the website, respectively. 3) Test Statistic: z 10.03 4) The P-value is approximately 0. 5) Conclusion: With this small P-value, there is strong evidence of a significant difference. The population proportion of visitors who clicked the button differs between those visiting the original site and those visiting the alternative version. c) The percentage of visitors clicking the button on the original website is at least 1.1 percentage points and at most 1.7 percentage points lower than on the alternative version of the website. This interval tells us about the range of the effect, rather than just telling us that the effect is significant. 10.13 Believe in heaven and hell One of the assumptions for the inference in this section is that we have two independent samples. Here, we ask the same subjects two questions, so each subject is measured twice and the responses to the two questions are not independent. Therefore, we cannot use the methods of this section (but see Section 10.4).
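Before moving on to quantitative responses, here is a hedged Python sketch (assuming scipy is available; not part of the original solutions) of the confidence interval for a difference of proportions computed in Exercise 10.5, using the counts 277 of 603 males and 461 of 730 females.

```python
from math import sqrt
from scipy.stats import norm

# Exercise 10.5 counts: belief in miracles, males vs. females
x1, n1 = 277, 603
x2, n2 = 461, 730

p1, p2 = x1 / n1, x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = norm.ppf(0.975)                 # 1.96 for a 95% interval
diff = p1 - p2
print(round(diff - z * se, 2), round(diff + z * se, 2))
# about (-0.23, -0.12); the text reports (-0.22, -0.11) from rounded intermediate values
```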
Section 10.2: Quantitative Response: Comparing Two Means 10.14 Alcohol and Energy Drinks a) The two groups compared were students consuming alcohol with energy drinks (Group 1) and students just consuming alcohol (Group 2). Let 1 be the population mean blood alcohol level for students that mix alcohol and energy drinks and 2 the population mean alcohol level for students that drink alcohol alone. H 0 : 1 2 b) A confidence interval estimates a range of possible values for the difference in the population mean blood alcohol level, whereas a P-value just indicates whether the difference is statistically significant without telling the size of the effect. 10.15 Address global warming a) The response variable is the amount of tax the student is willing to add to gasoline in order to encourage drivers to drive less or to drive more fuel-efficient cars; the explanatory variable is whether the student believes that global warming is a serious issue that requires immediate action or not. b) Independent samples; the students were randomly sampled so which group the student falls in (yes or no to second question) should be independent of the other students.
Chapter 10: Comparing Two Groups 201 10.15 (continued) c) A 95% confidence interval for the difference in the population mean responses on gasoline taxes for the two groups, 1 2 , is given by ( x1 x2 ) t.025 ( se) where x1 is the sample mean response on the gasoline tax for the group who responded “yes” to the second question, x2 is the sample mean response on the gasoline tax for the group who responded “no” to the second question, t is the t-score for a 95% confidence interval and se
= √(s1²/n1 + s2²/n2) is the standard error of the difference in mean
responses. 10.16 Housework for women and men a) The study estimated that, on the average, women spend 33.0 – 19.9 = 13.1 more hours than men on housework. b) The standard error for comparing the means is 1.20. The standard error is so small compared to the sample standard deviations for the two groups because the sample sizes are so large. se
= √(s1²/n1 + s2²/n2) = √(21.9²/476 + 14.6²/496) = 1.20
The 95% confidence interval is ( x1 x2 ) t.025 ( se) (33.0 19.9) 1.96(1.2), or (10.7, 15.5). We can be 95% confident that the difference between the population mean scores of women and men falls between 10.7 and 15.5. Because zero is not in the interval, we can conclude that there is a mean difference between the populations. It appears that the population mean for women is higher than the population mean for men. d) The assumptions are that the data are quantitative, both samples are independent and random, and there is approximately a normal population distribution for each group. 10.17 More confident about housework c)
a)
The 99% confidence interval is ( x1 x2 ) t.005 ( se) (33.0 19.9) 2.58(1.2), or (10.0, 16.2).
b) This interval is wider than the 95% confidence interval because we have chosen a larger confidence level, and thus, the t value associated with it will be higher. To be more confident, we must include a wider range of plausible values. 10.18 Employment by gender a) It does seem plausible that employment has a normal distribution for each gender because the standard deviations are close in size and much smaller than the means, an indication of a fairly normal distribution. b) The assumption of an approximately normal population distribution for each group is satisfied and so our inferences are not affected. The remaining assumptions, quantitative response variable for two groups and independent random samples are also satisfied. c) We can be 95% confident that the population mean gender difference in weekly time spent in employment is between 4.5 and 6.6 hours. d) The population means are not likely equal. Because 0 is not included in the interval, we can conclude that there is a difference between genders in the population means with respect to time spent in employment. It appears that men spend more time in employment (on the average) than do women. 10.19 Ideal number of children a) The standard error is 0.043. se
= √(s1²/n1 + s2²/n2) = √(0.89²/921 + 0.85²/754) = 0.043
b) The 95% confidence interval is (x̄1 − x̄2) ± t.025(se) = (2.54 − 2.50) ± 1.96(0.043), or (–0.04, 0.12). In the U.S. population, the mean number of children that women think is ideal is between 0.04 smaller and 0.12 larger compared to what men think is ideal. Because 0 is in the confidence interval, with 95% confidence, the population mean number of children that women and men think is ideal does not differ significantly between the sexes.
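The confidence intervals in Exercises 10.16–10.19 all use the same standard error formula for two independent means. Below is a minimal Python sketch (not part of the original solution) of the Exercise 10.19 interval from the summary statistics; the 1.96 multiplier reflects the large samples, as in the solution above.

```python
from math import sqrt

# Exercise 10.19 summary statistics: ideal number of children, women vs. men
x1, s1, n1 = 2.54, 0.89, 921
x2, s2, n2 = 2.50, 0.85, 754

se = sqrt(s1**2 / n1 + s2**2 / n2)   # about 0.043
margin = 1.96 * se                   # large samples, so t(.025) is about 1.96
diff = x1 - x2
print(round(diff - margin, 2), round(diff + margin, 2))  # about (-0.04, 0.12)
```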
202 Statistics: The Art and Science of Learning from Data, 4th edition 10.20 Pay by gender a) It does not seem plausible that earnings has a normal distribution for each gender because at least for women, the standard deviation is about the same size as the mean, indicating skew. Specifically, the lowest possible value of $0 is only (0 – 35,800)/14,600 = –2.5, or 2.5 standard deviations below the mean for men and (0 – 23,900)/21,900 = –1.09, or 1.09 standard deviations below the mean for women, indicating skewness to the right in both cases. b) One of the assumptions for this inference is that both populations are normally distributed. Although we do not likely have normal population distributions, the two-sided test is robust with respect to that assumption, particularly with such large sample sizes. Our inference is not likely affected. The remaining assumptions, quantitative response variable for two groups and independent random samples are satisfied. c) We can be 95% confident that the true difference in average earnings per year between men and women in 2008 was between $9,547 and $14,253. Since $0 is not contained in the interval, it appears that men had a higher average yearly income than women in 2008. The assumptions are that we have independent random samples for each of the two groups and that the population distribution is approximately normal for each group. 10.21 Bulimia CI a)
se = √(s1²/n1 + s2²/n2) = √(2.1²/13 + 3.2²/17) = 0.97
b) The 95% confidence interval is (x̄1 − x̄2) ± t.025(se) = (2.0 − 4.8) ± 2.048(0.97), or (–4.79, –0.81). We can be 95% confident that the difference in the population mean family cohesion scores between sexually abused students and non-abused students is between –4.79 and –0.81. Since 0 is not contained in the interval, we can conclude that the mean family cohesion score is lower for the sexually abused students than for the non-abused students.
10.22 Chelation useless? a) We can be 95% confident that the population mean difference between chelation and placebo was between –53 and 36 seconds. b) H0: μ1 = μ2; Ha: μ1 ≠ μ2 c)
The confidence interval also supports this conclusion. Because 0 falls in this interval, it is plausible that there is no population mean difference between these treatments. 10.23 Nicotine dependence a) (i) The overwhelming majority of noninhalers must have had HONC scores of 0 because the mean is very close to 0 (and there’s a small standard deviation). It would only be this low with a large number of scores of 0. (ii) On the average, those who reported inhaling had a mean score that was 2.9 – 0.1 = 2.8 (rounds to 3) higher than did those who did not report inhaling. b) The HONC scores were probably not approximately normal for the noninhalers. The lowest possible value of 0, which was very common, was only a fraction of a standard deviation below the mean. c) The standard error is interpreted as the standard deviation of the difference between sample means from different studies using these sample sizes. se
= √(s1²/n1 + s2²/n2) = √(3.6²/237 + 0.5²/95) = 0.24
d) Because 0 is not in this interval, we can conclude that there is a difference in population mean HONC scores between inhalers and noninhalers at the 95% confidence level. Inhalers appear to have a higher mean HONC score than noninhalers do. 10.24 Inhaling affect HONC? a)
t = [(x̄1 − x̄2) − 0]/se = [(2.9 − 0.1) − 0]/0.239 = 11.7; the P-value associated with this is approximately 0; if the population means were equal, the probability of getting a test statistic this large is about 0.
Chapter 10: Comparing Two Groups 203 10.24 (continued) b) We would reject the null hypothesis. Since the sample mean for inhalers is higher than noninhalers, we can conclude that those who inhaled have a higher population mean HONC score than those who did not inhale. c) The assumptions are that the data are quantitative, both samples are independent and random, and there is approximately a normal population distribution for each group. 10.25 Females or males more nicotine dependent? a) The standard error of 0.364 is the standard deviation of the difference between samples from different studies using these sample sizes. se
= √(s1²/n1 + s2²/n2) = √(3.6²/150 + 2.9²/182) = 0.36
( x1 x2 ) 0 (2.8 1.6) 0 3.30; P-value: 0.001 se 0.364 If the null hypothesis were true, the probability would be 0.001 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that there is a difference between men’s and women’s population mean HONC scores. The females seem to have a higher population mean HONC score than the males. c) The HONC scores were probably not normal for either gender. The standard deviations are bigger than the means, an indication of skew. The lowest possible value of 0 is (0 – 2.8)/3.6 = –0.778, or 0.778 standard deviations below the mean for females, and (0 – 1.6)/2.9 = –0.552, or 0.552 standard deviations below the mean for males, indicating skew to the right in both cases. This does not affect the validity of our inference greatly because of the robustness of the two-sided test for the assumption of a normal population distribution for each group. 10.26 Female and male monthly smokers b) From software: t = 2.54; P-value: 0.012. If the null hypothesis were true, the probability would be 0.012 of getting a test statistic at least as extreme as the value observed. We have strong evidence that there is a difference between men’s and women’s population mean HONC scores among those who were “monthly smokers.” The females seem to have a higher population mean HONC score than the males do. c) The HONC scores were probably not normal for either gender. The standard deviations are almost as large as the means, an indication of skew. The lowest possible value of 0 is (0 – 5.4)/3.5 = –1.543, or 1.543 standard deviations below the mean for females, and (0 – 3.9)/3.6 = –1.083, or 1.083 standard deviations below the mean for males, indicating skew to the right in both cases. Because of the robustness of the two-sided test for the assumption of a normal population distribution for each group, this probably does not have a great effect on the validity of our analysis. 10.27 Body language
b)
t
a)
H0: μ1 = μ2; Ha: μ1 ≠ μ2, where μ1 is the population mean score for clips based on male speakers and μ2 is the population mean score for clips based on female speakers.
b) The test statistic is 1.69 (df = 57.8) and the P-value is 0.097. If there is no difference in the population mean score for clips based on male and female speakers, then there is a 9.7% chance of observing a difference in the sample means of 11.8 or larger (or –11.8 and smaller). Compared to a 0.05 significance level, this is not small. There is insufficient evidence to conclude that the population mean scores differ. The population mean dominance score for clips based on male speakers may equal the one for clips based on female speakers. c) Yes, since we failed to reject the null hypothesis H0: μ1 = μ2, 0 is a plausible value for this difference.
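The two-sample t statistics in Exercises 10.24–10.26 can be reproduced from summary statistics alone. A hedged Python sketch, assuming scipy is available, using the Exercise 10.25 values; it uses the unequal-variance (Welch) form of the test, which matches the se formula √(s1²/n1 + s2²/n2) used in this section.

```python
from scipy import stats

# Exercise 10.25 summary statistics: HONC scores, females vs. males
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=2.8, std1=3.6, nobs1=150,
    mean2=1.6, std2=2.9, nobs2=182,
    equal_var=False)  # Welch's test: no equal-variance assumption

print(round(t_stat, 2), round(p_value, 3))  # 3.3 and 0.001, matching the hand calculation
```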
204 Statistics: The Art and Science of Learning from Data, 4th edition 10.28 Student survey a) The plot indicates that the population distributions for both females and males might be skewed to the right.
[Dotplot of Newspapers vs Gender: number of times reading a newspaper (0 to 14) plotted separately for females (F) and males (M).]
b) The 95% confidence interval is (–2.2, 0.9). We can be 95% confident that the population mean gender difference is between –2.2 and 0.9. Because 0 is included in this interval, it is plausible that there is no population mean gender difference.
c) 1) The assumptions made by these methods are that the data are quantitative, both samples are independent and random, and there is approximately a normal population distribution for each group. 2) H0: μ1 = μ2; Ha: μ1 ≠ μ2 3) t = –0.82 4) The P-value is 0.42 5) If the null hypothesis were true, the probability would be 0.42 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no mean population gender difference in number of times reading a newspaper.
d) The assumptions are listed in (c). The population distributions may be skewed right. Given that we're conducting a two-tailed test, however, the violation of this assumption probably does not affect our inferences.
10.29 Study time
a) Let group 1 represent the students who planned to go to graduate school and group 2 represent those who did not. Then, x̄1 = 11.67, s1 = 8.34, x̄2 = 9.10, and s2 = 3.70. The sample mean study time per week was higher for the students who planned to go to graduate school, but the times were also much more variable for this group.
b) If further random samples of these sizes were obtained from these populations, the differences between the sample means would vary. The standard error of these values would equal about 2.2: se = √(s1²/n1 + s2²/n2) = √(8.34²/21 + 3.70²/10) = 2.16.
c) The 95% confidence interval for this data is (–1.9, 7.0). We are 95% confident that the difference in the mean study time per week between the two groups is between –1.9 and 7.0 hours. Since 0 is contained within this interval, we cannot conclude that the population mean study times differ for the two groups.
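The standard error and interval in parts (b) and (c) follow from the group summaries alone. A minimal sketch, assuming SciPy is available; it uses the Welch (unpooled) standard error and the Welch–Satterthwaite approximate df, which is what most software reports here.

```python
# Sketch: standard error and 95% CI for a difference in means (Welch, unpooled),
# using the Exercise 10.29 summaries (graduate-school group vs. others).
import math
from scipy import stats

x1, s1, n1 = 11.67, 8.34, 21
x2, s2, n2 = 9.10, 3.70, 10

se = math.sqrt(s1**2 / n1 + s2**2 / n2)            # about 2.16

# Welch-Satterthwaite approximate degrees of freedom
v1, v2 = s1**2 / n1, s2**2 / n2
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_crit = stats.t.ppf(0.975, df)
diff = x1 - x2
print(f"se = {se:.2f}, df = {df:.1f}")
print(f"95% CI: ({diff - t_crit * se:.1f}, {diff + t_crit * se:.1f})")   # about (-1.9, 7.0)
```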
10.30 More on study time
a) 1) Assumptions: the data are quantitative (number of hours of study time per week); the samples are independent and we will assume that they were collected randomly; we assume that the number of hours of study time per week is approximately normal for each group. 2) H0: μ1 = μ2; Ha: μ1 ≠ μ2 3) t = (11.67 − 9.10)/2.16 = 1.19 4) P-value: 0.25 5) If the null hypothesis were true, the probability would be 0.25 of getting a test statistic at least as extreme as the value observed. Since the P-value is quite large, we are unable to conclude that there is a significant difference in the average number of study hours per week for the two groups.
b) The sample was a convenience sample because it was drawn from students in one of the teaching assistant's classes rather than from a random sample of all students at the university.
10.31 Time spent on social networks
a) Let group 1 represent males and group 2 represent females. Then, x̄1 = 11.92, s1 = 3.94, x̄2 = 15.62, and s2 = 8.00. The sample mean time spent on social networks was higher for females than for males, but notice the apparent outlier for the female group (40). The data were also much more variable for females, but this may also merely reflect the outlier.
b) If further random samples of these sizes were obtained from these populations, the differences between the sample means would vary. The standard deviation of these values would equal about 2.08: se = √(s1²/n1 + s2²/n2) = √(3.94²/12 + 8.00²/21) = 2.084.
c)
A 90% confidence interval is (–7.24, –0.17). We are 90% confident that the difference in the population mean number of hours spent on social networks per week between males and females is between –7.24 and –0.17. Since 0 is not contained within this interval, we can conclude that the population mean time spent on social networks per week differs for males and females.
10.32 More time on social networks
(Dotplot of Time on Social Networks by Gender omitted: hours per week on social networks, roughly 5 to 40, plotted separately for females (F) and males (M).)
It appears that the largest value in each group could be an influential outlier. Removing the largest data point in each of the two groups, we obtain x1 = 11.18, s1 = 3.16, x2 = 14.40, and s2 = 5.87 (much less variability). The sample mean time spent on social networks is still higher for the sample of females. The confidence interval (–5.98, –0.46) still contains values all less than 0.
10.33 Normal assumption
With large random samples, the sampling distribution of the difference between two sample means is approximately normal regardless of the shape of the population distributions. Substituting sample standard deviations for unknown population standard deviations then yields an approximate t sampling distribution. With small samples, the sampling distribution is not necessarily bell-shaped if the population distributions are highly non-normal.
10.34 Vital capacity
We can't use the methods from this section because the samples are not independent. Each checkup provided two measurements on the same person, one before and one after using an inhaler. These two measurements are not independent of each other. One can also argue that the biannual measurements are not independent because they are observed on the same person over time.
Section 10.3: Other Ways of Comparing Means, Including a Permutation Test 10.35 Body dissatisfaction a)
The pooled standard deviation for comparing the means is s = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] = √[(15(8.0)² + 67(6.0)²)/(16 + 68 − 2)] = 6.413, and the standard error is se = s√(1/n1 + 1/n2) = 6.413√(1/16 + 1/68) = 1.782.
b) The 95% confidence interval is (x̄1 − x̄2) ± t.025(se) = (13.2 − 7.3) ± 1.989(1.782), or (2.36, 9.44). We can be 95% confident that the population mean difference is between 2.36 and 9.44. Because 0 does not fall in this interval, we can conclude that, on average, the body dissatisfaction assessment score was higher for lean sport athletes than for nonlean sport athletes.
10.36 Body dissatisfaction test
a) t = ((13.2 − 7.3) − 0)/1.782 = 3.311; P-value: 0.001
b) The assumptions are that the data are quantitative, constitute random samples from two groups, and are from populations with approximately normal distributions. In addition, we assume that the population standard deviations are equal. Given the large standard deviations of the groups, the normality assumption is likely violated, but we're using a two-sided test, so inferences are robust to that assumption.
10.37 Surgery versus placebo for knee pain
a) The confidence interval is (–10.63, 6.43). We can be 95% confident that the population mean pain score is between 10.6 points smaller and 6.4 points larger for patients treated with the placebo procedure compared to patients treated with the lavage procedure. Because 0 falls in this interval, it is plausible that the mean pain score is the same under both procedures.
b) It is reasonable to assume equal population standard deviations, because the sample standard deviations s1 and s2 are very similar, in fact, identical.
c) 1) We assume independent random samples from the two groups, an approximately normal population distribution for each group (particularly if the sample sizes are small), and equal population standard deviations. 2) H0: μ1 = μ2; Ha: μ1 ≠ μ2
3) t = –0.49 4) The P-value is 0.627 5) If there is no difference in the population mean pain score, the probability of observing a test statistic this extreme is 0.63. This is large. There is no evidence of a difference in the population mean pain score between the placebo and lavage arthroscopic surgery procedures.
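The pooled-variance procedure used in Exercises 10.35–10.37 can be reproduced from the summary statistics. A minimal sketch, assuming SciPy, with the Exercise 10.35 body-dissatisfaction summaries quoted above.

```python
# Sketch: pooled two-sample t procedures from summary statistics (Exercises 10.35-10.36).
import math
from scipy import stats

x1, s1, n1 = 13.2, 8.0, 16   # lean sport athletes
x2, s2, n2 = 7.3, 6.0, 68    # nonlean sport athletes

# Pooled standard deviation and standard error (assumes equal population sd's)
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # about 6.41
se = s_pooled * math.sqrt(1 / n1 + 1 / n2)                                    # about 1.78
df = n1 + n2 - 2                                                              # 82

# 95% confidence interval for mu1 - mu2
t_crit = stats.t.ppf(0.975, df)                     # about 1.989
diff = x1 - x2
ci = (diff - t_crit * se, diff + t_crit * se)       # about (2.36, 9.44)

# Test statistic and two-sided P-value
t_stat = diff / se                                  # about 3.31
p_value = 2 * stats.t.sf(abs(t_stat), df)           # about 0.001
print(s_pooled, se, ci, t_stat, p_value)
```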
10.38 Comparing clinical therapies
a) Using technology, choosing "assume equal variances," confirms the given values.
b) We can be 95% confident that the population mean difference between change scores is between –1.2 and 41.2. Because 0 falls in this range, it is plausible that there is no difference in mean change scores between the two populations. We do not have sufficient evidence to conclude that there is a mean difference between the two populations. The therapies may not have different means, but if they do the population mean could be much higher for therapy 1. The confidence interval is so wide because the two sample sizes are very small.
c) Using technology again, the 90% confidence interval is (3.7178, 36.2822). At this confidence level, we can conclude that therapy 1 is better. 0 is no longer in the range of plausible values.
10.39 Clinical therapies 2
a)
H0: μ1 = μ2; Ha: μ1 ≠ μ2; t = ((40.0 − 20.0) − 0)/7.64 = 2.62; P-value = 0.059. If the null hypothesis were true, the probability would be 0.059 (rounds to 0.06) of getting a test statistic at least as extreme as the value observed.
b) (i) With a 0.05 significance level, we cannot reject the null hypothesis. It is plausible that the null hypothesis is true and that there is not a population difference in change scores between the two therapy types. (ii) With a 0.10 significance level, we can reject the null hypothesis. We have strong evidence that there is a difference in population mean change scores between the two therapy types. Therapy 1 led to better results than therapy 2.
c) If the researcher had predicted ahead of time that therapy 1 would be better, that would have corresponded to Ha: μ1 > μ2; the P-value for a one-tailed test would be 0.030. For a significance level of 0.05, we would reject the null hypothesis. We would have strong evidence to conclude that therapy 1 has bigger population change scores than does therapy 2.
10.40 Vegetarians more liberal?
a) The first set of inferences assumes equal population standard deviations, but the sample standard deviations suggest this is not plausible. It is more reliable to conduct the second set of inferences, which do not make this assumption.
b) Based on the first set of results, we would not conclude that the population means are different. 0 falls within the confidence interval; thus, it is plausible that there is no population mean difference. Moreover, the P-value is not particularly small. Based on the second set of results, however, we would conclude that the population means are different. Zero is not in the confidence interval, and hence, is not a plausible value. In addition, the P-value is quite small, an indication that results such as these would be very unlikely if the null hypothesis were true. From this set of results, it appears that the vegetarian students are more liberal than are the non-vegetarian students.
10.41 Teeth whitening results
a)
We are testing the hypotheses H0: μ1 = μ2; Ha: μ1 ≠ μ2, where μ1 represents the population mean change in Vita shade from baseline for the whitening gel group and μ2 represents the population mean change in Vita shade from baseline for the toothpaste only group. Alternatively, we could test H0: μ1 − μ2 = 0; Ha: μ1 − μ2 ≠ 0.
b) The probability is less than 5% of obtaining a sample difference in means at least as extreme as that observed here assuming the null hypothesis of no difference in population means is true.
c) The pooled standard deviation is s = √[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)] = √[(57(1.32)² + 58(1.29)²)/(58 + 59 − 2)] = 1.305, and the standard error is se = s√(1/n1 + 1/n2) = 1.305√(1/58 + 1/59) = 0.241. The t statistic is t = (x̄1 − x̄2)/se = 0.67/0.241 = 2.78, df = n1 + n2 − 2 = 58 + 59 − 2 = 115. The resulting P-value is 0.006. This is statistically significant.
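The pooled test in part (c) can also be checked with software from the summaries alone. A minimal sketch, assuming SciPy; only the difference in means (0.67) is quoted in the solution, so the two means are set to 0.67 and 0 for illustration (the t statistic depends only on their difference).

```python
# Sketch: verifying the pooled t test of Exercise 10.41(c) from summary statistics.
# Means set to 0.67 and 0.0 because only their difference is reported.
from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(0.67, 1.32, 58,
                                             0.00, 1.29, 59,
                                             equal_var=True)   # pooled variance
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")   # about t = 2.78, P = 0.006
```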
10.41 (continued)
d) The change in mean Vita shade score from baseline was 2.91 times higher for the whitening gel group than for the toothpaste only group.
10.42 Permuting therapies
a)
Patient   All possible assignments
1         1   1   1   2   2   2
2         1   2   2   1   1   2
3         2   1   2   1   2   1
4         2   2   1   2   1   1
b)
Patient   Score   All possible assignments
1         30      1    1    1    2    2    2
2         60      1    2    2    1    1    2
3         20      2    1    2    1    2    1
4         30      2    2    1    2    1    1
x̄1                45   25   30   40   45   25
x̄2                25   45   40   30   25   45
x̄1 − x̄2           20  –20  –10   10   20  –20
10.43 Permutations equally likely
a) The distribution of the improvement scores is the same under therapy 1 and therapy 2.
b) Under H0, outcomes under each assignment are equally likely. Because there are a total of six possible assignments, each has probability 1/6. Because two assignments lead to a difference of –20, that difference occurs with probability 2/6. Differences –10 and 10 occur once, so each has probability 1/6. The difference of 20 occurs twice, so it has probability 2/6 of occurring.
c) The P-value is the probability of observing a difference as extreme or even more extreme if H0 is true. The observed difference was 20, the most extreme possible. Because P(20) = 2/6 = 1/3, the P-value is 0.333. If the distribution of improvement scores is the same under therapy 1 and therapy 2, we would observe a difference of 20 or more with a probability of 0.33. This would not be considered unusual, indicating that it is plausible the two therapy distributions of improvement scores (and their means) are the same.
10.44 Two-sided permutation P-value
For the two-sided alternative hypothesis, the extreme differences are the observed difference of 20 in the upper tail and the corresponding difference of –20 in the lower tail of the sampling distribution. These have probabilities of P(20) = 1/3 and P(–20) = 1/3. Therefore, the two-sided P-value is 1/3 + 1/3 = 2/3, or 0.67.
10.45 Time spent on social networks revisited
a) From the app: x̄1 = 11.9, x̄2 = 15.6, and x̄1 − x̄2 = −3.7.
b) Answers will vary. One permutation yielded x̄1 = 14.6, x̄2 = 14.1, and x̄1 − x̄2 = 0.49.
c) Answers will vary. The one permutation from (b) was less extreme because –3.7 < 0.49 < +3.7.
d) Answers will vary slightly. One generation of 10,000 permutations yielded 1498 that resulted in a difference smaller than –3.7 or larger than +3.7 (two-sided alternative).
e) Based on these 10,000 random permutations, the permutation P-value = 1498/10,000 = 0.149. This is not small. If truly the distribution of time spent on social network sites is the same for males and females, then the probability of observing a difference of –3.7 or more extreme (i.e., larger in absolute value) in the sample means is 0.149. This indicates that the null hypothesis is plausible.
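The permutation logic of Exercises 10.42–10.45 is easy to make concrete: enumerate every split of the four patients of Exercise 10.42 into two groups of two, or sample random permutations when, as in Exercise 10.45, exhaustive enumeration is impractical. A minimal sketch, assuming NumPy; the `males`/`females` arrays at the end are hypothetical placeholders, not the survey data used by the app.

```python
# Sketch: permutation tests, exhaustive (Exercises 10.42-10.44) and random (Exercise 10.45 style).
from itertools import combinations
import numpy as np

# --- Exhaustive: 4 patients, improvement scores, 2 assigned to each therapy ---
scores = np.array([30, 60, 20, 30])
obs_diff = scores[[0, 1]].mean() - scores[[2, 3]].mean()     # observed split: 45 - 25 = 20

diffs = []
for grp1 in combinations(range(4), 2):                       # all 6 possible assignments
    grp2 = [i for i in range(4) if i not in grp1]
    diffs.append(scores[list(grp1)].mean() - scores[grp2].mean())
diffs = np.array(diffs)                                       # 20, -20, -10, 10, 20, -20 (in some order)

print(np.mean(diffs >= obs_diff))                             # one-sided P-value: 2/6 = 0.333
print(np.mean(np.abs(diffs) >= abs(obs_diff)))                # two-sided P-value: 4/6 = 0.667

# --- Random permutations for larger samples (placeholder data) ---
rng = np.random.default_rng(1)
males = np.array([10.0, 8.0, 15.0, 12.0, 11.0, 9.0])          # hypothetical hours
females = np.array([14.0, 20.0, 13.0, 16.0, 18.0, 12.0, 15.0])
combined = np.concatenate([males, females])
obs = males.mean() - females.mean()

perm_diffs = np.empty(10_000)
for b in range(perm_diffs.size):
    perm = rng.permutation(combined)
    perm_diffs[b] = perm[:males.size].mean() - perm[males.size:].mean()
print(np.mean(np.abs(perm_diffs) >= abs(obs)))                # two-sided permutation P-value
```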
10.46 Compare permutation test to t test
a) H0: μ1 = μ2; Ha: μ1 ≠ μ2; t = –1.78; P-value = 0.085
b) No, the permutation test in Exercise 10.45 with a P-value of 0.149 > 0.10 indicates not enough evidence to reject H0, whereas the t-test with a P-value of 0.085 < 0.10 indicates rejecting H0. The data in both groups indicate a skewed population distribution, which is supported by an outlier in the female group. Also, the sample size in the male group is rather small. The P-value from the permutation approach appears more trustworthy because assumptions for the t-test might be violated.
c) Based on 100,000 random permutations, the permutation P-value now equals 0.1098, which is still larger than 0.10. The conclusion to not reject H0 does not change.
10.47 Dominance of politicians
a) H0: The population distribution of ratings is the same for clips based on male and female speakers. (This implies the population means are the same.) Ha: The population mean rating is different for clips based on male and female speakers. The observed difference in sample means equals 11.8. Out of 10,000 random permutations, 980 yielded a difference at least as extreme, resulting in a permutation P-value of 0.098. (Your results might differ slightly.) Because 0.098 > 0.05, there is no evidence of a difference in the population mean dominance rating between clips based on female and male speakers. It is plausible that the distributions are the same.
b) From technology: t = 1.68, df = 57.82, P-value = 0.097 > 0.05. The results are comparable. The sample size is fairly large (30 in each group), and the histogram of the ratings in each group does not indicate major deviations from normality.
10.48 Sampling distribution of x̄1 − x̄2
a) For large sample sizes, the sampling distribution is approximately normal, with a mean of μ1 − μ2 and a standard error of √(s1²/n1 + s2²/n2).
b) The sampling distribution derived via the permutation approach has broader tails, showing more variability compared to the normal distribution. c) The P-value computed from the permutation sampling distribution would be noticeably larger than the one computed from the approximate normal distribution. The area to the right of 100 in the upper tail is noticeably larger under the permutation distribution.
Section 10.4: Analyzing Dependent Samples
10.49 Does exercise help blood pressure?
a) The three "before" observations and the three "after" observations are dependent samples because the same patients are in both samples.
b) The sample mean of the "before" scores is 150, of the "after" scores is 130, and of the difference scores is 20. The difference between the means for the "before" and "after" scores is the same as the mean of the difference scores.
c) From technology, the standard deviation of the difference scores is 5.00. The standard error is se = sd/√n = 5/√3 = 2.887. The 95% confidence interval is x̄d ± t.025(se) = 20 ± 4.303(2.887), or (7.6, 32.4). We can be 95% confident that the difference between the population means is between 7.6 and 32.4. Because 0 is not included in this interval and because all differences are positive, we can conclude that there is a decrease in blood pressure after patients go through the exercise program.
10.50 Test for blood pressure
a) H0: μ1 = μ2; Ha: μ1 ≠ μ2
b) The P-value of 0.02 indicates that if the null hypothesis were true, the probability would be 0.02 of getting a test statistic at least as extreme as the value observed. The exercise program does seem beneficial to lowering blood pressure. c) The assumptions on which this analysis is based are that the sample of difference scores is a random sample from a population of such difference scores and that the difference scores have a population distribution that is approximately normal, particularly if the sample is small (n < 30).
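The paired analysis in Exercises 10.49 and 10.50 needs only the mean and standard deviation of the difference scores. A minimal sketch, assuming SciPy, reproducing the interval (7.6, 32.4) and a two-sided P-value near 0.02.

```python
# Sketch: confidence interval and test from paired difference scores (Exercises 10.49-10.50).
import math
from scipy import stats

xbar_d, s_d, n = 20.0, 5.0, 3        # summaries of the "before - after" differences
se = s_d / math.sqrt(n)              # about 2.887
df = n - 1

t_crit = stats.t.ppf(0.975, df)      # 4.303 for df = 2
print(f"95% CI: ({xbar_d - t_crit * se:.1f}, {xbar_d + t_crit * se:.1f})")        # (7.6, 32.4)

t_stat = xbar_d / se                                                              # about 6.93
print(f"t = {t_stat:.2f}, two-sided P = {2 * stats.t.sf(abs(t_stat), df):.3f}")   # about 0.02
```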
10.51 Social activities for students
a) To compare the mean movie attendance and mean sports attendance using statistical inference, we should treat the samples as dependent; the same students are in both samples.
b) There is quite a bit of spread, but the outliers appear in both directions.
c) The 95% confidence interval is x̄d ± t.025(se) = 4 ± 2.262(5.11), or (–7.56, 15.56), df = 10 – 1 = 9.
d) The test statistic is t = (x̄d − 0)/se = (4.0 − 0)/5.11 = 0.78. The P-value is 0.45. It is plausible that the null hypothesis is true. Because the P-value is large, we cannot conclude that there is a population mean difference in attendance at movies versus sports events.
10.52 More social activities
a) We can be 95% confident that the population mean difference score is between –3.3 and 28.9. Because 0 falls in this interval, it is plausible that there is no population mean difference between attendance at parties and sporting events.
b) H0: μ1 = μ2; Ha: μ1 ≠ μ2; the P-value of 0.11 indicates that if the null hypothesis were true, the probability would be 0.11 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is true. We cannot conclude that there is a population mean difference in attendance at parties versus sporting events.
c) When we cannot reject the null hypothesis, the confidence interval will include 0 (an indication that it is plausible that there is no population mean difference).
d) The assumptions on which this analysis is based are that the sample of difference scores is a random sample from a population of such difference scores and that the difference scores have a population distribution that is approximately normal, particularly if the sample is small (n < 30).
10.53 Movies versus parties
a) 1) The assumptions made by these methods are that the difference scores are a random sample from a population distribution that is approximately normal. 2) H0: μ1 = μ2 (or the population mean of difference scores is 0); Ha: μ1 ≠ μ2
3) t = –1.62 4) The P-value is 0.140. 5) If the null hypothesis were true, the probability would be 0.14 of getting a test statistic at least as extreme as the value observed. Because the probability is 0.14 of observing a test statistic or one more extreme by random variation, we have insufficient evidence that there is a mean population difference between attendance at movies versus attendance at parties. b) The 95% confidence interval is (–21.111, 3.511) which rounds to (–21.1, 3.5); we are 95% confident that the population mean number of times spent attending movies is between 21.1 less and 3.5 higher than the population mean number of times spent attending parties. 10.54 Freshman 15 a myth? a) A point estimate of the mean weight change is 135.1 – 133.0 = 2.1 pounds. b) This is not sufficient information to find a confidence interval or conduct a test about the change in the mean. We would need to know the difference score for each woman so that we could get the standard deviation of the difference scores and then the standard error of the mean difference. 10.55 Checking for freshmen 15 a) The standard deviation of the change in weight scores could be much smaller because even though there is a lot of variability among the initial and final weights of the women, most women do not see a large change in weight over the course of the study, so the weight changes would not vary much. b)
se = sd/√n = 2.0/√132 = 0.174; the 95% confidence interval is x̄d ± t.025(se) = 2.1 ± 1.978(0.174), or (1.76, 2.44). 15 is not a plausible weight change in the population of freshmen women. The plausible weight change falls in the range from 1.76 to 2.44 pounds.
c) The data must be quantitative, the sample of difference scores must be a random sample from a population of such difference scores, and the difference scores must have a population distribution that is approximately normal (particularly with samples of size less than 30).
10.56 Internet book prices
a) The samples are dependent because they are the prices of the same ten books at two different internet sites.
b) Let group 1 be the prices from Site A and group 2 be the prices from Site B. Then, x̄1 = $87.30, x̄2 = $83.00, and x̄d = $4.30. The sample mean price for the books from Site A is higher than the sample mean price for the books from Site B. Thus, the sample mean of the difference between the prices from these two sites is positive.
c) From technology, the 90% confidence interval for μ1 − μ2 is given by (1.57, 7.03). Since 0 is less than the values in the confidence interval, we can conclude that the prices for textbooks used at her college are more expensive at Site A than at Site B.
10.57 Comparing book prices 2
1) Assumptions: The differences in prices are a random sample from a population that is approximately normal.
2) H0: μd = 0; Ha: μd ≠ 0
3) t = (x̄d − 0)/(sd/√n) = 4.3/(4.7152/√10) = 2.88
4) The P-value is 0.02. 5) If the null hypothesis is true, the probability of obtaining a difference in sample means as extreme as that observed is 0.02. We would reject the null hypothesis and conclude that there is a significant difference in prices of textbooks used at her college between the two sites for α = 0.05 or α = 0.10, but not for α = 0.01.
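With the raw paired observations (the ten book prices at each site, say), the same test can be run directly rather than from summaries. A minimal sketch, assuming SciPy; the price lists below are hypothetical placeholders, not the actual data.

```python
# Sketch: paired t test on raw data (Exercise 10.57 style), dependent samples.
# The prices below are hypothetical placeholders.
from scipy import stats

site_a = [92.0, 75.5, 88.0, 109.0, 64.0, 81.0, 99.5, 70.0, 95.0, 99.0]
site_b = [85.0, 74.0, 83.5, 104.0, 61.0, 79.0, 94.0, 69.5, 90.0, 90.0]

# ttest_rel pairs the observations book by book and tests H0: mu_d = 0.
t_stat, p_value = stats.ttest_rel(site_a, site_b)
print(f"t = {t_stat:.2f}, two-sided P = {p_value:.3f}")

# Equivalently, a one-sample t test on the differences:
diffs = [a - b for a, b in zip(site_a, site_b)]
print(stats.ttest_1samp(diffs, 0.0))
```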
10.58 Lung capacity revisited
a) x̄d = (0.28 + 0.01 + 0.30 + 0.23 + 0.24)/5 = 0.21 liters
b) From technology, the 95% confidence interval is (0.05, 0.36). With 95% confidence, the population mean FVC of the author improves by at least 0.05 liters and at most 0.36 liters after using the inhaler.
c) From technology, analyzing the data as if we had two independent samples results in a confidence interval of (–0.36, 0.77). Then, we would conclude, with 95% confidence, that the population mean FVC could be as much as 0.36 liters smaller and as much as 0.77 liters larger after using the inhaler. Because 0 is in the confidence interval, it is plausible that there is no improvement, which differs from the conclusion in (b), which showed an improvement of at least 0.05 liters.
10.59 Comparing speech recognition systems
a) p̂1 = 1979/2000 = 0.9895 and p̂2 = 1937/2000 = 0.9685
b) We are 95% confident that the population proportion of correct results for GDMS is between 0.013 and 0.029 higher than the population proportion of correct results for CDHMM.
10.60 Treat juveniles as adults?
a) They are dependent since matched pairs were formed by matching certain criteria and one juvenile from each pair was assigned to either juvenile court or adult court.
b) Let group 1 represent the juveniles assigned to adult court and group 2 represent the juveniles assigned to juvenile court; then p̂1 = 673/2097 = 0.32 and p̂2 = 448/2097 = 0.21.
c)
1) Assumptions: the differences in re-arrest rates are a random sample from a population that is approximately normal. 2) H0: p1 − p2 = 0; Ha: p1 − p2 ≠ 0 3) z = (b − c)/√(b + c) = (515 − 290)/√(515 + 290) = 7.9 4) The P-value is approximately 0.
10.60 (continued)
5) If the null hypothesis is true, the probability of obtaining a difference in sample means as extreme as that observed is very close to 0. There is extremely strong evidence of a population difference in the re-arrest rates between juveniles assigned to adult court and those assigned to juvenile court.
10.61 Change coffee brand?
a) The point estimates for the population proportions choosing Sanka for the first and second purchases, respectively, are (i) 204/541 = 0.38 and (ii) 231/541 = 0.43. The estimated difference of population proportions is the difference between the sample means at the two times: 0.43 – 0.38 = 0.05. We estimate the population proportion of people buying Sanka coffee after the advertising campaign was 0.05 larger than before the campaign. In terms of percentages, we estimate 5% more people in the population bought Sanka coffee after the campaign.
b) The confidence interval of (0.01, 0.09) tells us that we can be 95% confident that the proportion of people buying Sanka coffee after the advertising campaign is between 0.01 and 0.09 larger than before it. This means that we are 95% confident that between 1% and 9% more people in the population bought Sanka coffee after the advertising campaign.
c) H0: p1 = p2, where p1 and p2 are the (marginal) population proportions of people buying Sanka coffee before and after the advertising campaign, respectively. We have sufficient evidence (P-value of 0.02 < 0.05 significance level) that the population proportion of people buying Sanka coffee differs before and after the advertising campaign.
10.62 President's popularity
a)
                 This Month
Last Month     Yes      No
Yes            450      60
No              40     450
b) (i) The sample proportion giving a favorable rating last month is (450 + 60)/1000 = 0.51.
(ii) The sample proportion giving a favorable rating this month is (450 + 40)/1000 = 0.49.
c) The difference is 0.51 – 0.49 = 0.02. For the 1000 subjects surveyed, the proportion giving a favorable rating dropped by 2 percentage points from last month to this month.
d) z = (b − c)/√(b + c) = (60 − 40)/√(60 + 40) = 2.0; the P-value is 0.046. If the null hypothesis were true, the probability would be 0.046 of getting a test statistic at least as extreme as the value observed. We have reasonably strong evidence that there is a difference in the population proportions of people who thought the president is doing a good job between last month and this month.
10.63 Heaven and hell
a) The point estimate for the difference between the population proportions believing in heaven and believing in hell is 0.631 – 0.496 = 0.135.
b) (i) The assumptions are that the data are categorical, the samples are independent and random, and the sum of the two counts in the test is at least 30. (ii) H0: p1 = p2; Ha: p1 ≠ p2 (iii) z = (b − c)/√(b + c) = (65 − 35)/√(65 + 35) = 3.00 (iv) The P-value is 0.003. (v) This is a very small P-value; if the null hypothesis were true, the probability would be 0.003 of getting a test statistic at least as extreme as the value observed. We have strong evidence that there is a difference between the population proportions of those who believe in heaven and those who believe in hell. It appears that more people believe in heaven than in hell.
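The matched-pairs z statistic used in Exercises 10.60–10.63 depends only on the two discordant counts. A minimal sketch, assuming SciPy for the normal tail probability.

```python
# Sketch: McNemar-style z statistic for comparing dependent proportions.
# b = count of (yes, no) pairs, c = count of (no, yes) pairs.
import math
from scipy import stats

def mcnemar_z(b, c):
    z = (b - c) / math.sqrt(b + c)
    p_two_sided = 2 * stats.norm.sf(abs(z))
    return z, p_two_sided

print(mcnemar_z(515, 290))   # Exercise 10.60: z about 7.9, P about 0
print(mcnemar_z(60, 40))     # Exercise 10.62: z = 2.0, P about 0.046
print(mcnemar_z(65, 35))     # Exercise 10.63: z = 3.0, P about 0.003
```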
10.64 Heaven and hell around the world
We do not have enough information to compare the two proportions inferentially for a given country because we do not know how many people answered "yes" for both, and how many people answered "no" for both.
Section 10.5: Adjusting for the Effects of Other Variables
10.65 Benefits of drinking
a) This refers to an analysis of three variables, a response variable, an explanatory variable, and a control variable. The response variable is "whether or not at risk for cardiovascular disease;" the explanatory variable is "whether drink alcohol moderately;" and the control variables are socioeconomic status and mental and physical health.
b) There is a stronger association between drinking alcohol and its effect on risk for cardiovascular disease for subjects who have a higher socioeconomic status.
10.66 Death penalty in Kentucky
a) White victim: 31/391 = 0.079 or 7.9% of white defendants received the death penalty. 7/57 = 0.123 or 12.3% of black defendants received the death penalty. Black victim: 0/18 = 0.000 or 0.0% of white defendants received the death penalty. 2/108 = 0.019 or 1.9% of black defendants received the death penalty. These results suggest that, for each type of victim's race, black defendants were more likely to receive the death penalty than white defendants were.
b) The response variable is verdict, the explanatory variable is defendant's race, and the control variable is victim's race.
c)
                     Death Penalty
Defendant's Race    Yes     No     Total
White                31    378      409
Black                 9    156      165
The overall proportions of whites and blacks who receive the death penalty, regardless of victim's race, are 0.076 and 0.055, respectively.
d) The data do satisfy Simpson's paradox in that the direction of association reverses when an additional variable is controlled for. This occurs because of the preponderance of whites who are accused of killing whites, a victim group that leads to higher death penalty rates.
e) (i) A test for two proportions could be conducted using the values in (c). (ii) A test for two proportions could be conducted using proportions calculated from the accompanying table.
10.67 Basketball paradox
a) The proportion of shots made is higher for Barry both for 2-point shots and for 3-point shots, but the proportion of shots made is higher for O'Neal overall.
b) O'Neal took almost exclusively 2-point shots, where the chance of success is higher.
10.68 Teacher salary, gender, and academic rank
a) (i) Overall, the difference is 84.4 – 68.8 = 15.6. The mean salary for male faculty was $15,600 higher than for female faculty. (ii) The difference was 109.2 – 96.2 = 13 for professors, 77.8 – 72.7 = 5.1 for associate professors, 66.1 – 61.8 = 4.3 for assistant professors, and 46.0 – 46.9 = –0.9 for instructors.
b) Perhaps the proportion of the faculty who are men is relatively higher at the higher academic ranks, for which the salaries are higher, and relatively lower at the lower academic ranks, for which the salaries are lower.
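The reversal described in Exercise 10.66 can be checked by computing the death-penalty proportions within each victim's race and then collapsing over victim's race. A minimal sketch in plain Python using the counts quoted in that solution.

```python
# Sketch: checking Simpson's paradox with the Exercise 10.66 counts
# ((death penalty yes, total), by defendant's race, within each victim's race).
counts = {
    "white victim": {"white defendant": (31, 391), "black defendant": (7, 57)},
    "black victim": {"white defendant": (0, 18),   "black defendant": (2, 108)},
}

totals = {"white defendant": [0, 0], "black defendant": [0, 0]}
for victim, by_defendant in counts.items():
    for defendant, (yes, n) in by_defendant.items():
        print(f"{victim}, {defendant}: {yes}/{n} = {yes / n:.3f}")
        totals[defendant][0] += yes
        totals[defendant][1] += n

# Collapsing over victim's race reverses the direction of the association.
for defendant, (yes, n) in totals.items():
    print(f"overall, {defendant}: {yes}/{n} = {yes / n:.3f}")
```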
10.69 Family size in Canada
a) The mean number of children for English-speaking families (1.95) is higher than the mean number of children in French-speaking families (1.85).
b) Controlling for province, this association reverses. In each case, there's a higher mean for French-speaking families. For Quebec, the mean number for French-speaking families (1.80) is higher than for English-speaking families (1.64). Similarly, for other provinces, the mean for French-speaking families (2.14) is higher than the mean for English-speaking families (1.97).
c) This paradox likely results from the fact that there are relatively more English-speaking families in the "other" provinces that tend to produce more children regardless of language, and more French-speaking families in Quebec where they tend to have fewer children regardless of language. This illustrates Simpson's paradox.
10.70 Heart disease and age
This occurs because there are more young people, who are at the lower levels of heart disease fatalities, in Utah, and more old people, who are at the higher levels of heart disease, in Colorado. Even though the rates are lower in Colorado at each age level, the heart disease death rates for young people in Utah are still lower than the death rates for old people in Colorado.
10.71 Breast cancer over time
There could be no difference in the prevalence of breast cancer now and in 1900 for women of a given age. Overall, the breast cancer rate would be higher now, because more women live to an old age now, and older people are more likely to have breast cancer.
Chapter Problems: Practicing the Basics
10.72 Pick the method
a) The statistical method would be a significance test, the samples are independent, the relevant parameter is the difference between the means, and the inference method would include the calculation of a t statistic and a P-value. The P-value would be compared to a significance level.
b) The statistical method could be either a significance test (significantly different from a mean change of 0?) or a confidence interval (is 0 included in the interval?), the samples are dependent, the relevant parameter is the mean difference score, and the inference method would include either the calculation of a t statistic and P-value, or the calculation of a confidence interval.
c) The statistical method could be either a McNemar test (a significance test comparing proportions from dependent samples) or a confidence interval (is 0 included in the interval?), the samples are dependent, the relevant parameter is the difference between proportions, and the inference method would include either the calculation of a z statistic and P-value using a McNemar test, or the calculation of a confidence interval.
d) The statistical method could be either a significance test (significantly different from a difference in means of 0?) or a confidence interval (is 0 included in the interval?), the samples are independent, the relevant parameter is the difference between means, and the inference method would include either the calculation of a t statistic and P-value, or the calculation of a confidence interval.
10.73 Public versus scientists' opinions on fracking
a) The response variable is the opinion on fracking (favor or oppose), and the explanatory variable is the type of survey (U.S. adults/scientists).
b) The separate samples of subjects should be treated as independent samples in order to conduct inference because each survey uses different subjects. (It is highly unlikely that a subject from the scientists' survey was also included in the Pew Research Center survey.)
c) Now, the two samples include the same subjects, and so should be treated as dependent samples (matched pairs) because the same scientists who were asked about fracking were asked about offshore drilling.
10.74 BMI then and now
a) The point estimate for the change in the population proportion is 0.69 – 0.66 = 0.03. It estimates that the population proportion of adults that are overweight increased by 0.03. In terms of percentages, the estimated population percent of adults that are overweight increased by 3 percentage points between 2003/2004 and 2011/2012.
10.74 (continued)
b) The standard error is so small mainly because of the large sample sizes for both samples.
c) With 95% confidence, the population proportion of adults overweight in 2011/2012 is at least 0.01 and at most 0.05 larger than in 2003/2004. In terms of percentages, with 95% confidence, the population percent of adults overweight in 2011/2012 is at least 1 and at most 5 percentage points larger than in 2003/2004. Zero is not contained in the interval as a plausible value, indicating the population proportion of adults overweight is greater in 2011/2012 than in 2003/2004.
10.75 Marijuana and gender
a) We are 95% confident that the population proportion of females who have used marijuana is at least 0.0077 lower and at most 0.0887 lower than the population proportion of males who have used marijuana. Because 0 is not in the confidence interval, we can conclude that females and males differ with respect to marijuana use.
b) The confidence interval would change only in sign. It would now be (0.0077, 0.0887). We are 95% confident that the population proportion of males who have used marijuana is at least 0.0077 higher and at most 0.0887 higher than the population proportion of females who have used marijuana.
10.76 Gender and belief in afterlife
a) The sample proportions who report that they believe in an afterlife are 1026/1233 = 0.8321 for females and 757/1009 = 0.7502 for males, and the difference between females and males is 0.8321 – 0.7502 = 0.0819.
b) The standard error for the estimate of p1 − p2 is se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = √(0.8321(1 − 0.8321)/1233 + 0.7502(1 − 0.7502)/1009) = 0.017. This expresses how much, for samples of these sizes, the difference in the sample proportion varies (roughly on average) around the true (unknown) difference in the population.
c) The 95% confidence interval is (p̂1 − p̂2) ± z.025(se) = (0.8321 − 0.7502) ± 1.96(0.017), or (0.05, 0.12). Because 0 is not in the confidence interval, we can conclude that the population proportion believing in an afterlife is larger for females.
d) The difference between these population proportions, 0.81 – 0.72 = 0.09, is in the confidence interval. The confidence interval in (c) contains the parameter it is designed to estimate.
10.77 Belief depend on gender?
a) p̂ = (1026 + 757)/(1233 + 1009) = 0.795 and se0 = √(p̂(1 − p̂)(1/n1 + 1/n2)) = √(0.795(1 − 0.795)(1/1233 + 1/1009)) = 0.01714
b) z = ((p̂1 − p̂2) − 0)/se0 = 0.0819/0.01714 = 4.78; the P-value is approximately 0.
If the null hypothesis were true, the probability would be approximately 0 of getting a test statistic at least as extreme as the value observed. Therefore, we reject the null hypothesis and conclude that the population proportions believing in an afterlife are different for females and males. c) If the population difference were 0.81 – 0.72 = 0.09, our decision would have been correct. d) The assumptions on which the methods in this exercise are based are that we used independent random samples for the two groups (okay since from GSS) and that we had at least 5 successes and 5 failures in each sample. 10.78 Females or males have more close friends? a) The point estimate of the difference between the population means for males and females is 8.9 – 8.3 = 0.6.
10.78 (continued)
b) We can be 95% confident that the population mean for males is between 0.60 – 2.1 = –1.5, or 1.5 lower, and 0.60 + 2.1 = 2.7, or 2.7 higher than the population mean for females. Because 0 falls in this interval, it is plausible that there is no difference between the population means for males and females.
c) For each gender, it does not seem like the distribution of number of friends is normal. The standard deviations are larger than the means; the lowest possible value of 0 is (0 – 8.3)/15.6 = –0.532, or 0.532 standard deviations below the mean for females and (0 – 8.9)/15.5 = –0.574, or 0.574 standard deviations below the mean for males, an indication of right skew. The confidence interval in (b) is based on the assumption of normal population distributions. The t-distribution, however, is robust with two-sided confidence intervals. With large samples (n > 30), it should not affect the validity.
10.79 Heavier horseshoe crabs more likely to mate?
a) (Boxplot of Weight vs Mate Status omitted: weight of female crabs, in kg, for the Mate and No Mate groups.)
The female crabs have a higher median and a bigger spread if they had a mate than if they did not have a mate. The distribution for the female crabs with a mate is right-skewed, whereas the distribution for the female crabs without a mate is symmetrical. b) The estimated difference between the mean weights of female crabs who have mates and who do not have mates is 2.6 – 2.1 = 0.5. c)
se = √(s1²/n1 + s2²/n2) = √(0.36/111 + 0.16/62) = 0.076
d) The 90% confidence interval is (x̄1 − x̄2) ± t.05(se) = 0.5 ± 1.645(0.076), or (0.375, 0.625). Because n1 and n2 are large, we approximate t.05 with the normal distribution using z.05 = 1.645. We can be 90% confident that the difference between the population mean weights of female crabs with and without a mate is between 0.375 and 0.625 kg. Because 0 does not fall in this interval, we can conclude that female crabs with a mate weigh more than do female crabs without a mate.
10.80 TV watching and race
a) TV distribution does not likely have a normal distribution for either race because the standard deviations are almost as large as the means. In fact, the lowest possible value of 0 is (0 – 3.97)/3.54 = –1.12, or 1.12 standard deviations below the mean for blacks, and (0 – 2.77)/2.25 = –1.23, or 1.23 standard deviations below the mean for whites, indicating skew to the right in both cases. Although inferences comparing population means assume a normal population distribution, the large sample sizes indicate that these skewed distributions do not likely affect the validity of inferences drawn from these samples.
b) We can be 95% confident that the difference between the population means of blacks and whites is between 0.75 hours and 1.65 hours. Because 0 does not fall in this interval, we can conclude that blacks watch more television than whites do, on the average.
10.80 (continued)
c) This inference is based on the assumptions that the data are quantitative, the samples are independent and random (okay since using GSS), and the population distributions for each group are approximately normal (which is not so important because of the large sample size).
10.81 Test TV watching by race
a) H0: μ1 = μ2; Ha: μ1 ≠ μ2
b) The test statistic is 5.25, and the P-value is 0.000. If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed.
c) Using a significance level of 0.05, we can reject the null hypothesis. We have strong evidence that there is a difference in mean TV watching time between the populations of blacks and whites. Blacks seem to watch more TV than do whites.
d) When we reject the null hypothesis at a significance level of 0.05, a 95% confidence interval does not include the value at the null hypothesis, in this case, 0.
10.82 Ibuprofen and lifespan
a) The response variable is the level of amino acid (quantitative). The explanatory variable is whether or not the cells received ibuprofen (categorical).
b) The standard error of the mean for the ibuprofen group is se1 = s1/√n1 = 0.0038/√6 = 0.0016, the standard error of the mean for the untreated group is se2 = s2/√n2 = 0.0050/√6 = 0.0020, and the standard error for the difference in means is se = √(s1²/n1 + s2²/n2) = √(0.0038²/6 + 0.0050²/6) = 0.0026.
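The three standard errors in part (b) illustrate that the standard error of a difference combines the two per-group standard errors: se = √(se1² + se2²). A minimal sketch in plain Python with the quoted summaries.

```python
# Sketch: standard errors for Exercise 10.82 (tryptophan level, ibuprofen vs. untreated).
import math

s1, n1 = 0.0038, 6   # ibuprofen group
s2, n2 = 0.0050, 6   # untreated group

se1 = s1 / math.sqrt(n1)                      # about 0.0016
se2 = s2 / math.sqrt(n2)                      # about 0.0020
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)  # about 0.0026, equals sqrt(se1**2 + se2**2)

print(se1, se2, se_diff, math.sqrt(se1**2 + se2**2))
```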
c) With 95% confidence, the population mean level of tryptophan for cells treated with ibuprofen is between 0.007 lower and 0.004 larger compared to untreated cells. Because 0 falls in the interval, it is plausible that the population mean level of tryptophan is the same for cells treated with ibuprofen and untreated cells.
10.83 Time spent on Internet
a) The response variable is the number of hours a week spent on the Internet and is quantitative. The explanatory variable is the respondent's gender and is categorical.
b) The 99% confidence interval is (–0.8, 2.5). With 99% confidence, the population mean number of hours spent on the web is between 0.8 hours shorter and 2.5 hours longer for males compared to females. Because 0 is in the confidence interval, there is no evidence that the population mean time on the web differs by gender.
c) 1) Assumptions: Independent random samples, and number of hours spent on the Internet per week has an approximately normal population distribution for each gender. 2) H0: μ1 = μ2; Ha: μ1 ≠ μ2, where μ1 is the mean number of hours for the population of all U.S. males and μ2 is the mean number of hours for the population of all U.S. females. 3) t = 1.03. 4) The P-value is 0.3044. 5) If the null hypothesis is true, the probability of obtaining a difference in sample means as extreme as that observed is 0.3044, which is not unusual. At a significance level of 0.05, we do not have evidence to reject H0. It is plausible that the population mean number of hours a week spent on the Internet is the same for males and females.
10.84 Test–CI connection
Since the 95% confidence interval contains 0, it is plausible that the population mean number of hours a week spent on the Internet is the same for males and females. This is the same conclusion we reached using a hypothesis test with a significance level of 0.05.
10.85 Sex roles
1) Assumptions: The data are quantitative (child's score); the samples are independent and we will assume that they were collected randomly; we assume that the population distributions of scores are approximately normal for each group. 2) H0: μ1 = μ2; Ha: μ1 ≠ μ2, where group 1 represents the group with the male tester and group 2 represents the group with the female tester. 3) se = √(s1²/n1 + s2²/n2) = √(1.4²/50 + 1.2²/90) = 0.2349; t = (2.9 − 3.2)/0.2349 = −1.28 4) The P-value is 0.205. 5) If the null hypothesis were true, the probability would be 0.205 of getting a test statistic at least as extreme as the value observed. Since the P-value is quite large, there is not much evidence of a difference in the population mean of the children's scores when the tester is male versus female.
10.86 How often do you feel sad?
a) The P-value of 0.000 indicates that if the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. In other words, it is extremely unlikely that the population means are equal because, if they were, it is improbable that we'd get a test statistic of 3.88. It appears that women report feeling sad more often than do men, on the average.
b) We can be 95% confident that the difference between the population means for women and men falls between 0.193 and 0.587. Because 0 does not fall in this confidence interval, we can conclude that women report feeling sad more often than do men, on the average. We learn from the confidence interval, but not from the test, the actual range of plausible values for the mean population difference, and we see the difference may be quite small.
c) The assumptions for these analyses are that the data are quantitative, the samples are independent and random, and the population distributions for each group are approximately normal (particularly with small sample sizes). It appears that the population distributions are not normal; the standard deviations are larger than the means. In fact, the lowest possible value of 0 is (0 – 1.81)/1.98 = –0.914, or 0.914 standard deviations below the mean for females, and (0 – 1.42)/1.83 = –0.776, or 0.776 standard deviations below the mean for males, indicating skew to the right in both cases. Given the large sample sizes, this does not likely affect the validity of our inferences.
10.87 Parental support and household type
a) The 95% confidence interval is 4 ± 3.4, or (0.6, 7.4).
b) The conclusion refers to the results of a significance test in that it tells us the P-value of 0.02. If the null hypothesis were true, the probability would be 0.02 of getting a test statistic at least as extreme as the value observed.
10.88 Car bumper damage
Using technology, a 95% confidence interval is (6.2, 14.5). The test statistic is t = 7.11 and the P-value is 0.003. We can conclude that the population mean is higher for one bumper type. Zero does not fall in the confidence interval, an indication that it is unlikely that there is no population mean difference, and the P-value of 0.003 is quite small, an indication that it would be improbable to obtain a test statistic this large if the null hypothesis is true (and if we're using samples of this size).
10.89 Teenage anorexia
a) The P-value of 0.102 indicates that if the null hypothesis that there is no difference between population mean change scores were true, the probability would be 0.10 of getting a test statistic at least as extreme as the value observed.
b) The assumptions for this analysis are that the data are quantitative, the samples are independent and random, and the population distributions for each group are approximately normal. Based on the box plots (which show outliers and a skew to the right for the cognitive-behavioral group), it would not be a good idea to conduct a one-sided test. It is not as robust as the two-sided test to violations of the normal assumption.
10.89 (continued)
c) The lowest plausible difference between population means is the lowest endpoint of the confidence interval, –0.7, a difference of less than 1 pound. Thus, if there is a change in this direction (cognitive-behavioral group less), then it is less than 1 pound. On the other hand, the highest plausible difference between population means is the highest endpoint of the confidence interval, 7.6. Thus, if there is a change in this direction (cognitive-behavioral group more), then it could be almost as much as 8 pounds.
d) The confidence interval and test give us the same information. We do not reject the null hypothesis that the difference between the population means is 0, and 0 falls in the 95% confidence interval for the difference between the population means.
10.90 Equal pay in sports?
a) The means are 50,637 for males and 38,286 for females with a difference of 50,637 – 38,286 = 12,351.
b) Judging by the dot plots, the population distribution of the prize money earned may be rather right skewed for both male and female skiers (especially males). Further, we are interested in a one-sided test, and the sample size is not that large. All these are conditions under which the t-test may not work well.
c) Answers will vary. One generated permutation yielded a difference in the sample means of –12,971.
d) Answers will vary. For our random permutation, –12,971 is less extreme (one-sided test) than the observed difference of 12,351.
e) Answers will vary. Of ten generated permutations, five yielded a test statistic at least as extreme. (Make sure you select "greater" for the alternative hypothesis.)
f) Answers will vary. One generation yielded 3249 out of 10,000 permutations that resulted in a test statistic at least as extreme.
g) The permutation P-value is 3249/10,000 = 0.325. If the population distribution of the prize money is the same for male and female skiers, observing a difference of 12,351 in our sample occurs with a probability of 0.325. Because this P-value is not small, there is not enough evidence to conclude that the population means of the population distributions differ.
h) No, the shape of the histogram is almost identical, as is the permutation P-value.
10.91 Surgery versus placebo for knee pain
From technology, the 95% confidence interval is (–10.84257, 5.24257). We can be 95% confident that the population mean difference in pain scores falls between –10.8 and 5.2. Because 0 is included in this range, it is plausible that there is no difference between the population mean pain scores of these two groups.
10.92 More knee pain
From technology, using H0: μ1 − μ2 = 0; Ha: μ1 − μ2 ≠ 0, the t-value is –0.69 and the P-value is 0.492 with df = 117. The P-value of 0.49 indicates that if the null hypothesis were true, the probability would be 0.49 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is true and that there is no difference in population mean pain levels between these two groups. For this inference, we assume quantitative response variables, independent random samples, and approximately normal population distributions for each group.
10.93 Anorexia again
a) We can be 95% confident that the population mean difference is between –0.7 and 7.6. Because 0 falls in this range, it is plausible that there is no population mean difference between these groups.
b) The P-value of 0.10 indicates that if the null hypothesis were true, the probability would be 0.10 of getting a test statistic at least as extreme as the value observed.
c) The P-value is 0.10/2 = 0.05. If the null hypothesis were true, the probability would be 0.05 of getting a test statistic of 1.68 or larger. This P-value is smaller than the one in (b), thus providing stronger evidence against the null hypothesis.
d) For these inferences, we assume a quantitative response variable, independent random samples, and approximately normal population distributions for each group.
10.94 Breast-feeding helps IQ?
a) From technology, the 95% confidence interval is (–13.5414, –6.6586). The confidence interval does not include 0; therefore, we can conclude that there is a difference between the population means. In addition, the P-value (for t = –5.77) is almost 0; if the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. We have strong evidence that mean IQ is higher among the population of babies who have been breast-fed for 7 to 9 months than among the population of babies who had been breast-fed for no longer than a month.
b) This was an observational study because babies were not randomly assigned to condition. There are several potential lurking variables. Parents' education levels or IQ might affect babies' IQs and might be related to tendency to breast-feed.
10.95 Australian cell phone use
a) The observations are paired because two observations were made on the same driver: were they using their cell phone when the crash occurred and had they used their cell phone at an earlier time when no accident occurred. Thus, methods for dependent samples should be used.
b) McNemar's test is used for comparing (marginal) proportions from matched pairs.
10.96 Improving employee evaluations
We would explain that there's less than a 5% chance that we'd get a sample mean at least this much higher after the training course if there were, in fact, no difference in the population. To make this more informative, it would have been helpful to have the sample means or the sample mean difference and its standard error – or better yet, the confidence interval.
10.97 Which tire is better?
a) There is a range of possible answers for this problem. The standard deviation must be small enough that the resulting standard error and test statistic will lead to a test statistic with a small P-value.
b) The design of this study could be improved by increasing the sample size and randomizing the side the tire is put on.
10.98 Effect of alcoholic parents
a) The groups are dependent since they were matched according to age and gender.
b) 1) Assumptions: the differences in scores are a random sample from a population that is approximately normal. 2) H0: μd = 0; Ha: μd ≠ 0
3) t = x̄d/(sd/√n) = 2.7/(9.7/√49) = 1.95
4) The P-value is 0.057. 5) If the null hypothesis is true, the probability of obtaining a difference in sample means at least as extreme as that observed is 0.057. This is some, but not strong, evidence that there is a difference in the mean scores between children of alcoholics versus children of non-alcoholics. c) We assume that the population of differences is approximately normal and that our sample is a random sample from this distribution. 10.99 CI versus test a)
The 95% confidence interval is x̄d ± t.025(se) = 2.7 ± 2.01(9.7/√49), or (–0.1, 5.5), df = 48.
We are 95% confident that the difference in population mean scores is between –0.1 and 5.5. b) The confidence interval gives us a range of values for the difference between the population mean scores rather than just telling us whether or not the scores are significantly different. 10.100 Breast augmentation and self esteem a) The samples were dependent since the same women were sampled before and after their surgeries. b) No. In order to find the t statistic, we need to know the standard deviation of the differences which cannot be obtained from the information given.
10.101 Internet use
1) Assumptions: the differences in time spent reading news stories on the Internet and time spent communicating on the Internet are a random sample from a population that is approximately normal. 2) H0: μd = 0; Ha: μd ≠ 0
3) t = –3.30 4) The P-value is 0.03. 5) If the null hypothesis is true, the probability of obtaining a difference in sample means at least as extreme as that observed is 0.03. At a significance level of 0.05, we would reject the null hypothesis and conclude that there is a significant difference in the population mean amount of time spent reading news stories on the Internet versus population mean time spent communicating on the Internet. A 95% confidence interval is given by (–5.9, –0.5). We are 95% confident that the population mean amount of time spent reading news stories on the Internet is between 5.9 and 0.5 hours less than the population mean amount of time spent communicating on the Internet. Since 0 is not contained in this interval, we conclude that the population means differ. The population consists of the 165 students in the course.
10.102 TV or rock music a worse influence?
a) The samples are dependent; the same people are answering both questions.
b) From technology: The 95% confidence interval is (–0.335, 1.335). We can be 95% confident that the population mean difference between ratings of the influence of TV and rock music is between –0.3 and 1.3. Because 0 falls in this range, it is plausible that there is no difference between the population mean ratings.
c) From technology: H0: μd = 0; Ha: μd ≠ 0, t = 1.32, and the P-value is 0.21. The P-value of 0.21 indicates that if the null hypothesis were true, the probability would be 0.21 of getting a test statistic at least as extreme as the value observed.
10.103 Influence of TV and movies
a) We can be 95% confident that the population mean difference between responses with respect to movies and TV is between –0.43 and 0.76. Because 0 falls in this interval, it is plausible that there is no difference between the population mean responses for TV and the population mean responses for movies.
b) The significance test has a P-value of 0.55. If the null hypothesis were true, the probability would be 0.55 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no population mean difference between responses with respect to movies and TV.
10.104 Crossover study
a) The sample proportions are: High Dose: (53 + 16)/(53 + 16 + 8 + 9) = 69/86 = 0.802; Low Dose: (53 + 8)/(53 + 16 + 8 + 9) = 61/86 = 0.709.
b) z = (16 − 8)/√(16 + 8) = 1.63; P-value: 0.10. If the null hypothesis were true, the probability would be 0.10 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no difference between low-dose and high-dose analgesics with respect to population proportions who report relief of menstrual bleeding. The sum of the counts in the denominator should be at least 30, although in practice the two-sided test works well even if this is not true. In addition, the sample should be an independent random sample, and the data should be categorical.
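The z statistic above compares two dependent proportions using only the discordant counts. A minimal Python sketch of that calculation is shown below as an added illustration (b = 16 and c = 8 are read from the crossover table; nothing here is output referenced by the text).
from math import sqrt
from scipy import stats

b, c = 16, 8                              # discordant counts from the crossover table
z = (b - c) / sqrt(b + c)                 # about 1.63
p_value = 2 * stats.norm.sf(abs(z))       # two-sided P-value, about 0.10
print(round(z, 2), round(p_value, 2))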
10.105 Belief in ghosts and in astrology
a) Because the same sample is used for both sets of responses, it is not valid to compare the proportions using inferential methods for independent samples. Rather, we should use methods for dependent samples.
b) We do not have enough information to compare the proportions using methods for dependent samples. We would need to know the specific numbers of subjects who said that they did believe in ghosts but not in astrology, and who said that they did believe in astrology, but not in ghosts.
10.106 Death penalty paradox
a) When we ignore victim's race, we observe a proportion of 53/(53 + 414 + 16) = 0.11 white defendants who receive the death penalty, and a proportion of (11 + 4)/(11 + 4 + 37 + 139) = 0.079 (rounds to 0.08) black defendants who receive the death penalty. It appears that whites are more likely to receive the death penalty than are blacks. When we take victim's race into account, the direction of the association changes, with black defendants more likely to get the death penalty. Specifically, when the victim was white: White defendants have a probability of 53/(53 + 414) = 0.113 (rounds to 0.11) and blacks have a probability of 11/(11 + 37) = 0.229 (rounds to 0.23) of receiving the death penalty. When the victim was black: White defendants have a probability of 0/(0 + 16) = 0.000 (rounds to 0.00) and blacks have a probability of 4/(4 + 139) = 0.028 (rounds to 0.03) of receiving the death penalty. In both cases, the proportion of blacks who receive the death penalty is higher than the proportion of whites who receive the death penalty.
b) The death penalty was imposed more frequently when the victim was white, and white victims were more common when the defendant was white.
10.107 Death rate paradoxes
a) If the U.S. has a higher proportion of older people (who are more likely to die in both countries than are younger people), and Mexico has a higher proportion of younger people (who are less likely to die in both countries than are older people), then the death rate in the U.S. could be higher.
b) This could occur if Maine has more older people. Even if older people in South Carolina are more likely to die than are older people in Maine, if Maine has far more older people and far fewer younger people, this overrepresentation of older people could lead to an overall higher death rate in Maine than in South Carolina.
10.108 Income and gender
a) The mean income difference could disappear if we controlled for number of years since receiving highest degree. If most of the female faculty had been hired recently, they would be fewer years from their degree, and would have lower incomes. So, the overall mean could be lower for females. If we look only at those who are a given year from receiving their degree (e.g., received degree five years ago), we might find no gender difference.
b) The mean income difference could disappear if we controlled for college of employment. If more women seek positions in low salary colleges and more men in high salary colleges, it might appear that men make more. If we look only within a given college (e.g., law school), we might not find a gender difference in income.
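A quick numerical check of the reversal in Exercise 10.106 (Simpson's paradox) can be sketched in Python. The counts are those quoted in the solution above; the code is only an added illustration.
# (death penalty, no death penalty) counts by defendant race and victim race
white_def = {"white victim": (53, 414), "black victim": (0, 16)}
black_def = {"white victim": (11, 37), "black victim": (4, 139)}

def prop(yes, no):
    return yes / (yes + no)

# Ignoring victim's race, whites appear more likely to get the death penalty.
print(prop(53, 414 + 16), prop(11 + 4, 37 + 139))          # about 0.11 vs 0.08

# Controlling for victim's race, the direction reverses for both victim races.
for victim in ("white victim", "black victim"):
    print(victim, prop(*white_def[victim]), prop(*black_def[victim]))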
Chapter Problems: Concepts and Investigations
10.109 Student survey
Each report will be different, but will present findings such as those in the following software output. The assumptions are that each sample must have at least 10 outcomes of each type, the data must be categorical, and the samples must be independent random samples. Using software, we compared the proportions of women and men who said yes. The data, from technology, follow:
From MINITAB:
Gender   Y    N   Sample p
f       18   31   0.580645
m       13   29   0.448276
Difference = p (f) - p (m)
Estimate for difference: 0.132369
95% CI for difference: (-0.118500, 0.383238)
Test for difference = 0 (vs. not = 0): Z = 1.03  P-Value = 0.301
Since the confidence interval contains 0 and the P-value is large, we are unable to conclude that there is a difference in the population proportion of males and females who believe in life after death.
10.110 Review the medical literature
Answers will vary.
10.111 Attractiveness and getting dates
The short report will be different for each student, but should include results such as those in the following technology outputs.
Men, From MINITAB:
Sample   N  Mean  StDev  SE Mean
1       35   9.7   10.0      1.7
2       36   9.9   12.6      2.1
Difference = mu (1) - mu (2)
Estimate for difference: -0.200000
95% CI for difference: (-5.582266, 5.182266)
T-Test of difference = 0 (vs. not =): T-Value = -0.07  P-Value = 0.941  DF = 66
Since the confidence interval contains 0 and the P-value is greater than 0.05, we are unable to conclude that there is a difference in the population mean number of dates between more and less attractive men.
Women, From MINITAB:
Sample   N  Mean  StDev  SE Mean
1       33  17.8   14.2      2.5
2       27  10.4   16.6      3.2
Difference = mu (1) - mu (2)
Estimate for difference: 7.40000
95% CI for difference: (-0.70930, 15.50930)
T-Test of difference = 0 (vs. not =): T-Value = 1.83  P-Value = 0.073  DF = 51
Since the confidence interval contains 0 and the P-value is greater than 0.05, we are unable to conclude that there is a difference in the population mean number of dates between more and less attractive women. However, the confidence interval shows there could be a large difference.
10.112 Pay discrimination against women?
a) We would need to know the sample standard deviations and sample sizes for the two groups.
b) It would not be relevant to conduct a significance test. A significance test lets us make inferences about a population based on a sample. If we already have the information on the entire population, there's no need to make inferences about the population.
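The two-sample t results quoted for Exercise 10.111 can also be reproduced approximately from the summary statistics alone. The sketch below is an added illustration using scipy's unpooled (Welch-type) test with the women's summary statistics from the MINITAB output above; the degrees of freedom are computed slightly differently than in MINITAB, so the P-value may differ in the third decimal.
from scipy import stats

t, p = stats.ttest_ind_from_stats(mean1=17.8, std1=14.2, nobs1=33,
                                  mean2=10.4, std2=16.6, nobs2=27,
                                  equal_var=False)   # unpooled (Welch) version
print(round(t, 2), round(p, 3))                      # about 1.83 and 0.073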
10.113 Mean of permutation distribution
If the population distributions are identical, the sampling distribution of x̄1 − x̄2 should be centered at around 0. The mean is Σ x̄ P(x̄), which works out to 0 for the values of x̄1 − x̄2 in the permutation distribution (4.67, 3.83, 0.33, 1.17, 2.00, and 6.17 in magnitude, with appropriate signs) and their probabilities (1/10, 2/10, 2/10, 3/10, 1/10, and 1/10).
10.114 Treating math anxiety
Solutions will vary but should include the information that follows from technology. From MINITAB:
Two-sample T for Program A vs. Program B
             N   Mean  StDev  SE Mean
Program A    5   4.00   2.45      1.1
Program B    5  12.00   3.74      1.7
Difference = mu (Program A) - mu (Program B)
Estimate for difference: -8.00000
95% CI for difference: (-12.89382, -3.10618)
T-Test of difference = 0 (vs. not =): T-Value = -4.00  P-Value = 0.007  DF = 6
Since the P-value is quite small (or, equivalently, since all values in the confidence interval are less than 0), we reject the null hypothesis and conclude that there is a significant difference in the drop in the mean number of items that caused anxiety between Programs A and B.
10.115 Obesity and earnings
a) Whether obese (yes or no) and wage are stated to have an association.
b) Education level is one possibility. The women could be paired according to education level and then compared in obesity rates.
10.116 Multiple choice: Alcoholism and gender
The best answer is (b).
10.117 Multiple choice: Comparing mean incomes
The best answer is (d).
10.118 Multiple choice: Sample size and significance
The best answer is (a).
10.119 True or false? Positive values in CI
False
10.120 True or false? Afford food?
False
10.121 True or false? Control for clinic
False
♦♦10.122 Guessing on a test
a) Denote the proportion of correct responses on the test by p̂1 for Joe and by p̂2 for Jane. The sampling distribution of p̂1 − p̂2 is approximately normal and has a mean of p1 − p2 = 0.50 − 0.60 = −0.10 and a standard error of se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = √(0.5(1 − 0.5)/100 + 0.6(1 − 0.6)/100) = 0.070. The probability that Joe gets a higher score is the probability that p̂1 − p̂2 is positive. This equals approximately the probability that a normal random variable having a mean of –0.10 and a standard deviation of 0.070 takes a positive value. The z-score for a sample difference of 0 is z = (0 − (−0.10))/0.070 = 0.10/0.070 = 1.43, and the tail probability above that value (which corresponds to positive values for the difference of sample proportions) is 0.08. This is the answer.
b) If the test had only 50 questions, we would have a larger standard error, and hence a smaller z-score and a larger right-tail probability. The probability would be higher than 0.08 that Joe would get a higher score than Jane. (With larger samples, it is less likely for Joe to "luck out" and do better than Jane, even though he has a lower chance of a correct response with any given question.)
♦♦10.123 Standard error of difference
se(estimate 1 − estimate 2) = √([se(estimate 1)]² + [se(estimate 2)]²) = √(0.6² + 1.8²) = 1.897. The 95% confidence interval is (x̄1 − x̄2) ± 1.96(se) = (46.3 − 33.0) ± 1.96(1.897), or (9.58, 17.02). We can be 95% confident that the difference between the population mean number of years lost is between 9.6 and 17.0. Because 0 is not in this range, we can conclude that there is a population mean difference between those who smoke and are overweight and those who do not smoke and are normal weight in terms of number of years of life left. It appears that those who do not smoke and are normal weight have more years left than do those who smoke and are overweight.
♦♦10.124 Gap between rich and poor
a) The margin of error is z.025(se), or approximately 2(se), with 2(se) = 2√(p̂(1 − p̂)(1/n1 + 1/n2)). With n1 = n2 = n and p̂ = 0.5, p̂(1 − p̂)(1/n1 + 1/n2) = 0.5(1 − 0.5)(1/n + 1/n) = 0.5(0.5)(2/n) = 0.5/n. The margin of error is therefore 2√(0.5/n) = √(4(0.5)/n) = √(2/n).
b) √(2/n) = √(2/1000) = 0.045. The following pairs of countries are less than 4.5% different from each other: The United States and the United Kingdom, and Turkey and Argentina. The population percentages for these two pairs of countries might not be different.
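Referring back to Exercise 10.122, the normal-approximation probability can be verified with a short Python sketch. This is an added illustration; p = 0.50 and 0.60 and n = 100 are the values used in the solution.
from math import sqrt
from scipy import stats

p_joe, p_jane, n = 0.50, 0.60, 100
se = sqrt(p_joe * (1 - p_joe) / n + p_jane * (1 - p_jane) / n)   # about 0.070
mean_diff = p_joe - p_jane                                       # -0.10
print(round(stats.norm.sf(0, loc=mean_diff, scale=se), 2))       # P(difference > 0), about 0.08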
♦♦10.125 Small-sample CI
a) (i) p̂1 = p̂2 = 0, because there are no successes in either group (i.e., 0/10 = 0).
(ii) se = 0, because there is no variability in either group if all responses are the same. Specifically, both numerators under the square root sign in the se formula would have a zero in them, leading to a calculation of 0 as the se.
(iii) The 95% confidence interval would be (0, 0) because we'd be adding 0 to 0 (se multiplied by z would always be 0 with se of 0, regardless of the confidence level).
b) Using the small-sample method:
(i) p̂1 = p̂2 = 1/12 = 0.083
(ii) se = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = √(0.083(1 − 0.083)/12 + 0.083(1 − 0.083)/12) = 0.113
The 95% confidence interval is now (p̂1 − p̂2) ± z.025(se) = (0.083 − 0.083) ± 1.96(0.113), or (–0.22, 0.22), which is far more plausible than (0, 0).
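A sketch of the small-sample method from part (b), written in Python as an added illustration with the 0-out-of-10 counts used above (one success and one failure are added to each group before computing the usual interval):
from math import sqrt

x1, n1 = 0, 10
x2, n2 = 0, 10
p1 = (x1 + 1) / (n1 + 2)                  # adjusted estimate, 1/12 = 0.083
p2 = (x2 + 1) / (n2 + 2)
se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))   # about 0.113
print((p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se)              # about (-0.22, 0.22)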
♦♦10.126 Symmetry of permutation distribution
For each permutation that leads to a given difference, you get the same difference but with the opposite sign when all dogs in Group 1 are switched to Group 2 and vice versa. Hence, for each permutation that leads to a positive difference, there is one that gives the same difference (by just switching all observations between groups), but with a negative sign, resulting in a symmetric distribution.
10.127 Null standard error for matched pairs
a) From Table 10.18, b = 162, c = 9, and n = 1314, so the null standard error is se0 = √((b + c)/n²) = √((162 + 9)/1314²) = 0.010.
b) z = ((p̂1 − p̂2) − 0)/se0 = (1117/1314 − 964/1314)/0.010 = (b − c)/√(b + c) = (162 − 9)/√(162 + 9) = 11.7
♦♦10.128 Graphing Simpson’s paradox a) A comparison of a pair of circles having the same letter in the middle (e.g., W) indicates that the death penalty was more likely for black than white defendants, when we control for victim’s race. For both cases with white victims and with black victims, a higher percentage of black defendants received the death penalty; the circles for black defendants with victims of a given race are higher on the y-axis than the circles for white defendants with victims of that same race. b) When we compare the x marks for black and white defendants without regard to victim’s race, the mark for white defendants is higher with respect to the y-axis than is the mark for black defendants. c) For white defendants, the overall percentage who got the death penalty is so close to the percentage for the case with white victims because almost all white defendants are accused of killing white victims.
Chapter Problems: Student Activities 10.129 Reading the medical literature The reports will differ based on the article chosen by the class instructor.
Chapter 11: Analyzing the Association Between Categorical Variables
Section 11.1: Independence and Dependence (Association)
11.1 Gender gap in politics?
a) The response variable is political party identification, and the explanatory variable is gender.
b)
           Political Party Identification
Gender    Democrat  Independent  Republican  Total    n
Female      39.6%       37.4%       23.0%     100%  1063
Male        33.0%       43.5%       23.5%     100%   843
Women are more likely than are men to be Democrats, whereas men are more likely than are women to be Independents. The likelihood of being Republican is about the same for males and females.
c) There are many possible hypothetical conditional distributions for which these variables would be independent. Distributions should show percentages in the party categories that are the same for men and women. Here's one such example:
           Political Party Identification
Gender    Democrat  Independent  Republican  Total    n
Female      39.6%       35.7%       26.4%     100%  1063
Male        39.6%       35.7%       26.4%     100%   843
d) Graph for (b) and graph for (c): side-by-side bar graphs titled "Gender and Political Party," showing Percent (y-axis) for each political party (Democrat, Independent, Republican) with separate bars for females and males, based on the tables in (b) and (c), respectively.
11.2 Beliefs of college freshmen
a) If the results for the population of college freshmen were similar to these, gender and feelings of being overwhelmed would be dependent.
b) There are many hypothetical populations for which these variables would be independent. Distributions should show percentages that are the same for men and women. For example, if the percentages of men and women both were 46%, these variables would be independent.
11.3 Williams College admission
a) These distributions refer to those of Y (admitted or not) at given categories of X (gender).
            Admitted
Gender      Yes      No    Total    n
Male      18.2%   81.8%     100%  3195
Female    16.9%   83.1%     100%  3568
b) X and Y are dependent because the probability of a student being admitted differs by gender.
11.4 Happiness and gender
a) The response variable is happiness, and the explanatory variable is gender.
b)
                     Happiness
Gender    Not Too Happy  Pretty Happy  Very Happy  Total    n
Female            14.2%         54.7%       31.1%   100%  1082
Male              13.9%         56.9%       29.1%   100%   882
The conditional distributions of happiness look nearly identical between females and males, with about 14% being not too happy, about 56% being pretty happy, and 30% being very happy.
c) The following is an example of a population conditional distribution that is both consistent with this sample and for which happiness and gender are independent.
                     Happiness
Gender    Not Too Happy  Pretty Happy  Very Happy  Total    n
Female            14.0%         56.0%       30.0%   100%  1082
Male              14.0%         56.0%       30.0%   100%   882
11.5 Marital happiness and income
a)
                    Happiness of Marriage
Income     Not Too Happy  Pretty Happy  Very Happy  Total
Below                  6            62         139    207
Average                7           125         283    415
Above                  6            69         115    190
b)
                    Happiness of Marriage
Income     Not Too Happy  Pretty Happy  Very Happy  Total    n
Below                 3%           30%         67%   100%   207
Average               2%           30%         68%   100%   415
Above                 3%           36%         61%   100%   190
c) The conditional distributions of marital happiness for above-average and average income are nearly identical; however, among people of below-average income, the percentage of very happy people is lower, and the percentage of pretty happy people is larger. Across all three income categories, the percentage of very happy people is much larger for marital happiness (always over 60%) than for general happiness (always below 40%).
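The conditional distributions in part (b) are just the counts in part (a) divided by their row totals. As an added illustration (not part of the original solution), the same percentages can be produced in Python:
import numpy as np

counts = np.array([[6, 62, 139],    # below-average income
                   [7, 125, 283],   # average income
                   [6, 69, 115]])   # above-average income
row_totals = counts.sum(axis=1, keepdims=True)
print(np.round(100 * counts / row_totals, 1))
# rows: [2.9, 30.0, 67.1], [1.7, 30.1, 68.2], [3.2, 36.3, 60.5]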
11.6 What is independent of happiness?
Answers will vary. One possible answer is the region of the country in which you live, because the probability of being happy could well be the same in different regions, whereas we would expect happiness to be higher for those who believe in an afterlife, those with higher family income, those with better health, and those who are happier with their job.
11.7 Sample evidence about independence
Answers will vary depending on the column variable selected. For example, in 2008, the percentage in each of the 9 regions was about 30% for very happy, 55% for pretty happy, and 15% for not too happy. Independence seems plausible.
Section 11.2: Testing Categorical Variables for Independence
11.8 Life after death and gender
a) Observed values:
           Belief in Life After Death
Gender      Yes     No   Total
Male        605    185     790
Female      822    155     977
b) Expected values:
           Belief in Life After Death
Gender      Yes     No   Total
Male        638    152     790
Female      789    188     977
There are fewer males but more females who believe in life after death than what would be expected under independence.
c) X² = Σ(observed count − expected count)²/expected count = (605 − 638)²/638 + (185 − 152)²/152 + (822 − 789)²/789 + (155 − 188)²/188 = 16.0
11.9 Happiness and gender
a) H0: Gender and happiness are independent. Ha: Gender and happiness are dependent.
b) If the null hypothesis of independence is true, it is not unusual to observe a chi-square value of 1.04 or larger because the probability is 59% of this occurring. Hence, there is no evidence of an association between gender and happiness.
11.10 What gives P-value = 0.05?
a) 1: 3.84  b) 2: 5.99  c) 4: 9.49  d) 16: 26.30  e) 16: 26.30
11.11 Marital happiness and income
a) H0: Marital happiness and family income are independent. Ha: Marital happiness and family income are dependent.
b) df = (3 – 1)(3 – 1) = 4
c) (i) The expected value of the chi-squared statistic is df = 4.
(ii) The standard deviation is √(2df) = √(2 · 4) = √8 = 2.8. A value of 4.58 is (4.58 − 4)/2.8 = 0.21, or about 0.2 standard deviations above the expected value under independence.
d) We would need a chi-squared value of 9.49 to get a P-value of exactly 0.05.
11.11 (continued)
e) The P-value for a chi-squared statistic of 4.58 is greater than 0.25 using Table C and is 0.33 using technology. This is a very large P-value. Do not reject the null hypothesis; there is no evidence of an association between marital happiness and income.
11.12 First and second free throw independent?
a)
              Made Second
Made First    Yes    No   Total
No             48     5      53
Yes           251    34     285
b) It does not seem that his success on the second shot depends on whether he made the first. The chi-squared statistic is small and the P-value is large. It would not be unusual for a random sample to have a chi-squared statistic of this size.
11.13 Cigarettes and marijuana
a)
              Marijuana
Cigarettes    Yes      No    Total     n
Yes          61.1%   38.9%    100%   1495
No            5.9%   94.1%    100%    781
This conditional distribution suggests that marijuana use is much more common for those who have smoked cigarettes than for those who have not.
b) 1) The assumptions are that there are two categorical variables (cigarette use and marijuana use in this case), that randomization was used to obtain the data, and that the expected count was at least five in all cells.
2) H0: Cigarette use and marijuana use are independent. Ha: Cigarette use and marijuana use are dependent.
3) X² = 642.0
4) The P-value is approximately 0.
5) If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. This P-value is quite low. We have extremely strong evidence that marijuana use and cigarette use are associated.
11.14 Smoking and alcohol
a) False
b) The z statistic would be the square root of the chi-squared statistic, so z = √X² = √451.404 = 21.25. The P-value would remain 0.000.
11.15 Help the environment
a) H0: Willingness to accept cuts to help the environment and being in school or retired are independent. Ha: Willingness to accept cuts to help the environment and being in school or retired are dependent.
b) r = 2 and c = 5; thus, df = (r – 1)(c – 1) = (2 – 1)(5 – 1) = 4
c) For a P-value of 0.05, X² = 9.49, and for a P-value of 0.025, X² = 11.14. For these data, X² = 9.56, so: (i) The P-value is less than 0.05. (ii) The P-value is greater than 0.025.
d) (i) With a significance level of 0.05, there is evidence for an association between helping the environment and whether someone is in school or retired. (ii) With a significance level of 0.025, there is not enough evidence for an association between helping the environment and whether someone is in school or retired.
11.16 Primary food choice of alligators
a)
                            Primary Food
Lake        Fish   Invertebrates   Birds & Reptiles   Other   Total    n
Hancock    54.5%            7.3%              14.6%   23.6%    100%   55
Trafford   24.5%           34.0%              22.6%   18.9%    100%   53
b) H0: The distribution of primary food choice is the same for alligators caught in lakes Hancock and Trafford (homogeneity). Ha: The distributions differ for the two lakes.
c) df = (2 – 1)(4 – 1) = 3, so we expect the chi-squared statistic to be 3 with a standard deviation of √(2df) = √(2 · 3) = √6 = 2.5. Since 16.79 is (16.79 − 3)/2.5 = 5.5, or 5.5 standard deviations above the expected value of 3, it is considered extreme.
d) Since the P-value is less than 0.001, there is strong evidence that the distribution of primary food choice of alligators differs in the two lakes.
11.17 Aspirin and heart attacks
a)
             Heart attack
Treatment    Yes     No   Total
Placebo       28    656     684
Aspirin       18    658     676
b) 1) The assumptions are that there are two categorical variables (treatment and heart attack incidence), that randomization was used to obtain the data, and that the expected count was at least five in all cells.
2) H0: Treatment and incidence of heart attack are independent. Ha: Treatment and incidence of heart attack are dependent.
3) X² = 2.1
4) The P-value is 0.14.
5) If the null hypothesis were true, the probability would be 0.14 of getting a test statistic at least as extreme as the value observed. This is not strong evidence against the null hypothesis. It is plausible that the null hypothesis is correct and that treatment and heart attack incidence are independent.
11.18 z test for heart attack study
a) The population proportion, p1, is the proportion of placebo users who have heart attacks, and, p2, is the proportion of aspirin users who have heart attacks. The hypotheses would be: H0: Treatment and incidence of heart attack are independent (H0: p1 = p2). Ha: Treatment and incidence of heart attack are dependent (Ha: p1 ≠ p2).
b) The results in the two tests are identical. The P-value is the same, and the chi-squared statistic is the square of the z test statistic.
11.19 Severity of fever after flu shot
a) H0: The distribution of severity of fever is the same in the active and placebo group (homogeneity). Ha: The distributions differ.
b) Since df = (2 – 1)(3 – 1) = 2, the expected chi-squared statistic would be around 2, with a standard deviation of √(2df) = √(2 · 2) = √4 = 2. A value of 2.49 is (2.49 − 2)/2 = 0.245, or about one-fourth of a standard deviation above the expected value, which makes it not extreme.
c) Since a P-value of 0.287 is larger than any reasonable significance level, there is no evidence that the distribution of the severity of fever differs between the active and placebo group.
11.20 What is independent of happiness?
The results for this exercise will be different based on the variables selected by each student.
11.21 Testing a genetic theory
a) H0: p = 0.75 (The probability of a green seedling is 0.75.) Ha: p ≠ 0.75 (The probability of a green seedling is not 0.75.)
b) Under the null hypothesis, we would expect 1103(0.75) = 827.25 green seedlings and 1103(0.25) = 275.75 yellow seedlings. The chi-squared goodness-of-fit statistic is then X² = (854 − 827.25)²/827.25 + (249 − 275.75)²/275.75 = 3.46 with df = 2 – 1 = 1.
c) The P-value is 0.06. The probability of obtaining a test statistic as extreme as that observed, assuming the null hypothesis is true, is 0.06. There is evidence against the null, but not very strong.
11.22 Birthdays by quarters
a) H0: The probabilities of having a birthday in any given quarter are the same (p = 1/4). Ha: The probabilities are not the same.
b) Under the null hypothesis, we would expect 84(0.25) = 21 birthdays in each quarter. The chi-squared goodness-of-fit statistic is: X² = (19 − 21)²/21 + (21 − 21)²/21 + (31 − 21)²/21 + (13 − 21)²/21 = 8.
c) df = 4 – 1 = 3
d) The P-value is 0.046. Since this is less than the significance level of 5%, there is evidence that in the population of Williams College students, the probabilities of a birthday in any given quarter are not the same.
11.23 Checking a roulette wheel
a) P(Each Pocket) = 1/37
b) 3700(1/37) = 100
c) X² = Σ(observed count − expected count)²/expected count; the pocket observed 110 times, for example, contributes (110 − 100)²/100 = 1 to this sum.
d) df = c – 1 = 37 – 1 = 36; the P-value is 0.54. Since the P-value is quite large, there is not strong evidence that the roulette wheel is not balanced.
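Referring back to Exercise 11.22, the goodness-of-fit calculation is easy to verify with software. This Python sketch is an added illustration; scipy's chisquare assumes equal expected counts by default, which matches the null hypothesis here.
from scipy import stats

observed = [19, 21, 31, 13]           # birthday counts by quarter
result = stats.chisquare(observed)    # expected counts default to 21 in each quarter
print(result.statistic, round(result.pvalue, 3))   # 8.0 and about 0.046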
Section 11.3: Determining the Strength of the Association
11.24 Democrat, race and gender
a) 212/300 = 0.707 of blacks and 422/1468 = 0.287 of whites identify as Democrat. The difference between blacks and whites who identify as Democrat is 0.707 – 0.287 = 0.42, so the proportion of blacks identifying as Democrat is 42 percentage points higher than for whites. 421/1081 = 0.389 of females and 278/879 = 0.317 of males identify as Democrat. The difference between females and males who identify as Democrat is 0.389 – 0.317 = 0.072, so the proportion of females identifying as Democrat is 7 percentage points higher compared to males. Because the difference of proportions for race (0.42) is much larger than the difference for gender (0.07), race has the stronger association with whether one identifies as a Democrat.
b) The proportion identifying as Democrat is 0.707/0.287 = 2.46 times higher for blacks than for whites. (Or, blacks are 2.46 times more likely to identify as Democrat than whites.) The proportion of females identifying as Democrat is 0.389/0.316 = 1.23 times higher (or 23% higher) than the proportion of males. Race has a stronger association with whether someone identifies as Democrat because the ratio of proportions is larger.
c) The odds for blacks are 0.707/(1 – 0.707) = 2.4, which can be written as 2.4:1 = 1:0.42 or 100:42. For blacks, for every 100 identifying as Democrat, there are 42 not identifying as Democrat. The odds for whites are 0.287/(1 – 0.287) = 0.4, which is 0.4:1 = 4:10 or 100:250. For whites, for every 100 identifying as Democrat, there are 250 not identifying as Democrat. The odds ratio is 2.4/0.4 = 6. The odds for identifying as Democrat are 6 times higher for blacks than for whites.
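A compact way to reproduce the three measures of association in Exercise 11.24 for the race comparison (the counts are those quoted in the solution above; the code is an added illustration, not part of the original solution):
p_black = 212 / 300        # proportion of blacks identifying as Democrat
p_white = 422 / 1468       # proportion of whites identifying as Democrat

difference = p_black - p_white                                        # about 0.42
relative_risk = p_black / p_white                                     # about 2.46
odds_ratio = (p_black / (1 - p_black)) / (p_white / (1 - p_white))    # about 6
print(round(difference, 2), round(relative_risk, 2), round(odds_ratio, 1))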
11.25 Death penalty associations
a) False, a larger chi-squared value might be due to a larger sample size rather than a stronger association.
b) Yes. The confidence interval for the race effect covers larger values than the confidence interval for the gender effect, so race has the stronger association with opinion on the death penalty.
11.26 Smoking and alcohol
a) 64% of those who had not smoked cigarettes also used alcohol, whereas 97% of those who had smoked cigarettes had used alcohol. 0.97 – 0.64 = 0.33, so the proportion who had used alcohol is 0.33 higher for cigarette users than for non-cigarette users.
b) The proportion who had used alcohol is 0.97/0.64 = 1.5 times higher for those who used cigarettes compared to those who didn't.
c) The odds of having used alcohol for cigarette users are 0.97/0.03 = 32, or 32:1. For cigarette users, for every 32 using alcohol, one is not using alcohol. For nonusers of cigarettes, the odds of having used alcohol are 0.64/0.36 = 1.8, or 1.8:1. For nonusers of cigarettes, for every 1.8 using alcohol, one is not using it. The odds ratio is 32/1.8 = 17.8. The odds of having used alcohol are about 18 times higher for students who also used cigarettes compared to those who didn't.
11.27 Sex of victim and offender
There are several different possible answers to this exercise, including the following answer. Among female offenders, 421/545 = 0.77 had male victims. Among male offenders, 3725/5334 = 0.70 had male victims. The relative risk is 0.77/0.70 = 1.1. Female offenders were 1.1 times more likely to have a male victim than were male offenders. (That is, relatively more females murder males than males murder males.)
11.28 Smelling and mortality
No, the odds ratio being greater than 3 does not imply that the relative risk is greater than 3. With the given information, we do not know whether the proportion dying is more than 3 times larger in the anosmic group. We only know that the odds of dying are more than three times larger. We cannot interpret an odds ratio as a relative risk.
11.29 Vioxx
a) The proportion with myocardial infarctions in the naproxen group was 0.001 – 0.004 = –0.003, or 0.3 percentage points lower than the proportion in the rofecoxib group.
b) The proportion with myocardial infarctions in the naproxen group is 0.2 times (or 80%) lower than the proportion in the rofecoxib group.
c) Myocardial infarctions were 1/0.2 = 5 times more likely.
11.30 Egg and cell derived vaccine
a) 26/3900 = 0.0067 of subjects that received the cell-derived flu vaccine developed the flu, whereas 24/3900 = 0.0062 of subjects that received the egg-derived flu vaccine developed the flu. The relative risk of developing the flu is (26/3900)/(24/3900) = 1.1. Subjects that received the cell-derived flu vaccine were 1.1 times as likely as subjects that received the egg-derived flu vaccine to develop the flu.
b) The odds ratio is [(26/3900)/(1 − 26/3900)]/[(24/3900)/(1 − 24/3900)] = 1.1. The odds of developing the flu are 10% larger in the cell-derived vaccine group.
c) Yes, both relative risk and odds ratio are close to 1, which is the value that occurs when the probabilities are roughly the same in the two groups.
11.31 Risk of dying for teenagers
a) The difference of proportions is 0.00135 – 0.00046 = 0.00089. The proportion of male teenagers who die is 0.00089 higher than the proportion of female teenagers who die.
b) The relative risk is 0.00135/0.00046 = 2.9.
Male teenagers are 2.9 times more likely to die than are female teenagers. c) The relative risk seems more useful because it shows there is a substantial gender effect, which the difference does not show when both proportions are close to 0.
11.32 Marital happiness
a) Since the P-value is less than 0.001, there is strong evidence for an association between marital and general happiness.
b) No, not generally. Large X² values can occur even for weak (but still significant) associations.
c) The percentage not too happy is 11/26 – 20/588 = 0.389, or about 40 percentage points higher, for those who are not too happy in their marriage compared to those who are very happy in their marriage.
d) Those who are not too happy in their marriage are about (11/26)/(20/588) = 11.8, or about 12 times as likely to be not too happy compared to those who are very happy in their marriage.
11.33 Party ID and gender
a) The proportion of females who identify as Republican is 244/1063 – 198/843 = –0.005, or 0.005 smaller than the proportion for males.
b) The proportion of females who identify as Democrat is 421/1063 – 278/843 = 0.066, or 0.066 higher than the proportion for males.
c) The proportion of females who identify as Republican is (244/1063)/(198/843) = 0.98, or 2% lower than the proportion for males.
d) The proportion of females who identify as Democrat is (421/1063)/(278/843) = 1.2, or 20% higher than the proportion for males.
e) There is a rather weak association between gender and whether identifying as Republican, as seen from (a) and (c). There is a stronger association between gender and whether identifying as Democrat, as seen in (b) and (d).
11.34 Chi-squared versus measuring association
The analysis in (c) and (d) describes the association observed in the data, whereas the chi-squared test is an inferential procedure about the association in the population (here: the association between marital and general happiness in the married U.S. adult population).
Section 11.4: Using Residuals to Reveal the Pattern of Association
11.35 Standardized residuals for happiness and income
a) The standardized residual indicates the number of standard errors that the observed count falls from the expected count. In this case, the observed count falls 2.49 standard errors below the expected count.
b) The standardized residuals highlighted in green designate conditions in which the observed counts are much higher (more than 3 standard deviations) than the expected counts, relative to what we'd expect due to sampling variability. For people with above-average income, many more were very happy than what independence between income and happiness would predict. For people with below-average income, many more were not happy than what independence would imply.
c) The standardized residuals highlighted in red designate conditions in which the observed counts are much lower (more than 3 standard deviations) than the expected counts, relative to what we'd expect due to sampling variability. For people with below-average income, many fewer were very happy than what independence would imply. For people with average income, many fewer were not happy than what independence would predict.
11.36 Happiness and religious attendance
a) The large chi-squared statistic and small P-value indicate that we have strong evidence that there is an association between happiness and religious attendance.
b) The cell for attendance at most several times a year and "not too happy," and that for attendance every week or more and "very happy," have strong evidence that in the population there are more people than if the variables were independent.
c) The cell for attendance at most several times a year and "very happy" gives strong evidence that in the population there are fewer people than if the variables were independent.
11.37 Marital happiness and general happiness
a) The relatively small standardized residual of –0.9 indicates that the observed count for this cell is only 0.9 standard errors below the expected count. This is not unusual under the null hypothesis of independence, so it is not strong evidence that there is a true effect in that cell.
b) There is less than a 1% chance that a standardized residual would exceed 3 in absolute value, if the variables were independent. Based on this criterion, the following cells would lead us to infer that the population has many more cases than would occur if happiness and marital happiness were independent. People who are not too happy in their marriage are more likely to be not too happy in general than would be expected if the variables were independent. People who are pretty happy in their marriage are more likely to be not too happy or pretty happy overall than would be expected if the variables were independent. Lastly, people who are very happy in their marriage are more likely to be very happy overall than would be expected if the variables were independent.
11.38 Happiness and marital status
(i) There are more people who say they are "very happy" in the married category than we would expect if the variables were independent. (ii) There are fewer people who say they are "very happy" in the divorced and never married categories than we would expect if the variables were independent.
11.39 Gender gap?
There are more women who identify as Democrats, and fewer men who identify as Democrats, than would be expected if there were no association between political party and gender. It does seem that political party and gender are not independent – that there is an association.
11.40 Ideology and political party
a) There is a very large chi-squared statistic: X² = 723, df = 42, and the P-value is approximately 0. There is extremely strong evidence for an association between political ideology and party identification.
b) The standardized residuals help us to understand the nature of the association. It appears that people who think of themselves as (extremely) liberal tend to identify strongly with the Democratic Party, whereas people who think of themselves as (extremely) conservative tend to identify with the Republican Party.
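Standardized residuals of the kind interpreted in this section can be computed directly from a contingency table. The following Python sketch is an added illustration using the life-after-death counts from Exercise 11.8; it assumes the usual adjusted-residual formula (observed − expected)/√[expected(1 − row proportion)(1 − column proportion)].
import numpy as np

observed = np.array([[605, 185],    # males: believe yes, no
                     [822, 155]])   # females: believe yes, no
n = observed.sum()
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row @ col / n
std_resid = (observed - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))
print(np.round(expected, 0))     # about [[638, 152], [789, 188]]
print(np.round(std_resid, 1))    # each cell about +/-4.0, consistent with X^2 = 16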
Section 11.5: Fisher's Exact and Permutation Tests
11.41 Keeping old dogs mentally sharp
a)
                   Could Solve Task
Care And Diet      Yes    No   Total
Standard             2     6       8
Extra               12     0      12
b) 1) There are two binary categorical variables, and randomization was used.
2) H0: Care and ability to solve the task are independent (H0: p1 = p2). Ha: Care and ability to solve the task are dependent (Ha: p1 ≠ p2).
3) Test statistic: 2
4) The P-value is 0.001.
5) If the null hypothesis were true, the probability would be 0.001 of getting a test statistic at least as extreme as the value observed. The P-value is quite low (lower than a significance level of 0.05, for example); we can reject the null hypothesis. We have strong evidence that a dog's care and diet are associated with its ability to solve a task.
c) It is improper to conduct the chi-squared test for these data because the expected cell counts are less than 5 for at least some cells.
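Fisher's exact test for the table in Exercise 11.41 can also be run directly in software. This Python sketch is an added illustration using scipy:
from scipy import stats

table = [[2, 6],    # standard care and diet: could solve, could not
         [12, 0]]   # extra care and diet: could solve, could not
odds_ratio, p_value = stats.fisher_exact(table)
print(round(p_value, 3))         # about 0.001, matching the P-value reported above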
11.42 Tea-tasting results
             Prediction
Actual     Milk   Tea   Total
Milk          4     0       4
Tea           0     4       4
Total         4     4       8
The P-value for a Fisher's Exact Test of Independence is 0.014. There is strong evidence of an association between the actual tea preparation and the prediction with respect to the tea preparation.
11.43 Claritin and nervousness
a) The P-value for the small-sample test is 0.24. It is plausible that the null hypothesis is true and that nervousness and treatment are independent.
b) It is not appropriate to conduct the chi-squared test for these data because two cells have an expected count of less than five (2.5 and 3.5, respectively).
11.44 AIDS and condom use
a) p1 is the population proportion of those who always used condoms who became infected, and p2 is the population proportion of those who did not always use condoms who became infected. H0: Condom use and infection are independent. Ha: Condom use and infection are dependent.
b) A chi-squared test would not be appropriate because the expected count of the cell for those who did not use condoms and who did become infected is less than five. Fisher's exact test gives a P-value of 0.0007, which rounds to 0.001. This is very strong evidence against the null hypothesis. There seems to be an association between condom use and HIV infection.
11.45 Quizzes worthwhile?
a)
         Change Study Habit?
Class    Yes    No   Total
STAT       5     0       5
CS         6     6      12
b) 462 out of 6188, or 0.075, of the tables have a first cell count as large or larger than 5.
c) The (exact or permutation) P-value of 0.075 is greater than the significance level of 0.05, which indicates insufficient evidence to conclude that the probability to change study habits is larger in the STAT course.
11.46 Quizzes enhance learning?
a) H0: The conditional distribution of the responses on benefits is the same for students in the two courses. Ha: The conditional distributions differ.
b) The test statistic is the chi-squared statistic, X².
c) The test statistic would follow a chi-squared distribution with df = 2.
d) The permutation P-value is 2660/10000 = 0.266. There is no evidence that the distribution of responses differs between the STAT and CS class.
Chapter Problems: Practicing the Basics
11.47 Female for President?
a)
            Gender
Vote     Female   Male
Yes         94%    94%
No           6%     6%
Total      100%   100%
11.47 (continued)
b) If results for the entire population are similar, it does seem possible that gender and opinion about having a woman President are independent. The percentages of men and women who would vote for a qualified woman may be the same.
11.48 Down syndrome diagnostic test
a) P(Positive | D) = 48/54 = 0.8889, or about 89%, and P(Positive | Dᶜ) = 1307/5228 = 0.25, or 25%.
b) For the Down cases, 89% were correctly diagnosed. For the unaffected cases, 25% incorrectly get a positive result. The test seems fairly good, but there are a good number of false positives and false negatives.
c) P(D | Positive) = 48/(48 + 1307 + 6) = 48/1361 = 0.035; of the positive cases, only 0.035 truly have Down syndrome. This result is not surprising because there are so few cases overall. The fairly large number of false positives will overwhelm the much smaller number of actual cases.
11.49 Down and chi-squared
1) The assumptions are that there are two categorical variables (Down syndrome status and blood test result), that randomization was used to obtain the data, and that the expected count was at least five in all cells.
2) H0: Down syndrome status and blood test result are independent. Ha: Down syndrome status and blood test result are dependent.
3) X² = 114.4; df = 1
4) P-value: 0.000
5) If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. There is very strong evidence of an association between test result and actual status.
11.50 Herbs and the common cold
a) The response variable is whether or not the individual's cold symptoms improved, and the explanatory variable is treatment (placebo versus Immumax).
b) I would explain that if improved cold symptoms did not depend on whether one took Immumax or placebo, then it would be quite unusual to observe the results actually obtained. This provides relatively strong evidence of improved cold symptoms for adults taking Immumax.
11.51 Happiness and number of friends
a) If the two variables were independent, it would mean that the chance of any particular happiness category (such as very happy) would be identical for each number of close friends.
b) An expected cell count is the number of cases we'd expect in a given cell if the two variables were not associated (i.e., were independent). For the first cell, the expected count = (169 × 164)/1446 = 19.2.
c) (i) Those with the highest number of close friends tend to be happier than independence predicts. (ii) Those with the fewest number of close friends tend to be less happy than independence predicts.
11.52 Gender gap?
1) The assumptions are that there are two categorical variables (party identification and gender), that randomization was used to obtain the data, and that the expected count was at least five in all cells.
2) H0: Party identification and gender are independent. Ha: Party identification and gender are dependent.
3) X² = 10.04; df = 2
4) The P-value is 0.0066.
5) If the null hypothesis were true, the probability would be 0.0066 (less than any reasonable significance level) of getting a test statistic at least as extreme as the value observed. We have very strong evidence that party identification depends on gender.
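Referring back to Exercise 11.49, the chi-squared test can be reproduced from the 2 × 2 counts implied by Exercise 11.48(a): 48 of 54 Down cases and 1307 of 5228 unaffected cases test positive, so the remaining counts below (6 and 3921) are obtained by subtraction. The Python sketch is an added illustration.
from scipy import stats

table = [[48, 6],          # Down syndrome: positive, negative
         [1307, 3921]]     # unaffected: positive, negative
chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(round(chi2, 1), df, p)    # about 114.4 with df = 1 and a P-value near 0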
11.53 Job satisfaction and income
a) df = 9; the sampling distribution is the chi-squared probability distribution for df = 9.
b) The large P-value indicates that it is plausible that the null hypothesis is correct and that income and job satisfaction are independent.
c) With a 0.05 significance level, we would fail to reject the null hypothesis. We cannot accept the null hypothesis because it is possible that there is an association in the population that we are not detecting in this sample, or that there is a weak association that would be detected with a larger sample size.
11.54 Aspirin and heart attacks for women
a) (i) The assumptions are that there are two categorical variables (group and cardiovascular event), that randomization was used to obtain the data, and that the expected count was at least five in all cells.
(ii) H0: Group and cardiovascular event are independent. Ha: Group and cardiovascular event are dependent.
(iii) X² = 10.66; df = 2
(iv) The P-value is 0.005.
(v) The P-value is very small. If the null hypothesis were true, the probability would be 0.005 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that there is an association between cardiovascular event and group for women.
b) The proportion of women on placebo who had a stroke was 0.013. The proportion of those on aspirin who had a stroke is 0.011. Thus, the relative risk is 0.013/0.011 = 1.2. Women on placebo are 1.2 times as likely as those on aspirin to have a stroke.
11.55 Crossing Peas
a)
Pea Type    RY    RG    WY   WG   Total
           315   108   101   32     556
b) P(RY) = 9/16, P(RG) = 3/16, P(WY) = 3/16, and P(WG) = 1/16
c) Expected values:
Pea Type          RY            RG            WY           WG    Total
           556(9/16)     556(3/16)     556(3/16)    556(1/16)      556
            = 312.75      = 104.25      = 104.25      = 34.75
d) df = 4 – 1 = 3
e) The P-value will be large because the chi-squared statistic of 0.47 is well below the expected value of 3, so the area above 0.47 under the chi-squared curve with df = 3 is large.
11.56 Women's role
a) The difference of proportions based on gender is 0.148 – 0.159 = –0.011.
b) The difference of proportions based on education is 0.390 – 0.117 = 0.273.
c) Educational level seems to have the stronger association with opinion. The difference between the proportions of the two educational groups is much larger than the difference between the proportions of the two genders. Education level makes a larger difference than gender.
11.57 Seat belt helps?
a) The proportion of those who were injured given that they did not wear a seat belt is 0.125. The proportion of those who were injured given that they wore a seat belt is 0.064. The difference between proportions is 0.125 – 0.064 = 0.061. The proportion who were injured is 0.06 higher for those who did not wear a seat belt than for those who did wear a seat belt.
b) The relative risk is 0.125/0.064 = 1.95. People were 1.95 times as likely to be injured if they were not wearing a seat belt than if they were wearing a seat belt.
11.58 Serious side effects
a)
              Serious Side Effect
Treatment     Yes       No    Total
Zelnorm        13   11,601   11,614
Placebo         1    7,030    7,031
b) The relative risk is (13/11,614)/(1/7031) = 7.87. Patients receiving the drug were 7.87 times more likely to experience a serious side effect than patients receiving placebo.
c) The odds ratio is [(13/11,614)/(1 − 13/11,614)]/[(1/7031)/(1 − 1/7031)] = (13/11,601)/(1/7030) = 7.88. The odds of experiencing a serious side effect were 7.88 times larger for patients receiving the drug rather than placebo.
d) There is sufficient evidence to reject the null hypothesis and conclude the probability of a side effect differs between the drug and placebo group.
11.59 Pesticides
a) Relative risk is (29/127)/(19,485/26,571) = 0.31. The proportion of organic food samples with pesticide residues present was 69% (or 0.31 times) lower than the proportion of conventional food samples with pesticide residues present.
b) The proportion of conventional food samples with pesticide residues present was 3.2 (from 1/0.31) times larger than the proportion for organic food samples with pesticide residues.
c) The odds ratio is [(29/127)/(1 − 29/127)]/[(19,485/26,571)/(1 − 19,485/26,571)] = (29/98)/(19,485/7,086) = 0.11. The odds of finding pesticide residues on organic food samples were 89% (or 0.11 times) lower than on conventional food samples.
d) The odds of finding pesticide residues on conventional food samples were 1/0.11 = 9.1 times higher than on organic food samples.
e) The proportion is only about 3.2 times larger (the relative risk), but the odds are more than 9 times larger. This statement confuses relative risk with the odds ratio.
11.60 Race and party ID
a) The expected count for the first cell is (275 × 651)/1791 = 100.0. It is what we would expect if there were no association.
b) The standardized residual of 12.5 for the first cell indicates that the observed count falls 12.5 standard errors from the expected count. It is very likely that the population proportion in this cell is higher than what would be expected if the two variables were independent.
c) The four corner cells indicate that there are many more blacks who are Democrats and many more whites who are Republicans than we would expect if political party and race were independent. Similarly, there are far fewer blacks who are Republicans and far fewer whites who are Democrats than we would expect if these two variables were independent.
11.61 Happiness and sex
It seems as though those with no partners or two or more partners are less likely to be very happy than would be expected if these variables were not associated, and those with one partner are more likely to be very happy than would be expected if these variables were independent.
11.62 Education and religious beliefs
For each of these cells (bachelor or graduate and fundamentalist, bachelor or graduate and liberal, and less than high school and fundamentalist), the observed value is much higher than what we would expect if the variables were independent, relative to what we'd expect due to sampling variability.
11.63 TV and aggression
a) H0: Amount of TV watching and aggression are independent. Ha: Amount of TV watching and aggression are dependent. If p1 is the population proportion of those who are aggressive in the group that watches less than one hour of TV per day and p2 is the population proportion of those who are aggressive in the group that watches more than one hour of TV per day, then the hypotheses can be expressed as H0: p1 = p2 and Ha: p1 ≠ p2.
b) The P-value is 0.0001. This is a very small P-value. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that TV watching and aggression are associated.
11.64 Botox side effects
1) The assumptions are that there are two binary categorical variables, and randomization was used.
2) H0: Treatment and pain status are independent. Ha: Treatment and pain status are dependent.
3) Test statistic: 9
4) The P-value is 0.12.
5) If the null hypothesis were true, the probability would be 0.12 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that treatment and pain status are independent.
11.65 Clarity of diamonds
a) Technology will confirm that X² = 0.267.
b) No, many cell counts are very small, leading to expected cell counts that are less than 5, so the sampling distribution of X² may not be approximately chi-squared.
c) The approximate permutation P-value is 9908/10,000 = 0.9908. This P-value is large and almost 1. There is no evidence that the clarity depends on whether the diamond's cut is good or fair.
11.66 Benford's Law
a)
Leading Digit   Expected Value
1               0.301(130) = 39.13
2               0.176(130) = 22.88
3               0.125(130) = 16.25
4               0.097(130) = 12.61
5               0.079(130) = 10.27
6               0.067(130) = 8.71
7               0.058(130) = 7.54
8               0.051(130) = 6.63
9               0.046(130) = 5.98
b) Chi-squared goodness-of-fit test with df = 9 – 1 = 8. From technology: X² = 7.2, and the P-value is 0.5147. Because the P-value is not small, there is no evidence that the distribution deviates significantly from Benford's Law.
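The permutation P-values quoted in Exercises 11.45, 11.46, and 11.65 come from shuffling the group labels many times and recomputing the test statistic. The function below is a rough Python sketch of that idea for two categorical variables; it is an added illustration, and the usage example at the end uses made-up labels, not the textbook data.
import numpy as np
from scipy import stats

def permutation_pvalue(x, y, n_perm=10_000, seed=1):
    """Permutation P-value for independence of two categorical variables x and y."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    x_levels, y_levels = np.unique(x), np.unique(y)

    def chi2_stat(labels):
        table = np.array([[np.sum((x == a) & (labels == b)) for b in y_levels]
                          for a in x_levels])
        return stats.chi2_contingency(table, correction=False)[0]

    observed = chi2_stat(y)
    perms = [chi2_stat(rng.permutation(y)) for _ in range(n_perm)]
    return np.mean([s >= observed for s in perms])

# Hypothetical usage with made-up labels:
# x = np.array(["good"] * 20 + ["fair"] * 15)
# y = np.array(["VS1"] * 12 + ["VS2"] * 8 + ["VS1"] * 5 + ["VS2"] * 10)
# print(permutation_pvalue(x, y))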
Chapter Problems: Concepts and Investigations
11.67 Student data
Each student's short report will be different, but could include the following findings. From MINITAB:
Rows: religiosity   Columns: life_after_death
          n        u        y      All
0         8        3        4       15
      3.250    4.000    7.750   15.000
1         5       11       13       29
      6.283    7.733   14.983   29.000
2         0        2        5        7
      1.517    1.867    3.617    7.000
3         0        0        9        9
      1.950    2.400    4.650    9.000
All      13       16       31       60
     13.000   16.000   31.000   60.000
Cell Contents: Count
               Expected count
Pearson Chi-Square = 21.386, DF = 6, P-Value = 0.002
11.68 Marital happiness decreasing?
The conditional distributions show a slight trend for the response of "very happy" such that the percentages drop over time. For the most part, percentages are in the high 60's in the beginning, dropping to the low 60's over time. The standardized residuals show a similar pattern. They're higher in the first few years with one as high as 3.92. Through the years, the standardized residuals tend to get closer to zero, and then become negative in a number of the most recent years.
11.69 Another predictor of happiness?
The one-page report will be different depending on the variable that each student finds to be associated with happiness.
11.70 Pregnancy associated with contraceptive use?
We would expect these variables to be associated. We would expect pregnancy rates to be higher among women whose partners do not use contraceptives than among women whose partners do use contraceptives.
11.71 Babies and gray hair
a)
             Has Young Children
Gray Hair     Yes    No
Yes             0     4
No              5     0
b)
             Has Young Children
Gray Hair     Yes     No
Yes            0%   100%
No           100%     0%
There does seem to be an association. All women in the sample who have gray hair do not have young children, whereas all women in the sample who do not have gray hair do have young children.
c) There often are third factors that influence an association. Gray hair is associated with age (older women being more likely to be gray), and age is associated with having or not having young children (older women being less likely to have young children). Just because two things are associated, doesn't mean that one causes the other.
11.72 When is chi-squared not valid?
The examples of contingency tables can differ for each student; however, a student could choose to include numbers for which the expected scores would be less than five.
11.73 Gun homicide in United States and Britain
a) The proportion in the U.S. is 0.000047. The proportion in Britain is 0.00001. The difference of proportions with the U.S. as group 1 is 0.000037, and with Britain as group 1 is –0.000037. The only thing that changes is the sign.
b) The relative risk with the U.S. as group 1 is 0.000047/0.00001 = 4.7. The relative risk with Britain as group 1 is 0.00001/0.000047 = 0.213. One value is the reciprocal of the other.
c) When both proportions are so small, the relative risk is more useful for describing the strength of association. The difference between proportions might be very small, even when one is many times larger than the other.
11.74 Colon cancer and race
The relative risk of developing colorectal cancer was (46.5/100,000)/(57.3/100,000) = 0.81 for white residents of North Carolina relative to African American residents of North Carolina during 2002–2006. Thus, African American residents of North Carolina were 19% more likely to have been diagnosed with colorectal cancer (1 – 0.81 = 0.19) than white residents.
11.75 True or false: X² = 0
False
11.76 True or false: Group 1 becomes Group 2
True
11.77 True or false: Relative risk
False
11.78 True or false: Relative risk versus odds ratio
False
11.79 True or false: Statistical but not practical significance
True
11.80 Statistical versus practical significance
Given enough participants in a study, we might find a very weak association to be statistically significant. It is important to examine the size of an association in addition to its statistical significance. Otherwise, the significant association might not be practically important.
11.81 Normal and chi-squared with df = 1
a) The chi-squared value for a right-tail probability of 0.05 and df = 1 is 3.84, which is the z value for a two-tail probability of 0.05 squared: (1.96)(1.96) = 3.84.
b) The chi-squared value for a P-value of 0.01 and df = 1 is 6.64, which is (apart from rounding) the z value for a two-tail probability of 0.01 squared: (2.58)(2.58) = 6.63.
♦♦11.82 Multiple response variables
a) Because participants were able to give more than one response, these are dependent samples, and so it is not valid to do a chi-squared test. We would need different participants in each cell.
b) If we only look at men versus women for one of the factors, the data are independent and we can use a chi-squared test. For example, here is the contingency table for factor A.
          Income Gap Responsible
Gender     Yes    No
Men         60    40
Women       75    25
♦♦11.83 Standardized residuals for 2 × 2 tables
The two observed values in a given row (or column) must add up to the same total as the two expected values in that same row (or column). Thus, we know that if one of these two expected values is above its related observed count, then the other expected value in that same row (or column) must be below its related observed count. For example, let's say that we have a row with an observed count of 50 in one cell and an observed count of 50 in the other cell. If the expected count for the first cell is 60, then the expected count for the other must be 40. Both pairs must add up to the same number – in this case, 100.
11.84 Degrees of freedom explained
a) The order of the calculations is given in the table.
                                    Vote for Female President
Political Views                     Yes                      No    Total
Extremely Liberal                    56        1st: 58 – 56 = 2       58
Moderate                            490     2nd: 509 – 490 = 19      509
Extremely Conservative   4th: 61 – 3 = 58   3rd: 24 – 2 – 19 = 3      61
Total                               604                      24      628
b) The order of the calculations is given in the table.
                                    Vote for Female President
Political Views                         Yes                     No   Total
Extremely Liberal          2nd: 58 – 2 = 56   1st: 24 – 3 – 19 = 2      58
Moderate                 3rd: 509 – 19 = 490                    19     509
Extremely Conservative     4th: 61 – 3 = 58                      3      61
Total                                   604                     24     628
11.85 What is df?
For the top row the last number must be 33 because the numbers must add up to the row total of 100. For the bottom row, the numbers must add to the column totals of 40 for each column.
                  A             B             C             D              E   Total
                 24            21            12            10  100 – 67 = 33     100
      40 – 24 = 16  40 – 21 = 19  40 – 12 = 28  40 – 10 = 30    40 – 33 = 7      100
Total            40            40            40            40             40     200
♦♦11.86 Variability of chi-squared
a) When df is large enough, the chi-squared distribution is fairly bell-shaped, so about 95% of cases fall within two standard deviations of the mean. For the chi-squared distribution (which has df as its mean and √(2·df) as its standard deviation), the mean plus and minus two standard deviations would be df ± 2√(2·df).
b) df ± 2√(2·df) = 8 ± 2√(2·8) = 8 ± 2√16 = 8 ± 2(4) = 8 ± 8, or (0, 16).
The chi-squared distribution always has 0 as its lower limit. The chi-squared table tells us that for 8 df and a P-value of 0.05, the chi-squared value is 15.5. This would mark the upper 95% of the curve.
♦♦11.87 Explaining Fisher's exact test
a) The twenty distinct possible samples that could have been selected are as follows:
F1, F2, F3   F1, F2, M1   F1, F2, M2   F1, F2, M3   F1, F3, M1
F1, F3, M2   F1, F3, M3   F1, M1, M2   F1, M1, M3   F1, M2, M3
F2, F3, M1   F2, F3, M2   F2, F3, M3   F2, M1, M2   F2, M1, M3
F2, M2, M3   F3, M1, M2   F3, M1, M3   F3, M2, M3   M1, M2, M3
This contingency table shows that two males were chosen (M1 and M3) and one was not (M2). It also shows that one female was chosen (F2) and two were not (F1 and F3).
11.87 (continued)
b) 10 of the 20 tables have a first cell count of 2 or 3. Under a null hypothesis of no gender bias versus an alternative hypothesis of a preference for males, observing tables with 2 or 3 males selected (i.e., a 2 or 3 in the first cell) has a probability of 0.5, which is the P-value from Fisher's exact test. These samples are as follows:
F1, M1, M2   F1, M1, M3   F1, M2, M3   F2, M1, M2   F2, M1, M3
F2, M2, M3   F3, M1, M2   F3, M1, M3   F3, M2, M3   M1, M2, M3
♦♦11.88 Likelihood-ratio chi-squared
a) If the observed count equals the expected count, then all the ratios of observed count over expected count in the equation for G² are equal to 1. The log of 1 is 0. Any observed count multiplied by 0 is 0. Thus, we are summing several 0's and multiplying them by 2, which, of course, gives 0.
b) In practice, we would not expect to get exactly G² = X² = 0, even if the variables are truly independent. A given random sample is not likely to have exactly the same breakdown as the population, because of sampling variability.
11.89 Voting with 16
a) Technology will confirm that X² = 14.1.
b) No, there are 6 cells with expected cell counts below 5. The chi-squared distribution may not approximate well the actual sampling distribution of X².
c) Answers will vary. One permutation yielded a value of 3.1, smaller than the observed 14.1.
d) Answers will vary. In one sample, none of the 10 permutations gave a value as large or larger than 14.1.
e) Answers may vary slightly. In one sample, 233 of the 10,000 permutations resulted in a chi-squared statistic as large or larger than the observed one, giving a P-value of 0.0233. This means that if there is no association between grade level and opinion, observing a test statistic as large or larger than the one we have observed is unlikely (probability of about 0.02). With such a small P-value, we reject the null hypothesis and conclude that there is an association between the grade level and the opinion about voting with 16.
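The permutation procedure described in 11.89 c)–e) can be sketched in Python. This is only an illustration of the method, not the exercise's grade-level/opinion data (which are not reproduced here); the numpy, scipy, and pandas libraries are assumed to be available:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def perm_chi2_pvalue(x, y, n_perm=10000, seed=1):
    # Permutation P-value for the chi-squared statistic of independence:
    # shuffle one variable to break any association, recompute the statistic,
    # and count how often it is at least as large as the observed one.
    rng = np.random.default_rng(seed)
    observed_stat = chi2_contingency(pd.crosstab(x, y))[0]
    count = 0
    for _ in range(n_perm):
        stat = chi2_contingency(pd.crosstab(x, rng.permutation(y)))[0]
        if stat >= observed_stat:
            count += 1
    return count / n_perm

# Hypothetical usage with made-up labels (not the data from the exercise):
grade = ["9th"] * 20 + ["10th"] * 20 + ["11th"] * 20
opinion = ["yes", "no"] * 30
print(perm_chi2_pvalue(grade, opinion))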
Chapter Problems: Student Activities 11.90 Conduct a research study using the GSS Responses to this exercise will depend on the categorical response variable assigned by the instructor and on the explanatory variables chosen by each student.
Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis
Section 12.1: Modeling How Two Variables Are Related
12.1 Car mileage and weight
a) The response variable is mileage, and the explanatory variable is weight.
b) ŷ = 45.6 – 0.0052x; The y-intercept is 45.6 and the slope is –0.0052.
c) For each 1000 pound increase in the weight of the vehicle, the predicted mileage decreases by 5.2 miles per gallon.
d) The y-intercept is the predicted miles per gallon for a car that weighs 0 pounds. This is far outside the range of the car weights in this database and, therefore, does not have contextual meaning for these data.
12.2 Predicting car mileage
a) ŷ = 45.6 – 0.0052(2590) = 32.1
b) y – ŷ = 38 – 32.1 = 5.9
c)
12.3 Predicting maximum bench strength in males
a) ŷ = 117.5 + 5.86(35) = 322.6
b) ŷ = 117.5 + 5.86(0) = 117.5
c) The y-intercept indicates that for male athletes who cannot perform any repetitions for a fatigue bench press, the predicted maximum bench press is 117.5 kg. As repetitions to fatigue bench press (repBP) increases from 0 to 35, the predicted maximum bench press (maxBP) increases from 117.5 kg to 322.6 kg.
12.4 Income higher with age
a) The mean y value for age 20 years is μ_y = –10,000 + 1000(20) = 10,000. The variability, based on σ = 5000, would likely include values from 0 to 25,000 (within three standard deviations of the mean, but with a floor of zero).
b) The mean y value for age 50 years is μ_y = –10,000 + 1000(50) = 40,000. The variability, based on σ = 5000, would likely include values from 25,000 to 55,000 (within three standard deviations of the mean).
12.5 Mu, not y
For a given x value, there will not be merely one y value because not every elementary schoolgirl in your town who is a given height will weigh the same. It makes more sense to include the mean, μ_y, rather than a specific value, y, in the equation.
12.6 Parties and dating
The mean would be the mean number of parties attended in the past month for individuals who had a specific number of dates (e.g., 3) in the past month. Variability would be the amount that the actual individuals who had a specific number of dates varied in terms of the number of parties they attended.
12.6 (continued)
a) It is more sensible to use a straight line to model the means of the conditional distributions rather than the individual observations because the individual observations are not likely to fall in a line, even if their means do.
b) The model needs to allow variation around the mean to account for the fact that the people who had a certain number of dates might have attended different numbers of parties.
12.7 Study time and college GPA
a) [Scatterplot of GPA vs. study time]
Based on the scatterplot, there appears to be a positive association between GPA and study time.
b) From MINITAB: Predicted GPA = 2.63 + 0.0439Studytime. For every 1 hour increase in study time per week, GPA is predicted to increase by about 0.04 points.
c) Predicted GPA = 2.63 + 0.0439(25) = 3.73
d) y – ŷ = 3.6 – 3.73 = –0.13; The observed GPA for Student 2, who studies an average of 25 hours per week, is 3.6, which is 0.13 points below the predicted GPA of 3.73.
12.8 GPA and skipping class
a) [Scatterplot of GPA vs. classes skipped]
The association appears to be negative.
b) From MINITAB: Predicted GPA = 3.56 – 0.0820Skipped. For a student who does not skip any classes, the predicted GPA is 3.56 (the y-intercept). For every class a student skips, we predict their GPA to drop by 0.08 points.
c) The predicted GPA is 3.56 – 0.082(9) = 2.82. The residual is y – ŷ = 2.8 – 2.82 = –0.02.
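The prediction equations in 12.7 and 12.8 come from software output; a minimal sketch of the same steps in Python, assuming scipy is available (the arrays below are hypothetical placeholders, not the actual student data file):

from scipy.stats import linregress

study_time = [5, 8, 12, 14, 20, 25]          # hypothetical weekly study hours
gpa        = [2.7, 2.9, 3.0, 3.2, 3.4, 3.6]  # hypothetical GPAs

fit = linregress(study_time, gpa)
print(fit.intercept, fit.slope)              # prediction equation: intercept + slope * x

y_hat = fit.intercept + fit.slope * 25       # predicted GPA at 25 hours of study
residual = 3.6 - y_hat                       # observed minus predicted
print(y_hat, residual)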
12.9 Cell phone specs
a) The response variable is cell phone weight (g), and the explanatory variable is battery size (mAh).
[Scatterplot of Weight(g) vs. Battery_Capacity(mAh)]
A clear trend is visible in the scatterplot, showing that phones with larger capacity tend to weigh more. One phone (number 70, with battery capacity of 3300) has a much larger battery capacity than all the others, yet its weight is about average, not following this trend. This phone is a clear outlier.
b) The outlier will pull the regression line toward it. Its residual will be negative and very large in absolute value.
c) From MINITAB: Predicted Weight = 66.7 + 0.0436Battery
(i) Predicted Weight = 66.7 + 0.0436(1000) = 110g
(ii) Predicted Weight = 66.7 + 0.0436(1500) = 132g
d) For every 100 mAh increase in the capacity of a cell phone's battery, the predicted weight increases by 4.3g.
12.10 Exercise and watching TV
a) [Scatterplot of Exercise vs. WatchTV]
The point with a score of 60 on exercise is an outlier and could make the slope more positive. b) With the exercise score of 60: Exercise = 2.90 + 0.0462WatchTV Without the exercise score of 60: Exercise = 4.54 + 0.0075WatchTV The observation decreased the intercept and increased the slope.
Section 12.2: Inference About Model Parameters and the Association
12.11 t-score?
a) df = n – 2 = 25 – 2 = 23
b) –2.07 and 2.07
c) We'd use 2.07.
12.12 Predicting house prices
a) (i) Assumptions: Assume randomization, a linear relationship between mean selling price and size of a house in the population, and a normal conditional distribution of price for given sizes, with the same standard deviation.
[Scatterplot of price vs. size]
(ii) Hypotheses: The null hypothesis that the variables are independent is H0: β = 0. The two-sided alternative hypothesis of dependence is Ha: β ≠ 0.
(iii) Test statistic: From technology, the test statistic is t = 11.62. We also could calculate the test statistic as follows: t = b/se = 77.008/6.626 = 11.6.
(iv) P-value: From technology, the P-value is 0.000.
(v) Conclusion: If H0 were true that the population slope β = 0, it would be extremely unusual (the probability would be almost 0) to get a sample slope at least as far from 0 as b = 77.008. The P-value gives very strong evidence that an association exists between the size and price of houses; this is extremely unlikely to be due to random variation.
b) The 95% confidence interval is b ± t.025(se) = 77.008 ± 1.985(6.626), or (64, 90).
c)
An increase of $100 is outside the confidence interval and so is a very implausible value for the population slope.
12.13 Confidence interval for slope
a) On average, the selling price of a house increases between $64 and $90 for every one square foot increase in size.
b) On average, the selling price of a house increases between $6400 and $9000 for every 100 square feet increase in size.
12.14 House prices in bad part of town
a) The null hypothesis posits that the slope is 0; that is, it hypothesizes that there is no association between selling price and size of house. A data analyst might choose a one-sided alternative hypothesis for this test because the previous analysis showed a positive association for these variables.
b) (i) The test statistic would have to be 1.714 to get a P-value equal to 0.05. (ii) The test statistic would have to be 2.500 to get a P-value equal to 0.01.
12.15 Strength through leg press
a) (i) Assumptions: Assume randomization; a linear relationship between mean maximum leg press and number of 200-pound leg presses; and a normal conditional distribution of maximum leg press for a given number of 200-pound leg presses with constant standard deviation. These data are not a random sample, so conclusions are highly tentative.
(ii) Hypotheses: The null hypothesis that the variables are independent is H0: β = 0. The two-sided alternative hypothesis of dependence is Ha: β ≠ 0.
(iii) Test statistic: From the table, the test statistic is 9.64. We also could calculate the test statistic as follows: t = b/se = 5.271/0.547 = 9.64.
Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis 249 12.15 (continued) (iv) P-value: From software, the P-value is 0.000. (v) Conclusion: If H0 were true that the population slope 0, it would be extremely unusual (the probability would be close to 0) to get a sample slope at least as far from 0 as b = 5.271. The P-value gives very strong evidence that an association exists between maximum leg press and number of 200-pound leg presses. b) The 95% confidence interval is b t.025 ( se) 5.2710 2.004(0.5469), or (4.2, 6.4). On average, maximum leg press increases between 4.2 pounds and 6.4 pounds for every additional 200-pound leg press that the athlete can do. The interval gives us a range of plausible values for the increase. The test only tells us there is strong evidence that the increase (slope) is significantly different from 0. 12.16 More boys are bad? a) The negative slope indicates a negative association between life length and number of sons. Having more sons is bad. b) (i) Assumptions: Assume randomization, linear trend with normal conditional distribution for y and the same standard deviation at different values of x. (ii) Hypotheses: The null hypothesis that the variables are independent is H0: 0. The two-sided alternative hypothesis of dependence is Ha: 0. (iii) Test statistic: t = b/se = –0.65/0.29 = –2.24. (iv) P-value: The P-value is 0.026. (v) Conclusion: If H0 were true that the population slope 0, it would be unusual to get a sample slope at least as far from 0 as b = –0.65. In fact, the probability would be 0.026. The P-value gives very strong evidence that an association exists between number of sons and life length. c) The 95% confidence interval is b t.025 ( se) 0.65 1.966(0.29), or (–1.2, –0.1). The plausible values for the true population slope range from –1.2 to –0.1. It is not plausible that the true slope is 0. 12.17 More girls are good? a) The positive slope indicates a positive association between life length and number of daughters. Having more daughters is good. b) (i) Assumptions: Assume there was roughly a linear relationship between variables, and that the data were gathered using randomization and that the population y values at each x value follow a normal distribution, with roughly the same standard deviation at each x value. (ii) Hypotheses: The null hypothesis that the variables are independent is H0: 0. The two-sided alternative hypothesis of dependence is Ha: 0. (iii) Test statistic: t = b/se = 0.44/0.29 = 1.52 (iv) P-value: From technology, the P-value is 0.13. (v) Conclusion: If H0 were true that the population slope 0, it would not be very unusual to get a sample slope at least as far from 0 as b = 0.44. The probability would be 0.13. It is plausible that there is no association between number of daughters and life length. c) The 95% confidence interval is b t.025 ( se) 0.44 1.966(0.29), or (–0.1, 1.0). The plausible values for the true population slope range from –0.1 to 1.0. Zero is a plausible value for this slope. 12.18 CI and two-sided tests correspond We would reject the null hypothesis regarding the association between number of sons and life length, but would not reject the null hypothesis regarding the association between number of daughters and life length. For boys, zero does not fall in the confidence interval, and we reject the null hypothesis. It does seem as if number of sons and life length are related. 
For girls, zero does fall in the confidence interval, and we do not reject the null hypothesis. It is plausible that number of daughters and life length are not related.
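The confidence intervals compared in 12.16–12.18 are all of the form b ± t.025(se); a short Python sketch, assuming scipy is available (the degrees of freedom below are a placeholder, since the sample size for these exercises is not shown here):

from scipy.stats import t

def slope_ci(b, se, df, conf=0.95):
    tcrit = t.ppf(1 - (1 - conf) / 2, df)
    return b - tcrit * se, b + tcrit * se

df_placeholder = 1000   # placeholder; a large df gives t.025 close to the 1.966 used above
print(slope_ci(-0.65, 0.29, df_placeholder))   # sons: roughly (-1.2, -0.1), so 0 is not plausible
print(slope_ci(0.44, 0.29, df_placeholder))    # daughters: roughly (-0.1, 1.0), so 0 is plausible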
12.19 Advertising and sales
a) The mean for advertising is 2, and for sales is 7. The standard deviation for advertising is 2.16. The standard deviation for sales also is 2.16.
b) b = r(s_y/s_x) = 0.857(2.16/2.16) = 0.857; a = ȳ – b·x̄ = 7 – (0.857)(2) = 5.286; ŷ = 5.286 + 0.857x
c)
(i) Assumptions: Assume randomization, linear trend with normal conditional distribution for y and the same standard deviation at different values of x. (ii) Hypotheses: The null hypothesis that the variables are independent is H0: 0. The two-sided alternative hypothesis of dependence is Ha: 0. (iii) Test statistic: t = b/se = 0.857/0.364 = 2.35 (iv) P-value: From technology, the P-value is 0.14. (v) Conclusion: If H0 were true that the population slope 0, it would not be very unusual to get a sample slope at least as far from 0 as b = 0.857. The probability would be 0.14. The P-value is not below the significance level of 0.05, and, therefore, we cannot reject the null hypothesis. It is plausible that there is no association between advertising and sales. 12.20 GPA and study time–revisited (i) Assumptions: Assume randomization, linear trend with normal conditional distribution for y and the same standard deviation at different values of x. (ii) Hypotheses: The null hypothesis that the variables are independent is H0: 0. The one-sided alternative hypothesis of dependence is Ha: 0. (iii) Test statistic: From technology, t = 3.38 (iv) P-value: From technology, the P-value is 0.0075. (v) Conclusion: If H0 were true, the population slope 0, the probability of obtaining a sample slope at least as large as b = 0.044 would be 0.0075. There is strong evidence of an association between GPA and study time. 12.21 GPA and skipping class–revisited The 90% confidence interval is b t.05 ( se) 0.082 1.94(0.016) , or (–0.11, –0.05). We are 90% confident that the population slope falls between –0.11 and –0.05. On average, GPA decreases by between 0.11 to 0.05 points for every additional class that is skipped. 12.22 Battery capacity a) (i) Assumptions: Assume randomization; a linear relationship between mean weight of cell phone and the capacity of its battery with a normal conditional distribution of weight of phone for a given battery capacity with constant standard deviation. (ii) Hypotheses: The null hypothesis that the variables are independent is H0: 0. The two-sided alternative hypothesis of dependence is Ha: 0. (iii) Test statistic: t = b/se = 0.0436/0.00806 = 5.41 (iv) P-value: From technology, the P-value is 0.000. (v) Conclusion: If H0 were true that the population slope 0, it would be very unusual (the probability would be almost 0) to get a sample slope at least as far from 0 as b = 0.0436. The Pvalue is beyond the significance level of 0.05, and we can reject the null hypothesis. We have very strong evidence that an association exists between a phone's battery capacity and its weight. b) The 95% confidence interval is b t.025 ( se) 0.0436 1.9917(0.00806), or (0.028, 0.060). With 95% confidence, we predict that an increase of 1000 mAh in the capacity of a phone’s battery increases the weight of the phone by between 28 and 60 grams. Because 0 is not contained in this interval, the slope is significantly different from 0; i.e., an association exists. This is consistent with the result of the hypothesis test.
Section 12.3: Describing the Strength of Association
12.23 Dollars and thousands of dollars
The slope when income is in dollars is 1.50/1000 = 0.0015.
12.24 When can you compare slopes?
a) For a $1000 increase in GDP, the predicted percentage of adults using the internet increases by 1.02 percentage points, and the predicted percentage using Facebook increases by 0.46 percentage points.
b) The slope for GDP predicting Internet usage is about twice as large as the slope for predicting Facebook usage. Therefore, the impact of an increase in GDP is larger for Internet use than for Facebook use.
12.25 Sketch scatterplot
There are a number of possible scatterplots that would fit these scenarios. Here are possibilities.
a), b) [Two example scatterplots of y vs. x, one for each scenario]
12.26 Sit-ups and the 40-yard dash
a) (i) ŷ = 6.707 – 0.02435(10) = 6.46
(ii) ŷ = 6.7065 – 0.024346(40) = 5.73
For an increase of 30 in the number of sit-ups, the predicted change in the time for the 40-yard dash is (–0.0243)(30) = –0.73. That is, an athlete who can do 30 more sit-ups is predicted to be 6.46 – 5.73 = 0.73 seconds faster.
b) The time of the 40-yard dash will decrease by 0.46 standard deviations for a standard deviation increase in number of sit-ups.
c) Because the slope is negative, the correlation also will be negative.
r = b(s_x/s_y) = –0.024346(6.887/0.365) = –0.46
d) The predicted time difference is 0.46 standard deviations of the 40-yard dash time, which is 0.46(0.365) = 0.17 seconds.
12.27 Body fat
a) There is a strong, positive linear association between weight and percent body fat.
b) r² = 0.78; Using the regression equation with weight to predict percent body fat instead of predicting it with the sample mean results in a 78% reduction in the overall prediction error (the sum of the squared errors).
c) No, the correlation r does not depend on the units.
12.28 Verbal and math SAT
a) ŷ = 250 + 0.5(500) = 500; Generally, at the x value equal to its mean, the predicted value of y is equal to its mean.
12.28 (continued)
b) We can find the correlation as follows: r = b(s_x/s_y) = 0.5(100/100) = 0.5. When the x and y variables have the same variability, the correlation equals the slope.
c) r² = (0.5)(0.5) = 0.25; The sum of squared errors is 25% less when we use the regression equation instead of the mean of y.
12.29 SAT regression toward mean
a) ŷ = 250 + 0.5(800) = 650
b) The predicted y value will be 0.5 standard deviations above the mean, for every one standard deviation above the mean that x is. Here, x = 800 is three standard deviations above the mean; so the predicted y value is 0.5(3) = 1.5 standard deviations above the mean. 12.30 GPAs and TV watching a) The correlation of –0.35 indicates that there is a negative relation between the two variables. r2 = 0.13 indicates: (i) only a small reduction of 13% in the overall prediction error when using the regression equation with time watching TV to predict GPA rather than using the mean GPA. (ii) 13% of the variability observed in college GPA can be explained by its relationship with time watching TV. b) (i) The student would be below the mean because the correlation and, hence, slope is negative. (ii) 2(0.35) = 0.70 standard deviations below the mean of college GPA. With regression to the mean, the predicted college GPA is closer to the mean GPA than time watching TV is to its mean. 12.31 GPA and study time a) r = 0.81, there is a fairly strong, positive, linear association between GPA and study time. The more time a student studies, the higher their GPA is likely to be. b) r2 = (0.81)2 = 0.656 (i) The overall prediction error when using the regression equation with study time to predict GPA is 66% smaller compared to using the sample mean GPA. (ii) 66% of the variability observed in college GPA can be explained by the linear relationship with study time. 12.32 Placebo helps cholesterol? a) Their mean cholesterol reading at time 2 should be 200 + 100(0.7) = 270. b) This does not suggest that placebo is an effective treatment; this decrease could occur merely because of regression to the mean. Subjects who are relatively high at one time will, on the average, be lower at a later time. So, if a study gives placebo to people with relatively high cholesterol (that is, in the right-hand tail of the blood cholesterol distribution), on the average we expect their values three months later to be lower. 12.33 Does tutoring help? The explanatory variable is the midterm score and the response variable is the final score. We cannot conclude that the tutoring program was successful. These students were very low to start with – two full standard deviations below the mean. This increase could have occurred because of regression to the mean. Subjects who are relatively low at one time will, on the average, be higher at a later time. So, for people with relatively low scores (that is, in the left-hand tail of the distribution of midterm scores), on the average we expect their values on the final to be higher. 12.34 What’s wrong with your stock fund? This might be due to regression to the mean. Stocks that are relatively high one year will, on the average, be lower at a later time. 12.35 Golf regression Regression to the mean suggests that we can expect that the five leaders would, on average, have higher scores the second time around.
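The placebo calculation in 12.32 a) is just the regression-toward-the-mean prediction written out. A worked version in Python (plain arithmetic; the assumption, implied by the numbers above, is that the placebo group averaged one standard deviation above the mean at time 1):

mean_chol = 200   # mean cholesterol at each time
sd_chol = 100     # standard deviation at each time
r = 0.70          # correlation between time 1 and time 2 readings
z_time1 = 1.0     # group assumed to start 1 standard deviation above the mean

predicted_time2 = mean_chol + r * z_time1 * sd_chol
print(predicted_time2)   # 270, closer to the mean than the assumed time-1 average of 300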
12.36 Car weight and mileage
There is a 75% reduction in error in predicting a car's mileage based on knowing the weight, compared to predicting by the mean mileage. This relatively large value means that we can predict a car's mileage quite well if we know its weight.
12.37 Food and drink sales
The scatterplots below demonstrate that at the individual level, the correlation is weaker. The first has a correlation of 0.087 (here, it was constructed using 100 pairs of dollar amounts, representing 10 transactions per day, for individuals, rather than 2500 individuals). The second has a correlation of 0.390 and is constructed from the means of the dollar amounts per day. There is a great deal of variability in amounts spent for individuals, but not much variability in mean amounts spent for days. The summary values for days fall closer to a straight line. (Note that the x and y axes of the second scatterplot have a restricted range compared to the first.)
[Two scatterplots of amount spent on drinks vs. amount spent on food: individual values and averaged values]
12.38 Yale and UConn
The correlation between high-school and college GPA would likely be higher at the University of Connecticut than at Yale. Yale would have a restricted range of high school GPA values, with nearly all of its students clustered very close to the top. UConn would have a wider range of high-school GPAs. The correlation tends to be smaller when we sample only a restricted range of x values than when we use the entire range.
12.39 Violent crime and single-parent families
a) [Scatterplot of violent crime rate vs. single parent]
The scatterplot shows a likely positive correlation between these variables, with one extreme outlier.
254 Statistics: The Art and Science of Learning from Data, 4th edition 12.39 (continued) b) Technology confirms the change from 0.77 to 0.59. The correlation drops so dramatically because it depends in magnitude on the variability in scores. Without the outlier, the scores concentrate more narrowly at the lower end of the scale, leading to a weaker correlation. By contrast, when all the scores, including the outlier, are included, there is a wider range of values, and we would likely see a stronger correlation for them. In general, the correlation tends to be smaller in absolute value when we sample only a restricted range of x values than when we use the entire range. 12.40 Correlations for the strong and for the weak a) From technology, the correlation between number of 60-pound bench presses before fatigue (BP60) and maximum bench press (maxBP) is 0.80 for females and 0.91 for male. Although both are strong, positive correlations, the correlation for males is stronger. b) (i) Using only the x values below the median of 10 for females, the correlation is only 0.47. Using only the x values below the median of 17 for males, the correlation is 0.93. (ii) Using only the x values above the median of 10 for females, the correlation is 0.67. Using only the x values above the median of 17 for males, the correlation is 0.57. They are so different because the correlation usually is smaller in absolute value when the range of predictor values is restricted.
Section 12.4: How the Data Vary Around the Regression Line
12.41 Poor predicted strengths
a) For athlete 10, BP60 = 15.0, which is the number of 60-pound bench presses, and maxBP = 105.00, which is the maximum bench press for this athlete. The predicted maximum bench press for this person, however, is 85.90, shown in the column "Fit." In the "Resid" column, we see the difference between the actual and predicted maximum bench press, 19.10. Finally, in the column titled "Std Resid," we see the standardized residual, 2.41, which is the residual divided by the standard error that describes the sampling variability of the residuals; it does not depend on the units used to measure the variable.
b) We would expect about 5% of standardized residuals to have an absolute value above 2.0. Thus, it is not surprising that three would have an absolute value above 2.0.
12.42 Loves TV and exercise
a) The residual, 48.8, is the difference between the actual reported minutes of exercise and the predicted reported minutes of exercise for this student. To find the predicted value: 48.8 = 60 – ŷ, so ŷ = 60 – 48.8 = 11.2.
b) The standardized residual is the residual divided by its standard error. It is the number of standard errors the residual falls from 0 (the mean or the predicted score). This student falls 6.41 standard errors above what would be predicted for the number of minutes per day she/he watches TV.
12.43 Bench press residuals
a) This figure provides information about the distribution of standardized residuals, and hence the conditional distribution of maximum bench press.
b) The conditional distribution in (a) seems to be approximately normal.
12.44 Predicting house prices
a) Using the residual df of 98, 98 = n – 2, so n = 100. The sample size was 100.
b) The sample predicted mean selling price was ŷ = 9.2 + 77(1.53) = 127.010, or $127,010.
c) The estimated residual standard deviation of y is the square root of MS Error = 1349. The square root of 1349 is 36.7.
d) The prediction interval is ŷ ± 2s = 127.010 ± 2(36.729), or (53.6, 200.5).
12.45 Predicting clothes purchases a) The value under “Fit,” 448, is the predicted amount spent on clothes in the past year for those in the 12th grade of school. b) The 95% confidence interval of (427, 469) is the range of plausible values for the population mean of dollars spent on clothes for 12th grade students in the school.
Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis 255 12.45 (continued) c) The 95% prediction interval of (101, 795) is the range of plausible values for the individual observations (dollars spent on clothes) for all the 12th grade students at the school. 12.46 CI versus PI A 95% prediction interval is meant to predict where we can expect individual observations to fall for a given value of x. A 95% confidence interval provides a range of plausible values for the population mean for y at a given value of x. We would expect the PI to be wider than the CI because we can predict the population mean more precisely than we can predict an individual value. 12.47 ANOVA table for leg press a) The residual standard deviation is the square root of Mean Square Error. The square root of 1303.72 is 36.1. This is the estimated standard deviation of maximum leg presses for female athletes who can do a fixed number of 200-pound leg presses. b) The standard deviation is 36.1; with x = 22, yˆ 233.89 5.27(22) 349.92, so the 95% prediction interval for female athletes is yˆ 2 s 349.83 2(36.1), or (277.6, 422.0). 12.48 Predicting leg press
a) MINITAB got the "Fit" of 365.66 from: ŷ = 233.89 + 5.27x = 233.89 + 5.27(25) = 365.6 (the difference between 365.6 and 365.7 is due to differences in rounding).
b) The 95% confidence interval is approximately ŷ ± t.025(se) = 365.66 ± 2.004(5.02), or (355.6, 375.7); these are the plausible values for the population mean of y values at x = 25.
c) With 95% confidence, we predict that the maximum leg press for an individual athlete who can leg press 200 pounds 25 times is between 293 pounds and 439 pounds.
12.49 Variability and F
a) The Total SS is the sum of the regression SS and the residual SS. The residual SS represents the error in using the regression line to predict y. The regression SS summarizes how much less error there is in predicting y using the regression line compared to using ȳ.
b) The sum of squares around the mean divided by n – 1 is 192,787/56 = 3442.6, and its square root is 58.7. This estimates the overall standard deviation of y values, whereas the residual s estimates the standard deviation of y values at a fixed value of x. c) The F test statistic is 92.87; its square root is the t-statistic of 9.64. 12.50 Assumption violated a) Each student’s scatterplot will be different, but should show less variability on y when x is low, and more variability on y when x is large. This reflects the fact that people with lower incomes are limited in how much they can give to charity; the amounts they give to charity will be, by necessity, low. Those with higher incomes have the ability to donate much more to charity, although many will choose to give low or moderate amounts; thus, the variability in amount donated to charity will be much higher among those with high incomes. b) A 95% prediction interval would not work well at very small or very large x values because the prediction interval will have similar widths for each x value. Thus, it will predict a wider range than exists for those with very low incomes, and a narrower range than exists for those with very high incomes. 12.51 Understanding an ANOVA table a) The MS values will be calculated by dividing SS by DF. The top MS will be 200,000/1 = 200,000, and the bottom MS will be 700,000/31 = 22,580.6. F is the ratio of the two mean squares, 200,000/22,580.6 = 8.86. b) The F test statistic is an alternative test statistic for testing H0: 0 against Ha: 0.
12.52 Predicting cell phone weight
a) The necessary assumptions are a linear relationship between the mean weight of cell phone and the capacity of its battery; a random sample of cell phones; and a normal conditional distribution of weight of phone for given battery capacity with constant standard deviation. The 95% confidence interval is (127, 137). With 95% confidence, the mean weight of a phone with 1500 mAh of battery capacity falls between 127 grams and 137 grams.
b) The 95% prediction interval is (89, 175). With 95% confidence, the weight of a cell phone with 1500 mAh battery capacity falls between 89 grams and 175 grams.
c) A prediction interval predicts where individual observations fall for a given value of x. A confidence interval provides a range of plausible values for the population mean for y at a given value of x.
12.53 Cell phone ANOVA
From Minitab:
Analysis of Variance
Source          DF     SS     MS      F      P
Regression       1  13682  13682  29.24  0.000
Residual Error  76  35560    468
Total           77  49242
a)
Total SS is the sum of the Residual SS and Regression SS, so Total SS = 49242 = 13682 + 35560. The total SS is a measure of the overall variability in y (phone weight) but also measures the overall error when using y to predict y. The residual SS measures the overall prediction error when using ŷ to predict y. Their difference equals the regression SS, which tells us how much the prediction error decreases when using ŷ instead of y to predict y.
b) s = 21.6, which is the square root of 468, the mean square error. It estimates the standard deviation of cell phone weight at given battery capacity value and describes a typical value of the residual. c) sy = 25.3; This describes the variability in cell phone weight over the entire range of battery capacity values, not just the variability in cell phone weight at a particular capacity value.
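The two standard deviations discussed in 12.53 b) and c) follow directly from the ANOVA table above; a quick check in Python (plain arithmetic):

import math

ss_residual, df_residual = 35560, 76
ss_total, df_total = 49242, 77

s = math.sqrt(ss_residual / df_residual)   # residual standard deviation
s_y = math.sqrt(ss_total / df_total)       # overall standard deviation of cell phone weight
print(round(s, 1), round(s_y, 1))          # about 21.6 and 25.3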
Section 12.5: Exponential Regression: A Model for Nonlinearity 12.54 Savings grow exponentially
a) ŷ = 100(1.10)^1 = 110
b) ŷ = 100(1.10)^5 = 161.05
c) ŷ = 100(1.10)^x
d) The first year after which you'll have more than $200 is the 8th: ŷ = 100(1.10)^8 = 214.36.
12.55 Growth by year versus decade
a) (1.072)^10 ≈ 2.0
b) (1.10)^10 ≈ 2.59; The effect here is multiplicative. If you added 10% a year, your savings would double in a decade.
12.56 Moore's law today
a) ŷ = 151.61(1.191)^(2015 – 1994) ≈ 5955
b) The number of components is predicted to increase by a factor of 1.191, which is the estimated value for β in the exponential regression model. This corresponds to a 19.1% increase year over year.
c) A linear trend on the log scale and the high correlation indicate that the regression model is appropriate.
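The prediction in 12.56 a) is the exponential regression equation evaluated at x = 2015 − 1994 = 21; a one-line check in Python:

y_hat = 151.61 * 1.191 ** (2015 - 1994)
print(round(y_hat))   # roughly 5955, matching the value quoted above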
12.57 U.S. population growth
a) ŷ = 81.14(1.1339)^0 = 81.14 million; ŷ = 81.14(1.1339)^11 = 323.26 million
b) 1.1339 is the multiplicative effect on ŷ for a one-unit increase in x.
c) This suggests a very good fit of data to model. The high correlation indicates a linear relation between the log of the y values and the x values.
12.58 Future shock
a) (1.15)^5 ≈ 2.0; The population size after five decades is predicted to be 2.0 times the original population size.
b) (1.15)^10 ≈ 4.0; The population size after ten decades is predicted to be 4.0 times the original population size.
c) (1.15)^20 ≈ 16.4; The population size after twenty decades is predicted to be 16.4 times the original population size.
12.59 Age and death rate
a) 0.32 is the prediction for ŷ when x = 0; 1.078 is the multiplicative effect on ŷ for a one-unit increase in x.
b) (i) 0.32(1.078)^20 ≈ 1.4
(ii) 0.32(1.078)^50 ≈ 13.7
(iii) 0.32(1.078)^80 ≈ 130.2
c) Since (1.078)^10 ≈ 2.12, the predicted death rate approximately doubles every ten years.
12.60 Leaf litter decay
a) A straight-line model is inappropriate because the scatterplot shows that the relation between the two variables is curvilinear.
[Scatterplot of weight vs. weeks]
b) From technology: weight = 55.0 – 3.59(weeks). The predicted weight after 20 weeks is 55.0 – 3.59(20) = –16.8; this does not make sense because a weight cannot be negative.
12.60 (continued)
c) A straight-line model seems appropriate for the log of y against x.
[Scatterplot of log(weight) vs. weeks]
d) (i) ŷ = 80.6(0.813)^0 = 80.6
(ii) ŷ = 80.6(0.813)^20 ≈ 1.3
e) The coefficient 0.813 indicates that the predicted weight multiplies by 0.813 each week.
12.61 More leaf litter
a) The exponential model is more appropriate because the log of y values and the x values have a relation that is closer to a straight line than is the relation of x and y.
b) (0.813)^3 ≈ 0.54 and (0.813)^4 ≈ 0.44, so the half-life is predicted to be between three and four weeks. (0.813)^3.34 ≈ 0.5 gives a more specific predicted half-life of 3.34 weeks.
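The more precise half-life quoted in 12.61 b) is the x that solves (0.813)^x = 0.5; a quick check in Python:

import math

half_life = math.log(0.5) / math.log(0.813)
print(round(half_life, 2))   # about 3.35 weeks, in line with the 3.34 quoted above (the small difference is rounding in the fitted 0.813)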
Chapter Problems: Practicing the Basics
12.62 Parties and sports
a) At fixed values of x there is variability in the values of y, so we can't specify individual y values using x, but we can try to specify the mean of those values and how that mean changes as x changes.
b) Because y values vary at a fixed value of x, the model has a parameter to describe the variability of the conditional distribution of those y values at each fixed x.
12.63 Verbal–math correlation
a) b = r(s_y/s_x) = 0.60(120/80) = 0.9
b) a = ȳ – b·x̄ = 500 – (0.9)(480) = 68, so ŷ = 68 + 0.9x
c) Treating mathematics aptitude test as x and verbal aptitude test as y:
b = r(s_y/s_x) = 0.60(80/120) = 0.4, a = ȳ – b·x̄ = 480 – (0.4)(500) = 280, so ŷ = 280 + 0.4x.
12.64 Stem cells
a) This is the interpretation of r² = (0.804)² = 0.65, or 65%.
b) This refers to the proportional reduction in overall prediction error, which is r2 = 0.65. 12.65 Short people The response variable is height of children and the explanatory variable is height of parents. Very short parents would tend to have children who are short, but not as short as they are. The prediction equation is based on correlation. Because the correlation between two variables is never greater than the absolute value of 1, a y value tends to be not so far from its mean as the x value is from its mean. This is called regression toward the mean.
Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis 259 12.66 Income and education in Florida a) Technology gives us a correlation of 0.79. (i) This is a positive correlation. As one variable increases, so does the other tend to increase. (ii) The magnitude of the correlation is close to 1, which is a strong correlation. These two variables are highly related. b) A county that is one standard deviation above the mean on x is predicted to be 0.79 standard deviations above the mean on y (see calculations below). This exemplifies regression to the mean; the y value is predicted to be fewer standard deviations from its mean than x is from its mean. The regression equation is: predicted income = –4.6 + 0.419education. Since the mean for education is 69.49 and the standard deviation is 8.86, one standard deviation above the mean, therefore, is 78.35. We would predict an income of –4.6 + 0.419(78.35) = 28.229 (income is in thousands of dollars; hence, $28,229). The mean for income is 24.51 and its standard deviation is 4.69. $28,229 is 0.79 standard deviations above the mean. 12.67 Bedroom residuals a) yˆ 33,778 31,077(3) 127,009; The residual is 338,000 – 127,009 = 210,991. This house sold for $210,991 more than predicted. b) A residual divided by its se is a standardized residual, which measures the number of standard errors that a residual falls from 0. It helps us identify unusual observations. The standardized residual of 4.02 indicates that this observation is 4.02 standard errors higher than predicted. 12.68 Bedrooms affect price? a) The regression parameter is the population slope that is estimated by the slope of the sample, 31,077. It represents the change in predicted selling price when the number of bedrooms increases by 1. b) The 95% confidence interval is b t.025 ( se) 31,077 1.984(8049), or (15,108, 47,046).
c)
A difference of two bedrooms would double both ends of the confidence interval because the slope is for an increase of 1, and here we have an increase of 2. It would now be: (30,216, 94,092). The difference between the means is double the slope.
12.69 Types of variability
a) The residual standard deviation of y refers to the variability of the y values at a particular x value, whereas the standard deviation of y refers to the variability of all of the y values.
b) The fact that they're not very different indicates that the number of bedrooms is not strongly associated with selling price. The variability of y values at a given x is about the same as the variability of all of the y observations. We can see this by considering the r² of 0.13. The error using ŷ to predict y is only 13% smaller than the error using ȳ to predict y.
12.70 Exercise and college GPA
a) [Scatterplot of Exercise vs. CGPA]
12.70 (continued)
The observation with x of 2.60 and y of 60 is an outlier. It likely makes the slope and correlation more negative than they would be otherwise.
b) From technology, the regression equation is: Exercise = 35.7 – 8.24CGPA. The standardized residual for the point (2.60, 60) is 6.35. The exercise score for this individual is 6.35 standard errors higher than predicted.
c) The new regression equation is: Exercise = 9.20 – 1.15CGPA. As expected, the outlier made the slope more negative than it would be without the outlier.
12.71 Bench press predicting leg press
a) The 95% confidence interval provides a range of plausible values for the population mean of y when x = 80. The plausible values range from 338 to 365 for the mean of y values for all female high school athletes having x = 80.
b) The prediction interval provides a range of predicted y values for an individual observation when x = 80. For all female high school athletes with a maximum bench press of 80, we predict that 95% of them have maximum leg press between about 248 and 455 pounds. The 95% prediction interval is for a single observation y, whereas the confidence interval is for the mean of y.
12.72 Leg press ANOVA
a) The estimated standard deviation of the maximum leg press values of those with a maximum bench press of 80 is the residual standard deviation, or the square root of the residual MS. The square root of 2624 is 51.2.
b) The approximate 95% prediction interval is ŷ ± 2s = 351.2 ± 2(51.2), or (248.8, 453.7).
12.73 Savings grow
a) 1000(2)^5 = $32,000
b) 1000(2)^10 = $1,024,000
c) The equation based on decade instead of year is y = 1000(2)^x.
12.74 Florida population
a) The approximate rate of growth per year is 3.6%.
b) (i) The predicted population size in 1830 is ŷ = 46(1.036)^0 = 46 thousand.
(ii) In 2000: ŷ = 46(1.036)^170 = 18,790 thousand (i.e., almost 19 million).
c) For 2100: ŷ = 46(1.036)^270 = 645,493 thousand. The same formula will not likely hold up between 2000 and 2100. Eventually, we would expect population size within a constrained area to level off.
12.75 World population growth
b) The fit of the model corresponds to a rate of growth of 1.4% per year because multiplying by 1.014 adds an additional 1.4% each year. c)
(i) The predicted population size doubles after 50 years because (1.014)50 2.0, the number by which we’d multiply the original population size. (ii) It quadruples after 100 years since (1.014)100 4.0.
d) The exponential regression model is more appropriate for these data because the log of the population size and the year number are more highly correlated (r = 0.99) than are the population size and the year number.
Copyright © 2017 Pearson Education, Inc.
Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis 261 12.76 Match the scatterplot a) These values correspond to scatterplot 1. The point to the upper right of the data decreases the magnitude of the correlation and slope; it is an outlier with respect to the fitted line and has a large residual. b) These values correspond to scatterplot 3. The point to the left of the data increases slope in magnitude and correlation; this point is an outlier in the x direction. c) These values correspond to scatterplot 2. There are no outliers.
Chapter Problems: Concepts and Investigations 12.77 Softball data a) The 3 outlying points represent outliers; values more than 1.5 IQR beyond either Q1 or Q3. 30
Difference
20
10
0
-10
-20
b) From Minitab: Difference = –9.125 + 1.178Run
The difference is positive when –9.125 + 1.178RUNS > 0, which is equivalent to RUNS > 9.125/1.178, or RUNS > 7.7; thus, the team scores more runs than their opponents when they score eight or more runs.
c) Runs, hits, and difference are positively associated with one another. Errors are negatively associated with those three variables. From Minitab:
                Run    Hits  Errors
Hits          0.819
Errors       -0.259  -0.154
Difference    0.818   0.657  -0.501
d) From technology, the P-value of 0.000 for testing that the slope equals 0 provides extremely strong evidence that DIFF and RUNS are associated.
12.78 Runs and hits
a) A scatterplot indicates that a straight-line regression model seems appropriate. Box plots indicate that both runs and hits are skewed in a positive direction.
[Scatterplot of Run vs. Hits; box plots of Run and Hits]
b) Reports will vary, but should include the following descriptive statistics about the individual variables. From Minitab:
Variable    N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Run       277   0  10.596    0.325  5.414    1.000   7.000  10.000  14.000   30.000
Hits      277   0  14.643    0.298  4.958    3.000  11.000  14.000  17.000   37.000
Reports will vary, but should include the following statistics about their relationship. From Minitab:
Pearson correlation of Run and Hits = 0.819; P-Value = 0.000
The regression equation is Run = - 2.49 + 0.894 Hits
Predictor     Coef  SE Coef      T      P
Constant   -2.4940   0.5845  -4.27  0.000
Hits       0.89395  0.03781  23.64  0.000
S = 3.11481   R-Sq = 67.0%   R-Sq(adj) = 66.9%
Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  5422.6  5422.6  558.92  0.000
Residual Error 275  2668.1     9.7
Total          276  8090.7
12.78 (continued)
c) The report should include a discussion of unusual observations from the report of standardized residuals that are greater than 2 in absolute value (listed below). From Minitab:
Obs  Hits     Run     Fit  SE Fit  Residual  St Resid
 32  14.0  18.000  10.021   0.189     7.979     2.57R
 53   8.0  11.000   4.658   0.313     6.342     2.05R
 82  12.0   2.000   8.233   0.212    -6.233    -2.01R
119  14.0  17.000  10.021   0.189     6.979     2.24R
120  15.0  18.000  10.915   0.188     7.085     2.28R
123  22.0  25.000  17.173   0.335     7.827     2.53R
156  16.0   5.000  11.809   0.194    -6.809    -2.19R
171  13.0  16.000   9.127   0.197     6.873     2.21R
200  37.0  29.000  30.582   0.866    -1.582    -0.53 X
206  28.0  30.000  22.536   0.539     7.464     2.43RX
211  13.0  16.000   9.127   0.197     6.873     2.21R
230  17.0  19.000  12.703   0.207     6.297     2.03R
233   9.0  12.000   5.551   0.284     6.449     2.08R
246  17.0  22.000  12.703   0.207     9.297     2.99R
258  16.0  21.000  11.809   0.194     9.191     2.96R
263  10.0  16.000   6.445   0.257     9.555     3.08R
d) See Minitab output in (b). 12.79 GPA and TV watching The two-page report will be different for each student, but should interpret results from the following output from technology. From Minitab: Pearson correlation of high_sch_GPA and TV = -0.268
The regression equation is high_sch_GPA = 3.44 - 0.0183 TV
Predictor        Coef   SE Coef      T      P
Constant      3.44135   0.08534  40.32  0.000
TV          -0.018305  0.008658  -2.11  0.039
S = 0.446707   R-Sq = 7.2%   R-Sq(adj) = 5.6%
Analysis of Variance
Source          DF       SS      MS     F      P
Regression       1   0.8921  0.8921  4.47  0.039
Residual Error  58  11.5737  0.1995
Total           59  12.4658
12.80 Female athletes' speed
The two-page report will be different for each student, but should interpret results from the following output from technology. From Minitab:
Pearson correlation of 40-YD (sec) and WT (lbs) = 0.367
The regression equation is 40-YD (sec) = 5.29 + 0.00536 WT (lbs)
Predictor      Coef   SE Coef      T      P
Constant     5.2920    0.2611  20.27  0.000
WT (lbs)   0.005363  0.001831   2.93  0.005
S = 0.342616   R-Sq = 13.5%   R-Sq(adj) = 11.9%
12.80 (continued)
From Minitab:
Analysis of Variance
Source          DF      SS      MS     F      P
Regression       1  1.0068  1.0068  8.58  0.005
Residual Error  55  6.4562  0.1174
Total           56  7.4630
12.81 Football point spreads a) If there is no bias in the Las Vegas predictions, the predictions should exactly match the observations. In other words, y = x, which translates to the true y-intercept equaling 0 and the true slope equaling 1. b) No. Based on the results in the table, the P-value for testing that the true y-intercept is 0 is quite large so that we are unable to conclude that the y-intercept differs from 0. The P-value for testing that the slope is equal to 0 is approximately 0 so that we reject the null hypothesis of the true slope equaling 0. The least squares fit for the slope is quite close to 1, namely 1.0251. 12.82 Iraq war and reading newspapers Regression toward the mean indicates that for any particular pre-war RBS, the predicted during-war RBS will be relatively closer to its mean than the pre-war RBS will be to its mean. Thus, the finding that the during-war scores are not as extreme as the pre-war scores for the light and heavy readers may merely be reflecting the tendency for regression toward the mean. 12.83 Sports and regression There are many examples that students could use in their response. Here’s one example. If a Major League baseball player has an amazing year with many homeruns, he’s unlikely to have that many the following year. If we look at the top ten homerun hitters for a given year, most of them are likely to have fewer homeruns the following year. Once a hitter reaches a given level, he’s more likely to come back toward the average; it’s hard to go up at that point. 12.84 Regression toward the mean paradox Regression toward the mean does not imply that, over many generations, there are fewer and fewer very short people and very tall people. To reverse the logic, most very tall people had parents who were a bit shorter than they, and most very short people had parents who were a bit taller. Extreme observations will always occur, but then the offspring of the extremely tall or short, the next generation, will likely be closer to the mean. 12.85 Height and weight As the range of values reflected by each sample is restricted, the correlation tends to decrease when we consider just students of a restricted range of ages. Using the two samples will increase the ranges for height and weight, and likely increase the correlation. 12.86 Income and education Correlation tends to decrease as the ranges of the values of the sample decrease. Adults who have a Ph.D. have a smaller range of years of education and a smaller range of annual incomes. 12.87 Dollars and pounds a) The slope would change because it depends on units. The slope would be 2 times the original slope. b) The correlation would not change because it is independent of units. c) The t statistic would not change because although the slope doubles, so does its standard error. (The results of a test should not depend on the units we use.) 12.88 All models are wrong a) All models are wrong because reality is never so simple as to follow, for example, exactly a straight line for how the mean of y changes as x changes, with exactly the same variability on y at all values of x. b) Some models are not useful because they are applied inappropriately. For example, a linear model might be applied inappropriately to data that have a curvilinear pattern.
12.89 df for t tests in regression
a)
df = n – the number of parameters; for the model, y x, there are two parameters and ,
and so df = n – 2. b) When the inference is about a single mean, there is only one parameter, and therefore, df = n – 1. 12.90 Assumptions a) To describe the relationship between two variables, the assumption is that the population mean of y has a straight-line relationship with x. b) To make inferences about the relationship, we assume that the population mean of y has a straight-line relationship with x, that the data were gathered using randomization, and that population y values at each x value have a normal distribution with the same standard deviation at each x value. The third assumption is least critical, especially when the sample size is large. 12.91 Assumptions fail? a) The percentage of unemployed workers would likely fluctuate quite a bit between 1900 and 2005, and this would not be a linear relationship. b) Annual medical expenses would likely be quite high at low ages, then lower in the middle, then high again, forming a parabolic, rather than linear, relationship. c) The relation between these variables is likely curvilinear. Life expectancy increases at first as per capita income increases, and then gradually levels off. 12.92 Lots of standard deviations a)
sy is the standard deviation of all y values around the mean of all y values, ȳ.
b) sx is the standard deviation of all x values around the mean of all x values, x̄.
c) The residual standard deviation s is the standard deviation of all y values at a particular x value. It summarizes the sample variability around the regression line.
d) The se of the slope estimate b describes the variability of the sampling distribution that measures how that estimate varies from sample to sample of size n.
12.93 Decrease in home values
a) The statement is referring to additive growth, but this is multiplicative growth. There is an exponential relation between these variables.
b) ŷ = $175,000(0.966)^10 = $123,825; ($123,825 – $175,000)/$175,000 = –0.292, so the percentage decrease for the decade is about 29.2%.
12.94 Population growth
a) (1.0123)^10 = 1.13, which indicates a growth rate of about 13%.
b) At a growth rate of 1.23%, we’d multiply the initial population by 1.0123 each year. The exponent of x is necessary to indicate the exponential growth over x number of years.
12.95 Multiple choice: Interpret r The best response is (b).
12.96 Multiple choice: Correlation invalid The best response is (a).
12.97 Multiple choice: Slope and correlation The best response is (d).
12.98 Multiple choice: Regress x on y The best response is (a).
12.99 Multiple choice: Income and height The best response is (b).
12.100 True or false a) True b) True c) False d) True e) False
♦♦12.101 Golf club velocity and distance a) We would expect that at 0 impact velocity, there would be 0 putting distance. The line would pass through the point having coordinates (0, 0). b) If x doubles, then x² (and hence the mean of y) quadruples. For example, if x goes from 2 to 4, then x² goes from 4 to 16.
♦♦12.102 Why is there regression toward the mean?
a) For every standard deviation (represented in the units of the variable) that x changes, ŷ will change by that many units times the slope.
b) An increase of 1 standard deviation in x is sx units which, from (a), results in a change in ŷ of sx·b units, but this is equal to r·sy, or r standard deviations in y.
♦♦12.103 r² and variances Because r² = [Σ(y – ȳ)² – Σ(y – ŷ)²]/Σ(y – ȳ)², and dividing each term by approximately n (actually, n – 1 and n – 2) gives the variance estimates, it represents the relative difference between the quantity used to summarize the overall variability of the y values (i.e., the variability of the marginal distribution of y) and the quantity used to summarize the residual variability (i.e., the variance of the conditional distribution of y for a given x). These go in the numerators of the respective variance estimates, and their denominators are nearly identical (n – 1 and n – 2). Therefore, the estimated variance of the conditional distribution of y for a given x is approximately 30% smaller than the estimated variance of the marginal distribution of y.
♦♦12.104 Standard error of slope a) A smaller numerator (s) will lead to a smaller value (se). If the standard error of the sample slope is smaller, it is a better estimate of the population slope. This is true because the se of the slope estimate b describes the spread of the sampling distribution that measures how that estimate varies from sample to sample of size n. If it varies little from sample to sample, then it is a more precise estimate of the population slope. b) The residual standard deviation decreases when the typical size of the residuals is smaller. If the residuals are smaller, that indicates that the observations are closer to the prediction equation. c) As the sample size increases, the denominator of the expression increases, and se decreases. When the x values are more highly spread out, the denominator also increases, and se decreases.
♦♦12.105 Regression with an error term a) The error is calculated by subtracting the mean from the actual score, y. If this difference is positive, then the observation must fall above the mean. b) The error equals 0 when the observation falls exactly at the mean. There is no error in this case. c) Because the residual is e = y – ŷ, we have y = ŷ + e = (a + bx) + e. As ŷ is an estimate of the population mean, e is an estimate of ε. d) It does not make sense to use the simpler model, y = α + βx, that does not have an error term because it is improbable that every observation will fall exactly on the regression line; it is improbable that there will be no error.
♦♦12.106 Rule of 72 a) The actual number of years necessary for the investment to reach 2000 is the value of x for which the exponential regression with multiplicative effect of 1.06 gives a predicted value of 2000.
b) Using the rule log(a^x) = x·log(a): 1.06^x = 2, so log(1.06^x) = log(2), x·log(1.06) = log(2), and x = log(2)/log(1.06) ≈ 12.
c) (i) 72/1 = 72 years (ii) 72/18 = 4 years
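The doubling-time calculation in part (b) is easy to verify with a couple of lines of Python; this is only an illustrative check (not part of the original solution), comparing the exact answer at 6% growth with the rule-of-72 approximation.

```python
import math

exact = math.log(2) / math.log(1.06)   # exact doubling time at 6% growth: about 11.9 years
approx = 72 / 6                        # rule-of-72 approximation: 12 years
print(round(exact, 1), approx)         # parts c(i) and c(ii) use 72/1 = 72 and 72/18 = 4
```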
Chapter Problems: Student Activities 12.107 Analyze your data Responses will be different based on the data files for each class and the variables chosen by each instructor.
Chapter 13: Multiple Regression

Section 13.1: Using Several Variables to Predict a Response
13.1 Predicting weight
a) ŷ = –121 + 3.50x1 + 1.35x2 = –121 + 3.50(66) + 1.35(18) = 134.3
b) The residual is y – ŷ = 115 – 134.3 = –19.3. The actual total body weight is 19.3 pounds lower than predicted.
13.2 Does study help GPA?
a) ŷ = 1.13 + 0.643x1 + 0.0078x2 = 1.13 + 0.643(3.5) + 0.0078(3) = 3.40
b) For a fixed study time, the change in predicted college GPA is 0.64 (the slope) as high school GPA goes from 3.0 to 4.0.
13.3 Predicting college GPA
a) (i) ŷ = 0.20 + 0.50x1 + 0.002x2 = 0.20 + 0.50(4.0) + 0.002(800) = 3.80
(ii) ŷ = 0.20 + 0.50x1 + 0.002x2 = 0.20 + 0.50(2.0) + 0.002(200) = 1.60
b) ŷ = 0.20 + 0.50x1 + 0.002x2 = 0.20 + 0.50x1 + 0.002(500) = 0.20 + 0.50x1 + 1 = 1.20 + 0.50x1
c) ŷ = 0.20 + 0.50x1 + 0.002x2 = 0.20 + 0.50x1 + 0.002(600) = 0.20 + 0.50x1 + 1.2 = 1.40 + 0.50x1
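These plug-in predictions are simple to script. The sketch below (not from the text; the function and parameter names are illustrative, and the second predictor, which takes the values 800 and 200 in part a, is simply treated as a test score) evaluates the Exercise 13.3 prediction equation.

```python
def predicted_college_gpa(hs_gpa, score):
    # Prediction equation from Exercise 13.3: y-hat = 0.20 + 0.50*x1 + 0.002*x2
    return 0.20 + 0.50 * hs_gpa + 0.002 * score

print(predicted_college_gpa(4.0, 800))  # 3.8, as in part a(i)
print(predicted_college_gpa(2.0, 200))  # 1.6, as in part a(ii)
```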
13.4 Interpreting slopes on GPA a)
Setting x2 at a variety of values yields a collection of parallel lines relating ŷ to x1 because the model assumes that the slope for a particular explanatory variable such as x1 is identical for all fixed values of the other explanatory variable, x2 . The slope of each parallel line is 0.50.
b) This does not imply that x1 has a larger effect than does x2 on y in this sample because the two variables do not use the same units. We can only compare slopes if their variables use the same units.
13.5 Does more education cause more crime?
a) (i) ŷ = 59.1 – 0.583x1 + 0.683x2 = 59.1 – 0.583(70) + 0.683(0) = 18.3
(ii) ŷ = 59.1 – 0.583x1 + 0.683x2 = 59.1 – 0.583(80) + 0.683(0) = 12.5
b) When we control for urbanization, crime rate changes by the slope multiplied by the change in education. When education goes up 10 (from 70 to 80), predicted crime rate changes by ten multiplied by the slope, 10(–0.5834) = –5.8.
c) (i) ŷ = 59.1 – 0.583x1 + 0.683(0) = 59.1 – 0.583x1
(ii) ŷ = 59.1 – 0.583x1 + 0.683(50) = 93.2 – 0.583x1
(iii) ŷ = 59.1 – 0.583x1 + 0.683(100) = 127.4 – 0.583x1
[Scatterplot of predicted crime vs. education, with separate lines for urbanization = 0, 50, and 100]
For each fixed level of urbanization, the predicted crime rate decreases by 5.8 for every 10 percentage-point increase in education.
13.5 (continued)
d) The line passing through the points having urbanization = 50 has a negative slope. The line passing through all the data points has a positive slope. Simpson’s paradox occurs because the association between crime rate and education is positive overall but is negative at each fixed value of urbanization. It happens because urbanization is positively associated with crime rate and with education. As urbanization increases, so do crime rate and education, giving an overall positive association between crime rate and education.
(i) [Scatterplot of crime rate (per 1000) vs. education (%)]
(ii) [Scatterplot of predicted crime vs. education]
13.6 Crime rate and income
a) (i) ŷ = 40.0 – 0.791x1 (ii) ŷ = 104.2 – 0.791x1
As urbanization increases from 0 to 100%, the intercept increases from 40.0 to 104.2; the slope stays the same.
b) As income increases, predicted crime increases. This is the opposite of the effect of income in the multiple regression equation.
c) (i) Ignoring urbanization, from the simple regression model with y = crime rate and x = median income, the predicted crime rate increases by 2.6 for every $1000 increase in income. (ii) Controlling for urbanization by fitting the multiple regression model including income and urbanization, the predicted crime rate decreases by 0.8 for every $1000 increase in income.
13.7 The economics of golf
a) The regression equation for a PGA Tour golfer’s earnings for 2008 is: ŷ = 26,417,000 + 168,300GIR + 33,859SS – 19,784,000AvePutt – 44,725Events.
b) The coefficient for each variable is its slope. The predicted total earnings will decrease by $19,784,000 for each increase of one in the average number of putts after reaching the green, when controlling for the other variables in the model.
c) ŷ = 26,417,000 + 168,300(60) + 33,859(50) – 19,784,000(1.5) – 44,725(20) = $7,637,450
13.8 Comparable number of bedrooms and house size effects
a) House selling price is predicted to increase by $63 because the slope associated with a variable is the amount that the predicted value of y will increase when all other variables in the equation are held constant.
b) For a fixed house size of 2000 square feet, the predicted selling price of a two-bedroom house is ŷ = 60,102 + 63.0(2000) + 15,170(2) = $216,442, of a three-bedroom house is ŷ = 60,102 + 63.0(2000) + 15,170(3) = $231,612, and of a four-bedroom house is ŷ = 60,102 + 63.0(2000) + 15,170(4) = $246,782. For a fixed house size, the predicted selling price will increase by $15,170 for each additional bedroom.
13.9 Controlling can have no effect
If x1 and x2 are not related to each other, then their slopes would be the same as if each were in a bivariate regression equation by itself. We don’t need to control for x2 if it’s not related to x1. Changes in x2 will not have an impact on the effect of x1 on y.
13.10 House selling prices
a) [Box plots of HP in thousands, House Size, and Lot Size]
[Scatterplot matrix of HP in thousands, House Size, and Lot Size]
House price and house size are positively correlated. The relationship between house price and lot size is not as clear. There are a lot of large lot sizes that don’t increase the house price. Many lot sizes are in the lower, conventional range, but there are several large properties. It is possible that the large properties have older houses or farmhouses in worse condition, thereby not yielding a higher price. Ignoring the large lot sizes, the variables show a positive correlation. House size and lot size have similar issues, with most of the houses in the lower range.
b) From technology: price = 97,001 + 67.6house_size – 0.08lot_size
c) If house size remains constant, there is very little change with respect to an increase in lot size. The reason is that larger properties could be older homes in poorer condition with not as much value.
13.11 Used cars
a) The relationship between price and age is linear, negative, and strong. The relationship between price and HP is less clear; it may be linear but rather weak.
[Scatterplot matrix of Price, Age, and HP]
b) From technology: Predicted Price = 19,348.7 – 1406.3Age + 25.5HP
(i) Predicted Price = 19,348.7 – 1406.3(8) + 25.5(80) = $10,100
(ii) Predicted Price = 19,348.7 – 1406.3(10) + 25.5(80) = $7,300
c) No, the predicted price difference depends on the age of the two cars because each predicted price depends on it. Only if the age of the two cars is the same does the effect of age cancel out, and the predicted price difference will be 25.5(80 – 60) = 510.
Section 13.2: Extending the Correlation and R² for Multiple Regression
13.12 Predicting sports attendance
a) R² = Regression SS/Total SS = [Σ(y – ȳ)² – Σ(y – ŷ)²]/Σ(y – ȳ)² = (4,354,684,931 – 1,993,805,006)/4,354,684,931 = 0.542
b) Using these variables together to predict attendance reduces the prediction error by 54%, relative to using ȳ alone to predict attendance. The prediction is somewhat better.
c) R = √R² = √0.542 = 0.736; there is a moderately strong association between the observed attendance and the predicted attendance.
13.13 Predicting weight
a) Because height is more strongly correlated with weight than are age and percent body fat, it is, by itself, the best predictor of weight. Because correlations are not based on units, we can directly compare the strengths of relationships by comparing correlations.
b) One of the properties of R² is that it gets larger, or at worst stays the same, whenever an explanatory variable is added to the multiple regression model.
c) No, the reduction in error from using the regression equation to predict weight rather than the mean is only 1% more when adding age. It only increases from 0.66 to 0.67.
13.14 When does controlling have little effect?
Controlling for body fat and then age does not change the effect of height much because height is not strongly correlated with either body fat or age.
13.15 Price of used cars
a) R² = Regression SS/Total SS = [Σ(y – ȳ)² – Σ(y – ŷ)²]/Σ(y – ȳ)² = (222,102,253 – 69,534,753)/222,102,253 = 0.69
b) Using both age and horsepower to predict used car price reduces the prediction error by 69%, relative to using the sample mean price ȳ.
c) 69% of the variability in used car prices can be explained by the varying age and horsepower of the cars.
13.16 Price, age, and horsepower
Because HP is correlated with price, and HP and age are correlated (older cars tend to have less HP), once age is in the model for predicting price, HP does not help in further improving the predictive power. Most of the effect of HP on price has already been captured by age.
13.17 Softball data
a) We know that the number of hits does not make much of a difference, over and above runs and errors, because its slope is so small. An increase of one hit only leads to a predicted difference 0.026 more.
b) A small increase in R² indicates that the predictive power doesn’t increase much with the addition of this explanatory variable over and above the other explanatory variables.
13.18 Slopes, correlations, and units
a) The correlation between predicted house selling price and actual house selling price is 0.72.
b) If selling price is measured in thousands of dollars, each y-value would be divided by 1000. For example, $145,000 would become 145 thousands of dollars. Each slope would also be divided by 1000 (e.g., a slope of 63.0 for house size on selling price in dollars corresponds to 0.063 in thousands of dollars for ŷ).
c) The multiple correlation would not change because it is not dependent on units. 13.19 Predicting college GPA Technology reports that R2 = 25.8%; the multiple correlation is the square root of 0.258, which is 0.51. Using these variables together to predict college GPA reduces the prediction error by 26%, relative to using y alone to predict college GPA. There is a correlation of 0.51 between the observed college GPAs and the predicted college GPAs. Only 25.8% of the observed variability in students’ college GPA can be explained by their high school GPA and study time. The remaining 74.2% of the variability in college GPA is due to other factors.
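A tiny helper function (illustrative only, not from the text) shows how the R² values quoted in Exercises 13.12 and 13.15 come directly from the sums of squares, and how the multiple correlation follows as the square root.

```python
import math

def r_squared(total_ss, residual_ss):
    # Proportional reduction in prediction error: (Total SS - Residual SS) / Total SS
    return (total_ss - residual_ss) / total_ss

r2_attendance = r_squared(4_354_684_931, 1_993_805_006)  # Exercise 13.12: about 0.542
r2_used_cars = r_squared(222_102_253, 69_534_753)        # Exercise 13.15: about 0.69
print(round(r2_attendance, 3), round(math.sqrt(r2_attendance), 3))  # 0.542 and R = 0.736
print(round(r2_used_cars, 2))
```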
Section 13.3: Inferences Using Multiple Regression
13.20 Predicting GPA
a) If the population slope coefficient equals 0, it means that, in the population of all students, high school GPA doesn’t predict college GPA for students having any given value for study time. For example, for students who study 5 hours, high school GPA does not predict college GPA.
b) 1) Assumptions: We assume a random sample and that the model holds (each explanatory variable has a straight-line relation with the mean of y, controlling for the other predictors, with the same slope for all combinations of values of the other predictors in the model, and there is a normal distribution for y with the same standard deviation at each combination of values of the other predictors in the model). Here, the 59 students were a convenience sample, not a random sample, so inferences are highly tentative.
2) Hypotheses: H0: β1 = 0; Ha: β1 ≠ 0
3) Test statistic: t = (b1 – 0)/se = 0.6434/0.1458 = 4.41
4) P-value: The P-value is approximately 0.000.
5) Conclusion: The P-value of 0.000 gives evidence against the null hypothesis that β1 = 0. If the null hypothesis were true, the probability would be almost 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that high school GPA predicts college GPA, if we already know study time. At common significance levels, such as 0.05, we reject H0.
13.21 Study time help GPA?
a) If the null hypothesis were true, the probability would be 0.63 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis that β2 = 0 is correct, and that study time does not predict college GPA, if we already know high school GPA.
b) The 95% confidence interval for β2 is b2 ± t.025(se) = 0.0078 ± 2.003(0.0161), or (–0.02, 0.04). Because 0 falls in the confidence interval, it is plausible that the slope is 0 and that study time has no association with college GPA when high school GPA is controlled.
c) No. It is likely that study time and high school GPA are highly correlated, and therefore it doesn’t add much predictive power to the model to include study time once high school GPA is already in the model. This does not mean that study time has no association with college GPA.
13.22 Variability in college GPA
a) The residual standard deviation, 0.32, describes the typical size of the residuals and also estimates the standard deviation of y at fixed values of the predictors. For students with certain fixed values of high school GPA and study time, college GPAs vary with a standard deviation of 0.32.
b) Approximately 95% of college GPAs fall within about 2s = 0.64 of the true regression equation. When high school GPA = 3.80 and study time = 5.0, college GPA is predicted to be 1.116 + 0.643(3.80) + 0.0078(5.0) = 3.61. Thus, we would expect that approximately 95% of such Georgia college students fall between 3.61 – 0.64 = 2.97 and 3.61 + 0.64 = 4.25.
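The t statistic, P-value, and confidence interval above can be checked with scipy; this is only a sketch, and it assumes df = n – 3 = 56 based on the 59 students and two predictors mentioned in the solution.

```python
from scipy import stats

n = 59                                    # sample size noted in Exercise 13.20(b)
df = n - 3                                # two predictors plus an intercept
t = (0.6434 - 0) / 0.1458                 # slope estimate divided by its standard error
p_value = 2 * stats.t.sf(abs(t), df)      # two-sided P-value, essentially 0.000
t_crit = stats.t.ppf(0.975, df)           # about 2.003, as used in Exercise 13.21(b)
ci = (0.0078 - t_crit * 0.0161, 0.0078 + t_crit * 0.0161)  # about (-0.02, 0.04)
print(round(t, 2), p_value, round(t_crit, 3), ci)
```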
13.23 Does leg press help predict body strength?
a) 1) Assumptions: We assume a random sample and that the model holds (each explanatory variable has a straight-line relation with the mean of y, with the same slope for all combinations of values of the other predictors in the model, and there is a normal distribution for y with the same standard deviation at each combination of values of the other predictors in the model). Here, the 57 athletes were a convenience sample, not a random sample, so inferences are tentative.
2) Hypotheses: H0: β2 = 0; Ha: β2 ≠ 0
3) Test statistic: t = (b2 – 0)/se = 0.211/0.152 = 1.39
4) P-value: The P-value is 0.17.
5) Conclusion: If the null hypothesis were true, the probability would be 0.17 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis that β2 = 0 is correct, and that the number of times an athlete can perform a 200-pound leg press does not predict upper body strength (maximum number of pounds she could bench press), if we already know the number of times she can do a 60-pound bench press.
b) The 95% confidence interval for β2 is b2 ± t.025(se) = 0.2110 ± 2.005(0.1519), or (–0.1, 0.5). Based on this interval, LP200 seems to have a weak impact; 0 is in the confidence interval, indicating that it is plausible that there is no association between LP200 and BP when controlling for BP60.
c) When LP200 is included in the model, the P-value of the slope associated with BP60 is 0.000, very strong evidence that the slope of BP60 is not 0.
13.24 Leg press uncorrelated with strength?
The first test analyzes the effect of LP200 at any given fixed value of BP60, whereas the second test describes the overall effect of LP200 ignoring other variables. These are different effects, so one can exist when the other does not. In this case, it is likely that LP200 and BP60 are strongly associated with one another, and the effect of LP200 is weaker once we control for BP60.
13.25 Interpret strength variability
a) The residual standard deviation estimates the standard deviation of the distribution of maxBP at given values for BP60 and LP200. This standard deviation is assumed to be the same for any combination of BP60 and LP200 values and is estimated as 7.9. The sample standard deviation of maxBP of 13.3 shows how much maxBP values vary overall, over the entire range of BP60 and LP200 values, not over just a particular pair of values.
b) Approximately 95% of maxBP values fall within about 2s = 15.8 of the true regression equation.
c) The prediction interval is an inference about where the population maxBP values fall at fixed levels of the two explanatory variables. The prediction interval indicates where a response outcome has a 95% chance of falling.
d) It would be unusual because 100 is not in the prediction interval.
13.26 Any predictive power?
a) H0: β1 = β2 = 0; the null hypothesis states that neither of the two explanatory variables has an effect on the response variable y.
b) F = 3.16
c) The observed F statistic is 51.39 with a P-value of approximately 0.000. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. This P-value gives extremely strong evidence against the null hypothesis that β1 = β2 = 0. At common significance levels, such as 0.05, we can reject H0. At least one of the two explanatory variables has an effect on maxBP.
13.27 Predicting pizza revenue
a) μy = α + β1x1 + β2x2, where μy is the population mean for monthly revenue, x1 is a given level of TV advertising, and x2 is a given level of newspaper advertising.
b) H0: β1 = 0
c) H0: β1 = β2 = 0
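For the overall F test in Exercise 13.26 above, the critical value and P-value can be recovered with scipy rather than a table; the denominator df of 54 used here is an assumption based on the 57 athletes and two predictors from Exercise 13.23.

```python
from scipy import stats

df1, df2 = 2, 54                          # assumed: two predictors, n = 57 athletes
f_crit = stats.f.ppf(0.95, df1, df2)      # roughly the F = 3.16 quoted in Exercise 13.26(b)
p_value = stats.f.sf(51.39, df1, df2)     # observed F from Exercise 13.26(c); essentially 0
print(round(f_crit, 2), p_value)
```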
13.28 Regression for mental health
a) The 95% confidence interval for β1 is b1 ± t.025(se) = 0.1033 ± 2.0260(0.03250), or (0.04, 0.17).
b) The confidence interval gives plausible values for the slope, the amount that mean mental impairment will increase when the life events score increases by one, when controlling for SES. For an increase of 100 units in life events, we multiply each endpoint of the confidence interval by 100, giving (4, 17); this indicates that plausible increases in mean mental impairment range from 4 to 17 when life events scores increase by 100, controlling for SES.
13.29 Mental health again
a) The test statistic, F, is 9.49, and the P-value is approximately 0.000.
b) Ha: At least one parameter is not equal to 0.
c) The result in (a) indicates only that at least one of the variables is a statistically significant predictor of mental health impairment.
13.30 More predictors for selling price
a) H0: β1 = β2 = β3 = 0 means that house selling price is independent of size of home, number of bedrooms, and age.
b) The large F value and small P-value provide strong evidence that at least one of the three explanatory variables has an effect on selling price.
c) The results of the t tests tell us that both house size and number of bedrooms contribute to the prediction of selling price. At the 5% significance level, age does not contribute significantly when house size and number of bedrooms are included in the model.
13.31 House prices
a) H0: β1 = β2 = 0; Ha: At least one parameter is not equal to 0. The F statistic is 108.6, and the P-value is approximately 0.000. The P-value of 0.000 gives very strong evidence against the null hypothesis that β1 = β2 = 0. It is not surprising to get such a small P-value for this test because house size and bedrooms are both statistically significant predictors of selling price.
b) The t statistic is 2.85 with a P-value of 0.0025. If the null hypothesis were true, the probability would be 0.0025 of getting a test statistic at least as extreme as the value observed. The P-value gives very strong evidence against the null hypothesis that β2 = 0.
c) The 95% confidence interval for β2 is b2 ± t.025(se) = 15.170 ± 1.97(5.330), or (4.67, 25.67). The plausible values for the slope for number of bedrooms, when controlling for house size, range from about 4.7 to 25.7. This is more informative than the significance test because it not only tells us that the slope is likely different from 0, but it gives us a range of plausible values for the slope.
Section 13.4: Checking a Regression Model Using Residual Plots
13.32 Body weight residuals
a) These give us information about the conditional distribution.
b) The distribution of the residuals has a shape that is slightly right-skewed, although not too far from bell shaped. This suggests that the conditional distribution of y may be right-skewed, although not too far from normal.
13.33 Strength residuals
a) The values of BP60 play a role in determining the standardized residuals against which the LP200 values are plotted.
b) The residuals are closer to 0 at lower values of LP200.
c) Without the three points with standardized residuals around –2, the data do not appear as though there is more variability at higher levels of LP200. We should be cautious in looking at residual plots because one or two observations might prevent us from seeing the overall pattern.
13.34 More residuals for strength
One might think that this suggests less variability at low levels and even less at high levels of BP60, but this may merely reflect fewer points in those regions. Overall, it seems OK.
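Standardized residual plots like the ones discussed in these exercises can be produced with statsmodels; this is only a sketch, and the file name and column names (maxBP, BP60, LP200) are assumed placeholders for the strength-study data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("strength_study.csv")            # hypothetical file name
fit = smf.ols("maxBP ~ BP60 + LP200", data=df).fit()
std_resid = fit.get_influence().resid_studentized_internal  # standardized residuals

plt.scatter(df["LP200"], std_resid)               # residuals against one explanatory variable
plt.axhline(0, linestyle="--")
plt.xlabel("LP200")
plt.ylabel("Standardized residual")
plt.show()
```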
13.35 Nonlinear effects of age
a) Example of a possible scatterplot: [Scatterplot of sleep vs. age]
b) Example of a plot of standardized residuals against the values of age: [Residuals versus age (response is sleep)]
13.36 Driving accidents
a) Example of a possible scatterplot: [Scatterplot of accident rate vs. age]
b) Example of a plot of standardized residuals against the values of age: [Residuals versus age (response is accident rate)]
13.37 Why inspect residuals? The purpose of performing residual analysis is to determine if assumptions are met for tests. We cannot construct a single plot of the data for all the variables at once because a plot of all of the variables at once would require many dimensions. 13.38 College athletes a) The bottom left and bottom middle plots give 1RM as the response variable. They both show strong, positive associations with 1RM. b) ŷ = 55.01 + 0.1668LBM + 1.658REPS70; 1.66 is the amount that predicted maximum bench press changes for a one unit increase in number of repetitions, controlling for lean body mass.
c) R² = 0.832; using these variables together to predict 1RM reduces the prediction error by 83%, relative to using ȳ alone to predict BP.
d) The multiple correlation is 0.91. There is a strong association between the observed and predicted 1RMs.
e) F = 7641.5/50.7 = 150.75; the P-value is approximately 0.000. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that BP is not independent of these two predictors.
f) 1) Assumptions: We assume a random sample and that the model holds (each explanatory variable has a straight-line relation with the mean of y, with the same slope for all combinations of values of the other predictors in the model, and there is a normal distribution for y with the same standard deviation at each combination of values of the other predictors in the model). Here, the 64 athletes were a convenience sample, not a random sample, so inferences are tentative.
2) Hypotheses: H0: β1 = 0; Ha: β1 ≠ 0
3) Test statistic: t = 2.22
4) P-value: The P-value is 0.030.
5) Conclusion: If the null hypothesis were true, the probability would be 0.03 of getting a test statistic at least as extreme as the value observed. The P-value gives relatively strong evidence against the null hypothesis that β1 = 0.
g) The histogram suggests that the residuals are roughly bell-shaped about 0. They fall between about –3 and +3. The shape suggests that the conditional distribution of the response variable is roughly normal.
h) The plot of residuals against values of REPS70 describes the degree to which the response variable is linearly related to this particular explanatory variable. It suggests that the residuals are less variable at smaller values of REPS70 than at larger values of REPS70.
13.38 (continued)
i) The individual with REPS70 around 32 and standardized residual around –3 had a 1RM value considerably lower than predicted.
13.39 House prices
a) The histogram checks the assumption that the conditional distribution of y is normal, at any fixed values of the explanatory variables. Although the distribution appears mostly normal, there is an outlier with a very large standardized residual greater than 6.0 that should be considered. [Histogram of standardized residuals (response is House Price (USD))]
b) This plot checks the assumption that the regression equation approximates well the true relationship between the predictors and the response. The same large standardized residual (greater than 6) is evident, but there does not appear to be any discernable pattern in the residuals. [Plot of standardized residuals versus Lot Size (response is House Price (USD))]
13.40 Selling prices level off
For large values of lot size, the residuals would be negative and would have a decreasing trend. The assumption of a straight-line relationship between price and lot size, for a given number of bedrooms, is violated. [Plot of standardized residuals versus lot size (response is house price)]
Section 13.5: Regression and Categorical Predictors
13.41 U.S. and foreign used cars
a) U.S. cars: ŷ = 20,493 – 1185x1 – 2379(1) = 18,114 – 1185x1
Foreign cars: ŷ = 20,493 – 1185x1 – 2379(0) = 20,493 – 1185x1
b) Both prediction equations in (a) have the same slope of –1185. For a one-year increase in the age of a car, we predict that the price drops by $1,185. Since the slope is the same, this applies for both types of cars.
c) Using the equations from (a): (i) U.S. cars: ŷ = 18,114 – 1185(8) = 8,634 (ii) Foreign cars: ŷ = 20,493 – 1185(8) = 11,013
The difference between them is 8,634 – 11,013 = –2,379, the coefficient for the indicator variable for type.
13.42 Mountain bike prices
a) For each one-pound increase in weight, the predicted price decreases by $53.75 for the same type of suspension.
b) For a given weight, the predicted price for a front-end suspension bike is $643.60 less than for a full suspension bike.
13.43 Predict using house size and condition
a) ŷ = 96.3 + 0.0665House_Size + 12.9Condition
Good condition: ŷ = 96.3 + 0.0665House_Size + 12.9(1) = 109.2 + 0.0665House_Size
Not good condition: ŷ = 96.3 + 0.0665House_Size + 12.9(0) = 96.3 + 0.0665House_Size
b) [Scatterplot of HP in thousands vs. House Size, with separate symbols for Condition = 0 and 1]
c) The difference in predicted selling price between homes in good and not good condition, controlling for house size, is the slope for condition, 12.9.
13.44 Quality and productivity
a) (i) Minimum: ŷ = 61.3 + 0.35(12) = 65.5 (ii) Maximum: ŷ = 61.3 + 0.35(54) = 80.2
b) When controlling for region, an increase of one hour leads to a decrease in predicted defects of 0.78 per 100 cars. Japanese facilities had 36 fewer predicted defects, on average, than did other facilities.
c) Simpson’s paradox has occurred because the direction of the association between time and defects reversed when the variable of whether the facility is Japanese was added.
d) Simpson’s paradox occurred because, overall, Japanese facilities have fewer defects and take less time, whereas other facilities have more defects and take more time. When the data are looked at together, this leads to an overall positive association between defects and time. [Scatterplot of Defects vs. Time, with separate symbols for Japanese and other facilities]
13.45 Predicting hamburger sales
a) x1 = 1 if inner city, 0 if other; x2 = 1 if suburbia, 0 if other
b) Suburbia: ŷ = 5.8 + 0.7(0) + 1.2(1) = 7.0; interstate exits: ŷ = 5.8 + 0.7(0) + 1.2(0) = 5.8. Interstate exit restaurants have predicted sales of $5.80 – $7.00 = –$1.20, that is, $1.20 less than suburban restaurants.
13.46 Houses, size and garage
a) From technology: HP in thousands = 64.7 + 0.0674House_Size + 40.3Garage
For homes with a garage, HP in thousands = 104.96 + 0.0674House_Size. For homes without a garage, HP in thousands = 64.66 + 0.0674House_Size.
b) The coefficient, 40.3, indicates that the predicted selling price for houses with a garage is $40,300 higher than for houses without a garage.
13.47 House size and garage interact?
a) The assumption of no interaction means that we are assuming that the slope for house size is the same for houses with and without a garage.
b) [Scatterplot of HP in thousands vs. House Size, with separate symbols for Garage = 0 and 1]
13.48 Equal slopes for car prices?
a) Interaction
b) The prediction equations do not have the same slope. For a one-year increase in the age of a used U.S. car, we predict that the price drops by $1,715. For a one-year increase in the age of a foreign car, we predict that the price drops by only $557. The effect of an increase in age differs for the two types of cars, so they need to be treated separately.
c) U.S. cars: ŷ = 23,417 – 1715x1 = 23,417 – 1715(8) = $9,697
Foreign cars: ŷ = 15,536 – 557x1 = 15,536 – 557(8) = $11,080
13.49 Comparing sales
a) For one equation, we would have two explanatory variables, one for number of people and one for location (campus = 0, and mall = 1).
b) For separate equations, we would have a model with one explanatory variable, number of people, for the campus location, and another model with one explanatory variable, number of people, for the mall location.
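The two separate prediction equations in Exercise 13.48(c) are easy to encode; the sketch below (illustrative only, function name assumed) reproduces the two predicted prices at age 8.

```python
def predicted_price(age, us_car):
    # Separate equations from Exercise 13.48 (slopes differ, reflecting interaction)
    if us_car:
        return 23_417 - 1715 * age
    return 15_536 - 557 * age

print(predicted_price(8, True))    # 9697 -> $9,697 for a U.S. car
print(predicted_price(8, False))   # 11080 -> $11,080 for a foreign car
```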
Section 13.6: Modeling a Categorical Response
13.50 Income and credit cards
p̂ = e^(–3.52 + 0.105x)/[1 + e^(–3.52 + 0.105x)] = e^(–3.52 + 0.105(25))/[1 + e^(–3.52 + 0.105(25))] = e^(–0.895)/(1 + e^(–0.895)) = 0.41/1.41 = 0.29
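The logistic calculation above takes only a few lines in any language; this Python sketch (function name illustrative) reproduces the 0.29 estimate at x = 25.

```python
import math

def logistic_prob(alpha, beta, x):
    # Estimated probability e^(alpha + beta*x) / [1 + e^(alpha + beta*x)]
    z = alpha + beta * x
    return math.exp(z) / (1 + math.exp(z))

print(round(logistic_prob(-3.52, 0.105, 25), 2))   # 0.29, as in Exercise 13.50
```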
13.51 Hall of Fame induction
a) For 359 runs: p̂ = e^(–6.7 + 0.0175(359))/[1 + e^(–6.7 + 0.0175(359))] = 0.6587/1.6587 = 0.397
For 369 runs: p̂ = e^(–6.7 + 0.0175(369))/[1 + e^(–6.7 + 0.0175(369))] = 0.7847/1.7847 = 0.440
b) For 465 runs: p̂ = e^(–6.7 + 0.0175(465))/[1 + e^(–6.7 + 0.0175(465))] = 4.2102/5.2102 = 0.808
For 475 runs: p̂ = e^(–6.7 + 0.0175(475))/[1 + e^(–6.7 + 0.0175(475))] = 5.0153/6.0153 = 0.834
13.52 Horseshoe crabs
a) p̂ = e^(–3.695 + 1.815x)/[1 + e^(–3.695 + 1.815x)]
Q1: p̂ = e^(–3.695 + 1.815(2.00))/[1 + e^(–3.695 + 1.815(2.00))] = 0.937/1.937 = 0.48
Q3: p̂ = e^(–3.695 + 1.815(2.85))/[1 + e^(–3.695 + 1.815(2.85))] = 4.383/5.383 = 0.81
b) 0.81 – 0.48 = 0.33. The probability increases by 0.33 over the middle half of the sampled weights.
13.53 More crabs
a) x = –α̂/β̂ = 3.695/1.815 = 2.04
b) For crabs over 2.04, the estimated probability is greater than 0.50.
c) For crabs under 2.04, the estimated probability is less than 0.50.
13.54 Voting and income
a) p̂ = e^(–1.00 + 0.02(10))/[1 + e^(–1.00 + 0.02(10))] = 0.4493/1.4493 = 0.31
b) p̂ = e^(–1.00 + 0.02(100))/[1 + e^(–1.00 + 0.02(100))] = 2.718/3.718 = 0.73
The predicted probability of voting Republican increases as income increases.
13.55 Equally popular candidates
a) x = –α̂/β̂ = 1.00/0.02 = 50, or $50,000
b) (i) Above $50,000, the estimated probability of voting for the Republican candidate is greater than 0.50. (ii) Below $50,000, the estimated probability of voting for the Republican candidate is less than 0.50.
c) When x is close to the value at which p = 0.50, the approximate change in the predicted probability p for a one-unit increase in x, $1,000 in this case, is β/4 = 0.02/4 = 0.005.
13.56 Many predictors of voting
a) As family income increases, people are, on average, more likely to vote Republican. As number of years of education increases, people are, on average, more likely to vote Republican. Men are more likely, on average, to vote Republican than are women.
b) (i) p̂ = e^(–2.40 + 0.02x1 + 0.08x2 + 0.20x3)/[1 + e^(–2.40 + 0.02x1 + 0.08x2 + 0.20x3)] = e^(–2.40 + 0.02(40) + 0.08(16) + 0.20(1))/[1 + e^(–2.40 + 0.02(40) + 0.08(16) + 0.20(1))] = e^(–0.12)/(1 + e^(–0.12)) = 0.8869/1.8869 = 0.47
(ii) p̂ = e^(–2.40 + 0.02(40) + 0.08(16) + 0.20(0))/[1 + e^(–2.40 + 0.02(40) + 0.08(16) + 0.20(0))] = e^(–0.32)/(1 + e^(–0.32)) = 0.7261/1.7261 = 0.42
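The two probabilities in Exercise 13.56(b) follow from plugging the given values into the logistic formula; a short sketch (variable roles assumed: x1 = income, x2 = years of education, x3 = gender indicator) is below.

```python
import math

def logistic_prob(z):
    return math.exp(z) / (1 + math.exp(z))

z_case_i = -2.40 + 0.02 * 40 + 0.08 * 16 + 0.20 * 1    # x3 = 1
z_case_ii = -2.40 + 0.02 * 40 + 0.08 * 16 + 0.20 * 0   # x3 = 0
print(round(logistic_prob(z_case_i), 2))    # 0.47
print(round(logistic_prob(z_case_ii), 2))   # 0.42
```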
13.57 Graduation, gender and race
a) The response variable is whether or not the student graduated (yes or no).
b)
                          Graduated
Race     Gender      Yes       No        Total
White    Female      10,781    20,468    31,249
         Male        10,727    28,856    39,583
Black    Female       2,309    10,885    13,194
         Male         2,054    15,653    17,707
c) Based on these estimates, white women have the highest estimated probability of graduating. The coefficient for race is positive, indicating that 1 (white) would lead to a higher estimated probability of graduating than would 0 (black). Similarly, the coefficient for gender is positive, indicating that 1 (female) would lead to a higher estimated probability of graduating than would 0 (male).
13.58 Death penalty and race
a) Controlling for victim’s race, the proportion of black defendants who received the death penalty was 15/191 = 0.079, and the proportion of white defendants who received the death penalty was 53/483 = 0.110, so the death penalty was more likely for black defendants.
b) According to this equation, the death penalty is predicted to be most likely for black defendants who had white victims. We know this because the coefficient for defendant race is negative; therefore 0 (black) would lead to a higher predicted death penalty proportion than would 1 (white). Also, the coefficient for victim’s race is positive; therefore, 1 (white) would lead to a higher predicted death penalty proportion than would 0 (black).
13.59 Death penalty probabilities
a) Black defendant, white victim: p̂ = e^(–3.596 – 0.868(0) + 2.404(1))/[1 + e^(–3.596 – 0.868(0) + 2.404(1))] = 0.304/1.304 = 0.233
b) Estimated probability of receiving the death penalty:
                      Defendant’s Race
Victim’s Race         White      Black
White                 0.113      0.233
Black                 0.011      0.027
Defendant’s race has the same effect on the predicted probability of receiving the death penalty for both white and black victims. In both cases, black defendants are predicted to be more likely to receive the death penalty than are white defendants.
c)
                      Death Penalty
Defendant’s Race      Yes     No      Percent Yes
White                 53      430     11.0
Black                 15      176      7.9
This is an example of Simpson’s paradox because the direction of association changes when we ignore a third variable. When ignoring victim’s race, the predicted proportion of whites receiving the death penalty, rather than blacks, is now the higher of the two. This occurs because there are more white defendants with white victims than any other group.
Chapter Problems: Practicing the Basics
13.60 House prices
a) [Scatterplot matrix of House Price (USD), House Size, Bedrooms, and T Bath]
The plots that pertain to selling price as a response variable are those across the top row. The highly discrete nature of x2 and x3 limits the number of values these variables can take on. This is reflected in the plots, particularly the plot for bedrooms by baths.
b) From technology: House Price = 39,001 + 53.2House_Size – 7,885Bedrooms + 57,796Bath
When number of bedrooms and number of bathrooms are fixed, an increase of one square foot in house size leads to an increase of $53.2 in predicted selling price.
c) R² = (1.60874 × 10¹²)/(2.66887 × 10¹²) = 0.603; this indicates that predictions are 60% better when using the prediction equation instead of using the sample mean ȳ to predict y.
d) The multiple correlation, 0.78, is the square root of R². It is the correlation between the observed y-values and the predicted ŷ-values.
e) (i) Assumptions: multiple regression equation holds, data gathered randomly, normal distribution for y with the same standard deviation at each combination of predictors. (ii) Hypotheses: H0: β1 = β2 = β3 = 0; Ha: At least one parameter differs from 0. (iii) Test statistic: F = (5.36245 × 10¹¹)/5,408,848,557 = 99.14 (iv) P-value: The P-value is approximately 0 for df(3, 196). (v) Conclusion: If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that at least one explanatory variable has an effect on y.
f) The t statistic is –1.29 with a one-sided P-value of 0.2/2 = 0.1. If the null hypothesis were true, the probability would be 0.1 of getting a test statistic at least as extreme as the value observed. At a significance level of 0.05, we cannot reject the null. It is plausible that the number of bedrooms does not have an effect on selling price. This is likely not significant because it is correlated with the other explanatory variables in this model. It might be associated with selling price on its own, but might not provide additional predictive information over and above the other explanatory variables.
60
Frequency
50 40 30 20 10 0
-3.0
-1.5
0.0 1.5 3.0 Standardized Residual
4.5
6.0
This histogram describes the shape of the conditional distribution of y at given values of the explanatory variables. The distribution appears fairly close to normal with two large outliers (greater than 3 in absolute value). h) Residuals Versus House Size (response is House Price (USD))
Standardized Residual
5.0
2.5
0.0
-2.5
-5.0 0
2000
4000
6000 House Size
8000
10000
12000
This plot depicts the size of the residuals for the different house sizes observed in this sample. It indicates possibly greater residual variability (and hence, greater variability in selling price) as house size increases.
13.61 Predicting body strength
a) ŷ = 60.6 + 1.33BP_60 + 0.21LP_200 = 60.6 + 1.33(10) + 0.21(20) = 78.1; the residual is y – ŷ = 85 – 78.1 = 6.9.
b) For athletes who have LP_200 = 20, the equation is ŷ = 60.6 + 1.33BP_60 + 0.21(20) = 64.8 + 1.33BP_60; the slope of 1.33 indicates that, when controlling for LP_200, an increase of one in BP_60 leads to an increase of 1.33 in predicted BP.
c) The small difference indicates that the additional variable does not add much predictive ability.
13.62 Softball data
a) From technology, the prediction equation is: Difference = –5.00 + 0.934Hits – 1.61Errors; the slopes indicate that for each increase of one hit, the predicted difference increases by 0.93, and for each increase of one error, the predicted difference decreases by 1.61.
b) When errors = 0, the prediction equation is ŷ = –5.00 + 0.934Hits; for a predicted difference of 0, we would need 5.35 hits (0 = –5.00 + 0.934Hits, so Hits = 5.00/0.934 = 5.35). Thus, the team would need six or more hits so that the predicted difference is positive (if they can play error-free ball).
13.63 Violent crime
a) ŷ = –270.7 + 28.334x1 + 5.416x2 = –270.7 + 28.334(10.2) + 5.416(92.1) = 517.1; the residual is 476 – 517.1 = –41.1. The violent crime rate for Massachusetts is 41.1 lower than predicted from this model.
b) (i) ŷ = –270.7 + 28.334x1 + 5.416(0) = –270.7 + 28.334x1
(ii) ŷ = –270.7 + 28.334x1 + 5.416(100) = 270.9 + 28.334x1
As percent living in urban areas increases from 0 to 100, the intercept of the regression equation increases from –270.7 to 270.9. When the second explanatory variable, poverty rate, is held constant, the increase in percent living in urban areas from 0 to 100 would result in an increase in predicted violent crime rate of 541.6.
13.64 Effect of poverty on crime
The slope with x3 in the model represents the effect of poverty when controlling for percentage of single-parent families, as well as percent living in urban areas. The slope without x3 in the model represents the effect of poverty when controlling only for percent living in urban areas.
13.65 Modeling fertility
a) –0.661 b) 0.443 c) 279,160 d) 155,577 e) 44.50 f) 33.4554 g) –5.21 h) 0.000
13.66 Significant fertility prediction?
a) F = 61,791/1,119 = 55.21; the P-value is 0.000. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that at least one of these explanatory variables predicts y better than the sample mean does.
b) The significance test would not be relevant if we were not interested in nations beyond those in the study; if this were the case, the group of nations that we studied would be a population and not a sample.
13.67 GDP, CO2, and Internet
a) Predicted GDP = 39,892 + 798CO2 – 399NoInternet
b) For a given percentage of the population not using the Internet, we predict GDP to increase by $798,000 for every metric ton increase in CO2 emissions. Controlling for CO2 emissions, we predict GDP to decrease by $399,000 for every percentage-point increase in the percentage of the population not using the Internet.
13.68 Education and gender in modeling income
Since the effect of education on income changes depending on gender, the explanatory variables, education and gender, are said to interact.
13.69 Horseshoe crabs and width
a) Q1: p̂ = e^(–12.351 + 0.497(24.9))/[1 + e^(–12.351 + 0.497(24.9))] = 1.0246/2.0246 = 0.51
Q3: p̂ = e^(–12.351 + 0.497(27.7))/[1 + e^(–12.351 + 0.497(27.7))] = 4.1202/5.1202 = 0.81
Across the middle half of crabs with respect to width, the estimated probability of having a male partner increases by 0.30, from 0.51 to 0.81.
b) (i) x = –α̂/β̂ = 12.351/0.497 = 24.9
(ii) Above a width of 24.9, the estimated probability of having a male partner nearby is greater than 0.50.
(iii) Below a width of 24.9, the estimated probability of having a male partner nearby is less than 0.50.
13.70 AIDS and AZT
a) The negative sign indicates that if an individual used AZT, the predicted probability of developing AIDS symptoms was lower.
b) Black/yes: p̂ = e^(–1.074 – 0.720(1) + 0.056(0))/[1 + e^(–1.074 – 0.720(1) + 0.056(0))] = 0.1663/1.1663 = 0.14
Black/no: p̂ = e^(–1.074 – 0.720(0) + 0.056(0))/[1 + e^(–1.074 – 0.720(0) + 0.056(0))] = 0.3416/1.3416 = 0.26
c) 1) Assumptions: The data were generated randomly. The response variable is binary.
2) Hypotheses: H0: β1 = 0; Ha: β1 ≠ 0
3) Test statistic: z = (b1 – 0)/se = –0.720/0.279 = –2.58
4) P-value: 0.010
5) Conclusion: If the null hypothesis were true, the probability would be 0.01 of getting a test statistic at least as extreme as the value observed. We have very strong evidence against the null hypothesis that β1 = 0. At a significance level of 0.05, we can reject H0.
13.71 Factors affecting first home purchase
a) We know that, other things being fixed, the predicted probability of home ownership increases with husband’s earnings, wife’s earnings, number of children, and home ownership because the coefficients are positive for all of these explanatory variables.
b) We know that the number of years married, given the other variables in the model, shows little evidence of an effect because the estimate divided by the standard error is small (equals –0.93).
Chapter Problems: Concepts and Investigations
13.72 Student data
The reports will be different for each student, but could include the following, along with associated graphs. From Minitab:

The regression equation is
college_GPA = 2.83 + 0.203 high_sch_GPA - 0.0092 sports

Predictor       Coef       SE Coef    T       P
Constant        2.8293     0.3385     8.36    0.000
high_sch_GPA    0.20309    0.09753    2.08    0.042
sports         -0.00922    0.01161   -0.79    0.430

S = 0.341590   R-Sq = 8.8%   R-Sq(adj) = 5.6%

Analysis of Variance
Source          DF    SS       MS       F      P
Regression       2    0.6384   0.3192   2.74   0.073
Residual Error  57    6.6510   0.1167
Total           59    7.2893

Unusual Observations
Obs  high_sch_GPA  college_GPA  Fit     SE Fit  Residual  St Resid
 11  2.30          2.6000       3.1580  0.1476  -0.5580   -1.81 X
 42  2.00          3.0000       3.1616  0.1351  -0.1616   -0.52 X
 50  3.00          4.0000       3.3002  0.1224   0.6998    2.19 R
 60  3.40          3.0000       3.3722  0.1345  -0.3722   -1.19 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
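The same fit can be reproduced with statsmodels rather than Minitab; this is a sketch assuming the class data file uses the column names shown in the output above (the file name is hypothetical).

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("class_data.csv")   # hypothetical file name
fit = smf.ols("college_GPA ~ high_sch_GPA + sports", data=df).fit()
print(fit.summary())                 # coefficients, t statistics, overall F test
print(fit.rsquared, fit.rsquared_adj)
```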
13.73 Why regression?
The explanations will be different for each student, but should indicate that multiple regression uses more than one characteristic of a subject to predict some outcome.
13.74 Unemployment and GDP
a) The positive sign indicates that GDP is predicted to increase as the unemployment rate increases. One would think that GDP would decrease for increasing unemployment rate.
b) The P-value of 0.86 indicates that the coefficient is not significantly different from 0. However, this doesn’t automatically imply that it has no effect. It only means it has no effect when CO2 and NoInternet are already included in the model.
c) R²
d) This means that the prediction error is essentially the same whether unemployment is in the model or not. Therefore, unemployment does not help in predicting GDP when CO2 and NoInternet are in the model.
13.75 Multiple choice: Interpret parameter The best answer is (a).
13.76 Multiple choice: Interpret indicator The best answer is (d).
13.77 Multiple choice: Regression effects The best answer is (a).
13.78 True or false: R and R2
a) True b) False, it falls between 0 and 1. c) False, R² describes how well you can predict y using a set of explanatory variables together in a multiple regression model. d) True
13.79 True or false: Regression
a) True b) True c) False, R² cannot exceed 1. d) False, the predicted values ŷ cannot correlate negatively with y. Otherwise, the predictions would be worse than merely using ȳ to predict y.
13.80 True or false: Slopes
a) True, the slope for this variable is positive in the bivariate regression equation.
b) False, a one-unit increase in x1 corresponds to a change of 0.45 in the predicted value of y, only when we ignore x2.
c) True, the slope for x2 is 0.003 when controlling for x1; 0.003 multiplied by 100 is 0.30.
13.81 Scores for religion Indicator variables for a particular explanatory variable must be binary. A variable would equal 1 if an observation fell in that category, and 0 if it did not. Here, we’d have to have three variables: one for Protestant (1 = Protestant, 0 = other), one for Catholic (1 = Catholic, 0 = other), and one for Jewish (1 = Jewish, 0 = other). We would not need one for the “other” category because we would know that someone was in that category if he/she were not in the other three. Using numerical scores would treat the religion as quantitative with equidistant categories, which is not appropriate. 13.82 Lurking variable
y = math achievement score, x1 = height, x2 = age, for a sample of children from all the different grades in a school system.
13.83 Properties of R2
R² = 1 only when all residuals are 0, because when all regression predictions are perfect (each y = ŷ), residual SS = Σ(y – ŷ)² = 0. When residual SS = 0, R² is total SS divided by total SS, which must be 1. On the other hand, R² = 0 when each ŷ = ȳ. In that case, the estimated slopes all equal 0, and the correlation between y and each explanatory variable equals 0. Under these circumstances, the residual SS would equal the total SS, and R² would then be 0. In practical terms, it means that R² is only 1 when the regression model predicts y perfectly, and it is only 0 when it doesn’t predict y at all.
13.84 Why an F test?
When doing multiple significance tests, one may be significant merely by random variation. When there are many explanatory variables, doing the F test first provides protection from doing lots of t tests and having one of them be significant merely by random variation when, in fact, there truly are no effects in the population.
13.85 Multicollinearity
a) From Minitab:
Source          DF    SS       MS      F      P
Regression       2    1200.2   600.1   3.74   0.030
Residual Error  54    8674.4   160.6
Total           56    9874.6
The P-value of 0.030 is less than 0.05.
b) From Minitab:
Predictor    Coef      SE Coef    T      P
Constant     54.019    9.677      5.58   0.000
WT (lbs)     0.1251    0.1378     0.91   0.368
BF%          0.3315    0.6950     0.48   0.635
The P-values of 0.368 and 0.635 are both larger than 0.05.
13.86 Logistic versus linear
When x = 0, a little extra income is not going to make a difference; one likely can’t afford a home no matter what. Similarly, when x = 50,000, a little extra income won’t make much difference; one likely can afford a home no matter what. Only in the middle, when x = 500, is the extra income likely to “push” someone over the income level at which he or she can afford a home. In such a case, a linear regression model would not be appropriate, although a logistic regression model would.
♦♦13.87 Adjusted R2
n = 10: Adjusted R² = R² – [2/(10 – (2 + 1))](1 – R²) = 0.500 – (2/7)(0.500) = 0.500 – 0.143 = 0.357
n = 100: Adjusted R² = 0.500 – [2/(100 – (2 + 1))](1 – 0.500) = 0.500 – 0.010 = 0.490
n = 1000: Adjusted R² = 0.500 – [2/(1000 – (2 + 1))](1 – 0.500) = 0.500 – 0.001 = 0.499
As the sample size increases, adjusted R² approaches R².
♦♦13.88 R can’t go down
When you add a predictor, if it has no effect its coefficient is 0. Then the prediction equation is exactly the same as with the simpler model without that variable, and R will be exactly the same as before. If having a nonzero coefficient results in better predictions overall, then R will increase.
♦♦13.89 Indicator for comparing two groups
If we wanted to compare two groups on a given variable, we could use regression analysis. The response variable y would be the same. The explanatory variable would be the two levels of the groups; one would be assigned 0 and one would be assigned 1. μ1 = μ2 would correspond to β = 0.
♦♦13.90 Simpson’s paradox
[Scatterplot of death rate vs. age, with separate symbols for state = FL and LA]
♦♦13.91 Parabolic regression
a) This is a multiple regression model with two variables. If x, for example, is 5, we multiply 5 by β1 and its square, 25, by β2.
b) [Scatterplots of 10 + 2x + 0.5x² vs. x and of 10 + 2x – 0.5x² vs. x]
♦♦13.92 Logistic slope
a) When p = 0.5, p(1 − p) = 0.5(1 − 0.5) = 0.25; 0.25 multiplied by β is the same as β/4.
b) 0.1(1 − 0.1) = 0.09, 0.3(1 − 0.3) = 0.21, 0.7(1 − 0.7) = 0.21, and 0.9(1 − 0.9) = 0.09; as p gets closer and closer to 0 or 1, the slope approaches 0.
[Plot of p(1 − p) versus p, largest at p = 0.5]
♦♦13.93 When is p = 0.50? At x = −α/β,
p = e^(α + β(−α/β)) / [1 + e^(α + β(−α/β))] = e^0/(1 + e^0) = 1/(1 + 1) = 1/2 = 0.5
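A minimal Python sketch of the calculations in 13.92 and 13.93 follows; the values of alpha and beta below are made-up illustrations, not estimates from any data set in the text.

import numpy as np

beta = 1.0    # hypothetical logistic slope parameter
alpha = -3.0  # hypothetical intercept

# 13.92: the rate of change of p at a given p is beta * p * (1 - p)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, beta * p * (1 - p))    # largest at p = 0.5 (beta/4), near 0 for p near 0 or 1

# 13.93: p = 0.5 where alpha + beta * x = 0, i.e., x = -alpha / beta
x_half = -alpha / beta
p_half = np.exp(alpha + beta * x_half) / (1 + np.exp(alpha + beta * x_half))
print(x_half, p_half)               # prints 3.0 and 0.5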
Chapter Problems: Student Activities
13.94 Class data The responses will be different for each class.
Chapter 14: Comparing Groups: Analysis of Variance Methods
Section 14.1: One-Way ANOVA: Comparing Several Means
14.1 Hotel satisfaction
a) The response variable is the performance gap, the factor is which hotel the guest stayed in, and the categories are the five hotels.
b) H0: μ1 = μ2 = μ3 = μ4 = μ5; Ha: at least two of the population means are unequal.
c) df1 = 4 because there are five groups and df1 = g − 1; df2 = 120 because there are 125 people in the study and five groups, and df2 = N − g.
d) From a table or technology: F values of 2.45 and higher.
14.2 Satisfaction with banking
a) H0: μ1 = μ2 = μ3; Ha: At least two of the population means are unequal. μ1 denotes the population mean level of satisfaction for Group 1, those who interact with a teller at the bank the most; μ2 denotes the population mean level of satisfaction for Group 2, those who use ATMs the most; and μ3 denotes the population mean level of satisfaction for Group 3, those who use the bank's Internet banking service the most.
b) df1 = g − 1 = 3 − 1 = 2; df2 = N − g = 400 − 3 = 397; From technology: F values of 3.02 and higher.
c) If the null hypothesis were true, the probability would be 0.63 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct. We cannot conclude that at least two of the population means are unequal.
d) The assumptions are (1) that the population distributions of the response variable for the g groups are normal, (2) those distributions have the same standard deviation σ, and (3) the data resulted from randomization. The third assumption is the most important.
14.3 What's the best way to learn French?
a) (i) Assumptions: Independent random samples, normal population distributions with equal standard deviations (ii) Hypotheses: H0: μ1 = μ2 = μ3; Ha: At least two of the population means are unequal.
(iii) Test statistic: F = 2.50 (df1 = 2, df2 = 5) (iv) P-value: 0.18 (v) Conclusion: If the null hypothesis were true, the probability would be 0.18 of getting a test statistic at least as extreme as the value observed. There is not much evidence against the null. It is plausible that the null hypothesis is correct and that there is no difference among the population mean quiz scores of the three types of students.
b) The P-value is likely not small because of the small sample sizes. Because the numerator of the between-groups variance estimate involves multiplying by the sample size n for each group, a smaller n leads to a smaller overall between-groups estimate of variance and a smaller test statistic value.
c) This was an observational study because students were not assigned randomly to groups. There could have been a lurking variable, such as school GPA, that was associated with students' group membership, but also with the response variable. Perhaps higher GPA students are more likely to have previously studied a language and higher GPA students also tend to do better on quizzes than other students.
14.4 What affects the F value?
a) The F test statistic would be smaller because the between-groups estimate of the variance would be smaller, while the within-groups estimate would stay the same.
b) The F statistic would be larger because the within-groups estimate of variance would be smaller.
c) The F statistic would be larger because the numerator of the between-groups variance estimate would be larger; this occurs because part of the calculation of the numerator of the between-groups estimate of variance involves multiplying by n.
d) The P-value in (a) would be larger because the F statistic is smaller, whereas the P-values in (b) and (c) would be smaller because the F statistics are larger. A larger F statistic corresponds with a smaller P-value.
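The critical F values quoted in 14.1(d) and 14.2(b) can also be found with software rather than a table; a minimal Python sketch using SciPy is:

from scipy import stats

# Right-tail critical value for a 0.05 significance level: P(F >= f*) = 0.05
f_hotel = stats.f.ppf(0.95, dfn=4, dfd=120)   # 14.1: df1 = 4, df2 = 120
f_bank = stats.f.ppf(0.95, dfn=2, dfd=397)    # 14.2: df1 = 2, df2 = 397
print(round(f_hotel, 2), round(f_bank, 2))    # approximately 2.45 and 3.02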
14.5 Outsourcing
a) H0: μ1 = μ2 = μ3; μ1 represents the population mean satisfaction rating for San Jose, μ2 for Toronto, and μ3 for Bangalore.
b) F = 13.00/0.47 = 27.6, calculated by dividing the between-groups variance estimate, 13.00, by the within-groups variance estimate, 0.47. The degrees of freedom are: df1 = 2 and df2 = 297.
c) If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that customer satisfaction ratings are different for at least two of the populations. With a 0.05 significance level, we would reject the null hypothesis.
14.6 ANOVA and box plots
a) Study 2 will more likely lead to a rejection of the null hypothesis. Judging from the box plots, the sample medians for the three groups seem to be fairly equal in Study 1 and rather different in Study 2. Since the distributions look symmetric, this implies that sample means in Study 2 are rather different and more likely to lead to rejection of the null hypothesis.
b) The variability within each group seems to be the same for both Study 1 and Study 2, whereas the variability between the group medians is much larger for Study 2. Since the distributions are fairly symmetric, this implies larger variability between group means than variability within a group for Study 2, leading to a larger F test statistic.
c) No, the small P-value only implies that at least two population means are different, but not necessarily all three.
14.7 How many kids to have?
a) μ1 represents the population mean ideal number of kids for Protestants; μ2 represents the population mean for Catholics; μ3 for Jewish people; μ4 for those of another religion; H0: μ1 = μ2 = μ3 = μ4.
b) The assumptions are that there are independent random samples, and normal population distributions with equal standard deviations. c) The F statistic is 5.48, and the P-value is 0.001. If the null hypothesis were true, the probability would be 0.001 of getting a test statistic at least as extreme as the value observed. We have strong evidence that a difference exists between at least two of the population means for ideal numbers of kids. d) We cannot conclude that every pair of religious affiliations has different population means. ANOVA tests only whether at least two population means are different. 14.8 Smoking and personality a) An F statistic of 3.00 is needed to get P-value = 0.05 in an ANOVA with 2 and 1635 degrees of freedom. The F statistic for the extraversion scale, 0.24, is not larger than the F statistic required to reject the null hypothesis. Therefore, we must fail to reject the null hypothesis. b) This does not mean that the population means are necessarily equal. It is possible that this is a Type II error. Confidence intervals would show plausible differences other than 0.
14.9 French cuisine
a) [Box plots of ratings for French restaurants in London, New York City, and Paris]
From Minitab:
City        N    Mean      StDev
London      8    20.375    2.774
New York    8    21.875    1.885
Paris       8    23.375    2.825
b) Hypotheses: H0: μ1 = μ2 = μ3; Ha: At least two of the population means are unequal. From technology, F = 2.8 and the P-value is 0.083. If the null hypothesis were true, the probability would be 0.083 of getting a test statistic at least as extreme as the value observed. At a significance level of 0.05, we do not have sufficient evidence to reject H0. It is plausible that the population mean ratings of French restaurants in New York, London, and Paris are equal.
14.10 Software and French ANOVA
a) From Minitab:
Variable     group    N    Mean    StDev
quiz score   1        3    6.00    2.00
             2        2    3.00    2.83
             3        3    8.00    2.65
b) From Minitab:
Source    DF    SS       MS      F      P
group     2     30.00    15.00   2.50   0.177
Error     5     30.00    6.00
Total     7     60.00
c) If the null hypothesis were true, the probability would be 0.18 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no difference among means. There are several ways in which this could be accomplished. For example, we could change the 5 to a 2. This change makes the P-value smaller because it increases the differences among the group means (the between-groups variance estimate) while decreasing the variability within the groups (the within-groups variance estimate).
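For readers using Python instead of MINITAB, a minimal sketch of this one-way ANOVA follows. The quiz scores below are hypothetical values chosen only to be consistent with the group means and standard deviations reported in part (a); they are not necessarily the actual data.

from scipy import stats

# Hypothetical quiz scores for the three groups (consistent with the summaries above)
group1 = [4, 6, 8]      # mean 6.00, st. dev. 2.00
group2 = [1, 5]         # mean 3.00, st. dev. 2.83
group3 = [5, 9, 10]     # mean 8.00, st. dev. 2.65

F, p_value = stats.f_oneway(group1, group2, group3)
print(round(F, 2), round(p_value, 3))   # F = 2.5, P-value about 0.18, as in the table above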
14.11 Comparing therapies for anorexia
a) [Box plots of weight change for the cogchange, controlchange, and famchange treatment groups]
From Minitab:
Level        N     Mean      StDev
famchange    17    7.265     7.157
cogchange    29    3.007     7.309
conchange    26    -0.450    7.989
The box plots and descriptive statistics suggest that the means of these groups are somewhat different. The standard deviations are similar. b) F = 5.42 and the P-value is 0.006. If the null hypothesis were true, the probability would be 0.006 of getting a test statistic at least as extreme as the value observed. We have strong evidence that at least two population means are different. c) The assumptions are that there are independent random samples and normal population distributions with equal standard deviations. There is evidence of skew, but the test is robust with respect to this assumption. The subjects were randomly assigned to treatments but are not a random sample of subjects suffering from anorexia, so results are highly tentative.
Section 14.2: Estimating Differences in Groups for a Single Factor
14.12 House prices and age
a) The 95% CI is ȳ1 − ȳ2 ± t.025(s)√(1/n1 + 1/n2) = 305.8 − 242.8 ± 1.972(110.38)√(1/78 + 1/72), or (26.3, 99.7).
b) Because 0 does not fall in this confidence interval, we can infer at the 95% confidence level that the population means are different (higher for new homes than for medium-aged homes).
14.13 Time on Facebook
The 95% CI is ȳ1 − ȳ2 ± t.025(s)√(1/n1 + 1/n2) = 63.7 − 49.0 ± 1.96(70.81)√(1/440 + 1/407), or (5.2, 24.2). Because 0 does not fall in this confidence interval, we can infer at the 95% confidence level that the population means are different (higher for Freshmen compared to Seniors).
14.14 Comparing telephone holding times
a) We are 95% confident that the difference in the population mean times that callers are willing to remain on hold for classical music versus Muzak is between 2.9 and 12.3 minutes. Since 0 does not fall in the interval, we conclude that the hold time is greater for classical music than for Muzak.
b) The margin of error would be the same for each pair of means because t.025, s, and both values of n would be the same for each calculation.
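A minimal Python sketch of the interval in 14.13, using only the summary statistics quoted above, is:

# 95% CI for mu1 - mu2 from summary statistics (14.13)
ybar1, ybar2 = 63.7, 49.0          # mean minutes on Facebook: Freshmen, Seniors
n1, n2 = 440, 407
s = 70.81                          # common standard deviation estimate used above
t025 = 1.96                        # t critical value quoted in the solution (large df)

margin = t025 * s * (1 / n1 + 1 / n2) ** 0.5
diff = ybar1 - ybar2
print(round(diff - margin, 1), round(diff + margin, 1))   # about (5.2, 24.2)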
14.14 (continued)
c) 0 does not fall in the confidence interval for the difference between classical music and advertising; therefore, we can infer that the population means are different. 0 does, however, fall in the confidence interval for the difference between advertising and Muzak; therefore, we cannot infer that the population means are different. Together with (a), the airline learned that their best bet for keeping customers on hold is to play classical music.
d) The sample sizes could be increased.
14.15 Tukey holding time comparisons
a) The only significant difference now is that between classical music and Muzak.
b) The margins of error are larger than with the separate 95% intervals because the Tukey method uses an overall confidence level of 95% for the entire set of intervals.
14.16 REM sleep
a) If the null hypothesis were true, the probability would be 0.48 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is true. We cannot conclude that at least two population means are different.
b) False; it is plausible, however, that all three differences between pairs of means equal 0.
c) The margin of error would be smaller because the 95% confidence level is applied to each interval rather than to the overall set.
14.17 REM regression
a) x1 = 1 for observations from the first group and x1 = 0 otherwise; x2 = 1 for observations from the second group and x2 = 0 otherwise.
b) H0: μ1 = μ2 = μ3 is equivalent to H0: β1 = β2 = 0. For group 1, x1 = 1 and x2 = 0, so the predicted mean response is 12 + 6(1) + 3(0) = 18. For group 2, the predicted mean response is 12 + 6(0) + 3(1) = 15. For group 3, the predicted mean response is 12 + 6(0) + 3(0) = 12.
14.18 Outsourcing satisfaction
a) The margin of error is the same because the sample sizes for the three service centers are the same. The margin of error is t.025(s)√(1/n1 + 1/n2) = 1.968(0.686)√(1/100 + 1/100) = 0.19.
b) The 95% confidence intervals for the difference in population means are:
San Jose/Toronto: –0.2 ± 0.19, or (–0.39, –0.01)
San Jose/Bangalore: 0.5 ± 0.19, or (0.31, 0.69)
Toronto/Bangalore: 0.7 ± 0.19, or (0.51, 0.89)
Because 0 does not fall in any of these intervals, we can infer that all three pairs of population means are different. Toronto is higher than both San Jose and Bangalore. San Jose is higher than Bangalore.
c) The Tukey 95% multiple comparison confidence intervals are:
San Jose/Toronto: –0.2 ± 0.23, or (–0.43, 0.03)
San Jose/Bangalore: 0.5 ± 0.23, or (0.27, 0.73)
Toronto/Bangalore: 0.7 ± 0.23, or (0.47, 0.93)
Because 0 does not fall in the intervals for the difference between San Jose and Bangalore and for the difference between Toronto and Bangalore, we can infer that these two pairs of population means are different. Toronto and San Jose are each higher than Bangalore. 0 does fall, however, in the interval for the difference between San Jose and Toronto, and so we cannot infer that these population means are different.
d) The intervals in (b) and (c) are different because (b) uses a 95% confidence level for each interval, whereas (c) uses a 95% confidence level for the overall set of intervals. The advantage of using the Tukey intervals is that it ensures that we achieve the overall confidence level of 95% for the entire set of intervals.
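The Tukey intervals in (c) come from software. For readers using Python instead of MINITAB, a minimal sketch of the method is below; the satisfaction ratings are simulated, illustrative data, not the actual survey responses.

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical satisfaction ratings for three service centers (illustrative only)
rng = np.random.default_rng(seed=1)
ratings = np.concatenate([rng.normal(7.1, 0.7, 100),   # San Jose
                          rng.normal(7.3, 0.7, 100),   # Toronto
                          rng.normal(6.6, 0.7, 100)])  # Bangalore
centers = ["San Jose"] * 100 + ["Toronto"] * 100 + ["Bangalore"] * 100

# Tukey multiple comparisons with an overall 95% confidence level
print(pairwise_tukeyhsd(endog=ratings, groups=centers, alpha=0.05))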
14.19 Regression for outsourcing
a) x1 = 1 for observations from San Jose and x1 = 0 otherwise; x2 = 1 for observations from Toronto and x2 = 0 otherwise.
b) (i) This is the estimate for β1, which equals 0.5. The estimated difference between the population means for San Jose and Bangalore is 0.5. (ii) This is the estimate for β2, which equals 0.7.
14.20 Advertising effect on sales
a) μy = α + β1x1 + β2x2 + β3x3, where x1 = 1 for observations for radio and x1 = 0 otherwise, x2 = 1 for observations for TV and x2 = 0 otherwise, and x3 = 1 for observations for newspaper and x3 = 0 otherwise.
b) Because β1 = μ1 − μ4, β2 = μ2 − μ4, and β3 = μ3 − μ4, the ANOVA null hypothesis H0: μ1 = μ2 = μ3 = μ4 is equivalent to H0: β1 = β2 = β3 = 0.
c) (i) A and D: 5 (ii) A and B: 15 (Because A must be 40, and B must be 25.)
14.21 French ANOVA
a) Group 2 – Group 1: (–8.7, 2.7)
Group 3 – Group 1: (–3.1, 7.1)
Group 3 – Group 2: (–0.7, 10.7)
Because 0 falls in all three confidence intervals, we cannot infer that any of the pairs of population means are different.
b) Note: Statistical software such as MINITAB is needed to complete this solution.
Group 2 – Group 1: (–10.3, 4.3)
Group 3 – Group 1: (–4.5, 8.5)
Group 3 – Group 2: (–2.3, 12.3)
Again, 0 falls in all three confidence intervals; we cannot infer that any of the pairs of population means are different. The intervals are wider than those in (a) because we are now using a 95% confidence level for the overall set of intervals rather than for each separate interval.
14.22 Multiple comparison for time on Facebook
a) There will be 6 comparisons.
b) The Tukey 95% multiple comparison confidence intervals are:
Freshman versus Sophomore: (–20, 6)
Freshman versus Junior: (–6, 19)
Freshman versus Senior: (2, 27)
Sophomore versus Junior: (–27, –0.5)
Sophomore versus Senior: (–6, 21)
Junior versus Senior: (–34, –9)
c) It is partially true. Seniors spent significantly less time on Facebook than Freshmen (by at least 2 minutes and at most 27 minutes) and Juniors (by at least 9 minutes and at most 34 minutes). However, there is no significant difference between seniors and sophomores.
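The indicator-variable coding described in 14.19 and 14.20 can be checked with a small regression. A minimal Python sketch follows; the sales figures and group sizes are invented purely for illustration.

import numpy as np
import statsmodels.api as sm

# Hypothetical sales for four advertising media: radio, TV, newspaper, none
media = ["radio"] * 3 + ["tv"] * 3 + ["paper"] * 3 + ["none"] * 3
sales = np.array([12, 14, 13, 18, 17, 19, 11, 10, 12, 9, 8, 10], dtype=float)

# Indicator (dummy) variables; "none" is the reference category
x1 = np.array([m == "radio" for m in media], dtype=float)
x2 = np.array([m == "tv" for m in media], dtype=float)
x3 = np.array([m == "paper" for m in media], dtype=float)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

fit = sm.OLS(sales, X).fit()
print(fit.params)        # intercept = mean of "none"; each slope = that medium's mean minus the "none" mean
print(fit.fittedvalues)  # fitted values equal the four group means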
Section 14.3: Two-Way ANOVA
14.23 Reducing cholesterol
a) The response variable is change in cholesterol level. The factors are dosage level and type of drug.
b) The four treatments are low-dose Lipitor, high-dose Lipitor, low-dose Zocor, and high-dose Zocor.
c) When we control for dose level, we can compare change in cholesterol level for the two types of drugs.
14.24 Drug main effects
Hypothetical sets of population means will be different for each student. The means provided are examples of possible answers.
a)
        Lipitor    Zocor
Low     10         10
High    20         20
b)
        Lipitor    Zocor
Low     10         20
High    10         20
c)
        Lipitor    Zocor
Low     10         20
High    20         30
d)
        Lipitor    Zocor
Low     10         10
High    10         10
14.25 Political ideology in 2014
a) H0: The mean political ideology in the adult U.S. population is identical for blacks and whites, for each of the two sexes.
b) F = 36.5/2.1 = 17.4 and the P-value is approximately 0. If the null hypothesis were true, it is extremely unlikely to observe such a value for the F test statistic. We have strong evidence that the mean political ideology in the United States depends on race, for each sex.
c) With a P-value of 0.229, there is no evidence that mean political ideology in the United States differs by gender, for blacks and for whites.
14.26 House prices, bedrooms and age
a) F = 81,021/10,095 = 8.03
b) The small P-value of 0.000 provides strong evidence that the population mean house selling price depends on the age of the home. If the null hypothesis were true, the probability would be approximately 0 of getting a test statistic at least as extreme as the value observed.
14.27 Corn and manure
a) When we input the indicators (0 versus 1) into the regression equation, we get the following equations for the four treatments. The only difference between the equations on the top for the low manure groups and the ones on the bottom for the high manure groups is the addition of 1.96.
               Fertilizer Low            Fertilizer High
Manure Low     11.65                     11.65 + 1.88 = 13.53
Manure High    11.65 + 1.96 = 13.61      11.65 + 1.88 + 1.96 = 15.49
b) For 17 degrees of freedom, t.025 = 2.110; the standard error of the estimate of the manure effect is 0.747, and the estimate of the manure effect is 1.96.
14.28 Hang up if message repeated?
a) H0: The population mean holding time is equal for the three types of messages, for each fixed level of repeat time.
b) F = 74.60/10.52 = 7.09; the small P-value of 0.011 provides strong evidence that the population mean holding time depends on the type of message. If the null hypothesis were true, the probability would be 0.011 of getting a test statistic at least as extreme as the value observed.
c) The assumptions for two-way ANOVA are that the population distribution for each group is normal, the population standard deviations are identical, and the data result from a random sample or randomized experiment.
14.29 Regression for telephone holding times
a) The population regression model is μy = α + β1x1 + β2x2 + β3x3.
Type         Length    x1    x2    x3    Mean of y
Advert       10 min    1     0     1     α + β1 + β3
Muzak        10 min    0     1     1     α + β2 + β3
Classical    10 min    0     0     1     α + β3
Advert       5 min     1     0     0     α + β1
Muzak        5 min     0     1     0     α + β2
Classical    5 min     0     0     0     α
b) ŷ = 8.867 − 5.000x1 − 7.600x2 + 2.556x3; 8.867 (rounds to 8.87) is the estimated holding time when a customer listens to classical music repeating every 5 minutes (all x values = 0); –5.0 represents the decrease in estimated mean when an advertisement is played, and –7.6 represents the decrease in estimated mean when Muzak is played – both at each level of repeat time; 2.556 (rounds to 2.56) represents the increase in estimated mean when the repeat time is 10 minutes, at all levels of type of message.
c) Note: Numbers in parentheses represent values on x1 and x2 for type of message, and on x3 for repeating time. All numbers are rounded to two decimal places.
                         10 minutes (1)    5 minutes (0)
Advertisement (1,0)      6.42              3.87
Muzak (0,1)              3.82              1.27
Classical music (0,0)    11.42             8.87
d) For a fixed message type, the estimated difference between mean holding times for 10-minute and 5-minute repeats is 2.56. This estimate is the coefficient for x3, the indicator variable for repeat time.
e) The 95% confidence interval for β3 is: b3 ± t.025(se) = 2.556 ± 2.201(1.709), or (–1.2, 6.3).
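The table in part (c) simply evaluates the prediction equation from part (b) at the six combinations of indicator values; a minimal Python sketch of that evaluation is:

# Prediction equation from 14.29(b): y-hat = 8.867 - 5.000 x1 - 7.600 x2 + 2.556 x3
def y_hat(x1, x2, x3):
    return 8.867 - 5.000 * x1 - 7.600 * x2 + 2.556 * x3

# (x1, x2) identifies the message type; x3 = 1 for the 10-minute repeat time
for label, x1, x2 in [("Advertisement", 1, 0), ("Muzak", 0, 1), ("Classical", 0, 0)]:
    print(label, round(y_hat(x1, x2, 1), 2), round(y_hat(x1, x2, 0), 2))
# Advertisement 6.42 3.87 / Muzak 3.82 1.27 / Classical 11.42 8.87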
14.30 Interaction between message and repeat time?
a) H0: no interaction; F = 0.67; P-value = 0.535 (rounds to 0.54)
b) If the null hypothesis were true, the probability would be 0.54 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no interaction. This lends validity to the previous analyses that assumed a lack of interaction.
14.31 Income by gender and job type
a) The response variable is hourly wage, and the two factors are degree and gender.
b)
          High School    College    Advanced
Male      17             33         43
Female    14             24         32
c) (i) For high school graduates, males make on average $3 more per hour than females. (ii) For college graduates, males make on average $9 more per hour than females. There is an interaction because the difference between males and females is not the same for the two types of degrees. In particular, the wage gap is much larger for graduates with a college degree compared to a high school degree.
d) There are several possible hypothetical mean hourly wages. This is but one example.
          High School    College    Advanced
Male      17             33         43
Female    14             30         40
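Once raw data are available, an interaction test like the one in 14.30 can be reproduced with a two-way ANOVA in Python. A minimal sketch follows; the data frame below is made up for illustration and is not the holding-time data.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical two-way layout: message type x repeat time, 3 observations per cell
df = pd.DataFrame({
    "message": ["advert", "advert", "advert", "muzak", "muzak", "muzak",
                "classical", "classical", "classical"] * 2,
    "rep_time": ["5min"] * 9 + ["10min"] * 9,
    "minutes": [4, 5, 3, 1, 2, 1, 8, 9, 10,
                7, 6, 6, 4, 3, 5, 11, 12, 10],
})

# Two-way ANOVA with an interaction term
model = ols("minutes ~ C(message) * C(rep_time)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # rows: message, rep_time, interaction, residual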
14.32 Ideology by gender and race
a) The means for women are almost the same (black: 4.164; white: 4.2675), whereas the mean for white males is about 0.6 higher than the mean for black males (black: 3.819; white: 4.4443).
b) When race is not considered, the overall means for females and males could be very similar so that an ANOVA with gender as the only factor would not be significant. However, when both factors are considered, there is a clear effect due to gender.
c) From a two-way ANOVA, we learn that the effect of gender differs based on race (as described above), but we do not learn this from the one-way ANOVA.
14.33 Attractiveness and getting dates
a) The response variable is the number of dates in the last three months. The factors are gender and attractiveness.
b) The data suggest that the effect of attractiveness differs according to the level of gender. There appears to be a large effect of attractiveness among women, and little to no effect among men.
c) The population standard deviations likely differ, based on the sample standard deviations. ANOVA is typically robust with respect to violations of this assumption.
14.34 Diet and weight gain
a) F = 13.90 and the P-value is approximately 0. The small P-value provides very strong evidence that the mean weight gain depends on the protein level.
b) H0: no interaction; F = 2.75 and the P-value is 0.07. The P-value is not smaller than 0.05, so we cannot reject the null hypothesis. It is plausible that there is no interaction.
c) se = s√(1/n1 + 1/n2) = 14.648√(1/10 + 1/10) = 6.551, where n1 and n2 are the sample sizes for the low-protein beef and high-protein beef groups.
The 95% CI is ȳhigh beef − ȳlow beef ± t.025(se) = 100.0 − 79.20 ± 2.005(6.551), or (7.7, 33.9).
14.35 Regression of weight gain on diet
a) Let x1 = 1 for beef and 0 otherwise, x2 = 1 for cereal and 0 otherwise, and x1 = x2 = 0 for pork. Likewise, let x3 = 1 for high protein and x3 = 0 for low protein. μy = α + β1x1 + β2x2 + β3x3
b) From technology: weight_gain = 81.8 + 0.50x1 – 4.20x2 + 14.5x3. The parameter estimate for x3, 14.5, is the difference in the estimate of the weight gain between low and high protein diets.
c) The null hypothesis of no effect of protein source is H0: β1 = β2 = 0.
d)
          High                                           Low
Beef      81.8 + 0.50(1) – 4.20(0) + 14.5(1) = 96.8      81.8 + 0.50(1) – 4.20(0) + 14.5(0) = 82.3
Cereal    81.8 + 0.50(0) – 4.20(1) + 14.5(1) = 92.1      81.8 + 0.50(0) – 4.20(1) + 14.5(0) = 77.6
Pork      81.8 + 0.50(0) – 4.20(0) + 14.5(1) = 96.3      81.8 + 0.50(0) – 4.20(0) + 14.5(0) = 81.8
It means that we assume that the difference between means for the two (or three) categories for one factor is the same in each category of the other factor. In this example, we assume that the difference between the high and low protein levels is the same for each source of protein.
Chapter Problems: Practicing the Basics
14.36 Good friends and marital status
a) We can denote the number of good friends means for the population that these five samples represent by μ1 for married, μ2 for widowed, μ3 for divorced, μ4 for separated, and μ5 for never married. The null hypothesis is H0: μ1 = μ2 = μ3 = μ4 = μ5. The alternative hypothesis is Ha: At least two of the population means are different.
14.36 (continued)
b) No; large values of F contradict the null, and when the null is true the expected value of the F statistic is approximately 1.
c) If the null hypothesis were true, the probability would be 0.53 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that no difference exists among the five marital status groups in the population mean number of good friends.
14.37 Going to bars and having friends
a) (i) H0: μ1 = μ2 = μ3; Ha: At least two of the population means are unequal.
(ii) F = 3.03 (iii) P-value = 0.049; if the null hypothesis were true, the probability would be 0.049 of getting a test statistic at least as extreme as the value observed. We have evidence that a difference exists among the three frequencies of bar-going in the population mean number of friends. b) Yes; the sample standard deviations suggest that the population standard deviations might not be equal and the distributions are probably not normal. 14.38 Singles watch more TV a) The assumptions are an approximately normal distribution of number of hours of TV watching (not so important here because of large sample size), equal standard deviation of number of hours of TV watching in each status group and data gathered through randomization. F = 54.8/6.0 = 9.1, which is far out in the tail of the F distribution with df1 = 2 and df2 = 1472. The P-value is 0.0001. There is strong evidence of a difference between the mean hours of TV watching. b) The means for single and married subjects and single and divorced subjects are significantly different. Compared to married subjects, single subjects watch on average at least 0.3 more hours of TV. The lower bound for the interval comparing single to divorced subjects is close to zero, so the difference between these two groups may be very small. c)
The 95% CI is ȳs − ȳm ± t.025(s)√(1/n1 + 1/n2) = (3.27 − 2.65) ± 1.961(2.45)√(1/459 + 1/731), or (0.33, 0.91).
d) The corresponding interval formed with the Tukey method would be wider because it uses a 95% confidence level for the overall set of intervals, rather than for each one separately.
14.39 Comparing auto bumpers
a) The assumptions are that there are independent random samples, and normal population distributions with equal standard deviations.
b) H0: μ1 = μ2 = μ3; Ha: At least two of the population means are unequal.
c) F = 18.50; df1 = 2, df2 = 3
d) The P-value is 0.02.
e) If the null hypothesis were true, the probability would be 0.02 of getting a test statistic at least as extreme as the value observed. We have strong evidence that a difference exists among the three types of bumpers in the population mean cost to repair damage.
14.40 Compare bumpers
a) The margin of error is t.025(s)√(1/n1 + 1/n2) = 3.182(2.00)√(1/2 + 1/2) = 6.4.
b) The confidence interval formed using the Tukey 95% multiple comparison uses a 95% confidence level for the overall set of intervals, rather than a 95% confidence level for each separate interval.
c) Let x1 = 1 for Bumper A and 0 otherwise, x2 = 1 for Bumper B and 0 otherwise, and x1 = x2 = 0 for Bumper C.
d) 13 is the estimated mean damage cost for Bumper C, –11 is the difference between the estimated mean damage costs between Bumpers A and C, and –10 is the difference between the estimated mean damage costs between Bumpers B and C.
14.41 Segregation by region
a)
      Mean     Standard Deviation
NE    70.75    5.56
NC    73.75    11.90
S     62.00    3.65
W     61.25    7.18
b) We can denote the segregation index means for the population that these four samples represent by μ1 for Northeast, μ2 for North Central, μ3 for South, and μ4 for West. The null hypothesis is H0: μ1 = μ2 = μ3 = μ4. The alternative hypothesis is that at least two of the population means are different.
c) From technology: F = 2.64 and the P-value is 0.097. There is some evidence, but not very strong, that a difference exists among the four regions in the population mean segregation index.
d) The ANOVA would not be valid because this would not be a randomly selected sample.
14.42 Compare segregation means
a) From technology, the margin of error is 16.179.
b) All the intervals contain 0; therefore, no pair is significantly different.
14.43 Georgia political ideology
a) The ANOVA assumption of equal population standard deviations does seem plausible given the similar sample standard deviations.
b) Dot plots generated by software indicate the possibility of some skew, particularly among the sample of Republicans which appears to be skewed to the left. One outlier, however, contributes to the appearance of being skewed to the left, and so this inference must be treated with caution. Regardless, the normality assumption is not as important as the assumption that the groups are random samples from the population of interest.
c) The confidence interval does not include 0; we can conclude at the 95% confidence level that on the average Republicans are higher in conservative political ideology than are Democrats.
d) We can conclude that a difference exists among the three political parties in their political ideologies; however, we can only conclude that there is a difference between Republicans and Democrats and between Independents and Republicans. We cannot draw conclusions about differences between Independents and Democrats.
14.44 Comparing therapies for anorexia
a) From Minitab:
Variable         N     Mean     StDev
cogchange        29    3.01     7.31
controlchange    26    -0.45    7.99
famchange        17    7.26     7.16

Source       DF    SS        MS       F       P
Treatment    2     614.6     307.3    5.42    0.006
Error        69    3910.7    56.7
Total        71    4525.4
14.44 (continued)
For a 95% confidence interval with df = 69, technology gives t.025 = 1.99 and s = √56.7 = 7.53. Using Fisher's method (confirmed by technology):
Control – Cog: ȳ1 − ȳ2 ± t.025(s)√(1/n1 + 1/n2) = −0.45 − 3.01 ± 1.99(7.53)√(1/26 + 1/29), or (−7.5, 0.6)
Cog – Family: 3.01 − 7.26 ± 1.99(7.53)√(1/17 + 1/29), or (−8.8, 0.3)
Control – Family: −0.45 − 7.26 ± 1.99(7.53)√(1/26 + 1/17), or (−12.4, −3.0)
The differences between the control and cognitive treatments and the family and cognitive treatments include 0; it is plausible that there are no population mean differences in these two cases. The other interval (i.e., family and control) does not include 0. We can infer that the population mean weight change is greater among those who receive family therapy than among those in the control group.
b) Technology gives the following Tukey 95% multiple comparison confidence intervals:
Difference between control and cognitive: (–8.3, 1.4)
Difference between family and cognitive: (–9.8, 1.3)
Difference between family and control: (–13.3, –2.1)
The interpretations are the same as for the intervals in (a). The intervals are wider because the 95% confidence level is for the entire set of intervals rather than for each interval separately.
14.45 Lot size varies by region?
Lot sizes by quadrant of city:
Source      DF                 SS                    MS               F              P-value
Quadrant    3                  4180 − 1480 = 2700    2700/3 = 900     900/5 = 180    0.000
Error       299 − 3 = 296      1480                  1480/296 = 5
Total       300 − 1 = 299      4180
14.46 House with garage
a) Let x1 = 1 for houses with a garage and 0 otherwise. From technology: HP in thousands = 247 + 26.3x1. The intercept, 247, is the estimated mean selling price in thousands when a house does not have a garage, and 26.3 is the difference in the estimated mean selling price in thousands between a house with a garage and a house without a garage.
b) From Minitab:
Predictor    Coef     SE Coef    T       P
Garage       26.28    19.27      1.36    0.174
c) If the null hypothesis were true, the probability would be 0.174 of getting a test statistic at least as extreme as the value observed. We do not have sufficient evidence to conclude that the population mean house selling price in thousands is significantly higher for houses with a garage than without a garage. From Minitab:
Source    DF     SS         MS       F       P
Garage    1      24830      24830    1.86    0.174
Error     198    2644040    13354
Total     199    2668870
As with the regression model, the P-value is 0.174; this is the same result. d) The value of t in (b) is the square root of the value of F in (c).
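The relationship noted in part (d), that t² equals F when only two groups are compared, is easy to verify numerically; a minimal Python sketch with two made-up samples is:

from scipy import stats

# Two hypothetical groups (e.g., selling prices with and without a garage)
group1 = [250, 310, 280, 295, 260]
group2 = [230, 240, 270, 255, 235]

t, p_t = stats.ttest_ind(group1, group2)   # pooled-variance two-sample t test
F, p_F = stats.f_oneway(group1, group2)    # one-way ANOVA on the same two groups
print(round(t ** 2, 4), round(F, 4))       # identical values
print(round(p_t, 4), round(p_F, 4))        # identical P-values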
14.47 Ideal number of kids by gender and race
a) The response variable is ideal number of kids. The factors are gender and race.
b) If there were no interaction between gender and race in their effects, it would mean that the difference between population means for the two genders is the same for each race. There are many possible sets of population means that would show a strong race effect and a weak gender effect and no interaction. This is one hypothetical set.
         Female    Male
Black    3.5       3.3
White    1.5       1.3
c) H0: no interaction; Ha: There is an interaction. F = 1.36 and the P-value is 0.24. If the null hypothesis were true, the probability would be 0.24 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is true and that there is no interaction.
14.48 Regress kids on gender and race
a) The coefficient of f (0.04) is the difference between the estimated means for men and women for each level of race. The fact that this estimate is close to 0 indicates that there is very little difference between the means for men and women, given race.
b)
             Female (1)    Male (0)
Black (1)    2.83          2.79
White (0)    2.46          2.42
c) The data suggest that there is an effect only of race. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have strong evidence that on the average blacks report a higher ideal number of children than whites do (by 0.37).
14.49 Energy drink
a) At both 11 P.M. and 8 A.M., the mean time Williams College students need to complete the task is 0.39 minutes shorter when consuming an energy drink.
b) The mean time Williams College students need to complete the task is 2 minutes shorter at night (11 P.M.) compared to in the morning (9 A.M.), whether or not consuming an energy drink.
c) The model is μY = α + β1e + β2d; β1 needs to equal zero.
d) For the energy drink, t = –0.58 and the P-value is 0.57. There is no evidence of a difference in means.
14.50 Income, gender, and education
a) The response variable is income and the factors are gender and education.
b)
          HS graduate    Bachelor Degree
Female    31,666         60,293
Male      43,493         94,206
c) (i) Among high school graduates, men are 43,493 – 31,666 = 11,827 above women. (ii) Among college graduates, men are 94,206 – 60,293 = 33,913 above women. If these are close estimates of the population, there is an interaction because the effect of gender is different among high school graduates than among college graduates. There is a bigger mean difference between genders among college graduates than among high school graduates.
14.51 Birth weight, age of mother, and smoking
This suggests an interaction since smoking status has a different impact at different fixed levels of age. There is a bigger mean difference between smokers and non-smokers among older women than among younger women.
14.52 TV watching by gender and race
a) The response variable is hours of TV watched per day and the factors are gender and race.
b) F = 0.21 and the P-value is 0.649. If the null hypothesis were true, the probability would be 0.649 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that there is no interaction.
c) (i) There is not a significant gender effect. The P-value of 0.259 is not less than a typical significance level such as 0.05. (ii) There is a significant race effect. The small P-value of 0.000 is less than a typical significance level such as 0.05.
d) At each level of race, there is little difference between genders. At each level of gender, however, there is a large difference between races.
14.53 Salary and gender
a) The coefficient for gender, –13, indicates that at fixed levels of rank, men have higher estimated mean salaries than women by 13 (thousands of dollars).
b) 96.2 = α̂ − 13(1) − 40(0); 96.2 = α̂ − 13; α̂ = 96.2 + 13 = 109.2, so ŷ = 109.2 − 13x1 − 40x2.
(i) ŷ = 109.2 − 13(0) − 40(0) = 109.2
(ii) ŷ = 109.2 − 13(1) − 40(1) = 56.2
14.54 Political ideology interaction
a) For blacks, the difference in the mean political ideology is 3.82 – 3.81 = 0.01 between females and males. For whites, this difference is 4.13 – 4.22 = –0.09. Both differences are small and about the same. This indicates that the effect of gender on the mean may be the same for blacks and whites (i.e., no interaction) and that the effect is essentially zero; i.e., there is no difference between females and males, for either blacks or whites.
b) With a P-value of 0.54, there is no evidence of an interaction between gender and race. The effect of gender is the same for blacks and whites and, vice versa, the effect of race is the same for females and males.
Chapter Problems: Concepts and Investigations
14.55 Regress TV watching on gender and marital status
a) μy = α + β1g + β2m1 + β3m2
Gender    Status      g    m1    m2    Mean of y
Male      Single      1    1     0     α + β1 + β2
Male      Married     1    0     1     α + β1 + β3
Male      Divorced    1    0     0     α + β1
Female    Single      0    1     0     α + β2
Female    Married     0    0     1     α + β3
Female    Divorced    0    0     0     α
b) TV hours = 2.78 + 0.16g + 0.41m1 – 0.21m2. The mean hours of TV watching are estimated to be 0.16 hours higher for males, for each marital status.
c) The difference for both males and females is β2. For males, the difference is (α + β1 + β2) − (α + β1) = β2. For females, the difference is (α + β2) − (α) = β2.
d) The 95% confidence interval for β2 is: b2 ± t.025(se) = 0.41 ± 1.962(0.184), or (0.05, 0.77). With 95% confidence, the mean number of hours watching TV for singles is between 0.05 hours and 0.77 hours larger than the mean for divorced subjects.
14.56 Number of friends and degree
The short report will be different for each student, but should include and interpret the F test statistic of 0.81 and the associated P-value of approximately 0.52.
14.57 Sketch within- and between-groups variability
a) There are many possible dot plots for (a) and (b). These are just two possible dot plots. [Dot plot of score versus group for three groups]
b) [Dot plot of score versus group for three groups]
14.58 A = B and B = C, but A ≠ C?
There are many possible means that would meet these requirements. An example is means 10, 17, 24 for A, B, C.
14.59 Multiple comparison confidence
A confidence interval of 0.95 for a single comparison gives 95% confidence for only the one interval. A confidence level of 0.95 for a multiple comparison of all six pairs of means provides a confidence level for the whole set of intervals. In this case, the confidence level for each of the individual intervals would be a good deal higher than for the overall set. That way the error probability of 0.05 applies to the whole set, and not to each comparison.
14.60 Another Simpson paradox
a)
              Female    Male
Humanities    65,000    64,000
Science       72,000    71,000
The overall mean for women is [25(65,000) + 5(72,000)]/30 = 66,167. The overall mean for men is [20(64,000) + 30(71,000)]/50 = 68,200.
14.60 (continued)
b) A one-way comparison of mean income by gender would reveal that men have a higher mean income than women do. A two-way comparison of mean incomes by gender, however, would show that at fixed levels of university divisions, women have the higher mean.
14.61 Multiple choice: ANOVA/regression similarities The best answer is (d).
14.62 Multiple choice: ANOVA variability The best answer is (c).
14.63 Multiple choice: Multiple comparisons The best answer is (c).
14.64 Multiple choice: Interaction True or false? True
♦♦14.66 What causes large or small F?
a) The four sample means would have to be identical.
b) There would have to be no variability (standard deviation of 0) within each sample. That is, all five individuals in each sample would have to have the same score.
♦♦14.67 Between-subjects estimate
a) Σ(ȳi − ȳ)²/(g − 1) estimates the variance, σ²/n, of the distribution of the {ȳi} values, because Σ(ȳi − ȳ)²/(g − 1) is essentially the formula for a variance, in which each observation is a sample mean.
b) If Σ(ȳi − ȳ)²/(g − 1) estimates σ²/n, then n times Σ(ȳi − ȳ)²/(g − 1) estimates σ².
♦♦14.68 Bonferroni multiple comparisons
a) Fertilizer: 1.88 ± 2.46(0.7471), or (0.04, 3.7); Manure: 1.96 ± 2.46(0.7471), or (0.1, 3.8)
b) The P-value should be 0.05/35 = 0.0014.
♦♦14.69 Independent confidence intervals
a) (0.95)(0.95)(0.95)(0.95)(0.95) = 0.77
b) (0.9898)⁵ ≈ 0.95
♦♦14.70 Regression or ANOVA?
a) (i) In an ANOVA F test the number of bathrooms would be treated as a categorical variable. Three would not be considered "more" than one; it would simply be treated as a different category. (ii) In a regression t-test, the number of bathrooms would be treated as a quantitative variable and we would be assuming a linear trend.
b) The straight-line regression approach would allow us to know whether increasing numbers of bathrooms led to an increasing mean house selling price. The ANOVA would only let us know that the mean house price was different at each of the three categories. Moreover, we could not use the ANOVA as a prediction tool for other numbers of bathrooms (although even with regression, we must be careful when we interpolate or extrapolate).
c) Mean selling price $150,000 for 1 bathroom, $100,000 for 2 bathrooms, and $200,000 for 3 bathrooms. (There is not an increasing or decreasing overall trend.)
Chapter 14: Comparing Groups: Analysis of Variance Methods 309 ♦♦14.71 Three factors a) 8 groups. Let NL = low level of nitrogen, NH = high level of nitrogen, PHL = low level of phosphate, PHH = high level of phosphate, POL = low level of potash and POH = high level of potash. Then the groups are NLPHLPOL, NHPHLPOL, NLPHHPOL, NLPHLPOH, NHPHHPOL, NHPHLPOH, NLPHHPOH, NHPHHPOH. b) Let x1 = 1 for the high level of nitrogen and 0 for the low level of nitrogen, x2 = 1 for the high level of phosphate and 0 for the low level of phosphate, and x3 = 1 for the high level of potash and 0 for the low level of potash. Then Y 1 x1 2 x2 3 x3 . c)
One possibility is ˆ 6, ˆ1 2, ˆ2 2, and ˆ3 2.
Chapter Problems: Student Activities 14.72 Student survey data The short reports will be different for each class.
Copyright © 2017 Pearson Education, Inc.
Chapter 15: Nonparametric Statistics
Section 15.1: Compare Two Groups by Ranking
15.1 Tanning experiment
a) The observed ranks are: Lotion (1,2), Studio (3,4).
b) Lotion mean rank 1.5; Studio mean rank 3.5; difference of mean ranks –2.0.
c) The six possible allocations of ranks, with the difference between mean ranks for each:
Lotion ranks                (1,2)   (1,3)   (1,4)   (2,3)   (2,4)   (3,4)
Studio ranks                (3,4)   (2,4)   (2,3)   (1,4)   (1,3)   (1,2)
Lotion mean rank            1.5     2.0     2.5     2.5     3.0     3.5
Studio mean rank            3.5     3.0     2.5     2.5     2.0     1.5
Difference of mean ranks    –2.0    –1.0    0.0     0.0     1.0     2.0
The sampling distribution of the difference between the mean ranks is:
Difference between mean ranks    –2.0    –1.0    0.0    1.0    2.0
Probability                      1/6     1/6     2/6    1/6    1/6
15.2 Test for tanning experiment
a) The P-value is 1/6 = 0.167; if the treatments had identical effects, the probability would be 0.167 of getting a sample like we observed, or even more extreme, in this direction. It is plausible that the null hypothesis is correct, and that the studio does not lead to better results than the lotion.
b) The P-value is 2/6 = 0.33; if the treatments had identical effects, the probability would be 0.33 of getting a sample like we observed, or even more extreme, in either direction. It is plausible that the null hypothesis is correct and that the treatments do not lead to different results.
c) It is a waste of time to conduct this experiment if we plan to use a 0.05 significance level because the smallest possible P-value is 0.17.
15.3 Comparing clinical therapies
a) and b) together: the 20 possible allocations of ranks to the two therapies, with the mean ranks and their difference:
Therapy 1 ranks    Therapy 2 ranks    Therapy 1 mean rank    Therapy 2 mean rank    Difference of mean ranks
(1,2,3)            (4,5,6)            2.0                    5.0                    –3.0
(1,2,4)            (3,5,6)            2.33                   4.67                   –2.33
(1,2,5)            (3,4,6)            2.67                   4.33                   –1.67
(1,2,6)            (3,4,5)            3.0                    4.0                    –1.0
(1,3,4)            (2,5,6)            2.67                   4.33                   –1.67
(1,3,5)            (2,4,6)            3.0                    4.0                    –1.0
(1,3,6)            (2,4,5)            3.33                   3.67                   –0.33
(1,4,5)            (2,3,6)            3.33                   3.67                   –0.33
(1,4,6)            (2,3,5)            3.67                   3.33                   0.33
(1,5,6)            (2,3,4)            4.0                    3.0                    1.0
(2,3,4)            (1,5,6)            3.0                    4.0                    –1.0
(2,3,5)            (1,4,6)            3.33                   3.67                   –0.33
(2,3,6)            (1,4,5)            3.67                   3.33                   0.33
(2,4,5)            (1,3,6)            3.67                   3.33                   0.33
15.3 (continued)
Therapy 1 ranks    Therapy 2 ranks    Therapy 1 mean rank    Therapy 2 mean rank    Difference of mean ranks
(2,4,6)            (1,3,5)            4.0                    3.0                    1.0
(2,5,6)            (1,3,4)            4.33                   2.67                   1.67
(3,4,5)            (1,2,6)            4.0                    3.0                    1.0
(3,4,6)            (1,2,5)            4.33                   2.67                   1.67
(3,5,6)            (1,2,4)            4.67                   2.33                   2.33
(4,5,6)            (1,2,3)            5.0                    2.0                    3.0
c) The sampling distribution of the difference between the mean ranks:
Difference between mean ranks    Probability
–3.00                            1/20
–2.33                            1/20
–1.67                            2/20
–1.00                            3/20
–0.33                            3/20
0.33                             3/20
1.00                             3/20
1.67                             2/20
2.33                             1/20
3.00                             1/20
d) The P-value is 4/20 = 0.20 (2/20 for each tail); if the treatments had identical effects, the probability would be 0.20 of getting a sample like we observed, or even more extreme, in either direction. It is plausible that the null hypothesis is correct and that the treatments do not lead to different results.
15.4 Baby weight and smoking
a) H0: Birth weights are the same for babies born to women who smoke during pregnancy and babies born to women who don't smoke during pregnancy; Ha: Birth weights are higher for babies born to women who don't smoke during pregnancy than for babies born to women who smoke during pregnancy.
b) Ranks of smokers: 1, 2, 3, 5, 9, 4, 7; Ranks of nonsmokers: 6, 10, 12, 14, 11, 13, 8
Sum of ranks of smokers: 31; Sum of ranks of nonsmokers: 74
Mean rank for smokers: 4.43; Mean rank for nonsmokers: 10.57
c) If the birth weights were the same for babies born to women who smoke during pregnancy as for babies born to women who do not smoke during pregnancy, the probability would be 0.002 of getting a sample like we observed, or even more extreme, in this direction. We have strong evidence that babies born to women who smoke during pregnancy have lower birth weights than babies born to women who do not smoke during pregnancy.
15.5 Estimating smoking effect
a) The point estimate of –2.00 is an estimate of the difference between the population median birth weight for babies born to women who smoke during pregnancy and babies born to women who do not smoke during pregnancy.
b) The confidence interval of (–3.3, –0.7) estimates that the population median birth weight for babies born to women who smoke during pregnancy is between 3.3 and 0.7 pounds below the population median birth weight for babies born to women who don't smoke. Because all of the values in the interval are below 0, this interval supports the hypothesis that the median birth weight for babies born to women who smoke during pregnancy is lower than the median birth weight for babies born to women who do not smoke.
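The test in 15.4 is the Wilcoxon (Mann-Whitney) rank test, which software can carry out from the raw data. A minimal Python sketch follows; the birth weights below are hypothetical values used only to illustrate the call, not the data of the exercise.

from scipy import stats

# Hypothetical birth weights (lb) for babies of smokers and nonsmokers
smokers = [5.1, 5.5, 5.8, 6.2, 6.9, 6.0, 6.5]
nonsmokers = [6.4, 7.1, 7.6, 8.2, 7.4, 7.9, 6.7]

# One-sided test: are birth weights lower in the smoking group?
result = stats.mannwhitneyu(smokers, nonsmokers, alternative="less")
print(result.statistic, result.pvalue)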
15.6 Trading volumes
a) The box plots suggest that the Monday trading volumes are heavily skewed to the right and are more variable than the Friday trading volumes. The Friday trading volumes are slightly right skewed with one extreme large outlier.
[Box plots of trading volume (millions) for Mondays and Fridays]
b) H0: Identical population distributions for trading volumes of General Electric shares on Mondays and Fridays; Ha: Different expected values for the sample mean ranks. From technology: W = 134.5 and the P-value is 0.902.
c) (–11.0, 13.0); This interval estimates that the population median trading volume for Mondays is between 11 million below and 13 million above the population median trading volume on Fridays. Because 0 is included in the interval, it is plausible that the median trading volume is the same for Mondays and Fridays.
d) The assumption is that there are independent random samples from two groups. The confidence interval requires an extra assumption: that the population distributions for the two groups have the same shape.
15.7 Teenage anorexia
a) The estimated difference between the population median weight change for the cognitive behavioral treatment group and the population median weight change for the control group is 3.05.
b) The confidence interval of (–0.6, 8.1) estimates that the population median weight change for the cognitive-behavioral group is between 0.6 below and 8.1 above the population median weight change for the control group. Because 0 falls in the confidence interval, it is plausible that there is no difference between the population medians for the two groups.
c) The P-value is 0.11 for testing against the alternative hypothesis of different expected mean ranks. If the null hypothesis were true, the probability would be 0.11 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that the population distributions are identical.
Section 15.2: Nonparametric Methods for Several Groups and for Matched Pairs
15.8 How long do you tolerate being put on hold?
a) H0: Identical population distributions for the three groups; Ha: Population distributions not all identical
b) H = 7.38; its approximate sampling distribution is the chi-squared distribution with df = g – 1 = 3 – 1 = 2.
c) The P-value is 0.025; if the null hypothesis were true, the probability would be 0.025 of getting a test statistic at least as extreme as the value observed. We have strong evidence that the population distributions are not all identical.
d) To find out which pairs of groups significantly differ, we could follow up the Kruskal-Wallis test with a Wilcoxon test to compare each pair of groups. Or, we could find a confidence interval for the difference between the population medians for each pair.
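A minimal Python sketch of the Kruskal-Wallis test in 15.8 follows; the holding times are invented for illustration, so only the structure (three groups, H compared to a chi-squared distribution with df = 2) matches the exercise.

from scipy import stats

# Hypothetical holding times (minutes) under three recorded messages
muzak = [0.5, 1.0, 2.0, 1.5, 0.0]
advertisement = [3.0, 5.0, 4.5, 2.5, 4.0]
classical = [7.0, 9.5, 8.0, 11.0, 10.5]

H, p_value = stats.kruskal(muzak, advertisement, classical)
print(round(H, 2), round(p_value, 3))   # compare H to a chi-squared distribution with df = g - 1 = 2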
15.9 What's the best way to learn French?
a) Group 1 ranks: 2, 5, 6; Group 2 ranks: 1, 3.5; Group 3 ranks: 3.5, 7, 8. The mean rank for Group 1 = (2 + 5 + 6)/3 = 4.33.
b) The P-value is 0.209; if the null hypothesis were true, the probability would be 0.21 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that the population median quiz score is the same for each group.
15.10 Sports versus TV
a) se = √[(0.50)(0.50)/n] = √[(0.50)(0.50)/54] = 0.068; z = (p̂ − 0.50)/se = (0.556 − 0.50)/0.068 = 0.82
b) The P-value is 0.41; if the null hypothesis were true, the probability would be 0.41 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that p = 0.50.
15.11 Cell phones and reaction times
a) The observations are dependent samples. All students receive both treatments.
b) The sample proportion is 26/32 = 0.8125.
c) se = √[(0.50)(0.50)/n] = √[(0.50)(0.50)/32] = 0.088; z = (p̂ − 0.50)/se = (0.8125 − 0.50)/0.088 = 3.55.
The P-value is 0.0002. If the null hypothesis were true, the probability would be 0.0002 of getting a test statistic at least as extreme as the value observed. We have strong evidence that the population proportion of drivers who have a faster reaction time when not using a cell phone is greater than 0.50.
d) The parametric method would be the matched-pairs t test. The sign test uses merely the information about which response is higher and how many, not the quantitative information about how much higher. This is a disadvantage compared to the matched-pairs t test, which analyzes the mean of the differences between the two responses.
15.12 Sign test for GRE scores
P(2) = [3!/(2!(3 − 2)!)](0.50)²(0.50)¹ = 0.375. The more extreme result that all three people score higher on the
writing portion is P(3) = (0.50)³ = 0.125. The P-value is the right-tail probability of the observed result and the more extreme one, that is, 0.375 + 0.125 = 0.50. In summary, the evidence is not strong that the population median change in score is positive (but we can't get a small P-value with such a small n for this test).
15.13 Does exercise help blood pressure?
H0: p = 0.50; Ha: p > 0.50; all three subjects show a decrease. P(3) = (0.50)³ = 0.125; if the null hypothesis were true, the probability would be 0.125 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that walking does not lower blood pressure (but we can't get a small P-value with such a small n for this test).
15.14 More on blood pressure
a) H0: Population median of difference scores is 0; Ha: Population median of difference scores is > 0.
b)
                                      Sample
Subject                  1     2     3     4     5     6     7     8     Rank of absolute value
1                        20    20    –20   20    –20   20    –20   –20   2
2                        25    25    25    –25   25    –25   –25   25    3
3                        15    –15   15    15    –15   –15   15    –15   1
Sum of ranks for
positive differences     6     5     4     3     3     2     1     0
15.14 (continued)
c) The P-value is 1/8 = 0.125; if the null hypothesis were true, the probability would be 0.125 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that walking does not lower blood pressure (but we can't get a small P-value with such a small n for this test).
15.15 More on cell phones
a) H0: Population median of difference scores is 0; Ha: Population median of difference scores is > 0.
b)
                                        Sample
Subject                  1      2      3      4      5      6      7      8      Rank of absolute value
1                        32     –32    32     32     32     –32    –32    –32    1
2                        67     67     –67    67     67     –67    67     67     2
3                        75     75     75     –75    75     75     –75    75     3
4                        150    150    150    150    –150   150    150    –150   4
Sum of ranks for
positive differences     10     9      8      7      6      7      6      5

                                        Sample
Subject                  9      10     11     12     13     14     15     16     Rank of absolute value
1                        32     32     32     –32    –32    –32    32     –32    1
2                        –67    –67    67     –67    –67    67     –67    –67    2
3                        –75    75     –75    –75    75     –75    –75    –75    3
4                        150    –150   –150   150    –150   –150   –150   –150   4
Sum of ranks for
positive differences     5      4      3      4      3      2      1      0
c) The P-value is 1/16 = 0.06; if the null hypothesis were true, the probability would be 0.06 of getting a test statistic at least as extreme as the value observed. There is some, but not strong, evidence that cell phones tend to impair reaction times.
15.16 Use all data on cell phones
a) H0: Population median of difference scores = 0; Ha: Population median of difference scores ≠ 0.
b) MINITAB found the reported value by determining the sum of ranks for the positive differences.
c) The two-sided P-value is 0.000. If the null hypothesis were true, the probability would be close to 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that the population median of the differences is not 0.
d) The estimated median is 47.25. This estimates that the population median difference between the reaction times of those not using cell phones and those using cell phones was 47.25.
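The full-data analysis in 15.16 is a Wilcoxon signed-rank test. A minimal Python sketch follows; the paired reaction times are hypothetical and serve only to show the call.

from scipy import stats

# Hypothetical reaction times (ms) for the same subjects without and with a cell phone
no_phone = [520, 560, 500, 610, 585, 540, 575, 555]
phone = [555, 620, 510, 690, 630, 570, 640, 600]
differences = [b - a for a, b in zip(no_phone, phone)]   # phone minus no_phone

# Signed-rank test of H0: population median difference = 0 against Ha: median > 0
result = stats.wilcoxon(differences, alternative="greater")
print(result.statistic, result.pvalue)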
Chapter Problems: Practicing the Basics
15.17 Car bumper damage
a) Bumper A: Ranks are 4, 6, 5; mean is 5. Bumper B: Ranks are 1, 2, 3; mean is 2.
b) The 20 possible allocations of ranks to the two bumpers:
Treatment    Ranks
Bumper A     (1,2,3)  (1,2,4)  (1,2,5)  (1,2,6)  (1,3,4)  (1,3,5)  (1,3,6)  (1,4,5)  (1,4,6)  (1,5,6)
Bumper B     (4,5,6)  (3,5,6)  (3,4,6)  (3,4,5)  (2,5,6)  (2,4,6)  (2,4,5)  (2,3,6)  (2,3,5)  (2,3,4)

Treatment    Ranks
Bumper A     (2,3,4)  (2,3,5)  (2,3,6)  (2,4,5)  (2,4,6)  (2,5,6)  (3,4,5)  (3,4,6)  (3,5,6)  (4,5,6)
Bumper B     (1,5,6)  (1,4,6)  (1,4,5)  (1,3,6)  (1,3,5)  (1,3,4)  (1,2,6)  (1,2,5)  (1,2,4)  (1,2,3)
316 Statistics: The Art and Science of Learning from Data, 4th edition 15.17 (continued) c) There are only two ways in which the ranks are as extreme as in this sample: Bumper A with 1,2,3 and B with 4,5,6, or Bumper A with 4,5,6 and B with 1,2,3. d) The P-value is 0.10 because out of 20 possibilities, only two are this extreme. 2/20 = 0.10 15.18 Comparing more bumpers a) The results would not change. This illustrates that the analysis does not take the magnitude of the sample scores into account and is not affected by outliers. b) Kruskal-Wallis test 15.19 Telephone holding times a) Group Ranks Mean Rank Muzak 1, 2, 4, 5, 3 3.0 Classical 9, 8, 7, 10, 6 8.0 b) There are only two cases this extreme, that in which Muzak has ranks 1–5 as it does here, and that in which Muzak has ranks 6–10. Thus, the P-value is the probability that one of these two cases would occur out of the 252 possible allocations of rankings. If the treatments had identical effects, the probability would be 0.008 of getting a sample like we observed or even more extreme. This is below a typical significance level such as 0.05; therefore, we can reject the null hypothesis. 15.20 Treating alcoholics a) The two-sample t test assumes that the population distribution is normal. The test is robust with respect to this assumption with a two-sided test, but these researchers planned to conduct a one-sided test. b) Group Ranks Mean Rank Control 13, 23, 18, 12, 22, 19, 16, 21, 5, 15, 11, 20 16.250 Treated 9, 2, 4, 7, 17, 10, 3, 6, 14, 8, 1 7.364 c) The P-value of 0.001 tells us that if the treatments had identical effects, the probability would be 0.001 of getting a sample like we observed or even more extreme. We have strong evidence that the treatment with social skills training reduced drinking. d) The confidence interval is (186.0, 713.0). We infer that the population median alcohol consumption for the control group is between 186 and 713 centiliters more than for the treated group. 15.21 Comparing tans a) Kruskal-Wallis test b) There are several possible examples, but all would have one group with ranks 1–3, one with ranks 4–6 and one with ranks 7–9. 15.22 Comparing therapies for anorexia a) H0: Identical population distributions for the three anorexia treatment groups; Ha: Population distributions not all identical. b) Test statistic: 9.07; chi-squared distribution with df = g – 1 = 3 – 1 = 2 From Minitab: Kruskal-Wallis Test on weight change treatment N Median Ave Rank Z cogchange 29 1.4000 37.0 0.15 controlchange 26 -0.3500 28.4 -2.46 famchange 17 9.0000 48.1 2.61 Overall 72 36.5 H = 9.07 DF = 2 P = 0.011 c)
The P-value is 0.011; if the null hypothesis were true, the probability would be 0.011 of getting a test statistic at least as extreme as the value observed. We have strong evidence that the population distributions for the three treatments for anorexia are not all identical.
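The Kruskal-Wallis statistic and P-value reported above come from MINITAB; a minimal Python sketch of the same test is shown below, using made-up weight-change values rather than the actual anorexia data.

    from scipy import stats

    # Made-up weight changes for three treatment groups (illustration only)
    cog     = [1.4, 3.0, -0.5, 2.2, 4.1]
    control = [-0.3, -1.2, 0.4, -2.0, 1.1]
    family  = [9.0, 7.5, 11.2, 6.8, 10.1]

    # Kruskal-Wallis test of H0: identical population distributions
    H, p = stats.kruskal(cog, control, family)
    print("H =", round(H, 2), " df =", 3 - 1, " P-value =", round(p, 4))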
15.23 Internet versus cell phones
a) (i) H0: Population proportion p = 0.50 who use a cell phone more than the Internet; Ha: p ≠ 0.50.
(ii) se = √(0.50(0.50)/n) = √(0.50(0.50)/39) = 0.080; z = (p̂ − 0.50)/se = (0.897 − 0.50)/0.080 = 4.96
(iii) The P-value is 0.000; if the null hypothesis were true, the probability would be near 0 of getting a test statistic at least as extreme as the value observed. We have extremely strong evidence that a majority of countries have more cell phone use than Internet use.
b) This would not be relevant if the data file were comprised only of countries of interest to us. We would know the population parameters so inference would not be relevant.
15.24 Browsing the Internet
a) We would use the Kruskal-Wallis test because there are three political affiliations. H0: Identical population distributions for the 3 groups; Ha: Population distributions not all identical.
b) H = 4.55
From Minitab:
Kruskal-Wallis Test on BrowseInternet
PoliticalAff    N   Median   Ave Rank       Z
1               8    30.00       31.7    0.30
2              36    30.00       26.5   -1.96
3              15    60.00       37.5    1.96
Overall        59     30.0
H = 4.43  DF = 2  P = 0.109
H = 4.55  DF = 2  P = 0.103 (adjusted for ties)
c)
The P-value is 0.10; if the null hypothesis were true, the probability would be 0.10 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that time spent browsing the Internet is independent of political affiliation.
15.25 GPAs
a) We could use the sign test for matched pairs or the Wilcoxon signed-ranks test.
b) One reason for using a nonparametric method is if we suspected that the population distributions were not normal (for example, possibly highly skewed); because we have a one-sided alternative, parametric methods are not then robust.
c) From Minitab:
Wilcoxon Signed Rank Test: CGPA-HSGPA
Test of median = 0.000000 versus median < 0.000000
                N   N for Test   Wilcoxon Statistic       P   Estimated Median
CGPA-HSGPA     59           55                268.5   0.000            -0.1850
If the null hypothesis were true, the probability would be near 0 of getting a test statistic at least as extreme as the value observed. We have very strong evidence that population median high school GPA is higher than population median college GPA.
15.26 Sign test about the GRE workshop
a) H0: Population proportion p = 0.50 who score better on the GRE; Ha: p > 0.50
P(2) = [3!/(2!(3 − 2)!)](0.50)^2(0.50)^1 = 0.375; The more extreme result that all three people would score higher has probability P(3) = (0.50)^3 = 0.125. The P-value is the right-tail probability of the observed result and the more extreme one, that is, 0.375 + 0.125 = 0.50. If the null hypothesis were true, the probability would be 0.50 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that the GRE score difference is not positive.
b) The results are identical. Outliers do not have an effect on this nonparametric statistical method.
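The sign-test P-values in Exercises 15.12, 15.13, and 15.26 are right-tail binomial probabilities with p = 0.50; a minimal Python sketch of that calculation:

    from math import comb

    def sign_test_pvalue(k, n, p=0.5):
        """Right-tail P-value P(X >= k) for X ~ Binomial(n, p)."""
        return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

    print(sign_test_pvalue(2, 3))   # Exercise 15.26: 0.375 + 0.125 = 0.50
    print(sign_test_pvalue(3, 3))   # Exercise 15.13: 0.125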
15.27 Wilcoxon signed-rank test about the GRE workshop
a) H0: Population median of difference scores is 0; Ha: Population median of difference scores is > 0.
Possible samples with the given absolute difference values:

                         Sample
Subject        1     2     3     4     5     6     7     8    Rank of |value|
   1          5.5   5.5  -5.5   5.5  -5.5   5.5  -5.5  -5.5          3
   2         -0.5  -0.5  -0.5   0.5  -0.5   0.5   0.5   0.5          1
   3          1.5  -1.5   1.5   1.5  -1.5  -1.5   1.5  -1.5          2
Sum of ranks for
positive differences    5     3     2     6     0     4     3     1

The rank sum is 5 one-eighth of the time, and is more extreme (i.e., 6) one-eighth of the time. Thus, the P-value is 2/8 = 0.25. If the null hypothesis were true, the probability would be 0.25 of getting a test statistic at least as extreme as the value observed. It is plausible that the null hypothesis is correct and that the population median of difference scores is not positive.
b) The P-value is smaller than in Example 8. Outliers do not have an effect on this nonparametric statistical method.
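The tables in Exercises 15.14, 15.15, and 15.27 enumerate every equally likely assignment of signs to the ranked absolute differences. A minimal Python sketch of that enumeration (assuming no ties among the absolute differences):

    from itertools import product

    def exact_signed_rank_pvalue(diffs):
        """Upper-tail exact P-value for the signed-rank sum, enumerating all
        2^n equally likely sign assignments under H0 (no tied |differences|)."""
        n = len(diffs)
        ranks = [sorted(abs(d) for d in diffs).index(abs(d)) + 1 for d in diffs]
        observed = sum(r for d, r in zip(diffs, ranks) if d > 0)
        sums = [sum(r for s, r in zip(signs, ranks) if s > 0)
                for signs in product([1, -1], repeat=n)]
        return sum(s >= observed for s in sums) / len(sums)

    print(exact_signed_rank_pvalue([5.5, -0.5, 1.5]))   # 2/8 = 0.25, as in Exercise 15.27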
Chapter Problems: Concepts and Investigations 15.28 Student survey The one-page reports will be different for each student, but should include the following findings from technology: From Minitab: Mann-Whitney Test and CI: Newspaper_F, Newspaper_M N Median Newspaper_F 31 3.000 Newspaper_M 29 3.000 Point estimate for ETA1-ETA2 is -0.000 95.1 Percent CI for ETA1-ETA2 is (-2.000,1.000) W = 892.5 Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.4374 The test is significant at 0.4323 (adjusted for ties)
15.29 Why nonparametrics? There are many possible situations. One example is a situation in which the population distribution is likely to be highly skewed and the researcher wants to use a one-sided test. 15.30 Why matched pairs? With a crossover design, we get a score for each treatment for every subject, whereas with an independent samples design, we must assign subjects to one treatment only. 15.31 Complete the analogy Kruskal-Wallis 15.32 Complete the analogy Sign test for matched pairs or the Wilcoxon signed-ranks test 15.33 True or false False 15.34 Multiple choice The best answer is (c). ♦♦15.35 Mann-Whitney statistic a) The proportions are calculated by pairing up the subjects in every possible way, and then counting the number of pairs for which the tanning studio gave a better tan. For the first set of ranks in the chart (lotion: 1,2,3 and studio: 4,5), the possible pairs are as follows with lotion first: (1,4) (1,5) (2,4) (2,5) (3,4) (3,5). In none of these pairs did the studio have the higher rank; therefore, the proportion is 0/6.
15.35 (continued)
b)
   Proportion   Probability
      0/6          1/10
      1/6          1/10
      2/6          2/10
      3/6          2/10
      4/6          2/10
      5/6          1/10
      6/6          1/10
c) The P-value is 2/10. The probability of an observed sample proportion of 5/6 or more extreme (i.e., 6/6) is 2/10 = 0.20.
♦♦15.36 Rank-based correlation
a) The Spearman rank correlation is not affected by an outlier. The largest score in a data set of 30 observations would receive a rank of 30 whether it was bigger than the second largest score by 1 or by 100.
b) The null hypothesis would include the value 0.
♦♦15.37 Nonparametric regression
The nonparametric estimate of the slope is not strongly affected by a regression outlier because we are taking the median of all slopes. The median is not susceptible to outliers. The ordinary slope, on the other hand, takes the magnitude of all observations into account.
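Both the Spearman rank correlation (Exercise 15.36) and the median-of-slopes regression estimate (Exercise 15.37) are available in SciPy; a minimal sketch with made-up data containing one large outlier:

    import numpy as np
    from scipy import stats

    x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    y = np.array([2, 4, 5, 7, 9, 60], dtype=float)         # last point is an outlier

    rho, p = stats.spearmanr(x, y)                          # rank-based correlation
    ts_slope, intercept, lo, hi = stats.theilslopes(y, x)   # median of pairwise slopes
    ols_slope = stats.linregress(x, y).slope                # ordinary least-squares slope

    print(rho)                                              # 1.0 regardless of how large the outlier is
    print(round(ts_slope, 2), round(ols_slope, 2))          # the ordinary slope is pulled up far more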
Part 1 Review: Chapters 1–4: Gathering and Exploring Data
Review Exercises: Practicing the Basics R1.1 Believe in astrology? a) The sample of interest is the 1245 subjects who responded to the General Social Survey question on astrology. The population of interest is the American adult public. b) The observed variable is the subject’s response about whether astrology has scientific truth, which is categorical. c) The sample proportion who responded “definitely or probably true” is 651/1245 = 0.523. R1.2 Time spent in housework a) The response variable is the number of hours spent a week on housework and is quantitative. The explanatory variable is the respondent’s gender and is categorical. b) The population is the American adult public. The sample is the set of 391 females and 292 males who responded to the question on housework. c) The distributions are not bell-shaped because the data extends just past x s for the females and not even this far for the males, yet the data extend far beyond x 3s for both populations. The means are also larger than the medians for these two distributions. These characteristics of the data indicate that the distributions are skewed to the right. R1.3 Best long-term investment a) This variable is categorical since each observation belongs to one of four possible choices (real estate, savings accounts/CDs, stocks/mutual funds, bonds). b) These percentages are statistics since they are calculated from a sample. R1.4 Pay more to reduce global warming? a) One statistic is the percentage of subjects interviewed in the UK who indicated that they would be willing to pay more for energy produced from renewable sources than for energy produced from other sources. b) One parameter is the percentage of the adult population in the UK who would be willing to pay more for energy produced from renewable sources than for energy produced from other sources. c) The descriptive part of this statistical analysis consists of summarizing the responses into the percentage who responded “yes” for each country. d) The inferential part of this statistical analysis consists of the predictions made for each country regarding the percentage of the entire adult population who would answer “yes”. R1.5 Religions a) Religion Frequency Percent Christianity 2.1 2.1/5.1 = 41.18 Islam 1.3 1.3/5.1 = 25.49 Hinduism 0.9 0.9/5.1 = 17.65 Confucianism 0.4 0.4/5.1 = 7.84 Buddhism 0.4 0.4/5.1 = 7.84 Total 5.1 100
R1.5 (continued)
b) [Bar graph of the number of followers (in billions) for each religion: Christianity, Islam, Hinduism, Confucianism, Buddhism]
c)
A mean and median cannot be found for this data because the variable "religion" is not quantitative and cannot be meaningfully ordered. The modal religion is Christianity.
R1.6 Highest degree
a) The modal category is the one with the highest percentage, "High school only".
b) The median is appropriate since the categories can be ordered from lowest to highest amount of education. The median category is the one containing the 50th percentile, "Some college, no degree".
R1.7 Newspaper reading
a) The modal category is the response with the highest frequency, "every day". The median response is the one containing the 50th percentile, "a few times a week".
b) The sample mean number of times per week reading a newspaper is (431(7) + 300(3) + 207(1) + 200(0.5) + 191(0))/(431 + 300 + 207 + 200 + 191) = 3.18. The average number of times per week the respondents spent reading a newspaper is about 3.2. This is less than the mean of 4.4 for the 1994 GSS, perhaps because of the increased popularity of news sites on the internet.
R1.8 Earnings by gender
a) Since both means are quite a bit larger than their respective medians, the distributions of income for each gender are skewed to the right.
b) The mean is (39,890(73.8) + 56,724(83.4))/(73.8 + 83.4) = $48,821.
R1.9 Females in the labor force
a) The mean is (83 + 82 + 72 + 80 + 80 + 81 + 84 + 81)/8 = 80.4. To find the median, first sort the data: 72, 80, 80, 81, 81, 82, 83, 84. The median is the average of the middle two observations, 81.
b) In South America, the mean is (48 + 58 + 52 + 50 + 62 + 40 + 51 + 44 + 45 + 68 + 55)/11 = 52.1. Upon comparing the mean values, female economic activity tends to be lower in South America than in Eastern Europe.
c) Female economic activity is quantitative. Nation is categorical. The response variable is female economic activity.
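The frequency-weighted mean in R1.7(b) and the weighted mean of the two group means in R1.8(b) can be checked with NumPy; a minimal sketch:

    import numpy as np

    # R1.7(b): times per week reading a newspaper, weighted by response counts
    values = np.array([7, 3, 1, 0.5, 0])
    counts = np.array([431, 300, 207, 200, 191])
    print(np.average(values, weights=counts))    # about 3.18

    # R1.8(b): overall mean income, weighting each gender's mean by its group size
    means   = np.array([39890, 56724])
    weights = np.array([73.8, 83.4])
    print(np.average(means, weights=weights))    # about 48,821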
Part 1 Review: Chapters 1–4: Gathering and Exploring Data 3 R1.10 Females working in Europe a) South America Western Europe 40|4| 85|4| 210|5|4 85|5|88 2|6|003 8|6|678 |7|12 |7|68 |8| |8|567 |9|0 Based on the above back-to-back stem-and-leaf plot, it is evident that the values tend to be higher for Western Europe than for South America. b) The standard deviation is much larger for Western Europe than for Eastern Europe. This indicates that the data are more spread out for Western Europe than for Eastern Europe. R1.11 Golf scoring a) Since the correlation is very low the linear association appears to be weak. b) A negative correlation means that as scoring average decreases, the percentage of greens reached in regulation tends to increase. c) The variables with the largest correlation to scoring average are percentage of greens reached in regulation and the average number of putts taken on holes for which the green was reached in regulation. R1.12 Holiday time a) According to the Empirical Rule, all or nearly all of the observations fall within 3 standard deviations of the mean or between 35 – 3 3 = 26 and 35 + 3 3 = 44 days. b) (i) Since the observation of 19 is below the mean, the mean will decrease. (ii) The standard deviation will increase since this point is quite far from the mean (more than 3 standard deviations below). 19 35 c) The z-score for the U.S. is z 5.33. The U.S. has a national mean number of holiday and 3 vacation days in a year that is 5.33 standard deviations below the remaining OECD nations. R1.13 Infant mortality The infant mortality rates range from 1.8 to 17. Since the distances from the median to the minimum and lower quartile are less than the distances from the median to the maximum and upper quartile, the distribution of infant mortality rates is skewed to the right. 25% of the nations had infant mortality rates less than 2.775 and 25% of the nations had infant mortality rates above 5.025. R1.14 Murder rates a) The mean and standard deviation heavily influenced by the outlier, D.C., the median is not. b) The range is more affected by outliers than the interquartile range because it does not take all of the data into consideration, it relies only on the maximum and minimum values. The range of the entire data set is 44 – 1 = 43. When the outlier, 44, is removed, the range is much smaller, 13 – 1 = 12. On the other hand, the interquartile range for both data sets is 6 – 3 = 3. R1.15 Using water a) The most realistic value is 300. –10 is not possible because the standard deviation is always positive. 0 is not possible because the data cover a range of values. Since the data is usually spread between 3 standard deviations of the mean, 10 is too small to be realistic and 1000 is too large. Thus, 300 is the most realistic value for the standard deviation.
R1.15 (continued)
b) The most realistic value is 350. The interquartile range covers the middle 50% of the data. It cannot be negative, nor is it 0 for a data set that covers a range of values. Since the median is 500 and the data ranges between 200 and 1700, the interquartile range is unlikely to be either 10 (too small) or 1500 (too large). The most realistic option is 350.
R1.16 Energy consumption
a) Italy: z = (4222 − 4998)/1786 = −0.43. Italy's value was 0.43 standard deviations below the mean.
b) United States: z = (11,067 − 4998)/1786 = 3.40. The value for the U.S. was 3.4 standard deviations above the mean.
c) Since the U.S. had a z-score of 3.4, its value would be considered unusually high (an outlier) with respect to the EU, assuming the EU energy values were bell-shaped.
R1.17 Human contacts
a) Since the mean is quite a bit larger than the median and the distances from the median to the minimum and lower quartile are much smaller than from the median to the maximum and upper quartile, the distribution is skewed to the right.
b)
R1.18 Attacked in Iraq a) Attacked/Ambushed Armed force/County in which served Yes No Army/Afghanistan 1139 822 Army/Iraq 789 94 Marine/Iraq 764 41 The response variable is whether the members were attacked or ambushed or not. The explanatory variable is whether the member was an Army member serving in Afghanistan, an Army member serving in Iraq, or a Marine serving in Iraq. b) (i) The proportion of those who were attacked given that they served in the Army in Iraq is 789/883 = 0.89. (ii) The proportion of those who were attacked given that they served in the Marines in Iraq is 764/805 = 0.95. R1.19 Opinion about homosexuality a) 282/368 = 0.77, or 77% b) 116/381 = 0.30, or 30% c) Yes, whether or not a person believes that homosexual relations are always wrong seems to depend on whether the person considers themselves a liberal or a fundamentalist. The percentage of liberals surveyed who hold this belief is 30% compared to 77% of fundamentalists.
Part 1 Review: Chapters 1–4: Gathering and Exploring Data 5 R1.20 Iris blossoms a) Many comparisons can be made. For example, for the versicolor variety the petals are only slightly smaller than the sepals, whereas for the setosa variety, the petals are much smaller than the sepals. b) Versicolor petals vary from about 3 cm to just over 5 cm. Setosa petals vary less, from about 1 cm to 2 cm. c) Versicolor, as evidenced by a stronger linear pattern than the setosa in the scatterplot. d) Versicolor; unclear; petal R1.21 How much is a college degree worth? a) Using the two points (0, 28,645) and (4, 51,554) to fit a straight line, we find that the slope is 51,554 28,645 5727.25. 40 b) Age could be a lurking variable, but as it is described here, it would not be responsible for the association. This is because age is positively associated with income but negatively associated with years of education. So, as age increases, income tends to go up and education tends to go down, which has a negative influence on the association between income and education. R1.22 Yours and mother’s education a) For x =10, yˆ 9.592 + 0.35(10) = 13.092 years of education b) The correlation is positive since the slope is positive. R1.23 Child poverty a) Since none of the social expenditures as a percent of gross domestic product were lower than 2%, the y-intercept does not have a meaningful interpretation. If 0 were within the range of x-values, 22 would represent the child poverty rate for a country with 0% of their gross domestic product spent on social expenditures. The slope represents the change in y for a one unit change in x. If social expenditure as a percent of gross domestic product increases by 1%, the child poverty rate is predicted to decrease by 1.3%. b) The predicted poverty rate for the U.S. is 22 – 1.3(2) = 19.4% and the predicted poverty rate for Denmark is 22 – 1.3(16) = 1.2%. c) As social expenditure increases, the child poverty rate tends to decrease. The association between social expenditure and child poverty rate is strong and negative. R1.24 TV and GPA a) The predicted GPA of a high school student who does not watch television is 3.44. For each one hour decrease in the number of weekly hours spent viewing television, high school GPA is predicted to increase by 0.03. b) The predicted GPA of a student who watched 20 hours of television a week is 3.44 – 0.03(20) = 2.84. c) The association between high school GPA and weekly number of hours viewing television is moderately strong and negative. Thus, as the weekly number of hours viewing television decreases, the high school GPA tends to increase. R1.25 U.S. child poverty The y-intercept appears to be about 18 and the slope appears to be about 0.5. The estimate of the slope was found using the approximate points (10, 22.5) and (25, 30). R1.26 Ginger for pain relief a) Randomization was used so that the treatment groups would be as homogeneous as possible with respect to potential lurking variables. This allows any observed differences to be attributed to the treatments rather than to lurking variables or bias. b) The volunteers who were not taking ginger were given fake pills so that they were unaware of which treatment group they were in, they were blinded to the treatment. It is important that the subjects be blinded so that they do not alter their behavior as a result of their treatment assignment.
6 Statistics: The Art and Science of Learning from Data, 4th edition R1.27 Fewer vacations and death If higher SES is responsible for both lower mortality and for more frequent vacations, it is likely that including it in the study would remove their apparent association. In other words, the association between vacationing frequency and frequency of deaths from heart attacks would no longer be found significant for those within the same SES. R1.28 Education and a long life? a) If the typical behavior in a society was for a person to attend school until they were unable due to illness, it is possible that having a longer life could be responsible for having more education: the longer you are alive, the longer you attend school. b) Assuming more education leads to greater wealth which then leads to a longer life, the association between education and longer life could be attributed to income level. In this case, the association between longer life and education level would no longer be significant when income level was taken under consideration. R1.29 Taxes and global warming This is an example of response bias. Although the basis of all three questions is the same, does the respondent favor a tax increase on gasoline, the wording of the last two questions led more of the respondents to answer affirmatively.
Review Exercises: Concepts and Investigations R1.30 Executive pay a) When a distribution is highly skewed, as incomes usually are, the median is a better measure of a typical observation from the population because it is not affected by outliers. The median divides the distribution in half, the observations to either side are only important in terms of how many there are, their magnitude is not a factor. However, the mean takes the magnitude of each observation into consideration so that outliers can have a significant effect on its value. b) The standard deviation is another measure that is influenced by outliers because it is calculated using all of the data values. Any outliers in the data will inflate the standard deviation because their deviations will be unusually large. On the other hand, the IQR is not affected because the magnitude of the outliers is not part of its calculation. It is simply the difference between the third and first quartiles; since any outliers would be above or below these values, their magnitude does not affect the calculation of the IQR. R1.31 Fat, sugar, and health a) The slope is –0.22. b) The slope is 0.23. c) In (a), the correlation would be negative, as the amount of fats and sweets eaten increases, diet costs tend to decrease. In (b), the correlation is positive, as the amount of fruits and vegetables eaten increases, diet costs tend to increase. Also, the correlation has the same sign as the slope. R1.32 Effects of nuclear fallout a) Obvious problems arise when considering an experiment on which the subjects are humans due to the ethical concerns associated with deliberately exposing some group of humans to the radioactive isotope. b) A careful examination that eliminates the effects of any potential lurking variables may help to make the results more convincing. In lieu of the discussion in (a), the relationship between the radioactive isotope and cancer could be studied in an experimental framework in which animals are the subjects of the study. R1.33 Sneezing at benefits of Echinacea Anecdotal evidence is usually not representative of the population. In this example, the customers who believed the Echinacea was effective were the ones who were more likely to come back in and tell the manager good things. These customers are unlikely to be representative of the population. In the randomized experiment, the subjects who received Echinacea were randomized so that the effects of potential lurking variables were minimized. The results of such a study are more trustworthy because the results can be attributed to the treatments rather than to lurking variables.
Part 1 Review: Chapters 1–4: Gathering and Exploring Data 7 R1.34 Compulsive buying Income was found to be a confounding variable in this study. It is associated with both compulsive buying and mean total credit card balance making it difficult to determine the association between these two variables. It is likely that if the compulsive buying behaviors were studied within each income group, there would have been a significant difference in mean credit card balances between those who are compulsive buyers and those who are not. However, when the association was considered for all income groups, the difference was not significant. R1.35 Internet time and age Answers will vary but should include the following: The fitted regression equation for 2004 is yˆ 9.89 0.058 x, where y = WWWHR and x = AGE. Thus, for every year increase in age, the number of hours spent per week on the WWW is expected to decrease by 0.058. The correlation between WWWHR and AGE was found to be –0.08 for 2004. This represents a very weak negative association between the two variables.
Part 2 Review: Chapters 5–7: Probability, Probability Distributions, and Sampling Distributions
Review Exercises: Practicing the Basics R2.1 Vote for Jerry Brown? c
a) P(A ) = 1 – P(A) = 1 0.54 = 0.46 b) The probability for the complement of an event A. R2.2 Correct inferences a) P(A and B) = P(A)P(B) = 0.95(0.95) = 0.9025 b) The multiplication rule for the intersection of independent events was used. R2.3 Embryonic stem cell research a) P(A or B) = P(A) + P(B) = 0.03 + 0.03 = 0.06 b) The addition rule for the union of two disjoint events was used. R2.4 Married and very happy a) Let A denote the event that a randomly selected American adult is married and B the event that a person is reports being “very happy”. Then, P(A and B) = P(B | A)P(A) = 0.40(0.56) = 0.224. b) The form of the multiplication rule for evaluating P(A and B) that does not assume independence was used. R2.5 Heaven and hell a) 1 – 0.85 = 0.15 b) P(heaven and hell) = P(hell | heaven)P(heaven) = 0.85(0.85) = 0.72 R2.6 Environmentally green a) 80 is the number of respondents out of the 825 subjects surveyed who said that they were a member of an environmental group. b) (i) 56/80 = 0.7; of the 80 subjects who were members of an environmental group, 56 said they were willing to pay higher taxes to protect the environment. (ii) 307/745 = 0.412; of the 307 subjects who were not members of an environmental group, 745 said they were willing to pay higher taxes to protect the environment. c) (i) 56/825 = 0.068 (ii) 0.097(0.7) = 0.068 56 438 d) z 0.599 825 R2.7 UK lottery a) 6/49 b) np = 2600(1/13,983,816) = 0.000186 c) You would need to play 13,983,816/52 = 268,920 years to expect to win once. R2.8 SAT quartiles 650 500 1.5, which has a cumulative probability of 0.93. Thus, 1 – 0.93 = 0.07 of the scores 100 fell above 650. b) Since the z-score for the 25th percentile is approximately –0.675, the corresponding score is 500 – 0.675(100) = 433. R2.9 Fraternal bias?
a) P(0) = [4!/(0!(4 − 0)!)](0.80)^0(0.20)^4 = 0.0016
R2.10 Verbal GRE scores
a) (i) z = (500 − 467)/118 = 0.28, which has a cumulative probability of 0.61.
(ii) 1 − 0.61 = 0.39
R2.10 (continued)
b) z = (600 − 467)/118 = 1.13, so 1 − 0.8708 = 0.1292 scored above 600. Thus, of those who scored above 500, 0.1292/0.3897 = 0.332, or 33.2%, scored above 600.
c) Yes. The average score of the 30 students is approximately normal with a mean of 467 and a standard deviation of 118/√30 = 21.54. Thus, if the group of 30 students had a mean score of 600, this would correspond to a z-score of z = (600 − 467)/21.54 = 6.2, which would be very unusual.
d) No. If the population mean and standard deviation are the same, we can assume that the distribution of sample means is still approximately normal by the Central Limit Theorem since n is sufficiently large.
e) 0.15(0.53) = 0.08.
R2.11 Quantitative GRE scores
a) (i) z = (700 − 591)/148 = 0.74, which has a cumulative probability of 0.77.
(ii) 1 − 0.77 = 0.23
b) The distance between the mean and lowest score is much greater than between the mean and highest score. Also, the highest score is only about 1.4 standard deviations above the mean.
c) The sampling distribution of the sample mean quantitative exam score for a random sample of 100 people who take this exam is approximately normal with a mean of 591 and a standard error of 148/√100 = 14.8.
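The z-score and cumulative-probability steps in R2.8, R2.10, and R2.11 can be reproduced with scipy.stats.norm; a minimal sketch:

    from scipy.stats import norm

    mean, sd = 467, 118                                  # verbal GRE scores (R2.10)

    z = (500 - mean) / sd                                # R2.10 a(i)
    print(round(z, 2), round(1 - norm.cdf(z), 2))        # 0.28 and about 0.39

    se = sd / 30 ** 0.5                                  # R2.10 c: sampling distribution of the mean
    print(round((600 - mean) / se, 1))                   # about 6.2

    print(round(norm.ppf(0.25, loc=500, scale=100)))     # R2.8 b: 25th percentile, about 433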
R2.12 Baseball hitting
a) The sampling distribution of the proportion of times the player gets a hit after a season of 500 at-bats is approximately normal with a mean of 0.280 and a standard error of √(p(1 − p)/n) = √(0.28(1 − 0.28)/500) = 0.020.
b) No. z = (0.30 − 0.28)/0.020 = 1.00, which has a cumulative probability of 0.84. Thus, the probability that the player got a hit more than 30% of the time in this season is 1 − 0.84 = 0.16.
R2.13 Ending the war in Afghanistan
a) p = 0.72 and the standard error is √(p(1 − p)/n) = √(0.72(1 − 0.72)/1034) = 0.0140.
b) The standard deviation describes how much we can expect the sample proportion to vary from one sample of size 1034 to the next.
R2.14 Estimating mean text time
a) By the central limit theorem, the sampling distribution of the sample mean text messaging time for the 36 students is approximately normal.
b) The mean is 100 and the standard error is 60/√36 = 10.
c) z = 20/10 = 2, which has a cumulative probability of 0.977. Thus, the probability that the sample mean falls more than 20 minutes above or below the population mean is 2(1 − 0.977) = 0.046.
R2.15 Ice cream sales
a) For the population distribution, the mean is 1000 and the standard deviation is 300.
b) For the data distribution, the mean is 880 and standard deviation is 276.
c) For the sampling distribution of the sample mean for a random sample of 7 daily sales, the mean is 1000 and the standard error is σ/√n = 300/√7 = 113. The standard error describes how much we can expect the sample mean to vary from one sample of 7 daily sales to the next.
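The standard errors in R2.14 and R2.15 describe how much sample means vary from sample to sample; a minimal simulation sketch (the normal shape assumed below is only for illustration) showing that the simulated variability matches σ/√n:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n = 1000, 300, 7          # daily ice cream sales setup from R2.15

    sample_means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

    print(round(sample_means.std(), 1))  # close to the theoretical standard error
    print(round(sigma / n ** 0.5, 1))    # 300 / sqrt(7), about 113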
Part 2 Review: Chapters 5–7: Probability, Probability Distributions, and Sampling Distributions 3 R2.16 Election poll a) The data distribution is the set of 3870 0s and 1s describing whether a voter voted for Fiorina or another candidate (0) or Boxer (1); 2015/3870 = 52% are 1s and the remainder 0s. b) The population distribution is the set of 0s and 1s representing the votes of the 9,534,523 voters in the 5, 218,137 actual election; 0.547, or 54.7%, are 1s and 45.3% are 0s. 9,534,523 c) The sampling distribution of the sample proportion voting for Boxer for a random sample of 3870 0.52(1 0.52) voters is approximately normal with a mean of 0.52 and a standard error of 0.008. 3870 R2.17 NY exit poll Assuming that the proportion voting for Charles Schumer were 0.50, the sampling distribution of the sample proportion who voted for him would be approximately normal with a mean of 0.50 and a standard 0.5(1 0.5) error of 0.0119 . The z-score associated with a sample proportion value of 0.65 would then 1751 0.65 0.50 12.6. Since the associated probability is essentially 0, we would conclude that the be 0.0119 population proportion of those voting for Charles Schumer is in fact larger than 0.50 and predict Charles Schumer to be the winner of the election. R2.18 Rising materialism We need to know n, the sample size.
Review Exercises: Concepts and Investigations
R2.19 Breast cancer gene test
a) P(B | H) = 0.25
b) P(B^c | H^c) = 0.95
c) Tree diagram for 100 women (H = positive test, B = relapse):
   100 Women: H (20) and H^c (80)
   H (20): B (5), B^c (15)
   H^c (80): B (4), B^c (76)
In total, we would expect about 5 + 4 = 9 women to relapse.
R2.20 Exit poll
a) (i) Each of the 3895 votes has two possible outcomes, voted for the legalization of marijuana or voted against. (ii) The probability a randomly selected voter votes for legalization does not change from one voter to the next. (iii) The outcome of one vote does not depend on the outcome of any other vote; the votes are independent.
b) The mean is np = 3895(0.50) = 1947.5 and the standard deviation is √(np(1 − p)) = √(3895(0.5)(0.5)) = 31.20.
R2.20 (continued)
c) Assuming the population proportion is 0.50, we would expect almost all of the sample counts to fall within 3 standard deviations of the mean, or between 1853.9 and 2041.1. If x = 1809, it would be highly unlikely that the population proportion is actually 0.50 and we would predict the ballot measure to fail.
R2.21 Sample means vary
Since samples are merely representative of the population, not the same as the population, we would not expect to obtain the exact same sample mean each time a sample of size n is collected. Rather, we would expect these values to vary about the true population mean. The amount by which the sample means vary is summarized by the sampling distribution of the sample mean, which has a mean of μ and a standard error of σ/√n, where μ is the population mean, σ is the population standard deviation, and n is the sample size.
R2.22 True or false: Data are normal?
False: You expect a histogram of the sampling distribution to look more and more like a normal distribution.
R2.23 True or false: Data and population
True
R2.24 True or false: Measuring variability
True, the standard error will decrease as the sample size increases, thereby making it more likely that the sample proportion is close to the population proportion.
R2.25 True or false: CLT
True
R2.26 Profit variability
The best answer is (a). The standard error of the sample mean charge per day is higher during the week (6/√100 = 0.6) than on the weekend (6/√200 = 0.42) because there are more weekend customers on average. Thus, the mean charges per day vary more during the week than on the weekend, making it more likely that the proportion of weekdays in which the mean charge is less than $10 would be larger than the proportion of weekend days in which the mean charge is less than $10.
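The binomial mean, standard deviation, and 3-standard-deviation range used in R2.20 can be verified directly; a minimal sketch:

    from math import sqrt

    n, p = 3895, 0.50
    mean = n * p                                              # 1947.5
    sd = sqrt(n * p * (1 - p))                                # about 31.2

    print(mean, round(sd, 2))
    print(round(mean - 3 * sd, 1), round(mean + 3 * sd, 1))   # about (1853.9, 2041.1)
    print(round((1809 - mean) / sd, 2))                       # 1809 yes votes lie far below this range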
Part 3 Review: Chapters 8–10: Inferential Statistics
Review Exercises: Practicing the Basics
R3.1 Reincarnation
a) The standard error is √(p̂(1 − p̂)/n) = √(0.20(1 − 0.20)/2303) = 0.0083.
b) The standard error would be twice as large (√4 = 2). In order to increase the precision of estimates, the standard error must get smaller, which happens when the sample size increases. Since the sample size is in the denominator through its square root, it must quadruple for the standard error to be half as large.
R3.2 Environmental regulations
p̂ = 229/1200 = 0.1908; A 95% confidence interval is given by p̂ ± 1.96se, where se is √(p̂(1 − p̂)/n). Thus, a 95% confidence interval is 0.1908 ± 1.96√(0.1908(1 − 0.1908)/1200) = 0.1908 ± 0.0222, or (0.17, 0.21). We are 95% confident that in 2006, between 17 and 21 percent of adult Floridians felt that the environmental regulations were too strict. This inference is based on the sample being random and the sample size being large enough so that both np̂ and n(1 − p̂) are at least 15.
R3.3 Homosexual relations
a)
The point estimate is given by 0.54. A 95% confidence interval is given by p̂ ± 1.96se, where se is √(p̂(1 − p̂)/n). Thus, a 95% confidence interval is 0.54 ± 1.96√(0.54(1 − 0.54)/1200) = 0.54 ± 0.028, or (0.51, 0.57).
b) 1) Assumptions: The variable, whether sexual relations between two adults of the same sex is wrong, is categorical. The samples in the two years are independent random samples. The sample sizes, 1200 each, are large enough to insure that the sampling distribution of the difference between the sample proportions is approximately normal. (Check that the number of successes and failures for each group are greater than 5.)
2) Hypotheses: H0: p1 = p2; Ha: p1 ≠ p2
3) Test statistic: z = 10.2 (From technology)
4) The P-value is approximately 0.
5) Conclusion: Since the P-value is smaller than the significance level of 0.05, we reject the null hypothesis and conclude that the proportion of Floridians who say sexual relations between two adults of the same sex is always wrong has changed from 1988 to 2006. There is extremely strong evidence that it has decreased.
R3.4 Random variability in baseball
1) Assumptions: The variable, whether the team wins a game, is categorical. We will assume that the sample is a random sample (this is not actually the case, but will be assumed in conducting the test). The sample size, 162, is large enough to insure that the sampling distribution of the sample proportion is approximately normal. (Check that the number of successes and failures are both greater than 15.)
2) Hypotheses: H0: p = 0.5; Ha: p > 0.5
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (92/162 − 0.5)/√(0.5(1 − 0.5)/162) = 1.73
4) P-value = P(z > 1.73) = 0.042
5) Conclusion: For a significance level of 0.05, we reject the null hypothesis since the P-value = 0.04 < 0.05. We conclude that this team is a better than "average" team.
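The one-proportion z statistic and P-value in R3.4 through R3.7 all follow the same formula; a minimal Python sketch:

    from math import sqrt
    from scipy.stats import norm

    def one_prop_z(successes, n, p0=0.5, alternative="greater"):
        """z test of H0: p = p0 using the null standard error."""
        p_hat = successes / n
        z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
        if alternative == "greater":
            p_value = 1 - norm.cdf(z)
        elif alternative == "less":
            p_value = norm.cdf(z)
        else:
            p_value = 2 * (1 - norm.cdf(abs(z)))
        return z, p_value

    print(one_prop_z(92, 162))   # R3.4: z of about 1.73, P-value of about 0.042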
R3.5 Reduce services, or raise taxes?
a) 1) Assumptions: The variable, whether to raise taxes or reduce services, is categorical. The sample is a random sample. The sample size, 1200, is large enough to insure that the sampling distribution of the sample proportion is approximately normal. (Check that the number of successes and failures are both greater than 15.)
2) Let p be the proportion of adult Floridians who favor raising taxes to handle the problem. Hypotheses: H0: p = 0.5; Ha: p ≠ 0.5
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.52 − 0.5)/√(0.5(1 − 0.5)/1200) = 1.39
4) P-value = 2P(z > 1.39) = 0.16
5) Conclusion: Since the P-value is not very small, there is not much evidence about whether a majority or minority of Floridians favored raising taxes to handle the government's problem of not having enough money to pay for all of its services.
b) n = p̂(1 − p̂)z²/m² = 0.52(1 − 0.52)(1.96)²/(0.03)² = 1066
R3.6 Florida poll
1) Assumptions: The variable, whether it is appropriate for state government to make laws restricting access to abortion, is categorical. The sample is a random sample. The sample size, 1200, is large enough to insure that the sampling distribution of the sample proportion is approximately normal. (Check that the number of successes and failures are both greater than 15.)
2) Let p be the proportion of adult Floridians who think it is appropriate for state government to make laws restricting access to abortion. Hypotheses: H0: p = 0.5; Ha: p ≠ 0.5
3) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (396/1200 − 0.5)/√(0.5(1 − 0.5)/1200) = −11.8
4) P-value = 2P(z < −11.8), which is approximately 0.
5) Conclusion: There is extremely strong evidence that in 2006 a minority of Floridians thought it was appropriate for state government to make laws restricting access to abortion.
R3.7 "Don't ask, don't tell" opinions
a) The hypotheses are H0: p = 0.5; Ha: p ≠ 0.5. The assumptions are as follows: (i) the variable, whether the "don't ask, don't tell" policy should be repealed, is categorical; (ii) the sample is a random sample; (iii) the sample size, 1029, is large enough to insure that the sampling distribution of the sample proportion is approximately normal. (Check that the number of successes and failures are both greater than 15.)
b) Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.70 − 0.5)/√(0.5(1 − 0.5)/1029) = 12.83; The sample proportion, 0.70, lies 12.83 standard errors above the hypothesized value of 0.5.
c) P-value = 2P(z > 12.83), which is approximately 0. If the null hypothesis is true, the probability of obtaining a sample result at least as extreme as that observed is approximately 0.
d) Since the P-value is very small, there is very strong evidence that in 2010 a majority of U.S. adults believed that the "don't ask, don't tell" policy should be repealed.
R3.8 Compulsive buying
a) The standard error is s/√n = 1706.22/√74 = 198.34. If many studies on compulsive buying behavior were conducted using the same sample size, the sampling distribution of the sample mean credit card balances would have a standard deviation of about $198.34.
b) A 90% confidence interval is given by x̄ ± t0.05(se) = 1333.61 ± 1.67(198.34), or (1002.38, 1664.84). We are 90% confident that the population mean credit card balance for compulsive buyers is between $1,002.38 and $1,664.84.
R3.8 (continued)
c) 1) Assumptions: The variable, credit card balance, is quantitative. The sample is a random sample and we will assume that the population distribution of credit card balances for compulsive buyers is approximately normal.
2) Hypotheses: H0: μ = 1000; Ha: μ ≠ 1000
3) Test statistic: t = (x̄ − μ0)/se = (1333.61 − 1000)/198.34 = 1.68
4) P-value = 2P(t > 1.68) = 0.098; Assuming the population mean is equal to $1000, the probability of obtaining a sample result at least as extreme as that observed is about 0.098.
5) Since the P-value is larger than 0.05, there is not sufficient evidence against the null hypothesis. It is plausible that the population mean credit card balance for college students with compulsive buying tendencies who had debt equals $1000.
R3.9 Sex partners
a) The standard error is s/√n = 8.37/√166 = 0.650.
b) We are 95% confident that in 2008 the population mean number of male sex partners for females between the ages of 20 and 29 was between 3.7 and 6.3. c) Since the standard deviation is more than 1.5 times the mean, the distribution of number of male sex partners for females between the ages of 20 and 29 is probably highly skewed to the right. Note that the smallest value, 0, is only 4.99/8.37 = 0.6, or 0.6 standard deviations below the mean. Since the sample size is quite large, the confidence interval is still valid because the sampling distribution will still be bell-shaped by the Central Limit Theorem. d) The median may be a more appropriate measure of center since the population distribution is likely to be very highly skewed. R3.10 Gas tax and global warming a) The sample mean is x 0.8 and the sample standard deviation is s 0.81. Our estimate of the amount University of Toronto students are willing to pay per gallon of gas in a special tax to encourage people to drive more fuel efficient autos is 0.8 Canadian dollars. The standard deviation tells us how far we can expect a typical observation to fall from the mean. b) The standard error is s n 0.81 20 0.18. If repeated samples of this same size were drawn, we would expect the standard deviation of the sample means to be about 0.18 Canadian dollars. c) A 95% confidence interval is (0.42, 1.18). We are 95% confident that the population mean amount University of Toronto students would be willing to pay per gallon of gas in a special tax to encourage people to drive more fuel efficient autos is between 0.42 and 1.18 Canadian dollars. d) We assume that a random sample was collected and that the population distribution is normal. It is unlikely that the population distribution is normal since the standard deviation is larger than the mean and the minimum value of 0 only about 1 standard deviation from the mean; however, the method for constructing the confidence interval is robust with respect to this assumption so that the inference made in (c) is still valid. R3.11 Legal marijuana? a) The samples are independent since the respondents differ from one year to the next. b) The percentage favoring legalization showed a noteworthy increase in the 70s but dipped back down in the 80s. The percentage in favor of legalization began increasing fairly steadily since 1991. R3.12 Renewable energy a) The response variable is whether the respondent was willing to pay more for energy produced from renewable sources than for energy produced from other sources. The explanatory variable is the respondent’s country of residence. b) Independent samples since the data are not paired. R3.13 Laughter and blood flow The samples are dependent since the same 20 people were observed watching both of the films.
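The t confidence intervals for a mean in R3.8(b) and R3.10(c) use x̄ ± t(se); a minimal sketch of that computation with the R3.10 summary statistics:

    from scipy import stats

    n, xbar, s = 20, 0.8, 0.81             # gas-tax amounts from R3.10
    se = s / n ** 0.5

    t_crit = stats.t.ppf(0.975, df=n - 1)  # 95% confidence
    print(round(xbar - t_crit * se, 2), round(xbar + t_crit * se, 2))   # roughly (0.42, 1.18)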
4 Statistics: The Art and Science of Learning from Data, 4th edition R3.14 European views about Obama a) (i) If the results were based on independent samples, two different random samples of around 1000 adults from each of the 13 European countries would have been taken; one in 2002 and another in 2010. (ii) If the results were based on dependent samples, the same sample of around 1000 adults from each of the 13 European countries would have been questioned in 2002 and again in 2010. b) A confidence interval for the difference between two population proportions will tell us not only if the proportions are significantly different, but also by how much we can expect the population proportions to differ. Since it is highly unlikely that these are dependent samples, the interval can be calculated as pˆ1 1 pˆ1 pˆ 2 1 pˆ 2 follows: pˆ1 pˆ 2 z ( se) where se . n1 n2 R3.15 Listening to rap music a) We are 95% confident that the population proportion of black youth who listen to rap music every day is between 0.09 and 0.17 higher than the population proportion of Hispanic youths who listen to rap music every day. b) In order for it to be plausible that the population proportions are identical for black and Hispanic youths, 0 would need to be included in the interval. R3.16 Offensive portrayal of women False, the observations are not matched simply because the respondents were of the same race. R3.17 Evolution
a) Let p1 denote the proportion of fundamentalists who answered "definitely not true" and p2 denote the proportion of liberals who answered "definitely not true". Then the hypotheses are H0: p1 = p2; Ha: p1 ≠ p2.
b) Test statistic: z = (p̂1 − p̂2 − 0)/se0, where se0 = √(p̂(1 − p̂)(1/n1 + 1/n2)) and p̂ = (190 + 60)/(323 + 309) = 0.3956.
Thus, z = (190/323 − 60/309)/√(0.3956(1 − 0.3956)(1/323 + 1/309)) = 10.13.
c) P-value = 2P(z > 10.13), which is approximately 0.
d) Since the P-value is approximately 0, it is highly unlikely that we would obtain a test statistic as extreme as that observed if the null hypothesis were true. We conclude that the population proportion who responded "definitely not true" when asked if human beings evolved from earlier species of animals is different for those who classify themselves as religious fundamentalists than for those who classify themselves as liberal in their religious beliefs.
R3.18 LAPD searches
a) The sample proportion of African-American drivers whose motor vehicle stop resulted in a search by the LAPD in the first half of 2005 is 12,016/61,188 = 0.1964, with standard error √(0.1964(1 − 0.1964)/61,188) = 0.0016. The sample proportion of white drivers whose motor vehicle stop resulted in a search by the LAPD in the first half of 2005 is 5312/107,892 = 0.0492, with standard error √(0.0492(1 − 0.0492)/107,892) = 0.0007.
R3.18 (continued)
b) 1) Assumptions: the response, whether or not the motor vehicle stop resulted in a search, is categorical for both groups; the groups are independent and we will assume that they represent random samples; the sample sizes are large enough to have at least five successes (searches) and five failures for each group.
2) Let group 1 be African-American drivers and group 2 be white drivers. Then the hypotheses are H0: p1 = p2; Ha: p1 ≠ p2.
3) Test statistic: z = (p̂1 − p̂2 − 0)/se0, where se0 = √(p̂(1 − p̂)(1/n1 + 1/n2)) and p̂ = (12,016 + 5312)/(61,188 + 107,892) = 0.1025. Thus, z = (12,016/61,188 − 5312/107,892)/√(0.1025(1 − 0.1025)(1/61,188 + 1/107,892)) = 95.86.
4) P-value = 2P(z > 95.9), which is approximately 0. 5) Since the P-value is approximately 0, it is highly unlikely that we would obtain a test statistic as extreme as that observed if the null hypothesis were true. We conclude that the population proportion of African-American drivers whose motor vehicle stop resulted in a search by the LAPD in the first half of 2005 differs significantly from the proportion of white drivers whose motor vehicle stop resulted in a search by the LAPD in the first half of 2005. A confidence interval for the difference of proportions would be even more informative. R3.19 No time cooking A 95% confidence interval is given by pˆ1 pˆ 2 1.96( se) where se
pˆ1 1 pˆ1 pˆ 2 1 pˆ 2 . For this n1 n2
0.45(0.55) 0.26(0.74) 0.0216. A 95% confidence interval for the difference between 1219 733 the population proportions of men and women who reported spending no time on cooking and washing up during a typical day is then given by (0.45 0.26) 1.96(0.0216), or (0.15, 0.23). We are 95% confident that the population proportion of men who reported spending no time on cooking and washing up during a typical day is between 0.15 and 0.23 higher than the population proportion of women who responded the same. R3.20 Degrading sexual song lyrics a) We would need to know the sample sizes in order to conduct a statistical inference. b) We are 95% confident that the population proportion of teens who listened to lots of music with degrading sexual messages and then had intercourse within the following two years is between 0.18 and 0.26 higher than the population proportion of teens who did not listen to music with degrading sexual messages and then had intercourse within the following two years. c) Since the P-value is so small, it is unlikely that we would obtain a test statistic as extreme as that observed if the null hypothesis were true. We can conclude that teens who listened to music with degrading sexual messages were more likely to have intercourse within the following two years than teens who did not listen to music with degrading sexual messages. R3.21 Compulsive buying a) 1) Assumptions: the response, whether the respondent is a compulsive buyer, is categorical for both groups; the groups are independent and the samples are random; the sample sizes are large enough so that there are at least five successes and five failures for each group. 2) Let group 1 be the sample of women and group 2 the sample of men. Then the hypotheses are H 0 : p1 p2 ; H a : p1 p2 .
example, se
3) Test statistic: z 0.48 4) P-value = 0.63
6 Statistics: The Art and Science of Learning from Data, 4th edition R3.21 (continued) 5) The P-value is quite large indicating that the test statistic observed is not unusual under the null hypothesis. We are unable to conclude that there is a difference in the population proportions of male and female compulsive buyers. b) A 95% confidence interval is given by (–0.015, 0.025). Since 0 is contained in the interval, we are unable to conclude that there is a difference in the population proportions of males and females who are compulsive buyers. R3.22 Credit card balances a) The difference in the mean credit card balances for compulsive buyers versus non-compulsive buyers is 3399 – 2837 = 562.
b) The standard error is √(s1²/n1 + s2²/n2) = √(5595²/100 + 6335²/1682) = 580.43.
c) 1) Assumptions: the response, credit card balance, is quantitative; the samples are random and independent; we will assume that the credit card balance is approximately normally distributed for each of the two groups.
2) Let group 1 be the compulsive buyers and group 2 be the non-compulsive buyers. Then, the hypotheses are H0: μ1 = μ2; Ha: μ1 ≠ μ2.
3) Test statistic: t = (x̄1 − x̄2 − 0)/se = 562/580.43 = 0.97
4) P-value = 0.34
5) Since the P-value is quite large, the test statistic observed is not unusual under the assumption of the null hypothesis. It is plausible that the population mean credit card balance is the same for compulsive and non-compulsive buyers.
R3.23 Men and women's expectations on chores
a) The 95% confidence interval is given by (21.05, 26.79). We are 95% confident that the mean percentage expectation of time spent on chores is between 21.1% and 26.8% more for women than for men. Since 0 does not fall within this interval, we can conclude that the mean percentage of expected time spent on chores is higher for women than for men.
b) The assumptions are that the samples are random and independent and that the number of minutes per day spent on cooking and washing up is approximately normally distributed for the two groups.
R3.24 More expectations on chores
a) H0: μ1 = μ2; Ha: μ1 ≠ μ2, where group 1 represents women and group 2 represents men.
b) The P-value is 0. The probability of obtaining a test statistic at least as extreme as that observed assuming the null hypothesis is true is close to 0.
c) Since the P-value is less than 0.01, we reject the null hypothesis and conclude that the population mean percentage of expected time spent on chores is higher for women than for men.
d) A Type I error would represent our concluding that there is a difference in the population mean percentage of expected time spent on chores between men and women when, in fact, there is no difference.
R3.25 Loneliness
Answers will vary depending on the current year used.
R3.26 Gas tax revisited
Using technology, t = 1.57 and the P-value is 0.14 for testing H0: μ1 = μ2; Ha: μ1 ≠ μ2. There is not much evidence of a difference between men and women in the population mean amount of tax they are willing to pay per gallon of gas to encourage people to drive more fuel efficient cars.
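The two-sample comparisons in R3.22 and R3.26 use t = (x̄1 − x̄2)/se with se = √(s1²/n1 + s2²/n2); a minimal sketch computing it from the R3.22 summary statistics (the df below is the conservative choice; software uses the Welch formula):

    from math import sqrt
    from scipy.stats import t

    x1, s1, n1 = 3399, 5595, 100           # compulsive buyers (R3.22)
    x2, s2, n2 = 2837, 6335, 1682          # non-compulsive buyers

    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t_stat = (x1 - x2) / se
    df = min(n1, n2) - 1                   # conservative degrees of freedom
    p_two_sided = 2 * (1 - t.cdf(abs(t_stat), df))

    print(round(se, 2), round(t_stat, 2), round(p_two_sided, 2))   # about 580.43, 0.97, 0.34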
Part 3 Review: Chapters 8–10: Inferential Statistics 7 R3.27 Sex partners and gender a) The P-value, 0.000, is the probability of obtaining a test statistic at least as extreme as that observed if the null hypothesis is true. Since the P-value is close to 0, there is extremely strong evidence that the population mean number of sex partners differs for males and females. b) We are 95% confident that the population mean number of sex partners over the past year is between 0.16 and 0.52 more for males than females. The confidence interval not only tells us that there is a significant difference (0 is not contained in the interval), but it also gives us an idea of what the population mean difference is. c) We assume that the samples are independent, random samples from distributions that are approximately normal. R3.28 Are larger female crabs more attractive?
a) x̄1 − x̄2 = 1.7
b) A 90% confidence interval is given by x1 x2 t0.05 se. The confidence interval for the difference between females with a mate and those without a mate is (1.2, 2.2). Since 0 is not contained in the interval, we can conclude that the population mean shell width is larger for female crabs that have a male crab nearby. c) We assume that the samples of female crabs are random and independent. We also assume that the distribution of shell widths is approximately normal for both populations of female crabs. R3.29 Binge eating a) The difference in the proportion of women who suffer from binge eating versus suffer from anorexia is 0.035 – 0.01 = 0.025. b) The relative risk is given by 0.035/0.01= 3.5. Thus, the proportion of women who suffer from binge eating is 3.5 times the proportion of women who suffer from anorexia. R3.30 Motor vehicle fatalities and race a) The relative risk is given by 21.5/8.8 = 2.44. The motor vehicle death rate is about 2.4 times higher for white males than for white females. b) If the risk is 94% higher, that means that the risk for the American Indian/Alaskan Native population is 1.94 times the risk for Whites. R3.31 Improving math scores a)
a) (i) x̄after = (70 + 80 + ⋯ + 97)/10 = 78.0 and x̄before = (60 + 73 + ⋯ + 96)/10 = 71.0; thus, x̄after − x̄before = 78 − 71 = 7.
(ii) To find the mean of the difference scores, we must first calculate the differences: 70 − 60 = 10, 80 − 73 = 7, 40 − 42 = −2, 94 − 88 = 6, 79 − 66 = 13, 86 − 77 = 9, 93 − 90 = 3, 71 − 63 = 8, 70 − 55 = 15, and 97 − 96 = 1. Then x̄d = 70/10 = 7.0, so the two methods give identical answers.
b) t = (x̄d − 0)/(sd/√n) = 7/(5.25/√10) = 4.216; thus, the P-value = 2P(t > 4.216) = 0.002. The probability of obtaining a test statistic at least as extreme as that observed, assuming the null hypothesis is true, is 0.002. Since the P-value is quite small, there is very strong evidence of a difference in the population mean scores before and after the training course.
c) A 90% confidence interval is given by x̄d ± t.05(sd/√n) = 7 ± 1.833(5.25/√10), or (4.0, 10.0). We are 90% confident that the population mean difference in scores after and before taking the training course is between 4.0 and 10.0. Since 0 is not included in the interval, we conclude that the population mean test score was higher after the training course.
d) The 90% confidence interval does not contain 0. Likewise, the significance test has a P-value below 0.10. Each inference suggests that 0 is not a plausible value for the population mean difference.
R3.32 McNemar's test
McNemar's test is used to test equality of two population proportions using two dependent samples.
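The matched-pairs computations in R3.31 can be checked with a few lines of code. A minimal sketch in Python, assuming NumPy and SciPy are available; the before and after scores are taken from the solution above:

    import numpy as np
    from scipy import stats

    # Test scores for the ten students in R3.31
    after  = np.array([70, 80, 40, 94, 79, 86, 93, 71, 70, 97])
    before = np.array([60, 73, 42, 88, 66, 77, 90, 63, 55, 96])

    d = after - before                         # difference scores
    t_stat, p_two_sided = stats.ttest_rel(after, before)
    print(d.mean(), d.std(ddof=1))             # 7.0 and about 5.25
    print(t_stat, p_two_sided)                 # about 4.216 and 0.002

    # 90% confidence interval for the population mean difference
    se = d.std(ddof=1) / np.sqrt(len(d))
    t_crit = stats.t.ppf(0.95, df=len(d) - 1)  # about 1.833
    print(d.mean() - t_crit * se, d.mean() + t_crit * se)   # about (4.0, 10.0)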
Review Exercises: Concepts and Investigations
R3.33 Student survey
Answers will vary but could include a significance test and/or confidence interval. The test steps are:
1) Assumptions: the response, weekly number of times reading the newspaper, is quantitative; the samples are random and independent; the weekly number of times reading a newspaper is approximately normally distributed for each of the two groups (although this assumption is not met, the sample sizes are relatively large and we are conducting a two-sided test, which is robust to violations of this assumption).
2) Let group 1 be females and group 2 be males. Then, the hypotheses are H0: μ1 = μ2 and Ha: μ1 ≠ μ2.
3) Test statistic: t = (x̄1 − x̄2 − 0)/se = −0.82
4) P-value = 0.42
5) Since the P-value is quite large, the test statistic observed is not unusual under the assumption of the null hypothesis. We are unable to conclude that the weekly number of times a person reads the newspaper depends on the person's gender. A 95% confidence interval for the mean difference in weekly number of times reading a newspaper for females versus males is (−2.2, 0.9).
R3.34 Time Spent on WWW
Answers will vary.
R3.35 More people becoming isolated?
The proportion of people who said that they had not discussed matters of importance with anyone over the last six months was 0.089 in 1985 and 0.25 in 2004. To make an inference concerning whether these proportions are statistically different, we can construct a confidence interval. A 95% confidence interval is given by (p̂1 − p̂2) ± z√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = (0.089 − 0.25) ± 1.96√(0.089(1 − 0.089)/1531 + 0.25(1 − 0.25)/1426), or (−0.19, −0.13). Since 0 is not contained in the confidence interval, we can conclude that the population proportion of people who said that they had not discussed matters of importance with anyone over the last 6 months was less in 1985 than in 2004.
R3.36 Parental support and single mothers
a) The statement "For samples of this size, 95% of the time one would expect this difference to be within 3.4 of the true value" refers to the results of a confidence interval. The 95% confidence interval was (46 − 42) ± 3.4, or (0.6, 7.4). For a 99% confidence interval, 99% of the time one would expect the difference (46 − 42) to be within a larger amount than 3.4 (that is, a larger margin of error than for the 95% confidence interval).
b) The conclusion "The mean parental support was 4 units higher for the single-mother households. If the true means were equal, a difference of at least this size could be expected only 2% of the time" refers to the results of a significance test. The P-value for testing for a difference in population means was found to be 0.02. If a one-sided significance test were conducted, the P-value would be 0.01 instead of 0.02.
R3.37 Variability and inference
By the sample size formula, n = σ²z²/m², the needed sample size is proportional to the squared standard deviation. To estimate the mean income for all lawyers in the U.S., we would need a large sample because of the wide range of salaries (large standard deviation) due to differences in specialty, experience, location, etc. To estimate the mean income for all entry-level employees at Burger King restaurants in the U.S., pay is likely to be fairly homogeneous (small standard deviation), so that a small sample would suffice.
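A minimal sketch of this sample-size calculation in Python; the standard deviations and margin of error below are hypothetical values chosen only to illustrate the contrast described above:

    import math

    def sample_size(sigma, z, m):
        # Required n to estimate a mean with margin of error m: n = sigma^2 * z^2 / m^2
        return math.ceil(sigma**2 * z**2 / m**2)

    # Hypothetical standard deviations (in dollars) for the two populations discussed above
    print(sample_size(sigma=40000, z=1.96, m=2000))   # lawyers: large sigma gives a large n (about 1537)
    print(sample_size(sigma=2000, z=1.96, m=2000))    # entry-level fast-food pay: small sigma gives a small n (about 4)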
R3.38 Survey about alcohol
a) The alternative hypothesis usually contains the statement you would like to prove (such as Ha: p > 0.5, if you think a majority favor lowering the drinking age); the null hypothesis is that the parameter equals the "no effect" value (H0: p = 0.5).
b) The P-value is the probability of obtaining sample results as extreme as those observed, assuming the null hypothesis is true. Thus, small P-values support the alternative hypothesis.
c) We reject the null hypothesis falsely (we conclude that a majority of the student body favors reducing the legal age for drinking alcohol when this is not the case).
d) We fail to reject the null hypothesis when it is actually false (we fail to conclude that a majority favors lowering the legal age for drinking alcohol when this is actually the case).
R3.39 Freshman weight gain
If we assume no difference in mean weight gain for the population of freshman men and women, the probability of obtaining a sample difference as large as or larger than that observed would be quite small. This probability is called the P-value, and when it is small enough, it contradicts the statement of no difference, providing us with sufficient evidence to reject this statement and conclude that the two groups have differing mean weight gains.
R3.40 Practical significance
Statistical significance means that the sample results were "extreme" enough to reject the null hypothesis of no difference. However, the difference might be so small that it is of no practical value. In fact, with a large enough sample size, any difference, however small, can be found "statistically significant," although practically it may be of little or no value.
R3.41 Overweight teenagers
Yes, an increase of 12% in the percentage of teenagers who are overweight seems practically significant as well. Statistical significance means that the sample results were "extreme" enough to reject the null hypothesis of no difference. The results are practically significant if they have practical value; this must be determined by the investigator.
R3.42 True or false?
False; statistical inference methods using the t distribution are robust to violations of the normality assumption.
♦♦R3.43 Comparing literacy
Although the sample sizes are needed to calculate the t-score used in the confidence interval, a z-score can be used when the sample sizes are sufficiently large and similar. Assuming that this is the case, a 95% confidence interval comparing the difference in population means for Canada and the U.S. is given by (286.9 − 277.9) ± 1.96√(3² + 2²), which is (1.9, 16.1). We are 95% confident that the difference in population mean prose literacy scores for Canada and the U.S. is between 1.9 and 16.1. Since 0 is not contained in the interval, we can conclude that the population mean prose literacy score is higher in Canada than in the U.S.
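A quick numerical check of the R3.43 interval in Python; the values 3 and 2 are read as the two reported standard errors in the calculation above:

    import math

    diff = 286.9 - 277.9                 # difference in sample mean literacy scores
    se = math.sqrt(3**2 + 2**2)          # standard error of the difference
    margin = 1.96 * se
    print(diff - margin, diff + margin)  # about (1.9, 16.1)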
♦♦R3.44 Effect size
a) The effect size is (x̄1 − x̄2)/s = (3.01 − (−0.45))/7.64 = 0.45.
b) The difference between the sample means is 0.45 times the standard deviation estimate, that is, less than half a standard deviation. This effect size represents a relatively small difference, in practical terms.
♦♦R3.45 Margins of error
a) The margin of error is given by z√(p̂(1 − p̂)/n) = 1.96√(0.6(0.4)/1000) ≈ 0.03. The limits for either 40% or 60% are 3.0 points.
b) By the formula given in (a), we see that the margin of error will change for different sample proportions and tends to get smaller as the sample proportion moves toward 0 or 1.
c) Since the numerator of the standard error contains the term p̂(1 − p̂), the margin of error will be the same for a particular sample proportion and for 1 minus that value.
♦♦R3.46 Prayer study
a) The association may be a weak one, so it was unlikely to be detected with the sample size used in this study. Or, perhaps prayer is only effective if done by someone emotionally close to the subject.
b) It is possible that the effectiveness of the prayers is confounded with whether or not the prayers were heartfelt. In other words, it may be the case that prayers offered by the patient's loved ones would be effective, but prayers offered by strangers who do not have a personal relationship with the patient are ineffective.
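The behavior described in parts (b) and (c) of R3.45 can be seen directly by evaluating the margin-of-error formula over several sample proportions; a minimal sketch in Python:

    import math

    def margin_of_error(p_hat, n, z=1.96):
        # z * sqrt(p_hat * (1 - p_hat) / n)
        return z * math.sqrt(p_hat * (1 - p_hat) / n)

    n = 1000
    for p_hat in [0.1, 0.4, 0.5, 0.6, 0.9]:
        print(p_hat, round(margin_of_error(p_hat, n), 4))
    # The margin is largest at p_hat = 0.5, shrinks as p_hat moves toward 0 or 1,
    # and is identical for p_hat and 1 - p_hat (e.g., 0.4 and 0.6 both give about 0.03).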
Part 4 Review: Chapters 11–15: Analyzing Associations and Extended Statistical Methods
Review Exercises: Practicing the Basics
R4.1 Gender and opinion about abortion
a) Conditional proportions on "Unrestricted abortion should be legal?":
   Gender    Yes    No
   Male      0.40   0.60
   Female    0.40   0.60
b) Independent, since the opinion of the respondent is the same regardless of the respondent's gender.
R4.2 Opinion depends on party?
The association appears to be strong. The difference between Republicans and Democrats who approved of President Barack Obama's performance is 0.72 − 0.13 = 0.59, which is very large. Based on the poll, whether or not someone approves of President Obama's performance depends strongly on his or her political party.
R4.3 Murders and gender
a) From MINITAB (expected counts are printed below observed counts; chi-square contributions are printed below expected counts):
            Female      Male    Total
      1        182       484      666
            195.89    470.11
             0.986     0.411
      2       1719      4078     5797
           1705.11   4091.89
             0.113     0.047
  Total       1901      4562     6463
  Chi-Sq = 1.557, DF = 1, P-Value = 0.212
The test statistic is X² = 1.56, with a P-value of 0.21.
b) The difference in the proportions of male versus female offenders when the victim was female is 1719/1901 − 182/1901 = 0.904 − 0.096 = 0.808. The difference in the proportions of male versus female offenders when the victim was male is 4078/4562 − 484/4562 = 0.894 − 0.106 = 0.788. Among female offenders, 182/666 = 0.273 had female victims. Among male offenders, 1719/5797 = 0.297 had female victims. Male offenders were 1.09 times as likely to have a female victim as were female offenders.
R4.4 Change in opinion
a) The P-value is 0. If it is true that whether or not someone agrees with the statement "It is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family" is independent of the year of the survey, the probability of obtaining sample results at least as extreme as those observed is close to 0.
b) The standardized residuals tell us how far the observed values are from the expected values under the null hypothesis of independence. There were more people who agreed in 1977 and disagreed in 2008 than would be expected if the variables, agree/disagree and year of survey, were independent (these residuals are large and positive), and there were fewer people who disagreed in 1977 and agreed in 2008 than would be expected if the variables were independent (these residuals are large and negative).
R4.5 Ma and Pa Education
The entry for row i, column j gives the correlation between the variable in row i and the variable in column j. The correlation tells us the strength of the linear association between two variables. Note that when i = j, the correlation is 1 since every variable is perfectly correlated with itself. Mother's education and father's education have the strongest linear association, with a correlation of 0.68. All of the linear associations are positive (as one variable increases, so does the other tend to increase) since all of the correlations are positive.
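The chi-squared computation in R4.3(a) can be reproduced from the observed counts. A minimal sketch in Python, assuming NumPy and SciPy are available; correction=False turns off the Yates continuity correction so the output matches the MINITAB values shown above:

    import numpy as np
    from scipy.stats import chi2_contingency

    # 2x2 table of observed counts from the MINITAB output in R4.3(a)
    observed = np.array([[182, 484],
                         [1719, 4078]])

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(chi2, dof, p)      # about 1.557, 1, 0.212
    print(expected)          # about [[195.89, 470.11], [1705.11, 4091.89]]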
R4.6 Mother's education and yours
a)
1) Hypotheses: The null hypothesis that the variables are independent is H0: β = 0. The two-sided alternative hypothesis of dependence is Ha: β ≠ 0.
2) Test statistic: t = b/se = 0.358/0.017 = 21.06
3) P-value: The P-value is approximately 0.
4) Conclusion: If H0 were true that the population slope β = 0, it would be extremely unusual (the probability would be close to 0) to get a sample slope at least as far from 0 as b = 0.358. The P-value gives very strong evidence that an association exists between number of years of education and number of years of mother's education.
b) The 95% confidence interval is b ± t.025(se) = 0.358 ± 1.96(0.017), or (0.325, 0.391).
c) The population regression equation is linear, the data are gathered using randomization, and the population y values at each x value have a normal distribution, with the same standard deviation at each x value. A scatterplot of the data should be drawn to determine whether the relationship between x and y appears to be linear. Once the regression model has been fitted, a histogram of the residuals is useful for checking the assumption that the conditional distribution of y is normal (in which case the residuals should have an approximate bell-shaped histogram).
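The slope inference in R4.6 uses only the reported estimate and standard error; a minimal arithmetic sketch in Python, using the normal critical value 1.96 as in the solution above:

    b, se = 0.358, 0.017
    t_stat = b / se
    ci = (b - 1.96 * se, b + 1.96 * se)
    print(round(t_stat, 2), ci)   # about 21.06 and (0.325, 0.391)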
R4.7 Fertility and contraception
a) ŷ = 6.663 − 0.06484x
(i)
ŷ = 6.663 − 0.0648(0) = 6.66
(ii) ŷ = 6.663 − 0.0648(100) = 0.18
We can obtain the difference between these by using the slope. The difference is the change in x (100 − 0) times the estimated slope (−0.0648).
b) ŷ = 6.663 − 0.0648(51) = 3.4; the residual is y − ŷ = 1.3 − 3.4 = −2.1.
c)
The standardized residual of −2.97 indicates that the observation for Belgium was 2.97 standard errors below the predicted value from the regression line.
R4.8 Association between fertility and contraception
a)
r² = 37.505/47.644 = 0.787; the error using ŷ to predict y is 78.7% smaller than the error using ȳ to predict y.
b) The slope is negative, and √0.787 = 0.887; thus, the correlation is −0.887. There is a strong, negative linear association between fertility and contraception.
R4.9 Predicting body fat
a) r² is 0.892. This indicates a strong association.
b) The P-value of 0.000 indicates that if H0 were true that the population slope β = 0, it would be extremely unusual to get a sample slope as far from 0 as b = 1.338.
c) The 95% confidence interval is in agreement because 0 is not included in the interval and is therefore not a plausible value for the slope.
d) The residual standard deviation is the square root of the residual MS: √2.7 = 1.64. The residual standard deviation describes the typical size of the residuals and estimates the standard deviation of y at a fixed value of x.
R4.10 Body weight and lean body mass
a) The "T" value investigates how much evidence there is that the variables of total body weight and lean body mass truly are associated. It tells us that the sample slope falls nearly 18 standard errors above the null hypothesis value of 0 for the population slope.
b) The P-value of 0 indicates that if H0 were true and the population slope β = 0, it would be extremely unusual to get a sample slope as far from 0 as the observed one of b = 1.23.
R4.10 (continued)
c) S = 6.96 is the residual standard deviation, which describes the typical size of the residuals and estimates the standard deviation of TBW at a fixed value of LBM. r² = 0.838 is the proportional reduction in error. The error using ŷ to predict y is 84% smaller than the error using ȳ to predict y.
R4.11 Using the Internet
a) No, the slopes are in different units.
b) Yes, the slopes measure the change in the percentage of those using either the Internet or Facebook for a one-unit ($1000) change in GDP. The first slope, 0.0157, predicts that for every $1000 increase in GDP, the percentage using the Internet will increase by 0.016, and the second tells us that for every $1000 increase in GDP, the percentage using Facebook will increase by 0.0075. The impact of GDP is greater on the percentage using the Internet than on the percentage using Facebook.
R4.12 Fertility rate and GDP
a) 3.15 represents the fertility rate when per capita GDP is $0; 0.81 represents the multiplicative effect on the mean of the fertility rate for each $10,000 increase in GDP. For every one-unit ($10,000) increase in per capita GDP, the fertility rate is multiplied by 0.81, a 19% decrease.
b) (i) For x = 1, ŷ = 3.15(0.81)¹ = 2.55
   (ii) For x = 4, ŷ = 3.15(0.81)⁴ = 1.36
R4.13 Growth of Wikipedia
a) 100,000 represents the predicted number of English-language articles in Wikipedia as of January 1, 2003. 2.1 represents the estimated multiplicative effect on the mean of y for each one-year change in x.
b) (i)
ŷ = 100,000(2.1)⁵ = 4,084,101
(ii) ŷ = 100,000(2.1)¹⁰ = 166,798,810
2013 is too far into the future from the observed data on which the model was built to trust the prediction; 2008 may be as well. Since posting articles on the Internet is a relatively new practice, it is possible that the number of articles posted in a given time period will level off, thereby changing the relationship between x and y. Caution should be used in making predictions into the future based on this model.
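The exponential-model predictions in R4.12 and R4.13 have the form ŷ = a·bˣ. A minimal sketch in Python using the fitted values quoted above:

    def exp_predict(a, b, x):
        # Exponential regression prediction: y-hat = a * b**x
        return a * b**x

    # R4.12: fertility rate versus per capita GDP (x in units of $10,000)
    print(exp_predict(3.15, 0.81, 1), exp_predict(3.15, 0.81, 4))       # about 2.55 and 1.36

    # R4.13: Wikipedia article counts (x = years since January 1, 2003)
    print(exp_predict(100_000, 2.1, 5), exp_predict(100_000, 2.1, 10))  # about 4.08 million and 167 million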
R4.14 Distance of college from home
a) ŷ = 222.5 + 34.18x1 + 24.6x2; for each one-year increase in age, the distance from home is predicted to increase by 34.18 miles, assuming gender remains fixed. For females, the predicted distance is 24.6 miles farther, holding age constant.
b) (i) ŷ = 247.1 + 34.18x1
   (ii) ŷ = 222.5 + 34.18x1
The y-intercept is 24.6 miles less for males. In other words, the predicted distance from home is 24.6 miles closer for men at a given age than for women of the same age.
c) R² = 3% gives the relative improvement from using ŷ to predict y instead of using ȳ to predict y.
d) The F-statistic tells us whether the explanatory variables collectively have a statistically significant effect on the response variable y. When the F-statistic is large, the model is effective in predicting y. For this example, F = 0.87, with a corresponding P-value of 0.426. Since the P-value is quite large, the data do not support rejection of the null hypothesis, which is that all of the beta parameters in the model are 0.
e) The t-statistic for the age predictor tells us whether the age explanatory variable is an effective predictor of y. In this case, t = 1.32, with a corresponding P-value of 0.194. Thus, the age variable is not effective in predicting y. The t-statistic for the gender predictor tells us whether gender is an effective predictor of y. In this case, t = 0.06, with a corresponding P-value of 0.955. Thus, gender is not effective in predicting y either.
R4.14 (continued)
f) [Scatterplot of distance_home vs. age, with points marked by gender (f or m); distance_home runs from 0 to about 9000 miles and age from about 20 to 70.]
Looking at the scatterplot, there does not appear to be a linear relationship between the response, distance from home, and the predictor, age. Thus, a linear regression model is not valid.
R4.15 Baseball offensive production
a) Since the coefficient of HR has the greatest magnitude (1.48), it has the largest effect on ŷ for a one-unit increase.
b) As the number of stolen bases increases, the predicted number of runs scored will increase, since the coefficient for SB is positive. The coefficient for CS is negative, so that as the number caught stealing increases, the predicted number of runs scored will decrease.
c) ŷ = 100 + 0.59(600) + 0.71(100) + 0.91(10) + 1.48(200) + 0.30(300) + 0.27(40) − 0.14(4000) − 0.20(20) = 367
R4.16 Correlates of fertility
a) Since the largest correlation (in absolute value) of a single predictor with y is 0.661, the multiple correlation must be at least this large. Adding additional predictors to the model will never decrease the predictive power of the model, although it may stay the same. Thus, R will never decrease when additional predictors are added to the model.
b) R describes the association between adolescent fertility rate and the set of explanatory variables, adult literacy rate and combined enrollment. It is the correlation between the observed adolescent fertility values and those predicted using adult literacy rate and combined enrollment as explanatory variables. R = 0.67 indicates a strong association.
c) R² = 0.67² = 0.45; R² gives the proportional reduction in error using ŷ (with ALR and CE as explanatory variables) to predict y rather than using ȳ to predict y.
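The prediction in R4.15(c) is just the linear combination of the team's statistics with the estimated coefficients; a minimal sketch in Python, using the coefficients and values as reconstructed in R4.15(c) above:

    # Coefficients and predictor values as read from R4.15(c)
    intercept = 100
    coefs  = [0.59, 0.71, 0.91, 1.48, 0.30, 0.27, -0.14, -0.20]
    values = [600, 100, 10, 200, 300, 40, 4000, 20]

    y_hat = intercept + sum(c * v for c, v in zip(coefs, values))
    print(round(y_hat))   # about 367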
R4.17 Attitudinal research
a) For every one-year increase in education, the predicted score on the 4-point scale determining one's attitude toward homosexuality increases by 0.09 points, all other predictors held constant. The positive sign means that as years of education increase, feelings toward homosexuality become more accepting, all other explanatory variables being held constant.
b) If the respondent is a political conservative, the predicted score on the 4-point scale is 0.49 points less than if they are not, all other explanatory variables being held constant. The negative sign means that political conservatives are predicted to be less accepting of homosexual relations, all other predictive variables being held constant.
c) ŷ = 1.53 + 0.09x1 − 0.01(20) − 0.49(1) − 0.39(1) − 0.15(0) = 0.45 + 0.09x1
For x1 = 10, ŷ = 0.45 + 0.09(10) = 1.35; for x1 = 20, ŷ = 0.45 + 0.09(20) = 2.25
R4.18 Interaction between SES and age in quality of health
[Sketch: quality of health (vertical axis) plotted against SES (horizontal axis), with separate lines for Age = high and Age = low; the lines have different slopes, showing interaction between SES and age.]
R4.19 Protecting children in car crashes
a) A logistic regression model is appropriate since the response is categorical (child was injured or not).
b) When an explanatory variable is a binary categorical variable, it can be included by adding an indicator variable to the regression equation. For example, let x1 = 1 for an SUV and x1 = 0 for a regular sedan, x2 = 1 if the child was using a restraint and 0 otherwise, x3 = 1 if the vehicle rolled over and 0 otherwise, and x4 = the weight of the vehicle.
c) Wearing a restraint will decrease the predicted probability of injury, so the effect of x2 should be negative.
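A minimal sketch of how the indicator coding in R4.19(b) feeds into a logistic regression prediction; the coefficient values below are hypothetical and are shown only to illustrate the form of the model:

    import math

    def predicted_probability(alpha, betas, xs):
        # Logistic regression: P(injury) = 1 / (1 + exp(-(alpha + sum of beta_i * x_i)))
        linear = alpha + sum(b * x for b, x in zip(betas, xs))
        return 1 / (1 + math.exp(-linear))

    # Hypothetical coefficients for: SUV indicator, restraint indicator, rollover indicator, vehicle weight (1000s of lb)
    alpha = -2.0
    betas = [0.3, -1.1, 1.5, -0.2]

    # x1 = 1 (SUV), x2 = restraint indicator, x3 = 0 (no rollover), x4 = 4 (a 4000-lb vehicle)
    print(predicted_probability(alpha, betas, [1, 1, 0, 4]))   # restrained child
    print(predicted_probability(alpha, betas, [1, 0, 0, 4]))   # same scenario without a restraint: higher probability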
R4.20 Political ideology and party affiliation
a) The response variable is political ideology score. The factor is the respondent's political party.
b) H0: μ1 = μ2 = μ3; Ha: at least two of the population means are unequal, where μ1 denotes the population mean political ideology score for Democrats, μ2 denotes the population mean political ideology score for Independents, and μ3 denotes the population mean political ideology score for Republicans.
c) Since the F-statistic has a corresponding P-value of approximately 0, there is sufficient evidence to reject the null hypothesis. The probability of getting an F-statistic at least as large as that observed is essentially 0 under the null hypothesis. We can conclude that the population mean political ideology scores differ for at least two of the political parties.
d) The assumptions are independent random samples from normal population distributions with equal standard deviations, which seem plausible in this case.
R4.21 MS error for ideology
a) √1.581 = 1.257
b) The assumption seems reasonable since the sample standard deviations for each group are quite close (1.32, 1.23, 1.17).
c) We can be 95% confident that the population mean political ideology score for Republicans is between 1.35 and 1.60 points higher than the population mean political ideology score for Democrats. In other words, Republicans tend to be more conservative than Democrats.
R4.22 Income and education
a) (i) ŷ = 20 + 23(0) + 17(0) = 20
(ii) ŷ = 20 + 23(1) + 17(0) = 43
b) The estimated population mean income in 2005 for college graduates is $23,000 higher than for high school graduates when gender is held constant.
R4.23 Nonparametric rank test
The Wilcoxon test is a nonparametric test for comparing two groups using independent samples. The test uses the ranks of the observations in the two groups and compares the resulting test statistic to what would be expected if the population distributions were identical. If the test statistic is sufficiently different from what is expected, the null hypothesis is rejected and we conclude that the two samples come from different populations. The hypotheses are H0: identical population distributions for the two groups, versus either Ha: different expected values for the sample mean ranks (two-sided test) or Ha: higher expected value for the sample mean rank for a specified group (one-sided test). For large-sample tests, software usually calculates the P-value based on a normal distribution approximation to the sampling distribution of the rank sum.
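A minimal sketch of the Wilcoxon rank-sum test in Python, assuming SciPy is available; the two small samples below are hypothetical and are used only to show the mechanics:

    from scipy.stats import ranksums

    # Hypothetical independent samples from two groups
    group1 = [12, 15, 14, 10, 18, 16]
    group2 = [9, 11, 8, 13, 7, 10]

    stat, p_value = ranksums(group1, group2)
    print(stat, p_value)   # P-value from the large-sample normal approximation, as described above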
Review Exercises: Concepts and Investigations
R4.24 Racial prejudice
1) Assumptions: random sampling was used and the sample sizes are large enough so that the expected cell counts are at least 5.
2) H0: religious preference and whether or not one favors laws against interracial marriage are independent; Ha: religious preference and whether or not one favors laws against interracial marriage are dependent.
3) From MINITAB: X² = 28.2
4) The P-value is 0.000.
5) If the variables are independent, the probability of obtaining a test statistic at least as large as that observed is essentially 0. Thus, we reject the null hypothesis and conclude that whether or not one favors laws against interracial marriage depends on one's religious preference. One could also do a residual analysis and compare groups using a difference of proportions.
R4.25 GPA and TV watching
Answers will vary.
R4.26 Analyze your data
Answers will vary.
R4.27 Review research literature
Answers will vary.
R4.28 Predicting college success
This could summarize the result of a regression analysis using college GPA as the response and the various factors considered by the admissions officers as the explanatory variables. The 30% refers to R² and gives the proportional reduction in error using ŷ to predict y rather than using ȳ to predict y.
R4.29 Regression toward mean
For any particular mother's height, the predicted daughter's height will be relatively closer to its mean than the mother's height is to its mean. For example, for a very tall mother, we predict that the daughter will tend to be taller than average but not as tall as the mother.
R4.30 Why ANOVA?
One-way ANOVA is a method used to compare the population means of several groups. The null hypothesis is that all of the groups come from populations with equal means, and the alternative is that at least two of these population means differ. If the null hypothesis is rejected, we can compute confidence intervals for pairs of means to determine which population means are different as well as how different they are.
R4.31 Variability in ANOVA
If the variation between the groups is relatively large, as in the first graph, the F-statistic will be large, indicating that the population means are likely to differ. The F-statistic increases and the P-value decreases as the between-groups variability increases. As the variability within the groups increases, the F-statistic will decrease and there is less evidence that the population means differ (second graph).
R4.31 (continued)
[Dot plots of the samples on a common 4-to-10 scale: in the first graph the groups are well separated relative to the spread within each group, while in the second graph the within-group variability is much larger and the groups overlap.]
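The contrast shown in these two graphs can also be seen numerically with a one-way ANOVA. A minimal sketch in Python, assuming SciPy is available, using two small hypothetical data sets rather than the data pictured above:

    from scipy.stats import f_oneway

    # Scenario 1: groups well separated relative to their within-group spread -> very large F, P-value near 0
    a1, b1, c1 = [5.0, 5.2, 4.9, 5.1], [6.9, 7.1, 7.0, 7.2], [8.9, 9.1, 9.0, 8.8]
    print(f_oneway(a1, b1, c1))

    # Scenario 2: similar group means, but much larger within-group variability -> F close to 1, large P-value
    a2, b2, c2 = [1.0, 9.0, 3.0, 7.2], [3.5, 10.6, 3.2, 10.7], [5.0, 13.0, 6.0, 12.0]
    print(f_oneway(a2, b2, c2))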
R4.32 Violating regression assumptions
a) Exponential regression model
b) Logistic regression model
R4.33 True or false?
True