Statistics: Unlocking the Power of Data, 3rd Edition, by Robin H. Lock
Percentages and Proportions on a Number Line
Learning Objectives:
• Given a percent, convert it to a decimal and then locate it on a number line.
• Given a point on a number line, write it in decimal form (as precisely as you can) and represent it as a percent.
• Compare two numbers to say whether they are the same number or, if not, which one is smaller and which one is larger.
Percentages and Proportions in Context: Proportions to Percentages and Vice Versa: In statistics classes, we deal with numbers expressed as decimals and percents much more often than with whole numbers. It is crucial that students become as comfortable working with decimals and percents as they are with the whole numbers they worked with in earlier math classes.
Statistics Example 1: In class, 8 out of 25 students made an A on Test 1.
1. What is the proportion of students who made an A on Test 1?
2. What is the percent of students who made an A on Test 1?
Solution:
1. 8/25 = 0.32
2. 8/25 = 0.32 = 32%
This worksheet will provide more explanation and practice of these ideas.
Which decimal number is smaller? Sometimes when you compare two decimal numbers, they do not have the same number of decimal places. If it looks hard to compare them, the first step is to rewrite them so they have the same number of decimal places. When you do that, the comparison will look much easier.
Statistics Example 2: Consider the two numbers 0.005 and 0.0034. Discussion: It is easy for a person to look at this and think "34 is larger than 5, so the second one here must be larger." That is a mistake.
Concepts needed for statistics:
• Given a percent, immediately see what the equivalent decimal expression is.
• Given a decimal, immediately see what the equivalent percent expression is.
• Compare two numbers (given as either decimals or percents) to say whether they are the same or, if not, which is larger and which is smaller.
• In statistics class, we refer to the decimal equivalent of a percentage as a "proportion."
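As a quick illustration of these concepts, here is a minimal Python sketch of the Statistics Example 1 computation (the variable names are mine, not from the text): a count out of a total becomes a proportion, and the proportion becomes a percent.

```python
# Hypothetical helper values for Statistics Example 1 (8 A's out of 25 students).
a_count = 8
class_size = 25

proportion = a_count / class_size   # 8/25 = 0.32, the "proportion"
percent = proportion * 100          # move the decimal two places right -> 32.0

print(proportion)          # 0.32
print(f"{percent:.0f}%")   # 32%
```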
This worksheet covers all of these skills, because comparing two numbers to decide which is smaller or larger is done by visualizing where those numbers are on a number line. Of the two numbers, the one to the right of the other on the number line is larger.
Plotting Percentages on a Number Line
When plotting a percent on a number line, first convert the percent to a decimal; going the other direction, a decimal can be converted to a percent.
• When changing a decimal to a percent, we move the decimal point two places to the right.
• When changing a percent to a decimal, we move the decimal point two places to the left.
EXAMPLE A1: Given a percent, locate it on a number line
Problem: Locate 72% on a number line.
Step 1: Convert the percent to a decimal by moving the decimal point two places to the left. 72% = 0.72
Step 2: Locate the number on a number line. Refer to Figure A1. Since 0.72 is located slightly to the right of 0.7 on the number line, this is where 72% is located.
Answer: See Figure A1.
FIGURE A1 Graph of 72%: a number line from −1.0 to 1.0, with tick marks every 0.2 and 72% marked at 0.72.
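A small sketch of Steps 1 and 2 done by computer, assuming tick marks every 0.2 as in Figure A1 (the tick spacing and variable names are my assumptions, not part of the worksheet):

```python
# Convert 72% to a decimal and find which tick marks it falls between.
percent = 72
value = percent / 100                                    # 0.72

ticks = [round(-1.0 + 0.2 * k, 1) for k in range(11)]    # -1.0, -0.8, ..., 1.0
left = max(t for t in ticks if t <= value)               # 0.6
right = min(t for t in ticks if t >= value)               # 0.8
print(f"{percent}% = {value}, between {left} and {right} on the number line")
```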
EXAMPLE A2: Given a point on a line, represent it as a percent
Problem: Given the diagram below, represent the approximate value of the point as a percent.
FIGURE A2 Graph of a decimal number: a number line from 2.2 to 4.2, with a point marked about halfway between 3.4 and 3.6.
Step 1: Approximate the location of the point as a decimal. Refer to Figure A2. Since the point is approximately halfway between 3.4 and 3.6, the value of the point is approximately 3.5.
Step 2: Convert the value from a decimal to a percent by moving the decimal point two places to the right. 3.5 = 350%
Answer: 350%
EXAMPLE A3: Given a point on a line, represent it as a percent
Problem: Given the diagram below, decide whether the value of the point is best represented by −213%, −84%, or 30.7%.
FIGURE A3 Estimating a percent: a number line from −3 to 3, with a point marked between 0 and 1, closer to 0.
Step 1: Approximate the location of the point as a decimal. Refer to Figure A3. Since the point is between 0 and 1 but closer to 0, the value of the point is approximately 0.3.
Step 2: Convert the value from a decimal to a percent by moving the decimal point two places to the right. 0.3 = 30%
Step 3: Choose the answer choice that is closest to your approximation. Of the choices, 30.7% is closest to 30%, so the correct answer is 30.7%.
Answer: 30.7%
EXAMPLE A4: Which decimal number is larger?
Consider the two numbers 0.005 and 0.0034. Discussion: It is easy for a person to look at this and think "34 is larger than 5, so the second one here must be larger." That is a mistake. Which of these two numbers is larger: 0.005 or 0.0034?
Strategy: These do not have the same number of decimal places. Step 1 is to add zeros to the one with fewer decimal places so that they have the same number of decimal places, and then compare the two by comparing the corresponding decimal places.
Step 1: Add zeros at the end of the decimal expansions to make the two numbers have the same number of decimal places.
0.0034 has four decimal places, and 0.005 only has three. So change to comparing 0.0050 to 0.0034.
Step 2: Compare the two numbers, starting with the left-most digit. Stop when you find two digits that are not the same, and notice which is larger and which is smaller.
Compare 0.0050 to 0.0034, digit by digit. The first digits are the same. The tenths digits are the same. The hundredths digits are the same. The thousandths digits, 5 and 3, are different.
Step 3: Conclude by writing the correct comparison of the two numbers with the number of decimal places they were given in the original problem.
Since 5 is larger than 3, we see that 0.0050 is larger than 0.0034. In terms of the original problem, 0.005 is larger than 0.0034.
Answer: 0.005 is larger than 0.0034.
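For readers who like to check this kind of comparison with a short program, here is a sketch of the pad-with-zeros method (my own helper function; it assumes non-negative numbers whose integer parts have the same number of digits, as in the examples here):

```python
def compare_decimals(a: str, b: str) -> str:
    """Compare two decimal strings by padding the shorter one with zeros."""
    int_a, frac_a = a.split(".")
    int_b, frac_b = b.split(".")
    width = max(len(frac_a), len(frac_b))
    padded_a = int_a + "." + frac_a.ljust(width, "0")   # 0.005  -> 0.0050
    padded_b = int_b + "." + frac_b.ljust(width, "0")   # 0.0034 -> 0.0034
    if padded_a == padded_b:
        return f"{a} and {b} are the same number"
    # With equal lengths, comparing the strings mirrors the digit-by-digit check above.
    return f"{a} is larger" if padded_a > padded_b else f"{b} is larger"

print(compare_decimals("0.005", "0.0034"))   # 0.005 is larger
```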
Practice Problems
1. Locate 38% on a number line.
Solution: 38% = 0.38, so it is plotted at 0.38 (between 0.2 and 0.4) on a number line running from −1.0 to 1.0.
2. Locate 152% on a number line.
Solution: 152% = 1.52, so it is plotted at 1.52 (between 1.4 and 1.6) on a number line running from 0 to 2.0.
3. Locate −24.9% on a number line.
Solution: −24.9% = −0.249, so it is plotted at −0.249 (between −0.4 and −0.2) on a number line running from −1.0 to 1.0.
4. Locate 307.4% on a number line.
Solution: 307.4% = 3.074, so it is plotted at 3.074 (between 3.0 and 3.2) on a number line running from 2.0 to 4.0.
5. Given the number line below, represent the approximate value of the point as a percent. (The number line runs from 0 to 1 in steps of 0.1, with a point marked at about 0.2.)
Solution: 0.2 = 20%. The answer is 20%.
6. Given the diagram below, represent the approximate value of the point as a percent. (The number line runs from −1.0 to 1.0 in steps of 0.2, with a point marked at about 0.5.)
Solution: 0.5 = 50%. The answer is 50%.
7. Given the number line below, decide whether the value of the point is best represented by 8%, 35%, 61%, or 97%. (The number line runs from 0 to 1 in steps of 0.1, with a point just below 1.)
Solution: The point is just a little less than 1.00. Approximately, 0.98 = 98%. The correct choice is 97%.
8. Given the number line below, decide whether the value of the point is best represented by −72.5%, 61.8%, or 479.08%. (The number line runs from −3.0 to 2.0 in steps of 0.5, with a point about halfway between −1.0 and −0.5.)
Solution: Since the point is approximately halfway between −1.0 and −0.5, its value is approximately −0.75, or −75%. The best choice is −72.5%.
9. Which is smaller: 0.02 or 0.003?
Solution: Rewrite these so that both have three decimal places. Then compare the corresponding decimal places. Step 1: 0.02 is the same number as 0.020, and 0.003 already has three decimal places. Step 2: Compare 0.020 to 0.003, digit by digit. The first digits are both zero. The second digits are both zero. So far, we can't tell which is larger. In the third digit of each, we see that 0 is smaller than 2, so 0.003 is the smaller of these two numbers. There is no need to look at the fourth digits to determine which number is smaller. Step 3: 0.003 is smaller than 0.020.
10. Which is larger: 0.003 or 0.0018?
Solution: Rewrite these so that both have four decimal places. Then compare the corresponding decimal places. Step 1: 0.0018 has four decimal places, and 0.003 has three decimal places, so we rewrite it as 0.0030. Step 2: Compare 0.0018 to 0.0030, digit by digit. The first digits are both zero. The second digits are both zero. The third digits are both zero. So far, we can't tell which is larger. In the fourth digit of each, we see that 3 is larger than 1, so 0.0030 is the larger of these two numbers. There is no need to look at the fifth digits to determine which number is larger. Step 3: 0.003 is larger than 0.0018.
11. Which is smaller: 0.006 or 0.05?
Solution: 0.006 is compared to 0.050, and looking at it this way, 50 is larger than 06, so 0.006 is smaller than 0.05.
12. Which is smaller: 0.005 or 0.000897?
Solution: When 0.005000 is compared to 0.000897, 0.005000 is larger; so the smaller number is 0.000897.
Adding Rounded Proportions
Learning Objectives:
• Round decimals to a given precision.
• Compute proportions from one-sample data, giving both fraction and decimal form.
• Know that the proportions of all the categories of a variable will sum to 1, but when the decimals are rounded, the sum may not be exactly 1, even when all the parts are correct.
Rounding Proportions in Context: When we compute proportions, we frequently round them when we are reporting them. In statistics, we will then often sum all the proportions. Sometimes the sum of the rounded proportions is surprising and appears to be incorrect. After we review the rounding rules here, we'll see, in Example A3, how (and why) we might obtain a surprising result for the sum of the proportions.
Rounding Decimals:
• Given a decimal, identify place values of digits.
• Round decimals to a given digit, identified by either the number of digits to the right of the decimal point or the place value of the digit.
Rounding Decimals To round decimals, follow these steps: • First, identify the place value digit you need to round to and find that digit within your number. Use Figure A1 as a reference. • Next, look at the number to the right of the digit you are rounding.
• If the number to the right of the digit you are rounding is between 0 and 4, the digit you are rounding remains the same.
FIGURE A1 Place value chart: to the left of the decimal point, the places are ones, tens, hundreds, thousands, ten thousands, and hundred thousands; to the right, they are tenths, hundredths, thousandths, ten-thousandths, and hundred-thousandths.
• If the number to the right of the digit you are rounding is between 5 and 9, you round up by increasing the digit you are rounding by one.
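If you want to check your rounding with a computer, note that Python's built-in round() uses "round half to even," which can disagree with the "5 through 9 rounds up" rule above. Here is a minimal sketch using the decimal module (the function name is mine):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places, with 5 always rounding up."""
    quantum = Decimal(1).scaleb(-places)          # e.g. places=4 -> 0.0001
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("0.99865", 4))   # 0.9987
print(round_half_up("13.543", 1))    # 13.5
```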
EXAMPLE A1: Given a decimal, identify place values of digits
Problem: Given a decimal number with one digit underlined, identify the place value of the underlined digit.
Step 1: Identify the decimal point and count the number of places to the underlined digit. It is three places to the right of the decimal point.
Step 2: Name the place value of the digit. The place value of the digit is the thousandths place.
Answer: thousandths
EXAMPLE A2: Round decimals
Problem: Round 0.99865 to the nearest ten-thousandth.
Step 1: Identify the ten-thousandths place. In 0.99865, the ten-thousandths digit is the 6.
Step 2: Determine whether the value to the right is greater than or equal to 5, or less than 5. The value to the right is 5. This value is greater than or equal to 5, so the number in the ten-thousandths place will be rounded up.
Step 3: Add one to the 6 in the ten-thousandths place.
Answer: 0.9987
EXAMPLE A3: Adding rounded decimals
In a nutrition study, 102 people participated. One question on the survey was: Do you take vitamins? The three answer choices were "No," "Yes, regularly," and "Yes, occasionally." The table below shows the counts for each answer choice, the proportions as a calculator showed them, and the rounded proportions.
a. Compute each proportion to seven decimal places (to the nearest ten-millionth).
b. Round each proportion to four decimal places (to the nearest ten-thousandth).
c. Add all three of the rounded proportions in part b to find the total.
Discuss: We know that the total of all the proportions should be 1. Why is it not exactly 1 here? Does that mean we did anything wrong?
                                                   No                   Yes, regularly       Yes, occasionally    Total
Count                                              39                   42                   21                   102
Proportion (seven decimal places)                  39/102 = 0.3823529   42/102 = 0.4117647   21/102 = 0.2058824
Proportion (rounded to the fourth decimal place)   0.3824               0.4118               0.2059               1.0001
Discussion: The rounded proportions do not necessarily sum to 1 because of "round-off error." That does not mean that actual errors were made. It simply means that, when doing arithmetic with rounded numbers, the results don't always exactly agree with the results you would have found using the non-rounded numbers (or the numbers rounded to a different number of decimal places). However, the sum will be very close.
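A short sketch (variable names are mine) that reproduces the round-off behavior from Example A3 with the counts 39, 42, and 21:

```python
counts = {"No": 39, "Yes, regularly": 42, "Yes, occasionally": 21}
total = sum(counts.values())                                 # 102

proportions = {k: c / total for k, c in counts.items()}
rounded = {k: round(p, 4) for k, p in proportions.items()}   # 0.3824, 0.4118, 0.2059

print(sum(proportions.values()))   # essentially 1 (up to floating-point noise)
print(sum(rounded.values()))       # about 1.0001 -- round-off error, not a mistake
```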
Practice Problems
Practice Problem (Proportions): In a survey of high school students, 71 students participated. One item on the survey was: Consider the statement: "My parents have reasonable rules for me." Do you agree? The three answer choices were "Agree," "Disagree," and "Not sure." There were 21 students who agreed, 43 who disagreed, and 7 were not sure. Examine the proportions for each of the three categories.
a. Compute each proportion to at least six decimal places (to the nearest millionth).
b. Round each proportion to three decimal places.
c. Add all three of the rounded proportions in part b.
d. We know that the total of all the proportions should be 1. Why is it not exactly 1 here? Does that mean we did anything wrong? Discuss.
           Count   Proportion (at least six decimal places)     Proportion (rounded to the nearest thousandth)
Agree      21      21/71 = 0.295775                              0.296
Disagree   43      43/71 = 0.605634                              0.606
Not sure   7       7/71 = 0.098592                               0.099
Total      71      (It is not necessary to fill in this box.)    1.001
Discussion: We did not do anything wrong. Each of the three numbers was rounded correctly. As it turned out, all were (correctly) rounded up, which made the sum a bit larger than it would have been if the numbers had not been rounded.
1. Given 0.24517̲ (the 7 underlined), identify the place value of the underlined digit.
Solution:
Step 1: Identify the decimal point and count the number of places to the underlined digit. It is five places to the right of the decimal point.
Step 2: Name the place value of the digit. The place value of the digit is the hundred-thousandths place.
Answer: hundred-thousandths
2. Given 45.6107̲5 (the 7 underlined), identify the place value of the underlined digit.
Solution:
Step 1: Identify the decimal point and count the number of places to the underlined digit. It is four places to the right of the decimal point.
Step 2: Name the place value of the digit. The place value of the digit is the ten-thousandths place.
Answer: ten-thousandths
3. Given 1.9̲734 (the 9 underlined), identify the place value of the underlined digit.
Solution:
Step 1: Identify the decimal point and count the number of places to the underlined digit. It is one place to the right of the decimal point.
Step 2: Name the place value of the digit. The place value of the digit is the tenths place.
Answer: tenths
4. Given 35.41̲9 (the 1 underlined), identify the place value of the underlined digit.
Solution:
Step 1: Identify the decimal point and count the number of places to the underlined digit. It is two places to the right of the decimal point.
Step 2: Name the place value of the digit. The place value of the digit is the hundredths place.
Answer: hundredths
5. Round 13.543 to the nearest tenth.
Solution:
Step 1: Identify the tenths place. In 13.543, the tenths digit is the 5.
Step 2: Determine whether the value to the right is greater than or equal to 5, or less than 5. The value to the right is 4. This value is less than 5, so the number in the tenths place will remain the same.
Step 3: Keep the number in the tenths place.
Answer: 13.5
6. Round 49.7867 to the nearest thousandth.
Solution:
Step 1: Identify the thousandths place. In 49.7867, the thousandths digit is the 6.
Step 2: Determine whether the value to the right is greater than or equal to 5, or less than 5. The value to the right is 7. This value is greater than or equal to 5, so the number in the thousandths place will be rounded up.
Step 3: Add one to the 6 in the thousandths place.
Answer: 49.787
7. Round 100.0042 to the nearest hundredth.
Solution:
Step 1: Identify the hundredths place. In 100.0042, the hundredths digit is the 0 two places to the right of the decimal point.
Step 2: Determine whether the value to the right is greater than or equal to 5, or less than 5. The value to the right is 4. This value is less than 5, so the number in the hundredths place will remain the same.
Step 3: Keep the number in the hundredths place.
Answer: 100.00
8. Round 0.9672453 to the nearest hundred-thousandth.
Solution:
Step 1: Identify the hundred-thousandths place. In 0.9672453, the hundred-thousandths digit is the 4.
Step 2: Determine whether the value to the right is greater than or equal to 5, or less than 5. The value to the right is 5. This value is greater than or equal to 5, so the number in the hundred-thousandths place will be rounded up.
Step 3: Add one to the 4 in the hundred-thousandths place.
Answer: 0.96725
Summarizing Numerical Data
Learning Objectives:
• Choose intervals into which the data will be separated.
• Tally the values in the various intervals.
• Summarize this with all three of these:
  • Counts of the data in each interval
  • Graph
  • Proportions of the data in each interval (relative frequencies)
Summarizing Numerical Data in Context: In a songwriter contest, each contestant submitted a recording of themselves performing a song. There were 25 entries. Each was judged by 10 judges, each of whom rated the entry on a scale of 1–8. The score for each entry was the sum of the 10 judges' scores for that entry. Below is a list of all the scores.
47 48 30 48 41 61 42 21 49 50 49 42 59 37 56 33 53 49 33 42 58 45 63 46 28
It is difficult to get useful information from such a list alone. We can see the pattern in the data better in a graph such as this.
(Graph: a histogram of the scores, with intervals from 20 to 70 on the horizontal axis, counts from 0 to 12 on the left vertical axis, and relative frequencies from 10% to 50% on the right vertical axis.)
How do we produce such a graph? Part 1. Choose intervals. Looking over the list of scores (not very carefully) we see that all of the scores are between 10 and 80. I decide to use intervals that are 5 units long. So that is 10–15, 15–20, etc. However, that raises the question of where to put a score of 15. I choose to re-define my intervals as 10–14.9, 15–19.9, 20–24.9, etc. Then it is clear that every value will fit in exactly one interval. (VERY important that each score will appear in the graph only once.)
Part 2. Tally the data. (I'll make a mark | for each value as the "tally.")
STEP A: Make a list of all the intervals. Then start making tally marks. The first three numbers are 47, 48, and 30. I will put the three tally marks in the appropriate places for these.
STEP B: The next three numbers are 48, 41, and 61. I put tally marks in the appropriate places for these.
STEP C, ETC.: I keep putting a tally mark in the correct place for each score in the list, carefully.
Complete: When I finish, I count the tally marks to be sure there are exactly 25 of them.

Interval   Step A (put in 47, 48, 30)   Step B (include 48, 41, 61)   Complete
10–14.9
15–19.9
20–24.9                                                               |
25–29.9                                                               |
30–34.9    |                            |                             |||
35–39.9                                                               |
40–44.9                                 |                             ||||
45–49.9    ||                           |||                           ||||||||
50–54.9                                                               ||
55–59.9                                                               |||
60–64.9                                 |                             ||
65–69.9
70–74.9
75–79.9
Part 3. Make a graph. The graph given on the previous page is a graph of this dataset. Here are the steps for making such a graph by hand.
STEP A: Draw a horizontal axis and label it with the endpoints of the intervals that you used to divide the data into groups.
STEP B: Draw a vertical axis (for the counts) and label it from 0 up to a number at least as large as the most scores you have in any interval.
STEP C: Draw a bar on each interval of the height of the number of scores in that interval.
STEP D: Notice that it could be true that some intervals have no scores. In this dataset, the first two intervals and the last three intervals did not. Intervals on the end with no scores may be omitted, as some of these were in the graph of this data. If there are intervals with scores in them on both sides of an empty interval, then you must leave the interval in and it has a height of 0. Here is the graph of a dataset that has no scores in the interval from 35 to 39.9.
(Graph: a histogram with intervals from 20 to 70 on the horizontal axis, counts from 0 to 12 on the left vertical axis, relative frequencies from 10% to 50% on the right vertical axis, and no bar over the 35–39.9 interval.)
How do we summarize the data with a table of counts? See the Count column of the table below. How do we summarize the data with a table of relative frequencies? For each interval, find the fraction of the total counts in each, and convert it to a decimal or percent. See the Relative frequency column.

Interval   Count   Relative frequency
10–14.9    0       0/25 = 0.00 = 0%
15–19.9    0       0/25 = 0.00 = 0%
20–24.9    1       1/25 = 0.04 = 4%
25–29.9    1       1/25 = 0.04 = 4%
30–34.9    3       3/25 = 0.12 = 12%
35–39.9    1       1/25 = 0.04 = 4%
40–44.9    4       4/25 = 0.16 = 16%
45–49.9    8       8/25 = 0.32 = 32%
50–54.9    2       2/25 = 0.08 = 8%
55–59.9    3       3/25 = 0.12 = 12%
60–64.9    2       2/25 = 0.08 = 8%
65–69.9    0       0/25 = 0.00 = 0%
70–74.9    0       0/25 = 0.00 = 0%
75–79.9    0       0/25 = 0.00 = 0%
Notice that the graphs produced by the software here are labeled on the vertical axes by counts and by relative frequencies. It is acceptable to use either of those on the vertical axis. (Or both, as illustrated in these.)
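As an aside, here is a minimal sketch (interval labels and variable names are mine) of Parts 1 through 3 done by computer: bin the 25 scores into width-5 intervals and report counts and relative frequencies.

```python
scores = [47, 48, 30, 48, 41, 61, 42, 21, 49, 50, 49, 42, 59, 37,
          56, 33, 53, 49, 33, 42, 58, 45, 63, 46, 28]

for lower in range(10, 80, 5):                       # intervals 10-14.9, ..., 75-79.9
    count = sum(1 for s in scores if lower <= s < lower + 5)
    rel_freq = count / len(scores)
    print(f"{lower}-{lower + 4}.9: count {count}, "
          f"relative frequency {rel_freq:.2f} ({rel_freq:.0%})")
```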
Practice Problems
Recall the songwriter contest of the example at the beginning of this worksheet. This dataset is from a different 25 songwriters who submitted entries. Here are their scores.
33 8 21 26 10 18 48 38 34 36 26 15 36 47 44 17 20 39 28 32 12 40 44 19 27
1. To summarize these data into five intervals, first notice that the numbers are all between 0 and 50.
2. Make a list of the intervals and then "tally" the scores into them.

Interval   First three values (33, 8, 21)   Include next three values (26, 10, 18)   Complete
0–9.9      |                                |                                        |
10–19.9                                     ||                                       ||||||
20–29.9    |                                ||                                       ||||||
30–39.9    |                                |                                        |||||||
40–49.9                                                                              |||||
3. From that, make two charts: counts and relative frequencies.
Solution:
Interval   Count   Relative frequency
0–9.9      1       1/25 = 0.04 = 4%
10–19.9    6       6/25 = 0.24 = 24%
20–29.9    6       6/25 = 0.24 = 24%
30–39.9    7       7/25 = 0.28 = 28%
40–49.9    5       5/25 = 0.20 = 20%
4. Use your results to sketch a graph of this distribution by hand.
(Graph: a histogram of the five intervals, with scores from 0 to 55 on the horizontal axis, counts from 0 to 8 on the left vertical axis, and relative frequencies from 0% to 30% on the right vertical axis.)
Two-Way Tables
Learning Objectives:
• Fill in missing values in a two-way table.
• Compute proportions from a two-way table.
Two-Way Tables in Context: Sleep Study: As part of a sleep study on adults, this data was collected.

               18–34   35–50   51 and older   Totals
Do not snore   122     120     90             332
Snore          37      82      121            240
Totals         159     202     211            572
Statistics question: Using a 90% confidence interval, estimate the difference between the proportions of snorers in the two younger age groups of the population from which the sample was drawn. In order to start this, you must identify and compute the appropriate sample proportions. That is the very important prerequisite material for a statistics course. Often students think that the work on two-way tables will be easy: it's just computing proportions. Most students find that the hard part is reading the problem carefully to see WHICH proportions they must compute to get started. It is useful to practice just that part before you start working on comparing the proportions.
Example: Election Study
In this election study, two candidates were running for office. After the election was over, there was interest in comparing how the candidates did in two different precincts. (In this hypothetical example, numbers are chosen to make it easy for you to do computations.)
Activity 1. Two numbers in this table are missing. Use the given values to fill those in.

             Candidate Lopez   Candidate Ramirez   Totals
Precinct 1   2100              4200
Precinct 2                     3600                6800
Totals       5300              7800                13100
What do totals mean here? Answer: Totals are obtained by adding all the numbers for the row or column.
EXAMPLE A1: What is the missing number in the Totals column?
Solution: Add together the two values in the Precinct 1 row to obtain the total.
Precinct 1: 2100 + 4200 = 6300
EXAMPLE A2: What is the missing number for Precinct 2, Candidate Lopez?
Solution: Subtract the Precinct 1 count from the Candidate Lopez column total, since that total is the sum of the counts for the two precincts.
Candidate Lopez: 5300 − 2100 = 3200

Completed two-way table:
             Candidate Lopez   Candidate Ramirez   Totals
Precinct 1   2100              4200                6300
Precinct 2   3200              3600                6800
Totals       5300              7800                13100
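A tiny sketch (hypothetical variable names) showing how the two missing cells follow from the totals, exactly as in Examples A1 and A2:

```python
lopez_p1, ramirez_p1 = 2100, 4200     # Precinct 1 counts
ramirez_p2 = 3600                     # Precinct 2, Ramirez
lopez_total = 5300                    # column total for Candidate Lopez

precinct1_total = lopez_p1 + ramirez_p1   # 6300 (add across the row)
lopez_p2 = lopez_total - lopez_p1         # 3200 (subtract within the column)

print(precinct1_total, lopez_p2)          # 6300 3200
```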
Proportions: When reporting the results of a study, we often use proportions, as illustrated below. Here's how the question might be stated: "What proportion of the overall group is also in the group who meet a certain condition?" To answer it, we compute a proportion:
(number from the overall group who meet the condition) / (number in the overall group)
The proportion can be given by just the fraction or by the decimal computed from the fraction. Since the number from the overall group who meet the condition can range from no one to everyone, the value of the proportion ranges from 0 to 1, inclusive. Here's a typical question: What proportion of Group D is also in Group N?
(number in Group D who are also in Group N) / (number in Group D)
Procedure to complete this question:
STEP 1: Identify which group will go in the denominator (the overall group). (Where is the "of"?)
STEP 2: Identify which group will go in the numerator (those in the overall group who ALSO meet the condition).
STEP 3: Put each of those numbers in the appropriate place in the fraction.
STEP 4: Divide to obtain a decimal. Report it to three decimal places (or as many decimal places as you are asked to use).
EXAMPLE A3: What proportion of the total votes came from Precinct 1?
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (total votes)
Step 2: Find the group for the numerator. (votes in Precinct 1) / (total votes)
Step 3: Put in the numbers. (votes in Precinct 1) / (total votes) = 6300/13100
Step 4: Compute the decimal, which is the proportion. 6300/13100 = 0.481

EXAMPLE A4: What proportion of Candidate Ramirez' votes came from Precinct 1?
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (Ramirez total votes)
Step 2: Find the group for the numerator. (Ramirez votes in Precinct 1) / (Ramirez total votes)
Step 3: Put in the numbers. (Ramirez votes in Precinct 1) / (Ramirez total votes) = 4200/7800
Step 4: Compute the decimal, which is the proportion. 4200/7800 = 0.538

EXAMPLE A5: What proportion of Candidate Lopez' votes came from Precinct 2?
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (Lopez total votes)
Step 2: Find the group for the numerator. (Lopez votes in Precinct 2) / (Lopez total votes)
Step 3: Put in the numbers. (Lopez votes in Precinct 2) / (Lopez total votes) = 3200/5300
Step 4: Compute the decimal, which is the proportion. 3200/5300 = 0.604
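For checking your work, here is a sketch (my own variable names) that reproduces the three proportions from Examples A3, A4, and A5 using the completed table:

```python
lopez = {"Precinct 1": 2100, "Precinct 2": 3200}
ramirez = {"Precinct 1": 4200, "Precinct 2": 3600}

total_votes = sum(lopez.values()) + sum(ramirez.values())        # 13100
precinct1_votes = lopez["Precinct 1"] + ramirez["Precinct 1"]    # 6300

print(round(precinct1_votes / total_votes, 3))                   # 0.481
print(round(ramirez["Precinct 1"] / sum(ramirez.values()), 3))   # 0.538
print(round(lopez["Precinct 2"] / sum(lopez.values()), 3))       # 0.604
```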
Practice Problems
Use these two-way tables to answer the following questions.
Smoking Study:
                 Male   Female   Totals
Smokes           20              32
Does not smoke          110      240
Sleep Study:
               18–34   35–50   51 and older   Totals
Do not snore   122     120     90             332
Snore          37      82      121            240
Totals         159     202     211            572
1. In the Smoking Study, find the missing values in the table.
Solution:
                 Male              Female         Totals
Smokes           20                32 − 20 = 12   32
Does not smoke   240 − 110 = 130   110            240
2. In the Smoking Study, use the values you computed in the previous problem and then add a row of totals at the bottom of the table. Show your work here and then make a complete table here.
                 Male   Female   Totals
Smokes
Does not smoke
Totals
Solution:
                 Male             Female           Totals
Smokes           20               12               32
Does not smoke   130              110              240
Totals           20 + 130 = 150   12 + 110 = 122   150 + 122 = 272
3. In the Sleep Study, what proportion of 18–34-year-olds do not snore?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (18–34-year-olds)
Step 2: Find the group for the numerator. (18–34-year-olds who do not snore) / (18–34-year-olds)
Step 3: Put in the numbers. (18–34-year-olds who do not snore) / (18–34-year-olds) = 122/159
Step 4: Compute the decimal, which is the proportion. 122/159 = 0.767
Answer: The proportion of 18–34-year-olds who do not snore is 122/159 = 0.767.
4. In the Sleep Study, what proportion of all the people in the study are 18–34-year-olds?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (all people in study)
Step 2: Find the group for the numerator. (18–34-year-olds) / (all people in study)
Step 3: Put in the numbers. (18–34-year-olds) / (all people in study) = 159/572
Step 4: Compute the decimal, which is the proportion. 159/572 = 0.278
Answer: The proportion of all the people in the study who are 18–34-year-olds is 159/572 = 0.278.
5. In the Smoking Study, what proportion of the males are smokers?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (males)
Step 2: Find the group for the numerator. (male smokers) / (males)
Step 3: Put in the numbers. (male smokers) / (males) = 20/150
Answer: The proportion of males who are smokers is 20/150.
6. In the Smoking Study, what proportion of all the people in the study do not smoke?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (people in study)
Step 2: Find the group for the numerator. (people in study who do not smoke) / (people in study)
Step 3: Put in the numbers. (people in study who do not smoke) / (people in study) = 240/272
Answer: The proportion of people in the study who do not smoke is 240/272.
7. In the Sleep Study, what proportion of the 51-and-older people snore?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (51 and older)
Step 2: Find the group for the numerator. (51 and older who snore) / (51 and older)
Step 3: Put in the numbers. (51 and older who snore) / (51 and older) = 121/211
Answer: The proportion of people 51 and older who snore is 121/211.
8. In the Sleep Study, what proportion of the people who snore are 51 or older?
Solution:
Step 1: Find the group for the denominator. (Where is the "of"?) ___ / (people who snore)
Step 2: Find the group for the numerator. (people who snore and are 51 and older) / (people who snore)
Step 3: Put in the numbers. (people who snore and are 51 and older) / (people who snore) = 121/240
Answer: The proportion of people who snore who are 51 and older is 121/240.
Mathematics and Statistics Differences
Learning Objectives:
• Given a question, decide whether it is most appropriately answered with statistics.
• Given a question, decide whether it is most appropriately answered with mathematics.
Statistics is the science of getting useful information from data. To obtain that information, one has to do many steps: decide how to collect the data, collect it, organize and summarize it, and then decide whether and what useful information can be found in it. In simple terms, mathematics is the study of numbers, shapes, and patterns. The mathematics that most of us think of is the result of those studies where someone has found patterns to use that, when given the same input, ALWAYS give the same output. Mathematics is one of the methods we use to investigate questions in statistics, but not the only method. For students in a statistics class, a main question is "When am I expected to simply use mathematical methods to obtain an answer to the question asked, and when am I supposed to be using additional ideas that take into account statistical issues?"
Example: Scenario 1: A community college calculates tuition based solely on the number of credit hours a person is taking. The same community college has a bookstore, but students can buy their books elsewhere. And, of course, the books for different courses do not all cost the same. For each of these (tuition and books), we want to determine what the cost to a student will be. To find an acceptable answer to that question in each case, are mathematical methods alone enough?
Solution Scenario 1: Tuition: Mathematical methods alone are enough. We can ask the student how many credit hours they will take and give a clear and precise answer for the cost of tuition. Books: Mathematical methods alone are not enough. We can imagine that we could ask enough questions (what courses are they taking, what are the required books, will they buy them from the college bookstore or not, etc.) that we could give a particular student a definitive answer. But that is not what usually happens. Instead, we will make enough assumptions to make some predictions about a range of costs. Those are statistical methods.
Example: Scenario 2: For quantitative variables, we can compute various statistics, such as the mean, median, and standard deviation. We say that a statistic is "resistant" if it is relatively unaffected by extreme values in the dataset. A question that can be asked is, "Between the mean and the median, which statistic is more resistant?" Is this a question that is answered by only mathematical methods, or is statistical thinking also required?
Solution Scenario 2: We can investigate this question mathematically, by deciding on a method of comparison and carrying out that method. When we carry it out, the answer to the question is quite clear: the median is more resistant than the mean. This method requires statistical thinking, as described in the first paragraph here.
Practice Problems
Scenario 3: Data was gathered on a sample of adults. Two questions were "What is your highest level of education completed?" and "Some people say that there is only one true love for each person. Do you agree or disagree?" The data are summarized in this table. For each of the two questions below, is only mathematical thinking required, or does the question also require statistical thinking?

             High school   Some college   College degree
Agree        322           146            162
Disagree     521           432            744
Don't know   33            51             60

a. Among those surveyed, what is the difference in the proportion of people with a college degree who agree and the proportion of people with a high school diploma who agree?
b. Is there evidence of a difference in the proportions of people who agree with this statement among the population in general?
Solution: Scenario 3:
a. Calculating a proportion is a mathematical method, so only mathematical methods are needed for this question.
b. Determining whether the data provide evidence for a claim about the population requires statistical thinking.
Scenario 4: In your statistics class, you are asked to look at a histogram of a bell-shaped distribution and estimate the standard deviation. You have been told a method about using the middle 95% of the distribution to do this and actually calculate a number. Is this question using only mathematical methods, or does it also require statistical thinking?
Solution: Scenario 4: Despite your having been given a mathematical way of finding an answer, that method requires some estimation of where the middle 95% is. Thus, it is also about interpreting the variability in the data. Interpreting variability in the data is certainly a part of statistical thinking.
Read more about the differences: Tran, D., & Lee, H. S. (2015). The difference between statistics and mathematics. In Teaching statistics through data investigations MOOC-Ed, Friday Institute for Educational Innovation: NC State University, Raleigh, NC. Retrieved from https://fi-courses.s3.amazonaws.com/tsdi/unit_2/Essentials/Statvsmath.pdf.
Decimals: Addition and Subtraction
Learning Objectives:
• Add signed decimals.
• Subtract signed decimals.
Adding Decimals in Context: Consider all American households and how many cars they own. The following are the proportions from a recent survey (ignoring those households with more than 5 cars).

Number of cars             0        1       2       3       4      5
Proportion of households   0.0787   0.392   0.347   0.127   0.04   0.0153

Find the proportion of households who own between 3 and 5 cars (including both 3 and 5).
Solution: This is the sum of 0.127, 0.04, and 0.0153. The sum here is 0.1823. While it is possible to use a calculator to find this sum, for full understanding of how to use decimals and "think with" decimals, it is useful to understand how to add and subtract these without a calculator.
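If you do reach for a computer, note that ordinary floating-point arithmetic can show tiny errors on sums like this one; Python's decimal module keeps the decimals exact. A small sketch (not part of the worksheet):

```python
from decimal import Decimal

parts = [Decimal("0.127"), Decimal("0.04"), Decimal("0.0153")]
print(sum(parts))              # 0.1823, exactly as in the solution above
print(0.127 + 0.04 + 0.0153)   # may show a tiny binary floating-point error
```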
Adding and Subtracting Decimals
• Change any integers to decimal form by writing a decimal point after the integer.
• Sometimes when adding/subtracting decimals, it is necessary to write additional zeros after the decimal point.
  • Look at the other decimal numbers in the problem to determine how many zeros should be written after the decimal point. Write as many zeros as needed so that all numbers in the calculation have the same number of decimal places.
• If the problem is not originally written as a vertical calculation, rewrite the problem as a vertical calculation.
• When adding/subtracting decimals, make sure that the decimal points are vertically aligned.
• When adding/subtracting decimals, use the same techniques as when adding/subtracting integers:
  • Look at the problem and determine whether addition or subtraction should be used:
    • If adding two numbers with the same sign, use addition on the two unsigned numbers.
    • If adding two numbers with different signs, use subtraction on the two unsigned numbers.
    • If subtracting two numbers with the same sign, use subtraction on the two unsigned numbers.
    • If subtracting two numbers with different signs, use addition on the two unsigned numbers.
  • Look at the problem and determine what sign the answer should have:
    • If adding two numbers with the same sign, the sign of the final answer will be the same.
    • If adding two numbers with different signs, the sign of the final answer will be the same sign as the original sign of the larger of the two unsigned numbers.
    • If subtracting two numbers with the same sign, the final answer will be the same sign as the result of the subtraction of the unsigned numbers.
    • If subtracting two numbers with different signs, the sign of the final answer will be the same as the first number.
EXAMPLE A1: Adding decimals
Problem: Add: 73.146 + 2.
Step 1: Change any integers to decimal form by writing a decimal point after the integer. 2 ⇒ 2.
Step 2: Add any necessary zeros after the decimal point. Since the number 73.146 has 3 digits after the decimal point, 2 needs to also have 3 digits after the decimal point, by adding 3 zeros after the decimal point: 2. ⇒ 2.000
Step 3: Vertically align the decimal points. Then determine whether addition or subtraction on the unsigned numbers should be used. Since we are adding two numbers with the same sign, we use addition on the unsigned numbers: 73.146 + 2.000 = 75.146
Step 4: Determine the sign of the final answer. Since we are adding two numbers with the same sign, the answer will have the same sign. Since the sign of the two numbers was originally positive, the final answer will be positive.
Answer: 75.146
EXAMPLE A2: Subtracting decimals
Problem: Subtract: −25.9 − (−3.41).
Step 1: Change any integers to decimal form by writing a decimal point after the integer. All numbers are already written in decimal form.
Step 2: Add any necessary zeros after the decimal point. Since the number 3.41 has 2 digits after the decimal point, 25.9 needs to also have 2 digits after the decimal point, by adding 1 zero after the decimal point: 25.9 ⇒ 25.90
Step 3: Vertically align the decimal points. Then determine whether addition or subtraction on the unsigned numbers should be used. Since we are subtracting two numbers with the same sign, we use subtraction on the unsigned numbers: 25.90 − 3.41 = 22.49
Step 4: Determine the sign of the final answer. Since we are subtracting two numbers with the same sign, the final answer will be the same sign as the result of the subtraction of the unsigned numbers. Since the sign of 25.9 was originally negative, the final answer is negative.
Answer: −22.49
Practice Problems
1. Answer: 26.7
2. Answer: 5.84
3. Since we are adding two numbers with different signs, we use subtraction on the unsigned numbers. Since we are adding two numbers with different signs, the sign of the final answer will be the same as the original sign of the larger of the two unsigned numbers. Since the original sign of 183.176 is negative, the final answer is negative.
Answer: −128.676
4. Since we are subtracting two numbers with different signs, we use addition on the unsigned numbers. Since we are subtracting two numbers with different signs, the sign of the final answer will be the same as the first number. Since the original sign of 0.036 is negative, the final answer is negative.
Answer: −15.516
5. 379.05
Answer: 379.05
6. Since we are subtracting two numbers with the same sign, we use subtraction on the unsigned numbers. Since we are subtracting two numbers with the same sign, the final answer will be the same sign as the result of the subtraction of the unsigned numbers. Since the original sign of 23 is positive, the final answer is positive.
Answer: 17.159
7. Since we are subtracting two numbers with the same sign, we use subtraction on the unsigned numbers. Since we are subtracting two numbers with the same sign, the final answer will be the same sign as the result of the subtraction of the unsigned numbers.
Answer: −93.18
8. Since we are adding two numbers with the same sign, we use addition on the unsigned numbers. Since the original two numbers are negative, the final answer is negative.
Answer: −107.5963
Fractions to Decimals
Learning Objectives:
• Use a calculator to convert a fraction to a decimal.
• Identify decimal forms of common fractions.
Fractions to Decimals in Context: It is known that, in the overall population, the proportion of people who are left-handed is about 10%. On the governing board of a community college, there are 9 people and 4 of them are left-handed. We want to compare that to the proportion of people in the overall population. Write an appropriate number in two forms: a fraction and a decimal, rounded to four decimal places. Use a calculator for your calculations.
Solution: Fraction: (number of left-handed governing board members) / (number of governing board members) = 4/9
Decimal: Do the division indicated by the fraction: 4 ÷ 9 = 0.4444 (rounded to four decimal places).
In the statistics class, we will
• Call this decimal a proportion.
• Convert this proportion to a percentage to compare it to values such as 10%.
Those objectives are covered in a different worksheet in this course: Percentages and Proportions on a Number Line.
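A minimal sketch of the same computation (the variable names are mine):

```python
left_handed = 4
board_size = 9

proportion = left_handed / board_size
print(f"{left_handed}/{board_size} = {round(proportion, 4)}")   # 4/9 = 0.4444
```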
Decimal Forms of Common Fractions
1/3 = 0.333… (the 3 repeats)
1/10 = 0.1
1/5 = 0.2
1/4 = 0.25
EXAMPLE A1: Converting fractions to decimals using a calculator
Problem: Convert −78/5 to decimal form using a calculator.
Step 1: Type −78 ÷ 5 into the calculator.
Answer: −15.6
EXAMPLE A2: Identifying decimal forms of common fractions
Problem: Give the decimal form of the common fraction 2/5.
Step 1: Identify a useful common fraction on the chart above. 1/5 = 0.2
Step 2: Because 2/5 = 2 ⋅ (1/5), we can multiply the decimal equivalent by 2. 0.2 ⋅ 2 = 0.4
Answer: 0.4
Practice Problems
1. 64/15
Answer: 4.266666666666 etc., which rounds to 4.2667
2. 45/16
Answer: 2.8125
3. Answer: −25.75
4. 38/8
Answer: 4.75
5. 1/4
Answer: 0.25
6. 1/5
Answer: 0.20
7. 3/5
Answer: 0.60
8. 1/2
Answer: 0.50
Estimating Area
Learning Objectives:
• Find areas of shapes by counting squares on graph paper.
• Estimate areas under a statistical distribution curve.
Estimating Area in Context: We often use a graph that looks something like the one below. At times we want to mark the points that cut off some specific percentage of the area in the tails. This example shows 2.5% of the area of the graph on each end.
In this worksheet, you’ll see some “training pictures” to help you develop your visual skill in doing this.
Area by counting squares: In math classes, you learned about finding the area of a rectangle by measuring each side and counting the squares. You know that a rectangle that is 3 units long and 2 units high has 6 squares in it. Count them. A rectangle (square) that is 10 units long and 10 units high has 100 squares in it.
When we are looking at a statistical graph, we usually think of the entire graph as representing 100% of the data. Many statistical graphs are curves, so it is harder to visualize the “squares” underlying how to count the areas. First here, we’ll talk about a statistical graph that has straight lines instead of curves, so it is easy to count the squares to think about areas. For this statistical graph (which has an area of 100 squares, so each square is 1% of the total area), we will think of cutting off 2% of the area from each end. Here is such a graph. It is 50 squares long and 2 squares high, thus it has 100 squares. To cut off 2% of the area on each end, we cut off (shade in) 2 squares on each end.
Here is another picture of the same graph to illustrate that the white part of the grid, outside of the graph, is not relevant to our calculations at all. The shaded area under the graph is 100% of the area under the graph. (The scale and the gridlines were designed to show that.)
Area by estimating the number of squares covered: Now, our statistical graphs aren't nice "boxes" that make it easy to count whole squares. A triangle is a shape whose area is easier to work with than the area under a curve, so we start there. This large triangle (shaded) has length 40 and the height in the middle is 5, so the triangle has an area of 100 squares. Cutting off 2% (two squares) is somewhat more complicated than with the rectangle, but we can estimate the area. At the ends, we can't cut off whole squares, because there aren't whole squares inside the triangle at the ends. Here's how I cut off the area equivalent to 2 squares of area on each end.
Here’s a “magnified” version of the left cut-off area. Can you see how it is reasonable to estimate the total colored area as about 2 squares worth?
These are the kinds of estimations you will need to do with curves in statistics. It’s harder with curves. And you will not have a grid like this to look at. However, you do not have to be very precise in your estimates. Just remember that the entire shaded area under the graph is 100% and you’re trying to think about what approximately 2% of the area in each tail looks like. Also, when you’re in a statistics class, you’ll usually know how many observations were used in making the graph, so that will help you refine your estimates if you need to do that. The main thing to remember is that, when you’re talking about how much area is in the tails of the distribution, you have to think about BOTH the length of the shaded area along the horizontal axis and the height of the shaded area as well.
Estimating areas in statistical graphs: Example: Is the total of the shaded area in this graph closer to 20% of the area of the graph or 50% of the area of the graph?
Solution: It’s true that the tails go out quite a long way, but the graph has very little area out in the far tails. Think of “how much paint” of each color it would take to “paint” this. You need quite a lot of paint for the middle part, but not so much for the outer part that is shaded. This shaded area doesn’t look close to half of the total area. So, given the choices, it must be closer to 20% of the area. Answer: This graph was carefully prepared so that the colored area is 10% in each tail, so the total colored area is 20%.
Important FAQs:
1. Are you expected to be really good at estimating areas in graphs just by looking at them? Answer: No.
2. Are you expected to improve your skills in estimating areas in graphs just by looking at them?
3. In statistics, will you always have to estimate the areas, or are there ways to calculate them?
4. How can you learn more about estimating areas in graphs?
Practice Problem
Below are three graphs. The shaded areas on the left-hand sides of these three graphs are the values 0.025, 0.05, and 0.10. Decide which is which and label each graph with its shaded area.
Solution:
Graph A: area of left-hand side 0.05
Graph B: area of left-hand side 0.10
Graph C: area of left-hand side 0.025
Graphs of Statistical Distributions to Illustrate the Areas in the Tails Symmetric
The region under the curve has a total area of 1.0.
Darkly shaded portions are each 10% of the area under the curve.
Darkly shaded portions are each 5% of the area under the curve.
Darkly shaded portions are each 2.5% of the area under the curve.
Right-skewed
The region under the curve has a total area of 1.0.
Darkly shaded portions are each 10% of the area under the curve.
Darkly shaded portions are each 5% of the area under the curve.
Darkly shaded portions are each 2.5% of the area under the curve.
Evaluating Formulas: Part 1
Learning Objectives:
• Simplify expressions and formulas containing multiple operations.
• Simplify expressions containing grouping symbols.
• Evaluate formulas.
Order of Operations: PEMDAS
P   Parentheses      ( )
E   Exponent         x^n or √
M   Multiplication   ⋅   (M and D: whichever comes first, left → right)
D   Division         ÷
A   Addition         +   (A and S: whichever comes first, left → right)
S   Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of the order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4.
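As a side note, Python (like most programming languages) applies this same order of operations, so you can use it to check the examples that follow. A quick sketch:

```python
print(2 + 3 * 7)     # 23  (multiplication before addition)
print(4 - 6 + 3)     # 1   (same-precedence operators, left to right)
print(4 - (6 + 3))   # -5  (parentheses first)
print((11 - 3) + 4)  # 12
```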
EXAMPLE A1: Simplify using order of operations
Problem: Simplify 2 + 3 ⋅ 7.
Step 1: Apply PEMDAS: Multiplication. 2 + 3 ⋅ 7 = 2 + 21
Step 2: Apply PEMDAS: Addition. 2 + 21 = 23
Answer: 23
EXAMPLE A2: Simplify using order of operations
Problem: Simplify.
Step 1: Apply PEMDAS: Multiplication.
Step 2: Apply PEMDAS: Addition. 21
Answer: 21
EXAMPLE A3: Simplify using order of operations
Problem: Simplify 4 − 6 + 3.
Step 1: Apply PEMDAS: Subtraction. 4 − 6 + 3 = −2 + 3
Step 2: Apply PEMDAS: Addition. −2 + 3 = 1
Answer: 1
EXAMPLE A4: Simplify using order of operations
Problem: Simplify 4 − (6 + 3).
Step 1: Apply PEMDAS: Simplify inside the parentheses. 4 − (6 + 3) = 4 − 9
Step 2: Apply PEMDAS: Subtraction. 4 − 9 = −5
Answer: −5
EXAMPLE A5: Simplify using order of operations
Problem: Simplify (4 − 6) + 3.
Step 1: Apply PEMDAS: Inside parentheses. (4 − 6) + 3 = −2 + 3
Step 2: Apply PEMDAS: Addition. −2 + 3 = 1
Answer: 1
EXAMPLE A6: Evaluate and simplify using order of operations
Problem: Evaluate and simplify y = a + bx, where a = 3, b = 2, x = 9.
Step 1: Substitute numbers in the formula. y = 3 + 2 ⋅ 9
Step 2: Apply PEMDAS: Multiplication. y = 3 + 18
Step 3: Apply PEMDAS, left to right: Addition. y = 21
Answer: y = 21
Practice Problems
1. Simplify 11 − 3 + 4.
Solution:
Step 1: Apply PEMDAS: Subtraction. 11 − 3 + 4 = 8 + 4
Step 2: Apply PEMDAS: Addition. 8 + 4 = 12
Answer: 12
2. Simplify 11 − (3 + 4).
Solution:
Step 1: Apply PEMDAS: Operation in grouping symbols. 11 − (3 + 4) = 11 − 7
Step 2: Apply PEMDAS: Subtraction. 11 − 7 = 4
Answer: 4
3. Simplify (11 − 3) + 4.
Solution:
Step 1: Apply PEMDAS: Operation in grouping symbols. (11 − 3) + 4 = 8 + 4
Step 2: Apply PEMDAS: Addition. 8 + 4 = 12
Answer: 12
4. Evaluate and simplify y = a + bx for a = −5, b = 2, x = 17.
Solution:
Step 1: Substitute numbers in the formula. y = −5 + 2 ⋅ 17
Step 2: Apply PEMDAS: Multiplication. y = −5 + 34
Step 3: Apply PEMDAS: Addition. y = 29
Answer: y = 29
5. Compare problems 1, 2, and 3.
a. Which of problems 1 and 2 have the same meaning as 3? (1: Simplify 11 − 3 + 4. 2: Simplify 11 − (3 + 4). 3: Simplify (11 − 3) + 4.)
b. Explain, using ideas from PEMDAS, why these have the same meaning.
Solution:
a. Answer: Exercises 1 and 3 have the same meaning.
b. Answer: For exercise 1, PEMDAS tells us to do the subtraction first, because it is the first operation from left to right. For exercise 3, PEMDAS tells us to do the subtraction first, because it is the operation in parentheses.
Evaluating Formulas: Part 2
Learning Objectives:
• Evaluate formulas, using decimal numbers as well as integers.
• Evaluate formulas where the list of given values is in a different order than expected.
• Simplify expressions with grouping symbols.
• Simplify expressions with multiple operations.
• Simplify expressions containing a fraction bar.
Order of Operations: PEMDAS
P   Parentheses      ( )
E   Exponent         x^n or √
M   Multiplication   ⋅   (M and D: whichever comes first, left → right)
D   Division         ÷
A   Addition         +   (A and S: whichever comes first, left → right)
S   Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of the order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4. In statistics, many calculations result in decimals that "never end." Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren't told, three or four decimal places is usually enough. Some proportion problems may require more decimal places than that for some calculations.
E X A MPLE A 1
Evaluate and simplify using order of operations
Problem: Evaluate and simplify y = a + bx, where a = 3, b = 0.2, x = 9.4. Step 1
Step 1
Substitute numbers in the formula.
y = 3 + 0.2 ⋅ 9.4
Step 2
Step 2
Apply PEMDAS: Multiplication.
y = 3 + 1.88
Step 3
Step 3
Apply PEMDAS left to right: Addition.
Answer: y = 4.88
E X A MPLE A 2
Evaluate and simplify using the order of operations
Problem: Evaluate and simplify z = (x − x̅)/s, where x = 185, x̅ = 150, s = 15. Step 1
Step 1
Evaluate the formula by substituting the numbers for the symbols.
z = (x − x̅) / s
z = (185 − 150) / 15
Step 2
Step 2
Apply PEMDAS: Simplify the numerator and denominator separately (before dividing).
z = (185 − 150) / 15
z = 35 / 15
Step 3
Step 3
Divide.
z = 2.3333
Answer: z = 2.3333
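The same z-score calculation can be checked in software. A minimal Python sketch (my own illustration) follows; the parentheses play the role of the fraction bar.

    # z = (x - xbar) / s for the values in Example A2
    x = 185
    xbar = 150
    s = 15
    z = (x - xbar) / s
    print(round(z, 4))   # 2.3333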
E X A MPLE A 3
Evaluate and simplify using the order of operations
Problem: Evaluate and simplify Step 1
Step 1
Evaluate the formula by substituting the numbers for the symbols.
Step 2
Step 2
Apply PEMDAS: Simplify the expression in the grouping symbols.
Do the first multiplication.
y = (6 + 9 ⋅ 4)/6
Do the second multiplication.
y = (6 + 36)/6
Add.
y = (42)/6
Step 3
Step 3
Divide.
y=7
Answer: y = 7
E X A MPLE A 4
Evaluate and simplify using the order of operations
Problem: Evaluate and simplify for a = 2, b = 4. Step 1
Step 1
Evaluate the formula by substituting the numbers for the symbols.
Step 2
Step 2
Apply PEMDAS:
Do the first multiplication. Do the second multiplication. Do the division.
y=6+6
Step 3
Step 3
Add.
Answer: y = 12
Practice Problems
1. Evaluate and simplify y = bx + a for x = 2.5, a = 1.2, b = −3.
2. Evaluate and simplify y = a + bx for a = 1103.2, b = −10.4, x = 25.1.
3. Evaluate and simplify z = (x − x̅)/s for x = 23.4, x̅ = 18.2, s = 2.87.
4. Evaluate and simplify z = (x − x̅)/s, where x = 9.7, s = 0.8.
Solutions:
1. Evaluate and simplify y = bx + a for x = 2.5, a = 1.2, b = −3. Problem: Evaluate and simplify y = bx + a for x = 2.5, a = 1.2, b = −3. Step 1
Step 1
Substitute numbers in the formula.
y = −3 ⋅ 2.5 + 1.2
Step 2
Step 2
Apply PEMDAS: Multiplication.
y = −7.5 + 1.2
Step 3
Step 3
Apply PEMDAS: Addition.
y = −6.3
Answer: y = −6.3
2. Evaluate and simplify y = a + bx for a = 1103.2, b = −10.4, x = 25.1. Problem: Evaluate and simplify y = a + bx for a = 1103.2, b = −10.4, x = 25.1. Step 1
Step 1
Substitute numbers in the formula.
The parentheses help keep track of the sign on the negative number.
y = ( ) + ( ) ⋅ ( )
y = 1103.2 + (−10.4) ⋅ 25.1
Step 2
Step 2
Apply PEMDAS: Multiplication.
y = 1103.2 + (−261.04)
Step 3
Step 3
Apply PEMDAS: Subtraction (adding the negative).
y = 1103.2 − 261.04
y = 842.16
Answer: y = 842.16
3. Evaluate and simplify using the order of operations: z = (x − x̅)/s for x = 23.4, s = 2.87, x̅ = 18.2. Problem: Evaluate and simplify z = (x − x̅)/s for x = 23.4, s = 2.87, x̅ = 18.2.
Step 1
Step 1
Evaluate the formula by substituting the numbers for the symbols.
z = (x − x̅) / s
z = (23.4 − 18.2) / 2.87
Step 2
Step 2
Apply PEMDAS: Simplify the numerator and denominator separately (before dividing).
z = 5.2 / 2.87
Step 3
Step 3
Divide. In statistics, many calculations result in decimals that "never end." Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren't told, three or four decimal places is usually enough.
z = 5.2 / 2.87
z = 1.811847
z = 1.812
Answer: z = 1.812
4. Evaluate and simplify using the order of operations: z = (x − x̅)/s, where x = 9.7, s = 0.8. Problem: Evaluate and simplify z = (x − x̅)/s, where x = 9.7, s = 0.8.
Step 1
Step 1
Evaluate the formula by substituting the numbers for the symbols.
̅
Step 2
Step 2
Apply PEMDAS: Simplify the numerator and denominator separately (before dividing).
Step 3
Step 3
Divide.
Answer:
Summation Notation Learning Objectives: • Use summation notation with index as is usually done in mathematics. • Use summation notation without index as is usually done in statistics.
Summation Notation in Context: The symbol ∑ appears in many statistics formulas. To compute the average of the sample values, we use the formula x̅ = (1/n)∑X.
Summation Notation as Used in Mathematics
The summation symbol ∑ (an upper-case Greek letter, sigma) is a shorthand way of indicating a particular sum. Here's an example: ∑_{i=1}^{6} 3i is a short way of writing 3 + 6 + 9 + 12 + 15 + 18.
In this case, using the summation symbol seems unnecessary because it is just as easy to write the six numbers to be added. However, suppose you wanted to indicate the sum of all the square roots of the whole numbers between 10 and 50; it would be tedious to write out all of them. But, using summation notation, it is very compact: ∑_{j=10}^{50} √j. If we wanted to write it out without summation notation, we might write √10 + √11 + √12 + √13 + . . . + √50 and just assume that everyone would see what the . . . in the center means. But that seems rather informal and might not be interpreted correctly.
We sometimes want to write formulas for the sum of n numbers, where n can be any count that we choose: ∑_{k=1}^{n} 2k = 2 ⋅ 1 + 2 ⋅ 2 + 2 ⋅ 3 + 2 ⋅ 4 + . . . + 2 ⋅ n.
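For readers who will use software, the two sums above can be written almost symbol-for-symbol in code. This Python sketch is my own illustration; range(1, 7) produces the index values i = 1 through 6.

    # The sums written with summation notation above, checked in Python
    print(sum(3 * i for i in range(1, 7)))        # 3 + 6 + 9 + 12 + 15 + 18 = 63
    print(sum(i ** 0.5 for i in range(10, 51)))   # sum of the square roots of 10 through 50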
Summation Notation as Used in Statistics
Suppose we had a study that included collecting the heights of people (inches) as well as other variables. Suppose there were 67 people in this study. To compute the average height, we will add all the values and divide that sum by 67. (In statistics class, we'll call this average the "mean.") The formula in the statistics book to tell you this will say something like this. The sample mean is (1/n)∑X. You are supposed to understand, from this:
• The individual subjects' heights are values on a variable, and, in this formula, we are thinking of this as the variable X.
• The variable n in the formula is the sample size. That is, the number of people in the study. (To be very specific, the number of people for whom we have a number for their height.)
• The summation symbol says to add all of those values of X for all the different people in the study.
• To find the sample mean, we compute the number for 1/n and multiply it by the value we found for the summation.
• Or it would also be acceptable to divide the sum by n instead of multiplying it by 1/n.
Why does the summation symbol used in statistics not have the symbols above and below ∑?
Answer: In statistics, we are always using all of the values from whatever group we are considering, so there is no need to include that into the symbolic notation. If, in statistics, we mean to do anything else, then we use the symbols as if we were in the realm of the usual mathematics, thus we do put the appropriate symbols above and below ∑.
E X A MPLE 1
Calculating Values using Summation Notation
Find the sample mean for this set of five observations on height, in inches: 61, 71, 68, 63, 61. Solution:
(1/n)∑X = (1/5) ⋅ (61 + 71 + 68 + 63 + 61)
= (1/5)(324)
= 64.8
The sample mean for this set of data is 64.8 inches. Sometimes statistics formulas will be more complicated than the formula for the sample mean. But the summation is always treated in the same way.
E X A MPLE 2
Advanced Calculations Using Summation Notation
Evaluate √((1/(n − 2)) ∑X²) for the five heights from Example 1:
= √((1/(5 − 2)) ⋅ (61² + 71² + 68² + 63² + 61²))
= √((1/3)(21076))
= √7025.333
= 83.817
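The same "summation inside a bigger formula" idea carries over directly to software. Here is a minimal Python sketch of my own for the calculation in Example 2.

    # sqrt( (1/(n-2)) * sum of X squared ) for the five heights
    from math import sqrt
    X = [61, 71, 68, 63, 61]
    n = len(X)                                        # n = 5
    value = sqrt((1 / (n - 2)) * sum(x ** 2 for x in X))
    print(round(value, 3))                            # 83.817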
Practice Problems
1. Evaluate ∑_{k=5}^{10} 2k.
Solution:
∑_{k=5}^{10} 2k = 2 ⋅ 5 + 2 ⋅ 6 + 2 ⋅ 7 + 2 ⋅ 8 + 2 ⋅ 9 + 2 ⋅ 10
= 10 + 12 + 14 + 16 + 18 + 20 = 90
2. Evaluate (1/n)∑X for the observations −1, 2, 4, 6.
Solution:
(1/n)∑X = (1/4) ⋅ (−1 + 2 + 4 + 6)
= (1/4)(11)
= 2.75
3. Evaluate (1/n)∑X² for the observations −1, 2, 4.
Solution:
(1/n)∑X² = (1/3) ⋅ ((−1)² + 2² + 4²)
= (1/3)(21)
= 7
Coordinate Plane and Points Learning Objectives: • Identify the x- and y-axes of a coordinate plane (whether they are labeled with these letters or not).
• Plot points on a coordinate plane. • For a particular point on a coordinate plane, name its coordinates.
Coordinate Plane and Points in Context: Can we predict the values of the weight of an adult from their height? To visualize these data, we graph the points on a coordinate plane, where we put a person's height along the horizontal axis and their weight along the vertical axis. As we see for the 14 data points here, there is an indication that, for larger values for the height we see larger values for the weight. Thus, we can make some predictions about weight from height. We'll learn about that in statistics class.
[Figure: scatterplot of Weight (vertical axis, 90–200) versus Height (horizontal axis, 62–73) for the 14 data points.]
In order to study this in a statistics class, you must know how to graph points on a coordinate plane, which is described in this worksheet. The coordinate plane is a useful tool to plot points, lines, and functions easily. It is made up of two perpendicular number lines called the x-axis and y-axis, which intersect at a point called the origin. Points are plotted using unique x- and y-values. Plot enough points and connect the dots to create all kinds of images!
The Coordinate Plane and Points
The coordinate plane is made up of a horizontal axis called the x-axis and a vertical axis called the y-axis. Where these two axes intersect is called the origin. When working in statistics, the axes may be labeled with the letters x and y or with names of variables. If they are labeled with the names of variables, for this graph, the variable on the horizontal axis is called x and the variable on the vertical axis is called y. A point is designated as (x, y). The x-value is found by following the x-axis left or right. The y-value is found by following the y-axis up or down.
E X A MPLE A 1
Plot the points on a coordinate plane
Problem: Plot the points (0, −2) and (3, −3) on the coordinate plane. Step 1
Step 1
Given the points, determine which numbers are the x and y components of the point.
(0, −2) x-value: 0 y-value: −2
Step 2
Step 2
Plot each point on the coordinate plane. Find the x-value.
For each point, find the x-value on the x-axis. Positive values move to the right of the origin. Negative numbers move to the left of the origin.
(3, −3) x-value: 3 y-value: −3
The point (0, −2) The point (3, −3) Step 3
Step 3
Plot each point on the coordinate plane. Find the y-value.
From the x-value, move up or down depending on the sign of the y-value number. Negative numbers are below the origin and positive values are above the origin. The point (0, −2) has a −2 in the y-value. Therefore, we move 2 numbers down from the x-axis. The point (3, −3) has a −3 in the y-value. We move 3 numbers down from the x-axis.
Step 4
Step 4
Plot the point and label it.
Place a dot at the location of the point. Label the dot with the coordinates given, in this case (0, −2) and (3, −3).
Answer: The points (0, −2) and (3, −3) are plotted in Figure A1.
FIGURE A1 Graph of plotted points (0, −2) and (3, −3)
[Figure: coordinate plane with the points (0, −2) and (3, −3) plotted.]
E X A MPLE A 2
Name the coordinates of a plotted point
Problem: Using the points shown in Figure A2, identify the coordinates of the points labeled A and B.
FIGURE A2 Graph of plotted points
[Figure: coordinate plane with two labeled points, A and B.]
Step 1
Step 1
Find the x-value of the coordinate.
Place your pencil on Point A. Move the point of the pencil up until it hits the x-axis. In this case, your pencil should be on the number −4. The x-value of the coordinate for Point A is −4. Repeat the process for Point B. The x-value for Point B is 3.
Step 2
Step 2
Find the y-value of the coordinate.
Place your pencil on Point A. Move the point of the pencil over to the right until it hits the y-axis. In this case, your pencil should be on the number −4. The y-value of the coordinate for Point A is −4. Repeat the process for Point B. The y-value for Point B is –6.
Step 3
Step 3
Write the coordinates.
The coordinates for a point are listed as (x, y). Point A’s coordinates are (−4, −4). Point B’s coordinates are (3, −6).
Answer: In Figure A2, the coordinates for Point A are (−4, −4). The coordinates for Point B are (3, −6).
E X A MPLE A 3
Identify points on a statistical graph
Problem: Using the following dataset and graph, identify the coordinates of the lowest point on the left-hand side and the points in the upper right where there are two points overlapping. First: Use the graph alone to estimate the coordinates. Second: Compare your estimated coordinates to the points in the dataset and give the actual data values for these points.
Data
Height  Weight
71  173
73  162
72  166
68  160
70  159
63  128
71  171
72  202
64  149
71  155
62  94
65  114
63  118
64  133
[Figure: scatterplot of Weight (vertical axis, 90–200) versus Height (horizontal axis, 62–73) for these 14 points.]
Step 1
Step 1
Use the graph to estimate the x-value of the coordinate.
In statistics, we always call the horizontal axis the x-axis, whether it is labeled that or not. Place your pencil on the lowest point on the left side. Move the point of the pencil down until it hits the horizontal axis. In this case, your pencil should be on the number 62. Repeat the process for the two points in the upper right. It’s harder to see exactly what, on the horizontal axis, they correspond to, but it is close to 71.
Step 2
Step 2
Use the graph to estimate the y-value of the coordinate.
Place your pencil on lowest point on the left. Move the point of the pencil over to the left until it hits the y-axis. It’s hard to know exactly, but it looks like the value is close to 95. Repeat the process for the two points on the upper right. The y-value for this is close to 170.
Step 3
Step 3
Because the full dataset is given, use it to find the more precise values of the coordinates of these points for these points.
Lower left: We estimated (62, 95) and the dataset shows us that (62, 94) is a point in the dataset. Upper right: We estimated (71, 170) and the dataset shows us two points: (71, 173) and (71, 171).
Answer: For the point on the lower left, the coordinates are (62, 94) and for the two points that are close together on the upper right are (71, 173) and (71, 171).
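In a statistics class you will usually let software draw scatterplots like the one in this example. The following is a minimal sketch of how that might look in Python with the matplotlib plotting library (an assumption on my part; your course may use different software), using the dataset above.

    # Scatterplot of Weight versus Height for the 14 data points
    import matplotlib.pyplot as plt

    height = [71, 73, 72, 68, 70, 63, 71, 72, 64, 71, 62, 65, 63, 64]
    weight = [173, 162, 166, 160, 159, 128, 171, 202, 149, 155, 94, 114, 118, 133]

    plt.scatter(height, weight)
    plt.xlabel("Height")   # horizontal (x) axis
    plt.ylabel("Weight")   # vertical (y) axis
    plt.show()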
Scales of the Graph Notice that the illustrations of the coordinate planes here (and in most math classes) have scales that go to zero. In statistics classes, it is often true that the dataset has no values near zero for one or both of the variables. That is the case for the dataset from the beginning of this handout. Here is the graph.
[Figure: the same scatterplot of Weight versus Height, with the vertical axis starting at 90 and the horizontal axis starting at 62 rather than at zero.]
However, to do all of the work we will do in statistics class with these graphs, you must be able to LOOK at the original graph and visualize what it would look like if the scale included (0, 0). See the picture on the left below. Just thinking about this, you WON'T know how far down the graph would go—just that it would go quite far down. In statistics class, you will learn to use software or a formula to find an equation for the line. One of the numbers in that equation will tell you where that graph crosses the y-axis. Here, the graph shows us that it crosses the y-axis at about y = −255. See the picture on the right below. (Looking ahead: In your statistics class, you will learn that the equation of the line is ŷ = −255.8 + 5.99 ⋅ x.)
[Figure: the scatterplot redrawn twice with both axes extended to include zero; the right panel shows the line crossing the y-axis near −255.]
Practice Problems
Solutions for problems 1–4 are at the end of this worksheet.
1. Plot the points (4, 6) and (−3, 4) on the coordinate plane.
2. Plot the points (−1, −3) and (1, −3) on the coordinate plane.
3. Plot the points (−5, 2) and (6, −6) on the coordinate plane.
4. Plot the points (2, 0) and (0, 4) on the coordinate plane.
5. Identify the coordinates of the points labeled A and B in FIGURE A3 below.
Answer: The coordinates for Point A are (−5, 5). The coordinates for Point B are (−3, 3).
FIGURE A3 Two graphed points
[Figure: coordinate plane with Point A at (−5, 5) and Point B at (−3, 3).]
6. Identify the coordinates of the points labeled A and B in FIGURE A4 below.
Answer: The coordinates for Point A are (1, −4). The coordinates for Point B are (−1, 4).
FIGURE A4 Two points graphed
[Figure: coordinate plane with Point A at (1, −4) and Point B at (−1, 4).]
7. Identify the coordinates of the points labeled A and B in FIGURE A5 below.
Answer: The coordinates for Point A are (3, −6).
FIGURE A5 One point on axis
[Figure: coordinate plane showing points A and B; Point A is at (3, −6).]
8. Identify the coordinates of the points labeled A and B in FIGURE A6 below.
Answer: The coordinates for Point A are (8, 0). The coordinates for Point B are (0, 7).
FIGURE A6 Two points on axes
[Figure: coordinate plane with Point A at (8, 0) on the x-axis and Point B at (0, 7) on the y-axis.]
Data
Height  Weight
71  173
73  162
72  166
68  160
70  159
63  128
71  171
72  202
64  149
71  155
62  94
65  114
63  118
64  133
[Figure: scatterplot of Weight (vertical axis, 90–200) versus Height (horizontal axis, 62–73) for these 14 points, used in the problems below.]
9. What is the y-coordinate of a point whose x-coordinate is approximately 67? Solution: From the graph, it appears that the only point with the x-coordinate close to 67 actually has x-coordinate closer to 68. From the graph, the y-coordinate of that point is something close to 160. From the dataset, that point must be x = 68, y = 160. The answer to this question is 160. What is the x-coordinate of a point whose y-coordinate is approximately 135? Solution: From the graph, we see that there are two points whose y-value is around 130. One has y-coordinate below 130, and one has y-coordinate above 130. The point with the next-higher y-coordinate has y-coordinate around 150, so that’s clearly not the one we need. Thus, the point with y-coordinate above 130 but lower than 150 has x-coordinate of 64. From the dataset, that point must be x = 64, y =133. The answer to this question is 64. How many points have x-coordinate equal to 72, and what are their coordinates? Solution: Looking at x = 72 on the graph, we see two points. Their y-values are approximately 160 and 200. Looking at the dataset, we see that the points have coordinates x = 72, y = 166 and x = 72, y = 202. What is the weight of the tallest person in this dataset? Solution: “Tallest” means the largest value for the height. From the graph, it is easy to see that the point farthest out on the x-axis has x-coordinate of approximately 73. From the graph, we can see that the y-coordinate of that point is approximately 160. From the dataset, we find the point (73, 162). Thus, the tallest person in this dataset has weight 162.
Solutions for Practice Problems 1–4:
1. FIGURE S1 [Figure: coordinate plane with the points (4, 6) and (−3, 4) plotted.]
2. FIGURE S2 [Figure: coordinate plane with the points (−1, −3) and (1, −3) plotted.]
3. FIGURE S3 Graph of plotted points (−5, 2) and (6, −6) [Figure: coordinate plane with these two points plotted.]
4. FIGURE S4 Graph of plotted points (2, 0) and (0, 4) [Figure: coordinate plane with these two points plotted.]
Linear Formulas Learning Objectives: • Use the slope and intercept of an exact linear formula to draw its graph. • Given an application, define an exact linear formula. • Identify the domain and range of a linear formula.
Linear Formulas in Context In a certain city, a study was done of the times (minutes) it took workers to commute to work and the distance (miles) they traveled to work. There is an approximately linear relationship between these variables. We want to answer these questions: • What is the formula of the line to predict time from distance? • How do we interpret the coefficients of the line?
Distance  Time
9  14
25  26
11  22
14  22
3  10
22  30
8  15
10  20
30  40
[Figure: scatterplot of Time (vertical axis, in minutes) versus Distance (horizontal axis, in miles) for these data.]
A very important difference between statistics questions about linear relationships and mathematical questions about linear formulas is:
• In a mathematical formula for a line, ALL the points are points exactly on the line.
• In a statistics course, we explore linear formulas as the "best line" to approximate the relationship. It may be that none of the points are exactly on the line.
In your statistics course, to understand the interpretation of the formula for the line, it is important that you first understand (and be able to write) the interpretations of the slope and y-intercept in the mathematical formulas for lines. This worksheet is about understanding the meaning of the slope and y-intercept on a graph of the line with an exact linear relationship.
Identities of Linear Formulas • b = value of y when x = 0 • if m > 0, formula is increasing. • if m < 0, formula is decreasing. • if m = 0, horizontal line crossing through (0, b).
Linear formulas are written in the form y = mx + b. (In math classes, sometimes other letters besides x are used, as illustrated in some of the exercises here.)
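The bulleted facts above can also be checked numerically. Here is a small Python sketch of my own (the function name linear is made up for this illustration) that evaluates y = mx + b; the values used match Example A1 below.

    # Evaluate a linear formula y = m*x + b for any x
    def linear(x, m, b):
        """Return the y-value on the line with slope m and y-intercept b."""
        return m * x + b

    print(linear(0, 2, 10))   # 10 -- the y-intercept: the value of y when x = 0
    print(linear(1, 2, 10))   # 12 -- a slope of 2 means y rises 2 when x increases by 1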
E X A MPLE A 1
Draw the graph of a linear formula given slope and y-intercept
Problem: Graph the formula with a slope of 2 and a y-intercept of 10. Step 1
Step 1
Plot the y-intercept.
Plot the point (0, 10). See Figure S1.
Step 2
Step 2
Plot the point corresponding to the slope of the formula.
From the y-intercept, move up 2 units and right 1 unit. See Figure S2. It is also acceptable to move down 2 units and left 1 unit.
Step 3
Step 3
Connect the two points with a straight line.
See Figure S3.
Answer: FIGURE S1 Graph of y-intercept (0, 10)
[Figure: coordinate plane with the point (0, 10) plotted.]
FIGURE S2 Graph of y-intercept and second point using slope
[Figure: coordinate plane with the points (0, 10) and (1, 12) plotted.]
FIGURE S3 Graph of the linear formula y = 2x + 10
[Figure: the line y = 2x + 10 drawn through (0, 10) and (1, 12).]
E X A MPLE A 2
Define a linear formula from an application
Problem: A jogger walks 0.5 miles from home as a warm-up and then starts running at a rate of 6 miles per hour. Determine the jogger's distance from home (y) as a formula of the time spent running (r), in hours. Identify the domain and range of the formula. Step 1
Step 1
Determine the initial value.
The jogger starts running at a distance of 0.5 miles from home. Therefore, the initial value of the formula is 0.5.
Step 2
Step 2
Determine the rate of increase or decrease.
The jogger runs at a rate of 6 miles per hour. This is the rate of increase of the formula.
Step 3
Step 3
Write the formula in slope-intercept form.
y = 6r + 0.5
Step 4
Step 4
Determine the domain of the formula.
The jogger cannot run for a negative amount of time, so the domain will be the set of all values greater than or equal to zero.
Step 5
Step 5
Determine the range of the formula.
Since the domain is restricted to all positive values, including zero, the range will be restricted to the output produced by this restriction. When the jogger has run for zero hours, the distance from home is already 0.5 miles. Therefore, the range is the set of all numbers greater than or equal to 0.5.
Answer: The distance from home as a formula of running time is y = 6r + 0.5. The domain of the formula is [0, ∞), and the range of the formula is [0.5, ∞).
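The jogger formula translates directly into code. A minimal Python sketch of my own follows (the function name is made up for this illustration); it also enforces the domain restriction found above.

    # Distance from home for the jogger in Example A2: y = 6r + 0.5
    def distance_from_home(r):
        # r is the time spent running, in hours; negative times are outside the domain
        if r < 0:
            raise ValueError("time must be 0 or more hours")
        return 6 * r + 0.5

    print(distance_from_home(0))    # 0.5 -- the warm-up walk only
    print(distance_from_home(1.5))  # 9.5 -- after an hour and a half of running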
Practice Problems
Solutions are given at the end of this document.
1–4 Draw the graph of a linear formula given slope and y-intercept. (Use graph paper.)
1. Graph the formula with a slope of −4 and a y-intercept of 0.
2. Graph the formula with a slope of −2/3 and a y-intercept of 5.
3. Graph the formula with a slope of 7 and a y-intercept of −6.
4. Graph the formula with a slope of 7/2 and a y-intercept of −2.
5–8 Define a linear formula from an application.
5. A family rents a moving van for a base fee of $100 and is charged $0.05 per mile travelled. Determine the cost (y) to rent the van as a formula of the miles travelled (m) with the van. Identify the domain and range of the formula.
6. A sprinter runs from the starting line to the finish line at an average speed of 0.0042 miles per second. Determine the distance the sprinter runs (y) as a formula of seconds (s). Identify the domain and range of the formula.
7. A car that cost $30,000 is put on a payment plan of $500 per month over 5 years. Determine the amount remaining to pay on the car (y) as a formula of the time elapsed (t). Identify the domain and range of the formula.
8. A swimming pool initially contains 34,000 gallons of water and is desired to be emptied for cleaning. The water can be siphoned out at a rate of 1000 gallons per hour. Determine the volume (y) of the pool as a formula of time (t). Identify the domain and range of the formula.
Solutions 1. Draw the graph of a linear formula given slope and y-intercept. Solution: Problem: Graph the formula with a slope of −4 and a y-intercept of 0. Step 1
Step 1
Plot the y-intercept.
Plot the point (0, 0). See Figure S4.
Step 2
Step 2
Plot the point corresponding to the slope of the formula.
From the y-intercept, move down 4 units and right 1 unit. See Figure S5. It is also acceptable to move up 4 units and left 1 unit.
Step 3
Step 3
Connect the two points with a straight line.
See Figure S6.
Answer: FIGURE S4 Graph of y-intercept (0, 0)
[Figure: coordinate plane with the point (0, 0) plotted.]
FIGURE S5 Graph of y-intercept and second point using slope
[Figure: coordinate plane with the points (0, 0) and (1, −4) plotted.]
FIGURE S6 Graph of the linear formula y = −4x
[Figure: the line y = −4x drawn through (0, 0) and (1, −4).]
2. Draw the graph of a linear formula given slope and y-intercept. Solution: Problem: Graph the formula with a slope of −2/3 and a y-intercept of 5. Step 1 Step 1 Plot the y-intercept.
Plot the point (0, 5). See Figure S7.
Step 2
Step 2
Plot the point corresponding to the slope of the formula.
From the y-intercept, move down 2 units and right 3 units. See Figure S8. It is also acceptable to move up 2 units and left 3 units.
Step 3
Step 3
Connect the two points with a straight line.
See Figure S9.
Answer: FIGURE S7 Graph of y-intercept (0, 5)
[Figure: coordinate plane with the point (0, 5) plotted.]
FIGURE S8 Graph of y-intercept and second point using slope
[Figure: coordinate plane with the points (0, 5) and (3, 3) plotted.]
FIGURE S9 Graph of the linear formula y = −(2/3)x + 5
[Figure: the line y = −(2/3)x + 5 drawn through (0, 5) and (3, 3).]
3. Draw the graph of a linear formula given slope and y-intercept. Solution: Problem: Graph the formula with a slope of 7 and a y-intercept of −6. Step 1
Step 1
Plot the y-intercept.
Plot the point (0, −6). See Figure S10.
Step 2
Step 2
Plot the point corresponding to the slope of the formula.
From the y-intercept, move up 7 units and right 1 unit. See Figure S11. It is also acceptable to move down 7 units and left 1 unit.
Step 3
Step 3
Connect the two points with a straight line.
See Figure S12.
Answer: FIGURE S10 Graph of y-intercept (0, −6)
[Figure: coordinate plane with the point (0, −6) plotted.]
FIGURE S11 Graph of y-intercept and second point using slope
[Figure: coordinate plane with the points (0, −6) and (1, 1) plotted.]
FIGURE S12 Graph of the linear formula y = 7x − 6
[Figure: the line y = 7x − 6 drawn through (0, −6) and (1, 1).]
4. Draw the graph of a linear formula given slope and y-intercept. Solution: Problem: Graph the formula with a slope of 7/2 and a y-intercept of −2. Step 1 Step 1 Plot the y-intercept.
Plot the point (0, −2). See Figure S13.
Step 2
Step 2
Plot the point corresponding to the slope of the formula.
From the y-intercept, move up 7 units and right 2 units. See Figure S14. It is also acceptable to move down 7 units and left 2 units.
Step 3
Step 3
Connect the two points with a straight line.
See Figure S15.
Answer: FIGURE S13 Graph of y-intercept (0, −2)
[Figure: coordinate plane with the point (0, −2) plotted.]
FIGURE S14 Graph of y-intercept and second point using slope
[Figure: coordinate plane with the points (0, −2) and (2, 5) plotted.]
FIGURE S15 Graph of the linear formula y = (7/2)x − 2
[Figure: the line y = (7/2)x − 2 drawn through (0, −2) and (2, 5).]
5. Define a linear formula from an application. Solution: Problem: A family rents a moving van for a base fee of $100 and is charged $0.05 per mile travelled. Determine the cost (y) to rent the van as a formula of the miles travelled (m) with the van. Identify the domain and range of the formula. Step 1 Determine the initial value. Step 2 Determine the rate of increase or decrease.
Step 1 The rental van base fee is $100. Therefore, the initial cost to rent the moving van is $100. Step 2 The moving van company charges $0.05 per mile travelled, which is the rate of increase of the formula.
Step 3
Step 3
Write the formula in slope-intercept form.
y = 0.05m + 100
Step 4
Step 4
Determine the domain of the formula.
The domain will be the set of input values for the formula. Since the input is miles travelled, it is not feasible to travel negative miles. Therefore, the plausible set of input values will be all positive numbers including zero.
Step 5
Step 5
Determine the range of the formula.
Since the domain is restricted to all positive input values, including zero, the range will be restricted to the output produced by this restriction. When the family travels zero miles, the van cost is $100. Therefore, the range is the set of all numbers greater than or equal to 100.
Answer: The cost to rent the moving van as a formula of the miles travelled is y = 0.05m + 100. The domain of the formula is [0, ∞), and the range of the formula is [100, ∞). 6. Define a linear formula from an application. Solution: Problem: A sprinter runs from the starting line to the finish line at an average speed of 0.0042 miles per second. Determine the distance the sprinter runs (y) as a formula of seconds (s) run. Identify the domain and range of the formula. Step 1
Step 1
Determine the initial value.
The sprinter starts running at the starting line which can be considered no distance travelled. Therefore, the initial value of the formula is 0.
Step 2
Step 2
Determine the rate of increase or decrease.
The sprinter runs at a rate of 0.0042 miles per second, which is the rate of increase of the formula.
Step 3
Step 3
Write the formula in slope-intercept form.
y = 0.0042s
Step 4
Step 4
Determine the domain of the formula.
The domain will be the set of input values for the formula. Since the input is the time run, in seconds, it is not feasible to run for a negative amount of time. Therefore, the plausible set of input values will be all positive numbers including zero.
Step 5
Step 5
Determine the range of the formula.
Since the domain is restricted to all positive input values, including zero, the range will be restricted to the output produced by this restriction. When the sprinter runs zero miles, there is no distance travelled. Therefore, the range is the set of all numbers greater than or equal to zero.
Answer: The distance the sprinter runs as a formula of time is y = 0.0042s. The domain of the formula is [0, ∞), and the range of the formula is [0, ∞). 7. Define a linear formula from an application. Solution: Problem: A car that cost $30,000 is put on a payment plan of $500 per month over 5 years. Determine the amount remaining to pay on the car (y) as a formula of the time elapsed (t) in months. Identify the domain and range of the formula. Step 1
Step 1
Determine the initial value.
The car initially cost $30,000. Therefore, the initial value of the payment formula is $30,000.
Step 2
Step 2
Determine the rate of increase or decrease.
The car is paid off at a rate of $500 per month, which is the rate of decrease of the formula.
Step 3
Step 3
Write the formula in slope-intercept form.
y = −500t + 30000
Step 4
Step 4
Determine the domain of the formula.
The domain will be the set of input values for the formula. Since the input is time elapsed, it is not feasible to have negative time. It is also not possible to pay off more than the initial amount of the car, which will take 60 months to accomplish. Payments are made once a month so input values will only be integers. Therefore, the plausible set of input values will be all integers between and including zero and 60.
Step 5
Step 5
Determine the range of the formula.
Since the domain is restricted to integers between zero and 60, the range will be restricted to the output produced by this restriction. When no payments have been made on the car, the amount remaining on the loan is $30,000 and the goal amount is a zero balance on the loan. Therefore, the range is the set of all numbers between zero and 30000.
Answer: The amount remaining to pay on the car as a formula of time is y = −500t + 30000. The domain of the formula is [0, 60], and the range of the formula is [0, 30000]. 8. Define a linear formula from an application. Solution: Problem: A swimming pool initially contains 34,000 gallons of water and is desired to be emptied for cleaning. The water can be siphoned out at a rate of 1000 gallons per hour. Determine the volume (y) of the pool as a formula of time (t). Identify the domain and range of the formula.
Step 1
Step 1
Determine the initial value.
The pool initially has a volume of 34,000 gallons. Therefore, the initial value of the formula is 34000.
Step 2
Step 2
Determine the rate of increase or decrease.
The water is siphoned out of the pool at a rate of 1000 gallons per hour which is the rate of decrease of the formula.
Step 3
Step 3
Write the formula in slope-intercept form.
y = −1000t + 34000
Step 4
Step 4
Determine the domain of the formula.
The domain will be the set of input values for the formula. Since the input is time elapsed, it is not feasible to have negative time. It is also not possible to siphon out more water than is initially present in the pool. To determine the maximum amount of time it will take to empty the pool, we must set the formula equal to zero and solve for time.
y = −1000t + 34000
0 = −1000t + 34000
1000t = 34000
t = 34
Therefore, the plausible set of input values will be all real numbers between zero and 34. Step 5
Step 5
Determine the range of the formula.
Since the domain is restricted to all real numbers between zero and 34, the range will be restricted to the output produced by this restriction. When no time has elapsed, the pool will be full at a volume of 34,000 gallons. After 34 hours, the pool will be empty at a volume of zero gallons. Therefore, the range is the set of all real numbers between zero and 34000.
Answer: The volume of the pool as a formula of time is y = −1000t + 34000. The domain of the formula is [0, 34], and the range of the formula is [0, 34000].
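The emptying time found by algebra above can also be confirmed numerically. This is a small Python sketch of my own (the function name is made up for this illustration).

    # Volume of the pool in problem 8: y = -1000*t + 34000
    def pool_volume(t):
        return -1000 * t + 34000

    print(pool_volume(0))    # 34000 gallons at the start
    print(pool_volume(34))   # 0 gallons: the pool is empty after 34 hours
    print(34000 / 1000)      # 34.0: the same emptying time found by solving 0 = -1000t + 34000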
Slope-Intercept Form with Rise over Run Learning Objectives: • Determine a line’s y-intercept and slope from its slope-intercept form of the equation. • Plot a line’s intercept, counting off spaces to find another point, and sketching the line. • Recognize whether a line is rising or falling as it moves from left to right.
Slope-Intercept Form of a Line in Context: In a certain city, a study was done of the time (minutes) it took workers to commute to work and the distance (miles) they traveled to work. There is an approximately linear relationship between these variables. We want to answer these questions: • What is the formula of the line to predict time from distance? • How do we interpret the coefficients of the line?
Distance  Time
9  14
25  26
11  22
14  22
3  10
22  30
8  15
10  20
30  40
[Figure: scatterplot of Time (vertical axis, in minutes) versus Distance (horizontal axis, in miles) for these data.]
A very important difference between statistics questions about linear relationships and mathematical questions about linear formulas is: • In a mathematical formula for a line, ALL the points are points exactly on the line. • In a statistics course, we explore the “best line” to approximate the relationship. It may be that none of the points are exactly on the line. In your statistics course, to understand the interpretation of the formula for the line, it is important that you first understand (and be able to write) the interpretations of the slope and y-intercept in the mathematical formulas for lines. This worksheet is about understanding the meaning of the slope and y-intercept on a graph of the line with an exact linear relationship. The slope-intercept form, y = mx + b, provides you with several very important pieces of information about the line. The value of m, the slope, tells you whether the graph of the line rises or falls, and whether it’s steep or gently moving. The value of b, when inserted into the coordinate (0, b), gives you a point on the line. In algebra class, we use the notation y = mx + b, but in a statistics course, we use the notation y = a + bx. In each of these the slope is the value mulitplied by the variable x and the other number gives the value of y when x = 0.
A Line's Slope-Intercept Form
A slope-intercept form of the equation of a line is y = mx + b.
• The coefficient m is the slope, where m = rise/run = (change in y)/(change in x).
• The b represents the y-intercept of the line, (0, b). When m is positive, the line rises from left to right; when m is negative, the line falls from left to right.
When 0 < |m| < 1, meaning that the slope is a proper fraction, the line gently rises or falls. Also, the greater the value of |m|, the steeper the line. The formula for the slope of the line through two points (x₁, y₁) and (x₂, y₂) is m = (y₂ − y₁)/(x₂ − x₁).
E X A MPLE A 1
Determining the slope and y-intercept
Problem: What are the slope and y-intercept of the line y = −(3/4)x + 1/5? Step 1
Step 1
Determine the slope, m, from the coefficient of x in
1 is −_ 3 , so m = −_ 3 . The coefficient of x in y = −_ 3 x + _ 5 4 4 4
Step 2
Step 2
Determine the y-value of the intercept (0, b) from the equation of the line.
1 is _ 1, The value of b in y = −_ 3 x + _ 5 5 4
1 _ (0, 5 ).
Answer: The slope and y-intercept of the line y = −(3/4)x + 1/5 are m = −3/4 and (0, b) = (0, 1/5).
E X A MPLE A 2
Sketching the graph of a line
Problem: Sketch the graph of x + 3y = 6 using the line’s slope and y-intercept. Step 1
Step 1
Change the equation x + 3y = 6 to the slope-intercept form by subtracting x from each side and then dividing each term by 3.
3y = −x + 6
y = −(1/3)x + 2
Step 2
Step 2 Determine the slope and the y-intercept.
From the equation y = −(1/3)x + 2, the slope is m = −1/3 and the y-intercept is (0, 2).
Step 3
Step 3
Plot the intercept on a graph.
See Figure A1.
Step 4
Step 4 −1 −_ 1 = _ 3 3
Write the slope as a fraction with the negative sign in the numerator. Step 5
Step 5
Determine the “run” and “rise/fall” indicated by the slope.
−1 The slope _ 3
Step 6
Step 6
From the intercept, count 3 units to the right and 1 unit down.
Step 7
Step 7
Plot the point (3, 1) on your graph.
Count 3 units to the right and 1 unit down from the point (0, 2) to find the point (3, 1). See Figure A1.
Step 8
Step 8
Draw the line through the two points.
The graph of the line x + 3y = 6 is found in Figure A1.
FIGURE A1 The graph of x + 3y = 6
[Figure: the line x + 3y = 6 drawn through (0, 2) and (3, 1).]
Practice Problems
1. What are the slope and y-intercept of the line y = −4x + 2?
Answer: The slope is m = −4, and the y-intercept is (0, 2).
2. What are the slope and y-intercept of the line y = (3/5)x − 4?
Answer: The slope is m = 3/5, and the y-intercept is (0, −4).
3. Graph the line y = 3x − 2 using the slope and y-intercept. See solution near the end of this worksheet.
4. Graph the line y = −4x + 5 using the slope and y-intercept. See solution near the end of this worksheet.
5. Graph the line y = (1/3)x + 3 using the slope and y-intercept. See solution near the end of this worksheet.
6. Graph the line y = −(5/2)x − 1 using the slope and y-intercept. See solution at the end of this worksheet.
For problems 7–10, describe how the graph of each line looks. Choose from:
a. rises slowly (gently)
b. rises rapidly (steeply)
c. falls slowly (gently)
d. falls rapidly (steeply)
e. vertical
f. horizontal
7. Describe how the graph of the line y = (2/3)x − 10 looks.
Answer: The slope of y = (2/3)x − 10 tells us the line rises gently.
8. Describe how the graph of the line looks. Answer:
9. Describe how the graph of the line y = −9 looks.
Answer: The line y = −9 is horizontal.
10. Describe how the graph of the line y = 7x − 15/16 looks.
Answer: The slope of y = 7x − 15/16 tells us the line rises steeply.
Supplementary Problems
11. Given the graph of a line in Figure A2, what is its equation in slope-intercept form?
Answer: The slope-intercept version of the line's equation is y = 3x + 6.
FIGURE A2
[Figure: the graph of a rising line that crosses the y-axis at (0, 6).]
12. Given the graph of a line in Figure A3, what is its equation in slope-intercept form?
Answer: The slope-intercept version of the line's equation is
FIGURE A3 The graph of a steeply falling line
[Figure: a steeply falling line.]
13. Given the graph of a line shown in Figure A4, what is its equation in slope-intercept form?
Answer: The slope-intercept version of the line's equation is y = 3.
FIGURE A4 The graph of a horizontal line
[Figure: the horizontal line y = 3.]
14. Given the graph of a line shown in Figure A5, what is its equation in slope-intercept form?
Answer: The slope-intercept version of the line's equation is y = (1/2)x + 2.
FIGURE A5 The graph of a gently rising line
[Figure: the gently rising line y = (1/2)x + 2.]
Solutions to 3–6: 3. Graph the line y = 3x − 2 using the slope and y-intercept. Solution: Problem: Graphing the line y = 3x − 2. Step 1
Step 1
Determine the y-intercept and slope of the line.
The y-intercept is (0, −2), and the slope is m = 3.
Step 2
Step 2
Plot the y-intercept on your graph.
This y-intercept is 2 units below the origin. See Figure S1 for the plotted point.
Step 3
Step 3
m = 3 = 3/1
The rise is 3, and the run is 1. For every move of 1 unit to the right, the line will rise 3 units.
Rewrite the slope as a fraction and determine the rise and run.
Step 4
Step 4
From the intercept, count 1 unit to the right and 3 units up to reach another point on the line.
Step 5
Step 5
Plot the point (1, 1) on your graph and draw the line through the two points.
See Figure S1. The graph of y = 3x − 2 goes through (0, −2) and (1, 1).
FIGURE S1 The graph of y = 3x − 2
[Figure: the line y = 3x − 2 drawn through (0, −2) and (1, 1).]
4. Graph the line y = −4x + 5 using the slope and y-intercept. Solution: Problem: Graphing the line y = −4x + 5. Step 1
Step 1
Determine the y-intercept and slope of the line.
The y-intercept is (0, 5), and the slope is m = −4.
Step 2
Step 2
Plot the y-intercept on your graph.
This y-intercept is 5 units above the origin. See Figure S2 for the plotted point.
Step 3
Step 3
m = −4 = −4/1
The rise (or fall) is −4, and the run is 1. For every move of 1 unit to the right, the line will fall 4 units.
Rewrite the slope as a fraction and determine the rise and run.
Step 4
Step 4
From the intercept, (0, 5), count 1 unit to the right and 4 units down to reach another point on the line.
Step 5
Step 5
Plot the point (1, 1) on your graph and draw the line through the two points.
See Figure S2. The graph of y = −4x + 5 goes through (0, 5) and (1, 1).
FIGURE S2 The graph of y = −4x + 5
[Figure: the line y = −4x + 5 drawn through (0, 5) and (1, 1).]
5. Graph the line y = (1/3)x + 3 using the slope and y-intercept. Solution: Problem: Graphing the line y = (1/3)x + 3. Step 1
Step 1
Determine the y-intercept and slope of the line.
The y-intercept is (0, 3), and the slope is m = 1/3.
Step 2
Step 2 Plot the y-intercept on your graph.
This y-intercept is 3 units above the origin. See Figure S3 for the plotted point.
Step 3
Step 3
m = 1/3
The rise is 1, and the run is 3. For every move of 3 units to the right, the line will rise 1 unit.
The slope is written as a fraction; determine the rise and run.
Step 4
Step 4
From the intercept, (0, 3), count 3 units to the right and 1 unit up to reach another point on the line.
Step 5
Step 5
Plot the point (3, 4) on your graph and draw the line through the two points.
See Figure S3.
The graph of y = (1/3)x + 3 goes through (0, 3) and (3, 4).
FIGURE S3 The graph of y = (1/3)x + 3
[Figure: the line y = (1/3)x + 3 drawn through (0, 3) and (3, 4).]
6. Graph the line y = −(5/2)x − 1 using the slope and y-intercept. Solution: Problem: Graphing the line y = −(5/2)x − 1. Step 1
Step 1
Determine the y-intercept and slope of the line.
The y-intercept is (0, −1), and the slope is m = −5/2.
Step 2
Step 2 Plot the y-intercept on your graph.
This y-intercept is 1 unit below the origin. See Figure S4 for the plotted point.
Step 3
Step 3
−5/2 = (−5)/2
The rise (fall) is −5, and the run is 2. For every move of 2 units to the right, the line will fall 5 units.
Rewrite the slope as a fraction with the negative sign in the numerator and determine the rise and run.
Step 4
Step 4
From the intercept, (0, −1), count 2 units to the right and 5 units down to reach another point on the line.
Step 5
Step 5
Plot the point (2, −6) on your graph and draw the line through the two points.
See Figure S4.
The graph of y = −(5/2)x − 1 goes through (0, −1) and (2, −6).
FIGURE S4 The graph of y = −(5/2)x − 1
[Figure: the line y = −(5/2)x − 1 drawn through (0, −1) and (2, −6).]
Statistical Symbols and Concepts Learning Objective: • Identify and make a reference list of statistical notation and terminology used in the course. As you go through the course, use this worksheet to keep track of the letters and symbols used and their meanings. If the symbol is a Greek letter, also include how to say that letter. Also included are several other lists of ideas and vocabulary. You will find most of these in your statistics textbook. You might be able to review more easily if you keep track of where to find those in your textbook.
Notation for Numerical Measurements
(Columns to fill in: Name; Sample statistic; What it is called; Population parameter; What it is called; Section of text)
Mean: sample statistic x̅ or X̅ (said "x-bar"); population parameter μ (said "mu")
Standard deviation:
Proportion:
Correlation:
Slope (regression):
Difference of means:
Difference of proportions:
Types of Graphs
(Columns to fill in: Graph; Type of variable(s); Section of text)
Bar chart
Pie chart
Two-way table
Dot plot
Histogram
Side-by-side graph
Scatterplot
Concepts Related to Graphs
(Columns to fill in: What?; Is there any formula or symbol for it?; Notes; Section of text)
Outlier
Skewed
Symmetric
Bell-shaped
Median
Range
First quartile
Third quartile
Interquartile range IQR
Five number summary
Resistant statistic
z-score
95% rule
pth percentile
Regression line
Slope
Intercept
Residual
Statistical Terminology
(Columns to fill in: Terminology; Notes; Section of text)
Cases and variables
Categorical variable
Quantitative variable
Explanatory variable
Response variable
Population
Sample
Statistical inference
Sampling bias
Simple random sample
Association versus causation
Confounding variables
Observational studies
Statistical Terminology (continued)
(Columns to fill in: Terminology; Notes; Section of text)
Experiment
Randomized experiment
Randomized comparative experiment
Matched pairs experiment
Control groups
Placebo
"Blind" and "Double Blind" experiment
Reading Statistics Problems Learning Objectives: • In a statistics problem, identify the type of question asked. • In a statistics problem, identify the variables mentioned. Many statistics problems are presented with some (or even more) “context.” That makes them interesting, and it is often needed to understand what questions are asked. How can you pull out the information you need from that context? Most statistics problems (presented in textbooks) are about some data, which means some set of individuals, variables that were measured on those individuals, and a question that asks for some type of analysis you have learned to do. Your task is to clarify what each of those is, from the context given. • Who/what are the individuals on whom data were collected? • What are the variables? • What type of statistical analysis is requested? You can approach these three questions in any order. In fact, you’re not really finished with any one of them until you have dealt with all three of them, to show that you fully understand what is being asked. In the process of answering these, you will almost certainly need to re-read the problem several times. To begin to identify the variables, think about what is interesting here. What information did they obtain, or try to obtain from the subjects in the study? Is there a dataset available? If so, look at it and see what that tells you about what they measured. If not, try to make up some possible values for what they measured. (Try to make up some strange or unusual values— that is probably more interesting than trying to make up “typical” values.) Also try to make up a name for the variable. It isn’t important that it be right—just that it gives you a start in organizing your thoughts about the information. For the two studies below, no datasets are provided here. The objective is to see what you can find from just the description of the study. Example 1 (Beetles in grain): “In a study of leaf beetle damage to grain, researchers planted grain in 20 small plots. They treated half the plots with a pesticide and did not treat the other half with pesticide. At an appropriate length of time later, they recorded the number of leaf beetle larvae per stem. Then they estimated the difference in the average number of leaf beetle larvae in the group of treated plots and the group of untreated plots.” Solution: (A description of how I read this problem.) • The last sentence tells me that our dataset will have a number associated with it that is the number of leaf beetle larvae counted. • I have no idea what a reasonable number is: it might be 0 on some plots and it might be 1000 on others. (Well, probably not that large—surely they’d have trouble getting anyone to count!) • That suggests that the individuals in the study are the plots of land where grain is planted. • To find the difference mentioned in the last sentence, there must be some way of identifying whether each plot was treated with pesticide or not. I’ve learned (from lots of examples in statistics) that means we create a variable (on which the values are just “Yes” 79
and “No”) and we use that variable to record the answer for each plot to “Was the plot treated with pesticide?” This is often tricky for students to see because it’s sort of hard to think of a name for that variable. • To summarize, here I find that • the individuals are the plots of land planted with grain • the variables are • what I might call “Beetle larvae,” which is a quantitative variable and • another variable with values “Yes” and “No” (a categorical variable), which I would say is the answer to “Was the plot treated with pesticide?” I’m going to choose to call it “Pesticide.” • the statistical analysis requested is to find the difference of the means of the variable “Beetle Larvae” for the “Yes” and “No” values of “Pesticide.”
Practice Problems 1. For the following study, answer these questions:
Example (Christmas trees): A local club sells non-artificial Christmas trees each year. As they are planning their marketing strategy, the question arises if there is an association between whether a household puts up a Christmas tree and whether the household includes young people under the age of 20. They do a survey. From the results of that survey, they will test a claim that the population proportion of households who put up a Christmas tree is higher for those households with members under age 20 than it is for those households without members under age 20. Solution: (A description of how I read this problem.) • They did a survey of households. So the individuals on whom the data were collected are households, not individual people. • They did a survey—so they asked each household some questions. Only two questions are mentioned here. Both questions have only “Yes” or “No” as an answer. To gather the answers, I’ll need to create two categorical variables. While the variable name probably should be “Do you have a Christmas tree?” that takes up a lot of room in the name for a variable. So I’ll just call that variable “Tree.” And, for a similar reason, I’ll call the other variable “Children.” • The type of analysis asked for is to test a claim about the two proportions: Proportion of households with children who have a tree and the proportion of households without children who have a tree. Caution: If you are frustrated, or short of time, don’t try to answer this question 2. Just read it and then read the solution. Later in your course, if you find problems that have “too much information” so that it is confusing, try to use this approach on them. 2. For the following study, answer these questions:
Example 2 (Trees): “To study the effects of air temperature on the growth characteristics of sour orange trees exposed to various concentrations of atmospheric carbon dioxide, data were collected every other month, over a two-year period on the dry weight per leaf and the mean air temperature of the preceding month, for trees exposed to an extra 300 microliter/liter of CO2. Use the data to form a regression model to predict the dry weight per leaf from the mean air temperature of the preceding month.”
Solution: (Which is a description of how I read this problem.) • I read it once and felt confused. So I decided that I needed to read it again and underline all the variables. Here’s what I did for that: • “To study the effects of air temperature on the growth characteristics of sour orange trees exposed to various concentrations of atmospheric carbon dioxide, data were collected every other month, over a two-year period on the dry weight per leaf and the mean air temperature of the preceding month, for trees exposed to an extra 300 microliter/liter of CO2. The data are used to form a regression model to predict the dry weight per leaf from the mean air temperature of the preceding month.” • My goodness! That is MANY variables! Are there really that many? This seems too hard to think about now. • I decided to look at what this asks me to do for the analysis. Here’s what I see for that: “form a regression model to predict the dry weight per leaf from the mean air temperature of the preceding month.” • Now I know that what I am supposed to do seems to only involve two variables: dry weight per leaf and mean air temperature of the preceding month. • I have never heard of “dry weight per leaf” but I know what weight means, so a measurement of this will be a number. A leaf doesn’t weigh much so I don’t know what units they will use for weight, but if it is ounces, it will probably be some fraction of an ounce. So I’ll guess that the values of the variable might be numbers like 0.20 or 0.42, etc. of an ounce. (I didn’t really have to make up values here since I had already pretty much decided the variable is weight and it is quantitative. Making up values is something of a reality check on whether I really believe what I think is correct.) • The mean air temperature of the preceding month sounds like you’ll need quite a lot of help to measure, but, however they find it, it will be some number like 60 degrees Fahrenheit or 84 degrees Fahrenheit or something like that. At any rate, it is also clearly a quantitative variable. • I haven’t yet decided “who or what” the variable “dry leaf weight” is measured on. In a sense, it is on a leaf, but given the problem, I’m pretty sure it’s on a “basket” of leaves or something like that. If I were going to actually measure this variable, someone would need to define it for me in much more detail than I can see here. But I’m not being asked to measure it. If I were asked to do the analysis, someone would give me the dataset, and, if I thought I needed to know more about how this variable was measured, I could ask. • To conclude, I look back at all the things I underlined as variables, and I can see duplication— meaning that those same two variables were mentioned several times. And I see more context. In particular, I see that apparently all the trees in the study were exposed to excess carbon dioxide. So that’s not a variable in the study, because it was true for all the trees in the study. That tells me something about why the study was interesting to someone enough to pay for it to be done, but the numbers given about the amount of excess carbon dioxide are not used in our analysis here.
Evaluating Formulas: Part 3
Learning Objective:
• Evaluate confidence interval formulas with a given standard error.
Order of Operations: PEMDAS
P  Parentheses      ( )
E  Exponent         xⁿ or √
M  Multiplication   ⋅       (M and D: whichever comes first, left → right)
D  Division         ÷
A  Addition         +       (A and S: whichever comes first, left → right)
S  Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4.
In statistics, many calculations result in decimals that “never end.” Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren’t told, three or four decimal places is usually enough. Some proportion problems may require more decimal places than that for some calculations.
E X A MPLE A 1
Translate the ± symbol
Problem: Find the two separate answers for X̅ ± 2 ⋅ SE, where X̅ = 9.2 and SE = 1.4.
Step 1
Substitute numbers in the formula.
X̅ ± 2 ⋅ SE
9.2 ± 2 ⋅ 1.4
Step 2
Apply PEMDAS: Multiplication comes before either addition or subtraction.
9.2 ± 2 ⋅ 1.4
9.2 ± 2.8
Step 3
Translate the ± into two separate evaluation problems.
Step 3 (for minus): 9.2 − 2.8 = 6.4
Step 3 (for plus): 9.2 + 2.8 = 12.0
Answer: 6.4 and 12.0
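If you check a ± calculation like this with software, the same order of operations applies. Here is a minimal Python sketch of the calculation above (the variable names are mine, not from the worksheet):

xbar = 9.2
se = 1.4
margin = 2 * se                          # PEMDAS: multiply before adding or subtracting
lower = xbar - margin                    # the “minus” answer
upper = xbar + margin                    # the “plus” answer
print(round(lower, 1), round(upper, 1))  # 6.4 12.0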
E X A MPLE A 2
Evaluate and simplify using order of operations
Problem: Evaluate and simplify p₁ − p₂ + 2 ⋅ SE, where p₁ = 0.80, p₂ = 0.30, and SE = 0.128.
Step 1
Substitute numbers in the formula.
p₁ − p₂ + 2 ⋅ SE
0.80 − 0.30 + 2 ⋅ 0.128
Step 2
Apply PEMDAS: Multiplication.
0.80 − 0.30 + 0.256
Step 3
Apply PEMDAS left to right: Subtraction.
0.50 + 0.256
Step 4
Apply PEMDAS left to right: Addition.
Answer: 0.756
Practice Problem
1. Find the two endpoints of the confidence interval: p_A − p_B ± z* ⋅ SE, where p_A = 0.68, p_B = 0.46, z* = 1.645, SE = 0.039
Solution:
Step 1
Substitute numbers in the formula.
p_A − p_B ± z* ⋅ SE
0.68 − 0.46 ± 1.645 ⋅ 0.039
Step 2
Apply PEMDAS. Multiplication comes before addition or subtraction.
0.68 − 0.46 ± 0.064155
Step 3
Apply PEMDAS. Operations from left to right. Subtraction is first.
0.22 ± 0.064155
Step 4
Translate the ± into two separate evaluation problems.
Step 4 (for minus): 0.22 − 0.064155 ≈ 0.156
Step 4 (for plus): 0.22 + 0.064155 ≈ 0.284
Answer: The endpoints are 0.156 and 0.284.
Inequality Statements—Meaning and Notation
Learning Objectives:
• Given a statement, graph the corresponding inequality. (This clarifies the meaning.)
• Graph an inequality involving a variety of real numbers.
• Graph a compound inequality.
• Distinguish between situations where the endpoint is included and the endpoint is not included, in both notation and on a number line.
• Notice that the inequality symbols and the number line graph are always clear about whether the endpoint(s) is/are included.
• Notice that sometimes the words are vague about whether the endpoint(s) is/are included and sometimes the words are clear.
Inequality Statements in Context: Often in statistics, the conclusion of a problem is something like this: “I am 90% confident that the population proportion is between 0.23 and 0.34.” While that statement is expressed in words, you are also expected to know that another way to write this is 0.23 ≤ p ≤ 0.34. Also, you are expected to understand the meaning of the various inequality symbols and to be able to visualize what these mean on a number line.
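As an aside, if you ever check a statement like this with software, Python’s chained comparisons mirror the mathematical notation directly. A minimal sketch (the value of p here is made up purely for illustration):

p = 0.28                   # a hypothetical sample proportion, not from the text
print(0.23 <= p <= 0.34)   # True: p lies between 0.23 and 0.34, endpoints included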
Graphing Inequalities on a Number Line
Statement                                  Inequality        Graph
“x is less than a.”                        x < a             FIGURE A1: Graph of less than a
“x is less than or equal to a.”            x ≤ a             FIGURE A2: Graph of less than or equal to a
“x is greater than a.”                     x > a             FIGURE A3: Graph of greater than a
“x is greater than or equal to a.”         x ≥ a             FIGURE A4: Graph of greater than or equal to a
“x is greater than a and less than b.”     a < x < b         FIGURE A5: Graph of between a and b
“x is less than a or greater than b.”      x < a or x > b    FIGURE A6: Graph of less than a or greater than b
E X A MPLE A 1
Graphing an inequality given a statement
Problem: A gift exchange allows only gifts to be purchased for under $6. Graph the inequality.
Step 1
Determine the endpoint(s) and decide whether they are closed or open.
The endpoint is 6. Gifts must cost under $6, so 6 is not included and the endpoint is open.
Step 2
Label the number line. Be sure the endpoint(s) will be shown.
FIGURE A7: Labeling from 0 to 10 (a number line with tick marks at 0 through 10).
Step 3
Use Figure A7 to plot the endpoint(s) and complete the graph based on the inequality. Refer to Figures A1 and A4. Should you include negative numbers?
Read the original statement to see why the number 6 is NOT included here. The numbers are for the cost of a gift. The cost can be zero, but not less than 0.
FIGURE A8: Graphing the interval (shading from 0, included, up to 6, not included).
Answer: FIGURE A8 shows the graph depicting less than $6.
E X A MPLE A 2
Graphing an inequality
Problem: Graph the inequality on a number line: x > 1.75, and write it in interval notation.
Step 1
Label the number line. Be sure the endpoint will be shown.
FIGURE A9: A number line labeled from −5 to 5.
Step 2
Decide whether the endpoint will be closed or open.
1.75 is not included, so it will be open.
Step 3
Use Figure A9 to plot the endpoint and complete the graph based on the inequality. Refer to Figure A3.
FIGURE A10: An open circle at 1.75 with shading to the right.
Answer: FIGURE A10 shows the graph of x > 1.75, which, in interval notation, is (1.75, ∞).
E X A MPLE A 3
Writing interval notation given a statement
Problem: Write the following in interval notation: “Hourly pay starting at $15/hour.”
Step 1
Identify the endpoint and decide if it is included or excluded.
The endpoint is 15. “Starting at $15/hour” means $15 itself is possible, so 15 is included.
Step 2
Since there is only one endpoint, decide whether the set of numbers goes towards ∞ or −∞.
In this case, it goes towards ∞, because the wages are at least $15/hour.
Step 3
Write in interval notation.
Answer: [15, ∞)
E X A MPLE A 4
Graphing a compound inequality
Problem: Graph the compound inequality.
Step 1
Label the number line. Be sure both endpoints will be shown.
FIGURE A11: A number line labeled from −10 to 10.
Step 2
Decide whether the endpoints will be closed or open.
Step 3
Use Figure A11 to plot the endpoints and complete the graph based on the inequality. Refer to Figure A6; determine why you wouldn’t use Figure A5.
FIGURE A12: The two pieces of the compound inequality shaded on the number line from Figure A11.
The statement has values smaller than a negative number or values larger than a positive number. You don’t shade between the endpoints.
Answer: FIGURE A12 shows the graph of the compound inequality.
Practice Problems
1. FIGURE A13: A number line from 0 to 5.
Answer:
2. FIGURE A14: A number line from 35 to 85.
Answer: [45, 55]
3. FIGURE A15: A number line from −5 to 5.
Answer: (−∞, 1]
4. FIGURE A16: A number line from −8 to 12.
Answer:
5. (5.25, ∞)
Answer: a graph on a number line from 0 to 10, with an open circle at 5.25 and shading to the right.
6. (−∞, 2] ∪ [2.5, ∞)
Answer: a graph on a number line from 0 to 5, with shading to the left of a closed circle at 2 and to the right of a closed circle at 2.5.
7. (−∞, 6/5]
Answer: a graph on a number line from 0 to 5, with shading to the left of a closed circle at 6/5 = 1.2.
8. (15, 23]
Answer: a graph on a number line from 10 to 30, with an open circle at 15, a closed circle at 23, and shading between them.
9. All values between 6.8 (excluded) and 18 (excluded)
Answer: (6.8, 18)
10. You must be at least 48" to ride the ride.
Answer: [48, ∞)
11. For ages under 13
Answer: [0, 13)
12. The number of calories in all food choices less than 300 calories
Answer: [0, 300)
Intervals: Meaning, Graph, Notation
Learning Objectives:
• Given a statement in words, graph the corresponding interval (primary method).
• After seeing the words, and making a graph, translate this to “inequality notation.”
• After seeing the words, and making a graph, translate this to “interval notation.”
• Recognize that the description of the interval in words may or may not be as specific about whether the endpoints are or are not included as the other methods of presenting the interval.
• Recognize that the other forms (besides words) of presenting an interval are specific about whether each endpoint is or is not included.
Intervals in Context: In statistics discussion, the conclusions of most problems require a statement involving some inequality. Many of those inequalities are in words, which are relatively easy to understand. However, some of them may be given in any of the mathematically appropriate ways to describe inequalities. • Example 1: I am 99% confident that the population mean is between 75.6 and 83.2. • Example 2: Because the p-value of 0.062 is less than the significance level of 0.10, we conclude that there is significant evidence for the alternative hypothesis.
Presentations of an Interval
1. Words: The variable x is in the interval between −3 (included) and 2 (excluded).
2. Graph: See below.
FIGURE A1: A number line from −10 to 10 with a closed circle at −3, an open circle at 2, and shading between them.
3. Inequality symbol notation: −3 ≤ x < 2
4. Interval notation: x is in [−3, 2)
E X A MPLE A 1
Graphing an interval given a statement
Problem: An activity at the zoo is only for kids ages 3 through 8. Graph the interval on the number line below.
FIGURE A2: A blank number line (from negative to positive infinity).
Step 1
Label the number line shown in Figure A2. There are no negative ages, so start with 0 on the left. You will not be dealing with fractions or decimals, so let each unit represent an increase of 1.
FIGURE A3: The number line labeled from 0 to 10.
Step 2
Determine the endpoints and decide if they are included or excluded.
3 is the youngest age, so 3 is the left endpoint. 3 is included in the interval, so it is a closed circle.
Kids through age 8 can attend, which includes kids all the way up until their ninth birthday. So, the right endpoint will be 9; however, 9 is excluded and represented by an open circle.
Step 3
Use Figure A3 to plot the endpoints and draw a solid line between them.
FIGURE A4: The completed graph, with a closed circle at 3, an open circle at 9, and a solid line between them.
Answer: Figure A4 shows the graph of the ages from 3 years through 8 years. In symbols, it is 3 ≤ age < 9 or [3, 9).
E X A MPLE A 2
Graphing an interval involving integers
Problem: Graph the interval between −2 (excluded) and 5 (included) on the number line below. Use the number line in Figure A2.
Step 1
Label the number line. Be sure both endpoints will be shown.
FIGURE A5: Numbering the line from −10 to 10.
Step 2
Decide whether the endpoints will be closed or open.
−2 is excluded, so it will be open. 5 is included, so it will be closed.
Step 3
Plot the endpoints on Figure A5 and draw a solid line between them.
FIGURE A6: Plotting the interval, with an open circle at −2, a closed circle at 5, and a solid line between them.
Answer: Figure A6 shows the graph of the interval (−2, 5].
E X A MPLE A 3
Graphing an interval involving decimals
Problem: Graph the interval between −1.4 (included) and 6.8 (included).
Step 1
Label the number line. Be sure both endpoints will be shown.
FIGURE A7: Labeling the number line from −2 to 8.
Step 2
Decide whether the endpoints will be closed or open.
Each endpoint is included, so each will be closed.
Step 3
Use Figure A7 to plot the endpoints and draw a solid line between them.
FIGURE A8: Completing the graph, with closed circles at −1.4 and 6.8 and a solid line between them.
Answer: The interval [−1.4, 6.8] is graphed in Figure A8. In symbols, it is −1.4 ≤ value ≤ 6.8.
Practice Problems
1. The lowest temperature yesterday was 45°. The highest was 72°.
(Graphed on a number line from 35 to 85, with closed circles at 45 and 72.)
This is the plot of 45° through 72°. Also [45, 72] or 45 ≤ temperature ≤ 72.
2. At a local college, the youngest student is 16, and the oldest is 54.
(Graphed on a number line from 6 to 60, with closed circles at 16 and 54.)
This is the graph of ages 16 through 54. Also [16, 54] or 16 ≤ age ≤ 54.
3. The lowest gas price in town is $2.11/gallon. The highest is $2.29/gallon.
(Graphed on a number line from 2.06 to 2.34, with closed circles at 2.11 and 2.29.)
This is the graph of prices from $2.11 through $2.29. Also [2.11, 2.29] or 2.11 ≤ price ≤ 2.29.
4. At a university, the smallest class has 14 students. The largest class has 160 students.
(Graphed on a number line from 10 to 200, with closed circles at 14 and 160.)
This graph shows 14 through 160. Also [14, 160] or 14 ≤ class size ≤ 160.
5. The interval between 3 (excluded) and 12 (excluded)
(Graphed on a number line from −5 to 15.)
This graph shows the interval from 3 to 12. Also (3, 12) or 3 < value < 12.
6. The interval between −35 (included) and 15 (excluded)
(Graphed on a number line from −50 to 50.)
This graph shows the interval between −35 (included) and 15 (excluded).
7. The interval between 104 (included) and 110 (included)
(Graphed on a number line from 100 to 115.)
This graph shows the interval between 104 (included) and 110 (included). Also [104, 110] or 104 ≤ value ≤ 110.
8. The interval between −18 (excluded) and −3
(Graphed on a number line from −20 to 0.)
This shows the graph of the interval between −18 (excluded) and −3.
9. The interval between 9.5 (excluded) and 10.3 (excluded)
(Graphed on a number line from 8.5 to 10.5.)
This shows the graph of the interval between 9.5 (excluded) and 10.3 (excluded). Also (9.5, 10.3) or 9.5 < value < 10.3.
10. The interval between −16.7 (included) and −12.15 (excluded)
(Graphed on a number line from −20 to −10.)
This shows the graph of the interval between −16.7 (included) and −12.15 (excluded). Also [−16.7, −12.15) or −16.7 ≤ value < −12.15.
11. The interval between (included) and 4.6 (included)
(Graphed on a number line from −5 to 5.)
This shows the graph of the interval between
12. The interval between (included) and 7.2 (excluded)
(Graphed on a number line from −10 to 10.)
This shows the interval between (included) and 7.2 (excluded). Also
Square Roots
Learning Objectives:
• Recognize perfect squares for the numbers up to
• Evaluate a square root for a perfect square.
• Use a calculator to evaluate a square root.
• Estimate a square root for a non-perfect square.
• NOT required, but somewhat useful: Use the divide and average method for finding square roots by hand.
Square Roots in Context: Many of the important formulas in statistics involve square roots. Thus, it is important to know how to understand them and how to use a calculator to find them. For example, the formula for the standard error of a mean is SE = s/√n. In a statistics class, we use calculators to evaluate square roots. You will feel more confident working problems if you also understand the meaning of a square root and have learned enough about numbers to have some intuition about the approximate size of a square root. The activities here are designed to help you with this.
Perfect Squares
1² = 1     5² = 25    9² = 81
2² = 4     6² = 36    10² = 100
3² = 9     7² = 49
4² = 16    8² = 64
Finding a Square Root by Hand—NOT required
1. Make a guess at the square root of the number. Divide the number by our guess.
2. Average this number and our previous guess.
3. Use this number as our next guess. Continue this process until finding the square root to the desired number of decimal places.
E X A MPLE A 1
Evaluating a square root for a perfect square
Problem: Evaluate √36.
Step 1
Guess and check to determine which nonnegative integer can be multiplied by itself to give 36.
5 ⋅ 5 = 25, 6 ⋅ 6 = 36, 7 ⋅ 7 = 49
Answer: Since 6² = 36, √36 = 6.
E X A MPLE A 2
Learn to use YOUR calculator to find a square root
Problem: Evaluate √36. (Useful for exploring your calculator because you already know the answer!)
Step 1
Determine what button on your calculator gives the square root AND whether your calculator expects the symbol first or the number first.
Do one of the following.
• Recognize the symbol.
• Ask your teacher or a classmate to look at your calculator with you and help.
• Look online to find a manual for your calculator and look it up in there.
Step 2
Use those steps to find this square root. Make sure it is done correctly.
• After you determine the steps, write them down so that you’ll remember.
Answer: Since 6² = 36, √36 = 6.
E X A MPLE A 3
Estimating the value of a square root
Problem: Estimate √77 between two consecutive whole numbers.
Step 1
Determine which perfect square is close to, but just below, 77.
8² = 64
Step 2
Determine which perfect square is close to, but just above, 77.
9² = 81
Answer: So √77 is between 8 and 9.
E X A MPLE A 4
Problem: Find √23 to 2 decimal places. (Note: While doing the calculations as we work through the problem, we round to 3 decimal places to ensure our final rounded answer is correct to 2 decimal places.)
Step 1
Estimate √23.
Since √16 = 4 and √25 = 5, we will first guess 4.75.
Step 2
Divide 23 by our guess of 4.75.
23/4.75 ≈ 4.842
Step 3
Average this number and the previous guess.
(4.75 + 4.842)/2 ≈ 4.796
Step 4
This number is our next guess, so we will divide 23 by 4.796.
23/4.796 ≈ 4.796
Step 5
Since we got the same answer as our previous guess, we can now round this result.
Answer: √23 ≈ 4.80
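The divide-and-average steps can also be written as a short loop. This is only an illustrative Python sketch (the function name and the choice of five repetitions are mine, not part of the worksheet):

def divide_and_average(number, guess, repetitions=5):
    # Repeatedly divide the number by the current guess and average the result with the guess.
    for _ in range(repetitions):
        quotient = number / guess
        guess = (guess + quotient) / 2
    return guess

print(round(divide_and_average(23, 4.75), 2))   # 4.8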
Practice Problems _
1. √ 16 _ Answer: Since 42 = 16,√16 = 4. _
√ 81
_
Answer: Since 92 = 81,√81 = 9. _
√ 49
_
Answer: Since _
√ 144
_
Answer: Since 122 = 144,√144 = 12. 5. √ 2 _ Answer: So √2 is between 1 and 2, because _
6. √ 60 _ Answer: So √60 is between 7 and 8, because 72 = 49 and 82 = 64. _
√ 5
_
Answer: So √5 is between 2 and 3, because _
√ 97 _ Answer: So √97 is between 9 and 10, because
9. Find √60
Solution:
Problem: Find √60 to 2 decimal places. (Note: While working the problem, we round to no fewer than 3 decimal places for all calculations to ensure our final rounded answer is correct to 2 decimal places.)
Step 1
Estimate √60.
Since √49 = 7 and √64 = 8, we will first guess 7.6.
Step 2
Divide 60 by our guess of 7.6.
60/7.6 ≈ 7.895
Step 3
Average this number and the previous guess.
(7.6 + 7.895)/2 ≈ 7.748
Step 4
This number is our next guess, so we will divide 60 by 7.748.
60/7.748 ≈ 7.744
Step 5
Average this number and the previous guess.
(7.748 + 7.744)/2 ≈ 7.746
Step 6
This number is our next guess, so we will divide 60 by 7.746.
60/7.746 ≈ 7.746
Step 7
Because we are only looking to be accurate to 2 decimal places, we can now round this result.
7.746
Answer: √60 ≈ 7.75
Evaluating Formulas: Part 4
Learning Objectives:
• Evaluate one-parameter standard error formulas.
• Use a two-step strategy to evaluate a standard error formula and then put the value into a confidence interval or z-score formula.
Order of Operations: PEMDAS
P  Parentheses      ( )
E  Exponent         xⁿ or √
M  Multiplication   ⋅       (M and D: whichever comes first, left → right)
D  Division         ÷
A  Addition         +       (A and S: whichever comes first, left → right)
S  Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4.
In statistics, many calculations result in decimals that “never end.” Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren’t told, three or four decimal places is usually enough. Some proportion problems may require more decimal places than that for some calculations.
E X A MPLE A 1
Review computing a z-score
Problem: Evaluate and simplify z = (x − x̅)/s for x = 185, x̅ = 150, s = 15.
Step 1
Evaluate the formula by substituting the numbers for the symbols.
z = (x − x̅)/s
z = (185 − 150)/15
Step 2
Apply PEMDAS: Simplify the numerator and denominator separately (before dividing).
z = (185 − 150)/15
z = 35/15
Step 3
Divide.
z = 35/15
z = 2.3333
In statistics, many calculations result in decimals that “never end.” Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren’t told, three or four decimal places is usually enough. In Chapter 6, some of the proportion problems may require more decimal places than that for some calculations.
Answer: z = 2.3333
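If you compute a z-score with software rather than by hand, the same calculation looks like this in a minimal Python sketch (the variable names are mine):

x = 185
xbar = 150
s = 15
z = (x - xbar) / s     # parentheses make the subtraction happen before the division
print(round(z, 4))     # 2.3333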
E X A MPLE A 2
Review the strategy for computing confidence intervals
Problem: Find the two separate answers for x̅ ± 2 ⋅ SE, where x̅ = 9.2 and SE = 1.4.
Step 1
Substitute numbers in the formula.
x̅ ± 2 ⋅ SE
9.2 ± 2 ⋅ 1.4
Step 2
Apply PEMDAS: Multiplication comes before either addition or subtraction.
9.2 ± 2.8
Step 3
Translate the ± into two separate evaluation problems.
Step 3 (for minus): 9.2 − 2.8 = 6.4
Step 3 (for plus): 9.2 + 2.8 = 12.0
Answer: 6.4 and 12.0
E X A MPLE A 3
Describe a strategy for computing confidence intervals
Problem: Sample Statistic ± z* ⋅ SE
Strategy:
Step 1 Separately identify or compute the values of the Sample Statistic and the SE.
Step 2 Look up the value of z*.
Step 3 Then use the strategy of Example A2 to complete this calculation. Notice that Example A2 was a special case of this.
E X A MPLE A 4
Problem: z = (Sample Statistic − Null Parameter)/SE or t = (Sample Statistic − Null Parameter)/SE
Strategy:
Step 1 Identify and compute the value of the Sample Statistic.
Step 2 Identify the value of the Null Parameter.
Step 3 Compute SE.
Step 4 Then use the strategy of Example A1 to complete this calculation.
E X A MPLE A 5
Evaluate a formula for the standard error of a mean
Problem: Evaluate SE = s/√n, where s = 12.1 and n = 28.
Strategy: To the extent that you can do the entire calculation in your calculator at once, that is best. The strategy shown below is for a situation where you don’t know the steps to do the entire calculation in your calculator at once. In this case, it is important to keep many decimal places in any “intermediate calculation” you do with your calculator before you do the final calculation. That is shown here in Step 2 by keeping six decimal places in the denominator.
Step 1
Substitute numbers in the formula.
SE = s/√n
SE = 12.1/√28
Step 2
There is a fraction bar, so simplify the numerator and denominator separately. Simplify the denominator.
SE = 12.1/5.291503
Step 3
Complete the division. In your final answer it is not as important to keep many decimal places. In this problem, I chose four decimal places.
SE = 2.2867
Ask your teacher how many decimal places they want you to round to for the (1) intermediate steps and (2) final answer.
Answer: SE = 2.2867
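A minimal Python sketch of the same calculation, using the math module’s square root (the variable names are mine):

import math

s = 12.1
n = 28
se = s / math.sqrt(n)   # simplify the denominator, then divide
print(round(se, 4))     # 2.2867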
E X A MPLE A6
Evaluate a formula for standard error of a proportion
Problem: SE = √(p(1 − p)/n) for p = 0.32 and n = 70.
Strategy: To the extent that you can do the entire calculation in your calculator at once, that is the best way to do this. The strategy shown below is for a situation where you don’t know the steps to do the entire calculation in your calculator at once. In this case, it is important to keep many decimal places in any “intermediate calculation” you do with your calculator before you do the final calculation. That is shown here in Step 4 by keeping seven decimal places past the “leading zeroes” in the value under the square root and in the final answer. Ask your teacher for instructions on how many decimal places to keep in such calculations.
Step 1
Substitute numbers in the formula.
SE = √(p(1 − p)/n)
SE = √(0.32(1 − 0.32)/70)
Step 2
Apply PEMDAS. Simplify the subtraction in the parentheses first.
SE = √(0.32(0.68)/70)
Step 3
Multiply in the numerator.
SE = √(0.2176/70)
Step 4
Complete the division.
SE = √0.003108571
Step 5
Find the square root.
SE = 0.05575456
Answer: SE = 0.05575456
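The same standard error of a proportion, sketched minimally in Python (again, the names are mine):

import math

p = 0.32
n = 70
se = math.sqrt(p * (1 - p) / n)   # simplify everything under the square root first
print(round(se, 7))               # 0.0557546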
Practice Problems
1. Find the standard error of the mean using the formula SE = s/√n, where s = 9.29 and n = 27.
Solution:
Problem: Find the standard error of the mean using the formula SE = s/√n when s = 9.29, n = 27.
Strategy: Keep many decimal places in the denominator and then round off more when the computation is complete.
Step 1
Substitute numbers in the formula.
SE = s/√n
SE = 9.29/√27
Step 2
Simplify the denominator, using many decimal places.
SE = 9.29/5.196152
Step 3
Divide, keeping three or four decimal places. Ask your teacher how many decimal places to give in your answer.
SE = 1.788
2. Find a confidence interval for a mean using the formula x̅ ± t* ⋅ SE, where the SE is the value computed from the immediately previous problem.
Solution:
Problem: Find a confidence interval for a mean using the formula x̅ ± t* ⋅ SE.
Step 1
Substitute numbers in the formula.
x̅ ± t* ⋅ SE
Step 2
Multiply.
Step 3 (for minus)
Add and subtract separately.
Step 3 (for plus)
3. Find the standard error (called SE) using the formula SE = √(p₀(1 − p₀)/n), where p₀ = 0.40 and n = 39.
Solution:
Problem: Find SE = √(p₀(1 − p₀)/n) when p₀ = 0.40 and n = 39.
Step 1
Substitute numbers in the formula.
SE = √(p₀(1 − p₀)/n)
SE = √(0.40(1 − 0.40)/39)
Step 2
Simplify the numerator (multiply).
SE = √(0.24/39)
Step 3
Divide.
SE = √0.00615385
Step 4
Take the square root.
SE = 0.078446
̂ ̂
Solution: p̂ − p Problem: Formula z = _0 for p ̂ = 0.55, p 0 = 0.40, n = 39. SE Step 1
Mainchapter.indd 100
LOCK13e_2PP
Evaluate the formula by substituting the numbers for the symbols.
Step 1 p̂ − p z = _0 _ SE
Step 2
Step 2
Simplify the numerator (subtract).
0.55 − 0.40 z = ___________ 0.078446
Step 3
Step 3
Divide.
Scientific Notation Formats
Learning Objectives:
• Recognizing when an expression is not in scientific notation and making the adjustments.
• Changing numbers in scientific notation to other desired formats.
Statistics
In statistics courses, we often work with real-world data, which may include numbers much larger and much smaller than those one typically sees. When computing with those numbers, many calculators and software programs translate those numbers into scientific notation in order to deal with them more easily. For statistics, you need to be able to
• Recognize when a number is given in scientific notation in your textbook, software program, or calculator.
• Convert a number given in scientific notation to the usual notation.
Statistics Examples
Usual notation    Scientific notation    Text-based scientific notation
2,310,000         2.31 × 10⁶             2.31 E+06 or 2.31 E 06
0.0005893         5.893 × 10⁻⁴           5.893 E−04
Description: A number is given in scientific notation if it has one digit to the left of the decimal point and then is multiplied by the appropriate power of 10 so that it is equivalent to the number you started with.
Rewriting Results in Scientific Notation
• Given b × 10ⁿ, where b < 1, move the decimal point k places to the right to make 1 ≤ b < 10, and change the exponent to n − k.
• Given b × 10ⁿ, where b ≥ 10, move the decimal point k places to the left to make 1 ≤ b < 10, and change the exponent to n + k.
E X A MPLE A 1
Write a number in scientific notation
Problem: Write the number 160 × 10⁴ in scientific notation.
Step 1
Write the number 160 in scientific notation.
160 = 1.60 × 10²
Step 2
Multiply this scientific notation times 10 to the 4th.
1.60 × 10² × 10⁴ = 1.60 × 10²⁺⁴ = 1.60 × 10⁶
Answer: 160 × 10⁴ = 1.60 × 10⁶
E X A MPLE A 2
Write a number in scientific notation
Problem: Write the number 0.05 × 10¹⁰ in scientific notation.
Step 1
Write the number 0.05 in scientific notation.
0.05 = 5 × 10⁻²
Step 2
Multiply this scientific notation times 10 to the 10th.
5 × 10⁻² × 10¹⁰ = 5 × 10⁻²⁺¹⁰ = 5 × 10⁸
Answer: 0.05 × 10¹⁰ = 5 × 10⁸
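Most software will do these conversions for you. A minimal Python sketch showing both directions (formatting a value in text-based scientific notation, and reading E-notation back in):

x = 0.05 * 10**10
print(f"{x:.2e}")          # 5.00e+08 — text-based scientific notation
print(float("5.893E-04"))  # 0.0005893 — Python reads E-notation directly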
Practice Problems
Fill in the blanks in the table so that the three numbers in the row are the same number, in different notations.

Problem ID   Usual notation   Scientific notation   Text-based scientific notation
A            0.00047901
B            1800
C            27.34
D                             6.1 × 10⁻³
E                                                   8.9934 E−03
F                                                   3.17 E+12 or 3.17 E12

Solution Key

Problem ID   Usual notation                      Scientific notation   Text-based scientific notation
A            0.00047901                          4.7901 × 10⁻⁴         4.7901 E−04
B            1800                                1.800 × 10³           1.800 E 03
C            27.34                               2.734 × 10¹           2.734 E 01
D            0.0061                              6.1 × 10⁻³            6.1 E−03
E            0.0089934                           8.9934 × 10⁻³         8.9934 E−03
F            3170000000000 or 3,170,000,000,000  3.17 × 10¹²           3.17 E+12 or 3.17 E12
Additional exercises for more sophistication in working with scientific notation:
1. Write the number in scientific notation.
Answer:
2. Write the number 19.6 × 10⁻¹⁰ in scientific notation.
Answer: 19.6 × 10⁻¹⁰ = 1.96 × 10⁻⁹
3. Write the number 0.535 × 10⁶ in scientific notation.
Answer: 0.535 × 10⁶ = 5.35 × 10⁵
4. Write the number in scientific notation.
Answer:
5. Write the number in scientific notation.
Answer:
6. Write the number in scientific notation.
Answer:
7. Write the number 0.006 × 10⁷ in scientific notation.
Answer: 0.006 × 10⁷ = 6 × 10⁴
8. Write the number in scientific notation.
Answer:
9. Write the number 24 × 10⁻¹ in scientific notation.
Answer: 24 × 10⁻¹ = 2.4 × 10⁰
10. Write the number in scientific notation.
Answer:
Evaluating Formulas: Part 5
Learning Objective:
• Evaluate two-parameter standard error formulas.
Order of Operations: PEMDAS
P  Parentheses      ( )
E  Exponent         xⁿ or √
M  Multiplication   ⋅       (M and D: whichever comes first, left → right)
D  Division         ÷
A  Addition         +       (A and S: whichever comes first, left → right)
S  Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4.
In statistics, many calculations result in decimals that “never end.” Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren’t told, three or four decimal places is usually enough. Some proportion problems may require more decimal places than that for some calculations.
E X A MPLE A 1
Problem: Describe a strategy for computing SE = √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂).
Strategy:
Step 1 Notice that there is a sum under the square root.
Step 2 Compute each of the two parts of the sum separately, keeping at least seven decimal places past any leading zeros. (This is similar to what you did in computing SEs that were less complicated than this.)
Step 3 Add the two parts of the sum.
Step 4 Take the square root of the sum.
Note that the SE formula for analyzing the difference of means is structured in a similar way, so these same instructions apply to it.
E X A MPLE A 2
Problem: Use the four-step process to compute SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂) for p̂₁ = 0.30, n₁ = 51 and p̂₂ = 0.65, n₂ = 82.
SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂) = √(0.30(1 − 0.30)/51 + 0.65(1 − 0.65)/82)
Step 1
Calculate the first term of the sum: 0.30(1 − 0.30)/51 = 0.30(0.70)/51 = 0.21/51 = 0.004117647
Step 2
Calculate the second term of the sum: 0.65(1 − 0.65)/82 = 0.2275/82 = 0.002774390
Step 3
Add those terms: 0.004117647 + 0.002774390 = 0.006892037
Step 4
Take the square root: √0.006892037 = 0.083018293
Solution: SE = 0.0830
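The same four-step calculation, as a minimal Python sketch (the function name is mine):

import math

def se_two_proportions(p1, n1, p2, n2):
    # Compute each term of the sum separately, add them, then take the square root.
    term1 = p1 * (1 - p1) / n1
    term2 = p2 * (1 - p2) / n2
    return math.sqrt(term1 + term2)

print(round(se_two_proportions(0.30, 51, 0.65, 82), 4))   # 0.083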
Practice Problem
Exercise: Use the strategy to compute the standard error for the difference of two means.
Problem: Use the four-step process to compute SE = √(s₁²/n₁ + s₂²/n₂) for s₁ = 14.68, n₁ = 57, s₂ = 14.26, and n₂ = 17.
Solution:
Problem: Use the four-step process to compute SE = √(s₁²/n₁ + s₂²/n₂) for s₁ = 14.68, n₁ = 57, s₂ = 14.26, and n₂ = 17. Because this will not be a very small value, it is not necessary to have more than about four decimal places of accuracy.
SE = √(s₁²/n₁ + s₂²/n₂) = √(14.68²/57 + 14.26²/17)
Part 1: Calculate the first term of the sum: 14.68²/57 = 215.5024/57 = 3.7807439
Part 2: Calculate the second term of the sum: 14.26²/17 = 203.3476/17 = 11.9616235
Part 3: Add those terms: 3.7807439 + 11.9616235 = 15.7423674
Part 4: Take the square root: √15.7423674 = 3.9676652
Answer: The standard error is 3.9677.
Evaluating Formulas: Part 6
Learning Objectives:
• Evaluate two-parameter confidence interval formulas when the standard error has already been found.
• Evaluate two-parameter hypothesis test formulas when the standard error has already been found.
Order of Operations: PEMDAS
P  Parentheses      ( )
E  Exponent         xⁿ or √
M  Multiplication   ⋅       (M and D: whichever comes first, left → right)
D  Division         ÷
A  Addition         +       (A and S: whichever comes first, left → right)
S  Subtraction      −
Keep in mind that the P includes any grouping symbol. For example, [ ] and { } are also included under the first step of order of operations. Square root is, as an operation, just a special case of an exponent. (Square root is equivalent to an exponent of 1/2.) A fraction bar is meant to show division between the numerator and denominator. The fraction bar also tells us to simplify, separately, the top (numerator) and bottom (denominator). After the numerator and denominator are simplified, you can then use the division operation. For example, 3/4 is the same as 3 ÷ 4.
In statistics, many calculations result in decimals that “never end.” Your teacher (or the statement of the problem) may tell you how many decimal places to give in your answer. If you aren’t told, three or four decimal places is usually enough. Some proportion problems may require more decimal places than that for some calculations.
Strategy: The computations for these problems are long enough that it is difficult to handle them all together. It is crucial that you split them into appropriate parts so that you can do the parts separately and then, at the end, put it all together. The entire point of this worksheet is to help you become more comfortable with that.
E X A MPLE A 1
Find a 95% confidence interval in a two-sample proportion problem
Problem: Use this formula: (p̂₁ − p̂₂) ± z* ⋅ SE, where SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), for p̂₁ = 0.30, n₁ = 51 and p̂₂ = 0.65, n₂ = 82.
Outline: The main formula (p̂₁ − p̂₂) ± z* ⋅ SE has three parts to find separately and then put together. Think of how you have found these when doing other problems. The most complicated one here is the SE, whose formula is given in the problem. Putting this all together, here are the separate things to find in order to do a problem of this kind.
1. p̂₁ − p̂₂
2. z*
3. SE
   a. Use software to find it.
   b. See the worksheet “Evaluating Formulas: Part 5” to practice the steps to do it “by hand.”
Solution:
1. p̂₁ − p̂₂ = 0.30 − 0.65 = −0.35
2. For a 95% confidence interval, using the normal distribution, z* = 1.96.
3. (p̂₁ − p̂₂) ± z* ⋅ SE
−0.35 ± 1.96 ⋅ 0.0830
−0.35 ± 0.1627
−0.35 ± 0.16
−0.51 to −0.19
The 95% confidence interval for the difference of the population proportions is −0.51 to −0.19.
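Putting the three parts together in a minimal Python sketch (the SE value 0.0830 is taken from the worked Part 5 example; variable names are mine):

p1_hat = 0.30
p2_hat = 0.65
z_star = 1.96          # z* for 95% confidence
se = 0.0830            # from “Evaluating Formulas: Part 5”

diff = p1_hat - p2_hat
margin = z_star * se
print(round(diff - margin, 2), round(diff + margin, 2))   # -0.51 -0.19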
Practice Problem
Make a list of what is needed to carry out the strategy of breaking this into parts. Then carry it out. This SE was done in the Practice Problem at the end of the worksheet “Evaluating Formulas: Part 5.”
Problem: To test the hypotheses, the test statistic is t = (X̅₁ − X̅₂)/SE, with summary statistics X̅₁ = 93.87, X̅₂ = 87.13, s₁ = 14.68, n₁ = 57, s₂ = 14.26, n₂ = 17. Find the numerical value of the t-statistic.
Solution:
We need to compute:
1. X̅₁ − X̅₂ = 93.87 − 87.13 = 6.74
2. SE: From the worksheet “Evaluating Formulas: Part 5,” we know SE = 3.9677.
t = (X̅₁ − X̅₂)/SE = 6.74/3.9677 = 1.6987
The t-statistic is 1.6987.
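The same t-statistic, as a minimal Python sketch (again reusing SE = 3.9677 from Part 5; names are mine):

xbar1 = 93.87
xbar2 = 87.13
se = 3.9677                     # from “Evaluating Formulas: Part 5”
t = (xbar1 - xbar2) / se
print(round(t, 4))              # 1.6987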
CHAPTER 1
Section 1.1 Solutions
1.1
(a) The cases are the people who are asked the question.
(b) The variable is whether each person supports the law or not. It is categorical. 1.2
(a) The cases are the 100 stocks.
(b) The variable is the percentage change, which is a numerical quantity, for each of the stocks. It is quantitative. 1.3
(a) The cases are the teenagers in the sample.
(b) The variable is the result (yes or no) indicating whether each teenager eats at least five servings a day of fruits and vegetables. It is categorical. 1.4
(a) The cases are the bunches of bananas in the sample.
(b) The variable is the number of days until the bananas go bad. It is quantitative. 1.5
(a) The 10 beams that were tested.
(b) The force at which each beam broke. It is quantitative. 1.6
(a) The cases are countries of the world.
(b) The variable is whether or not the literacy rate is over 75%. It is categorical. 1.7 Since we expect the number of years smoking cigarettes to impact lung capacity, we think of the number of years smoking as the explanatory variable and the lung capacity as the response variable. 1.8 Since we expect the amount of fertilizer used to impact the yield (and not the other way around), we think of the amount of fertilizer as the explanatory variable and the yield of the crop as the response variable. 1.9 Ingesting more alcoholic drinks will cause the level of alcohol in the blood to increase, so the number of drinks is the explanatory variable and blood alcohol content is the response. 1.10 The world record time will continue to decrease as the years go by so we expect the year to impact marathon record time. We think of the year as the explanatory variable and the record time as the response variable. 1.11
(a) Year and HigherSAT are categorical. The other six variables are all quantitative, although Siblings might be classified as either categorical or quantitative.
(b) There are many possible answers, such as “What proportion of the students are first year students?” or “What is the average weight of these students?” (c) There are many possible answers, such as “Do seniors seem to weigh more than first year students?” or “Do students with high Verbal SAT scores seem to also have high Math SAT scores?” 1.12
(a) In addition to the identification column, Country, there are 24 variables. We see that Developed is a categorical variable, while the other 23 variables are all quantitative.
(b) There are many possible answers, such as “What is the average life expectancy for all countries of the world?” or “What proportion of countries are developed?”
(c) There are many possible answers, such as “Do countries with a greater land area have a larger percent rural?” or “Do countries that spend a relatively large amount on the military spend a relatively small amount on health care?” or “Do developed countries have a longer life expectancy than developing countries?” 1.13
(a) The explanatory variable is the type of format in which the story is presented. It is categorical, with three categories (audio, illustrated, and animated).
(b) The response variable is the measure of brain connectivity. It is quantitative. (c) The cases are the four-year-olds in the study, so there are 27 cases. 1.14
(a) The cases are female gamers in Great Britain.
(b) There are three variables mentioned: Whether the gamers had received obscene messages (categorical), how many hours a week they played (quantitative), and whether they felt there were enough strong female characters in games (categorical). (c) There are 1151 cases and 3 variables, so the dataset will have 1151 rows and 3 columns. 1.15
(a) The cases are the students in the college physics class.
(b) There are 3 variables: Whether the student was assigned to an active or passive learning class, the measure of how much the student thought they learned, and the score of actual learning on the test. The class assignment is categorical, while the other two are quantitative. (c) The class assignment (active or passive) is the explanatory variable, while the two measures of learning are response variables. (d) The students in the physics class are the cases, so there are 154 + 142 = 296 cases. There are three variables, so the dataset will have 296 rows and 3 columns. 1.16 There are at least two variables. One variable is whether or not the spider engaged in mock-sex. This variable is categorical and the explanatory variable. Another variable is length of time to reach the point of real mating once the spider is fully mature. This variable is quantitative and the response variable. 1.17 The individual cases are the lakes from which water samples were taken. For each lake in the sample, we record the concentration of estrogen in the water and the fertility level of fish. Both are quantitative variables. 1.18 There are two variables. One variable indicates the presence or absence of the gene variant and the second variable indicates which of the three ethnic groups the individual belongs to. Both variables are categorical. 1.19
(a) There are 10 cases, corresponding to the 10 cities. The two variables are population, which is quantitative, and the hemisphere the city is in, which is categorical.
(b) We need two columns, one for each variable. The columns can be in either order. See the table.
Population  Hemisphere
37          Eastern
26          Eastern
23          Eastern
22          Eastern
21          Eastern
21          Eastern
21          Eastern
21          Western
20          Western
19          Western
1.20
(a) There are 7 cases, representing the seven pigeons. There are two variables. One is the sex of the pigeon, which is categorical, and the other is the speed of the pigeon, which is quantitative.
(b) The dataset will have 7 rows and 2 columns. See the table. (The seven cases can be listed in any order.)
Sex   Speed
Hen   1676
Hen   1452
Hen   1449
Cock  1458
Cock  1435
Cock  1418
Cock  1413
1.21
(a) The cases are the homing pigeons, so there are 1412 of them.
(b) There are 4 variables. Two are categorical (loft, sex) and two are quantitative (distance, speed). (c) The dataset will have 1412 rows (one for each pigeon) and 4 columns (one for each variable). 1.22 One variable is whether each male was fed a high-fat diet or a normal diet. This is the explanatory variable and it is categorical. The response variable is whether or not the daughters developed metabolic syndrome, which is also categorical. 1.23 One variable is whether the young female mice lived in an enriched environment or not. This is the explanatory variable and it is categorical. The response variable is how fast the offspring learned to navigate mazes and is quantitative. 1.24
(a) The mode of transportation (e.g., bus, car, walk) gives a categorical value for each student.
(b) The answer about allergies (yes or no) gives a categorical value for each student. (c) The proportion of students in the sample who are vegetarians is a single number, based on all of the students, and does not give a value for each individual student. It is not a variable for the dataset. (d) The number of hours worked at a paid job gives a quantitative value for each student. (e) The difference in hours of sleep on school nights and non-school nights gives a quantitative value for each student. (f) The maximum time to get to school is a single number, based on all of the students, and does not give a value for each individual student. It is not a variable for the dataset.
(g) The desired super power chosen from a list gives a categorical value for each student.
1.25
(a) The total tuition and fees is a quantitative value for each school.
(b) The number of schools in the Northeast is a single value, based on all of the schools, and does not give a value for each individual school. It is not a variable for the dataset. (c) The type of school (public, private, for profit) is a categorical value for each school. (d) The number of undergraduates is a quantitative value for each school. (e) The percentage of part-time students is a quantitative value for each school. (f) The school with the highest average faculty salary is a single value, based on all of the schools, and does not give a value for each individual school. It is not a variable for the dataset. 1.26 In the first study, the cases are the students. The only variable is whether or not the student has smoked a hookah. This is a categorical variable. In the second study, the cases are the people in a hookah bar. The variables are the length of the session, the frequency of puffing, and the depth of inhalation. All are quantitative. In the third study, the cases are the smoke samples, and the variables are the amount of tar, nicotine, and heavy metals. All three variables are quantitative. 1.27
(a) This description of the study mentions six variables: age, nose volume, nose surface area, nose height, nose width, and gender.
(b) One of the variables (gender) is categorical, and the other five are quantitative. (c) There are six variables so the dataset will have six columns. The 859 participants are the cases, so the dataset will have 859 rows. 1.28
(a) The cases are the 47 participants.
(b) The description of the study includes three different variables: the score on the no-distractions test, the score on the test while texting, and whether or not the student considered him or herself to be good at multitasking. The two test score variables are quantitative and the multitasking variable is categorical. (c) The dataset would have 47 rows (one for each participant) and three columns (one for each of the three variables). 1.29
(a) The cases are the 40 people with insomnia who were included in the study.
(b) There are two variables. One is which group the person is assigned to, either therapy or not, and the other is whether or not the person reported sleep improvements. Both are categorical. (c) The dataset would have two columns, one for each of the two variables, and 40 rows, one for each of the people in the study. 1.30 If we simply record age in years and income in dollars, the variables are quantitative. Often, however, in a survey, we don’t ask for the exact age but rather what age category the participant falls in (20 − 29, 30 − 39, etc.). Similarly, we often don’t ask for exact income but for an income category (less than $10,000, between $10,000 and $25,000, etc.). If we ask participants what category they are in for each variable, then the variables are categorical. 1.31 We could sample people eligible to vote and ask them each their political party and whether they voted in the last election. The cases would be people eligible to vote that we collect data from. The variables would be political party and whether or not the person voted in the last election. Alternatively, we could ask whether each person plans to vote in an upcoming election.
1.32 We could survey a sample of people and ask their household income and measure happiness in some way, such as asking how happy they are on a scale of 1–10. The cases would be the people we collect data from. The variables in this case would be household income and happiness rating, although any two variables measuring wealth and happiness are possible. 1.33 Answers will vary.
Section 1.2 Solutions
1.34 This is a sample, because only a subset of fish are measured.
1.35 This is a population, because all customers are accounted for.
1.36 This is a population, because all registered vehicles are accounted for.
1.37 This is a sample, because only a subset of college students were sent the questionnaire.
1.38 The sample is the 120 people interviewed. The population might be all people in that town or all people that go to the mall in that town or a variety of other groups larger than and containing the 120 people in the sample. 1.39 The sample is the five hundred Canadian adults that were asked the question; the population is all Canadian adults. 1.40 The sample is the 100 customers surveyed; the population is all customers of the cell phone carrier. 1.41 The sample is the 1000 households which have databoxes attached to the televisions. The population is all US households with televisions. 1.42
(a) The sample is the 100 college students who were asked the question.
(b) The population we are interested in is all Americans. (c) A population we can generalize to, given our sample, is college students. 1.43
(a) The sample is the 10 selected twitter accounts.
(b) The target population is all twitter accounts. (c) The population we can generalize to, given the sample, is only twitter accounts of this author’s followers, since this is the population from which the sample was drawn. 1.44
(a) The sample is the 1500 people who were contacted.
(b) The population we are interested in is all residents of the US. (c) A population we can generalize to, given our sample, is residents of Minnesota. 1.45
(a) The sample is the girls who are on the selected basketball teams.
(b) The population we are interested in is all female high school students. (c) A population we can generalize to, given our sample, is female high school students who are on a basketball team. 1.46 Yes, this is a random sample from the population. 1.47 Yes, this is random sample from the population. 1.48 No, this is not a random sample, because some employees may be more likely than others to actually complete the survey. 1.49 No, this is not a random sample, because certain segments of the population (e.g., those not attending college) cannot be selected.
1.50 No, this is not a random sample. We might think we can pick out a “representative sample”, but we probably can’t. We need to let a random number generator do it for us. 1.51 No, this is not a random sample, this is a volunteer sample, since the only people in the sample are those that self-select to respond to the online poll. 1.52 This sample is definitely biased because only students who are at the library on a Friday night can be selected. The random sample should be from all students. 1.53 This is biased because the way the question is worded is not at all objective. Although the sample is a random sample, the wording bias may distort the results. 1.54 This sample is biased because taking 10 apples off the top is not a random sample. The apples on the bottom of the truckload are probably more likely to be bruised. 1.55 From the description, it appears that this method of data collection is not biased. 1.56 This sample is biased because it is a volunteer survey in which people choose to participate or not. Most likely, the people taking the time to respond to the email will have stronger opinions than the rest of the student body. 1.57 Because this was a random sample of parents in Kansas City, the result can be generalized to all parents in Kansas City. 1.58
(a) No, the sample is almost certainly not representative, since it is a volunteer sample and only includes people who visit that website and who chose to participate in the poll.
(b) No, it is not appropriate to generalize since the sample is not representative. 1.59
(a) Yes, the sample is likely to be representative since it is a random sample.
(b) Yes, since the sample is a random sample, we can generalize to the population of all Canadian consumers. 1.60
(a) The sample is the 1, 000 US adults that were contacted. The intended population is all US adults.
(b) Yes, it is reasonable to generalize since the sample was selected randomly. 1.61
(a) The sample is the 800 people who participated in the survey. The intended population is all US smartphone users.
(b) The cases are the people who participated in the survey. There are two variables: Whether or not a food delivery app was used in the last month, and which app (if any) was used. Both are categorical. 1.62
(a) The 457 students in the sample is not a larger population. We would know the exact answers for this group so no need to generalize.
(b) The sample was randomly selected from among all Pennsylvania high school seniors who participated in the Census at School project, so that would be a reasonable population to generalize results to. (c) There might be something special about schools (or students) who participate in the Census at School project, so it might not be reasonable to generalize the results from that sample to all Pennsylvania high school seniors. (d) It would not be reasonable to generalize from a sample of only Pennsylvania students to students from all states.
1.63
(a) There is no sampling bias here, since we are told that a random sample was used. The results are being manipulated based only on the order of the options (the wording of the question) so this is an example of wording bias.
(b) We see that people are more likely to select the first option given, so you should ask your friend to choose between “Option W and Option Q,“ with Option W presented first. 1.64
(a) Yes, there is probably sampling bias since the group has not shared the method. No, we cannot know if it is appropriate to generalize, and it is probably not appropriate to do so. The survey might have been asked of children, for example. This would certainly create sampling bias.
(b) Yes, the way the question was worded almost certainly biased the results. A better way might be to ask an open-ended question such as “Describe how chocolate milk is made.”
1.65
(a) The individual cases are the over 6000 restroom patrons who were observed. The description makes it clear that at least three variables are recorded. One is whether or not the person washed their hands, another is the gender of the individual, and a third is the location of the observation. All three are categorical.
(b) In a phone survey, people are likely to represent themselves in the best light and not always give completely honest answers. That is why it is important to also find other ways of collecting data, such as this method of observing people’s actual habits in the restroom. 1.66
(a) The sample is the survey participants, the population is all professors at the University of Nebraska.
(b) No, we cannot conclude that the sample of survey responders is not representative of professors at the University of Nebraska since we are not given enough information to decide one way or the other. (c) No, the 94% is based on self descriptions, which can be (and in this case, probably are) biased. 1.67 No. This is a volunteer sample, and there is reason to believe the participants are not representative of the population. For example, some may choose to participate because they LIKE alcohol and/or marijuana, and those in the sample may tend to have more experience with these substances than the overall population. In addition, the advertisements for the study were aired on rock radio stations in Sydney, so only those people who listen to rock radio stations in Sydney would hear about the option to participate. 1.68 Yes! The sample is a random sample so we can be quite confident that it is probably a representative sample. 1.69
(a) This is not a simple random sample from the population, since only those who saw and wanted to click and complete the survey were included.
(b) These results could also have been biased by how the survey was constructed. The wording of the questions might also introduce bias. 1.70 The study given found a relationship in a sample of rats. This relationship may not generalize to the human population. 1.71 The sample of planes that return from bombing missions was biased. More bullet holes were found in the wings and tail because planes that were shot in other regions were more likely to crash and not return. 1.72
(a) The population in the CPS is all US residents. (Also acceptable: US citizens, US households...)
(b) The population in the CES survey is all non-farm businesses and government agencies in the US.
(c)
i. The CES survey would be more relevant, because the question pertains to companies.
ii. The CPS would be more relevant, because the question pertains to American people.
iii. The CPS would be more relevant, because the question pertains to people, not businesses.
1.73
(a) Since the NHANES sample is drawn from all people in the US, that is the population we can generalize to.
(b) Since the NHAMCS sample is drawn from patients in emergency rooms in the US, we can generalize the results to all emergency room patients in the US.
(c)
i. NHANES: The question about an association between being overweight and developing diabetes applies to all people in the US, not just those who visit an emergency room.
ii. NHAMCS: This question asks specifically about the type of injury for people who go to an emergency room.
iii. NHAMCS: This question of average waiting time only applies to emergency room patients.
iv. NHANES: This question is asking about all US residents. Note that the proportion would be equal to one for the people sampled in NHAMCS since they only get into the sample if they visit an emergency room!
1.74 Answers will vary. See the technology notes to see how to use specific technology to select a random sample. 1.75 Answers will vary. See the technology notes to see how to use specific technology to select a random sample.
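As a concrete illustration (separate from the textbook's technology notes), a minimal Python sketch of selecting a simple random sample is shown below, assuming the population is represented by a list of ID numbers. The population size, sample size, and seed are all hypothetical choices for the example.

    import random

    # Hypothetical population: ID numbers 1 through 457 (any list of cases would work).
    population = list(range(1, 458))

    random.seed(42)                          # fixed seed only so the example is reproducible
    sample = random.sample(population, 10)   # simple random sample of 10 cases, without replacement
    print(sorted(sample))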
Section 1.3 Solutions
1.76 The use of “improves” implies this is a causal association.
1.77 Since “no link is found” there is neither association nor causation.
1.78 The phrase “leads to deaths” indicates a causal association.
1.79 The phrase “more likely” indicates an association, but there is no claim that wealth causes people to lie, cheat or steal.
1.80 The phrase “tend to be more educated” indicates an association, but there is no claim that owning a cat causes more education (or that better education causes people to prefer cats).
1.81 The statements imply that eating more fiber will cause people to lose weight, so this is a causal association.
1.82 One possible confounding variable is temperature (or season). More people eat ice cream, and go swimming, in warm weather. Other answers are possible. Remember that a confounding variable should be associated with both of the variables of interest.
1.83 One possible confounding variable is population. Increasing population in the world over time may mean more beef and more pork is consumed. Other answers are possible. Remember that a confounding variable should be associated with both of the variables of interest.
1.84 One possible confounding variable is wealth. People who own a yacht are likely wealthy and can afford a sports car. Other answers are possible. Remember that a confounding variable should be associated with both of the variables of interest.
1.85 One possible confounding variable is snow in the winter. When there is more snow, sales of both toboggans and mittens will be higher. Remember that a confounding variable should be associated with both of the variables of interest.
1.86 One possible confounding variable is number of cars (and also number of people). If there are lots of cars, there will be more pavement and more air pollution. Remember that a confounding variable should be associated with both of the variables of interest.
1.87 One possible confounding variable is gender. Males usually have shorter hair and are taller. Other answers are possible. Remember that a confounding variable should be associated with both of the variables of interest.
1.88 We are not manipulating any variables in this study, we are only collecting information (rice preference and metabolism) as they exist. This is an observational study.
1.89 We are actively manipulating the explanatory variable (playing music or not), so this is an experiment.
1.90 We are actively manipulating the explanatory variable (planting trees or not), so this is an experiment.
1.91 We are not manipulating any variables in this study, we are only measuring things (omega-3 oils and water acidity) as they exist. This is an observational study.
1.92 Data were collected after the fact from sprinters, marathon runners, and non-athletes. No genes were manipulated. These data came from an observational study. 1.93 The penguins in this study were randomly assigned to get either a metal or an electronic tag so this is an experiment. 1.94 All three studies are experiments since the scientists actively control the treatment (tears or salt solution). 1.95 A possible confounding variable is amount of snow and ice on the roads. When more snow and ice has fallen, more salt will be needed and more people will have accidents. Notice that the confounding variable has an association with both the variables of interest. 1.96 Age or grade level! Certainly, students in sixth grade can read substantially better than students in first grade and they are also substantially taller. Grade level is influencing both of the variables and it is a confounding variable. If we look at individual grades one at a time, the association could easily disappear. 1.97
(a) The cases are people. The two variables are whether or not the person is a golfer and how long the person lives.
(b) No variables were manipulated, so this result comes from an observational study. (c) A confounding variable is a variable that is associated with the other variables and may help explain the association. Wealth or income is one obvious confounding variable, since golfing is a relatively expensive sport and therefore associated with wealth and wealth is also associated with living longer. Other confounding variables are also possible. 1.98
(a) The cases are the 1,773 participants.
(b) The explanatory variable is the group to which the participant is assigned. It is categorical. (c) This is an experiment since the group is randomly assigned by the researchers. (d) There are 1,773 cases and four variables (group assignment and all three attitude measures), so the dataset will have 1,773 rows and 4 columns.
1.99 Yes, this study provides evidence that louder music causes people to drink more beer, because the explanatory variable (volume of music) was randomly determined by the researchers and an association was found.
1.100 Study 1 is a randomized, controlled experiment, while Study 2 is observational. Therefore, Study 1 provides better evidence of causation, while Study 2 is more prone to confounding variables (e.g., people who eat more nuts may also be more healthy in other ways).
1.101
(a) Yes. Because the study in mice was a randomized experiment, we can conclude causation.
(b) No. Since it appears that the study in humans was an observational study, it is not appropriate to conclude causation. Although the headline may still be true for humans, we cannot make this conclusion based on the study described. 1.102
(a) Since the students weren’t assigned to one type of diet or another, this is an observational study.
(b) No, we cannot conclude a cause-and-effect relationship since the result does not come from a randomized experiment.
(c) A confounding variable is a variable that is associated with the other variables and may help explain the association. In this case, there are many possible confounding variables. One is exercise, since students who exercise may also tend to eat well and may also have lower rates of depression. 1.103 (a) The two variables are amount of iron in the soil and amount of potassium in spinach grown in that soil. Both variables are quantitative. (b) Amount of iron in the soil is the explanatory variable since it appears to affect the amount of potassium in the spinach. (c) Since there is no indication that the amount of iron in the soil was controlled by the researchers, the data appear to come from an observational study. (d) No. Since the data do not come from an experiment, we cannot assume causation. (e) A confounding variable is a variable that might affect both the amount of iron in the soil and the amount of potassium in the spinach leaves. Examples might be the amount of rainfall or the temperature of the soil or other chemicals in the soil. 1.104 (a) The cases are the children being studied and there are 19 of them. There are three variables: Hours spent reading, Hours of screen time, and Connectivity in the brain. (b) All three variables are quantitative. It appears that hours spent reading and hours of screen time are the explanatory variables and connectivity in the brain is the response variable. (c) This is an observational study, since no variables were controlled. (d) No, we cannot conclude causation since this was not an experiment. (e) A confounding variable affects both variables in an association and might help explain the apparent association. In this case, one possible confounding variable might be the connectivity in the parents’ brains, since parents with higher brain connectivity might be more likely to read to their kids and less likely to have them spend time on screens, and children of parents with high brain connectivity might be more likely to inherit high brain connectivity. Other answers are possible. 1.105
(a) This is an observational study since none of the variables was controlled.
(b) There are several possible answers. Wealth is one, since wealth has been shown to be associated with better health and wealthy people might find it easier to take vacations. Job stress might be another since job stress might harm health and job stress might also make it difficult to take vacations. (c) The results of this study indicate an association but not a causal relationship. Thus the first two statements are appropriate and the second two are not. (i) Yes (ii) Yes (iii) No (iv) No. (d) No. “Want to be Healthier? Take a Vacation!” implies cause-and-effect, which is not an appropriate conclusion from this observational study.
1.106 No, this was an observational study and allowed for many confounding variables.
1.107
(a) The cases are the 2,623 schoolchildren.
(b) The explanatory variable is the amount of greenery around the schools. (c) The response variable is the score on the memory and attention tests. (d) Yes, the headline implies that more green space causes kids to be smarter. (e) No variables were manipulated so this is an observational study. (f) No! Since this is not an experiment, we cannot conclude causation.
(g) The socioeconomic status of the children is a possible confounding variable, since it is likely to affect both the amount of green space and also the test scores. There are other possible answers.
1.108
(a) The cases are the Danish men who were included in the study.
(b) The explanatory variable is whether or not the person has been in the hospital for an infection. Since the answer is either yes or no, this is a categorical variable. (c) The response variable is the IQ score for the person. We can find an average so this is a quantitative variable. (d) Yes, the headline implies that infections lower IQ, which is causation. (e) The explanatory variable was not manipulated, so this is an observational study. (f) No, since this was not an experiment, it is not appropriate to conclude causation. There are many possible lurking variables that might impact both IQ and the likelihood of having an infection. 1.109 (a) The explanatory variable is amount of leisure time spent sitting and the response variable is whether or not the person gets cancer. (b) This is an observational study because the explanatory variable was not randomly assigned. (c) No, we cannot conclude spending more leisure time sitting causes cancer in women because this is an observational study. (d) No, we also cannot conclude that spending more leisure time sitting does not cause cancer in women; because this was an observational study we can make no conclusions about causality. Sitting may or may not cause cancer. 1.110 (a) The explanatory variable is whether or not the participants were given access to food and drink after 10pm, or just water. The response variables are reaction time and number of attention lapses. (b) This is a randomized experiment because the explanatory variable was randomly assigned. (c) Yes, we can conclude that eating late at night worsens reaction time and increases attention lapses for sleep deprived people. (d) No, there are not likely to be confounding variables, because this was a randomized experiment. 1.111 (a) This is an observational study because we cannot randomly determine when a child learns to talk. (b) No, we cannot conclude that early language skills reduce preschool tantrums because this is an observational study. (c) There are many possible confounding variables, such as the intelligence of the child or parental involvement. 1.112 (a) No, because it is an observational study, not a randomized experiment. People were not randomly assigned to buy organic food or not, but rather chose for themselves, so the groups almost certainly differed to begin with. (b) Answers will vary, but possible answers may include income (organic food costs more money and wealthier people are generally healthier) or other health-conscious behaviors (people who care about eating organic may eat healthier in general). (c) No, because we do not have evidence against alternative explanation (ii); we cannot determine whether explanation (i) or (ii) (or both) are driving the observed difference.
1.113 (a) Yes, because this was a randomized experiment; it was randomly determined which flies ate organic and which ate conventional food. (b) Answer to part (a) was not no. (c) Yes, because we have evidence against both of the competing alternative explanations, we have convincing evidence that eating organic food really does make fruit flies live longer. 1.114 (a) Since all the children are presented with all three formats, this is a matched pairs experiment (although it might more accurately be called a matched triples experiment!) (b) Yes, we can conclude causation since the results come from an experiment. (c) The formats (and the stories used) should be presented to the children in random order. (This is how the experiment was conducted.) 1.115 (a) The explanatory variable is whether the person just had a full night of sleep or 24 hours of being awake. The response variable is ability to recognize facial expressions. (b) This is a randomized experiment, a matched pairs experiment because each person received both treatments. (c) Yes, we can conclude that missing a night of sleep hinders the ability to recognize facial expressions, because the explanatory variable was randomly assigned. (d) No, we cannot conclude that better quality of REM sleep improves ability to recognize facial expressions, because the explanatory variable in this case (quality of REM sleep) was not randomly assigned. 1.116 (a) No, we cannot conclude that drinking diet soda causes weight gain because the study was observational and prone to confounding variables. For example, it is possible that seniors who drink more diet soda do so because they know they are prone to gaining weight. (b) Neither study is perfect. The study on senior citizens is observational and the study on rats may not generalize to humans. However, together these studies provide a more convincing case that diet soda can cause weight gain. Opinions can vary on how strong this evidence for causation is. 1.117 (a) It is an observational study since no one assigned some people to live in a city and some to live in the country. (b) No, since we can never conclude from an observational study that there is a causal association. (c) The 2011 study is also an observational study, since, again, no one assigned some people to live in a city and some to live in the country. (d) The explanatory variable is whether or not the participant lives in the city or the country, which is a categorical variable. The response variable is level of activity in stress centers of the brain, which is quantitative. (e) No! The results come from an observational study, so we cannot conclude a causal relationship. 1.118 The explanatory variables are the type of payment and sex. Only the type of payment can be randomly assigned. The number of items ordered and cost are response variables. 1.119 (a) The explanatory variable is whether or not the person had a good night’s sleep or is sleepdeprived. The response variable is attractiveness rating. (b) Since the explanatory variable was actively manipulated, this is an experiment. The two treatments are well-rested and sleep-deprived. Since all 23 subjects were photographed with both treatments, this is a matched pairs experiment.
(c) Yes, we can conclude that sleep-deprivation causes people to look less attractive, because this is an experiment. 1.120 (a) We randomly divide the participants into two groups of 25 each. Half will be given fluoxetine and half will get a placebo. (b) The placebo pills will look exactly like the fluoxetine pills and will be taken the same way, but they will not have any active ingredients. (c) The patients won’t know who is getting which type of pill (the fluoxetine or the placebo) and the people treating the patients and administering the questionnaire won’t know who is in which group. 1.121
(a) The explanatory variable is amount of sleep and the response variable is growth in height.
(b) We would take a sample of children and randomly divide them into two groups. One group would get lots of sleep and the other would be deprived of sleep. Then after some time passed, we would compare the amount of height increase for the children in the two groups. (c) An experiment is necessary in order to verify a cause and effect relationship, but it would definitely not be appropriate to randomly assign some of the kids to be sleep-deprived for long periods of time just for the purposes of the experiment! 1.122 (a) Randomly assign 25 people to carbo-load and 25 people to not carbo-load and then measure each person’s athletic performance the following day. (b) We would have each person carbo-load and not carbo-load, on different days (preferably different weeks). The order would be randomly determined, so some people would carbo-load first and other people would carbo-load second. In both cases athletic performance would be measured the following day and we would look at the difference in performance for each person between the two treatments. (c) The matched pairs experiment is probably better because we are able to compare the different effects for the same person. It is more precise comparing one person’s athletic performance under two different treatments, rather than different people’s athletic performance under two different treatments. 1.123 (a) Randomly divide the students into two groups of 20 students each. One group gets alcohol and the other gets water. Measure reaction time for students in both groups. (b) Measure reaction time for all 40 students both ways: after drinking alcohol and after drinking water. Do the tests on separate days and randomize the order in which the students are given the different treatments. Measure the difference in reaction time for each student. 1.124 Answers will vary. Example: The total amount of pizza consumed and the total amount of cheese consumed, per year, over the last century. Eating more pizza causes people to eat more cheese, but the overall rise in population is also a confounding variable.
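For the randomized designs described in 1.120 through 1.123, the random assignment step itself is easy to carry out with software. The sketch below is purely illustrative (the participant labels and group sizes are hypothetical) and shows one way to split 50 people into two treatment groups of 25 at random.

    import random

    participants = [f"Subject_{i}" for i in range(1, 51)]   # hypothetical labels for 50 participants

    random.seed(1)
    random.shuffle(participants)             # put the participants in a random order
    treatment_group = participants[:25]      # first 25 receive the treatment
    comparison_group = participants[25:]     # remaining 25 form the comparison group

    print(len(treatment_group), len(comparison_group))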
CHAPTER 2
Section 2.1 Solutions
2.1 The total number is 169 + 193 = 362, so we have p̂ = 169/362 = 0.4669. We see that 46.69% are female.
2.2 Since the total number is 43 + 319 = 362, we have p̂ = 43/362 = 0.1188. We see that 11.88% of the students in the sample are smokers.
2.3 The total number is 94 + 195 + 35 + 36 = 360 and the number who are juniors or seniors is 35 + 36 = 71. We have p̂ = 71/360 = 0.1972. We see that 19.72% of the students who identified their class year are juniors or seniors.
2.4 The total number of students who reported SAT scores is 355, so we have p̂ = 205/355 = 0.5775. We see that 57.75% have higher math SAT scores.
2.5 Since this describes a proportion for all residents of the US, the proportion is for a population and the correct notation is p. We see that the proportion of US residents who are foreign born is p = 0.124.
2.6 The report describes the results of a sample, so the correct notation is p̂. We see that the proportion of likely voters in the sample who believe children of illegal immigrants should be able to attend public school is p̂ = 0.45.
2.7 The report describes the results of a sample, so the correct notation is p̂. The proportion of US teens who say they have made a new friend online is p̂ = 605/1060 = 0.571.
2.8 Information is provided for an entire population so we use the notation p for the proportion. The proportion is p = 554,665/2,220,087 = 0.250.
2.9 A relative frequency table is a table showing the proportion in each category. We see that the proportion preferring an Academy award is 31/362 = 0.086, the proportion preferring a Nobel prize is 149/362 = 0.412, and the proportion preferring an Olympic gold medal is 182/362 = 0.503. These are summarized in the relative frequency table below. In this case, the relative frequencies actually add to 1.001 due to round-off error.

Response             Relative frequency
Academy award        0.086
Nobel prize          0.412
Olympic gold medal   0.503
Total                1.00
2.10 A relative frequency table is a table showing the proportion in each category. In this case, the categories we are given are “No piercings”, “One or two piercings”, and “More than two piercings”. The relative frequency with no piercings is 188/361 = 0.521, the relative frequency for one or two piercings is 82/361 = 0.227. The total has to add to 361, so there are 361 − 188 − 82 = 91 students with more than two piercings, and the relative frequency is 91/361 = 0.252. These are summarized in the relative frequency table below.

Response                  Relative frequency
No piercings              0.521
One or two piercings      0.227
More than two piercings   0.252
Total                     1.00
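For tables like those in 2.9 and 2.10, the tallying can also be done with a few lines of code. This is only a sketch, assuming the raw responses are available as a list of category labels; the counts below are chosen to match Exercise 2.10.

    from collections import Counter

    # Hypothetical raw data: one response per student (counts match Exercise 2.10).
    responses = (["No piercings"] * 188 + ["One or two piercings"] * 82
                 + ["More than two piercings"] * 91)

    counts = Counter(responses)
    n = len(responses)
    for category, count in counts.items():
        print(f"{category:25s} {count:4d}  {count / n:.3f}")   # count and relative frequency
    print(f"{'Total':25s} {n:4d}  {sum(counts.values()) / n:.3f}")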
2.11
(a) We see that there are 200 cases total and 80 had Outcome A, so the proportion with Outcome A is 80/200 = 0.40.
(b) We see that there are 200 cases total and 100 of them are in Group 1, so the proportion in Group 1 is 100/200 = 0.5. (c) There are 100 cases in Group 1, and 80 of these had Outcome B, so the proportion is 80/100 = 0.80. (d) We see that 80 of the cases had Outcome A and 60 of these were in Group 2, so the proportion is 60/80 = 0.75. 2.12
(a) We see that there are 100 cases total and 70 had Outcome A, so the proportion with Outcome A is 70/100 = 0.70.
(b) We see that there are 100 cases total and 50 of them are in Group 1, so the proportion in Group 1 is 50/100 = 0.5. (c) There are 50 cases in Group 1, and 10 of these had Outcome B, so the proportion is 10/50 = 0.20. (d) We see that 70 of the cases had Outcome A and 30 of these were in Group 2, so the proportion is 30/70 = 0.429. 2.13 The Canadian census includes all Canadian adults, so this is a proportion for a population and the correct notation is p. We know that 81.7% is the same as 0.817 so we have p = 0.817. 2.14 The Canadian census includes all Canadian adults, so this is a proportion for a population and the correct notation is p. We know that 45.7% is the same as 0.457 so we have p = 0.457. 2.15 Since the dataset includes all professional soccer games, this is a population. The cases are soccer games and there are approximately 66,000 of them. The variable is whether or not the home team won the game; it is categorical. The relevant statistic is p = 0.624. 2.16
(a) The proportion not working is 2649/5204 = 0.509.
(b) The number working is 1436 + 1119 = 2555 so the proportion working is 2555/5204 = 0.491. (We could have also arrived at this answer by taking one minus the proportion not working: 1 − 0.509 = 0.491.) (c) The proportion working on campus is 1436/5204 = 0.276 and the proportion working off campus is 1119/5204 = 0.215. We already found the proportion not working in part (a). The relative frequency table is shown. Paying job? Works on campus Works off campus Does not work Total 2.17
Relative frequency 0.276 0.215 0.509 1.00
(a) Since the proportions are given (instead of the counts) for the different options for this single categorical variable, this is a relative frequency table.
(b) Since the relative frequencies add up to 1, the proportion selecting an “Other” food delivery app is 1 − (0.276 + 0.267 + 0.252 + 0.121) = 0.084. 2.18
(a) The variable records whether or not tylosin appears in the dust samples. The individual cases in the study are the 20 dust samples.
(b) Here is a frequency table for the presence or absence of tylosin in the dust samples.

Category     Frequency
Tylosin      16
No tylosin   4
Total        20

(c) A bar chart for the frequencies is shown below.
(d) The table below shows the relative frequencies for cases with and without tylosin.

Category     Relative frequency
Tylosin      0.80
No tylosin   0.20
Total        1.00
2.19
(a) The sample is the 119 players who were observed. The population is all people who play rock-paper-scissors. The variable records which of the three options each player plays. This is a categorical variable.
(b) A relative frequency table is shown below. We see that rock is selected much more frequently than the others, and then paper, with scissors selected least often.

Option selected   Relative frequency
Rock              0.555
Paper             0.328
Scissors          0.118
Total             1.0
(c) Since rock is selected most often, your best bet is to play paper. (d) Your opponent is likely to play paper again, so you should play scissors. 2.20
(a) This is a population, since we are looking at all sports-related concussions over an entire year.
(b) The proportion who received their concussion playing football is 20293/100951 = 0.201. (c) The proportion who received their concussion riding bicycles is 23405/100951 = 0.232.
(d) We cannot conclude that riding bicycles is more dangerous, because there are probably many more children riding bicycles than there are children playing football. 2.21
(a) The table is given.

Response     HS or less   Some college   College grad   Total
Agree        363          176            196            735
Disagree     557          466            789            1812
Don't know   20           26             32             78
Total        940          668            1017           2625
(b) For the survey participants with a high school degree or less, we see that 363/940 = 0.386 or 38.6% agree. For those with some college, the proportion is 176/668 = 0.263, or 26.3% agree, and for those with a college degree, the proportion is 196/1017 = 0.193, or 19.3% agree. There appears to be an association, and it seems that as education level goes up, the proportion who agree that every person has one true love goes down. (c) We see that 1017/2625 = 0.387, or 38.7% of the survey responders have a college degree or higher. (d) A total of 1812 people disagreed and 557 of those have a high school degree or less, so we have 557/1812 = 0.307, or 30.7% of the people who disagree have a high school degree or less. 2.22
(a) There are 38 people in the control group and 26 of them developed the disease, so the proportion is p̂C = 26/38 = 0.684.
(b) There are 38 people in the teplizumab group and 16 of them developed the disease, so the proportion is p̂T = 16/38 = 0.421. (c) We see that the difference in proportions is p̂C − p̂T = 0.684 − 0.421 = 0.263. (d) Cases were randomized to treatment groups, so this is an experiment. (e) Yes. Because this is an experiment, we can conclude that the drug causes a reduction in the disease.
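A quick, purely illustrative check of the arithmetic in 2.22, using only the counts given in the exercise:

    # Counts from Exercise 2.22: 38 people per group.
    p_control = 26 / 38      # proportion developing the disease in the control group
    p_teplizumab = 16 / 38   # proportion developing the disease in the teplizumab group

    print(round(p_control, 3), round(p_teplizumab, 3), round(p_control - p_teplizumab, 3))
    # 0.684 0.421 0.263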
2.23
(a) We see that p̂1 = 16/165 = 0.097.
(b) We see that p̂2 = 23/495 = 0.046. (c) The group with high social media use was more than twice as likely to develop symptoms of ADHD. (d) The difference in proportions is p̂1 − p̂2 = 0.097 − 0.046 = 0.051. (e) There were 660 teens in the study and 39 of them developed ADHD symptoms, so the proportion is 39/660 = 0.059. (f) The teens were not randomly assigned to high or low social media use, so this is an observational study. (g) No, we cannot conclude that there is a causation effect since these results are from an observational study. There are many possible confounding variables. 2.24
(a) When hiding, the proportion of the time the rats went into opaque boxes is 38/53 = 0.717.
(b) When seeking, the proportion of the time the rats went into opaque boxes is 14/31 = 0.452. (c) The notation and value for the difference in sample proportions is p̂1 − p̂2 = 0.717 − 0.452 = 0.265 2.25
(a) The proportion of children who were given antibiotics is 438/616 = 0.711.
(b) The proportion of children who were classified as overweight at age 9 is 181/616 = 0.294. (c) The proportion of those receiving antibiotics who were classified as overweight at age 9 is 144/438 = 0.329. (d) The proportion of those not receiving antibiotics who were classified as overweight at age 9 is 37/178 = 0.208. (e) Since p̂A = 0.329 and p̂N = 0.208, the difference in proportions is p̂A − p̂N = 0.329 − 0.208 = 0.121. (f) Out of all children classified as overweight, the proportion who were given antibiotics is 144/181 = 0.796. 2.26
(a) The proportion feeling that the voices are mostly negative is 20/60 = 0.333.
(b) The proportion of US participants feeling that the voices are mostly negative is 14/20 = 0.70. (c) There are 40 non-US participants in the study, and 4 + 2 = 6 of them feel that the voices are mostly negative, so the proportion is 6/40 = 0.15. (d) The number of participants hearing positive voices is 29, and none of them is from the US, so the proportion is 0/29 = 0. (e) Yes, there appears to be a strong association between culture and how the voices are perceived. 2.27 Since these are population proportions, we use the notation p. We use pH to represent the proportion of high school graduates unemployed and pC to represent the proportion of college graduates (with a bachelor’s degree) unemployed. (You might choose to use different subscripts, which is fine.) The difference in proportions is pH − pC = 0.097 − 0.052 = 0.045. 2.28
(a) There are two variables, both categorical. One is whether or not the dog selected the cancer sample and the other is whether or not the test was a breath test or a stool test.
(b) We need to include all possible outcomes for each variable when we make a two-way table. The result variable has two options (dog is correct or dog is not correct) and the type of test variable has two options (breath or stool). The two-way table below summarizes these data.

Result                       Breath test   Stool test   Total
Dog selects cancer           33            37           70
Dog does not select cancer   3             1            4
Total                        36            38           74
(c) The dog got 33/36 = 0.917 or 91.7% of the breath samples correct and 37/38 = 0.974 or 97.4% of the stool samples correct. (d) The dog got 70 tests correct and 37 of those were stool samples, so 37/70 = 0.529 of the tests the dog got correct were stool samples. 2.29
(a) This is an observational study since the researchers are observing the results after the fact and are not manipulating the gene directly to force a disruption. There are two variables: whether or not the person has dyslexia and whether or not the person has the DYXC1 break.
(b) Since 109 + 195 = 304 people participated in the study, there will be 304 rows. Since there are two variables, there will be 2 columns: one for dyslexia or not and one for gene break or not. (c) A two-way table showing the two groups and gene status is shown.
Group            Gene break   No break   Total
Dyslexia group   10           99         109
Control group    5            190        195
Total            15           289        304
(d) We look at each row (Dyslexia and Control) individually. For the dyslexia group, the proportion with the gene break is 10/109 = 0.092. For the control group, the proportion with the gene break is 5/195 = 0.026. (e) There is a very substantial difference between the two proportions in part (d), so there appears to be an association between this particular genetic marker and dyslexia for the people in this sample. (As mentioned, we see in Chapter 4 how to determine whether we can generalize this result to the entire population.) (f) We cannot assume a cause-and-effect relationship because this data comes from an observational study, not an experiment. There may be many confounding variables. 2.30
(a) There are two options for the group: therapy and no therapy. There are two options for the outcome: improvement or no improvement. The two-way table is shown, with totals included.

Group        Improvement   No improvement   Total
Therapy      14            6                20
No therapy   3             17               20
Total        17            23               40
(b) Seventeen people reported sleep improvement out of 40 people in the study, so the proportion is 17/40 = 0.425. (c) Twenty people received therapy and 14 reported improvement, so the proportion is 14/20 = 0.70. (d) Twenty people did not receive therapy and only 3 reported improvement, so the proportion is 3/20 = 0.15. (e) The difference in proportions is p̂T − p̂N = 0.70 − 0.15 = 0.55. 2.31
(a) This is an experiment. Participants were actively assigned to receive either electrical stimulation or sham stimulation.
(b) The study appears to be single-blind, since it explicitly states that participants did not know which group they were in. It is not clear from the description whether the study was double-blind. (c) There are two variables. One is whether or not the participants solved the problem and the other is which treatment (electrical stimulation or sham stimulation) the participants received. Both are categorical. (d) Since the groups are equally split, there are 20 participants in each group. We know that 20% of the control group solved the problem, and 20% of 20 is 0.20(20) = 4 so 4 solved the problem and 16 did not. Similarly, in the electrical stimulation group, 0.6(20) = 12 solved the problem and 8 did not. See the table.

Treatment    Solved   Not solved
Sham         4        16
Electrical   12       8
(e) We see that 4 + 12 = 16 people correctly solved the problem, and 12 of the 16 were in the electrical stimulation group, so the answer is 12/16 = 0.75. We see that 75% of the people who correctly solved the problem had the electrical stimulation. (f) We have p̂E = 0.60 and p̂S = 0.20 so the difference in proportions is p̂E − p̂S = 0.60 − 0.20 = 0.40. (g) The proportions who correctly solved the problem are quite different between the two groups, so electrical stimulation does seem to help people gain insight on a new problem type. 2.32
(a) The total number of respondents is 27,255 and the number in an abusive relationship is 2627, so the proportion is 2627/27255 = 0.096. We see that about 9.6% of respondents have been in an emotionally abusive relationship in the last 12 months.
(b) We see that 2627 have been in an abusive relationship and 593 of these are male, so the proportion is 593/2627 = 0.226. About 22.6% of those in abusive relationships are male. (c) There are 8945 males in the survey and 593 of them have been in an abusive relationship, so the proportion is 593/8945 = 0.066. About 6.6% of male college students have been in an abusive relationship in the last 12 months. (d) There are 18310 females in the survey and 2034 of them have been in an abusive relationship, so the proportion is 2034/18310 = 0.111. About 11.1% of female college students have been in an abusive relationship in the last 12 months. 2.33
(a) The total number of respondents is 27,268 and the number answering zero is 18,712, so the proportion is 18712/27268 = 0.686. We see that about 68.6% of respondents have not had five or more drinks in a single sitting at any time during the last two weeks.
(b) We see that 853 students answer five or more times and 495 of these are male, so the proportion is 495/853 = 0.580. About 58% of those reporting that they drank five or more alcoholic drinks at least five times in the last two weeks are male. (c) There are 8,956 males in the survey and 912 + 495 = 1407 of them report that they have had five or more alcoholic drinks at least three times, so the proportion is 1407/8956 = 0.157. About 15.7% of male college students report having five or more alcoholic drinks at least three times in the last two weeks. (d) There are 18,312 females in the survey and 966 + 358 = 1324 of them report that they have had five or more alcoholic drinks at least three times, so the proportion is 1324/18312 = 0.072. About 7.2% of female college students report having five or more alcoholic drinks at least three times in the last two weeks. 2.34
(a) We see in part (a) of the figure that both males and females are most likely to say that they had no drinks of alcohol the last time they socialized.
(b) We see in part (b) of the figure that both males and females are most likely to say that a typical student at their school would have 5 to 6 drinks the last time they socialized. (c) No, perception does not match reality. Students believe that students at their school drink far more than they really do. Heavy drinkers tend to get noticed and skew student perceptions. When asked about a typical student and alcohol, students are much more likely to think of the heavy drinkers they know and not the non-drinkers. 2.35
(a) More females answered the survey since we see in graph (a) that the bar is much taller for females.
(b) It appears to be close to equal numbers saying they had no stress, since the heights of the brown bars in graph (a) are similar. Graph (a) is the appropriate graph here since we are being asked about actual numbers, not proportions. (c) In this case, we are being asked about percents, so we use the relative frequencies in graph (b). We see in graph (b) that a greater percent of males said they had no stress. (d) We are being asked about percents, so we use the relative frequencies in graph (b). We see in graph (b) that a greater percent of females said that stress had negatively affected their grades.
2.36 A two-way table to compare the smoking status of the Reward and Deposit groups is shown below.

Group     Quit   Not quit   Total
Reward    156    758        914
Deposit   78     68         146
Total     234    826        1060
To compare the success rates between the two treatments, we find the proportion in each group who quit smoking for the six months.
Reward: 156/914 = 0.171 vs Deposit: 78/146 = 0.534
We see that the percentage of the reward only group who quit (17.1%) is quite a bit smaller than those who deposit some of their own money (53.4%).
2.37 A two-way table to compare the participation rate of the Reward and Deposit groups is shown below.

Group     Accepted   Declined   Total
Reward    914        103        1017
Deposit   146        907        1053
Total     1060       1010       2070
To compare the participation rates between the two treatments, we find the proportion in each group who agreed to participate. Reward: 914/1017 = 0.899 vs Deposit: 146/1053 = 0.139 Not surprisingly, we see that the percentage in the Reward group who accepted the offer to participate in the program (89.9%) is much higher than in the Deposit group (13.9%) who were asked to risk some of their own money. 2.38 The Reward group originally had 1017 subjects and 156 + 3 = 159 eventually quit smoking, so the success rate in that group was 159/1017 = 0.156 (15.6%). In the Deposit group a total of 78 + 30 = 108 of the original 1053 subjects quit smoking to give a success rate of 108/1053 = 0.103 (10.3%). So, overall, the reward only was the best financial incentive, but both groups did better than the subjects with no incentive. 2.39 Recall that two variables are associated if values of one variable tend to be related to values of the other variable. In this case, it means that knowing whether a case is in Group 1 or Group 2 gives us information about the likely value of the Yes/No variable for that case. (a) Dataset B shows a clear association between the two variables since we see that cases in Group 1 are more likely to give Yes values while cases in Group 2 are more likely to give No values. The results on the Yes/No variable are quite different between Group 1 and Group 2.
(b) Dataset A shows no association between the two variables, since values of the Yes/No variable are virtually the same between the two groups. 2.40
(a) If there is a clear association, then there is an obvious difference in the outcomes based on which treatment is used. There are many possible answers, but the most extreme difference (in which A is always successful and B never is) is shown below.
Treatment     Successful   Not successful   Total
Treatment A   20           0                20
Treatment B   0            20               20
Total         20           20               40
(b) If there is no association, then there is no difference in the outcomes between Treatments A and B. There are many possible answers, but in every case the Treatment A and Treatment B rows would be the same or very similar. One possibility is shown in the table below.

Treatment     Successful   Not successful   Total
Treatment A   15           5                20
Treatment B   15           5                20
Total         30           10               40

2.41
(a) Table where the vaccine has no effect (10% infected in both groups)

Group        Malaria   No malaria   Total
Vaccine      20        180          200
No vaccine   30        270          300
Total        50        450          500
(b) Table where the vaccine cuts the infection rate in half (from 10% to 5%).

Group        Malaria   No malaria   Total
Vaccine      10        190          200
No vaccine   30        270          300
Total        40        460          500
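The cell counts in 2.41 follow directly from the group sizes and the assumed infection rates. The small helper below is a hypothetical sketch of that arithmetic, not anything required by the exercise.

    def vaccine_table(n_vaccine, n_no_vaccine, rate_vaccine, rate_no_vaccine):
        """Return the cell counts (malaria, no malaria) for each group."""
        malaria_v = round(n_vaccine * rate_vaccine)           # vaccinated and infected
        malaria_nv = round(n_no_vaccine * rate_no_vaccine)    # unvaccinated and infected
        return {
            "Vaccine":    (malaria_v, n_vaccine - malaria_v),
            "No vaccine": (malaria_nv, n_no_vaccine - malaria_nv),
        }

    print(vaccine_table(200, 300, 0.10, 0.10))   # part (a): vaccine has no effect
    print(vaccine_table(200, 300, 0.05, 0.10))   # part (b): rate cut in half for the vaccine group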
2.42 Using technology we find the table below with counts and percentages for the superpower choices in this sample. It shows the most frequently chosen superpower is to freeze time (28.48%), while the least popular is super strength (4.19%).

Superpower       Count   Percent
Fly              120     26.49
Freeze time      129     28.48
Invisibility     80      17.66
Super strength   19      4.19
Telepathy        105     23.18
N =              453
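One way output like the table above might be produced, assuming the 453 responses are available as a pandas Series; the file and column names in the commented line are hypothetical.

    import pandas as pd

    # superpower = pd.read_csv("survey.csv")["Superpower"]   # hypothetical file and column name
    superpower = pd.Series(["Fly"] * 120 + ["Freeze time"] * 129 + ["Invisibility"] * 80
                           + ["Super strength"] * 19 + ["Telepathy"] * 105)

    counts = superpower.value_counts()                         # counts per category
    percents = superpower.value_counts(normalize=True) * 100   # percentages per category
    print(pd.DataFrame({"Count": counts, "Percent": percents.round(2)}))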
2.43
(a) The Year variable in StudentSurvey has two missing values. Tallying the 360 nonmissing values gives the table below.
Year    FirstYear   Sophomore   Junior   Senior
Count   94          195         35       36
(b) The largest count is 195 sophomores. The relative frequency is 195/360 = 0.542 or 54.2%.
2.44 Tallying the 157 values in the Server variable of RestaurantTips gives the frequency table below.

Server   A    B    C
Count    60   65   32
Server B had the most bills for a relative frequency of 65/157 = 0.414 or 41.4%.
2.45 Here is a two-way table showing the distribution of Year and Gender, with column percentages to show the gender breakdown in each class year. Each cell shows the count with the column percentage in parentheses.

Gender   FirstYear     Junior        Senior        Sophomore      All
F        43 (45.74)    18 (51.43)    10 (27.78)    96 (49.23)     167 (46.39)
M        51 (54.26)    17 (48.57)    26 (72.22)    99 (50.77)     193 (53.61)
All      94 (100.00)   35 (100.00)   36 (100.00)   195 (100.00)   360 (100.00)
We see that the male/female split is close to 50/50 for most of the years, except for the senior year which appears to have a much higher proportion of males.
2.46 Here is a two-way table showing the distribution of Credit and Server, with column percentages to show the credit card breakdown for each server. Each cell shows the count with the column percentage in parentheses.

Credit   A             B             C             All
n        39 (65.00)    50 (76.92)    17 (53.13)    106 (67.52)
y        21 (35.00)    15 (23.08)    15 (46.88)    51 (32.48)
All      60 (100.00)   65 (100.00)   32 (100.00)   157 (100.00)
We see that cash (Credit = n) is used more frequently for all servers, but more often for Server B and closer to 50/50 for Server C. 2.47 Here is a side-by-side bar chart showing the relationship between class year and gender for the StudentSurvey data. You might also choose a stacked bar chart or ribbon plot to show the relationship. Note that the categories are ordered alphabetically, rather than in year sequence.
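Tables like those in 2.45 and 2.46, and the side-by-side bar chart described in 2.47, are typically produced with a cross-tabulation command. Below is a hedged sketch only, assuming the StudentSurvey data has been read into a pandas DataFrame with columns named Year and Gender; the file name is an assumption.

    import pandas as pd
    import matplotlib.pyplot as plt

    survey = pd.read_csv("StudentSurvey.csv")                      # hypothetical file name

    counts = pd.crosstab(survey["Gender"], survey["Year"], margins=True)        # counts with totals
    col_pct = pd.crosstab(survey["Gender"], survey["Year"], normalize="columns") * 100
    print(counts)
    print(col_pct.round(2))                                        # column percentages, as in 2.45

    # A side-by-side bar chart like the one described in 2.47:
    pd.crosstab(survey["Year"], survey["Gender"]).plot(kind="bar")
    plt.ylabel("Count")
    plt.show()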
2.48 Here is a side-by-side bar chart showing the relationship between server and whether or not the bill was paid with a credit card for the RestaurantTips data. You might also choose a stacked bar chart or ribbon plot to show the relationship.
2.49 Using technology and the data in CollegeScores we find the distribution below for the three types of Control.

Control   Count   Percent
Private   1832    29.83
Profit    2333    37.99
Public    1976    32.18
N =       6141
The most frequent type of control is for profit with 2333 of the 6141 schools (37.99%).
2.50 Using technology and the data in CollegeScores we find the distribution below for the five types of MainDegree.

MainDegree   Count   Percent
0            4       0.07
1            2689    43.79
2            1141    18.58
3            2012    32.76
4            295     4.80
N =          6141
The most frequent type of primary degree is the certificate (1) found at 2689 of the 6141 schools (43.79%).
2.51 Using technology and the data in CollegeScores we find the two-way table below for MainDegree and Control. Since we want to see the distribution within each type of Control (row), the table also shows the row percentages (in parentheses) next to each count.

Control   0          1              2              3              4             All
Private   4 (0.22)   175 (9.55)     167 (9.12)     1243 (67.85)   243 (13.26)   1832 (100.00)
Profit    0 (0.00)   1906 (81.70)   225 (9.64)     170 (7.29)     32 (1.37)     2333 (100.00)
Public    0 (0.00)   608 (30.77)    749 (37.90)    599 (30.31)    20 (1.01)     1976 (100.00)
All       4 (0.07)   2689 (43.79)   1141 (18.58)   2012 (32.76)   295 (4.80)    6141 (100.00)
The most common main degree for Private schools is the bachelor's degree (3) with 67.85%. The most common main degree for Profit schools is the certificate (1) with 81.70%. Public schools are the most evenly spread, with the highest percentage being 37.90% for associate's degrees (2).
2.52 Using technology and the data in CollegeScores we find the two-way table below for MainDegree and Control. Since we want to see the distribution within each type of MainDegree (column), the table also shows the column percentages (in parentheses) next to each count.

Control   0            1               2               3               4              All
Private   4 (100.00)   175 (6.51)      167 (14.64)     1243 (61.78)    243 (82.37)    1832 (29.83)
Profit    0 (0.00)     1906 (70.88)    225 (19.72)     170 (8.45)      32 (10.85)     2333 (37.99)
Public    0 (0.00)     608 (22.61)     749 (65.64)     599 (29.77)     20 (6.78)      1976 (32.18)
All       4 (100.00)   2689 (100.00)   1141 (100.00)   2012 (100.00)   295 (100.00)   6141 (100.00)
The most common type of control for schools with mainly:
No degrees (0) is Private (100.00%)
Certificates (1) is Profit (70.88%)
Associates (2) is Public (65.64%)
Bachelors (3) is Private (61.78%)
Graduates (4) is Private (82.37%)
2.53 Graph (b) is the impostor. It shows more parochial students than private school students. The other three graphs have more private school students than parochial.
Section 2.2 Solutions
2.54 Histograms A and H are both skewed to the left.
2.55 Only histogram F is skewed to the right.
2.56 Histograms B, C, D, E, and G are all approximately symmetric.
2.57 While all of B, C, D, E, and G are approximately symmetric, only B, C, and E are also bell shaped.
2.58 Histogram A is skewed left, so the mean should be smaller than the median. The other three histograms (B, C, and D) are approximately symmetric so the mean and median will be approximately equal.
2.59 Histograms E and G are both approximately symmetric, so the mean and median will be approximately equal. Histogram F is skewed right, so the mean should be larger than the median, while histogram H is skewed left, so the mean should be smaller than the median.
2.60 Histogram C appears to have a mean close to 150, so it has the largest mean. Histogram H appears to have a mean around −2 or −3, so it has the smallest mean.
2.61 There are many possible dotplots we could draw that would be clearly skewed to the left. One is shown.
2.62 There are many possible dotplots we could draw that are approximately symmetric and bell-shaped. One is shown.
2.63 There are many possible dotplots we could draw that are approximately symmetric but not bell-shaped. One is shown.
2.64 There are many possible dotplots we could draw that are clearly skewed to the right. One is shown.
2.65
(a) We have x = (8 + 12 + 3 + 18 + 15)/5 = 11.2.
(b) The median is the middle number when the numbers are put in order smallest to largest. In order, we have: 3  8  12  15  18
The median is m = 12. Notice that there are two data values less than the median and two data values greater. (c) There do not appear to be any particularly large or small data values relative to the rest, so there do not appear to be any outliers. 2.66
(a) We have x = (41 + 53 + 38 + 32 + 115 + 47 + 50)/7 = 53.714.
(b) The median is the middle number when the numbers are put in order smallest to largest. In order, we have: 32  38  41  47  50  53  115
The median is m = 47. Notice that there are three data values less than the median and three data values greater. (c) The value 115 is significantly larger than all the other data values, so 115 is a likely outlier. 2.67
(a) We have x = (15 + 22 + 12 + 28 + 58 + 18 + 25 + 18)/8 = 24.5.
(b) Since there are n = 8 values, the median is the average of the two middle numbers when the numbers are put in order smallest to largest. In order, we have: 12  15  18  18  22  25  28  58
The median is the average of 18 and 22, so m = 20. Notice that there are four data values less than the median and four data values greater. (c) The value 58 is significantly larger than all the other data values, so 58 is a likely outlier. 2.68
(a) We have x = (110 + 112 + 118 + 119 + 122 + 125 + 129 + 135 + 138 + 140)/10 = 124.8.
(b) The data values are already in order smallest to largest. Since there are n = 10 values, the median is the average of the two middle numbers 122 and 125, so we have m = (122 + 125)/2 = 123.5. Notice that there are five data values less than the median and five data values greater. (c) There do not appear to be any particularly large or small data values relative to the rest, so there do not appear to be any outliers. 2.69 This is a sample, so the correct notation is x = 2386 calories per day. 2.70 This mean is from a sample, so the notation is x. 2.71 This is a population, so the correct notation is μ = 41.5 yards per punt. 2.72 This is a population, so the correct notation is μ = 2.6 television sets per household. 2.73
(a) We expect the mean to be larger since there appears to be a relatively large outlier (26.0) in the data values.
(b) There are eight numbers in the data set, so the mean is the sum of the values divided by 8. We have:
Mean = (0.8 + 1.9 + 2.7 + 3.4 + 3.9 + 7.1 + 11.9 + 26.0)/8 = 57.7/8 = 7.2 mg/kg
The data values are already in order smallest to largest, and the median is the average of the two middle numbers. We have:
Median = (3.4 + 3.9)/2 = 3.65
2.74
(a) We compute x̄ = 26.6. Since there are ten numbers, we average the two middle numbers to find the median. We have m = (15 + 17)/2 = 16.
(b) Without the outlier, we have x̄ = 16.78. Since n = 9, the median is the middle number. We have m = 15. (c) The outlier has a very significant effect on the mean and very little effect on the median.
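The contrast described in part (c) is easy to verify numerically. The data for this exercise are not reproduced in the solution, so the values below are hypothetical; the point is only that dropping a single large outlier moves the mean far more than the median.

    from statistics import mean, median

    # Hypothetical data with one large outlier (not the values from the exercise).
    values = [3, 8, 12, 15, 18, 120]

    print(mean(values), median(values))            # the outlier pulls the mean up
    no_outlier = [x for x in values if x != 120]
    print(mean(no_outlier), median(no_outlier))    # the median barely changes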
2.75
(a) This is a mean. Since number of cats owned is always a whole number, a median of 2.39 is impossible.
(b) Since this is a right-skewed distribution, we expect the mean to be greater than the median. 2.76 Since most insects have small weights, the frequency counts for small weights will be large and the frequency counts for larger weights will be quite small, so we expect the histogram to be skewed to the right. The mean will be larger since the outlier of 71 will pull the mean up. 2.77
(a) Since there are only 50 states and all of them are represented, this is the entire population.
(b) The distribution is skewed to the right. There appears to be at least one outlier at just under 40 million. (The outlier represents the state of California.) The value between 28 and 30 million might also be considered an outlier.
(c) The median splits the data in half. The first two bars represent 23 of the 50 states, so the median will be somewhere in the third group, perhaps just under 5 million. (In fact, the median is 4.57 million.) (d) The mean is the balance point for the histogram and is harder to estimate. It should be larger than the median because of the right skew. A bit over 6 million might be a good guess. (In fact, it is 6.5 million.) 2.78
(a) The distribution is skewed to the left.
(b) The median is the value with half the area to the left and half to the right. The value 5 has way more area on the right so it cannot be correct. If we draw a line at 7, there is more area to the left than the right. The answer must be between 5 and 7 and a line at 6.5 appears to split the area into approximately equal amounts. The median is about 6.5. (c) Because the data is skewed to the left, the values in the longer tail on the left will pull the mean down. The mean will be smaller than the median. 2.79
(a) The distribution is skewed to the left since there are many values between about 74 and 84 and then a long tail going down to the low values to the left.
(b) Since half the values are above 74, the median is about 74. (The actual median is 74.3.) (c) Since the data is skewed to the left, the mean will be less than the median so the mean will be less than 74. (The actual mean is 72.47.) 2.80
(a) Since there are lots of small numbers and a few very large numbers, the distribution is skewed to the right.
(b) Since the few large numbers pull the mean up, the mean is 5.3 and the median is 1. 2.81 The speed values tend to be larger than the distance values and the speed distribution is right-skewed, so the speed mean should be larger than the speed median. This implies that the speed mean is 632 ypm and the median is 575 ypm. The distance distribution is left-skewed, so its mean should be smaller than its median. Thus the distance mean is 409 miles and the median is 429 miles. 2.82
(a) Using N and V for the subscripts to indicate noun or verb, we have xN = 3.21 and xV = 3.67.
(b) We have xN − xV = 3.21 − 3.67 = −0.46. (c) Because the participants were randomly assigned to one of the two conditions, this is a randomized experiment. 2.83
(a) The mean number of minutes on the treadmill for the mice receiving young blood is xY = 56.76 minutes.
(b) The mean number of minutes on the treadmill for the mice receiving old blood is xO = 34.69 minutes. (c) We see that xY − xO = 56.76 − 34.69 = 22.07. The mice receiving young blood were able to run on the treadmill for 22 minutes longer, on average, than the mice receiving old blood. (d) This is a randomized comparative experiment, as the mice were randomly assigned to the two groups. (e) Yes, we can conclude causation since the data come from an experiment. 2.84
(a) Since this is a sample, we use the notation x for the means. We use subscripts 1 and 2 for smartphone and desktop, respectively (or we could use S and D to be more clear). Using 1 and 2, we have x1 = 230 and x2 = 120.
(b) The difference in means is 230− 120 = 110 and the notation is x1 − x2 so we have x1 − x2 = 230− 120 = 110. 2.85 The notation for a median is m. We use mH to represent the median earnings for high school graduates and mC to represent the median earnings for college graduates. (You might choose to use different subscripts, which is fine.) The difference in medians is mH − mC = 626 − 1025 = −399. College graduates earn about $400 more per week than high school graduates. 2.86 The Canadian census includes all Canadian adults, so this is a mean for a population and the correct notation is μ. We have μ = 2.4 people. 2.87
(a) There are many possible answers. One way to force the outcome is to have a very small outlier, such as 2, 51, 52, 53, 54. The median of these 5 numbers is 52 while the mean is 42.4.
(b) There are many possible answers. One way to force the outcome is to have a very large outlier, such as 2, 3, 4, 5, 200. The median of these 5 numbers is 4 while the mean is 42.8. (c) There are many possible answers. One option is the following: 2, 3, 4, 5, 6. Both the mean and the median are 4. 2.88 There are many possible answers. Any variable that measures something with large outliers will work. 2.89 The histogram is shown below.
We see that it is strongly skewed to the right. 2.90 The histogram is shown below.
We see that it is mildly right skewed. 2.91
(a) Here is a histogram for the Enrollment variable in CollegeScores2yr.
The distribution shows a very strong skew to the right. (b) Using technology, we find the mean enrollment at two-year colleges is 4,050 students and the median is 1,748 students. (c) As expected for the strong right skew in the graph, the mean enrollment is much larger than the median. 2.92
(a) Here is a histogram for the Enrollment variable in CollegeScores4yr.
The distribution shows a very strong skew to the right.
(b) Using technology, we find the mean enrollment at four-year colleges is 4,485 students and the median is 1,722 students. (c) As expected for the strong right skew in the graph, the mean enrollment is much larger than the median. 2.93
(a) It appears that the mean of the married women is higher than the mean of the never married women. We expect that the mean and the median will be the most different for the never married women, since that data is quite skewed while the married data is more symmetric.
(b) We have n = 1000 in each case. For the married women, we see that 162 women had 0 children, 213 had 1 child, and 344 had 2 children, so 162 + 213 + 344 = 719 had 0, 1, or 2 children. Less than half the women had 0 or 1 child and more than half the women had 0, 1, or 2 children so the median is 2. For the never married women, more than half the women had 0 children, so the median is 0.
Section 2.3 Solutions
2.94 (a) Using technology, we see that the mean is x = 17.36 with a standard deviation of s = 5.73.
(b) Using technology, we see that the five number summary is (10, 13, 17, 21, 28). Notice that these five numbers divide the data into fourths. 2.95
(a) Using technology, we see that the mean is x = 15.09 with a standard deviation of s = 13.30.
(b) Using technology, we see that the five number summary is (1, 4, 10, 25, 42). Notice that these five numbers divide the data into fourths. 2.96
(a) Using technology, we see that the mean is x = 10.4 with a standard deviation of s = 5.32.
(b) Using technology, we see that the five number summary is (4, 5, 11, 14, 22). Notice that these five numbers divide the data into fourths. 2.97
(a) Using technology, we see that the mean is x = 59.73 with a standard deviation of s = 17.89.
(b) Using technology, we see that the five number summary is (25, 43, 64, 75, 80). Notice that these five numbers divide the data into fourths. 2.98
(a) Using technology, we see that the mean is x = 9.05 hours per week with a standard deviation of s = 5.74.
(b) Using technology, we see that the five number summary is (0, 5, 8, 12, 40). Notice that these five numbers divide the data into fourths. 2.99
(a) Using technology, we see that the mean is x = 6.50 hours per week with a standard deviation of s = 5.58.
(b) Using technology, we see that the five number summary is (0, 3, 5, 9.5, 40). Notice that these five numbers divide the data into fourths. 2.100 We know that the standard deviation is a measure of how spread out the data are, so larger standard deviations go with more spread out data. All of these histograms are centered at 10 and have the same horizontal scale, so we need only look at the spread. We see that s = 1 goes with Histogram B and s = 3 goes with Histogram C and s = 5 goes with Histogram A. 2.101 Remember that a standard deviation is an approximate measure of the average distance of the data from the mean. Be sure to pay close attention to the scale on the horizontal axis for each histogram. (a) V (b) III (c) IV (d) I (e) VI (f) II 2.102 Remember that the five number summary divides the data (and hence the area in the histogram) into fourths. (a) II
(b) V (c) IV (d) I (e) III (f) VI 2.103 Remember that the five number summary divides the data (and hence the area in the histogram) into fourths. (a) This shows a distribution pretty evenly spread out across the numbers 1 through 9, so this five number summary matches histogram W. (b) This shows a distribution that is more closely bunched in the center, since 50% of the data is between 4 and 6. This five number summary matches histogram X. (c) Since the top 50% of the data is between 7 and 9, this data is left skewed and matches histogram Y. (d) Since both the minimum and the first quartile are 1, there is at least 25% of the data at 1, so this five number summary matches histogram Z. 2.104 The mean appears to be at about x ≈ 500. Since 95% of the data appear to be between about 460 and 540, we see that two standard deviations is about 40 so one standard deviation is about 20. We estimate the standard deviation to be between 20 and 25. 2.105 The 10th-percentile is the value with 10% of the data values below it, so a reasonable estimate would be between 460 and 470. The 90th-percentile is the value with about 10% of the values above it, so a reasonable estimate would be between 530 and 540. 2.106 The minimum appears to be at 440, the median at 500, and the maximum at 560. The quartiles are a bit harder to estimate accurately. It appears that the lower quartile is about 485 and the upper quartile is about 515, so the five number summary is approximately (440, 485, 500, 515, 560). 2.107 The mean appears to be about 68. Since the data is relatively bell-shaped, we can estimate the standard deviation using the 95% rule. Since there are 100 dots in the dotplot, we want to find the boundaries with 2 or 3 dots more extreme on either side. This gives boundaries from 59 to 76, which is 8 or 9 units above and below the mean. We estimate the standard deviation to be about 4.5. 2.108 Since there are exactly n = 100 data points, the 10th-percentile is the value with 10 dots to the left of it. We see that this is at the value 62. Similarly, the 90th-percentile is the value with 10 dots to the right of it. This is the value 73. 2.109 We see that the minimum value is 58 and the maximum is 77. We can count the dots to find the value at the 25th-percentile, the 50th-percentile, and the 75th-percentile to find the quartiles and the median. We see that Q1 = 65, the median is at 68, and Q3 = 70. The five number summary is (58, 65, 68, 70, 77). 2.110 This dataset is very symmetric. 2.111 For this dataset, half of the values are clustered between 100 and 115, and the other half are very spread out to the right between 115 and 220. This distribution is skewed to the right. 2.112 For this dataset, half of the values are clustered between 22 and 27, and the other half are spread out to the left all the way down to 0. This distribution is skewed to the left.
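The "Using technology" values reported throughout Section 2.3 (means, standard deviations, five number summaries, and percentiles) can be reproduced with any statistics package. As a rough illustration only, here is a minimal Python sketch using numpy; the data values are made up and are not the data from these exercises.

import numpy as np

# Hypothetical sample for illustration; not the data from these exercises.
data = np.array([10, 13, 14, 17, 17, 19, 21, 24, 28])

mean = data.mean()                                       # sample mean
sd = data.std(ddof=1)                                    # sample standard deviation (divides by n - 1)
five_number = np.percentile(data, [0, 25, 50, 75, 100])  # minimum, Q1, median, Q3, maximum
p10, p90 = np.percentile(data, [10, 90])                 # 10th and 90th percentiles

print(mean, sd)
print(five_number)
print(p10, p90)

Different packages use slightly different quartile rules, so the quartiles from np.percentile may not match a hand calculation or StatKey exactly.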
2.113 This data appears to be quite symmetric about the median of 36.3. 2.114 We have
Z-score = (Value − Mean)/(Standard deviation) = (243 − 200)/25 = 1.72. This value is 1.72 standard deviations above the mean.
2.115 We have
Z-score = (Value − Mean)/(Standard deviation) = (88 − 96)/10 = −0.8. This value is 0.80 standard deviations below the mean, which is likely to be relatively near the center of the distribution.
2.116 We have
Z-score = (Value − Mean)/(Standard deviation) = (5.2 − 12)/2.3 = −2.96. This value is 2.96 standard deviations below the mean, which is quite extreme in the lower tail.
2.117 We have
Z-score = (Value − Mean)/(Standard deviation) = (8.1 − 5)/2 = 1.55. This value is 1.55 standard deviations above the mean.
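Exercises 2.114 to 2.117 all use the same arithmetic. As a small illustration, the calculation can be wrapped in a Python function (the function name z_score is ours, not notation from the text):

def z_score(value, mean, sd):
    # z = (value - mean) / (standard deviation)
    return (value - mean) / sd

print(z_score(243, 200, 25))   # 1.72
print(z_score(88, 96, 10))     # -0.8
print(z_score(5.2, 12, 2.3))   # about -2.96
print(z_score(8.1, 5, 2))      # 1.55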
2.118 The 95% rule says that 95% of the data should be within two standard deviations of the mean, so the interval is Mean ± 2 · StDev = 200 ± 2(25) = 200 ± 50, which goes from 150 to 250. We expect 95% of the data to be between 150 and 250.
2.119 The 95% rule says that 95% of the data should be within two standard deviations of the mean, so the interval is Mean ± 2 · StDev = 10 ± 2(3) = 10 ± 6, which goes from 4 to 16. We expect 95% of the data to be between 4 and 16.
2.120 The 95% rule says that 95% of the data should be within two standard deviations of the mean, so the interval is Mean ± 2 · StDev = 1000 ± 2(10) = 1000 ± 20, which goes from 980 to 1020. We expect 95% of the data to be between 980 and 1020.
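Each of these intervals is simply mean ± 2 · standard deviation. A minimal Python sketch (the helper name rule_95 is ours):

def rule_95(mean, sd):
    # interval expected to contain about 95% of the values for bell-shaped data
    return mean - 2 * sd, mean + 2 * sd

print(rule_95(200, 25))    # (150, 250)
print(rule_95(10, 3))      # (4, 16)
print(rule_95(1000, 10))   # (980, 1020)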
2.121 The 95% rule says that 95% of the data should be within two standard deviations of the mean, so the interval is Mean ± 2 · StDev = 1500 ± 2(300) = 1500 ± 600, which goes from 900 to 2100. We expect 95% of the data to be between 900 and 2100.
2.122
(a) The numbers range from 46 to 61 and seem to be grouped around 53. We estimate that x ≈ 53.
(b) The standard deviation is roughly the typical distance of a data value from the mean. All of the data values are within 8 units of the estimated mean of 53, so the standard deviation is definitely not 52, 10, or 55. A typical distance from the mean is clearly greater than 1, so we estimate that s ≈ 5. (c) Using technology, we see x = 52.875 and s = 5.07. 2.123
(a) We see in the computer output that the five number summary is (119, 162, 171, 180, 210).
(b) The range is the difference between the maximum and the minimum, so we have Range = 210 − 119 = 91. The interquartile range is the difference between the first and third quartiles, so we have IQR = 180 − 162 = 18. (c) The 30th-percentile is between the first quartile and the median, so it will be between 162 and 171. The 80th-percentile is between the third quartile and the maximum, so it will be between 180 and 210. 2.124 (a) We see in the computer output that the mean armspan is x = 170.76 cm and the standard deviation is s = 13.46 cm. (b) We see that the largest value is 210, so we compute the z-score as: z-score =
(x − x̄)/s = (210 − 170.76)/13.46 = 2.915.
The maximum armspan in this sample is 2.915 (or almost 3) standard deviations above the mean. We compute the z-score for the smallest armspan, 119, similarly: z-score =
(x − x̄)/s = (119 − 170.76)/13.46 = −3.845.
The minimum armspan in this sample is 3.845 standard deviations below the mean. The minimum would probably be considered an outlier. (c) Since the distribution is relatively symmetric and bell-shaped, we expect that about 95% of the data will lie within two standard deviations of the mean. We have: x − 2 · s = 170.76 − 2(13.46) = 143.84
and
x + 2 · s = 170.76 + 2(13.46) = 197.68
We expect about 95% of the data to lie between 143.84 and 197.68. (In fact, the 95% rule is very accurate in this case, since 397 of the 415 armspan values lie between these two numbers, which is 95.7%.) 2.125 (a) We see in the computer output that the mean obesity rate is μ = 31.43% and the standard deviation is σ = 3.82%.
(b) We see that the largest value is 39.5, so we compute the z-score as: z-score =
(x − μ)/σ = (39.5 − 31.43)/3.82 = 2.11.
The maximum of 39.5% obese (which occurs for both Mississippi and West Virginia) is slightly more than two standard deviations above the mean. We compute the z-score for the smallest percent obese, 23%, similarly: z-score =
(x − μ)/σ = (23.0 − 31.43)/3.82 = −2.21.
The minimum of 23.0% obese, from the state of Colorado, is about 2.2 standard deviations below the mean. The minimum might be considered a mild outlier. (c) Since the distribution is relatively symmetric and bell-shaped, we expect that about 95% of the data will lie within two standard deviations of the mean. We have: μ − 2σ = 31.43 − 2(3.82) = 23.79
and
μ + 2σ = 31.43 + 2(3.82) = 39.07
We expect about 95% of the data to lie between 23.79% and 39.07%. In fact, this general rule is very accurate in this case, since the percent of the population that is obese lies within this range for 47 of the 50 states, which is 94%. (The only states outside the range are Colorado on the low side and West Virginia and Mississippi on the high side.) 2.126
(a) We see in the computer output that the five number summary is (23.0, 28.6, 30.9, 34.4, 39.5).
(b) The range is the difference between the maximum and the minimum, so we have Range = 39.5 − 23.0 = 16.5. The interquartile range is the difference between the first and third quartiles, so we have IQR = 34.4 − 28.6 = 5.8. (c) The 15th-percentile is between the minimum and the first quartile, so it will be between 23.0 and 28.6. The 60th-percentile is between the median and the third quartile, so it will be between 30.9 and 34.4. 2.127
(a) The z-score for Brazil will be positive because the value for Brazil is higher than the mean.
(b) z = (6.24 − 4.8)/1.66 = 0.87 (c) The range is max − min = 12.46 − 1.46 = 11.0. (d) The IQR is Q3 − Q1 = 5.61 − 3.76 = 1.85. 2.128
(a) We use technology to see that the mean is 61.56 hot dogs and the standard deviation is 8.83.
(b) Over the 18 year period, 10 of the years had counts above the mean of 61.56 hot dogs. Only two of these were in the first nine years (2007 and 2009); however eight of the nine later years (all but 2014) were above the mean. People seem to be getting better at eating hot dogs! 2.129
(a) See the table. Year 2009 2008 2007 2006 2005
Joey 68 59 66 52 32
Takeru 64 59 63 54 49
Difference 4 0 3 −2 −17
(b) For the five differences, we use technology to see that the mean is −2.4 and the standard deviation is 8.5. 2.130 (a) We expect that 95% of the data will lie between x±2s. In this case, the mean is x = 2.31 and the standard deviation is s = 0.96, so 95% of the data lies between 2.31±2(0.96). Since 2.31−2(0.96) = 0.39 and 2.31 + 2(0.96) = 4.23, we estimate that about 95% of the temperature increases will lie between 0.39◦ and 4.23◦ . (b) Since x = 2.31 and s = 0.96, the z-score for a temperature increase of 4.9◦ is z-score =
(x − x̄)/s = (4.9 − 2.31)/0.96 = 2.70.
The temperature increase for this man is 2.7 standard deviations above the mean. 2.131 (a) The 10th percentile is the value with 10% of the area of the histogram to the left of it. This appears to be at about 2.5 or 2.6. A (self-reported) grade point average of about 2.6 has 10% of the reported values below it (and 90% above). The 75th percentile appears to be at about 3.4. A grade point average of about 3.4 is greater than 75% of reported grade point averages. (b) It appears that the highest GPA in the dataset is 4.0 and the lowest is 2.0, so the range is 4.0−2.0 = 2.0. 2.132 The histogram is relatively symmetric and bell-shaped. The mean appears to be approximately 440 billion dollars. To estimate the standard deviation, we estimate an interval centered at 440 that contains approximately 95% of the data. The interval from about 320 to 560 appears to contain almost all the data. Since 320 is 120 units below the mean around 440 and 560 is 120 units above the mean, by the 95% rule we estimate that 2 times the standard deviation is about 120, so the standard deviation appears to be approximately 60 billion dollars. Note that we can only get a rough approximation from the histogram. To find the exact values of the mean and standard deviation, we would use technology and all the values in the dataset. 2.133 Using technology, we see that the mean is 427.8 billion dollars and the standard deviation is 61.0 billion dollars. The 95% rule says that 95% of the data should be within two standard deviations of the mean, so the interval is: Mean
± 2 · StDev = 427.8 ± 2(61.0) = 427.8 ± 122.0, which goes from 305.8 to 549.8.
We expect 95% of US monthly retail sales over this time period to be between 305.8 billion dollars and 549.8 billion dollars. 2.134 The data is not at all bell-shaped, so it is not appropriate to use the 95% rule with this data. 2.135 (a) Using technology, we see that the mean is 37.0 blocks in a season and the standard deviation is 34.2 blocks. (b) Using technology, we see that the five number summary is (0, 16, 28, 45, 199). (c) The five number summary from part (b) is more resistant to outliers and is often more appropriate if the data is heavily skewed.
(d) We create either a histogram, dotplot, or boxplot. A histogram of the data in Blocks is shown. We see that the distribution is heavily skewed to the right.
(e) This distribution is not at all bell-shaped, so it is not appropriate to use the 95% rule with this distribution. 2.136 We first calculate the z-scores for the four values. In each case, we use the fact that the z-score is the value minus the mean, divided by the standard deviation. We have
z-score for FGPct = (0.442 − 0.463)/0.057 = −0.37
z-score for Assists = (586 − 219)/148 = 2.48
z-score for Points = (2818 − 981)/478 = 3.84
z-score for Steals = (158 − 63.5)/31.2 = 3.03
The most impressive statistic is his total points, which is about 3.84 standard deviations above the mean. The least impressive is his field goal percentage, which is 0.37 standard deviations below the mean. Although just a bit below average in field goal percentage, he is well above the mean for the other three variables. 2.137 (a) We calculate z-scores using the summary statistics for each: Reading and Writing = (600 − 536)/102 = 0.627, Math = (600 − 531)/114 = 0.605. (b) Stanley's more unusual score was in the Reading and Writing component, since he has the higher z-score in this section. (c) Stanley performed best on Reading and Writing, since this is the higher z-score. 2.138 (a) Using technology, we find that the mean internet speed is x = 24.86 (Mb/sec) and the standard deviation is s = 10.92. (b) Using technology, we find that the mean hours online is x = 5.69 and the standard deviation is s = 1.54. (c) There may be some relationship between internet speeds and hours online. Brazil has a much slower internet speed and the highest time online by a considerable margin, but the pattern does not seem so obvious with the other countries. 2.139
(a) The range is 6662 − 445 = 6217 and the interquartile range is IQR = 2106 − 1334 = 772.
(b) The maximum of 6662 is clearly an outlier and we expect it to pull the mean above the median. Since the median is 1667, the mean should be larger than 1667, but not too much larger. The mean of this data set is 1796.
(c) The best estimate of the standard deviation is 680. We see from the five number summary that about 50% of the data is within roughly 400 of the median, so the standard deviation is definitely bigger than 200. The two values above 680 would be way too large to give an estimated distance of the data values from the mean, so the only reasonable answer is 680. The actual standard deviation is 680.3.
2.140 (a) The smallest possible standard deviation is zero, which is the case if the numbers don’t deviate at all from the mean. The dataset is: 5, 5, 5, 5, 5, 5. (b) The largest possible standard deviation occurs if all the numbers are as far as possible from the mean of 5. Since we are limited to numbers only between 1 and 9 (and since we have to keep the mean at 5), the best we can do is three values at 1 and three values at 9. The dataset is: 1, 1, 1, 9, 9, 9.
2.141 A bell-shaped distribution with mean 3 and standard deviation 1.
2.142 A bell-shaped distribution with mean 7 and standard deviation 1.
2.143 A bell-shaped distribution with mean 5 and standard deviation 2.
2.144 A bell-shaped distribution with mean 5 and standard deviation 0.5.
2.145 Using technology, we see that the mean rating is 57.58, the standard deviation is 28.06, and the five number summary is (0, 33, 61, 84, 99). 2.146 Using technology, we see that the mean audience rating is 62.18, the standard deviation is 18.21, and the five number summary is (10, 49, 64, 77, 99). 2.147 Using technology, we see that the sample size is n = 1412, the mean distance is x = 408.7, the standard deviation is s = 78.1, and the five number summary is (137, 375, 429, 448, 577). (Note that different stat packages may give slightly different values for the quartiles.) 2.148 (a) Using technology the mean braking distance is x = 130.8 feet and the standard deviation is s = 6.92 feet. (b) An interval for x ± 2 · s = 130.8 ± 2 · 6.92 = 130.8 ± 13.8 goes from 117.0 to 144.6 feet. There are five cars with braking distance less than 117.0 feet: Corvette (107), Porsche 911 (108), BMW Z4 (111), Camaro (112), and Audi TT (113). Only one car, the Toyota Sequoia (146), has a braking distance more than two standard deviations above the mean. 2.149 (a) Using technology we find the following summary statistics for the two variables, which include standard deviation (StDev), range, and interquartile range (IQR) for each variable.

Variable  N    Mean   StDev  Minimum  Q1    Median  Q3    Maximum  Range  IQR
CityMPG   110  16.19  3.74   10.0     13.0  15.5    19.0  28.0     18.0   6.0
HwyMPG    110  34.00  7.19   21.0     28.0  32.5    39.0  54.0     33.0   11.0
(b) The range (18), IQR (6) and standard deviation (3.74) for CityMPG are all smaller than the respective range (33), IQR (11), and standard deviation (7.193) for HwyMPG. 2.150
(a) Using technology we find that x = 10.29 hours and s = 10.19 hours.
(b) The computed z-score for a homework hours value of x = 80 is z = (x − x̄)/s = (80 − 10.29)/10.19 = 6.84. This amount of homework claimed by this student is 6.84 standard deviations above the mean for this sample. While this is not impossible, it would be very surprising to find a student working 80 hours per week on homework! 2.151 (a) One half of the data should have a range of 10 and all of the data should have a range of 100. The data is very bunched in the middle, with long tails on the sides. One possible histogram is shown.
(b) One half of the data should have a range of 40 and all of the data should have a range of 50. This is a bit tricky – it means the outside 50% of the data fits in only 10 units, so the data is actually clumped on the outside margins. One possible histogram is shown.
2.152 (a) The rough estimate is (130 − 35)/5 = 19 bpm compared to the actual standard deviation of s = 12.2 bpm. The possible outliers pull the rough estimate up quite a bit. Without the two outliers, the rough estimate is (96 − 35)/5 = 12.2, exactly matching the actual standard deviation. (b) The rough estimate is (40−0)/5 = 8. The rough estimate is a bit high compared to the actual standard deviation of s = 5.741. (c) The rough estimate is (40 − 1)/5 = 7.8. The rough estimate is quite close to the actual standard deviation of s = 7.24.
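The rough estimate used in Exercise 2.152 is just the range divided by 5. A minimal Python sketch, using made-up pulse rates rather than the exercise data, compares the rough estimate to the actual sample standard deviation:

import numpy as np

def rough_sd(values):
    # rough estimate of the standard deviation: range divided by 5
    return (max(values) - min(values)) / 5

pulses = [35, 52, 60, 64, 71, 75, 80, 96]   # hypothetical pulse rates (bpm)
print(rough_sd(pulses))                     # rough estimate: (96 - 35)/5 = 12.2
print(np.std(pulses, ddof=1))               # actual sample standard deviation for comparison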
Section 2.4 Solutions
2.153 We match the five number summary with the minimum, first quartile, median, third quartile, and maximum shown in the boxplot. (a) This five number summary matches boxplot S. (b) This five number summary matches boxplot R. (c) This five number summary matches boxplot Q. (d) This five number summary matches boxplot T. Notice that at least 25% of the data is exactly the number 12, since 12 is both the minimum and the first quartile. 2.154 We match the five number summary with the minimum, first quartile, median, third quartile, and maximum shown in the boxplot. (a) This five number summary matches boxplot W. (b) This five number summary matches boxplot X. (c) This five number summary matches boxplot Y. (d) This five number summary matches boxplot Z. 2.155 (a) Half of the data lies between 585 and 595, while the other half (the left tail) is stretched all the way from 585 down to about 50. This distribution is skewed to the left. (b) There are 3 low outliers. (c) We see that the median is at about 585 and the distribution is skewed left, so the mean is less than the median. A reasonable estimate for the mean is about 575 or 580. 2.156 (a) Half of the data appears to lie in the small area between 20 and 40, while the other half (the right tail) appears to extend all the way up from 40 to about 140. This distribution appears to be skewed to the right. (b) Since there are no asterisks on the graph, there are no outliers. (c) The median is approximately 40 and since the distribution is skewed to the right, we expect the values out in the right tail to pull the mean up above the median. A reasonable estimate for the mean is about 50. 2.157
(a) This distribution looks very symmetric.
(b) Since there are no asterisks on the graph, there are no outliers. (c) We see that the median is at approximately 135. Since the distribution is symmetric, we expect the mean to be very close to the median, so we estimate the mean to be about 135. 2.158 (a) This distribution isn’t perfectly symmetric but it is close. Despite the presence of outliers on both sides, there does not seem to be a distinct skew either way. This distribution is approximately symmetric. (b) There appear to be 3 low outliers and 2 high outliers. (c) Since the distribution is approximately symmetric, we expect the mean to be close to the median, which appears to be at about 1200. We estimate that the mean is about 1200.
2.159 (a) We see that Q1 = 260 and Q3 = 300 so the interquartile range is IQR = 300 − 260 = 40. We compute Q1 − 1.5(IQR) = 260 − 1.5(40) = 200, and Q3 + 1.5(IQR) = 300 + 1.5(40) = 360 Since the minimum (210) and maximum (320) values lie inside these values, there are no outliers. (b) Boxplot:
2.160
(a) We see that Q1 = 42 and Q3 = 56 so the interquartile range is IQR = 56−42 = 14. We compute Q1 − 1.5(IQR) = 42 − 1.5(14) = 21,
and Q3 + 1.5(IQR) = 56 + 1.5(14) = 77 There are two data values that fall outside these values. We see that 15 and 20 are both small outliers. (b) Notice that the line on the left of the boxplot extends down to 28, the smallest data value that is not an outlier.
2.161
(a) We see that Q1 = 72 and Q3 = 80 so the interquartile range is IQR = 80 − 72 = 8. We compute Q1 − 1.5(IQR) = 72 − 1.5(8) = 60,
and Q3 + 1.5(IQR) = 80 + 1.5(8) = 92 There are four data values that fall outside these values, one on the low side and three on the high side. We see that 42 is a low outlier and 95, 96, and 99 are all high outliers. (b) Notice that the line on the left of the boxplot extends down to 63, the smallest data value that is not an outlier, while the line on the right extends up to 89, the largest data value that is not an outlier.
2.162
(a) We see that Q1 = 10 and Q3 = 16 so the interquartile range is IQR = 16 − 10 = 6. We compute Q1 − 1.5(IQR) = 10 − 1.5(6) = 1,
and Q3 + 1.5(IQR) = 16 + 1.5(6) = 25 There are no small outliers and there are two large outliers, at 28 and 30. (b) Notice that the line on the right extends up to 23, the largest data value that is not an outlier.
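Exercises 2.159 to 2.162 all apply the same 1.5 · IQR rule, so the fences can be computed with a short helper (the function name is ours). Using the quartiles Q1 = 10 and Q3 = 16 from Exercise 2.162:

def iqr_fences(q1, q3):
    # boundaries for outliers using the 1.5 * IQR rule
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = iqr_fences(10, 16)
print(low, high)   # 1.0 and 25.0, so the values 28 and 30 are flagged as large outliers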
2.163
(a) The distribution is left-skewed.
(b) Half of all literacy rates are between about 73% and 98%. (c) The five number summary is about (15, 73, 92, 98, 100). (d) The true median is most likely lower than that shown here, because data is more likely to be available for developed countries, which have higher literacy rates.
2.164
(a) The median runtime for the mice receiving old blood appears to be about 29 minutes.
(b) These data appear to be skewed to the right, so we expect the mean to be larger than the median. (c) The mice receiving young blood appear to be able to run for much longer. (d) Yes, there appears to be an association between these variables since the runtimes are generally much larger for those mice receiving young blood. (e) No, there are no outliers in either group. 2.165 (a) The explanatory variable is the group the person is in, and it is categorical. The response variable is hippocampus volume, and it is quantitative. (b) The control group of people who never played football appears to have the largest hippocampal volume, while the football players with a history of concussion appear to have the smallest. (c) Yes, there are two outliers (one high and one low) in the group of football players with a history of concussion. (d) The third quartile appears to be at about 7000 μL. (e) Yes, there is a quite obvious association, with those playing football having smaller brain hippocampus volume and those playing football with concussions having even smaller volume. (f) No, we cannot conclude causation as the data come from an observational study and not an experiment. There are many possible confounding variables. 2.166 (a) We see that IQR = 64 − 25 = 39 so Q3 + 1.5(IQR) = 64 + 1.5(39) = 122.5. We see that there are two high outliers, at 289 seconds and 267 seconds. (b) We see that Q1 − 1.5(IQR) = 25 − 1.5(39) = −33.5. There are definitely no low outliers! (c) See the figure.
(d) Due to the right skew of the distribution and the large positive outliers, we would expect the mean to be greater than the median. 2.167 Recall that two variables are associated if values of one variable tend to be related to values of the other variable. In this case, it means that knowing whether a case is in group 1 or group 2 gives us information about the likely value of the quantitative variable for that case. (a) Dataset C shows the strongest association between the two variables since the values of the quantitative variable are quite different between group 1 and group 2.
(b) Dataset A shows no evidence of an association between the two variables, since values of the quantitative variable are virtually the same between the two groups. 2.168 Recall that two variables are associated if values of one variable tend to be related to values of the other variable. In this case, it means that knowing which category of the categorical variable a case is in gives useful information about the likely values of the quantitative variable. Possible answers are shown below. Answers may vary.
2.169 (a) Most of the data is between 0 and 500, and then the data stretches way out to the right to some very large outliers. This data is very much skewed to the right. (b) The data appear to range from about 0 to about 10,000, so the range is about 10000 − 0 = 10000. (c) The median appears to be about 250 (although this is difficult to estimate precisely due to the scale of the graph). About half of all movies recover less than 250% of their budget, and half recover more than 250% of the budget. (d) The very large outliers will pull the mean up, so we expect the mean to be larger than the median. (In fact, the median is 268.9 while the mean is 435.7.) 2.170 We see from the five number summary that the interquartile range is IQR = Q3 − Q1 = 77 − 49 = 28. Using the IQR, we compute:
Q1 − 1.5 · IQR = 49 − 1.5(28) = 49 − 42 = 7
Q3 + 1.5 · IQR = 77 + 1.5(28) = 77 + 42 = 119
Scores greater than 119 are impossible since the scale only goes up to 100. An audience score less than 7 would qualify as a low outlier. (That would have to be a very bad movie!) Since the minimum rating (seen in the five number summary) is 10, there are no low outliers. 2.171 (a) Action movies appear to have the largest budgets, while horror movies appear to have the smallest budgets.
(b) Action movies have by far the biggest spread in the budgets, with horror movies and comedies appearing to have the smallest spread (but not much smaller than dramas). (c) Yes, there definitely appears to be an association between genre and budgets, with action movies having substantially larger budgets and much more variability than the other three types. 2.172
(a) The highest mean is in Drama (69.52), while the lowest mean is in Horror movies (43.56).
(b) The highest median is in Drama (75), while the lowest median is in Horror movies (38.5). (c) The lowest score is 18 and it is for a Horror movie. The highest score is 97, and it is obtained by a Drama. (d) The genre with the largest number of movies is Drama, with n = 52. (e) We see that xC = 55.64 while xH = 43.56, so the difference in mean score between the two types of movies is xC − xH = 55.64 − 43.56 = 12.08. 2.173 (a) The lowest level of physical activity appears to be in the South, and the highest, in general, appears to be in the West (although the highest individual state is in the Northeast). (b) There are high outliers in the Midwest (Wisconsin) and Northeast (Vermont) and a low outlier in the West (Nevada). (c) Yes, the boxplots are very different between the different regions. All of the states except the outlier in the West are larger than all the values in the South. The Midwest and Northeast tend to be between these two extremes. 2.174 (a) The median corresponds to the middle line in each box. These are at about the same place, around 1380 hits, for both leagues. In fact, the actual medians from the data are 1379 (AL) and 1378 (NL). These are remarkably close. (b) Although the medians are similar, the other values of the five number summary (minimum, maximum, Q1 , and Q3 ) are all smaller for the National League. The boxplot is more symmetric for the National League and appears to have a bit less variability. 2.175 The side-by-side boxplots are almost identical. Vitamin use appears to have no effect on the concentration of retinol in the blood. 2.176 (a) Yes, there does appear to be an association. Honeybees appear to dance many more circuits for a high quality option. (b) It is obvious that there are no low outliers, so we look for high outliers in each case. For the high quality group, we have IQR = 122.5 − 7.5 = 115, so outliers would be those beyond Q3 + 1.5 · IQR = 122.5 + 1.5(115) = 122.5 + 172.5 = 295 There are two outliers in the high quality group, with one at the maximum of 440 and the other appearing on the dotplot to be at approximately 330. These two honeybee scouts must have been very enthusiastic about this possible new home! For the low quality group, we have IQR = 42.5 − 0 = 42.5, so outliers would be those beyond Q3 + 1.5 · IQR = 42.5 + 1.5(42.5) = 42.5 + 63.75 = 106.25 There are three outliers in the low quality group, with one at the maximum of 185 and the other two appearing on the dotplot to be at approximately 140 and 175. Notice that none of these three outliers would be considered outliers in the high quality group.
(c) The difference in means is xH − xL = 90.5 − 30.0 = 60.5. (d) We see in the five number summary that the largest value in the high quality group is 440. We find: z-score for 440 =
(x − Mean)/(Standard deviation) = (440 − 90.5)/94.6 = 3.695.
We see in the five number summary that the largest value in the low quality group is 185. We find: z-score for 185 =
(x − Mean)/(Standard deviation) = (185 − 30)/49.4 = 3.138.
Both the largest values are more than 3 standard deviations above the mean. The one in the high quality group is a bit larger relative to its group. (e) No, since the data are not bell-shaped. 2.177 (a) The five number summaries for Individual (12, 31, 39.5, 45.5, 59) and Split (22, 40, 46.5, 61, 81) show that costs tend to be higher when subjects are splitting the bill. This is also true for the means (xI = 37.29 vs xS = 50.92), while the standard deviations and IQRs show slightly more variability for those splitting the bill (sI = 12.54 and IQRI = 14.5 vs sS = 14.33 and IQRS = 21). (b) Side-by-side plots show that the distributions of costs tend to be relatively symmetric for both groups, but generally higher and slightly more variable for those splitting the bill.
2.178 (a) The five number summaries for females (15, 35, 41.5, 59, 81) and males (12, 39.5, 45, 51, 73) are fairly similar with a somewhat larger interquartile range for the females (24 vs 11.5). The means (xf = 44.46 vs xm = 43.75) and standard deviations (sf = 15.48 vs sm = 14.81) are similar for both groups. (b) Side-by-side boxplots show the distributions of costs tend to be relatively symmetric for both females and males, although the smaller IQR for males produces 2 outlier values at each end of the distribution. Both distributions are centered at roughly the same point (a bit over 40 shekels).
2.179 Here are side-by-side dotplots to compare the completion rates between the three types of control.
We see that all three distributions include values for 0% to 100% (and frequently have a cluster of values at those extremes). The private schools have a relatively symmetric distribution with a center that is below that of profit schools, but well above the public schools. The for profit schools show a left skew, while the public schools are right skewed. The ranges are the same (100) and standard deviations are likely to be similar. Note, you might also compare side-by-side boxplots or histograms.
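Side-by-side plots like these are easy to produce in most packages. For example, here is a minimal matplotlib sketch; the completion rates below are made up for illustration and are not the CollegeScores data:

import matplotlib.pyplot as plt

# Hypothetical completion rates (percent) for three types of control; not the CollegeScores data.
private = [30, 45, 55, 60, 72, 80, 95]
profit = [40, 55, 66, 70, 78, 85, 100]
public = [5, 20, 30, 36, 50, 60, 90]

plt.boxplot([private, profit, public])                  # one box per group, side by side
plt.xticks([1, 2, 3], ["Private", "Profit", "Public"])  # label the three groups
plt.ylabel("Completion rate (%)")
plt.show()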
2.180 Using technology, we create the side-by-side boxplots of Speed for cocks (C) and hens (H) as in the figure below.
(a) There are high outliers, but no low outliers for the hens. (b) There are high outliers, but no low outliers for the cocks. (c) The speed boxplots are very similar for hens and cocks. Both are right skewed with lots of high outliers, and very similar five number summaries (except for the hen with the fastest speed). There does not appear to be an association between sex and speed for homing pigeons. 2.181 Using technology we find some summary statistics for the CompRate variable in CollegeScores as shown below.
Variable: CompRate

Control  N     N*   Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Private  1432  400  55.370  22.849  0.000    40.657  56.560  70.815  100.000
Profit   2096  237  64.229  20.928  0.000    52.940  66.670  78.570  100.000
Public   1908  68   41.826  22.657  0.000    24.243  35.960  56.813  100.000
Comparing means (or medians) between the types of control we see that for profit schools tend to have higher completion rates (mean=64.2, median=66.7), while public schools are lower (mean=41.8, median=36.0) and private schools tend to be in between (mean=55.4, median=56.6). The ranges are all equal to 100 and the standard deviations are similar (22.8, 20.9, and 22.7). Note, however, that a sizable number of schools (especially private ones) have missing values (as shown in the column labeled N*). 2.182 Here is one possible graph of the side-by-side boxplots:
2.183 Here is one possible graph of the side-by-side boxplots:
2.184 Answers will vary. 2.185 Answers will vary.
Section 2.5 Solutions 2.186 A correlation of −1 means the points all lie exactly on a line and there is a negative association. The matching scatterplot is (b). 2.187 A correlation of 0 means there appears to be no linear association in the scatterplot, so the matching scatterplot is (c). 2.188 A correlation of 0.8 means there is an obvious positive linear association in the scatterplot, but there is some deviation from a perfect straight line. The matching scatterplot is (d). 2.189 A correlation of 1 means the points all lie exactly on a line and there is a positive association. The matching scatterplot is (a). 2.190 The correlation represents almost no linear relationship, so the matching scatterplot is (c). 2.191 The correlation represents a mild negative association, so the matching scatterplot is (a) 2.192 The correlation shows a strong positive linear association, so the matching scatterplot is (d). 2.193 The correlation shows a strong negative association, so the matching scatterplot is (b). 2.194 We expect that larger houses will cost more to heat, so we expect a positive association. 2.195 Since the amount of gas goes down as distance driven goes up, we expect a negative association. 2.196 We wear more clothes when it is cold outside, so as the temperature goes down, the amount of clothes worn goes up. This describes a negative association. 2.197 Usually someone who sends lots of texts also gets lots of them in return, and someone who does not text very often does not get very many texts. This describes a positive relationship. 2.198 Usually there are not many people in a heavily wooded area and there are not many trees in a heavily populated area such as the middle of a city. This would be a negative relationship. 2.199 While it is certainly not a perfect relationship, we generally expect that more time spent studying will result in a higher exam grade. This describes a positive relationship. 2.200 See the figure below.
2.201 See the figure below.
2.202 The correlation is r = 0.915. 2.203 The correlation is r = −0.932. 2.204 The explanatory variable is roasting time and the response variable is the amount of caffeine. The two variables have a negative association. 2.205 (a) The two variables are amount of iron in the soil and amount of potassium in spinach grown in that soil. Amount of iron in the soil is the explanatory variable since it appears to impact the amount of potassium in the spinach. (b) Since higher amounts of iron imply lower levels of potassium, this is a negative association. (c) We know that 840 mg is the average amount of potassium. If soil has high levels of iron, we expect the amount of potassium to be less than 840 mg. (d) If soil has low levels of iron, we expect the amount of potassium to be greater than 840 mg. 2.206
(a) (i) This is a positive association. (ii) This is a negative association.
(b) No, we cannot conclude a cause-and-effect relationship since the results do not come from an experiment. (However, additional experiments do seem to indicate causation in this case.) 2.207
(a) More nurturing is associated with larger hippocampus size, so this is a positive association.
(b) Larger hippocampus size is associated with more resiliency, so this is a positive association. (c) An experiment would involve randomly assigning some children to get lots of nurturing while randomly assigning some other children to get less nurturing. After many years, we would measure the size of the hippocampus in their brains. It is clearly not ethical to assign some children to not get nurtured! (d) We cannot conclude that there is a cause and effect relationship in humans. No experiment has been done and there are many possible confounding variables. We can, however, conclude that there is a cause and effect relationship in animals, since the animal results come from experiments. This causation in animals probably increases the likelihood that there is a causation effect in humans as well. 2.208 Since more cheating by the mother is associated with more cheating by the daughter, this is a positive association.
2.209 (a) The association appears to be positive. This means that higher levels of self-reported depression are associated with higher levels of clinical depression. This makes sense since both scales are meant to measure approximately the same thing. (b)
i. A case in the upper left would represent a person who has low levels of self-reported depression symptoms but who scores high on the clinical depression assessment. ii. A case in the upper right would represent a person who scores high on both the self-reported scale and the clinical scale. iii. A case in the lower left would represent a person who scores low on both the self-reported scale and the clinical scale. iv. A case in the lower right would represent a person who has high levels of self-reported depression symptoms but who has a low score on the clinical depression assessment.
(c) For the case farthest to the right, the self-reported score is approximately 54 and the clinical score is approximately 27.
2.210 All the cases are male-female married couples. (a) A case in the top left corner of the scatterplot represents a couple with an old husband and a young wife. (b) A case in the top right corner of the scatterplot represents an old couple with an old husband and an old wife. (c) A case in the bottom left corner of the scatterplot represents a young couple with a young husband and a young wife. (d) A case in the bottom right corner of the scatterplot represents a couple with a young husband and an old wife.
2.211 (a) BMI, cortisol level, depression score, and heart rate are all positively correlated with social jetlag, while weekday hours of sleep and physical activity are negatively correlated. (b) No, we cannot conclude causation since this is an observational study and not an experiment.
2.212 (a) We see that years of football appears to have a stronger association with brain hippocampal volume than with the cognitive percentile score, so the correlation −0.465 goes with Graph (b) while the correlation −0.366 goes with Graph (a). (b) More years playing football tends to be associated with smaller brain size and a lower cognitive percentile score.
2.213 The scatterplot is shown.
(a) We see on the scatterplot that one other 50-year-old woman got married during this time period. Her husband was about 43 at the time of the marriage. (b) The oldest man to get married during this time period was 75 years old. The youngest was about 19 years old. (c) There are two dots to the right of 65 on the scatterplot, so 2 of the women who got married during this time period were older than 65. There are three dots to the left of 20, so 3 of the women who got married during this time period were less than 20 years old. 2.214 Using the dataset MarriageAges and StatKey or other technology, we see that the correlation is 0.914. This is a sample correlation, so the notation is r and we have r = 0.914.
2.215 Using technology, we create the scatterplot of Speed versus Distance as in the figure below.
(a) There appears to be a negative association between distance and speed. The pigeons flying shorter distances (under 300 miles) tend to have more speeds above 750 ypm, while the pigeons flying longer distances more often have speeds below 750 ypm. (b) From the scatterplot, the highest value for speed appears to be a pigeon that flew for about 430 miles. This value can be confirmed by looking at the first case in the dataset, because the pigeons are ordered from highest to lowest speed.
(c) Using technology, we find that the correlation between speed and distance for these 1412 pigeons is r = −0.231. This is consistent with the negative association observed in the scatterplot. 2.216 (a) A positive association would indicate that teams that win more in the pre-season tend to win more in the regular season, while a negative association would indicate that teams that win more in the pre-season tend to lose more in the regular season. (b) This is a small positive correlation, so it tells you there is a very weak positive linear relationship between these two variables. 2.217 Type of liquor is a categorical variable, so correlation should not be computed and a positive relationship has no meaning. There cannot be a linear relationship involving a categorical variable. 2.218 Both variables in the study are categorical, not quantitative. Correlation is a statistical measure of the linear relationship between two quantitative variables. 2.219 (a) There is a strong negative linear association between genetic diversity and distance from East Africa. (b) The correlation is r = −0.83. The correlation is clearly negative, correlations must be between −1 and 1, and the association is relatively strong, indicating that the negative correlation is likely closer to −1 than 0. (c) America has the lowest genetic diversity and is the farthest from East Africa. (d) Based on genetic diversity, populations closer to East Africa appear better suited to adapt to change because they have greater genetic diversity. 2.220 (a) The dots go up as we move left to right, so there appears to be a positive relationship. In context, that means that as a country's residents use more of the planet's resources, they tend to be happier and healthier. (b) The bottom left is an area with low happiness and low ecological footprint, so they are countries whose residents are not very happy and don't use many of the planet's resources. (c) Costa Rica is the highest dot – a black dot with an ecological footprint of about 2.0. (d) For ecological footprints between 0 and 6, there is a strong positive relationship. For ecological footprints between 6 and 10, however, there does not seem to be any relationship. Using more resources appears to improve happiness up to a point but not beyond that. (e) Countries in the top left are high on the happiness scale but are relatively low on resource use. (f) There are many possible observations one could make, such as that countries in Sub-Saharan Africa are low on the happiness and well-being scale and also low on the use of resources, while Western Nations are high on happiness but also very high on the use of the planet's resources. (g) For those in the bottom left (such as many countries in Sub-Saharan Africa), efforts should be devoted to improving the well-being of the people. For those in the top right (such as many Western nations), efforts should be devoted to reducing the use of the planet's resources. 2.221
(a) A beer rated highly by both wife and husband.
(b) A beer rated low by both wife and husband. (c) A beer rated high by the wife and low by the husband. (d) A beer rated low by the wife and high by the husband.
2.222
(a) The couple rated the most beers in 2011.
(b) The highest overall average was in 2019. (c) The highest range of average ratings was in 2011. (d) The highest average rated beer was tasted in 2011. 2.223
(a) The correlation is 0.348.
(b) The wife rated the Pumpkin Ale by the Carolina Beer Company highest, in 2011. (c) The husband rated Master of Pumpkins by Troegg and Trick or Treat: Chocolate Pumpkin Porter by Evil Genius Beer Company highest, in 2019. 2.224
(a) The correlation is 0.088. The positive correlation means the ratings generally increase over time.
(b) The low outlier is Harvest Moon by Blue Moon. (c) If the outlier were to be removed, the correlation would decrease (in fact, it turns from positive to negative). 2.225 (a) There are three variables mentioned: how closed a person’s body language is, level of stress hormone in the body, and how powerful the person felt. Since results are recorded on numerical scales that represent a range for body language and powerful, all three variables are quantitative. (b) People with a more closed posture (low values on the scale) tended to have higher levels of stress hormones, so there appears to be a negative relationship. If the scale for posture had been reversed, the answer would be the opposite. A positive or negative relationship can depend on how the data is recorded. (c) People with a more closed posture (low values on the scale) tended to feel less powerful (low values on that scale), so there appears to be a positive relationship. If both scales were reversed, the answer would not change. If only one of the scales was reversed, the answer would change. 2.226 (a) A positive relationship would imply that a student who is good at one of the tests is also likely to be good at the other – that students are generally either good at both or bad at both. A negative relationship implies that students tend to be good at either math or verbal but not both. (b) A student in the top left is good at verbal and bad at math. A student in the top right is good at both. A student in the bottom left is bad at both, and a student in the bottom right is good at math and bad at verbal. (c) There is not a strong linear relationship as the dots appear to be all over the place. This tells you that the scores students get on the math and verbal SAT exams are not very closely related. (d) Since the linear relationship is not very strong, the correlation is likely to be one of the values closest to zero – either −0.235 or 0.445. Since there is more white space in the top left and bottom right corners than in the other two corners, the weak relationship appears to be a positive one. The correct correlation is 0.445. 2.227 (a) A positive relationship would imply that a student who exercises lots also watches lots of television, and a student who doesn’t exercise also doesn’t watch much TV. A negative relationship implies that students who exercise lots tend to not watch much TV and students who watch lots of TV tend to not exercise much.
(b) A student in the top left exercises lots and watches very little television. A student in the top right spends lots of hours exercising and also spends lots of hours watching television. (Notice that there are no students in this portion of the scatterplot.) A student in the bottom left does not spend much time either exercising or watching television. (Notice that there are lots of students in this corner.) A student in the bottom right watches lots of television and doesn’t exercise very much. (c) The outlier on the right watches a great deal of television and spends very little time exercising. The outlier on the top spends a great deal of time exercising and watches almost no television. (d) There is essentially no linear relationship between the number of hours spent exercising and the number of hours spent watching television.
2.228 (a) Women who are close to age 30 (specifically, ages 29, 30, 31) rate men who are the same age as they are the most attractive. (b) Women in their 20s tend to rate older men as more attractive. (c) Women past age 31 tend to rate younger men as more attractive. (d) There appears to be a strong positive correlation between the variables, so the correct answer is 0.9. (The actual correlation is 0.982.)
2.229 (a) Men who are age 20 rate women who are the same age as they are the most attractive. This is the only age for which this is true. Men of all other ages rate younger women as more attractive. (b) Men of all ages rate women who are in their early 20s as the most attractive. (c) There is very little association between these variables, so the best answer is 0. (The actual correlation is 0.287.)
2.230 (a) A positive relationship implies that as internet speed goes up, time online goes up. This might make sense because being online is more enjoyable with a fast internet speed, so people may spend more time online. (b) A negative relationship implies that as internet speed goes up, time online goes down. This might make sense because if internet speed is fast, people can accomplish what they need to accomplish online in a shorter amount of time so they spend less time online waiting. (c) See the scatterplot below. These two variables have a negative association. There is one clear outlier. The point in the top left corner, corresponding to Brazil, has much lower internet speed and high number of hours online.
(Scatterplot of Hours Online versus Internet Speed.)
(d) If we ignore the outlier for Brazil, there is still a bit of a negative association, but the variables do not appear to have a strong relationship in either direction. (e) For all nine countries, the correlation is r = −0.704. If we leave out Brazil, the correlation for the remaining eight countries changes to r = −0.289. The data point for Brazil appears to have a strong influence on the correlation. (f) No! Even with a negative correlation, these data come from an observational study, so we cannot conclude that there is a causal association. 2.231 (a) We see in the scatterplot that the relationship is positive. This makes sense for irises: petals which are long are generally wider also. (b) There is a relatively strong linear relationship between these variables. (c) The correlation is positive and close to, but not equal to, 1. A reasonable estimate is r ≈ 0.9. (d) There are no obvious outliers. (e) The width of that iris appears to be about 11 mm. (f) There are two obvious clumps in the scatterplot that probably correspond to different species of iris. One type has significantly smaller petals than the other(s). 2.232 There are many ways to draw the scatterplots. One possibility for each is shown in the figure.
2.233 (a) See the figure.
(b) There is a relatively strong positive relationship, which means players who have lots of defensive rebounds also tend to have lots of offensive rebounds. This makes sense since players who are very tall tend to get lots of rebounds at either end of the court. (c) There are two outliers at the right with unusually high numbers of offensive rebounds. We see in the data file that these two players are Andre Drummond who has the most offensive (423) and defensive (809) rebounds and Steve Adams who has the second most offensive rebounds (391) but a more modest number (369) of defensive rebounds. (d) We use technology to see that the correlation is 0.751. This correlation matches the strong positive linear relationship we see in the scatterplot. 2.234 (a) Since we are looking to see if budget affects audience score, we put the explanatory variable (budget) on the horizontal axis and the response variable (audience score) on the vertical axis. See the figure.
(b) The outlier has a budget of 365 million dollars. This movie is Avengers: Age of Ultron and the audience score is 83. (c) The movie with the lowest audience rating (of 10) is Just Getting Started with a budget of 22.0 million dollars.
(d) We use technology to see that the correlation is 0.132. 2.235 Answers will vary.
Section 2.6 Solutions
2.236 (a) The predicted value for the data point is Hgt-hat = 24.3 + 2.74(12) = 57.18 inches. The residual is 60 − 57.18 = 2.82. This child is 2.82 inches taller than the predicted height. (b) The slope 2.74 tells the expected change in Hgt given a one year increase in Age. We expect a child to grow about 2.74 inches per year. (c) The intercept 24.3 tells the Hgt when the Age is 0, or the height (or length) of a newborn. This context does make sense, although the estimate is rather high.
2.237 (a) The predicted value for the data point is BAC-hat = −0.0127 + 0.018(3) = 0.0413. The residual is 0.08 − 0.0413 = 0.0387. This individual's BAC was 0.0387 higher than predicted. (b) The slope of 0.018 tells us the expected change in BAC given a one drink increase in drinks. We expect one drink by this individual to increase BAC by 0.018. (c) The intercept of −0.0127 tells us that the BAC of someone who has consumed no drinks is negative. The context makes sense, but a negative BAC is not possible!
2.238 (a) The predicted value for the data point is Weight-hat = 95 + 11.7(5) = 153.5 lbs. The residual is 150 − 153.5 = −3.5. This individual is capable of bench pressing 3.5 pounds less than predicted. (b) The slope 11.7 tells the expected change in Weight given a one hour a week increase in Training. If an individual trains an hour more each week, the predicted weight the individual is capable of bench pressing would go up 11.7 pounds. (c) The intercept 95 tells the Weight when the hours Training is 0, or the bench press capability of an individual who never lifts weights. This intercept does make sense in context.
2.239 (a) The predicted value for the data point is Grade-hat = 41.0 + 3.8(10) = 79. The residual is 81 − 79 = 2. This student did two points better than predicted. (b) The slope 3.8 tells the expected change in Grade given a one hour increase in Study. We expect the grade to go up by 3.8 for every additional hour spent studying. (c) The intercept 41.0 tells the Grade when Study is 0. The expected grade is 41 if the student does not study at all. This context makes sense.
2.240 The regression equation is Ŷ = 0.395 + 0.349X. 2.241 The regression equation is Ŷ = 47.267 + 1.843X. 2.242 The regression equation is Ŷ = 111.7 − 0.84X. 2.243 The regression equation is Ŷ = 641.62 − 8.42X. 2.244 (a) The case with the largest negative residual appears to have a self-reported score of about 38, a clinical score of about 5, a predicted clinical score on the line of about 35. These values would give a residual of approximately 5 − 35 = −30. (b) For the case with a large positive residual and a self-reported score of 0, the clinical score appears to be approximately 31 and the predicted clinical score appears to be approximately 5. These values give a residual of approximately 31 − 5 = 26.
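The pattern in Exercises 2.236 to 2.239 is the same each time: substitute the x value into the fitted line to get the prediction, then subtract the prediction from the observed value to get the residual. A minimal Python sketch (the function names are ours), using the numbers from Exercise 2.236:

def predict(intercept, slope, x):
    # predicted (fitted) value from the regression line
    return intercept + slope * x

def residual(observed, predicted_value):
    # residual = observed value minus predicted value
    return observed - predicted_value

hgt_hat = predict(24.3, 2.74, 12)   # about 57.18 inches
print(hgt_hat)
print(residual(60, hgt_hat))        # about 2.82 inches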
2.245 (a) The car with the largest positive residual has a QtrMile time of about 15 seconds and Acc030 time of about 2.2 seconds. If we check the Cars2020 data, this turns out to be a Mazda Miata-MX5. (b) The car with the most extreme negative residual has a QtrMile time of about 12.5 seconds and Acc030 time of about 2.0 seconds. If we check the Cars2020 data, this turns out to be a Chevrolet Corvette. 2.246 (a) The plot for all car models shows a stronger association between QtrMile times and Acc030. The fast accelerating (smaller times) sporty cars in the lower left show more scatter away from the line. (b) The slope shown in the plot for sporty cars (2.487) is larger than the slope shown for all cars (2.219). (c) Using technology, the correlation between QtrMile and Acc030 for all the car models is r = 0.947. This is larger than the correlation (0.885) for just the sporty cars. (d) The information from the correlations is more consistent with the stronger association shown in the plot of all car models than the comparison of slopes is. 2.247
(a) Year is the explanatory variable and CO2 concentration is the response variable.
(b) A scatterplot of CO2 vs Year is shown. There is a very strong linear relationship in the data.
(c) We find that r = 0.993. This correlation is very close to 1 and matches the very strong linear relationship we see in the scatterplot. (d) We see that the regression line is CO2 = −2701.2 + 1.5366(Year). (e) The slope is 1.5366. Carbon dioxide concentrations in the atmosphere have been going up at a rate of about 1.5366 ppm each year. (f) The intercept is −2701.2. This is the expected CO2 concentration in the year 0, but clearly doesn’t make any sense since the concentration can’t be negative. The linear trend clearly does not extend back that far and we can’t extrapolate back all the way to the year 0. (g) In 2003, the predicted CO2 concentration is −2701.2 + 1.5366(2003) = 376.6. This seems reasonable since the value lies between the data values for years 2000 and 2005. In 2025, the predicted CO2 concentration is −2701.2 + 1.5366(2025) = 410.4. We have less confidence in this prediction since we can’t be sure the linear trend will continue that far into the future beyond our data. (h) In 2010, the predicted CO2 concentration is −2701.2 + 1.5366(2010) = 387.37. We see in the data that the actual concentration that year is 389.90, so the residual is 389.90 − 387.37 = 2.53. The CO2 concentration in 2010 was above the predicted value.
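The predictions in parts (g) and (h) come straight from the fitted line, so they are easy to verify with software. A short Python check, using only the coefficients and the 2010 observation quoted above:

# Predicted CO2 concentration (ppm) from the fitted line in Exercise 2.247
def co2_hat(year):
    return -2701.2 + 1.5366 * year

print(round(co2_hat(2003), 1))   # about 376.6 (interpolation within the data years)
print(round(co2_hat(2025), 1))   # about 410.4 (extrapolation beyond the data, less reliable)

# Residual for 2010, where the observed concentration was 389.90 ppm
print(round(389.90 - co2_hat(2010), 2))   # about 2.53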
2.248 (a) The explanatory variable is the duration of the waggle dance. We use it to predict the response variable which is the distance to the source. (b) Yes, there is a very strong positive linear trend in the data. (c) We use technology to see that the correlation is r = 0.994. These honeybees are very precise with their timing! (d) We use technology to see that the regression line is Distance = −399 + 1174 · Duration. (e) The slope is 1174 and indicates that the distance to the source goes up by 1174 meters if the dance lasts one more second. (f) If the dance lasts 1 second, we predict that the source is Distance = −399 + 1174(1) = 775 meters away. If the dance lasts 3 seconds, we predict that the source is Distance = −399 + 1174(3) = 3123 meters away. 2.249
(a) The trend is positive, with a fairly linear relationship.
(b) The residual is the vertical distance from the point to the line, which is much larger in 2007 than it is in 2008. We see that in 2014, the point is below the line so the observed value is less than the predicted value. The residual is negative. (c) We use technology to see that the correlation is r = 0.843. (d) We use technology to find the regression line: HotDogs = −2743.6 + 1.3953 · Year. (e) The slope indicates that the winning number is going up by about 1.4 more hot dogs each year. People better keep practicing! (f) The predicted number in 2020 is HotDogs = −2743.6 + 1.3953 · (2020) = 74.9, which would beat the all-time record. (g) It is not appropriate to use this regression line to predict the winning number in 2030 because that is extrapolating too far away from the years that were used to create the dataset.
(a) Fluoride exposure is not being randomly assigned, so this is an observational study.
(b) The explanatory variable is amount of fluoride exposure in pregnant women. The response variable is IQ scores of their children. (c) As fluoride exposure goes up, the IQ score goes down, so this is a negative correlation. (d) The sentence is describing the slope of the regression line. 2.251 (a) This is a negative association since increases in elevation are associated with decreases in cancer incidence. (b) The sentence is telling us the slope of the regression line. (c) The explanatory variable is elevation and the response variable is lung cancer incidence. 2.252 (a) For someone who has played 8 years of football, we have Cognition = 102 − 3.34(8) = 75.28. A person playing 8 years of football is predicted to be at the 75.28 percentile in cognitive ability. For someone who has played 14 years of football, we have Cognition = 102 − 3.34(14) = 55.24. A person playing 14 years of football is predicted to be at the 55.24 percentile in cognitive ability. (b) The slope is −3.34. For every additional year playing football, cognitive percentile is predicted to go down 3.34.
(c) The intercept corresponds to 0 years of playing football, so it is not reasonable to interpret the intercept because we are extrapolating too far away from the original values. Also the intercept is 102, and it is impossible to be above the 100th percentile. 2.253 (a) For this case, the number of years playing football is 18, predicted brain size appears to be about 2700, actual brain size appears to be about 2400, and the residual is 2400 − 2700 = −300. (b) The largest positive residual corresponds to the point farthest above the line. For this point, the number of years playing football appears to be 12, predicted brain size appears to be about 3000, actual brain size appears to be about 3900, so the residual is 3900 − 3000 = 900. (c) The largest negative residual corresponds to the point farthest below the line. For this point, the number of years playing football appears to be 13, predicted brain size appears to be about 2900, actual brain size appears to be about 2200, so the residual is 2200 − 2900 = −700. 2.254 The slope of the regression line indicates that the TV hours decrease by about 0.397 hours for each additional year, so over the decade the total decrease is about 3.97 hours. 2.255 The slope of the regression line indicates that the computer hours increase by about 0.467 hours for each additional year, so over the decade the total increase is about 4.67 hours. 2.256 (a) We are using runs to predict wins, so the explanatory variable is runs and the response variable is wins. (b) The slope of the regression line is 0.1443. This means that we predict that obtaining 1 more run leads to about 0.1443 more wins. (c) The predicted number of wins for Houston is Wins = −31.94 + 0.1443(920) = 100.8 games. The residual is 107 − 100.8 = 6.2, meaning that Houston won 6.2 more games than expected with 920 runs, so they were efficient at winning games. 2.257 The slope is 0.839. The slope tells us the expected change in the response variable (Margin) given a one unit increase in the predictor variable (Approval). In this case, we expect the margin of victory to go up by 0.839 if the approval rating goes up by 1. The y-intercept is −36.76. This intercept tells us the expected value of the response variable (Margin) when the predictor variable (Approval) is zero. In this case, we expect the margin of victory to be −36.76 if the approval rating is 0. In other words, if no one approves of the job the president is doing, the president will lose in a landslide. This is not surprising! 2.258 (a) For a height of 60, the predicted weight is Weight = −170 + 4.82(60) = 119.2. The predicted weight for a person who is 5 feet tall is 119.2 pounds. For a height of 72, the predicted weight is Weight = −170 + 4.82(72) = 177.04. The predicted weight for a person who is 6 feet tall is about 177 pounds. (b) The slope is 4.82. For an additional inch in height, weight is predicted to go up by 4.82 pounds. (c) The intercept is −170 and indicates that a person who is 0 inches tall will weigh −170 pounds. Clearly, it doesn’t make any sense to predict the weight of a person who is 0 inches tall! It also doesn’t make sense to have a negative weight. (d) For a “height” of 20 inches, the line predicts a weight of Weight = −170 + 4.82(20) = −73.6 pounds. This is a ridiculous answer. We cannot use the regression line in this case because the line is based on data for adults (specifically, college students), and we should not extrapolate so far from the data used to create the line.
2.259
(a) The explanatory variable is pre-season wins, the response variable is regular season wins.
(b) For a team that won 2 games in the pre-season, the predicted number of wins is 7.27 + 0.35(2) = 7.97 wins. (c) The slope is 0.35, which implies that for every 1 pre-season win we predict 0.35 more regular season wins. (d) The intercept is 7.27, which implies that we predict a team with 0 pre-season wins will win 7.27 regular season games. This is a reasonable assumption and prediction. (e) The regression line predicts 42.27 wins for a team that wins 100 pre-season games, which is definitely not appropriate because we are extrapolating well away from the possible number of pre-season wins. (There are only four pre-season games.) 2.260
(a) We predict that each year the number of bee colonies decreases by 8.358 thousand.
(b) The y-intercept in the context presented here refers to the predicted number of bee colonies in 0 AD, which is way too far outside the range of values used to create the line to give us anything meaningful. If we adjusted the year, the intercept would indicate the predicted number of bee colonies in 1995, which is reasonable. (c) We see that Colonies = 19,292 − 8.358(2100) = 1,740.2 thousand colonies, but this is not appropriate because it extrapolates so far from the data. 2.261 The man with the largest positive residual weighs about 190 pounds and has a body fat percentage of about 40%. The predicted body fat percent for this man is about 20%, so the residual is about 40 − 20 = 20. 2.262 (a) There is a stronger positive linear trend, and therefore a larger correlation, in the one using abdomen. (b) The actual body fat percent, from the dot, appears to be about 34% while the predicted value on the line appears to be about 40%. (c) This person has an abdomen circumference of approximately 113 cm. The predicted body fat percent for this abdomen circumference appears on the line to be about 32%, which gives a residual of about 40 − 32 = 8. 2.263 (a) For a neck circumference of 35 cm, the predicted body fat percent is BodyFat = −47.9 + 1.75 · (35) = 13.35%. For a neck circumference of 40 cm, the predicted body fat percent is BodyFat = −47.9 + 1.75 · (40) = 22.1%. (b) The slope of 1.75 indicates that as neck circumference goes up by 1 cm, body fat percent goes up by 1.75. (c) The predicted body fat percent for this man is BodyFat = −47.9 + 1.75 · (38.7) = 19.825%, so the residual is 11.3 − 19.825 = −8.525. 2.264 (a) The scatterplot of RottenTomatoes vs AudienceScore with regression line (using RottenTomatoes as the predictor) is shown below. We see a positive association with audience scores tending to be higher when critic scores are higher.
(Scatterplot of AudienceScore versus RottenTomatoes, each on a 0 to 100 scale, with the fitted regression line.)
(b) The fitted prediction equation is AudienceScore = 36.52 + 0.4451 · RottenTomatoes. The slope indicates that for every increase of one point in the rotten tomatoes (critics) rating, we expect the audience rating to increase by about 0.45 points. (c) Substituting a rotten tomatoes value of 78 into the prediction equation gives a predicted audience score for Green Room of AudienceScore = 36.52 + 0.4451 · 78 = 71.24. The actual audience score is 91, so the residual is 91 − 71.24 = 19.76. 2.265
(a) Using technology to fit a regression we get the prediction equation CompRate = 17.25 + 0.00469 · FacSalary.
(b) Substituting FacSalary = 5000 into the prediction equation for part (a) we have CompRate = 17.25 + 0.004694 · 5000 = 40.72. We predict a four-year school with average faculty salary of $5,000 to have a completion rate of 40.72%. (c) The slope of the fitted line (0.004694) is positive, indicating that schools with higher faculty salaries will be predicted to have higher completion rates. While the magnitude of the coefficient might appear to be small, the FacSalary values are quite large. An increase of 0.004694 percent for a $1 increase in faculty salary would mean a 4.694% increase for an additional $1,000 in salary. The scatterplot with the fitted line shown below indicates this can yield a substantial change in completion percentage over the range of faculty salaries.
2.266 (a) Using technology, the fitted prediction equation is HwyMPG = 6.69 + 1.687 · CityMPG. The slope indicates that, for every increase of one mpg in the city rating, we expect the highway rating to go up by about 1.687 mpg. (b) Putting CityMPG = 23 into the prediction equation, we find that the predicted highway mpg for a Toyota Corolla is HwyMPG = 6.69 + 1.687 · 23 = 45.49. The residual (Actual − Predicted) is 40 − 45.49 = −5.49 mpg. 2.267 (a) See the figure. We see that there is a relatively strong positive linear trend. It appears that the opening weekend is a reasonably good predictor of future total world earnings for a movie.
(b) The movie Frozen had a world gross of $1272.45 million, but an opening weekend of only 0.24 million. (c) We find that the correlation is r = 0.907. (d) The regression line is WorldGross = 15.4 + 7.50 · OpeningWeekend. (e) If a movie makes 50 million dollars in its opening weekend, we predict that total world earnings for the year for the movie will be WorldGross = 15.4 + 7.5 · (50) = 390.4 million dollars. 2.268
(a) See the figure. There is a clear linear trend, so it is reasonable to construct a regression line.
(b) Using technology, we see that the regression line is Happiness = −1.09 + 0.103 · LifeExpectancy. (c) The slope of 0.103 indicates that for an additional year of life expectancy, the happiness rating goes up by 0.103. 2.269 Answers will vary.
Section 2.7 Solutions
2.270 (a) Two variables are shown in the scatterplot, and both are quantitative. The range for Variable1 appears to be about 29 − 13 = 16. The range for Variable2 appears to be about 160 − 70 = 90. (b) The association appears to be positive. (c) Variable2 is on the vertical axis and is the response variable. The slope of the line is clearly positive, verifying our answer to part (b). (d) The new variable is categorical, with four different groups, labeled A, B, C, D. (e) The association between Variable1 and Variable2 appears to be negative in all four different groups. (f) The regression line shows a negative slope in all four different groups. (g) The association switches from appearing to be positive to being clearly negative, regardless of which group a case is in. 2.271 (a) The variable Happiness is quantitative, the variable Footprint is quantitative, and the variable Region is categorical. (b) Regions 1 (Latin America) and 2 (Western nations) seem to have the greatest happiness score. Region 2 (Western nations) seems to have the greatest ecological footprint. (c) Region 4 (Sub-Saharan Africa) seems to have the lowest happiness scores. The ecological footprint also tends to be low in that region. (d) Yes, overall, having a greater ecological footprint appears to be associated with having a higher happiness score. (e) No. Considering only those countries in region 2 (Western nations), there does not seem to be a relationship between happiness and footprint. A greater ecological footprint does not appear to be associated with a higher happiness score. (f) It is in the top left, with a relatively large happiness score and a relatively small ecological footprint. (g) For countries in region 4, we should focus on trying to increase the happiness score. For countries in region 2, we should focus on decreasing the ecological footprint. 2.272
(a) Three variables: height, weight, and body fat percentage. All three are quantitative.
(b) There appears to be a positive relationship. As height increases, weight tends to increase. (c) The bubbles tend to be larger on the top half of the scatterplot. This makes sense since heavier people (with weight above 200 pounds) might be more likely to have a larger body fat percentage. (d) The one weighing 125 pounds has the bigger bubble and hence the larger body fat percentage (despite weighing less.) (e) The 125 pound man with a height of 66 inches appears to have a much larger body fat percentage than the one at 67 inches. (f) The person with the largest weight (over 260 pounds) appears to have one of the largest bubbles on the graph, so this person’s body fat percentage is relatively large. (g) The person with the largest height (about 78 inches) has a bubble that is pretty average in size, so that person’s body fat percentage is pretty average. The person who is third largest in height has quite a small bubble, so that person’s body fat percentage is relatively small.
(h) One way to incorporate a fourth variable of gender would be to use different color bubbles for the two genders. 2.273 (a) There are three variables displayed in the figure. Hippocampus size is quantitative and number of years playing football is quantitative, while the group (control, football with concussion, football without concussion) is categorical. (b) The blue dots represent the study participants who are in the control group and have never played football, so the ”years of football” for all these participants is zero. (c) The general trend in the scatterplot shows a negative association between years playing football and hippocampus size. (d) The reddish brown line (for the football with concussion group) is lower, which tells us that football players with a history of concussions appear to have smaller hippocampus sizes. (e) The green line (for football players without a history of concussions) has the steeper slope. 2.274 (a) The CO2 concentration in 1960 appears to be about 317 ppm. In 2015, it appears to be about 400 ppm. (b) It is increasing throughout this period. (c) During this long history, the concentration is primarily oscillating. (d) It looks very much like a vertical line. (e) It appears that, before 1950, the concentration of CO2 had never been above 300 ppm. 2.275 (a) While it varies each year, the general trend for both positions is running the 40 yard dash in less time, so faster. (b) Cornerbacks were slightly faster. (c) No, from year to year it varies which position runs faster on average. 2.276
(a) No, the point differential was never below 0, so the Warriors never trailed in this game.
(b) The Warriors had their largest lead in the 2nd quarter, at right around +30. 2.277
(a) There are 7 purple dots at the 24-minute mark, so they were losing in 7 games at halftime.
(b) There are 2 purple dots at the 0-minute mark, so they only lost two of these games (both times by more than 13 points.) 2.278 (a) There is a negative association between the two variables that generally persists over all time points. (b) The number of babies per Russian woman decreases dramatically from 1941 to 1943 (from approximately 4.5 to just under 2). The age at first marriage does not change substantially. (c) Generally, the number of babies decreases and the age at first marriage increases for Libyan women from 1973 to 2005. 2.279
(a) In 1970 the majority of people living in extreme poverty were from Asia.
(b) In 2018 the majority of people living in extreme poverty were from Africa. (c) In 1970, the distribution has two peaks (it is bimodal), with one peak below $1/day and another between $10/day and $20/day. This clearly distinguishes the “third world” from the “first world”. In 2018, the overall distribution is more symmetric and bell-shaped on the log-scale provided.
2.280
(a) Most Americans are sleeping at 6am, as the majority of dots are in the “Sleeping” category.
(b) At 10am, more Americans are working, as there are more dots in the “Working” category than any other category. (c) Slightly more Americans are eating & drinking at 7pm (approximately 14%) than at 6pm (approximately 11%). (d) Answers will vary; any correct observation is acceptable. 2.281
(a) There are 6 different categories shown, from “<10%” to “≥30%”.
(b) In 1990, it appears that the highest category needed is “10%–14%”. In 2000, the highest category needed is “20%–24%”. In 2010, many states are in the top category “≥30%”. 2.282 (a) The first year the 15–19% category was needed was 1990 (the first year included in the sequence) with one state (Mississippi) in that category. The 20–24% category first appeared in 2000. The 25–29% category first appeared in 2003, the 30–34% in 2006 (again, with only one state: Mississippi), and the 35+ category appeared in 2013 (in two states: Mississippi and West Virginia.) (b) Answers will vary. 2.283 (a) We see in the spaghetti plot that for every state, the percent obese more than doubled during this time period. (b) There is more variability in 2018. (c) In 1990, the state with the largest percent obese is Mississippi, with 15.0% obese. In 2018, the state with the smallest percent obese is Colorado, with 23.0% obese. 2.284 Answers will vary from person to person. 2.285
(a) We see that “bro” is most commonly used in Texas and surrounding states.
(b) We see that “buddy” is most commonly used in the midwest. 2.286 (a) In 1880, Rhode Island spent the most ($82.72) per person on cotton. Texas spent the least (only $0.01) per person. (b) Cotton was most expensive (high price of $1.90 per pound) in 1864. 2.287
(a) The eastern half of the country is much more heavily populated than the western half.
(b) Answers will vary. 2.288 The top baby girl name in 2014 was Sophia, and the top name in 1880 was Mary. 2.289 (a) We can see that there are two distinct clusters of dots, with one cluster having smaller petals (in length and width) than the other. (b) Setosa has the smallest petals, while Virginica has the largest petals. 2.290 (a) In a scatterplot showing an association between petal length and petal width, there is a pretty clear distinction between the green dots and the red dots. In a scatterplot showing sepal width and sepal length, however, the green and red dots are mostly mixed together. Thus, the petal length and width show a clearer distinction.
(b) Looking only at the black dots (for Setosa), we see that the association between sepal width and petal length appears to be neither positive nor negative. (The dots appear either mostly horizontal or mostly vertical depending on which variable you put on which axis. In either case, the association does not look positive or negative.) 2.291 Sepal length is on the vertical axis, and the green dots are highest up. 2.292
(a) In 1950 there were a lot of babies and toddlers (kids 0–4) and very few old people.
(b) In 1950 the distribution of age is right-skewed. (c) In 2060 there are many more old people and many fewer babies than there were in 1950. (d) In 2060, it is projected that there will be more females than males in the 85+ range (females live longer, on average). 2.293 (a) Whites are decreasing the most in terms of percentage of the US population, from 85% in 1960 to (projected) 43% in 2060. (b) Hispanics are increasing the most in terms of percentage of the US population, from 4% in 1960 to (projected) 31% in 2060. (c) Blacks are staying the most constant in terms of percentage of the US population, from 10% in 1960 to (projected) 13% in 2060. 2.294 Direction and speed of the wind at each station are needed to make this map. 2.295 (a) The commutes on the carbon bike tend to be longer in distance than the commutes on the steel bike. (b) The carbon bike is slightly faster, on average, but also tends to cover more distance on the commute, making the overall trip longer in minutes. Perhaps the devices measuring distance, time, or speed give slightly different readings between the two bikes. (c) Distance is associated with both type of bike and commute time in minutes, confounding the relationship between the two. Distance is a confounding variable. (d) To minimize commute time, Dr. Groves should ride the carbon bike (which has a slightly faster average speed), but take the shorter distance commutes that are typically taken with the steel bike in this dataset. He might also want to check the accuracy of the recording devices used with each bike. 2.296 (a) The median Democrat value became more liberal from 1994 to 1998, then stayed relatively constant until 2011, and from 2011 to 2014 it became much more liberal. (b) The median Republican value became more liberal from 1994 to 2003, and then shifted to becoming more conservative from 2004 to 2014. (c) There is much more political polarization in 2014 than in 1994. (d) The two parties started moving rapidly away from each other in 2011 (although the Republican party started moving away from the Democratic party in 2004). (e) In 2014, the politically active people are much more polarized than the general public. (f) Answers will vary. 2.297
(a) In 2014, New England had the highest support for same-sex marriage.
(b) In 2014, South Central had the lowest support for same-sex marriage.
(c) The Mountain states displayed the smallest increase (about 9%), and the Midwest displayed the largest (about 24%, although New England and Pacific are close at 23%). (d) This data could also have been visualized with a spaghetti plot, in which each time series would be shown on the same plot, or with a dynamic heat map, in which a map of the US could have been shown color-coded according to the level of support, with the colors changing over time. 2.298
(a) People who bought organic food were richer than those who didn’t.
(b) Richer people are more likely to report very good or excellent health. (c) Income is a confounding variable because it is associated with both whether or not someone bought organic and with their self-reported health status. (d) No, income is a confounding variable. We cannot determine whether the observed difference in reported health is due to buying organic food, or just due to the fact that people who buy organic are richer and richer people are healthier. 2.299 (a) Yes, after breaking it down by income, it is still generally true that people who bought organic are more likely to report very good or excellent health, because the red points are above the blue points in all except one category. (b) No. This is still an observational study, and there are many other potential confounding variables. 2.300 (a) For all stones, percutaneous nephrolithotomy is more successful (289/350 = 83% success for percutaneous nephrolithotomy, as opposed to 273/350 = 78% success for open surgery). (b) For small stones, open surgery is more successful (81/87 = 93% success for open surgery, as opposed to 234/270 = 87% for percutaneous nephrolithotomy). (c) For large stones, open surgery is more successful (192/263 = 73% success for open surgery, as opposed to 55/80 = 69% for percutaneous nephrolithotomy). (d) Smaller stones result in higher success rates for both treatments. This is most easily seen visually (more light gray for small stones), but can also be calculated numerically: small stone treatment has success rates of (81 + 234)/(87 + 270) = 315/357 = 88% when combining both treatments, while large stone treatment has lower success rates of (192 + 55)/(263 + 80) = 247/343 = 72% when combining both treatments. (e) Percutaneous nephrolithotomy is much more commonly used for small stones. On the small stones plot, ignoring success, the bar for percutaneous nephrolithotomy is much taller than the bar for open surgery. (f) Open surgery is much more commonly used for large stones. On the large stones plot, ignoring success, the bar for open surgery is much taller than the bar for percutaneous nephrolithotomy. (g) Open surgery is more commonly used for large stones, which are harder to treat (have a lower success rate in general than small stones). Therefore, even though open surgery is more successful for each stone size, it appears to be worse overall when the stone sizes are combined. (h) This was probably not a randomized experiment. If it had been a randomized experiment, then the numbers of small and large stones should have been approximately the same between the two treatments. 2.301 Answers will vary. 2.302 According to this graphic, greenhouse gases are warming the world.
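The reversal in Exercise 2.300 (each treatment looks better within each stone size, yet worse overall) can be verified directly from the counts quoted in that solution. A minimal Python sketch, with the dictionary names chosen only for readability:

# Kidney stone counts from Exercise 2.300, stored as (successes, total)
open_surgery = {"small": (81, 87), "large": (192, 263)}
pcnl = {"small": (234, 270), "large": (55, 80)}  # percutaneous nephrolithotomy

def rate(successes, total):
    return successes / total

# Within each stone size, open surgery has the higher success rate
for size in ("small", "large"):
    print(size, round(rate(*open_surgery[size]), 2), round(rate(*pcnl[size]), 2))

# Combined over both sizes, the comparison reverses (Simpson's paradox)
print(round(rate(81 + 192, 87 + 263), 2))   # open surgery overall, about 0.78
print(round(rate(234 + 55, 270 + 80), 2))   # percutaneous overall, about 0.83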
2.303 There is a strong tendency for blacks to live near other blacks in one region of St. Louis (near the middle of the city), and for whites to live near other whites. 2.304 (a) Mornings tend to have more cloud cover than evenings. This may be most visible in the circle plot where the lines extend further from the center (no clouds) in the morning hours. (b) Summer shows the most variability in cloud cover throughout the day. This may be most visible in the spaghetti plot which shows more variability in cloud cover during days in July and August. (c) Answers will vary. 2.305
(a) Early morning (6 am to 7 am) has the highest percent cloud cover in August.
(b) Summer (around August) tends to be the least windy season for Chicago. 2.306 Solar wind is likely responsible for particles leaving the atmosphere of Mars. 2.307
(a) This video is conveying the endangered status and variety of gazelle species.
(b) Answers will vary. 2.308
(a) Student gatherings at a university based on smart phone tracking.
(b) Answers will vary. 2.309 Answers will vary. 2.310 (a) Using technology and the data in Collegescores4yr we find the mean faculty salary is $7091 at private schools and $8520 at public schools, so public schools have higher average salary. (b) The mean completion rate is higher at private schools (55.7%) than at public schools (50.2%). (c) From parts (a) and (b) it looks like schools with higher faculty salaries (public) tend to have lower completion rates. This would suggest a negative association between the two variables. (d) The blue circles appear more at lower faculty salaries (to the left), but have higher completion rates (towards the top). This suggests the blue points and line are for private colleges. Note also that the mean faculty salary should predict the mean completion rate in each group, so (7091, 55.7) should be on the line for private schools and (8520, 50.2) should be on the line for public schools. (e) The regression lines in the scatterplot make it clear that we have a positive association between faculty salaries and completion rates for both private and public schools. This is not consistent with the answer based only on comparing the means for each group. 2.311 Answer will vary. 2.312 Answer will vary. 2.313 Answer will vary. 2.314 Answers will vary. 2.315 Although we could use separate time series plots, the figure below puts all three series on the same scale (as a spaghetti plot) to ease comparisons.
(a) The most obvious difference between Moscow and San Francisco is that the variability is much higher in Moscow temperatures. The low temperatures are around the same time of the year (January), but are much lower in Moscow. The warmest months have more similar temperatures. (b) The variability is similar between Melbourne and San Francisco, but the seasonal patterns are reversed. Since Melbourne is in the Southern Hemisphere, its warmest months are in January and February, when San Francisco, in the Northern Hemisphere, tends to be the coldest. Melbourne’s chilliest period in July is around when San Francisco is enjoying warmer temperatures. 2.316 Answers will vary. 2.317 Answers will vary. 2.318 Answers will vary.
Section 3.1 Solutions 3.1 This mean is a population parameter; notation is μ. 3.2 This correlation is a population parameter; notation is ρ. 3.3 This proportion is a sample statistic; notation is p̂. 3.4 This proportion is a population parameter; notation is p. 3.5 This mean is a sample statistic; notation is x. 3.6 This is a population parameter for a proportion, so the correct notation is p. We have p = 170,000/ 78,000,000 = 0.00217. 3.7 This is a population parameter for a mean, so the correct notation is μ. We have μ = 59,388/148 = 401.3 students as the average enrollment per charter school. 3.8 This is a sample statistic for a proportion, so the correct notation is p̂. We have p̂ = 0.98. 3.9 This is a sample statistic from a sample of size n = 200 for a correlation, so the correct notation is r. We have r = 0.037. 3.10 This is a sample statistic for a mean, so the correct notation is x. We have x = 13.10 phone calls a day. 3.11 Since the data are for all regular players on the team, we need the population parameter for a correlation, so the correct notation is ρ. We use technology to see that ρ = 0.139. 3.12 We expect the sampling distribution to be centered at the value of the population proportion, so we estimate that the population parameter is p = 0.30. The standard error is the standard deviation of the distribution of sample proportions. The middle of 95% of the distribution goes from about 0.16 to 0.44, about 0.14 on either side of p = 0.30. By the 95% rule, we estimate that SE ≈ 0.14/2 = 0.07. (Answers may vary slightly.) 3.13 We expect the sampling distribution to be centered at the value of the population mean, so we estimate that the population parameter is μ = 85. The standard error is the standard deviation of the distribution of sample means. The middle of 95% of the distribution goes from about 45 to 125, about 40 on either side of μ = 85. By the 95% rule, we estimate that SE ≈ 40/2 = 20. (Answers may vary slightly.) 3.14 We expect the sampling distribution to be centered at the value of the population mean, so we estimate that the population parameter is μ = 300. The standard error is the standard deviation of the distribution of sample means. The middle of 95% of the distribution goes from about 290 to 310, about 10 on either side of μ = 300. By the 95% rule, we estimate that SE ≈ 10/2 = 5. (Answers may vary slightly.) 3.15 We expect the sampling distribution to be centered at the value of the population proportion, so we estimate that the population parameter is p = 0.80. The standard error is the standard deviation of the distribution of sample proportions. The middle of 95% of the distribution goes from about 0.74 to 0.86, about 0.06 on either side of p = 0.80. By the 95% rule, we estimate that SE ≈ 0.06/2 = 0.03. (Answers may vary slightly.)
3.16
(a) We see in the sampling distribution that a sample proportion of p̂ = 0.1 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
(b) We see in the sampling distribution that a sample proportion of p̂ = 0.35 is not at all unusual with samples of this size, so this value is (i): reasonably likely to occur. (c) We see in the sampling distribution that there are no sample proportions even close to p̂ = 0.6 so this sample proportion is (iii): extremely unlikely to ever occur using samples of this size. 3.17
(a) We see in the sampling distribution that a sample mean of x = 70 is not unusual for samples of this size, so this value is (i): reasonably likely to occur.
(b) We see in the sampling distribution that a sample mean of x = 100 is not unusual for samples of this size, so this value is (i): reasonably likely to occur. (c) We see in the sampling distribution that a sample mean of x = 140 is rare for a sample of this size but similar sample means occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally. 3.18
(a) We see in the sampling distribution that there are no sample means even close to x = 250 so this sample mean is (iii): extremely unlikely to ever occur using samples of this size.
(b) We see in the sampling distribution that a sample mean of x = 305 is not unusual for samples of this size, so this value is (i): reasonably likely to occur. (c) We see in the sampling distribution that a sample mean of x = 315 is rare for a sample of this size but similar sample means occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally. 3.19
(a) We see in the sampling distribution that a sample proportion of p̂ = 0.72 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
(b) We see in the sampling distribution that a sample proportion of p̂ = 0.88 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally. (c) We see in the sampling distribution that there are no sample proportions even close to p̂ = 0.95 so this sample proportion is (iii): extremely unlikely to ever occur using samples of this size. 3.20
(a) The parameter of interest is a population proportion, so the notation is p. We have p = the proportion of all smartphone users in the US who have downloaded an app.
(b) The quantity that gives the best estimate is the sample proportion p̂ and the value is p̂ = 355/461 = 0.770. (c) We would have to ask all smartphone users in the US whether or not they had ever downloaded an app for their phone. This would be very hard to do! 3.21
(a) The parameter of interest is a population mean, so the notation is μ. We have μ = the mean number of apps downloaded by all smartphone users in the US who have downloaded at least one app.
(b) The quantity that gives the best estimate is the sample mean, which is denoted x. Its value is x = 19.7. (c) We would have to ask all smartphone users in the US how many apps they have downloaded.
3.22 The quantity we are estimating is a population proportion, and the notation is p. The quantity we are using to make the estimate is our sample proportion p̂, and the value of p̂ is 0.66. We define the parameter p as the proportion of all global consumers who are willing to pay more for products and services from companies who are committed to positive social and environmental impact. 3.23
(a) We are estimating ρ, the correlation between pH and mercury levels of fish for all the lakes in Florida. The quantity that gives the best estimate is our sample correlation r = −0.575. We estimate that the correlation between pH levels and levels of mercury in fish in all Florida lakes is −0.575.
(b) We use an estimate because it would be very difficult and costly to find the exact population correlation. We would need to measure the pH level and the mercury in fish level for all the lakes in Florida, and there are over 7700 of them. 3.24
(a) The value 30 is a population parameter and the notation is μ = 30. The value 27.90 is a sample statistic and the notation is x = 27.90.
(b) The distribution will be bell-shaped and the center will be at the population mean of 30. The sample mean 27.90 would represent one point on the dotplot. (c) The dotplot will have 1000 dots and each dot will represent the mean for a sample of 75 co-payments. 3.25
(a) The sample statistic is p̂ = 38/500 = 0.076. We see in the sampling distribution that this sample value is far lower than any of the values shown, so this sample statistic is very unlikely to occur just by random variation.
(b) The sample statistic is p̂ = 64/500 = 0.128. We see in the sampling distribution that this sample value occurs frequently and is likely to occur just by random variation. (c) The sample statistic is p̂ = 76/500 = 0.152. We see in the sampling distribution that this sample value occurs frequently and is likely to occur just by random variation. (d) The sample statistic is p̂ = 105/500 = 0.210. We see in the sampling distribution that this sample value is far higher than any of the values shown, so this sample statistic is very unlikely to occur just by random variation. 3.26
(a) The Canadian census includes all Canadian adults, so this is a proportion for a population and the correct notation is p. We know that 81.7% is the same as 0.817 so we have p = 0.817.
(b) The notation for a sample proportion is p̂. Answers will vary but will be somewhat close to the population proportion of 0.817. (c) Answers will vary. (d) The distribution will be centered at approximately 0.817. The standard error is approximately 0.017. 3.27
(a) The two distributions centered at the population average are probably unbiased, distributions A and D. The two distributions not centered at the population average (μ = 2.61) are biased, dotplots B and C. The sampling for Distribution B gives an average too high and has large households overrepresented. The sampling for Distribution C gives an average too low and may have been done in an area with many people living alone.
(b) The larger the sample size the lower the variability, so distribution A goes with samples of size 100, and distribution D goes with samples of size 500. 3.28
(a) As the sample size goes up, the accuracy improves, which means the spread goes down. We see that distribution A goes with sample size n = 20, distribution B goes with n = 100, and distribution C goes with n = 500.
(b) We see in dotplot A that quite a few of the sample proportions (when n = 20) are less than 0.25 or greater than 0.45, so being off by more than 0.10 would not be too surprising. While it is possible to be that far away in dotplot B (when n = 100), such points are much more rare, so it would be somewhat surprising for a sample of size n = 100 to miss by that much. None of the points in dotplot C are more than 0.10 away from p = 0.35, so it would be extremely unlikely to be that far off when n = 500. (c) Many of the points in dotplot A fall outside of the interval from 0.30 to 0.40, so it is not at all surprising for a sample proportion based on n = 20 to be more than 0.05 from the population proportion. Even dotplot B has quite a few values below 0.30 or above 0.40, so being off by more than 0.05 when n = 100 is not too surprising. Such points are rare, but not impossible in dotplot C, so a sample of size n = 500 might possibly give an estimate that is off by more than 0.05, but it would be pretty surprising. (d) As the sample size goes up, the accuracy of the estimate tends to increase. 3.29 The quantity we are trying to estimate is μm − μo where μm represents the average grade for all fourth-grade students who study mixed problems and μo represents the average grade for all fourth-grade students who study problems one type at a time. The quantity that gives the best estimate is xm − xo , where xm represents the average grade for the fourth-grade students in the sample who studied mixed problems and xo represents the average grade for the fourth-grade students in the sample who studied problems one type at a time. The best estimate for the difference in the average grade based on study method is xm − xo = 77 − 38 = 39. 3.30 The quantity we are trying to estimate is pa − pt where pa represents the proportion of adult cell phone users who text message and pt represents the proportion of teen cell phone users who text message. The quantity that gives the best estimate is p̂a − p̂t , where p̂a represents the proportion of the adult cell phone users in the sample of 2,252 who text message and p̂t represents the proportion of teen cell phone users in the sample of 800 who text message. The best estimate for the difference in the proportion who text is p̂a − p̂t = 0.72 − 0.87 = −0.15. 3.31
(a) We expect means of samples of size 30 to be much less spread out than values of budgets of individual movies. This leads us to conclude that Boxplot A represents the sampling distribution and Boxplot B represents the values in a single sample. We can also consider the shapes. Boxplot A appears to be more symmetric and Boxplot B appears to be right skewed. Since we expect a sampling distribution to be roughly symmetric and bell-shaped, Boxplot A is the sampling distribution and the skewed Boxplot B shows values in a single sample.
(b) Boxplot B shows the data from one sample of size 30. Each data value represents the budget, in millions of dollars, for one Hollywood movie made between 2012 and 2018. There are 30 values included in the sample. The budgets range from about 1 million to around 250 million for this sample. We see in the boxplot that the median is around 30 million dollars. Since the data are right skewed, we expect the mean to be higher. We estimate the mean to be between 50 million and 70 million. This is the mean of a sample, so we have x ≈ 50 million dollars. (Answers may vary.) (c) Boxplot A shows the data from a sampling distribution using samples of size 30. Each data value represents the mean of one of these samples. There are 1000 means included in the distribution. They range from about 20 to 95 million dollars. The center of the distribution is a good estimate of the population parameter, and the center appears to be about μ ≈ 52 million dollars, where μ represents the mean budget, in millions of dollars, for all movies coming out of Hollywood between 2012 and 2018. (Answers may vary.) 3.32
(a) Both distributions are centered at the population parameter of 0.13.
(b) The proportions for samples of size n = 100 go from about 0.05 to 0.23. The proportions for samples of size n = 1000 go from about 0.10 to 0.16. (c) The standard error for samples of size n = 100 is about 0.03 (since it appears that about 95% of the data are between 0.07 and 0.19, which is 0.06 from the mean on either side). The standard error for samples of size n = 1000 is about 0.01 (since it appears that about 95% of the data are between 0.11 and 0.15). (d) A sample proportion of 0.17 is not unusual from a sample of 100, but extremely unlikely with a sample size of 1,000. 3.33
(a) Answers will vary. The following table gives one possible set of randomly selected Enrollment values. The mean for this sample is x = 6171.
College  Enrollment
Iowa Lakes Community College  1134
Passaic County Community College  6617
SUNY Morrisville  2815
Southside Regional Medical Center Professional Schools  110
Stark State College  7778
Bryant & Stratton College-Southtowns  390
University of Rio Grande  935
Cincinnati State Technical and Community College  6934
College of Southern Nevada  28959
Massasoit Community College  6035
(b) Answers will vary. The following table gives another possible set of randomly selected Enrollment values. The sample mean for this sample is x = 5652.
College  Enrollment
Blackfeet Community College  372
Tulsa Community College  14413
North Country Community College  995
North Shore Community College  5514
Johnson County Community College  11418
Rabbinical College of Long Island  143
Passaic County Community College  6617
St Luke’s College  274
Pennsylvania State University-Penn State Mont Alto  759
Community College of Philadelphia  16012
(c) The population mean enrollment for all 1141 two-year colleges is μ = 4050. Most sample means found in parts (a) and (b) will be somewhat close to this, but may vary since the sample size is small. Our samples in (a) and (b) were both on the high side. (d) The distribution should be centered around 4050. 3.34
(a) Here’s one possible sample of size 5 of yearly NFL salaries (in millions of dollars). The sample mean for this sample is x = 3.684.
Taysom Hill  0.557
Kareem Jackson  11.000
B.J. Hill  1.012
T.J. Hockenson  4.955
Steven Means  0.895
(b) Here’s another possible sample of size 5 of yearly NFL salaries. The sample mean for this sample is x = 1.936.
Jahlani Tavai  1.723
Makinton Dorleant  0.570
Quenton Nelson  5.972
Jordan Akins  0.831
AJ Cole  0.586
(c) The notation for the population mean is μ and we have μ = $3.033 million. For our samples above, the mean for (a) is a bit high and the mean for (b) is quite a bit smaller. 3.35
(a) Answers will vary, but a typical distribution is shown below. The sampling distribution is somewhat right skewed (but not nearly as skewed as the original population). This set of 5000 sample means is centered at 4048 which is close to the population mean μ = 4050. The standard error for this set of simulated means is about 1909.
(b) A typical distribution is shown below. The sampling distribution for n = 60 is more symmetric than the distribution for n = 10 in part (a). It is still centered near the population mean μ = 4050. The standard error for this set of simulated means is smaller than in part (a), about 759.
3.36
(a) The smallest value in the population is $0.488 million; the largest value is $35.0 million.
(b) Here is one sampling distribution with means for 2000 samples of size n = 5. The smallest mean in this distribution is $0.560 million and the largest is $11.693 million.
(c) The standard error for the distribution above is $2.03 million (answers will vary). (d) Here is one sampling distribution with means for 2000 samples of size n = 50. The standard error of the means from this distribution is $0.631 million.
3.37
(a) The mean and standard deviation from the population of all Hollywood movies made between 2012 and 2018 are parameters so the correct notation is μ and σ respectively. Based on the Budget variable in HollywoodMovies, we find μ = 51.38 million dollars and σ = 57.93 million dollars. Note that several movies in the dataset are missing values for the Budget variable, so we will consider the population to be all movies that have Budget values.
(b) Using technology we produce a sampling distribution (shown below) with means for 5000 samples of size n = 20 taken from the movie budgets in the full dataset. We see that the distribution is symmetric, bell-shaped, and centered at the population mean of $51.5 million. The standard deviation of these 5000 sample means is 12.9, so we estimate the standard error for Hollywood movie budgets based on samples of 20 movies to be SE ≈ $13 million.
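A sampling distribution like this one can be simulated in almost any software. The Python sketch below shows one possible approach; because the HollywoodMovies budgets are not reproduced here, the population is stood in for by a made-up list of values, so the printed numbers will not match the ones above.

import numpy as np

rng = np.random.default_rng()

# Placeholder population of "budgets" (millions of dollars); in practice this
# would be the Budget value for every movie in the population.
budgets = rng.exponential(scale=50, size=900)

# Means for 5000 samples of size n = 20, drawn without replacement
sample_means = [rng.choice(budgets, size=20, replace=False).mean()
                for _ in range(5000)]

print(round(np.mean(sample_means), 1))   # close to the mean of the population used
print(round(np.std(sample_means), 1))    # the estimated standard error for n = 20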
3.38
(a) The sampling distribution is symmetric and bell-shaped and is centered approximately at the population proportion of 0.275. We see that the standard error is about 0.063.
(b) Again, the sampling distribution is symmetric and bell-shaped and is centered approximately at the population proportion of 0.275. We see that the standard error for this sample size is about 0.020. 3.39
(a) This is a population proportion so the correct notation is p. We have p = 52/329 = 0.158.
(b) We expect it to be symmetric and bell-shaped and centered at the population proportion of 0.158. 3.40
(a) This is a population proportion so the correct notation is p. We have p = 230/329 = 0.699.
(b) We expect it to be symmetric and bell-shaped and centered at the population proportion of 0.699. 3.41
(a) The standard error is the standard deviation of the sampling distribution (given in the upper right corner of the sampling distribution box of StatKey) and is likely to be about 0.115. Answers will vary, but the sample proportions should go from 0 to about 0.6 (as in the dotplot below). In that case, the farthest sample proportion from p = 0.158 is p̂ ≈ 0.6, and it is 0.6 − 0.158 = 0.442 off from the correct population value. In other simulations the maximum proportion might be as high as 0.7.
(b) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.080. Answers will vary, but the sample proportions should go from 0 to about 0.45 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.158 is p̂ ≈ 0.45, and it is 0.45 − 0.158 = 0.292 off from the correct population value. Some simulations might produce even larger discrepancies. (c) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.052. Answers for the most extreme proportion will vary. For example, the sample proportions in the dotplot below go from 0.02 to 0.34. In that case, the farthest sample proportion from p = 0.158 is off by is 0.34 − 0.158 = 0.182 off from the correct population value. Some simulations might have even larger discrepancies. (d) Accuracy improves as the sample size increases. The standard error gets smaller, the range of values gets smaller, and values tend to be closer to the population value of p = 0.158.
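The simulated proportions in this exercise can be generated with a binomial random number generator. One possible Python sketch follows; the sample sizes used here are illustrative choices, not necessarily the ones specified in the exercise.

import numpy as np

rng = np.random.default_rng()
p = 0.158   # population proportion from Exercise 3.39

for n in (10, 25, 50):   # illustrative sample sizes
    p_hats = rng.binomial(n, p, size=5000) / n
    print(n, round(p_hats.std(), 3), round(p_hats.max(), 3))
# As n increases, the standard deviation of the p-hats (the standard error)
# shrinks and the most extreme sample proportion moves closer to p.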
3.42
(a) The standard error is the standard deviation of the sampling distribution (given in the upper right corner of the sampling distribution box in StatKey) and is likely to be about 0.14. Answers will vary, but the sample proportions should go from about 0.2 to about 1.0 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.699 is p̂ = 0.2, and it is 0.699 − 0.2 = 0.499 off from the correct population value.
(b) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.10. Answers will vary, but the sample proportions should go from about 0.35 to about 1.0 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.699 is p̂ = 0.35, and it is 0.699 − 0.35 = 0.349 off from the correct population value.
(c) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.06. Answers will vary, but the sample proportions should go from about 0.50 to about 0.88 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.699 is p̂ = 0.50, and it is 0.699 − 0.50 = 0.199 off from the correct population value. (d) Accuracy improves as the sample size increases. The standard error gets smaller, the range of values gets smaller, and values tend to be closer to the population value of p = 0.699.
Section 3.2 Solutions 3.43
(a) We are estimating a population proportion, so the notation is p.
(b) The best estimate of the population proportion is the sample proportion p̂. 3.44
(a) We are estimating a population mean, so the notation is μ.
(b) The best estimate of the population mean is the sample mean x. 3.45
(a) We are estimating a difference in means, so the notation is μ1 − μ2 .
(b) The best estimate of the difference in population means is the difference in sample means x1 − x2 . 3.46
(a) We are estimating a difference in population proportions, so the notation is p1 − p2 .
(b) The best estimate of the difference in population proportions is the difference in sample proportion p̂1 − p̂2 . 3.47 Using ME to represent the margin of error, an interval estimate for μ is x ± M E = 25 ± 3 so an interval estimate of plausible values for the population mean μ is 22 to 28. 3.48 Using ME to represent the margin of error, an interval estimate for p is p̂ ± M E = 0.37 ± 0.02 so an interval estimate of plausible values for the population proportion p is 0.35 to 0.39. 3.49 Using ME to represent the margin of error, an interval estimate for ρ is r ± M E = 0.62 ± 0.05 so an interval estimate of plausible values for the population correlation ρ is 0.57 to 0.67. 3.50 Using ME to represent the margin of error, an interval estimate for μ1 − μ2 is x1 − x2 ± M E = 5 ± 8 so an interval estimate of plausible values for the difference in population means is −3 to 13. 3.51
(a) Yes, plausible values of μ are values in the interval.
(b) Yes, plausible values of μ are values in the interval. (c) No. Since 105.3 is not in the interval estimate, it is a possible value of μ but is not a very plausible one. 3.52
(a) No. Since 0.85 is not in the interval estimate, it is a possible value of p but is not a very plausible one.
(b) Yes, plausible values of p are values in the interval. (c) No. Since 0.07 is so far out of the interval estimate, it is an extremely unlikely value of the population parameter p. 3.53 The 95% confidence interval estimate is p̂ ± 2 · SE = 0.32 ± 2(0.04) = 0.32 ± 0.08, so the interval is 0.24 to 0.40. We are 95% confident that the true value of the population proportion p is between 0.24 and 0.40. 3.54 The 95% confidence interval estimate is x ± 2 · SE = 55 ± 2(1.5) = 55 ± 3, so the interval is 52 to 58. We are 95% confident that the true value of the population mean μ is between 52 and 58. 3.55 The 95% confidence interval estimate is r ± 2 · SE = 0.34 ± 2(0.02) = 0.34 ± 0.04, so the interval is 0.30 to 0.38. We are 95% confident that the true value of the population correlation ρ is between 0.30 and 0.38.
3.56 The interval estimate is r ± margin of error = −0.46 ± 0.05, so the interval is −0.51 to −0.41. We are 95% confident that the true value of the population correlation ρ is between −0.51 and −0.41. 3.57 The 95% confidence interval estimate is (x1 − x2 ) ± margin of error = 3.0 ± 1.2, so the interval is 1.8 to 4.2. We are 95% confident that the true difference in the population means μ1 − μ2 is between 1.8 and 4.2 (which means we believe that the mean of population 1 is between 1.8 and 4.2 units larger than the mean of population 2). 3.58 The interval estimate is (p̂1 − p̂2 ) ± margin of error = 0.08 ± 0.03, so the interval is 0.05 to 0.11. We are 95% confident that the true difference in population proportions p1 − p2 is between 0.05 and 0.11 (which means we believe that the proportion for population 1 is between 0.05 and 0.11 larger than the proportion for population 2). 3.59
(a) The survey was done from a sample of first-year full-time college students, so it is from a sample.
(b) Because the information is from a sample, it is a statistic. Because it is a proportion, the correct notation is p̂. It is helpful to write the value as a proportion instead of a percentage, so we have p̂ = 0.356. (c) Since we are estimating a proportion, the correct notation is p. We define it as p = the proportion of all first-year full-time college students in the US who decide to change their major by the end of the first year. (d) The sample statistic is p̂ = 0.356 and the standard error is SE = 0.007. A 95% confidence interval is given by Statistic ± 2 · SE = 0.356 ± 2(0.007) = 0.356 ± 0.014, so the interval is 0.342 to 0.370.
We are 95% confident that the proportion of first-year full-time college students in the US who decide to change their major by the end of the first year is between 0.342 and 0.370. 3.60
(a) The 69 percent is a statistic since it comes from a sample. It is a sample proportion, so the notation is p̂. We have p̂ = 0.69.
(b) We are estimating the population proportion p. We define it as p = the proportion of all first year college students who feel homesick. (c) The statistic is p̂ = 0.69 and the margin of error is 0.02. The confidence interval is Statistic ± margin of error = 0.69 ± 0.02, so the interval is 0.67 to 0.71.
We are 95% confident that the proportion of all first year college students who feel homesick is between 0.67 and 0.71. 3.61
(a) We are estimating a difference in means, so we are estimating μA − μT , where μA represents the mean fear response for adults and μT represents the mean fear response for teenagers.
(b) The best estimate is given by the difference in sample means xA − xT . Its value is xA − xT = 0.225 − 0.059 = 0.166.
(c) The 95% confidence interval is given by Statistic ± 2 · SE = (xA − xT ) ± 2 · SE = 0.166 ± 2(0.091) = 0.166 ± 0.182, so the interval is −0.016 to 0.348.
The 95% confidence interval for the difference in mean fear response is −0.016 to 0.348. (d) This is an observational study since the explanatory variable (age) was not manipulated. 3.62
(a) We are estimating a proportion, so the notation is p. We define p = the proportion of college students who find it unpleasant to sit alone with their thoughts.
(b) The best estimate is given by the sample proportion p̂. We have p̂ = 76/146 = 0.521. (c) The 95% confidence interval is given by Statistic ± margin of error = p̂ ± margin of error = 0.521 ± 0.08, so the interval is 0.441 to 0.601.
The 95% confidence interval for the proportion of college students who would find it unpleasant to sit alone with their thoughts is 0.441 to 0.601. 3.63
(a) We are estimating a difference in proportions, so the notation is pm − pf , where we define pm = the proportion of male college students who choose pain over solitude and pf = the proportion of female college students who choose pain over solitude.
(b) The best estimate is given by the difference in sample proportions p̂m − p̂f . We have p̂m − p̂f = 12/18 − 6/24 = 0.667 − 0.25 = 0.417. (c) The 95% confidence interval is given by Statistic ± 2 · SE = (p̂m − p̂f ) ± 2 · SE = 0.417 ± 2(0.154) = 0.417 ± 0.308, so the interval is 0.109 to 0.725.
The 95% confidence interval for the difference in proportion of college students who would choose pain over solitude, between males and females, is 0.109 to 0.725. (d) “No difference” corresponds to a difference of zero. Since zero is not in the confidence interval, this is not a plausible value. Since all plausible values are positive, males appear to be more likely than females to choose pain over solitude. 3.64
(a) We are estimating a mean, so the notation is μ. We define μ = the mean ergovaline level on the plants (in ppm) after having moose drool applied.
(b) The best estimate is given by the sample mean x. We have x = 0.183. (c) The 95% confidence interval is given by Statistic ± 2 · SE = x ± 2 · SE = 0.183 ± 2(0.016) = 0.183 ± 0.032, so the interval is 0.151 to 0.215.
The 95% confidence interval for the mean toxin level on this type of grass after moose drool is applied is 0.151 to 0.215 ppm. 3.65
(a) The information is from a sample, so it is a statistic. It is a proportion, so the correct notation is p̂ = 0.30.
(b) The parameter we are estimating is the proportion, p, of all young people in the US who have been arrested by the age of 23. Using the information in the sample, we estimate that p ≈ 0.30. (c) If the margin of error is 0.01, the interval estimate is 0.30 ± 0.01 which gives 0.29 to 0.31. Plausible values for the proportion p range from 0.29 to 0.31. (d) Since the plausible values for the true proportion are those between 0.29 and 0.31, it is very unlikely that the actual proportion is less than 0.25. 3.66
(a) The population is all people ages 18 and older living in the US. The sample is the 147,291 people who were actually contacted and asked whether or not they got health insurance from an employer. The parameter of interest is p, the proportion of the entire population of US adults who get health insurance from an employer. The relevant statistic is p̂ = 0.45, the proportion of people in the sample who get health insurance from an employer.
(b) An interval estimate is found by taking the best estimate (p̂ = 0.45) and adding and subtracting the margin of error (±0.01). We are relatively confident that the population proportion is between 0.44 and 0.46, or that the percent of the entire population that receive health insurance from an employer is between 44% and 46%. 3.67
(a) We are estimating a difference in means, so the quantity being estimated is μ1 − μ2 , where μ1 = mean anger rating after hearing statements in noun form and μ2 = mean anger rating after hearing statements in verb form.
(b) The notation for the thing that gives the best estimate is the sample difference in means, x1 − x2 . (c) If there is no difference in the means, then the difference in means is zero. Since zero is not in the confidence interval, it is not plausible that noun or verb gives basically the same mean anger reaction. Since the entire confidence interval is negative numbers, the quantity μ1 − μ2 is negative, which means μ2 is larger than μ1 . Mean anger ratings are higher when hearing statements in verb form. (d) The key word in this question is caused. Because the participants were randomly assigned to one of the two conditions, this is a randomized experiment. Yes, we can state that the language structure caused the difference in mean anger levels. (e) It does matter, and you should use the noun form to keep anger levels lower. 3.68
(a) Yes, we can conclude that increased soda consumption is associated with an increased risk of death from all causes, because the confidence interval does not include 1.
(b) No, we cannot conclude that increased soda consumption causes a greater risk of death. This study was not an experiment so we cannot conclude causation. There are many possible confounding variables here. Can you think of some? 3.69 We are 95% confident that the proportion of all adults in the US who think a car is a necessity is between 0.83 and 0.89. 3.70
(a) The population is all cell phone users age 18 and older in the US. The population parameter of interest is μ, the mean number of text messages sent and received per day. The best point estimate for μ is the sample mean, x = 41.5.
(b) The point estimate is x, so a 95% confidence interval is given by x ± 2 · SE = 41.5 ± 2(6.1) = 41.5 ± 12.2, giving the interval 29.3 to 53.7.
We are 95% confident that the mean number of text messages a day for all cell phone users in the US is between 29.3 and 53.7. 3.71 We are estimating p, the proportion of all US adults who agree with the statement that each person has one true love. The best point estimate is p̂ = 735/2625 = 0.28. We find the confidence interval using p̂ ± 2 · SE = 0.28 ± 2(0.009) = 0.28 ± 0.018, so the interval is 0.262 to 0.298.
The margin of error for our estimate is 0.018 or 1.8%. We are 95% sure that the proportion of all US adults who agree with the statement on one true love is between 0.262 and 0.298. 3.72 We are estimating pM − pF , the difference in proportions between males and females. For males, we have p̂M = 372/1213 = 0.31 and for females, we have p̂F = 363/1412 = 0.26. The best point estimate for the difference in proportions is p̂M − p̂F = 0.31 − 0.26 = 0.05. We find the confidence interval using (p̂M − p̂F ) ± 2 · SE = (0.31 − 0.26) ± 2(0.018) = 0.05 ± 0.036, so the interval is 0.014 to 0.086.
We are 95% confident that the difference in proportion agreeing that we have only one true love between males and females is between 0.014 and 0.086. Since zero is not in this interval, it is not one of the plausible values for the difference. We are fairly sure that the difference in these proportions is positive; thus men are more likely than women to agree with the statement on one true love. 3.73
(a) We are 95% confident that the mean response time for game players minus the mean response time for non-players is between −1.8 and −1.2. In other words, mean response time for game players is less than the mean response time for non-players by between 1.8 and 1.2 seconds.
(b) It is not likely that they are basically the same, since the option of the difference in means being zero is not in the interval. The game players are faster, and we can tell this because the confidence interval for μg − μng has only negative values so the mean time is smaller for the game players. (c) We are 95% confident that the mean accuracy score for game players minus the mean accuracy score for non-players is between −4.2 and 5.8. (d) It is likely that they are basically the same, since the option of the difference in means being zero is in the interval. There is little discernible difference in accuracy between game players and non-game players. 3.74
(a) This is a matched pairs design since all participants participated in both treatments (canned soup for five days and fresh soup for five days). There might be a great deal of variability in people’s BPA concentrations and a matched pairs experiment reduces that variability.
(b) The population is all people, and we are estimating μC − μF , where μC is mean urinary BPA concentration after eating canned soup for five days and μF is mean urinary BPA concentration after eating fresh soup for five days. Since this is a matched pairs design, we could also use μD where μD is the mean difference in urinary BPA concentration between the two treatments. (c) We are 95% confident that BPA concentration is, on average, between 19.6 and 25.5 μg/L higher in people who have eaten canned soup for five days than it is in people who have eaten fresh soup for five days. (d) A larger sample size increases the accuracy, so we would expect the confidence interval to be narrower. 3.75
(a) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 49% to 59%. Since this interval includes some proportions below 50% as plausible values for the election proportion, we cannot be very confident in the outcome.
(b) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 51% to 53%. Since all values in this interval are over 50%, we can be relatively confident that Candidate A will win. (c) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 51% to 55%. Since all values in this range are over 50%, we can be relatively confident that Candidate A will win. (d) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 48% to 68%. Since this interval includes some proportions below 50% as plausible values for the election proportion, we cannot be very confident in the outcome. 3.76
(a) The parameter of interest is μ, the mean effect on weight 2.5 years after a month of overeating and being sedentary.
(b) The only way to find the exact value would be to have all members of a population overeat and be inactive for a month and then measure the effect 2.5 years later. This is not a good idea! (c) The 95% confidence interval using the standard error is x ± 2 · SE = 6.8 ± 2(1.2) = 6.8 ± 2.4. We are 95% sure that the mean weight gain over 2.5 years by people who overeat for a month is between 4.4 and 9.2 pounds. (d) The margin of error is ±2.4 which means we are relatively confident that our estimate of 6.8 pounds is within 2.4 pounds of the true mean weight gain for the population.
3.77 Since the confidence interval −2.53 to 7.33 includes negative, positive, and zero values as plausible values for the slope of the population regression line, the association in this case might be positive or negative or non-existent. This confidence interval represents female offspring, where the association is inconclusive. The confidence interval −8.38 to −0.60 has all negative values as plausible values for the slope of the population regression line, so this confidence interval indicates that we are 95% confident that there is a negative association between the two variables. This confidence interval represents male offspring. 3.78
(a) Notation is p̂. Values will vary.
(b) Answers will vary. (c) Answers will vary. (d) Your answer will be yes about 95% of the time. 3.79
(a) Interval is for the mean, not all students.
(b) Interval is for the population mean, not the sample mean. (c) The interval is not uncertain, only whether or not it captures the population mean. (d) Interval is trying to capture the mean, not 95% of individual student pulse rates. (e) Scope of inference could apply to the mean pulse rate for all students at this college, but sample was not taken from all US college students. (f) The population mean pulse rate is a single fixed value. (g) Interval is for the population mean, not other sample means.
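The statement in Exercise 3.78(d), that an interval of the form p̂ ± 2 · SE captures the true proportion in roughly 95% of samples, can be checked by simulation. The sketch below assumes an illustrative population proportion and sample size (not the exercise's own values) and uses the standard formula for the standard error of a sample proportion rather than a bootstrap, simply to keep the loop short:

    import numpy as np

    rng = np.random.default_rng(4)

    p, n, reps = 0.25, 200, 1000              # illustrative values only
    covered = 0
    for _ in range(reps):
        phat = rng.binomial(n, p) / n
        se = np.sqrt(phat * (1 - phat) / n)   # SE of a sample proportion
        if phat - 2 * se <= p <= phat + 2 * se:
            covered += 1

    print(f"{covered / reps:.1%} of the intervals captured p")

The printed percentage should land near 95%, in line with how the intervals in Exercise 3.79 are meant to be interpreted.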
Section 3.3 Solutions 3.80
(a) No. The value 12 is not in the original.
(b) No. A bootstrap sample has the same sample size as the original sample. (c) Yes. (d) No. A bootstrap sample has the same sample size as the original sample. (e) Yes. 3.81
(a) Yes.
(b) Yes. (c) No. A bootstrap sample has the same sample size as the original sample. (d) No. The value 78 is not in the original sample. (e) Yes. (f) Yes. 3.82 The distribution appears to be centered near 0.7 so the point estimate is about 0.7. Using the 95% rule, we estimate that the standard error is about 0.1 (since about 95% of the values appear to be within 0.2 of the center). Thus our interval estimate is Statistic ± 2 · SE = 0.7 ± 2(0.1) = 0.7 ± 0.2, so the interval is 0.5 to 0.9.
The parameter being estimated is a proportion p, and the interval 0.5 to 0.9 gives plausible values for the population proportion p. Answers may vary. 3.83 The distribution appears to be centered near 25 so the point estimate is about 25. Using the 95% rule, we estimate that the standard error is about 3 (since about 95% of the values appear to be within 6 of the center). Thus our interval estimate is Statistic ± 2 · SE = 25 ± 2(3) = 25 ± 6, so the interval is 19 to 31.
The parameter being estimated is a mean μ, and the interval 19 to 31 gives plausible values for the population mean μ. Answers may vary. 3.84 The distribution appears to be centered near 0.4 so the point estimate is about 0.4. Using the 95% rule, we estimate that the standard error is about 0.05 (since about 95% of the values appear to be within 0.1 of the center). Thus our interval estimate is Statistic ± 2 · SE = 0.4 ± 2(0.05) = 0.4 ± 0.1, so the interval is 0.3 to 0.5.
The parameter being estimated is a correlation ρ, and the interval 0.3 to 0.5 gives plausible values for the population correlation ρ. Answers may vary. 3.85 The distribution appears to be centered near 6 so the point estimate is about 6. Using the 95% rule, we estimate that the standard error is about 4 (since about 95% of the values appear to be within 8 of the center). Thus our interval estimate is Statistic ± 2 · SE = 6 ± 2(4) = 6 ± 8, so the interval is −2 to 14.
The parameter being estimated is a difference in means μ1 − μ2 , and the interval −2 to 14 gives plausible values for the difference in population means μ1 − μ2 . Answers may vary. 3.86 The statistic for the sample is p̂ = 35/100 = 0.35. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.048 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.048. Thus our interval estimate is Statistic ± 2 · SE = 0.35 ± 2(0.048) = 0.35 ± 0.096, so the interval is 0.254 to 0.446.
Plausible values of the population proportion range from 0.254 to 0.446. 3.87 The statistic for the sample is p̂ = 180/250 = 0.72. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.028 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.028. Thus our interval estimate is Statistic ± 2 · SE = 0.72 ± 2(0.028) = 0.72 ± 0.056, so the interval is 0.664 to 0.776.
Plausible values of the population proportion range from 0.664 to 0.776. 3.88 The statistic for the sample is p̂ = 112/400 = 0.28. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.022 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.022. Thus our interval estimate is Statistic ± 2 · SE = 0.28 ± 2(0.022) = 0.28 ± 0.044, so the interval is 0.236 to 0.324.
Plausible values of the population proportion range from 0.236 to 0.324.
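The "using technology" step in Exercises 3.86-3.89 can be reproduced with a short resampling loop. The sketch below bootstraps the sample from Exercise 3.88 (112 successes in 400); the bootstrap standard error should come out near the 0.022 quoted above, though it will vary a little from run to run.

    import numpy as np

    rng = np.random.default_rng(7)

    # Original sample for Exercise 3.88: 112 "yes" (1) and 288 "no" (0)
    sample = np.array([1] * 112 + [0] * 288)

    # Resample with replacement, using the same size as the original sample
    boot_phats = np.array([
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(1000)
    ])

    phat = sample.mean()
    se = boot_phats.std()
    print(f"bootstrap SE is about {se:.3f}")
    print(f"interval: {phat - 2*se:.3f} to {phat + 2*se:.3f}")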
3.89 The statistic for the sample is p̂ = 382/1000 = 0.382. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.015 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.015. Thus our interval estimate is Statistic ± 2 · SE = 0.382 ± 2(0.015) = 0.382 ± 0.03, so the interval is 0.352 to 0.412.
Plausible values of the population proportion range from 0.352 to 0.412. 3.90
(a) The best point estimate is the sample proportion, p̂ = 26/174 = 0.149.
(b) We can estimate the standard error using the 95% rule, or we can find the standard deviation of the bootstrap statistics in the upper right of the figure. We see that the standard error is about 0.028. Answers will vary slightly with other simulations. (c) We have p̂ ± 2 · SE = 0.149 ± 2(0.028) = 0.149 ± 0.056, so the interval is 0.093 to 0.205.
We are 95% confident that the percent of all snails of this kind that will live after being eaten by a bird is between 9.3% and 20.5%. (d) Yes, 20% is within the range of plausible values in the 95% confidence interval. 3.91
(a) The best estimate is the value of the sample statistic, and the bootstrap distribution is centered at the value of the sample statistic. Thus, our best estimate is where the bootstrap distribution is centered, which is at 0.49.
(b) We use the 95% rule to estimate the standard error. It appears that about 95% of the bootstrap statistics are between about 0.44 and about 0.54. These values are about 0.05 away from the center value of 0.49, so we estimate that the standard error is about 0.05/2 = 0.025. (c) We find the 95% confidence interval using Statistic ± 2 · SE = 0.49 ± 2(0.025) = 0.49 ± 0.05, so the interval is 0.44 to 0.54.
We are 95% confident that the proportion of students who are distracted by off-task technology use is between 0.44 and 0.54. 3.92
(a) We see that the sample proportion is p̂ = 244/260 = 0.938. Using a bootstrap distribution, we see that the standard error for this statistic is 0.015. The 95% confidence interval is Statistic ± 2 · SE = 0.938 ± 2(0.015) = 0.938 ± 0.03, so the interval is 0.908 to 0.968.
We are 95% confident that the proportion of US young adults who can recall a family story about their parents is between 0.908 and 0.968. (b) Yes, since all plausible values of the proportion in the confidence interval are larger than 0.90, we can be confident that the claim is correct. 3.93 We see that the sample proportion is p̂ = 644/920 = 0.70. Using a bootstrap distribution, we see that the standard error for this statistic is 0.015. The 95% confidence interval is Statistic ± 2 · SE = 0.70 ± 2(0.015) = 0.70 ± 0.03, so the interval is 0.67 to 0.73.
We are 95% confident that the proportion of US teens who believe anxiety and depression are major problems for their peers is between 0.67 and 0.73. 3.94
(a) In households with incomes below $75,000, the proportion of teens saying it is a major problem is p̂1 = 386/536 = 0.720. In households with incomes above $75,000, the proportion saying it is a major problem is p̂2 = 258/384 = 0.672.
(b) The relevant statistic is a difference in sample proportions, and we have p̂1 − p̂2 = 0.720 − 0.672 = 0.048. (c) We see from a bootstrap distribution of this statistic that the standard error is SE = 0.031. (d) The 95% confidence interval for this difference in proportions is Statistic ± 2 · SE = 0.048 ± 2(0.031) = 0.048 ± 0.062, so the interval is −0.014 to 0.110.
(e) No, we cannot conclude that there is a difference because the possibility of no difference (zero difference) is included in the confidence interval. 3.95
(a) We find for the 8 values in the table that x = 34.0 and s = 14.63.
(b) We put the 8 values on 8 slips of paper and mix them up. Draw one and write down the value and put it back. Mix them up, draw another, and do this 8 times. The resulting 8 numbers form a bootstrap sample, and the mean of those 8 numbers forms one bootstrap statistic. (c) We expect that the bootstrap distribution will be bell-shaped and centered at approximately 34. (d) The population parameter of interest is the mean, μ, number of ants on all possible peanut butter sandwich bits set near this ant hill. There are other possible answers for the population; for example, you might decide to limit it to the time of day at which the student conducted the study. The best point estimate is the sample mean x = 34.
(e) We have x ± 2 · SE = 34.0 ± 2(4.85) = 34.0 ± 9.7, so the interval is 24.3 to 43.7.
We are 95% confident that the mean number of ants to climb on a bit of peanut butter sandwich left near an ant hill is between 24.3 ants and 43.7 ants. 3.96
(a) The original data is shown in Dotplot I which has only 25 dots and a lot of variability. The bootstrap distribution for many sample means is shown in Dotplot II.
(b) The sample mean hippocampal volume appears to be about x = 7600. This can be seen in either the original data dotplot or (more easily) the bootstrap distribution. (c) We need to use the bootstrap distribution (Dotplot II) to estimate the standard error. Using the 95% rule, the standard error appears to be approximately SE ≈ 200. (d) The standard deviation of the original data (in Dotplot I) appears to be larger than the standard error, as the data in the original sample has more variability than the simulated sample means in the bootstrap distribution. (e) The rough 95% confidence interval is given by Statistic ± 2 · SE = x ± 2 · SE = 7600 ± 2(200) = 7600 ± 400, so the interval is 7200 to 8000.
We are 95% confident that mean hippocampal volume, in μL, for non-football playing people is between 7200 and 8000. 3.97
(a) The best point estimate for the proportion, p, of rats showing empathy is p̂ = 23/30 = 0.767.
(b) On 23 of the slips, we write “yes” (showed empathy) and on the other 7, we write “no”. We then mix up the slips of paper, draw one out and record the result, yes or no. Put the slip of paper back and repeat the process 30 times. This set of yes’s and no’s is our bootstrap sample. The proportion of yes’s in the sample is our bootstrap statistic. (c) Using technology, we see that the bootstrap distribution is bell-shaped and centered approximately at 0.767. We also see that the standard error is about 0.077.
(d) We have p̂ ± 2 · SE = 0.767 ± 2(0.077) = 0.767 ± 0.154, so the interval is 0.613 to 0.921.
For all laboratory rats, we are 95% confident that the proportion of rats that will show empathy in this manner is between 61.3% and 92.1%. 3.98 The sample proportion of females showing compassion is p̂F = 6/6 = 1.0. The sample proportion of males showing compassion is p̂M = 17/24 = 0.708. The best point estimate for the difference in proportions pF − pM is p̂F − p̂M = 1.0 − 0.708 = 0.292. Using StatKey to create a bootstrap distribution for a difference in proportions using this sample data, we see a standard error of 0.094.
We have (p̂F − p̂M ) ± 2 · SE = (1.0 − 0.708) ± 2(0.094) = 0.292 ± 0.188, so the interval is 0.104 to 0.480.
Based on this interval the percentage of female rats likely to show compassion is between 10.4% and 48% higher than the percentage of male rats likely to show compassion. Since zero is not in the interval estimate, it is not very plausible that male and female rats are equally compassionate. 3.99
(a) The standard error is about 0.015 since most (roughly 95%) of the bootstrap distribution is between 0.12 and 0.18, which is about two standard deviations on either side of the center at 0.15.
(b) The 95% confidence interval is given by (p̂t − p̂a ) ± 2 · SE = (0.87 − 0.72) ± 2(0.015) = 0.15 ± 0.03, so the interval is 0.12 to 0.18.
We are 95% sure that the proportion of teens who text is between 0.12 and 0.18 higher than the proportion of adults who text. 3.100 Using StatKey or other technology, we create a bootstrap distribution to estimate the difference in means μt − μc where μt represents the mean immune response for tea drinkers and μc represents the mean immune response for coffee drinkers. In the original sample the means are xt = 34.82 and xc = 17.70, respectively, so the point estimate for the difference is xt − xc = 34.82 − 17.70 = 17.12. We see from the bootstrap distribution that the standard error for the differences in bootstrap means is about SE = 7.9. This will vary for other sets of bootstrap differences.
For a 95% confidence interval, we have (xt − xc ) ± 2 · SE = (34.82 − 17.70) ± 2(7.9) = 17.12 ± 15.8, so the interval is 1.32 to 32.92.
We are 95% sure that the mean immune response is between 1.32 and 32.92 units higher in tea drinkers than it is in coffee drinkers. 3.101 (a) The mean amount of depreciation for this sample of 20 car models is $2356 with a standard deviation of $858. (b) A bootstrap distribution for 2000 sample means from the Depreciation variable is shown below.
The bootstrap distribution is symmetric and bell shaped around the mean of 2354, and has a standard deviation of 187. (c) The 95% confidence interval for mean car depreciation can be found by (2356−2(187), 2356+2(187)) = (1982, 2730). We are 95% sure that the mean amount of depreciation when new cars leave the lot is between $1982 and $2730. 3.102
(a) The correlation between New price and Depreciation in these data is 0.126.
(b) The standard error in a bootstrap distribution of correlations is around 0.24. (c) The 95% confidence interval can be found by (0.126 − 2(0.24), 0.126 + 2(0.24)) = (−0.354, 0.606). We are 95% sure that the correlation between the original price and depreciation of new car models is between −0.354 and 0.606. 3.103
(a) The proportion of left-handers in the sample of people with cluster headaches is p̂ = 24/273 = 0.088.
(b) Using StatKey, we obtain the bootstrap distribution below which indicates the standard error for the sample proportions based on the cluster headache sample is about 0.017.
The confidence interval for the proportion is 0.088 ± 2 · 0.017 = 0.088 ± 0.034 = 0.054 to 0.122. We are 95% confident that the proportion of cluster headache sufferers who are left handed is between 0.054 and 0.122. (c) The proportion of left-handers in the sample of people with migraine headaches is p̂ = 42/477 = 0.088.
(d) Using StatKey, we obtain the bootstrap distribution below which indicates the standard error for the sample proportions based on the migraine headache sample is about 0.013.
The confidence interval for the proportion is 0.088 ± 2 · 0.013 = 0.088 ± 0.026 = 0.062 to 0.114. We are 95% confident that the proportion of migraine headache sufferers who are left handed is between 0.062 and 0.114. (e) The confidence interval for the migraine sufferers in part (d) is slightly narrower. This is expected, because the sample proportions are similar but the sample size is larger for migraine sufferers.
3.104 (a) We are estimating a mean, so the notation is μ. We define μ to be mean closeness rating all people (or maybe all high school students in Brazil) give to others in a group. (b) We see that the mean closeness before an activity is x = 4.873. (c) Using a bootstrap distribution, we see that SE ≈ 0.114. (d) The 95% confidence interval is given by Statistic ± 2 · SE = x ± 2 · SE = 4.873 ± 2(0.114) = 4.873 ± 0.228, so the interval is 4.645 to 5.101.
(e) We are 95% confident that mean closeness rating for those in one's group before an activity is between 4.645 and 5.101. 3.105 (a) We are estimating a difference in means, so the notation is μS − μN , where μS represents mean closeness rating for people who have just performed a synchronized activity and μN represents mean closeness rating for people who have performed a non-synchronized activity. (b) We see that the difference in mean closeness ratings is xS − xN = 5.275 − 4.810 = 0.465. (c) Using a bootstrap distribution, we see that SE ≈ 0.228. (d) The 95% confidence interval is given by Statistic ± 2 · SE = (xS − xN ) ± 2 · SE = 0.465 ± 2(0.228) = 0.465 ± 0.456, so the interval is 0.009 to 0.921.
(e) We are 95% confident that people who have just done a synchronized activity give a mean closeness rating between 0.009 and 0.921 higher than people who have just done a non-synchronized activity. 3.106 (a) We are estimating a difference in means, so the notation is μS − μN , where μS represents mean pain tolerance for people who have just performed a synchronized activity and μN represents mean pain tolerance for those who have performed a non-synchronized activity. (b) We see that the difference in mean pain tolerance for the samples is xS − xN = 224.428 − 211.087 = 13.341. (c) Using a bootstrap distribution, we see that SE ≈ 9.0. (d) The 95% confidence interval is given by Statistic ± 2 · SE = (xS − xN ) ± 2 · SE = 13.341 ± 2(9.0) = 13.341 ± 18.0, so the interval is −4.659 to 31.341.
(e) We are 95% confident that mean pain tolerance for people who have done a synchronized activity is between 4.659 mmHg below and 31.341 mmHg above the mean pain tolerance for those who have done a non-synchronized activity. 3.107 (a) We are estimating a difference in means, so the notation is μH − μL , where μH represents mean pain tolerance for people who have just performed a high exertion activity and μL represents mean pain tolerance for those who have just performed a low exertion activity. (b) We see that the difference in mean pain tolerance for the samples is xH − xL = 224.130 − 211.413 = 12.717. (c) Using a bootstrap distribution, we see that SE ≈ 9.0. (d) The 95% confidence interval is given by Statistic ± 2 · SE = (xH − xL ) ± 2 · SE = 12.717 ± 2(9.0) = 12.717 ± 18.0, so the interval is −5.283 to 30.717.
(e) We are 95% confident that mean pain tolerance for people who have done a high exertion activity is between 5.283 mmHg below and 30.717 mmHg above the mean pain tolerance for those who have done a low exertion activity. 3.108 (a) We are estimating a proportion, so the notation is p, where p represents the proportion of people (or, more accurately, high school students in Brazil) who would allow the pressure to reach its maximum level after treatment. (b) The proportion of subjects in the sample who reached the pain maximum is p̂ = 75/264 = 0.284. (c) Using a bootstrap distribution, we see that SE ≈ 0.028. (d) The 95% confidence interval is given by Statistic ± 2 · SE = p̂ ± 2 · SE = 0.284 ± 2(0.028) = 0.284 ± 0.056, so the interval is 0.228 to 0.340.
(e) We are 95% confident that the proportion of Brazil students able to “go to the max” after treatment is between 0.228 and 0.340. 3.109 (a) We are estimating a difference in proportions, so the notation is pF − pM , where pF represents the proportion of females who would allow the pressure to reach its maximum level and pM represents the proportion of males who would allow the pressure to reach its maximum level. (b) We have p̂F = 33/165 = 0.200 and p̂M = 42/99 = 0.424, so the difference in sample proportions is p̂F − p̂M = 0.200 − 0.424 = −0.224. (c) Using a bootstrap distribution, we see that SE ≈ 0.059.
(d) The 95% confidence interval is given by Statistic ± 2 · SE = (p̂F − p̂M ) ± 2 · SE = −0.224 ± 2(0.059) = −0.224 ± 0.118, so the interval is −0.342 to −0.106.
(e) We are 95% confident that the proportion of females able to “go to the max” is between 0.342 and 0.106 lower than the proportion of males able to do this. Females appear to have lower pain tolerance. 3.110
(a) The original sample is right-skewed with outliers at 75, 81, and 89 minutes.
(b) We find that the mean of the penalty minutes is x = 24.46 with a standard deviation of s = 24.92 minutes. (c) The distribution of bootstrap means is fairly symmetric and centered near 24.5. It does not show the same skewness as in the sample.
(d) The standard deviation of these bootstrap means is SE = 4.78 (answers will vary for other simulations) which is much smaller than the standard deviation in the sample, s = 24.92. (e) For a 95% confidence interval, we have x ± 2 · SE = 24.46 ± 2(4.78) = 24.46 ± 9.56, so the interval is 14.90 to 34.02 minutes.
(f) The style of play on one team might be more or less aggressive than the league as a whole, so the estimate of mean penalty minutes could be biased. 3.111 The standard deviation for the sample of penalty minutes for n = 26 players is s = 24.92 minutes. For one set of 5000 bootstrap sample standard deviations (shown below), the estimated standard error is SE = 4.35.
Based on this, the interval estimate is s ± 2 · SE = 24.92 ± 2(4.35) = 24.92 ± 8.70, so the interval is 16.22 to 33.62.
We estimate that the standard deviation in penalty minutes for all NHL players is somewhere between 16.2 and 33.6 minutes.
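The same resampling idea extends to statistics other than means and proportions, such as the standard deviation in Exercise 3.111. The 26 actual penalty-minute values are not reproduced in these solutions, so the array below is an illustrative stand-in; only the mechanics are the point of this sketch.

    import numpy as np

    rng = np.random.default_rng(11)

    # Illustrative stand-in -- the real analysis would use the 26 observed
    # penalty-minute values from Exercises 3.110 and 3.111.
    penalty_minutes = rng.exponential(scale=25, size=26).round()

    boot_sds = np.array([
        rng.choice(penalty_minutes, size=penalty_minutes.size, replace=True).std(ddof=1)
        for _ in range(5000)
    ])

    s = penalty_minutes.std(ddof=1)   # sample standard deviation
    se = boot_sds.std()               # bootstrap standard error of the SD
    print(f"interval: {s - 2*se:.2f} to {s + 2*se:.2f} minutes")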
Section 3.4 Solutions
3.112
(a) We keep the middle 95% of values by chopping off 2.5% from each tail.
(b) We keep the middle 90% of values by chopping off 5% from each tail. (c) We keep the middle 98% of values by chopping off 1% from each tail. (d) We keep the middle 99% of values by chopping off 0.5% from each tail. 3.113 (a) We keep the middle 95% of values by chopping off 2.5% from each tail. Since 2.5% of 1000 is 25, we eliminate the 25 highest and the 25 lowest values to create the 95% confidence interval. (b) We keep the middle 90% of values by chopping off 5% from each tail. Since 5% of 1000 is 50, we eliminate the 50 highest and the 50 lowest values to create the 90% confidence interval. (c) We keep the middle 98% of values by chopping off 1% from each tail. Since 1% of 1000 is 10, we eliminate the 10 highest and the 10 lowest values to create the 98% confidence interval. (d) We keep the middle 99% of values by chopping off 0.5% from each tail. Since 0.5% of 1000 is 5, we eliminate the 5 highest and the 5 lowest values to create the 99% confidence interval. 3.114 To find a 99% confidence interval, we go farther out on either side than for a 95% confidence interval, so (A) is the most likely result. 3.115 To find a 90% confidence interval, we go less far out on either side than for a 95% confidence interval, so (C) is the most likely result. 3.116 If the sample size goes up, we get greater accuracy and the spread of the bootstrap distribution decreases, so the confidence interval will be narrower. Thus, (C) is the most likely result. 3.117 If the sample size is smaller, we have less accuracy and the spread of the bootstrap distribution increases, so the confidence interval will be wider. Thus, (A) is the most likely result. 3.118 As long as the number of bootstrap samples is reasonable, the width of the confidence interval does not change much as we take more or fewer bootstrap samples. Thus, (B) is the most likely result. 3.119 As long as the number of bootstrap samples is reasonable, the width of the confidence interval does not change much as we take more or fewer bootstrap samples. Thus, (B) is the most likely result. 3.120 The sample proportion who agree is p̂ = 35/100 = 0.35. One set of 1000 bootstrap proportions is shown in the figure below. For a 95% confidence interval we need to find the 2.5%-tile and 97.5%-tile, leaving 95% of the distribution in the middle. For this distribution those points are at 0.26 and 0.44, so we are 95% sure that the proportion in the population who agree is between 0.26 and 0.44. Answers will vary slightly for different simulations.
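The percentile method in Exercises 3.113 and 3.120 amounts to sorting the bootstrap statistics and reading off the appropriate quantiles, which is exactly what chopping a fixed number of dots from each tail accomplishes. A sketch for the data in Exercise 3.120 (35 who agree out of 100); the endpoints will vary slightly between simulations:

    import numpy as np

    rng = np.random.default_rng(3)

    sample = np.array([1] * 35 + [0] * 65)       # 35 agree, 65 do not
    boot_phats = np.array([
        rng.choice(sample, size=100, replace=True).mean()
        for _ in range(1000)
    ])

    # 95% interval: chop 2.5% from each tail
    print(np.percentile(boot_phats, [2.5, 97.5]))
    # 90% and 99% intervals chop 5% and 0.5% from each tail instead
    print(np.percentile(boot_phats, [5, 95]))
    print(np.percentile(boot_phats, [0.5, 99.5]))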
3.121 The sample proportion who agree is p̂ = 180/250 = 0.72. One set of 1000 bootstrap proportions is shown in the figure below. For a 95% confidence interval we need to find the 2.5%-tile and 97.5%-tile, leaving 95% of the distribution in the middle. For this distribution those points are at 0.664 and 0.776, so we are 95% sure that the proportion in the population who agree is between 0.664 and 0.776. Answers will vary slightly for different simulations.
3.122 The sample proportion who agree is p̂ = 112/400 = 0.28. One set of 1000 bootstrap proportions is shown in the figure below. For a 90% confidence interval we need to find the 5%-tile and 95%-tile, leaving 90% of the distribution in the middle. For this distribution those points are at 0.242 and 0.315, so we are 90% sure that the proportion in the population who agree is between 0.242 and 0.315. Answers will vary slightly for different simulations.
3.123 The sample proportion who agree is p̂ = 382/1000 = 0.382. One set of 1000 bootstrap proportions is shown in the figure below. For a 99% confidence interval we need to find the 0.5%-tile and 99.5%-tile, leaving 99% of the distribution in the middle. For this distribution those points are at 0.343 and 0.423, so we are 99% sure that the proportion in the population who agree is between 0.343 and 0.423. Answers will vary slightly for different simulations.
3.124 (a) The bootstrap distribution is centered at about 100, so we estimate that the sample mean of the original IQ scores is x ≈ 100. (b) Since we are finding a 99% confidence interval, we want to keep the middle 99%. That means we want an interval that includes the middle 990 of the 1000 bootstrap statistics. We need to cut off 5 values on each end, which appears to give an interval from about 88 to 112. 3.125 The 98% confidence interval uses the 1%-tile and 99%-tile from the bootstrap means. We are 98% sure that the mean number of penalty minutes for NHL players in a season is between 14.3 and 36.7 minutes. 3.126 Using StatKey or other technology, we produce a bootstrap distribution such as the figure shown below. For a 90% confidence interval, we find the 5%-tile and 95%-tile points in this distribution to be 0.730
and 0.774. We are 90% confident that the percent of American adults who think exercise is an important part of daily life is between 73.0% and 77.4%.
3.127 After creating the bootstrap distribution, we use the boundaries for the middle 90% of bootstrap statistics to find the confidence interval. The 90% confidence interval is about 0.58 to 0.81. We are 90% confident that the proportion of all college instructors who are bothered by student off-task phone use during class is between 0.58 and 0.81. 3.128 We are finding a confidence interval for a proportion. Using StatKey or other technology, we generate a bootstrap distribution with this data and then find the endpoints with 90% of the bootstrap statistics in the middle. We see that a 90% confidence interval is 0.162 to 0.202. We are 90% confident that the proportion of all US adults who would say they are poor is between 0.162 and 0.202. 3.129 Using StatKey or other technology, we produce a bootstrap distribution such as the figure shown below. For a 99% confidence interval, we find the 0.5%-tile and 99.5%-tile points in this distribution to be 0.467 and 0.493. We are 99% confident that the percent of all Europeans (from these nine countries) who can identify arm or shoulder pain as a symptom of a heart attack is between 46.7% and 49.3%. Since every value in this interval is below 50%, we can be 99% confident that the proportion is less than half.
3.130 (a) This is a question involving one quantitative variable (pesticide concentration) and one categorical variable (eating non-organic or organic), so a difference in means would be an appropriate parameter of interest. Let μ1 − μ2 be the average concentration of 3-PBA while eating non-organic minus the average concentration of 3-PBA while eating organic. (b) The bootstrap distribution will be centered around the sample statistic, which in this case is a difference in means. Therefore we have x1 − x2 ≈ 24.5. (c) A 99% confidence interval would leave 99% in the middle, so 0.5% = 0.005 in each tail. Because there are 1000 simulated bootstrap samples, this means we should leave 0.005 × 1000 = 5 dots in each tail, resulting in a 99% confidence interval from about 15 to 35 μg/g crt. (d) We are 99% confident that average concentration of 3-PBA is between 15 and 35 μg/g crt higher while not eating organic than while eating organic. (e) Yes, this interval provides evidence that concentrations of 3-PBA are lower while eating organic, because the entire range of the interval is positive, providing evidence that concentrations are higher, on average, while eating non-organic. (f) No, we cannot use this data to make claims about causality, because it was not randomly determined whether people ate organic or not (it was an experiment and it was controlled by the researchers, but they did not use randomization at all). 3.131 (a) For a 99% confidence interval, we want 99% in the middle, which means leaving 1% total in the tails, so 0.5% in each tail. Therefore, we are looking for the 0.5th and 99.5th percentiles, which are 15.8 and 32.4. Therefore, our 99% confidence interval is (15.8, 32.4) μg/g crt. (b) We are 99% confident that, on average, concentration of 3-PBA is between 15.8 and 32.4 μg/g crt higher while not eating organic as opposed to while eating organic. 3.132 (a) Increasing the sample size decreases the standard error, so will also decrease the width of the confidence interval. (b) Simulating more bootstrap samples will make the dotplot taller (more dots will continually be stacked on top of each other), but will not change the shape, center, or spread of the bootstrap distribution in any noticeable way (besides just natural random variation), so the width of the confidence interval will remain the same. (c) Decreasing the confidence level means we will take only the middle 95%, instead of the middle 99%, so this will decrease the width of our confidence interval. 3.133
(a) The sample difference in proportions is p̂1 − p̂2 = 111/240 − 24/240 = 0.46 − 0.10 = 0.36.
(b) A 98% confidence interval would leave 98% in the middle, so 1% = 0.01 in each tail. Because there are 1000 simulated bootstrap samples, this means we should leave 0.01 × 1000 = 10 dots in each tail, resulting in a 98% confidence interval from about 0.28 to about 0.45. (c) We are 98% confident that the proportion of measurements yielding positive pesticide detection is between 0.28 and 0.45 higher while not eating organic as opposed to while eating organic. 3.134 Using a bootstrap distribution for the difference in means, we estimate the 90% confidence interval to be approximately 11.1 to 33.1. We estimate that, on average, mice getting blood from young mice are able to run about 11 to 33 minutes longer on a treadmill than mice getting blood from old mice. This is a pretty impressive difference!
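For two-group comparisons such as Exercise 3.134, each bootstrap statistic is found by resampling within each group separately and recording the difference in means. The treadmill data themselves are not listed in these solutions, so the two arrays below are purely illustrative placeholders; the sketch only shows the mechanics.

    import numpy as np

    rng = np.random.default_rng(5)

    # Illustrative values only -- not the actual treadmill running times
    young_blood = np.array([55.0, 60.0, 72.0, 48.0, 66.0, 70.0, 58.0])
    old_blood = np.array([35.0, 42.0, 30.0, 50.0, 38.0, 44.0, 33.0])

    def boot_diff_means(g1, g2, reps=2000):
        # Resample each group with replacement; record the difference in means
        diffs = np.empty(reps)
        for i in range(reps):
            d1 = rng.choice(g1, size=g1.size, replace=True)
            d2 = rng.choice(g2, size=g2.size, replace=True)
            diffs[i] = d1.mean() - d2.mean()
        return diffs

    diffs = boot_diff_means(young_blood, old_blood)
    print(np.percentile(diffs, [5, 95]))    # 90% percentile interval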
3.135 (a) We are estimating a difference in proportions, so the notation is pT −pN , where pT is proportion of people with staph infections of all people with triclosan in their system and pN is the proportion of people with staph infections of all people who do not have triclosan in their systems. (b) The best estimate is the difference in sample proportions. We have p̂T = 24/37 = 0.649 and p̂N = 15/53 = 0.283, so the difference in sample proportions is p̂T − p̂N = 0.649 − 0.283 = 0.366. (c) Using a bootstrap distribution, we see that a 99% confidence interval for the difference in proportions is approximately 0.100 to 0.614. (d) We see that the proportion of people with staph infections is between 0.10 and 0.61 higher for people with triclosan in their system than it is for people without triclosan. Yes, since all of these plausible values for the difference in proportions are positive, we can conclude that people with triclosan in their system are more likely to have staph infections. We cannot conclude causation, however, since this is not an experiment.
3.136 We are finding a 90% confidence interval for a proportion. In StatKey, we can use the Edit Data button to enter the proportion data. (a) The sample statistic is p̂ = 118/140 = 0.843. Using StatKey or other technology to create a bootstrap distribution, we see that a 90% confidence interval for the population proportion is about 0.79 to 0.89. (b) The sample statistic is p̂ = 57/85 = 0.671. Using StatKey or other technology to create a bootstrap distribution, we see that a 90% confidence interval for the population proportion is about 0.59 to 0.75. (Notice that the effect isn’t quite as strong in this case, probably since we are more used to saying “Heads or Tails” instead of “Tails or Heads”.) (c) The sample statistic is p̂ = 89/99 = 0.899. Using StatKey or other technology to create a bootstrap distribution, we see that a 90% confidence interval for the population proportion is about 0.85 to 0.95. (d) The sample statistic is p̂ = 79/98 = 0.806. Using StatKey or other technology to create a bootstrap distribution, we see that a 90% confidence interval for the population proportion is about 0.74 to 0.87. Notice that, in every case, we can be 90% sure that a majority of people will select the first option.
3.137 (a) We are estimating a proportion, so the notation for the parameter is p and we define it as p = the proportion of all US teens who have used an e-cigarette in the last 30 days. (b) Using a bootstrap distribution, we see that a 99% confidence interval for the proportion of all US teens who have used an e-cigarette in the last 30 days is about 0.25 to 0.30. (c) Yes. Since the entire confidence interval is less than 0.333, we can be confident that less than 1/3 of all US teens used e-cigarettes in the last 30 days.
3.138 Using one bootstrap distribution (as shown below), the standard error is SE = 0.19.
The mean tip from the original sample is x = 3.85, so a 95% confidence interval using the standard error is x ± 2 · SE = 3.85 ± 2(0.19) = 3.85 ± 0.38, giving the interval 3.47 to 4.23.
For this bootstrap distribution, the 95% confidence interval using the 2.5%-tile and 97.5%-tile is 3.47 to 4.23. We see that the results (rounding to two decimal places) are the same. We are 95% confident that the average tip left at this restaurant is between $3.47 and $4.23. 3.139 (a) A 99% confidence interval is wider than a 90% confidence interval, so the 90% interval is A (3.55 to 4.15) and the 99% interval is B (3.35 to 4.35). (b) We multiply the lower and upper bounds for the average tip by 20 to get the average daily tip revenue (assuming 20 tables per day). With 90% confidence, the interval is 20 · 3.55 = 71 to 20 · 4.15 = 83. With 99% confidence, the interval is 20 · 3.35 = 67 to 20 · 4.35 = 87. We are 90% confident that this waitress will average between 71 and 83 dollars in tip income per day, and we are 99% confident that her mean daily tip income is between 67 and 87 dollars. 3.140 (a) We have p̂m = 27/193 = 0.140 and p̂f = 16/169 = 0.095 so the best point estimate for the difference in population proportions is p̂m − p̂f = 0.140 − 0.095 = 0.045. In this sample, a larger proportion of males smoke. (b) Using StatKey or other technology, we create a bootstrap distribution and find the boundaries for the middle 99% of values. We see that a 99% confidence interval for pm − pf is the interval from about −0.039 to 0.132. We are 99% confident that the difference between males and females in the proportion that smoke is between −0.039 and 0.132.
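The 99% interval in Exercise 3.140(b) can be reproduced directly from the counts given in part (a): 27 smokers out of 193 males and 16 smokers out of 169 females. A sketch, with endpoints that will wander a little from one simulation to the next:

    import numpy as np

    rng = np.random.default_rng(9)

    males = np.array([1] * 27 + [0] * 166)       # 27 smokers out of 193
    females = np.array([1] * 16 + [0] * 153)     # 16 smokers out of 169

    diffs = np.array([
        rng.choice(males, size=males.size, replace=True).mean()
        - rng.choice(females, size=females.size, replace=True).mean()
        for _ in range(5000)
    ])

    # Middle 99% of the bootstrap differences in proportions
    print(np.percentile(diffs, [0.5, 99.5]))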
3.141 (a) The population of interest is all FA premier league football matches. The specific parameter of interest is proportion of matches the home team wins. (b) Our best estimate for the parameter is 70/120 = 0.583. (c) Using StatKey or other technology, we create a bootstrap distribution as shown below. Taking 5% from each tail, the 90% confidence interval is 0.508 to 0.650. We are 90% sure that the home team wins between 50.8% and 65.0% of all FA premier league football matches.
(d) Using the same bootstrap distribution we see that a 99% confidence interval goes from 0.467 to 0.692. We are 99% sure that the home team wins between 46.7% and 69.2% of all FA premier league football matches. (e) If the population parameter is 0.50 or less, then no home field advantage is present. With the 90% confidence interval we are 90% confident the population parameter is between 0.508 and 0.650. Since this interval does not contain 0.50, we are 90% confident that there is a home field advantage. However the 99% confidence interval does contain 0.50, so we are not 99% confident that there is a home field advantage. 3.142 (a) We have xt − xc = 34.82 − 17.7 = 17.12, where xt represents the sample mean immune response for tea drinkers and xc represents the sample mean immune response for coffee drinkers.
(b) We are estimating μt − μc where μt represents the mean immune response for all tea drinkers and μc represents the mean immune response for all coffee drinkers. (c) Using StatKey or other technology, we obtain a bootstrap distribution of sample differences in means as shown below. We see that a 90% confidence interval for the difference in means is about 4.17 to 29.70. We are 90% confident that tea drinkers have a mean immune response between 4.17 and 29.70 higher than the mean immune response for coffee drinkers. Answers may vary for other sets of bootstrap differences in means.
(d) Using the same bootstrap distribution, we see that a 99% confidence interval for the difference in means is about −3.30 to 37.04. We are 99% confident that the difference in mean immune response is between −3.30 and 37.04. (e) We are 90% confident that tea drinkers have a stronger mean immune response, since all values in the 90% confidence interval are positive, but we are not 99% confident, since some plausible values for the difference in means in that interval are negative. 3.143 The difference in sample means is xI − xS = 37.29 − 50.92 = −13.63. A bootstrap distribution for differences in means is shown below.
Using the bootstrap standard error (SE = 3.80) we get a 95% confidence interval with (xI − xS ) ± 2 · SE = −13.63 ± 2 · 3.80 = −13.63 ± 7.60 = (−21.23, −6.03)
Thus we conclude that we are 95% sure that the mean cost when paying individually is somewhere between 21.23 and 6.03 shekels less than when splitting the bill. Using percentiles from this bootstrap distribution the interval would go from −21.08 shekels to −6.08 shekels. 3.144 (a) We see that both cities have a significant number of outliers, with very long commute times. The quartiles and median are all bigger for Atlanta than for St. Louis, so we expect that the mean commute time is larger for Atlanta. (b) We are estimating the difference between the cities in mean commute time for all commuters, μatl −μstl . We get a point estimate for the difference in mean commute times between the two cities with the difference in the sample means, xatl − xstl = 29.11 − 21.97 = 7.14 minutes. (c) Since the two samples were taken independently in different cities, for each bootstrap statistic we take 500 Atlanta times with replacement from the original Atlanta data and 500 St. Louis times with replacement from the original St. Louis sample, compute the mean within each sample, and take the difference. This constitutes one bootstrap statistic. (d) A bootstrap distribution for the difference in means with 2000 bootstrap samples is shown in the figure.
The standard error for xatl − xstl , found in the upper corner of the figure, is SE = 1.125. We find an interval estimate for the difference in the population means with 7.14 ± 2 · 1.125 = 7.14 ± 2.25 = (4.89, 9.39) We are 95% confident that the average commuting time for commuters in Atlanta is somewhere between 4.89 and 9.39 minutes more than the average commuting time for commuters in St. Louis. 3.145 (a) The bootstrap distribution below shows the sample means for 5000 bootstrap samples of costs taken from the data in SampColleges2yr. Using percentiles, we see that a 90% confidence interval is (14058, 16758). Thus we are 90% sure that the mean cost at all two-year colleges in the US is between $14,058 and $16,758.
(b) Using the cost data in CollegeScores2yr the mean cost at two-year colleges in the US is μ = $16,790. This does not (quite) fall within the 90% confidence interval from part (a). Note that 10% of samples should produce intervals that fail to capture the population mean. 3.146 (a) The bootstrap distribution below shows the sample means for 5000 bootstrap samples of costs taken from the data in SampColleges4yr. Using percentiles, we see that a 90% confidence interval is (29593, 35429). Thus we are 90% sure that the mean cost at all four-year colleges and universities in the US is between $29,593 and $35,429.
(b) Using the cost data in CollegeScores4yr the mean cost at four-year schools in the US is μ = $34,277. This falls within the 90% confidence interval from part (a). 3.147 (a) The parameter of interest is ρ, the correlation between weight gain during a month of overeating and inactivity and weight gain over the next 2.5 years, for those adults who spend one month (possibly during December) overeating and being sedentary. The best point estimate for this parameter is r = 0.21. (b) To create the bootstrap sample, we sample from the original sample with replacement. In this case, we randomly select one of the 18 ordered pairs, write down the values, and return them to the pile. Then we randomly select one of the 18 ordered pairs (possibly the same one), and write down those values as our second pair. We do this until we have 18 ordered pairs, and that dataset is our bootstrap sample. (c) For each bootstrap sample, we record the correlation between the one month and 2.5 year weight gains of the 18 ordered pairs. (d) We find the standard error by finding the standard deviation of the 1000 bootstrap correlations. (e) The interval estimate is r ± 2 · SE = 0.21 ± 2(0.14) = 0.21 ± 0.28, so a 95% confidence interval for the population correlation ρ is −0.07 to 0.49. (f) There is a reasonable possibility that there is no correlation at all between the amount of weight gained during the one month intervention and how much weight is gained over the long-term. We know that this is a reasonable possibility because 0 is inside the interval estimate so ρ = 0 is included as one of the plausible values of the population correlation. (g) A 90% confidence interval needs to only include the middle 90% of data values in a bootstrap distribution, so it will be narrower than a 95% confidence interval. 3.148 (a) We see that the bootstrap distribution is relatively symmetric and bell-shaped, so it is reasonable to use the distribution to estimate a 95% confidence interval for the standard deviation of prices of all used Mustang cars. Using either the standard error method or the percentile method (estimating values that include the middle 95%), we estimate a 95% confidence interval to be about 7 to 14. We are 95% confident that the standard deviation of all prices of used Mustangs is between 7 thousand dollars and 14 thousand dollars. (b) This bootstrap distribution is not symmetric and is not bell-shaped. It would not be appropriate to use this distribution to find a 95% confidence interval. The sample size is so small (at only n = 5) that the distribution ends up looking a bit bizarre. It is important to always look at the graph of the distribution. These methods apply only when the bootstrap distribution is reasonably symmetric and bell-shaped. 3.149 The bootstrap distribution for the standard deviations (shown below) has at least four completely separate clusters of dots. It is not at all symmetric and bell-shaped so it would not be appropriate to use this bootstrap distribution to find a confidence interval for the standard deviation. The clusters of dots represent the number of times the outlier is included in the bootstrap sample (with the cluster on the left containing statistics from samples in which the outlier was not included, the next one containing statistics from samples that included the outlier once, the next one containing statistics from samples that included the outlier twice, and so on).
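The pair-resampling procedure described in Exercise 3.147(b)-(d) keeps each (one-month gain, 2.5-year gain) pair together by resampling row indices. The 18 observed pairs are not listed in these solutions, so the arrays below are placeholders generated purely for illustration:

    import numpy as np

    rng = np.random.default_rng(2)

    # Placeholder data -- in practice these are the 18 observed ordered pairs
    one_month = rng.normal(3.0, 1.0, size=18)
    long_term = 0.2 * one_month + rng.normal(1.0, 1.5, size=18)

    boot_rs = np.empty(1000)
    for i in range(1000):
        rows = rng.integers(0, 18, size=18)   # draw 18 pairs with replacement
        boot_rs[i] = np.corrcoef(one_month[rows], long_term[rows])[0, 1]

    r = np.corrcoef(one_month, long_term)[0, 1]
    se = boot_rs.std()                        # bootstrap standard error
    print(f"r = {r:.2f}, interval {r - 2*se:.2f} to {r + 2*se:.2f}")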
CHAPTER 4
Section 4.1 Solutions
4.1
(a) We see that Sample A has the largest mean (around 30) and only one data point below 25, so it provides the most evidence for the claim that the mean placement exam score is greater than 25.
(b) Sample C has a mean that is clearly below 25, so it provides no evidence for the claim. 4.2
(a) In Sample B there is almost no overlap in the boxplots, with almost all Restaurant #1 service times less than almost all Restaurant #2 times, so Sample B provides the most evidence for the claim that the mean service time is shorter at Restaurant #1.
(b) In both samples C and D the sample mean service time at Restaurant #1 is actually higher than at Restaurant #2, so both of these samples provide no evidence for the claim. 4.3
(a) Sample A shows a negative association and a stronger association than Sample D, so Sample A provides the most evidence for the claim that the correlation between exam grades and time spent playing video games is negative.
(b) In both samples B and C the association in the scatterplots is positive, so both give no evidence for a negative correlation. 4.4
(a) Sample D has the sample proportion, p̂ = 38/40 = 0.95, that is farthest above 0.75, so it provides the strongest evidence for the claim that more than 75% of US citizens can name the capital city of Canada (Ottawa).
(b) Sample C has a sample proportion, p̂ = 27/40 = 0.675, that is less than p = 0.75, so it gives no evidence for the claim that p > 0.75.
4.5 The hypotheses are:
H0: μA = μB
Ha: μA ≠ μB
4.6 The hypotheses are:
H0: p = 0.3
Ha: p > 0.3
4.7 The hypotheses are:
H0: μ = 50
Ha: μ < 50
4.8 The hypotheses are:
H0: ρ = 0
Ha: ρ < 0
4.9 We define pm to be the proportion of males who smoke and pf to be the proportion of females who smoke. The hypotheses are: H0 :
pm = pf
Ha :
pm > pf
4.10 We define ρ to be the correlation between height and salary. The hypotheses are:
H0: ρ = 0
Ha: ρ ≠ 0
4.11 We define p to be the proportion of a population who watch the Home Shopping Network. The hypotheses are: H0 :
p = 0.20
Ha :
p < 0.20
4.12 We define μa to be mean sales in stores where customers are approached and μn to be mean sales in stores where customers are not approached. The hypotheses are: H0 :
μa = μn
Ha :
μa > μn
4.13 We define μf to be mean study time for first year students and μu to be mean study time for upperclass students. The hypotheses are:
H0: μf = μu
Ha: μf ≠ μu
4.14
(a) These hypotheses are valid.
(b) These hypotheses are not valid, since the equality should be in H0 . (c) These hypotheses are not valid, since we need equality in H0 . (d) These hypotheses are not valid, since the hypotheses must be in terms of the population parameters (μ1 , μ2 ), not sample statistics (x1 , x2 ). 4.15
(a) These hypotheses are valid.
(b) These hypotheses are not valid, since statistical hypotheses are statements about a population parameter (p), not a sample statistic (p̂). (c) These hypotheses are not valid, since the equality should be in H0 . (d) These hypotheses are not valid, since a proportion, p, is always between 0 and 1 and can never be 25. 4.16
(a) This is a test for a difference in proportions. Using p1 to represent the proportion of times dogs will open the door when the owner is crying (the distressed condition) and p2 to represent the proportion of times dogs will open the door when the owner is humming (the control condition), the hypotheses are: H0 : Ha :
p1 = p2 p1 > p2
(b) Since the evidence was not strong enough to support the alternative hypothesis, we cannot conclude that dogs are more likely to open the door to be with their owner when the owner is crying compared to when the owner is humming. (Remember that a hypothesis test is about whether we can generalize from a sample to a population, so be sure to state your explanation in terms of the population not the sample.)
(c) This is a test for a difference in means. Using μ1 to represent the mean time for a dog to open the door when the owner is crying (the distressed condition) and μ2 to represent the mean time for a dog to open the door when the owner is humming (the control condition), the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 < μ2
(d) Since the evidence was strong enough to support the alternative hypothesis, we conclude that, on average, dogs will open the door faster to be with their owner when the owner appears to be distressed. (Remember that a hypothesis test is about whether we can generalize from a sample to a population, so be sure to state your explanation in terms of the population not the sample.) 4.17
(a) This is a test for a difference in means, and we use the population parameters μ1 and μ2 when stating hypotheses. We let μ1 represent mean test score for all students in an active learning class on this subject, and μ2 represent the mean test score for all students in a passive learning class on this subject. We are specifically interested in whether μ1 is greater than μ2 , so the hypotheses are: H0 : Ha :
μ1 = μ2 μ1 > μ2
(b) Since there is evidence to support the alternative hypothesis, that means the evidence shows that learning is higher for students in an active learning environment. (c) This is also a test for a difference in means. We let μ1 represent mean rating for all students in an active learning environment, and μ2 represent mean rating for all students in a passive lecture. We are specifically interested in whether μ1 is less than μ2 , so the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 < μ2
(d) Since there is evidence to support the alternative hypothesis, that means the evidence shows that students give higher ratings, and believe that they learned more, when they are in a passive lecture class. (e) Students learn more with active learning. Students think (incorrectly) that they learn more in a passive lecture. 4.18
(a) We define ph to be the proportion with ADHD in the high pesticide group and pl to be the proportion with ADHD in low pesticide group. The hypotheses are: H0 :
ph = pl
Ha :
ph > pl
(b) No. Just because in the sample we have p̂h > p̂l , we can’t assume that the same must be true for the population proportions. (c) Children with high exposure to DAP are more likely to be diagnosed with ADHD than children with low exposure to the pesticide. (Note that these results come from an observational study rather than an experiment so we need to be sure that the statement of our conclusion does not imply causation.)
4.19
(a) We define μb to be the mean number of mosquitoes attracted after drinking beer and μw to be the mean number of mosquitoes attracted after drinking water. The hypotheses are: H0 : Ha :
μb = μw μb > μw
(b) The sample mean number of mosquitoes attracted per participant before consumption for the beer group is 434/25 = 17.36 and is 337/18 = 18.72 for the water group. These sample means are slightly different, but the small difference could be attributed to random chance. (c) The sample mean number of mosquitoes attracted per participant after consumption is 590/25 = 23.60 for the beer group and is 345/18 = 19.17 for the water group. This difference is larger than the difference in means before consumption. It is less likely to be due just to random chance. (d) The mean number of mosquitoes attracted when drinking beer is higher than when drinking water. (e) Since this was an experiment we can conclude causation and, if there is evidence for the alternative hypothesis, we have evidence that beer consumption increases mosquito attraction. 4.20
(a) The parameter is p, the proportion of all court cases going to trial that end in a guilty verdict. The sample statistic is p̂, the proportion of guilty verdicts in the sample of 2000 cases.
(b) The hypotheses are:
H0: p = 0.95
Ha: p ≠ 0.95
(c) How likely is the observed sample proportion when we select a sample of size 2000 from a population with p = 0.95? 4.21
(a) We define μe to be mean BMP level in the brains of exercising mice and μs to be mean BMP level in the brains of sedentary mice. The hypotheses are: H0 :
μe = μs
Ha :
μe < μs
(b) We define μe to be mean noggin level in the brains of exercising mice and μs to be mean noggin level in the brains of sedentary mice. The hypotheses are: H0 : Ha :
μe = μs μe > μs
(c) We define ρ to be the correlation between levels of BMP and noggin in the brains of mice. The hypotheses are:
H0: ρ = 0
Ha: ρ < 0
4.22
(a) Notice that this is a test for a single proportion since there is one sample. We define p to be the proportion of people who prefer Brand A. The hypotheses are: H0 : Ha :
p = 0.5 p > 0.5
(b) Answers will vary. One possibility is that 98 choose Brand A and 2 choose Brand B, giving p̂ = 0.98. (c) Answers will vary. One possibility is that 40 choose Brand A and 60 choose Brand B, giving p̂ = 0.40 which gives no evidence that p > 0.50. (d) Answers will vary. One possibility is that 52 choose Brand A and 48 choose Brand B, giving p̂ = 0.52. We see that p̂ is bigger than 0.50, but could easily occur by random chance even if p is not bigger than 0.5. 4.23 We define μm to be mean heart rate for males being admitted to an ICU and μf to be mean heart rate for females being admitted to an ICU. The hypotheses are: H0 :
μm = μf
Ha :
μm > μf
4.24 We define pw to be the proportion of white ICU patients who receive CPR and pb to be the proportion of black ICU patients who receive CPR. The hypotheses are:
H0: pw = pb
Ha: pw ≠ pb
4.25 We define ρ to be the correlation between systolic blood pressure and heart rate for patients admitted to an ICU. The hypotheses are: H0 :
ρ=0
Ha :
ρ>0
Note: The hypotheses could also be written in terms of β, the slope of a regression line to predict one of these variables using the other.
4.26 Notice that this is a test for a single proportion. We define p to be the proportion of ICU patients who are female. (We could also have defined p to be the proportion who are male. The test will work fine either way.) The hypotheses are:
H0: p = 0.5
Ha: p ≠ 0.5
4.27 We define μ to be the mean age of ICU patients. The hypotheses are: H0 : Ha : 4.28
μ = 50 μ > 50
(a) The parameter of interest is μE − μW, the difference in the average household income of all households east of the Mississippi (μE) and west of the Mississippi (μW). The hypotheses are:
H0: μE = μW
Ha: μE ≠ μW
Note that we could also specify the hypotheses in terms of μE − μW = 0 and μE − μW ≠ 0.
(b) Use the means of the samples of household incomes in each region, xE and xW, and compute the difference in means xE − xW for the best estimate.
4.29
(a) The population parameter of interest is the correlation, ρ, between number of children and household income from all households within the US. The hypotheses are:
H0: ρ = 0
Ha: ρ ≠ 0
(b) We are testing for a relationship, which means a correlation different from 0. So the larger correlation of 0.75 provides more evidence. (c) Since we are simply looking for evidence of a relationship, the same amount of evidence is given from both 0.50 and −0.50. However they would give evidence of a relationship in opposite directions. 4.30
(a) We define μr to be the mean metabolism rate for this species after taking a resveratrol supplement and μp to be the mean metabolism rate after taking a placebo. The hypotheses are: H0 :
μr = μp
Ha :
μr > μp
(b) For species A, the sample mean for the resveratrol group is greater than the mean for the placebo group, but not by much and the distributions in the boxplots overlap quite a bit, so the difference may not be very significant. Thus we probably could not conclude that resveratrol increases metabolism for this species (although we can’t be sure without doing a statistical test). (c) For species B, the sample mean for the resveratrol group is quite a bit greater than the mean for the placebo group. In fact, most of the values in the resveratrol sample are higher than almost all the values in the placebo sample, with less variability in both cases than the plots for species A. This gives stronger evidence that resveratrol increases the metabolism rate for species B. (Although, again, we cannot be sure without doing a statistical test.) 4.31
(a) The null hypothesis (H0 ) is that Muriel has no ability to distinguish whether the milk or tea is poured first, and so her guesses are no better than random. The alternative hypothesis (Ha ) is that Muriel’s guesses for whether the milk or tea are poured first are better than random.
(b) Since there are only two possible answers (tea first or milk first), if she is guessing randomly, we expect her to be correct about half the time. The hypotheses are H0 : p = 0.5 vs Ha : p > 0.5. 4.32
(a) We define μ to be the average amount of omega-3 in one tablespoon of this brand of milled flaxseed. Since the company is looking for evidence that the average is greater than 3800, the hypotheses are: H0 : Ha :
μ = 3800 μ > 3800
(b) We define μ as in part (a). Since the consumer organization is testing to see if there is evidence that the average is less than 3800, the hypotheses are: H0 : Ha :
μ = 3800 μ < 3800
4.33 This analysis does not involve a test because there is no claim of interest. We would likely use a confidence interval to estimate the average.
4.34 This analysis does involve a statistical test. The population parameter p is the proportion of people in the community living in a mobile home. The hypotheses are: H0 : Ha :
p = 0.10 p > 0.10
4.35 This analysis does not include a test because from the information in a census, we can find exactly the true population proportion. 4.36 This analysis does include a statistical test. This is a matched pairs experiment, so the population parameter is μD , the average difference of reaction time (right − left) for all right-handed people. The hypotheses are H0 : μD = 0 vs Ha : μD < 0. We could also write the hypotheses as H0 : μR = μL vs Ha : μR < μL . 4.37 This analysis does not include a test because there is no claim of interest. The analysis would probably include a confidence interval to give an estimate of the average reaction time. 4.38 This analysis does include a statistical test for a single proportion. The population parameter p is the proportion of people in New York City who prefer Pepsi and we are testing to see if the proportion who prefer Pepsi is greater than 50%. The hypotheses are: H0 : Ha :
p = 0.5 p > 0.5
4.39 This analysis does not include a statistical test. Since we have all the information for the population, we can compute the proportion who voted exactly and see if it is greater than 50%. 4.40
(a) We define pc to be the proportion supporting Candidate A after a phone call and pf to be the proportion supporting Candidate A after a flyer. The hypotheses are:
H0: pc = pf
Ha: pc ≠ pf
(b) We see that p̂c = 152/250 = 0.608 and p̂f = 145/250 = 0.580. The sample proportions are not equal. (c) We see that p̂c = 188/250 = 0.752 and p̂f = 120/250 = 0.480. (d) Sample B has stronger evidence of a difference in effectiveness because the sample proportions are farther apart than in Sample A and the sample sizes are the same. 4.41
(a) We define pc to be the proportion supporting Candidate A after a phone call and pf to be the proportion supporting Candidate A after a flyer. The hypotheses are: H0 : Ha :
pc = pf pc > pf
(b) Answers will vary, but we obviously need more support when getting a call. Also remember that there are 100 voters in each group. One possible set of data is:

             Vote A   Not vote A
Phone call     98          2
Flyer          50         50
(c) Answers will vary but, to have no evidence for Ha, we need to show less support when getting a call. One possible set of data is:

             Vote A   Not vote A
Phone call     40         60
Flyer          60         40
(d) Answers will vary, but we need more support with a call, but only by a little. One possible set of data is:

             Vote A   Not vote A
Phone call     52         48
Flyer          50         50
Section 4.2 Solutions
4.42 The sample proportion, p̂
4.43 The sample mean, x
4.44 The sample correlation, r
4.45 The difference in the sample means, x1 − x2
4.46 The difference in the sample proportions, p̂1 − p̂2
4.47 The randomization distribution will be centered at 0.5, since p = 0.5 under H0. Since Ha: p < 0.5, this is a left-tail test.
4.48 The randomization distribution will be centered at 10, since μ = 10 under H0. Since Ha: μ > 10, this is a right-tail test.
4.49 The randomization distribution will be centered at 0, since ρ = 0 under H0. Since Ha: ρ ≠ 0, this is a two-tailed test.
4.50 The randomization distribution will be centered at 0, since μ1 − μ2 = 0 under H0. Since Ha: μ1 ≠ μ2, this is a two-tailed test.
4.51 The randomization distribution will be centered at 0, since p1 − p2 = 0 under H0. Since Ha: p1 > p2, this is a right-tail test.
4.52
(a) We see in the randomization distribution that a sample proportion of p̂ = 0.1 is rare when the null hypothesis is true but sample proportions as extreme occurred several times in this randomization distribution. A value this extreme is (ii): unusual but might occur occasionally.
(b) We see in the randomization distribution that a sample proportion of p̂ = 0.35 is not at all unusual when the null hypothesis is true, so a value this extreme is (i): reasonably likely to occur. (c) We see in the randomization distribution that there are no sample proportions even close to p̂ = 0.6 when the null hypothesis is true, so a sample proportion this far out is (iii): extremely unlikely to ever occur when the null hypothesis is true.
4.53
(a) We see in the randomization distribution that there are no sample means even close to x = 250 when the null hypothesis is true, so a sample mean this far out is (iii): extremely unlikely to ever occur using samples of this size.
(b) We see in the randomization distribution that a sample mean of x = 305 is not unusual when the null hypothesis is true, so a value this extreme is (i): reasonably likely to occur.
(c) We see in the randomization distribution that a sample mean as extreme as x = 315 is rare when the null hypothesis is true, but sample means as extreme occurred several times in this randomization distribution. A value this extreme is (ii): unusual but might occur occasionally.
4.54 This is a right-tail test, so in each case, we are estimating the proportion of the distribution to the right of the value.
(a) x = 68 is far in the right tail of the randomization distribution, so the p-value is quite small, at 0.01.
(b) x = 54 is relatively close to the center of the distribution with roughly 1/3 of the values in the upper tail beyond it, so the p-value is closer to 0.30 than it is to 0.10.
(c) x = 63 is well out in the upper tail with relatively few (about 5%) of the values above it, so the p-value is closer to 0.05. A p-value near 0.50 would come from some x in the middle of the randomization distribution, near 50.
4.55 This is a left-tail test, so in each case, we are estimating the proportion of the distribution to the left of the value.
(a) p̂ = 0.25 is in the lower tail, but not very far, so the p-value is closer to 0.30 than 0.001.
(b) p̂ = 0.15 is farther out in the lower tail, so the p-value should be quite small, around 0.04.
(c) p̂ = 0.35 is in the upper tail of the randomization distribution. Since this is a lower-tail test (Ha: p < 0.3), the p-value for p̂ = 0.35 is the proportion of randomization samples with proportions below 0.35, which will be more than half the values. Thus the p-value is closer to 0.70 than 0.30.
4.56 This is a two-tail test, so in each case, we find the area beyond the value in the smaller tail and then double it.
(a) D = −2.9 is far out in the lower tail of the randomization distribution. Even accounting for the two tails, there are very few points this extreme, so the p-value must be quite small, around 0.01.
(b) D = 1.2 is in the upper tail and its p-value uses the points in the tail beyond it, together with those beyond −1.2 in the other tail. This still leaves more than half of the points in the randomization distribution between −1.2 and +1.2, so the p-value must be less than 0.50. A p-value near 0.30 would be a good estimate of the amount of the distribution beyond ±1.2.
(a) The figures showing the (two-tail) region beyond the observed statistic are shown below.
(b) We see that D = 2.8 is farther out in the tail, so there is less area beyond it (and beyond −2.8). This means D = 2.8 has a smaller p-value and thus provides stronger evidence against H0 . 4.58
(a) The figures showing the (two-tail) region beyond the observed statistic are shown below.
(b) There is less area in the tails beyond ±1.3 than beyond ±0.7. So D = −1.3 has a smaller p-value and provides stronger evidence against H0 . 4.59
(a) We compute the difference in means: D = x1 − x2 = 17.3 − 18.7 = −1.4 and D = x1 − x2 = 19.0 − 15.4 = 3.6. The figures are shown below.
(b) The difference is larger in magnitude for D = x1 − x2 = 19.0 − 15.4 = 3.6 than D = x1 − x2 = 17.3 − 18.7 = −1.4. So x1 = 19.0 with x2 = 15.4 give a sample difference that is farther in a tail of the randomization distribution and provide stronger evidence against H0 . 4.60
(a) We compute the difference in means: D = x1 − x2 = 95.7 − 93.5 = 2.2 and D = x1 − x2 = 94.1 − 96.3 = −2.2. The figures are shown below.
(b) The sample means for both pairs, D = x1 −x2 = 95.7−93.5 = 2.2 and D = x1 −x2 = 94.1−96.3 = −2.2 differ by the same amount, but with opposite signs. Since the p-value for this two tail test uses the values that are more extreme from both tails, each pair produces the same p-value. The evidence against H0 is identical in both cases. 4.61 A randomization distribution with sample proportions for 1000 samples of size 50 when p = 0.50 is shown below. Since the alternative hypothesis is p > 0.5, this is a right-tail test and we need to find how many of the randomization proportions are at (or above) the sample value of p̂ = 0.60. For this randomization distribution that includes 102 out of 1000 simulations so the p-value is 0.102. Answers will vary.
4.62 A randomization distribution with sample proportions for 1000 samples of size 100 when p = 0.50 is shown below. Since the alternative hypothesis is p < 0.5, this is a left-tail test and we need to find how many of the randomization proportions are at (or below) the sample value of p̂ = 0.38. For this randomization distribution, that count is 11 out of 1000 simulations, so the p-value is 0.011. Answers will vary.
4.63 A randomization distribution with sample proportions for 1000 samples of size 200 when p = 0.70 is shown below. Since the alternative hypothesis is p < 0.7, this is a left-tail test and we need to find how many of the randomization proportions are at (or below) the sample value of p̂ = 0.625. For this randomization distribution, that count is 9 out of 1000 simulations, so the p-value is 0.009. Answers will vary.
4.64 A randomization distribution with sample proportions for 1000 samples of size 80 when p = 0.60 is shown below. Since the alternative hypothesis is p > 0.6, this is a right-tail test and we need to find how many of the randomization proportions are at (or above) the sample value of p̂ = 0.65. For this randomization distribution, that count is 214 out of 1000 simulations, so the p-value is 0.214. Answers will vary.
4.65 A randomization distribution with sample proportions for 1000 samples of size 100 when p = 0.5 is shown below. Since the alternative hypothesis is p ≠ 0.5, this is a two-tail test. We need to find how many of the randomization proportions are at (or below) the sample value of p̂ = 0.42 and double to account for the other tail. For this randomization distribution there are 60 out of 1000 simulations beyond 0.42, so the p-value is 2 · 60/1000 = 0.120. Answers will vary.
4.66 A randomization distribution with sample proportions for 1000 samples of size 100 when p = 0.5 is shown below. Since the alternative hypothesis is p ≠ 0.5, this is a two-tail test. We need to find how many of the randomization proportions are at (or above) the sample value of p̂ = 0.70 and double to account for the other tail. For this randomization distribution there are 8 out of 1000 simulations beyond 0.70, so the p-value is 2 · 8/1000 = 0.016. Answers will vary.
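Exercises 4.61 to 4.66 all use the same recipe: simulate many sample proportions with p set to its value under H0, then count how many simulated proportions are at least as extreme as the observed p̂, doubling for a two-tail alternative. The short Python sketch below is one way to carry out that recipe; the function name, seed, and use of numpy are our own choices rather than part of the exercises, and, as the solutions note, the p-value will vary a little from one set of simulations to the next.

import numpy as np

def randomization_pvalue(n, p_null, p_hat, tail, reps=1000, seed=1):
    # Simulate sample proportions for samples of size n when H0: p = p_null is true.
    rng = np.random.default_rng(seed)
    sims = rng.binomial(n, p_null, size=reps) / n
    if tail == "right":
        return np.mean(sims >= p_hat)
    if tail == "left":
        return np.mean(sims <= p_hat)
    # two-tail: double the proportion in the tail containing p_hat
    return 2 * min(np.mean(sims >= p_hat), np.mean(sims <= p_hat))

print(randomization_pvalue(50, 0.5, 0.60, "right"))   # Exercise 4.61
print(randomization_pvalue(200, 0.7, 0.625, "left"))  # Exercise 4.63
print(randomization_pvalue(100, 0.5, 0.70, "two"))    # Exercise 4.66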
4.67 The smaller p-value, 0.08, provides stronger evidence against H0 . 4.68 The smaller p-value, 0.04, provides stronger evidence against H0 . 4.69 The smaller p-value, 0.007, provides stronger evidence against H0 . 4.70 The smaller p-value, 0.0008, provides stronger evidence against H0 . 4.71
(a) The randomization distribution is created assuming the null hypothesis is true, so it is centered at the null hypothesis value of 15. Thus, sample mean B will be closer to the center.
(b) The randomization distribution is centered at 15, so sample mean A is farther out in the tail. (c) The sample mean farther out in the tail of the distribution is less likely to just occur by random chance, so it provides stronger evidence against H0 and in support of Ha . This is sample mean A. (d) In this case, providing evidence against H0 and in support of Ha means having unsafe and toxic levels of lead in the drinking water. 4.72
(a) The randomization distribution is created assuming the null hypothesis is true, so it is centered at the null hypothesis value of a difference in means of 0. Thus, sample mean A will be closer to the center.
(b) The randomization distribution is centered at 0, so sample mean B is farther out in the tail. (c) The sample mean farther out in the tail of the distribution is less likely to just occur by random chance, so it provides stronger evidence against H0 and in support of Ha . This is sample mean B. (d) In this case, providing evidence against H0 and in support of Ha means that we have evidence that students who are not allowed to use electronic devices during class perform better on the exam. 4.73
(a) Using μ as the mean difference in ages, we have H0 : Ha :
μ=0 μ>0
Note that the null hypothesis is that there is no difference in ages between husbands and wives, on average. (b) The randomization distribution is centered at 0, the null hypothesis value, since we create it assuming the null hypothesis is true. (c) The sample statistic is x = 2.829 years. It is way out in the right tail, farther out than anything shown in the randomization distribution. (d) No! It is extremely unlikely for this sample statistic to happen just by random chance. (e) Yes, the sample statistic appears to provide strong evidence against H0 and in support of Ha . (f) Yes, we have evidence that, on average, husbands are older than their wives. 4.74
(a) Using p as proportion with husbands older, we have H0 :
p = 0.5
Ha :
p > 0.5
Note that the null hypothesis is that husbands and wives are equally likely to be the older one.
(b) The randomization distribution is centered at 0.5, the null hypothesis value, since we create it assuming the null hypothesis is true.
(c) The sample statistic is p̂ = 0.714. It is way out in the right tail, farther out than anything shown in the randomization distribution.
(d) No! It is extremely unlikely for this sample statistic to happen just by random chance.
(e) Yes, the sample statistic appears to provide strong evidence against H0 and in support of Ha.
(f) Yes, we have evidence that husbands are more likely to be older than their wives.
4.75
(a) Letting μc and μn represent the average tap rate of people who have had coffee with caffeine and without caffeine, respectively, the null and alternative hypotheses are:
H0: μc = μn
Ha: μc > μn
(b) This is a right-tail test so we shade the area to the right of the statistic 1.6. See the figure. The amount of the distribution that lies to the right of 1.6 is a relatively small portion of the entire graph. It is not as small as 0.03 and is not close to half the data so is not as large as 0.45 or 0.60. The p-value of 0.11 is the most reasonable estimate.
(c) See the figure. The amount of the distribution that lies to the right of the statistic xc − xn = 2.4 is very small, so the p-value is closest to 0.03. (d) The difference in part (c), xc − xn = 2.4, is more extreme in the randomization distribution, so it provides stronger evidence that caffeine increases average finger tapping rate. 4.76
(a) This is a right-tail test, so we shade the upper tail of the distribution beyond D = 0.3. See the figure.
(b) The larger the sample difference p̂c − p̂f , the smaller the p-value. We can also estimate rough areas in the right tail beyond each of the differences p̂c − p̂f . We see that • p̂c − p̂f = 0.1 goes with p-value 0.365 • p̂c − p̂f = 0.3 goes with p-value 0.085 • p̂c − p̂f = 0.5 goes with p-value 0.012 • p̂c − p̂f = 0.65 goes with p-value 0.001
(c) If the two methods (phone call and flyer) are equally effective, the chance that the sample proportion of support among those getting phone calls is this much bigger (0.65) than the sample proportion for those getting flyers, by random chance alone, is about 0.001. (d) The statistic showing the greatest difference, p̂c − p̂f = 0.65, provides the strongest evidence that the proportion is higher with a phone call. 4.77
(a) Since this is a two-tail test, we shade both tails, first beyond ±0.2, then beyond ±0.4. See the figure.
(b) Less than half the area of the distribution lies in the two tails beyond ±0.2, but not a lot less than half, so the p-value of 0.392 would be the best estimate. From the frequencies in the histogram, there appear to be somewhere between 25 and 35 cases below pf − pc = −0.4, so doubling to account for two tails would make 66/1000=0.066 the most reasonable p-value of those listed. (c) The statistic that shows the greatest difference is pf − pc = −0.4, so this statistic is likely to provide the strongest evidence that the methods are not equally effective. 4.78
(a) This is a test for a difference in means. If we use μN for the mean percent decrease in muscle strength for those who do no imagery and μY for the mean percent decrease in muscle strength for those who do imagery, the hypotheses are: H0 :
μN = μY
Ha :
μN > μY
(b) We see that xN − xY = 51.2 − 24.5 = 26.7.
(c) This is a right-tail test so we see what proportion of the samples are greater than the sample statistic of 26.7. We see in the randomization distribution that only two of the 1000 dots are more extreme than 26.7, so the p-value is 2/1000 = 0.002.
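The randomization test in Exercise 4.78 can also be carried out by reallocating the observed values to the two groups at random and recomputing the difference in means each time. The Python sketch below shows the general procedure; because the raw muscle-strength data are not listed here, the two arrays are invented for illustration only, so the printed p-value will not match the 0.002 found from the actual data.

import numpy as np

def randomization_diff_means(group1, group2, reps=1000, seed=3):
    # Right-tail randomization p-value for H0: mu1 = mu2 vs Ha: mu1 > mu2.
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group1, group2])
    n1 = len(group1)
    observed = np.mean(group1) - np.mean(group2)
    diffs = np.empty(reps)
    for i in range(reps):
        shuffled = rng.permutation(pooled)        # reallocate values to the two groups
        diffs[i] = shuffled[:n1].mean() - shuffled[n1:].mean()
    return observed, np.mean(diffs >= observed)

# Invented percent decreases in strength (not the study's data):
no_imagery = np.array([55, 48, 60, 42, 51, 58, 46, 50])
imagery = np.array([30, 22, 28, 18, 25, 27, 21, 24])
print(randomization_diff_means(no_imagery, imagery))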
(a) We are testing whether the proportion p of people who die from colon cancer after having polyps removed in a colonoscopy is less than 0.01. The hypotheses are:
H0: p = 0.01
Ha: p < 0.01
(b) The sample proportion is p̂ = 12/2602 = 0.0046.
(c) We want to see how extreme the sample proportion of 0.0046 is on the randomization distribution. This is a left-tail test, so we are interested in the number of dots to the left of 0.0046. It appears that there are four dots to the left of the sample statistic out of a total of 1000 dots, so p-value = 4/1000 = 0.004.
4.80
(a) If the mean arsenic level is really 80 ppb, the chance of seeing a sample mean as high (or higher) than was observed in the sample from supplier A by random chance is only 0.0003. For supplier B, the corresponding probability (seeing a sample mean as high as B’s when μ = 80) is 0.35.
(b) The smaller p-value for Supplier A provides stronger evidence against the null hypothesis and in favor of the alternative that the mean arsenic level is higher than 80 ppb. Since it is very rare for the mean to be that large when μ = 80, we have stronger evidence that there is too much arsenic in Supplier A’s chickens. (c) The chain should get chickens from Supplier B, since there is strong evidence that Supplier A’s chicken have a mean arsenic level above 80 ppb which is unacceptable. 4.81
(a) This is a test for a difference in means. If we use μC for mean hippocampus volume of the control people who have never played football and μF for mean hippocampus volume for football players, the hypotheses are: H0 : Ha :
μC = μF μC > μF
(b) We see that xC − xF = 7602.6 − 6459.2 = 1143.4. (c) This is a right-tail test. In the randomization distribution from 2000 simulations below, we see that there are no dots larger than the sample statistic of 1143.4, so the proportion more extreme is 0.000. Therefore, the p-value is approximately 0.000. Answers may vary, but p-value should be very small.
(d) This difference in means was more extreme than any that we saw in the simulated samples, so it is very unlikely that this difference is just the result of random chance.
4.82
(a) This is a test for a difference in means. If we use μN for mean hippocampus volume of football players with no concussion and μY for mean hippocampus volume for football players with a history of concussions, the hypotheses are: H0 : Ha :
μN = μY μN > μY
(b) We see that xN − xY = 6459.2 − 5734.6 = 724.6. (c) This is a right-tail test. In the randomization distribution for 2000 samples below, we see that there are no dots larger than the sample statistic of 724.6, so the proportion more extreme on the right side is about 0.000. The p-value is approximately 0.000. Answers may vary, but p-value should be very small.
(d) This difference in means in the original sample was far more extreme than any that we saw in the simulated samples, so it is very unlikely that this difference is just the result of random chance. 4.83
(a) This is a test for a single proportion, and the hypotheses are: H0 :
p = 0.5
Ha :
p > 0.5
(b) The sample proportion is p̂ = 38/53 = 0.717. (c) Using StatKey or other technology to create the randomization distribution, we see that the proportion in the right-tail beyond the sample statistic of 0.717 is very small. To three decimal places, we have p-value = 0.000. 4.84
(a) This is a two-tail test for a single proportion, and the hypotheses are:
H0: p = 0.5
Ha: p ≠ 0.5
(b) The sample proportion is p̂ = 14/31 = 0.452.
(c) We use StatKey or other technology to create the randomization distribution. This is a two-tail test, and the sample statistic is to the left of the center, so we find the area to the left of the sample statistic to be about 0.36. Since this is a two-tail test, we double this to account for both sides of the distribution, and have p-value = 2 · (0.36) = 0.72. 4.85
(a) This is a test for a difference in proportions. Using p1 to represent the proportion of times hiding rats will pick an opaque box and p2 to represent the proportion of times seeking rats will pick an opaque box, the hypotheses are: H0 :
p1 = p2
Ha :
p1 > p2
(b) The sample proportion for hiding rats is p̂1 = 38/53 = 0.717 and the sample proportion for seeking rats is p̂2 = 14/31 = 0.452, so the difference in proportions is p̂1 − p̂2 = 0.717 − 0.452 = 0.265. (c) Using StatKey or other technology to create the randomization distribution, we see that the proportion in the right-tail beyond the sample statistic of 0.265 is about 0.015. We have p-value = 0.015. 4.86
(a) This is a test for a difference in proportions. If we use pM for the proportion of US men who own a smartphone and pW for the proportion of US women who own a smartphone, the hypotheses are:
H0: pM = pW
Ha: pM ≠ pW
(b) We see that p̂M = 688/989 = 0.696 and p̂W = 671/1012 = 0.663. The sample statistic is p̂M − p̂W = 0.696 − 0.663 = 0.033. We see that a larger proportion of men own smartphones. (c) This is a two-tail test. The sample statistic 0.033 is in the right tail of the randomization distribution and we see that the proportion of the 3000 simulated samples more extreme than 0.033 in that direction is about 0.052. Since this is a two-tail test, we multiply by 2: p-value = 2 · 0.052 = 0.104.
4.87
(a) This is a test for a difference in proportions. If we use pM for the proportion of US men who own a tablet and pW for the proportion of US women who own a tablet, the hypotheses are:
H0: pM = pW
Ha: pM ≠ pW
(b) We see that p̂M = 197/455 = 0.433 and p̂W = 235/504 = 0.466. The sample statistic is p̂M − p̂W = 0.433 − 0.466 = −0.033. We see that a larger proportion of women own tablets. (c) This is a two-tail test. The sample statistic −0.033 is in the left tail of the randomization distribution below and we see that the proportion of 3000 simulated samples more extreme than −0.033 in that tail is about 0.165. Since this is a two-tail test, we multiply by 2: p-value = 2 · 0.165 = 0.33.
4.88
(a) If we let p̂T be proportion with staph infections of those with triclosan in their system and p̂N be the proportion with staph infections of those without triclosan in their systems, we have p̂T = 24/37 = 0.649 and p̂N = 15/53 = 0.283, so the difference in sample proportions is p̂T − p̂N = 0.649 − 0.283 = 0.366.
(b) This is a test for a difference in proportions, so the hypotheses are: H0 :
pT = pN
Ha :
pT > pN
(c) In a randomization distribution below, we see that only 2 of the 2000 matched the original difference of 0.366 and none were larger. The p-value from this distribution is 2/2000 = 0.001.
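The same reallocation idea applies here with a difference in proportions as the statistic: pool the 39 staph infections among all 90 people, reassign them at random to groups of 37 and 53, and see how often the simulated difference reaches the observed 0.366. A minimal Python sketch is below; the function name and seed are our own choices, and any one set of 2000 simulations will give a p-value close to, but not exactly, the 0.001 reported above.

import numpy as np

def randomization_diff_props(x1, n1, x2, n2, reps=2000, seed=5):
    # Right-tail randomization p-value for H0: p1 = p2 vs Ha: p1 > p2.
    rng = np.random.default_rng(seed)
    observed = x1 / n1 - x2 / n2
    pooled = np.array([1] * (x1 + x2) + [0] * (n1 + n2 - x1 - x2))
    count = 0
    for _ in range(reps):
        shuffled = rng.permutation(pooled)        # reassign infections at random
        diff = shuffled[:n1].mean() - shuffled[n1:].mean()
        if diff >= observed:
            count += 1
    return observed, count / reps

print(randomization_diff_props(24, 37, 15, 53))   # triclosan vs no triclosan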
4.89
(a) Because she guessed 8 out of 8 correctly, p̂ = 8/8 = 1.
(b) Using the Test for Single Proportion applet on StatKey with count 8, sample size 8, and p = 0.5 we find that only 20 out of 5000 simulated samples had this maximum proportion of 1, so right-tailed p-value is 20/5000 = 0.004. 4.90 The randomization distribution represents samples chosen when H0 is true. The area in a tail gives an estimate of the probability that a result as extreme (or more extreme) than the original sample should occur when H0 is true, which is what the p-value measures. 4.91
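This p-value can also be checked exactly: if each of the 8 guesses is right with probability 0.5 under H0, the chance of getting all 8 correct is 0.5^8 = 1/256 ≈ 0.0039, which agrees closely with the simulated 20/5000 = 0.004. The short sketch below (our own code, not part of the exercise) compares the exact value with a simulation like the one in StatKey.

import random

exact = 0.5 ** 8           # P(8 correct out of 8) when guessing, about 0.0039

random.seed(2)
reps = 5000
hits = sum(1 for _ in range(reps)
           if sum(random.random() < 0.5 for _ in range(8)) == 8)
print(exact, hits / reps)  # exact right-tail probability vs simulated proportion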
(a) Yes; average weight gain is 1.85 lbs higher for people eating ultra-processed food as opposed to unprocessed.
(b) Yes; it was randomly determined whether participants ate ultra-processed or unprocessed food first. Due to this random assignment, people eating the two diets should look similar at baseline. (c) Yes; the observed statistic is very extreme on the randomization distribution, so the observed sample statistic of 1.85 would be very extreme, just by random chance, if there were no real difference in weight gain due to diet. This provides evidence against just random chance. (d) Yes; the sample statistic shows higher weight gain under the ultra-processed diet as opposed to the processed diet, and we have evidence against alternative explanations (ii) and (iii), leaving evidence for (i), the causal explanation. 4.92
(a) Yes; average daily caloric intake is 507.7 calories higher for people eating ultra-processed food as opposed to unprocessed.
(b) Yes; it was randomly determined whether participants ate ultra-processed or unprocessed food first. Due to this random assignment, people eating the two diets should look similar at baseline. (c) Yes; the observed statistic is very extreme on the randomization distribution, so the observed sample statistic of 507.8 would be very extreme, just by random chance, if there were no real difference in caloric consumption due to diet. This provides evidence against just random chance.
(d) Yes; the sample statistic shows higher caloric consumption under the ultra-processed diet as opposed to the processed diet, and we have evidence against alternative explanations (ii) and (iii), leaving evidence for (i), the causal explanation. 4.93
(a) Sugar consumption was an average of 3.25 grams higher under the unprocessed diet.
(b) Yes; it was randomly determined whether participants ate ultra-processed or unprocessed food first. Due to this random assignment, people eating the two diets should look similar at baseline. (c) No; the observed statistic is in the middle of the randomization distribution. This could easily occur just by random chance, if there were no real difference in sugar consumption due to diet. This does not provide evidence against just random chance. (d) No; because we do not have evidence against (iii), we are unable to distinguish between the causal explanation and just random chance. 4.94
(a) Yes; glucagon levels are an average of 1.25 pg/mL higher for people eating ultra-processed foods as opposed to unprocessed.
(b) Yes; it was randomly determined whether participants ate ultra-processed or unprocessed food first. Due to this random assignment, people eating the two diets should look similar at baseline. (c) No; the observed statistic is in the middle of the randomization distribution. This could easily occur just by random chance, if there were no real difference in glucagon levels due to diet. This does not provide evidence against just random chance. (d) No; because we do not have evidence against (iii), we are unable to distinguish between the causal explanation and just random chance. 4.95
(a) Since a fair die has six equally likely results, we should roll a five on 1/6 of all throws. The question asks if fives are more common than would be expected for a fair die, so this is a right-tail test with H0 : p = 1/6 vs Ha : p > 1/6, where p is the proportion of throws that show a five.
(b) If H0 : p = 1/6 is true, the sample proportions should be centered around the null proportion p = 1/6. (c) Answers can vary, and any value of p̂ < 1/6 = 0.1666 . . . will work. (d) If p̂ < 1/6 it will be to the left of the center of the randomization distribution at p = 1/6. (e) Since this is an upper tail Ha , the p-value is found by the part of the randomization that is above the observed statistic, so we find the area to the right. (f) Since a sample p̂ below 1/6 is in the left half of the randomization distribution, the p-value above it must be more than 0.50. 4.96
(a) X = 8 is 5 above the expected count of 3. A point as far away in the other direction would be at −2. It’s impossible to have negative values when counting how many students choose a number, so there could never be points that far away in the other direction.
(b) To find the p-value we double the proportion of randomization samples that are at or above X = 8. This only happened 3+1=4 times in 1000 randomizations, so the p-value = 2 · 4/1000 = 0.008. (c) The smallest possible lower tail p-value is 0.046 which would occur if none of the thirty students in the sample picked zero.
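Under our reading of the setup in Exercise 4.96 (each of the 30 students picks a digit from 0 to 9 at random, which is why the expected count is 3), the count choosing any particular digit follows a binomial distribution with n = 30 and p = 0.1, and the tail probabilities can be computed exactly. The sketch below does this; the exact values will differ somewhat from the counts seen in any one set of 1000 randomizations, such as the 4/1000 and 0.046 quoted above.

from math import comb

n, p = 30, 0.1

# Exact upper tail: P(X >= 8)
upper = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8, n + 1))

# Smallest possible lower-tail probability: P(X = 0) = 0.9^30
lower_min = (1 - p) ** n

print(round(upper, 4), round(2 * upper, 4), round(lower_min, 4))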
Section 4.3 Solutions
4.97 Reject H0, since p-value = 0.0007 < 0.05.
4.98 Reject H0, since p-value = 0.0320 < 0.05.
4.99 Do not reject H0, since p-value = 0.2531 ≥ 0.05.
4.100 Do not reject H0, since p-value = 0.1145 ≥ 0.05.
4.101 The results are significant if the p-value is less than the significance level. A p-value of 0.0320 shows the results are significant at a 10% and 5% level, but not a 1% level. 4.102 The results are significant if the p-value is less than the significance level. A p-value of 0.2800 would mean the results are not significant at any of these levels. 4.103 The results are significant if the p-value is less than the significance level. A p-value of 0.008 shows the results are significant at all three levels. 4.104 The results are significant if the p-value is less than the significance level. A p-value of 0.0621 shows the results are significant at a 10% level, but not significant at 5% or 1% levels. 4.105 The results are significant if the p-value is less than the significance level. (a) I. 0.0875, less than 0.10, but not less than 0.05. (b) IV. 0.00003, smaller than any reasonable significance level. (c) II. 0.5457, larger than any reasonable significance level. (d) III. 0.0217, less than 0.05, but not less than 0.01. 4.106
(a) II. 0.0571, less than 0.10, but not less than 0.05.
(b) I. 0.00008, smaller than any reasonable significance level. (c) IV. 0.1753, larger than any reasonable significance level. (d) III. 0.0368, less than 0.05, but not less than 0.01. 4.107 Test A, since we can be more sure the p-value is quite small. It’s likely that Test B has a p-value between 0.05 and 0.10, since otherwise they probably would have reported it as being significant at a smaller level than 10%. 4.108 The p-value is very low, so we have evidence against H0 and for Ha . There is strong evidence that rats understand that they should be quiet when hiding, but that squeaking with excitement while seeking is fine. The test shows that mean vocalization is higher when seeking than when hiding. 4.109 We know that the smaller the p-value, the stronger the evidence for an effect. Therefore, the smaller p-value of 0.001 goes with the test of patient-reported pain and the larger p-value of 0.47 goes with the test of spine mobility. 4.110 (a) Since strong evidence was found for this association, the p-value is small. The correct p-value for this test is 0.009.
(b) The evidence in this test was not significant, so the p-value is large. The correct p-value for this test is 0.371. (c) Since the evidence in this test was significant at the 5% level but not the 1% level, the p-value must be smaller than 0.05 but larger than 0.01. The correct p-value for this test is 0.031. 4.111 (a) A p-value of 0.031 provides some evidence of a difference, so this p-value goes with the test for a difference between the illustrated and animated formats. (b) A p-value of 0.258 does not provide evidence of a difference, so this p-value goes with the test for a difference between the audio and illustrated formats. (c) A p-value of 0.006 provides strong evidence of a difference, so this p-value goes with the test for a difference between the audio and animated formats. 4.112 (a) The study found evidence to show a difference between the groups in spatial memory, which implies a low p-value for that test. The p-value for the test of spatial memory must be 0.0001. The study did not find a significant difference between the groups in amount of time exploring objects, which implies a relatively high p-value for that test. The p-value for the test of time exploring objects must be 0.7. (b) The title implies causation, which is justified since the results come from a randomized experiment and we observed an effect. 4.113 The test showing statistical evidence of improvement should have a small p-value, while the test showing no significant improvement should have a larger p-value. Since mice exposed to UV did “significantly better”, that test should have the small p-value of 0.002. There was no significant improvement for the mice given vitamin D, so that test should have the large p-value of 0.472. 4.114
(a) This is a one-tailed test for a difference in means, so the hypotheses are: H0 :
μM = μF
Ha :
μM > μF
where μM and μF represent mean nasal volume for males and females, respectively. (b) Since the p-value is less than 0.01, we reject H0 and find evidence that males do have larger noses on average. 4.115
(a) This is a two-tailed test for a difference in means, so the hypotheses are:
H0: μM = μF
Ha: μM ≠ μF
where μM and μF represent mean nasal tip angle for males and females, respectively. (b) Since the p-value is greater than 0.05, we do not reject H0 and do not have evidence at a 5% level of a difference in average tip angle between males and females. 4.116 (a) Both age and nose size are quantitative variables so this is a one-tailed test for a correlation. The hypotheses are: H0 :
ρ=0
Ha :
ρ>0
where ρ represents the correlation between age and nose size.
(b) Since the p-value is less than 0.001, we reject H0 and find strong evidence of a positive correlation between age and nose size. Noses do continue to grow as people age! 4.117 (a) The explanatory variable is whether or not antibiotics were given during the first year of life, and the response variable is whether or not the child was categorized as overweight at age 12. Both are categorical. (b) This is an observational study since the explanatory variable was not manipulated. (c) This is a one-tailed hypothesis test for a difference in proportions. Using pA to represent the proportion of children who are overweight of those who have been given antibiotics in infancy and pN to represent the proportion of children who are overweight of those who have not been given antibiotics, we have: H0 : Ha :
pA = pN pA > pN
(d) This is a test for a difference in proportions, so the sample statistic is p̂A − p̂N, where p̂A represents the sample proportion of children overweight in the group that received antibiotics in infancy and p̂N represents the sample proportion of children overweight in the group that did not receive antibiotics. We are told that p̂A = 0.324 and p̂N = 0.182, so the sample statistic is: p̂A − p̂N = 0.324 − 0.182 = 0.142.
(e) We are told that the p-value is 0.002 so the conclusion is to reject H0. The p-value is very small so the evidence is strong.
(f) No, we cannot conclude causation since the data come from an observational study.
4.118
(a) The p-value (0.003) is small so the decision is to reject H0 and conclude that the mean recall for sleep (xs = 15.25) is different from the mean recall for caffeine (xc = 12.25). Since the mean for the sleep group is higher than the mean for the caffeine group, we have sufficient evidence to conclude that mean recall after sleep is in fact better than after caffeine. Yes, sleep is really better for you than caffeine for enhancing recall ability.
(b) The p-value (0.06) is not less than 0.05 so we would not reject H0 at a 5% level, but it is less than 0.10 so we would reject H0 at a 10% level. There is some moderate evidence of a difference in mean recall ability between sleep and a placebo, but not very strong evidence.
(c) The p-value (0.22) is larger than any common significance level, so do not reject H0. The placebo group had a better mean recall in this sample (xp = 13.70 compared to xc = 12.25), but there is not enough evidence to conclude that the mean recall for the population would be different for a placebo than for caffeine.
(d) Get a good night's sleep!
4.119
(a) Students who think the drink is more expensive solve, on average, more puzzles than students who have a discounted price. The p-value is very small so the evidence for this conclusion is very strong.
(b) If you price a product too low, customers might perceive it to be less effective or lower quality than it actually is.
4.120
(a) The hypotheses are H0: ρ = 0 vs Ha: ρ < 0, where ρ is the correlation between pH and fish mercury levels in all Florida lakes.
(b) The very small p-value (0.000017) indicates that we should reject H0: ρ = 0 in favor of Ha: ρ < 0. There is very strong evidence of a negative correlation between mercury content of fish and acidity of Florida lakes.
(c) The data are from an observational study and not an experiment, so we can't conclude that low pH causes increased mercury in the fish.
4.121
(a) The hypotheses are H0: p = 0.5 vs Ha: p ≠ 0.5, where p is the proportion of penalty shots that go to the right after a specific body movement.
(b) The p-value (0.3184) is not small so we do not reject H0. There is no evidence that this movement helps predict the direction of the shot, so there is no reason to learn to distinguish it.
(c) The p-value (0.0006) is smaller than any common significance level so we reject H0. The proportion of shots to the right after this movement is different from 0.50, so a goalie can gain an advantage by learning to distinguish this movement.
4.122
(a) Since the results were statistically significant at a 5% significance level, we know the p-value was less than 0.05.
(b) Since we aren't given an exact p-value and can only deduce that the p-value is less than 0.05, we can't determine whether the evidence is very strong or only moderately strong.
(c) Yes, since the results are statistically significant we can conclude that pesticide exposure is related to ADHD.
(d) No. The data were collected with an observational study and not an experiment, so we should not draw a cause-and-effect conclusion.
4.123
(a) Since they randomly applied treatments (filtered or polluted air) to mice, this is a randomized experiment.
(b) We are testing whether or not a difference exists, in either direction, so this will be a two-tailed test. This means a null hypothesis of no difference in mean insulin resistance between treatments, H0: μFA = μPM, and an alternative that there is a difference, Ha: μFA ≠ μPM.
(c) Since the −4.4 is smaller than all the values in the 1,000 random simulations shown in the histogram of the exercise, the p-value is very small (≈ 0.000) and we will reject the null hypothesis.
(d) The p-value is essentially zero.
(e) Since the p-value is very small (essentially zero), we reject H0 and conclude that there is strong evidence that mean insulin resistance scores are significantly affected by air pollution. There appears to be a strong connection between insulin resistance (diabetes) and air pollution.
4.124
(a) Yes, since the p-value is very small, there is strong evidence against the null and in favor of the alternative that more mosquitoes, on average, are attracted to those who drink beer than those who drink water.
(b) A p-value less than 0.001 indicates that such extreme data would be very rare if beer and water had the same effect, so there is very strong evidence against the null hypothesis.
(c) Since the data were collected with an experiment where the conditions (beer or water) were randomly assigned, we can infer from the small p-value that consuming beer causes an increase in mosquito attraction.
4.125
(a) The small p-value (less than 0.01) gives strong evidence that Y-maze performance improves for mice after 7–10 days of exercise.
(b) The small p-value (less than 0.01) gives strong evidence that BMP levels decrease for mice after 2 or more days of exercise. (c) The very small p-value (less than 0.001) gives very strong evidence that noggin levels increase for mice after 7–10 days of exercise. (d) The strongest statistical effect is the test for noggin level which appears to have the smallest p-value. (e) In mice that exercise, Y-maze performance improves after 7–10 days, BMP levels decrease after 2 or more days, and noggin levels increase after 7–10 days. Exercise appears to have a very significant positive effect on brain function in mice. 4.126 In each case, this is a test for a single proportion, with p equal to the proportion of times two-year-olds will pick the second option when presented with two choices. In both cases, the hypotheses are: H0 : Ha :
p = 0.5 p > 0.5
(a) We have p̂ = 248/480 = 0.517. Using StatKey or other technology to create a randomization distribution, we see that the proportion in the right-tail beyond 0.517 is about 0.22. We have p-value = 0.22. This p-value is larger than any reasonable significance level, so we do not reject H0 . We do not have evidence that two-year-olds are more likely to choose the second option when they are presented with pictures. (b) We have p̂ = 409/480 = 0.852. Using StatKey or other technology to create a randomization distribution, we see that the sample statistic of 0.852 is way off the scale in the right-tail. The proportion in the right-tail beyond 0.852 is essentially zero. We have p-value = 0.000. This p-value is smaller than any reasonable significance level, so we reject H0 . We have very strong evidence that two-year-olds are more likely to choose the second option when presented with a verbal binary-choice question. 4.127 This is a difference in means test. We use μ1 for the mean improvement for those with a healthy diet and μ2 for the mean improvement for those who don’t change diet. The hypotheses are: H0 : Ha :
μ1 = μ2 μ1 > μ2
We see using StatKey or other technology that x1 = 7.865 and x2 = 0.763 so the sample statistic for the difference in means is about x1 − x2 = 7.1. Using a randomization distribution, we see that the proportion in the right-tail beyond 7.1 is about 0.001, so we have p-value = 0.001. Since this p-value is smaller than any reasonable significance level, we reject H0 . We have evidence that, on average, eating healthy for three weeks can improve depression symptoms.
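For exercises like 4.127 and 4.128 that start from two columns of raw data, the randomization can be run with the same shuffle-the-labels procedure sketched after Exercise 4.78: pool the two samples, repeatedly shuffle which values are labeled "healthy diet" and which "no change," and find the proportion of shuffles whose difference in means is at least as large as the observed 7.1 (or 0.158 in 4.128). Reusing that earlier randomization_diff_means function, a hypothetical call would look like the line below; the arrays diet_group and control_group are placeholders for the actual improvement scores, which are not listed here.

# Assuming diet_group and control_group hold the observed improvement scores:
# print(randomization_diff_means(diet_group, control_group, reps=1000))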
4.128 This is a difference in means test. We use μ1 for the mean reduction in BMI for those with a healthy diet for three weeks and μ2 for the mean reduction in BMI for those who don’t change diet. The hypotheses are: H0 : Ha :
μ1 = μ2 μ1 > μ2
We see using StatKey or other technology that x1 = 0.276 and x2 = 0.118 so the sample statistic for the difference in means is about x1 − x2 = 0.158. Using a randomization distribution, we see that the proportion in the right-tail beyond 0.158 is about 0.29, so we have p-value = 0.29. Since this p-value is larger than any reasonable significance level, we do not reject H0. We do not have evidence that mean reduction in BMI is higher for people eating healthy for three weeks than for people who don’t change their diet.
4.129 This is a test for a difference in proportions. If we let p1 represent the proportion of teens with high social media use who are diagnosed with ADHD and p2 represent the proportion of teens with low social media use who are diagnosed with ADHD, the hypotheses are H0 : p1 = p2 vs Ha : p1 > p2.
We see that p̂1 = 16/165 = 0.097 and p̂2 = 23/495 = 0.046, so the sample statistic is p̂1 − p̂2 = 0.097 − 0.046 = 0.051. Using StatKey or other technology to create a randomization distribution, we see that the proportion in the right-tail beyond the sample statistic of 0.051 is about 0.007. We have p-value = 0.007. Since this p-value is smaller than any reasonable significance level, we reject H0. We have evidence that teens with a high frequency of social media use are more likely to develop ADHD symptoms. Note, however, that this result comes from an observational study so we cannot conclude causation.
4.130 (a) This is a difference in proportions test. Considering women of a similar age and using a period of time similar to the study, we use p1 for the proportion of women dying who walk at the level of Quartile 1 and p2 for the proportion of women dying who walk at the level of Quartile 2. The hypotheses are H0 : p1 = p2 vs Ha : p1 > p2.
We see using StatKey or other technology that p̂1 = 275/4185 = 0.066 and p̂2 = 103/4185 = 0.025 so the sample statistic for the difference in proportions is p̂1 − p̂2 = 0.041. Using a randomization distribution, we see that the sample statistic of 0.041 is very far out in the right-tail and the proportion in the right-tail beyond 0.041 is essentially zero. To three decimal places, we have p-value = 0.000. Since this p-value is very small, we reject H0 and have very strong evidence that, at this level, increasing number of steps per day has a very significant impact on mortality.
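A minimal Python sketch of the randomization just described for part (a) is shown below (as an alternative to StatKey). It uses the counts reported above, 275 deaths out of 4185 women in Quartile 1 and 103 out of 4185 in Quartile 2, and pools the outcomes before re-dealing them at random to the two groups.

import numpy as np

rng = np.random.default_rng(7)
deaths1, n1 = 275, 4185
deaths2, n2 = 103, 4185
observed = deaths1 / n1 - deaths2 / n2            # about 0.041

# Under H0: p1 = p2, pool all outcomes and reshuffle them between the groups.
pooled = np.array([1] * (deaths1 + deaths2) + [0] * (n1 + n2 - deaths1 - deaths2))
diffs = np.empty(5000)
for i in range(5000):
    rng.shuffle(pooled)
    diffs[i] = pooled[:n1].mean() - pooled[n1:].mean()

p_value = np.mean(diffs >= observed)              # right-tail test for Ha: p1 > p2
print(observed, p_value)                          # the p-value is essentially zero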
(b) This is a difference in proportions test. Considering women of a similar age and using a period of time similar to the study, we use p1 for the proportion of women dying who walk at the level of Quartile 3 and p2 for the proportion of women dying who walk at the level of Quartile 4. The hypotheses are H0 : p1 = p2 vs Ha : p1 > p2.
We see using StatKey or other technology that p̂1 = 77/4186 = 0.0184 and p̂2 = 49/4185 = 0.0117 so the sample statistic for the difference in proportions is p̂1 − p̂2 = 0.0067. Using a randomization distribution, we see that the proportion in the right-tail beyond 0.0067 is about 0.01. We have p-value = 0.01. At a 5% significance level, we reject H0 and have some evidence that, at this level, increasing number of steps per day has a significant impact on mortality. (c) The title of the problem (Want to Live Longer? Walk More!) implies causation, but that is not appropriate here because this data is from an observational study. There are many confounding variables in this situation. (Can you name some?)
4.131 This is a difference in means test. We use μ1 for the mean speed of hens and μ2 for the mean speed of cocks. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 ≠ μ2.
We see using StatKey or other technology that x1 = 632.2 and x2 = 631.6 so the sample statistic for the difference in means is about x1 − x2 = 0.6. Using a randomization distribution, we see that the proportion in the right-tail beyond 0.6 is about 0.48. This is a two-tail test, so we have p-value = 2 · 0.48 = 0.96. Since this p-value is larger than any reasonable significance level, we do not reject H0. We do not have evidence that mean speed is different between male and female pigeons.
4.132 This is a difference in means test. We use μ1 for the mean z-score reaction time on this task for students in air-conditioning and μ2 for the mean z-score reaction time on this task for students during a heat wave who are not in air-conditioning. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 < μ2.
We see using StatKey or other technology that x1 = −0.031 and x2 = 0.072 so the sample statistic for the difference in means is about x1 − x2 = −0.1. Using a randomization distribution, we see that the proportion in the left-tail beyond −0.1 is about 0.235. We have p-value = 0.235. Since this p-value is larger than any reasonable significance level, we do not reject H0 and we do not have evidence that air-conditioning improves mean reaction time on this task.
4.133 This is a difference in means test. We use μ1 for the mean z-score reaction time on this task for students in air-conditioning and μ2 for the mean z-score reaction time on this task for students during a heat wave who are not in air-conditioning. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 < μ2.
We see using StatKey or other technology that x1 = −0.157 and x2 = 0.623 so the sample statistic for the difference in means is about x1 − x2 = −0.78. Using a randomization distribution, we see that the proportion in the left-tail beyond −0.78 is about 0.0035. We have p-value = 0.0035. Since this p-value is very small, we reject H0 and have strong evidence that air-conditioning improves mean reaction time on this task.
4.134 Because this is a matched pairs experiment, we do a test for a single mean on the differences. We let μ = the mean difference in ratings that men would give after smelling female tears or after smelling a placebo. The hypotheses are H0 : μ = 0 vs Ha : μ ≠ 0.
We see using StatKey or other technology that the sample mean difference in ratings is x = 18.48. Using a randomization distribution, we see that the proportion in the tail beyond 18.48 is about 0.12. Since this is a two-tail test, we have p-value = 2(0.12) = 0.24. Since this p-value is larger than any reasonable significance level, we do not reject H0 and we do not have evidence that tears contain a chemical signal affecting sadness ratings.
4.135 Because this is a matched pairs experiment, we do a test for a single mean on the differences. We let μ = the mean difference in ratings that men would give after smelling female tears or after smelling a placebo. The hypotheses are H0 : μ = 0 vs Ha : μ ≠ 0.
We see using StatKey or other technology that the sample mean difference in ratings is x = 21.64. Using a randomization distribution, we see that the proportion in the tail beyond 21.64 is about 0.01. Since this is a two-tail test, we have p-value = 2(0.01) = 0.02. At a 5% significance level, we reject H0. We do have evidence that tears contain a chemical signal affecting sexual arousal ratings.
4.136 Because this is a matched pairs experiment, we do a test for a single mean on the differences. We let μ = the mean difference (Placebo − Tears) in testosterone levels in men after smelling a placebo or after smelling female tears. The hypotheses are H0 : μ = 0 vs Ha : μ > 0.
We see using StatKey or other technology that the sample mean difference in testosterone levels is x = 15.32. Using a randomization distribution, we see that the proportion in the right tail beyond 15.32 is about 0.002. We have p-value = 0.002. This is significant at almost any reasonable significance level, so we reject H0. We do have evidence that female tears contain a chemical signal affecting testosterone levels in men.
4.137 This is a test for a single proportion, since there is only one sample of 95 attacks. We can test the proportion of attacks before the full moon, or after — it doesn’t matter which since if we know one we know the other. We let p represent the proportion of attacks that happen after the full moon. For the sample data we see p̂ = 71/95 = 0.747. Since “equally split” means p = 0.5, we have as our hypotheses H0 : p = 0.5 vs Ha : p > 0.5.
We use StatKey or other technology to create a randomization distribution using proportions for samples of size n = 95 simulated when p = 0.5 to match this null hypothesis. One such randomization distribution is shown below. Since the alternative hypothesis is Ha : p > 0.5, this is a right-tail test. We see that the sample statistic p̂ = 71/95 = 0.747 is well beyond any of the randomization proportions, so we see that the p-value is essentially zero. At any significance level, we reject H0 . This data provides very strong evidence that lions are more likely to attack during the five days after a full moon than during the five days before. Watch out for lions after a full moon!
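The sketch below (a Python alternative to the StatKey simulation just described) generates proportions for samples of size n = 95 when p = 0.5 and finds the right-tail proportion beyond p̂ = 71/95.

import numpy as np

rng = np.random.default_rng(4)
n, p_null = 95, 0.5
observed = 71 / 95
# Each simulated value is the proportion of "after full moon" attacks when p = 0.5.
sim_props = rng.binomial(n, p_null, size=10000) / n
p_value = np.mean(sim_props >= observed)          # right-tail test for Ha: p > 0.5
print(p_value)                                    # essentially zero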
4.138 (a) Using pE to represent the proportion of people who can solve the problem while getting electrical stimulation and pN to represent the proportion of people that can solve the problem with no electrical stimulation, we are testing H0 : pE = pN vs Ha : pE > pN.
Using a randomization distribution from StatKey or other technology, we use the sample statistic p̂E − p̂N = 0.6 − 0.2 = 0.40 to estimate the p-value as the area in the upper tail. For the distribution shown below, we see that the p-value is about 0.009. There is strong evidence to reject H0 and conclude that people are more likely to solve the problem if they get electrical stimulation.
(b) Yes. The results are significant, and because this was a well-designed experiment, we can conclude that there is a causal relationship. Electrical stimulation of the brain appears to help people find fresh insight to solve a new problem.
4.139 (a) This is an experiment since a treatment was actively manipulated. In fact, it is a matched pairs experiment. The experiment cannot be blind to the subject since s/he will know which muscle is being massaged. However the person measuring the level of inflammation should not know which is which. (b) The mean difference is xD = 1.30. (c) We define μD to be the mean difference in inflammation in muscle between a muscle that has not been massaged and a muscle that has (using control level minus massage level). The null hypothesis is no difference from a massage and the alternative is that levels are lower in a muscle that has been massaged. The hypotheses are H0 : μD = 0 vs Ha : μD > 0.
(d) Using StatKey or other technology, we create a randomization distribution of mean differences under the null hypothesis that μD = 0. One such distribution is shown below. Since the alternative hypothesis is Ha : μD > 0, this is a right-tail test. Using our sample mean xD = 1.30, we see that 24 of the 1000 simulated samples were as extreme so the p-value is 0.024.
(e) Based on the randomization distribution and p-value in (d), the results are significant at a 5% level but not at a 1% level. Using a 5% level, we reject H0 and find evidence that massage does reduce inflammation in muscles after exercise.
4.140 (a) Let p be the true proportion that answer correctly. Under random guessing, the three choices are equally likely answers and therefore p = 1/3. So, the null and alternative hypotheses are H0 : p = 1/3 and Ha : p ≠ 1/3. (b) Since 5% of the 1005 people in the sample gave the correct answer, the count of correct answers is about 50. Using StatKey to generate a randomization distribution for proportions of samples of size 1005 when p = 1/3 shows no values anywhere near the p̂ = 0.05 in the original sample. Thus the p-value ≈ 0.000. (c) The p-value is less than any commonly used significance threshold. Therefore, we reject the null hypothesis and conclude that US citizens give the correct answer less often than would be expected if they were randomly guessing.
4.141 (a) Let pD be the true proportion of degree holders that answer correctly, and pN be the true proportion of non-degree holders that answer correctly. The null and alternative hypotheses are H0 : pD = pN and Ha : pD ≠ pN. (b) The test statistic is p̂D − p̂N = 45/373 − 57/639 = 0.1206 − 0.0892 ≈ 0.031. The randomization distribution for differences in proportions shown below gives p-value = 2 · 0.059 = 0.118.
(c) The p-value is greater than any commonly used significance threshold. There is not sufficient evidence to conclude that degree holders give the correct answer more (or less) frequently than non-degree holders.
4.142 (a) The relevant parameters are the average difference between actual and scheduled arrival time for Delta (μD) and United (μU). The null hypothesis is μD = μU and the alternative hypothesis is μD ≠ μU. (b) The sample mean for the Delta flights is −5.65 minutes (so 5.65 minutes early), and the sample mean for the United flights is 4.21 minutes late. The difference in means is xD − xU = −5.65 − 4.21 = −9.86. (c) Generating 10,000 samples in a randomization distribution with StatKey we don’t find a single sample with a difference as extreme as xD − xU = −9.86 so p-value ≈ 0. (d) Since the p-value is very small we reject the null hypothesis in favor of the alternative. We have strong evidence the mean difference between actual and scheduled arrival times is smaller for Delta than United.
4.143 (a) H0 : μi = μu, Ha : μi < μu, where μi and μu are the average time to flee for lizards from invaded and uninvaded habitats (respectively). (b) In the randomization distribution, none of the simulated statistics are as extreme as the observed statistic of xi − xu = 25.73 − 46.88 = −20.15, so the p-value is very close to 0. (c) Because the p-value is so small, we find very strong evidence that the mean time to flee is shorter for lizards from invaded habitats compared to habitats without fire ants. (d) No. This was not a randomized experiment (whether each lizard came from an uninvaded or invaded habitat was not randomly assigned), so we cannot make conclusions about causality.
4.144 (a) H0 : μi = μu, Ha : μi > μu, where μi and μu are the mean number of twitches for lizards from invaded and uninvaded habitats (respectively). (b) In the randomization distribution below, only 1 out of 5000 simulated statistics is as extreme as the observed statistic of xi − xu = 2.75 − 1.1 = 1.65, so the p-value is 0.0002.
(c) This is a small p-value so we reject H0 . We have strong evidence that the mean number of twitches when lizards from an invaded habitat encounter fire ants is larger than for lizards from an uninvaded habitat.
4.145 We choose a one-tailed test H0 : μI = μS vs Ha : μI < μS , where μI and μS are the mean costs in the Individual and Split situations. The difference in sample means is xI − xS = 37.29 − 50.92 = −13.63. A randomization distribution for differences in means under this null hypothesis is shown below.
In this simulation only 4 of 5000 randomizations gave a difference below the observed −13.63 to produce a p-value of 0.0008. This is a very small p-value so we have very strong evidence that the mean meal cost is lower when people are paying individually than when they are splitting the bill with a group. 4.146 (a) No, case-control studies can never be used to infer causality because they are observational studies; neither variable is manipulated by the researchers. (b) Choosing controls similar to cases is a way of eliminating some confounding variables. For example, if each control was the same age, sex, and socioeconomic status as a case, we can rule these out as confounding variables. However, because we cannot ensure that controls are similar to cases regarding all potential confounding variables, we still cannot make conclusions about causality. (c) p̂1 − p̂2 = 136/262 − 220/522 = 0.098 (d) We use technology to generate the randomization distribution below, and find the proportion of statistics in the upper tail with a difference of at least 0.098 or higher. This yields a p-value of 0.0042.
(e) This p-value is very small, and so results this extreme would be very unlikely to happen just by random chance. This provides convincing evidence that there is an association between owning a cat as a child and developing schizophrenia. Note: This does not imply there is a causal relationship.
Section 4.4 Solutions
4.147 (a) In the randomization distribution below (on the left), we see that the p-value is 0.182. The results are not significant when n = 100. (b) In a randomization distribution below (in the middle), we see that the p-value is 0.016. The results are significant when n = 500. (c) In a randomization distribution below (on the right), we see that the p-value is 0.001. The results are significant and the evidence is very strong when n = 1000.
We find the strongest evidence for the alternative hypothesis with the largest sample size, when n = 1000. 4.148 (a) In a randomization distribution below (on the left), we see that the p-value is 0.167. The results are not significant when n = 50. (b) In a randomization distribution below (on the right), we see that the p-value is 0.0002. The results are significant and the evidence is very strong when n = 500.
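The effect of sample size seen in 4.147 and 4.148 can be illustrated with a small Python sketch: the same sample proportion produces a much smaller p-value when it comes from a larger sample. The values p̂ = 0.58 and the null p = 0.5 below are hypothetical stand-ins, not the numbers from these exercises.

import numpy as np

rng = np.random.default_rng(2)
p_null, p_hat = 0.5, 0.58          # hypothetical null value and sample proportion
for n in (50, 500):
    sims = rng.binomial(n, p_null, size=10000) / n     # randomization proportions under H0
    print(n, "p-value:", np.mean(sims >= p_hat))       # right-tail p-value shrinks as n grows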
We find the strongest evidence for the alternative hypothesis with the largest sample size, when n = 500. 4.149 (a) In a randomization distribution below (on the left), we see that the p-value is 0.317. The results are not significant when n1 = 30 and n2 = 20. (b) In a randomization distribution below (on the right), we see that the p-value is 0.007. The results are significant and the evidence is strong when n1 = 300 and n2 = 200.
We find the strongest evidence for the alternative hypothesis with the largest sample size, when n1 = 300 and n2 = 200. 4.150 (a) In a randomization distribution below (on the left), we see that the p-value is 0.272. The results are not significant when n1 = n2 = 20. (b) In a randomization distribution below (in the middle), we see that the p-value is 0.0012. The results are significant when n1 = n2 = 200. (c) In a randomization distribution below (on the right), we see that the p-value is 0.000. With these sample sizes, there are no simulated samples anywhere near as extreme as 0.15. The results are significant and the evidence is very strong when n1 = n2 = 2000.
We find the strongest evidence for the alternative hypothesis with the largest sample size, when n1 = n2 = 2000.
4.151 If the null hypothesis is true, we are still likely to find significance in 5% of the tests, which is 0.05 × 100 = 5 of the tests.
4.152 If the null hypothesis is true, we are still likely to find significance in 1% of the tests, which is 0.01 × 300 = 3 of the tests.
4.153 If the null hypothesis is true, we are still likely to find significance in 10% of the tests, which is 0.10 × 40 = 4 of the tests.
4.154 If the null hypothesis is true, we are still likely to find significance in 5% of the tests, which is 0.05 × 800 = 40 of the tests.
4.155 Only choice (c) “The probability of seeing data as extreme as the sample, when the null hypothesis, H0, is true.” matches the definition of the p-value. The other choices are common misinterpretations of a p-value. The p-value does not measure the probability of any hypothesis being true (or false) or the chance of making either type of error. It only measures how unusual the original data would be if the null hypothesis were true.
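The expected counts in 4.151–4.154 all come from the same calculation, expected false positives = α × (number of tests); the short sketch below simply tabulates it for the four exercises.

# Expected number of falsely significant tests when every null hypothesis is true.
for alpha, m in [(0.05, 100), (0.01, 300), (0.10, 40), (0.05, 800)]:
    print(f"alpha = {alpha}, tests = {m}, expected significant by chance = {alpha * m}")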
4.156 (a) We know the two-tailed p-value is less than 0.05, but can’t tell if it’s also less than 0.01, so we can’t make a decision for the 1% test. (b) If we reject H0 at a 5% level, the p-value is less than 0.05, so it is also less than 0.10 and thus we would also reject H0 at a 10% level. (c) If the sample correlation r > 0, its p-value in an upper tail test would be one half of the two-tailed p-value. Since the two-tailed p-value is less than 0.05, half of it will be even smaller and thus also less than 0.05. Since the p-value for the one-tailed test is less than 5%, H0 will be rejected at a 5% level, so the conclusion is valid. (Note, however, if the sample r < 0 (i.e., is in the lower tail) then its upper-tail p-value would be more than 0.50 and show no evidence to reject H0 : ρ = 0 in favor of Ha : ρ > 0.)
4.157 (a) The population of interest is all Euchre games that could be played between these two teams. The parameter of interest is the proportion of games that a certain team would win, say p = the proportion of all possible games that team A wins. (We could also just as easily have used team B.) (b) We are testing to see whether this proportion is either significantly higher or lower than 0.5. The hypotheses are H0 : p = 0.5 vs Ha : p ≠ 0.5.
(c) The sample statistic is the proportion of games played so far that team A has won. We could choose to look at the proportion of wins for either team, but must be consistent defining the population parameter and calculating the sample statistic. We also need to keep track of the sample size (number of games played). (d) No. Even if the two teams are equal (p = 0.5), it is quite possible that one team could win the first two games just by random chance. Therefore, even if one team wins the first two games, we would not have conclusive evidence that that team is better. (e) The game will last longer with the smaller significance level, 1%, that requires stronger evidence to determine a winner for the competition. Any game standings that are significant at a 1% level will also be significant at a 5% level, but the results may become significant with a p-value below 5% before the p-value gets below 1%.
4.158 (a) The null hypothesis is pD = pU and the alternative hypothesis is pD ≠ pU, where pD represents the proportion of Delta flights that are at least 30 minutes late and pU represents the proportion of United flights that are at least 30 minutes late. (b) The statistic being recorded is the difference in proportions, p̂D − p̂U, which for our observed sample is 45/1000 − 114/1000 = 0.045 − 0.114 = −0.069. (c) Generating 5000 samples in StatKey (below left) we don’t get a single sample with a difference as extreme as 0.045 − 0.114 = −0.069 so we have p-value ≈ 0. (d) Since the p-value is less than α we reject the null hypothesis in favor of the alternative, meaning that we do find evidence that there is a significant difference in the proportion of flights arriving more than 30 minutes late and that proportion this late is smaller for Delta than United. (e) With the new sample sizes, the difference is about the same size, 4/88 − 10/88 = 0.0455 − 0.1136 = −0.0681, but we would not come to the same conclusion. We see in a randomization distribution (below right) that the p-value is 2 · 0.082 = 0.164, so we would fail to reject the null hypothesis. With this sample, we don’t have enough evidence to claim that the proportion of delayed flights is different.
4.159 (a) The company is most likely to reject H0 : μ = 3800 (and conclude the mean is more than 3800 mg) if they use a 10% significance level. Any results that are significant at 5% or 1% are also significant at 10%. This increases their chances of making a Type I error, but the results are only internal to the company so they aren’t too concerned about such an error. They should use a 10% level. (b) If the consumer organization is worried about making a mistake in challenging the company’s claim of μ = 3800, they should control the chance of making a Type I error by using a small significance level, such as 1%. This would mean they would only launch the suit if they have very strong evidence against H0 : μ = 3800 to show that the mean is less than 3800 mg. They should use a 1% level.
4.160 A Type I error (releasing a drug that is really not more effective) when there are serious side effects should be avoided, so it makes sense to use a small significance level such as α = 0.01.
4.161 A Type I error (saying there’s a difference in TV habits by gender for the class, when actually there isn’t) is not very serious, so a large significance level such as α = 0.10 will make it easier to see any difference.
4.162 A Type I error (saying your average Wii bowling score is higher than a friend’s, when it isn’t) is not very serious, so a large significance level such as α = 0.10 will make it easier to see any difference.
4.163 A Type I error (suing the company when they are not lying) is quite serious so it makes sense to use a small significance level such as α = 0.01.
4.164 A Type I error (getting people to take the supplements when they don’t help) is not serious if there are no harmful side effects, so a large significance level, such as α = 0.10, will make it easier to see any benefit of the supplements.
4.165 The company would prefer a large significance level, such as α = 0.10, which means they are more likely to find enough evidence to show that the drug works better. Consumers would prefer a small significance level, such as α = 0.01, so they can be very sure that the drug works better before paying the higher cost.
4.166 Type I error: Release a drug that is really not more effective. Type II error: Fail to show the drug is more effective, when actually it is. Personal opinions will vary on which is worse.
4.167 Type I error: Conclude there’s a difference in TV habits by gender for the class, when actually there is no difference. Type II error: Find no significant difference in TV habits by gender, when actually there is a difference. Personal opinions will vary on which is worse.
4.168 Type I error: Find your average Wii bowling score is higher than a friend’s, when actually it isn’t. Type II error: The sample does not contain enough evidence to conclude that your average Wii bowling score is better than your friend’s mean, when actually it is better. Personal opinions will vary on which is worse.
4.169 Type I error: Sue the company when they are not lying. Type II error: Let the company off the hook, when they are actually lying in their advertising. Personal opinions will vary on which is worse.
4.170 Type I error: Conclude people should take the supplements when they actually don’t help. Type II error: Fail to detect that the supplements are beneficial, when they actually are. Personal opinions will vary on which is worse.
4.171 (a) Since the p-value is less than 0.05, we reject H0 and conclude that phone calls are more effective at generating support for the candidate than flyers. (b) Since the decision in (a) is to reject H0, the error we might be making is a Type I error, which means we conclude that phone calls are more effective when actually they aren’t. (c) Since the p-value is not less than 0.05, we do not reject H0, and conclude that there is insufficient evidence to show phone calls are better than flyers. (d) Since the decision in (c) is to not reject H0, the possible error is a Type II error, which means we conclude that phone calls are not more effective than flyers, when actually phone calls are more effective.
4.172 (a) If the sample shows significant results we reject H0. If that conclusion is right, we have not made an error. If that conclusion is wrong (i.e., H0 is true) we’ve made a Type I error. (b) If the sample shows insignificant results we do not reject H0. If that conclusion is right, we have not made an error. If that conclusion is wrong (i.e., H0 is false) we’ve made a Type II error. (c) We would need to know the actual value of the parameter for the population to verify if we made the correct decision or an error, and if we knew the actual value of the parameter we would not need to do any statistical inference.
4.173 (a) The hypotheses are H0 : μ = 8 vs Ha : μ < 8, where μ is the mean weight loss in the first month of the program. (b) The p-value (0.02) is small so we reject H0. The results are statistically significant and the FTC can conclude that the mean weight loss is less than 8 pounds. (c) The results are not practically significant, because, to most people, a mean loss of 7.9 pounds is about the same as a mean loss of 8 pounds. So even though the mean might technically be below the claim of 8 pounds, many people would still say the company’s advertising is essentially correct.
4.174 (a) We let μI and μC represent the mean score on the HRSIW subtest for kindergartners who get iPads and kindergartners who don’t get iPads, respectively. The hypotheses are H0 : μI = μC vs Ha : μI > μC.
(b) Since the p-value (0.006) is very small, we reject H0 . There is evidence that the mean score on the HRSIW subtest for kindergartners with iPads is higher than the mean score for kindergartners without iPads. The results are statistically significant.
(c) The school board member could be arguing whether a 2 point increase in the mean score on one subtest really matters very much (is practically significant). Even though this is a statistically significant difference, it might not be important enough to justify the considerable cost of supplying iPads to all kindergartners.
4.175 The problem of multiple tests tells us that, even if there is no effect, we will reject H0 about 5% of the time just due to random chance. So about 1 in 20 experiments will show an effect even if there is no effect really there. Of course, the “one in twenty” could happen in the first (or only) experiment, or it could happen in the first 10 we do. If we conduct 10 tests and the iPad effect shows up as significant in only one of the 10, there is a good chance that the effect is relatively minimal. We might want to continue collecting data before running out to buy iPads for all kindergartners!
4.176 (a) The p-value for the test of difference in means for number of days on a ventilator is 0.15, since that test is not significant. The p-value for difference in means for number of days out of the ICU is 0.03, since that test is significant. (b) Since 46 tests were conducted and the significance level is 5%, we expect about 0.05 · 46 = 2.3 of the tests to show significance just by random chance, even if vitamin C has no effect. Thus, the three significant results could easily just be the result of random chance.
4.177 (a) If extra arts education has no effect, we expect 0.05 · 72 = 3.6 of them to be significant at the 5% level and 0.01 · 72 = 0.72 of them to be significant at the 1% level, just by random chance. (b)
(i) If additional arts education has no effect, we expect 0.05 · 48 = 2.4 of them to be significant at the 5% level and 0.01 · 48 = 0.48 of them to be significant at the 1% level. Since only 1 and 0, respectively, were significant, increased arts education does not appear to have a significant effect in these four areas. (ii) If additional arts education has no effect, we expect 0.05 · 24 = 1.2 of them to be significant at the 5% level and 0.01 · 24 = 0.24 of them to be significant at the 1% level. Since 15 and 6 tests, respectively, were significant at these levels (far more than expected just by random chance), increased arts education does appear to have a significant effect in the areas of discipline and writing (although we should still be aware of the effect of multiple tests overall.)
4.178 (a) If there is no effect due to food choices and α = 0.01, a Type I error should occur on about 1% of tests. We see that 1% of 133 is 1.33, so we would expect one or two tests to show significant evidence just by random chance. (b) When testing more than 100 different foods, making at least one Type I error is fairly likely. (c) No, even if the proportion of boys is really higher for mothers who eat breakfast cereal, the data were obtained from an observational study and not an experiment. A headline that implies eating breakfast cereal will cause an increase in the chance of conceiving a boy is not appropriate without doing an experiment where cereal habits are controlled.
4.179 Even if a new drug is no better than a placebo, we are likely to get significant results at a 5% level about 5% of the time (or about 1 time out of 20). That means, if a drug company could afford to run 40 experiments, they would likely get two showing significance even if the drug provides no benefit. The FDA should revise its guidelines so that the experiments that do not show effectiveness are also reported.
4.180 (a) We should definitely be less confident. If the authors conducted 42 tests, it is likely that some of them will show significance just by random chance even if massage does not have any significant effects. It is possible that the result reported earlier just happened to be one of the random significant ones.
(b) Since none of the tests were significant, it seems unlikely that massage affects muscle metabolites. (c) Now that we know that only eight tests were testing inflammation, and that four of those gave significant results, we can be quite confident that massage does reduce muscle inflammation after exercise. It would be very surprising to see four p-values (out of eight) less than 5% if there really were no effects at all. 4.181 (a) Because the results are so similar from study to study, it gives us more faith that the results of one single study were not just a fluke and provides more credibility for any single dataset. (b) We create a randomization distribution for each dataset and find the p-value as the proportion of simulated statistics at or above the observed difference in proportions for that dataset. This yields a p-value very close to 0 for the 1987 data, a p-value of about 0.02 for the 1992 data, and a p-value of about 0.006 for the 1997 data.
(c) Yes, all datasets yield significant results, which should decrease suspicions of Type I errors. It would be very unlikely that all three datasets would yield significant results just by random chance! (d) Even though the difference in proportions is smallest for the 1982 data and largest for the 1992 data, the p-value is smallest for the 1982 data because the sample size is so large, and the p-value is largest for the 1992 data because the sample size is smaller than the other two studies.
4.182 (a) We create a randomization distribution for the difference in proportions, and calculate the p-value as the proportion of simulated statistics at least as high as the sample statistic, p̂1 − p̂2 = 0.0091, yielding a p-value of 0.106.
(b) The p-value of 0.106 is not smaller than α = 0.05, so we do not reject the null. We do not have sufficient evidence to conclude that mate choice improves offspring fitness in fruit flies. (c) We create a randomization distribution, and calculate the p-value as the proportion of simulated statistics at least as high as the sample mean difference, x = 1.82 flies, yielding a p-value of 0.204.
(d) The p-value of 0.204 is not smaller than α = 0.05, so we do not reject the null. We do not have sufficient evidence to conclude that mate choice improves offspring fitness in fruit flies. (e) The follow-up study did not find significant results, so if mate choice really does improve offspring fitness in fruit flies, a Type II error would have been made. (f) The original study did find significant results, so if mate choice really does not improve offspring fitness in fruit flies, a Type I error would have been made. 4.183 (a) If mate choice has no impact on offspring fitness, we would expect 5% of the p-values to be significant just by random chance, so about 0.05 × 50 = 2.5 p-values. (b) We count 8 of the 50 p-values are less than 0.05, so 8 are significant at a 5% level. (c) No, it would not be appropriate or ethical to report just the result for just one of the fifty runs. When multiple tests are being conducted, some will be significant just by random chance, and it is misleading and not ethical to (knowingly) pick out the significant results and ignore the insignificant ones. (d) Yes. If the p-value for the upper tail rounds to 1.00, the p-value for the lower tail would round to 1 − 1.00 = 0.00, which is significant. (e) When multiple tests are conducted, even if the null hypothesis is true, we would expect to get some significant results (in either direction) just by random chance. (f) Because Type I errors can occur (significant results can happen just by chance), it is important to validate findings with replication studies. If studies replicate the original result, the original study gains credibility, but if follow-up studies fail to replicate the original result (or yield conflicting results), the conclusion of the original study loses credibility.
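The multiple-testing point in 4.183(a)–(b) is easy to check by simulation: run the same test many times when H0 is true and count how often the p-value falls below 0.05. The sketch below uses a one-proportion test with n = 100 and p = 0.5 as a hypothetical stand-in for the fifty fruit fly runs in the exercise.

import numpy as np

rng = np.random.default_rng(6)
alpha, runs = 0.05, 50
significant = 0
for _ in range(runs):
    p_hat = rng.binomial(100, 0.5) / 100              # observed proportion when H0 is true
    sims = rng.binomial(100, 0.5, size=2000) / 100    # randomization distribution under H0
    p_value = np.mean(sims >= p_hat)                  # right-tail p-value
    significant += p_value < alpha
print(significant, "of", runs, "tests significant; we expect roughly", alpha * runs)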
Section 4.5 Solutions
4.184 (a) We use a confidence interval since we are estimating the proportion of voters who support the proposal and there is no claimed parameter value to test. (b) We use a hypothesis test since we are testing the claim H0 : ph = pa vs Ha : ph > pa. (c) We use a hypothesis test to test the claim H0 : p = 0.5 vs Ha : p > 0.5. (d) Inference is not relevant in this case, since we have information on the entire populations.
4.185 (a) We use a confidence interval since we are estimating the proportion who wash their hands and there is no claimed parameter value to test. (b) We use a confidence interval, since the question is “how much more?” not “is it more?”. (c) Inference is not relevant in this case, since we have information on the entire population. (d) We use a confidence interval since we are estimating the mean calorie intake and there is no claimed parameter to test.
4.186 (a) Since the null μ = 15 is in the 95% confidence interval, we do not reject H0 using a 5% significance level. (b) Since the null μ = 15 is outside of the 95% confidence interval, we reject H0 using a 5% significance level. (c) Since the null μ = 15 is in the 90% confidence interval, we do not reject H0 using a 10% significance level.
4.187 (a) Since the null p = 0.5 is outside of the 95% confidence interval, we reject H0 using a 5% significance level. (b) Since the null p = 0.5 is in the 95% confidence interval, we do not reject H0 using a 5% significance level. (c) Since the null p = 0.5 is in the 99% confidence interval, we do not reject H0 using a 1% significance level.
4.188 (a) Since the null ρ = 0 is outside of the 95% confidence interval, we reject H0 using a 5% significance level. The interval contains only positive values for the correlation, providing evidence that the correlation for the population is positive. (b) Since the null ρ = 0 is outside of the 90% confidence interval, we reject H0 using a 10% significance level. The interval contains only negative values for the correlation, providing evidence that the correlation for the population is negative. (c) Since the null ρ = 0 is in the 99% confidence interval, we do not reject H0 using a 1% significance level.
4.189 (a) Since the null μ1 − μ2 = 0 is outside of the 95% confidence interval, we reject H0 using a 5% significance level. The interval includes only positive values, suggesting μ1 is larger than μ2. (b) Since the null μ1 − μ2 = 0 is in the 99% confidence interval, we do not reject H0 using a 1% significance level. (c) Since the null μ1 − μ2 = 0 is outside of the 90% confidence interval, we reject H0 using a 10% significance level. The interval includes only negative values, suggesting μ2 is larger than μ1.
4.190 (a) Since the null p = 0.5 is inside the 95% confidence interval, we do not reject H0 using a 5% significance level. (b) Since the null p = 0.75 is above the 95% confidence interval, we reject H0 using a 5% significance level. (c) Since the null p = 0.4 is below the 95% confidence interval, we reject H0 using a 5% significance level.
4.191 (a) Since the null μ = 100 is below the 99% confidence interval, we reject H0 using a 1% significance level. (b) Since the null μ = 150 is inside the 99% confidence interval, we do not reject H0 using a 1% significance level. (c) Since the null μ = 200 is above the 99% confidence interval, we reject H0 using a 1% significance level.
4.192 (a) Since the null p1 − p2 = 0 is below the 90% confidence interval, we reject H0 using a 10% significance level. (b) Since the null p1 − p2 = 0 is below the 90% confidence interval, we reject H0 : p1 = p2 vs the two-tailed alternative Ha : p1 ≠ p2 using a 10% significance level. That means that the p-value for the two-tailed test is less than α = 0.10. Since the confidence interval contains only positive values, we also know that p̂1 − p̂2 > 0. Thus the original difference is in the upper tail of a randomization distribution for the test and the proportion beyond it (half the p-value) must be less than 0.05. This means the upper-tail test would lead to rejecting H0 at a 5% significance level. (c) The confidence interval having positive endpoints shows that p̂1 > p̂2 in the original sample, so there is no reasonable evidence in the direction of the alternative Ha : p1 < p2. We do not reject H0.
4.193 (a) The hypotheses are H0 : p = 0.21 and Ha : p ≠ 0.21. The hypothesized proportion of 0.21 lies inside the confidence interval 0.175 to 0.225, so p = 0.21 is a reasonable possibility for the population proportion given the sample results. Using a 5% significance level, we do not reject H0 and do not find evidence that Congressional approval in December 2019 was different than 21%. (b) The hypotheses are H0 : p = 0.09 and Ha : p ≠ 0.09. The hypothesized proportion of 0.09 lies outside the confidence interval 0.175 to 0.225, so p = 0.09 is not a reasonable possibility for the population proportion given the sample results. Using a 5% significance level, we reject H0 and conclude that Congressional approval in May 2019 was different than the record low of 9%.
4.194
(a) No treatments were controlled so this is an observational study.
(b) The proportion of melanomas on the left side is p̂ = 31/42 = 0.738. (c) We are 95% sure that between 57.9% and 86.1% of all melanomas occur on the left side. (d) The hypotheses are H0 : p = 0.5 vs Ha : p > 0.5, where p is the proportion of melanomas on the left. (e) This is a one-tailed test since the question asks about the proportion on the left being more than on the right. (f) Reject H0 , since the plausible values for the proportion are 0.579 to 0.862 which are all above the hypothesized 0.50. (g) Since the p-value is small (less than any reasonable significance level), we reject H0 . This provides strong evidence that melanomas are more likely to occur on the left side than the right side. (h) No, since the data were not collected with an experiment we cannot infer a cause and effect relationship between these variables.
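The logic used throughout 4.186–4.194 is the correspondence between tests and confidence intervals: reject H0 at level α exactly when the null value falls outside the matching (1 − α) confidence interval. A tiny Python sketch of that rule, using the interval quoted in part (f) above and the interval from 4.193(a):

def reject_via_ci(null_value, ci_lower, ci_upper):
    # Reject H0 when the null value is outside the confidence interval.
    return not (ci_lower <= null_value <= ci_upper)

print(reject_via_ci(0.50, 0.579, 0.862))   # True: reject H0 : p = 0.5, as in 4.194(f)
print(reject_via_ci(0.21, 0.175, 0.225))   # False: do not reject, as in 4.193(a)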
4.195 (a) Randomly assign half the subjects to read a passage in a print book and the other half to read the same passage with an e-book. Record the time needed to complete the reading for each participant. Other approaches are feasible, for example a matched pairs design where each subject reads similar length passages with both sources. (b) With a test we can determine whether the means for the populations are likely to differ, not just the samples. A test also gives us the strength of evidence for the result. (c) A confidence interval for the difference in means gives an estimate of how large the effect might be, not just whether there is one. Also, like a test, a confidence interval allows us to generalize to the population. (d) The first quotation only tells us whether the results are significant and not which method is faster. The second quotation gives additional information to show the mean reading time was fastest for the print book and slowest for the Kindle, with the iPad in between.
4.196 (a) Since 0.5 is not in the 95% confidence interval, it is not a plausible value for the population proportion, so we reject H0 . There is evidence, at the 5% level, that the proportion with both partners reporting being in a relationship on Facebook is more than 0.5. (b) Since 0.5 is in the 95% confidence interval, it is a plausible value for population proportion, so we do not reject H0 . There is not enough evidence, at the 5% level, to conclude that the proportion with both showing a partner in their Facebook profile pictures is different from 0.5.
4.197 (a) If we let pF and pM represent the proportion of rats showing compassion between females and males, respectively, we have H0 : pF = pM vs Ha : pF ≠ pM.
(b) The null hypothesis value for the difference in proportions is pF − pM = 0. Since 0 is not in the confidence interval, we reject H0 and find, at the 5% level, that there is evidence of a difference in proportions of compassionate rats between the two genders. (c) Since we are estimating pF − pM and all plausible values are positive, we have strong evidence that pF −pM > 0 which means pF > pM . This indicates that female rats are more likely to show compassion. 4.198 (a) The proportion of home wins in the sample is p̂ = 70/120 = 0.583. Using StatKey or other technology, we construct a bootstrap distribution of sample proportions such as the one below. We see that a 90% confidence interval in this case goes from 0.508 to 0.658. We are 90% confident that the home team will win between 50.8% and 65.8% of soccer games in this league.
(b) To test if the proportion of home wins differs from 0.5, we use H0 : p = 0.5 vs Ha : p ≠ 0.5. (c) Since 0.5 is not in the interval in part (a), we reject H0 at the 10% level. The proportion of home team wins is not 0.5, at a 10% level. (d) We create a randomization distribution (shown below) of sample proportions when n = 120 using p = 0.5. Since this is a two-tailed test, we have p-value = 2(0.041) = 0.082.
(e) At a 10% significance level, we reject H0 and find that the proportion of home team wins is different from 0.5. Yes, this does match what we found in part (c). (f) The confidence interval shows an estimate for the proportion of times the home team wins, which the p-value does not give us. The p-value gives a sense for how strong the evidence is that the proportion of home wins differs from 0.5 (only significant at a 10% level in this case). (g) The bootstrap and randomization distributions are similar, except that the bootstrap proportions are centered at the original p̂ = 0.583, while the randomization proportions are centered at the null hypothesis, p = 0.5.
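The contrast described in part (g) can be seen directly in a short simulation. The sketch below builds both distributions for 4.198 in Python: bootstrap proportions resampled from the observed 70 wins in 120 games (equivalent to a Binomial count with p̂ = 70/120), and randomization proportions generated under the null p = 0.5.

import numpy as np

rng = np.random.default_rng(11)
n, wins = 120, 70
p_hat = wins / n

boot = rng.binomial(n, p_hat, size=10000) / n   # bootstrap proportions, centered near p-hat
rand = rng.binomial(n, 0.5, size=10000) / n     # randomization proportions, centered at 0.5

ci_90 = np.percentile(boot, [5, 95])            # 90% bootstrap confidence interval
p_value = 2 * np.mean(rand >= p_hat)            # two-tailed randomization p-value
print(ci_90, p_value)                           # roughly (0.51, 0.66) and roughly 0.08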
4.199 (a) This is an observational study. Randomization was used to select the sample of 50 stocks, so there should be no bias. (b) Number the stocks 001 to 500, put the numbers in a hat, draw out 50 to identify the sample, or use a random number generator. (c) Most changes are near zero (most slightly above), with a few outliers in both directions. (d) We see that the mean is x = 0.124 and the standard deviation is s = 0.99. The five number summary is (−3.27, −0.03, 0.11, 0.32, 4.86). (e) Using the 2.5%-tile and 97.5%-tiles from a bootstrap distribution (shown below) of 1000 means for samples of size 50, we obtain the 95% confidence interval (−0.142, 0.403). We are 95% sure that the average stock price change for S&P 500 stocks during this period was between −$0.142 and $0.403.
(f) Since the question does not specify a particular direction, the hypotheses are H0 : μ = 0 vs Ha : μ ≠ 0. Since μ = 0 falls within the 95% confidence bounds of part (e) we should not reject H0 at a 5% level. We do not find evidence that the mean change is different from zero. (g) To test for a positive mean change the hypotheses are H0 : μ = 0 vs Ha : μ > 0. In one randomization distribution (shown below) of means for 1000 samples of size 50 when μ = 0, we find 185 values at (or beyond) the observed mean of 0.124. This gives a one-tailed p-value of 0.185, which is not small, so we do not reject H0. We do not have enough evidence to show that the mean price change for all S&P 500 stocks during this period was positive.
(h) Since the decision was to not reject H0, only a Type II error is possible. To see if that error was made, we could find the mean price change for all 500 stocks during the period and see if it really was positive.
4.200 (a) We are testing H0 : μ = 10 vs Ha : μ ≠ 10, where μ represents the mean longevity, in years, of all mammal species. The mean longevity in the sample is x = 13.15 years. We use StatKey or other technology to create a randomization distribution such as the one shown below. For this two-tailed test, the proportion of samples beyond x = 13.15 in this distribution gives a p-value of 2·0.005 = 0.010. We have strong evidence that mean longevity of mammal species is different from 10 years.
(b) Since, for a 5% significance level, we reject μ = 10 as a plausible value of the population mean in the hypothesis test in part (a), 10 would not be included in a 95% confidence interval for the mean longevity of mammals.
4.201 (a) We are testing H0 : μ = 200 vs Ha : μ ≠ 200, where μ represents the mean gestation time, in days, of all mammals. The mean gestation in the sample is x = 194.3 days. We use StatKey or other technology to create a randomization distribution such as the one shown below. For this two-tailed test, the proportion of samples beyond x = 194.3 in this distribution gives a p-value of 2·0.388 = 0.776. This is not a small p-value, so we do not reject H0. There is not sufficient evidence to say that mean gestation time is different from 200 days.
(b) Since we do not reject μ = 200 as a plausible value of the population mean in the hypothesis test in part (a), 200 would be included in a 95% confidence interval for the mean gestation time of mammals. 4.202
(a) Answers vary. For example, one possible randomization sample is shown below.
Caffeine:    244 250 248 246 248 245 246 247 248 246    xc = 246.8
No caffeine: 250 244 252 248 242 250 242 245 242 248    xn = 246.3
(b) Answers vary. For the randomization sample above, xc − xn = 246.8 − 246.3 = 0.5. (c) The sample difference of 0.5 for the randomization above would fall a bit to the right of the center of the randomization distribution.
4.203
(a) We find that x = 91 for the sample of n = 6 arsenic levels.
(b) Since the sample mean (91) is 11 more than the null hypothesized mean (80), we subtract 11 from each data value to get: 57, 64, 70, 82, 84, 123 to move the mean down to the null mean of 80. The sample size (n = 6) and standard deviation (s = 23.47) are the same as in the original sample. (c) Answers vary. For example, 57, 70, 70, 123, 82, 84 gives a mean of x = 81. (d) Answers vary. Here are means for 10 randomization samples: 81.0, 93.0, 87.3, 67.7, 74.5, 85.8, 77.8, 93.0, 81.2, 76.7 which appear in the dotplot below along with the location of the mean for the original sample x = 91.0.
4.204 We use technology to create randomization samples by repeatedly sampling six values with replacement (after subtracting 11 from each value in the original sample to match the null hypothesis that μ = 80). We collect the means for 1000 such samples to form a randomization distribution such as the one shown below. To find the p-value for this right-tailed alternative we count how many of the randomization means exceed our original sample mean, x = 91.0. For the distribution below this is 115 of the 1000 samples, so the p-value = 0.115. This is not enough evidence (using a 5% significance level) to reject H0 : μ = 80. The chain should continue to order chickens from this supplier (but also keep testing the arsenic level of its chickens).
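The shifted-resampling procedure described in 4.204 can also be run in Python. The sketch below uses the shifted arsenic values from 4.203(b) (the original sample minus 11, so its mean matches the null mean of 80), resamples six values with replacement 1000 times, and counts how many resample means reach the observed mean of 91.

import numpy as np

rng = np.random.default_rng(3)
shifted = np.array([57, 64, 70, 82, 84, 123])   # shifted sample; its mean is 80, matching H0
observed_mean = 91.0

means = np.array([rng.choice(shifted, size=6, replace=True).mean() for _ in range(1000)])
p_value = np.mean(means >= observed_mean)       # right-tail test for Ha: mu > 80
print(p_value)                                  # roughly 0.1, near the 0.115 reported above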
4.205 (a) The hypotheses are H0 : μs = μc vs Ha : μs ≠ μc, where μs and μc are the mean number of words recalled after sleep and caffeine, respectively. (b) The number of words recalled would be the same regardless of whether the subject was put in the sleep or the caffeine group. (c) The sample statistic is xs − xc. For the original sample the value is 15.25 − 12.25 = 3.0. (d) Under H0 we have μs − μc = 0 so the randomization distribution should be centered at zero. (e) We randomly divide the 24 sample word recall values into two groups of 12 (one for “sleep” group, the other for “caffeine”) and find the difference in sample means. One such randomization is shown below where xs − xc = 14.75 − 12.75 = 2.0. Answers vary.
Sleep:    11 14 15 17 17 18  6 12 18 13 21 15    Mean = 14.75
Caffeine: 16 14 10 13  7 10 14 18  9 14 12 16    Mean = 12.75
(f) A randomization distribution for 1000 differences in means is shown below. Since this is a two-tailed test, we double the count in one tail (25 out of 1000 values at or beyond xs − xc = 3.0) in order to account for both tails. We see that the p-value is 2 · 0.025 = 0.05.
(g) The p-value is more than α = 0.01 so we do not reject H0 . There is not sufficient evidence (at a 1% level) to show a difference in mean number of words recalled after taking a nap or ingesting caffeine. 4.206 (a) Define ρ to be the correlation between number of hurricanes and year. We have H0 : ρ = 0 (no association between hurricanes and years) and Ha : ρ > 0 (number of hurricanes tends to increase as years increase). (b) We could create note cards with the number of hurricanes each year, shuffle them, then randomly assign the number of hurricane note cards to each of the years from 1914 to 2018. Calculate the correlation between number of hurricanes and years for the simulated sample and then repeat the process thousands of times. 4.207 The correlation between years and hurricanes in the sample is r = 0.321. In the randomization distribution we see that the proportion of simulated samples with correlations in the tail beyond 0.321 gives a p-value of 0.0010. This small p-value implies that there is strong evidence of a positive correlation and that the number of hurricanes that make landfall on the eastern border of the United States is increasing as the years increase.
4.208 We are testing whether the correlation ρ is positive, where ρ represents the correlation between the malevolence rating of uniforms in the National Hockey League and team z-scores for the number of penalty minutes. The hypotheses for the test are H0 : ρ = 0 vs Ha : ρ > 0.
Using the data for NHL Malevolence and ZPenMin in the dataset MalevolentUniformsNHL and using either StatKey or other technology, we match the null assumption that the correlation is zero by randomly assigning the values for one of the variables to the values of the other variable and compute the correlation of the resulting simulated data. We do this 1000 times and collect these simulated correlation statistics into a randomization dotplot (shown below). To see how extreme the original sample correlation of r = 0.521 is we find the proportion of simulated samples that have a correlation of 0.521 or larger. For the distribution below, this includes 18 of the 1000 randomizations in the right tail, giving a p-value of 18/1000 = 0.018. Answers will vary for other randomizations. Using a 5% significance level, we reject H0 and conclude that the malevolence of hockey uniforms is positively correlated with the number of minutes a team is penalized. The results are significant at a 5% level but not at a 1% level.
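A minimal Python version of the shuffling procedure just described is sketched below. The two short arrays are hypothetical placeholders; the actual malevolence ratings and penalty-minute z-scores are in the MalevolentUniformsNHL dataset.

import numpy as np

rng = np.random.default_rng(8)
malevolence = np.array([5.3, 4.0, 4.2, 3.9, 3.6, 3.0, 2.8, 2.5])       # hypothetical values
zpenmin = np.array([1.2, 0.5, 0.9, -0.2, 0.1, -0.6, -0.8, -0.9])       # hypothetical values

observed_r = np.corrcoef(malevolence, zpenmin)[0, 1]
sim_r = np.empty(1000)
y = zpenmin.copy()
for i in range(1000):
    rng.shuffle(y)                                # break any association, matching H0: rho = 0
    sim_r[i] = np.corrcoef(malevolence, y)[0, 1]

p_value = np.mean(sim_r >= observed_r)            # right-tail test for Ha: rho > 0
print(observed_r, p_value)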
4.209 (a) The hypotheses are H0 : pd = pc vs Ha : pd < pc , where pd and pc are the proportion who relapse when treated with desipramine or a placebo, respectively. (b) Under H0 we have pd − pc = 0, so the distribution of differences in number relapsing should be centered at zero. (c) Start with 48 cards, put “R” for relapse on 30 of them, leave the others blank (no relapse). Shuffle the cards and deal into two piles of 24 each, one for the desipramine group, the other for the placebo group. Count the number of relapses in the desipramine group and subtract the number in the placebo group. 4.210 (a) Using StatKey or other technology we create randomization samples by randomly scrambling the desipramine/placebo group assignments and recording the difference (desipramine relapses − placebo relapses) for each randomization sample. We could record the differences in the counts for many randomizations (as in the randomization distribution below) or use the differences in the proportion who relapse in each group. The difference in counts for the original sample is D = 10 − 20 = −10. Since the alternative is left-tailed (looking for evidence of fewer relapses with desipramine) we count the number of randomization samples that give a difference of −10 or less. That is just 4 cases in the distribution below, giving a p-value of 0.004. This small p-value shows strong evidence to reject H0 and conclude that desipramine works better than a placebo (gives a smaller proportion of patients relapsing) when treating cocaine addiction.
(b) The p-value for testing desipramine vs a placebo (0.004) is much smaller than the p-value for testing lithium vs a placebo (0.345, which is not significant). There is stronger evidence that desipramine is effective for treating cocaine addiction than there is for lithium.
4.211 (a) Under H0, Muriel’s chances of guessing correctly will be 1/2 for each cup. So, this is equivalent to creating a randomization distribution for a single proportion with sample size 8 and p = 0.5. One approach to generating a randomization sample is to flip a coin eight times and record the proportion of times it lands heads. (b) Under H0, Muriel will randomly select 4 cups as having milk first and the other 4 cups as having tea first (regardless of which cups truly had milk or tea first). One approach to generating a randomization sample is to create 4 paper cards for “tea first” and 4 paper cards for “milk first”, shuffle the cards, and randomly deal 4 into a “tea first” pile and 4 into a “milk first” pile. Record the proportion of cards that are allocated to the correct pile.
4.212 (a) We are interested in whether quiz pulse rates are higher on average than lecture pulse rates, so our hypotheses are H0 : μQ = μL vs Ha : μQ > μL
where μQ represents the mean pulse rate of students taking a quiz in a statistics class and μL represents the mean pulse rate of students sitting in a lecture in a statistics class. We could also word the hypotheses in terms of the mean difference D = Quiz pulse − Lecture pulse, in which case the hypotheses would be H0 : μD = 0 vs Ha : μD > 0. (b) We are interested in the difference between the two pulse rates, so an appropriate statistic is xD , the mean of the differences (D = Quiz − Lecture) for the sample. For the original sample the differences are:
+2, −1, +5, −8, +1, +20, +15, −4, +9, −12
and the mean of the differences is xD = 2.7. (c) Since the data were collected as pairs, our method of randomization needs to keep the data in pairs, so the first person is labeled with (75, 73), with a difference in pulse rate of 2. As long as we keep the data in pairs, there are many ways to conduct the randomization. In every case, of course, we need to make sure the null hypothesis (no difference) is met and we need to keep the data in pairs.
One way to do this is to sample from the pairs with replacement, but randomly determine the order of the pair (perhaps by flipping a coin), so that the first pair might be (75, 73) with a difference of 2 or might be (73, 75) with a difference of −2. Notice that we match the null hypothesis that the quiz/lecture situation has no effect by assuming that it doesn't matter – the two values could have come from either situation. Proceeding this way, we collect 10 differences with randomly assigned signs (positive/negative) and compute the average of these differences. That gives us one simulated statistic. A second possible method is to focus exclusively on the differences as a single sample. Since the randomization distribution needs to assume the null hypothesis, that the mean difference is 0, we can subtract 2.7 (the mean of the original sample of differences) from each of the 10 differences, giving the values
−0.7, −3.7, 2.3, −10.7, −1.7, 17.3, −6.7, 12.3, 6.3, −14.7
Notice that these values have a mean of zero, as required by the null hypothesis. We then select samples of size ten (with replacement) from the adjusted set of differences (perhaps by putting the 10 values on cards or using technology) and compute the average difference for each sample. There are other possible methods, but be sure to use the paired data values and be sure to force the null hypothesis to be true in the method you create! (d) Here is one sample if we randomly assign +/− signs to each difference:
+2, +1, −5, −8, −20, +1, −15, +4, +9, +12 ⇒ xD = −1.9
Here is one sample drawn with replacement after shifting the differences:
−6.7, −1.7, 6.3, −3.7, 2.3, −1.7, −0.7, 17.3, 2.3, −0.7 ⇒ xD = 1.3
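The sign-flipping method used for the first sample in (d) is straightforward to automate. A minimal Python sketch, assuming the ten sample differences listed in part (b); it is one possible implementation, not the text's prescribed one.

```python
import numpy as np

rng = np.random.default_rng()

diffs = np.array([2, -1, 5, -8, 1, 20, 15, -4, 9, -12])   # Quiz - Lecture differences

def sign_flip_means(d, reps=1000):
    """Randomly assign +/- signs to each paired difference (matching H0 that
    either ordering of a pair is equally likely) and record each sample mean."""
    signs = rng.choice([-1, 1], size=(reps, len(d)))
    return (signs * d).mean(axis=1)

simulated_means = sign_flip_means(diffs)
```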
(e) Neither of the statistics for the randomization samples in (d) exceeds the value of xD = 2.7 from the original sample, but your answers will vary for other randomizations.
4.213 We are interested in whether pulse rates are higher, on average, during a quiz than during a lecture, so our hypotheses are H0 : μQ = μL vs Ha : μQ > μL, where μQ represents the mean pulse rate of students taking a quiz in a statistics class and μL represents the mean pulse rate of students sitting in a lecture in a statistics class. Since this is paired data, we could also word the hypotheses in terms of the mean difference D = Quiz pulse − Lecture pulse, in which case the hypotheses would be H0 : μD = 0 vs Ha : μD > 0. Because the data are paired, we use the differences (D = Quiz pulse − Lecture pulse). The 10 sample differences are:
2, −1, 5, −8, 1, 20, 15, −4, 9, −12
and the sample statistic is the mean of the differences, xD = 2.7.
We use StatKey or other technology to create a randomization distribution under the assumption that there is no difference in pulse rates between the quiz and lecture settings. The randomization distribution below was obtained by subtracting 2.7 from each of the original differences (to make their mean zero) and then selecting samples of size ten (with replacement) to compute each mean difference. (As the solution to the previous exercise indicates, there are other valid ways to construct the randomization samples.)
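A minimal Python sketch of the shift-and-resample method just described (subtract the sample mean so the adjusted differences have mean zero, then resample with replacement); this is one possible implementation rather than the text's StatKey workflow.

```python
import numpy as np

rng = np.random.default_rng()

diffs = np.array([2, -1, 5, -8, 1, 20, 15, -4, 9, -12])
shifted = diffs - diffs.mean()            # force the null hypothesis: mean difference = 0

reps = 1000
sim_means = np.array([rng.choice(shifted, size=len(shifted), replace=True).mean()
                      for _ in range(reps)])

# Upper-tail p-value: proportion of randomization means at or above the observed 2.7
p_value = np.mean(sim_means >= 2.7)
```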
For this randomization distribution 192 of the 1000 samples produced a mean as large as (or larger than) the original xD = 2.7. Since the test has an upper tail alternative, this gives a p-value of 0.192, which is larger than any reasonable significance level, so we do not reject H0 . We do not find evidence that mean pulse rates are higher during a quiz than during a lecture. (Notice that the sample size of n = 10 students is quite small, so it may well be true that pulse rates are higher during a quiz, but our sample was too small to give sufficient evidence for the effect.) 4.214 (a) Sampling 100 responses (with replacement) from the original sample is inappropriate, since the samples are taken from a “population” where the proportion is 0.76, so it doesn’t match H0 : p = 0.8. (b) Sampling 100 responses (with replacement) from a set consisting of 8 correct responses and 2 incorrect responses is appropriate, since p = 0.8 in this population and we are taking the same size samples as the original data. 4.215 (a) This method is appropriate. Under H0 the exercise amounts should be unrelated to the genders, so each exercise value could have just as likely come from either gender. (b) This method is appropriate. Adjusting the values so that the male and female exercise means are the same produces a “population” to sample from that agrees with the null hypothesis of equal means. Note that adding 3.0 to all of the female exercise values or subtracting 3.0 from all the male exercise values would also accomplish this goal. Adjusting both, as suggested in part (b), has the advantage of keeping the combined mean at 10.6. Any of these methods give equivalent results. (c) This method is appropriate. As in part (b), the randomization samples are taken from a “population” where the mean exercise levels are the same for males and females, so the distribution of xF − xM should be centered around zero as specified by H0 : μF = μM . 4.216 We are testing H0 : μF = μM vs Ha : μF ≠ μM , where μ represents the average hours of exercise in a week. The difference for the original sample is xF − xM = 9.40 − 12.40 = −3.00. We need to see how many randomization samples give mean differences this small (or even more extreme), then double the count to account for the other tail of the randomization distribution since this is a two-tailed alternative. Note, since
this is a two-tailed test, we can just as easily use xM − xF = 12.40 − 9.40 = +3.00 as the difference for the original sample and look for more extreme values in the other tail. (a) We randomly scramble the gender labels (M and F) and pair them with the actual exercise times, then find the mean exercise time within each randomly assigned group. (StatKey tip: Use “Reallocate” as the method of randomization.) For each randomization we record the difference in means xF − xM . Repeating this for 1000 randomizations produces a distribution such as the one shown below. To find the p-value we count the number of differences at (or below) the original sample difference of −3.00 and double the count to account for the other tail. In this case we have p-value = 2 · 99/1000 = 0.198.
[Randomization distribution for method (a): 1000 samples, mean = −0.048, st.dev. = 2.305]
(b) Depending on technology, for this approach it may help to create two separate samples, one for the females and one for the males. Add 1.2 to each of the exercise values in the female sample and subtract 1.8 from each of the exercise values in the male sample to create new variables that have the same mean (10.6) in each group. Take separate samples (with replacement), size 30 from the females and size 20 from the males, and find the mean exercise value in each sample. To find the randomization distribution we collect 1000 such differences. Depending on technology it might be easier to generate 1000 means for each gender and then subtract to get the 1000 differences. (StatKey tip: Use “Shift Groups” as the method of randomization.) One set of 1000 randomization differences in means is shown below. This distribution has 98 sample differences less than (or equal to) the original difference of −3.00, so the two-tailed p-value = 2 · 98/1000 = 0.196.
[Randomization distribution for method (b): 1000 samples, mean = 0.043, st.dev. = 2.368]
(c) We sample 30 values (with replacement) from the original sample of all 50 exercise amounts to simulate a new sample of females, and do the same with 20 values for the males. Since both samples are drawn from the same set, we satisfy H0 : μF = μM . (StatKey tip: Use “Combine Groups” as the method of randomization.) Compute the difference in means, xF − xM , for the new samples. One set of 1000 randomizations using this method is shown below. We see that 99 differences in means are at (or below) the original −3.00, so the two-tailed p-value = 2 · 99/1000 = 0.198.
[Randomization distribution for method (c): 1000 samples, mean = 0.004, st.dev. = 2.245]
Note that the randomization distributions and p-values produced by each method are similar. In each case the p-value is not small and we have insufficient evidence to conclude that there is a difference in mean exercise time between female and male students.
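The "Reallocate" method from part (a) can be sketched in a few lines of Python. This is one possible implementation under the 30 female/20 male split given in the exercise; the array name `exercise` is hypothetical since the 50 individual values are not listed here.

```python
import numpy as np

rng = np.random.default_rng()

def reallocate_differences(exercise, n_female, reps=1000):
    """Scramble the group labels: randomly split all values into a 'female'
    group of size n_female and a 'male' group with the rest, recording
    mean(female) - mean(male) for each randomization."""
    diffs = np.empty(reps)
    for i in range(reps):
        shuffled = rng.permutation(exercise)
        diffs[i] = shuffled[:n_female].mean() - shuffled[n_female:].mean()
    return diffs

# Example (assuming `exercise` holds all 50 values):
# diffs = reallocate_differences(exercise, n_female=30)
# p_value = 2 * np.mean(diffs <= -3.00)   # two-tailed p-value
```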
Section 5.1 Solutions
5.1 The area in the right tail more extreme than z = 2.20 is 0.014.
5.2 The area in the right tail more extreme than z = 0.80 is 0.212.
5.3 The area in the right tail more extreme than z = −1.25 is 0.894.
5.4 The area in the right tail more extreme than z = 3.0 is 0.0013.
5.5 The area in the left tail more extreme than z = −1.75 is 0.040.
5.6 The area in the left tail more extreme than z = −2.60 is 0.0047.
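Exercises 5.1-5.6 can be checked with any technology that gives standard normal areas. A minimal Python sketch using scipy (one possible choice of "technology"):

```python
from scipy.stats import norm

# Right-tail areas (Exercises 5.1-5.4)
print(norm.sf(2.20))    # 0.0139  (area above z = 2.20)
print(norm.sf(0.80))    # 0.2119
print(norm.sf(-1.25))   # 0.8944
print(norm.sf(3.0))     # 0.0013

# Left-tail areas (Exercises 5.5-5.6)
print(norm.cdf(-1.75))  # 0.0401
print(norm.cdf(-2.60))  # 0.0047
```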
5.7 For this test of a mean we compare the sample mean to the hypothesized mean and divide by the standard error:
z = (Sample mean − Null mean)/SE = (82.4 − 80)/0.8 = 3.0
5.8 For this test of a proportion we compare the sample proportion to the hypothesized proportion and divide by the standard error:
z = (Sample proportion − Null proportion)/SE = (0.235 − 0.25)/0.018 = −0.83
5.9 For this test of a proportion we compare the sample proportion to the hypothesized proportion and divide by the standard error:
z = (Sample proportion − Null proportion)/SE = (0.41 − 0.5)/0.07 = −1.29
5.10 For this test of a mean we compare the sample mean to the hypothesized mean and divide by the standard error:
z = (Sample mean − Null mean)/SE = (11.3 − 10)/0.10 = 13.0
5.11 The relevant statistic here is the difference in proportions. From the null hypothesis, we see that p1 − p2 = 0. We have:
z = (Sample difference in proportions − Null difference in proportions)/SE = ((0.18 − 0.23) − 0)/0.05 = −1.0
5.12 The relevant statistic here is the difference in means. From the null hypothesis, we see that µ1 − µ2 = 0. We have:
z = (Sample difference in means − Null difference in means)/SE = ((35.4 − 33.1) − 0)/0.25 = 9.2
5.13
(a) Using technology the standard normal area above z = 0.84 is 0.20. The p-value for an upper tail test is 0.20.
(b) Using technology the standard normal area below z = −2.38 is 0.0087. The p-value for a lower tail test is 0.0087.
(c) Using technology the standard normal area above z = 2.25 is 0.012. Double this to find the p-value for a two-tail test, 2 · 0.012 = 0.024. These three p-values are shown as areas below.
[Figures: (a) z = 0.84, upper tail; (b) z = −2.38, lower tail; (c) z = 2.25, two-tailed]
5.14
(a) Using technology the standard normal area below z = −1.08 is 0.140. The p-value for a lower tail test is 0.140.
(b) Using technology the standard normal area above z = 4.12 is 0.000019. We don’t really need technology to find this p-value. The standardized test statistic is so large (more than four standard deviations above the mean) that the area beyond it is essentially zero. (c) Using technology the standard normal area below z = −1.58 is 0.057. Double this to find the p-value for a two-tail test, 2 · 0.057 = 0.114. These three p-values are shown as areas below.
[Figures: (a) z = −1.08, lower tail; (b) z = 4.12, upper tail; (c) z = −1.58, two-tailed]
5.15
(a) The plot on the left shows the area above 65 on a N(60, 10) distribution to be 0.309. We could also standardize the endpoint with z = (65 − 60)/10 = 0.5 and use the N(0, 1) curve on the right below to find the area.
(b) The plot on the left shows the area below 48 on a N(60, 10) distribution to be 0.115. We could also standardize the endpoint with z = (48 − 60)/10 = −1.2 and use the N(0, 1) curve on the right below to find the area.
5.16
(a) The plot on the left shows the area above 28 on a N(15, 5) distribution to be 0.0047. We could also standardize the endpoint with z = (28 − 15)/5 = 2.6 and use the N(0, 1) curve on the right below to find the area.
(b) The plot on the left shows the area below 12 on a N(15, 5) distribution to be 0.274. We could also standardize the endpoint with z = (12 − 15)/5 = −0.6 and use the N(0, 1) curve on the right below to find the area.
5.17
(a) The plot on the left shows the area above 140 on a N(160, 25) distribution to be 0.788. We could also standardize the endpoint with z = (140 − 160)/25 = −0.8 and use the N(0, 1) curve on the right below to find the area.
(b) The plot on the left shows the area below 200 on a N(160, 25) distribution to be 0.945. We could also standardize the endpoint with z = (200 − 160)/25 = 1.6 and use the N(0, 1) curve on the right below to find the area.
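The areas in 5.15-5.17 can be found either directly from the N(mean, sd) curve or after standardizing. A minimal Python sketch showing both routes for 5.17(b):

```python
from scipy.stats import norm

# Directly on the N(160, 25) curve: area below 200
print(norm.cdf(200, loc=160, scale=25))   # about 0.945

# Equivalent after standardizing: z = (200 - 160)/25 = 1.6
print(norm.cdf(1.6))                      # about 0.945
```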
5.18 The sample statistic is the difference in proportions, which is p̂o − p̂c = 0.90 − 0.84 = 0.06. The null value for the difference in proportions is 0, and the standard error is 0.021. The standardized test statistic is
z = (Sample Statistic − Null Parameter)/SE = (0.06 − 0)/0.021 = 2.857
We find the p-value as the proportion of the standard normal distribution beyond 2.86 in the right tail, yielding a p-value of 0.0021. This is below the significance level of 0.05, so we reject H0 and find evidence that, after only 5 days, the proportion alive of fruit flies who eat organic soybeans is significantly higher than the proportion alive who eat conventional soybeans. (In the study, the difference between organic and not-organic soybeans became apparent the fastest and was the strongest compared to the other foods tested. Apparently, you should get organic when buying soybeans!)
5.19 The sample statistic is the difference in proportions, which is p̂o − p̂c = 0.42 − 0.40 = 0.02. The null value for the difference in proportions is 0, and the standard error is 0.031. The standardized test statistic is
z = (Sample Statistic − Null Parameter)/SE = (0.02 − 0)/0.031 = 0.645
We find the p-value as the proportion of the standard normal distribution beyond 0.645 in the right tail, yielding a p-value of 0.259. This p-value is not significant at a 5% level, so we do not reject H0 . After 25 days, we don’t see evidence of a significant difference in the proportion alive between those eating organic bananas and those eating conventional bananas. (In fact, throughout the study, it didn’t seem to matter whether bananas were organic or not. With the other types of food tested, the difference became significant after enough time passed.)
5.20 The sample statistic is the difference in proportions, which is p̂o − p̂c = 0.68 − 0.66 = 0.02. The null value for the difference in proportions is 0, and the standard error is 0.030. The standardized test statistic is
z = (Sample Statistic − Null Parameter)/SE = (0.02 − 0)/0.030 = 0.667
We find the p-value as the proportion of the standard normal distribution beyond 0.667 in the right tail, yielding a p-value of 0.252. We do not reject H0 and do not find enough evidence to conclude that, after only 11 days, the proportion alive of fruit flies who eat organic potatoes is significantly higher than the proportion alive who eat conventional potatoes. (In fact, the difference doesn’t become significant until about 13 days.)
5.21 The sample statistic is the difference in proportions, which is p̂o − p̂c = 0.79 − 0.32 = 0.47. The null value for the difference in proportions is 0, and the standard error is 0.031. The standardized test statistic is
z = (Sample Statistic − Null Parameter)/SE = (0.47 − 0)/0.031 = 15.161
We find the p-value as the proportion of the standard normal distribution beyond 15.161 in the right tail, yielding a p-value of 0.000. (Recall that the z test statistic is a z-score, and a z-score of 15.161 is very far out in the tail! We don’t even really need a standard normal distribution to know that the p-value here is going to be very close to zero.) We reject H0 and find very strong evidence that, after just 8 days, the proportion alive of fruit flies who eat organic soybeans is significantly higher than the proportion alive who eat conventional soybeans. (In the study, the difference between organic and not-organic soybeans became apparent the fastest and was the strongest compared to the other foods tested. Apparently, you should get organic when buying soybeans!)
5.22 Since we are testing to see if there is evidence that the proportion of all adults saying TV is a primary source of news, p, is greater than 0.65, the relevant hypotheses are
H0 : p = 0.65 vs Ha : p > 0.65
The sample statistic of interest is p̂ = 0.66, the null parameter from the null hypothesis is 0.65, and the standard error from the randomization distribution is 0.013. The test statistic is
z = (Sample Statistic − Null Parameter)/SE = (0.66 − 0.65)/0.013 = 0.77
This is a one-tailed test on the right, so we find the proportion of a standard normal distribution larger than 0.77. The area in this upper tail is 0.221, so we have p-value = 0.221
Since the p-value is larger than any reasonable significance level, we do not find evidence that the proportion of adults using TV as a main news source is greater than 65%.
5.23 The relevant hypotheses are H0 : pQ = pR vs Ha : pQ > pR , where pQ and pR are the proportions of words recalled correctly after quiz studying or studying by reading alone, respectively. Based on the sample information the statistic of interest is p̂Q − p̂R = 0.42 − 0.15 = 0.27. The standard error of this statistic is given as SE = 0.07 and the null hypothesis is that the difference in the proportions for the two groups is zero. We compute the standardized test statistic with
z = (Sample Statistic − Null Parameter)/SE = (0.27 − 0)/0.07 = 3.86
Using technology, the area under a N(0, 1) curve beyond z = 3.86 is only 0.000056. This very small p-value provides very strong evidence that the proportion of words recalled using self-quizzes is more than the proportion recalled with reading study alone.
5.24 We test H0 : p = 0.5 vs Ha : p < 0.5, where p is the proportion of all World Cup penalty shots for which the goalkeeper guesses the correct direction. The statistic from the original sample is p̂ = 0.41 and the null parameter is p = 0.5. The standard error is SE = 0.043. We use this information to find the standardized test statistic:
z = (Sample statistic − Null parameter)/SE = (0.41 − 0.5)/0.043 = −2.09
This is a lower tail test, so we find the area below z = −2.09 in the lower tail of the standard normal distribution. We see in the figure below that this area is 0.0183. The p-value for this test is 0.0183. At a 5% significance level, we find evidence that the proportion of World Cup penalty shots in which the goalkeeper guesses correctly is significantly less than half.
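The standardized test statistic and p-value in 5.24 (and the similar one-sample tests in this section) take only a few lines with technology. A minimal Python sketch using the numbers above:

```python
from scipy.stats import norm

p_hat, p_null, se = 0.41, 0.5, 0.043

z = (p_hat - p_null) / se        # standardized test statistic: about -2.09
p_value = norm.cdf(z)            # lower-tail p-value: about 0.018

print(z, p_value)
```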
5.25 This is a test for a difference in proportions. Using pR and pU for the proportion of dogs to follow a cue from a person who is reliable or unreliable, respectively, the hypotheses are:
H0 : pR = pU vs Ha : pR ≠ pU
The sample proportions are p̂R = 16/26 = 0.615 and p̂U = 7/26 = 0.269, so the difference in proportions is p̂R − p̂U = 0.615 − 0.269 = 0.346. The null hypothesis difference in proportions is zero and the standard error is 0.138, so the standardized test statistic is:
z = (Sample Statistic − Null Parameter)/SE = (0.346 − 0)/0.138 = 2.507
The p-value is the area more extreme than this test statistic. Using technology, we see that the area beyond this value in the right tail of a standard normal distribution is 0.006. This is a two-tail test, so we have p-value = 2(0.006) = 0.012. At a 5% level, we reject H0 and find evidence that the proportion of dogs following a pointing cue is different depending on whether the previous cue was reliable or not. The proportion that follow the cue appears to be higher when the person is reliable.
5.26 The relevant hypotheses are H0 : µf = µm vs Ha : µf < µm, where µf and µm are the respective exercise means for female and male statistics students. The difference in the sample means is xf − xm = 9.4 − 12.4 = −3.0 and the hypothesized difference is µf − µm = 0. The standardized test statistic is
z = (−3.0 − 0)/2.38 = −1.26
For the lower tail alternative the p-value is the area in a standard normal curve below −1.26, which technology shows is 0.1038. This is not less than the significance level of 0.05, so the data do not provide sufficient evidence to be convinced that the mean exercise time for female statistics students is less than for male statistics students.
5.27 Under the null hypothesis the randomization distribution should be centered at zero with an estimated standard error of 0.393. The standardized test statistic is
z = (0.79 − 0)/0.393 = 2.01
Since this is an upper tail test, the p-value is the area in a standard normal distribution above 2.01. Technology or a table shows this area to be 0.022, which is a fairly small p-value (significant at a 5% level). This gives support for the alternative hypothesis that students who smile during the hearing will, on average, tend to get more leniency than students with a neutral expression.
5.28 If we let p represent the proportion of all American adults who have gone a week without using cash to pay for anything, the hypotheses are:
H0 : p = 0.40 vs Ha : p > 0.40
The relevant sample statistic is p̂ = 0.43 and the null parameter is 0.40. The standard error is SE = 0.016. We use this information to compute the test statistic:
z = (Sample statistic − Null parameter)/SE = (0.43 − 0.40)/0.016 = 1.875
This is an upper tail test so the p-value is the area in the standard normal that is above 1.875. We see that the p-value is 0.0304. At a 5% significance level, we find evidence that the proportion not using cash is greater than 40%.
5.29 This is a test for a single mean, using µ to represent the mean number of days meeting the goal for people in a 100-day program to encourage exercise. The hypotheses are:
H0 : µ = 35 vs Ha : µ > 35
The sample statistic is x = 36.5, the null value for the mean is 35, and the standard error for the estimate is given as 1.80. The standardized test statistic is:
z = (Sample Statistic − Null Parameter)/SE = (36.5 − 35)/1.80 = 0.833
The p-value is the area more extreme than this test statistic. Using technology, we see that the area beyond this value in the right tail of a standard normal distribution is 0.202. We have p-value = 0.202. At a 5% level, we do not reject H0 . We do not find enough evidence to conclude that the mean number of days meeting the goal will be greater than 35.
5.30 This is a test for a difference in means. We define µL and µO to represent the mean number of days meeting the goal, for people in a 100-day program to encourage exercise, for those losing money and those given other types of incentives, respectively. The hypotheses are:
H0 : µL = µO vs Ha : µL ≠ µO
The sample statistic is xL − xO = 45.0 − 33.7 = 11.3, the null value for the difference in means is 0, and the standard error for the estimate is given as 4.14. The standardized test statistic is:
z = (Sample Statistic − Null Parameter)/SE = (11.3 − 0)/4.14 = 2.729
The p-value is the area more extreme than this test statistic. Using technology, we see that the area beyond this value in the right tail of a standard normal distribution is 0.0032. This is a two-tail test, so we have p-value = 2(0.0032) = 0.0064. This is a small p-value, so we reject H0 . We have strong evidence that the mean number of days meeting the exercise goal is higher for people who might lose money; thus the possibility of losing money provides a stronger incentive to exercise than the other incentives used.
5.31
(a) This is a test for a difference in proportions. Using pG and pI for the proportion quitting smoking for those in a group program or those in an individual program, respectively, the hypotheses are:
H0 : pG = pI vs Ha : pG ≠ pI
We see that p̂G = 148/1080 = 0.137 and p̂I = 120/990 = 0.121. The sample statistic is p̂G − p̂I = 0.137 − 0.121 = 0.016. (b) A randomization distribution for the difference in proportion for 3000 randomizations is shown below. The proportion of samples more extreme than the difference of 0.016 in the original sample gives a p-value of 2 · 0.143 = 0.286.
(c) The standard error in the randomization distribution above is SE = 0.015 and the null hypothesis is that the difference pG − pI = 0, so we use a N (0, 0.015) distribution to find the p-value. The area above the original difference of 0.016 in the figure on the left below is 0.143. This gives a p-value of 2 · 0.143 = 0.286.
[Figures: (c) p-value from N(0, 0.015); (d) p-value from N(0, 1)]
(d) The null hypothesis difference in proportions is zero and the standard error from the randomization distribution in (b) is 0.015, so the standardized test statistic is:
z = (Sample Statistic − Null Parameter)/SE = (0.016 − 0)/0.015 = 1.067
The p-value is the area more extreme than this test statistic. Using technology (as in the figure on the right above), we see that the area beyond this value in the right tail of a standard normal distribution is 0.143. This is a two-tail test, so we have
p-value = 2(0.143) = 0.286 (e) The p-values are the same (although a different set of randomizations might give a slightly different p-value). In every case, at a 5% level, we do not reject H0 . We do not find enough evidence to conclude that it makes a difference whether a smoker is in a group program or an individual program.
5.32
(a) This is a test for a single proportion. Using p for the proportion of people who can successfully quit smoking using the incentive, the hypotheses are:
H0 : p = 0.06 vs Ha : p > 0.06
The sample statistic is p̂ = 47/498 = 0.094. (b) A randomization distribution for proportions in samples of size 498 when p = 0.06 is shown below. The proportion of these samples at or above the original sample p̂ = 0.094 gives a p-value of 0.0016.
(c) The standard error in the randomization distribution above is SE = 0.011 and the null hypothesis is p = 0.06, so we use a N (0.06, 0.011) distribution to find the p-value. The area above the original proportion of p̂ = 0.094 in the figure on the left below gives a p-value of 0.0010.
[Figures: (c) p-value from N(0.06, 0.011); (d) p-value from N(0, 1)]
(d) The null hypothesis proportion is 0.06 and the standard error from the randomization distribution in (b) is 0.011, so the standardized test statistic is:
z = (Sample Statistic − Null Parameter)/SE = (0.094 − 0.06)/0.011 = 3.091
The area beyond this value in the right tail of a standard normal distribution (shown on the right above) is 0.001, so we have p-value = 0.001 (e) The p-values are very similar, as we expect. In every case, at a 5% level, we reject H0 . We have strong evidence to conclude that the proportion of smokers who quit after incentives is more than 0.06, so incentives appear to help people quit smoking. 5.33
(a) The sample statistic is p̂ = 4/20 = 0.20, the proportion under the null hypothesis is 0.10, and the standard error of proportions based on samples of size n = 20 (from the StatKey figure) is SE = 0.067. The standardized test statistic is
z = (0.20 − 0.1)/0.067 = 1.49
(b) Using technology, the area above 1.49 for a standard normal density is 0.068. (c) The p-value in the randomization distribution (0.137) is quite a bit larger than the p-value based on the normal distribution (0.068). This is a fairly small sample and the randomization distribution is not symmetric, so the p-value based on a normal distribution is not accurate. Fortunately, the randomization procedure gives a reasonable way to estimate the p-value in cases where the normal distribution is not appropriate.
Section 5.2 Solutions
5.34
(a) For 80% confidence we use technology to find the middle 80% of a standard normal distribution (leaving 10% in each tail) to give z ∗ = 1.282.
(b) For 84% confidence we use technology to find the middle 84% of a standard normal distribution (leaving 8% in each tail) to give z ∗ = 1.405. (c) For 92% confidence we use technology to find the middle 92% of a standard normal distribution (leaving 4% in each tail) to give z ∗ = 1.751.
[Figures: (a) z∗ for an 80% CI; (b) z∗ for an 84% CI; (c) z∗ for a 92% CI]
5.35
(a) For 86% confidence we use technology to find the middle 86% of a standard normal distribution (leaving 7% in each tail) to give z∗ = 1.476.
(b) For 94% confidence we use technology to find the middle 94% of a standard normal distribution (leaving 3% in each tail) to give z∗ = 1.881. (c) For 96% confidence we use technology to find the middle 96% of a standard normal distribution (leaving 2% in each tail) to give z∗ = 2.054.
[Figures: (a) z∗ for an 86% CI; (b) z∗ for a 94% CI; (c) z∗ for a 96% CI]
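The z∗ values in 5.34 and 5.35 are percentiles of the standard normal distribution, so any technology with an inverse-normal function will produce them. A minimal Python sketch:

```python
from scipy.stats import norm

def z_star(confidence):
    """Return the standard normal value that leaves (1 - confidence)/2 in each tail."""
    tail = (1 - confidence) / 2
    return norm.ppf(1 - tail)

for c in (0.80, 0.84, 0.92, 0.86, 0.94, 0.96):
    print(c, round(z_star(c), 3))   # 1.282, 1.405, 1.751, 1.476, 1.881, 2.054
```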
5.36 For a 95% confidence interval, we have z∗ = 1.96. The confidence interval for p is:
p̂ ± z∗ · SE
0.43 ± 1.96(0.05)
0.43 ± 0.098
0.332 to 0.528
5.37 For a 95% confidence interval, we have z∗ = 1.96. The confidence interval for µ is:
x ± z∗ · SE
72 ± 1.96(1.70)
72 ± 3.332
68.668 to 75.332
5.38 For a 90% confidence interval, we have z∗ = 1.645. The confidence interval for µ is:
x ± z∗ · SE
23.1 ± 1.645(1.04)
23.1 ± 1.71
21.39 to 24.81
5.39 For a 99% confidence interval, we have z∗ = 2.576. The confidence interval for p is:
p̂ ± z∗ · SE
0.78 ± 2.576(0.03)
0.78 ± 0.077
0.703 to 0.857
5.40 For a 95% confidence interval, we have z∗ = 1.96. The confidence interval for p1 − p2 is:
(p̂1 − p̂2) ± z∗ · SE
(0.68 − 0.61) ± 1.96(0.085)
0.07 ± 0.167
−0.097 to 0.237
5.41 For a 95% confidence interval, we have z∗ = 1.96. The confidence interval for µ1 − µ2 is:
(x1 − x2) ± z∗ · SE
(256 − 242) ± 1.96(6.70)
14 ± 13.132
0.868 to 27.132
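All of the intervals in 5.36-5.41 follow the same pattern, statistic ± z∗ · SE, so a small helper covers every case. A minimal Python sketch (the function name is illustrative):

```python
from scipy.stats import norm

def normal_ci(statistic, se, confidence=0.95):
    """Return a (lower, upper) confidence interval of the form statistic +/- z* * SE."""
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    margin = z_star * se
    return statistic - margin, statistic + margin

print(normal_ci(0.43, 0.05))    # Exercise 5.36: about (0.332, 0.528)
print(normal_ci(14, 6.70))      # Exercise 5.41: about (0.87, 27.13)
```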
5.42 For a 95% confidence interval, we have z∗ = 1.96. The sample statistic is p̂ = 0.41 and the standard error is SE = 0.016. The confidence interval for p is:
p̂ ± z∗ · SE
0.41 ± 1.96(0.016)
0.41 ± 0.031
0.379 to 0.441
We are 95% confident that the proportion of all UK adults who start sleep in the fetal position is between 0.379 and 0.441.
5.43 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a proportion, so the statistic from the original sample is p̂ = 0.195. For a 90% confidence interval, we use z∗ = 1.645 and we have SE = 0.009. Putting this information together, we have
p̂ ± z∗ · SE
0.195 ± 1.645 · (0.009)
0.195 ± 0.0148
0.180 to 0.210
We are 90% sure that the percent of people aged 12 to 19 who have at least slight hearing loss is between 18% and 21%.
5.44 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a proportion, so the statistic from the original sample is p̂ = 0.60. For a 99% confidence interval, we use z∗ = 2.576 and we have SE = 0.015. Putting this information together, we have
p̂ ± z∗ · SE
0.60 ± 2.576 · (0.015)
0.60 ± 0.039
0.561 to 0.639
We are 99% sure that the percent of all air travelers who prefer a window seat is between 56.1% and 63.9%.
5.45 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a mean, so the statistic from the original sample is x = 57.55. For a 95% confidence interval, we use z∗ = 1.960 and we have SE = 1.42. Putting this information together, we have
x ± z∗ · SE
57.55 ± 1.960 · (1.42)
57.55 ± 2.78
54.77 to 60.33
We are 95% confident that the average age of patients admitted to the intensive care unit at this hospital is between 54.8 years old and 60.3 years old. 5.46
(a) To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a mean, so the relevant statistic from the original sample is x = 5.2. For a 95% confidence interval, we use z∗ = 1.960 and we have SE = 0.7. Putting this information together, we have
x ± z∗ · SE
5.2 ± 1.960 · (0.7)
5.2 ± 1.37
3.83 to 6.57
We are 95% sure that, before the smoke-free legislation, childhood hospital admissions for asthma were increasing at an average rate of between 3.8% per year and 6.6% per year. (b) To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a mean, so the relevant statistic from the original sample is x = −18.2. Notice that the change is negative since admissions are decreasing. For a 95% confidence interval, we use z∗ = 1.960 and we have SE = 1.79. Putting this information together, we have
x ± z∗ · SE
−18.2 ± 1.960 · (1.79)
−18.2 ± 3.51
−21.71 to −14.69
We are 95% sure that, after the smoke-free legislation, childhood hospital admissions for asthma were decreasing at an average rate of between 14.7% per year and 21.7% per year. (c) This is an observational study. Note that no one randomly assigned some months to have the law and others to not have the law. (d) Although the results are very compelling, we cannot conclude that the legislation is causing the change in asthma admissions. Because this was not an experiment, there might be confounding variables that happened at the same time as the smoke-free legislation was passed.
5.47 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE.
We are finding a confidence interval for a difference in proportions, so the relevant statistic from the original sample is p̂Q − p̂R = 0.42 − 0.15 = 0.27. For a 99% confidence interval, we use z∗ = 2.576 and we have SE = 0.07. Putting this information together, we have
(p̂Q − p̂R) ± z∗ · SE
(0.42 − 0.15) ± 2.576 · (0.07)
0.27 ± 0.18
0.09 to 0.45
We are 99% sure that the proportion of words remembered correctly will be between 0.09 and 0.45 higher for people who incorporate quizzes into their study habits.
5.48 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a difference in means. We define xL and xO to represent the mean number of days meeting the goal, for the participants in a 100-day program to encourage exercise, for those losing money and those given other types of incentives, respectively. The relevant sample statistic is xL − xO = 45.0 − 33.7 = 11.3. For 95% confidence, we use z∗ = 1.960 and we have SE = 4.14. Putting this information together, we have
(xL − xO) ± z∗ · SE
11.3 ± 1.960 · (4.14)
11.3 ± 8.11
3.19 to 19.41
We are 95% sure that people with an incentive to avoid losing money will meet an exercise goal, on average, 3.19 to 19.41 days more (out of 100) than those with other forms of incentive.
5.49 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a proportion, so the statistic from the original sample is p̂ = 53/194 = 0.273. For a 95% confidence interval, we use z∗ = 1.960. We are given that SE = 0.032. Putting this information together, we have
p̂ ± z∗ · SE
0.273 ± 1.960 · (0.032)
0.273 ± 0.063
0.210 to 0.336
We are 95% sure that the proportion of all US adults ages 18 to 24 who have used online dating is between 0.210 and 0.336.
5.50 To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a proportion, so the statistic from the original sample is p̂ = 50/411 = 0.122. For a 95% confidence interval, we use z∗ = 1.960. We are given that SE = 0.016. Putting this information together, we have
p̂ ± z∗ · SE
0.122 ± 1.960 · (0.016)
0.122 ± 0.031
0.091 to 0.153
We are 95% sure that the proportion of all US adults ages 55 to 64 who have used online dating is between 0.091 and 0.153. 5.51
(a) We see from the two-way table that the proportion of college graduates using online dating is p̂C = 157/823 = 0.191 and the proportion of high school graduates using online dating is p̂H = 70/635 = 0.110. The difference in proportions is p̂C − p̂H = 0.191 − 0.110 = 0.081.
(b) To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a difference in proportions. For a 99% confidence interval, we use z∗ = 2.576. We are given that SE = 0.019. Putting this information together, we have
(p̂C − p̂H) ± z∗ · SE
0.081 ± 2.576 · (0.019)
0.081 ± 0.049
0.032 to 0.130
We are 99% sure that the proportion of all college graduates using online dating is between 0.032 and 0.130 higher than the proportion of all HS graduates using online dating. (c) No, it is not plausible that the proportions using online dating are the same for both groups since 0 (no difference) is not in the confidence interval for pC − pH . 5.52
(a) To find a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. We are finding a confidence interval for a difference in proportions. The sample statistic is the difference in proportions of men and women who have used online dating, p̂M − p̂W = 0.17 − 0.14 = 0.03. For a 99% confidence interval, we use z∗ = 2.576. We are given that SE = 0.016. Putting this information together, we have
(p̂M − p̂W) ± z∗ · SE
0.03 ± 2.576 · (0.016)
0.03 ± 0.041
−0.011 to 0.071
We are 99% sure that the proportion of all US men using online dating is between 0.011 less and 0.071 more than the proportion of all US women using online dating.
(b) Yes, it is plausible that the proportions for men and women are the same since 0 (no difference) is in the confidence interval. 5.53 For the original 25 Mustangs in the sample the mean price (in $1000s) is x = 15.98. We form a bootstrap distribution by sampling with replacement from the prices given in the original data set and finding the mean for each sample. One such set of 5000 bootstrap means is shown below.
(a) For a 95% confidence interval we find the 2.5%-tile (11.88) and 97.5%-tile (20.44) from the distribution of bootstrap means. This gives a 95% confidence interval of (11.88, 20.44). Based on this analysis, we are 95% sure that the mean price of Mustangs for sale on the Internet is between $11,880 and $20,440. (b) The standard deviation of the 5000 bootstrap means is SE = 2.202. The distribution is reasonably normal so we use the N(0, 1) endpoints z∗ = ±1.96 for a 95% confidence interval, 15.98 ± 1.96 · 2.202 = 15.98 ± 4.316 = (11.664, 20.296). Based on this analysis we are 95% sure that the mean price for Mustangs for sale on the Internet is between $11,664 and $20,296. 5.54 (a) Using technology to fit a least squares line, the prediction equation is Price = 30.495 − 0.219 · Miles. The slope for this sample is −0.219, meaning the price drops by roughly $219 for every extra 1,000 miles a Mustang has been driven. (b) To find a confidence interval for the slope we use technology to select bootstrap samples with replacement from the original 25 Mustangs, keeping track of the slope for each sample. A dotplot of slopes for one set of 5000 bootstrap samples is shown below.
Based on the standard deviation of these 5000 sample slopes, we estimate the standard error of the slope for this regression model to be SE = 0.0295. The bootstrap distribution of slopes looks reasonably normal so, for 98% confidence, we use the 1%-tile and 99%-tile from a standard normal density, z∗ = ±2.326. To compute the confidence interval −0.219 ± 2.326 · 0.0295 = −0.219 ± 0.069 = (−0.288, −0.150). We are 98% sure that the slope of the regression line to predict Price based on Miles for Mustangs is between −0.288 and −0.150. While our sample slope indicates an average decrease of about $219 for every extra 1,000 miles of driving, it is quite plausible that this decrease could be anywhere from $288 to $150. Notice that the confidence interval (and in fact the entire bootstrap distribution) contains only negative values, so we can be very confident that prices go down as mileage increases.
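The bootstrap-plus-normal approach of 5.53(b) (and, with a different statistic, 5.54(b)) can be reproduced with a short script. A minimal sketch; the array name `prices` is hypothetical since the 25 Mustang prices are not listed here.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()

def bootstrap_se_of_mean(data, reps=5000):
    """Resample the data with replacement, record each sample mean, and
    return the standard deviation of those bootstrap means."""
    means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                      for _ in range(reps)])
    return means.std(ddof=1)

# Example (assuming `prices` holds the 25 sample values, in $1000s):
# se = bootstrap_se_of_mean(prices)
# z_star = norm.ppf(0.975)
# interval = (prices.mean() - z_star * se, prices.mean() + z_star * se)
```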
5.55
(a) The standard error for one set of 1000 bootstrap correlations is SE = 0.0355. Answers may vary slightly with other simulated bootstrap distributions.
(b) For a 90% confidence interval based on a normal distribution, the z ∗ value is 1.645. To find the confidence interval we use 0.807 ± 1.645 · 0.0355 = 0.807 ± 0.058 = (0.749, 0.865) We are 90% sure that the correlation between distance and time for all Atlanta commutes is somewhere between 0.749 and 0.865. 5.56
(a) For each bootstrap, sample 1,502 values with replacement from a file that has 69% or 1,036 “yes” values and 31% or 466 “no” values (or any set that has 69% “yes” values). Find the proportion of “yes” responses in each sample. Repeat many times and compute the standard deviation of the sample proportions.
(b) For a 99% confidence interval, the standard normal endpoint (leaving 0.005 area in the upper tail) is z ∗ = 2.576. Thus the 99% confidence interval is 0.69 ± 2.576 · 0.012 = 0.69 ± 0.031 = (0.659, 0.721) We are 99% sure that between 65.9% and 72.1% of US adults (in 2019) used Facebook. 5.57
(a) For a 99% confidence interval the standard normal value leaving 0.5% in each tail is z ∗ = 2.576. From the bootstrap distribution we estimate the standard error of the correlations to be 0.205. Thus the 99% confidence interval based on a normal distribution would be 0.37 ± 2.576 · 0.205 = 0.37 ± 0.528 = (−0.158, 0.898)
(b) The bootstrap distribution of correlations is somewhat right skewed, while the normal-based interval assumes the distribution is symmetric and bell-shaped.
Section 6.1-D Solutions
6.1 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.25(1 − 0.25)/50) = 0.061
6.2 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.70(1 − 0.70)/1000) = 0.014
6.3 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.90(1 − 0.90)/60) = 0.039
6.4 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.27(1 − 0.27)/30) = 0.081
6.5 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.08(0.92)/300) = 0.016
6.6 The sample proportions will have a standard error of SE = √(p(1 − p)/n) = √(0.41(1 − 0.41)/100) = 0.049
6.7 We compute the standard errors using the formula:
n = 30: SE = √(p(1 − p)/n) = √(0.4(0.6)/30) = 0.089
n = 200: SE = √(p(1 − p)/n) = √(0.4(0.6)/200) = 0.035
n = 1000: SE = √(p(1 − p)/n) = √(0.4(0.6)/1000) = 0.015
We see that as the sample size goes up, the standard error goes down. If the standard error goes down, the sample proportions are less spread out from the population proportion, so the accuracy is better.
6.8 We compute the standard errors using the formula:
n = 40: SE = √(p(1 − p)/n) = √(0.75(0.25)/40) = 0.068
n = 300: SE = √(p(1 − p)/n) = √(0.75(0.25)/300) = 0.025
n = 1000: SE = √(p(1 − p)/n) = √(0.75(0.25)/1000) = 0.014
We see that as the sample size goes up, the standard error goes down. If the standard error goes down, the sample proportions are less spread out from the population proportion, so the accuracy is better.
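The standard error formula used in 6.1-6.8 is easy to wrap in a small helper. A minimal Python sketch (the function name is illustrative):

```python
import math

def se_proportion(p, n):
    """Standard error of sample proportions: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

print(round(se_proportion(0.25, 50), 3))    # Exercise 6.1: 0.061
print(round(se_proportion(0.40, 1000), 3))  # Exercise 6.7, n = 1000: 0.015
```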
6.9 In each case, we determine whether np ≥ 10 and n(1 − p) ≥ 10. (a) Yes, the conditions apply, since np = 500(0.1) = 50 and n(1 − p) = 500(1 − 0.1) = 450. (b) Yes, the conditions apply, since np = 25(0.5) = 12.5 and n(1 − p) = 25(1 − 0.5) = 12.5. (c) No, the conditions do not apply, since np = 30(0.2) = 6 < 10. (d) No, the conditions do not apply, since np = 100(0.92) = 92 but n(1 − p) = 100(1 − 0.92) = 8 < 10. 6.10 In each case, we determine whether np ≥ 10 and n(1 − p) ≥ 10. (a) No, the conditions do not apply, since np = 80(0.1) = 8 < 10. (b) No, the conditions do not apply, since np = 25(0.8) = 20 but n(1 − p) = 25(1 − 0.8) = 5 < 10. (c) Yes, the conditions apply, since np = 50(0.4) = 20 and n(1 − p) = 50(1 − 0.4) = 30. (d) Yes, the conditions apply, since np = 200(0.7) = 140 and n(1 − p) = 200(1 − 0.7) = 60. 6.11
(a) Since np = 90 · 0.66 = 59.4 and n(1 − p) = 90 · (1 − 0.66) = 30.6 are both more than 10, the Central Limit Theorem applies to say that the distribution of these sample proportions will be roughly normal. The center should be at p = 0.66 and the standard error is
SE = √(p(1 − p)/n) = √(0.66(1 − 0.66)/90) = 0.050
A sketch of that normal distribution is shown below.
(b) Using technology, the area above 0.75 for the normal curve in (a) is about 0.036. 6.12
(a) Since np = 75 · 0.62 = 46.5 and n(1 − p) = 75 · (1 − 0.62) = 28.5 are both more than 10, the Central Limit Theorem applies to say that the distribution of these sample proportions will be roughly normal. The center should be at p = 0.62 and the standard error is
SE = √(p(1 − p)/n) = √(0.62(1 − 0.62)/75) = 0.056
A sketch of that normal distribution is shown below.
(b) Using technology, the area below 0.50 for the normal curve in (a) is about 0.016.
Section 6.1-CI Solutions
6.13 The sample size is large enough to use the normal distribution. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.38. For a 95% confidence interval, we have z∗ = 1.96, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.38 ± 1.96 · √(0.38(0.62)/500)
0.38 ± 0.043
0.337 to 0.423
The best estimate for p is 0.38, the margin of error is ±0.043, and the 95% confidence interval for p is 0.337 to 0.423.
6.14 The sample size is large enough to use the normal distribution. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.85. For a 90% confidence interval, we have z∗ = 1.645, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.85 ± 1.645 · √(0.85(0.15)/120)
0.85 ± 0.054
0.796 to 0.904
The best estimate for p is 0.85, the margin of error is ±0.054, and the 90% confidence interval for p is 0.796 to 0.904.
6.15 The sample size is large enough to use the normal distribution, since there are 62 yes answers and 28 other answers, both bigger than 10. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂, and in this case we have p̂ = 62/90 = 0.689 with n = 90. For a 99% confidence interval, we have z∗ = 2.576, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.689 ± 2.576 · √(0.689(0.311)/90)
0.689 ± 0.126
0.563 to 0.815
The best estimate for the proportion who will answer yes is 0.689, the margin of error is ±0.126, and the 99% confidence interval for the proportion who will answer yes is 0.563 to 0.815. Note that the margin of error is quite large since the sample size is relatively small.
6.16 The sample size is large enough to use the normal distribution. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.23. For a 95% confidence interval, we have z∗ = 1.96, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.23 ± 1.96 · √(0.23(0.77)/400)
0.23 ± 0.041
0.189 to 0.271
The best estimate for the proportion of the population in category A is 0.23, the margin of error is ±0.041, and the 95% confidence interval for the proportion in category A is 0.189 to 0.271.
6.17 The desired margin of error is ME = 0.05 and we have z∗ = 1.96 for 95% confidence. Since we are given no information about the population parameter, we use the conservative estimate p̃ = 0.5. We use the formula to find sample size:
n = (z∗/ME)² · p̃(1 − p̃) = (1.96/0.05)² · (0.5 · 0.5) = 384.2
We round up to n = 385. In order to ensure that the margin of error is within the desired ±5%, we should use a sample size of 385 or higher.
6.18 The desired margin of error is ME = 0.01 and we have z∗ = 2.576 for 99% confidence. Since we are given no information about the population parameter, we use the conservative estimate p̃ = 0.5. We use the formula to find sample size:
n = (z∗/ME)² · p̃(1 − p̃) = (2.576/0.01)² · (0.5 · 0.5) = 16,589.4
We round up to n = 16,590. In order to ensure that the margin of error is within the desired ±1%, we should use a sample size of 16,590 or higher. This is a very large sample size! The sample size goes up significantly as we aim for greater accuracy and greater confidence in the result.
6.19 The desired margin of error is ME = 0.03 and we have z∗ = 1.645 for 90% confidence. We estimate that p is about 0.3, so we use p̃ = 0.3. We use the formula to find sample size:
n = (z∗/ME)² · p̃(1 − p̃) = (1.645/0.03)² · (0.3 · 0.7) = 631.4
We round up to n = 632. In order to ensure that the margin of error is within the desired ±3%, we should use a sample size of 632 or higher.
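The sample-size calculations in 6.17-6.20 all use n = (z∗/ME)² · p̃(1 − p̃), rounded up. A minimal Python sketch (the function name is illustrative):

```python
import math
from scipy.stats import norm

def sample_size(margin_of_error, confidence, p_tilde=0.5):
    """Sample size for estimating a proportion: n = (z*/ME)^2 * p(1 - p), rounded up."""
    z_star = norm.ppf(1 - (1 - confidence) / 2)
    n = (z_star / margin_of_error) ** 2 * p_tilde * (1 - p_tilde)
    return math.ceil(n)

print(sample_size(0.05, 0.95))               # Exercise 6.17: 385
print(sample_size(0.03, 0.90, p_tilde=0.3))  # Exercise 6.19: 632
```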
6.20 The desired margin of error is ME = 0.02 and we have z∗ = 1.96 for 95% confidence. We use the information we have about the sample proportion, so we use p̂ = 0.78 as our estimate for the population parameter. We use the formula to find sample size:
n = (z∗/ME)² · p̃(1 − p̃) = (1.96/0.02)² · (0.78 · 0.22) = 1648.05
We round up to n = 1649. In order to ensure that the margin of error is within the desired ±2%, we should use a sample size of 1649 or higher.
6.21
(a) The sample statistic is p̂ = 182/1000 = 0.182.
(b) The standard error for this sample statistic is:
SE = √(p̂(1 − p̂)/n) = √(0.182(1 − 0.182)/1000) = 0.012
(c) We use the normal distribution to see that, for a 90% confidence interval, we have z∗ = 1.645. (d) We have:
Statistic ± z∗ · SE
0.182 ± 1.645 · (0.012)
0.182 ± 0.020
0.162 to 0.202
The 90% confidence interval for the proportion of all US adults who would say that they are poor is 0.162 to 0.202. (This is the same interval that we found using a bootstrap distribution in Section 3.4.) 6.22
(a) We are estimating a proportion, so the notation for the parameter is p and we define it as p = the proportion of all US teens to smoke a cigarette in the last 30 days.
(b) The notation and value for the sample proportion is p̂ = 92/1582 = 0.058. (c) The standard error for this sample statistic is:
SE = √(p̂(1 − p̂)/n) = √(0.058(1 − 0.058)/1582) = 0.006
(d) We use the normal distribution to see that, for a 99% confidence interval, we have z∗ = 2.576. (e) We have:
Statistic ± z∗ · SE
0.058 ± 2.576 · (0.006)
0.058 ± 0.015
0.043 to 0.073
The 99% confidence interval for the proportion of all US teens to smoke a cigarette in the last 30 days is between 0.043 and 0.073. (f) The best estimate is the sample proportion, 0.058. We see that the margin of error is 0.015.
6.23
(a) The sample proportion is p̂ = 0.35, the sample size is n = 140, and for 95% confidence, we have z∗ = 1.96. The 95% confidence interval is given by
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.35 ± 1.96 · √(0.35(0.65)/140)
0.35 ± 0.079
0.271 to 0.429
We are 95% sure that the proportion of US household cats that hunt outdoors is between 0.271 and 0.429. (b) We see that p = 0.45 is not in the confidence interval, so 0.45 is not a plausible value for the proportion. However, p = 0.30 is in the confidence interval, so 0.30 is a plausible value. 6.24
(a) The sample proportion is p̂ = 605/1060 = 0.571, the sample size is n = 1060, and for 90% confidence, we have z∗ = 1.645. The 90% confidence interval is given by
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.571 ± 1.645 · √(0.571(1 − 0.571)/1060)
0.571 ± 0.025
0.546 to 0.596
We are 90% sure that the proportion of all US teens who have made a friend online is between 0.546 and 0.596. (b) The best estimate for population proportion p is the sample proportion p̂ = 0.571 and the margin of error is 0.025. (c) We see that the entire confidence interval of all plausible values for p is above 0.50, so we can be 90% confident that more than half of US teens have made a friend online.
6.25 The sample size is definitely large enough to use the normal distribution. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.20. For a 99% confidence interval, we have z∗ = 2.576, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.20 ± 2.576 · √(0.20(0.80)/1000)
0.20 ± 0.033
0.167 to 0.233
We are 99% confident that the proportion of US adults who say they never exercise is between 0.167 and 0.233. The margin of error is ±3.3%.
6.26 The sample size is definitely large enough to use the normal distribution. For a confidence interval using the normal distribution, we use Sample statistic ± z∗ · SE. The relevant sample statistic for a confidence interval for a proportion is p̂, and we have p̂ = 959/1068 = 0.898. For a 95% confidence interval, we have z∗ = 1.96, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is
p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.898 ± 1.96 · √(0.898(0.102)/1068)
0.898 ± 0.018
0.880 to 0.916
We are 95% confident that the proportion of times NFL teams punt on a fourth down when the analysis shows they should not punt is between 0.880 and 0.916.
6.27 The sample size is clearly large enough to use the formula based on the normal approximation, since there are well more than 10 responses in each category. (a) The proportion in the sample who disagreed is p̂ = 1812/2625 = 0.69 and z∗ = 1.645 for 90% confidence, so we have
0.69 ± 1.645 · √(0.69(1 − 0.69)/2625) = 0.69 ± 0.015 = (0.675, 0.705)
We are 90% sure that between 67.5% and 70.5% of people would disagree with the statement “There is only one true love for each person.” (b) The proportion in the sample who answered “don’t know” is p̂ = 78/2625 = 0.03 so the 90% confidence interval is
0.03 ± 1.645 · √(0.03(1 − 0.03)/2625) = 0.03 ± 0.005 = (0.025, 0.035)
We are 90% sure that between 2.5% and 3.5% of people would respond with “don’t know.” (c) The estimated proportion of people who disagree (which is closer to 0.5) has a larger margin of error.
6.28 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE = 0.05. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using p̂ = 0.52 as an estimate for p, we have
SE = √(p(1 − p)/n) ≈ √(0.52(1 − 0.52)/100) = 0.050
We see that the bootstrap standard error and the formula match very closely.
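The comparisons in 6.28-6.31 (bootstrap standard error versus the CLT formula) can be reproduced with a short simulation. A minimal sketch for the numbers in 6.28; the specific 52 yes / 48 no sample is one illustrative way to get p̂ = 0.52 with n = 100.

```python
import numpy as np

rng = np.random.default_rng()

n, p_hat = 100, 0.52
sample = np.array([1] * 52 + [0] * 48)   # a sample with 52 'yes' out of 100

# Bootstrap standard error: resample with replacement, record each proportion
boot_props = np.array([rng.choice(sample, size=n, replace=True).mean()
                       for _ in range(1000)])
print(boot_props.std(ddof=1))            # roughly 0.05

# Formula from the Central Limit Theorem
print(np.sqrt(p_hat * (1 - p_hat) / n))  # 0.050
```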
6.29 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE = 0.045. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using p̂ = 0.583 as an estimate for p, we have
SE = √(p(1 − p)/n) ≈ √(0.583(1 − 0.583)/120) = 0.045
We see that the bootstrap standard error and the formula match very closely.
6.30 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE = 0.068. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using p̂ = 0.354 as an estimate for p, we have
SE = √(p(1 − p)/n) ≈ √(0.354(1 − 0.354)/48) = 0.069
We see that the bootstrap standard error and the formula match very closely.
6.31 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE = 0.014. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using p̂ = 0.753 as an estimate for p, we have
SE = √(p(1 − p)/n) ≈ √(0.753(1 − 0.753)/1000) = 0.014
We see that the bootstrap standard error and the formula match very closely. 6.32 Using StatKey or other technology we create a bootstrap distribution with at least 1000 simulated proportions. To find a 95% confidence interval we find the endpoints that contain 95% of the simulated proportions. For one set of 1000 bootstrap proportions (shown below) we find that a 95% confidence interval for proportion of home wins in soccer goes from 0.492 to 0.667.
Using the normal distribution and the formula for standard error, we have

0.583 ± 1.96 · √(0.583(1 − 0.583)/120) = 0.583 ± 0.088 = (0.495, 0.671)

The two methods give very similar intervals.

6.33 Using StatKey or other technology we create a bootstrap distribution with at least 1000 simulated proportions. To find a 95% confidence interval we find the endpoints that contain 95% of the simulated proportions. For one set of 1000 bootstrap proportions we find that a 95% confidence interval for the proportion of orange Reese’s Pieces goes from 0.40 to 0.56.
Using the normal distribution and the formula for standard error, we have

0.48 ± 1.96 · √(0.48(0.52)/150) = 0.48 ± 0.080 = (0.40, 0.56)

In this case, the two methods give exactly the same interval.

6.34 We use z* = 1.96 for 95% confidence. Since we are given no information about the population proportion, we use the conservative estimate p̃ = 0.5. For a desired margin of error of ME = 0.06, we have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.06)² · (0.5 · 0.5) = 266.8

We round up to n = 267. For a desired margin of error of ME = 0.04, we have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.04)² · (0.5 · 0.5) = 600.25
We round up to n = 601. For a desired margin of error of ME = 0.01, we have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.01)² · (0.5 · 0.5) = 9604

We have n = 9604. We see that the sample size goes up quite a bit as we require more accuracy. Or, put another way, a larger sample size tends to give more accurate estimates.

6.35 We have ME = 0.03 for the margin of error. Since we are given no information about the population proportion, we use the conservative estimate p̃ = 0.5. For 99% confidence, we use z* = 2.576. We have:

n = (z*/ME)² · p̃(1 − p̃) = (2.576/0.03)² · (0.5 · 0.5) = 1843.3

We round up to n = 1844. For 95% confidence, we use z* = 1.96. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.03)² · (0.5 · 0.5) = 1067.1

We round up to n = 1068. For 90% confidence, we use z* = 1.645. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.645/0.03)² · (0.5 · 0.5) = 751.7

We round up to n = 752. We see that the sample size goes up as the level of confidence we want in the result goes up. Or, put another way, a larger sample size gives a higher level of confidence in the accuracy of the estimate.

6.36 We use z* = 1.96 for 95% confidence, and we want a margin of error of ME = 0.03. If we are given no information about the population proportion, we use the conservative estimate p̃ = 0.5. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.03)² · (0.5 · 0.5) = 1067.1

We round up to n = 1068. This sample size should be becoming familiar to you by now! You will see it often in opinion poll surveys as well. If we believe that p ≈ 0.7, we use that as our estimate p̃ = 0.7. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.03)² · (0.7 · 0.3) = 896.4
We round up to n = 897. If we believe that p ≈ 0.9, we use that as our estimate p̃ = 0.9. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.03)² · (0.9 · 0.1) = 384.2
We round up to n = 385. The largest sample size occurs at p̃ = 0.5. If we assume p = 0.5, we are assuming a 50-50 split in the population between being in the category we are interested in and not being in that category. With no knowledge, the safest choice is to assume this 50-50 split. If we have more specific knowledge that the proportion is likely to be farther from this “safe” estimate of 0.5, we can use a smaller sample size: the farther the estimate p̃ is from 0.5, the smaller the sample size we need.

6.37
(a) The sample size is definitely large enough to use the normal distribution. The relevant sample statistic is p̂ = 0.32 and, for a 95% confidence interval, we use z* = 1.96. The confidence interval is

p̂ ± z* · √(p̂(1 − p̂)/n)
0.32 ± 1.96 · √(0.32(1 − 0.32)/1000)
0.32 ± 0.029
0.291 to 0.349
We are 95% confident that the proportion of US adults who favor a tax on soda and junk food is between 0.291 and 0.349.

(b) The margin of error is 2.9%.

(c) Since the margin of error is about ±3% with a sample size of n = 1000, we’ll definitely need a sample size larger than 1000 to get the margin of error down to ±1%. To see how much larger, we use the formula for determining sample size. The margin of error we desire is ME = 0.01, and for 95% confidence we use z* = 1.96. We can use the sample statistic p̂ = 0.32 as our best estimate for p. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.01)² · 0.32(1 − 0.32) = 8359.3
We round up, so we would need to include 8360 people in the survey in order to get the margin of error down to within ±1%.

6.38 The margin of error we desire is ME = 0.02, and for 95% confidence we use z* = 1.96. Since we have
no prior knowledge about the proportion in support p, we use the conservative estimate of p̃ = 0.5. We have:

n = (z*/ME)² · p̃(1 − p̃) = (1.96/0.02)² · 0.5(1 − 0.5) = 2401
We need to include 2401 people in the survey in order to get the margin of error down to within ±2%.

6.39 Since we have no reasonable estimate for the proportion, we use p̃ = 0.5. For 98% confidence, use z* = 2.326. The required sample size is

n = (2.326/0.04)² · 0.5(1 − 0.5) = 845.4
We round up to show a sample of at least 846 individuals is needed to estimate the proportion who would consider buying a sunscreen pill to within 4% with 98% confidence.

6.40 We have ME = 0.01 so the sample size needed is n = 1/(0.01)² = 10,000. We need a large sample size of 10,000 for this much accuracy.

6.41 We have ME = 0.02 so the sample size needed is n = 1/(0.02)² = 2500.

6.42 We have ME = 0.04 so the sample size needed is n = 1/(0.04)² = 625.

6.43 We have ME = 0.05 so the sample size needed is n = 1/(0.05)² = 400.

6.44 The data in CommuteAtlanta contain 254 males and 246 females among the 500 commuters. Thus the sample proportion of males is p̂ = 254/500 = 0.508. Since there are more than 10 males and 10 females in the sample, we may use the normal approximation to construct a confidence interval for the proportion of males. For 95% confidence the standard normal endpoints are z* = 1.96, so we compute the interval with

0.508 ± 1.96 · √(0.508(1 − 0.508)/500) = 0.508 ± 0.044 = (0.464, 0.552)

We are 95% sure that somewhere between 46.4% and 55.2% of Atlanta commuters are male.

6.45 We see in the dataset ICUAdmissions that 160 of the patients in the sample lived and 40 died. The sample proportion who live is p̂ = 160/200 = 0.80. Since there are more than 10 in the living and the dying groups in the sample, we may use the normal approximation to construct a confidence interval for the proportion who live. For 95% confidence the standard normal endpoint is z* = 1.96, so we compute the interval with

0.80 ± 1.96 · √(0.80(1 − 0.80)/200) = 0.80 ± 0.055 = (0.745, 0.855)

We are 95% sure that the proportion of ICU patients (at this hospital) who live is between 74.5% and 85.5%.
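The normal-approximation interval in Exercise 6.45 can also be verified with a few lines of code. The sketch below is a Python illustration under the stated counts (160 of 200 patients lived), not the package output referenced in the solutions.

import numpy as np
from scipy.stats import norm

# Sketch: 95% normal-approximation interval for a proportion (Exercise 6.45 counts).
count, n = 160, 200
p_hat = count / n
z_star = norm.ppf(0.975)                         # 1.96 for 95% confidence
me = z_star * np.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
print(f"{p_hat:.3f} ± {me:.3f} -> ({p_hat - me:.3f}, {p_hat + me:.3f})")
# 0.800 ± 0.055 -> (0.745, 0.855)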
Section 6.1-HT Solutions

6.46 Since np0 = 40(0.5) = 20 and n(1 − p0) = 40(0.5) = 20, the sample size is large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE

In this test for a proportion, the sample statistic is p̂ = 0.60 and the parameter from the null hypothesis is p0 = 0.5. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.60 − 0.50)/√(0.5(0.5)/40) = 1.265
This is an upper-tail test, so the p-value is the area above 1.265 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.103. This p-value is quite large so we do not find evidence to support the alternative hypothesis that p > 0.5.
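The same z-statistic and upper-tail p-value can be computed directly. The sketch below is a Python illustration of the Exercise 6.46 calculation (p̂ = 0.60, p0 = 0.5, n = 40), not an official companion script.

import numpy as np
from scipy.stats import norm

# Sketch: one-proportion z-test with an upper-tail alternative (Exercise 6.46 numbers).
p_hat, p0, n = 0.60, 0.5, 40
se = np.sqrt(p0 * (1 - p0) / n)        # standard error uses the null proportion
z = (p_hat - p0) / se
p_value = norm.sf(z)                   # upper-tail area; use 2*norm.sf(abs(z)) for a two-tail test
print(f"z = {z:.3f}, p-value = {p_value:.3f}")    # z ≈ 1.265, p-value ≈ 0.103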
6.47 Since np0 = 200(0.3) = 60 and n(1 − p0) = 200(0.7) = 140, the sample size is large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE

In this test for a proportion, the sample statistic is p̂ = 0.21 and the parameter from the null hypothesis is p0 = 0.3. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.21 − 0.3)/√(0.3(0.7)/200) = −2.78
This is a lower-tail test, so the p-value is the area below −2.78 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.0027. This p-value is very small so we find strong evidence to support the alternative hypothesis that p < 0.3.
6.48 Since np0 = 100(0.25) = 25 and n(1 − p0) = 100(0.75) = 75, the sample size is large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE

In this test for a proportion, the sample statistic is p̂ = 0.16 and the parameter from the null hypothesis is p0 = 0.25. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.16 − 0.25)/√(0.25(0.75)/100) = −2.08
This is a lower-tail test, so the p-value is the area below −2.08 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.019. This p-value is below the significance level of 5% so we reject H0 and find evidence that p < 0.25.
6.49 Since np0 = 50(0.8) = 40 and n(1 − p0) = 50(0.2) = 10, the sample size is (just barely) large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE

In this test for a proportion, the sample statistic is p̂ = 0.88 and the parameter from the null hypothesis is p0 = 0.8. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.88 − 0.80)/√(0.8(0.2)/50) = 1.41
This is an upper-tail test, so the p-value is the area above 1.41 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.079. This p-value is larger than the significance level of 5%, so we do not find sufficient evidence to support the alternative hypothesis that p > 0.8.
6.50 Since np0 = 120(0.75) = 90 and n(1 − p0) = 120(0.25) = 30, the sample size is large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE

In this test for a proportion, the sample statistic is p̂ = 0.69 and the parameter from the null hypothesis is p0 = 0.75. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.69 − 0.75)/√(0.75(0.25)/120) = −1.52
This is a two-tail test, so the p-value is two times the area below −1.52 in a standard normal distribution. Using technology or a table, we see that the p-value is 2(0.0643) = 0.1286. This p-value is not very small and is not significant at any reasonable significance level. We do not find evidence to support the alternative hypothesis that p ≠ 0.75.
6.51 Since np0 = 1000(0.2) = 200 and n(1 − p0) = 1000(0.8) = 800, the sample size is large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter)/SE
In this test for a proportion, the sample statistic is p̂ = 0.26 and the parameter from the null hypothesis is p0 = 0.2. The standard error is SE = √(p0(1 − p0)/n). The standardized test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.26 − 0.2)/√(0.2(0.8)/1000) = 4.74

This is a two-tail test, so the p-value is two times the area above 4.74 in a standard normal distribution. The area beyond 4.74 is essentially zero, so the p-value is essentially zero. The p-value is very small so we have strong evidence to support the alternative hypothesis that p ≠ 0.2.
6.52 If we let p represent the proportion of Canadian infants who receive antibiotics during the first year of life, the hypotheses are:

H0: p = 0.70
Ha: p > 0.70

The sample size of 616 is clearly large enough for us to use the normal distribution. The sample statistic is p̂ = 438/616 = 0.711 and n = 616. The standardized test statistic is:

z = (Statistic − Null value)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.711 − 0.70)/√(0.70(0.30)/616) = 0.600
This is a right-tail test, so the p-value is the area above 0.600 in a standard normal distribution. We see that the p-value is 0.274. This is not a small p-value and we do not reject H0. We do not have enough evidence to conclude that more than 70% of Canadian infants receive antibiotics.

6.53
(a) If we let pl be the proportion of left-handed lawyers, then we are testing H0: pl = 0.10 vs. Ha: pl ≠ 0.10.

(b) The sample proportion is p̂l = 16/105 = 0.1524. The test statistic is

z = (0.1524 − 0.10)/√(0.10(1 − 0.10)/105) = 1.79
The area above 1.79 in the standard normal curve is 0.037, so the p-value is 2(0.037) = 0.074. (c) We do not reject H0 at the 5% significance level, and thus do not conclude that the proportion of left-handed lawyers differs from the proportion of left-handed Americans. At the 10% significance level we do reject H0 and conclude that there is a higher percentage of left-handed lawyers.
6.54 If we let p represent the proportion of US adults who believe in ghosts, the hypotheses are:

H0: p = 0.25
Ha: p > 0.25

The sample size of 1000 is clearly large enough for us to use the normal distribution. The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.31 − 0.25)/√(0.25(0.75)/1000) = 4.38
This is an upper-tail test, so the p-value is the area above 4.38 in a standard normal distribution. We know that 4.38 (more than four standard deviations above the mean) is going to be way out in the tail of the distribution, and the p-value is essentially zero. There is very strong evidence that more than 1 in 4 US adults believes in ghosts.

6.55 This is a test for a single proportion. If we let p denote the proportion of all US first-year full-time college students who are satisfied with their overall academic experience at the end of the first year, the hypotheses are:

H0: p = 0.75
Ha: p > 0.75

The sample statistic is p̂ = 4122/5204 = 0.792. The null hypothesis value is p0 = 0.75. The standard error is

SE = √(p0(1 − p0)/n) = √(0.75(1 − 0.75)/5204) = 0.006

The z-test statistic is:

z = (Statistic − Null)/SE = (0.792 − 0.75)/0.006 = 7.0

This is a right-tail test so the p-value is the area to the right of 7.0 in a normal distribution. The value of 7 is way out in the tail, and we see that the p-value is approximately 0. This p-value is very small and certainly less than any reasonable significance level. We reject H0. We have evidence that more than 75% of US full-time students are satisfied with their overall academic experience at the end of their first year.

6.56 Using p as the proportion with husbands older, we have

H0: p = 0.5
Ha: p > 0.5

Note that the null hypothesis is that husbands and wives are equally likely to be the older one. The sample statistic is p̂ = 75/105 = 0.714 and the null hypothesis value is 0.5. The standard error is

SE = √(p0(1 − p0)/n) = √(0.5(1 − 0.5)/105) = 0.0488

The z-test statistic is:

z = (Statistic − Null)/SE = (0.714 − 0.5)/0.0488 = 4.385

This is a right-tail test so the p-value is the area to the right of 4.385 in a normal distribution. The value of 4.385 is way out in the tail, and we see that the p-value is approximately 0. This p-value is very small and certainly less than any reasonable significance level. We reject H0. We have evidence that the husband is older than the wife in more than half of male-female married couples.
6.57 We are conducting a hypothesis test for a proportion p, where p is the proportion of all MLB games won by the home team. We are testing to see if there is evidence that p > 0.5, so we have

H0: p = 0.5
Ha: p > 0.5

This is a one-tail test since we are specifically testing to see if the proportion is greater than 0.5. The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.549 − 0.5)/√(0.5(0.5)/2430) = 4.83
Using the normal distribution, we find a p-value of (to five decimal places) zero. This provides very strong evidence to reject H0 and conclude that the home team wins more than half the games played. The home field advantage is real!

6.58 We are conducting a hypothesis test for a proportion p, where p is the proportion of all MLB games won by the home team. We are testing to see if there is evidence that p > 0.5, so we have

H0: p = 0.5
Ha: p > 0.5

This is a one-tail test since we are specifically testing to see if the proportion is greater than 0.5. The sample statistic is p̂ = 1297/2458 = 0.528. The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.528 − 0.5)/√(0.5(0.5)/2458) = 2.78
Using the normal distribution, we find a p-value of 0.0029. This is less than the 1% significance level and provides strong evidence to reject H0 and conclude that the home team wins more than half the games played. The home field advantage is real!

6.59 We are conducting a hypothesis test for a proportion p, where p is the proportion of all US adults who know most or all of their neighbors. We are testing to see if there is evidence that p > 0.5, so we have

H0: p = 0.5
Ha: p > 0.5

This is a one-tail test since we are specifically testing to see if the proportion is greater than 0.5. The sample proportion is p̂ = 0.51 and the null proportion is p0 = 0.5. The sample size is n = 2255 so the test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.51 − 0.5)/√(0.5(0.5)/2255) = 0.95
This is a one-tail test, so the p-value is the area above 0.95 in the standard normal distribution. We find a p-value of 0.171. This p-value is larger than even a 10% significance level, so we do not have sufficient evidence to conclude that the proportion of US adults who know their neighbors is larger than 0.5.

6.60 The sample proportion of questions having B as the correct answer is p̂ = 90/400 = 0.225. If all the choices were equally likely, we would expect each to be correct about 1/5, or 20%, of the time. If p represents the proportion of time B is the correct choice on all AP multiple choice questions, the hypotheses are:

H0: p = 0.20
Ha: p > 0.20
The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.225 − 0.20)/√(0.2(0.8)/400) = 1.25

This is an upper-tail test, so the p-value is the area above 1.25 in a standard normal distribution. We find the p-value is 0.106. Even at a 10% level, we do not reject H0. We do not find evidence that B is more likely to be the correct choice.

6.61 Let p be the proportion of times Team B will beat Team A. We are testing H0: p = 0.5 vs Ha: p ≠ 0.5. The sample proportion is p̂ = 24/40 = 0.6. The test statistic is

z = (0.6 − 0.5)/√(0.5(1 − 0.5)/40) = 1.26
The area to the right of 1.26 on the standard normal distribution is 0.104, so the p-value is 2(0.104) = 0.208. There is not convincing evidence that one team is better than the other. (We arrive at the same conclusion if we let p be the proportion of times that Team A wins.) 6.62
(a) Let p be the proportion of infants who choose the jar with the higher proportion of their preferred color. The relevant hypotheses are:

H0: p = 0.5
Ha: p > 0.5

(b) The relevant statistic is the sample proportion: p̂ = 18/24 = 0.75.

(c) We need to check the conditions to see if the sample size is large enough for the normal distribution to apply when the null hypothesis is true. np0 = n(1 − p0) = 24(0.5) = 12 > 10. The result is not far above 10, but does say the sample size is large enough (under H0) to use the normal approximation. There is no restriction on sample size to use a randomization test, so either method could be used in this situation.

(d) By creating a randomization distribution and finding the proportion of simulated statistics greater than or equal to p̂ = 18/24 = 0.75, we find a p-value of about 0.01. If we use the standardized statistic

z = (p̂ − p0)/SE = (0.75 − 0.5)/√(0.5(1 − 0.5)/24) = 2.45
the p-value from the upper tail of a standard normal distribution is 0.007.

(e) Either p-value is lower than α = 0.05, so we have evidence that babies are more likely to choose the jar with the higher proportion of their preferred color, and hence do have some understanding of probability.

6.63 We use technology to determine that the number of smokers in the sample is 43, so the sample proportion of smokers is p̂ = 43/315 = 0.1365. The hypotheses are:

H0: p = 0.20
Ha: p ≠ 0.20
The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.1365 − 0.20)/√(0.2(0.8)/315) = −2.82
This is a two-tail test, so the p-value is twice the area below −2.82 in a standard normal distribution. We see that the p-value is 2(0.0024) = 0.0048. This small p-value leads us to reject H0. We find strong evidence that the proportion of smokers is not 20%.

6.64 We use technology to determine that the number of regular vitamin users in the sample is 122, so the sample proportion is p̂ = 122/315 = 0.387. The hypotheses are:

H0: p = 0.35
Ha: p ≠ 0.35
The test statistic is:

z = (Sample statistic − Null parameter)/SE = (p̂ − p0)/√(p0(1 − p0)/n) = (0.387 − 0.35)/√(0.35(0.65)/315) = 1.38
This is a two-tail test, so the p-value is twice the area above 1.38 in a standard normal distribution. We see that the p-value is 2(0.084) = 0.168. This p-value is larger than any reasonable significance level, so we do not reject H0 . We do not find evidence that the proportion of people who regularly take a vitamin pill is not 35%.
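A simulation-based version of the two-tail p-value in Exercise 6.64 gives a similar answer. The sketch below is a Python illustration under the stated counts (122 vitamin users out of n = 315, testing p = 0.35); the seed and simulation count are assumptions, not part of the textbook solution.

import numpy as np

# Sketch: simulated two-tail p-value for a single proportion (Exercise 6.64 counts).
rng = np.random.default_rng(7)
n, count, p0 = 315, 122, 0.35
p_hat = count / n

sim_props = rng.binomial(n, p0, size=10_000) / n              # sampling distribution under H0
p_two_tail = np.mean(np.abs(sim_props - p0) >= abs(p_hat - p0))
print(f"p_hat = {p_hat:.3f}, simulated two-tail p-value ≈ {p_two_tail:.3f}")
# close to the normal-approximation p-value of about 0.17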
Section 6.2-D Solutions
6.65 The sample means will have a standard error of SE = σ/√n = 5/√1000 = 0.158.

6.66 The sample means will have a standard error of SE = σ/√n = 2/√10 = 0.632.

6.67 The sample means will have a standard error of SE = σ/√n = 80/√40 = 12.65.

6.68 The sample means will have a standard error of SE = σ/√n = 32/√75 = 3.695.

6.69 We use a t-distribution with df = 9. Using technology, we see that the values with 5% beyond them in each tail are ±1.83.
6.70 We use a t-distribution with df = 17. Using technology, we see that the values with 1% beyond them in each tail are ±2.57.
6.71 We use a t-distribution with df = 24. Using technology, we see that the values with 0.025 beyond them in each tail are ±2.06.
6.72 We use a t-distribution with df = 39. Using technology, we see that the values with 0.005 beyond them in each tail are ±2.71.
6.73 We use a t-distribution with df = 5. Using technology, we see that the area above 2.3 is 0.0349. (On a paper table, we may only be able to specify that the area is between 0.025 and 0.05.)
6.74 We use a t-distribution with df = 7. Using technology, we see that the area above 1.5 is 0.0886. (On a paper table, we may only be able to specify that the area is between 0.05 and 0.10.)
6.75 We use a t-distribution with df = 19. Using technology, we see that the area below −1.0 is 0.165. (On a paper table, we may only be able to specify that the area is greater than 0.10.)
6.76 We use a t-distribution with df = 49. Using technology, we see that the area below −3.2 is 0.0012.
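The t-distribution lookups in Exercises 6.69 through 6.76 can be reproduced with any technology. The sketch below is a Python illustration using scipy; the specific calls shown are one way to get the same critical values and tail areas, not the method prescribed by the text.

from scipy.stats import t

# Sketch: t-distribution critical values and tail areas (Exercises 6.69-6.76).
print(t.ppf(0.95, df=9))     # ≈ 1.83: value with 5% beyond it in each tail (df = 9)
print(t.ppf(0.975, df=24))   # ≈ 2.06: value with 0.025 beyond it in each tail (df = 24)
print(t.sf(2.3, df=5))       # ≈ 0.0349: area above 2.3 with df = 5
print(t.cdf(-1.0, df=19))    # ≈ 0.165: area below -1.0 with df = 19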
6.77 We compute the standard errors using the formula:

n = 30:   SE = σ/√n = 25/√30 = 4.56
n = 200:  SE = σ/√n = 25/√200 = 1.77
n = 1000: SE = σ/√n = 25/√1000 = 0.79
We see that as the sample size goes up, the standard error goes down. If the standard error goes down, the sample means are less spread out from the population mean, so the accuracy is better.

6.78 We compute the standard errors using the formula:

σ = 5:  SE = σ/√n = 5/√100 = 0.5
σ = 25: SE = σ/√n = 25/√100 = 2.5
σ = 75: SE = σ/√n = 75/√100 = 7.5
If the population standard deviation is larger, the standard error of the sample means will also be larger. This makes sense: if the data are more spread out, the sample means (assuming the same sample size) will be more spread out also.

6.79 The t-distribution is appropriate if the sample size is large (n ≥ 30) or if the underlying distribution appears to be relatively normal. We have concerns about the t-distribution only for small sample sizes and heavy skewness or outliers. In this case, the sample size is small (n = 12) but the distribution is not heavily skewed and it does not have extreme outliers. A condition of normality is reasonable, so the t-distribution is appropriate. For the degrees of freedom df and estimated standard error SE, we have:

df = n − 1 = 12 − 1 = 11, and SE = s/√n = 1.6/√12 = 0.46
6.80 The t-distribution is appropriate if the sample size is large (n ≥ 30) or if the underlying distribution appears to be relatively normal. We have concerns about the t-distribution only for small sample sizes and
heavy skewness or outliers. In this case, the sample size is large enough (n = 75) that we can feel comfortable using the t-distribution, despite the clear skewness in the data. The t-distribution is appropriate. For the degrees of freedom df and estimated standard error SE, we have:

df = n − 1 = 75 − 1 = 74, and SE = s/√n = 10.1/√75 = 1.17
6.81 The t-distribution is appropriate if the sample size is large (n ≥ 30) or if the underlying distribution appears to be relatively normal. We have concerns about the t-distribution only for small sample sizes and heavy skewness or outliers. In this case, the sample size is small (n = 18) and the data are heavily skewed with some apparent outliers. It would not be appropriate to use the t-distribution in this case. We might try analyzing the data using simulation methods such as a bootstrap or randomization distribution. 6.82 The t-distribution is appropriate if the sample size is large (n ≥ 30) or if the underlying distribution appears to be relatively normal. We have concerns about the t-distribution only for small sample sizes and heavy skewness or outliers. In this case, the sample size is small (n = 12) and the data are heavily skewed with some apparent outliers. It would not be appropriate to use the t-distribution in this case. We might try analyzing the data using simulation methods such as a bootstrap or randomization distribution. 6.83 The t-distribution is appropriate if the sample size is large (n ≥ 30) or if the underlying distribution appears to be relatively normal. Both of these conditions look fine for this example so the t-distribution is appropriate. 6.84 No. The Central Limit Theorem says that the distribution of sample means will follow a normal distribution if the sample sizes are large. This does not apply to the sample itself. If your population is right-skewed, you can expect the samples to also be right-skewed. 6.85
(a) Here is a sketch of a normal distribution with mean of 1135 and standard deviation of 130 that represents the AvgSAT values for the population of all colleges.
(b) For the distribution of means for samples of size 100, the center is still at μ = 1135, but the standard deviation is now σ/√n = 130/√100 = 13. By the Central Limit Theorem the distribution of AvgSAT means for samples of size 100 will be normal with mean 1135 and standard deviation 13.
6.86
(a) Here is a sketch of a normal distribution with mean of 24 and standard deviation of 4 that represents the MidACT values for the population of all colleges.
(b) For the distribution of means for samples of size 100, the center is still at μ = 24, but the standard deviation is now σ/√n = 4/√100 = 0.4. By the Central Limit Theorem the distribution of MidACT means for samples of size 100 will be normal with mean 24 and standard deviation 0.4.
6.87
(a) Using technology, the area above 1180 for a normal distribution with mean 1135 and standard deviation 130 is 0.486. This would not be unusual at all.
(b) By the Central Limit Theorem the distribution of AvgSAT means for samples of size 100 will be normal with mean 1135 and standard deviation of 130/√100 = 13. The area above 1180 for this distribution is only 0.00027, which would be very unusual.
(c) If the underlying population of AvgSAT scores was right-skewed, the calculation in part (a) would no longer be appropriate because the population does not follow a normal distribution. However, the calculation in part (b) is still appropriate because the Central Limit Theorem says the distribution of the sample means would still be normal for a large sample size. 6.88
(a) Using technology, the area above 25 for a normal distribution with mean 24 and standard deviation 4 is 0.401. This would not be unusual at all.
(b) By the Central Limit Theorem the distribution of MidACT means for samples of size 100 will be normal with mean 24 and standard deviation of 4/√100 = 0.4. The area above 25 for this distribution is only 0.0062, which would be very unusual.
(c) If the underlying population of MidACT scores was right-skewed, the calculation in part (a) would no longer be appropriate because the population does not follow a normal distribution. However, the calculation in part (b) is still appropriate because the Central Limit Theorem says the distribution of the sample means would still be normal for a large sample size.
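The two probabilities in Exercise 6.88 follow directly from the normal distribution. The sketch below is a Python illustration under the stated values (mean 24, standard deviation 4, samples of size 100), not part of the original solution.

from scipy.stats import norm

# Sketch: probability for one score vs. a sample mean (Exercise 6.88 values).
mu, sigma, n = 24, 4, 100
print(norm.sf(25, loc=mu, scale=sigma))              # ≈ 0.401: one MidACT value above 25
print(norm.sf(25, loc=mu, scale=sigma / n**0.5))     # ≈ 0.0062: mean of 100 values above 25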
Section 6.2-CI Solutions

6.89 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 29, so for a 95% confidence interval, we have t* = 2.05. The confidence interval is

12.7 ± 2.05 · 5.6/√30
12.7 ± 2.10
10.6 to 14.8
The best estimate for μ is x = 12.7, the margin of error is ±2.10, and the 95% confidence interval for μ is 10.6 to 14.8. We are 95% confident that the mean of the entire population is between 10.6 and 14.8.

6.90 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 41, so for a 95% confidence interval, we have t* = 2.02. The confidence interval is

84.6 ± 2.02 · 7.8/√42
84.6 ± 2.43
82.17 to 87.03
The best estimate for μ is x = 84.6, the margin of error is ±2.43, and the 95% confidence interval for μ is 82.17 to 87.03. We are 95% confident that the mean of the entire population is between 82.17 and 87.03.

6.91 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 99, so for a 90% confidence interval, we have t* = 1.66. (Since the sample size is so large, the t-distribution value is almost identical to the standard normal z value.) The confidence interval is

3.1 ± 1.66 · 0.4/√100
3.1 ± 0.066
3.034 to 3.166
The best estimate for μ is x = 3.1, the margin of error is ±0.066, and the 90% confidence interval for μ is 3.034 to 3.166. We are 90% confident that the mean of the entire population is between 3.034 and 3.166.
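For readers who want to check these t-intervals by computer, the sketch below is a Python illustration of the Exercise 6.91 interval from summary statistics (x = 3.1, s = 0.4, n = 100, 90% confidence); it is one possible implementation, not the textbook's prescribed tool.

import numpy as np
from scipy.stats import t

# Sketch: t-interval for a mean from summary statistics (Exercise 6.91 numbers).
xbar, s, n, conf = 3.1, 0.4, 100, 0.90
t_star = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # ≈ 1.66 for df = 99
me = t_star * s / np.sqrt(n)                   # margin of error
print(f"{xbar} ± {me:.3f} -> ({xbar - me:.3f}, {xbar + me:.3f})")
# 3.1 ± 0.066 -> (3.034, 3.166)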
6.92 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 49, so for a 90% confidence interval, we have t* = 1.68. The confidence interval is

137.0 ± 1.68 · 53.9/√50
137.0 ± 12.8
124.2 to 149.8
The best estimate for μ is x = 137, the margin of error is ±12.8, and the 90% confidence interval for μ is 124.2 to 149.8. We are 90% confident that the mean of the entire population is between 124.2 and 149.8.

6.93 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 9, so for a 99% confidence interval, we have t* = 3.25. The confidence interval is

46.1 ± 3.25 · 12.5/√10
46.1 ± 12.85
33.25 to 58.95

The best estimate for μ is x = 46.1, the margin of error is ±12.85, and the 99% confidence interval for μ is 33.25 to 58.95. We are 99% confident that the mean of the entire population is between 33.25 and 58.95.

6.94 For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

We use a t-distribution with df = 14, so for a 99% confidence interval, we have t* = 2.98. The confidence interval is

88.3 ± 2.98 · 32.1/√15
88.3 ± 24.7
63.6 to 113.0

The best estimate for μ is x = 88.3, the margin of error is ±24.7, and the 99% confidence interval for μ is 63.6 to 113.0. We are 99% confident that the mean of the entire population is between 63.6 and 113.0.

6.95 The desired margin of error is ME = 5 and we have z* = 1.96 for 95% confidence. We use σ̃ = 18 to approximate the standard deviation. We use the formula to find sample size:

n = (z* · σ̃/ME)² = (1.96 · 18/5)² = 49.8

We round up to n = 50. In order to ensure that the margin of error is within the desired ±5 units, we should use a sample size of 50 or higher.
6.96 The desired margin of error is ME = 1 and we have z* = 2.576 for 99% confidence. We use σ̃ = 3.4 to approximate the standard deviation. We use the formula to find sample size:

n = (z* · σ̃/ME)² = (2.576 · 3.4/1)² = 76.7

We round up to n = 77. In order to ensure that the margin of error is within the desired ±1 unit, we should use a sample size of 77 or higher.

6.97 The desired margin of error is ME = 0.5 and we have z* = 1.645 for 90% confidence. We use σ̃ = 25 to approximate the standard deviation. We use the formula to find sample size:

n = (z* · σ̃/ME)² = (1.645 · 25/0.5)² = 6765.1

We round up to n = 6766. In order to ensure that the margin of error is within the desired ±0.5 units, we would need to use a sample size of 6766 or higher.

6.98 The desired margin of error is ME = 12 and we have z* = 1.96 for 95% confidence. We use σ̃ = 125 to approximate the standard deviation. We use the formula to find sample size:

n = (z* · σ̃/ME)² = (1.96 · 125/12)² = 416.8

We round up to n = 417. In order to ensure that the margin of error is within the desired ±12 units, we should use a sample size of 417 or higher.

6.99 We have n = 517 with x = 12.85 and s = 63.66. We find the standard error using:

SE = s/√n = 63.66/√517 = 2.80

For a confidence interval for a mean, we use the t-distribution with df = n − 1 = 517 − 1 = 516. For a 90% confidence interval, we have t* = 1.648. The confidence interval is

Statistic ± t* · SE
12.85 ± 1.648 · (2.80)
12.85 ± 4.614
8.236 to 17.464
We are 90% sure that the mean number of hectares burned in all Portugal forest fires is between 8.236 and 17.464.

6.100 We have n = 447 with x = 16.740 and s = 17.688. We find the standard error using:

SE = s/√n = 17.688/√447 = 0.837

For a confidence interval for a mean, we use the t-distribution with df = n − 1 = 447 − 1 = 446. For a 99% confidence interval, we have t* = 2.587. The confidence interval is

Statistic ± t* · SE
16.740 ± 2.587 · (0.837)
16.740 ± 2.165
14.575 to 18.905
We are 99% sure that the mean number of hours spent on the computer in one week by all seniors in Pennsylvania is between 14.575 and 18.905.

6.101 We have n = 55 and x = 2.4 and s = 1.51. Using df = 54 and a 99% confidence level, we see that t* = 2.670. The 99% confidence interval is given by:

Statistic ± t* · SE
x ± t* · s/√n
2.4 ± 2.670 · 1.51/√55
2.4 ± 0.544
1.856 to 2.944
We are 99% sure that the mean number of kills per week for all US household cats is between 1.856 and 2.944.

6.102 We use a t-distribution with df = 360, so for a 99% confidence interval, we have t* = 2.59. The confidence interval is

x ± t* · s/√n
6.504 ± 2.59 · 5.584/√361
6.504 ± 0.761
5.743 to 7.265
We are 99% confident that the average number of hours of television watched per week by students who take the introductory statistics course at this university is between 5.743 and 7.265. 6.103
(a) The margin of error for estimating μ is given by ME = t* · s/√n.
For a t-distribution with df = 2005 and 99% confidence, we have t* = 2.578. The margin of error is

ME = t* · s/√n = 2.578 · 1.4/√2006 = 0.08

With 99% confidence, the best estimate of the average number of close confidants is 2.2 with a margin of error of 0.08.

(b) The 99% confidence interval is the best estimate plus/minus the margin of error. We see that the 99% confidence interval is

x ± ME = 2.2 ± 0.08 = 2.12 to 2.28

We are 99% sure that the average number of close confidants for US adults is between 2.12 and 2.28.
6.104 We use a t-distribution with df = 49, so for a 95% confidence interval, we have t* = 2.01. The confidence interval is

x ± t* · s/√n
3.1 ± 2.01 · 0.72/√50
3.1 ± 0.20
2.9 to 3.3
The best estimate for the length of gribbles is 3.1 mm, with a margin of error for our estimate of ±0.2. The 95% confidence interval is 2.9 to 3.3, and we are 95% confident that the average length of all gribbles is between 2.9 and 3.3 mm. We need to assume that the sample of gribbles is a random sample, or at least a representative sample.

6.105 We use a t-distribution with df = 98, so for a 95% confidence interval, we have t* = 1.98. The confidence interval is

x ± t* · s/√n
564 ± 1.98 · 122/√99
564 ± 24.3
539.7 to 588.3
We are 95% sure that the mean number of unique genes in the gut bacteria of European individuals is between 539.7 and 588.3 million.

6.106 We have n = 210 with x = 32.6 and s = 18.2. We find the standard error using:

SE = s/√n = 18.2/√210 = 1.256

For a confidence interval for a mean, we use the t-distribution with df = n − 1 = 210 − 1 = 209. For a 95% confidence interval, we have t* = 1.971. The confidence interval is

Statistic ± t* · SE
32.6 ± 1.971 · (1.256)
32.6 ± 2.476
30.124 to 35.076
We are 95% sure that the mean age at which transgender adults begin transitioning is between 30.124 and 35.076 years old.

6.107 We have n = 210 with x = 6.6 and s = 3.5. We find the standard error using:

SE = s/√n = 3.5/√210 = 0.242
For a confidence interval for a mean, we use the t-distribution with df = n − 1 = 210 − 1 = 209. For a 90% confidence interval, we have t* = 1.652. The confidence interval is

Statistic ± t* · SE
6.6 ± 1.652 · (0.242)
6.6 ± 0.40
6.2 to 7.0
We are 90% sure that the mean age at which transgender people first sense that they are transgender is between 6.2 years old and 7.0 years old.

6.108 The parameter we are estimating is mean weight gain over the time of the study of all mice living with dim light at night. To find a confidence interval for a mean using the t-distribution, we use

Sample statistic ± t* · SE

The relevant sample statistic is x = 7.9. For a 90% confidence interval with 10 − 1 = 9 degrees of freedom, we use t* = 1.83 and the standard error is SE = s/√n. The confidence interval is

x ± t* · s/√n
7.9 ± 1.83 · 3.0/√10
7.9 ± 1.74
6.16 to 9.64

We are 90% confident that the mean weight gain of mice living with dim light at night will be between 6.16 and 9.64 grams.

6.109 The sample size of n = 9 is quite small, so we require a condition of approximate normality for the underlying population in order to use the t-distribution. In the dotplot of the data, it appears that the data might be right skewed and there is quite a large outlier. It is probably more reasonable to use other methods, such as a bootstrap distribution, to compute a confidence interval using this data.

6.110
(a) The sample mean is 4.21 and standard deviation is 45.34, both given in the computer output.
(b) Based on the five number summary the flight time differences are probably skewed right with at least one large outlier. The maximum of 872 minutes is much farther from the median at −5 than is the minimum at −51. We shouldn’t assume the sample is normally distributed.

(c) Although the sample may not be normally distributed, with a sample size of 1000 we can use the Central Limit Theorem to proceed with the t-distribution.

(d) We construct a 95% confidence interval as x̄ ± t* · s/√n = 4.21 ± 1.962 · 45.34/√1000 = (1.40, 7.02) minutes.

(e) We are 95% confident that the average actual arrival time for United flights in December is between 1.40 and 7.02 minutes later than the average scheduled arrival time.

6.111 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 5000 simulations that SE ≈ 4.80. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using s = 24.92 as an estimate for σ, we have

SE = s/√n = 24.92/√26 = 4.89

We see that the bootstrap standard error and the formula match relatively closely.
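The comparisons made in Exercises 6.111 through 6.114 can be reproduced with a short script. The sketch below is a Python illustration using a stand-in data vector (the exponential sample is an assumption, not one of the textbook datasets); any sample array can be substituted.

import numpy as np

# Sketch: bootstrap SE of the sample mean vs. the formula s/sqrt(n).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=20, size=50)        # hypothetical right-skewed sample

boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(1000)]
print(f"bootstrap SE ≈ {np.std(boot_means, ddof=1):.3f}")
print(f"formula   SE = {sample.std(ddof=1) / np.sqrt(sample.size):.3f}")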
6.112 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE ≈ 0.92. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using s = 20.72 as an estimate for σ, we have

SE = s/√n = 20.72/√500 = 0.93

We see that the bootstrap standard error and the formula match very closely.

6.113 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE ≈ 2.1. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using s = 11.11 as an estimate for σ, we have

SE = s/√n = 11.11/√25 = 2.22

We see that the bootstrap standard error and the formula match very closely.

6.114 Using StatKey or other technology to create a bootstrap distribution, we see for one set of 1000 simulations that SE ≈ 0.11. (Answers may vary slightly with other simulations.) Using the formula from the Central Limit Theorem, and using s = 0.765 as an estimate for σ, we have

SE = s/√n = 0.765/√50 = 0.108

We see that the bootstrap standard error and the formula match very closely.

6.115 We use StatKey or other technology to create a bootstrap distribution with at least 1000 simulated means from samples of the Atlanta commute distances. To find a 95% confidence interval we find the endpoints that contain 95% of the simulated means. For one set of 1000 bootstrap means we find that a 95% confidence interval for mean Atlanta commute distance goes from 16.92 to 19.34 miles.
For a 95% confidence interval with df = 499, we have t* = 1.965. Using the t-distribution and the formula for standard error, we have

18.156 ± 1.965 · 13.798/√500 = 18.156 ± 1.21 = (16.95, 19.37)
The two methods give very similar intervals.
6.116 We use StatKey or other technology to create a bootstrap distribution with at least 1000 simulated means from samples of the Mustang prices. To find a 95% confidence interval we find the endpoints that contain 95% of the simulated means. For one set of 1000 bootstrap means we find that a 95% confidence interval for mean Mustang price goes from 11.56 to 20.32 (thousand) dollars.

For a 95% confidence interval with df = 24, we have t* = 2.06. Using the t-distribution and the formula for standard error, we have

15.98 ± 2.06 · 11.11/√25 = 15.98 ± 4.58 = (11.40, 20.56)

The two methods give reasonably similar intervals.

6.117
(a) For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

Using a t-distribution with df = 314, for a 95% confidence interval, we have t* = 1.97. The confidence interval is

77.03 ± 1.97 · 33.83/√315
77.03 ± 3.76
73.27 to 80.79
We are 95% confident that the average number of grams of fat consumed per day by all US adults is between 73.27 grams and 80.79 grams.

(b) The margin of error is ±3.76 grams of fat.

(c) Since the margin of error is ±3.76 with a sample size of n = 315, we’ll definitely need a sample size larger than 315 to get the margin of error down to ±1. To see how much larger, we use the formula
for determining sample size. The margin of error we desire is ME = 1, and for 95% confidence we use z* = 1.96. We can use the sample statistic s = 33.83 as our best estimate for σ. We have:

n = (z* · σ̃/ME)² = (1.96 · 33.83/1)² = 4396.6

We round up to n = 4397. We would need to obtain data on fat consumption from a sample of 4397 or more people in order to get the margin of error down to within ±1 gram.

6.118 With a sample size of 2006, we can show that the margin of error for the original study is 0.08. If we want the smaller 0.05 margin of error, we will need a larger sample size. How much larger? We find the needed sample size using

n = (z* · σ̃/ME)²

For 99% confidence, we use z* = 2.576 and we use the standard deviation 1.4 from our earlier sample as our estimated standard deviation σ̃. The desired margin of error is ME = 0.05. Using the formula, we have

n = (z* · σ̃/ME)² = (2.576 · 1.4/0.05)² = 5202.45

Since sample size must be an integer, we round up and recommend a sample of size 5203 for a margin of error of 0.05, with 99% confidence.

6.119
(a) For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

Since the sample size is so large, we can use either a t-distribution or a standard normal distribution. Using a t-distribution with df = 119, for a 99% confidence interval, we have t* = 2.62. The confidence interval is

290 ± 2.62 · 87.6/√120
290 ± 21.0
269.0 to 311.0
We are 99% confident that the mean number of polyester microfibers entering wastewater when washing a fleece garment is between 269 and 311 per liter.

(b) The margin of error is ±21.0 microfibers per liter.

(c) Since the margin of error is ±21.0 with a sample size of n = 120, we’ll definitely need a sample size larger than 120 to get the margin of error down to ±5. To see how much larger, we use the formula for determining sample size. The margin of error we desire is ME = 5, and for 99% confidence we use z* = 2.576. We can use the sample statistic s = 87.6 as our best estimate for σ. We have:

n = (z* · σ̃/ME)² = (2.576 · 87.6/5)² = 2036.85

We round up to n = 2037. We would need to obtain at least 2037 samples of wastewater after washing fleece to get the margin of error down to ±5 particles per liter of wastewater.
6.120
(a) For a confidence interval for μ using the t-distribution, we use

x ± t* · s/√n

Since the sample size is so large, we can use either a t-distribution or a standard normal distribution. Using a t-distribution with df = 89, for a 99% confidence interval, we have t* = 2.63. The confidence interval is

18.3 ± 2.63 · 8.2/√90
18.3 ± 2.27
16.03 to 20.57

We are 99% confident that the mean number of polyester microfibers found on the world’s beaches is between 16.03 and 20.57 particles per 250 mL of sediment.

(b) The margin of error is ±2.27 microfibers per 250 mL of sediment.

(c) Since the margin of error is ±2.27 with a sample size of n = 90, we’ll need a sample size larger than 90 to get the margin of error down to ±1. To see how much larger, we use the formula for determining sample size. The margin of error we desire is ME = 1, and for 99% confidence we use z* = 2.576. We can use the sample statistic s = 8.2 as our best estimate for σ. We have:

n = (z* · σ̃/ME)² = (2.576 · 8.2/1)² = 446.2

We round up to n = 447. We would need to obtain data from 447 beach sites to get the margin of error down to ±1 particle per 250 mL of sediment.

6.121 We use z* = 1.96 for 95% confidence, and we use σ̃ = 30. For a desired margin of error of ME = 10, we have:

n = (z* · σ̃/ME)² = (1.96 · 30/10)² = 34.6

We round up to n = 35. For a desired margin of error of ME = 5, we have:

n = (z* · σ̃/ME)² = (1.96 · 30/5)² = 138.3

We round up to n = 139. For a desired margin of error of ME = 1, we have:

n = (z* · σ̃/ME)² = (1.96 · 30/1)² = 3457.4

We round up to n = 3458. We see that the sample size goes up as we require more accuracy. Or, put another way, a larger sample size gives greater accuracy.
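The sample-size formula n = (z* · σ̃/ME)² used in Exercises 6.121 through 6.123 is easy to automate. The sketch below is a Python illustration under the Exercise 6.121 assumptions (95% confidence, σ̃ = 30); the helper name n_for_mean is hypothetical, not from the text.

import math
from scipy.stats import norm

# Sketch: sample size needed to estimate a mean within a desired margin of error.
def n_for_mean(me, sigma_tilde, conf=0.95):
    z_star = norm.ppf(1 - (1 - conf) / 2)
    return math.ceil((z_star * sigma_tilde / me) ** 2)   # always round up

for me in (10, 5, 1):
    print(me, n_for_mean(me, sigma_tilde=30))            # 35, 139, 3458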
6.122 We have ME = 3 for the margin of error, and σ̃ = 30 for our estimate of the standard deviation. For 99% confidence, we use z* = 2.576 to give:

n = (z* · σ̃/ME)² = (2.576 · 30/3)² = 663.6

We round up to n = 664. For 95% confidence, we use z* = 1.96 to give:

n = (z* · σ̃/ME)² = (1.96 · 30/3)² = 384.2

We round up to n = 385. For 90% confidence, we use z* = 1.645 to give:

n = (z* · σ̃/ME)² = (1.645 · 30/3)² = 270.6

We round up to n = 271. We see that the sample size goes up as the level of confidence we want in the result goes up. Or, put another way, if we want greater confidence that the interval for a specific margin of error captures the population mean, we need to use a larger sample size.

6.123 We use z* = 1.96 for 95% confidence, and we want a margin of error of ME = 3. Using σ̃ = 100, we have:

n = (z* · σ̃/ME)² = (1.96 · 100/3)² = 4268.4

We round up to n = 4269. Using σ̃ = 50, we have:

n = (z* · σ̃/ME)² = (1.96 · 50/3)² = 1067.1

We round up to n = 1068. Using σ̃ = 10, we have:

n = (z* · σ̃/ME)² = (1.96 · 10/3)² = 42.7
We round up to n = 43. Not surprisingly, we see that the more variability there is in the underlying data, the larger sample size we need to get the accuracy we want. If the original data are very spread out, we need a large sample size to get an accurate estimate. If, however, the original data are very narrowly focused, a smaller sample size will do the trick.
6.124 Using any statistics package, we see that a 95% confidence interval is 16.62 ± 0.69, or 15.93 to 17.31. At this restaurant, the average percent added for a tip on a bill is between 15.93% and 17.31%.

6.125 Using any statistics package, we see that a 95% confidence interval is 12.79 ± 0.59, or 12.20 to 13.38. The average number of grams of fiber eaten in a day is between 12.20 grams and 13.38 grams.
Section 6.2-HT Solutions

6.126 In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 17.2 and the parameter from the null hypothesis is μ0 = 15. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (17.2 − 15)/(6.4/√40) = 2.17
This is an upper-tail test, so the p-value is the area above 2.17 in a t-distribution with df = 39. We see that the p-value is 0.0181. This p-value is relatively small, so at a 5% level, we do find evidence to support the alternative hypothesis that μ > 15.
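The one-sample t-test in Exercise 6.126 can be checked by computer as well. The sketch below is a Python illustration from the summary statistics (x = 17.2, s = 6.4, n = 40, H0: μ = 15, upper-tail alternative), not an official companion script.

import numpy as np
from scipy.stats import t

# Sketch: one-sample t-test from summary statistics (Exercise 6.126 numbers).
xbar, s, n, mu0 = 17.2, 6.4, 40, 15
t_stat = (xbar - mu0) / (s / np.sqrt(n))
p_value = t.sf(t_stat, df=n - 1)         # upper-tail area; double it for a two-tail test
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")   # t ≈ 2.17, p-value ≈ 0.018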
6.127 In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 91.7 and the parameter from the null hypothesis is μ0 = 100. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (91.7 − 100)/(12.5/√30) = −3.64
This is a lower-tail test, so the p-value is the area below −3.64 in a t-distribution with df = 29. We see that the p-value is 0.0005. This p-value is very small and below any reasonable significance level. There is strong evidence to support the alternative hypothesis that μ < 100.
6.128 In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 112.3 and the parameter from the null hypothesis is μ0 = 120. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (112.3 − 120)/(18.4/√100) = −4.18

This is a lower-tail test, so the p-value is the area below −4.18 in a t-distribution with df = 99. We see that the p-value is 0.00003, or essentially zero. This p-value is very small and below any reasonable significance level. There is strong evidence to support the alternative hypothesis that μ < 120.
6.129 This sample size is quite small, but we are told that the underlying distribution is approximately normal so we can proceed with the t-test. In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 13.2 and the parameter from the null hypothesis is μ0 = 10. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (13.2 − 10)/(8.7/√12) = 1.27
This is an upper-tail test, so the p-value is the area above 1.27 in a t-distribution with df = 11. We see that the p-value is 0.115. This p-value is larger than the significance level of 0.05 (and larger than any reasonable significance level), so we do not reject H0 and do not find sufficient evidence to support the alternative hypothesis that μ > 10.
6.130 This sample size is quite small, but we are told that the underlying distribution is approximately normal so we can proceed with the t-test. In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 4.8 and the parameter from the null hypothesis is μ0 = 4. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (4.8 − 4)/(2.3/√15) = 1.35

This is a two-tail test, so the p-value is two times the area above 1.35 in a t-distribution with df = 14. We see that the p-value is 2(0.0992) = 0.1984. This p-value is larger than any reasonable significance level, so we do not find enough evidence to support the alternative hypothesis that μ ≠ 4.
6.131 In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 432 and the parameter from the null hypothesis is μ0 = 500. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (432 − 500)/(118/√75) = −4.99

This is a two-tail test, so the p-value is two times the area below −4.99 in a t-distribution with df = 74. We see that the p-value is essentially zero. This p-value is extremely small and provides strong evidence to support the alternative hypothesis that μ ≠ 500.

6.132 The null and alternative hypotheses are

H0: μ = 634
Ha: μ ≠ 634

where μ represents the average number of social ties for a cell phone user. In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE
In this test for a mean, the sample statistic is x = 664 and the parameter from the null hypothesis is μ0 = 634. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (664 − 634)/(778/√1700) = 1.59
This is a two-tail test, so the p-value is two times the area above 1.59 in a t-distribution with df = 1699. (Since n is very large, we could just as easily use the normal distribution to estimate the p-value.) We see that the p-value is 2(0.056) = 0.112. This p-value is larger than even a 10% significance level, so we do not reject the null hypothesis. There is not sufficient evidence to conclude that US adults who are cell phone users have a different mean number of social ties than the average US adult.

6.133 The sample size of n = 7 is very small so it is important in using the t-distribution to know that the values are not heavily skewed. The mean for non-autistic male children is 1.15 billion, so the null and alternative hypotheses are

H0: μ = 1.15
Ha: μ > 1.15

where μ represents the mean number of neurons, in billions, in the prefrontal cortex for male autistic children. In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 1.94 and the parameter from the null hypothesis is μ0 = 1.15. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (1.94 − 1.15)/(0.50/√7) = 4.18
This is an upper-tail test, so the p-value is the area above 4.18 in a t-distribution with df = 6. Using technology we see that the p-value is 0.003. This p-value is very small so we reject the null hypothesis. There is strong evidence that, on average, male autistic children have an overabundance of neurons in the prefrontal cortex.

6.134 The sample size of n = 32 is large enough to justify using the t-distribution. The null and alternative hypotheses are

H0: μ = 0
Ha: μ > 0

where μ represents the mean difference in number of pigeons. In general, the standardized test statistic is

t = (Sample Statistic − Null Parameter)/SE

In this test for a mean, the sample statistic is x = 3.9 and the parameter from the null hypothesis is μ0 = 0. The standard error is SE = s/√n. The standardized test statistic is

t = (x − μ0)/(s/√n) = (3.9 − 0)/(6.8/√32) = 3.24
This is an upper-tail test, so the p-value is the area above 3.24 in a t-distribution with df = 31. Using technology, we see that the p-value is 0.0014. This p-value is very small so we reject the null hypothesis. There is strong evidence that the mean number of pigeons is higher for the neutral person, even after they switch clothes and behave the same. This suggests that pigeons can recognize faces and hold a grudge. 6.135 Let μ denote the mean hours of sleep per night for all her students. We wish to test the hypotheses H0 : μ = 8 vs Ha : μ < 8. The distribution is not exactly symmetric, but at least has no outliers, so we proceed with the t-distribution. The t-statistic is t=
6.2 − 8 √ = −3.67 1.7/ 12
The p-value from the lower tail of a t-distribution with 11 degrees of freedom is 0.0018. This is a very small p-value, providing strong evidence to conclude that the mean amount of sleep for her students is less than 8 hours.

6.136 (a) We are testing the null hypothesis that the average air pressure is at the limit of 12.5 psi, against the alternative that the average air pressure is less than 12.5. We calculate our test statistic

t = (x̄ − μ0)/(s/√n) = (11.1 − 12.5)/(0.40/√11) = −11.6

which, compared to the t-distribution with 10 degrees of freedom, results in a p-value of approximately 0. We conclude that the average air pressure in footballs used by the New England Patriots during the 2014 season was significantly less than the allowable limit of 12.5 psi.
(b) It is not fair to assume that the sample of 11 footballs from one game is representative of all footballs used by the Patriots during the 2014 season. We would need to take a random sample of 11 footballs from throughout the season to get a representative sample.

6.137 In each of parts (a), (b), and (c), the hypotheses are:

H0: μ = 278  vs  Ha: μ > 278
where μ stands for the mean cost of a house in the relevant state, in thousands of dollars.
(a) For New York, we calculate the test statistic

t = (x̄ − μ0)/(s/√n) = (365.3 − 278)/(317.8/√30) = 1.50
Using an upper-tail test with a t-distribution with 29 degrees of freedom, we get a p-value of 0.072. At a 5% level, we do not reject the null hypothesis. We do not have sufficient evidence based on this sample to conclude that the average cost of a house in New York State is significantly more than 278 thousand dollars.
(b) For New Jersey, we calculate the test statistic

t = (x̄ − μ0)/(s/√n) = (328.5 − 278)/(158.0/√30) = 1.75
Using an upper-tail test with a t-distribution with 29 degrees of freedom, we get a p-value of 0.045. We reject the null hypothesis and find sufficient evidence to conclude that the average cost of a house in New Jersey is significantly more than 278 thousand dollars.
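As a quick check on these hand calculations, the same t-statistic and upper-tail p-value can be reproduced from the summary statistics alone. This is a minimal sketch, assuming Python with scipy is available (StatKey or any other technology gives the same result); the numbers are those for New York in part (a).

from math import sqrt
from scipy import stats

# New York house prices (part a): n = 30, mean 365.3, sd 317.8, null mean 278 (thousands)
xbar, s, n, mu0 = 365.3, 317.8, 30, 278

t = (xbar - mu0) / (s / sqrt(n))        # standardized test statistic, about 1.50
p_value = stats.t.sf(t, df=n - 1)       # upper-tail area with df = 29, about 0.072
print(round(t, 2), round(p_value, 3))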
(c) Note that the sample mean for Pennsylvania, x̄ = 265.6, is less than the typical US cost, μ = 278. We don't need a statistical test to see that the average cost in Pennsylvania is not significantly greater than 278 thousand. However, if we conducted the test we would have

t = (x̄ − μ0)/(s/√n) = (265.6 − 278)/(137.1/√30) = −0.50
Using an upper-tail test with a t-distribution with 29 degrees of freedom, we get a p-value of 0.688. (Notice that we still use the upper tail, which gives a p-value in this case that is greater than 0.5.)
(d) New Jersey has the most evidence that the state average is greater than 278 thousand dollars (smallest p-value). This may surprise you, since the mean for NY homes is greater. However, the standard deviation is also important in determining evidence for or against a claim.

6.138 The t-distribution might not be appropriate, since the sample size of 10 houses is quite small and the sample appears to be right skewed with a possible high outlier. We return to the methods of Chapter 4 and use StatKey or other technology to perform a randomization test. To test H0: μ = 278 vs Ha: μ ≠ 278 we create randomization samples using the Canton data, after shifting the mean from x̄ = 141.4 by adding 136.6 to all of the prices in order to match the null mean of 278. The means for 1000 such randomization samples are shown below.
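A randomization test like the one described here can also be scripted rather than run in StatKey. The sketch below is one possible implementation, assuming Python with numpy; the Canton home prices themselves are not reproduced in this solution, so the function takes them as input rather than hard-coding any data.

import numpy as np

def randomization_test_mean(prices, mu0=278, reps=1000, seed=0):
    """Randomization test of H0: mu = mu0 vs Ha: mu != mu0 for one mean.
    Samples are drawn (with replacement) from the data after shifting it
    so that its mean matches the null value."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    shifted = prices - prices.mean() + mu0      # e.g. add 136.6 when x-bar = 141.4
    rand_means = np.array([
        rng.choice(shifted, size=len(prices), replace=True).mean()
        for _ in range(reps)
    ])
    obs_diff = abs(prices.mean() - mu0)
    return np.mean(np.abs(rand_means - mu0) >= obs_diff)   # two-tail p-value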
The mean price from the original Canton sample, x = $141.4 (thousand), is far in the tail of this randomization distribution, well below any of the randomization means. This gives a p-value ≈ 0 and provides very strong evidence that the mean home price in Canton, NY is much less than the typical national value of 278 thousand. 6.139
(a) We see that n = 30 with x = 0.25217 and s = 0.01084.
(b) The hypotheses are:

H0: μ = 0.260  vs  Ha: μ ≠ 0.260

where μ represents the mean of all team batting averages in Major League Baseball. We calculate the test statistic

t = (x̄ − μ0)/(s/√n) = (0.25217 − 0.260)/(0.01084/√30) = −3.96

We use a t-distribution with 29 degrees of freedom to see that the proportion below −3.96 is 0.00022. Since this is a two-tail test, the p-value is 2(0.00022) = 0.00044. We reject the null hypothesis and conclude that the average team batting average is different from (and less than) 0.260.
(c) The test statistic matches the computer output and the p-value is the same up to rounding off. 6.140
(a) We see that n = 53 with x = 6.591 and s = 1.288.
(b) The hypotheses are:

H0: μ = 7  vs  Ha: μ ≠ 7

where μ represents the mean pH level of all Florida lakes. We calculate the test statistic

t = (x̄ − μ0)/(s/√n) = (6.591 − 7)/(1.288/√53) = −2.31
We use a t-distribution with 52 degrees of freedom to see that the area below −2.31 is 0.0124. Since this is a two-tail test, the p-value is 2(0.0124) = 0.0248. We reject the null hypothesis at a 5% significance level and conclude that average pH of Florida lakes is different from the neutral value of 7. Florida lakes are, in general, somewhat more acidic than neutral. (c) The test statistic matches the computer output exactly and the p-value is the same up to rounding off. 6.141
(a) The hypotheses are:

H0: μ = 1.0  vs  Ha: μ < 1.0

where μ represents the mean mercury level of fish in all Florida lakes. Some computer output for the test is shown:

One-Sample T: Avg_Mercury
Test of mu = 1 vs < 1

Variable       N     Mean   StDev  SE Mean  95% Upper Bound       T      P
Avg_Mercury   53   0.5272  0.3410   0.0468           0.6056  -10.09  0.000
We see that the p-value is approximately 0, so there is strong evidence that the mean mercury level of fish in Florida lakes is less than 1.0 ppm.
(b) The hypotheses are:

H0: μ = 0.5  vs  Ha: μ < 0.5

where μ represents the mean mercury level of fish in all Florida lakes. Some computer output for the test is shown:

One-Sample T: Avg_Mercury
Test of mu = 0.5 vs < 0.5

Variable       N     Mean   StDev  SE Mean  95% Upper Bound     T      P
Avg_Mercury   53   0.5272  0.3410   0.0468           0.6056  0.58  0.718
We see that the p-value is 0.718, so there is no evidence at all that the mean is less than 0.5. (In fact, we see that the sample mean, x̄ = 0.5272 ppm, is actually more than 0.5.)

6.142 The hypotheses are:

H0: μ = 160  vs  Ha: μ < 160

where μ represents the mean number of fouls per season for NBA players who play regularly. Some computer output for the test is shown:

Descriptive Statistics
  N    Mean  StDev  SE Mean
193  151.87  54.68     3.94

Test
Null hypothesis         H0: μ = 160
Alternative hypothesis  Ha: μ < 160

T-Value  P-Value
  -2.07    0.020
We see that the p-value is 0.020, so there is strong evidence that the mean number of fouls for regular NBA players is less than 160 for a full season.
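The computer output shown for this exercise can be checked directly from the summary statistics. A minimal sketch, assuming Python with scipy:

from math import sqrt
from scipy import stats

# NBA fouls: n = 193, mean 151.87, sd 54.68, null mean 160
xbar, s, n, mu0 = 151.87, 54.68, 193, 160

t = (xbar - mu0) / (s / sqrt(n))        # about -2.07
p_value = stats.t.cdf(t, df=n - 1)      # lower-tail area, about 0.020
print(round(t, 2), round(p_value, 3))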
Section 6.3-D Solutions 6.143
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.70(0.30)/50 + 0.60(0.40)/75) = 0.086
(b) We check the sample size for Group A: nA pA = 50(0.70) = 35 and nA (1 − pA ) = 50(0.30) = 15, and for Group B: nB pB = 75(0.60) = 45 and nB (1 − pB ) = 75(0.40) = 30. In both cases, the sample size is large enough and the normal distribution applies. 6.144
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.15(0.85)/300 + 0.20(0.80)/300) = 0.031
(b) The sample sizes of 300 are large enough for the normal distribution to apply. 6.145
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.20(0.80)/100 + 0.30(0.70)/50) = 0.076
(b) We check the sample size for Group A: nA pA = 100(0.20) = 20 and nA (1 − pA ) = 100(0.80) = 80, and for Group B: nB pB = 50(0.30) = 15 and nB (1 − pB ) = 50(0.7) = 35. The sample sizes are large enough for the normal distribution to apply. 6.146
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.40(0.60)/80 + 0.10(0.90)/60) = 0.067
(b) We check the sample size for Group A: nA pA = 80(0.40) = 32 and nA (1 − pA ) = 80(0.60) = 48, and for Group B: nB pB = 60(0.10) = 6. Since nB pB < 10, the normal distribution does not apply in this case. For inference on the difference in sample proportions in this case, we should use bootstrap or randomization methods. 6.147
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.30(0.70)/40 + 0.24(0.76)/30) = 0.106
(b) We check the sample size for Group A: nA pA = 40(0.30) = 12 and nA (1 − pA ) = 40(0.70) = 28, and for Group B: nB pB = 30(0.24) = 7.2. Since nB pB < 10, the normal distribution does not apply in this case. For inference on the difference in sample proportions in this case, we should use bootstrap or randomization methods.
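The standard error formula and the sample-size check used in Exercises 6.143 to 6.148 are easy to script. This is a minimal sketch, assuming Python; the function name is ours, not from the text.

from math import sqrt

def se_diff_proportions(pA, nA, pB, nB):
    """Standard error of p-hat_A - p-hat_B, plus the normality condition check."""
    se = sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)
    counts = [nA * pA, nA * (1 - pA), nB * pB, nB * (1 - pB)]
    normal_ok = all(c >= 10 for c in counts)      # all four counts must be at least 10
    return se, normal_ok

print(se_diff_proportions(0.30, 40, 0.24, 30))    # (about 0.106, False), as in 6.147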
6.148
(a) The differences in sample proportions will have a standard error of

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) = √(0.58(0.42)/500 + 0.49(0.51)/200) = 0.042
(b) The sample sizes of 500 and 200 are large enough for the normal distribution to apply. 6.149 (a) This compares two proportions (iPhone vs Different Cell Phone) drawn from the same group (students). The methods of this section do not apply to this type of difference in proportions. (b) This compares proportions (study abroad) for two different groups (public vs private). The methods of this section are appropriate for this type of difference in proportions. (c) This compares two proportions (in-state vs out-of-state) drawn from the same group (students). The methods of this section do not apply to this type of difference in proportions. (d) This compares proportions (get financial aid) from two different groups (in-state vs out-of-state). The methods of this section are appropriate for this type of difference in proportions. 6.150 (a) This compares two proportions (one brand of cola vs the other brand) drawn from the same group (tasters). The methods of this section do not apply to this type of difference in proportions. (b) This compares the proportion who voted using two different groups (males vs females). The methods of this section are appropriate for this type of difference in proportions. (c) This compares the proportion who graduate using two different groups (athletes and non-athletes). The methods of this section are appropriate for this type of difference in proportions. (d) This compares two proportions (proportion in favor vs proportion opposed) drawn from the same group (voters). The methods of this section do not apply to this type of difference in proportions.
Section 6.3-CI Solutions

6.151 The sample sizes are both large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z* · SE

The relevant sample statistic for a confidence interval for a difference in proportions is p̂1 − p̂2 = 0.72 − 0.68. For a 95% confidence interval, we have z* = 1.96, and we use the sample proportions in computing the standard error. The confidence interval is

(p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.72 − 0.68) ± 1.96 · √(0.72(0.28)/500 + 0.68(0.32)/300)
0.04 ± 0.066
−0.026 to 0.106

The best estimate for the difference in the two proportions p1 − p2 is 0.04, the margin of error is ±0.066, and the 95% confidence interval for p1 − p2 is −0.026 to 0.106.

6.152 The sample sizes are both large enough to use the normal distribution (although just barely large enough in Group 1). For a confidence interval using the normal distribution, we use

Sample statistic ± z* · SE

The relevant sample statistic for a confidence interval for a difference in proportions is p̂1 − p̂2 = 0.20 − 0.32. For a 90% confidence interval, we have z* = 1.645, and we use the sample proportions in computing the standard error. The confidence interval is

(p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.20 − 0.32) ± 1.645 · √(0.20(0.80)/50 + 0.32(0.68)/100)
−0.12 ± 0.121
−0.241 to 0.001

The best estimate for the difference in the two proportions p1 − p2 is −0.12, the margin of error is ±0.121, and the 90% confidence interval for p1 − p2 is −0.241 to 0.001.

6.153 The sample sizes are both large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z* · SE

The relevant sample statistic for a confidence interval for a difference in proportions is p̂1 − p̂2 = 114/150 − 135/150 = 0.76 − 0.90. For a 99% confidence interval, we have z* = 2.576, and we use the sample proportions in computing the standard error. The confidence interval is
(p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.76 − 0.90) ± 2.576 · √(0.76(0.24)/150 + 0.90(0.10)/150)
−0.14 ± 0.110
−0.25 to −0.03
The best estimate for the difference in the two proportions p1 − p2 is −0.14, the margin of error is ±0.11, and the 99% confidence interval for p1 − p2 is −0.25 to −0.03.

6.154 The sample sizes are both large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z* · SE

The relevant sample statistic for a confidence interval for a difference in proportions is p̂1 − p̂2 = 240/500 − 450/1000 = 0.48 − 0.45. For a 95% confidence interval, we have z* = 1.96, and we use the sample proportions in computing the standard error. The confidence interval is

(p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.48 − 0.45) ± 1.96 · √(0.48(0.52)/500 + 0.45(0.55)/1000)
0.03 ± 0.054
−0.024 to 0.084
The best estimate for the difference in the two proportions p1 − p2 is 0.03, the margin of error is ±0.054, and the 95% confidence interval for p1 − p2 is −0.024 to 0.084.

6.155 Using I for the Internet users and N for the non-Internet users, we see that

p̂I = 807/1754 = 0.46  and  p̂N = 130/483 = 0.27

In the sample, the Internet users are more trusting. We estimate the difference in proportions pI − pN. The relevant sample statistic to estimate this difference is p̂I − p̂N = 0.46 − 0.27. For a 90% confidence interval, we have z* = 1.645. The confidence interval is

Sample statistic ± z* · SE
(p̂I − p̂N) ± z* · √(p̂I(1 − p̂I)/nI + p̂N(1 − p̂N)/nN)
(0.46 − 0.27) ± 1.645 · √(0.46(0.54)/1754 + 0.27(0.73)/483)
0.19 ± 0.039
0.151 to 0.229
We are 90% confident that the proportion of Internet users who agree that most people can be trusted is between 0.151 and 0.229 higher than the proportion of people who do not use the Internet who agree with that statement.
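Intervals like this one can be reproduced directly from the counts. This is a minimal sketch, assuming Python with scipy, using the Exercise 6.155 data:

from math import sqrt
from scipy.stats import norm

x1, n1 = 807, 1754      # Internet users who agree most people can be trusted
x2, n2 = 130, 483       # non-users who agree
p1, p2 = x1 / n1, x2 / n2

z_star = norm.ppf(0.95)                                   # about 1.645 for 90% confidence
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
moe = z_star * se
print(round(p1 - p2 - moe, 3), round(p1 - p2 + moe, 3))   # roughly 0.152 to 0.229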
6.156 (a) We have p̂1 = 26/38 = 0.684 and p̂2 = 16/38 = 0.421, so the difference in proportions is p̂1 − p̂2 = 0.684 − 0.421 = 0.263.
(b) The sample sizes are both large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z* · SE

The relevant sample statistic for a confidence interval for a difference in proportions is p̂1 − p̂2 = 0.684 − 0.421 = 0.263. For a 99% confidence interval, we have z* = 2.576, and we use the sample proportions in computing the standard error. The confidence interval is

(p̂1 − p̂2) ± z* · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.684 − 0.421) ± 2.576 · √(0.684(1 − 0.684)/38 + 0.421(1 − 0.421)/38)
0.263 ± 2.576 · (0.110)
0.263 ± 0.283
−0.020 to 0.546
The 99% confidence interval for the difference in proportions is −0.020 to 0.546.
(c) The best estimate for the difference in the two proportions p1 − p2 is the sample statistic p̂1 − p̂2 = 0.263 and the margin of error is ±0.283.
(d) No, since the confidence interval includes the possibility of no difference, zero, we cannot be confident at the 99% level that the new drug is better.

6.157 Letting p̂e and p̂w represent the proportion of errors in electronic and written prescriptions, respectively, we have

p̂e = 254/3848 = 0.066  and  p̂w = 1478/3848 = 0.384

The sample sizes are both very large, so it is reasonable to use a normal distribution. For 95% confidence the standard normal endpoint is z* = 1.96. This gives

(p̂e − p̂w) ± z* · √(p̂e(1 − p̂e)/ne + p̂w(1 − p̂w)/nw)
(0.066 − 0.384) ± 1.96 · √(0.066(1 − 0.066)/3848 + 0.384(1 − 0.384)/3848)
−0.318 ± 0.017
−0.335 to −0.301
The margin of error is very small because the sample size is so large. Note that if we had subtracted the other way to find a confidence interval for pw − pe , the interval would be 0.301 to 0.335, with the same interpretation. In each case, we are 95% sure that the error rate is between 0.335 and 0.301 less for electronic prescriptions. This is a very big change! Since zero is not in this interval, it is not plausible that there is no difference. We can be confident that there are fewer errors with electronic prescriptions. 6.158 (a) We have p̂f = 726/1423 = 0.510 and p̂m = 505/1329 = 0.380, so the difference in proportions is p̂f − p̂m = 0.510 − 0.380 = 0.130.
(b) The sample sizes are both very large, so it is reasonable to use a normal distribution. For 95% confidence the standard normal endpoint is z* = 1.96. This gives

Statistic ± z* · SE
(p̂f − p̂m) ± z* · √(p̂f(1 − p̂f)/nf + p̂m(1 − p̂m)/nm)
(0.510 − 0.380) ± 1.96 · √(0.510(1 − 0.510)/1423 + 0.380(1 − 0.380)/1329)
0.130 ± 0.037
0.093 to 0.167
We are 95% sure that the proportion of females visiting the public library is between 0.093 and 0.167 higher than the proportion of males visiting the public library.
(c) Because 0 (no difference) is not in this interval, we can be confident (at the 5% level) of a difference. Females are more likely to visit the public library.

6.159 We have p̂C = 421/832 = 0.506 and p̂N = 810/1920 = 0.422. The sample sizes are both very large, so it is reasonable to use a normal distribution. For 90% confidence the standard normal endpoint is z* = 1.645. This gives

Statistic ± z* · SE
(p̂C − p̂N) ± z* · √(p̂C(1 − p̂C)/nC + p̂N(1 − p̂N)/nN)
(0.506 − 0.422) ± 1.645 · √(0.506(1 − 0.506)/832 + 0.422(1 − 0.422)/1920)
0.084 ± 0.034
0.050 to 0.118
We are 90% sure that the proportion of people with children visiting the public library is between 0.050 and 0.118 higher than the proportion of people without children visiting the public library.

6.160 We have p̂M = 10/50 = 0.20 and p̂E = 18/50 = 0.36. The 90% confidence interval for pM − pE is

(p̂M − p̂E) ± z* · √(p̂M(1 − p̂M)/nM + p̂E(1 − p̂E)/nE)
(0.20 − 0.36) ± 1.645 · √(0.20(0.80)/50 + 0.36(0.64)/50)
−0.16 ± 0.145
−0.305 to −0.015
We are 90% sure that the survival rate for metal tagged penguins is between 0.305 and 0.015 less than for electronic tagged penguins. This shows a significant difference at a 10% level.
6.161 We have p̂M = 0.32 and p̂E = 0.44. The 95% confidence interval is

(p̂M − p̂E) ± z* · √(p̂M(1 − p̂M)/nM + p̂E(1 − p̂E)/nE)
(0.32 − 0.44) ± 1.96 · √(0.32(0.68)/122 + 0.44(0.56)/160)
−0.12 ± 0.11
−0.23 to −0.01
We are 95% sure that breeding success is between 0.23 and 0.01 less for metal tagged penguins. This shows a significant difference at a 5% level. 6.162 Using StatKey or other technology to create a bootstrap distribution of the differences in sample proportions, we see for one set of 1000 simulations that SE = 0.052. (Answers may vary slightly with other simulations.)
Using the formula with p̂A = 30/100 = 0.30 and p̂B = 50/250 = 0.20 as estimates for the two population proportions, we have

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) ≈ √(0.30(0.70)/100 + 0.20(0.80)/250) = 0.052
We see that the bootstrap standard error and the formula match very closely. 6.163 Using StatKey or other technology to create a bootstrap distribution of the differences in sample proportions, we see for one set of 1000 simulations that SE = 0.048. (Answers may vary slightly with other simulations.)
Using the formula with p̂A = 90/120 = 0.75 and p̂B = 180/300 = 0.60 as estimates for the two population proportions, we have

SE = √(pA(1 − pA)/nA + pB(1 − pB)/nB) ≈ √(0.75(0.25)/120 + 0.60(0.40)/300) = 0.049

We see that the bootstrap standard error and the formula match very closely.

6.164 We use StatKey or other technology to create a bootstrap distribution with at least 1000 simulated differences in proportion. We find the endpoints that contain 95% of the simulated statistics and see that this 95% confidence interval is 0.120 to 0.179.
Using the normal distribution and the formula for standard error, we have

(0.87 − 0.72) ± 1.96 · √(0.87(0.13)/800 + 0.72(0.28)/2252) = 0.15 ± 0.030 = (0.12, 0.18)
The two methods give very similar confidence intervals. 6.165 We use StatKey or other technology to create a bootstrap distribution with at least 1000 simulated differences in proportion. We find the endpoints that contain 95% of the simulated statistics and see that this 95% confidence interval is 0.160 to 0.266.
Using the normal distribution and the formula for standard error, we have

(0.82 − 0.61) ± 1.96 · √(0.82(0.18)/460 + 0.61(0.39)/520) = 0.21 ± 0.055 = (0.155, 0.265)

The two methods give very similar results.

6.166 We find the proportion who die for those with an infection, p̂I, and those without, p̂N. Using technology, we have p̂I = 0.286 with nI = 84 and p̂N = 0.138 with nN = 116. Also using technology, we find the 95% confidence interval for pI − pN to be 0.033 to 0.263. The proportion who die is between 0.033 and 0.263 higher for those with an infection at admission.

6.167 We find the proportion who die for males, p̂m, and females, p̂f. Using technology, we have p̂m = 0.194 with nm = 124 and p̂f = 0.211 with nf = 76. Also using technology, we find the 95% confidence interval for pm − pf to be −0.132 to 0.098. The proportion who die is between 0.132 lower and 0.098 higher for males than it is for females.
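The bootstrap intervals in Exercises 6.162 to 6.167 are produced with StatKey in the text; the same idea can be simulated directly. A minimal sketch, assuming Python with numpy; the function name is ours, and the counts shown are the Exercise 6.162 sample counts.

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci_diff_props(x1, n1, x2, n2, reps=5000, level=0.95):
    """Percentile bootstrap CI for p1 - p2 from success counts x1/n1 and x2/n2."""
    g1 = np.repeat([1, 0], [x1, n1 - x1])       # 1 = success, 0 = failure
    g2 = np.repeat([1, 0], [x2, n2 - x2])
    diffs = np.array([
        rng.choice(g1, n1, replace=True).mean() - rng.choice(g2, n2, replace=True).mean()
        for _ in range(reps)
    ])
    alpha = 1 - level
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

print(bootstrap_ci_diff_props(30, 100, 50, 250))   # counts from Exercise 6.162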
Section 6.3-HT Solutions
6.168 (a) For Group 1, the proportion who voted is p̂1 = 45/70 = 0.643. For Group 2, the proportion who voted is p̂2 = 56/100 = 0.56. For the pooled proportion, we combine the two groups and look at the proportion who voted. The combined group has 70 + 100 = 170 people in it, and 45 + 56 = 101 of them voted, so the pooled proportion is p̂ = 101/170 = 0.594.
(b) We are testing for a difference in proportions, so we have H0: p1 = p2 vs Ha: p1 ≠ p2. The sample sizes are large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂1 − p̂2 and the parameter from the null hypothesis is 0, since we have H0: p1 − p2 = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂1 − p̂2) / √(p̂(1 − p̂)/n1 + p̂(1 − p̂)/n2) = (0.643 − 0.56) / √(0.594(0.406)/70 + 0.594(0.406)/100) = 1.08
This is a two-tail test, so the p-value is two times the area above 1.08 in a standard normal distribution. Using technology or a table, we see that the p-value is 2(0.140) = 0.280. This p-value is quite large so we do not find evidence of any difference between the two groups in the proportion who voted.
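The pooled two-proportion z-test in this section can also be run in a few lines. This is a minimal sketch, assuming Python with scipy, using the Exercise 6.168 counts:

from math import sqrt
from scipy.stats import norm

x1, n1, x2, n2 = 45, 70, 56, 100        # voters in Group 1 and Group 2
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                      # about 1.08
p_value = 2 * norm.sf(abs(z))           # two-tail p-value, about 0.28
print(round(z, 2), round(p_value, 3))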
6.169 (a) For Group A, the proportion who survive is p̂A = 63/82 = 0.768. For Group B, the proportion who survive is p̂B = 31/67 = 0.463. For the pooled proportion, we combine the two groups and look at the overall proportion who survived. The combined group has 82 + 67 = 149 people in it, and 63 + 31 = 94 of these people survived, so the pooled proportion is p̂ = 94/149 = 0.631.
(b) This is a test for a difference in proportions, and we have H0: pA = pB vs Ha: pA > pB. The sample sizes are large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂A − p̂B and the parameter from the null hypothesis is 0, since we have H0: pA − pB = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂A − p̂B) / √(p̂(1 − p̂)/nA + p̂(1 − p̂)/nB) = (0.768 − 0.463) / √(0.631(0.369)/82 + 0.631(0.369)/67) = 3.84
This is an upper-tail test, so the p-value is the area above 3.84 in a standard normal distribution. We see that the p-value is essentially zero. The p-value is very small so we find strong evidence that Treatment A is significantly better.
6.170 (a) We'll use subscripts Y for "yes, these people have the genetic marker" and N for "no, these people do not have the genetic marker." For people with the genetic marker, the proportion who have had depression is p̂Y = 0.38. For people without this specific genetic marker, the proportion who have had depression is p̂N = 0.12. For the pooled proportion, we need to know not just the proportions, but how many people in each group have had depression. We see that 0.38 · 42 = 16 people with the genetic marker have had depression and 0.12 · 758 = 91 people without the genetic marker have had depression. We combine the two groups and compute the proportion who have had depression. The combined group has 42 + 758 = 800 people in it, and 16 + 91 = 107 of them have had depression. The pooled proportion is p̂ = 107/800 = 0.134.
(b) This is a test for a difference in proportions, and we have H0: pY = pN vs Ha: pY > pN. The sample sizes are (just barely for the first group) large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂Y − p̂N and the parameter from the null hypothesis is 0, since we have H0: pY − pN = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂Y − p̂N) / √(p̂(1 − p̂)/nY + p̂(1 − p̂)/nN) = (0.38 − 0.12) / √(0.134(0.866)/42 + 0.134(0.866)/758) = 4.82
This is an upper-tail test, so the p-value is the area above 4.81 in a standard normal distribution. The test statistic 4.81 is almost five standard deviations above the mean, and the area beyond that is minuscule. The p-value is essentially zero, and we find very strong evidence that people with this specific genetic marker are more likely to suffer from clinical depression. 6.171 (a) For males, the proportion who plan to vote yes is p̂m = 0.24. For females, the proportion who plan to vote yes is p̂f = 0.32. For the pooled proportion, we need to know not just the proportions, but how many males and females in the samples plan to vote yes. We see that 0.24 · 50 = 12 males plan to vote yes and 0.32 · 50 = 16 females plan to vote yes in the samples. We combine the two groups and look at the proportion who plan to vote yes. The combined group has 50 + 50 = 100 people in it, and 12 + 16 = 28 of them plan to vote yes, so the pooled proportion is p̂ = 28/100 = 0.28.
(b) This is a test for a difference in proportions, and we have H0: pm = pf vs Ha: pm < pf. The sample sizes are (just barely) large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂m − p̂f and the parameter from the null hypothesis is 0, since we have H0: pm − pf = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂m − p̂f) / √(p̂(1 − p̂)/nm + p̂(1 − p̂)/nf) = (0.24 − 0.32) / √(0.28(0.72)/50 + 0.28(0.72)/50) = −0.89
This is a lower-tail test, so the p-value is the area below −0.89 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.187. This p-value is quite large so we do not find evidence of any difference between males and females in the proportion who support the initiative. Note that it is likely that this lack of evidence is partly due to the small sample sizes. Further investigation with larger sample sizes might likely show evidence of a difference.
6.172 (a) For Airline A, the proportion arriving late is p̂A = 151/700 = 0.216. For Airline B, the proportion arriving late is p̂B = 87/500 = 0.174. For the pooled proportion, we combine the two groups and look at the proportion arriving late for the combined group. The combined group has 700 + 500 = 1200 flights in it, and 151 + 87 = 238 of them arrived late, so the pooled proportion is p̂ = 238/1200 = 0.198.
(b) This is a test for a difference in proportions, and we have H0: pA = pB vs Ha: pA ≠ pB. The sample sizes are very large and we can use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂A − p̂B and the parameter from the null hypothesis is 0, since we have H0: pA − pB = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂A − p̂B) / √(p̂(1 − p̂)/nA + p̂(1 − p̂)/nB) = (0.216 − 0.174) / √(0.198(0.802)/700 + 0.198(0.802)/500) = 1.80
This is a two-tail test, so the p-value is two times the area above 1.80 in a standard normal distribution. Using technology or a table, we see that the p-value is 2(0.0359) = 0.072. The p-value is larger than
0.05, so the results are significant at a 10% level but not at a 5% level. At a 5% level, we do not find evidence of a difference between the airlines in the proportion of flights that are late.
6.173 (a) For the treatment group, the proportion with pain relief is p̂T = 36/75 = 0.48. For the control (placebo) group, the proportion with pain relief is p̂C = 21/75 = 0.28. For the pooled proportion, we combine the two groups and look at the proportion with pain relief for the combined group. The combined group has 75 + 75 = 150 patients in it, and 36 + 21 = 57 of them had some pain relief, so the pooled proportion is p̂ = 57/150 = 0.38.
(b) This is a test for a difference in proportions, and we have H0: pT = pC vs Ha: pT > pC. The sample sizes are large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂T − p̂C and the parameter from the null hypothesis is 0, since we have H0: pT − pC = 0. The standard error uses the pooled proportion. The standardized test statistic is

z = (p̂T − p̂C) / √(p̂(1 − p̂)/nT + p̂(1 − p̂)/nC) = (0.48 − 0.28) / √(0.38(0.62)/75 + 0.38(0.62)/75) = 2.52
This is an upper-tail test, so the p-value is the area above 2.52 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.006. The p-value is very small and gives strong evidence that the treatment is better than a placebo at relieving pain.
6.174 The hypotheses are

H0: p1 = p2  vs  Ha: p1 > p2

where p1 represents the proportion of babies imitating an adult considered reliable and p2 represents the proportion of babies imitating an adult considered unreliable. There are at least 10 in each group (just barely) so we may use the normal distribution. We compute the sample proportions and the pooled sample proportion:

p̂1 = 18/30 = 0.60    p̂2 = 10/30 = 0.333    p̂ = 28/60 = 0.467

The standardized test statistic is

z = (Sample statistic − Null parameter)/SE = (0.60 − 0.333 − 0) / √(0.467(0.533)/30 + 0.467(0.533)/30) = 2.073

This is an upper-tail test, so the p-value is the area above 2.073 in a standard normal distribution. Using technology we see that the p-value is 0.0191. This p-value is less than the 0.05 significance level, so we reject H0. There is evidence that babies are more likely to imitate an adult who they believe is reliable. (And this effect is seen after one instance of being unreliable. Imagine the impact of repeated instances over time. It pays to be trustworthy!)

6.175 This is a test for a difference in proportions. Using p1 for the proportion of times hiding rats choose an opaque box and p2 for the proportion of times seeking rats choose an opaque box, the hypotheses are:

H0: p1 = p2  vs  Ha: p1 > p2
We see that p̂1 = 38/53 = 0.717 and p̂2 = 14/31 = 0.452, so the sample statistic for the difference in proportions is

p̂1 − p̂2 = 0.717 − 0.452 = 0.265

For the standard error, we use the pooled proportion p̂ = 52/84 = 0.619. The standard error is given by:

SE = √(0.619(1 − 0.619)/53 + 0.619(1 − 0.619)/31) = 0.110

The test statistic is:

z = (Statistic − Null value)/SE = (0.265 − 0)/0.110 = 2.41

This is a right-tail test, and we use the normal distribution for a difference in proportions test. We see that the area in the right tail of a normal distribution beyond 2.41 is 0.008, so we have

p-value = 0.008

This is a small p-value, so we reject H0 and have strong evidence that rats are more likely to choose the opaque box when hiding than when seeking. They do appear to understand the rules of the game!

6.176 We test the hypotheses H0: p1 = p2 vs Ha: p1 ≠ p2, where p1 and p2 are the proportion of Pennsylvania seniors (among all who did the Census in School survey) choosing mathematics and statistics
as their favorite subject for the first and second half of the decade. From the sample data we see that p̂1 = 39/186 = 0.2097 and p̂2 = 69/268 = 0.2575. The pooled proportion is p̂ = (39 + 69)/(186 + 268) = 108/454 = 0.2379. We compute a z-statistic

z = ((p̂1 − p̂2) − 0) / √(p̂(1 − p̂)(1/n1 + 1/n2)) = (0.2097 − 0.2575) / √(0.2379(1 − 0.2379)(1/186 + 1/268)) = −1.176
Using two tails of a normal distribution, we find the p-value is 0.240. This is not a small p-value, so we don't have enough evidence to conclude there is a difference in the proportion of Pennsylvania seniors choosing mathematics and statistics as a favorite subject between the first and second halves of this decade.

6.177 This is a test for a difference in proportions. Using T for the treatment group (taking the daily low-dose aspirin) and C for the control group (taking a placebo), the hypotheses are H0: pT = pC vs Ha: pT < pC. The sample sizes are very large (large enough to get more than 10 heart attacks in each group), so we can use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE

In this test for a difference in proportions, the sample statistic is p̂T − p̂C and the parameter from the null hypothesis is 0, since we have H0: pT − pC = 0. The standard error uses the pooled proportion. The three relevant proportions are:

p̂T = 104/11,037 = 0.0094
p̂C = 189/11,034 = 0.0171
Pooled proportion = p̂ = (104 + 189)/(11,037 + 11,034) = 293/22,071 = 0.0133

The standardized test statistic is

z = (p̂T − p̂C) / √(p̂(1 − p̂)/nT + p̂(1 − p̂)/nC) = (0.0094 − 0.0171) / √(0.0133(0.9867)/11,037 + 0.0133(0.9867)/11,034) = −4.99
This is a lower-tail test, so the p-value is the area below −4.99 in a standard normal distribution. The p-value is essentially zero. The p-value is extremely small and gives very strong evidence that taking a daily low-dose aspirin reduces the risk of having a heart attack. (Indeed, the results are so strong that the study was stopped early in order to let others know of this low-cost and very effective treatment, used broadly today.) We can infer a causal relationship from the results because the data come from a randomized experiment.

6.178 The sample statistics are p̂o = 320/500 = 0.64 and p̂c = 300/500 = 0.60. The pooled proportion is p̂ = 620/1000 = 0.62. The standardized test statistic is

z = (Statistic − Null value)/SE = ((0.64 − 0.60) − 0) / √(0.62(0.38)/500 + 0.62(0.38)/500) = 1.303
We find the p-value as the proportion of the standard normal distribution beyond 1.303 in the right tail, yielding a p-value of 0.096. This p-value is not significant at a 5% level, so we do not reject H0. After 15 days, we don't see convincing evidence that the proportion of fruit flies alive after eating organic raisins is higher than for those eating conventional raisins.

6.179 The sample statistics are p̂o = 345/500 = 0.69 and p̂c = 320/500 = 0.64. The pooled proportion is p̂ = 665/1000 = 0.665. The standardized test statistic is

z = (Statistic − Null value)/SE = ((0.69 − 0.64) − 0) / √(0.665(0.335)/500 + 0.665(0.335)/500) = 1.675
We find the p-value as the proportion of the standard normal distribution beyond 1.675 in the right tail, yielding a p-value of 0.047. This p-value is just below the 5% level, so we can reject H0. After 15 days, we have enough evidence to conclude that the proportion alive for fruit flies that eat organic bananas is higher than for fruit flies that eat conventional bananas.

6.180 The sample statistics are p̂o = 275/500 = 0.55 and p̂c = 170/500 = 0.34. The pooled proportion is p̂ = 445/1000 = 0.445. The standardized test statistic is

z = (Statistic − Null value)/SE = ((0.55 − 0.34) − 0) / √(0.445(0.555)/500 + 0.445(0.555)/500) = 6.681
We find the p-value as the proportion of the standard normal distribution beyond 6.681 in the right tail, yielding a p-value of 0.000. (Recall that the z test statistic is a z-score, and a z-score of 6.681 is very far out in the tail! We don't even really need a standard normal distribution to know that the p-value here is going to be very close to zero.) We reject H0 and find very strong evidence that, after 20 days, the proportion alive of fruit flies that eat organic raisins is significantly higher than the proportion alive that eat conventional raisins.

6.181 The sample statistics are p̂o = 250/500 = 0.50 and p̂c = 130/500 = 0.260. The pooled proportion is p̂ = 380/1000 = 0.380. The standardized test statistic is

z = (Statistic − Null value)/SE = ((0.50 − 0.26) − 0) / √(0.38(0.62)/500 + 0.38(0.62)/500) = 7.818
We find the p-value as the proportion of the standard normal distribution beyond 7.818 in the right tail, yielding a p-value of 0.000. (Recall that the z test statistic is a z-score, and a z-score of 7.818 is very far out in the tail! We don't even really need a standard normal distribution to know that the p-value here is going to be very close to zero.) We reject H0 and find very strong evidence that, after 20 days, the proportion alive of fruit flies that eat organic potatoes is significantly higher than the proportion alive that eat conventional potatoes.

6.182 (a) This is a test for a difference in proportions. Using A for the children with autism and C for the control group, the hypotheses are H0: pA = pC vs Ha: pA > pC, where pA and pC are the proportions of autistic and non-autistic children, respectively, whose mothers used antidepressant drugs. The sample sizes are large enough to use the normal distribution. In general, the standardized test statistic is

z = (Sample Statistic − Null Parameter) / SE
In this test for a difference in proportions, the sample statistic is p̂A − p̂C and the parameter from the null hypothesis is 0, since we have H0: pA − pC = 0. The standard error uses the pooled proportion. The three relevant proportions are:

p̂A = 20/298 = 0.067
p̂C = 50/1507 = 0.033
Pooled proportion = p̂ = (20 + 50)/(298 + 1507) = 70/1805 = 0.039

The standardized test statistic is

z = (p̂A − p̂C) / √(p̂(1 − p̂)/nA + p̂(1 − p̂)/nC) = (0.067 − 0.033) / √(0.039(0.961)/298 + 0.039(0.961)/1507) = 2.77
This is an upper-tail test, so the p-value is the area above 2.77 in a standard normal distribution. Using technology or a table, we see that the p-value is 0.003. The p-value is very small and gives strong evidence that autism rates are higher in children exposed prenatally to antidepressant drugs.
(b) No, we cannot conclude that prenatal exposure to these drugs increases the risk of autism. The study describes an observational study (not an experiment) so we cannot conclude that there is a causal relationship. There are many possible confounding variables.
(c) One possible confounding variable in the study is the mental health of the mother. The authors of the study acknowledge and try to account for this confounding variable by conducting an additional test, with the results described in the sentence.

6.183 Since the number in the control group that solved the problem is only 4 (which is definitely less than 10) and only 8 in the electrical stimulation group did not solve it, the sample size is too small to use the normal distribution for this test. A randomization test is more appropriate for this data.

6.184 We cannot use the normal distribution and a standardized test statistic to conduct this test! Notice that two of the numbers in the cells are less than 10, so these data do not satisfy the conditions of the Central Limit Theorem for proportions. Luckily, randomization tests for experiments do not have any assumptions, so we conduct the test using a randomization test. Using pT to denote the proportion of men with PIN lesions who get prostate cancer after taking green tea extract for a year and pC to denote the proportion of men with PIN lesions who get prostate cancer after taking a placebo for a year, we have as our hypotheses:

H0: pT = pC  vs  Ha: pT < pC
Using StatKey or other appropriate technology, we conduct a randomization test for this difference in proportions using the data in the table where the original difference is p̂T − p̂C = 1/30 − 9/30 = −0.267. The dotplot below shows results for 10,000 differences in proportions, generated by re-assigning the “Green Tea” and “Placebo” groups at random to the cancer results.
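The randomization test itself can be scripted rather than run in StatKey. A minimal sketch, assuming Python with numpy, using the green tea count (1 of 30 with cancer) and placebo count (9 of 30) from the table:

import numpy as np

rng = np.random.default_rng(0)

treatment = np.repeat([1, 0], [1, 29])      # 1 = developed prostate cancer
control = np.repeat([1, 0], [9, 21])
obs_diff = treatment.mean() - control.mean()            # -0.267

combined = np.concatenate([treatment, control])
diffs = []
for _ in range(10_000):
    rng.shuffle(combined)                               # re-assign groups at random
    diffs.append(combined[:30].mean() - combined[30:].mean())

p_value = np.mean(np.array(diffs) <= obs_diff)          # lower-tail test (Ha: pT < pC)
print(p_value)                                          # around 0.006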
The p-value is very small, 0.006 from these randomizations, so we reject the null hypothesis and find strong evidence that green tea extract offers some benefit against prostate cancer.

6.185 (a) Let po and pu be the proportion of Oocyst and unexposed mosquitoes, respectively, that approach a human. The sample statistic is

p̂o − p̂u = 20/113 − 36/117 = 0.177 − 0.308 = −0.131

The pooled proportion is

p̂ = (20 + 36)/(113 + 117) = 0.2435

The standard error is

SE = √(p̂(1 − p̂)/n1 + p̂(1 − p̂)/n2) = √(0.243(1 − 0.243)/113 + 0.243(1 − 0.243)/117) = 0.057

The z-statistic then is

z = (statistic − null)/SE = (−0.131 − 0)/0.057 = −2.30
(b) Since we are testing H0: po = pu vs Ha: po < pu, we look in the lower tail of a standard normal distribution below z = −2.3 to find the p-value of 0.011.
(c) Let ps and pu be the proportion of Sporozoite and unexposed mosquitoes, respectively, that approach a human. The sample statistic is

p̂s − p̂u = 37/149 − 14/144 = 0.248 − 0.097 = 0.151

The pooled proportion is

p̂ = (37 + 14)/(149 + 144) = 0.174

The standard error is

SE = √(p̂(1 − p̂)/n1 + p̂(1 − p̂)/n2) = √(0.174(1 − 0.174)/149 + 0.174(1 − 0.174)/144) = 0.044

The z-statistic then is

z = (statistic − null)/SE = (0.151 − 0)/0.044 = 3.43
(d) Since we are testing H0: ps = pu vs Ha: ps > pu, we look in the upper tail of a standard normal distribution above z = 3.43 to find the p-value of 0.0003.
(e) We have enough evidence to conclude that in the Oocyst stage mosquitoes exposed to malaria approach the human less often than mosquitoes not exposed to malaria and in the Sporozoite stage mosquitoes exposed to malaria approach the human more often than mosquitoes not exposed to malaria.
(f) Yes, we can conclude that being exposed to malaria (as opposed to not being exposed to malaria) causes these behavior changes in mosquitoes, because this was a randomized experiment (it was randomly determined which mosquitoes ate from the mouse infected with malaria and which mosquitoes ate from the non-infected mouse).

6.186 These are large sample sizes so the normal distribution is appropriate. The sample proportions are p̂H = 164/8506 = 0.0193 and p̂P = 122/8102 = 0.0151 in the HRT group and the placebo group, respectively. The pooled proportion is

p̂ = (164 + 122)/(8506 + 8102) = 0.0172

The test statistic is

z = (0.0193 − 0.0151) / √(0.0172(1 − 0.0172)/8506 + 0.0172(1 − 0.0172)/8102) = 0.0042/0.002 = 2.09
Answers may vary slightly due to roundoff. The area in the upper tail of the standard normal distribution is 0.018, which we double to find the p-value of 0.036. Because this was a randomized experiment, we can conclude causality. There is evidence that HRT significantly increases risk of cardiovascular disease.

6.187 These are large sample sizes so the normal distribution is appropriate. The sample proportions are p̂H = 166/8506 = 0.0195 and p̂P = 124/8102 = 0.0153 in the HRT group and the placebo group, respectively. The pooled proportion is

p̂ = (166 + 124)/(8506 + 8102) = 0.0175

The test statistic is

z = (0.0195 − 0.0153) / √(0.0175(1 − 0.0175)/8506 + 0.0175(1 − 0.0175)/8102) = 0.0042/0.002 = 2.07
Answers may vary slightly due to roundoff. The area in the upper tail of the standard normal distribution is 0.019, which we double to find the p-value of 0.038. Because this was a randomized experiment, we can conclude causality. There is evidence that HRT significantly increases risk of invasive breast cancer.

6.188 These are large sample sizes so the normal distribution is appropriate. The sample proportions are p̂H = 502/8506 = 0.0590 and p̂P = 458/8102 = 0.0565 in the HRT group and the placebo group, respectively. The pooled proportion is

p̂ = (502 + 458)/(8506 + 8102) = 0.0578

The test statistic is

z = (0.0590 − 0.0565) / √(0.0578(1 − 0.0578)/8506 + 0.0578(1 − 0.0578)/8102) = 0.0025/0.0036 = 0.69
The area in the upper tail of the standard normal distribution is 0.245, which we double to find the p-value of 0.490. There is not evidence that HRT influences the chance of getting cancer in general.
6.189 These are large sample sizes so the normal distribution is appropriate. The sample proportions are p̂H = 650/8506 = 0.076 and p̂P = 788/8102 = 0.097 for the HRT group and the placebo group, respectively. The pooled proportion is

p̂ = (650 + 788)/(8506 + 8102) = 0.087

The test statistic is

z = (0.076 − 0.097) / √(0.087(1 − 0.087)/8506 + 0.087(1 − 0.087)/8102) = −0.021/0.0044 = −4.77

The area in the lower tail of the standard normal distribution below −4.77 is very small, so even after doubling we have p-value ≈ 0. Because this was a randomized experiment, we can conclude causality. There is strong evidence that HRT significantly decreases risk of fractures.

6.190 The hypotheses are H0: pm = pf vs Ha: pm ≠ pf, where pm and pf are the proportions with an infection for males and females, respectively. Using technology, we see p̂m = 0.411 with nm = 124 and p̂f = 0.434 with nf = 76. Also using technology, we find that the standardized z-statistic is z = −0.32 and the p-value for this two-tailed test is 0.750. There is not sufficient evidence to find a difference between males and females in infection rate.

6.191 The hypotheses are H0: pm = pf vs Ha: pm ≠ pf, where pm and pf are the proportions having surgery for males and females, respectively. Using technology, we see p̂m = 0.565 with nm = 124 and p̂f = 0.487 with nf = 76. Also using technology, we find that the standardized z-statistic is z = 1.07 and the p-value for this two-tailed test is 0.285. There is not sufficient evidence to find a difference between males and females in the proportion having surgery.

6.192
(a) The two-way table for HealthBinary and Organic is given below:
                       Health: Very Good or Excellent   Health: Poor, Fair, or Good
Bought Organic                                    786                           925
Did Not Buy Organic                               991                          2014
(b) The difference in proportions is

786/(786 + 925) − 991/(991 + 2014) = 0.459 − 0.330 = 0.129

(c) A hypothesis test for a difference in proportions yields a z-statistic that is larger than 8 and an extremely low p-value (less than 10^-16)! These results would be extremely unlikely just by random chance, so we have very strong evidence against alternative explanation (iii).
(d) No, we cannot rule out confounding (ii) as a possible explanation for the association because it is an observational study, not a randomized experiment. People were not randomly assigned to buy organic food or not, so the groups almost certainly differed to begin with. Even though we can safely rule out explanation (iii), we cannot rule out explanation (ii), so we cannot determine whether explanation (i) or (ii) (or both) are driving the observed difference.

6.193 Yes. This was a randomized experiment, so we expect the groups to be relatively similar at baseline.

6.194
(a) The difference in proportions is

28/580 − 43/569 = 0.0483 − 0.0756 = −0.027
(b) Using technology we obtain the output below:

Sample   X    N   Sample p
1       28  580   0.048276
2       43  569   0.075571

Difference = p (1) - p (2)
Estimate for difference: -0.0272953
95% upper bound for difference: -0.00391795
Test for difference = 0 (vs < 0): Z = -1.92  P-Value = 0.027
Fisher's exact test: P-Value = 0.036

The one-sided p-value based on a z-test with a normal distribution is 0.027. Note that some statistical software packages use a different algorithm, such as Fisher's exact test (also shown above), that may give a slightly different p-value.
(c) The p-value of 0.027 is smaller than the significance level of α = 0.05, so we have significant evidence against alternative explanation (iii).
(d) Yes. We have convincing evidence against explanations (ii) and (iii), and so we have convincing evidence for the causal claim that medical face masks are more effective than cloth face masks at protecting health care workers from clinical respiratory infection.

6.195
(a) The difference in proportions is

1/580 − 13/569 = 0.0017 − 0.0228 = −0.021
(b) Using technology we obtain the output below:
Sample   X    N   Sample p
1        1  580   0.001724
2       13  569   0.022847

Difference = p (1) - p (2)
Estimate for difference: -0.0211230
95% upper bound for difference: -0.0104373
Test for difference = 0 (vs < 0): Z = -3.25  P-Value = 0.001
* NOTE * The normal approximation may be inaccurate for small samples.
Fisher's exact test: P-Value = 0.001

The one-sided p-value is 0.001.
(c) No; the count of 1 in the medical mask group is too small to use the normal approximation (see the convenient warning in the output), and we should instead use a randomization test.
(d) Using a randomization test we find a p-value of approximately 0.0004.
(e) The p-value is very small, providing significant evidence against alternative explanation (iii).
(f) Yes. We have convincing evidence against explanations (ii) and (iii), and so we have convincing evidence for the causal claim that medical face masks are more effective than cloth face masks at protecting health care workers from influenza-like illness.

6.196
(a) The difference in proportions is

19/580 − 31/569 = 0.0326 − 0.0545 = −0.022
(b) Using technology we obtain the output below:

Sample   X    N   Sample p
1       19  580   0.032759
2       31  569   0.054482

Difference = p (1) - p (2)
Estimate for difference: -0.0217229
95% upper bound for difference: -0.00190512
Test for difference = 0 (vs < 0): Z = -1.80  P-Value = 0.036
Fisher's exact test: P-Value = 0.048

The one-sided p-value based on a z-test with a normal distribution is 0.036. Note that some statistical software packages use a different algorithm, such as Fisher's exact test (also shown above), that may give a slightly different p-value.
(c) The p-value of 0.036 is smaller than the significance level of α = 0.05, so we have significant evidence against alternative explanation (iii).
(d) Yes. We have convincing evidence against explanations (ii) and (iii), and so we have convincing evidence for the causal claim that medical face masks are more effective than cloth face masks at protecting health care workers from laboratory-confirmed virus illness.

6.197 The standard error is much smaller for the ILI response variable, because the overall proportion of infection is so much smaller for ILI. In general, the difference in proportions alone does not determine evidence against (iii), but rather the ratio of the difference in proportions to the standard error.
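The one-sided z-tests in Exercises 6.194 to 6.196 can be reproduced from the counts alone. This is a minimal sketch, assuming Python with the statsmodels package (any statistical software gives essentially the same output), shown here for the clinical respiratory infection counts of Exercise 6.194:

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

counts = np.array([28, 43])      # infections: medical-mask group, cloth-mask group
nobs = np.array([580, 569])

# One-sided test of H0: p1 = p2 vs Ha: p1 < p2
z, p_value = proportions_ztest(counts, nobs, alternative='smaller')
print(round(z, 2), round(p_value, 3))    # roughly -1.92 and 0.027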
Section 6.4-D Solutions
6.198 The differences in sample means will have a standard error of

SE = √(σ1²/n1 + σ2²/n2) = √(15²/80 + 12²/100) = 2.06

6.199 The differences in sample means will have a standard error of

SE = √(σ1²/n1 + σ2²/n2) = √(7.6²/40 + 3.7²/25) = 1.41

6.200 The differences in sample means will have a standard error of

SE = √(σ1²/n1 + σ2²/n2) = √(1.3²/50 + 1.7²/50) = 0.303

6.201 The differences in sample means will have a standard error of

SE = √(σ1²/n1 + σ2²/n2) = √(18²/300 + 22²/500) = 1.43
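The standard error of a difference in sample means is computed the same way in each of these exercises. A minimal sketch, assuming Python; the function name is ours.

from math import sqrt

def se_diff_means(s1, n1, s2, n2):
    """Standard error of x-bar_1 - x-bar_2."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

print(round(se_diff_means(18, 300, 22, 500), 2))    # 1.43, as in Exercise 6.201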
6.202 We use the smaller sample size and subtract 1 to find the degrees of freedom, so we use a t-distribution with df = 14. We see that the values with 2.5% beyond them in each tail are ±2.14.
6.203 We use the smaller sample size and subtract 1 to find the degrees of freedom, so we use a t-distribution with df = 7. We see that the values with 5% beyond them in each tail are ±1.89.
6.204 We use the smaller sample size and subtract 1 to find the degrees of freedom, so we use a t-distribution with df = 29. We see that the probability the t-statistic is less than −1.4 is 0.0861. (With a paper table, we may only be able to specify that the area is between 0.05 and 0.10.)
6.205 We use a t-distribution with df = 11. We see that the probability the t-statistic is greater than 2.1 is 0.0298. (With a paper table, we may only be able to specify that the area is between 0.025 and 0.05.)
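The t-distribution critical values and tail areas in Exercises 6.202 to 6.205 come from technology or a paper table; they can also be computed directly. A minimal sketch, assuming Python with scipy:

from scipy import stats

print(stats.t.ppf(0.975, df=14))   # about 2.14, value with 2.5% beyond it (6.202)
print(stats.t.ppf(0.95, df=7))     # about 1.89, value with 5% beyond it (6.203)
print(stats.t.cdf(-1.4, df=29))    # about 0.086, area below -1.4 (6.204)
print(stats.t.sf(2.1, df=11))      # about 0.030, area above 2.1 (6.205)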
Section 6.4-CI Solutions

6.206 For a confidence interval for μ1 − μ2 using the t-distribution, we use

(x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)

The smaller sample size is 20, so we use a t-distribution with df = 19. For a 95% confidence interval, we have t* = 2.09. The confidence interval is

(75.2 − 69.0) ± 2.09 · √(10.7²/30 + 8.3²/20)
6.2 ± 5.63
0.57 to 11.83

The best estimate for the difference in means μ1 − μ2 is 6.2, the margin of error is ±5.63, and the 95% confidence interval for μ1 − μ2 is 0.57 to 11.83. We are 95% confident that the mean for population 1 is between 0.57 and 11.83 more than the mean for population 2.

6.207 For a confidence interval for μ1 − μ2 using the t-distribution, we use

(x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)

The sample sizes are both 50, so we use a t-distribution with df = 49. For a 90% confidence interval, we have t* = 1.68. The confidence interval is

(10.1 − 12.4) ± 1.68 · √(5.7²/50 + 2.3²/50)
−2.3 ± 1.46
−3.76 to −0.84
The best estimate for the difference in means μ1 − μ2 is −2.3, the margin of error is ±1.46, and the 90% confidence interval for μ1 − μ2 is −3.76 to −0.84. We are 90% confident that the mean for population 2 is between 0.84 and 3.76 more than the mean for population 1.

6.208 For a confidence interval for μ1 − μ2 using the t-distribution, we use

(x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)

The smaller sample size is 200, so we use a t-distribution with df = 199. For a 99% confidence interval, we have t* = 2.60. The confidence interval is

(501 − 469) ± 2.60 · √(115²/400 + 96²/200)
32 ± 23.1
8.9 to 55.1
The best estimate for the difference in means μ1 − μ2 is 32, the margin of error is ±23.1, and the 99% confidence interval for μ1 − μ2 is 8.9 to 55.1. We are 99% confident that the mean for population 1 is between 8.9 and 55.1 more than the mean for population 2.

6.209 For a confidence interval for μ1 − μ2 using the t-distribution, we use

(x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)

The smaller sample size is 8, so we use a t-distribution with df = 7. For a 95% confidence interval, we have t* = 2.36. The confidence interval is

(5.2 − 4.9) ± 2.36 · √(2.7²/10 + 2.8²/8)
0.3 ± 3.09
−2.79 to 3.39

The best estimate for the difference in means μ1 − μ2 is 0.3, the margin of error is ±3.09, and the 95% confidence interval for μ1 − μ2 is −2.79 to 3.39. We are 95% confident that the difference in the two population means is between −2.79 and 3.39. Notice that zero is in this confidence interval, so it is certainly possible that the two population means are equal.

6.210 For 95% confidence with df = 3094, we have t* = 1.961. Note that this is very close to the standard normal value of 1.960, since the sample sizes are so large. The confidence interval is:

Statistic ± t* · SE
(x̄F − x̄N) ± t* · √(sF²/nF + sN²/nN)
(83.6 − 59.1) ± 1.961 · √(194.7²/3095 + 152.1²/5782)
24.5 ± 7.905
16.595 to 32.405
We are 95% sure that the mean concentration of DEHP in people who have recently eaten fast food is between 16.6 ng/mL and 32.4 ng/mL higher than in people who have not recently eaten fast food.

6.211 For 95% confidence with df = 3094, we have t* = 1.961. Note that this is very close to the standard normal value of 1.960, since the sample sizes are so large. The confidence interval is:

Statistic ± t* · SE
(x̄F − x̄N) ± t* · √(sF²/nF + sN²/nN)
(10.1 − 7.0) ± 1.961 · √(38.9²/3095 + 22.8²/5782)
3.1 ± 1.49
1.61 to 4.59
We are 95% sure that the mean concentration of DiNP in people who have recently eaten fast food is between 1.6 ng/mL and 4.6 ng/mL higher than in people who have not recently eaten fast food.
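A two-sample t-interval like the ones for DEHP and DiNP can be reproduced from the summary statistics. This is a minimal sketch, assuming Python with scipy, using the Exercise 6.210 values and the conservative degrees of freedom used in the text:

from math import sqrt
from scipy import stats

xF, sF, nF = 83.6, 194.7, 3095     # recently ate fast food
xN, sN, nN = 59.1, 152.1, 5782     # did not

df = min(nF, nN) - 1               # 3094
t_star = stats.t.ppf(0.975, df)    # about 1.961
moe = t_star * sqrt(sF**2 / nF + sN**2 / nN)
print(round(xF - xN - moe, 1), round(xF - xN + moe, 1))   # roughly 16.6 to 32.4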
6.212 This is a confidence interval for a difference in means, so we use the t-distribution. The sample sizes are both large enough so that we don't have to worry about normality. For a confidence interval using the t-distribution, we use

Sample statistic ± t* · SE

The relevant sample statistic for a confidence interval for a difference in means is x̄1 − x̄2 = 0.69 − 0.40 = 0.29. For a 99% confidence interval with degrees of freedom df = 140, we have t* = 2.612. The confidence interval is

(x̄1 − x̄2) ± t* · √(s1²/n1 + s2²/n2)
(0.69 − 0.40) ± 2.612 · √(0.42²/141 + 0.27²/228)
0.29 ± 2.612 · (0.040)
0.29 ± 0.104
0.186 to 0.394

The 99% confidence interval for the difference in means is 0.186 to 0.394. We are 99% sure that fluoride in tap water will increase mean fluoride concentration in women's bodies by 0.186 to 0.394 mg/L.

6.213 (a) Randomized means the participants were divided randomly between the two groups. Double-blind means neither the participants nor those measuring vascular health knows which group participants are in. Placebo-controlled means participants are given a treatment that is almost identical to the real treatment, in this case dark chocolate which has had the flavonoids removed.
(b) We are estimating μC − μN where μC represents the mean increase in flow-mediated dilation for people eating dark chocolate every day and μN represents the mean increase in flow-mediated dilation for people eating a dark chocolate substitute each day. For a 95% confidence interval using a t-distribution with degrees of freedom equal to 9, we use t* = 2.26. We have:

Sample statistic ± t* · SE
(x̄C − x̄N) ± t* · √(sC²/nC + sN²/nN)
(1.3 − (−0.96)) ± 2.26 · √(2.32²/11 + 1.58²/10)
2.26 ± 1.94
0.32 to 4.20
We are 95% confident that the mean increase in flow-mediated dilation for those in the dark chocolate group is between 0.32 and 4.20 higher than for those in the fake chocolate group. (c) No. “No difference” implies that μC − μN = 0. We see that 0 is not within the confidence interval of 0.32 to 4.20 and, in fact, all plausible values show a positive difference. 6.214 We are estimating μS − μN where μS represents the mean number of close confidants for those using a social networking site and μN represents the mean number for those not using a social networking site. For a 90% confidence interval using a t-distribution with degrees of freedom equal to 946, or a normal distribution
since the sample sizes are so large, we use t∗ = 1.65. We have:

Sample statistic ± t∗ · SE
(x̄S − x̄N) ± t∗ · √(sS²/nS + sN²/nN)
(2.5 − 1.9) ± 1.65 · √(1.3²/947 + 1.4²/1059)
0.6 ± 0.10
0.50 to 0.70
We are 90% confident that the average number of close confidants for a person with a profile on a social networking site is between 0.5 and 0.7 larger than the average number for those without such a profile. 6.215 Since both samples have size 24, we use a t-distribution with 24 − 1 = 23 degrees of freedom to find a 95% point of t∗ = 2.069.
x̄I − x̄S ± t∗ · √(sI²/nI + sS²/nS)
37.29 − 50.92 ± 2.069 · √(14.33²/24 + 12.54²/24)
−13.63 ± 8.04
−21.67 to −5.59
We are 95% sure that diners using individual bills at this restaurant will spend, on average, somewhere between 5.59 and 21.67 shekels less than diners who are splitting the bill.

6.216 We let μN and μY represent mean GPA for students whose roommate does not bring a videogame (No) or does bring one (Yes) to campus, respectively. The two relevant sample sizes are 88 and 38, so the smallest sample size in these two groups is 38 and the degrees of freedom are df = 37. For a 95% confidence level with df = 37, we have t∗ = 2.03. Calculating the 95% confidence interval for the difference in means:

Sample statistic ± t∗ · SE
(x̄N − x̄Y) ± t∗ · √(sN²/nN + sY²/nY)
(3.128 − 2.932) ± 2.03 · √(0.590²/88 + 0.699²/38)
0.196 ± 0.263
−0.067 to 0.459
We are 95% sure that, for students who do not bring a videogame to campus, mean GPA for students whose roommate does not bring a videogame will be between 0.067 lower and 0.459 higher than the mean for students whose roommate does bring a videogame. Since 0 is in this interval, it is possible that there is no effect from the roommate bringing a videogame. 6.217 We let μN and μY represent mean GPA for students whose roommate does not bring a videogame (No) or does bring one (Yes) to campus, respectively. The two relevant sample sizes are 44 and 40, so the
smallest sample size in these two groups is 40 and the degrees of freedom are df = 39. For a 95% confidence level with df = 39, we have t∗ = 2.02. Calculating the 95% confidence interval for the difference in means:

Sample statistic ± t∗ · SE
(x̄N − x̄Y) ± t∗ · √(sN²/nN + sY²/nY)
(3.039 − 2.754) ± 2.02 · √(0.639²/44 + 0.689²/40)
0.285 ± 0.293
−0.008 to 0.578
We are 95% sure that, for students who do bring a videogame to campus, mean GPA for students whose roommate does not bring a videogame will be between 0.008 lower and 0.578 higher than the mean for students whose roommate does bring a videogame. Since 0 is (just barely!) in this interval, it is possible that there is no effect from the roommate bringing a videogame.

6.218 We let μN and μY represent mean GPA for students who do not bring a videogame (No) or do bring one (Yes) to campus, respectively. The two relevant sample sizes are 88 and 44, so the smallest sample size in these two groups is 44 and the degrees of freedom are df = 43. For a 95% confidence level with df = 43, we have t∗ = 2.02. Calculating the 95% confidence interval for the difference in means:

Sample statistic ± t∗ · SE
(x̄N − x̄Y) ± t∗ · √(sN²/nN + sY²/nY)
(3.128 − 3.039) ± 2.02 · √(0.590²/88 + 0.689²/44)
0.089 ± 0.245
−0.156 to 0.334
We are 95% sure that, for students whose roommate does not bring a videogame to campus, mean GPA for students who do not bring a videogame will be between 0.156 lower and 0.334 higher than the mean for students who bring a videogame. Since 0 is in this interval, it is possible that there is no effect from bringing a videogame.

6.219 We let μN and μY represent mean GPA for students who do not bring a videogame (No) or do bring one (Yes) to campus, respectively. The two relevant sample sizes are 38 and 40, so the smallest sample size in these two groups is 38 and the degrees of freedom are df = 37. For a 95% confidence level with df = 37, we have t∗ = 2.03. Calculating the 95% confidence interval for the difference in means:

Sample statistic ± t∗ · SE
(x̄N − x̄Y) ± t∗ · √(sN²/nN + sY²/nY)
(2.932 − 2.754) ± 2.03 · √(0.639²/38 + 0.699²/40)
0.178 ± 0.308
−0.130 to 0.486
We are 95% sure that, for students whose roommate brings a videogame to campus, mean GPA for students who do not bring a videogame will be between 0.130 lower and 0.486 higher than the mean for students who bring a videogame. Since 0 is in this interval, it is possible that there is no effect from bringing a videogame.

6.220 (a) We let μN represent mean GPA for students for whom neither the student nor the roommate brings a videogame and μY represent mean GPA for students for whom both the student and the roommate bring a videogame. The two relevant sample sizes are 88 and 40, so the smallest sample size in these two groups is 40 and the degrees of freedom are df = 39. For a 95% confidence level with df = 39, we have t∗ = 2.02. Calculating the 95% confidence interval for the difference in means:

Sample statistic ± t∗ · SE
(x̄N − x̄Y) ± t∗ · √(sN²/nN + sY²/nY)
(3.128 − 2.754) ± 2.02 · √(0.590²/88 + 0.639²/40)
0.374 ± 0.240
0.134 to 0.614
We are 95% sure that mean GPA is between 0.134 and 0.614 points higher for students if neither student in the room brings a videogame to campus than if both students in the room bring a videogame to campus. Since 0 is not in this interval and the entire interval is positive, we have fairly convincing evidence that mean GPA is higher if there are no videogames.

(b) We cannot conclude that bringing videogames to campus reduces GPA, since these data come from an observational study rather than an experiment. There are many possible confounding variables, such as the possibility that there is a difference in the people who decide to bring a videogame to campus vs those who decide not to.

6.221 We use StatKey or other technology to create a bootstrap distribution. We see for one set of 1000 simulations that SE ≈ 1.12. (Answers may vary slightly with other simulations.)
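The following Python sketch (not part of the original solution; the two samples are hypothetical stand-ins, since a bootstrap needs raw data rather than just summary statistics) shows one way a bootstrap standard error like this could be simulated and then compared to the formula value.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical raw samples; in Exercise 6.221 each group has n = 500
sample1 = rng.normal(50, 20.72, 500)
sample2 = rng.normal(45, 14.23, 500)

boot_diffs = []
for _ in range(1000):
    # Resample each group with replacement and record the difference in means
    b1 = rng.choice(sample1, size=500, replace=True)
    b2 = rng.choice(sample2, size=500, replace=True)
    boot_diffs.append(b1.mean() - b2.mean())

print("bootstrap SE:", np.std(boot_diffs, ddof=1))
# Formula SE, using the sample standard deviations as estimates
print("formula SE:", np.sqrt(sample1.var(ddof=1)/500 + sample2.var(ddof=1)/500))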
Using the formula from the Central Limit Theorem, and using the sample standard deviations as estimates
of the population standard deviations, we have

SE = √(s1²/n1 + s2²/n2) = √(20.72²/500 + 14.23²/500) = 1.124

We see that the bootstrap standard error and the formula match very closely.

6.222 We use StatKey or other technology to create a bootstrap distribution. We see for one set of 1000 simulations that SE ≈ 0.77. (Answers may vary slightly with other simulations.)
Using the formula from the Central Limit Theorem, and using the sample standard deviations as estimates of the population standard deviations, we have

SE = √(s1²/n1 + s2²/n2) = √(13.80²/500 + 10.75²/500) = 0.782

We see that the bootstrap standard error and the formula match very closely.

6.223
(a) We see that there are 168 females and 193 males.
(b) The males in the sample exercise more, with a mean of 9.876 compared to the female mean of 8.110. The difference is 9.876 − 8.110 = 1.766 hours per week more exercise for the males, on average.

(c) We are estimating μf − μm , where μf represents the number of hours spent exercising per week by all female students at this university and μm represents the number of hours spent exercising per week for all male students at this university. For a 95% confidence interval with degrees of freedom 167, we use t∗ = 1.97. The confidence interval is

Sample statistic ± t∗ · SE
(x̄f − x̄m) ± t∗ · √(sf²/nf + sm²/nm)
(8.110 − 9.876) ± 1.97 · √(5.199²/168 + 6.069²/193)
−1.766 ± 1.168
−2.934 to −0.598
A 95% confidence interval for the difference in means is −2.93 to −0.60.
(d) Up to two decimal places, the confidence interval we found is the same as the one given in the computer output. The small differences are probably due to round-off error. Note you might have switched the order to estimate μm − μf but that would only change the signs of the interval.

(e) We are 95% confident that the average amount males exercise, in hours per week, is between 0.60 and 2.93 hours more than the amount females exercise per week, for students at this university.

6.224 (a) The males watch more TV, with a mean in this sample of 7.620 compared to the female mean of 5.237. The difference is 7.620 − 5.237 = 2.383 hours per week more TV for the males, on average.

(b) We are estimating μf − μm , where μf represents the number of hours spent watching TV per week by all female students at this university and μm represents the number of hours spent watching TV a week for all male students at this university. For a 99% confidence interval with degrees of freedom 168, we use t∗ = 2.61. The confidence interval is

Sample statistic ± t∗ · SE
(x̄f − x̄m) ± t∗ · √(sf²/nf + sm²/nm)
(5.237 − 7.620) ± 2.61 · √(4.100²/169 + 6.427²/192)
−2.383 ± 1.464
−3.847 to −0.919
A 99% confidence interval for the difference in means is −3.85 to −0.92. (c) The confidence intervals are very similar, with minor differences due to round-off error and degrees of freedom. Note you might have switched the order to estimate μm − μf but that would only change the signs of the interval. (d) We are 99% confident that the average amount males watch TV, in hours per week, is between 0.92 and 3.85 hours more than the amount females watch TV per week, for students at this university.

6.225 Using any statistics package, we see that a 95% confidence interval for μf − μm is −2.36 to 0.91. We are 95% sure that the difference in average number of grams of fiber eaten in a day between males and females is between 2.36 more for males and 0.91 more for females. “No difference” is a plausible option since the interval contains 0.

6.226 Using any statistics package, we see that a 95% confidence interval for μ0 − μ1 is 2.9 to 30.7. We are 95% sure that the difference in average systolic blood pressure between those who live and those who die in the ICU is between 2.9 and 30.7. “No difference” is not plausible, given this sample, since the confidence interval contains only positive differences. For all of the plausible differences, the mean blood pressure is higher for those who lived.
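For the exercises that say "using any statistics package" (such as 6.225 and 6.226), one hedged possibility in Python is sketched below; the data lists are placeholders rather than the actual datasets, and the helper simply applies the conservative-df interval used throughout this chapter to two columns of raw data.

import numpy as np
from scipy.stats import t

def ci_from_samples(x1, x2, conf=0.95):
    # Confidence interval for mu1 - mu2 from two raw samples (conservative df)
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    se = np.sqrt(x1.var(ddof=1)/n1 + x2.var(ddof=1)/n2)
    tstar = t.ppf(1 - (1 - conf)/2, min(n1, n2) - 1)
    diff = x1.mean() - x2.mean()
    return diff - tstar*se, diff + tstar*se

# Placeholder values standing in for, e.g., grams of fiber for two groups
group1 = [12, 18, 25, 9, 30, 22, 15]
group2 = [20, 14, 28, 17, 26, 19, 23]
print(ci_from_samples(group1, group2))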
Section 6.4-HT Solutions 6.227 In general, the standardized test statistic is Sample Statistic − Null Parameter SE In this test for a difference in means, the sample statistic is x1 − x2 and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = 0. Substituting the formula for the standard error for a difference in two means, we compute the t-test statistic to be: t=
[(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2) = (56 − 51) / √(8.2²/30 + 6.9²/40) = 2.70
This is an upper-tail test, so the p-value is the area above 2.70 in a t-distribution with df = 29. (Since both sample sizes are greater than 30, we could also use the normal distribution to estimate the p-value.) We see that the p-value is about 0.006. This p-value is very small, so we reject H0 and find evidence to support the alternative hypothesis that μ1 > μ2 .
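A quick Python check of this calculation (a sketch assuming scipy is available; not part of the original solution):

from math import sqrt
from scipy.stats import t

xbar1, s1, n1 = 56, 8.2, 30
xbar2, s2, n2 = 51, 6.9, 40

se = sqrt(s1**2/n1 + s2**2/n2)             # standard error of the difference
tstat = (xbar1 - xbar2 - 0) / se            # null difference is 0
pvalue = t.sf(tstat, df=min(n1, n2) - 1)    # upper-tail area, df = 29

print(round(tstat, 2), round(pvalue, 3))    # about 2.70 and 0.006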
6.228 In general, the standardized test statistic is Sample Statistic − Null Parameter SE In this test for a difference in means, the sample statistic is x1 − x2 and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = 0. Substituting the formula for the standard error for a difference in two means, we compute the t-test statistic to be: t=
[(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2) = (15.3 − 18.4) / √(11.6²/100 + 14.3²/80) = −1.57
This is a two-tailed test, so the p-value is two times the area below −1.57 in a t-distribution with df = 79. We see that the p-value is about 2 · 0.06 = 0.12. This p-value is not even significant at a 10% level, so we do not find enough evidence to support the alternative hypothesis. There is insufficient evidence that the population means are different.
6.229 In general, the standardized test statistic is Sample Statistic − Null Parameter SE In this test for a difference in means, the sample statistic is xA − xB and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μA − μB = 0. Substituting the formula for the standard error for a difference in two means, we compute the t-test statistic to be: t=
[(x̄A − x̄B) − 0] / √(sA²/nA + sB²/nB) = (125 − 118) / √(18²/8 + 14²/15) = 0.96
This is a two-tailed test, so the p-value is two times the area above 0.96 in a t-distribution with df = 7. We see that the p-value is 2(0.185) = 0.37. This p-value is larger than any reasonable significance level, so we do not reject H0 . There is not enough evidence to support the alternative hypothesis that the means are different.
6.230 In general, the standardized test statistic is Sample Statistic − Null Parameter SE In this test for a difference in means, the sample statistic is xT − xC and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μT − μC = 0. Substituting the formula for the standard error for a difference in two means, we compute the t-test statistic to be: t=
[(x̄T − x̄C) − 0] / √(sT²/nT + sC²/nC) = (8.6 − 11.2) / √(4.1²/25 + 3.4²/25) = −2.44
This is a lower-tail test, so the p-value is the area below −2.44 in a t-distribution with df = 24. We see that the p-value is about 0.0112. This p-value shows significance at a 5% level but is not quite significant at a 1% level. At a 5% level, we reject H0 and conclude that there is evidence that μT < μC .
6.231 The hypotheses are H0 : μH = μC vs Ha : μH > μC , where μH and μC represent the mean score on the test for those taking notes by hand and those taking notes on a computer, respectively. The sample sizes are large enough to use the t-distribution. The t-test statistic is: t=
(Sample statistic − Null parameter) / SE = [(x̄H − x̄C) − 0] / √(sH²/nH + sC²/nC) = (25.6 − 18.3) / √(10.8²/38 + 9.0²/40) = 3.234
This is a right-tail test so the p-value is the area above 3.234 in a t-distribution with df = 37. We obtain a p-value of 0.0013. This is a small p-value, so we reject H0 . We have evidence that mean test scores are higher if notes are taken longhand rather than on a laptop. 6.232 This is a test for a difference in means. Using μ1 to represent the mean time for a dog to open the door when the owner is crying (the distressed condition) and μ2 to represent the mean time for a dog to open the door when the owner is humming (the control condition), the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 < μ2
The t-test statistic is: t=
(Statistic − Null) / SE = [(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2) = (23.43 − 95.89) / √(17.77²/7 + 89.09²/9) = −72.46/30.447 = −2.380
This is a left-tail test so the p-value is the area to the left of −2.380 in a t-distribution with df = 6. We see that the p-value is 0.027. This p-value is less than the 5% significance level, so we reject H0 . We have evidence that dogs try to reach their owners faster if they believe the owner is in distress.

6.233 This is a difference in means test. We use μ1 for the mean improvement for those with a healthy diet and μ2 for the mean improvement for those who don't change diet. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 > μ2 .
The t-test statistic is:

t = (Statistic − Null) / SE = [(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2) = (6.03 − (−0.13)) / √(10.53²/37 + 12.39²/38) = 6.16/2.653 = 2.32
This is a right-tail test so the p-value is the area to the right of 2.32 in a t-distribution with df = 36. We see that the p-value is 0.013. This p-value is less than the 5% significance level, so we reject H0 . We have evidence that, on average, eating healthy for three weeks improves depression symptoms.

6.234 This is a difference in means test. We use μ1 for the mean rating by students in an active learning environment and μ2 for the mean rating for students in a passive learning lecture. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 < μ2 .
The sample statistic is x̄1 − x̄2 = 3.338 − 3.753 = −0.415. The null hypothesis value is 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(0.922²/142 + 0.873²/154) = 0.1046

The t-test statistic is:

t = (Statistic − Null) / SE = (−0.415 − 0) / 0.1046 = −3.97

This is a left-tail test so the p-value is the area to the left of −3.97 in a t-distribution with df = 141. We see that the p-value, to three decimal places, is 0.000. This p-value is very small, so we reject H0 . We have very strong evidence that, on average, students feel that they learn more in a lecture class (even if they may actually learn less).
6.235 This is a difference in means test. We use μ1 for the mean grade expected by students in an active learning environment and μ2 for the mean grade expected by students in a passive learning lecture. The hypotheses are H0 : μ1 = μ2 vs Ha : μ1 > μ2 .

The sample statistic is x̄1 − x̄2 = 0.702 − 0.600 = 0.102. The null hypothesis value is 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(0.191²/140 + 0.178²/154) = 0.0215

The t-test statistic is:

t = (Statistic − Null) / SE = (0.102 − 0) / 0.0215 = 4.74

This is a right-tail test so the p-value is the area to the right of 4.74 in a t-distribution with df = 139. We see that the p-value, to three decimal places, is 0.000. This p-value is very small, so we reject H0 . We have very strong evidence that, on average, students learn the material better in an active learning class.
6.236 We test the hypotheses H0 : μ10 = μ19 vs Ha : μ10 ≠ μ19 , where μ10 and μ19 are the mean hours of TV for Pennsylvania seniors who did the Census in School survey in 2010 and 2019. We compute a t-statistic
t = [(x̄10 − x̄19) − 0] / √(s10²/n10 + s19²/n19) = (8.12 − 5.50) / √(7.68²/30 + 6.44²/37) = 1.49

Using two tails of a t-distribution with 30 − 1 = 29 degrees of freedom, we find the p-value is 0.148. This is not a small p-value, so we don't have enough evidence to conclude there is a difference in mean number of hours of TV watching for Pennsylvania seniors between 2010 and 2019.

6.237 This is a difference in means test. Using μ1 for mean grade on material when devices are not allowed and μ2 for mean grade on material when devices are allowed, the hypotheses are H0 : μ1 = μ2 vs Ha : μ1 > μ2 .
The sample statistic is x̄1 − x̄2 = 86.6 − 80.1 = 6.5. The null hypothesis value is 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(10.4²/118 + 8.1²/118) = 1.2135

The t-test statistic is:

t = (Statistic − Null) / SE = (6.5 − 0) / 1.2135 = 5.356

This is a right-tail test so the p-value is the area to the right of 5.356 in a t-distribution with df = 117. We see that the p-value, to three decimal places, is 0.000. This p-value is very small, so we reject H0 . We have very strong evidence that mean grades are higher on material that is learned when the use of electronic devices is not allowed in the class. Students are not very good at multitasking during class!
6.238 This is a difference in means test. Using μ1 for mean anger ratings after hearing statements in noun form and μ2 for mean anger ratings after hearing statements in verb form, the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 < μ2
The sample statistic is x̄1 − x̄2 = 3.21 − 3.67 = −0.46. The null hypothesis value is 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(1.43²/65 + 1.30²/64) = 0.241

The t-test statistic is:

t = (Statistic − Null) / SE = (−0.46 − 0) / 0.241 = −1.91

This is a left-tail test so the p-value is the area to the left of −1.91 in a t-distribution with df = 63. We see that the p-value is 0.030. At a 5% significance level, we reject H0 . We have evidence that mean ratings of anger are lower when statements are phrased in the noun form rather than the verb form. If you are working on conflict resolution, you should phrase controversial statements using nouns!
6.239 This is a test for a difference in means. Using μ1 to represent the mean age for trans women and μ2 to represent the mean age for trans men, the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 ≠ μ2
The sample statistic is x̄1 − x̄2 = 41.3 − 35.4 = 5.9. The null hypothesis value is μ1 − μ2 = 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(16.3²/155 + 10.8²/55) = 1.958

The t-test statistic is:

t = (Statistic − Null) / SE = (5.9 − 0) / 1.958 = 3.013

This is a two-tail test so the p-value is twice the area to the right of 3.013 in a t-distribution with df = 55 − 1 = 54. We see that the p-value is 2(0.002) = 0.004. This p-value is very small, so we reject H0 . We have strong evidence that the mean age of surgery to transition is not the same for trans men and trans women (and that trans women appear to have the surgery later in life, on average, than trans men.)
6.240 This is a test for a difference in means. Using μ1 to represent the mean age for trans women and μ2 to represent the mean age for trans men, the hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 ≠ μ2
The sample statistic is x̄1 − x̄2 = 6.7 − 6.2 = 0.5. The null hypothesis value is μ1 − μ2 = 0, and the standard error is:

SE = √(s1²/n1 + s2²/n2) = √(3.6²/155 + 3.1²/55) = 0.508

The t-test statistic is:

t = (Statistic − Null) / SE = (0.5 − 0) / 0.508 = 0.984

This is a two-tail test so the p-value is twice the area to the right of 0.984 in a t-distribution with df = 55 − 1 = 54. We see that the p-value is 2(0.165) = 0.330. This p-value is not small, so we do not reject H0 . We do not have evidence that the mean age of first experience of gender dysphoria differs between trans men and trans women. Indeed, the earliest perception of being transgender appears to occur at a mean age of around six and a half years in both cases.
6.241 (a) Let μU and μE be the average time spent looking at the unexpected and expected populations, respectively. The hypotheses are H0
: μU = μE
Ha
: μU > μE
(b) The relevant sample statistic is xU − xE = 9.9 − 7.5 = 2.4 seconds
(c) The t-statistic is t=
(statistic − null) / SE = [(x̄U − x̄E) − 0] / √(sU²/nU + sE²/nE) = (9.9 − 7.5) / √(4.5²/20 + 4.2²/20) = 1.74
(d) The p-value is the proportion above t = 1.74 on a t-distribution with 19 degrees of freedom, which is 0.049. (e) The p-value of 0.049 is less than α = 0.10, so we reject H0 . (f) We have somewhat convincing evidence that babies look longer at the unexpected population after viewing the sample data. 6.242 Let μ1 and μ2 represent mean weight loss after six months for maids in the informed and uninformed groups, respectively. The hypotheses are: H0 :
μ1 = μ2
Ha :
μ1 > μ2
Sample sizes are large enough to use a t-distribution. We compute the t-statistic: t=
[(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2) = (1.79 − 0.2) / √(2.88²/41 + 2.32²/34) = 2.65
and compare it to a t-distribution with 34 − 1 = 33 df. The area in the upper tail beyond t = 2.65 is 0.0061. This is a one-tailed test so the p-value is 0.0061 which is small, providing strong evidence against H0 and in favor of concluding that the informed maids lost more weight (on average). This is a randomized experiment, so we can conclude causality. Thus we do have evidence that for maids, thinking they are exercising really does cause them to lose more weight! 6.243 We first compute the summary statistics Enriched environment: Standard environment:
xEE = 231.7 sEE = 71.2 nEE = 7 xSE = 438.7 sSE = 37.7 nSE = 7
This is a hypothesis test for a difference in means, with μEE representing the mean number of seconds spent in darkness by mice who have lived in an enriched environment and then been exposed to stressinducing events, while μSE represents the same quantity for mice who lived in a standard environment. The hypotheses are: H0 : Ha :
μEE = μSE μEE < μSE
The relevant statistic for this test is xEE − xSE , and the relevant null parameter is zero, since from the null hypothesis we have μEE − μSE = 0. The t-test statistic is: t=
(xEE − xSE ) − 0 231.7 − 438.7 Sample statistic − Null parameter = 2 = = −6.80 2 sEE sSE SE 71.22 37.72 + + 7 7 nEE nSE
This is a lower-tail test, so the p-value is the area to the left of −6.80 in a t-distribution with df = 6. We see that the p-value is 0.00025. This is an extremely small p-value, so we reject H0 and conclude that there is very strong evidence that mice who have been able to exercise in an enriched environment are better able to handle stress.
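A minimal Python sketch of the same computation, starting from raw data (the fourteen values below are made-up stand-ins chosen only to have roughly the quoted means; the real measurements from the study are not reproduced here):

import numpy as np
from scipy.stats import t

# Hypothetical seconds-in-darkness values, 7 mice per group
enriched = np.array([220, 150, 300, 245, 190, 260, 257], float)
standard = np.array([430, 470, 400, 445, 455, 410, 461], float)

diff = enriched.mean() - standard.mean()
se = np.sqrt(enriched.var(ddof=1)/7 + standard.var(ddof=1)/7)
tstat = diff / se
pvalue = t.cdf(tstat, df=6)   # lower-tail test with df = 7 - 1

print(tstat, pvalue)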
6.244 The hypotheses are H0 : μC = μW vs Ha : μC > μW , where μC and μW are the mean calcium loss after drinking diet cola and water, respectively. The sample sizes are quite small, so we check for extreme skewness or extreme outliers. We see in the dotplots that the data are not too extremely skewed and don’t seem to have any extreme outliers, so a t-distribution is acceptable. The t-test statistic is: t=
(Sample statistic − Null parameter) / SE = [(x̄C − x̄W) − 0] / √(sC²/nC + sW²/nW) = (56.0 − 49.1) / √(4.93²/8 + 3.64²/8) = 3.18
This is an upper-tail test, so the p-value is the area above 3.18 in a t-distribution with df = 7. We see that the p-value is 0.0078. This is a very small p-value, so we reject H0 and conclude that there is strong evidence (even with such a small sample size) that diet cola drinkers do lose more calcium, on average, than water drinkers. Another reason to drink more water and less diet cola! 6.245
(a) This is an experiment since the subjects are randomly assigned to drink tea or coffee.
(b) This is a difference in means test and we are testing H0 : μT = μC vs Ha : μT > μC ,
where μT and μC are the mean production levels of interferon gamma in tea drinkers and coffee drinkers, respectively.

(c) For the tea drinkers, we find that x̄T = 34.82 with sT = 21.1 and nT = 11. For the coffee drinkers, we have x̄C = 17.70 with sC = 16.7 and nC = 10. The appropriate statistic is x̄T − x̄C and the null parameter is zero, so we have

t = (Sample statistic − Null parameter) / SE = [(x̄T − x̄C) − 0] / √(sT²/nT + sC²/nC) = (34.82 − 17.70) / √(21.1²/11 + 16.7²/10) = 2.07
The smaller sample size is 10, so we find the p-value using a t-distribution with df = 9. This is an upper tail test, so we find that the p-value is 0.0342. At a 5% level, we conclude that there is evidence that mean production of this disease fighting molecule is enhanced by drinking tea. (d) The sample sizes are small so it is hard to tell whether the data are normal or not. Dotplots for each group are shown below. Notice that most of the values are in the “tails” with few dots in the middle. This may cast some doubt on the normality condition. To be on the safe side, a randomization test might be more appropriate.
(e) To create a randomization statistic we scramble the tea/coffee assignments (so they aren’t related to the interferon gamma production values) and find the difference in means between the two groups. Repeating this 1000 times produces a randomization distribution of means such as the one below.
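A minimal Python sketch of this scrambling procedure (not the original StatKey analysis; the two arrays are placeholders chosen only to roughly match the group means quoted in part (c)):

import numpy as np

rng = np.random.default_rng(0)

tea = np.array([5, 11, 13, 18, 20, 47, 48, 52, 55, 56, 58], float)   # 11 tea drinkers
coffee = np.array([0, 0, 3, 11, 15, 16, 21, 21, 38, 52], float)      # 10 coffee drinkers

observed = tea.mean() - coffee.mean()
pooled = np.concatenate([tea, coffee])

count = 0
for _ in range(1000):
    rng.shuffle(pooled)                                 # scramble the group assignments
    sim_diff = pooled[:11].mean() - pooled[11:].mean()
    if sim_diff >= observed:                            # upper-tail test
        count += 1

print("randomization p-value:", count / 1000)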
In this distribution, 29 of the 1000 randomizations produced a difference in means that was larger than the difference of 17.12 that was observed in the original sample. This gives a p-value of 0.029 which is fairly small, giving evidence that mean interferon gamma production is higher when drinking tea than when drinking coffee. Note that the p-value and strength of evidence are similar to what we found in part (c) with the t-distribution. (f) Since this was a well-designed experiment, we can conclude that there is some evidence that drinking (lots of) tea enhances a person’s immune response (at least as measured by interferon gamma production). 6.246 The hypotheses are H0 : μM = μE vs Ha : μM > μE , where μM and μE represent the mean length of foraging trips in days for metal tagged and electronic tagged penguins, respectively. The sample sizes are large so it is appropriate to use the t-distribution. The t-test statistic is: t=
(Sample statistic − Null parameter) / SE = [(x̄M − x̄E) − 0] / √(sM²/nM + sE²/nE) = (12.70 − 11.60) / √(3.71²/344 + 4.53²/512) = 3.89
This is an upper-tail test, so the p-value is the area above 3.89 in a t-distribution with df = 343. We see that the p-value is 0.00006, or essentially zero. This is an extremely small p-value, so we reject H0 and conclude that there is very strong evidence that foraging trips are longer on average for penguins with a metal tag. 6.247 The hypotheses are H0 : μM = μE vs Ha : μM > μE , where μM and μE represent the mean arrival time (in days after November 1st ) for metal tagged and electronic tagged penguins, respectively. The sample sizes are large so it is appropriate to use the t-distribution. The t-test statistic is: t=
(Sample statistic − Null parameter) / SE = [(x̄M − x̄E) − 0] / √(sM²/nM + sE²/nE) = (37 − 21) / √(38.77²/167 + 27.50²/189) = 4.44
This is an upper-tail test, so the p-value is the area above 4.44 in a t-distribution with df = 166. We see that the p-value is 0.000008, or essentially zero. This is an extremely small p-value, so we reject H0 and conclude that there is very strong evidence that penguins with metal tags arrive at the breeding site later, on average, than penguins with electronic tags.
6.248 We choose a one-tailed test H0 : μI = μS vs Ha : μI < μS , where μI and μS are the mean costs in the Individual and Split situations. We compute a t-statistic t
= [(x̄I − x̄S) − 0] / √(sI²/nI + sS²/nS) = (37.29 − 50.92) / √(14.33²/24 + 12.54²/24) = −3.51
Using the lower tail of a t-distribution with 24 − 1 = 23 degrees of freedom, we find the p-value is only 0.0009. Since this p-value is so small, we have strong evidence that the mean cost of meals is lower when diners pay individually than when they split the bill.

6.249 We choose a two-tailed test H0 : μf = μm vs Ha : μf ≠ μm , where μf and μm are the mean meal costs for females and males, respectively. We compute a t-statistic

t = [(x̄f − x̄m) − 0] / √(sf²/nf + sm²/nm) = (44.46 − 43.75) / √(15.48²/24 + 14.81²/24) = 0.16
Using two tails from a t-distribution with 24 − 1 = 23 degrees of freedom we find the p-value is 2 × 0.437 = 0.874. This is not at all a small p-value, so we have no strong evidence of a difference in the mean cost of meal orders between males and females.

6.250 (a) All else being equal, the farther apart the means are, the more evidence there is for a difference in means and the smaller the p-value. Option 2 will give a smaller p-value. (b) All else being equal, the smaller the standard deviations, the easier it is to detect a difference between two means, so the more evidence there is for a difference in means and the smaller the p-value. Option 2 will give a smaller p-value. (c) All else being equal, the larger the sample size, the more accurate the results. Since the sample means are different in this case, the larger the sample size, the more evidence there is for a difference in means and the smaller the p-value. Option 1 will give a smaller p-value. For each of these cases we can also consider the formula for the t-statistic t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2), which is positive for each of the situations. A larger t-statistic will be farther in the tail of the t-distribution and yield a smaller p-value. This occurs when the sample means are farther apart, the standard deviations are smaller, or the sample sizes are larger.

6.251 Let μ1 be the mean grade for students taking a quiz in the second half of class (Late) and μ2 be the mean grade for quizzes at the beginning of class (Early). The relevant hypotheses are H0 : μ1 = μ2 vs
Ha : μ1 ≠ μ2 . The computer output shows a test statistic of t = 1.87 and p-value=0.066. This is a somewhat small p-value, but not quite significant at a 5% level. The data from this sample show some evidence that it might be better to have the quiz in the second part of class, but that evidence is not very strong.

6.252 Let μ0 be the mean heart rate for patients who live and μ1 be the mean heart rate for patients who die. The relevant hypotheses are H0 : μ0 = μ1 vs Ha : μ0 ≠ μ1 . The computer output shows a test statistic of t = −0.45 and p-value=0.653. This very large p-value shows no convincing evidence that there is a difference in heart rates between patients who live and die.

6.253 The hypotheses are H0 : μf = μm vs Ha : μf ≠ μm , where μf and μm are the respective mean exercise times in the population. Using any statistics package, we see that the t-statistic is −2.98 and the p-value is 0.003. We reject H0 and conclude that there is significant evidence showing a difference in mean number of hours a week spent exercising between males and females.

6.254 The hypotheses are H0 : μf = μm vs Ha : μf ≠ μm , where μf and μm are the respective mean TV amounts in the population. Using technology, we see that the t-statistic is −4.25 and the p-value is 0.000. We reject H0 and conclude that there is strong evidence of a difference in mean number of hours a week spent watching TV between males and females.

6.255 To compare mean commute times, the hypotheses are H0 : μf = μm vs Ha : μf ≠ μm , where μf and μm are the mean commute times for all female and male commuters in Atlanta. Here are summary statistics for the commute times, broken down by sex, for the data in CommuteAtlanta.

Sex    N     Mean   StDev
F      246   26.8   17.3
M      254   31.3   23.4
The value of the t-statistic is

t = (26.8 − 31.3) / √(17.3²/246 + 23.4²/254) = −2.45

We find the p-value using a t-distribution with 246 − 1 = 245 degrees of freedom. The area in the lower tail of this distribution below −2.45 is 0.007. Doubling to account for two tails gives p-value = 0.014. This p-value is smaller than the significance level (α = 0.05) so we reject the null hypothesis and conclude that the average commute time for women in Atlanta is different from (and, in fact, less than) the average commute time for men.
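For the "using any statistics package" tests in this group (6.253 and 6.254, for example), one hedged possibility in Python is scipy's unpooled two-sample t-test; the lists below are placeholders, not the actual survey data, and equal_var=False requests the unpooled (Welch) version, which matches this text's standard-error formula apart from the degrees-of-freedom convention.

from scipy.stats import ttest_ind

female_hours = [4, 6, 10, 8, 5, 12, 7, 9]      # placeholder exercise hours per week
male_hours = [10, 12, 8, 14, 9, 11, 13, 15]

tstat, pvalue = ttest_ind(female_hours, male_hours, equal_var=False)   # unpooled test
print(tstat, pvalue)                                                    # two-sided p-value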
Section 6.5 Solutions
6.256 For a confidence interval for the average difference μd = μ1 − μ2 using the t-distribution and paired sample data, we use

x̄d ± t∗ · sd/√nd

We use a t-distribution with df = 29, so for a 95% confidence interval, we have t∗ = 2.05. The confidence interval is

x̄d ± t∗ · sd/√nd
3.7 ± 2.05 · 2.1/√30
3.7 ± 0.79
2.91 to 4.49
The best estimate for the mean difference μd = μ1 − μ2 is 3.7, the margin of error is 0.79, and the 95% confidence interval for μd is 2.91 to 4.49. We are 95% confident that the mean for population or treatment 1 is between 2.91 and 4.49 larger than the mean for population or treatment 2.

6.257 For a confidence interval for the average difference μd = μ1 − μ2 using the t-distribution and paired sample data, we use

x̄d ± t∗ · sd/√nd

We use a t-distribution with df = 99, so for a 90% confidence interval, we have t∗ = 1.66. The confidence interval is

x̄d ± t∗ · sd/√nd
556.9 ± 1.66 · 143.6/√100
556.9 ± 23.8
533.1 to 580.7
The best estimate for the mean difference μd = μ1 − μ2 is 556.9, the margin of error is 23.8, and the 90% confidence interval for μd is 533.1 to 580.7. We are 90% confident that the mean for population or treatment 1 is between 533.1 and 580.7 larger than the mean for population or treatment 2.

6.258 For paired difference in means, we begin by finding the differences d = Treatment 1 − Treatment 2 for each pair. These are shown in the table below.

Difference    4    −2    6    4    7

The mean of these five differences is x̄d = 3.8 with standard deviation sd = 3.5. For a confidence interval for the average difference μd = μ1 − μ2 using the t-distribution and paired sample data, we use

x̄d ± t∗ · sd/√nd
We use a t-distribution with df = 4, so for a 99% confidence interval, we have t∗ = 4.60. The confidence interval is

x̄d ± t∗ · sd/√nd
3.8 ± 4.60 · 3.5/√5
3.8 ± 7.20
−3.4 to 11.0
The best estimate for the mean difference μd = μ1 − μ2 is xd = 3.8, the margin of error is 7.20, and the 99% confidence interval for μd is −3.4 to 11.0. We are 99% confident that the mean for Treatment 1 is between 3.4 smaller and 11.0 larger than the mean for Treatment 2. 6.259 For paired difference in means, we begin by finding the differences d = Situation 1 − Situation 2 for each pair. These are shown in the table below. Case 1 2 3 4 5 6 7 8
Difference −8 −3 3 −16 −7 10 −3 −1
The mean of these 8 differences is x̄d = −3.13 with standard deviation sd = 7.74. For a confidence interval for the average difference μd = μ1 − μ2 using the t-distribution and paired sample data, we use x̄d ± t∗ · sd/√nd. We use a t-distribution with df = 7, so for a 95% confidence interval, we have t∗ = 2.36. The confidence interval is

x̄d ± t∗ · sd/√nd
−3.13 ± 2.36 · 7.74/√8
−3.13 ± 6.46
−9.59 to 3.33

The best estimate for the mean difference μd = μ1 − μ2 is −3.13, the margin of error is 6.46, and the 95% confidence interval for μd is −9.59 to 3.33. We are 95% confident that the difference in means μ1 − μ2 is between −9.59 and +3.33. (Notice that 0 is within this confidence interval, so based on this sample data we cannot be sure that there is a difference between the means.)
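A quick Python check of Exercise 6.259 (a sketch assuming numpy/scipy; not part of the original solution), starting from the eight differences listed above:

import numpy as np
from scipy.stats import t

d = np.array([-8, -3, 3, -16, -7, 10, -3, -1], float)   # Situation 1 - Situation 2

n = len(d)
xbar_d, s_d = d.mean(), d.std(ddof=1)
tstar = t.ppf(0.975, df=n - 1)            # 95% confidence, df = 7
me = tstar * s_d / np.sqrt(n)

print(xbar_d, s_d)                        # about -3.13 and 7.74
print(xbar_d - me, xbar_d + me)           # about -9.59 to 3.33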
In this test for a paired difference in means, the sample statistic is x̄d and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = μd = 0. The standard error is sd/√nd, and the t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = 15.7 / (12.2/√25) = 6.43
This is a two-tail test so, using a t-distribution with df = 24, we multiply the area in the tail above 6.43 by two to obtain the p-value. The area above 6.4 is close to zero, however, so the p-value is essentially zero. This sample provides very strong evidence that the two means are not the same.

6.261 In general, the standardized test statistic is (Sample Statistic − Null Parameter)/SE. In this test for a paired difference in means, the sample statistic is x̄d and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = μd = 0. The standard error is sd/√nd, and the t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = −2.6 / (4.1/√18) = −2.69
This is a two-tail test so, using a t-distribution with df = 17, we multiply the area in the tail below −2.69 by two to obtain a p-value of 2(0.008) = 0.016. At a 5% level, we reject the null hypothesis and find evidence that the means are not the same.

6.262 For paired difference in means, we begin by finding the differences d = Treatment 1 − Treatment 2 for each pair. These are shown in the table below.

Difference    −2    −8    −7    −4    0    3    −1    2
The mean of these eight differences is x̄d = −2.125 with standard deviation sd = 3.98. In general, the standardized test statistic is (Sample Statistic − Null Parameter)/SE. In this test for a paired difference in means, the sample statistic is x̄d and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = μd = 0. The standard error is sd/√nd, and the t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = −2.125 / (3.98/√8) = −1.51
This is a lower tail test and the p-value is the area below −1.51, in a t-distribution with df = 7. We see that the p-value is 0.0874. At a 5% level, we do not have enough evidence to reject H0 and do not find enough evidence to conclude that μ1 is less than μ2 .

6.263 For paired difference in means, we begin by finding the differences d = Situation 1 − Situation 2 for each pair. These are shown in the table below.

Difference    5    11    −10    25    −7    0    20    −7    6    7
The mean of these ten differences is x̄d = 5.0 with standard deviation sd = 11.6. In general, the standardized test statistic is (Sample Statistic − Null Parameter)/SE. In this test for a paired difference in means, the sample statistic is x̄d and the parameter from the null hypothesis is 0 since the null hypothesis statement is equivalent to μ1 − μ2 = μd = 0. The standard error is sd/√nd, and the t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = 5.0 / (11.6/√10) = 1.36
This is an upper tail test and the p-value is the area above 1.36, in a t-distribution with df = 9. We see that the p-value is 0.103. Even at a 10% level, we do not find a significant difference between the two means. We do not find sufficient evidence to conclude that μ1 is greater than μ2 . 6.264 This is a matched pairs experiment, since all 29 men participate in both treatments (with and without the laptop) and we are interested in the differences. We use paired data analysis. 6.265 This is a matched pairs experiment, since all 50 men get both treatments in random order and we are looking at the differences in the results. We use paired data analysis. 6.266 Since the men are divided into two separate groups and the two groups get different treatments, this is a difference in means using two separate groups. 6.267 This is a matched pairs experiment, since the “treatment” students are each matched with a similar “control” student. We use paired data analysis. 6.268 Since there are two separate groups of men doing the rating after being randomly assigned to one of the conditions, this is a difference in means using two separate groups. 6.269 This is a matched pairs experiment since the twins are matched and we are investigating the differences between the twins. We use paired data analysis. 6.270 (a) The same 5 participants were used in both groups, with before and after blood samples drawn from each of them. This is a matched pairs experiment. (b) The mean difference is xd = 293 with a standard deviation in the differences of sd = 242 and nd = 5. We use the t-distribution with df = 4 to find t∗ = 2.13 for a 90% confidence interval. The confidence interval is: xd
± t∗ · sd/√nd
293 ± 2.13 · 242/√5
293 ± 230.5
62.5 to 523.5
The mean increase in interferon gamma production in response to a bacteria stimulus after drinking five cups of tea for a week is between 62.5 pg/mL and 523.5 pg/mL. This is quite a large spread, which is not surprising given that we only had a sample size of n = 5.
6.271 We are testing H0 : μT = μN vs Ha : μT > μN which we can also write as H0 : μD = 0 vs Ha : μd > 0, where μT is mean production after drinking tea, μN is mean production with no tea, and μd is the difference μT − μN . The original statistic is xd = 293, the standard deviation of the differences is sd = 242 and the sample size is n = 5. The null parameter is 0 and the standardized test statistic is t=
(x̄d − 0) / (sd/√nd) = (293 − 0) / (242/√5) = 2.71
The p-value is found using the t-distribution with degrees of freedom 4. The p-value is 0.027 which is less than the 5% significance level. We conclude that mean production of this disease fighting molecule is higher after drinking tea. 6.272 Using μd as the mean difference in ages of all male-female married couples, we have H0 :
μd = 0
Ha :
μd > 0
The sample statistic is x̄d = 2.829 and the null hypothesis value is 0. The standard error is

SE = sd/√nd = 4.995/√105 = 0.487

The t-test statistic is:

t = (Statistic − Null) / SE = (2.829 − 0) / 0.487 = 5.8

This is a right-tail test so the p-value is the area to the right of 5.8 in a t-distribution with df = 104. The value of 5.8 is way out in the tail, and we see that the p-value is approximately 0. This p-value is very small and certainly less than any reasonable significance level. We reject H0 . We have strong evidence that the average age of husbands is greater than the average age of wives for male-female married couples.
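A quick Python check of this paired test from the summary statistics (a sketch assuming scipy; not part of the original solution):

from math import sqrt
from scipy.stats import t

xbar_d, s_d, n_d = 2.829, 4.995, 105      # husband age minus wife age

se = s_d / sqrt(n_d)
tstat = (xbar_d - 0) / se                 # null value is 0
pvalue = t.sf(tstat, df=n_d - 1)          # right-tail test

print(round(tstat, 1), pvalue)            # about 5.8 and a p-value near zero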
6.273 The sample statistic is x̄d = 2.829. We use the t-distribution with df = 104 to see that t∗ = 1.983. The standard error is

SE = sd/√nd = 4.995/√105 = 0.487

The 95% confidence interval is:

Statistic ± t∗ · SE
2.829 ± (1.983) · (0.487)
2.829 ± 0.966
1.863 to 3.795
We are 95% confident that, on average, husbands are between 1.86 and 3.80 years older than their wives.

6.274 (a) A matched pairs design is appropriate because testosterone levels vary greatly between different men, and the matched pairs design eliminates this variability as a factor.

(b) We are testing for a difference in two means from a matched pairs experiment. Using the differences, we have H0 : μd = 0 vs Ha : μd > 0
where μd is the mean of differences, where the differences are computed as (testosterone level after salt solution − testosterone level after sniffing tears). Notice that this is a one-tail test since we are specifically testing to see if testosterone levels are reduced after sniffing female tears. The t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = 21.7 / (46.5/√50) = 3.30

Using the t-distribution with degrees of freedom 49, we find a p-value of 0.0009. This provides strong evidence to reject H0 and conclude that sniffing female tears significantly reduces testosterone levels in men.

(c) Yes, we can conclude that sniffing female tears reduces testosterone levels since the results are significant and come from a double-blind, placebo-controlled, randomized experiment.

6.275 To find a 99% confidence interval using this paired difference in means data (with degrees of freedom 49), we use t∗ = 2.68. We have
x̄d ± t∗ · sd/√nd
21.7 ± 2.68 · 46.5/√50
21.7 ± 17.6
4.1 to 39.3
We are 99% confident that the average decrease in testosterone level in a man who has sniffed female tears is between 4.1 and 39.3 pg/ml.

6.276 Because the data are paired, we compute a difference (Quiz pulse − Lecture pulse) for each pair. These differences are displayed in the table and dotplot below.

Student           1    2    3    4    5    6     7     8    9    10    Mean   Std. Dev.
Quiz − Lecture   +2   −1   +5   −8   +1   +20   +15   −4   +9   −12    2.7    9.93
The distribution of differences appears to be relatively symmetric with no clear outliers, so we can use the t-distribution. From the Quiz − Lecture differences in the table above we find that the mean difference is x̄d = 2.7 and the standard deviation of the differences is sd = 9.93. Since there are 10 students in the sample, we find t∗ = 2.262 using an area of 0.025 in the tail of a t-distribution with 10 − 1 = 9 degrees of freedom. To find the confidence interval for the mean difference in pulse rates we use

2.7 ± 2.262 · 9.93/√10 = 2.7 ± 7.1 = (−4.4, 9.8)

Based on these results, we are 95% sure that the mean pulse rate for students during a quiz is between 4.4 beats less and 9.8 beats more than the mean pulse rate during lecture.
6.277 Because the data are paired, we compute a difference (Quiz pulse − Lecture pulse) for each pair. These differences are displayed in the table and dotplot below.

Student           1    2    3    4    5    6     7     8    9    10    Mean   Std. Dev.
Quiz − Lecture   +2   −1   +5   −8   +1   +20   +15   −4   +9   −12    2.7    9.93
The distribution of differences appears to be relatively symmetric with no clear outliers, so we can use the t-distribution. From the Quiz − Lecture differences in the table above we find that the mean difference is x̄d = 2.7 and the standard deviation of the differences is sd = 9.93. We can express the hypotheses as H0 : μQ = μL vs Ha : μQ > μL , where μQ and μL are the mean pulse rates during a quiz and lecture, respectively. Equivalently, we can use H0 : μd = 0 vs Ha : μd > 0, where μd is the mean difference of quiz minus lecture pulse rates. These two ways of expressing the hypotheses are equivalent, since μd = μQ − μL , and either way of expressing the hypotheses is acceptable. From the differences and summary statistics, we compute a t-statistic with

t = (x̄d − 0) / (sd/√nd) = 2.7 / (9.93/√10) = 0.86

To find the one-tailed p-value for this t-statistic we use the area above 0.86 in a t-distribution with 10 − 1 = 9 degrees of freedom. This gives a p-value of 0.206 which is not significant at a 5% level. Thus the data from these 10 students do not provide convincing evidence that mean quiz pulse rate is higher than the mean pulse rate during lecture.

6.278 Letting μS and μO represent mean enjoyment rating with the spoiler and for the original version, respectively, the hypotheses are: H0 :
μS = μO
Ha :
μS ≠ μO
To find the t-statistic, we find the differences for all 12 stories, using the spoiler rating minus the rating for the original. Story With spoiler Original Difference
1 4.7 3.8 0.9
2 5.1 4.9 0.2
3 7.9 7.4 0.5
4 7.0 7.1 -0.1
5 7.1 6.2 0.9
6 7.2 6.1 1.1
7 7.1 6.7 0.4
8 7.2 7.0 0.2
9 4.8 4.3 0.5
10 5.2 5.0 0.2
11 4.6 4.1 0.5
12 6.7 6.1 0.6
We compute the summary statistics for the differences: x̄d = 0.492 with sd = 0.348 and nd = 12. The t-test statistic is:

t = (x̄d − 0) / (sd/√nd) = 0.492 / (0.348/√12) = 4.90

Using the t-distribution with degrees of freedom 11, we find the area above 4.90 to be 0.0002. This is a two-tailed test so the p-value is 2(0.0002) = 0.0004. This provides strong evidence to reject H0 and conclude that there is a difference in enjoyment rating of the short stories based on whether or not there is a spoiler. From the data, we see, surprisingly, that stories with a spoiler are preferred.
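The same matched pairs test can be reproduced in Python from the twelve differences in the table (a sketch, assuming scipy is available; a one-sample t-test applied to the differences is equivalent to the paired t-test):

from scipy.stats import ttest_1samp

# Spoiler rating minus original rating for the 12 stories
d = [0.9, 0.2, 0.5, -0.1, 0.9, 1.1, 0.4, 0.2, 0.5, 0.2, 0.5, 0.6]

result = ttest_1samp(d, popmean=0)        # two-sided by default
print(result.statistic, result.pvalue)    # about 4.90 and 0.0005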
6.279 (a) The data are matched by which story it is, so a matched pairs analysis is appropriate. The matched pairs analysis is particularly important here because there is a great deal of variability between the enjoyment level of the different stories. (b) We first find the differences for all 12 stories, using the spoiler rating minus the rating for the original. Story With spoiler Original Difference
1 4.7 3.8 0.9
2 5.1 4.9 0.2
3 7.9 7.4 0.5
4 7.0 7.1 −0.1
5 7.1 6.2 0.9
6 7.2 6.1 1.1
7 7.1 6.7 0.4
8 7.2 7.0 0.2
9 4.8 4.3 0.5
10 5.2 5.0 0.2
11 4.6 4.1 0.5
12 6.7 6.1 0.6
We compute the summary statistics for the differences: x̄d = 0.492 with sd = 0.348 and nd = 12. Using a t-distribution with df = 11 for a 95% confidence level, we have t∗ = 2.20. We find the 95% confidence interval:

x̄d ± t∗ · sd/√nd
0.492 ± 2.20 · 0.348/√12
0.492 ± 0.221
0.271 to 0.713
We are 95% sure that mean enjoyment rating of a version of a story with a spoiler is between 0.271 and 0.713 higher than the mean enjoyment rating of the original story without it. 6.280 Since each baby received both the speech and humming treatment, this is a matched pairs design, and since we are testing to see if there is a preference this is a one-tail test, H0 : μd = 0 vs Ha : μd > 0, where μd is the mean difference (speaking minus humming). The matched pairs test statistic is t=
(Statistic − Null value) / SE = (x̄d − 0) / (sd/√nd) = (27.79 − 0) / (63.18/√50) = 3.110
We find the proportion in the right tail beyond 3.110 of a t-distribution with 49 degrees of freedom, giving us a p-value of 0.0015. This is a small p-value and we reject the null hypothesis and conclude that we do have evidence to conclude that babies prefer speech to humming.

6.281 Since each baby received both the speech and singing treatment this is a matched pairs design, and since we are not testing a specific direction this is a two-tailed test, H0 : μd = 0 vs Ha : μd ≠ 0, where μd is the mean difference (speaking minus singing). The matched pairs test statistic is
t = (Statistic − Null value) / SE = (x̄d − 0) / (sd/√nd) = (10.39 − 0) / (55.37/√48) = 1.300
We find the proportion in the right tail beyond 1.300 of a t-distribution with 47 degrees of freedom to be 0.100. This is a two-tail test, so the p-value is 0.200. With such a large p-value we fail to reject the null hypothesis and conclude that we don't have enough evidence to say that babies prefer speaking over singing or vice-versa.

6.282 (a) We are testing specifically if wrinkled hands are faster so H0 : μd = 0 vs Ha : μd < 0, where μd is the mean difference in moving times for dry objects (wrinkled minus unwrinkled). The test statistic is

t = (Statistic − Null value) / SE = (x̄d − 0) / (sd/√nd) = (0.85 − 0) / (11.5/√20) = 0.33
Using the lower tail of a t-distribution with 19 df, we find the p-value to be 0.627. So we do not have evidence that wrinkled hands are significantly faster at moving dry objects (in fact in our sample they were slightly slower).

(b) Again we are testing specifically if wrinkled hands are faster so H0 : μd = 0 vs Ha : μd < 0, where μd is the mean difference in moving times for wet objects (wrinkled minus unwrinkled). The test statistic now is

t = (Statistic − Null value) / SE = (x̄d − 0) / (sd/√nd) = (−15.1 − 0) / (13.4/√20) = −5.04

Using the lower tail of a t-distribution with 19 df, we find the p-value to be 0.000036. So we do have very strong evidence that wrinkled hands are significantly faster at moving wet objects, possibly indicating that hands adapt to being submerged in water!

6.283 (a) This is a difference in means test with separate samples. We first compute the summary statistics and see that, for the first quiz, n1 = 10 with x̄1 = 78.6 and s1 = 13.4 and, for the second quiz, n2 = 10 with x̄2 = 84.0 and s2 = 8.1. To conduct the test for the difference in means, the hypotheses are H0 : μ1 = μ2 vs Ha : μ1 < μ2 ,
where μ1 is the average grade for all students on the first quiz and μ2 is the average grade for all students on the second quiz. The test statistic is

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = (78.6 − 84.0) / √(13.4²/10 + 8.1²/10) = −1.09

Using a t-distribution with df = 9, we see that the p-value for this lower tail test is 0.152. This is not significant, even at a 10% level. We do not reject H0 and do not find convincing evidence that the grades on the second quiz are higher.

(b) This is a paired difference in means test. We begin by finding the differences: first quiz − second quiz.

−6    −1    −16    −2    0    3    −12    −2    −5    −13
The summary statistics for these 10 differences are nd = 10, xd = −5.4 with sd = 6.29. The hypotheses are the same as for part (a) and the t-statistic is t=
x̄d / (sd/√nd) = −5.4 / (6.29/√10) = −2.71
Using a t-distribution with df = 9, we see that the p-value for this lower-tail test is 0.012. This is significant at a 5% level and almost at even a 1% level. We reject H0 and find evidence that mean grades on the second quiz are higher. (c) The spread of the grades is very large on both quizzes, so the high variability makes it hard to find a difference in means with the separate samples. Once we know that the data are paired, it eliminates the variability between people. In this case, it is much better to collect the data using a matched pairs design. 6.284 We start by creating a new variable to contain the differences, Sleep2 − Sleep1, for each student. The data have 3 cases with missing sleep values, so there are 454 valid differences which have a mean of 2.31
hours and standard deviation of 1.84 hours. Using a t-distribution with 453 degrees of freedom and 95% confidence we find t∗ = 1.965. Thus we have

2.31 ± 1.965 · 1.84/√454
2.31 ± 0.17
2.14 to 2.48
We are 95% sure that Pennsylvania high school seniors (who do the Census at Schools project) report averaging somewhere between 2.14 and 2.48 hours more of sleep on non-school nights than on school nights. 6.285 We have paired data, HwyMPG and CityMPG, for each of the 110 car models, so we find the difference for each model and then construct the confidence interval for the mean difference. Output is given below Estimation for Paired Difference Difference: mean of (HwyMPG - CityMPG)
Mean      StDev    SE Mean    95% CI for Mean difference
17.809    4.306    0.411      (16.995, 18.623)
Based on the output we are 95% sure that the mean highway gas rating is between 17.0 and 18.6 mpg more than for city driving.
Section 7.1 Solutions

7.1 The expected count in each category is n · pi = 500(0.25) = 125. See the table.

Category Expected count
1 125
2 125
3 125
4 125
7.2 Since the categories are equally likely and there are three of them, the null proportion for each is pi = 1/3. The expected count in each category is n · pi = 1200(1/3) = 400. See the table. Category Expected count
A 400
B 400
C 400
7.3 The expected count in category A is n · pA = 200(0.50) = 100. The expected count in category B is n · pB = 200(0.25) = 50. The expected count in category C is n · pC = 200(0.25) = 50. See the table. Category Expected count
A 100
B 50
C 50
7.4 The expected count in category 1 is n · p1 = 400(0.7) = 280. The expected count in category 2 is n · p2 = 400(0.1) = 40. The expected count in category 3 is n · p3 = 400(0.1) = 40. The expected count in category 4 is n · p4 = 400(0.1) = 40. See the table. Category Expected count
1 280
2 40
3 40
4 40
7.5 We calculate the chi-square statistic using the observed and expected counts.

χ² = Σ (observed − expected)²/expected = (32 − 40)²/40 + (53 − 40)²/40 + (35 − 40)²/40 = 1.6 + 4.225 + 0.625 = 6.45
There are three categories (A, B, and C) for this categorical variable, so we use a chi-square distribution with degrees of freedom equal to 2. The p-value is the area in the upper tail, which we see is 0.0398. 7.6 We calculate the chi-square statistic using the observed and expected counts. χ2
= Σ (observed − expected)²/expected
= (35 − 50)²/50 + (54 − 50)²/50 + (61 − 50)²/50
= 4.5 + 0.32 + 2.42
= 7.24
There are three categories (A, B, and C) for this categorical variable, so we use a chi-square distribution with degrees of freedom equal to 2. The p-value is the area in the upper tail, which we see is 0.0268.
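A quick Python check of a calculation like this (a sketch assuming scipy; not part of the original solution), using the counts from Exercise 7.6:

from scipy.stats import chisquare

observed = [35, 54, 61]
expected = [50, 50, 50]

stat, pvalue = chisquare(observed, f_exp=expected)
print(stat, pvalue)    # about 7.24 and 0.027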
7.7 We calculate the chi-square statistic using the observed and expected counts.

χ² = Σ (observed − expected)²/expected = (181 − 160)²/160 + (132 − 160)²/160 + (45 − 40)²/40 + (42 − 40)²/40 = 2.76 + 4.90 + 0.63 + 0.10 = 8.38
There are four categories (A, B, C, and D) for this categorical variable, so we use a chi-square distribution with degrees of freedom equal to 3. The p-value is the area in the upper tail, which we see is 0.039. 7.8 We calculate the chi-square statistic using the observed and expected counts. χ2
= Σ (observed − expected)²/expected
= (38 − 30)²/30 + (55 − 60)²/60 + (79 − 90)²/90 + (128 − 120)²/120
= 2.13 + 0.42 + 1.34 + 0.53
= 4.43
There are four categories (A, B, C, and D) for this categorical variable, so we use a chi-square distribution with degrees of freedom equal to 3. The p-value is the area in the upper tail beyond 4.43, or about 0.219. 7.9
(a) The sample size is n = 160 and the hypothesized proportion is pB = 0.25, so the expected count is n · pB = 160(0.25) = 40.
(b) For the "B" cell we have (observed − expected)²/expected = (36 − 40)²/40 = 0.4.
(c) The table has k = 4 cells, so the chi-square distribution has 4 − 1 = 3 degrees of freedom.

7.10
(a) The sample size is n = 500 and the hypothesized proportion is pB = 0.25, so the expected count is n · pB = 500(0.25) = 125.
(b) For the "B" cell we have (observed − expected)²/expected = (148 − 125)²/125 = 4.232.
(c) The table has k = 4 cells, so the chi-square distribution has 4 − 1 = 3 degrees of freedom.

7.11
(a) Add the counts in the table to find the sample size is n = 210 + 732 + 396 + 125 + 213 + 324 = 2000. The hypothesized proportion is pB = 0.35, so the expected count is n · pB = 2000(0.35) = 700.
(b) For the "B" cell we have (observed − expected)²/expected = (732 − 700)²/700 = 1.46.
CHAPTER 7
414
(c) The table has k = 6 cells, so the chi-square distribution has 6 − 1 = 5 degrees of freedom. 7.12
(a) Add the counts in the table to find the sample size is n = 132 + 468 = 600. The hypothesized proportion is pB = 0.8, so the expected count is n · pB = 600(0.8) = 480.
(b) For the "B" cell we have (observed − expected)²/expected = (468 − 480)²/480 = 0.30.
(c) The table has just k = 2 cells, so the chi-square distribution has 2 − 1 = 1 degree of freedom.

7.13 The null hypothesis is that the proportion of people using each of the four options is equal, while the alternative hypothesis is that they are not equal. We compute the expected counts using n · pi, where n = 160 and the assumed proportions are p1 = p2 = p3 = p4 = 0.25. For example, the expected count for DoorDash is n · p1 = 160 · 0.25 = 40. Similarly, we see that all four expected counts are 40. For each of the four options, we compute (Observed − Expected)²/Expected, and the results are shown in the "Contribution" column in the table.

App            DoorDash   Grubhub   UberEats   Other
Observed          44         43        40        33
Expected          40         40        40        40
Contribution    0.400      0.225       0        1.225
Adding up all the contributions, we obtain a χ2 statistic of 1.850. There are four categories, so we use a chi-square distribution with 3 degrees of freedom, and find the area in the right-tail beyond 1.850. We find a p-value of 0.604. This is a large p-value and does not provide any evidence that the proportions are not equally likely. 7.14
(a) The null hypothesis is that the assumed proportions are correct, while the alternative hypothesis is that they are not correct. We compute the expected counts using n · pi, where n = 5204 and the assumed proportions are p1 = 0.25, p2 = 0.25, and p3 = 0.50. For example, the expected count for 'Works on campus' is n · p1 = 5204 · 0.25 = 1301. Similarly, we compute the rest of the expected counts. For each of the three options, we compute (Observed − Expected)²/Expected, and the results are shown in the "Contribution" column in the table.

Paying job?     Works on campus   Works off campus   Does not work
Observed             1436               1119              2649
Expected             1301               1301              2602
Contribution        14.01              25.46              0.85
Adding up all the contributions, we obtain a χ2 statistic of 40.32. There are three categories, so we use a chi-square distribution with 2 degrees of freedom, and find the area in the right-tail beyond 40.32. This statistic is very far out in the tail, and we obtain a p-value of approximately zero. This is a very small p-value and provides very strong evidence that the assumed proportions are not correct. (b) We see that the category of students working off campus provides the greatest contribution, and the observed count is less than we expect from the assumed proportions.
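When the null proportions are not all equal, the expected counts are n times the hypothesized proportions. Here is a short sketch (ours, assuming scipy) using the counts from Exercise 7.14.

import numpy as np
from scipy import stats

observed = np.array([1436, 1119, 2649])     # on campus, off campus, does not work
null_props = np.array([0.25, 0.25, 0.50])
expected = observed.sum() * null_props       # 1301, 1301, 2602

stat, p = stats.chisquare(observed, f_exp=expected)
print(stat, p)   # chi-square statistic about 40.3, p-value essentially zero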
7.15
(a) Let pg, po, pp, pr, and py be the proportion of people who choose each of the respective flavors. If all flavors are equally popular (1/5 each) the hypotheses are

H0 : pg = po = pp = pr = py = 0.2
Ha : Some pi ≠ 0.2

(b) If they were equally popular we would have 66(1/5) = 13.2 people in each category.
(c) Since we have 5 categories we have 4 degrees of freedom.
(d) We calculate the test statistic.

χ² = (18 − 13.2)²/13.2 + (9 − 13.2)²/13.2 + (15 − 13.2)²/13.2 + (13 − 13.2)²/13.2 + (11 − 13.2)²/13.2
   = 1.75 + 1.34 + 0.25 + 0.00 + 0.37
   = 3.70
(e) The test statistic 3.70 compared to a chi-square distribution with 4 degrees of freedom yields a p-value of 0.449. We fail to reject the null hypothesis, meaning the data don't provide significant evidence that some Skittles flavors are more popular than others.

7.16 If we let pr, pp, and ps represent the proportions of rock, paper, and scissors choices, the hypotheses are

H0 : pr = pp = ps = 1/3
Ha : Some pi ≠ 1/3
The expected count is 119(1/3) = 39.7 for each cell. The chi-square statistic is

χ² = (66 − 39.7)²/39.7 + (39 − 39.7)²/39.7 + (14 − 39.7)²/39.7
   = 17.4 + 0.01 + 16.6
   = 34.01
The test statistic, χ² = 34.01, lies very far in the tail of a chi-square distribution with 2 degrees of freedom, so the p-value is very close to zero. This gives strong evidence that the choices made on the first turn of a rock-paper-scissors game are not all equally likely. Comparing the expected counts to the observed counts, it appears that "rock" is used more often and "scissors" less frequently than expected. Unless your opponent has also looked at this study, it might be smart to start with paper.

7.17 Let pg, pc, and pw represent the proportion of people who prefer chicken cooked on gas, charcoal, and wood pellet grills, respectively. The hypotheses to test are

H0 : pg = pc = pw = 1/3
Ha : Some pi ≠ 1/3

For a sample of size n = 114, the expected counts for each group are 114 · 1/3 = 38. The chi-square statistic for this sample is

χ² = (39 − 38)²/38 + (34 − 38)²/38 + (41 − 38)²/38 = 0.684

Using the upper tail of a chi-square distribution with 2 df, the p-value is 0.710. This is not a small p-value so we do not reject H0. There is not enough evidence in this sample to find a preference for chicken cooked on one type of grill over another.
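Every p-value in this section is an upper-tail area of a chi-square distribution, which statistical software computes directly. A minimal sketch (ours, assuming scipy), checking the value quoted in Exercise 7.17:

from scipy import stats

chi2_stat = 0.684        # grill-preference data, 3 categories
df = 3 - 1
print(stats.chi2.sf(chi2_stat, df))   # upper-tail area, about 0.710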
7.18 The null hypothesis is that the proportion of people using each sleep position is as stated on the website, while the alternative hypothesis is that at least one of the proportions is not as stated. We compute the expected counts using n · pi, so, for example, the expected count for the fetal position is 1000 · 0.41 = 410. The other expected counts are shown in the table below. For each of the five options, we compute (observed − expected)²/expected, and the results are shown in the "Contribution" column in the table.

Sleep position   Fetal   Side, legs straight   Back   Stomach   None
Observed          391            257            156      89      107
Expected          410            280            130      70      110
Contribution     0.880          1.889           5.2     5.157    0.082
Adding up all the contributions, we obtain a χ2 statistic of 13.208. Compared to a chi-square distribution with 4 degrees of freedom we get a p-value of 0.010. We conclude that the proportions appear to be slightly different than those stated on the website. In particular, more people than expected appear to start sleeping on their back or stomach.

7.19 This is a chi-square goodness-of-fit test.
(a) We see that the number of boys diagnosed with ADHD is 6880 + 7982 + 9161 + 8945 = 32,968.
(b) The expected count for January to March is n · pi = 32,968(0.244) = 8044.2. We find the other expected counts similarly, shown in the table below.

Birth date             Jan–Mar   Apr–Jun   Jul–Sep   Oct–Dec
Observed                 6880      7982      9161      8945
Expected               8044.2    8505.7    8472.8    7945.3
Contribution to χ²      168.5      32.2      55.9     125.8
(c) The contribution to the chi-square statistic for the January to March cell is

(observed − expected)²/expected = (6880 − 8044.2)²/8044.2 = 168.5

This number and the contribution for each of the other cells, computed similarly, are shown in the table above, and the chi-square statistic is the sum: χ² = 168.5 + 32.2 + 55.9 + 125.8 = 382.4.
(d) Since there are four categories, one for each quarter of the year, the degrees of freedom is 4 − 1 = 3. The chi-square test statistic is very large (way out in the far reaches of the tail of the χ²-distribution) so the p-value is essentially zero.
(e) There is very strong evidence that the distribution of ADHD diagnoses for boys differs from the proportions of births in each quarter. By comparing the observed and expected counts we see that younger children in a classroom (Oct–Dec) are diagnosed more frequently than we expect, while older children in a class (Jan–Mar) are diagnosed less frequently.

7.20 This is a chi-square goodness-of-fit test.
(a) We see that the number of girls diagnosed with ADHD is 1960 + 2358 + 2859 + 2904 = 10,081.
(b) The expected count for January to March is n · pi = 10,081(0.243) = 2449.7. We find the other expected counts similarly, shown in the table below.

Birth date             Jan–Mar   Apr–Jun   Jul–Sep   Oct–Dec
Observed                 1960      2358      2859      2904
Expected               2449.7    2600.9    2590.8    2439.6
Contribution to χ²       97.9      22.7      27.8      88.4
(c) The contribution to the chi-square statistic for the January to March cell is

(observed − expected)²/expected = (1960 − 2449.7)²/2449.7 = 97.9

This number and the contribution for each of the other cells, computed similarly, are shown in the table above, and the chi-square statistic is the sum: χ² = 97.9 + 22.7 + 27.8 + 88.4 = 236.8.
(d) Since there are four categories, one for each quarter of the year, the degrees of freedom is 4 − 1 = 3. The chi-square test statistic is very large (way out in the far reaches of the tail of the χ²-distribution) so the p-value is essentially zero.
(e) There is very strong evidence that the distribution of ADHD diagnoses for girls differs from the proportions of births in each quarter. By comparing the observed and expected counts we see that younger children in a classroom (Oct–Dec) are diagnosed more frequently than we expect, while older children in a class (Jan–Mar) are diagnosed less frequently.

7.21 Let p1, p2, p3, and p4 be the proportion of hockey players born in the 1st, 2nd, 3rd, and 4th quarter of the year, respectively. We are testing

H0 : p1 = 0.237, p2 = 0.259, p3 = 0.259, and p4 = 0.245
Ha : Some pi is not specified as in H0

The total sample size is n = 147 + 110 + 52 + 50 = 359. The expected counts are 359(0.237) = 85 for Qtr 1, 359(0.259) = 93 for Qtr 2, 359(0.259) = 93 for Qtr 3, and 359(0.245) = 88 for Qtr 4. The chi-square statistic is

χ² = (147 − 85)²/85 + (110 − 93)²/93 + (52 − 93)²/93 + (50 − 88)²/88 = 82.6

We use the chi-square distribution with 4 − 1 = 3 degrees of freedom, which gives a very small p-value that is essentially zero. This is strong evidence that the distribution of birthdates for OHL hockey players differs significantly from the national proportions.

7.22 The sample size is n = 196 + 162 + 137 + 122 = 617. We multiply this sample size by the national proportion in each quarter to get the expected counts of 153.0, 154.9, 156.7, and 152.4, respectively. We see that the actual number of athletes' birthdates is higher than expected for the first six months (two quarters) of the year and lower than expected for the second six months. For the hypotheses for the test, we can use the proportions given or give a more general null and alternative hypothesis, such as:
H0 : The proportions of athletes born in each quarter match the proportions nationally
Ha : Some proportion for athletes is different from the national proportion
We calculate the chi-square statistic using the observed and expected counts.

χ² = Σ (observed − expected)²/expected
   = (196 − 153.0)²/153.0 + (162 − 154.9)²/154.9 + (137 − 156.7)²/156.7 + (122 − 152.4)²/152.4
   = 12.08 + 0.33 + 2.48 + 6.06
   = 20.95
There are four categories, so we use a chi-square distribution with degrees of freedom equal to 3. The p-value is the area in the upper tail beyond 20.95, which we see is 0.0001. There is strong evidence that birthdates of athletes do not match the national distribution of birthdates. Being born early in the year appears to significantly increase the likelihood of growing up to play in the Australian Football League, while being born late in the year appears to decrease the likelihood. (This same effect has been found in European soccer and Canadian hockey.) 7.23
(a) A χ2 goodness-of-fit test was most likely done.
(b) Since the results are given as statistically significant, the χ2 -statistic is likely to be large. (c) Since the results are given as statistically significant, the p-value is likely to be small. (d) The categorical variable appears to record the number of deaths due to medication errors in different months at hospitals. (e) The cell giving the number of deaths in July appears to contribute the most to the χ2 -statistic. (f) In July, the observed count is probably much higher than the expected count. 7.24
(a) Since the results are given as statistically significant, the χ2 -statistic is likely to be large.
(b) Since the results are given as statistically significant, the p-value is likely to be small.
(c) In the week before the festival, the expected count is higher than the observed count. This tells us that some elderly people may be able to delay death.
(d) For the week before the festival, the contribution to the χ2 statistic is

(observed − expected)²/expected = (33 − 50.82)²/50.82 = 6.249

(e) In the week after the festival, the observed count is higher than the expected count. This tells us that, although some elderly people are able to delay death, they don't delay it for very long.
(f) For the week after the festival, the contribution to the χ2 statistic is

(observed − expected)²/expected = (70 − 52)²/52 = 6.231

(g) The control group allows us to attribute the difference specifically to the meaningful event (the Harbor Moon Festival) since the effect was only seen in the group who found this event meaningful.

7.25
(a) There are 6 actors and we are testing for a difference in popularity. The null hypothesis is that each of the proportions is 1/6 while the alternative hypothesis is that at least one of the proportions is not 1/6. The sample size is 98 + 5 + 23 + 9 + 25 + 51 = 211, so the expected count for each actor is Expected count for each actor = n · pi = 211(1/6) = 35.2
The chi-square test statistic calculated using the observed data and these expected counts is

χ² = Σ (observed − expected)²/expected
   = (98 − 35.2)²/35.2 + (5 − 35.2)²/35.2 + (23 − 35.2)²/35.2 + (9 − 35.2)²/35.2 + (25 − 35.2)²/35.2 + (51 − 35.2)²/35.2
   = 112.3 + 25.9 + 4.2 + 19.5 + 3.0 + 7.1
   = 172.0
This chi-square statistic gives a very small p-value of essentially zero when compared to a chi-square distribution with 5 degrees of freedom. There is strong evidence of a difference in the popularity of the James Bond actors.
(b) If we eliminate one actor, the null hypothesis is that each of the proportions is 1/5 while the alternative hypothesis is that at least one of the proportions is not 1/5. The sample size without the 5 people who selected George Lazenby is 98 + 23 + 9 + 25 + 51 = 206, so the expected count for each actor is

Expected count for each actor = n · pi = 206(1/5) = 41.2

The chi-square statistic calculated using the observed data (without Lazenby) and these expected counts is

χ² = Σ (observed − expected)²/expected
   = (98 − 41.2)²/41.2 + (23 − 41.2)²/41.2 + (9 − 41.2)²/41.2 + (25 − 41.2)²/41.2 + (51 − 41.2)²/41.2
   = 78.3 + 8.0 + 25.2 + 6.4 + 2.3
   = 120.2
This is still a very large χ2 -statistic and we again have a p-value of essentially zero when it is compared to a chi-square with 4 degrees of freedom. Even with the Lazenby data omitted, we still find substantial differences in the proportions of fans who choose the different James Bond actors. (c) No, we should not generalize the results from this online survey to a population of all movie watchers. This poll was a volunteer poll completed by people visiting a James Bond fan site. This is definitely not a random sample of the movie watching population and could easily be biased. The best inference we could hope for is to generalize to people who visit a James Bond fan website and who participate in online polls. 7.26
(a) We see in the bottom row of the output that n = 436. (We could also add the observed counts.)
(b) We see in the output that the observed value for RR is 130 and the expected count is 109. (c) The contribution to the χ2 statistic is the highest for those with the XX variant, which contributes 7.716. For this category, the observed count (80) is less than expected (109). (d) We see in the bottom row of the output that df=2. (We could also compute df using 3 categories minus 1.) (e) We see in the bottom row of the output that the p-value is 0.002. This is a small p-value and provides evidence that at least one of the proportions of the three variants differ from the values of 0.25, 0.5, and 0.25, respectively.
7.27
(a) The null hypothesis is H0 : pR = pX = 0.5 and the alternative hypothesis is that at least one of the proportions is not 0.5. The expected count in each category is n · pi = 436(0.5) = 218. The chi-square statistic is

χ² = (192 − 218)²/218 + (244 − 218)²/218 = 3.10 + 3.10 = 6.20

Using a χ2 distribution with df = 1, we obtain a p-value of 0.0128. This gives evidence at a 5% level that these two genetic variations are not equally likely.
(b) The null hypothesis is H0 : p = 0.5 and the alternative hypothesis is Ha : p ≠ 0.5, where p represents the proportion classified R. (Note that the test would give the same results if we used the proportion classified X.) The sample statistic is p̂ = 244/436 = 0.5596. The test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.5596 − 0.5)/√(0.5(0.5)/436) = 2.490
Using a standard normal distribution, we see that the area above 2.490 is 0.0064. This is a two-tail test, so the p-value is 2(0.0064) = 0.0128. This gives evidence at a 5% level that these two genetic variations are not equally likely. (c) The p-values are equal and the conclusions are identical (and the χ2 -statistic is the square of the z-statistic.) 7.28 The null hypothesis is that the superpowers are equally likely to be chosen, while the alternative hypothesis is that they are not equally likely. We can use statistical software to obtain output similar to that given below, or we can compute the results ourselves. We compute the expected counts using n · pi , where n = 453 and, since there are five categories and we are assuming equally likely, the assumed proportions are all pi = 0.2. For example, the expected count for Fly is n · p1 = 453 · 0.2 = 90.6. Similarly, we compute the rest of the expected counts. For each of the five options, we compute (Observed − Expected)2 /Expected, and the results are shown in the “Contribution” column in the output. Chi-Square Goodness-of-Fit Test for Categorical Variable: Superpower
Category         Observed   Test Proportion   Expected   Contribution to Chi-Sq
Fly                 120           0.2             90.6          9.5404
Freeze time         129           0.2             90.6         16.2755
Invisibility         80           0.2             90.6          1.2402
Super strength       19           0.2             90.6         56.5845
Telepathy           105           0.2             90.6          2.2887

  N   N*   DF    Chi-Sq   P-Value
453    4    4   85.9294     0.000
Adding up all the contributions, we obtain a χ2 statistic of 85.9294. There are five categories, so we use a chi-square distribution with 4 degrees of freedom, and find the area in the right-tail beyond 85.9294. This statistic is very far out in the tail, and we obtain a p-value of approximately zero. This is a very small p-value and provides very strong evidence that the superpowers are not equally popular. We see that, in the sample, freezing time was the most popular superpower and having super strength was the least popular.
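Output like the table above is easy to reproduce; the rough sketch below (ours, assuming scipy) uses the superpower counts from Exercise 7.28.

import numpy as np
from scipy import stats

counts = np.array([120, 129, 80, 19, 105])   # Fly, Freeze time, Invisibility, Super strength, Telepathy
expected = np.full(5, counts.sum() / 5)       # 453/5 = 90.6 in each category under equal popularity

print((counts - expected) ** 2 / expected)    # per-category contributions
print(stats.chisquare(counts))                # statistic about 85.93, p-value near 0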
7.29 The null hypothesis is that the seasons are equally likely to be chosen, while the alternative hypothesis is that they are not equally likely. We can use statistical software to obtain output similar to that given below, or we can compute the results ourselves. We compute the expected counts using n · pi , where n = 454 and, since there are four categories and we are assuming equally likely, the assumed proportions are all pi = 0.25. For example, the expected count for Winter is n · p1 = 454 · 0.25 = 113.5. Similarly, we compute the rest of the expected counts. For each of the four options, we compute (Observed − Expected)2 /Expected, and the results are shown in the “Contribution” column in the output. Chi-Square Goodness-of-Fit Test for Categorical Variable: Season
Category   Observed   Test Proportion   Expected   Contribution to Chi-Sq
Winter         59           0.25           113.5         26.1696
Spring         78           0.25           113.5         11.1035
Summer        157           0.25           113.5         16.6718
Fall          160           0.25           113.5         19.0507

  N   N*   DF    Chi-Sq   P-Value
454    3    3   72.9956     0.000
Adding up all the contributions, we obtain a χ2 statistic of 73.0. There are four categories, so we use a chi-square distribution with 3 degrees of freedom, and find the area in the right-tail beyond 73.0. This statistic is very far out in the tail, and we obtain a p-value of approximately zero. This is a very small p-value and provides very strong evidence that the seasons are not equally popular. We see that, in the sample, Fall and Summer are the most popular seasons while Winter is the least popular.

7.30
(a) The null hypothesis is that the assumed proportions are correct, while the alternative hypothesis is that they are not correct. We can use statistical software to obtain output similar to that given below, or we can compute the results ourselves. We compute the expected counts using n · pi , where n = 451 and we are told p1 = 0.1, p2 = 0.7, p3 = 0.1, and p4 = 0.1. For example, the expected count for Famous is n · p1 = 451 · 0.1 = 45.1. Similarly, we compute the rest of the expected counts. For each of the four options, we compute (Observed − Expected)2 /Expected, and the results are shown in the “Contribution” column in the output. Chi-Square Goodness-of-Fit Test for Categorical Variable: Preference
Category   Observed   Test Proportion   Expected   Contribution to Chi-Sq
Famous         19           0.1             45.1         15.1044
Happy         298           0.7            315.7          0.9924
Healthy        44           0.1             45.1          0.0268
Rich           90           0.1             45.1         44.7009

  N   N*   DF    Chi-Sq   P-Value
451    6    3   60.8245     0.000
Adding up all the contributions, we obtain a χ2 statistic of 60.8245. There are four categories, so we use a chi-square distribution with 3 degrees of freedom, and find the area in the right-tail beyond 60.8245. This statistic is very far out in the tail, and we obtain a p-value of approximately zero. This is a very small p-value and provides very strong evidence that the assumed proportions are not correct.
(b) We see that far fewer than expected want to be famous and far more than expected want to be rich. (The assumed proportions were pretty accurate for Happy and Healthy.) 7.31
(a) The null hypothesis is that the assumed proportions are correct, while the alternative hypothesis is that they are not correct. We can use statistical software to obtain output similar to that given below, or we can compute the results ourselves. We compute the expected counts using n · pi , where n = 455 and we are told p1 = 0.4, p2 = 0.4, p3 = 0.1, and p4 = 0.1. For example, the expected count for ‘A lot’ is n · p1 = 455 · 0.4 = 182.0. Similarly, we compute the rest of the expected counts. For each of the four options, we compute (Observed − Expected)2 /Expected, and the results are shown in the “Contribution” column in the output. Chi-Square Goodness-of-Fit Test for Categorical Variable: SchoolPressure
Category      Observed   Test Proportion   Expected   Contribution to Chi-Sq
A lot            184           0.4            182.0          0.0220
Some             200           0.4            182.0          1.7802
Very little       55           0.1             45.5          1.9835
None              16           0.1             45.5         19.1264

  N   N*   DF    Chi-Sq   P-Value
455    2    3   22.9121     0.000
Adding up all the contributions, we obtain a χ2 statistic of 22.9121. There are four categories, so we use a chi-square distribution with 3 degrees of freedom, and find the area in the right-tail beyond 22.9121. This statistic is far out in the tail, and we obtain a p-value of, to three decimal places, 0.000. This is a very small p-value and provides very strong evidence that the assumed proportions are not correct.
(b) We see that the only category that contributes significantly to the chi-square statistic is 'None'. Far fewer than expected feel no pressure from schoolwork. (The assumed proportions were pretty accurate for the other categories.)

7.32 According to Benford's Law the hypotheses are

H0 : p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079, p6 = 0.067, p7 = 0.058, p8 = 0.051, p9 = 0.046
Ha : At least one of the proportions is different from Benford's law
Here is a table of observed counts for the addresses and expected counts using the Benford proportions and a sample size of 1188. Digit Observed Expected
1 345 357.6
2 197 209.2
3 170 148.4
4 126 115.1
5 101 94.1
6 72 79.5
7 69 68.9
8 51 60.8
9 57 54.4
The value of the chi-square statistic is

χ² = Σ (observed − expected)²/expected = (345 − 357.6)²/357.6 + ... + (57 − 54.4)²/54.4 = 8.24
We find the p-value = 0.41 using the upper tail beyond 8.24 of a chi-square distribution with 9 − 1 = 8 degrees of freedom. This is not a small p-value so we do not have convincing evidence that the first digits of street addresses in a phone book do not follow Benford's law. Note that this doesn't prove Benford's law in this situation; we only have a lack of evidence against it.

7.33 According to Benford's Law the hypotheses are

H0 : p1 = 0.301, p2 = 0.176, p3 = 0.125, p4 = 0.097, p5 = 0.079, p6 = 0.067, p7 = 0.058, p8 = 0.051, p9 = 0.046
Ha : At least one of the proportions is different from Benford's law
Here is a table of observed counts for the invoices and expected counts using the Benford proportions and a sample size of 7273. Digit Observed Expected
1 2225 2189.4
2 1214 1280.7
3 881 908.7
4 639 704.8
5 655 575.9
6 532 486.9
7 433 421.8
8 362 372.0
9 332 332.8
The value of the chi-square statistic is

χ² = Σ (observed − expected)²/expected = (2225 − 2189.4)²/2189.4 + ... + (332 − 332.8)²/332.8 = 26.66

We find the p-value = 0.0008 using the upper tail beyond 26.66 of a chi-square distribution with 9 − 1 = 8 degrees of freedom. This is a very small p-value so we have strong evidence that the first digits of these invoices do not follow Benford's law. The biggest contributions to the chi-square statistic come from an unusually large number of entries starting with "5" and too few with "4". Auditors might want to look more carefully at invoices for amounts beginning with the digit "5".

7.34 The hypotheses for the goodness-of-fit test are

H0 : p1 = 0.298, p2 = 0.380, p3 = 0.322
Ha : Some pi is wrong
where p1, p2, and p3 are the proportion of schools with private, profit, and public control, respectively. Using technology and the data in SampColleges we obtain the output below:

Observed and Expected Counts
Category   Observed   Test Proportion   Expected   Contribution to Chi-Square
Private       11           0.298           14.9         1.02081
Profit        14           0.380           19.0         1.31579
Public        25           0.322           16.1         4.91988

Chi-Square Test
 N   DF    Chi-Sq   P-Value
50    2   7.25647     0.027
The small p-value (0.027) indicates we should reject the null hypothesis at a 5% level. Perhaps this sample of 50 schools does not represent the entire population well! Note that half of the schools in this sample are public, while the proportion in the population is less than 1/3. 7.35
(a) To test H0 : p0 = p1 = p2 = . . . = p9 = 0.10 vs Ha : Some pi ≠ 0.10, the expected count in each cell is n · pi = 150(0.1) = 15. Here is a table of observed counts for the digits in RND4.
Category                     0        1        2        3        4        5        6        7        8        9
Observed                    12       14       16       13       22       10       27       14       10       12
Test Proportion            0.1      0.1      0.1      0.1      0.1      0.1      0.1      0.1      0.1      0.1
Expected                    15       15       15       15       15       15       15       15       15       15
Contribution to Chi-Sq   0.60000  0.06667  0.06667  0.26667  3.26667  1.66667  9.60000  0.06667  1.66667  0.60000

  N   DF   Chi-Sq   P-Value
150    9  17.8667     0.037
The contribution to the chi-square statistic for each cell is the value of (observed − expected)²/expected. The sum of these values gives the chi-square statistic

χ² = 0.6000 + 0.0667 + 0.0667 + . . . + 0.6000 = 17.87

We compare this to the upper tail of a chi-square distribution with 10 − 1 = 9 degrees of freedom to get a p-value of 0.037. This is a small p-value, providing evidence that the last digits are not randomly distributed.
(b) To test the first digits with H0 : p1 = p2 = . . . = p9 = 1/9 vs Ha : Some pi ≠ 1/9, the expected count in each cell is n · pi = 150 · 1/9 = 16.67. Here is a table of observed counts for the digits in RND1.
Category                     1         2         3         4         5         6         7         8         9
Observed                    44        23        16        14        13        10        14        11         5
Test Proportion           0.111111  0.111111  0.111111  0.111111  0.111111  0.111111  0.111111  0.111111  0.111111
Expected                  16.6667   16.6667   16.6667   16.6667   16.6667   16.6667   16.6667   16.6667   16.6667
Contribution to Chi-Sq    44.8267    2.4067    0.0267    0.4267    0.8067    2.6667    0.4267    1.9267    8.1667

  N   DF   Chi-Sq   P-Value
150    8    61.68     0.000
The value of the chi-square statistic is χ2 = 44.83 + 2.41 + . . . + 8.17 = 61.68 We compare this to the upper tail of a chi-square distribution with 9 − 1 = 8 degrees of freedom to get a p-value ≈ 0. This provides very strong evidence that the first digits of these numbers are not chosen at random. Looking at the observed and expected counts we see that there are far more numbers starting with “1” than we would expect by random chance.
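The test in part (b) can be checked the same way; this sketch (ours, assuming scipy) uses the first-digit counts from RND1.

import numpy as np
from scipy import stats

first_digits = np.array([44, 23, 16, 14, 13, 10, 14, 11, 5])   # digits 1 through 9
expected = np.full(9, first_digits.sum() / 9)                   # 150/9 = 16.67 each

print(stats.chisquare(first_digits, f_exp=expected))   # statistic about 61.7, p-value essentially 0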
7.36 The hypotheses are

H0 : p0 = p1 = p2 = . . . = p9 = 0.10
Ha : Some pi ≠ 0.10

Here are the observed and expected counts for the digits in SSN8.

Digit   Observed   Expected
0 13 15
1 14 15
2 16 15
3 13 15
4 14 15
5 15 15
6 17 15
7 15 15
8 15 15
9 18 15
Using technology we obtain the following output for this test X-squared = 1.6, df = 9, p-value = 0.9963 This is a very large p-value so we have no substantial evidence that the eighth digits of social security numbers are not random. Here are the observed and expected counts for the digits in SSN 9, the last digit. Digit Observed Expected
0 16 15
1 12 15
2 17 15
3 15 15
4 12 15
5 10 15
6 15 15
7 27 15
8 15 15
9 11 15
Using technology we obtain the following output for this test X-squared = 13.8667, df = 9, p-value = 0.1271 This is not a small p-value so we lack sufficient evidence to conclude the last digits of social security numbers are not random.
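The "technology" output quoted in Exercise 7.36 matches what a short script produces; here is a sketch (ours, assuming scipy) for both sets of digit counts.

import numpy as np
from scipy import stats

eighth_digit = np.array([13, 14, 16, 13, 14, 15, 17, 15, 15, 18])   # digits 0-9
last_digit   = np.array([16, 12, 17, 15, 12, 10, 15, 27, 15, 11])

# With no expected counts supplied, chisquare assumes equal proportions (15 per cell here)
print(stats.chisquare(eighth_digit))   # X-squared = 1.6, p-value about 0.996
print(stats.chisquare(last_digit))     # X-squared about 13.87, p-value about 0.127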
Section 7.2 Solutions

7.37 For the (Group 3, Yes) cell we have

Expected count = (Group 3 row total · Yes column total)/n = (100 · 260)/400 = 65

The contribution to the χ2-statistic from the (Group 3, Yes) cell is (72 − 65)²/65 = 0.754.
7.38 For the (B,E) cell we have

Expected count = (B row total · E column total)/n = (330 · 160)/600 = 88

The contribution to the χ2-statistic from the (B,E) cell is (89 − 88)²/88 = 0.011.
7.39 We need to find the row total for Control, 40 + 50 + 5 + 15 + 10 = 120, and the column total for Disagree, 15 + 5 = 20. Adding the counts in all the cells shows that the overall sample size is n = 240. For the (Control, Disagree) cell we have

Expected count = (120 · 20)/240 = 10

The contribution to the χ2-statistic from the (Control, Disagree) cell is (15 − 10)²/10 = 2.5.
7.40 We need to find a row total for Group 2, 1180 + 320 = 1500, and the column total for No, 280 + 320 = 600. Since the row total for Group 1 is 720 + 280 = 1000, the overall sample size is n = 1000 + 1500 = 2500. For the (Group 2, No) cell we have

Expected count = (1500 · 600)/2500 = 360

The contribution to the χ2-statistic from the (Group 2, No) cell is (320 − 360)²/360 = 4.44.
7.41 This is a 3 × 2 table so we have (3 − 1) · (2 − 1) = 2 degrees of freedom. Also, if we eliminate the last row and last column (ignoring the totals) there are 2 cells remaining. 7.42 This is a 3 × 4 table so we have (3 − 1) · (4 − 1) = 6 degrees of freedom. Also, if we eliminate the last row and last column (ignoring the totals) there are six cells remaining. 7.43 This is a 2 × 5 table so we have (2 − 1) · (5 − 1) = 4 degrees of freedom. Also, if we eliminate the last row and last column there are four cells remaining. 7.44 This is a 2 × 2 table so we have (2 − 1) · (2 − 1) = 1 degree of freedom. If the row and column totals are known, we only need the value for any one cell to be able to fill in the rest of the table. 7.45 This is a chi-square test for association for a 3 × 2 table. The relevant hypotheses are H0 : College plans are not associated with household income Ha : College plans are associated with household income
We compute expected counts for all 6 cells. For example, for the first cell (less than $30,000 and 4-yr college), we have

Expected = (210 · 534)/920 = 121.9

Using a similar process in each cell we find the expected counts shown in the table below.

                    Less than $30,000   $30,000 to $75,000   More than $75,000
4-yr college              121.9                189.2               222.9
Not 4-yr college           88.1                136.8               161.1
Notice that all expected counts are large, so we are comfortable in using the chi-square distribution. We compute the contribution to the χ2-statistic, (observed − expected)²/expected, for each cell. The results are shown in the next table.

                    Less than $30,000   $30,000 to $75,000   More than $75,000
4-yr college               9.427                1.948              12.650
Not 4-yr college          13.044                2.695              17.502
Adding up all of these contributions, we find χ2 = 57.266. Using the upper tail of a chi-square distribution with df = 2, we see that the p-value is very small, with p-value ≈ 0. This is very significant, and we have very strong evidence of an association between four-year college plans and household income. Examining the data more closely, we see that teens from the lowest income category are much less likely than expected to plan on attending a four-year college while teens from the highest income category are far more likely to plan on attending a four-year college. 7.46
(a) This is a chi-square test for association for a 2 × 2 table. The relevant hypotheses are

H0 : ADHD diagnosis is not associated with frequency of social media use
Ha : ADHD diagnosis is associated with frequency of social media use

We compute expected counts for all 4 cells. For example, for the first cell (High frequency and ADHD), we have

Expected = (39 · 165)/660 = 9.75

Using a similar process in each cell we find the expected counts shown in the table below.

            High frequency   Low frequency
ADHD              9.75            29.25
No ADHD         155.25           465.75
Notice that all expected counts are large enough, so we are comfortable in using the chi-square distribution. We compute the contribution to the χ2-statistic, (observed − expected)²/expected, for each cell. The results are shown in the next table.

            High frequency   Low frequency
ADHD             4.006            1.335
No ADHD          0.252            0.084
Adding up these four contributions, we find χ2 = 5.667. Using the upper tail of a chi-square distribution with df = (2 − 1) · (2 − 1) = 1 · 1 = 1, we see that the p-value is 0.017. At a 5% level, this is significant, and we have evidence of an association between social media use and ADHD diagnosis.
(b) We see that the upper-left cell contributes the most to the chi-square statistic. We see that the observed value is greater than the expected count. In context, this means that the number of ADHD diagnoses is greater than expected for those with high frequency of social media use.
(c) No, we cannot conclude causation since this comes from an observational study and not an experiment. There are likely many confounding variables in the study.

7.47 The hypotheses for testing an association between these two categorical variables are

H0 : Award preference is not related to Gender
Ha : Award preference is related to Gender

The table below shows the observed and expected counts for each cell. For example, the expected count for the (Female, Academy) cell is 31 · 169/362 = 14.5.

           Female        Male           Total
Academy    20 (14.5)     11 (16.5)        31
Nobel      76 (69.6)     73 (79.4)       149
Olympic    73 (85.0)     109 (97.0)      182
Total        169           193           362
The value of the chi-square statistic is

χ² = (20 − 14.5)²/14.5 + (76 − 69.6)²/69.6 + (73 − 85.0)²/85.0 + (11 − 16.5)²/16.5 + (73 − 79.4)²/79.4 + (109 − 97.0)²/97.0
   = 2.08 + 0.59 + 1.69 + 1.83 + 0.52 + 1.48 = 8.20
Since this is a 2 × 3 table we use a chi-square distribution with (3 − 1)(2 − 1) = 2 degrees of freedom. The area beyond χ2 = 8.20 is 0.017. This is a fairly small p-value, less than 5%, so we have fairly strong evidence that the award preferences tend to differ between male and female students.

7.48 The null hypothesis is that the attitude about one true love is not related to one's educational level and the alternative hypothesis is that the two variables are related in some way. We compute expected counts for all 9 cells. For example, for the (Agree, Some) cell, we have

Expected = (735 · 668)/2625 = 187.0
Computing all the expected counts in the same way, we find the expected counts shown in the table. Note that all cell counts are large enough to use a χ2-test.

            Agree   Disagree   Don't know
HS          263.2     648.9       27.9
Some        187.0     461.1       19.8
College     284.8     702.0       30.2
We compute (observed − expected)2 /expected for each cell. The results are shown in the next table.
            Agree   Disagree   Don't know
HS          37.71     13.02       2.24
Some         0.65      0.05       1.94
College     27.69     10.78       0.11
Adding up all of these contributions, we obtain the χ2 -statistic 93.7. This is a very large test statistic, and the p-value from a chi-square distribution with df = 4 is essentially zero. There is very strong evidence of an association between education level and how one feels about whether we all have one true love. We see that the largest contribution to the χ2 -statistic is in those who agree, with more high school educated people than expected agreeing and fewer college educated people than expected agreeing. It appears that the greater the amount of education, the less likely a person is to agree that we each have exactly one true love. 7.49
(a) The two-way table of penguin survival vs type tag is shown:
             Survived   Died   Total
Metal            10      40      50
Electronic       18      32      50
Total            28      72     100
(b) The hypotheses are

H0 : Type of tag is not related to survival
Ha : Type of tag is related to survival

(c) The table below shows the expected counts, obtained for each cell by multiplying the row total by the column total and dividing by n = 100.
             Survived   Died
Metal            14      36
Electronic       14      36
(d) We calculate the chi-square test statistic

χ² = (10 − 14)²/14 + (18 − 14)²/14 + (40 − 36)²/36 + (32 − 36)²/36
   = 1.143 + 1.143 + 0.444 + 0.444
   = 3.174
(e) We compare our test statistic of 3.174 from part (d) to a chi-square with 1 degree of freedom to get a p-value of 0.075. At a 5% level, we do not have enough evidence that the type of tag and survival rate of the penguins are related. (Remember, though, that this does not mean that they are not related. A larger sample size might show a relationship.)

7.50
(a) Based on the given counts, here is the two-way table, with totals included.
              Desipramine   Lithium   Placebo   Total
Relapse            10          18        20       48
No relapse         14           6         4       24
Total              24          24        24       72
(b) The expected count for the (Desipramine, Relapse) cell is (48 · 24)/72 = 16. All expected counts are shown in the table. Since the sample size is the same for each group, we see that the expected counts are the same from row to row, matching the null hypothesis that the treatment drug doesn’t matter. Since all the expected counts are greater than 5, a chi-square test is appropriate.
              Desipramine   Lithium   Placebo
Relapse            16          16        16
No relapse          8           8         8
(c) The null hypothesis is that the drug does not affect the likelihood of a relapse, and the alternative hypothesis is that the drug does matter in the chances of recovery. We use the observed and expected counts to find the χ2 statistic is χ2 =
(10 − 16)²/16 + (14 − 8)²/8 + (18 − 16)²/16 + (6 − 8)²/8 + (20 − 16)²/16 + (4 − 8)²/8 = 10.5
Using the χ2 distribution with df = 2, we find a p-value of 0.005. There is strong evidence that the drug used is related to the likelihood of a relapse. (d) Desipramine appears to be significantly more effective than lithium or a placebo. Yes, we can conclude that the drug affects the likelihood of successful recovery, since the results come from a randomized experiment. 7.51 This is a chi-square test for association for a 3 × 2 table. The relevant hypotheses are H0 : Painkiller use is not associated with miscarriages Ha : Painkiller use is associated with miscarriages We compute expected counts for all 6 cells. For example, for the (NSAIDs, Miscarriage) cell, we have Expected =
(145 · 75)/1009 = 10.8

Using a similar process in each cell we find the expected counts shown in the table below.

                  NSAIDs   Acetaminophen   No painkiller
Miscarriage         10.8        24.7            109.5
No miscarriage      64.2       147.3            652.5
Notice that all expected counts are above 5, so we are comfortable in using the chi-square distribution. We compute the contribution to the χ2-statistic, (observed − expected)²/expected, for each cell. The results are shown in the next table.

                  NSAIDs   Acetaminophen   No painkiller
Miscarriage         4.80        0.02             0.39
No miscarriage      0.81        0.00             0.06
Adding up all of these contributions, we find χ2 = 6.08. Using the upper tail of a chi-square distribution with df = 2, we see that the p-value is 0.0478. This is significant (just barely) at a 5% level, so we find
evidence of an association between the use of painkillers and the chance of a miscarriage. Notice that almost all of the contribution to the chi-square statistic comes from the fact that the number of miscarriages after using NSAIDs (aspirin or ibuprofen) is particularly high compared to what is expected if the variables are unrelated. Pregnant women might be wise to avoid these painkillers. However, we cannot assume that NSAIDs are causing the miscarriages (although that might be the case), since these data come from an observational study not an experiment. For example, there could easily be some other condition that women treat with aspirin or ibuprofen that increases the chance of miscarriage.

7.52 This is a chi-square test for association for a 4 × 2 table. The relevant hypotheses are

H0 : Drinking habits are not related to gender
Ha : Drinking habits are related to gender

We compute the expected counts. For example, the expected count for males with 0 drinks is (8956 · 18712)/27268 = 6145.84. The computer output below shows, for each cell, the observed count, the expected count, and the contribution to the chi-square statistic.

          M                          F                            Total
0         5402, 6145.84, 90.027      13310, 12566.16, 44.030      18712
1 - 2     2147, 1913.18, 28.575      3678, 3911.82, 13.976         5825
3 - 4     912, 616.82, 141.262       966, 1261.18, 69.088          1878
5 +       495, 280.16, 164.744       358, 572.84, 80.573            853
Total     8956                       18312                        27268
Chi-Sq = 632.276, DF = 3, P-Value = 0.000 We see that the χ2 test statistic is 632.276 and the p-value is essentially zero. This provides very strong evidence that drinking habits are not the same between males and females. We see that observed counts for males are less than expected for nondrinkers and greater than expected for more drinks, whereas that pattern is switched for females. Males tend to drink more alcoholic beverages than females. 7.53
(a) This question is best answered with a χ2 goodness-of-fit test, ignoring gender because gender is not mentioned in the question. The hypotheses are

H0 : pgrades = psports = ppopular = 1/3
Ha : At least one pi ≠ 1/3

If all three answers were equally likely, the expected count for each answer would be 478 × 1/3 = 159.3. Therefore, our table of observed (expected) counts is as follows:
Grades        Sports       Popular
247 (159.3)   90 (159.3)   141 (159.3)
We compute the χ2 statistic as

χ² = Σ (observed − expected)²/expected = (247 − 159.3)²/159.3 + (90 − 159.3)²/159.3 + (141 − 159.3)²/159.3 = 80.53
Comparing this to a χ2 distribution with 3−1 = 2 degrees of freedom yields a p-value of approximately 0. There is extremely strong evidence that grades, sports, and popularity are not equally important among middle school students in these school districts. (b) This is a χ2 test for association with a 2 × 3 table. The hypotheses are H0 : Gender and what students value are not associated Ha : Gender and what students value are associated The expected counts, computed with expected =
(row total × column total)/(overall total), are given in the table below.

Expected    Boy     Girl
Grades     117.3   129.7
Sports      42.7    47.3
Popular     67.0    74.0
The χ2 statistic is computed as follows:
χ² = (117 − 117.3)²/117.3 + (60 − 42.7)²/42.7 + (50 − 67)²/67 + (130 − 129.7)²/129.7 + (30 − 47.3)²/47.3 + (91 − 74)²/74
   = 21.56
Because all the expected counts are greater than 5, we can compare 21.56 to a χ2-distribution with (3 − 1) × (2 − 1) = 2 degrees of freedom. The resulting p-value is 0.00002. This provides enough evidence to reject the null hypothesis and conclude that gender is associated with how students answer what is important to them. At least in these school districts in Michigan, middle school boys and girls have different priorities regarding grades, sports, and popularity.

7.54 This is a chi-square test for association for a 2 × 2 table. The relevant hypotheses are

H0 : Outcome is not associated with drug taken
Ha : Outcome is associated with drug taken

We compute expected counts for all 4 cells. For example, for the first cell (Diabetes and Teplizumab), we have

Expected = (42 · 38)/76 = 21

Using a similar process in each cell we find the expected counts shown in the table below.
               Teplizumab   Control
Diabetes           21          21
No diabetes        17          17
Notice that all expected counts are large enough, so we are comfortable in using the chi-square distribution. We compute the contribution to the χ2-statistic, (observed − expected)²/expected, for each cell. The results are shown in the next table.

               Teplizumab   Control
Diabetes          1.19        1.19
No diabetes       1.47        1.47
Adding up these four contributions, we find χ2 = 5.32. Using the upper tail of a chi-square distribution with df = (2 − 1) · (2 − 1) = 1 · 1 = 1, we see that the p-value is 0.021. At a 5% level, this is significant, and we have evidence of an association between outcome and the drug taken. Examining the data more closely, we see that those taking the new drug were less likely to develop juvenile diabetes. 7.55
(a) The hypotheses are H0 : Skittles choice does not depend on method of choosing (color vs flavor) Ha : Skittles choice depends on method of choosing
(b) The expected counts under H0 are

                   Color   Flavor
Green(Lime)         13.0     18.0
Orange              10.5     14.5
Purple(Grape)       14.3     19.7
Red(Strawberry)     19.8     27.2
Yellow(Lemon)        8.4     11.6
(c) All of the expected counts are larger than 5, so we can use a chi-square test. (d) We have (5 − 1)(2 − 1) = 4 degrees of freedom. (e) The chi-square statistic is χ2 =
(18 − 13.0)²/13.0 + (9 − 10.5)²/10.5 + . . . + (9 − 11.6)²/11.6 = 9.07
(f) Comparing our test statistic to a chi-square distribution with 4 degrees of freedom we get a p-value of 0.059. This is right on the border, so we see weak evidence that choosing flavor vs color might affect the choices, but not enough to reject the null hypothesis if we are using a 5% level. 7.56
(a) Architects had the highest proportion of left-handed people (26/148 = 0.176); Orthopedic Surgeon had the highest proportion of right-handed people (121/132 = 0.917).
(b) The null and alternative hypotheses are H0 : Handedness and career are not associated vs Ha : Handedness and career are associated. The observed and expected counts are shown in the following table.
               Psychiatrist   Architect     Orthopedic Surgeon   Lawyer      Dentist       Total
Right          101 (99.6)     115 (124.9)   121 (111.4)          83 (88.6)   116 (111.4)    536
Left           10 (12.5)      26 (15.6)     5 (13.9)             16 (11.1)   10 (13.9)       67
Ambidextrous   7 (5.9)        7 (7.5)       6 (6.7)              6 (5.3)     6 (6.7)         32
Total          118            148           132                  105         132            635
The value of the chi-square statistic is

χ² = (101 − 99.6)²/99.6 + (10 − 12.7)²/12.7 + (7 − 5.9)²/5.9 + (115 − 124.9)²/124.9 + (26 − 15.6)²/15.6 + (7 − 7.5)²/7.5
   + (121 − 111.4)²/111.4 + (5 − 13.9)²/13.9 + (6 − 6.7)²/6.7 + (83 − 88.6)²/88.6 + (16 − 11.1)²/11.1 + (6 − 5.3)²/5.3
   + (116 − 111.4)²/111.4 + (10 − 13.9)²/13.9 + (6 − 6.7)²/6.7
   = 19.0
The degrees of freedom are (5 − 1)(3 − 1) = 8, and the area above 19.0 in the chi-square distribution gives a p-value of 0.015. (c) At the 5% significance level we can conclude that career choice is associated with handedness, while at the 1% level we cannot conclude that there is an association with handedness for these five professions. 7.57
(a) The expected count in the (Endurance, XX) cell is 34.75, and the contribution of this cell to the chi-square statistic is 3.645. We find the expected count using

(Endurance row total · XX column total)/Sample size = (194 · 132)/737 = 34.75

The contribution to the chi-square statistic is

(observed − expected)²/expected = (46 − 34.75)²/34.75 = 3.642

which is the same (up to round-off) as the computer output.
(b) We see in the bottom row of the computer output that “DF = 4”. Since the two-way table has 3 rows and 3 columns, we have df = (3 − 1) · (3 − 1) = 4, as given. (c) We see in the bottom row of the output that the chi-square test statistic is 24.805 and the p-value is 0.000. There is strong evidence that the distribution of genotypes for this gene is different between sprinters, endurance athletes, and non-athletes. (d) The (Sprint, XX) cell contributes the most, 9.043, to the χ2 -statistic. The observed count (6) is substantially less than the expected count (19.16). Sprinters are not likely to have this genotype. (e) The genotype RR is most over-represented in sprinters (53 compared to an expected count of 35.28). The genotype XX is most over-represented in endurance athletes (46 compared to an expected count of 34.75). 7.58
(a) We see in the Total column that 194 endurance athletes were included in the study.
(b) The expected count for sprinters with the R allele is 61.70 and the contribution to the chi-square statistic is 3.792. We find the expected count using

(Sprinter row total · R column total)/Sample size = (107 · 425)/737 = 61.70

The contribution to the chi-square statistic is

(Observed − Expected)²/Expected = (77 − 61.70)²/61.70 = 3.794

which is the same (up to round-off) as the computer output.
(c) We see in the bottom row of the computer output that “DF = 2”. Since the two-way table has 3 rows and 2 columns, we have df = (3 − 1) · (2 − 1) = 2, as given. (d) We see in the bottom row of the output that the chi-square test statistic is 10.785 and the p-value is 0.005. There is strong evidence that the distribution of alleles for this gene is different between sprinters, endurance athletes, and non-athletes. (e) The (Sprint, X) cell contributes the most, 5.166, to the χ2 -statistic. The observed count (30) is substantially less than the expected count (45.30). Sprinters are less likely to have this X allele. (f) The allele R is most over-represented in sprinters (77 observed compared to 61.7 expected). For endurance athletes the most over-represented allele is X (90 observed compared to 82.1 expected). 7.59 The p-value is 0.592, which is a large p-value. The sample provides no evidence at all that genotype distribution is different between males and females. Gender does not appear to be associated with whether or not one has the “sprinting gene”. 7.60
(a) The null hypothesis is that superpower preference and gender are not associated, while the alternative hypothesis is that they are associated. We can use statistical software to obtain output similar to that given below, or use the formulas from this section to arrive at the same values shown in the output.

Chi-Square Test for Association: Superpower, Gender

                  Female                  Male
Fly               51, 58.54, 0.9719       69, 61.46, 0.9258
Freeze time       58, 62.93, 0.3868       71, 66.07, 0.3685
Invisibility      39, 39.03, 0.0000       41, 40.97, 0.0000
Super strength     5,  9.27, 1.9664       14,  9.73, 1.8732
Telepathy         68, 51.23, 5.4933       37, 53.77, 5.2328

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 17.219, DF = 4, P-Value = 0.002
As the output explains, each cell contains three things: the observed count, the expected count beneath it, and the contribution to the chi-square statistic below that. In the last line of the output, we see that the χ2 statistic is 17.219. We also see that the degrees of freedom are 4 (as we expect for a 5 × 2 two-way table), and the p-value is 0.002. This p-value is well below any reasonable significance level, and we conclude that there is strong evidence of an association between superpower preference and gender. (b) Looking at the data more closely, we see that the largest contribution to the χ2 statistic is for the superpower ‘telepathy’ and we see that females are more likely than expected to pick that superpower and that males are less likely than expected. 7.61
(a) The null hypothesis is that preference and gender are not associated, while the alternative hypothesis is that they are associated. We can use statistical software to obtain output similar to that given below, or use the formulas from this section to arrive at the same values shown in the output.

Chi-Square Test for Association: Preference, Gender

           Female                  Male
Famous      5,   9.31, 1.996       14,   9.69, 1.917
Happy     159, 146.03, 1.153      139, 151.97, 1.107
Healthy    33,  21.56, 6.069       11,  22.44, 5.831
Rich       24,  44.10, 9.163       66,  45.90, 8.804

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 36.040, DF = 3, P-Value = 0.000 As the output explains, each cell contains three things: the observed count, the expected count beneath it, and the contribution to the chi-square statistic below that. In the last line of the output, we see that the χ2 statistic is 36.040. We also see that the degrees of freedom are 3 (as we expect for a 4 × 2 two-way table), and the p-value is 0.000. This p-value is very small, and we conclude that there is strong evidence of an association between preference and gender. (b) Looking at the data more closely, we see that the cells for choices of Healthy and Rich have the largest contributions to the χ2 statistic. For Healthy, we see that females are more likely than expected to
pick that choice and that males are less likely than expected. For Rich, it is the opposite: females are less likely than expected to pick that choice and males are more likely than expected.

7.62 The null hypothesis is that preference and allergies are not associated, while the alternative hypothesis is that they are associated. We can use statistical software to obtain output similar to that given below, or use the formulas from this section to arrive at the same values shown in the output.

Chi-Square Test for Association: Preference, Allergies

           No                        Yes
Famous      9,  10.49, 0.21165       10,   8.51, 0.26089
Happy     167, 164.53, 0.03715      131, 133.47, 0.04579
Healthy    22,  24.29, 0.21638       22,  19.71, 0.26672
Rich       51,  49.69, 0.03456       39,  40.31, 0.04260

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 1.116, DF = 3, P-Value = 0.773 As the output explains, each cell contains three things: the observed count, the expected count beneath it, and the contribution to the chi-square statistic below that. In the last line of the output, we see that the χ2 statistic is 1.116. We also see that the degrees of freedom are 3 (as we expect for a 4 × 2 two-way table), and the p-value is 0.773. This p-value is large, and we conclude that there is not enough evidence to show any association between preference and whether a person has allergies. 7.63
(a) The null hypothesis is that favorite season and gender are not associated, while the alternative hypothesis is that they are associated. We can use statistical software to obtain output similar to that given below, or use the formulas from this section to arrive at the same values shown in the output. Chi-Square Test for Association: Season, Gender
          Female                  Male
Winter     23, 28.98, 1.2340       36, 30.02, 1.1913
Spring     35, 38.31, 0.2864       43, 39.69, 0.2765
Summer     70, 77.12, 0.6568       87, 79.88, 0.6340
Fall       95, 78.59, 3.4264       65, 81.41, 3.3077

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 11.013, DF = 3, P-Value = 0.012 As the output explains, each cell contains three things: the observed count, the expected count beneath it, and the contribution to the chi-square statistic below that. In the last line of the output, we see that the χ2 statistic is 11.013. We also see that the degrees of freedom are 3 (as we expect for a 4 × 2 two-way table), and the p-value is 0.012. This p-value is smaller than a 5% significance level, so we have evidence of an association between favorite season and gender. (b) Looking at the data more closely, we see that the largest contributions to the χ2 statistic come from cells for the Fall season. We see that females are more likely than expected to pick Fall as their favorite season and that males are less likely than expected to make that choice. 7.64
(a) The null hypothesis is that favorite mode of communication and gender are not associated, while the alternative hypothesis is that they are associated. We can use statistical software to obtain output similar to that given below, or use the formulas from this section to arrive at the same values shown in the output.

Chi-Square Test for Association: Communicate, Gender

             Female                    Male
App           23,  33.35, 3.2123       44,  33.65, 3.1837
In person     50,  47.79, 0.1026       46,  48.21, 0.1017
Phone         21,  17.92, 0.5295       15,  18.08, 0.5248
Text         129, 123.94, 0.2062      120, 125.06, 0.2044

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 8.065, DF = 3, P-Value = 0.045 As the output explains, each cell contains three things: the observed count, the expected count beneath it, and the contribution to the chi-square statistic below that. In the last line of the output, we see that the χ2 statistic is 8.065. We also see that the degrees of freedom are 3 (as we expect for a 4 × 2 two-way table), and the p-value is 0.045. This p-value is just barely smaller than a 5% significance level, so, at a 5% level, we have evidence of an association between favorite way to communicate and gender. (b) Looking at the data more closely, we see that the largest contributions to the χ2 statistic come from cells for ‘App’. We see that females are less likely than expected to pick this as their favorite way to communicate and that males are more likely than expected to make that choice. 7.65
(a) We would write “relapse” or “no relapse” on each card. There would be 38 relapse cards and 24 no relapse cards. We would shuffle the cards together and then deal them into three equal piles, signifying the three different groups, desipramine, lithium, and placebo.
(b) Because the p-value is 0.005, about 5 out of 1000 randomization statistics will be greater than or equal to the observed statistic.

7.66 The null hypothesis is that the two variables are not related and the alternative hypothesis is that the two variables are related. The output from one statistics package (Minitab) is given. Output from other packages may look different but will give the same (or similar) chi-square statistic and p-value. We see that the p-value is 0.004 so there is a significant association between these two variables. The largest contribution to the chi-square statistic is from the males who do not take vitamins, and we see that males are less likely to take vitamins than expected.

Rows: Gender   Columns: VitaminUse
           No        Occasional   Regular
Female     87        77           109
           96.20     71.07        105.73
           0.8798    0.4954       0.1009
Male       24        5            13
           14.80     10.93        16.27
           5.7189    3.2199       0.6560

Cell Contents: Count
               Expected count
               Contribution to Chi-square

Pearson Chi-Square = 11.071, DF = 2, P-Value = 0.004
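Output like this can be reproduced with standard statistical software; the following is a minimal sketch using Python's scipy.stats (not the package that produced the output above), with the observed counts taken from the table:

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are Female and Male, columns are No, Occasional, Regular
observed = np.array([[87, 77, 109],
                     [24,  5,  13]])

chi2_stat, p_value, df, expected = chi2_contingency(observed, correction=False)
contributions = (observed - expected) ** 2 / expected   # cell contributions to chi-square

print(round(chi2_stat, 3), df, round(p_value, 3))   # 11.071  2  0.004
print(np.round(expected, 2))                        # matches the expected counts above
print(np.round(contributions, 4))                   # matches the contributions above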
7.67 The null hypothesis is that the two variables are not related and the alternative hypothesis is that the two variables are related. The output from one statistics package (Minitab) is given. Output from other packages may look different but will give the same (or similar) chi-square statistic and p-value. We see that the p-value is 0.028 so there is a significant association between these two variables at the 5% level although not at the 1% level. The largest contribution to the chi-square statistic is from the males who do not smoke, and we see that males are less likely to be nonsmokers than expected (which means they are more likely to smoke).

Rows: Gender   Columns: PriorSmoke
           1         2         3
Female     144       93        36
           136.07    99.67     37.27
           0.4626    0.4459    0.0431
Male       13        22        7
           20.93     15.33     5.73
           3.0066    2.8986    0.2798

Cell Contents: Count
               Expected count
               Contribution to Chi-square
Pearson Chi-Square = 7.137, DF = 2, P-Value = 0.028
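The expected counts and contributions in output like this come directly from the formula expected = (row total)(column total)/n; here is a minimal sketch of that arithmetic in Python (the counts are the Gender by PriorSmoke table above, everything else is illustrative):

import numpy as np
from scipy.stats import chi2

# Observed counts: rows are Female and Male; columns are PriorSmoke = 1, 2, 3
observed = np.array([[144, 93, 36],
                     [ 13, 22,  7]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

expected = row_totals * col_totals / n                 # (row total)(column total)/n
contributions = (observed - expected) ** 2 / expected  # (observed - expected)^2 / expected
statistic = contributions.sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(statistic, df)                       # upper-tail area of the chi-square distribution

print(round(statistic, 3), df, round(p_value, 3))      # 7.137  2  0.028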
CHAPTER 8
Section 8.1 Solutions

8.1 Both datasets have the same group means of x1 = 15 and x2 = 20. However, there is so much variability within the groups for Dataset A that we really can't be sure of a difference. In Dataset B, since there is so much less variability within the groups, it seems obvious that the groups come from different populations. (Think of it this way: if the next number we saw was a 20, would we know which group it belongs to in Dataset A? No. Would we know which group it belongs to in Dataset B? Yes!) We have more convincing evidence for a difference between group 1 and group 2 in Dataset B, since the variability within groups is so much less.

8.2 Both datasets have the same variability within groups, but the two means are much farther apart in Dataset B (means of 15 and 50) than in Dataset A (means of 15 and 20), so we have more convincing evidence for a difference in means in Dataset B.

8.3 The scales are the same and it appears that the variability is about the same for both datasets. However, the means appear to be much farther apart in Dataset A, so we have more convincing evidence for a difference in population means in Dataset A.

8.4 The scales are the same and it appears that the sample means (about 10 and 20) are the same for the two datasets. However, there is much more variability within the groups in Dataset A, so we have more convincing evidence for a difference in population means in Dataset B.

8.5 The scales are the same and it appears that the sample means (about 15 and 25) are the same for the two datasets. However, there is much more variability within the groups in Dataset A, so we have more convincing evidence for a difference in population means in Dataset B.

8.6 The variability within groups seems very similar between the two datasets but the means appear to be much farther apart in Dataset A than in Dataset B, so we have more convincing evidence for a difference in population means in Dataset A.

8.7 Since there are three groups, the degrees of freedom for the groups is 3 − 1 = 2. Since the total number of data values is 15, the total degrees of freedom is 15 − 1 = 14. This leaves 12 degrees of freedom for the Error (or use n − #groups = 15 − 3 = 12). The Mean Squares are found by dividing each SS by its df, so we compute MSG = 120/2 = 60
(for Groups) and MSE = 282/12 = 23.5 (for Error). The F-statistic is the ratio of the mean square values:

F = MSG/MSE = 60/23.5 = 2.55
The completed table is shown below.

Source   df   SS    MS     F-statistic
Groups   2    120   60     2.55
Error    12   282   23.5
Total    14   402
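Filling in an ANOVA table like this is purely arithmetic, so it is easy to script; here is a minimal sketch in Python (the helper function is illustrative, and the numbers are the ones from this exercise):

def fill_anova_table(ss_groups, ss_total, n_groups, n_total):
    """Complete a one-way ANOVA table from the group SS, total SS, and the counts."""
    df_groups = n_groups - 1
    df_total = n_total - 1
    df_error = df_total - df_groups
    ss_error = ss_total - ss_groups
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    f_stat = ms_groups / ms_error
    return df_groups, df_error, ss_error, ms_groups, ms_error, f_stat

# Exercise 8.7: SSG = 120, SSTotal = 402, 3 groups, 15 data values
print(fill_anova_table(120, 402, 3, 15))   # (2, 12, 282, 60.0, 23.5, about 2.55)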
8.8 Since there are four groups, degrees of freedom for the groups is 4 − 1 = 3. Since the total number of data values is 40, the total degrees of freedom is 40 − 1 = 39. This leaves 36 degrees of freedom for the Error (or use n − #groups = 40 − 4 = 36). The Mean Squares are found by dividing each SS by its df, so we compute M SG = 960/3 = 320
(for Groups) and MSE = 5760/36 = 160 (for Error). The F-statistic is the ratio of the mean square values:

F = MSG/MSE = 320/160 = 2.0
The completed table is shown below.

Source   df   SS     MS    F-statistic
Groups   3    960    320   2.0
Error    36   5760   160
Total    39   6720
8.9 Since there are three groups, degrees of freedom for the groups is 3 − 1 = 2. Since the total number of data values is 10 + 8 + 11 = 29, the total degrees of freedom is 29 − 1 = 28. This leaves 26 degrees of freedom for the Error (or use n − #groups = 29 − 3 = 26). We also find the missing sum of squares for Error by subtraction, SSE = SST otal − SSG = 1380 − 80 = 1300. The Mean Squares are found by dividing each SS by its df, so we compute M SG = 80/2 = 40
(for Groups) and MSE = 1300/26 = 50 (for Error). The F-statistic is the ratio of the mean square values:

F = MSG/MSE = 40/50 = 0.8
The completed table is shown below.

Source   df   SS     MS   F-statistic
Groups   2    80     40   0.8
Error    26   1300   50
Total    28   1380
8.10 Since there are four groups, degrees of freedom for the groups is 4 − 1 = 3. Since the total number of data values is 5 + 8 + 7 + 5 = 25, the total degrees of freedom is 25 − 1 = 24. This leaves 21 degrees of freedom for the Error (or use n − #groups = 25 − 4 = 21). We also find the missing sum of squares for Groups by
subtraction, SSG = SST otal − SSE = 1400 − 800 = 600. The Mean Squares are found by dividing each SS by its df, so we compute M SG = 600/3 = 200
(for Groups) and MSE = 800/21 = 38.095 (for Error). The F-statistic is the ratio of the mean square values:

F = MSG/MSE = 200/38.095 = 5.25

The completed table is shown below.

Source   df   SS     MS       F-statistic
Groups   3    600    200      5.25
Error    21   800    38.095
Total    24   1400

8.11
(a) Since the degrees of freedom for the groups is 3, the number of groups is 4.
(b) The hypotheses are

H0: µ1 = µ2 = µ3 = µ4
Ha: Some µi ≠ µj
(c) Using 3 and 16 for the degrees of freedom with the F-distribution, we see the upper-tail area beyond F=1.60 gives a p-value of 0.229. (d) We do not reject H0 . We do not find convincing evidence for any differences between the population means. 8.12
(a) Since the degrees of freedom for the groups is 4, the number of groups is 5.
(b) The hypotheses are

H0: µ1 = µ2 = µ3 = µ4 = µ5
Ha: Some µi ≠ µj
(c) Using 4 and 35 for the degrees of freedom with the F-distribution, we see the upper-tail area beyond F=5.71 gives a p-value of 0.0012. (d) We reject H0 . There is strong evidence that the population means are not all the same. 8.13
(a) Since the degrees of freedom for the groups is 2, the number of groups is 3.
(b) The hypotheses are

H0: µ1 = µ2 = µ3
Ha: Some µi ≠ µj
(c) Using 2 and 27 for the degrees of freedom with the F-distribution, we see the upper-tail area beyond F=8.60 gives a p-value of 0.0013. (d) We reject H0 . We find strong evidence for differences among the population means. 8.14
(a) Since the degrees of freedom for the groups is 3, the number of groups is 4.
(b) The hypotheses are

H0: µ1 = µ2 = µ3 = µ4
Ha: Some µi ≠ µj
(c) Using 3 and 16 for the degrees of freedom with the F-distribution, we see the upper-tail area beyond F=0.75 gives a p-value of 0.538. (d) We do not reject H0 . We do not find convincing evidence of any differences between the population means. 8.15
(a) One variable is which group the girl is in, which is categorical. The other variable is the change in cortisol level, which is quantitative.
(b) This is an experiment, since the researchers assigned the girls to the different groups. (c) The hypotheses are H0 :
µ1 = µ2 = µ3 = µ4
Ha: Some µi ≠ µj
where the four means represent the mean cortisol change after a stressful event for girls who talk to their mothers in person, who talk to their mothers on the phone, who text their mothers, and who have no contact with their mothers, respectively. (d) Since the overall sample size is 68, total degrees of freedom is 67. Since there are four groups, the df for groups is 3. This leaves 64 degrees of freedom for the error (or use 68 − 4 = 64). (e) Since they found a significant difference in mean cortisol change between at least two of the groups, the F-statistic must be significant, which means its p-value is less than 0.05. 8.16
(a) One variable is the type of beverage, which is categorical. The other variable is the change in number of bacteria, which is quantitative.
(b) Since there are seven beverages (groups), the degrees of freedom for groups is 7 − 1 = 6. (c) Since they found a significant difference in mean change in bacteria between at least two of the groups, the F-statistic must be significant, which means its p-value is less than 0.05. (d) The generic conclusion is to reject H0 . We have evidence of a significant difference in mean change in bacteria between at least two of the groups. 8.17
(a) The explanatory variable is amount of pressure felt from schoolwork, which is categorical. The response variable is number of hours per week hanging out with friends, which is quantitative.
(b) The highest mean (15.91 hours per week hanging with friends) is for the group feeling ‘Very little’ pressure from schoolwork. The lowest mean (9.866 hours per week hanging out with friends) is for the group feeling ‘A lot’ of pressure from schoolwork. (c) The total degrees of freedom is n − 1 = 446 so the number of students included in the analysis is n = 447.
(d) We see in the output that the F-statistic is 4.75 and the p-value is 0.003. (e) The p-value is smaller than any reasonable significance level, so we reject H0 . (f) We have evidence of a difference in mean number of hours a week spent hanging out with friends depending on how much stress students feel from schoolwork. 8.18
(a) We see that in both groups with high synchronization, the mean difference in closeness rating (CloseDiff = CloseAfter − CloseBefore) is positive, so feelings of closeness with others in the group went up when groups did an exercise with high synchronization.

(b) We see that in the LS+HE group, the mean difference in closeness (CloseDiff = CloseAfter − CloseBefore) is positive (0.379), so feelings of closeness with others in the group went up when a group did a high exertion exercise together, even if there was little synchronization. However, we see in the LS+LE group that the mean difference is negative (−0.431), so feelings of closeness with others in the group actually went down when the exercise was both low exertion and low synchronization.

(c) The number of students included in the analysis is n = 260. We see this either by adding one to the df-total number in the ANOVA table or by adding up the four group sample sizes. Although there were 264 students in the full dataset, four had missing values for the CloseBefore variable.

(d) The p-value is 0.042, which is less than a 5% significance level, so we reject H0. We find a difference in mean closeness rating toward others in a group based on the type of exercise one engages in with a group. It appears that if the exercise is either synchronized or high exertion, the mean effect on closeness ratings is similar (and positive), but if the exercise is neither synchronized nor high exertion, the mean effect appears to go in the other direction. (We can use the methods of the next section to verify this conclusion.)

(e) Since the p-value of 0.042 is not less than 1%, we do not reject H0 at a 1% level. The evidence is not strong enough to find a difference in mean closeness ratings between the different activity groups at a 1% level.
(a) Using µr, µg, and µb to represent mean number of anagrams solved by someone with prior exposure to red, green, and black, respectively, the hypotheses are

H0: µr = µg = µb
Ha: At least two of the means are different
(b) We subtract to see that SSE = SSTotal − SSG = 84.7 − 27.7 = 57.0. Since there are three groups, df for color is 3 − 1 = 2. Since the total sample size is 19 + 27 + 25 = 71, total df is 71 − 1 = 70. Thus, the error df is 68. We divide by the respective degrees of freedom to find MSG and MSE and then divide those to find the F-statistic. The analysis of variance table is shown, and the F-statistic is 16.5.

Source   DF   SS     MS      F
Groups   2    27.7   13.85   16.5
Error    68   57.0   0.84
Total    70   84.7
(c) We find the area above F = 16.5 in an F-distribution with numerator df equal to 2 and denominator df equal to 68, and see that the p-value is essentially zero. (d) Reject H0 and conclude that the means are not all the same. The color of prior instructions has an effect on how students perform on this anagram test. By looking at the sample group means, it appears that seeing red prior to the test may hinder students' ability to solve the anagrams.
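Upper-tail F areas like the one in part (c) can be found with technology; here is a minimal sketch using Python's scipy.stats (the degrees of freedom are from this exercise, with one of the earlier exercises as a second check):

from scipy.stats import f

# Area above F = 16.5 in an F-distribution with 2 and 68 degrees of freedom
print(f.sf(16.5, dfn=2, dfd=68))   # about 1e-06, essentially zero

# Same idea for Exercise 8.11(c): area above F = 1.60 with 3 and 16 df
print(f.sf(1.60, dfn=3, dfd=16))   # approximately 0.229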
8.20
(a) Legs together with no lap pad has the largest temperature increase. Spreading the legs apart has the smallest temperature increase.
(b) Yes, the standard deviations are similar. The largest, s1 = 0.96, is not more than twice the smallest, s3 = 0.66. (c) The null hypothesis is that the population mean temperature increases for the three conditions are all the same and the alternative hypothesis is that at least two of the means are different. We find the mean squares by dividing the sum of squares by the respective degrees of freedom (df = 3 − 1 = 2 for Groups, df = 87 − 3 = 84 for Error). The F-statistic is the ratio of the two mean squares. F =
MSG/MSE = 6.85/0.63 = 10.9

These calculations are summarized in the ANOVA table below.

Source   DF   SS     MS     F      P
Groups   2    13.7   6.85   10.9   0.0001
Error    84   53.2   0.63
Total    86   66.9
The p-value is the area above 10.9 in an F-distribution with numerator degrees of freedom 2 and denominator degrees of freedom 84. Using technology we see that the p-value=0.0001. We reject H0 and find strong evidence that average temperature increase is not the same for these three conditions. It appears that spreading legs apart may be more effective at reducing the temperature increase. 8.21
(a) Yes, the control groups spend less time in darkness than the groups that were stressed. The mean time in darkness is smaller for each of the groups with “HC” than the means for any of the stressed groups with “SD”. However, of the groups that were stressed, the mice that spent time in an enriched environment (EE) appear to spend less time (on average) in darkness than the other two stressed groups.
(b) The null hypothesis is that environment and prior stress do not affect mean amount of time in darkness, while the alternative is that environment and prior stress do affect mean amount of time in darkness. To construct the ANOVA table we compute M SG =
481776/5 = 96355.2, MSE = 177835/42 = 4234.2, and

F = MSG/MSE = 96355.2/4234.2 = 22.76

We summarize these calculations in the ANOVA table below.

Source   DF   SS       MS        F       P
Light    5    481776   96355.2   22.76   0.000
Error    42   177835   4234.2
Total    47   659611
From an F-distribution with 5 and 42 degrees of freedom, we see that the p-value is essentially zero. There is strong evidence that the average amount of time spent in darkness is not the same for all six combinations of environment and stress. We will see in the next section how to tell where the differences lie.
8.22
(a) Yes, the control groups spend less time immobile than the groups that were stressed. The mean time immobile is smaller for each of the groups with “HC” than the means for any of the stressed groups with “SD”. However, of the groups that were stressed, the mice that spent time in an enriched environment (EE) appear to spend much less time immobile (on average) than the mice in the other two stressed groups.
(b) The null hypothesis is that environment and prior stress do not affect mean amount of time immobile, while the alternative is that environment and prior stress do affect mean amount of time immobile. To construct the ANOVA table we compute M SG =
188464/5 = 37692.8, MSE = 197562/42 = 4703.9, and

F = MSG/MSE = 37692.8/4703.9 = 8.0

We summarize these calculations in the ANOVA table below.

Source   DF   SS       MS        F     P
Light    5    188464   37692.8   8.0   0.00002
Error    42   197562   4703.9
Total    47   386026
From an F-distribution with 5 and 42 degrees of freedom, we see that the p-value is 0.00002. There is strong evidence that the average amount of time spent immobile is not the same for all six combinations of environment and stress. We will see in the next section how to tell where the differences lie. 8.23
(a) In each of the three environments, mice in the SD group who were subjected to stress have lower levels than mice in the HC group, which matches what we expect since stress reduces levels of FosB+ cells. In both the HC (no stress group) and the SD (stress) group, the mice in the enriched environment (EE) have the highest levels of FosB+ cells.
(b) The null hypothesis is that environment and prior stress do not affect mean FosB+ levels, while the alternative is that environment and prior stress do affect these mean levels. To construct the ANOVA table we compute M SG =
118286/5 = 23657.2, MSE = 75074/36 = 2085.4, and

F = MSG/MSE = 23657.2/2085.4 = 11.3

We summarize these calculations in the ANOVA table below.

Source   DF   SS       MS        F      P
Light    5    118286   23657.2   11.3   0.0000
Error    36   75074    2085.4
Total    41   193360
From an F-distribution with 5 and 36 degrees of freedom, we see that the p-value is essentially zero. There is strong evidence that mean FosB+ levels are not the same for all six combinations of environment and stress. We will see in the next section how to tell where the differences lie. 8.24
(a) The fourth condition, in which participants were given money and then lost it if they failed to meet the goal, had the most success. People really hate to lose money! The first condition, in which participants received praise, had the least success.
(b) Yes, the conditions appear to be met. The standard deviations are all very similar and sample sizes are large enough so that we don’t have to worry too much about normality.
(c) The null hypothesis is H0: µ1 = µ2 = µ3 = µ4, where µ represents the population mean for number of days meeting the walking goal under each of the different incentives. The alternative hypothesis is that at least two of the means are different, Ha: Some µi ≠ µj. Using the provided summary statistics, we find the three sums of squares using the shortcut formulas at the end of this section. (The difference between SSG + SSE and SSTotal is due to rounding of the means and standard deviations.)

SSG = 70(30.0 − 36.5)² + 70(35.0 − 36.5)² + 70(36.0 − 36.5)² + 71(45.0 − 36.5)² = 8,262.25
SSE = 69(32.0)² + 69(29.9)² + 69(29.4)² + 70(30.1)² = 255,404.23
SSTotal = 280(30.6865)² = 263,665.16

We fill in the rest of the ANOVA table by dividing each sum of squares by its respective degrees of freedom and finding the ratio of the mean squares to compute the F-statistic.

MSG = 8262.25/3 = 2754    MSE = 255404.23/277 = 922    F = MSG/MSE = 2754/922 = 2.99

Using an F-distribution with 3 and 277 degrees of freedom, we find the area beyond F = 2.99 gives a p-value of 0.031. These calculations are summarized in the ANOVA table below.

Source      DF    SS       MS     F      P
Incentive   3     8262     2754   2.99   0.031
Error       277   255404   922
Total       280   263665
Since the p-value is less than 5%, we reject H0 and find evidence that the type of incentive used does make a difference in the mean success rates of overweight and obese people working to meet an exercise goal. It appears that losing money when a person doesn't meet the goal is the most effective incentive. Again, people really hate to lose money!

8.25 The null hypothesis is that the mean change in pain threshold is the same regardless of pose struck, H0: µ1 = µ2 = µ3, while the alternative hypothesis is that mean change in pain threshold is different between at least two of the types of poses, Ha: Some µi ≠ µj. We find the sums of squares using the shortcut formulas at the end of this section. (The difference between SSG + SSE and SSTotal is due to rounding of the standard deviations.)

SSG = 30(14.3 − 1.33)² + 29(−4.4 − 1.33)² + 30(−6.1 − 1.33)² = 7,654.9
SSE = 29(34.8)² + 28(31.9)² + 29(35.4)² = 99,954.9
SSTotal = 88(35.0)² = 107,800

We fill in the rest of the ANOVA table by dividing each sum of squares by its respective degrees of freedom and finding the ratio of the mean squares to compute the F-statistic.

MSG = 7654.9/2 = 3827.5    MSE = 99954.9/86 = 1162.3    F = MSG/MSE = 3827.5/1162.3 = 3.29
Using an F-distribution with 2 and 86 degrees of freedom, we find the area beyond F = 3.29 gives a p-value of 0.042. We reject H0 and find evidence that the type of pose a person assumes is associated with change in mean pain threshold. The ANOVA table summarizing these calculations is shown below.
Source   DF   SS        MS       F      P
Pose     2    7654.9    3827.5   3.29   0.042
Error    86   99954.9   1162.3
Total    88   107800

8.26
(a) We find the required sums of squares using the shortcut formulas at the end of this section. We compare the group means to the overall mean:

SSG = Σ ni(x̄i − x̄)² = 6(36.00 − 38)² + 6(37.67 − 38)² + 6(42.50 − 38)² + 6(35.83 − 38)² = 174.4

We find the variability within the groups:

SSE = Σ (ni − 1)si² = (6 − 1)14.52² + (6 − 1)12.40² + (6 − 1)17.41² + (6 − 1)13.86² = 4299.0

We find the total variability:

SSTotal = (n − 1)s² = (24 − 1)13.95² = 4475.9

We see that SSG + SSE = 174.4 + 4299.0 = 4473.4. The difference from SSTotal = 4475.9 is due to rounding the means and standard deviations in the summary statistics.

(b) We are testing H0: µ1 = µ2 = µ3 = µ4 vs Ha: Some µi ≠ µj, where the µi's represent the mean numbers of ants for the four types of bread. We have 4 − 1 = 3 degrees of freedom for the groups, 24 − 4 = 20 degrees of freedom for the error, and 24 − 1 = 23 degrees of freedom for the total. We compute mean squares by dividing sums of squares by degrees of freedom, then take the ratio of the mean squares to compute the F-statistic.

MSG = 174.4/3 = 58.13    MSE = 4299.0/20 = 214.95    F = MSG/MSE = 58.13/214.95 = 0.27

The ANOVA table summarizing these calculations is shown below.

Source   DF   SS       MS       F      P
Bread    3    174.4    58.13    0.27   0.846
Error    20   4299.0   214.95
Total    23   4473.4
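A minimal Python sketch of these shortcut-formula calculations, using the group sizes, means, and standard deviations summarized in this exercise (the array names are just for illustration):

import numpy as np
from scipy.stats import f

n = np.array([6, 6, 6, 6])                          # group sample sizes
means = np.array([36.00, 37.67, 42.50, 35.83])      # group means
sds = np.array([14.52, 12.40, 17.41, 13.86])        # group standard deviations
grand_mean = 38                                     # overall mean

ss_groups = np.sum(n * (means - grand_mean) ** 2)   # about 174.4
ss_error = np.sum((n - 1) * sds ** 2)               # about 4299.0

df_groups = len(n) - 1
df_error = n.sum() - len(n)
f_stat = (ss_groups / df_groups) / (ss_error / df_error)    # about 0.27
p_value = f.sf(f_stat, df_groups, df_error)                 # about 0.85

print(round(ss_groups, 1), round(ss_error, 1), round(f_stat, 2), round(p_value, 3))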
Using an F-distribution with 3 and 20 degrees of freedom, we find the area beyond F = 0.27 gives a p-value of 0.846. This is a very large p-value, so we have no convincing evidence to conclude that the mean number of ants attracted to a sandwich depends on the type of bread.

8.27 We are testing H0: µD = µL = µW, where µ represents the mean number of words recalled. Using the summary statistics we compute the sums of squares needed for the ANOVA table:

SSG = 16(4.8 − 3.8)² + 16(3.4 − 3.8)² + 16(3.2 − 3.8)² = 24.32
SSE = 15(1.3)² + 15(1.6)² + 15(1.2)² = 85.35
SSTotal = 47(1.527)² = 109.59
We summarize the calculations in the ANOVA table below. Note: Some values may differ slightly due to roundoff and once you have any two of the sum of squares you can easily find the third.
Source   df   SS      MS      F-value   p-value
Group    2    24.32   12.16   6.40      0.0036
Error    45   85.35   1.90
Total    47   109.67
(a) Drawing an image of each word produced the highest mean recall (4.8). Just writing the word down produced the lowest mean recall (3.2).

(b) From the ANOVA table, the F-statistic is 12.16/1.90 = 6.40.

(c) We see that the area in an F distribution in the right tail beyond 6.40, using 2 and 45 degrees of freedom, is 0.0036. This p-value of 0.0036 is also shown in the ANOVA table.

(d) The p-value is less than 0.05, so we reject H0. We have strong evidence of a difference in mean number of words recalled depending on whether participants draw an image, list attributes, or write the word.

(e) It does make a difference, and people wanting to memorize words should draw an image!

8.28 We are testing H0: µD = µV = µW, where µ represents the mean number of words recalled. Using the summary statistics we compute the sums of squares needed for the ANOVA table:

SSG = 9(5.1 − 4.0)² + 9(3.7 − 4.0)² + 9(3.2 − 4.0)² = 17.46
SSE = 8(1.1)² + 8(1.7)² + 8(1.3)² = 46.32
SSTotal = 26(1.566)² = 63.76
We summarize the calculations in the ANOVA table below. Note: Some values may differ slightly due to roundoff, and once you have any two of the sums of squares you can easily find the third.

Source   df   SS      MS     F-value   p-value
Group    2    17.46   8.73   4.52      0.022
Error    24   46.32   1.93
Total    26   63.78
(a) Drawing an image of each word produced the highest mean recall (5.1). Just writing the word down produced the lowest mean recall (3.2).

(b) From the ANOVA table, the F-statistic is 8.73/1.93 = 4.52.

(c) We see that the area in an F distribution in the right tail beyond 4.52, using 2 and 24 degrees of freedom, is 0.022. This p-value of 0.022 is also shown in the ANOVA table.

(d) The p-value is less than 0.05, so we reject H0. We have evidence of a difference in mean number of words recalled depending on whether participants draw an image, visualize an image, or write the word.

(e) It does make a difference, and people wanting to memorize words should draw an image!

8.29 We are testing H0: µD = µV = µW, where µ represents the mean number of words recalled. Using the summary statistics we compute the sums of squares needed for the ANOVA table:

SSG = 12(4.4 − 3.6)² + 12(3.4 − 3.6)² + 12(3.0 − 3.6)² = 12.48
SSE = 11(1.2)² + 11(1.5)² + 11(1.1)² = 53.90
SSTotal = 35(1.377)² = 66.36
We summarize the calculations in the ANOVA table below. Note: Some values may differ slightly due to roundoff and once you have any two of the sum of squares you can easily find the third.
Source   df   SS      MS     F-value   p-value
Group    2    12.48   6.24   3.83      0.032
Error    33   53.90   1.63
Total    35   66.38
(a) Drawing an image of each word produced the highest mean recall (4.4). Just writing the word down produced the lowest mean recall (3.0).

(b) From the ANOVA table, the F-statistic is 6.24/1.63 = 3.83.

(c) We see that the area in an F distribution in the right tail beyond 3.83, using 2 and 33 degrees of freedom, is 0.032. This p-value of 0.032 is also shown in the ANOVA table.

(d) The p-value is less than 0.05, so we reject H0. We have evidence of a difference in mean number of words recalled depending on whether participants draw an image, view an image, or write the word.

(e) It does make a difference, and people wanting to memorize words should draw an image!

8.30
(a) The mice in bright light gained the most (xLL = 11.01), while the mice in the normal light/dark cycle gained the least (xLD = 5.93).
(b) Yes, the groups have similar variability since no standard deviation for a group is more than twice another standard deviation. (c) We have
z-score = (x − x̄)/s = (17.4 − 11.01)/2.624 = 2.435

This value is 2.435 standard deviations above the mean, which causes some concern but allows us to proceed with the analysis.
(d) The cases are the mice. There are two variables. Which light group a mouse is assigned to is categorical while body mass gain over four weeks is quantitative. 8.31
(a) The null hypothesis is that the amount of light at night does not affect how much weight is gained. The alternative hypothesis is that the amount of light at night has some effect on mean weight gain.
(b) We see from the computer output that the F-statistic is 8.38 while the p-value is 0.002. This is a small p-value, so we reject H0 . There is evidence that mean weight gain in mice is influenced by the amount of light at night. (c) Yes, there is an association between weight gain and light conditions at night. From the means of the groups, it appears that mice with more light at night tend to gain more weight. (d) Yes, we can conclude that light at night causes weight gain (in mice), since the results are significant and come from a randomized experiment. 8.32 The null hypothesis is that mean activity level is not related to light condition, while the alternative hypothesis is that the mean activity level is related to light condition. The F-statistic is 0.09 and the p-value is 0.910. This is a very large p-value, so we do not reject H0 . There is no convincing evidence at all that mean activity level of mice is different depending on the amount of light at night. The differences in weight gain in the previous exercise are not caused by different activity levels. 8.33
(a) The standard deviations are very different. In particular, the standard deviation for the LL sample (sLL = 1.31) is more than double the standard deviation for the LD sample (sLD = 0.43). This indicates that an ANOVA test may not be appropriate in this situation.
(b) A p-value of 0.652 from the randomization distribution is not small, so we would not reject a null hypothesis that the means are equal. There is not sufficient evidence to conclude that the mean amount consumed is different depending on the amount of light at night. Mice in different light at night conditions appear to eat roughly similar amounts on average. Weight gain in mice with light at night is not a result of eating more food. 8.34
(a) The p-value is 0.656, so we do not find any strong association between amount of light at night and mean corticosterone levels.
(b) The LL group (bright light always) has mean corticosterone level (xLL = 50.83) quite a bit less than the other two groups. However, the ANOVA does not find this difference significant because the standard deviations are very large relative to the differences in the group means. There apparently is a great deal of variability in how much stress mice feel. The “variability within groups” is very large. We see that the group standard deviations are large in the summary statistics and that the SSE and MSE are both very large in the ANOVA table. 8.35
(a) For mice in this sample on a standard light/dark cycle, 36.0% of their food is consumed (on average) during the day and the other 64.0% is consumed at night. For mice with even just dim light at night, 55.5% of their food is consumed (on average) during the day and the other 44.5% is consumed at night.
(b) The p-value is 0.000 so there is very strong evidence that light at night influences when food is consumed by mice. Since the result comes from a randomized experiment, and the mice with more light average a higher percentage of food consumed during the day, we can conclude that light at night causes mice to eat a greater percentage of their food during the day. This raises the question: how strong is the association between time of calorie consumption and weight gain? That answer will have to wait until the next chapter. 8.36
(a) The standard deviations for the GTT-120 data are very different so it might not be appropriate to use an F-distribution.
(b) We create a randomization sample by assuming the null hypothesis, which in this case means the mean glucose level is the same for all the light conditions — that light condition doesn't matter. We simulate this by randomly assigning the 27 values to the three light conditions (10 to DM, 8 to LD, and 9 to LL as in the original sample). We then compute the full ANOVA table for this simulated sample to create the randomization F-statistic.

(c) Fifteen minutes after the injection, we do not detect a difference in mean glucose levels between the groups, since the p-value is 0.402. After 120 minutes, however, we see a difference between the groups at a 5% level, since the p-value is 0.015. Analysis of the data shows that mean glucose level in the mice in the standard light/dark group has decreased as it should, but that glucose levels remain high in the other two groups. Light at night does appear to affect glucose intolerance.

8.37 Computer output is shown below for the ANOVA test and the summary statistics.

Analysis of Variance
Source           DF    Adj SS   Adj MS    F-Value   P-Value
SchoolPressure   3     5926     1975.44   21.52     0.000
Error            447   41024    91.78
Total            450   46950

SchoolPressure   N     Mean     StDev
None             16    2.938    3.860
Very little      55    3.577    5.001
Some             200   9.404    9.048
A lot            180   13.961   11.374
(a) In the summary statistics, we see that those feeling a lot of pressure from school work have the largest mean number of hours doing homework while those feeling no pressure have the smallest.

(b) We see from the ANOVA table in the output that the F-statistic is 21.52 and the p-value is 0.000.

(c) The p-value is very small, so we have strong evidence of a difference in mean number of hours spent on homework depending on how much stress students feel from schoolwork.

8.38 Computer output is shown below for the ANOVA test and the summary statistics.

Analysis of Variance
Source        DF    Adj SS    Adj MS   F-Value   P-Value
Communicate   3     336656    112219   7.89      0.000
Error         441   6272850   14224
Total         444   6609505

Communicate   N     Mean     StDev
App           67    40.6     85.6
In person     94    46.81    85.65
Phone         36    109.6    126.2
Text          248   99.84    135.71
(a) In the summary statistics, we see that those who prefer to communicate by phone have the largest mean number of texts, while those who prefer to use an app have the smallest. (b) We see from the ANOVA table in the output that the F-statistic is 7.89 and the p-value is 0.000. (c) The p-value is very small, so we have strong evidence of a difference in mean number of texts sent per day depending on students’ preferred mode of communication. 8.39
(a) The summary statistics are shown below. The controls who had never played football had the largest mean hippocampus volume, and the football players with a history of concussions had the smallest.

Group         N    Mean     StDev
Control       25   7602.6   1074.0
FBConcuss     25   5734.6   593.4
FBNoConcuss   25   6459.2   779.7
(b) The ANOVA table is shown below. We see that the F-statistic is 31.47 and the p-value is 0.000.

Source   DF   SS         MS         F       P
Group    2    44348606   22174303   31.47   0.000
Error    72   50727336   704546
Total    74   95075942
(c) This is a very small p-value, so we reject H0 . We have very strong evidence that mean hippocampus size is different depending on one’s football playing and concussion experience. 8.40
(a) For the original data, the F-statistic is 31.47.
(b) Answers will vary for each simulation. (c) A randomization distribution of F-statistics for 5000 simulated samples is shown below.
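Such a randomization distribution is built by repeatedly shuffling the group labels and recomputing the F-statistic, as described in the solution to Exercise 8.36(b); here is a minimal sketch in Python (the three arrays are hypothetical stand-ins, since the raw hippocampus data are not reproduced in this solution):

import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three groups of hippocampus volumes
control = rng.normal(7600, 1000, 25)
fb_concussion = rng.normal(5700, 600, 25)
fb_no_concussion = rng.normal(6500, 800, 25)

observed_f = f_oneway(control, fb_concussion, fb_no_concussion).statistic
pooled = np.concatenate([control, fb_concussion, fb_no_concussion])
sizes = np.array([len(control), len(fb_concussion), len(fb_no_concussion)])

rand_f = []
for _ in range(5000):
    shuffled = rng.permutation(pooled)                   # randomly reassign values to groups
    groups = np.split(shuffled, np.cumsum(sizes)[:-1])
    rand_f.append(f_oneway(*groups).statistic)

# Randomization p-value: proportion of simulated F-statistics at or beyond the observed one
print(observed_f, np.mean(np.array(rand_f) >= observed_f))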
We see that the F-statistic for the original sample (31.47) is far beyond any of the F-statistics simulated under the null hypothesis, so we have p-value ≈ 0.000. (d) This is a very small p-value, so we reject H0 . We have very strong evidence that mean hippocampus size is different depending on one’s football playing and concussion experience. 8.41 We start by finding the mean and standard deviations for Exercise within each of the award groups and for the whole sample.
          Mean    Std. Dev.   Sample size
Academy   6.81    4.09        31
Nobel     6.96    4.74        148
Olympic   11.14   5.98        182
Total     9.05    5.74        361
The standard deviations are not too different, so the equal variance condition is reasonable. Side-by-side boxplots of exercise between the award groups show some right skewness, but the sample sizes are large so this shouldn’t be a problem.
[Side-by-side boxplots of Exercise (scale 0 to 40) for the Academy, Nobel, and Olympic award groups]
To test H0: µ1 = µ2 = µ3 vs Ha: Some µi ≠ µj we use technology to produce an ANOVA table.

            Df    Sum Sq    Mean Sq   F value   Pr(>F)
Award       2     1597.9    798.96    27.861    0.0000
Residuals   358   10266.3   28.68
Total       360   11864.2
The p-value from the ANOVA (0.0000) is very small, so we have strong evidence that at least one of the award groups has a mean exercise rate that differs from at least one of the other groups. 8.42
(a) Here are the sample sizes, means and standard deviations for each group. The standard deviations are very similar to each other so the equal variance condition is quite reasonable.

Variable: GillRate

Calcium   Count   Mean    StDev
Low       30      68.50   16.23
Medium    30      58.67   14.28
High      30      58.17   13.78
Side-by-side dotplots of the gill rates for each calcium level show relatively symmetric distributions and no extreme outliers, plus the sample sizes are quite large at 30 each, so we find no concerns with the normality condition.
(b) The conditions are met so we use ANOVA to test H0: µ1 = µ2 = µ3 vs Ha: Some µi ≠ µj, where µ1, µ2, and µ3 are the mean gill rates for fish in low, medium, and high concentrations of calcium, respectively. Here is the ANOVA table courtesy of statistical software.
Source    DF   SS      MS     F      P
Calcium   2    2037    1019   4.65   0.012
Error     87   19064   219
Total     89   21102
The F-statistic is 4.65 which gives a p-value of 0.012 when compared to an F-distribution with 2 and 87 degrees of freedom. This is a small p-value (less than 5%) so we reject H0 and conclude that the mean gill rate differs depending on the calcium level of the water. 8.43
(a) Treatment level appears to be associated with drug resistance, with more aggressive treatments yielding more drug resistance. Treatment level does not appear to be associated with health outcomes.
(b) The p-value from the ANOVA test for ResistanceDensity is ≈0, the p-value for DaysInfectious is 0.0002, the p-value for Weight is 0.906, and the p-value for RBC is 0.911. The two response variables measuring drug resistance show a significant difference in means among the dosage groups, while the two response variables measuring health do not show significant differences in means. (c) Treatment level is significantly associated with drug resistance, and in the sample more aggressive treatments yield more drug resistance. This contradicts conventional wisdom that aggressive treatments are better at preventing drug resistance. (d) With the untreated group excluded, the conditions for ANOVA are most reasonable for the DaysInfectious variable. The smallest groups standard deviation (2.48 for the light treatment) is within a factor of two of the largest standard deviation (4.01 for the aggressive treatment). There may be some mild concerns with normality in the boxplot. The other response measuring drug resistance, ResistanceDensity, shows a serious concern with the condition on similar variances, since the boxplot shows the variability is much higher in the aggressive treatment group than in the others. For the response variables measuring health, the condition of normality is most obviously violated, because sample sizes are relatively small (only 18 in each group), and there are a couple of extreme outliers in the light and moderate treatment groups for both variables. 8.44 Here’s a boxplot comparing the red blood cell densities after the outliers have been eliminated.
The normality condition looks much better now but we might still have a slight problem with the standard deviations (sL = 0.395 vs sA = 0.820 which is barely beyond the two times threshold).
The ANOVA for testing for a difference in mean RBC between the three groups after removing the outliers gives a p-value of 0.628. The p-value in the previous exercise (with the outliers present) is 0.911. While there is some difference in these p-values, neither is very small, so we still conclude that there is little evidence of a difference in mean red blood cell density depending on the aggressiveness of the drug treatment.
Section 8.2 Solutions
8.45 Yes, the p-value (0.002) is very small so we have strong evidence that there is a difference in the population means between at least two of the groups.

8.46 The pooled standard deviation is √MSE = √6.20 = 2.49. We use the error degrees of freedom, which we see in the output is 12.

8.47 We see in the output that the sample mean for group A is 10.2 with a sample size of 5. For a confidence interval for a mean after an analysis of variance test, we use √MSE for the standard deviation, and we use error df for the degrees of freedom. For a 95% confidence interval with 12 degrees of freedom, we have t* = 2.18. The confidence interval is

xA ± t*·√MSE/√nA
10.2 ± 2.18·√6.20/√5
10.2 ± 2.43
7.77 to 12.63

We are 95% confident that the population mean for group A is between 7.77 and 12.63.

8.48 We see in the output that the sample mean for group B is 16.8 while the sample mean for group C is 10.8. Both sample sizes are 5. For a confidence interval for a difference in means after an analysis of variance test, we use √MSE for both standard deviations, and we use error df for the degrees of freedom. For a 90% confidence interval with 12 degrees of freedom, we have t* = 1.78. We have

(xB − xC) ± t*·√(MSE(1/nB + 1/nC))
(16.8 − 10.8) ± 1.78·√(6.20(1/5 + 1/5))
6.0 ± 2.80
3.20 to 8.80
We are 90% confident that the difference in the population means of group B and group C is between 3.20 and 8.80. Notice that since zero is not in this range, a test is likely to find a difference between the means of these two populations.

8.49 We are testing H0: µA = µC vs Ha: µA ≠ µC. The test statistic is

t = (xA − xC)/√(MSE(1/nA + 1/nC)) = (10.2 − 10.8)/√(6.20(1/5 + 1/5)) = −0.38
This is a two-tail test so the p-value is twice the area below −0.38 in a t-distribution with df = 12. We see that the p-value is 2(0.355) = 0.71. This is a very large p-value and we do not find any convincing evidence of a difference in population means between groups A and C.
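The pairwise tests in Exercises 8.49 and 8.54-8.59 all use the same standard error built from MSE; here is a minimal sketch in Python (the helper function is just illustrative, and the numbers shown are from Exercise 8.49):

from math import sqrt
from scipy.stats import t

def pairwise_t_after_anova(mean1, mean2, n1, n2, mse, df_error):
    """t-statistic and two-tailed p-value for comparing two group means after ANOVA."""
    se = sqrt(mse * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / se
    p_value = 2 * t.sf(abs(t_stat), df_error)
    return t_stat, p_value

# Groups A and C from the output above: MSE = 6.20 with 12 error df
print(pairwise_t_after_anova(10.2, 10.8, 5, 5, 6.20, 12))   # t about -0.38, p about 0.71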
8.50 Yes, the p-value (0.003) is very small so we have strong evidence that there is a difference in the population means between at least two of the groups.

8.51 The pooled standard deviation is √MSE = √48.3 = 6.95. We use the error degrees of freedom, which we see in the output is 20.
8.52 We see in the output that the sample mean for group A is 86.83 with a sample size of 6. For a confidence interval for a mean after an analysis of variance test, we use √MSE for the standard deviation, and we use error df for the degrees of freedom. For a 99% confidence interval with 20 degrees of freedom, we have t* = 2.85. We have

xA ± t*·√MSE/√nA
86.83 ± 2.85·√48.3/√6
86.83 ± 8.09
78.74 to 94.92

We are 99% confident that the population mean for group A is between 78.74 and 94.92. Since 90 is within this interval, it is a plausible value for the population mean of group A.

8.53 We see in the output that the sample mean for group C is 80.0 while the sample mean for group D is 69.33. Both sample sizes are 6. For a confidence interval for a difference in means after an analysis of variance test, we use √MSE for both standard deviations, and we use the error df degrees of freedom. For a 95% confidence interval with 20 degrees of freedom, we have t* = 2.09. The confidence interval is

(xC − xD) ± t*·√(MSE(1/nC + 1/nD))
(80.00 − 69.33) ± 2.09·√(48.3(1/6 + 1/6))
10.67 ± 8.39
2.28 to 19.06
We are 95% confident that the difference in the population means of group C and group D is between 2.28 and 19.06. 8.54 We are testing H0 : µA = µD vs Ha : µA = µD . The test statistic is t=
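Intervals like those in Exercises 8.52 and 8.53 follow one template; here is a minimal sketch in Python (the helper is illustrative, and the numbers shown are from Exercise 8.53):

from math import sqrt
from scipy.stats import t

def ci_diff_after_anova(mean1, mean2, n1, n2, mse, df_error, conf=0.95):
    """Confidence interval for a difference in two group means after ANOVA."""
    t_star = t.ppf(1 - (1 - conf) / 2, df_error)
    margin = t_star * sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Groups C and D: MSE = 48.3 with 20 error df
print(ci_diff_after_anova(80.00, 69.33, 6, 6, 48.3, 20))   # roughly (2.3, 19.0)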
86.83 − 69.33 xA − xD = 4.36 = 48.3 16 + 16 M SE n1A + n1D
This is a two-tail test so the p-value is twice the area above 4.36 in a t-distribution with df = 20. We see that the p-value is 2(0.00015) = 0.0003. This is a very small p-value so we do find evidence of a difference in the population means between groups A and D. This is not surprising since the ANOVA found a difference in means and these two means are the farthest apart of the four means.
8.55 We are testing H0: µA = µB vs Ha: µA ≠ µB. The test statistic is

t = (xA − xB)/√(MSE(1/nA + 1/nB)) = (86.83 − 76.17)/√(48.3(1/6 + 1/6)) = 2.66
This is a two-tail test so the p-value is twice the area above 2.66 in a t-distribution with df = 20. We see that the p-value is 2(0.0075) = 0.015. With a 5% significance level, we reject H0 and find evidence of a difference in the population means between groups A and B.

8.56 We are testing H0: µB = µD vs Ha: µB ≠ µD. The test statistic is

t = (xB − xD)/√(MSE(1/nB + 1/nD)) = (76.17 − 69.33)/√(48.3(1/6 + 1/6)) = 1.70
This is a two-tail test so the p-value is twice the area above 1.70 in a t-distribution with df = 20. We see that the p-value is 2(0.052) = 0.104. This is not a small p-value (even at a 10% level), so we do not find convincing evidence of a difference in population means between groups B and D.

8.57 To compare mean ant counts for peanut butter and ham & pickles the relevant hypotheses are H0: µ2 = µ3 vs Ha: µ2 ≠ µ3. We compare the sample means, x2 = 34.0 and x3 = 49.25, and standardize using the SE for a difference in means after ANOVA.

t = (x2 − x3)/√(MSE(1/n2 + 1/n3)) = (34.0 − 49.25)/√(138.7(1/8 + 1/8)) = −15.25/5.89 = −2.59
We find the p-value using a t-distribution with 21 (error) degrees of freedom, doubling the area below t = −2.59 to get p-value =2(0.0085) = 0.017. This is a small p-value so we have evidence of a difference in mean number of ants between peanut butter and ham & pickles sandwiches, with ants seeming to prefer ham & pickles. 8.58
(a) If we let µ1 be the mean pulse rate for Academy Award and µ2 be the mean for Nobel, the hypotheses are H0: µ1 = µ2 vs Ha: µ1 ≠ µ2. Using the information from the output, the test statistic is

t = (70.52 − 72.21)/√(144(1/31 + 1/149)) = −0.71

The two-tailed p-value using a t-distribution with 359 degrees of freedom is 0.478 which is not small. We do not have convincing evidence of a difference in average pulse rates between students who prefer an Academy Award and those who want a Nobel Prize.
(b) If we let µ1 be the mean pulse rate for Academy Award and µ3 be the mean for Olympic gold, the hypotheses are H0: µ1 = µ3 vs Ha: µ1 ≠ µ3. The test statistic is

t = (70.52 − 67.25)/√(144(1/31 + 1/182)) = 1.40
The two-tailed p-value using a t-distribution with 359 degrees of freedom is 0.163 which is not small. We do not have convincing evidence of a difference in average pulse rates between students who prefer an Academy Award and those who want an Olympic gold medal.
(c) To test H0: µ2 = µ3 vs Ha: µ2 ≠ µ3 the test statistic is

t = (72.21 − 67.25)/√(144(1/149 + 1/182)) = 3.74
The two-tailed p-value using a t-distribution with 359 degrees of freedom is 0.0002 which is quite small. We have strong evidence of a difference in average pulse rates between students who prefer a Nobel Prize and those who want an Olympic gold medal, with students who want the Olympic gold tending to have lower pulse rates. Note that the mean pulse rate for Academy Award is not considered different from either Nobel or Olympic, even though those two groups were found to have different mean pulse rates. We could also use technology to address these three pairwise differences. For example, here is some computer output giving confidence intervals for each pairwise difference.

Award     N     Mean    Grouping
Nobel     149   72.21   A
Academy   31    70.52   A B
Olympic   182   67.25     B

Award = Academy subtracted from:

Difference          Lower   Center   Upper
Nobel - Academy     -2.96    1.70     6.36
Olympic - Academy   -7.85   -3.26     1.32
Olympic - Nobel     -7.57   -4.96    -2.35
Note that the first two intervals include zero, so we do not see a significant difference in mean pulse rates when those groups are compared. The last interval has only negative values, indicating that the mean pulse rate for students who prefer the Olympic gold is probably less than the mean pulse rate for students who choose the Nobel Prize. 8.59 We have three pairs to test. We first test H0 : µDM = µLD vs Ha : µDM = µLD . The test statistic is t=
xDM − xLD 7.859 − 5.987 = 1.60 = 1 1 1 6.48 10 + 19 M SE nDM + nLD
This is a two-tail test so the p-value is twice the area above 1.60 in a t-distribution with df = 25. We see that the p-value is 2(0.061) = 0.122. We don’t find convincing evidence for a difference in mean weight gain between the dim light condition and the light/dark condition. We next test H0 : µDM = µLL vs Ha : µDM = µLL . The test statistic is t=
xDM − xLL 7.859 − 11.010 = −2.69 = 1 1 1 6.48 10 + 19 + nLL M SE nDM
This is a two-tail test so the p-value is twice the area below −2.69 in a t-distribution with df = 25. We see that the p-value is 2(0.0063) = 0.0126. At a 5% level, we do find a difference in mean weight gain between the
dim light condition and the bright light condition, with higher mean weight gain in the bright light condition.

Finally, we test H0: µLD = µLL vs Ha: µLD ≠ µLL. The test statistic is

t = (xLD − xLL)/√(MSE(1/nLD + 1/nLL)) = (5.987 − 11.010)/√(6.48(1/9 + 1/9)) = −4.19
This is a two-tail test so the p-value is twice the area below −4.19 in a t-distribution with df = 25. We see that the p-value is 2(0.00015) = 0.0003. There is strong evidence of a difference in mean weight gain between the light/dark condition and the bright light condition, with higher mean weight gain in the bright light condition.

8.60 We have three pairs to test. We first test H0: µDM = µLD vs Ha: µDM ≠ µLD. The test statistic is

t = (xDM − xLD)/√(MSE(1/nDM + 1/nLD)) = (55.516 − 36.485)/√(92.8(1/10 + 1/9)) = 4.30
This is a two-tail test so the p-value is twice the area above 4.30 in a t-distribution with df = 25. We see that the p-value is 2(0.0001) = 0.0002. We find strong evidence of a difference in mean daytime consumption percent between the dim light condition and the light/dark condition. A higher mean percentage of food is consumed during the day in the dim light condition.

We next test H0: µDM = µLL vs Ha: µDM ≠ µLL. The test statistic is

t = (xDM − xLL)/√(MSE(1/nDM + 1/nLL)) = (55.516 − 76.573)/√(92.8(1/10 + 1/9)) = −4.76
This is a two-tail test so the p-value is twice the area below −4.76 in a t-distribution with df = 25. We see that the p-value is 2(0.00003) = 0.00006. We find strong evidence of a difference in mean daytime consumption percent between the dim light condition and the bright light condition. A higher mean percentage of food is consumed during the day in the bright light condition.

Finally, we test H0: µLD = µLL vs Ha: µLD ≠ µLL. The test statistic is

t = (xLD − xLL)/√(MSE(1/nLD + 1/nLL)) = (36.485 − 76.573)/√(92.8(1/9 + 1/9)) = −8.83
This is a two-tail test so the p-value is twice the area below −8.83 in a t-distribution with df = 25. We see that the p-value is essentially zero, so there is very strong evidence of a difference in mean daytime consumption percent between the light/dark condition and the bright light condition. A higher mean percentage of food is consumed during the day in the bright light condition. 8.61
(a) The p-value for the ANOVA table is essentially zero, so we have strong evidence that there are differences in mean time spent in darkness among the six treatment combinations. Since all the sample sizes are the same, the groups most likely to show a difference in population means are the ones with the sample means farthest apart. These groups are IE:HC (impoverished environment with no added
stress, x1 = 192) and SE:SD (standard environment with added stress, x5 = 438). Likewise, the groups least likely to show a difference in population means are the ones with the sample means closest together. These groups are IE:HC (impoverished environment with no added stress, x1 = 192) and SE:HC (standard environment with no added stress, x2 = 196).

(b) Given the six means, there appear to be two distinct groups. The four means (all environments with no added stress, along with the enriched environment with added stress) are all somewhat similar, while the other two means (impoverished or standard environment with added stress) appear to be much larger than the other four and similar to each other. Only the enriched environment with its opportunities for exercise appears to have conferred some immunity to the added stress.

(c) Let µ1 and µ6 be the means for mice in IE:HC and EE:SD conditions, respectively. We test H0: µ1 = µ6 vs Ha: µ1 ≠ µ6. The test statistic is

t = (x1 − x6)/√(MSE(1/n1 + 1/n6)) = (192 − 231)/√(2469.9(1/8 + 1/8)) = −1.57
This is a two-tail test so the p-value is twice the area below −1.57 in a t-distribution with df = 42. We see that the p-value is 2(0.062) = 0.124. We do not reject H0 and do not find convincing evidence of a difference in mean time spent in darkness between mice in the two groups. Prior exercise in an enriched environment may help eliminate the effects of the added stress.

8.62 Here is some computer output for pairwise comparisons after doing an ANOVA to compare mean hang out time between the SchoolPressure groups.

Grouping Information Using the Fisher LSD Method and 95% Confidence
SchoolPressure   N     Mean     Grouping
Very little      55    15.91    A
None             16    12.19    A B
Some             197   10.977     B
A lot            179   9.866      B
Means that do not share a letter are significantly different.

Fisher Individual Tests for Differences of Means
Difference of Levels   Difference of Means   SE of Difference   95% CI          T-Value   Adjusted P-Value
None - A lot           2.32                  2.73               (-3.05, 7.69)   0.85      0.396
Some - A lot           1.11                  1.08               (-1.01, 3.24)   1.03      0.305
Very little - A lot    6.04                  1.61               ( 2.87, 9.22)   3.74      0.000
Some - None            -1.21                 2.72               (-6.56, 4.14)   -0.44     0.657
Very little - None     3.72                  2.97               (-2.12, 9.57)   1.25      0.211
Very little - Some     4.93                  1.60               ( 1.79, 8.07)   3.09      0.002
We see that the mean HangHours for the “Very little” group (15.91) is shown to be higher than either the “Some” (10.98) or the “A lot” (9.87) groups. They are given different letters in the “Grouping Information” of the output and the confidence intervals for both differences include only positive values. Interestingly the mean HangHours for the “None” group (12.19) is not shown to be significantly different from any of the other three groups.
We can also easily see these relationships in the plot of the confidence intervals for differences of means shown below. All of the intervals include zero except for “Very little - A lot” and “Very little - Some” which include only positive values.
8.63 Here is some computer output for pairwise comparisons after doing an ANOVA to compare mean texts sent between the Communicate groups.

Grouping Information Using the Fisher LSD Method and 95% Confidence
Communicate   N     Mean     Grouping
Phone         36    109.6    A
Text          248   99.84    A
In person     94    46.81      B
App           67    40.6       B
Means that do not share a letter are significantly different.

Fisher Individual Tests for Differences of Means
Difference of Levels   Difference of Means   SE of Difference   95% CI           T-Value   Adjusted P-Value
In person - App        6.2                   19.1               (-31.3, 43.7)    0.32      0.745
Phone - App            69.0                  24.6               ( 20.6, 117.5)   2.80      0.005
Text - App             59.2                  16.4               ( 27.0, 91.5)    3.61      0.000
Phone - In person      62.8                  23.4               ( 16.9, 108.8)   2.69      0.007
Text - In person       53.0                  14.4               ( 24.6, 81.4)    3.67      0.000
Text - Phone           -9.8                  21.3               (-51.6, 32.0)    -0.46     0.645
We see that the Communicate groups are split into two sets. Both the “Phone” (109.6) and the “Text” (99.8) groups have mean numbers of texts sent that are shown to be higher than either the “In person” (46.8) or the “App” (40.6) groups. They get different letters in the “Grouping Information” of the output and all of the confidence intervals for differences in means between those pairs do not include zero. However, we do not have convincing evidence of a difference in mean texts sent between the “Phone” and “Text” groups or the “In person” and “App” groups because those respective confidence intervals include zero. These comparisons can also be seen in the plot of the differences in means confidence intervals shown below.
8.64
(a) When doing a pairwise comparison after ANOVA for the moderate and aggressive treatments, the response variables measuring drug resistance both show significant differences (p-value ≈ 0 for ResistanceDensity and p-value = 0.023 for DaysInfectious), with higher means in the aggressive treatment group for both variables.
(b) DaysInfectious is the only resistance response variable with a significant difference (p-value = 0.033) between the light and moderate treatment groups, with a higher mean in the moderate treatment group. The p-value when comparing the ResistanceDensity means between these two groups is 0.744, which is not significant.
(c) Duration seems to be more influential, because both drug resistance response variables were significant (as opposed to just one for amount) and also because the p-values for both of these variables were smaller for the duration comparison than for the amount comparison.

8.65 We test H0 : µ1 = µ4 vs Ha : µ1 ≠ µ4 where µ1 and µ4 are the mean change in closeness rating after doing a synchronized, high exertion activity (HS+HE) and a non-synchronized, low exertion activity (LS+LE), respectively. From the table of group means we see x̄1 = 0.319 for a sample of n1 = 72 and x̄4 = −0.431 for a sample of n4 = 58. We also find MSE = 3.248 in the ANOVA table, so the t-statistic is

t = (x̄1 − x̄4)/√(MSE(1/n1 + 1/n4)) = (0.319 − (−0.431))/√(3.248(1/72 + 1/58)) = 2.36
The p-value is twice the area above 2.36 in a t-distribution with df = 256. We see that the p-value is 2(0.0095) = 0.019. This is a small p-value (less than 5%) so we have convincing evidence that the mean change in closeness rating for a synchronized, high exertion activity is higher than for a non-synchronized, low exertion activity. Note that the HS+HE group had the smallest mean of the three activity groups that were similar and positive, so we can draw a similar conclusion about how the other two groups relate to the LS+LE group.

8.66 Some output for an ANOVA with the data in TextbookCosts is shown below.

Source   DF      SS     MS     F      P
Field     3   30848  10283  4.05  0.014
Error    36   91294   2536
Total    39  122142
Level            N    Mean   StDev
Arts            10   94.60   44.95
Humanities      10  120.30   58.15
NaturalScience  10  170.80   48.49
SocialScience   10  118.30   48.90
We have 36 degrees of freedom for the MSE = 2536, so the t-value for a 95% confidence interval is t∗ = 2.028. The sample size in each group is 10, so the value of LSD for comparing any two means is

LSD = 2.028·√(2536(1/10 + 1/10)) = 45.7

We write the group means down in increasing order:

Arts = 94.6    Social Science = 118.3    Humanities = 120.3    Natural Science = 170.8
The first three sample means are all within 45.7 of each other, while the Natural Science mean is more than 45.7 above each of them. The mean textbook cost for Natural Science courses is significantly higher than for the other three fields, which are not significantly different from each other.

8.67 We have 42 degrees of freedom for the MSE = 2469.9, so the t-value for a 95% confidence interval is t∗ = 2.018. The sample size in each group is 8, so the value of LSD for comparing any two means is

LSD = 2.018·√(2469.9(1/8 + 1/8)) = 50.1

We write the group means down in increasing order:

Group:  IE:HC  SE:HC  EE:HC  EE:SD  IE:SD  SE:SD
Mean:     192    196    205    231    392    438
The first four sample means are all within 50.1 of each other, so there are no significant differences in mean time in darkness for any of the non-stressed mice groups or the stressed group that has an enriched environment. The mean time is higher for the other two stressed groups (both means are more than 50.1 seconds above the EE:SD group mean), although those two are not significantly different from each other.

8.68 Here is some computer output with the group means and the ANOVA table for testing for a difference in mean gill rates between the three calcium levels.

Level     N   Mean  StDev
High     30  58.17  13.78
Low      30  68.50  16.23
Medium   30  58.67  14.28

Source    DF     SS    MS     F      P
Calcium    2   2037  1019  4.65  0.012
Error     87  19064   219
Total     89  21102
The p-value=0.012 is small, so we conclude there is probably some difference among the means. Which groups are different? We use technology to find 95% confidence intervals for the difference in each pair of means.
Calcium          Lower  Center   Upper
Low - High        2.74   10.33   17.93
Medium - High    -7.10    0.50    8.10
Medium - Low    -17.43   -9.83   -2.24
We see that the first interval (Low − High) includes only positive values and the third interval (Medium − Low) includes only negative values. This indicates that the mean gill rate with a low level of calcium is larger than when the calcium is at a medium or high level. The middle interval (Medium − High) includes zero and has both positive and negative values. We do not find a significant difference in mean gill rate between the medium and high calcium levels.
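As a rough check, these pairwise intervals can be reproduced from the group means and the MSE in the ANOVA table above. A minimal Python sketch for the Low - High comparison (an unadjusted interval using the error df of 87, so it may differ slightly from the technology output):

from math import sqrt
from scipy import stats

mse, df_error, n = 219, 87, 30                 # MSE, error df, and sample size per group from the ANOVA output
t_star = stats.t.ppf(0.975, df_error)          # about 1.988
se_diff = sqrt(mse * (1/n + 1/n))              # standard error of a difference in two group means
diff = 68.50 - 58.17                           # Low mean minus High mean
print(diff - t_star * se_diff, diff + t_star * se_diff)   # roughly (2.7, 17.9), matching the Low - High interval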
CHAPTER 9
Section 9.1 Solutions
9.1 The estimates for β0 and β1 are given in two different places in this output, with the output in the table having more digits of accuracy. We see that the estimate for the intercept β0 is b0 = 29.3 or 29.266 and the estimate for the slope β1 is b1 = 4.30 or 4.2969. The least squares line is Ŷ = 29.3 + 4.30X.

9.2 The estimates for β0 and β1 are given in two different places in this output, with the output in the table having more digits of accuracy. We see that the estimate for the intercept β0 is b0 = 808 or 807.79 and the estimate for the slope β1 is b1 = −3.66 or −3.659. The least squares line is Ŷ = 808 − 3.66A.

9.3 The estimates for β0 and β1 are given in the “Estimate” column of the computer output. We see that the estimate for the intercept β0 is b0 = 77.44 and the estimate for the slope β1 is b1 = −15.904. The least squares line is Ŷ = 77.44 − 15.904 · Score.

9.4 The estimates for β0 and β1 are given in the “Estimate” column of the computer output. We see that the estimate for the intercept β0 is b0 = 7.277 and the estimate for the slope β1 is b1 = −0.3560. The least squares line is Ŷ = 7.277 − 0.3560 · Dose.

9.5 The slope is b1 = −8.1952. The null and alternative hypotheses for testing the slope are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the p-value is 0.000, so there is strong evidence that the explanatory variable X is an effective predictor of the response variable Y.

9.6 The slope is b1 = −0.02413. The null and alternative hypotheses for testing the slope are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the p-value is 0.245, so X does not appear to be an effective predictor of Y.

9.7 The slope is b1 = −0.3560. The null and alternative hypotheses for testing the slope are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the p-value is 0.087. At a 5% level, we do not find evidence that the explanatory variable Dose is an effective predictor of the response variable for this model.

9.8 The slope is b1 = −3.659. The null and alternative hypotheses for testing the slope are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the p-value is 0.006, so there is relatively strong evidence that the explanatory variable A is an effective predictor of the response variable for this model.

9.9 A confidence interval for the slope β1 is given by b1 ± t∗ · SE. For a 95% confidence interval with df = n − 2 = 22, we have t∗ = 2.07. We see from the output that b1 = −8.1952 and the standard error for the slope is SE = 0.9563. A 95% confidence interval for the slope is

b1 ± t∗ · SE
−8.1952 ± 2.07(0.9563)
−8.1952 ± 1.980
−10.1752 to −6.2152
We are 95% confident that the population slope β1 for this model is between −10.18 and −6.22. We can’t give a more informative interpretation without some context for the variables. 9.10 A confidence interval for the slope β1 is given by b1 ± t∗ · SE. For a 95% confidence interval with df = n − 2 = 28, we have t∗ = 2.05. We see from the output that b1 = −0.3560 and the standard error for
the slope is SE = 0.2007. A 95% confidence interval for the slope is

b1 ± t∗ · SE
−0.3560 ± 2.05(0.2007)
−0.3560 ± 0.4114
−0.7674 to 0.0554
We are 95% confident that the population slope β1 for this model is between −0.767 and 0.055. We can’t give a more informative interpretation without some context for the variables.

9.11 We are testing H0 : ρ = 0 vs Ha : ρ > 0. To find the test statistic, we use

t = r√(n − 2)/√(1 − r²) = 0.35√28/√(1 − (0.35)²) = 1.98

This is a one-tail test, so the p-value is the area above 1.98 in a t-distribution with df = n − 2 = 28. We see that the p-value is 0.029. At a 5% level, we have sufficient evidence for a positive linear association between the two variables.

9.12 We are testing H0 : ρ = 0 vs Ha : ρ ≠ 0. To find the test statistic, we use

t = r√(n − 2)/√(1 − r²) = 0.28√8/√(1 − (0.28)²) = 0.825

This is a two-tail test, so the p-value is twice the area above 0.825 in a t-distribution with df = n − 2 = 8. We see that the p-value is 2(0.217) = 0.434. We do not find sufficient evidence for a linear association between the two variables.

9.13 We are testing H0 : ρ = 0 vs Ha : ρ ≠ 0. To find the test statistic, we use

t = r√(n − 2)/√(1 − r²) = 0.28√98/√(1 − (0.28)²) = 2.89

This is a two-tail test, so the p-value is twice the area above 2.89 in a t-distribution with df = n − 2 = 98. We see that the p-value is 2(0.0024) = 0.0048. We find strong evidence of a linear association between the two variables.

9.14 We are testing H0 : ρ = 0 vs Ha : ρ < 0. To find the test statistic, we use

t = r√(n − 2)/√(1 − r²) = −0.41√16/√(1 − (−0.41)²) = −1.80

This is a one-tail test, so the p-value is the area below −1.80 in a t-distribution with df = n − 2 = 16. We see that the p-value is 0.045. At a 5% level, we find (just barely) evidence of a negative linear association between the two variables.

9.15
(a) The two variables most strongly positively correlated are Height and Weight. The correlation is r = 0.619 and the p-value is 0.000 to three decimal places. A positive correlation in this context means that taller people tend to weigh more.
(b) The two variables most strongly negatively correlated are GPA and Weight. The correlation is r = −0.217 and the p-value is 0.000 to three decimal places. A negative correlation in this context means that heavier people tend to have lower grade point averages.
(c) At a 5% significance level, almost all pairs of variables have a significant correlation. The only pair not significantly correlated is hours of Exercise and hours of TV, with r = 0.010 and p-value = 0.852. Based on these data, we find no convincing evidence of a linear association between time spent exercising and time spent watching TV.

9.16
(a) The two variables most strongly positively correlated are Rebounds and Points. The correlation is r = 0.533 and the p-value is 0.000 to three decimal places. A positive correlation in this context means that players who get more rebounds also tend to score more points. The positive correlation between Steals and Points is almost as strong (r = 0.527).
(b) The two variables most strongly negatively correlated are Rebounds and FTPct. The correlation is r = −0.136 and the p-value is 0.060 to three decimal places, which gives some mild evidence for a relationship. A negative correlation in this context means that players who get lots of rebounds tend to not be very good at shooting free throws.
(c) The variable Age is not significantly correlated with Points (p-value = 0.319), Rebounds (p-value = 0.157), or Steals (p-value = 0.912). The variable FTPct is also not significantly correlated with Steals (p-value = 0.564) and is not quite significant at a 5% level with Rebounds (p-value = 0.060). All other pairs are significantly correlated at a 5% level.

9.17
(a) The explanatory variable is the lifestyle score, and the response variable is the number of disease-free years.
(b) The second half of the sentence is interpreting the slope of the regression line. (c) No, we cannot conclude this, because correlation does not imply causation. These results come from an observational study, not an experiment. There are possible confounding factors. (d) Since every 1-point improvement in the score predicts 0.93 more disease-free years, we see that an 8 point increase in the score predicts 8 · 0.93 = 7.44 more disease-free years. People with the best lifestyle score are predicted to have an additional 7.44 disease-free years. 9.18
(a) The predicted CESD score for this person is

4.97 + 0.7923(20) = 20.816

The residual is the actual CESD score of 27 minus the predicted score of 20.816. We have

Residual = 27 − 20.816 = 6.184
(b) The estimated slope is 0.7923. If the DASS score is one point higher, the predicted score for CESD is 0.7923 points higher. (c) The test statistic is 8.33 and the p-value is 0.000. This small p-value gives strong evidence that DASS score is effective as a predictor of the CESD score. (d) In the output we see that R2 is 48.75%. This tells us that 48.75% of the variability in CESD scores can be explained by DASS scores. 9.19
(a) On the scatterplot, we have concerns if there is a curved pattern (there isn’t) or variability from the line increasing or decreasing in a consistent way (it isn’t) or extreme outliers (there aren’t any). We do not have any strong concerns about using these data to fit a linear model.
(b) Using the fitted least squares line in the output, for a student with a verbal SAT score of 650, we have

GPA = 2.03 + 0.00189 · VerbalSAT = 2.03 + 0.00189(650) = 3.2585

The model predicts that a student with a 650 on the verbal portion of the SAT exam will have about a 3.26 GPA at this college.
(c) From the computer output the sample slope is b1 = 0.00189. This means that an additional point on the Verbal SAT exam gives a predicted increase in GPA of 0.00189. (And, likewise, a 100-point increase in verbal SAT raises the predicted GPA by 0.189.)
(d) From the computer output the test statistic for testing H0 : β1 = 0 vs Ha : β1 ≠ 0 is t = 6.99 and the p-value is 0.000. This small p-value gives strong evidence that the verbal SAT score is effective as a predictor of grade point average.
(e) In the output we see that R2 is 12.5%. This tells us that 12.5% of the variability in grade point averages can be explained by verbal SAT scores.

9.20
(a) On the scatterplot, we have concerns if there is a curved pattern (there isn’t) or variability from the line increasing or decreasing in a consistent way (it isn’t) or extreme outliers (there aren’t any). We do not have any serious concerns about using these data to fit a linear model.
(b) In the output the correlation is r = 0.740 and the p-value is 0.000. This small p-value gives strong evidence of a linear relationship between body mass gain and when food is eaten.
(c) From the computer output, the least squares line is BMGain = 1.11 + 0.127 · DayPct. For a mouse that eats 50% of calories during the day, we have

BMGain = 1.11 + 0.127(50) = 7.46 grams

A mouse that eats 50% of its calories during the day is predicted to gain 7.46 grams over a 4-week period.
(d) The estimated slope is b1 = 0.127. For an additional 1% of calories eaten during the day, body mass gain is predicted to go up by 0.127 grams.
(e) For testing H0 : β1 = 0 vs Ha : β1 ≠ 0 we see t = 5.50 and p-value ≈ 0. The percent of calories eaten during the day is an effective predictor of body mass gain.
(f) The p-values for testing the correlation and the slope for these two variables are the same: both are 0.000. In fact, if we calculate the t-statistic for testing the correlation using r = 0.74 and n = 27 we have

t = r√(n − 2)/√(1 − r²) = 0.74√(27 − 2)/√(1 − (0.74)²) = 5.50

which matches the t-statistic for the slope.
(g) We see that R2 = 54.7%. This tells us that 54.7% of the variability in body mass gain can be explained by the percent of calories eaten during the day. More than half of the variability in body mass gain can be explained simply by when the calories are eaten.
(h) Using r = 0.740, we find that r2 = (0.740)² = 0.5476, matching R2 = 54.7% up to round-off.

9.21
(a) Because the variable uses z-scores, we look to see if any of the values for the variable GM density are larger than 2 or less than −2. We see that there is only one value outside this range, a point with GM density ≈ −2.2 which is less than −2. This participant has a normalized grey matter score more than two standard deviations below the mean and has about 140 Facebook friends.
(b) On the scatterplot, we have concerns if there is a curved pattern (there isn’t) or variability from the line increasing or decreasing (it isn’t) or extreme outliers (there aren’t any). We do not have any serious concerns about using these data to fit a linear model.
(c) In the output the correlation is r = 0.436 and the p-value is 0.005. This is a small p-value so we have strong evidence of a linear relationship between number of Facebook friends and grey matter density.
(d) The least squares line is FBfriends = 367 + 82.4 · GMdensity. For a person with a normalized grey matter score of 0, the predicted number of Facebook friends is 367. For a person with GMdensity one standard deviation above the mean, the predicted number of Facebook friends is 367 + 82.4(1) = 449.4. For a person with grey matter density one standard deviation below the mean, the predicted number of Facebook friends is 367 + 82.4(−1) = 284.6.
(e) The p-value for a test of the slope is 0.005, exactly matching the p-value for the test of correlation. In fact, if we calculate the t-statistic for testing the correlation using r = 0.436 and n = 40 we have

t = r√(n − 2)/√(1 − r²) = 0.436√(40 − 2)/√(1 − (0.436)²) = 2.99

which matches the t-statistic for the slope in the computer output.
(f) We see in the computer output that R2 = 19.0%. This tells us that 19% of the variability in number of Facebook friends can be explained by the normalized grey matter density in the areas of the brain associated with social perception. Since 19% is not very large, many other factors are involved in explaining the other 81% of the variability in number of Facebook friends.

9.22
(a) We see in the output that the estimated slope is b1 = 82.45 and the standard error of the slope is 27.58.
(b) The hypotheses for a test of the slope are H0 : β1 = 0 vs Ha : β1 ≠ 0. The test statistic is

t = (b1 − 0)/SE = 82.45/27.58 = 2.989
The degrees of freedom are n − 2 = 40 − 2 = 38 and the area above 2.989 on a t-distribution with df = 38 is 0.0024. The p-value for this two-tailed test is 2(0.0024) = 0.0048. Up to round-off error, the test statistic (2.99) and p-value (0.005) match those given in the computer output. Since the p-value is small we reject H0 and conclude that the normalized grey matter density score is an effective predictor of the number of Facebook friends.
(c) The estimated slope is b1 = 82.45 and the standard error is SE = 27.58. For 95% confidence we use a t-distribution with 40 − 2 = 38 degrees of freedom to find t∗ = 2.02. The confidence interval for the slope is

b1 ± t∗ · SE
82.45 ± 2.02(27.58)
82.45 ± 55.71
26.74 to 138.16
Based on these data we are 95% sure that, for a one standard deviation increase in normalized brain grey matter density, the predicted increase in the number of Facebook friends is between 26.74 and 138.16.
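A minimal Python sketch of this slope interval (not part of the original solution; scipy's exact t∗ for 38 df is about 2.024, so the endpoints differ slightly from those based on the rounded value 2.02):

from scipy import stats

b1, se, df = 82.45, 27.58, 38              # slope, its standard error, and n - 2 from the output
t_star = stats.t.ppf(0.975, df)            # two-sided 95% critical value, about 2.024
print(b1 - t_star * se, b1 + t_star * se)  # roughly 26.6 to 138.3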
9.23
(a) For a pH reading of 6.0 we have

AvgMercury = 1.53 − 0.152 · pH = 1.53 − 0.152(6) = 0.618

The model predicts that fish in lakes with a pH of 6.0 will have an average mercury level of 0.618.
(b) The estimated slope is b1 = −0.152. This means that as pH increases by one unit, predicted average mercury level in fish will go down by 0.152 units.
(c) The test statistic is t = −5.02, and the p-value is essentially zero. Since this is a very small p-value we have strong evidence that the pH of a lake is effective as a predictor of mercury levels in fish.
(d) The estimated slope is b1 = −0.152 and the standard error is SE = 0.03031. For 95% confidence we use a t-distribution with 53 − 2 = 51 degrees of freedom to find t∗ = 2.01. The confidence interval for the slope is

b1 ± t∗ · SE
−0.152 ± 2.01(0.03031)
−0.152 ± 0.0609
−0.2129 to −0.0911
Based on these data we are 95% sure that the slope (increase in mercury for a one unit increase in pH) is somewhere between −0.213 and −0.091.
(e) We see that R2 is 33.1%. This tells us that 33.1% of the variability in average mercury levels in fish can be explained by the pH of the lake water that the fish come from.

9.24 There are several ways in which these data might raise concern.
• There appears to be a curved pattern in the data more than a linear pattern.
• The variability seems to be much greater for low levels of alkalinity than for high levels.
• There appear to be two outliers in the middle of the plot that are quite far above the trend in the rest of the data.

9.25
(a) Since 79% gives the percent of variation accounted for, it is a value of R-squared.
(b) Since precipitation is accounting for prevalence of virus, the response variable is prevalence of the virus and the explanatory variable is precipitation.
(c) Since R2 = 0.79, the correlation is r = √0.79 = 0.889. The correlation might be either 0.889 or −0.889, but we are told that prevalence increased as precipitation increased, so the correlation is positive. We have r = 0.889.

9.26
(a) We are testing whether the population correlation is positive, so the hypotheses are H0 : ρ = 0 vs Ha : ρ > 0, where ρ represents the correlation between number of pre-season wins and the number of regular season wins for all NFL teams in any season.
(b) We have n = 480 and r = 0.118. We find the test statistic using:

t = r√(n − 2)/√(1 − r²) = 0.118√(480 − 2)/√(1 − 0.118²) = 2.60

To find the p-value, we find the proportion beyond 2.60 in the right tail of a t-distribution with 478 df; the p-value is 0.0048.
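A quick Python check of this calculation (a sketch, not part of the original solution):

from math import sqrt
from scipy import stats

r, n = 0.118, 480
t = r * sqrt(n - 2) / sqrt(1 - r**2)   # test statistic for H0: rho = 0
p = stats.t.sf(t, df=n - 2)            # one-tail (upper) p-value
print(round(t, 2), round(p, 4))        # about 2.60 and 0.0048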
(c) The p-value is smaller than 0.05 so we reject H0 . We do find some evidence of a positive linear association between wins in the pre-season and wins in the regular season in the NFL. 9.27
(a) We test H0 : ρ = 0 vs Ha : ρ ≠ 0, where ρ represents the population correlation between number of honeybee colonies and year. We use technology to calculate the test statistic and the p-value, or we can calculate them as follows: We have n = 18 and r = −0.41, so we can calculate the test statistic as

t = r√(n − 2)/√(1 − r²) = −0.41√16/√(1 − (−0.41)²) = −1.80

The proportion beyond −1.80 in the left tail of a t-distribution with 16 df is 0.045. This is a two-tail test, so the p-value is 2(0.045) = 0.09. At a 5% level, we do not have enough evidence to conclude that the number of bee colonies is linearly related to year.
(b) The percent of the variability refers to the coefficient of determination R2 = (−0.41)² = 0.168. We see that about 16.8% of the variability in number of honeybee colonies can be explained by year.

9.28
(a) The null hypothesis is that the correlation between number of bedrooms and number of bathrooms is equal to zero H0 : ρ = 0, and the alternative is that the correlation is greater than zero H1 : ρ > 0.
(b) Using technology we find that the correlation for the sample of 30 homes in HomeForSaleCA is r = 0.350.
(c) We calculate the test statistic

t = r√(n − 2)/√(1 − r²) = 0.350√(30 − 2)/√(1 − 0.35²) = 1.98

We compare this to the upper tail of a t-distribution with 28 degrees of freedom to get a p-value of 0.029.
(d) Since this is a small p-value, we conclude that there is enough evidence to show a positive association between number of bathrooms and number of bedrooms in houses for sale in California.

9.29
(a) The cases are countries of the world.
(b) The scatterplot with the regression line is shown below. We see no obvious curvature and the variability does not change consistently across the plot. The only mild concern we might have is that the negative residuals (points below the line) tend to stretch farther away than the points above the line. So there may be a bit of concern with the symmetry needed for the residuals to be normally distributed.
(c) Here is some output for fitting this model.
The regression equation is LifeExpectancy = 64.17 + 0.811 Health

Term        Coef  SE Coef  T-Value  P-Value
Constant   64.17     2.16    29.71    0.000
Health     0.811    0.192     4.23    0.000

S = 6.62    R-Sq = 27.15%    R-Sq(adj) = 20.74%
The slope is b1 = 0.811, so we predict that an increase in 1% expenditure on health care would correspond to an increase in life expectancy of about 0.81 years. (d) We have df = 48 so t∗ = 2.011. We find a 95% confidence interval by taking b1 ± t∗ · SE = 0.811 ± 2.011(0.192) = (0.425, 1.197) We are 95% confident the slope for predicting life expectancy using health expenditure for all countries is between 0.425 and 1.197. (e) Since our confidence interval does not contain zero we find that the percentage of government expenditure on health may be an effective predictor of life expectancy at a 5% level. We can also reach this conclusion by considering the p-value (0.000) shown for testing the slope in the regression output. (f) The population slope for all countries (β1 = 0.760) is not the same as the slope found from this sample (b1 = 0.811), but we see that β1 = 0.760 is easily captured within our confidence interval of 0.425 to 1.197. (g) In the output we see that R2 = 27.15%. This shows that 27.15% of the variability in life expectancy in these countries is explained by the percentage of the budget spent on health care. 9.30 We are testing whether the population slope is zero, so we look to see whether 0 is in the confidence interval. Since 0 is not in the confidence interval for boys, we reject H0 in that case and find a low p-value. Since 0 is in the confidence interval for girls, we do not reject H0 in that case and find a high p-value. Therefore, the p-values match up as follows: (a) The CI for boys shows that the slope is negative and significant, so the corresponding p-value here is 0.02. (b) The CI for girls shows that the slope might be positive, might be negative, and might be zero, so the test is not significant and the corresponding p-value is 0.33. 9.31
(a) Here is some output for fitting a model to predict Score based on Par.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.6029   0.624   0.5415
Par           1.0000     0.3953   2.530   0.0223
---
Residual standard error: 1.118 on 16 degrees of freedom
Multiple R-squared: 0.2857, Adjusted R-squared: 0.2411
F-statistic: 6.4 on 1 and 16 DF, p-value: 0.02229

The sample slope for Par is b1 = 1.0 and the p-value for testing the slope is 0.0223. This p-value is fairly small (less than 0.05), so we have good evidence to conclude that Par is an effective predictor of Score.
(b) Here is some output for fitting a model to predict Score based on Distance.
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.562077   0.857300   2.989  0.00868
Distance     0.007148   0.002404   2.973  0.00897
---
Residual standard error: 1.062 on 16 degrees of freedom
Multiple R-squared: 0.3559, Adjusted R-squared: 0.3156
F-statistic: 8.84 on 1 and 16 DF, p-value: 0.008967
The sample slope for Distance is b1 = 0.00715 and the p-value for testing the slope is 0.00897. This p-value is very small (less than 0.05), so we have strong evidence to conclude that Distance is an effective predictor of Score. (c) The slope for Par (1.0) is much larger than the slope for Distance (0.00715). We see that when the par of a hole is larger by one there is a much bigger effect on the predicted score than when the distance of the hole increases by just one yard. (d) Looking at the R2 values of the outputs in parts (a) and (b), we see that Distance (R2 = 35.6%) explains a larger portion of the variability in these scores than does Par (R2 = 28.6%).
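A hedged sketch of how these two single-predictor fits could be produced with statsmodels, assuming the hole-level data are in a data frame with columns Score, Par, and Distance (the file name is hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

golf = pd.read_csv("GolfRound.csv")    # hypothetical file with columns Score, Par, Distance

for predictor in ["Par", "Distance"]:
    fit = smf.ols(f"Score ~ {predictor}", data=golf).fit()
    # slope, p-value for the slope, and R-squared, the quantities compared in parts (a)-(d)
    print(predictor, fit.params[predictor], fit.pvalues[predictor], fit.rsquared)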
9.32
(a) The two scatterplots for predicting WinPct using PtsFor or PtsAgainst are shown below. The conditions for fitting a linear model appear to be reasonable in both cases. There is no obvious curvature and the vertical variability is consistent across the plot.
(b) Here is some output for predicting WinPct using PtsFor.

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.133765   0.565723  -3.772 0.000772
PtsFor       0.023684   0.005084   4.659 7.05e-05

The least squares line is WinPct = −2.1338 + 0.02368 · PtsFor. The p-value for testing H0 : β1 = 0 vs Ha : β1 ≠ 0 is 0.00007, which is very small. This gives strong evidence that PtsFor is an effective predictor of NBA winning percentage.
(c) Here is some output for predicting WinPct using PtsAgainst.

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.789654   0.674621   4.135 0.000292
PtsAgainst   -0.020588   0.006062  -3.396 0.002064

The least squares line is WinPct = 2.7897 − 0.02059 · PtsAgainst. The p-value for testing H0 : β1 = 0 vs Ha : β1 ≠ 0 is 0.002, which is very small. We have very strong evidence that PtsAgainst is an effective predictor of NBA winning percentage. Note that the negative coefficient means the fewer points a team allows, the more games it is expected to win.
(d) The model based on PtsFor has R2 = 43.7%, so PtsFor explains 43.7% of the variability in winning percentage for this season. For PtsAgainst we find R2 = 29.2%, so that variable explains 29.2% of the variability in winning percentage.
(e) For the Raptors’ PtsFor = 114.4, we have WinPct = −2.1338 + 0.02368(114.4) = 0.575. For the Raptors’ PtsAgainst = 108.4, we have WinPct = 2.7897 − 0.02059(108.4) = 0.558. The model based on PtsFor comes closer to the Raptors’ 0.707 actual winning percentage for the 2018–2019 regular season, but neither value is very close.
(f) For the entire league, the model based on offense (PtsFor) with R2 = 43.7% appears to be somewhat more effective for this season than the model based on defense (PtsAgainst) with R2 = 29.2%, although both models are clearly effective given the t-statistics for testing the slopes in the output above. The slopes of the models show a gain of about 0.0237 in predicted winning percentage for every extra 1.0 in points scored per game, and a loss of about 0.0206 for every extra 1.0 in points allowed per game.
9.33
(a) Here is a scatterplot of life expectancy vs birthrate for all countries.
There appears to be a strong negative linear association between birth rate and life expectancy with no obvious curvature and relatively consistent variability. (b) Yes. We have data on the entire population (all countries) so we can compute ρ = −0.879, which is different from 0. (c) We already have data on (essentially) the entire population so we don’t need to use statistical inference to determine where the population correlation might be located. (d) The slope of the linear regression model is β = −0.679, so the predicted life expectancy decreases by about 0.68 years for every one percent increase in birth rate of a country. (e) We cannot conclude that lowering the birthrate in a country would increase its life expectancy. We cannot make conclusions about causality because this is observational data, not data from a randomized experiment.
Section 9.2 Solutions

9.34 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 1.75 and the p-value is 0.187. This is a relatively large p-value so we do not reject H0 . We do not find evidence that the linear model is effective.

9.35 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 21.85 and the p-value is 0.000. This is a small p-value so we reject H0 . We conclude that the linear model is effective.

9.36 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 11.01 and the p-value is 0.001. This is a small p-value so we reject H0 . We conclude that the linear model is effective.

9.37 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 2.18 and the p-value is 0.141. This is a relatively large p-value so we do not reject H0 . We do not find evidence that the linear model is effective.

9.38 We see in the table that the total degrees of freedom is n − 1 = 175, so the sample size is 176. To calculate R2, we use

R2 = SSModel/SSTotal = 303.7/30450.5 = 0.010

We see that R2 = 1.0%.

9.39 We see in the table that the total degrees of freedom is n − 1 = 175, so the sample size is 176. To calculate R2, we use

R2 = SSModel/SSTotal = 3396.8/30450.5 = 0.112

We see that R2 = 11.2%.

9.40 We see in the table that the total degrees of freedom is n − 1 = 360, so the sample size is 361. To calculate R2, we use

R2 = SSModel/SSTotal = 352.97/11864.20 = 0.030

We see that R2 = 3.0%.

9.41 We see in the table that the total degrees of freedom is n − 1 = 343, so the sample size is 344. To calculate R2, we use

R2 = SSModel/SSTotal = 10.380/1640.951 = 0.006

We see that R2 = 0.6%.

9.42 The sum of squares for Model and Error add up to the Total sum of squares, so

SSE = SSTotal − SSModel = 3000 − 250 = 2750

The degrees of freedom for the model is 1, since there is one explanatory variable. The sample size is 100 so the Total df is 100 − 1 = 99 and the Error df is then 100 − 2 = 98. We calculate the mean squares by dividing sums of squares by degrees of freedom:

MSModel = 250/1 = 250  and  MSError = 2750/98 = 28.061

The F-statistic is

F = MSModel/MSError = 250/28.061 = 8.91

These values are all shown in the following ANOVA table:
Source   df     SS       MS      F-statistic  p-value
Model     1    250      250      8.91         0.0036
Error    98   2750   28.061
Total    99   3000
The p-value is found using the upper-tail of an F-distribution with df = 1 for the numerator and df = 98 for the denominator. For the F-statistic of 8.91, we see that the p-value is 0.0036.

9.43 The sum of squares for Model and Error add up to the Total sum of squares, so

SSE = SSTotal − SSModel = 5820 − 800 = 5020

The degrees of freedom for the model is 1, since there is one explanatory variable. The sample size is 40 so the Total df is 40 − 1 = 39 and the Error df is then 40 − 2 = 38. We calculate the mean squares by dividing sums of squares by degrees of freedom:

MSModel = 800/1 = 800  and  MSError = 5020/38 = 132.1

The F-statistic is

F = MSModel/MSError = 800/132.1 = 6.06

These values are all shown in the following ANOVA table:
Source   df     SS      MS     F-statistic  p-value
Model     1    800     800     6.06         0.0185
Error    38   5020   132.1
Total    39   5820
The p-value is found using the upper-tail of an F-distribution with df = 1 for the numerator and df = 38 for the denominator. For the F-statistic of 6.06, we see that the p-value is 0.0185.

9.44 The sum of squares for Model and Error add up to the Total sum of squares, so

SSTotal = SSModel + SSE = 8.5 + 247.2 = 255.7

The degrees of freedom for the model is 1, since there is one explanatory variable. The sample size is 25 so the Total df is 25 − 1 = 24 and the Error df is then 25 − 2 = 23. We calculate the mean squares by dividing sums of squares by degrees of freedom:

MSModel = 8.5/1 = 8.5  and  MSError = 247.2/23 = 10.748

The F-statistic is

F = MSModel/MSError = 8.5/10.748 = 0.791

These values are all shown in the following ANOVA table:
Source   df      SS       MS      F-statistic  p-value
Model     1     8.5      8.5      0.791        0.383
Error    23   247.2   10.748
Total    24   255.7
The p-value is found using the upper-tail of an F-distribution with df = 1 for the numerator and df = 23 for the denominator. For the F-statistic of 0.791, we see that the p-value is 0.383.

9.45 The sum of squares for Model and Error add up to the Total sum of squares, so

SSModel = SSTotal − SSE = 23,693 − 15,571 = 8122

The degrees of freedom for the model is 1, since there is one explanatory variable. The sample size is 500 so the Total df is 500 − 1 = 499 and the Error df is then 500 − 2 = 498. We calculate the mean squares by dividing sums of squares by degrees of freedom:

MSModel = 8122/1 = 8122  and  MSError = 15,571/498 = 31.267

The F-statistic is

F = MSModel/MSError = 8122/31.267 = 259.76

These values are all shown in the following ANOVA table:
Source   df      SS        MS       F-statistic  p-value
Model     1    8122      8122      259.76        0.000
Error   498   15571    31.267
Total   499   23693
The p-value is found using the upper-tail of an F-distribution with df = 1 for the numerator and df = 498 for the denominator. We don’t really need to look this one up, though. The F-statistic of 259.76 is so large that we can predict that the p-value is essentially zero. 9.46 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 8.94 and the p-value is 0.005. This is a small p-value so we reject H0 . We conclude that the linear model to predict the number of Facebook friends using the normalized brain density score is effective.
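The upper-tail F probabilities used in Exercises 9.42 to 9.45 are easy to verify with software. A minimal Python sketch (not part of the original solutions):

from scipy import stats

# Upper-tail areas of F-distributions, matching the p-values in the ANOVA tables above
print(stats.f.sf(8.91, 1, 98))      # Exercise 9.42: about 0.0036
print(stats.f.sf(6.06, 1, 38))      # Exercise 9.43: about 0.018
print(stats.f.sf(0.791, 1, 23))     # Exercise 9.44: about 0.38
print(stats.f.sf(259.76, 1, 498))   # Exercise 9.45: essentially zero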
9.47 The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 7.44 and the p-value is 0.011. This p-value is significant at a 5% level although not at a 1% level. At a 5% level, we conclude that the linear model to predict calories in cereal using the amount of fiber is effective. 9.48
(a) We see that the predicted price when CostColor = 10 is given by

Price = 378 − 18.6 · CostColor = 378 − 18.6(10) = 192

The predicted price for a printer where each color page costs about 10 cents to print is $192.
(b) Since the slope (b1 = −18.6) is negative, the price of a printer goes down as the cost of color printing increases. In other words, cheaper printers cost more to print in color.
(c) Since the total degrees of freedom is n − 1 = 19, the sample size is 20.
(d) To calculate R2, we use

R2 = SSModel/SSTotal = 57,604/136,237 = 0.423
We see that R2 = 42.3%, which tells us that 42.3% of the variability in prices of inkjet printers can be explained by the cost to print a page in color. (e) The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 13.19 and the p-value is 0.002. This p-value is quite small so we reject H0 . There is evidence that the linear model to predict price using the cost of color printing is effective. 9.49
(a) We see that the predicted grade point average for a student with a verbal SAT score of 550 is given by

GPA = 2.03 + 0.00189 · VerbalSAT = 2.03 + 0.00189(550) = 3.0695

We expect an average GPA of about 3.07 for students who get 550 on the verbal portion of the SAT.
(b) Since the total degrees of freedom is n − 1 = 344, the sample size is 345.
(c) To calculate R2, we use

R2 = SSModel/SSTotal = 6.8029/54.5788 = 0.125
We see that R2 = 12.5%, which tells us that 12.5% of the variability in grade point averages can be explained by Verbal SAT score. (d) The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 48.84 and the p-value is 0.000. This p-value is very small so we reject H0 . There is evidence that the linear model to predict grade point average using Verbal SAT score is effective. 9.50
(a) We see in the output that the correlation is r = −0.366 and the p-value for testing the correlation is 0.015.
(b) We see in the output that the slope of the regression line is −3.34, the t-statistic for testing the slope is −2.55, and the p-value is 0.015. (c) In the ANOVA table in the output, we see that the F-statistic is 6.50 and the p-value is 0.015. (d) The p-value of 0.015 is the same for all three tests. This will always be the case for a regression model with a single predictor.
(e) In every case, the p-value is less than 0.05 so we reject H0 . The conclusions are similar in all three tests: there is a linear relationship, the population slope is not zero, and the model is effective. In every case, our conclusion in context is that the number of years playing football is associated with and effective at predicting percentile score on a cognition test. 9.51
(a) We have a small concern about the extreme point in the lower right corner, but otherwise a linear model seems acceptable.
(b) The regression equation is MatingActivity = 0.480 − 0.323 · FemalesHiding. Predicted mating activity for a group in which the females spend 50% of the time in hiding is

MatingActivity = 0.480 − 0.323(0.50) = 0.319

(c) For a test of the slope, the hypotheses are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the t-statistic is −2.56 and the p-value is 0.033. At a 5% level we conclude that percent of time females are in hiding is an effective predictor of level of mating activity.
(d) The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 6.58 and the p-value is 0.033. At a 5% level we find some evidence that the linear model based on female time spent hiding is effective at predicting mating activity. (e) The p-values for the t-test and ANOVA are exactly the same. This will always be the case for a simple regression model. (f) In the output we see that R2 = 45.1%, so 45.1% of the variability in level of mating activity is explained by the percent of time females spend in hiding. 9.52
(a) The F-statistic is 69.45 and the p-value is 0.000.
(b) The p-value is very low, so we reject H0 and we have strong evidence that this linear model to predict CESD score from DASS score is effective.
(c) The total degrees of freedom is n − 1 and we see that it is 74 in this analysis. Thus we have n = 75.

9.53 Here is some computer output for fitting the two models.

Height:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.19862    7.52464   1.887   0.0599
Height       0.91491    0.04389  20.845   0.0000
---
Residual standard error: 9.414 on 409 degrees of freedom
Multiple R-squared: 0.5151, Adjusted R-squared: 0.513
F-statistic: 434.5 on 1 and 409 DF, p-value: < 2.2e-16

Foot:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  83.4552     4.7368   17.62   0.0000
Foot          3.4835     0.1884   18.49   0.0000
---
Residual standard error: 9.937 on 402 degrees of freedom
Multiple R-squared: 0.4596, Adjusted R-squared: 0.4582
F-statistic: 341.9 on 1 and 402 DF, p-value: < 2.2e-16
(a) Foot (b1 = 3.4835) has a larger slope than Height (b1 = 0.91491).
(b) Height (s = 9.414) has a smaller standard deviation of the error than Foot (s = 9.937).
(c) Height (R2 = 51.5%) explains more variability in arm span than Foot (R2 = 46.0%).
(d) Both the smaller standard deviation of error in (b) and the larger R2 in (c) indicate that Height is somewhat more effective than Foot for predicting Armspan. The larger slope for Foot is not so relevant since it also has a much larger standard error for the slope. 9.54
(a) The regression equation is Points = 22.33 + 0.104 · PenMins. The predicted points for a player with 20 penalty minutes is

Points = 22.33 + 0.104(20) = 24.41

We expect players with 20 penalty minutes to average about 24.4 points. The predicted points for a player with 150 penalty minutes is

Points = 22.33 + 0.104(150) = 37.93

We expect players with 150 penalty minutes to average about 37.9 points.
(b) The slope is 0.104. If a player has one additional penalty minute, the predicted number of points goes up by 0.104.
(c) For a test of the slope, the hypotheses are H0 : β1 = 0 vs Ha : β1 ≠ 0. We see in the output that the t-statistic is 0.69 and the p-value is 0.499. This is a very large p-value so we conclude that number of penalty minutes is not an effective predictor of number of points.
(d) The hypotheses are H0 : The model is ineffective vs Ha : The model is effective. We see in the ANOVA table that the F-statistic is 0.47 and the p-value is 0.499. This is a very large p-value so we do not reject H0 . We do not find evidence that the linear model based on penalty minutes is effective at predicting number of points for hockey players.
(e) The p-values for the t-test and ANOVA are exactly the same. This will always be the case for a simple regression model.
(f) In the output we see that R2 = 1.93%, so 1.93% of the variability in number of points scored is explained by the number of penalty minutes. Here again, we see that penalty minutes does not give us very much information about the number of points a player will score.

9.55
(a) To find the standard deviation of the error term, sε, we use the value of SSE from the ANOVA table in the previous exercise, which is 8607.7, and the sample size of n = 26:

sε = √(SSE/(n − 2)) = √(8607.7/24) = 18.94

The standard deviation of the error term is 18.94, which we also see in the output as “S = 18.938.”
(b) To find the standard error of the slope, we need to use sε = 18.94 from part (a). Also, the explanatory variable is PenMin so sx is the standard deviation of the PenMin values, which is 24.92. Using n = 26, we have

SE = sε/(sx · √(n − 1)) = 18.94/(24.92 · √25) = 0.152

The standard error of the slope in the model is about 0.152, which (up to round-off) matches the “SE Coef” for PenMin in the output.
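A small Python check of these two formulas, using the values above (a sketch, not part of the original solution):

from math import sqrt

sse, n, s_x = 8607.7, 26, 24.92        # SSE, sample size, and std. dev. of PenMin from above
s_e = sqrt(sse / (n - 2))              # standard deviation of the error term
se_slope = s_e / (s_x * sqrt(n - 1))   # standard error of the slope
print(round(s_e, 2), round(se_slope, 3))   # about 18.94 and 0.152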
9.56
(a) To calculate R2 based on the information in the ANOVA table, we use

R2 = SSModel/SSTotal = 2.0024/6.0479 = 0.331

We see that R2 = 33.1%, which tells us that 33.1% of the variability in average mercury levels can be explained by the pH of the lake water.
(b) To find the standard deviation of the error term, sε, we use the value of SSE from the ANOVA table, which is 4.0455, and the sample size of n = 53:

sε = √(SSE/(n − 2)) = √(4.0455/51) = 0.282

The standard deviation of the error term (labeled as “S =” in the output) is 0.282.
(c) To find the standard error of the slope, we need to use sε = 0.282 from part (b). Also, the explanatory variable is pH so sx is the standard deviation of the pH values, which is 1.288. Using n = 53, we have

SE = sε/(sx · √(n − 1)) = 0.282/(1.288 · √52) = 0.0303

The standard error of the slope in the model (shown in the column labeled “SE Coef”) is 0.0303.

9.57
(a) To find the standard deviation of the error term, sε, we use the value of SSE from the ANOVA table, which is 27,774.1, and the sample size of n = 30:

sε = √(SSE/(n − 2)) = √(27,774.1/28) = 31.495

The standard deviation of the error term is sε = 31.495.
(b) To find the standard error of the slope, we need to use sε = 31.495 from part (a). Also, the explanatory variable is Fiber so sx is the standard deviation of the Fiber values, which is 1.880. Using n = 30, we have

SE = sε/(sx · √(n − 1)) = 31.495/(1.880 · √29) = 3.11

The standard error of the slope in the model is SE = 3.11.

9.58
(a) To find the standard deviation of the error term, sε, we use the value of SSE from the ANOVA table, which is 47.7760, and the sample size of n = 345:

sε = √(SSE/(n − 2)) = √(47.7760/343) = 0.373

The standard deviation of the error term is sε = 0.373.
(b) To find the standard error of the slope, we need to use sε = 0.373 from part (a). Also, the explanatory variable is VerbalSAT so sx is the standard deviation of the VerbalSAT values, which is 74.29. Using n = 345, we have

SE = sε/(sx · √(n − 1)) = 0.373/(74.29 · √344) = 0.00027

The standard error of the slope in the model is SE = 0.00027.
9.59
(a) To test H0 : β1 = 0 vs Ha : β1 ≠ 0 using a t-test we use the following computer output for this model.

Predictor     Coef  SE Coef     T      P
Constant     7.694    1.803  4.27  0.000
TV          0.5955   0.2923  2.04  0.047
The test statistic is t = 2.04 and the p-value = 0.047, which is (barely) significant at a 5% level. We have some evidence that TV time is a worthwhile predictor of Exercise hours.
(b) For the ANOVA with a single predictor we test H0 : β1 = 0 vs Ha : β1 ≠ 0 (or the model is ineffective vs the model is effective).

Source            DF       SS      MS     F      P
Regression         1   252.27  252.27  4.15  0.047
Residual Error    48  2917.73   60.79
Total             49  3170.00
The test statistic is F = 4.15 and the p-value is 0.047. Again this is significant at a 5% level and we conclude that TV is an effective predictor of Exercise.
(c) To test the correlation between Exercise and TV we test H0 : ρ = 0 vs Ha : ρ ≠ 0 where ρ is the correlation between these variables for the population of all college students. Using technology we find the correlation between these two variables for the 50 cases in the sample is r = 0.282. We compute the t-statistic as

t = r√(n − 2)/√(1 − r²) = 0.282√(50 − 2)/√(1 − (0.282)²) = 2.04

Using two tails and a t-distribution with 48 degrees of freedom, we find p-value = 2(0.0234) = 0.0468. This is just below a 5% significance level, so we have some evidence of an association between amounts of exercise and TV viewing.
(d) The t-statistics are the same for the tests of slope and correlation (the F-statistic is the square of this value). The p-values are the same for all three tests.

9.60
(a) Here is some output for computing and testing the correlation between LifeExpectancy and Health using the data for this sample of 50 countries.

Pearson correlation of Health and LifeExpectancy = 0.521
P-Value = 0.000
We see that the correlation is r = 0.521 and the p-value (to three decimal places) for testing H0 : ρ = 0 vs Ha : ρ ≠ 0 is 0.000. Thus we have strong evidence to conclude (at a 5% significance level) that there is a positive association between life expectancy and spending on health care.
(b) The regression equation is LifeExpectancy = 64.17 + 0.811 Health. For a test of the slope, H0 : β1 = 0 vs Ha : β1 ≠ 0, we obtain the output below.

Term        Coef  SE Coef  T-Value  P-Value
Constant   64.17     2.16    29.71    0.000
Health     0.811    0.192     4.23    0.000
We see that the t-statistic is 4.23 and the p-value is shown as 0.000, so we again have strong evidence, at a 5% significance level, of a positive relationship between life expectancy and government expenditures for health care.
(c) We use technology to complete an ANOVA test for the effectiveness of this model. The relevant computer output is

Source        DF      SS      MS  F-Value  P-Value
Regression     1   784.7  784.68    17.89    0.000
Error         48  2105.3   43.86
Total         49  2890.0
We see that the F-statistic is 17.89 with a p-value shown as 0.000. Once again, this is small enough (using a 5% significance level) to conclude that health spending is an effective predictor of life expectancy.
(d) The p-values for both t-tests and the ANOVA (which all match at 0.000) give strong evidence that this model for predicting life expectancy based on health expenditures would be considered effective at any reasonable significance level.

9.61 Here is some computer output for fitting the model to predict Beds using Baths for the data in HomesForSaleCA.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.3585     0.4092   5.763 3.47e-06
Baths         0.3393     0.1715   1.978   0.0578
---
Residual standard error: 0.6496 on 28 degrees of freedom
Multiple R-squared: 0.1226, Adjusted R-squared: 0.0913
F-statistic: 3.914 on 1 and 28 DF, p-value: 0.057817

Analysis of Variance Table
Response: Beds
           Df   Sum Sq  Mean Sq  F value   Pr(>F)
Baths       1   1.6514  1.65144   3.9136  0.05781
Residuals  28  11.8152  0.42197

(a) The fitted regression equation is Beds = 2.3585 + 0.3393 · Baths. The predicted number of bedrooms for a house with 3 bathrooms is

Beds = 2.3585 + 0.3393(3) = 3.38

The average number of bedrooms for homes for sale in California with 3 bathrooms is predicted to be about 3.4 bedrooms.
(b) From the output we see that the t-statistic for testing H0 : β1 = 0 vs Ha : β1 ≠ 0 is t = 1.978 and the p-value is 0.0578. This gives some evidence that the number of bathrooms is an effective predictor of the number of bedrooms, but it would not be significant at a 5% level.
(c) The F-statistic in the ANOVA table is F = 3.9136 and the p-value is 0.0578 (the same as the t-test). Again this gives some evidence, but not convincing evidence at a 5% level, that this model based on number of bathrooms is effective at predicting the number of bedrooms in houses for sale in California.
(d) In the output we see that R2 = 12.26%, which means that about 12.3% of the variability in the number of bedrooms in houses in California is explained by the number of bathrooms. 9.62
(a) Here is some ANOVA output for predicting Size of the California homes using Price. The SSTotal value should be the same for any other predictor of Size.

Analysis of Variance Table
Response: Size
           Df   Sum Sq  Mean Sq  F value     Pr(>F)
Price       1  2359041  2359041   15.709  0.0004634
Residuals  28  4204831   150173
Total      29  6563872

The sum of squares for the “Total” is 6,563,872, which is the total variability (sum of squared differences from the mean) for the sizes of the 30 homes in this sample.
(b) Here are sample correlations with Size for each of the potential predictors in the HomesForSaleCA dataset.

         Price    Beds   Baths
Size    0.5995  0.4490  0.6435
If we square these to get the percent of total variability in Size that each predictor explains, we get R2 values of 35.94%, 20.16%, and 41.29%, respectively. This indicates that number of bathrooms would be the most effective predictor of the size of these homes.
(c) In part (b) we see that R2 = 41.29% is the portion of the variability in size of these homes that is explained by the number of bathrooms. Since SSTotal = 6,563,872 in part (a), the amount explained by the Baths variable is SSModel = 0.4129 · SSTotal = 0.4129(6563872) = 2,710,223. This can be confirmed (up to some difference due to rounding R2) in the ANOVA output below for fitting a regression model to predict Size based on Baths.

Analysis of Variance Table
Response: Size
           Df   Sum Sq  Mean Sq  F value     Pr(>F)
Baths       1  2709896  2709896   19.688  0.0001289 ***
Residuals  28  3853976   137642
Total      29  6563872

(d) We see in part (b) that the weakest predictor of Size (smallest correlation) is number of bedrooms (r = 0.4490). We have R2 = 0.4490² = 0.2016, so 20.16% of the total variability in these California home sizes is explained by the number of bedrooms. Using the total variability in part (a), we see that the amount explained by Beds is SSModel = 0.2016(6563872) = 1,323,277.
(e) The output below shows the ANOVA table for assessing the effectiveness of Beds as a predictor of Size.

Analysis of Variance Table
Response: Size
           Df   Sum Sq  Mean Sq  F value   Pr(>F)
Beds        1  1323367  1323367   7.0707  0.01281 *
Residuals  28  5240504   187161
Total      29  6563872
The small p-value for the F-statistic (0.0128) gives enough evidence to conclude that, even though it is the weakest of the three potential predictors, Beds still gives an effective model for predicting the size of California homes. 9.63 Answers will vary.
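A hedged sketch of the calculations in parts (b) through (d) of 9.62, assuming the HomesForSaleCA data are available as a CSV file with columns Size, Price, Beds, and Baths (the file name is hypothetical):

import pandas as pd

homes = pd.read_csv("HomesForSaleCA.csv")   # hypothetical file name
ss_total = ((homes["Size"] - homes["Size"].mean()) ** 2).sum()   # total variability in Size

for predictor in ["Price", "Beds", "Baths"]:
    r = homes["Size"].corr(homes[predictor])    # sample correlation with Size
    # r^2 is the fraction of SSTotal explained by a simple regression on this predictor
    print(predictor, round(r, 4), round(r**2 * ss_total))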
Section 9.3 Solutions

9.64
(a) The confidence interval for the mean response is always narrower than the prediction interval for the response, so in this case the confidence interval for the mean response is interval A (10 to 14) and the prediction interval for the response is interval B (4 to 20).
(b) The predicted value is in the center of both intervals, so we can use the average of the endpoints for either interval to find the predicted value is (10+14)/2=12. 9.65
(a) The confidence interval for the mean response is always narrower than the prediction interval for the response, so in this case the confidence interval for the mean response is interval A (94 to 106) and the prediction interval for the response is interval B (75 to 125).
(b) The predicted value is in the center of both intervals, so we can use the average of the endpoints for either interval to find the predicted value is (94+106)/2=100. 9.66
(a) The confidence interval for the mean response is always narrower than the prediction interval for the response, so in this case the confidence interval for the mean response is interval B (4.7 to 5.3) and the prediction interval for the response is interval A (2.9 to 7.1).
(b) The predicted value is in the center of both intervals, so we can use the average of the endpoints for either interval to find the predicted value is (4.7+5.3)/2=5. 9.67
(a) The confidence interval for the mean response is always narrower than the prediction interval for the response, so in this case the confidence interval for the mean response is interval B (19.2 to 20.8) and the prediction interval for the response is interval A (16.8 to 23.2).
(b) The predicted value is in the center of both intervals, so we can use the average of the endpoints for either interval to find the predicted value is (19.2+20.8)/2=20. 9.68
(a) The 95% confidence interval for the mean response is 6.535 to 8.417. We are 95% confident that for mice that eat 50% of calories during the day, the average weight gain will be between 6.535 grams and 8.417 grams.
(b) The 95% prediction interval for the response is 2.786 to 12.166. We are 95% confident that a mouse that eats 50% of its calories during the day will gain between 2.786 grams and 12.166 grams. 9.69
(a) The 95% confidence interval for the mean response is −0.013 to 4.783. We are 95% confident that for mice that eat 10% of calories during the day, the average weight change will be between losing 0.013 grams and gaining 4.783 grams.
(b) The 95% prediction interval for the response is −2.797 to 7.568. We are 95% confident that a mouse that eats 10% of its calories during the day will have a weight change between losing 2.797 grams and gaining 7.568 grams. 9.70
(a) The 95% confidence interval for the mean response is 122.0 to 142.0. We are 95% confident that the average number of calories for all cereals with 10 grams of sugars per cup will be between 122 and 142 calories per cup.
(b) The 95% prediction interval for the response is 76.6 to 187.5. We are 95% confident that a cereal with 10 grams of sugar will have between 76.6 and 187.5 calories. 9.71
(a) The 95% confidence interval for the mean response is 143.4 to 172.4. We are 95% confident that the average number of calories for all cereals with 16 grams of sugars per cup will be between 143.4 and 172.4 calories per cup.
(b) The 95% prediction interval for the response is 101.5 to 214.3. We are 95% confident that a cereal with 16 grams of sugar will have between 101.5 and 214.3 calories. 9.72
(a) We have Cognition = 102.3 − 3.34 · Years = 102.3 − 3.34(8) = 75.58. The predicted cognitive score is 75.6 for a person who has played 8 years of football.
(b) The prediction interval is designed to capture most of the responses while the confidence interval is designed to only capture the mean response. The prediction interval will always be wider, so the 95% confidence interval is given in II, while the 95% prediction interval is given in I. 9.73
(a) We have Cognition = 102.3 − 3.34 · Years = 102.3 − 3.34(12) = 62.22. The predicted cognitive score is 62.2 for a person who has played 12 years of football.
(b) The prediction interval is designed to capture most of the responses while the confidence interval is designed to only capture the mean response. The prediction interval will always be wider, so the 95% confidence interval is given in II, while the 95% prediction interval is given in I. 9.74
(a) Here is some output for fitting this regression model:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -56.81675  154.68102  -0.367 0.716145
Size          0.33919    0.08558   3.963 0.000463

The size of a home is a significant predictor of the price. The estimated slope from the data is 0.3392 and the p-value for testing this coefficient (0.000463) is very small.
(b) We find a point estimate for the price of a 2000 square foot home by substituting Size = 2000 into the fitted regression equation. This gives Price = −56.81675 + 0.33919(2000) ≈ 621.57; using unrounded coefficients, technology gives 621.567. The price of a 2000 square foot home in California is predicted to be about $621,567.
(c) We use technology to find a 90% confidence interval for the average price, giving the output below:

      fit      lwr      upr
 621.5666 544.0678 699.0653

We are 90% confident that the average price of all 2000 square foot homes in California is between $544.1 thousand and $699.1 thousand.
(d) We use technology to find a 90% prediction interval for the price of a 2000 square foot home in California, giving the output below:

      fit      lwr      upr
 621.5666 240.6161 1002.517

We are 90% confident that a 2000 square foot house in California would be worth between $240.616 thousand and $1.0025 million.
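For reference, the "use technology" steps in 9.74 can be reproduced in R with a sketch like the one below; the data frame name homes is a hypothetical placeholder, while Price (in thousands of dollars) and Size are the variables used in the exercise.

# Hedged sketch for 9.74 (hypothetical data frame `homes`; variables Price and Size).
fit <- lm(Price ~ Size, data = homes)
summary(fit)                                  # slope estimate and its p-value, as in part (a)

new_home <- data.frame(Size = 2000)
predict(fit, new_home)                                          # point estimate, part (b)
predict(fit, new_home, interval = "confidence", level = 0.90)   # 90% CI for the mean price, part (c)
predict(fit, new_home, interval = "prediction", level = 0.90)   # 90% PI for one home, part (d)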
9.75 (a) Using technology, we see that the 95% prediction interval for life expectancy of a country with 3% expenditure on health care is (52.88, 80.34). We are 95% confident that a country with 3% government expenditure on health care has a life expectancy between 52.9 and 80.3 years.
(b) Using technology, we see that the 95% prediction interval for life expectancy of a country with 10% expenditure on health care is (58.84, 85.74). We are 95% confident that a country with 10% government expenditure on health care has a life expectancy between 58.8 and 85.7 years. (c) Using technology, we see that the 95% prediction interval for life expectancy of a country with 50% expenditure on health care is (84.32, 125.18). We are 95% confident that a country with 50% government expenditure on health care has a life expectancy between 84.3 and 125.2 years. (d) We calculate the width of each interval: (a) 80.34 − 52.88 = 27.46, (b) 85.74 − 58.84 = 26.90, and (c) 125.18 − 84.32 = 40.86. We notice that the interval for (a) is slightly larger than (b), and the interval for (c) is much larger than the other two. This is due to the fact that (c) is extrapolating far out from the data used to create the model. In practice we would have much less confidence in an interval for a point that is so far from the data that built the original model — a country with a life expectancy of 125 years would be pretty surprising! The smallest width is (b), where the predictor value (10%) is closest to the mean of the Health values (10.14%). 9.76
(a) For a student who gets a 500 on the Verbal SAT exam, the predicted GPA is GPA = 2.03 + 0.00189(500) = 2.98. For a student who gets a 700 on the Verbal SAT exam, the predicted GPA is GPA = 2.03 + 0.00189(700) = 3.35.
(b) For the regression intervals for each Verbal SAT score we have the following:
i. Using technology, we see that a 95% confidence interval for mean GPA for students with a 500 Verbal SAT score is 2.92 to 3.04. We are 95% confident that the mean grade point average for all students at this university who have a 500 Verbal SAT score is between 2.92 and 3.04.
ii. Using technology, we see that a 95% prediction interval for GPA for students with a 500 Verbal SAT score is 2.24 to 3.72. We are 95% confident that the grade point average for students at this university who have a 500 Verbal SAT score is between 2.24 and 3.72.
iii. Using technology, we see that a 95% confidence interval for mean GPA for students with a 700 Verbal SAT score is 3.29 to 3.43. We are 95% confident that the mean grade point average for all students at this university who have a 700 Verbal SAT score is between 3.29 and 3.43.
iv. Using technology, we see that a 95% prediction interval for GPA for students with a 700 Verbal SAT score is 2.62 to 4.10. We are 95% confident that the grade point average for students at this university who have a 700 Verbal SAT score is between 2.62 and 4.00. (At this university, the highest possible GPA is 4.0.)
9.77 Using technology the prediction equation is HwyMPG = 6.69 + 1.687 · CityMPG. For cars with CityMPG = 20, the predicted highway mileage is HwyMPG = 6.69 + 1.687 · 20 = 40.43 mpg. Here is some computer output giving 95% intervals when CityMPG = 20:

Prediction
     Fit  SE Fit              95% CI              95% PI
 40.4243  0.4736  (39.4855, 41.3631)  (33.4778, 47.3707)
Since we want to find an interval to contain the mean highway mpg for all car models with 20 mpg in the city, we use the "95% CI". Thus we are 95% sure that the mean highway mpg for all new car models that have a city mpg rating of 20 is somewhere between 39.49 and 41.36 mpg.
9.78 Using technology the prediction equation is HwyMPG = 6.69 + 1.687 · CityMPG. Since the Jeep Renegade has CityMPG = 16, its predicted highway mileage is HwyMPG = 6.69 + 1.687 · 16 = 33.68 mpg. Here is some computer output giving 95% intervals when CityMPG = 16:

Prediction
    Fit  SE Fit            95% CI            95% PI
 33.678  0.3315  (33.021, 34.335)  (26.764, 40.592)
Since we want to find an interval to contain the highway mpg for a particular car model, we use the “95% PI”. Thus we are 95% sure that the highway mpg for a Jeep Renegade is somewhere between 26.76 and 40.59 mpg. 9.79
(a) The least squares line is Score = 2.562 + 0.007148 · Distance. Substituting Distance = 476 gives Score = 2.562 + 0.007148 · 476 = 5.96.
(b) Here is some output for finding a 95% prediction interval for Score using the linear model based on Distance when Distance = 476:

      fit      lwr      upr
 5.964606 3.552106 8.377106

We see that 95% of the time, this golfer's score for a 476 yard hole should be between 3.55 and 8.38.
(c) This 95% prediction interval (3.55 to 8.38) includes the score of a birdie 4, so that result is not too surprising for a 476 yard hole.
9.80
(a) The least squares line is Score = 1.0 + 1.0 · Par. Substituting Par = 5 gives Score = 1.0 + 1.0 · 5 = 6.0.
(b) Here is some output for finding a 95% prediction interval for Score using the linear model based on Par when Par = 5:

 fit      lwr      upr
   6 3.424778 8.575222

We see that 95% of the time, this golfer's score for a par 5 hole should be between 3.43 and 8.58.
(c) This 95% prediction interval (3.43 to 8.58) includes the score of a birdie 4, so that result is not too surprising for a par 5 hole.
9.81 To calculate 95% confidence and prediction intervals for election margins we use a t-distribution with 12 − 2 = 10 degrees of freedom to find t* = 2.228. We see from the computer output that the standard deviation of the error is s = 5.66 and from the summary statistics for the predictor (Approval) that x̄ = 52.92 and s_x = 11.04. We use these values to find the requested intervals.
(a) When the approval rating is 50%, the predicted margin is ŷ = −36.76 + 0.839(50) = 5.19. We find the 95% confidence interval using
ŷ ± t* · s · √(1/n + (x* − x̄)²/((n − 1)s_x²))
5.19 ± 2.228(5.66)√(1/12 + (50 − 52.92)²/(11 · 11.04²))
5.19 ± 3.78
1.41 to 8.97
We are 95% sure that the mean margin of victory for all presidents with 50% approval ratings is between 1.4% and 9.0%.
(b) When the approval rating is 50%, the predicted margin is ŷ = −36.76 + 0.839(50) = 5.19. We find the 95% prediction interval using
ŷ ± t* · s · √(1 + 1/n + (x* − x̄)²/((n − 1)s_x²))
5.19 ± 2.228(5.66)√(1 + 1/12 + (50 − 52.92)²/(11 · 11.04²))
5.19 ± 13.16
−7.97 to 18.35
We are 95% sure that the margin of victory for a president with a 50% approval rating is between losing by 8.0 points and winning by 18.4 points. We should not have very much confidence in predicting the outcome of this election!
(c) If we have no information about the approval rating, we can find a confidence interval for the mean margin of victory/defeat using an ordinary t-interval for the sample of n = 12 margins. From the output we see that the mean margin is x̄ = 7.62 and the standard deviation is s = 10.72. For 95% confidence with 12 − 1 = 11 degrees of freedom we have t* = 2.201 and we have
x̄ ± t* · s/√n
7.62 ± 2.201 · 10.72/√12
7.62 ± 6.81
0.81 to 14.43
We are 95% sure that the mean margin of victory for all incumbent presidents, regardless of approval rating, is between 0.8% and 14.4%. Note that this interval is considerably wider than the interval in part (a) that uses the additional information from the approval ratings.
9.82 To calculate the 95% confidence intervals and prediction intervals for final scores we use a t-distribution with 20 − 2 = 18 degrees of freedom to find t* = 2.10. We see from the computer output that the standard deviation of the error is s = 3.598 and from the summary statistics for the predictor (First) that x̄ = −0.550 and s_x = 3.154. We use these values to find the requested intervals.
(a) When the first round score is 0, we see that the predicted final round score is Final = 0.1617 + 1.4756(0) = 0.1617. To find a 95% confidence interval for the mean final score after a 0 on the first round, we have
ŷ ± t* · s · √(1/n + (x* − x̄)²/((n − 1)s_x²))
0.1617 ± 2.10(3.598)√(1/20 + (0 − (−0.550))²/(19 · 3.154²))
0.1617 ± 1.716
−1.55 to 1.88
We are 95% sure that the mean final score of golfers who shoot a 0 on the first round at the Masters is between −1.55 and 1.88.
(b) When the first round score is −5, we see that the predicted final round score is Final = 0.1617 + 1.4756(−5) = −7.216. To find a 95% prediction interval for the final score after a −5 on the first round, we have
ŷ ± t* · s · √(1 + 1/n + (x* − x̄)²/((n − 1)s_x²))
−7.216 ± 2.10(3.598)√(1 + 1/20 + (−5 − (−0.550))²/(19 · 3.154²))
−7.216 ± 8.119
−15.34 to 0.90
We are 95% sure that the four round score of a golfer who shoots a −5 on the first round at the Masters is between −15.34 and 0.90. Since golf scores are only integers we could report this prediction interval as −15 to +1 for the golfer's final score.
(c) When the first round score is +3, we see that the predicted final round score is Final = 0.1617 + 1.4756(3) = 4.589. To find a 95% confidence interval for the mean final score after a +3 on the first round, we have
ŷ ± t* · s · √(1/n + (x* − x̄)²/((n − 1)s_x²))
4.589 ± 2.10(3.598)√(1/20 + (3 − (−0.550))²/(19 · 3.154²))
4.589 ± 2.581
2.01 to 7.17
We are 95% sure that the mean final score of all golfers who shoot a +3 on the first round at the Masters is between 2.01 and 7.17.
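The by-hand calculations in 9.81 and 9.82 all follow the same template, so a short R sketch of the formulas may be helpful; the numbers below are the summary values quoted in 9.82 for part (a), and everything else is ordinary arithmetic.

# Sketch of the interval formulas used in 9.81 and 9.82, using the values from 9.82(a).
n     <- 20
s     <- 3.598        # standard deviation of the error
xbar  <- -0.550       # mean of the predictor (First)
sx    <- 3.154        # standard deviation of the predictor
tstar <- qt(0.975, df = n - 2)     # about 2.10

yhat  <- 0.1617 + 1.4756 * 0       # predicted final score when First = 0
se_ci <- s * sqrt(1/n + (0 - xbar)^2 / ((n - 1) * sx^2))        # CI for the mean response
se_pi <- s * sqrt(1 + 1/n + (0 - xbar)^2 / ((n - 1) * sx^2))    # PI for a single response

yhat + c(-1, 1) * tstar * se_ci    # about (-1.55, 1.88), matching part (a)
yhat + c(-1, 1) * tstar * se_pi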
Section 10.1 Solutions
10.1 There are four explanatory variables: X1, X2, X3, and X4. The one response variable is Y.
10.2 The predicted response is Ŷ = 43.4 − 6.82X1 + 1.70X2 + 1.70X3 + 0.442X4 = 43.4 − 6.82(8) + 1.70(6) + 1.70(4) + 0.442(50) = 27.94. The residual is 30 − 27.94 = 2.06.
10.3 The predicted response is Ŷ = 43.4 − 6.82X1 + 1.70X2 + 1.70X3 + 0.442X4 = 43.4 − 6.82(5) + 1.70(7) + 1.70(5) + 0.442(75) = 62.85. The residual is 60 − 62.85 = −2.85.
10.4 The coefficient of X2 is 1.704 and the p-value for testing this coefficient is 0.202.
10.5 The coefficient of X1 is −6.820 and the p-value for testing this coefficient is 0.001.
10.6 Looking at the p-values, we see that variables X1, X3, and X4 are significant at a 5% level and variable X2 is not.
10.7 Looking at the p-values, we see that variable X1 is the only one that is significant at a 1% level.
10.8 Looking at the p-values, we see that X1 is most significant.
10.9 Looking at the p-values, we see that X2 is least significant.
10.10 Yes, the p-value from the ANOVA table is 0.000 so the model is very effective.
10.11 We see that R2 = 99.8% so 99.8% of the variability in Y is explained by the model.
10.12 There are five explanatory variables: X1, X2, X3, X4, and X5. The one response variable is Y.
10.13 The predicted response is
Ŷ = −61 + 4.71X1 − 0.25X2 + 6.46X3 + 1.50X4 − 1.32X5 = −61 + 4.71(15) − 0.25(40) + 6.46(10) + 1.50(50) − 1.32(95) = 13.85
The residual is Y − Ŷ = 20 − 13.85 = 6.15.
10.14 The predicted response is
Ŷ = −61 + 4.71X1 − 0.25X2 + 6.46X3 + 1.50X4 − 1.32X5 = −61 + 4.71(19) − 0.25(56) + 6.46(12) + 1.50(85) − 1.32(106) = 79.59
The residual is Y − Ŷ = 50 − 79.59 = −29.59.
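Since these predicted values are just the coefficients applied to the given x-values, a quick check of 10.13 and 10.14 in R (or on a calculator) looks like this:

# Plugging the given values into the fitted equation for 10.13 and 10.14.
b   <- c(-61, 4.71, -0.25, 6.46, 1.50, -1.32)   # intercept followed by the five slopes
x13 <- c(1, 15, 40, 10, 50, 95)                 # leading 1 pairs with the intercept
x14 <- c(1, 19, 56, 12, 85, 106)
sum(b * x13)    # predicted response 13.85, so the residual is 20 - 13.85 = 6.15
sum(b * x14)    # predicted response 79.59, so the residual is 50 - 79.59 = -29.59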
10.15 The coefficient or slope of X1 is 4.715 and the p-value for testing this coefficient is 0.053.
10.16 The coefficient or slope of X5 is −1.3151 and the p-value for testing this coefficient is 0.163.
10.17 Looking at the p-values, we see that variables X3 and X4 are significant at a 5% level and variables X1, X2, and X5 are not.
10.18 Looking at the p-values, we see that variable X4 is the only one that is significant at a 1% level.
10.19 Looking at the p-values, we see that X4 is most significant.
10.20 Looking at the p-values, we see that X2 is least significant, with a very high p-value.
10.21 Yes, the p-value from the ANOVA table is 0.000 so the model is effective.
10.22 We see that R2 = 77.9% so 77.9% of the variability in Y is explained by the model.
10.23
(a) We substitute the values for the four X-variables: Area = −6.4 + 1.28(8.5) + 1.010(4.6) − 0.110(21) − 2.83(0) = 6.816
This forest fire is predicted to burn 6.816 hectares.
(b) We see in part (a) that the actual area burned by this fire was 10.73 hectares while the predicted area is 6.816. We have: Residual = Actual area − Predicted area = 10.73 − 6.816 = 3.914.
10.24 (a) The coefficient of Temp is 1.010, so for every one degree Celsius increase in temperature, a forest fire is predicted to burn 1.010 more hectares (if all other variables stayed the same).
(b) The coefficient of RH is −0.110, so for every 1% increase in relative humidity, a forest fire is predicted to burn 0.110 fewer hectares (if all other variables stayed the same).
10.25 We use the p-values for the tests of coefficients to answer this question.
(a) The temperature variable appears to be most significant, with a p-value of 0.087.
(b) The rainfall variable appears to be the least significant, with a p-value of 0.769.
(c) Looking at the p-values for the tests of coefficients, we see that none of the variables are significant at the 5% level. Only one (temperature) is significant at the 10% level.
10.26 (a) We see in the Analysis of Variance table that the F-statistic is 1.51 and the ANOVA p-value is 0.197.
(b) The p-value of 0.197 is not significant for any reasonable significance level. We do not reject H0 and do not have evidence that this model is effective at predicting burned area.
(c) No, the model is not effective.
10.27 We see in the model summary that R2 = 1.17%. In context, this means that 1.17% of the variability in area burned by a forest fire can be explained by the atmospheric conditions on the day of the fire (wind speed, temperature, relative humidity, and rainfall). This is a very small R2, reinforcing that this model is not very good.
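The quantities referenced in 10.24 through 10.27 can all be read from a fitted model in R with a sketch like the one below. The data frame name fires and the predictor names Wind and Rain are hypothetical placeholders; Area, Temp, and RH are the names used in the exercises.

# Hedged sketch for 10.24-10.27 (hypothetical data frame `fires` and names Wind, Rain).
fit <- lm(Area ~ Wind + Temp + RH + Rain, data = fires)
summary(fit)$coefficients    # slopes and their p-values (10.24, 10.25)
summary(fit)$fstatistic      # overall F-statistic, compared to the ANOVA table in 10.26
summary(fit)$r.squared       # R-squared, about 0.0117 here (10.27)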
10.28
(a) We substitute the values for the two X-variables: CESD = 13.81 + 0.7976(23) − 0.400(19.0) = 24.5548
This person is predicted to have a CESD score of 24.5548.
(b) We see in part (a) that the actual CESD score for this person was 18 while the predicted score is 24.5548. We have: Residual = Actual score − Predicted score = 18 − 24.5548 = −6.5548.
10.29 (a) The coefficient of DASS is 0.7976, so for every one point increase in DASS score, a person is predicted to have a CESD score that is 0.7976 points higher, if BMI stays the same.
(b) The coefficient of BMI is −0.400, so for every one unit increase in BMI, a person is predicted to have a CESD score that is 0.400 points lower, if DASS score stays the same.
10.30 We use the p-values for the tests of coefficients to answer this question.
(a) The DASS variable appears to be most significant, with a p-value of 0.000.
(b) The BMI variable appears to be the least significant, with a p-value of 0.298.
(c) Looking at the p-values for the tests of coefficients, we see that one variable (DASS) is significant at the 5% level, and one (the same one) is significant at the 1% level. These p-values are so clear-cut that DASS will be significant and BMI insignificant at any reasonable significance level.
10.31 (a) We see in the Analysis of Variance table that the F-statistic is 35.32 and the ANOVA p-value is 0.000.
(b) The p-value is very low, so we reject H0. We have strong evidence that this model is effective at predicting CESD score.
(c) Yes, the model is effective.
10.32 We see in the model summary that R2 = 49.52%. In context, this means that 49.52% of the variability in CESD score can be explained by DASS score and BMI.
10.33 (a) Using the values in Case 1 for the six X-variables, we substitute the values into the regression equation given in the computer output, and get a predicted HangHours of 9.06. (We can also use statistical software to do this.) Since the actual value is 12.0, we have a residual of 12.0 − 9.06 = 2.94.
(b) Using the values in Case 2 for the six X-variables, we substitute the values into the regression equation given in the computer output, and get a predicted HangHours of 6.25. (We can also use statistical software to do this.) Since the actual value is 2.0, we have a residual of 2 − 6.25 = −4.25.
10.34 (a) The coefficient of HWHours is −0.1003, so for every one additional hour spent doing homework, the predicted number of hours spent hanging out with friends goes down by 0.1003 hours, if all other variables stay the same.
(b) The coefficient of VideoGameHours is 0.0775, so for every one additional hour spent playing video games, the predicted number of hours spent hanging out with friends goes up by 0.0775 hours, if all other variables stay the same.
10.35 We use the p-values for the tests of coefficients to answer this question.
(a) The number of hours playing sports and the number of hours watching TV appear to be most significant, with both p-values given as 0.000.
(b) The number of hours playing video games appears to be the least significant, with a p-value of 0.244.
(c) Looking at the p-values for the tests of coefficients, we see that four of the variables are significant at the 5% level, while three are significant at the 1% level.
10.36 (a) We see in the Analysis of Variance table that the F-statistic is 11.17 and the ANOVA p-value is 0.000.
(b) This is a very small p-value so we reject H0. We have strong evidence that this model is effective at predicting number of hours spent hanging out with friends.
(c) Yes, the model is effective.
10.37 We see in the model summary that R2 = 13.43%. In context, this means that 13.43% of the variability in hours per week spent hanging out with friends can be explained by the model (or, equivalently, can be explained by knowing the number of hours per week spent doing homework, playing sports, playing video games, using a computer, watching TV, or working at a paying job).
(a) The predicted price for a 2500 square foot home with 4 bedrooms and 2.5 baths is
Price = 103.75 + 0.082 Size − 25.81 Beds + 84.96 Baths = 103.75 + 0.082(2500) − 25.81(4) + 84.96(2.5) = 417.91
The predicted price for this house is $417,910.
(b) The largest coefficient is 84.96, the coefficient for the number of bathrooms.
(c) The most significant predictor in this model is Baths, with t = 2.46 and p-value = 0.015.
(d) Only the Baths predictor is significant at the 5% level (p-value = 0.015). However, the coefficient of Size is almost significant (p-value = 0.057) even though the size of the coefficient (0.082) is quite small.
(e) If all else stays the same (such as number of bedrooms and number of bathrooms), a house with 1 additional square foot in area is predicted to cost 0.082 thousand dollars ($82) more.
(f) The model is effective at predicting price, in the sense that at least one of the explanatory variables is useful, since the p-value in the ANOVA is 0.00001.
(g) We see that 19.5% of the variability in prices of homes can be explained by the area in square feet, the number of bedrooms, and the number of bathrooms.
10.39 (a) We have Calories = 513 + 16.3 Fat + 0.421 Cholesterol − 1.42 Age = 513 + 16.3(40) + 0.421(180) − 1.42(35) = 1191.08. We predict a daily calorie consumption of 1191.08 calories for this person.
(b) Age is least significant and grams of fat is most significant.
(c) Fat and Cholesterol are significant at a 5% level.
(d) If values for cholesterol and age stay the same, eating one more gram of fat boosts predicted calorie consumption by 16.3 calories.
(e) If fat and cholesterol consumption stay the same, a person one year older is predicted to eat 1.42 fewer calories per day.
(f) The model is effective at predicting calories, in the sense that at least one of the explanatory variables is useful, since the p-value is 0.000.
(g) We see that 76.4% of the variability in calories consumed per day can be explained by consumption of fat and cholesterol, and age.
10.40 (a) The last column in the table represents the p-value. We see that both Internet and BirthRate have p-values less than 0.05, so in this model the percentage of the population with Internet and the birth rate are significant predictors of life expectancy. The p-value for birth rate is much smaller, so this is the most significant predictor in this model.
(b) If we plug these values into our equation we get Life = 76.30 + 0.142(15) − 0.00031(2.5) + 0.0711(75) − 0.4445(30) = 70.4. So we predict the life expectancy for this country would be 70.4 years.
(c) Since the coefficient of Internet is positive (b3 = 0.0711), the predicted life expectancy would increase when internet usage increases.
10.41 (a) For a male, the predicted weight is Weight = −23.9 + 2.86 Height − 25.5 GenderCode = −23.9 + 2.86(67) − 25.5(0) = 167.72 lbs. For a female, the predicted weight is Weight = −23.9 + 2.86 Height − 25.5 GenderCode = −23.9 + 2.86(67) − 25.5(1) = 142.22 lbs.
(b) Both Height and GenderCode are very significant in this model.
(c) The coefficient is 2.86. For two people of the same gender, the one who is one inch taller is predicted to weigh 2.86 pounds more.
(d) The coefficient is −25.5. If Height is the same, as GenderCode goes up by one (which in this case means from male to female), Weight goes down by 25.5. In other words, if a male and a female are the same height, the female is predicted to weigh 25.5 pounds less. (If the code had been reversed, with 0 for females and 1 for males, the coefficient would have been a positive 25.5.)
(e) We see that R2 = 48.2%. This tells us that 48.2% of the variability in weights in the sample is explained by height and gender.
(g) Since the p-value for Years is 0.064, while the p-value for Concussion is 0.777, we see that the number of years playing football is more significant than whether or not the player was ever diagnosed with a concussion.
(h) Since df-Total in the ANOVA table is 43, the number of football players included in the analysis is 44.
(i) We see that R2 = 13.56%, so 13.56% of the variability in cognitive scores for these football players can be explained by the number of years playing football and whether or not the person was ever diagnosed with a concussion.
10.43 The degrees of freedom for the variability explained by the regression model, 3, equals the number of predictors in the model.
10.44 The total degrees of freedom, 46, is n − 1, so the sample size is 47 horses.
10.45 R2 = SSModel/SSTotal = 4327.7/9999.1 = 0.433. Although we don't know the specific predictors, 43.3% of the variability in the horse prices is explained by the three predictors.
10.46 H0: β1 = β2 = β3 = 0 versus Ha: At least one βi ≠ 0. The p-value in the ANOVA table (0.000) is very small so we have strong evidence to reject H0 and conclude that at least one of the predictors in the model is useful for explaining horse prices.
10.47 (a) The sentences quoted are talking about the percent of variation accounted for, so the quantity being discussed is the value of R-squared.
(b) We expect the p-value to be very small. With such a large R2 (98% of the variability accounted for), this model is clearly very effective at predicting prevalence of hantavirus in mice.
10.48 The quantity that gives the percent of variance explained in the response variable is R2, so the article is referring to R2.
10.49
(a) Here is some output for fitting the two-predictor model
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.80189    2.54791   1.492    0.156
Par         -0.64495    1.24432  -0.518    0.612
Distance     0.01108    0.00797   1.390    0.185

Residual standard error: 1.087 on 15 degrees of freedom
Multiple R-squared: 0.3672, Adjusted R-squared: 0.2828
F-statistic: 4.352 on 2 and 15 DF, p-value: 0.03232

The prediction equation is Score = 3.802 − 0.645 · Par + 0.01108 · Distance.
(b) For the par 5, 476-yard 16th hole, the predicted score is Score = 3.802 − 0.645 · (5) + 0.01108 · 476 = 5.85.
10.50 We can use technology to produce output similar to that given below:
Regression Equation
MemoryScore = 39.38 + 9.38 ReactionTime + 0.791 Sleep1 - 0.417 Sleep2

Coefficients
Term           Coef  SE Coef  T-Value  P-Value
Constant      39.38     3.84    10.26    0.000
ReactionTime   9.38     3.87     2.43    0.016
Sleep1        0.791    0.433     1.83    0.068
Sleep2       -0.417    0.292    -1.43    0.154

Model Summary
      S   R-sq  R-sq(adj)
9.66447  2.58%      1.87%

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   3    1017  339.08     3.63    0.013
Error      412   38482   93.40
Total      415   39499
(a) Case 1 in the dataset has a memory score of 50 and the predicted value comes out to be 46.02, giving a residual of 3.98.
(b) We see from the ANOVA table that the total degrees of freedom is 415, so the number of students in the analysis is 416.
(c) We see from the p-values for the coefficients that ReactionTime is the most significant in this model, with a p-value of 0.016.
(d) We see that Sleep2 is the least significant, with a p-value of 0.154.
(e) We see that only one variable (ReactionTime) is significant at the 5% level, while two variables (ReactionTime and Sleep1) are significant at the 10% level.
(f) We see in the ANOVA table that the F-statistic is 3.63 and the p-value is 0.013. At a 5% level, we conclude that the model is effective at predicting memory score.
(g) We see that R2 = 2.58%, so only 2.58% of the variability in memory scores can be explained by the X-variables in this model.
10.51 We can use technology to produce output similar to that given below:

Regression Equation
TextsSent = 63.1 + 1.036 HangHours + 0.118 ComputerHours + 0.31 Occupants

Coefficients
Term            Coef  SE Coef  T-Value  P-Value
Constant        63.1     15.7     4.02    0.000
HangHours      1.036    0.539     1.92    0.055
ComputerHours  0.118    0.329     0.36    0.720
Occupants       0.31     2.90     0.11    0.914

Model Summary
      S   R-sq  R-sq(adj)
118.982  0.93%      0.25%

Analysis of Variance
Source      DF   Adj SS  Adj MS  F-Value  P-Value
Regression   3    57743   19248     1.36    0.255
Error      434  6144003   14157
Total      437  6201746
(a) We see from the p-values for the coefficients that HangHours is the most significant in this model, with a p-value of 0.055. (b) We see that Occupants is the least significant, with a p-value of 0.914. (c) We see in the ANOVA table that the F-statistic is 1.36 and the p-value is 0.255. We do not have evidence that the model is effective at predicting number of texts sent. 10.52
(a) Here is some output for using CityMPG alone to predict HwyMPG

HwyMPG = 6.69 + 1.6866 CityMPG

Coefficients
Term       Coef  SE Coef  T-Value  P-Value
Constant   6.69     1.48     4.53    0.000
CityMPG  1.6866   0.0889    18.97    0.000

      S    R-sq  R-sq(adj)
3.47231  76.91%     76.70%

From the value of R2 we see that 76.91% of the variability in the highway mpg ratings for these cars can be explained by the city mpg ratings.
(b) Here is some output for using Weight alone to predict HwyMPG

HwyMPG = 60.52 - 0.006846 Weight

Coefficients
Term         Coef   SE Coef  T-Value  P-Value
Constant    60.52      1.84    32.85    0.000
Weight  -0.006846  0.000464   -14.74    0.000

      S    R-sq  R-sq(adj)
4.16381  66.80%     66.49%
From the value of R2 we see that 66.80% of the variability in the highway mpg ratings for these cars can be explained by their weights.
(c) Here is some output for using both CityMPG and Weight to predict HwyMPG

HwyMPG = 20.57 + 1.298 CityMPG - 0.001957 Weight

Coefficients
Term          Coef   SE Coef  T-Value  P-Value
Constant     20.57      5.52     3.73    0.000
CityMPG      1.298     0.172     7.52    0.000
Weight   -0.001957  0.000751    -2.61    0.010

Model Summary
      S    R-sq  R-sq(adj)
3.38284  78.29%     77.88%
From the value of R2 we see that 78.29% of the variability in the highway mpg ratings for these cars can be explained by the linear model based on their city mpg ratings and weights.
(d) In the output for the multiple regression model in part (c) we see that the p-values for both CityMPG (0.000) and Weight (0.010) are quite small, so both terms are useful and should be kept in the model.
10.53 Here is some output for fitting the two-predictor model for Armspan

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.51981    7.07124   1.205    0.229
Height       0.73555    0.05716  12.867    0.000
Foot         1.44765    0.22454   6.447    0.000
---
Residual standard error: 8.385 on 398 degrees of freedom
Multiple R-squared: 0.6175, Adjusted R-squared: 0.6156
F-statistic: 321.3 on 2 and 398 DF, p-value: < 2.2e-16

(a) The fitted model is Armspan = 8.52 + 0.7356·Height + 1.4477·Foot. For a student with Height = 180 and Foot = 26 the predicted arm span is Armspan = 8.52 + 0.7356·180 + 1.4477·26 = 178.55.
(b) The p-values for the individual t-tests for both predictors are essentially zero, so we have strong evidence that both Height and Foot are effective in this model to predict Armspan.
(c) The output shows R2 = 61.75%, so the model based on Height and Foot explains 61.75% of the variability of the Armspan measurements for these students.
10.54 Here is some computer output for this model:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.74028  216.50398   0.073  0.94260
Size        -0.00354    0.08982  -0.039  0.96887
Beds       -83.27483   67.59267  -1.232  0.22897
Baths      266.52946   75.34852   3.537  0.00154

(a) We see that only the Baths predictor is significant at a 5% level, with a p-value of 0.00154. The other two predictors are not very helpful in this model.
(b) We interpret the coefficient of Beds (−83.275) by saying that if the size of the house and the number of bathrooms were to stay the same, an increase of one bedroom in a house will lower the predicted cost by $83,275. This might not make sense, as we expect one more bedroom to increase the cost, not decrease it. However, more rooms in the same space means smaller rooms, and smaller rooms might decrease the value of a house. Remember that we need to interpret coefficients in the context of the other variables! For Baths we interpret the coefficient of 266.529 by saying that if the size of the house and the number of bedrooms were to stay constant, an increase of one bathroom will raise the predicted cost of a house by $266,529. This direction makes more sense.
(c) We predict the price would be 15.740 − 0.00354(1500) − 83.275(3) + 266.529(2) = 293.663, or $293,663.
10.55 Here is some typical computer output for this model:

The regression equation is
BMGain = - 1.95 + 0.00096 Corticosterone + 0.116 DayPct + 0.904 Consumption - 0.000102 Activity

Predictor            Coef     SE Coef      T      P
Constant           -1.953       2.772  -0.70  0.488
Corticosterone   0.000963    0.009201   0.10  0.918
DayPct            0.11614     0.02492   4.66  0.000
Consumption        0.9040      0.5721   1.58  0.128
Activity       -0.0001020   0.0002326  -0.44  0.665

S = 2.25158   R-Sq = 59.4%   R-Sq(adj) = 52.1%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       4  163.391  40.848  8.06  0.000
Residual Error  22  111.531   5.070
Total           26  274.922
(a) The coefficient is 0.116, so if all the other explanatory variables in the model are held constant, the predicted body mass gain goes up 0.116 grams for one more percent of food eaten during the day.
(b) The coefficient is 0.904, so if all the other explanatory variables in the model are held constant, the predicted body mass gain goes up 0.904 grams for one additional gram of food in daily consumption.
(c) The percent of food eaten during the day, DayPct, is the most significant variable.
(d) Stress level, Corticosterone, is the least significant variable in this model.
(e) Yes, the model is effective, in the sense that at least one of the explanatory variables is useful for predicting body mass gain, since the p-value from the ANOVA table is 0.000.
(f) We see that 59.4% of the variability in body mass gain can be explained by these four variables.
10.56
(a) Here is some output for fitting WinPct = β0 + β1 PtsFor + β2 PtsAgainst + ε.

Coefficients
Term           Coef  SE Coef  T-Value  P-Value
Constant      0.303    0.198     1.53    0.138
PtsFor      0.03024  0.00141    21.52    0.000
PtsAgainst -0.02847  0.00149   -19.05    0.000

Model Summary
        S    R-sq  R-sq(adj)
0.0300307  96.10%     95.81%

Analysis of Variance
Source      DF   Adj SS    Adj MS  F-Value  P-Value
Regression   2  0.59982  0.299908   332.55    0.000
Error       27  0.02435  0.000902
Total       29  0.62416
The estimated prediction equation is WinPct = 0.303 + 0.03024 · PtsFor − 0.02847 · PtsAgainst.
(b) The predicted winning percentage for the Toronto Raptors with PtsFor = 114.4 and PtsAgainst = 108.4 is WinPct = 0.303 + 0.03024 · 114.4 − 0.02847 · 108.4 = 0.676. The residual (actual minus predicted) is 0.707 − 0.676 = 0.031, which is better than the residual based on PtsFor or PtsAgainst alone.
(c) The t-statistics for both coefficients (21.52 and −19.05) are extremely large in magnitude (and very similar in size) and both p-values are essentially zero, so we have strong evidence that both offensive (PtsFor) and defensive (PtsAgainst) abilities are important for predicting team success in the NBA (as measured by WinPct).
(d) Using just PtsFor as a predictor for WinPct we find R2 = 43.7% and s = 0.112. Using just PtsAgainst as a predictor for WinPct we find R2 = 29.2% and s = 0.126. Both single predictor models have small ANOVA p-values so each is an effective predictor on its own. But together in a two-predictor model we see R2 = 96.1% (much higher than the single predictors) and s = 0.0300 (much lower than the single predictors). Thus the model based on both predictors is much more effective at predicting winning percentage in the NBA.
10.57 (a) Here is some output for predicting Price based on Age alone. The p-value for the test of slope is essentially zero, providing strong evidence that Age is an effective predictor of Mustang prices in this model.

Predictor     Coef  SE Coef      T      P
Constant    30.264    3.440   8.80  0.000
Age        -1.7168   0.3648  -4.71  0.000

(b) Here is some output for the model using both Age and Miles as predictors of Price. The p-value for the t-test of the coefficient of Age is 0.769, which is not small at all. This does not provide enough evidence to conclude that Age is a useful predictor of Price in this model.

Predictor      Coef  SE Coef      T      P
Constant     30.867    2.787  11.07  0.000
Miles      -0.20495  0.05650  -3.63  0.001
Age         -0.1551   0.5219  -0.30  0.769
(c) The discrepancy between parts (a) and (b) occurs because Age and Miles driven are very strongly related (r = 0.825 for these 25 cars). If Miles is already in the model we don't need Age as well because it has little additional information to predict price that Miles doesn't already supply. If Miles is not in the model, as in part (a), then Age is a useful predictor.
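A sketch of how the comparison in 10.57 could be carried out in R is given below. The data frame name mustangs is a hypothetical placeholder; Price, Miles, and Age are the variables named in the exercise.

# Hedged sketch for 10.57 (hypothetical data frame `mustangs`).
cor(mustangs$Age, mustangs$Miles)                   # strong correlation, about 0.825
summary(lm(Price ~ Age, data = mustangs))           # Age is significant on its own
summary(lm(Price ~ Miles + Age, data = mustangs))   # Age adds little once Miles is in the model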
10.58
(a) Using technology we find R2 for each model.
Predictor(s)           R2
PtsFor               43.7%
PtsAgainst           29.2%
PtsFor, PtsAgainst   96.1%
The combination of the two predictors works much better than either alone, even better than the sum of the two parts.
(b) Here is some output for fitting WinPct = β0 + β1 Diff + ε.
The regression equation is
WinPct = 0.50026 + 0.02943 Diff

Coefficients
Term        Coef  SE Coef  T-Value  P-Value
Constant 0.50026  0.00548    91.26    0.000
Diff     0.02943  0.00114    25.78    0.000

Model Summary
        S    R-sq  R-sq(adj)
0.0300252  95.96%     95.81%

Analysis of Variance
Source      DF   Adj SS    Adj MS  F-Value  P-Value
Regression   1  0.59892  0.598923   664.35    0.000
Error       28  0.02524  0.000902
Total       29  0.62416
The prediction equation is WinPct = 0.500 + 0.02943 Diff. This is a very effective model (t = 25.78, p-value ≈ 0 for the t-test for slope) and the R2 = 95.96% almost matches that of the two-predictor model (and is much better than either predictor individually). The scatterplot with the regression line shows a strong linear pattern with small amounts of random scatter above and below the line. Note also that the intercept is essentially 0.500, showing that a team that scores as much as it allows should win about half of its games.
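A sketch of the model comparisons in 10.56 and 10.58 is given below. The data frame name nba is a hypothetical placeholder; WinPct, PtsFor, and PtsAgainst are the variables used in the exercises.

# Hedged sketch for 10.56/10.58 (hypothetical data frame `nba`).
r2 <- function(fit) summary(fit)$r.squared
c(PtsFor     = r2(lm(WinPct ~ PtsFor, data = nba)),               # about 43.7%
  PtsAgainst = r2(lm(WinPct ~ PtsAgainst, data = nba)),           # about 29.2%
  Both       = r2(lm(WinPct ~ PtsFor + PtsAgainst, data = nba)))  # about 96.1%

nba$Diff <- nba$PtsFor - nba$PtsAgainst
summary(lm(WinPct ~ Diff, data = nba))    # single predictor with R-squared about 95.96%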
10.59 Here are ANOVA tables for two randomizations of the Price values, using PhotoTime and CostColor as predictors.

Source          DF      SS     MS     F      P
Regression       2   29821  14910  2.38  0.122
Residual Error  17  106416   6260
Total           19  136237

Source          DF      SS    MS     F      P
Regression       2    2428  1214  0.15  0.858
Residual Error  17  133808  7871
Total           19  136237
These give randomization F-statistics 2.38 and 0.15, respectively. We repeat this to get a total of 1000 randomization F-statistics as shown in the dotplot below.
For this simulation we find just 8 of the 1000 randomizations produce an F-statistic as large as the F = 6.30 observed in the original sample. This gives an approximate p-value of 8/1000 = 0.008 which agrees nicely with the p-value=0.009 from the F-distribution. The histogram of the randomization distribution above on the right also shows the density for an F-distribution with 2 numerator and 17 denominator degrees of freedom.
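The randomization procedure described in 10.59 can be sketched in R as shown below. The data frame name printers is a hypothetical placeholder; Price, PhotoTime, and CostColor are the variables named in the exercise.

# Hedged sketch of the randomization F-test in 10.59 (hypothetical data frame `printers`).
set.seed(123)
obs_F <- summary(lm(Price ~ PhotoTime + CostColor, data = printers))$fstatistic[1]

rand_F <- replicate(1000, {
  shuffled <- printers
  shuffled$Price <- sample(shuffled$Price)    # shuffle the response to break any association
  summary(lm(Price ~ PhotoTime + CostColor, data = shuffled))$fstatistic[1]
})

mean(rand_F >= obs_F)    # proportion at least as large as the observed F = 6.30
                         # (compare to 8/1000 = 0.008 in the solution above)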
Section 10.2 Solutions
10.60 We see that the residuals are all positive at the ends and negative in the middle. This matches the curved pattern we see in graph (c).
10.61 We see that the residuals are pretty evenly distributed around the zero line, without obvious curvature or outliers. This matches the nice linear relationship we see in graph (b).
10.62 We see that there is a large outlier in the scatterplot. Since the outlier is above the line, the residual is positive. Also, the outlier matches a predicted value of about 29, which matches what we see in graph (a).
10.63 We see that there is a large outlier in the scatterplot. Since the outlier is below the line, the residual is negative. Also, the outlier matches a predicted value of about 55, which matches what we see in graph (d).
10.64 The conditions all seem to be met. There appears to be a mild linear trend in the data (although it is not very strong), with no reason to question the equal variability condition. We see no strong skewness or outliers in the histogram of the residuals. Finally, the scatterplot of residuals against predicted values appears to show the residuals equally spread out above and below a line at 0, indicating no obvious trend or curvature, and the spread of the points above/below the line stays reasonably consistent going across the graph.
10.65 The conditions are not all met. Normality of the residuals is acceptable since we see no strong skewness or outliers in the histogram of the residuals. However, we see in the scatterplot of the data with the least squares line that the trend in the data is not really linear. This departure from linearity is also apparent in the scatterplot of residuals against predicted values where we see obvious curvature.
10.66
(a) The arrow is shown on the figure.
(b) The predicted weight is Weight = −170 + 4.82 Height = −170 + 4.82(63) = 133.66 lbs. The residual is 200 − 133.66 = 66.34.
(c) Arrows are shown on both the figures.
(d) Looking at the three graphs, the conditions appear to all be met. 10.67
(a) The arrow is shown on the figure.
(b) The predicted weight is Weight = −170 + 4.82 Height = −170 + 4.82(73) = 181.86 lbs. The residual is 120 − 181.86 = −61.86.
(c) Arrows are shown on both the figures.
(d) Looking at the three graphs, the conditions appear to all be met. 10.68 The three relevant plots, a scatterplot with least squares line, a histogram of the residuals, and a residuals versus fitted values plot are shown below.
The conditions appear to be reasonably met. There are a few outliers which may be cause for concern (a couple outliers with high residuals, and an outlier with a high exercise amount), but these outliers are not too extreme, and mostly the conditions appear to be met. The linear trend does not appear to be particularly strong, but there is no evidence of a non-linear trend. The histogram is roughly normal, and variability appears to be constant.
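The three plots used throughout this section can be produced in R with a sketch like the one below. The data frame dat and the variable names y and x are hypothetical placeholders standing in for whichever exercise is being checked.

# Hedged sketch of the three diagnostic plots (hypothetical data frame `dat`, variables y and x).
fit <- lm(y ~ x, data = dat)
par(mfrow = c(1, 3))
plot(dat$x, dat$y, xlab = "x", ylab = "y")       # scatterplot with the least squares line
abline(fit)
hist(resid(fit), main = "", xlab = "Residuals")  # histogram of the residuals
plot(fitted(fit), resid(fit),                    # residuals versus fitted values
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)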
10.69 The three relevant plots, a scatterplot with least squares line, a histogram of the residuals, and a residuals versus fitted values plot are shown below.
The conditions do not appear to be well met. In particular, there is one very high outlier (someone who eats 6662.2 calories a day!). This outlier is obvious and extreme on the scatterplot with regression line, the histogram of residuals, and the scatterplot of residuals vs fits.
10.70 The three relevant plots, a scatterplot with least squares line, a histogram of the residuals, and a residuals versus fitted values plot are shown below.
The conditions appear to be reasonably well met. There appears to be a linear trend in the scatterplot with reasonably equal variability around the regression line. The histogram of residuals appears to have a couple of mild outliers, but nothing too extreme. The scatterplot of residuals vs fits seems to have no obvious trend or curvature and relatively equal variability. The conditions aren’t perfectly met but seem reasonably close. 10.71 The three relevant plots, a scatterplot with least squares line, a histogram of the residuals, and a residuals versus fitted values plot are shown below.
Some of the conditions are met, but the very high outliers in the histogram indicate that the residuals are probably not normally distributed. Other than that, there appears to be a relatively strong linear trend in the scatterplot with the regression line, with reasonably equal variability. There are no big concerns from the scatterplot of residuals vs fits, which seems to show no obvious trends or curvature and relatively equal variability. 10.72
(a) Here is some output for fitting this model

             Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.12034    0.90659   7.854  2.5e-14 ***
Distance      1.21115    0.03977  30.454  < 2e-16 ***

The prediction equation is Time = 7.12 + 1.211 · Distance.
(b) For Distance = 20 the expected commute time for the model is Time = 7.12 + 1.211 · 20 = 31.34 minutes.
(c) A scatterplot of Time vs Distance is shown below on the left. There is a steady upward trend with longer commutes tending to take longer times. However, there is a very regular boundary along the bottom edge of the graph (possibly determined by the speed limit?) that keeps points fairly close and in the direction of the regression line. The deviations above the line are more scattered with several unusually large positive residuals.
(d) A histogram of the residuals is shown to the right above. It shows a clear skew to the right with a long tail and several large outliers in that direction. The normality condition is not appropriate for these residuals. (e) A plot of residual vs fits for this model is shown below. It shows a distinct pattern, especially for the negative residuals, of increasing variability of the residuals as the predicted commute time increases. The equal variability condition is not met for this model and sample. Also, the plot shows several very large positive residuals where the commute time is drastically underestimated.
10.73
(a) Here is some output for fitting the model with the St. Louis data
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.40819    0.58764   10.90  <2e-16 ***
Distance      1.09931    0.03307   33.24  <2e-16 ***

The prediction equation for St. Louis is Time = 6.41 + 1.099 · Distance. This is similar to the model for Atlanta, with a slightly smaller intercept and slope.
(b) For Distance = 20 the expected commute time for the St. Louis model is Time = 6.41 + 1.099 · 20 = 28.39 minutes (compared to 31.34 minutes for the same distance commute in Atlanta).
(c) A scatterplot of Time vs Distance, a histogram of the residuals, and a residual vs fits plot for the St. Louis data are shown below.
These plots are similar to the corresponding plots for the Atlanta data. In the scatterplot of Time vs Distance, there is a steady upward trend with longer commutes tending to take longer times. However, there is a very regular boundary along the bottom edge of the graph (possibly determined by the speed limit?) that keeps points fairly close and in the direction of the regression line. The deviations above the line are more scattered with several unusually large positive residuals.
(d) The histogram of the residuals shows a clear skew to the right with a long tail and several large outliers in that direction. The normality condition is not appropriate for these residuals.
(e) The plot of residual vs fits for this model shows a distinct pattern, especially for the negative residuals, of increasing variability of the residuals as the predicted commute time increases. The equal variability condition is not met for this model and sample. Also, the plot shows several very large positive residuals where the commute time is drastically underestimated.
10.74 (a) We are 95% sure that the mean commute time for all Atlantans with a 20 mile commute is between 30.3 and 32.4 minutes.
(b) The lower bound of the prediction interval implies a 20 mile commute in 7.235 minutes, which is an average speed of (20/7.235) · 60 ≈ 166 miles per hour! The prediction interval assumes normally distributed, hence symmetric, errors. A symmetric interval around the expected commute time needs to be very wide to capture most of the positively skewed times, which makes the lower bound unreasonably small.
10.75 Here are a residuals vs fits plot and histogram of the residuals for the multiple regression model to predict HwyMPG based on CityMPG and Weight for the cars in Cars2020.
The residual vs fits plot shows no consistent curvature, so the linearity condition appears to be met. However, we see that the residuals for smaller fitted values (less than 35) appear to be quite a bit less variable than those for larger fitted values. This would raise concern for the equal variance condition. The histogram of residuals is relatively symmetric and centered around zero, but there might be some mild outliers on both the low and high ends of the distribution. This might indicate a problem with the normality condition, not as clear as the problem with equal variance. Ordering from least to most problematic would be linearity (least), normality, equal variance (most).
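For a multiple regression there is no single scatterplot with a fitted line, so the checks in 10.75 rely on the residuals vs fits plot and the histogram of residuals. Here is a sketch, assuming the Cars2020 data are loaded in a data frame called Cars2020 with the variables named in the exercise.

# Sketch for 10.75 (assumes a data frame Cars2020 with columns HwyMPG, CityMPG, Weight).
fit <- lm(HwyMPG ~ CityMPG + Weight, data = Cars2020)
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")   # check linearity and equal variance
abline(h = 0)
hist(resid(fit), main = "", xlab = "Residuals")    # check normality of the residuals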
10.76 (a) We use a residuals vs fits plot (shown below) to assess linearity. There is no regular curved pattern in this plot so we don’t see a problem with the linearity condition.
(b) The residuals vs fits plot above is also useful to assess the equal variance condition. We see that the scatter above and below the line is relatively consistent as we move from left to right (no regular increasing or decreasing patterns in the variability) so the variance of the residuals appears to be relatively constant.
(c) A histogram and normality plot for the residuals are shown below. We see evidence of a left skew in the histogram of the residuals and the normality plot bends away from the line in both tails. We appear to have a problem with the normality condition.
10.77 A histogram of the residuals and plot of residuals versus fitted values are shown below. The histogram looks reasonably symmetric (or perhaps slightly right skewed) with no large outliers, so we have only mild concern with the normality condition. The residual plot presents more issues. There appears to be a decreasing trend for the homes with smaller fitted values and the variability appears to increase with larger fitted values. Thus the linearity and equal variance conditions might both be problematic.
10.78 A histogram of the residuals and plot of residuals versus fitted values are shown below. The histogram is nicely symmetric with no extreme outliers. The normality condition is reasonable. The residuals are randomly scattered in a band on either side of the zero line in the residual vs fits plot, with no obvious trend or curvature. We see no concerns with the constant variability condition.
10.79 A histogram of the residuals and plot of residuals versus fitted values are shown below. The histogram is relatively symmetric with no extreme outliers, so the normality condition looks reasonable. The residuals are randomly scattered in a band on either side of the zero line in the residual vs fits plot. We see no concerns with the constant variability condition.
10.80 A histogram of the residuals and plot of residuals versus fitted values are shown below. The histogram appears to have a large potential outlier which raises minor concerns. The residuals appear to have somewhat greater variability for large fits than for small fits, so we have some concern about the constant variability condition. Also, there is some curvature downward and then an upward trend.
Section 10.3 Solutions
10.81 (a) We should try eliminating the variable X2 since it has the largest p-value and hence is the most insignificant in this model.
(b) We see that R2 = 15%. Eliminating any variable (even insignificant ones) will cause R2 to decrease. A very small decrease in R2 would indicate that removing X2 was a good idea, whereas a large decrease in R2 would indicate that removing X2 may have been a bad idea, because a larger R2 generally means a better model.
(c) The p-value is 0.322. Removing an insignificant variable will most likely improve the model, and so cause the ANOVA p-value to decrease. If the p-value decreases it would indicate that removing the insignificant variable was a good idea; if the p-value increases it would indicate that removing the insignificant variable was probably not a good idea.
(d) F = 1.23. Dropping an insignificant predictor should cause the F-statistic to increase (since the numerator is divided by fewer degrees of freedom and the denominator is divided by more degrees of freedom), assuming that the dropped variable explains little variation that the other predictors don't explain. An increase in the F-statistic would indicate that removing the predictor was a good idea, since the MSModel would be larger relative to the MSE.
10.82 (a) We should try eliminating the variable X3 since it has the largest p-value and hence is the most insignificant in this model.
(b) We see that R2 = 41.7%. Eliminating any variable (even insignificant ones) will cause R2 to decrease. A very small decrease in R2 would indicate that removing X3 was a good idea, whereas a large decrease in R2 would indicate that removing X3 may have been a bad idea, because a larger R2 generally means a better model.
(c) The p-value is 0.031. Removing an insignificant variable will most likely improve the model, and so cause the ANOVA p-value to decrease. If the p-value decreases it would indicate that removing the insignificant variable was a good idea; if the p-value increases it would indicate that removing the insignificant variable was probably not a good idea.
(d) The F-statistic is 3.81. If we eliminate an insignificant variable, we hope that the model gets better, which will cause the F-statistic to increase (and move farther out in the tail to give a smaller p-value). An increase in the F-statistic would indicate that removing X3 was a good idea.
10.83 Using all five variables, we get the following information:

Regression Equation
OpeningWeekend = -217 + 0.1376 RottenTomatoes + 0.2184 AudienceScore + 0.008216 TheatersOpenWeek + 0.2623 Budget + 0.091 Year

Coefficients
Term                  Coef   SE Coef  T-Value  P-Value
Constant              -217       646    -0.34    0.737
RottenTomatoes      0.1376    0.0330     4.17    0.000
AudienceScore       0.2184    0.0499     4.38    0.000
TheatersOpenWeek  0.008216  0.000605    13.58    0.000
Budget              0.2623    0.0142    18.51    0.000
Year                 0.091     0.321     0.29    0.776

Model Summary
      S    R-sq  R-sq(adj)
20.2066  59.75%     59.55%

Analysis of Variance
Source        DF   Adj SS  Adj MS  F-Value  P-Value
Regression     5   636328  127266   311.69    0.000
Error       1050   428721     408
Total       1055  1065049
Notice that the p-value from ANOVA is 0.000 so the overall model is effective, but there is one insignificant predictor (Year, with p-value = 0.776). We can probably make this model better by dropping that variable. Here is the output for the four remaining predictors:

Regression Equation
OpeningWeekend = -32.75 + 0.1388 RottenTomatoes + 0.2170 AudienceScore + 0.008233 TheatersOpenWeek + 0.2621 Budget

Coefficients
Term                  Coef   SE Coef  T-Value  P-Value
Constant            -32.75      2.80   -11.70    0.000
RottenTomatoes      0.1388    0.0327     4.24    0.000
AudienceScore       0.2170    0.0496     4.38    0.000
TheatersOpenWeek  0.008233  0.000602    13.68    0.000
Budget              0.2621    0.0141    18.53    0.000

Model Summary
      S    R-sq  R-sq(adj)
20.1978  59.74%     59.59%

Analysis of Variance
Source        DF   Adj SS  Adj MS  F-Value  P-Value
Regression     4   636295  159074   389.93    0.000
Error       1051   428755     408
Total       1055  1065049
We see that R2 stays essentially the same (59.74% compared to 59.75%) and now all four of the remaining terms have individual p-values shown as 0.000. Thus a reasonable choice for the final fitted model is
OpeningWeekend = -32.75 + 0.139 RottenTomatoes + 0.217 AudienceScore + 0.00823 TheatersOpenWeek + 0.262 Budget
10.84 Here is the output for fitting all four predictors:

The regression equation is
Avg_Mercury = 1.00 - 0.00550 Alkalinity - 0.0467 pH + 0.00413 Calcium - 0.00236 Chlorophyll

Predictor         Coef   SE Coef      T      P
Constant        1.0044    0.2576   3.90  0.000
Alkalinity   -0.005503  0.002028  -2.71  0.009
pH            -0.04671   0.04533  -1.03  0.308
Calcium       0.004129  0.002648   1.56  0.125
Chlorophyll  -0.002361  0.001497  -1.58  0.121

S = 0.262879   R-Sq = 45.2%   R-Sq(adj) = 40.6%

Analysis of Variance
Source          DF       SS       MS     F      P
Regression       4  2.73081  0.68270  9.88  0.000
Residual Error  48  3.31707  0.06911
Total           52  6.04788
The ANOVA p-value is already very low for this model, so it is clear the model is effective. However, some of the variables are not significant. The least significant variable is pH, so we see what happens when we eliminate that one.

The regression equation is
Avg_Mercury = 0.745 - 0.00649 Alkalinity + 0.00433 Calcium - 0.00303 Chlorophyll

Predictor     Coef       SE Coef    T      P
Constant       0.74458   0.05240    14.21  0.000
Alkalinity    -0.006487  0.001790   -3.62  0.001
Calcium        0.004333  0.002642    1.64  0.107
Chlorophyll   -0.003035  0.001348   -2.25  0.029

S = 0.263045   R-Sq = 43.9%   R-Sq(adj) = 40.5%

Analysis of Variance
Source          DF   SS        MS       F      P
Regression       3   2.65743   0.88581  12.80  0.000
Residual Error  49   3.39045   0.06919
Total           52   6.04788
The ANOVA p-value is still zero, but we can see that the model is improved since the F-statistic went up. The value of R² went down, but only by a bit. The p-values of the other three variables all improved. The only one that is not significant at a 5% level is Calcium, so we see what happens if we eliminate that variable.

The regression equation is
Avg_Mercury = 0.752 - 0.00415 Alkalinity - 0.00298 Chlorophyll

Predictor     Coef       SE Coef    T      P
Constant       0.75194   0.05308    14.17  0.000
Alkalinity    -0.004154  0.001105   -3.76  0.000
Chlorophyll   -0.002979  0.001370   -2.17  0.034

S = 0.267451   R-Sq = 40.9%   R-Sq(adj) = 38.5%

Analysis of Variance
Source          DF   SS       MS      F      P
Regression       2   2.4714   1.2357  17.27  0.000
Residual Error  50   3.5765   0.0715
Total           52   6.0479
The F-statistic again went up, and again R² went down, from 43.9% to 40.9%. Both remaining variables are significant at a 5% level. However, s increased from 0.263 to 0.267, and adjusted R² decreased. Either this model or the model before eliminating Calcium could be justified as the best model.

10.85 (a) A correlation matrix for all the variables is shown. We see that the variables with strong correlations with BetaPlasma are Fiber and BetaDiet. To a smaller extent, but still possibly relevant, are Age and Fat. It appears that Alcohol consumption is not significantly correlated with beta-carotene levels in the body.

            BetaPlasma   Age      Fat      Fiber    Alcohol
Age            0.101
               0.073
Fat           -0.092     -0.169
               0.104      0.003
Fiber          0.236      0.045    0.276
               0.000      0.428    0.000
Alcohol       -0.022      0.052    0.186   -0.020
               0.695      0.362    0.001    0.722
BetaDiet       0.225      0.072    0.143    0.483    0.039
               0.000      0.203    0.011    0.000    0.486

Cell Contents: Pearson correlation
               P-Value

(b) Let's start with a model with all five explanatory variables, just to see what we get.

The regression equation is
BetaPlasma = 92.4 + 0.677 Age - 0.874 Fat + 7.19 Fiber + 0.053 Alcohol + 0.0177 BetaDiet

Predictor   Coef       SE Coef    T      P
Constant    92.42      47.92       1.93  0.055
Age          0.6768     0.6943     0.97  0.330
Fat         -0.8739     0.3164    -2.76  0.006
Fiber        7.186      2.191      3.28  0.001
Alcohol      0.0533     0.8219     0.06  0.948
BetaDiet     0.017745   0.007668   2.31  0.021

S = 174.845   R-Sq = 10.2%   R-Sq(adj) = 8.7%

Analysis of Variance
Source          DF    SS         MS       F     P
Regression        5    1069284   213857   7.00  0.000
Residual Error  309    9446354    30571
Total           314   10515638
The ANOVA p-value is 0, but R² is only 10.2%. As expected from the correlation analysis, the p-value for Alcohol is very large. Let's look at a model with just the other four variables.

The regression equation is
BetaPlasma = 92.2 + 0.681 Age - 0.870 Fat + 7.17 Fiber + 0.0178 BetaDiet

Predictor   Coef       SE Coef    T      P
Constant    92.17      47.69       1.93  0.054
Age          0.6809     0.6903     0.99  0.325
Fat         -0.8695     0.3087    -2.82  0.005
Fiber        7.172      2.177      3.29  0.001
BetaDiet     0.017770   0.007646   2.32  0.021

S = 174.564   R-Sq = 10.2%   R-Sq(adj) = 9.0%

Analysis of Variance
Source          DF    SS         MS       F     P
Regression        4    1069155   267289   8.77  0.000
Residual Error  310    9446483    30473
Total           314   10515638
We see that R² did not go down at all, and the F-statistic got better (increased), adjusted R² got better (increased), and s got better (decreased). This is a better model than the first one. However, the variable Age is not significant in the model, so we try again without that one.

The regression equation is
BetaPlasma = 128 - 0.927 Fat + 7.30 Fiber + 0.0182 BetaDiet

Predictor   Coef       SE Coef    T      P
Constant    128.20     30.67       4.18  0.000
Fat          -0.9275    0.3030    -3.06  0.002
Fiber         7.296     2.173      3.36  0.001
BetaDiet      0.018228  0.007632   2.39  0.018

S = 174.556   R-Sq = 9.9%   R-Sq(adj) = 9.0%

Analysis of Variance
Source          DF    SS         MS        F      P
Regression        3    1039508   346503   11.37   0.000
Residual Error  311    9476130    30470
Total           314   10515638
Few quantities changed other than the F-statistic, which continued to go up. All three variables are significant at a 5% level (and almost at a 1% level), so we'll call this our final model. Other answers are also possible.
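For readers who want to carry out this kind of backward elimination with software, here is a minimal Python sketch using statsmodels. It runs on synthetic stand-in data (it does not reproduce the textbook's nutrition data), so the names x1, x2, x3 and the printed numbers are placeholders; the point is the workflow of dropping the least significant predictor and comparing R², adjusted R², F, and s.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: x3 is built to be a useless predictor
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = 5 + 2 * df["x1"] + 1.5 * df["x2"] + rng.normal(scale=3, size=n)

full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print("drop candidate:", full.pvalues.drop("Intercept").idxmax())  # largest p-value

reduced = smf.ols("y ~ x1 + x2", data=df).fit()
for name, m in [("full", full), ("reduced", reduced)]:
    print(name, "R2=%.4f" % m.rsquared, "adjR2=%.4f" % m.rsquared_adj,
          "F=%.2f" % m.fvalue, "s=%.3f" % (m.mse_resid ** 0.5))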
10.86 (a) Here are the correlations between Time and each predictor, along with p-values for testing whether each correlation differs from zero.

                        Runs    Margin   Hits    Errors   Pitchers  Walks
Correlation with Time   0.504   −0.116   0.349   −0.040   0.721     0.565
p-value                 0.005    0.541   0.059    0.833   0.000     0.001

Pitchers appears to be the strongest single predictor of Time, with the largest correlation and smallest p-value. Walks and Runs are also strong single predictors of Time, with very small p-values. The correlation between Hits and Time is not quite significant at a 5% level. Margin and Errors have the weakest correlations and do not appear to be useful predictors of Time on their own.

(b) If we try the three strongest individual predictors, Pitchers, Walks, and Runs, we get the following output.

Predictor   Coef      SE Coef   T      P
Constant    120.083   9.629     12.47  0.000
Runs          0.7016  0.7086     0.99  0.331
Walks         2.1966  0.9813     2.24  0.034
Pitchers      5.255   1.449      3.63  0.001

S = 13.3518   R-Sq = 62.1%   R-Sq(adj) = 57.7%
Pitchers (p-value = 0.001) and Walks (p-value = 0.034) both look effective in this model, but we might drop Runs (p-value = 0.331). Doing so we get the output below.

Predictor   Coef      SE Coef   T      P
Constant    120.550   9.614     12.54  0.000
Walks         2.3632  0.9664     2.45  0.021
Pitchers      5.851   1.317      4.44  0.000

S = 13.3470   R-Sq = 60.7%   R-Sq(adj) = 57.8%
Both predictors are significant and, while R² has dropped by 1.4%, both the adjusted R² and the estimate of the standard deviation of the error, s = 13.347, are slightly better here than in the three-predictor model. We tried each of the other predictors together with Pitchers and Walks and none of them showed a significant p-value for its coefficient. So we will use the two-predictor model, Time = 120.55 + 2.3632·Walks + 5.851·Pitchers.

10.87 (a) Since we are told the result is significant and that a significance level of 5% is used, we can conclude that the p-value is less than 0.05. (b) No, this is an observational study so we cannot conclude that one causes the other. (c) We can account for possible confounding variables by including them as additional explanatory variables in the model, and running multiple regression with the additional variables. (d) No, this is still from an observational study and we cannot conclude causation. While the authors of the study accounted for some confounding variables, it is impossible to account for all possible confounding variables in this study. The only way to conclude causation is with a randomized experiment!
10.88
(a) We regress LifeExpectancy on Electricity and obtain the following output:
Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  7.155e+01   6.244e-01   114.583  < 2e-16
Electricity  5.634e-04   8.543e-05     6.595  8.37e-10
---
Residual standard error: 6.004 on 138 degrees of freedom
  (77 observations deleted due to missingness)
Multiple R-squared: 0.2396, Adjusted R-squared: 0.2341
F-statistic: 43.49 on 1 and 138 DF, p-value: 8.371e-10

The p-value of 8.37 × 10⁻¹⁰ (essentially zero) indicates that Electricity is a very significant predictor of LifeExpectancy.

(b) GDP is associated with both electricity use (r = 0.702) and with life expectancy (r = 0.656), so it is a potential confounding variable.

(c) We regress LifeExpectancy on both Electricity and GDP and obtain the following output:

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  7.028e+01   5.828e-01   120.583  < 2e-16
Electricity  5.863e-05   1.034e-04     0.567  0.572
GDP          1.968e-04   2.876e-05     6.845  2.73e-10
---
Residual standard error: 5.148 on 130 degrees of freedom
  (84 observations deleted due to missingness)
Multiple R-squared: 0.4438, Adjusted R-squared: 0.4352
F-statistic: 51.86 on 2 and 130 DF, p-value: < 2.2e-16

The p-value of 0.572 for Electricity indicates that after accounting for the information in GDP, Electricity is no longer a significant predictor of LifeExpectancy in this model.

10.89
(a) We regress LifeExpectancy on Cell and obtain the following output:
Coefficients:
             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  61.18541   1.30314     46.952   <2e-16
Cell          0.10321   0.01132      9.116   <2e-16
---
Residual standard error: 6.503 on 190 degrees of freedom
  (25 observations deleted due to missingness)
Multiple R-squared: 0.3043, Adjusted R-squared: 0.3006
F-statistic: 83.1 on 1 and 190 DF, p-value: < 2.2e-16

The p-value for Cell is essentially zero, indicating that Cell is a very significant predictor of LifeExpectancy.

(b) GDP is associated with both number of mobile cell subscriptions (r = 0.453) and with life expectancy (r = 0.656), so it is a potential confounding variable.
(c) We regress LifeExpectancy on both Cell and GDP and obtain the following output:

Coefficients:
             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)  6.203e+01   1.181e+00   52.55    < 2e-16
Cell         6.615e-02   1.112e-02    5.95    1.42e-08
GDP          1.886e-04   2.203e-05    8.56    5.61e-15
---
Residual standard error: 5.41 on 175 degrees of freedom
  (39 observations deleted due to missingness)
Multiple R-squared: 0.5267, Adjusted R-squared: 0.5213
F-statistic: 97.37 on 2 and 175 DF, p-value: < 2.2e-16

The p-value of 1.42 × 10⁻⁸ for Cell is very small, indicating that even after accounting for GDP, Cell is a very significant predictor of LifeExpectancy.

10.90 (a) For the model with size in square feet, the coefficient of Size is 0.082. When we code Size1000 in thousands of square feet, we multiply this coefficient by 1000 to get the coefficient of 81.99 in the revised model. Note that predictions will be identical for the same size home with either model. The standard error for the coefficient is also 1000 times larger in the revised model (going from SE = 0.0426 to SE = 42.64), so the t-statistic (t = 1.923) and p-value remain the same. (b) Given the size of the house and the number of bathrooms, each additional bedroom decreases the predicted price by $25,810. (c) The predictors are all correlated with each other, making any individual coefficient difficult to interpret. Number of bedrooms is too highly associated with size of the house and number of bathrooms to draw any meaningful conclusions from the negative coefficient. Also, this is observational data, so we cannot draw any causal conclusions.

10.91
(a) The predicted commute time for the steel bike is Minutes = 108.342 − 0.553(1) = 107.789 minutes.

(b) The predicted commute time for the carbon bike is Minutes = 108.342 − 0.553(0) = 108.342 minutes.

(c) The p-value of 0.711 for testing the coefficient of BikeSteel indicates no significant difference in average commute time between the carbon bike and the steel bike.

10.92 The predicted distance is higher for the carbon bike, by 0.396 miles (as seen by the coefficient of BikeSteel). The p-value of 4.74 × 10⁻⁹ (essentially zero) indicates that this difference is very significant.

10.93 (a) For commutes of the same distance, the predicted commute time is 3.571 minutes longer for the steel bike than for the carbon bike.

(b) For commutes on the same bike, the predicted commute time increases by 10.5 minutes for every mile ridden.
(c) The predicted commute time for a 27-mile commute on the steel bike is Minutes = −176.62 + 3.571(1) + 10.41(27) = 108.021 minutes. The predicted commute time for a 27-mile commute on the carbon bike is Minutes = −176.62 + 3.571(0) + 10.41(27) = 104.45 minutes.
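As a quick check of these predictions, here is a small Python sketch that evaluates the fitted equation from this exercise; the only inputs are the coefficients quoted above.

# Fitted equation from Exercise 10.93: Minutes = -176.62 + 3.571*BikeSteel + 10.41*Distance
def predicted_minutes(distance_miles, steel):
    return -176.62 + 3.571 * (1 if steel else 0) + 10.41 * distance_miles

print(predicted_minutes(27, steel=True))    # 108.021 minutes on the steel bike
print(predicted_minutes(27, steel=False))   # 104.45 minutes on the carbon bike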
10.94 (a) Without accounting for distance, the steel bike gives a lower predicted commute time. However, in this experiment the commutes ridden on the steel bike were of slightly shorter distance. After accounting for distance, for rides of the same distance the carbon bike gives a lower predicted commute time. (b) Because the coefficient of BikeSteel is positive with Distance included in the model, the predicted commute time for a ride of a given distance is longer for the steel bike. Therefore, the steel bike has a lower average speed, and so the coefficient for BikeSteel would be negative. 10.95
(a) Here are the individual t-tests for fitting the two-predictor model
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  3.80189   2.54791      1.492   0.156
Par         -0.64495   1.24432     -0.518   0.612
Distance     0.01108   0.00797      1.390   0.185

The p-value for Par (0.612) is quite large, indicating that Par is not very useful in this model to predict Score. Similarly, the p-value for Distance (0.185) is also not small enough to conclude that it is a useful predictor for Score in this model.

(b) Here is some output for the ANOVA F-test for checking the overall effectiveness of this model.

Model: Score ~ Par + Distance
        Df  Sum Sq   Mean Sq  F value  P(>F)
Model    2  10.282   5.1409   4.3523   0.03232
Error   15  17.718   1.1812
Total   17  28.000

The small p-value (0.0323 < 0.05) provides evidence that the model based on Par and Distance has some effectiveness for predicting the golf scores.

(c) The individual t-tests in part (a) indicate that neither predictor is particularly important in the multiple regression model, while the F-test says that something is useful in the model. The reason for this seeming inconsistency is that the Par and Distance predictors themselves are strongly correlated (r = 0.951). The larger par values are naturally assigned to the longer holes. Thus if either of the two predictors is already in the model, the other won't offer much new information. Remember that the individual t-tests assess the importance of each predictor after the other predictors are in the model. The ANOVA F-test assesses how well the predictors do as a group.

10.96 There are lots of possibilities for examples. An important predictor that has large values (such as Weight) might have a very small coefficient. Another predictor with relatively small values (such as Acc30) might have a coefficient that is larger in magnitude, but a smaller individual t-statistic and larger p-value, indicating it is less important in the model. Here is some output from a model to predict HwyMPG based on both Weight and Acc30 (time to accelerate to 30 mph).
HwyMPG = 53.13 + 1.746 Acc030 - 0.006330 Weight

Coefficients
Term       Coef       SE Coef    T-Value  P-Value
Constant   53.13      3.84        13.85   0.000
Acc030      1.746     0.798        2.19   0.031
Weight     -0.006330  0.000514   -12.32   0.000
The coefficient of Weight (b2 = −0.00633) is quite close to zero, but its t-value (−12.32) is very extreme and the p-value (0.000) is very small, indicating it is a very important predictor in this model. The coefficient for Acc30 (b1 = 1.746) is much larger in magnitude than b2 , but its t-value (2.19) is not nearly so extreme. While its p-value (0.031) is still significant at a 5% level, Acc30 is not nearly as useful as Weight in this model. This illustrates that size of the coefficient alone may not be a good indicator of the importance of its predictor.
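The t-statistics in this output are just each coefficient divided by its standard error, which the following short Python sketch confirms from the values in the table above.

# t = Coef / SE Coef for the HwyMPG model above
coefs = {"Constant": (53.13, 3.84), "Acc030": (1.746, 0.798), "Weight": (-0.006330, 0.000514)}
for term, (b, se) in coefs.items():
    print(f"{term:8s} t = {b / se:7.2f}")
# Weight's coefficient is tiny in absolute size, yet its t-statistic (about -12.3) is far
# more extreme than Acc030's (about 2.2), so coefficient magnitude alone does not measure importance.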
Section P.1 Solutions

P.1 We have P(not A) = 1 − P(A) = 1 − 0.4 = 0.6.

P.2 We have P(not B) = 1 − P(B) = 1 − 0.3 = 0.7.
P.3 By the additive rule, we have P(A or B) = P(A) + P(B) − P(A and B) = 0.4 + 0.3 − 0.1 = 0.6.

P.4 We have
P(A if B) = P(A and B)/P(B) = 0.1/0.3 = 0.333

P.5 We have
P(B if A) = P(A and B)/P(A) = 0.1/0.4 = 0.25
P.6 No! If A and B were disjoint, that means they cannot both happen at once, which means P(A and B) = 0. Since we are told that P(A and B) = 0.1, not zero, the events are not disjoint.

P.7 We need to check whether P(A and B) = P(A) · P(B). Since P(A and B) = 0.1 and P(A) · P(B) = 0.4 · 0.3 = 0.12, the events are not independent. We can also check from an earlier exercise that P(B if A) = 0.25 ≠ P(B) = 0.3.

P.8 We have P(not A) = 1 − P(A) = 1 − 0.8 = 0.2.

P.9 We have P(not B) = 1 − P(B) = 1 − 0.4 = 0.6.

P.10 By the additive rule, we have P(A or B) = P(A) + P(B) − P(A and B) = 0.8 + 0.4 − 0.25 = 0.95.

P.11 We have
P(A if B) = P(A and B)/P(B) = 0.25/0.4 = 0.625

P.12 We have
P(B if A) = P(A and B)/P(A) = 0.25/0.8 = 0.3125

P.13 No! If A and B were disjoint, that means they cannot both happen at once, which means P(A and B) = 0. Since we are told that P(A and B) = 0.25, not zero, the events are not disjoint.

P.14 We need to check whether P(A and B) = P(A) · P(B). Since P(A and B) = 0.25 and P(A) · P(B) = 0.8 · 0.4 = 0.32, the events are not independent. We can also check from an earlier exercise that P(B if A) = 0.3125 ≠ P(B) = 0.4.

P.15 Since A and B are independent, knowing that B occurs gives us no additional information about A, so P(A if B) = P(A) = 0.7.

P.16 Since A and B are independent, knowing that A occurs gives us no additional information about B, so P(B if A) = P(B) = 0.6.

P.17 Since A and B are independent, we have P(A and B) = P(A) · P(B) = 0.7 · 0.6 = 0.42.
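The arithmetic in Exercises P.1–P.17 is easy to check with a few lines of Python; the sketch below uses the values P(A) = 0.4, P(B) = 0.3, and P(A and B) = 0.1 from the first group of exercises.

# Checking the rules used in P.1-P.7
p_a, p_b, p_ab = 0.4, 0.3, 0.1
print(1 - p_a)              # P(not A) = 0.6         (complement rule)
print(p_a + p_b - p_ab)     # P(A or B) = 0.6        (additive rule)
print(p_ab / p_b)           # P(A if B) = 0.333      (conditional probability)
print(p_ab / p_a)           # P(B if A) = 0.25
print(p_ab == p_a * p_b)    # False, so A and B are not independent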
P.18 Since A and B are independent, we have P(A and B) = P(A) · P(B) = 0.7 · 0.6 = 0.42. By the additive rule, we have P(A or B) = P(A) + P(B) − P(A and B) = 0.7 + 0.6 − 0.42 = 0.88.

P.19 There are two cells that are included as part of event A, so P(A) = 0.2 + 0.1 = 0.3.

P.20 There are two cells that are included as part of event not B, so P(not B) = 0.1 + 0.3 = 0.4.

P.21 There is one cell that is in both event A and event B, so we have P(A and B) = 0.2.

P.22 There are three cells that represent being in event A or event B or both, so we have P(A or B) = 0.2 + 0.4 + 0.1 = 0.7. We could also use the additive rule P(A or B) = P(A) + P(B) − P(A and B) = 0.3 + 0.6 − 0.2 = 0.7.

P.23 We have
P(A if B) = P(A and B)/P(B) = 0.2/0.6 = 0.333

P.24 We have
P(B if A) = P(A and B)/P(A) = 0.2/0.3 = 0.667
P.25 No! If A and B were disjoint, that means they cannot both happen at once, which means P(A and B) = 0. We see in the table that P(A and B) = 0.2, not zero, so the events are not disjoint.

P.26 We need to check whether P(A and B) = P(A) · P(B). Since P(A and B) = 0.2 and P(A) · P(B) = 0.3 · 0.6 = 0.18, the events are not independent. We can also check from an earlier exercise that P(B if A) = 0.667 ≠ P(B) = 0.6.

P.27 The two events are disjoint, since if at least one skittle is red then all three can't be green. However, they are not independent or complements.

P.28 The two events are disjoint because both teams cannot win. They are also complements because if Australia does not win, then South Africa wins. However, they are not independent.

P.29 The two events are independent, as Australia winning their rugby match will not change the probability that Poland wins their chess match. However, they are not disjoint or complements.

P.30 The two events are not disjoint or complements, as it is possible to have the rolls be {3,5}, where the first die is a 3 and the sum is 8. To check independence we need to find

P(A) = 1/6

and

P(B) = P({2,6} or {3,5} or {4,4} or {5,3} or {6,2}) = 5/36

There is only one possibility for the intersection, so P(A and B) = P({3,5}) = 1/36. We then check that

P(A if B) = P(A and B)/P(B) = (1/36)/(5/36) = 1/5 ≠ P(A) = 1/6

so A and B are not independent. We can also verify that P(A and B) = 1/36 ≠ P(A) · P(B) = (1/6) · (5/36) = 5/216.

P.31
(a) It will not necessarily be the case that EXACTLY 1 in 10 adults are left-handed for every sample. We can only conclude that approximately 10% will be left-handed in the “long run” (for very large samples).
(b) The three outcomes each have probability 1/3 only if they are equally likely. This may not be the case for the results of baseball pitches. (c) To find the probability of two consecutive 1's on independent dice rolls we should multiply the probabilities instead of adding them. Using the multiplicative rule, the probability that two consecutive rolls land with a 1 is 1/6 × 1/6 = 1/36. (d) A probability that is not between 0 and 1 does not make sense.

P.32
(a) We are told that P(C) = 0.20, P(W) = 0.105, and P(C and W) = 0.025.

(b) We use the additive rule for finding the probability of one event or another.
P(C or W) = P(C) + P(W) − P(C and W) = 0.20 + 0.105 − 0.025 = 0.28
The probability that a movie is either a comedy or produced by Warner Bros is 0.28.

(c) We are finding the probability that a movie is a comedy if it is produced by Warner Bros, so we have
P(C if W) = P(C and W)/P(W) = 0.025/0.105 = 0.238
About 23.8% of movies produced by Warner Bros are comedies.

(d) We are finding the probability that a movie is produced by Warner Bros if it is a comedy, so we have
P(W if C) = P(C and W)/P(C) = 0.025/0.20 = 0.125
About 12.5% of comedies are produced by Warner Bros.

(e) To find the probability that a movie is not a comedy we have P(not C) = 1 − P(C) = 1 − 0.20 = 0.80.

(f) Saying C and W are disjoint would mean that Warner Bros never makes any comedies. This is not true since we see that 2.5% of all the movies are comedies from Warner Bros. It is clear that C and W are not disjoint.

(g) Saying C and W are independent would mean that knowing a movie comes from Warner Bros would give us no information about whether it is a comedy, and knowing it is a comedy would give us no information about whether it came from Warner Bros. This is almost true: P(C if W) = 0.238, which is close to P(C) = 0.20, and P(W if C) = 0.125, which is close to P(W) = 0.105, but neither result is exactly the same, so the two events are not independent (at least for these approximated probability values).

P.33
(a) We are finding P(MP). There are a total of 329 inductees and 230 of them are performers, so we have
P(MP) = 230/329 = 0.699
The probability that an inductee selected at random will be a performer is 0.699.

(b) We are finding P(not F). There are a total of 329 inductees and 277 of them do not have any female members, so we have
P(not F) = 277/329 = 0.842
The probability that an inductee selected at random will not have any female members is 0.842.

(c) In this case, we are interested only in inductees who are performers, and we want to know the probability they have female members, P(F if MP). There are 230 performers and 43 of those have female members, so we have
P(F if MP) = 43/230 = 0.187

(d) In this case, we are interested only in inductees that do not have any female members, and we want to know the probability of not being a performer, P(not MP if not F). There are 277 inductees with no female members and 90 of them are not performers, so we have
P(not MP if not F) = 90/277 = 0.325

(e) We are finding P(MP and not F). Of the 329 inductees, there are 187 that are performers with no female members, so we have
P(MP and not F) = 187/329 = 0.568

(f) We are finding P(not MP or F). Of the 329 inductees, 43 + 9 + 90 = 142 are either not performers or have female members (or both), so we have
P(not MP or F) = 142/329 = 0.432
Notice that this is the complement of the event found in part (e).

P.34
(a) We are finding P(C). There are a total of 281 inductees and 242 of them were born in Canada, so we have
P(C) = 242/281 = 0.861
The probability is 0.861 that an inductee selected at random is a Canadian. It is remarkable how much Canada has dominated the sport!

(b) We are finding P(not D). There are a total of 281 inductees, and 87 of them played defense while the other 194 did not, so we have
P(not D) = 194/281 = 0.690
The probability that an inductee selected at random will not be a defenseman is 0.690.

(c) We are finding P(C and D). Of the 281 inductees, there are 75 that are defensemen born in Canada, so we have
P(C and D) = 75/281 = 0.267

(d) We are finding P(C or D). Of the 281 inductees, 242 were born in Canada and 87 are defensemen and 75 are both, so we have
P(C or D) = P(C) + P(D) − P(C and D) = 242/281 + 87/281 − 75/281 = 254/281 = 0.904

(e) In this case, we are interested only in inductees who are Canadian, and we want to know the probability that a Canadian inductee plays defense, P(D if C). There are 242 Canadians and 75 of them play defense, so we have
P(D if C) = 75/242 = 0.310

(f) In this case, we are interested only in defensemen, and we want to know the probability that a defenseman inductee is also Canadian, P(C if D). There are 87 defensemen and 75 of them are Canadian, so we have
P(C if D) = 75/87 = 0.862

P.35
(a) There are 11 red ones out of a total of 80, so the probability that we pick a red one is 11/80 = 0.1375.
(b) The probability that it is blue is 20/80 = 0.25, so the probability that it is not blue is 1 − 0.25 = 0.75. (c) The single piece can be red or orange, but not both, so these are disjoint events. The probability the randomly selected candy is red or orange is 11/80 + 12/80 = 23/80 = 0.2875. (d) The probability that the first one is blue is 20/80 = 0.25. When we put it back and mix them up, the probability that the next one is blue is also 0.25. By the multiplication rule, since the two selections are independent, the probability both selections are blue is 0.25 · 0.25 = 0.0625. (e) The probability that the first one is red is 11/80. Once that one is taken (since we don't put it back and we eat it instead), there are only 79 pieces left and 11 of those are green. By the multiplication rule, the probability of a red then a green is (11/80) · (11/79) = 0.019.

P.36
(a) There are 18 yellow ones out of a total of 80, so the probability that we pick a yellow one is 18/80 = 0.225.
(b) The probability that it is brown is 8/80 = 0.10, so the probability that it is not brown is 1−0.10 = 0.90. (c) The single piece can be blue or green, but not both, so these are disjoint events. The probability the randomly selected candy is blue or green is 20/80 + 11/80 = 31/80 = 0.3875. (d) The probability that the first one is red is 11/80 = 0.1375. When we put it back and mix them up, the probability that the next one is red is also 0.1375. By the multiplication rule, since the two selections are independent, the probability both selections are red is 0.1375 · 0.1375 = 0.0189. (e) The probability that the first one is yellow is 18/80. Once that one is taken (since we don’t put it back and we eat it instead), there are only 79 pieces left and 20 of those are blue. By the multiplication rule, the probability is (18/80) · (20/79) = 0.0570. P.37 Let S denote successfully making a free throw and F denote missing it. (a) As free throws are independent, we can multiply the probabilities. P (Makes two) = P (S1 and S2 ) = P (S1 ) · P (S2 ) = 0.908 × 0.908 = 0.824 (b) The probability of missing one free throw is P (F ) = 1 − 0.908 = 0.092. So, P (Misses two) = P (F1 and F2 ) = P (F1 ) · P (F2 ) = 0.092 × 0.092 = 0.008 (c) He can either miss the first and make the second shot, or make the first and miss the second. So, P (Makes exactly one) = P (S1 and F2 ) + P (F1 and S2 ) = 0.908 × 0.092 + 0.092 × 0.908 = 0.167 P.38 Let CBM and CBW denote the events that a man or a woman is colorblind, respectively. (a) As 7% of men are colorblind, P (CBM ) = 0.07.
(b) As 0.4% of women are colorblind, P(not CBW) = 1 − P(CBW) = 1 − 0.004 = 0.996. (c) The probability that the woman is not colorblind is 0.996, and the probability that the man is not colorblind is 1 − 0.07 = 0.93. As the man and woman are selected independently, we can multiply their probabilities: P(Neither is Colorblind) = P(not CBM) · P(not CBW) = 0.93 × 0.996 = 0.926. (d) The event "At least one is colorblind" is the complement of the event in part (c) that "Neither is Colorblind," so we have P(At least one is Colorblind) = 1 − P(Neither is Colorblind) = 1 − 0.926 = 0.074. We could also do this part as P(CBM or CBW) = P(CBM) + P(CBW) − P(CBM and CBW) = 0.07 + 0.004 − (0.07)(0.004) = 0.074.

P.39

(a) The probability that a woman is not colorblind is 1 − 0.004 = 0.996, and the probability that a man is not colorblind is 1 − 0.07 = 0.93. As all events are independent, we can multiply their probabilities: P(Nobody is Colorblind) = 0.996^25 × 0.93^15 = 0.305
(b) The event “At least one is Colorblind” is the complement of the event “Nobody is Colorblind” which has its probability computed in part (a), so P (At least one is Colorblind) = 1 − P (Nobody is Colorblind) = 1 − 0.305 = 0.695 (c) The probability that the randomly selected student is a man is 15/40 = 0.375 and the probability that it is a women is 25/40 = 0.625. Using the additive rule for disjoint events, P (Colorblind)
= P (Colorblind Man OR Colorblind Woman) = P (Colorblind and Man) + P (Colorblind and Woman)
By the multiplicative rule, P (Colorblind and Man) = P (Man) · P (Colorblind if Man) = 0.375 × 0.07 = 0.02625 Similarly, P (Colorblind and Woman) = 0.625 × 0.004 = 0.0025 So, P (Colorblind) = 0.02625 + 0.0025 = 0.029. P.40
(a) As 85,995 out of 100,000 men live to age 60, the probability is 85,995/100,000 = 0.860.

(b) As 73,548 out of 100,000 men live to age 70, the probability is 73,548/100,000 = 0.735.

(c) As 17,429 out of 100,000 men live to age 90 and 14,493 out of 100,000 live to age 91, the probability that a man dies at age 90 is (17,429 − 14,493)/100,000 = 0.0294.

(d) Use conditional probability:
P(dies at 90 if lives till 90) = P(dies at 90 and lives till 90)/P(lives till 90) = P(dies at 90)/P(lives till 90) = 0.0294/(17,429/100,000) = 0.169

(e) Use conditional probability:
P(dies at 90 if lives till 80) = P(dies at 90 and lives till 80)/P(lives till 80) = P(dies at 90)/P(lives till 80) = 0.0294/(50,344/100,000) = 0.058

(f) As 85,995 out of 100,000 men live to age 60 and 17,429 out of 100,000 live to age 90, the probability that a man dies between the ages of 60 and 89 is (85,995 − 17,429)/100,000 = 0.686.

(g) Use conditional probability:
P(lives till 90 if lives till 60) = P(lives till 90 and lives till 60)/P(lives till 60) = P(lives till 90)/P(lives till 60) = (17,429/100,000)/(85,995/100,000) = 0.203

P.41
(a) The probability that the S&P 500 increased on a randomly selected day is 423/756 = 0.5595.
(b) Assuming independence, the probability that the S&P 500 increases for two consecutive days is 0.5595 × 0.5595 = 0.3130 (using the multiplicative rule). The probability that the S&P 500 increases on a day given that it increased the day before remains 0.5595 if the events are independent. (c) The probability that the S&P 500 increases for two consecutive days is 234/755 = 0.3099. The probability that the S&P 500 increases on a day, given that it increased on the previous day, is
P(Increase both days)/P(Increase first day) = 0.3099/0.5595 = 0.5539
(d) The difference between the results in part (b) and part (c) is very small and insignificant, so we have little evidence that daily changes are not independent. (However, since the question does not ask for a formal hypothesis test, other answers are acceptable.)

P.42 If you are served one of the pancakes at random, let A be the event that the side facing you is burned and B be the event that the other side is burned. We want to find P(B if A). As only one of three pancakes is burned on both sides, P(A and B) = 1/3. As 3 out of 6 total sides are burned, P(A) = 3/6 = 1/2. So,
P(B if A) = P(A and B)/P(A) = (1/3)/(1/2) = 2/3
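A short simulation is a nice way to convince a skeptic of the 2/3 answer in P.42. The Python sketch below is one possible implementation (the pancake encoding is our own, not part of the exercise).

import random

# One pancake burned on both sides, one on neither, one on exactly one side.
random.seed(0)
pancakes = [(True, True), (False, False), (True, False)]   # (side up, side down) burned?
shown_burned = other_burned = 0
for _ in range(100_000):
    up, down = random.choice(pancakes)
    if random.random() < 0.5:          # the pancake may land either way up
        up, down = down, up
    if up:                             # condition on the visible side being burned
        shown_burned += 1
        other_burned += down
print(other_burned / shown_burned)     # approaches 2/3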
Section P.2 Solutions
P.43 Here is the tree with the missing probabilities filled in.
Note that the probabilities of all branches arising from a common point must sum to one, so
P (I) + P (II) + P (III) = 1 =⇒ P (I) = 1 − 0.43 − 0.31 = 0.26 P (A if II) + P (B if II) = 1 =⇒ P (A if II) = 1 − 0.24 = 0.76
We obtain the probabilities at the end of each pair of branches with the multiplicative rule, so
P (II and B) = P (II) · P (B if II) = 0.43(0.24) = 0.1032 P (III and A) = P (III) · P (A if III) = 0.31(0.80) = 0.248
P.44 Here is the tree with the missing probabilities filled in.
We use the conditional probability rule to find P (A if I) =
0.09 P (I and A) = = 0.5 P (I) 0.18
We use the complement rule (sum of a branches from one point equals one) to find P (B if I) = 1 − P (A if I) = 1 − 0.5 = 0.5 We use the multiplicative rule to find P (I and B) = P (I) · P (B if I) = 0.18(0.5) = 0.09 From the multiplicative rule for III and A we have P (III and A) = P (III) · P (A if III) =⇒ 0.268 = P (III) · 0.67 =⇒ P (III) = Using the complement rule several more times P (II) P (A if II)
= =
1 − P (I) − P (III)) = 1 − 0.18 − 0.4 = 0.42 1 − P (B if II) = 1 − 0.45 = 0.55
P (B if III) = 1 − P (A if III) = 1 − 0.67 = 0.33 Finally, several more multiplicative rules along pairs of branches give P (II and A) = P (II) · P (A if II) = 0.42(0.55) = 0.231 P (II and B) = P (II) · P (B if II) = 0.42(0.45) = 0.189 P (III and B) = P (III) · P (B if III) = 0.40(0.33) = 0.132
0.268 = 0.4 0.67
P.45 Here is the tree with the missing probabilities filled in.
First, the sum of all the joint probabilities for all pairs of branches must be one so we have P (A and I) = 1 − (0.225 + 0.16 + 0.45 + 0.025 + 0.025) = 0.115 Using the total probability rule we have P (I) =
P (I and A) + P (I and B) + P (I and C) = 0.115 + 0.225 + 0.16 = 0.5
P (II) =
P (II and A) + P (II and B) + P (II and C) = 0.45 + 0.025 + 0.025 = 0.5
The remaining six probabilities all come from the conditional probability rule. For example, P (A if I) =
0.115 P (A and I) = = 0.23 P (I) 0.5
P.46 Here is the tree with the missing probabilities filled in.
CHAPTER P
566 Using the complement rule (sum of all branches from one point must be one), we have P (II)
=
1 − P (I) = 1 − 0.38 = 0.62
P (C if I)
=
1 − P (A if I) − P (B if I) = 1 − 0.00 − 0.19 = 0.81
By the conditional probability rule, we have P (C if II) =
P (II and C) 0.00 = = 0.00 P (II) 0.62
One more complement rule gives P (B if II) = 1 − P (A if II) − P (C if II) = 1 − 0.56 − 0.00 = 0.44 Finally, we apply the multiplicative rule several times to get P (I and A) = P (I and B) =
P (I) · P (A if I) = 0.38(0.00) = 0.00 P (I) · P (B if I) = 0.38(0.19) = 0.0722
P (I and C) P (II and A)
= =
P (I) · P (C if I) = 0.38(0.81) = 0.3078 P (II) · P (A if II) = 0.62(0.56) = 0.3472
P (II and B) =
P (II) · P (B if II) = 0.62(0.44) = 0.2728
P.47 We use the multiplicative rule to see P (B and R) = P (B) · P (R if B) = 0.4 · 0.2 = 0.08. P.48 We use the multiplicative rule to see P (A and S) = P (A) · P (S if A) = 0.6 · 0.1 = 0.06. P.49 This conditional probability is shown directly on the tree diagram. Since we assume A is true, we follow the A branch and then find the probability of R, which we see is 0.9. P.50 This conditional probability is shown directly on the tree diagram. Since we assume B is true, we follow the B branch and then find the probability of S, which we see is 0.8. P.51 We see in the tree diagram that there are two ways for R to occur. We find these two probabilities and add them up, using the total probability rule. We see that P (A and R) = 0.6 · 0.9 = 0.54 so that top branch can be labeled 0.54. We also see that P (B and R) = 0.4 · 0.2 = 0.08. Since either A or B must occur (since these are the only two branches in this part of the tree), these are the only two ways that R can occur, so P (R) = P (A and R) + P (B and R) = 0.54 + 0.08 = 0.62. P.52 We see in the tree diagram that there are two ways for S to occur. We find these two probabilities and add them up, using the total probability rule. We see that P (A and S) = 0.6 · 0.1 = 0.06 so this branch can be labeled 0.06. We also see that P (B and S) = 0.4 · 0.8 = 0.32. Since either A or B must occur (since these are the only two branches in this part of the tree), these are the only two ways that S can occur, so P (S) = P (A and S) + P (B and S) = 0.06 + 0.32 = 0.38. P.53 We know that
P(A if S) = P(A and S)/P(S)
so we need to find P(A and S) and P(S). Using the multiplicative rule, we see that P(A and S) = 0.6 · 0.1 = 0.06. We also see that P(B and S) = 0.4 · 0.8 = 0.32. By the total probability rule, we have
P(A if S) = P(A and S)/P(S) = P(A and S)/(P(A and S) + P(B and S)) = 0.06/(0.06 + 0.32) = 0.158
We see that P (A if S) = 0.158.
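The tree-diagram calculations in P.47–P.53 can also be organized in a few lines of Python, as in the sketch below; it simply applies the multiplicative rule, the total probability rule, and the conditional probability rule to the branch probabilities given above.

# Branch probabilities from the tree used in P.47-P.53
p_first = {"A": 0.6, "B": 0.4}
p_R_given = {"A": 0.9, "B": 0.2}       # and P(S if .) = 1 - P(R if .)

p_R = sum(p_first[x] * p_R_given[x] for x in p_first)   # total probability rule
p_S = 1 - p_R
print(p_R, p_S)                                         # 0.62 and 0.38
print(p_first["A"] * (1 - p_R_given["A"]) / p_S)        # P(A if S) = 0.06/0.38 = 0.158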
P.54 We know that
P(B if R) = P(B and R)/P(R)
so we need to find P(B and R) and P(R). Using the multiplicative rule, we see that P(B and R) = 0.4 · 0.2 = 0.08. We also see that P(A and R) = 0.6 · 0.9 = 0.54. By the total probability rule, we have
P(B if R) = P(B and R)/P(R) = P(B and R)/(P(A and R) + P(B and R)) = 0.08/(0.54 + 0.08) = 0.129
We see that P (B if R) = 0.129. P.55 We first create the tree diagram using the information given, and use the multiplication rule to fill in the probabilities at the ends of the branches. For example, for the top branch, the probability of having 1 occupant in an owner-occupied housing unit is 0.65 · 0.217 = 0.141.
(a) We see at the end of the branch with rented and 2 occupants that the probability is 0.091. (b) There are two branches that include having 3 or more occupants and we use the addition rule to see that the probability of 3 or more occupants is 0.273 + 0.132 = 0.405. (c) This is a conditional probability (or Bayes’ rule). We have: P (rent if 1) =
P (rent and 1 person) 0.127 0.127 = = = 0.474 P (1 person) 0.141 + 0.127 0.268
If a housing unit has only 1 occupant, the probability that it is rented is 0.474. P.56 We use F to denote the event of having fibromyalgia and R to denote the event of having restless leg syndrome. The tree diagram is shown, and we use the multiplication rule to find the probabilities at the end of the branches. For example, for the top branch, we have P (F and R) = 0.02 · 0.33 = 0.0066.
We are finding P(F if R). By adding the probabilities of the R branches, we see that P(R) = 0.0066 + 0.0294 = 0.036. The conditional probability is
P(F if R) = P(F and R)/P(R) = 0.0066/0.036 = 0.183
If a person has restless leg syndrome, the probability that the person will also have fibromyalgia is about 18%.

P.57 We are given P(Positive if no Cancer) = 86.6/1000 = 0.0866, P(Positive if Cancer) = 1 − 1.1/1000 = 0.9989, and P(Cancer) = 1/38 = 0.0263. Applying Bayes' rule we have

P(Cancer if Positive) = [P(Cancer) · P(Positive if Cancer)] / [P(no Cancer) · P(Positive if no Cancer) + P(Cancer) · P(Positive if Cancer)]
                      = (0.0263 · 0.9989) / ((1 − 0.0263) · 0.0866 + 0.0263 · 0.9989)
                      = 0.2375
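Here is a short Python check of this Bayes' rule computation, using only the three probabilities given in the exercise.

# Bayes' rule for P.57
p_cancer = 1 / 38
p_pos_given_cancer = 1 - 1.1 / 1000
p_pos_given_no_cancer = 86.6 / 1000

p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_no_cancer
print(p_cancer * p_pos_given_cancer / p_pos)    # about 0.2375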
P.58 Let F , C, and S denote the three types of pitches (fastball, curveball, and spitball) and K denote the pitch is a strike. We are given the probability for each type of pitch P (F ) = 0.60
P (C) = 0.25
P (S) = 1 − P (F ) − P (C) = 1 − 0.60 − 0.25 = 0.15
and the conditional probability for a strike with each pitch
P (K if F ) = 0.70
P (K if C) = 0.50
P (K if S) = 0.30
We need to find the probability of a certain type of pitch (curveball) given that we know the outcome (strike). This calls for an application of Bayes’ rule: P (C if K)
P (C)P (K if C) P (F )P (K if F ) + P (C)P (K if C) + P (S)P (K if S) 0.25 · 0.50 = 0.60 · 0.70 + 0.25 · 0.50 + 0.15 · 0.30 0.125 = 0.59 = 0.212
=
If we see Slippery throw a strike, there’s a 21.2% chance it was with a curveball. P.59
(a) Using the formula for conditional probability, P (Free if Spam) =
0.0357 P (Free and Spam) = = 0.266 P (Spam) 0.134
(b) Using the formula for conditional probability, P (Spam if Free) =
P (Free and Spam) 0.0357 = = 0.752 P (Free) 0.0475
P.60 Using the Bayes’ rule, P (Spam if Text) =
P (Spam)P (Text if Spam) 0.134 · 0.3855 = = 0.737 P (Text) 0.0701
P.61 Using Bayes’ rule, P (Spam if Text and Free)
= = =
P (Spam)P (Text and Free if Spam) P (Spam)P (Text and Free if Spam) + P (not Spam)P (Text and Free if not Spam) 0.134 · 0.17 0.134 · 0.17 + 0.866 · 0.0006 0.978
P.62 Applying the total probability rule, P (Free and Text)
=
P (Spam)P (Text and Free if Spam) + P (not Spam)P (Text and Free if not Spam)
= 0.134 · 0.17 + 0.866 · 0.0006 = 0.0233 Again applying the total probability rule, P (Free and not Text) = P (Free) − P (Free and Text) = 0.0475 − 0.0233 = 0.0242 and P (Free and not Text if Spam) = P (Free if Spam) − P (Free and Text if Spam) = 0.2664 − 0.1700 = 0.0964 So, applying Bayes’ rule, P (Spam if Free and not Text) =
0.134 · 0.0964 P (Spam)P (Free and not Text if Spam) = = 0.534 P (Free and not Text) 0.0242
Section P.3 Solutions

P.63 A discrete random variable, as it can only take the values {0, 1, 2, . . . , 10}.

P.64 A discrete random variable, as it can only take the values {0, 0.1, 0.2, . . . , 1}.
P.65 A discrete random variable, as an ace must appear somewhere between the 1st and 48th card dealt. P.66 Not a random variable, as it does not have a numerical outcome. P.67 A continuous random variable, as it can take any value greater than or equal to 0 lbs. P.68 There are two conditions for a probability function. The first is that all the probabilities are between 0 and 1 and that is true here. The second is that the probabilities add up to 1.0, and that is true here because 0.4 + 0.3 + 0.2 + 0.1 = 1.0. P.69 We see that P (X = 3) = 0.2 and P (X = 4) = 0.1 so P (X = 3 or X = 4) = 0.2 + 0.1 = 0.3. P.70 We have P (X > 1) = P (X = 2 or X = 3 or X = 4) = 0.3 + 0.2 + 0.1 = 0.6. P.71 We have P (X < 3) = P (X = 1 or X = 2) = 0.4 + 0.3 = 0.7. P.72 We have P (X is an odd number) = P (X = 1 or X = 3) = 0.4 + 0.2 = 0.6. P.73 We have P (X is an even number) = P (X = 2 or X = 4) = 0.3 + 0.1 = 0.4. P.74 The probability values have to add up to 1.0 so we have ? = 1.0 − (0.1 + 0.1 + 0.2) = 0.6. P.75 The probability values have to add up to 1.0 so we have ? = 1.0 − (0.2 + 0.2 + 0.2) = 0.4. P.76 The probability values have to add up to 1.0 but the sum of the two values there is already greater than 1 (we see 0.5+0.6 = 1.1). Since a probability cannot be a negative number, this cannot be a probability function. P.77 The probability values have to add up to 1.0 but the sum of the values there is already greater than 1 (we see 0.3 + 0.3 + 0.3 + 0.3 = 1.2). Since a probability cannot be a negative number, this cannot be a probability function. P.78
(a) We multiply the values of the random variable (in this case, 1, 2, and 3) by the corresponding probability and add up the results. We have μ = 1(0.2) + 2(0.3) + 3(0.5) = 2.3 The mean of this random variable is 2.3.
(b) To find the standard deviation, we subtract the mean of 2.3 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation.
σ² = (1 − 2.3)² · 0.2 + (2 − 2.3)² · 0.3 + (3 − 2.3)² · 0.5 = 0.61
σ = √0.61 = 0.781
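The mean and standard deviation formulas for a discrete random variable translate directly into code. The Python sketch below reproduces the P.78 values and can be reused for the similar exercises that follow by changing the two lists.

from math import sqrt

# Mean and standard deviation of the discrete distribution in P.78
values = [1, 2, 3]
probs  = [0.2, 0.3, 0.5]

mu = sum(x * p for x, p in zip(values, probs))
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
print(mu, var, sqrt(var))    # 2.3, 0.61, 0.781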
P.79
(a) We multiply the values of the random variable (in this case, 10, 20, and 30) by the corresponding probability and add up the results. We have μ = 10(0.7) + 20(0.2) + 30(0.1) = 14 The mean of this random variable is 14.
(b) To find the standard deviation, we subtract the mean of 14 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation. σ2 =⇒ σ P.80
= (10 − 14)2 · 0.7 + (20 − 14)2 · 0.2 + (30 − 14)2 · 0.1 = 44 √ = 44 = 6.63
(a) We multiply the values of the random variable (in this case, 20, 30, 40, and 50) by the corresponding probability and add up the results. We have μ = 20(0.6) + 30(0.2) + 40(0.1) + 50(0.1) = 27 The mean of this random variable is 27.
(b) To find the standard deviation, we subtract the mean of 27 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation.
P.81
σ2
=
=⇒ σ
=
(20 − 27)2 · 0.6 + (30 − 27)2 · 0.2 + (40 − 27)2 · 0.1 + (50 − 27)2 · 0.1 = 101 √ σ = 101 = 10.05
(a) We multiply the values of the random variable (in this case, 10, 12, 14, and 16) by the corresponding probability and add up the results. We have μ = 10(0.25) + 12(0.25) + 14(0.25) + 16(0.25) = 13 The mean of this random variable is 13, which makes sense since the probability distribution is symmetric and 13 is right in the middle.
(b) To find the standard deviation, we subtract the mean of 13 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation.
P.82
σ2
=
=⇒ σ
=
(10 − 13)2 · 0.25 + (12 − 13)2 · 0.25 + (14 − 13)2 · 0.25 + (16 − 13)2 · 0.25 = 5 √ 5 = 2.236
(a) We see that 0.217 + 0.363 + 0.165 + 0.145 + 0.067 + 0.026 + 0.018 = 1.001. This is different from 1 just by round-off error on the individual probabilities.
(b) We have p(1) + p(2) = 0.217 + 0.363 = 0.580. (c) We have p(5) + p(6) + p(7) = 0.067 + 0.026 + 0.018 = 0.111. (d) It is easiest to find this probability using the complement rule, since more than 1 occupant is the complement of 1 occupant for this random variable. The answer is 1 − p(1) = 1 − 0.217 = 0.783. P.83
(a) We see that 0.362 + 0.261 + 0.153 + 0.114 + 0.061 + 0.027 + 0.022 = 1, as expected.
CHAPTER P
572 (b) We have p(1) + p(2) = 0.362 + 0.261 = 0.623. (c) We have p(5) + p(6) + p(7) = 0.061 + 0.027 + 0.022 = 0.110.
(d) It is easiest to find this probability using the complement rule, since more than 1 occupant is the complement of 1 occupant for this random variable. The answer is 1 − p(1) = 1 − 0.362 = 0.638. P.84
(a) We multiply the values of the random variable by the corresponding probability and add up the results. We have μ = 1(0.217) + 2(0.363) + 3(0.165) + 4(0.145) + 5(0.067) + 6(0.026) + 7(0.018) = 2.635 The average household size for an owner-occupied housing unit in the US is 2.635 people.
(b) To find the standard deviation, we subtract the mean of 2.635 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation. σ2
= = =
=⇒ σ P.85
(1 − 2.635)2 · 0.217 + (2 − 2.635)2 · 0.363 + . . . + (7 − 2.635)2 · 0.018 2.03072 √ 2.03072 = 1.425
(a) We multiply the values of the random variable by the corresponding probability and add up the results. We have μ = 1(0.362) + 2(0.261) + 3(0.153) + 4(0.114) + 5(0.061) + 6(0.027) + 7(0.022) = 2.42 The average household size for a renter-occupied housing unit in the US is 2.42 people.
(b) To find the standard deviation, we subtract the mean of 2.42 from each value, square the difference, multiply by the probability, and add up the results to find the variance; then take a square root to find the standard deviation. σ2
= =
=⇒ σ
=
(1 − 2.42)2 · 0.362 + (2 − 2.42)2 · 0.261 + . . . + (7 − 2.42)2 · 0.022 2.3256 √ 2.3256 = 1.525
P.86 Let the random variable X measure fruit fly lifetimes (in months). (a) The probabilities must add to 1, so the proportion of dying in the second month is P (X = 2) = 1 − (0.30 + 0.20 + 0.15 + 0.10 + 0.05) = 1 − 0.80 = 0.20 (b) P (X > 4) = P (X = 5) + P (X = 6) = 0.10 + 0.05 = 0.15 (c) The mean fruit fly lifetime is μ = 1(0.30) + 2(0.20) + 3(0.20) + 4(0.15) + 5(0.10) + 6(0.05) = 2.7 months (d) The standard deviation of fruit fly lifetimes is √ σ = (1 − 2.7)2 · 0.30 + (2 − 2.7)2 · 0.20 + . . . + (6 − 2.7)2 · 0.05 = 2.31 = 1.52 months
CHAPTER P P.87
573
(a) As the probabilities must sum to 1, so P (X = 4) = 1 − (0.29 + 0.3 + 0.2 + 0.17) = 1 − 0.96 = 0.04
(b) P (X < 2) = P (X = 0) + P (X = 1) = 0.29 + 0.3 = 0.59 (c) To find the mean we use μ = 0 · 0.29 + 1 · 0.3 + 2 · 0.2 + 3 · 0.17 + 4 · 0.04 = 1.37 cars (d) To find the standard deviation we use
P.88
σ2
= =
=⇒ σ
=
(0 − 1.37)2 0.29 + (1 − 1.37)2 0.3 + (2 − 1.37)2 0.2 + (3 − 1.37)2 · 0.17 + (4 − 1.37)2 0.04 1.393 √ 1.393 = 1.180 cars
(a) This is a conditional probability, P (A if B), where the event A is X = 1 and the conditioning event B is {X = 1 or X = 2}. From the probability function we see that P (A) = 0.30 and P (B) = 0.30 + 0.20 = 0.50. Note also that P (A and B) = P (A) since X = 1 is the only outcome they have in common. We have P (A if B) =
P (A) 0.30 P (A and B) = = = 0.6 P (B) P (B) 0.50
(b) This is a conditional probability, P (C if D), where the event C is {X = 5 or X = 6} and the conditioning event D is X ≥ 3. From the probability function we see that P (C) = 0.10 + 0.05 = 0.15 and P (D) = 0.20 + 0.15 + 0.10 + 0.05 = 0.50. Note also that P (C and D) = P (C) since X = 5 and X = 6 are the only outcomes they have in common. We have P (C if D) =
P (C) 0.15 P (C and D) = = = 0.3 P (D) P (D) 0.50
A fruit fly that makes it safely through its first two months will have a 30% chance of living to five or six months. P.89 Let X1 , X2 , and X3 represent the sales on each of three consecutive days. Since daily sales are independent, we can multiple their probabilities. So, the probability that the no cars are sold in three consecutive days is P (X1 = 0 and X2 = 0 and X3 = 0)
P.90
=
P (X1 = 0) · P (X2 = 0) · P (X3 = 0)
= =
0.29 · 0.29 · 0.29 0.293
=
0.0244
(a) There are three possible values for the random variable, {0, 1, 2}. Let S denote he successfully makes a shot and F that he fails to make it. P (X = 2) P (X = 0) P (X = 1)
= = =
P (S1 and S2 ) = P (S1 ) · P (S2 ) = 0.908 · 0.908 = 0.8245 P (F1 and F2 ) = P (F1 ) · P (F2 ) = 0.092 · 0.092 = 0.0085 P (F1 and S2 ) + P (S1 and F2 ) = 0.092 · 0.908 + 0.908 · 0.092 = 0.1671
So, the probability distribution of X is
CHAPTER P
574 x
0
1
2
p(x)
0.0085
0.1671
0.8245
(b) The mean number of free throws made in two attempts is μ = 0(0.0085) + 1(0.1671) + 2(0.8245) = 1.816 P.91
(a) If the woman dies during the first year the organization loses $100,000 − c, if the woman dies during the second year the organization loses $100,000 − 2c, and so on. If the woman does not die during the five-year contract the organization earns 5c dollars, and the probability of this is 1 − (0.00648 + 0.00700 + 0.00760 + 0.00829 + 0.00908) = 0.96155. The probability distribution of the profit (as a function of the yearly fee c) is given below.

x      c − $100,000   2c − $100,000   3c − $100,000   4c − $100,000   5c − $100,000   5c
p(x)   0.00648        0.00700         0.00760         0.00829         0.00908         0.96155

(b) In terms of c,
μ = (c − $100,000) · 0.00648 + (2c − $100,000) · 0.00700 + . . . + (5c − $100,000) · 0.00908 + 5c · 0.96155
  = 4.9296c − $3845
(c) Setting μ = 0 and solving, we get c = 3845/4.9296 = 779.98, so the organization would have to charge approximately $779.98 per year.
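The expected-value bookkeeping in this exercise is easy to mechanize. The Python sketch below rebuilds μ as a function of c from the probabilities above and confirms the break-even fee.

# Expected profit for the insurance contract in P.91, as a function of the yearly fee c
death_probs = [0.00648, 0.00700, 0.00760, 0.00829, 0.00908]
p_survive = 1 - sum(death_probs)

def expected_profit(c):
    profit = sum((k * c - 100_000) * p for k, p in enumerate(death_probs, start=1))
    return profit + 5 * c * p_survive

coef_c = sum(k * p for k, p in enumerate(death_probs, start=1)) + 5 * p_survive
print(coef_c)                     # about 4.9296
print(3845 / coef_c)              # about 779.98
print(expected_profit(779.98))    # essentially 0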
P.92
(a) We know that P (X = $29.95) = 2 · P (X = $39.95) and P (X = $23.95) = 3 · P (X = $39.95). It follows that 1 = P (X = $23.95) + P (X = $29.95) + P (X = $39.95) = 6 · P (X = $39.95), so P (X = $39.95) = 16 . The probability distribution of X is x p(x)
$23.95 3/6
$29.95 2/6
$39.95 1/6
(b) The mean μ of X is μ = $23.95 ·
3 2 1 + $29.95 · + $39.95 · = $28.62 6 6 6
(c) The variance σ 2 of X is 3 2 1 + (29.95 − 28.62)2 · + (39.95 − 28.62)2 · = 32.89 6 6 6 √ So, the standard deviation is σ = 32.89 = $5.73 σ 2 = (23.95 − 28.62)2 ·
P.93
(a) We find the probability an street address starts with “1” as P (X = 1) = log10 (1 + 1/1) = log10 (2) = 0.301 For an address starting with the digit “9” we have P (X = 1) = log10 (1 + 1/9) = log10 (1.111 . . .) = 0.046
(b) To find P (X > 2) we use the complement rule to find P (X > 2)
P.94
= =
1 − [P (X = 1) + P (X = 2)] 1 − [log10 (1 + 1/1) + log10 (1 + 1/2)]
= = =
1 − [log10 (2) + log10 (1.5)] 1 − [0.301 + 0.176] 1 − 0.477 = 0.523
(a) Using the formula for the probability function with p = 1/6 and k = 3 we have
P(X = 3) = (1 − 1/6)^(3−1) · (1/6) = (5/6)² · (1/6) = 0.116

(b) The event "more than three turns to finish," or X > 3, includes X = 4, 5, 6, . . ., an infinite number of possible outcomes! Fortunately we can use the complement rule.
P(X > 3) = 1 − (p(1) + p(2) + p(3))
         = 1 − [(5/6)⁰(1/6) + (5/6)¹(1/6) + (5/6)²(1/6)]
         = 1 − [0.1667 + 0.1389 + 0.1157]
         = 1 − 0.4213 = 0.5787
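If software is available, these probabilities can be checked with scipy, assuming (as the computation above indicates) that X counts the number of turns until the first success with p = 1/6.

from scipy.stats import geom

# X = number of turns needed for the first success, with p = 1/6
p = 1 / 6
print(geom.pmf(3, p))   # P(X = 3) = (5/6)^2 * (1/6) = 0.116
print(geom.sf(3, p))    # P(X > 3) = (5/6)^3 = 0.5787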
576 Section P.4 Solutions P.95 This is a binomial random variable, with n = 10 and p = 1/6. P.96 Not binomial, since the number of rolls is not fixed. P.97 Not binomial, since it is not clear what is counted as a success. P.98 This is a binomial random variable, with n = 75 and p = 0.3. P.99 This is a binomial random variable, with n = 100 and p = 0.51. P.100 We see that 4! = 4 · 3 · 2 · 1 = 24. P.101 We see that 7! = 7 · 6 · 5 · 4 · 3 · 2 · 1 = 5040. P.102 We see that 8! = 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 40,320. P.103 We see that 6! = 6 · 5 · 4 · 3 · 2 · 1 = 720. 8 8·7·6·5·4·3·2·1 336 8! = = = 56. P.104 We have = 3 3!(5!) 3·2·1·5·4·3·2·1 6 5 5! 5·4·3·2·1 20 P.105 We have = = = = 10. 2 2!(3!) 2·1·3·2·1 2 10 10! 10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 90 P.106 We have = = = = 45. 8 8!(2!) 8·7·6·5·4·3·2·1·2·1 2 6 6! 6·5·4·3·2·1 6 P.107 We have = = = = 6. 5 5!(1!) 5·4·3·2·1·1 1 6 6! P.108 We first calculate that = = 15. We then find 2 2!(4!) 6 P (X = 2) = (0.32 )(0.74 ) = 15(0.32 )(0.74 ) = 0.324 2 8 8! = 8. We then find P.109 We first calculate that = 7 7!(1!) 8 P (X = 7) = (0.97 )(0.11 ) = 8(0.97 )(0.11 ) = 0.383 7 10 10! = 120. We then find P.110 We first calculate that = 3 3!(7!) 10 P (X = 3) = (0.43 )(0.67 ) = 120(0.43 )(0.67 ) = 0.215 3
CHAPTER P
CHAPTER P
577
12 12! = 495. We then find P.111 We first calculate that = 8 8!(4!) P (X = 8) =
12 (0.758 )(0.254 ) = 495(0.758 )(0.254 ) = 0.194 8
P.112 The mean is μ = np = 6(0.4) = 2.4 and the standard deviation is σ=
np(1 − p) =
6(0.4)(0.6) =
√
1.44 = 1.2
P.113 The mean is μ = np = 10(0.8) = 8 and the standard deviation is σ=
np(1 − p) =
10(0.8)(0.2) =
√
1.6 = 1.265
P.114 The mean is μ = np = 30(0.5) = 15 and the standard deviation is σ=
np(1 − p) =
30(0.5)(0.5) =
√
7.5 = 2.74
P.115 The mean is μ = np = 800(0.25) = 200 and the standard deviation is σ=
np(1 − p) =
800(0.25)(0.75) =
√
150 = 12.25
P.116 A probability function gives the probability for each possible value of the random variable. This is a binomial random variable with n = 3 and p = 0.49 (since we are counting the number of girls, not boys). The probability of 0 girls is:
P(X = 0) = C(3,0)(0.49⁰)(0.51³) = 1 · 1 · 0.51³ = 0.133
The probability of 1 girl is:
P(X = 1) = C(3,1)(0.49¹)(0.51²) = 3 · (0.49¹)(0.51²) = 0.382
The probability of 2 girls is:
P(X = 2) = C(3,2)(0.49²)(0.51¹) = 3 · (0.49²)(0.51¹) = 0.367
The probability of 3 girls is:
P(X = 3) = C(3,3)(0.49³)(0.51⁰) = 1 · (0.49³) · 1 = 0.118
We can summarize these results with a table for the probability function.

x      0      1      2      3
p(x)   0.133  0.382  0.367  0.118
Notice that the four probabilities add up to 1, as we expect for a probability function.
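The same probability function can be produced with scipy's binomial distribution, as in the short Python sketch below.

from scipy.stats import binom

# The probability function in P.116: X = number of girls among 3 children, p = 0.49
n, p = 3, 0.49
for k in range(n + 1):
    print(k, round(binom.pmf(k, n, p), 3))              # 0.133, 0.382, 0.367, 0.118
print(sum(binom.pmf(k, n, p) for k in range(n + 1)))    # the probabilities add to 1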
P.117 A probability function gives the probability for each possible value of the random variable. There are five possible values for this random variable: 0 seniors, 1 senior, 2 seniors, 3 seniors, or all 4 seniors. This is a binomial random variable with n = 4 and p = 0.25. The probability that none of the students are seniors is: 4 P (X = 0) = (0.250 )(0.754 ) = 1 · 1 · 0.754 = 0.316 0 The probability of 1 senior is: P (X = 1) =
4 (0.251 )(0.753 ) = 4 · (0.251 )(0.753 ) = 0.422 1
The probability of 2 seniors is: 4 P (X = 2) = (0.252 )(0.752 ) = 6 · (0.252 )(0.752 ) = 0.211 2 The probability of 3 seniors is: 4 P (X = 3) = (0.253 )(0.751 ) = 4 · (0.253 )(0.751 ) = 0.047 3 The probability that all 4 students are seniors is: 4 P (X = 4) = (0.254 )(0.750 ) = 1 · (0.254 ) · 1 = 0.004 4 We can summarize these results with a table for the probability function. x p(x)
0 0.316
1 0.422
2 0.211
3 0.047
4 0.004
Notice that the five probabilities add up to 1, as we expect for a probability function. P.118 If X is the random variable giving the number of college graduates in a random sample of 12 US adults, then X is a binomial random variable with n = 12 and p = 0.275. We are finding P (X = 6). We first calculate 12 12! = 924 = 6 6!(6!)
We then find P (X = 6) =
12 (0.2756 )(0.7256 ) = 924(0.2756 )(0.7256 ) = 0.058 6
The probability is only 0.058 that exactly half of the sample are college graduates. P.119 If X is the random variable giving the number of senior citizens (aged 65 or older) in a random sample of 10 people in the US, then X is a binomial random variable with n = 10 and p = 0.13. We are finding P (X = 3) and P (X = 4). To find P (X = 3), we first calculate 10 10! = 120 = 3!(7!) 3
We then find

P(X = 3) = \binom{10}{3}(0.13^3)(0.87^7) = 120(0.13^3)(0.87^7) = 0.099

To find P(X = 4), we first calculate

\binom{10}{4} = 10!/(4! 6!) = 210

We then find

P(X = 4) = \binom{10}{4}(0.13^4)(0.87^6) = 210(0.13^4)(0.87^6) = 0.026

The probability is 0.099 that 3 of the 10 people are senior citizens and is 0.026 that 4 of them are senior citizens.

P.120 If X is the random variable giving the number of owner-occupied units in a random sample of 20 housing units in the US, then X is a binomial random variable with n = 20 and p = 0.65.

(a) To find P(X = 15), we first calculate \binom{20}{15} = 20!/(15! 5!) = 15,504. We then find

P(X = 15) = \binom{20}{15}(0.65^{15})(0.35^5) = 15,504(0.65^{15})(0.35^5) = 0.1272

(b) We know that P(X ≥ 18) = P(X = 18) + P(X = 19) + P(X = 20), so we calculate each of the terms separately and add them up. We have

P(X = 18) = \binom{20}{18}(0.65^{18})(0.35^2) = 190(0.65^{18})(0.35^2) = 0.0100
P(X = 19) = \binom{20}{19}(0.65^{19})(0.35^1) = 20(0.65^{19})(0.35^1) = 0.0020
P(X = 20) = \binom{20}{20}(0.65^{20})(0.35^0) = 1 · (0.65^{20}) · 1 = 0.0002

Then we have

P(X ≥ 18) = P(X = 18) + P(X = 19) + P(X = 20) = 0.0100 + 0.0020 + 0.0002 = 0.0122

P.121 The mean is μ = np = 3(0.49) = 1.47 girls and the standard deviation is

σ = \sqrt{np(1 − p)} = \sqrt{3(0.49)(0.51)} = \sqrt{0.7497} = 0.866

P.122 The mean is μ = np = 4(0.25) = 1 senior (1 senior among the 4 students makes sense since about 25% of the overall student body are seniors). The standard deviation is

σ = \sqrt{np(1 − p)} = \sqrt{4(0.25)(0.75)} = \sqrt{0.75} = 0.866.
P.123 This is a binomial random variable with n = 12 and p = 0.275. The mean is μ = np = 12(0.275) = 3.3 college graduates in a sample of 12. The standard deviation is

σ = \sqrt{np(1 − p)} = \sqrt{12(0.275)(0.725)} = \sqrt{2.3925} = 1.55.
P.124 This is a binomial random variable with n = 10 and p = 0.13. The mean is μ = np = 10(0.13) = 1.3 senior citizens in a sample of 10. The standard deviation is

σ = \sqrt{np(1 − p)} = \sqrt{10(0.13)(0.87)} = \sqrt{1.131} = 1.063.
P.125 This is a binomial random variable with n = 20 and p = 0.65. The mean is μ = np = 20(0.65) = 13.0 owner-occupied units in a sample of 20. The standard deviation is

σ = \sqrt{np(1 − p)} = \sqrt{20(0.65)(0.35)} = \sqrt{4.55} = 2.13.
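The formulas μ = np and σ = \sqrt{np(1 − p)} used in P.121 through P.125 are easy to check numerically. A minimal sketch, assuming scipy is available (binom.std is just a convenience; the hand formula gives the same answer):

```python
from math import sqrt
from scipy.stats import binom

# (n, p) pairs from exercises P.121 through P.125
for n, p in [(3, 0.49), (4, 0.25), (12, 0.275), (10, 0.13), (20, 0.65)]:
    mu = n * p                      # mean of a binomial random variable
    sigma = sqrt(n * p * (1 - p))   # standard deviation
    print(n, p, round(mu, 2), round(sigma, 3), round(binom.std(n, p), 3))
```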
P.126 (a) Let X be the number of free throws Stephen Curry makes during the game. Then X has a binomial distribution with n = 8 and p = 0.908. So,

P(X ≥ 7) = P(X = 7) + P(X = 8)
         = \binom{8}{7}(0.908^7)(0.092^1) + \binom{8}{8}(0.908^8)(0.092^0)
         = 0.3745 + 0.4620
         = 0.8365

(b) Let X be the number of free throws Stephen Curry makes during the season. Then X has a binomial distribution with n = 80 and p = 0.908. So,

P(X ≥ 70) = P(X = 70) + P(X = 71) + ... + P(X = 80)
          = \binom{80}{70}(0.908^{70})(0.092^{10}) + \binom{80}{71}(0.908^{71})(0.092^9) + ... + \binom{80}{80}(0.908^{80})(0.092^0)
          = 0.0833 + 0.1157 + 0.1428 + ... + 0.0004
          = 0.8845

(c) The mean for n = 8 free throws in a game is

μ = 8 · 0.908 = 7.26

The standard deviation for 8 free throws in a game is

σ = \sqrt{8 · 0.908 · 0.092} = 0.817

(d) The mean for 80 free throws in the playoffs is

μ = 80 · 0.908 = 72.64

The standard deviation for 80 free throws in the playoffs is

σ = \sqrt{80 · 0.908 · 0.092} = 2.59

P.127 Let X measure the number of passengers (out of 32) who show up for a flight. Since each passenger has a 90% chance of showing up, X is a binomial random variable with n = 32 and p = 0.90.
(a) The mean number of passengers on each flight is μ = np = 32(0.9) = 28.8 people.

(b) Everyone gets a seat when X ≤ 30. To find this probability we use the complement rule (find the chance too many people show up, with X = 31 or X = 32, then subtract from one).

P(X ≤ 30) = 1 − [P(X = 31) + P(X = 32)]
          = 1 − [\binom{32}{31}(0.9^{31})(0.1^1) + \binom{32}{32}(0.9^{32})(0.1^0)]
          = 1 − [32 · 0.9^{31} · (0.1) + 1 · 0.9^{32} · 1]
          = 1 − [0.122 + 0.034]
          = 1 − 0.156
          = 0.844
Everyone will have a seat on about 84.4% of the flights. The airline will need to deal with overbooked passengers on the other 15.6% of the flights.

P.128 Let X be a binomial random variable with n trials and probability of success p, and let Y = X/n be the corresponding proportion, so that P(X = k) = P(Y = k/n) for all k. Let μ be the mean of Y. Since the mean of X is np = Σ k · P(X = k), where all sums are over k = 0, 1, ..., n, we have

np = Σ k · P(X = k)
   = Σ n · (k/n) · P(X = k)
   = n Σ (k/n) · P(Y = k/n)
   = nμ

It follows that the mean of the sample proportion Y is μ = p.

Let σ be the standard deviation of Y. Since the variance of X is np(1 − p) = Σ (k − np)² P(X = k), we have

np(1 − p) = Σ (k − np)² P(X = k)
          = Σ n² (k/n − p)² P(X = k)
          = n² Σ (k/n − p)² P(Y = k/n)
          = n² σ²

It follows that the variance of the sample proportion Y is σ² = p(1 − p)/n, and therefore σ = \sqrt{p(1 − p)/n}.
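The algebraic result above can also be sanity-checked numerically for one choice of n and p. The sketch below is a hypothetical check, assuming scipy: it sums (k/n) P(X = k) and (k/n − p)² P(X = k) over all k and compares the results to p and p(1 − p)/n.

```python
from math import sqrt, isclose
from scipy.stats import binom

n, p = 32, 0.90   # any binomial settings work; these match exercise P.127

mean_Y = sum((k / n) * binom.pmf(k, n, p) for k in range(n + 1))
var_Y  = sum((k / n - p) ** 2 * binom.pmf(k, n, p) for k in range(n + 1))

print(mean_Y)                 # equals p = 0.90
print(sqrt(var_Y))            # equals sqrt(p(1-p)/n) = 0.053
print(isclose(mean_Y, p), isclose(var_Y, p * (1 - p) / n))
```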
Section P.5 Solutions

P.129 The area for values below 25 is clearly more than half the total area, but not as much as 95%, so 62% is the best estimate.
P.130 The area for values above 30 is quite small relative to the total area, so 4% is the best estimate.
P.131 Almost all of the area under the density curve is between 10 and 30, so 95% is the best estimate.
P.132 The area below any density curve must be equal to one. The area below curve (c) is clearly larger than either (a) or (b), which have similar areas to each other. Therefore, (a) and (b) are the valid densities and (c) is not.
P.133 The plots below show the three required regions as areas in a N(0, 1) distribution. We see that the areas are (a) 0.8508 (b) 0.9332 (c) 0.1359

[Figure: three N(0, 1) density plots shading (a) the area below 1.04, (b) the area above −1.5, and (c) the area between 1 and 2.]
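If the technology used is Python, a minimal sketch with scipy.stats.norm (an assumption; any normal-probability calculator returns the same areas) reproduces the three answers:

```python
from scipy.stats import norm

print(norm.cdf(1.04))              # (a) area below 1.04 on N(0,1)   -> 0.8508
print(norm.sf(-1.5))               # (b) area above -1.5 on N(0,1)   -> 0.9332
print(norm.cdf(2) - norm.cdf(1))   # (c) area between 1 and 2         -> 0.1359
```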
P.134 The plots below show the three required regions as areas in a N(0, 1) distribution. We see that the areas are (a) 0.788 (b) 0.115 (c) 0.0656

[Figure: three N(0, 1) density plots shading (a) the area below 0.8, (b) the area above 1.2, and (c) the area between −1.75 and −1.25.]
P.135 The plots below show the three required regions as areas in a N(0, 1) distribution. We see that the areas are (a) 0.982 (b) 0.309 (c) 0.625

[Figure: three N(0, 1) density plots shading (a) the area above −2.10, (b) the area below −0.5, and (c) the area between −1.5 and 0.5.]
P.136 The plots below show the three required regions as areas in a N(0, 1) distribution. We see that the areas are (a) 0.0885 (b) 0.212 (c) 0.664

[Figure: three N(0, 1) density plots shading (a) the area above 1.35, (b) the area below −0.8, and (c) the area between −1.23 and 0.75.]
P.137 The plots below show the required endpoint(s) for a N(0, 1) distribution. We see that the endpoint z is (a) −1.282 (b) −0.8416 (c) ±1.960

[Figure: three N(0, 1) density plots marking (a) the point with 10% below it at z = −1.282, (b) the point with 80% above it at z = −0.8416, and (c) the points with 95% between them at z = ±1.960.]
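Endpoint questions run in the other direction: given an area, find z. A sketch using the inverse cdf (percent point function), again assuming scipy:

```python
from scipy.stats import norm

print(norm.ppf(0.10))    # (a) 10% below z                                  -> -1.282
print(norm.ppf(0.20))    # (b) 80% above z means 20% below z                -> -0.8416
print(norm.ppf(0.975))   # (c) 95% in the middle leaves 2.5% in each tail   ->  1.960
```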
P.138 The plots below show the required endpoint(s) for a N(0, 1) distribution. We see that the endpoint z is (a) 0.524 (b) 2.33 (c) ±1.64

[Figure: three N(0, 1) density plots marking (a) the point with 70% below it at z = 0.524, (b) the point with 1% above it at z = 2.33, and (c) the points with 90% between them at z = ±1.64.]
P.139 The plots below show the required endpoint(s) for a N(0, 1) distribution. We see that the endpoint z is (a) −1.28 (b) 0.385

[Figure: two N(0, 1) density plots marking (a) the point with 90% above it at z = −1.28 and (b) the point with 65% below it at z = 0.385.]
P.140 The plots below show the required endpoint(s) for a N(0, 1) distribution. We see that the endpoint z is (a) 2.05 (b) −0.25

[Figure: two N(0, 1) density plots marking (a) the point with 2% above it at z = 2.05 and (b) the point with 40% below it at z = −0.253.]
P.141 The plots below show the three required regions as areas on the appropriate normal curve. Using technology, we can find the areas directly, and we see that the areas are as follows. (If you are using a paper table, you will need to first convert the values to the standard normal using z-scores.) For additional help, see the online supplements. (a) 0.691 (b) 0.202 (c) 0.643

[Figure: density plots shading (a) the area below 80 on N(75, 10), (b) the area above 25 on N(20, 6), and (c) the area between 11 and 14 on N(12.2, 1.6).]
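"Using technology" here can mean passing the mean and standard deviation directly, with no z-score step. A sketch assuming scipy (loc is the mean, scale is the standard deviation):

```python
from scipy.stats import norm

print(norm.cdf(80, loc=75, scale=10))                     # (a) below 80 on N(75,10)          -> 0.691
print(norm.sf(25, loc=20, scale=6))                       # (b) above 25 on N(20,6)           -> 0.202
print(norm.cdf(14, 12.2, 1.6) - norm.cdf(11, 12.2, 1.6))  # (c) between 11 and 14 on N(12.2,1.6) -> 0.643
```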
P.142 The plots below show the three required regions as areas on the appropriate normal curve. Using technology, we can find the areas directly, and we see that the areas are as follows. (If you are using a paper table, you will need to first convert the values to the standard normal using z-scores.) For additional help, see the online supplements. (a) 0.252 (b) 0.048 (c) 0.452

[Figure: density plots shading (a) the area above 6 on N(5, 1.5), (b) the area below 15 on N(20, 3), and (c) the area between 90 and 100 on N(100, 6).]
P.143 The plots below show the three required regions as areas on the appropriate normal curve. Using technology, we can find the areas directly, and we see that the areas are as follows. (If you are using a paper table, you will need to first convert the values to the standard normal using z-scores.) For additional help, see the online supplements. (a) 0.023 (b) 0.006 (c) 0.700

[Figure: density plots shading (a) the area above 200 on N(120, 40), (b) the area below 49.5 on N(50, 0.2), and (c) the area between 0.8 and 1.5 on N(1, 0.3).]
P.144 The plots below show the three required regions as areas on the appropriate normal curve. Using technology, we can find the areas directly, and we see that the areas are as follows. (If you are using a paper table, you will need to first convert the values to the standard normal using z-scores.) For additional help, see the online supplements. (a) 0.0122 (b) 0.869 (c) 0.0807

[Figure: density plots shading (a) the area below 0.21 on N(0.3, 0.04), (b) the area above 472 on N(500, 25), and (c) the area between 8 and 10 on N(15, 6).]
P.145 The plots below show the required endpoint(s) for the given normal distribution. Using technology, we can find the endpoints directly, and we see that the requested endpoints are as follows. (If you are using a paper table, you will need to find the endpoints on a standard normal table and then convert them back to the requested normal.) For additional help, see the online supplements. (a) 59.3 (b) 2.03 (c) 60.8 and 139.2. Notice that this is very close to our rough rule that about 95% of a normal distribution is within 2 standard deviations of the mean.
[Figure: density plots marking (a) the point with 0.01 above it on N(50, 4) at 59.3, (b) the point with 0.70 below it on N(2, 0.05) at 2.03, and (c) the points with 95% between them on N(100, 20) at 60.8 and 139.2.]
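The requested endpoints can likewise be found directly on the given normal distribution, skipping the convert-back step. A sketch assuming scipy:

```python
from scipy.stats import norm

print(norm.ppf(0.99, loc=50, scale=4))     # (a) 0.01 above on N(50,4)   -> 59.3
print(norm.ppf(0.70, loc=2, scale=0.05))   # (b) 0.70 below on N(2,0.05) -> about 2.03
print(norm.ppf(0.025, 100, 20), norm.ppf(0.975, 100, 20))   # (c) 95% between -> 60.8 and 139.2
```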
P.146 The plots below show the required endpoint(s) for the given normal distribution. Using technology, we can find the endpoints directly, and we see that the requested endpoints are as follows. (If you are using a paper table, you will need to find the endpoints on a standard normal table and then convert them back to the requested normal.) For additional help, see the online supplements. (a) 30.4 (b) 335.7 (c) 4.1 and 15.9. Notice that this is very close to our rough rule that about 95% of a normal distribution is within 2 standard deviations of the mean.
[Figure: density plots marking (a) the point with 0.25 above it on N(25, 8) at 30.4, (b) the point with 0.02 below it on N(500, 80) at 335.7, and (c) the points with 95% between them on N(10, 3) at 4.1 and 15.9.]
P.147 The plots below show the required endpoint(s) for the given normal distribution. Using technology, we can find the endpoints directly, and we see that the requested endpoints are as follows. (If you are using a paper table, you will need to find the endpoints on a standard normal table and then convert them back to the requested normal.) For additional help, see the online supplements. (a) 110 (b) 9.88
[Figure: density plots marking (a) the point with 0.75 below it on N(100, 15) at 110 and (b) the point with 0.03 above it on N(8, 1) at 9.88.]
P.148 The plots below show the required endpoint(s) for the given normal distribution. Using technology, we can find the endpoints directly, and we see that the requested endpoints are as follows. (If you are using a paper table, you will need to find the endpoints on a standard normal table and then convert them back to the requested normal.) For additional help, see the online supplements. (a) 2.44 (b) 541
[Figure: density plots marking (a) the point with 0.10 below it on N(5, 2) at 2.44 and (b) the point with 0.05 above it on N(500, 25) at 541.]
P.149 We standardize the endpoint of 40 using a mean of 48 and standard deviation of 5 to get

z = (x − μ)/σ = (40 − 48)/5 = −1.6

The graphs below show the lower tail region on each normal density. The shaded area in both curves is 0.0548.
P.150 We use technology to find the endpoint on a standard normal curve that has 30% above it. The graph below shows this point is z = 0.5244. We then convert this to a N(48,5) endpoint with x = μ + z · σ = 48 + 0.5244 · 5 = 50.62 Note: We could also use technology to find the N(48,5) endpoint directly. The graphs below show the upper 30% region on each normal density.
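The two-step conversion in P.150 (find z, then compute x = μ + z · σ) can be written out in a few lines; a sketch assuming scipy:

```python
from scipy.stats import norm

mu, sigma = 48, 5
z = norm.ppf(0.70)        # standard normal endpoint with 30% above it: 0.5244
x = mu + z * sigma        # convert to the N(48,5) scale: 50.62
print(z, x)

print(norm.ppf(0.70, mu, sigma))   # or find the N(48,5) endpoint directly in one call
```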
P.151 We use technology to find the endpoint on a standard normal curve that has 5% above it. The graph below shows this point is z = 1.64. We then convert this to a N(10,2) endpoint with x = μ + z · σ = 10 + 1.64 · 2 = 13.3 Note: We could also use technology to find the N(10,2) endpoint directly. The graphs below show the upper 5% region on each normal density.
P.152 We standardize the endpoint of 13.4 using a mean of 10 and standard deviation of 2 to get

z = (x − μ)/σ = (13.4 − 10)/2 = 1.7

The graphs below show the upper tail region on each normal density. We see that the areas are identical, and both are 0.0446.
P.153 Using technology to find the endpoint for a standard normal density, we see that it is z = −1.28 (as shown in the figure). We convert this to N(500,80) using x = μ + z · σ = 500 − 1.28 · 80 = 397.6 Note: We could also use technology to find the N(500,80) endpoint directly. The graphs show the lower 10% region on each normal density.
P.154 We convert the standard normal endpoint to N(500,80) using x = μ + z · σ = 500 + 2.1 · 80 = 668 The graphs below show the region on each normal density. We see that the area is identical in both and is 0.018.
P.155 We convert the standard normal endpoints to N(100,15) using x = 100 + 1 · 15 = 115
and
x = 100 + 2 · 15 = 130
The graphs below show the region on each normal density. We see that the area is identical in both and is 0.1359.
P.156 We use technology to find the endpoints for a standard normal density: the 10th and 90th percentiles are z = −1.282 and z = +1.282 (as shown in the figure below). We convert these to N(100, 15) using

x = 100 − 1.282 · 15 = 80.77   and   x = 100 + 1.282 · 15 = 119.23

Note: We could also use technology to find the N(100, 15) endpoints directly. The graphs below show the middle 80% region on each normal density.
P.157 For Reading and Writing, we have a N(531, 104) curve, which will be centered at its mean, 531. We label the points that are one standard deviation away from the mean (427 and 635) and two standard deviations away (323 and 739) so that approximately 95% of the area falls between the values two standard deviations away. Those points should be out in the tails, with only about 2.5% of the distribution beyond them on each side.

P.158 Note that a percentile always means the area to the left. Using technology, we can find the endpoints and areas directly, and we obtain the answers below. (Alternately, we could convert to a standard normal and use the standard normal to find the equivalent area.)

(a) The area below 700 is 0.948, so that point is the 95th percentile of a N(531, 104) distribution.

(b) The point where 30% of the scores are below it is a score of 476.

P.159 Note that a percentile always means the area to the left. Using technology, we can find the endpoints and areas directly, and we obtain the answers below. (Alternately, we could convert to a standard normal and use the standard normal to find the equivalent area.)

(a) The area below 450 is 0.252, so that point is about the 25th percentile of a N(528, 117) distribution.
(b) The point where 90% of the scores are below it is a score of 678. P.160 (a) The N (55.5, 2.7) curve is centered at its mean, 55.5 inches. The points two standard deviations away from the mean on either side, 50.1 and 60.9, enclose 95% of the area. These values should be near the tails, with only about 2.5% of the heights beyond them on each side. See the figure.
(b) Using technology we find the area between 4'4" (52 inches) and 5' (60 inches) for a N(55.5, 2.7) distribution. We see that the area is 0.8548.
Alternately, we can compute z-scores:

z = (52 − 55.5)/2.7 = −1.30   and   z = (60 − 55.5)/2.7 = 1.67

Using technology or a table, the area between −1.30 and 1.67 on a standard normal curve is 0.855, matching what we see directly. About 85.5% of 10-year-old boys will be between 52 and 60 inches tall.

(c) Using technology we find an endpoint for a N(55.5, 2.7) distribution that has an area of 0.99 below it.
Alternately, we can use technology or a table to find the 99th percentile for a standard normal distribution, z = 2.326, and convert this value to the N(55.5, 2.7) scale: Height = 55.5 + 2.326 · 2.7 = 61.78. Again, of course, we arrive at the same answer using either approach. A 10-year-old boy who is taller than 99% of the other boys his age is 61.8 inches tall, or about 5'2".

P.161 (a) Using technology we find the area between 68 inches and 72 inches in a N(70, 3) distribution. We see in the figure that the area is 0.495.
Alternately, we can compute z-scores:

z = (68 − 70)/3 = −0.667   and   z = (72 − 70)/3 = 0.667

Using technology or a table, the area between −0.667 and 0.667 on a standard normal curve is 0.495, matching what we see directly. About 49.5% (or almost exactly half) of US men are between 68 and 72 inches tall.

(b) Using technology we find an endpoint for a N(70, 3) distribution that has an area of 0.10 below it. We see in the figure that the height is about 66.2 inches.
Alternately, we can use technology or a table to find the 10th percentile for a standard normal distribution, z = −1.28, and convert this value to the N(70, 3) scale: Height = 70 − 1.28(3) = 66.2. Again, of course, we arrive at the same answer using either approach. A US man whose height puts him at the 10th percentile is 66.2 inches tall, or about 5'6".
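Both parts of P.161 can be checked with a few lines of code; a sketch assuming scipy:

```python
from scipy.stats import norm

mu, sigma = 70, 3   # US men's heights modeled as N(70, 3)

# (a) proportion between 68 and 72 inches
print(norm.cdf(72, mu, sigma) - norm.cdf(68, mu, sigma))   # 0.495

# (b) 10th percentile of heights
print(norm.ppf(0.10, mu, sigma))                           # about 66.2 inches
```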
P.162 A N (0.325, 0.021) curve is centered at its mean, 0.325. The values two standard deviations away, 0.283 and 0.367, are labeled so that approximately 95% of the area falls between these two values. They should be out in the tails, with only about 2.5% of the distribution beyond them on each side. See the figure.
P.163 We use technology to find the points on a N(3.16, 0.40) curve that have 25% and 75%, respectively, of the distribution below them. Note: for the upper quartile we can also look for 25% above the value. In the figure below we see that the two quartiles are at Q1 = 2.89 and Q3 = 3.43.
Alternately, we can use technology or a table to find the 25% and 75% endpoints for a standard normal distribution, z = ±0.674. We then convert these endpoints to corresponding points on a N(3.16, 0.40) distribution.

Q1 = 3.16 − 0.674 · 0.40 = 2.89
Q3 = 3.16 + 0.674 · 0.40 = 3.43
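The quartiles of any normal distribution sit about 0.674 standard deviations on either side of the mean, which is easy to confirm directly; a sketch assuming scipy:

```python
from scipy.stats import norm

mu, sigma = 3.16, 0.40
q1, q3 = norm.ppf([0.25, 0.75], mu, sigma)     # quartiles found directly: 2.89 and 3.43
print(q1, q3)
print(mu - 0.674 * sigma, mu + 0.674 * sigma)  # same values via z = ±0.674
```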
P.164 Using technology we can find the area above 0.35 on a N (0.325, 0.021) curve directly. We see in the figure that the area is 0.1169.
Alternately, we can compute a z-score:

z = (0.35 − 0.325)/0.021 = 1.19

Using technology or a table, the area above 1.19 on a standard normal curve is 0.1169, the same as the area we found above. Thus about 11.7% of samples of 500 US adults will contain more than 35% with at least a bachelor's degree.

P.165 The plots below show the three required regions as areas in a N(21.97, 0.65) distribution. We see that the areas are 0.0565, 0.0012, and 0.5578, respectively.
[Figure: three N(21.97, 0.65) density plots shading (a) the area above 23 (0.0565), (b) the area below 20 (0.0012), and (c) the area between 21.5 and 22.5 (0.5578).]
If converting to a standard normal, the relevant z-scores and areas are shown below.

(a) z = (23 − 21.97)/0.65 = 1.585. The area above 1.585 for N(0, 1) is 0.0565.

(b) z = (20 − 21.97)/0.65 = −3.031. The area below −3.031 for N(0, 1) is 0.0012.

(c) z = (21.5 − 21.97)/0.65 = −0.7231 and z = (22.5 − 21.97)/0.65 = 0.8154. The area between −0.7231 and 0.8154 for N(0, 1) is 0.5578.
P.166
(a) Here is a sketch of a normal curve with center at 0 and standard deviation of 2.5.
(b) Using technology, we can find the area above 3.0 for a N (0, 2.5) density.
Alternately, we can compute a z-score:

z = (3 − 0)/2.5 = 1.20

Using technology or a table, the area above 1.20 on a standard normal curve is 0.1151. Thus about 11.5% of randomization samples would have slopes of 3.0 or larger.

(c) We use technology to find the point on a N(0, 2.5) curve that has 5% of the distribution below it. This point is at a slope of −4.112.
Alternately, we can use technology or a table to find the 5%-tile for a standard normal distribution, z = −1.645. We then convert this point to the corresponding slope on a N (0, 2.5) distribution, 0 − 1.645 · 2.5 = −4.112. P.167 We use technology to determine the answers. We see in the figure that the results are: (a) 0.0509 or 5.09% of students scored above a 90. (b) 0.138 or 13.8% of students scored below a 60. (c) Students with grades below 53.9 will be required to attend the extra sessions. (d) Students with grades above 86.1 will receive a grade of A.
P.168 We standardize the original scores using the N(62, 18) distribution:

z1 = (47 − 62)/18 = −0.833   and   z2 = (90 − 62)/18 = 1.556

We then convert these z-scores to a N(75, 10) distribution:

x1 = 75 − 0.833 · 10 = 66.67   and   x2 = 75 + 1.556 · 10 = 90.56
Rounding to integers, the new exam scores would be 67 and 91.

P.169 (a) For any N(μ, σ) distribution, when we standardize μ − 2σ and μ + 2σ we must get z = −2 and z = +2. For example, if we use N(100, 20), the interval within two standard deviations of the mean goes from 60 to 140:

z = (60 − 100)/20 = −2   and   z = (140 − 100)/20 = +2
Using technology, the area between −2 and +2 on a standard normal curve is 0.954. (b) Similar to part (a), if we go just one standard deviation in either direction the standardized z-scores will be z = −1 and z = +1. Using technology, the area between −1 and +1 on a standard normal curve is 0.683. (c) Similar to part (a), if we go three standard deviations in either direction the standardized z-scores will be z = −3 and z = +3. Using technology, the area between −3 and +3 on a standard normal curve is 0.997. (d) The percentages within one, two, or three standard deviations of the mean, roughly 68% between μ ± σ, roughly 95% between μ ± 2σ, and roughly 99.7% between μ ± 3σ, should hold for any normal distribution since the standardized z-scores will always be z = ±1 or z = ±2 or z = ±3, respectively.
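Part (d)'s claim, that the 68-95-99.7 percentages hold for every normal distribution, can be verified once on the standard normal, since the standardized z-scores are always ±1, ±2, ±3. A sketch assuming scipy:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)   # area within k standard deviations of the mean
    print(k, round(area, 3))            # 0.683, 0.954, 0.997
```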
Section 1.1: The Structure of Data

Example 1: 2011 Hollywood Movies
Here is a small part of a dataset that includes information on all 136 movies to come out of Hollywood in 2011. "Rating" is audience rating on a 100-point scale, "Budget" is budget in millions of dollars, and "Opening" is opening weekend gross, in millions of dollars.

Title                                 LeadStudio    Rating  Genre    Budget  Opening
Insidious                             Sony          65      Horror   1.5     13.27
Harry Potter and Deathly Hallows P2   Warner Bros   92      Fantasy  125.0   169.19
Bridesmaids                           Relativity    77      Comedy   32.5    26.25
The Help                              DreamWorks    91      Drama    25.0    26.04
Horrible Bosses                       Warner Bros   72      Comedy   35.0    28.30
Transformers: Dark of the Moon        DreamWorks    67      Action   195.0   197.85

a). What are the cases in this dataset?    The different movies
b). Identify each variable as quantitative or categorical: [Title is just an identifier variable and is NOT categorical or quantitative] LeadStudio is categorical and Genre is categorical Rating, Budget, and Opening are all quantitative c). Name a question we might ask about this dataset that is about: A single variable: There are lots of possible answers: What proportion are comedies or what is an average audience rating or which movie had the largest budget? Let them suggest ideas! A relationship between two of the variables: Again, lots of possible ideas: Which genre of movie has the highest budget? Is Audience rating related to opening weekend gross? Is one of the studios more likely to make horror films? Do movies with a higher budget tend to get higher audience ratings? Again, let them suggest ideas!
Example 2: Yogurt and Weight Loss Can eating a yogurt a day cause people to lose weight (as compared to not eating yogurt)? In a study investigating this: a). What are the cases?
The people involved in the study
b). What are the variables and is each quantitative or categorical? Eating a yogurt a day? (yes/no) -- categorical. Amount of weight loss -- quantitative. (Students might also suggest: Did the person lose weight (yes/no), which is categorical and also a fine answer. It is not clear from the short description given, so be flexible with their answers!) c). What might a dataset look like that might help us answer this question? Make up some data to show the first couple of rows of such a dataset.

Person   Yogurt?   Weight Loss
1        Yes       10
2        No        2
Quick Self-Quiz: Cases and Variables
Give the cases and the variable(s) in each situation below, and identify each variable as categorical or quantitative:

a). Asking US citizens whether or not they support gun control. Cases: US citizens. Variable: Support gun control? (yes/no) – categorical

b). Asking college students how many hours of sleep they got last night. Cases: College students. Variable: How much sleep last night? – quantitative
Example 2 Revisited: Yogurt and Weight Loss Can eating a yogurt a day cause people to lose weight? In a study investigating this: What is the explanatory variable? Whether or not a person eats a yogurt a day. What is the response variable? The weight change that the person experiences during the study.
Quick Self-Quiz: Explanatory and Response Variables In each situation, indicate the explanatory variable and the response variable. a). Does meditation help reduce stress? Explanatory variable: Meditation or not Response variable: Some measure of stress b). Is hyperactivity in children affected by sugar consumption? Explanatory variable: Sugar consumption Response variable: Some measure of hyperactivity This is a good time to point out that it is important to define variables precisely when we actually conduct a study, and to also point out that variables could be quantitative or categorical depending on how it is measured and recorded (for example, asking a person if they feel stress or actually measuring cortisol levels in the blood.).
Section 1.2: Sampling from a Population Example 1: Number of Cell Phone Calls per Day Princeton Survey Research reports on a survey of 1,917 cell phone users in the US, conducted in May 2010, asking “On an average day, about how many phone calls do you make and receive on your cell phone?” a). What is the sample in this study? The 1,917 people who were asked the question.
b). What is the population?
All cell phone users in the US
c). Is the variable in this study quantitative or categorical?
Quantitative
Example 2: Driving with a Pet on your Lap Over 30,000 people participated in an online poll on cnn.com conducted in April 2012 asking “Have you ever driven with a pet on your lap?” We see that 34% of the participants answered yes and 66% answered no. a). What is the sample? The 30,000 people who participated in the poll
b). Can we conclude that 34% of all drivers have driven with a pet on their lap? No! This is a volunteer sample, and volunteer samples are often biased. It is likely that people who have driven with a pet on their lap are more likely to respond to the poll. c). Explain why it is not appropriate to generalize these results to all drivers, or even to all drivers who visit cnn.com. What is the problem with this method of data collection? Volunteer samples are generally biased since people who feel strongly about the subject are much more likely to participate. d). How might we select a sample of people that would give us results that we can generalize to a broader population? Use a RANDOM SAMPLE! e). Is the variable in this study quantitative or categorical?
Categorical
Quick Self-Quiz: Sampling Bias You wish to estimate the number of hours a week that students at your university spend studying. For each sampling method below, indicate whether the results are likely to be representative of the entire student body or whether the method is likely to give biased results. a). Ask students you find at the library on a Friday night.
Biased
b). Ask students who are at a party on a Tuesday night.
Biased
c). Ask a random sample of students.
Representative
d). Ask the students in Organic Chemistry.
Biased
e). Ask all the students on one of the school’s athletic teams.
Biased
f). Send an email out to all students and use all the responses you get. Biased – Volunteer sample g). Ask friends that you know, trying to get a reasonably representative group. Biased – People are not good at this
Example 3: Public Speeches Against Democracy How a question is worded can have a dramatic impact on the result. These two questions appear to be asking the same thing: “Do you think the US should allow public speeches against democracy?” “Do you think the US should not forbid public speeches against democracy?” However, 21% said such speeches should be allowed (in answer to the first question) while almost double that (39%) said such speeches should not be forbidden (in answer to the second question). Why do you think the results were so different? Even subtle changes in wording can have a strong effect on the outcome. In this case, it is likely that people don’t want to “forbid” something since it sounds very rigid compared to “not allowing”. Many other answers are possible. The key idea is that wording matters and can affect the outcome!
Quick Self-Quiz: Wording Bias A survey is to be conducted using a random sample of citizens in a town, asking them if they support raising taxes to increase funding for the public school. a). Write the question in a way that is likely to bias the results toward more yes answers. “Our schools do such a fantastic job and really need our help. Do you support raising taxes a small amount to increase funding for them?” Many other answers are also possible. b). Write the question in a way that is likely to bias the results toward more no answers. “Our taxes are far higher than they should be already and the schools are just wasting the money. Do you oppose a big tax hike to throw more money at them?” Many other answers are also possible. (In practice, of course, we should try to actually write questions in a way that will NOT bias the results. The first step is to recognize the effect of leading questions.)
Section 1.3: Experiments and Observational Studies Example 1: Association or Causation? For each of the following statements, indicate whether the statement implies causation or just indicates association without causation. a). If you study more, your grade will improve in this course. Causation! “Will improve” implies causation. b). Want to lose weight? Drink tea! Causation! The implication is that drinking tea will cause you to lose weight. c). Aging of the brain tends to be delayed in people with a college education. Association without causation. The statement is not claiming that a college education will cause a delay in aging of the brain. In fact, there are many confounding factors.
Example 2: Television sets and Life expectancy There is a very strong positive association between television sets per person in a country and the life expectancy of the people in that country, which means people in countries with lots of TVs tend to live longer. a). Does this mean we can improve the life expectancy of people in developing countries by sending them lots of television sets? No! This strong association is not causation. b). Describe a possible confounding variable in this association. The wealth of the countries is a confounding variable. Wealth will affect both the number of television sets and the life expectancy.
Example 3: Does Kindergarten Lead to Crime? A state legislator in New Hampshire claimed in July 2012 that kindergarten programs lead to more crime. He noted as evidence that the largest of all the communities in his county was the only community with a kindergarten program and also had the most crime. a). What common mistake is the legislator making? The legislator is assuming association implies causation. This is a very common mistake! Just because there is a strong association does NOT mean that one is causing the other. b). Describe a possible confounding variable in this association. The population of the communities. Larger communities are more likely to have a kindergarten program and are also more likely to have more crime.
Quick Self-Quiz: Association, Causation, and Confounding Variables An association is described. In each case, indicate a). Whether the statement implies causation or just association b). A possible confounding variable 1. More sales of sunscreen tend to occur when more sunglasses are sold. a). Just association without causation. b). Amount of sunshine 2. “Exercise reduces risk of Alzheimers” claims a headline reporting on a study of elderly people that recorded how much each exercised at age 70 and then whether the person got Alzheimer’s disease. a). Causation b). The person’s overall health at age 70. People with early stage (and not yet detected) Alzheimer’s disease are probably less likely to exercise. 3. Increased weight helps students run faster. a). Causation b). Gender! This is an often overlooked confounding variable. Males tend to weigh more and also to run faster. Another possible answer is age if the participants are children.
Example 4: Exercise and Depression In each case below, we describe a study to determine whether exercise helps increase certain mood enhancing chemicals in the brain. Indicate whether each describes an experiment or an observational study. a). We contact a random sample of 100 people and record how much each person exercises and also measure the chemicals in the brain for each person. Observational study. No variables were manipulated. b). Using a random sample of 100 people, we randomly assign half of them to participate in a regular exercise program for a six-week period while the other half makes no changes. At the end of the time period, we measure the brain chemicals. Experiment. The experimenters actively imposed the explanatory variable (exercise or not).
Example 5: Caffeine and Learning Does ingesting caffeine help mice learn the way through a maze faster? Describe a well-designed randomized experiment to test this. Give all details, including the use of placebos and blinding. We have 20 mice to work with. We randomly divide the mice into two groups of 10 mice each. We give half the mice caffeine in their food while the other group gets identical food without caffeine (placebo). We make the study double-blind by not telling the mice or anyone interacting with the mice which are getting the caffeinated food and which are not. We measure the time it takes for the mice to learn the maze and compare the results between the two groups.
Example 6: Is the Dominant Hand Stronger? Describe a Matched Pairs experiment to see if right hand strength is greater than left-hand strength in right-handed people. You have a hand-grip to measure strength and 30 right-handed people to use in your study. Be sure to include how randomization is used in your design. We are comparing right-hand strength to left-hand strength and in a matched pairs experiment, all participants are measured for both, so we will measure the strength in both the left and right hand for all participants. We randomize the order in which the participants have hand strength measured: some randomly assigned to measure the strength in the left hand first and some randomly assigned to measure strength in the right hand first. Then we compare the right and left hand strength for each individual.
Quick Self-Quiz: Randomized Experiments Design a randomized comparative experiment to determine whether taking vitamin B supplements helps memory. You have 40 people to use in the study and the study will take place over one month. We randomly divide the people into two groups of 20 people each. We have one group take vitamin B supplements every day for a month, while the other group takes a pill that looks exactly like the vitamin B supplements but which is really a placebo. We make the study double-blind by not telling either the participants or the people interacting with the participants which group they are in. At the end of the month, we conduct a memory test and record and compare the results.
Section 2.1: Categorical Variables

Example 1: Talking About Sports
A survey in November 2012 asked a random sample of 2,000 US adults "How often do you talk about sports with family and friends?" The results are given in the following frequency table.

Response                        Frequency
Every day or nearly every day   302
About once a week               277
Occasionally                    526
Rarely or never                 895
TOTAL                           2000

a). What proportion rarely or never talk about sports?

p̂ = 895/2000 = 0.4475

b). What percent of people in the sample talk about sports once a week or more?

p̂ = (302 + 277)/2000 = 0.2895, so about 29% talk about sports once a week or more

c). Give a relative frequency table for this dataset.

Response                        Relative Frequency
Every day or nearly every day   0.1510
About once a week               0.1385
Occasionally                    0.2630
Rarely or never                 0.4475
Quick Self-Quiz: Frequency and Relative Frequency Tables
In a blind taste test, people were given four different types of water and asked to select their top choice. Ten of the participants selected tap water, 25 selected Aquafina, 41 selected Fiji, and 24 selected Sam's Choice.

a). Display the results in a frequency table.

Choice         Frequency
Tap            10
Aquafina       25
Fiji           41
Sam's Choice   24
Total          100

b). What proportion selected Aquafina?

p̂ = 25/100 = 0.25

c). What proportion selected bottled (not tap) water?

p̂ = (25 + 41 + 24)/100 = 90/100 = 0.90

d). Display the results in a relative frequency table.

Choice         Relative Frequency
Tap            0.10
Aquafina       0.25
Fiji           0.41
Sam's Choice   0.24
Total          1.00
Example 2: Relationship Status and Gender
169 college students were asked about relationship status and gender. The results are given in the following two-way table.

          In a relationship   It's complicated   Single   Total
Female    32                  12                 63       107
Male      10                  7                  45       62
Total     42                  19                 108      169
a). What proportion of students in this sample are in a relationship?
42/169 = 0.2485
b). What proportion of females in this sample are in a relationship?
32/107 = 0.299
c). What proportion of the people who are in a relationship in this sample are female? 32/42 = 0.762 d). What proportion of males in this sample are in a relationship?
10/62 = 0.161
e). Using 𝑝̂𝐹 to represent the proportion of females in a relationship and 𝑝̂𝑀 to represent the proportion of males in a relationship, find the difference in proportions 𝑝̂𝐹 − 𝑝̂𝑀 . 0.299 – 0.161 = 0.138
Example 3: Handedness and Occupation
In a study of handedness in occupations, 10 out of 118 psychiatrists were left-handed, 26 out of 148 architects were left-handed, 5 of 132 orthopedic surgeons were left-handed, and 16 of 105 lawyers were left-handed.

a). Make a two-way table of this relationship. (Rows and columns could be switched)

                      Left   Right   Total
Psychiatrists         10     108     118
Architects            26     122     148
Orthopedic surgeons   5      127     132
Lawyers               16     89      105
TOTAL                 57     446     503

b). What proportion of all the people in the sample are left-handed? 57/503 = 0.113
Quick Self-Quiz: Finding Proportions from Two-Way Tables
Errors in medical prescriptions occur relatively frequently. In a study, two groups of doctors had similar error rates and one group switched to e-prescriptions while the other continued with hand-written prescriptions. One year later, the number of errors was measured. The results are given in the two-way table.

             Error   No Error   Total
Electronic   254     3594       3848
Written      1478    2370       3848
Total        1732    5964       7696

a). Fill in the row and column totals.

b). What proportion of all the prescriptions had errors in them? 1732/7696 = 0.225

c). What proportion of electronic prescriptions had errors in them? 254/3848 = 0.066

d). What proportion of written prescriptions had errors in them?
1478/3848 = 0.384
e). What proportion of the prescriptions with errors were written prescriptions? 1478/1732 = 0.853
Section 2.2: One Quantitative Variable: Shape and Center Example 1: Height of Students The histogram and dotplot below both show the data for the quantitative variable Height for 355 students. a). Describe the general shape of the distribution. Approximately bell-shaped and symmetric b). Draw a smooth curve over the histogram showing the general shape. See below. You want them to get used to using a smooth curve to show just the very general shape.
Example 2: Hollywood Movies: World Gross World gross income from all viewers (in millions of dollars) of all movies to come out of Hollywood in 2011 is shown in the dotplot.
a). Describe the shape of the distribution.
Skewed to the right
b). Do there appear to be any outliers? If so, which values? Yes, the top three values, circled above. (These three values correspond to Harry Potter, Transformers, and Pirates of the Caribbean.)
Quick Self-Quiz: Shape of a Distribution The number of theaters to show a movie during opening weekend for all movies to come out of Hollywood in 2011 is shown in the dotplot. Describe the shape of the distribution. Slightly skewed to the left
Example 3: Ants on a Sandwich The number of ants climbing on a piece of a peanut butter sandwich left on the ground near an anthill for a few minutes was measured 7 different times and the results are: 43, 59, 22, 25, 36, 47, 19 a). Calculate the mean number of ants.
𝑥̅ = 35.857
b). Calculate the median number of ants.
m = 36
c). Suppose one of the sandwich bits was extremely appealing to the ants, and instead of 59, the number of ants for that bit was actually 159, giving the 7 values as: 43, 159, 22, 25, 36, 47, 19. Compute the mean and the median for this new dataset, and compare your answers to the previous answers. We have 𝑥̅ = 50.143 and m = 36. Notice that the median didn’t change at all but the large
value of 159 had a strong effect on the mean, pulling it up quite a bit. Example 4: World Gross of Hollywood Movies, revisited A dotplot of world gross income from movies is shown in Example 2. a). Estimate the median world gross income. Encourage students to visualize the point with half the data to the left and half to the right. The median appears to be less than 100, and maybe about 75. (The actual median is 76.66.) b). Do you expect the mean to be greater than or less than the median. Explain. Because the distribution is skewed to the right, we expect the mean to be larger than the median. The large outliers will pull the mean up and won’t have much effect on the median. (The actual mean is 150.74.)
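If students want to verify the ant counts with technology rather than by hand, a short sketch using Python's statistics module (an assumption about the tool; any calculator or package works) shows the outlier pulling the mean but not the median:

```python
from statistics import mean, median

ants = [43, 59, 22, 25, 36, 47, 19]
print(mean(ants), median(ants))                   # 35.857..., 36

ants_outlier = [43, 159, 22, 25, 36, 47, 19]      # the 59 replaced by 159
print(mean(ants_outlier), median(ants_outlier))   # 50.142..., 36 (mean pulled up, median unchanged)
```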
Quick Self-Quiz: Mean and Median 1. Sandwich Ants, revisited. The same person who counted the number of ants on peanut butter sandwiches, described in Example 3, also counted the number of ants on Ham and Pickles sandwiches. The values are : 44, 34, 36, 49, 54, 65, 59. a). What is n?
n=7
b). What is the mean?
𝑥̅ = 48.714
c). What is the median?
m = 49
d). Comparing your answers to those in Example 3, do ants seem to prefer peanut butter sandwiches or ham and pickles sandwiches? Ants seem to like ham and pickles more than peanut butter, since both the mean and median are larger for ham and pickles. 2. Opening Weekend of Hollywood Movies, revisited. A dotplot of number of theaters opening weekend for movies is shown in the Quick Self-Quiz on the reverse. a). Estimate the median number of theaters opening weekend. The median appears to be around 3000. (The actual median is 2995.) b). Do you expect the mean to be greater than or less than the median. Explain. Since the distribution is skewed to the left, we expect the mean to be less than the median, but probably not a lot less since the skew isn’t very extreme. (The actual mean is 2828.5.)
Section 2.3: One Quantitative Variable: Measures of Spread Example 1: Tips for a Pizza Delivery Person A pizza delivery person recorded all of her tips (and other variables) over several shifts. She discusses the results, and much more, on “Diary of a Pizza Girl” on the Slice website. The variable Tip in the PizzaGirl dataset includes the 24 tips she recorded, and the values are also given below. Use technology to find the mean and the standard deviation for these values. Give answers to two decimal places. 2, 4, 6, 2, 3, 3, 2, 0, 5, 3, 3, 4.5, 2.5, 8, 2, 2, 2, 3, 3, 3, 3, 5, 2, 0 𝑥̅ = 3.04 s = 1.75
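"Use technology" here could be a calculator, a spreadsheet, or a few lines of code; a sketch with Python's statistics module (an assumption; stdev is the sample standard deviation, matching the answer above):

```python
from statistics import mean, stdev

tips = [2, 4, 6, 2, 3, 3, 2, 0, 5, 3, 3, 4.5, 2.5, 8, 2, 2, 2, 3, 3, 3, 3, 5, 2, 0]
print(round(mean(tips), 2))    # 3.04
print(round(stdev(tips), 2))   # 1.75 (sample standard deviation, n - 1 in the denominator)
```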
Example 2: Percent of Body Fat in Men The variable BodyFat in the BodyFat dataset gives the percent of weight made up of body fat for 100 men. For this sample, the mean percent body fat is 18.6 and the standard deviation is 8.0. The distribution of the body fat values is roughly symmetric and bell-shaped. (If you are on a computer, check this!) a). Find an interval that is likely to contain roughly 95% of the data values. 𝑥̅ ± 2𝑠 = 18.6 ± 2(8.0) = 18.6 ± 16.0 = (2.6, 34.6) We expect about 95% of the percent body fat values to be between 2.6 and 34.6. b). The largest percent body fat of any man in the sample is 40.1 and the smallest is 3.7. Find and interpret the z-score for each of these values. Which is relatively more extreme? z-score for 40.1 is:
(40.1 − 18.6)/8.0 = 2.69

z-score for 3.7 is:

(3.7 − 18.6)/8.0 = −1.86
The value 40.1 is 2.69 standard deviations above the mean and the value 3.7 is 1.86 standard deviations below the mean. Since the large value of 40.1 is more standard deviations away, the relatively more extreme value is the large value of 40.1. (It might also be worth pointing out to the students that, from the 95% rule, we know that being more than 2 standard deviations away from the mean is relatively rare. )
Quick Self-Quiz: Estimating Mean and Standard Deviation The histogram below shows the data for the quantitative variable Height for 355 students. a). Estimate the mean and the standard deviation from the histogram. The mean is about 68 and, from the 95% rule, we estimate that the standard deviation is about 4. Answers may vary but students should be estimating the values containing the middle 95% and then taking half the distance from those cutoffs to the center to estimate s. (The actual mean is 68.423 with s = 4.079.) b). Estimate the value of the maximum height for a person in the sample and use your estimated values of mean and standard deviation to find and interpret an estimated z-score for this person’s height. The maximum is about 83, so the z-score is
(83 − 68)/4 = 3.75. This height (6 ft, 11 inches!) is 3.75 standard
deviations above the mean – a very extreme value!
Example 3: Tips for a Pizza Girl, revisited
We revisit the data given for tips for a pizza girl, given in Example 1.

a). Use technology to find the five number summary.
(0, 2, 3, 3.75, 8) The quartiles may be slightly different, depending on which technology you are using. Different packages and calculators use different methods for calculating the quartiles.

b). What is the range? What is the IQR?
Range: 8 – 0 = 8. Emphasize that the range is a number, not an interval.
IQR: 3.75 – 2 = 1.75. This may vary depending on your technology choice.

If one of the pizza delivery customers had given a $20 tip, which measure of spread would the large tip have a greater effect on: the range or the IQR?
The range. Have them try it!
Quick Self-Quiz: Five Number Summary and Skewness 1. Heights, revisited. A histogram of heights of 355 students is shown on the reverse. Use this graph to give a rough estimate of the five number summary of the heights in this sample. The actual five number summary is (59, 65, 68, 71, 83). They should be quite accurate with the maximum, minimum, and median. Be lenient with them as they just start to get used to estimating 25% of the data from either end, to estimate quartiles. 2. For each five number summary below, indicate whether the data appear to be symmetric, skewed to the right, or skewed to the left. a). (10, 57, 85, 88, 93)
Skewed to the left
b). (200, 300, 400, 500, 600)
Symmetric
c). (5, 30, 40, 50, 75)
Symmetric
d). (5, 7, 8, 15, 42)
Skewed to the right
e). (100, 430, 600, 620, 650)
Skewed to the left
Section 2.4: Outliers, Boxplots, and Quantitative/Categorical Relationships Example 1: Population of US States The five number summary for the populations of the 50 US states, in millions of people, is
(0.506, 1.660, 4.170, 6.676, 35.842). The table shows all 50 populations. Determine whether there are any outliers, and, if so, identify them.

0.506 1.315 2.953 5.504 8.685
0.621 1.395 3.499 5.561 8.918
0.636 1.748 3.524 5.740 10.104
0.658 1.813 3.591 5.760 11.450
0.771 1.903 4.142 5.893 12.394
0.830 2.333 4.198 6.207 12.712
0.927 2.421 4.507 6.227 17.385
1.080 2.734 4.525 6.407 19.281
1.262 2.750 4.602 7.481 22.472
1.299 2.901 5.097 8.540 35.842
IQR = Q3 – Q1 = 6.676 – 1.660 = 5.016. We have Q1 – 1.5(IQR) = 1.660 – 1.5(5.016) = -5.864 and Q3 + 1.5(IQR) = 6.676 + 1.5(5.016) = 14.20. Outliers are any values which lie outside of the interval from -5.864 to 14.20. There are obviously no small outliers, but there are four large outliers (at 17.385, 19.281, 22.472, and 35.842). (Note that these correspond to the states Florida, New York, Texas, and California.)
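The 1.5·IQR rule above is easy to automate once the quartiles are known. A minimal sketch, assuming the five number summary quoted in the solution (quartile conventions differ slightly between packages):

```python
q1, q3 = 1.660, 6.676          # quartiles of state populations, in millions
iqr = q3 - q1                  # 5.016
lower = q1 - 1.5 * iqr         # -5.864
upper = q3 + 1.5 * iqr         # 14.20
print(lower, upper)

populations = [17.385, 19.281, 22.472, 35.842, 12.712, 0.506]   # a few of the 50 values
print([x for x in populations if x < lower or x > upper])       # the four large outliers listed above
```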
Example 2: Gross State Product The boxplot shows the Gross State Product (GSP) per capita, in dollars per resident, for the 50 US states. a). Are there any outliers? Is so, approximately what are the values of the outliers? There are two: at about 59,000 and 67,000 b). Estimate the range of the data and estimate the IQR for the data. Range about 67,000 – 28,000 = 39,000 and IQR about 44,000 – 37,000 = 7,000. These are rough estimates. c). Estimate the median of the data. About 39,000.
d). Does the data appear to be symmetric, skewed to the left, skewed to the right, or none of these? Skewed to the right e). Do you expect the mean to be greater than or less than the median? Greater than the median
Quick Self-Quiz: Understanding Boxplots The boxplot below is a graph of the percent of the population to graduate high school in each of the 50 states. a). Are there any outliers? No. b). Estimate the range of the data values. About 92 – 78 = 14 c). Estimate the mean of the data set. The median is about 87 and the data is left-skewed, so the mean will be less than the median. A good estimate is about 86. (The actual mean is 86.464.) d). Which of the following values is the best estimate of the standard deviation? 1, 4, 8, 12 4
Example 3: Smokers by Region of the Country in the US The side-by-side boxplot shows the percent of adult residents who smoke in each state, categorized by region of the country (Midwest, Northeast, South, or West). a). In general, which region seems to have the highest percent of smokers? South b). In which region is the state with the highest percent of smokers? Midwest c). Which region has the greatest variability? West d). Do any of the regions have any outliers? If, so, which ones? Yes, the Midwest has two outliers.
Quick Self-Quiz: Quantitative/Categorical Relationship
In a recent study, participants were randomized to drink either tea or coffee every day for two weeks. After two weeks, blood samples were exposed to an antigen and an immune system response was measured, with higher values representing a stronger immune system. (If you use the PowerPoint slides, note that this boxplot looks different than the one given there. The reason is that the graphs were made using different stat packages and the different packages compute the quartiles (and hence outliers) slightly differently.)

a). Does there appear to be a relationship between the categorical variable (tea or coffee) and the quantitative variable (immune response)? Yes, there appears to be a substantial difference between the two groups in terms of immune response.

b). Which group appears to have the stronger immune response? The tea drinkers appear to have a stronger immune response.

c). Are there outliers in either group? If so, which one(s)? In this version of the graph, there are no outliers in either group. In the version on PowerPoint, there is one outlier in the coffee group.

d). Descriptive statistics for this study are shown. Give notation for and find the difference in means.

Variable          Drink    N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
InterferonGamma   Coffee   10  0   17.70  5.28     16.69  0.00     2.25   15.50   25.25  52.00
InterferonGamma   Tea      11  0   34.82  6.36     21.08  5.00     13.00  47.00   55.00  58.00
If we let 𝑥̅ 𝑇 represent the mean immune response for the tea drinkers and 𝑥̅𝐶 represent the mean immune response for the coffee drinkers, we have: 𝑥̅ 𝑇 − 𝑥̅𝐶 = 34.82 − 17.70 = 17.12 Notice that we could just as correctly have found 𝑥̅𝐶 − 𝑥̅ 𝑇 = 17.70 − 34.82 = −17.12, since an order wasn’t specified. Both are correct. e). Can we conclude that tea causes an increase in this aspect of the immune response? Why or why not? Yes, if a significant effect is found, then we can assume causation, since the data come from a randomized experiment.
Section 2.5: Two Quantitative Variables: Scatterplot and Correlation Example 1: Positive/Negative Associations In each case, do you expect a positive or negative association between the two quantitative variables? a). Number of years of education and annual salary, for US adults Positive: people with more education generally make higher salaries. b). Age and maximum running speed, for adults. Negative: Older people (such as 80-year-olds) tend to run slower than younger people (such as 20-year-olds). c). Age and maximum running speed, for children. Positive: Older children (such as 12-year-olds) tend to run faster than younger children (such as 4-year-olds). d). Age of the husband and age of the wife, for married couples Positive: 80-year-olds are more likely to be married to other 80-year-olds than to 20-year-olds. Ask the students what a negative relationship between these two variables would mean!
Example 2: Scatterplots and Correlation For each of the scatterplots, which of the following correlation values most closely approximates the correlation of the data shown? –1 -0.9 -0.5 0 0.5 0.9 1
Left scatterplot: about 0.5 Right scatterplot about -0.9 Talk about what correlations of -1 and 1 look like on a scatterplot.
Quick Self-Quiz: Understanding Scatterplots
The scatterplot shows the relationship between number of rebounds in a season and free throw percentage for players in the National Basketball Association.

a). What can we say about a player represented in the top left of the scatterplot? The bottom right? Top left: High free throw percent, low number of rebounds. Bottom right: High number of rebounds, low FT percent.

b). Does the direction of association appear to be more positive or negative? Negative

c). Explain what the direction of association (positive or negative) means in this context. Players who are good at free throws tend to get fewer rebounds and players who get lots of rebounds tend to have a lower free throw percent.

d). Which value is the best guess at the correlation: -20, -2, -1, -0.9, -0.4, 0, 0.4, 0.9, 1, 2, 20? The actual correlation is r = -0.384 (closest to -0.4).

e). Which of the values listed in part (d) are impossible values for a correlation? -20, -2, 2, 20
f). Are there any outliers in the scatterplot? Yes, the player with FTPct = 0.87 and about 1100 rebounds appears to be an outlier. There may be others. This is a good opportunity to talk about outliers on a scatterplot.
Example 3: Correlation and Outliers
Use technology to find the correlation of the following dataset with n = 6 data values.

x   1   2   3   4   5   15
y   5   8   6   4   9   50

This is a good time to get the students comfortable using technology to find the correlation. The correlation here is r = 0.964. Now find the correlation for the same dataset, but with the outlier at (15, 50) removed (leaving n = 5 points). Comment on the effect of the outlier on the correlation. The correlation without the outlier is r = 0.305. Notice the very strong effect of the outlier on the correlation! You might want to draw the scatterplot with and without the outlier to show the strong visual effect it has as well.
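A sketch of the technology step, using numpy (an assumption; any package's correlation function returns the same values):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 15])
y = np.array([5, 8, 6, 4, 9, 50])

r_all = np.corrcoef(x, y)[0, 1]                    # with the outlier (15, 50): r = 0.964
r_no_outlier = np.corrcoef(x[:-1], y[:-1])[0, 1]   # outlier removed:           r = 0.305
print(round(r_all, 3), round(r_no_outlier, 3))
```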
Example 4: TV and Life Expectancy
For a sample of 20 countries, we have values for the average life expectancy (in years) and the prevalence of television sets (number of TVs per 1000 people). The results are displayed in the scatterplot below.

[Scatterplot: Life Expectancy vs. TVs per 1000 People for 20 labeled countries, with r = 0.74.]
a). Does there appear to be an association between the two variables? If so, is it positive or negative? There appears to be a strong positive association between life expectancy and TVs for countries.

b). Based on these results, comment on a proposal to send more TVs to countries with lower life expectancy to help people in those countries live longer. Remember: correlation does not imply causation! We can't infer that adding more TVs will cause people to live longer. Most likely, wealthier countries have better systems for health care and more TVs than poorer countries, so wealth of the country is a confounding variable that influences both of the other variables.
Section 2.6: Two Quantitative Variables: Linear Regression Example 1: Length of Baseball Games Baseball games can last a long time. How is the length of the game (measured in minutes) influenced by how many total runs are scored in the game? A scatterplot with regression line is shown, and the regression line is given by Time-hat = 158 + 2.53·Runs. (If interested, the data are available in BaseballTimes.) a). Use the regression line to predict the length of a game in which 5 runs are scored. Time-hat = 158 + 2.53(5) = 170.65. The game is predicted to last 170.65 minutes. Point out how to estimate this value on the scatterplot. b). In one game, 5 runs were scored and the total time was 189 minutes. Find the residual for this point. Residual = Observed – Predicted = 189 – 170.65 = 18.35. Point out how to "see" this value on the scatterplot. c). For the game described in part (b), circle the corresponding point on the scatterplot and also show the residual on the graph. See the dot and the line on the scatterplot. d). Interpret the slope in context. The slope is 2.53. If one more run is scored in the game, the time the game lasts is predicted to go up by 2.53 minutes. e). Interpret the intercept in context. The intercept is 158. If there are no runs scored in a baseball game, the game is predicted to last 158 minutes. (Although baseball fans will note that a game would have extra innings until at least one team scored.)
Example 2: Cricket Chirps and Temperature The chirp rate of crickets can be used to predict the outside temperature on a summer evening. The table shows chirp rate (in chirps per minute) and temperature (in °F) for 7 data points, and a scatterplot of the data is shown. (The data are also available in CricketChirps.)
Temperature (°F): 54.5  59.5  63.5  67.5  72.0  78.5  83.0
Chirps (per minute): 81  97  103  123  150  182  195
a). Use technology to find the regression line to predict the temperature from the chirp rate. Temperature-hat = 37.68 + 0.23·Chirps b). Predict the temperature if the chirp rate is 100 chirps per minute. Predicted temperature = 37.68 + 0.23(100) = 60.68°F. c). Interpret the slope of the regression line in context. The slope is 0.23. If the chirp rate goes up by 1 chirp per minute, the temperature is predicted to go up by 0.23°F.
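If students want to see the "technology" step outside of StatKey, the sketch below (an illustration only, assuming Python with scipy is available) fits the same line from the data in the table. The unrounded coefficients differ slightly from the rounded 37.68 and 0.23 used above, so the prediction prints about 60.7°F rather than 60.68°F.

```python
from scipy import stats

chirps = [81, 97, 103, 123, 150, 182, 195]           # chirps per minute
temps = [54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0]   # temperature in degrees F

fit = stats.linregress(chirps, temps)
print(f"Temperature-hat = {fit.intercept:.2f} + {fit.slope:.2f} * Chirps")

# Prediction for a chirp rate of 100 chirps per minute (part b)
print(f"predicted temperature: {fit.intercept + fit.slope * 100:.1f} F")
```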
Quick Self-Quiz: Using the Regression Line The time of day in which calories are consumed can affect weight gain. At least, that appears to be true in mice. Mice normally eat all their calories at night, but when mice ate some of their calories during the day (when mice are supposed to be sleeping), they gained more weight even though all the mice ate the same total amount of calories. The scatterplot shows the percent of calories eaten during the day, DayPct, and body mass gain in grams, BMGain, for a study involving 27 mice. The regression equation to predict body mass gain from percent of calories eaten during the day is BMGain-hat = 1.11 + 0.127·DayPct. a). Circle the dot that has the largest positive residual. b). Put a square around the dot that has the most negative residual. c). What is the predicted body mass gain for a mouse that eats 50% of its calories during the day? BMGain-hat = 1.11 + 0.127(50) = 7.46, so a mouse that eats 50% of its calories during the day is predicted to gain 7.46 grams. d). Find the residual for the mouse that ate 48.3% of its calories during the day and gained 5.82 grams. We first find the predicted body mass gain: BMGain-hat = 1.11 + 0.127(48.3) = 7.24. The residual is then: Residual = Observed – Predicted = 5.82 – 7.24 = –1.42. Have them find this point and its corresponding residual on the scatterplot. e). Interpret the slope of the regression line in context. The slope is 0.127. When a mouse eats one more percent of its calories during the day, its predicted body mass gain goes up by 0.127 grams. f). Interpret the intercept of the line in context, if it makes sense to do so. The intercept is 1.11. A mouse that eats 0% of its calories during the day (and all of them at night when a mouse normally eats all its food) is predicted to gain 1.11 grams. The intercept does make sense in this context.
Section 3.1: Sampling Distributions Example 1: Using Search Engines A 2012 survey of a random sample of 2,253 US adults found that 1,329 of them reported using a search engine (such as Google) every day to find information on the Internet. a). Find the relevant proportion and give the correct notation with it. b). Is your answer to part (a) a parameter or a statistic?
a). 𝑝̂ = 1329/2253 = 0.590
b). Statistic
c). Give notation for and define the population parameter that we estimate using the result of part (a).
p = the proportion of all US adults that would report that they use an Internet search engine every day Example 2: Number of Books Read in a Year A survey of 2,986 Americans ages 16 and older found that 80% of them read at least one book in the last year. Of these book readers, the mean number of books read in the last year is 17 while the median number of books read in the last year is 8. a). How many “book readers” (defined as reading at least one book in the past year) were included in the sample? Number of book readers is 0.80(2986) = 2388.8 or about 2389 book readers. b). Why might the mean and median be so different? Using the information given about the mean and median number of books read in a year, what is the likely shape of the distribution of number of books read in a year by book readers? Since the mean is so much larger than the median, it is likely that there are some very large outliers. The distribution is probably skewed to the right. c). Give the correct notation for the value “17” in the information above. Is this value a parameter or a statistic? It is a statistic and the correct notation for a sample mean is 𝑥̅ . d). Give notation for and define the population parameter that we estimate using the result of part (c).
μ = the mean number of books read last year by all Americans ages 16 and older who have read at least one book. [Emphasize that we are estimating the mean of the population.] Quick Self-Quiz: Parameters and Statistics For each of the following, state whether the quantity described is a parameter or a statistic, and give the correct notation. a). The proportion of all residents in a county who voted in the last presidential election. This is a parameter since we have information on all the residents, and the notation is p. b). The mean number of extracurricular activities from a random sample of 50 students at your school. This is a statistic since the mean is from a sample, and the notation is 𝑥̅. c). The mean grade assigned for all grades given out to undergraduates at your school. This is a parameter since the mean is for all grades, and the notation is μ. d). The difference in proportion who have ever smoked cigarettes, between a sample of 500 people who are 60 years old and a sample of 200 people who are 25 years old. We use statistics since the proportions are from samples. The notation for the difference in sample proportions is 𝑝̂1 − 𝑝̂2.
Example 3: Proportion Never Married A sampling distribution is shown for the proportion of US citizens over 15 years old who have never been married, using the data from the 2010 US Census and random samples of size n = 500.
a). What does one dot in the dotplot represent? One dot represents the proportion of people who have never been married in one sample of 500 people. b). Use the sampling distribution to estimate the proportion of all US citizens over 15 years old who have never been married. Give correct notation for your answer. We are estimating the population parameter p and we know that the sampling distribution is centered at the population parameter, so we estimate that p ≈ 0.32. c). If we take a random sample of 500 US citizens over 15 years old and compute the proportion of the sample who have never been married, indicate how likely it is that we will see that result for each sample proportion below. We see how likely the given sample proportion is to occur in a sample of size 500. 𝑝̂ = 0.30: Likely to occur. 𝑝̂ = 0.20: Very unlikely to occur. 𝑝̂ = 0.37: Not very likely but possible. 𝑝̂ = 0.74: VERY unlikely! The idea is just to get the students thinking about what is likely to happen by random chance. d). Estimate the standard error of the sampling distribution. We use the 95% rule to give a rough estimate of SE ≈ 0.02. e). If we took samples of size 1000 instead of 500, and used the sample proportions to estimate the population proportion: Would the estimates be more accurate or less accurate? More accurate Would the standard error be larger or smaller?
Smaller
Quick Self-Quiz: Effect of Sample Size Three different sampling distributions A, B, and C are given for a population with mean 50. One corresponds to samples of size n = 25, one to samples of size n = 100, and one to samples of size n = 400. Match the sampling distributions with the three sample sizes, and estimate the standard error for each. We know that the standard error goes down as the sample size goes up, so we match distribution A with n = 100, B with n = 25, and C with n = 400. We give a rough estimate of the standard error in each case using the 95% rule: A: SE ≈ 2, B: SE ≈ 4, C: SE ≈ 1
Section 3.2: Understanding and Interpreting Confidence Intervals Example 1: Adopting a Child in the US A survey of 1,000 American adults conducted in January 2013 stated that "44% say it's too hard to adopt a child in the US." The survey goes on to say that "The margin of sampling error is +/- 3 percentage points with a 95% level of confidence." a). What is the relevant sample statistic? Give appropriate notation and the value of the statistic. 𝑝̂ = 0.44 b). What population parameter are we estimating with this sample statistic? p = the proportion of all American adults who say it is too hard to adopt a child in the US c). Use the margin of error to give a confidence interval for the estimate. 0.44 ± 0.03 gives an interval from 0.41 to 0.47 d). Is 0.42 a plausible value of the population proportion? Is 0.50 a plausible value?
0.42 lies in the interval so is a plausible value, but 0.50 is not a plausible value. Example 2: Budgets of Hollywood Movies A sampling distribution is shown for budgets (in millions of dollars) of all movies to come out of Hollywood in 2011, using samples of size n = 20. We see that the standard error is about 10.232. Find the following sample means in the distribution and use the standard error of 10.232 to find the 95% confidence interval given by each of the sample means listed. Indicate which of the confidence intervals successfully capture the true population mean of 53.481 million dollars. 𝑥̅ = 40: 40 ± 2(10.23) gives 19.54 to 60.46. Does contain population mean. 𝑥̅ = 70: 70 ± 2(10.23) gives 49.54 to 90.46. Does contain population mean. 𝑥̅ = 84: 84 ± 2(10.232) gives 63.54 to 104.46. Does not contain population mean. [Note that 𝑥̅ = 84 is quite far in the tail of the sampling distribution.]
Quick Self-Quiz: Constructing Confidence Intervals For each of the following, use the information to construct a 95% confidence interval and give notation for the quantity being estimated. a). 𝑝̂ = 0.72 with standard error 0.04 Estimating p. Interval is 0.72 ± 2(0.04), giving 0.64 to 0.80. b). 𝑥̅ = 27 with standard error 3.2. Estimating μ. Interval is 27 ± 2(3.2), giving 20.6 to 33.4. c). 𝑝̂1 − 𝑝̂2 = 0.05 with margin of error for 95% confidence of 0.02. Estimating the difference in proportions 𝑝1 − 𝑝2. Interval is 0.05 ± 0.02, giving 0.03 to 0.07.
Example 3: Biomass in Tropical Forests Using a sample of 4079 inventory plots, scientists give a 95% confidence interval of 9,600 to 13,600 tons for the mean amount of carbon per square kilometer in tropical forests. Clearly interpret the meaning of this confidence interval. We are 95% sure that the mean amount of carbon per square kilometer in all tropical forests is between 9,600 and 13,600 tons.
Example 4: Is the Economy a Top Priority? A survey of 1,502 Americans in January 2012 found that 86% consider the economy a "top priority" for the president and congress. The standard error for this statistic is 0.01. Find and interpret a 95% confidence interval for the true proportion of all Americans that considered the economy a "top priority" at that time. A 95% confidence interval is 0.86 ± 2(0.01), which is 0.84 to 0.88. We are 95% sure that the proportion of all Americans that considered the economy a top priority at that time is between 84% and 88%.
Quick Self-Quiz: Interpreting a Confidence Interval Using a sample of 24 deliveries described in "Diary of a Pizza Girl" on the Slice website, we find a 95% confidence interval for the mean tip given for a pizza delivery to be $2.18 to $3.90. Which of the following is a correct interpretation of this interval? Indicate all that are correct interpretations. a). I am 95% sure that all pizza delivery tips will be between $2.18 and $3.90. Incorrect. The interval is about the mean, not individual tips. b). 95% of all pizza delivery tips will be between $2.18 and $3.90. Incorrect. The interval is about the mean, not individual tips. c). I am 95% sure that the mean pizza delivery tip for this sample will be between $2.18 and $3.90. Incorrect. The interval is about the mean of the population, not a sample mean. d). I am 95% sure that the mean tip for all pizza deliveries in this area will be between $2.18 and $3.90. Correct! e). I am 95% sure that the confidence interval for the mean pizza delivery tip will be between $2.18 and $3.90. Incorrect. The confidence is in where the population mean is, not where the interval itself is.
Section 3.3: Constructing Bootstrap Confidence Intervals Example 1: Textbook Prices Prices of a random sample of 10 textbooks (rounded to the nearest dollar) are shown: $132 $87 $185 $52 $23 $147 $125 $93 $85 $72 a). What is the sample mean? 𝑥̅ = 100.1 b). Describe carefully how we could use cards to create one bootstrap statistic from this sample. Be specific. We use 10 cards and write the 10 sample values on the cards. We then mix them up and draw one and record the value on it and put it back. Mix them up again, draw another, record the value, and put it back. Do this 10 times. Then compute the sample mean of this bootstrap sample. c). Where will the bootstrap distribution be centered? What shape do we expect it to have?
It will be centered approximately at the sample mean of 100.1 and we expect it to be bell-shaped.
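The card procedure above maps directly onto a computer simulation. The sketch below (an illustration only, assuming Python with numpy rather than the StatKey tool used in the text; the random seed is arbitrary) draws resamples with replacement and summarizes the bootstrap distribution.

```python
import numpy as np

rng = np.random.default_rng(1)   # seed chosen arbitrarily for reproducibility
prices = np.array([132, 87, 185, 52, 23, 147, 125, 93, 85, 72])

# One bootstrap statistic = mean of a resample of size 10, drawn with replacement
boot_means = np.array([rng.choice(prices, size=prices.size, replace=True).mean()
                       for _ in range(5000)])

print("center of bootstrap distribution:", round(boot_means.mean(), 1))  # near 100.1
print("estimated standard error:        ", round(boot_means.std(), 1))
print("95% percentile interval:         ", np.percentile(boot_means, [2.5, 97.5]))
```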
Example 2: Reese’s Pieces We wish to estimate the proportion of Reese’s Pieces that are orange, and we have one package of Reese’s Pieces containing 55 pieces. Describe carefully how we can use this one sample to create a bootstrap statistic. Be specific. Mix up the pieces and pull one out and record whether or not it is orange and then put it back. Mix them up again, pull one out again and record whether or not it is orange. Put it back and continue the process until we have recorded the result for 55 pieces sampled this way. Compute the proportion of orange candies from this bootstrap sample.
Quick Self-Quiz: Bootstrap Samples A sample consists of the following values: 8, 4, 11, 3, 7. Which of the following are possible bootstrap samples from this sample? a). 8, 3, 7, 11 No because the sample size does not match. b). 4, 11, 4, 3, 3
Yes.
c). 3, 4, 5, 7, 8
No, because 5 is not in the original sample.
d). 7, 8, 8, 3, 4
Yes.
Example 3: Global Warming What percentage of Americans believe in global warming? A survey of 2,251 randomly selected individuals conducted in October 2010 found that 1,328 answered Yes to the question "Is there solid evidence of global warming?" A bootstrap distribution for this data is shown. Use the information there to give and interpret a 95% CI for the proportion of Americans who believe there is solid evidence of global warming.
The sample proportion is 0.590 and the standard error from the bootstrap distribution is 0.010, so we compute the 95% confidence interval using 0.590 ± 2(0.010), giving an interval of 0.57 to 0.61. We are 95% confident that the proportion of Americans who believe there is solid evidence of global warming is between 0.57 and 0.61.
Example 4: Global Warming by Political Party Does belief in global warming differ by political party? When the question "Is there solid evidence of global warming?" was asked, the sample proportion answering "yes" was 79% among Democrats and 38% among Republicans. A bootstrap distribution for the difference in proportions (𝑝̂D − 𝑝̂R) is shown (assuming samples of size 1000 from each party). Use the information there to give a 95% CI for the difference in proportions.
The sample difference in proportions is 0.79 – 0.38 = 0.41 and the standard error from the bootstrap distribution is 0.020, so we compute the 95% confidence interval using 0.41 ± 2(0.020), giving an interval of 0.37 to 0.45. We are 95% confident that the proportion of Democrats who believe there is solid evidence of global warming is between 0.37 and 0.45 higher than the proportion of Republicans who believe this.
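A bootstrap distribution like the one described here can also be generated with a few lines of code. The sketch below is only an illustration (Python with numpy, assuming samples of 1000 from each party as stated above; the seed is arbitrary): resample each group with replacement and record the difference in sample proportions.

```python
import numpy as np

rng = np.random.default_rng(7)
dems = np.array([1] * 790 + [0] * 210)   # 79% "yes" among 1000 Democrats
reps = np.array([1] * 380 + [0] * 620)   # 38% "yes" among 1000 Republicans

# One bootstrap statistic: resample each group with replacement, take p-hat_D - p-hat_R
diffs = np.array([rng.choice(dems, 1000).mean() - rng.choice(reps, 1000).mean()
                  for _ in range(5000)])

se = diffs.std()
print(f"bootstrap SE is about {se:.3f}")                   # near 0.02
print(f"95% CI: {0.41 - 2*se:.2f} to {0.41 + 2*se:.2f}")   # about 0.37 to 0.45
```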
Section 3.4: Bootstrap Confidence Intervals using Percentiles To be done in a computer lab Example 1: Body Temperature Is normal body temperature really 98.6°F? A sample of body temperature for 50 healthy individuals was taken. Find this dataset in StatKey under "Confidence Interval for a Mean". a). What is the sample mean? What is the sample standard deviation? Mean = 98.26 and standard deviation = 0.765 b). Generate a bootstrap distribution, using at least 1000 simulated statistics. What is the standard error? SE ≈ 0.105. Answers will vary slightly with different simulations. c). Use the standard error to find a 95% confidence interval. Show your work. Is 98.6 in the interval? 𝑥̅ ± 2·SE gives 98.26 ± 2(0.105), or 98.05 to 98.47. We see that 98.6 is not in the interval. d). Using the same distribution, find a 95% confidence interval using the "Two-tail" option on StatKey. Give your confidence interval here: 98.05 to 98.47. Answers will vary slightly with different simulations. e). Compare the two 95% confidence intervals you found. Are they similar? Yes, very similar. f). Still using the same bootstrap distribution, give a 99% confidence interval: 97.98 to 98.54 g). Is the 99% confidence interval wider or narrower than the 95% confidence interval? Wider h). Clearly interpret the 99% confidence interval in context. We are 99% sure that the mean body temperature for all healthy individuals is between 97.98°F and 98.54°F.
Example 2: Do Ovulating Women Affect Men’s Speech? In a study (described in more detail in Exercise B.18) men were paired with a woman who was in either a fertile phase of her cycle or a not fertile phase. For men paired with a woman in a less fertile stage, 38 of the 61 men copied their partner’s sentence construction. For men paired with a woman at peak fertility, 30 of the 62 men copied their partner’s sentence construction. Use StatKey (or other technology) to find a 90% confidence interval for the difference in proportion matching a partner’s sentence construction between men paired with a not-fertile woman and those paired with a fertile woman. Is zero in the interval? Interpret the confidence interval. –0.0075 to 0.286. Answers will vary slightly with different simulations. We notice that, with this simulation, zero is (just barely) in the interval. We are 90% sure that the proportion of men matching sentence construction is between 0.0075 less and 0.286 more when the woman is not in a fertile phase of her cycle.
Example 3: Problems with Bootstrap Distributions If a bootstrap distribution is not relatively symmetric, it is not appropriate to use the methods of this chapter to construct a confidence interval. Consider the following data set: 5, 6, 7, 8, 25, 100 a). What is the standard deviation of this dataset?
s = 37.413
b). Use StatKey to create a bootstrap distribution for the standard deviation of this dataset. Describe the distribution. Is the distribution symmetric and bell-shaped? It is not at all symmetric or bell-shaped. There are four separate sharp peaks and the distribution is strongly skewed to the right. c). Is it appropriate to use the methods of this section to find a bootstrap confidence interval for this standard deviation? No, we should only use the methods of this section when the bootstrap distribution is roughly symmetric. d). Discuss with a neighbor why the bootstrap distribution might look the way it does. There are two outliers in this dataset, and the standard deviation from a bootstrap sample will be dramatically different depending on how many copies of each outlier get selected.
Section 4.1: Introducing Hypothesis Tests Example 1: Extrasensory Perception (ESP) In an ESP test, one person writes down one of the letters A, B, C, D, or E and tries to telepathically communicate the choice to a partner. The partner then tries to guess what letter was selected. If there is no ESP and people are just randomly guessing from among the five choices, what proportion of guesses would be correct? a). If no ESP, we expect p = ______ p = 0.2 (since there are five choices and they are randomly guessing) b). Which sample proportion correct would provide the greatest evidence that people have ESP: (If we assume the sample size is the same in every case.) 𝑝̂ = 0.1
𝑝̂ = 0.25
𝑝̂ = 0.4
𝑝̂ = 0.7 would provide the greatest evidence, since ESP means more correct guesses
c). Write down the null and alternative hypotheses for testing whether people have ESP: H0: p = 0.2 Ha: p > 0.2
where p is the proportion correct for all people’s guesses since we are looking for evidence that the proportion is significantly above 0.2 (random guesses)
Example 2: Sleep vs Caffeine for Memory In an experiment, students were given words to memorize, then were randomly assigned to either take a 90 minute nap or take a caffeine pill. A couple hours later, they were tested on their recall ability. We wish to test to see if the sample provides evidence that there is a difference in mean number of words people can recall depending on whether they take a nap or have some caffeine. What are the null and alternative hypotheses for this test? H0: μ1 = μ2 Ha: μ1 ≠ μ2
where μ1 and μ2 are the mean words recalled in the two different conditions since we are looking for evidence that the means are different
Quick Self-Quiz: Writing hypotheses Write down the hypotheses for the test in each case below: a). Does the proportion of people who support gun control differ between males and females? H0: pm = pf Ha: pm ≠ pf
where pm and pf are the proportions supporting gun control for males and females, respectively since we are looking for evidence that the proportions are different
b). Is the average hours of sleep per night for college students less than 7? H0: μ = 7 Ha: μ < 7
where μ is the average number of hours of sleep at night for college students since we are looking for evidence that the mean is less than 7
Example 1 Revisited: Statistical Significance and ESP If the results of a test for ESP show evidence against the null hypothesis and in support of the alternative hypothesis, what does that mean in terms of ESP? It means we can conclude that p > 0.2 and that the sample results were so strong that we can generalize to the population of all guesses that people have ESP and get more right than would be expected by random chance. If the results do not show enough evidence to refute the null hypothesis, what does that mean in terms of ESP? The sample results are inconclusive. People may or may not have ESP. Sample results could be just random chance. Which sample proportion correct is most likely to show evidence against the null hypothesis and in support of the alternative hypothesis? (If we assume the sample size is the same in every case.) 𝑝̂ = 0.1, 𝑝̂ = 0.25, 𝑝̂ = 0.4, or 𝑝̂ = 0.7? The answer is 𝑝̂ = 0.7, since it is way bigger than 0.2. Which sample proportion correct is least likely to show evidence against the null hypothesis and in support of the alternative hypothesis? (If we assume the sample size is the same in every case.) The answer is 𝑝̂ = 0.1, since it is even below what we would get just by randomly guessing and gives no evidence that p > 0.2.
Example 2 Revisited: Statistical Significance for Sleep vs Caffeine If the results of the test comparing sleep and caffeine for memory show evidence against the null hypothesis and in support of the alternative hypothesis, what does that mean in terms of sleep, caffeine, and memory? It means the sample results are so clear that we can generalize to the population and state that there is a difference between sleep and caffeine in their effectiveness at helping word recall. If the results do not show enough evidence to refute the null hypothesis and support the alternative hypothesis, what does that mean in terms of sleep, caffeine, and memory? It means the sample results are inconclusive and we can’t tell if there is a difference. Results might just be random chance.
Quick Self-Quiz: Statistical Significance A sample of 50 cans of tomatoes are tested for levels of the chemical BPA to see if there is evidence that the mean level is greater than 100 ppb (parts per billion). Write down the hypotheses for this test: H0: μ = 100 where μ is the average BPA in canned tomatoes Ha: μ > 100 since we are looking for evidence that the mean is greater than 100 Give a possible sample mean that you think would show strong evidence against the null hypothesis and in support of the alternative hypothesis: 𝑥̅ = any value that is much larger than 100. Make it clear that we are not just determining whether the sample mean is larger than 100, but are looking for such strong evidence that we can generalize to the population mean. Give a possible sample mean that would definitely not show evidence against the null hypothesis: 𝑥̅ = any value that is equal to or less than 100. If the sample mean is less than or equal to 100, it provides no evidence that the population mean is greater than 100.
Section 4.2: Measuring Evidence with P-values Example 1: Paul the Octopus During the 2010 World Cup tournament, Paul the Octopus (in a German aquarium) became famous for correctly predicting the winner in all 8 games it was asked to predict. (Two containers of food were lowered into Paul’s tank, each with a flag of the opposing teams. He made a selection by choosing which container to eat from. Check out the video on YouTube!) Is this evidence that Paul has psychic powers and can choose correctly more than half the time? State the null and alternative hypotheses: What is Paul’s sample proportion?
H0: p = 0.5 vs Ha: p > 0.5 where p = proportion of all guesses Paul would get correct
p̂ = 8/8 = 1.0
We want to see how unlikely it is to have an 8 for 8 record if Paul is just randomly guessing. We can simulate this with a coin! Each coin flip represents a guess between two teams, with "heads" standing for a correct guess and "tails" for incorrect. Why does this method work for assuming the null hypothesis is true? Because the null hypothesis is p = 0.5, and a coin lands heads 50% of the time. Flipping a coin 8 times and recording the proportion of heads counts as one simulation. A randomization distribution of 1000 such simulations is shown. Out of these 1000 simulations, the simulated sample proportion was 1 on 4 of them. What is the p-value for the sample statistic of 1? The p-value is the proportion of area in the tail as extreme as the sample statistic (1 in this case), which we can see is very small. Out of 1000 simulations, only 4 of them are as extreme as the sample statistic of 1.0, so the p-value = 4/1000 = 0.004.
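The coin-flip simulation is also easy to run on a computer if physical coins are impractical. The sketch below is an illustration only (Python with numpy, arbitrary seed), not the StatKey distribution referenced above: each simulated "tournament" is 8 random guesses with success probability 0.5.

```python
import numpy as np

rng = np.random.default_rng(2010)

# Each simulation: number correct out of 8 guesses with p = 0.5, as a proportion
props = rng.binomial(n=8, p=0.5, size=100_000) / 8

p_value = np.mean(props >= 1.0)   # proportion of simulations as extreme as 8 out of 8
print(f"estimated p-value: {p_value:.4f}")   # theoretical value is 0.5**8, about 0.0039
```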
Quick Self-Quiz: Visualizing a P-value If a different octopus only got 6 correct out of 8, for a sample proportion of 𝑝̂ = 0.75, use the randomization distribution above to shade in the simulated samples that are as extreme or more extreme than this sample proportion. How would we calculate the p-value for this sample result? Will the p-value be more than or less than the p-value for Paul the Octopus? We shade in the bars from 0.75 to the right, which includes three bars (counting the little one at 1) so the p-value is definitely greater than the 0.004 we found for Paul.
Example 2: Support for the Death Penalty In 1980 and again in 2010, a Gallup poll asked a random sample of 1000 US citizens “Are you in favor of the death penalty for a person convicted of murder?”. In 1980, the proportion saying yes was 0.66. In 2010, it was 0.64. Does this data provide evidence that the proportion of US citizens favoring the death penalty was higher in 1980 than it was in 2010? Use p1 for the proportion in 1980 and p2 for the proportion in 2010. State the null and alternative hypotheses: What is the sample statistic? (continued on reverse)
H0: p1 = p2 vs Ha: p1 > p2
p̂1 - p̂2 = 0.66 – 0.64 = 0.02
A randomization distribution assuming the null hypothesis is true is shown. Which of the following is closest to the p-value? 0.001, 0.05, 0.20, 0.5 The p-value is the proportion of dots in the area indicated, which is closest to 0.20.
Example 3: Sleep or Caffeine for Memory In an experiment, students were given words to memorize, then were randomly assigned to take a 90 minute nap or take a caffeine pill. They were then tested on their recall ability. We test to see if the sample provides evidence that there is a difference in mean number of words people can recall depending on whether they take a nap or have some caffeine. The hypotheses are: H0: μs = μc vs Ha: μs ≠ μc. The sample statistic is 𝑥̅s − 𝑥̅c = 3.0. Use the randomization distribution to state the p-value. We see in the image that the proportion in the tail beyond the sample statistic of 3.0 is 0.022. Because this is a two-tail test, we have to account for both tails, so the p-value is 2(0.022) = 0.044.
Quick Self-Quiz: P-values from Randomization Distributions To test H0: μ = 50 vs Ha: μ < 50 using sample data with 𝑥̅ = 43.7: Where will the randomization distribution be centered? Why? At 50, since we must assume the null hypothesis is true when we create the randomization distribution. Is this a left-tail test, a right-tail test, or a two-tail test? It is a left-tail test, since the alternative hypothesis is μ < 50. How can we find the p-value once we have the randomization distribution? We see how extreme the sample statistic of 43.7 is in the left tail of the randomization distribution.
Section 4.3: Determining Statistical Significance Quick Self-Quiz: Which P-value shows more evidence? In each case, which p-value provides the strongest evidence against H0 and for Ha? a). p-value = 0.95 or p-value = 0.02? The p-value of 0.02. b). p-value = 0.008 or p-value = 0.02? The p-value of 0.008. The smaller the p-value, the stronger the evidence!
Example 1: Red Wine and Weight Loss Resveratrol, an ingredient in red wine and grapes, has been shown to promote weight loss in animals. In one study, a sample of lemurs had various measurements taken before and after receiving resveratrol supplements for 4 weeks. For each p-value given, indicate the formal generic conclusion as well as a conclusion in context. Use a 5% significance level. a). In the test to see if the mean resting metabolic rate is higher after treatment, the p-value is 0.013.
Reject H0. There is evidence that metabolism is higher after receiving resveratrol. b). In the test to see if the mean body mass is lower after treatment, the p-value is 0.007.
Reject H0: There is strong evidence that body mass is lower after receiving resveratrol. c). In the test to see if locomotor activity changes after treatment, the p-value is 0.980.
Do not reject H0. The data does not provide any evidence that resveratrol affects activity level. d). In the test to see if mean food intake changes after treatment, the p-value is 0.035.
Reject H0: There is evidence that food intake is different after receiving resveratrol. e). Which of the results given in (a) – (d) above are significant at a 1% level?
Only the result in (b) on body mass. That p-value of 0.007 is very small and is significant at the 1% level. Example 2: Multiple Sclerosis and Sunlight It is believed that sunlight offers some protection against multiple sclerosis, but the reason is unknown. Is it the vitamin D, the UV light, or something else? In an experiment, mice were injected with a substance to give them MS and were randomly assigned to either a control group (with no treatment), a group that received vitamin D supplements, or a group that got exposed regularly to UV light. The scientists found that mice exposed to UV light were significantly less likely to get MS than the control mice, but that vitamin D did not seem to reduce the likelihood of getting MS compared to the control group. For these two tests, one of the p-values was 0.470 and one was 0.002. Which p-value goes with which test? Also, for each test, indicate whether we “Reject H0” or “Do not reject H0”.
UV light vs control: p-value = 0.002, Reject H0 (significant results were found in this study and there was evidence of a difference) Vitamin D vs control: p-value = 0.470, Do not reject H0 (The results of this study were not significant and we did not find evidence of a difference) Quick Self-Quiz: Determining significance We give p-values for four different tests: Test A: p-value = 0.23, Test B: p-value = 0.008, Test C: p-value = 0.03, Test D: p-value = 0.094. a). Which of the tests are significant at a 10% level? At a 5% level? At a 1% level?
10%: B, C, D
5%: B, C
1%: only B
b). Using a 5% significance level, give the formal conclusion for each test.
Test A: Do not reject H0
Test B: Reject H0
Test C: Reject H0
Test D: Do not reject H0
Quick Self-Quiz: Making Conclusions 1. In a hypothesis test of H0: μ = 18 vs Ha: μ > 18, we obtain a p-value of 0.016. Using α = 0.05, we conclude: a). Reject H0
b). Do not reject H0
c). Reject Ha
d). Do not reject Ha
Point out that options (c) and (d) are never viable options. Since the p-value of 0.016 is less than α = 0.05, the correct conclusion in question 1 is (a), Reject H0. 2. In a hypothesis test of H0: μ = 18 vs Ha: μ > 18, we obtain a p-value of 0.016. Using α = 0.05, we conclude: a). There is evidence that μ = 18
b). There is evidence that μ > 18
c). There is no evidence of anything
Point out that (a) is never a viable option. Since we reject H0, the correct conclusion in question 2 is (b): there is evidence that μ > 18. Example 3: Sugar in Bottled Iced Tea The nutrition label on a brand of iced tea says that the average amount of sugar per bottle is 25 grams. A chemical analysis of a sample of 30 bottles finds a mean of 33.8 grams of sugar per bottle. Test to see if this provides significant evidence that the true average is greater than 25. A randomization distribution for the test is shown, showing 1000 randomization statistics. [Show all details: state hypotheses, give notation and value of the sample statistic, use the randomization distribution to estimate the p-value, give a formal conclusion at a 5% level, and give a conclusion in context.] H0: μ = 25 where μ is mean grams of sugar for all bottles Ha: μ > 25 Statistic: 𝑥̅ = 33.8 p-value is proportion of statistics to the right of 33.8, which appears to be 3/1000 = 0.003. We estimate p-value = 0.003. (This is just an estimate, but students should know the p-value is small!) Formal conclusion: Reject H0 Conclusion in context: There is strong evidence that the mean number of grams of sugar in bottles of this iced tea is greater than 25.
More Examples: Playing with StatKey In each case, state the hypotheses, use StatKey to generate the randomization distribution, and find the p-value. Make a conclusion in the test. 1. Is there evidence of a difference in mean time spent exercising between males and females? In the sample, we have a mean of 12.4 hours a week for 20 males and a mean of 9.4 hours a week for 30 females. H0: μm = μf vs Ha: μm ≠ μf We see on StatKey that the randomization distribution is centered at the null parameter of zero. The sample difference in means of 3.0 in this two-tail test gives us a p-value of 2(0.108) = 0.216. This is a large p-value, so we do not reject H0 and we do not find evidence of a difference in mean number of hours spent exercising between males and females. 2. Is there evidence of a negative correlation between blood pressure and heart rate? In a sample of 200 patients, we found a sample correlation of -0.057. H0: ρ = 0 vs Ha: ρ < 0 We see on StatKey that the randomization distribution is centered at the null parameter of zero. The sample correlation of -0.057 in this left-tail test gives us a p-value of 0.221. This is a large p-value, so we do not reject H0 and we do not find evidence of a negative correlation between blood pressure and heart rate.
Section 4.4: A Closer Look at Testing Example 1: BPA in Tomato Soup A consumer protection agency is testing a sample of cans of tomato soup from a company. If they find evidence that the average level of the chemical bisphenol A (BPA) in tomato soup from this company is greater than 100 ppb (parts per billion), they will recall all the soup and sue the company. a). State the null and alternative hypotheses. This is a test for a single mean. The hypotheses are H0: μ = 100 vs Ha: μ > 100
b). What does a Type I error mean in this situation. A Type I error means the company's mean is within normal bounds of 100 (the null hypothesis is true) but the sample obtained happens to show (incorrectly) that the mean is too high and the agency ends up recalling all the soup and suing the company when it shouldn't have.
c). What does a Type II error mean in this situation. A Type II error means the company’s mean is too high (the null hypothesis is false) but the sample obtained doesn’t give sufficient evidence to show that it is too high and the agency (incorrectly) decides not to recall the soup or sue the company.
d). Which is more serious, a Type I error or a Type II error? (There is no right answer to this one. It is a matter of opinion and one could argue either way.) Both seem pretty serious so you really want to try to not make an error. (Good time to remind them of the benefits of a larger sample size!)
Example 2: Analogy to Law A hypothesis test is similar to the way our legal system works. For each underlined word or phrase below, give the analogy in a statistical test. A person is innocent until proven guilty. Innocent: H0
Guilty: Ha
Evidence must be beyond a shadow of a doubt. Evidence: the data! (quantified by the p-value) Shadow of a doubt: Significance level There are two kinds of errors: Convicting an innocent person Releasing a guilty person
Convicting an innocent person: Type I error Releasing a guilty person: Type II error
Example 3: Vitamin E and Heart Attacks? Suppose 100 tests are conducted to determine whether taking vitamin E increases one's chances of having a heart attack. Suppose also that vitamin E has absolutely no effect on one's likelihood of having a heart attack. The tests will use a 5% significance level. (a) How many of the tests are likely to show significance, just by random chance? 5% of the 100 tests, or 0.05(100) = 5 tests (Remember that the significance level gives the probability of making a Type I error, so about 5% of the 100 tests will make a Type I error. In this case, that means showing significance when there really is nothing significant.) (b) If only the significant tests are reported, what is the only information the public is likely to hear? The public will hear the false information that vitamin E causes heart attacks! Emphasize that all significant tests should be replicated in further tests before we are confident in the results.
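A quick simulation can make the "5 significant tests by chance" point concrete. The sketch below is only an illustration (Python with scipy; it uses a generic two-sample t-test on made-up data as a stand-in for the actual heart-attack studies, and the seed is arbitrary): both groups come from the same distribution, so every rejection is a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(100):                 # 100 independent studies, no real effect
    control = rng.normal(0, 1, 50)
    vitamin_e = rng.normal(0, 1, 50)  # same distribution: vitamin E does nothing
    if stats.ttest_ind(control, vitamin_e).pvalue < 0.05:
        false_positives += 1

print(false_positives, "of 100 tests were 'significant' by chance alone (expect about 5)")
```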
Quick Self-Quiz: Experimenting with Sample Size on StatKey Suppose that we are testing a coin to see if it is fair, so our hypotheses are H0: p = 0.5 vs Ha: p ≠ 0.5. In each of (a) and (b) below, use the “Edit Data” option on StatKey to find the p-value for the sample results and give a conclusion in the test. (a) We get 56 heads out of 100 tosses. The p-value is about 0.14. An outcome of 56 heads in 100 tosses is relatively likely to happen by random chance, and we do not have evidence that the coin is not fair.
(b) We get 560 heads out of 1000 tosses. The p-value is very small, close to zero. An outcome of 560 heads in 1000 tosses is very unlikely to happen just by random chance with a fair coin, so we have strong evidence that the coin is not fair.
(c) Compare the sample proportions in parts (a) and (b). Compare the p-values. Why are the p-values so different? The sample proportions are the same, 0.56 in both (a) and (b). The p-values are very different: 0.14 (not at all significant) to 0.000 (very significant!) The difference is due to the sample size. Sample size is very important in statistics, and a larger sample size can help us find significant results, such as a biased coin, if the coin really is biased.
Section 4.5: Making Connections Example 1: Normal Human Body Temperature We find a 95% confidence interval for mean body temperature to be 98.05 to 98.47. What is the conclusion of a test of H0: μ = 98.6 vs Ha: μ ≠ 98.6? What significance level is used in making the conclusion? The value 98.6 is not inside the confidence interval, so 98.6 is not a plausible value for μ and we reject H0. There is evidence that mean body temperature is not 98.6°F. The significance level used is 5%, since the confidence level used was 95% for the interval. (Note: There is a great image in the text and also provided in the PowerPoint slides of a bootstrap distribution and randomization distribution together for this example. It provides a great visual display of the connection between plausible values for μ in the bootstrap distribution and sample statistics for which we do not reject the null hypothesis. Show this image if possible!)
Example 2: Happy Family? The Pew Research Center asked a random sample of US adults age 18 to 29 "Does a child need both a father and a mother to grow up happily?" A 95% confidence interval is given below for p, the proportion of all US adults age 18 to 29 who say yes. Use the interval to state the conclusion to a hypothesis test of H0: p = 0.5 vs Ha: p ≠ 0.5. (a) In 2010, the 95% confidence interval was 0.487 to 0.573. Since 0.5 is in the confidence interval 0.487 to 0.573, and thus is a plausible value for p, we do not have evidence against the null hypothesis so we do not reject H0. At a 5% level, we do not have evidence in 2010 that the proportion is different from 0.5. (b) In 1997, the 95% confidence interval was 0.533 to 0.607. Since 0.5 is not in the confidence interval 0.533 to 0.607, and thus is not a plausible value for p, we do have evidence against the null hypothesis, so we reject H0. At a 5% level, we have evidence that the proportion in 1997 is different from 0.5. Quick Self-Quiz: Intervals and Tests Using the confidence interval given, indicate the conclusion of the test and indicate the significance level used. (a) A 95% confidence interval for a mean is 12.5 to 17.1. Testing H0: μ = 18 vs Ha: μ ≠ 18. 18 is outside the interval so is not a plausible value for μ, so we reject H0 at a 5% level. (b) A 90% confidence interval for a proportion p is 0.62 to 0.80. Testing H0: p = 0.65 vs Ha: p ≠ 0.65. 0.65 is inside the interval so is a plausible value for p, so we do not reject H0 at a 10% level. (c) A 99% confidence interval for a difference in proportions is –0.10 to 0.20. Testing H0: p1 = p2 vs Ha: p1 ≠ p2. A difference of 0 (no difference in proportions) is inside the interval, so is a plausible value for p1 – p2, so we do not reject H0 at a 1% level.
Example 3: Evaluating Drugs to Fight Cocaine Addiction In a randomized experiment on treating cocaine addiction, 48 cocaine addicts who were trying to quit were randomly assigned to take either desipramine (a new drug), or Lithium (an existing drug). The response variable is whether or not the person relapsed (which means the person was unable to break out of the cycle of addiction and returned to using cocaine.) We are testing to see if desipramine is better than lithium at treating cocaine addiction. The results are shown in the two-way table. Desipramine Lithium Total
Relapse 10 18 28
No relapse 14 6 20
Total 24 24 48
(a) Using pD for the proportion of desipramine users who relapse and pL for the proportion of lithium users who relapse, write the null and alternative hypotheses. [Example continued on reverse.] H0: pD = pL vs Ha: pD < pL
(b) Compute the appropriate sample statistic. We see that 𝑝̂D = 10/24 = 0.417 and 𝑝̂L = 18/24 = 0.75, so we have 𝑝̂D − 𝑝̂L = 0.417 − 0.75 = −0.333 (c) We compute a randomization statistic by assuming the null hypothesis is true. What does that mean in this case? It means that the two proportions are equal and the drug has no effect on the relapse rate. It doesn't matter what drug is taken. (d) How might we compute a randomization sample for this data? What statistic would we compute as the randomization statistic? Since drug doesn't matter, we combine all 48 patients together and see that 28 relapsed and 20 didn't. To see what happens by random chance, we randomly divide them into two groups and compute the difference in proportions of relapses between the two groups. The difference in proportions is the statistic. (e) We can use StatKey to generate a randomization dotplot for the difference in proportions based on this sample and what we might see by random chance if the null hypothesis is true. Describe the resulting distribution. Where is it centered? The resulting distribution will be bell-shaped and centered at the null hypothesis value, which is zero. (f) How extreme is the sample statistic from part (b) in the randomization distribution? This tells us how unlikely the sample data is if the null hypothesis is true (which is the p-value!) Use the sample statistic calculated in (b) to find the p-value for this test. Use the p-value to make a conclusion. This is a left-tail test, and we see on StatKey that the p-value is about 0.016. We reject the null hypothesis and conclude that desipramine is significantly better at helping people kick the cocaine habit.
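The "re-deal the 48 patients into two groups" step in part (d) can also be simulated directly. The sketch below is an illustration only (Python with numpy rather than StatKey; the seed is arbitrary), and the p-value it prints will be close to, but not exactly, the 0.016 quoted above.

```python
import numpy as np

rng = np.random.default_rng(5)
outcomes = np.array([1] * 28 + [0] * 20)   # pooled results: 1 = relapse, 0 = no relapse
observed = 10/24 - 18/24                   # observed difference in proportions, -0.333

diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(outcomes)   # re-deal the 48 outcomes into two groups of 24
    diffs.append(shuffled[:24].mean() - shuffled[24:].mean())

p_value = np.mean(np.array(diffs) <= observed)   # left-tail test
print(f"randomization p-value is about {p_value:.3f}")   # roughly 0.02
```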
Example 4: Normal Human Body Temperature, Revisited Normal human body temperature is generally considered to be 98.6°F. We wish to test to see if there is evidence that mean body temperature is different from 98.6°F. We collect data from a random sample of 50 people and find 𝑥̅ = 98.26. a). State the null and alternative hypotheses. H0: μ = 98.6 vs Ha: μ ≠ 98.6 b). A randomization distribution requires that the null hypothesis is true. What does that mean in this case? It means that the population mean for the simulated samples must be 98.6. c). Brainstorm: How can we use the data in the sample as much as possible while also forcing the null hypothesis to be true? Ask the students to think about this: we need to use the 50 data values that we have, while also somehow forcing the mean to be 98.6. How can we do that? They might come up with this on their own: we shift all the data values up by 0.34 to use the data (same sample size and same spread) while also forcing the mean to be 98.6. d). Use StatKey to create a randomization distribution for this test, and then use it to find the p-value. Use the p-value to make a conclusion in the test. Notice that the distribution is centered at 98.6 as it should be. We see how extreme the sample statistic of 98.26 is in the tail of the randomization distribution and we remember to double it since this is a two-tail test. We see that the p-value is very small, so even doubling it, we still get a p-value very close to zero. There is very strong evidence that average human body temperature is not 98.6°F.
Section 5.1: Hypothesis Tests using Normal Distributions Example 1: Find the specified areas for a standard normal distribution, and sketch the area. (We won’t bother to include the sketches here but encourage students to draw them.) (a) The area to the left of z = –0.8 0.212 (b) The area to the right of z = 1.2 0.115
Quick Self-Quiz: Standard Normal Distribution Find the specified areas for a standard normal distribution, and sketch the area. (a) The area to the right of z = 2.58 0.0049 (You might want to point out that this is more than two and a half standard deviations above the mean, so it is not surprising that it is so small.) (b) The area to the left of z = –1.32 0.093
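If students have software other than a normal table or StatKey handy, these areas can be checked with the cumulative distribution function. A minimal sketch (Python with scipy assumed, not part of the worksheet):

```python
from scipy.stats import norm

print(norm.cdf(-0.8))      # area to the left of z = -0.8, about 0.212
print(1 - norm.cdf(1.2))   # area to the right of z = 1.2, about 0.115
print(1 - norm.cdf(2.58))  # about 0.0049
print(norm.cdf(-1.32))     # about 0.093
```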
Example 2: Multiple Ways to Find a P-value Suppose we want to test H0: p = 0.5 vs Ha: p > 0.5 using a sample proportion of 520 out of 1000, or 0.52.
• Use a randomization distribution to find the p-value: p-value is 0.102 (Answers will vary slightly.)
• What is the standard error from the randomization distribution? SE = 0.016. Model the randomization distribution with a normal distribution with mean 0.5 from the null hypothesis and standard deviation equal to this standard error. Find the p-value by finding the area to the right of 0.52 in this normal distribution. p-value is 0.106
• Find the standardized test statistic. Then find the p-value by finding the area beyond this standardized value in a standard normal distribution. z = (Sample statistic – Null parameter)/SE = (0.52 – 0.5)/0.016 = 1.25. This is a right-tail test and the p-value is 0.106.
• Compare the three p-values! They will be almost the same (up to round-off error.) Have students compare the shape of the curve and the area shaded also. The visual similarities should really help them see the connections!
page 2
Example 4: Is Divorce Morally Acceptable? In a study conducted by the Pew Foundation, we learn that 67% of women in a random sample view divorce as morally acceptable. Does this provide evidence that more than 60% of women view divorce as morally acceptable? The standard error for the estimate assuming the null hypothesis is true is 0.021. (a) What are the null and alternative hypotheses for this test?
H0: p = 0.6 Ha: p > 0.6
(b) What is the standardized test statistic? z = (Sample statistic – Null parameter)/SE = (0.67 – 0.6)/0.021 = 3.333 (c) Use the standard normal distribution to find the p-value. This is a right-tail test. We see that the p-value is 0.00043 (d) What is the conclusion of the test? The p-value is very small so we have strong evidence that more than 60% of all women view divorce as morally acceptable.
Example 5: Do Men and Women Differ in Opinions about Divorce? In the same study described above, we find that 71% of men view divorce as morally acceptable. Use this and the information in the previous example to test whether there is a significant difference between men and women in how they view divorce. The standard error for the difference in proportions under the null hypothesis that the proportions are equal is 0.029. (a) What are the null and alternative hypotheses for this test? H0: pM = pF Ha: pM ≠ pF (b) What is the standardized test statistic? z = (Sample statistic – Null parameter)/SE = ((0.71 – 0.67) – 0)/0.029 = 1.379
(c) Use the standard normal distribution to find the p-value. This is a two-tail test. We see that the p-value is 2(0.084) = 0.168 (d) What is the conclusion of the test? The p-value is larger than any reasonable significance level, so we do not find evidence of a difference between men and women in the proportion that view divorce as morally acceptable.
Quick Self-Quiz: Hypothesis Tests using the Normal Distribution A sample of baseball games shows that the mean length of the games is 179.83 minutes. (The data is given in BaseballTimes). The standard error is 3.75. Does this sample provide evidence that the mean length of time for baseball games is more than 170 minutes? Use the normal distribution and show all details of the test. H0: μ = 170 vs Ha: μ > 170 z = (Sample statistic – Null parameter)/SE = (179.83 – 170)/3.75 = 2.621 This is a right-tail test and the p-value is 0.0044. The p-value is very small so we have strong evidence that the average length of a baseball game is greater than 170 minutes.
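The same two-step calculation (standardize, then find the tail area) translates directly into code. A minimal sketch, assuming Python with scipy rather than the normal table or StatKey:

```python
from scipy.stats import norm

z = (179.83 - 170) / 3.75       # (sample statistic - null parameter) / SE
p_value = 1 - norm.cdf(z)       # right-tail test
print(f"z = {z:.3f}, p-value = {p_value:.4f}")   # z about 2.621, p-value about 0.0044
```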
Section 5.2: Confidence Intervals using Normal Distributions Example 1: Find endpoints on a standard normal distribution with the given property, and sketch the area. (a) The area between ± z is 0.95. -1.960 and 1.960 (b) The area between ± z is 0.80. -1.282 and 1.282
Quick Self-Quiz: Standard Normal Distribution Find endpoints on a standard normal distribution with the given property, and sketch the area. The area between ±z is 0.90 -1.645 and 1.645
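These endpoints come from the inverse normal (percentile) function: for a middle area of C, the upper endpoint is the (1 + C)/2 percentile. A minimal sketch assuming Python with scipy:

```python
from scipy.stats import norm

for C in [0.80, 0.90, 0.95, 0.99]:
    z_star = norm.ppf((1 + C) / 2)   # upper endpoint; the lower endpoint is its negative
    print(f"{C:.0%} confidence: z* = {z_star:.3f}")
# prints roughly 1.282, 1.645, 1.960, 2.576
```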
Example 2: Confidence Intervals Three Ways A survey conducted by the Pew Foundation found that 536 of 908 Twitter users say they use Twitter to get news. We wish to find a 95% confidence interval for the proportion of Twitter users who use it to get news. What is the sample statistic? __0.590__
• Use percentiles on a bootstrap distribution to find the 95% confidence interval. 0.559 to 0.621 (Answers may vary slightly.)
• We can model the bootstrap distribution with a normal distribution with mean equal to the sample statistic and standard deviation equal to the standard error of the bootstrap distribution. Give the mean __0.590__ and standard deviation __0.016__ for this normal distribution. Use percentiles on this normal distribution to find the 95% confidence interval. 0.559 to 0.621
• What is z* for a 95% confidence interval? __1.960__ Use the formula "Statistic ± z*·SE" to find the 95% confidence interval. 0.590 ± 1.960(0.016) = 0.590 ± 0.031, giving 0.559 to 0.621
• Compare the three answers. They are the same (up to minor variations in the simulations).
Example 3: Obesity in America In Chapter 3, we see that the mean BMI (Body Mass Index) for a large sample of US adults is 27.655. We are told that the standard error for this estimate is 0.009. If we use the normal distribution to find a 99% confidence interval for the mean BMI of US adults: (a) What is z*? 2.575 (b) Find and interpret the 99% confidence interval. Sample statistic ± z*·SE gives 27.655 ± 2.575(0.009), or 27.632 to 27.678. We are 99% confident that the mean BMI of all US adults is between 27.632 and 27.678.
Example 4: Obesity in America: Exercisers vs Non-exercisers Also in Chapter 3, we see that the difference in mean BMI between non-exercisers (those who said they had not exercised at all in the last 30 days) and exercisers (who said they had exercised at least once in the last 30 days) is 𝑥̅N − 𝑥̅E = 1.915, with a standard error for the estimate of SE = 0.016. If we use the normal distribution to find a 90% confidence interval for the difference in mean BMI between the two groups: (a) What is z*?
1.645
(b) Find and interpret the 90% confidence interval. Sample statistic ± z*·SE gives 1.915 ± 1.645(0.016), or 1.889 to 1.941. We are 90% confident that the mean BMI of non-exercisers is between 1.889 and 1.941 higher than the mean BMI of exercisers, among all US adults.
Quick Self-Quiz: Confidence Intervals using the Normal Distribution In a recent survey of 1000 US adults conducted in January 2013, 57% said they dine out at least once per week. The standard error for this estimate is 0.016. Use the normal distribution to find a 95% confidence interval for the proportion of US adults who dine out at least once per week. Interpret your answer. Sample statistic ± z*·SE gives 0.57 ± 1.960(0.016), or 0.539 to 0.601. We are 95% confident that the proportion of all US adults who would say they dine out at least once per week is between 0.539 and 0.601.
Section 6.1-CI: Confidence Interval for a Proportion Example 1: Movie Goers are More Likely to Watch at Home In a random sample of 500 movie goers in January 2013, 320 of them said they are more likely to wait and watch a new movie in the comfort of their own home. Find and interpret a 95% confidence interval for the proportion of movie goers who are more likely to watch a new movie from home. We see that 𝑝̂ = 320/500 = 0.64. Point out to the students that it is important to use the sample proportion in decimal form rather than percent form. The confidence interval is given by:
Statistic ± z*·SE
𝑝̂ ± z*·√(𝑝̂(1 − 𝑝̂)/n)
0.64 ± 1.96·√(0.64(1 − 0.64)/500)
0.64 ± 0.042
0.598 to 0.682
We are 95% sure that the proportion of all movie goers who are more likely to wait and watch a new movie at home is between 0.598 and 0.682.
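The formula above is a one-liner in code. A minimal sketch (Python assumed, not part of the worksheet) reproducing the interval for Example 1:

```python
from math import sqrt

p_hat, n, z_star = 320 / 500, 500, 1.96
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)   # z* times the standard error
print(f"{p_hat:.2f} +/- {margin:.3f}  ->  {p_hat - margin:.3f} to {p_hat + margin:.3f}")
# about 0.64 +/- 0.042, or 0.598 to 0.682
```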
Example 2: Sample Size and Margin of Error for Movie Goers (a) What is the margin of error for the confidence interval found in Example 1? Margin of error is 0.042. (b) What sample size is needed if we want a margin of error within ±2%? (Use the sample proportion from the original sample.) n = (z*/ME)²·p̃(1 − p̃) = (1.96/0.02)²·0.64(1 − 0.64) = 2212.76. We need a sample size of at least n = 2,213 to have a margin of error this small. This is substantially more than the sample size of 500 used in the actual survey. (c) What sample size is needed if we want a margin of error within ±2%, and if we use the conservative estimate of p = 0.5? n = (z*/ME)²·p̃(1 − p̃) = (1.96/0.02)²·0.5(1 − 0.5) = 2401. We need a sample size of at least n = 2,401 to have a margin of error this small. Notice that if we have less knowledge of the actual proportion, we need a larger sample size to arrive at the same margin of error.
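The sample-size formula is easy to wrap in a small helper so students can try other margins of error. A minimal sketch (Python assumed; the function name is ours, not from the text):

```python
from math import ceil

def sample_size(margin, p_guess, z_star=1.96):
    """Smallest n giving a margin of error of at most `margin` for a proportion."""
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

print(sample_size(0.02, 0.64))   # 2213, using the sample proportion from Example 1
print(sample_size(0.02, 0.50))   # 2401, using the conservative estimate p = 0.5
```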
Quick Self-Quiz: What Percent of US Adults are Thriving? In a random sample of 1500 US adults, 780 of them are “Thriving” which is defined as rating their current life as 7 or higher on a 10 point scale and rating their future life as 8 or higher on a 10 point scale. Find and interpret a 99% confidence interval for the proportion of all US adults who are thriving (based on this definition.) 780 We see that 𝑝̂ = 1500 = 0.52. The confidence interval is given by:
Statistic ± z* · SE
p̂ ± z* · √(p̂(1 − p̂)/n)
0.52 ± 2.575 · √(0.52(1 − 0.52)/1500)
0.52 ± 0.033
0.487 to 0.553
We are 99% sure that the proportion of all US adults who are thriving is between 0.487 and 0.553.
Section 6.1-D: Distribution of a Proportion Example 1: Proportion Speaking a Language other than English in Oregon From the 2010 US Census, we learn that 14.6% of the residents of Oregon speak a language other than English at home. If we take random samples of size n = 100 and calculate the proportion of the sample that speaks a language other than English at home, describe the shape, mean, and standard error of the distribution of sample proportions. The distribution will be bell-shaped with a mean of 0.146 and a standard error of SE = √(0.146(1 − 0.146)/100) = 0.035.
Describe the shape, mean, and standard error of the distribution of sample proportions if we instead take random samples of size n = 500. What is the effect of the larger sample size on the shape, mean, and standard error? The distribution will be bell-shaped with a mean of 0.146 and a standard error of SE = √(0.146(1 − 0.146)/500) = 0.016. The larger sample size does not affect the shape or the center, but it reduces the standard error.
Quick Self-Quiz: Distribution of a Sample Proportion From the 2010 US Census, we learn that 71.8% of the residents of Missouri are 21 years old or over. If we take random samples of size n = 200 and calculate the proportion of the sample that is 21 years old or over, describe the shape, mean, and standard error of the distribution of sample proportions. The distribution will be bell-shaped with a mean of 0.718 and a standard error of SE = √(0.718(1 − 0.718)/200) = 0.032.
Section 6.1-HT: Hypothesis Test for a Proportion Example 1: NFL Overtime At the start of overtime in a National Football League game, a coin is flipped to determine which team will kick off and which will receive. The question of interest is how much advantage (if any) is given to the team that wins the coin flip at the start of the sudden death overtime period. In the overtime games played between 1974 and 2009, the winner of the coin flip won the game in 240 of the 428 games in which a winner was determined in overtime. Assume that the overtime games played during this time period can be viewed as a sample of all possible NFL overtime games. Do the data provide sufficient evidence to conclude that the team winning the coin flip has an advantage in overtime games? Show all details of the test. We are testing H0: p = 0.5 vs Ha: p > 0.5, where p represents the proportion of times that the team winning the coin flip wins the game. The sample proportion is 𝑝̂ = 240/428 = 0.56 and the sample size is n = 428. The test statistic is
z = (Statistic − Null)/SE = (p̂ − p₀)/√(p₀(1 − p₀)/n) = (0.56 − 0.5)/√(0.5(1 − 0.5)/428) = 2.483.
This is a right-tail test, and we see that the area to the right of 2.483 in a normal distribution is 0.0065, so the p-value is 0.0065. We reject H0 and conclude that there is evidence that the winner of the coin flip has an advantage in overtime games in the NFL.
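A short Python sketch of this one-proportion z-test is given below for instructors who want to verify the numbers (scipy assumed). Note that using the unrounded p̂ = 240/428 gives z ≈ 2.51; the worksheet rounds p̂ to 0.56 first and reports 2.483.

from math import sqrt
from scipy.stats import norm

count, n, p0 = 240, 428, 0.5
p_hat = count / n                       # about 0.561
se = sqrt(p0 * (1 - p0) / n)            # null-based standard error
z = (p_hat - p0) / se
p_value = norm.sf(z)                    # right-tail area
print(z, p_value)                       # about 2.51 and 0.006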
Quick Self-Quiz: Mendel’s green peas? One of Gregor Mendel’s famous genetic experiments dealt with raising pea plants. According to Mendel’s genetic theory, under a certain set of conditions the proportion of pea plants that produce smooth green peas should be p = 3/16 (0.1875). A sample of n = 556 plants from the experiment had 108 with smooth green peas. Does this provide evidence of a problem with Mendel’s theory and that the proportion is different from 3/16? Show all details of the test. We are testing H0: p = 0.1875 vs Ha: p ≠ 0.1875, where p represents the proportion of pea plants with smooth green peas. The sample proportion is 𝑝̂ = 108/556 = 0.1942 and the sample size is n = 556. The test statistic is
z = (Statistic − Null)/SE = (p̂ − p₀)/√(p₀(1 − p₀)/n) = (0.1942 − 0.1875)/√(0.1875(1 − 0.1875)/556) = 0.405.
This is a two-tail test, and we see that the area to the right of 0.405 in a normal distribution is 0.343, so the p-value is 2(0.343) = 0.686. We do not reject H0 and conclude that this sample does not provide evidence that the proportion of plants with smooth green peas is different from the 3/16 that Mendel’s theory predicts. (It is worth pointing out to students that this does not “prove” Mendel’s theory, since we don’t “accept” H0 – we just find a lack of sufficient evidence to refute it.)
Section 6.2-CI: Confidence Interval for a Mean Example 1: Dark Chocolate for Good Health Eleven people were given 46 grams (1.6 ounces) of dark chocolate every day for two weeks, and their vascular health was measured before and after the two weeks. Larger numbers indicate greater vascular health, and the mean increase for the participants was 1.3 with a standard deviation of 2.32. Assume a dotplot shows the data are reasonably symmetric with no extreme values. Find and interpret a 90% confidence interval for the mean increase in this measure of vascular health after two weeks of eating dark chocolate. Can we be 90% confident that the mean change for everyone would be positive? For a 90% confidence interval with 11 − 1 = 10 degrees of freedom we find t* = 1.812.
Statistic ± t* · SE
x̅ ± t* · s/√n
1.3 ± 1.812 · 2.32/√11
1.3 ± 1.268
0.032 to 2.568 We are 90% sure that the mean change in this measure of vascular health for people who eat dark chocolate for two weeks is between 0.032 and 2.568. Since all values in the interval are positive, we are 90% confident that the mean change is positive (improving vascular health).
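A minimal Python sketch of this t-interval for a mean is shown below, assuming the scipy library is available:

from math import sqrt
from scipy.stats import t

xbar, s, n = 1.3, 2.32, 11
t_star = t.ppf(0.95, df=n - 1)          # about 1.812 for 90% confidence
margin = t_star * s / sqrt(n)
print(xbar - margin, xbar + margin)     # roughly (0.03, 2.57)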
Example 2: Sample Size and Margin of Error for Dark Chocolate (a) What is the margin of error for the confidence interval found in Example 1? The margin of error is 1.268. (b) What sample size is needed if we want a margin of error within ±0.5, with 90% confidence? (Use the standard deviation from the original sample to estimate σ.)
n = (z*·σ̂/ME)² = (1.645 · 2.32/0.5)² = 58.26
We should use a sample size of at least 59 to achieve this level of accuracy.
Quick Self-Quiz: Cell Phone Calls A survey of 1,917 cell phone users in May 2010 asked “On an average day, about how many cell phone calls do you make and receive on your cell phone?” The mean number of calls was 13.10, with a standard deviation of about 10.2. Find and interpret a 99% confidence interval for the mean number of cell phone calls for all cell phone users. For a 99% confidence interval with 1917 − 1 = 1916 degrees of freedom we find t* = 2.578.
Statistic ± t* · SE
x̅ ± t* · s/√n
13.10 ± 2.578 · 10.2/√1917
13.10 ± 0.60
12.50 to 13.70 We are 99% sure that the mean number of calls per day for all cell phone users is between 12.50 calls and 13.70 calls.
Section 6.2-D: Distribution of a Mean Example 1: Salaries of Major League Baseball Players There were 855 major league baseball players in 2012 and their mean salary was μ = 3.44 million dollars with standard deviation σ = 4.70 million dollars. If we take random samples of size n = 30 players and calculate the mean salary, in millions of dollars, of each sample, describe the shape, center, and standard error of the distribution of sample means. The distribution will be bell-shaped and centered at a mean of 3.44 million dollars. The standard error will be SE = 4.70/√30 = 0.858.
Example 2: More on Salaries of Major League Baseball Players Using the same data as Example 1, but now taking samples of size n = 75, describe the shape, center, and standard error of the distribution of sample means. Compare your answers with those of Example 1. The distribution will be bell-shaped and centered at a mean of 3.44 million dollars. The standard error will be SE = 4.70/√75 = 0.543.
Notice that the shape and center don’t change as the sample size gets larger, but the variability goes down.
Quick Self-Quiz: Using the t-Distribution (a) Find endpoints of a t-distribution with 5% beyond them in each tail if the sample has size n = 18. The degrees of freedom are df = 17 and we see that t* = 1.740. (b) Find the area in a t-distribution to the right of 2.30 if the sample has size n = 15. The degrees of freedom are df = 14 and the right-tail area is 0.019. (c) Find the area in a t-distribution to the left of –1.22 if the sample has size n = 50. The degrees of freedom are df = 49 and the left-tail area is 0.114.
Section 6.2-HT: Hypothesis Test for a Mean Example 1: Laptop Computers and Sperm Count Men hoping to have children are encouraged to avoid hot tubs or saunas, because heating the scrotum by just 1°C can reduce sperm count and sperm quality. A new study indicates that men might want to also avoid using a laptop computer on their lap for long periods of time. In the study, 29 men sat for an hour with a laptop computer on their lap. Mean temperature increase was 2.31°C with a standard deviation of 0.96°C. Test to see if we can conclude that the average temperature increase for a man with a laptop computer on his lap for an hour is above the danger threshold of 1°C. Show all details of the test. We are testing H0: μ = 1 vs Ha: μ > 1, where μ represents the mean scrotal temperature increase for all men who sit with a laptop on their lap for an hour. The test statistic is
t = (Statistic − Null)/SE = (x̅ − μ₀)/(s/√n) = (2.31 − 1)/(0.96/√29) = 7.349.
This is a right-tail test, and we use a t-distribution with df = 28 to find the p-value. However, with a test statistic this large, we know that the p-value will be essentially zero. We reject H0 and conclude that there is strong evidence that the mean temperature increase for men with a laptop on their lap for an hour is greater than 1°C.
Quick Self-Quiz: Penalty Minutes in Hockey Games In the 2010-11 National Hockey League (NHL) regular season, the number of penalty minutes per game for each of the 30 teams ranged from a low of 8.8 for the Florida Panthers to a high of 18.0 for the New York Islanders. The mean for all 30 teams is 12.20 penalty minutes per game with a standard deviation of 2.25. If we assume that this is a sample of all teams in all seasons, test to see if this provides evidence that the mean number of penalty minutes per game for a hockey team is less than 13. Show all details of the test. We are testing H0: μ = 13 vs Ha: μ < 13, where μ represents the mean number of penalty minutes per game for all hockey teams in the NHL. The test statistic is
t = (Statistic − Null)/SE = (x̅ − μ₀)/(s/√n) = (12.20 − 13)/(2.25/√30) = −1.947.
This is a left-tail test, and we use a t-distribution with df = 29 to find the p-value. We see that the area to the left of –1.947 in a t-distribution with df = 29 is 0.031, so the p-value is 0.031. At a 5% significance level, we reject H0 and conclude that there is evidence that the mean number of penalty minutes is less than 13. However, the results are not strong enough to be significant at the 1% level.
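A short Python sketch of this one-sample t-test from the summary statistics (scipy assumed):

from math import sqrt
from scipy.stats import t

xbar, s, n, mu0 = 12.20, 2.25, 30, 13
t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = t.cdf(t_stat, df=n - 1)       # left-tail area
print(t_stat, p_value)                  # about -1.947 and 0.031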
Section 6.3-CI: Confidence Interval for a Difference in Proportions Example 1: Mobile Connections to Libraries In a random sample of 2,252 Americans age 16 and older, 11% of the 1,059 men and 16% of the 1,193 women said they have accessed library services via a mobile device. Find a 95% confidence interval for the difference in proportion accessing libraries via mobile devices, between men and women. The conditions are met for using the normal distribution. The confidence interval is given by:
Statistic ± z* · SE
(p̂_M − p̂_F) ± z* · √(p̂_M(1 − p̂_M)/n_M + p̂_F(1 − p̂_F)/n_F)
(0.11 − 0.16) ± 1.96 · √(0.11(1 − 0.11)/1059 + 0.16(1 − 0.16)/1193)
−0.05 ± 0.028
–0.078 to –0.022 We are 95% sure that the proportion of men who access libraries via mobile devices is between 0.078 and 0.022 less than the proportion of women who access libraries via mobile devices. Note that if we had subtracted the other way, the interval would be positive, but the interpretation would be the same.
Quick Self-Quiz: Smoking and Pregnancy Rate? Does smoking negatively affect a person’s ability to become pregnant? A study collected data on 678 women who were trying to get pregnant. The two-way table shows the proportion who successfully became pregnant during the first cycle trying and smoking status. Find a 90% confidence interval for the difference in proportion of women who get pregnant, between smokers and non-smokers. Interpret the interval in context. Pregnant Not pregnant Total
Smoker 38 97 135
Non-smoker 206 337 543
Total 244 434 678
The conditions are met for using the normal distribution (at least 10 values in each cell of the table). We see that the proportion of smokers who got pregnant is 38/135 = 0.281 while the proportion of non-smokers who got pregnant is 206/543 = 0.379. The confidence interval is given by:
Statistic ± z* · SE
(p̂_S − p̂_N) ± z* · √(p̂_S(1 − p̂_S)/n_S + p̂_N(1 − p̂_N)/n_N)
(0.281 − 0.379) ± 1.645 · √(0.281(1 − 0.281)/135 + 0.379(1 − 0.379)/543)
−0.098 ± 0.072
–0.170 to –0.026 We are 90% sure that the proportion of smokers who get pregnant in the first cycle is between 0.170 and 0.026 less than the proportion of non-smokers who get pregnant on the first cycle. Note that if we had subtracted the other way, the interval would have only positive values, but the interpretation would be the same.
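A minimal Python sketch of this two-proportion interval, computed from the counts in the table (scipy assumed):

from math import sqrt
from scipy.stats import norm

x1, n1 = 38, 135      # smokers who got pregnant in the first cycle
x2, n2 = 206, 543     # non-smokers who got pregnant in the first cycle
p1, p2 = x1 / n1, x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = norm.ppf(0.95)                 # for 90% confidence
diff = p1 - p2
print(diff - z_star * se, diff + z_star * se)   # roughly (-0.170, -0.026)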
Section 6.3-D: Distribution of a Difference in Proportions Example 1: Proportion of Foreign-Born Residents, Alabama and Arizona From the 2010 US Census, we learn that 13.9% of the residents of Arizona were born outside the US while 3.4% of the residents of Alabama were born outside the US. If we take random samples of 500 residents from each state and calculate the difference in the proportion of foreign-born residents (Arizona – Alabama), describe the shape, mean, and standard error of the distribution of differences in proportions. The distribution will be bell-shaped with a mean of p_AZ − p_AL = 0.139 − 0.034 = 0.105 and a standard error of SE = √(0.139(1 − 0.139)/500 + 0.034(1 − 0.034)/500) = 0.017. Notice that the smallest proportion times sample size is 0.034(500) = 17, which is larger than 10, so it is appropriate to use a normal curve to model this distribution.
Example 2: Coke/Pepsi Taste Test Suppose 500 people participate in a blind Coke/Pepsi taste test, and 285 of them prefer Coke while the other 215 of them prefer Pepsi. (a) If we conduct inference (creating a confidence interval or conducting a hypothesis test) using this data, should we use the formulas for a single proportion or a difference in proportions? A single proportion! There is only one sample: the 500 people who participated. (b) If we want to test whether the preferences are equally split between Coke and Pepsi, what is the null hypothesis? 𝐻0 : 𝑝 = 0.5 (c) In terms of the outcome of the test, does it matter whether we define p to be the proportion of people who prefer Coke or the proportion of people who prefer Pepsi? The p-value will be exactly the same regardless of which way we define the proportion, since knowing one completely determines the other. In this case, the sample proportion who prefer Coke is 285/500 = 0.57, so the sample proportion preferring Pepsi is 1 – 0.57 = 0.43.
Section 6.3-HT: Hypothesis Test for a Difference in Proportions Example 1: Accuracy of Lie Detectors Participants in a study to evaluate the accuracy of lie detectors were divided into two groups, with one group reading true material and the other group reading false material, while connected to a lie detector. Both groups received electric shocks to add stress. The two way table indicates whether the participants were lying or telling the truth and also whether the lie detector indicated they were lying or not. (a) Are the conditions met for using the normal distribution? Yes (all cell counts at least 10) (b) Find the three sample proportions for the proportion of times the lie detector says the person is lying (the proportion for the lying people, the proportion for the truthful people, and the pooled proportion).
Person lying Person not lying Total
Detector says lying 31 27 58
Detector says not 17 21 38
Total 48 48 96
We see that the proportion for the lying people is p̂_L = 31/48 = 0.6458, the proportion for the not lying people is p̂_N = 27/48 = 0.5625, and the pooled proportion for all 96 people is p̂ = 58/96 = 0.6042. (c) Test to see if there is a difference in the proportion of times the lie detector says the person is lying, depending on whether the person is lying or telling the truth. Show all details of the test. We are testing H0: pL = pN vs Ha: pL ≠ pN. The test statistic is
z = (Statistic − Null)/SE = ((p̂_L − p̂_N) − 0)/√(p̂(1 − p̂)/n_L + p̂(1 − p̂)/n_N) = (0.6458 − 0.5625)/√(0.6042(1 − 0.6042)/48 + 0.6042(1 − 0.6042)/48) = 0.834.
This is a two-tail test, and the area to the right of 0.834 in a normal distribution is 0.202, so the p-value is 2 (0.202) = 0.404. We fail to reject H0 and conclude that there is not enough evidence that a lie detector can tell whether a person is lying or telling the truth.
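A short Python sketch of the pooled two-proportion z-test above (scipy assumed):

from math import sqrt
from scipy.stats import norm

x1, n1 = 31, 48       # detector says lying, person is lying
x2, n2 = 27, 48       # detector says lying, person is telling the truth
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))           # two-tail area
print(z, p_value)                       # about 0.83 and 0.40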
Quick Self-Quiz: Tagging Penguins A study was conducted to see if tagging penguins with metal tags harms them. In the study, 100 penguins were randomly assigned to receive a metal tag or (as a control group) an electronic tag. One of the variables studied is survival rate ten years after the penguins were tagged. The scientists observed that 10 of the 50 metal-tagged penguins survived while 18 of the 50 electronic-tagged penguins survived. (a) Create a two-way table of the data. Include row and column totals.
               Lived   Died   Total
Metal tag      10      40     50
Electronic tag 18      32     50
Total          28      72     100
(b) Are the conditions met for using the normal distribution? Yes (just barely, with 10 penguins with metal tags living). (c) Test to see if the survival rate is lower for metal-tagged penguins than for electronic-tagged penguins. Do metal tags appear to reduce survival rate in penguins? We test H0: pM = pE vs Ha: pM < pE. We compute p̂_M = 10/50 = 0.20, p̂_E = 18/50 = 0.36, and the pooled proportion p̂ = 28/100 = 0.28. The test statistic is
z = (Statistic − Null)/SE = ((p̂_M − p̂_E) − 0)/√(p̂(1 − p̂)/n_M + p̂(1 − p̂)/n_E) = (0.20 − 0.36)/√(0.28(1 − 0.28)/50 + 0.28(1 − 0.28)/50) = −1.782.
This is a left-tail test, and we see that the area to the left of −1.782 in a normal distribution is 0.037, so the p-value is 0.037. At a 5% significance level, we reject H0 and conclude that metal tags appear to reduce the survival rate in penguins, although the evidence is not strong and is not significant at a 1% level. Notice that we can conclude causation since the results come from a randomized experiment.
Section 6.4-CI: Confidence Interval for a Difference in Means Example 1: The Effect of Tribulus Tribulus is a food supplement that some athletes use to enhance performance. A study on the effects of this supplement randomly assigned 20 athletes to take the supplement for a 20-day period and compared various characteristics to 12 similar athletes who were not given tribulus. On the AAMP (anaerobic alactic muscle power) measurement, the mean for the tribulus group was 1305.6 with a standard deviation of 177.3 and the mean for the control group was 1255.9 with a standard deviation of 66.8. Find and interpret a 90% confidence interval for the difference in mean AAMP between athletes taking tribulus and those not taking it. The conditions are met for using a t-distribution. Are we 90% sure that tribulus has an effect? We find a confidence interval for 𝜇𝑇 − 𝜇𝐶 , where 𝜇𝑇 is the mean AAMP measurement for athletes who have taken tribulus for 20 days and 𝜇𝐶 is the mean AAMP measurement for athletes who have not been taking tribulus. The degrees of freedom for the t-distribution is df = 11, and the confidence interval is given by:
Statistic ± t* · SE
(x̅_T − x̅_C) ± t* · √(s_T²/n_T + s_C²/n_C)
(1305.6 − 1255.9) ± 1.796 · √(177.3²/20 + 66.8²/12)
49.7 ± 79.179
–29.479 to 128.879
We are 90% sure that the mean AAMP measurement for athletes taking tribulus is between 29.479 points lower and 128.879 points higher than the mean measurement for athletes not taking tribulus. Since zero is in this interval, we are not confident that tribulus has an effect.
Quick Self-Quiz: Diet Cola and Calcium Does diet cola wash calcium out of our systems? A study to investigate this question randomly assigned 16 healthy women to drink 24 ounces of either diet cola or water. Their urine was collected for three hours after ingestion and calcium excretion was measured. For the 8 diet cola drinkers, mean amount of calcium excreted was 56.0 mg with a standard deviation of 4.93. For the 8 water drinkers, the mean was 49.1 mg with a standard deviation of 3.64. Neither distribution had any significant outliers or skewness. (Why is this important?) Find and interpret a 95% confidence interval for the difference in mean amount of calcium excreted between diet cola drinkers and water drinkers. Does diet cola appear to have an effect? The sample sizes are quite small but we are told that there are no significant outliers or skewness so the conditions are met to use the t-distribution. We use 7 degrees of freedom for the t-distribution. The confidence interval for 𝜇𝐶 − 𝜇𝑊 , where 𝜇𝐶 is the mean amount of calcium excreted by women who drink diet cola and 𝜇𝑊 is the mean amount of calcium excreted by women who drink water, is given by:
Statistic ± t* · SE
(x̅_C − x̅_W) ± t* · √(s_C²/n_C + s_W²/n_W)
(56.0 − 49.1) ± 2.364 · √(4.93²/8 + 3.64²/8)
6.9 ± 5.122
1.778 to 12.022
We are 95% sure that the mean amount of calcium excreted by diet cola drinkers is between 1.778 mg and 12.022 mg higher than the mean amount of calcium excreted by water drinkers. All values in this interval are positive, so we are 95% sure that the mean for diet cola drinkers is higher than the mean for water drinkers. Yes, diet cola does appear to have an effect on calcium excretion.
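A minimal Python sketch of this two-sample t interval, using the text's conservative convention of df = smaller sample size minus 1 (scipy assumed):

from math import sqrt
from scipy.stats import t

x1, s1, n1 = 56.0, 4.93, 8    # diet cola drinkers
x2, s2, n2 = 49.1, 3.64, 8    # water drinkers
df = min(n1, n2) - 1          # conservative choice: 7
t_star = t.ppf(0.975, df)     # about 2.365 for 95% confidence
se = sqrt(s1**2 / n1 + s2**2 / n2)
diff = x1 - x2
print(diff - t_star * se, diff + t_star * se)   # roughly (1.8, 12.0)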
Section 6.4-D: Distribution of a Difference in Means Example 1: Salaries of Baseball Players Of the 855 major league baseball players in the 2012 season, there were 423 pitchers and 432 batters playing other positions. The average salary for the pitchers was 3.189 million dollars with a standard deviation of 4.288, while the average salary for the batters was 3.683 million dollars with a standard deviation of 5.060. Suppose that we take random samples of 30 pitchers and 50 batters and calculate the difference in mean salary between the two groups (Pitchers – Batters). (a) Describe the shape, mean, and standard error of the distribution of differences in means. The distribution will be bell-shaped with a mean of μ_P − μ_B = 3.189 − 3.683 = −0.494 million dollars and a standard error of SE = √(4.288²/30 + 5.060²/50) = 1.061.
(b) How many degrees of freedom would you use in this situation for a t-distribution when doing inferences for the difference in sample means? Since the sample sizes are n_P = 30 and n_B = 50, the degrees of freedom for a t-distribution for the difference in means would be 30 − 1 = 29 (the smaller of the df for the two samples).
Section 6.4-HT: Hypothesis Test for a Difference in Means Example 1: Cognition Score and Alcohol A recent study asked college students to indicate their level of alcohol use, and in this example, we compare the group of students who said they were light drinkers to the group of students who said they were heavy drinkers. Each student was also given several cognitive skills tests and assigned a cognition z-score based on the performance on these tests. The 83 students who said they were light drinkers had a mean cognition z-score of 0.1302 with a standard deviation of 0.75, while the 16 students who said they were heavy drinkers had a mean cognition z-score of –0.2338 with a standard deviation of 0.65. Test, using a 5% significance level, to see if there is evidence that heavy drinkers have a lower mean cognitive level than light drinkers. If the results are significant, can we conclude from this study that heavy drinking affects cognitive ability? We are testing H0: μ_L = μ_H vs Ha: μ_L > μ_H, where μ_L represents the mean cognition level for college students who say they are light drinkers and μ_H represents the mean cognition level for college students who say they are heavy drinkers. The test statistic is
t = (Statistic − Null)/SE = ((x̅_L − x̅_H) − 0)/√(s_L²/n_L + s_H²/n_H) = (0.1302 − (−0.2338))/√(0.75²/83 + 0.65²/16) = 1.998.
This is a right-tail test, and we use a t-distribution with 15 degrees of freedom. We see that the p-value is 0.032. At a 5% significance level, we reject H0 and conclude that students who say they are heavy drinkers have lower cognition scores than students who say they are light drinkers. The results are significant, but we cannot conclude that heavy drinking affects cognitive ability since the data come from an observational study, not an experiment. There are many possible confounding variables. (See if the students can name some!)
Quick Self-Quiz: Restaurant Tips and Credit Cards We analyze the percent tip left on 157 bills from the First Crush bistro in Northern New York State. The mean percent tip left on the 106 bills paid in cash was 16.39 with a standard deviation of 5.05. The mean percent tip left on the 51 bills paid with a credit card was 17.10 with a standard deviation of 2.47. Test to see if there is a difference in the mean percent of the bill left as a tip between bills paid with a credit card and bills paid with cash. We are testing H0: μ_1 = μ_2 vs Ha: μ_1 ≠ μ_2, where μ_1 represents the mean percent tip for people paying with cash and μ_2 represents the mean percent tip for people paying with a credit card. The test statistic is
t = (Statistic − Null)/SE = ((x̅_1 − x̅_2) − 0)/√(s_1²/n_1 + s_2²/n_2) = (16.39 − 17.10)/√(5.05²/106 + 2.47²/51) = −1.183.
This is a two-tail test, and we use a t-distribution with 50 degrees of freedom. We see that the p-value is 2(0.121) = 0.242. We do not reject H0 and do not find a significant difference in the mean percent tip left between people paying with cash and people paying with a credit card.
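A short Python sketch of this two-sample t-test from the summary statistics, again using the conservative df = smaller sample size minus 1 (scipy assumed):

from math import sqrt
from scipy.stats import t

x1, s1, n1 = 16.39, 5.05, 106   # cash
x2, s2, n2 = 17.10, 2.47, 51    # credit card
se = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (x1 - x2) / se
df = min(n1, n2) - 1            # 50 degrees of freedom
p_value = 2 * t.sf(abs(t_stat), df)   # two-tail area
print(t_stat, p_value)          # about -1.18 and 0.24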
Section 6.5: Paired Difference in Means Example: CAOS Comparisons The CAOS (Comprehensive Assessment of Outcomes in Statistics) exam is an online multiple-choice test on concepts covered in a typical introductory statistics course. Students take one version before the start of the course and another version after the course ends. Before and After scores for a possible random sample of 10 students are shown in the table. (An actual random sample of scores is given in Exercise C.68 on page 455 of the text.) Student Before After Difference
A 43 60 17
B 40 45 5
C 48 55 7
D 65 80 15
E 60 85 25
F 48 71 23
G 43 52 9
H 38 35 –3
I 43 54 11
J 55 55 0
a). We are interested in determining whether taking the course increases students’ understanding, as measured by this test. Why should we do a paired design rather than two separate groups? There seems to be a great deal of variation in the scores between the students both before and after the course. We are interested in the increase in score for individuals, and pairing the data reduces the random variation. b). Find the differences (After – Before) for all 10 students. State the mean, standard deviation, and sample size for the differences here: See the differences above as the new fourth column of the table. Using these 10 values, we calculate that the mean is 10.9, the standard deviation is 9.219, and the sample size is 10. c). Test to see if scores at the end of the course are higher, on average, than scores at the beginning of the course. Show all details of the test. We are testing H0: μ_d = 0 vs Ha: μ_d > 0, where μ_d represents the mean increase on a student’s score after taking an introductory statistics course. The test statistic is
t = (Statistic − Null)/SE = (x̅_d − μ₀)/(s_d/√n_d) = (10.9 − 0)/(9.219/√10) = 3.74.
This is a right-tail test, and we use a t-distribution with df = 9 to find the p-value. We see that the p-value is 0.0023. We reject H0 and conclude that there is strong evidence that, on average, scores increased after taking an introductory statistics course.
d). What is the average increase on the exam after taking the course? Compute and interpret a 95% confidence interval for the improvement in mean CAOS scores between the Before and After scores.
Statistic ± t* · SE
x̅_d ± t* · s_d/√n_d
10.9 ± 2.262 · 9.219/√10
10.9 ± 6.59
4.31 to 17.49
We are 95% sure that the mean increase for students on the CAOS exam after taking an introductory statistics class is between 4.31 points and 17.49 points.
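A minimal Python sketch of the paired analysis, working directly from the Before and After scores in the table (numpy and scipy assumed):

import numpy as np
from scipy.stats import t

before = np.array([43, 40, 48, 65, 60, 48, 43, 38, 43, 55])
after  = np.array([60, 45, 55, 80, 85, 71, 52, 35, 54, 55])
d = after - before                      # differences for each student
n = len(d)
se = d.std(ddof=1) / np.sqrt(n)
t_stat = d.mean() / se                  # tests H0: mu_d = 0
p_value = t.sf(t_stat, df=n - 1)        # right-tail area
t_star = t.ppf(0.975, df=n - 1)         # for the 95% confidence interval
print(t_stat, p_value)                                   # about 3.74 and 0.002
print(d.mean() - t_star * se, d.mean() + t_star * se)    # roughly (4.3, 17.5)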
Section 7-1: Testing Goodness-of-Fit for a Single Categorical Variable Example 1: What Type of Ice Cream? Sixty people were asked whether they preferred vanilla, chocolate, or strawberry ice cream and the results are shown in the table below. Perform a chi-square goodness-of-fit test to determine whether the flavors are equally popular.
Flavor       Frequency   Expected   (O-E)²/E
Vanilla      28          20         3.2
Chocolate    23          20         0.45
Strawberry   9           20         6.05
Total        60          60         9.7
The null hypothesis is that the flavors are equally likely to be preferred (pi = 1/3), while the alternative hypothesis is that the flavors are not equally likely to be selected. The expected counts and contributions to the chi-square statistic are shown as new columns in the table above. We see that the chi-square statistic is 9.7. There are three categories so we have df = 2. Using a chi-square distribution with df = 2, we see that the p-value is 0.0078. This is below most significance levels, so we find strong evidence that the flavors are not equally preferred. Note that the strongest contributor to the chi-square statistic is the count for strawberry, which is below the expected count.
Example 2: ADHD or Just Young? Were you one of the youngest or one of the oldest in your class in elementary school? A new study examines whether the youngest children in a school grade are more likely to be diagnosed with attention-deficit/hyperactivity disorder (ADHD) than their older peers in the same grade. The study involved almost a million children between the ages of 6 and 12 in British Columbia, Canada. The cutoff date for entering school in Canada is December 31, so those born in January are the oldest in any given class and those born in December are the youngest. The table below shows the number of boys diagnosed with ADHD based on the quarter of the year in which they were born. The table also shows the proportion of all births during that quarter. Is it possible that younger students are being over-diagnosed with ADHD? (This table gives the data for boys. The equivalent data for girls is in Exercise D.31.)
Birth Date   ADHD Diagnoses   Proportion of Births   Expected   (O-E)²/E
Jan – Mar    6880             0.244                  8044.2     168.5
Apr – Jun    7982             0.258                  8505.7     32.2
Jul – Sep    9161             0.257                  8472.8     55.9
Oct – Dec    8945             0.241                  7945.3     125.8
Total        32,968                                             382.4
The null hypothesis is that the proportions of ADHD diagnoses match the proportions of all births given in the table, while the alternative hypothesis is that the proportions of ADHD diagnoses do not match the proportions of all births. The expected count for the Jan – Mar cell is n·pi = 32,968(0.244) = 8044.2. Computing the other expected counts similarly, we obtain the results in the Expected column in the table. For each cell, we find the contribution to the chi-square statistic. For the Jan – Mar cell, we have (6880 – 8044.2)²/8044.2 = 168.5. Computing the other contributions similarly, we obtain the results in the far right column of the table. The sum of these values is 382.4, which is our chi-square test statistic. This is a very large test statistic! Since there are four categories, we have df = 3. Using a chi-square distribution with df = 3, we see that the p-value is essentially zero. Comparing observed with expected, we see that there is very strong evidence that boys who are younger than their classmates are much more likely to be diagnosed with ADHD than we would expect by random chance, and boys who are older than their classmates are much less likely to be diagnosed.
Quick Self-Quiz: Chi-square Goodness-of-Fit Studies in genetics often involve chi-square tests. For one gene, we expect 25% of people to have the variant AA, 25% to have the variant BB, and 50% to have the variant AB. Observed counts of the three variants in one sample are shown. Do these counts provide evidence that the stated proportions are not right?
Variant        AA      BB      AB     Total
Frequency      142     121     307    570
Expected       142.5   142.5   285
Contribution   0.00    3.24    1.70   4.94
The null hypothesis is that the proportions of pAA=0.25, pBB=0.25, and pAB=0.50 are correct, while the alternative hypothesis is that at least one of the null proportions is not correct. The expected counts and contribution to the chi-square statistic are shown as new rows in the table above. We see that the chi-square statistic is 4.94. There are three categories so we have df = 2. Using a chi-square distribution with df = 2, we see that the p-value is 0.085. At a 5% level, we do not reject H0. These data do not contradict the hypothesized proportions for these three gene variants.
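A short Python sketch of this goodness-of-fit test, using scipy's chisquare function with the hypothesized proportions (scipy and numpy assumed):

import numpy as np
from scipy.stats import chisquare

observed = np.array([142, 121, 307])                        # AA, BB, AB
expected = observed.sum() * np.array([0.25, 0.25, 0.50])    # 142.5, 142.5, 285
stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)                                        # about 4.94 and 0.085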
Section 7-2: Testing for an Association Between Two Categorical Variables Example 1: Painkillers and Miscarriage Is there an association between the use of painkillers during pregnancy and the likelihood of miscarriage? Scientists interviewed 1009 pregnant women soon after they got positive results from pregnancy tests about their use of painkillers around the time of conception or in the early weeks of pregnancy. The scientists then recorded which of the pregnancies were successfully carried to term. The results are shown in the table below. (NSAIDS are a class of painkiller that includes aspirin and ibuprofen.) Does there appear to be an association between having a miscarriage and using painkillers? NSAIDs Acetaminophen No painkiller Total
Miscarriage 18 (10.8) 24 (24.7) 103 (109.5) 145
No miscarriage 57 (64.2) 148 (147.3) 659 (652.5) 864
Total 75 172 762 1009
(a) What is the expected count for the cell with NSAID use and Miscarriage?
Expected = (Row Total · Column Total)/Total = (75 · 145)/1009 = 10.8.
(b) Find all expected counts and add them to the table above. See expected counts in parentheses. (c) What is the contribution to the chi-square statistic for the cell with NSAID use and Miscarriage?
Contribution = (Observed − Expected)²/Expected = (18 − 10.8)²/10.8 = 4.80.
(d) Find all six contributions. NSAIDs Acetaminophen No painkiller
Miscarriage 4.80 0.02 0.39
No miscarriage 0.81 0.00 0.06
(e) What is the value of the chi-square test statistic for these data? We add up all the contributions to see that the chi-square test statistic is 6.08. (f) What are the degrees of freedom for the test? What is the p-value? We have two rows and three columns, so df = (2-1)(3-1) = 1 · 2 = 2. The p-value is 0.0478. (g) Using a 5% significance level, what is the conclusion of the test? Be specific. If there is an association between having a miscarriage and using painkillers, describe how the two variables are related. Using a 5% significance level, we reject H0 (just barely) and conclude that there is an association between painkiller use and the risk of miscarriage. By far the largest contribution to the chi-square statistic comes from the NSAIDs and miscarriage cell, and we see that the number of miscarriages is much larger than expected for those expectant mothers taking NSAIDs (aspirin or ibuprofen) early in pregnancy. However, the risk of miscarriage does not appear to increase with acetaminophen use. Note: This was not an experiment so we should not conclude that NSAIDs necessarily cause miscarriages.
Quick Self-Quiz: Chi-square Test for Association 478 middle school students in Michigan were asked whether grades, athletic ability, or popularity was most important to them. The results are shown below, broken down by gender. Do the data provide evidence of an association between the two variables? Show all details and be sure to state your conclusion clearly. Boy Girl
Grades 117 130
Sports 60 30
Popular 50 91
H0: Gender and what students value are not associated Ha: Gender and what students value are associated We find the row totals and column totals and then use them to find the expected counts, given in parentheses in the following table.
Boy Girl Total
Grades 117 (117.3) 130 (129.7) 247
Sports 60 (42.7) 30 (47.3) 90
Popular 50 (67.0) 91 (74.0) 141
Total 227 251 478
We find the contribution to the chi-square statistic for each of the six cells:
Boy Girl
Grades 0.00 0.00
Sports 7.01 6.33
Popular 4.31 3.91
Adding up these contributions, we see that the chi-square test statistic is 21.56. Degrees of freedom are 2 and the resulting p-value is 0.00002. There is strong evidence of an association between gender and what middle school students value. We see that boys are more likely than expected to value sports and less likely to value being popular, while girls are less likely to value sports and more likely to value being popular. We don’t see a strong gender effect in attitudes toward school grades.
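A minimal Python sketch of this chi-square test for association, using scipy's chi2_contingency on the observed counts (scipy and numpy assumed):

import numpy as np
from scipy.stats import chi2_contingency

#                   Grades  Sports  Popular
counts = np.array([[117,    60,     50],     # Boys
                   [130,    30,     91]])    # Girls
chi2, p_value, dof, expected = chi2_contingency(counts, correction=False)
print(chi2, dof, p_value)                    # about 21.6, df = 2, p close to 0.00002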
Section 8-1: Analysis of Variance Example 1: Why Care about Variance? Two datasets are shown below, both with Group A and Group B, and we are testing for a difference in population means between Group A and Group B. The sample means are the same in both cases. Are we more likely to find a difference in population means from the data in Dataset 1 or in Dataset 2? Explain. DATASET #1 Group A 19 20 21 21 19 Mean = 20
DATASET #2 Group A 31 5 30 12 22 Mean = 20
Group B 31 29 29 30 31 Mean = 30
Group B 19 35 20 39 37 Mean = 30
Give this hint to the students: If the next value you see is 31, can you tell which group it is in for Dataset 1? Can you tell which group it is in for Dataset 2? In which case is it easier to tell whether the value is more likely to be from population A or population B? The easier it is to tell the groups apart, the more likely we are to find a difference in population means! The two populations seem to be clearly different for Dataset 1. However, there is so much random variation in Dataset 2 that it is very hard to tell whether there is any difference between Group A and Group B. This is why variation within groups matters! We are much more likely to conclude that there is a difference in population means between Group A and Group B for the data in Dataset 1.
Example 2: Height and Voice Data were collected on the heights of singers and are summarized below. Heights are in inches. Does average height differ by voice? Show all details of the test.
           Mean    Std.Dev.   Sample Size
Soprano    64.25   1.873      36
Alto       64.89   2.795      35
Tenor      69.15   3.216      20
Bass       70.72   2.361      39
TOTAL      67.12   3.792      130
Standard deviation condition is met.
H0: The four population means are the same
Ha: At least two of the population means are different
The analysis of variance table follows:
Source   df    SS       MS       F
Groups   3     1058.2   352.73   55.8
Error    126   796.7    6.32
Total    129   1854.9
The F-statistic of 55.8 is very large so we expect a small p-value. Using the F-distribution with 3 and 126 degrees of freedom, we see that the p-value is essentially zero. There is very strong evidence that mean height differs between the four types of singers! (And not surprisingly, since gender is a very strong confounding variable.)
Quick Self-Quiz: Analysis of Variance The dataset Cereal shows the number of grams of fiber per serving for 30 different breakfast cereals from three different companies. The summary statistics are shown below. Conduct an analysis of variance test to determine whether there is a difference in mean number of grams of fiber per cereal between the three companies. Show all details of the test.
Standard deviation condition is met. H0: The three population means are the same Ha: At least two of the population means are different
                Mean    Std.Dev.   Sample Size
General Mills   1.469   1.248      13
Kellogg’s       1.764   2.349      11
Quaker          2.567   2.174      6
TOTAL           1.797   1.880      30
The analysis of variance table follows:
Source   df   SS       MS     F
Groups   2    5.00     2.50   0.69
Error    27   97.50    3.61
Total    29   102.50
The F-statistic of 0.69 is not very large so we don’t expect it to be significant. Using the F-distribution with 2 and 27 degrees of freedom, we see that the p-value is 0.512. We do not reject H0 and do not find evidence that mean amount of fiber differs between the three types of cereals.
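A short Python sketch that rebuilds this ANOVA table from the group summary statistics alone (numpy and scipy assumed):

import numpy as np
from scipy.stats import f

means = np.array([1.469, 1.764, 2.567])      # General Mills, Kellogg's, Quaker
sds   = np.array([1.248, 2.349, 2.174])
ns    = np.array([13, 11, 6])

grand_mean = (ns * means).sum() / ns.sum()
ss_groups = (ns * (means - grand_mean) ** 2).sum()     # between-group SS
ss_error  = ((ns - 1) * sds ** 2).sum()                # within-group SS
df_groups, df_error = len(ns) - 1, ns.sum() - len(ns)
F = (ss_groups / df_groups) / (ss_error / df_error)
p_value = f.sf(F, df_groups, df_error)
print(ss_groups, ss_error, F, p_value)       # roughly 5.0, 97.5, 0.69, 0.51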
Section 8-2: Pairwise Comparisons and Inference after ANOVA Example 1: Height and Voice Data were collected on the heights of singers and are summarized below. Heights are in inches. The data are shown below, along with the ANOVA table.
           Mean    Std.Dev.   Sample Size
Soprano    64.25   1.873      36
Alto       64.89   2.795      35
Tenor      69.15   3.216      20
Bass       70.72   2.361      39
TOTAL      67.12   3.792      130
Source   df    SS       MS       F      P
Groups   3     1058.2   352.73   55.8   0.00
Error    126   796.7    6.32
Total    129   1854.9
(a) Is there strong evidence for a difference in population means between the four groups? Yes, there is clear evidence of a difference, since the p-value is 0.00. (b) Use the ANOVA results to find a 95% confidence interval for the mean height of tenors. We use 126 for the degrees of freedom to see t* = 1.979. We use √6.32 for the standard deviation. We have
x̅ ± t* · √MSE/√n = 69.15 ± 1.979 · √6.32/√20 = 69.15 ± 1.11 = (68.04, 70.26)
We are 95% confident that the mean height of tenors is between 68.04 inches and 70.26 inches. (c) Use the ANOVA results to find a 95% confidence interval for the difference in mean height between tenors and basses.
(70.72 − 69.15) ± 1.979 · √(6.32(1/39 + 1/20)) = 1.57 ± 1.37 = (0.20, 2.94)
Since the confidence interval does not contain zero and contains all positive values, we find evidence that the mean height for basses is greater than the mean height for tenors. (d) Use the ANOVA results to test for a difference in mean height between sopranos and tenors. We test H0: µs = µt versus Ha: µs ≠ µt. The test statistic is
t = (69.15 − 64.25)/√(6.32(1/20 + 1/36)) = 6.989.
Using a t-distribution with 126 df, we find a p-value of 0.00.
There is strong evidence of a difference in mean height between sopranos and tenors (with tenors tending to be taller).
Quick Self-Quiz: Pairwise Comparisons Using the data on height and voice from Example 1, and using the ANOVA results, test for a difference in mean height between altos and tenors. We test H0: µa = µt versus Ha: µa ≠ µt. The test statistic is
t = (64.89 − 69.15)/√(6.32(1/35 + 1/20)) = −6.045.
Using a t-distribution with 126 df, we find a p-value of 0.00.
There is strong evidence of a difference in mean height between altos and tenors (with altos tending to be shorter).
Section 9-1: Inference for Slope and Correlation Example 1: Depression and Missed Classes Is depression a possible factor in students missing classes? Two of the variables in the dataset SleepStudy are DepressionScore, which gives the score on a standard depression scale with higher numbers indicating greater depression, and ClassesMissed, the number of classes missed during the semester, for a sample of 253 college students. Computer output is shown below for the correlation between these two variables and the regression line to predict classes missed based on the depression score.
Pearson correlation of DepressionScore and ClassesMissed = 0.154   P-Value = 0.014
The regression equation is ClassesMissed = 1.78 + 0.0831 DepressionScore
Predictor          Coef      SE Coef   T      P
Constant           1.7771    0.2671    6.65   0.000
DepressionScore    0.08312   0.03368   2.47   0.014
S = 3.20806   R-Sq = 2.4%   R-Sq(adj) = 2.0%
(a) What is the sample correlation? _0.154_ What is the p-value for testing the correlation? _0.014_ Give the conclusion of the test in context. At a 5% level, we reject H0 and conclude that there is evidence that ρ ≠ 0, which means there is evidence of some association between depression scores and classes missed. (b) What is the regression line? Predicted ClassesMissed = 1.78 + 0.0831 DepressionScore. Find the predicted value and residual for an individual with a depression score of 7 who has missed 4 classes. The predicted number of missed classes is 1.78 + 0.0831(7) = 2.36. The residual is 4 – 2.36 = 1.64. (c) Interpret the slope of the line in context. If the depression score were one point higher, the predicted number of missed classes would be 0.0831 higher. (d) What is the p-value for a test of the slope? _0.014_ Give the conclusion of the test in context. At a 5% level, we reject H0 and conclude that there is evidence that β₁ ≠ 0, which means there is evidence that a student’s depression score is an effective predictor of the number of classes missed. (e) What is the standard error of the slope? _0.03368_ Find and interpret a 95% confidence interval for the slope of the regression line. We use df = n – 2 = 251, and find that t* = 1.970. We have b₁ ± t* · SE = 0.0831 ± 1.970(0.03368) = 0.0831 ± 0.0664, which gives 0.0167 to 0.1495. We are 95% sure that the slope of the linear model to predict classes missed from the depression score for all students at this college is between 0.0167 and 0.1495. (f) Compare the two p-values, from the test for correlation and the test for slope. They are the same, as expected. (g) What is the value of R² for this model? Interpret it in context. R² = 2.4%. The depression scores only explain 2.4% of the variability in number of classes missed.
Quick Self-Quiz: Inference for Regression: Alcohol and Missed Classes Another possible explanatory variable for the ClassesMissed variable in the SleepStudy dataset, described in Example 1, is Drinks, the number of alcoholic drinks in a week that the college students say they have. Computer output is shown below for the correlation between these two variables and the regression line to predict classes missed based on the number of alcoholic drinks.
Pearson correlation of Drinks and ClassesMissed = 0.078   P-Value = 0.215
The regression equation is ClassesMissed = 1.86 + 0.0620 Drinks
Predictor   Coef      SE Coef   T      P
Constant    1.8644    0.3439    5.42   0.000
Drinks      0.06196   0.04979   1.24   0.215
S = 3.23679   R-Sq = 0.6%   R-Sq(adj) = 0.2%
(a) What is the correlation? _0.078_ What is the p-value for testing the correlation? _0.215_ Give the conclusion of the test in context. We do not reject H0 and do not find evidence of a linear relationship between the number of classes missed and the number of alcoholic drinks consumed. (b) What is the regression line? Predicted ClassesMissed = 1.86 + 0.062 Drinks. Find the predicted value and residual for an individual who drinks 6 alcoholic drinks a week and has missed 2 classes. The predicted number of missed classes is 1.86 + 0.062(6) = 2.23. The residual is 2 – 2.23 = –0.23. (c) What is the p-value for a test of the slope? _0.215_ Give the conclusion of the test in context. We do not reject H0 and do not find evidence that the number of alcoholic drinks is effective at predicting the number of classes missed. (d) What is the standard error of the slope? _0.04979_ Find and interpret a 95% confidence interval for the slope of the regression line. We use df = n – 2 = 251, and find that t* = 1.970. We have b₁ ± t* · SE = 0.062 ± 1.970(0.04979) = 0.062 ± 0.098, which gives –0.036 to 0.160. We are 95% sure that the slope of the linear model to predict classes missed from the number of alcoholic drinks per week for all students at this college is between –0.036 and 0.160. (e) Compare the two p-values, from the test for correlation and the test for slope. They are the same, as expected. (f) What is the value of R² for this model? Interpret it in context. R² = 0.6%. The number of alcoholic drinks per week only explains 0.6% of the variability in number of classes missed.
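The slope inference in this output can be reproduced from the reported coefficient and standard error alone; a minimal Python sketch (scipy assumed):

from scipy.stats import t

b1, se_b1, df = 0.06196, 0.04979, 251        # slope, SE of slope, n - 2
t_stat = b1 / se_b1
p_value = 2 * t.sf(abs(t_stat), df)          # two-tail test of the slope
t_star = t.ppf(0.975, df)                    # about 1.970
print(t_stat, p_value)                       # about 1.24 and 0.215
print(b1 - t_star * se_b1, b1 + t_star * se_b1)   # roughly (-0.036, 0.160)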
Section 9-2: ANOVA for Regression Example 1: Depression and Missed Classes Is depression a possible factor in students missing classes? Two of the variables in the dataset SleepStudy are DepressionScore, which gives the score on a standard depression scale with higher numbers indicating greater depression, and ClassesMissed, the number of classes missed during the semester, for the sample of college students. Computer output is shown below for the regression line to predict classes missed based on the depression score.
The regression equation is ClassesMissed = 1.78 + 0.0831 DepressionScore
Predictor          Coef      SE Coef   T      P
Constant           1.7771    0.2671    6.65   0.000
DepressionScore    0.08312   0.03368   2.47   0.014
S = 3.20806   R-Sq = 2.4%   R-Sq(adj) = 2.0%
Analysis of Variance
Source            DF    SS        MS      F      P
Regression        1     62.70     62.70   6.09   0.014
Residual Error    251   2583.20   10.29
Total             252   2645.90
(a) How many students were included in the study? _253___ (b) What are the F-statistic and p-value of the ANOVA test? Give a conclusion in context, using a 5% significance level. We see in the ANOVA table that the F-statistic is 6.09 and the p-value is 0.014. At a 5% level, we reject H0 and conclude that this model using depression score to predict classes missed is an effective model.
(c) What is the p-value of the test for slope? How does it compare to the ANOVA p-value? In the regression output, the p-value for the test of slope (in the DepressionScore row) is 0.014. It is the same as the ANOVA p-value, as expected. (d) Use the sum of squares values in the ANOVA table to compute R², and compare the result with the value given in the output. R² = SSModel/SSTotal = 62.70/2645.90 = 0.0237. This matches the value given in the output: R-Sq = 2.4%. (e) (Optional) What is the standard deviation of the error term? Either compute it from the ANOVA table or find it in the output. We compute the standard deviation of the error term with s_ε = √(SSE/(n − 2)) = √(2583.2/251) = √10.29 = 3.21. This is the value S = 3.20806 shown in the computer output above.
Quick Self-Quiz: ANOVA for Regression: Alcohol and Missed Classes In addition to the ClassesMissed variable in the SleepStudy dataset described in Example 1, another variable is Drinks, the number of alcoholic drinks in a week that the college students say they have. Computer output is shown below for the regression line to predict classes missed based on the number of alcoholic drinks.
The regression equation is ClassesMissed = 1.86 + 0.0620 Drinks
Predictor   Coef      SE Coef   T      P
Constant    1.8644    0.3439    5.42   0.000
Drinks      0.06196   0.04979   1.24   0.215
S = 3.23679   R-Sq = 0.6%   R-Sq(adj) = 0.2%
Analysis of Variance
Source            DF    SS        MS      F      P
Regression        1     16.22     16.22   1.55   0.215
Residual Error    251   2629.67   10.48
Total             252   2645.90
(a) What are the F-statistic and p-value of the ANOVA test? Give a conclusion in context, using a 5% significance level. We see in the ANOVA table that the F-statistic is 1.55 and the p-value is 0.215. At a 5% level, we do not reject H0 and do not find evidence that this model to predict classes missed using number of alcoholic drinks is an effective model.
(b) What is the p-value of the test for slope? How does it compare to the ANOVA p-value? In the regression output, the p-value for the test of slope (in the Drinks row) is 0.215. It is the same as the ANOVA p-value, as expected. (c) Use the sum of squares values to compute R², and compare the result with the value given in the output. Do we want this value to be large or small? R² = SSModel/SSTotal = 16.22/2645.90 = 0.00613. This matches the value given in the output: R-Sq = 0.6%. We want R² to be large, to explain more of the variability in the y-values. (d) What is the standard deviation of the error term? Either compute it from the ANOVA table or find it in the output. Do we want this value to be large or small? We compute the standard deviation of the error term with s_ε = √(SSE/(n − 2)) = √(2629.67/251) = √10.48 = 3.24. This is the value S = 3.23679 shown in the computer output above. We want the errors (residuals) to be close to zero (which is their mean), so less variability implies the errors tend to be closer to zero.
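A very small Python sketch of these two computations from the ANOVA sums of squares (nothing beyond the standard math module is needed):

from math import sqrt

ss_model, ss_error, ss_total, n = 16.22, 2629.67, 2645.90, 253
r_squared = ss_model / ss_total              # about 0.006, i.e. 0.6%
s_epsilon = sqrt(ss_error / (n - 2))         # about 3.24, matching S in the output
print(r_squared, s_epsilon)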
Section 9-3: Confidence and Prediction Intervals Example 1: Predicting Mustang Prices The dataset MustangPrice contains information for a sample of 25 used Mustang cars offered for sale on a website. Suppose that we are interested in predicting the Price of a Mustang (in thousands of dollars) based on how old it is (Age in years). Some output for fitting this model is shown below.
The regression equation is Price = 30.3 - 1.72 Age
Predictor   Coef      SE Coef   T       P
Constant    30.264    3.440     8.80    0.000
Age         -1.7168   0.3648    -4.71   0.000
S = 8.10241   R-Sq = 49.1%   R-Sq(adj) = 46.8%
Analysis of Variance
Source            DF   SS       MS       F       P
Regression        1    1454.4   1454.4   22.15   0.000
Residual Error    23   1509.9   65.6
Total             24   2964.3
(a) Is age of a used Mustang an effective predictor of its price? Justify your answer using information from the output above. Yes. The p-value from ANOVA (or the test for slope) is 0.000, which is significant at a 5% level. (b) If you are thinking about buying a 5 year-old Mustang, what would you predict the price to be? Predicted Price = 30.26 − 1.72(5) = 21.66. The predicted price for a 5 year-old Mustang is $21,660. (c) Two intervals are given below. One is the 95% confidence interval for mean price and one is the 95% prediction interval for individual prices, for Mustangs that are 5 years old. Which is which? Interval A is the CI for mean price and the wider Interval B is the PI for individual prices. Age 5
Interval A (17.49, 25.86)
Interval B (4.40, 38.96)
(d) Clearly interpret both of the intervals in part (c) in context. We are 95% sure that the mean price of all 5 year-old Mustangs for sale at this website is between 17.49 and 25.86 thousand dollars ($17,490 to $25,860). We are 95% sure that a randomly chosen 5 year-old Mustang will cost between 4.40 and 38.96 thousand dollars ($4,400 to $38,960). (e) What is the predicted price for a 10 year-old Mustang? Predicted Price = 30.26 − 1.72(10) = 13.06. The predicted price for a 10 year-old Mustang is $13,060. (f) Two intervals are given below. One is the 95% confidence interval for mean price and one is the 95% prediction interval for individual prices, for Mustangs that are 10 years old. Which is which? Interval A is the PI for individual prices (wider) and Interval B is the CI for mean price. Age 10
Interval A (-4.04, 30.24)
Interval B (9.51, 16.68)
(g) Clearly interpret both of the intervals in part (f) in context. We are 95% sure that the mean price of all 10 year-old Mustangs for sale at this website is between 9.51 and 16.68 thousand dollars ($9,510 to $16,680). We are 95% sure that a randomly chosen 10 year-old Mustang will cost between 0 and 30.24 thousand dollars (i.e., under $30,240). Note that a negative price is not feasible for a used car.
Quick Self-Quiz: Confidence and Prediction Intervals The NBAPlayers2011 dataset has information on all the regular players in the NBA in the 2010-2011 season. Two of the variables are the number of field goals attempted in the season and the number of field goals made in the season. Computer output is shown below for the regression line to predict the number of field goals made based on the number of field goals attempted.
The regression equation is FGMade = - 5.57 + 0.471 FGAttempt
Predictor   Coef       SE Coef    T       P
Constant    -5.568     7.764      -0.72   0.474
FGAttempt   0.471146   0.009109   51.72   0.000
S = 37.0055   R-Sq = 93.9%   R-Sq(adj) = 93.9%
Analysis of Variance
Source            DF    SS        MS        F         P
Regression        1     3663379   3663379   2675.16   0.000
Residual Error    174   238277    1369
Total             175   3901655
(a) Is the number of field goal attempts a good predictor of the number of field goals made? Justify your answer using specific values in the output. Yes. The p-value from ANOVA (or the test for slope) is 0.000. (b) Interpret the value of R² for this model. R² is 93.9%. The number of field goal attempts explains 93.9% of the variability in field goals made. (c) Use the regression equation to find the predicted number of field goals made if a player attempts 1000 field goals. Predicted FGMade = −5.57 + 0.471(1000) = 465.4. The predicted number of field goals made is 465.4 if a player attempts 1000 field goals. (d) The output below shows the 95% confidence and prediction intervals for players that attempt 1000 field goals in a season. Clearly interpret both in context.
FGAttempt   Fit      SE Fit   95% CI             95% PI
1000        465.58   3.35     (458.96, 472.20)   (392.24, 538.92)
We are 95% sure that the mean number of field goals made by all players who attempt 1000 field goals is between 458.96 and 472.20. We are 95% sure that the number of field goals made will be between 392 and 539 for any player who attempts 1000 field goals in a season.
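The two intervals in part (d) can be recovered from the fitted value, its standard error, and the regression standard error S; a minimal Python sketch (scipy assumed):

from math import sqrt
from scipy.stats import t

fit, se_fit, s, df = 465.58, 3.35, 37.0055, 174   # df = n - 2
t_star = t.ppf(0.975, df)
ci = (fit - t_star * se_fit, fit + t_star * se_fit)
pi_se = sqrt(se_fit**2 + s**2)               # prediction error also includes S
pi = (fit - t_star * pi_se, fit + t_star * pi_se)
print(ci)   # roughly (459.0, 472.2)
print(pi)   # roughly (392.2, 538.9)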
Section 10.1: Multiple Predictors Example 1: Predicting GPA Multiple regression output is shown for predicting a college student’s grade point average using the height in inches, number of hours spent watching television per week, combined math and verbal SAT score, and the number of piercings the student has (based on the data in StudentSurvey). The following questions refer to this output.
The regression equation is GPA = 2.37 - 0.00983 Height - 0.00505 TV + 0.00123 SAT + 0.0084 Piercings
Predictor   Coef        SE Coef     T       P
Constant    2.3692      0.4547      5.21    0.000
Height      -0.009829   0.005823    -1.69   0.092
TV          -0.005045   0.003626    -1.39   0.165
SAT         0.0012290   0.0001693   7.26    0.000
Piercings   0.00840     0.01114     0.75    0.451
S = 0.367674   R-Sq = 15.8%   R-Sq(adj) = 14.8%
Analysis of Variance
Source            DF    SS        MS       F       P
Regression        4     8.4689    2.1172   15.66   0.000
Residual Error    333   45.0164   0.1352
Total             337   53.4853
(a) One of the students in the sample has a GPA of 3.13, is 71 inches tall, watches 1 hour of TV per week, has a combined SAT score of 1210, and has no piercings. Find the predicted GPA and the residual for this person. Predicted GPA = 2.37 - 0.00983(71) - 0.00505(1) + 0.00123(1210) + 0.0084(0) = 3.15532 Residual = 3.13 – 3.15532 = -0.02532 (b) Interpret the coefficient of TV in this model. The coefficient is -0.00505, so for every additional hour that a student watches TV per week (assuming all other variables stay the same), the grade point average goes down by about 0.00505 points. (c) State the conclusion of the analysis of variance test. The ANOVA F-statistic is 15.66 and the p-value is 0.000, so there is strong evidence that the model based on Height, TV, SAT, and Piercings is effective at predicting grade point average. (d) Interpret the value of R2 in context. R-Sq = 15.8% which means that 15.8% of the variability in GPA can be explained by the model (or can be explained by the four variables Height, TV watching, SAT score, and number of piercings). (e) Which predictor variable is most significant in the model? SAT, with a p-value of 0.000 (f) Which predictor variable is least significant in the model? Piercings, with a p-value of 0.451 (g) Which predictor variables are significant at a 10% level? Height (just barely) and SAT
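A tiny Python sketch of the prediction and residual in part (a), computed from the full-precision fitted coefficients (the worksheet uses the rounded equation and gets 3.15532; the small difference is only rounding):

coefs = {"const": 2.3692, "Height": -0.009829, "TV": -0.005045,
         "SAT": 0.0012290, "Piercings": 0.00840}
student = {"Height": 71, "TV": 1, "SAT": 1210, "Piercings": 0}

predicted = coefs["const"] + sum(coefs[k] * v for k, v in student.items())
residual = 3.13 - predicted
print(predicted, residual)    # about 3.153 and -0.023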
Quick Self-Quiz: Multiple Predictors: Miami Heat
The dataset MiamiHeat has information from boxscores from all 82 regular season basketball games played by the Miami Heat NBA team in the 2010-11 season. Multiple regression output is shown for predicting the number of points scored by the Miami Heat in a game based on the number of rebounds, steals, blocks, and assists in the game.

The regression equation is
Points = 64.7 + 0.216 Rebounds - 0.068 Steals + 0.839 Blocks + 1.22 Assists

Predictor     Coef       SE Coef     T       P
Constant     64.655      9.724      6.65    0.000
Rebounds      0.2156     0.2019     1.07    0.289
Steals       -0.0676     0.4448    -0.15    0.880
Blocks        0.8389     0.4671     1.80    0.076
Assists       1.2192     0.2298     5.30    0.000

S = 9.76719   R-Sq = 29.9%   R-Sq(adj) = 26.3%

Analysis of Variance
Source            DF         SS        MS       F       P
Regression         4    3139.06    784.76    8.23   0.000
Residual Error    77    7345.64     95.40
Total             81   10484.70
(a) In the first game of the season, the Heat scored 80 points, and had 39 rebounds, 10 steals, 6 blocks, and 15 assists. Find the predicted number of points and the residual for this game.
Predicted Points = 64.7 + 0.216(39) − 0.068(10) + 0.839(6) + 1.22(15) = 95.8
Residual = 80 – 95.8 = −15.8
(b) Interpret the coefficient of Rebounds in this model.
The coefficient is 0.216, so for every extra rebound the team gets in a game, the predicted number of points scored for that team goes up by 0.216 (assuming all other predictor values are the same).
(c) State the conclusion of the analysis of variance test.
The ANOVA F-statistic is 8.23 and the p-value is 0.000, so there is strong evidence that this model based on Rebounds, Steals, Blocks, and Assists is effective at predicting the team's number of points in a game.
(d) Interpret the value of R2 in context.
R-Sq = 29.9%, so 29.9% of the variability in number of points scored in games by the Miami Heat is explained by this model (or is explained by the number of rebounds, steals, blocks, and assists the team gets).
(e) Which predictor variable is most significant in the model? Assists, with a p-value of 0.000
(f) Which predictor variable is least significant in the model? Steals, with a p-value of 0.880
(g) Which predictor variables are significant at a 5% level? Only Assists
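As a quick arithmetic check on part (a), the prediction can be rebuilt directly from the rounded coefficients in the regression equation (a small sketch, not part of the original solutions):

    # Plug the first game's statistics into the fitted equation
    coef = {"const": 64.7, "Rebounds": 0.216, "Steals": -0.068, "Blocks": 0.839, "Assists": 1.22}
    game = {"Rebounds": 39, "Steals": 10, "Blocks": 6, "Assists": 15}

    predicted = coef["const"] + sum(coef[k] * game[k] for k in game)
    residual = 80 - predicted
    print(round(predicted, 1), round(residual, 1))   # 95.8 and -15.8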
Section 10.2: Checking Conditions for a Regression Model Example 1: Depression and Missing Classes We use a regression model to predict number of classes missed based on a depression score for a sample of 253 college students in SleepStudy. The figures below show the scatterplot with regression line (left), the histogram of residuals (middle), and the residuals vs fits plot (right) to help us check the conditions for this model.
(a) One student had a depression score slightly over 20 and missed 20 classes. Put an arrow in all three graphs showing where this student is represented. In the scatterplot we see only two students with 20 classes missed, so the point we want is the one with depression score over 20. The predicted response for this student (dropping down to the regression line) looks like about 3 or 4 classes missed. This makes the residual≈20−4=16, so, in the second graph, the student is represented by the small histogram bar just above 15. In the third graph, we should look for the second largest positive residual (at about 16) which is well above the zero line at a fitted value of about 3.7.
(b) Another student has the highest depression score of everyone in the study. Put a circle in all three graphs showing where this student is represented. In the scatterplot, we see the highest depression score is about 35 with 2 missed classes. The predicted number of classes missed for this student is about 5 from the fitted regression line; more precisely, about 4.7, the largest (rightmost) fitted value in the residuals vs fits plot. Thus the residual is about 2 − 4.7 = −2.7, which is probably among the values in the tallest bar of the histogram. See the circles above.
(c) Are the conditions met for using this regression model? Comment on all three graphs in your response. There are several problems with the regression conditions, the most serious of which is the number of large outliers: points well above the line in the scatterplot with regression line. These also contribute to the right skew in the histogram of residuals, violating the normality condition. The residuals vs fits plot doesn't show roughly equal bands on either side of the zero mean; rather, we again see the several large positive residuals that aren't balanced by similar-sized negative residuals below the line. There is no clear curvature in the data, but the residuals vs fits plot shows an interesting pattern as the most extreme negative residuals decrease in a regular fashion rather than a random scatter. We should be hesitant to use inference based on a linear model for these data.
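The three diagnostic plots discussed above can be produced with code along these lines. This is a sketch only; it assumes the SleepStudy data are in a CSV file with columns named ClassesMissed and DepressionScore (the file name and both column names are assumptions, not taken from the output).

    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf

    sleep = pd.read_csv("SleepStudy.csv")                    # hypothetical file name
    fit = smf.ols("ClassesMissed ~ DepressionScore", data=sleep).fit()

    fig, ax = plt.subplots(1, 3, figsize=(12, 4))
    ax[0].scatter(sleep["DepressionScore"], sleep["ClassesMissed"])      # scatterplot
    ax[0].plot(sleep["DepressionScore"], fit.fittedvalues, color="red")  # fitted line
    ax[1].hist(fit.resid, bins=20)                                       # histogram of residuals
    ax[2].scatter(fit.fittedvalues, fit.resid)                           # residuals vs fits
    ax[2].axhline(0, color="gray")
    plt.show()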
Quick Self-Quiz: Checking Conditions for a Regression Model We check the conditions for using a variable X to predict a variable Y with simulated data. The figures below show the scatterplot with regression line (left), the histogram of residuals (middle), and the residuals vs fits plot (right).
(a) One case has an x- value around 30 and a large positive residual. Put an arrow in all three graphs showing where this case is represented. Notice that this case has an x-value around 30, a y-value around 70, and a predicted y-value around 40, so the residual is about 70 – 40 = 30. In the second graph, the case is represented by the small histogram bar at 30. In the third graph, the appropriate dot is the one with a fitted value of about 40 and a residual of about 30. See the arrows above.
(b) Another case has an x-value around 10 and a relatively large negative residual. Put a circle in all three graphs showing where this case is represented. Notice that this case has an x-value around 10, a y-value around 42, and a predicted Y-value around 55, so the residual is about 42 – 55 = –13. In the second graph, the case is represented by the histogram bar at about −15. In the third graph, the appropriate dot is the one with a fitted value of about 55 and a residual of about −13 that stands out below the pattern near the right side of the graph. See the circles above.
(c) Are the conditions met for using this regression model? Comment on all three graphs in your response. There are several problems with the regression conditions, the most serious of which is the curved pattern in the scatterplot, which shows up very clearly both in the scatterplot and in the residuals vs fits plot. Note that with a multiple regression model, we would not have the scatterplot to guide us, but we could still recognize the problem from the residuals vs fits plot. The histogram looks reasonably symmetric and bell-shaped, so we find no serious problems with normality.
Section 10.3: Using Multiple Regression

Example 1: Predicting GPA
Multiple regression output is shown for predicting a college student's grade point average using the height in inches, number of hours spent watching television per week, combined math and verbal SAT score, and the number of piercings the student has (based on data in StudentSurvey). The questions below refer to this output.

The regression equation is
GPA = 2.37 - 0.00983 Height - 0.00505 TV + 0.00123 SAT + 0.0084 Piercings

Predictor     Coef         SE Coef      T       P
Constant      2.3692       0.4547       5.21    0.000
Height       -0.009829     0.005823    -1.69    0.092
TV           -0.005045     0.003626    -1.39    0.165
SAT           0.0012290    0.0001693    7.26    0.000
Piercings     0.00840      0.01114      0.75    0.451

S = 0.367674   R-Sq = 15.8%   R-Sq(adj) = 14.8%

Analysis of Variance
Source            DF        SS        MS        F       P
Regression         4    8.4689    2.1172    15.66   0.000
Residual Error   333   45.0164    0.1352
Total            337   53.4853
(a) How might we proceed if we want to try to improve this model, using just the data for these variables?
Eliminate the variable Piercings, since it is the least significant variable (p-value = 0.451).
(b) If we eliminate the most insignificant predictor variable in the model, do we expect the coefficients and p-values of the other variables to change or remain the same?
We expect them to change, often in unpredictable ways.
(c) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect the F-statistic to increase, decrease, or remain the same?
Increase. As the model improves, the F-statistic will increase.
(d) If we eliminate the most insignificant variable in the model and expect the model to improve, do we expect the ANOVA p-value to increase, decrease, or remain the same?
Decrease. As the model improves, the ANOVA p-value will decrease. In this case, the p-value is already at 0.000. It will still decrease, but the change is so many digits out that we can't see the change. That's why it is also helpful to look at the F-statistic.
(e) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect R2 to increase, decrease, or remain the same? What type of change do we hope for in R2?
Decrease. If we remove variables, R2 always decreases (although the decrease may be so small that it appears to stay the same in the digits we see). The change we hope to see is only a very small decrease.
(f) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect the standard deviation of the error Sε to increase, decrease, or remain the same?
Decrease. We hope the errors will get closer to their mean of zero.
(g) Use technology to try to find the best model to predict GPA using some or all of these variables.
Answers will vary. For example, Height and SAT might work well together.
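For part (g), one way to compare the full model with a reduced model that drops Piercings is sketched below (again assuming a StudentSurvey.csv file with these column names; the file name is hypothetical):

    import pandas as pd
    import statsmodels.formula.api as smf

    survey = pd.read_csv("StudentSurvey.csv")   # hypothetical file name
    full = smf.ols("GPA ~ Height + TV + SAT + Piercings", data=survey).fit()
    reduced = smf.ols("GPA ~ Height + TV + SAT", data=survey).fit()   # drop Piercings

    for label, m in [("full", full), ("without Piercings", reduced)]:
        # R-squared, ANOVA F-statistic, and standard deviation of the error for each model
        print(label, round(m.rsquared, 3), round(m.fvalue, 2), round(m.mse_resid ** 0.5, 4))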
Quick Self-Quiz: Improving the Model: Miami Heat
The dataset MiamiHeat has information from boxscores from all 82 regular season basketball games played by the Miami Heat NBA team in the 2010-11 season. Multiple regression output is shown for predicting the number of points scored by the Miami Heat in a game based on the number of rebounds, steals, blocks, and assists in the game.

The regression equation is
Points = 64.7 + 0.216 Rebounds - 0.068 Steals + 0.839 Blocks + 1.22 Assists

Predictor     Coef       SE Coef     T       P
Constant     64.655      9.724      6.65    0.000
Rebounds      0.2156     0.2019     1.07    0.289
Steals       -0.0676     0.4448    -0.15    0.880
Blocks        0.8389     0.4671     1.80    0.076
Assists       1.2192     0.2298     5.30    0.000

S = 9.76719   R-Sq = 29.9%   R-Sq(adj) = 26.3%

Analysis of Variance
Source            DF         SS        MS       F       P
Regression         4    3139.06    784.76    8.23   0.000
Residual Error    77    7345.64     95.40
Total             81   10484.70
(b) How might we proceed if we want to try to improve this model, using only some or all of these variables?
Eliminate the variable Steals, since it is the least significant variable (p-value = 0.880).
(c) If we eliminate the most insignificant predictor variable in the model, do we expect the coefficients and p-values of the other variables to change or remain the same?
We expect them to change, often in unpredictable ways.
(d) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect the F-statistic to increase, decrease, or remain the same?
Increase. As the model improves, the F-statistic will increase.
(e) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect the ANOVA p-value to increase, decrease, or remain the same?
Decrease. As the model improves, the ANOVA p-value will decrease. In this case, the p-value is already at 0.000. It will decrease, but the change is so many digits out that we can't see the change. That's why it is also helpful to look at the F-statistic.
(f) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect R2 to increase, decrease, or remain the same? What type of change do we hope for in R2?
Decrease. If we remove variables, R2 never increases (although the decrease may be so small that it appears to stay the same in the digits we see). The change we hope to see is only a very small decrease.
(g) If we eliminate the most insignificant variable in the model and think that the model will improve, do we expect the standard deviation of the error Sε to increase, decrease, or remain the same?
Decrease. We hope the errors will be smaller in magnitude, so closer to their mean of zero.
(h) Use technology to try to find the best model to predict Points using some or all of these variables.
Answers may vary, but dropping just Blocks leaves a pretty effective model.
Section P.1: Probability Rules

Example 1: Hockey Hall of Fame
The table below shows all players ever inducted into the Hockey Hall of Fame, by place of birth and position. Write each of the following as a probability expression, using the event names given in the table, and find the probability.

                O = Offense   D = Defense   G = Goal   Total
C = Canada          123            71           33       227
U = USA               7             2            1        10
E = Europe            6             3            2        11
X = Other             2             1            0         3
Total               138            77           36       251
a). What is the probability that an inductee chosen at random plays offense?
P(O) = 138/251 = 0.550
b). What is the probability that an inductee chosen at random is not from Canada?
P(not C) = 1 – (227/251) = 1 – 0.904 = 0.096
c). What is the probability that an inductee chosen at random is a European defenseman?
P(E and D) = 3/251 = 0.012
d). What is the probability that an inductee chosen at random is either from the USA or a goalie?
P(U or G) = (7 + 2 + 1 + 33 + 2 + 0)/251 = 45/251 = 0.179
e). What is the probability that a Canadian inductee plays goal?
P(G if C) = 33/227 = 0.145
f). What is the probability that an inductee who plays offense is from Canada?
P(C if O) = 123/138 = 0.891
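All of these probabilities come from reading counts out of the two-way table. A small sketch that recomputes them (the DataFrame below simply re-enters the counts shown above):

    import pandas as pd

    hof = pd.DataFrame({"O": [123, 7, 6, 2], "D": [71, 2, 3, 1], "G": [33, 1, 2, 0]},
                       index=["C", "U", "E", "X"])
    total = hof.values.sum()                                  # 251 inductees

    print(hof["O"].sum() / total)                             # P(O)
    print(1 - hof.loc["C"].sum() / total)                     # P(not C)
    print(hof.loc["E", "D"] / total)                          # P(E and D)
    print((hof.loc["U"].sum() + hof["G"].sum() - hof.loc["U", "G"]) / total)  # P(U or G)
    print(hof.loc["C", "G"] / hof.loc["C"].sum())             # P(G if C)
    print(hof.loc["C", "O"] / hof["O"].sum())                 # P(C if O)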
Example 2: Computing Probabilities Suppose that, for events A and B, P(A) = 0.6, P(B) = 0.3, and P(A and B) = 0.24. a). Find P(not A)
= 1 – 0.6 = 0.4
b). Find P(A or B)
= P(A) + P(B) − P(A and B) = 0.6 + 0.3 – 0.24 = 0.66
c). Find P(A if B)
=P(A and B)/P(B)= 0.24/0.3 = 0.8
d). Find P(B if A)
= P(A and B)/P(A)=0.24/0.6 = 0.4
e). Are events A and B disjoint? No, since P(A and B) is not zero. f). Are events A and B independent?
No, since P(A and B) = 0.24 but P(A) · P(B) = (0.6)(0.3) = 0.18.
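A few lines of arithmetic confirm these answers and the independence check (a minimal sketch):

    pA, pB, pAB = 0.6, 0.3, 0.24      # P(A), P(B), P(A and B)

    print(1 - pA)            # P(not A) = 0.4
    print(pA + pB - pAB)     # P(A or B) = 0.66
    print(pAB / pB)          # P(A if B) = 0.8
    print(pAB / pA)          # P(B if A) = 0.4
    print(pA * pB)           # 0.18, which is not 0.24, so A and B are not independent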
Example 3: Reese’s Pieces In a bag of Reese’s Pieces, there are 28 orange ones, 15 yellow ones, and 12 brown ones, for a total of 55 pieces. a). If we select one at random, what is the probability that it is yellow? P(Y) = 15/55 = 0.273
b). If we select one at random, what is the probability that it is not brown? P(not B) = 1−P(B) = 1 – (12/55) = 1 – 0.218 = 0.782
c). If we select one at random, then put it back and mix them up well so that the selections are random, what is the probability that both the first and second ones are yellow? The probability that the first one is yellow is 15/55. Since we put it back and mix them up, the probability stays the same for the second, so the probability the second is yellow is also 15/55, i.e. these two events are independent. By the multiplication rule for independent events, the probability that both are yellow is P(Y1 and Y2) = P(Y1)P(Y2)=(15/55) (15/55) = 0.074. d). If we select one and keep it and then select a second one, what is the probability that both of them are yellow? The probability that the first one is yellow is 15/55. If we don’t put it back, there are only 14 yellow pieces left out of a total of 54, so the probability the second is yellow is 14/54. By the multiplication rule, the event both are yellow has probability P(Y1 and Y2) = P(Y1)P(Y2 if Y1 )= (15/55) (14/54) = 0.071.
Example 4: Left-Handed About 11% of males are left-handed while about 9% of females are left-handed. a). If a female is selected at random, what is the probability that she is not left-handed? By the complement rule, we have 1 – 0.09 = 0.91
b). If one male and one female are selected at random, what is the probability that both are left-handed? By the multiplication rule and assuming selections are independent since they are randomly selected, we have (0.09)(0.11) = 0.0099.
c). If one male and one female are selected at random, what is the probability that neither is left-handed? By the multiplication rule and assuming selections are independent since they are randomly selected, we have (0.91)(0.89) = 0.8099.
d). If three males are selected at random, what is the probability that none of them are left-handed? By the multiplication rule and assuming selections are independent since they are randomly selected (and we assume the population is very large), we have (0.89)(0.89)(0.89) = 0.705.
Section P.2: Tree Diagrams and Bayes’ Rule Example 1: Flower Seeds A gardener plants seeds for three types of flowers, A, B, and C, with 48% of the seeds for type A, 32% for type B, and the remaining 20% for type C. We know that 90% of the seeds for Type A will germinate, only 30% of the seeds for Type B will germinate, and 60% of the seeds for type C will germinate. Create a tree diagram using this information and use it to answer the questions below.
a). What is the probability that a seed chosen at random is for type A and germinates?
This is the top branch of the tree diagram and we see that the probability is (0.48)(0.90) = 0.432.
b). What is the probability that a seed chosen at random is for type C and does not germinate?
Looking at this branch of the tree diagram, we see that the probability is (0.20)(0.40) = 0.080.
c). What percent of all seeds in the garden will germinate?
Using the total probability rule and adding up all the branches that include germinating, we see that the probability is 0.432 + 0.096 + 0.120 = 0.648.
d). What is the probability that a seed that germinates is for type A?
This is a conditional probability. We see in part (c) that the probability that a seed germinates is 0.648. We see from part (a) that the probability a seed is type A and germinates is 0.432. Therefore this conditional probability is P(A if G) = P(A and G)/P(G) = 0.432/0.648 = 0.667.
e). What is the probability that a seed either is type B or does not germinate?
Adding up all the probabilities that include either of these options, we see that the probability is 0.048 + 0.096 + 0.224 + 0.080 = 0.448.
f). What is the probability that a seed that does not germinate is for type B?
This is a conditional probability. The probability that a seed does not germinate is 0.352 (either adding up these probabilities or using the complement rule on the result of part (c)) and the probability that a seed does not germinate and is type B is 0.224. Therefore this conditional probability is P(B if not G) = P(B and not G)/P(not G) = 0.224/0.352 = 0.636.
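The tree-diagram calculations can be organized in a few lines of code; this sketch just multiplies along branches and adds across them, matching the values above.

    p_type = {"A": 0.48, "B": 0.32, "C": 0.20}     # first set of branches
    p_germ = {"A": 0.90, "B": 0.30, "C": 0.60}     # germination rate for each type

    p_A_and_G = p_type["A"] * p_germ["A"]                        # part (a): 0.432
    p_C_and_notG = p_type["C"] * (1 - p_germ["C"])               # part (b): 0.080
    p_G = sum(p_type[t] * p_germ[t] for t in p_type)             # part (c): 0.648
    p_A_if_G = p_A_and_G / p_G                                   # part (d): 0.667
    p_B_or_notG = p_type["B"] + (1 - p_G) - p_type["B"] * (1 - p_germ["B"])  # part (e): 0.448
    p_B_if_notG = p_type["B"] * (1 - p_germ["B"]) / (1 - p_G)    # part (f): 0.636
    print(p_G, p_A_if_G, p_B_or_notG, p_B_if_notG)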
Example 2: False Positives and False Negatives Suppose that only 3 in 10,000 people have a certain disease. In a test for the disease, a positive result indicates that the person may have the disease and a negative result generally means the person does not have the disease. However, the test gives a false positive (positive result when the person does not have the disease) 4% of the time for healthy people and gives a false negative (negative result when the person does have the disease) 1% of the time for people with the disease. If a person has the test and receives a positive result, what is the probability that the person actually has the disease? We first use the information given to construct a tree diagram, using S for sick and H for healthy for the events of having the disease or not, and then P for positive and N for negative for the results of the test. The tree diagram is shown:
We are being asked to find the probability that a person is sick given that the person received a positive test result, which is P(S if P). We see from the tree diagram that the probability a person receives a positive test result is P(P) = 0.000297 + 0.039988 = 0.040285 and the probability that a person is sick and receives a positive test result is P(S and P) = 0.000297. Therefore the conditional probability is P(S if P) = P(S and P)/P(P) = 0.000297/0.040285 = 0.00737. Notice that even though the person received a (potentially scary) positive test result, the probability is only about 0.007 that the person has the disease, which means there is about a 99.3% chance that the person does not have the disease.
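The same Bayes' rule calculation in code form (a sketch using only the rates given in the problem):

    p_sick = 3 / 10000
    p_pos_if_sick = 1 - 0.01        # 1 minus the false negative rate
    p_pos_if_healthy = 0.04         # false positive rate

    p_pos = p_sick * p_pos_if_sick + (1 - p_sick) * p_pos_if_healthy
    p_sick_if_pos = p_sick * p_pos_if_sick / p_pos
    print(p_pos, p_sick_if_pos)     # about 0.0403 and 0.0074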
Section P.3: Random Variables and Probability Functions

Example 1: Probability Functions
Probability functions for three random variables X, Y, and Z are shown below. Fill in the blank to make each a valid probability function.

(a) X:
x      1     2     3     4
p(x)   0.7   0.1   0.1   0.1

(b) Y:
x      1      2      3      4
p(x)   0.25   0.25   0.25   0.25

(c) Z:
x      10    12    14    16
p(x)   0.1   0.1   0.4   0.4
Example 2: Find the Probabilities (a) For probability function (a) above, find P(X < 3).
0.7 + 0.1 = 0.8
(b) For probability function (a) above, find P(X > 2).
0.1 + 0.1 = 0.2
(c) For probability function (b) above, find P(X is odd).
0.25 + 0.25 = 0.5
(d) For probability function (b) above, find P(X ≥ 2).
0.25 + 0.25 + 0.25 = 0.75
(e) For probability function (c) above, find P(X < 15).
0.1 + 0.1 + 0.4 = 0.6
(f) For probability function (c) above, find P(X = 12 or X = 14).
0.1 + 0.4 = 0.5
Example 3: Matching Means
For each of the probability functions (a), (b), and (c) in Example 1 above, pick the best choice below for the mean without doing any calculations.
0.25   0.4   0.7   1.8   2.5   3.2   12.0   13.0   14.0
(a) Mean for X is closest to ___1.8______ (All x-values are between 1 and 4 so the mean has to be between 1 and 4. The probabilities show that the values are weighted closer to 1.) (b) Mean for Y is closest to ___2.5______ (All y-values are between 1 and 4 so the mean has to be between 1 and 4. The probabilities show that the values are equally spread out, so the mean is in the middle.) (c) Mean for Z is closest to ___14.0_____ (All z-values are between 10 and 16 so the mean has to be between 10 and 16. The probabilities show that the values are weighted toward 14 and 16.)
Example 4: Calculating Means Calculate the mean for each of the probability functions in Example 1. (a) Mean for X = 1(0.7) + 2(0.1) + 3(0.1) + 4(0.1) = 1.6. (b) Mean for Y = 1(0.25) + 2(0.25) + 3(0.25) + 4(0.25) = 2.5. (c) Mean for Z = 10(0.1) + 12(0.1) + 14(0.4) + 16(0.4) = 14.2.
Example 5: Calculating Standard Deviations
Calculate the standard deviation for each of the probability functions in Example 1.
(a) Variance for X = (1 – 1.6)²(0.7) + (2 – 1.6)²(0.1) + (3 – 1.6)²(0.1) + (4 – 1.6)²(0.1) = 1.04, so the standard deviation is σX = √1.04 = 1.020.
(b) Variance for Y = (1 – 2.5)²(0.25) + (2 – 2.5)²(0.25) + (3 – 2.5)²(0.25) + (4 – 2.5)²(0.25) = 1.25, so the standard deviation is σY = √1.25 = 1.118.
(c) Variance for Z = (10 – 14.2)²(0.1) + (12 – 14.2)²(0.1) + (14 – 14.2)²(0.4) + (16 – 14.2)²(0.4) = 3.56, so the standard deviation is σZ = √3.56 = 1.887.
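The means and standard deviations in Examples 4 and 5 come straight from the formulas μ = Σ x·p(x) and σ² = Σ (x − μ)²·p(x); a short sketch:

    def mean_sd(xs, ps):
        mu = sum(x * p for x, p in zip(xs, ps))
        var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
        return mu, var ** 0.5

    print(mean_sd([1, 2, 3, 4], [0.7, 0.1, 0.1, 0.1]))       # X: (1.6, 1.020)
    print(mean_sd([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25]))   # Y: (2.5, 1.118)
    print(mean_sd([10, 12, 14, 16], [0.1, 0.1, 0.4, 0.4]))   # Z: (14.2, 1.887)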
Section P.4: Binomial Probabilities

Example 1: Binomial or Not?
In each case below, indicate whether the random variable is binomial or not. If it is, give the values of n and p.
(a) In Great Britain, 66% of the people drink tea every day. We sample 50 people at random from Great Britain and count the number who drink tea every day.
This is binomial, with n = 50 and p = 0.66.
(b) We flip a coin until we get 5 heads and record how many flips of the coin it takes.
This is not binomial, since the number of trials is not fixed in advance.
(c) A bag has 25 Reese's Pieces, with 6 yellow ones, 5 brown ones, and 14 orange ones. We take out 3 of the pieces and count how many are orange.
This is not binomial, since after we take out one piece, the proportions change for the remaining pieces. The probability p that a piece is orange is not constant.
(d) A very large bag has thousands of Reese's Pieces in it, with 24% yellow, 28% brown, and 48% orange. We sample 25 pieces at random and count the number that are yellow.
This is binomial, since there are so many pieces in the population that the proportions do not change appreciably as we take out one or a small number. We have n = 25 and p = 0.24.
(e) Count the number of correct answers on a 10 question multiple choice test, where each question has 5 choices.
Students might be tempted to say this follows a binomial with n = 10 and p = 1/5, but that is only if students are guessing at random! For a real exam, the chance of getting a question correct usually varies from question to question (so p is not constant) and the answers might not be independent (students who miss one question might be more likely to miss a similar question).
Example 2: Calculating Binomial Coefficients Calculate each of the following quantities: (a) 5!
5·4·3·2·1 = 120
(b) 7!
7·6·5·4·3·2·1 = 5040
(c) (5 choose 2) = 5!/(2!·3!) = 120/(2·6) = 10
(d) (7 choose 3) = 7!/(3!·4!) = 5040/(6·24) = 35
Example 3: Tea Drinkers in Great Britain
In Great Britain, 66% of the people drink tea every day. If we sample 10 people at random from Great Britain, calculate the probability that exactly 8 of them drink tea every day.
P(X = 8) = (10 choose 8)(0.66)^8 (0.34)^2 = 45 · (0.66)^8 · (0.34)^2 = 0.187
Example 4: Twins and Other Multiple Births
Multiple births occur in 3.3% of all pregnancies. If we randomly sample 50 pregnancies, find the probability that
(a) None of the pregnancies are multiples
P(X = 0) = (50 choose 0)(0.033)^0 (0.967)^50 = 1 · (0.033)^0 · (0.967)^50 = 0.187
(b) Exactly one of the pregnancies is a multiple
P(X = 1) = (50 choose 1)(0.033)^1 (0.967)^49 = 50 · (0.033)^1 · (0.967)^49 = 0.319
(c) Exactly two of the pregnancies are multiples
P(X = 2) = (50 choose 2)(0.033)^2 (0.967)^48 = 1225 · (0.033)^2 · (0.967)^48 = 0.266
Example 5: Finding Mean and Standard Deviation Find the mean and the standard deviation for (a) The number of people who drink tea every day in a random sample of 10 people in Great Britain, as in Example 3. Mean is 𝜇 = 𝑛 · 𝑝 = 10(0.66) = 6.6 Standard deviation is 𝜎 = √𝑛𝑝(1 − 𝑝) = √10(0.66)(0.34) = 1.498
(b) The number of multiple births in a random sample of 50 pregnancies, as in Example 4. Mean is 𝜇 = 𝑛 · 𝑝 = 50(0.033) = 1.65 Standard deviation is 𝜎 = √𝑛𝑝(1 − 𝑝) = √50(0.033)(0.967) = 1.263
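All of the binomial calculations in Examples 3-5 can be checked with scipy (a minimal sketch, not part of the original solutions):

    from scipy.stats import binom

    print(binom.pmf(8, 10, 0.66))              # Example 3: about 0.187
    print(binom.pmf([0, 1, 2], 50, 0.033))     # Example 4: about 0.187, 0.319, 0.266
    print(binom.mean(10, 0.66), binom.std(10, 0.66))     # Example 5(a): 6.6 and about 1.498
    print(binom.mean(50, 0.033), binom.std(50, 0.033))   # Example 5(b): 1.65 and about 1.263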
Section P.5: Density Curves and the Normal Distribution

Example 1: Find the specified areas for a standard normal density, and sketch the area. (We won't bother to include the sketches here but encourage students to draw them.)
(a) The area below z = 0.8: 0.788
(b) The area above z = 1.2: 0.115
(c) The area between z = -1 and z = 2: 0.819

Example 2: Find endpoints on a standard normal density with the given property, and sketch the area.
(a) The area to the left of the endpoint is about 0.20: -0.842
(b) The area to the right of the endpoint is 0.4: 0.253
(c) The area between ±z is 0.80: -1.282 and 1.282

Quick Self-Quiz: Standard Normal Distribution
Find the specified areas for a standard normal density, and sketch the area.
(a) The area above 2.58: 0.0049 (You might want to point out that this is more than two and a half standard deviations above the mean, so it is not surprising that the area is so small.)
(b) The area below –1.32: 0.093
(c) The area between 0.85 and 1.35: 0.109
Find endpoints on a standard normal density with the given property, and sketch the area.
(a) The area to the right of the endpoint is 0.70: -0.524
(b) The area to the left of the endpoint is 0.10: -1.282
(c) The area between ±z is 0.90: -1.645 and 1.645
Example 3: Find the specified areas for a normal density with the given mean and standard deviation.
(a) The area above 62 in a N(50,10) density: 0.115
(b) The area between 5 and 8 in a N(10,2) density: 0.152

Example 4: Find endpoints on the given normal density curve with the given property.
(a) The area to the right on a N(10,4) density curve is about 0.05: 16.580
(b) The area to the left on a N(100,25) density curve is about 0.35: 90.367
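Normal areas and endpoints like these come from the normal cdf and its inverse; for example, with scipy (a sketch):

    from scipy.stats import norm

    print(norm.cdf(0.8))                        # area below z = 0.8, about 0.788
    print(1 - norm.cdf(1.2))                    # area above z = 1.2, about 0.115
    print(norm.cdf(2) - norm.cdf(-1))           # area between -1 and 2, about 0.819
    print(norm.ppf(0.20))                       # endpoint with area 0.20 to the left, about -0.842
    print(1 - norm.cdf(62, loc=50, scale=10))   # area above 62 on N(50,10), about 0.115
    print(norm.ppf(0.95, loc=10, scale=4))      # right-tail area 0.05 on N(10,4), about 16.58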
Quick Self-Quiz: Weights of Newborns Suppose weights of newborn babies in one community are normally distributed with a mean of 7.5 pounds and a standard deviation of 1.2 pounds. (a) Use the 95% rule to sketch a graph of this normal density curve. Include a scale with at least three values on the horizontal axis.
[Bell-shaped curve centered at 7.5, with the horizontal axis labeled 5.1, 6.3, 7.5, 8.7, 9.9]
(b) What percent of newborns weigh less than 5 pounds? 0.019
(c) What percent of newborns weigh more than 11 pounds? 0.0018
(d) If a newborn baby is at the 15th percentile for weight, what is the baby’s weight? 6.256 pounds
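The newborn-weight answers use the same idea with mean 7.5 and standard deviation 1.2 (a sketch):

    from scipy.stats import norm

    print(norm.cdf(5, loc=7.5, scale=1.2))        # part (b): about 0.019
    print(1 - norm.cdf(11, loc=7.5, scale=1.2))   # part (c): about 0.0018
    print(norm.ppf(0.15, loc=7.5, scale=1.2))     # part (d): about 6.26 pounds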
UNIT A: ESSENTIAL SYNTHESIS & REVIEW

Unit A: Essential Synthesis Solutions

A.1 (a) Yes (b) No
A.2 (a) No (b) No
A.3 (a) No (b) Yes
A.4 (a) Yes (b) Yes
A.5 (a) No (b) No
A.6 (a) No (b) Yes
A.7 (a) One categorical variable (b) Bar chart or pie chart (c) Frequency or relative frequency table, proportion
A.8 (a) Two quantitative variables (b) Scatterplot (c) Correlation or slope from regression
A.9 (a) One categorical variable and one quantitative variable (b) Side-by-side boxplots, dotplots, or histograms (c) Statistics by group or difference in means
A.10 (a) One quantitative variable (b) Histogram, dotplot, or boxplot (c) Mean, median, standard deviation, range, IQR
A.11 (a) Two categorical variables (b) Segmented or side-by-side bar charts (c) Two-way table or difference in proportions
A.12 (a) One categorical variable and one quantitative variable (b) Side-by-side boxplots, dotplots, or histograms (c) Statistics by group or difference in means
A.13 (a) One quantitative variable (b) Histogram, dotplot, or boxplot (c) Mean, median, standard deviation, range, IQR
A.14 (a) One categorical variable (b) Bar chart or pie chart (c) Frequency or relative frequency table, proportion
A.15 (a) Two quantitative variables (b) Scatterplot (c) Correlation or slope from regression
A.16 (a) Two categorical variables (b) Segmented or side-by-side bar charts (c) Two-way table or difference in proportions
A.17
(a) This is an experiment since subjects were randomly assigned to one of three groups which determined what method was used.
(b) This study could not be “blind” since both the participants and those recording the results could see what each had applied. (c) The sample is the 46 subjects participating in the experiment. The intended population is probably anyone who might consider using black grease under the eyes to cut down on glare from the sun. (d) One variable is the improvement in contrast sensitivity, and this is a quantitative variable. A second variable records what group the individual is in, and this is a categorical variable. (e) Since we are examining the relationship between a categorical variable and a quantitative variable, we could use side-by-side boxplots to display the results.
A.18
(a) The cases are the tagged penguins. The variables are type of tag (categorical), number of chicks (quantitative), survival or not (categorical), and length of time on foraging trips (quantitative).
(b) This is an experiment, so we can conclude that the metal tag is causing the problems. (c)
i. One is categorical and the other is quantitative, so it is appropriate to use side-by-side boxplots. We could compare the mean number of chicks between the two tagged groups. ii. Both are categorical so we display the results in a two-way table. The relevant statistics would be the proportion to survive in each of the two tagged groups. iii. One is categorical and the other is quantitative, so we might use side-by-side boxplots. We could compare the mean foraging time between the two tagged groups. iv. Both are quantitative, so we display the results in a scatterplot. The relevant statistics are the correlation and possibly a regression line. v. One is quantitative and one is categorical, so it makes sense to display the results using side-byside boxplots. We might compare the mean foraging time between those who lived and those who died.
A.19
(a) The cases are the students (or the students’ computers). The sample size is n = 45. The sample is not random since the students were specifically recruited for the study.
(b) This is an observational study, since none of the variables was actively manipulated.
(c) For each student, the variables recorded are: number of active windows per lecture, percent of windows that are distracting, percent of time on distracting windows, and score on the test of the material. All of these variables are quantitative. (Note whether or not a window is distracting is categorical, but for each student the percentage of distracting windows is quantitative.) (d) The number of active windows opened per lecture is a single quantitative variable, so we might use a histogram, dotplot, or boxplot. If we want outliers clearly displayed, a boxplot would be the best choice. (e) The association described is between two quantitative variables, so we would use a scatterplot. An appropriate statistic would be a correlation. Since more time on distracting websites is associated with lower test scores, it is a negative association. (f) No, we cannot conclude that the time spent at distracting sites causes lower test scores, since this is an observational study not an experiment. There are many possible confounding variables and we cannot infer a cause-and-effect relationship (although there might be one). (g) We consider the time on distracting websites the explanatory variable and the exam score the response variable. (h) To make this cause-and-effect conclusion, we would need to do a randomized experiment. One option would be to randomly divide a group of students into two groups and have one group use distracting websites and the other not have access to such websites. Compare test scores of the two groups at the end of the study. It is probably not feasible to require one group to visit distracting websites during class!
A.20
(a) The histogram of temperature increases is quite symmetric and bell-shaped.
(b) We see in the computer output that n = 29 for each session. There were 29 volunteers and all 29 participated in each session, on separate days. (c) From the computer output, we see that the mean from Session 1 is an average temperature increase of 2.325°C with a standard deviation of 1.058°C. The five number summary is (−0.076, 1.571, 2.328, 2.806, 4.915). (d) For Session 3, the mean is x̄ = 1.494 with standard deviation s = 0.617. The smallest value is 0.534, so the z-score is
z-score = (x − x̄)/s = (0.534 − 1.494)/0.617 = −1.56
The minimum is only about one and a half standard deviations below the mean, so, no, it is not more than two standard deviations from the mean. (e) The histogram is approximately symmetric and bell-shaped, so we can use the mean plus and minus two standard deviations to create an interval that is likely to contain 95% of the data. We have:
x̄ ± 2s = 1.911 ± 2(0.747) = 1.911 ± 1.494
Since 1.911 − 1.494 = 0.417 and 1.911 + 1.494 = 3.405, we expect 95% of the data to be between 0.417 and 3.405. (f) For Session 1 we have Q1 = 1.571 and Q3 = 2.806, so the IQR is 2.806 − 1.571 = 1.235. To see if the largest value of 4.915 is an outlier, we compute Q3 + 1.5(IQR) = 2.806 + 1.5(1.235) = 4.6585 We see that any value larger than 4.6585 is an outlier, so 4.915 is an outlier.
(g) The temperature increase is highest for Session 1 and lowest for Session 3. There is only one outlier and it is in Session 1. The largest temperature increase happened in Session 1, in which participants had their legs close together and did not use a lap pad. To avoid scrotal temperature increase associated with having a laptop computer on one’s lap, it is more effective to spread legs farther apart than to use a lap pad.
A.21
(a) The sample is the 86 patients in the study. The intended population is all people with bladder cancer.
(b) One variable records whether or not the tumor recurs, and one records which treatment group the patient is in. Both variables are categorical. (c) This is an experiment since treatments were randomly assigned. Since the experiment was doubleblind, we know that neither the participants nor the doctors checking for new tumors knew who was getting which treatment. (d) Since we are looking at a relationship between two categorical variables, it is reasonable to use a twoway table to display the data. The categories for treatment are “Placebo” and “Thiotepa” and the categories for outcome are “Recurrence” and “No Recurrence.” The two-way table is shown.
                 Placebo   Thiotepa   Total
Recurrence          29        18        47
No recurrence       19        20        39
Total               48        38        86
(e) We compare the proportion of patients for whom tumors returned between the two groups. For the placebo group, the tumor returned in 29/48 = 0.604 = 60.4% of the patients. For the group taking the active drug, the tumor returned in 18/38 = 0.474 = 47.4% of the patients. The drug appears to be more effective than the placebo, since the rate of recurrence of the tumors is lower for the patients in the thiotepa treatment group than for patients in the placebo group.
A.22
(a) Yes, the association is approximately linear.
(b) For every additional kilometer a population is away from East Africa, the predicted genetic diversity decreases by 0.0000067. (c) The predicted genetic diversity for a population in East Africa is 0.76. (d) The predicted genetic diversity for the Mayan population is 0.76 − 0.0000067(19,847) = 0.627. (e) The point corresponding to Maya will be above the regression line, because the actual response value of 0.678 is higher than the predicted value of 0.627. (f) The residual for Maya is actual − predicted = 0.678 − 0.627 = 0.051.
A.23
(a) The cases are the bills in this restaurant. The sample size is n = 157.
(b) There are seven variables. Bill, Tip, and PctTip are quantitative, while Credit, Server, and Day are categorical. The seventh variable Guests could be classified as either quantitative or categorical depending on what we wanted to do with it. (c) The mean is x̄ = 16.62 and the standard deviation is s = 4.39. The five number summary is (6.7, 14.3, 16.2, 18.2, 42.2). We use the 1.5 · IQR rule to find how large or small a tip percentage has to be to qualify as an outlier. We have IQR = 18.2 − 14.3 = 3.9 and we compute
Q1 − 1.5 · IQR = 14.3 − 1.5(3.9) = 8.45
Q3 + 1.5 · IQR = 18.2 + 1.5(3.9) = 24.05
Any data value less than 8.45 or greater than 24.05 is an outlier. Looking at the minimum (6.7) and maximum (42.2) in the five number summary, it is obvious that there are both small and large outliers in this dataset. (d) Other than several large outliers, the histogram is symmetric and bell-shaped. (e) A table of type of payment and day of the week is shown. The proportion of bills paid with a credit card on Thursday is p̂th = 12/36 = 0.333 and the percent of bills paid with a credit card on Friday is p̂f = 4/26 = 0.154. These two proportions are quite different, so there appears to be an association between these two variables. A much larger percentage appear to pay with a credit card on Thursday than on Friday, perhaps because some people have run out of cash on Thursday but then are flush with cash again after getting paid on Friday.
         Cash (n)   Credit (y)   Total
m            14          6         20
t             5          8         13
w            41         21         62
th           24         12         36
f            22          4         26
Total       106         51        157
(f) PctTip is a quantitative variable and Server is categorical, so we might use side-by-side boxplots. See the graph below. We see that Server A appears to get the highest median tip percentage (and has two large outliers), while Server B and C are similar.
[Side-by-side boxplots of PctTip (horizontal axis, roughly 5 to 45) by Server A, B, and C]
(g) The explanatory variable is Bill while the response variable is PctTip. See the scatterplot. There are several large outliers with high values for PctTip — percentages between 40% and 45%. There is also a possible outlier on the right, with a value of about 70 for Bill. The relationship is not strong at all and may be slightly positive.
[Scatterplot of PctTip (vertical axis) versus Bill (horizontal axis)]
(h) The correlation is r = 0.135.
A.24 Answers will vary.
Unit A: Review Exercise Solutions

A.25 The data is a sample of students at a university. The sample was collected by surveying students taking introductory statistics. The relevant population might be all students at this university, or all students who take introductory statistics at this university, or (if we stretch it a bit) all university students.
A.26
(a) The sample is the 200 patients from whom data were collected. A reasonable population is all patients admitted to the ICU at that hospital. Other answers are possible.
(b) The quantitative variables are: Age, Systolic (Systolic blood pressure), and HeartRate. The other 17 variables are all categorical. (c) There are many possible answers, such as “What is the average age of a person being admitted to the ICU?” or “What proportion of patients admitted to the ICU survive?” (d) There are many possible answers, such as “Does gender impact the likelihood of CPR being administered?” or “Is there a strong relationship between heart rate and age?”
A.27
(a) The sample is the 48 men. A reasonable population is all men.
(b) There are three variables mentioned: which group a man is assigned to (exercise or rest), the amount of protein converted to muscle, and age. (c) The group variable is categorical and the other two are quantitative.
A.28
(a) The categorical variables are Smoke, Vitamin, Gender, VitaminUse, and PriorSmoke. The other 11 variables are quantitative.
(b) There are many possible answers. For example, one possible relationship of interest between two categorical variables is the relationship between smoking status and vitamin use. A possibly interesting relationship between two quantitative variables is between the amount of beta-carotene consumed in the diet (BetaDiet) and the concentration of beta-carotene in the blood (BetaPlasma). One possible relationship of interest between a categorical variable and a quantitative variable might be gender and the number of alcoholic drinks per week.
A.29
(a) There are 8 cases, corresponding to the 8 rowers. The two variables are number of days to cross the Atlantic and gender. Number of days to cross the Atlantic is quantitative and gender is categorical.
(b) We need two columns, one for each variable. The columns can be in either order. See the table.

Time   Gender
40     Male
87     Male
78     Male
106    Male
67     Male
70     Female
153    Female
81     Female

A.30
(a) This is an observational study because the explanatory variable (time spent on affection after sex) was not randomly assigned.
(b) No, because this is an observational study. It’s quite possible that people in stronger, more loving relationships simply tend to spend more time cuddling after sex, not that cuddling after sex causes relationship happiness. (c) No, the phrase “boosts” implies a causal relationship, which cannot be supported by an observational study. (d) No, the phrase “promotes” implies a causal relationship, which cannot be supported by an observational study.
A.31
(a) The cases are the 41 participants.
(b) There are many variables in this study. The only categorical variable is whether or not the person participated in the meditation program. All other variables are quantitative variables. These variables include (at minimum):
• Brain wave activity before
• Brain wave activity after
• Brain wave activity 4 months later
• Immune response after 1 month
• Immune response after 2 months
• Negative survey before
• Negative survey after
• Positive survey before
• Positive survey after
(c) The explanatory variable is whether or not the person participated in the meditation program. (d) The dataset will have 41 rows (one for each participant) and at least 10 columns (one for each variable).
A.32
(a) The “most appealing” question would require just one categorical variable with four possible categories corresponding to the four flavors.
(b) Data for “which are appealing” would need four categorical variables, one for each flavor, with values of yes or no. (c) The “rank the flavors” item would need four variables recording the rank given to each flavor. These could be considered categorical (first, second, ...) or quantitative (numerical value of the rank). (d) The “rate the flavors” item would need four quantitative variables, each with a value between 1 and 10 for the rating assigned to that flavor.
A.33 No, we cannot conclude that about 79% of all people think physical beauty matters, since this was a volunteer sample in which only people who decided to vote were included in the sample, and only people looking at cnn.com even had the opportunity to vote. The sample is the 38,485 people who voted. The population if we made such an incorrect conclusion would be all people. There is potential for sampling bias in every volunteer sample.
A.34
(a) The sample is the 300 salons that were contacted.
(b) Yes, the sample was collected in a way that should be representative of all tanning salons. (c) The salons are more interested in marketing what they offer than in giving facts. The responses are dishonest, and a completely inaccurate portrayal of the facts, because they are trying to get more business.
(d) Yes, the sample is well taken so the study is probably accurate in how salons market to teenage girls.
A.35
(a) This is an experiment since the background color was actively assigned by the researchers.
(b) The explanatory variable is the background color, which is categorical. The response variable is the attractiveness rating, which is quantitative. (c) The men were randomly divided into the two groups. Blinding was used by not telling the participants or those working with them the purpose of the study. (d) Yes. Since this was a well-designed randomized experiment, we can conclude that there is a causal relationship.
A.36
(a) It is an observational study. The researcher asked the boys how often they ate fish and collected data on their intelligence test scores but did nothing to change or determine their levels of fish consumption or intelligence.
(b) The explanatory variable is whether or not fish is consumed at least once a week, and the response variable is the score on the intelligence test. (c) One possible confounding variable is the intelligence level of the parents. Families in which the parents are more intelligent may tend to eat more fish and also to have sons who score higher on an intelligence test. Other possible confounding variables might be whether boys live near the coast or inland or how often the boys’ parents provide home-cooked meals. You can probably think of other possibilities. Remember that a confounding variable is a variable that might influence both the explanatory and the response variables. (d) No. Observational studies cannot yield causal conclusions.
A.37
(a) One possibility: Find professors who give an easy first quiz and professors who give a hard first quiz. Compare their students’ grades on a common exam.
(b) The teaching styles of the professors might be a confounding factor. If professors choose to give either an easy first quiz or a hard first quiz, there are probably other differences in the teaching styles which could also dramatically impact the grades on the exam. (c) Randomly divide students into two groups and give one group an easy first quiz and the other group a hard first quiz. Keep everything else as similar as possible between the two groups, such as professors, other quizzes, and homework. Compare grades on the common exam.
A.38 The article is assuming causation when it should not be, since the results come from an observational study. A possible confounding variable is the health of the men when they were tested at age 70. Poor health would cause slower walking and greater risk of death.
A.39 Snow falls when it is cold out and the heating plant will be used more on cold days than on warm days. Also, when snow falls, people have to shovel the snow and that can lead to back pain. Notice that the confounding variable has an association with both the variables of interest.
A.40 No. The researchers only measured the walking habits at the beginning of the study and did not actively control the amounts walked. This is an observational study, not an experiment, so the causal conclusion (that hiking reduces risk) is not justified. This doesn’t mean that walking might not be helpful, just that this study does not establish a cause-and-effect relationship.
A.41
(a) The cases are university students. One variable is whether the student lives in a single-sex or co-ed dorm. This is a categorical variable. The other variable is how often the student reports hooking up for casual sex, which is quantitative.
(b) The type of dorm is the explanatory variable and the number of hook-ups is the response variable. (c) Yes, apparently the studies show that students in same-sex dorms hook-up for casual sex more often, so there is an association. (d) Yes, the president is assuming that there is a causal relationship, since he states that “single-sex dorms reduce the number of student hook-ups.” (e) There is no indication that any variable was manipulated, so the studies are probably observational studies. (f) The type of student who requests a single-sex dorm might be different from the type of student who requests a co-ed dorm. There are other possible confounding variables. (g) No! We should not assume causation from an observational study. (h) He is assuming causation when there may really only be association.
A.42
(a) There are two variables: percent college graduates (quantitative) and region of the country (categorical).
(b) The Northeast has the states with the highest percent of college graduates, while the South has the states with the lowest percent of college graduates. The only outlier is a high outlier in the South. (This is the state of Virginia.) (c) There seems to be a clear association between region and percent of college graduates. (d) No, the data are from an observational study and not from an experiment, so we should not conclude there is causation.
A.43
(a) The cases are the students and the sample size is 70.
(b) There are four variables mentioned. One is the treatment group (walk in sync, walk out of sync, or walk any way). The other three are the quantitative ratings given on the three questions of closeness, liking, and similarity. (c) This is an experiment since the students were actively told how to follow the accomplice. (d) There are many possible ways to draw this; one is shown in the figure.
[Side-by-side boxplots of Rating (horizontal axis, 1 to 7) by Group: AnyWay, InSync, OutofSync]
(e) One possible graph to use to look at a relationship of number of pill bugs killed by which treatment group the student was in would be a side-by-side boxplot, since one of these variables is quantitative and one is categorical. We could also use comparative dotplots or comparative histograms. We would use a scatterplot to look at the association of number of pill bugs killed with the rating given on the liking accomplice scale, since both of these are quantitative variables.
A.44
(a) We compute the percentage of smokers in the female column and in the male column. For females, we see that 16/169 = 0.095, so 9.5% of the females in the sample classify themselves as smokers. For males, we see that 27/193 = 0.140, so 14% of the males in the sample classify themselves as smokers. In this sample, a larger percentage of males are smokers.
(b) For the entire sample, the proportion of smokers is 43/362 = 0.119, or 11.9%. (c) There are 43 smokers in the sample and 16 of them are female, so the proportion of smokers who are female is 16/43 = 0.372, or 37.2%.
A.45
(a) This is an experiment since the treatment was randomly assigned and imposed.
(b) The cases are the 24 fruit flies. There are two variables. The explanatory variable is which of the two groups the fly is in. The response variable is percent of time the alcoholic mixture is selected. (c) Using x̄R for the mean of the rejected group and x̄M for the mean for the mated group, we have x̄R − x̄M = 0.73 − 0.47 = 0.26. (d) Yes, since this was a randomized experiment.
A.46 We find the total revenue in each of the four cases.
(a) Revenue for Standard pricing = Number of buyers · Mean price paid = (0.005 · 15,000) · 12.95 = 75 · 12.95 = $971.25
Under the standard pricing, the photo booth has revenues of $971.25 per day. (b) Using similar reasoning, we find the revenue under each of the three experimental conditions: Revenue if pay what you want = (0.08 · 15,000) · 0.92 = 1200 · 0.92 = $1104.00 Notice that the company makes more money if they allow customers to pay whatever they want, since so many more people are buying the photos. (c) What if half the money is given to charity? We have Initial revenue if half to charity = (0.006 · 15,000) · 12.95 = 90 · 12.95 = $1165.50 The company must then give half this money to charity so the company’s revenue is 0.5 · 1165.50 = $582.75. This is the worst revenue for the company thus far. (d) Finally, we find the revenue if customers can pay whatever they want and half the money is given to charity: Initial revenue if pay what want and half to charity = (0.04 · 15,000) · 5.50 = 600 · 5.50 = $3300.00 Again, we are donating half this money to charity so the company’s revenue under this scenario is 0.5 · 3300 = $1650.00. The best option, both for maximizing revenue and for social responsibility, is to allow customers to pay whatever they want and to donate half the proceeds to charity. We hope some photo booths are starting to do this!
A.47 Answers will vary. One possible sample is shown below. 17692, 01708, 00099, 04755, 01406, 14937, 06647, 02496, 03850, 04673 See the technology notes to see how to use specific technology to select a random sample.
A.48 Neither inference is valid. Voluntary polls are notoriously unreliable, so we can’t infer anything other than the fact that 34% of the sample of people who chose to answer the poll admit to having driven with a pet on their lap. Not all people who visit cnn.com will choose to answer the poll and many in the population may never even see it.
(a) This is an experiment since facial features were actively manipulated.
(b) This “blinding” allows us to get more objective reactions to the video clips. (c) We use x̄S for the mean of the smiling group and x̄N for the mean of the non-smiling group. The difference in means is x̄S − x̄N = 7.8 − 5.9 = 1.9. (d) Since the results come from a randomized experiment, a substantial difference in the mean ratings would imply that smiling causes an increase in positive emotions.
A.50
(a) The total is n = 1502, so we divide each of the frequencies by the total. See the table. Notice that the relative frequencies add to 1.0, as we expect.

Cell phone owned                    Relative frequency
Smartphone                          0.812
Cell phone but not a smartphone     0.164
No cell phone                       0.025
Total                               1.0
(b) We see that 2.5% do not own a cell phone, 16.4% own a cell phone but not a smartphone, and 81.2% own a smartphone.
A.51 The two-way table with row and column totals is shown.

                         Cardiac arrest   Other cardiac problem   Total
Near-death experience          11                   16              27
No such experience            105                 1463            1568
Total                         116                 1479            1595
To compare the two groups, we compute the percent of each group that had a near-death experience. For the cardiac arrest patients, the percent is 11/116 = 0.095 = 9.5%. For the patients with other cardiac problems, the percent is 16/1479 = 0.011 = 1.1%. We see that approximately 9.5% of the cardiac arrest patients reported a near-death experience, which appears to be much higher than the 1.1% of the other patients reporting this.
A.52
(a) The percent of pregnancies ending in miscarriage is 145/1009 = 14.4%.
(b) For each category, we compute the percent ending in miscarriage:
Aspirin: Percent = 5/22 = 22.7%
Ibuprofen: Percent = 13/53 = 24.5%
Acetaminophen: Percent = 24/172 = 14.0%
No painkiller: Percent = 103/762 = 13.5%
The percent ending in miscarriage seems to be higher for those women who used aspirin or ibuprofen. Acetaminophen does not seem to pose a greater risk of miscarrying. (c) This is an observational study. There are many possible confounding variables, including the fact that women who take painkillers may have other characteristics that are different from women who are not taking painkillers. It might be any of these other characteristics that is related to the increased proportion of miscarriages. (d) We see that 22 + 53 = 75 of the women took NSAIDs and 5 + 13 = 18 of them miscarried, so we have
NSAIDs: Percent = (5 + 13)/(22 + 53) = 18/75 = 24.0%
The percent of miscarriages is higher for women taking NSAIDs than it is for women who did not use painkillers. The use of acetaminophen does not appear to significantly increase the risk. Pregnant women who do not want to miscarry might want to avoid taking NSAIDs. (e) The original table in the exercise is not a two-way table since it does not list all outcomes for each of the variables. A two-way table (showing both “Miscarriage” and “No miscarriage”) is given below.
                 Miscarriage    No miscarriage    Total
Aspirin          5              17                22
Ibuprofen        13             40                53
Acetaminophen    24             148               172
No painkiller    103            659               762
Total            145            864               1009
(f) We have
p̂ = 103/145 = 71.0%
Notice that although certain painkillers appear to increase the risk of a miscarriage, it is still true that within this sample 71% of all miscarriages happened to women who did not use any painkiller.
A.53
(a) Since no one assigned smoking or not to the participants, this is an observational study. Because this is an observational study, we cannot use these data to determine whether smoking influences one’s ability to get pregnant. We can only determine whether there is an association between smoking and ability to get pregnant.
(b) The sample collected is on women who went off birth control in order to become pregnant, so the population of interest is women who have gone off birth control in an attempt to become pregnant.
(c) We look in the total section of our two-way table to find that out of the 678 women attempting to become pregnant, 244 succeeded in their first cycle, so p̂ = 244/678 = 0.36. For smokers we look only in the Smoker column of the two-way table and observe 38 of 135 succeeded, so p̂s = 38/135 = 0.28. For non-smokers we look only in the Non-smoker column of the two-way table and observe 206 of 543 succeeded, so p̂ns = 206/543 = 0.38.
(d) For the difference in proportions, we have p̂ns − p̂s = 0.38 − 0.28 = 0.10. This means that in this sample, the percent of non-smoking women successfully getting pregnant in the first cycle is 10 percentage points higher than the percent of smokers.

A.54 The balance point of the histogram appears to be at about 40 years old (the actual mean is 39.8). The middle 95% of the data appear to go from about 20 to 60 years old, so the standard deviation is about 10 years (actual value is 11.2).
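As a quick check of the proportions in A.53, the short Python sketch below (not part of the original solution) recomputes p̂, p̂s, p̂ns, and their difference from the counts given above.

    # Successful first-cycle pregnancies out of attempts, taken from the two-way table totals
    p_hat_all = 244 / 678     # overall proportion, about 0.36
    p_hat_s   = 38 / 135      # smokers, about 0.28
    p_hat_ns  = 206 / 543     # non-smokers, about 0.38

    diff = p_hat_ns - p_hat_s # difference in sample proportions, about 0.10
    print(round(p_hat_all, 2), round(p_hat_s, 2), round(p_hat_ns, 2), round(diff, 2))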
A.55
(a) The sample is 48 participants. The population of interest is all people. The variable is whether or not each person’s lie is detected.
(b) The proportion of time the lie detector fails to report deception is p̂ = 17/48 = 0.35.
(c) Since the lie detector fails 35% of the time, it is probably not reasonable to use it.

A.56
(a) The lie detector was correct 21 times, so it was wrong 27 times, meaning the proportion of the time it incorrectly reported deception is p̂ = 27/48 = 0.56.
(b) If I were on a jury I would not trust results from this lie detector. Within the sample it reported lies in over half of the honest cases!

A.57 The histogram is relatively symmetric and bell-shaped. The mean appears to be approximately x = 7. To estimate the standard deviation, we estimate an interval centered at 7 that contains approximately 95% of the data. The interval from 3 to 11 appears to contain almost all the data. Since 3 and 11 are both 4 units from the mean of 7, we have 2s = 4, so the standard deviation appears to be approximately s = 2.

A.58 The fact that the mean is much larger than the median indicates the likely presence of outliers to the right (large values) and/or a strong right skew in the distribution.

A.59
(a) There appear to be some low outliers pulling the mean well below the median. Half of the growing seasons over the last 50 years have been longer than 275 days and half have been shorter. Some of the growing seasons have been extremely short and have pulled the mean down to 240 days.
(b) Here is a smooth curve that could represent this distribution.
[Smooth left-skewed curve with the mean marked at 240 and the median at 275]
(c) The distribution is skewed to the left.

A.60
(a) A dotplot of the body temperatures is shown below.
[Dotplot of Body_Temperature, axis from 97.2 to 98.6°F]
(b) We compute x = 98.0°F. It is the balance point in the dotplot.
(c) There are n = 12 data values, so the median is the average of the two middle values. We have
m = (97.9 + 98.3)/2 = 98.1°F
This is a point in the dotplot that has six dots on either side.

A.61
(a) The distribution has a right skew. There are a number of apparent outliers on the right side.
(b) The actual median is 140 ng/ml. Estimates between 120 and 160 are reasonable.
(c) The actual mean is 189.9 ng/ml. Estimates between 160 and 220 are reasonable. Note that the outliers and right skew should make the mean larger than the median.

A.62
(a) We have xf = 6.40.
(b) We have xm = 6.81.
(c) We see that xm − xf = 6.81 − 6.40 = 0.41. In this sample, the males, on average, spent 0.41 more hours per week exercising than the females.

A.63 The values are in order smallest to largest, and since more than half the values are 1, the median is 1. We calculate the mean to be x = 3.2. In this case, the mean is probably a better value (despite the fact that 12 might be an outlier) since it allows us to see that some of the data values are above 1.

A.64
(a) We use technology to see that the mean is x = 85.25 and the standard deviation is s = 33.18.
(b) The longest time is 153 days, and the z-score for that is
Z-score for 153 days = (x − x̄)/s = (153 − 85.25)/33.18 = 2.04
The shortest time is 40 days, with a z-score of
Z-score for 40 days = (x − x̄)/s = (40 − 85.25)/33.18 = −1.36
The longest time in the sample is slightly more than two standard deviations above the mean, while the shortest time is 1.36 standard deviations below the mean.
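The same z-score arithmetic can be reproduced with a few lines of code. This is only an illustrative sketch; the summary statistics are typed in from the solution above rather than computed from the raw data.

    def z_score(value, mean, sd):
        """Number of standard deviations that value lies above (+) or below (-) the mean."""
        return (value - mean) / sd

    mean, sd = 85.25, 33.18
    print(round(z_score(153, mean, sd), 2))   # 2.04, slightly more than 2 SDs above the mean
    print(round(z_score(40, mean, sd), 2))    # -1.36, below the mean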
A.65
(a) Using software, we see that x = 0.272 and s = 0.237.
(b) The largest concentration is 0.851. The z-score is
z-score = (x − x̄)/s = (0.851 − 0.272)/0.237 = 2.44
The largest value is almost two and a half standard deviations above the mean and appears to be an outlier.
(c) Using software, we see that
Five number summary = (0.073, 0.118, 0.158, 0.358, 0.851)
(d) The range is 0.851 − 0.073 = 0.778 and the interquartile range is IQR = 0.358 − 0.118 = 0.240.

A.66
(a) The data are heavily skewed and there appear to be some large outliers. It is most appropriate to use the five number summary.
(b) No, it is not appropriate to use that rule with this distribution. That rule is useful when data are symmetric and bell-shaped.

A.67
(a) The average for both joggers is 45, so they are the same.
(b) The averages are the same, but the set of times for jogger 1 has a much lower standard deviation.

A.68
(a) Using technology, we see that the mean is x = 13.15 years with a standard deviation of s = 7.24 years.
(b) We have
The z-score for the elephant = (Elephant’s value − Mean)/(Standard deviation) = (40 − 13.15)/7.24 = 3.71
The elephant is 3.71 standard deviations above the mean, which is way out in the upper tail of the distribution. The elephant is a strong outlier!

A.69 It is helpful to compute the row and column totals for the table as shown below.

                  Greater than 50%    Less than 50%    Total
Rosiglitazone     5                   42               47
Placebo           21                  27               48
Total             26                  69               95
(a) A total of 47 patients received rosiglitazone, while 48 patients received a placebo.
(b) There were a total of 95 patients in the study, and 42 + 27 = 69 of them had less than 50% blockage, so we have
p̂ = 69/95 = 0.726
About 72.6% of the patients had blockage less than 50% after 6 months.
(c) We consider only patients with greater than 50% blockage, a total of 26 patients. Of these, only 5 patients were on rosiglitazone, so we have
p̂ = 5/26 = 0.192
Only 19.2% of the patients with greater than 50% blockage were taking the drug rosiglitazone.
(d) We consider only the 48 patients given a placebo. Of these, 27 had less than 50% blockage, so we have
p̂ = 27/48 = 0.5625
We see that 56.25% of the patients given a placebo had less than 50% blockage.
(e) We are comparing rosiglitazone to a placebo, which are the rows of the table, so we find the proportion with less than 50% blockage in each row. For rosiglitazone, the proportion is 42/47 = 0.894. For those taking a placebo, the proportion is 27/48 = 0.563. The percent is quite a bit higher for those taking the active drug, rosiglitazone.
(f) From the results of this sample, it appears that rosiglitazone is effective at limiting coronary blockage. The percent of patients whose blockage was less than 50% after 6 months was almost 90% for those taking the drug and it was only about 56% for those taking a placebo.
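All of the proportions in this solution come from conditioning on either a row or a column of the two-way table. The hedged Python sketch below (counts typed in from the table above, not code from the text) shows both kinds of conditioning.

    # Two-way table: rows are treatment, columns are blockage after 6 months
    table = {
        "Rosiglitazone": {">50%": 5,  "<50%": 42},
        "Placebo":       {">50%": 21, "<50%": 27},
    }

    # Condition on a row: proportion with <50% blockage within each treatment
    for treatment, row in table.items():
        print(treatment, row["<50%"] / sum(row.values()))   # about 0.894 and about 0.5625

    # Condition on a column: of the 26 patients with >50% blockage, the fraction on rosiglitazone
    col_total = sum(row[">50%"] for row in table.values())   # 26
    print(table["Rosiglitazone"][">50%"] / col_total)        # about 0.192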
A.70
(a) The sample is the 1917 people who filled out the survey. The intended population appears to be all cell phone users.
(b) Since percentages are given rather than the actual counts, this is a relative frequency table.
(c) The distribution is skewed to the right. The percentages are high for the categories with fewer calls (1–5 and 6–10) and get smaller as the number of calls gets larger, even though the widths of the intervals increase. We would expect almost no tail on the left and a long tail to the right.
(d) The median is m = 5.00 and the mean is x = 13.10. The strong right skew and a few large values greater than 30 pull up the mean and have little effect on the median.

A.71
(a) This is an experiment. Double-blind means that neither the patients nor the doctors making the cancer diagnosis knew who was getting the drug and who was getting a placebo.
(b) There are two variables: one records the presence or absence of prostate cancer and the other records whether the individual was in the finasteride group or the placebo group. (c) Here is a two-way table for treatment groups and cancer diagnosis.
               Cancer    No cancer    Total
Finasteride    804       3564         4368
Placebo        1145      3547         4692
Total          1949      7111         9060
(d) We have
Percent receiving finasteride = 4368/9060 = 48.2%
(e) A total of 1949 men were found to have cancer, and 1145 of these were in the placebo group, so we have
p̂ = 1145/1949 = 58.7%
(f) We have the following cancer rates in each group:
Percent on finasteride getting cancer = 804/4368 = 18.4%
Percent on a placebo getting cancer = 1145/4692 = 24.4%
The percent getting cancer appears to be quite a bit lower for those taking finasteride.

A.72
(a) For most people, the vast majority of phone calls made are quite short. On the other hand, there are often a few very long phone calls during a month. We expect that the bulk of the data values will be between 0 and 5 minutes, with a tail extending out to the right to some phone calls extending perhaps as long as two hours (120 minutes). This describes a distribution that is skewed to the right.
(b) The extremely long phone calls will pull up the mean but not the median, so we expect the mean to be 13.7 minutes and the median to be 2.5 minutes. Notice that this implies that half the phone calls made on this cell phone are less than 2.5 minutes in length.

A.73
(a) We see that 22 of the 72 participants found much improvement in sleep quality, so the proportion is 22/72 = 0.306.
(b) We combine the results of the “medication” and “both” columns to find that 5 + 10 = 15 of the 17 + 19 = 36 people on medication had much improvement in sleep quality, so the proportion is 15/36 = 0.417.
(c) Thirty-four participants had no improvement and within this row we find 8 + 3 = 11 who had medication, so the proportion is 11/34 = 0.324.
(d) For the denominator we need the totals for the “medication” and “neither” groups who did not receive training, 17 + 18 = 35. In the numerator we add the counts for “much” and “some” improvement in each of the two groups, 5 + 0 + 4 + 1 = 10, so the proportion without training who received some or much improvement is 10/35 = 0.286.

A.74 There were 67000/2 = 33500 patients receiving each drug. Since (0.587)(33500) = 19664.5, we estimate that the number of people receiving paricalcitol who survived is about 19,665. Since (0.515)(33500) = 17252.5, we estimate that the number of people receiving calcitriol who survived is about 17,253. The total number of survivors is 19665 + 17253 = 36918. The percent of survivors receiving paricalcitol is
p̂ = 19665/36918 = 53.3%
A two-way table (subject to round off error) is given below.
                Survived    Died      Total
Paricalcitol    19665       13835     33500
Calcitriol      17253       16247     33500
Total           36918       30082     67000

A.75
(a) Each of the statistics in the five number summary is lower, often by a considerable margin, for the developed nations. The birth rate distributions do appear to be different in developed and undeveloped nations, with the values for birth rates in undeveloped countries tending to be much higher.
(b) Many of the undeveloped countries would be high outliers in birth rate if they were considered developed. The general rule of thumb for detection of larger outliers is Q3 + 1.5(IQR), which for developed nations (where IQR = 13.9 − 9.7 = 4.2) is 13.9 + 1.5(4.2) = 20.2, so in fact even the median for undeveloped countries would be an outlier in developed countries! By contrast, none of the developed countries would be outliers if considered undeveloped. The lower bound for outliers based on the undeveloped quartiles (where IQR = 31.8 − 18.5 = 13.3) is Q1 − 1.5 ∗ IQR = 18.5 − 1.5 ∗ 13.3 = −1.45, and there are no negative birthrates. (c) Turkmenistan’s birth rate of 24.6 would be an outlier among developed countries (see the calculation above which puts the threshold at anything over 20.2). Turkmenistan would not be an outlier in the distribution of undeveloped countries. In fact its birth rate is just a bit above the median for undeveloped countries. (d) Sketch should be similar to the boxplots shown below (although you would need to use the raw data to determine exactly where the right whisker ends and find the outliers for developed countries).
[Side-by-side boxplots of BirthRate (10 to 50) for developed (Yes) and undeveloped (No) countries]

A.76
(a) There are many possible answers. One boxplot with a right skew is given.
[Boxplot of a right-skewed distribution, axis from 20 to 140]
(b) There are many possible answers. One boxplot with a left skew is given.
[Boxplot of a left-skewed distribution, axis from 500 to 600]
(c) There are many possible answers. One boxplot with a symmetric distribution is given.
[Boxplot of a symmetric distribution, axis from 125 to 145]
A.77 For the 13 teenage patients, the five number summary for blood pressure is (100, 104, 130, 140, 156). For the 15 patients in their eighties, the five number summary is (80, 110, 135, 141, 190). The range for the patients in their eighties, 190 − 80 = 110, is quite a bit larger than for the teenage patients, 156 − 100 = 56. However the interquartile ranges, IQR = 140 − 104 = 36 for the patients in their teens and IQR = 141 − 110 = 31 for those in their 80s, are fairly similar. Thus, while we see more variability at both extremes for the older patients, the distributions of the middle 50% of the blood pressure readings are not very different between the two age groups. The standard deviation for the blood pressure readings for the 13 ICU patients in their teens is s = 19.57, while the standard deviation for the 15 patients in their 80s is s = 31.23. This reinforces the fact that the variability is greater for patients in their 80s.

A.78 The five number summary for teens is (100, 104, 130, 140, 156), so the IQR is 36. For teens typical values should fall between 104 − 1.5(36) = 50 and 140 + 1.5(36) = 194. Since all 13 blood pressure values for the teens are between these values, we find no outliers for that group. The five number summary for patients in their eighties is (80, 110, 135, 141, 190), so the IQR is 31. Typical values for the patients in their eighties fall between 110 − 1.5(31) = 63.5 and 141 + 1.5(31) = 187.5. This identifies the two blood pressures at 190 as unusually high values for eighty year-old patients in this intensive care unit. These two values are both outliers.
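The 1.5 × IQR rule used in A.78 (and again in A.79 and A.80) is easy to wrap in a small function. The sketch below is a generic illustration, not code from the text, and note that different software may compute the quartiles themselves slightly differently.

    def iqr_fences(q1, q3):
        """Return the lower and upper outlier fences from the first and third quartiles."""
        iqr = q3 - q1
        return q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Teens: five number summary (100, 104, 130, 140, 156)
    print(iqr_fences(104, 140))   # (50.0, 194.0): no teen values fall outside
    # Eighties: five number summary (80, 110, 135, 141, 190)
    print(iqr_fences(110, 141))   # (63.5, 187.5): the two readings of 190 are outliers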
A.79
(a) We see that the interquartile range is IQR = Q3 − Q1 = 149 − 15 = 134. We compute:
Q1 − 1.5(IQR) = 15 − 1.5(134) = 15 − 201 = −186
and
Q3 + 1.5(IQR) = 149 + 1.5(134) = 149 + 201 = 350
Outliers are any values outside these fences. In this case, there are four outliers that are larger than 350. The four outliers are 402, 447, 511, and 536.
(b) A boxplot of time to infection is shown:
[Boxplot of Time to Infection, axis from 0 to 600]

A.80
(a) We have IQR = 2106 − 1334 = 772, so the upper boundary for non-outlier data values is:
Q3 + 1.5(IQR) = 2106 + 1.5(772) = 2106 + 1158 = 3264
Any data value above 3264 is an outlier, so the seven largest calorie counts are all outliers.
(b) We have already seen that IQR = 772, so the lower boundary for non-outlier data values is
Q1 − 1.5(IQR) = 1334 − 1.5(772) = 1334 − 1158 = 176
We see in the five number summary that the minimum data value is 445, so there are no values below 176 and no low outliers.
(c) A boxplot of daily calorie consumption is shown:
[Boxplot of daily calorie consumption (Calories), axis from 0 to 7000]

A.81
(a) The median appears to be about 500 calories higher for the males than for the females. The largest outlier of 6662 calories in one day is a male, but the females have many more outliers.
(b) Yes, there does appear to be an association. Females appear to have significantly lower calorie consumption than males. We see that every number in the five number summary is higher for males than it is for females. The median for females is even lower than the first quartile for males.
A.82 The blood pressures have a relatively symmetric distribution ranging from a low of around 35 mm Hg to a high just over 250 mm Hg. The middle 50% of blood pressures are between 110 mm Hg and 150 mm Hg with a median value of 130 mm Hg. There are two unusually low blood pressures at around 35 and 46 and three unusually high blood pressures at 210, 220, and 255. The five number summary appears to be about (35, 110, 130, 150, 255).
A.83 Both distributions are relatively symmetric with one or two outliers. In general, the blood pressures of patients who lived appear to be slightly higher as a group than those of the patients who died. The middle 50% box for the surviving patients is shifted to the right of the box for patients who died and shows a smaller interquartile range. Both quartiles and the median are larger for the surviving group. Note that the boxplots give no information about how many patients are in each group. From the original data table, we can find that 40 of the 200 patients died and the rest survived.
A.84
(a) See the figure. It appears that respiration rate is higher when calcium levels are low and that there is not much difference in respiration rate between medium and high levels of calcium.
[Side-by-side boxplots of GillRate (30 to 100) for Low, Medium, and High Calcium groups]
(b) With a low calcium level, the mean is 68.50 beats per minute with a standard deviation of 16.23. With a medium level, the mean is 58.67 beats per minute with a standard deviation of 14.28. With a high level of calcium, the mean is 58.17 beats per minute with a standard deviation of 13.78. Again, we see that respiration is highest with low calcium.
(c) This is an experiment since the calcium level was actively manipulated.

A.85
(a) Yes, it is appropriate to use the 95% rule, since we see in the histogram of blood pressures that the distribution is approximately symmetric and bell-shaped.
(b) We expect 95% of the data values to lie within two standard deviations of the mean, so we have
x ± 2s = 132.28 ± 2(32.95) = 66.38 and 198.18
(c) There are 186 systolic values between 66.38 and 198.18 among the 200 cases in ICUAdmissions, or 186/200 = 93% of the data values within the interval x ± 2s.
(d) These data match the 95% rule very well.
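Checking the 95% rule on a dataset amounts to counting how many values fall within x ± 2s. The sketch below is hypothetical: it uses the summary statistics quoted above, and the list named systolic is a small placeholder standing in for the 200 ICUAdmissions values, not the real data.

    # Hypothetical check of the 95% rule; `systolic` is placeholder data
    systolic = [132, 110, 95, 160, 145, 120, 180, 128]

    mean, sd = 132.28, 32.95
    lower, upper = mean - 2 * sd, mean + 2 * sd       # 66.38 and 198.18

    inside = sum(lower <= x <= upper for x in systolic)
    print(inside / len(systolic))   # with the actual data this is 186/200 = 0.93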
A.86
(a) The mean is the balance point of the histogram and appears to be at approximately 100 beats per minute. Since the distribution is relatively symmetric and bell-shaped, we expect 95% of the data to be within two standard deviations of the mean. It appears that about 95% of the data is between about 50 and 150, so we estimate that the standard deviation is about 25. (In fact, the exact mean and standard deviation are x = 98.9 and s = 26.8.)
(b) The 10th percentile is the data point with 10% of the area of the histogram boxes below it. It appears on the histogram that the 10th percentile is about 60.
(c) The smallest heart rate appears to be about 40 and the largest appears to be about 190, so the range is roughly 190 − 40 = 150.

A.87
(a) The explanatory variable is whether the traffic lights are on a fixed or flexible system. This variable is categorical. The response variable is the delay time, in seconds, which is quantitative.
(b) Using technology we find the mean and standard deviation for each sample:
Timed: xT = 105 seconds and sT = 14.1 seconds
Flexible: xF = 44 seconds and sF = 3.4 seconds
This shows that the mean delay time is much less, 61 seconds or more than a full minute, with the flexible light system. We also see that the variability is much smaller with the flexible system.
(c) For the differences we have xD = 61 seconds and sD = 15.2 seconds.
(d) The boxplot is shown. We see that there are 3 large outliers. Since this is a boxplot of the differences, this means there were three simulation runs where the flexible system really improved the time.
[Boxplot of Difference in delay times, axis from 40 to 100 seconds]

A.88
(a) A negative relationship would mean that old people were married to young people and vice versa. It would mean that an 80-year-old might more likely be married to a 20-year-old than to another 80-year-old.
(b) A positive relationship would mean that old people tended to be married to old people and young people tended to be married to young people.
(c) A positive relationship is expected between these two variables.
(d) We expect a very strong linear relationship since it is quite common for people to be married to someone similar in age.
(e) Yes, a strong correlation implies an association (but not causation!).

A.89
(a) A scatterplot of verbal vs math SAT scores is shown below with the regression line (which is almost perfectly flat).
(b) Using technology we find the correlation between math and verbal scores for this sample is r = −0.071. (c) Based on this small sample of seven pairs, computing a regression line to help predict verbal scores based on math scores is not very useful. The flat line in the scatterplot shows no consistent positive or negative linear trend between these variables. This is also seen with the sample correlation (r = −0.071) which is very close to zero.
A.90
(a) Since we expect lung capacity to be lower for people who have smoked cigarettes for many years, we expect the relationship to be negative. We think of the number of years spent smoking cigarettes as influencing lung capacity, so it makes sense to call the number of years smoking cigarettes the explanatory variable and the lung capacity the response variable.
(b) Since taller people generally weigh more than shorter people, we expect the relationship to be positive. Although either might be the response variable, we more commonly think of predicting weight as a response variable based on height as the explanatory variable, rather than the other way around.
(c) People with high blood pressure generally have high blood pressure for both variables and people with low blood pressure generally have low blood pressure for both variables, so we expect a positive relationship between these variables. These variables are related but there is no reason to think of either one as more likely to influence the other, so there is not an obvious explanatory variable and response variable.

A.91
(a) The scatterplot for the original five data points is shown below.
[Scatterplot of Y (3.0 to 5.0) vs X (1 to 5) for the original five points]
(b) We use technology to find r = −0.189. Both the scatterplot and the correlation show almost no linear relationship between x and y.
(c) The scatterplot with the extra point added is shown below.
[Scatterplot of Y vs X with the extra point (10, 10) added, X from 0 to 10 and Y from 3 to 10]
(d) The correlation (with the extra data point) is r = 0.836.
(e) When the outlier (10, 10) is added, the correlation is suddenly very strong and even changes from negative to positive. The outlier has a very substantial effect on the correlation.
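The same phenomenon can be demonstrated with any small, patternless cloud of points. The sketch below uses made-up coordinates (they are not the exercise's actual five data points, so the correlations will not match −0.189 and 0.836 exactly), but it shows how a single far-out point can pull the correlation from near zero to strongly positive.

    import numpy as np

    # Illustrative points only: a small cloud with essentially no linear relationship
    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([4.0, 3.5, 4.5, 3.0, 4.2])

    print(np.corrcoef(x, y)[0, 1])     # close to zero

    # Add a single influential point far out at (10, 10)
    x2 = np.append(x, 10)
    y2 = np.append(y, 10)
    print(np.corrcoef(x2, y2)[0, 1])   # jumps to a strong positive correlation (well above 0.8)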
A.92
(a) The association is negative so r is negative. There is quite a strong linear relationship, so r is close to −1. However, the points do not all fall exactly on a line, so r is close to, but not equal to, −1. We estimate that r ≈ −0.9.
(b) There is no obvious relationship in this data, so we estimate r ≈ 0.
(c) The points all fall on a line, and the association is positive, so we have r = 1.
(d) The association is positive so r is positive. There appears to be a linear relationship in this data but it is not very strong. We pick a middle of the road positive value for r, such as r ≈ 0.4.
(e) There is a very strong and obvious relationship in these data, but the relationship is clearly curved and not linear. There are more points spread out along the “negative” sloping part of the curve, but also an outlier in the upper right corner. Overall, we might expect a correlation near zero.

A.93
(a) The general trend in this data appears to be up and to the right, so there is a mostly positive relationship between height and weight. This makes sense, since taller people are more likely to weigh more.
(b) Individual A is short and light in weight, individual B is tall and heavy. Individual C is relatively short and heavy, whereas individual D is relatively tall and thin.

A.94
(a) Both correlations appear to be positive.
(b) The linear relationship is stronger for the calories and fat data, so the correlation will be higher for these two variables.
(c) We locate this individual point as the only point above the 4000 calorie mark on the vertical scale. The fat consumption appears to be about 240 grams, which is an extreme high value for fat. The fiber consumption for this individual appears to be about 23 grams of fiber, which is not an extreme value for fiber consumption.

A.95
(a) A positive association means that large values of one variable tend to be associated with large values of the other; in this case, that taller people generally weigh more. A negative association means that large values of one variable tend to be associated with small values of the other; in this case, that tall people generally weigh less than shorter people. Since we expect taller people to generally weigh more, we expect a positive relationship between these two variables.
(b) In the scatterplot, we see a positive upward relationship in the trend (as we expect), but it is not very strong. It appears to be approximately linear.
(c) The outlier in the lower right corner appears to have height about 83 inches (or 6 ft 11 inches) and weight about 135 pounds. This is a very tall thin person! (It is reasonable to suspect that this person may have entered the height incorrectly on the survey. Outliers can help us catch data-entry errors.)

A.96
(a) The data point (204, 52) for patient #772 is the dot just above the tick mark for 200 on the systolic blood pressure axis.
(b) No. The rest of the data in this scatterplot are fairly randomly distributed and show no clear association in either direction between heart rate and blood pressure.

A.97
(a) Here is a scatterplot of Jogger A vs Jogger B.
[Scatterplot of Jogger B times (38 to 50) vs Jogger A times (43 to 48)]
(b) The correlation between the two joggers is −0.096.
(c) The correlation between the two joggers with the windy race added is now 0.562.
(d) Adding the results from the windy day has a very strong effect on the relationship between the two joggers!

A.98
(a) There is a strong positive linear relationship.
(b) Using T to represent temperature and R to represent chirp rate, we see that the regression line for these data is T̂ = 37.68 + 0.23R.
(c) We use the regression line to find the predicted value for each data point, and then subtract to find the residuals. The results are given in the table. We see that the predicted values are all quite close to the actual values, so the residuals are all relatively small.

Chirp rate (R)    Temperature (T)    Predicted Temp (T̂)    Residual
81                54.5               56.31                  −1.81
97                59.5               59.99                  −0.49
103               63.5               61.37                  2.13
123               67.5               65.97                  1.53
150               72.0               72.18                  −0.18
182               78.5               79.54                  −1.04
195               83.0               82.53                  0.47
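The predicted temperatures and residuals in the table can be generated mechanically from the fitted line T̂ = 37.68 + 0.23R. A short sketch (not from the text; the data values are the seven pairs listed in the table above):

    chirp = [81, 97, 103, 123, 150, 182, 195]
    temp  = [54.5, 59.5, 63.5, 67.5, 72.0, 78.5, 83.0]

    for r, t in zip(chirp, temp):
        predicted = 37.68 + 0.23 * r    # fitted regression line T-hat = 37.68 + 0.23 R
        residual = t - predicted        # observed minus predicted
        print(r, t, round(predicted, 2), round(residual, 2))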
A.99
(a) We are attempting to predict rural population with land area, so land area is the explanatory variable, and percent rural is the response.
(b) There appears to be some positive correlation between these two variables, so the most likely correlation is 0.50.
(c) Using technology the regression line is Rural-hat = 30.52 + 0.051(LandArea). The slope is 0.051, which means percent rural goes up by about 0.051 with each increase in 1000 sq km of country size.
(d) The intercept does not make sense, since a country of size zero would have no population at all!
(e) The most influential country is the one in the far top right, which is Uzbekistan (UZB). This is due to the fact that Uzbekistan is much larger than any of the other countries sampled, so it appears to be an outlier for the explanatory variable.
(f) Predicting the percent rural for USA with the prediction equation gives Rural-hat = 30.52 + 0.051(9147.4) = 497.0. This implies that 497% of the United States population lives in rural areas, which doesn’t make any sense at all. The regression line does not work at all for the US, because we are extrapolating so far outside of the original sample of 10 land area values. The US is much larger in area than any of the countries that happened to be picked for the random sample.

A.100
(a) Using technology the regression line is Rural-hat = 36.49 − 0.00764(LandArea).
(b) The slope with the USA is −0.00764, the slope without is 0.051. These are two very different slopes. Adding the USA has a strong effect, because the United States is an extreme outlier in terms of size.
(c) Predicting the USA percent rural with the new regression line gives Rural-hat = 36.49 − 0.00764(9147.4) = 21.6%, compared to Rural-hat = 30.52 + 0.051(9147.4) = 497.0%. The prediction is much better when the United States is included because we are no longer extrapolating so far outside the data.

A.101
(a) For the five groups with no hyper-aggressive male, we see that x = 0.044 with s = 0.062. For the groups with a hyper-aggressive male, we have x = 0.390 with s = 0.288. There is a large difference in the mean proportion of time females spend in hiding between the two groups.
(b) A scatterplot of female hiding proportions vs mating activity is shown below.
[Scatterplot of MatingActivity (0.0 to 0.6) vs FemalesHiding (0.1 to 0.9)]
(c) We use technology to find the regression line: MatingActivity-hat = 0.48 − 0.323 · FemalesHiding.
(d) If there is not a hyper-aggressive male present, the mean proportion of time females spend in hiding is 0.044. The predicted mating activity for this mean is MatingActivity-hat = 0.48 − 0.323 · (0.044) = 0.466. If there is a hyper-aggressive male present, the mean proportion of time females spend in hiding is 0.390. The predicted mating activity for this mean is MatingActivity-hat = 0.48 − 0.323 · (0.390) = 0.354. If there is not a hyper-aggressive male present, predicted mating activity is 0.466. If there is a hyper-aggressive male present, predicted mating activity is 0.354.
(e) Don’t hang out with any hyper-aggressive males!

A.102
(a) The slope 1.426 means that for a one percent rise in the high school graduation rate, the percent graduating college will go up by about 1.426.
(b) If the percent to graduate high school in a state is 85%, the predicted percent to graduate college is College-hat = −96.37 + 1.426 · (85) = 24.84. If the percent to graduate high school is 90%, the predicted percent to graduate college is College-hat = −96.37 + 1.426 · (90) = 31.97.
(c) The predicted college percent for Massachusetts is College-hat = −96.37 + 1.426 · (93.3) = 36.68, so the residual is 50.9 − 36.68 = 14.22. This is the largest residual.

A.103
(a) There is a strong linear positive trend, with no obvious outliers.
(b) Household income appears to be a stronger predictor and more strongly correlated. (c) The state with the largest positive residual appears to have a mean household income of about 75 thousand and to have over 50% graduating college, with a predicted percent graduating college (on the regression line) of about 40%. (This state is Massachusetts and the actual values for that point are (50.9, 74.2) with a residual of about 10.3.) (d) The state with the largest negative residual appears to have a mean household income of about 76 thousand dollars and to have about 26% graduating college, with a predicted percent graduating college of just over 40%. (This state is Alaska, where the point is (76.1, 26.5) and the residual is −15.5.)
A.104
(a) A scatterplot of FGPct vs FTPct is shown below on the left. There is somewhat of a linear trend, and it is negative, which means that players who are better at free throws tend to be worse at field goals. This is an interesting result!
[Two scatterplots of FGPct (0.35 to 0.70) vs FTPct (0.4 to 1.0): the original data on the left and the same data with the regression line added on the right]
(b) There are several points on the left side of the graph with free throw percentages at or below 50%. The most extreme of these is Lonzo Ball, who made only 41.7% of his free throw attempts.
(c) We use technology to find that the correlation is −0.251.
(d) The regression line is added to the plot above on the right. The formula for the regression line is FGPct-hat = 0.5864 − 0.1601 · FTPct.
(e) For a player with FTPct = 0.70, the predicted field goal percentage is FGPct-hat = 0.5864 − 0.1601 · (0.70) = 0.474.

A.105 Answers will vary for all except part (a).
(a) The table of frequencies for the regions is given below.
Region                           Frequency
1. Latin America                 24
2. Western Nations               24
3. Middle East                   16
4. Sub-Saharan Africa            33
5. South Asia                    7
6. East Asia                     12
7. Former Communist Countries    27
Total                            143
UNIT B: ESSENTIAL SYNTHESIS & REVIEW
Unit B: Essential Synthesis Solutions
B.1
(a) The p-value is small so we reject a null hypothesis that says the mean recovery time is the same with or without taking Vitamin C.
(b) There are many possible answers. Here’s one example of an inappropriate data collection method: Give Vitamin C only to those who have had cold symptoms for a long time. They may have shorter recovery times since the cold is almost over when they start treatment, while those not getting Vitamin C might be at the early stages of their colds.
(c) We need to randomize the assignment of subjects (students with colds) to the two groups. For example, we could flip a coin to determine who gets Vitamin C (heads) and who gets a placebo (tails). Neither the subjects nor the person determining when they are recovered should know which group they are in.
(d) The small p-value indicates we should reject H0 : μc = μnc in favor of the alternative Ha : μc < μnc , where μ denotes the mean recovery time from a cold. Thus we have strong evidence that large doses of Vitamin C help reduce the mean time students need to recover from a cold.
B.2
(a) The sample should be from new subjects so the dogs don’t recognize a previous case as familiar. Order should be randomized so a dog doesn’t just happen to sit a lot (for example, it gets tired) when all of one group is presented. The person presenting the sample to the dogs should not know whether it is a cancer or control sample to avoid giving inadvertent signals to the dogs. If patients had already started treatment, the dogs might be smelling the treatment, rather than cancer. This is a well-designed study.
(b) Smokers are more likely to have lung cancer, so a dog might just be smelling that a person is a smoker, rather than having lung cancer, and appear to get a higher than expected proportion correct. (c) The hypotheses are H0 : p1 = p2 vs Ha : p1 < p2 , where p1 and p2 are the proportion of times the dog sits for patients without cancer (Group 1) and with cancer (Group 2), respectively. The sample statistics are p̂1 = 6/105 = 0.057 and p̂2 = 45/48 = 0.938. (d) These two proportions are very different, so we should expect to see a very small p-value. This gives very strong evidence that the dogs can distinguish the cancer patients from the control patients at much better than random chance rates.
B.3
(a) The hypotheses are H0 : μdc = μw vs Ha : μdc > μw , where μdc and μw are the mean calcium loss after drinking diet cola and water, respectively. The difference in means for the sample is xdc − xw = 56.0 − 49.125 = 6.875. We use StatKey or other technology to construct a randomization distribution, such as the one shown below, for this difference in means test. We find the p-value in this upper-tail test by finding the proportion of the distribution above the sample difference in means of 6.875. For the randomization distribution below this gives a p-value of 0.005 and strong evidence to reject the null hypothesis. We conclude that mean calcium loss for women is higher when drinking diet cola than when drinking water.
(b) Since we found a significant difference in part (a), we find a confidence interval for the difference in means μdc − μw using a bootstrap distribution and either percentiles or the ±2 · SE method. For the bootstrap distribution shown below, we see that a 95% confidence interval using percentiles goes from 2.88 to 10.75. We are 95% sure that women who drink 24 ounces of diet cola will increase calcium excretion, on average, between 2.875 and 10.75 milligrams when compared to drinking water. Using ±2 · SE, the 95% confidence interval is 6.875 ± 2 · 2.014 = 6.875 ± 4.028 = (2.85, 10.90).
B.4
(a) The hypotheses are H0 : p = 0.5 vs Ha : p > 0.5, where p is the proportion of coin flip winners who win the overtime. From the original sample p̂ = 240/428 = 0.56. We use StatKey or other technology to construct a randomization distribution such as the one shown below. The p-value in this right-tail test is the proportion of the distribution above the observed proportion of 0.56. For this randomization distribution we see a p-value of 0.006. This is well below the NFL’s 5% significance level so we reject H0 : p = 0.5. This gives strong evidence that the proportion of overtime games won by the coin flip winner is more than one half. There is evidence of a clear advantage to winning the coin flip when going into overtime in the NFL.
(b) The hypotheses are H0 : p1 = p2 vs Ha : p1 ≠ p2 , where p1 and p2 are the proportion of overtimes won by coin flip winners before (p1) and after (p2) the rule changed. The sample results are summarized in the two-way table below.
             Flip winner wins OT    Flip winner loses OT    Games
1973–1993    94                     94                      188
1994–2009    146                    94                      240
The sample proportions are p̂1 = 94/188 = 0.500 and p̂2 = 146/240 = 0.608 with a difference p̂1 − p̂2 = −0.108. We use StatKey or other technology to construct a randomization distribution such as the one shown below. We find 17 cases where the difference in proportions equals or exceeds the −0.108 difference that was observed in the original data. Doubling to account for the fact that this is a two-tail test gives a p-value of 2 · 17/1000 = 0.034 which gives moderately strong evidence against H0 . From this we can conclude that the proportion of overtime games won by the coin flip winner is higher after the rule change than before. Also, there did not appear to be any advantage to the coin flip winner for overtime games played before the rule changed in 1994.
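The randomization distribution in part (a) can also be simulated without special software: under H0 : p = 0.5 each of the 428 overtime games is a fair coin flip, and we ask how often a simulated sample proportion is at least as large as the observed 0.56. A rough sketch (simulated p-values will vary a little from run to run):

    import random

    n, observed = 428, 240 / 428        # observed p-hat, about 0.56
    reps = 5000

    count = 0
    for _ in range(reps):
        wins = sum(random.random() < 0.5 for _ in range(n))   # flip 428 fair coins
        if wins / n >= observed:
            count += 1

    print(count / reps)   # estimated p-value, around 0.006 to 0.007, as in the solution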
B.5
(a) Roommates are assigned at random, so whether a student has a roommate with a videogame or not is determined at random.
(b) The hypotheses are H0 : μv = μn vs Ha : μv < μn , where μv and μn are the mean GPA for students whose roommates do and do not bring a videogame, respectively. (c) At a 5% level, we reject H0 . The mean GPA is lower when a roommate brings a videogame. (d) Negative differences indicate μv − μn < 0 which means μv < μn , a lower mean GPA when a roommate brings a videogame. We are 90% sure that students with a roommate who brings a videogame have a mean GPA between 0.315 and 0.015 less than the mean GPA of students whose roommates don’t bring a videogame. (e) At a 5% significance level, we do not reject H0 when the p-value=0.068. There is not (quite) enough evidence to show that mean GPA among students who don’t bring a video game is lower when their roommate does bring one. (f) At a 5% significance level we reject H0 when the p-value=0.026. There is enough evidence to show that mean GPA among students who bring a video game is lower when their roommate also brings one. (g) The effect (reducing mean GPA when a roommate brings a videogame) is larger for students who bring a videogame themselves. Perhaps this makes sense since students who bring a videogame are already predisposed to get distracted by them. (h) For students who bring a video game to college, we are 90% sure that their mean GPA is lower by somewhere between 0.526 and 0.044 points if their roommate also brings a videogame than if their roommate does not bring a videogame. This interval is similar to the one found in part (d) but is farther in the negative direction. (i) Having more videogames in the room tends to be associated with lower mean GPA. (j) There are many possible answers. One possible additional test is to ignore the roommate completely and see if mean GPA in the first semester is lower for students who bring a videogame to college than for students who do not bring one. B.6
(a) The hypotheses are H0 : μh = μw vs Ha : μh > μw , where μh and μw are the mean ages of husbands and wives, respectively, when getting married. Since the data are naturally paired, this is a matched pairs design. We could also denote the hypotheses as H0 : μD = 0 vs Ha : μD > 0, where μD is the mean difference (Husband−Wife) in ages. We compute the difference for each pair and find the mean difference in the original sample of n = 105 couples to be D = 2.83 years. There are a couple of different ways to do the randomizations. In one, we assume that, in each pair, either age could be equally likely to be the husband’s or the wife’s age. This is equivalent to randomly putting a “+” or “−” sign in front of each age difference and then recomputing the mean difference. Another method is to subtract 2.83 from all of the differences to obtain a new set with mean zero and sample (with replacement) from that shifted set. The distributions from either method should be similar. For one set of 5000 randomizations (shown below, using the shifted differences) we find no cases even close to the xD = 2.83 that was observed in the original data. This gives a p-value of essentially zero, providing very strong evidence to reject H0 and conclude that husbands tend to be older, on average, than wives at marriage.
(b) The hypotheses are H0 : p = 0.5 vs Ha : p > 0.5, where p is the proportion of married couples where the husband is older. In the sample of n = 105 couples, the husband is older in 75 of the couples, so p̂ = 75/105 = 0.714. Using StatKey, we create a randomization distribution based on p = 0.5 (shown below) and see that no cases (out of 5000 randomizations) are as extreme as p̂ = 0.714. This gives a p-value of essentially zero, providing very strong evidence to reject H0 and conclude that the husband is older than the wife in more than 50% of newly married couples.
(c) Both results are significant so we find a confidence interval in each case. In part (a), to find a confidence interval for μh − μw : form a bootstrap distribution by sampling (with replacement) the differences in the ages (Husband − Wife) to obtain samples of size 105, then find the mean difference for each bootstrap sample. Using StatKey to create such a bootstrap distribution, we find a 95% confidence interval for the mean age difference goes from 1.81 to 3.84 years. We are 95% confident that the mean age for husbands at marriage is between 1.81 years and 3.84 years more than the mean age for wives.
In part (b), to find a confidence interval for p we form a bootstrap distribution by sampling (with replacement) from the original couples and recording the proportion of couples in each set of 105 sampled for which the husband is older. Using StatKey to create such a bootstrap distribution, we find a 95% confidence interval for the proportion goes from 0.63 to 0.80. We are 95% confident that the husband is older than his wife in between 63% and 80% of all married couples (at least in St. Lawrence County).
B.7
(a) We expect married couples to tend to have similar ages, so we expect a positive correlation between husband and wife ages.
(b) A scatterplot of Husband vs Wife ages is shown below. We see a strong positive, linear association. The correlation for this sample of data is r = 0.914.
(c) To find a confidence interval for this correlation, we sample (with replacement) from the original data and compute the correlation between husband and wife ages for each bootstrap sample of 105 couples. We repeat this process to generate 5000 values in a bootstrap distribution such as the one shown below. From the percentiles of this distribution of bootstrap correlations, we find a 95% confidence interval to be from 0.877 to 0.945. We are 95% sure that the correlation between husband and wife ages for all recent marriages in this jurisdiction is between 0.877 and 0.945.
(d) Although we have evidence of a strong, positive correlation between the ages of husbands and wives, the correlation contains no information to help with the previous exercise of deciding whether husbands or wives tends to be older.
Unit B: Review Exercise Solutions

B.8 The population is all Internet users in the US. The population parameter of interest is p, the proportion of Internet users who have customized their home page. For this sample, p̂ = 469/1675 = 0.28. Unless we have additional information, the best point estimate of the population parameter p is p̂ = 0.28. To find p exactly, we would have to obtain information about the home page of every Internet user in the US, which is unrealistic.

B.9 We are estimating p, the proportion of all US adults who own a laptop computer. The quantity that gives the best estimate is p̂, the proportion of our sample who own a laptop computer. The best estimate is p̂ = 1238/2252 = 0.55. Since the true proportion is unknown, our best estimate for the proportion comes from our sample. We estimate that 55% of all US adults own a laptop computer.

B.10
(a) The relevant population is all American adults, and the parameter we are estimating is p, the proportion of all American adults who believe that violent movies lead to more violence in society. The best estimate is p̂ = 0.57.
(b) A 95% confidence interval is
Point Estimate ± Margin of Error = 0.57 ± 0.03 = 0.54 to 0.60
We are 95% confident that the proportion of all American adults who believe that violent movies lead to more violence in society is between 0.54 and 0.60.

B.11 We are 95% sure that the mean amount of carbon for all square kilometers of tropical forests in Latin America, sub-Saharan Africa, and southeast Asia is between 9600 and 13,600 tons. To calculate this exactly, we would have to measure the carbon across all 2.5 billion hectares (or 25 million square kilometers). This is definitely not feasible!

B.12 Let μ represent the mean time for a golden shiner fish to find the yellow mark. A 95% confidence interval is given by
x ± 2 · SE = 51 ± 2(2.4) = 51 ± 4.8 = 46.2 to 55.8
A 95% confidence interval for the mean time for fish to find the mark is between 46.2 and 55.8 seconds. We are 95% sure that the mean time it would take fish to find the target for all fish of this breed is between 46.2 seconds and 55.8 seconds. In other words, the plausible values for the population mean μ are those values between 46.2 and 55.8. Therefore, 60 is not a plausible value for the mean time for all fish, but 55 is. B.13 We are 95% confident that schools of fish in this situation will end up going with the majority over the opinionated minority only between 9% and 26% of the time. It is not plausible that the schools of fish in this situation are equally likely to go for either option since that would indicate a proportion of p = 0.5 for each option, and 0.5 is not in the range of plausible values. The highly opinionated fish are definitely having an effect!
B.14 We are estimating the difference in population proportions p1 − p2 where p1 is the proportion of times a school of fish will pick the majority option if there is an opinionated minority, a less passionate majority, and also some additional members with no preference and p2 is the proportion of times a school of fish will pick the majority option if there is an opinionated minority and a less passionate majority and no other fish in the group, as described above in Fish Democracies. (We could also have defined the proportions in the other order.) The best point estimate is p̂1 − p̂2 = 0.61 − 0.17 = 0.44. We find a 95% confidence interval as follows:
(p̂1 − p̂2) ± 2 · SE = (0.61 − 0.17) ± 2(0.14) = 0.44 ± 0.28 = 0.16 to 0.72
We are 95% sure that the proportion of schools of fish picking the majority option is 0.16 to 0.72 higher if fish with no preference are added to the group. If adding the indifferent fish had no effect, then the population proportions with and without the indifferent fish would be the same, which means the difference in proportions would be zero. Since zero is not a plausible value for the difference in proportions, it is very unlikely that adding indifferent fish has no effect. The indifferent fish are helping the majority carry the day.

B.15
(a) This is a population proportion so the correct notation is p. Using the data in HollywoodMovies, we have p = 386/1295 = 0.298.
(b) We expect it to be symmetric and bell-shaped and centered at the population proportion of 0.298.

B.16
(a) It appears that about 95% of the data values are between about 0.18 and 0.42, which is approximately 0.12 on either side of the center at 0.30. Thus we estimate that the standard error is around 0.12/2 = 0.06.
(b) It doesn’t matter what the sample size is; the center will stay approximately at the value of the population parameter (which in this case is p = 0.298). The best option is “about the same as.”
(c) Since the sample size is larger (n = 100 instead of n = 50), we expect the sample proportions to be more accurate and closer to the population parameter. This means variability will go down so the best option is “smaller than.”
(d) As long as we have a reasonably large number of samples (such as 1000 or greater), the shape, center, and spread of the sampling distribution won’t change much as we take more samples. The best option is “about the same as.”
(e) As long as we have a reasonably large number of samples (such as 1000 or greater), the shape, center, and spread of the sampling distribution won’t change much as we take more samples. The best option is “about the same as.”

B.17
(a) Both distributions are centered at the population parameter, so 0.05.
(b) The proportions for samples of size n = 100 go from about 0 to 0.12. The proportions for samples of size n = 1000 go from about 0.025 to 0.07. (c) The standard error for samples of size n = 100 is about 0.02 (since it appears that about 95% of the data are between 0.01 and 0.09). The standard error for samples of size n = 1000 is about 0.005 (since it appears that about 95% of the data are between 0.04 and 0.06). (d) A sample proportion of 0.08 is relatively likely from a sample of 100 but extremely unlikely with a sample size of 1000.
B.18
(a) It is not unlikely to get a sample mean more than 2 screws on either side of 50. It is, however, very unlikely to see a mean below 45 or above 55, so it is unlikely for the sample mean to be more than 5 or 10 screws away.
(b) The distribution shows that finding a mean number of screws equal to 42 from a sample of 10 boxes is very unlikely if the company’s claim is accurate, so, yes, it would be reasonable to conclude that the company’s claim is likely to be incorrect.
(c) The sampling distribution shows us that a mean of 42 screws is very unlikely, but this does not imply that one box containing 42 screws is very unlikely. So a box of 42 screws does not give us information one way or another about the company’s claim.

B.19
(a) Answers will vary. Here is one possible set of randomly selected Points values: 28, 55, 8, 44, 2, with x = 27.4.
(b) Answers will vary. Here is another possible set of randomly selected Points values: 14, 55, 22, 62, 5, with x = 31.6.
(c) The mean number of points for all 26 players (rounded to one decimal place) is μ = 24.9 points for the season. Most sample means found in parts (a) and (b) will be somewhat close to this but not exactly the same.
(d) The distribution will be roughly symmetric with a peak at the center of 24.9. See the figure.
B.20
(a) Answers will vary. Here is one sample: Minutes: 138.57, 128.73, 135.53, 147.02, 146.18, 137.8, 150.33, 148.17, 134.88, 137.82, with x = 140.50.
(b) Answers will vary. Here is another sample: Minutes: 130.08, 149.63, 138.08, 141, 145.55, 148.35, 147.7, 140.02, 145.18, 134.88, with x = 142.05.
(c) The mean of all times of the 140 finishers is μ = 142.37 minutes, or about 2 hours 22 minutes. The sample means found in parts (a) and (b) were probably close to this but not exactly the same. (d) The distribution will be roughly symmetric with a peak at the center of 142.37. See the figure.
B.21 Answers will vary, but a typical distribution is shown below. The smallest mean is just above 5 and the largest is just below 50 (but answers will vary). The standard deviation of these 5000 sample means is about 7.58.
B.22 Answers will vary, but a typical distribution is shown below. The smallest mean is about 136 minutes and the largest is about 150 minutes. The standard deviation of these sample means is about 2.37.
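Sampling distributions like the ones described in B.21 and B.22 are built by repeatedly drawing samples from the population and recording each sample mean. A generic sketch (the population list below is a placeholder; with the actual points or marathon-time data it reproduces distributions like those described above):

    import random
    import statistics

    population = [28, 55, 8, 44, 2, 30, 17, 62, 14, 22, 5, 40]   # placeholder population values

    sample_means = []
    for _ in range(5000):
        sample = random.sample(population, 5)      # one random sample of size n = 5
        sample_means.append(statistics.mean(sample))

    print(statistics.mean(sample_means))           # centered near the population mean
    print(statistics.stdev(sample_means))          # the standard error of the sample mean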
B.23 The p-value 0.0004 goes with the experiment showing significantly lower performance on material presented while the phone was ringing. The p-value 0.93 goes with the experiment measuring the impact
of proximity of the student to the ringing phone. The p-value of 0.0004 shows very strong evidence that a ringing cell phone in class affects student learning.

B.24
(a) We are estimating μD , the mean difference in delay time for public transportation for all traffic situations in Dresden, Germany.
(b) Put all 24 slips in a container. Pull out one and write down the value and put it back in the container. Mix up the slips, pull out one and repeat that process until there are 24 values written down. Those 24 values form one bootstrap sample.
(c) Record the sample mean for the 24 values in the bootstrap sample.
(d) The distribution will be bell-shaped and centered at 61.
(e) We calculate the standard deviation of the bootstrap statistics.
(f) For a 95% confidence interval, we have
xD ± 2 · SE = 61 ± 2(3.1) = 61 ± 6.2 = 54.8 to 67.2
We are 95% confident that the average time savings is between 54.8 and 67.2 seconds, if the city moves to the new system.

B.25 This is a test for a difference in proportions, and we define pF and pN to be the proportion of men copying their partner’s sentence structure with a fertile partner and non-fertile partner, respectively. The hypotheses are:
H0 : pF = pN
Ha : pF < pN
The sample statistic is p̂F − p̂N = 30/62 − 38/61 = 0.484 − 0.623 = −0.139. In a randomization distribution such as the one below, we see that the p-value in the left tail beyond this point is 0.089. We do not reject H0 at a 5% level and do not find evidence that men’s speech is affected by ovulating women. The results are borderline, though, and are significant at a 10% level. It might be worth continuing the experiment with a larger sample size.
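One way to build this randomization distribution is to pool the 30 + 38 = 68 "copied" outcomes among all 123 men and reassign them at random to groups of 62 and 61, recording the difference in proportions each time. A rough sketch of that reshuffling approach (counts typed in from the exercise; the simulated p-value will vary slightly from run to run):

    import random

    # 1 = copied partner's sentence structure, 0 = did not
    fertile     = [1] * 30 + [0] * 32     # 30 of 62
    non_fertile = [1] * 38 + [0] * 23     # 38 of 61
    observed = 30 / 62 - 38 / 61          # about -0.139

    pooled = fertile + non_fertile
    reps, count = 5000, 0
    for _ in range(reps):
        random.shuffle(pooled)                        # reassign outcomes to groups at random
        new_f, new_n = pooled[:62], pooled[62:]
        diff = sum(new_f) / 62 - sum(new_n) / 61
        if diff <= observed:                          # left-tail test, Ha: pF < pN
            count += 1

    print(count / reps)   # p-value around 0.09, as in the solution above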
B.26 The point estimate for the population mean is x = 3.04, so we have
Interval estimate = point estimate ± margin of error = x ± margin of error = 3.04 ± 0.86
Since 3.04 − 0.86 = 2.18 and 3.04 + 0.86 = 3.90, the interval estimate is 2.18 to 3.90. We have some confidence that the mean tip amount for all her deliveries lies somewhere between $2.18 and $3.90. For these estimates to be accurate, we need to assume that there is nothing special about the nights included in the sample so that it is representative of all her deliveries. For example, the samples were all from Friday and Saturday nights, so we might not expect the interval to hold for the mean tip on Tuesday nights.

B.27
(a) Answers vary. One possible sample is 120, 130, 150, 180, 120, 140, 200, 180, 170, 180. All values well above $100.
(b) Answers vary. One possible sample is 70, 120, 90, 110, 80, 60, 110, 100, 80, 120. In this case, the sample mean is x = 94, which is less than 100 so provides no evidence at all that the mean is larger than 100. (c) Answers vary. One possible sample is 90, 100, 70, 110, 120, 80, 140, 100, 80, 120. In this case, the sample mean is x = 101, which is just barely bigger than $100. Since the sample mean is larger than $100, we have some evidence that the population mean will be larger than $100, but it is very weak evidence. B.28 The best point estimate is p̂ = 0.56 and a 95% confidence interval is Point Estimate 0.56 0.53
± ±
Margin of Error 0.03
to
0.59
We are 95% confident that the proportion of all American adults who rarely or never go out to the movies is between 0.53 and 0.59. This entire interval is above 50%, so we can be relatively sure that the percentage is greater than 50%. B.29
(a) The mean is x = 67.59 and the standard deviation is s = 50.02.
(b) Select 20 values at random (with replacement) from the original set of skateboard prices and record the mean for those 20 values as the bootstrap statistic. (c) We expect the bootstrap distribution to be symmetric and bell-shaped and to be centered at the sample mean: 67.59. (d) We find the 95% confidence interval:

x ± 2 · SE
67.59 ± 2(10.9)
67.59 ± 21.8
45.79 to 89.39
We are 95% confident that the mean price of skateboards for sale online is between $45.79 and $89.39.
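The bootstrap procedure in parts (b) through (d) can also be carried out with a few lines of code. This is a minimal sketch; the prices array is a hypothetical stand-in for the 20 skateboard prices (the actual data are not reproduced here), so the printed interval will not match the one above exactly.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical stand-in for the 20 skateboard prices from the original sample.
prices = np.array([25, 40, 55, 60, 35, 80, 120, 45, 65, 30,
                   150, 70, 90, 50, 85, 60, 40, 110, 75, 66], dtype=float)

# Bootstrap: resample 20 values with replacement, record the mean, repeat many times.
boot_means = np.array([
    rng.choice(prices, size=len(prices), replace=True).mean()
    for _ in range(5000)
])

se = boot_means.std(ddof=1)   # bootstrap standard error
xbar = prices.mean()
print("95% CI:", xbar - 2 * se, "to", xbar + 2 * se)
```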
B.30 The dog got p̂B = 33/36 = 0.917 or 91.7% of the breath samples correct and p̂S = 37/38 = 0.974 or 97.4% of the stool samples correct. (A remarkably high percentage in both cases!) We create a bootstrap distribution for the difference in proportions using StatKey or other technology (as in the figure below) and then find the middle 90% of values. Using the figure, the 90% confidence interval for pB − pS is −0.14 to 0.025. We are 90% confident that the difference between the proportion correct for breath samples and the proportion correct for stool samples for all similar tests we might give this dog is between −0.14 and 0.025. Since a difference of zero represents no difference and zero is in the interval of plausible values, it is plausible that there is no difference in the effectiveness of breath vs stool samples in having this dog detect cancer.
B.31
(a) For one set of 5000 bootstrap sample standard deviations shown below, the 2.5%-tile and 97.5%-tile are 13.6 and 31.6, respectively. Thus we can say with 95% confidence that the standard deviation of the number of penalty minutes awarded to all NHL players in a season is between 13.6 and 31.6 minutes.
(b) The midpoint of the interval in part (a) is (13.6 + 31.6)/2 = 22.6 which is less than the standard deviation of the original sample, s = 24.92. In general, an interval based on bootstrap percentiles does not need to be centered at the original sample statistic.
B.32 The mean area for the sample of ten countries is x = 111.3 thousand square kilometers. Using technology we obtain a bootstrap distribution as shown below. From this distribution the 99% confidence interval is (30.1, 228.3). (Answers will vary.) We are 99% confident that the average country size for all 217 countries is between 30,100 and 228,300 square kilometers.
B.33
(a) We compute the regression line to be PctRural = 30.52 + 0.050 · Area. The slope of the line for this sample is 0.050.
(b) Using technology to produce the bootstrap distribution below for the sample slopes, we get a 95% confidence interval for the slope from −0.004 to 0.135. Answers will vary — for this small a sample with strongly skewed data the bootstrap slopes might contain some very extreme values. We are 95% confident that the slope of the regression line for all countries to predict percent rural from land area is between −0.004 and 0.135.
(c) The 95% confidence interval from part (b) is (−0.004, 0.135), so we barely capture the true population slope of 0. The lower bound is very close to zero, so this answer may vary, depending on the results of the simulation from part (b). B.34
(a) For smokers we look only in the Smoker column of the two way table and observe 38 of 135 succeeded in getting pregnant, so the sample proportion for smokers is p̂s = 38/135 = 0.28. For
non-smokers we look only in the Non-smoker column of the two way table and observe 206 of 543 succeeded, so the sample proportion for non-smokers is p̂ns = 206/543 = 0.38. The best point estimate is the difference in sample proportions: p̂ns − p̂s = 0.38 − 0.28 = 0.10. Note that you could also choose the other direction and estimate ps − pns with p̂s − p̂ns = 0.28 − 0.38 = −0.10. (b) Using StatKey or other technology, we construct a bootstrap distribution for this difference in proportions (shown below). Using percentiles we see that a 90% confidence interval for the difference in proportions is 0.024 to 0.168. We are 90% confident that the pregnancy rate for non-smokers is 0.024 to 0.168 higher than it is for smokers. Since zero is not in this interval, it is unlikely that the pregnancy proportions are the same for smokers and non-smokers.
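The same bootstrap distribution can be generated in code by resampling each group separately. A minimal sketch, using only the counts from the two-way table (the percentile endpoints will vary slightly from run to run):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Recreate the two samples as 0/1 outcomes from the two-way table counts.
smokers = np.array([1] * 38 + [0] * (135 - 38))        # 38 pregnancies out of 135
nonsmokers = np.array([1] * 206 + [0] * (543 - 206))   # 206 pregnancies out of 543

diffs = []
for _ in range(5000):
    p_s = rng.choice(smokers, size=len(smokers), replace=True).mean()
    p_ns = rng.choice(nonsmokers, size=len(nonsmokers), replace=True).mean()
    diffs.append(p_ns - p_s)
diffs = np.array(diffs)

# 90% percentile interval: cut off 5% in each tail.
print("90% CI:", np.percentile(diffs, 5), "to", np.percentile(diffs, 95))
```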
B.35
(a) Since we are looking at whether smoking has a negative effect, this is a one-tailed test.
(b) The null hypothesis will be that the two proportions are identical H0 : ps = pns , and the alternative is that the proportion of successful pregnancies will be less in the smoking group Ha : ps < pns . (c) We want the number assigned to each group to match the numbers in the original sample, so 135 women to the smoking group and 543 to the non-smoking. (Both of these values can be found in the two-way table.) (d) In the original sample, there were 38 successful pregnancies in the 135 women in the smoking group. From the randomization distribution, it appears that about 40 of the 1000 values fall less than or equal to the count of 38 from the original sample, so the best estimate for the p-value is about 40/1000 = 0.04. B.36
(a) Sample B. The sample mean in B is around 45, while the sample mean in A is around 47 and the spread and sample size appear similar in the two samples.
(b) Sample B. Both samples appear to have a mean near 47, but the variability is smaller in sample B so we can be more sure the mean is below 50. Also, sample B has only one value above 50, while sample A has 7 values above 50. (c) Neither. Both samples appear to have means above 50, so neither would give evidence that the population mean is less than 50. B.37
(a) Sample A. The sample mean in A is around 43, while the sample mean in B is around 47. Sample sizes and variability are similar for both samples.
(b) Sample B. Both samples appear to have a mean near 46, but the variability is smaller in sample B so we can be more sure the mean is below 50. Also, sample B has few values above 50, while sample A has at least 25% of its values above 50 (since Q3 > 50). (c) Sample A. Both samples appear to have about the same mean and median (near 45) and similar variability, but sample A is based on a much larger sample size, so it would be more unusual to see that many values below 50 if H0 : μ = 50 were true. B.38
(a) If the tax has no effect on mean soda consumption, a sample mean would be as small (or smaller) than the one observed about 2% of the time by random chance alone. This would be fairly unlikely, so we find fairly strong evidence that the sales tax has reduced average consumption.
(b) If the tax has no effect on mean soda consumption, a sample mean would be as small (or smaller) than the one observed about 41% of the time by random chance alone. This would not be very surprising to see, so we wouldn’t have much evidence at all that the sales tax reduces average soda consumption. (c) The p-value of 0.02 is small, much smaller than 0.41, so it would give stronger evidence that a sales tax will reduce mean soda consumption. (d) The smaller p-value, 0.02, is more statistically significant. B.39
(a) This is a population proportion so the correct notation is p. We have p = 170/1295 = 0.131.
(b) Using technology we produce a sampling distribution (shown below) of 5000 sample proportions when samples of size n = 100 are drawn from a population with p = 0.131. We see that the distribution is relatively symmetric, bell-shaped, and centered at the population proportion of 0.131, as we expect. We also see in the figure that the estimated standard error based on these 5000 simulated proportions is 0.033.
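This sampling distribution can be simulated directly, since drawing a sample of size 100 and recording p̂ is equivalent to dividing a binomial count by n. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

p = 170 / 1295   # population proportion from part (a)
n = 100          # sample size

# 5000 simulated sample proportions from samples of size 100.
p_hats = rng.binomial(n, p, size=5000) / n

print("center of distribution:", p_hats.mean())
print("estimated standard error:", p_hats.std(ddof=1))
```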
B.40
(a) This is a population mean so the correct notation is μ. Using the data in the People variable and restricting to just the 230 cases in the “Performer” category, we find μ = 3.057.
(b) We might use a dotplot, histogram, or boxplot to look at all 230 values. A dotplot is shown below.
Whichever plot we choose, we see that the data are very skewed to the right with a few outliers. This is not surprising, since there are lots of performing “groups” that consist of just one person, but there are also some very large groups, such as the 16-member Parliament-Funkadelic, or The Grateful Dead at 12 members. (c) Using technology, we produce a dotplot (shown below) for the mean number of People in 5000 samples of size n = 10 chosen from among the 230 performer groups/individuals in the Rock and Roll Hall of fame. We see that the distribution is relatively symmetric, bell-shaped, and centered (about) at the population mean of 3.057, as we expect. We find the standard deviation of these 5000 sample means is 0.74, so we estimate the standard error of x in this case to be about SE ≈ 0.74.
(d) One dot on the sampling distribution represents the sample mean of People for one random sample of 10 performers from the full dataset. B.41
(a) Approximately the same. We expect both distributions to be approximately symmetric and bell-shaped.
(b) Different. The sampling distribution is centered at the value of the population parameter, while the bootstrap distribution is centered at the value of the sample statistic. (c) Approximately the same. The standard error from the bootstrap distribution gives a good approximation to the standard error for the sampling distribution.
(d) Different. One value in the sampling distribution represents the statistic from a sample taken (without replacement) from the entire population, while one value in the bootstrap distribution represents the statistic from a sample taken with replacement from the original sample. In both cases, however, we compute the same statistic (mean, proportion, or whatever) and use the same sample size. (e) Different. In order to create a sampling distribution, we need to know the data values for the entire population! In order to create a bootstrap distribution, we only need to know the values in one sample. This is what makes the bootstrap method so powerful. B.42 A bootstrap distribution generated to find a confidence interval is centered at the value of the original sample statistic. A randomization distribution to test a hypothesis is centered at the value for the parameter given in the null hypothesis. B.43
(a) The hypotheses are H0 : p = 0.5 vs Ha : p > 0.5, where p is the proportion of all games Paul the Octopus picks correctly.
(b) Answers vary, but 8 out of 8 heads should rarely occur. (c) The proportion of heads in flipping a coin is p = 0.5, which matches the null hypothesis. B.44 We use technology to simulate many samples of size 8 from a population that has an equal number of “successes” and “failures”, i.e. one where p = 0.5. For each sample we count the number of successes out of the 8 trials to obtain a randomization distribution such as the one shown below (or find the proportion of successes in each sample). We then count the number of samples for which all 8 trials are successes, and divide by the total number of samples to get a p-value. For the distribution below, only 4 of the 1000 samples gave 8 correct guesses in 8 trials, so we estimate the p-value = 0.004. Answers will vary for other randomizations but the p-value will always be small, indicating that it is very unlikely to predict all eight games correctly when just guessing at random.
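A sketch of the B.44 simulation in Python follows; the count of extreme samples, and hence the estimated p-value, will vary with the random seed.

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# Simulate 1000 sets of 8 guesses when each guess is correct with probability 0.5.
correct_counts = rng.binomial(n=8, p=0.5, size=1000)

# p-value: proportion of simulated samples with all 8 guesses correct.
p_value = np.mean(correct_counts == 8)
print("estimated p-value:", p_value)
```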
B.45
(a) H0 : pf = pnf , Ha : pf > pnf , where pf and pnf are the proportion of females wearing a red or pink shirt in the fertile and not fertile groups, respectively.
(b) p̂f − p̂nf = 4/10 − 1/14 = 0.400 − 0.071 = 0.329 (c) The statistics of 0.329 and 0.500 are greater than or equal to the observed statistic of 0.329, so 68 + 5 = 73 of the 1000 simulated statistics are as extreme as the observed statistic, so the p-value is 73/1000 = 0.073.
B.46 We use StatKey or other technology to generate a bootstrap distribution for the difference in means. For a 90% confidence interval, we keep the middle 90% of values in the bootstrap distribution and cut off 5% in each tail. We see in the figure that the 90% confidence interval for the difference in weekly hours spent exercising is (−0.92, 6.87). We are 90% confident that the difference in mean number of hours spent exercising between male and female college students is between −0.92 hours and 6.87 hours. This means the true difference is likely to be anywhere from females exercising, on average, 0.92 hours more than males to males exercising, on average, 6.87 hours more than females. Notice that the 90% interval calculated here is narrower than the 95% confidence interval (−1.75 to 7.75) calculated earlier using the standard error. This makes sense, since a 95% confidence interval needs to be wider to have a better chance of capturing the true difference.
B.47
(a) The population is all American adults. The sample is the 7293 people who were contacted.
(b) We are 95% sure that the proportion of all American adults planning to watch the game was between 57.3% and 59.7%. (c) The center of the confidence interval is (0.573 + 0.597)/2 = 0.585, so we expect the estimated proportion planning to watch Super Bowl 50 was p̂ = 0.585, with a margin of error of 0.012. B.48
(a) We put the 10 data values (time immobile) from the sample on 10 slips of paper. We mix them up, select one, record the value, and put it back. Do this 10 times. The statistic recorded is the mean of the 10 values obtained that way.
(b) A 95% confidence interval is x ± 2 · SE = 135 ± 2(6) = 135 ± 12. We are 95% sure that the mean time immobile for mice who were depressed and then received a shot of ketamine is between 123 and 147 seconds. (c) The value of 160 is outside of the confidence interval from part (b), so is not a plausible mean value for immobile time of mice treated with ketamine. It appears that, on average, ketamine reduces this measure of depression in mice. B.49
(a) The hypotheses are H0 : μ = 160 vs Ha : μ < 160, where μ is the mean score on the forced swim test for depressed mice after treatment with ketamine.
(b) The mean for the original sample of 10 mice is x = 135 seconds and we need to match the null mean of μ = 160 seconds so we add 160 − 135 = 25 seconds to each of the ten data values and write each
new score on a slip of paper. We choose a slip of paper at random (with replacement), write down the value, and continue until we have 10 values. The randomization statistic is the mean of those 10 values.
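The shift-and-resample procedure described in part (b) of B.49 is easy to script. The sketch below uses a hypothetical set of ten immobility times with mean 135 seconds in place of the real data, so the estimated p-value is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=9)

# Hypothetical stand-in for the 10 immobility times (mean 135); the real values are not shown here.
times = np.array([120, 128, 131, 133, 135, 136, 138, 140, 144, 145], dtype=float)

# Shift the sample so its mean matches the null value of 160 seconds.
shifted = times + (160 - times.mean())

# Randomization distribution: resample the shifted values with replacement and record the mean.
rand_means = np.array([
    rng.choice(shifted, size=len(shifted), replace=True).mean()
    for _ in range(1000)
])

# Lower-tail p-value: proportion of randomization means at or below the observed mean of 135.
print("estimated p-value:", np.mean(rand_means <= times.mean()))
```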
B.50
(a) The mean and standard deviation of Internet access rates for all countries are population parameters so the correct notation is μ and σ. Using the (non-missing) data cases in AllCountries, we have μ = 54.5% and σ = 28.4.
(b) The country with the highest Internet access rate is Andorra, with 98.9% of its population having access to the Internet. The lowest rate is Eritrea, at 1.3%. Answers will vary for the percentage in your country, depending on where you live. (c) A sampling distribution with means for 5000 samples of size n = 10 taken from the Internet access rates for the population of all countries is shown below. We see that the distribution is symmetric, bell-shaped, and centered at the population mean of 54.5 percent, as we expect. The standard deviation of these 5000 sample means is 8.7, so we estimate the standard error for mean Internet access rates based on samples of 10 countries to be SE ≈ 8.7.
B.51 When the results of the study are not statistically significant, we fail to reject the null hypothesis (in this case that heavy cell phone use is unrelated to developing brain cancer). But that does not mean we “accept” that H0 must be true; we just lack sufficiently convincing evidence to refute it. (a) By not rejecting H0 we are saying it is a plausible option, so it might be true that heavy cell phone use has no effect on developing brain cancer. (b) There is some evidence in the sample that heavy cell phone users have a higher risk of developing brain cancer, just not enough to be considered statistically significant. Failing to reject H0 means either H0 or Ha might still be true. Note that any confidence interval that includes zero contains both positive and negative values as plausible options. Hence the authors tell us that the question “remains open.” (c) We note that this study was an observational study and not an experiment, and thus, even if the results had been statistically significant, we would not make a cause/effect conclusion about this relationship. However, a lack of significant results does not rule this out as a plausible option. B.52
(a) We are interested in the proportion of babies born with an infection if their mothers are wiped with a treated wipe (pt ) compared to the proportion of babies born with an infection if their mothers are wiped with just a sterile wipe (ps ). The hypotheses are H0 : pt = ps vs Ha : pt < ps .
(b) We use the sample proportions with an infection, p̂t and p̂s , for each treatment, or the difference p̂t − p̂s . (c) If the results are significant we can conclude that treated wipes help reduce the proportion of babies who are born with infections as compared to using sterile wipes. (d) If the results are not significant we can’t make any definitive conclusion about whether or not treated wipes help reduce the rate of infections in babies. B.53
(a) The parameter of interest is ρ, the correlation between score on the mouse grimace scale and pain intensity and duration. Since the study is investigating a positive relationship between these variables, the hypotheses are H0 : ρ = 0 vs Ha : ρ > 0.
(b) Yes. If they conclude there is some relationship, the sample correlation must have been statistically significant (and positive). (c) No. If the original correlation is statistically significant, a sample produced under the null hypothesis of no relationship should rarely give a correlation more extreme than was originally observed. This is what we mean by statistically significant. (d) If the results of the original study were not significant, it would not be very unusual to get a sample correlation that extreme when H0 is true. So it would not be very surprising to see a placebo give a larger correlation. B.54
(a) Finding significant differences on kindergarten tests would mean the p-value is relatively small.
(b) Finding no significant difference on junior high and high school tests would mean the p-value is not small and is relatively large. (c) Finding significant differences again in adulthood indicates the p-values for testing those factors are relatively small. B.55
(a) This is an upper-tail test, so the p-value is the proportion of randomization samples with differences more than the observed D = 0.79. There are 23 dots to the right of 0.79 in the plot, so the p-value is 23/1000 = 0.023.
(b) The randomization distribution depends only on H0 so it would not change for Ha : μs ≠ μn . For a two-tailed alternative, we need to double the proportion in one tail, so the p-value is 2(0.023) = 0.046. B.56 The point estimate from the original sample is p̂ = 0.45. The figure shows the proportions for 10,000 bootstrap samples of size 147,291 (the original sample size) when the proportion is 0.45. We see that the distribution is relatively symmetric and bell-shaped, and we see in the upper-right of the figure that the standard error based on the bootstrap distribution is SE = 0.001. (Looking at the distribution, an estimate of 0.0015 or anything in between would also be reasonable.) The 95% confidence interval is

p̂ ± 2 · SE
0.45 ± 2 · 0.001
0.45 ± 0.002
0.448 to 0.452
For a sample as large as 147,291, the margin of error (0.002 or 0.2%) is quite small. We can be 95% confident that the proportion of all American adults to get health insurance from an employer is between 44.8% and 45.2%. B.57
(a) The sample proportion is 0.57, and this is the best point estimate we have of the population proportion of inaccurate classifications of truthful answers when under stress.
(b) Using the sample proportion and the bootstrap standard error we get an interval estimate for the true proportion using

p̂ ± 2 · SE
0.57 ± 2 · 0.07
0.57 ± 0.14
0.43 to 0.71
(c) The proportion of false reports of lying is extremely large throughout our interval! So results from this lie detector should not hold up in court. B.58 This is a hypothesis test for a single proportion, and we define p to be the proportion of times the lie detector will report lying when a stressed individual is telling the truth. The hypotheses are:

H0 : p = 0.5
Ha : p > 0.5

We create a randomization distribution of sample proportions (shown below) using p = 0.5 and samples of size n = 48. The statistic for the original sample is p̂ = 27/48 = 0.563 and this is a right-tailed test, so the p-value is the proportion of samples with proportions beyond 0.563. For the randomization distribution below this gives a p-value of 0.229 which is not small. We do not reject H0 , and do not find evidence that the software gives inaccurate results more than half the time in this situation.
B.59
(a) Reject H0 . The mean resting metabolic rate is higher in lemurs after treatment with a resveratrol supplement.
(b) Reject H0 . The mean body mass gain is lower in lemurs after treatment with a resveratrol supplement. (c) Reject H0 . The mean food intake in lemurs changes after treatment with a resveratrol supplement. We can’t tell from the information given which way it changes. (d) Do not reject H0 . There is not evidence of a change in mean locomotor activity in lemurs after treatment with a resveratrol supplement.
(e) Strongest evidence (smallest p-value) is in the test for mean body mass gain (p-value=0.007). Weakest evidence (largest p-value) is in the test for locomotor activity (p-value=0.980). (f) Parts (b) and (d) remain the same with a 1% significance level, but parts (a) and (c) would change to “do not reject H0 ” since those p-values are larger than 0.01. Thus the difference in mean metabolic rate and mean food intake would not be significant at a 1% level. (g) We have strong evidence that the mean metabolic rate is lower when treating lemurs with resveratrol, very strong evidence that the mean body mass gain is lower, strong evidence that the mean food intake is different, but no evidence that mean locomotor activity is affected. (h) If the lemurs represent a random sample of all lemurs, then we can generalize the findings based on this small p-value to conclude that the mean body mass gain is lower after four weeks of resveratrol supplements. B.60
(a) This is an experiment since the explanatory factor (cell phone “on” or “off”) was controlled. The design is matched pairs, since all 47 participants were tested under both conditions. For each participant, we find the difference in brain activity between the two conditions.
(b) Randomization in this case means that the order of the conditions (“on” and “off”) was randomized for all the participants. Cell phones were on the ears for both conditions to control for any lurking variables and to make the treatments as similar as possible except for the variable of interest (the radiofrequency waves). (c) Using μon to represent average brain glucose metabolism when the cell phones are on and μoff to represent average brain glucose metabolism when the cell phones are off, the hypotheses are:

H0 : μon = μoff
Ha : μon ≠ μoff

Notice that since this is a matched pairs study, we could also write the hypotheses in terms of the average difference μD between the two conditions, with H0 : μD = 0 vs Ha : μD ≠ 0. (d) Since the p-value is quite small (less than a significance level of 0.01), we reject the null hypothesis. There is significant evidence that brain activity is affected by cell phones. (e) Both of these variables (brain glucose metabolism and amplitude of radiofrequency) are quantitative, so we use a scatterplot to graph the relationship. (f) We are testing to see if the correlation ρ between these two variables is significantly different from zero, so the hypotheses are

H0 : ρ = 0
Ha : ρ ≠ 0
where ρ is the correlation between brain glucose metabolism and amplitude of radiofrequency. (g) This p-value is very small so we reject H0 . There is strong evidence that brain activity is correlated with the amplitude of the radiofrequency waves emitted by the cell phone. B.61 Using StatKey or other technology we create a bootstrap distribution using the original sample with a proportion of 28 autism cases out of 92 siblings (p̂ = 0.304). We find that a 99% confidence interval from this distribution goes from about 0.185 to 0.424. We are 99% sure that the percentage of siblings of children with autism likely to themselves have autism is between 18.5% and 42.4%.
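The bootstrap percentile interval in B.61 can be reproduced with code, since the original sample is fully determined by the counts (28 autism cases among 92 siblings). A minimal sketch; the endpoints will differ slightly from 0.185 and 0.424 because of bootstrap variability.

```python
import numpy as np

rng = np.random.default_rng(seed=13)

# Original sample: 28 autism cases among 92 siblings, coded as 1/0.
sample = np.array([1] * 28 + [0] * (92 - 28))

# Bootstrap proportions from resamples of size 92 taken with replacement.
boot_props = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])

# 99% percentile interval: cut off 0.5% in each tail.
print("99% CI:", np.percentile(boot_props, 0.5), "to", np.percentile(boot_props, 99.5))
```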
B.62
(a) For the original sample, the mean commute distance is 18.16 miles and the standard deviation is 13.8 miles.
(b) One bootstrap distribution of distance means is shown below. It is bell-shaped, centered around 18.2, and shows sample means ranging between about 16.5 and 20.5 miles.
(c) The standard error of the means for this set of 2000 bootstrap samples is 0.61 miles. (d) A 95% confidence interval is given by

18.16 ± 2(0.61)
18.16 ± 1.22
16.94 to 19.38
We are 95% sure that the mean commuting distance for all Atlanta commuters is between 16.94 miles and 19.38 miles. B.63 Using StatKey or other technology, we construct a bootstrap distribution based on the mean commute distances for 5000 samples of size n = 500 taken (with replacement) from the original CommuteAtlanta
distances. For the distribution shown below, the 5%-tile is 17.15 and the 95%-tile is 19.15, so the 90% confidence interval is (17.15, 19.15). We are 90% sure that the average distance to work for all commuters in metropolitan Atlanta is between 17.15 miles and 19.15 miles.
B.64
(a) We use technology to compute the correlation between commute distances and times, r = 0.807, for the 500 data values.
(b) The distribution of bootstrap correlations (shown below) is fairly bell-shaped (perhaps a slight left skew), centered around 0.81, and ranges between about 0.70 and 0.90. (c) The standard deviation of the bootstrap correlations for this bootstrap distribution is 0.0355 so the margin of error is 2 · 0.0355 = 0.071. An interval estimate for the correlation between commute distances and time is 0.807 ± 0.071 or between 0.736 and 0.878. (d) The interval is shown on a dotplot of the bootstrap distribution below. The interval includes roughly 95% of the bootstrap correlations.
B.65
(a) Select a sample of size 500, with replacement, from the original CommuteAtlanta data and compute the correlation between Distance and Time for that sample.
(b) Among one set of 1000 bootstrap correlations shown below, the 0.5%-tile is 0.706 and the 99.5%-tile is 0.876. The 99% confidence interval is (0.706, 0.876). We are 99% sure that the correlation between distance and time of Atlanta commutes is somewhere between 0.706 and 0.876.
(c) For this set of bootstrap correlations the 95% confidence interval goes from 0.729 to 0.867. The 90% confidence interval goes from 0.742 to 0.859. (d) As the confidence level decreases from 99% to 95% to 90%, the confidence intervals get narrower. B.66
(a) A p-value of 0.7619 is not at all small, so the difference in means between the two groups is not significant. Thus there is insufficient evidence to conclude that playing Game 1 helped those students on the exam question about Game 1.
(b) The p-value (0.7619) measures the chance of seeing results so extreme when H0 is true, so we would expect about 762 out of 1000 experiments to be this extreme if there is no effect. (c) If both p-values for the one tail tests are greater than 0.5, it means the differences in the sample means were in the opposite direction of Ha in both cases. So for both questions the students who did not play the game in class actually had the higher mean score on the exam question related to the game. This was a very surprising result! B.67
(a) For the small p-value of 0.0012, we expect about 0.0012 · 1000 = 1.2 or about one time out of every 1000 to be as extreme as the difference observed, if the questions are equally difficult.
(b) The p-value is very small, so seeing this large a difference would be very unusual if the two questions really were equally difficult. Thus we conclude that there is a difference in the average difficulty of the two questions. (c) There is nothing in the information given that indicates which question had the higher mean, so we can’t tell which of the two questions is the easier one. B.68 The results are inconclusive. Playing the game might actually help (and the study had Type II errors), or playing the game might actually hurt exam performance (as the direction of the samples indicates), or the game might have no effect on exam performance at all (as stated in H0 ). B.69
(a) Let ρ denote the correlation between pH and mercury in all Florida lakes. The question of interest is whether or not this correlation is negative, so we use a one-tailed test, with hypotheses H0 : ρ = 0 vs Ha : ρ < 0.
(b) We want to find a point that has roughly 30% of the randomization distribution in the lower tail below it. This should occur somewhere between r = −0.20 and r = −0.10, perhaps r ≈ −0.15. (It is difficult to determine this point very precisely from the plot.) (c) We want to find a point that has only about 1% of the randomization distribution in the lower tail below it. This should occur around r ≈ −0.50. B.70
(a) A Type I error means the restaurant chain concludes that the arsenic level in chickens from a supplier is too high (above 80), when actually the mean for the supplier is not more than 80 ppb. They would cancel their orders from that supplier when they don’t need to.
(b) A Type II error means that the restaurant chain concludes that the arsenic level is acceptable from a supplier, when actually the mean is more than 80 ppb. They would continue to buy chicken from this supplier, even though the mean arsenic level is too high. (c) No. A Type I error can occur by getting a sample that is unusual just by random chance. A Type II error can occur when the mean arsenic level is more than 80, but the sample size is too small or the sample has too much variability to rule out random chance as a possible explanation for the difference. B.71
(a) The hypotheses are H0 : p1 = p2 vs Ha : p1 > p2 , where p1 and p2 are the proportion with reduced pain when using cannabis and a placebo, respectively.
(b) The sample statistics are p̂1 = 14/27 = 0.519 and p̂2 = 7/28 = 0.250. Since p̂1 > p̂2 , the sample statistics are in the direction of Ha . (c) If the FDA requires very strong evidence to reject H0 , they should choose a small significance level, such as α = 0.01. (d) In this situation, under a null hypothesis that H0 : p1 = p2 , pain response would be the same whether in the cannabis or placebo group. The randomization distribution for p̂1 − p̂2 should be centered at zero. (e) We draw a bell-shaped curve, centered at 0, and roughly locate the original sample statistic, p̂1 − p̂2 = 0.519 − 0.250 = 0.269, so that the area in the right tail is only about 0.02.
(f) A p-value as small as 0.02 gives fairly strong evidence to reject H0 : p1 = p2 . (g) If the FDA uses a small α = 0.01, the p-value is not less than α, so we do not reject H0 . Although the sample results are suggestive of a benefit to using cannabis for pain reduction, they are not sufficiently strong to conclude (at a 1% significance level) that the proportion of patients having reduced pain after using cannabis is more than the proportion who are helped by a placebo.
B.72 Using StatKey or other technology we create randomization samples by randomly assigning the actual sample results to cannabis and placebo groups and finding the difference in proportions for pain reduced, p̂1 − p̂2 . For the randomization distribution shown below, we see that 25 of the 1000 samples had p̂1 − p̂2 ≥ 0.269, giving a p-value of 0.025.
This is close to the p-value of 0.02 from the previous exercise and the p-value will vary for different sets of randomizations. We have moderately strong evidence that cannabis is better than a placebo at pain reduction. However, if the significance level is the strict α = 0.01 (as in the previous exercise), we would still not have sufficient evidence to reject H0 . The conclusion remains the same — the results are not extreme enough to convince the FDA that using cannabis is better than using a placebo for pain relief in HIV patients. B.73
(a) In a Type I error, we conclude that treated wipes prevent infection, when actually they don’t.
(b) In a Type II error, we conclude that treated wipes are not shown to be effective, when actually they help prevent infections. (c) A smaller significance level means we need more evidence to reject H0 . We would want a smaller significance level in the second situation (harmful side effects) so it has to be very clear that the treated wipes help prevent infection, since we don’t want to put people at risk for side effects if the benefit isn’t definite. (d) The p-value (0.32) is not small, so we do not reject H0 . The study does not provide sufficient evidence to show that treated wipes are more effective at reducing the proportion of infected babies than sterile wipes. (e) Not necessarily. The results of the test are inconclusive when the p-value is not small. Either H0 or Ha could still be valid, so the treated wipes might help prevent infections and the study just didn’t accumulate enough evidence to verify it. B.74
(a) The null hypothesis assumes that pH values and mercury levels are not related, ρ = 0.
(b) The randomization distribution should be centered at zero, the null value for the correlation. (c) Take 53 index cards and write down the pH value for each lake in the sample on a different card. Shuffle the cards and deal them out, assigning the pH values to the Florida lakes in a random order. (We could also have done this in the opposite way: write the mercury levels on the cards and assign them at random to pH values. Either way works.) Compute the sample correlation between the randomly assigned pH values and the actual mercury readings for the lakes.
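The card-shuffling procedure in part (c) translates directly into code: scramble one variable, recompute r, and repeat. The sketch below uses hypothetical pH and mercury values (the FloridaLakes data are not reproduced here), so the observed correlation and estimated p-value are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=17)

# Hypothetical stand-ins for the 53 lakes' pH and mercury values.
ph = rng.uniform(3.5, 9.5, size=53)
mercury = 1.5 - 0.12 * ph + rng.normal(0, 0.3, size=53)

obs_r = np.corrcoef(ph, mercury)[0, 1]   # observed sample correlation

# Randomization distribution: scramble the pH values and recompute the correlation.
rand_rs = np.array([
    np.corrcoef(rng.permutation(ph), mercury)[0, 1]
    for _ in range(1000)
])

# Lower-tail p-value: proportion of randomization correlations at or below the observed r.
print("estimated p-value:", np.mean(rand_rs <= obs_r))
```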
B.75 We use technology to create a randomization distribution (such as shown below) and then check to see how extreme the original sample statistic of r = −0.575 is in that distribution. None of the randomization statistics in this distribution are beyond (less than since this is a lower-tail test) the observed r = −0.575. Thus the p-value ≈ 0.000. Based on this very small p-value, we have strong evidence to reject H0 and conclude that there is negative correlation between pH levels and the amount of mercury in fish of Florida lakes. More acidic lakes tend to have higher mercury levels in the fish.
B.76
(a) The hypotheses are H0 : ρ = 0 vs Ha : ρ > 0, where ρ is the correlation between heart rate and systolic blood pressure for all 55-year-old ICU patients.
(b) We assume that the null hypothesis is true, which in this case means that heart rates are unrelated to systolic blood pressure. (c) We record the sample correlation for each randomization sample. For the original sample, we have r = 0.279. (d) Since the null hypothesis is H0 : ρ = 0, the randomization correlations should be centered at zero. (e) Randomly assign the 8 blood pressure values to the 8 heart rates. Compute the sample correlation for each random assignment. (f) Answers vary. One randomization sample (out of many possibilities) is shown below, which has r = 0.199.

Heart Rate     86    86    92   100   112   116   136   140
Systolic BP   188   122   138   110   140   128   132   190
(g) Answers vary. Here’s a second scrambling of the blood pressures assigned to the heart rates which gives a correlation of r = −0.443.

Heart Rate     86    86    92   100   112   116   136   140
Systolic BP   190   140   122   188   110   138   132   128
B.77 For one set of 1000 randomization samples (shown below), 242 of the sample correlations are more than the observed r = 0.279 which gives a p-value of 0.242. This p-value is not smaller than any reasonable significance level, so we do not have sufficient evidence to reject H0 . Based on this sample of 8 patients, we cannot conclude that there must be a positive association between heart rate and systolic blood pressure for 55-year-old patients at this ICU.
B.78 The null hypothesis for this difference in proportions test is that the proportions are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means gender doesn’t matter in smoking outcomes, so one way to match this is to randomly scramble the yes/no responses from the original sample to the smoking question and assign them to the original subjects. Compute the difference in the proportion of smokers between the two genders in this simulated sample to get the randomization statistic. Other methods, for example sampling with replacement from the pooled data to simulate new samples of male and female responses, are also acceptable. B.79 The null hypothesis for this correlation test is that the correlation is zero. To create the randomization samples, we match the null hypothesis. In this situation, that means height and salary are completely unrelated. We might randomly scramble the height values and assign them to the original subjects/salaries. Compute the correlation, r, between those random heights and the actual salaries. B.80 The null hypothesis for this test for a single proportion is that the population proportion is 0.20. To create the randomization samples, we match the null hypothesis. In this situation, that means we might randomly sample (with replacement) from a set that has 2 “yes” and 8 “no” values, where “yes” represents a person who watches the Home Shopping Network. Use the same sample size as the original sample and compute p̂, the proportion of “yes” responses in the simulated sample. B.81 The null hypothesis for this difference in means test is that the means of the two groups are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means whether or not a customer is approached has no effect on sales. We might randomly scramble the labels for type of store (“approach” or “not approach”) and assign them to the actual sales values. Compute the difference in the mean sales, xa − xna , between the stores assigned to the “approach” group and those randomly put in the “not approach” group. Other methods, for example sampling with replacement from the pooled sales values to simulate new samples of ”approach” and ”not approach” sales, are also acceptable. B.82 The null hypothesis for this difference in means test is that the means of the two groups are the same. To create the randomization samples, we match the null hypothesis. In this situation, that means it doesn’t matter in studying time whether the person is a first-year student or an upperclass student. We might randomly scramble the labels for type of student (“FY” or “Upper”) and assign them to the actual study time values. Compute the difference in the mean study time, xf y − xu , between the students assigned to the “first-year” group and those randomly put in the “upperclass” group. Other methods, for example shifting
the two original samples to a common mean and sampling (with replacement) from the respective shifted values to form two new samples, are also acceptable.
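The label-scrambling approach described in B.81 and B.82 can be scripted as follows; the two study-time samples here are hypothetical stand-ins for the first-year and upperclass data, so the numbers are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=19)

# Hypothetical weekly study times (hours); the real data are not shown here.
first_year = np.array([12, 15, 10, 18, 14, 20, 9, 16], dtype=float)
upper = np.array([11, 13, 8, 15, 10, 12, 14, 9], dtype=float)

obs_diff = first_year.mean() - upper.mean()
pooled = np.concatenate([first_year, upper])
n_fy = len(first_year)

# Randomization distribution: scramble the group labels and recompute the difference in means.
rand_diffs = []
for _ in range(1000):
    shuffled = rng.permutation(pooled)
    rand_diffs.append(shuffled[:n_fy].mean() - shuffled[n_fy:].mean())
rand_diffs = np.array(rand_diffs)

print("observed difference:", obs_diff)
print("proportion of randomizations at least this extreme (one tail):",
      np.mean(rand_diffs >= obs_diff))
```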
Unit C: Essential Synthesis Solutions C.1 We are estimating mean amount spent in the store, so we use a confidence interval for a mean. C.2 We are testing for a difference in mean time spent waiting in line, so we use a hypothesis test for a difference in means. C.3 We are testing for a difference in the proportion classified as insulin-resistant between high sugar and normal diets, so we use a hypothesis test for a difference in proportions. C.4 We are estimating the proportion who make a purchase, so we use a confidence interval for a proportion. C.5 We are estimating the difference in the mean financial aid package between two groups of students, so we use a confidence interval for a difference in means. C.6 We are estimating a difference in mean muscle mass, using before and after paired data. We use a confidence interval for matched pairs difference in means. C.7 We are testing whether the proportion of left-handers is different from 10%, so we use a hypothesis test for a proportion. C.8 We are estimating the difference in proportion trying to lose weight between males and females, so we use a confidence interval for a difference in proportions. C.9 The z-test statistic can be interpreted as a z-score so this test statistic is more than 5 standard deviations out in the tail. This will have very little area beyond it (so the p-value is very small) and will provide strong evidence for the alternative hypothesis (so we reject H0 ). (a) Small (b) Reject H0 C.10 The z-test statistic can be interpreted as a z-score so this test statistic is more than 8 standard deviations out in the tail. This will have very little area beyond it (so the p-value is very small) and will provide strong evidence for the alternative hypothesis (so we reject H0 ). (a) Small (b) Reject H0 C.11 The z-test statistic can be interpreted as a z-score so this test statistic is less than 1 standard deviation from the mean. This will have quite a large area beyond it (so the p-value is relatively large) and will not provide much evidence for the alternative hypothesis (so we do not reject H0 ). (a) Large (b) Do not reject H0 C.12 For relatively large sample size, the t-distribution is very similar to the standard normal distribution, so the t-test statistic is similar to a z-score. This test statistic is approximately 12 standard deviations out in the tail! This will have very little area beyond it (so the p-value is very small) and will provide strong evidence for the alternative hypothesis (so we reject H0 ). (a) Small
(b) Reject H0 C.13 For relatively large sample size, the t-distribution is very similar to the standard normal distribution, so the t-test statistic is similar to a z-score. This test statistic is approximately 7 standard deviations out in the tail! This will have very little area beyond it (so the p-value is very small) and will provide strong evidence for the alternative hypothesis (so we reject H0 ). (a) Small (b) Reject H0 C.14 For relatively large sample size, the t-distribution is very similar to the standard normal distribution, so the t-test statistic is similar to a z-score. This test statistic is not even 1 standard deviation from the mean, so it will have quite a large area beyond it (so the p-value is relatively large) and will not provide much evidence for the alternative hypothesis (so we do not reject H0 ). (a) Large (b) Do not reject H0 C.15 It is best to begin by adding the totals to the two-way table.

          Yes    No   Total
A          21    39      60
B          15    50      65
C          15    17      32
Total      51   106     157
If we let p be the proportion of bills by Server B, we are testing H0 : p = 1/3 vs Ha : p > 1/3. We have p̂ = 65/157 = 0.414. The sample size is large enough to use the normal distribution. We have

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.414 − 1/3)/√((1/3)(2/3)/157) = 2.14
This is an upper-tail test so the p-value is the area above 2.14 in a standard normal distribution, so we see that the p-value is 0.0162. At a 5% significance level, we reject H0 and conclude that there is evidence that Server B is responsible for more than 1/3 of the bills at this restaurant. The results, however, are not strong enough to be significant at a 1% level. C.16 We see in the table that 51 of the 157 bills were paid with a credit or debit card, so we have p̂ = 51/157 = 0.325. For a 95% confidence interval, we use z* = 1.96. We have:

p̂ ± z* · √(p̂(1 − p̂)/n)
0.325 ± 1.96 · √(0.325(0.675)/157)
0.325 ± 0.073
0.252 to 0.398

We are 95% confident that the percent of bills paid with a credit or debit card at this restaurant is between 25.2% and 39.8%.
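The same interval can be computed directly from the formula with a short script:

```python
from math import sqrt
from scipy.stats import norm

p_hat = 51 / 157
n = 157
z_star = norm.ppf(0.975)   # about 1.96 for 95% confidence

se = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se
print("95% CI: {:.3f} to {:.3f}".format(p_hat - margin, p_hat + margin))
```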
C.17 This is a test for a difference in proportions. We are testing H0 : pB = pC vs Ha : pB ≠ pC , where pB and pC are the proportions of bills paid with cash for Server B and Server C, respectively. We see from the table that p̂B = 50/65 = 0.769 and p̂C = 17/32 = 0.531. The sample sizes are large enough to use the normal distribution, and we compute the pooled proportion as p̂ = 67/97 = 0.691. We have

z = ((p̂B − p̂C) − 0)/√(p̂(1 − p̂)(1/nB + 1/nC)) = (0.769 − 0.531)/√(0.691(0.309)(1/65 + 1/32)) = 2.39
This is a two-tail test, so the p-value is twice the area above 2.39 in a standard normal distribution. We see that p-value = 2(0.0084) = 0.0168. At a 5% significance level, we reject H0 and conclude that there is evidence that the proportion paying with cash is not the same between Server B and Server C. Server B appears to have a greater proportion of customers paying with cash. The results, however, are not strong enough to be significant at a 1% level. C.18 Paying with a credit or debit card has a larger average tip percentage in the sample (xY = 17.10 > xN = 16.39) and paying with cash has greater variability (sY = 2.47 < sN = 5.05). To determine if there is evidence of a difference in mean tip percentage depending on the method of payment, we do a two-sample t-test for a difference in means. We are testing H0 : μY = μN vs Ha : μY ≠ μN , where μY represents the mean tip percent when paying with a credit or debit card and μN represents the mean tip percent when paying with cash. The sample sizes are large enough to use the t-distribution. We have

t = ((xY − xN) − 0)/√(sY²/nY + sN²/nN) = (17.10 − 16.39)/√(2.47²/51 + 5.05²/106) = 1.18
This is a two-tail test, so the p-value is twice the area above 1.18 in a t-distribution with 51 − 1 = 50 df. We see that p-value = 2(0.122) = 0.244. At any reasonable significance level, we do not reject H0 . There is no convincing evidence that the mean tip percentage is different depending on whether the customer pays with cash or a credit/debit card. C.19 In the sample, the bill is larger when paying with a credit or debit card (xY = 29.4 > xN = 19.5) and there is more variability with a card (sY = 14.5 > sN = 9.4). To determine if there is evidence of a difference in the mean bill depending on the method of payment, we do a test for a difference in means. We are testing H0 : μY = μN vs Ha : μY ≠ μN , where μY represents the mean bill amount when paying with a credit or debit card and μN represents the mean bill amount when paying with cash. The sample sizes are large enough to use the t-distribution, even if the underlying bills are not normally distributed. We have

t = ((xY − xN) − 0)/√(sY²/nY + sN²/nN) = (29.4 − 19.5)/√(14.5²/51 + 9.4²/106) = 4.45
This is a two-tail test, so the p-value is twice the area above 4.45 in a t-distribution with 51 − 1 = 50 df. This area is essentially zero, so we have p-value ≈ 0. At any reasonable significance level, we find strong evidence to reject H0 . There is strong evidence that the mean bill amounts are not the same between the two payment methods. Customers with higher bills are more likely to pay with a credit/debit card.
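The two-sample t statistics in C.18 and C.19 use only the summary statistics, so they are easy to verify with a short script. The sketch below reproduces the C.19 calculation, using the conservative df = min(n1, n2) − 1 as in the text.

```python
from math import sqrt
from scipy.stats import t

# Summary statistics from the sample (card vs. cash bill amounts).
x_card, s_card, n_card = 29.4, 14.5, 51
x_cash, s_cash, n_cash = 19.5, 9.4, 106

se = sqrt(s_card**2 / n_card + s_cash**2 / n_cash)
t_stat = (x_card - x_cash) / se

df = min(n_card, n_cash) - 1         # conservative df used in the text
p_value = 2 * t.sf(abs(t_stat), df)  # two-tailed p-value
print("t =", round(t_stat, 2), " p-value =", round(p_value, 5))
```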
C.20
(a) This is a difference in means test, and the hypotheses are H0 : μD = μS vs Ha : μD > μS , where μD represents the mean change in pain threshold for those striking a dominant pose and μS represents the mean change for those striking a submissive pose. We have

t = ((xD − xS) − 0)/√(sD²/nD + sS²/nS) = (14.3 − (−6.1))/√(39.8²/45 + 40.4²/44) = 2.40

This is an upper-tail test, so the p-value is the area above 2.40 in a t-distribution with df = 43. Using technology, we see that the p-value is 0.01. At a 5% significance level, we reject H0 and find evidence that striking a dominant pose improves one’s mean tolerance to pain over striking a submissive pose. (b) This is a difference in means test, and the hypotheses are H0 : μD = μS vs Ha : μD < μS , where μD represents the mean change in pain threshold for those whose partner is dominant and μS represents the mean change for those whose partner is submissive. We are told that there are no extreme outliers so we can use the t-distribution. We have

t = ((xD − xS) − 0)/√(sD²/nD + sS²/nS) = (−13.8 − 4.2)/√(27.1²/15 + 22.9²/15) = −1.96
This is a lower-tail test, so the p-value is the area below −1.96 in a t-distribution with df = 14. Using technology, we see that the p-value is 0.035. At a 5% significance level, we reject H0 and find evidence that a partner’s attitude influences one’s perception of pain. In particular, having a dominant partner (which makes one feel submissive) reduces pain tolerance (and increases one’s perception of pain) compared to having a submissive partner (which makes one feel more dominant). Notice that if you switched the order of the differences and in the null and alternative hypotheses, this would be an upper tail test but the p-value and result are exactly the same. This result reinforces that of part (a), in that the more dominant one feels, the easier it is to tolerate pain. (c) We find a 90% confidence interval for this difference in means. The degrees of freedom are 14, so we have t∗ = 1.76. The confidence interval is given by:
(xD − xS) ± t* · √(sD²/nD + sS²/nS)
(−45.3 − (−6.8)) ± 1.76 · √(45.6²/15 + 31.0²/15)
−38.5 ± 25.1
−63.6 to −13.4
We are 90% confident that people who feel submissive toward a dominant peer will have a mean decrease in handgrip strength between 63.6 and 13.4 newtons more than people who feel dominant over a submissive peer. Since zero (or no difference) is not included in the confidence interval, we have reason to believe that a hypothesis test for a difference in means will find evidence of a difference between the two groups. (d) These experiments reinforce that patients who feel more control over a situation have a better tolerance for pain, and that being around people who act submissive rather than dominant helps patients feel as if they have more control. This would argue for health care professionals acting more submissive and giving patients more control when possible.
C.21 None of the data appears to be severely skewed or with significant outliers, so it is fine to use the t-distribution. We use technology and the data in MentalMuscle to obtain the means and standard deviations for the times in each group as summarized in the table below.

Variable  Action  Fatigue   N   Mean   StDev
Time      Mental  Pre       8   7.34   1.22
Time      Mental  Post      8   6.10   0.95
Time      Actual  Pre       8   7.16   0.70
Time      Actual  Post      8   8.04   1.07
(a) To compare the mean mental and actual times before fatigue we do a two-sample t-test for difference in means. Letting μM represent the mean time for someone mentally imaging the actions before any muscle fatigue and μA represent the mean time for someone actually performing the actions before any muscle fatigue, our hypotheses are:

H0 : μM = μA
Ha : μM ≠ μA

Using the pre-fatigue data values in each case, the relevant summary statistics are xM = 7.34 with sM = 1.22 and nM = 8 for the mental pre-fatigue group and xA = 7.16 with sA = 0.70 and nA = 8 for the actual pre-fatigue group. The test statistic is

t = (xM − xA)/√(sM²/nM + sA²/nA) = (7.34 − 7.16)/√(1.22²/8 + 0.70²/8) = 0.36

This is a two-tail test, so the p-value is twice the area above 0.36 in a t-distribution with df = 7. We see that the p-value is 2(0.365) = 0.73. This is a very large p-value, so we do not reject H0 . There is no convincing evidence at all of a difference between the two groups before muscle fatigue. (b) For each action, the same 8 people perform the movements twice, once before muscle fatigue and once after. To compare the pre-fatigue and post-fatigue means for those actually doing the movements we use a paired difference in means test. We compute the differences, D = PostFatigue − PreFatigue, using the data for the people doing the actual movements and see that the 8 differences are:

2.5, 0, 0.7, 0.5, 0.2, 0.6, −0.3, 2.8

The summary statistics for the differences are xD = 0.88 with sD = 1.15 and nD = 8. The hypotheses are:

H0 : μD = 0
Ha : μD > 0

Notice that if we had subtracted the other direction (PreFatigue − PostFatigue), the alternative would be in the other direction and all the differences would have the opposite sign, but the results would be identical. The test statistic is

t = xD/(sD/√nD) = 0.88/(1.15/√8) = 2.16

This is an upper-tail test, so the p-value is the area above 2.16 in a t-distribution with df = 7. Using technology we see that the p-value is 0.0338. At a 5% significance level, we reject H0 and conclude that people are slower, on average, at performing physical motions when they have muscle fatigue. This is not surprising; we slow down when we are tired!
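The paired test in part (b) is just a one-sample t-test on the eight differences, so it can be checked with a few lines of code (the alternative argument requires a reasonably recent version of SciPy):

```python
import numpy as np
from scipy.stats import ttest_1samp

# The eight Post - Pre differences for the actual-movement group (from the text).
diffs = np.array([2.5, 0, 0.7, 0.5, 0.2, 0.6, -0.3, 2.8])

# A paired test is a one-sample t-test on the differences; upper-tailed alternative.
result = ttest_1samp(diffs, popmean=0, alternative="greater")
print("t =", round(result.statistic, 2), " p-value =", round(result.pvalue, 4))
```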
(c) As in part (b), the same 8 people perform the movements twice, once before muscle fatigue and once after. To compare the pre-fatigue and post-fatigue means for those doing mental movements we use a paired difference in means test. We compute the differences, D = PostFatigue − PreFatigue, using the data for the people doing the mental imaging and see that the 8 differences are:

1.5, −3.9, −1.5, −1.1, −1.1, −0.2, −1.5, −2.1

The summary statistics for the differences are xD = −1.24 with sD = 1.54 and nD = 8. The hypotheses are:

H0 : μD = 0
Ha : μD < 0

Notice that if we had subtracted the other direction (PreFatigue − PostFatigue), the alternative would be in the other direction and all the differences would have the opposite sign, but the results would be identical. The test statistic is

t = xD/(sD/√nD) = −1.24/(1.54/√8) = −2.28
This is a lower-tail test, so the p-value is the area below −2.28 in a t-distribution with df = 7. Using technology we see that the p-value is 0.0283. At a 5% significance level, we reject H0 and conclude that people are faster, on average, at mentally imaging physical motions when they have muscle fatigue. This is the new finding from the study: When people have muscle fatigue, they speed up their mental imaging, presumably because they just want to get done! It is likely that this makes the mental imagery less effective. (d) There are two separate groups (Mental vs Actual) so we do a difference in means test with two groups. Letting μM represent the mean time for someone mentally imaging the actions after muscle fatigue and μA represent the mean time for someone actually performing the actions after muscle fatigue, our hypotheses are:

H0 : μM = μA
Ha : μM ≠ μA

Using the post-fatigue data values in each case, the relevant summary statistics are xM = 6.10 with sM = 0.95 and nM = 8 for the mental post-fatigue group and xA = 8.04 with sA = 1.07 and nA = 8 for the actual post-fatigue group. The test statistic is

t = (xM − xA)/√(sM²/nM + sA²/nA) = (6.10 − 8.04)/√(0.95²/8 + 1.07²/8) = −3.83

This is a two-tail test, so the p-value is twice the area below −3.83 in a t-distribution with df = 7. We see that the p-value is 2(0.0032) = 0.0064. This is a very small p-value, so we reject H0 . There is strong evidence of a difference in the mean times between the two groups after muscle fatigue. (e) Before muscle fatigue, the group mentally imaging the actions was remarkably similar in time to those actually doing the actions. The mental imaging was quite accurate. However, muscle fatigue caused those actually doing the motions to slow down while it caused those mentally imaging the motions to speed up. Taken together, there was a significant difference between the two groups after experiencing muscle fatigue so that the mental imaging of the motions was not as accurate at matching the actual motions.
C.22
(a) We see from the output that the sample proportion who smoke is p̂ = 0.118785. We use a hypothesis test to see if the proportion of all students at this university who smoke is different from 0.2. The hypotheses are H0: p = 0.2 vs Ha: p ≠ 0.2. We see from the output that the p-value is 0.000, so we reject H0 and find strong evidence that the percent of students at this university who smoke is different from 20%.
(b) We use a hypothesis test to see if the average math SAT score of students at this university is greater than 600. The hypotheses are H0: μ = 600 vs Ha: μ > 600. We see from the output that the z-test statistic is 3.11 and the p-value is 0.005, so at most significance levels we reject H0 and find relatively strong evidence that the mean math SAT score for students at this university is greater than 600.

(c) We see in the output that the proportion of females in the sample with a higher verbal SAT score than math SAT score is 0.509091, while the corresponding proportion for males is 0.347368. We use a hypothesis test to determine whether the proportion for whom verbal is higher is different between males and females for all students at this university. The hypotheses are H0: pF = pM vs Ha: pF ≠ pM, and we see from the output that the p-value is 0.002. We reject H0 and find evidence that the proportions are not the same and, in fact, that a higher proportion of female students than male students do better on the verbal SAT than on the math SAT. A 95% confidence interval for the difference in proportions in the output is (0.0597323, 0.263713), so we are 95% confident that the proportion of female students who have a higher verbal score is between 0.060 and 0.264 more than the proportion of male students who have a higher verbal score.

(d) In the sample, smokers have a higher average pulse rate (71.8) than non-smokers (69.3). We use a hypothesis test to see if there is evidence of a difference in mean pulse rate between smokers and non-smokers for all students at this university. The hypotheses are H0: μN = μS vs Ha: μN ≠ μS. We see in the output that the p-value is 0.188, so we do not reject H0. We do not find evidence of a difference in average pulse rate between smokers and non-smokers.

(e) In the sample, we see in the output that non-smokers have a higher mean GPA (3.173) than smokers (3.054). We use a hypothesis test to see if there is evidence of a difference in mean GPA between smokers and non-smokers for all students at this university. The hypotheses are H0: μN = μS vs Ha: μN ≠ μS, and we see in the output that the p-value is 0.061. There is evidence of a difference in mean GPA between smokers and non-smokers at the 10% level, but not at the 5% level or the 1% level.

C.23 If μ is the mean number of free throws attempted in games by the Warriors, we test H0: μ = 25.0 vs Ha: μ ≠ 25.0. For the sample of 82 games in 2018–2019 in GSWarriors2019 we find the mean number of free throw attempts by the Warriors is x = 20.39 with a standard deviation of 6.93. The t-statistic is
t = (x − μ0)/(s/√n) = (20.39 − 25.0)/(6.93/√82) = −6.03
Even after doubling the area beyond −6.03 in a t-distribution with 82 − 1 = 81 degrees of freedom, the p-value is very close to zero. We have strong evidence that the Golden State Warriors average fewer free throw attempts per game than is typical for NBA teams.

C.24 If p is the proportion of free throws the Golden State Warriors make, we test H0: p = 0.756 vs Ha: p ≠ 0.756. The sample proportion for the Warriors in 2018–2019 is p̂ = 1339/1672 = 0.801. The standardized z-statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.801 − 0.756)/√(0.756(1 − 0.756)/1672) = 4.28

We find the p-value by doubling the area beyond 4.28 in a standard normal distribution, p-value = 2(0.000009) = 0.000018. (We can also find the p-value by doing the test directly using technology.) This is a very small p-value, so we have strong evidence that the proportion of free throws made by the Golden State Warriors is higher than the proportion for the league as a whole.
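A brief Python sketch (our own check, with scipy assumed) confirms the C.24 one-proportion z-statistic and its two-tail p-value.

```python
# One-proportion z-test for C.24; assumes scipy is installed.
from math import sqrt
from scipy import stats

p_hat, p0, n = 1339 / 1672, 0.756, 1672
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * stats.norm.sf(abs(z))      # two-tail p-value

print(round(z, 2), round(p_value, 6))    # approximately 4.28 and 0.00002
```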
C.25 This question is asking for a test to compare two proportions, pH and pA, the proportion of free throws the Warriors make in home and away games, respectively. The question also suggests a particular direction so the hypotheses are H0: pH = pA vs Ha: pH > pA. Based on the sample results we get the proportions below for each location and the combined data.

p̂H = 664/827 = 0.803    p̂A = 675/845 = 0.799    p̂ = 1339/1672 = 0.801

The standardized test statistic is

z = (0.803 − 0.799)/√(0.801(1 − 0.801)/827 + 0.801(1 − 0.801)/845) = 0.004/0.020 = 0.2
Since the sample sizes are large we find a p-value using the area in a N(0,1) distribution that lies above z = 0.2. This gives p-value = 0.421, which is not small at all, so we do not have sufficient evidence to conclude that the Warriors make a higher proportion of their free throws at home than they do on the road.

C.26 This question is asking about an estimate and confidence interval for the difference in means, μH − μA, where now μ is the mean number of free throw attempts in a game. Using the data in FTA we find the following summary statistics

Variable   Location   Count   Mean    StDev
FTA        Away       41      20.17   7.86
FTA        Home       41      20.61   5.95
The sample sizes are both more than 30 and the FTA amounts have no outliers, so we use a t-distribution with 41 − 1 = 40 degrees of freedom to find t∗ = 2.021 for 95% confidence. We compute the confidence interval using

(xH − xA) ± t∗·√(sH²/nH + sA²/nA)
(20.61 − 20.17) ± 2.021·√(5.95²/41 + 7.86²/41)
0.44 ± 3.11
−2.67 to 3.55
Based on these results we are 95% sure that the Warriors average between 2.67 fewer and 3.55 more free throw attempts (per game) when playing games at home versus games on the road. C.27
(a) The hypotheses are

H0: μ = 72 vs Ha: μ ≠ 72
where μ represents the average heart rate of all patients admitted to this ICU. Using technology, we see that the average heart rate for the sample of patients is x = 98.92 and we see that the p-value for the test is essentially zero (with a t-statistic of 14.2). There is very strong evidence that the average heart rate of ICU patients is not 72.
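For part (a), "using technology" to get the p-value amounts to finding the two-tail area beyond the reported t-statistic; a hedged one-line Python check (scipy assumed) is shown below.

```python
# Two-tail p-value from the reported t-statistic in part (a); assumes scipy is installed.
from scipy import stats

t_stat, n = 14.2, 200
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # area beyond 14.2 in both tails

print(p_value)   # essentially zero
```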
(b) Using technology, we see that 40 of the 200 patients died, so p̂ = 40/200 = 0.2. Using technology, we see that a 95% confidence interval for the proportion who die is (0.147, 0.262). We are 95% confident that between 14.7% and 26.2% of the ICU patients die at this hospital.

(c) We see that there were 124 females (62%) and 76 males (38%), so more females were admitted to the ICU in this sample. To test whether genders are equally split, we do a one-sample proportion test. It doesn't matter whether we test for the proportion of males or the proportion of females, since if we know one, we can compute the other, and in either case we are testing whether the proportion is significantly different from 0.5. Using p to denote the proportion of ICU patients that are female, we test

H0: p = 0.5 vs Ha: p ≠ 0.5
Using technology, we see that the p-value for this test is 0.001. This provides very strong evidence that patients are not evenly split between the genders. There are significantly more females than males admitted to this ICU unit.

(d) This is a difference in means test. If we let μM represent the mean age of male ICU patients and μF represent the mean age of female ICU patients, the hypotheses are

H0: μM = μF vs Ha: μM ≠ μF
Using technology, we see that the p-value is 0.184. We do not reject H0 and do not find convincing evidence that mean age differs between males and females.

(e) This is a difference in proportions test. If we let pM represent the proportion of males who die and pF represent the proportion of females who die, the hypotheses are

H0: pM = pF vs Ha: pM ≠ pF
Using technology, we see that the p-value is 0.772. We do not reject H0 and do not find convincing evidence that the proportion who die differs between males and females. C.28
(a) We use technology to see that there are 43 current smokers and p̂ = 0.137. Using technology, we find that the 95% confidence interval is (0.101, 0.179). We are 95% confident that the proportion of people who smoke is between 0.101 and 0.179.
(b) Using the Fiber variable and technology, we see that a 99% confidence interval is 12.01 to 13.57. We are 99% confident that the average number of grams of fiber that people eat in a day is between 12.01 and 13.57 grams. The best estimate is x = 12.79 grams with margin of error 0.778.

(c) Using the Fat variable and technology, we see that a 90% confidence interval is 73.89 to 80.18. We are 90% confident that the average amount of fat that people eat in a day is between 73.89 and 80.18 grams.

(d) Using technology, we see that 7 of the 42 males are current smokers and 36 of the 273 females are current smokers. The hypotheses for a difference in proportions test, using pM for the proportion of males who are current smokers and pF for the proportion of females that are current smokers, are

H0: pM = pF vs Ha: pM ≠ pF
Using technology, we see that the p-value for this test is 0.569. We do not reject H0 and do not find convincing evidence of a difference in smoking rates between males and females.

(e) The hypotheses for this difference in means test, using μM for the mean cholesterol level of males and μF for the mean cholesterol level of females, are

H0: μM = μF vs Ha: μM ≠ μF
Using technology, we see that the test statistic is 4.17 and the p-value ≈ 0. We reject H0 and find strong evidence of a difference in mean cholesterol levels between males and females. Looking more carefully at the data, we see that xM = 328 while xF = 229, so males have significantly higher mean cholesterol than females.

(f) The hypotheses for this difference in means test, using μS for the mean beta carotene level of smokers and μN for the mean beta carotene level of non-smokers, are

H0: μS = μN vs Ha: μS ≠ μN
Using technology, we see that the test statistic is t = 4.74 and the p-value is ≈ 0. We reject H0 and find strong evidence of a difference in beta carotene levels between smokers and non-smokers. Looking more carefully at the data, we see that xS = 121.3 while xN = 201, so non-smokers have significantly higher mean levels of beta carotene in the blood than smokers.
Unit C: Review Exercise Solutions
C.29 Using technology on a calculator or computer, we see that (a) The area below z = −2.10 is 0.018. (b) The area above z = 1.25 is 0.106.

C.30 Using technology on a calculator or computer, we see that (a) The area below z = 1.68 is 0.954. (b) The area above z = 2.60 is 0.005.

C.31 Using technology on a calculator or computer, we see that the endpoint z is (a) z = 0.253 (b) z = −2.054

C.32 Using technology on a calculator or computer, we see that the endpoint z is (a) z = −0.674 (b) z = 1.405

C.33 We use a t-distribution with df = 24. Using technology, we see that the values with 5% beyond them in each tail are ±1.711.

C.34 We use a t-distribution with df = 11. Using technology, we see that the values with 1% beyond them in each tail are ±2.718.

C.35 We use a t-distribution with df = 9. Using technology, we see that the area above 2.75 is 0.011.

C.36 We use a t-distribution with df = 23. Using technology, we see that the area below −1.50 is 0.074.
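The "technology" lookups in C.29–C.36 can be reproduced with standard distribution functions; a short Python sketch (ours, with scipy assumed) is below.

```python
# Normal and t-distribution lookups matching several of C.29-C.36; assumes scipy is installed.
from scipy import stats

print(round(stats.norm.cdf(-2.10), 3))     # C.29(a): area below -2.10 -> 0.018
print(round(stats.norm.sf(1.25), 3))       # C.29(b): area above 1.25 -> 0.106
print(round(stats.t.ppf(0.95, df=24), 3))  # C.33: value with 5% above it, df = 24 -> 1.711
print(round(stats.t.ppf(0.99, df=11), 3))  # C.34: value with 1% above it, df = 11 -> 2.718
print(round(stats.t.sf(2.75, df=9), 3))    # C.35: area above 2.75, df = 9 -> 0.011
print(round(stats.t.cdf(-1.50, df=23), 3)) # C.36: area below -1.50, df = 23 -> 0.074
```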
C.37 (a) To find a confidence interval using the normal distribution, we use

Sample statistic ± z∗·SE

We are finding a confidence interval for a mean, so the statistic from the original sample is x = 12.79. For a 95% confidence interval, we use z∗ = 1.960 and we have SE = 0.30. Putting this information together, we have

x ± z∗·SE
12.79 ± 1.960·(0.30)
12.79 ± 0.59
12.20 to 13.38
We are 95% confident that the average number of grams of fiber consumed per day is between 12.20 grams and 13.38 grams.
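The generic "statistic ± z∗·SE" interval from part (a) is easy to check numerically; the helper below is our own illustration (scipy assumed), not output from the text.

```python
# Generic normal-based confidence interval: statistic +/- z* x SE; assumes scipy is installed.
from scipy import stats

def normal_ci(stat, se, confidence=0.95):
    """Return a (lower, upper) interval using the normal distribution."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    return stat - z_star * se, stat + z_star * se

print(normal_ci(12.79, 0.30))   # approximately (12.20, 13.38)
```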
(b) The relevant hypotheses are H0: μ = 12 vs Ha: μ > 12, where μ is the mean number of grams of fiber consumed per day by all people in the population. The statistic of interest is x = 12.79. The standard error of this statistic is given as SE = 0.30 and the null hypothesis is that the population mean is 12. We compute the standardized test statistic with

z = (Sample Statistic − Null Parameter)/SE = (12.79 − 12)/0.30 = 2.63

Using technology, the area under a N(0, 1) curve beyond z = 2.63 is 0.004. This small p-value provides strong evidence that the mean number of grams of fiber consumed per day is greater than 12.
C.38 (a) To find a confidence interval using the normal distribution, we use

Sample statistic ± z∗·SE

We are finding a confidence interval for a proportion, so the statistic from the original sample is p̂ = 0.17. For a 90% confidence interval, we use z∗ = 1.645 and we have SE = 0.0085. Putting this information together, we have

p̂ ± z∗·SE
0.17 ± 1.645·(0.0085)
0.17 ± 0.014
0.156 to 0.184
We are 90% confident that the proportion of cell phone owners who do most of their online browsing on their phone is between 15.6% and 18.4%.

(b) The relevant hypotheses are H0: p = 0.15 vs Ha: p > 0.15, where p is the proportion of the population doing most of their online browsing on their phone. The statistic of interest is p̂ = 0.17. The standard error of this statistic is given as SE = 0.0085 and the null hypothesis is that the population proportion is 0.15. We compute the standardized test statistic with

z = (Sample Statistic − Null Parameter)/SE = (0.17 − 0.15)/0.0085 = 2.353

Using technology, the area under a N(0, 1) curve beyond z = 2.353 is 0.009. This small p-value provides strong evidence that the proportion of cell phone owners who do most of their online browsing on their phone is greater than 15%.

C.39 The sample size is definitely large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z∗·SE

The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.83. For a 95% confidence interval, we have z∗ = 1.96, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.83 ± 1.96·√(0.83(0.17)/1000)
0.83 ± 0.023
0.807 to 0.853
We are 95% confident that the proportion of adults who believe that children spend too much time on electronic devices is between 0.807 and 0.853. The margin of error is 0.023. Since the lowest plausible value for p in the confidence interval is 0.807, it is not plausible that the proportion is less than 80%. Since 0.85 is within the plausible range in the confidence interval, it is plausible that the proportion is greater than 85%.

C.40 To find a 90% confidence interval for p, the proportion of MLB games won by the home team, we use z∗ = 1.645 and p̂ = 0.549 from the sample of n = 2430 games. The confidence interval is

Sample statistic ± z∗·SE
p̂ ± z∗·√(p̂(1 − p̂)/n)
0.549 ± 1.645·√(0.549(0.451)/2430)
0.549 ± 0.017
0.532 to 0.566
We are 90% confident that the proportion of MLB games that are won by the home team is between 0.532 and 0.566. This statement assumes that the 2009 season is representative of all Major League Baseball games. If there is reason to assume that that season introduces bias, then we cannot be confident in our statement.
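The proportion intervals in C.39 and C.40 both follow the p̂ ± z∗·√(p̂(1 − p̂)/n) pattern; the helper below is a sketch of our own (scipy assumed) that reproduces both.

```python
# Normal-based confidence interval for a single proportion; assumes scipy is installed.
from math import sqrt
from scipy import stats

def prop_ci(p_hat, n, confidence=0.95):
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

print(prop_ci(0.83, 1000))          # C.39: approximately (0.807, 0.853)
print(prop_ci(0.549, 2430, 0.90))   # C.40: approximately (0.532, 0.566)
```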
C.41 (a) We see that n = 157, x = 3.849, and s = 2.421.
(b) We have

SE = s/√n = 2.421/√157 = 0.193

This is the same as the value given in the computer output.

(c) For a 95% confidence interval with degrees of freedom 156, we use t∗ = 1.98. The confidence interval is

x ± t∗·s/√n
3.849 ± 1.98·2.421/√157
3.849 ± 0.383
3.466 to 4.232
A 95% confidence interval is $3.466 to $4.232. (d) Up to two decimal places, the confidence interval we found is the same as the one given in the computer output. The small differences are probably due to round-off error in estimating the t∗ value. (e) We are 95% confident that the average tip given at this restaurant is between $3.47 and $4.23. C.42
(a) We see that n = 30, x = 529.83, and s = 71.66.
(b) We have

SE = s/√n = 71.66/√30 = 13.08

This is the same as the value given in the computer output.
(c) For a 95% confidence interval with 30 − 1 = 29 degrees of freedom, we use t∗ = 2.045. The confidence interval is

x ± t∗·s/√n
529.83 ± 2.045·71.66/√30
529.83 ± 26.76
503.07 to 556.59
A 95% confidence interval is 503.07 to 556.59 walks. (d) Up to round-off error, the confidence interval we found is the same as the one given in the computer output. (e) We are 95% confident that the mean number of walks in a season for all MLB teams is between 503.1 walks and 556.6 walks. C.43
(a) The cookies were bought from locations all over the country to try to avoid sampling bias.
(b) Let μ be the mean number of chips per bag. We are testing H0: μ = 1000 vs Ha: μ > 1000. The test statistic is

t = (1261.6 − 1000)/(117.6/√42) = 14.4

We use a t-distribution with 41 degrees of freedom. The area to the right of 14.4 is negligible, and p-value ≈ 0. We conclude, with very strong evidence, that the average number of chips per bag of Chips Ahoy! cookies is greater than 1000.

(c) No! The test in part (b) gives convincing evidence that the average number of chips per bag is greater than 1000. However, this does not necessarily imply that every individual bag has more than 1000 chips.

C.44 The sample sizes (n = 2255 US adults or n = 1787 Internet users) are both large enough for the proportions in this exercise to follow a normal distribution. In each case we find the confidence interval with

Sample statistic ± z∗·SE

where z∗ = 1.96 for 95% confidence and SE = √(p̂(1 − p̂)/n).

(a) For the proportion of US adults who use the Internet regularly, we find p̂ = 1787/2255 = 0.792. The confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.792 ± 1.96·√(0.792(0.208)/2255)
0.792 ± 0.017
0.775 to 0.809

We are 95% confident that the proportion of US adults who use the Internet regularly is between 0.775 and 0.809.
(b) For the proportion of Internet users who use social networking sites, we find p̂ = 1054/1787 = 0.590. The confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.590 ± 1.96·√(0.590(0.410)/1787)
0.590 ± 0.023
0.567 to 0.613
We are 95% confident that the proportion of US adult Internet users who use social networking sites is between 0.567 and 0.613. (c) For the proportion of US adults who use social networking sites, we find p̂ = 1054/2255 = 0.467. The confidence interval is p̂(1 − p̂) ∗ p̂ ± z n 0.467(0.533) 0.467 ± 1.96 · 2255 0.467 ± 0.021 0.446 to 0.488 We are 95% confident that the proportion of US adults who use a social networking site between 0.446 and 0.488. Since 0.5 is not within this range of plausible values for the population proportion, it is not plausible to estimate that 50% of US adult Internet users use a social networking site. (The percentage is increasing rapidly though and doubled in the three years from 2008 to 2011. By the time you are reading this book, the percentage will probably be over 50%.) C.45
(a) In each case, we use

p̂ ± z∗·√(p̂(1 − p̂)/n)

for the confidence interval, using z∗ = 1.96 for a 95% confidence interval. In every case, n = 970.

• For percent updating their status, we have:

0.15 ± 1.96·√(0.15(0.85)/970) = 0.15 ± 0.022 = the interval from 0.128 to 0.172

We are 95% confident that between 12.8% and 17.2% of Facebook users update their status in an average day.

• For percent commenting on another's post, we have

0.22 ± 1.96·√(0.22(0.78)/970) = 0.22 ± 0.026 = the interval from 0.194 to 0.246

We are 95% confident that between 19.4% and 24.6% of Facebook users comment on another's post in an average day.
• For percent commenting on another's photo, we have

0.20 ± 1.96·√(0.20(0.80)/970) = 0.20 ± 0.025 = the interval from 0.175 to 0.225

We are 95% confident that between 17.5% and 22.5% of Facebook users comment on another's photo in an average day.

• For percent "liking" another's content, we have

0.26 ± 1.96·√(0.26(0.74)/970) = 0.26 ± 0.028 = the interval from 0.232 to 0.288

We are 95% confident that between 23.2% and 28.8% of Facebook users "like" another's content in an average day.

• For percent sending another user a private message, we have

0.10 ± 1.96·√(0.10(0.90)/970) = 0.10 ± 0.019 = the interval from 0.081 to 0.119

We are 95% confident that between 8.1% and 11.9% of Facebook users send another user a private message in an average day.

(b) The plausible proportions for those commenting on another's content are those between 0.194 and 0.246, while the plausible proportions for those updating their status are those between 0.128 and 0.172. Since these ranges do not overlap, we can be relatively confident that these proportions are not the same. A greater percentage comment on another's content than update their own status.

C.46 Letting p̂1 and p̂2 represent the proportion supporting capital punishment in 2006 and 1974, respectively, we have

p̂1 = 1945/2815 = 0.691 and p̂2 = 937/1410 = 0.665

The sample sizes are both large, so it is reasonable to use a normal distribution. For 95% confidence the standard normal endpoint is z∗ = 1.96. This gives

(p̂1 − p̂2) ± z∗·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
(0.691 − 0.665) ± 1.96·√(0.691(1 − 0.691)/2815 + 0.665(1 − 0.665)/1410)
0.026 ± 0.030
−0.004 to 0.056

We are 95% sure that, between 1974 and 2006, the change in the percent supporting the death penalty is between a decrease of 0.4% and an increase of 5.6%. Since a difference of zero (no change) is within this interval, it is plausible that there has been no change in support or opposition to the death penalty in this 32-year period.

C.47
(a) Using I for the Internet users and N for the non-Internet users, we see that p̂I = 807/1754 = 0.46 and p̂N = 130/483 = 0.27. In the sample, the Internet users are more trusting.
(b) A hypothesis test is used to determine whether we can generalize the results from the sample to the population. Since we are looking for a difference, this is a two-tail test. The hypotheses are

H0: pI = pN vs Ha: pI ≠ pN

In addition to the two sample proportions computed in part (a), we compute the pooled proportion. A total of 807 + 130 people in the full sample of 1754 + 483 people agreed with the statement, so

Pooled proportion = p̂ = (807 + 130)/(1754 + 483) = 937/2237 = 0.419

The standardized test statistic is

z = (p̂I − p̂N)/√(p̂(1 − p̂)/nI + p̂(1 − p̂)/nN) = (0.46 − 0.27)/√(0.419(0.581)/1754 + 0.419(0.581)/483) = 7.49
This is a two-tail test, so the p-value is two times the area above 7.49 in a standard normal distribution. However, the area above 7.49 in a standard normal is essentially zero (more than seven standard deviations above the mean!). The p-value is essentially zero. The p-value is extremely small and gives very strong evidence that Internet users are more trusting than non-Internet users.

(c) No, we cannot conclude that Internet use causes people to be more trusting. The data come from an observational study rather than a randomized experiment. There are many possible confounding factors.

(d) Yes. Level of formal education is a confounding factor if education level affects whether a person uses the Internet (it does; more education is associated with more Internet use) and also if education level affects how trusting someone is (it does; more education is associated with being more trusting). Remember that a confounding factor is a factor that influences both of the variables of interest. (In fact, even after controlling for education level and several other confounding variables, the data still show that Internet users are more trusting than non-users.)

C.48 For 95% confidence, we have z∗ = 1.96, so the margin of error for estimating the proportion of Democrats is

ME = z∗·√(p̂(1 − p̂)/n) = 1.96·√(0.353(0.647)/15,000) = 0.0076

The margin of error for estimating the proportion of Republicans is

ME = z∗·√(p̂(1 − p̂)/n) = 1.96·√(0.340(0.660)/15,000) = 0.0076

The proportion of Democrats might be as low as 0.353 − 0.0076 = 0.3454 while the proportion of Republicans might be as high as 0.340 + 0.0076 = 0.3476. Thus we cannot be sure that more American adults self-identified as Democrats than as Republicans in March 2011.

C.49 For 95% confidence, we have z∗ = 1.96, so the margin of error for estimating the proportion of Democrats is

ME = z∗·√(p̂(1 − p̂)/n) = 1.96·√(0.290(0.710)/12,000) = 0.0081
The margin of error for estimating the proportion of Republicans is

ME = z∗·√(p̂(1 − p̂)/n) = 1.96·√(0.260(0.740)/12,000) = 0.0078

The proportion of Democrats might be as low as 0.290 − 0.0081 = 0.2819 while the proportion of Republicans might be as high as 0.260 + 0.0078 = 0.2678. Even at the extremes of the confidence intervals, the proportion of Democrats is still higher. Thus we can feel comfortable concluding that more American adults self-identified as Democrats than as Republicans in February 2010.

C.50 For the confidence interval for a single mean in California we use

xca ± t∗·sca/√nca

The sample size is 30, so we use a t-distribution with 29 degrees of freedom. The 2.5% and 97.5% points in this distribution give t∗ = ±2.045.

535.4 ± 2.045·269.2/√30 = 535.4 ± 100.5 = (434.9, 635.9)

Based on these data we are 95% sure that the mean home price in California is between $434,900 and $635,900.

C.51 For the difference in mean home price between California and New York, we use

(xca − xny) ± t∗·√(sca²/nca + sny²/nny)

Both samples have size nca = nny = 30 so we use 29 degrees of freedom to find t∗ = 1.699 for 90% confidence. The confidence interval is

(535.4 − 365.3) ± 1.699·√(269.2²/30 + 317.8²/30)
170.1 ± 129.2
40.9 to 299.3

We are 90% sure that the mean home price in California is somewhere between $40.9 thousand and $299.3 thousand more than in New York.

C.52 For the difference in mean home price between New Jersey and Pennsylvania, we use

(xnj − xpa) ± t∗·√(snj²/nnj + spa²/npa)

Both samples have size nnj = npa = 30 so we use 29 degrees of freedom to find t∗ = 2.756 for 99% confidence. The confidence interval is

(328.5 − 265.6) ± 2.756·√(137.1²/30 + 158.0²/30)
62.9 ± 105.3
−42.4 to 168.2
We are 99% sure that the mean home price in New Jersey is somewhere between $42.4 thousand less than in Pennsylvania to as much as $168.2 thousand more.
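The two-sample t intervals in C.51 and C.52 can be checked with a small helper like the one below (our own sketch, using df = smaller sample size minus 1 and assuming scipy is available).

```python
# Two-sample t confidence interval from summary statistics; assumes scipy is installed.
from math import sqrt
from scipy import stats

def two_mean_ci(x1, s1, n1, x2, s2, n2, confidence=0.95):
    df = min(n1, n2) - 1                                  # conservative degrees of freedom
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df)
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - t_star * se, diff + t_star * se

# C.52: New Jersey vs Pennsylvania, 99% confidence (prices in $1000s)
print(two_mean_ci(328.5, 137.1, 30, 265.6, 158.0, 30, 0.99))   # about (-42, 168)
```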
C.53 For the difference in mean home price between New York and New Jersey, we use

(xny − xnj) ± t∗·√(sny²/nny + snj²/nnj)

Both samples have size nny = nnj = 30 so we use 29 degrees of freedom to find t∗ = 2.045 for 95% confidence. The confidence interval is

(365.3 − 328.5) ± 2.045·√(317.8²/30 + 158.0²/30)
36.8 ± 132.5
−95.7 to 169.3
We are 95% sure that the mean home price in New York is somewhere between $95.7 thousand less than in New Jersey to as much as $169.3 thousand more.

C.54 Let μ1 be the average hourly earnings for left-handed men and μ2 be the average hourly earnings for right-handed men. Then, we are testing H0: μ1 = μ2 vs Ha: μ1 ≠ μ2. The test statistic is

t = (13.1 − 13.4)/√(7.9²/2027 + 7.9²/268) = −0.584

To find the p-value we use a t-distribution with 267 degrees of freedom (the normal distribution would give about the same result). The area to the left of −0.584 is 0.280, so p-value = 2(0.280) = 0.560. This is not convincing evidence that average hourly earnings depend on handedness.

C.55 Let μC and μI represent mean weight loss after six months for women on a continuous or intermittent calorie restricted diet, respectively. The hypotheses are:

H0: μC = μI vs Ha: μC ≠ μI
The relevant statistic for this test is xC − xI, and the relevant null parameter is zero, since from the null hypothesis we have μC − μI = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((xC − xI) − 0)/√(sC²/nC + sI²/nI) = (14.1 − 12.2)/√(13.2²/54 + 10.6²/53) = 0.82
This is a two-tail test, so the p-value is two times the area above 0.82 in a t-distribution with df = 52. We see that the p-value is 2(0.208) = 0.416. This p-value is not small at all, so we do not reject H0. We do not see convincing evidence of a difference in effectiveness of the two weight loss methods.

C.56 The proportions from the sample in favor of the legislation are

p̂m = 318/520 = 0.612 for men and p̂w = 379/460 = 0.824 for women
For a 90% confidence interval the normal percentiles are z = ±1.645. Evaluating the formula for a confidence interval for a difference in proportions gives

(0.612 − 0.824) ± 1.645·√(0.612(1 − 0.612)/520 + 0.824(1 − 0.824)/460)
−0.212 ± 1.645·0.0278
−0.212 ± 0.046
−0.258 to −0.166
Thus we are 90% sure that the percentage of men who support this gun control legislation is between 25.8% and 16.6% less than the percentage of support among women.

C.57 This is a test for a difference in proportions, and we define pF and pN to be the proportion of men copying their partner's sentence structure with a fertile partner and non-fertile partner, respectively. The hypotheses are:

H0: pF = pN vs Ha: pF < pN

We compute the sample proportions and the pooled sample proportion:

p̂F = 30/62 = 0.484    p̂N = 38/61 = 0.623    p̂ = 68/123 = 0.553
It is appropriate to use a normal distribution so we compute the standardized test statistic:

z = (Sample statistic − Null parameter)/SE = ((0.484 − 0.623) − 0)/√(0.553(0.447)/62 + 0.553(0.447)/61) = −1.55

This is a lower-tail test so the p-value is the area below −1.55 in a standard normal distribution. We see that the p-value is 0.0606. This p-value is small enough to be significant at the 10% level but not quite at the 5% level. We do not reject H0 and do not find evidence that ovulating women affect men's speech. (However, the results are so close to being significant that it is probably worth replicating the experiment with a larger sample size.)

C.58 This is a matched pairs experiment so we work with the differences. We have xd = 2.4 with sd = 6.3 and nd = 47. To find a 95% confidence interval (with degrees of freedom 46), we use t∗ = 2.01. We have
±
2.4
±
2.4
±
sd t∗ √ nd 6.3 2.01 √ 47 1.85
to
4.25
0.55
We are 95% confident that the average increase in brain glucose metabolism rate from having a cell phone turned on and pressed to the ear for 50 minutes is between 0.55 and 4.25 μmol/100 g per minute.
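The matched-pairs interval in C.58 is easy to verify numerically; the code below is our own check (scipy assumed), not output from the text.

```python
# Matched-pairs t confidence interval for C.58; assumes scipy is installed.
from math import sqrt
from scipy import stats

xd, sd, nd = 2.4, 6.3, 47
t_star = stats.t.ppf(0.975, df=nd - 1)   # about 2.01 for 95% confidence
me = t_star * sd / sqrt(nd)

print(round(xd - me, 2), round(xd + me, 2))   # approximately 0.55 to 4.25
```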
C.59 We are testing for a difference in two means from a matched pairs experiment. We can use as our null hypothesis either H0: μon = μoff or, equivalently, H0: μd = 0. The two are equivalent since, using d to represent the differences, we have μd = μon − μoff. Using the differences, we have

H0: μd = 0 vs Ha: μd > 0

Notice that this is a one-tail test since we are specifically testing to see if the metabolism is higher for the "on" condition. The t-test statistic is:

t = (xd − 0)/(sd/√nd) = 2.4/(6.3/√47) = 2.61
Using the t-distribution with degrees of freedom 46, we find a p-value of 0.006. This provides enough evidence to reject H0 and conclude that mean brain metabolism is significantly higher when a cell phone is turned on and pressed to the ear. C.60
(a) The sample is the 27,000 people included in the survey. The population is all consumers with Internet access.
(b) The sample size is definitely large enough to use the normal distribution. For a confidence interval using the normal distribution, we use

Sample statistic ± z∗·SE

The relevant sample statistic for a confidence interval for a proportion is p̂ = 0.61. For a 99% confidence interval, we have z∗ = 2.576, and the standard error is SE = √(p̂(1 − p̂)/n). The confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.61 ± 2.576·√(0.61(0.39)/27,000)
0.61 ± 0.008
0.602 to 0.618

We are 99% confident that the proportion of consumers worldwide who purchased more store brands during the economic downturn is between 0.602 and 0.618. The margin of error is very small because the sample size is so large.

(c) The only change for this part is that we now have p̂ = 0.91. The confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.91 ± 2.576·√(0.91(0.09)/27,000)
0.91 ± 0.004
0.906 to 0.914

We are 99% confident that the proportion of consumers worldwide who plan to continue to purchase the same number of store brands is between 0.906 and 0.914. The margin of error is even smaller as the sample proportion gets farther away from 0.5.
C.61
(a) The best estimate is the sample proportion, 0.17. Using z∗ = 2.576 for 99% confidence, the margin of error is

ME = z∗·√(p̂(1 − p̂)/n) = 2.576·√(0.17(0.83)/1016) = 0.03

The margin of error is ±3%.
(b) We use z∗ = 2.576 for 99% confidence, and the sample proportion p̂ for our estimated proportion p̃ = 0.17. If we want the margin of error to be within ±1%, we need a sample size of:

n = (2.576/0.01)²·(0.17)(1 − 0.17) = 9363.1

We round up to require at least 9364 US adults in the sample.

C.62 We test H0: p = 1/3 vs Ha: p ≠ 1/3 where p is the proportion of all female students who choose the Olympic gold medal. For the sample of n = 169 female students we have p̂ = 73/169 = 0.432. We compute a standardized test statistic as

z = (0.432 − 1/3)/√((1/3)(1 − 1/3)/169) = 2.72
Checking that 169 · 1/3 = 56.3 and 169 · (1 − 1/3) = 112.7 are both bigger than 10, we use the standard normal curve to find the p-value. Using technology or a table, the area to the right beyond 2.72 is 0.003 and we double this value since it’s a two-tailed alternative to get p-value = 0.006. This is a small p-value, so we have strong evidence that the proportion of female students who choose the Olympic gold medal differs from one third. Thus it would appear that the three awards are not equally popular among female students. C.63
(a) About 800 × 0.28 = 224 Quebecers and 500 × 0.18 = 90 Texans wanted to separate.
(b) The pooled proportion is

p̂ = (224 + 90)/(800 + 500) ≈ 0.242

(c) If we let p1 and p2 represent the proportion of Quebecers and Texans who want to secede, respectively, the relevant hypotheses are H0: p1 = p2 vs Ha: p1 ≠ p2. We compute the standardized test statistic as

z = (0.28 − 0.18)/√(0.242(1 − 0.242)/800 + 0.242(1 − 0.242)/500) = 4.10

The p-value is twice the area of the standard normal tail beyond 4.10, which is very small (p < 0.0001). Thus, we have very strong evidence that a greater proportion of Quebecers support secession from Canada than Texans support secession from the United States.

C.64 We want to estimate the size of the difference in the mean rating with the red background (μR) compared to the white background (μW). We estimate the difference in population means using the difference in sample means xR − xW, where xR represents the mean rating in the sample using red and xW represents the mean rating in the sample using white. For a 90% confidence interval with degrees of freedom equal to
11 (the smaller sample size minus one), we use t∗ = 1.80. We have:

Sample statistic ± t∗·SE
(xR − xW) ± t∗·√(sR²/nR + sW²/nW)
(7.2 − 6.1) ± 1.80·√(0.4²/15 + 0.6²/12)
1.1 ± 0.35
0.75 to 1.45
We are 90% confident that men's average rating of women's attractiveness on a 9-point scale will be between 0.75 and 1.45 points higher when the picture is displayed on a red background rather than a white background.

C.65 These are large sample sizes so the normal distribution is appropriate. For 95% confidence z∗ = 1.96. The sample proportions are p̂H = 164/8506 = 0.0193 and p̂P = 122/8102 = 0.0151 in the HRT group and the placebo group, respectively. The confidence interval is

(0.0193 − 0.0151) ± 1.96·√(0.0193(1 − 0.0193)/8506 + 0.0151(1 − 0.0151)/8102) = 0.0042 ± 0.0039 = (0.0003, 0.0081)

We are 95% sure that the proportion of women who get cardiovascular disease is between 0.0003 and 0.0081 higher among women who get hormone replacement therapy rather than a placebo.

C.66 These are large sample sizes so the normal distribution is appropriate. For 95% confidence z∗ = 1.96. The sample proportions are p̂H = 166/8506 = 0.0195 and p̂P = 124/8102 = 0.0153 in the HRT group and the placebo group, respectively. The confidence interval is

(0.0195 − 0.0153) ± 1.96·√(0.0195(1 − 0.0195)/8506 + 0.0153(1 − 0.0153)/8102) = 0.0042 ± 0.0040 = (0.0002, 0.0082)

We are 95% sure that the proportion of women who get invasive breast cancer is between 0.0002 and 0.0082 higher among women who get hormone replacement therapy rather than a placebo.

C.67 These are large sample sizes so the normal distribution is appropriate. For 95% confidence z∗ = 1.96. The sample proportions are p̂H = 502/8506 = 0.0590 and p̂P = 458/8102 = 0.0565 in the HRT group and the placebo group, respectively. The confidence interval is

(0.0590 − 0.0565) ± 1.96·√(0.0590(1 − 0.0590)/8506 + 0.0565(1 − 0.0565)/8102) = 0.0025 ± 0.0071 = (−0.0046, 0.0096)

We are 95% sure that the proportion of women who get any form of cancer is between 0.0046 lower and 0.0096 higher among women who get hormone replacement therapy rather than a placebo.

C.68 These are large sample sizes so the normal distribution is appropriate. For 95% confidence z∗ = 1.96. The sample proportions are p̂H = 650/8506 = 0.076 and p̂P = 788/8102 = 0.097 for the HRT group and the placebo group, respectively. The confidence interval is

(0.076 − 0.097) ± 1.96·√(0.076(1 − 0.076)/8506 + 0.097(1 − 0.097)/8102) = −0.021 ± 0.0086 = (−0.030, −0.012)

We are 95% sure that the proportion of women who get fractures is between 0.030 and 0.012 lower among women who get hormone replacement therapy rather than a placebo. While HRT increases risk of cardiovascular disease and breast cancer, it decreases risk of fractures.
C.69
(a) The sample for Pennsylvania has only 0.267 × 30 = 8 homes with more than 3 bedrooms, which is less than 10, so the normal distribution may not be appropriate.
(b) Because the normal distribution may not apply, we return to the methods of Chapter 4 and perform a randomization test for the hypotheses H0: pNY = pPA vs Ha: pNY ≠ pPA. We use resampling, taking samples with replacement from the pooled data, because the randomness in this problem comes from sampling, not any kind of allocation. Using StatKey or other technology we construct a randomization distribution of many differences in proportions for two simulated samples of size 30, taken from a "population" with the same proportion of homes with more than 3 bedrooms as we see in the combined sample.

In this randomization distribution we see that 0.011 of the samples give differences as big as the difference of 0.567 − 0.267 = 0.30 that occurred in the original sample. Because this is a two-sided test, we double the 0.011 to get p-value = 0.022. This is a fairly small p-value, so we have evidence to conclude that the proportion of homes with more than 3 bedrooms is higher in New York than in Pennsylvania.
(a) We test H0: p1 = p2 vs Ha: p1 > p2, where p1 is the proportion of patients who find THC effective and p2 is the proportion for prochlorperazine. The sample proportions for effective treatments are p̂1 = 36/79 = 0.456 (THC) and p̂2 = 16/78 = 0.205 (prochlorperazine). We also need the pooled proportion

p̂ = (36 + 16)/(79 + 78) = 52/157 = 0.331

The value of the standardized test statistic is

z = (0.456 − 0.205)/√(0.331(1 − 0.331)/79 + 0.331(1 − 0.331)/78) = 0.251/0.0751 = 3.34

Using the area in the upper tail of a standard normal distribution beyond z = 3.34 gives a p-value = 0.0004. This is a very small p-value, even less than the strict α = 0.01, so we have strong evidence that the proportion of chemotherapy patients who find THC effective in combating nausea is more than the proportion of patients helped by prochlorperazine.
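The pooled two-proportion z-test in part (a) can be reproduced with a small helper like the one below (our own sketch, scipy assumed).

```python
# Pooled two-proportion z-test for part (a); assumes scipy is installed.
from math import sqrt
from scipy import stats

def two_prop_z(x1, n1, x2, n2):
    """Return the pooled z-statistic for comparing two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(36, 79, 16, 78)                 # THC vs prochlorperazine
print(round(z, 2), round(stats.norm.sf(z), 4)) # approximately 3.34 and 0.0004
```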
(b) Since this was an experiment, we can attribute the improved effectiveness directly to the new drug (THC).

C.71 Let μ be the mean volume of juice per bottle, in fl oz. We are testing H0: μ = 12.0 vs Ha: μ ≠ 12.0. The test statistic is

t = (11.92 − 12.0)/(0.26/√30) = −1.69

We use a t-distribution with 29 degrees of freedom. The area to the left of −1.69 is 0.051, so the p-value is 2(0.051) = 0.102. We do not reject H0 at the 1% level. There is not sufficient evidence that the average amount of juice per bottle differs from 12 fl oz, so Susan need not recalibrate the machine.

C.72 The sample proportion who prefer a playoff system is p̂ = 0.63, the margin of error is ME = 0.031, and z∗ = 1.96 for 95% confidence. We can "work backwards" to determine the sample size:

n = (1.96/0.031)²·(0.63)(1 − 0.63) = 931.8

So, about 932 college football fans were sampled.

C.73 We estimate the difference in population means using the difference in sample means xL − xD, where xL represents the mean weight gain of the mice in light and xD represents the mean weight gain of mice in darkness. For a 99% confidence interval with degrees of freedom equal to 7, we use t∗ = 3.50. We have:
Sample statistic ± t∗·SE
(xL − xD) ± t∗·√(sL²/nL + sD²/nD)
(9.4 − 5.9) ± 3.50·√(3.2²/19 + 1.9²/8)
3.5 ± 3.48
0.02 to 6.98
We are 99% confident that mice with light at night will gain, on average, between 0.02 grams and 6.98 grams more than mice in darkness.

C.74 We estimate the difference in mean scoring between home and away teams using the difference in sample means xH − xA = 25.16 − 20.86 = 4.3 points. For a 90% confidence interval with degrees of freedom equal to 79, we use t∗ = 1.664. We have:

Sample statistic ± t∗·SE
(xH − xA) ± t∗·√(sH²/nH + sA²/nA)
(25.16 − 20.86) ± 1.664·√(10.38²/80 + 9.22²/80)
4.30 ± 2.58
1.72 to 6.88
We are 90% confident that the mean advantage in the NFL is between 1.72 points and 6.88 points in favor of the home team.
C.75 Depending on technology, we may need to add a new column to the NFLScores2018 dataset to compute the difference in home and away scores with a formula like Diff = HomeScore − AwayScore. After doing so, we can find the summary statistics for the differences as in the output below.

Variable   N     Mean   StDev   Minimum   Maximum
Diff       256   2.20   14.36   -37       44

Within this sample of nd = 256 games, the mean difference is xd = 2.20 points in favor of the home team with a standard deviation of sd = 14.36. For 90% confidence with a t-distribution and 255 df we have t∗ = 1.65. To compute the confidence interval based on the paired data we have

xd ± t∗·sd/√nd
2.20 ± 1.65·14.36/√256
2.20 ± 1.48
0.72 to 3.68
Based on these results we are 90% sure that the mean home field advantage for NFL games is between 0.72 and 3.68 points. Football fans often use the adage that the home field is worth about a field goal which counts for 3 points. That would be a plausible value based on this interval. C.76
(a) We find that x = 12.80 penalty minutes per game with s = 2.10.
(b) We use a t-distribution with df = 29, so for a 95% confidence interval, we have t∗ = 2.045. The confidence interval is

x ± t∗·s/√n
12.80 ± 2.045·2.10/√30
12.80 ± 0.78
12.02 to 13.58

We are 95% confident that the average number of penalty minutes per game for NHL teams is between 12.02 and 13.58.

(c) It might be reasonable to generalize to the broader population of all NHL teams in all years if we think the 2018–2019 teams are representative of all other years. It is probably not appropriate, however, since there are many things that can change year to year. For example, the referees might have been particularly harsh (or lenient) in 2018–2019, or there might be rule changes that cause more or fewer penalties. There are many possible ways in which this sample could be biased.

C.77 First, we compute the proportion of students choosing the Olympic gold medal within each of the gender samples.

p̂m = 109/193 = 0.565 and p̂f = 73/169 = 0.432

The estimated difference in proportions is p̂m − p̂f = 0.565 − 0.432 = 0.133. The sample sizes are both quite large, well more than 10 students choosing each type of award within each gender, so we model the
differences in proportions with a normal distribution. For 90% confidence the standard normal endpoint is z∗ = 1.645. This gives

(p̂m − p̂f) ± z∗·√(p̂m(1 − p̂m)/nm + p̂f(1 − p̂f)/nf)
(0.565 − 0.432) ± 1.645·√(0.565(1 − 0.565)/193 + 0.432(1 − 0.432)/169)
0.133 ± 0.086
0.047 to 0.219

We are 90% sure that the proportion of male statistics students who prefer the Olympic gold medal is between 0.047 and 0.219 more than the proportion of female statistics students who make that choice.

C.78 We let μH and μS represent the mean number of unique bacterial genes, in millions, in the stomachs of healthy people and sick (IBS) people, respectively. The hypotheses are:
H0: μH = μS vs Ha: μH > μS

The relevant statistic for this test is xH − xS, and the relevant null parameter is zero, since from the null hypothesis we have μH − μS = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((xH − xS) − 0)/√(sH²/nH + sS²/nS) = (564 − 425)/√(122²/99 + 127²/25) = 4.93
This is an upper-tail test, so the p-value is the area above 4.93 in a t-distribution with df = 24. We see that the p-value is 0.00002. This is an extremely small p-value, so we reject H0 and conclude that there is very strong evidence that people with IBS have, on average, significantly fewer unique gut bacteria genes. C.79
(a) Let μ1 represent the mean midterm grade for students who attend class on a Friday before break and μ2 be the mean grade for students who miss that class. The instructor’s suspicion is in a particular direction so we use a one-tail alternative: H0 : μ1 = μ2 vs Ha : μ1 > μ2 .
(b) The value of the t-statistic is

t = (80.9 − 68.2)/√(11.07²/15 + 9.26²/9) = 12.7/4.21 = 3.02

We find a p-value using a t-distribution with 9 − 1 = 8 degrees of freedom. The area in the upper tail of this distribution beyond 3.02 gives a p-value of 0.008. This is a very small p-value so we find strong evidence to support the claim that the mean midterm grade is higher for students who attend class on the Friday before break than for those who skip.

(c) Even though the test shows strong evidence that the mean midterm grade is lower for those who skip class, we can't conclude that missing class causes this to happen. In fact, the midterm grades were determined before students even came to class (or chose to miss). This was not an experiment (we didn't randomly assign students to attend or miss class!) so we can't draw a cause/effect conclusion.

(d) It was a good idea to exclude the student who was never attending class. The instructor would probably like to draw a conclusion about the population of students who are regular members of the class. Also, including the extremely low midterm grade from this student would bring into question the appropriateness of using the t-distribution for this test since the sample sizes are somewhat small.
C.80 In this case, we have p = 0.30 and n = 500. The sample proportions will be centered at the population proportion of p = 0.30 so will have a mean of 0.30. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.30(1 − 0.30)/500) = 0.020

C.81 In this case, we have p = 0.651 and n = 50. The sample proportions will be centered at the population proportion of p = 0.651 so will have a mean of 0.651. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.651(1 − 0.651)/50) = 0.067

C.82 (a) In this case, we have p = 0.69 and n = 100. The sample proportions will be centered at the population proportion of p = 0.69 so will have a mean of 0.69. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.69(1 − 0.69)/100) = 0.046

(b) In this case, we have p = 0.69 and n = 1000. The sample proportions will be centered at the population proportion of p = 0.69 so will have a mean of 0.69. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.69(1 − 0.69)/1000) = 0.015

Notice that the standard error is significantly less with a sample size of 1000 than it is with a sample size of 100.

(c) In this case, we have p = 0.75 and n = 100. The sample proportions will be centered at the population proportion of p = 0.75 so will have a mean of 0.75. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.75(1 − 0.75)/100) = 0.043

(d) In this case, we have p = 0.75 and n = 1000. The sample proportions will be centered at the population proportion of p = 0.75 so will have a mean of 0.75. The standard deviation of the sample proportions is the standard error, which is

SE = √(p(1 − p)/n) = √(0.75(1 − 0.75)/1000) = 0.014

Notice that the standard error is significantly less with a sample size of 1000 than it is with a sample size of 100.

C.83
(a) The mean of the distribution is 36.78 years old. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 22.58/√10 = 7.14 years
(b) The mean of the distribution is 36.78 years old. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 22.58/√100 = 2.258 years

(c) The mean of the distribution is 36.78 years old. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 22.58/√1000 = 0.714 years

Notice that as the sample size goes up, the standard error of the sample means goes down.

C.84 (a) The mean of the distribution is 233 minutes. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 45/√10 = 14.2 minutes

(b) The mean of the distribution is 233 minutes. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 45/√100 = 4.5 minutes

(c) The mean of the distribution is 233 minutes. The standard deviation of the distribution of sample means is the standard error:

SE = σ/√n = 45/√1000 = 1.4 minutes

Notice that as the sample size goes up, the standard error of the sample means goes down.

C.85 The differences in sample proportions will have a mean of pA − pS = 0.194 − 0.186 = 0.008 and a standard deviation equal to the standard error SE. We have

SE = √(pA(1 − pA)/nA + pS(1 − pS)/nS) = √(0.194(0.806)/200 + 0.186(0.814)/200) = 0.039

C.86 The differences in sample proportions will have a mean of pA − pNZ = 0.157 − 0.156 = 0.001 and a standard deviation equal to the standard error SE. We have

SE = √(pA(1 − pA)/nA + pNZ(1 − pNZ)/nNZ) = √(0.157(0.843)/500 + 0.156(0.844)/300) = 0.0265

C.87 The sample size is large enough to use the normal approximation. The mean is pa − pri = 0.520 − 0.483 = 0.037 and the standard error is

SE = √(0.52(1 − 0.52)/300 + 0.483(1 − 0.483)/300) = 0.041

The distribution of p̂a − p̂ri is approximately N(0.037, 0.041).
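The standard errors in C.85–C.87 all come from the same formula; the snippet below is our own numeric check (plain Python, no extra libraries).

```python
# Standard error of a difference in sample proportions, as used in C.85-C.87.
from math import sqrt

def se_diff_props(p1, n1, p2, n2):
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(round(se_diff_props(0.194, 200, 0.186, 200), 3))   # C.85: 0.039
print(round(se_diff_props(0.157, 500, 0.156, 300), 4))   # C.86: about 0.0265
print(round(se_diff_props(0.520, 300, 0.483, 300), 3))   # C.87: 0.041
```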
C.88 We have p̂ = 0.73 with n = 1000. For a 95% confidence interval, we have z∗ = 1.96. The 95% confidence interval for a proportion is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.73 ± 1.96·√(0.73(1 − 0.73)/1000)
0.73 ± 0.028
0.702 to 0.758

We are 95% confident that the proportion of likely voters in the US who think a woman president is likely in the next 10 years is between 0.702 and 0.758.

C.89 If we let μE represent the average time spent in freeze mode for rats who have been previously shocked (the experimental condition) and μC represent the same thing for rats who have not been previously shocked (the control condition), then the hypotheses are

H0: μE = μC vs Ha: μE > μC

The sample sizes are small, but the data have no large outliers so we use the t-distribution. The test statistic is

t = (xE − xC)/√(sE²/nE + sC²/nC) = (36.6 − 1.2)/√(21.3²/15 + 2.3²/11) = 6.39
μ=1 μ>1
x − μ0 2.31 − 1 √ = 7.35 √ = s/ n 0.96/ 29
This is an upper-tail test, so the p-value is the area above 7.35 in a t-distribution with df = 28. We see that the p-value is essentially zero. There is very strong evidence that mean scrotal temperature increase is greater than 1◦ C. C.91 The hypotheses are H0 : μm = μf vs Ha : μm > μf , where μm and μf represent the mean salary recommended for male applicants and female applicants, respectively. The sample sizes are large enough to use the t-distribution. The t-test statistic is: t=
t = (Sample statistic − Null parameter)/SE = ((xm − xf) − 0)/√(sm²/nm + sf²/nf) = (30,238 − 26,508)/√(5152²/63 + 7348²/64) = 3.316
This is a right-tail test, so the p-value is the area above 3.316 in a t-distribution with df = 62. We see that the p-value is 0.0008. This is a very small p-value, so we reject H0 and conclude that there is very strong evidence that there is a gender bias in salary in favor of male applicants.

C.92 The hypotheses are H0: μm = μf vs Ha: μm ≠ μf, where μm and μf represent the mean salary recommended for female applicants by male faculty members and female faculty members, respectively. The sample sizes are large enough to use the t-distribution. The t-test statistic is:
t = (Sample statistic − Null parameter)/SE = ((xm − xf) − 0)/√(sm²/nm + sf²/nf) = (27,111 − 25,000)/√(6948²/32 + 7966²/32) = 1.130
The area above 1.130 in a t-distribution with df = 31 is 0.134. This is a two-tail test, so the p-value is 2(0.134) = 0.268. This is not a small p-value, so we do not reject H0 . We do not have evidence of a difference in mean recommended salary based on the gender of the faculty evaluator. C.93
(a) If p is the proportion of overtime games that are won by the coin flip winner, we test H0: p = 0.5 vs Ha: p > 0.5 to see if there is an advantage. The proportion in the sample is p̂ = 240/428 = 0.561 and the sample size is large, so we compute a standardized z-statistic

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.561 − 0.5)/√(0.5(1 − 0.5)/428) = 2.52
Using the upper tail of a standard normal distribution beyond z = 2.52, the p-value for this test is 0.006. This is a small p-value, meaning it is quite unlikely to see this many wins by the coin flip winner (if there were no advantage), so we reject the null hypothesis and conclude that there probably is some advantage to winning the coin flip when playing in overtime in the NFL.

(b) To compare the proportions of overtime wins by coin flip winners under the two rules we test H0: p1 = p2 vs Ha: p1 ≠ p2, where p1 is the proportion under the old rule and p2 is the proportion under the new rule. From the sample data we estimate a proportion for each group and for the combined sample.

p̂1 = 94/188 = 0.500    p̂2 = 146/240 = 0.608    p̂ = (94 + 146)/(188 + 240) = 240/428 = 0.561
The standardized test statistic is

z = (0.500 − 0.608)/√(0.561(1 − 0.561)(1/188 + 1/240)) = −0.108/0.04833 = −2.23
Since the sample sizes are large we find a p-value using the area in a N(0,1) distribution that lies below z = −2.23 and double it to account for two tails. This gives p-value = 2(0.0129) = 0.0258, which is less than a 5% significance level. This provides evidence that the advantage to the coin flip winner is different under the two sets of rules. However, we should take care to avoid making a cause/effect conclusion about this relationship. These data were not from an experiment (can you imagine randomly assigning a rule to each overtime game?). It is quite possible that some other aspect of the game might have changed between these two eras that is responsible for the change in proportions for coin flip winners.
C.94 We are finding a confidence interval for a difference in proportions. Using pD for the proportion of people with dyslexia who have the gene disruption and pC for the proportion of people without dyslexia who have the gene disruption, we want to estimate pD − pC. Since p̂D = 10/109 = 0.092 and p̂C = 5/195 = 0.026, the sample statistic is p̂D − p̂C = 0.092 − 0.026 = 0.066. Since only 5 people in the control group have the gene disruption, the conditions are not met for using the normal distribution. We use a bootstrap method instead. Using StatKey or other technology, we create a bootstrap distribution of differences in proportions after sampling with replacement from the respective samples.
Using the 2·SE method, we estimate the 95% confidence interval to be

Sample statistic ± 2·SE
0.066 ± 2(0.030)
0.066 ± 0.060
0.006 to 0.126
We are 95% confident that the difference in proportions with the gene disruption between people with dyslexia and people without is between 0.006 and 0.126. We could also use percentiles from the bootstrap distribution to estimate the confidence interval, and the answer will be similar but possibly not identical.

C.95 There are 48 people who are lying and the lie detector accurately detected the lies for 31 of them, so we have p̂ = 31/48 = 0.646. For a 90% confidence interval, we use z∗ = 1.645 so the confidence interval is

p̂ ± z∗·√(p̂(1 − p̂)/n)
0.646 ± 1.645·√(0.646(1 − 0.646)/48)
0.646 ± 0.114
0.532 to 0.760
We are 90% confident that a lie detector will accurately detect a lying person under these circumstances between 53.2% and 76.0% of the time.
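As an aside, a one-proportion interval like this takes only a few lines to verify in Python. The counts and confidence level come from the exercise; the helper function below is our own shorthand, not something from the text.

```python
from math import sqrt
from scipy.stats import norm

def prop_ci(successes, n, confidence=0.90):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    z_star = norm.ppf(1 - (1 - confidence) / 2)    # 1.645 for 90% confidence
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat - me, p_hat + me

lo, hi = prop_ci(31, 48, confidence=0.90)
print(round(lo, 3), round(hi, 3))   # about (0.532, 0.760)
```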
C.96 This is a hypothesis test for a proportion. There are 96 people total and the lie detector detects lying for 31 + 27 = 58 of them. We have p̂ = 58/96 = 0.604. Letting p represent the proportion of times a lie detector says a person is lying regardless of whether or not the person is lying, we have the hypotheses
H0 : p = 0.5
Ha : p > 0.5

The test statistic is

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.604 − 0.5)/√(0.5(0.5)/96) = 2.04
This is an upper-tail test so the p-value is the area above 2.04 in a normal distribution. We see that the p-value is 0.0207. At a 5% significance level, we reject H0 and conclude that a lie detector says a person is lying more than 50% of the time. The results are significant at the 5% level but not at the 1% level.
C.97 This is a hypothesis test for a difference in proportions. Using pL to represent the proportion found lying when they are lying and pT to represent the proportion found lying when they are telling the truth, the hypotheses are

H0 : pL = pT
Ha : pL ≠ pT
We need three different sample proportions. We have

p̂L = 31/48 = 0.646
p̂T = 27/48 = 0.563
p̂ = 58/96 = 0.604

The test statistic is

z = (p̂L − p̂T)/√(p̂(1 − p̂)(1/nL + 1/nT)) = (0.646 − 0.563)/√(0.604(0.396)(1/48 + 1/48)) = 0.83
This is a two-tail test, so the p-value is twice the area above 0.83 in a normal distribution. We see that the p-value is 2(0.203) = 0.406. For any reasonable significance level, this is not significant. We do not reject H0 and do not find evidence that there is any difference in the proportion the machine says are lying depending on whether the person is actually lying or telling the truth.
C.98 This is a confidence interval for a difference in proportions. We need sample proportions (of detected lies) for the lying and truthful groups. We have p̂L = 31/48 = 0.646 and p̂T = 27/48 = 0.563. For a 95% confidence interval, we have z∗ = 1.96. The confidence interval is

(p̂L − p̂T) ± z∗ · √(p̂L(1 − p̂L)/nL + p̂T(1 − p̂T)/nT)
(0.646 − 0.563) ± 1.96 · √(0.646(0.354)/48 + 0.563(0.437)/48)
0.083 ± 0.195
−0.112 to 0.278
We are 95% confident that the lie detector is between −0.112 and 0.278 more likely to detect a person lying if the person actually is lying than if the person is telling the truth.
C.99 We are estimating p, the proportion of US adults who believe the government does not provide enough support for soldiers returning from Iraq or Afghanistan. We have p̂ = 931/1502 = 0.620 with n = 1502. For a 99% confidence interval, we have z∗ = 2.576. The 99% confidence interval for a proportion is

p̂ ± z∗ · √(p̂(1 − p̂)/n)
0.620 ± 2.576 · √(0.620(1 − 0.620)/1502)
0.620 ± 0.032
0.588 to 0.652

We are 99% confident that the proportion of US adults who believe the government does not provide enough support for returning soldiers is between 0.588 and 0.652.
C.100 (a) For the data from Great Britain, the sample size is only 8 and the data appear to be quite skewed with outliers. A t-distribution is probably not appropriate for these data, so we use a bootstrap method to produce the distribution of sample means shown below.
Using the 2 · SE method to construct the confidence interval, we see that the standard error for one bootstrap distribution using these data is about 2.82. We have

x̄ ± 2 · SE
7.21 ± 2(2.82)
7.21 ± 5.64
1.57 to 12.85
We are 95% confident that the mean arsenic level in toenails of all people living near the former arsenic mine in Great Britain is between 1.57 mg/kg and 12.85 mg/kg.
(b) For the data from New Hampshire the sample size is 19 and the data are only mildly skewed, so we use the t-distribution. (It is also appropriate to use a bootstrap distribution with these data and the results are similar.) For a 95% confidence interval with df = 18, we have t∗ = 2.10. The summary statistics for the data are x̄ = 0.2719 and s = 0.2365. A 95% confidence interval is

x̄ ± t∗ · s/√n
0.2719 ± 2.10 · 0.2365/√19
0.2719 ± 0.1139
0.1580 to 0.3858

We are 95% confident that the mean arsenic level in toenails of people with private wells in New Hampshire is between 0.158 ppm and 0.386 ppm.
C.101 (a) We are estimating ps − pns where ps is the proportion of smokers who get pregnant in the first cycle of trying and pns is the proportion of non-smokers who get pregnant in the first cycle of trying. For smokers we have p̂s = 38/135 = 0.28. For non-smokers we have p̂ns = 206/543 = 0.38. For a 95% confidence interval, we have z∗ = 1.96. The confidence interval is

(p̂s − p̂ns) ± z∗ · √(p̂s(1 − p̂s)/ns + p̂ns(1 − p̂ns)/nns)
(0.28 − 0.38) ± 1.96 · √(0.28(0.72)/135 + 0.38(0.62)/543)
−0.10 ± 0.086
−0.186 to −0.014
We are 95% confident that the difference in pregnancy rates between smoking and non-smoking women during the first cycle of trying to get pregnant is between −0.186 and −0.014. Since 0 (representing no difference) is not in this interval, there appears to be a significant difference between smokers and non-smokers in pregnancy success rates.
(b) The hypotheses are

H0 : ps = pns
Ha : ps ≠ pns

In addition to the sample proportions computed in part (a), we need the pooled proportion. We see that out of the 678 women attempting to become pregnant 244 succeeded in their first cycle, so p̂ = 244/678 = 0.36. The test statistic is

z = (p̂s − p̂ns)/√(p̂(1 − p̂)(1/ns + 1/nns)) = (0.28 − 0.38)/√(0.36(0.64)(1/135 + 1/543)) = −2.17
This is a two-tail test, so the p-value is twice the area below −2.17 in a standard normal distribution. We see that the p-value is 2(0.015) = 0.030. At a 5% level, we reject H0 . There is evidence that smokers have less success getting pregnant than non-smokers.
(c) Although the results are significant, the data come from an observational study rather than an experiment. There may be confounding variables and we cannot conclude that there is a causal relationship.
C.102 This is a hypothesis test for a difference in means. Using μR for the average rating with a red background and μW for the average rating with a white background, the hypotheses for the test are:

H0 : μR = μW
Ha : μR > μW
The relevant statistic for this test is x̄R − x̄W, where x̄R represents the mean rating in the sample with the red background and x̄W represents the mean rating in the sample with the white background. The relevant null parameter is zero, since from the null hypothesis we have μR − μW = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((x̄R − x̄W) − 0)/√(sR²/nR + sW²/nW) = (7.2 − 6.1)/√(0.6²/15 + 0.4²/12) = 5.69
This is a right-tail test, so using the t-distribution with df = 11, we see that the p-value is less than 0.0001. This is an extremely small p-value, so we reject H0 and conclude that there is strong evidence that average attractiveness rating is higher with the red background.
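A two-sample t computation like this one is easy to script from the summary statistics. The sketch below uses Python with scipy and the conservative degrees of freedom (the smaller sample size minus 1), matching the text's approach; the function name is ours.

```python
from math import sqrt
from scipy.stats import t as t_dist

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t-statistic with conservative df = min(n1, n2) - 1."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t_stat = (x1 - x2) / se
    df = min(n1, n2) - 1
    p_value = t_dist.sf(t_stat, df)      # upper-tail p-value
    return t_stat, df, p_value

# Red background: mean 7.2, sd 0.6, n 15; white background: mean 6.1, sd 0.4, n 12
print(two_sample_t(7.2, 0.6, 15, 6.1, 0.4, 12))   # t close to 5.69, df 11, tiny p-value
```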
C.103 (a) The sample is the 2,006 randomly selected US adults. The intended population is all US adults.
(b) This is an observational study and we cannot make causal conclusions from this study.
(c) This is a hypothesis test for a difference in means. Using μS for the mean number of close confidants for those using a social networking site and μN for the average number for those not using a social networking site, the hypotheses for the test are:

H0 : μS = μN
Ha : μS > μN

The relevant statistic for this test is x̄S − x̄N, the difference in means for the samples. The relevant null parameter is zero, since from the null hypothesis we have μS − μN = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((x̄S − x̄N) − 0)/√(sS²/nS + sN²/nN) = (2.5 − 1.9)/√(1.4²/947 + 1.3²/1059) = 9.91
This is a right tail test, so using the t-distribution with df = 947 we see that the p-value is essentially zero. (Indeed, the test statistic is almost 10 standard deviations above the mean!) This is an extremely small p-value, so we reject H0 and conclude that there is very strong evidence that those with a profile on a social networking site tend, on average, to have more close confidants. (d) There are many possible confounding variables. Remember that a confounding variable is one with a likely association with both variables of interest (in this case, number of close confidants and whether or not the person is on a social networking site). One possible confounding variable is age, while another is gender, and still another is how socially active and extroverted a person is. Other answers are possible. C.104 (a) For the cardiac arrest patients, the proportion is p̂A = 11/116 = 0.095. There are 1595 − 116 = 1479 other patients and 27 − 11 = 16 of them reported a near-death experience, so for the patients with other cardiac problems, the proportion is p̂B = 16/1479 = 0.011.
(b) Letting pA be the proportion of cardiac arrest patients reporting a near-death experience and pB be the proportion of other heart patients reporting a near-death experience, the hypotheses are

H0 : pA = pB
Ha : pA > pB

The sample sizes are large enough to use the normal distribution. To compute the test statistic, we need three sample proportions. In addition to p̂A and p̂B from part (a), we also compute the pooled proportion. We see that p̂ = 27/1595 = 0.017. The test statistic is

z = (p̂A − p̂B)/√(p̂(1 − p̂)(1/nA + 1/nB)) = (0.095 − 0.011)/√(0.017(0.983)(1/116 + 1/1479)) = 6.74
This is an upper-tail test, so the p-value is the area above 6.74 in a normal distribution. This is almost seven standard deviations above the mean, so we see that the p-value is essentially zero. There is very strong evidence that near-death experiences are significantly more likely for people who have cardiac arrest than for people with other heart problems.
C.105 We estimate μf − μm where μf represents the mean time spent exercising for females and μm represents the mean time spent exercising for males. For females, we have x̄f = 6.40 with sf = 4.60 and nf = 10. For males, we have x̄m = 6.81 with sm = 3.83 and nm = 26. The data are relatively symmetric with no extreme outliers so we use a t-distribution. For a 95% confidence interval with df = 9, we have t∗ = 2.26. The 95% confidence interval is

(x̄f − x̄m) ± t∗ · √(sf²/nf + sm²/nm)
(6.40 − 6.81) ± 2.26 · √(4.60²/10 + 3.83²/26)
−0.41 ± 3.70
−4.11 to 3.29

We are 95% confident that senior females average between 4.11 hours less and 3.29 hours more of exercise per week than senior males.
C.106 (a) We combine the Training and Both categories for those who received training and we combine the Medication and Neither categories for those who did not receive training. For improvement, we combine the Much and Some categories for those who received any improvement at all: the Yes category. The results are shown in the table.

                 Any Improvement
               Yes     No    Total
Training        28      9       37
No training     10     25       35
Total           38     34       72
(b) This is a test for a difference in proportions. Letting pT represent the proportion of insomniacs getting any improvement from training and pN represent the proportion of insomniacs getting any improvement with no training, the hypotheses are

H0 : pT = pN
Ha : pT > pN
To compute the test statistic, we need three sample proportions. We have p̂T = 28/37 = 0.757 and p̂N = 10/35 = 0.286. To compute the pooled proportion, we see that a total of 38 people had some improvement out of 72 people in the study, so the pooled proportion is p̂ = 38/72 = 0.528. The test statistic is

z = (p̂T − p̂N)/√(p̂(1 − p̂)(1/nT + 1/nN)) = (0.757 − 0.286)/√(0.528(0.472)(1/37 + 1/35)) = 4.0
This is an upper-tail test, so the p-value is the area above 4.0 in a standard normal distribution. The p-value is 0.00003, or essentially zero. There is very strong evidence that training in behavioral modifications helps older people fight insomnia.
C.107 Letting μ represent the mean pulse rate of ICU patients, we are testing the hypotheses

H0 : μ = 80
Ha : μ > 80

The test statistic is

t = (x̄ − μ0)/(s/√n) = (98.9 − 80)/(26.8/√200) = 9.97
Using a t-distribution with 199 df, we see that the p-value is essentially zero. There is strong evidence that ICU patients have a mean pulse rate higher than 80 beats per minute.
C.108 This is a hypothesis test for a mean, and the data appear to be normal enough that we can use the t-distribution for the test. Letting μ represent the mean body temperature for this person, the hypotheses are

H0 : μ = 98.6
Ha : μ ≠ 98.6

The summary sample statistics are x̄ = 98.4 with s = 0.49 and n = 12. The test statistic is

t = (x̄ − μ0)/(s/√n) = (98.4 − 98.6)/(0.49/√12) = −1.41
This is a two-tail test so the p-value is twice the area below −1.41 in a t-distribution with df = 11. We see that the p-value is 2(0.093) = 0.186. This is not a small p-value so if these readings are a random sample of the person's body temperatures throughout the day, then there is not convincing evidence that the mean body temperature for this person is different from 98.6°F.
C.109 Letting pF represent the proportion of men getting prostate cancer while taking finasteride and pC represent the proportion of men getting prostate cancer in the control group taking the placebo, the hypotheses are

H0 : pF = pC
Ha : pF < pC
To compute the test statistic, we need three sample proportions. We have p̂F = 804/4368 = 0.184 and p̂C = 1145/4692 = 0.244. To compute the pooled proportion, we see that a total of 804 + 1145 = 1949 men got prostate cancer out of a total of 4368 + 4692 = 9060 men in the study. The pooled proportion is p̂ = (804 + 1145)/(4368 + 4692) = 1949/9060 = 0.215. The test statistic is

z = (p̂F − p̂C)/√(p̂(1 − p̂)(1/nF + 1/nC)) = (0.184 − 0.244)/√(0.215(0.785)(1/4368 + 1/4692)) = −6.95
This is a lower-tail test, so the p-value is the area below −6.95 in a normal distribution. This is almost seven standard deviations below the mean, so we see that the p-value is essentially zero. There is very strong evidence that men who take finasteride are less likely to develop prostate cancer.
C.110 (a) Placebo-controlled means participants are given a treatment that is as close to the real treatment as feasible; in this case, dark chocolate which has had the flavonoids removed.
(b) This is a hypothesis test for a difference in means. Using μC for the mean increase in flow-mediated dilation for people eating dark chocolate every day and μN for the mean increase in flow-mediated dilation for people eating a dark chocolate substitute each day, the hypotheses for the test are

H0 : μC = μN
Ha : μC > μN

The relevant statistic for this test is x̄C − x̄N, the difference in means for the two samples. The relevant null parameter is zero, since from the null hypothesis we have μC − μN = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((x̄C − x̄N) − 0)/√(sC²/nC + sN²/nN) = (1.3 − (−0.96))/√(2.32²/11 + 1.58²/10) = 2.63
This is an upper-tail test, so using the t-distribution with df = 9, we see that the p-value is 0.014. This is a reasonably small p-value, so we reject H0 and conclude that there is evidence that dark chocolate improves vascular health. The results are significant at a 5% level but not at a 1% level. (This is not surprising given the very small sample sizes. The fact that the results are significant at all is pretty impressive.)
(c) Yes, the results are significant and come from a randomized experiment.
C.111 This is a test for a mean. If we let μ be the average age of honeybee scouts, we are testing H0 : μ = 12 vs Ha : μ > 12. The standardized test statistic is

t = (x̄ − μ0)/(s/√n) = (29.1 − 12)/(5.6/√50) = 21.6
This t-statistic is very large, so we know that the p-value is essentially zero. There is very strong evidence that scout bees are older, on average, than the general population of all honeybees. C.112 (a) We test H0 : μ1 = μ2 vs Ha : μ1 > μ2 where μ1 and μ2 are the mean AAMP scores for athletes using Tribulus and those not using it, respectively. We proceed with some caution since the
sample sizes are rather small and the article did not include any information about possible outliers or skewness in the data. The relevant test statistic is

t = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2) = (1305.6 − 1255.9)/√(177.3²/20 + 66.8²/12) = 1.13

Using a t-distribution with 11 df, we find that the p-value in the upper tail is 0.141. We do not have sufficient evidence to conclude that athletes using Tribulus have a higher mean AAMP score than those not using the supplement.
(b) To compare the mean AAMP using measurements on the same subjects (the 20 athletes in the experimental group) before and after using the Tribulus supplement we would need the paired data to find the difference for each subject. There is no way to recover the standard deviation of the differences (and complete the test) by knowing just the mean and standard deviation for all 20 participants before and after using the supplement.
C.113 (a) The sample proportion is p̂ = 15/90 = 0.167. The standard error is calculated as

SE = √(p̂(1 − p̂)/n) = √(0.167(1 − 0.167)/90) = 0.039
(b) Since we have more than 10 cases (15 is the smallest) in each group, the Central Limit Theorem applies.
(c) Using our answers to parts (a) and (b), and the fact that for a 95% confidence interval we have z∗ = 1.96, we calculate

p̂ ± z∗ · SE = 0.167 ± 1.96(0.039) = 0.167 ± 0.076 = (0.091, 0.243)

We are 95% sure that between 9.1% and 24.3% of houses for sale in these three Mid-Atlantic states are larger than 2400 sq. ft. in size.
C.114 (a) For Mid-Atlantic houses the count in both groups (large and small) is greater than 10, so the normal distribution is appropriate. For the California houses we see only 3 big houses, so the normal distribution may not be appropriate.
(b) We can use the test based on the normal distribution. Using p to represent the proportion of Mid-Atlantic homes larger than the national average, we are testing H0 : p = 0.25 vs Ha : p < 0.25. The sample proportion for big homes in the Mid-Atlantic states is p̂ = 15/90 = 0.167. We calculate the test statistic

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.167 − 0.25)/√(0.25(1 − 0.25)/90) = −1.83
The area in a standard normal distribution below z = −1.83 gives p-value = 0.034. This is small enough to reject H0 at a 5% significance level. We have sufficient evidence to show that the proportion of big houses for sale in these Mid-Atlantic states is less than 25%. (c) Because the CLT may not apply (even at a null value of p = 0.25 we only have np = 7.5 < 10), we return to the methods of Chapter 4 and perform a randomization test to test the California proportion. The observed sample proportion is p̂ = 3/30 = 0.100. We use StatKey or other technology to generate a randomization distribution assuming p = 0.25 and find the proportion of simulated randomizations yielding sample proportions less than or equal to 0.10.
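A simulation along those lines takes only a few lines of code. The sketch below is one possible implementation, not the StatKey procedure itself: it simulates samples of size 30 under p = 0.25 and estimates the lower-tail p-value. The number of simulations and the seed are our choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed chosen arbitrarily for reproducibility
n, p_null, observed = 30, 0.25, 3 / 30

# Simulate 5000 samples of 30 houses under H0: p = 0.25
sim_props = rng.binomial(n, p_null, size=5000) / n

# Proportion of randomization samples at or below the observed proportion
p_value = np.mean(sim_props <= observed)
print(p_value)   # typically in the neighborhood of 0.03 to 0.04
```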
The proportion of randomization samples at or below 0.10 is 0.036, so our p-value = 0.036. At a 5% level, we reject the null hypothesis, meaning we have sufficient evidence that the proportion of California houses considered big is less than 25%. Note that the p-values are similar in parts (b) and (c), even though the California proportion is much farther from 0.25, because its sample size is much smaller than that of the Mid-Atlantic states.
C.115 (a) The sample difference is our best point estimate

p̂M − p̂C = 15/90 − 3/30 = 0.167 − 0.100 = 0.067

(b) The number of large houses in California is too small for the Central Limit Theorem to apply (3 < 10), so we return to the methods of Chapter 3 and use StatKey or other technology to create a bootstrap distribution by taking samples of size 90 and 30 with replacement from the original samples of Mid-Atlantic homes for sale and California homes for sale.
The bootstrap distribution is approximately symmetric. We use the percentile method, keeping the middle 90%, and get a 90% confidence interval of (−0.056, 0.167). We are 90% sure that the proportion of big homes in Mid-Atlantic states is between 0.056 lower and 0.167 higher than the proportion of big homes in California.
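The bootstrap just described is also straightforward to simulate directly. The sketch below is one way to do it in Python; the counts come from the exercise, while the number of resamples and the seed are our own choices rather than anything from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Original samples coded as 1 = big house, 0 = not big
mid_atlantic = np.array([1] * 15 + [0] * 75)   # 15 big out of 90
california = np.array([1] * 3 + [0] * 27)      # 3 big out of 30

boot_diffs = []
for _ in range(5000):
    ma_resample = rng.choice(mid_atlantic, size=90, replace=True)
    ca_resample = rng.choice(california, size=30, replace=True)
    boot_diffs.append(ma_resample.mean() - ca_resample.mean())

# Percentile method: keep the middle 90% of the bootstrap differences
lo, hi = np.percentile(boot_diffs, [5, 95])
print(round(lo, 3), round(hi, 3))   # roughly (-0.06, 0.17)
```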
(c) Since the 90% confidence interval for the difference in proportions (−0.056, 0.167) contains zero, we do not have clear evidence that the proportion of big homes is different between the two locations, even at a 10% significance level.
C.116 This test of a difference in means is testing the null hypothesis that the Patriots' average air pressure and the Colts' average air pressure were the same (H0 : μp = μc), against the alternative that the Patriots' average air pressure was less than the Colts' (Ha : μp < μc). The test statistic is

t = (x̄p − x̄c)/√(sp²/np + sc²/nc) = (11.10 − 12.63)/√(0.40²/11 + 0.12²/4) = −11.36

Comparing to a t-distribution with 3 degrees of freedom results in a p-value of 0.00073. So we conclude that the average air pressure of the New England Patriots' balls was significantly less than the average air pressure of the Indianapolis Colts' balls.
C.117 (a) For a 99% confidence interval with df = 37, we have t∗ = 2.72. The confidence interval is

x̄ ± t∗ · s/√n
111.7 ± 2.72 · 144/√38
111.7 ± 63.5
48.2 to 175.2
The best estimate for the mean time to infection for all kidney dialysis patients is 111.7 days with a margin of error of 63.5. We are 99% confident that the mean time to infection for all kidney dialysis patients is between 48.2 days and 175.2 days.
(b) Both 24 days and 165 days are reasonable values for individual patients. Both values lie well within one standard deviation (s = 144) of the estimated mean of 111.7 days. In fact, the actual data in the sample range from 2 days to 536 days. Remember that the confidence interval is an interval for the mean of the population, not a range for individual values.
(c) Since the confidence interval goes from 48.2 to 175.2, it would be implausible for the mean in the population to be as small as 24 days, but a mean of 165 days would be a plausible value for the population.
C.118 While there are some possible outliers in the 80s age range, they are not extreme and otherwise the data look reasonable, so we proceed using the t-distribution.
(a) For the 13 teenage patients, we compute x̄T = 126.15 with sT = 19.57. With df = 12, we have t∗ = 2.18. The 95% confidence interval is

x̄T ± t∗ · sT/√nT
126.15 ± 2.18 · 19.57/√13
126.15 ± 11.83
114.32 to 137.98

We are 95% confident that the mean systolic blood pressure reading for Intensive Care Unit teenage patients is between 114.32 and 137.98.
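To check a t-interval like the ones just computed, a few lines of Python will do. The sketch below reproduces the teenage-patient interval from its summary statistics; the helper function is our own shorthand, not something from the text.

```python
from math import sqrt
from scipy.stats import t as t_dist

def t_interval(mean, sd, n, confidence=0.95):
    """Confidence interval for a mean using the t-distribution."""
    t_star = t_dist.ppf(1 - (1 - confidence) / 2, df=n - 1)
    me = t_star * sd / sqrt(n)               # margin of error
    return mean - me, mean + me

print(t_interval(126.15, 19.57, 13))   # about (114.3, 138.0)
```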
For the 15 patients in their eighties, we compute x̄E = 132.27 with sE = 31.23. With df = 14, we have t∗ = 2.14. The 95% confidence interval is

x̄E ± t∗ · sE/√nE
132.27 ± 2.14 · 31.23/√15
132.27 ± 17.26
115.01 to 149.53
We are 95% confident that the mean systolic blood pressure reading for Intensive Care Unit patients in their 80s is between 115.01 and 149.53. The margin of error is larger for the patients in their 80s. Although the sample size is slightly larger for this group, the variability is quite a bit larger, causing the margin of error to be larger.
(b) If μT represents the mean systolic blood pressure reading of ICU teenage patients and μE represents the mean systolic blood pressure reading for patients in their 80s, the hypotheses are

H0 : μT = μE
Ha : μT ≠ μE

Using the summary statistics given in part (a), we compute the test statistic:

t = (x̄T − x̄E)/√(sT²/nT + sE²/nE) = (126.15 − 132.27)/√(19.57²/13 + 31.23²/15) = −0.63

This is a two-tail test, so the p-value is two times the area below −0.63 in a t-distribution with df = 12. We see that the p-value is 2(0.270) = 0.540. This is a very large p-value so there is no clear evidence of a difference in blood pressure between the two age groups.
C.119 This is a hypothesis test for a difference in means. Using μL for the average weight gain of mice with light at night and μD for the average weight gain of mice with darkness at night, the hypotheses for the test are:

H0 : μL = μD
Ha : μL > μD
The relevant statistic for this test is x̄L − x̄D, where x̄L represents the mean weight gain of the sample mice in light and x̄D represents the mean weight gain of the sample mice in darkness. The relevant null parameter is zero, since from the null hypothesis we have μL − μD = 0. The t-test statistic is:

t = (Sample statistic − Null parameter)/SE = ((x̄L − x̄D) − 0)/√(sL²/nL + sD²/nD) = (9.4 − 5.9)/√(3.2²/19 + 1.9²/8) = 3.52
This is a right tail test, so using the t-distribution with df = 7, we see that the p-value is 0.005. This is quite a small p-value so we reject H0 . There is strong evidence that mice with light at night gain significantly more weight on average than mice with darkness at night.
C.120 These are paired data so we need to compute the differences, d = Posttest − Pretest, given below.

17.5, 5, 7.5, 15, 5, 25, 15, 17.5, 15, 17.5
The mean of these improvement differences is x̄d = 14.0 with a standard deviation of sd = 6.37. The distribution of the differences is relatively symmetric with no big outliers so we use the t-distribution to find the confidence interval for the mean difference, with t∗ = 2.262 for 95% confidence with df = 9.

x̄d ± t∗ · sd/√n
14.0 ± 2.262 · 6.37/√10
14.0 ± 4.56
9.44 to 18.56
Based on these data we are 95% sure that the mean improvement between the CAOS posttest and pretest scores for this instructor's students is between 9.44 and 18.56 points.
C.121 (a) We test H0 : μ2 = 54 vs Ha : μ2 > 54 where μ2 is the mean score for all of the instructor's students on the CAOS posttest. The mean for the sample of 10 students is x̄2 = 60.25 with standard deviation s2 = 9.96. The relevant t-statistic is

t = (60.25 − 54.0)/(9.96/√10) = 1.98
A dotplot of the posttest scores is relatively symmetric with no strong outliers so we use a t-distribution with 9 df to find the upper-tail p-value = 0.040. Since this p-value is less than 5%, we reject H0 and have evidence that the mean score for this instructor's students on the CAOS posttest is higher than the benchmark mean of 54.0 points.
(b) We test H0 : μ1 = 44.9 vs Ha : μ1 > 44.9 where μ1 is the mean score for all of the instructor's students on the CAOS pretest. The mean for the sample of 10 students is x̄1 = 46.25 with standard deviation s1 = 9.30. The relevant t-statistic is

t = (46.25 − 44.9)/(9.30/√10) = 0.46
A dotplot of the pretest scores is relatively symmetric with no strong outliers so we use a t-distribution with 9 df to find the upper-tail p-value = 0.328. Since this p-value is quite large, we do not find sufficient evidence to reject H0. The mean score for this instructor's students on the CAOS pretest is not significantly more than the benchmark mean of 44.9 points.
(c) We test H0 : μd = 9.1 vs Ha : μd > 9.1 where μd is the mean improvement (Posttest − Pretest) for all of the instructor's students on the CAOS exams. To test this hypothesis, we need to compute the differences for the 10 students in the sample.

17.5, 5, 7.5, 15, 5, 25, 15, 17.5, 15, 17.5

The mean improvement for the sample of 10 students is x̄d = 14.0 with standard deviation sd = 6.37. The relevant t-statistic is

t = (14.0 − 9.1)/(6.37/√10) = 2.43
A dotplot of the improvement differences is relatively symmetric with no strong outliers so we use a t-distribution with 9 df to find the upper-tail p-value = 0.019. Since this p-value is less than 5%, we reject H0 and have evidence that the mean improvement for this instructor's students on the CAOS exam is higher than the benchmark mean of 9.1 points.
C.122 The sample includes 240 females and 260 males, both large enough that we needn't be concerned about problems due to lack of normality. The estimated mean commuting times are quite close, x̄f = 21.6 minutes for women and x̄m = 22.3 minutes for men, giving a difference of just 0.7 minutes longer (on average) for men's commutes. If we test for a difference in mean commute time with H0 : μf = μm vs Ha : μf ≠ μm we find a lack of evidence for much difference in the means (p-value = 0.585). Based on the confidence interval, we are 95% sure that female commuters in St. Louis average between 3.2 minutes less and 1.8 minutes more in commute time compared to males.
Contrary to the St. Louis commute results, the p-value in this case is small (0.014) so we have evidence of a difference in mean commute time by gender in Atlanta. From the confidence interval, we are 95% sure that the mean commute time for females in Atlanta is between 8.14 and 0.93 minutes less than the mean commute time for males in Atlanta. C.124 This question suggests a hypothesis test to compare the means commute times between trips made with the carbon and steel bikes. The relevant hypotheses are H0 : μc = μs vs Ha : μc = μs , where μc and μs are the mean commute times (in minutes) for the carbon and steel bikes. Using technology and the data in the M inutes variable of BikeCommute we find the summary statistics below for the samples using each type of bike along with boxplots to compare the distributions.
The distributions of commute times in both samples are relatively symmetric and show no outliers, so we use a two-sample t-test with 26 − 1 = 25 degrees of freedom to compare the means. The t-test statistic is

t = (x̄c − x̄s)/√(sc²/nc + ss²/ns) = (108.34 − 107.79)/√(6.25²/26 + 4.86²/30) = 0.36

For the two-tailed alternative, we double the area beyond 0.36 for a t-distribution with 25 degrees of freedom to get a p-value = 2(0.3609) = 0.7218. This is not at all a small p-value, so we fail to find evidence that the mean commute time differs between the two types of bikes. The difference in mean times for these samples was about 0.55 minutes (or about 33 seconds), and the older, cheaper, steel bike had the smaller (sample) mean time.
C.125 (a) If we let μh and μw denote the mean age at marriage for husbands and wives, respectively, the relevant hypotheses are H0 : μh = μw vs Ha : μh > μw. Since these are paired data, collected from n = 105 couples, we could also use the hypotheses H0 : μd = 0 vs Ha : μd > 0 where μd is the mean difference in age, Husband − Wife. If needed we use technology to compute the difference in age for each couple and then run a single sample t-test to see if the mean difference is greater than zero. For this sample husbands are older by an average of x̄d = 2.83 years with a standard deviation of sd = 5.00. The t-statistic is 5.80 and the p-value for the upper-tail test is essentially zero. This gives very strong evidence that on average the husbands are older.
(b) Using the data in MarriageAges we find that the husband is older than the wife in 75 of the 105 marriages or 71.4% of the time. To see if this provides evidence that the proportion is bigger than 0.5 for the entire population we use technology to test H0 : p = 0.5 vs Ha : p > 0.5. The z-statistic for the test is z = 4.39 and the p-value is essentially zero. This gives very strong evidence that the husband is older than the wife in more than 50% of recently married couples in St. Lawrence County.
(c) Using technology with the differences from part (a) we find that a 95% confidence interval for the difference in mean ages goes from 1.87 to 3.80 years. We are 95% sure that, on average, husbands are between 1.87 and 3.80 years older than their wives. Using the fact that the husband was older in 75 out of 105 couples sampled, technology indicates that a 95% confidence interval for the proportion goes from 0.618 to 0.798. We are 95% sure that the husband is older than his wife in between 61.8% and 79.8% of newly married couples in St. Lawrence County.
C.126 (a) Using the data in MarriageAges we find that the mean age at marriage for the 105 wives is x̄w = 31.8 years old with an interval that says we are 95% sure the mean age for wives in the population is between 29.8 and 33.9 years old.
(b) For the husbands the mean age on marriage licenses in the sample is x̄h = 34.7 years old with a 95% confidence interval for the mean age of husbands in the population going from 32.3 to 37.0 years old.
(c) You might be tempted to use the fact that the 95% confidence intervals for wives (29.8, 33.9) and husbands (32.3, 37.0) overlap to conclude that the difference between the mean ages is not significant. This would be wrong on two counts. First, the separate intervals do not take into account the pairing of husbands and wives in the data. Second, just because there is some overlap at the ends of the two intervals, we cannot conclude a common mean is simultaneously plausible for both groups.
In fact, a paired data t-test for these data gives strong evidence that the mean age is higher for husbands.
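Since the paired analysis in C.125 and C.126 is simply a one-sample t-test on the differences, here is a small illustration in Python. The MarriageAges data are not reproduced in this manual, so the arrays below are hypothetical stand-ins; only the structure of the calculation mirrors the discussion above.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical husband/wife ages for a handful of couples (not the MarriageAges data)
husband = np.array([34, 29, 41, 55, 23, 38, 47, 31])
wife = np.array([31, 27, 44, 50, 22, 35, 45, 30])

diffs = husband - wife                      # paired differences, Husband - Wife
result = ttest_1samp(diffs, popmean=0, alternative="greater")
print(diffs.mean(), result.statistic, result.pvalue)
```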
C.127 These data are paired, so we use only the Difference variable to do paired difference in means inference. Using a software package and the variable Difference in the dataset TrafficFlow, we obtain the following output from one statistics package:

One-Sample T: Difference
Test of mu = 0 vs not = 0

Variable      N   Mean  StDev  SE Mean        95% CI         T      P
Difference   24  61.00  15.19     3.10  (54.59, 67.41)   19.68  0.000
The mean difference is 61.0 minutes. Since the differences are positive, and the subtraction went as Difference = Timed − Flexible, we see that the timed method has higher mean times in this sample compared to the flexible times. We see that a 95% confidence interval is 54.59 to 67.41. The flexible method reduces delay time by an average of between 54.59 and 67.41 minutes in simulations. We see from the p-value of 0.000 that the difference is significant: the flexible system is clearly better than the timed system. In order for this inference to be valid, note that we must assume that the simulations run by the engineers are a representative sample of all types of traffic flow on these streets.
C.128 The relevant hypotheses are H0 : ρ = 0 vs Ha : ρ > 0, where ρ is the correlation between uniform malevolence and penalty minutes. We can construct randomization samples under this null hypothesis by randomly assigning the malevolence ratings to the standardized penalty minute values. Finding the correlations for 1000 of these randomization samples produces a dotplot such as the one shown below.
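The dotplot itself is not reproduced here, but the shuffling procedure is easy to simulate. The sketch below shows the idea in Python; the malevolence and penalty-minute values are placeholders rather than the actual NHL data, and the 1000 shuffles match the count mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Placeholder values standing in for malevolence ratings and standardized penalty minutes
malevolence = np.array([5.1, 4.7, 4.3, 4.0, 3.9, 3.6, 3.4, 3.2, 3.0, 2.8])
penalties = np.array([1.2, 0.9, 1.1, -0.2, 0.5, 0.1, -0.4, -0.6, -0.8, -1.0])

observed_r = np.corrcoef(malevolence, penalties)[0, 1]

# Randomization distribution: shuffle one variable to break any association
rand_corrs = np.array([
    np.corrcoef(malevolence, rng.permutation(penalties))[0, 1]
    for _ in range(1000)
])

se = rand_corrs.std()                        # SE of correlations under H0
p_value = np.mean(rand_corrs >= observed_r)  # upper-tail p-value
print(observed_r, se, p_value)
```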
The standard deviation of the correlations in this randomization distribution is SE = 0.222. Using the correlation from the original sample of r = 0.521 we obtain a standardized test statistic

z = (0.521 − 0)/0.222 = 2.35
The area above 2.35 in a N (0, 1) distribution, the upper tail p-value, is 0.0094. Since this value is quite small (less than 5%) we have strong evidence that there is a positive correlation between perceived uniform malevolence and penalty minutes for NHL teams. C.129 (a) The randomization distribution is formed under the assumption that the null hypothesis, H0 : ρ = 0, is true. A standard error computed from this distribution might not be a good estimate of the standard error for correlations when the “true” correlation is around 0.521. (b) Instead of using a randomization distribution to estimate the standard error, the student should find correlations for bootstrap samples (with replacement) from the original sample, then estimate the standard error based on the standard deviation of those bootstrap correlations.
(c) One set of 1000 bootstrap correlations is shown in the dotplot below.
Based on the standard deviation of these bootstrap correlations, we estimate the standard error for the sample correlations to be about SE = 0.235 (which is quite close to the standard deviation of 0.22 from the randomization distribution). For a 90% confidence interval, the standard normal endpoint is z ∗ = 1.645. This gives a confidence interval for the correlation of 0.521 ± 1.645 · 0.235 = 0.521 ± 0.387 = (0.134, 0.908) Thus we are 90% sure that the correlation between the malevolence rating of uniforms and the number of penalty minutes for NHL teams is somewhere between 0.134 and 0.908. Note that this is a very wide interval, although it does contain only positive values for the correlation. (d) The bootstrap distribution of correlations shows a clear left skew. A normal distribution is probably not appropriate in this case. We should question the validity of the confidence interval found in (c). C.130
(a) The standard deviation of the 5000 Phat values in RandomP50N200 is 0.0354.
(b) The sample proportion is p̂ = 84/200 = 0.42. The standardized test statistic is

z = (0.42 − 0.5)/0.0354 = −2.26
(c) If H0 : p = 0.5 is true, we expect (on average) half of the 200 spins, or a count of 100, to be heads.
(d) The standard deviation of the 5000 Count values in RandomP50N200 is 7.081.
(e) We see that

z = (84 − 100)/7.081 = −2.26
It is the same as that in (b). (f) Using technology, the area below −2.26 in a standard normal distribution is 0.012. Since this is a two-tailed test, we double that area to find p-value = 2 · 0.012 = 0.024. This is a small value, so we have fairly strong evidence that the proportion of heads when spinning a penny differs from 0.50. Therefore penny spinning is probably not a fair process.
Unit D: Essential Synthesis Solutions
D.1 This is a chi-square goodness-of-fit test. The null hypothesis is that the bills are equally distributed among the three servers, while the alternative hypothesis is that they are not equally spread out. In symbols the hypotheses are

H0 : pA = pB = pC = 1/3
Ha : Some pi ≠ 1/3
The expected count for each cell is n · pi = 157 · (1/3) = 52.33. We compute the chi-square statistic:

χ² = (60 − 52.33)²/52.33 + (65 − 52.33)²/52.33 + (32 − 52.33)²/52.33 = 1.124 + 3.068 + 7.898 = 12.09

The upper-tail p-value from a chi-square distribution with df = 2 is 0.002. This is a small p-value, so we reject H0 and find evidence that the bills are not equally distributed between the three servers. Server C appears to have substantially fewer bills than expected if they were equally distributed, and the result is significant enough to generalize (assuming the sample data are representative of all bills).
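To verify a goodness-of-fit calculation like this one with software, scipy's chisquare function works directly from the observed counts (60, 65, and 32 for servers A, B, and C, as in the table for D.3); when no expected frequencies are supplied it assumes equal expected counts, which is exactly the null hypothesis here.

```python
from scipy.stats import chisquare

observed = [60, 65, 32]          # bills for servers A, B, C
result = chisquare(observed)     # default expected counts are equal (157/3 = 52.33 each)
print(result.statistic, result.pvalue)   # about 12.09 and 0.002
```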
D.2 This is a chi-square goodness-of-fit test. The null hypothesis is that the bills are equally distributed between the five days of the week over which the data were collected, while the alternative hypothesis is that they are not equally spread out. In symbols the hypotheses are

H0 : pm = ptu = pw = pth = pf = 1/5
Ha : Some pi ≠ 1/5
The expected count for each cell is n · pi = 157 · (1/5) = 31.4. We compute the chi-square statistic:

χ² = (20 − 31.4)²/31.4 + (13 − 31.4)²/31.4 + (62 − 31.4)²/31.4 + (36 − 31.4)²/31.4 + (26 − 31.4)²/31.4
   = 4.139 + 10.782 + 29.820 + 0.674 + 0.929
   = 46.344
This is a very large chi-square statistic and the p-value from the upper tail of a chi-square distribution with df = 4 is essentially zero. We reject H0 and find strong evidence that the bills are not equally distributed across the days of the week. There are a particularly large number of bills on Wednesday and a particularly small number on Tuesday. If this is a random sample of bills from this restaurant, it would be very unlikely to see this much difference between the daily counts if all days were equally popular, so there is strong evidence that business at the restaurant is not equally spread across the days.
D.3 This is a chi-square test on a two-way table. The null hypothesis is that use of a credit card does not differ based on the server and the alternative hypothesis is that there is an association between card use and server. The expected count for the (Cash, Server A) cell is (60 · 106)/157 = 40.51. For the (Cash, Server A) cell, we then compute the contribution to the chi-square statistic as (39 − 40.51)²/40.51 = 0.056. Finding the other expected counts and contributions similarly (or using technology) produces a table with the observed counts in each cell, expected counts below them, and the contributions to the chi-square statistic below that, as in the computer output below.
            A         B         C
Cash       39        50        17
           40.51     43.89     21.61
           0.0563    0.8520    0.9816

Card       21        15        15
           19.49     21.11     10.39
           0.1169    1.7708    2.0401

Cell Contents:      Count
                    Expected count
                    Contribution to Chi-square
Pearson Chi-Square = 5.818, DF = 2, P-Value = 0.055

Adding up all the contributions to the chi-square statistic, we obtain χ² = 5.818 (also seen in the computer output). Using the upper tail of a chi-square distribution with df = 2, we obtain a p-value of 0.055. This is a borderline p-value but, at a 5% level, does not provide enough evidence of an association between server and the use of cash or credit.
D.4 This is a chi-square test on a two-way table. The null hypothesis is that use of a credit card does not differ based on the day and the alternative hypothesis is that there is an association between day and card use. The expected count for the (Cash, Monday) cell is (20 · 106)/157 = 13.50. For the (Cash, Monday) cell, we can then compute the contribution to the chi-square statistic as (14 − 13.50)²/13.50 = 0.019. Finding the other expected counts and contributions similarly (or using technology) produces a table with the observed counts in each cell, expected counts below them, and the contributions to the chi-square statistic below that, as in the computer output below.

           Mon       Tues      Wed       Thurs     Fri
Cash       14         5        41        24        22
           13.50      8.78     41.86     24.31     17.55
           0.0183     1.6254   0.0177    0.0038    1.1260

Card        6         8        21        12         4
            6.50      4.22     20.14     11.69      8.45
           0.0380     3.3783   0.0367    0.0080    2.3403

Cell Contents:      Count
                    Expected count
                    Contribution to Chi-square

Pearson Chi-Square = 8.592, DF = 4, P-Value = 0.072
* NOTE * 1 cells with expected counts less than 5
Adding up all the contributions to the chi-square statistic, we obtain χ² = 8.592 (also seen in the computer output). However, notice that the expected count for (Tuesday, Credit) is 4.22, so one cell is (just barely) less than 5. (Notice that the bottom line of the computer output warns us of this problem.) The conditions for conducting a chi-square test are not quite met. Since the conditions are just barely not met, we might proceed with the test but with caution. However we see in the computer output that the p-value is 0.072, which is borderline significant, and a large part of the contribution to the chi-square statistic comes from the (Tuesday, Credit) cell. We might be better off doing a randomization test with these data.
D.5 This is an analysis of variance test for a difference in means. The sample sizes of the three groups (servers) are all greater than 30, so the normality condition is met, and we see from the standard deviations that the condition of relatively equal variability is also met. We proceed with the ANOVA test. The null hypothesis is that the means are all the same (no association between tip percent and server) and the alternative hypothesis is that there is a difference in mean tip percent between the servers. In symbols, where μ represents the mean tip percentage, the hypotheses are

H0 : μA = μB = μC
Ha : Some μi ≠ μj
Using technology with the data in RestaurantTips we obtain the analysis of variance table shown below.

One-way ANOVA: PctTip versus Server

Source    DF      SS     MS     F      P
Server     2    83.1   41.6  2.19  0.115
Error    154  2917.9   18.9
Total    156  3001.1
We see that the F-statistic is 2.19 and the p-value for an F-distribution with 2 and 154 degrees of freedom is 0.115. This is not a small p-value, so we do not reject H0. We do not find convincing evidence of a difference in mean percentage tip between the three servers.
D.6 This is an analysis of variance test for a difference in means. The sample sizes of the three groups are all greater than 30, so the normality condition is met, and we see from the standard deviations that the condition of relatively equal variability is also met. We proceed with the ANOVA test. The null hypothesis is that the means are all the same (no association between bill and server) and the alternative hypothesis is that there is a difference in the mean size of the bill between the servers. In symbols, where μ represents the mean bill size, the hypotheses are

H0 : μA = μB = μC
Ha : Some μi ≠ μj
Using technology with the data in RestaurantTips we obtain the analysis of variance table shown below.

One-way ANOVA: Bill versus Server

Source    DF     SS    MS     F      P
Server     2    490   245  1.67  0.191
Error    154  22567   147
Total    156  23057
We see that the F-statistic is 1.67 and the p-value for an F-distribution with 2 and 154 degrees of freedom is 0.191. This is not a small p-value so we do not reject H0. We do not find convincing evidence of a difference between the servers in the mean size of the bill.
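A one-way ANOVA like the ones in D.5 and D.6 can also be run directly from raw data with scipy. The sketch below is generic rather than tied to the RestaurantTips file, so the group values shown are hypothetical placeholders for each server's tip percentages.

```python
from scipy.stats import f_oneway

# Hypothetical tip-percentage values for three servers (placeholders, not RestaurantTips)
server_a = [15.2, 18.0, 16.5, 20.1, 14.8]
server_b = [17.3, 19.5, 21.0, 16.2, 18.8]
server_c = [14.0, 15.5, 13.2, 16.8, 15.1]

f_stat, p_value = f_oneway(server_a, server_b, server_c)
print(f_stat, p_value)   # compare to the F-statistic and p-value in the ANOVA table
```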
D.7 (a) There appears to be a positive association in the data with larger parties tending to give larger tips. There is one outlier: the (generous) party of one person who left a $15 tip.
(b) We are testing H0 : ρ = 0 vs Ha : ρ > 0 where ρ is the correlation between number of guests and size of the tip. As always, the null hypothesis is that there is no relationship, while the alternative hypothesis is that there is a relationship between the two variables. The test statistic is

t = (r · √(n − 2))/√(1 − r²) = (0.504 · √155)/√(1 − 0.504²) = 7.265

This is a one-tailed test, so the p-value is the area above 7.265 in a t-distribution with df = n − 2 = 155. We see that the p-value is essentially zero. There is strong evidence of a significant positive linear relationship between these two variables. The tip does tend to be larger if there are more guests.
(c) No, we cannot assume causation since these data do not come from an experiment. An obvious confounding variable is the size of the bill, since the bill tends to be higher for more guests and higher bills generally tend to correspond to higher tips.
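The t-statistic for a correlation is simple enough to compute directly from r and n. The helper below, whose name is our own, does that and finds the upper-tail p-value using the values from part (b).

```python
from math import sqrt
from scipy.stats import t as t_dist

def correlation_t_test(r, n):
    """t-statistic and one-tailed p-value for H0: rho = 0 vs Ha: rho > 0."""
    t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
    p_value = t_dist.sf(t_stat, df=n - 2)
    return t_stat, p_value

print(correlation_t_test(0.504, 157))   # about t = 7.265 with an essentially zero p-value
```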
D.8 (a) For every additional guest, the predicted tip increases by about $1.31.
(b) The predicted tip amount for three guests is Tip = 1.1068 + 1.3087(Guests) = 1.1068 + 1.3087(3) = $5.03.
(c) For all tables of three guests at this restaurant, we can be 95% confident that the average tip is most likely somewhere between $4.57 and $5.49. For any one table with three guests (e.g., the table she is waiting on at the moment), we can be 95% confident that her tip will be between $0.86 and $9.20.
D.9 (a) We see that R² = 83.7%, which tells us that 83.7% of the variability in the amount of the tip is explained by the size of the bill.
(b) From the output the F-statistic is 797.87 and the p-value is 0.000. There is very strong evidence that this regression line Tip = −0.292 + 0.182 · Bill is effective at predicting the size of the tip.
D.10 (a) The regression equation is Tip = −0.252 + 0.184·Bill − 0.036·Guests. The predicted tip for three guests with a $30 bill is Tip = −0.252 + 0.184(30) − 0.036(3) = $5.16.
(b) The coefficient for Bill is b1 = 0.184, indicating that, after accounting for the number of guests, every dollar increase in the bill results in a $0.184 increase in predicted tip. The coefficient for Guests is b2 = −0.036, indicating that, after accounting for the size of the bill, each additional guest corresponds to a $0.036 decrease in predicted tip.
(c) The p-value for testing the coefficient of Bill is essentially zero. This indicates that Bill is a useful predictor of Tip in this model. The p-value for testing the coefficient of Guests is quite large at 0.727, which indicates that Guests is not a helpful predictor in this model.
(d) We see that R² = 83.7%, which tells us that 83.7% of the variability in the amount of the tip is explained by the model (the size of the bill and the number of guests).
(e) The hypotheses being tested by the ANOVA for regression are

H0 : Model is ineffective
Ha : Model is effective

or equivalently

H0 : β1 = β2 = 0
Ha : β1 ≠ 0 or β2 ≠ 0
From the output the F-statistic is 396.74 and the p-value is 0.000. There is very strong evidence that this fitted regression model, Tip = −0.292 + 0.184 · Bill − 0.036 · Guests, is effective at predicting the size of the tip at this restaurant. However, Guests is relatively ineffective in the model (p-value = 0.727), so we might consider removing it. It is also likely that Bill and Guests are highly correlated with each other so that Guests is really not needed in the model if Bill is in it.
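Fitting and testing a multiple regression model like this one is a few lines with the statsmodels package. The sketch below is meant to show the workflow rather than reproduce the RestaurantTips output, so the small data frame with Tip, Bill, and Guests columns is a hypothetical stand-in for the real data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data using the same column names as the exercise (not RestaurantTips)
df = pd.DataFrame({
    "Tip":    [2.5, 5.0, 3.5, 8.0, 1.5, 6.5, 4.0, 7.0],
    "Bill":   [15, 30, 20, 45, 10, 38, 24, 41],
    "Guests": [1, 2, 2, 4, 1, 3, 2, 3],
})

model = smf.ols("Tip ~ Bill + Guests", data=df).fit()
print(model.params)        # fitted coefficients b0, b1, b2
print(model.f_pvalue)      # p-value for the ANOVA test that the model is effective
print(model.rsquared)      # R-squared for the model
```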
Unit D: Review Exercise Solutions

D.11 The area above t = 1.36 gives a p-value of 0.0970, which is not significant at a 5% level.
D.12 The area above F = 7.42 gives a p-value of 0.0003, which is significant at a 5% level.
D.13 For five groups we have 5 − 1 = 4 degrees of freedom for the chi-square goodness-of-fit statistic. The area above χ² = 4.18 gives a p-value of 0.382, which is not significant at a 5% level.
D.14 For a one-predictor model (simple linear regression), we have n − 2 = 30 − 2 = 28 degrees of freedom for the t-distribution. The two-tail area beyond t = 2.89 gives a p-value of 2(0.0037) = 0.0074, which is significant at a 5% level.
D.15 For a difference in means ANOVA with k = 6 groups and an overall sample size of n = 100, we use an F-distribution with k − 1 = 6 − 1 = 5 numerator df and n − k = 100 − 6 = 94 denominator df. The area above F = 2.51 for this distribution gives a p-value of 0.035, which is significant at a 5% level.
D.16 For a 2 × 4 table, we have (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3 degrees of freedom for the chi-square distribution. The area above χ² = 6.83 gives a p-value of 0.078, which is not significant at a 5% level.
D.17 For a multiple regression model with k = 3 predictors, we have n − k − 1 = 26 − 3 − 1 = 22 degrees of freedom for the t-distribution to test an individual coefficient. Accounting for two tails and the area beyond t = 1.83 gives a p-value of 2(0.0404) = 0.0808, which is not significant at a 5% level.
D.18 For testing a correlation, we have n − 2 = 81 − 2 = 79 degrees of freedom for the t-distribution. The area below t = −4.51 gives a p-value of 0.00001 (essentially zero), which is very significant at a 5% level.
D.19 White blood cell count is a quantitative variable so this is a test for a difference in means between the three groups, which is analysis of variance for difference in means.
D.20 Whether or not a person develops AIDS is categorical, so this is a test between two categorical variables. A chi-square test for association is most appropriate.
D.21 Both of the relevant variables are quantitative, and we can use a test for correlation, a test for slope, or ANOVA for regression.
D.22 They are using many quantitative variables to develop a multiple regression model. To test its effectiveness, we use analysis of variance for regression.
D.23 The data are counts from the sample within different racial categories. To compare these with the known racial proportions for the entire city, we use a chi-square goodness-of-fit test with the census proportions as the null hypothesis.
D.24 This is a multiple regression model and we are interested in how significant one of the predictor variables is in the model. To test its significance, we use a t-test for the coefficient of the relevant variable.
D.25 They are using many quantitative variables to develop a multiple regression model. To test its effectiveness, we use ANOVA for regression.
D.26 The time it takes for a case to go to trial is a quantitative variable, and this is a test for a difference in means between the seven groups, which is ANOVA for difference in means.
D.27 Whether or not a case gets settled out of court is categorical, and the county is the other categorical variable, so this is a test between two categorical variables. A chi-square test for association is most appropriate.
D.28 The null hypothesis is that the proportion of sales for each of the cars is 1/3, while the alternative hypothesis is that at least one of the proportions is not 1/3. Total sales of the three cars are n = 22,274 + 21,385 + 20,808 = 64,467 so the expected count in each group is 64,467(1/3) = 21,489. If sales were exactly the same we would expect each model to sell 21,489 cars. We calculate the chi-square test statistic

χ² = Σ (observed − expected)²/expected
   = (22,274 − 21,489)²/21,489 + (21,385 − 21,489)²/21,489 + (20,808 − 21,489)²/21,489
   = 28.68 + 0.50 + 21.58
   = 50.76
Compared to a chi-square distribution with 2 degrees of freedom, we get a very small p-value ≈ 0. We conclude that the sales are different amongst these three models. We can go further to look at the contributions to the chi-square test statistic to conclude that Escapes are selling better, and Fusions are selling worse.
D.29 Since we are only interested in cases where one performed higher on the math and verbal sections, we ignore the students who scored the same on each. The relevant hypotheses are H0 : pm = pv = 0.5 vs Ha : Some pi ≠ 0.5, where pm and pv are the proportions with higher Math or Verbal SAT scores, respectively. The total number of students (ignoring the ties) is 205 + 150 = 355, so the expected count in each cell, assuming equally likely, is 355(0.5) = 177.5. We compute a chi-square test statistic as

χ² = (205 − 177.5)²/177.5 + (150 − 177.5)²/177.5 = 8.52

Using the upper tail of a chi-square distribution with 1 degree of freedom yields a p-value of 0.0035. At a 5% significance level, we conclude that students are not equally likely to have higher Math or Verbal scores. From the data, we see that students from this population are more likely to have a higher Math score. (Note that since there are only two categories after we eliminate the ties, we could have also done this problem as a z-test for a single proportion with H0 : p = 0.5.)
D.30 (a) To see if a chi-square distribution is appropriate, we find the expected count in each cell by multiplying the total for the row by the total for the column and dividing by the overall sample size (n = 156). These expected counts are summarized in the table below, and we see that the smallest expected counts of 5.2, in the (Rain, SF) and (Rain, SJ) cells, are both (barely) greater than five, so a chi-square distribution is reasonable.

          Rain   No Rain   Total
LA         5.6     19.4       25
SF         5.2     17.8       23
SD        19.1     65.9       85
SJ         5.2     17.8       23
Total     35      121        156
(b) The null hypothesis is that the distribution of rain/no rain days does not depend on the city and the alternative is that the rain/no rain distribution is related to the city. We have already calculated the expected counts in part (a), so we proceed to compute the chi-square test statistic by summing (observed − expected)²/expected for each cell:

χ² = (4 − 5.6)²/5.6 + (6 − 5.2)²/5.2 + (22 − 19.1)²/19.1 + ... + (20 − 17.8)²/17.8 = 2.52

Comparing χ² = 2.52 to the upper tail of a chi-square distribution with 3 degrees of freedom yields a p-value of 0.472.
(c) This is not a small p-value, so we do not have sufficient evidence to show that the proportion of rainy days is different among these four cities.
D.31 (a) To see if a chi-square distribution is appropriate, we find the expected count in each cell by multiplying the total for the row by the total for the column and dividing by the overall sample size (n = 85). These expected counts are summarized in the table below and we see that the smallest expected count of 5.2 (Rain, Fall) is (barely) greater than five, so a chi-square distribution is reasonable.
          Rain   No Rain   Total
Spring     5.4     15.6       21
Summer     5.7     16.3       22
Fall       5.2     14.8       20
Winter     5.7     16.3       22
Total     22       63         85
(b) The null hypothesis is that the distribution of rain/no rain days in San Diego does not depend on the season and the alternative is that the rain/no rain distribution is related to the season. We have already calculated the expected counts in part (a), so we proceed to compute the chi-square test statistic by summing (observed − expected)²/expected for each cell:

χ² = (5 − 5.4)²/5.4 + (0 − 5.7)²/5.7 + (6 − 5.2)²/5.2 + ... + (11 − 16.3)²/16.3 = 14.6

Comparing χ² = 14.6 to the upper tail of a chi-square distribution with 3 degrees of freedom yields a p-value of 0.002.
(c) This is a small p-value, so we reject the null hypothesis, indicating that there is a difference in the proportion of rainy days among the four seasons, and it appears the rainy season (with almost twice as many rainy days as expected if there were no difference) is the winter.
D.32 We test H0 : Home size is not related to state vs Ha : Home size is related to state. Since there are 30 homes from each state, the expected counts for all four "Smaller" cells are (30 · 68)/120 = 17.0 and the expected counts for all the "Larger" cells are (30 · 52)/120 = 13.0. Since all the expected counts are greater than 5, we proceed with the chi-square test. The chi-square statistic is
(15 − 17)2 (13 − 13)2 (15 − 17)2 + + ... + = 3.258 17 17 13
Using a χ2 distribution with df = 3, we find a p-value of 0.354. We do not find convincing evidence that location of the house for sale (at the level of the state it’s in) is related to the house being smaller or larger.
D.33
(a) The null hypothesis is that all the state average home prices are equal and the alternative is that at least two states have different means.
H0 : μCA = μNY = μNJ = μPA    Ha : Some μi ≠ μj
(b) We are comparing k = 4 states, so the numerator degrees of freedom is k − 1 = 3.
(c) The overall sample size is 120 homes, so the denominator degrees of freedom is n − k = 116.
(d) The sum of squares for error will tend to be much greater than the sum of squares for groups, because we will be dividing the sum of squares for error by 116 to standardize it, while we will only be dividing the sum of squares for groups by 3. However, without looking at the data, we cannot tell for sure, and we have even less knowledge about how the mean squares might compare.

D.34 The underlying conditions to use an F-distribution for ANOVA are not met. In the histogram of NY home prices, we see that the distribution is heavily skewed and certainly not normal, but since we have n = 30 in each group, this may not be so serious a problem. The other condition that is violated is the equality of standard deviations: the standard deviation for New York (sNY = 317.8) is a bit more than 2 times that of Pennsylvania (sPA = 137.1).

D.35 The null hypothesis is that the mean fiber amounts are all the same and the alternative hypothesis is that at least two of the companies have different mean amounts of fiber.
H0 : μGM = μK = μQ    Ha : Some μi ≠ μj
Since there are 3 groups (the three companies), the degrees of freedom for groups is 2. Since the sample size is 30, the total degrees of freedom is 29. The error degrees of freedom is 30 − 3 = 27. We subtract to find the error sum of squares: SSError = SSTotal − SSGroups = 102.47 − 4.96 = 97.51. Filling in the rest of the ANOVA table, we have:

Source     DF      SS       MS      F       P
Company     2     4.96     2.48    0.69    0.512
Error      27    97.51     3.61
Total      29   102.47
The p-value of 0.512 is found using the upper tail (beyond F = 0.69) of an F-distribution with 2 numerator df and 27 denominator df. The p-value is very large, so this sample does not provide evidence that the mean number of grams of fiber differs among the three companies.
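The same bookkeeping can be done in a few lines of R, using only the sums of squares given in the exercise:

# D.35: filling in a one-way ANOVA table from SSGroups and SSTotal
ss_groups <- 4.96
ss_total  <- 102.47
ss_error  <- ss_total - ss_groups        # 97.51
df_groups <- 3 - 1                       # k - 1 = 2
df_error  <- 30 - 3                      # n - k = 27
ms_groups <- ss_groups / df_groups       # 2.48
ms_error  <- ss_error / df_error         # about 3.61
f_stat    <- ms_groups / ms_error        # about 0.69
pf(f_stat, df_groups, df_error, lower.tail = FALSE)   # p-value about 0.51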
D.36 (a) Letting μ denote mean height, we have the hypotheses
H0 : μB = μT = μA = μS    Ha : At least one μi ≠ μj
(b) Yes, the conditions for ANOVA are satisfied. In three of the four groups the sample sizes are greater than 30. In the group with only nT = 20 (tenors), the boxplot looks approximately symmetric with no outliers, so the assumption of normality is not violated. The condition of equal variability is satisfied because the within-group standard deviations are not that different; the highest standard deviation (sT = 3.22) is less than twice the lowest (sS = 1.87). (c) The rest of the calculations are summarized in the ANOVA table below.
Source   d.f.            Sum of Sq.   Mean Square          F-statistic          p-value
Groups   4 − 1 = 3         1058.5     1058.5/3 = 352.83    352.83/6.32 = 55.8   ≈ 0
Error    130 − 4 = 126      796.7     796.7/126 = 6.32
Total    130 − 1 = 129     1855.2
The p-value ≈ 0 provides strong evidence against the null hypothesis. The average height of singers differs by voice. D.37
(a) For the red ink sample, the mean is 4.4 and the sample size is 19. We have MSE = 0.84 and, for 95% confidence from a t-distribution with df = n − k = 71 − 3 = 68, we have t* = 2.00. The confidence interval is
x̄i ± t* · √MSE/√ni
4.4 ± 2.00 · √0.84/√19
4.4 ± 0.42
3.98 to 4.82
We are 95% confident that the mean number of anagrams solved by people getting the puzzles in red ink is between 3.98 and 4.82.
(b) Using the same t* and MSE from part (a), we have
(x̄i − x̄j) ± t* · √(MSE(1/ni + 1/nj))
(5.7 − 4.4) ± 2.00 · √(0.84(1/27 + 1/19))
1.3 ± 0.55
0.75 to 1.85
We are 95% confident that people can solve, on average, between 0.75 and 1.85 more anagrams when green ink is used than when red ink is used. Since zero is not in this interval, there is a significant difference between the mean number of anagrams solved in the two conditions.
(c) We are testing H0 : μR = μB vs Ha : μR ≠ μB, where μR and μB represent the mean number of anagrams people can solve if the ink used is red or black, respectively. The test statistic is
t = (x̄R − x̄B)/√(MSE(1/nR + 1/nB)) = (4.4 − 5.9)/√(0.84(1/19 + 1/25)) = −5.38
We use a t-distribution with df = 68 to find the p-value. The t-statistic is quite large in magnitude so, even after multiplying by 2 for the two-tailed test, the p-value is essentially zero. There is very strong evidence of a difference in the mean number of anagrams people can solve based on whether the ink is red or black. We see that people solve fewer anagrams with red ink.
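The intervals and test in parts (a)–(c) can be reproduced in R from the summary values quoted above (means 4.4, 5.7, and 5.9; sample sizes 19, 27, and 25; MSE = 0.84 with 68 df); nothing beyond those numbers is assumed.

# D.37: inference after ANOVA using the pooled MSE
mse <- 0.84; df_err <- 68
t_star <- qt(0.975, df_err)                       # about 2.00

# (a) 95% CI for the red-ink mean
4.4 + c(-1, 1) * t_star * sqrt(mse / 19)

# (b) 95% CI for the difference in means, green minus red
(5.7 - 4.4) + c(-1, 1) * t_star * sqrt(mse * (1/27 + 1/19))

# (c) test statistic and two-tailed p-value, red vs black
t_stat <- (4.4 - 5.9) / sqrt(mse * (1/19 + 1/25))
2 * pt(abs(t_stat), df_err, lower.tail = FALSE)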
D.38 In every case, we are testing H0 : μi = μj vs Ha : μi ≠ μj, where μi and μj represent the mean temperature increase in the indicated conditions. Testing legs together vs lap pad, we have
t = (x̄i − x̄j)/√(MSE(1/ni + 1/nj)) = (2.31 − 2.18)/√(0.63(1/29 + 1/29)) = 0.62
We use a t-distribution with df = n − k = 87 − 3 = 84 to find the p-value. This is a two-tailed test so the p-value is 2(0.268) = 0.536. This large p-value shows no convincing evidence that a lap pad affects the mean temperature increase when legs are together. Testing legs together vs legs apart, we have
t = (x̄i − x̄j)/√(MSE(1/ni + 1/nj)) = (2.31 − 1.41)/√(0.63(1/29 + 1/29)) = 4.32
We again use a t-distribution with df = 84 to find the p-value. The t-statistic is quite large in magnitude so, even after multiplying by 2 for the two-tailed test, the p-value is essentially zero. There is very strong evidence of a difference in mean temperature increase between keeping the legs together or legs apart. Testing lap pad vs legs apart, we have
t = (x̄i − x̄j)/√(MSE(1/ni + 1/nj)) = (2.18 − 1.41)/√(0.63(1/29 + 1/29)) = 3.69
We again use a t-distribution with df = 84 to find the p-value. This is a two-tailed test so the p-value is 2(0.0002) = 0.0004. There is strong evidence of a difference in mean temperature increase between keeping the legs apart or using a lap pad with legs together. The mean temperature increase is lower when keeping the legs apart. In summary, there is no significant difference in mean temperature increase between using a lap pad or not using one when the legs are kept together. However, the mean temperature increase is significantly less than in either of those conditions when the legs are kept apart.

D.39 The hypotheses are H0 : ρ = 0 vs Ha : ρ > 0, where ρ is the correlation between standardized cognition score and GPA for all students. The t-statistic is
t = r√(n − 2)/√(1 − r²) = 0.267√251/√(1 − 0.267²) = 4.39
Using a t-distribution with n − 2 = 253 − 2 = 251 degrees of freedom, we find a p-value of essentially zero. We find strong evidence of a positive association between CognitionZscore and GPA for students at this college.

D.40 The hypotheses are H0 : ρ = 0 vs Ha : ρ ≠ 0, where ρ is the correlation between happiness score and average sleep for all students. The t-statistic is
t = r√(n − 2)/√(1 − r²) = 0.104√251/√(1 − 0.104²) = 1.66
Using a t-distribution with n − 2 = 253 − 2 = 251 degrees of freedom, for this two-tailed test, we find a p-value of 2(0.0491) = 0.098. At a 5% level, we do not find convincing evidence for a linear relationship between this measure of happiness and average number of hours slept at night for all students. The results are borderline, however, and are significant at a 10% level.
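D.39 and D.40 use the same conversion from a sample correlation to a t-statistic. A small R helper, using only the r and n values quoted above (the function name is ours):

# t-test for a correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2)
cor_t_test <- function(r, n, alternative = c("two.sided", "greater")) {
  alternative <- match.arg(alternative)
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- if (alternative == "greater") {
    pt(t_stat, df = n - 2, lower.tail = FALSE)
  } else {
    2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
  }
  c(t = t_stat, p.value = p)
}
cor_t_test(0.267, 253, "greater")     # D.39: t about 4.39, p essentially 0
cor_t_test(0.104, 253, "two.sided")   # D.40: t about 1.66, p about 0.098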
D.41 (a) The slope is b1 = 0.0831. If the depression score goes up by 1, the predicted number of classes missed goes up by 0.0831.
(b) The t-statistic is 2.47 and the p-value is 0.014. At a 5% level, we find that the depression score is an effective predictor of the number of classes missed. (c) We see that R2 = 2.4%. Only 2.4% of the variability in number of classes missed can be explained by the depression score. Clearly, many other variables are also involved in explaining the number of classes a student will miss in a semester. (d) The F-statistic is 6.09 and the p-value is 0.014. At a 5% level, we find that this model based on the depression score has some value for predicting the number of classes missed. D.42
(a) The slope is b1 = 0.0620. If the number of drinks goes up by 1, the predicted number of classes missed goes up by 0.0620.
(b) The t-statistic is 1.24 and the p-value is 0.215. We do not find convincing evidence that the number of alcoholic drinks helps to predict the number of missed classes. (c) We see that R² = 0.6%. The number of alcoholic drinks predicts very little (only 0.6%) of the variability in number of classes missed. (d) The F-statistic is 1.55 and the p-value is 0.215. This model based on the number of drinks is not effective at predicting the number of classes missed.

D.43 There are several problems with the regression conditions, the most serious of which is the number of large outliers: points well above the line in the scatterplot with regression line. These also contribute to the right skew in the histogram of residuals, violating the normality condition. The residuals vs fits plot doesn't show roughly equal bands on either side of the zero mean; rather we again see the several large positive residuals that aren't balanced with similar sized negative residuals below the line. There is no clear curvature in the data, but the residual vs fits plot shows an interesting pattern as the most extreme negative residuals decrease in regular fashion, not a random scatter. We should be hesitant to use inference based on a linear model for these data (including the earlier exercise for these variables).

D.44 There are several problems with the regression conditions, the most serious of which is the number of large outliers: points well above the line in the scatterplot with regression line. These also contribute to the right skew in the histogram of residuals, violating the normality condition. The residuals vs fits plot doesn't show roughly equal bands on either side of the zero mean; rather we again see the several large positive residuals that aren't balanced with similar sized negative residuals below the line. There is no clear curvature in the data, but the residual vs fits plot shows an interesting pattern as the most extreme negative residuals decrease in regular fashion, not a random scatter. We should be hesitant to use inference based on a linear model for these data (including the earlier exercise for these variables).

D.45
(a) The 95% confidence interval for the mean response is 715.0 to 786.8. We are 95% confident that the mean number of points for all players who make 100 free throws in a season is between 715.0 and 786.8.
(b) The 95% prediction interval for the response is 311.1 to 1190.7. We are 95% confident that a player who makes 100 free throws in a season will have between 311 and 1191 points for the season.
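Intervals like these come directly from R's predict() once the regression has been fit. A sketch follows, assuming a data frame nba with columns Points and FTMade; both names are placeholders, since the dataset itself is not reproduced here.

# D.45/D.46: confidence interval for the mean response vs. prediction interval
mod <- lm(Points ~ FTMade, data = nba)            # nba is a hypothetical data frame
new_players <- data.frame(FTMade = c(100, 400))
predict(mod, new_players, interval = "confidence", level = 0.95)  # mean response
predict(mod, new_players, interval = "prediction", level = 0.95)  # individual player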
D.46
(a) The 95% confidence interval for the mean response is 1770.3 to 1913.5. We are 95% confident that the mean number of points for all players who make 400 free throws in a season is between 1770.3 and 1913.5.
(b) The 95% prediction interval for the response is 1397.8 to 2286.1. We are 95% confident that a player who makes 400 free throws in a season will have between 1398 and 2286 points for the season. D.47
(a) The coefficient of Gender is −0.0971. Since males are coded 1, this means that, all else being equal, a male is predicted to have a GPA that is 0.0971 less than a female. The coefficient of ClassYear is −0.0558. All else being equal, as students move up one class year, their GPA is predicted to go down by 0.0558. The coefficient of ClassesMissed is −0.0146. All else being equal, for every additional class missed, the predicted GPA of students goes down by 0.0146.
(b) The p-value from the ANOVA test is 0.000, so this model is effective at predicting GPA. (c) We see that R² = 18.4%, so 18.4% of the variability in grade point averages can be explained by the model using these six explanatory variables. (d) We see from the p-values for the individual slopes that CognitionZscore is the most significant variable in the model, with a p-value of 0.001, while Gender is the least significant with a p-value of 0.069. Note that all the variables are significant at the 10% level, however. (e) Four of the variables are significant at the 5% level: ClassYear, CognitionZscore, DASScore (just barely), and Drinks.

D.48
(a) The coefficient of Age is 0.08378. All else being equal, a person one year older will have a predicted percent body fat that is about 0.084 higher. The coefficient of Abdomen is 1.0327. If a person gains one centimeter on his or her abdomen circumference (with all other variables remaining the same), the predicted percent body fat goes up by 1.0327.
(b) The p-value from the ANOVA test is 0.000, so this model is effective at predicting percent body fat. (c) We see that R² = 75.7%, so 75.7% of the variability in body fat can be explained by the model using these nine explanatory variables. (d) We see from the p-values for the individual slopes that Abdomen is the most significant variable in the model, with a p-value of 0.000, while Neck is the least significant with a p-value of 0.998. (e) Two of the variables are significant at the 5% level: Abdomen and Wrist.

D.49
(a) This is an association between two quantitative variables, so it can be tested with either a test for correlation or a test for slope in simple linear regression. Here we do a test for correlation. Let ρ be the true correlation between hours of exercise per week and GPA for all students; our hypotheses are then
H0 : ρ = 0    Ha : ρ ≠ 0
Using technology, we find the sample correlation to be r = −0.159. We can use technology to find the p-value or we can use the formula. Using the formula, the t-statistic is
t = (r − 0)/√((1 − r²)/(n − 2)) = −0.159/√((1 − (−0.159)²)/(343 − 2)) = −0.159/0.053 = −2.98
We compare this to a t-distribution with df = n − 2 = 343 − 2 = 341 and get a p-value of 0.003. We have strong evidence for a negative correlation between hours of exercise per week and GPA for college students.
(b) Using technology, we fit the multiple regression model with GPA as the response variable and Exercise and Gender as explanatory variables. The output is below.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.311599   0.043427  76.257  < 2e-16 ***
Exercise    -0.009739   0.003828  -2.544  0.01139 *
GenderCode  -0.125922   0.042497  -2.963  0.00326 **

Residual standard error: 0.3888 on 340 degrees of freedom
Multiple R-squared: 0.04967, Adjusted R-squared: 0.04408
F-statistic: 8.885 on 2 and 340 DF, p-value: 0.0001732

The p-value of 0.011 for testing the coefficient of Exercise indicates that Exercise is a significant predictor of GPA, even after accounting for GenderCode.

D.50
(a) This is an association between two quantitative variables, so it can be tested with either a test for correlation or a test for slope in simple linear regression. Here we do a test for correlation. Let ρ be the true correlation between number of piercings and GPA; our hypotheses are then
H0 : ρ = 0    Ha : ρ ≠ 0
Using technology, we find the sample correlation between Piercings and GPA to be r = 0.079. We can find the p-value using technology or using the formula. Using the formula, the t-statistic is
t = r · √(n − 2)/√(1 − r²) = 0.079 · √(343 − 2)/√(1 − 0.079²) = 1.48
We compare this to a t-distribution with df = n − 2 = 343 − 2 = 341 and get a p-value of 0.143. The results are insignificant, and we do not have sufficient evidence that an association exists between the number of piercings and GPA of college students.
(b) Using technology, we fit the multiple regression model with GPA as the response variable and Piercings and SAT as explanatory variables. The output is below.

Coefficients:
             Estimate  Std. Error t value Pr(>|t|)
(Intercept) 1.6035944  0.2022610   7.928  3.22e-14 ***
Piercings   0.0207638  0.0091077   2.280  0.0232 *
SAT         0.0012604  0.0001653   7.625  2.47e-13 ***

Residual standard error: 0.3674 on 340 degrees of freedom
Multiple R-squared: 0.1514, Adjusted R-squared: 0.1464
F-statistic: 30.33 on 2 and 340 DF, p-value: 7.593e-13

The p-value for Piercings of 0.0232 indicates that, after accounting for SAT score, the number of piercings is significantly associated with GPA. Note: In part (b) we could also use Piercings as the response with GPA and SAT as the predictors, and the result (significance of the relationship between GPA and Piercings after accounting for SAT) would be identical.
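The output in D.49 and D.50 can be reproduced with a few calls. A sketch in R, assuming a data frame survey containing GPA, Exercise, GenderCode, Piercings, and SAT (the data frame name is a placeholder):

# D.49: test for correlation, then multiple regression adjusting for gender
cor.test(survey$Exercise, survey$GPA)                    # r = -0.159, p about 0.003
summary(lm(GPA ~ Exercise + GenderCode, data = survey))  # output shown above

# D.50: same idea with Piercings, adjusting for SAT
cor.test(survey$Piercings, survey$GPA)                   # r = 0.079, p about 0.14
summary(lm(GPA ~ Piercings + SAT, data = survey))        # output shown above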
D.51 Here is some output for fitting the model to predict Bodyfat with all nine predictors.

The regression equation is
Bodyfat = -23.7 + 0.0838 Age - 0.0833 Weight + 0.036 Height + 0.001 Neck - 0.139 Chest
          + 1.03 Abdomen + 0.226 Ankle + 0.148 Biceps - 2.20 Wrist

Predictor     Coef      SE Coef      T       P
Constant    -23.66      29.46      -0.80   0.424
Age           0.08378    0.05066    1.65   0.102
Weight       -0.08332    0.08471   -0.98   0.328
Height        0.0359     0.2658     0.14   0.893
Neck          0.0011     0.3801     0.00   0.998
Chest        -0.1387     0.1609    -0.86   0.391
Abdomen       1.0327     0.1459     7.08   0.000
Ankle         0.2259     0.5417     0.42   0.678
Biceps        0.1483     0.2295     0.65   0.520
Wrist        -2.2034     0.8129    -2.71   0.008

S = 4.13552   R-Sq = 75.7%   R-Sq(adj) = 73.3%

Analysis of Variance
Source          DF        SS        MS       F       P
Regression       9     4807.36   534.15   31.23   0.000
Residual Error  90     1539.23    17.10
Total           99     6346.59
We see that a number of the predictors have very large p-values for the t-test of the coefficient, indicating that they are not very helpful in this model. Suppose that we drop the Neck and Height measurements, which have the worst individual p-values (0.998 and 0.893), to obtain the new output shown below.

The regression equation is
Bodyfat = -20.5 + 0.0850 Age - 0.0757 Weight - 0.144 Chest + 1.02 Abdomen
          + 0.214 Ankle + 0.144 Biceps - 2.21 Wrist

Predictor     Coef      SE Coef      T       P
Constant    -20.47      14.62      -1.40   0.165
Age           0.08496    0.04938    1.72   0.089
Weight       -0.07569    0.05856   -1.29   0.199
Chest        -0.1444     0.1539    -0.94   0.350
Abdomen       1.0223     0.1231     8.30   0.000
Ankle         0.2137     0.5246     0.41   0.685
Biceps        0.1442     0.2244     0.64   0.522
Wrist        -2.2082     0.7485    -2.95   0.004

S = 4.09076   R-Sq = 75.7%   R-Sq(adj) = 73.9%

Analysis of Variance
Source          DF        SS        MS       F       P
Regression       7     4807.03   686.72   41.04   0.000
Residual Error  92     1539.56    16.73
Total           99     6346.59
We see that the value of R² = 75.7% remains unchanged and SSModel has only gone down by 0.33 (from 4807.36 to 4807.03). Clearly we lose essentially no predictive power by dropping these two very ineffective predictors, so the model is improved. But can you do even better? When choosing a model from among many predictors there is rarely an absolute "best" choice that is optimal by all criteria. For this reason, there are several (even many) reasonable models that would be acceptable choices for this situation.

D.52 We might start by looking at correlations between Happiness and other variables in the SleepStudy dataset. One set of variables with, not surprisingly, fairly strong (negative) correlations with Happiness are DepressionScore (r = −0.542), AnxietyScore (r = −0.355), and StressScore (r = −0.360). If we fit a model using these three predictors, we obtain the output below.

The regression equation is
Happiness = 29.0 - 0.464 DepressionScore - 0.160 AnxietyScore + 0.0424 StressScore

Predictor           Coef      SE Coef      T       P
Constant          28.9796     0.4590     63.13   0.000
DepressionScore   -0.46359    0.06190    -7.49   0.000
AnxietyScore      -0.15999    0.07816    -2.05   0.042
StressScore        0.04244    0.05702     0.74   0.457

S = 4.59793   R-Sq = 30.7%   R-Sq(adj) = 29.8%

Analysis of Variance
Source          DF        SS        MS       F       P
Regression       3     2326.81   775.60   36.69   0.000
Residual Error 249     5264.09    21.14
Total          252     7590.90
The overall model is effective (ANOVA p-value ≈ 0), and it explains 30.7% of the variability in Happiness for this sample. However, the p-value for the t-test of the coefficient of StressScore is quite large (0.457), indicating that StressScore is not very useful in this model. Can we do better? Should StressScore be dropped from this model? Would one of the other potential explanatory variables be more helpful? When choosing a model from among many predictors there is rarely an absolute "best" choice that is optimal by all criteria. For this reason, there are several (even many) reasonable models that would be acceptable choices for this situation. Can you find one that's better than the one above?
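One way to explore these model-selection questions in R is sketched below; it assumes data frames named BodyFat and SleepStudy with the variable names used above (the data frame names themselves are assumptions).

# D.51: compare the full body fat model to the reduced one
full    <- lm(Bodyfat ~ Age + Weight + Height + Neck + Chest + Abdomen +
                        Ankle + Biceps + Wrist, data = BodyFat)
reduced <- lm(Bodyfat ~ Age + Weight + Chest + Abdomen + Ankle + Biceps + Wrist,
              data = BodyFat)
summary(full)$r.squared; summary(reduced)$r.squared   # both about 0.757
anova(reduced, full)     # nested F-test for dropping Height and Neck

# D.52: one candidate model for Happiness; try swapping predictors in and out
happy <- lm(Happiness ~ DepressionScore + AnxietyScore + StressScore,
            data = SleepStudy)
summary(happy)           # StressScore has a large p-value (about 0.46)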
Final Essential Synthesis Solutions

E.1 Dear Congressman Daniel Webster,
You criticized the American Community Survey as being a random survey. However, the fact that it is a random survey is crucial for enabling us to make generalizations from the sample of people surveyed to the entire population of US residents. We can only generalize from the sample to the population if the sample is representative of the population (closely resembles the population in all characteristics, except that it is smaller). Unfortunately, without randomness we are notoriously bad at choosing representative samples. Because the whole point of the survey is to gain information about the population, we do not know what the population looks like, and so have no way of knowing what is "representative". On the bright side, randomly choosing a sample tends to yield a group of people that is representative of the population. With a random sample, the larger the sample size, the closer the sample statistics will be to the population values you care about. With non-random samples this may not be the case. In short, we can best draw valid scientific conclusions from samples that have been randomly selected.
Sincerely,
A statistics student

E.2 The normal-based confidence interval has the form Statistic ± z* · SE, and a value of z* = 1.645 corresponds to a 90% confidence interval. The number for both blanks is 90.

E.3
(a) A bootstrap distribution, generated via StatKey and showing the cutoffs for the middle 90%, gives a 90% confidence interval of about 0.077 to 0.098. [Bootstrap distribution figure omitted.]
(b) We estimate the standard error using the formula
SE = √(p̂(1 − p̂)/n) = √(0.0875(1 − 0.0875)/2000) = 0.0063
Notice that this matches the standard deviation of the bootstrap distribution. For a 90% confidence interval z* = 1.645, so we generate the interval as
sample statistic ± z* · SE
p̂ ± z* · SE
0.0875 ± 1.645 · 0.0063
0.0875 ± 0.01036
0.077 to 0.098
(c) We are 90% confident that between 7.7% and 9.8% of US residents do not have health insurance. (d) The sample statistic is p̂ = 0.0875 and the margin of error is z ∗ · SE = 1.645 · 0.0063 = 0.01036. (e) The sample size is much larger for the entire American Community Survey sample than for the 2000 people sub-sampled for ACS, so the margin of error will be much smaller. (f) The 90% confidence interval is sample statistic ± margin of error, or 0.087 ± 0.001 = (0.086, 0.088). Based on the full ACS survey, we are 90% confident that between 8.6% and 8.8% of all US residents do not have health insurance. E.4
(a) We can create a 90% confidence interval either by bootstrapping or with formulas and the normal distribution. A bootstrap distribution gives a corresponding 90% confidence interval of 0.501 to 0.538. [Bootstrap distribution figure omitted.]
Alternatively, we can use formulas. For a 90% confidence interval, z* = 1.645, so the interval is
statistic ± z* · SE
p̂ ± 1.645 · √(p̂(1 − p̂)/n)
0.52 ± 1.645 · √(0.52(1 − 0.52)/2000)
0.52 ± 1.645 · 0.0112
0.52 ± 0.018
0.502 to 0.538
Based on the ACS data, we are 90% confident that the proportion of people aged 15 and older who are married in the US is between 0.502 and 0.538.
(b) About 90% of confidence intervals should contain the population parameter, so about 0.9 · 17 = 15.3, or about 15 of the 17 intervals will contain the true population value.
(c) The confidence interval in part (a) does not include 0.545, so this is not a plausible value for the true proportion of people 15 and older who are married in the US in 2017 (at a 10% significance level). Based just on the ACS data, we do have some evidence that the proportion of US adults who are married is smaller in 2017 than it was in 2000. Note that we could also answer this part with a formal hypothesis test for a proportion, but we don't need to since we already have the confidence interval from part (a). If we were to do a two-tailed test of H0 : p = 0.545, based on the ACS sample from 2017, the p-value is 0.026.
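A small R helper for the normal-based intervals in E.3 and E.4, using only the sample proportions and n = 2000 quoted above (the function name is ours):

# Normal-based confidence interval for a single proportion
ci_prop <- function(p_hat, n, conf = 0.90) {
  z_star <- qnorm(1 - (1 - conf) / 2)              # 1.645 for 90% confidence
  se <- sqrt(p_hat * (1 - p_hat) / n)
  p_hat + c(-1, 1) * z_star * se
}
ci_prop(0.0875, 2000)   # E.3: about 0.077 to 0.098
ci_prop(0.52, 2000)     # E.4: about 0.502 to 0.538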
E.5
(a) Income is a quantitative variable, so we can visualize its distribution with a histogram:
[Histogram of Income omitted; horizontal axis Income (in thousands of dollars), vertical axis Frequency.]
The distribution of income for employed US residents is strongly right-skewed. Most people have yearly incomes below $100,000, but some people have incomes that are much higher. The maximum yearly income in this dataset is an outlier making over $500,000 a year. (b) The mean yearly income in the sample is x = $44, 520, while the median income is $30,200. The standard deviation of incomes is s = $55, 061 and the IQR = $49, 000. Yearly incomes in this dataset range from a minimum of $0 to a maximum of $566,000. (c) We are looking at the relationship between a quantitative variable and a categorical variable, so can visualize with side-by-side boxplots:
[Side-by-side boxplots of Income by Sex omitted; vertical axis Income, horizontal axis Sex (0 = female, 1 = male).]
It appears that males tend to make more than females. The income distributions for each sex are heavily skewed toward larger incomes with several high outliers.
(d) In this sample, the males make an average of x̄M = $51,529, while the females make an average of x̄F = $36,765. The males make an average of $51,529 − $36,765 = $14,764 more in income than the females. Also, the five-number summary for males (0, 15.0, 36.0, 69.3, 566.0) is generally higher than for females (0, 8.9, 28.0, 50.0, 502.0) and they have more variability (sM = 61.9, sF = 45.1).
(e) Let μM and μF denote the average yearly income for employed males and females, respectively, who live in the US. We test the hypotheses
H0 : μM = μF    Ha : μM ≠ μF
We use StatKey or other technology to create a randomization distribution for the difference in means (figure omitted).
Of the 10,000 simulated randomization statistics, none were beyond the observed difference in means (14.764), so the p-value is essentially zero. We have strong evidence that the average yearly income among employed US residents in 2017 is higher for males than for females.
E.6
(a) We are visualizing the relationship between a quantitative variable and a categorical variable, so use side-by-side boxplots. We see many values concentrated near the median of 40 for both distributions, with outliers in both directions for both boxplots. The median and lower quartile are both 40 for females, while the median and upper quartile are both 40 for males.
[Side-by-side boxplots of HoursWk by Sex omitted; vertical axis HoursWk, horizontal axis Sex (0 = female, 1 = male).]
(b) Females in the sample work an average of x̄F = 34.9 hours per week, and males work an average of x̄M = 40.5 hours per week, so males in the sample work an average of 40.5 − 34.9 = 5.6 hours more per week than females.
(c) Let μM and μF denote the average hours worked per week for employed US males and females, respectively. We test the hypotheses
H0 : μM = μF    Ha : μM > μF
The test statistic is
t = (statistic − null)/SE = (x̄M − x̄F − 0)/√(s²M/nM + s²F/nF) = (40.5 − 34.9)/√(12.46²/676 + 12.85²/611) = 5.6/0.707 = 7.92
We find the area above t = 7.92 in a t-distribution with 611 − 1 = 610 degrees of freedom to get a p-value of essentially 0. There is very strong evidence that males work more hours per week, on average, than females, among employed US residents in 2017.
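The test statistic in part (c) can be checked in R from the summary statistics alone (the raw ACS data are not reproduced here):

# E.6(c): two-sample t-statistic for mean hours worked, males vs females
x_m <- 40.5; s_m <- 12.46; n_m <- 676
x_f <- 34.9; s_f <- 12.85; n_f <- 611
se <- sqrt(s_m^2 / n_m + s_f^2 / n_f)                   # about 0.707
t_stat <- (x_m - x_f) / se                              # about 7.92
pt(t_stat, df = min(n_m, n_f) - 1, lower.tail = FALSE)  # one-tailed p-value, essentially 0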
E.7 (a) The number of people with health insurance in each racial group is the number of people of that race multiplied by the proportion of that race with health insurance. So the number of white people
with health insurance is 1520 · 0.9257 = 1407.1, which rounds to 1407 (counts of people have to be whole numbers). The complete table is given below (and we could also generate this table directly from the ACS data).
         Health Insurance   No Health Insurance
White         1407                  113
Black          178                   21
Asian          118                   11
Other          122                   30
(b) This is a visualization of two categorical variables, which can be done with a segmented bar chart. [Segmented bar chart of insurance status (Insured / Not Insured) by race omitted.]
(c) We are testing for an association between two categorical variables, one of which has more than two levels, so we use a chi-square test for association. The hypotheses are
H0 : There is no association between health insurance status and race
Ha : There is an association between health insurance status and race
We compute expected counts for each cell using (row total)(column total)/(sample size). The table of observed (expected) counts is given below:
         Health Insurance   No Health Insurance   Total
White      1407 (1387.0)       113 (133.0)         1520
Black       178 (181.6)         21 (17.4)           199
Asian       118 (117.7)         11 (11.3)           129
Other       122 (138.7)         30 (13.3)           152
Total      1825                175                 2000
This gives a chi-square statistic of
χ² = Σ (observed − expected)²/expected = (1407 − 1387.0)²/1387.0 + ··· + (30 − 13.3)²/13.3 = 27.09
The expected counts are all greater than 5, so we can compare this to a chi-square distribution with (2 − 1)(4 − 1) = 3 degrees of freedom and find the p-value as the area above χ2 = 27.09. This gives a p-value of 0.000006. There is strong evidence for an association between whether or not a person has health insurance and race.
E.8
(a) Age is a quantitative variable, so we can visualize it with a histogram:
[Histogram of Age omitted; horizontal axis Age (years), vertical axis Count.]
People's ages appear to be relatively evenly distributed between the ages of 20 and 70, with few respondents less than 20. After 70, the number of people in each age range decreases as age increases. Ages go from the teenage years to somewhere in the 90s.
(b) The sample mean age is x̄ = 48.29 and the standard deviation is s = 19.34. Either by looking at the standard deviation of the bootstrap distribution or by using the formula, we find the standard error of the sample mean to be
SE = s/√n = 19.34/√2000 = 0.432,
so a 95% confidence interval for the mean, using t* = 1.961, is
x̄ ± t* · SE
48.29 ± 1.961 · 0.432
48.29 ± 0.847
47.44 to 49.14
We are 95% confident that the average age of US residents (from which the ACS sample is taken) is between 47.44 and 49.14 years old. (c) To compare ages by race we are looking at the relationship between a quantitative and a categorical variable, so we can visualize with side-by-side boxplots:
[Side-by-side boxplots of Age by Race (asian, black, other, white) omitted; vertical axis Age.]
It appears that Whites and Blacks are slightly older, on average, than Asians, with people of other races tending to be younger than the other groups. Each of the distributions is relatively symmetric with similar variability.
(d) We are testing for an association between a quantitative variable and a categorical variable with more than two categories, so we use analysis of variance for a difference in means. Define μrace to be the average age for a given race in the US. The hypotheses are then
H0 : μwhite = μblack = μasian = μother
Ha : At least two means are different
The sample sizes are large, the distributions appear to be relatively symmetric within groups, and the sample standard deviations are close together, so the conditions for using the F-distribution are satisfied. The ANOVA table is given below:

Analysis of Variance Table
Response: Age
            Df   Sum Sq   Mean Sq   F value     Pr(>F)
Race         3    17477    5825.6    15.925   3.116e-10 ***
Residuals 1996   730190     365.8
Total     1999   747667

The p-value of 3.1 × 10⁻¹⁰ is very small, so we have strong evidence that average age differs by race.
(e) The sample mean age for Asians is x̄asian = 42.97. We are doing inference after ANOVA, so we approximate all group standard deviations with √MSE. Therefore, the standard error of the sample mean age of Asians is
SE = √(MSE/nasian) = √(365.8/129) = 1.684
The 95% confidence interval for the mean Asian age, using t* = 1.961 for 1996 df, is
x̄asian ± t* · SE
42.97 ± 1.961 · 1.684
42.97 ± 3.30
39.67 to 46.27
We are 95% sure that the average age for US Asians in 2017 is between 39.67 and 46.27 years old. This interval is centered around a lower number (42.97 as opposed to 48.29) and is much wider than the interval for all people in the US because the sample size for Asians is only 129, as opposed to 2000.
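The ANOVA and follow-up interval in parts (d) and (e) could be computed in R along these lines; the data frame name acs is a placeholder for the ACS sample described above.

# E.8(d): one-way ANOVA of Age by Race
fit <- aov(Age ~ Race, data = acs)     # acs is a hypothetical data frame
summary(fit)                           # F about 15.9, p-value about 3e-10

# E.8(e): 95% CI for the mean age of one group, using MSE from the ANOVA table
mse <- 365.8
t_star <- qt(0.975, df = 1996)         # about 1.961
42.97 + c(-1, 1) * t_star * sqrt(mse / 129)   # about 39.67 to 46.27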
E.9 (a) HoursWk and Income are both quantitative variables, so we visualize with a scatterplot. Note: We could also switch the variables between the axes, unless we've read ahead to part (c).
[Scatterplot of Income versus HoursWk omitted.]
There appears to be a positive trend, although the association might be slightly curved rather than linear, and the variability appears to increase quite a bit as the number of hours worked increases.
(b) These are both quantitative variables, so we do a test for correlation. The hypotheses are H0 : ρ = 0 vs Ha : ρ > 0. The sample size is n = 1287 and the sample correlation is r = 0.338. The relevant t-statistic is
t = r√(n − 2)/√(1 − r²) = 0.338√(1287 − 2)/√(1 − 0.338²) = 12.9
We compare this to a t-distribution with 1287 − 2 = 1285 degrees of freedom and find the p-value is essentially 0. Hours worked per week and income are very significantly positively associated.
(c) Some output for a regression model to predict Income based on HoursWk is given below:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.9889     4.4678  -2.236   0.0255 *
HoursWk       1.4395     0.1116  12.893   <2e-16 ***
---
Residual standard error: 51.83 on 1285 degrees of freedom
Multiple R-squared: 0.1145, Adjusted R-squared: 0.1139
F-statistic: 166.2 on 1 and 1285 DF, p-value: < 2.2e-16

The prediction equation is: predicted Income = −9.9889 + 1.4395 · HoursWk.
(d) The predicted yearly income for someone who works 40 hours a week is
predicted Income = −9.9889 + 1.4395 · 40 = 47.591,
or about $47,591.
(e) The percent of the variability in income explained by the number of hours worked per week is R² = 11.45%.
(f) Below is a scatterplot with the regression line on it:
[Scatterplot of Income versus HoursWk with the fitted regression line omitted.]
The condition of constant variability is clearly violated; variability in income is much higher when more hours are worked per week. There might also be a small amount of curvature in the relationship — note the relatively large number of points below the line for 20–35 hours of work a week and the negative predicted incomes when hours per week is very small. E.10
(a) Hours per week is a confounding variable in the relationship between income and gender because it is associated with both gender and income.
(b) Multiple regression output from a model regressing Income on both Sex and HoursWk is given below:

Response: Income
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.4478     4.5022  -2.543   0.0111 *
Sex           7.0031     2.9585   2.367   0.0181 *
HoursWk       1.3809     0.1142  12.095   <2e-16 ***
---
Residual standard error: 51.74 on 1284 degrees of freedom
Multiple R-squared: 0.1184, Adjusted R-squared: 0.117
F-statistic: 86.22 on 2 and 1284 DF, p-value: < 2.2e-16

The p-value for Sex is 0.018, so even after accounting for number of hours worked per week, Sex is a significant predictor of Income.
(c) The predicted income for a male who works 40 hours per week is
predicted Income = −11.4478 + 7.0031 · Sex + 1.3809 · HoursWk = −11.4478 + 7.0031 · 1 + 1.3809 · 40 = 50.791,
or about $50,791. The predicted yearly income for a female who works 40 hours a week is
predicted Income = −11.4478 + 7.0031 · 0 + 1.3809 · 40 = 43.788,
or about $43,788. The predicted income is $50,791 − $43,788 = $7,003 higher for males. Note that this corresponds exactly to the coefficient of the Sex variable in the fitted model.

E.11 Skull size is quantitative and which mound each skull was found in is categorical with two categories, so they should use a test for a difference in means.

E.12 Sugar content is quantitative and pineapple strain is categorical with more than two categories, so analysis of variance for difference in means should be used.

E.13 We would like to make a statement about a population based on a sample proportion, so we would use a confidence interval for a proportion.

E.14 MAJOR and FUTURE are both categorical with more than two categories, so we would see if they are related using a chi-square test for association.

E.15 We care about a single proportion (the proportion of numbers ending in 0 or 5) and wish to determine whether this differs from 0.20, so we would use a test for a proportion.

E.16 Sherlock Holmes needs to predict height (quantitative) based on length of stride (quantitative), so he would fit a simple linear regression model with height as the response variable and length of stride as the explanatory variable.

E.17 Both of these variables are quantitative, and the question is asking whether a negative correlation exists, so we would do a one-sided test for correlation. A lower-tail test for the slope in a regression model would also be acceptable.

E.18 Number of times urinating per day is quantitative, and we want to estimate an average, so use a confidence interval for a mean.

E.19 This is testing for an association between two categorical variables, each with two categories, so we could use either a test for a difference in proportions or a chi-square test for association.

E.20 Ounces of alcohol consumed is quantitative and class year is categorical with four different categories, so we would use analysis of variance for difference in means.

E.21 We want to estimate a proportion, so we would construct an interval for a proportion.

E.22 Both of these variables are quantitative, so we would do a test for correlation. (Simple linear regression is also an acceptable answer.)

E.23 We want to predict human equivalent age (quantitative) based on dog age (quantitative), so use the slope of a simple linear regression.
E.24 He wants to use at least three explanatory variables (GPA, science/math courses, honors or not) to predict a quantitative response variable (MCAT score), so he should use multiple regression.

E.25 Even though the experiment contains four groups, we are only interested in one categorical variable with two categories (exercise instructions or not) and one quantitative variable (amount of weight loss). The goal is not to test for a difference between the groups but to estimate the difference between the groups, so we use a confidence interval for a difference in means.

E.26 The question pertains to one categorical variable (same or different), and we want to test whether a single proportion (say, the proportion answering "same") is significantly different from 0.5. This is a test for a proportion. (Note that a difference in proportions is not appropriate because there is only one categorical variable, not two.)

E.27 This involves two categorical variables (land or not, double loop or double flip), and she is interested in whether the proportion of landings is higher for the loop or the flip, so she should use a test for a difference in proportions.