Test Bank For Statistics Unlocking the Power of Data, 3rd Edition Robin H. Lock, Patti Frazer Lock, Kari Lock Morgan, Eric F. Lock, Dennis F. Lock Chapter 1-10
Chapter 1 Collecting Data
1.1 The Structure of Data
Use the following to answer the questions below: A high school senior is collecting data on the colleges in which she is interested. Identify the variables as either categorical or quantitative. 1) Type of college: Private or Public college Answer: Categorical Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 2) Tuition: in thousands of dollars Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 3) State: the state in which the college is located Answer: Categorical Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 4) Zip Code: the zip code of the part of the country in which the college is located Answer: Categorical Diff: 2 Type: SA Var: 1 L.O.: 1.1.2 5) Enrollment: the number of students enrolled at the college Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 6) Student-Faculty Ratio: the number of students divided by the number of faculty Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 7) Graduation Rate: as a percentage Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2
Use the following to answer the questions below: A high school senior is collecting data on the colleges in which she is interested, including the following variables: Type of college, Tuition, State, Enrollment, Student-Faculty Ratio, Graduation Rate. 8) What are the cases in the high school senior's dataset? A) Colleges B) Tuition C) Graduation Rate D) State Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.1 9) Refer to the variables collected by the high school senior looking at colleges. Identify a question we might ask about any one of these individual variables. Answer: Answers will vary. Some possible answers include: Type of college: Is she considering more private schools than public schools? Tuition: What is the average tuition of the colleges she is considering? What is the "cheapest" school she is considering? What is the most expensive school she is considering? State: Is there a state that she seems to prefer? Enrollment: What is the average size of the colleges she is considering? What is the smallest college she is considering? What is the largest college she is considering? Student-Faculty Ratio: What is the average SF ratio for the schools she is considering? What is the smallest SF ratio for the schools she is considering? What is the largest SF ratio for the schools she is considering? Graduation Rate: What is the average graduation rate for the schools she is considering? What is the lowest graduation rate for the schools she is considering? What is the highest graduation rate for the schools she is considering? Diff: 2 Type: ES Var: 1 L.O.: 1.1.4;1.1.5 10) Refer to the variables collected by the high school senior looking at colleges. Identify a question that we might ask about relationships between any two (or more) of these variables. Answer: Answers will vary. Some possible answers include: Which type of schools tend to cost more, the private or the public schools?
Which type of schools tend to have the higher graduation rate, the private or public schools? Which type of schools tend to have the lower student-faculty ratio, the private or public schools? Which state is the most expensive? Can student-faculty ratio be used to predict tuition? Can graduation rate be used to predict tuition? Can enrollment be used to predict tuition? Diff: 2 Type: ES Var: 1 L.O.: 1.1.4;1.1.5
Use the following to answer the questions below: A realtor's website provides information on area homes that are for sale. Identify each of the variables as either categorical or quantitative. 11) List Price: amount, in thousands of dollars, for which the house is being sold Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 12) School District: the school district in which the home is located Answer: Categorical Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 13) Size: in square feet Answer: Quantitative Diff: 1 Type: SA Var: 1 L.O.: 1.1.2
14) Style: the style of home (ranch, Cape Cod, Victorian, etc.) Answer: Categorical Diff: 1 Type: SA Var: 1 L.O.: 1.1.2 Use the following to answer the questions below: A realtor's website provides information on area homes that are for sale, including the following variables: List Price, School District, Size, Style. 15) What are the cases in the realtor's dataset? A) Individual houses B) List Price C) Size D) Style Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.1
16) Refer to the variables provided by the realtor. Identify a question we might ask about any one of these individual variables. Answer: Answers will vary. Some possible answers include: List Price: What is the average list price of homes for sale in the area? What is the least expensive home for sale in the area? What is the most expensive home for sale in the area? School District: In which school district are most of the homes located? Size: What is the average size of homes for sale in the area? What is the largest home for sale in the area? What is the smallest home for sale in the area? Style: What style of home is most popular (or for sale the most) in this area? Diff: 2 Type: ES Var: 1 L.O.: 1.1.4;1.1.5 17) Refer to the variables provided by the realtor. Identify a question that we might ask about relationships between any two (or more) of these variables. Answer: Answers will vary. Some possible answers include: Does the price of the home depend on the school district (i.e., do homes in one district tend to cost more than in others)? Which style of home tends to cost the most? Do larger homes tend to cost more than smaller homes? Diff: 2 Type: ES Var: 1 L.O.: 1.1.4;1.1.5 Use the following to answer the questions below: The USStates dataset, used throughout the textbook, contains information on the 50 U.S. states. A small segment from the dataset is displayed in the following table.
18) What are the cases in this dataset? A) States B) Percent of residents with a college degree C) Residents D) USStates dataset Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.1.1
19) What variable from this dataset is displayed? Is it categorical or quantitative? A) Variable = Percent of state residents with a college degree. This is a quantitative variable. B) Variable = Percent of state residents with a college degree. This is a categorical variable. C) Variable = State. This is a quantitative variable. D) Variable = State. This is a categorical variable. Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.1.1;1.1.2 20) Data from the state of Connecticut were used to determine that 42.7% of state residents had a college degree. What were the cases from Connecticut used to arrive at this figure? A) Residents of Connecticut B) The state of Connecticut C) 42.7 percent of the Connecticut residents with a college degree D) The residents of California, Colorado, and Connecticut Answer: A Diff: 3 Type: BI Var: 1 L.O.: 1.1.1 21) What variable was used to determine that 42.7% of Connecticut state residents have a college degree? Is it categorical or quantitative? A) Variable = whether or not they have a college degree. This is a categorical variable. B) Variable = whether or not they have a college degree. This is a quantitative variable. C) Variable = what state they are from. This is a categorical variable. D) Variable = what state they are from. This is a quantitative variable. Answer: A Diff: 3 Type: BI Var: 1 L.O.: 1.1.1;1.1.2 22) The ________ variable is used to understand or predict values of the ________ variable. A) Blank 1 = Explanatory, Blank 2 = Response B) Blank 1 = Response, Blank 2 = Explanatory C) Blank 1 = Categorical, Blank 2 = Quantitative D) Blank 1 = Quantitative, Blank 2 = Categorical Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.3
1.2 Sampling from a Population
1) A population includes all individuals or objects of interest. Answer: TRUE Diff: 1 Type: TF Var: 1 L.O.: 1.2.1
2) A population is a subset of the sample. Answer: FALSE Diff: 1 Type: TF Var: 1 L.O.: 1.2.1 3) A biased sample is one that does not accurately reflect or represent the population. Answer: TRUE Diff: 1 Type: TF Var: 1 L.O.: 1.2.0 Use the following to answer the questions below: State whether the data are best described as a population or a sample. 4) The makers of M&M's state that when they package their candies they thoroughly mix the colored candies together and randomly put them into packages. A student purchases a bag of Milk Chocolate M&M's from the vending machine. A) Sample B) Population Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.2.1 5) A professor wants to schedule a review session for an exam. He asks all students enrolled in the course their preferred time, and they all respond. A) Population B) Sample Answer: A Explanation: Population - he collects data from everyone enrolled in the course Diff: 2 Type: BI Var: 1 L.O.: 1.2.1 6) A researcher has identified a beach with a substantial number of driftwood logs. She randomly chooses 30 logs and takes core samples from those logs. A) Sample B) Population Answer: A Explanation: Sample. She has only measured a subset of all of the logs on the beach. Diff: 2 Type: BI Var: 1 L.O.: 1.2.1
7) A football fan recorded the number of rushing yards for all NFL running backs who played last season. A) Population B) Sample Answer: A Explanation: Population. The data were collected on all of the running backs. Diff: 2 Type: BI Var: 1 L.O.: 1.2.1 Use the following to answer the questions below: A tree enthusiast is interested in estimating the typical length of oak tree leaves. He chooses 30 leaves from the oak tree in his backyard. 8) What is the sample in this situation? Answer: 30 leaves selected from this oak tree Diff: 2 Type: ES Var: 1 L.O.: 1.2.1 9) What is the population in which the tree enthusiast is interested? Answer: All oak tree leaves Diff: 2 Type: ES Var: 1 L.O.: 1.2.1 10) Is this a biased sampling strategy? A) Yes B) No Answer: A Explanation: This is a biased sampling strategy as he is only taking the leaves from one tree; there could be something unusual about that tree. At best we can generalize to the leaves on the tree in his backyard. Diff: 2 Type: MC Var: 1 L.O.: 1.2.2;1.2.3 In each situation, indicate whether the method of data collection is biased. 11) Ask the students at the gym on a Tuesday afternoon how many hours a week they work out to estimate the average amount of time students at the university work out. A) Biased B) Not biased Answer: A Explanation: Biased, because only students at the gym (who are likely working out) were sampled. Students who don't go to the gym (and thus possibly don't work out) were not included in the sample. Another potential source of bias is that, for various reasons, the students asked the question may exaggerate about the amount of time they work out each week. Diff: 1 Type: BI Var: 1 L.O.: 1.2.2;1.2.3;1.2.4;1.2.5 7
12) A professor asks her class of first year students if any of them consumed alcohol over the weekend. A) Biased B) Not biased Answer: A Explanation: This sample is likely biased. First year students would tend to be under the age of 21, and thus it would be illegal for them to be consuming alcohol. They might not want to truthfully tell their professor about engaging in an illegal behavior. Diff: 1 Type: BI Var: 1 L.O.: 1.2.2;1.2.3;1.2.4;1.2.5 13) A campus bookstore is holding a drawing to give away five free textbooks (one per student). Students enter the contest by writing their name and contact information on an index card. The index cards were placed in a bowl, thoroughly mixed around, and five cards were selected. Those five students were contacted and received their free textbook. A) Not biased B) Biased Answer: A Explanation: This is a non-biased sample of the students who entered the contest (the population in this situation is the student who entered the contest). Diff: 3 Type: BI Var: 1 L.O.: 1.2.2;1.2.3;1.2.4 14) A professor is considering a new textbook for her introductory statistics class. She wants to choose a book that emphasizes graphing data. A book that she is considering has 530 pages. To estimate the proportion of pages in the book that have displays of data, she randomly generates 20 numbers between 1 and 530. She then records whether or not each selected page contains displays of data. A) Not biased B) Biased Answer: A Explanation: Her pages were selected by generating random numbers. This is an unbiased sampling method. Diff: 1 Type: BI Var: 1 L.O.: 1.2.2;1.2.3;1.2.4 15) A reporter from the campus newspaper is writing an article about student opinions on Greek organizations (sororities and fraternities). For his article, he visits all of the Greek houses on campus and interviews a random sample of residents of each house. 
A) Biased B) Not biased Answer: A Explanation: His sample only contains individuals who have chosen to participate in Greek life, and thus the opinions will be biased towards those of individuals who choose to belong to a Greek organization. Diff: 1 Type: BI Var: 1 L.O.: 1.2.2;1.2.3;1.2.4
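The random page selection described in question 14 of this section can be sketched in Python. This is only an illustration, not part of the test bank: the function name `sample_pages` and the seed are hypothetical, and the sketch assumes pages are drawn without replacement (the question itself only says 20 random numbers between 1 and 530 are generated).

```python
import random


def sample_pages(total_pages, sample_size, seed=None):
    """Draw a simple random sample of distinct page numbers from 1..total_pages.

    Assumption: sampling is without replacement, so no page is checked twice.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_pages + 1), sample_size))


# 20 pages out of a 530-page book, as in question 14 (seed chosen arbitrarily).
pages = sample_pages(530, 20, seed=1)
print(pages)
```

Because every page has the same chance of being chosen, the proportion of sampled pages with data displays is an unbiased estimate of the proportion for the whole book.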
1.3 Experiments and Observational Studies
Use the following to answer the questions below: A group of researchers investigated the effect of media usage (whether or not subjects watch television or use the Internet) in the bedroom on "Tiredness" during the day (measured on a 50 point scale). 1) Identify the variables described and whether they are categorical or quantitative. A) Media usage in the bedroom = Categorical; "Tiredness" = Quantitative B) Media usage in the bedroom = Quantitative; "Tiredness" = Categorical C) Media usage in the bedroom = Categorical; "Tiredness" = Categorical D) Media usage in the bedroom = Quantitative; "Tiredness" = Quantitative Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.2 2) Identify the variables as either explanatory or response variables. A) Media usage in the bedroom = Explanatory; "Tiredness" = Response variable B) Media usage in the bedroom = Response variable; "Tiredness" = Explanatory C) Media usage in the bedroom = Explanatory; "Tiredness" = Explanatory D) Media usage in the bedroom = Response variable; "Tiredness" = Response variable Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.1.3 3) To collect these data, the researchers randomly selected homes to visit and interviewed the adult member of the household whose birthday was nearest. Is this an experiment or an observational study? A) Observational study B) Experiment Answer: A Explanation: This is an observational study because treatments are not being applied to the study participants; they are being asked about their normal behaviors. Diff: 1 Type: BI Var: 1 L.O.: 1.3.3 4) Suppose that the researchers found that the individuals who use media in the bedroom tended to be more tired during the day than those who do not. Would it be appropriate for the researchers to conclude that using media in the bedroom causes tiredness during the day? 
A) Yes B) No Answer: B Explanation: No, it would not be appropriate to make a claim about a causal relationship between media usage in the bedroom and tiredness because this is an observational study. It could be the case that the individuals used media in the bedroom because they couldn't sleep, and they were tired because of their inability to sleep. An experiment needs to be conducted to show a cause-and-effect relationship between two variables. Diff: 2 Type: BI Var: 1
L.O.: 1.3.1;1.3.2;1.3.4 5) Association implies causation. Answer: FALSE Diff: 1 Type: TF Var: 1 L.O.: 1.3.1 6) In elementary school (Grades 1 through 6) there is a strong association between a child's height and reading ability. What is a possible confounding variable that would help explain this relationship? Explain briefly. Answer: Age/grade is a possible confounding variable. Older students (who tend to be in the higher grades) tend to be taller than the younger students (in the lower grades). The older students should also be better readers than the younger students. Diff: 2 Type: ES Var: 1 L.O.: 1.3.2 7) A sample of college age students shows an interesting association between hair length (in inches) and height (also in inches). On average, shorter students tend to have longer hair. What is a possible confounding variable that would help explain this relationship? A) Gender B) Age C) Grade point average D) Local fashion preferences Answer: A Explanation: Gender is a possible confounding variable. We are looking at students in general, which can be either male or female. Females tend to be shorter than males, and females tend to have longer hair than males. Diff: 2 Type: BI Var: 1 L.O.: 1.3.2 Use the following to answer the questions below: A recent study investigated the impact of psychological stress on men's judgments of female body size. The men were randomly assigned to one of two groups; one group was assigned to participate in a stressful task while the other group did not take part in the task. Then the men were asked to rate the attractiveness of female bodies varying in size from emaciated to obese. 8) What are the cases in this study? Answer: Men Diff: 2 Type: SA Var: 1 L.O.: 1.1.1 9) Is this an experiment or an observational study? A) Experiment B) Observational study Answer: A 10
Explanation: This is an experiment because the men were assigned, at random, to one of two treatments (stressful task or not). Because their normal behavior was modified, this is not an observational study. Diff: 1 Type: BI Var: 1 L.O.: 1.3.3 10) Identify the explanatory variable in this experiment. A) Type of task (stressful or not) B) Rating of the attractiveness of the female body sizes Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.1.3 11) Identify the response variable in this experiment. A) Type of task (stressful or not) B) Rating of the attractiveness of the female body sizes Answer: B Diff: 1 Type: BI Var: 1 L.O.: 1.1.3 12) Is a control group used in this experiment? A) Yes B) No Answer: A Explanation: Yes, a control group is used. The group that did not participate in the stressful task is the control group. Diff: 1 Type: MC Var: 1 L.O.: 1.3.0 Use the following to answer the questions below: Identify whether each of the following scenarios describe a randomized comparative experiment or a matched pairs experiment. 13) To study the impact of texting while driving, researchers have students drive around an obstacle course twice, once while texting and once without texting (the order of which was randomized). Their score for each turn is the number of obstacles they successfully maneuvered around. Answer: Matched pairs Diff: 1 Type: SA Var: 1 L.O.: 1.3.6 14) Studies have shown that multi-tasking typically results in lower productivity. However, some people believe that individuals who play video games are better at multi-tasking. To investigate this, 28 video game players were randomly assigned to one of two groups. One group was assigned to play a video game that involved driving a car around a track. The other group was assigned to play the same video game while simultaneously answering unrelated trivia questions over the phone. 11
Answer: Randomized Comparative Experiment Diff: 1 Type: SA Var: 1 L.O.: 1.3.6 15) To study the effect of classical music on concentration, 26 math majors were assigned at random into two groups. Subjects in one group listened to classical music while trying to solve a hard Sudoku puzzle, while the subjects in the other group solved the same puzzle in a silent room. The time it took each student to finish was recorded. Answer: Randomized Comparative Experiment Diff: 1 Type: SA Var: 1 L.O.: 1.3.6 16) On their website, the makers of Cold-EEZE lozenges provide links to studies done to demonstrate the effectiveness of their product at shortening the duration of the common cold. One study, published in the Annals of Internal Medicine, is described as a "randomized, double-blind, placebo-controlled" study. Briefly explain what the phrase "randomized, double-blind, placebo-controlled" means. Answer: "Randomized" means that the subjects in the experiment were randomly assigned to the different treatments. Since this was a "placebo-controlled" experiment, one of the treatments was the Cold-EEZE lozenge while the other was a lozenge not believed to have any effect on the duration of a cold. "Double-blind" means that neither the subjects nor the individuals evaluating them knew which treatment the subjects were receiving; to ensure that this could happen, both the Cold-EEZE lozenge and the placebo need to be administered in the same way and be otherwise indistinguishable. Diff: 2 Type: ES Var: 1 L.O.: 1.3.5 17) Is using meditation to relax and clear the mind a natural way to treat insomnia? Design an experiment to investigate this question. Assume that you have 20 individuals who suffer from insomnia available to participate in the study. At the end of two months, you will ask subjects to rate their sleep quality. Answer: Randomly assign the 20 subjects to one of two groups. One group of 10 subjects will be taught how to meditate and asked to meditate once a day. 
The other group of 10 subjects will not change their normal behavior (this is the control group). After two months, compare sleep quality for the two groups. Diff: 1 Type: ES Var: 1 L.O.: 1.3.7 Use the following to answer the questions below: Can people text just as quickly with their off hand as they do their dominant hand? Assume that you have 42 volunteers available to participate in your study, and that the response you will measure is the time it takes to type and send a text message. 18) Design a randomized comparative experiment to investigate this question. Be specific about how randomization will be used in your experiment.
Answer: Randomly assign the 42 subjects to one of two groups. One group will be assigned to send a text message with their dominant hand, while the other group will send the same text message with their off-hand. All participants will send the same message. The time it takes to type and send the message will be recorded and compared for the two groups. Diff: 2 Type: ES Var: 1 L.O.: 1.3.6;1.3.7 19) Design a matched pairs experiment to investigate this question. Be specific about how randomization will be used in your experiment. Answer: Each subject will send the text message with both hands, but the order of "dominant" and "off hand" should be randomized for each subject. The time it takes for them to send each text message should be recorded. Diff: 2 Type: ES Var: 1 L.O.: 1.3.6;1.3.7 Use the following to answer the questions below: The Admissions Office at a small university has developed a new 10-minute video about the university to send to prospective students. Before mass-producing the DVD, they would like to test whether it is more effective than the current video. Suppose that you have 12 high school student volunteers who have agreed to take part in an experiment. The explanatory variable to be studied is the type of video, with two levels OLD and NEW. 20) Design a study that could be used in this situation. Give explicit instructions on what the 12 students should do, and be sure to indicate how randomization is used in the study. Answer: Answers will vary. Students could describe either a randomized comparative experiment or a matched pairs experiment. An example of a randomized comparative experiment would be: Randomly assign the 12 students to one of two groups. One group of 6 watch the OLD video (this is the control group). The other group of 6 students watch the new video. An example of a matched pairs experiment would be: In this type of study, the students would watch both videos. 
For each student, randomly decide which video they should watch first, labeled "1" (OLD) or "2" (NEW). The student should not know which is the OLD and which is the NEW. After watching the first video, have the student watch the second video. Diff: 2 Type: ES Var: 1 L.O.: 1.3.7 21) What specific question would you ask the students to measure as the response variable in this study? Answer: Answers will vary. For the randomized comparative study, you might ask something like "Based on the video alone, would you consider attending the university?" or "Based on the video alone, rate the likelihood of your attending the university on a scale from 1 to 10." For a matched pairs experiment, you might ask something like "Which video makes you more likely to consider the university, 1 or 2?" 13
Diff: 2 Type: ES Var: 1 L.O.: 1.3.0
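For the randomized comparative design in question 20, the random assignment step could be carried out as in the following Python sketch. The student labels `S1` to `S12`, the function name `assign_groups`, and the seed are hypothetical placeholders; the test bank does not prescribe any particular randomization mechanism.

```python
import random


def assign_groups(subjects, labels=("OLD", "NEW"), seed=None):
    """Shuffle the subjects and split them into two equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {labels[0]: shuffled[:half], labels[1]: shuffled[half:]}


# Twelve hypothetical student identifiers, six per video.
students = [f"S{i}" for i in range(1, 13)]
groups = assign_groups(students, seed=2021)
for video, members in groups.items():
    print(video, members)
```

Shuffling the full list before splitting gives every student the same chance of landing in either group, which is the point of randomization in this design.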
22) What are the cases in your study? A) High school students B) Type of video C) OLD and NEW D) Instructional materials Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.1 23) Would you describe your study as a randomized comparative experiment, a matched pair experiment, or an observational study? Briefly explain. Answer: Answers will depend on their design. Diff: 2 Type: ES Var: 1 L.O.: 1.3.6 24) Explain what it means, in the context of this study, to want the subjects to be "blind." Answer: It means that you do not want the student to know which video they are viewing (the old or the new). Diff: 1 Type: ES Var: 1 L.O.: 1.3.5 25) When purchasing some foods, like Jello, at the grocery store, the color of the product typically "matches" the taste. For example, lemon-flavored Jello is yellow, cherry-flavored Jello is red, orange-flavored Jello is orange, and grape-flavored Jello is purple. But, does the color of our food impact the taste that we perceive? Suppose you want to design an experiment to address this question. Note that you can easily make your own "Jello" with simple ingredients that include unflavored gelatin, flavored extracts, and food coloring. Assume that the 30 college students are willing to participate in your study and the response variable is the number of flavors correctly identified. How would you design a randomized comparative experiment with two groups, each getting a different treatment? Be sure to explain how randomization is used. Answer: Answers will vary. A possible acceptable answer would be: Make a few batches of "Jello" where the color does not "match" the flavor. For example, make some purple orange-flavored "Jello", some red lemon-flavored "Jello," some green cherry-flavored "Jello," and some yellow grape-flavored "Jello." Randomly assign the 30 students to one of two groups. 
One group will taste the "Jello" blindfolded, and the other group will taste the "Jello" while being able to see the color.* Provide all subjects in each group a sample of all flavors to taste, and ask them to identify the flavor of each. Record the number that each participant gets correct. * Note that one group should see the misleading colors and the other should not. There could be several ways in which the latter occurs, including being blindfolded or being served Jello where the color and flavor "match." Diff: 3 Type: ES Var: 1 14
L.O.: 1.3.6;1.3.7 26) A group of students were asked to count the number of scars on both of their hands. The number of scars on their dominant hand was compared to the number of scars on their "off" hand. Is this an observational study or a randomized experiment? A) Observational study B) Randomized experiment Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.3.3 Use the following to answer the questions below: A university's Admissions staff sends one of four different representatives to work at college fairs. A study was conducted to evaluate the relative effectiveness of the four representatives. For each college fair over the course of the year, the number of inquiries from students, the type of fair (large or small), the representative who worked at that fair, and the percent of inquiries that resulted in applications were recorded. It was found that one of the representatives was far more effective at getting lots of inquiries. 27) What are the cases in this study? A) College fairs B) Representatives C) The percent of inquiries that resulted in applications D) The number of inquiries from students Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.1 28) What are the variables recorded in this study? List them and identify each as either categorical or quantitative. Answer: Number of inquiries from students: quantitative. Type of fair: categorical (large or small). Representative who attended: categorical (will be one of the four representatives). Percent of inquiries that resulted in applications: quantitative. Diff: 2 Type: ES Var: 1 L.O.: 1.1.1;1.1.2 29) Is this an observational study or an experiment? A) Observational study B) Experiment Answer: A Explanation: This is an observational study because the representatives weren't randomly assigned to the fairs. Diff: 2 Type: MC Var: 1 L.O.: 1.3.3
30) Can we conclude that sending the most effective representative to more college fairs will increase the number of inquiries from those college fairs? A) Yes B) No Answer: B Explanation: No, we cannot make that conclusion. This was not a randomized experiment (which is the only way to conclude causation), and thus there could be another lurking variable that could explain why one representative tended to be more effective. Diff: 2 Type: MC Var: 1 L.O.: 1.3.4 31) Briefly explain the distinction between an observational study and a designed experiment. Answer: In a designed experiment, some sort of treatment is applied to the cases in the study (i.e., their usual behavior/condition is modified). In an observational study, the cases are observed as they are, without any interference. Only a designed experiment can result in a cause-and-effect conclusion. Diff: 2 Type: ES Var: 1 L.O.: 1.3.3 32) A company is interested in redesigning its website, and two possible designs are being considered. The company wants to get input in the form of ratings of the two designs. Design a matched pairs experiment to decide which design gets higher ratings. Fifty volunteers are available to participate. Answer: Call the designs A and B. Randomly assign half of the volunteers to examine design A first and the rest to examine design B first. They should rate the design (say, on a scale from 1 to 10). After they have seen one design, they should examine and rate the other. The difference in their ratings should be computed. Diff: 2 Type: ES Var: 1 L.O.: 1.3.6;1.3.7
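The matched pairs design in question 32 can be sketched in Python as follows. The volunteer labels, function names, and seed are hypothetical, and the single rating pair at the end is made-up illustration data, not a result from any study.

```python
import random


def assign_order(volunteers, seed=None):
    """Randomly pick half of the volunteers to see design A first; the rest see B first."""
    rng = random.Random(seed)
    shuffled = list(volunteers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    order = {v: ("A", "B") for v in shuffled[:half]}
    order.update({v: ("B", "A") for v in shuffled[half:]})
    return order


def rating_differences(ratings):
    """ratings maps volunteer -> (rating of A, rating of B); returns A minus B per volunteer."""
    return {v: a - b for v, (a, b) in ratings.items()}


volunteers = [f"V{i}" for i in range(1, 51)]
order = assign_order(volunteers, seed=7)
a_first = [v for v, seq in order.items() if seq[0] == "A"]
print(len(a_first), "volunteers see design A first")

# Made-up example: one volunteer rates A as 8 and B as 6.
print(rating_differences({"V1": (8, 6)}))
```

Randomizing the viewing order guards against an order effect, and working with each volunteer's difference in ratings is what makes the analysis a matched pairs comparison.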
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 2 Describing Data
2.1 Categorical Variables
Use the following to answer the questions below: February 12, 2009 marked the anniversary of Charles Darwin's birth. To celebrate, Gallup, a national polling organization, surveyed 1,018 Americans about their education level and their beliefs about the theory of evolution. The survey results are displayed in the provided two-way table.
                      Believe   Do Not Believe   No Opinion   Total
High School or Less        80              103          197     380
Some College              133               94           98     325
College Graduate          121               48           59     228
Postgraduate               63                9           13      85
Total                     397              254          367   1,018
1) What proportion of respondents have a college degree? Round your answer to three decimal places. Answer: 0.224 Diff: 1 Type: SA Var: 1 L.O.: 2.1.4 2) What proportion of respondents have no opinion on the theory of evolution? Round your answer to two decimal places. Answer: 0.36 Diff: 1 Type: SA Var: 1 L.O.: 2.1.4 3) What proportion of non-believers have a high school education or less? Use four decimal places in your answer. Answer: 0.4055 Diff: 2 Type: SA Var: 1 L.O.: 2.1.4 4) What proportion of college graduates believe in the theory of evolution? Use four decimal places in your answer. Answer: 0.5307 Diff: 2 Type: SA Var: 1 L.O.: 2.1.4
5) Find the proportion of respondents who believe in evolution for each education level (round each to three decimal places). Does there seem to be an association between education level and belief in evolution? If so, in what direction? Answer: Proportion of respondents who believe in evolution for each education level: HS or less: 0.211 Some College: 0.409 College Graduate: 0.531 Postgraduate: 0.741 There does seem to be an association between education level and belief in evolution. Individuals with more education are more likely to believe in evolution. Diff: 3 Type: ES Var: 1 L.O.: 2.1.4
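The proportions in questions 1 through 5 can be checked with a short Python sketch. The counts are transcribed from the two-way table in this section; the dictionary layout and function names are illustrative choices, not anything prescribed by the textbook.

```python
# Counts transcribed from the two-way table (education level by belief in evolution).
table = {
    "High School or Less": {"Believe": 80, "Do Not Believe": 103, "No Opinion": 197},
    "Some College": {"Believe": 133, "Do Not Believe": 94, "No Opinion": 98},
    "College Graduate": {"Believe": 121, "Do Not Believe": 48, "No Opinion": 59},
    "Postgraduate": {"Believe": 63, "Do Not Believe": 9, "No Opinion": 13},
}


def row_total(level):
    """Number of respondents at one education level."""
    return sum(table[level].values())


def column_total(response):
    """Number of respondents giving one response, across all education levels."""
    return sum(row[response] for row in table.values())


grand_total = sum(row_total(level) for level in table)


def conditional_proportion(level, response):
    """Proportion of respondents at one education level who gave one response."""
    return table[level][response] / row_total(level)


# Question 5: proportion who believe in evolution at each education level.
believe_by_level = {
    level: round(conditional_proportion(level, "Believe"), 3) for level in table
}
for level, p in believe_by_level.items():
    print(f"{level}: {p:.3f}")
```

Dividing by the row total answers "what proportion of this education level believes", while dividing by a column total (as in question 3) answers "what proportion of non-believers has this education level"; keeping the two denominators in separate helpers makes that distinction explicit.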
6) The survey results are displayed in the segmented bar chart. Does there appear to be an association between education level and belief in the theory of evolution? If so, what does it mean about these two variables?
Answer: Individuals with a high school education or less are least likely to believe in the theory of evolution and are also most likely to have no opinion. The percentage of people who believe in the theory of evolution increases as education level increases, and the percentage with no opinion decreases as education level increases. Students might also notice that there are more respondents with a high school education or less, and that the number of respondents in each education category decreases as education level increases. These observations, though useful and informative, are not about the relationship between the two variables. Diff: 3 Type: ES Var: 1 L.O.: 2.1.5
Use the statement to answer the questions below. 7) A Fun-Size bag of M&M's contains 4 green, 4 red, 3 yellow, 4 orange, 4 blue, and 3 brown candies. What proportion of the M&M's are green? Use four decimal places in your answer. Answer: 0.1818 Diff: 1 Type: SA Var: 1 L.O.: 2.1.2
8) A Fun-Size bag of M&M's contains 5 green, 4 red, 4 yellow, 3 orange, 4 blue, and 4 brown candies. What proportion of the candies are yellow or orange? Use four decimal places in your answer. Answer: 0.2917 Diff: 1 Type: SA Var: 1 L.O.: 2.1.2 9) A Fun-Size bag of M&M's contains 5 green, 3 red, 3 yellow, 3 orange, 4 blue, and 3 brown candies. Sketch a bar chart of the data. Answer: Will depend on the values randomly chosen for each color, but should look something like the following.
[Figure: bar chart of M&M counts by color]
Diff: 1 Type: ES Var: 1 L.O.: 2.1.1
10) A Fun-Size bag of M&M's contains 4 green, 2 red, 4 yellow, 3 orange, 6 blue, and 1 brown candies. Construct a relative frequency table of the results. Use two decimal places in your relative frequencies. Answer:
Color    Proportion
Green    0.20
Red      0.10
Yellow   0.20
Orange   0.15
Blue     0.30
Brown    0.05
Total    1.00
Diff: 1 Type: ES Var: 1 L.O.: 2.1.1
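As an illustration of how the relative frequency table in question 10 is built, here is a minimal Python sketch. The variable names are illustrative and not part of the test bank; the counts come from the question.

```python
# Candy counts from question 10.
counts = {"Green": 4, "Red": 2, "Yellow": 4, "Orange": 3, "Blue": 6, "Brown": 1}

total = sum(counts.values())

# Relative frequency = count for the category / total count.
relative_freq = {color: n / total for color, n in counts.items()}

print(f"{'Color':<8}Proportion")
for color, p in relative_freq.items():
    print(f"{color:<8}{p:.2f}")
print(f"{'Total':<8}{sum(relative_freq.values()):.2f}")
```

The proportions should always sum to 1 before rounding, which is a quick sanity check on any relative frequency table.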
Use the following to answer the questions below: In a survey conducted by the Gallup organization September 6-9, 2012, 1,017 adults were asked, "In general, how much trust and confidence do you have in the mass media -- such as newspapers, TV, and radio -- when it comes to reporting the news fully, accurately, and fairly?" 81 said that they had a "great deal" of confidence, 325 said they had a "fair amount" of confidence, 397 said they had "not very much" confidence, and 214 said they had "no confidence at all." 11) Display the results in a frequency table. Answer:
Response                    Count
Great Deal of Confidence       81
Fair Amount of Confidence     325
Not Very Much Confidence      397
No Confidence At All          214
Total                       1,017
Diff: 1 Type: ES Var: 1 L.O.: 2.1.1
12) Sketch a bar chart of the data. Answer: Displayed below:
[Figure: bar chart of confidence-in-media responses]
Diff: 1 Type: ES Var: 1 L.O.: 2.1.1
13) Give a relative frequency table of the data. Use two decimal places in your relative frequencies. Answer:
Response                    Proportion
Great Deal of Confidence    0.08
Fair Amount of Confidence   0.32
Not Very Much Confidence    0.39
No Confidence At All        0.21
Total                       1.00
Diff: 1 Type: ES Var: 1 L.O.: 2.1.1
14) What proportion of respondents have a great deal of confidence in the media? Use two decimal places in your answer. Answer: 0.08 Diff: 1 Type: SA Var: 1 L.O.: 2.1.2
15) What proportion of respondents have a negative opinion (not very much confidence/none at all) about the mass media? Use two decimal places in your answer. Answer: 0.60 Diff: 2 Type: SA Var: 1 L.O.: 2.1.2 16) In the article, they discuss the association between political party and opinion about the media ("positive" = great deal/fair amount of confidence and "negative" = not very much confidence/none at all). The results, as percentages, are displayed in the side-by-side bar charts. Describe the association between political party and opinion about the media. Use the bar chart to estimate the proportion of individuals with positive opinions about the media for each political party.
Answer: Democrats are more likely to have a positive opinion about the media, while the majority of Republicans and Independents have negative opinions about the media. The Republicans are the group that are least likely to have a positive opinion about the media. About 58% of Democrats have a positive opinion about the media, while roughly 31% of Independents and 26% of Republicans have positive opinions about the media. (Answers may vary slightly.) Diff: 2 Type: ES Var: 1 L.O.: 2.1.5
17) In the same article, they compare opinions about the media in 2012 to those in the previous election year (2008). In 2008, 60% of Democrats, 27% of Republicans, and 41% of Independents had positive opinions (great deal/fair amount of confidence) about the media. For each political party, find the difference in the proportion of positive opinions in 2008 and the proportion of positive opinions in 2012. Comment on if/how opinions have changed for the political parties over the past four years. Answer: Differences may vary slightly as they are estimating the 2012 proportions from the bar chart. Each difference is the 2008 proportion minus the 2012 proportion.
Democrats: 0.60 - 0.58 = 0.02
Republicans: 0.27 - 0.26 = 0.01
Independents: 0.41 - 0.31 = 0.10
The proportion of people with positive views on the media has decreased over the last four years for all parties. The Democrats and Republicans saw only small changes in this proportion, 0.02 and 0.01 respectively. The Independents had the largest change in the proportion of people with positive views on the media, with a decrease of 10 percentage points (0.10). Diff: 2 Type: ES Var: 1 L.O.: 2.1.1
Use the following to answer the questions below: In a recent survey, Gallup asked a sample of U.S. adults if they would prefer to have a job outside the home, or if they would prefer to stay home to care for the family and home. Partial results for the individuals who expressed a preference, broken down by gender, are displayed in the two-way table.
                         Male   Female   Total
Job Outside of the Home   391      254     645
Stay at Home              ???      219     332
Total                     504      473     977
18) Find the number of males who would prefer to stay at home. Answer: 113 Explanation: 504 - 391 = 113 or 332 - 219 = 113 Diff: 2 Type: SA Var: 1 L.O.: 2.1.3 19) What proportion of respondents would prefer to stay at home? Round your answer to two decimal places. Answer: 0.34 Diff: 1 Type: SA Var: 1 L.O.: 2.1.4
20) Compute the difference in the proportion of men who would prefer a job outside of the home and the proportion of females who would prefer a job outside of the home. Use two decimal places in your answer. Answer: 0.24 Explanation: 391/504 - 254/473 = 0.78 - 0.54 = 0.24 Diff: 1 Type: SA Var: 1 L.O.: 2.1.4
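The arithmetic behind questions 18 to 20 can be sketched in Python as follows. This is only a check under the counts given in the two-way table; the variable names are illustrative.

```python
# Known cells and margins from the job-preference table.
male_total, female_total = 504, 473
male_job, female_job = 391, 254
female_stay, stay_total = 219, 332

# Question 18: the missing cell follows from the column total...
male_stay = male_total - male_job
# ...and must agree with the value implied by the row total.
male_stay_from_row = stay_total - female_stay

# Question 19: overall proportion preferring to stay at home.
p_stay = stay_total / (male_total + female_total)

# Question 20: difference in conditional proportions (men minus women).
p_male_job = male_job / male_total
p_female_job = female_job / female_total
difference = p_male_job - p_female_job

print(male_stay, round(p_stay, 2), round(difference, 2))
```

Here rounding the two proportions first (0.78 and 0.54) and rounding only the final difference give the same two-decimal answer, although that need not hold for other data.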
21) Students in a small statistics class were asked which was their dominant hand and if they were in a STEM (science, technology, engineering, and math) major. Their results are listed below. Use the results to construct a two-way table.
Student   1      2      3      4      5      6      7      8
Hand      Right  Right  Left   Right  Right  Right  Right  Left
STEM?     No     Yes    Yes    Yes    No     No     No     No

Student   9      10     11     12     13     14     15     16
Hand      Right  Left   Right  Right  Right  Right  Right  Right
STEM?     Yes    No     Yes    Yes    No     No     Yes    No
Answer:
         STEM, Yes   STEM, No   Total
Right            6          7      13
Left             1          2       3
Total            7          9      16
Diff: 2 Type: ES Var: 1 L.O.: 2.1.3
2.2 One Quantitative Variable: Shape and Center
1) If a distribution is heavily skewed to the left, which relationship between the mean and median is most likely? A) Mean < Median B) Mean ≈ Median C) Mean > Median Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.2.4
2) If a distribution is roughly symmetric, which relationship between the mean and median is likely true? A) Mean < Median B) Mean ≈ Median C) Mean > Median Answer: B Diff: 2 Type: MC Var: 1 L.O.: 2.2.4 3) If a distribution is heavily skewed to the right, which relationship between the mean and median is likely true? A) Mean < Median B) Mean ≈ Median C) Mean > Median Answer: C Diff: 2 Type: MC Var: 1 L.O.: 2.2.4 Use the dataset to calculate the following summary statistics in the questions below. Report each with one decimal place. 4) Median Answer: 14.5 Diff: 1 Type: SA Var: 1 L.O.: 2.2.2
5) Mean Answer: 14.0 Diff: 1 Type: SA Var: 1 L.O.: 2.2.2
Use the following to answer the questions below: The provided histogram displays the number of Facebook friends for students in a small statistics class.
6) Which of the following best describes the shape of the distribution of the number of Facebook friends? A) Skewed to the left B) Roughly symmetric C) Skewed to the right Answer: C Diff: 1 Type: MC Var: 1 L.O.: 2.2.1 7) The mean number of Facebook friends is closest to which value? A) 220 friends B) 560 friends C) 810 friends D) 1,000 friends Answer: B Diff: 2 Type: MC Var: 1 L.O.: 2.2.3 8) The median number of Facebook friends is likely closest to which value? A) 300 B) 500 C) 700 D) 800 Answer: B Diff: 2 Type: MC Var: 1 L.O.: 2.2.3
Use the following to answer the questions below: The finishing times for the top 100 men in a marathon are displayed in the provided figure.
9) Which of the following best describes the distribution of times for the top 100 male finishers in the marathon? A) Skewed to the left B) Roughly symmetric C) Skewed to the right Answer: A Diff: 1 Type: MC Var: 1 L.O.: 2.2.1 10) The mean time for the top 100 males is closest to which value? A) 148 minutes B) 151 minutes C) 140 minutes D) 135 minutes Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.2.3 11) The median time for the top 100 males is closest to which value? A) 156 minutes B) 140 minutes C) 145 minutes D) 151 minutes Answer: D Diff: 2 Type: MC Var: 1 L.O.: 2.2.3
Use the following to answer the questions below: The midrange is another way to measure the center of a distribution. The midrange of a dataset is defined to be the average of the minimum and maximum values in the dataset. 12) Calculate the midrange of this dataset.
4, 10, 12, 2, 7, 5, 9, 8
Answer: 7 Explanation: midrange = (12 + 2)/2 = 7 Diff: 2 Type: SA Var: 1 L.O.: 2.2.0 13) In general, would you think that the midrange should be a resistant statistic? A) Yes B) No Answer: B Explanation: No, it would not be resistant in general because it is calculated with the minimum and maximum in the dataset. If there is an outlier in the dataset, it will be one of these values. This would affect the average of the minimum and maximum (since averages are not resistant). Diff: 3 Type: MC Var: 1 L.O.: 2.2.4
2.3 One Quantitative Variable: Measures of Spread
1) Which statistic is more resistant to outliers (or extreme data values)? A) Mean B) Median Answer: B Diff: 2 Type: BI Var: 1 L.O.: 2.2.4;2.3.6 2) Which statistic is more resistant to outliers (or extreme data values)? A) Interquartile Range B) Standard Deviation Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.3.6
Use the dataset to compute the following summary statistics in the questions below. 3) Median Answer: 12 Diff: 1 Type: SA Var: 1 L.O.: 2.2.2
4) Mean (rounded to two decimal places) Answer: 14.43 Diff: 1 Type: SA Var: 1 L.O.: 2.2.2 5) Q1 Answer: 6 Diff: 1 Type: SA Var: 1 L.O.: 2.3.1
6) Q3 Answer: 22 Diff: 1 Type: SA Var: 1 L.O.: 2.3.1
7) IQR Answer: 17 Diff: 1 Type: SA Var: 1 L.O.: 2.3.5
8) Range Answer: 25 Diff: 1 Type: SA Var: 1 L.O.: 2.3.5
9) Each of the variables displayed in the histograms below has a mean of 14.5, a range of 8, and 59 observations. Rank the three variables according to their standard deviations, from the smallest to the largest.
A) A, B, C B) B, A, C C) C, B, A D) A, C, B Answer: B Diff: 3 Type: BI Var: 1 L.O.: 2.3.2
Use the following statement to answer the questions below. 10) The distribution of waiting times at the student health center is bell-shaped with a mean of 13 minutes and a standard deviation of 2. Give an interval that is likely to contain about 95% of wait times. A) 9 to 17 minutes B) 11 to 15 minutes C) 7 to 19 minutes D) 9 to 13 minutes Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.3.2
11) The distribution of waiting times at the student health center is bell-shaped with a mean of 10 minutes and a standard deviation of 3. Find the z-score of someone who waits 5 minutes. Round your z-score to two decimal places. Be sure to specifically indicate if a wait time of 5 minutes is unusual. A) -1.67; The wait time is not unusual. B) 1.67; The wait time is unusual. C) -1.67; The wait time is unusual. D) 1.67; The wait time is not unusual. Answer: A Explanation: Interpretations will vary: An individual who waits 5 minutes has a wait time that is 1.67 standard deviations below the mean wait time. If the z-score is less than two standard deviations below the mean, the answer should indicate that this wait time is not unusual. If the z-score is more than two standard deviations below the mean, the answer should indicate that this wait time is unusual. Diff: 2 Type: BI Var: 1 L.O.: 2.3.3
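Questions 10 and 11 both rest on the same two rules for bell-shaped data: about 95% of values fall within two standard deviations of the mean, and a value is flagged as unusual when its z-score is more than 2 in absolute value. A minimal Python sketch, with illustrative function names that are not from the test bank:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def interval_95(mean, sd):
    """Approximate 95% interval for bell-shaped data: mean +/- 2 sd."""
    return (mean - 2 * sd, mean + 2 * sd)

# Question 10: mean 13 minutes, standard deviation 2.
low, high = interval_95(13, 2)

# Question 11: mean 10 minutes, standard deviation 3, observed wait 5.
z = z_score(5, mean=10, sd=3)
unusual = abs(z) > 2

print(f"95% interval: {low} to {high} minutes")
print(f"z = {z:.2f}, unusual: {unusual}")
```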
Use the following to answer the questions below: Scores on an exam (out of 100 points) given in a large introductory statistics course are displayed in the provided histogram.
12) Which best describes the shape of the distribution of exam scores? A) Approximately symmetric B) Skewed to the left C) Skewed to the right Answer: A Diff: 1 Type: BI Var: 1 L.O.: 2.2.1 13) Based on the histogram, which value is likely the mean exam score? A) 82 B) 88 C) 76 D) 92 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 2.2.3 14) Based on the histogram of exam scores, which value is likely the median exam score? A) 92 B) 88 C) 82 D) 72 Answer: C Diff: 1 Type: BI Var: 1 L.O.: 2.2.3
15) Based on the histogram, the standard deviation of the exam scores is likely closest to which of these values? A) 0.5 B) 10 C) 5 D) 1 Answer: C Diff: 3 Type: BI Var: 1 L.O.: 2.3.2;2.3.5
2.4 Boxplots and Quantitative/Categorical Relationships
Use the following to answer the questions below: One of the symptoms of the flu is an elevated pulse rate. Pulse rates (in beats per minute) for patients with the flu are provided.
75, 80, 81, 82, 82, 83, 84, 85, 86, 88, 88, 90, 90, 90, 91, 92, 93, 93, 95, 97, 99, 101, 110
1) Give the sample mean pulse rate. Use two decimal places in your answer. A) 89.35 beats per minute B) 90 beats per minute C) 92.5 beats per minute D) 89 beats per minute Answer: A Diff: 1 Type: BI Var: 1 L.O.: 2.2.2 2) Find the standard deviation of the pulse rates. Use two decimal places in your answer. A) 7.85 beats per minute B) 8.27 beats per minute C) 7.33 beats per minute D) 8.19 beats per minute Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.3.1
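The answers to questions 1 and 2 can be reproduced with Python's standard `statistics` module. This is a sketch rather than the software the authors used; the list `pulse` is the 23 pulse rates from the problem statement.

```python
import statistics

# Pulse rates (beats per minute) for the 23 flu patients.
pulse = [75, 80, 81, 82, 82, 83, 84, 85, 86, 88, 88, 90,
         90, 90, 91, 92, 93, 93, 95, 97, 99, 101, 110]

sample_mean = statistics.mean(pulse)
# stdev() is the sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(pulse)

print(f"n = {len(pulse)}")
print(f"mean = {sample_mean:.2f} bpm")
print(f"sd = {sample_sd:.2f} bpm")
```

Using `statistics.pstdev` instead would divide by n and give a slightly smaller value, so it matters which of the two a question intends.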
3) Give the five-number summary of these pulse rates. Answer: (Answers may vary slightly due to software used.)
Summaries from R:
Min. 75.00 bpm
1st Qu. 83.50 bpm
Median 90.00 bpm
3rd Qu. 93.00 bpm
Max. 110.00 bpm
Summaries from Minitab:
Min. 75.00 bpm
1st Qu. 83.00 bpm
Median 90.00 bpm
3rd Qu. 93.00 bpm
Max. 110.00 bpm
bpm = beats per minute
Diff: 1 Type: ES Var: 1 L.O.: 2.3.1;2.3.4
4) Are there any outliers? If so, which data points? Clearly show your work to justify your answer. Answer: Answers for IQR could vary slightly, depending on method/software used. Regardless, 110 should be the only outlier. Using Q1 and Q3 found in R: Use the 1.5 IQR rule to detect outliers. IQR = 93 - 83.50 = 9.5 1.5 IQR = 14.25 An observation is an outlier if it is smaller than 83.5 - 14.25 = 69.25 bpm or larger than 93 + 14.25 = 107.25 bpm. There is only one outlier, 110 bpm. Diff: 2 Type: ES Var: 1 L.O.: 2.4.1
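The five-number summary and the 1.5 IQR rule can be sketched in Python as below. The function names are illustrative. For these data, the `"inclusive"` quartile method of `statistics.quantiles` gives the quartiles the answer key attributes to R, and `"exclusive"` gives the first quartile attributed to Minitab; treating the two methods as stand-ins for those packages is an assumption made for this example.

```python
import statistics

pulse = [75, 80, 81, 82, 82, 83, 84, 85, 86, 88, 88, 90,
         90, 90, 91, 92, 93, 93, 95, 97, 99, 101, 110]

def five_number_summary(data, method):
    """Return (min, Q1, median, Q3, max) for the given quartile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method=method)
    return (min(data), q1, median, q3, max(data))

def iqr_outliers(data, q1, q3):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

inclusive = five_number_summary(pulse, "inclusive")
exclusive = five_number_summary(pulse, "exclusive")

print("inclusive:", inclusive)
print("exclusive:", exclusive)
print("outliers:", iqr_outliers(pulse, inclusive[1], inclusive[3]))
```

Either set of quartiles flags the same single observation, which is why the answer key can say the conclusion does not depend on the software.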
Use the following to answer the questions below: Match the five-number summary to the appropriate boxplot.
5) ________ 15, 19, 20, 25, 28 Answer: C Diff: 1 Type: SA Var: 1 L.O.: 2.3.4;2.4.2 6) ________ 1, 3, 4, 6, 8 Answer: B Diff: 1 Type: SA Var: 1 L.O.: 2.3.4;2.4.2 7) ________ 3, 10, 12, 13, 19 Answer: D Diff: 1 Type: SA Var: 1 L.O.: 2.3.4;2.4.2 8) ________ 5, 8, 9, 11, 14 Answer: A Diff: 1 Type: SA Var: 1 L.O.: 2.3.4;2.4.2
Use the following to answer the questions below: The provided Minitab output displays descriptive statistics for the amount of financial aid, in thousands of dollars, awarded to a sample of students at a large university.
9) How many students are included in the sample? Answer: 120 Diff: 1 Type: SA Var: 1 L.O.: 2.3.1 10) Based on the mean and median financial aid amounts displayed in the summary, which of the following most likely describes the shape of the distribution of financial aid amounts? A) Slightly skewed to the left B) Roughly symmetric C) Slightly skewed to the right Answer: C Diff: 2 Type: BI Var: 1 L.O.: 2.2.4 11) Based on the output, give an interval that is certain to contain the 15th percentile of the distribution of financial aid amounts. Answer: Answers may vary. The lower bound must be the minimum and the upper bound can be Q1 or anything larger. Thus, one possible answer is $3,400 to $9,925. Diff: 3 Type: ES Var: 1 L.O.: 2.3.4 12) Based on the output, give an interval that is certain to contain the 60th percentile of the distribution of financial aid amounts. Answer: Answers may vary. Based on the available information, the smallest interval that contains the 60th percentile is the median to Q3: $13,500 to $19,275. Diff: 3 Type: ES Var: 1 L.O.: 2.3.4 13) What is the range of financial aid amounts? Answer: $31,600 Explanation: Range = Max - Min = $35,000 - $3,400 = $31,600 Diff: 1 Type: SA Var: 1 L.O.: 2.3.5 14) What is the IQR of financial aid amounts? Answer: $9,350 Explanation: IQR = Q3 - Q1 = $19,275 - $9,925 = $9,350 Diff: 2 Type: SA Var: 1 L.O.: 2.3.5
15) Is the largest financial aid amount an outlier? A) Yes B) No Answer: A Explanation: IQR = $9350 or 9.35 1.5 IQR = 1.5*9.35 = 14.025 Q3 + 1.5 IQR = 19.275 + 14.025 = 33.3 Since the maximum value of 35 is above the cut-off for outliers on the high end of the distribution, 35 or $35,000 is an outlier. Diff: 2 Type: MC Var: 1 L.O.: 2.4.1 16) Find and interpret the z-score for the smallest financial aid amount. Answer: -1.59 Explanation: z = (3.4 - 15.142)/7.362 = -1.59 Someone receiving $3,400 in financial aid is 1.59 standard deviations below the sample mean financial aid amount. Diff: 2 Type: SA Var: 1 L.O.: 2.3.3
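When only summary statistics are available, as with the Minitab output here, the outlier check and z-score in questions 15 and 16 reduce to a few lines of arithmetic. The sketch below uses the values quoted in the answer explanations (in thousands of dollars); the variable names are illustrative.

```python
# Summary statistics quoted in the explanations, in thousands of dollars.
minimum, q1, q3, maximum = 3.4, 9.925, 19.275, 35.0
mean, sd = 15.142, 7.362

# Question 15: compare the maximum with the upper fence Q3 + 1.5 * IQR.
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
max_is_outlier = maximum > upper_fence

# Question 16: z-score of the smallest financial aid amount.
z_min = (minimum - mean) / sd

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.1f}")
print(f"maximum is an outlier: {max_is_outlier}")
print(f"z-score of minimum = {z_min:.2f}")
```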
Use the following to answer the questions below: Students in an introductory statistics course were asked to count the number of scars on their dominant hand (the one they write with the most). The results are displayed in the provided boxplot.
17) From the boxplot you can identify how many students are in the class. Answer: FALSE Diff: 2 Type: TF Var: 1 L.O.: 2.4.2 18) Use the boxplot to estimate the median number of scars that students in the class have on their dominant hand. Answer: 1.5 Diff: 2 Type: SA Var: 1 L.O.: 2.4.2 19) The distribution of the number of scars would be classified as A) skewed to the left. B) roughly symmetric. C) skewed to the right. Answer: C Diff: 2 Type: BI Var: 1 L.O.: 2.4.2 20) Calculate the IQR for the distribution of the number of scars students have. Answer: 2 Explanation: Q3 = 3 and Q1 = 1, so IQR = 2 Diff: 2 Type: SA Var: 1 L.O.: 2.3.5;2.4.2
21) Which answer best describes the following conclusion? "There are no students with 7 scars on their dominant hand." A) True B) False C) Cannot be determined Answer: A Diff: 3 Type: MC Var: 1 L.O.: 2.4.2 22) Which answer best describes the following conclusion? "There are no students with 2 scars on their dominant hand." A) True B) False C) Cannot be determined Answer: C Diff: 3 Type: MC Var: 1 L.O.: 2.4.2 23) The mean can be determined exactly from the boxplot. Answer: FALSE Diff: 2 Type: TF Var: 1 L.O.: 2.4.2
Use the following to answer the questions below: The side-by-side boxplots compare the top 100 men's and women's finishing times in a marathon.
24) Which group tends to finish the race faster? A) Men B) Women Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.4.4 25) Which group has the larger spread in its race times? A) Men B) Women Answer: B Diff: 2 Type: BI Var: 1 L.O.: 2.4.4 26) Does there appear to be an association between gender and race time? A) Yes B) No Answer: A Explanation: Yes, it appears that men tend to be faster than women. The median race time for men is well below that for women, and there is very little overlap between the distributions for the two groups. Diff: 2 Type: MC Var: 1 L.O.: 2.4.4
Use the following to answer the questions below: The states located in the Midwestern region of the country typically experience a large number of tornadoes every year. The numbers of tornadoes from 2000-2011 for four Midwestern states (Kansas, Nebraska, South Dakota, and Iowa) are displayed in the side-by-side boxplots.
27) Which state tends to see the most tornadoes per year? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: A Diff: 1 Type: BI Var: 1 L.O.: 2.4.3 28) Which state has the largest range? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.4.3
29) Which state tends to see the fewest tornadoes per year? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: C Diff: 1 Type: BI Var: 1 L.O.: 2.4.3 30) Which state has the largest IQR? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: D Diff: 2 Type: BI Var: 1 L.O.: 2.4.3 31) Which state has an outlier? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: C Diff: 1 Type: BI Var: 1 L.O.: 2.4.3 32) (58, 88, 94, 133, 185) is the five number summary for which state? A) Kansas B) Nebraska C) South Dakota D) Iowa Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.3.4;2.4.3
2.5 Two Quantitative Variables: Scatterplot and Correlation
Use the following to answer the questions below: Identify which graphical display might be appropriate in each case. Select all that apply. 1) Investigate the number of Facebook friends students in your class have. A) Histogram B) Bar chart C) Pie chart D) Side-by-side bar chart E) Segmented bar chart F) Dotplot G) Side-by-side boxplots H) Scatterplot Answer: A, F Diff: 2 Type: MC Var: 1 L.O.: 2.2.0 2) Compare the number of points scored for all games played in a season for all football teams in the Big 10 conference. A) Side-by-side boxplots B) Histogram C) Bar chart D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.4.3 3) Investigate the relationship between pulse rate (in beats per minute) and systolic blood pressure (the top number in a blood pressure reading, measured in millimeters of mercury) for patients at the student health center. A) Scatterplot B) Side-by-side boxplots C) Histogram D) Bar chart E) Pie chart F) Side-by-side bar chart G) Segmented bar chart H) Dotplot Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.5.0
4) Investigate the favorite type of music (country, rock, classical, etc.) for the students in your class. A) Bar chart B) Side-by-side boxplots C) Histogram D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.1.0 5) Compare the percentage of people in favor of Barack Obama and Mitt Romney in the 2012 Presidential election for the different regions in the U.S. (Northeast, Southeast, Midwest, West). A) Side-by-side bar chart B) Histogram C) Bar chart D) Pie chart E) Segmented bar chart F) Dotplot G) Scatterplot H) Side-by-side boxplots Answer: A, E Diff: 2 Type: MC Var: 1 L.O.: 2.1.0 6) Investigate the relationship between the length of the right foot (in cm) and the length of the right forearm (in cm) for students in your class. A) Scatterplot B) Side-by-side boxplots C) Histogram D) Bar chart E) Pie chart F) Side-by-side bar chart G) Segmented bar chart H) Dotplot Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.5.0
7) Investigate the length of songs on your iPod. A) Side-by-side boxplots B) Histogram C) Bar chart D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: B, G Diff: 2 Type: MC Var: 1 L.O.: 2.2.0 8) Investigate the number of text messages sent yesterday by students in your class. A) Side-by-side boxplots B) Histogram C) Bar chart D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: B, G Diff: 2 Type: MC Var: 1 L.O.: 2.2.0 9) Investigate the relationship between gender and left/right handedness for students in the class. A) Side-by-side boxplots B) Histogram C) Bar chart D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: E, F Diff: 2 Type: MC Var: 1 L.O.: 2.1.0
10) Investigate the relationship between the number of hours of exercise per week and athlete or not for a sample of students at a small university. A) Side-by-side boxplots B) Histogram C) Bar chart D) Pie chart E) Side-by-side bar chart F) Segmented bar chart G) Dotplot H) Scatterplot Answer: A Diff: 2 Type: MC Var: 1 L.O.: 2.4.3 Use the following to answer the questions below: Match the correlation to the corresponding scatterplot.
11) ________ 0.719 Answer: D Diff: 1 Type: SA Var: 1 L.O.: 2.5.3 12) ________ -0.064 Answer: C Diff: 1 Type: SA Var: 1 L.O.: 2.5.3
13) ________ 0.889 Answer: A Diff: 1 Type: SA Var: 1 L.O.: 2.5.3 14) ________ -0.701 Answer: B Diff: 1 Type: SA Var: 1 L.O.: 2.5.3
Use the following to answer the questions below: A student working on an independent research project wants to investigate if there is an association between the amount of sleep someone gets and their body mass index (BMI), an indicator of body fatness. For a sample of 45 students, she records their BMI and the average amount of sleep they get on weeknights over a two-week period. 15) What would it mean for average amount of sleep and BMI to be positively correlated? Answer: It would mean that the more sleep individuals get, the higher their BMI tends to be. Diff: 2 Type: ES Var: 1 L.O.: 2.5.2 16) What would it mean for average amount of sleep to be negatively correlated with BMI? Answer: It would mean that individuals who get more sleep tend to have lower BMIs. Diff: 2 Type: ES Var: 1 L.O.: 2.5.2 17) Suppose the student found a correlation of -0.413 between amount of sleep and BMI. Would it be appropriate for her to conclude that getting more sleep causes individuals to have a lower BMI? A) Yes B) No Answer: B Explanation: No, because correlation does not imply causation. Diff: 2 Type: MC Var: 1 L.O.: 2.5.5
Use the following to answer the questions below: The scatterplot shows the relationship between GPA and the number of Facebook friends for 30 students in a class.
18) Discuss the information contained in the scatterplot. What does it mean about GPA and number of Facebook friends? Answer: There seems to be a moderately strong negative linear association between GPA and the number of Facebook friends. The negative association indicates that individuals who have more Facebook friends tend to have a lower GPA, while individuals with fewer Facebook friends tend to have a higher GPA. There is a potential outlier where one student had roughly 1,500 Facebook friends and a GPA a little below 2.50. Diff: 2 Type: ES Var: 1 L.O.: 2.5.1 19) For each corner of the scatterplot (top left, top right, bottom left, bottom right), describe a student whose responses place him or her in that corner. Answer: Top Left: A student in this corner of the scatterplot would have a small number of Facebook friends (less than 200) and a GPA fairly close to 4.0, i.e., a very strong student with relatively few Facebook friends. Top Right: A student in this corner of the scatterplot would have a large number of Facebook friends (more than 1,000) and a high GPA (close to 4.0), i.e., a very strong student with lots of Facebook friends. Bottom Left: A student in this corner of the scatterplot would have a small number of Facebook friends (less than 200) and a low GPA (maybe 2.75 or less), i.e., a somewhat weak student with relatively few Facebook friends. Bottom Right: A student in this corner of the scatterplot would have a large number of Facebook friends (more than 1,000) and a low GPA (maybe 2.75 or less), i.e., a somewhat weak student with lots of Facebook friends. Diff: 2 Type: ES Var: 1 L.O.: 2.5.0
20) The correlation between GPA and the number of Facebook friends is -0.686. Should you go "unfriend" some of your Facebook friends if you want to improve your GPA (i.e., can you conclude that having more Facebook friends lowers GPA)? A) Yes B) No Answer: B Explanation: No! Correlation does not imply causation! A possible lurking variable could be amount of time spent studying - maybe individuals with more Facebook friends spend more time online and less time studying (hence the lower GPA). Diff: 2 Type: MC Var: 1 L.O.: 2.5.5
2.6 Two Quantitative Variables: Linear Regression
Use the following to answer the questions below: Trying to determine the number of students to accept is a tricky task for universities. The Admissions staff at a small private college wants to use data from the past few years to predict the number of students enrolling in the university from those who are accepted by the university. The data are provided in the following table.
Number Accepted   Number Enrolled
2,440             611
2,800             708
2,720             637
2,360             584
2,660             614
2,620             625
1) What is the explanatory (X) variable? A) Number of students accepted B) Number of students enrolling Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.3;2.6.0 2) What is the response (Y) variable? A) Number of students enrolling B) Number of students accepted Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.3;2.6.0
3) Find the correlation between the number of students accepted and enrolled. Use two decimal places in your answer. Answer: 0.83 Explanation: r = 0.83 Diff: 2 Type: SA Var: 1 L.O.: 2.5.4 4) Find the least squares regression line for predicting the number enrolled from the number accepted. A) Ê = 89 + 0.208 accepted, where E = enrolled B) Ê = 85 + 0.208 accepted, where E = enrolled C) Ê = 89 + 0.204 accepted, where E = enrolled D) Ê = 85 + 0.206 accepted, where E = enrolled Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.6.1 5) Interpret the slope of the least squares regression line in context. Answer: The enrollment is predicted to increase by 0.208 students for each additional student accepted. (The enrollment is predicted to increase by 2.08 students for each additional 10 students accepted.) Diff: 2 Type: ES Var: 1 L.O.: 2.6.3 6) Interpret the intercept of the least squares regression line in context. Does the interpretation make sense? Answer: When 0 students are accepted, the predicted enrollment is 89. (Note that this does not make sense because we are extrapolating.) Diff: 2 Type: ES Var: 1 L.O.: 2.6.3;2.6.5 7) Suppose Admissions has announced that 2,575 students have been accepted this year. Use your regression equation to predict the number of students that will enroll. Answer: Let y = # of students enrolled. ŷ = 89 + 0.208(2,575) = 624.6 The model predicts that if 2,575 are accepted, then 624.6 students will enroll at the college. Diff: 2 Type: ES Var: 1 L.O.: 2.6.2
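The correlation, regression line, and prediction in questions 3 to 7 can be verified from the six data pairs with the usual least squares formulas. The following Python sketch computes them directly from sums of squares; the function name `least_squares` is illustrative and not taken from any statistics package.

```python
import math

accepted = [2440, 2800, 2720, 2360, 2660, 2620]
enrolled = [611, 708, 637, 584, 614, 625]

def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = least_squares(accepted, enrolled)

# Question 7 uses the rounded coefficients from the answer key.
predicted = 89 + 0.208 * 2575

print(f"r = {r:.2f}")
print(f"enrolled-hat = {intercept:.0f} + {slope:.3f} * accepted")
print(f"prediction for 2,575 accepted: {predicted:.1f}")
```

The intercept corresponds to zero students accepted, far below the observed range of roughly 2,360 to 2,800, which is why its literal interpretation is an extrapolation.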
Use the following to answer the questions below: The least squares regression line is displayed on the provided scatterplot. Note that the points are displayed as numbers (each point having its own number) rather than as dots.
8) Which point has the most extreme negative residual? A) 0 B) 1 C) 4 D) 9 Answer: D Diff: 3 Type: BI Var: 1 L.O.: 2.6.4 9) Which point has the most extreme positive residual? A) 0 B) 4 C) 8 D) 9 Answer: B Diff: 3 Type: BI Var: 1 L.O.: 2.6.4
10) Which point has the residual that is closest to 0? A) 1 B) 4 C) 5 D) 9 Answer: C Diff: 3 Type: BI Var: 1 L.O.: 2.6.4 Use the following to answer the questions below: Students in a small statistics course collected data to determine if the length of the forearm could be used to predict the length of the foot (both measured in centimeters). Their data are displayed in the provided table.
Forearm (cm)   Foot (cm)
29             26
28             23
27             24
23             23
26             25
29.5           27
36             29
29             28
30             23
24             23
27             24
29.5           26
32             31
11) Based on their goal (to predict foot length from forearm length), which variable is the explanatory variable? A) Forearm length B) Foot length Answer: A Diff: 2 Type: BI Var: 1 L.O.: 1.1.3;2.5.0;2.6.0 12) Which of the following would you expect to be true about the association between the length of the forearm and the length of the foot? A) Positive association B) Negative association C) No association Answer: A Diff: 1 Type: BI Var: 1 L.O.: 2.5.2
13) A scatterplot of the data collected by the students is provided. Does there appear to be a positive or negative association between these two variables? What does this mean for these two variables?
A) Positive; This means that people with longer forearms tend to have larger feet. B) Positive; This means that people with longer forearms tend to have smaller feet. C) Negative; This means that people with longer forearms tend to have larger feet. D) Negative; This means that people with longer forearms tend to have smaller feet. Answer: A Explanation: There is a positive association between forearm and foot length. This means that people with longer forearms tend to have larger feet (and similarly people with shorter forearms tend to have smaller feet). Diff: 1 Type: BI Var: 1 L.O.: 2.5.1;2.5.2 14) Find the correlation between forearm and foot length. Use three decimal places in your answer. Answer: 0.739 Explanation: r = 0.739 Diff: 2 Type: SA Var: 1 L.O.: 2.5.4 15) If the forearm and foot lengths had been measured in inches instead of centimeters the correlation would be different. Answer: FALSE Diff: 3 Type: TF Var: 1 L.O.: 2.5.3
16) Find the least squares regression equation for predicting foot length from forearm length. A) y = foot; ŷ = 9.216 + 0.5735 forearm B) y = foot; ŷ = 2.133 + 0.7246 forearm C) y = foot; ŷ = 2.133 + 0.5735 forearm D) y = foot; ŷ = 9.216 + 0.7246 forearm Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.6.1 17) A scatterplot of the data with the least squares regression line is shown. What are the coordinates of the point with the most extreme negative residual?
A) (28, 23) B) (30, 23) C) (32, 31) D) (36, 29) Answer: B Diff: 2 Type: BI Var: 1 L.O.: 2.6.4
18) A scatterplot of the data with the least squares regression line is shown. What are the coordinates of the point with the most extreme positive residual?
A) (32, 31) B) (30, 23) C) (28, 23) D) (23, 23) Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.6.4
19) Using your model to predict the foot length for an individual with a forearm length of 45 cm would be ________. A) extrapolation B) regression C) a positive residual D) a negative residual Answer: A Diff: 2 Type: BI Var: 1 L.O.: 2.6.5
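The answers to items 14 and 16 through 18 can be reproduced from the forearm and foot table. The following is a minimal sketch in plain Python using the textbook formulas for the correlation and the least squares line; the variable names are illustrative, and any statistics package should give the same values.

```python
import math

# Forearm and foot lengths (cm) from the students' table.
forearm = [29, 28, 27, 23, 26, 29.5, 36, 29, 30, 24, 27, 29.5, 32]
foot = [26, 23, 24, 23, 25, 27, 29, 28, 23, 23, 24, 26, 31]

n = len(forearm)
mean_x = sum(forearm) / n
mean_y = sum(foot) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in forearm)
syy = sum((y - mean_y) ** 2 for y in foot)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(forearm, foot))

r = sxy / math.sqrt(sxx * syy)        # correlation (item 14)
slope = sxy / sxx                     # least squares slope (item 16)
intercept = mean_y - slope * mean_x   # least squares intercept (item 16)

# Residual = observed foot length minus predicted foot length.
points = list(zip(forearm, foot))
residuals = [y - (intercept + slope * x) for x, y in points]
most_negative = points[residuals.index(min(residuals))]   # item 17
most_positive = points[residuals.index(max(residuals))]   # item 18

print(f"r = {r:.3f}")
print(f"predicted foot = {intercept:.3f} + {slope:.4f} * forearm")
print("most extreme negative residual at", most_negative)
print("most extreme positive residual at", most_positive)
```

The forearm lengths in the table run from 23 cm to 36 cm, which is why a prediction at 45 cm (item 19) counts as extrapolation.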
Use the following to answer the questions below: A biologist collected data on a sample of 20 porcupines. She wants to be able to predict the body mass of a porcupine (in grams) based on the length of the porcupine (in cm). Her least squares regression equation is M̂ = -3,089 + 175.6(Length), where M is the mass. 20) Interpret the slope of the least squares regression line, in the context of the situation. Answer: The body mass of porcupines is predicted to increase by 175.6 g for every additional centimeter of length. Diff: 2 Type: ES Var: 1 L.O.: 2.6.3 21) If it would make sense, provide a clear interpretation of the intercept of the regression line, in context. Otherwise, explain why the interpretation does not make sense. Answer: The interpretation of the intercept does not make sense in this situation. A real porcupine would not have a length of 0 cm. Making a prediction for a porcupine with a body length of 0 cm would be extrapolation because 0 cm is far outside the range of the original data. Diff: 2 Type: ES Var: 1 L.O.: 2.6.3;2.6.5 22) Predict the body mass of a porcupine that is 51 cm long. Report your answer using one decimal place. A) 5,866.6 g B) 5,661.0 g C) 5,732.5 g D) 6,139.2 g Answer: A Explanation: M̂ = -3,089 + 175.6(51) = 5,866.6 A porcupine that is 51 cm long is predicted to have a body mass of 5,866.6 g. Diff: 2 Type: BI Var: 1 L.O.: 2.6.2 23) One of the porcupines in the dataset had a body length of 51 cm and a body mass of 5,281 g. Calculate the residual for this porcupine. Use one decimal place in your calculation. A) -585.6 g. B) 585.6 g. C) -451.5 g. D) 451.5 g. Answer: A Explanation: M̂ = -3,089 + 175.6(51) = 5,866.6 g is the predicted body mass for a porcupine that is 51 cm long. The residual for this porcupine is e = y - ŷ = 5,281 - 5,866.6 = -585.6 g. Diff: 2 Type: BI Var: 1 L.O.: 2.6.4
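The prediction and residual in items 22 and 23 follow directly from the coefficients in the explanations (intercept -3,089 and slope 175.6). A minimal sketch, with illustrative names that are not from the text:

```python
# Coefficients taken from the answer explanations for items 22 and 23.
INTERCEPT = -3089.0   # grams
SLOPE = 175.6         # grams per centimeter of body length

def predict_mass(length_cm):
    """Return the predicted body mass (g) for a porcupine of the given length (cm)."""
    return INTERCEPT + SLOPE * length_cm

observed_mass = 5281.0                    # g, the porcupine in item 23
predicted_mass = predict_mass(51)         # item 22
residual = observed_mass - predicted_mass # item 23: observed minus predicted

print(f"Predicted mass: {predicted_mass:.1f} g")
print(f"Residual: {residual:.1f} g")
```

A negative residual means the observed porcupine is lighter than the line predicts for its length.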
24) A scatterplot of the biologist's data, with the least squares regression line, is provided. There is a clear outlier in the lower left corner of the plot. How would removing this point from the dataset most likely affect the correlation between body length (cm) and body mass (g)?
A) It would make the correlation stronger. B) It would make the correlation weaker. C) It would have no impact on the correlation between body length and body mass. Answer: B Diff: 3 Type: BI Var: 1 L.O.: 2.5.6;2.6.6 25) Another variable that the biologist recorded was the chest circumference (in cm) of the porcupines. Explain what both a negative and a positive association between body mass and chest circumference would mean. Which is more plausible in this situation? Answer: A positive correlation would mean that porcupines that have a larger chest circumference tend to have a higher body mass while those with a smaller chest circumference would tend to have a lower body mass. A negative correlation would mean that porcupines that have a larger chest circumference tend to have a lower body mass while those with the smaller chest circumference would tend to have a higher body mass. The positive correlation is most plausible in this situation - the porcupines that are "bigger" in the chest area are likely bigger overall and thus should have the higher body mass. Diff: 2 Type: ES Var: 1 L.O.: 2.5.2;2.5.3
2.7 Data Visualization and Multiple Variables
1) There are no testbank questions for this section. Diff: 1 Type: SA Var: 1
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 3 Confidence Intervals
3.1 Sampling Distributions
Use the following to answer the questions below: Identify each of the following as either a parameter or a statistic, and give the correct notation. 1) Correlation between height and armspan (distance from fingertip to fingertip when arms are extended to the sides) for all players on the Chicago Bulls basketball team, using data from all players currently on the team A) Parameter, ρ B) Parameter, C) Statistic, D) Statistic, ρ Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1
2) Proportion of students at your university that smoke, based on data from your class. A) Statistic, p̂ B) Parameter, ρ C) Parameter, D) Statistic, ρ Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 3) Correlation between price of a textbook and the number of pages, based on 25 textbooks selected from the bookstore. A) Statistic, r B) Parameter, p C) Parameter, μ D) Statistic, Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1
4) Average commute time for employees at a small company, based on interviews with all employees. A) Parameter, μ B) Statistic, r C) Parameter, p D) Statistic, Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 5) Average gas price in Minnesota, based on prices at randomly selected gas stations throughout the state. A) Statistic, x̄ B) Statistic, r C) Parameter, p D) Parameter, μ Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 6) Proportion of students at a university that are part-time, based on data on all students enrolled at the university. A) Parameter, p B) Statistic, r C) Parameter, μ D) Statistic, Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 7) Briefly explain the distinction between a parameter and a statistic. Answer: A parameter is a number that describes some aspect of a population, and a statistic is computed from the data in a sample. A parameter is fixed while a statistic varies from sample to sample. Diff: 2 Type: ES Var: 1 L.O.: 3.1.1
Use the following to answer the questions below: The sampling distribution shows sample proportions from samples of size n = 35.
8) What does one dot on the sampling distribution represent? Answer: Each dot represents the sample proportion (p̂) from a sample of size n = 35. Diff: 2 Type: ES Var: 1 L.O.: 3.1.3 9) Estimate the population proportion from the dotplot. A) 0.56 B) 0.63 C) 0.70 D) 0.91 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.4 10) Estimate the standard error of the sample proportions. A) 0.07 B) 0.63 C) 0.14 D) 0.01 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.5
11) Using the sampling distribution, how likely is p̂ = 0.65? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 12) Using the sampling distribution, how likely is p̂ = 0.45? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 13) Using the sampling distribution, how likely is p̂ = 0.98? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 14) If samples of size n = 65 had been used instead of n = 35, which of the following would be true? A) The sample statistics would be centered at a larger proportion. B) The sample statistics would be centered at roughly the same proportion. C) The sample statistics would be centered at a smaller proportion. Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.6 15) If samples of size n = 65 had been used instead of n = 35, which of the following would be true? A) The sample statistics would have more variability. B) The variability in the sample statistics would be about the same. C) The sample statistics would have less variability. Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.1.6
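Items 14 and 15 can be illustrated by simulation. The sketch below assumes a population proportion of 0.63 (the dotplot center chosen in item 9); the dotplot itself is not reproduced here, so that value, the number of simulated samples, the seeds, and all names are assumptions made for illustration only.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed):
    """Draw `reps` samples of size n from a population with proportion p
    and return the list of sample proportions (one per sample)."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

P_ASSUMED = 0.63   # assumption: population proportion read from the dotplot center

props_35 = simulate_sample_proportions(P_ASSUMED, n=35, reps=5000, seed=1)
props_65 = simulate_sample_proportions(P_ASSUMED, n=65, reps=5000, seed=2)

center_35 = statistics.mean(props_35)
center_65 = statistics.mean(props_65)
se_35 = statistics.stdev(props_35)   # standard error = SD of the sample statistics
se_65 = statistics.stdev(props_65)

print(f"n = 35: center {center_35:.3f}, standard error {se_35:.3f}")
print(f"n = 65: center {center_65:.3f}, standard error {se_65:.3f}")
```

Both simulated distributions should be centered near the assumed proportion, while the n = 65 distribution should show the smaller standard error, which matches the answers B and C above.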
Use the following to answer the questions below: The sampling distribution shows sample means from samples of size n = 50.
16) What does one dot on the sampling distribution represent? Answer: Each individual dot represents the sample mean (x̄) from a sample of size n = 50. Diff: 2 Type: ES Var: 1 L.O.: 3.1.3 17) Estimate the population mean from the dotplot. A) 62 B) 63 C) 65 D) 67 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.1.4 18) Estimate the standard error of the sample means. A) 1 B) 2 C) 3 D) 5 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.5
19) Using the sampling distribution, how likely is x̄ = 55.6? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 20) Using the sampling distribution, how likely is x̄ = 64.2? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 21) Using the sampling distribution, how likely is x̄ = 68.7? A) Reasonably likely to occur from a sample of this size B) Unusual but might occur occasionally C) Extremely unlikely to ever occur Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.3 22) If samples of size n = 30 had been used instead of n = 50, which of the following would be true? A) The sample means would be centered at a larger value. B) The sample means would be centered at the same value. C) The sample means would be centered at a smaller value. Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.6 23) If samples of size n = 30 had been used instead of n = 50, which of the following would be true? A) The sample means would have more variability. B) The variability in the sample statistics would be about the same. C) The sample means would have less variability. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.6
Use the following to answer the questions below: In a survey of 7,786 randomly selected adults living in Germany, 5,840 said they exercised for at least 30 minutes three or more times per week. 24) Identify, with the proper notation, the quantity being estimated. A) p = proportion of German adults who exercise for 30 minutes three or more times per week. B) p̂ = proportion of German adults who exercise for 30 minutes three or more times per week. C) p = the number of German adults who exercise for 30 minutes three or more times per week. D) p̂ = the number of German adults who exercise for 30 minutes three or more times per week. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 25) Using the correct notation, give the value of the best estimate of the population parameter. Round your answer to two decimal places. Answer: 0.75 Explanation: p̂ = 5,840/7,786 = 0.75 (proportion of the sample that say they exercise for 30 minutes three or more times per week) Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
Use the following to answer the questions below: According to U.S. Census data, 71.6% of Americans are age 21 and over. The provided figure shows possible sampling distributions for the proportion of a sample age 21 and over, for samples of size n = 50, n = 125, and n = 250.
Match the sample sizes (n = 50, n = 125, and n = 250) to their sampling distribution. 26) Sample A: n = ________ Answer: 250 Explanation: n = 250 Diff: 2 Type: SA Var: 1 L.O.: 3.1.6 27) Sample B: n = ________ Answer: 50 Explanation: n = 50 Diff: 2 Type: SA Var: 1 L.O.: 3.1.6 28) Sample C: n = ________ Answer: 125 Explanation: n = 125 Diff: 2 Type: SA Var: 1 L.O.: 3.1.6
Use the following to answer the questions below: According to ESPN.com, the average number of yards per game for all NFL running backs with at least 50 attempts in the 2011 season was 49 yards/game. A sample of 20 running backs from the 2011 season averaged 46.54 yards/game. 29) Is 49 yards/game a parameter or statistic? A) Parameter B) Statistic Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 30) Is 46.54 yards/game a parameter or statistic? A) Parameter B) Statistic Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.1.1
31) Two boxplots are shown. One boxplot corresponds to the yards/game for a random sample of running backs. The other boxplot represents the values in a sampling distribution of 1,000 means of yards/game for samples of size n = 20.
Which boxplot represents the sample? Which boxplot represents the sampling distribution? A) Boxplot A is the sampling distribution while Boxplot B is a single sample. B) Boxplot B is the sampling distribution while Boxplot A is a single sample. Answer: A Explanation: Boxplot A is the sampling distribution while Boxplot B is a single sample. The values plotted in A are sample means, which will have less variability than the data on individual running backs. Diff: 3 Type: BI Var: 1 L.O.: 3.1.0
3.2 Understanding and Interpreting Confidence Intervals
Use the following to answer the questions below: A random sample of 200 students shows that 62% of students use the Student Health Center at some point during their time on campus, with a margin of error of ± 4%. Based on this information, identify each of the following as plausible or not for the percent of the entire student body that use the Student Health Center at some point during their time on campus. 1) 50% A) Plausible B) Not Plausible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.1;3.2.2 2) 60% A) Plausible B) Not Plausible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.2.1;3.2.2 3) 65% A) Plausible B) Not Plausible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.2.1;3.2.2 4) 72% A) Plausible B) Not Plausible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.1;3.2.2
Use the following to answer the questions below: In a recent Gallup survey of 1,012 randomly selected U.S. adults (age 18 and over), 53% said that they were dissatisfied with the quality of education students receive in kindergarten through grade 12. They also report that the "margin of sampling error is plus or minus 4%." 5) What is the population of interest? A) U.S. adults (age 18 and over) B) 1,012 randomly selected U.S. adults C) U.S. adults dissatisfied with K-12 education D) U.S. adults satisfied with K-12 education Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.2.1;3.1.0 6) What is the sample being used? A) 1,012 randomly selected U.S. adults B) U.S. adults (age 18 and over) C) U.S. adults dissatisfied with K-12 education D) U.S. adults satisfied with K-12 education Answer: A Diff: 1 Type: BI Var: 1 L.O.: 1.2.1;3.1.0 7) What is the population parameter of interest, and what is the correct notation for this parameter? A) p = proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 B) p̂ = proportion of the sample of 1,012 randomly selected U.S. adults who are dissatisfied = 0.53 C) p = proportion of the sample of 1,012 randomly selected U.S. adults who are dissatisfied = 0.53 D) p̂ = proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1
8) What is the relevant statistic? A) p̂ = proportion of the sample of 1,012 randomly selected U.S. adults who are dissatisfied = 0.53 B) p = proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 C) p = proportion of the sample of 1,012 randomly selected U.S. adults who are dissatisfied = 0.53 D) p̂ = proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 9) Find an interval estimate for the parameter of interest. Interpret it in terms of dissatisfaction in the quality of education students receive. Use two decimal places in your answer. A) 0.49 to 0.57 We are 95% sure that the true proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 is between 0.49 and 0.57 (i.e., 49% and 57%). B) 0.51 to 0.55 We are 95% sure that the true proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 is between 0.51 and 0.55 (i.e., 51% and 55%). C) 0.51 to 0.55 We are 95% sure that the proportion of U.S. adults who reported being dissatisfied with the quality of education students receive in kindergarten through grade 12 is between 0.51 and 0.55 (i.e., 51% and 55%). D) 0.49 to 0.57 We are 95% sure that the proportion of U.S. adults who reported being dissatisfied with the quality of education students receive in kindergarten through grade 12 is between 0.49 and 0.57 (i.e., 49% and 57%). Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.2.1;3.2.4
Use the following to answer the questions below: Identify if each of the following statements is a proper interpretation of a 95% confidence interval. 10) I am 95% sure that this interval will contain the population parameter. A) Correct B) Incorrect Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.2.4 11) I am 95% sure that this interval will contain the sample statistic. A) Correct B) Incorrect Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.4 12) 95% of the population values will fall within this interval. A) Correct B) Incorrect Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.4 13) The probability that the population parameter is in this interval is 0.95. A) Correct B) Incorrect Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.4 14) 95% of the possible samples from this population will have sample statistics in this particular interval. A) Correct B) Incorrect Answer: B Diff: 3 Type: BI Var: 1 L.O.: 3.2.4
15) Recently, the Centers for Disease Control and Prevention estimated 9.4% of children under the age of 18 had asthma. They reported the standard error to be 0.35%. Assuming that the sampling distribution is symmetric and bell-shaped, find a 95% confidence interval. A) 8.7% to 10.1% B) 9.1% to 7.8% C) 8.4% to 10.5% D) 8.1% to 10.8% Answer: A Explanation: 0.094 ± 2 ∙ 0.0035 ⇒ 0.094 ± 0.007 ⇒ 0.087 to 0.101 (or, 8.7% to 10.1%) Diff: 2 Type: BI Var: 1 L.O.: 3.2.3 Use the following to answer the questions below: A sample of 148 college students reports sleeping an average of 6.85 hours on weeknights, with a margin of error of 0.35 hours. Based on this information, identify each of the following as plausible or not for the average amount of sleep college students get on weeknights. 16) 6.6 hours A) Plausible B) Not plausible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.2.2 17) 7.5 hours A) Plausible B) Not plausible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.2 18) 8 hours A) Plausible B) Not plausible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.2.2
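The interval and plausibility answers in this section all use the same rule: statistic ± margin of error, with the margin taken as 2 standard errors for a 95% interval. A minimal sketch of that rule follows; the function names are illustrative and are not from the text.

```python
def interval(estimate, margin):
    """Return (low, high) for estimate ± margin."""
    return (estimate - margin, estimate + margin)

def is_plausible(value, estimate, margin):
    """A value is plausible if it falls inside estimate ± margin."""
    low, high = interval(estimate, margin)
    return low <= value <= high

# Item 15: 9.4% with standard error 0.35%, so the 95% margin is 2 * 0.35%.
asthma_low, asthma_high = interval(9.4, 2 * 0.35)
print(f"Asthma: {asthma_low:.1f}% to {asthma_high:.1f}%")

# Items 16 to 18: mean sleep 6.85 hours with margin of error 0.35 hours.
for hours in (6.6, 7.5, 8):
    verdict = "plausible" if is_plausible(hours, 6.85, 0.35) else "not plausible"
    print(f"{hours} hours: {verdict}")
```

The same `is_plausible` check applies to the Student Health Center items (62% with a margin of 4%).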
Use the following to answer the questions below: In a poll conducted before a Massachusetts city's mayoral election, 134 of 420 randomly chosen likely voters indicated that they planned to vote for the Democratic candidate. 19) Compute a sample statistic from these data. Report your answer with three decimal places. Answer: 0.319 Explanation: p̂ = 134/420 = 0.319 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2 20) Suppose that an article describing the poll says that the margin of error for the statistic is 0.045. Use this information to find an interval estimate. Report your answer with three decimal places. Answer: 0.274 to 0.364 Explanation: 0.319 ± 0.045 ⇒ 0.274 to 0.364 Diff: 2 Type: SA Var: 1 L.O.: 3.2.1 21) Suppose that an article describing the poll says that the margin of error for the statistic is 0.045 and an interval estimate is found. What quantity is the interval estimate trying to capture? Identify with appropriate notation and words. A) p = proportion of likely voters who plan to vote for the Democratic candidate B) p̂ = proportion of likely voters who plan to vote for the Democratic candidate C) μ = the mean number of voters who plan to vote for the Democratic candidate D) x̄ = the mean number of voters who plan to vote for the Democratic candidate Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1;3.2.0 22) Suppose that an article describing the poll says that the margin of error for the statistic is 0.045. Use this information to find an interval estimate and interpret the confidence interval. Answer: 0.274 to 0.364. We are 95% sure that the proportion of likely voters who plan to vote for the Democratic candidate is between 0.274 and 0.364. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4
23) Suppose that a student collects pulse rates from a random sample of 200 students at her college and finds a 90% confidence interval goes from 65.5 to 71.8 beats per minute. Is the following statement an appropriate interpretation of this interval? If not, explain why not. "90% of the students at my college have mean pulse rates between 65.5 and 71.8 beats per minute." Answer: This is not an appropriate interpretation of the confidence interval. Confidence intervals are constructed to learn about a parameter (a summary of a population), not the individual values in the population (which is what this interpretation is referring to). The correct interpretation of the interval is that "We are 90% sure that the mean pulse rate of all students at her college is between 65.5 and 71.8 beats per minute" - the interval is for a summary of the population values (in this case, the mean pulse rate). Diff: 3 Type: ES Var: 1 L.O.: 3.2.4
3.3 Constructing Bootstrap Confidence Intervals
Use the following to answer the questions below: Identify whether each of the following samples is a possible bootstrap sample from this original sample: 20, 24, 19, 23, 18 1) 24, 18, 23 A) Possible B) Not Possible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.3.1 2) 24, 19, 24, 20, 23 A) Possible B) Not Possible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.3.1 3) 20, 24, 21, 19, 18 A) Possible B) Not Possible Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.3.1
4) 20, 20, 20, 20, 20 A) Possible B) Not Possible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.3.1 5) 18, 19, 20, 23, 24 A) Possible B) Not Possible Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.3.1
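The reasoning behind items 1 through 5 can be written as two checks: a bootstrap sample must have the same size as the original sample, and every value in it must appear in the original sample (repeats are allowed because sampling is with replacement). A minimal sketch, with an illustrative function name:

```python
ORIGINAL = [20, 24, 19, 23, 18]

def is_possible_bootstrap_sample(candidate, original):
    """True if candidate could arise by sampling len(original) values
    from original with replacement."""
    same_size = len(candidate) == len(original)
    values_from_original = all(value in original for value in candidate)
    return same_size and values_from_original

candidates = [
    [24, 18, 23],            # item 1: wrong size
    [24, 19, 24, 20, 23],    # item 2: repeats are allowed
    [20, 24, 21, 19, 18],    # item 3: 21 is not in the original sample
    [20, 20, 20, 20, 20],    # item 4: one value drawn five times
    [18, 19, 20, 23, 24],    # item 5: the original values in another order
]

for number, candidate in enumerate(candidates, start=1):
    verdict = "Possible" if is_possible_bootstrap_sample(candidate, ORIGINAL) else "Not Possible"
    print(f"{number}) {candidate}: {verdict}")
```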
6) A sample of size 46 with a mean of 13.6 is to be used to construct a confidence interval for μ. A bootstrap distribution based on 1,000 samples is created. Where will the bootstrap distribution be centered? A) 46 B) 13.6 C) μ D) 1,000 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 7) A sample of n = 10 Illinois gas stations had an average price of $3.975 per gallon. The national average at this time was $3.63. If we want to use the sample data to construct a 95% confidence interval for the average gas price in Illinois, where would the bootstrap distribution be centered? A) 3.63 B) 3.80 C) 3.975 D) 10 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 8) A bootstrap distribution will be centered at the value of the original statistic. Answer: TRUE Diff: 2 Type: TF Var: 1 L.O.: 3.3.2
3.4 Bootstrap Confidence Intervals Using Percentiles
1) Decreasing the confidence level (say, from 95% to 85%) will cause the width of a typical confidence interval to ________. A) increase B) decrease C) remain the same Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.4.3 Use the following to answer the questions below: An Internet provider contacts a random sample of 300 customers and asks how many hours per week the customers use the Internet. It found the average amount of time spent on the Internet per week to be about 7.2 hours. 2) Define the parameter of interest, using the proper notation. A) μ = mean number of hours per week all customers use the Internet B) x̄ = 7.2 hours C) x̄ = mean number of hours per week all customers use the Internet D) μ = 7.2 hours Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.1 3) Use the information from the sample to give the best estimate of the population parameter. A) x̄ = 7.2 hours B) μ = mean number of hours per week all customers use the Internet C) x̄ = mean number of hours per week all customers use the Internet D) μ = 7.2 hours Answer: A Diff: 1 Type: BI Var: 1 L.O.: 3.1.2 4) Describe how to use the data to select one bootstrap sample. What statistic is recorded from this sample? Answer: A bootstrap sample of size 300 (the same size as the original sample) would be generated by sampling from the original sample with replacement (i.e., each time a value is selected from the original sample it is "returned" to the sample and can be selected again). The bootstrap statistic would be the sample mean of the bootstrap sample. Diff: 3 Type: ES Var: 1 L.O.: 3.3.1;3.4.2
5) The standard error is about 0.458. Find a 95% confidence interval for the parameter. Round the margin of error to two decimal places. A) 6.28 to 8.12 hours B) 7.04 to 7.66 hours C) 5.82 to 8.58 hours D) 6.77 to 7.43 hours Answer: A Explanation: 7.2 ± 2*0.458 ⇒ 7.2 ± 0.92 ⇒ 6.28 to 8.12 hours Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.2.4 6) Percentiles of the bootstrap distribution are provided. Use the percentiles to report a 95% confidence interval for the parameter.
Percentile: 1% 2.5% 5% 10% 25% 50% 75% 90% 95% 97.5% 99%
Value: 6.174 6.322 6.438 6.593 6.866 7.17 7.481 7.78 7.947 8.082 8.304
A) 6.322 hours to 8.082 hours B) 6.438 hours to 7.947 hours C) 6.174 hours to 8.304 hours D) 6.593 hours to 7.78 hours Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 7) Percentiles of the bootstrap distribution are provided. Use the percentiles to report a 90% confidence interval for the parameter.
Percentile: 1% 2.5% 5% 10% 25% 50% 75% 90% 95% 97.5% 99%
Value: 6.174 6.322 6.438 6.593 6.866 7.17 7.481 7.78 7.947 8.082 8.304
A) 6.438 hours to 7.947 hours B) 6.322 hours to 8.082 hours C) 6.174 hours to 8.304 hours D) 6.593 hours to 7.78 hours Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1
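Items 6 and 7 use the same lookup: for a confidence level of C%, leave (100 - C)/2 percent in each tail and read off the matching pair of percentiles. A minimal sketch using the percentile table above; the dictionary and function names are illustrative.

```python
# Percentiles of the bootstrap distribution, as given in items 6 and 7 (hours).
PERCENTILES = {
    1: 6.174, 2.5: 6.322, 5: 6.438, 10: 6.593, 25: 6.866, 50: 7.17,
    75: 7.481, 90: 7.78, 95: 7.947, 97.5: 8.082, 99: 8.304,
}

def percentile_interval(level_percent, table):
    """Return the (lower, upper) percentile interval for the given confidence level.
    Raises KeyError if the needed percentiles are not in the table."""
    tail = (100 - level_percent) / 2   # percent left in each tail
    return table[tail], table[100 - tail]

for level in (95, 90):
    low, high = percentile_interval(level, PERCENTILES)
    print(f"{level}% interval: {low} hours to {high} hours")
```

The 95% result is wider than the 90% result, consistent with item 1: lowering the confidence level narrows the interval.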
Use the following to answer the questions below: Suppose that a 95% confidence interval for the slope of a regression line based on a sample of size n = 100 and the percentiles of the slopes for 1,000 bootstrap samples goes from 2.50 to 2.80. For each change described (with all else staying the same), indicate which of the three confidence intervals would be the most likely result. 8) Decrease the sample size to n = 60. A) 2.53 to 2.77 (narrower) B) 2.50 to 2.80 (the same) C) 2.46 to 2.84 (wider) Answer: C Diff: 3 Type: BI Var: 1 L.O.: 3.4.3 9) Increase the confidence level to 99%. A) 2.53 to 2.77 (narrower) B) 2.50 to 2.80 (the same) C) 2.46 to 2.84 (wider) Answer: C Diff: 3 Type: BI Var: 1 L.O.: 3.4.3 10) Increase the number of bootstrap samples to 5,000. A) 2.53 to 2.77 (narrower) B) 2.50 to 2.80 (the same) C) 2.46 to 2.84 (wider) Answer: B Diff: 3 Type: BI Var: 1 L.O.: 3.3.0;3.4.2
Use the following to answer the questions below: Suppose we are interested in comparing the proportion of male students who smoke to the proportion of female students who smoke. We have a random sample of 150 students (60 males and 90 females) that includes two variables: Smoke = "yes" or "no" and Gender = "female (F)" or "male (M)." The two-way table below summarizes the results.
Gender = M Gender = F
Smoke = Yes 9 9
Smoke = No 51 81
Sample Size 60 90
11) If the parameter of interest is the difference in proportions, pm - pf, where pm and pf represent the proportion of smokers in each gender, find a point estimate for this difference in proportions based on the data in the table. Report your answer with two decimal places. Answer: 0.05 Explanation: p̂m - p̂f = 9/60 - 9/90 = 0.15 - 0.10 = 0.05 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
12) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly select 60 males with replacement from the original sample of 60 males and randomly select 90 females with replacement from the original sample of 90 females. To compute the bootstrap statistic for this sample, compute the proportion of smokers among the bootstrap samples of males and females and find the difference (p̂m - p̂f). Repeat many times (say 1,000 or more times). Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 13) Use technology to construct a bootstrap distribution with at least 1,000 samples and estimate the standard error. A) SE = 0.056 B) SE = 0.067 C) SE = 0.072 D) SE = 0.079 Answer: A Explanation: Answers will vary slightly: SE = 0.056 (based on 5,000 bootstrap samples in StatKey) Diff: 2 Type: BI Var: 1 L.O.: 3.3.3;3.3.4
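The procedure described in item 12 can be sketched in Python in place of StatKey. The data come from the two-way table (9 smokers among 60 males, 9 smokers among 90 females); the seed, the number of bootstrap samples, and all names are illustrative choices, so results will vary slightly from the answer key.

```python
import random
import statistics

# 1 = smoker, 0 = non-smoker, reconstructed from the two-way table.
males = [1] * 9 + [0] * 51
females = [1] * 9 + [0] * 81

def bootstrap_differences(group_m, group_f, reps, seed):
    """Resample each group with replacement (same sizes as the originals)
    and record the difference in sample proportions for each bootstrap sample."""
    rng = random.Random(seed)
    differences = []
    for _ in range(reps):
        boot_m = rng.choices(group_m, k=len(group_m))
        boot_f = rng.choices(group_f, k=len(group_f))
        differences.append(sum(boot_m) / len(boot_m) - sum(boot_f) / len(boot_f))
    return differences

REPS = 5000
diffs = bootstrap_differences(males, females, reps=REPS, seed=0)

center = statistics.mean(diffs)            # should be near the original statistic, 0.05
standard_error = statistics.stdev(diffs)   # SD of the bootstrap statistics

# 95% interval from the standard error: statistic ± 2 * SE.
se_low, se_high = 0.05 - 2 * standard_error, 0.05 + 2 * standard_error

# 98% interval from percentiles: a simple index-based 1st and 99th percentile.
ordered = sorted(diffs)
pct_low, pct_high = ordered[REPS // 100], ordered[REPS - REPS // 100 - 1]

print(f"center = {center:.3f}, SE = {standard_error:.3f}")
print(f"95% (2 SE): {se_low:.3f} to {se_high:.3f}")
print(f"98% (percentiles): {pct_low:.3f} to {pct_high:.3f}")
```

Increasing `REPS` makes the estimates more stable but does not systematically narrow the interval, since the interval width is driven by the original sample sizes.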
14) Use the estimate of the standard error to construct a 95% confidence interval for the difference in the proportion of smokers between male and female students. Round the margin of error to three decimal places. Provide an interpretation of the interval in the context of this data situation. Answer: Answers will vary slightly: 0.05 ± 2*0.056 ⇒ 0.05 ± 0.112 ⇒ -0.062 to 0.162 We are 95% sure that the difference in the proportion of smokers between male and female students is between -0.062 and 0.162. Diff: 2 Type: ES Var: 1 L.O.: 3.2.3;3.3.5 15) You wish to provide a 98% confidence interval for the difference in the proportion of smokers between male and female students. State which percentiles of your bootstrap distribution you would use. A) Use the 1%- and 99%-tiles B) Use the 2%- and 98%-tiles C) Use the 4%- and 96%-tiles D) Use the 5%- and 95%-tiles Answer: A Explanation: Using the 1%- and 99%-tiles, a 98% confidence interval for the difference in proportions is -0.078 to 0.189 (based on 5,000 bootstrap samples). Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 Use the following to answer the questions below: Suppose we are interested in comparing the proportion of male students who smoke to the proportion of female students who smoke. We have a random sample of 150 students (60 males and 90 females) that includes two variables: Smoke = "yes" or "no" and Gender = "female (F)" or "male (M)." The two-way table below summarizes the results.
Gender = M Gender = F
Smoke = Yes 9 9
Smoke = No 51 81
Sample Size 60 90
16) If the parameter of interest is the difference in proportions, pm - pf, where pm and pf represent the proportion of smokers in each gender, find a point estimate for this difference in proportions based on the data in the table. Report your answer with two decimal places. Answer: 0.05 Explanation: p̂m - p̂f = 9/60 - 9/90 = 0.15 - 0.10 = 0.05 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
17) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly select 60 males with replacement from the original sample of 60 males and randomly select 90 females with replacement from the original sample of 90 females. To compute the bootstrap statistic for this sample, compute the proportion of smokers among the bootstrap samples of males and females and find the difference ($\hat{p}_M - \hat{p}_F$). Repeat many times (say 1,000 or more times). Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 18) Where should the bootstrap distribution be centered? A) 0.0 B) 0.05 C) 0.10 D) 0.15 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 19) Describe how you would estimate the standard error from the bootstrap distribution. Answer: Use the standard deviation of the statistics in the bootstrap distribution. Diff: 2 Type: ES Var: 1 L.O.: 3.3.4 20) The standard error is estimated to be 0.056. Find (in the context of this data situation) a 95% confidence interval for the difference in the proportion of smokers between male and female students. Round the margin of error to three decimal places. A) -0.062 to 0.162 B) -0.006 to 0.106 C) -0.118 to 0.218 D) 0.475 to 0.525 Answer: A Explanation: 0.05 ± 2*0.056 ⇒ 0.05 ± 0.112 ⇒ -0.062 to 0.162 We are 95% sure that the difference in the proportion of smokers between male and female students is between -0.062 and 0.162. Diff: 2 Type: BI Var: 1 L.O.: 3.2.3
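Illustration: the bootstrap procedure described in the answers above can be sketched in a few lines of Python. This is a minimal sketch, not the StatKey procedure itself. The 0/1 coding of smokers, the seed, the variable names, and the choice of 5,000 resamples are assumptions made for the example, and the simulated standard error will vary slightly from run to run.

```python
import random
import statistics

random.seed(1)  # assumed seed, only so the sketch is reproducible

# Original sample from the two-way table: 1 = smoker, 0 = non-smoker
males = [1] * 9 + [0] * 51      # 9 of 60 males smoke
females = [1] * 9 + [0] * 81    # 9 of 90 females smoke

def diff_in_proportions(m, f):
    """Proportion of smokers among males minus proportion among females."""
    return sum(m) / len(m) - sum(f) / len(f)

point_estimate = diff_in_proportions(males, females)

# Resample each group with replacement, keeping the original group sizes,
# and record the difference in proportions for every bootstrap sample
boot_stats = []
for _ in range(5000):
    boot_m = random.choices(males, k=len(males))
    boot_f = random.choices(females, k=len(females))
    boot_stats.append(diff_in_proportions(boot_m, boot_f))

# Standard error = standard deviation of the bootstrap statistics
se = statistics.stdev(boot_stats)

# 95% interval: statistic +/- 2*SE, margin of error rounded to three places
margin = round(2 * se, 3)
interval = (point_estimate - margin, point_estimate + margin)

print(f"point estimate = {point_estimate:.2f}")
print(f"bootstrap SE   = {se:.3f}")
print(f"95% interval   = {interval[0]:.3f} to {interval[1]:.3f}")
```

The bootstrap statistics should pile up around the original point estimate, which is why the distribution is expected to be centered near 0.05.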
21) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Use the percentiles to provide a 98% confidence interval for the difference in the proportion of smokers between male and female students.
Percentile   1%      2.5%    5%      10%     25%     75%     90%     95%     97.5%   99%
Value       -0.078  -0.056  -0.039  -0.022   0.011   0.083   0.122   0.144   0.164   0.189
A) -0.078 to 0.189 B) -0.056 to 0.164 C) -0.039 to 0.144 D) -0.022 to 0.122 Answer: A Explanation: Use the 1%- and 99%-iles: -0.078 to 0.189 Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 Use the following to answer the questions below: To create a confidence interval from a bootstrap distribution using percentiles, we keep the middle values and chop off a certain percentage from each tail. Indicate what percent of values must be chopped off from each tail for each confidence level. 22) 95% Answer: 2.5% Diff: 2 Type: SA Var: 1 L.O.: 3.4.1 23) 88% Answer: 6% Diff: 2 Type: SA Var: 1 L.O.: 3.4.1 24) 99% Answer: 0.5% Diff: 2 Type: SA Var: 1 L.O.: 3.4.1 25) 80% Answer: 10% Diff: 2 Type: SA Var: 1 L.O.: 3.4.1 26) 96% Answer: 2% Diff: 2 Type: SA Var: 1 L.O.: 3.4.1
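Illustration: the rule behind these answers is that a C% interval keeps the middle C% of the bootstrap statistics, so (100 - C)/2 percent is chopped off each tail. A minimal sketch, with hypothetical function names chosen for this example:

```python
def tail_percent(confidence_level):
    """Percent of bootstrap statistics to chop off each tail."""
    return (100 - confidence_level) / 2

def percentile_pair(confidence_level):
    """Lower and upper percentiles that bound the middle of the distribution."""
    tail = tail_percent(confidence_level)
    return tail, 100 - tail

for level in (95, 88, 99, 80, 96, 98):
    low, high = percentile_pair(level)
    print(f"{level}% interval: chop {tail_percent(level)}% per tail, "
          f"use the {low}%- and {high}%-iles")
```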
Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. Their responses, broken down by gender, are summarized in the provided table. Assume that the students enrolled in the statistics class are representative of all students at the university.
Male ($\bar{x}_M = 6$): 3 1 12 12 0 4 10 4 5 5 2 10 10
Female ($\bar{x}_F = 3.91$): 10 3 2 10 3 2 0 1 6 1 5
27) If the parameter of interest is the difference in means, $\mu_M - \mu_F$, where $\mu_M$ and $\mu_F$ are the mean number of hours spent watching television for males and females at this university, find a point estimate of the parameter based on the available data. Report your answer with two decimal places. Answer: 2.09 Explanation: $\bar{x}_M - \bar{x}_F$ = 6 - 3.91 = 2.09 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
28) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly select 13 males with replacement from the original sample of 13 males and randomly select 11 females with replacement from the original sample of 11 females. Compute the sample mean for males and females and find the difference in those sample means; this difference in sample means is the bootstrap statistic. Repeat this process many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 29) If the parameter of interest is the difference in means, $\mu_M - \mu_F$, where $\mu_M$ and $\mu_F$ are the mean number of hours spent watching television for males and females at this university, use technology to construct a bootstrap distribution with at least 1,000 samples and estimate the standard error. A) SE = 1.511 B) SE = 2.283 C) SE = 3.022 D) SE = 18.132 Answer: A Explanation: Answers will vary: SE = 1.511 (based on 5,000 bootstrap samples) Diff: 2 Type: BI Var: 1 L.O.: 3.3.3
30) Estimate the standard error and construct a 95% confidence interval for the difference in the mean number of hours spent watching television for males and females at this university. Round the margin of error to two decimal places. A) -0.93 to 5.11 B) -2.48 to 6.66 C) 0.89 to 6.93 D) -0.66 to 8.48 Answer: A Explanation: Answers may vary: 2.09 ± 2*1.511 ⇒ 2.09 ± 3.02 ⇒ -0.93 to 5.11 Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5 31) Suppose another class does the same in class exercise and gets a 95% confidence interval of -0.86 to 5.34 for the difference in the mean number of hours spent watching television for males and females at this university. Interpret this 95% confidence interval in the context of this data situation. Answer: We are 95% sure that the difference in mean number of hours of TV for males and females at this university is between -0.86 and 5.34 hours. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 32) You wish to provide a 95% confidence interval for the difference in the mean number of hours spent watching television for males and females at this university based on a bootstrap distribution. Which percentiles would you use? A) The 2.5%- and 97.5%-iles B) The 5%- and 95%-iles C) The 10%- and 95%-iles D) The 10%- and 90%-iles Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1
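Illustration: a minimal Python sketch of both interval methods for the television data, the statistic ± 2·SE interval and the percentile interval. Several things here are assumptions made for the example: the seed, the 5,000 resamples, the simple nearest-index percentile helper, and the thirteenth male value of 10, which is inferred from the stated group size of 13 and male mean of 6. Simulated results will differ slightly from the figures quoted in the answers.

```python
import random
import statistics

random.seed(2)  # assumed seed, only so the sketch is reproducible

# Hours of TV per week; the 13th male value (10) is inferred from n = 13, mean = 6
males = [3, 1, 12, 12, 0, 4, 10, 4, 5, 5, 2, 10, 10]
females = [10, 3, 2, 10, 3, 2, 0, 1, 6, 1, 5]

def diff_in_means(m, f):
    """Mean hours for males minus mean hours for females."""
    return statistics.mean(m) - statistics.mean(f)

point_estimate = diff_in_means(males, females)

# One bootstrap statistic per resample; sorted so percentiles can be read off
boot_stats = sorted(
    diff_in_means(random.choices(males, k=len(males)),
                  random.choices(females, k=len(females)))
    for _ in range(5000)
)

# Method 1: statistic +/- 2*SE, margin of error rounded to two places
se = statistics.stdev(boot_stats)
margin = round(2 * se, 2)
se_interval = (point_estimate - margin, point_estimate + margin)

def percentile(sorted_values, pct):
    """Value at (approximately) the given percentile of a sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

# Method 2: keep the middle 95%, i.e. the 2.5%- and 97.5%-iles
percentile_interval = (percentile(boot_stats, 2.5), percentile(boot_stats, 97.5))

print(f"point estimate      = {point_estimate:.2f}")
print(f"bootstrap SE        = {se:.3f}")
print(f"2*SE interval       = {se_interval[0]:.2f} to {se_interval[1]:.2f}")
print(f"percentile interval = {percentile_interval[0]:.3f} to {percentile_interval[1]:.3f}")
```

When the bootstrap distribution is roughly symmetric and bell-shaped, the two intervals should be close to each other.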
Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. Their responses, broken down by gender, are summarized in the provided table. Assume that the students enrolled in the statistics class are representative of all students at the university.
Male ($\bar{x}_M = 6$): 3 1 12 12 0 4 10 4 5 5 2 10 10
Female ($\bar{x}_F = 3.91$): 10 3 2 10 3 2 0 1 6 1 5
33) If the parameter of interest is the difference in means, $\mu_M - \mu_F$, where $\mu_M$ and $\mu_F$ are the mean number of hours spent watching television for males and females at this university, find a point estimate of the parameter based on the available data. Report your answer with two decimal places. Answer: 2.09 Explanation: $\bar{x}_M - \bar{x}_F$ = 6 - 3.91 = 2.09 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
34) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly select 13 males with replacement from the original sample of 13 males and randomly select 11 females with replacement from the original sample of 11 females. Compute the sample mean for males and females and find the difference in those sample means; this difference in sample means is the bootstrap statistic. Repeat this process many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 35) Where should the bootstrap distribution be centered? A) 0 B) 2.09 C) 3.91 D) 6 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 36) Describe how you would estimate the standard error from the bootstrap distribution. Answer: Use the standard deviation of the bootstrap distribution. Diff: 2 Type: ES Var: 1 L.O.: 3.3.4
37) The standard error is estimated to be 1.511 (based on 5,000 bootstrap samples). Find a 95% confidence interval for the difference in the mean number of hours spent watching television for males and females at this university. Round the margin of error to two decimal places. Answer: -0.93 to 5.11 Explanation: 2.09 ± 2*1.511 ⇒ 2.09 ± 3.02 ⇒ -0.93 to 5.11 Diff: 2 Type: SA Var: 1 L.O.: 3.2.3;3.3.5 38) The standard error is estimated to be 1.511. Construct and interpret the 95% confidence interval in the context of this data situation. Answer: We are 95% sure that the difference in mean number of hours of TV watched per week for males and females at this university is between -0.93 and 5.11 hours. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 39) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Use the percentiles to provide a 95% confidence interval for the difference in the mean number of hours spent watching television for males and females at this university. Indicate which percentiles you are using.
Percentile   1%      2.5%    5%      10%     25%     75%     90%     95%     97.5%   99%
Value       -1.497  -0.888  -0.395   0.189   1.105   3.136   4.056   4.573   4.972   5.657
A) -0.888 to 4.972 B) -1.497 to 5.657 C) -0.395 to 4.573 D) 0.189 to 4.056 Answer: A Explanation: -0.888 to 4.972 (using the 2.5%- and 97.5%-iles) Diff: 2 Type: BI Var: 1 L.O.: 3.4.1
Use the following to answer the questions below: November 6, 2012 was election day. Many of the major television networks aired coverage of the incoming election results during the primetime hours. The provided table displays the amount of time (in minutes) spent watching election coverage for a random sample of 25 U.S. adults.
123     2    71   120    70
 97    45   155    73    30
 70    90    40   168    69
 86   156     5    36   107
 68    52   126    86    66
40) What is the population parameter of interest? Define using the appropriate notation. Answer: μ = mean amount of time (in minutes) U.S. adults spent watching election coverage on election night Diff: 2 Type: ES Var: 1 L.O.: 3.1.1 41) Use the data from the sample to estimate the parameter of interest. Report your answer with two decimal places. A) $\bar{x}$ = 80.44 minutes B) $\bar{x}$ = 70.00 minutes C) μ = 80.44 minutes D) μ = 70.00 minutes Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.2 42) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly generate a sample of 25 with replacement from the original sample of 25. Compute the sample mean for each sample to use as the bootstrap statistic. Repeat many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 43) Use technology to construct a bootstrap distribution with at least 1,000 samples and estimate the standard error. A) SE = 8.769 B) SE = 17.538 C) SE = 11.471 D) SE = 22.942 Answer: A Explanation: Answers may vary: SE = 8.769 (based on 5,000 bootstrap samples) Diff: 2 Type: BI Var: 1 L.O.: 3.3.3;3.3.4
44) Use the estimate of the standard error to construct a 95% confidence interval for the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night. Use three decimal places in your answer. A) 62.902 to 97.978 minutes B) 71.671 to 89.209 minutes C) 52.462 to 87.538 minutes D) 58.529 to 81.471 minutes Answer: A Explanation: Answers may vary: 80.44 ± 2*8.769 ⇒ 80.44 ± 17.538 ⇒ 62.902 to 97.978 minutes Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.4 45) Suppose you wish to use the percentiles of your bootstrap distribution to provide a 92% confidence interval for the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night. Which percentiles would you use? A) The 4%- and 96%-iles. B) The 1%- and 99%-iles. C) The 2%- and 98%-iles. D) The 5%- and 95%-iles. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 46) Interpret your 92% confidence interval in the context of this data situation. Answer: We are 92% sure that the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night is between 65.16 and 95.78 minutes. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 Use the following to answer the questions below: November 6, 2012 was election day. Many of the major television networks aired coverage of the incoming election results during the primetime hours. The provided table displays the amount of time (in minutes) spent watching election coverage for a random sample of 25 U.S. adults.
123     2    71   120    70
 97    45   155    73    30
 70    90    40   168    69
 86   156     5    36   107
 68    52   126    86    66
47) What is the population parameter of interest? Define using the appropriate notation. Answer: μ = mean amount of time (in minutes) U.S. adults spent watching election coverage on election night Diff: 2 Type: ES Var: 1 L.O.: 3.1.1
48) Use the data from the sample to estimate the parameter of interest. Report your answer with two decimal places. A) $\bar{x}$ = 80.44 minutes B) μ = 70.00 minutes C) μ = 80.44 minutes D) $\bar{x}$ = 70.00 minutes Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.1.2 49) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly generate a sample of 25 with replacement from the original sample of 25. Compute the sample mean for each sample to use as the bootstrap statistic. Repeat many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 50) Where should the bootstrap distribution be centered? A) 25 B) 60 C) 80.44 D) 100 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 51) Describe how you would estimate the standard error from the bootstrap distribution. Answer: Use the standard deviation of the bootstrap distribution. Diff: 2 Type: ES Var: 1 L.O.: 3.3.4 52) The standard error is estimated to be 8.769 (based on 5,000 bootstrap samples). Find a 95% confidence interval for the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night. Round the margin of error to two decimal places. A) 62.90 to 97.98 minutes B) 52.462 to 87.538 minutes C) 71.671 to 89.209 minutes D) 58.529 to 81.471 minutes Answer: A Explanation: Answers may vary: 80.44 ± 2*8.769 ⇒ 80.44 ± 17.54 ⇒ 62.90 to 97.98 minutes Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5
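Illustration: a minimal Python sketch showing why the bootstrap distribution for the election-coverage data is centered at the sample mean and how its standard deviation serves as the standard error. The seed, the 5,000 resamples, and the variable names are assumptions for this example, so the simulated standard error will not match the quoted 8.769 exactly.

```python
import random
import statistics

random.seed(3)  # assumed seed, only so the sketch is reproducible

# Minutes of election coverage watched by the 25 sampled adults
minutes = [123, 2, 71, 120, 70, 97, 45, 155, 73, 30, 70, 90, 40,
           168, 69, 86, 156, 5, 36, 107, 68, 52, 126, 86, 66]

sample_mean = statistics.mean(minutes)   # point estimate of mu

# Each bootstrap sample has the same size as the original and is drawn
# with replacement; the bootstrap statistic is the resampled mean
boot_means = [statistics.mean(random.choices(minutes, k=len(minutes)))
              for _ in range(5000)]

center = statistics.mean(boot_means)     # should sit near the sample mean
se = statistics.stdev(boot_means)        # SD of the bootstrap distribution

print(f"sample mean      = {sample_mean:.2f}")
print(f"bootstrap center = {center:.2f}")
print(f"bootstrap SE     = {se:.3f}")
```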
53) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Use the percentiles to provide a 92% confidence interval for the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night. Indicate which percentiles you are using.
Percentile   2%       4%       6%       8%       92%      94%      96%      98%
Value        63.000   65.160   66.880   68.240   92.740   94.080   95.780   98.540
A) 65.160 to 95.780 minutes (use the 4%- and 96%-iles) B) 63.000 to 98.540 minutes (use the 2%- and 98%-iles) C) 66.880 to 94.080 minutes (use the 6%- and 94%-iles) D) 68.240 to 92.740 minutes (use the 8%- and 92%-iles) Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 54) Interpret your 92% confidence interval in the context of this data situation. Answer: We are 92% sure that the mean amount of time (in minutes) U.S. adults spent watching election coverage on election night is between 65.16 and 95.78 minutes. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 Use the following to answer the questions below: A study to investigate the dominant paws in cats was described in the scientific journal Animal Behaviour. The researchers used a random sample of 42 domestic cats. In this study, each cat was shown a treat (5 grams of tuna), and while the cat watched, the food was placed inside a jar. The opening of the jar was small enough that the cat could not stick its head inside to remove the treat. The researcher recorded the paw that was first used by the cat to try to retrieve the treat. This was repeated 100 times for each cat (over a span of several days). The paw used most often was deemed the dominant paw (note that one cat used both paws equally and was classified as "ambidextrous"). Of the 42 cats studied, 20 were classified as "left-pawed." 55) What is the population parameter of interest? Define using the appropriate notation. Answer: p = proportion of domestic cats that are "left-pawed" Diff: 2 Type: ES Var: 1 L.O.: 3.1.1 56) Use the data from the sample to estimate the parameter of interest. Report your answer with three decimal places. Answer: 0.476 Explanation: $\hat{p}$ = 20/42 = 0.476 Diff: 1 Type: SA Var: 1 L.O.: 3.1.2
57) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly generate a sample of size 42 cats with replacement from the original sample of 42 cats. For each bootstrap sample, compute the sample proportion of "left-pawed" cats; this is the bootstrap statistic. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 58) Use technology to construct a bootstrap distribution with at least 1,000 samples and estimate the standard error. A) SE = 0.078 B) SE = 0.042 C) SE = 0.156 D) SE = 0.636 Answer: A Explanation: Answers will vary: SE = 0.078 (based on 5,000 bootstrap samples) Diff: 2 Type: BI Var: 1 L.O.: 3.3.3;3.3.4 59) Suppose you estimate the standard error to be 0.056. Construct a 95% confidence interval for the proportion of domestic cats that are "left-pawed". Round the margin of error to three decimal places. A) 0.364 to 0.588 B) 0.420 to 0.532 C) 0.392 to 0.560 D) 0.402 to 0.550 Answer: A Explanation: Answers will vary: 0.476 ± 2*0.056 ⇒ 0.476 ± 0.112 ⇒ 0.364 to 0.588 Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5 60) Use technology to construct a bootstrap distribution with at least 1,000 samples. Use the percentiles of your bootstrap distribution to provide a 99% confidence interval for the parameter. Indicate the percentiles that you use. A) 0.262 to 0.667 (using the 0.5%- and 99.5%-iles) B) 0.286 to 0.643 (use 1%- and 99%-iles) C) 0.310 to 0.619 (use 2.5%- and 97.5%-iles) D) 0.357 to 0.595 (use 5%- and 95%-iles) Answer: A Explanation: Answers will vary: 0.262 to 0.667 (using the 0.5%- and 99.5%-iles) Diff: 2 Type: BI Var: 1 L.O.: 3.4.1
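Illustration: a minimal Python sketch of a 99% percentile interval for the proportion of "left-pawed" cats. The 0/1 coding, the seed, the 5,000 resamples, and the nearest-index percentile helper are assumptions for this example. Because a bootstrap proportion from 42 cats can only change in steps of 1/42, the simulated endpoints may land one step away from those quoted in the answer.

```python
import random

random.seed(4)  # assumed seed, only so the sketch is reproducible

n_cats, n_left = 42, 20
sample = [1] * n_left + [0] * (n_cats - n_left)   # 1 = "left-pawed"
p_hat = n_left / n_cats

# Bootstrap statistic: proportion of "left-pawed" cats in each resample
boot_props = sorted(sum(random.choices(sample, k=n_cats)) / n_cats
                    for _ in range(5000))

def percentile(sorted_values, pct):
    """Value at (approximately) the given percentile of a sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

# 99% interval: chop 0.5% off each tail
lower = percentile(boot_props, 0.5)
upper = percentile(boot_props, 99.5)

print(f"p-hat        = {p_hat:.3f}")
print(f"99% interval = {lower:.3f} to {upper:.3f}")
```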
61) Construct a 99% confidence interval and provide an interpretation of it in the context of this data situation. Answer: We are 99% sure that the proportion of domestic cats that are "left-pawed" is between 0.262 and 0.667. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 62) The researchers were also interested in comparing the proportion of "left-pawed" cats for male and female cats. Of the 21 male cats in the sample, 19 were classified as "left-pawed" while only 1 of the 21 female cats was considered to be "left-pawed." A bootstrap distribution (based on 1,000 bootstrap samples) for difference in the proportion of "left-pawed" cats is provided. Would it be appropriate to use this bootstrap distribution to construct a confidence interval for the difference in the proportion of male and female cats that are "left-pawed"?
A) Yes B) No Answer: B Explanation: No. The bootstrap distribution is not quite symmetric. Diff: 3 Type: MC Var: 1 L.O.: 3.4.4
Use the following to answer the questions below: A study to investigate the dominant paws in cats was described in the scientific journal Animal Behaviour. The researchers used a random sample of 42 domestic cats. In this study, each cat was shown a treat (5 grams of tuna), and while the cat watched, the food was placed inside a jar. The opening of the jar was small enough that the cat could not stick its head inside to remove the treat. The researcher recorded the paw that was first used by the cat to try to retrieve the treat. This was repeated 100 times for each cat (over a span of several days). The paw used most often was deemed the dominant paw (note that one cat used both paws equally and was classified as "ambidextrous"). Of the 42 cats studied, 20 were classified as "left-pawed." 63) What is the population parameter of interest? Define using the appropriate notation. Answer: p = proportion of domestic cats that are "left-pawed" Diff: 2 Type: ES Var: 1 L.O.: 3.1.1 64) Use the data from the sample to estimate the parameter of interest. Round your answer to three decimal places. Answer: 0.476 Explanation: $\hat{p}$ = 20/42 = 0.476 Diff: 1 Type: SA Var: 1 L.O.: 3.1.2 65) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For each bootstrap sample, randomly generate a sample of size 42 cats with replacement from the original sample of 42 cats. For each bootstrap sample, compute the sample proportion of "left-pawed" cats; this is the bootstrap statistic. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 66) Where should the bootstrap distribution be centered? A) 0.476 B) 20 C) 42 D) 0.95 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.3.2 67) Describe how you would estimate the standard error from the bootstrap distribution. Answer: Use the standard deviation of the bootstrap distribution. Diff: 2 Type: ES Var: 1 L.O.: 3.3.4
68) The standard error is estimated to be 0.078 (based on 5,000 bootstrap samples). Find a 95% confidence interval for the proportion of domestic cats that are "left-pawed". Round the margin of error to three decimal places. A) 0.320 to 0.632 B) 0.398 to 0.554 C) 0.262 to 0.667 D) 0.286 to 0.643 Answer: A Explanation: 0.476 ± 2*0.078 ⇒ 0.476 ± 0.156 ⇒ 0.320 to 0.632 Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5 69) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Use the percentiles to provide a 99% confidence interval for the parameter. Indicate the percentiles that you use.
Percentile   0.5%    1%      2.5%    5%      95%     97.5%   99%     99.5%
Value        0.262   0.286   0.310   0.357   0.595   0.619   0.643   0.667
A) 0.262 to 0.667 (use 0.5%- and 99.5%-iles) B) 0.286 to 0.643 (use 1%- and 99%-iles) C) 0.310 to 0.619 (use 2.5%- and 97.5%-iles) D) 0.357 to 0.595 (use 5%- and 95%-iles) Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 70) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Use the percentiles to provide a 99% confidence interval for the parameter. Provide an interpretation of your 99% confidence interval in the context of this data situation.
Percentile   0.5%    1%      2.5%    5%      95%     97.5%   99%     99.5%
Value        0.262   0.286   0.310   0.357   0.595   0.619   0.643   0.667
Answer: We are 99% sure that the proportion of domestic cats that are "left-pawed" is between 0.262 and 0.667. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4
71) The researchers were also interested in comparing the proportion of "left-pawed" cats for male and female cats. Of the 21 male cats in the sample, 19 were classified as "left-pawed" while only 1 of the 21 female cats was considered to be "left-pawed". A bootstrap distribution (based on 1,000 bootstrap samples) for difference in the proportion of "left-pawed" cats is provided. Would it be appropriate to use this bootstrap distribution to construct a confidence interval for the difference in the proportion of male and female cats that are "left-pawed"? Briefly explain.
Answer: No. The bootstrap distribution is not quite symmetric. Diff: 3 Type: ES Var: 1 L.O.: 3.4.4 72) A bootstrap distribution, based on 1,000 bootstrap samples, is provided. Use the distribution to estimate a 99% confidence interval for the population mean. Explain how you arrived at your answer.
Answer: Answers may vary slightly: Since we are interested in a 99% confidence interval, 0.5% of the points should be in each tail of the distribution (i.e., 0.5% of 1,000 is 1,000(0.005) = 5 dots). The approximate cut-offs that leave 5 dots in each tail are 46 and 52.6. (In StatKey the actual cut-offs are 45.994 and 52.642.) Diff: 3 Type: ES Var: 1 L.O.: 3.4.1
Use the following to answer the questions below: A biologist collected data on a random sample of porcupines. She wants to estimate the correlation between the body mass of a porcupine (in grams) and the length of the porcupine (in cm). 73) Her sample consists of 20 porcupines. A bootstrap distribution for the correlation between body mass and length (based on 1,000 samples) is provided. Would it be appropriate to use this bootstrap distribution to estimate a 95% confidence interval for the correlation between body mass and length of porcupines?
A) Yes B) No Answer: B Explanation: No. The bootstrap distribution is extremely skewed. Diff: 2 Type: MC Var: 1 L.O.: 3.4.4
74) The biologist noted that two of the porcupines were much smaller than the others, and thus they were likely not "adults". Since she is only interested in adult porcupines, the biologist wants to use the 18 adults to estimate the correlation between body mass and body length. The sample correlation is 0.407. Her bootstrap distribution is provided. The standard error is estimated to be 0.165.
If appropriate, construct and interpret a 95% confidence interval for the correlation between body mass and body length for adult porcupines (with the margin of error rounded to three decimal places). If not appropriate, explain why not. Answer: Because the bootstrap distribution is roughly symmetric, it is appropriate to use the bootstrap distribution to construct a confidence interval. 0.407 ± 2*0.165 ⇒ 0.407 ± 0.330 ⇒ 0.077 to 0.737 We are 95% sure that the correlation between body mass and body length for adult porcupines is between 0.077 and 0.737. Diff: 2 Type: ES Var: 1 L.O.: 3.2.3;3.2.4;3.3.5;3.4.4 Use the following to answer the questions below: In a survey conducted by the Gallup organization, 1,017 adults were asked, "In general, how much trust and confidence do you have in the mass media — such as newspapers, TV, and radio — when it comes to reporting the news fully, accurately, and fairly?" 81 said that they had a "great deal" of confidence, 325 said they had a "fair amount" of confidence, 397 said they had "not very much" confidence, and 214 said they had "no confidence at all." 75) Suppose the parameter of interest is the proportion of U.S. adults who have "no confidence at all" in the media. Use the data to find an estimate of this parameter. Report your answer with two decimal places. Answer: 0.21 Explanation: $\hat{p}$ = 214/(81 + 325 + 397 + 214) = 214/1,017 = 0.21 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2
76) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For a single bootstrap sample, generate a sample of 1,017 responses with replacement from the original sample of responses. For each sample, compute the sample proportion of responses that are "no confidence at all" — this is the bootstrap statistic. Repeat many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 77) Use technology to construct a bootstrap distribution with at least 1,000 samples and estimate the standard error. A) SE = 0.013 B) SE = 0.026 C) SE = 0.001 D) SE = 0.002 Answer: A Explanation: Answers may vary: SE = 0.013 (based on 5,000 bootstrap samples) Diff: 2 Type: BI Var: 1 L.O.: 3.3.3;3.3.4 78) Use the estimate of the standard error to construct a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media. Round the margin of error to three decimal places. A) 0.184 to 0.236 B) 0.197 to 0.223 C) 0.394 to 0.446 D) 0.407 to 0.433 Answer: A Explanation: Answers may vary: 0.21 ± 2*0.013 ⇒ 0.21 ± 0.026 ⇒ 0.184 to 0.236 Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5 79) Construct a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media. Provide an interpretation of your 95% confidence interval in the context of this data situation. Answer: We are 95% sure that the proportion of U.S. adults who have no confidence in the media is between 0.184 and 0.236. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4
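Illustration: the interval arithmetic used throughout these answers (statistic ± 2·SE, with the margin of error rounded first) can be captured in a small helper. The function name and the `digits` parameter are hypothetical, chosen for this sketch; the inputs are the statistics and standard errors quoted in the questions.

```python
def two_se_interval(statistic, se, digits=3):
    """Return statistic +/- 2*SE, rounding the margin of error first."""
    margin = round(2 * se, digits)
    return round(statistic - margin, digits), round(statistic + margin, digits)

# Proportion with "no confidence at all" in the media: 0.21, SE = 0.013
print(two_se_interval(0.21, 0.013))

# Correlation between body mass and length for adult porcupines: 0.407, SE = 0.165
print(two_se_interval(0.407, 0.165))
```

This shortcut is only appropriate when the bootstrap distribution is roughly symmetric and bell-shaped, which is the condition the porcupine and cat questions ask students to check.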
80) Suppose you wish to use the percentiles of your bootstrap distribution to provide a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media. Which percentiles would you use? A) The 2.5%- and 97.5%-tiles. B) The 1%- and 99%-tiles. C) The 5%- and 95%-tiles. D) The 10%- and 90%-tiles. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 Use the following to answer the questions below: In a survey conducted by the Gallup organization, 1,017 adults were asked, "In general, how much trust and confidence do you have in the mass media — such as newspapers, TV, and radio — when it comes to reporting the news fully, accurately, and fairly?" 81 said that they had a "great deal" of confidence, 325 said they had a "fair amount" of confidence, 397 said they had "not very much" confidence, and 214 said they had "no confidence at all." 81) Suppose the parameter of interest is the proportion of U.S. adults who have "no confidence at all" in the media. Use the data to find an estimate of this parameter. Report your answer with two decimal places. Answer: 0.21 Explanation: $\hat{p}$ = 214/(81 + 325 + 397 + 214) = 214/1,017 = 0.21 Diff: 2 Type: SA Var: 1 L.O.: 3.1.2 82) Describe how to use the data to construct a bootstrap distribution. What value should be recorded for each of the bootstrap samples? Answer: For a single bootstrap sample, generate a sample of 1,017 responses with replacement from the original sample of responses. For each sample, compute the sample proportion of responses that are "no confidence at all"; this is the bootstrap statistic. Repeat many times. Diff: 2 Type: ES Var: 1 L.O.: 3.3.1;3.4.2 83) Describe how you would estimate the standard error from the bootstrap distribution. Answer: Use the standard deviation of the bootstrap distribution. Diff: 2 Type: ES Var: 1 L.O.: 3.3.4
84) The estimate of the standard error is 0.013. Use the estimate of the standard error to construct a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media. Round the margin of error to three decimal places. A) 0.184 to 0.236 B) 0.197 to 0.223 C) 0.190 to 0.231 D) 0.194 to 0.227 Answer: A Explanation: 0.21 ± 2*0.013 ⇒ 0.21 ± 0.026 ⇒ 0.184 to 0.236 Diff: 2 Type: BI Var: 1 L.O.: 3.2.3;3.3.5 85) Construct a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media. Provide an interpretation of your 95% confidence interval in the context of this data situation. Answer: We are 95% sure that the proportion of U.S. adults who have no confidence in the media is between 0.184 and 0.236. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4 86) Percentiles of the bootstrap distribution (based on 5,000 samples) are provided. Which percentiles would you use to construct a 95% confidence interval for the proportion of U.S. adults who have no confidence in the media?
Percentile   1%      2.5%    5%      10%     90%     95%     97.5%   99%
Value        0.181   0.186   0.190   0.194   0.227   0.231   0.235   0.239
A) The 2.5%- and 97.5%-tiles. B) The 1%- and 99%-tiles. C) The 5%- and 95%-tiles. D) The 10%- and 90%-tiles. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 3.4.1 87) In a dotplot of a bootstrap distribution, the number of dots should match the size of the original sample. Answer: FALSE Diff: 2 Type: TF Var: 1 L.O.: 3.3.0;3.4.2
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock)
Chapter 4 Hypothesis Tests
4.1 Introducing Hypothesis Tests
1) The p-value is A) the probability that the null hypothesis is true. B) the probability that the alternative hypothesis is true. C) the probability, when the null hypothesis is true, of obtaining a sample as extreme as (or more extreme than) the observed sample. D) the probability, when the alternative hypothesis is true, of obtaining a sample as extreme as (or more extreme than) the observed sample. Answer: C Diff: 2 Type: MC Var: 1 L.O.: 4.2.1 2) The following figure shows a randomization distribution for the hypotheses [formula missing] versus [formula missing]. The statistic used for each sample is [formula missing]. Which of the two possible sample results provides the most evidence against $H_0$?
[Figure: randomization distribution]
A) [symbol missing] = 56.5; [symbol missing] = 51.3 B) [symbol missing] = 50.2; [symbol missing] = 53.1 Answer: A Diff: 1 Type: MC Var: 1 L.O.: 4.1.3
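Illustration: the definition of the p-value in question 1 (the chance, when the null hypothesis is true, of a result at least as extreme as the observed one) can be made concrete with a randomization test. This is a hypothetical sketch, not a question from the test bank. It reuses the Chapter 3 television-hours data, with the thirteenth male value of 10 inferred from the stated group size and mean, and it assumes the hypotheses $H_0: \mu_M = \mu_F$ versus $H_a: \mu_M > \mu_F$ purely for illustration. The seed and the 5,000 shuffles are also assumptions.

```python
import random
import statistics

random.seed(5)  # assumed seed, only so the sketch is reproducible

# TV hours per week (13th male value inferred from n = 13, mean = 6)
males = [3, 1, 12, 12, 0, 4, 10, 4, 5, 5, 2, 10, 10]
females = [10, 3, 2, 10, 3, 2, 0, 1, 6, 1, 5]

observed = statistics.mean(males) - statistics.mean(females)

# Under H0 the group labels carry no information, so shuffle the pooled
# values and deal them back into groups of the original sizes
pooled = males + females
reps = 5000
count_extreme = 0
for _ in range(reps):
    random.shuffle(pooled)
    sim_males = pooled[:len(males)]
    sim_females = pooled[len(males):]
    sim_diff = statistics.mean(sim_males) - statistics.mean(sim_females)
    if sim_diff >= observed:          # as extreme as, or more extreme than
        count_extreme += 1

p_value = count_extreme / reps
print(f"observed difference = {observed:.2f}")
print(f"one-sided p-value   = {p_value:.3f}")
```

The p-value here is a proportion of simulated statistics, computed assuming the null hypothesis is true. It is not the probability that either hypothesis is true, which is why options A and B in question 1 are wrong.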
3) The average SAT-Critical Reading score for college-bound students taking the exam in the 2018-2019 academic year was 531. A highly selective university wants to know if their 2020 incoming class had an average SAT-Critical Reading score that was higher than the national average. Which of the following possible samples provides the most evidence for this claim?
A) Sample A B) Sample B C) Sample C D) Sample D Answer: D Diff: 1 Type: BI Var: 1 L.O.: 4.1.3
4) A statistical test uses data from a sample to assess a claim about a population. Answer: TRUE Diff: 2 Type: TF Var: 1 L.O.: 4.1.1 5) Identify the error in the following hypotheses:
versus
Answer: The hypotheses should use p, not p̂. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2 6) Identify the error in the following hypotheses:
versus
Answer: The null hypothesis should be "=" as only a single value can be specified; the not equal should be used in the alternative. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2
7) Identify the error in the following hypotheses:
versus
Answer: The "30" is invalid as proportions must be between 0 and 1. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2 8) Which of the following samples provides the most evidence that the amount of time spent studying for an exam and the grade on the exam are positively correlated?
A) Sample A B) Sample B C) Sample C D) Sample D Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.3
Use the following to answer the questions below: A study described in Attention, Perception, and Psychophysics investigated the impacts of multi-tasking on people who play video games and those who don't. Participants in the study were asked to perform three visually demanding tasks with (dual-task) and without (single-task) answering unrelated questions over the phone. One of the tasks involved tracking multiple circles moving around on a computer monitor. At the 5% significance level, the authors of the study concluded "tracking accuracy was significantly worse in the dual-task condition" for both people who play video games and those who do not. 9) What does the phrase "significantly worse" mean in this context? Answer: The tracking accuracy tended to be lower for the individuals who are multi-tasking, and the score is so much lower that the difference is not likely to be due to random chance. Diff: 2 Type: ES Var: 1 L.O.: 4.1.4
4.2 Measuring Evidence with P-values
1) Of the two p-values, which provides more evidence against H0? A) p-value = 0.49 B) p-value = 0.007 Answer: B Diff: 1 Type: MC Var: 1 L.O.: 4.2.4
Use the following to answer the questions below: Consider testing the hypotheses H0: p = 0.4 versus Ha: p > 0.4. Four possible sample statistics, along with four possible p-values, are given. Match the statistics to their p-values.
A p̂ = 0.42
B p̂ = 0.38
C p̂ = 0.51
D p̂ = 0.46
2) ________ p-value = 0.72 Answer: B Diff: 2 Type: SA Var: 1 L.O.: 4.2.1 3) ________ p-value = 0.293 Answer: A Diff: 2 Type: SA Var: 1 L.O.: 4.2.1 4) ________ p-value = 0.138 Answer: D Diff: 2 Type: SA Var: 1 L.O.: 4.2.1 5) ________ p-value = 0.019 Answer: C Diff: 2 Type: SA Var: 1 L.O.: 4.2.1
6) The randomization distribution for testing the hypotheses H0 versus Ha is provided. Use the provided randomization distribution (based on 100 samples) and the observed sample statistic to estimate the p-value for this test.
Answer: 0.01 Explanation: 1/100 = 0.01 Diff: 2 Type: SA Var: 1 L.O.: 4.2.2
7) The provided figure displays the randomization distribution for testing H0 versus Ha. The p-value for the sample mean x̄ = 112 is closest to A) 0.01 B) 0.25 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.2.2 8) Decreasing the significance level of a hypothesis test (say, from 5% to 1%) will cause the p-value of an observed test statistic to A) increase. B) decrease. C) stay the same. Answer: C Diff: 3 Type: BI Var: 1 L.O.: 4.2.1;4.2.3 9) Using the definition of a p-value, explain why the area in the tail of a randomization distribution is used to compute a p-value. Answer: A p-value is the probability of observing an outcome as or more extreme than the sample outcome if the null hypothesis is true. A randomization distribution is generated to be consistent with the null hypothesis, and thus contains the types of outcomes we should expect to see in samples of this size if the null hypothesis is true. By locating the observed sample statistic in the randomization distribution and examining the tail of the distribution, we are identifying outcomes that are as or more extreme than the observed outcome. The area that corresponds to these values is the probability of observing these types of outcomes in this distribution (which was formed under the assumption that the null hypothesis is true); this area is the p-value. Diff: 2 Type: ES Var: 1 L.O.: 4.2.1;4.2.3
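The tail-area idea in the answer to question 9 can be illustrated with a short Python sketch. The function name `tail_p_value` and the 100 placeholder statistics are assumptions made for illustration; they are not the values plotted in the test bank's figures.

```python
def tail_p_value(randomization_stats, observed, tail="right"):
    """Proportion of randomization statistics at least as extreme as the observed one."""
    if tail == "right":
        count = sum(1 for s in randomization_stats if s >= observed)
    elif tail == "left":
        count = sum(1 for s in randomization_stats if s <= observed)
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return count / len(randomization_stats)


# Hypothetical randomization distribution of 100 statistics (placeholder values).
stats = list(range(100))

# Only one of the 100 simulated statistics is at or beyond the observed value,
# so the estimated p-value is 1/100, the same arithmetic as in question 6.
p_value = tail_p_value(stats, observed=99, tail="right")
```

Because every simulated statistic was generated under the null hypothesis, the proportion in the tail estimates the probability named in the definition of a p-value.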
4.3 Determining Statistical Significance
1) A Type I error occurs by A) rejecting the null hypothesis when the null hypothesis is false. B) not rejecting the null hypothesis when the null hypothesis is false. C) rejecting the null hypothesis when the null hypothesis is true. D) not rejecting the null hypothesis when the null hypothesis is true. Answer: C Diff: 2 Type: BI Var: 1 L.O.: 4.3.3 2) A Type II error occurs by A) rejecting the null hypothesis when the null hypothesis is false. B) not rejecting the null hypothesis when the null hypothesis is false. C) rejecting the null hypothesis when the null hypothesis is true. D) not rejecting the null hypothesis when the null hypothesis is true. Answer: B Diff: 2 Type: BI Var: 1 L.O.: 4.3.3 3) Using a significance level of 5%, the appropriate conclusion for a test with a p-value of 0.0421 would be: A) Reject H0 B) Do not reject H0 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 4.3.1
4) The significance level, α, represents the tolerable probability of making a Type II error. Answer: FALSE Diff: 2 Type: TF Var: 1 L.O.: 4.3.4 Use the following to answer the questions below: Match each p-value to the most appropriate conclusion. A 0.0001
B 0.0735
C 0.6082
D 0.0361
5) ________ "The evidence against the null and in favor of the alternative is very strong." Answer: A Diff: 2 Type: SA Var: 1 L.O.: 4.3.1
6) ________ "The result is significant at the 5% level but not at a 1% level." Answer: D Diff: 2 Type: SA Var: 1 L.O.: 4.3.1 7) ________ "There is really no evidence supporting the alternative hypothesis." Answer: C Diff: 2 Type: SA Var: 1 L.O.: 4.3.1 8) ________ "The evidence against the null is significant, but only at the 10% level." Answer: B Diff: 2 Type: SA Var: 1 L.O.: 4.3.1 Use the following to answer the questions below: A study described in Attention, Perception, and Psychophysics investigated the impacts of multi-tasking on people who play video games and those who don't. Participants in the study were asked to perform three visually demanding tasks with (dual-task) and without (single-task) answering unrelated questions over the phone. One of the tasks involved tracking multiple circles moving around on a computer monitor. At the 5% significance level, the authors of the study concluded "tracking accuracy was significantly worse in the dual-task condition" for both people who play video games and those who do not. 9) What conclusion would the authors have made at the 10% significance level? A) Tracking accuracy was significantly worse in the dual-task condition. B) Tracking accuracy was not significantly worse in the dual-task condition. C) Not enough information Answer: A Diff: 3 Type: MC Var: 1 L.O.: 4.3.1 10) What conclusion would the authors have made at the 1% significance level? A) Tracking accuracy was significantly worse in the dual-task condition. B) Tracking accuracy was not significantly worse in the dual-task condition. C) Not enough information Answer: C Diff: 3 Type: MC Var: 1 L.O.: 4.3.1 11) Which type of error, Type I or Type II, could have occurred in this situation? Briefly justify your answer. Answer: They found that tracking accuracy was "significantly worse," which suggests they rejected their null hypothesis. When a null hypothesis is rejected, the Type I error is the possible error. Diff: 2 Type: ES Var: 1 L.O.: 4.3.3
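The matching exercise in this section (p-values A through D) comes down to comparing each p-value with the 1%, 5%, and 10% significance levels. The following Python sketch shows that comparison; the helper name `is_significant` and the `table` layout are illustrative choices, not part of the test bank.

```python
def is_significant(p_value, alpha):
    """Reject the null hypothesis when the p-value is below the significance level."""
    return p_value < alpha


# p-values from the matching exercise in this section.
p_values = {"A": 0.0001, "B": 0.0735, "C": 0.6082, "D": 0.0361}
levels = (0.01, 0.05, 0.10)

# For each label, record whether the result is significant at 1%, 5%, and 10%.
table = {
    label: [is_significant(p, alpha) for alpha in levels]
    for label, p in p_values.items()
}

for label, flags in table.items():
    print(label, dict(zip(levels, flags)))
```

The same comparison explains questions 9 and 10: a result that is significant at 5% has a p-value below 0.05, which is automatically below 0.10, but nothing can be said about 0.01 without the p-value itself.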
4.4 A Closer Look at Testing
1) It is of interest to test the hypotheses H0: p = 0.8 versus Ha: p < 0.8. The sample outcome, based on n = 10 observations, is p̂ = 0.7 and the randomization statistic to be calculated is p̂. The p-value for this test was found to be 0.322. If the test was performed correctly, where should the randomization distribution be centered? A) 0.7 B) 10 C) 0.8 D) 0.322 Answer: C Diff: 1 Type: BI Var: 1 L.O.: 4.4.1 2) It is believed that about 37% of college students binge drink (5 or more drinks for men, and 4 or more drinks for women, in two hours). Administrators at a small university of 6,000 students want to do a study to determine if the proportion of their students who binge drink differs from 37%. They select a sample of 98 students enrolled at the university to survey about their drinking behavior. When generating the randomization distribution for this test, how large should each individual randomization sample be? A) 98 because that is the size of the original sample B) 1,000 to get an accurate randomization distribution C) 6,000 because that is the size of the university D) 2,220 because that is 37% of the students at the university Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.4.1 3) When generating a randomization sample, the sample should be consistent with the ________ hypothesis. Answer: null Diff: 1 Type: SA Var: 1 L.O.: 4.4.1 4) The null and alternative hypotheses for a test are H0 vs. Ha. Give the notation for a sample statistic we might record for each simulated sample to create the randomization distribution. A) B) p C) μ D) Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.4.1
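As a check on question 1 of this section, the stated p-value of 0.322 can be reproduced exactly. This Python sketch assumes n = 10 observations, a sample proportion of 0.7, and a left-tailed alternative (p < 0.8); those values are inferred from the answer choices and the stated p-value, not confirmed by the source. The helper `binom_cdf` is written here for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


null_p = 0.8          # value of p under the null hypothesis
n = 10                # assumed sample size
observed_count = 7    # assumed: a sample proportion of 0.7 means 7 successes out of 10

# Left-tail p-value: chance of 7 or fewer successes when p really is 0.8.
p_value = binom_cdf(observed_count, n, null_p)
print(round(p_value, 3))
```

The simulated samples are drawn with p = 0.8, so their sample proportions pile up around 0.8, which is why the randomization distribution is centered at the null value rather than at the observed 0.7.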
Use the following to answer the questions below: A student in an introductory statistics course investigated if there is evidence that the proportion of milk chocolate M&M's that are green differs from the proportion of dark chocolate M&M's that are green. She purchased a bag of each variety, and her data are summarized in the following table.
Milk Chocolate Dark Chocolate Total
Green 8 4 12
Not Green 33 38 71
Total 41 42 83
5) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: The student wants to know if the proportion of green candies differs for the two types of M&M's, implying that the null hypothesis would be that the proportion of green candies is the same for the two types. If the null hypothesis were true, green candies would be equally likely to appear in either bag. Since there are 83 candies, we should construct a deck of 83 cards - 12 of which are green (representing the 12 total green candies she observed) and the rest can be white (representing the "other" colors she observed). We would shuffle the deck and deal out two piles (one with 41 cards to represent the Milk Chocolate candies and the other with 42 cards to represent the Dark Chocolate candies). From those samples we could record the difference in the sample proportions of green candies: p̂1 - p̂2. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
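The card-shuffling procedure in the answer above can be mimicked in a few lines of Python. This is a minimal sketch: the function name `mm_randomization_stat`, the seed, and the labels "green" and "other" are illustrative choices, while the counts (12 green among 83 candies, piles of 41 and 42) come from the table.

```python
import random


def mm_randomization_stat(rng):
    """Shuffle the 83-card deck, deal 41 and 42 cards, return the difference in proportions."""
    deck = ["green"] * 12 + ["other"] * 71   # 12 green cards, 71 non-green cards
    rng.shuffle(deck)
    milk, dark = deck[:41], deck[41:]        # 41 "milk chocolate" and 42 "dark chocolate" cards
    return milk.count("green") / 41 - dark.count("green") / 42


rng = random.Random(0)                       # arbitrary seed so the run is repeatable
stat = mm_randomization_stat(rng)            # one randomization statistic

observed = 8 / 41 - 4 / 42                   # difference observed in the student's data
print(round(stat, 3), round(observed, 3))
```

Each call produces one value of the difference in sample proportions under the null hypothesis; repeating the call many times builds the randomization distribution.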
6) Use the provided randomization distribution (based on 100 samples) to test if this sample provides evidence that the proportion of candies that are green differs for the two types of M&M's. Include an assessment of the strength of your evidence.
Answer: Parameters: p1 = the proportion of milk chocolate candies that are green and p2 = the proportion of dark chocolate candies that are green Hypotheses: H0: p1 = p2 versus Ha: p1 ≠ p2 The sample difference in proportions is p̂1 - p̂2 = 8/41 - 4/42 = 0.195 - 0.095 = 0.10. In the randomization distribution, 16 of the 100 dots (16/100) are greater than or equal to 0.10. However, since this is a two-sided test, this must be multiplied by 2 to obtain the correct p-value: p-value = 2*0.16 = 0.32. This p-value provides no evidence that the proportion of green candies differs for milk chocolate and dark chocolate M&M's. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.4
7) Use technology and the provided data to test if this sample provides evidence that the proportion of candies that are green differs for the two types of M&M's. Include an assessment of the strength of your evidence. Answer: Parameters: p1 = the proportion of milk chocolate candies that are green and p2 = the proportion of dark chocolate candies that are green Hypotheses: H0: p1 = p2 versus Ha: p1 ≠ p2 The sample difference in proportions is p̂1 - p̂2 = 8/41 - 4/42 = 0.195 - 0.095 = 0.10.
The actual p-value for this test will vary depending upon the student's randomization distribution, but it should be somewhere near 0.13. They should be providing a two-sided p-value. Their conclusion should be consistent with their p-value (and either a formal decision based on a significance level or an informal statement of the strength of their evidence against the null). Most p-values should lead the students to fail to reject the null hypothesis and conclude that this sample does not provide evidence that the proportion of green candies differs for the two types of M&M's. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.3; 4.4.4 Use the following to answer the questions below: A student in an introductory statistics course investigated if there is evidence that the proportion of milk chocolate M&M's that are green differs from the proportion of dark chocolate M&M's that are green. She purchased a bag of each variety, and her data are summarized in the following table.
Milk Chocolate Dark Chocolate Total
Green 8 4 12
Not Green 33 38 71
Total 41 42 83
8) Define the appropriate parameter(s) and state the hypotheses for testing if the proportion of green M&M's differs for milk chocolate and dark chocolate M&M's. Answer: p1 = proportion of milk chocolate M&M's that are green p2 = proportion of dark chocolate M&M's that are green H0: p1 = p2 Ha: p1 ≠ p2 Diff: 2 Type: ES Var: 1 L.O.: 4.1.2
9) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: If the null hypothesis were true, green candies would be equally likely to appear in either bag. Since there are 83 candies, we should construct a deck of 83 cards - 12 of which are green (representing the 12 total green candies she observed) and the rest can be white (representing the "other" colors she observed). We would shuffle the deck and deal out two piles (one with 41 cards to represent the Milk Chocolate candies and the other with 42 cards to represent the Dark Chocolate candies). From those samples we could record the difference in the sample proportions of green candies: p̂1 - p̂2. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
10) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample. Answer: The actual p-value for this test will vary depending upon the student's randomization distribution, but it should be somewhere near 0.13. They should be providing a two-sided p-value. Diff: 2 Type: ES Var: 1 L.O.: 4.4.2;4.4.3 11) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Include an assessment of the strength of your evidence. Answer: Their conclusion should be consistent with their p-value (and either a formal decision based on a significance level or an informal statement of the strength of their evidence against the null). However, most p-values should lead the students to fail to reject the null hypothesis and conclude that this sample does not provide evidence that the proportion of green candies differs for the two types of M&M's. Diff: 2 Type: ES Var: 1 L.O.: 4.3.2;4.3.5
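For questions 10 and 11, "technology" could be any randomization tool. The Python sketch below is one way to do it, assuming the two-sided p-value is obtained by doubling the upper tail, as the answer key does for the 100-sample dotplot. The function names and the seed are illustrative, and the estimate changes from run to run.

```python
import random

GREEN_TOTAL, MILK_N, DARK_N = 12, 41, 42     # counts from the M&M's table


def diff_in_proportions(milk_green):
    """Difference in sample proportions when milk_green green cards land in the milk pile."""
    return milk_green / MILK_N - (GREEN_TOTAL - milk_green) / DARK_N


def randomization_distribution(reps, rng):
    """Shuffle the deck reps times and record the difference in proportions each time."""
    deck = [1] * GREEN_TOTAL + [0] * (MILK_N + DARK_N - GREEN_TOTAL)
    stats = []
    for _ in range(reps):
        rng.shuffle(deck)
        stats.append(diff_in_proportions(sum(deck[:MILK_N])))
    return stats


observed = diff_in_proportions(8)            # 8/41 - 4/42, about 0.10
rng = random.Random(2021)                    # arbitrary seed for repeatability
stats = randomization_distribution(1000, rng)

# Upper-tail proportion, then doubled for the two-sided alternative (capped at 1).
upper_tail = sum(s >= observed for s in stats) / len(stats)
p_value = min(1.0, 2 * upper_tail)
print(round(p_value, 3))
```

Both the observed and simulated statistics come from the same function, so the "greater than or equal to" comparison treats ties with the observed value consistently.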
Use the following to answer the questions below: A student in an introductory statistics course investigated if there is evidence that the proportion of milk chocolate M&M's that are green differs from the proportion of dark chocolate M&M's that are green. She purchased a bag of each variety, and her data are summarized in the following table.
Milk Chocolate Dark Chocolate Total
Green 8 4 12
Not Green 33 38 71
Total 41 42 83
12) Define the appropriate parameter(s) and state the hypotheses for testing if the proportion of green M&M's differs for milk chocolate and dark chocolate M&M's. Answer: p1 = proportion of milk chocolate M&M's that are green p2 = proportion of dark chocolate M&M's that are green H0: p1 = p2 Ha: p1 ≠ p2 Diff: 2 Type: ES Var: 1 L.O.: 4.1.2
13) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: If the null hypothesis were true, green candies would be equally likely to appear in either bag. Since there are 83 candies, we should construct a deck of 83 cards - 12 of which are green (representing the 12 total green candies she observed) and the rest can be white (representing the "other" colors she observed). We would shuffle the deck and deal out two piles (one with 41 cards to represent the Milk Chocolate candies and the other with 42 cards to represent the Dark Chocolate candies). From those samples we could record the difference in the sample proportions of green candies: p̂1 - p̂2. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
14) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample.
Answer: In the randomization distribution, 16 of the 100 dots (16/100) are greater than or equal to 0.10. However, since this is a two-sided test, this must be multiplied by 2 to obtain the correct p-value: p-value = 2*0.16 = 0.32 Diff: 2 Type: ES Var: 1 L.O.: 4.2.2;4.4.2 15) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Include an assessment of the strength of your evidence.
Answer: This p-value provides no evidence that the proportion of green candies differs for milk chocolate and dark chocolate M&M's. Diff: 2 Type: ES Var: 1 L.O.: 4.3.2;4.3.5
Use the following to answer the questions below: As of July 8, 2020, the national average price for a gallon of regular unleaded gasoline was $2.18. The prices for a sample of gas stations in the state of Illinois are provided. $2.365
$2.417
$2.437
$2.421
$2.396
$2.444
$2.422
$2.374
$2.422
$2.447
It is of interest to use this sample to compare the average gas price in Illinois to the national average. 16) Describe how you could generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: The student should first notice a couple of things: 1) The sample mean is $2.415. 2) Since the parameter (average gas price in Illinois) is going to be compared to the "national average", the null hypothesis should state that μ = 2.18. The original sample needs to be consistent with the null hypothesis, so 2.415 - 2.18 = 0.235 should be subtracted from every observation in the sample, making the sample mean now 2.18 (i.e., preserving the general structure of the original sample but making it consistent with the null hypothesis). We could write each new sample value on an index card, shuffle the cards, select one, record that value, replace that card in the deck, and repeat (sampling with replacement) until we have a new sample of size 10 that is consistent with the null hypothesis (yet still preserves the structure of the original sample). We would then record the sample mean, x̄, from this sample. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
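The shift-then-resample procedure described in the answer above can be sketched in Python as follows. The variable names and the seed are illustrative; the ten prices and the national average of $2.18 come from the problem statement.

```python
import random
from statistics import mean

prices = [2.365, 2.417, 2.437, 2.421, 2.396, 2.444, 2.422, 2.374, 2.422, 2.447]
null_mean = 2.18                              # national average stated in the problem

# Shift every price so the sample mean equals the null value (about 0.235 lower).
shift = mean(prices) - null_mean
shifted = [x - shift for x in prices]

# One randomization sample: draw 10 shifted prices with replacement.
rng = random.Random(1)                        # arbitrary seed for repeatability
sample = rng.choices(shifted, k=len(shifted))
xbar = mean(sample)                           # the statistic recorded for this sample
print(round(mean(shifted), 3), round(xbar, 3))
```

Subtracting the same amount from every price keeps the spread of the original sample while moving its center to 2.18, which is what makes the resampled means consistent with the null hypothesis.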
17) Use the provided randomization distribution (based on 1,000 samples) to test if this sample provides evidence that the average gas price in Illinois exceeds the national average. Include an assessment of the strength of your evidence.
Answer: Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ > 2.18 The sample statistic based on the original sample is x̄ = 2.415. None of the dots in the dotplot of the randomization distribution exceed (or are equal to) the sample mean, so the p-value for this test is 0. This p-value provides very strong evidence that the average gas price in Illinois is greater than 2.18 (the national average). Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.4
18) Use technology and the provided data to test if this sample provides evidence that the average gas price in Illinois exceeds the national average. Include an assessment of the strength of your evidence. Answer: Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ > 2.18 The sample statistic based on the original sample is x̄ = 2.415. The actual p-value will vary for the students, but they should be counting the number of dots that correspond to sample statistics greater than or equal to the observed 2.415 (which will likely be zero or very small). Their conclusion should be consistent with their p-value (and they will likely find very strong evidence that the average gas price in Illinois exceeds the national average). Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.3; 4.4.4 Use the following to answer the questions below: As of July 8, 2020, the national average price for a gallon of regular unleaded gasoline was $2.18. The prices for a sample of n = 10 gas stations in the state of Illinois are provided. $2.365
$2.417
$2.437
$2.421
$2.396
$2.444
$2.422
$2.374
$2.422
$2.447
It is of interest to use this sample to compare the average gas price in Illinois to the national average. 19) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that the average gas price in Illinois exceeds the national average. A) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ > 2.18 B) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ ≠ 2.18 C) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ > 2.18 versus Ha: μ = 2.18 D) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ ≠ 2.18 versus Ha: μ = 2.18 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
20) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: The student should first notice a couple of things: 1) The sample mean is $2.415. 2) Since the parameter (average gas price in Illinois) is going to be compared to the "national average," the null hypothesis should state that μ = 2.18. The original sample needs to be consistent with the null hypothesis, so 2.415 - 2.18 = 0.235 should be subtracted from every observation in the sample, making the sample mean now 2.18 (i.e., preserving the general structure of the original sample but making it consistent with the null hypothesis). We could write each new sample value on an index card, shuffle the cards, select one, record that value, replace that card in the deck, and repeat (sampling with replacement) until we have a new sample of size 10 that is consistent with the null hypothesis (yet still preserves the structure of the original sample). We would then record the sample mean, x̄, from this sample. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1 21) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample. Answer: The actual p-value will vary for the students, but they should be counting the number of dots that correspond to sample statistics greater than or equal to the observed 2.415 (which will likely be zero or very small). Diff: 2 Type: ES Var: 1 L.O.: 4.4.3 22) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Include an assessment of the strength of your evidence.
Answer: Their conclusion should be consistent with their p-value (and they will likely find very strong evidence that the average gas price in Illinois exceeds the national average). Diff: 2 Type: ES Var: 1 L.O.: 4.3.2;4.3.5
Use the following to answer the questions below: As of July 8, 2020, the national average price for a gallon of regular unleaded gasoline was $2.18. The prices for a sample of gas stations in the state of Illinois are provided. $2.365
$2.417
$2.437
$2.421
$2.396
$2.444
$2.422
$2.374
$2.422
$2.447
It is of interest to use this sample to compare the average gas price in Illinois to the national average. 23) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that the average gas price in Illinois exceeds the national average. A) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ > 2.18 B) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ = 2.18 versus Ha: μ ≠ 2.18 C) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ > 2.18 versus Ha: μ = 2.18 D) Parameter: μ = average gas price in Illinois Hypotheses: H0: μ ≠ 2.18 versus Ha: μ = 2.18 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
24) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: The student should first notice a couple of things: 1) The sample mean is $2.415. 2) Since the parameter (average gas price in Illinois) is going to be compared to the "national average", the null hypothesis should state that μ = 2.18. The original sample needs to be consistent with the null hypothesis, so 2.415 - 2.18 = 0.235 should be subtracted from every observation in the sample, making the sample mean now 2.18 (i.e., preserving the general structure of the original sample but making it consistent with the null hypothesis). We could write each new sample value on an index card, shuffle the cards, select one, record that value, replace that card in the deck, and repeat (sampling with replacement) until we have a new sample of size 10 that is consistent with the null hypothesis (yet still preserves the structure of the original sample). We would then record the sample mean, x̄, from this sample. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
25) Use the provided randomization distribution (based on 1,000 samples) to estimate the p-value for this sample.
Answer: 0 Explanation: The sample statistic based on the original sample is x̄ = 2.415. None of the dots in the dotplot of the randomization distribution exceed (or are equal to) the sample mean, so the p-value for this test is 0. Diff: 2 Type: SA Var: 1 L.O.: 4.2.2
26) Use the provided randomization distribution (based on 1,000 samples) to estimate the p-value for this sample. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Include an assessment of the strength of your evidence.
Answer: This p-value provides very strong evidence that the average gas price in Illinois is greater than 2.18 (the national average). Diff: 2 Type: ES Var: 1 L.O.: 4.3.2;4.3.5
27) The provided histogram displays the prices (in thousands of dollars) of 25 homes sold in 2019 in a Midwestern city.
In general, this shape, right skewed with some unusually high values, is common for describing home values in many cities. For this reason, the median home value for a city is a useful parameter. This sample of recently sold homes had a median price (value) of $232,500. Someone considering moving to this city is interested in knowing if the median home value is more than $200,000. Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate from the sample. Answer: The current sample first needs to be shifted so that it is consistent with the null hypothesis. Since the median of the original sample is $32,500 more than the hypothesized value, all of the observations in the original sample should be reduced by $32,500. This creates a sample that still has the same general structure as the original sample but is now consistent with the null hypothesis. To generate a single randomization sample, we could write the 25 modified prices (those consistent with the null hypothesis) on index cards, shuffle them, draw one, record the value, replace that card in the deck, and repeat (sampling with replacement) until we've obtained a sample of 25 prices. The median of this sample would be the statistic we calculate. Diff: 3 Type: ES Var: 1 L.O.: 4.4.0
Use the following to answer the questions below: A certain species of tree has an average life span of 130 years. A researcher has noticed a large number of trees of this species washing up along a beach as driftwood. She takes core samples from 27 of those trees to count the number of rings and measure the widths of the rings. Counting the rings allows the researcher to determine the age of each tree. The average age of the trees in the sample is about 120 years. One of her interests is determining if this sample provides evidence that the average age of the driftwood is less than the 130 year life span expected for this type of tree. If the average age is less than 130 years it might suggest that the trees have died from unusual causes, such as invasive beetles or logging. 28) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample. Answer: The observed sample needs to be modified so that it is consistent with the null hypothesis. Since the sample mean is about 120 and the hypothesized mean is 130, this can be done by adding 10 years to each observation in the sample. This creates a sample that is consistent with the general structure of the original sample but is consistent with the null hypothesis. To generate a randomization sample, we would write each of the newly modified years on an index card (there would be 27 of them), shuffle the deck, select an index card at random, record the value, return that card to the deck, and repeat (sampling with replacement) until a sample of 27 trees is obtained. We would calculate the mean of this sample as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
29) Use the provided randomization distribution (based on 100 samples) to determine if this sample provides evidence that the average age of the driftwood along this beach is less than 130 years. Use a 5% significance level to make your conclusion.
Answer: Parameter: μ = average age of driftwood trees along this beach Hypotheses: H0: μ = 130 versus Ha: μ < 130 The observed sample statistic is 120 years. To obtain the p-value we can count the number of dots at or below 120 (note that either 8 or 10 dots should be an acceptable answer). Thus, a p-value of either 0.08 or 0.10 would be acceptable. This p-value would lead us to not reject the null hypothesis and conclude that there is little to no evidence that the average age of driftwood along this beach is less than 130 years old. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.1; 4.3.2; 4.4.4
Use the following to answer the questions below: A certain species of tree has an average life span of 130 years. A researcher has noticed a large number of trees of this species washing up along a beach as driftwood. She takes core samples from 27 of those trees to count the number of rings and measure the widths of the rings. Counting the rings allows the researcher to determine the age of each tree. Her data are displayed in the provided table. One of her interests is determining if this sample provides evidence that the average age of the driftwood is less than the 130 year life span expected for this type of tree. If the average age is less than 130 years it might suggest that the trees have died from unusual causes, such as invasive beetles or logging. 98
79
147
200
130
60
51
127
105
75
120
113
200
81
98
160
165
134
62
152
66
68
190
159
60
124
190
30) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample.
Answer: The observed sample needs to be modified so that it is consistent with the null hypothesis. Since the sample mean is about 119 and the hypothesized mean is 130, this can be done by adding 11 years to each observation in the sample. This creates a sample that is consistent with the general structure of the original sample but is consistent with the null hypothesis. To generate a randomization sample, we would write each of the newly modified years on an index card (there would be 27 of them), shuffle the deck, select an index card at random, record the value, return that card to the deck, and repeat (sampling with replacement) until a sample of 27 trees is obtained. We would calculate the mean of this sample as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
31) Use technology and the provided data to determine if this sample provides evidence that the average age of the driftwood along this beach is less than 130 years. Use a 5% significance level to make your conclusion.
Answer: Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ = 130 versus Ha: μ < 130. The observed sample statistic is 119.037 years. The p-value is 0.114. Because 0.114 > 0.05, we would not reject the null hypothesis and thus we have no evidence to conclude that the average age of the driftwood on the beach is less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.5; 4.3.1; 4.3.2; 4.4.3; 4.4.4
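The shift-and-resample procedure described in the answers above can be sketched in Python. This is only an illustration, not the software the test bank has in mind: the function name, the number of repetitions, and the seed are assumptions, and the shift used is the exact difference 130 minus the sample mean (about 10.96) rather than the rounded 11. The estimated p-value changes with the seed, so it will only be close to the 0.114 quoted in the answer.

```python
import random

# Ages (in years) of the 27 sampled driftwood trees, from the provided table.
ages = [98, 79, 147, 200, 130, 60, 51, 127, 105, 75, 120, 113, 200, 81,
        98, 160, 165, 134, 62, 152, 66, 68, 190, 159, 60, 124, 190]

NULL_MEAN = 130  # hypothesized mean under H0


def single_mean_p_value(data, null_mean, reps=10_000, seed=1):
    """Lower-tail randomization p-value for H0: mu = null_mean vs Ha: mu < null_mean.

    The name, reps, and seed are illustrative choices, not part of the source.
    """
    rng = random.Random(seed)
    n = len(data)
    observed = sum(data) / n
    # Shift every value so the shifted sample has mean exactly null_mean.
    shift = null_mean - observed
    shifted = [x + shift for x in data]
    at_or_below = 0
    for _ in range(reps):
        # Draw n "index cards" with replacement from the shifted sample.
        resample = rng.choices(shifted, k=n)
        if sum(resample) / n <= observed:
            at_or_below += 1
    return observed, at_or_below / reps


observed_mean, p_value = single_mean_p_value(ages, NULL_MEAN)
print(f"observed mean = {observed_mean:.3f}, estimated p-value = {p_value:.3f}")
```

Comparing the printed p-value with 0.05 reproduces the decision step in question 31.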
Use the following to answer the questions below: A certain species of tree has an average life span of 130 years. A researcher has noticed a large number of trees of this species washing up along a beach as driftwood. She takes core samples from 27 of those trees to count the number of rings and measure the widths of the rings. Counting the rings allows the researcher to determine the age of each tree. The average age of the trees in the sample is approximately 120 years. One of her interests is determining if this sample provides evidence that the average age of the driftwood is less than the 130 year life span expected for this type of tree. If the average age is less than 130 years it might suggest that the trees have died from unusual causes, such as invasive beetles or logging.
32) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that the average age of the driftwood along this beach is less than 130 years.
A) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ = 130 versus Ha: μ < 130
B) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ = 130 versus Ha: μ ≠ 130
C) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ ≠ 130 versus Ha: μ = 130
D) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ > 130 versus Ha: μ = 130
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
33) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample. Answer: The observed sample needs to be modified so that it is consistent with the null hypothesis. Since the sample mean is 120 and the hypothesized mean is 130, this can be done by adding 10 years to each observation in the sample. This creates a sample that is consistent with the general structure of the original sample but is consistent with the null hypothesis. To generate a randomization sample, we would write each of the newly modified years on an index card (there would be 27 of them), shuffle the deck, select an index card at random, record the value, return that card to the deck, and repeat (sampling with replacement) until a sample of 27 trees is obtained. We would calculate the mean of this sample as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
34) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample.
Answer: To obtain the p-value we can count the number of dots at or below 120 (note that either 8 or 10 dots should be an acceptable answer). Thus, a p-value of either 0.08 or 0.10 would be acceptable. Diff: 2 Type: ES Var: 1 L.O.: 4.2.2 35) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. Use your p-value and a 5% significance level to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
Answer: Because the p-value > 0.05, we would not reject the null hypothesis and thus we have no evidence to conclude that the average age of the driftwood on the beach is less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1;4.3.2
36) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. What conclusion would you make at the 10% significance level?
Answer: With a p-value around 0.08-0.10, we would reject the null hypothesis and conclude that there is some evidence that the average age of driftwood logs along this beach is less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1
Use the following to answer the questions below: A certain species of tree has an average life span of 130 years. A researcher has noticed a large number of trees of this species washing up along a beach as driftwood. She takes core samples from 27 of those trees to count the number of rings and measure the widths of the rings. Counting the rings allows the researcher to determine the age of each tree. Her data are displayed in the provided table. One of her interests is determining if this sample provides evidence that the average age of the driftwood is less than the 130 year life span expected for this type of tree. If the average age is less than 130 years it might suggest that the trees have died from unusual causes, such as invasive beetles or logging.
Age (years): 98, 79, 147, 200, 130, 60, 51, 127, 105, 75, 120, 113, 200, 81, 98, 160, 165, 134, 62, 152, 66, 68, 124, 190, 159, 60, 190
37) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that the average age of the driftwood along this beach is less than 130 years.
A) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ = 130 versus Ha: μ < 130
B) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ = 130 versus Ha: μ > 130
C) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ < 130 versus Ha: μ = 130
D) Parameter: μ = average age of driftwood trees along this beach. Hypotheses: H0: μ > 130 versus Ha: μ = 130
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
38) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample.
Answer: The observed sample needs to be modified so that it is consistent with the null hypothesis. Since the sample mean is about 119 and the hypothesized mean is 130, this can be done by adding 11 years to each observation in the sample. This creates a sample that is consistent with the general structure of the original sample but is consistent with the null hypothesis. To generate a randomization sample, we would write each of the newly modified years on an index card (there would be 27 of them), shuffle the deck, select an index card at random, record the value, return that card to the deck, and repeat (sampling with replacement) until a sample of 27 trees is obtained. We would calculate the mean of this sample, x̄, as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
39) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample.
Answer: The p-value is 0.114. (Answers may vary.) Diff: 2 Type: ES Var: 1 L.O.: 4.4.3
40) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value and a 5% significance level to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Answer: Because 0.114 > 0.05, we would not reject the null hypothesis, and thus we have no evidence to conclude that the average age of the driftwood on the beach is less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1;4.3.2 41) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. What conclusion would you make at the 10% significance level? Answer: Because 0.114 > 0.1, there is still no evidence that the average age of the driftwood along the beach is less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1 Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. Their responses, broken down by gender, are summarized in the provided table. Assume that the students enrolled in the statistics class are representative of all students at the university. Male
3
1
12
12
0
4
10
4
5
5
2
Female
10
3
2
10
3
2
0
1
6
1
5
10
10
x̄_M = 6, x̄_F = 4
42) Does this sample provide evidence that, on average, male students watch more television than female students at this university? Describe how you could generate a single randomization sample in this situation, and identify the statistic that you would calculate for each sample.
Answer: The null hypothesis in this situation would be that the two groups have the same mean. To generate a sample consistent with the null hypothesis, we would write each student's response on an index card, shuffle the cards, deal the cards into two piles (one with 11 cards for the "females" and the other with 13 for the "males"), and calculate the difference in sample means as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
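The card-shuffling procedure in the answer above can be sketched in Python as follows. The extracted table lists 11 values under Male and 13 under Female, while the answer deals 13 cards to the males and 11 to the females, so this sketch pools all 24 reported values and takes the group sizes and the observed difference of 6 - 4 = 2 from the answer text instead of recomputing them. The function name, repetition count, and seed are illustrative assumptions.

```python
import random

# All 24 reported weekly TV hours, pooled (under H0 the group labels do not matter).
hours = [3, 1, 12, 12, 0, 4, 10, 4, 5, 5, 2,
         10, 3, 2, 10, 3, 2, 0, 1, 6, 1, 5, 10, 10]

N_MALE = 13           # pile sizes quoted in the answer
N_FEMALE = 11
OBSERVED_DIFF = 2.0   # reported difference in sample means, 6 - 4


def diff_in_means_p_value(values, n_first, observed_diff, reps=10_000, seed=1):
    """Upper-tail randomization p-value for a difference in two means.

    The name, reps, and seed are illustrative choices, not part of the source.
    """
    rng = random.Random(seed)
    pool = list(values)
    at_or_above = 0
    for _ in range(reps):
        # Shuffle the "index cards" and deal them into two piles.
        rng.shuffle(pool)
        first, second = pool[:n_first], pool[n_first:]
        diff = sum(first) / len(first) - sum(second) / len(second)
        if diff >= observed_diff:
            at_or_above += 1
    return at_or_above / reps


p_value = diff_in_means_p_value(hours, N_MALE, OBSERVED_DIFF)
print(f"estimated p-value = {p_value:.3f}")
```

The estimate depends on the seed; it should land in the neighborhood of the "near 0.10" values the answers describe.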
43) Use technology to determine if this sample provides evidence that, on average, male students watch more television than female students at this university. Include an assessment of the strength of the evidence.
Answer: Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M = μ_F versus Ha: μ_M > μ_F. The observed difference in sample means was x̄_M - x̄_F = 6 - 4 = 2. P-values will vary, though they should be near 0.10. Conclusions will vary but should convey that this sample provides little/no evidence that male students watch more television each week than female students at this university. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.3; 4.4.4
44) Use the provided randomization distribution (based on 100 samples) to determine if this sample provides evidence that, on average, male students watch more television than female students at this university. Include an assessment of the strength of the evidence.
Answer: Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M = μ_F versus Ha: μ_M > μ_F. The observed difference in sample means was x̄_M - x̄_F = 6 - 4 = 2. There are 10 (12 is also acceptable) dots greater than or equal to the observed difference of 2. The p-value is 10/100 = 0.10 (or 12/100 = 0.12). This sample provides no evidence that male students watch more television each week than female students at this university. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.4
Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. Their responses, broken down by gender, are summarized in the provided table. Assume that the students enrolled in the statistics class are representative of all students at the university. Male
3
1
12
12
0
4
10
4
5
5
2
Female
10
3
2
10
3
2
0
1
6
1
5
10
10
x̄_M = 6, x̄_F = 4
45) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that, on average, male students watch more television than female students at this university.
A) Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M = μ_F versus Ha: μ_M > μ_F
B) Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M < μ_F versus Ha: μ_M = μ_F
C) Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M = μ_F versus Ha: μ_M < μ_F
D) Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M > μ_F versus Ha: μ_M = μ_F
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
46) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample.
Answer: The null hypothesis in this situation would be that the two groups have the same mean. To generate a sample consistent with the null hypothesis, we would write each student's response on an index card, shuffle the cards, deal the cards into two piles (one with 11 cards for the "females" and the other with 13 for the "males"), and calculate the difference in sample means, x̄_M - x̄_F, as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
47) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample.
Answer: The observed difference in sample means was 6 - 4 = 2. There are 10 (12 is also acceptable) dots greater than or equal to the observed difference of 2. The p-value is 10/100 = 0.10 (or 12/100 = 0.12). Diff: 2 Type: ES Var: 1 L.O.: 4.2.2
48) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
A) This sample provides no evidence that male students watch more television each week than female students at this university. B) This sample provides strong evidence that male students watch more television each week than female students at this university. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.3.2;4.3.5
Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. Their responses, broken down by gender, are summarized in the provided table. Assume that the students enrolled in the statistics class are representative of all students at the university. Male
3
1
12
12
0
4
10
4
5
5
2
Female
10
3
2
10
3
2
0
1
6
1
5
10
10
x̄_M = 6, x̄_F = 4
49) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that, on average, male students watch more television than female students at this university.
Answer: Parameters: μ_M = mean number of hours of television per week for male students and μ_F = mean number of hours of television per week for female students. Hypotheses: H0: μ_M = μ_F versus Ha: μ_M > μ_F. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2
50) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample.
Answer: The null hypothesis in this situation would be that the two groups have the same mean. To generate a sample consistent with the null hypothesis, we would write each student's response on an index card, shuffle the cards, deal the cards into two piles (one with 11 cards for the "females" and the other with 13 for the "males"), and calculate the difference in sample means, x̄_M - x̄_F, as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
51) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample.
Answer: The observed difference in sample means was x̄_M - x̄_F = 6 - 4 = 2. P-values will vary, though they should be near 0.10. Diff: 2 Type: ES Var: 1 L.O.: 4.4.3
52) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
A) This sample provides no evidence that male students watch more television each week than female students at this university.
B) This sample provides strong evidence that male students watch more television each week than female students at this university.
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.3.2;4.3.5
Use the following to answer the questions below: The owner of a small pet supply store wants to open a second store in another city, but he only wants to do so if more than one-third of the city's households have pets (otherwise there won't be enough business). He samples 150 of the households and finds that 64 have pets.
53) Describe how you could generate a single randomization sample in this situation, and identify the statistic that you would calculate for each sample.
Answer: The hypothesized value for the implied null hypothesis would be 1/3. Thus, the key to generating a sample that is consistent with this hypothesis is to randomly generate one of three equally likely outcomes. One way to do this would be to select three cards from a deck of cards, say a 2, 3, and 4. Let the "2" represent the outcome "has pets". Shuffle the cards, select one, record the outcome (has pets or not), return the card to the deck, and repeat for a total of 150 observations. The statistic for each sample is the sample proportion of "has pets" outcomes. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
54) Use technology to determine if this sample provides evidence that more than one-third of households in this city own pets. Use a 5% significance level.
Answer: Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p > 1/3. The observed sample statistic is p̂ = 64/150 ≈ 0.427. P-values will vary, though they will be approximately 0.004. Conclusions should be consistent with the p-value and should convey that the sample provides strong evidence to reject the null hypothesis (since the p-value is less than 0.05) and conclude that more than 1/3 of the households in this city have pets. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.1; 4.3.2; 4.4.3; 4.4.4
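The three-card procedure for the pet store question amounts to simulating 150 households that each have pets with probability 1/3. A minimal Python sketch follows; the function name, repetition count, and seed are assumptions made for illustration, and the estimate it prints will vary with the seed. Counts are compared instead of proportions to avoid floating-point ties.

```python
import random

N_HOUSEHOLDS = 150
OBSERVED_COUNT = 64   # households with pets in the sample
NULL_P = 1 / 3        # proportion under H0


def proportion_p_value(n, observed_count, null_p, reps=10_000, seed=1):
    """Upper-tail randomization p-value for H0: p = null_p vs Ha: p > null_p.

    The name, reps, and seed are illustrative choices, not part of the source.
    """
    rng = random.Random(seed)
    at_or_above = 0
    for _ in range(reps):
        # One randomization sample: n draws, each "has pets" with chance null_p.
        successes = sum(1 for _ in range(n) if rng.random() < null_p)
        if successes >= observed_count:
            at_or_above += 1
    return at_or_above / reps


p_hat = OBSERVED_COUNT / N_HOUSEHOLDS
p_value = proportion_p_value(N_HOUSEHOLDS, OBSERVED_COUNT, NULL_P)
print(f"p-hat = {p_hat:.3f}, estimated p-value = {p_value:.4f}")
```

A p-value well below 0.05 from this simulation supports the conclusion in the answer that the null hypothesis is rejected.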
55) Use the provided randomization distribution (based on 100 samples) to determine if this sample provides evidence that more than one-third of households in this city own pets. Use a 5% significance level.
Answer: Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p > 1/3. The observed sample statistic is p̂ = 64/150 ≈ 0.427 (so the cut-off for counting is halfway between 0.40 and 0.45). There is only one dot at a value greater than or equal to 0.427. Thus, the p-value is 0.01. This sample provides strong evidence (since the p-value of 0.01 is less than 0.05) to reject the null hypothesis and conclude that more than 1/3 of households in this city own pets. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.1; 4.3.2; 4.4.4
Use the following to answer the questions below: The owner of a small pet supply store wants to open a second store in another city, but he only wants to do so if more than one-third of the city's households have pets (otherwise there won't be enough business). He samples 150 of the households and finds that 64 have pets.
56) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that more than one-third of households in this city own pets.
A) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p > 1/3
B) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p < 1/3 versus Ha: p ≥ 1/3
C) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p ≠ 1/3
D) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p ≠ 1/3 versus Ha: p = 1/3
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
57) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample.
Answer: The hypothesized value for the implied null hypothesis would be 1/3. Thus, the key to generating a sample that is consistent with this hypothesis is to randomly generate one of three equally likely outcomes. One way to do this would be to select three cards from a deck of cards, say a 2, 3, and 4. Let the "2" represent the outcome "has pets". Shuffle the cards, select one, record the outcome (has pets or not), return the card to the deck, and repeat for a total of 150 observations. The statistic recorded for each sample is the sample proportion p̂ of "has pets" outcomes. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
58) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample.
Answer: The observed sample statistic is p̂ = 64/150 ≈ 0.427 (so the cut-off for counting is halfway between 0.40 and 0.45). There is only one dot at a value greater than or equal to 0.427. Thus, the p-value is 0.01. This sample provides strong evidence that more than 1/3 of households in this city own pets. Diff: 2 Type: ES Var: 1 L.O.: 4.2.2
59) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. Use your p-value and a 5% significance level to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
A) This sample provides strong evidence to reject the null hypothesis and conclude that more than 1/3 of households in this city own pets.
B) This sample does not provide enough evidence to reject the null hypothesis and we cannot conclude that more than 1/3 of households in this city own pets.
Answer: A Explanation: This sample provides strong evidence (since the p-value is less than 0.05) to reject the null hypothesis and conclude that more than 1/3 of households in this city own pets. Diff: 2 Type: BI Var: 1 L.O.: 4.3.1;4.3.2
Use the following to answer the questions below: The owner of a small pet supply store wants to open a second store in another city, but he only wants to do so if more than one-third of the city's households have pets (otherwise there won't be enough business). He samples 150 of the households and finds that 64 have pets.
60) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that more than one-third of households in this city own pets.
A) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p > 1/3
B) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p < 1/3 versus Ha: p ≥ 1/3
C) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p = 1/3 versus Ha: p ≠ 1/3
D) Parameter: p = proportion of households in this city with pets. Hypotheses: H0: p ≠ 1/3 versus Ha: p = 1/3
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
61) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample.
Answer: The hypothesized value for the implied null hypothesis would be 1/3. Thus, the key to generating a sample that is consistent with this hypothesis is to randomly generate one of three equally likely outcomes. One way to do this would be to select three cards from a deck of cards, say a 2, 3, and 4. Let the "2" represent the outcome "has pets." Shuffle the cards, select one, record the outcome (has pets or not), return the card to the deck, and repeat for a total of 150 observations. The statistic recorded for each sample is the sample proportion p̂ of "has pets" outcomes. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
62) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample.
Answer: The observed sample statistic is p̂ = 64/150 ≈ 0.427. P-values will vary, though they will be approximately 0.004. Diff: 2 Type: ES Var: 1 L.O.: 4.4.3
63) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value and a 5% significance level to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
Answer: Conclusions should be consistent with the p-value and should convey that the sample provides strong evidence to reject the null hypothesis (since the p-value is less than 0.05) and conclude that more than 1/3 of the households in this city have pets. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1;4.3.2
Use the following to answer the questions below: A Division III college men's basketball team is interested in identifying factors that impact the outcomes of their games. They plan to use "point spread" (their score minus their opponent's score) to quantify the outcome of each game this season; positive values indicate games that they won while negative values indicate games they lost. They want to determine if "steal differential" (the number of steals they have in the game minus the number of steals their opponent had) is related to point spread; positive values indicate games where they had more steals than their opponent. The data for the first five games are in the provided table as an example.
Point Spread (y) 4 2 -21 -4 -9
Steal Differential (x) 7 -2 -2 1 2
The correlation between point spread and steal differential for the games they played this season is about r = 0.35. Assuming that this season was a typical season for the team, they want to know if steal differential is positively correlated with point spread.
64) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample.
Answer: The null hypothesis is that the correlation between point spread and steal differential is 0. One way to generate a sample that is consistent with this value is to write the 25 steal differentials on index cards, shuffle them, and deal them out (one to each of the observed point spreads). The sample correlation r for the 25 new pairs should be calculated. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
65) Use the provided randomization distribution (based on 100 samples) to determine if this sample provides evidence that point spread and steal differential are positively correlated. Use a 10% significance level to make your conclusion.
Answer: Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ > 0. The sample statistic is r = 0.35. There are 6 dots corresponding to values greater than or equal to 0.35 (0.35 is located halfway between 0.30 and 0.40). Thus, the p-value is 6/100 = 0.06. The p-value is less than α = 0.10 (the significance level), thus this season provides some evidence that point spread and steal differential are positively correlated. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.1; 4.3.2; 4.4.4
Use the following to answer the questions below: A Division III college men's basketball team is interested in identifying factors that impact the outcomes of their games. They plan to use "point spread" (their score minus their opponent's score) to quantify the outcome of each game this season; positive values indicate games that they won while negative values indicate games they lost. They want to determine if "steal differential" (the number of steals they have in the game minus the number of steals their opponent had) is related to point spread; positive values indicate games where they had more steals than their opponent. The data for the first five games are in the provided table as an example. Point Spread (y) 4 2 -21 -4 -9
Steal Differential (x) 7 -2 -2 1 2
The correlation between point spread and steal differential for the games they played this season is about r = 0.35. Assuming that this season was a typical season for the team, they want to test if this sample provides evidence that steal differential is positively correlated with point spread.
66) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that steal differential is positively correlated with point spread.
A) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ > 0
B) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ ≠ 0
C) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ < 0
D) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ < 0 versus Ha: ρ = 0
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
67) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample.
Answer: The null hypothesis is that the correlation between point spread and steal differential is 0. One way to generate a sample that is consistent with this value is to write the 25 steal differentials on index cards, shuffle them, and deal them out (one to each of the observed point spreads). The sample correlation r for the 25 new pairs should be calculated. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1
68) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample.
A) The p-value is 0.06
B) The p-value is 0.35
C) The p-value is 0.12
D) The p-value is 0.14
Answer: A Explanation: The sample statistic is r = 0.35. There are 6 dots corresponding to values greater than or equal to 0.35 (0.35 is located halfway between 0.30 and 0.40). Thus, the p-value is 6/100 = 0.06. Diff: 2 Type: BI Var: 1 L.O.: 4.2.2
69) Use the provided randomization distribution (based on 100 samples) to estimate the p-value for this sample. Use the p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem.
Answer: The p-value is less than α = 0.10 (the significance level), thus this season provides some evidence that point spread and steal differential are positively correlated. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1;4.3.2
Use the following to answer the questions below: A Division III college men's basketball team is interested in identifying factors that impact the outcomes of their games. They plan to use "point spread" (their score minus their opponent's score) to quantify the outcome of each game this season; positive values indicate games that they won while negative values indicate games they lost. They want to determine if "steal differential" (the number of steals they have in the game minus the number of steals their opponent had) is related to point spread; positive values indicate games where they had more steals than their opponent. The data for the games they played this season are displayed in the provided table.
Point Spread (y) 4 2 -21 -4 -9 7 -7 15 7 -13 -11 -17 31
Steal Differential (x) 7 -2 -2 1 2 -5 1 -2 -1 -2 -6 -3 7
Point Spread (y) 18 2 -6 7 13 3 3 10 20 -1 -1 -11
Steal Differential (x) -2 5 -6 4 -1 -3 1 -2 0 1 -3 -2
Assuming that this season was a typical season for the team, they want to know if steal differential is positively correlated with point spread.
70) Describe how you would generate a single randomization sample in this situation, and identify the statistic you would calculate for each sample.
Answer: The null hypothesis is that the correlation between point spread and steal differential is 0. One way to generate a sample that is consistent with this value is to write the 25 steal differentials on index cards, shuffle them, and deal them out (one to each of the observed point spreads). The sample correlation r for the 25 new pairs should be calculated. Diff: 2 Type: ES Var: 1 L.O.: 4.4.2
71) Use technology to determine if this sample provides evidence that point spread and steal differential are positively correlated. Be sure to include all of the steps of the test. Use a 10% significance level to make your conclusion. Answer: Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ > 0. The p-value is about 0.038. (Answers may vary.) Because the p-value is less than the significance level, we would reject the null hypothesis and we have evidence that point spread and steal differential are positively correlated. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.3.1; 4.3.2; 4.4.3; 4.4.4
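The shuffling procedure described in the answers above can be sketched in Python. This is only an illustration, not the software the test bank has in mind: the function names, the seed, and the choice of 5,000 shuffles are assumptions, and the estimated p-value will vary slightly from run to run of any randomization tool.

```python
import random

# Data copied from the table above (25 games, paired in order).
point_spread = [4, 2, -21, -4, -9, 7, -7, 15, 7, -13, -11, -17, 31,
                18, 2, -6, 7, 13, 3, 3, 10, 20, -1, -1, -11]
steal_diff = [7, -2, -2, 1, 2, -5, 1, -2, -1, -2, -6, -3, 7,
              -2, 5, -6, 4, -1, -3, 1, -2, 0, 1, -3, -2]


def correlation(x, y):
    """Return the sample correlation r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def randomization_test(x, y, n_samples=5000, seed=1):
    """Shuffle x against y and count shuffled correlations >= the observed r."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    observed = correlation(x, y)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)  # breaks any link between x and y, as H0 requires
        if correlation(shuffled, y) >= observed:
            extreme += 1
    return observed, extreme / n_samples


r_observed, p_value = randomization_test(steal_diff, point_spread)
print(f"r = {r_observed:.3f}, estimated p-value = {p_value:.3f}")
```

The test is right-tailed because the alternative is ρ > 0, so only shuffled correlations at least as large as the observed r are counted.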
Use the following to answer the questions below: A Division III college men's basketball team is interested in identifying factors that impact the outcomes of their games. They plan to use "point spread" (their score minus their opponent's score) to quantify the outcome of each game this season; positive values indicate games that they won while negative values indicate games they lost. They want to determine if "steal differential" (the number of steals they have in the game minus the number of steals their opponent had) is related to point spread; positive values indicate games where they had more steals than their opponent. The data for the games they played this season are displayed in the provided table.
Point Spread (y) 4 2 -21 -4 -9 7 -7 15 7 -13 -11 -17 31
Steal Differential (x) 7 -2 -2 1 2 -5 1 -2 -1 -2 -6 -3 7
Point Spread (y) 18 2 -6 7 13 3 3 10 20 -1 -1 -11
Steal Differential (x) -2 5 -6 4 -1 -3 1 -2 0 1 -3 -2
Assuming that this season was a typical season for the team, they want to know if steal differential is positively correlated with point spread. 72) Define the appropriate parameter(s) and state the hypotheses for testing if this sample provides evidence that steal differential is positively correlated with point spread. A) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ > 0 B) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ < 0 C) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ = 0 versus Ha: ρ ≠ 0 D) Parameter: ρ = correlation between point spread and steal differential for this team. Hypotheses: H0: ρ ≠ 0 versus Ha: ρ = 0 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
73) Describe how you would generate a single randomization sample in this situation, and identify (using the appropriate notation) the sample statistic you would record for each sample. Answer: The null hypothesis is that the correlation between point spread and steal differential is 0. One way to generate a sample that is consistent with this value is to write the 25 steal differentials on index cards, shuffle them, and deal them out (one to each of the observed point spreads). The sample correlation r for the 25 new pairs should be calculated. Diff: 2 Type: ES Var: 1 L.O.: 4.4.2 74) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses. Use your randomization distribution to estimate the p-value for this sample. Answer: The p-value is about 0.038. (Answers may vary.) Diff: 2 Type: ES Var: 1 L.O.: 4.4.3 75) Use technology to create a randomization distribution with at least 1,000 values for testing these hypotheses and estimate the p-value. Use your p-value to make a decision about these hypotheses. Be sure to word your decision in the context of the problem. Answer: Because the p-value is less than the significance level, we would reject the null hypothesis and we have evidence that point spread and steal differential are positively correlated. Diff: 2 Type: ES Var: 1 L.O.: 4.3.1;4.3.5
4.5 Making Connections
1) An article published in the Canadian Journal of Zoology presented a method for estimating the body fat percentage of North American porcupines; the method was illustrated with a sample of porcupines. Based on this sample, a 95% bootstrap confidence interval for the average body fat percentage of porcupines is 17.4% to 25.8%. Which of the following null hypotheses would be rejected based on this confidence interval? A) H0: μ = 19.1% B) H0: μ = 31.0% C) H0: μ = 20.6% D) H0: μ = 24.7% Answer: B Diff: 2 Type: BI Var: 1 L.O.: 4.5.1
2) Suppose that a 95% confidence interval for μ is (54.8, 60.8). Which of the following is most likely the p-value for the test of versus A) 0.031 B) 0.001 C) 0.016 D) 0.231 Answer: D Diff: 3 Type: BI Var: 1 L.O.: 4.5.2
3) Briefly explain the difference between a "bootstrap distribution" and a "randomization distribution". Answer: A bootstrap distribution is created by repeatedly resampling from the original sample (with replacement) to generate many samples that are similar to the original sample (that is, many samples that could have come from the population that the original sample came from). The bootstrap distribution is then used to create confidence intervals for a parameter. A randomization distribution is used in hypothesis testing and is created by forcing the sample to be consistent with the null hypothesis before repeatedly resampling. This generates a distribution of sample outcomes that are reasonable under the null hypothesis. This distribution is then used to estimate a p-value by comparing a statistic from the original sample to the randomization distribution. Diff: 3 Type: ES Var: 1 L.O.: 4.5.0 Use the following to answer the questions below: A study conducted by the National Center for Health Statistics collects data on Vitamin D levels. In 2011-2014, a sample of 3,929 non-Hispanic Blacks showed that 17.5% were Vitamin D deficient. A 95% confidence interval based on the sample is (0.152, 0.200). 4) Define the appropriate parameter and state the appropriate hypotheses for testing the claim that, among African Americans, Vitamin D deficiency occurs at a rate other than 8%. A) Parameter: p = proportion of non-Hispanic Blacks that are Vitamin D deficient Hypotheses: versus B) Parameter: p = proportion of non-Hispanic Blacks that are Vitamin D deficient Hypotheses: versus C) Parameter: p = proportion of non-Hispanic Blacks that are Vitamin D deficient Hypotheses: versus D) Parameter: p = proportion of non-Hispanic Blacks that are Vitamin D deficient Hypotheses: versus Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
5) Does this confidence interval provide evidence that among non-Hispanic Blacks Vitamin D deficiency occurs at a rate other than 8%? What significance level is being used to make this decision? Briefly justify your answer. Answer: Yes. 8% is not included in the confidence interval, which indicates that it is not a plausible value of the parameter (p = proportion of African Americans who are Vitamin D deficient). This provides evidence that Vitamin D deficiency occurs at a rate other than 8% among African Americans. Since this is a 95% confidence interval, this decision is being made at the 5% significance level. Diff: 2 Type: ES Var: 1 L.O.: 4.5.2 6) In a test of the hypotheses H0: μ1 = μ2 versus Ha: μ1 ≠ μ2, the observed sample results in a p-value of 0.0256. Would you expect a 95% confidence interval for μ1 - μ2 based on this sample to contain 0? A) Yes B) No Answer: B Explanation: No, it would not contain 0. The small p-value means that the null hypothesis would be rejected at the 5% significance level and provides evidence that the two means are different. This means the sample suggests that 0 is not a plausible value for the difference in the two means. Since 0 is not a plausible value, it would not be included in the 95% confidence interval. Diff: 3 Type: MC Var: 1 L.O.: 4.5.1 7) In January 2015 Gallup reported the results from a survey of 167,000 U.S. adults from January - June 2014 for the Gallup-Healthways Well-Being Index. Based on self-reported height and weight data, they found that 62.8% of U.S. adults are overweight or obese. A 95% confidence interval for the proportion of U.S. adults that are overweight or obese is (0.626, 0.63). Does this interval support the claim that "two-thirds of Americans are overweight or obese"? A) Yes B) No Answer: B Explanation: No, this claim is not supported by the confidence interval. The interval suggests that percentages between 62.6% and 63% are plausible for the percentage of adults who are overweight or obese. Two-thirds (~66.7%) is not included in this interval, thus it is not a plausible value for the parameter. Diff: 2 Type: MC Var: 1 L.O.: 4.5.2
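The reasoning in questions 5 and 7 above (a two-sided test at level 1 - confidence rejects exactly those null values that fall outside the confidence interval) can be written as a one-line check. The helper name below is hypothetical and the sketch assumes the interval endpoints are given as proportions.

```python
def rejects_null(null_value, interval):
    """True if the null value is outside the interval, i.e. not plausible."""
    lower, upper = interval
    return not (lower <= null_value <= upper)


# Question 5: is 8% a plausible Vitamin D deficiency rate given (0.152, 0.200)?
vitamin_d_rejected = rejects_null(0.08, (0.152, 0.200))

# Question 7: is two-thirds plausible given (0.626, 0.63)?
two_thirds_rejected = rejects_null(2 / 3, (0.626, 0.63))

print(vitamin_d_rejected, two_thirds_rejected)
```

Both checks report that the claimed value lies outside the interval, which matches the answers given: 8% is rejected at the 5% level, and "two-thirds" is not supported.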
Use the following to answer the questions below: The makers of a popular brand of laundry detergent have discovered a new secret ingredient that they believe will boost the cleaning power of their detergent. The new ingredient is expensive, and if they use it, they would have to increase the retail price of the detergent (and they worry that the price increase will cause them to lose customers). However, they believe that if the improved detergent gets clothes drastically cleaner, customers will recognize that it is worth the extra cost. They conduct an experiment to compare the performance of the new and old formulas at removing grass stains, red wine, and chocolate from white t-shirts. Each cleaned shirt was rated on a scale from 1 (stain did not get removed) to 10 (no evidence of the stain) by trained experts. They compared the average rating for the new and old formulas. 8) Briefly explain what a Type II error would mean in this situation. Answer: A Type II error occurs when the null hypothesis is false but is not rejected. In this situation, a false null hypothesis means that the two formulas do not clean equally well. A Type II error would mean that the makers conclude that there is no significant difference in the cleaning ability of the two formulas when there actually is. Diff: 2 Type: ES Var: 1 L.O.: 4.3.3 9) Suppose they find that the average rating for the shirts cleaned with the new formula was 8.2 and the average rating for the shirts cleaned with the old formula was 8.0. Do you think these results are practically significant? Briefly explain. Answer: Answers will vary here, as this is subjective. Answers that provide a clear, reasonable rationale should be considered. Probably not. The difference in the averages is fairly small. Even if the new formula does get clothes cleaner, it is likely not a "drastic" improvement (and it is entirely possible that a customer couldn't tell the difference). If there is a risk of losing customers because of a price increase, it is likely not worth it (and thus not practically significant) since the new formula provides only a small improvement. Diff: 2 Type: ES Var: 1 L.O.: 4.5.3 10) Suppose, at the 5% significance level, they find that the new formula cleaned the shirts significantly better than the old formula, with a p-value of 0.046. Interpret the p-value, in terms of the probability of the results happening by random chance, in this context. Answer: If the two formulas cleaned clothes equally well (i.e., the null hypothesis is true and there is no difference), the chance of observing a sample difference (average rating for new - average rating for old) of this size or larger would be about 4.6%. Diff: 2 Type: ES Var: 1 L.O.: 4.2.1
Use the following to answer the questions below: In May 2012, President Obama made history by revealing his support of gay marriage. Around that time, the Gallup Organization polled 1,024 U.S. adults about their opinions on gay/lesbian relations and gay marriage. They found that 54% of those sampled viewed gay/lesbian relations as "morally acceptable" and that 50% felt that gay marriage should be legal. 11) Does this sample provide evidence that the majority of Americans find gay/lesbian relations "morally acceptable"? Describe how you could generate a single randomization sample in this situation, and identify the statistic that you would calculate for each sample. Answer: "Majority" means more than half. If we are looking for evidence that the majority of Americans find gay/lesbian relations "morally acceptable", the null hypothesis would be that the proportion is 0.5. We need to generate a sample that is consistent with this proportion. In this case, a fair coin could be tossed and the outcomes "heads" could be recorded as "morally acceptable" (and thus tails would be not "morally acceptable"). This could be repeated for a total of 1,024 tosses. The proportion of this randomly generated sample that say "morally acceptable" would be used as the statistic. Diff: 2 Type: ES Var: 1 L.O.: 4.4.1 12) Use technology to determine if this sample provides evidence that the majority of Americans find gay/lesbian relations "morally acceptable". Be sure to state the hypotheses, give the p-value, and clearly state the conclusion in context. Include an assessment of the strength of the evidence. Answer: Parameter: p = proportion of Americans who find gay/lesbian relations "morally acceptable". Hypotheses: H0: p = 0.50 versus Ha: p > 0.50. The observed sample statistic is p̂ = 0.54. The p-value is the proportion of the dots that are greater than or equal to 0.54. Actual answers will vary, but the p-value should be near 0.005. This p-value provides very strong evidence that the majority of Americans find gay/lesbian relations to be "morally acceptable." Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.3; 4.4.4
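The coin-tossing procedure from question 11 can be simulated directly. The sketch below is one possible way to do it, not the tool the question assumes: the function name, the seed, and the 10,000 simulated samples are all assumptions, and each simulated sample is 1,024 fair coin flips drawn as random bits.

```python
import random


def coin_flip_p_value(n=1024, observed=0.54, n_samples=10000, seed=1):
    """Estimate P(sample proportion >= observed) when the true proportion is 0.5."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    extreme = 0
    for _ in range(n_samples):
        # n random bits act as n fair coin flips; 1-bits are counted as "heads".
        heads = bin(rng.getrandbits(n)).count("1")
        if heads / n >= observed:
            extreme += 1
    return extreme / n_samples


p_value = coin_flip_p_value()
print(f"estimated p-value = {p_value:.4f}")
```

Because the alternative is p > 0.50, only simulated proportions at or above 0.54 are counted. The estimate should land near the 0.005 quoted in the answer, with some run-to-run variation.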
13) Use the provided randomization distribution (based on 1,000 samples) to determine if this sample provides evidence that the majority of Americans find gay/lesbian relations "morally acceptable". Be sure to state the hypotheses, give the p-value, and clearly state the conclusion in context. Include an assessment of the strength of the evidence.
Answer: Parameter: p = proportion of Americans who find gay/lesbian relations "morally acceptable". Hypotheses: H0: p = 0.50 versus Ha: p > 0.50. The observed sample statistic is p̂ = 0.54. The p-value is the proportion of the 1,000 dots that are greater than or equal to 0.54 (5 or 7 should be accepted), thus either 0.005 or 0.007 should be accepted as the p-value for this test. This p-value provides very strong evidence that the majority of Americans find gay/lesbian relations to be "morally acceptable." Note that this conclusion is based on the interpretation of the p-value as the strength of the evidence against the null hypothesis, but the same conclusion would be reached at any reasonable significance level. Diff: 2 Type: ES Var: 1 L.O.: 4.1.2; 4.2.2; 4.2.5; 4.3.2; 4.3.5; 4.4.4
14) Define the appropriate parameter and state the hypotheses for testing if this sample provides evidence that the proportion of American adults who support gay marriage differs from 50%. A) Parameter: p = proportion of American adults who support gay marriage Hypotheses: H0: p = 0.50 versus Ha: p ≠ 0.50 B) Parameter: p = proportion of American adults who support gay marriage Hypotheses: H0: p ≠ 0.50 versus Ha: p = 0.50 C) Parameter: p = proportion of American adults who support gay marriage Hypotheses: H0: p = 0.50 versus Ha: p > 0.50 D) Parameter: p = proportion of American adults who support gay marriage Hypotheses: H0: p = 0.50 versus Ha: p < 0.50 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 4.1.2
15) A 90% confidence interval for the proportion of American adults who support gay marriage is (0.475, 0.524). Does this confidence interval provide evidence that the percentage of American adults who support gay marriage differs from 50%? State the significance level you are using. A) No; significance level of 10% B) No; significance level of 5% C) Yes; significance level of 10% D) Yes; significance level of 5% Answer: A Explanation: No, this sample does not provide evidence that the percentage of American adults who support gay marriage differs from 50% because 0.50 is included in the confidence interval. This means that 0.50 (or 50%) is a plausible value for the parameter of American adults who support gay marriage; that is, this sample statistic is consistent with that parameter value. A significance level of 10% is being used because this is a 90% confidence interval. Diff: 2 Type: BI Var: 1 L.O.: 4.5.2
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 5 Approximating with a Distribution
5.1 Hypothesis Tests Using Normal Distributions
Use the following to answer the questions below: Select the answer closest to the specified areas for a N(0, 1) density. Round to three decimal places. 1) The area to the left of z = 0.63. A) 0.736 B) 0.264 C) 0.525 D) 0.041 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 5.1.3
2) The area to the right of z = -0.47. A) 0.341 B) 0.770 C) 0.681 D) 0.319 Answer: C Diff: 1 Type: BI Var: 1 L.O.: 5.1.3 3) The area between z = 0.51 and z = 2.79. A) 0.695 B) 0.302 C) 0.692 D) 0.997 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.1.3 4) The area outside of the interval z = -2.13 and z = 1.11. A) 0.133 B) 0.017 C) 0.850 D) 0.150 Answer: D Diff: 3 Type: BI Var: 1 L.O.: 5.1.3
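The four areas in questions 1-4 can be checked with the `NormalDist` class from Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, N(0, 1)

# Question 1: area to the left of z = 0.63.
left_area = std_normal.cdf(0.63)

# Question 2: area to the right of z = -0.47.
right_area = 1 - std_normal.cdf(-0.47)

# Question 3: area between z = 0.51 and z = 2.79.
between_area = std_normal.cdf(2.79) - std_normal.cdf(0.51)

# Question 4: area outside the interval from z = -2.13 to z = 1.11 (both tails).
outside_area = std_normal.cdf(-2.13) + (1 - std_normal.cdf(1.11))

for name, area in [("left", left_area), ("right", right_area),
                   ("between", between_area), ("outside", outside_area)]:
    print(f"{name}: {area:.3f}")
```

Each area is built from the cumulative distribution function: a right-tail area is one minus the left-tail area, and a "between" area is the difference of two left-tail areas.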
Use the following to answer the questions below: Find the endpoint(s) on a N(0, 1) density with the given property. Round to three decimal places. 5) The area to the left of the endpoint is about 0.20. A) -2.054 B) 0.842 C) -0.842 D) 2.054 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 6) The area to the right of the endpoint is about 0.85. A) -1.036 B) 1.036 C) -0.842 D) 1.375 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 7) The area between ±z is about 0.88. A) 1.175 B) 1.645 C) 1.275 D) 1.555 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 Use the following to answer the questions below: Select the answer closest to the specified areas for a normal density. Round to three decimal places. 8) The area to the left of 32 on a N(45, 8) distribution. A) 0.948 B) 0.052 C) 0.896 D) 0.104 Answer: B Diff: 1 Type: BI Var: 1 L.O.: 5.1.3
9) The area to the right of 12 on a N(60, 4) distribution. A) 0.691 B) 0.383 C) 0.617 D) 0.309 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.1.3 10) The area between 43 and 100 on a N(14, 3) distribution. A) 0.985 B) 0.122 C) 0.863 D) 0.878 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 5.1.3 Use the following to answer the questions below: Find the endpoint(s) on the normal density curve with the given property. Round to three decimal places. 11) The area to the left of the endpoint on a N(54, 2.5) curve is about 0.15. A) 51.409 B) 56.591 C) 59.425 D) 48.575 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 12) The area to the right of the endpoint on a N(26, 4) curve is about 0.4. A) 18.997 B) 24.987 C) 33.003 D) 27.013 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.1.4
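Endpoint questions such as 5-7, 11 and 12 run the calculation in reverse: given an area, find the value. A minimal sketch using `NormalDist.inv_cdf` from the standard library follows; the variable names are illustrative, and N(mean, sd) is read as mean and standard deviation, matching the notation used in the questions.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Question 5: area 0.20 to the left of the endpoint.
q5 = std_normal.inv_cdf(0.20)

# Question 6: area 0.85 to the right means area 0.15 to the left.
q6 = std_normal.inv_cdf(1 - 0.85)

# Question 7: area 0.88 between -z and +z leaves 0.06 in each tail.
q7 = std_normal.inv_cdf(1 - (1 - 0.88) / 2)

# Question 11: area 0.15 to the left on N(54, 2.5).
q11 = NormalDist(54, 2.5).inv_cdf(0.15)

# Question 12: area 0.4 to the right on N(26, 4), so 0.6 to the left.
q12 = NormalDist(26, 4).inv_cdf(1 - 0.4)

print(round(q5, 3), round(q6, 3), round(q7, 3), round(q11, 3), round(q12, 3))
```

`inv_cdf` always takes a left-tail area, so right-tail and middle areas are converted to left-tail areas first.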
13) The symmetric middle area on a N(12, 4) curve is about 0.75. A) 4.160 and 19.840 B) 7.399 and 16.601 C) 9.302 and 14.698 D) 6.874 and 17.126 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 Use the following to answer the questions below: It is generally believed that the heights of adult males in the U.S. are approximately normally distributed with mean 70 inches (5 feet, 10 inches) and standard deviation 3 inches and that the heights of adult females in the U.S. are also approximately normally distributed with mean 64 inches (5 feet, 4 inches) and standard deviation 2.5 inches. A small university is considering custom ordering beds for their dorm rooms. Answer the following questions about the lengths of beds in dorm rooms at this university. 14) Draw a sketch of the distribution of women's heights and label at least three points on the horizontal axis.
Answer: Answers may vary: If variable = men, the sketch should look roughly like:
If variable = women, the sketch should look roughly like:
Diff: 2 Type: ES Var: 1 L.O.: 5.1.2
15) The beds that the university currently purchases are 75 inches long. What proportion of males will be able to fit on the bed while lying perfectly straight? Round your answer to three decimal places. Answer: 0.952 Explanation: Need to find area on a N(70, 3) distribution to the left of 75. Diff: 2 Type: SA Var: 1 L.O.: 5.1.4 16) Should the university be concerned that females will not fit in the 75 inch beds? A) Yes B) No Answer: B Explanation: Need to find area on N(64, 2.5) density that is to the right of 75 (or to the left of 75). No, they should not be concerned because ≈ 0% of females are taller than 75 inches (or roughly 100% of females are shorter than 75 inches). Diff: 2 Type: MC Var: 1 L.O.: 5.1.3 17) The university plans on ordering custom sized beds such that 99% of male students are expected to fit in them when lying perfectly straight. What length beds should they order? Round your answer to the nearest inch. A) 77 inches B) 78 inches C) 76 inches D) 75 inches Answer: A Explanation: The endpoint of the N(70, 3) density that has 99% to the left of it is 76.978, or about 77. If they want 99% of male students to fit in them when lying perfectly straight, they should order the beds to be 77 inches long. Diff: 2 Type: BI Var: 1 L.O.: 5.1.4
18) The university decides it is too expensive to replace all the beds. Suppose the university has 2,150 beds all of which are 75 inches long. How many beds should they replace? You may assume that only those males taller than 75 inches will receive the longer beds and that females make up half of the population that will need a dorm room bed. A) 52 beds B) 44 beds C) 32 beds D) 28 beds Answer: A Explanation: First, determine the proportion of males who are taller than 75 inches and thus will receive a new bed (find the area above 75 in the N(70, 3) density): 0.048. Next, find the number of male students. Since half of the students are males, there are 2,150/2 = 1,075 male students. Lastly, 1,075(0.048) = 51.6, or about 52 (the number of males taller than 75 inches and thus the number of beds they need to custom order). Diff: 3 Type: BI Var: 1 L.O.: 5.1.3
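The bed calculations in questions 15, 17 and 18 can be reproduced as follows. This is a sketch under the N(70, 3) model stated above; the variable names are assumptions, and the proportion is rounded to three decimal places before multiplying so that the arithmetic mirrors the explanation in question 18 (the unrounded proportion gives about 51.4).

```python
from statistics import NormalDist

male_heights = NormalDist(70, 3)  # mean 70 inches, standard deviation 3 inches

# Question 15: proportion of males who fit on a 75 inch bed.
fit_on_75 = male_heights.cdf(75)

# Question 17: bed length that 99% of males fit on.
length_for_99 = male_heights.inv_cdf(0.99)

# Question 18: males taller than 75 inches among half of 2,150 students.
too_tall = round(1 - fit_on_75, 3)  # rounded as in the answer key
n_males = 2150 // 2
beds_to_replace = round(n_males * too_tall)

print(f"{fit_on_75:.3f}, {length_for_99:.3f}, {beds_to_replace}")
```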
Use the following to answer the questions below: In the following, convert an area from one normal distribution to an equivalent area for a different normal distribution. Show details of your calculation. Draw sketches of both normal distributions, find and label the endpoints, and shade the regions on both curves. 19) The area to the left of 16 for a N(20, 3) distribution converted to a standard normal distribution. Answer: z = (16 - 20)/3 = -1.333
Diff: 2 Type: ES Var: 1 L.O.: 5.1.5
20) The "Q1" for a standard normal distribution converted to a N(15, 2.5) distribution. Answer: x = -0.6745 * 2.5 + 15 = 13.3138
Diff: 2 Type: ES Var: 1 L.O.: 5.1.5
21) The area to the right of 50 in a N(40, 8) distribution converted to a standard normal distribution. Answer: z = (50 - 40)/8 = 1.25
Diff: 2 Type: ES Var: 1 L.O.: 5.1.5
22) The middle 90% for a standard normal distribution converted to a N(45, 15) distribution. Answer: x = -1.645 * 15 + 45 = 20.325 and x = 1.645 * 15 + 45 = 69.675
Diff: 2 Type: ES Var: 1 L.O.: 5.1.5
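The conversions in questions 19-22 all use the same pair of formulas, z = (x - mean)/sd and x = mean + z * sd. A minimal sketch follows; the helper names `to_z` and `from_z` are hypothetical.

```python
from statistics import NormalDist


def to_z(x, mean, sd):
    """Convert a value on N(mean, sd) to the standard normal scale."""
    return (x - mean) / sd


def from_z(z, mean, sd):
    """Convert a standard normal value to the N(mean, sd) scale."""
    return mean + z * sd


std_normal = NormalDist()

z19 = to_z(16, 20, 3)                          # question 19
x20 = from_z(std_normal.inv_cdf(0.25), 15, 2.5)  # question 20, Q1 of N(0, 1)
z21 = to_z(50, 40, 8)                          # question 21
z_star = std_normal.inv_cdf(0.95)              # middle 90% leaves 5% per tail
x22 = (from_z(-z_star, 45, 15), from_z(z_star, 45, 15))  # question 22

# The converted endpoint marks off the same area on both curves.
area_original = NormalDist(20, 3).cdf(16)
area_standard = std_normal.cdf(z19)

print(round(z19, 3), round(x20, 4), z21, [round(v, 3) for v in x22])
```

The last two lines illustrate why the conversion is useful: the area to the left of 16 on N(20, 3) equals the area to the left of the converted z-value on N(0, 1). Small differences from the printed answers come from rounding the z-values (for example 1.645) before converting.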
Use the following to answer the questions below: Heights of 10-year-old girls (5th graders) follow an approximately normal distribution with mean 54.4 inches and standard deviation of 2.7 inches. 23) Draw a sketch of this normal distribution and label at least three points on the horizontal axis. Answer:
Diff: 2 Type: ES Var: 1 L.O.: 5.1.2
24) What proportion of 10-year-old girls are shorter than 48 inches (4 feet)? Report your answer with four decimal places. Answer: 0.0089 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 25) What proportion of 10-year-old girls are taller than 60 inches (5 feet)? Report your answer with three decimal places. Answer: 0.019 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 26) What proportion of 10-year-old girls have heights between 50 and 55 inches? Report your answer with three decimal places. Answer: 0.536 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3
27) A parent says her 10-year-old daughter is in the 95th percentile in height. How tall is the girl? Report your answer with one decimal place. A) 58.8 inches B) 59.8 inches C) 57.1 inches D) 60.5 inches Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 28) The tallest 15% of 10-year-old girls are taller than what height? Report your answer with one decimal place. A) 57.2 inches B) 57.8 inches C) 58.8 inches D) 59.8 inches Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 29) What is the first quartile of heights of 10-year-old girls? Report your answer with one decimal place. A) 52.6 inches B) 51.7 inches C) 54.4 inches D) 53.2 inches Answer: A Explanation: Here we are looking for the 25th percentile. 52.6 inches Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 30) What is the IQR of heights of 10-year-old girls? Note that you will need to find two endpoints of the distribution for your calculation. Report your answer with one decimal place. A) 3.6 inches B) 3.1 inches C) 4.2 inches D) 4.7 inches Answer: A Explanation: Q1, or 25th percentile: 52.6 inches Q3, or 75th percentile: 56.2 inches IQR = 56.2 - 52.6 = 3.6 inches Diff: 3 Type: BI Var: 1 L.O.: 5.1.4
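The percentile and IQR answers for the heights of 10-year-old girls can be checked with a short script. The sketch below assumes a mean of 54.4 inches and a standard deviation of 2.7 inches; these two values are an assumption chosen because they reproduce the answers given for questions 24-30, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed parameters (mean 54.4, sd 2.7), chosen to match the answer key.
girls = NormalDist(54.4, 2.7)

below_48 = girls.cdf(48)               # question 24
percentile_95 = girls.inv_cdf(0.95)    # question 27
top_15_cutoff = girls.inv_cdf(0.85)    # question 28
q1 = girls.inv_cdf(0.25)               # question 29
q3 = girls.inv_cdf(0.75)
iqr = q3 - q1                          # question 30

print(f"{below_48:.4f} {percentile_95:.1f} {top_15_cutoff:.1f} "
      f"{q1:.1f} {q3:.1f} {iqr:.1f}")
```

The IQR needs two endpoints, the 25th and 75th percentiles, which is why question 30 is rated harder than the single-percentile questions.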
Use the following to answer the questions below: Use the provided density function to choose the best estimate for the proportion of the population found in the specified region.
31) The percent of the population that is less than 20 is closest to A) 5% B) 25% C) 75% D) 95% Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.1 32) The percent of the population that is less than 40 is closest to A) 15% B) 35% C) 75% D) 95% Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.1.1 33) The percent of the population that is more than 100 is closest to A) 75% B) 20% C) 10% D) 2% Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.1.1
34) The percent of the population that is more than 50 is closest to A) 25% B) 75% C) 50% D) 90% Answer: C Diff: 2 Type: BI Var: 1 L.O.: 5.1.1 35) The percent of the population between 20 and 80 is closest to A) 85% B) 99% C) 70% D) 50% Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.1
Use the following to answer the questions below: A student suspects that the lengths of songs currently on her Spotify playlist are approximately normally distributed with a mean of 257 seconds and standard deviation 62 seconds. 36) Draw a sketch of this normal distribution and label at least three points on the horizontal axis. Answer:
Diff: 2 Type: ES Var: 1 L.O.: 5.1.1
37) What proportion of songs are less than 180 seconds (3 minutes)? Report your answer with three decimal places. Answer: 0.107 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 38) What proportion of songs are longer than 300 seconds (5 minutes)? Report your answer with three decimal places. Answer: 0.244 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3
39) What proportion of songs are between 240 and 360 seconds (4 minutes and 6 minutes)? Report your answer with three decimal places. Answer: 0.560 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 40) The shortest 10% of songs are shorter than what length? Report your answer with one decimal place. A) 177.5 seconds B) 179.4 seconds C) 181.5 seconds D) 185.9 seconds Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 41) The longest 25% of songs are longer than what length? Report your answer with one decimal place. A) 298.8 seconds B) 271.5 seconds C) 282.9 seconds D) 291.2 seconds Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 42) The symmetric middle 90% of songs have lengths between what two values? Round all values to one decimal place. A) 155.0 seconds and 359.0 seconds B) 151.0 seconds and 363.0 seconds C) 149.0 seconds and 365.0 seconds D) 153.0 seconds and 361.0 seconds Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4
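One way to sanity-check the song-length answers is to compare the exact normal areas with proportions from simulated song lengths. This is a sketch only: the sample size of 100,000, the seed, and the variable names are assumptions, and the simulated proportion will differ slightly from the exact area.

```python
from statistics import NormalDist

songs = NormalDist(257, 62)  # mean 257 seconds, standard deviation 62 seconds

# Exact areas for questions 37 and 39.
under_180 = songs.cdf(180)
between_240_360 = songs.cdf(360) - songs.cdf(240)

# Exact endpoints for question 42 (symmetric middle 90%).
middle_90 = (songs.inv_cdf(0.05), songs.inv_cdf(0.95))

# Simulation check: draw hypothetical song lengths and count the short ones.
simulated = songs.samples(100_000, seed=1)
simulated_under_180 = sum(length < 180 for length in simulated) / len(simulated)

print(f"{under_180:.3f} {between_240_360:.3f} "
      f"{middle_90[0]:.1f} {middle_90[1]:.1f} {simulated_under_180:.3f}")
```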
Use the following to answer the questions below: Robins are common birds in North America. Suppose that the wingspan of robins is approximately normal with mean 14 inches and standard deviation 0.7 inches. 43) Draw a sketch of this normal distribution and label at least three points on the horizontal axis. Answer:
Diff: 2 Type: ES Var: 1 L.O.: 5.1.2
44) What proportion of robins have wingspans less than 13 inches? Report your answer with three decimal places. Answer: 0.077 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 45) What proportion of robins have wingspans longer than 15.5 inches? Report your answer with three decimal places. Answer: 0.016 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 46) What proportion of robins have wingspans between 12.5 and 13.5 inches? Report your answer with three decimal places. Answer: 0.221 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3
47) What is the 30th percentile of robin wingspans? Report your answer with two decimal places. A) 13.63 inches B) 13.21 inches C) 12.74 inches D) 12.36 inches Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 48) The largest 20% of robins have wingspans longer than what value? Report your answer with two decimal places. A) 14.59 inches B) 14.87 inches C) 14.32 inches D) 13.79 inches Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4
Use the following to answer the questions below: Final grades in Professor Albert's large calculus class are approximately normally distributed with a mean of 76 (%) and standard deviation of 8 (%). 49) Draw a sketch of this normal distribution and label at least three points on the horizontal axis. Answer:
Diff: 2 Type: ES Var: 1 L.O.: 5.1.2
50) In Professor Albert's course, students who earn less than a 60% in the class are assigned a failing grade (F). What proportion of the students earned F's? Report your answer with three decimal places. Answer: 0.023 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 51) In Professor Albert's course, students who earn above a 94% are assigned an "A." What proportion of students earned A's? Report your answer with three decimal places. Answer: 0.012 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3
52) What proportion of students earn between an 82% and 88% in this class? Report your answer with three decimal places. Answer: 0.160 Diff: 2 Type: SA Var: 1 L.O.: 5.1.3 53) What is the 25th percentile in this course? Report your answer with one decimal place. A) 70.6% B) 70.9% C) 72.3% D) 71.4% Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4 54) The top 30% of students earned scores above what value? Report your answer with one decimal place. A) 80.2% B) 79.8% C) 79.3% D) 78.4% Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.1.4
5.2 Confidence Intervals Using Normal Distributions
Use the following to answer the questions below: Find the z* values based on a standard normal distribution for each of the following. Round to three decimal places. 1) An 86% confidence interval for a proportion. A) 1.080 B) 1.476 C) 0.994 D) 1.960 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.2.1
2) An 88% confidence interval for a correlation. A) 2.575 B) 1.175 C) 2.326 D) 1.555 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.2.1 3) A 78% confidence interval for a mean. A) 0.772 B) 1.227 C) 1.514 D) 1.126 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.2.1 4) A 66% confidence interval for a slope. A) 0.954 B) 0.412 C) 0.754 D) 1.016 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.2.1
Use the following to answer the questions below: A set of hypotheses, some information from one or more samples, and a standard error from a randomization distribution are provided. Find the value of the standardized z-test statistic.
5) Test H0: p = 0.75 versus Ha: p > 0.75 when the sample has and A) 0.03 B) 2 C) -2 D) 1.5 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
6) Test H0: μ = 26 versus Ha: μ ≠ 26 when the sample has n = 75, s = 5.4, and SE = 0.6. A) -1.5 B) 0.278 C) 2.5 D) -0.833 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
7) Test H0: = versus Ha: ≠ when the samples have and and The standard error of from the randomization distribution is 3.2. A) 1.875 B) 6 C) 0.4 D) 0.5 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
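Questions 1-7 of this section rely on two calculations: the z* value for a given confidence level, and the standardized statistic z = (sample statistic - null value)/SE. A minimal sketch of both follows. The helper names are hypothetical, and the numbers passed to `z_statistic` are made up for illustration; they are not taken from any question above.

```python
from statistics import NormalDist


def z_star(confidence):
    """z* that leaves (1 - confidence)/2 in each tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_statistic(sample_statistic, null_value, standard_error):
    """Standardized test statistic for a normal-based test."""
    return (sample_statistic - null_value) / standard_error


# z* values for the confidence levels in questions 1-4.
for level in (0.86, 0.88, 0.78, 0.66):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")

# Hypothetical example: sample proportion 0.55, null value 0.50, SE 0.02.
example_z = z_statistic(0.55, 0.50, 0.02)
print(f"z = {example_z:.2f}")
```

The type of parameter (proportion, mean, correlation, slope) does not change z*; only the confidence level matters.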
Use the following to answer the questions below: Find the p-value based on a standard normal distribution for the standardized test statistic and provided alternative hypothesis.
8) z = -1.86 for Ha: p < 0.5 A) 0.031 B) 0.969 C) 0.062 D) 0.937 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
9) z = 2.36 for Ha: μ > 86 A) 0.982 B) 0.0182 C) 0.991 D) 0.0091 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
10) z = 1.75 for Ha: [parameters missing in source] (two-tailed, ≠) A) 0.960 B) 0.040 C) 0.080 D) 0.920 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 5.2.2
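The p-values in questions 8-10 depend only on z and on the direction of the alternative hypothesis. A small sketch of that lookup, assuming a standard normal reference distribution; the function name `p_value` and the labels "less", "greater", and "two-sided" are illustrative choices.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value for a standardized statistic z under a standard normal distribution.

    alternative: "less" (left tail), "greater" (right tail), or "two-sided".
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        # Double the area in the tail beyond |z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


print(round(p_value(-1.86, "less"), 3))      # question 8
print(round(p_value(2.36, "greater"), 4))    # question 9
print(round(p_value(1.75, "two-sided"), 3))  # question 10
```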
Use the following to answer the questions below: In a Gallup survey of 1,012 randomly selected U.S. adults (age 18 and over), 53% said that they were dissatisfied with the quality of education students receive in kindergarten through grade 12. The bootstrap distribution (based on 5,000 samples) is provided.
11) Would it be appropriate to use the normal distribution to construct the confidence interval in this situation? A) Yes B) No Answer: A Explanation: Yes, the bootstrap distribution is pretty symmetric (and the sample size is large). Diff: 2 Type: MC Var: 1 L.O.: 5.2.3
12) The standard error from the bootstrap distribution is SE = 0.016. Use the normal distribution to construct and interpret a 99% confidence interval for the proportion of U.S. adults who are dissatisfied with the education students receive in kindergarten through grade 12. Round to three decimal places. Answer: z* = 2.575, so 0.53 ± 2.575*0.016, or 0.489 to 0.571. We are 99% sure that the proportion of U.S. adults who are dissatisfied with the education that students receive in kindergarten through grade 12 is between 0.489 and 0.571 (48.9% and 57.1%). Diff: 2 Type: ES Var: 1 L.O.: 5.2.1 13) In a Gallup survey of 1,012 randomly selected U.S. adults (age 18 and over), 53% said that they were dissatisfied with the quality of education students receive in kindergarten through grade 12. Use the normal distribution to test if the proportion of U.S. adults who are dissatisfied with the education that students receive in kindergarten through grade 12 differs from 50%. The randomization distribution for this test is approximately normal and the standard error is SE = 0.016. Include all details of the test and use a 5% significance level. Answer: p = proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 H0: p = 0.50 Ha: p ≠ 0.50 z = (0.53 - 0.50)/0.016 = 1.875
p-value = 0.06 (two-tail, using Statkey) There is no evidence that the proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 differs from 50% (or 0.50). Diff: 2 Type: ES Var: 1 L.O.: 5.2.2
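Questions 12 and 13 both follow the same pattern: a sample statistic, a standard error read from a bootstrap or randomization distribution, and the normal distribution. The sketch below shows the two calculations under that assumption. The function names are illustrative, z* is passed in as the rounded table value used in the answer key (2.575 for 99%), and the standard error of 0.016 for question 13 is inferred from the reported z = 1.875.

```python
def normal_ci(statistic, se, z_star):
    """Confidence interval of the form statistic ± z* · SE."""
    margin = z_star * se
    return statistic - margin, statistic + margin


def z_statistic(statistic, null_value, se):
    """Standardized test statistic: (sample statistic - null value) / SE."""
    return (statistic - null_value) / se


# Question 12: 99% interval for the Gallup proportion, SE = 0.016.
low, high = normal_ci(0.53, 0.016, 2.575)
print(f"{low:.3f} to {high:.3f}")

# Question 13: test of H0: p = 0.50, assuming the same SE of 0.016.
print(round(z_statistic(0.53, 0.50, 0.016), 3))
```

Passing z* explicitly keeps the results aligned with the rounded table values in the answer key; computing z* to full precision can shift the last reported digit.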
14) A sample of 148 college students reports sleeping an average of 6.85 hours on weeknights. The sample size is large enough to use the normal distribution, and a bootstrap distribution shows that the standard error is SE = 0.175. Use a normal distribution to construct and interpret a 95% confidence interval for the mean amount of weeknight sleep students get at this university. Use two decimal places in your answer. A) 6.51 to 7.19 hours B) 4.89 to 8.81 hours C) 6.68 to 7.03 hours D) 4.81 to 8.89 hours Answer: A Explanation: z* = 1.96 6.85 ± 1.96(0.175) 6.51 to 7.19 hours We are 95% sure that the mean amount of weeknight sleep students get at this university is between 6.51 and 7.19 hours. Diff: 2 Type: BI Var: 1 L.O.: 5.2.1
15) Gallup conducted a survey of 1,015 randomly selected U.S. adults about "Black Friday" shopping. They asked the following question: "As you know, the Friday after Thanksgiving is one of the biggest shopping days of the year. Looking ahead, do you personally plan on shopping on the Friday after Thanksgiving, or not?" Of the 515 men who responded, 16% said "Yes." Of the 500 women who responded, 20% said "Yes." The standard error of the differences in proportions is about SE = 0.025. Use the normal distribution to test, at the 5% level, if the proportions of men and women who planned to shop on the Friday after Thanksgiving are significantly different. The sample size is large enough to use the normal distribution. Answer: p_M = proportion of men who planned to shop the Friday after Thanksgiving p_W = proportion of women who planned to shop the Friday after Thanksgiving H0: p_M = p_W Ha: p_M ≠ p_W z = (0.16 - 0.20)/0.025 = -1.6
p-value = 0.11 (two-tail, using Statkey) There is no evidence that the proportions of men and women who planned to shop the Friday after Thanksgiving are significantly different. Diff: 2 Type: ES Var: 1 L.O.: 5.2.2
Use the following to answer the questions below: The gas prices for a random sample of n = 10 gas stations in the state of Illinois have a mean of $3.975, with a standard deviation of $0.2266. 16) The bootstrap distribution, based on 5,000 samples, is provided. Would it be appropriate to use the normal distribution to construct a confidence interval for the mean gas price in Illinois?
A) Yes B) No Answer: A Explanation: Yes, the bootstrap distribution looks approximately normal. Diff: 2 Type: MC Var: 1 L.O.: 5.2.3
17) The standard error from the bootstrap distribution is SE = 0.069. Use the normal distribution to construct and interpret a 90% confidence interval for the mean gas price in Illinois. Round all values to two decimal places. A) $3.86 to $4.09 B) $2.33 to $5.62 C) $3.91 to $4.04 D) $3.75 to $4.21 Answer: A Explanation: z* = 1.645 3.975 ± 1.645(0.069) $3.86 to $4.09 We are 90% sure that the mean gas price in Illinois on August 8, 2012 was between $3.86 and $4.09. Diff: 2 Type: BI Var: 1 L.O.: 5.2.1
Use the following to answer the questions below: There are 24 students enrolled in an introductory statistics class at a small university. As an in-class exercise the students were asked how many hours of television they watch each week. The male students watched an average of 6 hours of television per week with a standard deviation of 4.24 hours. The female students watched an average of 3.91 hours of television per week with a standard deviation of 3.48 hours. Assume that the students enrolled in the statistics class are representative of all students at the university. 18) The randomization distribution for x̄_M - x̄_F (where x̄_M and x̄_F are the sample mean amounts of television watched by male and female students, respectively) is provided. Would it be appropriate to use the normal distribution to perform a test comparing the mean amount of television watched per week by male and female students at this university?
A) Yes B) No Answer: A Explanation: Yes, the randomization distribution looks approximately normal. Diff: 2 Type: MC Var: 1 L.O.: 5.2.3
19) The standard error of the difference between x̄_M and x̄_F is about SE = 1.667. Use the normal distribution to test, at the 5% level, if male students at this university watch, on average, more television than female students. Include all details of the test. Answer: μ_M = mean amount of television watched in a week by male students at the university μ_F = mean amount of television watched in a week by female students at the university H0: μ_M = μ_F Ha: μ_M > μ_F z = (6 - 3.91)/1.667 = 1.25
p-value = 0.106 (right tail, using Statkey) There is no evidence that males at this university watch significantly more television, on average, than female students. Diff: 2 Type: ES Var: 1 L.O.: 5.2.2 20) A biologist interested in estimating the correlation between the body mass (in grams) and body length (in cm) of porcupines has a random sample of 18 porcupines with r = 0.407. The bootstrap distribution she constructed is approximately normal and the standard error is estimated to be 0.165. Use the normal distribution to construct and interpret a 98% confidence interval for the correlation between body mass and body length in porcupines. Round all values to three decimal places. A) 0.023 to 0.791 B) 0.019 to 0.795 C) 0.010 to 0.840 D) 0.028 to 0.800 Answer: A Explanation: z* = 2.326 0.407 ± 2.326(0.165) 0.023 to 0.791 We are 98% sure that the correlation between body mass and body length of porcupines is between 0.023 and 0.791. Diff: 2 Type: BI Var: 1 L.O.: 5.2.1 © 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 6 Inference for Means and Proportions 6.1
Inference for a Proportion
Use the following to answer the questions below: Consider taking samples of size 100 from a population with proportion 0.33. 1) Find the mean of the distribution of sample proportions. A) 0.0033 B) 0.033 C) 0.33 D) 33 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.1.1 2) Find the standard error of the distribution of sample proportions. A) 0.002211 B) 0.0033 C) 0.047 D) 0.33 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.1.1 3) Is the sample size large enough for the Central Limit Theorem to apply so that the sample proportions follow a normal distribution? A) Yes B) No Answer: A Diff: 2 Type: MC Var: 1 L.O.: 6.1.2 Use the following to answer the questions below: Consider taking samples of size 25 from a population with proportion 0.65. 4) Find the mean of the distribution of sample proportions. A) 0.026 B) 0.65 C) 0.13 D) 16.25 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.1.1
5) Find the standard error of the distribution of sample proportions. A) 0.0954 B) 0.0091 C) 0.0455 D) 0.0191 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.1.1 6) Is the sample size large enough for the Central Limit Theorem to apply so that the sample proportions follow a normal distribution? A) Yes B) No Answer: B Diff: 2 Type: MC Var: 1 L.O.: 6.1.2 Use the following to answer the questions below: Suppose that the makers of M&M's claim that 24% of their Milk Chocolate M&M's are blue. 7) Assume that Fun-Size bags of Milk Chocolate M&M's hold 20 candies. Find the standard error of the distribution of sample proportions of blue candies for Fun-Size bags (i.e., samples of size 20). Use four decimal places when reporting the standard error. Answer: 0.0955 Explanation: standard error = √(0.24(1 - 0.24)/20) = 0.0955 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
8) Assume that the bags of Milk Chocolate M&M's sold in vending machines have 55 candies. Find the standard error of the distribution of sample proportions of blue candies for vending machine bags (i.e., samples of size 55). Use four decimal places when reporting the standard error. Answer: 0.0576 Explanation: standard error = √(0.24(1 - 0.24)/55) = 0.0576 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
9) Assume that bags of Milk Chocolate M&M's labeled as "Medium" size contain 415 candies. Find the standard error of the distribution of sample proportions of blue candies for Medium bags (i.e., samples of size 415). Use four decimal places when reporting the standard error. Answer: 0.0210 Explanation: standard error = √(0.24(1 - 0.24)/415) = 0.0210 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
10) Would you expect using bags of Milk Chocolate M&M's labeled as "Large" size, which contain more candies than the "Medium" size bags, to result in a larger or smaller standard error? A) Larger B) Smaller Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.1.1 11) For which sample sizes (Fun-Size with 20, Vending Machine with 55, or Medium with 415) would the Central Limit Theorem apply? A) Vending Machine and Medium size bags B) Fun-Size bags C) Medium size bags D) Fun-size and Vending Machine size bags Answer: A Explanation: The Central Limit Theorem would apply for the Vending Machine and Medium size bags (but not the Fun-Size). Fun-Size: 20(0.24) = 4.8 < 10 <— no Vending Machine: 55(0.24) = 13.2 > 10, 55(0.76) = 41.8 > 10 <— yes Medium: 415(0.24) = 99.6 > 10, 415(0.76) = 315.4 > 10 <— yes Diff: 2 Type: BI Var: 1 L.O.: 6.1.2 12) Suppose you purchase a bag of Milk Chocolate M&M's from a vending machine and only 8 of your 55 candies are blue. Assuming that the sample proportions are normally distributed, what percent of vending machine bags (i.e., samples of size 55) will have a sample proportion smaller than 0.145? Use two decimal places when reporting your answer. Answer: 0.05 Explanation: 0.05 (found using Statkey) Diff: 2 Type: SA Var: 1 L.O.: 5.1.0;6.1.0
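The M&M's questions above combine three calculations: the standard error √(p(1 - p)/n), the np ≥ 10 and n(1 - p) ≥ 10 check for the Central Limit Theorem, and a normal probability for a sample proportion. A minimal sketch follows; the function names are illustrative, and the threshold of 10 is the one used in the explanations above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_se(p, n):
    """Standard error of the sample proportion for samples of size n."""
    return sqrt(p * (1 - p) / n)


def clt_applies(p, n, threshold=10):
    """Check that both expected counts, np and n(1 - p), reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


def prob_below(cutoff, p, n):
    """P(sample proportion < cutoff), assuming a normal sampling distribution."""
    return NormalDist(mu=p, sigma=proportion_se(p, n)).cdf(cutoff)


for label, n in (("Fun-Size", 20), ("Vending Machine", 55), ("Medium", 415)):
    print(label, round(proportion_se(0.24, n), 4), clt_applies(0.24, n))

# Question 12: proportion of vending machine bags with fewer than 14.5% blue.
print(round(prob_below(0.145, 0.24, 55), 2))
```

The loop also illustrates question 10: holding p fixed, a larger bag size gives a smaller standard error.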
Use the following to answer the questions below: Admissions records at a small university indicate that 6.7% of the students enrolled are international students. 13) Find the mean and standard error of the sample proportion of international students in random samples of size 50. Use four decimal places when reporting the standard error. Answer: 0.0354 Explanation: mean = 0.067 standard error = √(0.067(1 - 0.067)/50) = 0.0354 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
14) Find the mean and standard error of the sample proportion of international students in random samples of size 100. Use four decimal places when reporting the standard error. Answer: 0.0250 Explanation: mean = 0.067 standard error = √(0.067(1 - 0.067)/100) = 0.0250 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
15) Find the mean and standard error of the sample proportion of international students in random samples of size 200. Use four decimal places when reporting the standard error. Answer: 0.0177 Explanation: mean = 0.067 standard error = √(0.067(1 - 0.067)/200) = 0.0177 Diff: 2 Type: SA Var: 1 L.O.: 6.1.1
16) For which sample sizes (n = 50, n = 100, and n = 200) would the Central Limit Theorem apply? A) n = 200 B) n = 50 and n = 100 C) n = 100 and n = 200 D) all three sample sizes Answer: A Explanation: Only the sample of size n = 200. n = 50: 50*0.067 = 3.35 < 10 <-- no n = 100: 100*0.067 = 6.7 < 10 <-- no n = 200: 200*0.067 = 13.4 > 10, 200*0.933 = 186.6 > 10 <-- yes Diff: 2 Type: BI Var: 1 L.O.: 6.1.2 17) What proportion of samples of 200 randomly selected students will have at least 8% international students? Use three decimal places when reporting your answer. Answer: 0.231 Explanation: 0.231 (found using Statkey) Diff: 2 Type: SA Var: 1 L.O.: 5.1.0;6.1.0 Use the following to answer the questions below: A study to investigate the dominant paws in cats was described in the scientific journal Animal Behaviour. The researchers used a random sample of 42 domestic cats. In this study, each cat was shown a treat (5 grams of tuna), and while the cat watched, the food was placed inside a jar. The opening of the jar was small enough that the cat could not stick its head inside to remove the treat. The researcher recorded the paw that was first used by the cat to try to retrieve the treat. This was repeated 100 times for each cat (over a span of several days). The paw used most often was deemed the dominant paw (note that one cat used both paws equally and was classified as "ambidextrous"). Of the 42 cats studied, 20 were classified as "left-pawed." 18) Verify that the sample is large enough to use the normal formula to find a confidence interval for the proportion of domestic cats that are "left-pawed." Answer: Yes. Explanation: p̂ = 20/42 = 0.476 np̂ = 20 > 10, n(1 - p̂) = 22 > 10 <— Yes, we can use the normal formula. Diff: 2 Type: SA Var: 1 L.O.: 6.2.0
19) Construct a 95% confidence interval for the proportion of domestic cats that are "left-pawed." Use three decimal places in your margin of error. Answer: 0.325 to 0.627 Explanation: p̂ = 20/42 = 0.476 0.476 ± 1.96√(0.476(1 - 0.476)/42) 0.476 ± 0.151 0.325 to 0.627 Diff: 2 Type: SA Var: 1 L.O.: 6.2.1
20) Construct a 95% confidence interval for the proportion of domestic cats that are "left-pawed." Provide an interpretation of your interval in the context of this data situation. Answer: We are 95% sure that the proportion of domestic cats that are "left-pawed" is between 0.325 and 0.627. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.2.0 21) Another researcher wants to conduct a similar study to more precisely estimate the proportion of cats that are "left-pawed." She wants to construct a 95% confidence interval that has a margin of error of 6%. How many cats does she need to use in her sample? A) 267 cats B) 266 cats C) 268 cats D) 269 cats Answer: A Explanation: Use the results of the original study as p̂ = 0.476. n = (1.96/0.06)²(0.476)(1 - 0.476) = 266.2
She needs 267 cats to meet her goal. Diff: 2 Type: BI Var: 1 L.O.: 6.2.2
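The interval in question 19 follows the formula p̂ ± z*·√(p̂(1 - p̂)/n), after checking that the sample has at least 10 successes and 10 failures. A short sketch of both steps, with illustrative function names and z* passed in as the rounded table value (1.96 for 95%):

```python
from math import sqrt


def enough_successes_and_failures(successes, n, threshold=10):
    """Sample-size check used before applying the normal formula."""
    return successes >= threshold and (n - successes) >= threshold


def proportion_ci(successes, n, z_star):
    """Normal-based confidence interval for a single proportion."""
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Cat study: 20 of 42 cats were classified as left-pawed.
print(enough_successes_and_failures(20, 42))
low, high = proportion_ci(20, 42, 1.96)
print(f"{low:.3f} to {high:.3f}")
```

This version keeps p̂ unrounded until the end, whereas the answer key rounds p̂ to 0.476 first; for these data both routes give the same interval to three decimal places.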
Use the following to answer the questions below: In a survey of 7,786 randomly selected adults living in Germany, 5,840 said they exercised for at least 30 minutes three or more times per week. 22) Verify that the sample is large enough to use the normal formula to find a confidence interval for the proportion of Germans who exercise for 30 minutes three or more times a week. Answer: p̂ = 5,840/7,786 = 0.75 np̂ = 5,840 > 10, n(1 - p̂) = 1,946 > 10 <— Yes, we can use the normal formula. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0 23) Construct a 99% confidence interval for the proportion of Germans who exercise for 30 minutes three or more times a week. Use three decimal places in your margin of error. A) 0.737 to 0.763 B) 0.733 to 0.767 C) 0.722 to 0.778 D) 0.749 to 0.751 Answer: A Explanation: p̂ = 5,840/7,786 = 0.75 0.75 ± 2.575√(0.75(1 - 0.75)/7,786) 0.75 ± 0.013 0.737 to 0.763 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
24) Construct a 99% confidence interval for the proportion of Germans who exercise for 30 minutes three or more times a week. Provide an interpretation of your interval in the context of this data situation. Answer: We are 99% sure that the proportion of Germans who exercise for 30 minutes three or more times a week is between 0.737 and 0.763. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.2.0
25) Suppose an exercise scientist wants to estimate the proportion of American adults who exercise for 30 minutes three or more times per week. He wants to construct a 90% confidence interval with a margin of error of 1%. Note that Americans are typically thought to not be as active as individuals in other countries, and thus the estimate from Germany is likely not a good estimate for Americans. What sample size does he need? A) 6,766 people B) 13,532 people C) 27,060 people D) 41 people Answer: A Explanation: Use p̂ = 0.5 because 75% is not a good estimate for Americans and we don't have a better guess. n = (1.645/0.01)² * 0.5 * 0.5 = 6,765.1
He should survey 6,766 people for his study. Diff: 2 Type: BI Var: 1 L.O.: 6.2.2 Use the following to answer the questions below: In a Gallup survey of 1,012 randomly selected U.S. adults (age 18 and over), 53% said that they were dissatisfied with the quality of education students receive in kindergarten through grade 12. 26) Verify that the sample is large enough to use the normal formula to find a confidence interval for the proportion of Americans who are dissatisfied with the quality of education students receive in kindergarten through grade 12. Answer: p̂ = 0.53 np̂ = 536.36 > 10, n(1 - p̂) = 475.64 > 10 <— Yes, we can use the normal formula. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0
27) Construct a 90% confidence interval for the proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12. Use three decimal places in your margin of error. A) 0.504 to 0.556 B) 0.509 to 0.551 C) 0.512 to 0.548 D) 0.497 to 0.563 Answer: A Explanation: 0.53 ± 1.645√(0.53(1 - 0.53)/1,012) 0.53 ± 0.026 0.504 to 0.556 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
28) Construct a 90% confidence interval for the proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12. Provide an interpretation of your interval in the context of this data situation. Answer: We are 90% sure that the proportion of U.S. adults who are dissatisfied with the quality of education students receive in kindergarten through grade 12 is between 0.504 and 0.556. Diff: 2 Type: ES Var: 1 L.O.: 2.3.4;6.2.0 29) Suppose you want to estimate the proportion of local adults who are dissatisfied with the education students receive in kindergarten through grade 12 with 95% confidence and a 5% margin of error. If you suspect that local adults won't differ drastically from those Gallup used, how many people should you sample? A) You should sample 383 local adults. B) You should sample 369 local adults. C) You should sample 343 local adults. D) You should sample 358 local adults. Answer: A Explanation: Use p̂ = 0.53 (because we don't suspect the local adults to drastically differ from the Gallup sample) n = (1.96/0.05)²(0.53)(0.47) = 382.8
You should sample 383 local adults. Diff: 2 Type: BI Var: 1 L.O.: 6.2.2
30) Test, at the 5% level, if this sample provides evidence that the proportion of Americans who are dissatisfied with education in kindergarten through grade 12 differs significantly from 50%. Be sure to verify that it is appropriate to use a normal distribution to compute the p-value and include all of the details of the test. Answer: p = proportion of U.S. adults who are dissatisfied with education in kindergarten through grade 12. H0: p = 0.50 Ha: p ≠ 0.50 np0 = n(1 - p0) = 1,012*0.5 = 506 > 10
Since both are larger than 10, the sample size is large enough to use the normal distribution to compute the p-value. Test statistic: z = (0.53 - 0.50)/√(0.50(1 - 0.50)/1,012) = 1.909
p-value = 0.056 (two-tail probability, using Statkey) Since the p-value is larger than the 5% significance level, there is no evidence to reject H0 and thus there is no evidence to conclude that the proportion of U.S. adults who are dissatisfied with education in kindergarten through grade 12 differs significantly from 50%. Diff: 2 Type: ES Var: 1 L.O.: 6.3.1 Use the following to answer the questions below: In a recent study, the Centers for Disease Control and Prevention reported that in a sample of 4,349 African Americans 31% were Vitamin D deficient. Overall, it is believed that Vitamin D deficiency affects 8% of all U.S. adults. 31) Verify that the sample size is large enough to use a normal distribution to conduct a test comparing the population proportion of African Americans with Vitamin D deficiency to the overall rate of 8%. Answer: np0 = 4,349*0.08 = 347.92 > 10 n(1 - p0) = 4,349*0.92 = 4,001.08 > 10
Since both exceed 10, the sample is large enough to perform a test to compare the population proportion to the overall rate of 8%. Diff: 2 Type: ES Var: 1 L.O.: 6.3.0
32) Test, at the 1% significance level, if this sample provides evidence that the rate of Vitamin D deficiency among African Americans differs significantly from the overall rate of 8%. Include all of the details of the test. Answer: p = proportion of African Americans with Vitamin D deficiency H0: p = 0.08 Ha: p ≠ 0.08
Test statistic: z = (0.31 - 0.08)/√(0.08(1 - 0.08)/4,349) = 55.9
np0 = 4,349*0.08 = 347.92 > 10
n(1 - p0) = 4,349*0.92 = 4,001.08 > 10
Since both exceed 10, the sample is large enough to use the normal distribution to compute the p-value. p-value ≈ 0 (two-tail probability using Statkey) Because the p-value is drastically smaller than the 1% significance level, we have very strong evidence to reject H0 and thus have very strong evidence to conclude that the rate of Vitamin D deficiency among African Americans differs significantly from the overall rate of 8% for American adults. Diff: 2 Type: ES Var: 1 L.O.: 6.3.1 33) Verify that the sample size is large enough to use the normal distribution to construct a confidence interval for the proportion of African Americans with Vitamin D deficiency. Answer: p̂ = 0.31 np̂ = 4,349*0.31 = 1,348.19 > 10 n(1 - p̂) = 4,349*0.69 = 3,000.81 > 10 Since both exceed 10, the sample size is large enough to use the normal distribution. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0
34) Construct a 99% confidence interval for the proportion of African Americans with Vitamin D deficiency. Use three decimal places in your margin of error. A) 0.292 to 0.328 B) 0.298 to 0.328 C) 0.296 to 0.324 D) 0.282 to 0.338 Answer: A Explanation: z* = 2.575 0.31 ± 2.575√(0.31(1 - 0.31)/4,349) 0.31 ± 2.575(0.007013111) 0.31 ± 0.018 0.292 to 0.328 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
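The one-proportion tests in questions 30 and 32 use the null proportion p0, not p̂, in the standard error: z = (p̂ - p0)/√(p0(1 - p0)/n). The sketch below is one way to reproduce those test statistics and p-values; the function name and the "less" / "greater" / "two-sided" labels are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n, alternative="two-sided"):
    """Return (z, p-value) for a test of H0: p = p0 using the normal distribution."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    std_normal = NormalDist()
    if alternative == "less":
        p = std_normal.cdf(z)
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p


# Question 30: Gallup education survey, H0: p = 0.50.
z, p = one_proportion_z_test(0.53, 0.50, 1012)
print(round(z, 3), round(p, 3))

# Question 32: Vitamin D deficiency, H0: p = 0.08.
z, p = one_proportion_z_test(0.31, 0.08, 4349)
print(round(z, 1), p)
```

For question 32 the statistic is so far into the tail that the computed p-value is effectively zero, matching the "p-value ≈ 0" in the answer.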
Use the following to answer the questions below: The owner of a small pet supply store wants to open a second store in another city, but he only wants to do so if more than one-third of the city's households have pets (otherwise there won't be enough business). He selects a random sample of 150 households and finds that 64 have pets. 35) Verify that the sample size is large enough to perform a test to compare the population proportion of households in the city with pets to the target. Answer: np0 = 150(1/3) = 50 > 10 n(1 - p0) = 150(2/3) = 100 > 10
Since both exceed 10, the sample size is large enough to use a normal distribution to perform this test about the population proportion. Diff: 2 Type: ES Var: 1 L.O.: 6.3.0
36) Test, at the 5% level, if this sample provides evidence that significantly more than one-third of the city's households have a pet. Include all of the details of the test. Answer: p = proportion of the city's households with a pet H0: p = 1/3 Ha: p > 1/3 p̂ = 64/150 = 0.427
Test statistic: z = (0.427 - 1/3)/√((1/3)(2/3)/150) = 2.433
np0 = 150(1/3) = 50 > 10
n(1 - p0) = 150(2/3) = 100 > 10
Since both exceed 10, the sample size is large enough to use a normal distribution to perform this test about the population proportion. p-value = 0.0075 Because the p-value is smaller than the 5% significance level, we have strong evidence to reject H0 and thus have strong evidence to conclude that the proportion of the city's households that have pets is significantly higher than one-third. Diff: 2 Type: ES Var: 1 L.O.: 6.3.1 37) Verify that the sample size is large enough to use the normal distribution to construct a confidence interval for the proportion of the city's households that own pets. Answer: p̂ = 64/150 = 0.427
np̂ = 64 > 10 n(1 - p̂) = 86 > 10 Since there are more than 10 successes and failures, the sample size is large enough to use a normal distribution. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0
38) Construct a 95% confidence interval for the proportion of the city's households that own pets. Round the sample proportion and margin of error to three decimal places. A) 0.348 to 0.506 B) 0.361 to 0.493 C) 0.323 to 0.531 D) 0.331 to 0.523 Answer: A Explanation: p̂ = 64/150 = 0.427
0.427 ± 1.96√(0.427(1 - 0.427)/150)
0.427 ± 1.96(0.04038737) 0.427 ± 0.079 0.348 to 0.506 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
Use the following to answer the questions below: In May 2012 President Obama made history by revealing his support of gay marriage. Around that time the Gallup Organization polled 1,024 U.S. adults about their opinions on gay/lesbian relations and gay marriage. They found that 54% of those sampled viewed gay/lesbian relations as "morally acceptable." 39) Verify that the sample size is large enough to use the normal distribution to construct a confidence interval for the proportion of U.S. adults who consider gay and lesbian relations to be "morally acceptable." Answer: p̂ = 0.54 np̂ = 1,024*0.54 = 552.96 > 10 n(1 - p̂) = 1,024*0.46 = 471.04 > 10 Since there are more than 10 successes and failures in each group, the sample size is large enough to use the normal distribution. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0
40) Construct a 90% confidence interval for the proportion of U.S. adults who find gay/lesbian relations to be "morally acceptable." Round the margin of error to three decimal places. A) 0.514 to 0.566 B) 0.509 to 0.571 C) 0.500 to 0.580 D) 0.495 to 0.585 Answer: A Explanation: 0.54 ± 1.645√(0.54(1 - 0.54)/1,024)
0.54 ± 1.645(0.01557492) 0.54 ± 0.026 0.514 to 0.566 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
41) What sample size would we need to reduce the margin of error to ± 1.5%? A) 2,988 U.S. adults B) 2,905 U.S. adults C) 2,887 U.S. adults D) 2,876 U.S. adults Answer: A Explanation: n = (1.645/0.015)² * 0.54*0.46 = 2,987.452
2,988 U.S. adults Diff: 2 Type: BI Var: 1 L.O.: 6.2.2
42) Does this sample provide evidence that the majority of U.S. adults (i.e., more than half) believe that gay/lesbian relations are "morally acceptable"? Use a 5% significance level. Verify that the sample size is large enough to use the normal distribution to compute the p-value for this test and include all of the details of the test. Answer: p = proportion of U.S. adults who believe that gay/lesbian relations are morally acceptable H0: p = 0.5 Ha: p > 0.5 np0 = n(1 - p0) = 1,024*0.5 = 512
Because both exceed 10, the sample size is large enough to use the normal distribution to compute the p-value for this test. Test statistic: z = (0.54 - 0.5)/√(0.5(1 - 0.5)/1,024) = 2.56
p-value = 0.0052 (right-tail, using Statkey) Because the p-value is considerably smaller than the 5% significance level, we have very strong evidence to reject H0 and thus have very strong evidence to conclude that the majority (i.e., more than half) of U.S. adults believe that gay/lesbian relations are "morally acceptable." Diff: 2 Type: ES Var: 1 L.O.: 6.3.1
Use the following to answer the questions below: In a survey conducted by the Gallup organization, 1,017 adults were asked "In general, how much trust and confidence do you have in the mass media — such as newspapers, TV, and radio — when it comes to reporting the news fully, accurately, and fairly?" Of the 1,017 respondents, 214 said they had "no confidence at all." 43) Test, at the 5% level, if this sample provides evidence that the proportion of U.S. adults who have no confidence in the media differs significantly from 25%. Verify that the sample size is large enough to use the normal distribution to compute the p-value for this test and include all of the details of the test. Answer: p = proportion of U.S. adults who have no confidence in the media H0: p = 0.25 Ha: p ≠ 0.25 np0 = 1,017*0.25 = 254.25 > 10
n(1 - p0) = 1,017*0.75 = 762.75 > 10
Since both are greater than 10, the sample size is large enough to use the normal distribution to compute the p-value for this test. Test statistic: z = (0.21 - 0.25)/√(0.25(1 - 0.25)/1,017) = -2.95
p-value = 0.003 (two-sided probability, using Statkey) Because the p-value is smaller than the 5% significance level, we have evidence to reject H0 and thus have evidence to conclude that the proportion of U.S. adults who have no confidence in the media differs significantly from 25%. Diff: 2 Type: ES Var: 1 L.O.: 6.3.1
44) Verify that the sample size is large enough to use the normal distribution to construct a confidence interval for the proportion of U.S. adults who have no confidence in the media. Answer: np̂ = 214 n(1 - p̂) = 803 Since there are more than 10 successes and failures, the sample size is large enough to use the normal distribution to construct a confidence interval. Diff: 2 Type: ES Var: 1 L.O.: 6.2.0 45) Construct a 90% confidence interval for the proportion of U.S. adults who have no confidence in the media. Round the margin of error to three decimal places. A) 0.189 to 0.231 B) 0.185 to 0.235 C) 0.177 to 0.243 D) 0.171 to 0.249 Answer: A Explanation: 0.21 ± 1.645√(0.21(1 - 0.21)/1,017) 0.21 ± 1.645(0.01277211) 0.21 ± 0.021 0.189 to 0.231 Diff: 2 Type: BI Var: 1 L.O.: 6.2.1
46) What sample size is needed to reduce the margin of error to 1%? A) 4,490 U.S. adults B) 4,553 U.S. adults C) 4,567 U.S. adults D) 4,598 U.S. adults Answer: A Explanation: n = (1.645/0.01)² * 0.21*0.79 = 4,489.3
4,490 U.S. adults Diff: 2 Type: BI Var: 1 L.O.: 6.2.2
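The sample-size questions for a proportion (21, 25, 29, 41, and 46 above) all apply n = (z*/ME)²·p(1 - p) and round up to the next whole person. A minimal sketch, assuming the rounded table values of z* used in the explanations; the function name and the default guess of 0.5 are illustrative.

```python
from math import ceil


def sample_size_for_proportion(z_star, margin, p_guess=0.5):
    """Smallest n giving the desired margin of error for a proportion.

    p_guess is a prior estimate of the proportion; 0.5 is the conservative
    choice when no reasonable estimate is available.
    """
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n_exact)  # always round up so the margin is not exceeded


print(sample_size_for_proportion(1.96, 0.06, 0.476))   # question 21
print(sample_size_for_proportion(1.645, 0.01))         # question 25
print(sample_size_for_proportion(1.645, 0.01, 0.21))   # question 46
```

Because the result is rounded up at the end, small differences in z* (for example 1.645 versus a full-precision value) can change the final answer by a few people.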
Use the given confidence interval for a population proportion p to answer each of the questions below. 47) 0.645 to 0.700 What is the margin of error? A) 0.055 B) 0.0275 C) 0.01375 D) 0.6725 Answer: B Diff: 3 Type: BI Var: 1 L.O.: 3.2.0 48) 0.655 to 0.685 What is the best estimate of p? A) 0.685 B) 0.03 C) 0.655 D) 0.67 Answer: D Diff: 3 Type: BI Var: 1 L.O.: 3.2.0 6.2
Inference for a Mean
Use the following to answer the questions below: A sample of 148 college students at a large university reports getting an average of 6.85 hours of sleep last night with a standard deviation of 2.12 hours. 1) Is it reasonable to use the t-distribution to construct a confidence interval for the average amount of sleep students at this university got last night? A) Yes B) No Answer: A Explanation: A sample of size 148 is large enough to use the Central Limit Theorem. Diff: 2 Type: MC Var: 1 L.O.: 6.4.2
2) Construct a 98% confidence interval for the average amount of sleep students at this university got last night. Use two decimal places in your margin of error. A) 6.44 to 7.26 hours B) 6.49 to 7.21 hours C) 6.53 to 7.17 hours D) 6.38 to 7.32 hours Answer: A Explanation: n = 148 ⇒ df = 147 ⇒ t* = 2.352 (using Statkey) 6.85 ± 2.352(2.12/√148)
6.85 ± 0.41 6.44 to 7.26 hours Diff: 2 Type: BI Var: 1 L.O.: 6.4.3;6.5.1
3) Construct a 98% confidence interval for the average amount of sleep students at this university got last night. Provide an interpretation of your interval in the context of this data situation. Answer: We are 98% sure that the mean amount of sleep students at this university got last night is between 6.44 and 7.26 hours. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.5.0
4) Suppose you want to conduct a similar study at your university. Assuming that the standard deviation of this sample is a reasonable estimate of the standard deviation of sleep time at your university, how many students do you need to survey to estimate the mean sleep time of students at your university with 95% confidence and a margin of error of 0.5 hours? A) 70 students B) 69 students C) 88 students D) 89 students Answer: A Explanation: σ ≈ s = 2.12, so \(n = \left(\frac{z^* \cdot \sigma}{ME}\right)^2 = \left(\frac{1.96 \cdot 2.12}{0.5}\right)^2 = 69.1\). Rounding up, you need to survey 70 students. Diff: 2 Type: BI Var: 1 L.O.: 6.5.2
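A minimal sketch of the sample-size calculation above, assuming the formula n = (z*·σ/ME)² and always rounding up. The function name is hypothetical. The critical value z* is passed in rather than computed so that the result matches the rounded value (1.96) used in the explanation; a more precise z* can shift borderline answers by one.

```python
import math

def sample_size_for_mean(sigma, margin, z_star):
    """Return (unrounded n, n rounded up) for estimating a mean to within ±margin."""
    raw = (z_star * sigma / margin) ** 2
    return raw, math.ceil(raw)   # always round up to guarantee the margin

# Sleep study: sigma estimated by s = 2.12, margin 0.5 hours, z* = 1.96 (95%)
raw, n = sample_size_for_mean(2.12, 0.5, 1.96)
print(round(raw, 1), n)
```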
Use the following to answer the questions below: An Internet provider contacts a random sample of 300 customers and asks how many hours per week the customers use the Internet. The responses are summarized in the provided dotplot. The average amount of time spent on the Internet per week was 7.2 hours, with a standard deviation of 7.9 hours.
5) Is it reasonable to use the t-distribution to construct a confidence interval for the average amount of time customers of this Internet provider spend on the Internet each week? A) Yes B) No Answer: A Explanation: Yes, even though the distribution is skewed with some outliers, the sample size (n = 300) is extremely large. Diff: 2 Type: MC Var: 1 L.O.: 6.4.2
6) Construct a 95% confidence interval for the average amount of time customers of this Internet provider spend on the Internet each week. Round the margin of error to one decimal place. A) 6.3 to 8.1 hours B) 6.0 to 8.4 hours C) 5.7 to 8.7 hours D) 5.5 to 8.9 hours Answer: A Explanation: n = 300 ⇒ df = 299 ⇒ t* = 1.968 \(7.2 \pm 1.968 \cdot \frac{7.9}{\sqrt{300}}\) = 7.2 ± 0.9, giving 6.3 to 8.1 hours Diff: 2 Type: BI Var: 1 L.O.: 6.4.3;6.5.1
7) Construct a 95% confidence interval for the average amount of time customers of this Internet provider spend on the Internet each week. Provide an interpretation of your interval in the context of this data situation. Answer: We are 95% sure that the average amount of time customers of this Internet provider spend on the Internet each week is between 6.3 and 8.1 hours. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.5.0
8) If we want a margin of error of 0.5 hours, how large a sample would we need? A) 960 people B) 953 people C) 952 people D) 944 people Answer: A Explanation: Using z* = 1.96 (95% confidence) and s = 7.9 as the estimate of σ, \(n = \left(\frac{1.96 \cdot 7.9}{0.5}\right)^2 = 959.02\). We would need 960 people. Diff: 2 Type: BI Var: 1 L.O.: 6.5.2
Use the following to answer the questions below: According to a National Science Foundation study, individuals who graduated with a doctoral degree had an average of $14,115 graduate debt. Assume that the standard deviation of graduate debt is $26,400. If we take lots of samples of individuals who graduated with a doctoral degree, what would you expect the standard error of the distribution of sample mean graduate debt amounts to be in each case? In each case, use two decimal places when reporting your standard error.
9) n = 200 individuals Answer: $1,866.76 Explanation: mean: μ = $14,115 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{26{,}400}{\sqrt{200}}\) = $1,866.76 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
10) n = 500 individuals Answer: $1,180.64 Explanation: mean: μ = $14,115 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{26{,}400}{\sqrt{500}}\) = $1,180.64 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
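The two standard errors above follow from SE = σ/√n. A brief sketch (the function name is hypothetical) shows how the standard error shrinks as the sample size grows:

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Graduate debt: sigma = $26,400, samples of 200 and 500 individuals
for n in (200, 500):
    print(n, round(standard_error_of_mean(26_400, n), 2))
```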
Use the following to answer the questions below: For each of the following, assume that the sample is a random sample from a distribution that is reasonably normally distributed and that we are doing inference for a population mean.
11) Find endpoints of a t-distribution with 2.5% beyond them in each tail if the sample has size A) -2.145 and 2.145 B) -1.533 and 1.533 C) -1.918 and 1.918 D) -2.328 and 2.328 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 6.4.3
12) Find endpoints of a t-distribution with 10% beyond them in each tail if the sample has size A) -1.533 and 1.533 B) -2.145 and 2.145 C) -1.918 and 1.918 D) -2.328 and 2.328 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 6.4.3
13) Find endpoints of a t-distribution with 3% beyond them in each tail if the sample has size A) -1.918 and 1.918 B) -2.145 and 2.145 C) -1.533 and 1.533 D) -2.328 and 2.328 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.4.3
14) Find endpoints of a t-distribution with 1.5% beyond them in each tail if the sample has size 22. A) -2.328 and 2.328 B) -2.145 and 2.145 C) -1.533 and 1.533 D) -1.918 and 1.918 Answer: A Diff: 1 Type: BI Var: 1 L.O.: 6.4.3 15) Find the area in a t-distribution to the right of 2.6 if the sample has size Answer: 0.01 Diff: 1 Type: SA Var: 1 L.O.: 6.4.3 16) Find the area in a t-distribution to the right of 1.75 if the sample has size Answer: 0.089 Diff: 2 Type: SA Var: 1 L.O.: 6.4.3 17) Find the area in a t-distribution to the left of -2.7 if the sample has size Answer: 0.0054 Diff: 2 Type: SA Var: 1 L.O.: 6.4.3
18) Find the area in a t-distribution to the left of -0.68 if the sample has size Answer: 0.252 Diff: 2 Type: SA Var: 1 L.O.: 6.4.3
Use the following to answer the questions below: For each of the following, find the standard error of the distribution of sample means. Use two decimal places when reporting your standard error.
19) Samples of size 15 from a population with mean 25 and standard deviation 4. Answer: 1.03 Explanation: mean: μ = 25 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{15}}\) = 1.03 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
20) Samples of size 50 from a population with mean 450 and standard deviation 75. Answer: 10.61 Explanation: mean: μ = 450 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{75}{\sqrt{50}}\) = 10.61 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
21) Samples of size 25 from a population with mean 10 and standard deviation 2. Answer: 0.40 Explanation: mean: μ = 10 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}}\) = 0.40 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
22) Samples of size 250 from a population with mean 80 and standard deviation of 15. Answer: 0.95 Explanation: mean: μ = 80 SE = \(\frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{250}}\) = 0.95 Diff: 2 Type: SA Var: 1 L.O.: 6.4.1
Use the following to answer the questions below: A dotplot and the summary statistics for a sample are provided. In each case, indicate whether or not it is appropriate to use the t-distribution.
23) n = 12; \(\bar{x}\) = 4.75; s = 1.603 A) Appropriate B) Not Appropriate Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.4.2
24) n = 10; \(\bar{x}\) = 7.80; s = 9.28 A) Appropriate B) Not Appropriate Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.4.2
25) n = 100; \(\bar{x}\) = 9.93; s = 9.247 A) Appropriate B) Not Appropriate Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.4.2
26) n = 15; \(\bar{x}\) = 44; s = 7.32 A) Appropriate B) Not Appropriate Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.4.2
Use the following to answer the questions below: Many major television networks air coverage of the incoming election results during primetime hours. The provided boxplot displays the amount of time (in minutes) spent watching election coverage for a random sample of 25 U.S. adults. In this sample, the average time spent watching election coverage was 80.44 minutes with standard deviation of 43.99 minutes.
27) Is it reasonable to use the t-distribution to construct a confidence interval for the average amount of time spent watching election coverage by U.S. adults? A) Yes B) No Answer: A Explanation: Yes. We have a moderate sample size (n = 25) and there are no outliers or extreme skewness in the boxplot. Diff: 2 Type: MC Var: 1 L.O.: 6.4.2
28) Construct a 90% confidence interval for the average amount of time U.S. adults spent watching election coverage. Use two decimal places in your margin of error. A) 65.39 to 95.49 minutes B) 66.24 to 94.64 minutes C) 68.14 to 92.74 minutes D) 71.33 to 89.55 minutes Answer: A Explanation: n = 25 ⇒ df = 24 ⇒ t* = 1.711 \(80.44 \pm 1.711 \cdot \frac{43.99}{\sqrt{25}}\) = 80.44 ± 15.05, giving 65.39 to 95.49 minutes Diff: 2 Type: BI Var: 1 L.O.: 6.5.1
29) Construct a 90% confidence interval for the average amount of time U.S. adults spent watching election coverage. Provide an interpretation of your interval in the context of this data situation. Answer: We are 90% sure that the average amount of time spent watching election coverage by U.S. adults is between 65.39 and 95.49 minutes. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.5.0
30) What sample size would we need to estimate the average amount of time U.S. adults spent watching election coverage with 99% confidence and a margin of error of ± 5 minutes? A) 514 U.S. adults B) 23 U.S. adults C) 22 U.S. adults D) 513 U.S. adults Answer: A Explanation: \(n = \left(\frac{2.575 \cdot 43.99}{5}\right)^2 = 513.24\), so we would need 514 U.S. adults. Diff: 2 Type: BI Var: 1 L.O.: 6.5.2
Use the following to answer the questions below: Turkey is a staple at most traditional Thanksgiving dinners. A random sample of 12 grocery store customers were asked about the size of the turkey they were purchasing for Thanksgiving. The average weight was 13.9 pounds with a standard deviation of 2.2 pounds. The boxplot displays the distribution of the sample turkey weights.
31) Is it reasonable to use the t-distribution to construct a confidence interval for the average weight of turkeys purchased at this store? A) Yes B) No Answer: A Explanation: Yes, even though we have a small sample size, the boxplot is fairly symmetric and there are no outliers. Diff: 2 Type: MC Var: 1 L.O.: 6.4.2
32) Construct a 99% confidence interval for the average weight of turkeys purchased at this store. Round your margin of error to two decimal places. A) 11.93 to 15.87 pounds B) 13.33 to 14.47 pounds C) 12.57 to 15.23 pounds D) 10.80 to 17.01 pounds Answer: A Explanation: n = 12 ⇒ df = 11 ⇒ t* = 3.105 \(13.9 \pm 3.105 \cdot \frac{2.2}{\sqrt{12}}\) = 13.9 ± 1.97, giving 11.93 to 15.87 pounds Diff: 2 Type: BI Var: 1 L.O.: 6.5.1
33) Construct a 99% confidence interval for the average weight of turkeys purchased at this store. Provide an interpretation of your interval in the context of this data situation. Answer: We are 99% sure that the mean weight of turkeys purchased at this store is between 11.93 and 15.87 pounds. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.5.0
34) What sample size would we need to reduce the margin of error to ±1 pound? A) 33 turkeys (customers purchasing turkeys). B) 15 turkeys (customers purchasing turkeys). C) 13 turkeys (customers purchasing turkeys). D) 24 turkeys (customers purchasing turkeys). Answer: A Explanation: Using z* = 2.575 (99% confidence) and s = 2.2 as the estimate of σ, \(n = \left(\frac{2.575 \cdot 2.2}{1}\right)^2 = 32.09\). We would need 33 turkeys (customers purchasing turkeys). Diff: 2 Type: BI Var: 1 L.O.: 6.5.2
35) According to the Minnesota Turkey Growers Association's website, the average weight of turkeys purchased for Thanksgiving dinner is 15 pounds. Test, at the 5% level, if this sample provides evidence that the average weight of turkeys purchased at this store differs from 15 pounds. Include all of the details of the test. Answer: μ = mean weight of turkeys purchased at this store \(H_0: \mu = 15\); \(H_a: \mu \neq 15\) Test statistic: \(t = \frac{13.9 - 15}{2.2/\sqrt{12}} = -1.732\) The sample data look roughly symmetric with no outliers, so we can use the t-distribution with 11 degrees of freedom to compute the p-value. p-value = 0.111 Because the p-value is larger than the 5% significance level, we have no evidence to reject \(H_0\) and thus have no evidence to conclude that the average weight of turkeys purchased at this store differs significantly from the 15 pounds reported by the Minnesota Turkey Growers Association. Diff: 2 Type: ES Var: 1 L.O.: 6.6.1
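The test above can be reproduced with a small standard-library sketch. The function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is an illustrative substitute for StatKey or a statistics package, not the method the test bank uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the Simpson's-rule area under the density on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t(xbar, mu0, s, n):
    """Return (t statistic, df, two-sided p-value) for H0: mu = mu0."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return t, df, 2 * t_upper_tail(t, df)

# Turkey weights: sample mean 13.9, s = 2.2, n = 12, null value 15 pounds
t, df, p = one_sample_t(13.9, 15, 2.2, 12)
print(round(t, 3), df, round(p, 3))
```

For a one-sided alternative, `t_upper_tail(t, df)` alone gives the tail area in the direction of the observed statistic.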
Use the following to answer the questions below: On August 8, 2012, the national average price for a gallon of regular unleaded gasoline was $3.63. The prices for a random sample of gas stations in the state of Illinois were recorded at that time. The mean price for the sampled gas stations was $3.975, with standard deviation $0.2266. A boxplot of the data is provided.
36) Is it reasonable to use the t-distribution to perform a test about the average gas price in Illinois (on August 8, 2012)? A) Yes B) No Answer: A Explanation: Yes, even though the sample size is small (10 gas stations were sampled), the sample data displayed in the boxplot is pretty symmetric, indicating that the data could have reasonably come from a normal population. Diff: 2 Type: MC Var: 1 L.O.: 6.6.0
37) Test, at the 5% level, if there is evidence that the average gas price in Illinois (on August 8, 2012) was significantly higher than the national average. Include all of the details of the test. Answer: μ = average gas price in Illinois (on August 8, 2012) \(H_0: \mu = 3.63\); \(H_a: \mu > 3.63\) Test statistic: \(t = \frac{3.975 - 3.63}{0.2266/\sqrt{10}} = 4.815\) n = 10, so df = 9 Even though the sample size is small, we can use the t-distribution because the sample data are symmetric and thus look like they could reasonably be from a normal population. p-value = 0.00048 (right tail, using StatKey) Because the p-value is much smaller than the 5% significance level, we have very strong evidence to reject the null hypothesis and thus have very strong evidence to conclude that the average gas price in Illinois (on August 8, 2012) was significantly higher than the national average of $3.63. Diff: 2 Type: ES Var: 1 L.O.: 6.6.1
38) Construct a 95% confidence interval for the mean gas price in Illinois (on August 8, 2012). Round your margin of error to three decimal places. A) 3.813 to 4.137 B) 3.822 to 4.128 C) 3.807 to 4.143 D) 3.791 to 4.159 Answer: A Explanation: df = 9 ⇒ t* = 2.262 \(3.975 \pm 2.262 \cdot \frac{0.2266}{\sqrt{10}}\) = 3.975 ± 0.162, giving 3.813 to 4.137 Diff: 2 Type: BI Var: 1 L.O.: 6.5.1
Use the following to answer the questions below: A certain species of tree has an average life span of 130 years. A researcher has noticed a large number of trees of this species washing up along a beach as driftwood. She takes core samples from 27 of those trees, selected at random, to count the number of rings and measure the widths of the rings. Counting the rings allows the researcher to determine the age of each tree. The mean age of the sampled driftwood is 119 years old, with standard deviation 46.92 years. The sample data are plotted in the provided dotplot. One of her interests is determining if this sample provides evidence that the average age of the driftwood is less than the 130 year life span expected for this type of tree. If the average age is less than 130 years it might suggest that the trees have died from unusual causes, such as invasive beetles or logging.
39) Verify that it is reasonable to use the t-distribution to perform a test about the average age of driftwood along this beach. Answer: The sample is of moderate size (n = 27) and there is no major skewness or outliers in the dotplot. Thus, it is reasonable to use a t-distribution for inference about the mean age of driftwood along this beach. Diff: 2 Type: ES Var: 1 L.O.: 6.6.0
40) Test, at the 5% level, if there is evidence that the average age of driftwood along this beach is significantly below 130 years. Include all of the details of the test. Answer: μ = mean age of driftwood along the beach \(H_0: \mu = 130\); \(H_a: \mu < 130\) Test statistic: \(t = \frac{119 - 130}{46.92/\sqrt{27}} = -1.218\) n = 27, so df = 26 Because the data are roughly symmetric with no outliers, we can use the t-distribution to compute the p-value. p-value = 0.117 Because the p-value is larger than the 5% significance level, we have no evidence to reject \(H_0\) and thus have no evidence to conclude that the average age of driftwood along this beach is significantly less than 130 years. Diff: 2 Type: ES Var: 1 L.O.: 6.6.1
Use the following to answer the questions below: A random sample of 48 students at a large university reported getting an average of 7 hours of sleep on weeknights, with standard deviation 1.62 hours. A dotplot of the data is provided.
41) Explain why it is reasonable to use a t-distribution to perform inference about the mean amount of weeknight sleep for students at this university. Answer: Because the sample is of moderate size (n = 48) and there is no major skewness or outliers apparent in the dotplot. Diff: 2 Type: ES Var: 1 L.O.: 6.5.0
42) It is recommended, for most college age students, to get 8 hours of sleep each night. Does this sample provide evidence, at the 5% level, that college students at this university get significantly less sleep, on average, than what is recommended? Include all of the details of the test. Answer: μ = mean amount of weeknight sleep for students at this university \(H_0: \mu = 8\); \(H_a: \mu < 8\) Test statistic: \(t = \frac{7 - 8}{1.62/\sqrt{48}} = -4.28\) Because the sample size (n = 48) is reasonably large, we can use the t-distribution with 47 degrees of freedom to compute the p-value. p-value = 0.000046 (left-tail, using StatKey) Because the p-value is considerably smaller than the 5% significance level, we have very strong evidence to reject \(H_0\) and thus have very strong evidence to conclude that students at this university, on average, get significantly less sleep than the recommended 8 hours. Diff: 2 Type: ES Var: 1 L.O.: 6.6.1
43) Construct a 95% confidence interval for the average amount of weeknight sleep for students at this university. Round the margin of error to two decimal places. A) 6.53 to 7.47 hours B) 6.54 to 7.46 hours C) 6.40 to 7.60 hours D) 6.48 to 7.52 hours Answer: A Explanation: n = 48, so df = 47 and thus t* = 2.012 \(7 \pm 2.012 \cdot \frac{1.62}{\sqrt{48}}\) = 7 ± 0.47, giving 6.53 to 7.47 hours Diff: 2 Type: BI Var: 1 L.O.: 6.5.1
6.3 Inference for a Difference in Proportions
Use the following to answer the questions below: A study published in the American Journal of Health Promotion by researchers at the University of Minnesota (U of M) found that 124 out of 1,923 U of M females had over $6,000 in credit card debt while 61 out of 1,236 males had over $6,000 in credit card debt.
1) Verify that the sample size is large enough in each group to use the normal distribution to construct a confidence interval for a difference in two proportions. Answer: With F = females and M = males: \(n_F \hat{p}_F\) = 124 > 10; \(n_F(1 - \hat{p}_F)\) = 1,799 > 10; \(n_M \hat{p}_M\) = 61 > 10; \(n_M(1 - \hat{p}_M)\) = 1,175 > 10. Since all are greater than 10, the sample size is large enough in each group to use the normal distribution to construct the confidence interval. Diff: 2 Type: ES Var: 1 L.O.: 6.8.0
2) Construct a 95% confidence interval for the difference between the proportions of female and male University of Minnesota students who have more than $6,000 in credit card debt. Round your sample proportions and margin of error to four decimal places. A) -0.0012 to 0.0314 B) -0.0300 to 0.0365 C) -0.0074 to 0.0376 D) 0.0068 to 0.0234 Answer: A Explanation: \(\hat{p}_F - \hat{p}_M\) = 124/1,923 - 61/1,236 = 0.0645 - 0.0494 = 0.0151 \(0.0151 \pm 1.96\sqrt{\frac{0.0645(1 - 0.0645)}{1{,}923} + \frac{0.0494(1 - 0.0494)}{1{,}236}}\) = 0.0151 ± 1.96(0.008328935) = 0.0151 ± 0.0163, giving -0.0012 to 0.0314 Diff: 2 Type: BI Var: 1 L.O.: 6.8.1
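A sketch of the interval calculation above, using the unpooled standard error for a difference in proportions. The function name is hypothetical and z* is passed in. Because the code keeps the sample proportions unrounded, its intermediate values can differ from the hand calculation in the fourth or fifth decimal place.

```python
import math

def diff_prop_interval(x1, n1, x2, n2, z_star):
    """Return (difference, SE, lower, upper) for a CI on p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error, as used for confidence intervals
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    me = z_star * se
    diff = p1 - p2
    return diff, se, diff - me, diff + me

# Credit card debt: 124 of 1,923 females, 61 of 1,236 males, 95% confidence
diff, se, lo, hi = diff_prop_interval(124, 1923, 61, 1236, 1.96)
print(round(diff, 4), round(se, 4), round(lo, 4), round(hi, 4))
```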
3) Test, at the 5% level, if there is evidence that the proportion of female students at U of M with more than $6,000 credit card debt is greater than the proportion of males at U of M with more than $6,000 credit card debt. Include all details of the test. Answer: \(H_0: p_F = p_M\); \(H_a: p_F > p_M\), where \(p_F\) = proportion of female U of M students with more than $6,000 credit card debt and \(p_M\) = proportion of male U of M students with more than $6,000 credit card debt. Pooled proportion (for standard error): \(\hat{p} = \frac{124 + 61}{1{,}923 + 1{,}236} = \frac{185}{3{,}159} = 0.0586\) Test statistic: \(z = \frac{0.0645 - 0.0494}{\sqrt{0.0586(1 - 0.0586)\left(\frac{1}{1{,}923} + \frac{1}{1{,}236}\right)}} = 1.763\) We can use the normal distribution to compute the p-value because both samples have at least 10 successes and failures. p-value = 0.039 (right-tail probability found using StatKey) Because the p-value is less than the 5% significance level, we have evidence to reject the null hypothesis and conclude that the proportion of female U of M students with more than $6,000 in credit card debt is significantly higher than the proportion of male U of M students with more than $6,000 in credit card debt. Diff: 2 Type: ES Var: 1 L.O.: 6.9.1
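The pooled test above can be sketched with the standard library's `statistics.NormalDist`. The function name is hypothetical. The code works from the raw counts, so its z statistic (about 1.77) differs slightly from the hand value of 1.763, which uses proportions rounded to four decimal places; the conclusion is unchanged.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Return (pooled proportion, z, right-tail p-value) for H0: p1 = p2 vs Ha: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error, appropriate under the null hypothesis p1 = p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_right = 1 - NormalDist().cdf(z)
    return pooled, z, p_right

# Credit card debt: 124 of 1,923 females vs 61 of 1,236 males
pooled, z, p = two_prop_z_test(124, 1923, 61, 1236)
print(round(pooled, 4), round(z, 2), round(p, 3))
```

For a two-sided alternative, double the smaller tail area: `2 * (1 - NormalDist().cdf(abs(z)))`.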
Use the following to answer the questions below: Every year since the 1957-58 academic year, the National Science Foundation (NSF) conducts its Survey of Earned Doctorates (SED) of all individuals receiving research doctoral degrees from accredited U.S. institutions. The results from the 2010 survey published on the NSF website indicate that 78.2% of individuals earning their doctorate in the physical sciences have no graduate debt while 48.3% of those earning their doctorate in the social sciences have no graduate debt. Of the 48,069 research doctorates granted in 2010, 93% completed the SED, thus the information collected by the NSF can be good approximations of the population parameters.
4) Suppose we take random samples of 100 individuals who earned a doctorate in the physical sciences (in 2010) and 100 individuals who earned a doctorate in the social sciences (in 2010). Find the mean and standard error (using four decimal places) of the distribution of differences in sample proportions and indicate if the sample sizes are large enough to use the Central Limit Theorem. Answer: mean = 0.782 - 0.483 = 0.299 SE = \(\sqrt{\frac{0.782(1 - 0.782)}{100} + \frac{0.483(1 - 0.483)}{100}}\) = 0.0648 With group 1 = physical sciences and group 2 = social sciences: \(n_1 p_1\) = 100*0.782 = 78.2 > 10; \(n_1(1 - p_1)\) = 100*(1 - 0.782) = 21.8 > 10; \(n_2 p_2\) = 100*0.483 = 48.3 > 10; \(n_2(1 - p_2)\) = 100*(1 - 0.483) = 51.7 > 10. Since all are greater than 10, the sample sizes are large enough to use the Central Limit Theorem. Diff: 2 Type: ES Var: 1 L.O.: 6.7.2;6.7.3
5) Suppose we take random samples of 25 individuals who earned a doctorate in the physical sciences (in 2010) and 50 individuals who earned a doctorate in the social sciences (in 2010). Find the mean and standard error (using four decimal places) of the distribution of differences in sample proportions and indicate if the sample sizes are large enough to use the Central Limit Theorem. Answer: mean = 0.782 - 0.483 = 0.299 SE = \(\sqrt{\frac{0.782(1 - 0.782)}{25} + \frac{0.483(1 - 0.483)}{50}}\) = 0.1087 With group 1 = physical sciences and group 2 = social sciences: \(n_1 p_1\) = 25*0.782 = 19.55 > 10; \(n_1(1 - p_1)\) = 25*(1 - 0.782) = 5.45 < 10 X; \(n_2 p_2\) = 50*0.483 = 24.15 > 10; \(n_2(1 - p_2)\) = 50*(1 - 0.483) = 25.85 > 10. Since \(n_1(1 - p_1)\) is not greater than 10, the sample sizes are NOT large enough to use the Central Limit Theorem. Diff: 2 Type: ES Var: 1 L.O.: 6.7.2;6.7.3
6) Suppose we take random samples of 50 individuals who earned a doctorate in the physical sciences (in 2010) and 25 individuals who earned a doctorate in the social sciences (in 2010). Find the mean and standard error (using four decimal places) of the distribution of differences in sample proportions and indicate if the sample sizes are large enough to use the Central Limit Theorem. Answer: mean = 0.782 - 0.483 = 0.299 SE = \(\sqrt{\frac{0.782(1 - 0.782)}{50} + \frac{0.483(1 - 0.483)}{25}}\) = 0.1157 With group 1 = physical sciences and group 2 = social sciences: \(n_1 p_1\) = 50*0.782 = 39.1 > 10; \(n_1(1 - p_1)\) = 50*(1 - 0.782) = 10.9 > 10; \(n_2 p_2\) = 25*0.483 = 12.075 > 10; \(n_2(1 - p_2)\) = 25*(1 - 0.483) = 12.925 > 10. Since all are greater than 10, the sample sizes are large enough to use the Central Limit Theorem. Diff: 2 Type: ES Var: 1 L.O.: 6.7.2;6.7.3
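The mean, standard error, and sample-size check for a difference in sample proportions can be bundled into one sketch. The function name and the `threshold` parameter are hypothetical; the cutoff of 10 expected successes and failures per group follows the rule used in the answers above.

```python
import math

def diff_prop_sampling_dist(p1, n1, p2, n2, threshold=10):
    """Return (mean, SE, clt_ok) for the distribution of p1_hat - p2_hat."""
    mean = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Expected successes and failures in each group
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    clt_ok = all(c >= threshold for c in counts)
    return mean, se, clt_ok

# No graduate debt: physical sciences p = 0.782, social sciences p = 0.483
for n1, n2 in [(100, 100), (25, 50), (50, 25)]:
    mean, se, ok = diff_prop_sampling_dist(0.782, n1, 0.483, n2)
    print(n1, n2, round(mean, 3), round(se, 4), ok)
```

Only the (25, 50) design fails the check, because 25 × (1 - 0.782) = 5.45 expected failures is below 10.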
Use the following to answer the questions below: Situations comparing two proportions are described. In each case, determine whether the situation involves comparing proportions for two groups or comparing two proportions from the same group.
7) Compare the proportion of U.S. adults who have a positive opinion about the media and the proportion of U.S. adults who have a negative opinion about the media. A) Comparing proportions for two groups B) Comparing two proportions from the same group Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.7.1
8) Compare the proportion of milk chocolate M&M's that are blue to the proportion of milk chocolate M&M's that are green. A) Comparing proportions for two groups B) Comparing two proportions from the same group Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.7.1
9) Compare the proportion of milk chocolate M&M's that are blue to the proportion of dark chocolate M&M's that are blue. A) Comparing proportions for two groups B) Comparing two proportions from the same group Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.7.1
10) Compare the proportion of female students at a university who play a sport to the proportion of male students at a university who play a sport. A) Comparing proportions for two groups B) Comparing two proportions from the same group Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.7.1
11) A study to investigate the dominant paws in cats was described in the scientific journal Animal Behaviour. The researchers used a random sample of 42 domestic cats. In this study, each cat was shown a treat (5 grams of tuna), and while the cat watched, the food was placed inside a jar. The opening of the jar was small enough that the cat could not stick its head inside to remove the treat. The researcher recorded the paw that was first used by the cat to try to retrieve the treat. This was repeated 100 times for each cat (over a span of several days). The paw used most often was deemed the dominant paw (note that one cat used both paws equally and was classified as "ambidextrous"). The researchers were also interested in comparing the proportion of "left-pawed" cats for male and female cats. Of the 21 male cats in the sample, 19 were classified as "left-pawed" while only 1 of the 21 female cats was considered to be "left-pawed". Explain why it would not be appropriate to use the normal distribution to construct a confidence interval for the difference in the proportion of male and female cats that are "left-pawed." Answer: With M = male cats and F = female cats: \(n_M \hat{p}_M\) = 19 > 10; \(n_M(1 - \hat{p}_M)\) = 2 < 10 X; \(n_F \hat{p}_F\) = 1 < 10 X; \(n_F(1 - \hat{p}_F)\) = 20 > 10. Since there are only 2 "failures" (cats that are not left-pawed) in the sample of male cats and only 1 "success" (a left-pawed cat) in the sample of female cats, the sample sizes are not large enough to use the Central Limit Theorem. Diff: 2 Type: ES Var: 1 L.O.: 6.8.1
Use the following to answer the questions below: February 12, 2009 marked the anniversary of Charles Darwin's birth. To celebrate, Gallup, a national polling organization, surveyed 1,018 randomly selected American adults about their education level and their beliefs about the theory of evolution. In their sample, 325 of their respondents had some college education and 228 were college graduates. Among the 325 respondents with some college education, 133 said that they believed in the theory of evolution. Among the 228 respondents who were college graduates, 121 said that they believed in the theory of evolution.
12) Verify that the sample size is large enough in each group to use the normal distribution to construct a confidence interval for a difference in proportions. Answer: \(\hat{p}_S = \frac{133}{325}\) = 0.409 = sample proportion of those with some college education that believe in evolution; \(\hat{p}_G = \frac{121}{228}\) = 0.531 = sample proportion of the college graduates that believe in evolution. \(n_S \hat{p}_S\) = 133 > 10; \(n_S(1 - \hat{p}_S)\) = 192 > 10; \(n_G \hat{p}_G\) = 121 > 10; \(n_G(1 - \hat{p}_G)\) = 107 > 10. Since all are greater than 10, the sample sizes are large enough to apply the Central Limit Theorem and use a normal distribution to construct a confidence interval for a difference in proportions. Diff: 2 Type: ES Var: 1 L.O.: 6.8.0
13) Construct a 90% confidence interval for the difference between the proportions of college graduates and individuals with some college who believe in the theory of evolution. Round your sample proportions and margin of error to three decimal places. A) 0.052 to 0.192 B) 0.038 to 0.206 C) 0.079 to 0.165 D) 0.012 to 0.232 Answer: A Explanation: \(\hat{p}_G = \frac{121}{228}\) = 0.531 = sample proportion of the college graduates that believe in evolution; \(\hat{p}_S = \frac{133}{325}\) = 0.409 = sample proportion of those with some college education that believe in evolution. \(\hat{p}_G - \hat{p}_S\) = 0.531 - 0.409 = 0.122 \(0.122 \pm 1.645\sqrt{\frac{0.531(1 - 0.531)}{228} + \frac{0.409(1 - 0.409)}{325}}\) = 0.122 ± 1.645(0.04284889) = 0.122 ± 0.070, giving 0.052 to 0.192 Diff: 2 Type: BI Var: 1 L.O.: 6.8.1
14) Test, at the 10% level, if there is evidence that the proportion of college graduates that believe in evolution differs significantly from the proportion of individuals with some college education that believe in evolution. Include all of the details of the test. Answer: \(p_G\) = proportion of college graduates that believe in evolution; \(p_S\) = proportion of adults with some college education that believe in evolution \(H_0: p_G = p_S\); \(H_a: p_G \neq p_S\) Pooled proportion (for standard error): \(\hat{p} = \frac{121 + 133}{228 + 325} = \frac{254}{553} = 0.459\) Test statistic: \(z = \frac{0.531 - 0.409}{\sqrt{0.459(1 - 0.459)\left(\frac{1}{228} + \frac{1}{325}\right)}} = 2.834\) Since both samples have more than 10 successes and failures, the sample sizes are large enough to use a normal distribution for computing the p-value. p-value = 0.005 (p-value for a two-sided alternative found using StatKey) Since the p-value is (considerably) smaller than the 10% significance level, we have strong evidence to reject \(H_0\) and thus have strong evidence to conclude that there is a significant difference in the proportion of college graduates and proportion of adults with some college who believe in the theory of evolution. Diff: 2 Type: ES Var: 1 L.O.: 6.9.1
Use the following to answer the questions below: In a survey, Gallup asked a random sample of U.S. adults if they would prefer to have a job outside the home, or if they would prefer to stay home to care for the family and home. Of the 504 males they surveyed, 391 said that they would prefer to have a job outside of the home. Of the 473 females they surveyed, 254 said that they would prefer a job outside of the home.
15) Verify that the sample size is large enough in each group to use the normal distribution to construct a confidence interval for a difference in proportions. Answer: \(\hat{p}_M = \frac{391}{504}\) = 0.776 = sample proportion of males who would prefer to have a job outside of the home; \(\hat{p}_F = \frac{254}{473}\) = 0.537 = sample proportion of females who would prefer to have a job outside of the home. \(n_M \hat{p}_M\) = 391 > 10; \(n_M(1 - \hat{p}_M)\) = 113 > 10; \(n_F \hat{p}_F\) = 254 > 10; \(n_F(1 - \hat{p}_F)\) = 219 > 10. Since all are greater than 10, the sample sizes are large enough to use the normal distribution to construct a confidence interval for a difference in proportions. Diff: 2 Type: ES Var: 1 L.O.: 6.8.0
16) Construct a 99% confidence interval for the difference between the proportion of men and women who would prefer to have a job outside the home. Use three decimal places when computing the sample proportions and margin of error. A) 0.163 to 0.315 B) 0.190 to 0.288 C) 0.181 to 0.297 D) 0.197 to 0.295 Answer: A Explanation: \(\hat{p}_M = \frac{391}{504}\) = 0.776 = sample proportion of males who would prefer to have a job outside of the home; \(\hat{p}_F = \frac{254}{473}\) = 0.537 = sample proportion of females who would prefer to have a job outside of the home. \(\hat{p}_M - \hat{p}_F\) = 0.776 - 0.537 = 0.239 \(0.239 \pm 2.575\sqrt{\frac{0.776(1 - 0.776)}{504} + \frac{0.537(1 - 0.537)}{473}}\) = 0.239 ± 2.575(0.02950484) = 0.239 ± 0.076, giving 0.163 to 0.315 Diff: 2 Type: BI Var: 1 L.O.: 6.8.1
17) Test, at the 1% level, if there is evidence that the proportion of men who would prefer a job outside of the home is significantly higher than the proportion of women who would prefer a job outside of the home. Answer: \(p_M\) = proportion of U.S. adult men who would prefer a job outside of the home; \(p_F\) = proportion of U.S. adult women who would prefer a job outside of the home \(H_0: p_M = p_F\); \(H_a: p_M > p_F\) Pooled proportion (for standard error): \(\hat{p} = \frac{391 + 254}{504 + 473} = \frac{645}{977} = 0.660\) Test statistic: \(z = \frac{0.776 - 0.537}{\sqrt{0.660(1 - 0.660)\left(\frac{1}{504} + \frac{1}{473}\right)}} = 7.881\) Since both samples have at least 10 successes and failures, the sample sizes are large enough to use the normal distribution to compute the p-value. p-value ≈ 0 (using StatKey) Because the p-value is less than the 1% significance level, we have very strong evidence to reject \(H_0\) and thus we have very strong evidence that the proportion of men who would prefer a job outside of the home is significantly larger than the proportion of women who would prefer a job outside of the home. Diff: 2 Type: ES Var: 1 L.O.: 6.9.1
Use the following to answer the questions below: Consider taking random samples of size 50 from Population A with proportion 0.45 and random samples of size 40 from Population B with proportion 0.38.
18) Find the mean of the distribution of differences in sample proportions, \(\hat{p}_A - \hat{p}_B\). A) 0.45 B) 0.38 C) 0.07 D) -0.07 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.7.2
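19) Find the standard error of the distribution of differences in sample proportions, \(\hat{p}_A - \hat{p}_B\). A) 0.0108 B) 0.0704 C) 0.0768 D) 0.1041 Answer: D Explanation: SE = \(\sqrt{\frac{0.45(1 - 0.45)}{50} + \frac{0.38(1 - 0.38)}{40}} = \sqrt{0.01084}\) = 0.1041. (Option A, 0.0108, is the variance, before the square root is taken.) Diff: 2 Type: BI Var: 1 L.O.: 6.7.2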
20) Are the sample sizes for both groups large enough for the Central Limit Theorem to apply so that the differences in sample proportions follow a normal distribution? A) Yes B) No Answer: A Diff: 2 Type: MC Var: 1 L.O.: 6.7.0
Use the following to answer the questions below: Consider taking random samples of size 30 from Population A with proportion 0.84 and random samples of size 60 from Population B with proportion 0.9.
21) Find the mean of the distribution of differences in sample proportions, \(\hat{p}_A - \hat{p}_B\). A) 0.84 B) 0.90 C) -0.06 D) 0.06 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.7.2
22) Find the standard error of the distribution of differences in sample proportions, \(\hat{p}_A - \hat{p}_B\). A) 0.0060 B) 0.0387 C) 0.0669 D) 0.0773 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 6.7.2
23) Are the sample sizes for both groups large enough for the Central Limit Theorem to apply so that the differences in sample proportions follow a normal distribution? A) Yes B) No Answer: B Diff: 2 Type: MC Var: 1 L.O.: 6.7.0
Use the following to answer the questions below: The Gallup organization recently conducted a survey of 1,015 randomly selected U.S. adults about "Black Friday" shopping. They asked the following question: "As you know, the Friday after Thanksgiving is one of the biggest shopping days of the year. Looking ahead, do you personally plan on shopping on the Friday after Thanksgiving, or not?" Of the 515 men who responded, 16% said "Yes." Of the 500 women who responded, 20% said "Yes."
24) Construct a 95% confidence interval for the difference between the proportion of men and women who planned to shop on the Friday after Thanksgiving. Use three decimal places when computing the margin of error. A) -0.087 to 0.007 B) -0.080 to 0.000 C) -0.102 to 0.022 D) -0.103 to 0.023 Answer: A
Explanation: $n_M \hat{p}_M = 515(0.16) = 82.4 > 10$, $n_M(1 - \hat{p}_M) = 432.6 > 10$, $n_W \hat{p}_W = 500(0.20) = 100 > 10$, $n_W(1 - \hat{p}_W) = 400 > 10$
Since both samples have more than 10 successes and 10 failures, the sample sizes are large enough to use a normal distribution to construct the confidence interval.
$(0.16 - 0.20) \pm 1.96\sqrt{\frac{0.16(0.84)}{515} + \frac{0.20(0.80)}{500}}$
-0.04 ± 1.96(0.02410334)
-0.04 ± 0.047
-0.087 to 0.007
Diff: 2 Type: BI Var: 1 L.O.: 6.8.1
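The interval in question 24 can be checked numerically. This is a small Python sketch under the same setup; the helper name `diff_props_ci` is an illustrative assumption, and the critical value 1.96 is taken from the explanation above rather than computed.

```python
import math

def diff_props_ci(p1, n1, p2, n2, z_star=1.96):
    """Normal-based confidence interval for p1 - p2 (unpooled standard error)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = round(z_star * se, 3)  # the question asks for three decimal places
    diff = p1 - p2
    return diff - margin, diff + margin

# Men: 16% of 515; women: 20% of 500
low, high = diff_props_ci(0.16, 515, 0.20, 500)
print(f"{low:.3f} to {high:.3f}")
```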
25) Test, at the 5% level, if this sample provides evidence that the proportion of women planning to shop on Black Friday differs significantly from the proportion of men planning to shop. Include all of the details of the test.
Answer: $p_W$ = proportion of women planning to shop on Black Friday, $p_M$ = proportion of men planning to shop on Black Friday
$H_0: p_W = p_M$
$H_a: p_W \neq p_M$
Pooled proportion (for standard error): $\hat{p} = \frac{100 + 82.4}{500 + 515} = 0.1797$, or about 0.18
Test Statistic: $z = \frac{0.20 - 0.16}{\sqrt{\frac{0.18(0.82)}{500} + \frac{0.18(0.82)}{515}}} = 1.658$
Because both samples have more than 10 successes and 10 failures, the sample sizes are large enough to use the normal distribution to compute the p-value. p-value = 0.097 Because the p-value is larger than the 5% significance level, we have no evidence to reject $H_0$ and thus have no evidence to conclude that the proportions of men and women planning to shop the Friday after Thanksgiving are significantly different. Diff: 3 Type: ES Var: 1 L.O.: 6.9.1
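The pooled test in question 25 can be sketched in Python using only the standard library. The function name `two_prop_z_test` is an illustrative assumption. Because this sketch keeps the pooled proportion unrounded (0.1797 rather than 0.18), the z-statistic comes out near 1.659 instead of 1.658; the p-value still rounds to 0.097.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value

# Women: 500(0.20) = 100 "Yes"; men: 515(0.16) = 82.4 "Yes"
# (not a whole number because the reported percentage is rounded)
pooled, z, p_value = two_prop_z_test(100, 500, 82.4, 515)
print(round(pooled, 4), round(z, 2), round(p_value, 3))
```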
6.4 Inference for a Difference in Means
Use the following to answer the questions below: Consider taking random samples of size 50 from Population A with mean 15 and standard deviation 3 and random samples of size 75 from Population B with mean 10 and standard deviation 5.
1) Find the mean of the distribution of differences in sample means, $\bar{x}_A - \bar{x}_B$. A) -5 B) 5 C) 10 D) 15 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
2) Find the standard error of the distribution of differences in sample means, $\bar{x}_A - \bar{x}_B$. A) 0.1267 B) 0.3559 C) 0.5133 D) 0.7165 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
3) How many degrees of freedom should be used when conducting inference for samples of this size? A) 49 B) 50 C) 74 D) 75 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.10.0
Use the following to answer the questions below: Consider taking random samples of size 100 from Population A with mean 85 and standard deviation of 15 and random samples of size 60 from Population B with mean 78 and standard deviation 12.
4) Find the mean of the distribution of differences in sample means, $\bar{x}_A - \bar{x}_B$. A) -7 B) 7 C) 78 D) 85 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
5) Find the standard error of the distribution of differences in sample means, $\bar{x}_A - \bar{x}_B$. A) 0.35 B) 0.5916 C) 2.156 D) 4.65 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
6) How many degrees of freedom should be used when conducting inference for samples of this size? A) 99 B) 100 C) 59 D) 60 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.10.0
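The mean, standard error, and degrees of freedom asked for in questions 1 through 6 follow one pattern, sketched below in Python. The function name is an illustrative assumption, and the degrees of freedom use the conservative "smaller sample size minus one" rule that the answers in this section rely on.

```python
import math

def diff_means_distribution(mu_a, sd_a, n_a, mu_b, sd_b, n_b):
    """Mean and standard error of x_bar_A - x_bar_B, plus conservative df."""
    mean = mu_a - mu_b
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    df = min(n_a, n_b) - 1  # conservative rule: smaller n minus 1
    return mean, se, df

first = diff_means_distribution(15, 3, 50, 10, 5, 75)     # questions 1-3
second = diff_means_distribution(85, 15, 100, 78, 12, 60)  # questions 4-6
print(first[0], round(first[1], 4), first[2])
print(second[0], round(second[1], 3), second[2])
```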
Use the following to answer the questions below: Students in a large lecture class want to know who has, on average, more Facebook friends, male or female students. The data for the students are displayed in the provided dotplots and summary statistics are available in the provided table.
Gender      Male    Female
n           68      91
$\bar{x}$   678.9   616.1
s           317.6   330.6
7) Is it reasonable to use a t-distribution for inference about the difference in mean number of Facebook friends for male and female students at this university? A) Yes B) No Answer: A Explanation: Yes, even though the data are skewed for each sample, the sample sizes are rather large. Diff: 2 Type: MC Var: 1 L.O.: 6.11.0;6.12.0
8) Construct a 95% confidence interval for the difference in mean number of Facebook friends for male and female students at this university. Use two decimal places in your margin of error. A) -40.62 to 166.22 B) -22.43 to 148.03 C) -38.75 to 164.35 D) -33.95 to 159.55 Answer: A
Explanation: Use the smaller of $n_M - 1$ and $n_F - 1$ as degrees of freedom, so df = 67 (and thus t* = 1.996)
$(678.9 - 616.1) \pm 1.996\sqrt{\frac{317.6^2}{68} + \frac{330.6^2}{91}}$
62.8 ± 1.996(51.81156)
62.8 ± 103.42
-40.62 to 166.22
Diff: 2 Type: BI Var: 1 L.O.: 6.11.1
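The interval in question 8 can be verified with a few lines of Python. This is a sketch only: the helper name `diff_means_ci` is an illustrative assumption, and the critical value t* = 1.996 (df = 67) is supplied from the answer key rather than computed.

```python
import math

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, t_star):
    """t-based confidence interval for mu1 - mu2 from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = round(t_star * se, 2)  # the question asks for two decimal places
    diff = mean1 - mean2
    return round(diff - margin, 2), round(diff + margin, 2)

# Male: n = 68, mean 678.9, s 317.6; Female: n = 91, mean 616.1, s 330.6
low, high = diff_means_ci(678.9, 317.6, 68, 616.1, 330.6, 91, t_star=1.996)
print(low, high)
```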
9) Test, at the 5% level, if this sample provides evidence of a significant difference in the mean number of Facebook friends for male and female students at this university. Include all of the details of the test.
Answer: $\mu_M$ = mean number of Facebook friends for male students at this university, $\mu_F$ = mean number of Facebook friends for female students at this university
$H_0: \mu_M = \mu_F$
$H_a: \mu_M \neq \mu_F$
Test Statistic: $t = \frac{678.9 - 616.1}{\sqrt{\frac{317.6^2}{68} + \frac{330.6^2}{91}}} = 1.212$
Because of the large sample sizes, we can use the t-distribution (with 67 degrees of freedom) to compute the p-value. p-value = 0.23 (two-tail, using StatKey) Because the p-value is much larger than the 5% significance level, we have no evidence to reject $H_0$ and thus have no evidence to conclude that there is a significant difference in the mean number of Facebook friends between male and female students at this university. Diff: 2 Type: ES Var: 1 L.O.: 6.12.1
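The answer above reads the p-value from StatKey. As a rough cross-check that needs only the Python standard library, the sketch below integrates the t density numerically with Simpson's rule. The function names are illustrative assumptions, and this stands in for (rather than reproduces) whatever StatKey does internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sample_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t-statistic for H0: mu1 = mu2 from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / se

t_stat = two_sample_t_statistic(678.9, 317.6, 68, 616.1, 330.6, 91)
df = min(68, 91) - 1  # conservative degrees of freedom
p_two_tail = 2 * t_upper_tail(abs(t_stat), df)
print(round(t_stat, 3), round(p_two_tail, 2))
```

For a right-tail test such as question 11, `t_upper_tail` would be used directly without doubling.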
Use the following to answer the questions below: As part of a course project, a statistics student surveyed random samples of 50 student athletes and 50 student non-athletes at his university, with the goal of comparing the heights of the two groups. His summary statistics are displayed in the provided table.
            Athletes   Non-athletes
n           50         50
$\bar{x}$   68.96      67.28
s           4.25       3.46
10) Construct a 99% confidence interval for the difference in mean heights between student athletes and non-athletes at this university. Use two decimal places in your margin of error. A) -0.40 to 3.76 B) -0.32 to 3.68 C) -0.16 to 3.20 D) -0.20 to 3.56 Answer: A
Explanation: n = 50 is a reasonably large sample size (and there is no real reason to suspect that heights are drastically skewed), so we can use a t-distribution to construct the confidence interval. n = 50, so df = 49 and thus t* = 2.68
$(68.96 - 67.28) \pm 2.68\sqrt{\frac{4.25^2}{50} + \frac{3.46^2}{50}}$
1.68 ± 2.68(0.7750368)
1.68 ± 2.08
-0.40 to 3.76
Diff: 2 Type: BI Var: 1 L.O.: 6.11.1
11) Test, at the 5% level, if student athletes at this university are significantly taller, on average, than student non-athletes. Include all of the details.
Answer: $\mu_A$ = mean height of student athletes at the university, $\mu_N$ = mean height of student non-athletes at the university
$H_0: \mu_A = \mu_N$
$H_a: \mu_A > \mu_N$
Test Statistic: $t = \frac{68.96 - 67.28}{\sqrt{\frac{4.25^2}{50} + \frac{3.46^2}{50}}} = 2.168$
This is a reasonably large sample size (and there is no real reason to suspect that heights are drastically skewed), so we can use a t-distribution to compute the p-value for the test. Because n = 50 for both samples, we use df = 49. p-value = 0.018 (right-tail, using StatKey) Because the p-value is smaller than the 5% significance level, we have evidence to reject $H_0$ and thus have evidence to conclude that student athletes are significantly taller, on average, than student non-athletes at this university. Diff: 2 Type: ES Var: 1 L.O.: 6.12.1
Use the following to answer the questions below: A professor with a large introductory statistics class noticed that nearly half of his students missed class the day before a long break (like Thanksgiving Break or Spring Break). He randomly called on students and found 10 students in attendance and 10 students who had skipped class. Later in his office, he examined the current course grades for the 20 students he had selected. A plot of his findings and summary statistics are provided. Note that the grades were entered as proportions, and thus a grade of 0.925 is a 92.5% in the course.
            Attending   Not Attending
n           10          10
$\bar{x}$   0.8669      0.7811
s           0.0765      0.0805
12) Verify that it is reasonable to use a t-distribution to construct a confidence interval, or perform a test about, the difference in mean grades for the two groups of students. Answer: The sample sizes are small, but the dotplots indicate that data are reasonably symmetric with no serious outliers (thus the population they came from is likely somewhat symmetric). This means it would be okay to use a t-distribution for inference. Diff: 2 Type: ES Var: 1 L.O.: 6.11.0;6.12.0
13) Construct a 98% confidence interval for the difference between the mean grades for students attending and not attending class the day before break. Use four decimal places in your margin of error. A) -0.0132 to 0.1848 B) 0.0278 to 0.1438 C) -0.0042 to 0.1758 D) -0.0095 to 0.1811 Answer: A
Explanation: n = 10 for both samples, so df = 9 and thus t* = 2.821
$(0.8669 - 0.7811) \pm 2.821\sqrt{\frac{0.0765^2}{10} + \frac{0.0805^2}{10}}$
0.0858 ± 2.821(0.03511766)
0.0858 ± 0.0990
-0.0132 to 0.1848
Diff: 2 Type: BI Var: 1 L.O.: 6.11.1
14) Test, at the 5% level, if there is evidence that students who attended class before break have, on average, a significantly higher course grade than those who skipped. Include all of the details of the test.
Answer: $\mu_A$ = mean course grade for students who attended the class before break, $\mu_N$ = mean course grade for students who skipped the class before break
$H_0: \mu_A = \mu_N$
$H_a: \mu_A > \mu_N$
Test Statistic: $t = \frac{0.8669 - 0.7811}{\sqrt{\frac{0.0765^2}{10} + \frac{0.0805^2}{10}}} = 2.44$
While both sample sizes are small, the dotplots appear reasonably symmetric and don't have any major outliers, so we can use a t-distribution to compute the p-value (with 9 degrees of freedom since both samples are of size 10). p-value = 0.019 (right tail, using StatKey) Because the p-value is smaller than the 5% significance level, we have evidence that students who attended the class before break have, on average, a significantly higher course grade than the students who skipped. Diff: 2 Type: ES Var: 1 L.O.: 6.12.1
Use the following to answer the questions below: Consider a test of $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$ using the sample results (sample means, standard deviations, and sample sizes not shown).
15) What is the test statistic for this test? A) -1.688 B) 1.688 C) 4.74 D) 0.7701 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.12.0
16) What are the degrees of freedom for this test? A) 4 B) 23 C) 25 D) 27 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.12.0
17) What value is closest to the p-value for this test? A) 0.104 B) 0.948 C) 0.0014 D) 0.052 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 6.12.0
Use the following to answer the questions below: Consider constructing a 90% confidence interval for $\mu_1 - \mu_2$ using the sample results (sample means, standard deviations, and sample sizes not shown).
18) What is the best estimate of $\mu_1 - \mu_2$? A) 7 B) -7 C) 103 D) 96 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.11.0
19) What are the degrees of freedom in this situation? A) 10 B) 49 C) 35 D) 39 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 6.11.0
20) What is the t* for the 90% confidence interval? A) 1.645 B) 1.677 C) 1.685 D) 2.023 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.11.0 21) What is the margin of error for this confidence interval? A) 8.44 B) 5.01 C) 1.73 D) 11.91 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 6.11.0
Use the following to answer the questions below: A small university is trying to monitor its electricity usage. For a random sample of 30 weekend days (Saturdays and Sundays), the student center used an average of 94.26 kilowatt hours (kWh) with standard deviation 43.29. For a random sample of 60 weekdays (Monday - Friday), the student center used an average of 112.63 kWh with standard deviation 32.07.
22) Test, at the 5% level, if significantly more electricity is used at the student center, on average, on weekdays than weekend days. Include all details of the test.
Answer: $\mu_1$ = mean amount of electricity (in kWh) used on weekdays, $\mu_2$ = mean amount of electricity (in kWh) used on weekend days
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 > \mu_2$
Test Statistic: $t = \frac{112.63 - 94.26}{\sqrt{\frac{32.07^2}{60} + \frac{43.29^2}{30}}} = 2.059$
Both sample sizes are large, so we can use the t-distribution with 29 degrees of freedom. p-value = 0.024 (right tail, using StatKey) There is evidence to reject $H_0$ and thus there is evidence to conclude that more electricity is used at the student center, on average, on weekdays than on weekend days. Diff: 2 Type: ES Var: 1 L.O.: 6.12.1
23) Construct a 95% confidence interval for the difference in mean electricity use at the student center between weekdays and weekend days. Use two decimal places in your margin of error. A) 0.12 to 36.62 kWh B) -6.80 to 43.54 kWh C) -4.61 to 41.35 kWh D) -3.38 to 40.51 kWh Answer: A
Explanation: Both sample sizes are large so we can use the t-distribution to construct the confidence interval. Use df = 29, so t* = 2.045
$(112.63 - 94.26) \pm 2.045\sqrt{\frac{32.07^2}{60} + \frac{43.29^2}{30}}$
18.37 ± 2.045(8.922381)
18.37 ± 18.25
0.12 to 36.62 kWh
Diff: 2 Type: BI Var: 1 L.O.: 6.11.1
Use the following to answer the questions below: In a given year, the average score on the Mathematics portion of the ACT for males was 21.3 with standard deviation 5.3. The average score on the Mathematics portion of the ACT for females was 20.2 with standard deviation 4.8.
24) If random samples are taken with 50 males and 70 females, find the standard error of the distribution of differences in sample means, $\bar{x}_M - \bar{x}_F$, where $\bar{x}_M$ and $\bar{x}_F$ represent the sample means for males and females, respectively. Report the standard error with four decimal places. Answer: 0.9439 Explanation: standard error: $\sqrt{\frac{5.3^2}{50} + \frac{4.8^2}{70}} = 0.9439$ Diff: 2 Type: SA Var: 1 L.O.: 6.10.1
25) If random samples are taken with 60 males and 60 females, find the standard error of the distribution of differences in sample means, $\bar{x}_M - \bar{x}_F$, where $\bar{x}_M$ and $\bar{x}_F$ represent the sample means for males and females, respectively. Report the standard error with four decimal places. Answer: 0.9231 Explanation: standard error: $\sqrt{\frac{5.3^2}{60} + \frac{4.8^2}{60}} = 0.9231$ Diff: 2 Type: SA Var: 1 L.O.: 6.10.1
26) If random samples are taken with 120 males and 105 females, find the standard error of the distribution of differences in sample means, $\bar{x}_M - \bar{x}_F$, where $\bar{x}_M$ and $\bar{x}_F$ represent the sample means for males and females, respectively. Report your standard error with four decimal places. Answer: 0.6734 Explanation: standard error: $\sqrt{\frac{5.3^2}{120} + \frac{4.8^2}{105}} = 0.6734$ Diff: 2 Type: SA Var: 1 L.O.: 6.10.1
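The three ACT standard errors can be computed side by side to see how the spread responds to sample size. This is a brief Python sketch; the constant and function names are illustrative assumptions.

```python
import math

SD_MALE, SD_FEMALE = 5.3, 4.8  # population standard deviations given above

def se_diff_means(n_male, n_female):
    """Standard error of x_bar_M - x_bar_F for the given sample sizes."""
    return math.sqrt(SD_MALE**2 / n_male + SD_FEMALE**2 / n_female)

for n_m, n_f in [(50, 70), (60, 60), (120, 105)]:
    print(n_m, n_f, round(se_diff_means(n_m, n_f), 4))
```

The center of the distribution, 21.3 - 20.2 = 1.1, does not depend on the sample sizes, while the standard error shrinks as they grow.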
27) What effect does increasing the sample sizes have on the center of the distribution? A) It increases the center of the distribution. B) It decreases the center of the distribution. C) It has no effect on the center of the distribution. Answer: C Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
28) What effect does increasing the sample sizes have on the spread of the distribution? A) It increases the spread of the distribution. B) It decreases the spread of the distribution. C) It has no effect on the spread of the distribution. Answer: B Diff: 2 Type: BI Var: 1 L.O.: 6.10.1
6.5 Paired Difference in Means
Use the following to answer the questions below: Students in a small statistics class were asked to count the number of scars both on their "dominant" hand (the one they use most often) and on their "off" hand. The summary statistics are provided. It is of interest to compare the average number of scars on the dominant and off hands.
            Dominant   Off     Difference (D - O)
n           25         25      25
$\bar{x}$   1.92       2.72    -0.8
s           2.326      3.007   2.363
1) Why is it appropriate to use paired data in this analysis? Explain briefly. Answer: Paired data difference in means would be more appropriate here because each student counts the number of scars on both their dominant and their off hand. Since there are two measurements on each student, this is paired data. Diff: 2 Type: ES Var: 1 L.O.: 6.13.1
2) Boxplots of the raw data are provided. Would it be appropriate to use a t-distribution to construct a confidence interval for, or perform a test about, the difference in the mean number of scars on dominant and off hands? Specifically mention which boxplot(s) you are using to justify your answer.
Answer: Since this is a paired data problem, we need to look at the boxplot of the differences. Since the differences look reasonably symmetric, it is appropriate to construct a confidence interval for, or perform a test about, the difference in the mean number of scars on dominant and off hands. Diff: 2 Type: ES Var: 1 L.O.: 6.13.0;6.13.3;6.13.4
3) Construct a 90% confidence interval for the difference in mean number of scars on dominant and off hands. Round your margin of error to two decimal places. A) -1.61 to 0.01 B) -1.67 to 0.07 C) -1.71 to 0.11 D) -1.81 to 0.21 Answer: A
Explanation: n = 25, so df = 24. The t endpoint for a 90% confidence interval when there are 24 degrees of freedom is t* = 1.711 (found using StatKey).
$-0.8 \pm 1.711\left(\frac{2.363}{\sqrt{25}}\right)$
-0.8 ± 0.81
-1.61 to 0.01
Diff: 2 Type: BI Var: 1 L.O.: 6.13.3
4) Test to see if the mean number of scars on dominant hands is significantly different from the mean number of scars on off hands. Use a 10% significance level. Include all of the details of the test.
Answer: $\mu_d$ = difference in mean number of scars on dominant and off hands (dominant - off) (Note: $\mu_d$ = mean number of scars on dominant hand - mean number of scars on off hand would also be acceptable.)
$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$
Test statistic: $t = \frac{-0.8}{2.363/\sqrt{25}} = -1.693$
n = 25, so df = 24
p-value = 0.103 (two-sided p-value found in StatKey using a t distribution with 24 degrees of freedom) Since the p-value is larger than the 10% significance level, we have no evidence to reject $H_0$ and thus have no evidence to conclude that there is a significant difference in the mean number of scars on dominant and off hands. Diff: 2 Type: ES Var: 1 L.O.: 6.13.4
Use the following to answer the questions below: The Math and Verbal SAT scores for a random sample of 10 students from a large introductory statistics course are provided.
Student     Math SAT   Verbal SAT   Difference (Math - Verbal)
1           640        580          60
2           620        690          -70
3           700        560          140
4           680        560          120
5           600        600          0
6           720        700          20
7           770        620          150
8           580        620          -40
9           660        710          -50
10          580        500          80
$\bar{x}$   655        614          41
s           63.1       68.8         81.0
5) Which data analysis method is more appropriate in this situation: paired data difference in means or difference in means with two separate groups? A) Paired data difference in means B) Difference in means with two separate groups Answer: A Explanation: Paired data difference in means is more appropriate in this situation because each individual in the sample provided two responses (one for Math SAT and one for Verbal SAT). Diff: 2 Type: BI Var: 1 L.O.: 6.13.1
6) Boxplots of the raw data are provided. Would it be appropriate to use a t-distribution to construct a confidence interval for, or perform a test about, the difference in the mean Math and Verbal SAT scores? Specifically mention which boxplot(s) you are using to justify your answer.
Answer: Since this is a paired data problem, we need to look at the boxplot of the differences. Since the differences look reasonably symmetric, it is appropriate to construct a confidence interval for, or perform a test about, the difference in the mean Math and Verbal SAT scores. Diff: 2 Type: ES Var: 1 L.O.: 6.13.0
7) Construct a 90% confidence interval for the difference in mean Math and Verbal SAT scores for students in the class. Use two decimal places in your margin of error. A) -5.95 to 87.95 B) -9.20 to 91.20 C) -10.13 to 92.13 D) -8.439 to 90.44 Answer: A
Explanation: n = 10, so df = 9 and thus t* = 1.833.
$41 \pm 1.833\left(\frac{81.0}{\sqrt{10}}\right)$
41 ± 1.833(25.614)
41 ± 46.95
-5.95 to 87.95
Diff: 2 Type: BI Var: 1 L.O.: 6.13.3
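The paired analysis can also be run from the raw SAT scores instead of the summary row. The Python sketch below computes the differences, their mean and standard deviation, and the 90% margin of error; t* = 1.833 is taken from the explanation above. Because the sketch keeps the unrounded standard deviation (about 81.03 rather than 81.0), its margin of error is roughly 46.97 instead of 46.95.

```python
import math
import statistics

math_sat = [640, 620, 700, 680, 600, 720, 770, 580, 660, 580]
verbal_sat = [580, 690, 560, 560, 600, 700, 620, 620, 710, 500]

# Paired data: one difference (Math - Verbal) per student.
diffs = [m - v for m, v in zip(math_sat, verbal_sat)]
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)           # sample standard deviation
se = sd_d / math.sqrt(len(diffs))

t_star = 1.833                           # 90% confidence, df = 9 (from the answer key)
margin = t_star * se
print(diffs)
print(mean_d, round(sd_d, 1), round(mean_d - margin, 2), round(mean_d + margin, 2))
```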
8) Test, at the 10% level, if Math SAT scores are significantly higher, on average, than Verbal SAT scores for students in the class. Include all of the details of the test.
Answer: $\mu_M$ = mean Math SAT score for students in the class, $\mu_V$ = mean Verbal SAT score for students in the class
$H_0: \mu_M = \mu_V$
$H_a: \mu_M > \mu_V$
Test statistic: $t = \frac{41}{81.0/\sqrt{10}} = 1.60$
Because the differences are relatively symmetric, we can use the t-distribution with 9 degrees of freedom to compute the p-value. p-value = 0.072 (right tail, using StatKey) Because the p-value is smaller than the 10% significance level, there is some (somewhat weak) evidence to reject $H_0$ and thus there is some (somewhat weak) evidence to conclude that Math SAT scores are, on average, significantly higher than Verbal SAT scores for students in the class. Diff: 2 Type: ES Var: 1 L.O.: 6.13.4
Use the following to answer the questions below: As part of a course project, a statistics student surveyed random samples of 50 student athletes and 50 student non-athletes at his university, with the goal of comparing the heights of the two groups. His summary statistics are displayed in the provided table.
            Athletes   Non-athletes
n           50         50
$\bar{x}$   68.96      67.28
s           4.25       3.46
9) Which data analysis method is more appropriate in this situation: paired data difference in means or difference in means with two separate groups? A) Difference in means with two separate groups B) Paired data difference in means Answer: A Explanation: Difference in means with two separate groups is the more appropriate method in this situation. An individual in one sample cannot be "connected" to an individual in the other sample, so the data are not paired. Diff: 2 Type: BI Var: 1 L.O.: 6.13.1
Use the following to answer the questions below: "Black Friday," which occurs annually the day after Thanksgiving, is one of the biggest shopping days of the year. During the holiday season, many stores created controversy by starting their mega-sales on Thanksgiving itself. In a random sample of 25 individuals who shopped during the Black Friday four-day weekend (Thursday - Sunday), the average amount spent was $399.40 with standard deviation $171.10. The data are displayed in the provided dotplot.
10) Construct a 95% confidence interval for the average amount spent by individuals who shopped over the Black Friday weekend. Use two decimal places in your margin of error. A) $328.77 to $470.03 B) $311.28 to $487.52 C) $332.33 to $466.47 D) $324.11 to $474.69 Answer: A
Explanation: First note that the distribution of amounts spent does not seem to be severely skewed and there are no major outliers, and thus with a sample of size n = 25 we can use the t-distribution with 24 degrees of freedom to construct the confidence interval.
$399.40 \pm 2.064\left(\frac{171.10}{\sqrt{25}}\right)$
399.40 ± 2.064(34.22)
399.40 ± 70.63
$328.77 to $470.03
Diff: 2 Type: BI Var: 1 L.O.: 6.5.1
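The single-mean interval in question 10 reduces to one line of arithmetic, sketched here in Python. The helper name `one_mean_ci` is an illustrative assumption, and t* = 2.064 (df = 24) is taken from the explanation above rather than computed.

```python
import math

def one_mean_ci(mean, s, n, t_star):
    """t-based confidence interval for a single mean from summary statistics."""
    se = s / math.sqrt(n)
    margin = round(t_star * se, 2)  # the question asks for two decimal places
    return round(mean - margin, 2), round(mean + margin, 2)

# n = 25 shoppers, mean $399.40, standard deviation $171.10
low, high = one_mean_ci(399.40, 171.10, 25, t_star=2.064)
print(low, high)
```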
11) Construct a 95% confidence interval for the average amount spent by individuals who shopped over the Black Friday weekend. Provide an interpretation of your interval in the context of this data situation. Answer: We are 95% sure that the average amount spent by individuals shopping over the 2012 Black Friday weekend is between $328.77 and $470.03. Diff: 2 Type: ES Var: 1 L.O.: 3.2.4;6.5.0
12) A natural question would be if more money was spent over the Black Friday weekend than over the previous year's Black Friday weekend (which did not start on Thursday). What information would be necessary to address this question using the paired data difference in means method? How could the data be collected so that the difference in means for two separate groups method would be most appropriate? Answer: In order to analyze the data as paired, we would need to know how much each of the 25 individuals in the sample spent last year vs. this year. We would need to be able to match up the two amounts for each individual to compute the differences in the amount spent. In order to analyze the data using two separate groups, we would just need responses from a sample of individuals about how much they spent in the previous year—they can be completely different people. Diff: 3 Type: ES Var: 1 L.O.: 6.13.1
13) Suppose we know that in a random sample of n = 22 individuals who shopped over Black Friday weekend in 2011 the average amount spent was $381.30 with standard deviation $119.80. Construct a 95% confidence interval for the difference in the mean amount spent between the 2012 and 2011 Black Friday weekends. Round the margin of error to two decimal places. Recall that for the 2012 sample of 25 individuals, the average amount spent was $399.40 with standard deviation $171.10. Dotplots of both samples are provided.
A) -$70.72 to $106.92 B) -$91.86 to $128.06 C) -$65.59 to $101.79 D) -$76.07 to $112.27 Answer: A
Explanation: First note that neither of the two samples appears to be severely skewed or have major outliers, and thus it is reasonable to use a t-distribution to construct the confidence interval. df = 21 (smaller of $n_1 - 1$ and $n_2 - 1$), so t* = 2.08
$(399.40 - 381.30) \pm 2.08\sqrt{\frac{171.10^2}{25} + \frac{119.80^2}{22}}$
18.10 ± 2.08(42.701)
18.10 ± 88.82
-$70.72 to $106.92
Diff: 2 Type: BI Var: 1 L.O.: 6.11.1
14) Would the 95% confidence interval provide evidence that the average amount spent over the 2011 Black Friday weekend differs from the average amount spent over the 2012 Black Friday weekend? A) No B) Yes Answer: A Explanation: No, the confidence interval contains 0, as well as positive and negative values, for the plausible values for the difference in the population means. Thus, we cannot conclude which year has the higher population mean. Diff: 2 Type: MC Var: 1 L.O.: 6.11.0
15) Suppose we know that in a random sample of n = 22 individuals who shopped over Black Friday weekend in 2011 the average amount spent was $381.30 with standard deviation $119.80. Recall that for the 2012 sample of 25 individuals, the average amount spent was $399.40 with standard deviation $171.10. Dotplots of both samples are provided.
Test, at the 5% level, if the samples provide evidence that Black Friday shoppers spent more, on average, in 2012 than they did in 2011. Include all of the details of the test.
Answer: $\mu_{2012}$ = mean amount spent by shoppers over Black Friday weekend in 2012, $\mu_{2011}$ = mean amount spent by shoppers over Black Friday weekend in 2011
$H_0: \mu_{2012} = \mu_{2011}$
$H_a: \mu_{2012} > \mu_{2011}$
Test Statistic: $t = \frac{399.40 - 381.30}{\sqrt{\frac{171.10^2}{25} + \frac{119.80^2}{22}}} = 0.424$
Neither of the two samples appears to be severely skewed or have major outliers, and thus it is reasonable to use a t-distribution to compute the p-value. p-value = 0.338 (right-tail, using StatKey) Because the p-value is so much larger than the 5% significance level, we have no evidence to reject $H_0$ and thus have no evidence to conclude that shoppers spent significantly more, on average, over the 2012 Black Friday weekend than they did over the 2011 Black Friday weekend. Diff: 2 Type: ES Var: 1 L.O.: 6.12.1
Use the following to answer the questions below: Zumba, often described as a Latin-inspired dance fitness party, is currently one of the most popular group fitness classes, but its health benefits have been little studied. An exercise science professor at a large university conducted a study to investigate some of the health benefits of Zumba. He recorded the weight of 9 female college students before they began a six week long Zumba program. As part of the program, they took a 60 minute long Zumba class three days a week. At the end of the program, the subjects were weighed again. Of interest is their weight loss, defined as weight before the program started minus weight after completing the program. The results are displayed in the following table.
Student   Before Weight   After Weight   Difference (Before - After)
1         134             131            3
2         152             147            5
3         145             144            1
4         120             121            -1
5         136             134            2
6         129             125            4
7         163             156            7
8         147             144            3
9         131             131            0
The mean weight loss for the sample was 2.667 pounds with standard deviation 2.5 pounds. A dotplot of the differences is provided.
16) Construct a 99% confidence interval for the mean weight loss. Use three decimal places in your margin of error. A) -0.129 to 5.463 pounds B) -0.316 to 5.081 pounds C) -0.521 to 4.813 pounds D) -0.223 to 5.111 pounds Answer: A
Explanation: Even though the sample size is small, the distribution appears reasonably symmetric with no major outliers, and thus we can use the t-distribution with 8 degrees of freedom. t* = 3.355
$2.667 \pm 3.355\left(\frac{2.5}{\sqrt{9}}\right)$
2.667 ± 3.355(0.8333333)
2.667 ± 2.796
-0.129 to 5.463 pounds
Diff: 2 Type: BI Var: 1 L.O.: 6.13.3
17) Test, at the 1% level, if there is evidence that the Zumba program is effective for weight loss. Include all of the details of the test.
Answer: $\mu_B$ = mean weight before the program began, $\mu_A$ = mean weight after the program has completed
$H_0: \mu_B = \mu_A$
$H_a: \mu_B > \mu_A$
Test statistic: $t = \frac{2.667}{2.5/\sqrt{9}} = 3.2004$
Even though the sample size is small, the distribution appears reasonably symmetric with no major outliers, and thus we can use the t-distribution with 8 degrees of freedom. p-value = 0.0063 (right tail, using StatKey) Because the p-value is smaller than the significance level, there is evidence to reject $H_0$ and thus evidence to conclude that Zumba is effective for weight loss. Diff: 2 Type: ES Var: 1 L.O.: 6.13.4
Use the following to answer the questions below: A 1997 study described in the European Journal of Clinical Nutrition compares the growth of vegetarian and omnivorous children, ages 7 - 11, in Northwest England. In the study, each of the 50 vegetarian children in the study was matched with an omnivorous child of the same age with similar demographic characteristics. One of the aspects on which the children were compared was their body mass index (BMI). The differences in BMI for each pair of children (one vegetarian and one omnivore) was computed as vegetarian BMI minus omnivore BMI.
            Vegetarian   Omnivorous   Difference (Vegetarian - Omnivorous)
n           50           50           50
$\bar{x}$   16.76        17.12        -0.36
s           1.91         2.23         2.69
18) Which data analysis method is more appropriate in this situation: paired data difference in means or difference in means with two separate groups? A) Paired data difference in means B) Difference in means with two separate groups Answer: A Explanation: Paired data difference in means: each child is matched, or paired, with a child who is very similar to them (with the only difference being vegetarian versus omnivore). Diff: 3 Type: BI Var: 1 L.O.: 6.13.1
19) Construct a 95% confidence interval for the difference in mean BMI between vegetarian and omnivorous children. Use three decimal places in your margin of error. A) -1.125 to 0.405 B) -1.433 to 0.713 C) -1.340 to 0.620 D) -1.312 to 0.592 Answer: A
Explanation: The sample size is fairly large so we can use the t-distribution with 49 degrees of freedom to construct the confidence interval.
$-0.36 \pm 2.01\left(\frac{2.69}{\sqrt{50}}\right)$
-0.36 ± 2.01(0.380423448)
-0.36 ± 0.765
-1.125 to 0.405
Diff: 2 Type: BI Var: 1 L.O.: 6.13.3
20) Test, at the 5% level, if there is evidence that the mean BMI for vegetarian children differs significantly from the mean BMI for omnivorous children. Include all of the details of the test.
Answer: $\mu_V$ = mean BMI for vegetarian children, $\mu_O$ = mean BMI for omnivorous children (alternatively, they could define $\mu_d$ to represent the mean difference, where the differences are defined as vegetarian BMI - omnivorous BMI)
$H_0: \mu_V = \mu_O$
$H_a: \mu_V \neq \mu_O$
Test statistic: $t = \frac{-0.36}{2.69/\sqrt{50}} = -0.946$
The sample size is fairly large so we can use the t-distribution with 49 degrees of freedom to compute the p-value for the test. p-value = 0.349 (two-tail, using StatKey) Because the p-value is considerably larger than the 5% significance level, we have no evidence to reject $H_0$ and thus have no evidence to conclude that the mean BMI for vegetarian children differs significantly from the mean BMI for omnivorous children. Diff: 2 Type: ES Var: 1 L.O.: 6.13.4
21) Construct a 96% confidence interval for $\mu_1 - \mu_2$ using the paired data in the following table. Round all values to three decimal places.
Case   Treatment 1   Treatment 2
1      50            48
2      53            52
3      48            45
4      61            56
5      58            55
6      56            56
Assume that the results come from random samples from populations that are approximately normal and that the differences are computed using $d = x_1 - x_2$. A) 0.362 to 4.304 B) 0.316 to 4.350 C) 0.492 to 4.174 D) 0.377 to 4.289 Answer: A
Explanation: Differences are: 2, 1, 3, 5, 3, 0. The mean difference is $\bar{x}_d = 2.333$ and the standard deviation of the differences is $s_d = 1.751$. Since n = 6, df = 5 and thus t* = 2.757.
$2.333 \pm 2.757\left(\frac{1.751}{\sqrt{6}}\right)$
2.333 ± 2.757(0.7148428)
2.333 ± 1.971
0.362 to 4.304
Diff: 2 Type: BI Var: 1 L.O.: 6.13.3
22) Report the test statistic (with two decimal places), p-value, and conclusion for a test of $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 < \mu_2$ using the paired data provided in the following table. Use a 5% significance level.
Subject   Situation 1   Situation 2
1         67            75
2         81            86
3         85            87
4         73            77
5         70            76
6         78            81
7         86            86
Assume that the results come from random samples from populations that are approximately normal and that the differences are computed using $d = x_1 - x_2$.
Answer: The differences are: -8, -5, -2, -4, -6, -3, 0. The mean difference is $\bar{x}_d = -4$ and the standard deviation of the differences is $s_d \approx 2.65$.
Test statistic: $t = \frac{-4}{2.65/\sqrt{7}} = -3.99$
n = 7, so df = 6 p-value = 0.0036 Because the p-value is considerably smaller than the 5% significance level, we have very strong evidence to reject $H_0$ and thus have very strong evidence in favor of $H_a$. Diff: 2 Type: ES Var: 1 L.O.: 6.13.4
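The paired test statistic in question 22 can be recomputed from the table. In the Python sketch below, the unrounded standard deviation of the differences is exactly $\sqrt{7} \approx 2.646$, which gives t = -4.00; rounding the standard deviation to 2.65 first gives -3.99, which appears to be how the reported value arises (an inference from the arithmetic, not something the answer key states).

```python
import math
import statistics

situation_1 = [67, 81, 85, 73, 70, 78, 86]
situation_2 = [75, 86, 87, 77, 76, 81, 86]

# Paired differences, Situation 1 - Situation 2.
diffs = [a - b for a, b in zip(situation_1, situation_2)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

t_exact = mean_d / (sd_d / math.sqrt(n))
t_rounded = mean_d / (round(sd_d, 2) / math.sqrt(n))  # uses s_d rounded to 2.65
print(diffs, mean_d, round(sd_d, 3))
print(round(t_exact, 2), round(t_rounded, 2))
```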
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock)
Chapter 7 Chi-Square Tests for Categorical Variables
7.1 Testing Goodness-of-Fit for a Single Categorical Variable
Use the following to answer the questions below: Are all colors equally likely for Milk Chocolate M&M's? Data collected from a bag of Milk Chocolate M&M's are provided.

Color    Count
Blue     110
Brown    47
Green    52
Orange   103
Red      58
Yellow   50

1) State the null and alternative hypotheses for testing if the colors are not all equally likely for Milk Chocolate M&M's.
Answer: $H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6$; $H_a$: Some $p_i$ is not 1/6.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.0;7.1.1
2) If all colors are equally likely, how many candies of each color (in a bag of 420 candies) would we expect to see? Answer: 70 Diff: 2 Type: SA Var: 1 L.O.: 7.1.0;7.1.1 3) Is a chi-square test appropriate in this situation? A) Yes B) No Answer: A Explanation: Yes, all expected counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.1.2 4) How many degrees of freedom are there? A) 2 B) 3 C) 4 D) 5 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 7.1.0;7.1.1
5) Calculate the chi-square test statistic. Report your answer with three decimal places. Answer: 58.371 Explanation: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 58.371$ Diff: 2 Type: SA Var: 1 L.O.: 7.1.1
6) Report the p-value for your test. What conclusion can be made about the color distribution for Milk Chocolate M&M's? Use a 5% significance level. Answer: p-value ≈ 0. There is very strong evidence that the six colors are not all equally likely among Milk Chocolate M&M's. Diff: 2 Type: ES Var: 1 L.O.: 7.1.0;7.1.1
7) Which color contributes the most to the chi-square test statistic? For this color, is the observed count smaller or larger than the expected count? Answer: Blue (the contribution is 22.8571). The observed count is larger than what we would expect under the null hypothesis, suggesting that more than 1/6 of Milk Chocolate M&M's are blue. Diff: 2 Type: ES Var: 1 L.O.: 7.1.0
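The statistic and contributions in questions 5 and 7 follow from the observed counts and the equal expected count of 70. A small sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Observed Milk Chocolate M&M counts from the table above.
observed = {"Blue": 110, "Brown": 47, "Green": 52,
            "Orange": 103, "Red": 58, "Yellow": 50}

total = sum(observed.values())
expected = total / len(observed)  # equal proportions under H0

# Each category contributes (observed - expected)^2 / expected.
contributions = {color: (count - expected) ** 2 / expected
                 for color, count in observed.items()}

chi_square = sum(contributions.values())
df = len(observed) - 1
largest = max(contributions, key=contributions.get)

print(round(chi_square, 3), df, largest, round(contributions[largest], 4))
```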
8) Are all colors equally likely for Dark Chocolate M&M's? Data collected from a bag of Dark Chocolate M&M's are provided.

Color    Count
Blue     77
Brown    74
Green    57
Orange   81
Red      62
Yellow   84

Test, at the 5% level, if this sample provides evidence that not all colors are equally likely for Dark Chocolate M&M's. Include all details of the test.
Answer: $H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6$; $H_a$: Some $p_i$ is not 1/6.

Category   Observed   Test Proportion   Expected   Contribution to Chi-Sq
Blue       77         0.166667          72.5       0.27931
Brown      74         0.166667          72.5       0.03103
Green      57         0.166667          72.5       3.31379
Orange     81         0.166667          72.5       0.99655
Red        62         0.166667          72.5       1.52069
Yellow     84         0.166667          72.5       1.82414

N     DF   Chi-Sq    P-Value
435   5    7.96552   0.158

The expected counts are all larger than 5, so it is appropriate to perform the chi-square test. Test statistic: $\chi^2 = 7.96552$. Degrees of freedom = 5. p-value = 0.158. There is no evidence to reject $H_0$ and thus there is no evidence that the six colors are not equally likely in Dark Chocolate M&M's.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.1
Use the following to answer the questions below: An insurance agent is interested in knowing if car crashes are more likely to occur on some days of the week than others. She selects a random sample of 250 insurance claims involving car crashes. Computer output from her chi-square test is provided.

Category    Observed   Test Proportion   Expected   Contribution to Chi-Sq
Sunday      26         0.142857          35.7143    2.64229
Monday      36         0.142857          35.7143    0.00229
Tuesday     38         0.142857          35.7143    0.14629
Wednesday   39         0.142857          35.7143    0.30229
Thursday    37         0.142857          35.7143    0.04629
Friday      42         0.142857          35.7143    ???????
Saturday    32         0.142857          35.7143    0.38629

N     DF   Chi-Sq   P-Value
250   6    4.632    0.592
9) Is a chi-square test appropriate in this situation? A) Yes B) No Answer: A Explanation: Yes, all of the expected counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.1.2
10) Test, at the 5% level, if there is evidence that car crashes are not equally likely to occur on all days of the week. Include all details of the test.
Answer: $H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7$; $H_a$: Some $p_i$ is not 1/7.
Test statistic: $\chi^2 = 4.632$. Degrees of freedom: 6. p-value = 0.592. There is no evidence to reject $H_0$ and thus there is no evidence to conclude that car crashes occur with differing probabilities on the seven days of the week.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.1
11) The contribution for Friday is missing. Compute the contribution for Friday. Report your answer with three decimal places. Answer: 1.106 Diff: 2 Type: SA Var: 1 L.O.: 7.1.0;7.1.1 Use the following to answer the questions below: Observed counts from a sample are provided in the following table. The expected counts from a null hypothesis are given in parentheses. Category Observed (Expected)
A 42 (45.33)
B 38 (45.33)
C 56 (45.33)
12) What is the $\chi^2$ test statistic? A) 3.941 B) 3.711 C) 4.315 D) 2.983 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 7.1.0
13) How many degrees of freedom are there? A) 1 B) 2 C) 3 D) 4 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 7.1.0
14) Based on the expected counts, which of the following is most likely the null hypothesis?
A) $H_0: p_A = p_B = p_C$
B) $H_0: p_A = 0.25,\ p_B = 0.25,\ p_C = 0.5$
C) $H_0: p_A = 0.2,\ p_B = 0.4,\ p_C = 0.4$
D) $H_0: p_A = 0.5,\ p_B = 0.2,\ p_C = 0.3$
Answer: A Diff: 3 Type: BI Var: 1 L.O.: 7.1.0
Use the following to answer the questions below: In a survey conducted by the Gallup organization, 1,017 adults were asked "In general, how much trust and confidence do you have in the mass media, such as newspapers, TV, and radio, when it comes to reporting the news fully, accurately, and fairly?" The results are summarized in the provided table.

Response                      Count
"Great deal" of confidence    81
"Fair amount" of confidence   325
"Not very much" confidence    397
"No confidence at all"        214
We are interested in testing whether or not the four responses are equally likely. 15) Is a chi-square test appropriate in this situation? A) Yes B) No Answer: A Explanation: Yes, the expected counts are all larger than 5. (The expected counts are all 254.25.) Diff: 2 Type: MC Var: 1 L.O.: 7.1.2
16) Test, at the 5% level, if there is evidence that the four opinions are not all equally likely. Include all details of the test.
Answer: $H_0: p_1 = p_2 = p_3 = p_4 = 0.25$; $H_a$: Some $p_i$ is not 0.25.

Category                      Observed   Test Proportion   Expected   Contribution to Chi-Sq
"Great deal" of confidence    81         0.25              254.25     118.055
"Fair amount" of confidence   325        0.25              254.25     19.688
"Not very much" confidence    397        0.25              254.25     80.148
"No confidence at all"        214        0.25              254.25     6.372

N      DF   Chi-Sq    P-Value
1017   3    224.263   0.000

Test statistic: $\chi^2 = 224.263$. Degrees of freedom: 3. p-value ≈ 0. There is very strong evidence that the four opinions are not all equally likely.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.1
17) Which opinion has the largest contribution to the chi-square test statistic? For this opinion, is the observed count smaller or larger than the expected count? Answer: The "great deal" of confidence opinion contributes the most to the chi-square test statistic. This group has a much smaller observed count than we would expect under the null hypothesis of equally likely proportions, which indicates that the proportion of people with a "great deal" of confidence is lower than 1/4. Diff: 2 Type: ES Var: 1 L.O.: 7.1.0
Use the following to answer the questions below: Upon request, the Mars Company (the maker of M&M's) will provide the color distribution for their candies. As of August 2009, they noted that "Our color blends were selected by conducting consumer preference tests, which indicate the assortment of colors that pleased the greatest number of people and created the most attractive overall effect. On average, our mix of colors for M&M'S CHOCOLATE CANDIES is: M&M'S MILK CHOCOLATE: 24% cyan blue, 20% orange, 16% green, 14% bright yellow, 13% red, 13% brown." Data collected from a bag of Milk Chocolate M&M's are provided. Blue 110
Brown 47
Green 52
Orange 103
Red 58
Yellow 50
We want to determine if this sample provides evidence that the color distribution has changed since August 2009.
18) State the null and alternative hypotheses for testing if the color distribution for Milk Chocolate M&M's has changed since 2009.
Answer: $H_0: p_{\text{blue}} = 0.24,\ p_{\text{brown}} = 0.13,\ p_{\text{green}} = 0.16,\ p_{\text{orange}} = 0.20,\ p_{\text{red}} = 0.13,\ p_{\text{yellow}} = 0.14$; $H_a$: At least one of the equalities in $H_0$ does not hold.
Diff: 1 Type: ES Var: 1 L.O.: 7.1.0;7.1.1
19) Find the expected counts for each color using the sample size (420 total candies) and null hypothesis. Answer: Blue 100.8, Brown 54.6, Green 67.2, Orange 84, Red 54.6, Yellow 58.8 Diff: 2 Type: ES Var: 1 L.O.: 7.1.0;7.1.1
20) Is a chi-square test appropriate in this situation? A) Yes B) No Answer: A Explanation: Yes, all expected counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.1.2
21) How many degrees of freedom are there? A) 2 B) 3 C) 4 D) 5 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 7.1.0;7.1.1
22) Report the chi-square test statistic. Use three decimal places. Answer: 11.162 Diff: 2 Type: SA Var: 1 L.O.: 7.1.0;7.1.1
23) Report the p-value for your test. What conclusion can be made about the color distribution of Milk Chocolate M&M's? Use a 5% significance level.
A) p-value = 0.048. We have evidence that the color distribution of Milk Chocolate M&M's has changed since 2009.
B) p-value = 0.048. We do not have evidence that the color distribution of Milk Chocolate M&M's has changed since 2009.
C) p-value = 0.052. We have evidence that the color distribution of Milk Chocolate M&M's has changed since 2009.
D) p-value = 0.052. We do not have evidence that the color distribution of Milk Chocolate M&M's has changed since 2009.
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 7.1.0;7.1.1
24) Which color contributes the most to the chi-square test statistic? For that color, is the observed count larger or smaller than what we would expect under the null hypothesis? Answer: Orange (the contribution is 4.29762). The observed count is higher than we would expect under the null hypothesis (indicating that there may be slightly more orange candies than in 2009). Diff: 2 Type: ES Var: 1 L.O.: 7.1.0
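Questions 19 through 24 can be worked end to end with a goodness-of-fit calculation against the 2009 proportions. The sketch below is one possible way to do it using only the standard library. The helper `chi_square_upper_tail` is a hypothetical name introduced here; it evaluates the chi-square upper tail for integer degrees of freedom by a standard recurrence and is not the software used to produce the answer key.

```python
import math

def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    else:
        tail, k = math.exp(-half), 2             # exact tail for df = 2
    while k < df:
        # tail(k + 2) = tail(k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

observed = {"Blue": 110, "Brown": 47, "Green": 52,
            "Orange": 103, "Red": 58, "Yellow": 50}
# Null proportions quoted from the Mars Company statement above.
null_props = {"Blue": 0.24, "Brown": 0.13, "Green": 0.16,
              "Orange": 0.20, "Red": 0.13, "Yellow": 0.14}

n = sum(observed.values())
expected = {color: n * p for color, p in null_props.items()}
contributions = {color: (observed[color] - expected[color]) ** 2 / expected[color]
                 for color in observed}

chi_square = sum(contributions.values())
df = len(observed) - 1
p_value = chi_square_upper_tail(chi_square, df)
largest = max(contributions, key=contributions.get)

print(round(chi_square, 3), df, round(p_value, 3), largest)
```

Because the p-value sits just under 0.05, the conclusion at the 5% level depends on carrying enough decimal places in the statistic.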
Use the following to answer the questions below: The Gallup organization surveyed a random sample of American adults about their belief in the theory of evolution. The responses are summarized in the provided table.

Opinion          Count
Believe          397
Do Not Believe   254
No Opinion       367

25) Is a chi-square test appropriate for testing if all beliefs are not equally likely? A) Yes B) No Answer: A Explanation: Yes, under the null hypothesis that all opinions are equally likely, the expected count for each opinion is 339.33 (larger than 5). Diff: 2 Type: MC Var: 1 L.O.: 7.1.2
26) Test, at the 5% level, if there is evidence that not all opinions are equally likely.
Answer: $H_0: p_1 = p_2 = p_3 = 1/3$; $H_a$: Some $p_i$ is not 1/3.

Category         Observed   Test Proportion   Expected   Contribution to Chi-Sq
Believe          397        0.333333          339.333    9.7999
Do Not Believe   254        0.333333          339.333    21.4591
No Opinion       367        0.333333          339.333    2.2557

N      DF   Chi-Sq    P-Value
1018   2    33.5147   0.000

Test statistic: $\chi^2 = 33.5147$. Degrees of freedom: 2. p-value ≈ 0. There is very strong evidence that not all opinions are equally likely.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.1
27) Which opinion contributes the most to the chi-square test statistic? For that opinion, is the observed count larger or smaller than we would expect? Answer: Do not believe (contribution is 21.4591). The observed count is smaller than we would expect under the null hypothesis that all three opinions are equally likely, indicating that likely fewer than 1/3 of American adults do not believe in evolution. Diff: 2 Type: ES Var: 1 L.O.: 7.1.0
28) In a survey, Gallup asked a random sample of U.S. adults if they would prefer to have a job outside the home, or if they would prefer to stay home to care for the family and home. The results are summarized below.

Job Outside of Home   645
Stay at Home          332
Total                 977

Use the goodness-of-fit test to determine if there is evidence that the two choices are not equally likely. Use a 5% significance level.
Answer: $H_0: p_1 = p_2 = 0.5$; $H_a$: The $p_i$'s are not the same.

Category              Observed   Test Proportion   Expected   Contribution to Chi-Sq
Job Outside of Home   645        0.5               488.5      50.1377
Stay at Home          332        0.5               488.5      50.1377

N     DF   Chi-Sq    P-Value
977   1    100.275   0.000

Both expected counts are larger than 5, so it is appropriate to use the chi-square test. Test statistic: $\chi^2 = 100.275$. Degrees of freedom: 1. p-value ≈ 0. There is very strong evidence that the two choices are not equally popular.
Diff: 2 Type: ES Var: 1 L.O.: 7.1.1
7.2 Testing for an Association between Two Categorical Variables
Use the following to answer the questions below: February 12, 2009 marked the 200th anniversary of Charles Darwin's birth. To celebrate, Gallup, a national polling organization, surveyed 1,018 Americans about their education level and their beliefs about the theory of evolution. The survey results are displayed in the provided two-way table. Note that the expected counts for most cells appear in parentheses.
                      Believe       Do Not Believe   No Opinion    Total
High School or Less   80 (148.2)    103 (94.8)       197 (137.0)   380
Some College          133 (126.7)   94 (81.1)        98 (117.2)    325
College Graduate      121 (88.9)    48 (?)           59 (82.2)     228
Postgraduate          63 (33.1)     9 (21.2)         13 (30.6)     85
Total                 397           254              367           1,018
1) Compute the expected cell count for the (College Graduate, Do Not Believe) cell. Report your answer with one decimal place. Answer: 56.9 Diff: 2 Type: SA Var: 1 L.O.: 7.2.0;7.2.1
2) Compute the contribution to the chi-square statistic for the (Postgraduate, Believe) cell. Report your answer to two decimal places. Answer: 27.01 Diff: 2 Type: SA Var: 1 L.O.: 7.2.0;7.2.1
3) What are the degrees of freedom for the test? A) 6 B) 4 C) 3 D) 11 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 7.2.0;7.2.1
4) State the hypotheses for testing whether the data indicate that there is some association between education level and belief in evolution. Answer: $H_0$: Belief about evolution does not depend on education level. $H_a$: Belief about evolution is related to education level. Diff: 2 Type: ES Var: 1 L.O.: 7.2.0;7.2.1
5) Is it appropriate to use a chi-square test to test for an association between education level and belief about evolution? A) Yes B) No Answer: A Explanation: Yes, all expected cell counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.2.2
6) Using a 5% significance level and assuming the test statistic is $\chi^2 = 127.451$, compute the p-value and make an appropriate conclusion for this test. If there is a significant association between these two variables, describe how they are related. Answer: p-value ≈ 0. There is very strong evidence of a significant association between education level and belief about evolution. Individuals who are more educated are more likely to believe in the theory of evolution (or have an opinion) than those who are less educated. Diff: 2 Type: ES Var: 1 L.O.: 7.2.0;7.2.1
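The expected counts, degrees of freedom, and test statistic for the education and evolution table can be reproduced as follows. This is a sketch with assumed variable names; each expected count is computed as (row total × column total) / n. With unrounded expected counts, the (Postgraduate, Believe) contribution is about 26.88, so the 27.01 in question 2 corresponds to using the rounded expected count 33.1 shown in the table.

```python
rows = ["High School or Less", "Some College", "College Graduate", "Postgraduate"]
cols = ["Believe", "Do Not Believe", "No Opinion"]

# Observed counts from the two-way table, one list per education level.
observed = [
    [80, 103, 197],
    [133, 94, 98],
    [121, 48, 59],
    [63, 9, 13],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under no association: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(rows) - 1) * (len(cols) - 1)

# Question 2, computed as in the answer key with the rounded expected count.
contribution_rounded = (63 - 33.1) ** 2 / 33.1

print(round(expected[2][1], 1), df, round(chi_square, 3))
```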
7) A study to investigate the dominant paw in cats was described in the scientific journal Animal Behaviour. The researchers used a random sample of 42 domestic cats. In this study, each cat was shown a treat (5 grams of tuna), and while the cat watched, the food was placed inside a jar. The opening of the jar was small enough that the cat could not stick its head inside to remove the treat. The researcher recorded the paw that was first used by the cat to try to retrieve the treat. This was repeated 100 times for each cat (over a span of several days). The paw used most often was deemed the dominant paw. The researchers want to determine if there is a significant association between sex of the cat and dominant paw. Computer output from the analysis is provided. Is it appropriate to perform the chi-square test to test for an association between sex and dominant paw in cats? If so, perform the test. If not, briefly explain why.

Rows: Sex   Columns: Paw

         Left    Not Left   All
Female   1       20         21
         10      11         21
         8.100   7.364      *
Male     19      2          21
         10      11         21
         8.100   7.364      *
All      20      22         42
         20      22         42
         *       *          *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 30.927, DF = 1, P-Value = 0.000

Answer: $H_0$: Sex and dominant paw in cats are not related. $H_a$: Sex and dominant paw in cats are related. All expected cell counts are larger than 5, so it is appropriate to use the chi-square test. Test statistic: $\chi^2 = 30.927$. Degrees of freedom: 1. p-value ≈ 0. There is very strong evidence that sex and the dominant paw in cats are related.
Diff: 2 Type: ES Var: 1 L.O.: 7.2.2
8) M&M's, the popular candy-coated chocolate treats, come in a variety of flavors. One of the newest varieties is Pretzel, and another popular variety is Peanut Butter. Does the Mars Company (the maker of M&M's) use the same color distribution (frequency of colors) for all varieties, or does it depend on variety? Data collected on the two varieties are displayed in the provided two-way table. Test, at the 5% level, if the samples provide evidence of an association between color and variety. Include all of the details of the test.

         Pretzel   Peanut Butter   Total
Blue     33        28              61
Brown    28        40              68
Green    11        38              49
Orange   24        25              49
Red      15        34              49
Yellow   24        23              47
Total    135       188             323

Answer: $H_0$: Color does not depend on variety. $H_a$: Color does depend on variety.

Rows: Variety   Columns: Color

                Blue     Brown    Green    Orange   Red      Yellow   All
Pretzel         33       28       11       24       15       24       135
                25.50    28.42    20.48    20.48    20.48    19.64    135.00
                2.2090   0.0062   4.3881   0.6050   1.4663   0.9659   *
Peanut Butter   28       40       38       25       34       23       188
                35.50    39.58    28.52    28.52    28.52    27.36    188.00
                1.5863   0.0045   3.1510   0.4345   1.0529   0.6936   *
All             61       68       49       49       49       47       323
                61.00    68.00    49.00    49.00    49.00    47.00    323.00
                *        *        *        *        *        *        *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 16.563, DF = 5, P-Value = 0.005

All expected cell counts are greater than 5, so it is appropriate to use the chi-square test. The test statistic is $\chi^2 = 16.563$. There are 5 degrees of freedom. The p-value is 0.005. There is very strong evidence to reject $H_0$ and thus there is very strong evidence to conclude the color distribution does depend on variety.
Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
Use the following to answer the questions below: M&M's, the popular candy-coated chocolate treats, come in a variety of flavors. Two popular varieties are Milk Chocolate (sometimes referred to as "Plain") and Peanut. Does the Mars Company (the maker of M&M's) use the same color distribution (frequency of colors) for all varieties, or does it depend on variety? Data were collected on the two varieties and computer output for a chi-square test of association is provided.

Rows: Variety   Columns: Color

                 Blue     Brown    Green    Orange   Red      Yellow   All
Milk Chocolate   90       54       56       99       53       65       417
                 93.57    49.36    60.41    103.14   50.84    59.68    417.00
                 0.1360   0.4357   0.3224   0.1666   0.0921   0.4749   *
Peanut           37       13       26       41       16       16       149
                 33.43    17.64    21.59    36.86    18.16    21.32    149.00
                 0.3806   1.2195   0.9023   0.4661   0.2579   1.3290   *
All              127      67       82       140      69       81       566
                 127.00   67.00    82.00    140.00   69.00    81.00    566.00
                 *        *        *        *        *        *        *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 6.183, DF = 5, P-Value = 0.289

9) Is it appropriate to use a chi-square test to test for an association between variety and color? A) Yes B) No Answer: A Explanation: Yes, it is appropriate because all expected cell counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.2.2
10) Test, at the 5% level, if there is a significant association between variety and color. Include all details of the test. Answer: $H_0$: Color does not depend on variety. $H_a$: Color depends on variety. All expected cell counts are larger than 5. Test statistic: $\chi^2 = 6.183$. Degrees of freedom: 5. p-value = 0.289. There is no evidence to reject $H_0$ and thus there is no evidence of a significant association between variety (Peanut versus Plain) and color. That is, there is no evidence that the colors appear with different frequencies in Milk Chocolate and Peanut M&M's. Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
Use the following to answer the questions below: We have a random sample of 150 students (60 males and 90 females) that includes two variables: Smoke = "yes" or "no" and Gender = "female (F)" or "male (M)." The two-way table below summarizes the results.
Gender = M Gender = F Total
Smoke = Yes 9 9 18
Smoke = No 51 81 132
Total 60 90 150
11) Is it appropriate to use a chi-square test to test for an association between gender and smoking status? A) Yes B) No Answer: A Explanation: Yes, it is appropriate because all expected cell counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.2.2
12) Test, at the 10% level, if there is a significant association between gender and smoking status among students at this university. Include all of the details of the test.
Answer: $H_0$: Smoking status does not depend on gender. $H_a$: Smoking status and gender are related.

Rows: Gender   Columns: Smoke

      No        Yes       All
F     81        9         90
      79.20     10.80     90.00
      0.04091   0.30000   *
M     51        9         60
      52.80     7.20      60.00
      0.06136   0.45000   *
All   132       18        150
      132.00    18.00     150.00
      *         *         *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 0.852, DF = 1, P-Value = 0.356

Test statistic: $\chi^2 = 0.852$. Degrees of freedom: 1. p-value = 0.356. There is no evidence, at the 10% level, of an association between gender and smoking status.
Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
Use the following to answer the questions below: A political science professor at a small university wants to know if political party affiliation is significantly associated with trust in the media. He randomly selects 66 Democrats, 63 Republicans, and 41 Independents. Computer output of his chi-square analysis is provided.

Rows: Party   Columns: Trust Media?

              No       Yes      All
Democrat      28       38       66
              39.99    26.01    66.00
              3.594    5.525    *
Independent   28       13       41
              24.84    ?????    41.00
              0.402    0.618    *
Republican    47       16       63
              38.17    24.83    63.00
              2.042    3.140    *
All           103      67       170
              103.00   67.00    170.00
              *        *        *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 15.320, DF = 2, P-Value = 0.000

13) The expected count for the (Independent, Yes) cell is missing. Compute the expected count for this cell. Report your answer with two decimal places. Answer: 16.16 Diff: 2 Type: SA Var: 1 L.O.: 7.2.0;7.2.1
14) Is it appropriate to use a chi-square test to test for an association between political party and trust in the media? A) Yes B) No Answer: A Explanation: Yes, all expected cell counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.2.2
15) Test, at the 5% level, if there is a significant association between political party affiliation and trust in the media. Include all details of the test. Answer: $H_0$: Political party affiliation and trust in the media are not related. $H_a$: Political party affiliation and trust in the media are related. Test statistic: $\chi^2 = 15.32$. Degrees of freedom: 2. p-value ≈ 0. There is very strong evidence to reject $H_0$ and thus there is very strong evidence to conclude that there is a significant association between political party affiliation and trust in the media. Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
16) Which cell has the largest contribution to the chi-square statistic? For this cell, is the observed count larger or smaller than the expected count? Answer: (Democrat, Yes) has a contribution of 5.525. The observed count is larger than we would expect if there were no relationship between the two variables. Diff: 2 Type: ES Var: 1 L.O.: 7.2.0
Use the following to answer the questions below: The Gallup organization recently conducted a survey of 1,015 randomly selected U.S. adults about "Black Friday" shopping. They asked the following question: "As you know, the Friday after Thanksgiving is one of the biggest shopping days of the year. Looking ahead, do you personally plan on shopping on the Friday after Thanksgiving, or not?" Their results, broken down by sex, are summarized in the provided two-way table.
Male Female Total
Yes Shopping 82 100 182
No Shopping 433 400 833
Total 515 500 1,015
17) Compute the expected cell counts for all cells. Report your counts to two decimal places.
Answer:

         Yes Shopping   No Shopping   Total
Male     92.34          422.66        515
Female   89.66          410.34        500
Total    182            833           1015

Diff: 2 Type: ES Var: 1 L.O.: 7.2.0;7.2.1
18) Is it appropriate to use a chi-square test to test for an association between sex and plans to shop the Friday after Thanksgiving? A) Yes B) No Answer: A Explanation: Yes, all expected cell counts are larger than 5. Diff: 2 Type: MC Var: 1 L.O.: 7.2.2
19) Test, at the 10% level, if there is a significant association between sex and plans to shop the Friday after Thanksgiving. Include all details of your test.
Answer: $H_0$: Plans to shop the Friday after Thanksgiving do not depend on sex. $H_a$: Sex and plans to shop the Friday after Thanksgiving are related.

Rows: Sex   Columns: Plans to Shop

         No       Yes      All
Female   400      100      500
         410.3    89.7     500.0
         0.2608   1.1936   *
Male     433      82       515
         422.7    92.3     515.0
         0.2532   1.1589   *
All      833      182      1015
         833.0    182.0    1015.0
         *        *        *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 2.866, DF = 1, P-Value = 0.090

Test statistic: $\chi^2 = 2.866$. Degrees of freedom: 1. p-value = 0.09. Because the p-value is smaller than the 10% significance level, we have some (somewhat weak) evidence to reject $H_0$ and thus have some (somewhat weak) evidence that there is a significant association between sex and plans to shop on the Friday after Thanksgiving.
Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
20) The Gallup organization asked a random sample of U.S. adults if they would prefer to have a job outside the home, or if they would prefer to stay home to care for the family and home. Of the 504 males they surveyed, 391 said that they would prefer to have a job outside of the home. Of the 473 females they surveyed, 254 said that they would prefer a job outside of the home.

                      Males   Females   Total
Job Outside of Home   391     254       645
Stay at Home          113     219       332
Total                 504     473       977

Test, at the 5% level, if there is evidence of an association between sex and preference to have a job outside of the home. Include all details of the test.
Answer: $H_0$: Job preference does not depend on sex. $H_a$: Sex and job preference are related.

Rows: Sex   Columns: Job Preference

         Outside Home   Stay at Home   All
Female   254            219            473
         312.3          160.7          473.0
         10.87          21.12          *
Male     391            113            504
         332.7          171.3          504.0
         10.20          19.82          *
All      645            332            977
         645.0          332.0          977.0
         *              *              *

Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = 62.021, DF = 1, P-Value = 0.000

Test statistic: $\chi^2 = 62.021$. Degrees of freedom: 1. p-value ≈ 0. There is very strong evidence to reject $H_0$ and thus there is very strong evidence of a significant association between job preference and sex.
Diff: 2 Type: ES Var: 1 L.O.: 7.2.1
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 8 ANOVA to Compare Means
8.1 Analysis of Variance
Use the following to answer the questions below: Two sets of sample data, A and B, are given. Without doing any calculations, indicate in which set of sample data, A or B, there is likely to be stronger evidence of a difference in the population means.

1)
Dataset A
Group 1: 19 21 20 20 21 21 19 19 ($\bar{x}_1 = 20.0$)
Group 2: 23 24 22 24 22 23 24 22 ($\bar{x}_2 = 23.0$)

Dataset B
Group 1: 15 17 16 16 17 17 15 15 ($\bar{x}_1 = 16.0$)
Group 2: 27 28 26 28 26 27 28 26 ($\bar{x}_2 = 27.0$)

A) Dataset A B) Dataset B Answer: B Explanation: Dataset B is likely to provide stronger evidence of differences among the population means because the sample means are further apart than those in Dataset A. The variability within each sample for the two datasets is similar. Diff: 2 Type: BI Var: 1 L.O.: 8.1.2
2)
Dataset A
Group 1: 12 8 15 9 17 11 ($\bar{x}_1 = 12.0$)
Group 2: 25 15 12 9 28 19 ($\bar{x}_2 = 18.0$)
Group 3: 8 10 16 17 20 7 ($\bar{x}_3 = 13.0$)

Dataset B
Group 1: 12 11 12 12 12 13 ($\bar{x}_1 = 12.0$)
Group 2: 19 18 18 18 17 18 ($\bar{x}_2 = 18.0$)
Group 3: 12 13 14 13 13 13 ($\bar{x}_3 = 13.0$)

A) Dataset A B) Dataset B Answer: B Explanation: Dataset B is likely to provide stronger evidence of differences among the population means because, while the sample means within each group are the same for both datasets, there is much less variability within each group in Dataset B than there is in Dataset A. Diff: 2 Type: BI Var: 1 L.O.: 8.1.2
3)
A) Dataset A B) Dataset B Answer: A Explanation: Dataset A is likely to provide stronger evidence of differences among the population means because, while the sample means within each group look to be about the same for both datasets, there is much less variability within each group in Dataset A than there is in Dataset B. Diff: 2 Type: BI Var: 1 L.O.: 8.1.2
4)
A) Dataset A B) Dataset B Answer: B Explanation: Dataset B is likely to provide stronger evidence of differences among the population means because, while the sample means within each group look to be about the same for both datasets, there is much less variability within each group in Dataset B than there is in Dataset A. Diff: 1 Type: BI Var: 1 L.O.: 8.1.2
5)
Dataset A
Group 1: 42 42 41 43 42 42 ($\bar{x}_1 = 42$)
Group 2: 48 47 48 48 48 49 ($\bar{x}_2 = 48$)

Dataset B
Group 1: 41 43 39 45 37 47 ($\bar{x}_1 = 42$)
Group 2: 50 46 44 52 45 51 ($\bar{x}_2 = 48$)

A) Dataset A B) Dataset B Answer: A Explanation: Dataset A is likely to provide stronger evidence of differences among the population means because, while the sample means within each group are the same for both datasets, there is much less variability within each group in Dataset A than there is in Dataset B. Diff: 1 Type: BI Var: 1 L.O.: 8.1.2
6)
A) Dataset A B) Dataset B Answer: B Explanation: Dataset B because the two samples are further apart in that dataset than in Dataset A. The two datasets have the same amount of variability within each sample. Diff: 1 Type: BI Var: 1 L.O.: 8.1.2
7) Analysis of variance is used to test for significant differences among A) means. B) variances. C) standard deviations. D) proportions. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 8.1.2
8) Some computer output from an analysis of variance is provided.

Source   DF    SS       MS      F      P
Groups   4     913.4    228.3   8.55   0.000
Error    120   3204.8   26.7
Total    124   4118.2

How many groups are there? A) 3 B) 4 C) 5 D) 6 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 8.1.0
9) Some computer output from an analysis of variance is provided.

Source   DF    SS       MS      F      P
Groups   4     913.4    228.3   8.55   0.000
Error    120   3204.8   26.7
Total    124   4118.2

What is the overall sample size? A) 125 B) 124 C) 123 D) 121 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 8.1.0
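10) A scientist performing analysis of variance has the following null hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$. What is the appropriate alternative hypothesis for his analysis?
A) $H_a: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$.
B) $H_a: \mu_1 > \mu_2 > \mu_3 > \mu_4$.
C) $H_a: \mu_1 < \mu_2 < \mu_3 < \mu_4$.
D) $H_a$: At least one $\mu_i \neq \mu_j$
Answer: D Diff: 2 Type: BI Var: 1 L.O.: 8.1.0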
11) True or False: SSE = SSTotal + SSG Answer: FALSE Diff: 2 Type: TF Var: 1 L.O.: 8.1.0
Use the following to answer the questions below: The sample sizes for the groups in a dataset and an outline of an analysis of variance table with partial information are provided. Fill in the missing parts of the table. Round decimal answers to two decimal places.
12) Three groups with $n_1 = 10$, $n_2 = 10$, and $n_3 = 10$.

Source   DF   SS    MS   F
Groups        350
Error         150
Total         500

Answer:
Source   DF   SS    MS     F
Groups   2    350   175    31.47
Error    27   150   5.56
Total    29   500

Diff: 2 Type: ES Var: 1 L.O.: 8.1.0
13) Four groups with $n_1 = 6$, $n_2 = 5$, $n_3 = 5$, and $n_4 = 4$.

Source   DF   SS      MS   F
Groups        750
Error
Total         1,250

Answer:
Source   DF   SS      MS      F
Groups   3    750     250     8
Error    16   500     31.25
Total    19   1,250

Diff: 2 Type: ES Var: 1 L.O.: 8.1.0
14) Three groups with $n_1 = 8$, $n_2 = 7$, and $n_3 = 5$.

Source   DF   SS   MS    F
Groups             120
Error              40
Total

Answer:
Source   DF   SS    MS    F
Groups   2    240   120   3
Error    17   680   40
Total    19   920

Diff: 3 Type: ES Var: 1 L.O.: 8.1.0
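The table-completion questions above all use the same relationships: df for groups is k - 1, df for error is n - k, each mean square is SS / df, and F = MSG / MSE. A minimal sketch follows, where the function name `anova_table` is a hypothetical helper introduced for illustration.

```python
def anova_table(group_sizes, ss_groups, ss_error):
    """Fill in a one-way ANOVA table from group sizes and sums of squares."""
    k = len(group_sizes)
    n = sum(group_sizes)
    df_groups, df_error = k - 1, n - k
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "df": (df_groups, df_error, n - 1),
        "SS": (ss_groups, ss_error, ss_groups + ss_error),
        "MS": (ms_groups, ms_error),
        "F": ms_groups / ms_error,
    }

# Question 12: SSG and SSE are given.
q12 = anova_table([10, 10, 10], ss_groups=350, ss_error=150)
# Question 13: SSE = SSTotal - SSG = 1250 - 750.
q13 = anova_table([6, 5, 5, 4], ss_groups=750, ss_error=1250 - 750)
# Question 14: SS = MS * df, so SSG = 120 * 2 and SSE = 40 * 17.
q14 = anova_table([8, 7, 5], ss_groups=120 * 2, ss_error=40 * 17)

for name, table in (("Q12", q12), ("Q13", q13), ("Q14", q14)):
    print(name, table["df"], table["SS"], round(table["F"], 2))
```

For question 12 the unrounded ratio is 31.50. The 31.47 in the answer key corresponds to dividing 175 by the rounded mean square 5.56.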
8.2 Pairwise Comparisons and Inference after ANOVA
Use the following to answer the questions below: A small university is concerned with monitoring its electricity usage in its Student Center. Specifically, its officials want to know if the amount of electricity used differs by day of the week. They collected data for nearly a year, and the relevant summary statistics are provided. Note that electricity usage is measured in kilowatt hours.

Day of Week   n     Mean     s
Sunday        45    86.48    34.89
Monday        45    109.29   27.37
Tuesday       45    110.96   28.64
Wednesday     44    115.03   31.68
Thursday      44    114.97   33.26
Friday        45    108.58   32.22
Saturday      45    87.07    38.56
Overall       313   104.56   34.25
1) State the appropriate null and alternative hypotheses for this test. A) : = = = = = = ≠
: At least one B)
:
=
=
: At least one :
D)
=
=
=
=
≠
=
: At least one :
=
≠
: At least one C)
=
=
Answer: A Diff: 2 Type: BI L.O.: 8.1.0;8.1.1
=
=
=
=
=
=
=
=
≠
Var: 1
2) Are the conditions for using ANOVA reasonably satisfied? A) Yes B) No Answer: A Explanation: Yes. All sample sizes are fairly large (n > 30) and the standard deviations look roughly similar (none is close to twice another). Diff: 2 Type: BI Var: 1 L.O.: 8.1.3
3) Complete the ANOVA table below for doing this test using the template started below. Use two decimal places in the F statistic.
Source   df    SS        MS    F
Groups   ___   41,646    ___   ___
Error    ___   ___       ___
Total    312   366,073
A)
Source   df    SS        MS      F
Groups   6     41,646    6,941   6.55
Error    306   324,427   1,060
Total    312   366,073
B)
Source   df    SS        MS      F
Groups   7     41,646    5,949   5.61
Error    306   324,427   1,060
Total    313   366,073
C)
Source   df    SS        MS      F
Groups   6     41,646    6,941   0.15
Error    306   324,427   1,060
Total    312   366,073
D)
Source   df    SS        MS      F
Groups   7     41,646    5,949   0.18
Error    306   324,427   1,060
Total    313   366,073
Answer: A Diff: 2 Type: BI Var: 1 L.O.: 8.1.0;8.1.1
4) Use the F-distribution to find the p-value for the test. Using α = 0.05, does the mean electricity usage differ significantly by day of the week? Make a conclusion in context. A) There is very strong evidence that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). B) There is not enough evidence to conclude that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). Answer: A Explanation: p-value ≈ 0 (from the F-distribution with 6 and 306 degrees of freedom, using Statkey). Since p-value ≈ 0 < α = 0.05, we reject $H_0$. There is very strong evidence that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). Diff: 2 Type: BI Var: 1 L.O.: 8.1.0;8.1.1 5) Using the results from the ANOVA analysis, at the α = 0.05 significance level, what is the conclusion of the test, in context? A) There is very strong evidence that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). B) There is not enough evidence to conclude that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). Answer: A Explanation: F = 6.55, p-value ≈ 0. There is very strong evidence that mean electricity usage differs significantly by day of the week (i.e., some days of the week use more electricity than others). Diff: 2 Type: BI Var: 1 L.O.: 8.1.1
6) Use the data and ANOVA results to construct a 95% confidence interval for the difference in mean electricity use between Saturdays and Sundays. Round the margin of error to two decimal places. Does your interval suggest a significant difference in mean electricity use for these two days? A) -12.92 to 14.10 There is no evidence that electricity use differs significantly on Saturdays and Sundays. B) -12.92 to 14.10 There is strong evidence that electricity use differs significantly on Saturdays and Sundays. C) -11.09 to 12.27 There is no evidence that electricity use differs significantly on Saturdays and Sundays. D) -11.09 to 12.27 There is strong evidence that electricity use differs significantly on Saturdays and Sundays. Answer: A Explanation: Use df = 306 and MSE = 1,060.
$(87.07 - 86.48) \pm 1.968\sqrt{1060\left(\frac{1}{45} + \frac{1}{45}\right)}$
0.59 ± 1.968(6.863753)
0.59 ± 13.51
-12.92 to 14.10
Because the confidence interval contains 0, there is no evidence that electricity use differs significantly on Saturdays and Sundays. Diff: 2 Type: BI Var: 1 L.O.: 8.2.1 7) Based on the ANOVA results, test at the 5% level whether the data provide evidence of a difference in mean electricity use on Sundays and Mondays. Use three decimal places in the test statistic. Answer: $H_0: \mu_{\text{Sun}} = \mu_{\text{Mon}}$, $H_a: \mu_{\text{Sun}} \neq \mu_{\text{Mon}}$
$t = \frac{86.48 - 109.29}{\sqrt{1060\left(\frac{1}{45} + \frac{1}{45}\right)}} = -3.323$
p-value = 0.001 (two-tail in t-distribution with df = 306, using Statkey) There is very strong evidence that electricity use differs significantly on Sundays and Mondays. Diff: 2 Type: ES Var: 1 L.O.: 8.2.2
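The interval in question 6 and the test statistic in question 7 both rest on the same standard error, $\sqrt{MSE(1/n_i + 1/n_j)}$. A minimal sketch of that arithmetic follows; the helper names are assumptions made for illustration, and the critical value 1.968 is taken from the answer key rather than computed.

```python
import math


def pairwise_se(mse, n_i, n_j):
    """Standard error of a difference in means, using the ANOVA mean square error."""
    return math.sqrt(mse * (1 / n_i + 1 / n_j))


def pairwise_ci(mean_i, mean_j, mse, n_i, n_j, t_star):
    """Confidence interval for mean_i - mean_j; t_star is supplied by the caller."""
    diff = mean_i - mean_j
    margin = t_star * pairwise_se(mse, n_i, n_j)
    return diff - margin, diff + margin


def pairwise_t(mean_i, mean_j, mse, n_i, n_j):
    """t-statistic for testing whether two group means differ after ANOVA."""
    return (mean_i - mean_j) / pairwise_se(mse, n_i, n_j)


MSE = 1060       # error mean square from the electricity ANOVA table
T_STAR = 1.968   # 95% critical value for df = 306, as given in the answer key

sat_sun_ci = pairwise_ci(87.07, 86.48, MSE, 45, 45, T_STAR)
sun_mon_t = pairwise_t(86.48, 109.29, MSE, 45, 45)
print(sat_sun_ci, sun_mon_t)
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; the p-values would still come from a t-distribution tool such as the one the answer key cites.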
8) Computer output provides the following grouping information:
Day of Week   N    Mean     Grouping
Wed           44   115.03   A
Thurs         44   114.97   A
Tues          45   110.96   A
Mon           45   109.29   A
Fri           45   108.58   A
Sat           45   87.07    B
Sun           45   86.48    B
Means that do not share a letter are significantly different. Use the output to make a statement about how electricity usage differs significantly by day of the week. A) Significantly less electricity is used on weekends than on weekdays. B) There is not enough evidence to conclude that electricity usage on weekends is different than electricity usage on weekdays. Answer: A Diff: 3 Type: BI Var: 1 L.O.: 8.2.0
Use the following to answer the questions below: Penalties in ice hockey occur when a player breaks one of the rules of the game. In most cases, when a penalty occurs, the offending player is placed in the penalty box (the length of time spent in the penalty box depends on the severity of the penalty), and the team has to play with fewer people on the ice, which can result in an advantage for the opposing team. The number of penalties per game for several randomly selected games are displayed for three college men's ice hockey teams.
Team      Penalties     n    Mean   s
A         9 9 5 9 11    5    8.6    2.191
B         7 3 5 5 1     5    4.2    2.280
C         3 7 2 8 4     5    4.8    2.588
Overall                 15   5.87   2.973
9) Use the summary information to compute the three sums of squares needed for using ANOVA to test for a difference in mean number of penalties among these three teams. Round each to two decimal places. A) SSG = 56.93 SSE = 66.79 SSTotal = 123.74 B) SSG = 40.28 SSE = 83.46 SSTotal = 123.74 C) SSG = 65.79 SSE = 66.79 SSTotal = 132.58 D) SSG = 49.12 SSE = 83.46 SSTotal = 132.58 Answer: A Explanation:
Between Groups = SSG = $5(8.6 - 5.87)^2 + 5(4.2 - 5.87)^2 + 5(4.8 - 5.87)^2$ = 56.93
Within Groups = SSE = $(5 - 1)(2.191)^2 + (5 - 1)(2.280)^2 + (5 - 1)(2.588)^2$ = 66.79
Total = SSTotal = $(15 - 1)(2.973)^2$ = 123.74
Note that computing two of the above is sufficient because the third could be found using the relation SSG + SSE = SSTotal. Diff: 2 Type: BI Var: 1 L.O.: 8.1.0
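The sums of squares above can be reproduced from the group sizes, means, and standard deviations alone. The following is a minimal sketch under the assumption that the grand mean is recomputed from the group means instead of using the rounded 5.87; the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(sizes, means, sds):
    """Return (SSG, SSE) for a one-way ANOVA from per-group summary statistics."""
    n = sum(sizes)
    # Grand mean as the size-weighted average of the group means
    grand_mean = sum(n_i * m for n_i, m in zip(sizes, means)) / n
    ssg = sum(n_i * (m - grand_mean) ** 2 for n_i, m in zip(sizes, means))
    sse = sum((n_i - 1) * s ** 2 for n_i, s in zip(sizes, sds))
    return ssg, sse


# Hockey penalties: Teams A, B, C with five games each
sizes = [5, 5, 5]
ssg, sse = sums_of_squares(sizes, [8.6, 4.2, 4.8], [2.191, 2.280, 2.588])

# F-statistic with k - 1 = 2 and n - k = 12 degrees of freedom
f_stat = (ssg / (len(sizes) - 1)) / (sse / (sum(sizes) - len(sizes)))
print(round(ssg, 2), round(sse, 2), round(f_stat, 2))
```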
10) Construct the ANOVA table and test, at the 5% significance level, for a difference in mean number of penalties among these three hockey teams. Use two decimal places when rounding decimal values. Include the details of your test. Answer:
Source   df   SS       MS      F
Groups   2    56.94    28.47   5.11
Error    12   66.79    5.57
Total    14   123.74
$H_0: \mu_A = \mu_B = \mu_C$
$H_a$: At least one $\mu_i \neq \mu_j$
With F = 5.11, the p-value = 0.025 (from F-distribution with 2 and 12 degrees of freedom, using Statkey). We have evidence to reject $H_0$ and thus have evidence of a significant difference in mean number of penalties among the three teams. Diff: 2 Type: ES Var: 1 L.O.: 8.1.1 11) ANOVA output gives a p-value of 0.025 for the difference in mean number of penalties among the three teams. Using α = 0.05, what is the conclusion of the test in context? A) There is evidence of a significant difference in mean number of penalties among the three teams. B) There is not enough evidence to conclude that there is a significant difference in mean number of penalties among the three teams. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 8.1.0;8.1.1
12) Use the summary information and results from the ANOVA to construct 95% confidence intervals for the differences in each pair of means: (a) Team A and Team B (b) Team A and Team C (c) Team B and Team C In each case, round the margin of error to two decimal places. Based on your work, which teams have significantly different means? Briefly justify your answer. Answer: Use df = 12 and MSE = 5.57 for all intervals.
(a) $(8.6 - 4.2) \pm 2.179\sqrt{5.57\left(\frac{1}{5} + \frac{1}{5}\right)}$; 4.4 ± 2.179(1.492649); 4.4 ± 3.25; 1.15 to 7.65
(b) $(8.6 - 4.8) \pm 2.179\sqrt{5.57\left(\frac{1}{5} + \frac{1}{5}\right)}$; 3.8 ± 3.25; 0.55 to 7.05
(c) $(4.2 - 4.8) \pm 2.179\sqrt{5.57\left(\frac{1}{5} + \frac{1}{5}\right)}$; -0.6 ± 3.25; -3.85 to 2.65
Because the first two intervals do not contain 0, they provide evidence that those pairs of means are significantly different. In both cases, since the interval encompasses entirely positive values, there is evidence that Team A has the larger mean. There is evidence that Team A earns significantly more penalties than either Team B or Team C. There is no evidence that the mean number of penalties for Teams B and C differs. Note that the results here are different from those in the computer output problems because these intervals have not been adjusted for multiple comparisons. Diff: 2 Type: ES Var: 1 L.O.: 8.2.1
13) Computer output provides the following information about the pairwise differences:
CI for Difference   Lower    Center   Upper
B - A               -8.378   -4.400   -0.422
C - A               -7.778   -3.800   0.178
C - B               -3.378   0.600    4.578
Based on this output, which teams have significantly different means? A) Teams A and B B) Teams A and B; Teams A and C C) Teams A and C D) All three teams have significantly different means Answer: A Explanation: Teams A and B have significantly different means because the confidence interval for the difference in means does not contain 0. Because the interval only contains negative values, we have evidence that Team A gets significantly more penalties than Team B. There are no other significant differences. Diff: 2 Type: BI Var: 1 L.O.: 8.2.0 14) Computer output provides the following grouping information:
Team   N   Mean    Grouping
A      5   8.600   A
C      5   4.800   A B
B      5   4.200   B
Means that do not share a letter are significantly different. Based on this output, which teams have significantly different means? A) Teams A and B B) Teams A and B; Teams A and C C) Teams A and C D) All three teams have significantly different means Answer: A Explanation: Teams A and B have significantly different means because they do not share a letter. There is evidence that Team A gets significantly more penalties than Team B. Diff: 3 Type: BI Var: 1 L.O.: 8.2.0
Use the following to answer the questions below: Breakfast is often considered to be the most important meal of the day. Data on the number of calories per serving for randomly selected cereals from three different brands (General Mills, Kellogg's, and Kashi) are summarized in the provided plot and table.
Brand           n    Mean     s
General Mills   26   125.77   41.10
Kashi           16   178.13   39.02
Kellogg's       33   141.52   43.24
Overall         75   143.87   45.38
15) State the appropriate null and alternative hypotheses for testing if the mean calories per serving differs among the three brands. A) : = = ≠
: At least one B)
≠
: At least one :
C)
= :
: D)
= >
: :
= = > >
=
> =
Answer: A Diff: 2 Type: BI L.O.: 8.1.0;8.1.1
Var: 1
16) Are the conditions for using ANOVA reasonably satisfied? A) Yes B) No Answer: A Explanation: Yes. There are no outliers and the groups that are most skewed have fairly large sample sizes (so normal condition is reasonable). The standard deviations are all quite similar. Diff: 2 Type: BI Var: 1 L.O.: 8.1.3 17) Computer analysis gives a p-value of 0.001. Using α = 0.05, what is the conclusion of the test, in context? A) There is very strong evidence that the mean calories per serving is not the same for all three brands. B) There is not enough evidence to conclude that the mean calories per serving is not the same for all three brands. Answer: A Explanation: F = 7.92, p-value = 0.001 There is very strong evidence that the mean calories per serving is not the same for all three brands. Diff: 2 Type: BI Var: 1 L.O.: 8.1.1 18) Use the summary information and the fact that the sum of squares for groups is SSG = 27,476 and the total sum of squares is SSTotal = 152,379 to complete an ANOVA table and find the F-statistic. Round decimal answers to two decimal places. Answer:
Source   df   SS        MS         F
Groups   2    27,476    13,738     7.92
Error    72   124,903   1,734.76
Total    74   152,379
Diff: 2 Type: ES Var: 1 L.O.: 8.1.0;8.1.1
19) Use the summary information and the fact that the sum of squares for groups is SSG = 27,476 and the total sum of squares is SSTotal = 152,379 to complete an ANOVA table and find the F-statistic. Use the F-distribution to find the p-value and state the conclusion of the test in context (using α = 0.05). A) p-value = 0.00078 There is very strong evidence that the mean calories per serving differs significantly among the three brands. B) p-value = 0.00078 There is not enough evidence to conclude that the mean calories per serving differs significantly among the three brands. C) p-value = 0.078 There is very strong evidence that the mean calories per serving differs significantly among the three brands. D) p-value = 0.078 There is not enough evidence to conclude that the mean calories per serving differs significantly among the three brands. Answer: A Explanation: p-value = 0.00078 (From F-distribution with 2 and 72 degrees of freedom, using Statkey) There is very strong evidence that the mean calories per serving differs significantly among the three brands. Diff: 2 Type: BI Var: 1 L.O.: 8.1.0;8.1.1
20) Computer output from Minitab provides the following information about the pairwise differences:
Brand = General Mills subtracted from:
Brand       Lower    Center   Upper
Kashi       20.73    52.36    83.99
Kellogg's   -10.36   15.75    41.85
Brand = Kashi subtracted from:
Brand       Lower    Center   Upper
Kellogg's   -66.93   -36.61   -6.28
Based on this output, which brands have significantly different means? A) Kashi and General Mills have significantly different means. Kashi and Kellogg's have significantly different means. Kellogg's and General Mills are not significantly different. B) Kashi and General Mills have significantly different means. Kashi and Kellogg's are not significantly different. Kellogg's and General Mills are not significantly different. C) Kashi and General Mills are not significantly different. Kashi and Kellogg's are not significantly different. Kellogg's and General Mills are not significantly different. D) Kashi and General Mills are not significantly different. Kashi and Kellogg's have significantly different means. Kellogg's and General Mills have significantly different means. Answer: A Explanation: Kashi and General Mills have significantly different means. The confidence interval for the difference in means is (20.73, 83.99). The interval does not contain 0, which indicates a significant difference. Since the interval only contains positive values, Kashi cereals have significantly more calories per serving than General Mills cereals. Kashi and Kellogg's have significantly different means. The confidence interval for the difference in means (Kellogg's minus Kashi) is (-66.93, -6.28). The interval does not contain 0, which indicates a significant difference. Since the interval only contains negative values, Kashi cereals have significantly more calories per serving than Kellogg's cereals. Kellogg's and General Mills are not significantly different because the confidence interval contains 0. Diff: 2 Type: BI Var: 1 L.O.: 8.2.0
21) Computer output provides the following grouping information:
Brand           N    Mean     Grouping
Kashi           16   178.13   A
Kellogg's       33   141.52   B
General Mills   26   125.77   B
Means that do not share a letter are significantly different. Based on this output, which brands have significantly different means? Briefly justify your answer. Answer: Kashi and General Mills have significantly different means because they do not share a letter. Kashi cereals have significantly more calories per serving than General Mills cereals. Kashi and Kellogg's have significantly different means because they do not share a letter. Kashi cereals have significantly more calories per serving than Kellogg's cereals. Kellogg's and General Mills are not significantly different because they share a letter (B). Diff: 2 Type: ES Var: 1 L.O.: 8.2.0
22) Use the summary information and the fact that the sum of squares for groups is SSG = 27,476 and the total sum of squares is SSTotal = 152,379 to test for significant differences (using α = 0.05) in each pair of means: (a) General Mills and Kashi (b) General Mills and Kellogg's (c) Kashi and Kellogg's In each case, round the test statistic to three decimal places. Based on your work, which brands have significantly different means? Briefly justify your answer. Answer: Use MSE = 1,734.76 and df = 72 for all tests.
(a) General Mills and Kashi
$H_0: \mu_{\text{GM}} = \mu_{\text{Kashi}}$, $H_a: \mu_{\text{GM}} \neq \mu_{\text{Kashi}}$
$t = \frac{125.77 - 178.13}{\sqrt{1734.76\left(\frac{1}{26} + \frac{1}{16}\right)}} = -3.956$
p-value ≈ 0 (two-tail probability, df = 72, using Statkey) Very strong evidence that the mean calories per serving differs significantly for General Mills and Kashi cereals (Kashi has the larger mean).
(b) General Mills and Kellogg's
$H_0: \mu_{\text{GM}} = \mu_{\text{Kellogg's}}$, $H_a: \mu_{\text{GM}} \neq \mu_{\text{Kellogg's}}$
$t = \frac{125.77 - 141.52}{\sqrt{1734.76\left(\frac{1}{26} + \frac{1}{33}\right)}} = -1.442$
p-value = 0.154 (two-tail probability, df = 72, using Statkey) No evidence that the mean calories per serving differs significantly for General Mills and Kellogg's cereals.
(c) Kashi and Kellogg's
$H_0: \mu_{\text{Kashi}} = \mu_{\text{Kellogg's}}$, $H_a: \mu_{\text{Kashi}} \neq \mu_{\text{Kellogg's}}$
$t = \frac{178.13 - 141.52}{\sqrt{1734.76\left(\frac{1}{16} + \frac{1}{33}\right)}} = 2.885$
p-value = 0.005 (two-tail probability, df = 72, using Statkey) Very strong evidence that the mean calories per serving differs significantly for Kashi and Kellogg's cereals (Kashi has the larger mean).
Diff: 2 Type: ES Var: 1 L.O.: 8.2.2
Use the following to answer the questions below: Breakfast is often considered to be the most important meal of the day. Data on the amount of sugar (g) per serving for randomly selected cereals from three different brands (General Mills, Kellogg's, and Kashi) are summarized in the provided plot and table.
Brand           n    Mean     s
General Mills   26   8.538    4.492
Kashi           16   8.500    3.183
Kellogg's       33   10.636   3.516
Overall         75   9.453    3.916
23) Are the conditions for using ANOVA reasonably satisfied? A) Yes B) No Answer: A Explanation: Yes, the data for each group is reasonably symmetric (and the groups that are least symmetric have the larger sample sizes), and the standard deviations are similar (none is more than twice another). Diff: 2 Type: MC Var: 1 L.O.: 8.1.3
24) Computer output from the analysis gives a p-value of 0.066. Test, at the 5% level, if there is evidence that the average amount of sugar per serving differs significantly among the three brands. A) There is no evidence that the average amount of sugar per serving differs significantly among the three brands. B) The average amount of sugar per serving differs significantly between Kellogg's and Kashi. C) Kellogg's average amount of sugar per serving is significantly greater than both Kashi and General Mills. D) The average amount of sugar per serving differs significantly among the three brands. Answer: A Explanation: $H_0: \mu_1 = \mu_2 = \mu_3$, $H_a$: At least one $\mu_i \neq \mu_j$
F = 2.82 p-value = 0.066 There is no evidence that the average amount of sugar per serving differs significantly among the three brands. Diff: 2 Type: BI Var: 1 L.O.: 8.1.1 25) Use the summary information to compute the three sums of squares needed for using ANOVA to test for a difference in the mean amount of sugar per serving among the three brands. Round each to two decimal places. Answer:
SSTotal = $(75 - 1)(3.916)^2$ = 1,134.79
SSE = $(26 - 1)(4.492)^2 + (16 - 1)(3.183)^2 + (33 - 1)(3.516)^2$ = 1,052.02
SSG = SSTotal - SSE = 1,134.79 - 1,052.02 = 82.77
Diff: 2 Type: ES Var: 1 L.O.: 8.1.1
26) Construct the ANOVA table and test, at the 5% significance level, for a difference in mean amount of sugar among the three brands. Use two decimal places in all decimal values. Is there enough evidence to conclude that the average amount of sugar per serving differs significantly among the three brands? A) Yes B) No Answer: B Explanation:
Source   df   SS         MS      F
Groups   2    82.77      41.39   2.83
Error    72   1,052.02   14.61
Total    74   1,134.79
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least one $\mu_i \neq \mu_j$
p-value = 0.066 (using F = 2.83 and right-tail probability in the F-distribution with 2 and 72 degrees of freedom, using Statkey) There is no evidence that the average amount of sugar per serving differs significantly among the three brands. Diff: 2 Type: MC Var: 1 L.O.: 8.1.1 27) Should you conduct inference after the ANOVA to investigate differences among the pairs of means in this situation? Briefly explain why or why not. Answer: No, you only look for differences among pairs of means when the ANOVA suggests that there is evidence of some differences among the means. Since, in this analysis, we found no evidence of significant differences among the mean amount of sugar per serving for the three brands, there is no need to perform a follow-up analysis of the pairwise differences. Diff: 2 Type: ES Var: 1 L.O.: 8.2.0
Use the following to answer the questions below: An environmental studies student working on an independent research project was investigating metal contamination in the St. Lawrence River. The metals can accumulate in organisms that live in the river (known as bioaccumulation). He collected samples of Quagga mussels at three sites in the St. Lawrence River and measured the concentration of copper (in micrograms per gram, μg/g or mcg/g) in the mussels. His data are summarized in the provided table and plot. He wants to know if there are any significant differences in mean copper concentration among the three sites.
Site      Copper Concentration (μg/g)   n    Mean    s
1         19.9 23.4 17.5 25.4 20.5      5    21.34   3.092
2         13.0 18.8 18.4 16.1           4    16.60   2.687
3         18.4 13.8 7.0 11.4 15.2       5    13.16   4.274
Overall                                 14   17.06   4.82
28) Are the conditions for using ANOVA reasonably satisfied? A) Yes B) No Answer: A Explanation: Yes. The boxplots indicate that the data for all groups is reasonably symmetric and without outliers. The standard deviation for each group is not more than twice the standard deviation for another group. Diff: 2 Type: MC Var: 1 L.O.: 8.1.3
29) Use the summary information to compute the three sums of squares needed for using ANOVA to test for a difference in mean copper concentration among the three sites. Round each to two decimal places. Answer:
SSTotal = $(14 - 1)(4.82)^2$ = 302.02
SSE = $(5 - 1)(3.092)^2 + (4 - 1)(2.687)^2 + (5 - 1)(4.274)^2$ = 132.97
SSG = SSTotal - SSE = 169.05
Diff: 2 Type: ES Var: 1 L.O.: 8.1.0;8.1.1 30) Construct the ANOVA table and test, using α = 0.05, for a difference in mean copper concentration among the three sites. Round decimal values to two decimal places. Include all details of the test. Answer:
Source   df   SS       MS      F
Groups   2    169.05   84.53   6.99
Error    11   132.97   12.09
Total    13   302.02
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: At least one $\mu_i \neq \mu_j$
F = 6.99 p-value = 0.011 (right-tail probability in the F distribution with 2 and 11 degrees of freedom, using Statkey) There is evidence of a difference in mean copper concentration among the three sites. Diff: 2 Type: ES Var: 1 L.O.: 8.1.1 31) Computer output from the analysis gives a p-value of 0.011. Test, using α = 0.05, for a difference in mean copper concentration among the three sites. Include all details of the test. Answer: $H_0: \mu_1 = \mu_2 = \mu_3$, $H_a$: At least one $\mu_i \neq \mu_j$
F = 6.97 and p-value = 0.011 There is evidence of a difference in mean copper concentration among the three sites. Diff: 2 Type: ES Var: 1 L.O.: 8.1.1
32) Use the summary information and results from the ANOVA to construct 95% confidence intervals for the differences in each pair of means: (a) Site 1 and Site 2 (b) Site 1 and Site 3 (c) Site 2 and Site 3 In each case, round the margin of error to two decimal places. Based on your work, which sites have significantly different means? Briefly justify your answer. Answer: Use df = 11 and MSE = 12.09 for all intervals.
(a) Site 1 and Site 2: $(21.34 - 16.6) \pm 2.201\sqrt{12.09\left(\frac{1}{5} + \frac{1}{4}\right)}$; 4.74 ± 2.201(2.332488); 4.74 ± 5.13; -0.39 to 9.87
Because this confidence interval contains 0, there is no evidence that Sites 1 and 2 have significantly different mean concentrations of copper.
(b) Site 1 and Site 3: $(21.34 - 13.16) \pm 2.201\sqrt{12.09\left(\frac{1}{5} + \frac{1}{5}\right)}$; 8.18 ± 2.201(2.199091); 8.18 ± 4.84; 3.34 to 13.02
The confidence interval does not contain 0, thus there is evidence that Sites 1 and 3 have significantly different mean concentrations of copper. Because the interval only contains positive values, there is evidence that Site 1 has a significantly higher mean concentration of copper than Site 3.
(c) Site 2 and Site 3: $(16.6 - 13.16) \pm 2.201\sqrt{12.09\left(\frac{1}{4} + \frac{1}{5}\right)}$; 3.44 ± 2.201(2.332488); 3.44 ± 5.13; -1.69 to 8.57
Because this confidence interval contains 0, there is no evidence that Sites 2 and 3 have significantly different mean concentrations of copper. Diff: 2 Type: ES Var: 1 L.O.: 8.2.1
33) Computer output from the analysis provides the following information about the pairwise differences:
Site = 1 subtracted from:
Site   Lower     Center   Upper
2      -11.040   -4.740   1.560
3      -14.120   -8.180   -2.240
Site = 2 subtracted from:
Site   Lower     Center   Upper
3      -9.740    -3.440   2.860
Based on this output, which sites have significantly different means? A) Only Sites 1 and 3 have significantly different means. B) Only Sites 1 and 2 have significantly different means. C) Only Sites 2 and 3 have significantly different means. D) None of the sites have significantly different means. Answer: A Explanation: Only Sites 1 and 3 have significantly different means because that is the only confidence interval that does not contain 0. Because the interval contains entirely negative values, there is evidence that Site 1 has a significantly larger mean concentration of copper than Site 3. Diff: 2 Type: BI Var: 1 L.O.: 8.2.0
34) Computer output from the analysis provides the following grouping information:
Site   N   Mean     Grouping
1      5   21.340   A
2      4   16.600   A B
3      5   13.160   B
Means that do not share a letter are significantly different. Based on this output, which sites have significantly different means? Briefly justify your answer. A) Only Sites 1 and 3 have significantly different means. B) Only Sites 1 and 2 have significantly different means. C) Only Sites 2 and 3 have significantly different means. D) None of the sites have significantly different means. Answer: A Explanation: Only Sites 1 and 3 have significantly different means because that is the only pair that does not share a letter. Because Site 1 has the larger sample mean in the pair, there is evidence that Site 1 has a significantly larger mean concentration of copper than Site 3. Diff: 3 Type: BI Var: 1 L.O.: 8.2.0 Use the following to answer the questions below: Summary statistics from a dataset and the corresponding computer analysis of variance output are provided.
Level   N    Mean     StDev
A       25   36.703   4.610
B       25   30.019   5.173
C       25   32.483   5.166
Source   DF   SS       MS      F       P
Groups   2    571.2    285.6   11.47   0.000
Error    72   1792.9   24.9
Total    74   2364.2
35) What is the pooled standard deviation? A) 24.9 B) 4.99 C) 16.90 D) 42.34 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 8.2.0
36) What degrees of freedom are used in doing inferences for these means and differences in means after ANOVA? A) 2 B) 3 C) 72 D) 74 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 8.2.0 37) Find a 90% confidence interval for the mean of population A. Round the margin of error to three decimal places. A) 35.040 to 38.366 B) 35.061 to 38.345 C) 34.747 to 38.660 D) 34.937 to 38.470 Answer: A Explanation:
$36.703 \pm 1.666\sqrt{\frac{24.9}{25}}$
36.703 ± 1.666(0.998)
36.703 ± 1.663
35.040 to 38.366
Diff: 2 Type: BI Var: 1 L.O.: 8.2.1
38) Find a 90% confidence interval for the difference in the means of Populations A and B. Round the margin of error to three decimal places. A) 4.333 to 9.035 B) 4.362 to 9.006 C) 3.918 to 9.405 D) 4.191 to 9.177 Answer: A Explanation:
$(36.703 - 30.019) \pm 1.666\sqrt{24.9\left(\frac{1}{25} + \frac{1}{25}\right)}$
6.684 ± 1.666(1.411382)
6.684 ± 2.351
4.333 to 9.035
Diff: 2 Type: BI Var: 1 L.O.: 8.2.1
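Questions 35 and 37 both use the error mean square from the ANOVA table: the pooled standard deviation is its square root, and the interval for a single mean uses $\sqrt{MSE/n}$ as its standard error. A small sketch of both calculations follows; the function name `mean_ci` is an assumption for illustration, and the critical value 1.666 is taken from the answer key.

```python
import math

MSE = 24.9  # error mean square from the ANOVA output above

# Pooled standard deviation is the square root of the error mean square
pooled_sd = math.sqrt(MSE)


def mean_ci(mean, n, mse, t_star):
    """Confidence interval for one group mean after ANOVA (t_star supplied by caller)."""
    margin = t_star * math.sqrt(mse / n)
    return mean - margin, mean + margin


# 90% interval for population A, with t* = 1.666 for df = 72
ci_a = mean_ci(36.703, 25, MSE, 1.666)
print(round(pooled_sd, 2), ci_a)
```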
39) Test for a difference in population means between groups A and C. Use α = 0.05 and show all details of the test. Round the test statistic to two decimal places. Answer: $H_0: \mu_A = \mu_C$, $H_a: \mu_A \neq \mu_C$
$t = \frac{36.703 - 32.483}{\sqrt{24.9\left(\frac{1}{25} + \frac{1}{25}\right)}} = 2.99$
p-value = 0.004 There is very strong evidence that the means of populations A and C are significantly different (and that population A has a significantly larger mean than population C). Diff: 2 Type: ES Var: 1 L.O.: 8.2.2
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 9 Inference for Regression
9.1 Inference for Slope and Correlation
Use the following to answer the questions below: Computer output from a regression analysis is provided. The regression equation is Y = 72.9 - 0.519 X
Predictor   Coef      SE Coef   T       P
Constant    72.909    2.037     35.79   0.000
X           -0.5195   0.1946    -2.67   0.008
1) What is the sample slope for this model? A) 72.909 B) 2.037 C) -0.5195 D) 0.1946 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 9.1.1 2) What is the sample intercept for this model? A) 72.909 B) 2.037 C) -0.5195 D) 0.1946 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
3) What is the standard error of the sample slope? A) 72.909 B) 2.037 C) -0.519 D) 0.1946 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 9.1.0
4) What is the p-value for testing if the slope in the population is different from zero? A) 0.5195 B) 0.1946 C) p < 0.001 D) 0.008 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 9.1.0 5) The sample size in this situation is n = 157. What are the degrees of freedom for constructing a confidence interval for, or performing a test about, the population slope? A) 157 B) 156 C) 155 D) 153 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 9.1.0 6) The sample size in this situation is n = 157. Construct a 95% confidence interval for the population slope. Round the margin of error to four decimal places. A) -0.9038 to -0.1352 B) -0.9009 to -0.1381 C) -0.8976 to -0.1414 D) -0.9138 to -0.1252 Answer: A Explanation: t* = 1.975 (df = 155) -0.5195 ± 1.975(0.1946) -0.5195 ± 0.3843 -0.9038 to -0.1352 Diff: 2 Type: BI L.O.: 9.1.2
Var: 1
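Questions 3 through 6 can all be read off the slope row of the regression output. The sketch below shows the arithmetic under the assumption that the critical value t* = 1.975 is looked up separately, as in the answer key; `slope_inference` is a hypothetical helper name.

```python
def slope_inference(slope, se, n, t_star):
    """Degrees of freedom, t-statistic, and confidence interval for a regression slope."""
    df = n - 2                 # simple linear regression estimates two parameters
    t_stat = slope / se        # tests H0: population slope = 0
    margin = t_star * se
    return df, t_stat, (slope - margin, slope + margin)


# Slope row of the output: Coef = -0.5195, SE Coef = 0.1946, with n = 157
df, t_stat, ci = slope_inference(-0.5195, 0.1946, 157, 1.975)
print(df, round(t_stat, 2), ci)
```

The t-statistic reproduces the T column of the output, which is a quick way to confirm which row of the table holds the slope.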
7) Use the p-value for testing if the slope in the population is different from zero (and a 5% significance level) to make a clear conclusion about the effectiveness of the model. A) p-value = 0.008 There is very strong evidence that the population slope differs from zero, and thus this is an effective model for predicting this response variable. B) p-value = 0.008 There is not enough evidence that the population slope differs from zero, and thus this is not an effective model for predicting this response variable. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.3 Use the following to answer the questions below: Computer output from a regression analysis is provided.
Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   7.2960     14.5444      0.502     0.62200
X             1.6370     0.5453       3.002     0.00765
8) What is the sample slope for this model? A) 1.6370 B) 0.5453 C) 7.2960 D) 14.5444 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.1 9) What is the sample intercept for this model? A) 1.6370 B) 0.5453 C) 7.2960 D) 14.5444 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 9.1.1 10) What is the standard error of the sample slope? A) 1.6370 B) 0.5453 C) 7.2960 D) 14.5444 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 9.1.0
11) What is the p-value for testing if the slope in the population is different from zero? A) 0.502 B) 0.622 C) 0.5453 D) 0.00765 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 9.1.0 12) The sample size in this situation is n = 20. What are the degrees of freedom for constructing a confidence interval, or performing a test about, the population slope? A) 17 B) 18 C) 19 D) 20 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 9.1.0 13) The sample size in this situation is n = 20. Construct a 95% confidence interval for the population slope. Round the margin of error to three decimal places. A) 0.491 to 2.783 B) 0.568 to 2.706 C) 0.578 to 2.697 D) 0.5327 to 2.741 Answer: A Explanation: t* = 2.101 (df = 18) 1.637 ± 2.101(0.5453) 1.637 ± 1.146 0.491 to 2.783 Diff: 2 Type: BI L.O.: 9.1.1
Var: 1
14) Use the p-value for testing if the slope in the population is different from zero (and a 5% significance level) to make a clear conclusion about the effectiveness of the model. A) p-value = 0.00765 There is very strong evidence that the population slope differs from zero, and thus this is an effective model for predicting this response variable. B) p-value = 0.00765 There is not enough evidence that the population slope differs from zero, and thus this is not an effective model for predicting this response variable. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.3
15) The website for the Quantitative Environmental Learning Project (funded by the National Science Foundation) describes data they collected on the lengths and widths of Puget Sound Butter Clams. A scatterplot of the data (with the regression line) is provided.
Use the scatterplot to check each of the conditions for using a linear model with these data. Is using a linear model appropriate for these data? Answer: There is no curve in the scatterplot, so the linear condition seems reasonable. There are no outliers to be concerned about. There is a pattern that indicates increasing spread (the widths are more spread out for longer clams [8 or more cm long] than shorter clams). This fanning pattern in the data suggests that we shouldn't use a linear model in this situation. Diff: 3 Type: ES Var: 1 L.O.: 9.1.6
16) In a random sample of 41 students, the correlation between Math SAT score and college GPA is 0.289. Is there a significant linear association between Math SAT score and college GPA? Use α = 0.05. Include all details of the test. Round the test statistic to two decimal places. Answer: H0: ρ = 0, Ha: ρ ≠ 0
t = r√(n - 2) / √(1 - r²) = 0.289√39 / √(1 - 0.289²) = 1.89
p-value = 0.066 (two-tail in a t distribution with df = 39) There is not enough evidence of a significant linear association between Math SAT score and college GPA (the p-value is greater than the significance level). Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
17) In a random sample of 41 students, the correlation between Verbal SAT score and college GPA is 0.574. Is there evidence of a positive correlation between Verbal SAT score and college GPA? Use a 5% significance level. Include all details of the test. Round the test statistic to two decimal places. Answer: H0: ρ = 0, Ha: ρ > 0
t = r√(n - 2) / √(1 - r²) = 0.574√39 / √(1 - 0.574²) = 4.38
p-value ≈ 0 (right tail in a t distribution with df = 39) There is very strong evidence of a positive correlation between Verbal SAT score and college GPA. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
18) In a random sample of 41 college students, the correlation between number of hours of television watched in a typical week and college GPA is -0.125. Is there evidence of a negative correlation between the amount of television watched and college GPA? Use a 5% significance level. Include all details of the test. Round the test statistic to three decimal places. Answer: H0: ρ = 0, Ha: ρ < 0
t = r√(n - 2) / √(1 - r²) = -0.125√39 / √(1 - (-0.125)²) = -0.787
p-value = 0.218 (left tail in a t distribution with df = 39) There is no evidence that the amount of television watched in a typical week is negatively correlated with college GPA. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
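The three correlation tests above share one test statistic, t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. A minimal Python sketch (the function name is illustrative, not from the text) reproduces the three values; the p-values would still need a t table or statistical software.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, using n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Correlations quoted in questions 16-18, each with n = 41 students.
for label, r in [("Math SAT", 0.289), ("Verbal SAT", 0.574), ("TV hours", -0.125)]:
    print(f"{label}: t = {correlation_t_statistic(r, 41):.3f}")
```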
9.2 ANOVA for Regression
Use the following to answer the questions below: In a regression analysis based on a sample of size n = 30, SSModel = 750 and SSTotal = 2,500.
1) Use this information to fill in all values in an analysis of variance table as shown.
Source   df   SS   MS   F-statistic   p-value
Model
Error
Total
Answer:
Source   df      SS     MS   F-statistic   p-value
Model     1     750    750            12    0.0017
Error    28   1,750   62.5
Total    29   2,500
Diff: 2 Type: ES Var: 1 L.O.: 9.2.1
2) Compute R². A) 30% B) 83% C) 1.2% D) 5.5% Answer: A Explanation: R² = SSModel / SSTotal = 750 / 2,500 = 0.3 or 30% Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
3) Compute the standard deviation of the error term. Use two decimal places in your answer. A) 7.91 B) 6.25 C) 1.50 D) 7.64 Answer: A Explanation: s_ε = √(SSE / (n - 2)) = √(1,750 / 28) = 7.91 Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
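The ANOVA bookkeeping in questions 1 to 3 follows mechanically from n, SSModel, and SSTotal. The sketch below uses hypothetical function and key names, and it leaves out the p-value, which requires the F distribution.

```python
import math


def simple_regression_anova(n, ss_model, ss_total):
    """Fill in the ANOVA quantities for a regression with one predictor."""
    ss_error = ss_total - ss_model
    df_model, df_error = 1, n - 2
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "df": (df_model, df_error, n - 1),
        "SS": (ss_model, ss_error, ss_total),
        "MS": (ms_model, ms_error),
        "F": ms_model / ms_error,
        "R2": ss_model / ss_total,
        "s_error": math.sqrt(ms_error),  # standard deviation of the error term
    }


table = simple_regression_anova(n=30, ss_model=750, ss_total=2500)
print(table)
```

The same function applies when SSE is given instead of SSModel: pass `ss_model = ss_total - sse`.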
Use the following to answer the questions below: In a regression analysis with n = 25, SSE = 1,800 and SSTotal = 2,000.
4) Use this information to fill in all values in an analysis of variance table as shown. Round decimal answers to three decimal places.
Source   df   SS   MS   F-statistic   p-value
Model
Error
Total
Answer:
Source   df      SS       MS   F-statistic   p-value
Model     1     200      200         2.556     0.124
Error    23   1,800   78.261
Total    24   2,000
Diff: 1 Type: ES Var: 1 L.O.: 9.2.1
5) Compute R². A) 10% B) 0.10% C) 11% D) 0.11% Answer: A Explanation: R² = SSModel / SSTotal = (2,000 - 1,800) / 2,000 = 200 / 2,000 = 0.1 or 10% Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
6) Compute the standard deviation of the error term. Use two decimal places in your answer. A) 8.85 B) 8.49 C) 1.70 D) 1.84 Answer: A Explanation: s_ε = √(SSE / (n - 2)) = √(1,800 / 23) = 8.85 Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
9.3 Confidence and Prediction Intervals
Use the following to answer the questions below: Students in a small statistics course wanted to investigate if forearm length (in cm) was useful for predicting foot length (in cm). The data they collected are displayed in the provided scatterplot (with regression line), and the computer output from the analysis is provided. Use three decimal places when reporting the results from any calculations, unless otherwise specified.
The regression equation is Foot (cm) = 9.22 + 0.574 Forearm (cm)
Predictor      Coef     SE Coef   T      P
Constant       9.216    4.521     2.04   0.066
Forearm (cm)   0.5735   0.1578    3.63   0.004

Source           DF   SS       MS       F       P
Regression        1   44.315   44.315   13.20   0.004
Residual Error   11   36.916    3.356
Total            12   81.231

Predicted Values for New Observations
Forearm (cm)   Fit      SE Fit   95% CI             95% PI
28             25.274   0.513    (24.144, 26.403)   (21.086, 29.461)
1) Consider the scatterplot. Should we have any significant concerns about the conditions being met for using a linear model with these data? A) Yes B) No Answer: B Explanation: There should be no major concerns. There is not a curved relationship, no real evidence of changing variance, and no major outliers. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
2) Use the fitted model to predict the foot length for someone whose forearm is 30 cm long. Report your answer with two decimal places. A) 26.44 cm B) 25.89 cm C) 26.12 cm D) 25.73 cm Answer: A Explanation: Predicted foot length = 9.22 + 0.574(30) = 26.44 The predicted foot length for someone whose forearm is 30 cm long is 26.44 cm. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
3) What is the estimated slope in this regression model? A) 0.574 B) 9.22 C) 0.5735 D) 9.216 Answer: A Explanation: Sample slope = 0.574 For an additional centimeter in forearm length, foot length is predicted to increase by 0.574 cm. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
4) What is the test statistic for a test of the slope? What is the p-value? What is the conclusion of the test, in context? A) t = 3.63; p-value = 0.004 There is strong evidence that forearm length is a useful predictor of foot length. B) t = 2.04; p-value = 0.066 There is not enough evidence to conclude that forearm length is a useful predictor of foot length. C) t = 3.63; p-value = 0.004 There is not enough evidence to conclude that forearm length is a useful predictor of foot length. D) t = 2.04; p-value = 0.066 There is strong evidence that forearm length is a useful predictor of foot length. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.3
5) Use the ANOVA table to determine the overall sample size. A) 13 B) 12 C) 11 D) 10 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.0
6) Construct a 90% confidence interval for the population slope. A) 0.291 to 0.857 B) 0.265 to 0.883 C) 0.314 to 0.834 D) 0.277 to 0.871 Answer: A Explanation: n = 13, so df = 11 and t* = 1.796. The interval is 0.574 ± 1.796(0.1578) = 0.574 ± 0.283, or 0.291 to 0.857. Diff: 2 Type: BI Var: 1 L.O.: 9.1.2
7) Use the ANOVA table to compute and interpret R². A) 0.546 About 55% of the variability in foot lengths in this sample is explained by the person's forearm length. B) 0.298 About 30% of the variability in foot lengths in this sample is explained by the person's forearm length. C) 0.454 About 45% of the variability in foot lengths in this sample is explained by the person's forearm length. D) 0.206 About 21% of the variability in foot lengths in this sample is explained by the person's forearm length. Answer: A Explanation: R² = SSModel / SSTotal = 44.315 / 81.231 = 0.546 About 55% of the variability in foot lengths in this sample is explained by the person's forearm length. Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
8) The correlation between foot length and forearm length is 0.7389. Compute and interpret R² for this regression model. Answer: R² = r² = 0.7389² = 0.546 About 55% of the variability in foot lengths in this sample is explained by the person's forearm length. Diff: 2 Type: ES Var: 1 L.O.: 9.1.5
9) Use the ANOVA table to find the standard deviation of the error term. Round your answer to three decimal places. Answer: 1.832 Explanation: s_ε = √(SSE / (n - 2)) = √(36.916 / 11) = 1.832 Diff: 2 Type: SA Var: 1 L.O.: 9.2.3
10) Based on the output, provide and interpret a 95% confidence interval for the mean foot length for all individuals with a forearm that is 28 cm long. Answer: CI = (24.144, 26.403) We are 95% sure that the mean foot length for all individuals with a forearm that is 28 cm long is between 24.1 cm and 26.4 cm. Diff: 2 Type: ES Var: 1 L.O.: 9.3.1
11) Based on the output, provide and interpret a 95% prediction interval for the foot length of a specific individual with a forearm that is 28 cm long. Answer: PI = (21.086, 29.461) We are 95% sure that the foot length for a single individual with a forearm that is 28 cm long is between 21.1 cm and 29.5 cm. Diff: 2 Type: ES Var: 1 L.O.: 9.3.2
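The two intervals read from the output in questions 10 and 11 can be rebuilt from Fit, SE Fit, and the error mean square. The sketch below is an illustration under stated assumptions: the function name is invented, and t* = 2.201 (95%, df = 11) is a standard table value that does not appear in the output itself.

```python
import math


def intervals_from_fit(fit, se_fit, mse, t_star):
    """Return (CI for the mean response, PI for one new response)."""
    ci_margin = t_star * se_fit
    # A single response adds the error variance (MSE) to the variance of the fit.
    pi_margin = t_star * math.sqrt(mse + se_fit ** 2)
    return (fit - ci_margin, fit + ci_margin), (fit - pi_margin, fit + pi_margin)


# Fit, SE Fit, and MS for Residual Error come from the forearm output.
# t* = 2.201 is an assumed table value for 95% confidence with df = 11.
ci, pi = intervals_from_fit(fit=25.274, se_fit=0.513, mse=3.356, t_star=2.201)
print("CI:", ci)
print("PI:", pi)
```

Both results agree with the printed intervals to within rounding of the inputs, which shows why the prediction interval is always wider than the confidence interval at the same forearm length.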
12) When conducting inference for the population slope, it is most common to test if the population slope is different from zero. However, there are other situations where a different test might be more interesting. For instance, it is often said that the length of the forearm is roughly the same as the length of the foot (see, for example, the movie Pretty Woman). What population slope is implied by this statement, and what would the hypotheses for testing the accuracy of this claim look like? Answer: The statement implies that the population slope is about 1 (since forearm and foot length are roughly the same). The hypotheses that would be tested in this case are: H0: β1 = 1, Ha: β1 ≠ 1 Diff: 3 Type: ES Var: 1 L.O.: 9.1.0
Use the following to answer the questions below: Data were collected on GPA and number of Facebook friends for students in a small statistics class. Some summary statistics, partial output from the regression analysis, and a scatterplot of the data (with regression line) are provided. Assume that students in this class are typical of all students at the university. Use three decimal places when reporting the results from any calculations, unless otherwise specified.
Variable          Mean     StDev    Minimum   Maximum
GPA               3.2067   0.4118   2.4000    3.9500
FacebookFriends   677.9    307.2    185.0     1500.0

The regression equation is GPA = 3.830 - 0.000919 FacebookFriends
Source       DF   SS        MS        F       P
Regression    1   2.31193   2.31193   24.85   0.000
Error        28   2.60534   0.09305
Total        29   4.91727
13) Use the scatterplot to determine whether we should have any strong concerns about the conditions being met for using a linear model with these data. A) Yes B) No Answer: B Explanation: No, there shouldn't be any concerns. There is a linear pattern (no curve), there is no fanning pattern, and there are no outliers. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
14) Use the equation of the least squares line to predict the GPA for someone with 800 Facebook friends. A) 3.095 B) 3.830 C) 3.153 D) 3.024 Answer: A Explanation: Predicted GPA = 3.83 - 0.000919(800) = 3.095 Someone with 800 Facebook friends is predicted to have a 3.095 GPA. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
15) Use the information in the ANOVA table to determine the number of students included in the dataset. A) 31 B) 30 C) 29 D) 28 Answer: B Diff: 2 Type: BI Var: 1 L.O.: 9.2.0
16) Use the information in the ANOVA table to compute and interpret R². A) R² = 0.470 About 47% of the variability in GPA for students in this sample is explained by number of Facebook friends. B) R² = 0.686 About 69% of the variability in GPA for students in this sample is explained by number of Facebook friends. C) R² = 0.470 About 53% of the variability in GPA for students in this sample is explained by number of Facebook friends. D) R² = 0.686 About 31% of the variability in GPA for students in this sample is explained by number of Facebook friends. Answer: A Explanation: R² = SSModel / SSTotal = 2.31193 / 4.91727 = 0.470 About 47% of the variability in GPA for students in this sample is explained by number of Facebook friends. Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
17) Is the linear model effective at predicting GPA? Use the information from the computer output and α = 0.05. Include all details of the test. Answer: H0: β1 = 0, Ha: β1 ≠ 0 (or H0: The model is ineffective versus Ha: The model is effective)
F = 24.85 p-value ≈ 0 (using 1 and 28 degrees of freedom) There is very strong evidence that the linear model is effective for predicting GPA. Diff: 2 Type: ES Var: 1 L.O.: 9.2.1
18) Use the information in the computer output to compute the standard deviation of the error term. A) 0.305 B) 0.093 C) 0.412 D) 0.317 Answer: A Explanation: s_ε = √(SSE / (n - 2)) = √(2.60534 / 28) = 0.305 Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
19) Use the provided output to construct a 90% confidence interval for the mean GPA of all students with 800 Facebook friends. A) 2.993 to 3.197 B) 2.760 to 3.430 C) 3.035 to 3.155 D) 2.916 to 3.274 Answer: A Explanation: df = 28, so t* = 1.701
3.095 ± 1.701 ∙ 0.305 ∙ √(1/30 + (800 - 677.9)² / (29 ∙ 307.2²))
3.095 ± 1.701(0.305)(0.1969283)
3.095 ± 0.102
2.993 to 3.197
We are 90% sure that the mean GPA of all students with 800 Facebook friends is between 2.993 and 3.197. Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
20) Use the provided output to construct a 90% prediction interval for the GPA of a student with 800 Facebook friends. A) 2.566 to 3.624 B) 1.361 to 4.829 C) 2.784 to 3.406 D) 2.224 to 3.966 Answer: A Explanation: df = 28, so t* = 1.701
3.095 ± 1.701 ∙ 0.305 ∙ √(1 + 1/30 + (800 - 677.9)² / (29 ∙ 307.2²))
3.095 ± 1.701(0.305)(1.019206)
3.095 ± 0.529
2.566 to 3.624
We are 90% sure that a student with 800 Facebook friends will have a GPA between 2.566 and 3.624. Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
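The square-root factors 0.1969283 and 1.019206 in questions 19 and 20 come from the summary statistics alone. The following sketch (function and variable names are illustrative) recomputes them and the two margins of error, taking t*, the error standard deviation, and the fitted value as quoted in the explanations.

```python
import math


def interval_multipliers(n, x_star, x_bar, s_x):
    """Square-root factors for the mean-response CI and the single-response PI."""
    inside = 1 / n + (x_star - x_bar) ** 2 / ((n - 1) * s_x ** 2)
    return math.sqrt(inside), math.sqrt(1 + inside)


# Summary statistics for FacebookFriends: mean 677.9, StDev 307.2, n = 30.
ci_factor, pi_factor = interval_multipliers(n=30, x_star=800, x_bar=677.9, s_x=307.2)

t_star, s_error, fit = 1.701, 0.305, 3.095  # values quoted in the explanations
ci_margin = t_star * s_error * ci_factor
pi_margin = t_star * s_error * pi_factor
print(f"CI: {fit - ci_margin:.3f} to {fit + ci_margin:.3f}")
print(f"PI: {fit - pi_margin:.3f} to {fit + pi_margin:.3f}")
```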
21) Use the following output to identify and interpret a 95% interval for the mean GPA for all students with 500 Facebook friends.
Predicted Values for New Observations
FacebookFriends   Fit      SE Fit   95% CI             95% PI
500               3.3702   0.0646   (3.2378, 3.5026)   (2.7315, 4.0089)
A) CI: (3.2378, 3.5026) We are 95% sure that the mean GPA for all students with 500 Facebook friends is between 3.2378 and 3.5026. B) PI: (2.7315, 4.0089) We are 95% sure that the GPA of a student with 500 Facebook friends is between 2.7315 and 4.0089. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
22) Use the following output to identify and interpret a 95% interval for the GPA of a single student with 500 Facebook friends.
Predicted Values for New Observations
FacebookFriends   Fit      SE Fit   95% CI             95% PI
500               3.3702   0.0646   (3.2378, 3.5026)   (2.7315, 4.0089)
A) PI: (2.7315, 4.0089) We are 95% sure that the GPA of a student with 500 Facebook friends is between 2.7315 and 4.0089. B) CI: (3.2378, 3.5026) We are 95% sure that the mean GPA for all students with 500 Facebook friends is between 3.2378 and 3.5026. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
23) The correlation between GPA and number of Facebook friends is -0.686. Use the correlation and α = 0.05 to test for a linear association between GPA and number of Facebook friends. Include all details of the test. Answer: H0: ρ = 0, Ha: ρ ≠ 0
t = r√(n - 2) / √(1 - r²) = -0.686√28 / √(1 - (-0.686)²) = -4.989
p-value ≈ 0 (two-tail probability in t distribution with df = 28) There is very strong evidence of a linear association between GPA and number of Facebook friends. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
24) The correlation between GPA and number of Facebook friends is -0.686. Use the correlation and α = 0.05 to test for a negative linear association between GPA and number of Facebook friends. Include all details of the test. Answer: H0: ρ = 0, Ha: ρ < 0
t = r√(n - 2) / √(1 - r²) = -0.686√28 / √(1 - (-0.686)²) = -4.989
p-value ≈ 0 (left tail probability in t distribution with df = 28) There is very strong evidence of a negative linear association between GPA and number of Facebook friends. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
25) Use the information in the computer output to compute the standard error of the slope, SE. Round the answer to six decimal places. A) 0.000184 B) 0.000183 C) 0.000992 D) 0.000993 Answer: A Explanation: SE = s_ε / (s_x √(n - 1)) = 0.305 / (307.2 √29) = 0.000184 Diff: 2 Type: BI Var: 1 L.O.: 9.2.4
26) Compute the t test statistic for the slope. A) -4.989 B) -5.022 C) -5.465 D) -5.479 Answer: A Explanation: They will have needed to compute the SE of the slope. t = (sample slope) / SE = -4.989 Diff: 3 Type: BI Var: 1 L.O.: 9.1.0
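Questions 25 and 26 build the slope's standard error from the error standard deviation and the predictor's standard deviation. A short sketch under those assumptions follows; the function name is invented for illustration, and the inputs are the rounded values printed in the output.

```python
import math


def slope_standard_error(s_error, s_x, n):
    """SE of the slope: s_error / (s_x * sqrt(n - 1))."""
    return s_error / (s_x * math.sqrt(n - 1))


# s_error = 0.305 (question 18), s_x = 307.2 (StDev of FacebookFriends), n = 30.
se = slope_standard_error(s_error=0.305, s_x=307.2, n=30)
t = -0.000919 / se  # sample slope from the regression equation
print(f"SE = {se:.6f}, t = {t:.3f}")
```

Because the slope and error standard deviation are rounded in the output, the t computed this way lands within about 0.01 of the keyed -4.989, which matches the correlation-based statistic in question 23.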
Use the following to answer the questions below: Data were collected on the age (in years) and price (in thousands of dollars) of a random sample of 25 used Hyundai Elantras. A scatterplot of the data (with regression line) and computer output from a regression analysis are provided. Use three decimal places when reporting the results from any calculations, unless otherwise specified.
The regression equation is Price = 15.3 - 1.71 Age
Predictor   Coef      SE Coef   T        P
Constant    15.2912   0.5840    26.18    0.000
Age         -1.7126   0.1264    -13.55   0.000
S = 1.37179   R-Sq = 88.9%   R-Sq(adj) = 88.4%

Predicted Values for New Observations
Age   Fit      SE Fit   95% CI            95% PI
3     10.154   0.306    (9.520, 10.787)   (7.246, 13.061)
27) Use the scatterplot to determine whether we should have any serious concerns about the conditions being met for using a linear model with these data. A) Yes B) No Answer: B Explanation: There are no concerns. There is no curved pattern, no fanning pattern, and no outliers. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
28) What is the estimated slope in this regression model? Interpret the slope in context. A) The estimated slope is -1.71. For each additional year of age, the predicted price of the car (used Hyundai Elantra) decreases by $1,710. B) The estimated slope is -1.71. For each additional year of age, the predicted price of the car (used Hyundai Elantra) decreases by $1.71. C) The estimated slope is 15.3. The cost of a new used Hyundai Elantra is approximately $15,300. D) The estimated slope is 15.3. For each additional year of age, the predicted price of the car (used Hyundai Elantra) decreases by $1530. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.1 29) Use the equation of the least squares line to predict the price of a used Hyundai Elantra that is 6 years old. A) $5,040 B) $13,540 C) $6,750 D) $7,750 Answer: A Explanation: Price = 15.3 - 1.71 Age = 15.3 - 1.71(6) = 5.040 A 6-year-old Hyundai Elantra is predicted to cost $5,040. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1 30) What are the degrees of freedom for constructing a confidence interval for, or performing a test about, the population slope? A) 25 B) 24 C) 23 D) 22 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 9.1.0
31) Use the computer output to test the slope to determine whether age is an effective predictor of price. Use α = 0.05. A) There is very strong evidence that age is an effective predictor of price. B) There is not enough evidence to conclude that age is an effective predictor of price. Answer: A Explanation: H0: β1 = 0, Ha: β1 ≠ 0
t = -13.55 p-value ≈ 0 (using df = 23) There is very strong evidence that age is an effective predictor of price. Diff: 2 Type: BI Var: 1 L.O.: 9.1.3
32) Construct and interpret a 90% confidence interval for the population slope. Answer: df = 23, so t* = 1.714. The interval is -1.7126 ± 1.714(0.1264) = -1.7126 ± 0.2166, or -1.929 to -1.496. We are 90% sure that for each additional year of age, the predicted cost of Hyundai Elantras decreases by between 1.496 and 1.929 thousand dollars ($1,496 and $1,929). Diff: 2 Type: ES Var: 1 L.O.: 9.1.2
33) What is R² for this model? Interpret it in context. Answer: R² = 0.889 (or 88.9%) The age of the cars explains about 89% of the variability in the prices of the cars in this sample. Diff: 2 Type: ES Var: 1 L.O.: 9.1.5
34) Based on the available information, what is the correlation between age and price (in thousands of dollars) of used Hyundai Elantras? A) 0.943 B) -0.943 C) 9.43 D) -9.43 Answer: B Diff: 3 Type: BI Var: 1 L.O.: 9.1.5
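Question 34 relies on the fact that, with a single predictor, the correlation has magnitude √R² and takes the sign of the fitted slope. A brief sketch of that step, using a hypothetical helper name:

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Correlation with magnitude sqrt(R^2) and the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# R-Sq = 88.9% and the Age coefficient -1.7126 come from the output.
r = correlation_from_r_squared(0.889, slope=-1.7126)
print(f"r = {r:.3f}")
```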
35) Use the computer output to provide and interpret a 95% interval for the mean price of all 3-year-old used Hyundai Elantras. Answer: CI = (9.520, 10.787) We are 95% sure that the mean price of all 3-year-old used Hyundai Elantras is between 9.520 and 10.787 thousand dollars ($9,520 and $10,787). Diff: 2 Type: ES Var: 1 L.O.: 9.3.1 36) Use the computer output to provide and interpret a 95% interval for the price of a 3-year-old used Hyundai Elantra. Answer: PI = (7.246, 13.061) We are 95% sure that the price of a single 3-year-old used Hyundai Elantra is between 7.246 and 13.061 thousand dollars ($7,246 and $13,061). Diff: 2 Type: ES Var: 1 L.O.: 9.3.2
Use the following to answer the questions below: Data were collected on the mileage (in thousands of miles) and price (in thousands of dollars) of a random sample of used Hyundai Elantras. A scatterplot of the data (with regression line), some summary statistics, and partial computer output from a regression analysis are provided. Use three decimal places when reporting the results from any calculations, unless otherwise specified.
Variable   Mean    StDev   Minimum   Maximum
Price      8.304   4.025   1.900     15.900
Mileage    60.01   39.31   0.90      138.20

The regression equation is Price = 13.8 - 0.0912 Mileage
Source           DF   SS       MS       F       P
Regression        1   308.32   308.32   88.01   0.000
Residual Error   23    80.57     3.50
Total            24   388.89
37) Use the scatterplot to determine whether we should have any strong concerns about the conditions being met for using a linear model with these data. A) Yes B) No Answer: B Explanation: There aren't any concerns—there is no curved pattern, no fanning of the data, and no serious outliers. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
38) Use the equation of the least squares line to predict the price of a used Hyundai Elantra with 50,000 miles. A) $9,240 B) $6,900 C) $12,888 D) $13,344 Answer: A Explanation: Price = 13.8 - 0.0912 Mileage = 13.8 - 0.0912(50) = 9.240 The predicted price of a used Hyundai Elantra with 50,000 miles (coded as 50) is $9,240. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
39) Use the provided output to compute R². A) 0.793 B) 0.672 C) 0.891 D) 0.736 Answer: A Explanation: R² = SSModel / SSTotal = 308.32 / 388.89 = 0.793 79.3% of the variability in the price of used Hyundai Elantras in this sample is explained by the mileage of the cars. Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
40) Use the information in the ANOVA table to determine the number of cars in the sample. A) 25 B) 24 C) 23 D) 22 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.2.0
41) Is the linear model effective at predicting the price of used Hyundai Elantras? Use the information from the computer output and α = 0.05. Include all details of the test. Answer: H0: β1 = 0, Ha: β1 ≠ 0 (or H0: The model is not effective versus Ha: The model is effective)
F = 88.01 p-value ≈ 0 (using 1 and 23 degrees of freedom) There is very strong evidence that mileage is a useful predictor of the price of used Hyundai Elantras (i.e., the model is effective). Diff: 2 Type: ES Var: 1 L.O.: 9.2.1
42) Use the provided computer output to compute the standard deviation of the error term. A) 1.872 B) 3.504 C) 1.832 D) 3.357 Answer: A Explanation: s_ε = √(SSE / (n - 2)) = √(80.57 / 23) = 1.872 Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
43) Use the provided output to construct and interpret a 95% interval for the mean price of all used Hyundai Elantras with 50,000 miles. Answer: df = 23, so t* = 2.069
9.24 ± 2.069 ∙ 1.872 ∙ √(1/25 + (50 - 60.01)² / (24 ∙ 39.31²))
9.24 ± 2.069(1.872)(0.2066441)
9.24 ± 0.8
8.440 to 10.040
We are 95% sure that the mean price of all used Hyundai Elantras with 50,000 miles is between $8,440 and $10,040 (8.440 and 10.040 thousand dollars). Diff: 2 Type: ES Var: 1 L.O.: 9.3.1
44) Use the provided output to construct and interpret a 95% interval for the price of a single used Hyundai Elantra with 50,000 miles. Answer: df = 23, so t* = 2.069
9.24 ± 2.069 ∙ 1.872 ∙ √(1 + 1/25 + (50 - 60.01)² / (24 ∙ 39.31²))
9.24 ± 2.069(1.872)(1.021128)
9.24 ± 3.955
5.285 to 13.195
We are 95% sure that the price of a single used Hyundai Elantra with 50,000 miles is between $5,285 and $13,195 (5.285 and 13.195 thousand dollars). Diff: 2 Type: ES Var: 1 L.O.: 9.3.3
45) Use the following computer output to identify and interpret a 95% interval for the mean price of all used Hyundai Elantras with 30,000 miles.
Predicted Values for New Observations
Mileage   Fit      SE Fit   95% CI             95% PI
30        11.040   0.475    (10.058, 12.022)   (7.046, 15.034)
A) We are 95% sure that the mean price of all used Hyundai Elantras with 30,000 miles is between $10,058 and $12,022. B) We are 95% sure that the price of a single used Hyundai Elantra with 30,000 miles is between $7,046 and $15,034. C) We are 95% sure that the price of a single used Hyundai Elantra with 30,000 miles is between $10,058 and $12,022. D) We are 95% sure that the mean price of all used Hyundai Elantras with 30,000 miles is between $7,046 and $15,034. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
46) Use the following computer output to identify and interpret a 95% interval for the price of a single used Hyundai Elantra with 70,000 miles.
Mileage   Fit     SE Fit   95% CI           95% PI
70        7.393   0.387    (6.593, 8.193)   (3.440, 11.347)
A) We are 95% sure that the price of a single used Hyundai Elantra with 70,000 miles is between $3,440 and $11,347. B) We are 95% sure that the mean price of all used Hyundai Elantras with 70,000 miles is between $3,440 and $11,347. C) We are 95% sure that the price of a single used Hyundai Elantra with 70,000 miles is between $6,593 and $8,193. D) We are 95% sure that the mean price of all used Hyundai Elantras with 70,000 miles is between $6,593 and $8,193. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
47) Use the information in the computer output to compute the standard error of the slope, SE. Round your answer to four decimal places. A) 0.0097 B) 0.0099 C) 0.0101 D) 0.0103 Answer: A Explanation: SE = s_ε / (s_x √(n - 1)) = 1.872 / (39.31 √24) = 0.0097 Diff: 2 Type: BI Var: 1 L.O.: 9.2.4
48) Compute the t test statistic for the slope. A) -9.402 B) -9.212 C) -9.030 D) -8.854 Answer: A Explanation: They will have needed to compute the SE of the slope. t = (sample slope) / SE = -0.0912 / 0.0097 = -9.402 Diff: 3 Type: BI Var: 1 L.O.: 9.1.0
Use the following to answer the questions below: Fast food restaurants are required to publish nutrition information about the foods they serve. Nutrition information about a random sample of 15 McDonald's lunch/dinner menu items (excluding sides and drinks) was obtained from their website. We wish to use the total fat content (in grams) to better understand the number of calories in the lunch/dinner menu items at McDonald's. Computer output from a regression analysis and a scatterplot (with regression line) of the data are provided. Use two decimal places when reporting the results from any calculations, unless otherwise specified.
The regression equation is Calories = 137.1 + 15.06 Total Fat (g)
Predictor       Coef     SE Coef   T      P
Constant        137.08   40.64     3.37   0.005
Total Fat (g)   15.055   1.649     9.13   0.000
S = 62.7442   R-Sq = 86.5%   R-Sq(adj) = 85.5%

Predicted Values for New Observations
Total Fat (g)   Fit     SE Fit   95% CI           95% PI
25              513.5   16.7     (477.4, 549.5)   (373.2, 653.7)
49) Using the scatterplot, should we have any major concerns about the conditions being met for using a linear model with these data? A) Yes B) No Answer: B Explanation: There are no major concerns. There is no curved or fanning pattern. There are possibly a couple of outliers (items with more fat than the rest of the sample), but they do follow the overall trend of the data and thus shouldn't be a major issue. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
50) Use the equation of the least squares line to predict the number of calories in a menu item with 20 grams of fat. A) 438.30 calories B) 334.44 calories C) 289.80 calories D) 410.60 calories Answer: A Explanation: Calories = 137.1 + 15.06 Total Fat (g) = 137.1 + 15.06(20) = 438.30 We predict that a lunch/dinner menu item with 20 grams of fat would have 438.30 calories. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
51) What is the estimated slope in this regression model? A) 15.06 B) 16.7 C) -15.06 D) -16.7 Answer: A Explanation: Sample slope = 15.06 For each additional gram of fat in McDonald's lunch/dinner menu items, we predict the number of calories to increase by 15.06 calories. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
52) What are the degrees of freedom for constructing a confidence interval for, or performing a test about, the population slope? A) 15 B) 14 C) 13 D) 12 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 9.1.0
53) Use the computer output, and α = 0.05, to test the slope to determine whether total fat content (g) is an effective predictor of the number of calories. Include all details of the test. Answer: H0: β1 = 0, Ha: β1 ≠ 0
t = 9.13 p-value ≈ 0 (using df = 13) There is very strong evidence that total fat content (g) is an effective predictor of the number of calories in McDonald's lunch/dinner menu items. Diff: 2 Type: ES Var: 1 L.O.: 9.1.3
54) Construct and interpret a 99% confidence interval for the population slope. Answer: df = 13, so t* = 3.012. The interval is 15.055 ± 3.012(1.649) = 15.055 ± 4.967, or 10.09 to 20.02. We are 99% sure that for each additional gram in the total fat content, the number of calories in McDonald's lunch/dinner menu items is predicted to increase by between 10.09 and 20.02 calories. Diff: 2 Type: ES Var: 1 L.O.: 9.1.2
55) What is R² for this model? Interpret it in context. A) R² = 86.5% 86.5% of the variability in the number of calories for lunch/dinner menu items in this sample is explained by the total fat content (g). B) R² = 86.5% 86.5% of the variability in the total fat content in this sample is explained by the number of calories. C) R² = 85.5% 85.5% of the variability in the number of calories for lunch/dinner menu items in this sample is explained by the total fat content (g). D) R² = 85.5% 85.5% of the variability in the total fat content in this sample is explained by the number of calories. Answer: A Explanation: R² = 86.5% 86.5% of the variability in the number of calories for lunch/dinner menu items in this sample is explained by the total fat content (g). Diff: 2 Type: BI Var: 1 L.O.: 9.1.5
56) Based on the available information, what is the correlation between total fat content (g) and number of calories for McDonald's lunch/dinner menu items in this sample? A) 0.93 B) -0.93 C) 9.3 D) -9.3 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.5
57) Use the computer output to provide and interpret a 95% interval for the mean number of calories in all McDonald's lunch/dinner menu items with 25 total grams of fat. A) CI: (477.4, 549.5) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 25 total grams of fat is between 477.4 and 549.5 calories. B) CI: (477.4, 549.5) We are 95% sure that a single lunch/dinner menu item at McDonald's with 25 total grams of fat will have between 477.4 and 549.5 calories. C) PI: (373.2, 653.7) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 25 total grams of fat is between 373.2 and 653.7 calories. D) PI: (373.2, 653.7) We are 95% sure that a single lunch/dinner menu item at McDonald's with 25 total grams of fat will have between 373.2 and 653.7 calories. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
58) Use the computer output to provide and interpret a 95% interval for the number of calories in a single lunch/dinner menu item with 25 total grams of fat. A) PI: (373.2, 653.7) We are 95% sure that a single lunch/dinner menu item at McDonald's with 25 total grams of fat will have between 373.2 and 653.7 calories. B) PI: (373.2, 653.7) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 25 total grams of fat is between 373.2 and 653.7 calories. C) CI: (477.4, 549.5) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 25 total grams of fat is between 477.4 and 549.5 calories. D) CI: (477.4, 549.5) We are 95% sure that a single lunch/dinner menu item at McDonald's with 25 total grams of fat will have between 477.4 and 549.5 calories. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
59) The website also provides information about the sugar content in the menu items at McDonald's. For this sample of 15 lunch/dinner menu items, the correlation between number of calories and sugar content (in grams) is 0.35. Test, at the 5% significance level, if there is a significant linear association between number of calories and sugar content for McDonald's lunch/dinner menu items. Include all details of the test. Round the test statistic to three decimal places. Answer: H0: ρ = 0, Ha: ρ ≠ 0
t = r√(n - 2) / √(1 - r²) = 0.35√13 / √(1 - 0.35²) = 1.347
p-value = 0.201 There is no evidence of a significant linear association between the number of calories and sugar content of McDonald's lunch/dinner menu items. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
Use the following to answer the questions below: Fast food restaurants are required to publish nutrition information about the foods they serve. Nutrition information about a random sample of McDonald's lunch/dinner menu items (excluding sides and drinks) was obtained from their website. We wish to use the sodium content (in milligrams) to better understand the number of calories in the lunch/dinner menu items at McDonald's. Some summary statistics, partial computer output from a regression analysis, and a scatterplot (with regression line) of the data are provided. Use two decimal places when reporting the results from any calculations, unless otherwise specified.
Variable      Mean     StDev
Calories      477.3    164.6
Sodium (mg)   1021.3   373.8

The regression equation is Calories = 99.69 + 0.3698 Sodium (mg)
Source       DF   SS       MS       F       P
Regression    1   267501   267501   31.11   0.000
Error        13   111793     8599
Total        14   379293
60) Using the scatterplot, should we have any serious concerns about the conditions being met for using a linear model with these data? A) Yes B) No Answer: B
Explanation: There should be no serious concerns. There is no curved pattern and no fanning of the data. There is one unusual observation (about 1100 mg of sodium and nearly 800 calories), but it is not too bad. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
61) Use the equation of the least squares line to predict the number of calories in a lunch/dinner menu item with 1,000 mg of sodium. A) 469.49 calories. B) 505.63 calories. C) 424.89 calories. D) 448.38 calories. Answer: A Explanation: Calories = 99.69 + 0.3698 Sodium (mg) = 99.69 + 0.3698(1,000) = 469.49 We predict that a lunch/dinner menu item with 1,000 mg of sodium would have 469.49 calories. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
62) What is the estimated slope in this regression model? Interpret the slope in context. A) 0.3698 For each additional mg of sodium in McDonald's lunch/dinner menu items, we predict the number of calories to increase by 0.3698 calories. B) 0.3698 For each additional calorie in McDonald's lunch/dinner menu items, we predict the sodium content to increase by 0.3698 mg. C) 99.69 For each additional mg of sodium in McDonald's lunch/dinner menu items, we predict the number of calories to increase by 99.69 calories. D) 99.69 For each additional calorie in McDonald's lunch/dinner menu items, we predict the sodium content to increase by 99.69 mg. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
63) Use the information in the ANOVA table to determine the number of menu items in the sample. A) 15 B) 14 C) 13 D) 12 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.2.0
64) Use the provided output to compute and interpret $R^2$. A) $R^2$ = 0.71 71% of the variability in the number of calories in McDonald's lunch/dinner menu items in the sample is explained by the sodium content (mg). B) $R^2$ = 0.71 71% of the variability in the sodium content (mg) in McDonald's lunch/dinner menu items in the sample is explained by the number of calories. C) $R^2$ = 0.42 42% of the variability in the number of calories in McDonald's lunch/dinner menu items in the sample is explained by the sodium content (mg). D) $R^2$ = 0.42 42% of the variability in the sodium content (mg) in McDonald's lunch/dinner menu items in the sample is explained by the number of calories. Answer: A Explanation:
$R^2 = \frac{\text{SS Regression}}{\text{SS Total}} = \frac{267501}{379293} = 0.71$
71% of the variability in the number of calories of McDonald's lunch/dinner menu items in the sample is explained by the sodium content (mg). Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
65) Is the linear model effective at predicting the number of calories in lunch/dinner menu items at McDonald's? Use the information from the computer output (and α = 0.05) for this test. Include all details of the test. Answer: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$ (or $H_0$: The model is ineffective versus $H_a$: The model is effective).
F = 31.11 p-value ≈ 0 There is very strong evidence that this model, which uses sodium content as the predictor, is effective at predicting the number of calories in lunch/dinner menu items at McDonald's. Diff: 2 Type: ES Var: 1 L.O.: 9.2.1
66) Use the provided computer output to compute the standard deviation of the error term. Use two decimal places in your answer. A) 92.73 B) 89.36 C) 164.6 D) 138.23 Answer: A Explanation:
$s_\epsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{111793}{13}} = 92.73$
Diff: 2 Type: BI Var: 1 L.O.: 9.2.3
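The ANOVA-based answers for the sodium model (sample size, $R^2$, F, and the standard deviation of the error term) all come from the same table entries. A minimal Python sketch, with variable names chosen here only for illustration:

```python
import math

# Entries copied from the ANOVA table for Calories vs. Sodium (mg).
ss_model, ss_error, ss_total = 267501, 111793, 379293
df_model, df_error, df_total = 1, 13, 14

n = df_total + 1                           # sample size
r_squared = ss_model / ss_total            # proportion of variability explained
ms_error = ss_error / df_error             # mean square error
f_stat = (ss_model / df_model) / ms_error  # F statistic
s_epsilon = math.sqrt(ms_error)            # standard deviation of the error term

print(n, round(r_squared, 2), round(f_stat, 2), round(s_epsilon, 2))
```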
67) Use the provided output to construct and interpret a 95% interval for the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium. A) 417.68 to 521.30 We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium is between 417.68 and 521.30 calories. B) 417.68 to 521.30 We are 95% sure that a single McDonald's lunch/dinner menu item with 1,000 mg of sodium has between 417.68 and 521.30 calories. C) 262.60 to 676.38 We are 95% sure that a single McDonald's lunch/dinner menu item with 1,000 mg of sodium has between 262.60 and 676.38 calories. D) 262.60 to 676.38 We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium is between 262.60 and 676.38 calories. Answer: A Explanation:
$469.49 \pm 2.160 \cdot 92.73 \cdot \sqrt{\frac{1}{15} + \frac{(1000 - 1021.3)^2}{(15 - 1)(373.8)^2}}$
$469.49 \pm 2.160(92.73)(0.2586476)$
$469.49 \pm 51.81$
417.68 to 521.30 We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium is between 417.68 and 521.30 calories. Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
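The interval for a mean response and the interval for a single response at 1,000 mg of sodium use the same ingredients and differ only by the extra "1 +" under the square root. The sketch below assumes the summary values from the output (t* = 2.160 for df = 13, error standard deviation 92.73); the function name `regression_intervals` is hypothetical.

```python
import math

def regression_intervals(y_hat, t_star, s_eps, n, x_star, x_bar, s_x):
    """Return (confidence interval, prediction interval) at x_star for simple linear regression."""
    core = 1 / n + (x_star - x_bar) ** 2 / ((n - 1) * s_x ** 2)
    ci_margin = t_star * s_eps * math.sqrt(core)      # mean response
    pi_margin = t_star * s_eps * math.sqrt(1 + core)  # individual response
    ci = (y_hat - ci_margin, y_hat + ci_margin)
    pi = (y_hat - pi_margin, y_hat + pi_margin)
    return ci, pi

ci, pi = regression_intervals(
    y_hat=469.49, t_star=2.160, s_eps=92.73,
    n=15, x_star=1000, x_bar=1021.3, s_x=373.8,
)
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

The prediction interval is always the wider of the two, which is also a quick way to tell them apart in computer output.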
68) Use the provided output to construct and interpret a 95% interval for the number of calories in a single McDonald's lunch/dinner menu item with 1,000 mg of sodium. A) 262.60 to 676.38 We are 95% sure that a single McDonald's lunch/dinner menu item with 1,000 mg of sodium has between 262.60 and 676.38 calories. B) 262.60 to 676.38 We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium is between 262.60 and 676.38 calories. C) 417.68 to 521.30 We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,000 mg of sodium is between 417.68 and 521.30 calories. D) 417.68 to 521.30 We are 95% sure that a single McDonald's lunch/dinner menu item with 1,000 mg of sodium has between 417.68 and 521.30 calories. Answer: A Explanation:
$469.49 \pm 2.160 \cdot 92.73 \cdot \sqrt{1 + \frac{1}{15} + \frac{(1000 - 1021.3)^2}{(15 - 1)(373.8)^2}}$
$469.49 \pm 2.160(92.73)(1.032908)$
$469.49 \pm 206.89$
262.60 to 676.38 We are 95% sure that a single McDonald's lunch/dinner menu item with 1,000 mg of sodium has between 262.60 and 676.38 calories. Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
69) Use the following output to identify and interpret a 95% interval for the mean number of calories in all McDonald's lunch/dinner menu items with 1,200 mg of sodium.

Predicted Values for New Observations
Sodium (mg)   Fit     SE Fit   95% CI           95% PI
1200          543.4   26.7     (485.7, 601.1)   (334.9, 751.9)

A) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,200 mg of sodium is between 485.7 and 601.1 calories. B) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 1,200 mg of sodium is between 334.9 and 751.9 calories. C) We are 95% sure that the number of calories in a single McDonald's lunch/dinner menu item with 1,200 mg of sodium is between 485.7 and 601.1 calories. D) We are 95% sure that the number of calories in a single McDonald's lunch/dinner menu item with 1,200 mg of sodium is between 334.9 and 751.9 calories. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
70) Use the following output to identify and interpret a 95% interval for the number of calories in a single McDonald's lunch/dinner menu item with 800 mg of sodium.

Predicted Values for New Observations
Sodium (mg)   Fit     SE Fit   95% CI           95% PI
800           395.5   28.1     (334.8, 456.2)   (186.2, 604.8)

A) We are 95% sure that the number of calories in a single McDonald's lunch/dinner menu item with 800 mg of sodium is between 186.2 and 604.8. B) We are 95% sure that the number of calories in a single McDonald's lunch/dinner menu item with 800 mg of sodium is between 334.8 and 456.2. C) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 800 mg of sodium is between 186.2 and 604.8. D) We are 95% sure that the mean number of calories in all McDonald's lunch/dinner menu items with 800 mg of sodium is between 334.8 and 456.2. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
71) The website also provides information about the sugar content in the menu items at McDonald's. For this sample of lunch/dinner menu items, the correlation between number of calories and sugar content (in grams) is 0.35. Test, at the 5% significance level, if there is a significant linear association between number of calories and sugar content for McDonald's lunch/dinner menu items. Include all details of the test. Answer: $H_0: \rho = 0$ versus $H_a: \rho \neq 0$
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.35\sqrt{13}}{\sqrt{1-0.35^2}} = 1.347$
p-value = 0.201 There is no evidence of a significant linear association between the number of calories and sugar content of McDonald's lunch/dinner menu items. Diff: 2 Type: ES Var: 1 L.O.: 9.1.4
72) Use the information in the computer output to compute the standard error of the slope, SE. Report your answer with four decimal places. A) 0.0663 B) 0.0639 C) 0.0688 D) 0.0698 Answer: A Explanation:
$SE = \frac{s_\epsilon}{s_x\sqrt{n-1}} = \frac{92.73}{373.8\sqrt{14}} = 0.0663$
Diff: 2 Type: BI Var: 1 L.O.: 9.2.4
73) Compute the t test statistic for the slope. A) 5.58 B) 6.21 C) 5.74 D) 6.34 Answer: A Explanation:
$t = \frac{b_1}{SE} = \frac{0.3698}{0.0663} = 5.58$
Diff: 3 Type: BI Var: 1 L.O.: 9.1.0
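The standard error of the slope and its t statistic follow directly from the summary statistics. A short Python sketch under the assumption that the rounded values from the output (92.73, 373.8, 0.3698) are used as inputs; the variable names are illustrative only.

```python
import math

s_eps = 92.73   # standard deviation of the error term
s_x = 373.8     # standard deviation of Sodium (mg)
n = 15          # number of menu items
slope = 0.3698  # estimated slope from the regression equation

se_slope = s_eps / (s_x * math.sqrt(n - 1))
t_stat = slope / se_slope

print(round(se_slope, 4), round(t_stat, 2))
```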
74) Two intervals are given for the same value of the explanatory variable. Which interval is the confidence interval for the mean response at this value of the explanatory variable? A) 32 to 38 B) 29 to 41 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.0 75) Two intervals are given for the same value of the explanatory variable. Which interval is the prediction interval for an individual response at this value of the explanatory variable? A) 115 to 133 B) 120 to 128 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.0
Use the following to answer the questions below: A quantitatively savvy, young couple is interested in purchasing a home in northern New York. They collected data on 48 houses that had recently sold in the area. They want to predict the selling price of homes (in thousands of dollars) based on the size of the home (in square feet).

The regression equation is Price (in thousands) = 17.1 + 0.0643 Size (sq. ft.)

Predictor        Coef      SE Coef   T      P
Constant         17.06     24.59     0.69   0.491
Size (sq. ft.)   0.06427   0.01224   5.25   0.000

S = 48.5733   R-Sq = 37.5%   R-Sq(adj) = 36.1%

Predicted Values for New Observations
Size (sq. ft.)   Fit      SE Fit   95% CI             95% PI
2000             145.61   7.07     (131.38, 159.83)   (46.80, 244.41)
76) Using the scatterplot, should we have any serious concerns about the conditions being met for using a linear model with these data? A) Yes B) No Answer: B Explanation: There should be no serious concerns. There is no curved pattern and no fanning of the data. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
77) Use the equation of the least squares line to predict the selling price of a home that is 1,742 square feet in size. A) $129,111 B) $299,000 C) $143,773 D) $115,582 Answer: A Explanation: Price (in thousands) = 17.1 + 0.0643 Size (sq. ft.) = 17.1 + 0.0643(1742) = 129.111 The predicted price of a 1,742 square foot house is $129,111. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
78) What is the estimated slope in this regression model? Interpret the slope in context. A) 0.06427 For an increase of 1 square foot in the size of a recently sold house, the predicted selling price increases by $64.27. B) 0.06427 For an increase of $1,000 in selling price, the size of the house increases by 64.27 square feet. C) 17.1 For an increase of 1 square foot in the size of a recently sold house, the predicted selling price increases by $17.10. D) 17.1 For an increase of $1,000 in selling price, the size of the house increases by 17.1 square feet. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
79) What are the degrees of freedom for constructing a confidence interval for, or performing a test about, the population slope? A) 48 B) 47 C) 46 D) 45 Answer: C Diff: 2 Type: BI Var: 1 L.O.: 10.1.0
80) Use the computer output to test the slope, at the 5% level, to determine whether size (in square feet) is an effective predictor of the selling price of recently sold homes. Include all details of the test. Answer: $H_0: \beta_1 = 0$ (or Size is not an effective predictor of selling price.) $H_a: \beta_1 \neq 0$ (or Size is an effective predictor of selling price.)
t = 5.25 p-value ≈ 0 There is very strong evidence that size (in square feet) is an effective predictor of the selling price of homes. Diff: 2 Type: ES Var: 1 L.O.: 9.1.3
81) Construct a 95% confidence interval for the population slope. A) 0.0397 to 0.0889 B) 0.0395 to 0.0891 C) 0.0392 to 0.0894 D) 0.0389 to 0.0897 Answer: A Explanation: df = 46, so t* = 2.013
$0.0643 \pm 2.013(0.01224)$
$0.0643 \pm 0.0246$
0.0397 to 0.0889 We are 95% sure that for each additional 1 square foot in the size of a house, the predicted selling price increases by between $39.70 and $88.90 (0.0397 and 0.0889 thousand dollars). Diff: 2 Type: BI Var: 1 L.O.: 9.1.2
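The slope interval is the estimate plus or minus t* times its standard error. A minimal Python check, assuming the rounded slope 0.0643 and t* = 2.013 used in the explanation; the names are illustrative.

```python
slope = 0.0643      # estimated slope (thousand dollars per square foot)
se_slope = 0.01224  # SE Coef for Size from the output
t_star = 2.013      # t* for 95% confidence with df = 46 (taken from the explanation)

margin = t_star * se_slope
lower, upper = slope - margin, slope + margin

# Convert to dollars per square foot for the interpretation.
print(round(lower, 4), round(upper, 4), round(lower * 1000, 2), round(upper * 1000, 2))
```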
82) What is the $R^2$ for this model? Interpret it in context. A) $R^2$ = 37.5% 37.5% of the variability in the selling prices of the sampled homes is explained by the size (in square feet). B) $R^2$ = 37.5% 37.5% of the variability in the size (in square feet) of the sampled homes is explained by the selling price. C) $R^2$ = 36.1% 36.1% of the variability in the selling prices of the sampled homes is explained by the size (in square feet). D) $R^2$ = 36.1% 36.1% of the variability in the size (in square feet) of the sampled homes is explained by the selling price. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.1.5
83) Based on the available information, what is the correlation between selling price (in thousands) and size (square feet) of the sample of recently sold homes? Use three decimal places in your answer. Answer: 0.612 Diff: 2 Type: SA Var: 1 L.O.: 9.1.5
84) Use the computer output to provide and interpret a 95% interval for the mean selling price of all 2,000 square foot houses in this portion of northern New York. A) CI: (131.38, 159.83) We are 95% sure that the mean selling price of all 2,000 square foot houses in this portion of northern New York is between $131,380 and $159,830. B) CI: (131.38, 159.83) We are 95% sure that the selling price of a 2,000 square foot house in this portion of northern New York is between $131,380 and $159,830. C) PI: (46.80, 244.41) We are 95% sure that the mean selling price of all 2,000 square foot houses in this portion of northern New York is between $46,800 and $244,410. D) PI: (46.80, 244.41) We are 95% sure that the selling price of a 2,000 square foot house in this portion of northern New York is between $46,800 and $244,410. Answer: A Explanation: CI: (131.38, 159.83) We are 95% sure that the mean selling price of all 2,000 square foot houses in this portion of northern New York is between $131,380 and $159,830. Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
85) Use the computer output to provide and interpret a 95% interval for the selling price of a single 2,000 square foot house in this portion of northern New York. A) PI: (46.80, 244.41) We are 95% sure that the selling price of a 2,000 square foot house in this portion of northern New York is between $46,800 and $244,410. B) CI: (131.38, 159.83) We are 95% sure that the mean selling price of all 2,000 square foot houses in this portion of northern New York is between $131,380 and $159,830. C) CI: (131.38, 159.83) We are 95% sure that the selling price of a 2,000 square foot house in this portion of northern New York is between $131,380 and $159,830. D) PI: (46.80, 244.41) We are 95% sure that the mean selling price of all 2,000 square foot houses in this portion of northern New York is between $46,800 and $244,410. Answer: A Explanation: PI: (46.80, 244.41) We are 95% sure that the selling price of a 2,000 square foot house in this portion of northern New York is between $46,800 and $244,410. Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
Use the following to answer the questions below: A quantitatively savvy, young couple is interested in purchasing a home in northern New York. They collected data on houses that had recently sold in the area. They want to predict the selling price of homes (in thousands of dollars) based on the age of the home (in years). Some summary statistics, partial regression output, and a scatterplot of the relationship (with regression line) are provided. Use two decimal places when reporting the results from any calculations, unless otherwise specified.

Variable               Mean     StDev
Price (in thousands)   140.86   60.78
Age                    78.69    44.71

The regression equation is Price (in thousands) = 193 - 0.665 Age

Analysis of Variance
Source           DF   SS       MS      F       P
Regression       1    41580    41580   14.49   0.000
Residual Error   46   132025   2870
Total            47   173605
86) Using the scatterplot, should we have any major concerns about the conditions being met for using a linear model with these data? A) Yes B) No Answer: B Explanation: There is no curved pattern or fanning in the scatterplot, thus we shouldn't have any major concerns about using a linear model. Diff: 2 Type: MC Var: 1 L.O.: 9.1.6
87) Use the equation of the least squares line to predict the selling price of a 92-year-old home. A) $131,820 B) $152,754 C) $155,398 D) $124,227 Answer: A Explanation: Price (in thousands) = 193 - 0.665 Age = 193 - 0.665(92) = 131.82 The predicted selling price of a 92-year-old home is $131,820. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
88) What is the estimated slope in this regression model? Interpret the slope in context. A) -0.665 For each additional year of age, the selling price of homes in this portion of northern New York is predicted to decrease by $665. B) -0.665 For each additional year of age, the selling price of homes in this portion of northern New York is predicted to increase by $665. C) 193 For each additional year of age, the selling price of homes in this portion of northern New York is predicted to increase by $193. D) 193 For each additional year of age, the selling price of homes in this portion of northern New York is predicted to decrease by $193. Answer: A Explanation: $b_1$ = -0.665 For each additional year of age, the selling price of homes in this portion of northern New York is predicted to decrease by $665. Diff: 2 Type: BI Var: 1 L.O.: 9.1.1
89) Use the information in the ANOVA table to determine how many homes were used in the sample. Answer: 48 Diff: 2 Type: SA Var: 1 L.O.: 9.2.0
90) Use the provided output to compute $R^2$. A) $R^2$ = 0.24 B) $R^2$ = 0.32 C) $R^2$ = 0.43 D) $R^2$ = 0.31 Answer: A Explanation:
$R^2 = \frac{\text{SS Regression}}{\text{SS Total}} = \frac{41580}{173605} = 0.24$
24% of the variability in the selling prices of the sampled recently sold homes is explained by the age of the home. Diff: 2 Type: BI Var: 1 L.O.: 9.2.2
91) Is the linear model effective at predicting the selling price of homes in this portion of northern New York? Use the provided computer output (and α = 0.05) for this test. A) Yes B) No Answer: A Explanation: $H_0: \beta_1 = 0$ (or The model is ineffective.) $H_a: \beta_1 \neq 0$ (or The model is effective.)
F = 14.49 p-value ≈ 0 There is very strong evidence that the linear model is effective at predicting the selling price of homes in this portion of northern New York. Diff: 2 Type: MC Var: 1 L.O.: 9.2.1
92) Use the provided computer output to compute the standard deviation of the error term. Answer: 53.57 Explanation:
$s_\epsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{132025}{46}} = 53.57$
Diff: 2 Type: SA Var: 1 L.O.: 9.2.3
93) Construct and interpret a 95% interval for the mean selling price of all 92-year-old homes. A) 115.57 to 148.07 We are 95% sure that the mean selling price of all 92-year-old homes is between $115,570 and $148,070. B) 115.57 to 148.07 We are 95% sure that the selling price of a 92-year-old home is between $115,570 and $148,070. C) 22.77 to 240.88 We are 95% sure that the selling price of a 92-year-old home is between $22,770 and $240,880. D) 22.77 to 240.88 We are 95% sure that the mean selling price of all 92-year-old homes is between $22,770 and $240,880. Answer: A Explanation: df = 46, so t* = 2.013
$131.82 \pm 2.013 \cdot 53.57 \cdot \sqrt{\frac{1}{48} + \frac{(92 - 78.69)^2}{(48 - 1)(44.71)^2}}$
$131.82 \pm 2.013(53.57)(0.150728)$
$131.82 \pm 16.25$
115.57 to 148.07 We are 95% sure that the mean selling price of all 92-year-old homes is between $115,570 and $148,070. Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
94) Construct and interpret a 95% interval for the selling price of a single 92-year-old home. A) 22.77 to 240.88 We are 95% sure that the selling price of a 92-year-old home is between $22,770 and $240,880. B) 115.57 to 148.07 We are 95% sure that the mean selling price of all 92-year-old homes is between $115,570 and $148,070. C) 115.57 to 148.07 We are 95% sure that the selling price of a 92-year-old home is between $115,570 and $148,070. D) 22.77 to 240.88 We are 95% sure that the mean selling price of all 92-year-old homes is between $22,770 and $240,880. Answer: A Explanation: df = 46, so t* = 2.013
$131.82 \pm 2.013 \cdot 53.57 \cdot \sqrt{1 + \frac{1}{48} + \frac{(92 - 78.69)^2}{(48 - 1)(44.71)^2}}$
$131.82 \pm 2.013(53.57)(1.011296)$
$131.82 \pm 109.055$
22.77 to 240.88 We are 95% sure that the selling price of a 92-year-old home is between $22,770 and $240,880. Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
95) Use the following output to identify and interpret a 95% interval for the mean selling price of all 50-year-old homes in this portion of northern New York.

Predicted Values for New Observations
Age   Fit      SE Fit   95% CI             95% PI
50    159.94   9.22     (141.39, 178.49)   (50.52, 269.36)

A) CI: (141.39, 178.49) We are 95% sure that the mean selling price of all 50-year-old homes in this portion of northern New York is between $141,390 and $178,490. B) CI: (141.39, 178.49) We are 95% sure that the selling price of a 50-year-old house in this portion of northern New York is between $141,390 and $178,490. C) PI: (50.52, 269.36) We are 95% sure that the selling price of a 50-year-old house in this portion of northern New York is between $50,520 and $269,360. D) PI: (50.52, 269.36) We are 95% sure that the mean selling price of all 50-year-old homes in this portion of northern New York is between $50,520 and $269,360. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.1
96) Use the following output to identify and interpret a 95% interval for the selling price of a 50-year-old house in this portion of northern New York.

Predicted Values for New Observations
Age   Fit      SE Fit   95% CI             95% PI
50    159.94   9.22     (141.39, 178.49)   (50.52, 269.36)

A) PI: (50.52, 269.36) We are 95% sure that the selling price of a 50-year-old house in this portion of northern New York is between $50,520 and $269,360. B) PI: (50.52, 269.36) We are 95% sure that the mean selling price of all 50-year-old homes in this portion of northern New York is between $50,520 and $269,360. C) CI: (141.39, 178.49) We are 95% sure that the mean selling price of all 50-year-old homes in this portion of northern New York is between $141,390 and $178,490. D) CI: (141.39, 178.49) We are 95% sure that the selling price of a 50-year-old house in this portion of northern New York is between $141,390 and $178,490. Answer: A Diff: 2 Type: BI Var: 1 L.O.: 9.3.2
97) Use the information in the computer output to compute the standard error of the slope, SE. Report your answer with four decimal places. Answer: 0.1748 Explanation:
$SE = \frac{s_\epsilon}{s_x\sqrt{n-1}} = \frac{53.57}{44.71\sqrt{47}} = 0.1748$
Diff: 2 Type: SA Var: 1 L.O.: 9.2.4
98) Compute the t test statistic for the slope. Answer: -3.80 Explanation:
$t = \frac{b_1}{SE} = \frac{-0.665}{0.1748} = -3.80$
Diff: 3 Type: SA Var: 1 L.O.: 9.1.0
99) Is there evidence of a negative correlation between the selling price of homes in this portion of northern New York and their age? Use α = 0.05. Include all details of the test. Answer: First, use $R^2$ to find the sample correlation (negative, because the slope is negative): $r = -\sqrt{R^2} = -\sqrt{0.24} = -0.4899$ $H_0: \rho = 0$ versus $H_a: \rho < 0$
$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.4899\sqrt{46}}{\sqrt{1-(-0.4899)^2}} = -3.81$
p-value = 0.00021 (left tail in t with df = 46, using StatKey) There is very strong evidence of a negative correlation between the selling price of homes in this portion of northern New York and their age. Diff: 3 Type: ES Var: 1 L.O.: 9.1.4;9.1.5
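The two steps in this answer (recover r from $R^2$ using the sign of the slope, then compute t) can be sketched in Python. The rounded $R^2$ = 0.24 is used, as in the explanation; the variable names are illustrative, and the p-value is left to StatKey or a t-distribution routine.

```python
import math

r_squared = 0.24  # rounded R^2 from the ANOVA table
slope = -0.665    # estimated slope; its sign gives the sign of r
n = 48            # number of homes (total df 47, plus 1)

# The correlation has the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 4), round(t_stat, 2))
```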
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock)
Chapter 10 Multiple Regression
10.1 Multiple Predictors
Use the following to answer the questions below: The ANOVA table from a multiple regression analysis is provided.

Source           DF   SS      MS        F       P
Regression       4    10723   2680.75   3.751   0.014
Residual Error   30   21442   714.73
Total            34   32165
1) How many predictors are in the model? Answer: 4 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0
2) How large is the sample size? Answer: 35 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0
3) Compute $R^2$ for this model. Round to three decimal places. A) 0.333 B) 0.667 C) 0.501 D) 0.083 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.4
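All three answers for this ANOVA table can be read off the degrees of freedom and sums of squares. A minimal Python sketch, with names chosen for illustration:

```python
# Entries copied from the multiple regression ANOVA table.
df_model, df_error, df_total = 4, 30, 34
ss_model, ss_error, ss_total = 10723, 21442, 32165

num_predictors = df_model         # regression df equals the number of predictors k
sample_size = df_total + 1        # total df is n - 1
r_squared = ss_model / ss_total   # proportion of variability explained
f_stat = (ss_model / df_model) / (ss_error / df_error)  # cross-check against the table

print(num_predictors, sample_size, round(r_squared, 3), round(f_stat, 3))
```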
10.2 Checking Conditions for a Regression Model
Use the following to answer the questions below: While many people count calories, some often don't think about calories in the beverages they consume. Starbucks, one of the leading coffeehouse chains, provides nutrition information about all of their beverages on their website. Nutrition information, including number of calories, fat (g), carbohydrates (g), and protein (g), was collected on a random sample of Starbucks' 16 ounce ("Grande") hot espresso drinks. Note that all of the drinks in the sample are made with 2% milk unless the name specifically included the term "Skinny," which is how Starbucks indicated a beverage made with nonfat milk.

The regression equation is Calories = 6.7 + 9.61 Fat (g) + 3.43 Carbs (g) + 4.42 Protein (g)

Predictor     Coef     SE Coef   T       P
Constant      6.68     17.64     0.38    0.715
Fat (g)       9.609    1.452     6.62    0.000
Carbs (g)     3.4350   0.3155    10.89   0.000
Protein (g)   4.418    2.231     1.98    0.083

S = 13.2293   R-Sq = 98.8%   R-Sq(adj) = 98.4%

Analysis of Variance
Source           DF   SS       MS      F        P
Regression       3    116867   38956   222.58   0.000
Residual Error   8    1400     175
Total            11   118267
1) The "Caramel Macchiato" was one of the drinks selected for the sample. When made with 2% milk, a grande Caramel Macchiato has 7 grams of fat, 34 grams of carbohydrates, and 10 grams of protein. Predict the number of calories in a Caramel Macchiato. Round to two decimal places. A) 234.79 calories B) 235.00 calories C) 347.79 calories D) 241.60 calories Answer: A Explanation: $\widehat{\text{Calories}}$ = 6.7 + 9.61(7) + 3.43(34) + 4.42(10) = 234.79 A grande Caramel Macchiato (with 2% milk) is predicted to have 234.79 calories. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1
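A multiple regression prediction is the intercept plus each coefficient times its predictor value. The Python sketch below uses the rounded coefficients from the regression equation; the function `predict` and the dictionary keys are hypothetical names, not part of any output.

```python
def predict(intercept: float, coefficients: dict, values: dict) -> float:
    """Intercept plus the sum of coefficient * predictor value over all predictors."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

# Rounded coefficients from the regression equation for Calories.
coefficients = {"fat_g": 9.61, "carbs_g": 3.43, "protein_g": 4.42}
caramel_macchiato = {"fat_g": 7, "carbs_g": 34, "protein_g": 10}

predicted_calories = predict(6.7, coefficients, caramel_macchiato)
print(round(predicted_calories, 2))
```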
2) Interpret the coefficient of Fat in context. Answer: If the amount of carbohydrates and protein were to remain unchanged, an additional gram of fat is predicted to increase the number of calories in Starbucks grande hot espresso beverages by 9.61 calories. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1
3) How many drinks were used in this sample? A) 12 B) 11 C) 10 D) 9 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.0
4) Interpret $R^2$ for this model. Answer: 98.8% of the variability in the number of calories in the Starbucks hot espresso beverages in the sample is explained by knowing the fat, carbohydrate, and protein contents of the drinks. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4
5) Is the model effective according to the ANOVA test? Use a 5% significance level. Include all details of the test. Answer: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (The model is ineffective and all of the predictors can be omitted.) $H_a$: At least one $\beta_i \neq 0$ (At least one of the predictors in the model is effective.)
F = 222.58 p-value ≈ 0 There is very strong evidence that at least one of the predictors in the model is effective for explaining the number of calories in Starbucks hot espresso drinks. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3
6) Which predictors are significant at the 5% level? A) Fat and Carbs B) Fat C) Carbs D) Fat, Carbs, and Protein Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.2
7) A dotplot of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: Both the dotplot of the residuals and the scatterplot of the residuals versus the predicted values indicate a potential outlier (with a residual of -25) causing some concern about the normality condition. There is no obvious pattern in the scatterplot indicating that the linearity and consistent variability conditions are reasonably satisfied. Overall, we might have some minor concerns about using multiple regression because of the outlier. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
8) Which of the following scatterplots of the residuals versus the predicted values does not indicate problems with either the linearity or the consistent variability conditions?
A) A B) B C) C Answer: C Diff: 2 Type: BI Var: 1 L.O.: 10.2.2
Use the following to answer the questions below: Output for a model to predict the GPAs of students at a small university based on their Math SAT scores, Verbal SAT scores, and the number of hours spent watching television in a typical week is provided.

The regression equation is GPA = 1.80 + 0.00104 Math SAT + 0.00142 Verbal SAT - 0.0147 TV

Predictor    Coef        SE Coef     T       P
Constant     1.8015      0.1842      9.78    0.000
Math SAT     0.0010442   0.0002500   4.18    0.000
Verbal SAT   0.0014182   0.0002398   5.91    0.000
TV           -0.014708   0.003269    -4.50   0.000

S = 0.366780   R-Sq = ????%   R-Sq(adj) = 19.0%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression       ?     14.4886   4.8295   35.90   0.000
Residual Error   ?     59.7304   0.1345
Total            447   74.2190
9) Predict the GPA of a student at this university with a Math SAT score of 600, a Verbal SAT score of 580, and who watches 5 hours of television in a typical week. Round to three decimal places. A) 3.174 B) 3.233 C) 3.248 D) 3.142 Answer: A Explanation: $\widehat{\text{GPA}}$ = 1.80 + 0.00104(600) + 0.00142(580) - 0.0147(5) = 3.174 The predicted GPA for a student at this university with a Math SAT score of 600, a Verbal SAT score of 580, and who watches 5 hours of television in a typical week is 3.174. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1
10) Interpret the coefficient of TV in context. Answer: If the Math and Verbal SAT scores remain unchanged, when the amount of TV watched in a typical week increases by 1 hour, the predicted GPA decreases by 0.0147. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1
11) The $R^2$ for this model is missing in the provided output. Use the available information to compute $R^2$ (round to three decimal places) for this model. A) 0.195 B) 0.243 Answer: A Explanation:
$R^2 = \frac{\text{SS Regression}}{\text{SS Total}} = \frac{14.4886}{74.2190} = 0.195$
19.5% of the variability in the GPAs of the students in the sample is explained by their Math SAT score, Verbal SAT score, and the amount of time spent watching television in a typical week. Diff: 2 Type: BI Var: 1 L.O.: 10.1.4
12) Use the output to determine how many students were included in the sample. Answer: 448 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0
13) Some of the information in the ANOVA table is missing. How many degrees of freedom should appear in the "Regression" row of the table? Answer: 3 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0
14) Some of the information in the ANOVA table is missing. How many degrees of freedom should be listed in the "Residual Error" row? Answer: 444 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0
15) At the 5% significance level, is the model effective according to the ANOVA test? Include all details of the test. Answer: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (or The model is ineffective and all predictors can be omitted.) $H_a$: At least one $\beta_i \neq 0$ (or At least one of the predictors in the model is effective.)
F = 35.90 p-value ≈ 0 There is very strong evidence that at least one of the predictors in the model is effective for explaining the GPA of students at this university. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3
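The missing entries in the GPA output can be filled in from what is given, and the results cross-checked against the reported F and adjusted R-Sq. In the Python sketch below, the adjusted $R^2$ formula, 1 - (SSE/(n - k - 1)) / (SSTotal/(n - 1)), is the standard definition rather than something shown in the output, and the names are illustrative.

```python
# Known entries from the GPA regression output.
k = 3                                    # predictors: Math SAT, Verbal SAT, TV
df_total = 447
ss_model, ss_error, ss_total = 14.4886, 59.7304, 74.2190

df_model = k                             # missing "Regression" df
df_error = df_total - df_model           # missing "Residual Error" df
n = df_total + 1                         # number of students

r_squared = ss_model / ss_total          # missing R-Sq
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)  # compare with R-Sq(adj)
f_stat = (ss_model / df_model) / (ss_error / df_error)             # compare with F

print(df_model, df_error, n, round(r_squared, 3), round(adj_r_squared, 3), round(f_stat, 2))
```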
16) Which predictors are significant at the 5% level? A) Math SAT, Verbal SAT, and TV B) Verbal SAT, and TV C) Math SAT, Verbal SAT D) Math SAT, and TV Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.2
17) A dotplot of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The dotplot of the residuals seems to have a couple of residuals below -1 (which are potential outliers). We have some minor concerns about the normality condition. The scatterplot of the residuals versus the predicted values shows no signs of violations of either the linearity or consistent variability conditions, thus we have no concerns about those. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
10.3 Using Multiple Regression
Use the following to answer the questions below: Fast food restaurants are required to publish nutrition information about the foods they serve. Nutrition information for a random sample of McDonald's lunch/dinner menu items (excluding sides and drinks) was obtained from their website. Output from a multiple regression analysis is provided.

The regression equation is Calories = 65.2 + 9.46 Total Fat (g) + 0.876 Cholesterol (mg) + 0.131 Sodium (mg)

Predictor          Coef      SE Coef   T      P
Constant           65.18     31.41     2.08   0.062
Total Fat (g)      9.464     1.710     5.53   0.000
Cholesterol (mg)   0.8762    0.6366    1.38   0.196
Sodium (mg)        0.13149   0.04790   2.75   0.019

S = 39.4529   R-Sq = 95.5%   R-Sq(adj) = 94.3%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       3    362171   120724   77.56   0.000
Residual Error   11   17122    1557
Total            14   379293
1) What are the explanatory variables used in this model? A) Total Fat (g), Cholesterol (mg), and Sodium (mg) B) Total Fat (g), Cholesterol (mg), Sodium (mg), and Calories C) Total Fat (g) and Calories D) Cholesterol (mg), Sodium (mg), and Calories Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.0 2) Use the provided output to determine how many menu items were included in the sample. A) 12 B) 13 C) 14 D) 15 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 10.1.1
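The sample size in question 2 and the summary statistics in the output can be cross-checked from the ANOVA table alone. The following is a minimal sketch in Python, using only the numbers printed in the output above; the variable names are illustrative and not part of the original output.

```python
# ANOVA values copied from the McDonald's regression output above.
df_regression, df_residual, df_total = 3, 11, 14
ss_regression, ss_residual, ss_total = 362171, 17122, 379293

# Total df is n - 1, so the sample size is one more than the total df.
n = df_total + 1

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# The F-statistic is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# R-squared is the share of total variability explained by the model.
r_squared = ss_regression / ss_total

print(n, round(f_statistic, 2), round(r_squared, 3))
```

The computed values agree with the 15 menu items, F = 77.56, and R-Sq = 95.5% reported in the output.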
3) One of the menu items in the sample is the "McDouble," which has 390 calories, 12 grams of fat, 65 mg of cholesterol, and 850 mg of sodium. What is the predicted response for the McDouble? Round your answer to two decimal places. Answer: 347.01 Explanation: Predicted Calories = 65.2 + 9.46(12) + 0.876(65) + 0.131(850) = 347.01 Diff: 2 Type: SA Var: 1 L.O.: 10.1.1 4) One of the menu items in the sample is the "McDouble," which has 390 calories, 12 grams of fat, 65 mg of cholesterol, and 850 mg of sodium. What is the residual for the McDouble? Round your answer to two decimal places. Answer: 42.99 Explanation: Predicted Calories = 65.2 + 9.46(12) + 0.876(65) + 0.131(850) = 347.01, so e = 390 - 347.01 = 42.99 Diff: 2 Type: SA Var: 1 L.O.: 10.1.0;10.2.0 5) Which predictor appears to be the most important in this model? Explain briefly. A) Total fat (g) B) Cholesterol (mg) C) Sodium (mg) D) Calories Answer: A Explanation: Total fat (g) appears to be most important as it has the smallest p-value. Diff: 2 Type: BI Var: 1 L.O.: 10.1.2 6) Interpret the coefficient of Sodium in context. Answer: For each additional milligram of sodium, the number of calories is predicted to increase by 0.131, when all other variables are held constant (i.e., if total fat and cholesterol do not change). Diff: 2 Type: ES Var: 1 L.O.: 10.1.1 7) Interpret R² for this model. Answer: Total fat, cholesterol, and sodium together explain 95.5% of the variation in number of calories for the menu items in this sample. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4
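The prediction and residual in questions 3 and 4 can be reproduced with a few lines of Python. This is a sketch that uses the rounded coefficients from the regression equation, as the explanations do; the function name `predict_calories` is hypothetical.

```python
def predict_calories(total_fat_g, cholesterol_mg, sodium_mg):
    """Predicted calories from the fitted equation (rounded coefficients)."""
    return 65.2 + 9.46 * total_fat_g + 0.876 * cholesterol_mg + 0.131 * sodium_mg

# McDouble: 390 calories observed, 12 g fat, 65 mg cholesterol, 850 mg sodium.
observed = 390
predicted = predict_calories(12, 65, 850)

# A residual is the observed response minus the predicted response.
residual = observed - predicted

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

A positive residual means the model underpredicts this item's calories.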
8) At the 5% significance level, is the model effective according to the ANOVA test? Include all details of the test. Answer: H₀: β₁ = β₂ = β₃ = 0 (or Model is ineffective and all predictors could be omitted.) Hₐ: At least one βᵢ ≠ 0 (or At least one predictor in the model is effective.)
F = 77.56 p-value ≈ 0 (using 3 and 11 degrees of freedom) There is very strong evidence that this model (using total fat, cholesterol, and sodium) is effective for predicting the number of calories in McDonald's lunch/dinner menu items. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3 9) Which predictors are significant at the 5% level? What are their p-values? A) Total fat and sodium B) Total fat, cholesterol, and sodium C) Total fat D) Cholesterol, and sodium Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.2
10) A boxplot of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The boxplot is roughly symmetric and there are no outliers, indicating that the normality condition is reasonably satisfied. The scatterplot of the residuals versus the predicted values is fairly scattered. No curved or fanning patterns are apparent, indicating that the linearity and consistent variability conditions are reasonably satisfied. Because all conditions are satisfied, multiple regression with these data is reasonable. Diff: 1 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
11) Which variable, if any, would you suggest trying to eliminate first to possibly improve this model? Describe one way in which you might determine if the model had been improved by removing that variable. Explain briefly. Answer: We should first try to eliminate cholesterol as a predictor because it is least significant, with a p-value of 0.196. After removing the predictor, we should refit the model and examine each of the following: 1) R²: if removing cholesterol causes a large drop in this, we might prefer the model with cholesterol. 2) The residual standard error: if this is smaller without cholesterol in the model, it was a good idea to remove cholesterol. 3) Overall model p-value: if this is smaller without cholesterol, removing cholesterol was a good idea (this might be hard to compare as p-values are often quite small). 4) F-statistic from ANOVA: if this is larger without cholesterol, removing cholesterol from the model was a good idea. 5) Adjusted R²: if this does not decrease, removing cholesterol from the model was a good idea. Diff: 2 Type: ES Var: 1 L.O.: 10.3.1
Use the following to answer the questions below: Data were collected on the age (in years), mileage (in thousands of miles), and price (in thousands of dollars) of a random sample of used Hyundai Elantras. Output from two models is provided.

Single Predictor Model:
The regression equation is
Price = 13.8 - 0.0912 Mileage

Predictor    Coef        SE Coef     T       P
Constant     13.7751     0.6930      19.88   0.000
Mileage      -0.091167   0.009718    -9.38   0.000

Two Predictor Model:
The regression equation is
Price = 15.2 - 0.0101 Mileage - 1.55 Age

Predictor    Coef        SE Coef     T       P
Constant     15.2174     0.6112      24.90   0.000
Mileage      -0.01005    0.01977     -0.51   0.616
Age          -1.5466     0.3508      -4.41   0.000

S = 1.39445   R-Sq = 89.0%   R-Sq(adj) = 88.0%

Analysis of Variance
Source            DF   SS       MS       F       P
Regression         2   346.11   173.06   89.00   0.000
Residual Error    22   42.78    1.94
Total             24   388.89
12) What is the explanatory variable used in the single predictor model? Answer: Mileage (in thousands of miles) Diff: 2 Type: ES Var: 1 L.O.: 10.1.0 13) One of the cars in the sample was a 5-year-old Hyundai Elantra with 87,100 miles being sold for $6,000. What is the predicted price of this car using the single predictor model? Round to three decimal places. Answer: $5,856. Explanation: Predicted Price = 13.8 - 0.0912(87.1) = 5.856 Using the single predictor model, the predicted price of the car is $5,856. Diff: 2 Type: SA Var: 1 L.O.: 10.1.1
14) One of the cars in the sample was a 5-year-old Hyundai Elantra with 87,100 miles being sold for $6,000. What is the predicted price of the car using the two predictor model? Round to three decimal places. Answer: $6,570. Explanation: Predicted Price = 15.2 - 0.0101(87.1) - 1.55(5) = 6.570 Using the two predictor model, the predicted price of this car is $6,570. Diff: 2 Type: SA Var: 1 L.O.: 10.1.0;10.2.0 15) Is mileage a significant single predictor of the price of used Hyundai Elantras? Use α = 0.05. Include all details of your test. Answer: H₀: β₁ = 0 (or Mileage is not an effective predictor of the price of used Hyundai Elantras.) Hₐ: β₁ ≠ 0 (or Mileage is an effective predictor of the price of used Hyundai Elantras.)
t = -9.38 p-value ≈ 0 There is very strong evidence that mileage is a significant single predictor of the price of used Hyundai Elantras. Diff: 2 Type: ES Var: 1 L.O.: 10.1.2 16) Explain why Age is a potential confounding variable in the relationship between Mileage and Price of used Hyundai Elantras. Answer: Age is likely a confounding variable because older cars are likely to have been driven more and thus tend to have higher mileage. Diff: 2 Type: ES Var: 1 L.O.: 10.3.3 17) Is the two predictor model effective according to the ANOVA test? Use α = 0.05. Include all details of the test. Answer: H₀: β₁ = β₂ = 0 (The model is ineffective and all predictors could be omitted.) Hₐ: At least one βᵢ ≠ 0 (At least one predictor in the model is effective.)
F = 89.0 p-value ≈ 0 (using 2 and 22 degrees of freedom) There is very strong evidence that at least one predictor in the model is effective for explaining the price of used Hyundai Elantras. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3
18) Is mileage a significant predictor of the price of used Hyundai Elantras, even after accounting for age? A) Yes B) No Answer: B Explanation: No, mileage is not a significant predictor of the price of used Hyundai Elantras after accounting for age because the p-value is 0.616, which is not significant at any reasonable significance level. Diff: 2 Type: MC Var: 1 L.O.: 10.1.2 19) Use the provided output to determine how many cars were in the sample. A) 22 B) 23 C) 24 D) 25 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 10.1.0
20) A boxplot of the residuals and a scatterplot of the residuals versus the predicted values from the two predictor model are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The boxplot is reasonably symmetric and there are no outliers, indicating that there should be no concerns about the normality of the residuals condition. There is no curved or fanning pattern in the scatterplot of the residual versus the predicted values, indicating that both the linearity and consistent variability conditions are reasonably satisfied. There should be no major concerns about using multiple regression with these data. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
21) Regression output for the model that only uses Age as a predictor in the model is provided. Assuming that the residuals for this single predictor model do not indicate any problems, is this model an improvement over the model that uses both Age and Mileage as predictors? Statistically justify your answer by discussing at least two quantitative criteria.

The regression equation is
Price = 15.3 - 1.71 Age

Predictor    Coef       SE Coef    T        P
Constant     15.2912    0.5840     26.18    0.000
Age          -1.7126    0.1264     -13.55   0.000

S = 1.37179   R-Sq = 88.9%   R-Sq(adj) = 88.4%

Analysis of Variance
Source            DF   SS       MS       F        P
Regression         1   345.61   345.61   183.66   0.000
Residual Error    23   43.28    1.88
Total             24   388.89
Answer: Yes, this model is an improvement over the model that uses both Age and Mileage because 1) There are no insignificant predictors in the model. 2) The R² of this new model has barely changed (88.9% compared to 89% for the two-predictor model). 3) The residual standard error is lower for this model (1.372 compared to 1.394 for the two-predictor model). 4) The F-statistic from the ANOVA test is much larger for this model (183.66 compared to 89 for the two-predictor model). 5) The Adjusted R² is larger for this model (88.4% compared to 88% for the two-predictor model). Diff: 2 Type: ES Var: 1 L.O.: 10.3.1
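Two of the comparison criteria, the residual standard error and the adjusted R-squared, can be recomputed from the two ANOVA tables. The sketch below assumes the standard formulas, adjusted R-squared = 1 - [SSE/(n - k - 1)] / [SSTotal/(n - 1)] and S = sqrt(SSE/(n - k - 1)), where k is the number of predictors; the function names are illustrative.

```python
import math

def adjusted_r_squared(ss_residual, ss_total, n, k):
    """Adjusted R-squared for a model with k predictors fit to n cases."""
    return 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))

def residual_standard_error(ss_residual, n, k):
    """Square root of the residual mean square."""
    return math.sqrt(ss_residual / (n - k - 1))

n = 25  # total df of 24 plus one

# Sums of squares copied from the two ANOVA tables.
two_predictor = {"ss_residual": 42.78, "ss_total": 388.89, "k": 2}
age_only = {"ss_residual": 43.28, "ss_total": 388.89, "k": 1}

for name, m in [("Mileage + Age", two_predictor), ("Age only", age_only)]:
    adj = adjusted_r_squared(m["ss_residual"], m["ss_total"], n, m["k"])
    s = residual_standard_error(m["ss_residual"], n, m["k"])
    print(f"{name}: adjusted R-sq = {adj:.1%}, S = {s:.3f}")
```

Dropping Mileage raises SSE only slightly while freeing one residual degree of freedom, which is why both criteria favor the Age-only model.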
Use the following to answer the questions below: A quantitatively savvy, young couple is interested in purchasing a home in northern New York. They collected data on houses that had recently sold in the two towns they are considering. The variables they collected are the selling price of the home (in thousands of dollars), the size of the home (in square feet), the age of the home (in years), and the town in which the house is located (coded 1 = Canton and 0 = Potsdam). Output from their multiple regression analysis is provided.

The regression equation is
Price (in thousands) = 69.2 + 0.0627 Size (sq. ft.) - 0.632 Age + 1.6 Town

Predictor         Coef       SE Coef    T       P
Constant          69.23      25.10      2.76    0.008
Size (sq. ft.)    0.06267    0.01024    6.12    0.000
Age               -0.6319    0.1328     -4.76   0.000
Town              1.65       12.15      0.14    0.893

S = 40.0763   R-Sq = 59.3%   R-Sq(adj) = 56.5%

Analysis of Variance
Source            DF   SS       MS      F       P
Regression         3   102936   34312   21.36   0.000
Residual Error    44   70669    1606
Total             47   173605
22) One of the houses they are considering is a 92-year-old, 1,742 square foot house in Canton. What is the predicted selling price of this house? Round to three decimal places. Answer: $121,879 Explanation: Predicted Price = 69.2 + 0.0627(1742) - 0.632(92) + 1.6(1) = 121.879 The predicted selling price of this house is $121,879. Diff: 2 Type: SA Var: 1 L.O.: 10.1.1 23) One of the houses they are considering is a 62-year-old, 1,865 square foot house in Potsdam. What is the predicted selling price of this house? Round to three decimal places. Answer: $146,952 Explanation: Predicted Price = 69.2 + 0.0627(1865) - 0.632(62) + 1.6(0) = 146.952 The predicted selling price of this house is $146,952. Diff: 2 Type: SA Var: 1 L.O.: 10.1.1
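The role of the 0/1 Town indicator in these predictions can be seen in a short sketch. It uses the rounded coefficients from the regression equation; the function name and the `in_canton` flag are hypothetical, and the 1,500 square foot, 30-year-old house in the last step is an invented illustration rather than a case from the data.

```python
def predict_price_thousands(size_sqft, age_years, in_canton):
    """Predicted selling price in thousands of dollars (rounded coefficients)."""
    town = 1 if in_canton else 0  # coding from the problem: 1 = Canton, 0 = Potsdam
    return 69.2 + 0.0627 * size_sqft - 0.632 * age_years + 1.6 * town

canton_house = predict_price_thousands(1742, 92, in_canton=True)
potsdam_house = predict_price_thousands(1865, 62, in_canton=False)
print(f"{canton_house:.3f} {potsdam_house:.3f}")

# For two otherwise identical houses, the indicator shifts the prediction
# by exactly the Town coefficient (1.6 thousand dollars).
shift = (predict_price_thousands(1500, 30, True)
         - predict_price_thousands(1500, 30, False))
print(f"{shift:.1f}")
```

An indicator variable changes only the intercept, so the fitted surfaces for the two towns are parallel.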
24) Interpret the coefficient of Age in context. Answer: When comparing houses of the same size in the same town, for each additional year of age, the selling price of the house is predicted to decrease by $632. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1 25) Interpret the coefficient of Town in context. Answer: When comparing houses of the same age and size, a house located in Canton is predicted to cost $1,600 more than one in Potsdam. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1;10.3.2 26) How many houses are used in this dataset? A) 48 B) 47 C) 46 D) 45 Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.0 27) Interpret R² for this model. Answer: 59.3% of the variability of the selling prices of the homes in the sample is explained by the size (in sq. ft), age, and town. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4 28) Using α = 0.05, is the model effective according to the ANOVA test? Include all details of the test. Answer: H₀: β₁ = β₂ = β₃ = 0 (The model is ineffective and all predictors could be omitted.) Hₐ: At least one βᵢ ≠ 0 (At least one of the predictors in the model is effective.)
F = 21.36 p-value ≈ 0 There is very strong evidence that at least one of the predictors used in this model is effective for explaining the selling prices of homes in this area. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3
29) Which predictors are significant at the 5% level? A) Size and Age B) Size C) Age D) Size, Age, and Town Answer: A Diff: 2 Type: BI Var: 1 L.O.: 10.1.2 30) A dotplot of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The dotplot of the residuals is reasonably symmetric with no serious outliers (there is one home with a residual a little larger than 90, which is the largest residual). There is no major problem with the normality condition. The scatterplot of the residuals versus the predicted values displays no patterns (curve or fanning) indicating that there are no problems with the linearity or consistent variability conditions. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
31) Regression output for a model that omits Town as a predictor is provided. Assuming that the residuals for this reduced model do not indicate any problems with using multiple regression, is this model an improvement over the model that uses Size, Age, and Town as predictors? Statistically justify your answer by discussing at least two quantitative criteria.

The regression equation is
Price (in thousands) = 70.6 + 0.0624 Size (sq. ft.) - 0.635 Age

Predictor         Coef        SE Coef     T       P
Constant          70.56       22.84       3.09    0.003
Size (sq. ft.)    0.062440    0.009994    6.25    0.000
Age               -0.6350     0.1294      -4.91   0.000

S = 39.6368   R-Sq = 59.3%   R-Sq(adj) = 57.5%

Analysis of Variance
Source            DF   SS       MS      F       P
Regression         2   102907   51453   32.75   0.000
Residual Error    45   70698    1571
Total             47   173605
Answer: Yes, the model without Town (i.e., using only Size and Age) is an improvement over the three predictor model because 1) There are no insignificant predictors in the model. 2) The R² does not change at all (59.3% in both models). 3) The residual standard error is lower in the two-predictor model (39.64 versus 40.08 in the three-predictor model). 4) The F-statistic is larger in the two-predictor model (32.75 versus 21.36 in the three-predictor model). 5) The Adjusted R² is larger in the two-predictor model (57.5% versus 56.5% in the three-predictor model). Diff: 2 Type: ES Var: 1 L.O.: 10.3.1
Use the following to answer the questions below: A small university is concerned with monitoring the electricity usage in its Student Center, and its officials want to better understand what influences the amount of electricity used on a given day. They collected data on the amount of electricity used in the Student Center each day and the daily high temperature for nearly a year. They also made note of whether each day was a weekend or not (1 = Saturday/Sunday and 0 = Monday - Friday). Regression output is provided. Helpful notes: 1) Electricity usage is measured in kilowatt hours, 2) During the cold months, the Student Center is heated by gas, not electricity, and 3) Air conditioning the building during the warm months does use electricity.

The regression equation is
Electricity = 83.6 + 0.529 High Temp - 25.2 Weekend

Predictor    Coef       SE Coef    T       P
Constant     83.560     4.238      19.72   0.000
High Temp    0.52918    0.07020    7.54    0.000
Weekend      -25.168    3.724      -6.76   0.000

S = 29.8162   R-Sq = 24.7%   R-Sq(adj) = 24.2%

Analysis of Variance
Source            DF    SS       MS      F       P
Regression          2   90481    45241   50.89   0.000
Residual Error    310   275592   889
Total             312   366073
32) Predict the amount of electricity used on a Monday with a high temperature of 62°F. Use one decimal place in your answer. A) 116.4 kilowatt hours B) 91.2 kilowatt hours C) 32.8 kilowatt hours D) 141.6 kilowatt hours Answer: A Explanation: Predicted Electricity = 83.6 + 0.529(62) - 25.2(0) = 116.4 We predict that 116.4 kilowatt hours will be used on a Monday with a high temperature of 62°F. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1
33) Predict the amount of electricity used on a Saturday with a high temperature of 68°F. Use one decimal place in your answer. A) 94.4 kilowatt hours B) 119.6 kilowatt hours C) 58.9 kilowatt hours D) 92.4 kilowatt hours Answer: A Explanation: Predicted Electricity = 83.6 + 0.529(68) - 25.2(1) = 94.4 We predict that 94.4 kilowatt hours will be used on a Saturday with a high temperature of 68°F. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1 34) Interpret the coefficient of High Temp in context. Answer: When comparing weekends (or weekdays), as the daily high temperature increases by 1°F, the predicted electricity usage increases by 0.529 kilowatt hours. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1 35) Interpret the coefficient of Weekend in context. Answer: If the daily high temperature remains unchanged, weekends use, on average, 25.2 kilowatt hours less electricity than weekdays. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1;10.3.2 36) How many days are included in the sample? A) 365 B) 311 C) 312 D) 313 Answer: D Diff: 2 Type: BI Var: 1 L.O.: 10.1.0 37) Interpret R² for this model. Answer: 24.7% of the variability in Student Center electricity use for the days in this sample is explained by knowing the daily high temperature and whether or not it is a weekend. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4
38) Is the model effective according to the ANOVA test? Use α = 0.05. Include all details of the test. Answer: H₀: β₁ = β₂ = 0 (or The model is ineffective and all predictors could be removed.) Hₐ: At least one βᵢ ≠ 0 (or At least one of the predictors in the model is effective.)
F = 50.89 p-value ≈ 0
There is very strong evidence that at least one predictor in the model is effective for explaining electricity use at the university's Student Center. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3 39) Which predictors are significant at the 5% level? What are their p-values? Answer: High Temp (p-value ≈ 0) and Weekend (p-value ≈ 0) are both significant at the 5% level. Diff: 2 Type: ES Var: 1 L.O.: 10.1.2 40) Another possible predictor they recorded was the average temperature over the course of each day. Regression output for the model that uses High Temp, Weekend, and Avg. Temp is provided. Explain why these results differ so drastically from those for the two-predictor model.

The regression equation is
Sullivan Student Center = 81.9 + 0.839 High Temp - 25.1 Weekend - 0.337 Avg. Temp

Predictor    Coef       SE Coef    T       P
Constant     81.881     4.837      16.93   0.000
High Temp    0.8389     0.4351     1.93    0.055
Weekend      -25.053    3.730      -6.72   0.000
Avg. Temp    -0.3372    0.4673     -0.72   0.471

S = 29.8393   R-Sq = 24.8%   R-Sq(adj) = 24.1%

Analysis of Variance
Source            DF    SS       MS      F       P
Regression          3   90945    30315   34.05   0.000
Residual Error    309   275129   890
Total             312   366073

Answer: Average temperature is likely highly correlated with the high temperature (since average temperature is an average over the entire day, the day's high temperature is included in that calculation). In these results neither High Temp nor Avg. Temp is significant when accounting for the other predictors in the model. Diff: 3 Type: ES Var: 1 L.O.: 10.3.3
41) A histogram of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The histogram of the residuals looks roughly symmetric with no serious outliers, indicating that the normality condition is reasonable. However, there appears to be a "bend" in the scatterplot of the residuals versus the predicted values, indicating that the linearity condition is not satisfied. There doesn't seem to be an obvious problem with the consistent variability condition. Overall, we should be concerned that this multiple regression model is not appropriate. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
Use the following to answer the questions below: Is there such a thing as a "home court/field advantage"? The number of points scored and whether or not it was a home game are available for a sample of games played by the Boston Celtics during the regular season. The Home variable is coded as 1 = home game and 0 = away game.

The regression equation is
Points Scored = 102 - 8.76 Home

Predictor    Coef       SE Coef    T       P
Constant     102.091    3.842      26.57   0.000
Home         -8.758     5.728      -1.53   0.144

S = 12.7430   R-Sq = 11.5%   R-Sq(adj) = 6.6%

Analysis of Variance
Source            DF   SS       MS      F      P
Regression         1   379.6    379.6   2.34   0.144
Residual Error    18   2922.9   162.4
Total             19   3302.5
42) How many points are the Celtics predicted to score in a home game? Round to one decimal place. A) 93.2 points B) 110.8 points C) 94.0 points D) 111.8 points Answer: A Explanation: Predicted Points Scored = 102 - 8.76(1) = 93.2 The Celtics are predicted to score 93.2 points in a home game. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1;10.3.2 43) How many points are the Celtics predicted to score in an away game? Round to one decimal place. A) 102.0 points B) 101.0 points C) 93.2 points D) 110.8 points Answer: A Explanation: Predicted Points Scored = 102 - 8.76(0) = 102 The Celtics are predicted to score 102.0 points in an away game. Diff: 2 Type: BI Var: 1 L.O.: 10.1.1;10.3.2
44) Interpret the R² for this model. Answer: 11.5% of the variability in the points scored in the sampled games is explained by knowing if the game is home or away. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4 45) Using α = 0.05, is there a difference in the number of points scored for home and away games? Include all details of the test. Answer: H₀: β₁ = 0 (or Home/away is not effective for explaining the number of points scored by the Boston Celtics.) Hₐ: β₁ ≠ 0 (or Home/away is effective for explaining the number of points scored by the Boston Celtics.) t = -1.53 p-value = 0.144 There is no evidence that home/away is effective for explaining the number of points scored by the Boston Celtics. Since there is only one predictor in the model, the ANOVA F test could have been used instead of the t-test. Diff: 2 Type: ES Var: 1 L.O.: 10.1.2;10.1.3
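The remark that the ANOVA F test could replace the t-test rests on a standard fact: with a single predictor, the F-statistic equals the square of the slope's t-statistic, and the two tests share a p-value (0.144 in both places in the output). A quick numerical check in Python, using only values from the Celtics output (variable names are illustrative):

```python
# Slope estimate and its standard error from the coefficient table.
coef_home, se_home = -8.758, 5.728
t_statistic = coef_home / se_home

# Mean squares from the ANOVA table.
ms_regression, ms_residual = 379.6, 162.4
f_statistic = ms_regression / ms_residual

print(round(t_statistic, 2), round(t_statistic ** 2, 2), round(f_statistic, 2))
```

The small gap between t squared and F in the unrounded values comes from rounding in the printed output.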
Use the following to answer the questions below: Does the price of used cars depend upon the model? Data were collected on the selling price and age of used Hyundai Elantras (coded as Model = 1) and Toyota Camrys (coded as Model = 0). Output from the multiple regression analysis is provided.

The regression equation is
Price = 14.5 - 0.619 Age - 3.63 Model

Predictor    Coef        SE Coef    T        P
Constant     14.4648     0.7059     20.49    0.000
Age          -0.61922    0.04903    -12.63   0.000
Model        -3.6343     0.7584     -4.79    0.000

S = 2.63465   R-Sq = 69.3%   R-Sq(adj) = 68.4%

Analysis of Variance
Source            DF   SS        MS       F       P
Regression         2   1142.13   571.06   82.27   0.000
Residual Error    73   506.72    6.94
Total             75   1648.85
46) What is the predicted price of a 6-year-old Hyundai Elantra? Round to three decimal places. Answer: $7,156 Explanation: Predicted Price = 14.5 - 0.619(6) - 3.63(1) = 7.156 The predicted price of a 6-year-old Hyundai Elantra is $7,156. Diff: 2 Type: SA Var: 1 L.O.: 10.1.1 47) What is the predicted price of a 6-year-old Toyota Camry? Round to three decimal places. Answer: $10,786 Explanation: Predicted Price = 14.5 - 0.619(6) - 3.63(0) = 10.786 The predicted price of a 6-year-old Toyota Camry is $10,786. Diff: 2 Type: SA Var: 1 L.O.: 10.1.1 48) Interpret the coefficient of Model in context. Answer: When the age remains unchanged, used Hyundai Elantras are worth, on average, $3,634 less than used Toyota Camrys. Diff: 2 Type: ES Var: 1 L.O.: 10.1.1;10.3.2 49) Interpret R² for this model. Answer: 69.3% of the variability in the price of the used cars in this sample is explained by the age and model. Diff: 2 Type: ES Var: 1 L.O.: 10.1.4
50) Is the model effective according to the ANOVA test? Use α = 0.05. Include all details of the test. Answer: H₀: β₁ = β₂ = 0 (or The model is ineffective and all predictors could be omitted.) Hₐ: At least one βᵢ ≠ 0 (or At least one of the predictors in the model is effective.)
F = 82.27 p-value ≈ 0 There is very strong evidence that at least one of the predictors is effective for explaining the price of used cars. Diff: 2 Type: ES Var: 1 L.O.: 10.1.3 51) Which predictors are significant at the 5% level? What are their p-values? Answer: Age (p-value ≈ 0) and Model (p-value ≈ 0) are both significant at the 5% level. Diff: 2 Type: ES Var: 1 L.O.: 10.1.2
52) A histogram of the residuals and a scatterplot of the residuals versus the predicted values are provided. Discuss whether the conditions for a multiple linear regression are reasonable by referring to the appropriate plots.
Answer: The histogram of the residuals is symmetric with no outliers, indicating that there are no problems with the normality condition. There is a clear curved pattern in the residuals (and possible fanning pattern), indicating that the linearity condition, and possibly the consistent variability condition, are problematic. Diff: 2 Type: ES Var: 1 L.O.: 10.2.1;10.2.2
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.
Statistics - Unlocking the Power of Data, 3e (Lock) Chapter 11 Probability Basics
11.1 Probability Rules
Use the following information about events A and B to answer the questions below. 1) P(A) = 0.42, P(B) = 0.26, and Find P(not A). Use two decimal places. Answer: 0.58 Diff: 1 Type: SA Var: 1 L.O.: 11.1.3 2) P(A) = 0.41, P(B) = 0.35, and Find P(not B). Use two decimal places. Answer: 0.65 Diff: 1 Type: SA Var: 1 L.O.: 11.1.3 3) P(A) = 0.49, P(B) = 0.34, and Find P(A or B). Use two decimal places. Answer: 0.66 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3 4) P(A) = 0.45, P(B) = 0.35, and Find P(A if B). Use three decimal places. Answer: 0.343 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3 5) P(A) = 0.47, P(B) = 0.26, and Find P(B if A). Use three decimal places. Answer: 0.383 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3
6) P(A) = 0.39, P(B) = 0.35, and Are events A and B disjoint? Explain briefly. Answer: No, because P(A and B) ≠ 0. Diff: 2 Type: ES Var: 1 L.O.: 11.1.4 Use the following to answer the questions below: Let A and B be two events such that P(A) = 0.35, P(B) = 0.45, and P(A and B) = 0.1575. Use two decimal places in your answer unless otherwise specified. 7) Find P(not A). Answer: 0.65 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3 8) Find P(not B). Answer: 0.55 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3
9) Find P(A or B). Use four decimal places. Answer: 0.6425 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3 10) Find P(A if B). Answer: 0.35 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3 11) Find P(B if A). Answer: 0.45 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3
12) Are events A and B disjoint? A) Yes B) No Answer: B Explanation: No, because P(A and B) ≠ 0. Diff: 2 Type: MC Var: 1 L.O.: 11.1.4
13) Are events A and B independent? A) Yes B) No Answer: A Explanation: Yes, they are independent events because P(A)P(B) = 0.35*0.45 = 0.1575 = P(A and B). Diff: 2 Type: MC Var: 1 L.O.: 11.1.4 Use the following information about the independent events A and B to answer the questions below. 14) P(A) = 0.23 and Find P(A if B). Use two decimal places. Answer: 0.23 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3;11.1.4 15) P(A) = 0.30 and Find P(B if A). Use two decimal places. Answer: 0.80 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3;11.1.4 16) P(A) = 0.25 and Find P(A and B). Round your answer to four decimal places. Answer: 0.1850 Diff: 2 Type: SA Var: 1 L.O.: 11.1.3;11.1.4 17) P(A) = 0.30 and P(B) = 0.77 Find P(A or B). Round your answer to four decimal places. Answer: 0.8390 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2;11.1.3
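The rules behind questions 7 through 13 (complement, additive rule, conditional probability, and the checks for disjoint and independent events) can be collected in one short sketch. It uses P(A) = 0.35, P(B) = 0.45, and P(A and B) = 0.1575, the value quoted in the explanation to question 13; the variable names are illustrative.

```python
import math

p_a, p_b, p_a_and_b = 0.35, 0.45, 0.1575

p_not_a = 1 - p_a                    # complement rule
p_not_b = 1 - p_b
p_a_or_b = p_a + p_b - p_a_and_b     # additive rule
p_a_if_b = p_a_and_b / p_b           # conditional probability
p_b_if_a = p_a_and_b / p_a

# Disjoint events cannot happen together, so P(A and B) would be 0.
disjoint = p_a_and_b == 0

# Independent events satisfy P(A and B) = P(A)P(B);
# isclose guards against floating-point rounding.
independent = math.isclose(p_a * p_b, p_a_and_b)

print(round(p_not_a, 2), round(p_not_b, 2), round(p_a_or_b, 4))
print(round(p_a_if_b, 2), round(p_b_if_a, 2), disjoint, independent)
```

Because the events are independent, conditioning on one leaves the other's probability unchanged, which is why P(A if B) = P(A) and P(B if A) = P(B).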
Use the following to answer the questions below: Consider rolling a fair six-sided die. Round all answers to three decimal places. 18) What is the probability that the result of rolling the die is a 3? Answer: 0.167 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2 19) What is the probability that the result is not a 5? Answer: 0.833 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2 20) What is the probability that the result is a 2 or a 4? Answer: 0.333 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3 21) Suppose we record the result of the roll and then roll the die a second time. What is the probability that both rolls are a 6? Answer: 0.028 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3;11.1.4 Use the following to answer the questions below: On the first day of class, students in a large introductory statistics course were asked their sex and eye color. The results are summarized in the provided table.
         Female   Male   Total
Blue       24      20      44
Brown      21      17      38
Green      10       8      18
Hazel      11      10      21
All        66      55     121
Round your answer to each question to three decimal places. 22) What is the probability that a randomly selected student in the class has green eyes? Answer: 0.149 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2 23) What is the probability that a randomly selected student is a female? Answer: 0.545 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2
24) What is the probability that a randomly selected student in the class is a female and has hazel eyes? Answer: 0.091 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2 25) What is the probability that a randomly selected student does not have green eyes? Answer: 0.851 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2;11.1.3 26) What is the probability that a randomly selected student in the class is a female or has brown eyes? Answer: 0.686 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2;11.1.3 27) What is the probability that a randomly selected student has blue eyes if we know they are female? Answer: 0.364 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2;11.1.3 28) What is the probability that a randomly selected student is a male, if we know that they have hazel eyes? Answer: 0.476 Diff: 2 Type: SA Var: 1 L.O.: 11.1.2;11.1.3 29) Are male and blue eyes independent? Briefly justify your answer. Answer: One way to check this is to see if P(Male and Blue) = P(Male)P(Blue). P(Male and Blue) = 20/121 = 0.165 P(Male)P(Blue) = (55/121)(44/121) = 0.165 Since P(Male and Blue) = P(Male)P(Blue) male and blue are independent. Could also check if P(Blue if Male) = P(Blue) or P(Male if Blue) = P(Male): P(Blue if Male) = 20/55 = 0.364 P(Blue) = 44/121 = 0.364 = P(Blue if Male) P(Male if Blue) = 20/44 = 0.455 P(Male) = 55/121 = 0.455 = P(Male if Blue) Diff: 2 Type: ES Var: 1 L.O.: 11.1.4
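The two-way table calculations above, including the independence check in question 29, can be done with exact fractions so that no rounding enters the comparison. This is a sketch using the counts from the table; the dictionary layout and helper names are illustrative.

```python
from fractions import Fraction

# Counts from the sex by eye color table.
counts = {
    "Female": {"Blue": 24, "Brown": 21, "Green": 10, "Hazel": 11},
    "Male": {"Blue": 20, "Brown": 17, "Green": 8, "Hazel": 10},
}
total = sum(sum(row.values()) for row in counts.values())

def p_sex(sex):
    return Fraction(sum(counts[sex].values()), total)

def p_eye(color):
    return Fraction(sum(row[color] for row in counts.values()), total)

def p_both(sex, color):
    return Fraction(counts[sex][color], total)

# Additive rule: P(Female or Brown) = P(Female) + P(Brown) - P(Female and Brown).
p_female_or_brown = p_sex("Female") + p_eye("Brown") - p_both("Female", "Brown")

# Conditional probability: P(Blue if Female) = P(Female and Blue) / P(Female).
p_blue_if_female = p_both("Female", "Blue") / p_sex("Female")

# Independence check for Male and Blue.
male_blue_independent = p_both("Male", "Blue") == p_sex("Male") * p_eye("Blue")

print(float(p_female_or_brown), float(p_blue_if_female), male_blue_independent)
```

With exact fractions, the equality P(Male and Blue) = P(Male)P(Blue) holds exactly in this table and is not an artifact of rounding to three decimals.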
30) Are brown and blue eyes disjoint? A) Yes B) No Answer: A Explanation: Yes, brown and blue eyes are disjoint because P(brown and blue) = 0 (no one has both). Diff: 2 Type: MC Var: 1 L.O.: 11.1.4 31) Are hazel eyes and male disjoint? A) Yes B) No Answer: B Explanation: P(Hazel and male) = 10/121 = 0.083 ≠ 0, thus hazel and male are not disjoint. Diff: 2 Type: MC Var: 1 L.O.: 11.1.4 Use the following to answer the questions below: A bag of peanut butter M&M's contains 188 candies. Of the candies, 28 are blue, 40 are brown, 38 are green, 25 are orange, 34 are red, and 23 are yellow. They are thoroughly mixed up so that each is equally likely to be selected if we pick one. Round all of your answers to four decimal places. 32) If we select one M&M at random, what is the probability that it is brown? Answer: 0.2128 Diff: 1 Type: SA Var: 1 L.O.: 11.1.1 33) If we select one M&M at random, what is the probability that it is not red? Answer: 0.8191 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3 34) If we select one M&M at random, what is the probability that it is orange or yellow? Answer: 0.2553 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3;11.1.4 35) If we select one at random, then put it back, mix them up well (so the selections are independent) and select another one, what is the probability that both the first and second M&M's are blue? Answer: 0.0222 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3;11.1.4
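The M&M answers to questions 32 through 35 follow from equally likely outcomes, the complement rule, the additive rule for disjoint events, and the multiplicative rule for independent selections. A minimal sketch, with color counts taken from the problem statement and illustrative variable names:

```python
counts = {"blue": 28, "brown": 40, "green": 38,
          "orange": 25, "red": 34, "yellow": 23}
total = sum(counts.values())

# Equally likely outcomes: probability is count / total.
p_brown = counts["brown"] / total

# Complement rule.
p_not_red = 1 - counts["red"] / total

# Colors are disjoint, so the probabilities add.
p_orange_or_yellow = (counts["orange"] + counts["yellow"]) / total

# Drawing with replacement makes the two selections independent.
p_blue_then_blue = (counts["blue"] / total) ** 2

for p in (p_brown, p_not_red, p_orange_or_yellow, p_blue_then_blue):
    print(f"{p:.4f}")
```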
36) If we select one, keep it, and then select a second one, what is the probability that the first one is yellow and the second one is blue? Answer: 0.0183 Diff: 2 Type: SA Var: 1 L.O.: 11.1.1;11.1.2;11.1.3;11.1.4
11.2 Tree Diagrams and Bayes’ Rule
Use the following to answer the questions below: Use the provided tree diagram to find the requested probabilities. Round all answers to three decimal places.
1) P(B and X) Answer: 0.420 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 2) P(A and Y) Answer: 0.060 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 3) P(Y if A) Answer: 0.200 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 4) P(X if B) Answer: 0.600 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1
5) P(Y) Answer: 0.340 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 6) P(X) Answer: 0.660 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 7) P(A if X) Answer: 0.364 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 8) P(B if Y) Answer: 0.824 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
Use the following to answer the questions below: A partial tree diagram is provided. The missing probabilities are indicated by lower case letters. For each of the following, find the indicated probability. Round all answers to three decimal places.
9) a Answer: 0.361 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 10) b Answer: 0.390 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 11) c Answer: 0.157 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 12) d Answer: 0.198 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1
13) P(X) Answer: 0.389 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 14) P(Y) Answer: 0.611 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 15) P(I if X) Answer: 0.232 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 16) P(II if X) Answer: 0.259 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 17) P(I if Y) Answer: 0.443 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 18) P(III if Y) Answer: 0.299 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
Use the following to answer the questions below: There are three roofing companies that service a small community. Al's Roof Repair gets 45% of the roofing jobs in the community while Bob's Better Building and Carl's Roof Service get 25% and 30% of the business, respectively. Of Al's customers, 70% are satisfied. Of Bob's customers, 95% are satisfied. Among Carl's customers, 90% are satisfied. Round all answers to four decimal places. 19) What is the probability that a randomly selected customer used Al's Roof Repair and is dissatisfied? Answer: 0.1350 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1
20) What is the probability that a randomly selected customer used Bob's Better Building and is satisfied? Answer: 0.2375 Diff: 2 Type: SA Var: 1 L.O.: 11.2.1 21) What proportion of roofing customers are satisfied? Answer: 0.8225 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 22) What proportion of roofing customers are dissatisfied? Answer: 0.1775 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 23) If a randomly selected customer is satisfied, what is the probability that they used Al's Roof Repair? Answer: 0.3830 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 24) If a randomly selected customer is dissatisfied, what is the probability they used Bob's Better Building? Answer: 0.0704 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 25) If a randomly selected customer is dissatisfied, what is the probability that they used Al's Roof Repair? Answer: 0.7606 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 26) If a randomly selected customer is satisfied, what is the probability that they used Carl's Roof Service? Answer: 0.3283 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
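The roofing answers all follow the same three steps: multiply along a branch, add the branches that share an outcome, then divide to reverse the condition (Bayes' rule). A minimal Python sketch of those steps, using only the percentages stated in the problem, might look like this; the dictionary names are illustrative.

```python
# Market shares and satisfaction rates as stated in the problem.
share = {"Al": 0.45, "Bob": 0.25, "Carl": 0.30}
satisfied_rate = {"Al": 0.70, "Bob": 0.95, "Carl": 0.90}

# Step 1: joint probabilities, multiplying along each branch of the tree.
joint_sat = {c: share[c] * satisfied_rate[c] for c in share}
joint_dis = {c: share[c] * (1 - satisfied_rate[c]) for c in share}

# Step 2: total probability, adding the branches with the same outcome.
p_sat = sum(joint_sat.values())
p_dis = sum(joint_dis.values())

# Step 3: Bayes' rule, P(company if outcome) = joint / total.
post_sat = {c: joint_sat[c] / p_sat for c in share}
post_dis = {c: joint_dis[c] / p_dis for c in share}

print(round(p_sat, 4), round(p_dis, 4))
print({c: round(v, 4) for c, v in post_sat.items()})
print({c: round(v, 4) for c, v in post_dis.items()})
```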
Use the following to answer the questions below: A computer science student is writing a simplified version of the classic murder-mystery game Clue for his class project. In his implementation, there are three equally likely suspects: Miss Scarlet, Colonel Mustard, and Professor Plum. If Miss Scarlet is the murderer, there is a 40% chance she uses the knife, 35% chance she uses the lead pipe, and a 25% chance she uses the rope. If Colonel Mustard is the murderer, there is a 20% chance he uses the knife, a 30% chance he uses the lead pipe, and a 50% chance he uses the rope. If Professor Plum is the murderer, there is a 30% chance he uses the knife, a 40% chance he uses the lead pipe, and a 30% chance he uses the rope. Round all answers to three decimal places. 27) What is the probability that the knife is the murder weapon? Answer: 0.300 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 28) What is the probability that the lead pipe is the murder weapon? Answer: 0.350 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 29) What is the probability that the rope is the murder weapon? Answer: 0.350 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 30) Suppose while playing the game you discover that the lead pipe is the murder weapon. Given this information, what is the probability that Colonel Mustard is the murderer? Answer: 0.286 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 31) Suppose while playing the game you discover that the rope is the murder weapon. Given this information, what is the probability that the murderer is Miss Scarlet? Answer: 0.238 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 32) Suppose while playing the game you discover that the rope is the murder weapon. Given this information, what is the probability that Colonel Mustard is the murderer? Answer: 0.476 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
33) Suppose while playing the game you discover that the knife is the murder weapon. Given this information, what is the probability that Professor Plum is the murderer? Answer: 0.333 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 34) Suppose while playing the game you discover that the knife is the murder weapon. Given this information, what is the probability that Miss Scarlet is the murderer? Answer: 0.444 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 35) Suppose while playing the game you discover that the lead pipe is the murder weapon. Given this information, what is the probability that Professor Plum is the murderer? Answer: 0.381 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
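The Clue questions can be checked with two small helper functions, one for the total probability of a weapon and one for the reversed conditional probability. This is only a sketch under the problem's stated assumptions (three equally likely suspects and the listed weapon percentages); the function names p_weapon and p_suspect_if_weapon are hypothetical.

```python
# Each suspect is equally likely to be the murderer.
prior = {"Scarlet": 1 / 3, "Mustard": 1 / 3, "Plum": 1 / 3}

# P(weapon if suspect), as given in the problem statement.
weapon_given = {
    "Scarlet": {"knife": 0.40, "lead pipe": 0.35, "rope": 0.25},
    "Mustard": {"knife": 0.20, "lead pipe": 0.30, "rope": 0.50},
    "Plum":    {"knife": 0.30, "lead pipe": 0.40, "rope": 0.30},
}

def p_weapon(weapon):
    """Total probability of a weapon, summed over all suspects."""
    return sum(prior[s] * weapon_given[s][weapon] for s in prior)

def p_suspect_if_weapon(suspect, weapon):
    """Bayes' rule: P(suspect if weapon) = P(suspect and weapon) / P(weapon)."""
    return prior[suspect] * weapon_given[suspect][weapon] / p_weapon(weapon)

print(round(p_weapon("knife"), 3))
print(round(p_suspect_if_weapon("Mustard", "rope"), 3))
```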
Use the following to answer the questions below: In American Football, a coach on the sideline "calls" a play — either a running play or a passing play. Occasionally, for various reasons, the quarterback may decide to change the play (called an "audible"). Suppose that for a particular team, the coach calls running plays 40% of the time (and thus calls passing plays 60% of the time). When the coach calls a running play, a running play is executed on the field 88% of the time. When the coach calls a passing play, a running play is executed on the field 7% of the time. Round all answers to three decimal places. 36) Use the provided information to create a tree diagram. Answer: Let A = Coach calls a running play and B = Running play actually occurs
Diff: 2 Type: ES Var: 1 L.O.: 11.2.1
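The four branches of the tree for A (coach calls a running play) and B (a running play actually occurs) can be listed with a short Python sketch. It uses only the percentages in the problem; the layout of the printed lines is an illustrative choice, not the diagram from the answer key.

```python
# Probabilities stated in the problem.
p_call_run = 0.40           # P(A)
p_run_if_call_run = 0.88    # P(B if A)
p_run_if_call_pass = 0.07   # P(B if not A)

# Each branch probability is the product of the two stages along that branch.
branches = {
    ("A", "B"): p_call_run * p_run_if_call_run,
    ("A", "not B"): p_call_run * (1 - p_run_if_call_run),
    ("not A", "B"): (1 - p_call_run) * p_run_if_call_pass,
    ("not A", "not B"): (1 - p_call_run) * (1 - p_run_if_call_pass),
}

for (first, second), p in branches.items():
    print(f"{first:>5} -> {second:<5} : {p:.3f}")

# Adding the two branches that end in B gives the total probability of a run.
p_run = branches[("A", "B")] + branches[("not A", "B")]
```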
37) What is the probability that a running play occurs on a randomly selected play? Answer: 0.394 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 38) What is the probability that a passing play occurs on a randomly selected play? Answer: 0.606 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 39) Suppose we observe a run on a randomly selected play. What is the probability that the coach called a running play? Answer: 0.893 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
40) Suppose we observe a run on a randomly selected play. What is the probability that the coach actually called a passing play? Answer: 0.107 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 41) Suppose we observe a pass on a randomly selected play. What is the probability that the coach actually called a running play? Answer: 0.079 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 42) Suppose we observe a pass on a randomly selected play. What is the probability that the coach called a passing play? Answer: 0.921 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 Use the following to answer the questions below: About 80% of people have seen television ads for a certain product. Of the individuals who see the ad, only 1% buy the product. Of the individuals who do not see the ad, 0.5% buy the product. Round all answers to three decimal places. 43) Use the provided information to construct a tree diagram. Answer: Let A = Saw ad on television and B = Buy product.
Diff: 2 Type: ES Var: 1 L.O.: 11.2.1
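With A = saw the ad and B = bought the product, the tree can be represented as four joint probabilities and then inverted with Bayes' rule. The helper two_stage_tree below is a hypothetical name introduced for illustration; the three input values are the ones stated in the problem.

```python
def two_stage_tree(p_a, p_b_if_a, p_b_if_not_a):
    """Return the four joint probabilities of a two-stage tree diagram."""
    return {
        ("A", "B"): p_a * p_b_if_a,
        ("A", "not B"): p_a * (1 - p_b_if_a),
        ("not A", "B"): (1 - p_a) * p_b_if_not_a,
        ("not A", "not B"): (1 - p_a) * (1 - p_b_if_not_a),
    }

# 80% see the ad; 1% of those buy; 0.5% of the others buy.
tree = two_stage_tree(p_a=0.80, p_b_if_a=0.01, p_b_if_not_a=0.005)

# Total probability of buying: add the two branches that end in B.
p_buy = tree[("A", "B")] + tree[("not A", "B")]

# Bayes' rule: P(A if B) = P(A and B) / P(B).
p_saw_ad_if_buy = tree[("A", "B")] / p_buy
p_saw_ad_if_not_buy = tree[("A", "not B")] / (1 - p_buy)

print(round(p_buy, 3), round(p_saw_ad_if_buy, 3), round(p_saw_ad_if_not_buy, 3))
```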
44) What is the probability that a randomly selected individual has bought the product? Answer: 0.009 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 45) What is the probability that a randomly selected individual has not bought the product? Answer: 0.991 Diff: 2 Type: SA Var: 1 L.O.: 11.2.2 46) Suppose we randomly select an individual and discover that they have bought the product. What is the probability that they have seen the product's advertisement? Answer: 0.889 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 47) Suppose we randomly select an individual and discover that they have bought the product. What is the probability that they have not seen the product's advertisement? Answer: 0.111 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 48) Suppose we randomly select an individual and discover that they have not bought the product. What is the probability that they have seen the product's advertisement? Answer: 0.799 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3 49) Suppose we randomly select an individual and discover that they have not bought the product. What is the probability that they have not seen the product's advertisement? Answer: 0.201 Diff: 2 Type: SA Var: 1 L.O.: 11.2.3
11.3 Random Variables and Probability Functions
Use the probability function chart for the random variable X to answer the following questions below. Round all answers to two decimal places. 1) x p(x)
1 0.14
2 0.25
3 0.23
4 0.38
Find P(X = 3). Answer: 0.23 Diff: 1 Type: SA Var: 1 L.O.: 11.3.1
2) x p(x)
1 0.12
2 0.27
3 0.30
4 0.31
Find P(X = 1 or X = 2). Answer: 0.39 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 3) x p(x)
1 0.12
2 0.24
3 0.22
4 0.42
Find P(X > 2). Answer: 0.64 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 4) x p(x)
1 0.11
2 0.26
3 0.27
4 0.36
Find P(X < 2). Answer: 0.11 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 5) x p(x)
1 0.13
2 0.27
3 0.22
4 0.38
Find P(X ≥ 2). Answer: 0.87 Diff: 3 Type: SA Var: 1 L.O.: 11.3.1 6) x p(x)
1 0.11
2 0.28
3 0.23
4 0.38
Find P(X is odd). Answer: 0.34 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1
7) x p(x)
1 0.11
2 0.28
3 0.20
4 0.41
Find P(X is even). Answer: 0.69 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 8) x p(x)
1 0.11
2 0.26
3 0.23
4 0.40
Compute the mean of the random variable X. Answer: 2.92 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 9) x p(x)
1 0.15
2 0.28
3 0.29
4 0.28
Compute the variance of the random variable X. Answer: 1.07 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 10) x p(x)
1 0.13
2 0.28
3 0.25
4 0.34
Compute the standard deviation of the random variable X. Answer: 1.05 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3
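The mean, variance, and standard deviation questions all apply the same definitions: the mean is the sum of x times p(x), the variance is the sum of (x - mean)^2 times p(x), and the standard deviation is the square root of the variance. A minimal Python sketch, applied to the table from question 10, is shown below; the function names are illustrative.

```python
import math

def mean(pf):
    """Mean of a discrete random variable: sum of x * p(x)."""
    return sum(x * p for x, p in pf.items())

def variance(pf):
    """Variance: sum of (x - mean)^2 * p(x)."""
    mu = mean(pf)
    return sum((x - mu) ** 2 * p for x, p in pf.items())

def std_dev(pf):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(pf))

# Probability function from question 10, stored as {x: p(x)}.
pf = {1: 0.13, 2: 0.28, 3: 0.25, 4: 0.34}

print(round(mean(pf), 2), round(variance(pf), 2), round(std_dev(pf), 2))
```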
Use the following to answer the questions below: Identify whether or not each of the following is a valid probability function. If it is not, explain why not. 11) x p(x)
1 0.15
2 0.1
3 0.3
4 0.4
Answer: No, this is not a valid probability function because the probabilities do not sum to 1 (they sum to 0.95). Diff: 2 Type: ES Var: 1 L.O.: 11.3.0 12) x p(x)
1 -0.1
2 0.3
3 0.4
4 0.4
Answer: No, this is not a valid probability function because one of the "probabilities" is negative (they must all be between 0 and 1). Diff: 2 Type: ES Var: 1 L.O.: 11.3.0 13) x p(x)
1 0
2 0.3
3 0.3
4 0.4
Answer: Yes, this is a valid probability function. Diff: 2 Type: ES Var: 1 L.O.: 11.3.0 14) x p(x)
1 0.2
2 0.1
3 0.3
4 0.4
Answer: Yes, this is a valid probability function. Diff: 2 Type: ES Var: 1 L.O.: 11.3.0
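The two conditions used in questions 11 through 14 (every probability lies between 0 and 1, and the probabilities sum to 1) can be checked mechanically. The function check_probability_function below is a hypothetical helper written for this illustration; it compares the sum to 1 with a small tolerance because decimal values are not stored exactly in floating point.

```python
import math

def check_probability_function(probs):
    """Return (is_valid, reason) for a list of probabilities p(x)."""
    if any(p < 0 or p > 1 for p in probs):
        return False, "a probability is outside the range 0 to 1"
    if not math.isclose(sum(probs), 1.0):
        return False, f"the probabilities sum to {sum(probs):.2f}, not 1"
    return True, "valid"

# The four tables from questions 11 to 14, in order.
tables = {
    11: [0.15, 0.1, 0.3, 0.4],
    12: [-0.1, 0.3, 0.4, 0.4],
    13: [0, 0.3, 0.3, 0.4],
    14: [0.2, 0.1, 0.3, 0.4],
}

for number, probs in tables.items():
    print(number, check_probability_function(probs))
```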
Use the probability function table of a random variable X to answer the following questions below. Round all answers to two decimal places. 15) x p(x)
5 0.38
10 0.30
20 0.32
Compute the mean of the random variable X. Answer: 11.30 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 16) x p(x)
5 0.32
10 0.30
20 0.38
Compute the variance of the random variable X. Answer: 41.16 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 17) x p(x)
5 0.34
10 0.25
20 0.41
Compute the standard deviation of the random variable X. Answer: 6.61 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 Use the following to answer the questions below: In the game Pass the Pigs, players score (or lose) points by rolling a pair of rubber pigs. The number of points scored depends upon the configuration of the pigs when they land. A paper that appeared in the Journal of Statistics Education (Volume 14, Number 3, 2006) describes a dataset obtained by rolling the pigs many times. The probability function for the number of points scored on a roll (X) is displayed in the following table: x p(x)
0 0.2170
1 0.2173
5 0.3895
10 0.0847
15 0.0285
20 0.0622
25 0.0003
40 0.0003
60 0.0002
Round all answers to two decimal places. 18) Compute the mean number of points scored on a roll. Answer: 4.71 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2
19) Compute the variance of the number of points scored on a roll. Answer: 28.82 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 20) Compute the standard deviation of the number of points scored on a roll. Answer: 5.37 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 Use the following to answer the questions below: The New York Lottery has a daily game called "Take Five" where you win prizes based on how many of the 5 selected numbers match your ticket. The probability function for the number of correct picks on a ticket (X) is displayed in the provided table. x p(x)
0 0.483287
1 0.402739
2 0.103933
3 0.009744
4 0.000295
5 0.000002
Round all answers to three decimal places. 21) What is the probability of getting only 1 or 2 picks correct? Answer: 0.507 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 22) What is the probability of getting fewer than 3 picks correct? Answer: 0.990 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1 23) Players of the "Take Five" game receive a prize for getting two or more picks correct. What is the probability of getting a prize? Answer: 0.114 Diff: 3 Type: SA Var: 1 L.O.: 11.3.1 24) What is the mean number of correct picks on a "Take Five" ticket? Answer: 0.641 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 25) Compute the variance of the number of correct picks on a "Take Five" ticket. Answer: 0.500 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3
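Questions 21 through 23 each add p(x) over the values of x that belong to an event. One way to sketch this in Python, using the "Take Five" table above, is with a small helper that takes the event as a condition on x; the helper name prob is an assumption made for illustration.

```python
# Probability function for the number of correct picks, from the table.
picks = {0: 0.483287, 1: 0.402739, 2: 0.103933,
         3: 0.009744, 4: 0.000295, 5: 0.000002}

def prob(event):
    """Add p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in picks.items() if event(x))

p_one_or_two = prob(lambda x: x in (1, 2))    # question 21
p_fewer_than_three = prob(lambda x: x < 3)    # question 22
p_prize = prob(lambda x: x >= 2)              # question 23

print(round(p_one_or_two, 3), round(p_fewer_than_three, 3), round(p_prize, 3))
```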
26) Compute the standard deviation of the number of correct picks on a "Take Five" ticket. Answer: 0.707 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 Use the following to answer the questions below: The New York Lottery has a daily game called "Take Five" where you win prizes based on how many of the 5 selected numbers match your ticket. The probability function for the typical payout on a ticket (X) is displayed in the provided table. x p(x) Correct Picks:
$0 0.886026 0 or 1
$1 0.103933 2
$25.66 0.009744 3
$508.02 0.000295 4
$57,575.70 0.000002 5
Round all answers to two decimal places. 27) Compute the mean typical payout of a "Take Five" ticket. Answer: $0.62 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 28) Compute the variance of the typical payout of a "Take Five" ticket. Answer: $6,712.19 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 29) Compute the standard deviation of the typical payout of a "Take Five" ticket. Answer: $81.93 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 Use the following to answer the questions below: Let X represent the number of heads seen in two tosses of a fair coin. The probability function for this random variable is summarized in the provided table. x p(x)
0 0.25
1 0.50
2 0.25
Round all answers to two decimal places unless otherwise specified. 30) Find P(X = 0). Answer: 0.25 Diff: 1 Type: SA Var: 1 L.O.: 11.3.1
31) Find P(X > 0). Answer: 0.75 Diff: 2 Type: SA Var: 1 L.O.: 11.3.1
32) Compute the mean number of heads seen in two tosses of a fair coin. Answer: 1 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 33) Compute the variance of the number of heads seen in two tosses of a fair coin. Use one decimal place. Answer: 0.5 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 34) Compute the standard deviation of the number of heads seen in two tosses of a fair coin. Answer: 0.71 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3 Use the following to answer the questions below: A local organization is holding a raffle. There are four prizes: $50, $30, $20, and $10. They have sold 250 tickets. To select the winners, they draw four tickets at random. Let X represent the amount won with a single ticket. Round all answers to two decimal places. 35) Construct a table to display the probability function of this random variable. Answer: x p(x)
$50 0.004
$30 0.004
$20 0.004
$10 0.004
$0 0.984
Diff: 2 Type: ES Var: 1 L.O.: 11.3.0
36) Compute the mean winnings of a single ticket. Answer: $0.44 Diff: 2 Type: SA Var: 1 L.O.: 11.3.2 37) Compute the variance of the amount won with a single ticket. Answer: $15.41 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3
38) Compute the standard deviation of the amount won with a single ticket. Answer: $3.93 Diff: 2 Type: SA Var: 1 L.O.: 11.3.3
11.4 Binomial Probabilities
Use the following to answer the questions below: In the classic dice game Yahtzee, players roll five dice and score points by obtaining different combinations of values. Consider rolling five fair dice. Let X represent the number of sixes in a single roll of the five dice. Unless otherwise specified, round all answers to three decimal places. 1) Explain why X is a binomial random variable. Answer: X is a binomial random variable because 1) the number of trials is fixed in advance (there are 5 dice) 2) the probability of success (i.e., the probability of a 6) is the same each time 3) the trials are independent (the outcome on one die is independent of the others) Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 2) What is the probability of getting exactly 2 sixes? Answer: 0.161 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 3) The highest point outcome in the game, called a "Yahtzee," occurs when all five dice display the same value. What is the probability of getting exactly 5 sixes? Round your answer to five decimal places. Answer: 0.00013 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 4) What is the probability of getting 3 or more sixes? Answer: 0.035 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 5) What is the probability of getting fewer than 2 sixes? Answer: 0.804 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2
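The Yahtzee answers follow from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) with n = 5 and p = 1/6. The Python sketch below applies that formula with the standard library's math.comb; the function name binomial_pmf is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 1 / 6   # five dice, success = rolling a six

p_exactly_2 = binomial_pmf(2, n, p)
p_exactly_5 = binomial_pmf(5, n, p)

# "3 or more" and "fewer than 2" add the formula over several values of k.
p_3_or_more = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))
p_fewer_than_2 = sum(binomial_pmf(k, n, p) for k in range(0, 2))

print(round(p_exactly_2, 3), round(p_exactly_5, 5),
      round(p_3_or_more, 3), round(p_fewer_than_2, 3))
```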
6) Find the mean number of sixes in a roll of five fair dice. Answer: 0.833 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 7) Find the standard deviation of the number of sixes in a roll of five fair dice. Answer: 0.833 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 Use the following to answer the questions below: In 2008, the Detroit Lions set an NFL record by losing all 16 regular season games. Suppose that the Lions actually had a 10% chance of winning an individual game in the 2008 season and that each game was independent. Let X represent the number of wins out of 16 regular season games. Round all answers to three decimal places unless otherwise specified. 8) Explain why X is a binomial random variable. Answer: X is a binomial random variable because 1) the number of trials is fixed in advance (16 regular season games) 2) the probability of success does not change from trial to trial 3) we are assuming that the games are independent Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 9) What is the probability of not winning a single game? Answer: 0.185 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 10) What is the probability of winning 1 or 2 games? Answer: 0.604 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 11) What is the probability of winning fewer than 4 games? Answer: 0.932 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 12) Find the mean of the random variable X. Use one decimal place. Answer: 1.6 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3
13) Find the standard deviation of the random variable X. Use one decimal place. Answer: 1.2 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 Use the following to answer the questions below: A statistician used a computer to generate 4 random values between 0 and 9. Let X represent the number of these values that are 5 or larger. Round all probability calculations to three decimal places. 14) If a "success" is being 5 or larger, what is the probability of success in this situation? Answer: 0.5 Diff: 3 Type: SA Var: 1 L.O.: 11.4.0 15) What is the probability that all 4 values are 5 or larger? Answer: 0.063 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 16) What is the probability that 2 or more of the values are 5 or larger? Answer: 0.688 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 17) What is the probability that 2 or 3 of the values are 5 or larger? Answer: 0.625 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 18) Find the mean of the random variable X. Answer: 2 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 19) Find the standard deviation of the random variable X. Answer: 1 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3
Use the following to answer the questions below: According to a study described in the New York Times, blue eyes are becoming rarer among Americans with only about 17% of Americans having blue eyes. Consider taking a random sample of 50 Americans. Let X represent the number of individuals with blue eyes in the sample. Round all values to three decimal places unless otherwise specified. 20) Explain why X is a binomial random variable. Answer: X is a binomial random variable because 1) the number of trials is fixed in advance (there are 50 people) 2) the probability of success (blue eyes) does not change from trial to trial 3) with a random sample the trials (people) should be independent. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 21) What is the probability that exactly 5 people in the sample have blue eyes? Answer: 0.069 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 22) What is the probability that 5 or fewer people have blue eyes? Answer: 0.126 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 23) Find the mean of the random variable X. Use one decimal place. Answer: 8.5 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 24) Find the standard deviation of X. Answer: 2.656 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 Use the following to answer the questions below: Calculate the requested quantity. 25) 3! Answer: 6 Diff: 2 Type: SA Var: 1 L.O.: 11.4.0
26) 5! Answer: 120 Diff: 2 Type: SA Var: 1 L.O.: 11.4.0 27) Answer: 20 Diff: 2 Type: SA Var: 1 L.O.: 11.4.0 28) Answer: 210 Diff: 2 Type: SA Var: 1 L.O.: 11.4.0
Use the following to answer the questions below: When a certain pitcher throws his fastball, 75% of the time it is a strike. Suppose he throws 20 fastballs and that the pitches are independent of one another. Let X represent the number of strikes in 20 pitches. Round all answers to three decimal places. 29) Explain why X is a binomial random variable. Answer: X is a binomial random variable because 1) the number of trials is fixed in advance 2) the probability of success (strike) is the same for all trials 3) we are assuming the trials (pitches) are independent. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 30) What is the probability that all 20 pitches are strikes? Answer: 0.003 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 31) What is the probability that more than 15 pitches are strikes? Answer: 0.415 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 32) Find the mean of the random variable X. Answer: 15 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3
33) Find the standard deviation of the random variable X. Answer: 1.936 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 Use the following to answer the questions below: Determine whether the process describes a binomial random variable. If it is binomial, give values for n and p. If it is not binomial, state why not. 34) It is believed that 10% of people are left handed. Randomly sample 100 adults, and count the number that are left handed. Answer: Binomial, n = 100 and p = 0.10. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 35) In American Football, when the place kicker attempts a field goal, they are typically more likely to make "short" field goals (when they are kicking relatively close to the end zone) than "long" field goals (when they are relatively far away from the end zone). Suppose a certain place kicker attempts 35 field goals in a season (at different distances) and the number of field goals he made is recorded. Answer: Not Binomial because the probability of success (making the field goal) is not the same for all attempts (since he is kicking at different distances). Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 36) The proportion of babies that are boys is about 0.51. In a kindergarten class with 28 children, count the number of boys. We assume that the sex of each child is independent of the others. Answer: Binomial, with n = 28 and p = 0.51. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 37) Toss a fair coin until it lands heads three times. Count the number of tosses required. Answer: Not binomial because the number of trials is not fixed in advance. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 38) In a large bag of M&M's, select 10 candies and count the number of red candies. Answer: Not binomial. If a red candy is selected, there will be fewer reds left in the bag. Thus, the trials are not independent. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1
39) Randomly select one adult from each of the 50 U.S. states, and count the number that are obese. Answer: Not Binomial because the probability of selecting an obese individual varies from state to state (the percentage of residents that are obese is not the same for all 50 states). Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 Use the following to answer the questions below: Let's say that about 25.4% of a town's residents are classified as obese. Suppose we take a random sample of 200 residents. Let X represent the number of residents that are obese. Round all answers to three decimal places unless otherwise specified. 40) Explain why X is a binomial random variable. Answer: X is a binomial random variable because 1) the number of trials is fixed in advance 2) the probability of success (obese) is the same for all trials 3) since this is a random sample, the trials (individuals selected for the sample) should be independent. Diff: 2 Type: ES Var: 1 L.O.: 11.4.1 41) What is the probability that exactly 56 people are obese? Answer: 0.044 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 42) What is the probability that 49 or 50 people are obese? Answer: 0.127 Diff: 2 Type: SA Var: 1 L.O.: 11.4.2 43) Find the mean of the random variable X. Use one decimal place. Answer: 50.8 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3 44) Find the standard deviation of the random variable X. Answer: 6.156 Diff: 2 Type: SA Var: 1 L.O.: 11.4.3
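For a binomial random variable the mean and standard deviation have the shortcut formulas np and the square root of np(1 - p). As a rough illustration with n = 200 and p = 0.254, the Python sketch below compares those shortcuts with the values obtained by summing over the full probability function; the variable names are assumptions made for this example.

```python
from math import comb, sqrt

n, p = 200, 0.254   # sample size and proportion from the problem

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Shortcut formulas for a binomial random variable.
mean_shortcut = n * p
sd_shortcut = sqrt(n * p * (1 - p))

# The same quantities from the general definitions, summing over k = 0..n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean_direct = sum(k * pk for k, pk in enumerate(pmf))
var_direct = sum((k - mean_direct) ** 2 * pk for k, pk in enumerate(pmf))

print(round(mean_shortcut, 1), round(sd_shortcut, 3))
print(round(mean_direct, 1), round(sqrt(var_direct), 3))
```

Both lines of output should agree, which is the point of the shortcut: the sum over 201 values is not needed to find the mean or standard deviation.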
11.5 Density Curves and the Normal Distribution
1) There are no testbank questions for this section. Diff: 2 Type: SA Var: 1
© 2021 John Wiley & Sons, Inc. All rights reserved. Instructors who are authorized users of this course are permitted to download these materials and use them in connection with the course. Except as permitted herein or by law, no part of these materials should be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise.