Instructor Solution Manual for Statistics for Business and Economics, 14th edition



INSTRUCTOR’S SOLUTIONS MANUAL MARK DUMMELDINGER University of South Florida

STATISTICS FOR BUSINESS AND ECONOMICS

FOURTEENTH EDITION

James T. McClave University of Florida

P. George Benson College of Charleston

Terry Sincich University of South Florida


Please contact https://support.pearson.com/getsupport/s/contactsupport with any queries on this content. Microsoft and/or its respective suppliers make no representations about the suitability of the information contained in the documents and related graphics published as part of the services for any purpose. All such documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/or its respective suppliers hereby disclaim all warranties and conditions with regard to this information, including all warranties and conditions of merchantability, whether express, implied or statutory, fitness for a particular purpose, title and non-infringement. In no event shall Microsoft and/or its respective suppliers be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of information available from the services. The documents and related graphics contained herein could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may make improvements and/or changes in the product(s) and/or the program(s) described herein at any time. Partial screen shots may be viewed in full within the software version specified. Microsoft® and Windows® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation. The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. 
The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs. Reproduced by Pearson from electronic files supplied by the author. Copyright © 2022, 2018, 2014 by Pearson Education, Inc. or its affiliates, 221 River Street, Hoboken, NJ 07030. All Rights Reserved. Manufactured in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise. For information regarding permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights and Permissions department, please visit www.pearsoned.com/permissions/. PEARSON, ALWAYS LEARNING, and MYLAB are exclusive trademarks owned by Pearson Education, Inc. or its affiliates in the U.S. and/or other countries. Unless otherwise indicated herein, any third-party trademarks, logos, or icons that may appear in this work are the property of their respective owners, and any references to third-party trademarks, logos, icons, or other trade dress are for demonstrative or descriptive purposes only. Such references are not intended to imply any sponsorship, endorsement, authorization, or promotion of Pearson’s products by the owners of such marks, or any relationship between the owner and Pearson Education, Inc., or its affiliates, authors, licensees, or distributors.

ISBN-13: 978-0-13-685527-9 ISBN-10: 0-13-685527-X


Contents
1. Statistics, Data, and Statistical Thinking
2. Methods for Describing Sets of Data
3. Probability
4. Random Variables and Probability Distributions
5. Sampling Distributions
6. Inferences Based on a Single Sample: Estimation with Confidence Intervals
7. Inferences Based on a Single Sample: Tests of Hypotheses
8. Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses
9. Design of Experiments and Analysis of Variance
10. Categorical Data Analysis
11. Simple Linear Regression
12. Multiple Regression and Model Building
13. Methods for Quality Improvement: Statistical Process Control
14. Time Series: Descriptive Analyses, Models, and Forecasting
15. Nonparametric Statistics


Chapter 1 Statistics, Data, and Statistical Thinking

1.1

Statistics is a science that deals with the collection, classification, analysis, and interpretation of information or data. It is a meaningful, useful science with a broad, almost limitless scope of applications to business, government, and the physical and social sciences.

1.2

Descriptive statistics utilizes numerical and graphical methods to look for patterns, to summarize, and to present the information in a set of data. Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

1.3

The four elements of a descriptive statistics problem are:

1. The population or sample of interest. This is the collection of all the units upon which the variable is measured.
2. One or more variables that are to be investigated. These are the types of data that are to be collected.
3. Tables, graphs, or numerical summary tools. These are tools used to display the characteristic of the sample or population.
4. Identification of patterns in the data. These are conclusions drawn from what the summary tools revealed about the population or sample.

1.4

The five elements of an inferential statistical analysis are:

1. The population of interest. The population is a set of existing units.
2. One or more variables that are to be investigated. A variable is a characteristic or property of an individual population unit.
3. The sample of population units. A sample is a subset of the units of a population.
4. The inference about the population based on information contained in the sample. A statistical inference is an estimate, prediction, or generalization about a population based on information contained in a sample.
5. A measure of reliability for the inference. The reliability of an inference is how confident one is that the inference is correct.

1.5

The first major method of collecting data is from a published source. These data have already been collected by someone else and are available in a published source. The second method of collecting data is from a designed experiment. These data are collected by a researcher who exerts strict control over the experimental units in a study. These data are measured directly from the experimental units. The final method of collecting data is observational. These data are collected directly from experimental units by simply observing the experimental units in their natural environment and recording the values of the desired characteristics. The most common type of observational study is a survey.

1.6

Quantitative data are measurements that are recorded on a meaningful numerical scale. Qualitative data are measurements that are not numerical in nature; they can only be classified into one of a group of categories.

1.7

A population is a set of existing units such as people, objects, transactions, or events. A variable is a characteristic or property of an individual population unit such as height of a person, time of a reflex, amount of a transaction, etc.



1.8

A population is a set of existing units such as people, objects, transactions, or events. A sample is a subset of the units of a population.

1.9

A representative sample is a sample that exhibits characteristics similar to those possessed by the target population. A representative sample is essential if inferential statistics is to be applied. If a sample does not possess the same characteristics as the target population, then any inferences made using the sample will be unreliable.

1.10

An inference without a measure of reliability is nothing more than a guess. A measure of reliability separates statistical inference from fortune telling or guessing. Reliability gives a measure of how confident one is that the inference is correct.

1.11

A population is a set of existing units such as people, objects, transactions, or events. A process is a series of actions or operations that transform inputs to outputs. A process produces or generates output over time. Examples of processes are assembly lines, oil refineries, and stock prices.

1.12

Statistical thinking involves applying rational thought processes to critically assess data and inferences made from the data. It involves not taking all data and inferences presented at face value, but rather making sure the inferences and data are valid.

1.13

The data consisting of the classifications A, B, C, and D are qualitative. These data are nominal and thus are qualitative. After the data are input as 1, 2, 3, and 4, they are still nominal and thus qualitative. The only differences between the two data sets are the names of the categories. The numbers associated with the four groups are meaningless.

1.14

Answers will vary. First, number the elements of the population from 1 to 200,000. Using MINITAB, generate 10 numbers on the interval from 1 to 200,000, eliminating any duplicates. The 10 numbers selected for the random sample are: 135075 89127 189226 83899 112367 191496 110021 44853 42091 198461 Elements with the above numbers are selected for the sample.
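The selection procedure described above can also be sketched in Python rather than MINITAB. This is only an illustration under stated assumptions: the helper name `draw_sample` and the seed value are hypothetical, and the numbers drawn will differ from the ten listed in the answer.

```python
import random

def draw_sample(population_size, sample_size, seed=None):
    """Hypothetical helper: pick distinct element numbers from 1..population_size."""
    rng = random.Random(seed)
    # sample() draws without replacement, so duplicates never need to be eliminated.
    return rng.sample(range(1, population_size + 1), sample_size)

# Elements are numbered 1 to 200,000 and 10 are selected, as in the answer.
selected = draw_sample(200_000, 10, seed=1)  # the seed is an arbitrary choice
print(sorted(selected))
```

Because the draw is made without replacement, every set of 10 distinct elements has the same chance of being chosen, which is what a simple random sample requires.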

1.15

a.

Electrical generation capacity can take on values such as 400, 10,000, etc. Therefore, it is quantitative.

b.

Hub height can take on values such as 100, 200, etc. Therefore, it is quantitative.

c.

Rotor diameter can take on values such as 5, 10, etc. Therefore, it is quantitative.

d.

Location can take on values "Florida," "Georgia," etc., which are not numeric. Therefore, it is qualitative.

e.

Number of turbines in the project can take on values such as 5, 10, etc. Therefore, it is quantitative.


1.16

a.

The experimental unit is a single tenant of retail properties.

b.

The capitalization rate can take on values such as 6.75, 7.40, etc. Therefore, it is quantitative. Years remaining on lease can take on values such as 5, 10, etc. Therefore, it is quantitative. Credit rating can take on values such as “BBB,” “B,” etc. Therefore, it is qualitative.

c.

The 13 tenants in the study represent a sample of the tenants of the Boulder Group consulting firm.

d.

The population of interest is the entire set of tenants of the Boulder Group consulting firm.

e.

We can see from the table of sampled tenants that the capitalization rates tend to increase as the remaining lease term decreases. The inferential techniques we look at later in the semester will allow us to determine if this trend can be inferred to the general population of all tenants as well.

1.17

a.

The experimental unit is a firm that had annual balance sheet data from 1973 to 2011 reported on Compustat.

b.

The variable measured in this study is the DQ value.

c.

This is an observational study. No variables were manipulated in this study.

1.18

a.

The data would represent the population. These data are all of the data that are of interest to the researchers.

b.

If the 80 jamming attacks are actually a sample, then the population would be all jamming attacks by the U.S. military over the past several years.

c.

The variable “network type” is qualitative.

d.

The variable “number of channels” is qualitative because there are only two categories of possible responses.

e.

If we wanted to measure the number of channels as a quantitative variable, then we would have to record the exact number of channels for each jamming attack.

1.19

a.

The population of interest is all U.S. residents with a listed phone number.

b.

The variable of interest is the view of each U.S. resident as to whether the president is doing a good or bad job. It is qualitative.

c.

The sample is the 2000 individuals selected for the poll.

d.

The inference of interest is to estimate the proportion of all U.S. residents who believe the president is doing a good job.

e.

The method of data collection is a survey.

f.

It is not very likely that the sample will be representative of the population of all residents of the United States. By selecting phone numbers at random, the sample will be limited to only those people who have telephones. Also, many people share the same phone number, so each person would not have an equal chance of being contacted. Another possible problem is the time of day the calls are made. If the calls are made in the evening, those people who work in the evening would not be represented.



1.20

a.

The population of interest is all professionals who hold ISACA’s Certified Information Security Manager designation.

b.

The data-collection method was a survey: 766 responded to the question on whether or not they expect to experience a cyberattack against the firm in the coming year. There are possible biases because those who decided to respond to the questionnaire were self-selected. Generally, those who respond to a questionnaire have very strong opinions, either positive or negative, which may not be representative of all those in the population.

c.

The variable measured is whether or not the respondents expect to experience a cyberattack against the firm in the coming year. It is a qualitative variable.

d.

Assuming that the sample was representative of the population, we could infer that 80% of all professionals who hold ISACA’s Certified Information Security Manager designation expect to experience a cyberattack against the firm in the coming year.
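The answer above gives a point estimate only. As a rough illustration of the measure of reliability that Exercise 1.10 calls for, the sketch computes a large-sample margin of error for the 80% figure, using the 766 respondents mentioned in part b. The 95% confidence level, the critical value 1.96, and the normal approximation are assumptions brought in for illustration (the technique belongs to the later chapter on confidence intervals), and the calculation inherits the answer's assumption that the sample is representative.

```python
import math

p_hat = 0.80  # sample proportion stated in the answer
n = 766       # number of respondents to the question (part b)
z = 1.96      # assumed critical value for roughly 95% confidence

# Large-sample standard error of a sample proportion.
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

print(f"estimate: {p_hat:.2f}, margin of error: {margin:.3f}")
print(f"approximate interval: ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```

A margin computed this way says nothing about self-selection bias; it only quantifies sampling variability for a random sample of this size.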

1.21

Since the data collected consist of the entire population, this would represent a descriptive study. Flaherty used the data to help describe the condition of the U.S. Treasury in 1861.

1.22

This is an example of inferential statistics. The researchers are using a set of collected data (the sample) to make a statement about other retail stores in the population.

1.23

a.

The experimental unit of interest is an online order.

b.

The variable “state delivered to” can take on values such as “Washington,” “Montana,” etc. Therefore, it is qualitative. The variable “distribution center” can take on values such as “eastern” or “western.” Therefore, it is qualitative. The variable “delivery time” can take on values such as 5 days, 7 days, etc. Therefore, it is quantitative.

c.

The goal of the analysis is to use the collected, or sampled, data to determine which distribution center reduces delivery times for other future deliveries (in a larger population). Because the results are intended to be applied to a larger group of deliveries than the ones collected, this would be considered an inferential study. The sampled data seem to indicate that the farther west the delivery, the more effective the western U.S. distribution center is at reducing delivery times. The inferential techniques we learn later in the semester will allow us to see if this can be said for all future deliveries as well.

1.24

a.

The experimental unit of interest is a college student.

b.

Both variables, street crossing performance score and memory task score, are quantitative.

c.

This is an application of inferential statistics. Data were collected from a sample of students and conclusions were made about the entire population based on the information found in the sample.

1.25

a.

This is a designed experiment because the college students were randomly assigned to a group.

b.

The experimental unit is a college student.

c.

The two variables are type of condition and type of disposal. Both are qualitative.


d.

We could infer that in the population of all college students, those who could be placed in the “usefulness is salient” condition will recycle at a much higher rate (68%) than those who could be placed in the control condition (37%).

1.26

a.

The experimental unit for this study is an NFL quarterback.

b.

The variables measured in this study include draft position, NFL winning ratio, and QB production score. Since the draft position was put into 3 categories, it is a qualitative variable. The NFL winning ratio and the QB production score are both quantitative.

c.

Since we want to project the performance of future NFL QBs, this would be an application of inferential statistics.

1.27

a.

The population of interest is all individuals who took the GMAT in the time period.

b.

The method of data collection was a survey.

c.

This is probably not a representative sample. The sample was self-selected. Not all of those who were selected for the study responded to all four surveys. Those who did respond to all four surveys probably have very strong opinions, either positive or negative, which may not be representative of all of those in the population.

1.28

a.

The population of interest is all CPA firms.

b.

A survey was used to collect the data.

c.

This sample was probably not representative. Not all of those selected to be in the sample responded. In fact, only 992 of the 23,500 people who were sent the survey responded. Generally, those who do respond to surveys have very strong opinions, either positive or negative. These may not be the opinions of all CPA firms.

d.

Since the sample may not be representative, the inferences drawn in the study may not be valid.

1.29

a.

Length of maximum span can take on values such as 15 feet, 50 feet, 75 feet, etc. Therefore, it is quantitative.

b.

The number of vehicle lanes can take on values such as 2, 4, etc. Therefore, it is quantitative.

c.

The answer to this item is "yes" or "no," which is not numeric. Therefore, it is qualitative.

d.

Average daily traffic could take on values such as 150 vehicles, 3,579 vehicles, 53,295 vehicles, etc. Therefore, it is quantitative.

e.

Condition can take on values "good," "fair," or "poor," which are not numeric. Therefore, it is qualitative.

f.

The length of the bypass or detour could take on values such as 1 mile, 4 miles, etc. Therefore, it is quantitative.

g.

Route type can take on values "interstate," "U.S.," "state," "county," or "city," which are not numeric. Therefore, it is qualitative.

1.30

a.

The variable of interest to the researchers is the rating of highway bridges.


b.

Since the rating of a bridge can be categorized as one of three possible values, it is qualitative.

c.

The data set analyzed is a population since all highway bridges in the U.S. were categorized.

d.

The data were collected observationally. Each bridge was observed in its natural setting.

1.31

a.

The process being studied is the distribution of pipes, valves, and fittings to the refining, chemical, and petrochemical industries by the Wallace Company of Houston.

b.

The variables of interest are the speed of the deliveries, the accuracy of the invoices, and the quality of the packaging of the products.

c.

The sampling plan was to monitor a subset of current customers by sending out a questionnaire twice a year and asking the customers to rate the speed of the deliveries, the accuracy of the invoices, and the quality of the packaging. The sample is the total number of questionnaires received.

d.

The Wallace Company's immediate interest is learning about the delivery process of its distribution of pipes, valves, and fittings. To do this, it is measuring the speed of deliveries, the accuracy of the invoices, and the quality of its packaging from the sample of its customers to make an inference about the delivery process to all customers. In particular, it might use the mean speed of its deliveries to the sampled customers to estimate the mean speed of its deliveries to all its customers. It might use the mean accuracy of its invoices from the sampled customers to estimate the mean accuracy of its invoices of all its customers. It might use the mean rating of the quality of its packaging from the sampled customers to estimate the mean rating of the quality of its packaging of all its customers.

e.

Several factors might affect the reliability of the inferences. One factor is the set of customers selected to receive the survey. If this set is not representative of all the customers, the wrong inferences could be made. Also, the set of customers returning the surveys may not be representative of all its customers. Again, this could influence the reliability of the inferences made.

1.32

a.

The population of interest would be the set of all students. The sample of interest would be the students participating in the experiment. The variable measured in this study is whether the student would spend money on repairing a very old car or not.

b.

The data-collection method used was a designed experiment. The students participating in the experiment were randomly assigned to one of three emotional states and then asked a question.

c.

The researcher could estimate the proportion of all students in each of the three emotional states who would spend money to repair a very old car.

d.

One factor that might affect the reliability of the inference drawn is whether the students in the experiment were representative of all students. It is stated that the sample was made up of volunteer students. Chances are that these volunteer students were not representative of all students. In addition, if these students were all from the same school, they probably would not be representative of the population of students either.

1.33

a.

The population of interest would be all accounting alumni of a large southwestern university.

b.

Age would produce quantitative data; the responses would be numbers. Gender would produce qualitative data; the responses would be ‘male’ or ‘female’. Level of education would produce qualitative data; the responses could be categories such as college degree, master’s degree, or PhD.


Income would produce quantitative data; the responses would be numbers. Job satisfaction score would produce quantitative data. We would assume that a satisfaction score would be a number, where the higher the number, the higher the job satisfaction. Machiavellian rating score would produce quantitative data. We would assume that a rating score would be a number, where the higher the score, the higher the Machiavellian traits.

c.

The sample is the 198 people who returned the usable questionnaires.

d.

The data collection method used was a survey.

e.

The inference made by the researcher is that Machiavellian behavior is not required to achieve success in the accounting profession.

f.

Generally, those who respond to surveys are those with strong feelings (in either direction) toward the subject matter. Those who do not have strong feelings for the subject matter tend not to answer surveys. Those who did not respond might be those who are not very happy with their jobs or those who are not very unhappy with their jobs. Thus, we might have no idea what type of scores these people would have on the Machiavellian rating score.

1.34

a.

The experimental units for this study are engaged couples who used a particular website.

b.

There are two variables of interest: the price of the engagement ring and the level of appreciation. Price of the engagement ring is a quantitative variable because it is measured on a numerical scale. Level of appreciation is a qualitative variable. There are 7 different categories for this variable that are then assigned numbers.

c.

The population of interest would be all engaged couples.

d.

No, the sample is probably not representative. Only engaged couples who used a particular website were eligible to be in the sample. Then, only those with “average” American names were invited to be in the sample.

e.

Answers will vary. First, we will number the individuals from 1 to 50. Using MINITAB, 25 random numbers were generated on the interval from 1 to 50. The random numbers are: 1, 4, 5, 8, 12, 13, 17, 18, 19, 20, 22, 26, 27, 30, 31, 33, 34, 35, 38, 39, 40, 42, 43, 46, 49 The individuals who were assigned the numbers corresponding to the above numbers would be assigned to one role and the remaining individuals would be assigned to the other role.
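The random assignment described above can be sketched in Python as an alternative to MINITAB. This is a minimal sketch under assumptions: the function name `assign_roles` and the seed are hypothetical, and the numbers selected will not match the 25 listed in the answer.

```python
import random

def assign_roles(n_individuals=50, group_size=25, seed=None):
    """Hypothetical helper: split individuals 1..n_individuals into two roles at random."""
    rng = random.Random(seed)
    ids = list(range(1, n_individuals + 1))
    # Draw 25 distinct numbers; these individuals get the first role.
    first_role = sorted(rng.sample(ids, group_size))
    chosen = set(first_role)
    # Everyone not drawn gets the other role.
    second_role = [i for i in ids if i not in chosen]
    return first_role, second_role

first_role, second_role = assign_roles(seed=7)  # the seed is an arbitrary choice
print("Role 1:", first_role)
print("Role 2:", second_role)
```

Drawing the first group without replacement and giving the remainder the other role guarantees that each individual receives exactly one role.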

1.35

a.

Some possible questions are:

1. In your opinion, why has the banking industry consolidated in the past few years? Check all that apply.
   a. Too many small banks with not enough capital.
   b. A result of the Savings and Loan scandals.
   c. To eliminate duplicated resources in the upper management positions.
   d. To provide more efficient service to the customers.
   e. To provide a more complete list of financial opportunities for the customers.
   f. Other. Please list.

2. Using a scale from 1 to 5, where 1 means strongly disagree and 5 means strongly agree, indicate your agreement to the following statement: "The trend of consolidation in the banking industry will continue in the next five years."
   1 strongly disagree   2 disagree   3 no opinion   4 agree   5 strongly agree

b.

The population of interest is the set of all bank presidents in the United States.

c.

It would be extremely difficult and costly to obtain information from all bank presidents. Thus, it would be more efficient to sample just 200 bank presidents. However, by sending the questionnaires to only 200 bank presidents, one risks getting the results from a sample which is not representative of the population. The sample must be chosen in such a way that the results will be representative of the entire population of bank presidents in order to be of any use.

1.36

Answers will vary. Using MINITAB, the 5 seven-digit phone numbers generated with area code 373 were: 373-639-0598 373-411-9164 373-502-7699 373-782-2719 373-930-3231
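A comparable draw can be sketched in Python. This is an illustration only, not the manual's procedure: the function name `random_phone_numbers` and the seed are hypothetical, and the sketch assumes every seven-digit string from 000-0000 to 999-9999 is equally likely, without checking whether a number is actually in service.

```python
import random

def random_phone_numbers(count, area_code="373", seed=None):
    """Hypothetical helper: generate distinct random seven-digit numbers in one area code."""
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < count:
        # Pick an integer from 0 to 9,999,999 and pad it to seven digits.
        seven = f"{rng.randrange(10_000_000):07d}"
        numbers.add(f"{area_code}-{seven[:3]}-{seven[3:]}")
    return sorted(numbers)

phone_numbers = random_phone_numbers(5, seed=3)  # the seed is an arbitrary choice
for number in phone_numbers:
    print(number)
```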

1.37

a.

The population of interest is the set of all people in the United States at least 15 years of age.

b.

The variable being measured is the employment status of each person. This variable is qualitative. Each person is either employed or not.

c.

The problem of interest to the Census Bureau is inferential. Based on the information contained in the sample, the Census Bureau wants to estimate the percentage of all people in the labor force who are unemployed.

1.38

a.

The process being studied is the process of filling beverage cans with soft drink at CCSB's Wakefield plant.

b.

The variable of interest is the amount of carbon dioxide added to each can of beverage.

c.

The sampling plan was to monitor five filled cans every 15 minutes. The sample is the total number of cans selected.

d.

The company's immediate interest is learning about the process of filling beverage cans with soft drink at CCSB's Wakefield plant. To do this, they are measuring the amount of carbon dioxide added to a can of beverage to make an inference about the process of filling beverage cans. In particular, they might use the mean amount of carbon dioxide added to the sampled cans of beverage to estimate the mean amount of carbon dioxide added to all the cans on the process line.

e.

The technician would then be dealing with a population. The cans of beverage have already been processed. He/she is now interested in the outputs.

1.39

Suppose we want to select 900 intersections by numbering the intersections from 1 to 500,000. We would then use a random number table or a random number generator from a software program to select 900 distinct intersection points. These would then be the sampled markets.

Now, suppose we want to select the 900 intersections by selecting a row from the 500 and a column from the 1,000. We would first number the rows from 1 to 500 and number the columns from 1 to 1,000. Using a random number generator, we would generate a sample of 900 from the 500 rows. Obviously, many rows will be selected more than once. At the same time, we use a random number generator to select 900 columns from the 1,000 columns. Again, some of the columns could be selected more than once. Placing these two sets of random numbers side-by-side, we would use the row-column combinations to select the intersections. For example, suppose the first row selected was 453 and the first column selected was 731. The first intersection selected would be row 453, column 731. This process would be continued until 900 unique intersections were selected.

1.40

Answers will vary.

a.

The results as stated indicate that by eating oat bran, one can improve his/her health. However, the only way to get the stated benefit is to eat only oat bran with limited results. People may change their eating habits expecting an outcome that is almost impossible.

b.

In order to investigate the impact of domestic violence on birth defects, one would need to collect data on all kinds of birth defects and whether the mother suffered any domestic violence or not during her pregnancy. One could use an observational study survey to collect the data.

c.

Very few people are always happy with the way they are. However, many people are happy with themselves most of the time. One might want to ask a series of questions to measure self-esteem rather than just one. One question might ask what percent of the time the high school girl is happy with the way she is.

d.

The results of the study are probably misleading because relying on a limited number of foods to feed one's children does not imply that the children are hungry. In addition, one might cut the size of a meal because the children were overweight, not because there was not enough food. One might get better information about the proportion of hungry American children by actually recording what a large, representative sample of children eat in a week.

e.

A leading question gives information that seems to be true, but may not be complete. Based on the incomplete information, the respondent may come to a different decision than if the information was not provided.



Chapter 2 Methods for Describing Sets of Data 2.1

First, we find the frequency of the grade A. The sum of the frequencies for all five grades must be 200. Therefore, subtract the sum of the frequencies of the other four grades from 200. The frequency for grade A is: 200 − (36 + 90 + 30 + 28) = 200 − 184 = 16. To find the relative frequency for each grade, divide the frequency by the total sample size, 200. The relative frequency for the grade B is 36/200 = .18. The rest of the relative frequencies are found in a similar manner and appear in the table:

Grade on Statistics Exam    Frequency    Relative Frequency
A: 90 − 100                 16           .08
B: 80 − 89                  36           .18
C: 65 − 79                  90           .45
D: 50 − 64                  30           .15
F: Below 50                 28           .14
Total                       200          1.00

2.2

a.

To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:

Class    Frequency
X        8
Y        9
Z        3
Total    20

b.

The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for the class X is 8/20 = .40. The relative frequency for the class Y is 9/20 = .45. The relative frequency for the class Z is 3/20 = .15.

Class    Frequency    Relative Frequency
X        8            .40
Y        9            .45
Z        3            .15
Total    20           1.00
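As an illustrative sketch (not part of the original solution), the relative frequencies for Exercise 2.1 can be reproduced in Python. The names `known`, `frequencies`, and `relative_frequencies` are assumptions chosen for this example.

```python
# Frequencies given in Exercise 2.1; grade A is recovered from the total of 200.
known = {"B": 36, "C": 90, "D": 30, "F": 28}
total = 200
frequencies = {"A": total - sum(known.values()), **known}


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    n = sum(freqs.values())
    return {label: count / n for label, count in freqs.items()}


rel = relative_frequencies(frequencies)
for grade, share in rel.items():
    print(f"{grade}: frequency {frequencies[grade]:3d}, relative frequency {share:.2f}")
```

The same helper applies to the class counts in Exercise 2.2, for example `relative_frequencies({"X": 8, "Y": 9, "Z": 3})`.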



c.

The frequency bar chart is:

[Bar chart: Frequency (vertical axis, 0 to 9) by Class (X, Y, Z)]

d.

The pie chart for the frequency distribution is:

[Pie chart of Class: X 40.0%, Y 45.0%, Z 15.0%]

2.3

a.

The bar graph for the male student data is shown here:

b.

The bar graph for the female student data is shown here:

c.

Male and female responses were very similar for the foreign language and humanities areas of study.

d.

The greatest differences between male and female responses are found in the mathematics and English areas of study.

2.4

a.

The pie chart indicates that 30.0% of the adults in the sample currently have only a cable/satellite TV subscription at home. This can be verified by the survey results. 240/800 = .30, or 30%.

b.

After removing the adults who have neither a cable/satellite TV subscription nor a streaming service, the pie chart is shown here:

41.9% of the adults reported having both a cable/satellite TV subscription and a streaming service, 34.9% reported having only a cable/satellite TV subscription, and 23.3% reported having only a streaming service.

2.5

a.

The type of graph is a bar graph.

b.

The variable measured for each of the robots is type of robotic limbs.

c.

From the graph, the design used the most is the “legs only” design.

d.

The relative frequencies are computed by dividing the frequencies by the total sample size. The total sample size is n = 106. The relative frequencies for each of the categories are:

Type of Limbs    Frequency    Relative Frequency
None             15           15/106 = .142
Both             8            8/106 = .075
Legs ONLY        63           63/106 = .594
Wheels ONLY      20           20/106 = .189
Total            106          1.000

e.

Using MINITAB, the Pareto diagram is:

[Pareto diagram: Relative Frequency (vertical axis, 0 to .60) by Type, in the order Legs, Wheels, None, Both. Percent within all data.]
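A Pareto diagram is a bar chart with the categories arranged from most to least frequent. The ordering step can be sketched in Python as follows; this is a minimal illustration using the frequencies from part d, and the variable names are assumptions rather than MINITAB output.

```python
# Frequencies of robotic limb types from Exercise 2.5, part d.
limbs = {"None": 15, "Both": 8, "Legs ONLY": 63, "Wheels ONLY": 20}
n = sum(limbs.values())

# Sort categories by frequency, largest first (the Pareto ordering).
pareto = sorted(limbs.items(), key=lambda item: item[1], reverse=True)

rows = []
cumulative = 0.0
for label, count in pareto:
    share = count / n
    cumulative += share
    rows.append((label, round(share, 3), round(cumulative, 3)))
    print(f"{label:12s} {share:.3f}  cumulative {cumulative:.3f}")
```

The cumulative column is not required for the exercise; it is included only because Pareto diagrams are often read that way.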

2.6

a.

Credit card company would take on values like “Visa,” “Mastercard,” etc. It is a qualitative variable.

b.

There were a total of 295.7 billion transactions in 2018. The percentage for each credit card is found by taking the number of worldwide transactions and dividing by 295.7 billion. The results are shown in the table below:

Credit Card         Number (billions)    Percentage
Visa                147.9                0.5002
Mastercard          75.8                 0.2563
American Express    7.5                  0.0254
UnionPay            58.6                 0.1982
JCB                 3.4                  0.0115
Discover            2.5                  0.0085
Total               295.7


c.

The bar graph is shown below:

d.

Visa, by far, is the most used credit card in the world. Mastercard and UnionPay are the next most used credit cards, with all others well behind.

2.7

a.

Using MINITAB, the pie chart is:

[Pie chart of Product: Windows 64.0%, Office 24.0%, Explorer 12.0%]

b.

Explorer had the lowest proportion of security issues, with the proportion .12.

Using MINITAB, the Pareto chart is:

[Pareto chart: Percent (vertical axis, 0 to 50) by Bulletins, with categories Remote code execution, Privilege elevation, Information disclosure, Denial of service, Spoofing. Percent is calculated within all data.]

The security bulletin with the highest frequency is Remote code execution. Microsoft should focus on this repercussion.

2.8

a.

Using MINITAB, the Pareto chart is:

[Pareto chart: Percent (vertical axis, 0 to 40) by Network/Channel, with categories WLAN/Single, WLAN/Multi, WSN/Single, WSN/Multi, AHN/Single, AHN/Multi. Percent is calculated within all data.]

The network type and number of channels that suffered the largest number of jamming attacks is WLAN/Single. The types that received the next largest numbers of jamming attacks are WSN/Single and WLAN/Multi. The type that suffered the fewest jamming attacks is AHN/Multi.

b.

Using MINITAB, the pie chart is:

[Pie chart of Network: WLAN 56.3%, WSN 27.5%, AHN 16.3%]

The network type that suffered the most jamming attacks is WLAN, with more than half. The network type that suffered the fewest jamming attacks is AHN.

2.9

Using MINITAB, the pie chart is:

[Pie chart of Degree: First 52.7%, None 36.9%, Post 10.4%]

A little over half of the successful candidates had a First (Bachelor’s) degree, while a little more than a third of the successful candidates had no degree. Only about 10% of the successful candidates had graduate degrees.

2.10

Using MINITAB, the bar graphs of the 2 waves are:

[Bar chart of Job Status: Percent (vertical axis, 0 to 90) by Job Status (WorkMBA, WorkNoMBA, NoWorkBusSch, NoWorkGradSch), paneled by Wave (1, 2). Percent within all data.]

In wave 1, most of those taking the GMAT were working (2657/3244 = .819) and none had MBAs. About 20% were not working but were in either a 4-year institution or other graduate school ((36 + 551)/3244 = .181). In wave 2, almost all were now working ((1787 + 1372)/3244 = .974). Of those working, more than half had MBAs (1787/(1787 + 1372) = .566). Of those not working, most were in another graduate school.
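The proportions quoted for the two waves can be checked with a short Python sketch. The counts are taken from the solution text; the variable names are assumptions, and the two wave 1 not-working counts (36 and 551) are combined because the text does not say which school group each belongs to.

```python
total = 3244  # GMAT takers in each wave

# Wave 1: working versus not working (two in-school groups combined).
wave1_working = 2657
wave1_not_working = 36 + 551

# Wave 2: working with and without an MBA.
wave2_work_mba = 1787
wave2_work_no_mba = 1372
wave2_working = wave2_work_mba + wave2_work_no_mba

p_wave1_working = wave1_working / total
p_wave1_not_working = wave1_not_working / total
p_wave2_working = wave2_working / total
# Conditional proportion: MBA holders among those working in wave 2.
p_mba_given_working = wave2_work_mba / wave2_working

for name, p in [
    ("wave 1 working", p_wave1_working),
    ("wave 1 not working", p_wave1_not_working),
    ("wave 2 working", p_wave2_working),
    ("wave 2 MBA among working", p_mba_given_working),
]:
    print(f"{name}: {p:.3f}")
```

Note that the last proportion uses the number working in wave 2 as its denominator, not the full sample of 3244.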

2.11

Using MINITAB, the Pareto diagram for the data is:

[Pareto diagram of Tenants: Percent (vertical axis, 0 to 50) by Tenants (Small, SmallStandard, Large, Major, Anchor). Percent within all data.]

Most of the tenants in UK shopping malls are small or small standard. They account for approximately 84% of all tenants ((711 + 819)/1,821 = .84). Very few (less than 1%) of the tenants are anchors.

2.12

A side-by-side bar chart is shown here:

The chart shows that there was a substantially larger number of healthy purchases (compared to unhealthy or neutral purchases) made with the non-indulgent scent. The types of purchases made with the indulgent scent were more similar across the three categories. This would support the researchers' claim.


2.13

a.

The pie chart is shown below:

More than half (by volume) of the shellfish are caught in the Gulf region, with roughly equal amounts caught in the Atlantic and Pacific regions.

b.

The pie chart is shown below:

The Atlantic and Pacific regions lead the shellfish catch (by value), followed by the Gulf region.

c.

While the Gulf region accounts for the largest volume of shellfish, it pales in comparison to both the Atlantic and Pacific regions' value of shellfish caught. The logical conclusion is that the Atlantic and Pacific regions are catching a much more valuable type of shellfish than the Gulf region.

2.14

Using MINITAB, a pie chart of the data is:

[Pie chart of Measure: Total visitors 26.7%, Funds Raised 23.3%, Big Shows 20.0%, Paying visitors 16.7%, Members 13.3%]

Since the sizes of the slices are close to each other, it appears that the researcher is correct. There is a large amount of variation within the museum community with regard to performance measurement and evaluation.

2.15

a.

The variable measured by Performark is the length of time it took for each advertiser to respond back.

b.

The pie chart is:

[Pie chart of Response Time: 60-120 days 37.8%, 13-59 days 25.6%, Never responded 23.3%, >120 days 13.3%]

c.

Twenty-one percent, or .21 × 17,000 = 3,570, of the advertisers never respond to the sales lead.

d.

The information from the pie chart does not indicate how effective the "bingo cards" are. It just indicates how long it takes advertisers to respond, if at all.

2.16

Using MINITAB, the side-by-side bar graphs are:

[Bar charts of Dive: Percent (vertical axis, 0 to 80) by Dive (Left, Middle, Right), paneled by Situation (Ahead, Behind, Tied). Percent within all data.]

From the graphs, it appears that if the team is either tied or ahead, the goal-keepers tend to dive either right or left with equal probability, with very few diving in the middle. However, if the team is behind, then the majority of goal-keepers tend to dive right (71%).

2.17

a.

Using MINITAB, bar charts for the 3 variables are:

[Bar chart of Well Class: Count (vertical axis, 0 to 120) by Well Class (Private, Public)]

[Bar chart of Aquifer: Count (vertical axis, 0 to 200) by Aquifer (Bedrock, Unconsolidated)]

[Bar chart of Detection: Count (vertical axis, 0 to 160) by Detection (Below Limit, Detect)]

b.

Using MINITAB, the side-by-side bar chart is:

[Bar chart of Detection: Percent (vertical axis, 0 to 80) by Detection (Below Limit, Detect), paneled by Well Class (Private, Public). Percent within all data.]

c.

Using MINITAB, the side-by-side bar chart is:

[Bar chart of Detection: Percent (vertical axis, 0 to 70) by Detection (Below Limit, Detect), paneled by Aquifer (Bedrock, Unconsolidated). Percent within all data.]

d.

From the bar charts in parts a-c, one can infer that most aquifers are bedrock and most levels of MTBE were below the limit (≈ 2/3). Also, the percentages of public wells versus private wells are relatively close. Approximately 80% of private wells are not contaminated, while only about 60% of public wells are not contaminated. The percentage of contaminated wells is about the same for both types of aquifers (≈ 30%).

2.18

Using MINITAB, the relative frequency histogram is:

[Relative frequency histogram: Relative Frequency (vertical axis, 0 to .25) by Class, with class boundaries .5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5]


2.19

To find the number of measurements for each measurement class, multiply the relative frequency by the total number of observations, n = 500. The frequency table is:

Measurement Class    Relative Frequency    Frequency
.5 − 2.5             .10                   500(.10) = 50
2.5 − 4.5            .15                   500(.15) = 75
4.5 − 6.5            .25                   500(.25) = 125
6.5 − 8.5            .20                   500(.20) = 100
8.5 − 10.5           .05                   500(.05) = 25
10.5 − 12.5          .10                   500(.10) = 50
12.5 − 14.5          .10                   500(.10) = 50
14.5 − 16.5          .05                   500(.05) = 25
Total                                      500

Using MINITAB, the frequency histogram is:

[Frequency histogram: Frequency (vertical axis, 0 to 140) by Class, with class boundaries .5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5]

2.20

a.

The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.

b.

For the bottom row of the stem-and-leaf display: The stem is 0. The leaves are 0, 1, 2. Assuming that the data are up to two digits, rounded off to the nearest whole number, the numbers in the original data set are 0, 1, and 2.

c.

Again, assuming that the data are up to two digits, rounded off to the nearest whole number, the dot plot corresponding to all the data points is:

2.21

a.

This is a frequency histogram because the number of observations is graphed for each interval rather than the relative frequency.

b.

There are 14 measurement classes.

c.

There are 49 measurements in the data set.

2.22

a.

The graph is a frequency histogram.

b.

The quantitative variable summarized in the graph is the fup/fumic ratio.

c.

The proportion of ratios greater than 1 is .034.

d.

The proportion of ratios less than .4 is .695.

2.23

a.

Since the label on the vertical axis is Percent, this is a relative frequency histogram. We can divide the percents by 100% to get the relative frequencies.

b.

Summing the percents represented by all of the bars above 100, we get approximately 12%.

2.24

a.

Using MINITAB, the stem-and-leaf display and histogram are:

Stem-and-leaf of Score   N = 211
Leaf Unit = 1

   2    7  99
   2    8
   2    8
   3    8  4
  11    8  66667777
  19    8  88888999
  40    9  000000000011111111111
  67    9  222222222222233333333333333
 102    9  44444444444444444555555555555555555
 (50)   9  66666666666666666666777777777777777777777777777777
  59    9  8888888888888888888889999999999999999
  22   10  0000000000000000000000

b.

From the stem-and-leaf display, there are only 3 observations with sanitation scores less than 86. The proportion of ships with accepted sanitation standards is (211 − 3)/211 = 208/211 = .986.

c.

The score of 79 is highlighted in the stem-and-leaf display.

2.25

a.

The relative frequency histograms are shown below:

b.

The main difference is the larger percentage of attack packets in the smallest size when compared to the normal packets. The general shape of the histogram is otherwise very similar.

2.26

a.

Using MINITAB, a histogram of the current values of the 32 NFL teams is:

b.

Using MINITAB, a histogram of the 1-year change in current value for the 32 NFL teams is:

c.

Using MINITAB, a histogram of the debt-to-value ratios for the 32 NFL teams is:

d.

Using MINITAB, a histogram of the annual revenues for the 32 NFL teams is:

e.

Using MINITAB, a histogram of the operating incomes for the 32 NFL teams is:

f.

For all but one of the histograms, there is one team that has a very high value. The Dallas Cowboys have the largest values for current value, annual revenues, and operating income. The Los Angeles Rams have the highest debt-to-value ratio. There are no extreme values in the graph of the one-year change in value. All the graphs except the one showing the 1-year value changes are skewed to the right.

2.27

a.

Using MINITAB, the frequency histograms for 2019 and 2017 SAT mathematics scores are:

It appears that the scores have not changed very much at all. The graphs are very similar.

b.

Using MINITAB, the frequency histogram of the differences is:

From this graph of the differences, we can see that there are more observations to the right of 0 than to the left of 0. This indicates that, in general, the scores have improved since 2017. We can also see that this graph is skewed to the left.

c.

From the graph, the largest improvement score is in the bar located above 20. The actual largest score is 32 and it is associated with South Dakota.

2.28

Using MINITAB, the stem-and-leaf display is:

Stem-and-leaf of CR-5   N = 13
Leaf Unit = 0.1

  1    5  5
  1    6
  5    6  7777
 (2)   7  24
  6    7  5
  5    8  124
  2    8  55

The 5-year capitalization rates for the tenants with a BBB- S&P rating are highlighted in the stem-and-leaf display. We can see that these tend to be the higher capitalization rates shown.

2.29

Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: Dioxide

Stem-and-leaf of Dioxide   N = 16
Leaf Unit = 0.10

  5    0  12234
  7    0  55
 (2)   1  34
  7    1
  7    2  44
  5    2
  5    3  3
  4    3
  4    4  0000

The highlighted values are values that correspond to water specimens that contain oil. There is a tendency for crude oil to be present in water with lower levels of dioxide, as 6 of the 8 specimens with the lowest levels of dioxide contain oil.

2.30

a.

Using MINITAB, the histogram of the number of deaths is:

[Histogram of Deaths: Frequency (vertical axis, 0 to 12) by Deaths (horizontal axis, 0 to 1000)]

b.

The interval containing the largest proportion of estimates is 0-50. Almost half of the estimates fall in this interval.

2.31

Yes, we would agree with the statement that honey may be the preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection. For those receiving the honey dosage, 14 of the 35 children (or 40%) had improvement scores of 12 or higher. For those receiving the DM dosage, only 8 of the 33 children (or 24%) had improvement scores of 12 or higher. For those receiving no dosage, only 2 of the 37 children (or 5%) had improvement scores of 12 or higher. In addition, the median improvement score for those receiving the honey dosage was 11, the median for those receiving the DM dosage was 9, and the median for those receiving no dosage was 7.
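The counts and medians quoted above can be checked against the ordered improvement scores listed in the solution to Exercise 2.49. The Python sketch below is illustrative only; the list names and the helper `count_at_least` are assumptions introduced for this example.

```python
from statistics import median

# Ordered improvement scores as listed in the solution to Exercise 2.49.
honey = [4, 5, 6, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
         11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 14, 15, 15, 15, 15, 16]
dm = [3, 4, 4, 4, 4, 4, 4, 6, 6, 6, 7, 7, 7, 7, 7, 8, 9, 9, 9, 9, 9, 10,
      10, 10, 11, 12, 12, 12, 12, 12, 13, 13, 15]
no_dosage = [0, 1, 1, 1, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7,
             7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 10, 11, 12, 12]


def count_at_least(scores, cutoff=12):
    """Number of scores at or above the cutoff."""
    return sum(1 for s in scores if s >= cutoff)


for name, scores in [("Honey", honey), ("DM", dm), ("No dosage", no_dosage)]:
    k = count_at_least(scores)
    print(f"{name}: n = {len(scores)}, {k} scores >= 12 "
          f"({k / len(scores):.0%}), median = {median(scores)}")
```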

2.32

a.

Using MINITAB, the stem-and-leaf display is as follows, where the stems are the units place and the leaves are the decimal places:

Stem-and-Leaf Display: Time

Stem-and-leaf of Time   N = 49
Leaf Unit = 0.10

 (26)   1  00001122222344444445555679
  23    2  11446799
  15    3  002899
   9    4  11125
   4    5  24
   2    6
   2    7  8
   1    8
   1    9
   1   10  1

b.

A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy. It appears that, in general, the length of time in bankruptcy for firms using "prepacks" is less than that of firms not using "prepacks."

c.

A dot diagram will be used to compare the time in bankruptcy for the three types of "prepack" firms:

[Dotplot of Time vs Votes: Time (horizontal axis, 1.2 to 9.6) for each Votes group (Joint, None, Prepack)]

d.

The highlighted times in part a correspond to companies that were reorganized through a leveraged buyout. There does not appear to be any pattern to these points. They appear to be scattered about evenly throughout the distribution of all times.

2.33

Using MINITAB, the histogram of the data is:

[Histogram of INTTIME: Frequency (vertical axis, 0 to 60) by INTTIME (horizontal axis, 0 to 525)]

This histogram looks very similar to the one shown in the problem. Thus, it appears that there was minimal or no collaboration or collusion from within the company. We could conclude that the phishing attack against the organization was not an inside job.

2.34

Using MINITAB, the stem-and-leaf display for the data is:

Stem-and-Leaf Display: Time

Stem-and-leaf of Time   N = 25
Leaf Unit = 1.0

  3    3  239
  7    4  3499
 (7)   5  0011469
 11    6  34458
  6    7  13
  4    8  26
  2    9  5
  1   10  2

The numbers in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since there were only 2 customers with delivery times of 68 days or longer that placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with delivery times less than 67 days placed additional orders.

2.35

Assume the data are a sample. The sample mean is:

𝑥̄ = ∑𝑥/𝑛 = 16.3/6 = 2.717

The median is the average of the middle two numbers when the data are arranged in order (since n = 6 is even). The data arranged in order are: 2.0, 2.1, 2.5, 2.8, 3.2, 3.7. The middle two numbers are 2.5 and 2.8. The median is:

(2.5 + 2.8)/2 = 5.3/2 = 2.65
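For readers who want to check these values by computer, a minimal sketch with Python's standard `statistics` module follows. The list `data` holds the six observations in the sorted order given above; the variable names are assumptions for this example.

```python
from statistics import mean, median

# The six observations from Exercise 2.35, in sorted order.
data = [2.0, 2.1, 2.5, 2.8, 3.2, 3.7]

sample_mean = mean(data)      # sum of the observations divided by n
sample_median = median(data)  # average of the two middle values, since n is even

print(f"mean   = {sample_mean:.3f}")
print(f"median = {sample_median:.2f}")
```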

2.36

a.

𝑥̄ = ∑𝑥/𝑛 = 8.5

b.

𝑥̄ = ∑𝑥/𝑛 = 25

c.

𝑥̄ = ∑𝑥/𝑛 = .778

d.

𝑥̄ = ∑𝑥/𝑛 = 13.44

2.37

The mean and median of a symmetric data set are equal to each other. The mean is larger than the median when the data set is skewed to the right. The mean is less than the median when the data set is skewed to the left. Thus, by comparing the mean and median, one can determine whether the data set is symmetric, skewed right, or skewed left.
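The comparison described above can be sketched as a small Python helper. This is a rule-of-thumb illustration only: the function name `compare_mean_median`, the tolerance `tol`, and the three toy data sets are assumptions made for this example, and equal mean and median suggest (but do not guarantee) symmetry.

```python
from statistics import mean, median


def compare_mean_median(data, tol=1e-9):
    """Suggest a shape by comparing the mean with the median."""
    m, med = mean(data), median(data)
    if abs(m - med) <= tol:
        return "approximately symmetric"
    return "skewed right" if m > med else "skewed left"


symmetric = [1, 2, 3, 4, 5]       # mean 3, median 3
right_tail = [1, 2, 3, 4, 20]     # mean 6, median 3
left_tail = [1, 17, 18, 19, 20]   # mean 15, median 18

for sample in (symmetric, right_tail, left_tail):
    print(sample, "->", compare_mean_median(sample))
```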

2.38

The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number. A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5. A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is (5 + 5)/2 = 10/2 = 5.
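The odd and even cases can be written out directly. The following Python function is a minimal sketch of the rule just described; the name `median_by_hand` is an assumption, and in practice `statistics.median` from the standard library does the same job.

```python
def median_by_hand(values):
    """Median: middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle number exists.
        return ordered[mid]
    # Even n: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([1, 3, 5, 6, 8]))     # five measurements
print(median_by_hand([1, 3, 5, 5, 6, 8]))  # six measurements
```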

2.39

Assume the data are a sample. The mode is the observation that occurs most frequently. For this sample, the mode is 15, which occurs three times. The sample mean is:

𝑥̄ = ∑𝑥/𝑛 = 160/11 = 14.545

The median is the middle number when the data are arranged in order. The data arranged in order are: 10, 11, 12, 13, 15, 15, 15, 16, 17, 18, 18. The middle number is the 6th number, which is 15.

2.40

a.

𝑥̄ = ∑𝑥/𝑛 = 2.5
Median = 3 (mean of 3rd and 4th numbers, after ordering)
Mode = 3

b.

𝑥̄ = ∑𝑥/𝑛 = 3.08
Median = 3 (7th number, after ordering)
Mode = 3

c.

𝑥̄ = ∑𝑥/𝑛 = 49.6
Median = 49 (mean of 5th and 6th numbers, after ordering)
Mode = 50

2.41

a.

For a distribution that is skewed to the left, the mean is less than the median.

b.

For a distribution that is skewed to the right, the mean is greater than the median.

c.

For a symmetric distribution, the mean and median are equal.

2.42

a.

The average score for Energy Star is 4.44. The average score is close to 5 meaning the average score is close to ‘very familiar’.

b.

The median score for Energy Star is 5. At least half of the respondents indicated that they are very familiar with the ecolabel Energy Star.

c.

The mode score for Energy Star is 5. More respondents answered ‘very familiar’ to Energy Star than any other option.

d.

The ecolabel that appears to be most familiar to travelers is Energy Star.

2.43

a.

This statistic represents a population mean because it is computed for every freshman who attended the university in 2018. The average financial aid awarded to freshmen at Harvard University is $41,505.

b.

This statistic represents a sample median because it is computed for a sample of alumni. The median salary during early career for alumni of Harvard University is $74,800. Half of the alumni from Harvard make more than $74,800 during their early career.

2.44

a.

The mean is 𝑥̄ = ∑𝑥/𝑛 = 103.3/10 = 10.33

The average annualized percentage return on investment for 10 randomly selected stock screeners is 10.33.

b.

Since the number of observations is even, the median is the average of the middle two values once the data have been arranged in order. The data arranged in order are:

-1.6 -.1 3.2 7.7 9.0 9.8 14.6 16.0 19.9 24.8

The median is: median = (9.0 + 9.8)/2 = 9.4

Half of the annualized percentage returns on investment are below 9.4 and half are above 9.4.

2.45

MINITAB was used to generate the descriptive statistics shown here:

Statistics

Variable   N    N*   Mean    StDev   Minimum   Q1      Median   Q3      Maximum   Mode   N for Mode
CR-5       13   0    7.419   0.922   5.500     6.750   7.400    8.350   8.500     6.75   3

The mean is 7.419. The average 5-year capitalization rate for the 13 tenants sampled is 7.419%. The median is 7.400. Half of the 5-year capitalization rates of the sampled tenants are above 7.400% and half are below 7.400%. The mode is 6.75. More tenants had a 5-year capitalization rate of 6.75% than any other value.

2.46

a.

The mean size of the attack packets was 276.8538. The average packet size of the attack packets was 276.8538 bytes. The mean size of the normal packets was 337.4500. The average packet size of the normal packets was 337.4500 bytes.

b.

The median size of the attack packets was 207.5000. Half of the attack packets were smaller than 207.5000 bytes and half were larger than 207.5000 bytes. The median size of the normal packets was 223.0000. Half of the normal packets were smaller than 223.0000 bytes and half were larger than 223.0000 bytes.

c.

Because both groups have a much larger mean value than median value, we would expect both the attack and the normal groups to be skewed to the right.

2.47

a.

The mean permeability for group A sandstone slices is 73.62 mD. The average permeability for group A sandstone is 73.62 mD. The median permeability for group A sandstone is 70.45 mD. Half of the sandstone slices in group A have permeability less than 70.45 mD.

b.

The mean permeability for group B sandstone slices is 128.54 mD. The average permeability for group B sandstone is 128.54 mD. The median permeability for group B sandstone is 139.30 mD. Half of the sandstone slices in group B have permeability less than 139.30 mD.

c.

The mean permeability for group C sandstone slices is 83.07 mD. The average permeability for group C sandstone is 83.07 mD. The median permeability for group C sandstone is 78.65 mD. Half of the sandstone slices in group C have permeability less than 78.65 mD.

d.

The mode permeability score for group C sandstone is 70.9. More sandstone slices in group C had permeability scores of 70.9 than any other value.

e.

Weathering type B appears to result in faster decay because the mean, median, and mode values for group B are higher than those for group C.

2.48

a.

The mean is 67.755. The statement is accurate.

b.

The median is 68.000. The statement is accurate.

c.

The mode is 64. The statement is not accurate. A better statement would be: “The most common reported level of support for corporate sustainability for the 992 senior managers was 64.”

d.

Since the mean and median are almost the same, the distribution of the 992 support levels should be fairly symmetric. The histogram in Exercise 2.23 is almost symmetric.

2.49

a.

The median is the middle number (18th) once the data have been arranged in order because n = 35 is odd. The honey dosage data arranged in order are: 4,5,6,8,8,8,8,9,9,9,9,10,10,10,10,10,10,11,11,11,11,12,12,12,12,12,12,13,13,14,15,15,15,15,16 The 18th number is the median = 11.

b.

The median is the middle number (17th) once the data have been arranged in order because n = 33 is odd. The DM dosage data arranged in order are: 3,4,4,4,4,4,4,6,6,6,7,7,7,7,7,8,9,9,9,9,9,10,10,10,11,12,12,12,12,12,13,13,15 The 17th number is the median = 9.

c.

The median is the middle number (19th) once the data have been arranged in order because n = 37 is odd. The No dosage data arranged in order are: 0,1,1,1,3,3,4,4,5,5,5,6,6,6,6,7,7,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,10,11,12,12 The 19th number is the median = 7.

d.

Since the median for the Honey dosage is larger than the other two, it appears that the honey dosage leads to more improvement than the other two treatments.

2.50

a.

The mean dioxide level is 𝑥̄ = ∑𝑥/𝑛 = 29.0/16 = 1.81. The average dioxide amount is 1.81.

b.

Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.1 0.2 0.2 0.3 0.4 0.5 0.5 1.3 1.4 2.4 2.4 3.3 4.0 4.0 4.0 4.0

The median is (1.3 + 1.4)/2 = 2.7/2 = 1.35. Half of the dioxide levels are below 1.35 and half are above 1.35.

c.

The mode is the number that occurs the most. For this data set the mode is 4.0. The most frequent level of dioxide is 4.0.

d.

Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.1 0.3 1.4 2.4 2.4 3.3 4.0 4.0 4.0 4.0

The median is (2.4 + 3.3)/2 = 5.7/2 = 2.85.

e.

Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.2 0.2 0.4 0.5 0.5 1.3

The median is (0.4 + 0.5)/2 = 0.9/2 = 0.45.

f.

The median level of dioxide when crude oil is present is 0.45. The median level of dioxide when crude oil is not present is 2.85. It is apparent that the level of dioxide is much higher when crude oil is not present.

2.51

a.

Skewed to the right. There will be a few people with very high salaries such as the president and football coach.

b.

Skewed to the left. On an easy test, most students will have high scores with only a few low scores.

c.

Skewed to the right. On a difficult test, most students will have low scores with only a few high scores.

d.

Skewed to the right. Most students will have a moderate amount of time studying while a few students might study a long time.

e.

Skewed to the left. Most cars will be relatively new with a few much older.

f.

Skewed to the left. Most students will take the entire time to take the exam while a few might leave early.

2.52

a.

The sample mean is:

𝑥̄ = ∑𝑥/𝑛 = 3.171

The median is found by locating the 13th observation, once the data have been ordered. The 13th observation is 2.9165793. For this data set, no observation occurred more than once, so there is no mode.

b.

The sample average driving performance index is 3.171. The median driving performance index is 2.9165793. Half of all driving performance indexes are less than 2.9165793 and half are higher.

c.

Since the mean is larger than the median, the data will be skewed to the right. Using MINITAB, a histogram of the driving performance index values is:

2.53

For the "Joint exchange offer with prepack" firms, the mean time is 2.6545 months, and the median is 1.5 months. Thus, the average time spent in bankruptcy for "Joint" firms is 2.6545 months, while half of the firms spend 1.5 months or less in bankruptcy.

For the "No prefiling vote held" firms, the mean time is 4.2364 months, and the median is 3.2 months. Thus, the average time spent in bankruptcy for "No prefiling vote held" firms is 4.2364 months, while half of the firms spend 3.2 months or less in bankruptcy.

For the "Prepack solicitation only" firms, the mean time is 1.8185 months, and the median is 1.4 months. Thus, the average time spent in bankruptcy for "Prepack solicitation only" firms is 1.8185 months, while half of the firms spend 1.4 months or less in bankruptcy.

Since the means and medians for the three groups of firms differ quite a bit, it would be unreasonable to use a single number to locate the center of the time in bankruptcy. Three different "centers" should be used.

2.54

a.

The mean is 𝜇 = ∑𝑥/𝑛 = 58/29 = 2.0. The average number of nuclear power plants per state for states that have nuclear power plants is 2.0. The median is found by first arranging the data in order from smallest to largest:

1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 4 4 5 6

Since there is an odd number of data points, the median is the middle, or 15th, value of the ordered data. In this case, the median is 2. Half of the states with nuclear power plants have 2 or fewer plants. The mode is 1. More states that have nuclear power plants have 1 plant than any other number.

b.

For regulated states: The mean is 𝜇 = ∑𝑥/𝑛 = (2 + 1 + 1 + ⋯ + 1)/17 = 29/17 = 1.706. The average number of nuclear power plants per state for regulated states that have nuclear power plants is 1.706. The median is found by first arranging the data in order from smallest to largest:

1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 4

Since there is an odd number of data points, the median is the middle number, which is 2. Half of the regulated states with nuclear power plants have 2 or fewer plants. The mode is 1. More regulated states that have nuclear power plants have 1 plant than any other number.

c.

For deregulated states: The mean is 𝜇 =

=

=

= 2.417. The average

number of nuclear power plants per state for states that have nuclear power plants is 2.385. The median is found by first arranging the data in order from smallest to largest: 1 1 1 1 1 2 2 2 3 4 5 6 Since there are an even number of data points, the median is the average of the middle two numbers which is = 2. Half of the states with nuclear power plants have 2 or fewer plants. The mode is 1. Most states that have nuclear power plants have 1. d.

Because the average number of nuclear power plants in states that are deregulated is greater than the average number of nuclear power plants in states that are regulated, it appears that regulation limits the number of nuclear power plants.

e.

After deleting the largest observation, the mean is 𝜇 = ∑𝑥/𝑛 = 52/28 = 1.857. The average number of nuclear power plants per state for states that have nuclear power plants is 1.857. The median is found by first arranging the data in order from smallest to largest:

1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 4 4 5

Since there is an even number of data points, the median is the average of the middle two numbers, which is (2 + 2)/2 = 2. Half of the states with nuclear power plants have 2 or fewer plants. The mode is 1. Among states that have nuclear power plants, the most common number of plants is 1. By deleting the largest observation, the mean decreases, but the median and mode remain the same.

f.

The trimmed mean is 𝜇 = ∑𝑥/𝑛 = 1.800. The trimmed mean decreases when the extreme values are removed.

2.55

a.

Due to the "elite" superstars, the salary distribution is skewed to the right. Since this implies that the median is less than the mean, the players' association would want to use the median.

b.

The owners, by the logic of part a, would want to use the mean.
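The claim that a right-skewed distribution has a median below its mean can be illustrated with a small sketch. The salary figures below are hypothetical values chosen only to mimic a few "elite" superstars; they are not data from the exercise.

```python
from statistics import mean, median

# Hypothetical salaries in $ millions (illustrative assumption, not exercise data).
# Most players earn near the low end; two "superstars" earn far more.
salaries = [0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 2.0, 12.0, 25.0]

salary_mean = mean(salaries)      # pulled upward by the two large values
salary_median = median(salaries)  # unaffected by how large the top values are

print(f"mean = {salary_mean:.2f}, median = {salary_median:.2f}")
```

In this sketch the two large salaries pull the mean well above the median, which is the pattern the solution relies on.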

2.56

a.

The primary disadvantage of using the range to compare variability of data sets is that the two data sets can have the same range and be vastly different with respect to data variation. Also, the range is greatly affected by extreme measures.

b.

The sample variance is the sum of the squared deviations of the observations from the sample mean divided by the sample size minus 1. The population variance is the sum of the squared deviations of the values from the population mean divided by the population size.

c.

The variance of a data set can never be negative. The variance of a sample is the sum of the squared deviations from the mean divided by 𝑛 − 1. The square of any number, positive or negative, is never negative. Thus, the variance is never negative. The variance is usually greater than the standard deviation. However, it is possible for the variance to be smaller than the standard deviation. Whenever the variance is between 0 and 1, it is smaller than its square root, the standard deviation. This happens, for instance, if the data are all between 0 and 1. For example, suppose the data set is .8, .7, .9, .5, and .3. The sample mean is:

𝑥̄ = ∑𝑥/𝑛 = (.8 + .7 + .9 + .5 + .3)/5 = 3.2/5 = .64

The sample variance is:

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (2.28 − 3.2²/5)/(5 − 1) = .232/4 = .058

The standard deviation is 𝑠 = √.058 = .241

2.57

a.

Range = 4 − 0 = 4

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 2.3

𝑠 = √2.3 = 1.52

b.

Range = 6 − 0 = 6

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 3.619

𝑠 = √3.619 = 1.9

c.

Range = 8 − (−2) = 10

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 7.111

𝑠 = √7.111 = 2.67

d.

Range = 1 − (−3) = 4

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.395

𝑠 = √1.395 = 1.18

2.58

a.

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 4.8889

𝑠 = √4.8889 = 2.211

b.

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 3.3333

𝑠 = √3.3333 = 1.826

c.

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = .1868

𝑠 = √.1868 = .432

2.59

a.

∑𝑥 = 3 + 1 + 10 + 10 + 4 = 28

∑𝑥² = 3² + 1² + 10² + 10² + 4² = 226

𝑥̄ = ∑𝑥/𝑛 = 28/5 = 5.6

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (226 − 28²/5)/(5 − 1) = 69.2/4 = 17.3

𝑠 = √17.3 = 4.1593

b.

∑𝑥 = 8 + 10 + 32 + 5 = 55

∑𝑥² = 8² + 10² + 32² + 5² = 1213

𝑥̄ = ∑𝑥/𝑛 = 55/4 = 13.75 feet

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (1213 − 55²/4)/(4 − 1) = 456.75/3 = 152.25 square feet

𝑠 = √152.25 = 12.339 feet

c.

∑𝑥 = −1 + (−4) + (−3) + 1 + (−4) + (−4) = −15

∑𝑥² = (−1)² + (−4)² + (−3)² + 1² + (−4)² + (−4)² = 59

𝑥̄ = ∑𝑥/𝑛 = −15/6 = −2.5

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (59 − (−15)²/6)/(6 − 1) = 21.5/5 = 4.3

𝑠 = √4.3 = 2.0736

d.

∑𝑥 = 2

∑𝑥² = .96

𝑥̄ = ∑𝑥/𝑛 = 2/6 = .33 ounce

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (.96 − 2²/6)/(6 − 1) = .2933/5 = .0587 square ounce

𝑠 = √.0587 = .2422 ounce

2.60

a.

Range = 42 − 37 = 5

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 3.7

𝑠 = √3.7 = 1.92

b.

Range = 100 − 1 = 99

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1,949.25

𝑠 = √1,949.25 = 44.15

c.

Range = 100 − 2 = 98

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1,307.84

𝑠 = √1,307.84 = 36.16

2.61

This is one possibility for the two data sets.

Data Set 1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Data Set 2: 0, 0, 1, 1, 2, 2, 3, 3, 9, 9

The two sets of data above have the same range = largest measurement − smallest measurement = 9 − 0 = 9. The means for the two data sets are:

𝑥̄₁ = ∑𝑥/𝑛 = (0 + 1 + 2 + ⋯ + 9)/10 = 45/10 = 4.5

𝑥̄₂ = ∑𝑥/𝑛 = (0 + 0 + 1 + ⋯ + 9)/10 = 30/10 = 3

The dot diagrams for the two data sets are shown below.

[Figure: Dotplot of x1, x2]

2.62

This is one possibility for the two data sets.

Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5

𝑥̄₁ = ∑𝑥/𝑛 = 30/10 = 3

𝑥̄₂ = ∑𝑥/𝑛 = 30/10 = 3

Therefore, the two data sets have the same mean. The variances for the two data sets are:

𝑠₁² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (110 − 30²/10)/9 = 20/9 = 2.2222

𝑠₂² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (130 − 30²/10)/9 = 40/9 = 4.4444

The dot diagrams for the two data sets are shown below.

[Figure: Dotplot of x1, x2]

2.63

a.

Range = 3 − 0 = 3

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.3

𝑠 = √1.3 = 1.14

b.

After adding 3 to each of the data points,

Range = 6 − 3 = 3

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.3

𝑠 = √1.3 = 1.14

c.

After subtracting 4 from each of the data points,

Range = −1 − (−4) = 3

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.3

𝑠 = √1.3 = 1.14

d.

The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.
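The conclusion in part d can be checked numerically. The sketch below uses a hypothetical sample (an assumption for illustration, not the data from the exercise) and the helper name `spread`, which is also invented here.

```python
from statistics import variance, stdev

def spread(data):
    """Return (range, sample variance, sample standard deviation) of data."""
    return (max(data) - min(data), variance(data), stdev(data))

# Hypothetical sample: illustrative values, not the data from the exercise.
sample = [2, 5, 7, 8, 11]

original = spread(sample)
plus_three = spread([x + 3 for x in sample])   # add 3 to every measurement
minus_four = spread([x - 4 for x in sample])   # subtract 4 from every measurement

print(original)
print(plus_three)
print(minus_four)
```

All three printed tuples agree, because shifting every value by a constant shifts the mean by the same constant and leaves every deviation from the mean unchanged.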

2.64

The ecolabel that had the most variation in the numerical responses is Audubon International because it has the largest standard deviation.

2.65

a.

The range of permeability scores for group A sandstone slices is Range = max − min = 122.4 − 55.2 = 67.2.

b.

The variance of group A sandstone slices is 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 209.5292.

The standard deviation is 𝑠 = √209.5292 = 14.475.

c.

Condition B has the largest range and the largest standard deviation. Thus, condition B has the most variable permeability data.


2.66

a.

The range is the difference between the maximum and minimum values. The range = 24.8 − (−1.6) = 26.4. The units of measurement are percents.

b.

The variance is 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 73.585

The units are square percents.

c.

The standard deviation is 𝑠 = √73.585 = 8.578. The units are percents.

2.67

a.

The range is 155. The statement is accurate.

b.

The variance is 722.036. The statement is not accurate. A more accurate statement would be: “The variance of the levels of support for corporate sustainability for the 992 senior managers is 722.036.”

c.

The standard deviation is 26.871. If the units of measure for the two distributions are the same, then the distribution of support levels for the 992 senior managers has less variation than a distribution with a standard deviation of 50. If the units of measure for the second distribution are not known, then we cannot compare the variation in the two distributions by looking at the standard deviations alone.

d.

The standard deviation best describes the variation in the distribution. The range can be greatly affected by extreme measures. The variance is measured in square units, which are hard to interpret. Thus, the standard deviation is the best measure to describe the variation.

2.68

a.

The sample variance of the honey dosage group is:

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (4295 − 375²/35)/(35 − 1) = 277.1429/34 = 8.1512605

The standard deviation is: 𝑠 = √8.1512605 = 2.855

b.

The sample variance of the DM dosage group is:

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (2631 − 275²/33)/(33 − 1) = 339.3333/32 = 10.604167

The standard deviation is: 𝑠 = √10.604167 = 3.256

c.

The sample variance of the control group is:

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (1881 − 241²/37)/(37 − 1) = 311.2432/36 = 8.6456456

The standard deviation is: 𝑠 = √8.6456456 = 2.940

d.

The group with the most variability is the group with the largest standard deviation, which is the DM group. The group with the least variability is the group with the smallest standard deviation, which is the honey group.
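The shortcut formula used in this exercise needs only three totals: ∑𝑥², ∑𝑥, and 𝑛. As a rough sketch, the function below (the name `shortcut_variance` is an assumption made for illustration) recomputes the control group result from the totals that appear in part c, reading 1881 as ∑𝑥², 241 as ∑𝑥, and 37 as 𝑛.

```python
import math

def shortcut_variance(sum_x_sq, sum_x, n):
    """Sample variance from totals: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)

# Control group totals as read from part c (assumed reading of the printed values).
s2_control = shortcut_variance(sum_x_sq=1881, sum_x=241, n=37)
s_control = math.sqrt(s2_control)

print(round(s2_control, 7), round(s_control, 3))
```

The same function can be applied to the other two groups once their totals are known.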


2.69

a.

The range is the largest observation minus the smallest observation, or 6 − 1 = 5.

The variance is: 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.7143

The standard deviation is: 𝑠 = √𝑠² = √1.7143 = 1.309

b.

The largest observation is 6. It is deleted from the data set. The new range is 5 − 1 = 4.

The variance is: 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.164

The standard deviation is: 𝑠 = √𝑠² = √1.164 = 1.079

When the largest observation is deleted, the range, variance, and standard deviation decrease.

c.

The largest observation is 6 and the smallest is 1. When these two observations are deleted from the data set, the new range is 5 − 1 = 4.

The variance is: 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 1.1795

The standard deviation is: 𝑠 = √𝑠² = √1.1795 = 1.0860

When the largest and smallest observations are deleted, the range, variance, and standard deviation decrease.

2.70

a.

A worker's overall time to complete the operation under study is determined by adding the subtask-time averages.

Worker A

The average for subtask 1 is: 𝑥̄ = ∑𝑥/𝑛 = 30.14

The average for subtask 2 is: 𝑥̄ = ∑𝑥/𝑛 = 3

Worker A's overall time is 30.14 + 3 = 33.14.

Worker B

The average for subtask 1 is: 𝑥̄ = ∑𝑥/𝑛 = 30.43

The average for subtask 2 is: 𝑥̄ = ∑𝑥/𝑛 = 4.14

Worker B's overall time is 30.43 + 4.14 = 34.57.

b.

Worker A: 𝑠 = √[(∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1)] = √15.8095 = 3.98

Worker B: 𝑠 = √[(∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1)] = √.9524 = .98

c.

The standard deviations represent the amount of variability in the time it takes each worker to complete subtask 1.

d.

Worker A: 𝑠 = √[(∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1)] = √.6667 = .82

Worker B: 𝑠 = √[(∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1)] = √4.4762 = 2.12

e.

I would choose workers similar to worker B to perform subtask 1. Worker B has a slightly higher average time on subtask 1 (A: 𝑥̄ = 30.14, B: 𝑥̄ = 30.43). However, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b). He or she is more consistent in the time needed to complete the task. I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller average time on subtask 2 (A: 𝑥̄ = 3, B: 𝑥̄ = 4.14). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).

2.71

a.

The unit of measurement of the variable of interest is dollars (the same as the mean and standard deviation). Based on this, the data are quantitative.

b.

Since no information is given about the shape of the data set, we can only use Chebyshev's Rule.

$900 is 2 standard deviations below the mean, and $2100 is 2 standard deviations above the mean. Using Chebyshev's Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100.

$600 is 3 standard deviations below the mean and $2400 is 3 standard deviations above the mean. Using Chebyshev's Rule, at least 8/9 of the measurements (or 8/9 × 200 ≈ 178 measurements) will fall between $600 and $2400.

$1200 is 1 standard deviation below the mean and $1800 is 1 standard deviation above the mean. Using Chebyshev's Rule, nothing can be said about the number of measurements that will fall between $1200 and $1800.

$1500 is equal to the mean and $2100 is 2 standard deviations above the mean. Using Chebyshev's Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100. It is possible that all of the 150 measurements will be between $900 and $1500. Thus, nothing can be said about the number of measurements between $1500 and $2100.
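The Chebyshev calculations above follow one pattern: for k > 1, at least 1 − 1/k² of the measurements lie within k standard deviations of the mean. A minimal sketch is given below. The function name is an assumption, and the standard deviation of $300 is inferred from the statement that $900 and $2100 are 2 standard deviations from the mean of $1500.

```python
def chebyshev_min_fraction(k):
    """Chebyshev's Rule lower bound on the fraction within k standard deviations."""
    if k <= 1:
        return 0.0  # the rule gives no useful information for k <= 1
    return 1 - 1 / k ** 2

n = 200                 # number of measurements in the exercise
mean, sd = 1500, 300    # dollars; sd inferred from the intervals in the solution

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    at_least = chebyshev_min_fraction(k) * n
    print(f"k = {k}: (${low}, ${high}) contains at least {at_least:.1f} of {n} measurements")
```

For k = 1 the bound is zero, which matches the statement that nothing can be said about the interval from $1200 to $1800.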

2.72

Since no information is given about the data set, we can only use Chebyshev's Rule.

a.

Nothing can be said about the percentage of measurements which will fall between 𝑥̄ − 𝑠 and 𝑥̄ + 𝑠.

b.

At least 3/4 or 75% of the measurements will fall between 𝑥̄ − 2𝑠 and 𝑥̄ + 2𝑠.

c.

At least 8/9 or 89% of the measurements will fall between 𝑥̄ − 3𝑠 and 𝑥̄ + 3𝑠.

2.73

According to the Empirical Rule:

a.

Approximately 68% of the measurements will be contained in the interval 𝑥̄ − 𝑠 to 𝑥̄ + 𝑠.

b.

Approximately 95% of the measurements will be contained in the interval 𝑥̄ − 2𝑠 to 𝑥̄ + 2𝑠.

c.

Essentially all the measurements will be contained in the interval 𝑥̄ − 3𝑠 to 𝑥̄ + 3𝑠.

2.74

a.

𝑥̄ = ∑𝑥/𝑛 = 8.24

𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = 3.357

𝑠 = √3.357 = 1.83

b.

Interval                      Number of Measurements in Interval   Percentage
𝑥̄ ± 𝑠, or (6.41, 10.07)       18                                   18/25 = .72 or 72%
𝑥̄ ± 2𝑠, or (4.58, 11.90)      24                                   24/25 = .96 or 96%
𝑥̄ ± 3𝑠, or (2.75, 13.73)      25                                   25/25 = 1.00 or 100%

c.

The percentages in part b are in agreement with Chebyshev's Rule and agree fairly well with the percentages given by the Empirical Rule.

d.

Range = 12 − 5 = 7 and 𝑠 ≈ Range/4 = 7/4 = 1.75

The range approximation provides a satisfactory estimate of 𝑠 = 1.83 from part a.

2.75

Using Chebyshev's Rule, at least 8/9 of the measurements will fall within 3 standard deviations of the mean. Thus, the range of the data would be around 6 standard deviations. Using the Empirical Rule, approximately 95% of the observations are within 2 standard deviations of the mean. Thus, the range of the data would be around 4 standard deviations. We would expect the standard deviation to be somewhere between Range/6 and Range/4. For our data, the range = 760 − 135 = 625. Then

Range/6 = 625/6 = 104.17 and Range/4 = 625/4 = 156.25.

Therefore, I would estimate that the standard deviation of the data set is between 104.17 and 156.25. It would not be feasible to have a standard deviation of 25. If the standard deviation were 25, the data would span 625/25 = 25 standard deviations. This would be extremely unlikely.

2.76

a.

𝑧 = −3. A score of 263 would be 3 standard deviations below the mean.

𝑧 = 3. A score of 443 would be 3 standard deviations above the mean.

Using Chebyshev’s Rule, at least 8/9 of the observations will be within 3 standard deviations of the mean.

b.

For a mound-shaped, symmetric distribution, approximately 99.7% of the observations will be within 3 standard deviations of the mean, using the Empirical Rule.

c.

𝑧 = −3. A score of 109 would be 3 standard deviations below the mean.

𝑧 = 3. A score of 259 would be 3 standard deviations above the mean.

Using Chebyshev’s Rule, at least 8/9 of the observations will be within 3 standard deviations of the mean.

d.

For a mound-shaped, symmetric distribution, approximately 99.7% of the observations will be within 3 standard deviations of the mean, using the Empirical Rule.

2.77

a.

Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean: 𝑥̄ ± 3𝑠 ⇒ 73.62 ± 3(14.48) ⇒ 73.62 ± 43.44 ⇒ (30.18, 117.06)

b.

Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean: 𝑥̄ ± 3𝑠 ⇒ 128.54 ± 3(21.97) ⇒ 128.54 ± 65.91 ⇒ (62.63, 194.45)

c.

Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean: 𝑥̄ ± 3𝑠 ⇒ 83.07 ± 3(20.05) ⇒ 83.07 ± 60.15 ⇒ (22.92, 143.22)

d.

Although all the intervals overlap, it appears that weathering group B results in faster decay because the sample mean is higher and the upper limit of the interval is much higher than the upper limit for the other two weathering types.

2.78

a.

Using MINITAB, the histogram of the data is:

[Figure: Histogram of Wheels; vertical axis: Frequency, horizontal axis: Wheels]

Since the distribution is skewed to the right, it is not mound-shaped and it is not symmetric.

b.

Using MINITAB, the results are:

Descriptive Statistics: Wheels

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Wheels     28   3.214   1.371   1.000     2.000   3.000    4.000   8.000

The mean is 3.214 and the standard deviation is 1.371.

c.

The interval is: 𝑥̄ ± 2𝑠 ⇒ 3.214 ± 2(1.371) ⇒ 3.214 ± 2.742 ⇒ (0.472, 5.956).

d.

According to Chebyshev’s Rule, at least 75% of the observations will fall within 2 standard deviations of the mean.

e.

According to the Empirical Rule, approximately 95% of the observations will fall within 2 standard deviations of the mean.

f.

Actually, 26 of the 28 or 26/28 = .929 of the observations fall within the interval. This value is close to the 95% that we would expect with the Empirical Rule.

2.79

a.

The interval 𝑥̄ ± 2𝑠 will contain at least 75% of the observations. This interval is 𝑥̄ ± 2𝑠 ⇒ 3.11 ± 2(.66) ⇒ 3.11 ± 1.32 ⇒ (1.79, 4.43).

b.

No. The value 1.25 does not fall in the interval 𝑥̄ ± 2𝑠. We know that at least 75% of all observations will fall within 2 standard deviations of the mean. Since 1.25 falls more than 2 standard deviations from the mean, it would not be a likely value to observe.

2.80

a.

According to the Empirical Rule, approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval is 𝑥̄ ± 2𝑠 ⇒ 160.3 ± 2(19.6) ⇒ 160.3 ± 39.2 ⇒ (121.1, 199.5)

b.

According to the Empirical Rule, approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval is 𝑥̄ ± 2𝑠 ⇒ 133.5 ± 2(19.6) ⇒ 133.5 ± 39.2 ⇒ (94.3, 172.7)

c.

The value of 94.3 falls two standard deviations below the mean. From the Empirical Rule, approximately 95% of the shooters will have MVIC strength scores within two standard deviations of the mean. Half of this value gives 47.5% between the mean and two standard deviations below the mean. Because half the data fall below the mean in a symmetric curve, the percentage more than two standard deviations below the mean is found by subtracting 47.5% from 50%. Therefore, approximately 2.5% of the male shooters after six consecutive shooting sessions have MVIC strength scores below 94.3 N.

2.81

a.

The sample mean is: 𝑥̄ = ∑𝑥/𝑛 = 20,020/211 = 94.882

The sample variance is: 𝑠² = (∑𝑥² − (∑𝑥)²/𝑛)/(𝑛 − 1) = (1,902,842 − 20,020²/211)/(211 − 1) = 15.781

The standard deviation is: 𝑠 = √𝑠² = √15.781 = 3.9725

b.

𝑥̄ ± 𝑠 ⇒ 94.882 ± 3.9725 ⇒ (90.9095, 98.8545)

𝑥̄ ± 2𝑠 ⇒ 94.882 ± 2(3.9725) ⇒ 94.882 ± 7.945 ⇒ (86.937, 102.827)

𝑥̄ ± 3𝑠 ⇒ 94.882 ± 3(3.9725) ⇒ 94.882 ± 11.9175 ⇒ (82.9645, 106.7995)

c.

There are 144 out of 211 observations in the first interval. This is (144/211) × 100% = 68.25%. There are 204 out of 211 observations in the second interval. This is (204/211) × 100% = 96.68%. There are 209 out of 211 observations in the third interval. This is (209/211) × 100% = 99.05%. The Empirical Rule indicates that approximately 68% of the observations will fall within 1 standard deviation of the mean. It also indicates that approximately 95% of the observations will fall within 2 standard deviations of the mean and that essentially all of the observations will fall within 3 standard deviations of the mean. Chebyshev’s Theorem says that at least 3/4 or 75% of the observations will fall within 2 standard deviations of the mean and at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. It appears that our observed percentages agree with the Empirical Rule better than with Chebyshev’s Theorem.
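The comparison in part c can be laid out side by side with a short sketch. The counts (144, 204, and 209 of 211) come from the solution; the dictionary names and the use of 99.7% for "essentially all" are assumptions made for illustration.

```python
n = 211

# Observed counts inside x-bar +/- k*s for k = 1, 2, 3, as reported in part c.
observed = {1: 144, 2: 204, 3: 209}

# Approximate Empirical Rule percentages (99.7 stands in for "essentially all").
empirical = {1: 68.0, 2: 95.0, 3: 99.7}

# Chebyshev lower bounds in percent; the rule says nothing useful for k = 1.
chebyshev = {k: 100 * (1 - 1 / k ** 2) for k in (2, 3)}

percent = {k: 100 * count / n for k, count in observed.items()}

for k in (1, 2, 3):
    bound = chebyshev.get(k)
    bound_text = f"at least {bound:.1f}%" if bound is not None else "no bound"
    print(f"k = {k}: observed {percent[k]:.2f}%, "
          f"Empirical Rule about {empirical[k]}%, Chebyshev {bound_text}")
```

The observed percentages satisfy the Chebyshev bounds and sit close to the Empirical Rule values.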

2.82

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Deaths

Variable   N    Mean    StDev   Minimum   Q1     Median   Q3      Maximum
Deaths     27   163.4   227.4   4.0       29.0   68.0     184.0   955.0

Since the data are not mound-shaped, we will use Chebyshev’s Rule. At least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval is: 𝑥̄ ± 3𝑠 ⇒ 163.4 ± 3(227.4) ⇒ 163.4 ± 682.2 ⇒ (−518.8, 845.6). Since no observations can be negative, most observations will fall between 0 and 845.6.

2.83

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Q2

Variable   Q1          N    Mean     StDev   Minimum   Q1      Median   Q3      Maximum
Q2         No          1    2.0000   *       2.0000    *       2.0000   *       2.0000
           Undecided   5    4.800    0.447   4.000     4.500   5.000    5.000   5.000
           Yes         30   3.967    0.850   2.000     3.000   4.000    5.000   5.000

The data for those users who believe there should be national standards are close to being mound-shaped and symmetric. Therefore, we will use the Empirical Rule. Approximately 95% of the observations fall within 2 standard deviations of the mean. This interval is: 𝑥̄ ± 2𝑠 ⇒ 3.967 ± 2(.85) ⇒ 3.967 ± 1.70 ⇒ (2.267, 5.667)

2.84

a.

The average ranking for contestants with a first degree who competed for a job with Lord Sugar is 7.796.

b.

Approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval is: 𝑥̄ ± 2𝑠 ⇒ 7.796 ± 2(4.231) ⇒ 7.796 ± 8.462 ⇒ (−.666, 16.258) Since no observations can be negative, the interval will be 0 to 16.258.

c.

No. It appears that just the opposite is true. When the prize was a job, the higher the education level of the contestant, the higher the mean rating. When the prize was a partnership, the higher the education level of the contestant, the lower the mean rating.

2.85

a.

The interval 𝑥̄ ± 3𝑠 for the flexed arm group is 𝑥̄ ± 3𝑠 ⇒ 59 ± 3(4) ⇒ 59 ± 12 ⇒ (47, 71). The interval for the extended arm group is 𝑥̄ ± 3𝑠 ⇒ 43 ± 3(2) ⇒ 43 ± 6 ⇒ (37, 49). We know that at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean using Chebyshev’s Rule. Since these 2 intervals barely overlap, the information supports the researchers’ theory. The shoppers from the flexed arm group are more likely to select vice options than the extended arm group.

b.

The interval 𝑥̄ ± 2𝑠 for the flexed arm group is 𝑥̄ ± 2𝑠 ⇒ 59 ± 2(10) ⇒ 59 ± 20 ⇒ (39, 79). The interval for the extended arm group is 𝑥̄ ± 2𝑠 ⇒ 43 ± 2(15) ⇒ 43 ± 30 ⇒ (13, 73). Since these two intervals overlap almost completely, the information does not support the researchers’ theory. There does not appear to be any difference between the two groups.

2.86

a.

Yes. The distribution of the buy-side analysts is fairly flat and skewed to the right. The distribution of the sell-side analysts is more mound-shaped and is not spread out as far as the buy-side distribution. Since the buy-side distribution is more spread out, the variance of the buy-side distribution will be larger than the variance of the sell-side distribution. Because the buy-side distribution is skewed to the right, the mean will be pulled to the right. Thus, the mean of the buy-side distribution will be greater than the mean of the sell-side distribution.

b.

Since the sell-side distribution is fairly mound-shaped, we can use the Empirical Rule. The Empirical Rule says that approximately 95% of the observations will fall within 2 standard deviations of the mean. The interval for the sell-side distribution would be:

𝑥̄ ± 2𝑠 ⇒ −.05 ± 2(.85) ⇒ −.05 ± 1.7 ⇒ (−1.75, 1.65)

Since the buy-side distribution is skewed to the right, we cannot use the Empirical Rule. Thus, we will use Chebyshev’s Rule. We know that at least (1 − 1/𝑘²) of the observations will fall within 𝑘 standard deviations of the mean. If we choose 𝑘 = 4, then (1 − 1/4²) = .9375 or 93.75%. This is very close to the 95% requested in the problem. The interval for the buy-side distribution to contain at least 93.75% of the observations would be:

𝑥̄ ± 4𝑠 ⇒ .85 ± 4(1.93) ⇒ .85 ± 7.72 ⇒ (−6.87, 8.57)

Note: This interval will contain at least 93.75% of the observations. It may contain more than 93.75% of the observations.
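The choice of 𝑘 = 4 for the buy-side distribution can be examined with a small sketch. The function name and the extra calculation of the 𝑘 that gives exactly 95% are illustrative additions, not part of the original solution.

```python
def chebyshev_min_fraction(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of observations lie within k sd (k > 1)."""
    return 1 - 1 / k ** 2

# Buy-side summary statistics used in the solution.
x_bar, s = 0.85, 1.93
k = 4

coverage = chebyshev_min_fraction(k)
interval = (x_bar - k * s, x_bar + k * s)

# Value of k whose Chebyshev bound equals .95: solve 1 - 1/k^2 = .95 for k.
k_for_95 = (1 / (1 - 0.95)) ** 0.5

print(f"k = {k}: at least {coverage:.4f} within ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"k needed for a 95% guarantee: {k_for_95:.3f}")
```

Under Chebyshev's Rule a full 95% guarantee would need 𝑘 ≈ 4.47, so 𝑘 = 4 yields the slightly weaker 93.75% bound quoted above.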

2.87

Since we do not know if the distribution of the heights of the trees is mound-shaped, we need to apply Chebyshev's Rule. We know 𝜇 = 30 and 𝜎 = 3. Therefore, 𝜇 ± 3𝜎 ⇒ 30 ± 3(3) ⇒ 30 ± 9 ⇒ (21, 39). According to Chebyshev's Rule, at least 8/9 = .89 of the tree heights on this piece of land fall within this interval and at most 1/9 = .11 of the tree heights will fall above the interval. However, the buyer will only purchase the land if at least .20 of the tree heights are at least 40 feet tall. Therefore, the buyer should not buy the piece of land.

2.88

Just by looking at the data, one would guess that the distribution center was the Eastern DC, as the mean value of the Eastern DC is very close to the value of 5 days. Because no shape is given for the delivery time distributions, we will use Chebyshev’s Theorem to get an idea of where “most” of the delivery times will fall. Chebyshev’s Theorem states that at least 8/9 of the data values fall within three standard deviations of the mean.

For the Eastern DC, 𝑥̄ ± 3𝑠 ⇒ 5.22 ± 3(.77) ⇒ 5.22 ± 2.31 ⇒ (2.91, 7.53)

For the Western DC, 𝑥̄ ± 3𝑠 ⇒ 6.95 ± 3(.55) ⇒ 6.95 ± 1.65 ⇒ (5.30, 8.60)

It is unlikely that a value of 5 days would come from the Western DC, so we believe the Eastern DC was the most likely center.

2.89

We know 𝜇 = 25 and 𝜎 = .1. Therefore, 𝜇 ± 2𝜎 ⇒ 25 ± 2(.1) ⇒ 25 ± .2 ⇒ (24.8, 25.2). The machine is shut down for adjustment if the contents of two consecutive bags fall more than 2 standard deviations from the mean (i.e., outside the interval (24.8, 25.2)). Therefore, the machine was shut down yesterday at 11:30 (25.23 and 25.25 are outside the interval) and again at 4:00 (24.71 and 25.31 are outside the interval).
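The shutdown rule can be expressed as a simple scan over consecutive bag weights. In the sketch below, the function name and the sequence of weights are hypothetical; only 25.23, 25.25, 24.71, and 25.31 appear in the solution, and the standard deviation of .1 is the value used in the interval calculation above.

```python
def shutdown_points(weights, mu=25.0, sigma=0.1, k=2):
    """Return each index i where bags i-1 and i both fall more than k*sigma from mu."""
    low, high = mu - k * sigma, mu + k * sigma
    outside = [w < low or w > high for w in weights]
    return [i for i in range(1, len(weights)) if outside[i - 1] and outside[i]]

# Hypothetical sequence of bag weights (illustrative). Only the two flagged
# pairs, (25.23, 25.25) and (24.71, 25.31), are taken from the solution.
weights = [25.10, 24.95, 25.23, 25.25, 25.05, 24.90, 24.71, 25.31]

print(shutdown_points(weights))
```

The two flagged positions correspond to the 11:30 and 4:00 shutdowns described in the solution.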

2.90

a.

𝑧 = (𝑥 − 𝑥̄)/𝑠 = 2 (sample)

2 standard deviations above the mean.

b.

𝑧 = (𝑥 − 𝜇)/𝜎 = .5 (population)

.5 standard deviations above the mean.

c.

𝑧 = (𝑥 − 𝜇)/𝜎 = 0 (population)

0 standard deviations from the mean; the value equals the mean.

d.

𝑧 = (𝑥 − 𝑥̄)/𝑠 = −2.5 (sample)

2.5 standard deviations below the mean.

2.91

Using the definition of a percentile:

      Percentile   Percentage Above   Percentage Below
a.    75th         25%                75%
b.    50th         50%                50%
c.    20th         80%                20%
d.    84th         16%                84%

2.92

QL corresponds to the 25th percentile. QM corresponds to the 50th percentile. QU corresponds to the 75th percentile.

2.93

We first compute z-scores for each x value.

a.

𝑧 = (𝑥 − 𝜇)/𝜎 = 2

b.

𝑧 = (𝑥 − 𝜇)/𝜎 = −3

c.

𝑧 = (𝑥 − 𝜇)/𝜎 = −2

d.

𝑧 = (𝑥 − 𝜇)/𝜎 = 1.67

The above z-scores indicate that the x value in part a lies the greatest distance above the mean and the x value of part b lies the greatest distance below the mean.

2.94

Since the element 40 has a z-score of −2 and 90 has a z-score of 3,

−2 = (40 − 𝜇)/𝜎 and 3 = (90 − 𝜇)/𝜎

⇒ −2𝜎 = 40 − 𝜇 ⇒ 𝜇 − 2𝜎 = 40 ⇒ 𝜇 = 40 + 2𝜎

⇒ 3𝜎 = 90 − 𝜇 ⇒ 𝜇 + 3𝜎 = 90

By substitution, 40 + 2𝜎 + 3𝜎 = 90 ⇒ 5𝜎 = 50 ⇒ 𝜎 = 10 and 𝜇 = 40 + 2(10) = 60. Therefore, the population mean is 60 and the standard deviation is 10.

2.95

The mean score of U.S. eighth-graders on a mathematics assessment test is 282. This is the average score. The 25th percentile is 255. This means that 25% of the U.S. eighth-graders score below 255 on the test and 75% score higher. The 75th percentile is 309. This means that 75% of the U.S. eighth-graders score below 309 on the test and 25% score higher. The 90th percentile is 333. This means that 90% of the U.S. eighth-graders score below 333 on the test and 10% score higher.

2.96

a.

𝑧 = (𝑥 − 𝑥̄)/𝑠 = 1.57. A transformer with 400 sags in a week is 1.57 standard deviations above the mean.

b.

𝑧 = (𝑥 − 𝑥̄)/𝑠 = −3.36. A transformer with 100 swells in a week is 3.36 standard deviations below the mean.

2.97

At early career, half of the University of South Florida graduates had a salary less than $49,700 and half had salaries greater than $49,700. At mid-career, half of the University of South Florida graduates had a salary less than $91,300 and half had salaries greater than $91,300.

2.98

a.

From Exercise 2.81, 𝑥̄ = 94.882 and 𝑠 = 3.9725. The z-score for an observation of 79 is:

𝑧 = (𝑥 − 𝑥̄)/𝑠 = (79 − 94.882)/3.9725 = −3.998

This z-score indicates that an observation of 79 is 3.998 standard deviations below the mean. Very few observations will be lower than this one.

b.

The z-score for an observation of 95 is:

𝑧 = (𝑥 − 𝑥̄)/𝑠 = (95 − 94.882)/3.9725 = 0.030

This z-score indicates that an observation of 95 is .030 standard deviations above the mean. This score is not an unusual observation in the data set.

2.99

Since the 90th percentile of the study sample in the subdivision was .00372 mg/L, which is less than the USEPA level of .015 mg/L, the water customers in the subdivision are not at risk of drinking water with unhealthy lead levels.

2.100

The z-score associated with a score of 155 is 𝑧 = (𝑥 − 𝑥̄)/𝑠 = 3.25. This score would not be considered a typical level of support. It is 3.25 standard deviations above the mean. Very few observations would be above this value.

2.101

The average ROE is 13.93. The median ROE is 14.86, meaning 50% of firms have ROE below 14.86. The 5th percentile is −19.64, meaning 5% of firms have ROE below −19.64. The 25th percentile is 7.59, meaning 25% of firms have ROE below 7.59. The 75th percentile is 21.32, meaning 75% of firms have ROE below 21.32. The 95th percentile is 38.42, meaning 95% of firms have ROE below 38.42. The standard deviation is 21.65. Most observations will fall within 2𝑠, or 43.30 units, of the mean. The distribution will be somewhat skewed to the left, as the 5th percentile value is much further from the median than the 95th percentile value.
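The skewness argument rests on comparing how far the 5th and 95th percentiles lie from the median. A minimal sketch of that comparison, using the percentile values quoted above (the variable names are assumptions), is:

```python
# Percentile values quoted in the solution.
p5, median, p95 = -19.64, 14.86, 38.42

lower_tail = median - p5    # distance from the 5th percentile up to the median
upper_tail = p95 - median   # distance from the median up to the 95th percentile

# A longer lower tail suggests skewness to the left; a longer upper tail, to the right.
skew_direction = "left" if lower_tail > upper_tail else "right or symmetric"

print(f"lower tail = {lower_tail:.2f}, upper tail = {upper_tail:.2f}, skewed {skew_direction}")
```

This is only a rough diagnostic based on two percentiles, but it agrees with the conclusion drawn in the solution.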

2.102

a.

Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: 𝑥̄ ± 𝑠 ⇒ 53 ± 15 ⇒ (38, 68) About 95% of all students will score within 2 standard deviations of the mean. This interval is: 𝑥̄ ± 2𝑠 ⇒ 53 ± 2(15) ⇒ 53 ± 30 ⇒ (23, 83) About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: 𝑥̄ ± 3𝑠 ⇒ 53 ± 3(15) ⇒ 53 ± 45 ⇒ (8, 98)

b.

Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is: 𝑥̄ ± 𝑠 ⇒ 39 ± 12 ⇒ (27, 51) About 95% of all students will score within 2 standard deviations of the mean. This interval is: 𝑥̄ ± 2𝑠 ⇒ 39 ± 2(12) ⇒ 39 ± 24 ⇒ (15, 63) About 99.7% of all students will score within 3 standard deviations of the mean. This interval is: 𝑥̄ ± 3𝑠 ⇒ 39 ± 3(12) ⇒ 39 ± 36 ⇒ (3, 75)

c.

The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the red exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student would have taken the red exam.

2.103

a.

From the printout, 𝑥̄ = 394.1 and 𝑠 = 124.7. The gross revenue of $936.66 million has a z-score of

𝑧 = (𝑥 − 𝑥̄)/𝑠 = (936.66 − 394.1)/124.7 = 4.35

Therefore, the highest gross revenue is 4.35 standard deviations above the mean.

b.

The gross revenue of $281.576 million has a z-score of

𝑧 = (𝑥 − 𝑥̄)/𝑠 = (281.576 − 394.1)/124.7 = −0.90

Therefore, the 100th highest gross revenue is 0.90 standard deviations below the mean.

c.

Z-scores were found for all 100 gross revenues and the top and bottom five are shown in the table below. It was verified that 63 movies have negative z-scores and 37 movies have positive z-scores. This would indicate that most of the data will be on the left side of the distribution, with fewer observations on the right side. This would indicate a distribution that is skewed to the right, as can be seen in the histogram below:

2.104

a.

From the problem, 𝜇 = 2.7 and 𝜎 = .5.

𝑧 = (𝑥 − 𝜇)/𝜎 ⇒ 𝑧𝜎 = 𝑥 − 𝜇 ⇒ 𝑥 = 𝜇 + 𝑧𝜎

For z = 2.0, 𝑥 = 2.7 + 2.0(.5) = 3.7

For z = −1.0, 𝑥 = 2.7 − 1.0(.5) = 2.2

For z = .5, 𝑥 = 2.7 + .5(.5) = 2.95

For z = −2.5, 𝑥 = 2.7 − 2.5(.5) = 1.45

b.

For z = −1.6, 𝑥 = 2.7 − 1.6(.5) = 1.9

c.

If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule.

From the Empirical Rule, we know that ≈.025 or ≈2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2). We know that ≈.16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.
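The conversions in parts a and b all apply 𝑥 = 𝜇 + 𝑧𝜎. A small sketch with 𝜇 = 2.7 and 𝜎 = .5 from the exercise (the function name `x_from_z` is an assumption) is:

```python
def x_from_z(z, mu=2.7, sigma=0.5):
    """Convert a z-score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma

# z-scores from parts a and b of the exercise.
z_scores = [2.0, -1.0, 0.5, -2.5, -1.6]
gpas = {z: x_from_z(z) for z in z_scores}

for z, gpa in gpas.items():
    print(f"z = {z:>4}: GPA = {gpa:.2f}")
```

The same function gives the cutoffs used in part c: z = 2 corresponds to a GPA of 3.7 and z = 1 to a GPA of 3.2.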

2.105

a.

Yes. In a symmetric distribution, we would expect a similar number of positive and negative z-scores. The larger the difference in the number of positive and negative z-scores, the more skewed the distribution tends to be. In this case, most of the data have negative z-scores, indicating that most of the data is found on the left-hand side of the distribution.

b.

Not necessarily. Because the distribution is highly skewed to the right, the standard deviation is very large. Remember that the z-score represents the number of standard deviations a score is from the mean. If the standard deviation is very large, then the z-scores for observations somewhat near the mean will appear to be fairly small. If we deleted the schools with the very high productivity scores and recomputed the mean and standard deviation, the standard deviation would be much smaller. Thus, most of the z-scores would be larger because we would be dividing by a much smaller standard deviation. This would imply a bigger spread among the rest of the schools than the original distribution with the few outliers.

2.106

To determine if the measurements are outliers, compute the z-score.

a.

\(z = \frac{x - \bar{x}}{s} = .727\) Since the z-score is less than 3, this would not be an outlier.

b.

\(z = \frac{x - \bar{x}}{s} = -3.273\) Since the z-score is greater than 3 in absolute value, this would be an outlier.

c.

\(z = \frac{x - \bar{x}}{s} = 1.364\) Since the z-score is less than 3, this would not be an outlier.

d.

\(z = \frac{x - \bar{x}}{s} = 3.727\) Since the z-score is greater than 3 in absolute value, this would be an outlier.

2.107

The interquartile range is \(IQR = Q_U - Q_L = 85 - 60 = 25\).
The lower inner fence = \(Q_L - 1.5(IQR) = 60 - 1.5(25) = 22.5\).
The upper inner fence = \(Q_U + 1.5(IQR) = 85 + 1.5(25) = 122.5\).
The lower outer fence = \(Q_L - 3(IQR) = 60 - 3(25) = -15\).
The upper outer fence = \(Q_U + 3(IQR) = 85 + 3(25) = 160\).

With only this information, the box plot would look something like the following:

*

──────────── ──────────────────│ + │────── ────────────

─┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼─── 10 20 30 40 50 60 70 80 90 100 110

The whiskers extend to the inner fences unless no data points are that small or that large. The upper inner fence is 122.5. However, the largest data point is 100, so the whisker stops at 100. The lower inner fence is 22.5. The smallest data point is 18, so the whisker extends to 22.5. Since 18 is between the inner and outer fences, it is designated with a *. We do not know if there is any more than one data point below 22.5, so we cannot be sure that the box plot is entirely correct.
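The fence calculations above follow a fixed recipe, so they can be checked with a short Python sketch. The function names `box_plot_fences` and `classify` are hypothetical helpers introduced here for illustration; the quartiles 60 and 85 and the data points 18 and 100 come from the exercise.

```python
def box_plot_fences(q_lower, q_upper):
    """Compute the IQR and the inner and outer fences from the two quartiles."""
    iqr = q_upper - q_lower
    return {
        "iqr": iqr,
        "inner": (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr),
        "outer": (q_lower - 3.0 * iqr, q_upper + 3.0 * iqr),
    }


def classify(x, fences):
    """Label a point using the inner-fence / outer-fence rule."""
    lo_in, hi_in = fences["inner"]
    lo_out, hi_out = fences["outer"]
    if x < lo_out or x > hi_out:
        return "highly suspect outlier"
    if x < lo_in or x > hi_in:
        return "suspect outlier"
    return "not an outlier"


fences = box_plot_fences(60, 85)
print(fences)                 # IQR and both pairs of fences
print(classify(18, fences))   # smallest data point
print(classify(100, fences))  # largest data point
```

The point 18 lies between the lower outer fence and the lower inner fence, which is why it is marked with a * in the sketch of the box plot.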


2.108

a.

Median is approximately 4.

b.

QL is approximately 3 (Lower Quartile)
QU is approximately 6 (Upper Quartile)

c.

\(IQR = Q_U - Q_L \approx 6 - 3 = 3\)

d.

The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.

e.

50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.

f.

The upper inner fence is \(Q_U + 1.5(IQR) = 6 + 1.5(3) = 10.5\). The upper outer fence is \(Q_U + 3(IQR) = 6 + 3(3) = 15\). Thus, there are two suspect outliers, 12 and 13. There is one highly suspect outlier, 16.

2.109

a.

Using MINITAB, the box plots for samples A and B are:

[Boxplot of Sample A, Sample B; horizontal axis: Data]

b.

In sample A, the measurement 84 is an outlier. This measurement falls outside the lower outer fence.

Lower outer fence = Lower hinge − 3(𝐼𝑄𝑅) ≈ 150 − 3(172 − 150) = 150 − 3(22) = 84
Lower inner fence = Lower hinge − 1.5(𝐼𝑄𝑅) ≈ 150 − 1.5(22) = 117
Upper inner fence = Upper hinge + 1.5(𝐼𝑄𝑅) ≈ 172 + 1.5(22) = 205

In addition, 100 may be an outlier. It lies outside the inner fence.

In sample B, 140 and 206 may be outliers. The point 140 lies outside the inner fence while the point 206 lies right at the inner fence.

Lower outer fence = Lower hinge − 3(𝐼𝑄𝑅) ≈ 168 − 3(184 − 169) = 168 − 3(15) = 123
Lower inner fence = Lower hinge − 1.5(𝐼𝑄𝑅) ≈ 168 − 1.5(15) = 145.5
Upper inner fence = Upper hinge + 1.5(𝐼𝑄𝑅) ≈ 184 + 1.5(15) = 206.5

2.110

a.

Using MINITAB, the descriptive statistics are:

Statistics

Variable            Total Count  Mean   Minimum  Q1     Median  Q3     Maximum  IQR
Academic Rep Score  50           75.92  50.00    64.75  75.00   88.25  100.00   23.50

The median is 75, the lower quartile is 64.75, and the upper quartile is 88.25.

b.

\(IQR = Q_U - Q_L = 88.25 - 64.75 = 23.50\)

c.

Using MINITAB, the boxplot is:

d.

From the MINITAB boxplot, we see that there are no outliers or suspect outliers.

2.111

a.

\(z = \frac{x - \bar{x}}{s} = 1.57\) Since the z-score is less than 2, 400 sags per week would not be considered unusual.

b.

\(z = \frac{x - \bar{x}}{s} = -3.36\) Since the absolute value of the z-score is greater than 3, 100 swells per week would be considered unusual.

2.112

a.

The approximate 25th percentile PASI score before treatment is 10. The approximate median before treatment is 15. The approximate 75th percentile PASI score before treatment is 28.

b.

The approximate 25th percentile PASI score after treatment is 3. The approximate median after treatment is 5. The approximate 75th percentile PASI score after treatment is 7.5.

c.

Since the 75th percentile after treatment is lower than the 25th percentile before treatment, it appears that the ichthyotherapy is effective in treating psoriasis.

2.113

a.

The median is 0.098, meaning 50% of companies have DROS values below 0.098 and 50% have DROS values above 0.098. The lower quartile is 0.053, meaning 25% of companies have DROS values below 0.053 and 75% have DROS values above 0.053. The upper quartile is 0.172, meaning 75% of companies have DROS values below 0.172 and 25% have DROS values above 0.172.

b.

\(IQR = Q_U - Q_L = 0.172 - 0.053 = 0.119\)

c.

0.172 is the upper quartile and 0.053 is the lower quartile. Half the data falls between these two values.

d.

\(z = \frac{x - \bar{x}}{s} = 2.00\)

If nothing is known about the shape of the distribution, then we would use Chebyshev’s Theorem to describe it. Chebyshev’s Theorem indicates that at most 25% of the distribution falls more than two standard deviations above the mean. If the distribution is mound-shaped and symmetric, we would use the Empirical Rule and this answer would change to approximately .025.

2.114

a.

From the printout, \(\bar{x} = 52.334\) and \(s = 9.224\).

The highest salary is 75 (thousand). The z-score is

\(z = \frac{x - \bar{x}}{s} = \frac{75 - 52.334}{9.224} = 2.46\)

Therefore, the highest salary is 2.46 standard deviations above the mean.

The lowest salary is 35.0 (thousand). The z-score is

\(z = \frac{x - \bar{x}}{s} = \frac{35.0 - 52.334}{9.224} = -1.88\)

Therefore, the lowest salary is 1.88 standard deviations below the mean.

The mean salary offer is 52.33 (thousand). The z-score is

\(z = \frac{x - \bar{x}}{s} = \frac{52.334 - 52.334}{9.224} = 0\)

The mean salary offer has a z-score of 0; it is 0 standard deviations from the mean.

No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between −3 and 3. A z-score of 2.46 would not be that unusual. Since no salaries are outside the inner fences, none of them are suspect or highly suspect outliers.
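The "at least 8/9" figure is Chebyshev's bound \(1 - 1/k^2\) evaluated at k = 3. The following Python sketch recomputes that bound together with the salary z-scores from the printout values (mean 52.334, standard deviation 9.224). The function names are hypothetical helpers chosen for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


MEAN, SD = 52.334, 9.224  # salary offers, in thousands, from the printout

print(chebyshev_lower_bound(3))          # fraction within 3 standard deviations
print(round(z_score(75, MEAN, SD), 2))   # highest salary
print(round(z_score(35.0, MEAN, SD), 2)) # lowest salary
```

Because both z-scores are well inside the interval from −3 to 3, neither salary is flagged as unusual by this rule.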

2.115

a.

Using MINITAB, the boxplots for each type of firm are:

[Boxplot of Time by Votes (Joint, None, Prepack); horizontal axis: Time]

b.

The median bankruptcy time for No prefiling firms is about 3.2. The median bankruptcy time for Joint firms is about 1.5. The median bankruptcy time for Prepack firms is about 1.4.

c.

The range of the "Prepack" firms is less than the other two, while the range of the "None" firms is the largest. The interquartile range of the "Prepack" firms is less than the other two, while the interquartile range of the "Joint" firms is larger than the other two.

d.

No. The interquartile range for the "Prepack" firms is the smallest which corresponds to the smallest standard deviation. However, the second smallest interquartile range corresponds to the "None" firms. The second smallest standard deviation corresponds to the "Joint" firms.

e.

Yes. There is evidence of two outliers in the "Prepack" firms. These are indicated by the two *'s. There is also evidence of two outliers in the "None" firms. These are indicated by the two *'s.

2.116

From Exercise 2.100, \(\bar{x} = 67.755\) and \(s = 26.87\). Using MINITAB, a boxplot of the data is:

[Boxplot of Support; horizontal axis: Support]

From the boxplot, the support level of 155 would be an outlier. From Exercise 2.100, we found the z-score associated with a score of 155 as \(z = \frac{x - \bar{x}}{s} = \frac{155 - 67.755}{26.87} = 3.25\). Since this z-score is greater than 3, the observation 155 is considered an outlier.

2.117

a.

Using MINITAB, the boxplot is:

From the boxplot, there appear to be two outliers just below the value 80 (the values 79 and 79).

b.

From Exercise 2.81, \(\bar{x} = 94.882\) and \(s = 3.9725\). Since the data are skewed to the left, we will consider observations more than 3 standard deviations from the mean to be outliers. An observation with a z-score of 3 would have the value:

\(z = \frac{x - \bar{x}}{s} \Rightarrow 3 = \frac{x - 94.882}{3.9725} \Rightarrow 3(3.9725) = x - 94.882 \Rightarrow 11.918 = x - 94.882 \Rightarrow x = 106.800\)

An observation with a z-score of −3 would have the value:

\(z = \frac{x - \bar{x}}{s} \Rightarrow -3 = \frac{x - 94.882}{3.9725} \Rightarrow -3(3.9725) = x - 94.882 \Rightarrow -11.918 = x - 94.882 \Rightarrow x = 82.965\)

Observations greater than 106.800 or less than 82.965 would be considered outliers. Using this criterion, the following observations would be outliers: 79 and 79.

c.

Yes, these methods do agree exactly. Both methods identify two observations as outliers.

2.118

a.

Using MINITAB, the box plot is:

[Boxplot of Downtime; horizontal axis: Downtime]

The median is about 18. The data appear to be skewed to the right since there are 3 suspect outliers to the right and none to the left. The variability of the data is fairly small because the IQR is fairly small, approximately 26 − 10 = 16.

b.

The customers associated with the suspected outliers are customers 268, 269, and 264.

c.

In order to find the z-scores, we must first find the mean and standard deviation.

\(\bar{x} = \frac{\sum x}{n} = \frac{815}{40} = 20.375\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{24129 - \frac{815^2}{40}}{40 - 1} = 192.90705\)

\(s = \sqrt{192.90705} = 13.89\)

The z-scores associated with the suspected outliers, computed as \(z = \frac{x - \bar{x}}{s}\), are:

Customer 268: z = 2.06
Customer 269: z = 2.13
Customer 264: z = 3.14

All the z-scores are greater than 2. These are unusual values.
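The shortcut formula for the sample variance needs only n, the sum of the observations, and the sum of their squares. As a check on the arithmetic above, here is a minimal Python sketch using the totals from this exercise (n = 40, Σx = 815, Σx² = 24129); the function name is a hypothetical helper.

```python
import math


def sample_variance_from_sums(sum_x, sum_x2, n):
    """Shortcut formula: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x2 - sum_x**2 / n) / (n - 1)


n, sum_x, sum_x2 = 40, 815, 24129  # totals from the downtime data

mean = sum_x / n
variance = sample_variance_from_sums(sum_x, sum_x2, n)
std_dev = math.sqrt(variance)

print(mean)                # sample mean
print(round(variance, 5))  # sample variance
print(round(std_dev, 2))   # sample standard deviation
```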

2.119

Using MINITAB, the boxplots of the data are:

[Boxplot of PermA, PermB, PermC; horizontal axis: Data]

The descriptive statistics are:

Descriptive Statistics: PermA, PermB, PermC

Variable  N    Mean    StDev  Minimum  Q1      Median  Q3      Maximum  IQR
PermA     100  73.62   14.48  55.20    62.00   70.45   81.42   122.40   19.42
PermB     100  128.54  21.97  50.40    108.65  139.30  147.02  150.00   38.37
PermC     100  83.07   20.05  52.20    67.72   78.65   95.35   129.00   27.63

a.

For group A, the suspect outliers are any observations greater than \(Q_U + 1.5(IQR) = 81.42 + 1.5(19.42) = 110.55\) or less than \(Q_L - 1.5(IQR) = 62 - 1.5(19.42) = 32.87\). There are 3 observations greater than 110.55: 117.3, 118.5, and 122.4.

b.

For group B, the suspect outliers are any observations greater than \(Q_U + 1.5(IQR) = 147.02 + 1.5(38.37) = 204.575\) or less than \(Q_L - 1.5(IQR) = 108.65 - 1.5(38.37) = 51.095\). There is 1 observation less than 51.095: 50.4.

c.

For group C, the suspect outliers are any observations greater than \(Q_U + 1.5(IQR) = 95.35 + 1.5(27.63) = 136.795\) or less than \(Q_L - 1.5(IQR) = 67.72 - 1.5(27.63) = 26.275\). No observations are greater than 136.795 or less than 26.275.

d.

For group A, if the outliers are removed, the mean will decrease, the median will slightly decrease, and the standard deviation will decrease. For group B, if the outlier is removed, the mean will increase, the median will slightly increase, and the standard deviation will decrease.

2.120

For Perturbed Intrinsics, but no Perturbed Projections:

\(\bar{x} = \frac{\sum x}{n} = 1.62\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = .627\)

\(s = \sqrt{s^2} = \sqrt{.627} = .792\)

The z-score corresponding to a value of 4.5 is \(z = \frac{x - \bar{x}}{s} = 3.63\)

Since this z-score is greater than 3, we would consider this an outlier for perturbed intrinsics, but no perturbed projections.

For Perturbed Projections, but no Perturbed Intrinsics:

\(\bar{x} = \frac{\sum x}{n} = 25.16\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = 46.243\)

\(s = \sqrt{s^2} = \sqrt{46.243} = 6.800\)

The z-score corresponding to a value of 4.5 is \(z = \frac{x - \bar{x}}{s} = \frac{4.5 - 25.16}{6.800} = -3.038\)

Since this z-score is less than −3, we would consider this an outlier for perturbed projections, but no perturbed intrinsics.

Since the z-score corresponding to 4.5 for the perturbed projections, but no perturbed intrinsics is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.

2.121

From the stem-and-leaf display in Exercise 2.34, the data are fairly mound-shaped, but skewed somewhat to the right.

The sample mean is \(\bar{x} = \frac{\sum x}{n} = 59.72\).

The sample variance is \(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = 321.7933\).

The sample standard deviation is \(s = \sqrt{321.7933} = 17.9386\).

The z-score associated with the largest value is \(z = \frac{x - \bar{x}}{s} = 2.36\).

This observation is a suspect outlier. The observations associated with the one-time customers are 5 of the largest 7 observations. Thus, repeat customers tend to have shorter delivery times than one-time customers.

2.122

Using MINITAB, a scatterplot of the data is:

[Scatterplot of Var 2 vs Var 1]

2.123

Using MINITAB, the scatterplot is:

[Scatterplot of Var 2 vs Var 1]

2.124

Using MINITAB, the scatterplot is:

[Scatterplot of RATIO vs DIAMETER]

It appears that as the pipe diameter increases, the ratio of repair to replacement cost increases.

2.125

From the scatterplot of the data, it appears that as the number of punishments increases, the average payoff decreases. Thus, there appears to be a negative linear relationship between punishment use and average payoff. This supports the researchers’ conclusion that “winners don’t punish.”

2.126

Using MINITAB, the scatterplot of the data is:

[Scatterplot of Catch vs Search]

There is an apparent negative linear trend between the search frequency and the total catch. As the search frequency increases, the total catch tends to decrease.

2.127

Using MINITAB, a scattergram of the data is:

[Scatterplot of SLUGPCT vs ELEVATION]

If we include the observation from Denver, then we would say there might be a linear relationship between slugging percentage and elevation. If we eliminated the observation from Denver, it appears that there might not be a relationship between slugging percentage and elevation.

2.128

Using MINITAB, the scatterplot of the data is:

There appears to be a positive linear trend between the Math SAT scores in 2017 and the Math SAT scores in 2019. As the 2017 Math SAT scores increase, the 2019 Math SAT scores also tend to increase.

2.129

Using MINITAB, the scatterplot of the data is:

[Scatterplot of Number vs Hour]

There appears to be a positive linear trend to the data. As the hours increase, the number of accidents tends to increase.

2.130

Using MINITAB, the scatterplot of the data is:

[Scatterplot of Mass vs Time]

There is evidence to indicate that the mass of the spill tends to diminish as time increases. As time is getting larger, the mass is decreasing.

2.131

a.

Using MINITAB, the scatterplot of the data is:

There is evidence to indicate that as perceived adaptation to the guests’ tastes increases, the perceived level of traditionalism decreases.

b.

One restaurant sticks out as going against the trend. Restaurant #7 has an adaptation value of 9 and an unexpectedly large traditionalism value of 8.

2.132

Using MINITAB, the scatterplot of the data is:

There is a positive trend to the data. As operating income increases, the 2019 value also tends to increase. Since the trend is positive, we would recommend that an NFL executive use operating income to predict a team’s current value.

2.133

a.

Using MINITAB, the scatterplot of the data is:

There does not appear to be a relationship between a CEO’s ratio of salary to worker pay and the CEO’s age.

b.

Using MINITAB, the descriptive statistics are:

Statistics

Variable   Total Count  Mean  StDev  Minimum  Q1   Median  Q3   Maximum  IQR
Pay Ratio  50           1477  5696   64       270  468     681  40668    411

Using the interquartile range, the highly suspect outliers are any observations greater than \(Q_U + 3(IQR) = 681.8 + 3(411) = 1{,}914.8\) or less than \(Q_L - 3(IQR) = 270 - 3(411) = -963\). There are five highly suspect outliers: 2,159, 2,447, 2,450, 3,168, and 40,668.

Using the z-score, any observation more than 3 standard deviations above or below the mean is a highly suspect outlier. Three standard deviations above the mean is:

\(z = \frac{x - \bar{x}}{s} \Rightarrow 3 = \frac{x - 1{,}477}{5{,}696} \Rightarrow 3(5{,}696) = x - 1{,}477 \Rightarrow 17{,}088 = x - 1{,}477 \Rightarrow x = 18{,}565\)

Three standard deviations below the mean results in a very large negative number, so we use the value 0. Using this method, there is one highly suspect outlier: 40,668.

c.

Removing the observation 40,668, the scatterplot of the data is:

By removing the one highly suspect outlier, there still does not appear to be a relationship between the two variables.


2.134

Using MINITAB, a scatterplot of the data is:

His concern may be a valid one. From the scatterplot, there appears to be a relatively weak negative relationship between accuracy and driving distance. As driving distance increases, the driving accuracy tends to decrease. While not a strong relationship, it still should be of concern to the professional golfer.

2.135

One way the bar graph can mislead the viewer is that the vertical axis has been cut off. Instead of starting at 0, the vertical axis starts at 12. Another way the bar graph can mislead the viewer is that as the bars get taller, the widths of the bars also increase.

2.136

a.

If you work for Volkswagen, you would choose to use the median number of deaths because this is much lower than the mean. The data are skewed to the right, so the median would probably be a better representation of the middle of the distribution.

b.

If you support an environmental watch group, you would choose to use the mean number of deaths because this is much greater than the median. The average number of deaths is much higher than the median number of deaths.

2.137

a.

The graph might be misleading because the scales on the vertical axes are different. The left vertical axis ranges from 0 to $120 million. The right vertical axis ranges from 0 to $20 billion.

b.

Using MINITAB, the redrawn graph is:

[Time Series Plot of Craigslist, NewspaperAds; horizontal axis: Year (2003 to 2009); vertical axis: Data]

Although the amount of revenue produced by Craigslist has increased dramatically from 2003 to 2009, it is still much smaller than the revenue produced by newspaper ad sales.

2.138

a.

This graph is misleading because it looks like as the days are increasing, the number of barrels collected per day are also increasing. However, the bars are the cumulative number of barrels collected. The cumulative value can never decrease.

b.

Using MINITAB, the graph of the daily collection of oil is:

[Chart of Barrells; horizontal axis: Day (May-16 to May-23); vertical axis: Barrells]

From this graph, it shows that there has not been a steady improvement in the suctioning process. There was an increase for 3 days, then a leveling off for 3 days, then a decrease.

2.139

The relative frequency histogram is:

[Histogram of Class; horizontal axis: Measurement Class (1.125 to 8.625); vertical axis: Relative frequency]

2.140

The mean is sensitive to extreme values in a data set. Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.
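A small numerical illustration of this point, written as a Python sketch. The data values below are hypothetical and were chosen only to show the effect of a single extreme observation; they do not come from any exercise in the text.

```python
import statistics

# Hypothetical, roughly symmetric data set
base = [10, 12, 13, 15, 16]

# The same data with one extreme value added on the right
skewed = base + [90]

print(statistics.mean(base), statistics.median(base))
print(statistics.mean(skewed), statistics.median(skewed))
```

Adding the single value 90 roughly doubles the mean, while the median moves by only one unit, which is why the median is the more representative center for skewed data.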

2.141

a.

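\(z = \frac{x - \bar{x}}{s} = -1\), \(z = 1\), \(z = 2\)

b.

\(z = \frac{x - \bar{x}}{s} = 0\), \(z = 4\), \(z = 6\)

c.

\(z = \frac{x - \bar{x}}{s} = 1\), \(z = 3\), \(z = 4\)

d.

\(z = \frac{x - \bar{x}}{s} = .1\), \(z = .3\), \(z = .4\)

2.142

a.

If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 2.141, the z-score corresponding to 50 is −1, the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

b.

From Exercise 2.141, the z-score corresponding to 50 is −2, the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.

c.

From Exercise 2.141, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.

d.

From Exercise 2.141, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

2.143

a.

\(\sum x = 13 + 1 + 10 + 3 + 3 = 30\)

\(\sum x^2 = 13^2 + 1^2 + 10^2 + 3^2 + 3^2 = 288\)

\(\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{288 - \frac{30^2}{5}}{5 - 1} = \frac{108}{4} = 27\)

\(s = \sqrt{27} = 5.20\)

b.

\(\sum x = 13 + 6 + 6 + 0 = 25\)

\(\sum x^2 = 13^2 + 6^2 + 6^2 + 0^2 = 241\)

\(\bar{x} = \frac{\sum x}{n} = \frac{25}{4} = 6.25\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{241 - \frac{25^2}{4}}{4 - 1} = \frac{84.75}{3} = 28.25\)

\(s = \sqrt{28.25} = 5.32\)

c.

\(\sum x = 1 + 0 + 1 + 10 + 11 + 11 + 15 = 49\)

\(\sum x^2 = 1^2 + 0^2 + 1^2 + 10^2 + 11^2 + 11^2 + 15^2 = 569\)

\(\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{569 - \frac{49^2}{7}}{7 - 1} = \frac{226}{6} = 37.67\)

\(s = \sqrt{37.67} = 6.14\)

d.

\(\sum x = 3 + 3 + 3 + 3 = 12\)

\(\sum x^2 = 3^2 + 3^2 + 3^2 + 3^2 = 36\)

\(\bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{36 - \frac{12^2}{4}}{4 - 1} = \frac{0}{3} = 0\)

\(s = \sqrt{0} = 0\)

2.144

a.

\(\sum x = 4 + 6 + 6 + 5 + 6 + 7 = 34\)

\(\sum x^2 = 4^2 + 6^2 + 6^2 + 5^2 + 6^2 + 7^2 = 198\)

\(\bar{x} = \frac{\sum x}{n} = \frac{34}{6} = 5.67\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{198 - \frac{34^2}{6}}{6 - 1} = \frac{5.3333}{5} = 1.0667\)

\(s = \sqrt{1.067} = 1.03\)

b.

\(\sum x = (-1) + 4 + (-3) + 0 + (-3) + (-6) = -9\)

\(\sum x^2 = (-1)^2 + 4^2 + (-3)^2 + 0^2 + (-3)^2 + (-6)^2 = 71\)

\(\bar{x} = \frac{\sum x}{n} = \frac{-9}{6} = -\$1.5\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{71 - \frac{(-9)^2}{6}}{6 - 1} = \frac{57.5}{5} = 11.5\) dollars squared

\(s = \sqrt{11.5} = \$3.39\)

c.

\(\sum x = 2.0625\)

\(\sum x^2 = 1.2039\)

\(\bar{x} = \frac{\sum x}{n} = \frac{2.0625}{5} = .4125\%\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1.2039 - \frac{2.0625^2}{5}}{5 - 1} = .0883\) % squared

\(s = \sqrt{.0883} = .30\%\)

d.

(a) Range = 7 − 4 = 3

(b) Range = $4 − ($−6) = $10

(c) Range = largest measurement − smallest measurement = .7375%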
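The hand calculations for the four data sets in Exercise 2.143 can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n − 1 divisor as the shortcut formula. This is a verification sketch only; the dictionary layout is an arbitrary choice made for the illustration.

```python
import statistics

# Data sets from Exercise 2.143, parts a through d
data_sets = {
    "a": [13, 1, 10, 3, 3],
    "b": [13, 6, 6, 0],
    "c": [1, 0, 1, 10, 11, 11, 15],
    "d": [3, 3, 3, 3],
}

summary = {}
for label, xs in data_sets.items():
    # Sample mean, sample variance (n - 1 divisor), and sample standard deviation
    summary[label] = (
        float(statistics.mean(xs)),
        float(statistics.variance(xs)),
        float(statistics.stdev(xs)),
    )
    mean, var, sd = summary[label]
    print(f"{label}: mean = {mean:.2f}, s^2 = {var:.2f}, s = {sd:.2f}")
```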

2.145

The range is found by taking the largest measurement in the data set and subtracting the smallest measurement. Therefore, it only uses two measurements from the whole data set. The standard deviation uses every measurement in the data set. Therefore, it takes every measurement into account—not just two. The range is affected by extreme values more than the standard deviation.

2.146

\(\sigma \approx \frac{\text{range}}{4} = 5\)

2.147

Using MINITAB, the scatterplot is:

[Scatterplot of Var 2 vs Var 1]

2.148

a.

Since the data are mound-shaped and symmetric, we know from the Empirical Rule that approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval will be: \(\bar{x} \pm 2s \Rightarrow 39 \pm 2(6) \Rightarrow 39 \pm 12 \Rightarrow (27, 51)\).

b.

We know that approximately .05 of the observations will fall outside the range 27 to 51. Since the distribution of scores is symmetric, we know that half of the .05 or .025 will fall above 51.

c.

We know from the Empirical Rule that approximately 99.7% (essentially all) of the observations will fall within 3 standard deviations of the mean. This interval is: \(\bar{x} \pm 3s \Rightarrow 39 \pm 3(6) \Rightarrow 39 \pm 18 \Rightarrow (21, 57)\).

d.

The z-score is \(z = \frac{x - \bar{x}}{s} = \frac{30 - 39}{6} = -1.5\). A score of 30 is 1.5 standard deviations below the mean.

e.

If 5% of the drug dealers have WR scores above 49, then 95% will have WR scores below 49. Thus, 49 will be the 95th percentile.

2.149

a.

𝑝 =

= .615

b.

𝑝 =

= .328

c.

𝑝 =

= .057

d.

.615(360) = 221.4, .328(360) = 118.1, .057(360) = 20.5

e.

Using MINITAB, the pie chart is:

[Pie Chart of Location; categories: Urban 61.5%, Suburban 32.8%, Rural 5.7%]

f.

61.5% of the STEM participants are from urban areas, 32.8% are from suburban areas, and 5.7% are from rural areas.

g.

Using MINITAB, the bar chart is:

[Bar chart of Loc (Urban, Suburban, Rural); vertical axis: Percent. Percent is calculated within all data.]

Both charts give the same information.

2.150

a.

To find relative frequencies, we divide the frequencies of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:

Management System Cause Category   Number of Incidents   Relative Frequencies
Engineering & Design               27                    27 / 83 = .325
Procedures & Practices             24                    24 / 83 = .289
Management & Oversight             22                    22 / 83 = .265
Training & Communication           10                    10 / 83 = .120
TOTAL                              83                    1

b.

The Pareto diagram is:

[Pareto diagram of Management System Cause Category (Eng&Des, Proc&Pract, Mgmt&Over, Trn&Comm); vertical axis: Percent]

c.

The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.

2.151

a.

The mean years of experience is \(\bar{x} = \frac{\sum x}{n} = \frac{303}{17} = 17.824\). The average number of years of experience is 17.824 years.

b.

To find the median, we first arrange the data in order from lowest to highest:

3 5 6 9 10 10 10 15 20 20 25 25 25 30 30 30 30

Since there are an odd number of observations, the median is the middle number which is 20. Half of interviewees have less than 20 years of experience.

2.152

c.

The mode is 30. More interviewees had 30 years of experience than any other value.
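The three measures of central tendency for the years-of-experience data can be confirmed with Python's standard `statistics` module. This is a verification sketch; the variable names are arbitrary, and the 17 observations are the ordered values listed in part b.

```python
import statistics

# Ordered years of experience for the 17 interviewees
years = [3, 5, 6, 9, 10, 10, 10, 15, 20, 20, 25, 25, 25, 30, 30, 30, 30]

mean_years = statistics.mean(years)      # sum of the values divided by n
median_years = statistics.median(years)  # middle (9th) value of the 17
mode_years = statistics.mode(years)      # most frequently occurring value

print(round(mean_years, 3), median_years, mode_years)
```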

a.

The data are time series data because the numbers of bankruptcies were collected over a period of 8 years.

b.

Using MINITAB, the time series plot is:

c.

There is a strong decreasing trend in the number of bankruptcies as the years increase. We also see that the trend is more curvilinear than linear.

2.153

a.

Using MINITAB, the pie chart is:

[Pie Chart of Drivstar; categories: 2 (4.1%), 3 (17.3%), 4 (60.2%), 5 (18.4%)]

b.

The average driver’s severity of head injury in head-on collisions is 603.7.

c.

Since the mean and median are close in value, the data should be fairly symmetric. Thus, we can use the Empirical Rule. We know that about 95% of all observations will fall within 2 standard deviations of the mean. This interval is 𝑥̄ ± 2𝑠 ⇒ 603.7 ± 2(185.4) ⇒ 603.7 ± 370.8 ⇒ (232.9,974.5) Most of the head-injury ratings will fall between 232.9 and 974.5.

d.

The z-score would be: \(z = \frac{x - \bar{x}}{s} = -1.06\). Since the absolute value is not very big, this is not an unusual value to observe.

2.154

a.

The sample mean is: \(\bar{x} = \frac{\sum x}{n} = \frac{37.62}{20} = 1.881\)

The sample average surface roughness of the 20 observations is 1.881.

b.

The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are:

1.06 1.09 1.19 1.26 1.27 1.40 1.51 1.72 1.95 2.03 2.05 2.13 2.13 2.16 2.24 2.31 2.41 2.50 2.57 2.64

The 10th and 11th observations are 2.03 and 2.05. The median is:

\(\frac{2.03 + 2.05}{2} = \frac{4.08}{2} = 2.04\)

The middle surface roughness measurement is 2.04. Half of the sample measurements were less than 2.04 and half were greater than 2.04.

c.

The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.

2.155

a.

Using MINITAB, a Pareto diagram for the data is:

[Pareto chart of Defects (Body, Accessories, Electrical, Transmission, Engine); vertical axis: Frequency]

The most frequently observed defect is a body defect.

b.

Using MINITAB, a Pareto diagram for the Body Defect data is:

[Pareto chart of Body Defects (Paint, Dents, Upholstery, Windshield, Chrome); vertical axis: Frequency]

Most body defects are either paint or dents. These two categories account for (30 + 25)/70 = 55/70 = .786 of all body defects. Since these two categories account for so much of the body defects, it would seem appropriate to target these two types of body defects for special attention.

2.156

The percentile ranking of the age of 25 years would be 100% − 76% = 24%. Thus, an age of 25 would correspond to the 24th percentile.

2.157

a.

The mean amount exported on the printout is 653. This means that the average amount of money per market from exporting sparkling wine was $653,000.

b.

The median amount exported on the printout is 231. Since the median is the middle value, this means that half of the 30 sparkling wine export values were above $231,000 and half of the sparkling wine export values were below $231,000.

c.

The mean 3-year percentage change on the printout is 481. This means that in the last three years, the average change is 481%, which indicates a large increase.

d.

The median 3-year percentage change on the printout is 156. Since the median is the middle value, this means that half, or 15 of the 30 countries’ 3-year percentage change values were above 156% and half, or 15 of the 30 countries’ 3-year percentage change values were below 156%.

e.

The range is the difference between the largest observation and the smallest observation. From the printout, the largest observation is $4,852 thousand and the smallest observation is $70 thousand. The range is: 𝑅 = $4,852 − $70 = $4,782 thousand

f.

From the printout, the standard deviation is s = $1,113 thousand.

g.

The variance is the standard deviation squared. The variance is: \(s^2 = 1{,}113^2 = 1{,}238{,}769\) million dollars squared

2.158

h.

We would expect an export amount to fall within 2 standard deviations of the mean, or \(\bar{x} \pm 2s \Rightarrow 653 \pm 2(1{,}113) \Rightarrow 653 \pm 2{,}226 \Rightarrow (-1{,}573,\ 2{,}879)\). Since the exports cannot be negative, the interval would be (0, 2,879).
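The range, variance, and two-standard-deviation interval in parts e, g, and h all follow from four printout values, so they are easy to recompute. The Python sketch below does so with the figures quoted above (in thousands of dollars); the function name `summarize_spread` is a hypothetical helper.

```python
def summarize_spread(largest, smallest, mean, sd):
    """Return the range, the variance, and the mean +/- 2 sd interval truncated at 0."""
    data_range = largest - smallest
    variance = sd**2
    lower, upper = mean - 2 * sd, mean + 2 * sd
    # Export amounts cannot be negative, so the lower limit is truncated at 0
    return data_range, variance, (max(lower, 0), upper)


# Printout values, in thousands of dollars
data_range, variance, interval = summarize_spread(
    largest=4852, smallest=70, mean=653, sd=1113
)

print(data_range)  # range in thousands of dollars
print(variance)    # variance in (thousands of dollars) squared
print(interval)    # interval expected to contain most export amounts
```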

a.

Using MINITAB, the pie charts are:

[Pie Chart of COLOR, CLARITY]

COLOR: D 16 (5.2%), E 44 (14.3%), F 82 (26.6%), G 65 (21.1%), H 61 (19.8%), I 40 (13.0%)
CLARITY: IF 44 (14.3%), VS1 81 (26.3%), VS2 53 (17.2%), VVS1 52 (16.9%), VVS2 78 (25.3%)

The F color occurs the most often with 26.6%. The clarity that occurs the most is VS1 with 26.3%. The D color occurs the least often with 5.2%. The clarity that occurs the least is IF with 14.3%.

b.

Using MINITAB, the relative frequency histogram is:

[Histogram of CARAT. Vertical axis: Frequency (0 to 60); horizontal axis: CARAT (0.30 to 1.05)]

c.

Using MINITAB, the relative frequency histogram for the GIA group is:

[Histogram of CARAT, CERT = GIA. Vertical axis: Percent (0 to 30); horizontal axis: CARAT (0.30 to 1.05)]

d.

Using MINITAB, the relative frequency histograms for the HRD and IGI groups are:

[Histogram of CARAT, CERT = HRD, and Histogram of CARAT, CERT = IGI. Vertical axes: Percent (0 to 40); horizontal axes: CARAT (0.30 to 1.05)]

e.

The HRD group does not assess any diamonds less than .5 carats and almost 40% of the diamonds they assess are 1.0 carat or higher. The IGI group does not assess very many diamonds over .5 carats and more than half are .3 carats or less. More than half of the diamonds assessed by the GIA group are more than .5 carats, but the sizes are less than those of the HRD group.


f.

The sample mean is: 𝑥̄ = Σ𝑥/𝑛 = .631, where 𝑛 = 308.

The average number of carats for the 308 diamonds is .631.

g.

The median is the average of the middle two observations once they have been ordered. The 154th and 155th observations are .62 and .62. The average of these two observations is .62. Half of the diamonds weigh less than .62 carats and half weigh more.
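The position rule used for the median can be sketched in a few lines of Python. This is only an illustration: the function names `median_positions` and `median_from_sorted` are hypothetical, and the short list at the end is made-up data, not the diamond data set.

```python
def median_positions(n):
    """Return the 1-based positions of the middle observation(s)
    in an ordered sample of size n."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)


def median_from_sorted(values):
    """Average the middle observation(s) of an already ordered list."""
    positions = median_positions(len(values))
    middle = [values[p - 1] for p in positions]
    return sum(middle) / len(middle)


# With n = 308 diamonds the middle two observations are the 154th and 155th.
print(median_positions(308))

# Illustrative (made-up) ordered sample whose middle two values are both .62.
print(median_from_sorted([0.50, 0.62, 0.62, 0.70]))
```

For an even sample size the two middle positions are n/2 and n/2 + 1, which is why the 154th and 155th ordered observations are averaged here.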

h.

The mode is 1.0. This observation occurred 32 times.

i.

Since the mean and median are close in value, either could be a good descriptor of central tendency.

j.

From Chebyshev’s Theorem, we know that at least ¾ or 75% of all observations will fall within 2 standard deviations of the mean. From part f, 𝑥̄ = .631. The variance is: 𝑠² = [Σ𝑥² − (Σ𝑥)²/𝑛]/(𝑛 − 1) = .0768 square carats

The standard deviation is: 𝑠 = √𝑠² = √.0768 = .277 carats

This interval is: 𝑥̄ ± 2𝑠 ⇒ .631 ± 2(.277) ⇒ .631 ± .554 ⇒ (.077, 1.185)

k.

Using MINITAB, the scatterplot is:

[Scatterplot of PRICE vs CARAT. Vertical axis: PRICE (0 to 18000); horizontal axis: CARAT (0.2 to 1.1)]

As the number of carats increases, the price of the diamond tends to increase. There appears to be an upward trend.

2.159

a.

Using MINITAB, a bar graph of the data is:

[Chart of Cause. Vertical axis: Count (0 to 12); horizontal axis: Cause (Collision, Fire, Grounding, HullFail, Unknown)]

Fire and grounding are the two most likely causes of puncture.

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Spillage

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Spillage   42   66.19   56.05   25.00     32.00   43.00    77.50   257.00

The mean spillage amount is 66.19 thousand metric tons, while the median is 43.00. Since the median is so much smaller than the mean, it indicates that the data are skewed to the right. The standard deviation is 56.05. Again, since this value is so close to the value of the mean, it indicates that the data are skewed to the right.

Since the data are skewed to the right, we cannot use the Empirical Rule to describe the data. Chebyshev’s Rule can be used. Using Chebyshev’s Rule, we know that at least 8/9 of the observations will fall within 3 standard deviations of the mean.

𝑥̄ ± 3𝑠 ⇒ 66.19 ± 3(56.05) ⇒ 66.19 ± 168.15 ⇒ (−101.96, 234.34) or (0, 234.34) since we cannot have negative spillage.

Thus, at least 8/9 of all oil spills will be between 0 and 234.34 thousand metric tons.
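As a cross-check of the Chebyshev interval above, the calculation can be sketched in Python. The function name `chebyshev_interval` and its `lower_bound` parameter are hypothetical helpers introduced for illustration; only the mean (66.19) and standard deviation (56.05) come from the printout.

```python
def chebyshev_interval(mean, std, k, lower_bound=None):
    """Return (low, high, min_fraction) for mean ± k·std.

    Chebyshev's Rule guarantees at least 1 − 1/k² of the observations
    fall inside the interval, for any k > 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    low = mean - k * std
    high = mean + k * std
    if lower_bound is not None:
        # Truncate when the variable cannot go below a known bound
        # (spillage cannot be negative).
        low = max(low, lower_bound)
    return low, high, 1 - 1 / k**2


low, high, fraction = chebyshev_interval(66.19, 56.05, k=3, lower_bound=0.0)
print(f"({low:.2f}, {high:.2f}) holds at least {fraction:.3f} of the spills")
```

Truncating the lower end at 0 does not change the guarantee, because no observation can fall in the part of the interval that was removed.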

2.160

Using MINITAB, a pie chart of the data is:

[Pie Chart of Recoded defect: False 449 (90.2%), True 49 (9.8%)]

A response of ‘true’ means the software contained defective code. Thus, only 9.8% of the modules contained defective software code.

2.161

a.

Since no information is given about the distribution of the velocities of the Winchester bullets, we can only use Chebyshev's Rule to describe the data. We know that at least 3/4 of the velocities will fall within the interval:

𝑥̄ ± 2𝑠 ⇒ 936 ± 2(10) ⇒ 936 ± 20 ⇒ (916, 956)

Also, at least 8/9 of the velocities will fall within the interval:

𝑥̄ ± 3𝑠 ⇒ 936 ± 3(10) ⇒ 936 ± 30 ⇒ (906, 966)
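The same reasoning can be pushed one step further for the velocity of 1,000 considered in part b. The sketch below is an illustration only: `z_score` and `chebyshev_max_outside` are hypothetical helper names, and the bound it prints is the general Chebyshev limit, not a figure quoted in the solution.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


def chebyshev_max_outside(k):
    """Chebyshev's Rule: at most 1/k² of the observations lie more than
    k standard deviations from the mean (useful for k > 1)."""
    return 1 / k**2


mean, std = 936.0, 10.0          # Winchester mean and standard deviation
z = z_score(1000, mean, std)     # how far a velocity of 1,000 is from the mean
bound = chebyshev_max_outside(z)

print(f"z = {z:.1f}; at most {bound:.1%} of velocities are this far from the mean")
```

Because the bound holds for any distribution, it supports the conclusion without assuming the velocities are mound-shaped.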

b.

Since a velocity of 1,000 is much larger than the largest value in the second interval in part a, it is very unlikely that the bullet was manufactured by Winchester.

2.162

a.

First, we must compute the total processing times by adding the processing times of the three departments. The total processing times are as follows:

Request   Total Processing Time     Request   Total Processing Time     Request   Total Processing Time
1         13.3                      17        19.4*                     33        23.4*
2         5.7                       18        4.7                       34        14.2
3         7.6                       19        9.4                       35        14.3
4         20.0*                     20        30.2                      36        24.0*
5         6.1                       21        14.9                      37        6.1
6         1.8                       22        10.7                      38        7.4
7         13.5                      23        36.2*                     39        17.7*
8         13.0                      24        6.5                       40        15.4
9         15.6                      25        10.4                      41        16.4
10        10.9                      26        3.3                       42        9.5
11        8.7                       27        8.0                       43        8.1
12        14.9                      28        6.9                       44        18.2*
13        3.4                       29        17.2*                     45        15.3
14        13.6                      30        10.2                      46        13.9
15        14.6                      31        16.0                      47        19.9*
16        14.4                      32        11.5                      48        15.4
                                                                        49        14.3*
                                                                        50        19.0


The stem-and-leaf displays with the appropriate leaves highlighted are as follows:

Stem-and-leaf of Mkt
Leaf Unit = 0.10

   6    0   112446
   7    1   3
  14    2   0024699
  16    3   25
  22    4   001577
 (10)   5   0344556889
  18    6   0002224799
   8    7   0038
   4    8   07
   2    9
   2   10   0
   1   11   0

Stem-and-leaf of Engr
Leaf Unit = 0.10

   7    0   4466699
  14    1   3333788
  19    2   12246
  23    3   1568
  (5)   4   24688
  22    5   233
  19    6   01239
  14    7   22379
   9    8
   9    9   66
   7   10   0
   6   11   3
   5   12   023
   2   13   0
   1   14   4

Stem-and-leaf of Accnt
Leaf Unit = 0.10

  19    0   1111111111122333444
  (8)   0   55556888
  23    1   00
  21    1   79
  19    2   0023
  15    2
  15    3   23
  13    3   78
  11    4
  11    4
  11    5
  11    5   8
  10    6   2
   9    6
   9    7   0
   8    7
   8    8   4
        HI  99, 105, 135, 144, 182, 220, 300

Stem-and-leaf of Total
Leaf Unit = 1.00

   1    0   1
   3    0   33
   5    0   45
  11    0   666677
  17    0   888999
  21    1   0000
  (5)   1   33333
  24    1   4444445555
  14    1   6677
  10    1   8999
   6    2   0
   5    2   3
   4    2   44
        HI  30, 36

Of the 50 requests, 10 were lost. For each of the three departments, the processing times for the lost requests are scattered throughout the distributions. The processing times for the departments do not appear to be related to whether the request was lost or not. However, the total processing times for the lost requests appear to be clustered towards the high side of the distribution. It appears that if the total processing time could be kept under 17 days, 76% of the data could be maintained, while reducing the number of lost requests to 1.

b.

For the Marketing department, if the maximum processing time was set at 6.5 days, 78% of the requests would be processed, while reducing the number of lost requests by 4. For the Engineering department, if the maximum processing time was set at 7.0 days, 72% of the requests would be processed, while reducing the number of lost requests by 5. For the Accounting department, if the maximum processing time was set at 8.5 days, 86% of the requests would be processed, while reducing the number of lost requests by 5.

c.

Using MINITAB, the summary statistics are:

Descriptive Statistics: REQUEST, MARKET, ENGINEER, ACCOUNT

Variable    N    Mean     StDev   Minimum   Q1      Median   Q3       Maximum
MARKET      50   4.766    2.584   0.100     2.825   5.400    6.250    11.000
ENGINEER    50   5.044    3.835   0.400     1.775   4.500    7.225    14.400
ACCOUNT     50   3.652    6.256   0.100     0.200   0.800    3.725    30.000
TOTAL       50   13.462   6.820   1.800     8.075   13.750   16.600   36.200

d.

The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:

Marketing: 𝑧 = (𝑥 − 𝑥̄)/𝑠 = (6.5 − 4.77)/2.58 = .67

Engineering: 𝑧 = (𝑥 − 𝑥̄)/𝑠 = (7.0 − 5.04)/3.84 = .51

Accounting: 𝑧 = (𝑥 − 𝑥̄)/𝑠 = (8.5 − 3.65)/6.26 = .77

Total: 𝑧 = (𝑥 − 𝑥̄)/𝑠 = (17 − 13.46)/6.82 = .52

e.

To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, 𝑥̄, and s into the z formula and solve for x.

𝑧 = (𝑥 − 𝑥̄)/𝑠 ⇒ 𝑥 − 𝑥̄ = 𝑧𝑠 ⇒ 𝑥 = 𝑥̄ + 𝑧𝑠

Marketing:

𝑥 = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51 None of the orders exceed this time.

Engineering:

𝑥 = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56 None of the orders exceed this time.

These both agree with both the Empirical Rule and Chebyshev's Rule.

Accounting:

𝑥 = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43 One of the orders exceeds this time or 1/50 = .02.

Total:

𝑥 = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92 One of the orders exceeds this time or 1/50 = .02.

These both agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.
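The guideline calculation 𝑥 = 𝑥̄ + 𝑧𝑠 can be reproduced with a short Python sketch. The dictionary name `summary` and the function `max_time` are hypothetical; the means and standard deviations are the rounded values used in the solution.

```python
# Rounded (mean, standard deviation) pairs taken from the summary statistics.
summary = {
    "Marketing": (4.77, 2.58),
    "Engineering": (5.04, 3.84),
    "Accounting": (3.65, 6.26),
    "Total": (13.46, 6.82),
}


def max_time(mean, std, z):
    """Solve z = (x − mean)/std for x, giving x = mean + z·std."""
    return mean + z * std


for z in (3, 2):
    print(f"z = {z}")
    for name, (mean, std) in summary.items():
        print(f"  {name}: {max_time(mean, std, z):.2f} days")
```

Passing a different value of z gives the corresponding guideline for any other number of standard deviations.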

f.

Marketing:

𝑥 = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93 Two of the orders exceed this time or 2/50 = .04.

Engineering:

𝑥 = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72 Two of the orders exceed this time or 2/50 = .04.

Accounting:

𝑥 = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17 Three of the orders exceed this time or 3/50 = .06.

Total:

𝑥 = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10 Two of the orders exceed this time or 2/50 = .04.

All of these agree with Chebyshev's Rule but not the Empirical Rule.

g.

No observations exceed the guideline of 3 standard deviations for both Marketing and Engineering. One observation exceeds the guideline of 3 standard deviations for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days). Therefore, only (1/10) × 100% = 10% of the "lost" quotes have times exceeding at least one of the 3 standard deviation guidelines.

Two observations exceed the guideline of 2 standard deviations for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the guideline of 2 standard deviations for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the guideline of 2 standard deviations for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10) × 100% = 70% of the "lost" quotes have times exceeding at least one of the 2 standard deviation guidelines.

We would recommend the 2 standard deviation guideline since it covers 70% of the lost quotes, while having very few other quotes exceed the guidelines.

2.163

a.

The average expenditure per full-time employee is $6,563. The median expenditure per employee is $6,232. Half of all expenditures per employee were less than $6,232 and half were greater than $6,232. The lower quartile is $5,309. Twenty-five percent of all expenditures per employee were below $5,309. The upper quartile is $7,216. Seventy-five percent of all expenditures per employee were below $7,216.

b.

𝐼𝑄𝑅 = 𝑄U − 𝑄L = $7,216 − $5,309 = $1,907.

c.

The interquartile range goes from the 25th percentile to the 75th percentile. Thus, .75 − .25 = .50 of the 1,751 army hospitals have expenses between $5,309 and $7,216.

2.164

a.

Using MINITAB, a scatterplot of the data is:

[Scatterplot of Year2 vs Year1. Vertical axis: Year2 (30 to 55); horizontal axis: Year1 (20 to 60)]

There is a moderate positive trend to the data. As the scores for Year1 increase, the scores for Year2 also tend to increase.

b.

From the graph, two agencies that had greater than expected PARS evaluation scores for Year2 were USAID and State.

2.165

a.

Since the mean is greater than the median, the distribution of the radiation levels is skewed to the right.

b.

𝑥̄ ± 𝑠 ⇒ 10 ± 3 ⇒ (7, 13); 𝑥̄ ± 2𝑠 ⇒ 10 ± 2(3) ⇒ (4, 16); 𝑥̄ ± 3𝑠 ⇒ 10 ± 3(3) ⇒ (1, 19)

Interval    Chebyshev's       Empirical
(7, 13)     At least 0        ≈68%
(4, 16)     At least 75%      ≈95%
(1, 19)     At least 88.9%    ≈100%

Since the data are skewed to the right, Chebyshev's Rule is probably more appropriate in this case.

c.

The background level is 4. Using Chebyshev's Rule, at least 75% or .75(50) ≈ 38 homes are above the background level. Using the Empirical Rule, ≈ 97.5% or .975(50) ≈ 49 homes are above the background level.
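The two rules can be compared side by side with a short Python sketch. The variable names are hypothetical, and the Empirical Rule percentages are the usual textbook approximations (the value .997 for three standard deviations is an assumption; the table rounds it to ≈100%).

```python
mean, std, n_homes = 10, 3, 50

# Usual Empirical Rule approximations for mound-shaped, symmetric data.
empirical = {1: 0.68, 2: 0.95, 3: 0.997}

rows = []
for k in (1, 2, 3):
    interval = (mean - k * std, mean + k * std)
    chebyshev = max(0.0, 1 - 1 / k**2)   # lower bound; 0 when k = 1
    rows.append((interval, chebyshev, empirical[k]))
    print(f"{interval}: Chebyshev at least {chebyshev:.3f}, Empirical about {empirical[k]}")

# Homes above the background level of 4 (the lower end of the 2s interval).
chebyshev_homes = 0.75 * n_homes               # at least this many
empirical_homes = (0.95 + 0.05 / 2) * n_homes  # middle 95% plus the upper 2.5%
print(chebyshev_homes, empirical_homes)
```

The Empirical Rule figure adds the upper tail (2.5%) to the middle 95%, which is where the ≈ 97.5% in part c comes from; both counts are rounded up in the text.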

d.

𝑧 = (𝑥 − 𝑥̄)/𝑠 = (20 − 10)/3 = 3.333

It is unlikely that this new measurement came from the same distribution as the other 50. Using either Chebyshev's Rule or the Empirical Rule, it is very unlikely to see any observations more than 3 standard deviations from the mean.

2.166

a.

Using MINITAB, a pie chart of the data is:

[Pie Chart of PREVUSE: NEVER 71.2%, USED 28.8%]

From the chart, 71.2% or .712 of the sampled physicians have never used ethics consultation.

b.

Using MINITAB, a pie chart of the data is:

[Pie Chart of FUTUREUSE: NO 19.5%, YES 80.5%]

From the chart, 19.5% or .195 of the sampled physicians state that they will not use the services in the future.

c.

Using MINITAB, the side-by-side pie charts are:

[Pie Chart of PREVUSE, panel variable SPEC. MED: NEVER 70.7%, USED 29.3%. SURG: NEVER 72.1%, USED 27.9%]

The proportion of medical practitioners who have never used ethics consultation is .707. The proportion of surgical practitioners who have never used ethics consultation is .721. These two proportions are almost the same.

d.

Using MINITAB, the side-by-side pie charts are:

[Pie Chart of FUTUREUSE, panel variable SPEC. MED: NO 17.3%, YES 82.7%. SURG: NO 23.3%, YES 76.7%]

The proportion of medical practitioners who will not use ethics consultation in the future is .173. The proportion of surgical practitioners who will not use ethics consultation in the future is .233. The proportion of surgical practitioners who will not use ethics consultation in the future is greater than that of the medical practitioners.

e.

Using MINITAB, the relative frequency histograms of the years in practice for the two groups of doctors are:

[Histogram of YRSPRAC, panel variable FUTUREUSE (NO, YES). Vertical axes: Percent (0 to 25); horizontal axes: YRSPRAC (0.0 to 37.5)]

The researchers hypothesized that older, more experienced physicians will be less likely to use ethics consultation in the future. From the histograms, approximately 38% of the doctors that said “no” have more than 20 years of experience. Only about 19% of the doctors that said “yes” had more than 20 years of experience. This supports the researchers’ assertion.

f.

Using MINITAB, the output is:

Descriptive Statistics: YRSPRAC

Variable   N     N*   Mean     Minimum   Median   Maximum   Mode         N for Mode
YRSPRAC    112   6    14.598   1.000     14.000   40.000    14, 20, 25   9

The mean is 14.598. The average length of time in practice for this sample is 14.598 years. The median is 14. Half of the physicians have been in practice less than 14 years and half have been in practice longer than 14 years. There are 3 modes: 14, 20, and 25. The most frequent years in practice are 14, 20, and 25 years.

g.

Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC

Variable   FUTUREUSE   N    N*   Mean     Minimum   Median   Maximum   Mode     N for Mode
YRSPRAC    NO          21   2    16.43    1.00      18.00    35.00     25       5
           YES         91   4    14.176   1.000     14.000   40.000    14, 20   8

The mean for the physicians who would refuse to use ethics consultation in the future is 16.43. The average time in practice for these physicians is 16.43 years. The median is 18. Half of the physicians who would refuse ethics consultation in the future have been in practice less than 18 years and half have been in practice more than 18 years. The mode is 25. The most frequent years in practice for these physicians is 25 years.


h.

From the results in part g, the mean for the physicians who would use ethics consultation in the future is 14.176. The average time in practice for these physicians is 14.176 years. The median is 14. Half of the physicians who would use ethics consultation in the future have been in practice less than 14 years and half have been in practice more than 14 years. There are 2 modes: 14 and 20. The most frequent years in practice for these physicians are 14 and 20 years.

i.

The results in parts g and h confirm the researchers’ theory. The mean, median and mode of years in practice are larger for the physicians who would refuse to use ethics consultation in the future than those who would use ethics consultation in the future.

j.

Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC

Variable   N     N*   Mean     StDev   Variance   Range
YRSPRAC    112   6    14.598   9.161   83.918     39.000

The range is 39. The difference between the largest years in practice and the smallest years in practice is 39 years. The variance is 83.918 square years. The standard deviation is 9.161 years.

k.

Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC

Variable   FUTUREUSE   N    N*   Mean     StDev   Variance   Range
YRSPRAC    NO          21   2    16.43    10.05   100.96     34.00
           YES         91   4    14.176   8.950   80.102     39.000

For the physicians who would refuse to use ethics consultation in the future, the standard deviation is 10.05 years.

l.

For the physicians who would use ethics consultation in the future, the standard deviation is 8.95 years.

m.

The variation in the length of time in practice for the physicians who would refuse to use ethics consultation in the future is greater than that for the physicians who would use ethics consultation in the future.

n.

Using MINITAB, the scatterplot of the data is:

[Scatterplot of YRSPRAC vs EDHRS. Vertical axis: YRSPRAC (0 to 40); horizontal axis: EDHRS (0 to 1000)]

There does not appear to be much of a relationship between the years of experience and the amount of exposure to ethics in medical school.

o.

Using MINITAB, a boxplot of the amount of exposure to ethics in medical school is:

[Boxplot of EDHRS. Horizontal axis: EDHRS (0 to 1000)]

The one data point that is an extreme outlier is the value of 1000.

p.

After removing this data point, the scatterplot of the data is:

[Scatterplot of YRSPRAC2 vs EDHRS2. Vertical axis: YRSPRAC2 (0 to 40); horizontal axis: EDHRS2 (0 to 90)]

With the data point removed, there now appears to be a negative trend to the data. As the amount of exposure to ethics in medical school increases, the years of experience tend to decrease.

2.167

a.

Both the height and width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.

b.

Using MINITAB, the frequency bar chart is:

2.168

Using MINITAB, the two dot plots are:

[Dotplot of Arrive, Depart. Horizontal axis: Data (108 to 168)]

Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 area. Most of the numbers of items departing the work center per hour are in the 110 to 140 area. Because the number of items arriving is larger than the number of items departing, there will probably be some sort of bottleneck.

2.169

First we make some preliminary calculations. Of the 20 engineers at the time of the layoffs, 14 are 40 or older. Thus, the probability that a randomly selected engineer will be 40 or older is 14/20 = .70. A very high proportion of the engineers is 40 or over.

In order to determine if the company is vulnerable to a disparate impact claim, we will first find the median age of all the engineers. Ordering all the ages, we get:

29, 32, 34, 35, 38, 39, 40, 40, 40, 40, 40, 41, 42, 42, 44, 46, 47, 52, 55, 64

The median of all 20 engineers is (40 + 40)/2 = 40.
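The preliminary figures for all 20 engineers can be checked with Python's standard library. This sketch covers only the full group: the list does not record which engineers were laid off, so the two subgroup medians are not reproduced here. The variable names are hypothetical.

```python
from statistics import median

# Ordered ages of the 20 engineers at the time of the layoffs.
ages = [29, 32, 34, 35, 38, 39, 40, 40, 40, 40,
        40, 41, 42, 42, 44, 46, 47, 52, 55, 64]

median_all = median(ages)                                   # average of the 10th and 11th ages
share_40_or_older = sum(age >= 40 for age in ages) / len(ages)

print(median_all, share_40_or_older)
```

With an even number of observations, `statistics.median` averages the two middle values, which matches the hand calculation.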

Now, we will compute the median age of those engineers who were not laid off. The ages underlined above correspond to the engineers who were not laid off. The median of these is 40. The median age of all engineers is the same as the median age of those who were not laid off. The median age of those laid off is 40.5, which is not that much different from the median age of those not laid off. In addition, 70% of all the engineers are 40 or older. Thus, it appears that the company would not be vulnerable to a disparate impact claim.

2.170

a.

Clinic A claims to have a mean weight loss of 15 pounds during the first month and Clinic B claims to have a median weight loss of 10 pounds in the first month. With no other information, I would choose Clinic B. It is very likely that the distributions of weight losses will be skewed to the right – most people lose in the neighborhood of 10 pounds, but a couple might lose much more. If a few people lost much more than 10 pounds, then the mean will be pulled in that direction.

b.

For Clinic A, the median is 10 and the standard deviation is 20. For Clinic B, the mean is 10 and the standard deviation is 5.

For Clinic A: The mean is 15 and the median is 10. This would indicate that the data are skewed to the right. Thus, we will have to use Chebyshev’s Rule to describe the distribution of weight losses.

𝑥̄ ± 2𝑠 ⇒ 15 ± 2(20) ⇒ 15 ± 40 ⇒ (−25, 55)

Using Chebyshev’s Rule, we know that at least 75% of all weight losses will be between −25 and 55 pounds. This means that at least 75% of the people will have results between a loss of 55 pounds and a gain of 25 pounds. This is a very large range.

For Clinic B: The mean is 10 and the median is 10. This would indicate that the data are symmetrical. Thus, the Empirical Rule can be used to describe the distribution of weight losses.

𝑥̄ ± 2𝑠 ⇒ 10 ± 2(5) ⇒ 10 ± 10 ⇒ (0, 20)

Using the Empirical Rule, we know that approximately 95% of all weight losses will be between 0 and 20 pounds. This is a much smaller range than in Clinic A.

I would still recommend Clinic B. Using Clinic A, a person has the potential to lose a large amount of weight, but also has the potential to gain a relatively large amount of weight. In Clinic B, a person would be very confident that he/she would lose weight.

c.

One would want the clients selected for the samples in each clinic to be representative of all clients in that clinic. One would hope that the clinic would not choose those clients for the sample who lost the most weight just to promote their clinic.


2.171

There is evidence to support this claim. The graph peaks at the interval above 1.002. The heights of the bars decrease in order as the intervals get further and further from the peak interval. This is true for all bars except the one above 1.000. This bar is greater than the bar to its right. This would indicate that there are more observations in this interval than one would expect, suggesting that some inspectors might be passing rods with diameters that were barely below the lower specification limit.

2.172

Answers will vary. The graph is made to look like the amount of money spent on education has risen dramatically from 1980 to 2000, but the 4th grade reading scores have not increased at all. The graph does not take into account that the number of school children has also increased dramatically in the last 20 years. A better portrayal would be to look at the per capita spending rather than total spending.

Chapter 3
Probability

3.1

a.

Since the probabilities must sum to 1, P( E3 ) = 1 − P ( E1 ) − P ( E2 ) − P( E4 ) − P( E5 ) = 1 − .1 − .2 − .1 − .1 = .5

b.

P(E3) = 1 − P(E1) − P(E2) − P(E4) − P(E5) = 1 − P(E3) − P(E2) − P(E4) − P(E5) ⇒ 2P(E3) = 1 − .1 − .2 − .1 ⇒ 2P(E3) = .6 ⇒ P(E3) = .3

c.

P(E3) = 1 − P(E1) − P(E2) − P(E4) − P(E5) = 1 − .1 − .1 − .1 − .1 = .6

3.2

a.

This is a Venn Diagram.

b.

If the sample points are equally likely, then P(1) = P(2) = P(3) = ⋯ = P(10) = 1/10

Therefore,

P(A) = P(4) + P(5) + P(6) = 1/10 + 1/10 + 1/10 = 3/10 = .3
P(B) = P(6) + P(7) = 1/10 + 1/10 = 2/10 = .2

c.

P(A) = P(4) + P(5) + P(6) = 1/20 + 1/20 + 3/20 = 5/20 = .25
P(B) = P(6) + P(7) = 3/20 + 3/20 = 6/20 = .3

3.3

P(A) = P(1) + P(2) + P(3) = .05 + .20 + .30 = .55
P(B) = P(1) + P(3) + P(5) = .05 + .30 + .15 = .50
P(C) = P(1) + P(2) + P(3) + P(5) = .05 + .20 + .30 + .15 = .70

3.4

a.

(9 choose 4) = 9!/[4!(9 − 4)!] = (9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(4 ⋅ 3 ⋅ 2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 126

b.

(7 choose 2) = 7!/[2!(7 − 2)!] = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 21

c.

(4 choose 4) = 4!/[4!(4 − 4)!] = (4 ⋅ 3 ⋅ 2 ⋅ 1)/[(4 ⋅ 3 ⋅ 2 ⋅ 1)(1)] = 1

d.

(5 choose 0) = 5!/[0!(5 − 0)!] = (5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)] = 1

e.

(6 choose 5) = 6!/[5!(6 − 5)!] = (6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(1)] = 6

3.5

a.

(N choose n) = (5 choose 2) = 5!/[2!(5 − 2)!] = (5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(2 ⋅ 1)(3 ⋅ 2 ⋅ 1)] = 120/12 = 10

b.

(N choose n) = (6 choose 3) = 6!/[3!(6 − 3)!] = (6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(3 ⋅ 2 ⋅ 1)(3 ⋅ 2 ⋅ 1)] = 720/36 = 20

c.

(N choose n) = (20 choose 5) = 20!/[5!(20 − 5)!] = (20 ⋅ 19 ⋅ 18 ⋯ 3 ⋅ 2 ⋅ 1)/[(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(15 ⋅ 14 ⋅ 13 ⋯ 3 ⋅ 2 ⋅ 1)] = (2.432902008 × 10^18)/(1.569209242 × 10^14) = 15,504

3.6

a.

The tree diagram of the sample points is:

b.

If the dice are fair, then each of the sample points is equally likely. Each would have a probability of 1/36 of occurring.

c.

There is one sample point in A: (3,3). Thus, P(A) = 1/36.

There are 6 sample points in B: (1,6) (2,5) (3,4) (4,3) (5,2) and (6,1). Thus, P(B) = 6/36 = 1/6.

There are 18 sample points in C: (1,1) (1,3) (1,5) (2,2) (2,4) (2,6) (3,1) (3,3) (3,5) (4,2) (4,4) (4,6) (5,1) (5,3) (5,5) (6,2) (6,4) and (6,6). Thus, P(C) = 18/36 = 1/2.

3.7

a.

If we denote the marbles as B1, B2, R1, R2, and R3, then the ten sample points are: (B1, B2) (B1, R1) (B1, R2) (B1, R3) (B2, R1) (B2, R2) (B2, R3) (R1, R2) (R1, R3) (R2, R3)
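The ten sample points, and the event probabilities worked out in the remaining parts, can be enumerated with a short Python sketch. The helper names `n_blue` and `prob` are hypothetical; the events follow the solution (A: both marbles blue, B: one blue and one red, C: both red).

```python
from fractions import Fraction
from itertools import combinations

marbles = ["B1", "B2", "R1", "R2", "R3"]

# Unordered pairs of distinct marbles: the sample points of the experiment.
sample_points = list(combinations(marbles, 2))


def n_blue(point):
    """Count the blue marbles in a sample point."""
    return sum(marble.startswith("B") for marble in point)


def prob(event):
    """Probability of an event when all sample points are equally likely."""
    favorable = sum(1 for point in sample_points if event(point))
    return Fraction(favorable, len(sample_points))


p_a = prob(lambda point: n_blue(point) == 2)   # both blue
p_b = prob(lambda point: n_blue(point) == 1)   # one blue, one red
p_c = prob(lambda point: n_blue(point) == 0)   # both red

print(len(sample_points), p_a, p_b, p_c)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions in the solution.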

b.

Each of the sample points would be equally likely. Thus, each would have a probability of 1/10 of occurring.

c.

There is one sample point in A: (B1, B2). Thus, P(A) = 1/10.

There are 6 sample points in B: (B1, R1) (B1, R2) (B1, R3) (B2, R1) (B2, R2) (B2, R3). Thus, P(B) = 6(1/10) = 6/10 = 3/5.

There are 3 sample points in C: (R1, R2) (R1, R3) (R2, R3). Thus, P(C) = 3(1/10) = 3/10.

3.8

Each student will obtain slightly different proportions. However, the proportions should be close to P(A) = 1/10, P(B) = 6/10, and P(C) = 3/10.

3.9

a.

There are four sample points in this study: only use cable/satellite, only use streaming service, subscribe to both, and have neither service.

b.

Based on the numbers given in the sample:
P(only use cable/satellite) = 240/800
P(only use streaming service) = 160/800
P(subscribe to both) = 288/800
P(have neither service) = 112/800

c.

P(cord cutter) = P(only use streaming service) = 160/800 = 0.20

d.

P(currently or previously subscribed to cable/satellite TV) = P(only use cable/satellite or only use streaming service or subscribe to both) = (240 + 160 + 288)/800 = 688/800 = 0.86

3.10

Define the following events:
L: {robot has legs only}
W: {robot has wheels only}
B: {robot has both legs and wheels}
N: {robot has neither legs nor wheels}

a.

The sample points are L, W, B, and N.

b.

P(L) = 63/106 = .594, P(W) = 20/106 = .189, P(B) = 8/106 = .075, P(N) = 15/106 = .142

c.

P(Wheels) = P(W) + P(B) = (20 + 8)/106 = .264

d.

P(Legs) = P(L) + P(B) = (63 + 8)/106 = .670

3.11

a.

The sample points of this experiment correspond to each of the 6 possible colors of the M&M’s. Let Br = brown, Y = yellow, R = red, Bl = blue, O = orange, G = green. The six sample points are: Br, Y, R, Bl, O, and G

b.

From the problem, the probabilities of selecting each color are:

P(Br) = 0.13, P(Y) = 0.14, P(R) = 0.13, P(Bl) = 0.24, P(O) = 0.2, P(G) = 0.16

c.

The probability that the selected M&M is brown is P(Br) = 0.13

d.

The probability that the selected M&M is red, green or yellow is: P( R or G or Y ) = P ( R) + P (G ) + P (Y ) = 0.13 + 0.16 + 0.14 = 0.43
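These color probabilities lend themselves to a quick Python check. The sketch below confirms that the six probabilities sum to 1, repeats the "red, green or yellow" sum, and uses the complement rule as an alternative route to the probability of a color other than blue. The dictionary name `probs` is hypothetical.

```python
# Probabilities of the six sample points (colors), from the problem.
probs = {"Br": 0.13, "Y": 0.14, "R": 0.13, "Bl": 0.24, "O": 0.20, "G": 0.16}

total = sum(probs.values())                        # should be 1 for a valid assignment
p_r_g_y = probs["R"] + probs["G"] + probs["Y"]     # mutually exclusive colors add
p_not_blue = 1 - probs["Bl"]                       # complement rule

print(round(total, 2), round(p_r_g_y, 2), round(p_not_blue, 2))
```

The complement rule gives the same value as adding the five non-blue probabilities one by one, with less arithmetic.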

e.

P(not Bl) = P(R) + P(G) + P(Y) + P(Br) + P(O) = 0.13 + 0.16 + 0.14 + 0.13 + 0.20 = 0.76

3.12

Define the following events: M: {Nanny who was placed in a job last year is a male}

P(M) = 24/4,176 = .0057

3.13

Define the following events: A: {achieved the American dream} O: {American dream is out of reach} P: {pessimistic about reaching the American dream}

a.

P(A) = 24.0%

b.

P(O or P) = 11.0% + 7.0% = 18%

3.14

Define the following events:
M: {mobile device user is male}
F: {mobile device user is female}
B: {mobile device user uses Facebook most often}
T: {mobile device user uses Twitter most often}
Y: {mobile device user uses YouTube most often}

Copyright © 2022 Pearson Education, Inc.


94

Chapter 3

a.

A tree diagram is:

[Tree diagram. First branch: M or F; second branch: B, T, or Y. Sample points: (M,B), (M,T), (M,Y), (F,B), (F,T), (F,Y)]

b.

If 10 males and 10 females were selected from each of the three social media, then the probability of each of the sample points would be 10/60 = 1/6.

c.

P(F,T) = 1/6

d.

P(Y) = P(M,Y) + P(F,Y) = 1/6 + 1/6 = 2/6 = 1/3

3.15

Define the following events: A: {Will use all their paid vacation days this year} M: {Will use almost all their paid vacation days this year} H: {Will use about half of their paid vacation days this year}

a.

P(A) = 38%

b.

P(At least half) = P(A) + P(M) + P(H) = 38% + 21% + 17% = 76%

3.16

Define the following events:
F: {Friend or relative}
R: {Professional}

P(Friend, relative, or professional) = P(F) + P(R) = 31/270 + 114/270 = 145/270 = .537

3.17

Define the following probabilities:

p1 = probability of a successful 1-point play = 3,453/3,677 = 0.939
p2 = probability of a successful 2-point play = 141/300 = 0.47

2p2 = 2(0.47) = 0.94. Based on these data, p1 is slightly less than 2p2, so going for 2 points would be considered optimal.
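The comparison of p1 with 2p2 is a comparison of expected points per attempt, which the following Python sketch makes explicit. The variable names are hypothetical; the counts are those quoted in the solution.

```python
# Success counts and attempts quoted in the solution.
p1 = 3453 / 3677   # estimated probability of a successful 1-point play
p2 = 141 / 300     # estimated probability of a successful 2-point play

expected_one_point = 1 * p1   # expected points from a 1-point attempt
expected_two_point = 2 * p2   # expected points from a 2-point attempt

print(f"1-point: {expected_one_point:.3f}, 2-point: {expected_two_point:.3f}")
print("2-point play has the higher expected value:",
      expected_two_point > expected_one_point)
```

The two expected values differ by only about a thousandth of a point, so the conclusion rests on a narrow margin in these sample proportions.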

3.18

a.

Define the following events:
A: {Total visitors}
B: {Paying visitors}
C: {Big shows}
D: {Funds raised}
E: {Members}

P(A or D) = P(A) + P(D) = 8/30 + 7/30 = 15/30 = .5

b & c.

A tree diagram with the corresponding probabilities for this problem follows. To compute the probabilities, we have to assume that this sample is representative of all such museums. In addition, we have to assume that the two selections of a museum are independent of each other. The probability of selecting a particular type of museum is estimated by the number of museums in that category divided by 30. Each sample point consists of two museums. The probabilities of each type of museum in the pair are then multiplied together to find the probability of the sample point. The probabilities are shown in the tree.

d.

𝑃(𝐴𝐴 or 𝐷𝐷 or 𝐴𝐷 or 𝐷𝐴) = 𝑃(𝐴𝐴) + 𝑃(𝐷𝐷) + 𝑃(𝐴𝐷) + 𝑃(𝐷𝐴) = .071 + .054 + .062 + .062 = .249

3.19

a.

Define the following event: C: {Slaughtered chicken passes inspection with fecal contamination}

P(C) = 1/100 = .01

b.

Based on the data, P(C) = 306/32,075 = .0095 ≈ .01. Yes. The probability of a slaughtered chicken passing inspection with fecal contamination rounded off to 2 decimal places is .01.

3.20

a.

The possible outcomes for this study are the age ranges given by the parents. They are 3-5 years, 6-8 years, 9-11 years, 12-15 years, and 16-18 years.

b.

P(3-5 years) = 0.05, P(6-8 years) = 0.09, P(9-11 years) = 0.20, P(12-15 years) = 0.27, and P(16-18 years) = 0.39.

c.

P(12-18 years) = P(12-15 years) + P(16-18 years) = 0.27 + 0.39 = 0.66

d.

P(11 years or younger) = P(3-5 years) + P(6-8 years) + P(9-11 years) = 0.05 + 0.09 + 0.20 = 0.34

3.21

a.

The probability that any network is selected on a particular day is 1/8. Therefore, P( ESPN selected on July 11) = 1/8.

b.

The number of ways to select four networks for the weekend days is a combination of 8 networks taken 4 at a time. The number of ways to do this is (8 choose 4) = 8!/[4!(8 − 4)!] = (8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/[(4 ⋅ 3 ⋅ 2 ⋅ 1)(4 ⋅ 3 ⋅ 2 ⋅ 1)] = 70.

c.

First, we need to find the number of ways one can choose the 4 networks where ESPN is one of the 4. If ESPN has to be chosen, then the number of ways of doing this is a combination of one thing taken one at a time, or C(1, 1) = 1!/(1!(1 − 1)!) = 1/(1 ⋅ 1) = 1. The number of ways to select the remaining 3 networks is a combination of 7 things taken 3 at a time, or C(7, 3) = 7!/(3!(7 − 3)!) = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/(3 ⋅ 2 ⋅ 1 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) = 35. Thus, the total number of ways of selecting 4 networks of which one has to be ESPN is 1(35) = 35.

Finally, the probability of selecting ESPN as one of the 4 networks for the weekend analysis is 35/70 = .5.

3.22

a.

Since order does not matter, the number of different bets would be a combination of 8 things taken 2 at a time. The number of ways would be C(8, 2) = 8!/(2!(8 − 2)!) = (8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/(2 ⋅ 1 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) = 40,320/1,440 = 28.

b.

If all players are of equal ability, then each of the 28 sample points would be equally likely. Each would have a probability of occurring of 1/28. There is only one sample point with values 2 and 7. Thus, the probability of winning with a bet of 2-7 would be 1/28 or .0357.

3.23

a.

There would be a total of 86 candy brands taken 2 at a time, or C(86, 2) = 86!/(2!(86 − 2)!) = (86 ⋅ 85 ⋅ 84 ⋯ 1)/(2 ⋅ 1 ⋅ 84 ⋅ 83 ⋅ 82 ⋯ 1) = (86 ⋅ 85)/2 = 3,655 possible choices of two candy brands.

b.

Only one of these 3,655 comparisons results in these two candy brands being selected. The probability is 1/3,655 = 0.00027.

c.

We first determine how many different comparisons can be made with the four Reese’s candies. The number of combinations in which two of the four Reese’s candies are selected is C(4, 2) = 4!/(2!(4 − 2)!) = (4 ⋅ 3 ⋅ 2 ⋅ 1)/(2 ⋅ 1 ⋅ 2 ⋅ 1) = (4 ⋅ 3)/2 = 6.

The probability of selecting two of the Reese’s candies from the total of all 86 candies is 6/3,655 = 0.00164.

3.24

Denote S = Subaru, G = Genesis, and P = Porsche.

a.

The possible set of rankings is: (S,G,P) (S,P,G) (G,S,P) (G,P,S) (P,S,G) (P,G,S)

b.

If all rankings are equally likely, then each ranking has a probability of 1/6.

𝑃(Subaru first) = 𝑃(𝑆, 𝐺, 𝑃) + 𝑃(𝑆, 𝑃, 𝐺) = 1/6 + 1/6 = 2/6 = 1/3

𝑃(Genesis third) = 𝑃(𝑆, 𝑃, 𝐺) + 𝑃(𝑃, 𝑆, 𝐺) = 1/6 + 1/6 = 2/6 = 1/3

𝑃(Porsche first and Genesis second) = 𝑃(𝑃, 𝐺, 𝑆) = 1/6

3.25

a.

There are a total of 3 × 3 × 3 = 27 different combinations of Distance, Excess, and Association.

b.

There would be a total of 27 scenarios taken 2 at a time, or C(27, 2) = 27!/(2!(27 − 2)!) = (27 ⋅ 26 ⋅ 25 ⋯ 1)/(2 ⋅ 1 ⋅ 25 ⋅ 24 ⋅ 23 ⋯ 1) = (27 ⋅ 26)/2 = 351 possible choices of two scenarios.
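As a quick check on the two counts above, the scenarios and pairs can be enumerated directly. This is only a sketch: the level labels ("low", "medium", "high") are placeholders assumed for illustration, since the solution fixes only the number of levels per factor.

```python
from itertools import combinations, product
from math import comb

# Hypothetical level labels; only the count of three levels per factor matters.
levels = ("low", "medium", "high")

# Every (Distance, Excess, Association) scenario: 3 x 3 x 3 of them.
scenarios = list(product(levels, repeat=3))

# Unordered pairs of distinct scenarios, counted by enumeration and by formula.
pairs_enumerated = sum(1 for _ in combinations(scenarios, 2))
pairs_formula = comb(len(scenarios), 2)

print(len(scenarios), pairs_enumerated, pairs_formula)
```

Both counting routes should agree with the 27 scenarios and 351 pairs found by hand.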

3.26

No. If 50% of the volunteers are selected to be tested, there would be a combination of 160 volunteers taken 80 at a time, or C(160, 80) = 160!/(80!(160 − 80)!) = 160!/(80!80!) ways to select those to be tested. To find the probability that one particular volunteer is selected, the number of ways to select 50% that included this particular volunteer would be a combination of 159 volunteers taken 79 at a time, or C(159, 79) = 159!/(79!(159 − 79)!) = 159!/(79!80!). Thus, the probability that a particular volunteer is selected would be C(159, 79)/C(160, 80) = [159!/(79!80!)]/[160!/(80!80!)] = 80/160 = .5.

If 50% of all firefighters (career and volunteers) are selected to be tested, there would be a combination of 980 + 160 = 1,140 firefighters taken 570 at a time, or C(1140, 570) = 1140!/(570!(1140 − 570)!) = 1140!/(570!570!) ways to select those to be tested. To find the probability that one particular volunteer is selected, the number of ways to select 50% that included this particular volunteer would be a combination of 1,139 firefighters taken 569 at a time, or C(1139, 569) = 1139!/(569!(1139 − 569)!) = 1139!/(569!570!). Thus, the probability that a particular volunteer is selected would be C(1139, 569)/C(1140, 570) = [1139!/(569!570!)]/[1140!/(570!570!)] = 570/1140 = .5.

3.27

a.

The odds in favor of an Oxford Shoes win are 1/3 to 1 − 1/3 = 2/3, or 1 to 2.

b.

If the odds in favor of Oxford Shoes are 1 to 1, then the probability that Oxford Shoes wins is 1/(1 + 1) = 1/2.

c.

If the odds against Oxford Shoes are 3 to 2, then the odds in favor of Oxford Shoes are 2 to 3. Therefore, the probability that Oxford Shoes wins is 2/(2 + 3) = 2/5.

3.28

First, we need to compute the total number of ways we can select 2 bullets (pair) from 1,837 bullets. This is a combination of 1,837 things taken 2 at a time. The number of pairs is: C(1837, 2) = 1,837!/(2!(1,837 − 2)!) = (1,837 ⋅ 1,836 ⋯ 1)/(2 ⋅ 1 ⋅ 1,835 ⋅ 1,834 ⋯ 1) = (1,837 ⋅ 1,836)/2 = 1,686,366

The probability of a false positive is the number of false positives divided by the number of pairs and is: P(false positive) = # false positives / # pairs = 693/1,686,366 = .0004. This probability is very small. There would be only about 4 false positives out of every 10,000. I would have confidence in the FBI’s forensic evidence.

3.29

a.

The number of ways the 5 commissioners can vote is 2(2)(2)(2)(2) = 2⁵ = 32. (Each of the 5 commissioners has 2 choices for his/her vote: For or Against.)

b.

Let F denote a vote ‘For’ and A denote a vote ‘Against’. The 32 sample points would be: FFFFF FFFFA FFFAF FFAFF FAFFF AFFFF FFFAA FFAFA FAFFA AFFFA FFAAF FAFAF AFFAF FAAFF AFAFF AAFFF FFAAA FAFAA FAAFA FAAAF AFFAA AFAFA AFAAF AAFFA AAFAF AAAFF FAAAA AFAAA AAFAA AAAFA AAAAF AAAAA Each of the sample points should be equally likely. Thus, each would have a probability of 1/32.

c.

The sample points that result in a 2-2 split for the other 4 commissioners are: FFAAF FAFAF AFFAF FAAFF AFAFF AAFFF FFAAA FAFAA FAAFA AFFAA AFAFA AAFFA

There are 12 sample points.

d.

Let V = event that your vote counts. P(V) = 12/32 = 0.375.

e.

If there are now only 3 commissioners in the bloc, then the total number of ways the bloc can vote is 2(2)(2) = 2³ = 8. The sample points would be: FFF FFA FAF AFF FAA AFA AAF AAA

The number of sample points where your vote would count is 4: FAF, AFF, FAA, AFA. Let W = event that your vote counts in the bloc. P(W) = 4/8 = 0.5.

3.30

a.

P(Bᶜ) = 1 − P(B) = 1 − .7 = .3

b.

P(Aᶜ) = 1 − P(A) = 1 − .4 = .6

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .4 + .7 − .3 = .8

3.31

a.

A: {HHH, HHT, HTH, THH, TTH, THT, HTT}

B: {HHH, TTH, THT, HTT}

A ∪ B: {HHH, HHT, HTH, THH, TTH, THT, HTT}

Aᶜ: {TTT}

A ∩ B: {HHH, TTH, THT, HTT}

b.

P(A) = 7/8, P(B) = 4/8 = 1/2, P(A ∪ B) = 7/8, P(Aᶜ) = 1/8, P(A ∩ B) = 4/8 = 1/2

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 7/8 + 1/2 − 1/2 = 7/8

d.

No. P(A ∩ B) = 1/2, which is not 0.

3.32

The experiment consists of rolling a pair of fair dice. The sample points are:

1, 1   1, 2   1, 3   1, 4   1, 5   1, 6
2, 1   2, 2   2, 3   2, 4   2, 5   2, 6
3, 1   3, 2   3, 3   3, 4   3, 5   3, 6
4, 1   4, 2   4, 3   4, 4   4, 5   4, 6
5, 1   5, 2   5, 3   5, 4   5, 5   5, 6
6, 1   6, 2   6, 3   6, 4   6, 5   6, 6

Since each die is fair, each sample point is equally likely. The probability of each sample point is 1/36.

a.

A: {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}

B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6)}

A ∩ B: {(3, 4), (4, 3)}

A ∪ B: {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2), (4, 3), (4, 5), (4, 6), (1, 6), (2, 5), (5, 2), (6, 1)}

Aᶜ: {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4), (2, 6), (3, 1), (3, 2), (3, 3), (3, 5), (3, 6), (4, 1), (4, 2), (4, 4), (4, 5), (4, 6), (5, 1), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}

b.

P(A) = 6(1/36) = 6/36 = 1/6

P(B) = 11(1/36) = 11/36

P(A ∩ B) = 2(1/36) = 2/36 = 1/18

P(A ∪ B) = 15(1/36) = 15/36 = 5/12

P(Aᶜ) = 30(1/36) = 30/36 = 5/6

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 11/36 − 1/18 = (6 + 11 − 2)/36 = 15/36 = 5/12

d.

A and B are not mutually exclusive. To be mutually exclusive, P(A ∩ B) must be 0. Here, P(A ∩ B) = 1/18.

3.33

a.

P(A) = P(E1) + P(E2) + P(E3) + P(E5) + P(E6) = 1/5 + 1/5 + 1/5 + 1/20 + 1/10 = 15/20 = 3/4

b.

P(B) = P(E2) + P(E3) + P(E4) + P(E7) = 1/5 + 1/5 + 1/20 + 1/5 = 13/20

c.

P(A ∪ B) = P(E1) + P(E2) + P(E3) + P(E4) + P(E5) + P(E6) + P(E7) = 1/5 + 1/5 + 1/5 + 1/20 + 1/20 + 1/10 + 1/5 = 1

d.

P(A ∩ B) = P(E2) + P(E3) = 1/5 + 1/5 = 2/5

e.

P(Aᶜ) = 1 − P(A) = 1 − 3/4 = 1/4

f.

P(Bᶜ) = 1 − P(B) = 1 − 13/20 = 7/20

g.

P(A ∪ Aᶜ) = P(E1) + P(E2) + P(E3) + P(E4) + P(E5) + P(E6) + P(E7) = 1/5 + 1/5 + 1/5 + 1/20 + 1/20 + 1/10 + 1/5 = 1

h.

P(Aᶜ ∩ B) = P(E4) + P(E7) = 1/20 + 1/5 = 5/20 = 1/4

3.34

a.

P(Aᶜ) = P(E3) + P(E6) = .2 + .3 = .5

b.

P(Bᶜ) = P(E1) + P(E7) = .10 + .06 = .16

c.

P(Aᶜ ∩ B) = P(E3) + P(E6) = .2 + .3 = .5

d.

P(A ∪ B) = P(E1) + P(E2) + P(E3) + P(E4) + P(E5) + P(E6) + P(E7) = .10 + .05 + .20 + .20 + .06 + .30 + .06 = .97

e.

P(A ∩ B) = P(E2) + P(E4) + P(E5) = .05 + .20 + .06 = .31

f.

P(Aᶜ ∩ Bᶜ) = P(E8) = .03

g.

No. A and B are mutually exclusive if P(A ∩ B) = 0. Here, P(A ∩ B) = .31.

3.35

a.

P( A) = .50 + .10 + .05 = .65

b.

P( B) = .10 + .07 + .50 + .05 = .72

c.

P (C ) = .25

d.

P( D) = .05 + .03 = .08

e.

P ( Ac ) = .25 + .07 + .03 = .35 (Note: P( Ac ) = 1 − P( A) = 1 − .65 = .35 )

f.

P( A ∪ B) = P( B) = .10 + .07 + .50 + .05 = .72

g.

P( A ∩ C ) = 0

h.

Two events are mutually exclusive if they have no sample points in common or if the probability of their intersection is 0. P( A ∩ B) = P( A) = .50 + .10 + .05 = .65 . Since this is not 0, A and B are not mutually exclusive. P( A ∩ C ) = 0 . Since this is 0, A and C are mutually exclusive. P( A ∩ D) = .05 . Since this is not 0, A and D are not mutually exclusive.

P( B ∩ C ) = 0 . Since this is 0, B and C are mutually exclusive. P( B ∩ D) = .05 . Since this is not 0, B and D are not mutually exclusive. P(C ∩ D) = 0 . Since this is 0, C and D are mutually exclusive.

3.36

a.

The outcome "On" and "High" is A ∩ D.

b.

The outcome "Low" or "Medium" is Dᶜ.

3.37

Define the following events: L: {robot has legs only} W: {robot has wheels only} B: {robots have both legs and wheels} N: {robots have neither legs nor wheels}

P(legs or wheels) = 1 − P(N) = 1 − 15/106 = 1 − .142 = .858

3.38

Define the following events: M: {firefighter is male} F: {firefighter is female} W: {glove fits well} P: {glove fits poorly}

a.

The sample points would be MW, MP, FW, and FP.

b.

P(MW) = 415/586 = .708, P(MP) = 132/586 = .225, P(FW) = 19/586 = .032, P(FP) = 20/586 = .034

c.

P(F) = P(FW) + P(FP) = (19 + 20)/586 = 39/586 = .067

d.

P(W) = P(MW) + P(FW) = (415 + 19)/586 = 434/586 = .741

e.

P(FW) = 19/586 = .032

f.

P(F or W) = P(F) + P(W) − P(FW) = 39/586 + 434/586 − 19/586 = 454/586 = .775

3.39

a.

The analyst makes an early forecast and is only concerned with accuracy is the event A ∩ B.

b.

The analyst is not only concerned with accuracy is the event Aᶜ.

c.

The analyst is from a small brokerage firm or makes an early forecast is the event C ∪ B.

d.

The analyst makes a late forecast and is not only concerned with accuracy is the event Bᶜ ∩ Aᶜ.

3.40

a.

The events A and B are not mutually exclusive since both events can occur at the same time. A worker can manage over ten passwords and also have had to reset a password that he/she forgot.

b.

𝑃(𝐴) = .19, 𝑃(𝐵) = .57

c.

In order to determine the union of events A and B, we would need to know the probability of the intersection between them.

d.

𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) = .19 + .57 − .15 = .61

3.41

Define the following event: A: {Store violates the NIST scanner accuracy standard} Then P( Ac ) = 1 − P( A) = 1 − 52 / 60 = 8 / 60 = .133

3.42

Define the following events: F: {Person uses Facebook} T: {Person uses Twitter}

a.

The Venn Diagram which illustrates the use of social networking sites in the UK is:

[Venn diagram: F only 59%, F ∩ T 10%, T only 12%, outside both circles 19%]

b.

𝑃(𝐹 ∪ 𝑇) = 𝑃(𝐹) + 𝑃(𝑇) − 𝑃(𝐹 ∩ 𝑇) = 69% + 22% − 10% = 81%

c.

𝑃(𝐹ᶜ ∩ 𝑇ᶜ) = 1 − 𝑃(𝐹 ∪ 𝑇) = 100% − 81% = 19%

3.43

a.

P(A) = 1 − [P(Online via personal computer) + P(Online via tablet or e-reader)] = 1 − (.37 + .03) = .60

b.

P(B) = 1 − [P(Debit card) + P(Credit card)] = 1 − (.09 + .07) = .84

c.

P(A ∩ B) = P(Write check) + P(Checking account withdrawal) + P(Cash) + P(Mobile bill account) = .22 + .10 + .07 + .08 = .44

d.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .60 + .84 − .44 = 1

3.44

a.

If customer gives fuzzy response 7, P(7ᶜ) = 1 − P(7) = 1 − 1/3 = 2/3.

b.

If the response is fuzzy 5, then the possible responses could be 3, 5, or 7. If the response is fuzzy 9, then the possible responses could be 7 or 9. Thus, the only possibility that both 5 and 9 have in common is 7.

3.45

First, define the following events: F: {Fully compensated} P: {Partially compensated} N: {Non-compensated} R: {Left because of retirement}

From the text, we know P(F) = 127/244, P(P) = 45/244, P(N) = 72/244, and P(R) = (7 + 11 + 10)/244 = 28/244.

a.

P(F) = 127/244

b.

P(F ∩ R) = 7/244

c.

P(Fᶜ) = 1 − P(F) = 1 − 127/244 = 117/244

d.

P(F ∪ R) = P(F) + P(R) − P(F ∩ R) = 127/244 + 28/244 − 7/244 = 148/244

3.46

Define the following events: I: {The scent is indulgent} N: {The scent is non-indulgent} H: {The product is healthy}

a. 𝑃(𝐻 ∩ 𝐼) = 0.2945

b. 𝑃(𝐻 ∩ 𝑁) = 0.4554

c. 𝑃(𝐻) = 𝑃(𝐻 ∩ 𝐼) + 𝑃(𝐻 ∩ 𝑁) = 0.3980

3.47

Define the following events: M1: {Model 1} M2: {Model 2}

a.

P(5) = 85/160 = .531

b.

P(5 ∪ 0) = P(5) + P(0) − P(5 ∩ 0) = .531 + 35/160 − 0 = .531 + .219 = .75

c.

P(M2 ∩ 0) = 15/160 = .094

3.48

Define the following events: G: {Student is assigned to the guilty state} C: {Student chooses the stated option}

a.

P(G) = 57/171 = .333

b.

P(C) = 60/171 = .351

c.

P(G ∩ C) = 45/171 = .263

d.

P(G ∪ C) = P(G) + P(C) − P(G ∩ C) = .333 + .351 − .263 = .421

3.49

Define the following events: A: {Individual tax return is audited by the IRS} B: {Corporation tax return is audited by the IRS}

a.

𝑃(𝐴) = .0059

b.

𝑃(𝐴ᶜ) = 1 − 𝑃(𝐴) = 1 − .0059 = .9941

c.

𝑃(𝐵) = .0088

d.

𝑃(𝐵ᶜ) = 1 − 𝑃(𝐵) = 1 − .0088 = .9912

3.50

There are a total of 6 × 6 × 6 = 216 possible outcomes from throwing 3 fair dice. To help demonstrate this, suppose the three dice are different colors: red, blue, and green. When we roll these dice, we will record the outcome of the red die first, the blue die second, and the green die third. Thus, there are 6 possible outcomes for the first position, 6 for the second, and 6 for the third. This leads to the 216 possible outcomes. The Grand Duke argued that the chance of getting a sum of 9 and the chance of getting a sum of 10 should be the same since the number of partitions for 9 and 10 are the same. These partitions are:

Sum of 9: 126, 135, 144, 225, 234, 333

Sum of 10: 136, 145, 226, 235, 244, 334

In each case, there are 6 partitions. However, if we take into account the three colors of the dice, then there are various ways to get each partition. For instance, to get a partition of 126, we could get 126, 162, 216, 261, 612, and 621 (again, think of the red die first, the blue die second, and the green die third). However, to get a partition of 333, there is only 1 way. To get a partition of 144, there are 3 ways: 144, 414, and 441. The numbers of ways to get each of the above partitions are:

Sum of 9    # ways    Sum of 10    # ways
126         6         136          6
135         6         145          6
144         3         226          3
225         3         235          6
234         6         244          3
333         1         334          3
Total       25        Total        27
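The totals in the table can be checked by brute force. The following sketch simply lists all ordered (red, blue, green) outcomes and tallies them by sum; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# All ordered (red, blue, green) outcomes for three fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=3))

# Number of ordered outcomes that produce each possible sum.
ways_by_sum = Counter(sum(roll) for roll in outcomes)

ways_9 = ways_by_sum[9]
ways_10 = ways_by_sum[10]
print(len(outcomes), ways_9, ways_10)
```

Counting ordered outcomes rather than partitions is what separates the two sums.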

Thus, there are a total of 25 ways to get a sum of 9 and 27 ways to get a sum of 10. The chance of throwing a sum of 9 (25 chances out of 216 possibilities) is less than the chance of throwing a 10 (27 chances out of 216 possibilities).

3.51

A possible Venn Diagram would be:

[Venn diagram with regions labeled ACC Dimension, CCC, Plain, Lambda, and CFA]

3.52

a.

P(A ∩ B) = P(A | B)P(B) = .6(.2) = .12

b.

P(B | A) = P(A ∩ B)/P(A) = .12/.4 = .3

3.53

a.

P(A | B) = P(A ∩ B)/P(B) = .1/.2 = .5

b.

P(B | A) = P(A ∩ B)/P(A) = .1/.4 = .25

c.

Events A and B are said to be independent if P(A | B) = P(A). In this case, P(A | B) = .5 and P(A) = .4. Thus, A and B are not independent.

3.54

a.

Since A and B are mutually exclusive events, P( A ∪ B) = P( A) + P( B) = .30 + .55 = .85

b.

Since A and C are mutually exclusive events, P( A ∩ C ) = 0

c.

P(A | B) = P(A ∩ B)/P(B) = 0/.55 = 0

d.

Since B and C are mutually exclusive events, P(B ∪ C) = P(B) + P(C) = .55 + .15 = .70

e.

No, B and C cannot be independent events because they are mutually exclusive events.

3.55

a.

If two events are independent, then P(A ∩ B) = P(A)P(B) = .4(.2) = .08.

b.

If two events are independent, then P(A | B) = P(A) = .4.

c.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .4 + .2 − .08 = .52

3.56

a.

If two fair coins are tossed, there are 4 possible outcomes or simple events. They are:

E1 = HH   E2 = HT   E3 = TH   E4 = TT

Event A contains the simple events E1, E2, and E3. Event B contains the simple events E2 and E3. A Venn diagram of this would be:

[Venn diagram: B, containing E2 and E3, lies inside A, which also contains E1; E4 lies outside A]

Since the coins are fair, each of the sample points is equally likely. Each would have a probability of 1/4.

b.

P(A) = 3(1/4) = 3/4 = .75

P(B) = 2(1/4) = 2/4 = 1/2 = .5

P(A ∩ B) = P(E2) + P(E3) = 1/4 + 1/4 = 2/4 = 1/2 = .5

c.

P(A | B) = P(A ∩ B)/P(B) = .5/.5 = 1

P(B | A) = P(A ∩ B)/P(A) = .5/.75 = .667

3.57

a.

P(A) = P(E1) + P(E2) + P(E3) = .2 + .3 + .3 = .8

P(B) = P(E2) + P(E3) + P(E5) = .3 + .3 + .1 = .7

P(A ∩ B) = P(E2) + P(E3) = .3 + .3 = .6

b.

P(E1 | A) = P(E1 ∩ A)/P(A) = P(E1)/P(A) = .2/.8 = .25

P(E2 | A) = P(E2 ∩ A)/P(A) = P(E2)/P(A) = .3/.8 = .375

P(E3 | A) = P(E3 ∩ A)/P(A) = P(E3)/P(A) = .3/.8 = .375

The original sample point probabilities are in the proportion .2 to .3 to .3 or 2 to 3 to 3. The conditional probabilities for these sample points are in the proportion .25 to .375 to .375 or 2 to 3 to 3.

c.

(1) P(B | A) = P(E2 | A) + P(E3 | A) = .375 + .375 = .75 (from part b)

(2) P(B | A) = P(A ∩ B)/P(A) = .6/.8 = .75 (from part a)

The two methods do yield the same result.

d.

If A and B are independent events, P(B | A) = P(B). From part c, P(B | A) = .75. From part a, P(B) = .7. Since .75 ≠ .7, A and B are not independent events.

3.58

The 36 possible outcomes obtained when tossing two dice are listed below:

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

A: {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (3, 6), (4, 1), (4, 3), (4, 5), (5, 2), (5, 4), (5, 6), (6, 1), (6, 3), (6, 5)}

B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5), (6, 6)}

A ∩ B: {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5)}

If A and B are independent, then P(A)P(B) = P(A ∩ B).

P(A) = 18/36 = 1/2

P(B) = 7/36

P(A ∩ B) = 6/36 = 1/6

P(A)P(B) = 1/2 ⋅ 7/36 = 7/72 ≠ 1/6 = P(A ∩ B). Thus, A and B are not independent.
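The independence check can be reproduced by enumeration. In this sketch, A is written as "the sum is odd", which is an inferred description that matches the 18 points listed for A; B is copied from the listing as given.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair dice.
outcomes = set(product(range(1, 7), repeat=2))

# Assumed description of A: the sum is odd (matches the 18 listed points).
A = {o for o in outcomes if sum(o) % 2 == 1}
# B copied point by point from the solution.
B = {(3, 6), (4, 5), (5, 4), (5, 6), (6, 3), (6, 5), (6, 6)}

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)
independent = (p_a * p_b == p_ab)
print(p_a, p_b, p_ab, independent)
```

Using exact fractions avoids any rounding question when comparing P(A)P(B) with P(A ∩ B).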

3.59

a.

P(A ∩ C) = 0 ⇒ A and C are mutually exclusive. P(B ∩ C) = 0 ⇒ B and C are mutually exclusive.

b.

P( A) = P(1) + P(2) + P(3) = .20 + .05 + .30 = .55

P( B) = P(3) + P(4) = .30 + .10 = .40

P(C ) = P(5) + P(6) = .10 + .25 = .35

P( A ∩ B) = P(3) = .30

P(A | B) = P(A ∩ B)/P(B) = .30/.40 = .75

A and B are independent if P(A | B) = P(A). Since P(A | B) = .75 and P(A) = .55, A and B are not independent. Since A and C are mutually exclusive, they are not independent. Similarly, since B and C are mutually exclusive, they are not independent.

c.

Using the probabilities of sample points, P(A ∪ B) = P(1) + P(2) + P(3) + P(4) = .20 + .05 + .30 + .10 = .65

Using the additive rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .55 + .40 − .30 = .65

Using the probabilities of sample points, P(A ∪ C) = P(1) + P(2) + P(3) + P(5) + P(6) = .20 + .05 + .30 + .10 + .25 = .90

Using the additive rule, P(A ∪ C) = P(A) + P(C) − P(A ∩ C) = .55 + .35 − 0 = .90

3.60

From the Exercise, P(A) = .15, P(B) = .10, and P(A ∩ B) = .05.

a.

If events A and B are mutually exclusive then P(A ∩ B) = 0. For this problem, P(A ∩ B) = .05. Therefore, events A and B are not mutually exclusive.

b.

P(B | A) = P(A ∩ B)/P(A) = .05/.15 = .333

c.

Events A and B are independent if P(B | A) = P(B). For this exercise, P(B | A) = .333 and P(B) = .10. Since these are not equal, events A and B are not independent.

3.61

Define the following events: A: {American believes the American Dream is within reach} F: {Person is female}

From the problem, we know that 𝑃(𝐴) = 0.24 and 𝑃(𝐹|𝐴) = 0.63.

𝑃(𝐴 ∩ 𝐹) = 𝑃(𝐴)𝑃(𝐹|𝐴) = 0.24(0.63) = 0.1512


3.62

Define the following events: G: {The respondent is assigned to the guilt state} A: {The respondent is assigned to the anger state} C: {The respondent chooses the stated option to repair car}

a.

From Exercise 3.48, we know P(G) = 57/171 = .333 and P(G ∩ C) = 45/171 = .263.

P(C | G) = P(G ∩ C)/P(G) = .263/.333 = .790

b.

From Exercise 3.48, we know P(C) = 60/171 = .351. Thus, P(Cᶜ) = 1 − .351 = .649.

P(A ∩ Cᶜ) = 50/171 = .292

P(A | Cᶜ) = P(A ∩ Cᶜ)/P(Cᶜ) = .292/.649 = .450

c.

Two events C and G are independent if P(C ∩ G) = P(C)P(G). From Exercise 3.48, P(G) = .333, P(C) = .351, and P(G ∩ C) = .263. P(G)P(C) = .333(.351) = .117 ≠ .263 = P(G ∩ C). Thus C and G are not independent.

3.63

Define the following events: B: {Diamond is a blood diamond} R: {Diamond is a rough diamond} S: {Rough diamond is processed in Surat, India}

From the Exercise, P(B | R) = .25, P(S) = .9, P(B | S) = 1/3

a.

P(Bᶜ | R) = 1 − P(B | R) = 1 − .25 = .75

b.

P(B ∩ S) = P(B | S)P(S) = (1/3)(.9) = .3

3.64

Define the following events: L: {robot has legs only} W: {robot has wheels only} B: {robots have both legs and wheels} N: {robots have neither legs nor wheels}

P(L | W) = P(L ∩ W)/P(W) = (8/106)/(28/106) = 8/28 = .286

3.65

Define the following events: T: {Consumer tracks all deliveries} A: {Consumer is American} From the Exercise, 𝑃(𝑇) = 0.56, 𝑃(𝐴|𝑇) = 0.44 𝑃(𝑇 ∩ 𝐴) = 𝑃(𝑇)𝑃(𝐴|𝑇) = 0.56(0.44) = 0.2464

3.66

Define the following events: A: {Person is victim of identity theft} B: {Theft occurred from unauthorized use of credit card}

From the exercise, 𝑃(𝐴) = .10 and 𝑃(𝐵|𝐴) = .83

a.

𝑃(𝐴) = .10

b.

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵|𝐴)𝑃(𝐴) = .83(.10) = .0830

3.67

Define the following events: F: {Worker is fully compensated} P: {Worker is partially compensated} N: {Worker is non-compensated} R: {Worker retired}

From the exercise, P(F) = 127/244 = .520, P(P) = 45/244 = .184, P(R | F) = 7/127 = .055, P(R | P) = 11/45 = .244, and P(R | N) = 10/72 = .139.

a.

P( R | F ) = 7 /127 = .055

b.

P( R | N ) = 10 / 72 = .139

c.

The two events are independent if P(R | F) = P(R). P(R) = (7 + 11 + 10)/244 = 28/244 = .115 and P(R | F) = 7/127 = .055. Since these are not equal, events R and F are not independent.
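The comparison of P(R) with P(R | F) can be recomputed from the counts quoted in the solution. The dictionary layout below is an illustrative choice and is not taken from the exercise.

```python
from fractions import Fraction

# (retired, group size) for each compensation group, as quoted in the solution.
counts = {"F": (7, 127), "P": (11, 45), "N": (10, 72)}

total = sum(size for _, size in counts.values())
retired = sum(r for r, _ in counts.values())

p_r = Fraction(retired, total)        # P(R)
p_r_given_f = Fraction(*counts["F"])  # P(R | F)

print(round(float(p_r), 3), round(float(p_r_given_f), 3), p_r == p_r_given_f)
```

Since the two probabilities differ, R and F fail the independence condition.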

3.68

Define the following events: S: {believe agency’s approach is extremely/very successful} P: {paid sponsored content on social media}

From the Exercise, 𝑃(𝑆) = .32, 𝑃(𝑃|𝑆) = .79

𝑃(𝑃 ∩ 𝑆) = 𝑃(𝑃|𝑆)𝑃(𝑆) = .79(.32) = .2528

3.69

Define the following events: W: {Mode is Watch} R: {Mode is Read} L: {Mode is Listen} T: {Platform is TV} O: {Platform is Online} D: {Platform is Radio} P: {Platform is Print}

a.

.35 + .10 + .01 + .01 + .04 + .21 + .03 + .06 + .05 + .03 + .10 + .01 = 1.00

b.

𝑃(𝐿 ∩ 𝑇) = .05

c.

𝑃(𝑅 ∩ 𝑂) = .21

d.

𝑃(𝑊) = .35 + .10 + .01 + .01 = .47

e.

𝑃(𝑂) = .10 + .21 + .03 = .34

f.

𝑃(𝑇|𝑊) = 𝑃(𝑇 ∩ 𝑊)/𝑃(𝑊) = .35/.47 = .7447

g.

𝑃(𝑅|𝑂) = 𝑃(𝑅 ∩ 𝑂)/𝑃(𝑂) = .21/.34 = .6176

3.70

Define the following events: Q: {Subject quit smoking} H: {State has a high tax rate} L: {State has a low tax rate}

a.

𝑃(𝑄) = .6648

b.

𝑃(𝑄|𝐻) = 𝑃(𝑄 ∩ 𝐻)/𝑃(𝐻) = .6691

c.

𝑃(𝑄|𝐿) = 𝑃(𝑄 ∩ 𝐿)/𝑃(𝐿) = .5952

3.71

Define the following events: A: {Ambulance can travel to location A under 8 minutes} B: {Ambulance can travel to location B under 8 minutes} C: {Ambulance is busy}

We are given P(A) = .58, P(B) = .42, and P(C) = .3.

a.

P(A ∩ Cᶜ) = P(A | Cᶜ)P(Cᶜ) = .58(1 − .3) = .406

b.

P(B ∩ Cᶜ) = P(B | Cᶜ)P(Cᶜ) = .42(1 − .3) = .294

3.72

Define the following events: S: {Adult is smartphone only} C: {Adult has a college degree} A: {Adult is between 18-29 years old}

From the Exercise, 𝑃(𝑆) = .20, 𝑃(𝐶|𝑆) = .10, 𝑃(𝐴|𝑆) = .28

a.

𝑃(𝑆 ∩ 𝐶) = 𝑃(𝐶|𝑆)𝑃(𝑆) = .10(.20) = .020

b.

𝑃(𝑆 ∩ 𝐴) = 𝑃(𝐴|𝑆)𝑃(𝑆) = .28(.20) = .056

c.

𝑃(𝑆)𝑃(𝐶) = .20(.33) = .066 ≠ .020 = 𝑃(𝑆 ∩ 𝐶). Thus, S and C are not independent.

3.73

Define the following events: H: {Firefighter had no SOP for detecting/monitoring hydrogen cyanide in fire smoke} M: {Firefighter had no SOP for detecting/monitoring carbon monoxide in fire smoke}

From the Exercise, P(H) = .80, P(M) = .49, P(H ∪ M) = .94. We know P(H ∪ M) = P(H) + P(M) − P(H ∩ M) ⇒ P(H ∩ M) = P(H) + P(M) − P(H ∪ M) = .80 + .49 − .94 = .35

3.74

a.

Since there are 2 vineyards and 3 years, there are a total of 2(3) = 6 combinations.

b.

Of the 6 combinations, 3 of them are from the Llarga vineyard. Thus, P(Llarga) = 3 / 6 = .5 .

c.

Of the 6 combinations, 2 of them are Year 3. Thus, P(Year 3) = 2 / 6 = .333

d.

If the tasters are independent, then the probability that each selects Llarga is

P(Llarga) P(Llarga) P(Llarga) P(Llarga) = .5(.5)(.5)(.5) = .0625.

3.75

a.

𝑃(𝐴) = .5992

b.

𝑃(𝐵) = .4008

c.

𝑃(𝐶|𝐴) = .2208

d.

𝑃(𝐷|𝐵) = .1321

e.

𝑃(𝐴 ∩ 𝐵) = 0, as events A and B are mutually exclusive.

f.

𝑃(𝐵 ∩ 𝐷) = 𝑃(𝐷|𝐵)𝑃(𝐵) = .1321(.4008) = .0529

3.76

Define the following events: A1: {Adele 1 is selected first} A2: {Adele 2 is selected second} I1: {Imagine Dragons 1 is selected third} I2: {Imagine Dragons 2 is selected fourth} M5: {Maroon Five is selected fifth}

a.

P(A1) = 1/5

b.

P(A2 | A1) = 1/4

c.

𝑃(𝐼1|𝐴1 ∩ 𝐴2) = 1/3

d.

𝑃(𝐼2|𝐴1 ∩ 𝐴2 ∩ 𝐼1) = 1/2

e.

𝑃(𝑀5|𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2) = 1/1 = 1

f.

𝑃(𝐴) = 𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2 ∩ 𝑀5)
= 𝑃(𝑀5|𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2)𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2)
= 𝑃(𝑀5|𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2)𝑃(𝐼2|𝐴1 ∩ 𝐴2 ∩ 𝐼1)𝑃(𝐴1 ∩ 𝐴2 ∩ 𝐼1)
= 𝑃(𝑀5|𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2)𝑃(𝐼2|𝐴1 ∩ 𝐴2 ∩ 𝐼1)𝑃(𝐼1|𝐴1 ∩ 𝐴2)𝑃(𝐴1 ∩ 𝐴2)
= 𝑃(𝑀5|𝐴1 ∩ 𝐴2 ∩ 𝐼1 ∩ 𝐼2)𝑃(𝐼2|𝐴1 ∩ 𝐴2 ∩ 𝐼1)𝑃(𝐼1|𝐴1 ∩ 𝐴2)𝑃(𝐴2|𝐴1)𝑃(𝐴1)
= 1(1/2)(1/3)(1/4)(1/5) = 1/120

g.

𝑃(𝐴2) = 1/5, 𝑃(𝐼1|𝐴2) = 1/4, 𝑃(𝑀5|𝐴2 ∩ 𝐼1) = 1/3, 𝑃(𝐼2|𝐴2 ∩ 𝐼1 ∩ 𝑀5) = 1/2, 𝑃(𝐴1|𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2) = 1/1 = 1

𝑃(𝐵) = 𝑃(𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2 ∩ 𝐴1)
= 𝑃(𝐴1|𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2)𝑃(𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2)
= 𝑃(𝐴1|𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2)𝑃(𝐼2|𝐴2 ∩ 𝐼1 ∩ 𝑀5)𝑃(𝐴2 ∩ 𝐼1 ∩ 𝑀5)
= 𝑃(𝐴1|𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2)𝑃(𝐼2|𝐴2 ∩ 𝐼1 ∩ 𝑀5)𝑃(𝑀5|𝐴2 ∩ 𝐼1)𝑃(𝐴2 ∩ 𝐼1)
= 𝑃(𝐴1|𝐴2 ∩ 𝐼1 ∩ 𝑀5 ∩ 𝐼2)𝑃(𝐼2|𝐴2 ∩ 𝐼1 ∩ 𝑀5)𝑃(𝑀5|𝐴2 ∩ 𝐼1)𝑃(𝐼1|𝐴2)𝑃(𝐴2)
= 1(1/2)(1/3)(1/4)(1/5) = 1/120
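Both chain-rule products can be cross-checked by listing every play order. The sketch below assumes that each of the 5! orderings of the five songs is equally likely; the short song labels are shorthand introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Shorthand labels for the five songs (assumed for illustration).
songs = ("A1", "A2", "I1", "I2", "M5")
orders = list(permutations(songs))

def prob_of_order(target):
    """Probability of one specific play order when all orders are equally likely."""
    matches = sum(1 for order in orders if order == target)
    return Fraction(matches, len(orders))

p_a = prob_of_order(("A1", "A2", "I1", "I2", "M5"))
p_b = prob_of_order(("A2", "I1", "M5", "I2", "A1"))
print(len(orders), p_a, p_b)
```

Any single ordering has the same probability, which is why parts f and g give the same value.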

3.77

a.

P(E | Hp) = 1

b.

P(E | Hd) = P{6/9} + P{9/6} = .21(.14) + .14(.21) = .0294 + .0294 = .0588

c.

P(E | Hp)/P(E | Hd) = 1/.0588 = 17.0. Since this value is greater than 1, this supports the prosecution.

3.78

Define the following events: I: {Leak ignites immediately (jet fire)} D: {Leak has delayed ignition (flash fire)} From the problem, P( I ) = .01 and P( D | I c ) = .01 The probability of a jet fire or a flash fire = P( I ∪ D) = P( I ) + P( D) − P( I ∩ D) = P( I ) + P( D | I c ) P( I c ) − P( I ∩ D) = .01 + .01(1 − .01) − 0 = .01 + .0099 = .0199

A tree diagram of this problem is:

I (.01): P(I) = .01
Iᶜ (.99), then D (.01): P(Iᶜ ∩ D) = .99(.01) = .0099
Iᶜ (.99), then Dᶜ (.99): P(Iᶜ ∩ Dᶜ) = .99(.99) = .9801

3.79

a.

If the coin is balanced, then P( H ) = .5 and P(T ) = .5 on any trial. Also, we can assume that the results of any coin toss is independent of any other. Thus,

P(H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H) = P(H)P(H)P(H)P(H)P(H)P(H)P(H)P(H)P(H)P(H) = .5(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5) = .5¹⁰ = .0009766

P(H ∩ H ∩ T ∩ T ∩ H ∩ T ∩ T ∩ H ∩ H ∩ H) = P(H)P(H)P(T)P(T)P(H)P(T)P(T)P(H)P(H)P(H) = .5(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5) = .5¹⁰ = .0009766

P(T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T) = P(T)P(T)P(T)P(T)P(T)P(T)P(T)P(T)P(T)P(T) = .5(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5)(.5) = .5¹⁰ = .0009766

b.

Define the following events: A: {10 coin tosses result in all heads or all tails} B: {10 coin tosses result in mix of heads and tails}

P(A) = P(H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H ∩ H) + P(T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T ∩ T) = .0009766 + .0009766 = .0019532

c.

P(B) = 1 − P(A) = 1 − .0019532 = .9980468

d.

From the above probabilities, the chance that either all heads or all tails occurred is extremely small. Thus, if one of these sequences really occurred, it is most likely sequence #2.

3.80

Define the following events: C: {Erroneous ciphertext occurs} R: {Error in restoring plaintext}

From the Exercise, P(C) = β, P(R | C) = .5, P(R | Cᶜ) = αβ

P(R) = P(R | C)P(C) + P(R | Cᶜ)P(Cᶜ) = .5(β) + αβ(1 − β) = .5β + αβ − αβ² = β(.5 + α − αβ)

3.81

a.

P(B1 ∩ A) = P(A | B1)P(B1) = .3(.75) = .225

b.

P(B2 ∩ A) = P(A | B2)P(B2) = .5(.25) = .125

c.

P(A) = P(B1 ∩ A) + P(B2 ∩ A) = .225 + .125 = .35

d.

P(B1 | A) = P(B1 ∩ A)/P(A) = .225/.35 = .643

e.

P(B2 | A) = P(B2 ∩ A)/P(A) = .125/.35 = .357

3.82

First, we find the following probabilities:

P(A ∩ B1) = P(A | B1)P(B1) = .4(.2) = .08

P(A ∩ B2) = P(A | B2)P(B2) = .25(.15) = .0375

P(A ∩ B3) = P(A | B3)P(B3) = .6(.65) = .39

P(A) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3) = .08 + .0375 + .39 = .5075

a.

P(B1 | A) = P(A ∩ B1)/P(A) = .08/.5075 = .158

b.

P(B2 | A) = P(A ∩ B2)/P(A) = .0375/.5075 = .074

c.

P(B3 | A) = P(A ∩ B3)/P(A) = .39/.5075 = .768
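The three posterior probabilities follow the same pattern, so they can be computed together. The function name posterior below is a hypothetical helper introduced for this sketch; the priors and likelihoods are the values used in the solution.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for each i via Bayes' rule."""
    # Joint probabilities P(A and B_i) = P(A | B_i) P(B_i).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the law of total probability
    return [j / total for j in joint]

priors = [0.20, 0.15, 0.65]       # P(B1), P(B2), P(B3)
likelihoods = [0.40, 0.25, 0.60]  # P(A | B1), P(A | B2), P(A | B3)
post = posterior(priors, likelihoods)
print([round(p, 3) for p in post])
```

Because the B events partition the sample space, the posteriors should sum to 1, which is a useful sanity check on hand calculations.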

3.83

If A is independent of B1, B2, and B3, then P(A | B1) = P(A) = .4. Then P(B1 | A) = P(A | B1)P(B1)/P(A) = .4(.2)/.4 = .2

3.84

a.

The event B | A is the event that the report at time t + 1 is “OK” given the report at time t is “OK”.

b.

The event B | Aᶜ is the event that the report at time t + 1 is “OK” given the report at time t is not “OK”.

c.

P(Aᶜ) = 1 − P(A) = 1 − .8 = .2

d.

P(A ∩ B) = P(B | A)P(A) = .9(.8) = .72

e.

P(Aᶜ ∩ B) = P(B | Aᶜ)P(Aᶜ) = .5(.2) = .10

f.

P(B) = P(A ∩ B) + P(Aᶜ ∩ B) = .72 + .10 = .82

g.

P(A | B) = P(B | A)P(A)/P(B) = .9(.8)/.82 = .878

3.85

Define the following events: E: {Expert makes the correct decision} N: {Novice makes the correct decision} M: {Matched condition} E: {Similar distracter condition} E: {Non-similar distracter condition}

a.

P(Eᶜ | M) = 1 − .9212 = .0788

b.

P(Nᶜ | M) = 1 − .7455 = .2545

c.

Since P(Nᶜ | M) = .2545 > P(Eᶜ | M) = .0788, it is more likely that the participant is a Novice.

3.86

From the information given, 𝑃(𝐷) = 1/80, 𝑃(𝐷ᶜ) = 79/80, 𝑃(𝑁|𝐷) = 1/2, 𝑃(𝑁ᶜ|𝐷) = 1/2, 𝑃(𝑁|𝐷ᶜ) = 1, and 𝑃(𝑁ᶜ|𝐷ᶜ) = 0. Using Bayes’ Rule,

P(D | N) = P(D ∩ N)/P(N) = P(N | D)P(D)/[P(N | D)P(D) + P(N | Dᶜ)P(Dᶜ)]
= (1/2 ⋅ 1/80)/(1/2 ⋅ 1/80 + 1 ⋅ 79/80) = (1/160)/(1/160 + 79/80) = (1/160)/(1/160 + 158/160) = (1/160)/(159/160) = 1/159 = .0063

3.87

a.

Converting the percentages to probabilities, P(275−300) = .52, P(305−325) = .39, and P(330−350) = .09.

b. Using Bayes’ Theorem,

P(275−300 | CC) = P(275−300 ∩ CC) / P(CC)
= P(CC | 275−300)P(275−300) / [P(CC | 275−300)P(275−300) + P(CC | 305−325)P(305−325) + P(CC | 330−350)P(330−350)]
= .775(.52) / [.775(.52) + .77(.39) + .86(.09)] = .403 / (.403 + .3003 + .0774) = .403 / .7807 = .516

3.88

a. P(E1 | error) = P(E1 ∩ error) / P(error)
= P(error | E1)P(E1) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .01(.30) / [.01(.30) + .03(.20) + .02(.50)] = .003 / (.003 + .006 + .01) = .003 / .019 = .158

b. P(E2 | error) = P(E2 ∩ error) / P(error)
= P(error | E2)P(E2) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .03(.20) / [.01(.30) + .03(.20) + .02(.50)] = .006 / (.003 + .006 + .01) = .006 / .019 = .316

c. P(E3 | error) = P(E3 ∩ error) / P(error)
= P(error | E3)P(E3) / [P(error | E1)P(E1) + P(error | E2)P(E2) + P(error | E3)P(E3)]
= .02(.50) / [.01(.30) + .03(.20) + .02(.50)] = .01 / (.003 + .006 + .01) = .01 / .019 = .526

d. If there was a serious error, the probability that the error was made by engineer 3 is .526. This probability is higher than for any of the other engineers. Thus, engineer #3 is most likely responsible for the error.

3.89

Define the following events:
S: {Shale}
D: {Dolomite}
G: {Gamma ray reading > 60}

From the exercise: P(D) = 476/771 = .617, P(S) = 295/771 = .383, P(G | D) = 34/476 = .071, and P(G | S) = 280/295 = .949.

P(D ∩ G) = P(G | D)P(D) = .071(.617) = .0438 and
P(G) = P(G | D)P(D) + P(G | S)P(S) = .071(.617) + .949(.383) = .0438 + .3635 = .4073.

Thus, P(D | G) = P(D ∩ G) / P(G) = .0438 / .4073 = .1075. Since this probability is so small, we would suggest that the area should not be mined.

3.90

Define the following events:
D: {Defect in steel casting}
H: {NDE detects “Hit” or defect in steel casting}

From the problem, P(H | D) = .97, P(H | D^c) = .005, and P(D) = .01.

P(H) = P(H | D)P(D) + P(H | D^c)P(D^c) = .97(.01) + .005(.99) = .0097 + .00495 = .01465

P(D | H) = P(D ∩ H) / P(H) = P(H | D)P(D) / P(H) = .97(.01) / .01465 = .0097 / .01465 = .6621

3.91

Define the following events:
D: {Worker is drug user}
P: {Drug test is positive}

a. From the Exercise, P(D) = .05, P(P | D^c) = .05, P(P^c | D) = .05.

P(P) = P(P ∩ D^c) + P(P ∩ D) = P(P | D^c)P(D^c) + P(P | D)P(D) = .05(1 − .05) + (1 − .05)(.05) = .0475 + .0475 = .095

P(D^c | P) = P(P ∩ D^c) / P(P) = P(P | D^c)P(D^c) / P(P) = .05(1 − .05) / .095 = .0475 / .095 = .5

b.

From the Exercise, P(D) = .95, P(P | D^c) = .05, P(P^c | D) = .05.

P(P) = P(P ∩ D^c) + P(P ∩ D) = P(P | D^c)P(D^c) + P(P | D)P(D) = .05(1 − .95) + (1 − .05)(.95) = .0025 + .9025 = .905

P(D^c | P) = P(P ∩ D^c) / P(P) = P(P | D^c)P(D^c) / P(P) = .05(1 − .95) / .905 = .0025 / .905 = .0028

3.92

Define the following events: A: {Alarm A sounds alarm} B: {Alarm B sounds alarm} I: {Intruder} From the problem: P ( A | I ) = .9 , P ( B | I ) = .95 , P( A | I c ) = .2 , P( B | I c ) = .1 , and P( I ) = .4

Since the two systems are operating independently of each other,

P(A ∩ B | I) = P(A | I)P(B | I) = .9(.95) = .855
P(A ∩ B ∩ I) = P(A ∩ B | I)P(I) = .855(.4) = .342

P(A ∩ B | I^c) = P(A | I^c)P(B | I^c) = .2(.1) = .02
P(A ∩ B ∩ I^c) = P(A ∩ B | I^c)P(I^c) = .02(.6) = .012

Thus, P(A ∩ B) = P(A ∩ B ∩ I) + P(A ∩ B ∩ I^c) = .342 + .012 = .354

Finally, P(I | A ∩ B) = P(A ∩ B ∩ I) / P(A ∩ B) = .342 / .354 = .966

3.93

a. Using Bayes’ Theorem,

P(T | E) = P(T)P(E | T) / [P(T)P(E | T) + P(T^c)P(E | T^c)] and P(T^c | E) = P(T^c)P(E | T^c) / [P(T)P(E | T) + P(T^c)P(E | T^c)].

Thus, P(T | E) / P(T^c | E) = P(T)P(E | T) / [P(T^c)P(E | T^c)], since the common denominator cancels.

b. If P(T | E) / P(T^c | E) < 1, then P(T | E) < P(T^c | E). Thus, the probability of more than two bullets given the evidence is greater than the probability of two bullets given the evidence. This supports the theory that more than two bullets were used in the assassination of JFK.

3.94

a. If the Dow Jones Industrial Average increases, a large New York bank would tend to decrease the prime interest rate. Therefore, the two events are not mutually exclusive since they could occur simultaneously.

b. The next sale by a PC retailer could not be both a notebook and a desktop computer. Since the two events cannot occur simultaneously, the events are mutually exclusive.

c. Since both events cannot occur simultaneously, the events are mutually exclusive.

3.95

a. The two probability rules for a sample space are that the probability for any sample point is between 0 and 1 and that the sum of the probabilities of all the sample points is 1. For this Exercise, all the probabilities of the sample points are between 0 and 1 and

Σ P(Si) (i = 1 to 4) = P(S1) + P(S2) + P(S3) + P(S4) = .2 + .1 + .3 + .4 = 1.0

b. P(A) = P(S1) + P(S4) = .2 + .4 = .6

3.96

P ( A ∪ B) = P ( A) + P( B) − P( A ∩ B) = .7 + .5 − .4 = .8

3.97

a. If events A and B are mutually exclusive, then P(A ∩ B) = 0.

P(A | B) = P(A ∩ B) / P(B) = 0 / .3 = 0

b. No. If events A and B are independent, then P(A | B) = P(A). However, from the Exercise we know P(A) = .2 and from part a, we know P(A | B) = 0. Thus, events A and B are not independent.

3.98

a. Because events A and B are independent, we have:
P(A ∩ B) = P(A)P(B) = .3(.1) = .03

Thus, P(A ∩ B) ≠ 0, and the two events cannot be mutually exclusive.

b. P(A | B) = P(A ∩ B) / P(B) = .03 / .1 = .3

P(B | A) = P(A ∩ B) / P(A) = .03 / .3 = .1

c. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = .3 + .1 − .03 = .37

3.99

P(A ∩ B) = .4, P(A | B) = .8

Since P(A | B) = P(A ∩ B) / P(B), substitute the given probabilities into the formula and solve for P(B).

.8 = .4 / P(B) ⇒ P(B) = .4 / .8 = .5

3.100

The number of ways to select 5 things from 50 is a combination of 50 things taken 5 at a time, or

(50 choose 5) = 50! / (5!(50 − 5)!) = 50! / (5! 45!) = (50 · 49 · 48 · 47 · 46 · 45!) / (5 · 4 · 3 · 2 · 1 · 45!) = 2,118,760.
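As a quick check of the combinations rule, the count can be computed both from the factorial formula and with Python's built-in `math.comb` (available in Python 3.8 and later). This is an illustrative sketch only; the variable names are chosen here and are not part of the exercise.

```python
import math

n, k = 50, 5

# Combinations rule: n! / (k! (n - k)!)
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Same count using the standard-library helper
by_comb = math.comb(n, k)

print(by_formula, by_comb)  # 2118760 2118760
```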

3.101

a.

P(A ∩ B) = 0
P(B ∩ C) = P(2) = .2
P(A ∪ C) = P(1) + P(2) + P(3) + P(5) + P(6) = .3 + .2 + .1 + .1 + .2 = .9
P(A ∪ B ∪ C) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = .3 + .2 + .1 + .1 + .1 + .2 = 1
P(B^c) = P(1) + P(3) + P(5) + P(6) = .3 + .1 + .1 + .2 = .7
P(A^c ∩ B) = P(2) + P(4) = .2 + .1 = .3

P(B | C) = P(B ∩ C) / P(C) = P(2) / [P(2) + P(5) + P(6)] = .2 / (.2 + .1 + .2) = .2 / .5 = .4

P(B | A) = P(B ∩ A) / P(A) = 0 / P(A) = 0

b. Since P(A ∩ B) = 0, and P(A)P(B) > 0, these two would not be equal, implying A and B are not independent. However, A and B are mutually exclusive, since P(A ∩ B) = 0.

c. P(B) = P(2) + P(4) = .2 + .1 = .3. But P(B | C), calculated above, is .4. Since these are not equal, B and C are not independent. Since P(B ∩ C) = .2, B and C are not mutually exclusive.

3.102

a. 6! = 6 · 5 · 4 · 3 · 2 · 1 = 720

b. (10 choose 9) = 10! / (9!(10 − 9)!) = (10 · 9 · 8 ··· 1) / (9 · 8 · 7 ··· 1 · 1) = 10

c. (10 choose 1) = 10! / (1!(10 − 1)!) = (10 · 9 · 8 ··· 1) / (1 · 9 · 8 ··· 1) = 10

d. (6 choose 3) = 6! / (3!(6 − 3)!) = (6 · 5 · 4 · 3 · 2 · 1) / (3 · 2 · 1 · 3 · 2 · 1) = 20

e. 0! = 1

3.103

a. The sample points for this problem are the following: WLAN/Single, WSN/Single, AHN/Single, WLAN/Multi, WSN/Multi, and AHN/Multi. From the sample data collected, P(WLAN/Single) = 31/80, P(WSN/Single) = 13/80, P(AHN/Single) = 8/80, P(WLAN/Multi) = 14/80, P(WSN/Multi) = 9/80, and P(AHN/Multi) = 5/80.

b. P(Single Channel) = P(WLAN/Single) + P(WSN/Single) + P(AHN/Single) = 31/80 + 13/80 + 8/80 = 52/80 = .65

c. P(WLAN Network) = P(WLAN/Single) + P(WLAN/Multi) = 31/80 + 14/80 = 45/80 = .5625

3.104

Define the following events:
I: {personal illness}
F: {family issues}
N: {personal needs}
E: {entitlement mentality}
S: {stress}

a.

The 5 sample points are: I, F, N, E, S

b.

The probabilities of the sample points are: P(I) = 0.34, P(F) = 0.22, P(N) = 0.18, P(E) = 0.13, P(S) = 0.13

c.

The probability that the absence is due to something other than “personal illness” (I) is: P (not I ) = P( F ) + P( N ) + P( E ) + P( S ) = 0.22 + 0.18 + 0.13 + 0.13 = 0.66

3.105

a.

𝑃(𝐴) = .08. The probability that a randomly selected adult earns money from online gig work is 8%.

b.

𝑃(𝐵) = 1 − 𝑃(𝐴) = 1 − .08 = .92. The probability that a randomly selected adult does not earn money from online gig work is 92%.

c.

𝑃(𝐴 ∩ 𝐶 ) = .045. The probability that a randomly selected adult earns money from online gig work and says that the income they earn is essential or important is 4.5%.

d.

P(C | A) = P(A ∩ C) / P(A) = .045 / .08 = .5625.

56.25% of all adults who earn money from online gig work say that the money they earn is essential or important.

3.106

Define the following events: A: {problems with absenteeism} T: {problems with turnover} From the problem, P ( A) = .55, P (T ) = .41 , and P ( A ∩ T ) = .22 P(problems with either absenteeism or turnover) = P ( A ∪ T ) = P ( A) + P(T ) − P( A ∩ T ) = .55 + .41 − .22 = .74

3.107

a.

This statement is false. All probabilities are between 0 and 1 inclusive. One cannot have a probability of 4.

b.

If we assume that the probabilities are the same as the percents (changed to proportions), then this is a true statement. P (4 or 5) = P (4) + P(5) = .6020 + .1837 = .7857

c. This statement is true. There were no observations with one star. Thus, P(1) = 0.

d. This statement is false. P(2) = .0408 and P(5) = .1837. P(5) > P(2).

3.108

Define the following events:
S: {cause of fatal crash is speeding}
C: {cause of fatal crash is missing a curve}

From the problem, we know P(S) = .3 and P(S ∩ C) = .12.

P(C | S) = P(C ∩ S) / P(S) = .12 / .3 = .4

3.109

a.

Since we want to maximize the purchase of grill #2, grill #2 must be one of the 3 grills in the display. Thus, we have to pick 2 more grills from the 4 remaining grills. Since order does not matter, the number of different ways to select 2 grill displays from 4 would be a combination of 4 things taken 2 at a time. The number of ways is:

(4 choose 2) = 4! / (2!(4 − 2)!) = (4 · 3 · 2 · 1) / (2 · 1 · 2 · 1) = 24 / 4 = 6

Let Gi represent Grill i. The possibilities are: G1G2G3, G1G2G4, G1G2G5, G2G3G4, G2G3G5, G2G4G5

b. To find reasonable probabilities for the 6 possibilities, we divide the frequencies by the total sample size of 124. The probabilities would be:

P(G1G2G3) = 35/124 = .282
P(G1G2G4) = 8/124 = .065
P(G1G2G5) = 42/124 = .339
P(G2G3G4) = 4/124 = .032
P(G2G3G5) = 1/124 = .008
P(G2G4G5) = 34/124 = .274

c. P(display contained Grill #1) = P(G1G2G3) + P(G1G2G4) + P(G1G2G5) = .282 + .065 + .339 = .686

3.110

Define the following events:
A: {oil structure is active}
I: {oil structure is inactive}
C: {oil structure is caisson}
W: {oil structure is well protector}
F: {oil structure is fixed platform}

a.

The simple events are all combinations of structure type and activity type. The simple events are: AC, AW, AF, IC, IW, IF

b.

Reasonable probabilities would be the frequency divided by the sample size of 3,400. The probabilities are:

P(AC) = 503/3,400 = .148
P(AW) = 225/3,400 = .066
P(AF) = 1,447/3,400 = .426
P(IC) = 598/3,400 = .176
P(IW) = 177/3,400 = .052
P(IF) = 450/3,400 = .132

c.

P ( A) = P( AC ) + P( AW ) + P( AF ) = .148 + .066 + .426 = .640

d.

P (W ) = P ( AW ) + P ( IW ) = .066 + .052 = .118

e.

P( IC ) = .176

f.

P ( I ∪ F ) = P ( IC ) + P ( IW ) + P( IF ) + P( AF ) = .176 + .052 + .132 + .426 = .786

g.

P(C c ) = 1 − P(C ) = 1 − ( P( AC ) + P( IC ) ) = 1 − (.148 + .176 ) = 1 − .324 = .676
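The event probabilities in parts c through g can also be computed directly from the six frequencies rather than from the rounded sample-point probabilities. The following Python sketch is illustrative only; the dictionary layout and the helper `prob` are assumptions made here, not part of the exercise.

```python
# Frequencies from Exercise 3.110, keyed by (activity, structure type)
counts = {
    ("A", "C"): 503, ("A", "W"): 225, ("A", "F"): 1447,
    ("I", "C"): 598, ("I", "W"): 177, ("I", "F"): 450,
}
total = sum(counts.values())  # 3,400 structures


def prob(event):
    """P(event), where event is a predicate on (activity, structure)."""
    return sum(n for key, n in counts.items() if event(*key)) / total


p_active = prob(lambda act, st: act == "A")               # part c
p_i_or_f = prob(lambda act, st: act == "I" or st == "F")  # part f
p_not_c = 1 - prob(lambda act, st: st == "C")             # part g

print(round(p_active, 3), round(p_i_or_f, 3), round(p_not_c, 3))
# 0.64 0.786 0.676
```

Working from the raw counts avoids accumulating rounding error; here the results agree with the answers above to three decimal places.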

3.111

a.

The international consumer is most likely to use the Certification mark on a label to identify a green product.

b.

Define the following events:
A: {Certification mark on label}
B: {Packaging}
C: {Reading information about the product}
D: {Advertisement}
E: {Brand website}
F: {Other}

P(A or B) = P(A) + P(B) = .45 + .15 = .60

c. P(C or E) = P(C) + P(E) = .12 + .04 = .16

d. P(not D) = P(A) + P(B) + P(C) + P(E) + P(F) = .45 + .15 + .12 + .04 + .18 = .94

3.112

The total number of mothers in the workforce who have children is: 15,545 + 7,338 + 6,018 + 2,108 = 31,009

From the problem, P(A) = .6954 and P(B) = .7379.

P(A)P(B) = .6954(.7379) = .5132 ≠ .5013 = P(A ∩ B)

Therefore, events A and B are not independent.

3.113

a. P(A) = 1,465 / 2,143 = .684

b. P(B) = 265 / 2,143 = .124

c. No. There is one sample point that they have in common: Plaintiff trial win – reversed, Jury

d. P(A^c) = 1 − P(A) = 1 − .684 = .316

e. P(A ∪ B) = (194 + 71 + 429 + 111 + 731) / 2,143 = 1,536 / 2,143 = .717

f. P(A ∩ B) = 194 / 2,143 = .091

3.114

a.

P ∩ S ∩ A . Products 6 and 7 are contained in this intersection.

b.

P(possess all the desired characteristics) = P(P ∩ S ∩ A) = P(6) + P(7) = 1/10 + 1/10 = 1/5

c. A ∪ S

P(A ∪ S) = P(2) + P(3) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = 8/10 = 4/5

d. P ∩ S

P(P ∩ S) = P(2) + P(6) + P(7) = 1/10 + 1/10 + 1/10 = 3/10

e. The number of different pairs of products would be a combination of 10 products taken 2 at a time or

(10 choose 2) = 10! / (2!(10 − 2)!) = (10 · 9 · 8 ··· 1) / (2 · 1 · 8 · 7 · 6 ··· 1) = (10 · 9) / 2 = 45

3.115

Define the following events: A: {The watch is accurate} N: {The watch is not accurate} Assuming the manufacturer's claim is correct, P ( N ) = .05 and P ( A) = 1 − P ( N ) = 1 − .05 = .95

The sample space for the purchase of four of the manufacturer's watches is listed below.

(A, A, A, A)  (N, A, A, A)  (A, N, N, A)  (N, A, N, N)
(A, A, A, N)  (A, A, N, N)  (N, A, N, A)  (N, N, A, N)
(A, A, N, A)  (A, N, A, N)  (N, N, A, A)  (N, N, N, A)
(A, N, A, A)  (N, A, A, N)  (A, N, N, N)  (N, N, N, N)

a. All four watches not being accurate as claimed is the sample point (N, N, N, N). Assuming the watches purchased operate independently and the manufacturer's claim is correct,

P(N, N, N, N) = P(N)P(N)P(N)P(N) = .05^4 = .00000625

b.

The sample points in the sample space that consist of exactly two watches failing to meet the claim are listed below. (A, A, N, N) (N, A, A, N) (A, N, A, N) (N, A, N, A) (A, N, N, A) (N, N, A, A) The probability that exactly two of the four watches fail to meet the claim is the sum of the probabilities of these six sample points. Assuming the watches purchased operate independently and the manufacturer's claim is correct, P ( A, A, N , N ) = P( A) P( A) P( N ) P( N ) = .95(.95)(.05)(.05) = .00225625

All six of the sample points will have the same probability. Therefore, the probability that exactly two of the four watches fail to meet the claim when the manufacturer's claim is correct is 6(.00225625) = .0135

c.

The sample points in the sample space that consist of three of the four watches failing to meet the claim are listed below. (A, N, N, N) (N, N, A, N) (N, A, N, N) (N, N, N, A) The probability that three of the four watches fail to meet the claim is the sum of the probabilities of the four sample points. Assuming the watches purchased operate independently and the manufacturer's claim is correct, P ( A, N , N , N ) = P ( A) P( N ) P( N ) P( N ) = .95(.05)(.05)(.05) = .00011875

All four of the sample points will have the same probability. Therefore, the probability that three of the four watches fail to meet the claim when the manufacturer's claim is correct is 4(.00011875) = .000475

If this event occurred, we would tend to doubt the validity of the manufacturer's claim since its probability of occurring is so small.

d. All four watches tested failing to meet the claim is the sample point (N, N, N, N). Assuming the watches purchased operate independently and the manufacturer's claim is correct,

P(N, N, N, N) = P(N)P(N)P(N)P(N) = .05(.05)(.05)(.05) = .00000625

Since the probability of observing this event is so small if the claim is true, we have strong evidence against the validity of the claim. However, we do not have conclusive proof that the claim is false. There is still a chance the event can occur (with probability .00000625) although it is extremely small.

3.116

The possible ways of ranking the blades are: GSW, SGW, WGS, GWS, SWG, WSG

If the consumer had no preference but still ranked the blades, then the 6 possibilities are equally likely. Therefore, each of the 6 possibilities has a probability of 1/6 of occurring.

a. P(Ranks G first) = P(GSW) + P(GWS) = 1/6 + 1/6 = 2/6 = 1/3

b. P(Ranks G last) = P(SWG) + P(WSG) = 1/6 + 1/6 = 2/6 = 1/3

c. P(ranks G last and W second) = P(SWG) = 1/6

d. P(WGS) = 1/6
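The equally likely rankings in Exercise 3.116 can be enumerated rather than listed by hand. The sketch below is illustrative only; it uses `itertools.permutations` and exact fractions, and the helper `prob` is a name introduced here.

```python
from fractions import Fraction
from itertools import permutations

# All orderings of the three blades G, S, W (first letter = ranked first)
rankings = ["".join(p) for p in permutations("GSW")]
p_each = Fraction(1, len(rankings))  # equally likely, so 1/6 each


def prob(event):
    """Sum the probabilities of the rankings that satisfy the event."""
    return sum(p_each for r in rankings if event(r))


p_g_first = prob(lambda r: r[0] == "G")
p_g_last = prob(lambda r: r[-1] == "G")
p_g_last_w_second = prob(lambda r: r[-1] == "G" and r[1] == "W")
p_wgs = prob(lambda r: r == "WGS")

print(p_g_first, p_g_last, p_g_last_w_second, p_wgs)  # 1/3 1/3 1/6 1/6
```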

3.117 Define the following event: A: {The specimen labeled “red snapper” was really red snapper} a.

The probability that you are actually served red snapper the next time you order it at a restaurant is P ( A) = 1 − .77 = .23

b.

P(at least one customer is actually served red snapper) = 1 − P(no customer is actually served red snapper)
= 1 − P(A^c ∩ A^c ∩ A^c ∩ A^c ∩ A^c) = 1 − P(A^c)P(A^c)P(A^c)P(A^c)P(A^c) = 1 − .77^5 = 1 − .271 = .729

Note: In order to compute the above probability, we had to assume that the trials or events are independent. This assumption is likely to not be valid. If a restaurant served one customer a look-alike variety, then it probably served the next one a look-alike variety.

3.118

a. Consecutive tosses of a coin are independent events since what occurs one time would not affect the next outcome.

b. If the individuals are randomly selected, then what one individual says should not affect what the next person says. They are independent events.

c. The results in two consecutive at-bats are probably not independent. The player may have faced the same pitcher both times, which may affect the outcome.

d. The amounts of gain and loss for two different stocks bought and sold on the same day are probably not independent. The market might be way up or down on a certain day so that all stocks are affected.

e. The amounts of gain or loss for two different stocks that are bought and sold in different time periods are independent. What happens to one stock should not affect what happens to the other.

f. The prices bid by two different development firms in response to the same building construction proposal would probably not be independent. The same variables would be present for both firms to consider in their bids (materials, labor, etc.).

3.119

Define the following events:
I: {Invests in Market}
N: {No investment}

a. P(I) = 44,651 / 158,044 = .283

b. P(IQ ≥ 6) = (31,943 + 17,958 + 12,145 + 9,531) / 158,044 = 71,577 / 158,044 = .453

c. P(I ∩ {IQ ≥ 6}) = (10,270 + 6,698 + 5,135 + 4,464) / 158,044 = 26,567 / 158,044 = .168

d. P(I ∪ {IQ ≥ 6}) = P(I) + P(IQ ≥ 6) − P(I ∩ {IQ ≥ 6}) = .283 + .453 − .168 = .568

e.

P ( I c ) = 1 − P ( I ) = 1 − .283 = .717

f.

Two events are mutually exclusive if the probability of their intersection is 0. P(I ∩ {IQ = 1}) = 893 / 158,044 = .006. Since this value is not 0, these two events are not mutually exclusive.

g. P(I | IQ ≥ 6) = P(I ∩ {IQ ≥ 6}) / P(IQ ≥ 6) = [(10,270 + 6,698 + 5,135 + 4,464) / 158,044] / [(31,943 + 17,958 + 12,145 + 9,531) / 158,044] = 26,567 / 71,577 = .371

h. P(I | IQ ≤ 5) = P(I ∩ {IQ ≤ 5}) / P(IQ ≤ 5) = [(44,651 − 26,567) / 158,044] / [(158,044 − 71,577) / 158,044] = 18,084 / 86,467 = .209

i. Yes, it appears that investing in the stock market is dependent on IQ. If investing in the stock market and IQ were independent, then P(I | IQ ≤ 5) = P(I | IQ ≥ 6) = P(I). Since P(I | IQ ≤ 5) ≠ P(I | IQ ≥ 6), investing in the stock market and IQ are dependent.
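The comparison in part i can be reproduced from the counts used in parts a through h. This is a minimal Python sketch under the assumption that the frequencies quoted above are the only inputs needed; the variable names are chosen here for illustration.

```python
total = 158_044                                        # all individuals
investors = 44_651                                     # invest in the market
high_iq = 31_943 + 17_958 + 12_145 + 9_531             # IQ >= 6
investors_high_iq = 10_270 + 6_698 + 5_135 + 4_464     # invest and IQ >= 6

p_i = investors / total
# Conditional probability = joint count / conditioning count
p_i_given_high = investors_high_iq / high_iq
p_i_given_low = (investors - investors_high_iq) / (total - high_iq)

print(round(p_i, 3), round(p_i_given_high, 3), round(p_i_given_low, 3))
# 0.283 0.371 0.209
```

Because the two conditional probabilities differ from each other and from P(I), the counts are consistent with the conclusion that the events are dependent.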

3.120 Define the following events: A1: {Paraguay is assigned to Group A} A2: {Ecuador is assigned to Group A} B1: {Paraguay is assigned to Group B} B2: {Sweden or top team in pot 3 is assigned to Group B} D1: {Paraguay is assigned to Group D} D2: {Ecuador is assigned to Group D} If the teams are drawn at random from each pot, the probability that any team is assigned to a group is 1/8. a.

P ( A1 ) = 1/ 8 = .125

b.

P ( A1 ∪ A2 ) = P ( A1 ) + P( A2 ) − P( A1 ∩ A2 ) = 1 / 8 + 1/ 8 − 0 = 2 / 8 = .25

c.

P ( B1 ∩ B2 ) = P( B1 ) P( B2 ) = (1/ 8)(2 / 8) = 2 / 64 = .03125

d.

We can look at this probability by looking at how the slots can be filled. We will just look at how the teams from pot 2 can be put into Groups A, B, C, and D. The order of filling these really does not matter, so we will look at the ways to fill Group C, then Group D, then Group A, then Group B.

First, we will find the total number of ways we can fill these 4 slots or Groups where Group C cannot have Paraguay or Ecuador. Since Group C cannot have Paraguay or Ecuador, there are only 6 ways to fill Group C. There would then be 7 ways to fill Group D, 6 ways to fill Group A, and 5 ways to fill Group B. The total ways to fill these 4 Groups without having Paraguay or Ecuador in Group C is 6(7)(6)(5) = 1,260.

Now, we will find the number of ways we can fill these 4 Groups where Group C cannot have Paraguay or Ecuador and Group D does have either Paraguay or Ecuador. There will be 6 ways to fill Group C, 2 ways to fill Group D, 6 ways to fill Group A, and 5 ways to fill Group B. The total ways to fill these 4 Groups without having Paraguay or Ecuador in Group C and having either Paraguay or Ecuador in Group D is 6(2)(6)(5) = 360.

Thus, the probability that Group C does not have either Paraguay or Ecuador and Group D does have either Paraguay or Ecuador is 360 / 1,260 = 2/7 = .286. Finally, the probability that Group D does not have either Paraguay or Ecuador is 1 − .286 = .714.

3.121

Define the following events:
S1: {Salesman makes sale on the first visit}
S2: {Salesman makes a sale on the second visit}

P(S1) = .4 and P(S2 | S1^c) = .65

The sample points of the experiment are: S1 ∩ S2^c, S1^c ∩ S2, S1^c ∩ S2^c

The probability the salesman will make a sale is:

P(S1 ∩ S2^c) + P(S1^c ∩ S2) = P(S1) + P(S2 | S1^c)P(S1^c) = .4 + .65(1 − .4) = .4 + .39 = .79

3.122

Define the following events:
U: {Athlete uses testosterone}
P: {Test is positive}

a. Sensitivity is P(P | U) = 50/100 = .5

b. Specificity is P(P^c | U^c) = 1 − 9/900 = 1 − .01 = .99

c. First, we need to find the probability that an athlete is a user: P(U) = 100/1000 = .1.

Next, we need to find the probability of a positive test:
P(P) = P(P | U)P(U) + P(P | U^c)P(U^c) = .5(.1) + .01(.9) = .05 + .009 = .059

Positive predictive value is P(U | P) = P(U ∩ P) / P(P) = P(P | U)P(U) / P(P) = .5(.1) / .059 = .847

3.123

a.

Suppose we let the four positions in a sample point represent in order (1) Raise a broad mix of crops, (2) Raise livestock, (3) Use chemicals sparingly, and (4) Use techniques for regenerating the soil, such as crop rotation. A farmer is either likely (L) to engage in an activity or unlikely (U). The possible classifications are: LLLL LLLU LLUL LULL ULLL LLUU LULU LUUL ULLU ULUL UULL LUUU ULUU UULU UUUL UUUU

b. Since there are 16 classifications or sample points and all are equally likely, each has a probability of 1/16.

P(UUUU) = 1/16

c. The probability that a farmer will be classified as likely on at least three criteria is

P(LLLL) + P(LLLU) + P(LLUL) + P(LULL) + P(ULLL) = 5(1/16) = 5/16.

3.124

Define the following events:
C: {Committee judges joint acceptable}
I: {Inspector judges joint acceptable}

The sample points of this experiment are: C ∩ I, C ∩ I^c, C^c ∩ I, C^c ∩ I^c

a. The probability the inspector judges the joint to be acceptable is:
P(I) = P(C ∩ I) + P(C^c ∩ I) = 101/153 + 23/153 = 124/153 = .810

The probability the committee judges the joint to be acceptable is:
P(C) = P(C ∩ I) + P(C ∩ I^c) = 101/153 + 10/153 = 111/153 = .725

b. The probability that both the committee and the inspector judge the joint to be acceptable is:
P(C ∩ I) = 101/153 = .660

The probability that neither judges the joint to be acceptable is:
P(C^c ∩ I^c) = 19/153 = .124

c. The probability the inspector and committee disagree is:
P(C ∩ I^c) + P(C^c ∩ I) = 10/153 + 23/153 = 33/153 = .216

The probability the inspector and committee agree is:
P(C ∩ I) + P(C^c ∩ I^c) = 101/153 + 19/153 = 120/153 = .784

3.125

a.

The tree diagram would be:

A
  R
    1: AR1
    2: AR2
    3: AR3
  Bl
    1: ABl1
    2: ABl2
    3: ABl3
B
  R
    1: BR1
    2: BR2
    3: BR3
  Bl
    1: BBl1
    2: BBl2
    3: BBl3

b. No. If black TVs are in higher demand than red TVs, then the probabilities involving black TVs should be higher than the probabilities involving red TVs with similar characteristics.

3.126

Define the following event:
R: {Recycle}

For the usefulness is salient condition, P(R) = 26/39 = .667.

For the control condition, P(R) = 14/39 = .359.

Those in the usefulness is salient condition recycled at almost twice the rate of those in the control condition.

3.127

The probability of a false positive is P( A | B) .

3.128

a.

The total number of different combinations of the three dice is 6 × 6 × 6 = 216 .

b.

There is only 1 way that the outcome {1,1,1} can be obtained.

c.

The number of outcomes that do not include a 3 is 5 × 5 × 5 = 125. Therefore, the number of outcomes that include at least one 3 is 216 − 125 = 91.

d.

There are 3 ways to obtain an outcome with two 1s and one 2: {1,1,2} , {1,2,1} , and { 2,1,1} .

e.

There are 3 ways to obtain an outcome with one 1 and two 2s: {1,2,2}, {2,1,2}, and {2,2,1}.

f.

There are 6 ways to obtain an outcome with one 1, one 2, and one 4: {1,2,4} , { 2,1,4} , {1,4,2} , { 2,4,1} , { 4,1,2} , and { 4,2,1} .

g.

There are 6 ways to obtain an outcome with one 1, one 2, and one 5: {1,2,5} , { 2,1,5} , {1,5,2} , { 2,5,1} , {5,1,2} , and {5,2,1} .

h.

There are 6 ways to obtain an outcome with one 1, one 2, and one 6: {1,2,6} , { 2,1,6} , {1,6,2} , { 2,6,1} , { 6,1,2} , and { 6,2,1} .

i.

The total number of “Fratilli” outcomes is 1 + 91 + 3 + 3 + 6 + 6 + 6 = 116 .

j.

The probability of a “Fratilli” outcome is P(F) = 116/216 = .537. This is slightly different than Cardan’s probability.

3.129

Define the following events:
A: {Press is correctly adjusted}
B: {Press is incorrectly adjusted}
D: {Part is defective}

From the exercise, P(A) = .90, P(D | A) = .05, and P(D | B) = .50. We also know that event B is the complement of event A. Thus, P(B) = 1 − P(A) = 1 − .90 = .10.

P(B | D) = P(B ∩ D) / P(D) = P(D | B)P(B) / [P(D | B)P(B) + P(D | A)P(A)] = .50(.10) / [.50(.10) + .05(.90)] = .05 / (.05 + .045) = .05 / .095 = .526

3.130

There are a total of 6 × 6 = 36 outcomes when rolling 2 dice. If we let the first number in the pair represent the outcome of die number 1 and the second number in the pair represent the outcome of die number 2, then the possible outcomes are: 1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

If both dice are fair, then each of these outcomes is equally likely and has a probability of 1/36.

a. To win on the first roll, a player must roll a 7 or 11. There are 6 ways to roll a 7 and 2 ways to roll an 11. Thus, the probability of winning on the first roll is:
P(7 or 11) = 8/36 = .2222

b. To lose on the first roll, a player must roll a 2 or 3. There is 1 way to roll a 2 and 2 ways to roll a 3. Thus, the probability of losing on the first roll is:
P(2 or 3) = 3/36 = .0833

c. If a player rolls a 4 on the first roll, the game will end on the next roll if the player rolls a 4 (player wins) or a 7 (player loses). There are 3 ways to roll a 4 and 6 ways to roll a 7. Thus,
P(4 or 7 on 2nd roll) = (3 + 6)/36 = 9/36 = .25.

3.131

Define the following events:
A: {Dealer draws a blackjack}
B: {Player draws a blackjack}

a. For the dealer to draw a blackjack, he needs to draw an ace and a face card. There are

(4 choose 1) = 4! / (1!(4 − 1)!) = (4 · 3 · 2 · 1) / (1 · 3 · 2 · 1) = 4 ways to draw an ace and
(12 choose 1) = 12! / (1!(12 − 1)!) = (12 · 11 · 10 ··· 1) / (1 · 11 · 10 · 9 ··· 1) = 12 ways to draw a face card (there are 12 face cards in the deck).

The total number of ways a dealer can draw a blackjack is 4 · 12 = 48. The total number of ways a dealer can draw 2 cards is

(52 choose 2) = 52! / (2!(52 − 2)!) = (52 · 51 · 50 ··· 1) / (2 · 1 · 50 · 49 · 48 ··· 1) = 1,326

Thus, the probability that the dealer draws a blackjack is P(A) = 48 / 1,326 = .0362

b.

In order for the player to win with a blackjack, the player must draw a blackjack and the dealer does not. Using our notation, this is the event B ∩ AC . We need to find the probability that the player draws a blackjack ( P ( B ) ) and the probability that the dealer does not draw a blackjack given the

player does ( P ( Ac | B ) ) . Then, the probability that the player wins with a blackjack is P( Ac | B) P( B) .

The probability that the player draws a blackjack is the same as the probability that the dealer draws a blackjack, which is $P(B) = .0362$. There are 5 scenarios in which the dealer will not draw a blackjack given that the player does. First, the dealer could draw an ace and a card that is neither an ace nor a face card. Next, the dealer could draw a face card and a card that is neither an ace nor a face card. Third, the dealer could draw two cards that are not aces or face cards. Fourth, the dealer could draw two aces, and finally, the dealer could draw two face cards.

(Note: Given the player has drawn a blackjack, 50 cards remain: 3 aces, 11 face cards, and 36 cards that are neither aces nor face cards.)

The number of ways the dealer could draw an ace and not a face card given the player draws a blackjack is

$\binom{3}{1}\binom{36}{1} = \frac{3!}{1!(3-1)!} \cdot \frac{36!}{1!(36-1)!} = 3(36) = 108$

The number of ways the dealer could draw a face card and not an ace given the player draws a blackjack is

$\binom{11}{1}\binom{36}{1} = \frac{11!}{1!(11-1)!} \cdot \frac{36!}{1!(36-1)!} = 11(36) = 396$

The number of ways the dealer could draw neither a face card nor an ace given the player draws a blackjack is

$\binom{36}{2} = \frac{36!}{2!(36-2)!} = \frac{36 \cdot 35}{2 \cdot 1} = 630$

The number of ways the dealer could draw two aces given the player draws a blackjack is

$\binom{3}{2} = \frac{3!}{2!(3-2)!} = \frac{3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 1} = 3$

The number of ways the dealer could draw two face cards given the player draws a blackjack is

$\binom{11}{2} = \frac{11!}{2!(11-2)!} = \frac{11 \cdot 10}{2 \cdot 1} = 55$

The total number of ways the dealer can draw two cards given the player draws a blackjack is

$\binom{50}{2} = \frac{50!}{2!(50-2)!} = \frac{50 \cdot 49}{2 \cdot 1} = 1225$

The probability that the dealer does not draw a blackjack given the player draws a blackjack is

$P(A^c \mid B) = \frac{108 + 396 + 630 + 3 + 55}{1225} = \frac{1192}{1225} = .9731$

Finally, the probability that the player wins with a blackjack is

$P(B \cap A^c) = P(A^c \mid B)P(B) = .9731(.0362) = .0352$
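The counting argument above can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the variable names are illustrative, and it assumes the same deck composition used above (4 aces, 12 face cards, 36 other cards).

```python
from math import comb

# Dealer blackjack: one of 4 aces paired with one of 12 face cards.
blackjack_hands = 4 * 12
two_card_hands = comb(52, 2)
p_a = blackjack_hands / two_card_hands          # P(A) = P(B)

# After the player's blackjack, 50 cards remain.
aces, faces, others = 3, 11, 36
non_blackjack_hands = (
    aces * others        # ace plus a non-ace, non-face card
    + faces * others     # face card plus a non-ace, non-face card
    + comb(others, 2)    # two cards that are neither aces nor face cards
    + comb(aces, 2)      # two aces
    + comb(faces, 2)     # two face cards
)
remaining_hands = comb(aces + faces + others, 2)
p_ac_given_b = non_blackjack_hands / remaining_hands

# Player wins with a blackjack: P(B and A^c) = P(A^c | B) P(B).
p_win = p_ac_given_b * p_a

print(round(p_a, 4), round(p_ac_given_b, 4), round(p_win, 4))
```

The three printed values correspond to P(A), P(A^c | B), and P(B ∩ A^c) in the solution.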

3.132

Define the following events:
A: {Algorithm predicts defects}
B: {Module has defects}
C: {Algorithm is correct}

a.

Accuracy = $P(C) = P(A \cap B) + P(A^c \cap B^c) = \frac{d}{a+b+c+d} + \frac{a}{a+b+c+d} = \frac{a+d}{a+b+c+d}$

b.

Detection rate = $P(A \mid B) = \frac{d}{b+d}$

c.

False alarm = $P(A \mid B^c) = \frac{c}{a+c}$

d.

Precision = $P(B \mid A) = \frac{d}{c+d}$

e.

From the SWDEFECTS file the table is:

                                     Module Has Defects
                                     False      True
Algorithm Predicts Defects    No      400        29
                              Yes      49        20

Accuracy = $P(C) = P(A \cap B) + P(A^c \cap B^c) = \frac{d}{a+b+c+d} + \frac{a}{a+b+c+d} = \frac{d+a}{a+b+c+d} = \frac{20+400}{400+29+49+20} = \frac{420}{498} = .843$
The probability that the algorithm is correct is .843.

Detection rate = $P(A \mid B) = \frac{d}{b+d} = \frac{20}{29+20} = \frac{20}{49} = .408$
The probability that the algorithm predicts a defect given the module is actually defective is .408.

False alarm = $P(A \mid B^c) = \frac{c}{a+c} = \frac{49}{400+49} = \frac{49}{449} = .109$
The probability that the algorithm predicts a defect given the module is not defective is .109.

Precision = $P(B \mid A) = \frac{d}{c+d} = \frac{20}{49+20} = \frac{20}{69} = .290$
The probability that the module is defective given the algorithm predicted a defect is .290.
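The four measures in parts a through d can be computed directly from the cell counts. The sketch below is illustrative only; the function name `defect_metrics` is hypothetical, and the cell labels a, b, c, d are assumed to match their use in the formulas above (a = 400, b = 29, c = 49, d = 20).

```python
def defect_metrics(a, b, c, d):
    """Return the four measures from the 2x2 table of counts.

    a: predicts no defect, module has no defect
    b: predicts no defect, module has a defect
    c: predicts defect, module has no defect
    d: predicts defect, module has a defect
    """
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,     # P(C)
        "detection_rate": d / (b + d),   # P(A | B)
        "false_alarm": c / (a + c),      # P(A | B^c)
        "precision": d / (c + d),        # P(B | A)
    }

metrics = defect_metrics(a=400, b=29, c=49, d=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```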

3.133

First, we will list all possible sample points for placing a car (C) and 2 goats (G) behind doors #1, #2, and #3. If the first position corresponds to door #1, the second position corresponds to door #2, and the third position corresponds to door #3, the sample space is:

(C G G) (G C G) (G G C)

Now, suppose you pick door #1. Initially, the probability that you will win the car is 1/3 – only one of the sample points has a car behind door #1. The host will now open a door behind which is a goat. If you pick door #1 in the first sample point (C G G), the host will open either door #2 or door #3. Suppose he opens door #3 (it really does not matter). If you pick door #1 in the second sample point (G C G), the host will open door #3. If you pick door #1 in the third sample point (G G C), the host will open door #2. Now, the new sample space will be:

(C G) (G C) (G C)

where the first position corresponds to door #1 (the one you chose) and the second position corresponds to the door that was not opened by the host. Now, if you keep door #1, the probability that you win the car is 1/3. However, if you switch to the remaining door, the probability that you win the car is now 2/3. Based on these probabilities, it is to your advantage to switch doors.

The above could be repeated by selecting door #2 initially or door #3 initially. In either of these cases, again, the probability of winning the car is 1/3 if you do not switch and 2/3 if you switch. Thus, Marilyn was correct.

3.134

Suppose we define the following event:

E: {Error produced when dividing}

From the problem, we know that $P(E) = 1/9{,}000{,}000{,}000$.

The probability of no error produced when dividing is

$P(E^c) = 1 - P(E) = 1 - 1/9{,}000{,}000{,}000 = 8{,}999{,}999{,}999/9{,}000{,}000{,}000 = .9999999999 \approx 1.0000$

Suppose we want to find the probability of no errors in 2 divisions (assuming each division is independent):

$P(E^c \cap E^c) = P(E^c)P(E^c) = .9999999998 \approx 1.0000$

Thus, in general, the probability of no errors in k divisions would be:

$P(\underbrace{E^c \cap E^c \cap \cdots \cap E^c}_{k}) = P(E^c)^k = [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^k$

Suppose a user ran a program that performed 1 billion divisions. The probability of no errors in these 1 billion divisions would be:

$P(E^c)^{1{,}000{,}000{,}000} = [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^{1{,}000{,}000{,}000} = .8948$

Thus, the probability of at least 1 error in 1 billion divisions would be

$1 - P(E^c)^{1{,}000{,}000{,}000} = 1 - [8{,}999{,}999{,}999/9{,}000{,}000{,}000]^{1{,}000{,}000{,}000} = 1 - .8948 = .1052$
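Raising a number this close to 1 to the billionth power is sensitive to rounding, so the power is easier to evaluate on the log scale. The sketch below is one way to do this in Python, assuming independent divisions as in the solution; the variable names are illustrative.

```python
import math

p_error = 1 / 9_000_000_000      # P(E) for a single division
k = 1_000_000_000                # number of independent divisions

# P(no errors in k divisions) = (1 - p_error)**k.
# log1p(-p) evaluates log(1 - p) accurately when p is tiny.
p_no_error = math.exp(k * math.log1p(-p_error))
p_at_least_one = 1 - p_no_error

print(round(p_no_error, 4), round(p_at_least_one, 4))
```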



Chapter 4 Random Variables and Probability Distributions

4.1

a.

The number of newspapers sold by New York Times each month can take on a countable number of values. Thus, this is a discrete random variable.

b.

The amount of ink used in printing the Sunday edition of the New York Times can take on an infinite number of different values. Thus, this is a continuous random variable.

c.

The actual number of ounces in a one gallon bottle of laundry detergent can take on an infinite number of different values. Thus, this is a continuous random variable.

d.

The number of defective parts in a shipment of nuts and bolts can take on a countable number of values. Thus, this is a discrete random variable.

e.

The number of people collecting unemployment insurance each month can take on a countable number of values. Thus, this is a discrete random variable.

4.2

a.

The closing price of a particular stock on the New York Stock Exchange is discrete. It can take on only a countable number of values.

b.

The number of shares of a particular stock that are traded on a particular day is discrete. It can take on only a countable number of values.

c.

The quarterly earnings of a particular firm is discrete. It can take on only a countable number of values.

d.

The percentage change in yearly earnings between last year and this year for a particular firm is continuous. It can take on any value in an interval.

e.

The number of new products introduced per year by a firm is discrete. It can take on only a countable number of values.

f.

The time until a pharmaceutical company gains approval from the U.S. Food and Drug Administration to market a new drug is continuous. It can take on any value in an interval of time.

4.3

Since there are only a fixed number of outcomes to the experiment, the random variable, x, the number of stars in the rating, is discrete.

4.4

The number of customers, x, waiting in line can take on values 0, 1, 2, 3, … . Even though the list is never ending, we call this list countable. Thus, the random variable is discrete.

4.5

The variable x, total compensation in 2019 (in $ millions), is a continuous random variable.

4.6

A banker might be interested in the number of new accounts opened in a month, or the number of mortgages it currently has, both of which are discrete random variables.

4.7

An economist might be interested in the percentage of the work force that is unemployed, or the current inflation rate, both of which are continuous random variables.

4.8

The manager of a hotel might be concerned with the number of employees on duty at a specific time, or the number of vacancies there are on a certain night.

4.9

The manager of a clothing store might be concerned with the number of employees on duty at a specific time of day, or the number of articles of a particular type of clothing that are on hand.

4.10

A stockbroker might be interested in the length of time until the stock market is closed for the day.

4.11

a.

p(22) = .25

b.

P(x = 20 or x = 24) = P(x = 20) + P(x = 24) = .15 + .20 = .35

c.

P(x ≤ 23) = P(x = 20) + P(x = 21) + P(x = 22) + P(x = 23) = .15 + .10 + .25 + .30 = .80

4.12

a.

The variable x can take on values 1, 3, 5, 7, and 9.

b.

The value of x that has the highest probability associated with it is 5. It has a probability of .4.

c.

Using MINITAB, the probability distribution of x as a graph is:

[Graph of p(x) versus x; the x-axis runs from 1 to 9 and the p(x) axis from 0 to .4]

d.

P(x = 7) = .2

e.

P(x ≥ 5) = p(5) + p(7) + p(9) = .4 + .2 + .1 = .7

f.

P(x > 2) = p(3) + p(5) + p(7) + p(9) = .2 + .4 + .2 + .1 = .9

g.

E(x) = Σ x p(x) = 1(.1) + 3(.2) + 5(.4) + 7(.2) + 9(.1) = .1 + .6 + 2.0 + 1.4 + .9 = 5.0

4.13

a.

We know Σ p(x) = 1. Thus, p(2) + p(3) + p(5) + p(8) + p(10) = 1 ⇒ p(5) = 1 − p(2) − p(3) − p(8) − p(10) = 1 − .15 − .10 − .25 − .25 = .25

b.

P(x = 2 or x = 10) = P(x = 2) + P(x = 10) = .15 + .25 = .40

c.

P(x ≤ 8) = P(x = 2) + P(x = 3) + P(x = 5) + P(x = 8) = .15 + .10 + .25 + .25 = .75

4.14

a.

This is not a valid distribution because Σ p(x) = .1 + .3 + .3 + .2 = .9 ≠ 1.

b.

This is a valid distribution because 0 ≤ p(x) ≤ 1 for all values of x and Σ p(x) = .25 + .5 + .25 = 1.

c.

This is not a valid distribution because p(4) = −.3 < 0.

d.

The sum of the probabilities over all possible values of the random variable is Σ p(x) = .15 + .15 + .45 + .35 = 1.1 > 1, so this is not a valid probability distribution.

4.15

a.

When a die is tossed, the number of spots observed on the upturned face can be 1, 2, 3, 4, 5, or 6. Since the six sample points are equally likely, each one has a probability of 1/6. The probability distribution of x may be summarized in tabular form:

x       1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6

b.

The probability distribution of x may also be presented in graphical form:

[Graph of p(x) versus x for x = 1 to 6; the p(x) axis is marked at 0 and 1/6]

4.16

a.

The sample points are (where H = head, T = tail):

Sample point    HHH   HHT   HTH   THH   HTT   THT   TTH   TTT
x = # heads      3     2     2     2     1     1     1     0

b.

If each sample point is equally likely, then P(sample point) = 1/8.

p(3) = 1/8, p(2) = 1/8 + 1/8 + 1/8 = 3/8, p(1) = 1/8 + 1/8 + 1/8 = 3/8, and p(0) = 1/8

c.

Using Minitab, the graph of p(x) is:

[Graph of p(x) versus x for x = 0, 1, 2, 3; the p(x) axis runs from 0 to .500]

d.

P(x = 2 or x = 3) = p(2) + p(3) = 3/8 + 1/8 = 4/8 = 1/2

4.17

a.

μ = E(x) = Σ x p(x) = −4(.02) + (−3)(.07) + (−2)(.10) + (−1)(.15) + 0(.30) + 1(.18) + 2(.10) + 3(.06) + 4(.02) = −.08 − .21 − .2 − .15 + 0 + .18 + .2 + .18 + .08 = 0

σ² = E[(x − μ)²] = Σ (x − μ)² p(x) = (−4 − 0)²(.02) + (−3 − 0)²(.07) + (−2 − 0)²(.10) + (−1 − 0)²(.15) + (0 − 0)²(.30) + (1 − 0)²(.18) + (2 − 0)²(.10) + (3 − 0)²(.06) + (4 − 0)²(.02) = .32 + .63 + .4 + .15 + 0 + .18 + .4 + .54 + .32 = 2.94

σ = √2.94 = 1.715

b.

Using MINITAB, the graph is:

[Histogram of x: p(x) versus x for x = −4 to 4, p(x) axis from 0 to .30, with μ − 2σ, μ = 0, and μ + 2σ marked on the x-axis]

μ ± 2σ ⇒ 0 ± 2(1.715) ⇒ 0 ± 3.430 ⇒ (−3.430, 3.430)

c.

P(−3.430 < x < 3.430) = p(−3) + p(−2) + p(−1) + p(0) + p(1) + p(2) + p(3) = .07 + .10 + .15 + .30 + .18 + .10 + .06 = .96


4.18

a.

μ = E(x) = Σ x p(x) = 10(.05) + 20(.20) + 30(.30) + 40(.25) + 50(.10) + 60(.10) = .5 + 4 + 9 + 10 + 5 + 6 = 34.5

σ² = E[(x − μ)²] = Σ (x − μ)² p(x) = (10 − 34.5)²(.05) + (20 − 34.5)²(.20) + (30 − 34.5)²(.30) + (40 − 34.5)²(.25) + (50 − 34.5)²(.10) + (60 − 34.5)²(.10) = 30.0125 + 42.05 + 6.075 + 7.5625 + 24.025 + 65.025 = 174.75

σ = √174.75 = 13.219

b.

Using MINITAB, the graph is:

[Histogram of x: p(x) versus x for x = 10 to 60, p(x) axis from 0 to .30, with μ − 2σ, μ = 34.5, and μ + 2σ marked on the x-axis]

c.

μ ± 2σ ⇒ 34.5 ± 2(13.219) ⇒ 34.5 ± 26.438 ⇒ (8.062, 60.938)

P(8.062 < x < 60.938) = p(10) + p(20) + p(30) + p(40) + p(50) + p(60) = .05 + .20 + .30 + .25 + .10 + .10 = 1.00
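The mean, variance, and μ ± 2σ probability for a discrete distribution follow the same three steps each time, so they are easy to check in code. The following Python sketch uses the distribution from this exercise; the dictionary layout and variable names are illustrative choices, not part of the original solution.

```python
import math

# p(x) for this exercise, keyed by the value of x.
dist = {10: .05, 20: .20, 30: .30, 40: .25, 50: .10, 60: .10}

mu = sum(x * p for x, p in dist.items())                # E(x)
var = sum((x - mu) ** 2 * p for x, p in dist.items())   # E[(x - mu)^2]
sigma = math.sqrt(var)

# Probability that x falls strictly inside mu +/- 2 sigma.
low, high = mu - 2 * sigma, mu + 2 * sigma
p_within = sum(p for x, p in dist.items() if low < x < high)

print(round(mu, 1), round(var, 2), round(sigma, 3), round(p_within, 2))
```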

4.19

a.

It would seem that the mean of both would be 1 since they both are symmetric distributions centered at 1.

b.

The distribution of x seems more variable, since there appears to be greater probability for the two extreme values of 0 and 2 than there is in the distribution of y.

c.

For x:
μ = E(x) = Σ x p(x) = 0(.3) + 1(.4) + 2(.3) = 0 + .4 + .6 = 1
σ² = E[(x − μ)²] = Σ (x − μ)² p(x) = (0 − 1)²(.3) + (1 − 1)²(.4) + (2 − 1)²(.3) = .3 + 0 + .3 = .6

For y:
μ = E(y) = Σ y p(y) = 0(.1) + 1(.8) + 2(.1) = 0 + .8 + .2 = 1
σ² = E[(y − μ)²] = Σ (y − μ)² p(y) = (0 − 1)²(.1) + (1 − 1)²(.8) + (2 − 1)²(.1) = .1 + 0 + .1 = .2

The variance for x is larger than that for y.


4.20

a.

.01 + .02 + .02 + .02 + ⋯ + .09 = 1.00

b.

P(x < 6) = .01 + .02 + .02 = .05

c.

P(10 < x < 15) = .07 + .07 + .06 + .09 = .29

d.

μ = E(x) = Σ x p(x) = 3(.01) + 4(.02) + 5(.02) + ⋯ + 18(.09) = 13.08

The average age when parents would buy their child their first smart speaker is 13.08 years old.

4.21

a.

The probability distribution for x is found by converting the Percent column to a probability column by dividing the percents by 100. The probability distribution of x is:

x       2       3       4       5
p(x)   .0408   .1735   .6020   .1837

b.

P(x = 5) = p(5) = .1837

c.

P(x ≤ 2) = p(2) = .0408

d.

μ = E(x) = Σ x p(x) = 2(.0408) + 3(.1735) + 4(.6020) + 5(.1837) = .0816 + .5205 + 2.4080 + .9185 = 3.9286 ≈ 3.93

The average driver's-side star rating for a car is 3.93.

4.22

a.

Yes. Relative frequencies are observed values from a sample. Relative frequencies are commonly used to estimate unknown probabilities. In addition, relative frequencies have the same properties as the probabilities in a probability distribution, namely
1. all relative frequencies are greater than or equal to zero
2. the sum of all the relative frequencies is 1

b.

Using MINITAB, the graph of the probability distribution is:

[Graph of p(age) versus age; the age axis is marked from 20 to 32 and the p(age) axis from 0 to .16]

c.

Let x = age of employee. Then P(x > 30) = .13 + .15 + .12 = .40.

P(x > 40) = 0

P(x < 30) = .02 + .04 + .05 + .07 + .04 + .02 + .07 + .02 + .11 + .07 = .51

d.

P(x = 25 or x = 26) = .02 + .07 = .09

4.23

a.

In order for this to be a valid probability distribution, all probabilities must be between 0 and 1 and the sum of all the probabilities must be 1. For Section 1, all the probabilities are between 0 and 1. The sum of all the probabilities is .05 + .25 + .25 + .45 = 1.00.

b.

For Section 2, all the probabilities are between 0 and 1. The sum of all the probabilities is .10 + .25 + .35 + .30 = 1.00. For Section 3, all the probabilities are between 0 and 1. The sum of all the probabilities is .15 + .20 + .30 + .35 = 1.00.

c.

P(x > 30) = P(x = 40) + P(x = 50) + P(x = 60) = .25 + .25 + .45 = .95

d.

For Section 2, P(x > 30) = P(x = 40) + P(x = 50) + P(x = 60) = .25 + .35 + .30 = .90. For Section 3, P(x > 30) = P(x = 40) + P(x = 50) + P(x = 60) = .20 + .30 + .35 = .85

4.24

a.

The probability distribution for x is:

Grill Display Combination    x     p(x)
1-2-3                        6     35/124 = .282
1-2-4                        7     8/124 = .065
1-2-5                        8     42/124 = .339
2-3-4                        9     4/124 = .032
2-3-5                        10    1/124 = .008
2-4-5                        11    34/124 = .274

b.

P(x > 10) = p(11) = .274

4.25

a.

The possible values of x are 0, 2, 3, and 4.

b.

To find the probability distribution of x, we first find the frequency distribution of x. We then divide the frequencies by n = 106 to get the probabilities. The probability distribution of x is:

x       0       2       3       4
f(x)    35      58      5       8
p(x)   .3302   .5472   .0472   .0755

c.

μ = E(x) = Σ x p(x) = 0(.3302) + 2(.5472) + 3(.0472) + 4(.0755) = 1.538. For all social robots, the average number of legs on the robot is 1.538.

4.26

a.

In order for this to be a valid probability distribution, all probabilities must be between 0 and 1 and the sum of all the probabilities must be 1. For Line 1, all the probabilities are between 0 and 1. The sum of all the probabilities is .01 + .02 + .02 + .95 = 1.00. For Line 2, all the probabilities are between 0 and 1. The sum of all the probabilities is .002 + .002 + .996 = 1.00.

b.

𝑃(𝑥 > 30) = 𝑃(𝑥 = 36) = .95

c.

P(x > 30) = P(x = 35) + P(x = 70) = .002 + .996 = .998

d.

P(x > 30 | Line 1)P(x > 30 | Line 2) = .95(.998) = .9481

e.

Line 1: μ = E(x) = Σ x p(x) = 0(.01) + 12(.02) + 24(.02) + 36(.95) = 0 + .24 + .48 + 34.2 = 34.92

Line 2: μ = E(x) = Σ x p(x) = 0(.002) + 35(.002) + 70(.996) = 0 + .07 + 69.72 = 69.79

f.

Line 1: σ² = Σ (x − μ)² p(x) = (0 − 34.92)²(.01) + (12 − 34.92)²(.02) + (24 − 34.92)²(.02) + (36 − 34.92)²(.95) = 12.1941 + 10.5065 + 2.3849 + 1.1081 = 26.1936

σ = √26.1936 = 5.1180

At least 75% of the capacities will fall within 2 standard deviations of the mean or μ ± 2σ ⇒ 34.92 ± 2(5.1180) ⇒ 34.92 ± 10.236 ⇒ (24.684, 45.156)

Line 2: σ² = Σ (x − μ)² p(x) = (0 − 69.79)²(.002) + (35 − 69.79)²(.002) + (70 − 69.79)²(.996) = 9.7413 + 2.4207 + .0439 = 12.2059

σ = √12.2059 = 3.4937

At least 75% of the capacities will fall within 2 standard deviations of the mean or μ ± 2σ ⇒ 69.79 ± 2(3.4937) ⇒ 69.79 ± 6.9874 ⇒ (62.8026, 76.7774)
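Parts d through f repeat the same calculation for each line, which makes them a natural fit for a small helper. The sketch below is a hedged illustration in Python; the helper name `summarize` and the dictionary layout are assumptions made for this example, and the product in part d treats the two lines as independent, as the solution does.

```python
import math

line1 = {0: .01, 12: .02, 24: .02, 36: .95}
line2 = {0: .002, 35: .002, 70: .996}

def summarize(dist):
    """Return (mean, standard deviation, mu - 2 sigma, mu + 2 sigma)."""
    mu = sum(x * p for x, p in dist.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))
    return mu, sigma, mu - 2 * sigma, mu + 2 * sigma

def prob_above(dist, cutoff):
    """P(x > cutoff) for a discrete distribution."""
    return sum(p for x, p in dist.items() if x > cutoff)

mu1, sigma1, low1, high1 = summarize(line1)
mu2, sigma2, low2, high2 = summarize(line2)
p_both = prob_above(line1, 30) * prob_above(line2, 30)

print(round(mu1, 2), round(sigma1, 4), round(mu2, 2), round(sigma2, 4))
print(round(p_both, 4))
```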

4.27

a.

The random variable x is a discrete random variable because it can take on only values 0, 1, 2, 3, 4, or 5 in this example.

b.

$p(0) = \frac{5!}{0!(5-0)!}(.38)^0(.62)^{5-0} = (.62)^5 = .0916$

$p(1) = \frac{5!}{1!(5-1)!}(.38)^1(.62)^{5-1} = 5(.38)(.62)^4 = .2808$

$p(2) = \frac{5!}{2!(5-2)!}(.38)^2(.62)^{5-2} = 10(.38)^2(.62)^3 = .3441$

$p(3) = \frac{5!}{3!(5-3)!}(.38)^3(.62)^{5-3} = 10(.38)^3(.62)^2 = .2109$

$p(4) = \frac{5!}{4!(5-4)!}(.38)^4(.62)^{5-4} = 5(.38)^4(.62) = .0646$

$p(5) = \frac{5!}{5!(5-5)!}(.38)^5(.62)^{5-5} = (.38)^5 = .0079$

c.

The two properties of discrete random variables are that 0 ≤ p(x) ≤ 1 for all x and Σ p(x) = 1. From above, all probabilities are between 0 and 1 and Σ p(x) = .0916 + .2808 + .3441 + .2109 + .0646 + .0079 = .9999 ≈ 1 (the difference from 1 is due to rounding).

d.

P(x ≥ 4) = p(4) + p(5) = .0646 + .0079 = .0725
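The six probabilities in part b all come from the same formula with n = 5 and p = .38, so they can be generated in a loop. The following is a minimal Python sketch; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.38
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 4))

p_at_least_4 = pmf[4] + pmf[5]
print(round(sum(pmf), 4), round(p_at_least_4, 4))
```

Summing the unrounded probabilities gives exactly 1, and the unrounded P(x ≥ 4) rounds to .0726; the value .0725 in part d comes from adding the already-rounded terms.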

4.28

a.

First, we must find the probability distribution of x. Define the following events:

C: {Chicken is contaminated}
N: {Chicken is not contaminated}

If 3 slaughtered chickens are randomly selected, then the possible outcomes are: CCC, CCN, CNC, NCC, CNN, NCN, NNC, and NNN

These outcomes are NOT equally likely since P(C) = 1/100 = .01 and P(N) = 1 − P(C) = 1 − .01 = .99.

P(CCC) = P(C ∩ C ∩ C) = P(C)P(C)P(C) = .01(.01)(.01) = .000001
P(CCN) = P(CNC) = P(NCC) = P(C ∩ C ∩ N) = P(C)P(C)P(N) = .01(.01)(.99) = .000099
P(CNN) = P(NCN) = P(NNC) = P(C ∩ N ∩ N) = P(C)P(N)P(N) = .01(.99)(.99) = .009801
P(NNN) = P(N ∩ N ∩ N) = P(N)P(N)P(N) = .99(.99)(.99) = .970299

The variable x is defined as the number of contaminated chickens in the sample. The value of x for each of the outcomes is:

Event    x    p(x)
CCC      3    .000001
CCN      2    .000099
CNC      2    .000099
NCC      2    .000099
CNN      1    .009801
NCN      1    .009801
NNC      1    .009801
NNN      0    .970299

The probability distribution of x is:

x       0          1          2          3
p(x)   .970299    .029403    .000297    .000001

b.

Using MINITAB, the probability graph for x is:

[Graph of p(x) versus x for x = 0, 1, 2, 3; the p(x) axis runs from 0 to 1]

c.

P(x ≤ 1) = P(x = 0) + P(x = 1) = .970299 + .029403 = .999702
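The distribution in part a is built by listing every outcome, multiplying the probabilities along it, and grouping by the number of contaminated chickens. The Python sketch below follows those same steps; it assumes independence between chickens, as the solution does, and the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

p_c = 0.01                 # P(C): chicken is contaminated
dist = defaultdict(float)  # x -> p(x)

# Enumerate CCC, CCN, ..., NNN and accumulate probability by x.
for outcome in product("CN", repeat=3):
    prob = 1.0
    for chicken in outcome:
        prob *= p_c if chicken == "C" else 1 - p_c
    dist[outcome.count("C")] += prob

for x in sorted(dist):
    print(x, round(dist[x], 6))

p_at_most_1 = dist[0] + dist[1]
print(round(p_at_most_1, 6))
```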


4.29

a.

$p(1) = .23(.77)^{1-1} = .23(.77)^0 = .23$. The probability that one would encounter a contaminated cartridge on the first trial is .23.

b.

$p(5) = .23(.77)^{5-1} = .23(.77)^4 = .0809$. The probability that one would encounter the first contaminated cartridge on the fifth trial is .0809.

c.

P(x ≥ 2) = 1 − P(x ≤ 1) = 1 − P(x = 1) = 1 − .23 = .77. The probability that the first contaminated cartridge is found on the second trial or later is .77.

4.30

a.

If the first letters of consumers’ last names are all equally likely, then 𝑃(𝑥 = 𝑖) = 1/26 for i = 1, 2, …, 26.

b.

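The expected value is

$\mu = E(x) = \sum x\,p(x) = 1\left(\frac{1}{26}\right) + 2\left(\frac{1}{26}\right) + 3\left(\frac{1}{26}\right) + \cdots + 26\left(\frac{1}{26}\right) = 13.5$

The average number given to a consumer based on his last name is 13.5.

c.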

This probability distribution is probably not realistic. Very few consumers have last names that begin with Q or U. However, many consumers have last names that begin with S and T. One could estimate the true probability distribution of x by taking a random sample of names from a phone book and looking at the relative frequency distribution of the values of x assigned to the sampled names.

4.31

a.

p(0) = .508

b.

p(1) = .391

c.

p(2) = .094

d.

p(3) = .007

4.32

a.

𝐸(𝑥) = ∑All 𝑥𝑝(𝑥) Firm A: 𝐸(𝑥) = 0(. 01) + 500(. 01) + 1000(. 01) + 1500(. 02) + 2000(. 35) + 2500(. 30) +3000(. 25) + 3500(. 02) + 4000(. 01) + 4500(. 01) + 5000(. 01) = 0 + 5 + 10 + 30 + 700 + 750 + 750 + 70 + 40 + 45 + 50 = 2450 Firm B: 𝐸(𝑥) = 0(. 00) + 200(. 01) + 700(. 02) + 1200(. 02) + 1700(. 15) + 2200(. 30) +2700(. 30) + 3200(. 15) + 3700(. 02) + 4200(. 02) + 4700(. 01) = 0 + 2 + 14 + 24 + 255 + 660 + 810 + 480 + 74 + 84 + 47 = 2450

b.

σ² = Σ (x − μ)² p(x) and σ = √σ²

Firm A: σ² = (0 − 2450)²(.01) + (500 − 2450)²(.01) + ⋯ + (5000 − 2450)²(.01) = 60,025 + 38,025 + 21,025 + 18,050 + 70,875 + 750 + 75,625 + 22,050 + 24,025 + 42,025 + 65,025 = 437,500

σ = √437,500 = 661.44

Firm B: σ² = (0 − 2450)²(.00) + (200 − 2450)²(.01) + ⋯ + (4700 − 2450)²(.01) = 0 + 50,625 + 61,250 + 31,250 + 84,375 + 18,750 + 18,750 + 84,375 + 31,250 + 61,250 + 50,625 = 492,500

σ = √492,500 = 701.78

Firm B faces greater risk of physical damage because it has a higher variance and standard deviation.

4.33

a.

The probability distribution for x is:

x     p(x)
0     383/3,977 = .0963
1     3,453/3,977 = .8682
2     141/3,977 = .0355

b.

P(x > 0) = .8682 + .0355 = .9037

c.

μ = E(x) = Σ x p(x) = 0(.0963) + 1(.8682) + 2(.0355) = .9392

The average number of points scored after a touchdown is .9392 points.
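Turning the observed counts into a probability distribution and then into an expected value can be done with exact fractions, which avoids carrying rounded probabilities into later steps. The sketch below is illustrative; the counts are those shown in part a, and the variable names are assumptions made for this example.

```python
from fractions import Fraction

# Number of conversion attempts that produced 0, 1, or 2 points.
counts = {0: 383, 1: 3453, 2: 141}
total = sum(counts.values())

# Relative frequencies serve as the probabilities p(x).
dist = {x: Fraction(n, total) for x, n in counts.items()}

p_positive = sum(p for x, p in dist.items() if x > 0)   # P(x > 0)
mean = sum(x * p for x, p in dist.items())              # E(x)

print(total, round(float(p_positive), 4), round(float(mean), 4))
```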

4.34

To determine which group of Finnish citizens has the highest average IQ score, we must find the expected value for each group. To do this, we first find the probability distribution for each group by dividing the frequency for each IQ level in each group by the group total. The probability distributions are:

IQ    Invest    No Invest
1     .020      .041
2     .030      .083
3     .045      .088
4     .120      .174
5     .190      .217
6     .230      .191
7     .150      .099
8     .115      .062
9     .100      .045

For Investors, μ = E(x) = Σ x p(x) = 1(.020) + 2(.030) + 3(.045) + ⋯ + 9(.100) = 5.895

For Non-investors, μ = E(x) = Σ x p(x) = 1(.041) + 2(.083) + 3(.088) + ⋯ + 9(.045) = 4.992

Thus, the investors had a higher average IQ than the non-investors.

4.35

a.

Let x = the potential flood damages. Since we are assuming if it rains the business will incur damages and if it does not rain the business will not incur any damages, the probability distribution of x is:

x       0     300,000
p(x)   .7     .3

b.

The expected loss due to flood damage is μ = E(x) = Σ x p(x) = 0(.7) + 300,000(.3) = 0 + 90,000 = $90,000

4.36

Let x = winnings in the Florida lottery. The probability distribution for x is:

x             p(x)
−$1           22,999,999/23,000,000
$6,999,999    1/23,000,000

The expected net winnings would be:

$\mu = E(x) = \sum x\,p(x) = (-1)\left(\frac{22{,}999{,}999}{23{,}000{,}000}\right) + 6{,}999{,}999\left(\frac{1}{23{,}000{,}000}\right) = -\$.70$

The average winnings of all those who play the lottery is −$.70.

4.37

a.

The possible values of x are 30, 40, 50, or 60.

b.

P(x = 30) = P(x = 30 | Section 1)P(Section 1) + P(x = 30 | Section 2)P(Section 2) + P(x = 30 | Section 3)P(Section 3) = .05(1/3) + .10(1/3) + .15(1/3) = .10

c.

P(x = 40) = P(x = 40 | Section 1)P(Section 1) + P(x = 40 | Section 2)P(Section 2) + P(x = 40 | Section 3)P(Section 3) = .25(1/3) + .25(1/3) + .20(1/3) = .2333

P(x = 50) = P(x = 50 | Section 1)P(Section 1) + P(x = 50 | Section 2)P(Section 2) + P(x = 50 | Section 3)P(Section 3) = .25(1/3) + .35(1/3) + .30(1/3) = .30

P(x = 60) = P(x = 60 | Section 1)P(Section 1) + P(x = 60 | Section 2)P(Section 2) + P(x = 60 | Section 3)P(Section 3) = .45(1/3) + .30(1/3) + .35(1/3) = .3667

Thus, the probability distribution of x is

x       30       40       50       60
p(x)   .1000    .2333    .3000    .3667

d.

P(x ≥ 50) = p(50) + p(60) = .3000 + .3667 = .6667

4.38

a.

The preference scores for the 10 voters are:

Voter    1   2   3   4   5   6   7   8   9   10
Score    2   2   2   2   0   1   2   2   2   2

b.

The probability distribution for x is:

x       0     1     2     3
p(x)   .1    .1    .8     0

c.

P(x > 2) = p(3) = 0

d.

No. A “Condorcet” committee is a committee that is preferred by voters over any other committee of 3 members. No voter preferred committee {A,B,C}.

4.39

Let x = bookie's earnings per dollar wagered. Then x can take on values $1 (you lose) and −$5 (you win). The only way you win is if you pick 3 winners in 3 games. If the probability of picking 1 winner in 1 game is .5, then P(www) = p(w)p(w)p(w) = .5(.5)(.5) = .125 (assuming games are independent). Thus, the probability distribution for x is:

x       p(x)
$1      .875
−$5     .125

E(x) = Σ x p(x) = 1(.875) − 5(.125) = .875 − .625 = $.25

4.40

a.

$\frac{6!}{2!(6-2)!} = \frac{6!}{2!4!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(4 \cdot 3 \cdot 2 \cdot 1)} = 15$

b.

$\binom{5}{2} = \frac{5!}{2!(5-2)!} = \frac{5!}{2!3!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(3 \cdot 2 \cdot 1)} = 10$

c.

$\binom{7}{0} = \frac{7!}{0!(7-0)!} = \frac{7!}{0!7!} = 1$ (Note: 0! = 1)

d.

$\binom{6}{6} = \frac{6!}{6!(6-6)!} = \frac{6!}{6!0!} = 1$

e.

$\binom{4}{3} = \frac{4!}{3!(4-3)!} = \frac{4!}{3!1!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(1)} = 4$

4.41

a.

x is discrete. It can take on only six values.

b.

This is a binomial distribution.

c.

$p(0) = \binom{5}{0}(.7)^0(.3)^{5-0} = \frac{5!}{0!5!}(.7)^0(.3)^5 = (1)(.00243) = .00243$

$p(1) = \binom{5}{1}(.7)^1(.3)^{5-1} = \frac{5!}{1!4!}(.7)^1(.3)^4 = .02835$

$p(2) = \binom{5}{2}(.7)^2(.3)^{5-2} = \frac{5!}{2!3!}(.7)^2(.3)^3 = .1323$

$p(3) = \binom{5}{3}(.7)^3(.3)^{5-3} = \frac{5!}{3!2!}(.7)^3(.3)^2 = .3087$

$p(4) = \binom{5}{4}(.7)^4(.3)^{5-4} = \frac{5!}{4!1!}(.7)^4(.3)^1 = .36015$

$p(5) = \binom{5}{5}(.7)^5(.3)^{5-5} = \frac{5!}{5!0!}(.7)^5(.3)^0 = .16807$

[Histogram of x: p(x) versus x for x = 0 to 5, p(x) axis from 0 to .4, with μ − 2σ, μ = 3.5, and μ + 2σ marked on the x-axis]

d.

μ = np = 5(.7) = 3.5

$\sigma = \sqrt{npq} = \sqrt{5(.7)(.3)} = 1.0247$

e.

μ ± 2σ ⇒ 3.5 ± 2(1.0247) ⇒ 3.5 ± 2.0494 ⇒ (1.4506, 5.5494)

4.42

a.

$p(0) = \binom{3}{0}(.3)^0(.7)^3 = \frac{3!}{0!3!}(.3)^0(.7)^3 = (1)(.7)^3 = .343$

$p(1) = \binom{3}{1}(.3)^1(.7)^2 = \frac{3!}{1!2!}(.3)^1(.7)^2 = .441$

$p(2) = \binom{3}{2}(.3)^2(.7)^1 = \frac{3!}{2!1!}(.3)^2(.7)^1 = .189$

$p(3) = \binom{3}{3}(.3)^3(.7)^0 = \frac{3!}{3!0!}(.3)^3(.7)^0 = .027$

b.

The probability distribution in tabular form is:

x     p(x)
0     .343
1     .441
2     .189
3     .027

4.43

a.

$P(x = 1) = \frac{5!}{1!4!}(.2)^1(.8)^4 = 5(.2)^1(.8)^4 = .4096$

b.

$P(x = 2) = \frac{4!}{2!2!}(.6)^2(.4)^2 = 6(.6)^2(.4)^2 = .3456$

c.

$P(x = 0) = \frac{3!}{0!3!}(.7)^0(.3)^3 = 1(.7)^0(.3)^3 = .027$

d.

$P(x = 3) = \frac{5!}{3!2!}(.1)^3(.9)^2 = 10(.1)^3(.9)^2 = .0081$

e.

$P(x = 2) = \frac{4!}{2!2!}(.4)^2(.6)^2 = 6(.4)^2(.6)^2 = .3456$

f.

$P(x = 1) = \frac{3!}{1!2!}(.9)^1(.1)^2 = 3(.9)^1(.1)^2 = .027$

4.44

a.

P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .167 − .046 = .121 (from Table I, App. D with n = 10 and p = .4)

b.

P(x ≤ 5) = .034

c.

P(x > 1) = 1 − P(x ≤ 1) = 1 − .919 = .081

d.

P(x < 10) = P(x ≤ 9) = 0

e.

P(x ≥ 10) = 1 − P(x ≤ 9) = 1 − .002 = .998

f.

P(x = 2) = P(x ≤ 2) − P(x ≤ 1) = .206 − .069 = .137

4.45

a.

μ = np = 25(.5) = 12.5
σ² = np(1 − p) = 25(.5)(.5) = 6.25 and σ = √σ² = √6.25 = 2.5

b.

μ = np = 80(.2) = 16
σ² = np(1 − p) = 80(.2)(.8) = 12.8 and σ = √σ² = √12.8 = 3.578

c.

μ = np = 100(.6) = 60
σ² = np(1 − p) = 100(.6)(.4) = 24 and σ = √σ² = √24 = 4.899

d.

μ = np = 70(.9) = 63
σ² = np(1 − p) = 70(.9)(.1) = 6.3 and σ = √σ² = √6.3 = 2.510

e.

μ = np = 60(.8) = 48
σ² = np(1 − p) = 60(.8)(.2) = 9.6 and σ = √σ² = √9.6 = 3.098

f.

μ = np = 1,000(.04) = 40
σ² = np(1 − p) = 1,000(.04)(.96) = 38.4 and σ = √σ² = √38.4 = 6.197

4.46

x is a binomial random variable with n = 4.

a.

If the probability distribution of x is symmetric, p(0) = p(4) and p(1) = p(3). We know

$p(x) = \binom{n}{x}p^x q^{n-x}$, x = 0, 1, ..., n

When n = 4,

$p(0) = p(4) \Rightarrow \binom{4}{0}p^0q^4 = \binom{4}{4}p^4q^0 \Rightarrow \frac{4!}{0!4!}p^0q^4 = \frac{4!}{4!0!}p^4q^0 \Rightarrow q^4 = p^4 \Rightarrow p = q$

Since p + q = 1, p = .5. Therefore, the probability distribution of x is symmetric when p = .5.

b.

If the probability distribution of x is skewed to the right, then the mean is greater than the median. Therefore, there are more small values in the distribution (0, 1) than large values (3, 4). For this to happen, p must be smaller than .5. If we pick 𝑝 = .2, the probability distribution of x will be skewed to the right.


c.

If the probability distribution of x is skewed to the left, then the mean is smaller than the median. Therefore, there are more large values in the distribution (3, 4) than small values (0, 1). For this to happen, p must be larger than .5. If we pick 𝑝 = .8, the probability distribution of x will be skewed to the left.

d.

In part a, x is a binomial random variable with n = 4 and p = .5.

$p(x) = \binom{4}{x}(.5)^x(.5)^{4-x}$, x = 0, 1, 2, 3, 4

$p(0) = \binom{4}{0}(.5)^0(.5)^4 = \frac{4!}{0!4!}(.5)^4 = 1(.5)^4 = .0625$

$p(1) = \binom{4}{1}(.5)^1(.5)^3 = \frac{4!}{1!3!}(.5)^4 = 4(.5)^4 = .25$

$p(2) = \binom{4}{2}(.5)^2(.5)^2 = \frac{4!}{2!2!}(.5)^4 = 6(.5)^4 = .375$

p(3) = p(1) = .25 (since the distribution is symmetric)

p(4) = p(0) = .0625

The probability distribution of x in tabular form is:

x       0        1      2       3      4
p(x)   .0625    .25    .375    .25    .0625

μ = np = 4(.5) = 2

Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .5 is as follows.

[Histogram of x: p(x) versus x for x = 0 to 4, p(x) axis from 0 to .4, with μ = 2 marked on the x-axis]

In part b, x is a binomial random variable with n = 4 and p = .2.

$p(x) = \binom{4}{x}(.2)^x(.8)^{4-x}$, x = 0, 1, 2, 3, 4

$p(0) = \binom{4}{0}(.2)^0(.8)^4 = 1(1)(.8)^4 = .4096$

$p(1) = \binom{4}{1}(.2)^1(.8)^3 = 4(.2)(.8)^3 = .4096$

$p(2) = \binom{4}{2}(.2)^2(.8)^2 = 6(.2)^2(.8)^2 = .1536$

$p(3) = \binom{4}{3}(.2)^3(.8)^1 = 4(.2)^3(.8) = .0256$

$p(4) = \binom{4}{4}(.2)^4(.8)^0 = 1(.2)^4(1) = .0016$

The probability distribution of x in tabular form is:

x       0        1        2        3        4
p(x)   .4096    .4096    .1536    .0256    .0016

μ = np = 4(.2) = .8

Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .2 is as follows:

[Histogram of x: p(x) versus x for x = 0 to 4, p(x) axis from 0 to .4, with μ = .8 marked on the x-axis]

In part c, x is a binomial random variable with n = 4 and p = .8.

$p(x) = \binom{4}{x}(.8)^x(.2)^{4-x}$, x = 0, 1, 2, 3, 4

$p(0) = \binom{4}{0}(.8)^0(.2)^4 = 1(1)(.2)^4 = .0016$

$p(1) = \binom{4}{1}(.8)^1(.2)^3 = 4(.8)(.2)^3 = .0256$

$p(2) = \binom{4}{2}(.8)^2(.2)^2 = 6(.8)^2(.2)^2 = .1536$

$p(3) = \binom{4}{3}(.8)^3(.2)^1 = 4(.8)^3(.2) = .4096$

$p(4) = \binom{4}{4}(.8)^4(.2)^0 = 1(.8)^4(1) = .4096$

The probability distribution of x in tabular form is:

x       0        1        2        3        4
p(x)   .0016    .0256    .1536    .4096    .4096

Note: The distribution of x when n = 4 and p = .8 is the reverse of the distribution of x when n = 4 and p = .2.

μ = np = 4(.8) = 3.2

Using MINITAB, the graph of the probability distribution of x when n = 4 and p = .8 is as follows:

[Histogram of x: p(x) versus x for x = 0 to 4, p(x) axis from 0 to .4, with μ = 3.2 marked on the x-axis]

e.

In general, when p = .5, a binomial distribution will be symmetric regardless of the value of n. When p is less than .5, the binomial distribution will be skewed to the right; and when p is greater than .5, it will be skewed to the left. (Refer to parts a, b, and c.)

4.47

a.

Let S = adult feels that the “American Dream” is out of reach.

b.

To see if x is approximately a binomial random variable, we check the characteristics: 1.

n identical trials. Taking a random sample of size n = 10 from a very large population will result in trials being essentially identical.

2.

Two possible outcomes. The adults can either feel that the “American Dream” is out of reach (Success) or they can feel that the “American Dream” is not out of reach (Failure).

3.

P(S) remains the same from trial to trial. If we sample without replacement, then P(S) will change slightly from trial to trial. However, the differences are extremely small and will essentially be 0. In this example, P(S) ≈ .11.

4.

Trials are independent. Again, although the trials are not exactly independent, they are very close.

5.

The random variable x = number of adults who feel that the “American Dream” is out of reach in a sample of n = 10 trials.

Thus, x is very close to being a binomial. We will assume that it is a binomial random variable.

c.

For this problem, 𝑝 = .11.

d.

Using MINITAB, with 𝑛 = 10 and 𝑝 = .11, the probability is:

Binomial with n = 10 and p = 0.11

x    P( X = x )
3    0.0706463

Thus, 𝑃(𝑥 = 3) = .0706.

e.

Using MINITAB, with 𝑛 = 10 and 𝑝 = .11, the probability is:

Binomial with n = 10 and p = 0.11

x    P( X ≤ x )
2    0.911557

Thus, 𝑃(𝑥 ≤ 2) = .9116.

4.48

a.

Let x = number of adults who agree to participate in the loyalty card program in 250 trials. To see if x is approximately a binomial random variable, we check the characteristics: 1.

n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of reasonable size 250 from a very large population will result in trials being essentially identical.

2.

Two possible outcomes. The adults can either agree to participate in the loyalty card program or not. S = adult agrees to participate in the loyalty card program and F = adult does not agree to participate in the loyalty card program.

3.

P(S) remains the same from trial to trial. If we sample without replacement, then P(S) will change slightly from trial to trial. However, the differences are extremely small and will essentially be 0.

4.

Trials are independent. Again, although the trials are not exactly independent, they are very close.

5.

The random variable x = number of adults who agree to participate in the loyalty card program in 𝑛 = 250 trials.

Thus, x is very close to being a binomial. We will assume that it is a binomial random variable.

b.

𝑝 = 𝑃(𝑆) = .5. Half of adults agree to participate and half do not.

c.

𝜇 = 𝐸(𝑥) = 𝑛𝑝 = 250(.5) = 125

4.49

a.

Let x = number of hotel guests who experienced a better-than-expected quality of sleep at the hotel in 15 trials. To see if x is approximately a binomial random variable we check the characteristics: 1.

n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of reasonable size 15 from a very large population will result in trials being essentially identical.

2.

Two possible outcomes. The hotel guests can experience a better-than-expected quality of sleep at the hotel or not. S = hotel guest experienced a better-than-expected quality of sleep at the hotel and F = hotel guest did not experience a better-than-expected quality of sleep at the hotel.

3.

P(S) remains the same from trial to trial. If we sample without replacement, then P(S) will change slightly from trial to trial. However, the differences are extremely small and will essentially be 0.

4.

Trials are independent. Again, although the trials are not exactly independent, they are very close.

5.

The random variable x = number of hotel guests who experienced a better-than-expected quality of sleep at the hotel in 𝑛 = 15 trials.

Thus, x is very close to being a binomial. We will assume that it is a binomial random variable.

b.

𝑝 = 𝑃(𝑆) = .29(.71) = .2059

c.

Assuming 𝑝 = .20, 𝑃(𝑥 ≥ 10) = 1 − 𝑃(𝑥 ≤ 9) = 1 − 1.000 = .000 using Table I, Appendix D.

4.50

a.

We will check the 5 characteristics of a binomial random variable.

1. The experiment consists of n identical trials.

2. There are only 2 possible outcomes for each trial. Let S = general practice physician in the United States does not recommend medicine as a career and F = general practice physician in the United States does recommend medicine as a career.

3. The probability of success (S) is the same from trial to trial. For each trial, 𝑝 = 𝑃(𝑆) = .50 and 𝑞 = 1 − 𝑝 = 1 − .50 = .50.

4. The trials are independent.

5. The binomial random variable x is the number of general practice physicians in the United States in n trials who do not recommend medicine as a career.

Thus, x is a binomial random variable.

b.

From the information given, 𝑝 = .50.

c.

𝜇 = 𝐸(𝑥) = 𝑛𝑝 = 25(.50) = 12.5

\[ \sigma = \sqrt{npq} = \sqrt{25(.50)(.50)} = \sqrt{6.25} = 2.5 \]

d.

Using MINITAB, with 𝑛 = 25 and 𝑝 = .50, 𝑃(𝑥 ≥ 1) = 1 − 𝑃(𝑥 = 0) = 1 − .0000 = 1.0000.

4.51

a.

Let x = number of sports parents that have professional sports and/or Olympic dreams for their child. Then x is a binomial random variable with n = 25 and p = .34. Using MINITAB with 𝑛 = 25 and 𝑝 = .34, the probability is:

Binomial with n = 25 and p = 0.34

x     P( X ≤ x )
19    1.00000

𝑃(𝑥 < 20) = 𝑃(𝑥 ≤ 19) = 1.000

b.

Using MINITAB with 𝑛 = 25 and 𝑝 = .34, the probability is:

Binomial with n = 25 and p = 0.34

x     P( X ≤ x )
20    1.00000
9     0.66995

𝑃(10 ≤ 𝑥 ≤ 20) = 𝑃(𝑥 ≤ 20) − 𝑃(𝑥 ≤ 9) = 1.000 − .66995 = .33005

c.

Let x = number of sports parents that have children that reach the Olympics or go pro. Then x is a binomial random variable with n = 25 and p = .05. Using MINITAB with 𝑛 = 25 and 𝑝 = .05, the probability is:

Binomial with n = 25 and p = 0.05

x     P( X ≤ x )
19    1

𝑃(𝑥 < 20) = 𝑃(𝑥 ≤ 19) = 1.000

4.52

a.

Let x = number of students who initially answer the question correctly in 20 students. Then x is a binomial random variable with 𝑛 = 20 and 𝑝 = .5. Using MINITAB with 𝑛 = 20 and 𝑝 = .5, the probability is:

Cumulative Distribution Function
Binomial with n = 20 and p = 0.5

x     P( X <= x )
10    0.588099

Thus, 𝑃(𝑥 > 10) = 1 − 𝑃(𝑥 ≤ 10) = 1 − .5881 = .4119.

b.

Let y = number of students who answer the question correctly after immediate feedback in 20 students. Then y is a binomial random variable with 𝑛 = 20 and 𝑝 = .7. Using MINITAB with 𝑛 = 20 and 𝑝 = .7, the probability is:

Cumulative Distribution Function
Binomial with n = 20 and p = 0.7

x     P( X <= x )
10    0.0479619

Thus, 𝑃(𝑦 > 10) = 1 − 𝑃(𝑦 ≤ 10) = 1 − .0480 = .9520.

4.53

a.

Let x = number of pairs correctly identified by an expert in 5 trials. Then x is a binomial random variable with 𝑛 = 5 and 𝑝 = .92. Using MINITAB with 𝑛 = 5 and 𝑝 = .92, the probability is:

Probability Density Function
Binomial with n = 5 and p = 0.92

x    P( X = x )
5    0.659082

Thus, 𝑃(𝑥 = 5) = .6591.

b.

Let y = number of pairs correctly identified by a novice in 5 trials. Then y is a binomial random variable with 𝑛 = 5 and 𝑝 = .75. Using MINITAB with 𝑛 = 5 and 𝑝 = .75, the probability is:

Probability Density Function
Binomial with n = 5 and p = 0.75

x    P( X = x )
5    0.237305

Thus, 𝑃(𝑦 = 5) = .2373.

4.54

a.

Let x = number of commissioners out of 4 who vote in favor of an issue. Then x is a binomial random variable with 𝑛 = 4 and 𝑝 = .5 (since they are equally likely to vote for or against an issue). The probability that your vote counts is equal to 𝑃(𝑥 = 2).

\[ P(x = 2) = \frac{4!}{2!(4-2)!}(.5)^{2}(.5)^{4-2} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1 \cdot 2 \cdot 1}(.5)^{2}(.5)^{2} = .375 \]

b.

Let x = number of commissioners out of 2 who vote in favor of an issue. Then x is a binomial random variable with 𝑛 = 2 and 𝑝 = .5 (since they are equally likely to vote for or against an issue). The probability that your vote counts is equal to 𝑃(𝑥 = 1).

\[ P(x = 1) = \frac{2!}{1!(2-1)!}(.5)^{1}(.5)^{2-1} = \frac{2 \cdot 1}{1 \cdot 1}(.5)(.5) = .5 \]

4.55

Let x = number of major bridges in Denver that will have a rating of 4 or below in 2020 in 10 trials. Then x has an approximate binomial distribution with 𝑛 = 10 and 𝑝 = .09.

a.

\[ P(x \ge 3) = 1 - P(x \le 2) = 1 - P(x = 0) - P(x = 1) - P(x = 2) \]

\[ = 1 - \binom{10}{0}(.09)^{0}(.91)^{10} - \binom{10}{1}(.09)^{1}(.91)^{9} - \binom{10}{2}(.09)^{2}(.91)^{8} \]

\[ = 1 - \frac{10!}{0!\,10!}(.09)^{0}(.91)^{10} - \frac{10!}{1!\,9!}(.09)^{1}(.91)^{9} - \frac{10!}{2!\,8!}(.09)^{2}(.91)^{8} \]

\[ = 1 - .3894 - .3851 - .1714 = .0541 \]

b.

Since the probability of seeing at least 3 bridges out of 10 with ratings of 4 or less is so small, we can conclude that the forecast of 9% of all major Denver bridges will have ratings of 4 or less in 2020 is too small. There would probably be more than 9%.

4.56

Define the following events:

A: {Taxpayer is audited}
B: {Taxpayer has income less than $1 million}
C: {Taxpayer has income of $1 million or higher}

a.

From the information given in the problem, 𝑃(𝐴|𝐵) = 1/200 = .005

b.

𝑃(𝐴|𝐶) = 3/100 = .03

Let x = number of taxpayers with incomes under $1 million who are audited. Then x is a binomial random variable with 𝑛 = 5 and 𝑝 = .005.

\[ P(x = 1) = \binom{5}{1}(.005)^{1}(.995)^{4} = \frac{5!}{1!\,4!}(.005)^{1}(.995)^{4} = .0245 \]

\[ P(x > 1) = 1 - [P(x = 0) + P(x = 1)] = 1 - \left[\binom{5}{0}(.005)^{0}(.995)^{5} + .0245\right] \]

\[ = 1 - \left[\frac{5!}{0!\,5!}(.005)^{0}(.995)^{5} + .0245\right] = 1 - [.9752 + .0245] = 1 - .9997 = .0003 \]

c.

Let x = number of taxpayers with incomes of $1 million or more who are audited. Then x is a binomial random variable with 𝑛 = 5 and 𝑝 = .03.

\[ P(x = 1) = \binom{5}{1}(.03)^{1}(.97)^{4} = \frac{5!}{1!\,4!}(.03)^{1}(.97)^{4} = .1328 \]

\[ P(x > 1) = 1 - [P(x = 0) + P(x = 1)] = 1 - \left[\binom{5}{0}(.03)^{0}(.97)^{5} + .1328\right] \]

\[ = 1 - \left[\frac{5!}{0!\,5!}(.03)^{0}(.97)^{5} + .1328\right] = 1 - [.8587 + .1328] = 1 - .9915 = .0085 \]

d.

Let x = number of taxpayers with incomes under $1 million who are audited. Then x is a binomial random variable with 𝑛 = 2 and 𝑝 = .005. Let y = number of taxpayers with incomes $1 million or more who are audited. Then y is a binomial random variable with 𝑛 = 2 and 𝑝 = .03.

\[ P(x = 0) = \binom{2}{0}(.005)^{0}(.995)^{2} = \frac{2!}{0!\,2!}(.005)^{0}(.995)^{2} = .9900 \]

\[ P(y = 0) = \binom{2}{0}(.03)^{0}(.97)^{2} = \frac{2!}{0!\,2!}(.03)^{0}(.97)^{2} = .9409 \]

𝑃(𝑥 = 0)𝑃(𝑦 = 0) = .9900(.9409) = .9315
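The audit probabilities in parts b through d can be reproduced numerically. The following is a sketch in standard-library Python; the function and variable names are illustrative assumptions, and the computation carries full precision rather than rounding intermediate values to four decimals as the hand calculation does.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


p_under = 0.005  # P(audit | income under $1 million), from part a
p_over = 0.03    # P(audit | income of $1 million or more), from part a

# Part b: n = 5 taxpayers with incomes under $1 million.
b_exactly_one = binomial_pmf(1, 5, p_under)
b_more_than_one = 1 - (binomial_pmf(0, 5, p_under) + binomial_pmf(1, 5, p_under))

# Part c: n = 5 taxpayers with incomes of $1 million or more.
c_exactly_one = binomial_pmf(1, 5, p_over)
c_more_than_one = 1 - (binomial_pmf(0, 5, p_over) + binomial_pmf(1, 5, p_over))

# Part d: none of 2 + 2 taxpayers audited, treating the two groups as independent.
d_none = binomial_pmf(0, 2, p_under) * binomial_pmf(0, 2, p_over)

print(b_exactly_one, b_more_than_one, c_exactly_one, c_more_than_one, d_none)
```

Without intermediate rounding, the part b value of P(x > 1) comes out near .00025; the .0003 shown above reflects subtracting terms already rounded to four decimals.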

e.

We must assume that the variables defined as x and y are binomial random variables. We must assume that the trials are identical, the probability of success is the same from trial to trial, and that the trials are independent.

4.57

a.

\[ \mu = np = 800(.91) = 728, \quad \sigma = \sqrt{npq} = \sqrt{800(.91)(.09)} = \sqrt{65.52} = 8.094 \]

b.

\[ z = \frac{x - \mu}{\sigma} = \frac{400 - 728}{8.094} = -40.52 \]

No. It would be extremely unlikely to observe fewer than half without traces of pesticide because the z-score associated with 400 is so far below the mean.

4.58

Assuming the supplier's claim is true,

\[ \mu = np = 500(.001) = .5 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{500(.001)(.999)} = \sqrt{.4995} = .707 \]

If the supplier's claim is true, we would only expect to find .5 defective switches in a sample of size 500. Therefore, it is not likely we would find 4. Based on the sample, the guarantee is probably inaccurate.

Note: \( z = \frac{x - \mu}{\sigma} = \frac{4 - .5}{.707} = 4.95 \). This is an unusually large z-score.

4.59

a.

We must assume that the probability that a specific type of ball meets the requirements is always the same from trial to trial and the trials are independent. To use the binomial probability distribution, we need to know the probability that a specific type of golf ball meets the requirements.

b.

For a binomial distribution, \( \mu = np \) and \( \sigma = \sqrt{npq} \).

In this example, 𝑛 = two dozen = 2(12) = 24, 𝑝 = .10, and 𝑞 = 1 − .10 = .90. (Success here means the golf ball does not meet standards.)

\[ \mu = np = 24(.10) = 2.4 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{24(.10)(.90)} = 1.47 \]

c.

In this situation, 𝑛 = 24, p = Probability of success = Probability golf ball does meet standards = .90, and 𝑞 = 1 − .90 = .10.

\[ E(y) = \mu = np = 24(.90) = 21.6 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{24(.90)(.10)} = 1.47 \]

(Note that 𝜎 is the same as in part b.)

4.60

The distribution would be a binomial distribution. We will check the 5 characteristics of a binomial random variable.

1. The experiment consists of n identical trials. For this problem, there are 4 bytes or trials.

2. There are only 2 possible outcomes for each trial. Let S = strings match on a byte and F = strings do not match on a byte.

3. The probability of success (S) is the same from trial to trial. For each trial 𝑝 = 𝑃(𝑆) = .5 and 𝑞 = 1 − 𝑝 = 1 − .5 = .5.

4. The trials are independent. Whether one byte matches does not affect whether another byte matches.

5. The binomial random variable x is the number of bytes that match in 4 trials.

Thus, x is a binomial random variable.

4.61

a.

The random variable x is discrete since it can assume a countable number of values (0, 1, 2, ...).

b.

This is a Poisson probability distribution with 𝜆 = 3.

c.

In order to graph the probability distribution, we need to know the probabilities for the possible values of x. Using MINITAB with 𝜆 = 3:

Probability Density Function
Poisson with mean = 3

x     P( X = x )
0     0.049787
1     0.149361
2     0.224042
3     0.224042
4     0.168031
5     0.100819
6     0.050409
7     0.021604
8     0.008102
9     0.002701
10    0.000810

Using MINITAB, the probability distribution of x in graphical form is:

[Figure: Histogram of x; vertical axis f(x) from 0 to .25, horizontal axis x = 0 to 10]

d.

𝜇 = 𝜆 = 3, 𝜎² = 𝜆 = 3, and 𝜎 = √3 = 1.7321
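The Poisson probabilities tabulated in part c follow directly from the formula p(x) = 𝜆ˣe^(−𝜆)/x!. A minimal sketch in standard-library Python (the helper name `poisson_pmf` is an illustrative assumption, not MINITAB's interface):

```python
from math import exp, factorial, sqrt


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 3
table = {x: poisson_pmf(x, lam) for x in range(11)}

# Print the probabilities to six decimals, matching the precision of the table above.
for x, px in table.items():
    print(x, f"{px:.6f}")

# For a Poisson random variable the mean and variance both equal lam.
mu = lam
sigma = sqrt(lam)
print(mu, round(sigma, 4))
```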

4.62

𝜇 = 𝜆 = 1.5. Using MINITAB with the Poisson distribution and 𝜆 = 1.5, the probabilities are:

Cumulative Distribution Function
Poisson with mean = 1.5

x    P( X <= x )
3    0.934358
2    0.808847
0    0.223130
6    0.999074

a.

𝑃(𝑥 ≤ 3) = .934358.

b.

𝑃(𝑥 ≥ 3) = 1 − 𝑃(𝑥 ≤ 2) = 1 − .808847 = .191153.

c.

𝑃(𝑥 = 3) = 𝑃(𝑥 ≤ 3) − 𝑃(𝑥 ≤ 2) = .934358 − .808847 = .125511.

d.

𝑃(𝑥 = 0) = .22313.

e.

𝑃(𝑥 > 0) = 1 − 𝑃(𝑥 = 0) = 1 − .22313 = .77687.

f.

𝑃(𝑥 > 6) = 1 − 𝑃(𝑥 ≤ 6) = 1 − .999074 = .000926.

4.63

Each probability is computed from the hypergeometric formula, using the values of N, n, r, and x given in the exercise:

\[ P(x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} \]

a.

𝑃(𝑥 = 1) = .3

b.

𝑃(𝑥 = 3) = .119

c.

𝑃(𝑥 = 2) = .167

d.

𝑃(𝑥 = 0) = .167

4.64

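For 𝑁 = 8, 𝑛 = 3, and 𝑟 = 5,

a.

\[ P(x = 1) = \frac{\binom{5}{1}\binom{3}{2}}{\binom{8}{3}} = \frac{\frac{5!}{1!\,4!} \cdot \frac{3!}{2!\,1!}}{\frac{8!}{3!\,5!}} = \frac{5(3)}{56} = .268 \]

b.

\[ P(x = 0) = \frac{\binom{5}{0}\binom{3}{3}}{\binom{8}{3}} = \frac{\frac{5!}{0!\,5!} \cdot \frac{3!}{3!\,0!}}{\frac{8!}{3!\,5!}} = \frac{1(1)}{56} = .018 \]

c.

\[ P(x = 3) = \frac{\binom{5}{3}\binom{3}{0}}{\binom{8}{3}} = \frac{\frac{5!}{3!\,2!} \cdot \frac{3!}{0!\,3!}}{\frac{8!}{3!\,5!}} = \frac{10(1)}{56} = .179 \]

d.

𝑃(𝑥 ≥ 4) = 𝑃(𝑥 = 4) + 𝑃(𝑥 = 5) = 0

Since the sample size is only 3, there is no way to get 4 or more successes in only 3 trials.

4.65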

a.

Using MINITAB with 𝜆 = 1, and the Poisson distribution, the probability is:

Cumulative Distribution Function
Poisson with mean = 1

x    P( X <= x )
2    0.919699

𝑃(𝑥 ≤ 2) = .919699

b.

Using MINITAB with 𝜆 = 2, and the Poisson distribution, the probability is:

Cumulative Distribution Function
Poisson with mean = 2

x    P( X <= x )
2    0.676676

𝑃(𝑥 ≤ 2) = .676676

c.

Using MINITAB with 𝜆 = 3, and the Poisson distribution, the probability is:

Cumulative Distribution Function
Poisson with mean = 3

x    P( X <= x )
2    0.423190

𝑃(𝑥 ≤ 2) = .42319

d.

The probability decreases as 𝜆 increases. This is reasonable because 𝜆 is equal to the mean. As the mean increases, the probability that x is less than a particular value will decrease.
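The pattern in part d can be seen by evaluating the same cumulative probability for several means. This is a sketch in standard-library Python; `poisson_cdf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson random variable with mean lam."""
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(k + 1))


# P(x <= 2) for the three means used in parts a, b, and c.
results = {lam: poisson_cdf(2, lam) for lam in (1, 2, 3)}

for lam, prob in results.items():
    print(lam, f"{prob:.6f}")
```

The printed probabilities fall as the mean grows, because more of the distribution's mass moves above 2.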

4.66

a.

To graph the Poisson probability distribution with 𝜆 = 5, we need to calculate p(x) for x = 0 to 15. Using MINITAB with 𝜆 = 5, the results are:

Probability Density Function
Poisson with mean = 5

x     P( X = x )
0     0.006738
1     0.033690
2     0.084224
3     0.140374
4     0.175467
5     0.175467
6     0.146223
7     0.104445
8     0.065278
9     0.036266
10    0.018133
11    0.008242
12    0.003434
13    0.001321
14    0.000472
15    0.000157

Using MINITAB, the graph is:

[Figure: Histogram of x; vertical axis f(x) from 0 to .20, horizontal axis x = 0 to 14; 𝜇 = 5 and the interval 𝜇 ± 2𝜎 marked]

b.

𝜇 = 𝜆 = 5 and 𝜎 = √𝜆 = √5 = 2.2361

𝜇 ± 2𝜎 ⇒ 5 ± 2(2.2361) ⇒ 5 ± 4.4722 ⇒ (.5278, 9.4722)

c.

Using MINITAB with 𝜆 = 5:

Cumulative Distribution Function
Poisson with mean = 5

x    P( X <= x )
9    0.968172
0    0.006738

𝑃(.5278 < 𝑥 < 9.4722) = 𝑃(1 ≤ 𝑥 ≤ 9) = 𝑃(𝑥 ≤ 9) − 𝑃(𝑥 ≤ 0) = .968172 − .006738 = .961434
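The interval calculation in parts b and c can be sketched as follows in standard-library Python. The names are illustrative assumptions; the key step is converting the continuous interval 𝜇 ± 2𝜎 into the whole numbers that lie strictly inside it.

```python
from math import ceil, exp, factorial, floor, sqrt


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 5
mu, sigma = lam, sqrt(lam)
lower, upper = mu - 2 * sigma, mu + 2 * sigma

# Whole numbers strictly between the endpoints of mu +/- 2*sigma.
first = floor(lower) + 1
last = ceil(upper) - 1

prob_inside = sum(poisson_pmf(x, lam) for x in range(first, last + 1))
print(round(lower, 4), round(upper, 4), first, last, round(prob_inside, 6))
```

The result is well above the Chebyshev lower bound of .75 for two standard deviations, which is consistent with the mound shape of the graph.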


4.67

For this problem, 𝑁 = 100, 𝑛 = 10, and 𝑥 = 4.

a.

If the sample is drawn without replacement, the hypergeometric distribution should be used. The hypergeometric distribution requires that sampling be done without replacement.

b.

If the sample is drawn with replacement, the binomial distribution should be used. The binomial distribution requires that sampling be done with replacement.

4.68

With 𝑁 = 10, 𝑛 = 5, and 𝑟 = 7, x can take on values 2, 3, 4, or 5.

a.

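\[ P(x = 2) = \frac{\binom{7}{2}\binom{3}{3}}{\binom{10}{5}} = \frac{\frac{7!}{2!\,5!} \cdot \frac{3!}{3!\,0!}}{\frac{10!}{5!\,5!}} = \frac{21(1)}{252} = .083 \]

\[ P(x = 3) = \frac{\binom{7}{3}\binom{3}{2}}{\binom{10}{5}} = \frac{\frac{7!}{3!\,4!} \cdot \frac{3!}{2!\,1!}}{\frac{10!}{5!\,5!}} = \frac{35(3)}{252} = .417 \]

\[ P(x = 4) = \frac{\binom{7}{4}\binom{3}{1}}{\binom{10}{5}} = \frac{\frac{7!}{4!\,3!} \cdot \frac{3!}{1!\,2!}}{\frac{10!}{5!\,5!}} = \frac{35(3)}{252} = .417 \]

\[ P(x = 5) = \frac{\binom{7}{5}\binom{3}{0}}{\binom{10}{5}} = \frac{\frac{7!}{5!\,2!} \cdot \frac{3!}{0!\,3!}}{\frac{10!}{5!\,5!}} = \frac{21(1)}{252} = .083 \]

The probability distribution of x in tabular form is:

x       2       3       4       5
p(x)    .083    .417    .417    .083

b.

\[ \mu = \frac{nr}{N} = \frac{5(7)}{10} = 3.5 \]

\[ \sigma^{2} = \frac{r(N-r)\,n(N-n)}{N^{2}(N-1)} = \frac{7(10-7)\,5(10-5)}{10^{2}(10-1)} = \frac{525}{900} = .583 \]

𝜎 = √.5833 = .764

c.

𝜇 ± 2𝜎 ⇒ 3.5 ± 2(.764) ⇒ 3.5 ± 1.528 ⇒ (1.972, 5.028)

The graph of the distribution is:

[Figure: Graph of p(x) from 0.0 to 0.4 against x = 2, 3, 4, 5; 𝜇 = 3.5 and the interval endpoints 1.972 and 5.028 marked]

d.

𝑃(1.972 < 𝑥 < 5.028) = 𝑃(2 ≤ 𝑥 ≤ 5) = 1.000

4.69

a.

The characteristics of a binomial random variable are: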

1.

n identical trials. We are selecting 10 robots from 106. On the first trial, we are selecting 1 robot out of 106. On the next trial, we are selecting 1 robot out of 105. On the 10th trial, we are selecting 1 robot out of 97. These trials are not identical.

2.

Two possible outcomes. A selected robot either has no legs or wheels or it has some legs or wheels. S = robot has no legs or wheels and F = robot has either legs and/or wheels. This condition is met.

3.

P(S) remains the same from trial to trial. For this example the probability of success does not stay constant. On the first trial, there are 106 robots of which 15 have neither legs nor wheels. Thus, P(S) on the first trial is 15/106. If a robot with neither legs nor wheels is selected on the first trial, then P(S) on the second trial would be 14/105. If a robot with neither legs nor wheels is not selected on the first trial, then P(S) on the second trial would be 15/105. The value of P(S) is not constant from trial to trial. This condition is not met.

4.

Trials are independent. The trials are not independent. The type of robot selected on one trial affects the type of robot selected on the next trial. This condition is not met.

5.

The random variable x = number of robots selected that do not have legs or wheels in 10 trials.

The necessary conditions for a binomial random variable are not met. b.

The characteristics of a hypergeometric random variable are:

1. The experiment consists of randomly drawing n elements without replacement from a set of N elements, r of which are successes and (N − r) of which are failures. For this example there are a total of 𝑁 = 106 robots, of which 𝑟 = 15 have neither legs nor wheels and 𝑁 − 𝑟 = 106 − 15 = 91 have some legs and/or wheels. We are selecting 𝑛 = 10 robots.

2. The hypergeometric random variable x is the number of successes in the draw of n elements. For this example, x = number of robots selected with no legs or wheels in 10 selections.
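The dependence between draws described in part a can be made concrete with exact fractions. This is a sketch in standard-library Python; the variable names are illustrative assumptions.

```python
from fractions import Fraction

N, r = 106, 15  # total robots, and robots with neither legs nor wheels

p_first = Fraction(r, N)                   # P(S) on the first draw: 15/106
p_second_given_s = Fraction(r - 1, N - 1)  # P(S on draw 2 | S on draw 1): 14/105
p_second_given_f = Fraction(r, N - 1)      # P(S on draw 2 | F on draw 1): 15/105

# Unconditional P(S on draw 2), by the law of total probability.
p_second = p_first * p_second_given_s + (1 - p_first) * p_second_given_f

print(p_first, p_second_given_s, p_second_given_f, p_second)
```

The two conditional probabilities differ, so the draws are not independent, even though the unconditional probability of success on the second draw still works out to 15/106. That dependence is why the hypergeometric model, not the binomial, applies when sampling without replacement from a small population.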

c.

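\[ \mu = \frac{nr}{N} = \frac{10(15)}{106} = 1.415 \]

and

\[ \sigma = \sqrt{\frac{r(N-r)\,n(N-n)}{N^{2}(N-1)}} = \sqrt{\frac{15(106-15)(10)(106-10)}{106^{2}(106-1)}} = \sqrt{1.1107} = 1.0539 \]

d.

\[ P(x = 2) = \frac{\binom{15}{2}\binom{91}{8}}{\binom{106}{10}} = \frac{\frac{15!}{2!\,13!} \cdot \frac{91!}{8!\,83!}}{\frac{106!}{10!\,96!}} = .2801 \]

4.70

a.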

𝐸(𝑥) = 𝜇 = 𝜆 = 29

𝜎 = √𝜆 = √29 = 5.385

b.

\[ z = \frac{x - \mu}{\sigma} = 23.77 \]

c.

Using MINITAB with 𝜆 = 29, and the Poisson distribution, the probability is:

Poisson with mean = 29

x    P( X ≤ x )
4    0.0000000

𝑃(𝑥 ≤ 4) = .0000

4.71

a.

For Poisson random variable x, 𝜆 = 1.

\[ P(x = 1) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{1^{1} e^{-1}}{1!} = .368 \]

b.

𝑃(𝑥 = 1) = .368

c.

𝐸(𝑥) = 𝜇 = 𝜆 = 1, 𝜎 = √𝜆 = √1 = 1

4.72

a.

Let x = number of Reese candies sampled in the n = 2 trials. For this problem, we can use the hypergeometric distribution with 𝑁 = 86, 𝑟 = 4, and 𝑛 = 2.

b.

\[ P(x = 1) = \frac{\binom{4}{1}\binom{82}{1}}{\binom{86}{2}} = \frac{\frac{4!}{1!\,3!} \cdot \frac{82!}{1!\,81!}}{\frac{86!}{2!\,84!}} = .0897 \]

4.73

Let x = number of “clean” cartridges selected in 5 trials. For this problem, 𝑁 = 158, 𝑛 = 5, and 𝑟 = 122.

\[ P(x = 5) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \frac{\binom{122}{5}\binom{36}{0}}{\binom{158}{5}} = \frac{\frac{122!}{5!\,117!} \cdot \frac{36!}{0!\,36!}}{\frac{158!}{5!\,153!}} = .2693 \]
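The hypergeometric probabilities in Exercises 4.72 and 4.73 involve large factorials, so they are easier to check by machine. A minimal sketch in standard-library Python (the function name and argument order are illustrative assumptions):

```python
from math import comb


def hypergeom_pmf(x: int, N: int, r: int, n: int) -> float:
    """P(X = x) when n items are drawn without replacement from N items, r of which are successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)


# Exercise 4.72 b: N = 86, r = 4, n = 2, x = 1.
p_reese = hypergeom_pmf(1, N=86, r=4, n=2)

# Exercise 4.73: N = 158, r = 122, n = 5, x = 5.
p_clean = hypergeom_pmf(5, N=158, r=122, n=5)

print(round(p_reese, 4), round(p_clean, 4))
```

`math.comb` works with exact integers, so the factorials never overflow; only the final division is done in floating point.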

4.74

For Poisson random variable x, 𝜆 = 2. Using MINITAB with 𝜆 = 2, the probability is:

Cumulative Distribution Function
Poisson with mean = 2

x    P( X ≤ x )
4    0.947347

𝑃(𝑥 > 4) = 1 − 𝑃(𝑥 ≤ 4) = 1 − .947347 = .052653

4.75

Let x = number of times “total visitors” is selected in 5 museums. For this exercise, x has a hypergeometric distribution with 𝑁 = 30, 𝑛 = 5, 𝑟 = 8 and 𝑥 = 0.

\[ P(x = 0) = \frac{\binom{8}{0}\binom{22}{5}}{\binom{30}{5}} = \frac{\frac{8!}{0!\,8!} \cdot \frac{22!}{5!\,17!}}{\frac{30!}{5!\,25!}} = .1848 \]

4.76

Let x = number of game-day traffic fatalities at the winning team’s location. For this exercise, x has a Poisson distribution with 𝜆 = .5.

\[ P(x \ge 3) = 1 - P(x \le 2) = 1 - P(x = 0) - P(x = 1) - P(x = 2) \]

\[ = 1 - \frac{.5^{0} e^{-.5}}{0!} - \frac{.5^{1} e^{-.5}}{1!} - \frac{.5^{2} e^{-.5}}{2!} = 1 - .6065 - .3033 - .0758 = .0144 \]

4.77

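Let x = number of times cell phone accesses color code “b” in 7 handoffs. For this problem, x has a hypergeometric distribution with 𝑁 = 85, 𝑛 = 7, and 𝑟 = 40.

\[ P(x = 2) = \frac{\binom{40}{2}\binom{45}{5}}{\binom{85}{7}} = \frac{\frac{40!}{2!\,38!} \cdot \frac{45!}{5!\,40!}}{\frac{85!}{7!\,78!}} = .1931 \]

4.78

For this exercise, x has a hypergeometric distribution with 𝑁 = 57, 𝑛 = 10, and 𝑟 = 45.

a.

\[ P(x = 5) = \frac{\binom{45}{5}\binom{12}{5}}{\binom{57}{10}} = \frac{\frac{45!}{5!\,40!} \cdot \frac{12!}{5!\,7!}}{\frac{57!}{10!\,47!}} = .0224 \]

b.

\[ P(x = 8) = \frac{\binom{45}{8}\binom{12}{2}}{\binom{57}{10}} = \frac{\frac{45!}{8!\,37!} \cdot \frac{12!}{2!\,10!}}{\frac{57!}{10!\,47!}} = .3294 \]

c.

\[ \mu = E(x) = \frac{nr}{N} = \frac{10(45)}{57} = 7.895 \]

4.79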

Let x = number of flaws in a 4 meter length of wire. For this exercise, x has a Poisson distribution with 𝜆 = .8. The roll will be rejected if there is at least one flaw in the sample of a 4 meter length of wire.

\[ P(x \ge 1) = 1 - P(x = 0) = 1 - \frac{.8^{0} e^{-.8}}{0!} = 1 - .4493 = .5507 \]

We have to assume that the flaws are randomly distributed throughout the roll of wire and that the 4 meter sample of wire is representative of the entire roll.

4.80

a.

Using MINITAB with 𝜆 = 10,

Probability Density Function
Poisson with mean = 10

x     P( X = x )
24    0.0000732

𝑃(𝑥 = 24) = .0000732

b.

Using MINITAB with 𝜆 = 10,

Probability Density Function
Poisson with mean = 10

x     P( X = x )
23    0.0001756

𝑃(𝑥 = 23) = .0001756

c.

Yes, these probabilities are good approximations for the probability of “fire” and “theft”. The researchers estimated these probabilities to be .0001, indicating that these would be extremely rare events. Our probabilities of .0001 and .0002 are very close to .0001.

4.81

a.

Let x = the number of homes owned by the plaintiff. For this problem, we use the hypergeometric distribution with 𝑁 = 276, 𝑛 = 53, and 𝑟 = 57. Using MINITAB, we find:

Hypergeometric with N = 276, M = 57, and n = 53

x     P( X ≤ x )
50    1

𝑃(𝑥 ≥ 51) = 1 − 𝑃(𝑥 ≤ 50) = 1 − 1 = 0

b.

Yes. The likelihood of getting that many of the plaintiff's homes in the sample is approximately 0.

c.

Let x = the number of the sections that bordered a window. Then x can be modeled using a binomial distribution with n = 98 and p = .50. Using MINITAB with n = 98 and p = .50, we find:

Binomial with n = 98 and p = 0.5

x     P( X ≤ x )
78    1.00000

𝑃(𝑥 ≥ 79) = 1 − 𝑃(𝑥 ≤ 78) = 1 − 1 = 0

d.

Yes. Both the number of homes that were selected from the plaintiff’s homes and the number of sections that bordered windows were higher than should be expected. This could easily bias the results.

4.82

If it takes exactly 5 minutes to wash a car and there are 5 cars in line, it will take 5(5) = 25 minutes to wash these 5 cars. Thus, for anyone to be in line at closing time, more than 1 car must arrive in the final ½ hour. In addition, if on average 10 cars arrive per hour, then an average of 5 cars will arrive per ½ hour (30 minutes). If we let x = number of cars to arrive in ½ hour, then x is a Poisson random variable with 𝜆 = 5.

Using MINITAB with 𝜆 = 5,

Cumulative Distribution Function
Poisson with mean = 5

x    P( X <= x )
1    0.0404277

𝑃(𝑥 > 1) = 1 − 𝑃(𝑥 ≤ 1) = 1 − .0404277 = .9595723

Since this probability is so large, it is very likely that someone will be in line at closing time.

4.83

a.

𝑃(𝑦 = 0) = 𝑃(𝑛 = 0) = .333

b.

𝑃(𝑦 = 1) = 𝑃(𝑛 = 1 ∩ 𝑥 = 1) = 𝑃(𝑛 = 1)𝑃(𝑥 = 1) = (.333)(.4) = .133

4.84

Table II in the text gives the area between 𝑧 = 0 and 𝑧 = 𝑧₀. In this exercise, the answers may thus be read directly from the table by looking up the appropriate z.

a.

𝑃(0 < 𝑧 < 2.0) = .4772

b.

𝑃(0 < 𝑧 < 3.0) = .4987

c.

𝑃(0 < 𝑧 < 1.5) = .4332

d.

𝑃(0 < 𝑧 < .80) = .2881

4.85

Using Table II, Appendix D:

a.

𝑃(𝑧 > 1.46) = .5 − 𝑃(0 < 𝑧 < 1.46) = .5 − .4279 = .0721

b.

𝑃(𝑧 < −1.56) = .5 − 𝑃(−1.56 < 𝑧 < 0) = .5 − .4406 = .0594

c.

𝑃(. 67 ≤ 𝑧 ≤ 2.41) = 𝑃(0 < 𝑧 ≤ 2.41) − 𝑃(0 < 𝑧 < .67) = .4920 − .2486 = .2434

d.

𝑃(−1.96 ≤ 𝑧 ≤ −.33) = 𝑃(−1.96 ≤ 𝑧 < 0) − 𝑃(−.33 ≤ 𝑧 < 0) = .4750 − .1293 = .3457

e.

𝑃(𝑧 ≥ 0) = .5

f.

𝑃(−2.33 < 𝑧 < 1.50) = 𝑃(−2.33 < 𝑧 < 0) + 𝑃(0 < 𝑧 < 1.50) = .4901 + .4332 = .9233

4.86

Using Table II, Appendix D:

a.

𝑃(−1 < 𝑧 < 1) = 𝑃(−1 < 𝑧 < 0) + 𝑃(0 < 𝑧 < 1) = .3413 + .3413 = .6826

b.

𝑃(−2 < 𝑧 < 2) = 𝑃(−2 < 𝑧 < 0) + 𝑃(0 < 𝑧 < 2) = .4772 + .4772 = .9544

c.

𝑃(−2.16 ≤ 𝑧 ≤ 0.55) = 𝑃(−2.16 ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 0.55) = .4846 + .2088 = .6934

d.

𝑃(−.42 < 𝑧 < 1.96) = 𝑃(−.42 < 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 < 1.96) = .1628 + .4750 = .6378

e.

𝑃(𝑧 ≥ −2.33) = 𝑃(−2.33 ≤ 𝑧 ≤ 0) + 𝑃(𝑧 ≥ 0) = .4901 + .5 = .9901

f.

𝑃(𝑧 < 2.33) = 𝑃(𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 2.33) = .5 + .4901 = .9901

4.87

Using Table II, Appendix D:

a.

𝑃(−1 ≤ 𝑧 ≤ 1) = 𝑃(−1 ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 1) = .3413 + .3413 = .6826

b.

𝑃(−1.96 ≤ 𝑧 ≤ 1.96) = 𝑃(−1.96 ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 1.96) = .4750 + .4750 = .9500

c.

𝑃(−1.645 ≤ 𝑧 ≤ 1.645) = 𝑃(−1.645 ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 1.645) = .4500 + .4500 = .9000 (using interpolation)

d.

𝑃(−2 ≤ 𝑧 ≤ 2) = 𝑃(−2 ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 2) = .4772 + .4772 = .9544
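The symmetric areas in this exercise can be cross-checked without a table. The sketch below uses the standard-library error function and the identity P(−k ≤ z ≤ k) = erf(k/√2); the function name `central_area` is an illustrative assumption.

```python
from math import erf, sqrt


def central_area(k: float) -> float:
    """P(-k <= z <= k) for a standard normal random variable z."""
    return erf(k / sqrt(2))


# The four half-widths used in parts a through d.
areas = {k: central_area(k) for k in (1, 1.96, 1.645, 2)}

for k, area in areas.items():
    print(k, round(area, 4))
```

The computed values can differ from the table-based answers in the fourth decimal place (for example, .6827 versus .6826 for k = 1) because Table II entries are themselves rounded to four decimals before being doubled.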

4.88

Using Table II, Appendix D:

a.

𝑃(𝑧 ≥ 𝑧₀) = .05. 𝐴₁ = .5 − .05 = .4500. Looking up the area .4500 in Table II gives 𝑧₀ = 1.645.

b.

𝑃(𝑧 ≥ 𝑧₀) = .025. 𝐴₁ = .5 − .025 = .4750. Looking up the area .4750 in Table II gives 𝑧₀ = 1.96.

c.

𝑃(𝑧 ≤ 𝑧₀) = .025. 𝐴₁ = .5 − .025 = .4750. Looking up the area .4750 in Table II gives 𝑧 = 1.96. Since 𝑧₀ is to the left of 0, 𝑧₀ = −1.96.

d.

𝑃(𝑧 ≥ 𝑧₀) = .10. 𝐴₁ = .5 − .1 = .4000. Looking up the area .4000 in Table II gives 𝑧₀ = 1.28.

e.

𝑃(𝑧 > 𝑧₀) = .10. 𝐴₁ = .5 − .1 = .4000. 𝑧₀ = 1.28 (same as in d)

4.89

Using Table II of Appendix D:

a.

𝑃(𝑧 ≤ 𝑧₀) = .2090. 𝐴₁ = .5 − .2090 = .2910. Looking up the area .2910 in the body of Table II gives 𝑧₀ = −.81. (𝑧₀ is negative since the graph shows 𝑧₀ is on the left side of 0.)

b.

𝑃(𝑧 ≤ 𝑧₀) = .7090

𝑃(𝑧 ≤ 𝑧₀) = 𝑃(𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .5 + 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .7090

Therefore, 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .7090 − .5 = .2090 = 𝐴₁. Looking up the area .2090 in the body of Table II gives 𝑧₀ ≈ .55.

c.

𝑃(−𝑧₀ ≤ 𝑧 ≤ 𝑧₀) = .8472

𝑃(−𝑧₀ ≤ 𝑧 ≤ 𝑧₀) = 2𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .8472

Therefore, 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .8472/2 = .4236. Looking up the area .4236 in the body of Table II gives 𝑧₀ = 1.43.

d.

𝑃(−𝑧₀ ≤ 𝑧 ≤ 𝑧₀) = .1664

𝑃(−𝑧₀ ≤ 𝑧 ≤ 𝑧₀) = 2𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .1664

Therefore, 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .1664/2 = .0832. Looking up the area .0832 in the body of Table II gives 𝑧₀ = .21.

e.

𝑃(𝑧₀ ≤ 𝑧 ≤ 0) = .4798

𝑃(𝑧₀ ≤ 𝑧 ≤ 0) = 𝑃(0 ≤ 𝑧 ≤ −𝑧₀)

Looking up the area .4798 in the body of Table II gives 𝑧₀ = −2.05.

f.

𝑃(−1 < 𝑧 < 𝑧₀) = .5328

𝑃(−1 < 𝑧 < 𝑧₀) = 𝑃(−1 < 𝑧 < 0) + 𝑃(0 < 𝑧 < 𝑧₀) = .5328

𝑃(0 < 𝑧 < 1) + 𝑃(0 < 𝑧 < 𝑧₀) = .5328

Thus, 𝑃(0 < 𝑧 < 𝑧₀) = .5328 − .3413 = .1915. Looking up the area .1915 in the body of Table II gives 𝑧₀ = .50.
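These reverse table lookups correspond to evaluating the inverse of the standard normal cumulative distribution function. A sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative assumptions.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Part a: P(z <= z0) = .2090
z0_a = z.inv_cdf(0.2090)

# Part b: P(z <= z0) = .7090
z0_b = z.inv_cdf(0.7090)

# Part c: P(-z0 <= z <= z0) = .8472, so P(z <= z0) = .5 + .8472/2
z0_c = z.inv_cdf(0.5 + 0.8472 / 2)

# Part e: P(z0 <= z <= 0) = .4798, so P(z <= z0) = .5 - .4798
z0_e = z.inv_cdf(0.5 - 0.4798)

# Part f: P(-1 < z < z0) = .5328, so P(z <= z0) = P(z <= -1) + .5328
z0_f = z.inv_cdf(z.cdf(-1) + 0.5328)

print(round(z0_a, 2), round(z0_b, 2), round(z0_c, 2), round(z0_e, 2), round(z0_f, 2))
```

In each case the target area is first rewritten as a left-tail probability, since `inv_cdf` expects P(z ≤ z₀), whereas Table II tabulates areas between 0 and z₀.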

4.90

a.

𝑧=1

b.

𝑧 = −1

c.

𝑧=0

d.

𝑧 = −2.5

e.

𝑧=3

4.91

a.

\[ z = \frac{x - \mu}{\sigma} = -2.50 \]

b.

\[ z = \frac{x - \mu}{\sigma} = 0 \]

c.

\[ z = \frac{x - \mu}{\sigma} = -0.625 \]

d.

\[ z = \frac{x - \mu}{\sigma} = -3.75 \]

e.

\[ z = \frac{x - \mu}{\sigma} = 1.25 \]

f.

\[ z = \frac{x - \mu}{\sigma} = -1.25 \]

4.92

Using Table II of Appendix D:

a.

To find the probability that x assumes a value more than 2 standard deviations from 𝜇:

𝑃(𝑥 < 𝜇 − 2𝜎) + 𝑃(𝑥 > 𝜇 + 2𝜎) = 𝑃(𝑧 < −2) + 𝑃(𝑧 > 2) = 2𝑃(𝑧 > 2) = 2(.5 − .4772) = 2(.0228) = .0456

To find the probability that x assumes a value more than 3 standard deviations from 𝜇:

𝑃(𝑥 < 𝜇 − 3𝜎) + 𝑃(𝑥 > 𝜇 + 3𝜎) = 𝑃(𝑧 < −3) + 𝑃(𝑧 > 3) = 2𝑃(𝑧 > 3) = 2(.5 − .4987) = 2(.0013) = .0026

b.

To find the probability that x assumes a value within 1 standard deviation of its mean: 𝑃(𝜇 − 𝜎 < 𝑥 < 𝜇 + 𝜎) = 𝑃(−1 < 𝑧 < 1) = 2𝑃(0 < 𝑧 < 1) = 2(. 3413) = .6826

To find the probability that x assumes a value within 2 standard deviations of μ: 𝑃(𝜇 − 2𝜎 < 𝑥 < 𝜇 + 2𝜎) = 𝑃(−2 < 𝑧 < 2) = 2𝑃(0 < 𝑧 < 2) = 2(. 4772) = .9544 c.

To find the value of x that represents the 80th percentile, we must first find the value of z that corresponds to the 80th percentile.

𝑃(𝑧 < 𝑧₀) = .80. Thus, 𝐴₁ + 𝐴₂ = .80. Since 𝐴₁ = .50, 𝐴₂ = .80 − .50 = .30. Using the body of Table II, 𝑧₀ = .84.

To find x, we substitute the values into the z-score formula:

\[ z = \frac{x - \mu}{\sigma} \Rightarrow .84 = \frac{x - 1000}{10} \Rightarrow x = .84(10) + 1000 = 1008.4 \]

To find the value of x that represents the 10th percentile, we must first find the value of z that corresponds to the 10th percentile.

𝑃(𝑧 < 𝑧₀) = .10. Thus, 𝐴₁ = .50 − .10 = .40. Using the body of Table II, 𝑧₀ = −1.28.

To find x, we substitute the values into the z-score formula:

\[ z = \frac{x - \mu}{\sigma} \Rightarrow -1.28 = \frac{x - 1000}{10} \Rightarrow x = -1.28(10) + 1000 = 987.2 \]

4.93

a.

\[ P(10 \le x \le 12) = P\left(\frac{10 - 11}{2} \le z \le \frac{12 - 11}{2}\right) = P(-0.50 \le z \le 0.50) = A_1 + A_2 = .1915 + .1915 = .3830 \]

b.

\[ P(6 \le x \le 10) = P\left(\frac{6 - 11}{2} \le z \le \frac{10 - 11}{2}\right) = P(-2.50 \le z \le -0.50) \]

\[ = P(-2.50 \le z \le 0) - P(-0.50 \le z \le 0) = .4938 - .1915 = .3023 \]

c.

\[ P(13 \le x \le 16) = P\left(\frac{13 - 11}{2} \le z \le \frac{16 - 11}{2}\right) = P(1.00 \le z \le 2.50) \]

\[ = P(0 \le z \le 2.50) - P(0 \le z \le 1.00) = .4938 - .3413 = .1525 \]

d.

\[ P(7.8 \le x \le 12.6) = P\left(\frac{7.8 - 11}{2} \le z \le \frac{12.6 - 11}{2}\right) = P(-1.60 \le z \le 0.80) = A_1 + A_2 = .4452 + .2881 = .7333 \]

e.

\[ P(x \ge 13.24) = P\left(z \ge \frac{13.24 - 11}{2}\right) = P(z \ge 1.12) = A_2 = .5 - A_1 = .5000 - .3686 = .1314 \]

f.

\[ P(x \ge 7.62) = P\left(z \ge \frac{7.62 - 11}{2}\right) = P(z \ge -1.69) = A_1 + A_2 = .4545 + .5000 = .9545 \]

4.94

The random variable x has a normal distribution with 𝜇 = 50 and 𝜎 = 3. a.

𝑃(𝑥 < 𝑥₀) = .8413. So, 𝐴₁ + 𝐴₂ = .8413. Since 𝐴₁ = .5, 𝐴₂ = .8413 − .5 = .3413. Looking up the area .3413 in the body of Table II, Appendix D gives 𝑧₀ = 1.0. To find 𝑥₀, substitute all the values into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 1.0 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.0) = 53 \]

b.

𝑃(𝑥 > 𝑥₀) = .025. So, 𝐴₁ = .5 − .025 = .4750. Looking up the area .4750 in the body of Table II, Appendix D gives 𝑧₀ = 1.96. To find 𝑥₀, substitute all the values into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 1.96 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.96) = 55.88 \]

c.

𝑃(𝑥 > 𝑥₀) = .95. So, 𝐴₁ + 𝐴₂ = .95. Since 𝐴₂ = .5, 𝐴₁ = .95 − .5 = .4500. Looking up the area .4500 in the body of Table II, Appendix D gives (since it is exactly between two values, average the z-scores) 𝑧₀ ≈ −1.645. To find 𝑥₀, substitute into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow -1.645 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 - 3(1.645) = 45.065 \]

d.

𝑃(41 ≤ 𝑥 ≤ 𝑥₀) = .8630

\[ z = \frac{x - \mu}{\sigma} = \frac{41 - 50}{3} = -3 \]

𝐴₁ = 𝑃(41 ≤ 𝑥 ≤ 𝜇) = 𝑃(−3 ≤ 𝑧 ≤ 0) = 𝑃(0 ≤ 𝑧 ≤ 3) = .4987

𝐴₁ + 𝐴₂ = .8630. Since 𝐴₁ = .4987, 𝐴₂ = .8630 − .4987 = .3643. Looking up .3643 in the body of Table II, Appendix D gives 𝑧₀ = 1.1. To find 𝑥₀, substitute into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 1.1 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(1.1) = 53.3 \]

e.

𝑃(𝑥 < 𝑥₀) = .10. So 𝐴₁ = .5 − .10 = .4000. Looking up area .4000 in the body of Table II, Appendix D gives 𝑧 = 1.28. Since 𝑧₀ is to the left of 0, 𝑧₀ = −1.28. To find 𝑥₀, substitute all the values into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow -1.28 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 - 3(1.28) = 46.16 \]

f.

𝑃(𝑥 > 𝑥₀) = .01. 𝐴₁ = .5 − .01 = .4900. Looking up area .4900 in the body of Table II, Appendix D gives 𝑧₀ = 2.33. To find 𝑥₀, substitute all the values into the z-score formula:

\[ z_0 = \frac{x_0 - \mu}{\sigma} \Rightarrow 2.33 = \frac{x_0 - 50}{3} \Rightarrow x_0 = 50 + 3(2.33) = 56.99 \]

4.95

a.

In order to approximate the binomial distribution with the normal distribution, the interval 𝜇 ± 3𝜎 ⇒ 𝑛𝑝 ± 3 𝑛𝑝𝑞 should lie in the range 0 to n. When n = 25 and p = .4, 𝑛𝑝 ± 3 𝑛𝑝𝑞 ⇒ 25(. 4) ± 3 25(. 4)(1 − .4) ⇒ 10 ± 3√6 ⇒ 10 ± 7.3485 ⇒ (2.6515,17.3485) Since the interval calculated does lie in the range 0 to 25, we can use the normal approximation.

b.

μ = np = 25(.4) = 10 and σ² = npq = 25(.4)(.6) = 6

c.

𝑃(𝑥 ≥ 9) = 1 − 𝑃(𝑥 ≤ 8) = 1 − .274 = .726 (Table I, Appendix D)

d.

P(x ≥ 9) ≈ P(z ≥ ((9 − .5) − 10)/√6) = P(z ≥ −.61) = .5000 + .2291 = .7291 (Using Table II, Appendix D.)
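The steps of Exercise 4.95 can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of the text. It computes the μ ± 3σ interval, the exact binomial tail, and the normal approximation with the continuity correction 9 − .5.

```python
from math import comb, sqrt
from statistics import NormalDist


def three_sigma_interval(n, p):
    """Return (low, high) for np ± 3*sqrt(npq)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - 3 * sigma, mu + 3 * sigma


def exact_upper_tail(n, p, k):
    """Exact binomial probability P(x >= k)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def approx_upper_tail(n, p, k):
    """Normal approximation to P(x >= k), evaluated at k - .5 (continuity correction)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return 1 - NormalDist(mu, sigma).cdf(k - 0.5)


low, high = three_sigma_interval(25, 0.4)
exact = exact_upper_tail(25, 0.4, 9)
approx = approx_upper_tail(25, 0.4, 9)

print(f"mu ± 3 sigma: ({low:.4f}, {high:.4f})")
print(f"exact  P(x >= 9) = {exact:.4f}")
print(f"approx P(x >= 9) = {approx:.4f}")
```

Small differences from the table-based answers are expected, because Table II rounds z to two decimal places.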

4.96

μ = np = 1000(.5) = 500, σ = √(npq) = √(1000(.5)(.5)) = 15.811

a.

Using the normal approximation, P(x > 500) ≈ P(z > ((500 + .5) − 500)/15.811) = P(z > .03) = .5 − .0120 = .4880 (from Table II, Appendix D)

b.

P(490 ≤ x < 500) ≈ P(((490 − .5) − 500)/15.811 ≤ z < ((500 − .5) − 500)/15.811) = P(−.66 ≤ z < −.03) = .2454 − .0120 = .2334 (from Table II, Appendix D)

c.

P(x > 550) ≈ P(z > ((550 + .5) − 500)/15.811) = P(z > 3.19) = .5 − .49929 = .00071 (from Table II, Appendix D)

4.97

a.

Using MINITAB with μ = 1.5 and σ = .2, the probability is:

Cumulative Distribution Function
Normal with mean = 1.5 and standard deviation = 0.2

x      P( X ≤ x )
1.6    0.691462
1.3    0.158655

P(1.3 < x < 1.6) = P(x < 1.6) − P(x < 1.3) = .691462 − .158655 = .532807

b.

Using MINITAB with μ = 1.5 and σ = .2, the probability is:

Cumulative Distribution Function
Normal with mean = 1.5 and standard deviation = 0.2

x      P( X ≤ x )
1.4    0.308538

P(x > 1.4) = 1 − P(x < 1.4) = 1 − .308538 = .691462

c.

Using MINITAB with μ = 1.5 and σ = .2, the probability is:

Cumulative Distribution Function
Normal with mean = 1.5 and standard deviation = 0.2

x      P( X ≤ x )
1.5    0.500000

P(x < 1.5) = .5

4.98

a.

Using MINITAB with μ = 4.44 and σ = .82, the probability is:

Cumulative Distribution Function
Normal with mean = 4.44 and standard deviation = 0.82

x    P( X ≤ x )
4    0.295777

P(x > 4) = 1 − P(x < 4) = 1 − .295777 = .704223

b.

Using MINITAB with μ = 4.44 and σ = .82, the probability is:

Cumulative Distribution Function
Normal with mean = 4.44 and standard deviation = 0.82

x    P( X ≤ x )
2    0.0014620

P(2 < x < 4) = P(x < 4) − P(x < 2) = .295777 − .001462 = .294315

c.

Using MINITAB with μ = 4.44 and σ = .82, the probability is:

Cumulative Distribution Function
Normal with mean = 4.44 and standard deviation = 0.82

x    P( X ≤ x )
1    0.0000136

P(x ≤ 1) = .0000136

Since the probability of observing a value of 1 or less is so small, it would be extremely unlikely that the ecolabel shown was Energy Star.

4.99

a.

Using MINITAB with μ = 5.22 and σ = .77, the probability is:

Normal with mean = 5.22 and standard deviation = 0.77

x    P( X ≤ x )
5    0.387548

P(x ≤ 5) = .3875

b.

Using MINITAB with μ = 5.22 and σ = .77, the probability is:

Normal with mean = 5.22 and standard deviation = 0.77

x    P( X ≤ x )
4    0.056550
7    0.989603

P(4 ≤ x ≤ 7) = P(x ≤ 7) − P(x ≤ 4) = .989603 − .056550 = .933053

c.

P(x > a) = .3 means P(x ≤ a) = .7. Using MINITAB with μ = 5.22 and σ = .77, the value of a is found:

Normal with mean = 5.22 and standard deviation = 0.77

P( X ≤ x )    x
0.7           5.62379

Thus, a = 5.62379.

d.

Using MINITAB with μ = 6.95 and σ = .55, the probability is:

Normal with mean = 6.95 and standard deviation = 0.55

x    P( X ≤ x )
5    0.0001960

P(x ≤ 5) = .0001960

Using MINITAB with μ = 6.95 and σ = .55, the probability is:

Normal with mean = 6.95 and standard deviation = 0.55

x    P( X ≤ x )
4    0.000000
7    0.536218

P(4 ≤ x ≤ 7) = P(x ≤ 7) − P(x ≤ 4) = .536218 − .0000 = .536218

P(x > a) = .3 means P(x ≤ a) = .7. Using MINITAB with μ = 6.95 and σ = .55, the value of a is found:

Normal with mean = 6.95 and standard deviation = 0.55

P( X ≤ x )    x
0.7           7.23842

Thus, a = 7.23842.

4.100

Using MINITAB with μ = 67.755 and σ = 26.871, the probabilities are:

Cumulative Distribution Function
Normal with mean = 67.755 and standard deviation = 26.871

x      P( X <= x )
40     0.150826
120    0.974070

a.

P(x < 40) = .150826

b.

P(40 < x < 120) = P(x < 120) − P(x < 40) = .974070 − .150826 = .823244

c.

P(x > 120) = 1 − P(x < 120) = 1 − .974070 = .025930

d.

We want to find a where P(x < a) = .25. Using MINITAB with μ = 67.755 and σ = 26.871, the value of a is found:

Inverse Cumulative Distribution Function
Normal with mean = 67.755 and standard deviation = 26.871

P( X <= x )    x
0.25           49.6308

Thus, a = 49.6308.

4.101

a.

Using MINITAB with μ = 59 and σ = 5, the probability is:

Cumulative Distribution Function
Normal with mean = 59 and standard deviation = 5

x     P( X <= x )
60    0.579260

P(x > 60) = 1 − P(x ≤ 60) = 1 − .57926 = .42074

b.

Using MINITAB with μ = 43 and σ = 5, the probability is:

Cumulative Distribution Function
Normal with mean = 43 and standard deviation = 5

x     P( X <= x )
60    0.999663

P(x > 60) = 1 − P(x ≤ 60) = 1 − .999663 = .000337

4.102

a.

Let x = buy-side analyst’s forecast error. Then x has an approximate normal distribution with μ = .85 and σ = 1.93. Using Table II, Appendix D,

P(x > 2.00) = P(z > (2.00 − .85)/1.93) = P(z > .60) = .5 − .2257 = .2743

b.

Let y = sell-side analyst’s forecast error. Then y has an approximate normal distribution with μ = −.05 and σ = .85. Using Table II, Appendix D,

P(y > 2.00) = P(z > (2.00 − (−.05))/.85) = P(z > 2.41) = .5 − .4920 = .0080

4.103

a.

For n = 700 and p = .01, μ = np = 700(.01) = 7

b.

σ = √(npq) = √(700(.01)(.99)) = √6.93 = 2.6325

c.

z = (x − μ)/σ = (10.5 − 7)/2.6325 = 1.329

d.

P(x ≤ 10) ≈ P(z ≤ ((10 + .5) − 7)/2.6325) = P(z ≤ 1.33) = .5 + .4082 = .9082 using Table II, Appendix D

4.104

a.

For n = 250 and p = .5, E(x) = μ = np = 250(.5) = 125

b.

σ = √(npq) = √(250(.5)(.5)) = √62.5 = 7.9057

c.

z = (x − μ)/σ = (200 − 125)/7.9057 = 9.49

d.

P(x < 200) ≈ P(z ≤ ((200 − .5) − 125)/7.9057) = P(z ≤ 9.42) ≈ .5 + .5 = 1

4.105

a.

Using MINITAB with μ = 160.3 and σ = 19.6, the probability is:

Normal with mean = 160.3 and standard deviation = 19.6

x      P( X ≤ x )
120    0.0198854

P(x ≤ 120) = .0198854

b.

Using MINITAB with μ = 133.5 and σ = 19.6, the probability is:

Normal with mean = 133.5 and standard deviation = 19.6

x      P( X ≤ x )
120    0.245482

P(x ≤ 120) = .245482

c.

P(x > a) = .75 means P(x ≤ a) = .25. Using MINITAB with μ = 160.3 and σ = 19.6, the value of a is found:

Normal with mean = 160.3 and standard deviation = 19.6

P( X ≤ x )    x
0.25          147.080

Thus, a = 147.080.

d.

P(x > a) = .75 means P(x ≤ a) = .25. Using MINITAB with μ = 133.5 and σ = 19.6, the value of a is found:

Normal with mean = 133.5 and standard deviation = 19.6

P( X ≤ x )    x
0.25          120.280

Thus, a = 120.280.
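The MINITAB results in parts a through d of Exercise 4.105 can be reproduced with a short sketch such as the one below. It assumes Python's standard `statistics.NormalDist` in place of MINITAB; the variable names `at_rest` and `after_sessions` are illustrative labels for the two means used in the solution.

```python
from statistics import NormalDist

# Means and standard deviation as used in the solution above.
at_rest = NormalDist(mu=160.3, sigma=19.6)
after_sessions = NormalDist(mu=133.5, sigma=19.6)

# Parts a and b: cumulative probabilities P(x <= 120).
p_rest = at_rest.cdf(120)
p_after = after_sessions.cdf(120)

# Parts c and d: the value a with P(x > a) = .75, i.e. P(x <= a) = .25.
a_rest = at_rest.inv_cdf(0.25)
a_after = after_sessions.inv_cdf(0.25)

print(f"P(x <= 120): rest {p_rest:.7f}, after sessions {p_after:.6f}")
print(f"25th percentile: rest {a_rest:.3f}, after sessions {a_after:.3f}")
```

`cdf` plays the role of MINITAB's Cumulative Distribution Function and `inv_cdf` the role of its Inverse Cumulative Distribution Function.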

e.

If an MVIC value of 120 N or less was observed, it more likely occurred after repeated shooting sessions. The likelihood of a value that low at rest is too small to be expected.

4.106

a.

Using MINITAB with μ = 353 and σ = 30, the probability is:

Cumulative Distribution Function
Normal with mean = 353 and standard deviation = 30

x      P( X ≤ x )
400    0.941404

P(x < 400) = .941404

b.

Using MINITAB with μ = 184 and σ = 25, the probability is:

Cumulative Distribution Function
Normal with mean = 184 and standard deviation = 25

x      P( X ≤ x )
100    0.0003897

P(x > 100) = 1 − P(x < 100) = 1 − .0003897 = .9996103

4.107

a.

Let x = the image position of the first female image. Then x can be described using a normal distribution with μ = 3.7 and σ = .5.

Using MINITAB with μ = 3.7 and σ = .5, the probability is:

Normal with mean = 3.7 and standard deviation = 0.5

x    P( X ≤ x )
2    0.0003369

P(x ≤ 2) = .0003369

b.

Let x = the image position of the first female image. Then x can be described using a normal distribution with μ = 2.0 and σ = .5. Using MINITAB with μ = 2.0 and σ = .5, the probability is:

Normal with mean = 2 and standard deviation = 0.5

x    P( X ≤ x )
2    0.5

P(x ≤ 2) = .50

4.108

We will find the probability of obtaining a FS less than or equal to 1 for each tunnel area.

Tunnel Face: Using MINITAB with μ = 1.2 and σ = .16, the probability is:

Cumulative Distribution Function
Normal with mean = 1.2 and standard deviation = .16

x    P( X ≤ x )
1    0.105650

P(x ≤ 1) = .105650

Tunnel Walls: Using MINITAB with μ = 1.4 and σ = .20, the probability is:

Cumulative Distribution Function
Normal with mean = 1.4 and standard deviation = .20

x    P( X ≤ x )
1    0.0227501

P(x ≤ 1) = .0227501

Tunnel Crown: Using MINITAB with μ = 2.1 and σ = .70, the probability is:

Cumulative Distribution Function
Normal with mean = 2.1 and standard deviation = .70

x    P( X ≤ x )
1    0.0580416

P(x ≤ 1) = .0580416

Thus, the tunnel area that is most likely to result in failure is Tunnel Face with a probability of .105650.
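The comparison across the three tunnel areas can be sketched in a few lines. This is an illustrative check using Python's standard library rather than MINITAB; the dictionary layout and names are assumptions made for the example.

```python
from statistics import NormalDist

# (mean, standard deviation) of the factor of safety for each tunnel area,
# as given in the solution above.
areas = {
    "Tunnel Face": (1.2, 0.16),
    "Tunnel Walls": (1.4, 0.20),
    "Tunnel Crown": (2.1, 0.70),
}

# Failure corresponds to FS <= 1, so evaluate each cumulative probability at 1.
failure_prob = {name: NormalDist(mu, sigma).cdf(1.0) for name, (mu, sigma) in areas.items()}
most_likely = max(failure_prob, key=failure_prob.get)

for name, prob in failure_prob.items():
    print(f"{name}: P(FS <= 1) = {prob:.7f}")
print("Most likely to fail:", most_likely)
```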

4.109

a.

μ = np = 200(.21) = 42, σ = √(npq) = √(200(.21)(.79)) = √33.18 = 5.7602

P(x > 70) ≈ P(z > ((70 + .5) − 42)/5.7602) = P(z > 4.95) ≈ 0

If the true probability that a hotel guest experienced a better-than-expected quality of sleep and would return to the hotel is only .21, it would be extremely unlikely that more than 70 out of 200 guests would have had this experience.

b.

In order for the claim to be true, the probability that a hotel guest experienced a better-than-expected quality of sleep and would return to the hotel would have to be much larger than .21.

4.110

Let x = wage rate. The random variable x is normally distributed with μ = 22.50 and σ = 1.25. Using Table II, Appendix D,

a.

P(x > 23.80) = P(z > (23.80 − 22.50)/1.25) = P(z > 1.04) = .5 − P(0 < z < 1.04) = .5 − .3508 = .1492

b.

P(x > 23.80) = P(z > (23.80 − 22.50)/1.25) = P(z > 1.04) = .5 − P(0 < z < 1.04) = .5 − .3508 = .1492

c.

P(x ≤ η) = P(x ≥ η) = .5. Thus, μ = η = 22.50.

(Recall from Section 2.4 that in a symmetric distribution, the mean equals the median.)

4.111

a.

Using Table II, Appendix D, and μ = 75 and σ = 7.5, P(x > 80) = P(z > (80 − 75)/7.5) = P(z > .67) = .5 − .2486 = .2514

Thus, 25.14% of the scores exceeded 80.

b.

P(x ≤ x0) = .98. Find x0.

P(x ≤ x0) = P(z ≤ (x0 − 75)/7.5) = P(z ≤ z0) = .98

A1 = .98 − .5 = .4800. Looking up area .4800 in Table II, z0 = 2.05.

z0 = (x0 − 75)/7.5 ⇒ 2.05 = (x0 − 75)/7.5 ⇒ x0 = 90.375

4.112

Let x = number of additional Electoral College votes a candidate will win if he/she wins California’s 55 votes. Then x has a normal distribution with μ = 241.5 and σ = 49.8. In order to be elected, the candidate will have to win an additional 270 − 55 = 215 votes, so x has to be greater than or equal to 215. Using MINITAB with μ = 241.5 and σ = 49.8, the probability is:

Cumulative Distribution Function
Normal with mean = 241.5 and standard deviation = 49.8

x      P( X <= x )
215    0.297318

P(x ≥ 215) = 1 − P(x < 215) = 1 − .297318 = .702682

The probability the candidate becomes the next president if he/she wins California is about .70.
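The calculation in Exercise 4.112 can be sketched as follows. This is a minimal check under the same normal model, with Python's standard library standing in for MINITAB; the variable names are illustrative.

```python
from statistics import NormalDist

votes_needed = 270 - 55            # additional votes required beyond California
additional = NormalDist(mu=241.5, sigma=49.8)

# P(x >= 215) = 1 - P(x < 215) for a continuous normal model.
p_elected = 1 - additional.cdf(votes_needed)
print(f"P(x >= {votes_needed}) = {p_elected:.6f}")
```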

4.113

a.

Let v = number of credit card users out of 100 who carry Visa. Then v is a binomial random variable with n = 100 and p = .50. E(v) = np = 100(.50) = 50.

b.

Let d = number of credit/debit card users out of 100 who carry Discover. Then d is a binomial random variable with n = 100 and p = .005. E(d) = np = 100(.005) = .5.

c.

To see if the normal approximation is valid, we use: μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.50) ± 3√(100(.50)(.50)) ⇒ 50 ± 3(5) ⇒ 50 ± 15 ⇒ (35, 65). Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.

P(v ≥ 50) ≈ P(z ≥ ((50 − .5) − 50)/5) = P(z ≥ −.10) = .5 + .0398 = .5398

Let m = number of credit/debit card users out of 100 who carry Mastercard. Then m is a binomial random variable with n = 100 and p = .26. To see if the normal approximation is valid, we use: μ ± 3σ ⇒ np ± 3√(npq) ⇒ 100(.26) ± 3√(100(.26)(.74)) ⇒ 26 ± 3(4.3863) ⇒ 26 ± 13.159 ⇒ (12.841, 39.159). Since the interval lies in the range 0 to 100, we can use the normal approximation to approximate the probability.

P(m ≥ 50) ≈ P(z ≥ ((50 − .5) − 26)/4.3863) = P(z ≥ 5.36) ≈ .5 − .5 = 0

d.

In order for the normal approximation to be valid, μ ± 3σ must lie in the interval (0, n). This check was done in part c for both portions of the question. The normal approximation was justified for both parts.

4.114

a.

Let x = quantity injected per container. The random variable x has a normal distribution with μ = 10 and σ = .2.

P(x < 10) = P(z < (10 − 10)/.2) = P(z < 0.0) = .5

P(x ≥ 10) = P(z ≥ (10 − 10)/.2) = P(z ≥ 0.0) = .5

b.

Since the container needed to be reprocessed, it cost $10. Upon refilling, it contained 10.60 units with a cost of 10.60($20) = $212. Thus, the total cost for filling this container is $10 + $212 = $222. Since the container sells for $230, the profit is $230 − $222 = $8.

c.

Let x = quantity injected per container. The random variable x has a normal distribution with 𝜇 = 10.60 and 𝜎 = .2. The expected value of x is 𝐸(𝑥) = 𝜇 = 10.60. The cost of a container with 10.60 units is 10.60($20) = $212. Thus, the expected profit would be the selling price minus the cost or $230 − $212 = $18.
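The cost arithmetic in parts b and c of Exercise 4.114 can be written out as a small sketch. The dollar figures come from the solution above; the function name and the final underfill check are illustrative additions rather than part of the solution.

```python
from statistics import NormalDist

UNIT_COST = 20.0        # dollars per unit injected
REPROCESS_COST = 10.0   # dollars to reprocess an underfilled container
PRICE = 230.0           # selling price per container


def profit(units, reprocessed):
    """Profit for one container holding `units`, with an optional reprocessing charge."""
    cost = units * UNIT_COST + (REPROCESS_COST if reprocessed else 0.0)
    return PRICE - cost


profit_b = profit(10.60, reprocessed=True)    # part b: refilled to 10.60 units
profit_c = profit(10.60, reprocessed=False)   # part c: mean fill of 10.60 units

# Added check (not in the solution): chance that a fill falls below 10 units
# when the process mean is 10.60 and sigma is .2.
p_underfill = NormalDist(10.60, 0.2).cdf(10)

print(f"part b profit: ${profit_b:.2f}")
print(f"part c expected profit: ${profit_c:.2f}")
print(f"P(x < 10) at mean 10.60: {p_underfill:.5f}")
```

The small underfill probability suggests why part c can ignore the reprocessing charge without changing the answer materially.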

4.115

We have to find the probability of observing x = .7 or anything more unusual given the two different values of μ.

Without receiving executive coaching: Using Table II, Appendix D with μ = .75 and σ = .085,

P(x ≤ .7) = P(z ≤ (.7 − .75)/.085) = P(z ≤ −.59) = .5 − .2224 = .2776.

After receiving executive coaching: Using Table II, Appendix D with μ = .52 and σ = .075,

P(x ≥ .7) = P(z ≥ (.7 − .52)/.075) = P(z ≥ 2.40) = .5 − .4918 = .0082.

Since the probability of observing x ≤ .7 for those not receiving executive coaching is much larger than the probability of x ≥ .7 for those receiving executive coaching, it is more likely that the leader did not receive executive coaching.

4.116

a.

If z is a standard normal random variable, QL = zL is the value of the standard normal distribution which has 25% of the data to the left and 75% to the right.

Find zL such that P(z < zL) = .25. A1 = .50 − .25 = .25. Look up the area A1 = .25 in the body of Table II of Appendix D; zL = −.67 (taking the closest value). If interpolation is used, −.675 would be obtained.

QU = zU is the value of the standard normal distribution which has 75% of the data to the left and 25% to the right.

Find zU such that P(z < zU) = .75. A1 + A2 = P(z ≤ 0) + P(0 ≤ z ≤ zU) = .5 + P(0 ≤ z ≤ zU) = .75. Therefore, P(0 ≤ z ≤ zU) = .25. Look up the area .25 in the body of Table II of Appendix D; zU = .67 (taking the closest value).

b.

Recall that the inner fences of a box plot are located 1.5(QU − QL) outside the hinges (QL and QU).

To find the lower inner fence, QL − 1.5(QU − QL) = −.67 − 1.5(.67 − (−.67)) = −.67 − 1.5(1.34) = −2.68 (−2.70 if zL = −.675 and zU = +.675)

The upper inner fence is: QU + 1.5(QU − QL) = .67 + 1.5(.67 − (−.67)) = .67 + 1.5(1.34) = 2.68 (+2.70 if zL = −.675 and zU = +.675)

c.

Recall that the outer fences of a box plot are located 3(QU − QL) outside the hinges (QL and QU).

To find the lower outer fence, QL − 3(QU − QL) = −.67 − 3(.67 − (−.67)) = −.67 − 3(1.34) = −4.69 (−4.725 if zL = −.675 and zU = +.675)

The upper outer fence is: QU + 3(QU − QL) = .67 + 3(.67 − (−.67)) = .67 + 3(1.34) = 4.69 (4.725 if zL = −.675 and zU = +.675)

d.

P(z < −2.68) + P(z > 2.68) = 2P(z > 2.68) = 2(.5000 − .4963) = 2(.0037) = .0074 (Table II, Appendix D) (or 2(.5000 − .4965) = .0070 if −2.70 and 2.70 are used)

P(z < −4.69) + P(z > 4.69) = 2P(z > 4.69) ≈ 2(.5000 − .5000) ≈ 0

e.

In a normal probability distribution, the probability of an observation being beyond the inner fences is only .0074 and the probability of an observation being beyond the outer fences is approximately zero. Since these probabilities are so small, observations beyond the inner or outer fences would be very unlikely if the data were normal. Therefore, any such observations are probably outliers.

4.117

a.

The proportion of measurements that one would expect to fall in the interval μ ± σ is about .68.

b.

The proportion of measurements that one would expect to fall in the interval μ ± 2σ is about .95.

c.

The proportion of measurements that one would expect to fall in the interval μ ± 3σ is about 1.00.

4.118

a.

IQR = QU − QL = 195 − 72 = 123

b.

IQR/s = 123/95 = 1.295

c.

Yes. Since IQR/s is approximately 1.3, this implies that the data are approximately normal.

4.119

If the data are normally distributed, then the normal probability plot should be an approximate straight line. Of the three plots, only plot c implies that the data are normally distributed. The data points in plot c form an approximately straight line. In both plots a and b, the plots of the data points do not form a straight line.
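The benchmark values used in Exercises 4.116 through 4.118 (the proportions .68, .95, and 1.00, the ratio IQR/s ≈ 1.3, and the inner-fence probability) all follow from the standard normal distribution. The sketch below, which assumes only Python's standard library, shows where they come from; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of a normal distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# Quartiles of the standard normal; their difference is IQR/sigma.
q_low, q_high = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr_over_sigma = q_high - q_low

# Probability of falling beyond the inner fences, using unrounded quartiles.
upper_inner_fence = q_high + 1.5 * iqr_over_sigma
beyond_inner = 2 * (1 - z.cdf(upper_inner_fence))

# Ratio computed in Exercise 4.118 for comparison.
ratio_4_118 = 123 / 95

for k, p in within.items():
    print(f"within {k} sd: {p:.4f}")
print(f"IQR/sigma for a normal distribution: {iqr_over_sigma:.3f}")
print(f"P(beyond inner fences): {beyond_inner:.4f}")
print(f"IQR/s in Exercise 4.118: {ratio_4_118:.3f}")
```

The unrounded quartiles are about ±.6745, which is why the interpolated table value −.675 gives fences of ±2.70.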

4.120

a.

Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: x

Stem-and-leaf of x  N = 28
Leaf Unit = 0.10

 5   1  11266
 6   2  1
 8   3  35
11   4  035
14   5  039
14   6  3457
10   7  346
 7   8  24469
 2   9  47

Since the data are not mound-shaped, this indicates that the data may not be normally distributed.

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: x

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
x         28  5.511  2.765  1.100    3.350  6.100   8.050  9.700

The standard deviation is 2.765.

c.

Using the printout from MINITAB in part b, QL = 3.35 and QU = 8.05. The IQR = QU − QL = 8.05 − 3.35 = 4.7. If the data are normally distributed, then IQR/s ≈ 1.3. For this data, IQR/s ≈ 4.7/2.765 = 1.70. This is a fair amount larger than 1.3, which indicates that the data may not be normally distributed.

d.

Using MINITAB, the normal probability plot is:

[Probability Plot of x, Normal - 95% CI; x-axis x, y-axis Percent. Mean 5.511, StDev 2.765, N 28, AD 0.533, P-Value 0.158]

The data do not form a particularly straight line. This indicates that the data are not normally distributed.

4.121

a.

For data that are normally distributed, the ratio IQR/s should be approximately 1.3. For contestants who played for a job, the ratio is IQR/s = 7/4.324 = 1.62. This number is close to 1.3. Therefore, the data are approximately normally distributed.

b.

For contestants who played for a business partnership, the ratio is IQR/s = 7/4.809 = 1.456. This number is close to 1.3. Therefore, the data are approximately normally distributed.

4.122

The histogram of the data is very close to a normal distribution. The engineers should use the normal distribution to model the behavior of shear strength for rock fractures.

4.123

a.

If the data are normal, then approximately 68% of the observations should fall within 1 standard deviation of the mean. For this data, the interval is x̄ ± s ⇒ 89.2906 ± 3.1834 ⇒ (86.1072, 92.4740). There are 34 out of the 50 observations in this interval, which is 34/50 = .68 or 68%. This is exactly the 68%.

If the data are normal, then approximately 95% of the observations should fall within 2 standard deviations of the mean. For this data, the interval is x̄ ± 2s ⇒ 89.2906 ± 2(3.1834) ⇒ 89.2906 ± 6.3668 ⇒ (82.9238, 95.6574). There are 48 out of the 50 observations in this interval, which is 48/50 = .96 or 96%. This is very close to the 95%.

If the data are normal, then approximately 100% of the observations should fall within 3 standard deviations of the mean. For this data, the interval is x̄ ± 3s ⇒ 89.2906 ± 3(3.1834) ⇒ 89.2906 ± 9.5502 ⇒ (79.7404, 98.8408). There are 50 out of the 50 observations in this interval, which is 50/50 = 1.00 or 100%. This is exactly the 100%.

Since these percents are very close to percentages for the normal distribution, this indicates that the data are approximately normal.

The IQR = QU − QL = 91.88 − 87.2725 = 4.6075 and the standard deviation is s = 3.1834. If the data are normal, then IQR/s ≈ 1.3. For this data, IQR/s = 4.6075/3.1834 = 1.447. This is fairly close to 1.3. This indicates that the data are approximately normal.

b.

The data on the plot are fairly close to a straight line. This indicates that the data are approximately normal.

4.124

Based on the normal probability plot, it appears that the data are not approximately normal. If the data are normal, then the probability plot should reflect a straight line. In this graph, the plot of the data is not a straight line.
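To make the "straight line" criterion concrete, the sketch below builds the coordinates of a normal probability plot by hand and measures how linear they are. It is a minimal illustration, not MINITAB's procedure: the plotting position (i − .375)/(n + .25) is an assumed convention, the two samples are hypothetical and are not data from the exercises, and the correlation is only an informal summary of straightness.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected z-scores for ranks 1..n, using the assumed position (i - .375)/(n + .25)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def plot_correlation(data):
    """Correlation between the sorted data and their normal scores (near 1 = near-straight plot)."""
    xs = sorted(data)
    zs = normal_scores(len(xs))
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (s - mz) for x, s in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((s - mz) ** 2 for s in zs)
    return sxz / (sxx * szz) ** 0.5


# Hypothetical samples for illustration only.
roughly_normal = [4.1, 4.8, 5.0, 5.3, 5.5, 5.6, 5.9, 6.2, 6.4, 7.1]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15, 40]

r_normal = plot_correlation(roughly_normal)
r_skewed = plot_correlation(right_skewed)
print(f"roughly normal sample: r = {r_normal:.3f}")
print(f"right-skewed sample:   r = {r_skewed:.3f}")
```

A plot like the one described in Exercise 4.124, which bends away from a line, would correspond to a noticeably lower correlation, as with the skewed sample here.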

4.125

a.

Using MINITAB, a histogram of the data is:

[Histogram of Support with fitted normal curve; x-axis Support, y-axis Frequency. Mean 67.76, StDev 26.87, N 992]

The data are fairly mound-shaped. This indicates that the data are probably from a normal distribution.


b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Support

Variable  N    Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Support   992  67.755  26.871  0.000    49.000  68.000  86.000  155.000

If the data are normal, then approximately 68% of the observations should fall within 1 standard deviation of the mean. For this data, the interval is x̄ ± s ⇒ 67.755 ± 26.871 ⇒ (40.884, 94.626). There are 665 out of the 992 observations in this interval, which is 665/992 = .670 or 67%. This is very close to the 68%.

If the data are normal, then approximately 95% of the observations should fall within 2 standard deviations of the mean. For this data, the interval is x̄ ± 2s ⇒ 67.755 ± 2(26.871) ⇒ 67.755 ± 53.742 ⇒ (14.013, 121.497). There are 946 out of the 992 observations in this interval, which is 946/992 = .954 or 95.4%. This is very close to the 95%.

If the data are normal, then approximately 100% of the observations should fall within 3 standard deviations of the mean. For this data, the interval is x̄ ± 3s ⇒ 67.755 ± 3(26.871) ⇒ 67.755 ± 80.613 ⇒ (−12.858, 148.368). There are 991 out of the 992 observations in this interval, which is 991/992 = .999 or 99.9%. This is very close to the 100%.

Since these percents are very close to percentages for the normal distribution, this indicates that the data are probably from a normal distribution.

c.

The IQR = QU − QL = 86 − 49 = 37 and the standard deviation is s = 26.871. If the data are normal, then IQR/s ≈ 1.3. For this data, IQR/s = 37/26.871 = 1.377. This is very close to 1.3. This indicates that the data probably come from a normal distribution.

d.

Using MINITAB, the normal probability plot is:

[Probability Plot of Support, Normal - 95% CI; x-axis Support, y-axis Percent. Mean 67.76, StDev 26.87, N 992, AD 0.496, P-Value 0.214]

Except for the several 0’s on the left of the plot, the data are very close to a straight line. This again indicates that the data probably come from a normal distribution.
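The interval and ratio checks in parts b and c of Exercise 4.125 can be tabulated with a short sketch. It uses the summary statistics and counts reported in the solution (the raw data are not reproduced here) and Python's standard library; the names are illustrative.

```python
from statistics import NormalDist

n, x_bar, s = 992, 67.755, 26.871            # summary statistics from part b
counts_within = {1: 665, 2: 946, 3: 991}     # observations inside x_bar ± k*s, from part b

z = NormalDist()
report = {}
for k, count in counts_within.items():
    interval = (x_bar - k * s, x_bar + k * s)
    observed = count / n
    expected = z.cdf(k) - z.cdf(-k)          # normal proportion within k standard deviations
    report[k] = (interval, observed, expected)
    print(f"k={k}: ({interval[0]:.3f}, {interval[1]:.3f})  "
          f"observed {observed:.3f}  expected {expected:.3f}")

# Part c: ratio of the interquartile range to the standard deviation.
iqr_over_s = (86 - 49) / s
print(f"IQR/s = {iqr_over_s:.3f}")
```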

4.126

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the failure times of the 50 used panels is:

[Histogram of Fail with fitted normal curve; x-axis Fail, y-axis Frequency. Mean 1.935, StDev 0.9287, N 50]

From the histogram, the data appear to have a somewhat normal distribution.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: Fail

Variable  N   Mean   StDev  Q1     Median  Q3
Fail      50  1.935  0.929  1.218  1.835   2.645

x̄ ± s ⇒ 1.935 ± .929 ⇒ (1.006, 2.864). 33 of the 50 values fall in this interval. The proportion is 33/50 = .66. This is fairly close to the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 1.935 ± 2(.929) ⇒ 1.935 ± 1.858 ⇒ (0.077, 3.793). 49 of the 50 values fall in this interval. The proportion is 49/50 = .98. This is a fair amount above the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 1.935 ± 3(.929) ⇒ 1.935 ± 2.787 ⇒ (−0.852, 4.722). 50 of the 50 values fall in this interval. The proportion is 50/50 = 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the data may be normal.

Next, we look at the ratio of the IQR to s. IQR = QU − QL = 2.645 − 1.218 = 1.427.

IQR/s = 1.427/.929 = 1.54. This is somewhat larger than the 1.3 we would expect if the data were normal. This method indicates the data may be normal.

Finally, using MINITAB, the normal probability plot is:

[Probability Plot of Fail, Normal - 95% CI; x-axis Fail, y-axis Percent. Mean 1.935, StDev 0.9287, N 50, AD 0.305, P-Value 0.557]

Since the data form a fairly straight line, the data may be normal.

From the 4 different methods, all indications are that the failure times are approximately normal.

4.127

Using MINITAB, a histogram of the data is:

The data are not particularly mound-shaped. This indicates that the data may not be from a normal distribution.

Using MINITAB, the descriptive statistics are:

Statistics

Variable            Total Count  Mean   StDev  Minimum  Q1     Median  Q3     Maximum  IQR
Academic Rep Score  50           75.92  13.38  50.00    64.75  75.00   88.25  100.00   23.50

If the data are normal, then approximately 68% of the observations should fall within 1 standard deviation of the mean. For this data, the interval is x̄ ± s ⇒ 75.92 ± 13.38 ⇒ (62.54, 89.30). There are 30 out of the 50 observations in this interval, which is 30/50 = .6 or 60%. This is somewhat smaller than the 68% we would expect.

If the data are normal, then approximately 95% of the observations should fall within 2 standard deviations of the mean. For this data, the interval is x̄ ± 2s ⇒ 75.92 ± 2(13.38) ⇒ 75.92 ± 26.76 ⇒ (49.16, 102.68). All 50 of the observations are in this interval, which is 100%. This is somewhat larger than the 95% we would expect.

If the data are normal, then approximately 100% of the observations should fall within 3 standard deviations of the mean. For this data, the interval is x̄ ± 3s ⇒ 75.92 ± 3(13.38) ⇒ 75.92 ± 40.14 ⇒ (35.78, 116.06). All 50 of the observations are in this interval, which is 100%. This is equal to the 100% we would expect.

Since these percents are not very close to percentages for the normal distribution, this indicates that the data may not be from a normal distribution.

The IQR = 23.50 and the standard deviation is s = 13.38. If the data are normal, then IQR/s ≈ 1.3. For this data, IQR/s = 23.50/13.38 = 1.76. This is somewhat larger than what we would expect if the data were normally distributed.

Using MINITAB, the normal probability plot is:

The data do not form a straight line. This again indicates that the data probably do not come from a normal distribution.

4.128

We will look at the 4 methods for determining if the data are normal. First, we will look at a histogram of the data. Using MINITAB, the histogram of the sanitation scores is:

From the histogram, the data appear to be skewed to the left. This indicates that the data are not normal.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Statistics

Variable  Total Count  Mean    StDev  Minimum  Q1      Median  Q3      Maximum  IQR
Score     211          94.882  3.973  79.000   92.000  96.000  98.000  100.000  6.000

x̄ ± s ⇒ 94.882 ± 3.973 ⇒ (90.909, 98.855). 154 of the 211 values fall in this interval. The proportion is 154/211 = .730. This is much larger than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 94.882 ± 2(3.973) ⇒ 94.882 ± 7.946 ⇒ (86.936, 102.828). 204 of the 211 values fall in this interval. The proportion is 204/211 = .967. This is slightly larger than the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 94.882 ± 3(3.973) ⇒ 94.882 ± 11.919 ⇒ (82.963, 106.801). 209 of the 211 values fall in this interval. The proportion is 209/211 = .991. This is somewhat smaller than the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. IQR/s = 6/3.973 = 1.510. This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the data could be normal.

Finally, using MINITAB, the normal probability plot is:

Since the data do not form a straight line, the data are not normal. From 3 of the 4 different methods, the indications are that the sanitation scores data are not normal.

4.129

We will look at the 4 methods for determining if the 3 variables are normal.

Distance: First, we will look at a histogram of the data. Using MINITAB, the histogram of the distance data is:

From the histogram, the distance data do appear to have a normal distribution.

Next, we look at the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Statistics

Variable  Total Count  Mean    StDev  Minimum  Q1      Median  Q3      Maximum  IQR
DISTANCE  25           305.73  10.63  282.00   299.85  305.60  314.30  327.10   14.45
ACCURACY  25           61.36   7.20   48.79    56.70   60.32   65.94   80.95    9.25
INDEX     25           3.171   1.984  0.565    1.722   2.917   3.929   9.434    2.207

For Distance:

x̄ ± s ⇒ 305.73 ± 10.63 ⇒ (295.10, 316.36). 16 of the 25 values fall in this interval. The proportion is 16/25 = .64. This is fairly close to the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 305.73 ± 2(10.63) ⇒ 305.73 ± 21.26 ⇒ (284.47, 326.99). 23 of the 25 values fall in this interval. The proportion is 23/25 = .92. This is close to the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 305.73 ± 3(10.63) ⇒ 305.73 ± 31.89 ⇒ (273.84, 337.62). 25 of the 25 values fall in this interval. The proportion is 25/25 = 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the distance data may be normal.

Next, we look at the ratio of the IQR to s.

IQR/s = 14.45/10.63 = 1.36. This is very close to the 1.3 we would expect if the data were normal. This method indicates the distance data may be normal.

Finally, using MINITAB, the normal probability plot is:

Since the data do form a fairly straight line, the distance data may be normal. From the 4 different methods, all indications are that the distance data are normal.

Accuracy: First, we will look at a histogram of the data. Using MINITAB, the histogram of the accuracy data is:

From the histogram, the accuracy data do appear to have a fairly normal distribution. From the Descriptive Statistics above:

x̄ ± s ⇒ 61.36 ± 7.20 ⇒ (54.16, 68.56). 19 of the 25 values fall in this interval. The proportion is 19/25 = .76. This is greater than the .68 we would expect if the data were normal.

x̄ ± 2s ⇒ 61.36 ± 2(7.20) ⇒ 61.36 ± 14.40 ⇒ (46.96, 75.76). 24 of the 25 values fall in this interval. The proportion is 24/25 = .96. This is very close to the .95 we would expect if the data were normal.

x̄ ± 3s ⇒ 61.36 ± 3(7.20) ⇒ 61.36 ± 21.60 ⇒ (39.76, 82.96). 25 of the 25 values fall in this interval. The proportion is 25/25 = 1.00. This is equal to the 1.00 we would expect if the data were normal.

From this method, it appears that the accuracy data may be normal.

Next, we look at the ratio of the IQR to s.

IQR/s = 9.25/7.20 = 1.28. This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the accuracy data may be normal.

Finally, using MINITAB, the normal probability plot is:

Since the data do form a fairly straight line, the accuracy data may be normal. From the 4 different methods, all indications are that the accuracy data might be normal. Index: First, we will look at a histogram of the data. Using MINITAB, the histogram of the index data is:

From the histogram, the index data do not appear to have a normal distribution. From the Descriptive Statistics above:

𝑥̄ ± 𝑠 ⇒ 3.171 ± 1.984 ⇒ (1.187, 5.155). 19 of the 25 values fall in this interval. The proportion is 19/25 = .76. This is greater than the .68 we would expect if the data were normal. 𝑥̄ ± 2𝑠 ⇒ 3.171 ± 2(1.984) ⇒ 3.171 ± 3.968 ⇒ (−0.797, 7.139). 24 of the 25 values fall in this interval. The proportion is 24/25 = .96. This is very close to the .95 we would expect if the data were normal.

𝑥̄ ± 3𝑠 ⇒ 3.171 ± 3(1.984) ⇒ 3.171 ± 5.952 ⇒ (−2.781, 9.123). 25 of the 25 values fall in this interval. The proportion is 25/25 = 1.00. This is equal to the 1.00 we would expect if the data were normal. From this method, it appears that the index data may be normal. Next, we look at the ratio of the IQR to s.

IQR/𝑠 = 1.11. This is not very close to the 1.3 we would expect if the data were normal. This method indicates the index data may not be normal. Finally, using MINITAB, the normal probability plot is:

Since the data do not form a fairly straight line, the index data may not be normal. From 3 of the 4 different methods, the indications are that the index data are not normal.

4.130

Using MINITAB, the histograms of the data are:

[Figure: Histograms of PermA, PermB, and PermC with fitted normal curves; vertical axis: Frequency. PermA: Mean 73.62, StDev 14.48, N 100. PermB: Mean 128.5, StDev 21.97, N 100. PermC: Mean 83.07, StDev 20.05, N 100.]


None of the three histograms appear to be mound-shaped. The histograms for Groups A and C appear to be skewed to the right, while the histogram for Group B appears to be skewed to the left. Thus, it does not appear that any of the 3 distributions are normally distributed.

Next, we look at the intervals 𝑥̄ ± 𝑠, 𝑥̄ ± 2𝑠, 𝑥̄ ± 3𝑠. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal. Using MINITAB, the summary statistics are:

Descriptive Statistics: PermA, PermB, PermC

Variable    N    Mean   StDev  Minimum      Q1  Median      Q3  Maximum    IQR
PermA     100   73.62   14.48    55.20   62.00   70.45   81.42   122.40  19.42
PermB     100  128.54   21.97    50.40  108.65  139.30  147.02   150.00  38.37
PermC     100   83.07   20.05    52.20   67.72   78.65   95.35   129.00  27.63

For Group A:
𝑥̄ ± 𝑠 ⇒ 73.62 ± 14.48 ⇒ (59.14, 88.10). 76 of the 100 values fall in this interval. The proportion is 76/100 = .76. This is much larger than the .68 we would expect if the data were normal.
𝑥̄ ± 2𝑠 ⇒ 73.62 ± 2(14.48) ⇒ 73.62 ± 28.96 ⇒ (44.66, 102.58). 96 of the 100 values fall in this interval. The proportion is 96/100 = .96. This is slightly larger than the .95 we would expect if the data were normal.
𝑥̄ ± 3𝑠 ⇒ 73.62 ± 3(14.48) ⇒ 73.62 ± 43.44 ⇒ (30.18, 117.06). 97 of the 100 values fall in this interval. The proportion is 97/100 = .97. This is much smaller than the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.

For Group B:
𝑥̄ ± 𝑠 ⇒ 128.54 ± 21.97 ⇒ (106.57, 150.51). 81 of the 100 values fall in this interval. The proportion is 81/100 = .81. This is much larger than the .68 we would expect if the data were normal.
𝑥̄ ± 2𝑠 ⇒ 128.54 ± 2(21.97) ⇒ 128.54 ± 43.94 ⇒ (84.60, 172.48). 98 of the 100 values fall in this interval. The proportion is 98/100 = .98. This is larger than the .95 we would expect if the data were normal.
𝑥̄ ± 3𝑠 ⇒ 128.54 ± 3(21.97) ⇒ 128.54 ± 65.91 ⇒ (62.63, 194.45). 98 of the 100 values fall in this interval. The proportion is 98/100 = .98. This is somewhat smaller than the 1.00 we would expect if the data were normal.
From this method, it appears that the data are not normal.

For Group C:
𝑥̄ ± 𝑠 ⇒ 83.07 ± 20.05 ⇒ (63.02, 103.12). 66 of the 100 values fall in this interval. The proportion is 66/100 = .66. This is about the same as the .68 we would expect if the data were normal.
𝑥̄ ± 2𝑠 ⇒ 83.07 ± 2(20.05) ⇒ 83.07 ± 40.10 ⇒ (42.97, 123.17). 96 of the 100 values fall in this interval. The proportion is 96/100 = .96. This is slightly larger than the .95 we would expect if the data were normal.
𝑥̄ ± 3𝑠 ⇒ 83.07 ± 3(20.05) ⇒ 83.07 ± 60.15 ⇒ (22.92, 143.22). 100 of the 100 values fall in this interval. The proportion is 100/100 = 1.00. This agrees with the 1.00 we would expect if the data were normal.
From this method, it appears that the data are approximately normal.
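The interval check used for each group can be sketched in Python. This is only an illustration: the function name `interval_proportions` and the sample values are hypothetical (the PermA, PermB, and PermC observations are not reproduced here), and the sketch assumes the sample standard deviation with an n − 1 divisor.

```python
import statistics

def interval_proportions(data):
    """Proportions of observations within 1, 2, and 3 standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    n = len(data)
    return [
        sum(mean - k * s <= x <= mean + k * s for x in data) / n
        for k in (1, 2, 3)
    ]

# Hypothetical sample for illustration only (not data from the exercise).
sample = [52, 55, 58, 60, 61, 63, 64, 66, 67, 69, 70, 72, 74, 77, 95]
p1, p2, p3 = interval_proportions(sample)
# Compare the three proportions with .68, .95, and 1.00.
print(p1, p2, p3)
```

Proportions far from .68, .95, and 1.00 suggest the data are not approximately normal; how far is "far" remains a judgment call, as in the solutions above.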

Next, we look at the ratio of the IQR to s.

For Group A, IQR/𝑠 = 19.42/14.48 = 1.341. This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the data could be normal.

For Group B, IQR/𝑠 = 38.37/21.97 = 1.746. This is larger than the 1.3 we would expect if the data were normal. This method indicates the data may not be normal.

For Group C, IQR/𝑠 = 27.63/20.05 = 1.378. This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the data could be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: Normal probability plots (Normal - 95% CI) of PermA, PermB, and PermC; vertical axis: Percent. PermA: Mean 73.62, StDev 14.48, N 100, AD 2.564, P-Value <0.005. PermB: Mean 128.5, StDev 21.97, N 100, AD 5.887, P-Value <0.005. PermC: Mean 83.07, StDev 20.05, N 100, AD 2.205, P-Value <0.005.]

The data do not form a straight line for any of the 3 groups. This indicates that the data are probably not normal. Thus, based on the histograms and normal probability plots, it appears that the data for all three groups are not normally distributed.

4.131

If the data are normally distributed, the distribution will be symmetric and the mean and median will be close in value. For this data set, the mean is much greater than the median, which indicates that the distribution is skewed to the right. Thus, it is very unlikely that the data are normally distributed.

4.132

a. 𝑓(𝑥) = 1/(𝑑 − 𝑐) (𝑐 ≤ 𝑥 ≤ 𝑑)
   𝑓(𝑥) = 1/(7 − 3) = 1/4 (3 ≤ 𝑥 ≤ 7), 0 otherwise

b. 𝜇 = (𝑐 + 𝑑)/2 = (3 + 7)/2 = 10/2 = 5
   𝜎 = (𝑑 − 𝑐)/√12 = (7 − 3)/√12 = 4/√12 = 1.155

c. 𝜇 ± 𝜎 ⇒ 5 ± 1.155 ⇒ (3.845, 6.155)
   𝑃(𝜇 − 𝜎 ≤ 𝑥 ≤ 𝜇 + 𝜎) = 𝑃(3.845 ≤ 𝑥 ≤ 6.155) = (6.155 − 3.845)(1/4) = 2.31/4 = .5775

4.133

a. 𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(45 − 20) = 1/25 = .04 (𝑐 ≤ 𝑥 ≤ 𝑑)
   So, 𝑓(𝑥) = .04 (20 ≤ 𝑥 ≤ 45), 0 otherwise

b. 𝜇 = (𝑐 + 𝑑)/2 = (20 + 45)/2 = 65/2 = 32.5
   𝜎 = (𝑑 − 𝑐)/√12 = (45 − 20)/√12 = 7.22

c. Using MINITAB, the graph is:

   [Figure: graph of f(x), constant at 1/25 for 20 ≤ x ≤ 45, with 𝜇 − 2𝜎 = 18.06, 𝜇 = 32.5, and 𝜇 + 2𝜎 = 46.94 marked on the x-axis.]

   𝜇 ± 2𝜎 ⇒ 32.5 ± 2(7.22) ⇒ (18.06, 46.94)
   𝑃(18.06 < 𝑥 < 46.94) = 𝑃(20 < 𝑥 < 45) = (45 − 20)(.04) = 1

4.134

From Exercise 4.133, 𝑓(𝑥) = .04 (20 ≤ 𝑥 ≤ 45), 0 otherwise

a. 𝑃(20 ≤ 𝑥 ≤ 30) = (30 − 20)(.04) = .4

b. 𝑃(20 < 𝑥 ≤ 30) = (30 − 20)(.04) = .4

c. 𝑃(𝑥 ≥ 30) = (45 − 30)(.04) = .6

d. 𝑃(𝑥 ≥ 45) = (45 − 45)(.04) = 0

e. 𝑃(𝑥 ≤ 40) = (40 − 20)(.04) = .8

f. 𝑃(𝑥 < 40) = (40 − 20)(.04) = .8

g. 𝑃(15 ≤ 𝑥 ≤ 35) = (35 − 20)(.04) = .6

h. 𝑃(21.5 ≤ 𝑥 ≤ 31.5) = (31.5 − 21.5)(.04) = .4
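The uniform-distribution calculations in Exercises 4.133 and 4.134 follow one pattern: clip the interval to [c, d], then multiply its width by the height 1/(d − c). A minimal Python sketch of that pattern follows; the function names `uniform_prob` and `uniform_mean_sd` are illustrative choices, not part of the text.

```python
import math

def uniform_prob(a, b, c, d):
    """P(a <= x <= b) for x uniform on [c, d]; the interval is clipped to [c, d]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0.0) / (d - c)

def uniform_mean_sd(c, d):
    """Mean (c + d)/2 and standard deviation (d - c)/sqrt(12)."""
    return (c + d) / 2, (d - c) / math.sqrt(12)

c, d = 20, 45  # limits from Exercise 4.133
mu, sigma = uniform_mean_sd(c, d)
print(round(mu, 2), round(sigma, 2))                 # 32.5 7.22
print(round(uniform_prob(20, 30, c, d), 4))          # 0.4
print(round(uniform_prob(15, 35, c, d), 4))          # 0.6
print(round(uniform_prob(mu - 2 * sigma, mu + 2 * sigma, c, d), 4))  # 1.0
```

Clipping handles cases such as part g of Exercise 4.134, where the lower limit 15 lies below c = 20 and contributes no probability.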


4.135

𝑃(𝑥 ≥ 𝑎) = 𝑒^(−𝑎/𝜃) = 𝑒^(−𝑎/1) = 𝑒^(−𝑎). Using a calculator:

a. 𝑃(𝑥 > 1) = 𝑒^(−1) = .367879

b. 𝑃(𝑥 ≤ 3) = 1 − 𝑃(𝑥 > 3) = 1 − 𝑒^(−3) = 1 − .049787 = .950213

c. 𝑃(𝑥 > 1.5) = 𝑒^(−1.5) = .223130

d. 𝑃(𝑥 ≤ 5) = 1 − 𝑃(𝑥 > 5) = 1 − 𝑒^(−5) = 1 − .006738 = .993262

4.136

a. 𝑃(𝑥 ≤ 4) = 1 − 𝑃(𝑥 > 4) = 1 − 𝑒^(−4/2.5) = 1 − 𝑒^(−1.6) = 1 − .201897 = .798103

b. 𝑃(𝑥 > 5) = 𝑒^(−5/2.5) = 𝑒^(−2) = .135335

c. 𝑃(𝑥 ≤ 2) = 1 − 𝑃(𝑥 > 2) = 1 − 𝑒^(−2/2.5) = 1 − 𝑒^(−.8) = 1 − .449329 = .550671

d. 𝑃(𝑥 > 3) = 𝑒^(−3/2.5) = 𝑒^(−1.2) = .301194

4.137

𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(200 − 100) = 1/100 = .01

𝑓(𝑥) = .01 (100 ≤ 𝑥 ≤ 200), 0 otherwise

𝜇 = (𝑐 + 𝑑)/2 = (100 + 200)/2 = 300/2 = 150

𝜎 = (𝑑 − 𝑐)/√12 = (200 − 100)/√12 = 100/√12 = 28.8675

a. 𝜇 ± 2𝜎 ⇒ 150 ± 2(28.8675) ⇒ 150 ± 57.735 ⇒ (92.265, 207.735)
   𝑃(𝑥 < 92.265) + 𝑃(𝑥 > 207.735) = 𝑃(𝑥 < 100) + 𝑃(𝑥 > 200) = 0 + 0 = 0

b. 𝜇 ± 3𝜎 ⇒ 150 ± 3(28.8675) ⇒ 150 ± 86.6025 ⇒ (63.3975, 236.6025)
   𝑃(63.3975 < 𝑥 < 236.6025) = 𝑃(100 < 𝑥 < 200) = (200 − 100)(.01) = 1

c. From a, 𝜇 ± 2𝜎 ⇒ (92.265, 207.735).
   𝑃(92.265 < 𝑥 < 207.735) = 𝑃(100 < 𝑥 < 200) = (200 − 100)(.01) = 1
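The calculator steps in Exercises 4.135 and 4.136 all rest on the exponential tail formula P(x > a) = e^(−a/θ). As a rough check, they can be reproduced in Python; the helper names `exp_tail` and `exp_cdf` are illustrative, not from the text.

```python
import math

def exp_tail(a, theta):
    """P(x > a) = e^(-a/theta) for an exponential random variable with mean theta."""
    return math.exp(-a / theta)

def exp_cdf(a, theta):
    """P(x <= a) = 1 - e^(-a/theta)."""
    return 1.0 - exp_tail(a, theta)

print(round(exp_tail(1, 1), 6))     # 0.367879  (Exercise 4.135 a)
print(round(exp_cdf(3, 1), 6))      # 0.950213  (Exercise 4.135 b)
print(round(exp_cdf(4, 2.5), 6))    # 0.798103  (Exercise 4.136 a)
print(round(exp_tail(3, 2.5), 6))   # 0.301194  (Exercise 4.136 d)
```

Because x is continuous, P(x > a) and P(x ≥ a) are equal, so the same function serves both forms.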

4.138

With 𝜃 = 2, 𝜇 = 𝜎 = 𝜃 = 2

a. 𝜇 ± 3𝜎 ⇒ 2 ± 3(2) ⇒ 2 ± 6 ⇒ (−4, 8)
   Since 𝜇 − 3𝜎 lies below 0, find the probability that x is more than 𝜇 + 3𝜎 = 8.
   𝑃(𝑥 > 8) = 𝑒^(−8/2) = 𝑒^(−4) = .018316

b. 𝜇 ± 2𝜎 ⇒ 2 ± 2(2) ⇒ 2 ± 4 ⇒ (−2, 6)
   Since 𝜇 − 2𝜎 lies below 0, find the probability that x is between 0 and 6.
   𝑃(𝑥 < 6) = 1 − 𝑃(𝑥 ≥ 6) = 1 − 𝑒^(−6/2) = 1 − 𝑒^(−3) = 1 − .049787 = .950213 (using Table V, Appendix D)

c. 𝜇 ± .5𝜎 ⇒ 2 ± .5(2) ⇒ 2 ± 1 ⇒ (1, 3)
   𝑃(1 < 𝑥 < 3) = 𝑃(𝑥 > 1) − 𝑃(𝑥 > 3) = 𝑒^(−1/2) − 𝑒^(−3/2) = 𝑒^(−.5) − 𝑒^(−1.5) = .606531 − .223130 = .383401

4.139

a. 𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(100 − 0) = 1/100 = .01
   𝑓(𝑥) = .01 (0 ≤ 𝑥 ≤ 100), 0 otherwise

b. 𝜇 = (𝑐 + 𝑑)/2 = (0 + 100)/2 = 100/2 = 50
   𝜎 = (𝑑 − 𝑐)/√12 = (100 − 0)/√12 = 100/√12 = 28.8675
   𝜎² = 28.8675² = 833.333

c. 𝑃(𝑥 < 30) = (30 − 0)(.01) = .3

4.140

a. 𝑓(𝑥) = (1/𝜃)𝑒^(−𝑥/𝜃) (𝑥 > 0), where 𝜃 = 500.

b. 𝜇 = 𝜃 = 500, 𝜎² = 𝜃² = 500² = 250,000

c. 𝑃(𝑥 < 30) = 1 − 𝑃(𝑥 ≥ 30) = 1 − 𝑒^(−30/500) = 1 − 𝑒^(−.06) = 1 − .941765 = .058235 (using a calculator)

4.141

Let x = load of a cantilever beam. Then x has a uniform distribution on the interval from 100 to 115.

𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(115 − 100) = 1/15 (100 ≤ 𝑥 ≤ 115), 0 otherwise

a. 𝑃(𝑥 > 110) = (115 − 110)(1/15) = 5/15 = .333

b. 𝑃(𝑥 < 102) = (102 − 100)(1/15) = 2/15 = .133

c. 𝑃(𝑥 > 𝐿) = (115 − 𝐿)(1/15) = .100 ⇒ 115 − 𝐿 = 1.5 ⇒ 𝐿 = 113.5

4.142

a. x is an exponential random variable with 𝜃 = 40.
   𝑃(30 < 𝑥 < 80) = 𝑃(𝑥 ≥ 30) − 𝑃(𝑥 ≥ 80) = 𝑒^(−30/40) − 𝑒^(−80/40) = 𝑒^(−.75) − 𝑒^(−2) = .4723666 − .1353353 = .3370313

b. 𝑃(𝑥 > 120) = 𝑃(𝑥 ≥ 120) = 𝑒^(−120/40) = 𝑒^(−3) = .049787

c. 𝑃(𝑥 < 50) = 1 − 𝑃(𝑥 ≥ 50) = 1 − 𝑒^(−50/40) = 1 − 𝑒^(−1.25) = 1 − .286505 = .713495


4.143

a. Let x = temperature with no bolt-on trace elements. Then x has a uniform distribution.
   𝑓(𝑥) = 1/(𝑑 − 𝑐) (𝑐 ≤ 𝑥 ≤ 𝑑)
   𝑓(𝑥) = 1/(290 − 260) = 1/30
   Therefore, 𝑓(𝑥) = 1/30 (260 ≤ 𝑥 ≤ 290), 0 otherwise
   𝑃(280 < 𝑥 < 284) = (284 − 280)(1/30) = 4(1/30) = .133

   Let y = temperature with bolt-on trace elements. Then y has a uniform distribution.
   𝑓(𝑦) = 1/(𝑑 − 𝑐) (𝑐 ≤ 𝑦 ≤ 𝑑)
   𝑓(𝑦) = 1/(285 − 278) = 1/7
   Therefore, 𝑓(𝑦) = 1/7 (278 ≤ 𝑦 ≤ 285), 0 otherwise
   𝑃(280 < 𝑦 < 284) = (284 − 280)(1/7) = 4(1/7) = .571

b. 𝑃(𝑥 ≤ 268) = (268 − 260)(1/30) = 8(1/30) = .267
   𝑃(𝑦 ≤ 268) = (268 − 260)(0) = 0

4.144

a. Let x = number of anthrax spores. Then x has an approximate uniform distribution.
   𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(10 − 0) = 1/10 = .1 (𝑐 ≤ 𝑥 ≤ 𝑑)
   Therefore, 𝑓(𝑥) = .1 (0 ≤ 𝑥 ≤ 10), 0 otherwise
   𝑃(𝑥 ≤ 8) = (8 − 0)(.1) = .8

b. 𝑃(2 ≤ 𝑥 ≤ 5) = (5 − 2)(.1) = .3

4.145

a. 𝑃(𝑥 > 2) = 𝑒^(−2/2.5) = 𝑒^(−.8) = .449329 (using a calculator)

b. 𝑃(𝑥 < 5) = 1 − 𝑃(𝑥 ≥ 5) = 1 − 𝑒^(−5/2.5) = 1 − 𝑒^(−2) = 1 − .135335 = .864665 (using a calculator)

4.146

a. Let x = time until the first critical part failure. Then x has an exponential distribution with 𝜃 = .1.
   𝑃(𝑥 ≥ 1) = 𝑒^(−1/.1) = 𝑒^(−10) = .0000454 (using a calculator)

b. 30 minutes = .5 hours.
   𝑃(𝑥 < .5) = 1 − 𝑃(𝑥 ≥ .5) = 1 − 𝑒^(−.5/.1) = 1 − 𝑒^(−5) = 1 − .0067 = .9933

4.147

a. For this problem, x has a uniform distribution on the interval from 0 to 1. Thus, 𝜇 = (𝑐 + 𝑑)/2 = (0 + 1)/2 = .5.

b. For this problem, 𝑓(𝑥) = 1 (0 ≤ 𝑥 ≤ 1), 0 otherwise
   𝑃(𝑥 > .7) = (1 − .7)(1) = .3

c. With n = 2, the total possible connections is C(2, 2) = 2!/[2!(2 − 2)!] = 1. Thus, the density can be either 0 or 1. Therefore, the uniform model would not be a good approximation for the distribution of network density.

4.148

Let x = percentage of Boeing 787s delivered in the fifth year of the program. Then x has a uniform distribution on the interval from 2.5 to 5.5.

𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(5.5 − 2.5) = 1/3 (2.5 ≤ 𝑥 ≤ 5.5), 0 otherwise

𝑃(𝑥 > 4) = (5.5 − 4)(1/3) = 1.5/3 = .5

4.149

a. The amount dispensed by the beverage machine is a continuous random variable since it can take on any value between 6.5 and 7.5 ounces.

b. Since the amount dispensed is random between 6.5 and 7.5 ounces, x is a uniform random variable.
   𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(7.5 − 6.5) = 1/1 = 1 (𝑐 ≤ 𝑥 ≤ 𝑑)
   Therefore, 𝑓(𝑥) = 1 (6.5 ≤ 𝑥 ≤ 7.5), 0 otherwise

   The graph is as follows:

   [Figure: graph of f(x), constant at 1 for 6.5 ≤ x ≤ 7.5, with 𝜇 − 2𝜎 = 6.422, 𝜇 = 7, and 𝜇 + 2𝜎 = 7.577 marked on the x-axis.]

c. 𝜇 = (𝑐 + 𝑑)/2 = (6.5 + 7.5)/2 = 14/2 = 7
   𝜎 = (𝑑 − 𝑐)/√12 = (7.5 − 6.5)/√12 = 1/√12 = .2887
   𝜇 ± 2𝜎 ⇒ 7 ± 2(.2887) ⇒ 7 ± .5774 ⇒ (6.422, 7.577)

d. 𝑃(𝑥 ≥ 7) = (7.5 − 7)(1) = .5

e. 𝑃(𝑥 < 6) = 0

f. 𝑃(6.5 ≤ 𝑥 ≤ 7.25) = (7.25 − 6.5)(1) = .75

g. The probability that the next bottle filled will contain more than 7.25 ounces is:
   𝑃(𝑥 > 7.25) = (7.5 − 7.25)(1) = .25
   The probability that the next 6 bottles filled will contain more than 7.25 ounces is:
   𝑃[(𝑥 > 7.25) ∩ (𝑥 > 7.25) ∩ (𝑥 > 7.25) ∩ (𝑥 > 7.25) ∩ (𝑥 > 7.25) ∩ (𝑥 > 7.25)] = [𝑃(𝑥 > 7.25)]^6 = .25^6 = .0002

4.150

a. Two minutes equals 120 seconds. 𝑃(𝑥 ≥ 120) = 𝑒^(−120/95) = .2828

b. Using MINITAB, a histogram of the data with the exponential distribution displayed on top of it is:

   [Figure: Histogram of INTTIME with fitted exponential curve (Mean 95.52, N 267); horizontal axis: INTTIME, vertical axis: Frequency.]

   The data appear to fit the exponential distribution fairly well.

4.151

Let x = number of inches a gouge is from one end of the spindle. Then x has a uniform distribution with f(x) as follows:

𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(18 − 0) = 1/18 (0 ≤ 𝑥 ≤ 18), 0 otherwise

In order to get at least 14 consecutive inches without a gouge, the gouge must be within 4 inches of either end. Thus, we must find:

𝑃(𝑥 < 4) + 𝑃(𝑥 > 14) = (4 − 0)(1/18) + (18 − 14)(1/18) = 4/18 + 4/18 = 8/18 = .4444

4.152

Let x = the inter-arrival time of a malicious data packet. Then x will follow an exponential distribution with 𝜃 = 1 second.

𝑃(𝑥 > 5) = 𝑒^(−5/1) = 𝑒^(−5) = .006738

4.153

Let x be a random variable with an exponential distribution with mean 𝜃. Let k = median of the distribution. Then 𝑃(𝑥 > 𝑘) = .5. We now need to find k.

𝑃(𝑥 > 𝑘) = .5 ⇒ 𝑒^(−𝑘/𝜃) = .5 ⇒ −𝑘/𝜃 = 𝑙𝑛(.5) ⇒ 𝑘 = −𝜃 𝑙𝑛(.5) = .693147𝜃

4.154

a. Using MINITAB, a graph is:

   [Figure: graph of f(p), constant at 1 for 0 ≤ p ≤ 1.]

b. 𝜇 = (𝑐 + 𝑑)/2 = (0 + 1)/2 = .5, 𝜎 = (𝑑 − 𝑐)/√12 = (1 − 0)/√12 = .289, 𝜎² = .289² = .083

c. 𝑃(𝑝 > .95) = (1 − .95)(1) = .05
   𝑃(𝑝 < .95) = (.95 − 0)(1) = .95

d. The analyst should use a uniform probability distribution with 𝑐 = .90 and 𝑑 = .95.
   𝑓(𝑝) = 1/(𝑑 − 𝑐) = 1/(.95 − .90) = 1/.05 = 20 (.90 ≤ 𝑝 ≤ .95), 0 otherwise

4.155

a.

For 𝜃 = 250, 𝑃(𝑥 > 𝑎) = 𝑒^(−𝑎/250)

For 𝑎 = 300 and 𝑏 = 200, show 𝑃(𝑥 > 𝑎 + 𝑏) ≥ 𝑃(𝑥 > 𝑎)𝑃(𝑥 > 𝑏)

𝑃(𝑥 > 300 + 200) = 𝑃(𝑥 > 500) = 𝑒^(−500/250) = 𝑒^(−2) = .1353
𝑃(𝑥 > 300)𝑃(𝑥 > 200) = 𝑒^(−300/250) 𝑒^(−200/250) = 𝑒^(−1.2) 𝑒^(−.8) = .3012(.4493) = .1353

Since 𝑃(𝑥 > 300 + 200) = 𝑃(𝑥 > 300)𝑃(𝑥 > 200), then 𝑃(𝑥 > 300 + 200) ≥ 𝑃(𝑥 > 300)𝑃(𝑥 > 200)

Also, show 𝑃(𝑥 > 300 + 200) ≤ 𝑃(𝑥 > 300)𝑃(𝑥 > 200). Since we already showed that 𝑃(𝑥 > 300 + 200) = 𝑃(𝑥 > 300)𝑃(𝑥 > 200), then 𝑃(𝑥 > 300 + 200) ≤ 𝑃(𝑥 > 300)𝑃(𝑥 > 200).

b. Let 𝑎 = 50 and 𝑏 = 100. Show 𝑃(𝑥 > 𝑎 + 𝑏) ≥ 𝑃(𝑥 > 𝑎)𝑃(𝑥 > 𝑏)

𝑃(𝑥 > 50 + 100) = 𝑃(𝑥 > 150) = 𝑒^(−150/250) = 𝑒^(−.6) = .5488
𝑃(𝑥 > 50)𝑃(𝑥 > 100) = 𝑒^(−50/250) 𝑒^(−100/250) = 𝑒^(−.2) 𝑒^(−.4) = .8187(.6703) = .5488

Since 𝑃(𝑥 > 50 + 100) = 𝑃(𝑥 > 50)𝑃(𝑥 > 100), then 𝑃(𝑥 > 50 + 100) ≥ 𝑃(𝑥 > 50)𝑃(𝑥 > 100)

Also, show 𝑃(𝑥 > 50 + 100) ≤ 𝑃(𝑥 > 50)𝑃(𝑥 > 100). Since we already showed that 𝑃(𝑥 > 50 + 100) = 𝑃(𝑥 > 50)𝑃(𝑥 > 100), then 𝑃(𝑥 > 50 + 100) ≤ 𝑃(𝑥 > 50)𝑃(𝑥 > 100).

c. Show 𝑃(𝑥 > 𝑎 + 𝑏) ≥ 𝑃(𝑥 > 𝑎)𝑃(𝑥 > 𝑏)

𝑃(𝑥 > 𝑎 + 𝑏) = 𝑒^(−(𝑎 + 𝑏)/250) = 𝑒^(−𝑎/250) 𝑒^(−𝑏/250) = 𝑃(𝑥 > 𝑎)𝑃(𝑥 > 𝑏)

Since the two sides are equal for any 𝑎 and 𝑏, the inequality holds.

4.156

a. If x has a uniform distribution with c = 0, then
   𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(𝑑 − 0) = 1/𝑑 (0 ≤ 𝑥 ≤ 𝑑), 0 otherwise
   𝑃(𝑥 ≥ 500,000) = (𝑑 − 500,000)(1/𝑑) = 1 − 500,000/𝑑
   Since we know that 𝑃(𝑥 ≥ 500,000) = .9, we solve for d and find d = 5,000,000
   𝜇 = (𝑐 + 𝑑)/2 = (0 + 5,000,000)/2 = 5,000,000/2 = $2,500,000

b. If x has an exponential distribution, then 𝑃(𝑥 > 500,000) = 𝑒^(−500,000/𝜃) = .9
   Taking the natural log of both sides, we get
   −500,000/𝜃 = ln(.9) ⇒ −500,000/𝜃 = −.105361 ⇒ −500,000 = −.105361(𝜃) ⇒ 𝜃 = $4,745,610.79

4.157

𝑝(𝑥) = C(𝑛, 𝑥) 𝑝^𝑥 𝑞^(𝑛−𝑥), 𝑥 = 0, 1, 2, ... , 𝑛, where C(𝑛, 𝑥) = 𝑛!/[𝑥!(𝑛 − 𝑥)!]

a. 𝑃(𝑥 = 3) = 𝑝(3) = C(7, 3)(.5)^3(.5)^4 = [7!/(3!4!)](.5)^3(.5)^4 = 35(.125)(.0625) = .2734

b. 𝑃(𝑥 = 3) = 𝑝(3) = C(4, 3)(.8)^3(.2)^1 = [4!/(3!1!)](.8)^3(.2)^1 = 4(.512)(.2) = .4096

c. 𝑃(𝑥 = 1) = 𝑝(1) = C(15, 1)(.1)^1(.9)^14 = [15!/(1!14!)](.1)^1(.9)^14 = 15(.1)(.228768) = .3432

4.158

a. This experiment consists of 100 trials. Each trial results in one of two outcomes: chip is defective or not defective. If the number of chips produced in one hour is much larger than 100, then we can assume the probability of a defective chip is the same on each trial and that the trials are independent. Thus, x is a binomial random variable. If, however, the number of chips produced in an hour is not much larger than 100, the trials would not be independent. Then x would not be a binomial random variable.

b. This experiment consists of two trials. Each trial results in one of two outcomes: applicant qualified or not qualified. However, the trials are not independent. The probability of selecting a qualified applicant on the first trial is 3 out of 5. The probability of selecting a qualified applicant on the second trial depends on what happened on the first trial. Thus, x is not a binomial random variable. It is a hypergeometric random variable.

c. The number of trials is not a specified number in this experiment, thus x is not a binomial random variable. In this experiment, x is counting the number of calls received.

d. The number of trials in this experiment is 1000. Each trial can result in one of two outcomes: favor state income tax or not favor state income tax. Since 1000 is small compared to the number of registered voters in Florida, the probability of selecting a voter in favor of the state income tax is the same from trial to trial, and the trials are independent of each other. Thus, x is a binomial random variable.

4.159

From Table I, Appendix D:

a.

𝑃(𝑥 = 14) = 𝑃(𝑥 ≤ 14) − 𝑃(𝑥 ≤ 13) = .584 − .392 = .192

b.

𝑃(𝑥 ≤ 12) = .228

c.

𝑃(𝑥 > 12) = 1 − 𝑃(𝑥 ≤ 12) = 1 − .228 = .772

d.

𝑃(9 ≤ 𝑥 ≤ 18) = 𝑃(𝑥 ≤ 18) − 𝑃(𝑥 ≤ 8) = .992 − .005 = .987

e.

𝑃(8 < 𝑥 < 18) = 𝑃(𝑥 ≤ 17) − 𝑃(𝑥 ≤ 8) = .965 − .005 = .960

f.

𝜇 = 𝑛𝑝 = 20(.7) = 14, 𝜎² = 𝑛𝑝𝑞 = 20(.7)(.3) = 4.2, 𝜎 = √4.2 = 2.049

g.

𝜇 ± 2𝜎 ⇒ 14 ± 2(2.049) ⇒ 14 ± 4.098 ⇒ (9.902, 18.098)

𝑃(9.902 < 𝑥 < 18.098) = 𝑃(10 ≤ 𝑥 ≤ 18) = 𝑃(𝑥 ≤ 18) − 𝑃(𝑥 ≤ 9) = .992 − .017 = .975

4.160

a. 𝜇 = ∑ 𝑥𝑝(𝑥) = 10(.2) + 12(.3) + 18(.1) + 20(.4) = 15.4
   𝜎² = ∑(𝑥 − 𝜇)²𝑝(𝑥) = (10 − 15.4)²(.2) + (12 − 15.4)²(.3) + (18 − 15.4)²(.1) + (20 − 15.4)²(.4) = 18.44
   𝜎 = √18.44 ≈ 4.294

b. 𝑃(𝑥 < 15) = 𝑝(10) + 𝑝(12) = .2 + .3 = .5

c. 𝜇 ± 2𝜎 ⇒ 15.4 ± 2(4.294) ⇒ (6.812, 23.988)

d. 𝑃(6.812 < 𝑥 < 23.988) = .2 + .3 + .1 + .4 = 1.0

4.161

a. Poisson

b. Binomial

c. Binomial
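The mean, variance, and standard deviation computed in Exercise 4.160 can be checked with a short Python sketch. The function name `discrete_summary` and the dictionary layout are illustrative choices; the probabilities are the ones used in the solution above.

```python
import math

def discrete_summary(dist):
    """Return (mean, variance, standard deviation) of a discrete distribution {x: p(x)}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, math.sqrt(var)

dist = {10: .2, 12: .3, 18: .1, 20: .4}  # p(x) from Exercise 4.160
mu, var, sd = discrete_summary(dist)
print(round(mu, 1), round(var, 2), round(sd, 3))   # 15.4 18.44 4.294

# Probability within two standard deviations of the mean (part d).
within = sum(p for x, p in dist.items() if mu - 2 * sd < x < mu + 2 * sd)
print(round(within, 4))                            # 1.0
```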

4.162

a. Using MINITAB with 𝜆 = 2,

   Probability Density Function
   Poisson with mean = 2
   x    P( X = x )
   3    0.180447

   𝑝(3) = 𝑃(𝑥 = 3) = .180447

b. Using MINITAB with 𝜆 = 1,

   Probability Density Function
   Poisson with mean = 1
   x    P( X = x )
   4    0.0153283

   𝑝(4) = 𝑃(𝑥 = 4) = .0153283

c. Using MINITAB with 𝜆 = .5,

   Probability Density Function
   Poisson with mean = .5
   x    P( X = x )
   2    0.0758163

   𝑝(2) = 𝑃(𝑥 = 2) = .0758163

4.163

a. Discrete - The number of damaged inventory items is countable.

b. Continuous - The average monthly sales can take on any value within an acceptable limit.

c. Continuous - The number of square feet can take on any positive value.

d. Continuous - The length of time we must wait can take on any positive value.

4.164

Using the hypergeometric probability distribution, 𝑝(𝑥) = C(𝑟, 𝑥)·C(𝑁 − 𝑟, 𝑛 − 𝑥)/C(𝑁, 𝑛), with the values of 𝑁, 𝑛, and 𝑟 given in each part of the exercise:

a. 𝑃(𝑥 = 2) = .536

b. 𝑃(𝑥 = 2) = .067

c. 𝑃(𝑥 = 3) = .8

4.165

a. 𝑃(𝑧 ≤ 2.1) = 𝐴₁ + 𝐴₂ = .5 + .4821 = .9821

b. 𝑃(𝑧 ≥ 2.1) = 𝐴₂ = .5 − 𝐴₁ = .5 − .4821 = .0179

c. 𝑃(𝑧 ≥ −1.65) = 𝐴₁ + 𝐴₂ = .4505 + .5000 = .9505

d. 𝑃(−2.13 ≤ 𝑧 ≤ −.41) = 𝑃(−2.13 ≤ 𝑧 ≤ 0) − 𝑃(−.41 ≤ 𝑧 ≤ 0) = .4834 − .1591 = .3243

e. 𝑃(−1.45 ≤ 𝑧 ≤ 2.15) = 𝐴₁ + 𝐴₂ = .4265 + .4842 = .9107

f. 𝑃(𝑧 ≤ −1.43) = 𝐴₁ = .5 − 𝐴₂ = .5000 − .4236 = .0764

4.166

a. 𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(90 − 10) = 1/80 (10 ≤ 𝑥 ≤ 90), 0 otherwise

b. 𝜇 = (𝑐 + 𝑑)/2 = (10 + 90)/2 = 50
   𝜎 = (𝑑 − 𝑐)/√12 = (90 − 10)/√12 = 23.094011

c. The interval 𝜇 ± 2𝜎 ⇒ 50 ± 2(23.094) ⇒ 50 ± 46.188 ⇒ (3.812, 96.188) is indicated on the graph.

   [Figure: graph of f(x), constant at 1/80 for 10 ≤ x ≤ 90, with 𝜇 − 2𝜎 = 3.812, 𝜇 = 50, and 𝜇 + 2𝜎 = 96.188 marked on the x-axis.]

d. 𝑃(𝑥 ≤ 60) = (60 − 10)(1/80) = 50/80 = .625

e. 𝑃(𝑥 ≥ 90) = 0

f. 𝑃(𝑥 ≤ 80) = (80 − 10)(1/80) = 70/80 = .875

g. 𝑃(𝜇 − 𝜎 ≤ 𝑥 ≤ 𝜇 + 𝜎) = 𝑃(50 − 23.094 ≤ 𝑥 ≤ 50 + 23.094) = 𝑃(26.906 ≤ 𝑥 ≤ 73.094) = (73.094 − 26.906)(1/80) = 46.188/80 = .577

h. 𝑃(𝑥 > 75) = (90 − 75)(1/80) = 15/80 = .1875

4.167

a. For a probability density function of the form 𝑓(𝑥) = (1/𝜃)𝑒^(−𝑥/𝜃), 𝑥 > 0, x is an exponential random variable.

b. For the probability density function 𝑓(𝑥) = 1/20, 5 < 𝑥 < 25, x is a uniform random variable.

c. For a probability density function of the form 𝑓(𝑥) = [1/(𝜎√(2𝜋))]𝑒^(−.5[(𝑥 − 𝜇)/𝜎]²), x is a normal random variable.

4.168

a. 𝑃(𝑧 ≤ 𝑧₀) = .5080 ⇒ 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .5080 − .5 = .0080
   Looking up the area .0080 in Table II gives 𝑧₀ = .02.

b. 𝑃(𝑧 ≥ 𝑧₀) = .5517 ⇒ 𝑃(𝑧₀ ≤ 𝑧 ≤ 0) = .5517 − .5 = .0517
   Looking up the area .0517 in Table II gives 𝑧₀ = −.13.

c. 𝑃(𝑧 ≥ 𝑧₀) = .1492 ⇒ 𝑃(0 ≤ 𝑧 ≤ 𝑧₀) = .5 − .1492 = .3508
   Looking up the area .3508 in Table II gives 𝑧₀ = 1.04.

d. 𝑃(𝑧₀ ≤ 𝑧 ≤ .59) = .4773 ⇒ 𝑃(𝑧₀ ≤ 𝑧 ≤ 0) + 𝑃(0 ≤ 𝑧 ≤ .59) = .4773
   𝑃(0 ≤ 𝑧 ≤ .59) = .2224
   Thus, 𝑃(𝑧₀ ≤ 𝑧 ≤ 0) = .4773 − .2224 = .2549
   Looking up the area .2549 in Table II gives 𝑧₀ = −.69.
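The table lookups in Exercises 4.165 and 4.168 can be cross-checked numerically. The sketch below assumes the table lists areas between 0 and z, as the solutions above use it; `phi` and `table_area` are illustrative helper names, while `math.erf` and `statistics.NormalDist` are from the Python standard library.

```python
import math
from statistics import NormalDist

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """Area between 0 and z, the quantity tabulated in a standard normal table."""
    return abs(phi(z) - 0.5)

print(round(table_area(2.1), 4))                    # 0.4821  (Exercise 4.165 a)
print(round(phi(2.1), 4))                           # 0.9821
print(round(NormalDist().inv_cdf(0.5080), 2))       # 0.02    (Exercise 4.168 a)
print(round(NormalDist().inv_cdf(1 - 0.1492), 2))   # 1.04    (Exercise 4.168 c)
```

Small differences from table-based answers can occur because the table rounds z to two decimal places.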

4.169

a. 𝑃(𝑥 ≤ 80) = 𝑃(𝑧 ≤ (80 − 75)/10) = 𝑃(𝑧 ≤ .5) = .5 + .1915 = .6915 (Table II, Appendix D)

b. 𝑃(𝑥 ≥ 85) = 𝑃(𝑧 ≥ (85 − 75)/10) = 𝑃(𝑧 ≥ 1) = .5 − .3413 = .1587 (Table II, Appendix D)

c. 𝑃(70 ≤ 𝑥 ≤ 75) = 𝑃((70 − 75)/10 ≤ 𝑧 ≤ (75 − 75)/10) = 𝑃(−.5 ≤ 𝑧 ≤ 0) = 𝑃(0 ≤ 𝑧 ≤ .5) = .1915 (Table II, Appendix D)

d. 𝑃(𝑥 > 80) = 1 − 𝑃(𝑥 ≤ 80) = 1 − .6915 = .3085 (Refer to part a.)

e. 𝑃(𝑥 = 78) = 0, since a single point does not have an area.

f. 𝑃(𝑥 ≤ 110) = 𝑃(𝑧 ≤ (110 − 75)/10) = 𝑃(𝑧 ≤ 3.5) = .5 + .49977 = .99977 (Table II, Appendix D)

4.170

a. 𝑃(𝑥 ≤ 1) = 1 − 𝑃(𝑥 > 1) = 1 − 𝑒^(−1/3) = 1 − .716531 = .283469 (using calculator)

b. 𝑃(𝑥 > 1) = 𝑒^(−1/3) = .716531

c. 𝑃(𝑥 = 1) = 0 (x is a continuous random variable. There is no probability associated with a single point.)

d. 𝑃(𝑥 ≤ 6) = 1 − 𝑃(𝑥 > 6) = 1 − 𝑒^(−6/3) = 1 − 𝑒^(−2) = 1 − .135335 = .864665 (using a calculator)

e. 𝑃(2 ≤ 𝑥 ≤ 10) = 𝑃(𝑥 ≥ 2) − 𝑃(𝑥 > 10) = 𝑒^(−2/3) − 𝑒^(−10/3) = .513417 − .035674 = .47774 (using calculator)

4.171

x is a normal random variable with 𝜇 = 40, 𝜎² = 36, and 𝜎 = 6.

a. 𝑃(𝑥 ≥ 𝑥₀) = .10. So, 𝐴₁ = .5 − .10 = .4000. Looking up the area .4000 in the body of Table II, Appendix D gives 𝑧₀ = 1.28. To find 𝑥₀, substitute the values into the z-score formula:
   𝑧₀ = (𝑥₀ − 𝜇)/𝜎 ⇒ 1.28 = (𝑥₀ − 40)/6 ⇒ 𝑥₀ = 1.28(6) + 40 = 47.68

b. 𝑃(𝜇 ≤ 𝑥 ≤ 𝑥₀) = .40. Looking up the area .4000 in the body of Table II, Appendix D gives 𝑧₀ = 1.28. To find 𝑥₀, substitute the values into the z-score formula:
   𝑧₀ = (𝑥₀ − 𝜇)/𝜎 ⇒ 1.28 = (𝑥₀ − 40)/6 ⇒ 𝑥₀ = 1.28(6) + 40 = 47.68

c. 𝑃(𝑥 < 𝑥₀) = .05. So, 𝐴₁ = .5000 − .0500 = .4500. Looking up the area .4500 in the body of Table II, Appendix D gives 𝑧₀ = −1.645. (.45 is halfway between .4495 and .4505; therefore, we average the z-scores: (1.64 + 1.65)/2 = 1.645.) 𝑧₀ is negative since the graph shows 𝑧₀ is on the left side of 0. To find 𝑥₀, substitute the values into the z-score formula:
   𝑧₀ = (𝑥₀ − 𝜇)/𝜎 ⇒ −1.645 = (𝑥₀ − 40)/6 ⇒ 𝑥₀ = −1.645(6) + 40 = 30.13

d. 𝑃(𝑥 > 𝑥₀) = .40. So, 𝐴₁ = .5000 − .4000 = .1000. Looking up the area .1000 in the body of Table II, Appendix D gives 𝑧₀ = .25. To find 𝑥₀, substitute the values into the z-score formula:
   𝑧₀ = (𝑥₀ − 𝜇)/𝜎 ⇒ .25 = (𝑥₀ − 40)/6 ⇒ 𝑥₀ = .25(6) + 40 = 41.5

e. 𝑃(𝑥₀ ≤ 𝑥 < 𝜇) = .45. Looking up the area .4500 in the body of Table II, Appendix D gives 𝑧₀ = −1.645. (.45 is halfway between .4495 and .4505; therefore, we average the z-scores: (1.64 + 1.65)/2 = 1.645.) 𝑧₀ is negative since the graph shows 𝑧₀ is on the left side of 0. To find 𝑥₀, substitute the values into the z-score formula:
   𝑧₀ = (𝑥₀ − 𝜇)/𝜎 ⇒ −1.645 = (𝑥₀ − 40)/6 ⇒ 𝑥₀ = −1.645(6) + 40 = 30.13

4.172

𝜇 = 𝑛𝑝 = 100(.5) = 50, 𝜎 = √(𝑛𝑝𝑞) = √(100(.5)(.5)) = 5

a. 𝑃(𝑥 ≤ 48) = 𝑃(𝑧 ≤ ((48 + .5) − 50)/5) = 𝑃(𝑧 ≤ −.30) = .5 − .1179 = .3821

b. 𝑃(50 ≤ 𝑥 ≤ 65) = 𝑃(((50 − .5) − 50)/5 ≤ 𝑧 ≤ ((65 + .5) − 50)/5) = 𝑃(−.10 ≤ 𝑧 ≤ 3.10) = .0398 + .49903 = .53883

c. 𝑃(𝑥 ≥ 70) = 𝑃(𝑧 ≥ ((70 − .5) − 50)/5) = 𝑃(𝑧 ≥ 3.90) = .5 − .49995 = .00005

d. 𝑃(55 ≤ 𝑥 ≤ 58) = 𝑃(((55 − .5) − 50)/5 ≤ 𝑧 ≤ ((58 + .5) − 50)/5) = 𝑃(.90 ≤ 𝑧 ≤ 1.70) = 𝑃(0 ≤ 𝑧 ≤ 1.70) − 𝑃(0 ≤ 𝑧 ≤ .90) = .4554 − .3159 = .1395

e. 𝑃(𝑥 = 62) = 𝑃(((62 − .5) − 50)/5 ≤ 𝑧 ≤ ((62 + .5) − 50)/5) = 𝑃(2.30 ≤ 𝑧 ≤ 2.50) = 𝑃(0 ≤ 𝑧 ≤ 2.50) − 𝑃(0 ≤ 𝑧 ≤ 2.30) = .4938 − .4893 = .0045

f. 𝑃(𝑥 ≤ 49 or 𝑥 ≥ 72) = 𝑃(𝑧 ≤ ((49 + .5) − 50)/5) + 𝑃(𝑧 ≥ ((72 − .5) − 50)/5) = 𝑃(𝑧 ≤ −.10) + 𝑃(𝑧 ≥ 4.30) = (.5 − .0398) + (.5 − .5) = .4602
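The continuity correction used throughout Exercise 4.172 can be compared with the exact binomial sum. The following Python sketch is one way to do that; the function names are illustrative, and the exact sum uses `math.comb` from the standard library.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf_exact(k, n, p):
    """Exact binomial P(x <= k) by summing the probability function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def binom_cdf_normal(k, n, p):
    """Normal approximation to P(x <= k) with the +.5 continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p = 100, 0.5  # values from Exercise 4.172
print(round(binom_cdf_normal(48, n, p), 4))   # 0.3821, as in part a
print(round(binom_cdf_exact(48, n, p), 4))    # exact value, for comparison
```

With n = 100 and p = .5 the interval 𝜇 ± 3𝜎 = (35, 65) lies well inside 0 to 100, so the two results should agree closely.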

4.173

a. We will check the 5 characteristics of a binomial random variable.

   1. The experiment consists of n = 5 identical trials. We have to assume that the number of bottled water brands is large.
   2. There are only 2 possible outcomes for each trial. Let S = brand of bottled water used tap water and F = brand of bottled water did not use tap water.
   3. The probability of success (S) is the same from trial to trial. For each trial, 𝑝 = 𝑃(𝑆) = .25 and 𝑞 = 1 − 𝑝 = 1 − .25 = .75.
   4. The trials are independent.
   5. The binomial random variable x is the number of brands in the 5 trials that used tap water.

   If the total number of brands of bottled water is large, then the above characteristics will be basically true. Thus, x is a binomial random variable.

b. The formula for the probability distribution for x is 𝑝(𝑥) = C(5, 𝑥)(.25)^𝑥(.75)^(5−𝑥), for x = 0, 1, 2, 3, 4, 5.

c. 𝑃(𝑥 = 2) = C(5, 2)(.25)²(.75)³ = [5!/(2!3!)](.25)²(.75)³ = .2637

d. 𝑃(𝑥 ≤ 1) = 𝑃(𝑥 = 0) + 𝑃(𝑥 = 1) = C(5, 0)(.25)⁰(.75)⁵ + C(5, 1)(.25)¹(.75)⁴ = [5!/(0!5!)](.25)⁰(.75)⁵ + [5!/(1!4!)](.25)¹(.75)⁴ = .2373 + .3955 = .6328

e. 𝐸(𝑥) = 𝜇 = 𝑛𝑝 = 65(.25) = 16.25
   𝜎 = √𝜎² = √(𝑛𝑝𝑞) = √(65(.25)(.75)) = √12.1875 = 3.49

   To see if the normal approximation is appropriate, we use:
   𝜇 ± 3𝜎 ⇒ 16.25 ± 3(3.49) ⇒ 16.25 ± 10.47 ⇒ (5.78, 26.72)
   Since this interval lies in the range from 0 to 65, the normal approximation is appropriate.

   𝑃(𝑥 ≥ 20) ≈ 𝑃(𝑧 ≥ ((20 − .5) − 16.25)/3.49) = 𝑃(𝑧 ≥ .93) = .5 − 𝑃(0 ≤ 𝑧 ≤ .93) = .5 − .3238 = .1762 (Using Table II, Appendix D)

   Since this probability is not small, it would not be unusual for 20 or more brands to contain tap water.

f. We will check the 5 characteristics of a binomial random variable.

   1. The experiment consists of n = 5 identical trials. We have to assume that the number of bottled water brands is large.
   2. There are only 2 possible outcomes for each trial. Let S = brand of bottled water used purified tap water and F = brand of bottled water did not use purified tap water.
   3. The probability of success (S) is the same from trial to trial. For each trial, 𝑝 = 𝑃(𝑆) = .5 and 𝑞 = 1 − 𝑝 = 1 − .5 = .5.
   4. The trials are independent.
   5. The binomial random variable x is the number of brands in the 5 trials that used purified tap water.

   If the total number of brands of bottled water is large, then the above characteristics will be basically true. Thus, x is a binomial random variable.

   The formula for the probability distribution for x is 𝑝(𝑥) = C(5, 𝑥)(.5)^𝑥(.5)^(5−𝑥), for x = 0, 1, 2, 3, 4, 5.

   𝑃(𝑥 = 2) = C(5, 2)(.5)²(.5)³ = [5!/(2!3!)](.5)²(.5)³ = .3125

   𝑃(𝑥 ≤ 1) = 𝑃(𝑥 = 0) + 𝑃(𝑥 = 1) = C(5, 0)(.5)⁰(.5)⁵ + C(5, 1)(.5)¹(.5)⁴ = [5!/(0!5!)](.5)⁰(.5)⁵ + [5!/(1!4!)](.5)¹(.5)⁴ = .03125 + .15625 = .18750

   𝐸(𝑥) = 𝜇 = 𝑛𝑝 = 65(.5) = 32.5
   𝜎 = √𝜎² = √(𝑛𝑝𝑞) = √(65(.5)(.5)) = √16.25 = 4.03

   To see if the normal approximation is appropriate, we use:
   𝜇 ± 3𝜎 ⇒ 32.5 ± 3(4.03) ⇒ 32.5 ± 12.09 ⇒ (20.41, 44.59)
   Since this interval lies in the range from 0 to 65, the normal approximation is appropriate.

   𝑃(𝑥 ≥ 20) ≈ 𝑃(𝑧 ≥ ((20 − .5) − 32.5)/4.03) = 𝑃(𝑧 ≥ −3.22) = .5 + 𝑃(−3.22 ≤ 𝑧 ≤ 0) = .5 + .49936 = .99936 (Using Table II, Appendix D)

   Since this probability is very close to 1, it is very likely that 20 or more brands will contain purified tap water.

4.174

a. In order for this to be a valid probability distribution, all probabilities must be between 0 and 1 and the sum of all the probabilities must be 1. For this data, all the probabilities are between 0 and 1. The sum of all the probabilities is .10 + .39 + .40 + .11 = 1.00.

b. 𝑃(𝑥 > 2) = 𝑃(𝑥 = 3) + 𝑃(𝑥 = 4) = .40 + .11 = .51

c. 𝜇 = 𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 1(.10) + 2(.39) + 3(.40) + 4(.11) = .10 + .78 + 1.20 + .44 = 2.52

4.175

For this problem, 𝑓(𝑥) = 1/(𝑑 − 𝑐) = 1/(3600 − 0) = 1/3600. Thus,

𝑓(𝑥) = 1/3600 (0 ≤ 𝑥 ≤ 3600), 0 otherwise

The last 15 minutes would represent the last 15(60) = 900 seconds.

𝑃(2700 < 𝑥 < 3600) = (3600 − 2700)(1/3600) = 900/3600 = .25

4.176

Let x = number of patients who undergo laser surgery who have serious post-laser vision problems in 100,000 trials. Then x is a binomial random variable with 𝑛 = 100,000 and 𝑝 = .01.

𝐸(𝑥) = 𝜇 = 𝑛𝑝 = 100,000(.01) = 1,000
𝜎 = √𝜎² = √(𝑛𝑝𝑞) = √(100,000(.01)(.99)) = √990 = 31.464

To see if the normal approximation is appropriate, we use:
𝜇 ± 3𝜎 ⇒ 1,000 ± 3(31.464) ⇒ 1,000 ± 94.392 ⇒ (905.608, 1,094.392)
Since the interval lies in the range of 0 to 100,000, the normal approximation is appropriate.

𝑃(𝑥 < 950) ≈ 𝑃(𝑧 < ((950 − .5) − 1,000)/31.464) = 𝑃(𝑧 < −1.61) = .5 − .4463 = .0537 (Using Table II, Appendix D)

4.177

Let x = interarrival time between patients. Then x is an exponential random variable with a mean of 4 minutes.

a. 𝑃(𝑥 < 1) = 1 − 𝑃(𝑥 ≥ 1) = 1 − 𝑒^(−1/4) = 1 − 𝑒^(−.25) = 1 − .778801 = .221199

b. Assuming that the interarrival times are independent,
   P(next 4 interarrival times are all less than 1 minute) = [𝑃(𝑥 < 1)]⁴ = .221199⁴ = .002394

c. 𝑃(𝑥 > 10) = 𝑒^(−10/4) = 𝑒^(−2.5) = .082085

4.178

Using MINITAB with 𝜆 = 5, and the Poisson distribution, the probability is:

   Cumulative Distribution Function
   Poisson with mean = 5
   x     P( X <= x )
   10    0.986305

𝑃(𝑥 > 10) = 1 − 𝑃(𝑥 ≤ 10) = 1 − .986305 = .013695
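The MINITAB cumulative Poisson probability in Exercise 4.178 can be reproduced by summing the Poisson probability function directly. This is a sketch using only the Python standard library; the function names `poisson_pmf` and `poisson_cdf` are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(x = k) = lam^k e^(-lam) / k! for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(x <= k), obtained by summing the probability function from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

print(round(poisson_cdf(10, 5), 6))       # 0.986305, matching the MINITAB output
print(round(1 - poisson_cdf(10, 5), 6))   # 0.013695
```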

4.179

a. Let x = number of adults who participated in youth and/or high school sports who have an income greater than $100,000 in 25 adults. Then x is a binomial random variable with 𝑛 = 25 and 𝑝 = .15. Using MINITAB, with 𝑛 = 25 and 𝑝 = .15, the probability is:

   Cumulative Distribution Function
   Binomial with n = 25 and p = 0.15
   x     P( X ≤ x )
   19    1.00000

   Thus, 𝑃(𝑥 < 20) = 𝑃(𝑥 ≤ 19) = 1.0000.

b. Cumulative Distribution Function
   Binomial with n = 25 and p = 0.15
   x     P( X ≤ x )
   10    0.999505

   Thus, 𝑃(10 < 𝑥 < 20) = 𝑃(𝑥 ≤ 19) − 𝑃(𝑥 ≤ 10) = 1.0000 − .9995 = .0005.

c. Let x = number of adults who did not participate in youth and/or high school sports who have an income greater than $100,000 in 25 adults. Then x is a binomial random variable with 𝑛 = 25 and 𝑝 = .09. Using MINITAB, with 𝑛 = 25 and 𝑝 = .09, the probability is:

   Cumulative Distribution Function
   Binomial with n = 25 and p = 0.09
   x     P( X ≤ x )
   19    1.00000

   Thus, 𝑃(𝑥 < 20) = 𝑃(𝑥 ≤ 19) = 1.0000.

   Cumulative Distribution Function
   Binomial with n = 25 and p = 0.09
   x     P( X ≤ x )
   10    1.00000

   Thus, 𝑃(10 < 𝑥 < 20) = 𝑃(𝑥 ≤ 19) − 𝑃(𝑥 ≤ 10) = 1.0000 − 1.0000 = 0.0000.

4.180

a. Let x = number of trees infected with the Dutch elm disease in the two trees purchased. For this problem, x is a hypergeometric random variable with 𝑁 = 10, 𝑛 = 2, and 𝑟 = 3. The probability that both trees will be healthy is:
   𝑃(𝑥 = 0) = C(3, 0)·C(7, 2)/C(10, 2) = [3!/(0!3!)]·[7!/(2!5!)]/[10!/(2!8!)] = 1(21)/45 = .467

b. The probability that at least one tree will be infected is: 𝑃(𝑥 ≥ 1) = 1 − 𝑃(𝑥 = 0) = 1 − .467 = .533.

4.181

a. 𝑃(1200 < 𝑥 < 1500) = 𝑃(𝑥 > 1200) − 𝑃(𝑥 > 1500) = 𝑒^(−1200/1000) − 𝑒^(−1500/1000) = 𝑒^(−1.2) − 𝑒^(−1.5) = .3012 − .2231 = .0781

b. 𝑃(𝑥 ≥ 1200) = 𝑒^(−1200/1000) = 𝑒^(−1.2) = .3012

c. 𝑃(𝑥 < 1500|𝑥 ≥ 1200) = 𝑃(1200 ≤ 𝑥 < 1500)/𝑃(𝑥 ≥ 1200) = .0781/.3012 = .2593

4.182

a. Using Table II, Appendix D,
   𝑃(𝑥 > 0) = 𝑃(𝑧 > (0 − 5.26)/10) = 𝑃(𝑧 > −0.526) = .5 + 𝑃(−0.53 < 𝑧 < 0) = .5 + .2019 = .7019

b. 𝑃(5 < 𝑥 < 15) = 𝑃((5 − 5.26)/10 < 𝑧 < (15 − 5.26)/10) = 𝑃(−0.026 < 𝑧 < 0.974) = 𝑃(−.03 < 𝑧 < 0) + 𝑃(0 < 𝑧 < .97) = .0120 + .3340 = .3460

c. 𝑃(𝑥 < 1) = 𝑃(𝑧 < (1 − 5.26)/10) = 𝑃(𝑧 < −0.426) = .5 − 𝑃(−0.43 < 𝑧 < 0) = .5 − .1664 = .3336

d. 𝑃(𝑥 ≤ −25) = 𝑃(𝑧 ≤ (−25 − 5.26)/10) = 𝑃(𝑧 ≤ −3.026) = .5 − 𝑃(−3.03 ≤ 𝑧 < 0) = .5 − .4988 = .0012

   Since the probability of seeing a win percentage of −25% or anything more unusual is so small (p = .0012), we would conclude that the average casino win percentage is not 5.26%.

4.183

a.

Using MINITAB with 𝜇 = 105.3 and 𝜎 = 8, the probability is: Cumulative Distribution Function Normal with mean = 105.3 and standard deviation = 8 x 120

P( X <= x ) 0.966932

𝑃(𝑥 > 120) = 1 − 𝑃(𝑥 ≤ 120) = 1 − .966932 = .033068 b.

Using MINITAB with 𝜇 = 105.3 and 𝜎 = 8, the probabilities are: Cumulative Distribution Function Normal with mean = 105.3 and standard deviation = 8 x 110 100

P( X <= x ) 0.721566 0.253825

𝑃(100 < 𝑥 < 110) = 𝑃(𝑥 < 110) − 𝑃(𝑥 ≤ 100) = .721566 − .253825 = .467741 c.

Using MINITAB with 𝜇 = 105.3 and 𝜎 = 8, the value of a is found: Inverse Cumulative Distribution Function Normal with mean = 105.3 and standard deviation = 8 P( X <= x ) 0.25

x 99.9041

Thus, 𝑎 = 99.9041.
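The three MINITAB lookups in this exercise (an upper tail, an interval, and an inverse cumulative probability) can be reproduced with the standard library. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Normal model from the solution above: mean 105.3, standard deviation 8.
x_dist = NormalDist(mu=105.3, sigma=8)

# Part a: upper-tail probability P(x > 120) = 1 - P(x <= 120).
p_above_120 = 1 - x_dist.cdf(120)

# Part b: interval probability P(100 < x < 110) as a difference of CDF values.
p_between = x_dist.cdf(110) - x_dist.cdf(100)

# Part c: the value a with P(x <= a) = .25 (inverse cumulative distribution).
a = x_dist.inv_cdf(0.25)

print(f"{p_above_120:.6f} {p_between:.6f} {a:.4f}")
```

Because the distribution is continuous, strict and non-strict inequalities give the same probabilities, so `cdf` can be used for both.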

4.184

a.

In order for the number of deaths to follow a Poisson distribution, we must assume that the probability of a death is the same for any week. We must also assume that the number of deaths in any week is independent of any other week. The first assumption may not be valid. The probability of a death may not be the same for every week. The number of passengers varies from week to week, so the probability of a death may change. Also, things such as weather, which varies from week to week, may increase or decrease the chance of an accident.

b.

𝐸(𝑥) = 𝜆 = 5 and 𝜎 = √𝜆 = √5 = 2.2361

c.

The z-score corresponding to 𝑥 = 12 is $z = \dfrac{12 - 5}{2.2361} = 3.13$. Since 𝑥 = 12 lies more than 3 standard deviations above the mean, it would be very unlikely that more than 12 deaths occur next week.

d.

Using MINITAB with 𝜆 = 5, we get the following probability:

Cumulative Distribution Function
Poisson with mean = 5
x     P( X ≤ x )
12    0.997981

𝑃(𝑥 > 12) = 1 − .997981 = .002019

This probability is consistent with the answer in part c. The probability of more than 12 deaths is essentially zero, so such an outcome would be very unlikely.

4.185

a.

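For 𝑁 = 209, 𝑟 = 10, and 𝑛 = 8, $E(x) = \dfrac{nr}{N} = \dfrac{8(10)}{209} = .383$

b.

$P(x = 4) = \dfrac{\binom{10}{4}\binom{209-10}{8-4}}{\binom{209}{8}} = \dfrac{\frac{10!}{4!\,6!}\cdot\frac{199!}{4!\,195!}}{\frac{209!}{8!\,201!}} = .0002$

4.186

Let 𝑥 = driver’s head injury rating. The random variable x has a normal distribution with 𝜇 = 605 and 𝜎 = 185. Using Table II, Appendix D,

a.

$P(500 < x < 700) = P\left(\dfrac{500 - 605}{185} < z < \dfrac{700 - 605}{185}\right) = P(-0.57 < z < 0.51) = P(-0.57 < z < 0) + P(0 < z < 0.51) = .2157 + .1950 = .4107$

b.

$P(400 < x < 500) = P\left(\dfrac{400 - 605}{185} < z < \dfrac{500 - 605}{185}\right) = P(-1.11 < z < -0.57) = P(-1.11 < z < 0) - P(-0.57 < z < 0) = .3665 - .2157 = .1508$

c.

$P(x < 850) = P\left(z < \dfrac{850 - 605}{185}\right) = P(z < 1.32) = .5 + P(0 < z < 1.32) = .5 + .4066 = .9066$

d.

$P(x > 1{,}000) = P\left(z > \dfrac{1{,}000 - 605}{185}\right) = P(z > 2.14) = .5 - P(0 < z < 2.14) = .5 - .4838 = .0162$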

4.187

Let x equal the difference between the actual weight and recorded weight (the error of measurement). The random variable x is normally distributed with 𝜇 = 592 and 𝜎 = 628.

a.

We want to find the probability that the weigh-in-motion equipment understates the actual weight of the truck. This would be true if the error of measurement is positive.

$P(x > 0) = P\left(z > \dfrac{0 - 592}{628}\right) = P(z > -.94) = .5000 + .3264 = .8264$

b.

P(overstate the weight) = 1 − P(understate the weight) = 1 − .8264 = .1736 (Refer to part a.) For 100 measurements, the weight would be overstated approximately 100(.1736) = 17.36, or about 17, times.

c.

$P(x > 400) = P\left(z > \dfrac{400 - 592}{628}\right) = P(z > -.31) = .5000 + .1217 = .6217$

d.

We want P(understate the weight) = .5. To understate the weight, 𝑥 > 0. Thus, we want to find 𝜇 so that 𝑃(𝑥 > 0) = .5.

$P(x > 0) = P\left(z > \dfrac{0 - \mu}{628}\right) = .5$

From Table II, Appendix D, 𝑧 = 0. To find 𝜇, substitute into the z-score formula:

$z = \dfrac{x - \mu}{\sigma} \Rightarrow 0 = \dfrac{0 - \mu}{628} \Rightarrow \mu = 0$

Thus, the mean error should be set at 0.

We want P(understate the weight) = .4. To understate the weight, 𝑥 > 0. Thus, we want to find 𝜇 so that 𝑃(𝑥 > 0) = .4. 𝐴 = .5 − .40 = .1. Looking up the area .1000 in the body of Table II, Appendix D, 𝑧 = .25. To find 𝜇, substitute into the z-score formula:

$z = \dfrac{x - \mu}{\sigma} \Rightarrow .25 = \dfrac{0 - \mu}{628} \Rightarrow \mu = 0 - (.25)628 = -157$

4.188

a.

Let x = number of tests until the system needs to be replaced = number of tests until the 5th failed test. Thus, x can take on values 5, 6, 7, …. Also, 𝑝 = 𝑃(no flaw is detected) = .85 and 1 − 𝑝 = 𝑃(flaw is detected) = .15. In order for the system to be replaced, the 5th flaw has to occur on the last trial, or the xth trial. Therefore, the number of arrangements of flaws and no flaws for the first 𝑥 − 1 trials is $\binom{x-1}{x-5}$. For the trials to end on the 5th flaw, there have to be 5 flaws with a probability $(1-p)^5 = (.15)^5$. The number of ‘no flaws’ then must be 𝑥 − 5 with probability $p^{x-5} = (.85)^{x-5}$. Thus, the probability distribution for x is:

$p(x) = \binom{x-1}{x-5}(1-p)^5 p^{x-5}, \quad x = 5, 6, 7, \ldots$

b.

$P(x = 8) = \binom{8-1}{8-5}(1-.85)^5(.85)^{8-5} = \dfrac{7!}{3!\,4!}(.15)^5(.85)^3 = .0016$

4.189

a.

$\mu = np = 25(.05) = 1.25$

$\sigma = \sqrt{npq} = \sqrt{25(.05)(.95)} = 1.09$

Since 𝜇 is not an integer, x could not equal its expected value.

b.

The event is (𝑥 ≥ 5). From Table I with 𝑛 = 25 and 𝑝 = .05: 𝑃(𝑥 ≥ 5) = 1 − 𝑃(𝑥 ≤ 4) = 1 − .993 = .007

c.

Since the probability obtained in part b is so small, it is unlikely that 5% applies to this agency. The percentage is probably greater than 5%.

4.190

a.

Using MINITAB with 𝜇 = 7.5 and 𝜎 = 2.5, the probability is: Cumulative Distribution Function Normal with mean = 7.5 and standard deviation = 2.5 x 9

P( X <= x ) 0.725747

𝑃(𝑥 ≤ 9) = .725747. Since this probability is less than .90, the regulations are not being met at EMS station A. b.

Using MINITAB with 𝜇 = 7.5 and 𝜎 = 2.5, the probability is: Cumulative Distribution Function Normal with mean = 7.5 and standard deviation = 2.5 x 2

P( X <= x ) 0.0139034

𝑃(𝑥 ≤ 2) = .0139034. Since this probability is so small, it would be very unlikely that the call was serviced by Station A. 4.191

For 𝜇 = 13.93 and 𝜎 = 21.65, we want to find k such that 𝑃(𝑥 > 𝑘) = .80. First, we find $z_0$ that corresponds to $P(z > z_0) = .80$ or $P(z_0 < z < 0) = .30$. Using Table II, Appendix D, $z_0 = -.84$. Thus,

$z_0 = \dfrac{k - \mu}{\sigma} \Rightarrow -.84 = \dfrac{k - 13.93}{21.65} \Rightarrow -18.186 = k - 13.93 \Rightarrow k = -4.256\%$

4.192

To find the probability distribution of x, we sum the probabilities associated with the same value of x. The probability distribution is:

x       8.5       9         9.5       10        10.5      11        12
p(x)    .462189   .288764   .141671   .069967   .025236   .011657   .000518

4.193

a.

The properties of valid probability distributions are: ∑ 𝑝(𝑥) = 1 and 0 ≤ 𝑝(𝑥) ≤ 1 for all x.

For Arc a1: 0 ≤ 𝑝(𝑥) ≤ 1 for all x and ∑ 𝑝(𝑥) = .05 + .10 + .25 + .60 = 1.00. Thus, this is a valid probability distribution.

For Arc a2: 0 ≤ 𝑝(𝑥) ≤ 1 for all x and ∑ 𝑝(𝑥) = .10 + .30 + .60 + 0 = 1.00. Thus, this is a valid probability distribution.

For Arc a3: 0 ≤ 𝑝(𝑥) ≤ 1 for all x and ∑ 𝑝(𝑥) = .05 + .25 + .70 + 0 = 1.00. Thus, this is a valid probability distribution.

For Arc a4: 0 ≤ 𝑝(𝑥) ≤ 1 for all x and ∑ 𝑝(𝑥) = .90 + .10 + 0 + 0 = 1.00. Thus, this is a valid probability distribution.

b.

For Arc a1, 𝑃(𝑥 > 1) = 𝑃(𝑥 = 2) + 𝑃(𝑥 = 3) = .25 + .6 = .85

c.

For Arc a2, 𝑃(𝑥 > 1) = 𝑃(𝑥 = 2) = .60 For Arc a3, 𝑃(𝑥 > 1) = 𝑃(𝑥 = 2) = .70 For Arc a4, 𝑃(𝑥 > 1) = 0

d.

For Arc a1, 𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 0(.05) + 1(.10) + 2(.25) + 3(.60) = 0 + .10 + .50 + 1.80 = 2.40. The average capacity of Arc a1 is 2.40.

For Arc a2, 𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 0(.10) + 1(.30) + 2(.60) = 0 + .30 + 1.20 = 1.50. The average capacity of Arc a2 is 1.50.

For Arc a3, 𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 0(.05) + 1(.25) + 2(.70) = 0 + .25 + 1.40 = 1.65. The average capacity of Arc a3 is 1.65.

For Arc a4, 𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 0(.90) + 1(.10) = 0 + .10 = .10. The average capacity of Arc a4 is 0.10.

e.

For Arc a1,

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 2.4)^2(.05) + (1 - 2.4)^2(.10) + (2 - 2.4)^2(.25) + (3 - 2.4)^2(.60)$
$= (-2.4)^2(.05) + (-1.4)^2(.10) + (-.4)^2(.25) + (.6)^2(.60) = .288 + .196 + .04 + .216 = .74$

$\sigma = \sqrt{.74} = .86$

We would expect most observations to fall within 2 standard deviations of the mean or 2.40 ± 2(.86) ⇒ 2.40 ± 1.72 ⇒ (.68, 4.12)

For Arc a2,

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 1.5)^2(.10) + (1 - 1.5)^2(.30) + (2 - 1.5)^2(.60)$
$= (-1.5)^2(.10) + (-.5)^2(.30) + (.5)^2(.60) = .225 + .075 + .15 = .45$

$\sigma = \sqrt{.45} = .67$

We would expect most observations to fall within 2 standard deviations of the mean or 1.50 ± 2(.67) ⇒ 1.50 ± 1.34 ⇒ (.16, 2.84)

For Arc a3,

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 1.65)^2(.05) + (1 - 1.65)^2(.25) + (2 - 1.65)^2(.70)$
$= (-1.65)^2(.05) + (-.65)^2(.25) + (.35)^2(.70) = .136125 + .105625 + .08575 = .3275$

$\sigma = \sqrt{.3275} = .57$

We would expect most observations to fall within 2 standard deviations of the mean or 1.65 ± 2(.57) ⇒ 1.65 ± 1.14 ⇒ (.51, 2.79)

For Arc a4,

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - .1)^2(.90) + (1 - .1)^2(.10) = (-.1)^2(.90) + (.9)^2(.10) = .009 + .081 = .090$

$\sigma = \sqrt{.09} = .30$

We would expect most observations to fall within 2 standard deviations of the mean or .10 ± 2(.30) ⇒ .10 ± .60 ⇒ (−.50, .70)

4.194

Let x = number of doctors who refuse ethics consultation in 𝑛 = 10 trials. From Exercise 2.11, we can estimate p with 𝑝 = .195. Then x will be a binomial random variable with 𝑛 = 10 and 𝑝 = .195. Using MINITAB with 𝑛 = 10 and 𝑝 = .195, the probability is: Cumulative Distribution Function Binomial with n = 10 and p = 0.195 x 1

P( X <= x ) 0.391097

𝑃(𝑥 ≥ 2) = 1 − 𝑃(𝑥 ≤ 1) = 1 − .391097 = .608903 4.195

Let x = perception of light flicker. Then x has an approximate normal distribution with 𝜇 = 2.2 and 𝜎 = .5.

$P(x > 3) = P\left(z > \dfrac{3 - 2.2}{.5}\right) = P(z > 1.6) = .5 - P(0 < z < 1.6) = .5 - .4452 = .0548$

4.196

Let x = number of spoiled bottles in the sample of 3. Since the sampling will be done without replacement, x is a hypergeometric random variable with 𝑁 = 12, 𝑛 = 3, and 𝑟 = 1.

$P(x = 1) = \dfrac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}} = \dfrac{\binom{1}{1}\binom{11}{2}}{\binom{12}{3}} = \dfrac{\frac{1!}{1!\,0!}\cdot\frac{11!}{2!\,9!}}{\frac{12!}{3!\,9!}} = \dfrac{1(55)}{220} = .25$
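The hypergeometric probabilities in this chapter all follow the same counting formula, so a small helper can confirm them. This is a minimal sketch, assuming the standard library's `math.comb`; the function name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, r: int) -> float:
    """P(X = x) when n items are drawn without replacement from N items,
    r of which are 'successes'."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Spoiled-bottle exercise above: N = 12 bottles, n = 3 sampled, r = 1 spoiled.
p_one_spoiled = hypergeom_pmf(1, N=12, n=3, r=1)

# Dutch elm exercise earlier in the chapter: N = 10, n = 2, r = 3.
p_both_healthy = hypergeom_pmf(0, N=10, n=2, r=3)

print(round(p_one_spoiled, 3), round(p_both_healthy, 3))
```

Working with integer binomial coefficients and dividing only once avoids the rounding that can creep in when factorial ratios are evaluated term by term.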

4.197

Let x = demand for white bread. Then x is a normal random variable with 𝜇 = 7200 and 𝜎 = 300:

a.

$P(x \le x_0) = .94$. Find $x_0$.

$P(x \le x_0) = P\left(z \le \dfrac{x_0 - 7200}{300}\right) = P(z \le z_0) = .94$

𝐴 = .94 − .50 = .4400. Using Table II and area .4400, $z_0 = 1.555$.

$z_0 = \dfrac{x_0 - 7200}{300} \Rightarrow 1.555 = \dfrac{x_0 - 7200}{300} \Rightarrow x_0 = 7666.5 \approx 7667$

b.

If the company produces 7,667 loaves, the company will be left with more than 500 loaves if the demand is less than 7,667 − 500 = 7,167.

$P(x < 7167) = P\left(z < \dfrac{7167 - 7200}{300}\right) = P(z < -.11) = .5 - .0438 = .4562$ (from Table II, Appendix D)

Thus, on 45.62% of the days the company will be left with more than 500 loaves.

4.198

a.

If a large number of measurements are observed, then the relative frequencies should be very good estimators of the probabilities.

b.

𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 1(. 01) + 2(. 04) + 3(. 04) + 4(. 08) + 5(. 10) + 6(. 15) + 7(. 25) +8(. 20) + 9(. 08) + 10(. 05) = .01 + .08 + .12 + .32 + .50 + .90 + 1.75 + 1.60 + .72 + .50 = 6.50

The average number of checkout lanes per store is 6.5. c.

$\sigma^2 = \sum_{\text{All } x} (x - \mu)^2 p(x) = (1 - 6.5)^2(.01) + (2 - 6.5)^2(.04) + (3 - 6.5)^2(.04) + (4 - 6.5)^2(.08) + (5 - 6.5)^2(.10) + (6 - 6.5)^2(.15) + (7 - 6.5)^2(.25) + (8 - 6.5)^2(.20) + (9 - 6.5)^2(.08) + (10 - 6.5)^2(.05)$
$= .3025 + .8100 + .4900 + .5000 + .2250 + .0375 + .0625 + .4500 + .5000 + .6125 = 3.99$

$\sigma = \sqrt{3.99} = 1.9975$

d.

Chebyshev's Rule says that at least 0 of the observations should fall in the interval 𝜇 ± 𝜎. Chebyshev's Rule says that at least 75% of the observations should fall in the interval 𝜇 ± 2𝜎.
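The mean and variance of this checkout-lane distribution, and the way Chebyshev's lower bounds compare with the actual probabilities, can be checked numerically. The following is a minimal sketch using only the standard library; the dictionary `lanes` and the helper names are illustrative, with the probabilities copied from the expectation computed in part b.

```python
import math

# Probability distribution of x = number of checkout lanes (from part b).
lanes = {1: .01, 2: .04, 3: .04, 4: .08, 5: .10,
         6: .15, 7: .25, 8: .20, 9: .08, 10: .05}

mean = sum(x * p for x, p in lanes.items())
variance = sum((x - mean) ** 2 * p for x, p in lanes.items())
sd = math.sqrt(variance)

def prob_within(k: float) -> float:
    """Actual probability that x falls inside mean +/- k standard deviations."""
    return sum(p for x, p in lanes.items() if abs(x - mean) <= k * sd)

def chebyshev_bound(k: float) -> float:
    """Chebyshev's guaranteed minimum, 1 - 1/k^2 (which is 0 when k = 1)."""
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 2):
    print(k, round(prob_within(k), 2), chebyshev_bound(k))
```

The bound for k = 1 is 0, so Chebyshev's Rule gives no useful information for the one-standard-deviation interval; the actual probability has to be read from the distribution itself.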

e.

𝜇 ± 𝜎 ⇒ 6.5 ± 1.9975 ⇒ (4.5025, 8.4975)

𝑃(4.5025 ≤ 𝑥 ≤ 8.4975) = .10 + .15 + .25 + .20 = .70. This is at least 0.

𝜇 ± 2𝜎 ⇒ 6.5 ± 2(1.9975) ⇒ 6.5 ± 3.995 ⇒ (2.505, 10.495)

𝑃(2.505 ≤ 𝑥 ≤ 10.495) = .04 + .08 + .10 + .15 + .25 + .20 + .08 + .05 = .95. This is at least .75 or 75%.

4.199

a.

Let x = rating. Then x has a normal distribution with 𝜇 = 50 and 𝜎 = 15. Using Table II, Appendix D, $P(x > x_0) = .10$. Find $x_0$.

$P(x > x_0) = P\left(z > \dfrac{x_0 - 50}{15}\right) = P(z > z_0) = .10$

𝐴 = .5 − .10 = .4000. Looking up area .4000 in Table II, $z_0 = 1.28$.

$z_0 = \dfrac{x_0 - 50}{15} \Rightarrow 1.28 = \dfrac{x_0 - 50}{15} \Rightarrow x_0 = 50 + 1.28(15) = 69.2$

b.

$P(x > x_0) = .10 + .20 + .40 = .70$. Find $x_0$.

$P(x > x_0) = P\left(z > \dfrac{x_0 - 50}{15}\right) = P(z > z_0) = .70$

𝐴 = .70 − .5 = .2000. Looking up area .2000 in Table II, $z_0 = -.52$.

$z_0 = \dfrac{x_0 - 50}{15} \Rightarrow -.52 = \dfrac{x_0 - 50}{15} \Rightarrow x_0 = 50 - .52(15) = 42.2$

4.200

a.

Let x = life length of CD-ROM. Then x has an exponential distribution with 𝜃 = 25,000.

$R(t) = P(x > t) = e^{-t/25{,}000}$

b.

$R(8{,}760) = P(x > 8{,}760) = e^{-8{,}760/25{,}000} = e^{-.3504} = .7044$

c.

S(t) = probability that at least one of two drives has a length exceeding t hours

= 1 – probability that neither has a length exceeding t hours

$= 1 - P(x_1 \le t)P(x_2 \le t) = 1 - [1 - P(x_1 > t)][1 - P(x_2 > t)]$

$= 1 - \left[1 - e^{-t/25{,}000}\right]\left[1 - e^{-t/25{,}000}\right] = 1 - \left[1 - 2e^{-t/25{,}000} + e^{-t/12{,}500}\right] = 2e^{-t/25{,}000} - e^{-t/12{,}500}$

d.

$S(8{,}760) = 2e^{-8{,}760/25{,}000} - e^{-8{,}760/12{,}500} = 2(.7044) - .4962 = 1.4088 - .4962 = .9126$

e.

The probability in part d is greater than that in part b. We would expect this. The probability that at least one of the systems lasts longer than 8,760 hours would be greater than the probability that only one system lasts longer than 8,760 hours.
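The reliability of one drive and of a two-drive system can be compared directly in code. This is a minimal sketch, assuming the two drive lifetimes are independent (as the product form of S(t) above implies); the function names are hypothetical.

```python
import math

THETA = 25_000  # mean life length in hours, from the exercise

def reliability(t: float, theta: float = THETA) -> float:
    """R(t) = P(x > t) for an exponential life length with mean theta."""
    return math.exp(-t / theta)

def at_least_one_of_two(t: float, theta: float = THETA) -> float:
    """S(t) = 1 - P(neither drive lasts beyond t), assuming independent drives."""
    r = reliability(t, theta)
    return 1 - (1 - r) ** 2

t = 8_760  # hours
print(round(reliability(t), 4), round(at_least_one_of_two(t), 4))
```

Since 1 − (1 − R)² = 2R − R², and R² = e^(−2t/θ) = e^(−t/12,500), the code agrees term by term with the closed form for S(t).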

4.201 Let x = number of defects per million. Then x has an approximate normal distribution with 𝜇 = 3. Using Table II, Appendix D,

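$P(3 - 1.5\sigma \le x \le 3 + 1.5\sigma) = P\left(\dfrac{(3 - 1.5\sigma) - 3}{\sigma} \le z \le \dfrac{(3 + 1.5\sigma) - 3}{\sigma}\right) = P(-1.5 \le z \le 1.5) = .4332 + .4332 = .8664$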

It is fairly likely that the goal will be met. Since the probability is .8664, the goal would be met approximately 86.64% of the time.

4.202

Let x = number of females promoted in the 72 employees awarded promotion, where x is a hypergeometric random variable. From the problem, 𝑁 = 302, 𝑛 = 72, and 𝑟 = 73. We need to find if observing 5 females who were promoted was fair.

$E(x) = \mu = \dfrac{nr}{N} = \dfrac{72(73)}{302} = 17.40$

If 72 employees are promoted, we would expect that about 17 would be females.

$V(x) = \sigma^2 = \dfrac{r(N - r)n(N - n)}{N^2(N - 1)} = \dfrac{73(302 - 73)72(302 - 72)}{302^2(302 - 1)} = 10.084$

$\sigma = \sqrt{10.084} = 3.176$

Using Chebyshev’s Theorem, we know that at least 8/9 of all observations will fall within 3 standard deviations of the mean. The interval from 3 standard deviations below the mean to 3 standard deviations above the mean is: 𝜇 ± 3𝜎 ⇒ 17.40 ± 3(3.176) ⇒ 17.40 ± 9.528 ⇒ (7.872,26.928)

If there is no discrimination in promoting females, then we would expect between 8 and 26 females to be promoted within the group of 72 employees promoted. Since we observed only 5 females promoted, we would infer that females were not promoted fairly. 4.203

We know from the Empirical Rule that almost all the observations are larger than 𝜇 − 2𝜎 (≈ 95% are between 𝜇 − 2𝜎 and 𝜇 + 2𝜎). Thus 𝜇 − 2𝜎 > 100.

For the binomial, $\mu = np = n(.4)$ and $\sigma = \sqrt{npq} = \sqrt{n(.4)(.6)} = \sqrt{.24n}$

$\mu - 2\sigma > 100 \Rightarrow .4n - 2\sqrt{.24n} > 100 \Rightarrow .4n - .98\sqrt{n} - 100 > 0$

Solving for $\sqrt{n}$, we get:

$\sqrt{n} = \dfrac{.98 \pm \sqrt{.98^2 - 4(.4)(-100)}}{2(.4)} = \dfrac{.98 \pm 12.687}{.8} \Rightarrow \sqrt{n} = 17.084 \Rightarrow n = 17.084^2 = 291.9 \approx 292$
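The quadratic solution can be confirmed by a direct search for the smallest sample size that satisfies the inequality. This is a minimal sketch, not the manual's method; the helper name `two_sigma_lower_limit` is hypothetical, and the search uses the unrounded coefficient 2√.24 rather than .98.

```python
import math

def two_sigma_lower_limit(n: int, p: float = 0.4) -> float:
    """mu - 2*sigma for a binomial random variable with n trials and success probability p."""
    return n * p - 2 * math.sqrt(n * p * (1 - p))

# Increase n until mu - 2*sigma first exceeds 100.
n = 1
while two_sigma_lower_limit(n) <= 100:
    n += 1

print(n, round(two_sigma_lower_limit(n - 1), 3), round(two_sigma_lower_limit(n), 3))
```

The search stops at the same n as the algebraic solution, which shows that rounding the coefficient to .98 does not change the answer here.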

4.204

Let x = tensile strength of a particular metal part. Then x is a normal random variable with 𝜇 = 25 and 𝜎 = 2. The tolerance limits are 21 and 30.

$P(x < 21) = P\left(z < \dfrac{21 - 25}{2}\right) = P(z < -2) = .5 - .4772 = .0228$ (Using Table II, Appendix D).

$P(21 < x < 30) = P\left(\dfrac{21 - 25}{2} < z < \dfrac{30 - 25}{2}\right) = P(-2 < z < 2.5) = .4772 + .4938 = .9710$

$P(x > 30) = P\left(z > \dfrac{30 - 25}{2}\right) = P(z > 2.5) = .5 - .4938 = .0062$

𝐸(Profit) = −$2(.0228) + $10(.9710) − $1(.0062) = −$.0456 + $9.71 − $.0062 = $9.66
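The expected profit is a probability-weighted average of the three payoffs, which is easy to recompute without table rounding. This is a minimal sketch assuming `statistics.NormalDist` (Python 3.8+); the payoff values −$2, $10, and −$1 are taken from the expression above, and the variable names are illustrative.

```python
from statistics import NormalDist

strength = NormalDist(mu=25, sigma=2)

p_below = strength.cdf(21)          # strength under the lower tolerance limit
p_above = 1 - strength.cdf(30)      # strength over the upper tolerance limit
p_within = 1 - p_below - p_above    # strength inside the tolerance limits

# Payoffs per part, as used in the expected-profit expression above.
expected_profit = -2 * p_below + 10 * p_within - 1 * p_above

print(round(p_below, 4), round(p_within, 4), round(p_above, 4), round(expected_profit, 2))
```

The exact normal probabilities differ from the Table II values only in the fourth decimal place, so the expected profit still rounds to the same amount.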

4.205

a.

Since there are 20 possible outcomes that are all equally likely, the probability of any of the 20 numbers is 1/20. The probability distribution of x is: 𝑃(𝑥 = 5) = 1/20 = .05; 𝑃(𝑥 = 10) = 1/20 = .05; etc.

x       5    10   15   20   25   30   35   40   45   50   55   60   65   70   75   80   85   90   95   100
p(x)    .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05

b.

𝐸(𝑥) = ∑ 𝑥𝑝(𝑥) = 5(. 05) + 10(. 05) + 15(. 05) + 20(. 05) + 25(. 05) + 30(. 05) + 35(. 05) +40(. 05) + 45(. 05) + 50(. 05) + 55(. 05) + 60(. 05) + 65(. 05) + 70(. 05) + 75(. 05) +80(. 05) + 85(. 05) + 90(. 05) + 95(. 05) + 100(. 05) = 52.5

c.

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (5 - 52.5)^2(.05) + (10 - 52.5)^2(.05) + (15 - 52.5)^2(.05) + (20 - 52.5)^2(.05) + (25 - 52.5)^2(.05) + (30 - 52.5)^2(.05) + (35 - 52.5)^2(.05) + (40 - 52.5)^2(.05) + (45 - 52.5)^2(.05) + (50 - 52.5)^2(.05) + (55 - 52.5)^2(.05) + (60 - 52.5)^2(.05) + (65 - 52.5)^2(.05) + (70 - 52.5)^2(.05) + (75 - 52.5)^2(.05) + (80 - 52.5)^2(.05) + (85 - 52.5)^2(.05) + (90 - 52.5)^2(.05) + (95 - 52.5)^2(.05) + (100 - 52.5)^2(.05) = 831.25$

$\sigma = \sqrt{831.25} = 28.83$

Since the uniform distribution is not mound-shaped, we will use Chebyshev's theorem to describe the data. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean and at least 3/4 of the observations will fall within 2 standard deviations of the mean. For this problem, 𝜇 ± 2𝜎 ⇒ 52.5 ± 2(28.83) ⇒ 52.5 ± 57.66 ⇒ (−5.16, 110.16). Thus, at least 3/4 of the data will fall between −5.16 and 110.16. For our problem, all of the observations will fall within 2 standard deviations of the mean. Thus, x is just as likely to fall within any interval of equal length.

d.

If a player spins the wheel twice, the total number of outcomes will be 20(20) = 400. The sample space is:

5, 5      10, 5      15, 5      20, 5      25, 5   ...   100, 5
5, 10     10, 10     15, 10     20, 10     25, 10  ...   100, 10
5, 15     10, 15     15, 15     20, 15     25, 15  ...   100, 15
.         .          .          .          .             .
.         .          .          .          .             .
5, 100    10, 100    15, 100    20, 100    25, 100 ...   100, 100

Each of these outcomes is equally likely, so each has a probability of 1/400 = .0025. Now, let x equal the sum of the two numbers in each sample. There is one sample with a sum of 10, two samples with a sum of 15, three samples with a sum of 20, etc. If the sum of the two numbers exceeds 100, then x is zero. The probability distribution of x is:

x      p(x)       x      p(x)
0      .5250      55     .0250
10     .0025      60     .0275
15     .0050      65     .0300
20     .0075      70     .0325
25     .0100      75     .0350
30     .0125      80     .0375
35     .0150      85     .0400
40     .0175      90     .0425
45     .0200      95     .0450
50     .0225      100    .0475

e.

We assumed that the wheel is fair, or that all outcomes are equally likely.

f.

$\mu = E(x) = \sum x p(x) = 0(.5250) + 10(.0025) + 15(.0050) + 20(.0075) + \cdots + 100(.0475) = 33.25$

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (0 - 33.25)^2(.5250) + (10 - 33.25)^2(.0025) + (15 - 33.25)^2(.0050) + (20 - 33.25)^2(.0075) + \cdots + (100 - 33.25)^2(.0475) = 1{,}471.3125$

$\sigma = \sqrt{1{,}471.3125} = 38.3577$

g.

𝑃(𝑥 = 0) = .525
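The two-spin distribution and its summary measures can be reproduced by enumerating all 400 equally likely outcomes. This is a minimal sketch, assuming a fair wheel as stated in part e; exact fractions are used so that no rounding enters the enumeration, and the variable names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

wheel = range(5, 101, 5)  # the 20 equally likely values 5, 10, ..., 100

# Tally the score distribution: the sum of two spins, or 0 if the sum exceeds 100.
dist = {}
for first, second in product(wheel, repeat=2):
    total = first + second
    score = total if total <= 100 else 0
    dist[score] = dist.get(score, Fraction(0)) + Fraction(1, 400)

mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
sd = math.sqrt(variance)

print(float(dist[0]), float(mean), float(variance), round(sd, 4))
```

The same loop, restricted to a fixed first spin, gives the conditional distributions asked for in the later parts of the exercise.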

h.

Given that the player obtains a 20 on the first spin, the possible values for x (sum of the two spins) are 0 (player spins 85, 90, 95, or 100 on the second spin), 25, 30, ..., 100. In order to get an x of 25, the player would spin a 5 on the second spin. Similarly, the player would have to spin a 10 on the second spin in order to get an x of 30, etc. Since all of the outcomes are equally likely on the second spin, the distribution of x is:

x      p(x)      x      p(x)
0      .20       65     .05
25     .05       70     .05
30     .05       75     .05
35     .05       80     .05
40     .05       85     .05
45     .05       90     .05
50     .05       95     .05
55     .05       100    .05
60     .05

i.

The probability that the player's total score will exceed one dollar is the probability that x is zero. 𝑃(𝑥 = 0) = .20

j.

Given that the player obtains a 65 on the first spin, the possible values for x (sum of the two spins) are 0 (player spins 40, 45, 50, up to 100 on second spin), 70, 75, 80, ..., 100. In order to get an x of 70, the player would spin a 5 on the second spin. Similarly, the player would have to spin a 10 on the second spin in order to get an x of 75, etc. Since all of the outcomes are equally likely on the second spin, the distribution of x is:

x       0     70    75    80    85    90    95    100
p(x)    .65   .05   .05   .05   .05   .05   .05   .05

The probability that the player's total score will exceed one dollar is the probability that x is zero. 𝑃(𝑥 = 0) = .65.

4.206

a.

For this test, 𝑛 = 20 and 𝑝 = .10. Then x is a binomial random variable with 𝑛 = 20 and 𝑝 = .10. Using Table I, Appendix D, with 𝑛 = 20 and 𝑝 = .10, 𝑃(𝑥 ≤ 1) = .392

b.

For the experiment in part a, the level of confidence is 1 − 𝑃(𝑥 ≤ 1) = 1 − .392 = .608. Since this value is not close to 1, this would not be an acceptable level.

c.

Suppose we increased n from 20 to 25. Using Table I, Appendix D, with 𝑛 = 25 and 𝑝 = .10, 𝑃(𝑥 ≤ 1) = .271. This value is smaller than the value found in part a. The level of confidence is 1 − 𝑃(𝑥 ≤ 1) = 1 − .271 = .729.

Now, suppose we keep 𝑛 = 20, but change K to 0 instead of 1. Using Table I, Appendix D, with 𝑛 = 20 and 𝑝 = .10, 𝑃(𝑥 ≤ 0) = 𝑃(𝑥 = 0) = .122. This value is, again, smaller than the value found in part a. The level of confidence is 1 − 𝑃(𝑥 = 0) = 1 − .122 = .878.

d.

Suppose we let 𝐾 = 0. Now, we need to find n such that the level of confidence ≥ .95, which means that 𝑃(𝑥 = 0) ≤ .05.

$P(x = 0) = \binom{n}{0}(.1)^0(.9)^n \le .05 \Rightarrow \dfrac{n!}{0!\,n!}(.9)^n \le .05 \Rightarrow (.9)^n \le .05 \Rightarrow \ln(.9^n) \le \ln(.05) \Rightarrow n\ln(.9) \le \ln(.05)$

$\Rightarrow n \ge \dfrac{\ln(.05)}{\ln(.9)} = \dfrac{-2.99573}{-.10536} = 28.4$

Thus, if 𝐾 = 0, then we need a sample size of 29 or larger to get a level of confidence of at least .95.
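The required sample size can also be found by searching over n directly, which avoids the logarithm step and extends to other values of K. This is a minimal sketch under the reading that K is the largest count for which 𝑃(𝑥 ≤ 𝐾) is computed and that the level of confidence is 1 − 𝑃(𝑥 ≤ 𝐾); the function names are hypothetical.

```python
from math import comb

def prob_at_most(K: int, n: int, p: float = 0.10) -> float:
    """P(x <= K) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(K + 1))

def smallest_n(K: int, confidence: float = 0.95, p: float = 0.10) -> int:
    """Smallest n for which 1 - P(x <= K) reaches the requested level of confidence."""
    n = K + 1
    while 1 - prob_at_most(K, n, p) < confidence:
        n += 1
    return n

print(smallest_n(0), round(prob_at_most(0, 29), 4))
```

Because 𝑃(𝑥 ≤ 𝐾) decreases steadily as n grows, the first n that meets the requirement is the minimum, and calling `smallest_n` with a different K repeats the search for that case.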

Now, suppose K = 1. Now, we need to find n such that the level of confidence is at least .95, which means that 𝑃(𝑥 ≤ 1) ≤ .05.

$P(x \le 1) = P(x = 0) + P(x = 1) = \binom{n}{0}(.1)^0(.9)^n + \binom{n}{1}(.1)^1(.9)^{n-1} \le .05$

$\Rightarrow \dfrac{n!}{0!\,n!}(.9)^n + \dfrac{n!}{1!\,(n-1)!}(.1)(.9)^{n-1} \le .05 \Rightarrow (.9)^n + n(.1)(.9)^{n-1} \le .05 \Rightarrow (.9)^{n-1}\left[(.9) + (.1)n\right] \le .05$

From here, we will use trial and error.

n      $(.9)^{n-1}[(.9) + (.1)n]$
30     $(.9)^{29}[(.9) + (.1)(30)] = .1837$
40     $(.9)^{39}[(.9) + (.1)(40)] = .0805$
45     $(.9)^{44}[(.9) + (.1)(45)] = .0524$
46     $(.9)^{45}[(.9) + (.1)(46)] = .0480$

Thus, for 𝐾 = 1, we would need a sample size of 46 to get a level of confidence of at least .95.

4.207

a.

Using Table II, Appendix D. For 𝜎 = 1:

$P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11)$
$= P\left(\dfrac{-1 - 5}{1} < z < \dfrac{1 - 5}{1}\right) + P\left(\dfrac{4 - 5}{1} < z < \dfrac{6 - 5}{1}\right) + P\left(\dfrac{9 - 5}{1} < z < \dfrac{11 - 5}{1}\right)$
$= P(-6 < z < -4) + P(-1 < z < 1) + P(4 < z < 6) = 0 + .3413 + .3413 + 0 = .6826$

For 𝜎 = 2:

$P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11)$
$= P\left(\dfrac{-1 - 5}{2} < z < \dfrac{1 - 5}{2}\right) + P\left(\dfrac{4 - 5}{2} < z < \dfrac{6 - 5}{2}\right) + P\left(\dfrac{9 - 5}{2} < z < \dfrac{11 - 5}{2}\right)$
$= P(-3 < z < -2) + P(-.5 < z < .5) + P(2 < z < 3) = (.4987 - .4772) + (.1915 + .1915) + (.4987 - .4772) = .4260$

For 𝜎 = 4:

$P(-1 < x < 1) + P(4 < x < 6) + P(9 < x < 11)$
$= P\left(\dfrac{-1 - 5}{4} < z < \dfrac{1 - 5}{4}\right) + P\left(\dfrac{4 - 5}{4} < z < \dfrac{6 - 5}{4}\right) + P\left(\dfrac{9 - 5}{4} < z < \dfrac{11 - 5}{4}\right)$
$= P(-1.5 < z < -1) + P(-.25 < z < .25) + P(1 < z < 1.5) = (.4332 - .3413) + (.0948 + .0948) + (.4332 - .3413) = .3734$


b.

For 𝜎 = 1, 764 of the 1100 flechettes hit a target. The proportion is 764/1100 = .6945. This is a little higher than the probability that was computed in part a. For 𝜎 = 2, 462 of the 1100 flechettes hit a target. The proportion is 462/1100 = .42. This is very close to the probability that was computed in part a. For 𝜎 = 4, 408 of the 1100 flechettes hit a target. The proportion is 408/1100 = .3709. Again, this is very close to the probability that was computed in part a.

c.

If the Army wants to maximize the chance of hitting the target that the prototype gun is aimed at, then 𝜎 should be set at 1. The probability of hitting the target is .6826. If the Army wants to hit multiple targets with a single shot of the weapon, then 𝜎 should be set at 2. The probability of hitting at least one of the targets is .4260.

4.208

Let x = number of disasters in 25 trials. If NASA’s assessment is correct, then x is a binomial random variable with 𝑛 = 25 and 𝑝 = 1/60,000 = .00001667. If the Air Force’s assessment is correct, then x is a binomial random variable with 𝑛 = 25 and 𝑝 = 1/35 = .02857.

If NASA’s assessment is correct, then the probability of no disasters in 25 missions would be:

$P(x = 0) = \binom{25}{0}(1/60{,}000)^0(59{,}999/60{,}000)^{25} = .9996$

Thus, the probability of at least one disaster would be 𝑃(𝑥 ≥ 1) = 1 − 𝑃(𝑥 = 0) = 1 − .9996 = .0004

If the Air Force’s assessment is correct, then the probability of no disasters in 25 missions would be:

$P(x = 0) = \binom{25}{0}(1/35)^0(34/35)^{25} = .4845$

Thus, the probability of at least one disaster would be 𝑃(𝑥 ≥ 1) = 1 − 𝑃(𝑥 = 0) = 1 − .4845 = .5155 One disaster actually did occur. If NASA’s assessment was correct, it would be almost impossible for at least one disaster to occur in 25 trials. If the Air Force’s assessment was correct, one disaster in 25 trials would not be an unusual event. Thus, the Air Force’s assessment appears to be appropriate.
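The comparison of the two risk assessments rests on a single complement calculation, which can be checked in a few lines. This is a minimal sketch assuming independent missions with a constant disaster probability, as the binomial model above requires; the function name is hypothetical.

```python
def prob_at_least_one(p: float, n: int = 25) -> float:
    """P(x >= 1) = 1 - (1 - p)^n for a binomial random variable."""
    return 1 - (1 - p) ** n

nasa = prob_at_least_one(1 / 60_000)   # NASA's assessed per-mission risk
air_force = prob_at_least_one(1 / 35)  # Air Force's assessed per-mission risk

print(round(nasa, 4), round(air_force, 4))
```

Only the x = 0 term of the binomial distribution is needed, since "at least one" is the complement of "none".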



Chapter 5 Sampling Distributions 5.1

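a–b. The different samples of 𝑛 = 2 with replacement and their means are:

Possible Samples    𝑥̄        Possible Samples    𝑥̄
0, 0                0        4, 0                2
0, 2                1        4, 2                3
0, 4                2        4, 4                4
0, 6                3        4, 6                5
2, 0                1        6, 0                3
2, 2                2        6, 2                4
2, 4                3        6, 4                5
2, 6                4        6, 6                6

c.

Since each sample is equally likely, the probability of any one being selected is $\frac{1}{4}\left(\frac{1}{4}\right) = \frac{1}{16}$.

d.

$P(\bar{x} = 0) = \frac{1}{16}$
$P(\bar{x} = 1) = \frac{1}{16} + \frac{1}{16} = \frac{2}{16}$
$P(\bar{x} = 2) = \frac{1}{16} + \frac{1}{16} + \frac{1}{16} = \frac{3}{16}$
$P(\bar{x} = 3) = \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} = \frac{4}{16}$
$P(\bar{x} = 4) = \frac{1}{16} + \frac{1}{16} + \frac{1}{16} = \frac{3}{16}$
$P(\bar{x} = 5) = \frac{1}{16} + \frac{1}{16} = \frac{2}{16}$
$P(\bar{x} = 6) = \frac{1}{16}$

𝑥̄       0      1      2      3      4      5      6
𝑝(𝑥̄)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

e.

Using MINITAB, the graph is:

[Histogram of x-bar: Probability versus x-bar]

5.2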

Answers will vary. Using a statistical package, 100 samples of size 2 with replacement were generated from the population containing 0, 2, 4, and 6. The sample mean was computed for each of the 100 samples of size 2. The relative frequency distribution for these 100 sample means is:

𝑥̄     Frequency    Relative frequency    𝑝(𝑥̄)
0     4            .04                   1/16 = .0625
1     15           .15                   2/16 = .1250
2     17           .17                   3/16 = .1875
3     30           .30                   4/16 = .2500
4     21           .21                   3/16 = .1875
5     10           .10                   2/16 = .1250
6     3            .03                   1/16 = .0625

The exact distribution is in the last column headed with 𝑝(𝑥̄ ). The relative frequencies from this sample are similar to the probabilities from the exact distribution. 5.3

If the observations are independent of each other, then

𝑃(1, 1) = 𝑝(1)𝑝(1) = .2(.2) = .04
𝑃(1, 2) = 𝑝(1)𝑝(2) = .2(.3) = .06
𝑃(1, 3) = 𝑝(1)𝑝(3) = .2(.2) = .04
etc.

a.

Possible Samples    𝑥̄      𝑝(𝑥̄)      Possible Samples    𝑥̄      𝑝(𝑥̄)
1, 1                1      .04       3, 4                3.5    .04
1, 2                1.5    .06       3, 5                4      .02
1, 3                2      .04       4, 1                2.5    .04
1, 4                2.5    .04       4, 2                3      .06
1, 5                3      .02       4, 3                3.5    .04
2, 1                1.5    .06       4, 4                4      .04
2, 2                2      .09       4, 5                4.5    .02
2, 3                2.5    .06       5, 1                3      .02
2, 4                3      .06       5, 2                3.5    .03
2, 5                3.5    .03       5, 3                4      .02
3, 1                2      .04       5, 4                4.5    .02
3, 2                2.5    .06       5, 5                5      .01
3, 3                3      .04

Summing the probabilities, the probability distribution of 𝑥̄ is:

𝑥̄       1     1.5   2     2.5   3     3.5   4     4.5   5
𝑝(𝑥̄)    .04   .12   .17   .20   .20   .14   .08   .04   .01

b.

Using MINITAB, the graph is:

[Histogram of x-bar: Probability versus x-bar]

c.

𝑃(𝑥̄ ≥ 4.5) = .04 + .01 = .05

d.

No. The probability of observing 𝑥̄ = 4.5 or larger is small (.05).

5.4

𝐸(𝑥) = 𝜇 = ∑ 𝑥𝑝(𝑥) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = .2 + .6 + .6 + .8 + .5 = 2.7

𝐸(𝑥̄) = ∑ 𝑥̄𝑝(𝑥̄) = 1.0(.04) + 1.5(.12) + 2.0(.17) + 2.5(.20) + 3.0(.20) + 3.5(.14) + 4.0(.08) + 4.5(.04) + 5.0(.01) = .04 + .18 + .34 + .50 + .60 + .49 + .32 + .18 + .05 = 2.7
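The sampling distribution of the sample mean used above can be built by enumerating every ordered sample of size 2. This is a minimal sketch, assuming independent draws from the population distribution p(1) = .2, p(2) = .3, p(3) = .2, p(4) = .2, p(5) = .1; exact fractions are used so the comparison of the two expectations is free of rounding, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Population distribution of x.
population = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
              4: Fraction(2, 10), 5: Fraction(1, 10)}

# Sampling distribution of the mean of n = 2 independent observations.
sampling = {}
for (x1, p1), (x2, p2) in product(population.items(), repeat=2):
    xbar = Fraction(x1 + x2, 2)
    sampling[xbar] = sampling.get(xbar, Fraction(0)) + p1 * p2

mu = sum(x * p for x, p in population.items())
expected_xbar = sum(xbar * p for xbar, p in sampling.items())

for xbar in sorted(sampling):
    print(float(xbar), float(sampling[xbar]))
print(float(mu), float(expected_xbar))
```

Changing `repeat=2` to a larger sample size, and the divisor to match, extends the same enumeration to samples of size 3, at the cost of more outcomes to tally.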

5.5

a.

For a sample of size 𝑛 = 2, the sample mean and sample median are exactly the same. Thus, the sampling distribution of the sample median is the same as that for the sample mean (see Exercise 5.3a).

b.

The probability histogram for the sample median is identical to that for the sample mean (see Exercise 5.3b).

5.6

a.

Answers will vary. A statistical package was used to generate 500 samples of size 15 from a uniform distribution on the interval from 150 to 200. The sample mean was computed for each sample of size 15. Using MINITAB, a histogram of the sample means is:

[Histogram of Mean: Frequency versus Mean]

b.

The sample medians were computed for each of the 500 samples of size 15 used in part a. Using MINITAB, a histogram of the sample medians is:

[Histogram of Median: Frequency versus Median]

The sampling distribution of the sample medians is more spread out than the sampling distribution of the sample means. In addition, there are more observations in the middle of the distribution of the sample means than the distribution of the sample medians.

5.7

a.

Answers will vary. A statistical package was used to generate 500 samples of size 25 from a uniform distribution on the interval from 00 to 99. The sample mean was computed for each sample of size 25. Using MINITAB, a histogram of the sample means is:

[Histogram of Mean: Frequency versus Mean]

b.

The sample variances were computed for each of the 500 samples of size 25 used in part a. Using MINITAB, a histogram of the sample variances is:

[Histogram of Variance: Frequency versus Variance]

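5.8

a.

$\mu = \sum x p(x) = 0\left(\frac{1}{3}\right) + 1\left(\frac{1}{3}\right) + 4\left(\frac{1}{3}\right) = \frac{5}{3} = 1.667$

$\sigma^2 = \sum (x - \mu)^2 p(x) = \left(0 - \frac{5}{3}\right)^2\left(\frac{1}{3}\right) + \left(1 - \frac{5}{3}\right)^2\left(\frac{1}{3}\right) + \left(4 - \frac{5}{3}\right)^2\left(\frac{1}{3}\right) = \frac{78}{27} = 2.889$

b.

Sample    𝑥̄      Probability
0, 0      0      1/9
0, 1      0.5    1/9
0, 4      2      1/9
1, 0      0.5    1/9
1, 1      1      1/9
1, 4      2.5    1/9
4, 0      2      1/9
4, 1      2.5    1/9
4, 4      4      1/9

𝑥̄              0      0.5    1      2      2.5    4
Probability    1/9    2/9    1/9    2/9    2/9    1/9

c.

$E(\bar{x}) = \sum \bar{x} p(\bar{x}) = 0\left(\frac{1}{9}\right) + 0.5\left(\frac{2}{9}\right) + 1\left(\frac{1}{9}\right) + 2\left(\frac{2}{9}\right) + 2.5\left(\frac{2}{9}\right) + 4\left(\frac{1}{9}\right) = \frac{15}{9} = 1.667$

Since 𝐸(𝑥̄) = 𝜇, 𝑥̄ is an unbiased estimator for 𝜇.

d.

Recall that $s^2 = \dfrac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$

For the first sample, $s^2 = \dfrac{0^2 + 0^2 - \frac{(0 + 0)^2}{2}}{2 - 1} = 0$.

For the second sample, $s^2 = \dfrac{0^2 + 1^2 - \frac{(0 + 1)^2}{2}}{2 - 1} = \dfrac{1 - \frac{1}{2}}{1} = 0.5$

The rest of the values are shown in the table below.

Sample    s²     Probability
0, 0      0      1/9
0, 1      0.5    1/9
0, 4      8      1/9
1, 0      0.5    1/9
1, 1      0      1/9
1, 4      4.5    1/9
4, 0      8      1/9
4, 1      4.5    1/9
4, 4      0      1/9

The sampling distribution of s² is:

s²             0      0.5    4.5    8
Probability    3/9    2/9    2/9    2/9

e.

$E(s^2) = \sum s^2 p(s^2) = 0\left(\frac{3}{9}\right) + .5\left(\frac{2}{9}\right) + 4.5\left(\frac{2}{9}\right) + 8\left(\frac{2}{9}\right) = \frac{26}{9} = 2.889$

Since $E(s^2) = \sigma^2$, $s^2$ is an unbiased estimator for $\sigma^2$.

5.9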

a.

$\mu = \sum x p(x) = 2\left(\frac{1}{3}\right) + 4\left(\frac{1}{3}\right) + 9\left(\frac{1}{3}\right) = \frac{15}{3} = 5$

b.

The possible samples of size 𝑛 = 3, the sample means, and the probabilities are:

Possible Samples    𝑥̄       𝑝(𝑥̄)    m      Possible Samples    𝑥̄       𝑝(𝑥̄)    m
2, 2, 2             2       1/27    2      4, 4, 4             4       1/27    4
2, 2, 4             8/3     1/27    2      4, 4, 9             17/3    1/27    4
2, 2, 9             13/3    1/27    2      4, 9, 2             5       1/27    4
2, 4, 2             8/3     1/27    2      4, 9, 4             17/3    1/27    4
2, 4, 4             10/3    1/27    4      4, 9, 9             22/3    1/27    9
2, 4, 9             5       1/27    4      9, 2, 2             13/3    1/27    2
2, 9, 2             13/3    1/27    2      9, 2, 4             5       1/27    4
2, 9, 4             5       1/27    4      9, 2, 9             20/3    1/27    9
2, 9, 9             20/3    1/27    9      9, 4, 2             5       1/27    4
4, 2, 2             8/3     1/27    2      9, 4, 4             17/3    1/27    4
4, 2, 4             10/3    1/27    4      9, 4, 9             22/3    1/27    9
4, 2, 9             5       1/27    4      9, 9, 2             20/3    1/27    9
4, 4, 2             10/3    1/27    4      9, 9, 4             22/3    1/27    9
                                           9, 9, 9             9       1/27    9

The sampling distribution of 𝑥̄ is:

𝑥̄       𝑝(𝑥̄)
2       1/27
8/3     3/27
10/3    3/27
4       1/27
13/3    3/27
5       6/27
17/3    3/27
20/3    3/27
22/3    3/27
9       1/27
        27/27

$E(\bar{x}) = \sum \bar{x} p(\bar{x}) = 2\left(\frac{1}{27}\right) + \frac{8}{3}\left(\frac{3}{27}\right) + \frac{10}{3}\left(\frac{3}{27}\right) + 4\left(\frac{1}{27}\right) + \frac{13}{3}\left(\frac{3}{27}\right) + 5\left(\frac{6}{27}\right) + \frac{17}{3}\left(\frac{3}{27}\right) + \frac{20}{3}\left(\frac{3}{27}\right) + \frac{22}{3}\left(\frac{3}{27}\right) + 9\left(\frac{1}{27}\right)$
$= \frac{2}{27} + \frac{8}{27} + \frac{10}{27} + \frac{4}{27} + \frac{13}{27} + \frac{30}{27} + \frac{17}{27} + \frac{20}{27} + \frac{22}{27} + \frac{9}{27} = \frac{135}{27} = 5$

Since 𝜇 = 5 in part a, and 𝐸(𝑥̄) = 𝜇 = 5, 𝑥̄ is an unbiased estimator of 𝜇.

c.

The median was calculated for each sample and is shown in the table in part b. The sampling distribution of m is:

m       p(m)
2       7/27
4       13/27
9       7/27
        27/27

$E(m) = \sum m\,p(m) = 2\left(\frac{7}{27}\right) + 4\left(\frac{13}{27}\right) + 9\left(\frac{7}{27}\right) = \frac{14}{27} + \frac{52}{27} + \frac{63}{27} = \frac{129}{27} = 4.778$

The 𝐸(𝑚) = 4.778 ≠ 𝜇 = 5. Thus, m is a biased estimator of 𝜇.

d.

Use the sample mean, 𝑥̄. It is an unbiased estimator.

5.10

a.

$\mu = \sum x p(x) = 0\left(\frac{1}{3}\right) + 1\left(\frac{1}{3}\right) + 2\left(\frac{1}{3}\right) = 1$

b.

Sample     $\bar{x}$    Probability     Sample     $\bar{x}$    Probability
0, 0, 0    0      1/27            1, 1, 2    4/3    1/27
0, 0, 1    1/3    1/27            1, 2, 0    1      1/27
0, 0, 2    2/3    1/27            1, 2, 1    4/3    1/27
0, 1, 0    1/3    1/27            1, 2, 2    5/3    1/27
0, 1, 1    2/3    1/27            2, 0, 0    2/3    1/27
0, 1, 2    1      1/27            2, 0, 1    1      1/27
0, 2, 0    2/3    1/27            2, 0, 2    4/3    1/27
0, 2, 1    1      1/27            2, 1, 0    1      1/27
0, 2, 2    4/3    1/27            2, 1, 1    4/3    1/27
1, 0, 0    1/3    1/27            2, 1, 2    5/3    1/27
1, 0, 1    2/3    1/27            2, 2, 0    4/3    1/27
1, 0, 2    1      1/27            2, 2, 1    5/3    1/27
1, 1, 0    2/3    1/27            2, 2, 2    2      1/27
1, 1, 1    1      1/27

From the above table, the sampling distribution of the sample mean would be:

$\bar{x}$      Probability
0       1/27
1/3     3/27
2/3     6/27
1       7/27
4/3     6/27
5/3     3/27
2       1/27

c.

Sample     m    Probability     Sample     m    Probability
0, 0, 0    0    1/27            1, 1, 2    1    1/27
0, 0, 1    0    1/27            1, 2, 0    1    1/27
0, 0, 2    0    1/27            1, 2, 1    1    1/27
0, 1, 0    0    1/27            1, 2, 2    2    1/27
0, 1, 1    1    1/27            2, 0, 0    0    1/27
0, 1, 2    1    1/27            2, 0, 1    1    1/27
0, 2, 0    0    1/27            2, 0, 2    2    1/27
0, 2, 1    1    1/27            2, 1, 0    1    1/27
0, 2, 2    2    1/27            2, 1, 1    1    1/27
1, 0, 0    0    1/27            2, 1, 2    2    1/27
1, 0, 1    1    1/27            2, 2, 0    2    1/27
1, 0, 2    1    1/27            2, 2, 1    2    1/27
1, 1, 0    1    1/27            2, 2, 2    2    1/27
1, 1, 1    1    1/27

From the table, the sampling distribution of the sample median would be:

m       Probability
0       7/27
1       13/27
2       7/27

d.

$E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 0(1/27) + (1/3)(3/27) + (2/3)(6/27) + 1(7/27) + (4/3)(6/27) + (5/3)(3/27) + 2(1/27) = 1$

Since $E(\bar{x}) = \mu$, $\bar{x}$ is an unbiased estimator for $\mu$.

$E(m) = \sum m\,p(m) = 0(7/27) + 1(13/27) + 2(7/27) = 1$

Since $E(m) = \mu$, m is an unbiased estimator for $\mu$.

e.

$\sigma_{\bar{x}}^2 = \sum (\bar{x} - \mu)^2 p(\bar{x}) = (0 - 1)^2(1/27) + (1/3 - 1)^2(3/27) + (2/3 - 1)^2(6/27) + (1 - 1)^2(7/27) + (4/3 - 1)^2(6/27) + (5/3 - 1)^2(3/27) + (2 - 1)^2(1/27) = 6/27 = .2222$

$\sigma_m^2 = \sum (m - 1)^2 p(m) = (0 - 1)^2(7/27) + (1 - 1)^2(13/27) + (2 - 1)^2(7/27) = 14/27 = .5185$

f.

Since both the sample mean and median are unbiased estimators and the variance is smaller for the sample mean, the sample mean would be the preferred estimator of $\mu$.

5.11

Answers will vary. MINITAB was used to generate 500 samples of size n = 25 observations from a uniform population from 1 to 50. The first 10 samples along with the sample means and medians are shown in the table below:

Sample  Mean   Median  Observations
1       29.08  31      28 27 11 19 50 30 47 26 9 33 50 15 21 41 31 41 35 32 32 17 6 32 39 34 21
2       25.88  26      8 4 32 32 3 45 18 9 40 3 42 21 44 50 42 14 24 10 36 6 15 47 26 48 28
3       28.24  28      6 20 27 1 50 14 21 37 46 23 1 34 42 47 24 46 8 29 18 28 40 39 49 33 23
4       27.40  26      45 12 26 13 40 17 11 43 8 35 20 8 44 48 13 46 49 17 47 27 5 45 9 21 36
5       24.88  23      40 38 25 37 47 2 17 40 32 6 22 30 23 2 18 22 14 6 22 3 43 47 16 35 35
6       24.20  23      17 8 43 27 21 5 18 45 31 15 2 38 22 18 7 9 3 35 23 45 24 39 38 35 37
7       23.16  22      40 1 22 29 6 8 22 20 36 18 45 16 29 9 6 3 49 34 24 40 27 5 49 11 30
8       25.84  29      25 3 44 34 29 6 33 32 43 6 43 24 49 14 37 8 46 44 1 12 36 18 30 25 4
9       22.88  19      7 33 36 41 30 13 17 19 14 36 20 39 41 20 15 38 12 37 14 9 19 2 37 15 8
10      23.88  23      4 46 49 49 45 49 24 3 25 22 27 28 23 17 14 6 35 5 20 34 4 41 9 15 3

Using MINITAB, side-by-side histograms of the means and medians of the 500 samples are:

[Figure: Histogram of Mean, Median. Two panels (Mean, Median); horizontal axis from 12 to 42; vertical axis: Frequency, 0 to 140.]

a.

Yes, it appears that $\bar{x}$ and the median are unbiased estimators of the population mean. The centers of both distributions above appear to be around 25 to 26. In fact, the mean of the sampling distribution of $\bar{x}$ is 25.65 and the mean of the sampling distribution of the median is 25.73.

b.

The sampling distribution of the median has greater variation because it is more spread out than the sampling distribution of $\bar{x}$.

5.12

a.

The mean of the random variable x is

$E(x) = \mu = \sum x\,p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = 2.7$

From Exercise 5.3, the sampling distribution of $\bar{x}$ is:

$\bar{x}$:      1     1.5   2     2.5   3     3.5   4     4.5   5
$p(\bar{x})$:   .04   .12   .17   .20   .20   .14   .08   .04   .01

The mean of the sampling distribution of $\bar{x}$ is:

$E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 1(.04) + 1.5(.12) + 2(.17) + 2.5(.20) + 3(.20) + 3.5(.14) + 4(.08) + 4.5(.04) + 5(.01) = 2.7$

Since $E(\bar{x}) = E(x) = \mu$, $\bar{x}$ is an unbiased estimator of $\mu$.

b.

The variance of the sampling distribution of $\bar{x}$ is:

$\sigma_{\bar{x}}^2 = \sum (\bar{x} - \mu)^2 p(\bar{x}) = (1 - 2.7)^2(.04) + (1.5 - 2.7)^2(.12) + (2 - 2.7)^2(.17) + (2.5 - 2.7)^2(.20) + (3 - 2.7)^2(.20) + (3.5 - 2.7)^2(.14) + (4 - 2.7)^2(.08) + (4.5 - 2.7)^2(.04) + (5 - 2.7)^2(.01) = .805$

c.

$\mu \pm 2\sigma_{\bar{x}} \Rightarrow 2.7 \pm 2\sqrt{.805} \Rightarrow 2.7 \pm 1.794 \Rightarrow (.906, 4.494)$

$P(.906 \le \bar{x} \le 4.494) = .04 + .12 + .17 + .20 + .20 + .14 + .08 = .95$
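As a cross-check on parts a through c, the mean, the variance, and the two-standard-deviation probability can be recomputed directly from the tabulated distribution. The following is a minimal Python sketch; the dictionary `xbar_dist` and the helper names `dist_mean` and `dist_variance` are illustrative choices, not part of the exercise.

```python
from math import sqrt

# Sampling distribution of x-bar for n = 2, copied from the table in part a.
xbar_dist = {1: .04, 1.5: .12, 2: .17, 2.5: .20, 3: .20,
             3.5: .14, 4: .08, 4.5: .04, 5: .01}


def dist_mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def dist_variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = dist_mean(dist)
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())


mu_xbar = dist_mean(xbar_dist)        # part a: E(x-bar)
var_xbar = dist_variance(xbar_dist)   # part b: variance of x-bar
half_width = 2 * sqrt(var_xbar)       # part c: two standard deviations

# Probability that x-bar falls inside mu +/- 2 standard deviations.
prob_within = sum(prob for value, prob in xbar_dist.items()
                  if mu_xbar - half_width <= value <= mu_xbar + half_width)

print(round(mu_xbar, 3), round(var_xbar, 3), round(prob_within, 2))
```

Only the value 4.5 and the value 5 lie outside the interval, which is why the probabilities for 1 through 4 are the ones summed in part c.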

5.13

a.

Refer to the solution to Exercise 5.3. The values of $s^2$ and the corresponding probabilities are listed below, where

$s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$

For sample 1, 1: $s^2 = \dfrac{(1^2 + 1^2) - \frac{(1+1)^2}{2}}{2-1} = 0$

For sample 1, 2: $s^2 = \dfrac{(1^2 + 2^2) - \frac{(1+2)^2}{2}}{2-1} = .5$

The rest of the values are calculated and shown:

s2:      0.0   0.5   2.0   4.5   8.0   0.5   0.0   0.5   2.0   4.5   2.0   0.5   0.0
p(s2):   .04   .06   .04   .04   .02   .06   .09   .06   .06   .03   .04   .06   .04

s2:      0.5   2.0   4.5   2.0   0.5   0.0   0.5   8.0   4.5   2.0   0.5   0.0
p(s2):   .04   .02   .04   .06   .04   .04   .02   .02   .03   .02   .02   .01

The sampling distribution of $s^2$ is:

s2:      0.0   0.5   2.0   4.5   8.0
p(s2):   .22   .36   .24   .14   .04

b.

$\sigma^2 = \sum (x - \mu)^2 p(x) = (1 - 2.7)^2(.2) + (2 - 2.7)^2(.3) + (3 - 2.7)^2(.2) + (4 - 2.7)^2(.2) + (5 - 2.7)^2(.1) = 1.61$

c.

$E(s^2) = \sum s^2 p(s^2) = 0(.22) + .5(.36) + 2(.24) + 4.5(.14) + 8(.04) = 1.61$

d.

The sampling distribution of s is listed below, where $s = \sqrt{s^2}$:

s:       0.000   0.707   1.414   2.121   2.828
p(s):    .22     .36     .24     .14     .04

e.

$E(s) = \sum s\,p(s) = 0(.22) + .707(.36) + 1.414(.24) + 2.121(.14) + 2.828(.04) = 1.00394$

Since $E(s) = 1.00394$ is not equal to $\sigma = \sqrt{\sigma^2} = \sqrt{1.61} = 1.269$, s is a biased estimator of $\sigma$.
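The enumeration behind parts a through e can be sketched in a few lines of Python. This is an illustrative check under the assumption that the two observations are drawn independently from the population of Exercise 5.3 (values 1 to 5 with probabilities .2, .3, .2, .2, .1); the variable names are hypothetical. Because the code does not round s to three decimals, its value of E(s) differs from 1.00394 in the fourth decimal place.

```python
from itertools import product
from math import sqrt
from statistics import variance

# Population distribution assumed from Exercise 5.3.
population = {1: .2, 2: .3, 3: .2, 4: .2, 5: .1}

mu = sum(x * p for x, p in population.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in population.items())

expected_s2 = 0.0
expected_s = 0.0
# Enumerate all 25 ordered samples of size n = 2.
for (x1, p1), (x2, p2) in product(population.items(), repeat=2):
    s2 = variance([x1, x2])      # sample variance with divisor n - 1
    prob = p1 * p2               # independent draws
    expected_s2 += s2 * prob
    expected_s += sqrt(s2) * prob

print(round(sigma2, 2), round(expected_s2, 2))      # both 1.61: s^2 is unbiased
print(round(expected_s, 3), round(sqrt(sigma2), 3)) # E(s) falls below sigma
```

The design point is that unbiasedness of $s^2$ does not carry over to s, because the square root is a nonlinear transformation.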


5.14

The mean of the random variable x is:

$E(x) = \mu = \sum x\,p(x) = 1(.2) + 2(.3) + 3(.2) + 4(.2) + 5(.1) = 2.7$

From Exercise 5.5, the sampling distribution of the sample median is:

m:      1     1.5   2     2.5   3     3.5   4     4.5   5
p(m):   .04   .12   .17   .20   .20   .14   .08   .04   .01

The mean of the sampling distribution of the sample median m is:

$E(m) = \sum m\,p(m) = 1(.04) + 1.5(.12) + 2(.17) + 2.5(.20) + 3(.20) + 3.5(.14) + 4(.08) + 4.5(.04) + 5(.01) = 2.7$

Since $E(m) = \mu$, m is an unbiased estimator of $\mu$.

5.15

The sampling distribution is approximately normal only if the sample size is sufficiently large or if the population being sampled from is normal.
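A small simulation can illustrate the sample-size condition. The sketch below is an assumption-laden illustration, not part of the exercise: it draws from an exponential population (chosen only because it is strongly right-skewed), uses a fixed seed, and measures how the skewness of the simulated sample means shrinks as n grows. The function names `skewness` and `simulate_means` are hypothetical.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment-based skewness; approximately 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def simulate_means(n, reps=2000, seed=1):
    """Simulate `reps` sample means of size n from an exponential population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


skew_small = skewness(simulate_means(2))    # small n: still clearly skewed
skew_large = skewness(simulate_means(50))   # larger n: much closer to symmetric

print(round(skew_small, 2), round(skew_large, 2))
```

If the population itself were normal, the sample means would be normal for every n, so no such shrinkage would be needed.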

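5.16

a.

$\mu_{\bar{x}} = \mu = 10$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 3/\sqrt{25} = 0.6$

b.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 25/\sqrt{25} = 5$

c.

$\mu_{\bar{x}} = \mu = 20$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 40/\sqrt{25} = 8$

d.

$\mu_{\bar{x}} = \mu = 10$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{25} = 20$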
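The four parts apply the same two rules: the mean of the sampling distribution equals the population mean, and the standard error is the population standard deviation divided by the square root of n. A minimal Python sketch follows; the function name `sampling_distribution_params` and the `cases` dictionary are illustrative.

```python
from math import sqrt


def sampling_distribution_params(mu, sigma, n):
    """Return (mean, standard error) of the sampling distribution of x-bar."""
    return mu, sigma / sqrt(n)


# (mu, sigma) for parts a through d; n = 25 in every part.
cases = {"a": (10, 3), "b": (100, 25), "c": (20, 40), "d": (10, 100)}

results = {part: sampling_distribution_params(mu, sigma, 25)
           for part, (mu, sigma) in cases.items()}

for part, (mean, std_err) in results.items():
    print(part, mean, std_err)
```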

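5.17

a.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 5$

b.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2$

c.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1$

d.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.414$

e.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .447$

f.

$\mu_{\bar{x}} = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .316$

5.18

a.

$\mu_{\bar{x}} = \mu = 20$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{64} = 2$

b.

By the Central Limit Theorem, the distribution of $\bar{x}$ is approximately normal. In order for the Central Limit Theorem to apply, n must be sufficiently large. For this problem, $n = 64$ is sufficiently large.

c.

$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{15.5 - 20}{2} = -2.25$

d.

$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{23 - 20}{2} = 1.50$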
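The standardization in parts c and d of Exercise 5.18 is the step that turns a statement about the sample mean into a standard normal lookup. The sketch below is a hedged illustration in Python: it uses `statistics.NormalDist` in place of Table II, so its probabilities can differ from table-based answers in the fourth decimal place, and the helper name `z_score` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_xbar = 20
sigma_xbar = 16 / sqrt(64)     # standard error from part a of Exercise 5.18


def z_score(xbar):
    """Standardize a sample mean using the sampling distribution of x-bar."""
    return (xbar - mu_xbar) / sigma_xbar


z = z_score(23)                             # part d
upper_tail = 1 - NormalDist().cdf(z)        # P(x-bar > 23) = P(z > 1.50)

print(z, round(upper_tail, 4))
```

The same two lines, with a different cutoff, reproduce each of the table lookups that follow from these z-values.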

5.19

In Exercise 5.18, it was determined that the mean and standard deviation of the sampling distribution of the sample mean are 20 and 2 respectively. Using Table II, Appendix D:

a.

$P(\bar{x} < 16) = P\left(z < \dfrac{16 - 20}{2}\right) = P(z < -2) = .5 - .4772 = .0228$

b.

$P(\bar{x} > 23) = P\left(z > \dfrac{23 - 20}{2}\right) = P(z > 1.50) = .5 - .4332 = .0668$

c.

$P(\bar{x} > 25) = P\left(z > \dfrac{25 - 20}{2}\right) = P(z > 2.5) = .5 - .4938 = .0062$

d.

$P(16 < \bar{x} < 22) = P\left(\dfrac{16 - 20}{2} < z < \dfrac{22 - 20}{2}\right) = P(-2 < z < 1) = .4772 + .3413 = .8185$

e.

$P(\bar{x} < 14) = P\left(z < \dfrac{14 - 20}{2}\right) = P(z < -3) = .5 - .4987 = .0013$

5.20

For this population and sample size, $E(\bar{x}) = \mu = 100$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 10/\sqrt{900} = 1/3$

a.

Almost all of the time, the sample mean will be within three standard deviations of the mean, i.e., $\mu \pm 3\sigma_{\bar{x}} \Rightarrow 100 \pm 3(1/3) \Rightarrow 100 \pm 1 \Rightarrow (99, 101)$. Thus, the smallest value of $\bar{x}$ we would expect is 99 and the largest value would be 101.

b.

No more than three standard deviations, i.e., $3\sigma_{\bar{x}} = 3(1/3) = 1$

c.

No, the previous answer only depended on the standard deviation of the sampling distribution of the sample mean, not the mean itself.

5.21

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu = 30$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{100} = 1.6$. Using Table II, Appendix D:

a.

$P(\bar{x} \ge 28) = P\left(z \ge \dfrac{28 - 30}{1.6}\right) = P(z \ge -1.25) = .5 + .3944 = .8944$

b.

$P(22.1 \le \bar{x} \le 26.8) = P\left(\dfrac{22.1 - 30}{1.6} \le z \le \dfrac{26.8 - 30}{1.6}\right) = P(-4.94 \le z \le -2) = .5 - .4772 = .0228$

c.

$P(\bar{x} \le 28.2) = P\left(z \le \dfrac{28.2 - 30}{1.6}\right) = P(z \le -1.13) = .5 - .3708 = .1292$

d.

$P(\bar{x} \ge 27.0) = P\left(z \ge \dfrac{27.0 - 30}{1.6}\right) = P(z \ge -1.88) = .5 + .4699 = .9699$

5.22

Answers will vary. A computer package was used to generate 500 samples of size 𝑛 = 2. The sample mean was computed for each of the 500 samples. This was repeated for 500 samples of size 𝑛 = 5, 500 samples of size 𝑛 = 10, 500 samples of size 𝑛 = 30, and 500 samples of size 𝑛 = 50. Using MINITAB, the relative frequency histograms for 𝑥̄ for each of the sample sizes are:

[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50. Five panels, one per sample size; horizontal axis from 15 to 90; vertical axis: Relative frequency, 0 to .6.]

All of the histograms look mound-shaped. As n increases, the spread of the values of $\bar{x}$ decreases.

5.23

a.

$\mu_{\bar{x}} = \mu = 40$, $\sigma_{\bar{x}}^2 = \sigma^2/n = 64$

b.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

c.

$P(\bar{x} < 32) = P\left(z < \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(z < \dfrac{32 - 40}{\sqrt{64}}\right) = P(z < -1) = .5 - .3413 = .1587$

(Using Table II, Appendix D)

5.24

a.

$\mu_{\bar{x}} = \mu = 94,000$

b.

$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1,414.214$

c.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

d.

$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = -5.19$

e.

$P(\bar{x} > 86,600) = P(z > -5.19) \approx .5 + .5000 \approx 1.000$ (Using Table II, Appendix D)

5.25

a.

$E(\bar{x}) = \mu = 353$. The mean of the sampling distribution of $\bar{x}$ is 353.

b.

$V(\bar{x}) = \sigma^2/n = 20$

c.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

d.

$P(\bar{x} > 400) = P\left(z > \dfrac{400 - 353}{\sqrt{20}}\right) = P(z > 10.51) \approx 0$. It is almost impossible to observe more than 400 sags in a week.
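Table II stops well short of z = 10.51, which is why the answer to part d of Exercise 5.25 is reported as approximately 0. As a hedged illustration, the tail area can be evaluated numerically with the complementary error function from Python's standard library; the variable names below are illustrative, and the mean 353 and variance 20 are the values stated in parts a and b.

```python
from math import erfc, sqrt

mu_xbar = 353     # part a
var_xbar = 20     # part b

z = (400 - mu_xbar) / sqrt(var_xbar)

# Upper-tail area of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
# erfc keeps precision for very small tails, unlike computing 1 - cdf(z).
upper_tail = 0.5 * erfc(z / sqrt(2))

print(round(z, 2), upper_tail)
```

The result is positive but far smaller than any table entry, consistent with "almost impossible".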

5.26

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

a.

$P(\bar{x} \le 5.2) = P\left(z \le \dfrac{5.2 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P(z \le -.26) = .5 - .1026 = .3974$

(Using Table II, Appendix D)

b.

$P(\bar{x} \le 5.2) = P\left(z \le \dfrac{5.2 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P(z \le -31.45) \approx .5 - .5 \approx 0$

(Using Table II, Appendix D)

5.27

a.

$\mu_{\bar{x}} = \mu = 68$. The average value of sample mean level of support is 68.

b.

$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 27/\sqrt{45} = 4.0249$. The standard deviation of the distribution of the sample means is 4.0249.

c.

Because the sample size is large (n = 45 > 30), the Central Limit Theorem says that the sampling distribution of $\bar{x}$ is approximately normal.

d.

$P(\bar{x} > 65) = P\left(z > \dfrac{65 - 68}{4.0249}\right) = P(z > -.75) = .5 + .2734 = .7734$ (Using Table II, Appendix D)

5.28

a.

$E(\bar{x}) = \mu = 1.5$. The mean of the sampling distribution of $\bar{x}$ is 1.5.

b.

$V(\bar{x}) = \sigma^2/n = .0004$

c.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

d.

$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{1.52 - 1.5}{\sqrt{.0004}} = 1$

e.

$P(\bar{x} > 1.52) = P(z > 1) = .5 - .3413 = .1587$ (Using Table II, Appendix D)

f.

No. Because the sample size is greater than 30, the sampling distribution of $\bar{x}$ is approximately normal, regardless of the shape of the distribution of x.

5.29

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal.

a.

$P(\bar{x} \ge 150) = P\left(z \ge \dfrac{150 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P(z \ge -2.88) = .5 + .4980 = .9980$

(Using Table II, Appendix D)

b.

$P(130 \le \bar{x} \le 140) = P\left(\dfrac{130 - \mu_{\bar{x}}}{\sigma_{\bar{x}}} \le z \le \dfrac{140 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P(-.98 \le z \le 1.82) = .3365 + .4656 = .8021$

(Using Table II, Appendix D)

c.

After repeated shooting sessions, $P(\bar{x} \ge 150) = P\left(z \ge \dfrac{150 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P(z \ge 4.61) \approx .5 - .5000 = .0000$

(Using Table II, Appendix D)

A result of 150 N or more is much more likely for a shooter at rest.

5.30

a.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with a mean $\mu_{\bar{x}} = \mu = 5.1$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .1461$.

$P(\bar{x} > 5.5) = P\left(z > \dfrac{5.5 - 5.1}{.1461}\right) = P(z > 2.74) = .5 - .4969 = .0031$

b.

Because the probability of seeing a sample mean of 5.5 or larger for the non-video game players is so small, we can infer that the mean of the population of non-video game players is greater than 5.1 or the standard deviation of the population of non-video game players is greater than .8.

5.31

a.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu = 6$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .5538$.

$P(\bar{x} > 7.5) = P\left(z > \dfrac{7.5 - 6}{.5538}\right) = P(z > 2.71) = .5 - .4966 = .0034$ (Using Table II, Appendix D)

b.

We first need to find the probability of observing the current data or anything more unusual if the true mean is 6.

$P(\bar{x} \ge 300) = P\left(z \ge \dfrac{300 - 6}{.5538}\right) = P(z \ge 530.88) \approx .5 - .5 = 0$ (Using Table II, Appendix D)

Since the probability of observing a sample mean of 300 ppb or higher is essentially 0 if the true mean is 6 ppb, we would infer that the true mean PFOA concentration for the population of people who live near DuPont's Teflon facility is not 6 ppb but higher than 6 ppb.

5.32

a.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ will be approximately normal with $\mu_{\bar{x}} = \mu = 3.7$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .1$.

b.

$P(\bar{x} < 3.5) = P\left(z \le \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(z \le \dfrac{3.5 - 3.7}{.1}\right) = P(z \le -2.00) = .5 - .4772 = .0228$ (Using Table II, Appendix D)

c.

Any value of $\bar{x}$ that is closer to 2.0 than it is to 3.7 would lead us to believe that x actually represents the 1st-male-image position number. Using the midpoint between 2.0 and 3.7 as the cutoff, this would be any value less than 2.85.

5.33

a.

From Exercise 2.33, the population of interarrival times is skewed to the right.

b.

The population mean and standard deviation are:

$\mu = \dfrac{\sum x}{N} = 95.52$

$\sigma^2 = 8,348.027737$

$\sigma = \sqrt{8,348.027737} = 91.3675$

c.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ will be approximately normal. Theoretically, $\mu_{\bar{x}} = \mu = 95.52$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 91.3675/\sqrt{40} = 14.4465$.

d.

$P(\bar{x} < 90) = P\left(z < \dfrac{90 - 95.52}{14.4465}\right) = P(z < -.38) = .5 - .1480 = .3520$ (Using Table II, Appendix D)

e & f. Answers will vary. A statistical package was used to randomly select 40 interarrival times from the Phishing data set and $\bar{x}$ was computed. This was repeated 50 times to simulate 50 students selecting 40 interarrival times and computing $\bar{x}$. Using MINITAB, a histogram of the 50 $\bar{x}$-values is:

[Figure: Histogram of Means. Horizontal axis: Means, 60 to 120; vertical axis: Frequency, 0 to 16.]

This shape is somewhat normal.

g.

Using MINITAB, the mean and standard deviation of these 50 means is:

Descriptive Statistics: Means

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3      Maximum
Means     50  96.09  14.08  52.73    86.36  95.65   105.23  130.23

The mean of these 50 means is 96.09. This is very close to $\mu_{\bar{x}} = 95.52$ found in part c. The standard deviation of these 50 means is 14.08. This is also very close to $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 91.3675/\sqrt{40} = 14.4465$ found in part c.

5.34

Let x = a company's domestic return on sales (DROS). By the Central Limit Theorem, the sampling distribution of $\bar{x}$ will be approximately normal with $\mu_{\bar{x}} = \mu = .149$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .029884$.

$P(\bar{x} \ge .18) = P\left(z \ge \dfrac{.18 - .149}{.029884}\right) = P(z > 1.04) = .5 - .3508 = .1492$

Because the probability of observing a DROS value similar to the one observed is .1492, we would say that such an outcome is fairly likely. This result does not give enough evidence to support the financial accountant's claim.

5.35

For $n = 50$, we can use the Central Limit Theorem to decide the shape of the distribution of the sample mean bacterial counts. For the handrubbing sample, the sampling distribution of $\bar{x}$ is approximately normal with a mean of $\mu_{\bar{x}} = 35$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 59/\sqrt{50} = 8.344$. For the handwashing sample, the sampling distribution of $\bar{x}$ is approximately normal with a mean of $\mu_{\bar{x}} = 69$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 106/\sqrt{50} = 14.991$.

For Handrubbing: $P(\bar{x} < 30 \mid \mu = 35) = P\left(z < \dfrac{30 - 35}{8.344}\right) = P(z < -.60) = .5 - .2257 = .2743$ (using Table II, Appendix D)

For Handwashing: $P(\bar{x} < 30 \mid \mu = 69) = P\left(z < \dfrac{30 - 69}{14.991}\right) = P(z < -2.6) = .5 - .4953 = .0047$ (using Table II, Appendix D)

Since the probability of getting a sample mean of less than 30 for the handrubbing is not small compared with that for the handwashing, the sample of workers probably came from the handrubbing group.

5.36

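a.

$\mu_{\hat{p}} = p = .2$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.2(.8)}{n}} = .0566$

b.

$\mu_{\hat{p}} = p = .2$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.2(.8)}{n}} = .0126$

c.

$\mu_{\hat{p}} = p = .2$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.2(.8)}{n}} = .02$

5.37

a.

$\mu_{\hat{p}} = p = .1$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.1(.9)}{n}} = .0134$

b.

$\mu_{\hat{p}} = p = .5$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.5(.5)}{n}} = .0224$

c.

$\mu_{\hat{p}} = p = .7$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.7(.3)}{n}} = .0205$

5.38

a.

$\mu_{\hat{p}} = p = .3$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.3(.7)}{n}} = .0512$

b.

The sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

c.

$z = \dfrac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \dfrac{.35 - .3}{\sqrt{\frac{.3(.7)}{n}}} = .98$

d.

$P(\hat{p} > .35) = P(z > .98) = .5 - .3365 = .1635$ (Using Table II, Appendix D)

5.39

a.

$E(\hat{p}) = \mu_{\hat{p}} = p = .85$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.85(.15)}{n}} = .0226$

b.

The sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

c.

$P(\hat{p} < .9) = P\left(z < \dfrac{.9 - .85}{\sqrt{\frac{.85(.15)}{n}}}\right) = P(z < 2.21) = .5 + .4864 = .9864$ (Using Table II, Appendix D)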
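The calculation in Exercise 5.39 can be sketched in Python as a check on the arithmetic. One caveat is stated loudly here: the sample size is not legible in the solution above, so `n_assumed = 250` below is an assumption chosen only because it reproduces the stated standard error of .0226; it should not be read as the exercise's actual n. The function name `proportion_se` is hypothetical, and `NormalDist` stands in for Table II.

```python
from math import sqrt
from statistics import NormalDist


def proportion_se(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


p = 0.85
n_assumed = 250   # ASSUMPTION: picked to reproduce the stated .0226, not from the source

se = round(proportion_se(p, n_assumed), 4)   # matches the solution's rounding
z = (0.9 - p) / se
prob = NormalDist().cdf(z)                   # P(p-hat < .9)

print(se, round(z, 2), round(prob, 4))
```

The probability differs from .9864 only in the fourth decimal place, because the table lookup rounds z to 2.21 first.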

5.40

We would not expect to see any values of $\hat{p}$ more than 3 standard deviations below or above the mean value of $\hat{p}$.

$z = \dfrac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \Rightarrow -3 = \dfrac{\hat{p} - .4}{\sqrt{\frac{.4(.6)}{n}}} \Rightarrow -3\sqrt{\dfrac{.4(.6)}{n}} = \hat{p} - .4 \Rightarrow -.0379 = \hat{p} - .4 \Rightarrow \hat{p} = .3621$

The smallest value we would expect for $\hat{p}$ would be .3621.

$z = \dfrac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \Rightarrow 3 = \dfrac{\hat{p} - .4}{\sqrt{\frac{.4(.6)}{n}}} \Rightarrow 3\sqrt{\dfrac{.4(.6)}{n}} = \hat{p} - .4 \Rightarrow .0379 = \hat{p} - .4 \Rightarrow \hat{p} = .4379$

The largest value we would expect for $\hat{p}$ would be .4379.

5.41

a.

Answers will vary. Using a statistical package, 500 samples of size 10 were generated from the population of (0,1). The histogram of the 500 sample proportions is:

[Figure: Histogram of p-hat10. Horizontal axis: p-hat10, 0.0 to 1.0; vertical axis: Frequency, 0 to 120.]

b.

Using a statistical package, 500 samples of size 25 were generated from the population of (0,1). The histogram of the 500 sample proportions is:

[Figure: Histogram of p-hat25. Horizontal axis: p-hat25, 0.12 to 0.72; vertical axis: Frequency, 0 to 100.]

c.

Using a statistical package, 500 samples of size 100 were generated from the population of (0,1). The histogram of the 500 sample proportions is:

[Figure: Histogram of p-hat100. Horizontal axis: p-hat100, 0.40 to 0.64; vertical axis: Frequency, 0 to 50.]

d.

As the sample size increases, the spread of the values of $\hat{p}$ decreases. In the graph in part a, the spread of the values of $\hat{p}$ is from 0 to 1. In the graph in part b, the spread of the values of $\hat{p}$ is from .20 to .76. In the graph in part c, the spread of the values of $\hat{p}$ is from .37 to .64. In all graphs, the distributions are mound-shaped. As the sample size increases, the distribution becomes more peaked.

5.42

a.

$E(\hat{p}) = p = .70$

b.

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.7(.3)}{n}} = .0529$

c.

The sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

d.

$P(\hat{p} > .80) = P\left(z > \dfrac{.80 - .70}{\sqrt{\frac{.7(.3)}{n}}}\right) = P(z > 1.89) = .5 - .4706 = .0294$

(Using Table II, Appendix D)

5.43

a.

$E(\hat{p}) = p = .20$

b.

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.2(.8)}{n}} = .01789$

c.

The sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

d.

$P(\hat{p} < .17) = P\left(z < \dfrac{.17 - .20}{\sqrt{\frac{.2(.8)}{n}}}\right) = P(z < -1.67) = .5 - .4525 = .0475$

(Using Table II, Appendix D)

e.

$P(\hat{p} > .15) = P\left(z > \dfrac{.15 - .20}{\sqrt{\frac{.2(.8)}{n}}}\right) = P(z > -2.79) = .5 + .4974 = .9974$

5.44

a.

$\mu_{\hat{p}} = p = .4$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.4(.6)}{n}} = .0476$

b.

The sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

c.

$P(\hat{p} > .59) = P\left(z > \dfrac{.59 - .4}{\sqrt{\frac{.4(.6)}{n}}}\right) = P(z > 3.99) = .5 - .49997 = .00003$

(Using Table II, Appendix D)

d.

$\hat{p} = x/n = .59$. We found in part c that $P(\hat{p} > .59) = .00003$. Since this probability is so small, it casts doubt on the assumption that 40% of all social robots are designed with legs, but no wheels.

5.45

a.

For a sample of 500 adults, $np = 500(.64) = 320$ and $nq = 500(.36) = 180$. Since both these values are at least 15, n is considered large. The sampling distribution of $\hat{p}$ will be approximately normal with $\mu_{\hat{p}} = p = .64$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.64(.36)}{500}} = .0215$.

$P(.55 < \hat{p} < .65) = P\left(\dfrac{.55 - .64}{\sqrt{\frac{.64(.36)}{500}}} < z < \dfrac{.65 - .64}{\sqrt{\frac{.64(.36)}{500}}}\right) = P(-4.19 < z < .47) \approx .5 + .1808 \approx .6808$

b.

$P(\hat{p} > .75) = P\left(z > \dfrac{.75 - .64}{\sqrt{\frac{.64(.36)}{500}}}\right) = P(z > 5.12) \approx .5 - .5 \approx 0$

5.46

a.

To see if x is approximately a binomial random variable we check the characteristics:

1. n identical trials. Although the trials are not exactly identical, they are close. Taking a sample of reasonable size n from a very large population will result in trials being essentially identical.

2. Two possible outcomes. The hospital-related injuries were due to overexertion or not. S = hospital-related injury was due to overexertion and F = hospital-related injury was not due to overexertion.

3. P(S) remains the same from trial to trial. P(S) = .48.

4. Trials are independent. The outcome of one hospital-related injury does not affect the outcome of another hospital-related injury.

5. The random variable x = number of hospital-related injuries that were due to overexertion in n trials.

Thus, x is a binomial random variable.

b.

The estimated value of p is .48.

c.

$\mu_{\hat{p}} = p = .48$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.48(.52)}{n}} = .0500$

d.

$P(\hat{p} < .40) = P\left(z < \dfrac{.40 - .48}{\sqrt{\frac{.48(.52)}{n}}}\right) = P(z < -1.60) = .5 - .4452 = .0548$
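The large-sample check and the interval probability from Exercise 5.45, where n = 500 and p = .64 are both stated, can be reproduced with a short Python sketch. The helper name `is_large_sample` and the use of `NormalDist` instead of Table II are illustrative choices; because z is not rounded to two decimals here, the probability comes out slightly below the table-based .6808.

```python
from math import sqrt
from statistics import NormalDist


def is_large_sample(n, p, threshold=15):
    """Rule used in these solutions: both np and n(1 - p) must be at least 15."""
    return n * p >= threshold and n * (1 - p) >= threshold


n, p = 500, 0.64
large_enough = is_large_sample(n, p)

se = sqrt(p * (1 - p) / n)                  # standard error of p-hat
std_normal = NormalDist()
# P(.55 < p-hat < .65) after standardizing both endpoints.
prob = std_normal.cdf((0.65 - p) / se) - std_normal.cdf((0.55 - p) / se)

print(large_enough, round(se, 4), round(prob, 4))
```

The same function returns False for a case such as n = 20 with p = .5, where np is only 10, which is the situation in which the normal approximation should not be used.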

5.47

Let B = guest experienced a better-than-expected quality of sleep
R = guest would definitely return to the hotel brand

$P(B \cap R) = P(B)P(R \mid B) = .30(.70) = .21 = p$

For a sample of 100 adults, $np = 100(.21) = 21$ and $nq = 100(.79) = 79$. Since both these values are at least 15, n is considered large. The sampling distribution of $\hat{p}$ will be approximately normal with $\mu_{\hat{p}} = p = .21$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.21(.79)}{100}} = .0407$.

$P(\hat{p} < .10) = P\left(z < \dfrac{.10 - .21}{\sqrt{\frac{.21(.79)}{100}}}\right) = P(z < -2.70) = .5 - .4965 = .0035$

5.48

a.

For a sample of 300 adults, $np = 300(.45) = 135$ and $nq = 300(.55) = 165$. Since both these values are at least 15, n is considered large. The sampling distribution of $\hat{p}$ will be approximately normal with $\mu_{\hat{p}} = p = .45$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.45(.55)}{300}} = .0287$.

$P(\hat{p} > .667) = P\left(z > \dfrac{.667 - .45}{\sqrt{\frac{.45(.55)}{300}}}\right) = P(z > 7.54) \approx .5 - .5 \approx 0$

b.

For a sample of 300 adults, $np = 300(.30) = 90$ and $nq = 300(.70) = 210$. Since both these values are at least 15, n is considered large. The sampling distribution of $\hat{p}$ will be approximately normal with $\mu_{\hat{p}} = p = .30$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.30(.70)}{300}} = .0265$.

$P(\hat{p} > .667) = P\left(z > \dfrac{.667 - .30}{\sqrt{\frac{.30(.70)}{300}}}\right) = P(z > 13.86) \approx .5 - .5 \approx 0$

5.49

a.

$E(\hat{p}) = \mu_{\hat{p}} = p = .92$

b.

By the Central Limit Theorem, the sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large, with $\mu_{\hat{p}} = p = .92$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.92(.08)}{n}} = .0086$.

$P(\hat{p} < .9) = P\left(z < \dfrac{.9 - .92}{\sqrt{\frac{.92(.08)}{n}}}\right) = P(z < -2.33) = .5 - .4901 = .0099$

(Using Table II, Appendix D)

5.50

a.

Let $\hat{p}$ = sample proportion of U.S. adult workers who prepare their own tax returns. By the Central Limit Theorem, the sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large, with $\mu_{\hat{p}} = p = .37$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.37(.63)}{n}} = .0294$.

$P(\hat{p} > .41) = P\left(z > \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}\right) = P(z > 1.52) = .5 - .4357 = .0643$

(Using Table II, Appendix D)

b.

$P(.37 < \hat{p} < .56) = P(.01 < z < 6.32) \approx .5 - .0040 = .4960$

(Using Table II, Appendix D)

5.51

Let $\hat{p}$ = sample proportion of Smartphone users who have a problem with apps not working on their cell phone. For a sample of 75 adults, $n\hat{p} = 75(60/75) = 75(.8) = 60$ and $n\hat{q} = 75(.2) = 15$. Because both of these values are at least 15, the sampling distribution of $\hat{p}$ will be approximately normal with $\mu_{\hat{p}} = p = .90$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.90(.10)}{75}} = .0346$.

$P(\hat{p} < 60/75) = P(\hat{p} < .8) = P\left(z < \dfrac{.8 - .9}{\sqrt{\frac{.90(.10)}{75}}}\right) = P(z < -2.89) = .5 - .4981 = .0019$ (Using Table II, Appendix D)

Because this probability is so small, we would infer that the actual percentage of Smartphone users who have a problem with apps not working on their cell phone is not 90% but something less than 90%.

5.52

a.

As the sample size increases, the standard error will decrease. This property is important because we know that the larger the sample size, the less variable our estimator will be. Thus, as n increases, our estimator will tend to be closer to the parameter we are trying to estimate.

b.

This would indicate that the statistic would not be a very good estimator of the parameter. If the standard error is not a function of the sample size, then a statistic based on one observation would be as good an estimator as a statistic based on 1000 observations.

c.

𝑥̄ would be preferred over A as an estimator for the population mean. The standard error of 𝑥̄ is smaller than the standard error of A.

d.

The standard error of $\bar{x}$ is $\sigma/\sqrt{n} = 1.25$ and the standard error of A is 2.5.

If the sample size is sufficiently large, the Central Limit Theorem says the distribution of $\bar{x}$ is approximately normal. Using the Empirical Rule, approximately 68% of all the values of $\bar{x}$ will fall between $\mu - 1.25$ and $\mu + 1.25$. Approximately 95% of all the values of $\bar{x}$ will fall between $\mu - 2.50$ and $\mu + 2.50$. Approximately all of the values of $\bar{x}$ will fall between $\mu - 3.75$ and $\mu + 3.75$.

Using the Empirical Rule, approximately 68% of all the values of A will fall between $\mu - 2.50$ and $\mu + 2.50$. Approximately 95% of all the values of A will fall between $\mu - 5.00$ and $\mu + 5.00$. Approximately all of the values of A will fall between $\mu - 7.50$ and $\mu + 7.50$.

5.53

a.

"The sampling distribution of the sample statistic A" is the probability distribution of the variable A.

b.

"A" is an unbiased estimator of 𝛼 if the mean of the sampling distribution of A is 𝛼.

c.

If both A and B are unbiased estimators of 𝛼, then the statistic whose standard deviation is smaller is a better estimator of 𝛼.

d.

No. The Central Limit Theorem applies only to the sample mean. If A is the sample mean, 𝑥̄ , and n is sufficiently large, then the Central Limit Theorem will apply. However, both A and B cannot be sample means. Thus, we cannot apply the Central Limit Theorem to both A and B.

5.54

a.

First we must compute $\mu$ and $\sigma$. The probability distribution for x is:

x:      1    2    3    4
p(x):   .3   .2   .2   .3

$\mu = E(x) = \sum x\,p(x) = 1(.3) + 2(.2) + 3(.2) + 4(.3) = 2.5$

$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = (1 - 2.5)^2(.3) + (2 - 2.5)^2(.2) + (3 - 2.5)^2(.2) + (4 - 2.5)^2(.3) = 1.45$

$\mu_{\bar{x}} = \mu = 2.5$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = \sqrt{1.45}/\sqrt{40} = .1904$

b.

By the Central Limit Theorem, the distribution of $\bar{x}$ is approximately normal. The sample size, $n = 40$, is sufficiently large. Yes, the answer depends on the sample size.

5.55

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal. $\mu_{\bar{x}} = \mu = 19.6$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .388$

a.

$P(\bar{x} \le 19.6) = P\left(z \le \dfrac{19.6 - 19.6}{.388}\right) = P(z \le 0) = .5$ (Using Table II, Appendix D)

b.

$P(\bar{x} \le 19) = P\left(z \le \dfrac{19 - 19.6}{.388}\right) = P(z \le -1.55) = .5 - .4394 = .0606$ (Using Table II, Appendix D)

c.

$P(\bar{x} \ge 20.1) = P\left(z \ge \dfrac{20.1 - 19.6}{.388}\right) = P(z \ge 1.29) = .5 - .4015 = .0985$ (Using Table II, Appendix D)

d.

$P(19.2 \le \bar{x} \le 20.6) = P\left(\dfrac{19.2 - 19.6}{.388} \le z \le \dfrac{20.6 - 19.6}{.388}\right) = P(-1.03 \le z \le 2.58) = .3485 + .4951 = .8436$ (Using Table II, Appendix D)

5.56

a.

$\mu_{\hat{p}} = p = .35$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.35(.65)}{n}} = .0213$.

b.

By the Central Limit Theorem, the sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large.

5.57

By the Central Limit Theorem, the sampling distribution of $\hat{p}$ will be approximately normal since the sample size is sufficiently large, with $\mu_{\hat{p}} = p = .8$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{.8(.2)}{n}} = .0231$.

a.

$P(\hat{p} < .83) = P\left(z < \dfrac{.83 - .8}{\sqrt{\frac{.8(.2)}{n}}}\right) = P(z < 1.30) = .5 + .4032 = .9032$ (Using Table II, Appendix D)

b.

$P(\hat{p} > .75) = P\left(z > \dfrac{.75 - .8}{\sqrt{\frac{.8(.2)}{n}}}\right) = P(z > -2.17) = .5 + .4850 = .9850$ (Using Table II, Appendix D)

c.

$P(.79 < \hat{p} < .81) = P\left(\dfrac{.79 - .8}{\sqrt{\frac{.8(.2)}{n}}} < z < \dfrac{.81 - .8}{\sqrt{\frac{.8(.2)}{n}}}\right) = P(-.43 < z < .43) = .1664 + .1664 = .3328$

(Using Table II, Appendix D)

5.58

Answers will vary. One hundred samples of size n = 2 were selected from a normal distribution with a mean of 100 and a standard deviation of 10. The process was repeated for samples of size 𝑛 = 5, 𝑛 = 10, 𝑛 = 30, and 𝑛 = 50. For each sample, the value of 𝑥̄ was computed. Using MINITAB, the histograms for each set of 100 𝑥̄ ’s were constructed:

[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50 with fitted normal curves. Five panels, one per sample size; horizontal axis from 85 to 120; vertical axis: Frequency, 0 to 60.]

Variable  Mean   StDev  N
xbar2     101.1  6.614  100
xbar5     99.70  6.278  100
xbar10    99.73  3.249  100
xbar30    100.2  2.040  100
xbar50    100.1  1.512  100

The sampling distribution of $\bar{x}$ is normal regardless of the sample size because the population we sampled from was normal. Notice that as the sample size n increases, the variances of the sampling distributions decrease.

5.59

Answers will vary. One hundred samples of size n = 2 were selected from a uniform distribution on the interval from 0 to 10. The process was repeated for samples of size 𝑛 = 5, 𝑛 = 10, 𝑛 = 30, and 𝑛 = 50. For each sample, the value of 𝑥̄ was computed. Using MINITAB, the histograms for each set of 100 𝑥̄ ’s were constructed:

[Figure: Histogram of xbar2, xbar5, xbar10, xbar30, xbar50 with fitted normal curves. Five panels, one per sample size; horizontal axis from 0.0 to 9.0; vertical axis: Frequency, 0 to 48.]

Variable  Mean   StDev   N
xbar2     4.935  2.073   100
xbar5     4.828  1.610   100
xbar10    5.004  0.9256  100
xbar30    5.010  0.5652  100
xbar50    4.998  0.4323  100

For small sizes of n, the sampling distributions of $\bar{x}$ are somewhat normal. As n increases, the sampling distributions of $\bar{x}$ become more normal.

5.60

a.

Tossing a coin two times can result in:

2 heads (2 ones)
2 tails (2 zeros)
1 head, 1 tail (1 one, 1 zero)

b.

$\bar{x}_{\text{2 heads}} = \dfrac{1 + 1}{2} = 1$; $\bar{x}_{\text{2 tails}} = \dfrac{0 + 0}{2} = 0$; $\bar{x}_{\text{1H,1T}} = \dfrac{1 + 0}{2} = \dfrac{1}{2}$

c.

$\hat{p}_{\text{2 heads}} = \dfrac{2}{2} = 1$; $\hat{p}_{\text{2 tails}} = \dfrac{0}{2} = 0$; $\hat{p}_{\text{1H,1T}} = \dfrac{1}{2}$

d.

There are four possible combinations for one coin tossed two times, as shown below:

Coin Tosses   $\hat{p}$
H, H          1
H, T          1/2
T, H          1/2
T, T          0

The resulting sampling distribution of $\hat{p}$ is:

$\hat{p}$       $p(\hat{p})$
0       1/4
1/2     1/2
1       1/4

e.

The sampling distribution of $\hat{p}$ is given in the histogram shown.

[Figure: Histogram of p-hat. Horizontal axis: p-hat (0.0, 0.5, 1.0); vertical axis: p(p-hat), 0.0 to 0.5.]

5.61

Given: $\mu = 100$ and $\sigma = 10$

n:                    1    5      10     20     30     40     50
$\sigma/\sqrt{n}$:    10   4.472  3.162  2.236  1.826  1.581  1.414

The graph of $\sigma/\sqrt{n}$ against n is given here:

[Figure: Scatterplot of st err vs n. Horizontal axis: n, 0 to 50; vertical axis: st err, 0 to 10.]
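The table of standard errors in Exercise 5.61 follows from a single formula, so it is easy to regenerate. The Python sketch below uses the stated σ = 10 and the listed sample sizes; the names `sample_sizes` and `standard_errors` are illustrative.

```python
from math import sqrt

sigma = 10
sample_sizes = [1, 5, 10, 20, 30, 40, 50]

# Standard error of the sample mean, sigma / sqrt(n), rounded as in the table.
standard_errors = [round(sigma / sqrt(n), 3) for n in sample_sizes]

for n, se in zip(sample_sizes, standard_errors):
    print(n, se)
```

Because the standard error shrinks with the square root of n, going from n = 10 to n = 40 only halves it, which is the flattening visible in the scatterplot.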

5.62


a.

𝐸(𝑥̄ ) = 𝐸(𝑥) = = = 1,800. The average value of the sample mean number of seconds from the start of the hour is 1,800 second.

b.

𝜎̄ =

c.

Because the sample size is sufficiently large, by the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal.

=

(

)

=

( ,

)

= 18,000

Copyright © 2022 Pearson Education, Inc.


262

Chapter 5 d.

𝑃(1700 < 𝑥̄ < 1900) = 𝑃

<𝑧<

,

= 𝑃(−.75 < 𝑧 < .75)

,

= .2734 + .2734 = .5468 (Using Table II, Appendix D) e.

𝑃(𝑥̄ > 2000) = 𝑃 𝑧 > Appendix D)

5.63

(Using Table II,

By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with 𝜇 ̄ = 𝜇 = 105.3 and 𝜎 ̄ = = = 1. √

.

𝑃(𝑥̄ < 103) = 𝑃 𝑧 < 5.64

= 𝑃(𝑧 > 1.49) = .5 − .4319 = .0681

,

a.

= 𝑃(𝑧 < −2.3) = .5 − .4893 = .0107 (Using Table II, Appendix D)

By the Central Limit Theorem, the sampling distribution of 𝑝̂ will be approximately normal since the sample size is sufficiently large, with 𝜇_𝑝̂ = 𝑝 = .03 and 𝜎_𝑝̂ = √(𝑝(1 − 𝑝)/𝑛) = .0054.

b. 𝑃(𝑝̂ < .05) = 𝑃(𝑧 < (.05 − .03)/𝜎_𝑝̂) = 𝑃(𝑧 < 3.71) = .5 + .4999 = .9999 (Using Table II, Appendix D)

c. 𝑃(𝑝̂ > .025) = 𝑃(𝑧 > (.025 − .03)/𝜎_𝑝̂) = 𝑃(𝑧 > −.93) = .5 + .3238 = .8238 (Using Table II, Appendix D)
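The table lookups in parts b and c can be cross-checked numerically. This is a minimal sketch using the Python standard library's normal distribution; the helper names are hypothetical, and it takes the mean .03 and standard error .0054 from part a as given rather than recomputing them.

```python
from statistics import NormalDist


def prob_below(value, mean, se):
    """P(estimate < value) under a normal sampling distribution."""
    return NormalDist(mu=mean, sigma=se).cdf(value)


def prob_above(value, mean, se):
    """P(estimate > value) under a normal sampling distribution."""
    return 1.0 - NormalDist(mu=mean, sigma=se).cdf(value)


if __name__ == "__main__":
    print(round(prob_below(0.05, 0.03, 0.0054), 4))   # part b
    print(round(prob_above(0.025, 0.03, 0.0054), 4))  # part c
```

Small differences from the tabled answers are expected, because the table method rounds z to two decimal places before the lookup.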

5.65

a.

Let 𝑝̂ = sample proportion of Finnish citizens with high IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of 𝑝̂ will be approximately normal since the sample size is sufficiently large, with 𝜇_𝑝̂ = 𝑝 = .44 and 𝜎_𝑝̂ = √(𝑝(1 − 𝑝)/𝑛) = .0222.

𝑃(𝑝̂ > .3) = 𝑃(𝑧 > (.3 − .44)/𝜎_𝑝̂) = 𝑃(𝑧 > −6.31) ≈ .5 + .5 = 1 (Using Table II, Appendix D)

b. Let 𝑝̂ = sample proportion of Finnish citizens with average IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of 𝑝̂ will be approximately normal since the sample size is sufficiently large, with 𝜇_𝑝̂ = 𝑝 = .26 and 𝜎_𝑝̂ = √(𝑝(1 − 𝑝)/𝑛) = .0196.

𝑃(𝑝̂ > .3) = 𝑃(𝑧 > (.3 − .26)/𝜎_𝑝̂) = 𝑃(𝑧 > 2.04) = .5 − .4793 = .0207 (Using Table II, Appendix D)

c. Let 𝑝̂ = sample proportion of Finnish citizens with low IQ who invest in the stock market. By the Central Limit Theorem, the sampling distribution of 𝑝̂ will be approximately normal since the sample size is sufficiently large, with 𝜇_𝑝̂ = 𝑝 = .14 and 𝜎_𝑝̂ = √(𝑝(1 − 𝑝)/𝑛) = .0155.

𝑃(𝑝̂ > .3) = 𝑃(𝑧 > (.3 − .14)/𝜎_𝑝̂) = 𝑃(𝑧 > 10.31) ≈ .5 − .5 = 0 (Using Table II, Appendix D)

5.66

a.

Since the sample size is small, we also have to assume that the distribution from which the sample was drawn is normal. 𝜇_𝑥̄ = 𝜇 = 1.8, 𝜎_𝑥̄ = 𝜎/√𝑛 = .5/√20 = .1118

𝑃(𝑥̄ ≥ 1.85) = 𝑃(𝑧 ≥ (1.85 − 1.8)/.1118) = 𝑃(𝑧 ≥ 0.45) = .5 − .1736 = .3264 (Using Table II, Appendix D)

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Rough

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Rough      20   1.881   0.524   1.060     1.303   2.040    2.293   2.640

From this output, the value of 𝑥̄ is 1.881.

c. For 𝑥̄ = 1.881: 𝑃(𝑥̄ ≥ 1.881) = 𝑃(𝑧 ≥ (1.881 − 1.8)/.1118) = 𝑃(𝑧 ≥ 0.72) = .5 − .2642 = .2358

Since this probability is so high, observing a sample mean of 𝑥̄ = 1.881 is not unusual. The assumptions in part a appear to be valid.

5.67

a. 𝐸(𝑥̄) = 𝜇_𝑥̄ = 𝜇 = .10 and 𝜎_𝑥̄ = 𝜎/√𝑛 = .0141, so 𝜎_𝑥̄² = (.0141)² = .0002

b. Since 𝑛 > 30, the sampling distribution of 𝑥̄ is approximately normal by the Central Limit Theorem.

c. 𝑃(𝑥̄ > .13) = 𝑃(𝑧 > (.13 − .10)/𝜎_𝑥̄) = 𝑃(𝑧 > 2.12) = .5 − .4830 = .017 (Using Table II, Appendix D)

5.68

a. By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with 𝜇_𝑥̄ = 𝜇 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 𝜎/√100.

b. The mean of the 𝑥̄ distribution is equal to the mean of the distribution of the fleet, that is, the fleet mean score.

c. 𝜇_𝑥̄ = 𝜇 = 30 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 𝜎/√100 = 60/√100 = 6.

𝑃(𝑥̄ ≥ 45) = 𝑃(𝑧 ≥ (45 − 30)/6) = 𝑃(𝑧 ≥ 2.5) = .5 − .4938 = .0062 (Using Table II, Appendix D)

d. The sample mean of 45 tends to refute the claim. If the true fleet mean was as high as 30, observing a sample mean of 45 or higher would be extremely unlikely (probability = .0062). Thus, we would infer that the true mean is actually not 30 but something higher, and we would refute the company’s claim that the mean “couldn’t possibly be as large as 30.”

5.69

a. By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with 𝜇_𝑥̄ = 𝜇 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 𝜎/√50.

b. 𝜇_𝑥̄ = 𝜇 = 40 and 𝜎_𝑥̄ = 𝜎/√50 = 12/√50 = 1.6971.

𝑃(𝑥̄ ≥ 44) = 𝑃(𝑧 ≥ (44 − 40)/1.6971) = 𝑃(𝑧 ≥ 2.36) = .5 − .4909 = .0091 (using Table II, Appendix D)


c. 𝜇 ± 2𝜎/√𝑛 ⇒ 40 ± 2(1.6971) ⇒ 40 ± 3.3942 ⇒ (36.6058, 43.3942)

𝑃(36.6058 ≤ 𝑥̄ ≤ 43.3942) = 𝑃((36.6058 − 40)/1.6971 ≤ 𝑧 ≤ (43.3942 − 40)/1.6971) = 𝑃(−2 ≤ 𝑧 ≤ 2) = 2(.4772) = .9544 (using Table II, Appendix D)

5.70

a.

The mean diameter of the bearings, 𝜇, is unknown, with a standard deviation of 𝜎 = .001 inch. Assuming that the distribution of the diameters of the bearings is normal, the sampling distribution of the sample mean is also normal. The mean and standard deviation of the sampling distribution are:

𝜇_𝑥̄ = 𝜇, 𝜎_𝑥̄ = 𝜎/√𝑛 = .001/√25 = .0002

Having the sample mean fall within .0001 inch of 𝜇 implies |𝑥̄ − 𝜇| ≤ .0001, or −.0001 ≤ 𝑥̄ − 𝜇 ≤ .0001.

𝑃(−.0001 ≤ 𝑥̄ − 𝜇 ≤ .0001) = 𝑃(−.0001/.0002 ≤ 𝑧 ≤ .0001/.0002) = 𝑃(−.50 ≤ 𝑧 ≤ .50) = .1915 + .1915 = .3830 (using Table II, Appendix D)

b. The approximation is unlikely to be accurate. In order for the Central Limit Theorem to apply, the sample size must be sufficiently large. For a very skewed distribution, 𝑛 = 25 is not sufficiently large, and thus, the Central Limit Theorem will not apply.

5.71

From Exercise 5.70, 𝜎 = .001. We must assume the Central Limit Theorem applies (n is only 25). Thus, the distribution of 𝑥̄ is approximately normal with 𝜇_𝑥̄ = 𝜇 = .501 and 𝜎_𝑥̄ = 𝜎/√𝑛 = .001/√25 = .0002. Using Table II, Appendix D,

𝑃(𝑥̄ < .4994) + 𝑃(𝑥̄ > .5006) = 𝑃(𝑧 < (.4994 − .501)/.0002) + 𝑃(𝑧 > (.5006 − .501)/.0002) = 𝑃(𝑧 < −8) + 𝑃(𝑧 > −2) = (.5 − .5) + (.5 + .4772) = .9772

5.72

a.

By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with 𝜎_𝑥̄ = 𝜎/√𝑛 = .3235.

b. If 𝜇 = 18.5, 𝑃(𝑥̄ > 19.1) = 𝑃(𝑧 > (19.1 − 18.5)/.3235) = 𝑃(𝑧 > 1.85) = .5 − .4678 = .0322 (Using Table II, Appendix D)

c. If 𝜇 = 19.5, 𝑃(𝑥̄ > 19.1) = 𝑃(𝑧 > (19.1 − 19.5)/.3235) = 𝑃(𝑧 > −1.24) = .5 + .3925 = .8925 (Using Table II, Appendix D)

d. 𝑃(𝑥̄ > 19.1) = 𝑃(𝑧 > (19.1 − 𝜇)/.3235) = .5. We know that 𝑃(𝑧 > 0) = .5. Thus, (19.1 − 𝜇)/.3235 = 0 ⇒ 𝜇 = 19.1.

e. 𝑃(𝑥̄ > 19.1) = 𝑃(𝑧 > (19.1 − 𝜇)/.3235) = .2. Thus, 𝜇 must be less than 19.1. If 𝜇 = 19.1, then 𝑃(𝑥̄ > 19.1) = .5. Since 𝑃(𝑥̄ > 19.1) < .5, then 𝜇 < 19.1.

5.73

a. 𝐸(𝑝̂) = 𝜇_𝑝̂ = 𝑝 = .2 and 𝜎_𝑝̂ = √(𝑝(1 − 𝑝)/𝑛) = √(.2(.8)/𝑛) = .0253

b. 𝐸(𝑝̂) ± 2𝜎_𝑝̂ ⇒ .2 ± 2(.0253) ⇒ .2 ± .0506 ⇒ (.1494, .2506)

c. By the Central Limit Theorem, the sampling distribution of 𝑝̂ will be approximately normal since the sample size is sufficiently large. Thus,

𝑃(.1494 ≤ 𝑝̂ ≤ .2506) = 𝑃((.1494 − .2)/.0253 ≤ 𝑧 ≤ (.2506 − .2)/.0253) = 𝑃(−2 ≤ 𝑧 ≤ 2) = .4772 + .4772 = .9544

5.74

a.

By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal since 𝑛 > 30, with 𝜇_𝑥̄ = 𝜇 = 840 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 2.1213.

b. 𝑃(𝑥̄ ≤ 830) = 𝑃(𝑧 ≤ (830 − 840)/2.1213) = 𝑃(𝑧 ≤ −4.71) ≈ .5 − .5 = 0

c. Since the probability of observing a mean of 830 or less is extremely small (≈ 0) if the true mean is 840, we would tend to believe that the mean is not 840, but something less.

d. By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal since 𝑛 > 30, with 𝜇_𝑥̄ = 𝜇 = 840 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 6.3640.

𝑃(𝑥̄ ≤ 830) = 𝑃(𝑧 ≤ (830 − 840)/6.3640) = 𝑃(𝑧 ≤ −1.57) = .5 − .4418 = .0582

5.75

a.

Let p1 = probability of an error = 1/100 = .01 and p2 = probability of an error resulting in a significant problem = 1/500 = .002.

Let 𝑝̂1 = proportion of errors. Then 𝐸(𝑝̂1) = p1 = .01.

Let 𝑝̂2 = proportion of significant errors. Then 𝐸(𝑝̂2) = p2 = .002.

b. Since the distribution of 𝑝̂2 will be approximately normal by the Central Limit Theorem, we would expect the proportion of significant errors to fall within 2 standard deviations of the expected value. The interval would be:

𝐸(𝑝̂2) ± 2𝜎_𝑝̂2 ⇒ .002 ± 2√(.002(.998)/𝑛) ⇒ .002 ± .00036 ⇒ (.00164, .00236)

5.76

Even though the number of flaws per piece of siding has a Poisson distribution, the Central Limit Theorem implies that the distribution of the sample mean will be approximately normal with 𝜇_𝑥̄ = 𝜇 = 2.5 and 𝜎_𝑥̄ = 𝜎/√𝑛 = √2.5/√35 = .2673. Therefore,

𝑃(𝑥̄ ≥ 2.1) = 𝑃(𝑧 ≥ (2.1 − 2.5)/(√2.5/√35)) = 𝑃(𝑧 ≥ −1.50) = .5 + .4332 = .9332 (Using Table II, Appendix D)

5.77

a. By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with mean 𝜇_𝑥̄ = 𝜇 = .53 and 𝜎_𝑥̄ = 𝜎/√𝑛 = .0273.

b. 𝑃(𝑥̄ > .58) = 𝑃(𝑧 > (.58 − .53)/.0273) = 𝑃(𝑧 > 1.83) = .5 − .4664 = .0336

c. If Before Tensioning: 𝜇_𝑥̄ = 𝜇 = .53

𝑃(𝑥̄ ≥ .59) = 𝑃(𝑧 ≥ (.59 − .53)/.0273) = 𝑃(𝑧 ≥ 2.20) = .5 − .4861 = .0139

If After Tensioning: 𝜇_𝑥̄ = 𝜇 = .58

𝑃(𝑥̄ ≥ .59) = 𝑃(𝑧 ≥ (.59 − .58)/.0273) = 𝑃(𝑧 ≥ 0.37) = .5 − .1443 = .3557

Since the probability of getting a maximum differential of .59 or more Before Tensioning is so small, it would be very unlikely that the measurements were obtained Before Tensioning. However, since the probability of getting a maximum differential of .59 or more After Tensioning is not small, it would not be unusual that the measurements were obtained After Tensioning. Thus, most likely, the measurements were obtained After Tensioning.

5.78

For 𝑛 = 36, 𝜇_𝑥̄ = 𝜇 = 406 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 10.1/√36 = 1.6833. By the Central Limit Theorem, the sampling distribution is approximately normal (n is large).

𝑃(𝑥̄ ≤ 400.8) = 𝑃(𝑧 ≤ (400.8 − 406)/1.6833) = 𝑃(𝑧 ≤ −3.09) = .5 − .4990 = .0010 (Using Table II, Appendix D)

We agree with the first operator. If the true value of 𝜇 is 406, it would be extremely unlikely to observe an 𝑥̄ as small as 400.8 or smaller (probability .0010). Thus, we would infer that the true value of 𝜇 is less than 406.

5.79

By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with 𝜇_𝑥̄ = 𝜇 = 40 and 𝜎_𝑥̄ = 𝜎/√𝑛 = .5.

𝑃(𝑥̄ ≥ 42) = 𝑃(𝑧 ≥ (42 − 40)/.5) = 𝑃(𝑧 ≥ 4) ≈ .5 − .5 = 0 (Using Table II, Appendix D)

Since this probability is so small, it is very unlikely that the sample was selected from the population of convicted drug dealers.

5.80

a. By the Central Limit Theorem, the distribution of 𝑥̄ is approximately normal, with 𝜇_𝑥̄ = 𝜇 = 157 and 𝜎_𝑥̄ = 𝜎/√𝑛 = .474.

The sample mean is 1.3 psi below 157, or 𝑥̄ = 157 − 1.3 = 155.7.

𝑃(𝑥̄ ≤ 155.7) = 𝑃(𝑧 ≤ (155.7 − 157)/.474) = 𝑃(𝑧 ≤ −2.74) = .5 − .4969 = .0031 (Using Table II, Appendix D)

If the claim is true, it is very unlikely (probability = .0031) to observe a sample mean 1.3 psi below 157 psi. Thus, the actual population mean is probably not 157 but something lower.

b. If 𝜇 = 156: 𝑃(𝑥̄ ≤ 155.7) = 𝑃(𝑧 ≤ (155.7 − 156)/.474) = 𝑃(𝑧 ≤ −.63) = .5 − .2357 = .2643 (Using Table II, Appendix D)

The observed sample is more likely if 𝜇 = 156 rather than 𝜇 = 157.

If 𝜇 = 158: 𝑃(𝑥̄ ≤ 155.7) = 𝑃(𝑧 ≤ (155.7 − 158)/.474) = 𝑃(𝑧 ≤ −4.85) ≈ .5 − .5 = 0

The observed sample is less likely if 𝜇 = 158 rather than 𝜇 = 157.

c. If 𝜎 = 2, 𝜎_𝑥̄ = 𝜎/√𝑛 = .316.

𝑃(𝑥̄ ≤ 155.7) = 𝑃(𝑧 ≤ (155.7 − 157)/.316) = 𝑃(𝑧 ≤ −4.11) ≈ .5 − .5 = 0 (Using Table II, Appendix D)

The observed sample is less likely if 𝜎 = 2 than if 𝜎 = 3.

If 𝜎 = 6, 𝜎_𝑥̄ = 𝜎/√𝑛 = .949.

𝑃(𝑥̄ ≤ 155.7) = 𝑃(𝑧 ≤ (155.7 − 157)/.949) = 𝑃(𝑧 ≤ −1.37) = .5 − .4147 = .0853 (Using Table II, Appendix D)

The observed sample is more likely if 𝜎 = 6 than if 𝜎 = 3.

5.81

Answers will vary. We are to assume that the fecal bacteria concentrations of water specimens follow an approximately normal distribution. Now, suppose that the distribution of the fecal bacteria concentration at a beach is normal with a true mean of 360 and a standard deviation of 40. If only a single specimen was selected, then the probability of getting an observation at the 400 level or higher would be:

𝑃(𝑥 ≥ 400) = 𝑃(𝑧 ≥ (400 − 360)/40) = 𝑃(𝑧 ≥ 1) = .5 − .3413 = .1587 (Using Table II, Appendix D)

Thus, even if the water is safe, the beach would be closed approximately 15.87% of the time. On the other hand, if the mean was 440 and the standard deviation was still 40, then the probability of getting a single observation less than the 400 level would be:

𝑃(𝑥 ≤ 400) = 𝑃(𝑧 ≤ (400 − 440)/40) = 𝑃(𝑧 ≤ −1) = .5 − .3413 = .1587 (Using Table II, Appendix D)

Thus, the beach would remain open approximately 15.87% of the time when it should be closed.

Now, suppose we took a random sample of 64 water specimens. The sampling distribution of 𝑥̄ is approximately normal by the Central Limit Theorem with 𝜇_𝑥̄ = 𝜇 and 𝜎_𝑥̄ = 𝜎/√𝑛 = 40/√64 = 5.

If 𝜇 = 360, 𝑃(𝑥̄ ≥ 400) = 𝑃(𝑧 ≥ (400 − 360)/5) = 𝑃(𝑧 ≥ 8) ≈ .5 − .5 = 0. Thus, the beach would never be shut down if the water was actually safe if we took samples of size 64.

If 𝜇 = 440, 𝑃(𝑥̄ ≤ 400) = 𝑃(𝑧 ≤ (400 − 440)/5) = 𝑃(𝑧 ≤ −8) ≈ .5 − .5 = 0. Thus, the beach would never be left open if the water was actually unsafe if we took samples of size 64.

The single sample standard can lead to unsafe decisions or inconvenient decisions, but is much easier to collect than samples of size 64.
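The comparison between a single specimen and a sample of 64 can be sketched in a few lines of Python. This is an illustration under the same assumptions as the solution above (normal concentrations with 𝜎 = 40 and a closing threshold of 400); the function name is hypothetical.

```python
from statistics import NormalDist


def close_prob(mu, sigma, n, threshold=400.0):
    """P(sample mean >= threshold) when single readings are Normal(mu, sigma)."""
    se = sigma / n ** 0.5  # standard error of the mean for n specimens
    return 1.0 - NormalDist(mu, se).cdf(threshold)


if __name__ == "__main__":
    for n in (1, 64):
        wrongly_closed = close_prob(360, 40, n)       # water safe, beach closed
        wrongly_open = 1.0 - close_prob(440, 40, n)   # water unsafe, beach open
        print(n, round(wrongly_closed, 4), round(wrongly_open, 4))
```

With n = 1 both error probabilities are about .1587; with n = 64 both are essentially zero, matching the calculations above.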



Chapter 6 Inferences Based on a Single Sample: Estimation with Confidence Intervals

6.1

a. For 𝛼 = .10, 𝛼/2 = .10/2 = .05. 𝑧_𝛼/2 = 𝑧.05 is the z-score with .05 of the area to the right of it. The area between 0 and 𝑧.05 is .5 − .05 = .4500. Using Table II, Appendix D, 𝑧.05 = 1.645.

b. For 𝛼 = .01, 𝛼/2 = .01/2 = .005. 𝑧_𝛼/2 = 𝑧.005 is the z-score with .005 of the area to the right of it. The area between 0 and 𝑧.005 is .5 − .005 = .4950. Using Table II, Appendix D, 𝑧.005 = 2.575.

c. For 𝛼 = .05, 𝛼/2 = .05/2 = .025. 𝑧_𝛼/2 = 𝑧.025 is the z-score with .025 of the area to the right of it. The area between 0 and 𝑧.025 is .5 − .025 = .4750. Using Table II, Appendix D, 𝑧.025 = 1.96.

d. For 𝛼 = .20, 𝛼/2 = .20/2 = .10. 𝑧_𝛼/2 = 𝑧.10 is the z-score with .10 of the area to the right of it. The area between 0 and 𝑧.10 is .5 − .10 = .4000. Using Table II, Appendix D, 𝑧.10 = 1.28.

6.2

a. 𝑧_𝛼/2 = 1.96. Using Table II, Appendix D, 𝑃(0 ≤ 𝑧 ≤ 1.96) = .4750. Thus, 𝛼/2 = .5 − .4750 = .025, 𝛼 = 2(.025) = .05, and 1 − 𝛼 = 1 − .05 = .95. The confidence level is 100%(.95) = 95%.

b. 𝑧_𝛼/2 = 1.645. Using Table II, Appendix D, 𝑃(0 ≤ 𝑧 ≤ 1.645) = .45. Thus, 𝛼/2 = .5 − .45 = .05, 𝛼 = 2(.05) = .10, and 1 − 𝛼 = 1 − .10 = .90. The confidence level is 100%(.90) = 90%.

c. 𝑧_𝛼/2 = 2.575. Using Table II, Appendix D, 𝑃(0 ≤ 𝑧 ≤ 2.575) = .495. Thus, 𝛼/2 = .5 − .495 = .005, 𝛼 = 2(.005) = .01, and 1 − 𝛼 = 1 − .01 = .99. The confidence level is 100%(.99) = 99%.

d. 𝑧_𝛼/2 = 1.282. Using Table II, Appendix D, 𝑃(0 ≤ 𝑧 ≤ 1.282) = .4. Thus, 𝛼/2 = .5 − .4 = .1, 𝛼 = 2(.1) = .20, and 1 − 𝛼 = 1 − .20 = .80. The confidence level is 100%(.80) = 80%.

e. 𝑧_𝛼/2 = .99. Using Table II, Appendix D, 𝑃(0 ≤ 𝑧 ≤ .99) = .3389. Thus, 𝛼/2 = .5 − .3389 = .1611, 𝛼 = 2(.1611) = .3222, and 1 − 𝛼 = 1 − .3222 = .6778. The confidence level is 100%(.6778) = 67.78%.

6.3

a. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 28 ± 1.96 𝜎_𝑥̄ ⇒ 28 ± .784 ⇒ (27.216, 28.784)

b. 𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 102 ± 1.96 𝜎_𝑥̄ ⇒ 102 ± .65 ⇒ (101.35, 102.65)

c. 𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 15 ± 1.96 𝜎_𝑥̄ ⇒ 15 ± .0588 ⇒ (14.9412, 15.0588)

d. 𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 4.05 ± 1.96 𝜎_𝑥̄ ⇒ 4.05 ± .163 ⇒ (3.887, 4.213)

e. No. Since the sample size in each part was large (n ranged from 75 to 200), the Central Limit Theorem indicates that the sampling distribution of 𝑥̄ is approximately normal.

6.4

a. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 25.9 ± 1.96 𝜎_𝑥̄ ⇒ 25.9 ± .56 ⇒ (25.34, 26.46)

b. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, 𝑧.05 = 1.645. The confidence interval is:

𝑥̄ ± 𝑧.05 𝜎_𝑥̄ ⇒ 25.9 ± 1.645 𝜎_𝑥̄ ⇒ 25.9 ± .47 ⇒ (25.43, 26.37)

c. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.58. The confidence interval is:

𝑥̄ ± 𝑧.005 𝜎_𝑥̄ ⇒ 25.9 ± 2.58 𝜎_𝑥̄ ⇒ 25.9 ± .73 ⇒ (25.17, 26.63)

6.5

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧_𝛼/2 𝜎_𝑥̄ ⇒ 26.2 ± 1.96 𝜎_𝑥̄ ⇒ 26.2 ± .96 ⇒ (25.24, 27.16)

b. The confidence coefficient of .95 means that in repeated sampling, 95% of all confidence intervals constructed will include 𝜇.

c. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.58. The confidence interval is:

𝑥̄ ± 𝑧_𝛼/2 𝜎_𝑥̄ ⇒ 26.2 ± 2.58 𝜎_𝑥̄ ⇒ 26.2 ± 1.26 ⇒ (24.94, 27.46)

d. As the confidence coefficient increases, the width of the confidence interval also increases.

e. Yes. Since the sample size is 70, the Central Limit Theorem applies. This ensures the distribution of 𝑥̄ is approximately normal, regardless of the original distribution.

6.6

If we were to repeatedly draw samples from the population and form the interval 𝑥̄ ± 1.96𝜎_𝑥̄ each time, approximately 95% of the intervals would contain 𝜇. We have no way of knowing whether our interval estimate is one of the 95% that contain 𝜇 or one of the 5% that do not.

6.7

A point estimator is a single value used to estimate the parameter, 𝜇. An interval estimator is two values, an upper and lower bound, which define an interval with which we attempt to enclose the parameter, 𝜇. An interval estimate also has a measure of confidence associated with it.

6.8

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 33.9 ± 1.96 𝜎_𝑥̄ ⇒ 33.9 ± .647 ⇒ (33.253, 34.547)

b. 𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 33.9 ± 1.96 𝜎_𝑥̄ ⇒ 33.9 ± .323 ⇒ (33.577, 34.223)

c. For part a, the width of the interval is 2(.647) = 1.294. For part b, the width of the interval is 2(.323) = .646. When the sample size is quadrupled, the width of the confidence interval is halved.
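The halving of the width follows from the √𝑛 in the standard error. The sketch below illustrates this with a hypothetical helper; the values 𝜎 = 3.3, n = 100, and n = 400 are assumptions chosen only because they reproduce the half-widths .647 and .323 reported above, and are not taken from the exercise statement.

```python
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Large-sample confidence interval for a mean: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    # sigma = 3.3 and the two sample sizes are illustrative assumptions
    for n in (100, 400):
        low, high = z_interval(33.9, 3.3, n)
        print(n, round(low, 3), round(high, 3), round(high - low, 3))
```

Quadrupling n doubles √𝑛, so the half-width, and therefore the full width, is cut in half regardless of the particular 𝜎.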


6.9

Yes. As long as the sample size is sufficiently large, the Central Limit Theorem says the distribution of 𝑥̄ is approximately normal regardless of the original distribution.

6.10

a.

The 90% confidence interval is (155.8, 206.2).

b.

We are 90% confident that the true mean number of crowdfunding backers for all entrepreneurial projects pitched via the internet is between 155.8 and 206.2.

c.

In repeated sampling, 90% of all confidence intervals constructed will contain the true mean.

6.11

a. The confidence coefficient is .95.

b. We are 95% confident that the true mean HRV for officers diagnosed with hypertension is between 4.1 and 124.5. We are 95% confident that the true mean HRV for officers that are not hypertensive is between 4.1 and 124.5.

c. In repeated sampling, 95% of all confidence intervals constructed will contain the true mean.

d. To reduce the width of the confidence interval, one would reduce the confidence coefficient. The smaller the confidence coefficient, the smaller the width of the confidence interval.

6.12

a. The 99% confidence interval is (65.553, 69.957).

b. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.576. The confidence interval is:

𝑥̄ ± 𝑧.005 𝜎_𝑥̄ ⇒ 67.755 ± 2.576(.85315) ⇒ 67.755 ± 2.1978 ⇒ (65.557, 69.953)

This is close to the interval reported in the output.

c. We are 99% confident that the true mean level of support for all senior managers is between 65.557 and 69.953.

d. No. The 99% confidence interval does not contain 75. Therefore, it is not a likely value for the true mean.

6.13

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 112 ± 1.96 𝜎_𝑥̄ ⇒ 112 ± 21.46 ⇒ (90.54, 133.46)

We are 95% confident that the true mean tipping point of all daily deal offerings in Korea is between 90.54 and 133.46.

6.14

a.

From the printout, the 95% confidence interval is (1.6776, 2.1924).

b.

We are 95% confident that the true mean failure time of used colored display panels is between 1.6776 and 2.1924 years.

c.

If 95% confidence intervals are formed, then approximately .95 of the intervals will contain the true mean failure time.

6.15

a.

Because the sample size is large, the Central Limit Theorem guarantees that the sampling distribution of 𝑥̅ will be approximately normally distributed.

b.

Using MINITAB, the 95% confidence interval is:

Descriptive Statistics

N     Mean     StDev    SE Mean   95% CI for μ
130   276.85   139.60   12.244    (252.63, 301.08)

We are 95% confident that the true mean size of the data packets in the attack state is between 252.63 and 301.08 bytes.

c. Since the entire interval is below the value 337 bytes, there is evidence to support the theory. We are 95% confident that the true mean size of the data packets in the attack state is less than 337 bytes.

6.16

a. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.576. The confidence interval is:

𝑥̄ ± 𝑧.005 𝜎_𝑥̄ ⇒ .149 ± 2.576 𝜎_𝑥̄ ⇒ .149 ± .0072 ⇒ (.1418, .1562)

We are 99% confident that the true mean DROS for all companies that claim the tax deduction is between .1418 and .1562.

b. To reduce the width of the confidence interval, one could reduce the confidence coefficient or increase the size of the sample. Either of these options would result in a smaller width for the confidence interval.

6.17

a. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, 𝑧.05 = 1.645. The confidence interval is:

𝑥̄ ± 𝑧.05 𝜎_𝑥̄ ⇒ 2.42 ± 1.645 𝜎_𝑥̄ ⇒ 2.42 ± .504 ⇒ (1.916, 2.924)

b. We are 90% confident that the true mean intention to comply score for the population of entry level accountants is between 1.916 and 2.924.

c. In repeated sampling, 90% of all similarly constructed confidence intervals will contain the true value of 𝜇.

d. 𝑥̄ ± 2𝑠 ⇒ 2.42 ± 2(2.84) ⇒ 2.42 ± 5.68 ⇒ (−3.26, 8.10). This interval gives the range of the actual values of x, while the confidence interval in part a gives the range of values for the population mean.

6.18

a. The population of interest is all U.S. women who shop on Black Friday.

b. The quantitative variable of interest is the number of hours spent shopping.

c. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Hours

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Hours      38   6.079   2.755   3.000     4.000   5.000    7.250   16.000

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 (𝑠/√𝑛) ⇒ 6.079 ± 1.96(2.755/√38) ⇒ 6.079 ± .876 ⇒ (5.203, 6.955)

d. We are 95% confident that the true mean number of hours spent shopping on Black Friday is between 5.203 and 6.955.

e. No. The confidence interval constructed in part c contains 5.5. Therefore, 5.5 is not an unusual value for the mean.

6.19

a. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 31.9 ± 1.96 𝜎_𝑥̄ ⇒ 31.9 ± .539 ⇒ (31.361, 32.439)

We are 95% confident that the true mean amount of food wasted by all US households is between 31.361% and 32.439%.

b. We do not agree. The entire confidence interval is contained above the value 30%. Therefore, we are 95% confident that the true mean amount of food wasted by all US households exceeds 30%.

6.20

a. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 1.96 ± 1.96 𝜎_𝑥̄ ⇒ 1.96 ± .04 ⇒ (1.92, 2.00)

b. No. The value of 2.2 does not fall in the 95% confidence interval. Therefore, it is not a likely value for the true mean facial WHR.

6.21

We will create 95% confidence intervals for both the true mean good head posture and the true mean good neck posture. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval for the true mean head posture is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 48.97 ± 1.96 𝜎_𝑥̄ ⇒ 48.97 ± 7.79 ⇒ (41.18, 56.76)

We are 95% confident that the true mean good head posture is between 41.18% and 56.76%. Given these endpoints, a value of 55% for the true mean good head posture is possible.

The confidence interval for the true mean neck posture is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 43.20 ± 1.96 𝜎_𝑥̄ ⇒ 43.20 ± 7.54 ⇒ (35.66, 50.74)

We are 95% confident that the true mean good neck posture is between 35.66% and 50.74%. Given these endpoints, a value of 55% for the true mean good neck posture is not likely.

6.22

𝑥̄ = 2.26

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

𝑥̄ ± 𝑧.025 𝜎_𝑥̄ ⇒ 2.26 ± 1.96 𝜎_𝑥̄ ⇒ 2.26 ± .04 ⇒ (2.22, 2.30)

We are 95% confident the mean number of roaches produced per roach per week is between 2.22 and 2.30.

6.23

a. For confidence coefficient .80, 𝛼 = .20 and 𝛼/2 = .20/2 = .10. From Table II, Appendix D, 𝑧.10 = 1.28. From Table III, with df = 𝑛 − 1 = 5 − 1 = 4, 𝑡.10 = 1.533.

b. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, 𝑧.05 = 1.645. From Table III, with df = 𝑛 − 1 = 5 − 1 = 4, 𝑡.05 = 2.132.

c. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. From Table III, with df = 𝑛 − 1 = 5 − 1 = 4, 𝑡.025 = 2.776.

d. For confidence coefficient .98, 𝛼 = .02 and 𝛼/2 = .02/2 = .01. From Table II, Appendix D, 𝑧.01 = 2.33. From Table III, with df = 𝑛 − 1 = 5 − 1 = 4, 𝑡.01 = 3.747.

e. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.575. From Table III, with df = 𝑛 − 1 = 5 − 1 = 4, 𝑡.005 = 4.604.

f. Both the t- and z-distributions are symmetric around 0 and mound-shaped. The t-distribution is more spread out than the z-distribution.

6.24

a. If x is normally distributed, the sampling distribution of 𝑥̄ is normal, regardless of the sample size.

b. If nothing is known about the distribution of x, the sampling distribution of 𝑥̄ is approximately normal if n is sufficiently large. If n is not large, the distribution of 𝑥̄ is unknown if the distribution of x is not known.

6.25

a. 𝑃(−𝑡₀ < 𝑡 < 𝑡₀) = .95 where 𝑑𝑓 = 10. Because of symmetry, the statement can be written 𝑃(0 < 𝑡 < 𝑡₀) = .475 where 𝑑𝑓 = 10 ⇒ 𝑃(𝑡 ≥ 𝑡₀) = .025 ⇒ 𝑡₀ = 2.228

b. 𝑃(𝑡 ≤ −𝑡₀ or 𝑡 ≥ 𝑡₀) = .05 where 𝑑𝑓 = 10 ⇒ 2𝑃(𝑡 ≥ 𝑡₀) = .05 ⇒ 𝑃(𝑡 ≥ 𝑡₀) = .025 ⇒ 𝑡₀ = 2.228

c. 𝑃(𝑡 ≤ 𝑡₀) = .05 where 𝑑𝑓 = 10. Because of symmetry, the statement can be written 𝑃(𝑡 ≥ −𝑡₀) = .05 ⇒ 𝑡₀ = −1.812

d. 𝑃(𝑡 ≤ −𝑡₀ or 𝑡 ≥ 𝑡₀) = .10 where 𝑑𝑓 = 20 ⇒ 2𝑃(𝑡 ≥ 𝑡₀) = .10 ⇒ 𝑃(𝑡 ≥ 𝑡₀) = .05 ⇒ 𝑡₀ = 1.725

e. 𝑃(𝑡 ≤ −𝑡₀ or 𝑡 ≥ 𝑡₀) = .01 where 𝑑𝑓 = 5 ⇒ 2𝑃(𝑡 ≥ 𝑡₀) = .01 ⇒ 𝑃(𝑡 ≥ 𝑡₀) = .005 ⇒ 𝑡₀ = 4.032

6.26

a. 𝑃(𝑡 ≥ 𝑡₀) = .025 where 𝑑𝑓 = 11; 𝑡₀ = 2.201

b. 𝑃(𝑡 ≥ 𝑡₀) = .01 where 𝑑𝑓 = 9; 𝑡₀ = 2.821

c. 𝑃(𝑡 ≤ 𝑡₀) = .005 where 𝑑𝑓 = 6. Because of symmetry, the statement can be rewritten 𝑃(𝑡 ≥ −𝑡₀) = .005 where 𝑑𝑓 = 6; 𝑡₀ = −3.707

d. 𝑃(𝑡 ≤ 𝑡₀) = .05 where 𝑑𝑓 = 18; 𝑡₀ = −1.734

6.27

First, we must compute 𝑥̄ and s.

𝑥̄ = Σ𝑥/𝑛 = 30/6 = 5

𝑠² = [Σ𝑥² − (Σ𝑥)²/𝑛]/(𝑛 − 1) = [176 − (30)²/6]/(6 − 1) = 26/5 = 5.2, 𝑠 = √5.2 = 2.2804

a. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with df = 𝑛 − 1 = 6 − 1 = 5, 𝑡.05 = 2.015. The 90% confidence interval is:

𝑥̄ ± 𝑡.05 (𝑠/√𝑛) ⇒ 5 ± 2.015(2.2804/√6) ⇒ 5 ± 1.88 ⇒ (3.12, 6.88)

b. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with df = 𝑛 − 1 = 6 − 1 = 5, 𝑡.025 = 2.571. The 95% confidence interval is:

𝑥̄ ± 𝑡.025 (𝑠/√𝑛) ⇒ 5 ± 2.571(2.2804/√6) ⇒ 5 ± 2.39 ⇒ (2.61, 7.39)

c. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table III, Appendix D, with df = 𝑛 − 1 = 6 − 1 = 5, 𝑡.005 = 4.032. The 99% confidence interval is:

𝑥̄ ± 𝑡.005 (𝑠/√𝑛) ⇒ 5 ± 4.032(2.2804/√6) ⇒ 5 ± 3.75 ⇒ (1.25, 8.75)

d. a) For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with df = 𝑛 − 1 = 25 − 1 = 24, 𝑡.05 = 1.711. The 90% confidence interval is:

𝑥̄ ± 𝑡.05 (𝑠/√𝑛) ⇒ 5 ± 1.711(2.2804/√25) ⇒ 5 ± .78 ⇒ (4.22, 5.78)

b) For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with df = 𝑛 − 1 = 25 − 1 = 24, 𝑡.025 = 2.064. The 95% confidence interval is:

𝑥̄ ± 𝑡.025 (𝑠/√𝑛) ⇒ 5 ± 2.064(2.2804/√25) ⇒ 5 ± .94 ⇒ (4.06, 5.94)

c) For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table III, Appendix D, with df = 𝑛 − 1 = 25 − 1 = 24, 𝑡.005 = 2.797. The 99% confidence interval is:

𝑥̄ ± 𝑡.005 (𝑠/√𝑛) ⇒ 5 ± 2.797(2.2804/√25) ⇒ 5 ± 1.28 ⇒ (3.72, 6.28)

Increasing the sample size decreases the width of the confidence interval.

6.28

For this sample, 𝑥̄ = Σ𝑥/𝑛 = 97.9375,

𝑠² = [Σ𝑥² − (Σ𝑥)²/𝑛]/(𝑛 − 1) = 159.9292, 𝑠 = √𝑠² = 12.6463

a. For confidence coefficient .80, 𝛼 = .20 and 𝛼/2 = .20/2 = .10. From Table III, Appendix D, with df = 𝑛 − 1 = 16 − 1 = 15, 𝑡.10 = 1.341. The 80% confidence interval for 𝜇 is:

𝑥̄ ± 𝑡.10 (𝑠/√𝑛) ⇒ 97.94 ± 1.341(12.6463/√16) ⇒ 97.94 ± 4.240 ⇒ (93.700, 102.180)

b. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with df = 𝑛 − 1 = 16 − 1 = 15, 𝑡.025 = 2.131. The 95% confidence interval for 𝜇 is:

𝑥̄ ± 𝑡.025 (𝑠/√𝑛) ⇒ 97.94 ± 2.131(12.6463/√16) ⇒ 97.94 ± 6.737 ⇒ (91.203, 104.677)

The 95% confidence interval for 𝜇 is wider than the 80% confidence interval for 𝜇 found in part a.

c. For part a: We are 80% confident that the true population mean lies between 93.700 and 102.180. For part b: We are 95% confident that the true population mean lies between 91.203 and 104.677. The 95% confidence interval is wider than the 80% confidence interval because the more confident you want to be that 𝜇 lies in an interval, the wider the range of possible values.
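The two intervals in this exercise differ only in the tabled t-value. As an illustrative sketch (the function name is hypothetical, and the critical values 1.341 and 2.131 are simply copied from the solution above because the Python standard library has no t-distribution), the arithmetic can be reproduced as follows:

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Small-sample confidence interval for a mean: xbar +/- t * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    # Summary statistics and critical values as given in the solution (df = 15)
    for label, t_crit in (("80%", 1.341), ("95%", 2.131)):
        low, high = t_interval(97.94, 12.6463, 16, t_crit)
        print(label, round(low, 3), round(high, 3))
```

Holding 𝑥̄, s, and n fixed, the larger critical value is the only reason the 95% interval is wider than the 80% interval.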

6.29

a.

The dairy farmer’s target parameter is µ = true mean December 2019 retail whole milk price for all US cities.

b.

Using MINITAB, the following printout was created:

Descriptive Statistics

N    Mean    StDev   SE Mean   95% CI for μ
10   3.502   0.574   0.182     (3.091, 3.913)

The point estimate is 𝑥̅ = 3.502.

c.

The normal distribution would require a large sample. In this case, a sample of n = 10 was taken.

d.

The 95% confidence interval is (3.091, 3.913)

e.

We are 95% confident that the true mean December 2019 retail whole milk price for all US cities falls between $3.091 and $3.913.

f.

The two conditions that must be satisfied are:

• The ten cities sampled were randomly sampled from the population of all US cities.
• The distribution of milk prices in all US cities is approximately normal.

6.30

a.

The confidence coefficient used is .90. In repeated sampling, 90% of the intervals created would contain the true population mean.

b.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with df = 𝑛 − 1 = 12 − 1 = 11, 𝑡. = 1.796.

c.

We are 90% confident that the true mean rank of China falls between 2.96 and 9.36.


6.31

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics

N    Mean    StDev   SE Mean   99% CI for μ
28   3.214   1.371   0.259     (2.497, 3.932)

The 99% confidence interval is (2.497, 3.932).

b. We are 99% confident that the true mean number of wheels used on all social robots is between 2.497 and 3.932.

c. 99% of all similarly constructed confidence intervals will contain the true mean.

6.32

We must assume that the distribution of the LOS's for all patients is normal.

a. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with df = 𝑛 − 1 = 20 − 1 = 19, 𝑡.05 = 1.729. The 90% confidence interval is:

𝑥̄ ± 𝑡.05 (𝑠/√𝑛) ⇒ 3.8 ± 1.729(𝑠/√20) ⇒ 3.8 ± .464 ⇒ (3.336, 4.264)

b. We are 90% confident that the mean LOS is between 3.336 and 4.264 days.

c. “90% confidence” means that if repeated samples of size n are selected from a population and 90% confidence intervals are constructed, 90% of all intervals thus constructed will contain the population mean.

6.33

a. From the printout, the 95% confidence interval is (7.639, 8.814).

b. No. We are 95% confident that the true mean ratio of repair to replacement cost is between 7.639 and 8.814. The value 7.0 is well below this range and would be very unlikely.

c. We need to assume that the population of ratios of repair to replacement cost is normally distributed and that the sample was a random sample.

6.34

a.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, withdf = 𝑛 − 1 = 15 − 1 = 14, 𝑡. = 1.761. The 90% confidence interval is: 𝑥̄ ± 𝑡.

6.35

⇒ 18 ± 1.761

⇒ 18 ± 9.09 ⇒ (8.91, 27.09)

b.

Yes. We are 90% confident that the true mean absolute deviation percentage is between 8.91% and 27.09%. These values are all smaller than the 34%.

a.

Using MINITAB, the confidence interval is: One-Sample T: Velocities Variable N Mean Velocities 25 0.26208

StDev 0.04669

SE Mean 0.00934

95% CI (0.24281, 0.28135)

We are 95% confident that the true mean bubble rising velocity is between .24281 and .28135. b.

No. The value of 𝜇 = .338 is not in the interval and would be a very unlikely value for the mean. Thus, the data in the table were not generated at this sparging rate. Copyright © 2022 Pearson Education, Inc.


6.36

a.

Using MINITAB, the confidence interval is:

Descriptive Statistics
N    Mean    StDev   SE Mean   95% CI for μ
13   7.419   0.922   0.256     (6.862, 7.976)

We are 95% confident that the mean 5-year capitalization rate of all single-property retail tenants is between 6.862% and 7.976%.

b.

Using MINITAB, the confidence interval is:

Descriptive Statistics
N    Mean    StDev   SE Mean   95% CI for μ
4    8.238   0.492   0.246     (7.454, 9.021)

We are 95% confident that the mean 5-year capitalization rate of all single-property retail tenants with a low S&P rating is between 7.454% and 9.021%.

c.

Using MINITAB, the confidence interval is:

Descriptive Statistics
N    Mean    StDev   SE Mean   95% CI for μ
3    6.333   0.722   0.417     (4.541, 8.126)

We are 95% confident that the mean 5-year capitalization rate of all single-property retail tenants with a high S&P rating is between 4.541% and 8.126%.

6.37

Using MINITAB, the confidence intervals are:

Descriptive Statistics
Sample           N   Mean    StDev   SE Mean   99% CI for μ
Traditionalism   9   6.556   2.698   0.899     (3.538, 9.573)
Adaptation       9   5.00    3.28    1.09      (1.33, 8.67)

a.

We are 99% confident that the true mean perceived level of traditionalism at all Greek restaurants is between 3.538 and 9.573.

b.

We are 99% confident that the true mean perceived level of adaptation at all Greek restaurants is between 1.33 and 8.67.

c.

Because the 99% confidence interval for traditionalism does not include the value 2, it would surprise us if the true mean perceived level of traditionalism was equal to 2.

d.

Because the 99% confidence interval for adaptation does include the value 2, it would not surprise us if the true mean perceived level of adaptation was equal to 2.


6.38

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Diox Amt
Variable   Crude   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Diox Amt   No      10   2.590   1.542   0.100     1.125   2.850    4.000   4.000
           Yes     6    0.517   0.407   0.200     0.200   0.450    0.700   1.300

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with df \(= n - 1 = 6 - 1 = 5\), \(t_{.025} = 2.571\). The 95% confidence interval is:

\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow .517 \pm 2.571\frac{.407}{\sqrt{6}} \Rightarrow .517 \pm .427 \Rightarrow (.090, .944)\]

We are 95% confident that the true mean amount of dioxide present in water specimens that contain oil is between .090 and .944 mg/l.

b.

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table III, Appendix D, with df \(= n - 1 = 10 - 1 = 9\), \(t_{.025} = 2.262\). The 95% confidence interval is:

\[\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 2.590 \pm 2.262\frac{1.542}{\sqrt{10}} \Rightarrow 2.590 \pm 1.103 \Rightarrow (1.487, 3.693)\]

We are 95% confident that the true mean amount of dioxide present in water specimens that do not contain oil is between 1.487 and 3.693 mg/l.
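The small-sample intervals in parts a and b can be checked numerically. The sketch below is only an illustration: the function name `t_interval` is hypothetical, and the critical values are typed in from Table III rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Part a: specimens containing oil (n = 6, df = 5, t_.025 = 2.571 from Table III)
oil = t_interval(0.517, 0.407, 6, 2.571)

# Part b: specimens not containing oil (n = 10, df = 9, t_.025 = 2.262 from Table III)
no_oil = t_interval(2.590, 1.542, 10, 2.262)

print([round(v, 3) for v in oil])
print([round(v, 3) for v in no_oil])
```

Taking the critical value as an argument keeps the sketch dependency-free and mirrors the table lookup used in the written solution.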

c.

Since the confidence interval for the mean amount of dioxide present in water specimens that contain oil is entirely below the confidence interval for the mean amount of dioxide present in water specimens that do not contain oil, we can conclude that the mean amount of dioxide present in water containing oil is significantly less than the mean amount of dioxide present in water not containing oil.

6.39

a.

The population from which the sample was drawn is the Forbes 228 Largest Private companies.

b.

Using MINITAB, the confidence interval is:

Descriptive Statistics
N    Mean   StDev   SE Mean   98% CI for μ
15   7.46   7.18    1.85      (2.60, 12.32)

c.

We are 98% confident that the true mean revenue is between $2.60 and $12.32 billion.

d.

The population must be approximately normally distributed in order for the procedure used in part b to be valid.

e.

Yes. The value of $5.0 billion falls in the 98% confidence interval computed in part b. Therefore, we should believe the claim.

6.40

By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{pq/n}\).

6.41

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\).

a.

When \(n = 400\), \(\hat{p} = .10\): \(n\hat{p} = 400(.10) = 40\) and \(n\hat{q} = 400(.90) = 360\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

b.

When \(n = 50\), \(\hat{p} = .10\): \(n\hat{p} = 50(.10) = 5\) and \(n\hat{q} = 50(.90) = 45\). Since \(n\hat{p}\) is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

c.

When \(n = 20\), \(\hat{p} = .5\): \(n\hat{p} = 20(.5) = 10\) and \(n\hat{q} = 20(.5) = 10\). Since both numbers are less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

d.

When \(n = 20\), \(\hat{p} = .3\): \(n\hat{p} = 20(.3) = 6\) and \(n\hat{q} = 20(.7) = 14\). Since both numbers are less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

6.42

a.

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 121(.88) = 106.48\) and \(n\hat{q} = 121(.12) = 14.52\). Since \(n\hat{q}\) is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable. However, 14.52 is very close to 15, so the normal approximation may work fairly well.
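The sample-size condition applied in this exercise and in Exercise 6.41 is mechanical enough to script. The following is a minimal sketch; the helper name `large_enough` and the `threshold` parameter are illustrative choices, not part of the text.

```python
def large_enough(n, p_hat, threshold=15):
    """True when both n*p_hat and n*(1 - p_hat) are at least the threshold."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

# (n, p_hat) pairs taken from Exercises 6.41 a-d and 6.42 a
cases = [(400, 0.10), (50, 0.10), (20, 0.5), (20, 0.3), (121, 0.88)]
results = {case: large_enough(*case) for case in cases}

for case, ok in results.items():
    print(case, "large enough" if ok else "not large enough")
```

A strict check like this reports the 121-observation sample as too small; the judgment that 14.52 is "close enough" to 15 remains a call for the analyst.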

b.

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The 90% confidence interval is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .88 \pm 1.645\sqrt{\frac{.88(.12)}{121}} \Rightarrow .88 \pm .049 \Rightarrow (.831, .929)\]

c.

We must assume that the sample is a random sample from the population of interest and that the sample size is sufficiently large.

6.43

a.

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 225(.46) = 103.5\) and \(n\hat{q} = 225(.54) = 121.5\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

b.

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The 95% confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .46 \pm 1.96\sqrt{\frac{.46(.54)}{225}} \Rightarrow .46 \pm .065 \Rightarrow (.395, .525)\]

c.

We are 95% confident the true value of p falls between .395 and .525.

d.

"95% confidence interval" means that if repeated samples of size 225 were selected from the population and 95% confidence intervals formed, 95% of all confidence intervals will contain the true value of p.

6.44

a.

Of the 50 observations, 15 like the product \(\Rightarrow \hat{p} = 15/50 = .30\).

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 50(.3) = 15\) and \(n\hat{q} = 50(.7) = 35\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

For confidence coefficient .80, \(\alpha = .20\) and \(\alpha/2 = .20/2 = .10\). From Table II, Appendix D, \(z_{.10} = 1.28\). The confidence interval is:

\[\hat{p} \pm z_{.10}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm 1.28\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .3 \pm 1.28\sqrt{\frac{.3(.7)}{50}} \Rightarrow .3 \pm .083 \Rightarrow (.217, .383)\]

b.

We are 80% confident the proportion of all consumers who like the new snack food is between .217 and .383.

6.45

a.

\(\hat{p} = x/n = .5\)

b.

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5 \pm 1.645\sqrt{\frac{.5(.5)}{n}} \Rightarrow .5 \pm .052 \Rightarrow (.448, .552)\]

c.

We are 90% confident that the true proportion of all U.S. adults who would agree to participate in a store loyalty card program, despite the potential for information sharing, is between .448 and .552.

d.

“We are 90% confident” means that in repeated sampling, 90% of all intervals constructed in a similar manner will contain the true proportion.

6.46

a.

\(\hat{p} = x/n = .20\)

b.

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 800(.2) = 160\) and \(n\hat{q} = 800(.8) = 640\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{pq/n}\).

c.

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The 95% confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .20 \pm 1.96\sqrt{\frac{.20(.80)}{800}} \Rightarrow .20 \pm .028 \Rightarrow (.172, .228)\]

d.

We are 95% confident that the true proportion of cord cutters falls between .172 and .228.

e.

Since .10 falls below the 95% confidence interval, the claim that p = .10 is not believable.

6.47

a.

\(\hat{p} = x/n = .24\)

b.

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 1{,}083(.24) = 260\) and \(n\hat{q} = 1{,}083(.76) = 823\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

By the Central Limit Theorem, the sampling distribution of \(\hat{p}\) is approximately normal with mean \(\mu_{\hat{p}} = p\) and standard deviation \(\sigma_{\hat{p}} = \sqrt{pq/n}\).

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The 90% confidence interval is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .24 \pm 1.645\sqrt{\frac{.24(.76)}{1{,}083}} \Rightarrow .24 \pm .021 \Rightarrow (.219, .261)\]

c.

We are 90% confident that the true proportion of all Americans who believe they have achieved the American Dream falls between .219 and .261.

d.

In repeated sampling, 90% of the intervals created would contain the true value of p.

6.48

a.

\(\hat{p} = x/n = .594\)

For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:

\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .594 \pm 2.58\sqrt{\frac{.594(.406)}{n}} \Rightarrow .594 \pm .123 \Rightarrow (.471, .717)\]

We are 99% confident that the true proportion of all social robots designed with legs but no wheels is between .471 and .717.

b.

Since .40 does not fall in the 99% confidence interval, it is very unlikely that the true proportion of all social robots designed with legs but no wheels is .40.

6.49

a.

The population is the set of all small businesses.

b.

The sample is the 529 small businesses that data was collected from in this study.

c.

The parameter of interest is p = the proportion of all small businesses that currently have a website.

d.

\(\hat{p} = x/n = .599\)

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .599 \pm 1.96\sqrt{\frac{.599(.401)}{529}} \Rightarrow .599 \pm .042 \Rightarrow (.557, .641)\]

We are 95% confident that the true proportion of all small businesses that currently have a website is between .557 and .641.

6.50

\(\tilde{p} = .058\)

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \Rightarrow .058 \pm .011 \Rightarrow (.047, .069)\]

6.51

\(\hat{p} = x/n = .799\)

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The confidence interval is:

\[\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .799 \pm 1.645\sqrt{\frac{.799(.201)}{n}} \Rightarrow .799 \pm .017 \Rightarrow (.782, .816)\]

We are 90% confident that the true probability of an expected cyberattack at a firm during the year is between .782 and .816.

6.52

a.

For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:

\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .37 \pm 2.58\sqrt{\frac{.37(.63)}{n}} \Rightarrow .37 \pm .076 \Rightarrow (.294, .446)\]

b.

We are 99% confident that the true proportion of all U.S. adult workers who prepare their own tax return is between .294 and .446. Because this interval does not contain .50, the claim is invalid.

c.

The sample was not a random sample, but a convenience or biased sample. Thus, the results will not be valid.

6.53

\(\hat{p} = x/n = .22\)

Suppose we form a 95% confidence interval for the true proportion of minority-owned franchises in Mississippi. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .22 \pm 1.96\sqrt{\frac{.22(.78)}{n}} \Rightarrow .22 \pm .08 \Rightarrow (.14, .30)\]

We are 95% confident that the true percentage of minority-owned franchises in Mississippi is between 14% and 30%. Since 30.8% falls above this interval, we would conclude that the percentage of minority-owned franchises in Mississippi is less than the national value.

6.54

\(\hat{p} = x/n = .82\)

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .82 \pm 1.96\sqrt{\frac{.82(.18)}{n}} \Rightarrow .82 \pm .114 \Rightarrow (.706, .934)\]

We are 95% confident that the true proportion of aircraft bird strikes that occurred above 100 feet is between .706 and .934. Thus, the claim that less than 70% of aircraft bird strikes occur above 100 feet is invalid.

6.55

a.

In order for the large-sample estimation method to be valid, \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). For this exercise, \(\hat{p} = x/n = 1/333 = .003\), \(n\hat{p} = 333(.003) = .999\), and \(n\hat{q} = 333(.997) = 332.001\). Since one of these values is less than 15, the large-sample estimation method is not valid.

b.

\(\tilde{p} = \frac{x+2}{n+4} = \frac{1+2}{333+4} = \frac{3}{337} = .009\)

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \Rightarrow .009 \pm 1.96\sqrt{\frac{.009(.991)}{337}} \Rightarrow .009 \pm .010 \Rightarrow (-.001, .019)\]

We are 95% confident that the true proportion of all mountain casualties that require a femoral shaft splint is between 0 and .019. (We know the proportion cannot be negative, so the lower end point must be 0.)

6.56

a.

The point estimate of p is \(\hat{p} = x/n = 16/308 = .052\).

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 308(.052) = 16\) and \(n\hat{q} = 308(.948) = 292\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:

\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .052 \pm 2.58\sqrt{\frac{.052(.948)}{308}} \Rightarrow .052 \pm .033 \Rightarrow (.019, .085)\]

We are 99% confident that the true proportion of diamonds for sale on the open market that are classified as “D” color is between .019 and .085.

b.

The point estimate of p is \(\hat{p} = x/n = 81/308 = .263\).

The sample size is large enough if both \(n\hat{p} \ge 15\) and \(n\hat{q} \ge 15\). Here \(n\hat{p} = 308(.263) = 81\) and \(n\hat{q} = 308(.737) = 227\). Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.58\). The confidence interval is:

\[\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .263 \pm 2.58\sqrt{\frac{.263(.737)}{308}} \Rightarrow .263 \pm .065 \Rightarrow (.198, .328)\]

We are 99% confident that the true proportion of diamonds for sale on the open market that are classified as “VS1” clarity is between .198 and .328.
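As a quick check on the two large-sample intervals in this exercise, the arithmetic can be sketched in a few lines. The function name `proportion_interval` is illustrative, and the critical value 2.58 is entered from Table II rather than computed.

```python
import math

def proportion_interval(p_hat, n, z):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Part a: "D" color diamonds; part b: "VS1" clarity diamonds (n = 308, z_.005 = 2.58)
d_color = proportion_interval(0.052, 308, 2.58)
vs1 = proportion_interval(0.263, 308, 2.58)

print([round(v, 3) for v in d_color])
print([round(v, 3) for v in vs1])
```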


6.57

Let p = the true proportion of all Android apps with a malicious reflection API. The point estimate of p is \(\hat{p} = x/n = .405\).

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .405 \pm 1.96\sqrt{\frac{.405(.595)}{n}} \Rightarrow .405 \pm .037 \Rightarrow (.367, .442)\]

We are 95% confident that the true proportion of all Android apps with a malicious reflection API falls between 36.7% and 44.2%.

6.58

a.

Let p = the true proportion of products purchased that are considered healthy during the indulgent scent hour. The point estimate of p is \(\hat{p} = x/n = .295\).

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .295 \pm 1.96\sqrt{\frac{.295(.705)}{n}} \Rightarrow .295 \pm .053 \Rightarrow (.242, .348)\]

b.

Let p = the true proportion of products purchased that are considered healthy during the non-indulgent scent hour. The point estimate of p is \(\hat{p} = x/n = .455\).

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .455 \pm 1.96\sqrt{\frac{.455(.545)}{n}} \Rightarrow .455 \pm .042 \Rightarrow (.413, .497)\]

c.

Since .42 falls in the non-indulgent confidence interval and outside of the indulgent confidence interval, it is much more likely that it comes from the non-indulgent scent condition.

6.59

\(\hat{p} = x/n = .85\)

Suppose we form a 95% confidence interval for the true proportion of first class mail within the same city that is delivered on time between Dec. 10 and Mar. 3. For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). The confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .85 \pm 1.96\sqrt{\frac{.85(.15)}{n}} \Rightarrow .85 \pm .001 \Rightarrow (.849, .851)\]

We are 95% confident that the true proportion of first class mail within the same city that is delivered on time between Dec. 10 and Mar. 3 is between .849 and .851, or between 84.9% and 85.1%. This interval does not contain the reported 95% of first class mail delivered on time. It appears that the performance of the USPS is below the standard during this time period.


6.60

To compute the necessary sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).

Thus, \(n = \frac{(1.96)^2\sigma^2}{(SE)^2} = 307.328 \approx 308\). You would need to take 308 samples.

6.61

a.

To compute the necessary sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).

An estimate of \(\sigma\) is obtained from: range \(\approx 4s \Rightarrow s \approx \frac{\text{range}}{4} = 1\)

Thus, \(n = \frac{(1.645)^2(1)^2}{(SE)^2} = 67.65 \approx 68\)

b.

A less conservative estimate of \(\sigma\) is obtained from: range \(\approx 6s \Rightarrow s \approx \frac{\text{range}}{6} = .6667\)

Thus, \(n = \frac{(1.645)^2(.6667)^2}{(SE)^2} = 30.07 \approx 31\)

6.62

a.

To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).

Thus, \(n = \frac{(1.96)^2(.2)(.8)}{.08^2} = 96.04 \approx 97\). You would need to take a sample of size 97.

b.

To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{(1.96)^2(.5)(.5)}{.08^2} = 150.0625 \approx 151\). You would need to take a sample of size 151.

6.63

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). We know \(\hat{p}\) is in the middle of the interval, so \(\hat{p} = .4\).

The confidence interval is \(\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .4 \pm 1.645\sqrt{\frac{.4(.6)}{n}}\)

We know \(.4 - 1.645\sqrt{\frac{.4(.6)}{n}} = .26 \Rightarrow .4 - \frac{.8059}{\sqrt{n}} = .26 \Rightarrow .4 - .26 = \frac{.8059}{\sqrt{n}} \Rightarrow \sqrt{n} = \frac{.8059}{.14} = 5.756 \Rightarrow n = 5.756^2 = 33.1 \approx 34\)

6.64

a.

For a width of 5 units, \(SE = 5/2 = 2.5\). To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).

Thus, \(n = \frac{(1.96)^2\sigma^2}{2.5^2} = 120.47 \approx 121\)

You would need to take 121 samples at a cost of 121($10) = $1,210. Yes, you do have sufficient funds.

b.

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).

\(n = \frac{(1.645)^2\sigma^2}{2.5^2} = 84.86 \approx 85\)

You would need to take 85 samples at a cost of 85($10) = $850. You still have sufficient funds but have an increased risk of error.

6.65

a.

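The width of a confidence interval is \(W = 2(SE) = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\).

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).

For \(n = 16\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{\sigma}{\sqrt{16}} = 0.98\)

For \(n = 25\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{\sigma}{\sqrt{25}} = 0.784\)

For \(n = 49\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{\sigma}{\sqrt{49}} = 0.56\)

For \(n = 100\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{\sigma}{\sqrt{100}} = 0.392\)

For \(n = 400\), \(W = 2z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2(1.96)\frac{\sigma}{\sqrt{400}} = 0.196\)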

b.

6.66

The sample size will be larger than necessary for any p other than .5.

6.67

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). Since we have no estimate given for the value of p, we will use .5. The sample size is:

\[n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{(1.645)^2(.5)(.5)}{(SE)^2} = 1{,}691.3 \approx 1{,}692\]
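The sample-size formula for a proportion always rounds up, and the conservative choice p = .5 gives the largest n. The sketch below illustrates this with the Exercise 6.62 inputs (SE = .08, z = 1.96); the function name `sample_size_proportion` is a hypothetical helper, not something defined in the text.

```python
import math

def sample_size_proportion(p, se, z):
    """Smallest integer n satisfying z * sqrt(p * (1 - p) / n) <= se."""
    return math.ceil(z ** 2 * p * (1 - p) / se ** 2)

# Exercise 6.62 a: prior estimate p = .2
n_known = sample_size_proportion(0.2, 0.08, 1.96)

# Exercise 6.62 b: no prior estimate, so use the conservative p = .5
n_conservative = sample_size_proportion(0.5, 0.08, 1.96)

print(n_known, n_conservative)
```

Because p(1 - p) peaks at p = .5, the second call can never return a smaller n than the first for the same SE and z.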

6.68

To compute the needed sample size, use 𝑛 =

.

Thus, 𝑛 =

(

.

)

.

Thus, 𝑛 =

/

=

= 1.96.

(.

)

= 34.57 ≈ 35

.

(

.

)

/

(.

.

=

)

)(.

/

)

=

.

= 285.4 ≈ 286

(.

)(.

)

.

a.

b.

=

= 226.8 ≈ 227

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧. 2.575. From the previous estimate, we will use 𝑝̂ = 1/3 to estimate p.

=

\(n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{2.575^2(1/3)(2/3)}{.01^2} = 14{,}734.7 \approx 14{,}735\)

Since no level of significance was given, we will use 95%. From Exercise 6.16, 𝑠 = 2.755. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D,𝑧. = 1.96. 𝑛=

6.75

)

.

𝑛= 6.74

= 1.645.

= 43.3 ≈ 44

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧. 1.96. We will use the estimate of p from Exercise 6.54 or 𝑝̂ = .82. The sample size is: 𝑛= (

6.73

where 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From

From Exercise 6.48, 𝑝̂ = .594. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From = 2.58. Table II, Appendix D, 𝑧. 𝑛= (

6.72

/

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D,𝑧. 𝑛=

6.71

where 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table

= 82.83 ≈ 83

To compute the needed sample size, use 𝑛 = Table II, Appendix D, 𝑧.

6.70

/

= 1.645.

II, Appendix D, 𝑧.

6.69


/

.

=

( . .

)

= 116.6 ≈ 117

Answers will vary. A plan would need to be devised so that the selected shoppers were selected from a variety of different stores in a variety of locations so that the sample would be representative of the entire population.

To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\).

Thus, for \(s = 10\), \(n = \frac{(1.96)^2(10)^2}{(SE)^2} = 42.68 \approx 43\)

For \(s = 20\), \(n = \frac{(1.96)^2(20)^2}{(SE)^2} = 170.72 \approx 171\)

For \(s = 30\), \(n = \frac{(1.96)^2(30)^2}{(SE)^2} = 384.16 \approx 385\)

6.76

The width of a confidence interval is \(W = 2(SE)\). For a width of .04, \(SE = .04/2 = .02\). For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). We will use the estimate of p from Exercise 6.46 or \(\hat{p} = .20\). The sample size is:

\[n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{(1.96)^2(.20)(.80)}{.02^2} = 1{,}536.64 \approx 1{,}537\]

6.77

To compute the necessary sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).

Thus, \(n = \frac{(1.645)^2\sigma^2}{(SE)^2} = 270.6 \approx 271\)

6.78

The bound is \(SE = .05\). For confidence coefficient .99, \(\alpha = .01\) and \(\alpha/2 = .01/2 = .005\). From Table II, Appendix D, \(z_{.005} = 2.575\). We estimate p with \(\hat{p} = 11/27 = .407\).

Thus, \(n = \frac{(z_{\alpha/2})^2 pq}{(SE)^2} = \frac{(2.575)^2(.407)(.593)}{.05^2} = 640.1 \approx 641\)

The necessary sample size would be 641. The sample was not large enough.

6.79

a.

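To compute the needed sample size, use \(n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2}\), where \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\).

Thus, \(n = \frac{(1.645)^2\sigma^2}{(SE)^2} = 1{,}082.41 \approx 1{,}083\)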

b.

As the sample size decreases, the width of the confidence interval increases. Therefore, if we sample 100 parts instead of 1,083, the confidence interval would be wider.
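The trade-off between sample size and interval width can be made concrete with a short sketch. The inputs below are hypothetical: `sigma = 2.0` and `se = 0.1` are assumed values chosen only because their ratio of 20 reproduces the 1,082.41 figure in part a, and the helper names are illustrative.

```python
import math

def sample_size_mean(sigma, se, z):
    """Smallest integer n satisfying z * sigma / sqrt(n) <= se."""
    return math.ceil((z * sigma / se) ** 2)

def interval_width(sigma, n, z):
    """Full width 2 * z * sigma / sqrt(n) of the interval for the mean."""
    return 2 * z * sigma / math.sqrt(n)

# Assumed planning values (not from the text): sigma / se = 20, z_.05 = 1.645
n_needed = sample_size_mean(2.0, 0.1, 1.645)

width_full = interval_width(2.0, n_needed, 1.645)
width_small = interval_width(2.0, 100, 1.645)

print(n_needed, round(width_full, 3), round(width_small, 3))
```

With only 100 parts the width is more than three times the width obtained with the full sample, which is the effect described in part b.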

c.

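To compute the maximum confidence level that could be attained meeting the management's specifications, set \(n = 100\) in the sample size formula:

\[n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2} \Rightarrow 100 = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2} \Rightarrow (z_{\alpha/2})^2 = \frac{100(SE)^2}{\sigma^2} = .25 \Rightarrow z_{\alpha/2} = .5\]

Using Table II, Appendix D, \(P(0 \le z \le .5) = .1915\). Thus, \(\alpha/2 = .5000 - .1915 = .3085\), \(\alpha = 2(.3085) = .617\), and \(1 - \alpha = 1 - .617 = .383\). The maximum confidence level would be 38.3%.

6.80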

a.

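Percentage sampled \(= \frac{n}{N}(100\%) = 40\%\)

Finite population correction factor: \(\sqrt{\frac{N-n}{N}} = \sqrt{.6} = .7746\)

b.

Percentage sampled \(= \frac{n}{N}(100\%) = 20\%\)

Finite population correction factor: \(\sqrt{\frac{N-n}{N}} = \sqrt{.8} = .8944\)

c.

Percentage sampled \(= \frac{n}{N}(100\%) = 10\%\)

Finite population correction factor: \(\sqrt{\frac{N-n}{N}} = \sqrt{.9} = .9487\)

d.

Percentage sampled \(= \frac{n}{N}(100\%) = 1\%\)

Finite population correction factor: \(\sqrt{\frac{N-n}{N}} = \sqrt{.99} = .995\)

6.81

a.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 4.90\)

b.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 5.66\)

c.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 6.00\)

d.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 6.293\)

6.82

a.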

For \(n = 64\), with the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 3\sqrt{.9872} = 2.9807\)

Without the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = s/\sqrt{n} = 3\)

\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is slightly larger.

b.

For \(n = 400\), with the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 1.2\sqrt{.92} = 1.1510\)

Without the finite population correction factor: \(\hat{\sigma}_{\bar{x}} = s/\sqrt{n} = 1.2\)

\(\hat{\sigma}_{\bar{x}}\) without the finite population correction factor is larger.

c.

In part a, n is smaller relative to N than in part b. Therefore, the finite population correction factor did not make as much of a difference in the answer in part a as in part b.

6.83

a.

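\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 1.00\)

b.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = .6124\)

c.

\(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = 0\)

d.

As n increases, \(\hat{\sigma}_{\bar{x}}\) decreases.

e.

We are computing the standard error of \(\bar{x}\). If the entire population is sampled, then \(\bar{x} = \mu\). There is no sampling error, so \(\hat{\sigma}_{\bar{x}} = 0\).

6.84

An approximate 95% confidence interval for \(\mu\) is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 422 \pm 4.184 \Rightarrow (417.816, 426.184)\]

6.85

The approximate 95% confidence interval for p is:

\[\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow \hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .42 \pm .021 \Rightarrow (.399, .441)\]

6.86

a.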

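\(\bar{x} = \frac{\sum x}{n} = \frac{1{,}081}{30} = 36.03\)

\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1} = \frac{41{,}747 - \frac{1{,}081^2}{30}}{30-1} = 96.3782\)

The approximate 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 36.03 \pm 3.40 \Rightarrow (32.63, 39.43)\]

b.

\(\hat{p} = x/n = .7\)

The approximate 95% confidence interval is:

\[\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow \hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .7 \pm .159 \Rightarrow (.541, .859)\]

6.87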

a.

First, we must estimate p: \(\hat{p} = x/n = .560\)

For confidence coefficient .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). From Table II, Appendix D, \(z_{.025} = 1.96\). Since \(n/N = 1{,}355/1{,}696 = .799 > .05\), we must use the finite population correction factor. The 95% confidence interval is:

\[\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} \Rightarrow .560 \pm 1.96\sqrt{\frac{.560(.440)}{1{,}355}}\sqrt{\frac{1{,}696-1{,}355}{1{,}696}} \Rightarrow .560 \pm .012 \Rightarrow (.548, .572)\]

b.

We used the finite correction factor because the sample size was very large compared to the population size.
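The effect of the finite population correction in part a can be checked with a short calculation. This is a sketch under the inputs stated in the solution (p̂ = .560, n = 1,355, N = 1,696, z = 1.96); the function name `proportion_interval_fpc` is illustrative.

```python
import math

def proportion_interval_fpc(p_hat, n, N, z):
    """Interval for p using the standard error scaled by sqrt((N - n) / N)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n) * math.sqrt((N - n) / N)
    half_width = z * se
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_interval_fpc(0.560, 1355, 1696, 1.96)

# Same interval without the correction, for comparison
uncorrected_half = 1.96 * math.sqrt(0.560 * 0.440 / 1355)

print(round(lower, 3), round(upper, 3), round(uncorrected_half, 3))
```

Because about 80% of the population was sampled, the correction shrinks the half-width to less than half of its uncorrected value.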

c.

We are 95% confident that the true proportion of active NFL players who select a professional coach as the most influential in their career is between .548 and .572.

6.88

a.

For \(N = 2{,}193\), \(n = 115\), \(\bar{x} = 184{,}134\), and \(s = 66{,}181\), the approximate 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 184{,}134 \pm 2\frac{66{,}181}{\sqrt{115}}\sqrt{\frac{2{,}193-115}{2{,}193}} \Rightarrow 184{,}134 \pm 12{,}014.84 \Rightarrow (172{,}119.16,\ 196{,}148.84)\]

b.

We are 95% confident that the mean salary of all vice presidents who subscribe to Quality Progress is between $172,119.16 and $196,148.84.

6.89

a.

First, we must calculate the sample mean:

\[\bar{x} = \frac{\sum f_i x_i}{n} = \frac{3(108) + 2(55) + \cdots + 19(100)}{100} = \frac{15{,}646}{100} = 156.46\]

The point estimate of the mean value of the parts inventory is \(\bar{x} = 156.46\).

b.

The sample variance and standard deviation are:

\[s^2 = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}}{n-1} = \frac{3(108)^2 + 2(55)^2 + \cdots + 19(100)^2 - \frac{15{,}646^2}{100}}{100-1} = \frac{6{,}776{,}336 - \frac{15{,}646^2}{100}}{99} = 43{,}720.83677\]

\(s = \sqrt{s^2} = \sqrt{43{,}720.83677} = 209.10\)

The estimated standard error is \(\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} = \frac{209.10}{\sqrt{100}}\sqrt{\frac{N-100}{N}} = 18.7025\)

c.

The approximate 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow 156.46 \pm 2(18.7025) \Rightarrow 156.46 \pm 37.405 \Rightarrow (119.055, 193.865)\]

We are 95% confident that the mean value of the parts inventory is between $119.06 and $193.87.

d.

Since the interval in part c does not include $300, the value of $300 is not a reasonable value for the mean value of the parts inventory.

6.90

a.

The population of interest is the set of all households headed by women that have incomes of $25,000 or more in the database.

b.

Yes. Since 𝑛/𝑁 = 1,333/25, 000 = .053 exceeds .05, we need to apply the finite population correction.

c.

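The standard error for \(\hat{p}\) should be:

\[\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} = \sqrt{\frac{.708(.292)}{1{,}333}}\sqrt{\frac{25{,}000-1{,}333}{25{,}000}} = .012\]

d.

For confidence coefficient .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). From Table II, Appendix D, \(z_{.05} = 1.645\). The approximate 90% confidence interval is:

\[\hat{p} \pm 1.645\hat{\sigma}_{\hat{p}} \Rightarrow .708 \pm 1.645(.012) \Rightarrow .708 \pm .020 \Rightarrow (.688, .728)\]

6.91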

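For \(N = 1{,}500\), \(n = 35\), \(\bar{x} = 1\), and \(s = 124\), the approximate 95% confidence interval is:

\[\bar{x} \pm 2\hat{\sigma}_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N}} \Rightarrow 1 \pm 2\frac{124}{\sqrt{35}}\sqrt{\frac{1{,}500-35}{1{,}500}} \Rightarrow 1 \pm 41.43 \Rightarrow (-40.43, 42.43)\]

We are 95% confident that the mean error of the new system is between -$40.43 and $42.43.

6.92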

\(\hat{p} = x/n = .086\)

The standard error of \(\hat{p}\) is \(\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\sqrt{\frac{N-n}{N}} = .0206\)

An approximate 95% confidence interval for p is:

\[\hat{p} \pm 2\hat{\sigma}_{\hat{p}} \Rightarrow .086 \pm 2(.0206) \Rightarrow .086 \pm .041 \Rightarrow (.045, .127)\]

Since .07 falls in the 95% confidence interval, it is not an uncommon value. Thus, there is no evidence that more than 7% of the corn-related products in this state have to be removed from shelves and warehouses.

6.93

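a.

\(\alpha/2 = .05/2 = .025\); \(\chi^2_{.025,7} = 16.0128\) and \(\chi^2_{.975,7} = 1.68987\)

b.

\(\alpha/2 = .10/2 = .05\); \(\chi^2_{.05,16} = 26.2962\) and \(\chi^2_{.95,16} = 7.96164\)

c.

\(\alpha/2 = .01/2 = .005\); \(\chi^2_{.005,20} = 39.9968\) and \(\chi^2_{.995,20} = 7.43386\)

d.

\(\alpha/2 = .05/2 = .025\); \(\chi^2_{.025,20} = 34.1696\) and \(\chi^2_{.975,20} = 9.59083\)

6.94

a.

For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with df \(= n - 1 = 50 - 1 = 49\), \(\chi^2_{.05,49} \approx 67.5048\) and \(\chi^2_{.95,49} \approx 34.7642\). The 90% confidence interval is:

\[\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(50-1)s^2}{67.5048} \le \sigma^2 \le \frac{(50-1)s^2}{34.7642} \Rightarrow 4.537 \le \sigma^2 \le 8.809\]

b.

For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with df \(= n - 1 = 15 - 1 = 14\), \(\chi^2_{.05,14} = 23.6848\) and \(\chi^2_{.95,14} = 6.57063\). The 90% confidence interval is:

\[\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(15-1)s^2}{23.6848} \le \sigma^2 \le \frac{(15-1)s^2}{6.57063} \Rightarrow .00024 \le \sigma^2 \le .00085\]

c.

For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with df \(= n - 1 = 22 - 1 = 21\), \(\chi^2_{.05,21} = 32.6705\) and \(\chi^2_{.95,21} = 11.5913\). The 90% confidence interval is:

\[\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(22-1)s^2}{32.6705} \le \sigma^2 \le \frac{(22-1)s^2}{11.5913} \Rightarrow 641.86 \le \sigma^2 \le 1{,}809.09\]

d.

For confidence level .90, \(\alpha = .10\) and \(\alpha/2 = .10/2 = .05\). Using Table IV, Appendix D, with df \(= n - 1 = 5 - 1 = 4\), \(\chi^2_{.05,4} = 9.48773\) and \(\chi^2_{.95,4} = .710721\). The 90% confidence interval is:

\[\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(5-1)s^2}{9.48773} \le \sigma^2 \le \frac{(5-1)s^2}{.710721} \Rightarrow .94859 \le \sigma^2 \le 12.6632\]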

6.95

To find the 90% confidence interval for \(\sigma\), we need to take the square root of the end points of the 90% confidence interval for \(\sigma^2\) from Exercise 6.94.

a.

The 90% confidence interval for \(\sigma\) is: \(\sqrt{4.537} \le \sigma \le \sqrt{8.809} \Rightarrow 2.13 \le \sigma \le 2.97\)

b.

The 90% confidence interval for \(\sigma\) is: \(\sqrt{.00024} \le \sigma \le \sqrt{.00085} \Rightarrow .016 \le \sigma \le .029\)

c.

The 90% confidence interval for \(\sigma\) is: \(\sqrt{641.86} \le \sigma \le \sqrt{1{,}809.09} \Rightarrow 25.34 \le \sigma \le 42.53\)

d.

The 90% confidence interval for \(\sigma\) is: \(\sqrt{.94859} \le \sigma \le \sqrt{12.6632} \Rightarrow .974 \le \sigma \le 3.559\)

6.96

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: x
Variable   N   Mean   StDev   Minimum   Q1     Median   Q3     Maximum
x          6   6.17   3.31    2.00      2.75   6.50     8.75   11.00

For confidence level .95, \(\alpha = .05\) and \(\alpha/2 = .05/2 = .025\). Using Table IV, Appendix D, with df \(= n - 1 = 6 - 1 = 5\), \(\chi^2_{.025,5} = 12.8325\) and \(\chi^2_{.975,5} = .831211\). The 95% confidence interval is:

\[\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(6-1)(3.31)^2}{12.8325} \le \sigma^2 \le \frac{(6-1)(3.31)^2}{.831211} \Rightarrow 4.269 \le \sigma^2 \le 65.904\]

6.97

a.

The target parameter is $\sigma^2$, the population variation in the internal oil content measurements for sweet potato chips.

b. For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using Table IV, Appendix D, with df $= n - 1 = 6 - 1 = 5$, $\chi^2_{.025,5} = 12.8325$ and $\chi^2_{.975,5} = .831211$. The 95% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(6-1)(11)^2}{12.8325} \le \sigma^2 \le \frac{(6-1)(11)^2}{.831211} \Rightarrow 47.15 \le \sigma^2 \le 727.85$

c. “95% confidence” means that in repeated sampling, 95% of the intervals constructed in a similar manner will contain the true value of $\sigma^2$.

d.

We must assume that a random sample was selected and that the population of interest is approximately normal.

e.

The variance is measured in square millions of grams. This is difficult to relate to the data. The standard deviation is measured in millions of grams, the same units as the data.

f.

The 95% confidence interval for $\sigma$ is: $\sqrt{47.15} \le \sigma \le \sqrt{727.85} \Rightarrow 6.87 \le \sigma \le 26.98$. We are 95% confident that the true standard deviation of internal oil content measurements for sweet potato chips is between 6.87 and 26.98.
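The variance-interval arithmetic used in Exercises 6.94 through 6.97 can be sketched in a few lines of Python. This is only an illustration, not part of the manual's method: the function names are hypothetical, and the chi-square critical values are passed in as read from Table IV rather than computed. The example reuses the Exercise 6.96 inputs ($n = 6$, $s = 3.31$), which should land near the interval $4.269 \le \sigma^2 \le 65.904$ reported there.

```python
import math


def variance_interval(n, s, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2.

    chi2_upper is the table value with area alpha/2 to its right,
    chi2_lower is the table value with area 1 - alpha/2 to its right.
    """
    df = n - 1
    sum_of_squares = df * s ** 2          # (n - 1) s^2
    return sum_of_squares / chi2_upper, sum_of_squares / chi2_lower


def sd_interval(n, s, chi2_upper, chi2_lower):
    """Confidence interval for sigma: square roots of the variance endpoints."""
    low, high = variance_interval(n, s, chi2_upper, chi2_lower)
    return math.sqrt(low), math.sqrt(high)


# Exercise 6.96 inputs: n = 6, s = 3.31, Table IV values for df = 5.
var_low, var_high = variance_interval(6, 3.31, 12.8325, 0.831211)
sd_low, sd_high = sd_interval(6, 3.31, 12.8325, 0.831211)
print(f"variance: ({var_low:.3f}, {var_high:.3f})")
print(f"std dev:  ({sd_low:.3f}, {sd_high:.3f})")
```

Passing the table values explicitly keeps the sketch free of third-party libraries; a statistics package could supply the critical values instead.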

6.98

a.

The 90% confidence interval for 𝜎 is (25.9, 27.9).

b.

For confidence level .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. Using MINITAB with df $= n - 1 = 992 - 1 = 991$, $\chi^2_{.05,991} = 1065.35$ and $\chi^2_{.95,991} = 918.926$. The 90% confidence interval for $\sigma^2$ is:

$\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(992-1)(26.87)^2}{1065.35} \le \sigma^2 \le \frac{(992-1)(26.87)^2}{918.926} \Rightarrow 671.6 \le \sigma^2 \le 778.6$

To form the confidence interval for $\sigma$, we take the square root of the endpoints of this interval: $(\sqrt{672}, \sqrt{779}) \Rightarrow (25.9, 27.9)$. This is the same as the interval on the printout in part a.

c. We are 90% confident that the true population standard deviation of the level of support for all senior managers at CPA firms is between 25.9 and 27.9 points.

d. We must assume that the distribution of the level of support is approximately normal. From Exercise 4.125, we concluded that the data were approximately normally distributed.

6.99

a. To find the confidence interval for $\sigma$, we first find the confidence interval for $\sigma^2$ and then take the square root of the endpoints. For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using Table IV, Appendix D, with df $= n - 1 = 55 - 1 = 54$, $\chi^2_{.025,54} \approx 71.4202$ and $\chi^2_{.975,54} \approx 32.3574$. The 95% confidence interval for $\sigma^2$ is:

$\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(55-1)(.15)^2}{71.4202} \le \sigma^2 \le \frac{(55-1)(.15)^2}{32.3574} \Rightarrow .0170 \le \sigma^2 \le .0375$

The 95% confidence interval for $\sigma$ is: $\sqrt{.0170} \le \sigma \le \sqrt{.0375} \Rightarrow 0.130 \le \sigma \le 0.194$. We are 95% confident that the true standard deviation of the facial WHR values for all CEOs at publicly traded Fortune 500 firms is between .130 and .194.

b. In order for the interval to be valid, the distribution of WHR values should be approximately normal. The distribution should look like:

[Figure: normal distribution curve]

6.100

a.

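For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. Using MINITAB with df $= n - 1 = 4{,}605 - 1 = 4{,}604$, $\chi^2_{.005,4604} = 4{,}854.93$ and $\chi^2_{.995,4604} = 4{,}360.59$. The 99% confidence interval for $\sigma^2$ is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(4{,}605-1)(.189)^2}{4{,}854.93} \le \sigma^2 \le \frac{(4{,}605-1)(.189)^2}{4{,}360.59} \Rightarrow .03387 \le \sigma^2 \le .03771$

b. The 99% confidence interval for $\sigma$ is: $\sqrt{.03387} \le \sigma \le \sqrt{.03771} \Rightarrow 0.184 \le \sigma \le 0.194$. We are 99% confident that the true standard deviation of the domestic return on sales for all companies that claim the tax deduction falls between .184 and .194.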
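With df = 4,604, the critical values are beyond the range of a printed chi-square table, which is why the solution turns to MINITAB. As a rough cross-check, the Wilson-Hilferty normal approximation (a technique swapped in here for illustration, not the method MINITAB uses) gives values close to the ones quoted above. The function name is hypothetical.

```python
from statistics import NormalDist


def chi2_upper_tail_value(upper_tail_area, df):
    """Approximate chi-square value with the given area to its right.

    Uses the Wilson-Hilferty approximation:
        chi2 ~ df * (1 - 2/(9 df) + z * sqrt(2/(9 df)))**3
    where z is the standard normal value with the same upper-tail area.
    """
    z = NormalDist().inv_cdf(1 - upper_tail_area)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


df = 4605 - 1
upper = chi2_upper_tail_value(0.005, df)   # compare with 4,854.93
lower = chi2_upper_tail_value(0.995, df)   # compare with 4,360.59
print(f"{upper:.2f} {lower:.2f}")
```

The approximation improves as df grows, so it is best reserved for large-df cases like this one; for small df the table values should be used.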

6.101

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Drug

Variable   N    Mean  StDev  Variance  Minimum  Median  Maximum
Drug      50  89.291  3.183    10.134   81.790  89.375   94.830

For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table IV, Appendix D, with df $= n - 1 = 50 - 1 = 49$, $\chi^2_{.005,49} \approx 79.4900$ and $\chi^2_{.995,49} \approx 27.9907$. The 99% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(50-1)(10.134)}{79.4900} \le \sigma^2 \le \frac{(50-1)(10.134)}{27.9907} \Rightarrow 6.247 \le \sigma^2 \le 17.740$

We are 99% confident that the true population variation in drug concentrations for the new method is between 6.247 and 17.740.

6.102

a. To find the confidence interval for $\sigma$, we first find the confidence interval for $\sigma^2$ and then take the square root of the endpoints. For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using Table IV, Appendix D, with df $= n - 1 = 18 - 1 = 17$, $\chi^2_{.025,17} = 30.1910$ and $\chi^2_{.975,17} = 7.56418$. The 95% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(18-1)(6.3)^2}{30.1910} \le \sigma^2 \le \frac{(18-1)(6.3)^2}{7.56418} \Rightarrow 22.349 \le \sigma^2 \le 89.201$

The 95% confidence interval for $\sigma$ is: $\sqrt{22.349} \le \sigma \le \sqrt{89.201} \Rightarrow 4.727 \le \sigma \le 9.445$

b. We are 95% confident that the true standard deviation of conduction times of the prototype system is between 4.727 and 9.445.

c. No, the prototype system does not satisfy this requirement. In order to meet the requirement, the entire confidence interval constructed in part a would have to lie below 7. The interval constructed in part a contains 7, but also contains values greater than 7.

6.103

Using MINITAB, the descriptive statistics are:

Statistics

Variable  Total Count   Mean  StDev  Variance  Minimum     Q1  Median     Q3  Maximum
CR-5               13  7.419  0.922     0.850    5.500  6.750   7.400  8.350    8.500

For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table IV, Appendix D, with df $= n - 1 = 13 - 1 = 12$, $\chi^2_{.005,12} = 28.2995$ and $\chi^2_{.995,12} = 3.07382$. The 99% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(13-1)(.850)}{28.2995} \le \sigma^2 \le \frac{(13-1)(.850)}{3.07382} \Rightarrow .3604 \le \sigma^2 \le 3.3183$

We are 99% confident that the true population variation in 5-year capitalization rates for all single-tenant properties is between .3604 and 3.3183.

6.104

a.

Answers will vary. Using a statistical package, a random sample of 10 observations is: 148.289, 41.891, 73.051, 29.140, 211.240, 4.777, 49.255, 99.407, 90.823, 84.203

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Sample

Variable   N  Mean  StDev  Variance  Minimum  Median  Maximum
Sample    10  83.2   60.5    3665.2     4.78    78.6    211.2

For confidence level .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using Table IV, Appendix D, with df $= n - 1 = 10 - 1 = 9$, $\chi^2_{.025,9} = 19.0228$ and $\chi^2_{.975,9} = 2.70039$. The 95% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(10-1)(3{,}665.2)}{19.0228} \le \sigma^2 \le \frac{(10-1)(3{,}665.2)}{2.70039} \Rightarrow 1{,}734.066 \le \sigma^2 \le 12{,}215.569$

The measure of reliability for this estimate is 95%.

c. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: INTTIME

Variable    N   Mean  StDev  Variance  Minimum  Median  Maximum
INTTIME   267  95.52  91.54   8379.41     1.86   70.88   513.52

The true population variance is found by $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$. The variance reported here has a denominator of 266 instead of 267. The population variance is $\frac{(267-1)(8{,}379.41)}{267} = 8{,}348.03$. This value is in the 95% confidence interval constructed in part b. We know that in repeated sampling, 95% of all intervals constructed in a similar manner will contain the true variance and 5% will not. The interval that we constructed could be one of the 5% that did not contain the true variance.

6.105

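Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Honey, DM

Variable   N    Mean  StDev  Variance  Minimum  Median  Maximum
Honey     35  10.714  2.855     8.151    4.000  11.000   16.000
DM        33   8.333  3.256    10.604    3.000   9.000   15.000

a. For confidence level .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. Using MINITAB with df $= n - 1 = 35 - 1 = 34$, $\chi^2_{.05,34} = 48.6024$ and $\chi^2_{.95,34} = 21.6643$. The 90% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(35-1)(8.151)}{48.6024} \le \sigma^2 \le \frac{(35-1)(8.151)}{21.6643} \Rightarrow 5.702 \le \sigma^2 \le 12.792$

The 90% confidence interval for $\sigma$ is: $\sqrt{5.702} \le \sigma \le \sqrt{12.792} \Rightarrow 2.39 \le \sigma \le 3.58$

b. For confidence level .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. Using MINITAB with df $= n - 1 = 33 - 1 = 32$, $\chi^2_{.05,32} = 46.1943$ and $\chi^2_{.95,32} = 20.0719$. The 90% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.05}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.95}} \Rightarrow \frac{(33-1)(10.604)}{46.1943} \le \sigma^2 \le \frac{(33-1)(10.604)}{20.0719} \Rightarrow 7.346 \le \sigma^2 \le 16.906$

The 90% confidence interval for $\sigma$ is: $\sqrt{7.346} \le \sigma \le \sqrt{16.906} \Rightarrow 2.71 \le \sigma \le 4.11$

c.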

Since the confidence intervals overlap, the researchers cannot conclude that the variances of the two groups differ.
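The comparison in Exercise 6.105 can be reproduced with a short Python sketch. It assumes the sample variances and MINITAB critical values quoted in the printout above; the helper names are hypothetical, and the overlap check is only the informal comparison used in part c, not a formal test of equal variances.

```python
def variance_interval(n, sample_variance, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a sample variance and critical values."""
    df = n - 1
    return df * sample_variance / chi2_upper, df * sample_variance / chi2_lower


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values taken from the MINITAB output for the Honey and DM groups.
honey = variance_interval(35, 8.151, 48.6024, 21.6643)
dm = variance_interval(33, 10.604, 46.1943, 20.0719)

print("Honey:", tuple(round(v, 3) for v in honey))
print("DM:   ", tuple(round(v, 3) for v in dm))
print("Overlap:", intervals_overlap(honey, dm))
```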

6.106

a. For a small sample from a normal distribution with unknown standard deviation, we use the t-statistic. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n - 1 = 23 - 1 = 22$, $t_{.025} = 2.074$.

b. For a large sample from a distribution with an unknown standard deviation, we can estimate the population standard deviation with s and use the z-statistic. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

c. For a small sample from a normal distribution with known standard deviation, we use the z-statistic. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

d. For a large sample from a distribution about which nothing is known, we can estimate the population standard deviation with s and use the z-statistic. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

e. For a small sample from a distribution about which nothing is known, we can use neither z nor t.

6.107

a. $P(t \le t_0) = .05$ where df = 20. Thus, $t_0 = -1.725$.

b. $P(t \ge t_0) = .005$ where df = 9. Thus, $t_0 = 3.250$.

c. $P(t \le -t_0 \text{ or } t \ge t_0) = .10$ where df = 8 is equivalent to $P(t \ge t_0) = .10/2 = .05$ where df = 8. Thus, $t_0 = 1.860$.

d. $P(t \le -t_0 \text{ or } t \ge t_0) = .01$ where df = 17 is equivalent to $P(t \ge t_0) = .01/2 = .005$ where df = 17. Thus, $t_0 = 2.898$.

6.108

a. Of the 400 observations, 227 had the characteristic $\Rightarrow \hat{p} = 227/400 = .5675$.

The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 400(.5675) = 227$ and $n\hat{q} = 400(.4325) = 173$

Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .5675 \pm 1.96\sqrt{\frac{.5675(.4325)}{400}} \Rightarrow .5675 \pm .0486 \Rightarrow (.5189, .6161)$

b. For this problem, $ME = .02$. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. Thus,

$n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{(1.96)^2(.5675)(.4325)}{(.02)^2} = 2{,}357.2 \approx 2{,}358$

Thus, the necessary sample size is 2,358.
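The sample-size calculation in part b can be sketched as a small Python function. This is an illustration under stated assumptions: the function name is hypothetical, and the z value is computed from the standard normal distribution rather than read from Table II, so results can differ slightly from hand calculations that use a rounded table value.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence):
    """Smallest n with z^2 * p * (1 - p) / margin^2 <= n, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Exercise 6.108 b: p-hat = .5675, ME = .02, 95% confidence.
n_needed = sample_size_for_proportion(0.5675, 0.02, 0.95)
print(n_needed)
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than the target.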

6.109

a. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.575$. The confidence interval is:

$\bar{x} \pm z_{.005}\frac{s}{\sqrt{n}} \Rightarrow 32.5 \pm 2.575\frac{30}{\sqrt{225}} \Rightarrow 32.5 \pm 5.15 \Rightarrow (27.35, 37.65)$

b. The sample size is $n = \frac{(z_{\alpha/2})^2\sigma^2}{(ME)^2} = \frac{(2.575)^2(30)^2}{(.5)^2} = 23{,}870.25 \approx 23{,}871$.

c. For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. Using MINITAB with df $= n - 1 = 225 - 1 = 224$, $\chi^2_{.005,224} = 282.268$ and $\chi^2_{.995,224} = 173.238$. The 99% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(225-1)(30)^2}{282.268} \le \sigma^2 \le \frac{(225-1)(30)^2}{173.238} \Rightarrow 714.215 \le \sigma^2 \le 1{,}163.717$

d. "99% confidence" means that if repeated samples of size 225 were selected from the population and 99% confidence intervals constructed for the population mean, then 99% of all the intervals constructed will contain the population mean.

6.110

a. The finite population correction factor is: $\sqrt{\frac{N-n}{N}} = .9874$

b. The finite population correction factor is: $\sqrt{\frac{N-n}{N}} = .8944$

c. The finite population correction factor is: $\sqrt{\frac{N-n}{N}} = .8944$

6.111

a. Using Table IV, Appendix D, with df $= n - 1 = 10 - 1 = 9$, $\chi^2_{.025,9} = 19.0228$ and $\chi^2_{.975,9} = 2.70039$.

b. Using Table IV, Appendix D, with df $= n - 1 = 20 - 1 = 19$, $\chi^2_{.025,19} = 32.8523$ and $\chi^2_{.975,19} = 8.90655$.

c. Using Table IV, Appendix D, with df $= n - 1 = 50 - 1 = 49$, $\chi^2_{.005,49} \approx 79.4900$ and $\chi^2_{.995,49} \approx 27.9907$.

6.112

a. A point estimate for the average number of latex gloves used per week by all healthcare workers with latex allergy is $\bar{x} = 19.3$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 19.3 \pm 1.96\frac{s}{\sqrt{n}} \Rightarrow 19.3 \pm 3.44 \Rightarrow (15.86, 22.74)$

c.

We are 95% confident that the true average number of latex gloves used per week by all healthcare workers with a latex allergy is between 15.86 and 22.74.

d.

The conditions required for the interval to be valid are:

i. The sample selected was randomly selected from the target population.
ii. The sample size is sufficiently large, i.e., $n > 30$.

6.113

The parameters of interest for the problems are:

(1) The question requires a categorical response. One parameter of interest might be the proportion, p, of all Americans over 18 years of age who think their health is generally very good or excellent.

(2) A parameter of interest might be the mean number of days, $\mu$, in the previous 30 days that all Americans over 18 years of age felt that their physical health was not good because of injury or illness.

(3) A parameter of interest might be the mean number of days, $\mu$, in the previous 30 days that all Americans over 18 years of age felt that their mental health was not good because of stress, depression, or problems with emotions.

(4) A parameter of interest might be the mean number of days, $\mu$, in the previous 30 days that all Americans over 18 years of age felt that their physical or mental health prevented them from performing their usual activities.

6.114

a. $\hat{p} = \frac{x}{n} = .065$

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .065 \pm 1.96\sqrt{\frac{.065(.935)}{n}} \Rightarrow .065 \pm .025 \Rightarrow (.040, .090)$

c. We are 95% confident that the true crash risk for novice drivers who use a cell phone while driving is between .040 and .090.

d. $\hat{p} = \frac{x}{n} = .046$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .046 \pm 1.96\sqrt{\frac{.046(.954)}{n}} \Rightarrow .046 \pm .011 \Rightarrow (.035, .057)$

We are 95% confident that the true crash risk for expert drivers who use a cell phone while driving is between .035 and .057.

6.115

a.

The population of interest is all American adults.

b.

The sample is the 1,000 adults surveyed.

c.

The parameter of interest is the proportion of all American adults who think Starbucks coffee is overpriced, p.

d.

The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 1{,}000(.73) = 730$ and $n\hat{q} = 1{,}000(.27) = 270$

Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .73 \pm 1.96\sqrt{\frac{.73(.27)}{1{,}000}} \Rightarrow .73 \pm .028 \Rightarrow (.702, .758)$

We are 95% confident that the true proportion of all American adults who say Starbucks coffee is overpriced is between .702 and .758.

6.116

a.

From the printout, the 90% confidence interval for the mean lead level is (0.61, 5.16).

b.

From the printout, the 90% confidence interval for the mean copper level is (0.2637, 0.5529).

c.

We are 90% confident that the mean lead level in water specimens from Crystal Lakes Manors is between .61 and 5.16. We are 90% confident that the mean copper level in water specimens from Crystal Lakes Manors is between .2637 and .5529.

d. 90% confidence means that if repeated samples of size n are selected and 90% confidence intervals formed, 90% of all confidence intervals will contain the true mean.

6.117

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. For this study,

$n = \frac{(z_{\alpha/2})^2\sigma^2}{(ME)^2} = 96.04 \approx 97$

The sample size needed is 97.

6.118

a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Rate

Variable   N   Mean  Median  StDev  Minimum  Maximum     Q1     Q3
Rate      30  79.73   80.00   5.96    60.00    90.00  76.75  84.00

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$\bar{x} \pm z_{.05}\frac{s}{\sqrt{n}} \Rightarrow 79.73 \pm 1.645\frac{5.96}{\sqrt{30}} \Rightarrow 79.73 \pm 1.79 \Rightarrow (77.94, 81.52)$

b.

We are 90% confident that the mean participation rate for all companies that have 401(k) plans is between 77.94% and 81.52%.

c.

We must assume that the sample size(𝑛 = 30)is sufficiently large so that the Central Limit Theorem applies.

d.

Yes. Since 71% is not included in the 90% confidence interval, it can be concluded that this company's participation rate is lower than the population mean.

e.

The center of the confidence interval is $\bar{x}$. If 60% is changed to 80%, the value of $\bar{x}$ will increase, so the center point will be larger. The value of $s^2$ will decrease if 60% is replaced by 80%, thus causing the width of the interval to decrease.
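The large-sample interval from part a can be checked with a brief Python sketch. The function name is hypothetical, and the z value is computed from the standard normal distribution instead of being read from Table II, so the margin may differ from the hand calculation in the last decimal place.

```python
from statistics import NormalDist


def z_interval(xbar, s, n, confidence):
    """Large-sample confidence interval for a mean: xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / n ** 0.5
    return xbar - margin, xbar + margin


# Exercise 6.118 a: n = 30, x-bar = 79.73, s = 5.96, 90% confidence.
low, high = z_interval(79.73, 5.96, 30, 0.90)
print(f"({low:.2f}, {high:.2f})")

# Part e in miniature: a smaller s with the same n gives a narrower interval.
low_e, high_e = z_interval(79.73, 4.0, 30, 0.90)  # s = 4.0 is a made-up value
print(high_e - low_e < high - low)
```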

6.119

First, we must estimate p: $\hat{p} = \frac{x}{n} = .694$. The 95% confidence interval is:

$\hat{p} \pm 2\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .694 \pm 2\sqrt{\frac{.694(.306)}{n}} \Rightarrow .694 \pm .092 \Rightarrow (.602, .786)$

We are 95% confident that the true proportion of all New Jersey Governor’s Council business members that have employees with substance abuse problems is between .602 and .786.

6.120

a.

The target parameter is $\sigma^2$, the population variation in WR scores for all convicted drug dealers.

b. For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. Using Table IV, Appendix D, with df $= n - 1 = 100 - 1 = 99$, $\chi^2_{.005,99} \approx 140.169$ and $\chi^2_{.995,99} \approx 67.3276$. The 99% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(100-1)(6)^2}{140.169} \le \sigma^2 \le \frac{(100-1)(6)^2}{67.3276} \Rightarrow 25.426 \le \sigma^2 \le 52.935$

c. “99% confidence” means that in repeated sampling, 99% of all confidence intervals constructed in a similar manner will contain the true variance.

d. We must assume that a random sample was selected and that the population of interest is approximately normal.

e. The variance is measured in terms of WR scores-squared. This is difficult to relate to the data. The standard deviation is measured in WR scores, the same units as the data.

f. The 99% confidence interval for $\sigma$ is: $\sqrt{25.426} \le \sigma \le \sqrt{52.935} \Rightarrow 5.042 \le \sigma \le 7.276$. We are 99% confident that the true standard deviation of WR scores is between 5.042 and 7.276.

6.121

a.

The point estimate of 𝜇 is 𝑥̄ = 3.11.

b.

For confidence coefficient .98, $\alpha = .02$ and $\alpha/2 = .02/2 = .01$. From Table II, Appendix D, $z_{.01} = 2.33$. The confidence interval is:

$\bar{x} \pm z_{.01}\frac{s}{\sqrt{n}} \Rightarrow 3.11 \pm 2.33\frac{.66}{\sqrt{307}} \Rightarrow 3.11 \pm .088 \Rightarrow (3.022, 3.198)$

c. This statement is incorrect. Once the interval is constructed, there is no probability involved. The true mean is either in the interval or it is not. A better statement would be: “We are 98% confident that the true mean GPA will be between 3.022 and 3.198.”

d. Since the sample size is so large ($n = 307$), the Central Limit Theorem applies. Thus, it does not matter whether the distribution of grades is skewed or not.

6.122

a.

Answers will vary. Using MINITAB, 30 random numbers were generated using the uniform distribution from 1 to 308. The random numbers were: 9, 15, 19, 36, 46, 47, 63, 73, 90, 92, 108, 112, 117, 127, 144, 145, 150, 151, 172, 178, 218, 229, 230, 241, 242, 246, 252, 267, 274, 282 The 308 observations were numbered in the order that they appear in the file. Using the random numbers generated above, I selected the 9th, 15th, 19th, etc. observations for the sample. The selected sample is:

.31, .34, .34, .50, .52, .53, .64, .72, .70, .70, .75, .78, 1.00, 1.00, 1.03, 1.04, 1.07, 1.10, .21, .24, .58, 1.01, .50, .57, .58, .61, .70, .81, .85, 1.00

b. Using MINITAB, the descriptive statistics for the sample of 30 observations are:

Descriptive Statistics: carats-samp

Variable   N    Mean  Median   StDev  Minimum  Maximum      Q1      Q3
carats-s  30  0.6910  0.7000  0.2620   0.2100   1.1000  0.5150  1.0000

From above, $\bar{x} = .6910$ and $s = .2620$.

c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow .691 \pm 1.96\frac{.262}{\sqrt{30}} \Rightarrow .691 \pm .094 \Rightarrow (.597, .785)$

d. We are 95% confident that the mean number of carats is between .597 and .785.

e. From Exercise 2.158, we computed the “population” mean to be .631. This mean does fall in the 95% confidence interval we computed in part c.

6.123

There are a total of 96 channel catfish in the sample. The point estimate of p is $\hat{p} = \frac{x}{n} = \frac{96}{144} = .667$.

The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 144(.667) = 96$ and $n\hat{q} = 144(.333) = 48$

Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .667 \pm 1.645\sqrt{\frac{.667(.333)}{144}} \Rightarrow .667 \pm .065 \Rightarrow (.602, .732)$

We are 90% confident that the true proportion of channel catfish in the population is between .602 and .732.

6.124

a.

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The confidence interval is:

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 1.13 \pm 2.58\frac{s}{\sqrt{n}} \Rightarrow 1.13 \pm .67 \Rightarrow (.46, 1.80)$

We are 99% confident that the mean number of pecks at the blue string is between .46 and 1.80.

b. Yes. The mean number of pecks at the white string is 7.5. This value does not fall in the 99% confidence interval for the blue string found in part a. Thus, the chickens are more apt to peck at white string.

6.125

a.

An estimate of the true mean Mach rating score of all purchasing managers is 𝑥̄ = 99.6.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \Rightarrow 99.6 \pm 1.96\frac{s}{\sqrt{n}} \Rightarrow 99.6 \pm 2.24 \Rightarrow (97.36, 101.84)$

c. We are 95% confident that the true mean Mach rating score of all purchasing managers is between 97.36 and 101.84.

d. Yes, there is evidence to dispute this claim. We are 95% confident that the true mean Mach rating score is between 97.36 and 101.84. It would be very unlikely that the true mean Mach score is as low as 85.

e. To compute the necessary sample size, use $n = \frac{(z_{\alpha/2})^2\sigma^2}{(ME)^2}$ where $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

Thus, $n = \frac{(1.96)^2\sigma^2}{(ME)^2} = 245.86 \approx 246$

6.126

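The point estimate of p is $\hat{p} = \frac{x}{n} = .186$, where $n = 2{,}501$.

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.576$. The confidence interval is:

$\hat{p} \pm z_{.005}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .186 \pm 2.576\sqrt{\frac{.186(.814)}{2{,}501}} \Rightarrow .186 \pm .020 \Rightarrow (.166, .206)$

6.127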

a.

The target parameter is 𝜇 = mean trap spacing for the population of red spiny lobster fishermen fishing in Baja California Sur, Mexico.

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Trap

Variable  N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Trap      7  89.86  11.63    70.00  82.00   93.00  99.00   105.00

The point estimate of $\mu$ is $\bar{x} = 89.86$.

c. For this problem, the sample size is $n = 7$. For a small sample size, the Central Limit Theorem does not apply. Therefore, we do not know what the sampling distribution of $\bar{x}$ is.

d. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n - 1 = 7 - 1 = 6$, $t_{.025} = 2.447$. The 95% confidence interval is:

$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 89.86 \pm 2.447\frac{11.63}{\sqrt{7}} \Rightarrow 89.86 \pm 10.756 \Rightarrow (79.104, 100.616)$

e. We are 95% confident that the true mean trap spacing for the population of red spiny lobster fishermen fishing in Baja California Sur, Mexico is between 79.104 and 100.616 meters.

f. We must assume that the population of trap spacings is normally distributed and that the sample is a random sample.

g. If the width of the interval is 5, then $SE = 5/2 = 2.5$. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2} = \frac{(1.96)^2(11.63)^2}{(2.5)^2} = 83.14 \approx 84$

h. For confidence level .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. Using Table IV, Appendix D, with df $= n - 1 = 7 - 1 = 6$, $\chi^2_{.005,6} = 18.5476$ and $\chi^2_{.995,6} = .675727$. The 99% confidence interval, using the unrounded sample variance $s^2 \approx 135.14$, is:

$\frac{(n-1)s^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.995}} \Rightarrow \frac{(7-1)(135.14)}{18.5476} \le \sigma^2 \le \frac{(7-1)(135.14)}{.675727} \Rightarrow 43.717 \le \sigma^2 \le 1{,}199.952$

6.128

a.

The parameter of interest is p, the proportion of all fillets that are red snapper.

b.

The estimate of p is $\hat{p} = \frac{x}{n} = \frac{5}{22} = .23$.

The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 22(.23) = 5$ and $n\hat{q} = 22(.77) = 17$

Since $n\hat{p}$ is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable.

c. We will use Wilson’s adjustment to form the confidence interval. Using Wilson’s adjustment, the point estimate of the true proportion of all fillets that are red snapper is

$\tilde{p} = \frac{x+2}{n+4} = \frac{5+2}{22+4} = \frac{7}{26} = .269$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. Wilson’s adjusted 95% confidence interval is:

$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \Rightarrow .269 \pm 1.96\sqrt{\frac{.269(.731)}{22+4}} \Rightarrow .269 \pm .170 \Rightarrow (.099, .439)$

d. We are 95% confident that the true proportion of all fillets that are red snapper is between .099 and .439.

6.129

a.

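The point estimate of p is $\hat{p} = \frac{x}{n} = \frac{52}{60} = .867$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .867 \pm 1.96\sqrt{\frac{.867(.133)}{60}} \Rightarrow .867 \pm .086 \Rightarrow (.781, .953)$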

c.

We are 95% confident that the true proportion of Wal-Mart stores in California that have more than 2 inaccurately priced items per 100 scanned is between .781 and .953.

d.

If 99% of the California Wal-Mart stores are in compliance, then only 1% or .01 would not be. However, we found the 95% confidence interval for the proportion that are not in compliance is between .781 and .953. The value of .01 is not in this interval. Thus, it is not a likely value. This claim is not believable.

e. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 60(.867) = 52$ and $n\hat{q} = 60(.133) = 8$

Since $n\hat{q}$ is less than 15, the sample size is not large enough to conclude the normal approximation is reasonable. Thus, the confidence interval constructed in part b may not be valid. Any inference based on this interval is questionable.

f. From above, the value of $\hat{p}$ is .867. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.

$n = \frac{(z_{\alpha/2})^2\hat{p}\hat{q}}{(ME)^2} = \frac{(1.645)^2(52/60)(8/60)}{(.05)^2} = 125.08 \approx 126$

6.130

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics

 N   Mean  StDev  SE Mean  90% CI for μ
10  10.33   8.58     2.71  (5.36, 15.30)

The 90% confidence interval is (5.36, 15.30). We are 90% confident that the true average annualized percentage return on investment of all stock screeners provided by AAII is between 5.36% and 15.30%.

b.

Since the confidence interval in part a contains only positive values, then on average, the AAII stock screeners perform better than the S&P500.

c.

We must assume that the annualized percentage returns on investment for all stock screeners are normally distributed and that the sample is random. Yes, this assumption seems to be satisfied. A histogram of the data is:

[Figure: Histogram of x; vertical axis: Frequency; horizontal axis: x]

The distribution is fairly mound-shaped.

6.131

a.

In order for the large-sample estimation method to be valid, $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$. For this exercise, $\hat{p} = \frac{x}{n} = \frac{12}{131} = .092$, $n\hat{p} = 131(.092) = 12.05$, and $n\hat{q} = 131(.908) = 118.95$. Since one of these values is less than 15, the large-sample estimation method is not valid. We will use Wilson’s adjustment.

$\tilde{p} = \frac{x+2}{n+4} = \frac{12+2}{131+4} = \frac{14}{135} = .104$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\tilde{p} \pm z_{.025}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \Rightarrow .104 \pm 1.96\sqrt{\frac{.104(.896)}{131+4}} \Rightarrow .104 \pm .051 \Rightarrow (.053, .155)$

We are 95% confident that the true proportion of women with cosmetic dermatitis from using eye shadow who have a nickel allergy is between .053 and .155.

b. In order for the large-sample estimation method to be valid, $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$. For this exercise, $\hat{p} = \frac{x}{n} = \frac{25}{250} = .1$, $n\hat{p} = 250(.1) = 25$, and $n\hat{q} = 250(.9) = 225$. Since both of these values are greater than 15, the large-sample estimation method is valid. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .1 \pm 1.96\sqrt{\frac{.1(.9)}{250}} \Rightarrow .1 \pm .037 \Rightarrow (.063, .137)$

We are 95% confident that the true proportion of women with cosmetic dermatitis from using mascara who have a nickel allergy is between .063 and .137.

c.

No, we cannot determine which group is referenced. The value of .12 falls in both confidence intervals.
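The two intervals compared here can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and the counts x = 12 (of 131) and x = 25 (of 250) are assumptions inferred from the reported values of $n\hat{p}$. Because the code carries unrounded proportions, its endpoints can differ from the hand-computed ones in the third decimal place.

```python
def wilson_adjusted_interval(x, n, z=1.96):
    """Wilson-adjusted interval: p~ = (x + 2)/(n + 4), margin uses n + 4."""
    p_tilde = (x + 2) / (n + 4)
    margin = z * (p_tilde * (1 - p_tilde) / (n + 4)) ** 0.5
    return p_tilde - margin, p_tilde + margin


def large_sample_interval(x, n, z=1.96):
    """Usual large-sample interval: p^ +/- z * sqrt(p^ q^ / n)."""
    p_hat = x / n
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin


eye_shadow = wilson_adjusted_interval(12, 131)   # n * p-hat < 15, so adjust
mascara = large_sample_interval(25, 250)         # both counts are at least 15

for name, (low, high) in (("eye shadow", eye_shadow), ("mascara", mascara)):
    print(f"{name}: ({low:.3f}, {high:.3f}), contains .12: {low <= 0.12 <= high}")
```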

d.

The value of $\hat{p}$ for both groups was close to .1. Since no confidence level was given, we will use 95%. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$n = \frac{(z_{\alpha/2})^2 pq}{(ME)^2} = \frac{(1.96)^2(.1)(.9)}{(.03)^2} = 384.16 \approx 385$

6.132

a.

The point estimate for the fraction of the entire market that refuses to purchase bars is: $\hat{p} = \frac{x}{n} = \frac{23}{244} = .094$

b. The sample size is large enough if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

$n\hat{p} = 244(.094) = 22.9$ and $n\hat{q} = 244(.906) = 221.1$

Since both numbers are greater than or equal to 15, the sample size is sufficiently large to conclude the normal approximation is reasonable.

c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .094 \pm 1.96\sqrt{\frac{(.094)(.906)}{244}} \Rightarrow .094 \pm .037 \Rightarrow (.057, .131)$

d. The best estimate of the true fraction of the entire market that refuses to purchase bars six months after the poisoning is .094. We are 95% confident the true fraction of the entire market that refuses to purchase bars six months after the poisoning is between .057 and .131.


e. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. Also, $SE = .02$. The sample size is

$n = \frac{(z_{\alpha/2})^2\hat{p}\hat{q}}{(SE)^2} = \frac{(1.96)^2(.094)(.906)}{(.02)^2} = 817.9 \approx 818$

You would need a sample of size $n = 818$.

6.133

a. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Skid

Variable   N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Skid      20  358.5  117.8    141.0  276.0   367.5  438.0    574.0

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with df $= n - 1 = 20 - 1 = 19$, $t_{.025} = 2.093$. The 95% confidence interval is:

$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 358.5 \pm 2.093\frac{117.8}{\sqrt{20}} \Rightarrow 358.5 \pm 55.13 \Rightarrow (303.37, 413.63)$

b. We are 95% confident that the mean skidding distance is between 303.37 and 413.63 meters.

c. In order for the inference to be valid, the skidding distances must be from a normal distribution. We will use the four methods to check for normality. First, we will look at a histogram of the data. Using MINITAB, the histogram of the data is:

[Figure: Histogram of Skid; vertical axis: Frequency; horizontal axis: Skid]

From the histogram, the data appear to be fairly mound-shaped. This indicates that the data may be normal.

Next, we look at the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.

$\bar{x} \pm s \Rightarrow 358.5 \pm 117.8 \Rightarrow (240.7, 476.3)$. 14 of the 20 values fall in this interval. The proportion is .70. This is very close to the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow 358.5 \pm 2(117.8) \Rightarrow 358.5 \pm 235.6 \Rightarrow (122.9, 594.1)$. 20 of the 20 values fall in this interval. The proportion is 1.00. This is larger than the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow 358.5 \pm 3(117.8) \Rightarrow 358.5 \pm 353.4 \Rightarrow (5.1, 711.9)$. 20 of the 20 values fall in this interval. The proportion is 1.00. This is exactly the 1.00 we would expect if the data were normal.


From this method, it appears that the data may be normal.

Next, we look at the ratio of the IQR to s. $IQR = Q_3 - Q_1 = 438 - 276 = 162$.

$\frac{IQR}{s} = \frac{162}{117.8} = 1.37$

This is fairly close to the 1.3 we would expect if the data were normal. This method indicates the data may be normal.

Finally, using MINITAB, the normal probability plot is:

[Figure: MINITAB normal probability plot of Skid (Normal, 95% CI); horizontal axis: Skid, vertical axis: Percent; Mean = 358.5, StDev = 117.8, N = 20, AD = 0.170, P-Value = 0.921]

Since the data form a fairly straight line, the data may be normal. From above, all the methods indicate the data may be normal. It appears that the assumption that the data come from a normal distribution is probably valid.
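The arithmetic in parts a and c can be checked with a short script. This is only a sketch working from the summary statistics printed above; the helper name `t_interval` is made up for illustration, and the critical value 2.093 is copied from the solution rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics from the MINITAB output for Skid.
mean, sd, n = 358.5, 117.8, 20
t_025 = 2.093  # table value for df = 19, taken from the solution text

lower, upper = t_interval(mean, sd, n, t_025)

# Ratio of the interquartile range to s; a value near 1.3 is
# consistent with a normal distribution.
iqr_over_s = (438.0 - 276.0) / sd

print(round(lower, 2), round(upper, 2))
print(iqr_over_s)
```

The interval endpoints agree with the hand calculation to two decimals; the IQR/s ratio is about 1.375, which the solution reports as 1.37.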

b.

No. A distance of 425 meters falls above the 95% confidence interval that was computed in part a. It would be very unlikely to observe a mean skidding distance of at least 425 meters.

6.134

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with df = 𝑛 − 1 = 12 − 1 = 11, $t_{.025} = 2.201$. The 95% confidence interval is:

$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow 3{,}643 \pm 2.201\frac{4{,}487}{\sqrt{12}} \Rightarrow 3{,}643 \pm 2{,}850.92 \Rightarrow (792.08, 6{,}493.92)$

We are 95% confident that the true mean level of radon exposure in tombs in the Valley of Kings is between 792.08 and 6,493.92 Bq/m3.

b.

To find the confidence interval for 𝜎, we first find the confidence interval for $\sigma^2$ and then take the square root of the endpoints.

For confidence level .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. Using Table IV, Appendix D, with df = 𝑛 − 1 = 12 − 1 = 11, $\chi^2_{.025,11} = 21.9200$ and $\chi^2_{.975,11} = 3.81575$. The 95% confidence interval is:

$\frac{(n-1)s^2}{\chi^2_{.025}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{.975}} \Rightarrow \frac{(12-1)(4{,}487)^2}{21.9200} \le \sigma^2 \le \frac{(12-1)(4{,}487)^2}{3.81575} \Rightarrow 10{,}103{,}323.86 \le \sigma^2 \le 58{,}039{,}666.91$

The 95% confidence interval for 𝜎 is:

$\sqrt{10{,}103{,}323.86} \le \sigma \le \sqrt{58{,}039{,}666.91} \Rightarrow 3{,}178.57 \le \sigma \le 7{,}618.38$

We are 95% confident that the true standard deviation of radon levels in tombs in the Valley of Kings is between 3,178.57 and 7,618.38 Bq/m3.

6.135

a.

Answers will vary. Using a computer package, the 100 selected invoices are:

3590*  1453   3726   2844   1767   1259   1091   1795   4431   4565
4586   1020*  2135   1078   2659   4694   2572   4559   4601    965
4553   1052   3448    574   1360*  3803   2247   1164   1862   2385
1255   4966    658   4007   4743   3746   3029   3723   3950*  4662
 217    949   4580*  4126   1794   2912     67   2514   3544   1596
2344   1603   3744   1886    151   4258    183   1869   4509   4572
3875     34   3781   4993   1284   2177   4290*    13   2717    287
2977   3459   4639   2272   3620*  4646   1544    919   3820*  1216
2052   4881   2220*  3883    346   4744    312   4325    602   3137
 121   2373   4684   2025   2254   4018   2304   3503   1634   2470*

The observation numbers ending in 0 are marked with an asterisk above.

b.

$\hat{p} = \frac{x}{n} = \frac{10}{100} = .10$

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$\hat{p} \pm z_{.05}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .10 \pm 1.645\sqrt{\frac{.10(.90)}{100}} \Rightarrow .10 \pm .049 \Rightarrow (.051, .149)$

c.

Our sample proportion was 𝑝̂ = .10 which is equal to the true proportion. The confidence interval does contain .10.

6.136

a.

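Using Chebyshev’s Theorem, we know that at least $1 - \frac{1}{k^2}$ of the observations fall within k standard deviations of the mean. We want to find k such that $1 - \frac{1}{k^2} = .60$.

$1 - \frac{1}{k^2} = .60 \Rightarrow \frac{1}{k^2} = .40 \Rightarrow k^2 = \frac{1}{.40} = 2.5 \Rightarrow k = \sqrt{2.5} = 1.5811$

Thus, $s \approx \frac{\text{upper percentile} - \text{lower percentile}}{2k} = \frac{\text{upper percentile} - \text{lower percentile}}{2(1.5811)} = 22{,}452.17$

For confidence coefficient .98, 𝛼 = .02 and 𝛼/2 = .02/2 = .01. From Table II, Appendix D, $z_{.01} = 2.33$.

$n = \frac{(z_{\alpha/2})^2\sigma^2}{(SE)^2} = \frac{(2.33)^2(22{,}452.17)^2}{(SE)^2} = 684.18 \approx 685$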

b.

See part a.

c.

We have to assume that the estimate of the standard deviation is accurate.
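The two calculations in part a can be reproduced with a few lines of code. This is a sketch only: the function names are invented for illustration, and the bound SE = 2,000 is an assumption, chosen because it reproduces the 684.18 reported in the solution (the value itself did not survive in the text).

```python
import math

def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k."""
    return math.sqrt(1.0 / (1.0 - proportion))

def sample_size_for_mean(z, sigma, se):
    """Return the unrounded n = (z * sigma / se)**2 and n rounded up."""
    raw = (z * sigma / se) ** 2
    return raw, math.ceil(raw)

k = chebyshev_k(0.60)

# z = 2.33 and sigma = 22,452.17 come from the solution; se = 2000 is assumed.
raw_n, n = sample_size_for_mean(z=2.33, sigma=22452.17, se=2000.0)

print(round(k, 4), round(raw_n, 2), n)
```

Sample sizes are always rounded up, since rounding down would leave the bound slightly larger than the one requested.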

d.

An approximate 95% confidence interval for 𝜇 is:

$\bar{x} \pm 2\sigma_{\bar{x}} \Rightarrow \bar{x} \pm 2\frac{\sigma}{\sqrt{n}} \Rightarrow 105{,}000 \pm 2\frac{22{,}452.17}{\sqrt{n}} \Rightarrow 105{,}000 \pm 1{,}070 \Rightarrow (103{,}930, 106{,}070)$

6.137

Since the manufacturer wants to be reasonably certain the process is really out of control before shutting down the process, we would want to use a high level of confidence for our inference. We will form a 99% confidence interval for the mean breaking strength.

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table III, Appendix D, with df = 𝑛 − 1 = 9 − 1 = 8, $t_{.005} = 3.355$. The 99% confidence interval is:

$\bar{x} \pm t_{.005}\frac{s}{\sqrt{n}} \Rightarrow 985.6 \pm 3.355\frac{22.9}{\sqrt{9}} \Rightarrow 985.6 \pm 25.61 \Rightarrow (959.99, 1{,}011.21)$

We are 99% confident that the true mean breaking strength is between 959.99 and 1,011.21. Since 1,000 is contained in this interval, it is not an unusual value for the true mean breaking strength. Thus, we would conclude that the process is not out of control.

6.138

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. Since we do not have any estimate of p, we will estimate p with .5. The sample size is:

$n = \frac{(z_{\alpha/2})^2\,pq}{(SE)^2} = \frac{(1.96)^2(.5)(.5)}{(.05)^2} = 384.16 \approx 385$

b.

Yes. The desired bound was .05 and the actual bound was .05.

6.139

a.

As long as the sample is random (and thus representative), a reliable estimate of the mean weight of all the scallops can be obtained.

b.

The government is using only the sample mean to make a decision. Rather than using a point estimate, they should probably use a confidence interval to estimate the true mean weight of the scallops so they can include a measure of reliability.

a.

We will form a 95% confidence interval for the mean weight of the scallops. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Weight
Variable   N    Mean     StDev    Minimum   Q1       Median   Q3       Maximum
Weight     18   0.9317   0.0753   0.8400    0.8800   0.9100   0.9800   1.1400

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with df = 𝑛 − 1 = 18 − 1 = 17, $t_{.025} = 2.110$. The 95% confidence interval is:

$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} \Rightarrow .932 \pm 2.110\frac{.0753}{\sqrt{18}} \Rightarrow .932 \pm .037 \Rightarrow (.895, .969)$

We are 95% confident that the true mean weight of the scallops is between .895 and .969. Recall that the weights have been scaled so that a mean weight of 1 corresponds to 1/36 of a pound. Since the above confidence interval does not include 1, we have sufficient evidence to indicate that the minimum weight restriction was violated.


Chapter 7 Inferences Based on a Single Sample: Tests of Hypotheses

7.1

The null hypothesis is the "status quo" hypothesis, while the alternative hypothesis is the research hypothesis.

7.2

The test statistic is used to decide whether or not to reject the null hypothesis in favor of the alternative hypothesis.

7.3

The "level of significance" of a test is 𝛼. This is the probability that the test statistic will fall in the rejection region when the null hypothesis is true.

7.4

A Type I error is rejecting the null hypothesis when it is true. A Type II error is accepting the null hypothesis when it is false.

𝛼 = the probability of committing a Type I error.
𝛽 = the probability of committing a Type II error.

7.5

The four possible results are:

1. Rejecting the null hypothesis when it is true. This would be a Type I error.
2. Accepting the null hypothesis when it is true. This would be a correct decision.
3. Rejecting the null hypothesis when it is false. This would be a correct decision.
4. Accepting the null hypothesis when it is false. This would be a Type II error.

7.6

We can compute a measure of reliability for rejecting the null hypothesis when it is true. This measure of reliability is the probability of rejecting the null hypothesis when it is true which is 𝛼. However, it is generally not possible to compute a measure of reliability for accepting the null hypothesis when it is false. We would have to compute the probability of accepting the null hypothesis when it is false, 𝛽, for every value of the parameter in the alternative hypothesis.
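To see why 𝛽 cannot be summarized by a single number, the sketch below computes it for several alternative means in an upper-tailed z-test. All numerical values (𝜇0 = 100, 𝜎 = 60, 𝑛 = 100, 𝛼 = .05, and the list of alternatives) are assumed purely for illustration, and `beta_upper_tailed` is a made-up helper name.

```python
import math
from statistics import NormalDist

def beta_upper_tailed(mu0, mu_a, sigma, n, alpha):
    """P(do not reject H0: mu = mu0) when the true mean is mu_a, for Ha: mu > mu0."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff.
    x_crit = mu0 + std_normal.inv_cdf(1 - alpha) * se
    return std_normal.cdf((x_crit - mu_a) / se)

# Assumed illustrative values; each alternative mean gives a different beta.
alternatives = (105, 110, 115, 120)
betas = [beta_upper_tailed(100, mu_a, sigma=60, n=100, alpha=0.05)
         for mu_a in alternatives]

for mu_a, beta in zip(alternatives, betas):
    print(mu_a, round(beta, 4))
```

Each alternative value yields its own 𝛽, and 𝛽 shrinks as the alternative moves farther from the hypothesized mean, whereas 𝛼 is fixed by the choice of rejection region.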

7.7

When you reject the null hypothesis in favor of the alternative hypothesis, this does not prove the alternative hypothesis is correct. We are 100(1 − 𝛼)% confident that there is sufficient evidence to conclude that the alternative hypothesis is correct. If we were to repeatedly draw samples from the population and perform the test each time, approximately 100(1 − 𝛼)% of the tests performed would yield the correct decision.
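The repeated-sampling idea can be illustrated with a small simulation for the case where the null hypothesis is true. This is a sketch under assumed values (a normal population with 𝜇0 = 50, 𝜎 = 10, 𝑛 = 36, an upper-tailed z-test at 𝛼 = .05); none of these numbers come from the exercises, and `type_i_rate` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated upper-tailed z-tests that reject a true H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if z > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_rate(mu0=50.0, sigma=10.0, n=36, alpha=0.05, trials=4000)
print(rate)  # should land near alpha
```

When H0 is true, the share of tests that wrongly reject should be close to 𝛼, so roughly 100(1 − 𝛼)% of the tests reach the correct decision.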

7.8

a. through f.

[Figures: sketches of the rejection regions]

g.

𝑃(𝑧 > 1.96) = .025
𝑃(𝑧 > 1.645) = .05
𝑃(𝑧 > 2.575) = .005
𝑃(𝑧 < −1.28) = .1003
𝑃(𝑧 < −1.645 or 𝑧 > 1.645) = .10
𝑃(𝑧 < −2.575 or 𝑧 > 2.575) = .01

7.9

Let p = proportion of American adults who pick professional football as their favorite sport. To see if the proportion of American adults who pick professional football as their favorite sport differs from .4, we test:

H0: 𝑝 = .40
Ha: 𝑝 ≠ .40

7.10

a.

Let 𝜇 = average gain in green fees, lessons, or equipment expenditures for participating golf facilities. The null and alternative hypotheses would be:

H0: 𝜇 = $2,400
Ha: 𝜇 > $2,400

b.

The 𝛼 = .05 is the Type I error rate. This means that the probability of concluding that the average gain in fees, lessons, or equipment expenditures for participating golf facilities exceeds $2,400 when, in fact, the average is $2,400 is .05.

c.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 > 1.645.

7.11

Let p = student loan default rate in this year. To see if the student loan default rate is less than .10, we test:

H0: 𝑝 = .10
Ha: 𝑝 < .10

7.12

a.

The parameter of interest is p = proportion of deceitful speech recognized by the avatar.

b.

To test the claim that the avatar can detect deceitful speech 75% of the time, we test:

H0: 𝑝 = .75
Ha: 𝑝 ≠ .75

c.

A Type I error is rejecting H0 when H0 is true. For this exercise, that would be concluding that the proportion of deceitful speech recognized by the avatar is different from .75 when the proportion is equal to .75.

d.

A Type II error is accepting H0 when H0 is false. For this exercise, that would be concluding that the proportion of deceitful speech recognized by the avatar is equal to .75 when the proportion differs from .75.

7.13

Let 𝜇 = mean caloric content of Virginia school lunches. To test the claim that after the testing period ended, the average caloric content dropped, we test:

H0: 𝜇 = 863
Ha: 𝜇 < 863

7.14

Let 𝜇 = average Libor rate for 1-year loans. Since many Western banks think that the reported average Libor rate (1.8%) is too high, they want to show that the average is less than 1.8. The appropriate hypotheses would be:

H0: 𝜇 = 1.8
Ha: 𝜇 < 1.8

7.15

a.

Since the company must give proof the drug is safe, the null hypothesis would be the drug is unsafe. The alternative hypothesis would be the drug is safe.

b.

A Type I error would be concluding the drug is safe when it is not safe. A Type II error would be concluding the drug is not safe when it is. α is the probability of concluding the drug is safe when it is not. 𝛽 is the probability of concluding the drug is not safe when it is.

c.

In this problem, it would be more important for α to be small. We would want the probability of concluding the drug is safe when it is not to be as small as possible.

7.16

a.

A Type I error would be concluding the proposed user is unauthorized when, in fact, the proposed user is authorized. A Type II error would be concluding the proposed user is authorized when, in fact, the proposed user is unauthorized. In this case, a more serious error would be a Type II error. One would not want to conclude that the proposed user is authorized when he/she is not.

b.

The Type I error rate is 1%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .01. The Type II error rate is .00025%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .0000025.

c.

The Type I error rate is .01%. This means that the probability of concluding the proposed user is unauthorized when, in fact, the proposed user is authorized is .0001. The Type II error rate is .005%. This means that the probability of concluding the proposed user is authorized when, in fact, the proposed user is unauthorized is .00005.

7.17

a.

A Type I error is rejecting the null hypothesis when it is true. In a murder trial, we would be concluding that the accused is guilty when, in fact, he/she is innocent. A Type II error is accepting the null hypothesis when it is false. In this case, we would be concluding that the accused is innocent when, in fact, he/she is guilty.

b.

Both errors are serious. However, if an innocent person is found guilty of murder and is put to death, there is no way to correct the error. On the other hand, if a guilty person is set free, he/she could murder again.

c.

In a jury trial, 𝛼 is assumed to be smaller than 𝛽. The only way to convict the accused is for a unanimous decision of guilt. Thus, the probability of convicting an innocent person is set to be small.

d.

In order to get a unanimous vote to convict, there has to be overwhelming evidence of guilt. The probability of getting a unanimous vote of guilt if the person is really innocent will be very small.

e.

If a jury is prejudiced against a guilty verdict, the value of 𝛼 will decrease. The probability of convicting an innocent person will be even smaller if the jury is prejudiced against a guilty verdict.

f.

If a jury is prejudiced against a guilty verdict, the value of 𝛽 will increase. The probability of declaring a guilty person innocent will be larger if the jury is prejudiced against a guilty verdict.

7.18

a.

The null hypothesis is: H0: There is no intrusion.

b.

The alternative hypothesis is: Ha: There is an intrusion.

c.

𝛼 = P(warning | no intrusion) = .001.

𝛽 = P(no warning | intrusion) = .5.

7.19

a.

$p = P(z \ge 1.20) = .5 - .3849 = .1151$

b.

$p = P(z \le -1.20) = .5 - .3849 = .1151$

c.

$p = P(z \le -1.20) + P(z \ge 1.20) = 2(.1151) = .2302$

7.20

We will reject H0 if the p-value < 𝛼.

a.

.06 ≮ .05, do not reject H0.

b.

.10 ≮ .05, do not reject H0.

c.

.01 < .05, reject H0.

d.

.001 < .05, reject H0.

e.

.251 ≮ .05, do not reject H0.

f.

.042 < .05, reject H0.

7.21

a.

Since the 𝑝-value = .10 is greater than 𝛼 = .05, H0 is not rejected.

b.

Since the 𝑝-value = .05 is less than 𝛼 = .10, H0 is rejected.

c.

Since the 𝑝-value = .001 is less than 𝛼 = .01, H0 is rejected.

d.

Since the 𝑝-value = .05 is greater than 𝛼 = .025, H0 is not rejected.

e.

Since the 𝑝-value = .45 is greater than 𝛼 = .10, H0 is not rejected.

7.22

$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -1.46$

𝑝-value = $p = P(z \ge -1.46) = .5 + .4279 = .9279$ (using Table II, Appendix D)

Since the p-value is so large, H0 would not be rejected for any reasonable value of 𝛼. There is no evidence to indicate the mean is greater than 50.

7.23

𝑝-value = $p = P(z \ge 2.17) = .5 - P(0 < z < 2.17) = .5 - .4850 = .0150$ (using Table II, Appendix D)

The probability of observing a test statistic of 2.17 or anything more unusual if the true mean is 100 is .0150. Since this probability is so small, there is evidence that the true mean is greater than 100.

7.24

First, find the value of the test statistic:

$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = 1.60$

𝑝-value = $p = P(z \le -1.60 \text{ or } z \ge 1.60) = 2P(z \ge 1.60) = 2(.5 - .4452) = 2(.0548) = .1096$ (using Table II, Appendix D)

There is no evidence to reject H0 for 𝛼 ≤ .10.

7.25

𝑝-value = $p = P(z \ge 2.17) + P(z \le -2.17) = 2(.5 - .4850) = 2(.0150) = .0300$ (using Table II, Appendix D)

7.26

a.

The p-value reported by SPSS is for a two-tailed test. Thus, 𝑃(𝑧 ≤ −1.63) + 𝑃(𝑧 ≥ 1.63) = .1032. For this one-tailed test, the 𝑝-value = 𝑝 = 𝑃(𝑧 ≤ −1.63) = .1032/2 = .0516. Since the 𝑝-value = .0516 > 𝛼 = .05, H0 is not rejected. There is insufficient evidence to indicate 𝜇 < 75 at 𝛼 = .05.

b.

For this one-tailed test, the 𝑝-value = 𝑃(𝑧 ≤ 1.63). Since 𝑃(𝑧 ≤ −1.63) = .1032/2 = .0516, 𝑃(𝑧 ≤ 1.63) = 1 − .0516 = .9484. Since the 𝑝-value = 𝑝 = .9484 > 𝛼 = .10, H0 is not rejected. There is insufficient evidence to indicate 𝜇 < 75 at 𝛼 = .10.

c.

For this one-tailed test, the 𝑝-value = 𝑃(𝑧 ≥ 1.63) = .1032/2 = .0516. Since the 𝑝-value = 𝑝 = .0516 < 𝛼 = .10, H0 is rejected. There is sufficient evidence to indicate 𝜇 > 75 at 𝛼 = .10.

d.

For this two-tailed test, the 𝑝-value = .1032. Since the 𝑝-value = .1032 > 𝛼 = .01, H0 is not rejected. There is insufficient evidence to indicate 𝜇 ≠ 75 at 𝛼 = .01.

7.27

The smallest value of 𝛼 for which the null hypothesis would be rejected is just greater than .06.
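The conversions used in Exercise 7.26, from a reported two-tailed p-value to a one-tailed one, follow a simple rule that can be written down as a short function. This is a sketch; the function name and the `"less"`/`"greater"` labels are invented for illustration.

```python
def one_tailed_p(two_tailed_p, z, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    alternative is "less" (Ha: mu < mu0) or "greater" (Ha: mu > mu0).
    """
    half = two_tailed_p / 2
    # If the test statistic points in the direction of Ha, the one-tailed
    # p-value is half the two-tailed value; otherwise it is the complement.
    agrees = (z < 0 and alternative == "less") or (z > 0 and alternative == "greater")
    return half if agrees else 1 - half

p_a = one_tailed_p(0.1032, -1.63, "less")     # part a of 7.26
p_b = one_tailed_p(0.1032, 1.63, "less")      # part b of 7.26
p_c = one_tailed_p(0.1032, 1.63, "greater")   # part c of 7.26
print(p_a, p_b, p_c)
```

Halving is only valid when the sign of the test statistic agrees with the alternative hypothesis, which is why part b ends up with a p-value near 1 instead of .0516.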

7.28

a.

By the Central Limit Theorem, the sampling distribution of 𝑥̄ is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

b.

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = 2.5$.

c.

The p-value is $p = P(z \ge 2.5) = .5 - .4938 = .0062$.

d.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is 𝑧 > 2.33.

e.

Since the p-value is less than 𝛼 (𝑝 = .0062 < .01), H0 is rejected. There is sufficient evidence to indicate the true mean is greater than 70 at 𝛼 = .01.

f.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 2.5 > 2.33), H0 is rejected. There is sufficient evidence to indicate the true mean is greater than 70 at 𝛼 = .01.

g.

Yes, the conclusions in parts e and f agree.

7.29

a.

The decision rule is to reject H0 if 𝑥̄ > 270. Recall that $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$.

Therefore, reject H0 if 𝑥̄ > 270 can be written as reject H0 if $z > \frac{270 - \mu_0}{\sigma/\sqrt{n}} = 2.14$. The decision rule in terms of z is to reject H0 if 𝑧 > 2.14.

b.

$P(z > 2.14) = .5 - P(0 < z < 2.14) = .5 - .4838 = .0162$

7.30

a.

H0: 𝜇 = 100
Ha: 𝜇 > 100

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{110 - 100}{60/\sqrt{100}} = 1.67$

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 > 1.645.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 1.67 > 1.645), H0 is rejected. There is sufficient evidence to indicate the true population mean is greater than 100 at 𝛼 = .05.

b.

H0: 𝜇 = 100
Ha: 𝜇 ≠ 100

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{110 - 100}{60/\sqrt{100}} = 1.67$

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.67 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate 𝜇 does not equal 100 at 𝛼 = .05.

c.

In part a, we rejected H0 and concluded the mean was greater than 100. In part b, we did not reject H0. There was insufficient evidence to conclude the mean was different from 100. Because the alternative hypothesis in part a is more specific than the one in b, it is easier to reject H0.

7.31

a.

H0: 𝜇 = .36
Ha: 𝜇 < .36

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{.323 - .36}{\sqrt{.034}/\sqrt{64}} = -1.61$

The rejection region requires 𝛼 = .10 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is 𝑧 < −1.28.

Since the observed value of the test statistic falls in the rejection region (𝑧 = −1.61 < −1.28), H0 is rejected. There is sufficient evidence to indicate the mean is less than .36 at 𝛼 = .10.

b.

H0: 𝜇 = .36
Ha: 𝜇 ≠ .36

The test statistic is 𝑧 = −1.61 (see part a).

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 < −1.645 or 𝑧 > 1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −1.61 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the mean is different from .36 at 𝛼 = .10.

7.32

a.

Let 𝜇 = true mean level of support. To determine if the true mean level of support differs from 75, we test:

H0: 𝜇 = 75
Ha: 𝜇 ≠ 75

b.

For this problem, a Type I error would be concluding the true mean level of support differs from 75 when, in fact, the true mean level of support is 75. For this problem, a Type II error would be concluding the true mean level of support equals 75 when, in fact, the true mean level of support differs from 75.

c.

The test statistic is 𝑧 = −8.4920 and the p-value is 𝑝 < .0001.

d.

Since the p-value is less than 𝛼 (𝑝 < .0001 < .05), H0 is rejected. There is sufficient evidence to indicate the true mean level of support differs from 75 at 𝛼 = .05.

e.

We do not need to make any assumptions about the distribution of support levels. The sample size is very large (𝑛 = 992). Thus, the Central Limit Theorem holds and no assumptions are necessary.

7.33

a.

Let 𝜇 = the true mean monthly profit of all the ride-share drivers. To determine if the true mean monthly profit is less than $1,000, we test:

H0: 𝜇 = 1,000
Ha: 𝜇 < 1,000

b.

The rejection region requires 𝛼 = .10 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is 𝑧 < −1.28.

c.

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{661 - 1000}{583/\sqrt{1{,}121}} = -19.47$

d.

The p-value obtained from the software is 𝑝 < .0001.

e.

Rejection Region Method: Since the observed value of the test statistic does fall in the rejection region (𝑧 = −19.47 < −1.28), H0 is rejected.

P-value Method: Since the p-value is less than 𝛼 (𝑝 < .0001 < .10), H0 is rejected.

There is sufficient evidence to indicate the true mean monthly profit is less than $1,000 at 𝛼 = .10.

7.34

a.

Let 𝜇 = true mean ratio of fup/fumic. To determine if the true mean ratio differs from 1, we test:

H0: 𝜇 = 1
Ha: 𝜇 ≠ 1

b.

Looking simply at the value of 𝑥̄ does not take the variation in the sampling process into account. We need to see if a value of 𝑥̄ = .327 is unusual if the true mean value of the ratio is 1.

c.

From the printout, the test statistic is 𝑧 = −47.09 and the p-value is 𝑝 < .0001.

d.

Suppose we select 𝛼 = .05. The probability of rejecting the null hypothesis when it is true is .05. In this case, it is the probability of concluding the mean ratio is different from 1 when the true mean ratio is equal to 1.

e.

Since the p-value is less than 𝛼 (𝑝 < .0001 < .05), H0 is rejected. There is sufficient evidence to indicate the mean ratio is different from 1.

f.

Because the sample size is so large (𝑛 = 416), the Central Limit Theorem applies. We must assume that the sample was a random sample.

7.35

Let 𝜇 = true mean facial width-to-height ratio. To determine if the true mean facial width-to-height ratio differs from 2.2, we test:

H0: 𝜇 = 2.2
Ha: 𝜇 ≠ 2.2

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 2.2}{s/\sqrt{n}} = -11.87$.

The p-value is $p = P(z \le -11.87) + P(z \ge 11.87) \approx 0 + 0 = 0$. Since the p-value is so small (𝑝 ≈ 0), H0 will be rejected for any reasonable value of 𝛼. There is sufficient evidence to indicate the true mean facial width-to-height ratio differs from 2.2 for 𝛼 > .001.

7.36

a.

Let 𝜇 = true mean rate of return of round-trip trades. To determine if the true mean rate of return of round-trip trades is positive, we test:

H0: 𝜇 = 0
Ha: 𝜇 > 0

b.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 > 1.645.

c.

𝛼 = probability of making a Type I error or the probability of rejecting H0 when H0 is true. Thus, 𝛼 = probability of concluding the true mean rate of return of round-trip trades is positive when, in fact, it is not.

d.

The test statistic is 𝑡 = 4.73 and the p-value is 𝑝 = 0.000.

e.

Since the p-value is less than 𝛼 (𝑝 = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate the true mean rate of return of round-trip trades is positive at 𝛼 = .05.

7.37

a.

Let 𝜇 = true mean weight of golf tees. To determine if the process is not operating satisfactorily, we test:

H0: 𝜇 = .250
Ha: 𝜇 ≠ .250

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tees
Variable   N    Mean      Median    StDev     Minimum   Maximum   Q1        Q3
Tees       40   0.25248   0.25300   0.00223   0.24700   0.25600   0.25100   0.25400

Thus, 𝑥̄ = .25248 and 𝑠 = .00223.

c.

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{.25248 - .250}{.00223/\sqrt{40}} = 7.03$.

d.

The p-value is $p = P(z \le -7.03) + P(z \ge 7.03) \approx 0 + 0 = 0$.

e.

The rejection region requires 𝛼/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region is 𝑧 < −2.575 or 𝑧 > 2.575.

f.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 7.03 > 2.575), H0 is rejected. There is sufficient evidence to indicate the process is performing in an unsatisfactory manner at 𝛼 = .01.

g.

𝛼 is the probability of a Type I error. A Type I error, in this case, is to say the process is unsatisfactory when, in fact, it is satisfactory. The risk, then, is to the producer since he will be spending time and money to repair a process that is not in error.

𝛽 is the probability of a Type II error. A Type II error, in this case, is to say the process is satisfactory when it, in fact, is not. This is the consumer's risk since he could unknowingly purchase a defective product.
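The large-sample test in this exercise can be recomputed from the summary statistics. The sketch below is one possible way to organize the calculation, not the manual's own procedure; `z_test` and its `tail` argument are names invented for illustration, and the p-value uses the standard normal distribution from Python's `statistics` module instead of Table II.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, s, n, alpha, tail):
    """Large-sample z-test; tail is "two", "upper", or "lower"."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = std_normal.cdf(z)
    return z, p_value, p_value < alpha

# Summary statistics for the golf tee weights (n = 40), two-tailed at alpha = .01.
z, p_value, reject = z_test(xbar=0.25248, mu0=0.250, s=0.00223, n=40,
                            alpha=0.01, tail="two")
print(round(z, 2), reject)
```

The p-value decision and the rejection-region decision always agree, because p < 𝛼 exactly when the test statistic falls beyond the critical value.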

7.38

Let 𝜇 = true mean intension score. To determine if the true mean intension score is greater than 2, we test:

H0: 𝜇 = 2
Ha: 𝜇 > 2

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 2}{s/\sqrt{n}} = 1.37$.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is 𝑧 > 2.33.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.37 ≯ 2.33), H0 is not rejected. There is insufficient evidence to indicate that the true mean intension score is greater than 2 at 𝛼 = .01.

7.39

a.

Let 𝜇 = mean estimated time to read the report. To determine if the students, on average, overestimate the time it takes to read the report, we test:

H0: 𝜇 = 48
Ha: 𝜇 > 48

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 48}{s/\sqrt{n}} = 1.85$.

The p-value is $p = P(z \ge 1.85) = .5 - .4678 = .0322$ (using Table II, Appendix D).

Since the p-value is less than 𝛼 (𝑝 = .0322 < .10), H0 is rejected. There is sufficient evidence to indicate the students, on average, overestimate the time it takes to read the report at 𝛼 = .10.

b.

Let 𝜇 = mean estimated number of pages of the report read. To determine if the students, on average, underestimate the number of report pages read, we test:

H0: 𝜇 = 32
Ha: 𝜇 < 32

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 32}{s/\sqrt{n}} = -1.85$.

The p-value is $p = P(z \le -1.85) = .5 - .4678 = .0322$ (using Table II, Appendix D).

Since the p-value is less than 𝛼 (𝑝 = .0322 < .10), H0 is rejected. There is sufficient evidence to indicate the students, on average, underestimate the number of report pages read at 𝛼 = .10.

c.

No. In both tests, the sample sizes are greater than 30. Thus, the Central Limit Theorem will apply. The distribution of 𝑥̄ is approximately normal regardless of the population distribution.

7.40

a.

Let 𝜇 = true mean number of vouchers sold 30 minutes before the tipping point. To determine if the true mean number of vouchers sold 30 minutes before the tipping point is less than 5, we test:

H0: 𝜇 = 5
Ha: 𝜇 < 5

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 5}{s/\sqrt{n}} = -.46$.

The rejection region requires 𝛼 = .10 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is 𝑧 < −1.28.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −.46 ≮ −1.28), H0 is not rejected. There is insufficient evidence to indicate that the true mean number of vouchers sold 30 minutes before the tipping point is less than 5 at 𝛼 = .10.

b.

Let 𝜇 = true mean number of vouchers sold 30 minutes after the tipping point. To determine if the true mean number of vouchers sold 30 minutes after the tipping point is greater than 10, we test:

H0: 𝜇 = 10
Ha: 𝜇 > 10

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 10}{s/\sqrt{n}} = 1.97$.

The rejection region requires 𝛼 = .10 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is 𝑧 > 1.28.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 1.97 > 1.28), H0 is rejected. There is sufficient evidence to indicate that the true mean number of vouchers sold 30 minutes after the tipping point is greater than 10 at 𝛼 = .10.

7.41

To determine if the mean amount of food wasted by all US households is less than 30%, we test:

H0: 𝜇 = 30
Ha: 𝜇 < 30

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 30}{s/\sqrt{n}} = 6.912$.

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 6.912 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the true mean amount of food wasted by all US households is less than 30% at 𝛼 = .05.

7.42

a.

Let 𝜇 = average full-service fee (in thousands of dollars) of U.S. funeral homes in the current year. To determine if the average full-service fee is less than $7,640, we test:

H0: 𝜇 = 7.64
Ha: 𝜇 < 7.64

b.

Using MINITAB, the output is:

Statistics
Variable   Total Count   Mean    StDev   Minimum   Q1      Median   Q3      Maximum
FEE        36            7.319   1.265   5.700     6.525   7.100    7.900   12.100

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{7.319 - 7.64}{1.265/\sqrt{36}} = -1.52$

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −1.52 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate the true mean full-service fee of U.S. funeral homes in the current year is less than $7,640 at 𝛼 = .05.

c.

No. Since the sample size 𝑛 = 36 is greater than 30, the Central Limit Theorem applies. The distribution of 𝑥̄ is approximately normal regardless of the population distribution.

7.43

a.

To determine if the true mean forecast error for buy-side analysts is positive, we test:

H0: 𝜇 = 0
Ha: 𝜇 > 0

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 0}{s/\sqrt{n}} = 26.15$.

The observed p-value of the test is $p = P(z > 26.15) \approx 0$ (using Table II, Appendix D).

Since the p-value is less than 𝛼 (𝑝 ≈ 0 < .01), H0 is rejected. There is sufficient evidence to indicate that the true mean forecast error for buy-side analysts is positive at 𝛼 = .01. This means that the buy-side analysts are overestimating earnings.

b.

To determine if the true mean forecast error for sell-side analysts is negative, we test:

H0: 𝜇 = 0
Ha: 𝜇 < 0

The test statistic is $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \frac{\bar{x} - 0}{s/\sqrt{n}} = -14.24$.

The observed p-value of the test is $p = P(z < -14.24) \approx 0$ (using Table II, Appendix D).

Since the p-value is less than 𝛼 (𝑝 ≈ 0 < .01), H0 is rejected. There is sufficient evidence to indicate that the true mean forecast error for sell-side analysts is negative at 𝛼 = .01. This means that the sell-side analysts are underestimating earnings.

7.44

a.

To determine if the sample data refute the manufacturer's claim, we test: 𝐻 : 𝜇 = 10 𝐻 : 𝜇 < 10

b.

A Type I error is concluding the mean number of solder joints inspected per second is less than 10 when, in fact, it is 10 or more. Copyright © 2022 Pearson Education, Inc.



A Type II error is concluding the mean number of solder joints inspected per second is at least 10 when, in fact, it is less than 10. c.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: PCB

Variable  N   Mean   Median  StDev  Minimum  Maximum  Q1     Q3
PCB       48  9.292  9.000   2.103  0.000    13.000   9.000  10.000

𝐻0: 𝜇 = 10 𝐻a: 𝜇 < 10

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (9.292 − 10)/(2.103/√48) = −2.33

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = −2.33 < −1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of inspections per second is less than 10 at 𝛼 = .05.

7.45
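The large-sample z test of Exercise 7.44c can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_test_lower_tail` is a hypothetical helper, and the sample standard deviation is used in place of 𝜎, as in the solution above.

```python
from math import sqrt
from statistics import NormalDist

def z_test_lower_tail(xbar, mu0, s, n, alpha):
    """Large-sample lower-tailed z test of H0: mu = mu0 vs Ha: mu < mu0."""
    z = (xbar - mu0) / (s / sqrt(n))       # s stands in for sigma when n is large
    z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value (negative)
    p_value = NormalDist().cdf(z)          # P(Z <= z)
    return z, z_crit, p_value, z < z_crit

# Summary statistics from the MINITAB printout for the PCB data
z, z_crit, p_value, reject = z_test_lower_tail(9.292, 10, 2.103, 48, 0.05)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The same helper applies to any lower-tailed large-sample test in this section by changing the five arguments.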

a.

Let 𝜇 = mean external tension level of all managers who engage in coopetition. To determine if the mean external tension level differs from 10.5, we test: 𝐻0: 𝜇 = 10.5 𝐻a: 𝜇 ≠ 10.5

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (10.82 − 10.5)/(𝑠/√𝑛) = 4.12

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.025 = 1.96. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96. Since the observed value of the test statistic falls in the rejection region (𝑧 = 4.12 > 1.96), H0 is rejected. There is sufficient evidence to indicate the true mean external tension level differs from 10.5 at 𝛼 = .05.

b.

The 95% confidence interval is: 𝑥̄ ± 𝑧.025 𝜎𝑥̄ ⇒ 10.82 ± 1.96(𝑠/√𝑛) ⇒ 10.82 ± .152 ⇒ (10.668, 10.972)

c.

If the hypothesized value of the mean falls in the confidence interval, then H0 is not rejected. If the hypothesized value of the mean does not fall in the confidence interval, then H0 is rejected.

d.

The observed value of 𝑥̄ is very close to 10.5. Even though the difference is statistically significant, the mean external tension level may not be practically different from 10.5.

7.46

Let 𝜇 = mean number of crowdfunding backers for all entrepreneurial projects pitched via the internet. To determine if the mean number differs from 200, we test: 𝐻0: 𝜇 = 200 𝐻a: 𝜇 ≠ 200


The test statistic is found on the printout. It is 𝑧 = −1.24. The p-value for the test is found on the printout. It is 𝑝 = 0.215. We will test at 𝛼 = .10. Since the observed p-value is greater than the level of significance (𝑝 = 0.215 > .10 = 𝛼), H0 cannot be rejected. There is insufficient evidence to indicate the true mean number of crowdfunding backers for all entrepreneurial projects pitched via the internet differs from 200 at 𝛼 = .10. The 90% confidence interval is found on the printout: (155.8, 206.2). Since the value 200 falls inside the endpoints, we cannot state that the true mean differs from this value. The confidence interval yields the same conclusion as the test of hypothesis.

7.47

7.48

7.49

a.

We should use the t-distribution in testing a hypothesis about a population mean if the sample size is small, the population being sampled from is normal, and the variance of the population is unknown.

b.

Both distributions are mound-shaped and symmetric. The t-distribution is flatter than the z-distribution.

a.

𝑃(𝑡 > 1.440) = .10 (Using Table III, Appendix D, with df = 6)

b.

𝑃(𝑡 < −1.782) = .05 (Using Table III, Appendix D, with df = 12)

c.

𝑃(𝑡 < −2.060) + 𝑃(𝑡 > 2.060) = .025 + .025 = .05 (Using Table III, Appendix D, with df = 25)

d.

The probability of a Type I error is computed above for each of the parts.

a.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the t-distribution with df = 𝑛 − 1 = 14 − 1 = 13. From Table III, Appendix D, 𝑡.025 = 2.160. The rejection region is 𝑡 < −2.160 or 𝑡 > 2.160.

b.

The rejection region requires 𝛼 = .01 in the upper tail of the t-distribution with df = 𝑛 − 1 = 24 − 1 = 23. From Table III, Appendix D, 𝑡.01 = 2.500. The rejection region is 𝑡 > 2.500.

c.

The rejection region requires 𝛼 = .10 in the upper tail of the t-distribution with df = 𝑛 − 1 = 9 − 1 = 8. From Table III, Appendix D, 𝑡.10 = 1.397. The rejection region is 𝑡 > 1.397.


7.50


d.

The rejection region requires 𝛼 = .01 in the lower tail of the t-distribution with df = 𝑛 − 1 = 12 − 1 = 11. From Table III, Appendix D, 𝑡.01 = 2.718. The rejection region is 𝑡 < −2.718.

e.

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with df = 𝑛 − 1 = 20 − 1 = 19. From Table III, Appendix D, 𝑡.05 = 1.729. The rejection region is 𝑡 < −1.729 or 𝑡 > 1.729.

f.

The rejection region requires 𝛼 = .05 in the lower tail of the t-distribution with df = 𝑛 − 1 = 4 − 1 = 3. From Table III, Appendix D, 𝑡.05 = 2.353. The rejection region is 𝑡 < −2.353.

a.

𝐻0: 𝜇 = 6 𝐻a: 𝜇 < 6

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = −2.064

The necessary assumption is that the population is normal. The rejection region requires 𝛼 = .05 in the lower tail of the t-distribution with df = 𝑛 − 1 = 5 − 1 = 4. From Table III, Appendix D, 𝑡.05 = 2.132. The rejection region is 𝑡 < −2.132. Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −2.064 ≮ −2.132), H0 is not rejected. There is insufficient evidence to indicate the mean is less than 6 at 𝛼 = .05.

b.

𝐻0: 𝜇 = 6 𝐻a: 𝜇 ≠ 6

The test statistic is 𝑡 = −2.064 (from a). The assumption is the same as in a. The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the t-distribution with df = 𝑛 − 1 = 5 − 1 = 4. From Table III, Appendix D, 𝑡.025 = 2.776. The rejection region is 𝑡 < −2.776 or 𝑡 > 2.776. Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −2.064 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate the mean is different from 6 at 𝛼 = .05.

c.

For part a, the 𝑝-value = 𝑃(𝑡 ≤ −2.064). Using MINITAB,

Cumulative Distribution Function

Student's t distribution with 4 DF

x       P( X <= x )
-2.064  0.0539809

The p-value is 𝑝 = .05398. For part b, the 𝑝-value = 𝑃(𝑡 ≤ −2.064) + 𝑃(𝑡 ≥ 2.064). The p-value is 𝑝 = 2(.05398) = .10796.
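The two p-values in Exercise 7.50c can be reproduced without MINITAB. The sketch below is one possible approach, not the method used in the solution: it integrates the Student's t density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|): 0.5 minus the area on [0, |t|], found by Simpson's rule."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

one_tailed = t_upper_tail(-2.064, df=4)   # part a: P(t <= -2.064), by symmetry
two_tailed = 2 * one_tailed               # part b: both tails
print(f"one-tailed p = {one_tailed:.5f}, two-tailed p = {two_tailed:.5f}")
```

Because the t-distribution is symmetric about 0, the lower-tail area below −2.064 equals the upper-tail area above 2.064, which is why a single tail-area function serves both parts.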


7.51

a.

We must assume that a random sample was drawn from a normal population.

b.

The hypotheses are: 𝐻0: 𝜇 = 1,000 𝐻a: 𝜇 > 1,000

The test statistic is 𝑡 = 1.89 and the p-value is 𝑝 = .038. Since the p-value is so small, there is evidence to reject H0. There is evidence to indicate the mean is greater than 1,000 for 𝛼 > .038.

c.

The hypotheses are: 𝐻0: 𝜇 = 1,000 𝐻a: 𝜇 ≠ 1,000

The test statistic is 𝑡 = 1.89 and the p-value is 𝑝 = 2(.038) = .076. There is no evidence to reject H0 for 𝛼 = .05. There is insufficient evidence to indicate the mean is different from 1,000 for 𝛼 = .05. There is evidence to reject H0 for 𝛼 > .076. There is evidence to indicate the mean is different from 1,000 for 𝛼 > .076.

7.52

a.

Let 𝜇 = mean December 2019 retail whole milk price for all US cities. To determine if the mean price exceeds $3.30, we test: 𝐻0: 𝜇 = 3.30 𝐻a: 𝜇 > 3.30

b.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 1.11.

c.

The rejection region requires 𝛼 = .01 in the upper tail of the t-distribution with df = 𝑛 − 1 = 10 − 1 = 9. From Table III, Appendix D, 𝑡.01 = 2.821. The rejection region is 𝑡 > 2.821.

d.

Since the observed value of the test statistic does not fall in the rejection region (𝑡 = 1.11 ≯ 2.821), H0 is not rejected. There is insufficient evidence to indicate the mean December 2019 retail whole milk price for all US cities exceeds $3.30 at 𝛼 = .01.

e.

𝛼 is the probability of a Type I error. It is the probability that we claim that the true mean price exceeds $3.30 when, in reality, the true mean price equals $3.30.

f.

Since the p-value is greater than 𝛼 (𝑝 = .147 > .01 = 𝛼), H0 is not rejected. There is insufficient evidence to indicate the mean December 2019 retail whole milk price for all US cities exceeds $3.30 at 𝛼 = .01.

g.

We must assume that a random sample was drawn from all US cities and that the distribution of all December 2019 milk prices is approximately normal.


7.53

a.


Let 𝜇 = mean number of occupational accidents at all Turkish construction sites. To determine if the mean number of occupational accidents is less than 70, we test: 𝐻0: 𝜇 = 70 𝐻a: 𝜇 < 70

b.

The rejection region requires 𝛼 = .01 in the lower tail of the t-distribution with df = 𝑛 − 1 = 3 − 1 = 2. From Table III, Appendix D, 𝑡.01 = 6.965. The rejection region is 𝑡 < −6.965.

c.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = −.29.

d.

Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −.29 ≮ −6.965), H0 is not rejected. There is insufficient evidence to indicate the mean number of occupational accidents at all Turkish construction sites is less than 70 at 𝛼 = .01.

e.

We must assume that a random sample was drawn from a normal population.

7.54

a.

To determine if the mean of the trap spacing measurements differs from 95 meters, we test: 𝐻0: 𝜇 = 95 𝐻a: 𝜇 ≠ 95

b.

The value of 𝑥̄ varies from sample to sample. The next sample may yield a value of 𝑥̄ that is greater than 95. We must determine how unusual a value of 𝑥̄ = 89.9 is if the true mean is 95.

c.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (89.9 − 95)/(11.6/√7) = −1.16.

d.

Using MINITAB, the results are:

One-Sample T

Test of mu = 95 vs not = 95

N  Mean     StDev    SE Mean  95% CI               T      P
7  89.9000  11.6000  4.3844   (79.1718, 100.6282)  -1.16  0.289

The p-value is 𝑝 = .289.

e.

Suppose we pick 𝛼 = .05. For this problem, 𝛼 = probability of concluding the mean trap spacing is different from 95 when, in fact, the mean trap spacing is equal to 95.

f.

Since the p-value is greater than 𝛼 (𝑝 = .289 > .05), H0 is not rejected. There is insufficient evidence to indicate the mean trap spacing is different from 95 at 𝛼 = .05.

g.

In order for the test to be valid, the population of trap spacing measurements must be normal and the sample must be random.

h.

From Exercise 6.127, the 95% confidence interval is (79.104, 100.616). Since 95 is contained in this interval, there is no evidence to indicate the mean trap spacing is different from 95. This agrees with the test in part f.
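The agreement between the confidence interval and the two-tailed test in Exercise 7.54 can be illustrated from the summary statistics on the MINITAB printout. This is a minimal sketch; the critical value 2.447 is the standard tabled 𝑡.025 for df = 6 and is entered by hand rather than computed.

```python
from math import sqrt

n, xbar, s, mu0 = 7, 89.9, 11.6, 95
t_crit = 2.447                      # t.025 with df = n - 1 = 6, from a t table

se = s / sqrt(n)                    # standard error of the mean
t_stat = (xbar - mu0) / se
lower, upper = xbar - t_crit * se, xbar + t_crit * se

reject_by_test = abs(t_stat) > t_crit              # two-tailed rejection region
reject_by_interval = not (lower <= mu0 <= upper)   # is 95 outside the interval?
print(f"t = {t_stat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"reject by test: {reject_by_test}, reject by interval: {reject_by_interval}")
```

Both decision rules use the same standard error and the same critical value, so they always give the same answer for a two-tailed test at the matching confidence level.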


7.55

To determine if the mean level of radon exposure in the tombs is less than 6,000 Bq/m3, we test: 𝐻0: 𝜇 = 6,000 𝐻a: 𝜇 < 6,000

From the printout, the test statistic is 𝑡 = −1.82. Since this is a one-tailed test, the p-value is 𝑝 = .096/2 = .0480. Since the p-value is less than 𝛼 (𝑝 = .048 < .10), H0 is rejected. There is sufficient evidence to indicate the mean level of radon exposure is less than 6,000 Bq/m3 at 𝛼 = .10.

7.56

a.

Let 𝜇 = mean annualized percentage return on investment. To determine if the mean annualized percentage return on investment is positive, we test: 𝐻0: 𝜇 = 0 𝐻a: 𝜇 > 0

b.

From the printout the test statistic is 𝑡 = 3.81.

c.

The p-value is 𝑝 = 0.0021.

d.

Since the p-value is less than 𝛼 (𝑝 = .0021 < .05), H0 is rejected. There is sufficient evidence to indicate the mean annualized percentage return on investment is positive at 𝛼 = .05.

e.

We must assume that we have selected a random sample from the population and that the population of annualized percentage return on investments for all AAII stock screeners is normally distributed.

7.57

a.

Using MINITAB, the calculations are:

One-Sample T: Velocity

Test of μ = 0.338 vs ≠ 0.338

Variable  N   Mean     StDev    SE Mean  90% CI              T      P
Velocity  25  0.26208  0.04669  0.00934  (0.24610, 0.27806)  -8.13  0.000

Let 𝜇 = mean bubble rising velocity. To determine if the mean bubble velocity differs from .338, we test: 𝐻0: 𝜇 = .338 𝐻a: 𝜇 ≠ .338

The test statistic is 𝑡 = −8.13 and the p-value is 𝑝 = .000. Since the p-value is less than 𝛼 (𝑝 = .000 < .10), H0 is rejected. There is sufficient evidence to indicate the mean bubble velocity differs from .338 at 𝛼 = .10.

b.

No. Because the p-value is so small, it would be extremely unlikely that the data were generated at a sparging rate of 3.33 × 10⁻⁶.

7.58

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Choice

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Choice    11  58.91  7.78   43.00    56.00  58.00   62.00  76.00



Let 𝜇 = mean choice score for consumers shopping with flexed arms. To determine if the mean choice score for consumers shopping with flexed arms is higher than 43, we test: 𝐻0: 𝜇 = 43 𝐻a: 𝜇 > 43

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (58.91 − 43)/(7.78/√11) = 6.78.

The rejection region requires 𝛼 = .05 in the upper tail of the t-distribution with df = 𝑛 − 1 = 11 − 1 = 10. From Table III, Appendix D, 𝑡.05 = 1.812. The rejection region is 𝑡 > 1.812. Since the observed value of the test statistic falls in the rejection region (𝑡 = 6.78 > 1.812), H0 is rejected. There is sufficient evidence to indicate the mean choice score for consumers shopping with flexed arms is higher than 43 at 𝛼 = .05.

7.59

Let 𝜇 = population mean rank of China. To determine if this population mean rank is less than 15, we test: 𝐻0: 𝜇 = 15 𝐻a: 𝜇 < 15

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = −5.217.

The rejection region requires 𝛼 = .05 in the lower tail of the t-distribution with df = 𝑛 − 1 = 12 − 1 = 11. From Table III, Appendix D, 𝑡.05 = 1.796. The rejection region is 𝑡 < −1.796. Since the observed value of the test statistic falls in the rejection region (𝑡 = −5.217 < −1.796), H0 is rejected. There is sufficient evidence to indicate the population mean rank of China is less than 15 at 𝛼 = .05.

7.60

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Dioxide

Variable  Oil  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Dioxide   No   10  2.590  1.542  0.100    1.125  2.850   4.000  4.000
          Yes  6   0.517  0.407  0.200    0.200  0.450   0.700  1.300

a.

To determine if the mean amount of dioxide present in water specimens that contain oil is less than 3 mg/l, we test: 𝐻0: 𝜇 = 3 𝐻a: 𝜇 < 3

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (0.517 − 3)/(0.407/√6) = −14.94.

The rejection region requires 𝛼 = .10 in the lower tail of the t-distribution with df = 𝑛 − 1 = 6 − 1 = 5. From Table III, Appendix D, 𝑡.10 = 1.476. The rejection region is 𝑡 < −1.476. Since the observed value of the test statistic falls in the rejection region (𝑡 = −14.94 < −1.476), H0 is rejected. There is sufficient evidence to indicate the mean amount of dioxide present in water specimens that contain oil is less than 3 mg/l at 𝛼 = .10.


b.

To determine if the mean amount of dioxide present in water specimens that do not contain oil is less than 3 mg/l, we test: 𝐻0: 𝜇 = 3 𝐻a: 𝜇 < 3

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (2.590 − 3)/(1.542/√10) = −0.84.

The rejection region requires 𝛼 = .10 in the lower tail of the t-distribution with df = 𝑛 − 1 = 10 − 1 = 9. From Table III, Appendix D, 𝑡.10 = 1.383. The rejection region is 𝑡 < −1.383. Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −0.84 ≮ −1.383), H0 is not rejected. There is insufficient evidence to indicate the mean amount of dioxide present in water specimens that do not contain oil is less than 3 mg/l at 𝛼 = .10.

7.61
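Both parts of Exercise 7.60 use the same calculation on different rows of the printout, so they can be checked together. The following sketch assumes the summary statistics from the MINITAB table and the tabled 𝑡.10 critical values quoted in the solutions; `t_statistic` is a hypothetical helper name.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic computed from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))

# group: (mean, standard deviation, n, tabled t.10 for df = n - 1)
groups = {
    "oil":    (0.517, 0.407, 6, 1.476),
    "no oil": (2.590, 1.542, 10, 1.383),
}

results = {}
for name, (xbar, s, n, t_crit) in groups.items():
    t = t_statistic(xbar, 3, s, n)
    results[name] = (t, t < -t_crit)    # lower-tailed test: reject when t < -t.10
    print(f"{name}: t = {t:.2f}, reject H0: {t < -t_crit}")
```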

Using MINITAB, the preliminary calculations are:

One-Sample T: Hardness

Test of μ = 76 vs > 76

Variable  N  Mean   StDev  SE Mean  95% Lower Bound  T     P
Hardness  3  82.00  2.65   1.53     77.54            3.93  0.030

Let 𝜇 = mean hardness of polyester composite mixture with 40% CKD weight ratio. To determine if using a 40% CKD weight ratio increases the mean hardness of polyester composite mixture, we test: 𝐻0: 𝜇 = 76 𝐻a: 𝜇 > 76

The test statistic is 𝑡 = 3.93 and the p-value is 𝑝 = .030. Since the p-value is small (𝑝 = .030), H0 is rejected. There is sufficient evidence to indicate that using a 40% CKD weight ratio increases the mean hardness of polyester composite mixture for any value of 𝛼 > .03.

7.62

a.

𝜇 = Σ𝑥/𝑁 = 1.16. This number represents the population mean 𝜇 because it is the mean of the entire population of states.

b.

Answers will vary. Using MINITAB, a random sample of 5 states yielded California (1), Indiana (0), Louisiana (2), Michigan (3), and Montana (0).

c.

Using MINITAB, the preliminary calculations are:

Descriptive Statistics

N  Mean   StDev  SE Mean  95% CI for μ
5  1.200  1.304  0.583    (-0.419, 2.819)

Test

Null hypothesis         H₀: μ = 2
Alternative hypothesis  H₁: μ ≠ 2

T-Value  P-Value
-1.37    0.242


To determine if the mean number of active nuclear power plants operating in all states differs from 2, we test: 𝐻0: 𝜇 = 2 𝐻a: 𝜇 ≠ 2

The test statistic is 𝑡 = −1.37 and the p-value is 𝑝 = .242. Since the p-value is greater than 𝛼 (𝑝 = .242 > .10), H0 is not rejected. There is insufficient evidence to indicate the mean number of active nuclear power plants operating in all states differs from 2 at 𝛼 = .10.

d.

Two possible reasons for incorrectly rejecting the null hypothesis are: (1) The population of the number of active nuclear power plants is not normally distributed, but skewed to the right. (2) The sample size is very small. It is possible to get an unusual sample with a very small sample size.

7.63

Using MINITAB, the descriptive statistics for the 2 plants are:

Descriptive Statistics: AL1, AL2

Variable  N  Mean     StDev    Minimum  Q1  Median   Q3  Maximum
AL1       2  0.00750  0.00354  0.00500  *   0.00750  *   0.01000
AL2       2  0.0700   0.0283   0.0500   *   0.0700   *   0.0900

To determine if plant 1 is violating the OSHA standard, we test: 𝐻0: 𝜇 = .004 𝐻a: 𝜇 > .004

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (.0075 − .004)/(.00354/√2) = 1.40

The p-value is 𝑝 = 𝑃(𝑡 > 1.40). Using MINITAB with df = 𝑛 − 1 = 2 − 1 = 1, the p-value is 𝑝 = .197. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 1 for any value of 𝛼 < .197.

To determine if plant 2 is violating the OSHA standard, we test: 𝐻0: 𝜇 = .004 𝐻a: 𝜇 > .004

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (.07 − .004)/(.0283/√2) = 3.30

The p-value is 𝑝 = 𝑃(𝑡 > 3.30). Using MINITAB with df = 𝑛 − 1 = 2 − 1 = 1, the p-value is 𝑝 = .096. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate the OSHA standard is violated by plant 2 for any value of 𝛼 < .096. If 𝛼 > .096, H0 is rejected. There is sufficient evidence to indicate the OSHA standard is violated by plant 2 for any value of 𝛼 > .096.

7.64
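The plant 1 calculation in Exercise 7.63 can be reproduced from the raw readings, since with 𝑛 = 2 the printout's minimum and maximum are the two observations. The sketch below relies on the fact that a t-distribution with 1 degree of freedom is the standard Cauchy distribution, whose upper-tail area is 0.5 − arctan(𝑡)/π; this closed form is an alternative to the MINITAB lookup used in the solution and applies only when df = 1.

```python
from math import atan, pi, sqrt
from statistics import mean, stdev

plant1 = [0.005, 0.010]     # the two AL1 readings (minimum and maximum on the printout)
mu0 = 0.004                 # hypothesized mean from the OSHA standard

n = len(plant1)
t_stat = (mean(plant1) - mu0) / (stdev(plant1) / sqrt(n))

# With df = 1 the t-distribution is standard Cauchy: P(T > t) = 0.5 - arctan(t)/pi
p_value = 0.5 - atan(t_stat) / pi
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```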

a.

Since the value of 𝑝̂ (.63) is much smaller than the hypothesized value of p (.70), it is likely that the null hypothesis is not correct.

b.

First, check to see if n is large enough. 𝑛𝑝0 = 100(.7) = 70 and 𝑛𝑞0 = 100(.3) = 30

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal approximation will be adequate. 𝐻0: 𝑝 = .70 𝐻a: 𝑝 < .70

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.63 − .70)/√(.70(.30)/100) = −1.53

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −1.53 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the proportion is less than .70 at 𝛼 = .05.

7.65

c.

𝑝-value = 𝑝 = 𝑃(𝑧 ≤ −1.53) = .5 − .4370 = .0630. Since p is not less than 𝛼 = .05, H0 is not rejected.
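The large-sample test of a proportion in Exercise 7.64 can be sketched as follows, using only the standard library. The helper name `one_proportion_z` is hypothetical. The software p-value differs slightly from the tabled .0630 because the table lookup rounds 𝑧 to −1.53 first.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the null standard error sqrt(p0*q0/n)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z = one_proportion_z(0.63, 0.70, 100)
p_value = NormalDist().cdf(z)          # lower-tailed test: P(Z <= z)
reject = p_value < 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Note that the standard error is computed from the hypothesized value 𝑝0, not from 𝑝̂, because the test statistic is evaluated under the assumption that H0 is true.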

a.

𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.83 − .9)/√(.9(.1)/100) = −2.33

b.

The denominator in Exercise 7.64 is √(.7(.3)/100) = .0458 as compared to √(.9(.1)/100) = .03 in part a. Since the denominator in this problem is smaller, the absolute value of z is larger.

c.

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = −2.33 < −1.645), H0 is rejected. There is sufficient evidence to indicate the population proportion is less than .9 at 𝛼 = .05.

d.

The p-value = 𝑝 = 𝑃(𝑧 ≤ −2.33) = .5 − .4901 = .0099 (from Table II, Appendix D). Since the p-value is less than 𝛼 = .05, H0 is rejected.

7.66

a.

No. The p-value is the probability of observing your test statistic or anything more unusual if H0 is true. For this problem, the 𝑝-value = .3300/2 = .1650. Given the true value of the population proportion, p, is .5, the probability of observing a test statistic of 𝑧 = .44 or larger is .1650. Since the p-value is not small (𝑝 = .1650), there is no evidence to reject H0. There is no evidence to indicate the population proportion is greater than .5.

b.

If the alternative hypothesis were two-tailed, the p-value would be 2 times the p-value for a one-tailed test. For this problem, the 𝑝-value = .3300. The probability of observing your test statistic or anything more unusual if H0 is true is .3300. Since the p-value is so large, there is no evidence to reject H0 for 𝛼 ≤ .10. There is no evidence to indicate that 𝑝 ≠ .5 for 𝛼 ≤ .10.

7.67

Because p is the proportion of consumers who do not like the snack food, 𝑝̂ will be:

𝑝̂ = (Number of 0's in sample)/𝑛 = 29/50 = .58

First, check to see if the normal approximation will be adequate: 𝑛𝑝0 = 50(.5) = 25 and 𝑛𝑞0 = 50(.5) = 25

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal distribution will be adequate.

a.

𝐻0: 𝑝 = .5 𝐻a: 𝑝 > .5

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/𝜎𝑝̂ = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.58 − .5)/√(.5(1 − .5)/50) = 1.13.

The rejection region requires 𝛼 = .10 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.10 = 1.28. The rejection region is 𝑧 > 1.28. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.13 ≯ 1.28), H0 is not rejected. There is insufficient evidence to indicate the proportion of customers who do not like the snack food is greater than .5 at 𝛼 = .10.

b.

𝑝-value = 𝑝 = 𝑃(𝑧 ≥ 1.13) = .5 − .3708 = .1292 (using Table II, Appendix D)

7.68

The sample size is large enough to use the normal approximation if 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15.

a.

𝑛𝑝0 = 900(.975) = 877.5 > 15 and 𝑛𝑞0 = 900(.025) = 22.5 > 15. Thus, the sample size is large enough.

b.

𝑛𝑝0 = 125(.01) = 1.25 < 15 and 𝑛𝑞0 = 125(.99) = 123.75 > 15. Thus, the sample size is not large enough.

c.

𝑛𝑝0 = 40(.75) = 30 > 15 and 𝑛𝑞0 = 40(.25) = 10 < 15. Thus, the sample size is not large enough.

d.

𝑛𝑝0 = 15(.75) = 11.25 < 15 and 𝑛𝑞0 = 15(.25) = 3.75 < 15. Thus, the sample size is not large enough.

e.

𝑛𝑝0 = 12(.62) = 7.44 < 15 and 𝑛𝑞0 = 12(.38) = 4.56 < 15. Thus, the sample size is not large enough.

7.69
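The five sample-size checks in Exercise 7.68 all apply the same rule, which can be sketched as a small function. The name `large_enough` and the `threshold` parameter are hypothetical; the cutoff of 15 is the one used throughout these solutions.

```python
def large_enough(n, p0, threshold=15):
    """True when both n*p0 and n*q0 meet the threshold for the normal approximation."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

# (n, p0) pairs from parts a through e of Exercise 7.68
cases = {"a": (900, 0.975), "b": (125, 0.01), "c": (40, 0.75),
         "d": (15, 0.75), "e": (12, 0.62)}

results = {part: large_enough(n, p0) for part, (n, p0) in cases.items()}
for part, ok in results.items():
    print(f"{part}. sample size large enough: {ok}")
```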

a.

𝑝̂ = 𝑥/𝑛 = .9

b.

To determine if more than 80% of all customers would participate in a loyalty card program, we test: 𝐻0: 𝑝 = .8 𝐻a: 𝑝 > .8

c.

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.9 − .8)/√(.8(.2)/𝑛) = 3.95.

d.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.01 = 2.33. The rejection region is 𝑧 > 2.33.

e.

The p-value is 𝑝 = 𝑃(𝑧 > 3.95) = .5 − .49996 = .00004 ≈ 0 (Using Table II, Appendix D)

f.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 3.95 > 2.33), H0 is rejected. There is sufficient evidence to indicate that more than 80% of all customers would participate in a loyalty card program at 𝛼 = .01.

g.

Since the p-value is less than 𝛼 (𝑝 ≈ 0 < .01), H0 is rejected. There is sufficient evidence to indicate that more than 80% of all customers would participate in a loyalty card program at 𝛼 = .01.

7.70

a.

𝑝̂ = 𝑥/𝑛 = .2

To determine if the true proportion of cord cutters differs from 10%, we test: 𝐻0: 𝑝 = .10 𝐻a: 𝑝 ≠ .10

b.

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.2 − .10)/√(.10(.90)/𝑛) = 9.43.

c.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.025 = 1.96. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96.

d.

The p-value is 𝑝 = 2𝑃(𝑧 > 9.43) ≈ 2(.5 − .5) ≈ 2(0) ≈ 0 (Using Table II, Appendix D)

e.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 9.43 > 1.96), H0 is rejected. There is sufficient evidence to indicate that the true proportion of cord cutters differs from 10% at 𝛼 = .05.

7.71

a.

𝑝̂ = 𝑥/𝑛 = .390

𝐻0: 𝑝 = .5 𝐻a: 𝑝 < .5

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.390 − .5)/√(.5(.5)/𝑛) = −9.08.

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = −9.08 < −1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of companies that had not implemented a whistle-blower hotline in 2017 is less than .5 at 𝛼 = .05.

b.

𝑝̂ = 𝑥/𝑛 = .640

𝐻0: 𝑝 = .5 𝐻a: 𝑝 > .5

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.640 − .5)/√(.5(.5)/𝑛) = 7.32.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = 7.32 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of companies that had not implemented a whistle-blower hotline in 2013 is greater than .5 at 𝛼 = .05.

7.72

a.

If there is no relationship between color and gummy bear flavor, then .5 of the population of students will correctly identify the color.

b.

To determine if color and flavor are related, we test: 𝐻0: 𝑝 = .5 𝐻a: 𝑝 ≠ .5

c.

From the printout, the p-value is 𝑝 < .0001. Since the p-value is less than 𝛼 (𝑝 < .0001 < .01), H0 is rejected. There is sufficient evidence to indicate that color and flavor are related at 𝛼 = .01.

7.73

a.

To determine whether the true proportion of toothpaste brands with the ADA seal verifying effective decay prevention is less than .5, we test: 𝐻0: 𝑝 = .5 𝐻a: 𝑝 < .5

b.

From the printout, the p-value is 𝑝 = .188.

c.

Since the observed p-value is greater than 𝛼 (𝑝 = .188 > .10), H0 is not rejected. There is insufficient evidence to indicate the true proportion of toothpaste brands with the ADA seal verifying effective decay prevention is less than .5 at 𝛼 = .10.

7.74

a.

𝑝̂ = 𝑥/𝑛 = .440

Let p = the true proportion of all US home buyers who look online for properties as the first step in the home search process. To determine if the true proportion exceeds one-third, we test: 𝐻0: 𝑝 = 1/3 𝐻a: 𝑝 > 1/3

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.440 − 1/3)/√((1/3)(2/3)/𝑛) = 17.34.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = 17.34 > 1.645), H0 is rejected. There is sufficient evidence to indicate the true proportion of all US home buyers who look online for properties as the first step in the home search process exceeds one-third at 𝛼 = .05.

b.

Since such a small percentage of home buyers responded, the results may not accurately reflect the habits of the entire population.

7.75

𝑝̂ = 𝑥/𝑛 = .214

Let p = the true proportion of all Washington drivers who tested positive for marijuana. To determine if the true proportion exceeds 9%, we test: 𝐻0: 𝑝 = .09 𝐻a: 𝑝 > .09

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.214 − .09)/√(.09(.91)/𝑛) = 12.34.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = 12.34 > 1.645), H0 is rejected. There is sufficient evidence to indicate the true proportion of all Washington drivers who tested positive for marijuana exceeds 9% at 𝛼 = .05.

7.76

𝑝̂ = 𝑥/𝑛 = .806

To determine if the percentage of European dairy farms that carry out calf dehorning differs from 80%, we test: 𝐻0: 𝑝 = .80 𝐻a: 𝑝 ≠ .80

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.806 − .80)/√((.80)(1 − .80)/639) = .38.

Since no 𝛼 was given, we will use 𝛼 = .05. The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.025 = 1.96. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = .38 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate the percentage of European dairy farms that carry out calf dehorning differs from 80% at 𝛼 = .05. This supports the figure reported by SANKO.

7.77

a.

Let 𝑝 = proportion of middle-aged women who exhibit skin improvement after using the cream. For this problem, 𝑝̂ = 𝑥/𝑛 = .727. First we check to see if the normal approximation is adequate: 𝑛𝑝0 = 33(.6) = 19.8, 𝑛𝑞0 = 33(.4) = 13.2. Since 𝑛𝑞0 = 13.2 is less than 15, the assumption of normality may not be valid. We will go ahead and perform the test. To determine if the cream will improve the skin of more than 60% of middle-aged women, we test: 𝐻0: 𝑝 = .60 𝐻a: 𝑝 > .60

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.727 − .60)/√(.60(.40)/33) = 1.49

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.49 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the cream will improve the skin of more than 60% of middle-aged women at 𝛼 = .05.

b.

The p-value is 𝑝 = 𝑃(𝑧 ≥ 1.49) = .5 − .4319 = .0681 (using Table II, Appendix D). Since the p-value is greater than 𝛼 (𝑝 = .0681 > .05), H0 is not rejected. There is insufficient evidence to indicate the cream will improve the skin of more than 60% of middle-aged women at 𝛼 = .05.

7.78

Let 𝑝 = proportion of all ride-share drivers that make less than minimum wage. For this problem, 𝑝̂ = 𝑥/𝑛 = .740

To determine if the true proportion of all ride-share drivers that make less than minimum wage is less than .70, we test: 𝐻0: 𝑝 = .7 𝐻a: 𝑝 < .7

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.740 − .7)/√(.7(.3)/𝑛) = 2.95.

The rejection region requires 𝛼 = .10 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.10 = 1.282. The rejection region is 𝑧 < −1.282. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 2.95 ≮ −1.282), H0 is not rejected. There is insufficient evidence to indicate the true proportion of all ride-share drivers that make less than minimum wage is less than .70 at 𝛼 = .10.

7.79

The target population would be the entire set of households with annual incomes of at least $50,000. The experimental unit would be a single household with an annual income of at least $50,000. The variable of interest would be the answer to the question, “Does your household own a 4K TV?” The parameter of interest would be p = proportion of all households with annual incomes of at least $50,000 that own a 4K TV. To determine if the true proportion of all households with annual incomes of at least $50,000 that own a 4K TV differs from 22%, we test: 𝐻0: 𝑝 = .22 𝐻a: 𝑝 ≠ .22

To test this population proportion, the test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛).

7.80

Let p = proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill. 𝑝̂ = 𝑥/𝑛 = .685

To determine if the proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill is greater than .167, we test: 𝐻0: 𝑝 = .167 𝐻a: 𝑝 > .167

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.685 − .167)/√(.167(.833)/𝑛) = 15.47

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic falls in the rejection region (𝑧 = 15.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true proportion of students choosing the three-grill display so that Grill #2 is a compromise between a more desirable and a less desirable grill is greater than .167 at 𝛼 = .05.

7.81

To minimize the probability of a Type I error, we will select $\alpha = .01$. First, check to see if the normal approximation is adequate:

$np_0 = 100(.5) = 50$
$nq_0 = 100(.5) = 50$

Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal distribution will be adequate.

$\hat{p} = \dfrac{x}{n} = \dfrac{56}{100} = .56$

To determine if more than half of all Diet Coke drinkers prefer Diet Pepsi, we test:

$H_0\colon p = .5$
$H_a\colon p > .5$

The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.56 - .5}{\sqrt{.5(.5)/100}} = 1.20$

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$. Since the observed value of the test statistic does not fall in the rejection region ($z = 1.20 \not> 2.33$), $H_0$ is not rejected. There is insufficient evidence to indicate that more than half of all Diet Coke drinkers prefer Diet Pepsi at $\alpha = .01$. Since $H_0$ was not rejected, there is no evidence that Diet Coke drinkers prefer Diet Pepsi.
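The large-sample proportion test above can be checked numerically. The following is a minimal Python sketch, not part of the original solution: the helper name `one_proportion_z` is hypothetical, and the critical value comes from the standard library's normal distribution rather than from Table II, so it differs from 2.33 only by rounding.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """Large-sample z statistic for H0: p = p0 (hypothetical helper)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


# Values quoted in the solution: p_hat = .56, p0 = .5, n = 100, alpha = .01.
z = one_proportion_z(0.56, 0.5, 100)
z_crit = NormalDist().inv_cdf(1 - 0.01)   # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)         # upper-tail p-value
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The statistic falls below the critical value, which matches the decision not to reject $H_0$.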



Inferences Based on a Single Sample: Tests of Hypothesis

7.82

Using Table IV, Appendix D:

a.

For $n = 12$, df $= n - 1 = 12 - 1 = 11$, $P(\chi^2 > \chi_0^2) = .10 \Rightarrow \chi_0^2 = 17.2750$

b.

For $n = 9$, df $= n - 1 = 9 - 1 = 8$, $P(\chi^2 > \chi_0^2) = .05 \Rightarrow \chi_0^2 = 15.5073$

c.

For $n = 5$, df $= n - 1 = 5 - 1 = 4$, $P(\chi^2 > \chi_0^2) = .025 \Rightarrow \chi_0^2 = 11.1433$

7.83

a.

df $= n - 1 = 16 - 1 = 15$; reject $H_0$ if $\chi^2 < 6.26214$ or $\chi^2 > 27.4884$

b.

df $= n - 1 = 23 - 1 = 22$; reject $H_0$ if $\chi^2 > 40.2894$

c.

df $= n - 1 = 15 - 1 = 14$; reject $H_0$ if $\chi^2 > 21.0642$

d.

df $= n - 1 = 13 - 1 = 12$; reject $H_0$ if $\chi^2 < 3.57056$

e.

df $= n - 1 = 7 - 1 = 6$; reject $H_0$ if $\chi^2 < 1.63539$ or $\chi^2 > 12.5916$

f.

df $= n - 1 = 25 - 1 = 24$; reject $H_0$ if $\chi^2 < 13.8484$

7.84

a.

It would be necessary to assume that the population has a normal distribution.

b.

$H_0\colon \sigma^2 = 1$
$H_a\colon \sigma^2 > 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(7-1)(4.84)}{1} = 29.04$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df $= n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.05} = 12.5916$. The rejection region is $\chi^2 > 12.5916$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 29.04 > 12.5916$), $H_0$ is rejected. There is sufficient evidence to indicate that the variance is greater than 1 at $\alpha = .05$.

c.

$H_0\colon \sigma^2 = 1$
$H_a\colon \sigma^2 \ne 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(7-1)(4.84)}{1} = 29.04$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with df $= n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.975} = 1.237347$ and $\chi^2_{.025} = 14.4494$. The rejection region is $\chi^2 < 1.237347$ or $\chi^2 > 14.4494$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 29.04 > 14.4494$), $H_0$ is rejected. There is sufficient evidence to indicate that the variance is not equal to 1 at $\alpha = .05$.

7.85

a.

$H_0\colon \sigma^2 = 1$
$H_a\colon \sigma^2 > 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(100-1)4.84}{1} = 479.16$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df $= n - 1 = 100 - 1 = 99$. From Table IV, Appendix D, $\chi^2_{.05} \approx 124.342$. The rejection region is $\chi^2 > 124.342$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 479.16 > 124.342$), $H_0$ is rejected. There is sufficient evidence to indicate the variance is larger than 1 at $\alpha = .05$.

b.

In part b of Exercise 7.84, the test statistic was $\chi^2 = 29.04$. The conclusion was to reject $H_0$, as it was in this problem.

7.86

Some preliminary calculations are:

$s^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1} = 7.9048$

To determine if $\sigma^2 < 1$, we test:

$H_0\colon \sigma^2 = 1$
$H_a\colon \sigma^2 < 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(7-1)7.9048}{1} = 47.43$

The rejection region requires $\alpha = .05$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.95} = 1.63539$. The rejection region is $\chi^2 < 1.63539$. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 47.43 \not< 1.63539$), $H_0$ is not rejected. There is insufficient evidence to indicate the variance is less than 1 at $\alpha = .05$.

7.87

a.

To determine whether the population of institutional investors performs consistently, we test:

$H_0\colon \sigma^2 = 10^2 = 100$
$H_a\colon \sigma^2 < 100$

b.

The rejection region requires $\alpha = .05$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 200 - 1 = 199$. Using MINITAB, we get:

Inverse Cumulative Distribution Function

Chi-Square with 199 DF

P( X <= x )        x
       0.05  167.361

The rejection region is $\chi^2 < 167.361$.

c.

For this problem, 𝛼 = .05. The probability of concluding the standard deviation is less than 10 when, in fact, it is equal to 10 is .05. If this test was repeated a large number of times, approximately 5% of the time we would conclude the standard deviation was less than 10 when it really was 10.

d.

From the printout, 𝜒 = 154.81 and the p-value is 𝑝 = .009.

e.

Since the p-value is less than 𝛼 (𝑝 = .009 < .05), H0 is rejected. There is sufficient evidence to indicate the standard deviation is less than 10% at 𝛼 = .05.

f.

We must assume that a random sample was selected from the target population and the population sampled from is approximately normal.

7.88

a.

To determine if the variance of the population of trap spacing measurements is larger than 10, we test:

$H_0\colon \sigma^2 = 10$
$H_a\colon \sigma^2 > 10$

b.

Using MINITAB, the results are:

Descriptive Statistics: Spacing

Variable  N   Mean  StDev  Variance  Minimum     Q1  Median     Q3  Maximum
Spacing   7  89.86  11.63    135.14    70.00  82.00   93.00  99.00   105.00

The sample variance is $s^2 = 135.14$.

c.

The value of $s^2$ is a variable. The next time a random sample is selected, the value of $s^2$ could be much greater or much smaller. We need to find out how unusual it is to obtain a value of $s^2$ of 135.14 if $\sigma^2 = 10$.

d.

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(7-1)135.14}{10} = 81.084$

e.

Using MINITAB, the p-value is $p = P(\chi^2 \ge 81.084) \approx 0$.

f.

Since the p-value is so small, $H_0$ is rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate the true population variance is greater than 10.

g.

We must assume that a random sample was selected from the target population and the population sampled from is approximately normal.

7.89

a.

Let $\sigma^2$ = weight variance of tees. To determine if the weight variance differs from .000004 (injection mold process is out-of-control), we test:

$H_0\colon \sigma^2 = .000004$
$H_a\colon \sigma^2 \ne .000004$

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tees

Variable   N     Mean   Median    StDev  Minimum  Maximum       Q1       Q3
Tees      40  0.25248  0.25300  0.00223  0.24700  0.25600  0.25100  0.25400

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(40-1)(.00223)^2}{.000004} = 48.49$

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the $\chi^2$ distribution with df $= n - 1 = 40 - 1 = 39$. From Table IV, Appendix D, $\chi^2_{.005} \approx 66.7659$ and $\chi^2_{.995} \approx 20.7065$. The rejection region is $\chi^2 > 66.7659$ or $\chi^2 < 20.7065$. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 48.49 \not> 66.7659$ and $\chi^2 = 48.49 \not< 20.7065$), $H_0$ is not rejected. There is insufficient evidence to indicate the injection mold process is out-of-control at $\alpha = .01$.

c.

We must assume that the distribution of the weights of tees is approximately normal. Using MINITAB, a histogram of the data is:

[Figure: MINITAB histogram of Tees with fitted normal curve (Mean = 0.2525, StDev = 0.002230, N = 40); x-axis: Tees, y-axis: Frequency]

The data look fairly mound-shaped, so the assumption of normality seems to be reasonably satisfied.

7.90

To determine if there is high volatility in the DROS data for companies claiming DPAD, we test:

$H_0\colon \sigma^2 = .15^2 = .0225$
$H_a\colon \sigma^2 > .0225$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(4{,}605-1)s^2}{.0225} = 7{,}309.3$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with df $= n - 1 = 4{,}605 - 1 = 4{,}604$. From MINITAB, $\chi^2_{.01} = 4{,}830.17$. The rejection region is $\chi^2 > 4{,}830.17$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 7{,}309.3 > 4{,}830.17$), $H_0$ is rejected. There is sufficient evidence to indicate that there is high volatility in the DROS data for companies claiming DPAD at $\alpha = .01$.

7.91

Using MINITAB, the preliminary calculations are:

Descriptive Statistics: Force

Variable   N    Mean  StDev  Variance  Minimum      Q1  Median      Q3  Maximum
Force     12  163.22   4.99     24.87   158.20  159.95  161.70  165.80   175.60

To determine if the variance of the maximum strand forces that occur after anchorage failure is less than 25 kN², we test:

$H_0\colon \sigma^2 = 25$
$H_a\colon \sigma^2 < 25$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(12-1)24.87}{25} = 10.94$

The rejection region requires $\alpha = .10$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 12 - 1 = 11$. From Table IV, Appendix D, $\chi^2_{.90} = 5.57779$. The rejection region is $\chi^2 < 5.57779$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 10.94 \not< 5.57779$), $H_0$ is not rejected. There is insufficient evidence to indicate the variance of the maximum strand forces that occur after anchorage failure is less than 25 kN² at $\alpha = .10$.
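The chi-square statistic used throughout Exercises 7.84 to 7.91 is a one-line computation. The sketch below is illustrative only: the function name `chi_square_variance_stat` is an assumption, and the critical value is copied from the Table IV figure quoted above rather than computed, since the Python standard library has no chi-square quantile function.

```python
def chi_square_variance_stat(n, s2, sigma0_sq):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 for a test of a variance."""
    return (n - 1) * s2 / sigma0_sq


# Exercise 7.91: n = 12, s^2 = 24.87, hypothesized variance 25, lower-tailed test.
chi2_force = chi_square_variance_stat(12, 24.87, 25)
lower_critical = 5.57779          # chi-square(.90), df = 11, quoted from Table IV
reject_force = chi2_force < lower_critical

# Exercise 7.84: n = 7, s^2 = 4.84, hypothesized variance 1.
chi2_small_sample = chi_square_variance_stat(7, 4.84, 1)

print(round(chi2_force, 2), reject_force, round(chi2_small_sample, 2))
```

Because the statistic scales with $n - 1$, the same $s^2 = 4.84$ gives 29.04 for $n = 7$ but 479.16 for $n = 100$, as in Exercise 7.85.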

7.92

a.

To determine whether the true standard deviation of the population of internal oil contents for sweet potato slices fried at 130° differs from .1, we test:

$H_0\colon \sigma^2 = (.1)^2$
$H_a\colon \sigma^2 \ne (.1)^2$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(6-1)(.011)^2}{(.1)^2} = .0605$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with df $= n - 1 = 6 - 1 = 5$. From Table IV, Appendix D, $\chi^2_{.025} = 12.8325$ and $\chi^2_{.975} = .831211$. The rejection region is $\chi^2 < .831211$ or $\chi^2 > 12.8325$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = .0605 < .831211$), $H_0$ is rejected. There is sufficient evidence to indicate the true standard deviation of the population of internal oil contents for sweet potato slices fried at 130° differs from .1 at $\alpha = .05$.

b.

From Exercise 6.97, the confidence interval for the variance in millions of grams was $47.15 \le \sigma^2 \le 727.85$. Converting this to gigagrams, the interval would be $.00004715 \le \sigma^2 \le .00072785$. Finally, converting this to a confidence interval for the standard deviation, we get $\sqrt{.00004715} \le \sigma \le \sqrt{.00072785}$ or $.0069 \le \sigma \le .0270$. Since the hypothesized value of the standard deviation of .1 is not in this interval, it would correspond to rejecting $H_0$. This agrees with the result in part a.

7.93

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Drug

Variable   N    Mean  StDev  Variance  Minimum  Median  Maximum
Drug      50  89.291  3.183    10.134   81.790  89.375   94.830

To determine whether the new method of determining drug concentration is less variable than the standard method, we test:

$H_0\colon \sigma^2 = 9$
$H_a\colon \sigma^2 < 9$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(50-1)10.134}{9} = 55.174$

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 50 - 1 = 49$. From Table IV, Appendix D, $\chi^2_{.99} \approx 29.7067$. The rejection region is $\chi^2 < 29.7067$. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 55.174 \not< 29.7067$), $H_0$ is not rejected. There is insufficient evidence to indicate the new method of determining drug concentration is less variable than the standard method at $\alpha = .01$.

7.94

To determine whether the true conduction time standard deviation is less than 7 seconds (variance less than 49), we test:

$H_0\colon \sigma^2 = 7^2 = 49$
$H_a\colon \sigma^2 < 49$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(18-1)(6.3)^2}{7^2} = 13.77$

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 18 - 1 = 17$. From Table IV, Appendix D, $\chi^2_{.99} = 6.40776$. The rejection region is $\chi^2 < 6.40776$. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 13.77 \not< 6.40776$), $H_0$ is not rejected. There is insufficient evidence to indicate the true conduction time standard deviation is less than 7 seconds at $\alpha = .01$. Thus, the prototype system does not satisfy this requirement.

7.95

a.

Large standard deviations cause variation in the sample data. This makes it extremely difficult to reject $H_0$ when the value of the sample mean is relatively close to the value being tested. In this case, the value of the sample mean is 4.8 and the value it is being compared to is 5.

b.

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{4.8 - 5}{\sigma/\sqrt{43}}$. This value has to be less than −2.33 for $H_0$ to be rejected. Solving for $\sigma$ yields

$4.8 - 5 = -2.33\,\dfrac{\sigma}{\sqrt{43}} \Rightarrow \sigma = \dfrac{(4.8 - 5)\sqrt{43}}{-2.33} = .56287$

The population standard deviation needs to be less than .56287 in order for the null hypothesis to be rejected.

c.

To determine whether the true standard deviation is less than .56287 minutes, we test:

$H_0\colon \sigma^2 = .56287^2 = .31682$
$H_a\colon \sigma^2 < .31682$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(43-1)s^2}{.31682} = 1315.4$

The rejection region requires $\alpha = .01$ in the lower tail of the $\chi^2$ distribution with df $= n - 1 = 43 - 1 = 42$. From Table IV, Appendix D, $\chi^2_{.99} \approx 22.1643$. The rejection region is $\chi^2 < 22.1643$. Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 1315.4 \not< 22.1643$), $H_0$ is not rejected. There is insufficient evidence to indicate the true standard deviation is less than .56287 minutes at $\alpha = .01$.

d.

When testing a population mean with a sample size of $n = 43$, the population that we are sampling from does not need to be normally distributed. When testing a population standard deviation (or variance), however, the population does need to be normally distributed. The inferences about the standard deviation would be invalid if the population CT image acquisition times were not normally distributed.

7.96

a.

The power of a test increases when:

1. The distance between the null and alternative values of $\mu$ increases.
2. The value of $\alpha$ increases.
3. The sample size increases.

b.

The power of a test is equal to $1 - \beta$. As $\beta$ increases, the power decreases.

7.97

a.

By the Central Limit Theorem, the sampling distribution of $\bar{x}$ is approximately normal with $\mu_{\bar{x}} = \mu = 500$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{100}{\sqrt{25}} = 20$.

b.

$\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n}}$, where $z_\alpha = z_{.05} = 1.645$ from Table II, Appendix D. Thus, $\bar{x}_0 = 500 + 1.645(20) = 532.9$.

c.

The sampling distribution of $\bar{x}$ is approximately normal by the Central Limit Theorem with $\mu_{\bar{x}} = \mu = 550$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{100}{\sqrt{25}} = 20$.

d.

$\beta = P(\bar{x} < 532.9 \text{ when } \mu = 550) = P\left(z < \dfrac{532.9 - 550}{100/\sqrt{25}}\right) = P(z < -.86) = .5 - .3051 = .1949$

e.

$\text{Power} = 1 - \beta = 1 - .1949 = .8051$

7.98

From Exercise 7.97 we want to test $H_0\colon \mu = 500$ against $H_a\colon \mu > 500$ using $\alpha = .05$, $\sigma = 100$, $n = 25$, and $\bar{x}_0 = 532.9$.

a.

$\beta = P(\bar{x} < 532.9 \text{ when } \mu = 575) = P\left(z < \dfrac{532.9 - 575}{100/\sqrt{25}}\right) = P(z < -2.11) = .5 - .4826 = .0174$

(Using Table II, Appendix D)

b.

$\text{Power} = 1 - \beta = 1 - .0174 = .9826$

c.

In Exercise 7.97, $\beta = .1949$ and power $= .8051$. The value of $\beta$ has decreased in this exercise since $\mu = 575$ is further from the hypothesized value than $\mu = 550$. As a result, the power of the test in this exercise has increased (when $\beta$ decreases, the power of the test increases).

7.99

a.

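The sampling distribution of $\bar{x}$ will be approximately normal (by the Central Limit Theorem) with $\mu_{\bar{x}} = \mu = 75$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{49}} = 2.143$.

b.

The sampling distribution of $\bar{x}$ will be approximately normal (by the Central Limit Theorem) with $\mu_{\bar{x}} = \mu = 70$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{49}} = 2.143$.

c.

First, find $\bar{x}_0 = \mu_0 - z_\alpha \sigma_{\bar{x}} = \mu_0 - z_\alpha \dfrac{\sigma}{\sqrt{n}}$, where $z_{.10} = 1.28$ from Table II, Appendix D.

Thus, $\bar{x}_0 = 75 - 1.28\,\dfrac{15}{\sqrt{49}} = 72.257$

Now, find $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 70) = P\left(z > \dfrac{72.257 - 70}{15/\sqrt{49}}\right) = P(z > 1.05) = .5 - .3531 = .1469$

d.

$\text{Power} = 1 - \beta = 1 - .1469 = .8531$

7.100

a.

From Exercise 7.99, we want to test $H_0\colon \mu = 75$ against $H_a\colon \mu < 75$ using $\alpha = .10$, $\sigma = 15$, $n = 49$, and $\bar{x}_0 = 72.257$. Using Table II, Appendix D: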

If $\mu = 74$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 74) = P\left(z > \dfrac{72.257 - 74}{15/\sqrt{49}}\right) = P(z > -.81) = .5 + .2910 = .7910$

If $\mu = 72$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 72) = P\left(z > \dfrac{72.257 - 72}{15/\sqrt{49}}\right) = P(z > .12) = .5 - .0478 = .4522$

If $\mu = 70$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 70) = .1469$ (Refer to Exercise 7.99, part c.)

If $\mu = 68$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 68) = P\left(z > \dfrac{72.257 - 68}{15/\sqrt{49}}\right) = P(z > 1.99) = .5 - .4767 = .0233$

If $\mu = 66$, $\beta = P(\bar{x} > 72.257 \text{ when } \mu = 66) = P\left(z > \dfrac{72.257 - 66}{15/\sqrt{49}}\right) = P(z > 2.92) = .5 - .4982 = .0018$

In summary,

μ     74     72     70     68     66
β  .7910  .4522  .1469  .0233  .0018

b.

Using MINITAB, the graph is:

[Figure: MINITAB scatterplot of beta vs mu; x-axis: mu (65 to 74), y-axis: beta (0.0 to 0.8)]

c.

Looking at the graph, 𝛽 is approximately .62 when 𝜇 = 73.
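The values of $\beta$ in parts a and c can be reproduced without the table. This is a minimal sketch under the assumptions of Exercise 7.99 ($\mu_0 = 75$, $\sigma = 15$, $n = 49$, $z_{.10} = 1.28$); the function name `beta_lower_tail` is hypothetical, and because the code does not round $z$ to two decimals, its results differ slightly from the table-based values.

```python
from math import sqrt
from statistics import NormalDist


def beta_lower_tail(mu_a, mu0=75.0, sigma=15.0, n=49, z_alpha=1.28):
    """P(Type II error) for a lower-tailed z test when the true mean is mu_a."""
    se = sigma / sqrt(n)
    x_crit = mu0 - z_alpha * se      # H0 is rejected when x-bar < x_crit
    # beta = P(x-bar > x_crit | mu = mu_a)
    return 1 - NormalDist().cdf((x_crit - mu_a) / se)


for mu in (74, 73, 72, 70, 68, 66):
    print(mu, round(beta_lower_tail(mu), 4))
```

For $\mu = 73$ the computed value is close to the value read from the graph.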

d.

$\text{Power} = 1 - \beta$

Therefore,

μ         74     72     70     68     66
β      .7910  .4522  .1469  .0233  .0018
Power  .2090  .5478  .8531  .9767  .9982

[Figure: MINITAB scatterplot of Power vs mu; x-axis: mu (65 to 74), y-axis: Power (0.2 to 1.1)]

The power curve starts out close to 1 when $\mu = 66$ and decreases as $\mu$ increases, while the $\beta$ curve is close to 0 when $\mu = 66$ and increases as $\mu$ increases.

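e.

As the distance between the true mean $\mu$ and the null hypothesized mean $\mu_0$ increases, $\beta$ decreases and the power increases. We can also see that as $\beta$ increases, the power decreases.

7.101

a.

First, the sample is sufficiently large if both $np_0 \ge 15$ and $nq_0 \ge 15$. $np_0 = 100(.7) = 70$ and $nq_0 = 100(1 - .7) = 30$. Since both $np_0 \ge 15$ and $nq_0 \ge 15$, the normal distribution will be adequate. Thus, the sampling distribution of $\hat{p}$ will be approximately normal with $E(\hat{p}) = p = .7$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{.7(.3)}{100}} = .0458$.

b.

The sampling distribution of $\hat{p}$ will be approximately normal with $E(\hat{p}) = p = .65$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{.65(.35)}{100}} = .0477$.

c.

First, find $\hat{p}_{0,L} = p_0 - z_{\alpha/2}\,\sigma_{\hat{p}} = p_0 - z_{\alpha/2}\sqrt{\dfrac{p_0 q_0}{n}}$, where $z_{.05/2} = z_{.025} = 1.96$ from Table II, Appendix D.

Thus, $\hat{p}_{0,L} = .7 - 1.96\sqrt{\dfrac{.7(.3)}{100}} = .610$

$\hat{p}_{0,U} = p_0 + z_{\alpha/2}\,\sigma_{\hat{p}} = p_0 + z_{\alpha/2}\sqrt{\dfrac{p_0 q_0}{n}} = .7 + 1.96\sqrt{\dfrac{.7(.3)}{100}} = .790$

Now, find $\beta = P(.610 < \hat{p} < .79 \text{ when } p = .65) = P\left(\dfrac{.610 - .65}{\sqrt{.65(.35)/100}} < z < \dfrac{.79 - .65}{\sqrt{.65(.35)/100}}\right) = P(-.84 < z < 2.94) = .2995 + .4984 = .7979$

d.

$\beta = P(.610 < \hat{p} < .79 \text{ when } p = .71) = P\left(\dfrac{.610 - .71}{\sqrt{.71(.29)/100}} < z < \dfrac{.79 - .71}{\sqrt{.71(.29)/100}}\right) = P(-2.20 < z < 1.76) = .4861 + .4608 = .9469$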
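For a two-tailed test of a proportion, $\beta$ is the probability that $\hat{p}$ lands between the two critical values, computed with the standard error under the alternative value of $p$. The sketch below assumes the setup of this exercise ($p_0 = .7$, $n = 100$, $z_{.025} = 1.96$); `beta_two_sided_proportion` is a hypothetical helper, and unrounded intermediate values make the results differ from the table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_sided_proportion(p_a, p0=0.7, n=100, z_half_alpha=1.96):
    """P(Type II error) for a two-tailed z test of H0: p = p0 when p = p_a."""
    se_null = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    lower = p0 - z_half_alpha * se_null        # lower critical value of p-hat
    upper = p0 + z_half_alpha * se_null        # upper critical value of p-hat
    se_alt = sqrt(p_a * (1 - p_a) / n)         # standard error under the alternative
    std_normal = NormalDist()
    return (std_normal.cdf((upper - p_a) / se_alt)
            - std_normal.cdf((lower - p_a) / se_alt))


for p_a in (0.65, 0.71):
    print(p_a, round(beta_two_sided_proportion(p_a), 4))
```

The alternative closer to $p_0$ gives the larger $\beta$, so the test has little power to detect a shift from .70 to .71.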

7.102

a.

To determine if the mean size of California homes exceeds the national average, we test:

$H_0\colon \mu = 2{,}600$
$H_a\colon \mu > 2{,}600$

The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - 2{,}600}{\sigma/\sqrt{n}} = 4.55$

The rejection region requires $\alpha = .01$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$. Since the observed value of the test statistic falls in the rejection region ($z = 4.55 > 2.33$), $H_0$ is rejected. There is sufficient evidence to indicate the mean size of California homes exceeds the national average at $\alpha = .01$.

b.

To compute the power, we must first set up the rejection region in terms of $\bar{x}$.

$\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} \approx \mu_0 + 2.33\,\dfrac{s}{\sqrt{n}} = 2{,}600 + 2.33\,\dfrac{s}{\sqrt{n}} = 2{,}659.88$

We would reject $H_0$ if $\bar{x} > 2{,}659.88$. The power of the test when $\mu = 2{,}700$ would be:

$\text{Power} = P(\bar{x} > 2{,}659.88 \mid \mu = 2{,}700) = P\left(z > \dfrac{\bar{x}_0 - \mu_a}{\sigma_{\bar{x}}}\right) = P\left(z > \dfrac{2{,}659.88 - 2{,}700}{s/\sqrt{n}}\right) = P(z > -1.56) = .5 + .4406 = .9406$

c.

The power of the test when $\mu = 2{,}650$ would be:

$\text{Power} = P(\bar{x} > 2{,}659.88 \mid \mu = 2{,}650) = P\left(z > \dfrac{\bar{x}_0 - \mu_a}{\sigma_{\bar{x}}}\right) = P\left(z > \dfrac{2{,}659.88 - 2{,}650}{s/\sqrt{n}}\right) = P(z > 0.38) = .5 - .1480 = .3520$

7.103

a.

We have failed to reject $H_0$ when it is not true. This is a Type II error. To compute $\beta$, first find:

$\bar{x}_0 = \mu_0 - z_\alpha \sigma_{\bar{x}} = \mu_0 - z_\alpha \dfrac{\sigma}{\sqrt{n}}$, where $z_{.05} = 1.645$ from Table II, Appendix D.

Thus, $\bar{x}_0 = 5.0 - 1.645\,\dfrac{\sigma}{\sqrt{n}} = 4.998355$

Then find:

$\beta = P(\bar{x} > 4.998355 \text{ when } \mu = 4.9975) = P\left(z > \dfrac{4.998355 - 4.9975}{\sigma/\sqrt{n}}\right) = P(z > .86) = .5 - .3051 = .1949$

b.

We have rejected $H_0$ when it is true. This is a Type I error. The probability of a Type I error is $\alpha = .05$.

c.

A departure of .0025 below 5.0 is $\mu = 4.9975$. Using part a, $\beta = .1949$ when $\mu = 4.9975$. The power of the test is $1 - \beta = 1 - .1949 = .8051$.

7.104

To compute the power, we must first set up the rejection region in terms of $\hat{p}$. From Exercise 7.69, the rejection region is $z > 2.33$. In terms of $\hat{p}$, this is:

$2.33 = \dfrac{\hat{p} - .8}{\sqrt{.8(1 - .8)/n}} \Rightarrow 2.33\sqrt{\dfrac{.8(1 - .8)}{n}} = \hat{p} - .8 \Rightarrow \hat{p} = .8 + .0589 = .8589$

The rejection region is $\hat{p} > .8589$. The power of the test when $p = .79$ would be:

$\text{Power} = P(\hat{p} > .8589 \mid p = .79) = P\left(z > \dfrac{.8589 - .79}{\sqrt{.79(1 - .79)/n}}\right) = P(z > 2.67) = .5 - .4962 = .0038$

(Using Table II, Appendix D)

7.105

To compute the power, we must first set up the rejection region in terms of $\hat{p}$. The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.575$. The rejection region is $z < -2.575$ or $z > 2.575$.

Thus, $\hat{p}_{0,L} = p_0 - z_{\alpha/2}\,\sigma_{\hat{p}} = p_0 - z_{\alpha/2}\sqrt{\dfrac{p_0 q_0}{n}} = .5 - 2.575\sqrt{\dfrac{.5(.5)}{121}} = .5 - .117 = .383$ and

$\hat{p}_{0,U} = p_0 + z_{\alpha/2}\,\sigma_{\hat{p}} = p_0 + z_{\alpha/2}\sqrt{\dfrac{p_0 q_0}{n}} = .5 + 2.575\sqrt{\dfrac{.5(.5)}{121}} = .5 + .117 = .617$.

$\text{Power} = P(\hat{p} < .383 \text{ or } \hat{p} > .617 \mid p = .65) = P\left(z < \dfrac{.383 - .65}{\sqrt{.65(.35)/121}}\right) + P\left(z > \dfrac{.617 - .65}{\sqrt{.65(.35)/121}}\right) = P(z < -6.16) + P(z > -.76) = (.5 - .5) + (.5 + .2764) = .7764$

7.106

a.

To determine if the mean mpg for 2019 Honda Civic autos is greater than 42 mpg, we test:

$H_0\colon \mu = 42$
$H_a\colon \mu > 42$

b.

The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \dfrac{\bar{x} - 42}{6.4/\sqrt{50}} = 2.54$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$. Since the observed value of the test statistic falls in the rejection region ($z = 2.54 > 1.645$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean mpg for 2019 Honda Civic autos is greater than 42 mpg at $\alpha = .05$. We must assume that the sample was a random sample.

c.

First find $\bar{x}_0 = \mu_0 + z_\alpha \sigma_{\bar{x}} = \mu_0 + z_\alpha \dfrac{\sigma}{\sqrt{n}}$, where $z_{.05} = 1.645$ from Table II, Appendix D.

Thus, $\bar{x}_0 = 42 + 1.645\,\dfrac{6.4}{\sqrt{50}} = 43.49$

For $\mu = 42.5$: $\text{Power} = P(\bar{x} > 43.49 \mid \mu = 42.5) = P\left(z > \dfrac{43.49 - 42.5}{6.4/\sqrt{50}}\right) = P(z > 1.09) = .5 - .3621 = .1379$

For $\mu = 43$: $\text{Power} = P(\bar{x} > 43.49 \mid \mu = 43) = P\left(z > \dfrac{43.49 - 43}{6.4/\sqrt{50}}\right) = P(z > .54) = .5 - .2054 = .2946$

For $\mu = 43.5$: $\text{Power} = P(\bar{x} > 43.49 \mid \mu = 43.5) = P\left(z > \dfrac{43.49 - 43.5}{6.4/\sqrt{50}}\right) = P(z > -.01) = .5 + .0040 = .5040$

For $\mu = 44$: $\text{Power} = P(\bar{x} > 43.49 \mid \mu = 44) = P\left(z > \dfrac{43.49 - 44}{6.4/\sqrt{50}}\right) = P(z > -.56) = .5 + .2123 = .7123$

For $\mu = 44.5$: $\text{Power} = P(\bar{x} > 43.49 \mid \mu = 44.5) = P\left(z > \dfrac{43.49 - 44.5}{6.4/\sqrt{50}}\right) = P(z > -1.12) = .5 + .3686 = .8686$

d.

A scatter plot shows the power curve here:

[Figure: scatter plot of Power vs Mu; x-axis: Mu (42.0 to 45.0), y-axis: Power (0.0 to 0.9)]
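The points on the power curve can also be generated in code. The sketch below is illustrative and assumes the values used in part c ($\mu_0 = 42$, standard deviation 6.4, $n = 50$, $z_{.05} = 1.645$); `power_upper_tail` is a hypothetical name. The code keeps the critical value of $\bar{x}$ unrounded, so its results differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail(mu_a, mu0=42.0, sigma=6.4, n=50, z_alpha=1.645):
    """Power of an upper-tailed z test of H0: mu = mu0 when the true mean is mu_a."""
    se = sigma / sqrt(n)
    x_crit = mu0 + z_alpha * se      # H0 is rejected when x-bar > x_crit
    # power = P(x-bar > x_crit | mu = mu_a)
    return 1 - NormalDist().cdf((x_crit - mu_a) / se)


for mu in (42.5, 43.0, 43.5, 43.75, 44.0, 44.5, 47.0):
    print(mu, round(power_upper_tail(mu), 4))
```

Evaluating the function at $\mu = 43.75$ and $\mu = 47$ gives values that can be compared with readings taken from the plot.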

e.

From the plot, the power is approximately .60. For $\mu = 43.75$:

$\text{Power} = P(\bar{x} > 43.49 \mid \mu = 43.75) = P\left(z > \dfrac{43.49 - 43.75}{6.4/\sqrt{50}}\right) = P(z > -.29) = .5 + .1141 = .6141$

f.

From the plot, the power is approximately 1. For $\mu = 47$:

$\text{Power} = P(\bar{x} > 43.49 \mid \mu = 47) = P\left(z > \dfrac{43.49 - 47}{6.4/\sqrt{50}}\right) = P(z > -3.88) = .5 + .5 = 1.0$

If the true value of $\mu$ is 47, the approximate probability that the test will fail to reject $H_0$ is $1 - 1 = 0$.

7.107

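First, find $\bar{x}_0$ such that $P(\bar{x} < \bar{x}_0) = .05$.

$P(\bar{x} < \bar{x}_0) = P\left(z < \dfrac{\bar{x}_0 - 10}{\sigma/\sqrt{n}}\right) = P(z < z_0) = .05$

From Table II, Appendix D, $z_0 = -1.645$. With $\sigma/\sqrt{n} = .173$,

$z_0 = \dfrac{\bar{x}_0 - 10}{\sigma/\sqrt{n}} \Rightarrow \bar{x}_0 = -1.645(.173) + 10 = 9.715$

The probability of a Type II error is:

$\beta = P(\bar{x} \ge 9.715 \mid \mu = 9.5) = P\left(z \ge \dfrac{9.715 - 9.5}{\sigma/\sqrt{n}}\right) = P(z \ge 1.24) = .5 - .3925 = .1075$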

7.108

For a large sample test of hypothesis about a population mean, no assumptions are necessary because the Central Limit Theorem assures that the test statistic will be approximately normally distributed. For a small sample test of hypothesis about a population mean, we must assume that the population being sampled from is normal. The test statistic for the large sample test is the z statistic, and the test statistic for the small sample test is the t statistic.

7.109

The smaller the p-value associated with a test of hypothesis, the stronger the support for the alternative hypothesis. The p-value is the probability of observing your test statistic or anything more unusual, given the null hypothesis is true. If this value is small, it would be very unusual to observe this test statistic if the null hypothesis were true. Thus, it would indicate the alternative hypothesis is true.

7.110

The elements of the test of hypothesis that should be specified prior to analyzing the data are: null hypothesis, alternative hypothesis, and rejection region based on $\alpha$.

7.111

There is not a direct relationship between 𝛼 and 𝛽. That is, if 𝛼 is known, it does not mean 𝛽 is known because 𝛽 depends on the value of the parameter in the alternative hypothesis and the sample size. However, as 𝛼 decreases, 𝛽 increases for a fixed value of the parameter and a fixed sample size. Thus, if 𝛼 is very small, 𝛽 will tend to be large.

7.112

𝛼 = P(Type I error) = P(rejecting H0 when it is true). Thus, if rejection of H0 would cause your firm to go out of business, you would want this probability or 𝛼 to be small.

7.113

a.

$H_0\colon \mu = 80$
$H_a\colon \mu < 80$

The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{72.6 - 80}{\sqrt{19.4}/\sqrt{20}} = -7.51$

The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with df $= n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.05} = 1.729$. The rejection region is $t < -1.729$. Since the observed value of the test statistic falls in the rejection region ($t = -7.51 < -1.729$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is less than 80 at $\alpha = .05$.
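The $t$ statistic in part a can be reproduced with a short script. This sketch assumes that 19.4 is the sample variance $s^2$ (so $s = \sqrt{19.4}$), which is the reading that reproduces the printed value of −7.51; the function name `one_sample_t` is hypothetical, and the critical value is the Table III figure quoted above.

```python
from math import sqrt


def one_sample_t(x_bar, mu0, s2, n):
    """t statistic for H0: mu = mu0, given the sample variance s2."""
    return (x_bar - mu0) / sqrt(s2 / n)


# Assumed inputs: x-bar = 72.6, mu0 = 80, s^2 = 19.4, n = 20.
t_stat = one_sample_t(72.6, 80, 19.4, 20)
t_critical = 1.729                    # t(.05), df = 19, quoted from Table III
reject_lower = t_stat < -t_critical   # lower-tailed rejection rule

print(round(t_stat, 2), reject_lower)
```

Treating 19.4 as the standard deviation instead would give a statistic near −1.71, which does not match the printed result.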

b.

$H_0\colon \mu = 80$
$H_a\colon \mu \ne 80$

The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{72.6 - 80}{\sqrt{19.4}/\sqrt{20}} = -7.51$

The rejection region requires $\alpha/2 = .01/2 = .005$ in each tail of the t-distribution with df $= n - 1 = 20 - 1 = 19$. From Table III, Appendix D, $t_{.005} = 2.861$. The rejection region is $t < -2.861$ or $t > 2.861$. Since the observed value of the test statistic falls in the rejection region ($t = -7.51 < -2.861$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is different from 80 at $\alpha = .01$.

7.114

a.

$H_0\colon \mu = 8.3$
$H_a\colon \mu \ne 8.3$

The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \dfrac{8.2 - 8.3}{.79/\sqrt{175}} = -1.67$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$. Since the observed value of the test statistic does not fall in the rejection region ($z = -1.67 \not< -1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate that the mean is different from 8.3 at $\alpha = .05$.

b.

$H_0\colon \mu = 8.4$
$H_a\colon \mu \ne 8.4$

The test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \approx \dfrac{8.2 - 8.4}{.79/\sqrt{175}} = -3.35$

The rejection region is the same as in part a, $z < -1.96$ or $z > 1.96$. Since the observed value of the test statistic falls in the rejection region ($z = -3.35 < -1.96$), $H_0$ is rejected. There is sufficient evidence to indicate that the mean is different from 8.4 at $\alpha = .05$.

c.

$H_0\colon \sigma = 1$, $H_a\colon \sigma \ne 1$, or equivalently

$H_0\colon \sigma^2 = 1$
$H_a\colon \sigma^2 \ne 1$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(175-1)(.79)^2}{1} = 108.59$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with df $= n - 1 = 175 - 1 = 174$. Since there are no values in the table with df > 100, we will use MINITAB to find the critical values.

Inverse Cumulative Distribution Function

Chi-Square with 174 DF

P( X <= x )        x
      0.025  139.367

Inverse Cumulative Distribution Function

Chi-Square with 174 DF

P( X <= x )        x
      0.975  212.419

$\chi^2_{.025} = 212.419$ and $\chi^2_{.975} = 139.367$. The rejection region is $\chi^2 > 212.419$ or $\chi^2 < 139.367$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 108.59 < 139.367$), $H_0$ is rejected. There is sufficient evidence to indicate the standard deviation differs from 1 at $\alpha = .05$.

d.

In part a, the rejection region is $z < -1.96$ or $z > 1.96$. In terms of $\bar{x}$, the rejection region would be:

$z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow 1.96 = \dfrac{\bar{x} - 8.3}{.79/\sqrt{175}} \Rightarrow .117 = \bar{x} - 8.3 \Rightarrow \bar{x} = 8.417$

$z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \Rightarrow -1.96 = \dfrac{\bar{x} - 8.3}{.79/\sqrt{175}} \Rightarrow -.117 = \bar{x} - 8.3 \Rightarrow \bar{x} = 8.183$

Based on $\bar{x}$, the rejection region would be: Reject $H_0$ if $\bar{x} < 8.183$ or $\bar{x} > 8.417$.

The power of the test is the probability the test statistic falls in the rejection region, given the alternative hypothesis is true. In this case, we will let $\mu_a = 8.5$.

$\text{Power} = P(\bar{x} < 8.183 \mid \mu_a = 8.5) + P(\bar{x} > 8.417 \mid \mu_a = 8.5) = P\left(z < \dfrac{8.183 - 8.5}{.79/\sqrt{175}}\right) + P\left(z > \dfrac{8.417 - 8.5}{.79/\sqrt{175}}\right) = P(z < -5.31) + P(z > -1.39) = (.5 - .5) + (.5 + .4177) = .9177$

(Using Table II, Appendix D)

7.115

a.

$H_0\colon p = .35$
$H_a\colon p < .35$

The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.29 - .35}{\sqrt{.35(.65)/200}} = -1.78$

The rejection region requires $\alpha = .05$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$. Since the observed value of the test statistic falls in the rejection region ($z = -1.78 < -1.645$), $H_0$ is rejected. There is sufficient evidence to indicate $p < .35$ at $\alpha = .05$.

b.

$H_0\colon p = .35$
$H_a\colon p \ne .35$

The test statistic is $z = -1.78$ (from part a). The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -1.78 \not< -1.96$), $H_0$ is not rejected. There is insufficient evidence to indicate $p$ is different from .35 at $\alpha = .05$.

7.116

a.

The p-value is $p = P(t \ge 1.174) = .1288$. Since the p-value is not very small, there is no evidence to reject $H_0$ for $\alpha \le .10$. There is no evidence to indicate the mean is greater than 10.

b.

We must assume that a random sample was selected from a population that is normally distributed.

c.

For the alternative hypothesis $H_a\colon \mu \ne 10$, the p-value is 2 times the p-value for the one-tailed test. The p-value is $p = 2(.1288) = .2576$. There is no evidence to reject $H_0$ for $\alpha \le .10$. There is no evidence to indicate the mean is different from 10.

7.117

a.

$H_0\colon \sigma^2 = 30$
$H_a\colon \sigma^2 > 30$

The test statistic is $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{(41-1)(6.9)^2}{30} = 63.48$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with df $= n - 1 = 41 - 1 = 40$. From Table IV, Appendix D, $\chi^2_{.05} = 55.7585$. The rejection region is $\chi^2 > 55.7585$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 63.48 > 55.7585$), $H_0$ is rejected. There is sufficient evidence to indicate the variance is larger than 30 at $\alpha = .05$.

b.

$H_0\colon \sigma^2 = 30$
$H_a\colon \sigma^2 \ne 30$

The test statistic is $\chi^2 = 63.48$ (from part a). The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the $\chi^2$ distribution with df $= n - 1 = 41 - 1 = 40$. From Table IV, Appendix D, $\chi^2_{.025} = 59.3417$ and $\chi^2_{.975} = 24.4331$. The rejection region is $\chi^2 < 24.4331$ or $\chi^2 > 59.3417$. Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 63.48 > 59.3417$), $H_0$ is rejected. There is sufficient evidence to indicate the variance is not 30 at $\alpha = .05$.

7.118

a.

Let p = the percentage of all college faculty that have taught an online class. To test to determine if the true population proportion exceeds 30%, we test: 𝐻 : 𝑝 = .30 𝐻 : 𝑝 > .30

7.119

b.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧. = 2.33. The rejection region is 𝑧 > 2.33.

a.

𝑝̂ = 𝑥/𝑛 = .830

b.

To determine if the true proportion of all US adults who incorrectly believe that the Census will ask about US citizenship exceeds .8, we test:

H0: 𝑝 = .8   Ha: 𝑝 > .8

c.

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = 4.46.

d.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.01 = 2.33. The rejection region is 𝑧 > 2.33.

e.

The p-value is 𝑝 = 𝑃(𝑧 > 4.46) ≈ .5 − .5 = 0 (Using Table II, Appendix D)

f.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 4.46 > 2.33), H0 is rejected. There is sufficient evidence to indicate that the true proportion of all US adults who incorrectly believe that the Census will ask about US citizenship exceeds .8 at 𝛼 = .01.

g.

Since the p-value is less than 𝛼 (𝑝 ≈ 0 < .01 = 𝛼), H0 is rejected. There is sufficient evidence to indicate that the true proportion of all US adults who incorrectly believe that the Census will ask about US citizenship exceeds .8 at 𝛼 = .01.

7.120

a.

The rejection region requires 𝛼 = .01 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.01 = 2.33. The rejection region is 𝑧 < −2.33.

b.

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = −.40

c.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −.40 ≮ −2.33), H0 is not rejected. There is insufficient evidence to indicate the true mean number of latex gloves used per week by all hospital employees is less than 20 at 𝛼 = .01.

7.121

a.

The rejection region requires 𝛼/2 = .01/2 = .005 in each tail of the 𝜒² distribution with df = 𝑛 − 1 = 46 − 1 = 45. Using MINITAB,

Inverse Cumulative Distribution Function
Chi-Square with 45 DF
P( X <= x )        x
      0.005  24.3110

Inverse Cumulative Distribution Function
Chi-Square with 45 DF
P( X <= x )        x
      0.995  73.1661

The rejection region is 𝜒² < 24.3110 or 𝜒² > 73.1661.

b.

The test statistic is 𝜒² = (𝑛 − 1)𝑠²/𝜎0² = 63.7245.

c.

Since the observed value of the test statistic does not fall in the rejection region (𝜒² = 63.7245 ≮ 24.3110 and 𝜒² = 63.7245 ≯ 73.1661), H0 is not rejected. There is insufficient evidence to indicate the variance is different from 100 at 𝛼 = .01.
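The variance-test arithmetic and the two-tailed decision rule can be checked with a short script. This is a minimal sketch, not part of the original solution: the function names are hypothetical, the statistic is illustrated with the Exercise 7.117 figures (𝑛 = 41, 𝑠 = 6.9, 𝜎0² = 30), and the critical values are the MINITAB values quoted above rather than values computed in code.

```python
def chi_square_statistic(n, sample_variance, sigma0_squared):
    """Return (n - 1) * s^2 / sigma0^2, the statistic for a test about a variance."""
    return (n - 1) * sample_variance / sigma0_squared


def in_two_tailed_rejection_region(chi2, lower_critical, upper_critical):
    """Return True when chi2 lies below the lower or above the upper critical value."""
    return chi2 < lower_critical or chi2 > upper_critical


# Statistic check using the Exercise 7.117 figures (n = 41, s = 6.9, sigma0^2 = 30).
chi2_7117 = chi_square_statistic(41, 6.9 ** 2, 30)

# Decision for Exercise 7.121: observed 63.7245 against the 45-df MINITAB
# critical values 24.3110 and 73.1661 quoted in the text.
reject_7121 = in_two_tailed_rejection_region(63.7245, 24.3110, 73.1661)

print(round(chi2_7117, 2), reject_7121)
```

Taking the critical values as inputs keeps the sketch free of third-party libraries; a statistics package could supply them instead.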


7.122

First, check to see if n is large enough.

𝑛𝑝0 = 2,376(.7) = 1,663.2 and 𝑛𝑞0 = 2,376(.3) = 712.8

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal approximation will be adequate.

a.

𝑝̂ = 𝑥/𝑛 = .654

b.

To determine if the true detection rate for pictures of PTW is different from .70, we test:

H0: 𝑝 = .70   Ha: 𝑝 ≠ .70

c.

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.654 − .70)/√(.70(.30)/2,376) = −4.89

d.

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645 or 𝑧 > 1.645.

e.

Since the observed value of the test statistic falls in the rejection region (𝑧 = −4.89 < −1.645), H0 is rejected. There is sufficient evidence to indicate the true detection rate for pictures of PTW is different from .70 at 𝛼 = .10.

7.123

a.

To determine if the average high technology stock is riskier than the market as a whole, we test:

H0: 𝜇 = 1   Ha: 𝜇 > 1

b.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛).

The rejection region requires 𝛼 = .10 in the upper tail of the t-distribution with df = 𝑛 − 1 = 15 − 1 = 14. From Table III, Appendix D, 𝑡.10 = 1.345. The rejection region is 𝑡 > 1.345.

c.

We must assume the population of beta coefficients of technology stocks is normally distributed.

d.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 2.41

Since the observed value of the test statistic falls in the rejection region (𝑡 = 2.41 > 1.345), H0 is rejected. There is sufficient evidence to indicate the mean high technology stock is riskier than the market as a whole at 𝛼 = .10.

e.

From Table III, Appendix D, with df = 𝑛 − 1 = 15 − 1 = 14, .01 < 𝑃(𝑡 ≥ 2.41) < .025. Thus, .01 < 𝑝-value < .025. The probability of observing this test statistic, 𝑡 = 2.41, or anything more unusual is between .01 and .025. Since this probability is small, there is evidence to indicate the null hypothesis is false for 𝛼 = .05.

f.

To determine if the variance of the stock beta values differs from .15, we test:

H0: 𝜎² = .15   Ha: 𝜎² ≠ .15

The test statistic is 𝜒² = (𝑛 − 1)𝑠²/𝜎0² = 12.7773.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the 𝜒² distribution with df = 𝑛 − 1 = 15 − 1 = 14. From Table IV, Appendix D, 𝜒².975 = 5.62872 and 𝜒².025 = 26.1190. The rejection region is 𝜒² < 5.62872 or 𝜒² > 26.1190.

Since the observed value of the test statistic does not fall in the rejection region (𝜒² = 12.7773 ≮ 5.62872 and 𝜒² = 12.7773 ≯ 26.1190), H0 is not rejected. There is insufficient evidence to indicate the variance of the stock beta values differs from .15 at 𝛼 = .05.

7.124

a.

The population parameter of interest is p = proportion of items that had the wrong price scanned at California Wal-Mart stores.

b.

To determine if the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard, we test:

H0: 𝑝 = .02   Ha: 𝑝 > .02

c.

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = 14.23

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645.

d.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 14.23 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true proportion of items scanned at California Wal-Mart stores with the wrong price exceeds the 2% NIST standard at 𝛼 = .05. This means that the proportion of items with wrong prices at California Wal-Mart stores is much higher than what is allowed.

e.

In order for the inference to be valid, the sampling distribution of 𝑝̂ must be approximately normal. For this assumption to be valid, both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15.

𝑛𝑝0 = 1000(.02) = 20

𝑛𝑞0 = 1000(.98) = 980

Since 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, we can assume the distribution of 𝑝̂ is approximately normal.

7.125

a.

Let 𝜇 = true mean willingness to eat the brand of sliced apples. To determine if the true mean willingness to eat the brand of sliced apples exceeds 3, we test:

H0: 𝜇 = 3   Ha: 𝜇 > 3

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 5.71.

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 5.71 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true mean willingness to eat the brand of sliced apples exceeds 3 at 𝛼 = .05.

b.

Even though the willingness to eat scores are not normally distributed, the test in part a is valid. Because the sample size is so large (𝑛 = 408), the Central Limit Theorem applies.

7.126

a.

A Type I error is rejecting H0 when H0 is true. In this case, we would conclude that the mean number of carats per diamond is different from .6 when, in fact, it is equal to .6. A Type II error is accepting H0 when H0 is false. In this case, we would conclude that the mean number of carats per diamond is equal to .6 when, in fact, it is different from .6.

b.

From Exercise 6.122, the random sample of 30 diamonds yielded 𝑥̄ = .691 and 𝑠 = .262. Let 𝜇 = mean number of carats per diamond. To determine if the mean number of carats per diamond is different from .6, we test:

H0: 𝜇 = .6   Ha: 𝜇 ≠ .6

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (.691 − .6)/(.262/√30) = 1.90

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.025 = 1.96. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.90 ≯ 1.96), H0 is not rejected. There is insufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at 𝛼 = .05.

c.

When 𝛼 is changed, H0, Ha, and the test statistic remain the same. The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645 or 𝑧 > 1.645.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 1.90 > 1.645), H0 is rejected. There is sufficient evidence to indicate the mean number of carats per diamond is different from .6 carats at 𝛼 = .10.

d.

When the value of 𝛼 changes, the decision can also change. Thus, it is very important to include the level of 𝛼 used in all decisions.

7.127

a.

Let 𝜇 = mean Mach rating score for all purchasing managers. To determine if the mean Mach rating score is different from 85, we test:

H0: 𝜇 = 85   Ha: 𝜇 ≠ 85

b.

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645 or 𝑧 > 1.645.

c.

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 12.80.

d.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 12.80 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the true mean Mach rating score of all purchasing managers is not 85 at 𝛼 = .10.

7.128

a.

Let p = proportion of shoppers using cents-off coupons. To determine if the proportion of shoppers using cents-off coupons exceeds .65, we test:


H0: 𝑝 = .65   Ha: 𝑝 > .65

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = 10.61

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 10.61 > 1.645), H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at 𝛼 = .05.

b.

The sample size is large enough if 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15.

𝑛𝑝0 = 1000(.65) = 650

𝑛𝑞0 = 1000(.35) = 350

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal approximation will be adequate.

c.

The p-value is 𝑝 = 𝑃(𝑧 ≥ 10.61) ≈ .5 − .5 = 0. (Using Table II, Appendix D.) Since the p-value is smaller than 𝛼 = .05, H0 is rejected. There is sufficient evidence to indicate the proportion of shoppers using cents-off coupons exceeds .65 at 𝛼 = .05.

7.129

a.

The hypotheses would be: H0: Individual does not have the disease Ha: Individual does have the disease

b.

A Type I error would be: Conclude the individual has the disease when in fact he/she does not. This would be a false positive test. A Type II error would be: Conclude the individual does not have the disease when in fact he/she does. This would be a false negative test.

c.

If the disease is serious, either error would be grave. Arguments could be made for either error being more grave. However, I believe a Type II error would be more grave: Concluding the individual does not have the disease when he/she does. This person would not receive critical treatment, and may suffer very serious consequences. Thus, it is more important to minimize 𝛽.

7.130

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Tunnel

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3      Maximum
Tunnel    10  989.8  160.7  735.0    862.5  970.5   1096.8  1260.0

To determine whether peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour, we test:

H0: 𝜇 = 1,220   Ha: 𝜇 < 1,220

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (989.8 − 1,220)/(160.7/√10) = −4.53

Since no 𝛼 is given, we will use 𝛼 = .05. The rejection region requires 𝛼 = .05 in the lower tail of the t-distribution with df = 𝑛 − 1 = 10 − 1 = 9. From Table III, Appendix D, 𝑡.05 = 1.833. The rejection region is 𝑡 < −1.833.

Since the observed value of the test statistic falls in the rejection region (𝑡 = −4.53 < −1.833), H0 is rejected. There is sufficient evidence to indicate that peak hour pricing succeeded in reducing the average number of vehicles attempting to use the Lincoln Tunnel during the peak rush hour at 𝛼 = .05.

7.131

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: GASTURBINE

Variable    N   Mean   StDev  Minimum  Q1    Median  Q3     Maximum
GASTURBINE  67  11066  1595   8714     9918  10656   11842  16243

To determine if the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine, we test:

H0: 𝜎² = 1,500²   Ha: 𝜎² > 1,500²

The test statistic is 𝜒² = (𝑛 − 1)𝑠²/𝜎0² = (67 − 1)(1,595)²/1,500² = 74.625.

The rejection region requires 𝛼 = .05 in the upper tail of the 𝜒² distribution with df = 𝑛 − 1 = 67 − 1 = 66. Using MINITAB,

Inverse Cumulative Distribution Function
Chi-Square with 66 DF
P( X <= x )        x
       0.95  85.9649

The rejection region is 𝜒² > 85.9649.

Since the observed value of the test statistic does not fall in the rejection region (𝜒² = 74.625 ≯ 85.9649), H0 is not rejected. There is insufficient evidence to indicate the heat rates of the augmented gas turbine engine are more variable than the heat rates of the standard gas turbine engine at 𝛼 = .05.

7.132

a.

To determine if the true mean number of pecks at the blue string is less than 7.5, we test:

H0: 𝜇 = 7.5   Ha: 𝜇 < 7.5

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = −24.46

The rejection region requires 𝛼 = .01 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.01 = 2.33. The rejection region is 𝑧 < −2.33.

Since the observed value of the test statistic falls in the rejection region (𝑧 = −24.46 < −2.33), H0 is rejected. There is sufficient evidence to indicate the true mean number of pecks at the blue string is less than 7.5 at 𝛼 = .01.

b.

From Exercise 6.122, the 99% confidence interval is (.46, 1.80). Since the hypothesized value of the mean (𝜇 = 7.5) does not fall in the confidence interval, it is not a likely candidate for the true value of the mean. Thus, you would reject it. This agrees with the conclusion in part a.

7.133

a.

𝑝̂ = 𝑥/𝑛 = .585

To determine if fewer than 60% of the coffee growers in southern Mexico are either certified or transitioning to become certified, we test:

H0: 𝑝 = .60   Ha: 𝑝 < .60

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/𝜎𝑝̂ = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.585 − .60)/√(.60(1 − .60)/845) = −.89

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −.89 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that fewer than 60% of the coffee growers in southern Mexico are either certified or transitioning to become certified at 𝛼 = .05.

b.

To compute the power, we must first set up the rejection region in terms of 𝑝̂. The rejection region is 𝑧 < −1.645. In terms of 𝑝̂, this is:

−1.645 = (𝑝̂ − .6)/√(.6(.4)/845) ⇒ −1.645√(.6(.4)/845) = 𝑝̂ − .6 ⇒ 𝑝̂ = .6 − .02772 = .57228

The rejection region is 𝑝̂ < .57228. The power of the test when 𝑝 = .57 would be:

Power = 𝑃(𝑝̂ < .57228 | 𝑝 = .57) = 𝑃(𝑧 < (.57228 − .57)/√(.57(.43)/845)) = 𝑃(𝑧 < .134) = .5 + .0517 = .5517

(Using Table II, Appendix D)

7.134

a.

𝑝̂ = 24/40 = .6

To determine if the proportion of shoplifters turned over to police is greater than .5, we test:

H0: 𝑝 = .5   Ha: 𝑝 > .5

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.6 − .5)/√(.5(.5)/40) = 1.26

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.26 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion of shoplifters turned over to police is greater than .5 at 𝛼 = .05.
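The large-sample test for a proportion used here can be sketched in a few lines. This is an illustrative sketch only: the function name is hypothetical, and the critical value and p-value come from Python's `statistics.NormalDist` rather than from Table II, so they differ slightly from the tabled figures in the later decimals.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_proportion_test(x, n, p0):
    """One-sample z test of H0: p = p0 against Ha: p > p0.

    Returns the sample proportion, the z statistic and the upper-tail p-value.
    """
    p_hat = x / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0, as in the solution
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value


# Figures from this exercise: 24 of 40 shoplifters, p0 = .5, alpha = .05.
p_hat, z, p_value = upper_tail_proportion_test(24, 40, 0.5)
z_critical = NormalDist().inv_cdf(1 - 0.05)

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, critical z = {z_critical:.3f}")
print("reject H0" if z > z_critical else "do not reject H0")
```

The unrounded statistic is about 1.265, which the solution reports as 1.26.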


b.

To determine if the normal approximation is appropriate, we check:

𝑛𝑝0 = 40(.5) = 20 and 𝑛𝑞0 = 40(.5) = 20

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal approximation will be adequate.

c.

The observed significance level of the test is 𝑝-value = 𝑝 = 𝑃(𝑧 ≥ 1.26) = .5 − .3962 = .1038. (Using Table II, Appendix D) The probability of observing the value of our test statistic or anything more unusual if the true value of p is .5 is .1038. Since this p-value is so large, there is no evidence to reject H0. There is no evidence to indicate the true proportion of shoplifters turned over to police is greater than .5.

d.

Any value of 𝛼 that is greater than the p-value would lead one to reject H0. Thus, for this problem, we would reject H0 for any value of 𝛼 > .1038.

7.135

a.

A Type II error is concluding the percentage of shoplifters turned over to police is 50% when in fact, the percentage is higher than 50%.

b.

First, calculate the value of 𝑝̂ that corresponds to the border between the acceptance region and the rejection region.

𝑃(𝑝̂ > 𝑝̂0) = 𝑃(𝑧 > 𝑧0) = .05. From Table II, Appendix D, 𝑧0 = 1.645.

𝑝̂0 = 𝑝0 + 1.645𝜎𝑝̂ = .5 + 1.645√(.5(.5)/40) = .5 + .1300 = .6300

𝛽 = 𝑃(𝑝̂ ≤ .6300 when 𝑝 = .55) = 𝑃(𝑧 ≤ (.6300 − .55)/√(.55(.45)/40)) = 𝑃(𝑧 ≤ 1.02) = .5 + .3461 = .8461

c.

If n increases, the probability of a Type II error would decrease. First, calculate the value of 𝑝̂ that corresponds to the border between the acceptance region and the rejection region.

𝑃(𝑝̂ > 𝑝̂0) = 𝑃(𝑧 > 𝑧0) = .05. From Table II, Appendix D, 𝑧0 = 1.645.

𝑝̂0 = 𝑝0 + 1.645𝜎𝑝̂ = .5 + 1.645√(.5(.5)/𝑛) = .5 + .082 = .582

𝛽 = 𝑃(𝑝̂ ≤ .582 when 𝑝 = .55) = 𝑃(𝑧 ≤ (.582 − .55)/√(.55(.45)/𝑛)) = 𝑃(𝑧 ≤ 0.64) = .5 + .2389 = .7389

7.136

a.

To determine whether the mean profit change for restaurants with frequency programs is greater than $1,050, we test:

H0: 𝜇 = 1,050   Ha: 𝜇 > 1,050

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: x

Variable  N   Mean  StDev  Variance  Minimum  Q1    Median  Q3    Maximum
x         12  2509  2149   4619332   -2191    1646  2493    3426  6553

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (2,509 − 1,050)/(2,149/√12) = 2.35

The rejection region requires 𝛼 = .05 in the upper tail of the t-distribution with df = 𝑛 − 1 = 12 − 1 = 11. From Table III, Appendix D, 𝑡.05 = 1.796. The rejection region is 𝑡 > 1.796.

Since the observed value of the test statistic falls in the rejection region (𝑡 = 2.35 > 1.796), H0 is rejected. There is sufficient evidence to indicate the mean profit change for restaurants with frequency programs is greater than $1,050 for 𝛼 = .05. It appears that the frequency program would be profitable for the company if adopted nationwide.

7.137

a.

To determine if the production process should be halted, we test:

H0: 𝜇 = 3   Ha: 𝜇 > 3

where 𝜇 = mean amount of vinyl chloride in the air.

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ = (3.1 − 3)/(.5/√50) = 1.41

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.01 = 2.33. The rejection region is 𝑧 > 2.33.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.41 ≯ 2.33), H0 is not rejected. There is insufficient evidence to indicate the mean amount of vinyl chloride in the air is more than 3 parts per million at 𝛼 = .01. Do not halt the manufacturing process.
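The same calculation can be reproduced from the summary figures alone. The sketch below is illustrative rather than part of the original solution: the function name is hypothetical, and the critical value is taken from `statistics.NormalDist` (about 2.326) instead of the tabled 2.33.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_mean_z_test(x_bar, mu0, sigma, n, alpha):
    """z test of H0: mu = mu0 against Ha: mu > mu0 from summary statistics."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return z, z_critical, p_value, z > z_critical


# Figures from this exercise: x-bar = 3.1, mu0 = 3, standard deviation .5, n = 50.
z, z_critical, p_value, reject = upper_tail_mean_z_test(3.1, 3, 0.5, 50, 0.01)

print(f"z = {z:.2f}, critical z = {z_critical:.2f}, p-value = {p_value:.4f}")
print("halt the process" if reject else "do not halt the process")
```

Returning the p-value alongside the decision makes it easy to see that the two approaches agree: H0 is rejected exactly when the p-value falls below 𝛼.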

b.

As plant manager, I do not want to shut down the plant unnecessarily. Therefore, I want 𝛼 = P(shut down plant when 𝜇 = 3) to be small.

c.

The p-value is 𝑝 = 𝑃(𝑧 ≥ 1.41) = .5 − .4207 = .0793. Since the p-value is not less than 𝛼 = .01, H0 is not rejected.

7.138

a.

A Type II error would be concluding the mean amount of vinyl chloride in the air is less than or equal to 3 parts per million when, in fact, it is more than 3 parts per million.

b.

From Exercise 7.137, 𝑧 = (𝑥̄0 − 𝜇0)/(𝜎/√𝑛) ⇒ 𝑥̄0 = 𝑧(𝜎/√𝑛) + 𝜇0 ⇒ 𝑥̄0 = 2.33(.5/√50) + 3 ⇒ 𝑥̄0 = 3.165

For 𝜇 = 3.1, 𝛽 = 𝑃(𝑥̄ ≤ 3.165) = 𝑃(𝑧 ≤ (3.165 − 3.1)/(.5/√50)) = 𝑃(𝑧 ≤ .92) = .5 + .3212 = .8212 (from Table II, Appendix D)

c.

Power = 1 − 𝛽 = 1 − .8212 = .1788

d.

For 𝜇 = 3.2, 𝛽 = 𝑃(𝑥̄ ≤ 3.165) = 𝑃(𝑧 ≤ (3.165 − 3.2)/(.5/√50)) = 𝑃(𝑧 ≤ −.49) = .5 − .1879 = .3121

Power = 1 − 𝛽 = 1 − .3121 = .6879

As the plant's mean vinyl chloride departs further from 3, the power increases.

7.139

a.

No, it increases the risk of falsely rejecting H0, i.e., closing the plant unnecessarily.

b.

First, find 𝑥̄0 such that 𝑃(𝑥̄ > 𝑥̄0) = 𝑃(𝑧 > 𝑧0) = .05. From Table II, Appendix D, 𝑧0 = 1.645.

𝑧0 = (𝑥̄0 − 𝜇0)/(𝜎/√𝑛) ⇒ 1.645 = (𝑥̄0 − 3)/(.5/√50) ⇒ 𝑥̄0 = 3.116

Then, compute:

𝛽 = 𝑃(𝑥̄ ≤ 3.116 when 𝜇 = 3.1) = 𝑃(𝑧 ≤ (3.116 − 3.1)/(.5/√50)) = 𝑃(𝑧 ≤ .23) = .5 + .0910 = .5910

Power = 1 − 𝛽 = 1 − .5910 = .4090
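The two-step calculation above (find the cutoff 𝑥̄0 under H0, then evaluate 𝛽 under the alternative mean) can be sketched as a small function. The function name is hypothetical and the normal probabilities come from `statistics.NormalDist`, so the results differ from the table-based values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_power(mu0, mu_alt, sigma, n, alpha):
    """Return (cutoff, beta, power) for an upper-tailed z test of H0: mu = mu0."""
    standard_error = sigma / sqrt(n)
    # Step 1: sample-mean cutoff that leaves area alpha in the upper tail under H0.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Step 2: probability of staying below the cutoff when the true mean is mu_alt.
    beta = NormalDist().cdf((cutoff - mu_alt) / standard_error)
    return cutoff, beta, 1 - beta


# Figures from this exercise: mu0 = 3, true mean 3.1, standard deviation .5, n = 50.
cutoff_05, beta_05, power_05 = upper_tail_power(3, 3.1, 0.5, 50, 0.05)
cutoff_01, beta_01, power_01 = upper_tail_power(3, 3.1, 0.5, 50, 0.01)

print(f"alpha = .05: cutoff = {cutoff_05:.3f}, beta = {beta_05:.4f}, power = {power_05:.4f}")
print(f"alpha = .01: cutoff = {cutoff_01:.3f}, beta = {beta_01:.4f}, power = {power_01:.4f}")
```

Running both significance levels side by side shows the trade-off directly: the larger 𝛼 gives the smaller 𝛽 and therefore the larger power.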

c.

The power of the test increases as 𝛼 increases.

7.140

𝑝̂ = 𝑥/𝑛 = .106

To determine if the French unemployment rate dropped after the enactment of the 35-hour work week law, we test:

H0: 𝑝 = .12   Ha: 𝑝 < .12

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/𝜎𝑝̂ = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = (.106 − .12)/√(.12(1 − .12)/500) = −.96

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 < −1.645.

Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −.96 ≮ −1.645), H0 is not rejected. There is insufficient evidence to indicate that the French unemployment rate dropped after the enactment of the 35-hour work week law at 𝛼 = .05.

7.141

To determine if the diameters of the ball bearings are more variable when produced by the new process, test:

H0: 𝜎² = .00156   Ha: 𝜎² > .00156

The test statistic is 𝜒² = (𝑛 − 1)𝑠²/𝜎0² = 99(.00211)/.00156 = 133.90

The rejection region requires use of the upper tail of the 𝜒² distribution with df = 𝑛 − 1 = 100 − 1 = 99. We will use df = 100 ≈ 99 due to the limitations of the table. From Table IV, Appendix D, 𝜒².025 = 129.561 < 133.90 < 135.807 = 𝜒².010. The p-value of the test is between .010 and .025.

The decision made depends on the desired 𝛼. For 𝛼 < .010, there is not enough evidence to show that the variance in the diameters is greater than .00156. For 𝛼 ≥ .025, there is enough evidence to show that the variance in the diameters is greater than .00156.

7.142

Let 𝜇 = mean years of experience for commercial suppliers of the DoD. To determine if the mean years of experience is greater than 5, we test:

H0: 𝜇 = 5   Ha: 𝜇 > 5

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Years

Variable  N  Mean   StDev  Minimum  Q1    Median  Q3     Maximum
Years     6  12.33  8.87   5.00     8.00  10.00   15.00  30.00

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (12.33 − 5)/(8.87/√6) = 2.03.

The rejection region requires 𝛼 = .05 in the upper tail of the t-distribution with df = 𝑛 − 1 = 6 − 1 = 5. From Table III, Appendix D, 𝑡.05 = 2.015. The rejection region is 𝑡 > 2.015.

Since the observed value of the test statistic falls in the rejection region (𝑡 = 2.03 > 2.015), H0 is rejected. There is sufficient evidence to indicate the mean years of experience for commercial suppliers of the DoD is greater than 5 at 𝛼 = .05.

7.143

a.

Let 𝜇 = mean daily amount of distilled water collected by the new system. To determine if the mean daily amount of distilled water collected by the new system is greater than 1.4, we test:

H0: 𝜇 = 1.4   Ha: 𝜇 > 1.4

b.

For this problem, 𝛼 = probability of concluding the mean daily amount of distilled water collected by the new system is greater than 1.4 when, in fact, the mean daily amount of distilled water collected by the new system is not greater than 1.4. Since 𝛼 = .10, this means that H0 will be rejected when it is true about 10% of the time.

c.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Water

Variable  N  Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Water     3  5.243  0.192  5.070    5.070  5.210   5.450  5.450

𝑥̄ = 5.243 and 𝑠 = .192.

d.

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (5.243 − 1.4)/(.192/√3) = 34.67.

e.

Using MINITAB:

One-Sample T: Water
Test of mu = 1.4 vs > 1.4

                                        95% Lower
Variable  N  Mean     StDev    SE Mean  Bound    T      P
Water     3  5.24333  0.19218  0.11096  4.91935  34.64  0.000

The p-value is 𝑝 = 0.000.
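The hand calculation in part d gives 34.67 while MINITAB reports 34.64. The gap comes from rounding the mean and standard deviation before dividing, which the short sketch below makes visible. The function name is hypothetical, and only the test statistic is computed; the p-value is left to the MINITAB output because the Python standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t_statistic(x_bar, mu0, s, n):
    """Return t = (x_bar - mu0) / (s / sqrt(n)) from summary statistics."""
    return (x_bar - mu0) / (s / sqrt(n))


# Rounded summary statistics, as used in the hand calculation of part d.
t_rounded = one_sample_t_statistic(5.243, 1.4, 0.192, 3)

# Higher-precision summary statistics, as shown in the MINITAB output of part e.
t_minitab = one_sample_t_statistic(5.24333, 1.4, 0.19218, 3)

print(f"t from rounded inputs   = {t_rounded:.2f}")
print(f"t from MINITAB's inputs = {t_minitab:.2f}")
```

Either value leads to the same conclusion, since both are far beyond any tabled critical value for 2 degrees of freedom.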

f.

Since the p-value is less than 𝛼 (𝑝 = .000 < .10), H0 is rejected. There is sufficient evidence to indicate the mean daily amount of distilled water collected by the new system is greater than 1.4 at 𝛼 = .10.

7.144

a.

𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 1.29

The p-value is 𝑝 = 𝑃(𝑧 ≥ 1.29) + 𝑃(𝑧 ≤ −1.29) = (.5 − .4015) + (.5 − .4015) = .1970. (Using Table II, Appendix D.)

b.

The p-value is 𝑝 = 𝑃(𝑧 ≥ 1.29) = (.5 − .4015) = .0985. (Using Table II, Appendix D.)

c.

𝑧 = (𝑥̄ − 𝜇0)/𝜎𝑥̄ ≈ (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = 0.88

The p-value is 𝑝 = 𝑃(𝑧 ≥ 0.88) + 𝑃(𝑧 ≤ −0.88) = (.5 − .3106) + (.5 − .3106) = .3788. (Using Table II, Appendix D.)

d.

In part a, in order to reject H0, 𝛼 would have to be greater than .1970. In part b, in order to reject H0, 𝛼 would have to be greater than .0985. In part c, in order to reject H0, 𝛼 would have to be greater than .3788.
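The one-tailed and two-tailed p-values in parts a through c follow the same pattern, which can be sketched as below. The function name is hypothetical, and the probabilities come from `statistics.NormalDist` rather than Table II, so the last digit can differ from the tabled values.

```python
from statistics import NormalDist


def z_p_values(z):
    """Return (one_tailed, two_tailed) p-values for an observed z statistic."""
    one_tailed = 1 - NormalDist().cdf(abs(z))
    return one_tailed, 2 * one_tailed


one_a, two_a = z_p_values(1.29)  # parts a (two-tailed) and b (one-tailed)
one_c, two_c = z_p_values(0.88)  # part c (two-tailed)

print(f"z = 1.29: one-tailed {one_a:.4f}, two-tailed {two_a:.4f}")
print(f"z = 0.88: one-tailed {one_c:.4f}, two-tailed {two_c:.4f}")
```

H0 is rejected at a given 𝛼 only when the relevant p-value is smaller than 𝛼, which is the comparison made in part d.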

e.

For a two-tailed test, 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.58.

𝑧 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) ⇒ 2.58 = (53.3 − 52)/(𝑠/√𝑛) ⇒ 2.58(𝑠/√𝑛) = 53.3 − 52 ⇒ .3649𝑠 = 1.3 ⇒ 𝑠 = 3.56

7.145

Using MINITAB, the descriptive statistics are:

One-Sample T: Skid
Test of mu = 425 vs < 425

                                      95% Upper
Variable  N   Mean     StDev    SE Mean  Bound    T      P
Skid      20  358.450  117.817  26.345   404.004  -2.53  0.010

To determine if the mean skidding distance is less than 425 meters, we test:

H0: 𝜇 = 425   Ha: 𝜇 < 425

The test statistic is 𝑡 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (358.450 − 425)/(117.817/√20) = −2.53.

The rejection region requires 𝛼 = .10 in the lower tail of the t-distribution with df = 𝑛 − 1 = 20 − 1 = 19. From Table III, Appendix D, 𝑡.10 = 1.328. The rejection region is 𝑡 < −1.328.

Since the observed value of the test statistic falls in the rejection region (𝑡 = −2.53 < −1.328), H0 is rejected. There is sufficient evidence to indicate the true mean skidding distance is less than 425 meters at 𝛼 = .10. There is sufficient evidence to refute the claim.

7.146

Let 𝑝 = proportion of patients taking the pill who reported an improved condition. First we check to see if the normal approximation is adequate:

𝑛𝑝0 = 7000(.5) = 3500 and 𝑛𝑞0 = 7000(.5) = 3500

Since both 𝑛𝑝0 ≥ 15 and 𝑛𝑞0 ≥ 15, the normal approximation will be adequate.

To determine if there really is a placebo effect at the clinic, we test:

H0: 𝑝 = .5   Ha: 𝑝 > .5

The test statistic is 𝑧 = (𝑝̂ − 𝑝0)/√(𝑝0𝑞0/𝑛) = 33.47

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.05 = 1.645. The rejection region is 𝑧 > 1.645.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 33.47 > 1.645), H0 is rejected. There is sufficient evidence to indicate that there really is a placebo effect at the clinic at 𝛼 = .05.

7.147

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Candy

Variable  N  Mean    StDev  Minimum  Q1      Median  Q3      Maximum
Candy     5  22.000  2.000  20.000   20.500  21.000  24.000  25.000

To give the benefit of the doubt to the students we will use a small value of 𝛼. (We do not want to reject H0 when it is true to favor the students.) Thus, we will use 𝛼 = .001. We must also assume that the sample comes from a normal distribution.

To determine if the mean number of candies exceeds 15, we test:

H0: 𝜇 = 15   Ha: 𝜇 > 15

The test statistic is 𝑧 = (𝑥̄ − 𝜇0)/(𝑠/√𝑛) = (22 − 15)/(2/√5) = 7.83

The rejection region requires 𝛼 = .001 in the upper tail of the z-distribution. From Table II, Appendix D, 𝑧.001 = 3.08. The rejection region is 𝑧 > 3.08.

Since the observed value of the test statistic falls in the rejection region (𝑧 = 7.83 > 3.08), H0 is rejected. There is sufficient evidence to indicate the mean number of candies exceeds 15 at 𝛼 = .001.



Chapter 8 Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses

8.1

a.

𝜇1 ± 2𝜎𝑥̄1 ⇒ 𝜇1 ± 2(𝜎1/√𝑛1) ⇒ 150 ± 6 ⇒ (144, 156)

b.

𝜇2 ± 2𝜎𝑥̄2 ⇒ 𝜇2 ± 2(𝜎2/√𝑛2) ⇒ 150 ± 8 ⇒ (142, 158)

c.

𝜇(𝑥̄1 − 𝑥̄2) = 𝜇1 − 𝜇2 = 150 − 150 = 0

𝜎(𝑥̄1 − 𝑥̄2) = √(𝜎1²/𝑛1 + 𝜎2²/𝑛2) = 5

d.

(𝜇1 − 𝜇2) ± 2√(𝜎1²/𝑛1 + 𝜎2²/𝑛2) ⇒ (150 − 150) ± 2(5) ⇒ 0 ± 10 ⇒ (−10, 10)

e.

The variability of the difference between the sample means is greater than the variability of the individual sample means.

8.2

a.

𝜇𝑥̄1 = 𝜇1 = 12   𝜎𝑥̄1 = 𝜎1/√𝑛1 = .5

b.

𝜇𝑥̄2 = 𝜇2 = 10   𝜎𝑥̄2 = 𝜎2/√𝑛2 = .375

c.

𝜇(𝑥̄1 − 𝑥̄2) = 𝜇1 − 𝜇2 = 12 − 10 = 2

𝜎(𝑥̄1 − 𝑥̄2) = √(𝜎1²/𝑛1 + 𝜎2²/𝑛2) = .625

d.

Since 𝑛1 ≥ 30 and 𝑛2 ≥ 30, the sampling distribution of 𝑥̄1 − 𝑥̄2 is approximately normal by the Central Limit Theorem.

8.3

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96. The confidence interval is:

(𝑥̄1 − 𝑥̄2) ± 𝑧.025√(𝜎1²/𝑛1 + 𝜎2²/𝑛2) ⇒ (5,275 − 5,240) ± 1.96√(150²/400 + 200²/400) ⇒ 35 ± 24.5 ⇒ (10.5, 59.5)

We are 95% confident that the difference between the population means is between 10.5 and 59.5.

b.

The test statistic is 𝑧 = [(𝑥̄1 − 𝑥̄2) − (𝜇1 − 𝜇2)]/√(𝜎1²/𝑛1 + 𝜎2²/𝑛2) = [(5,275 − 5,240) − 0]/√(150²/400 + 200²/400) = 2.8

The p-value is 𝑝 = 𝑃(𝑧 ≤ −2.8) + 𝑃(𝑧 ≥ 2.8) = 2𝑃(𝑧 ≥ 2.8) = 2(.5 − .4974) = 2(.0026) = .0052. Since the p-value is so small, there is evidence to reject H0. There is evidence to indicate the two population means are different for 𝛼 > .0052.
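The large-sample comparison of two means can be sketched from the summary figures in this exercise. The function name is hypothetical, and the p-value is computed with `statistics.NormalDist` rather than Table II, so it comes out near .0051 instead of the table-based .0052.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar1, x_bar2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """Return (z, two_tailed_p, standard_error) for H0: mu1 - mu2 = hypothesized_diff."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = ((x_bar1 - x_bar2) - hypothesized_diff) / standard_error
    two_tailed_p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, two_tailed_p, standard_error


# Figures from this exercise: means 5,275 and 5,240, standard deviations 150 and 200, n = 400 each.
z, p_value, se = two_sample_z(5275, 5240, 150, 200, 400, 400)

# 95% confidence interval for mu1 - mu2, using the tabled multiplier 1.96.
lower = (5275 - 5240) - 1.96 * se
upper = (5275 - 5240) + 1.96 * se

print(f"z = {z:.1f}, two-tailed p-value = {p_value:.4f}")
print(f"95% CI for mu1 - mu2: ({lower:.1f}, {upper:.1f})")
```

Passing `hypothesized_diff=25` to the same function tests whether the difference in means is 25 rather than 0.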

c.

The p-value would be half of the p-value in part b. The p-value = 𝑝 = 𝑃(𝑧 ≥ 2.8) = .5 − .4974 = .0026. Since the p-value is so small, there is evidence to reject H0. There is evidence to indicate the mean for population 1 is larger than the mean for population 2 for 𝛼 > .0026.

d.

The test statistic is 𝑧 = [(𝑥̄1 − 𝑥̄2) − (𝜇1 − 𝜇2)]/√(𝜎1²/𝑛1 + 𝜎2²/𝑛2) = [(5,275 − 5,240) − 25]/√(150²/400 + 200²/400) = .8

The p-value of the test is 𝑝 = 𝑃(𝑧 ≤ −.8) + 𝑃(𝑧 ≥ .8) = 2𝑃(𝑧 ≥ .8) = 2(.5 − .2881) = 2(.2119) = .4238. Since the p-value is so large, there is no evidence to reject H0. There is no evidence to indicate that the difference in the 2 population means is different from 25 for 𝛼 ≤ .10.

e.

We must assume that we have two independent random samples.

8.4

Assumptions about the two populations:

1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.

Assumptions about the two samples: The samples are randomly and independently selected from the populations.

8.5

a.

No. Both populations must be normal.

b.

No. Both population variances must be equal.

c.

No. Both populations must be normal.

d.

Yes.

e.

No. Both populations must be normal.

8.6

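a.

𝑠𝑝² = [(𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2²]/(𝑛1 + 𝑛2 − 2) = 110

b.

𝑠𝑝² = [(𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2²]/(𝑛1 + 𝑛2 − 2) = 14.5714

c.

𝑠𝑝² = [(𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2²]/(𝑛1 + 𝑛2 − 2) = .1821

d.

𝑠𝑝² = [(𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2²]/(𝑛1 + 𝑛2 − 2) = 2,741.9355

𝑠𝑝² falls nearer the variance with the larger sample size.

8.7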

Some preliminary calculations are:

𝑥̄1 = Σ𝑥1/𝑛1 = 2.36   𝑠1² = [Σ𝑥1² − (Σ𝑥1)²/𝑛1]/(𝑛1 − 1) = .733

𝑥̄2 = Σ𝑥2/𝑛2 = 3.6   𝑠2² = [Σ𝑥2² − (Σ𝑥2)²/𝑛2]/(𝑛2 − 1) = .42

a.

𝑠𝑝² = [(𝑛1 − 1)𝑠1² + (𝑛2 − 1)𝑠2²]/(𝑛1 + 𝑛2 − 2) = [(5 − 1)(.733) + (4 − 1)(.42)]/(5 + 4 − 2) = .5989

b.

H0: 𝜇1 − 𝜇2 = 0   Ha: 𝜇1 − 𝜇2 < 0

The test statistic is 𝑡 = [(𝑥̄1 − 𝑥̄2) − 0]/√(𝑠𝑝²(1/𝑛1 + 1/𝑛2)) = (2.36 − 3.6)/√(.5989(1/5 + 1/4)) = −2.39

The rejection region requires 𝛼 = .10 in the lower tail of the t-distribution with df = 𝑛1 + 𝑛2 − 2 = 5 + 4 − 2 = 7. From Table III, Appendix D, 𝑡.10 = 1.415. The rejection region is 𝑡 < −1.415.

Since the test statistic falls in the rejection region (𝑡 = −2.39 < −1.415), H0 is rejected. There is sufficient evidence to indicate that 𝜇2 > 𝜇1 at 𝛼 = .10.

c.

A small sample confidence interval is needed because 𝑛1 = 5 < 30 and 𝑛2 = 4 < 30. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with df = 𝑛1 + 𝑛2 − 2 = 5 + 4 − 2 = 7, 𝑡.05 = 1.895. The 90% confidence interval for (𝜇1 − 𝜇2) is:

(𝑥̄1 − 𝑥̄2) ± 𝑡.05√(𝑠𝑝²(1/𝑛1 + 1/𝑛2)) ⇒ (2.36 − 3.6) ± 1.895√(.5989(1/5 + 1/4)) ⇒ −1.24 ± .98 ⇒ (−2.22, −0.26)

8.8

d.

The confidence interval in part c provides more information about (𝜇 − 𝜇 ) than the test of hypothesis in part b. The test in part b only tells us that 𝜇 is greater than 𝜇 . However, the confidence interval estimates what the difference is between 𝜇 and 𝜇 .

a. $\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{.25} = .5$

b. The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal by the Central Limit Theorem since $n_1 \ge 30$ and $n_2 \ge 30$.

$\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2 = 10$

c. $\bar{x}_1 - \bar{x}_2 = 26.6 - 15.5 = 11.1$

No, it does not appear that $\bar{x}_1 - \bar{x}_2 = 11.1$ contradicts the null hypothesis $H_0: \mu_1 - \mu_2 = 10$. The value 11.1 is fairly close to 10.

d. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

e. $H_0: \mu_1 - \mu_2 = 10$
$H_a: \mu_1 - \mu_2 \ne 10$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 10}{\sigma_{(\bar{x}_1 - \bar{x}_2)}} = \dfrac{(26.6 - 15.5) - 10}{.5} = 2.2$

The rejection region is $z < -1.96$ or $z > 1.96$. (Refer to part d.)

Since the observed value of the test statistic falls in the rejection region ($z = 2.2 > 1.96$), H0 is rejected. There is sufficient evidence to indicate the difference in the population means is not equal to 10 at $\alpha = .05$.

f. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{.025}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} \Rightarrow (26.6 - 15.5) \pm 1.96(.5) \Rightarrow 11.1 \pm .98 \Rightarrow (10.12, 12.08)$

We are 95% confident that the difference in the two means is between 10.12 and 12.08.

g. The confidence interval gives more information.

8.9

a.

The p-value = 𝑝 = .1150. Since the p-value is not small, there is no evidence to reject H0 for 𝛼 ≤ .10. There is insufficient evidence to indicate the two population means differ for 𝛼 ≤ .10.

b.

If the alternative hypothesis had been one-tailed, the p-value would be half of the value for the two-tailed test. Here, p-value = .1150/2 = .0575. There is no evidence to reject H0 for 𝛼 = .05. There is insufficient evidence to indicate the mean for population 1 is less than the mean for population 2 at 𝛼 = .05. There is evidence to reject H0 for 𝛼 > .0575. There is sufficient evidence to indicate the mean for population 1 is less than the mean for population 2 at 𝛼 > .0575.
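The halving rule used in part b can be sketched in a few lines of Python. This is only an illustration: the helper names `one_tailed_p` and `reject_h0` are hypothetical, and the halving is valid only under the assumption that the sample difference points in the direction stated by the alternative hypothesis.

```python
def one_tailed_p(two_tailed_p: float, direction_matches: bool = True) -> float:
    """Convert a two-tailed p-value to a one-tailed p-value.

    direction_matches is True when the observed difference lies in the
    direction stated by the alternative hypothesis (assumed here).
    """
    half = two_tailed_p / 2
    return half if direction_matches else 1 - half


def reject_h0(p_value: float, alpha: float) -> bool:
    """Reject H0 when the p-value is smaller than alpha."""
    return p_value < alpha


p_two = 0.1150                      # two-tailed p-value from part a
p_one = one_tailed_p(p_two)         # one-tailed p-value for part b
print(round(p_one, 4))
print(reject_h0(p_one, 0.05), reject_h0(p_one, 0.10))
```

The same `reject_h0` comparison applies to any of the printout-based exercises in this chapter where only a p-value is reported.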

8.10

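Some preliminary calculations:

$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = 43.6$, $s_1^2 = \dfrac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = 29.9714$

$\bar{x}_2 = \dfrac{\sum x_2}{n_2} = 53.625$, $s_2^2 = \dfrac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = 29.3167$

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(15 - 1)(29.9714) + (16 - 1)(29.3167)}{15 + 16 - 2} = 29.6328$

a. $H_0: \mu_2 - \mu_1 = 10$
$H_a: \mu_2 - \mu_1 > 10$

The test statistic is $t = \dfrac{(\bar{x}_2 - \bar{x}_1) - 10}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \dfrac{(53.625 - 43.6) - 10}{\sqrt{29.6328\left(\frac{1}{15} + \frac{1}{16}\right)}} = .013$

The rejection region requires $\alpha = .01$ in the upper tail of the t-distribution with $df = n_1 + n_2 - 2 = 15 + 16 - 2 = 29$. From Table III, Appendix D, $t_{.01} = 2.462$. The rejection region is $t > 2.462$.

Since the test statistic does not fall in the rejection region ($t = .013 \not> 2.462$), H0 is not rejected. There is insufficient evidence to conclude $\mu_2 - \mu_1 > 10$ at $\alpha = .01$.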
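The pooled-variance calculation in part a can be checked with a short Python sketch. The function names are illustrative only, the summary statistics are the ones reported in the preliminary calculations, and the critical value 2.462 is taken from the solution rather than computed.

```python
import math


def pooled_variance(n1: int, var1: float, n2: int, var2: float) -> float:
    """Pooled estimate of a common variance from two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t(mean_diff: float, d0: float, sp2: float, n1: int, n2: int) -> float:
    """t statistic for H0: difference in means = d0, using the pooled variance."""
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean_diff - d0) / se


# Summary statistics reported in Exercise 8.10
n1, xbar1, var1 = 15, 43.6, 29.9714
n2, xbar2, var2 = 16, 53.625, 29.3167

sp2 = pooled_variance(n1, var1, n2, var2)
t = pooled_t(xbar2 - xbar1, 10, sp2, n1, n2)
t_crit = 2.462  # t_.01 with 29 df, as read from Table III in the solution

print(round(sp2, 4), round(t, 3), t > t_crit)
```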

b. For confidence coefficient .98, $\alpha = .02$ and $\alpha/2 = .02/2 = .01$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 15 + 16 - 2 = 29$, $t_{.01} = 2.462$. The 98% confidence interval for $(\mu_2 - \mu_1)$ is:

$(\bar{x}_2 - \bar{x}_1) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (53.625 - 43.6) \pm 2.462\sqrt{29.6328\left(\frac{1}{15} + \frac{1}{16}\right)} \Rightarrow 10.025 \pm 4.817 \Rightarrow (5.208, 14.842)$

We are 98% confident that the difference between the mean of population 2 and the mean of population 1 is between 5.208 and 14.842.

8.11

a. $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(17 - 1)3.4^2 + (12 - 1)4.8^2}{17 + 12 - 2} = 16.237$

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \dfrac{(5.4 - 7.9) - 0}{\sqrt{16.237\left(\frac{1}{17} + \frac{1}{12}\right)}} = -1.646$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 17 + 12 - 2 = 27$. From Table III, Appendix D, $t_{.025} = 2.052$. The rejection region is $t < -2.052$ or $t > 2.052$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -1.646 \not< -2.052$), H0 is not rejected. There is insufficient evidence to indicate $\mu_1 - \mu_2$ is different from 0 at $\alpha = .05$.

b. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 17 + 12 - 2 = 27$, $t_{.025} = 2.052$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (5.4 - 7.9) \pm 2.052\sqrt{16.237\left(\frac{1}{17} + \frac{1}{12}\right)} \Rightarrow -2.50 \pm 3.12 \Rightarrow (-5.62, 0.62)$

8.12

a. The target parameter is $\mu_{\text{Low}} - \mu_{\text{High}}$ = difference between the mean 5-year capitalization rate of all single-property retail tenants with low S&P ratings and the mean 5-year capitalization rate of all single-property retail tenants with high S&P ratings.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: CR-5

CRED RATING  N  Mean   StDev  SE Mean
HIGH         3  6.333  0.722  0.42
LOW          4  8.238  0.492  0.25

The point estimate is $\bar{x}_{\text{Low}} - \bar{x}_{\text{High}} = 8.238 - 6.333 = 1.905$.

c. Since the sample sizes for both samples are so small, the Central Limit Theorem does not apply. In addition, the population standard deviations are not known and must be estimated with the sample standard deviations.

d. Using MINITAB, the confidence interval is:

Estimation for Difference

Difference  90% CI for Difference
-1.905      (-3.043, -0.765)

(MINITAB reports the difference as HIGH minus LOW, so its sign is opposite to that of the point estimate in part b.)

e. Since 0 is not in the 90% confidence interval, there is sufficient evidence to indicate a difference in the mean 5-year capitalization rates for the two S&P ratings.

f. We must assume that we have independent random samples from normal populations and that the population variances are the same.

8.13

a. $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 79.53125$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Using MINITAB with $df = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$, $t_{.025} = 2.011$. The 95% confidence interval for $(\mu_1 - \mu_2)$ is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (25.08 - 19.38) \pm 2.011\sqrt{79.53125\left(\frac{1}{25} + \frac{1}{25}\right)} \Rightarrow 5.7 \pm 5.073 \Rightarrow (.627, 10.773)$

b. Since 0 does not fall in the 95% confidence interval, there is evidence to indicate there is a difference in the mean response times between the two groups. Since the interval contains only positive numbers, it indicates that the mean response time for the group of students whose last names begin with the letters R-Z is shorter than the mean response time for the group of students whose last names begin with the letters A-I. This supports the researchers’ last name effect theory.

8.14

a. Since the p-value is less than $\alpha$ ($p = .000 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean leadership values for captains from successful and unsuccessful teams differ at $\alpha = .05$.

b. Since the p-value is not less than $\alpha$ ($p = .907 \not< .05$), H0 is not rejected. There is insufficient evidence to indicate the mean leadership values for flight attendants from successful and unsuccessful teams differ at $\alpha = .05$.

8.15

Let $\mu_1$ = mean data size of all normal packets and $\mu_2$ = mean data size of all attacked packets. To determine if the mean data size of all normal and attacked packets differs, we test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \ne \mu_2$

From the printout, the p-value is $p = .0004$. Since the p-value is less than $\alpha$ ($p = .0004 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean data size of all normal and attacked packets differs at $\alpha = .05$.

8.16

Let $\mu_1$ = mean drug concentration for Site 1 and $\mu_2$ = mean drug concentration for Site 2. To determine if there is a difference in the mean drug concentration between the two Sites, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

From the printout, the test statistic is $t = .57$ and the p-value is $p = .573$. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate a difference in the mean drug concentrations between the two Sites.

8.17

a. Let $\mu_1$ = mean forecast error of buy-side analysts and $\mu_2$ = mean forecast error of sell-side analysts. For confidence coefficient 0.95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{.025}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (.85 - (-.05)) \pm 1.96\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow .90 \pm .064 \Rightarrow (.836, .964)$

We are 95% confident that the difference in the mean forecast error of buy-side analysts and sell-side analysts is between .836 and .964.

b. Based on the 95% confidence interval in part a, the buy-side analysts have the greater mean forecast error because the interval contains only positive numbers.

c. The assumptions about the underlying populations of forecast errors that are necessary for the validity of the inference are:

1. The samples are randomly and independently sampled.
2. The sample sizes are sufficiently large.

8.18

Let $\mu_1$ = mean narcissism level of all accounting majors and $\mu_2$ = mean narcissism level of all introductory psychology students. To determine if the true mean narcissism level of all accounting majors is less than the true mean narcissism level of all introductory psychology students, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = -1.92$

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -1.92 \not< -2.33$), H0 is not rejected. There is insufficient evidence to indicate the true mean narcissism level of all accounting majors is less than the true mean narcissism level of all introductory psychology students at $\alpha = .01$.

8.19

a.

None of the p-values for the 5 varieties of peach jam are less than 𝛼 = .05. Thus, we cannot conclude that the mean taste scores of the two protocols differ for any of the varieties at 𝛼 = .05.

b.

The p-values for the cheese varieties A, C, and D are less than 𝛼 = .05. Thus, we can conclude that the mean taste scores of the two protocols differ for cheese varieties A, C, and D at 𝛼 = .05.

c. In all the tests, the sample sizes were greater than 30. Thus, the Central Limit Theorem applies, so we do not have to assume that the populations of taste test scores are normal.

8.20

a.

Let $\mu_1$ = mean intention to defect if you make a one-time lump sum payment and $\mu_2$ = mean intention to defect if you choose a multiple payment plan. To determine if a difference exists in the mean intention to defect for the two payment options, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = 1.83$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.83 \not> 1.96$), H0 is not rejected. There is insufficient evidence to indicate a difference exists in the mean intention to defect for the two payment options at $\alpha = .05$.

b. Because two large samples were taken, the Central Limit Theorem can be used. It is not required that the populations that are being sampled from be normally distributed. In this case, the 7-point scale will not impact the validity of the inference.

8.21

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Control, Rude

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3      Maximum
Rude      45  8.511  3.992  0.000    5.500  9.000   11.000  18.000
Control   53  11.81  7.38   0.00     5.50   12.00   17.50   30.00

Let $\mu_1$ = mean performance level of students in the rudeness group and $\mu_2$ = mean performance level of students in the control group. To determine if the true mean performance level for students in the rudeness condition is lower than the true mean performance level for students in the control group, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(8.511 - 11.81) - 0}{\sqrt{\dfrac{3.992^2}{45} + \dfrac{7.38^2}{53}}} = -2.81$

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

Since the observed value of the test statistic falls in the rejection region ($z = -2.81 < -2.33$), H0 is rejected. There is sufficient evidence to indicate the true mean performance level for students in the rudeness condition is lower than the true mean performance level for students in the control group at $\alpha = .01$.

8.22

a.

If the manipulation was successful, then the positive group (requiring a strong display of positive emotions) should have the higher mean response. The members of this group should disagree with the statement presented, resulting in higher responses.

b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Positive, Neutral

Variable  N   Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Positive  78  4.4872  0.6595  2.0000   4.0000  5.0000  5.0000  5.0000
Neutral   67  1.8955  0.4965  1.0000   2.0000  2.0000  2.0000  3.0000

Let $\mu_1$ = mean response for the positive group and $\mu_2$ = mean response for the neutral group. To determine if the manipulation was successful, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(4.4872 - 1.8955) - 0}{\sqrt{\dfrac{.6595^2}{78} + \dfrac{.4965^2}{67}}} = 26.94$

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 26.94 > 1.645$), H0 is rejected. There is sufficient evidence to indicate the mean response for the positive group is greater than the mean response for the neutral group at $\alpha = .05$. Thus, the manipulation was successful.

c. We need to assume that random and independent samples were selected from each of the populations.

8.23

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Honey, DM

Variable  N   Mean    StDev  Minimum  Q1     Median  Q3      Maximum
Honey     35  10.714  2.855  4.000    9.000  11.000  12.000  16.000
DM        33  8.333   3.256  3.000    6.000  9.000   11.500  15.000

Let $\mu_1$ = mean improvement in total cough symptoms score for children receiving the Honey dosage and $\mu_2$ = mean improvement in total cough symptoms score for children receiving the DM dosage. To test if honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(10.714 - 8.333) - 0}{\sqrt{\dfrac{2.855^2}{35} + \dfrac{3.256^2}{33}}} = 3.20$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 3.20 > 1.645$), H0 is rejected. There is sufficient evidence to indicate that honey may be a preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection at $\alpha = .05$.
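As a rough check on the large-sample test in Exercise 8.23, the following Python sketch recomputes the z statistic from the MINITAB summary statistics and obtains the critical value from the standard library's `statistics.NormalDist`. The function name `two_sample_z` is an illustrative choice, and the sample standard deviations stand in for the unknown population values, as the solution itself does.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - d0) / se


# Honey versus DM summary statistics from the printout
z = two_sample_z(10.714, 2.855, 35, 8.333, 3.256, 33)

# Upper-tail critical value for alpha = .05
z_crit = NormalDist().inv_cdf(1 - 0.05)

print(round(z, 2), round(z_crit, 3), z > z_crit)
```

Calling `two_sample_z` with the summary statistics of Exercises 8.21 and 8.22 reproduces the test statistics reported there in the same way.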

8.24

a. Let $\mu_{\text{with}}$ = mean percentage of female board directors at firms with a nominating committee and $\mu_{\text{without}}$ = mean percentage of female board directors at firms without a nominating committee. To determine whether firms with a nominating committee would appoint more female directors than firms without a nominating committee, we test:

$H_0: \mu_{\text{with}} - \mu_{\text{without}} = 0$
$H_a: \mu_{\text{with}} - \mu_{\text{without}} > 0$

b.

The test statistic is 𝑧 = 5.51 and the p-value is 𝑝 = .0001. Since the p-value is less than 𝛼 (𝑝 = .0001 < .05), H0 is rejected. There is sufficient evidence to indicate that firms with a nominating committee would appoint more female directors than firms without a nominating committee at 𝛼 = .05.

c.

No. Both sample sizes are large. Therefore, the Central Limit Theorem applies.

d.

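For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$(\bar{x}_{\text{with}} - \bar{x}_{\text{without}}) \pm z_{.025}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

We know that the test statistic is $z = \dfrac{(\bar{x}_{\text{with}} - \bar{x}_{\text{without}}) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ and $z = 5.51$.

Thus, $5.51 = \dfrac{7.5 - 4.3}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \Rightarrow \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \dfrac{3.2}{5.51} = .5808$.

The confidence interval is $(\bar{x}_{\text{with}} - \bar{x}_{\text{without}}) \pm z_{.025}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (7.5 - 4.3) \pm 1.96(.5808) \Rightarrow 3.2 \pm 1.14 \Rightarrow (2.06, 4.34)$

We are 95% confident that the difference in true mean percentages at firms with and without a nominating committee is between 2.06% and 4.34%. Since this interval does not contain 0, the difference in the mean percentages is also practically significant.

8.25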

a.

We cannot provide a measure of reliability because we have no measure of the variability or variance of the data.

b.

We would need the variances of the two samples.

c.

Let $\mu_1$ = mean age for self-employed immigrants and $\mu_2$ = mean age for the wage-earning immigrants. To determine if the mean age for self-employed immigrants is less than the mean age for wage-earning immigrants, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The rejection region requires $\alpha = .01$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z < -2.33$.

d. We use the following to solve for $\sigma$:

$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(44.88 - 46.79) - 0}{\sqrt{\dfrac{\sigma^2}{870} + \dfrac{\sigma^2}{84{,}875}}} \le -2.33$

$\Rightarrow -1.91 \le (-2.33)\,\sigma\sqrt{\dfrac{1}{870} + \dfrac{1}{84{,}875}} \Rightarrow \sigma \le 24.056$

e. The true value of $\sigma$ is likely to be smaller than 24.056. This standard deviation would be too large for the ages of people.

8.26

a. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The rejection region requires $\alpha = .10$ in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is $z < -1.28$.

b. $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 < 0$

The test statistic is $z = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = -4.71$.

The rejection region is $z < -1.28$. (Refer to part a.)

Since the observed value of the test statistic falls in the rejection region ($z = -4.71 < -1.28$), H0 is rejected. There is sufficient evidence to indicate $\mu_1 - \mu_2 < 0$ at $\alpha = .10$.

c. Since the number of pairs is greater than 30, we do not need to assume that the population of differences is normal. The sampling distribution of $\bar{x}_d$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

d. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$\bar{x}_d \pm z_{.05}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -3.5 \pm 1.645\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -3.5 \pm 1.223 \Rightarrow (-4.723, -2.277)$

e. The confidence interval provides more information since it gives an interval of possible values for the difference between the population means.

8.27

a.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_d - 1 = 12 - 1 = 11$. From Table III, Appendix D, $t_{.05} = 1.796$. The rejection region is $t > 1.796$.

b. From Table III, with $df = n_d - 1 = 24 - 1 = 23$, $t_{.10} = 1.319$. The rejection region is $t > 1.319$.

c. From Table III, with $df = n_d - 1 = 4 - 1 = 3$, $t_{.025} = 3.182$. The rejection region is $t > 3.182$.

d. Using Minitab, with $df = n_d - 1 = 80 - 1 = 79$, $t_{.01} = 2.374$. The rejection region is $t > 2.374$.

8.28

a.

Pair  Difference
1     3
2     2
3     2
4     4
5     0
6     1

$\bar{x}_d = \dfrac{\sum d}{n_d} = \dfrac{12}{6} = 2$

$s_d^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = \dfrac{34 - \frac{12^2}{6}}{6 - 1} = 2$

b. $\mu_d = \mu_1 - \mu_2$

c. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_d - 1 = 6 - 1 = 5$, $t_{.025} = 2.571$. The confidence interval is:

$\bar{x}_d \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow 2 \pm 2.571\dfrac{\sqrt{2}}{\sqrt{6}} \Rightarrow 2 \pm 1.484 \Rightarrow (.516, 3.484)$

d. $H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \dfrac{2}{\sqrt{2}/\sqrt{6}} = 3.46$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_d - 1 = 6 - 1 = 5$. From Table III, Appendix D, $t_{.025} = 2.571$. The rejection region is $t < -2.571$ or $t > 2.571$.

Since the observed value of the test statistic falls in the rejection region ($t = 3.46 > 2.571$), H0 is rejected. There is sufficient evidence to indicate that the mean difference is different from 0 at $\alpha = .05$.

8.29

Let $\mu_1$ = mean of population 1 and $\mu_2$ = mean of population 2.

a. $H_0: \mu_d = 0$
$H_a: \mu_d < 0$

where $\mu_d = \mu_1 - \mu_2$

b. Some preliminary calculations are:

Pair  Population 1  Population 2  Difference, d
1     19            24            −5
2     25            27            −2
3     31            36            −5
4     52            53            −1
5     49            55            −6
6     34            34            0
7     59            66            −7
8     47            51            −4
9     17            20            −3
10    51            55            −4

$\bar{x}_d = \dfrac{\sum d}{n_d} = \dfrac{-37}{10} = -3.7$

$s_d^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = \dfrac{181 - \frac{(-37)^2}{10}}{10 - 1} = 4.9$

The test statistic is $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \dfrac{-3.7}{\sqrt{4.9}/\sqrt{10}} = -5.29$

The rejection region requires $\alpha = .10$ in the lower tail of the t-distribution with $df = n_d - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.10} = 1.383$. The rejection region is $t < -1.383$.

Since the observed value of the test statistic falls in the rejection region ($t = -5.29 < -1.383$), H0 is rejected. There is sufficient evidence to indicate the mean of population 1 is less than the mean for population 2 at $\alpha = .10$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_d - 1 = 10 - 1 = 9$, $t_{.05} = 1.833$. The 90% confidence interval is:

$\bar{x}_d \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -3.7 \pm 1.833\dfrac{\sqrt{4.9}}{\sqrt{10}} \Rightarrow -3.7 \pm 1.28 \Rightarrow (-4.98, -2.42)$

We are 90% confident that the difference in the two population means is between −4.98 and −2.42.

d. We must assume that the population of differences is normal, and the sample of differences is randomly selected.

8.30

a.

Some preliminary calculations:

$\bar{x}_d = \dfrac{\sum d}{n_d} = 11.7$, $s_d^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = 36.0103$

To determine if $\mu_1 - \mu_2 = \mu_d$ is different from 10, we test:

$H_0: \mu_d = 10$
$H_a: \mu_d \ne 10$

The test statistic is $z = \dfrac{\bar{x}_d - 10}{s_d/\sqrt{n_d}} = \dfrac{11.7 - 10}{\sqrt{36.0103}/\sqrt{40}} = 1.79$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = 1.79 \not> 1.96$), H0 is not rejected. There is insufficient evidence to indicate $\mu_1 - \mu_2 = \mu_d$ is different from 10 at $\alpha = .05$.

b. The p-value is $p = P(z \le -1.79) + P(z \ge 1.79) = (.5 - .4633) + (.5 - .4633) = .0367 + .0367 = .0734$. The probability of observing our test statistic or anything more unusual if H0 is true is .0734. Since this p-value is not small, there is no evidence to indicate $\mu_1 - \mu_2 = \mu_d$ is different from 10 at $\alpha = .05$.

c. No, we do not need to assume that the population of differences is normally distributed. Because our sample size is 40, the Central Limit Theorem applies.

8.31

a.

Let $\mu_1$ = mean neutral percentage for all head measurements and $\mu_2$ = mean neutral percentage for all neck measurements. To compare the mean neutral percentage for the head to the mean neutral percentage for the neck, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

where $\mu_d = \mu_1 - \mu_2$

b.

Since both head and neck data have been collected for each of the 31 workers, the data should be analyzed as a paired difference analysis.

c.

The summary statistics shown were calculated as if the samples were randomly and independently collected. The summary statistics for the paired difference analysis would be calculated differently.

d.

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$\bar{x}_d \pm z_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow 5.77 \pm 1.645\dfrac{s_d}{\sqrt{n_d}} \Rightarrow 5.77 \pm .443 \Rightarrow (5.327, 6.213)$

We are 90% confident that the mean neutral head percentage exceeds the mean neutral neck percentage by between 5.327% and 6.213%.

8.32

a.

For this situation, a paired-samples test is the most valid because there is much variation in opinions from person to person. By having each student rate both types of advertising, we can eliminate the variation among the students. Let $\mu_{\text{TV}}$ = mean typical score for TV advertising and $\mu_{\text{Magazine}}$ = mean typical score for magazine advertising. To determine if there is a difference in the mean typical scores between TV and magazine advertising, we test:

$H_0: \mu_{\text{TV}} - \mu_{\text{Magazine}} = 0$
$H_a: \mu_{\text{TV}} - \mu_{\text{Magazine}} \ne 0$

b.

The test statistic is 𝑡 = 6.96 and the p-value is 𝑝 = .001. Since the p-value is so small (𝑝 = .001), H0 is rejected. There is sufficient evidence to indicate that there is a difference in the mean typical scores between TV and magazine advertising for any value of 𝛼 > .001.

c.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The confidence interval is:

$\bar{x}_d \pm z_{.025}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow .45 \pm 1.96\dfrac{s_d}{\sqrt{n_d}} \Rightarrow .45 \pm .13 \Rightarrow (.32, .58)$

These values are very similar. The two means are probably not “practically significantly” different.

8.33

a.

Since the data were collected as “twin holes”, they need to be analyzed as paired differences.

b.

The differences are calculated by finding the difference between the 1st hole and the second hole.

Location  1st hole  2nd hole  Difference
1         5.5       5.7       -0.2
2         11.0      11.2      -0.2
3         5.9       6.0       -0.1
4         8.2       5.6       2.6
5         10.0      9.3       0.7
6         7.9       7.0       0.9
7         10.1      8.4       1.7
8         7.4       9.0       -1.6
9         7.0       6.0       1.0
10        9.2       8.1       1.1
11        8.3       10.0      -1.7
12        8.6       8.1       0.5
13        10.5      10.4      0.1
14        5.5       7.0       -1.5
15        10.0      11.2      -1.2

c. $\bar{x}_d = \dfrac{\sum d}{n_d} = \dfrac{2.1}{15} = 0.14$

$s_d^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = \dfrac{22.65 - \frac{(2.1)^2}{15}}{15 - 1} = 1.597$, $s_d = \sqrt{1.597} = 1.2637$

d. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_d - 1 = 15 - 1 = 14$, $t_{.05} = 1.761$. The 90% confidence interval is:

$\bar{x}_d \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow .14 \pm 1.761\dfrac{1.2637}{\sqrt{15}} \Rightarrow .14 \pm .575 \Rightarrow (-.435, .715)$

We are 90% confident that the true difference in the mean THM measurements between the 1st and 2nd hole is between -.435 and .715.

e. Yes, the geologists can conclude at $\alpha = .10$ that there is no evidence of a difference in the true mean THM measurements between the original holes and their twin holes, because 0 falls in the interval.
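The paired-difference arithmetic in parts b through d can be reproduced from the raw twin-hole data with Python's standard library. This is a minimal sketch: the variable names are illustrative, and the critical value 1.761 is copied from the solution (Table III) rather than computed.

```python
import math
from statistics import mean, variance

# THM measurements from the table in part b
first_hole = [5.5, 11.0, 5.9, 8.2, 10.0, 7.9, 10.1, 7.4,
              7.0, 9.2, 8.3, 8.6, 10.5, 5.5, 10.0]
second_hole = [5.7, 11.2, 6.0, 5.6, 9.3, 7.0, 8.4, 9.0,
               6.0, 8.1, 10.0, 8.1, 10.4, 7.0, 11.2]

# Paired differences: 1st hole minus 2nd hole
diffs = [a - b for a, b in zip(first_hole, second_hole)]

n_d = len(diffs)
xbar_d = mean(diffs)
s2_d = variance(diffs)        # sample variance, divisor n_d - 1
s_d = math.sqrt(s2_d)

t_crit = 1.761                # t_.05 with 14 df, taken from the solution
margin = t_crit * s_d / math.sqrt(n_d)
ci = (xbar_d - margin, xbar_d + margin)

print(round(xbar_d, 2), round(s2_d, 3), round(s_d, 4))
print(tuple(round(v, 3) for v in ci))
```

Because the interval is built from the differences rather than from the two columns separately, location-to-location variation drops out of the standard error, which is the point of the paired design.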

8.34

a.

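Let $\mu_d = \mu_1 - \mu_2$, where $\mu_1$ is the mean score for the fictitious brand and $\mu_2$ is the mean score for the commercially available brand. To determine if the mean score for the fictitious brand is greater than the mean score for the commercially available brand, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$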

b.

The data should be analyzed as paired differences. Each child rated both brands, so the samples are not independent.

c.

The rejection region requires $\alpha = .05$ in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$. Since the test statistic falls in the rejection region ($z = 5.71 > 1.645$), H0 is rejected. There is sufficient evidence to indicate the mean score for the fictitious brand is greater than the mean score for the commercially available brand at $\alpha = .05$.

d.

The p-value is 𝑝 = 𝑃(𝑧 ≥ 5.71) ≈ .5 − .5 = 0.

e.

Yes. Since the p-value is less than $\alpha = .01$, the conclusion would still be to reject H0.
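The table-based p-value in part d rounds to 0 because Table II stops well short of z = 5.71. A short Python sketch shows how small the upper-tail area actually is; the helper name `upper_tail_p` is illustrative, and the calculation uses `math.erfc` from the standard library to avoid the precision loss of computing 1 minus a value very close to 1.

```python
import math


def upper_tail_p(z: float) -> float:
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p = upper_tail_p(5.71)
print(p)                      # a very small positive number
print(p < 0.05, p < 0.01)     # H0 is rejected at either level
```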

8.35

a.

The data should be analyzed using a paired-difference experiment because that is how the data were collected. Response rates were observed twice from each survey using the “not selling” introduction method and the standard introduction method. Since the two sets of data are not independent, they cannot be analyzed using independent samples.

b.

Some preliminary calculations are:

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = .01325$

Let $\mu_1$ = mean response rate for those using the “not selling” introduction and $\mu_2$ = mean response rate for those using the standard introduction. Using the independent-samples t-test to determine if the mean response rate for “not selling” is higher than that for the standard introduction, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = .53$

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_1 + n_2 - 2 = 29 + 29 - 2 = 56$. From Table III, Appendix D, $t_{.05} \approx 1.671$. The rejection region is $t > 1.671$.

Since the observed value of the test statistic does not fall in the rejection region ($t = .53 \not> 1.671$), H0 is not rejected. There is insufficient evidence to indicate the mean response rate for “not selling” is higher than that for the standard introduction at $\alpha = .05$.

c. Since the p-value is less than $\alpha$ ($p = .001 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean response rate for “not selling” is higher than that for the standard introduction at $\alpha = .05$.

d. The two inferences in parts b and c have different results because using the independent-samples t-test is not appropriate for this study. The paired-difference design is better. There is much variation in response rates from survey to survey. By using the paired-difference design, we can eliminate the survey-to-survey differences.

8.36

Minitab was used to create the following summary statistics:

Descriptive Statistics

Sample      N   Mean    StDev  SE Mean
ConditionA  24  23.046  4.439  0.906
ConditionB  24  20.196  3.893  0.795

a. Minitab was used to create the 90% confidence interval for the difference between the mean COP ranges for Conditions A and B and is shown here:

Estimation for Paired Difference

Mean   StDev  SE Mean  90% CI for μ_difference
2.850  1.262  0.258    (2.408, 3.292)

µ_difference: mean of (ConditionA - ConditionB)

We are 90% confident that the mean COP range for Condition A exceeds the mean COP range for Condition B by between 2.408 and 3.292 millimeters.

b. Minitab was used to create the 90% confidence interval for the difference between the mean COP ranges for Conditions B and C and is shown here:

Estimation for Paired Difference

Mean   StDev  SE Mean  90% CI for μ_difference
0.479  1.880  0.384    (-0.178, 1.137)

µ_difference: mean of (ConditionB - ConditionC)

Because the value of 0 is contained in the 90% confidence interval, we cannot conclude that there is a difference between the mean COP range of Condition B and the mean COP range of Condition C.

8.37

Some preliminary calculations are:

Operator  Difference (Before - After)
1         5
2         3
3         9
4         7
5         2
6         −2
7         −1
8         11
9         0
10        5

$\bar{x}_d = \dfrac{\sum d}{n_d} = \dfrac{39}{10} = 3.9$

$s_d^2 = \dfrac{\sum d^2 - \frac{(\sum d)^2}{n_d}}{n_d - 1} = \dfrac{319 - \frac{39^2}{10}}{10 - 1} = 18.5444$

$s_d = \sqrt{18.5444} = 4.3063$

a. To determine if the new napping policy reduced the mean number of customer complaints, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

The test statistic is $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \dfrac{3.9}{4.3063/\sqrt{10}} = 2.864$

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n_d - 1 = 10 - 1 = 9$. From Table III, Appendix D, $t_{.05} = 1.833$. The rejection region is $t > 1.833$.

Since the observed value of the test statistic falls in the rejection region ($t = 2.864 > 1.833$), H0 is rejected. There is sufficient evidence to indicate the new napping policy reduced the mean number of customer complaints at $\alpha = .05$.

b. In order for the above test to be valid, we must assume that

1. The population of differences is normal
2. The differences are randomly selected

c. Variables that were not controlled that could lead to an invalid conclusion include time of day agents worked, day of the week agents worked, and how much sleep the agents got before working, among others.

8.38

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Initial, Final, Diff

Variable  N  Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Initial   3  5.640   1.075   4.560    4.560   5.650   6.710   6.710
Final     3  5.453   1.125   4.270    4.270   5.580   6.510   6.510
Diff      3  0.1867  0.1106  0.0700   0.0700  0.2000  0.2900  0.2900

Let $\mu_1$ = mean initial pH level, $\mu_2$ = mean final pH level, and $\mu_d = \mu_1 - \mu_2$ = difference in mean pH levels between the initial and final time periods. To determine if the mean pH level after 30 days differs from the initial pH level, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = \dfrac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \dfrac{.1867}{.1106/\sqrt{3}} = 2.924$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_d - 1 = 3 - 1 = 2$. From Table III, Appendix D, $t_{.025} = 4.303$. The rejection region is $t < -4.303$ or $t > 4.303$.

Since the observed value of the test statistic does not fall in the rejection region ($t = 2.924 \not> 4.303$), H0 is not rejected. There is insufficient evidence to indicate the mean pH level after 30 days differs from the initial pH level at $\alpha = .05$.

8.39

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: E-W, N-S, Diff

Variable  N  Mean    StDev  Minimum  Q1      Median  Q3      Maximum
E-W       5  7436    1484   5120     5991    7930    8633    8658
N-S       5  7719    1548   5274     6211    8317    8929    8936
Diff      5  -283.6  86.4   -387.0   -357.5  -286.0  -208.5  -154.0

Let $\mu_1$ = mean solar energy amount generated by East-West oriented highways and $\mu_2$ = mean solar energy amount generated by North-South oriented highways. For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with 𝑑𝑓 = 𝑛 − 1 = 5 − 1 = 4, $t_{.025} = 2.776$. The confidence interval is:

$\bar{x}_d \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow -283.6 \pm 2.776\frac{86.4}{\sqrt{5}} \Rightarrow -283.6 \pm 107.26 \Rightarrow (-390.86, -176.34)$

We are 95% confident that the difference in the mean amount of solar energy generated by East-West oriented highways and North-South oriented highways is between -390.86 and -176.34 kilowatt-hours. Since this interval contains only negative numbers, it supports the researchers' conclusion that two-layer solar panel energy generation is more viable for north-south oriented highways than for east-west oriented roadways.
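The paired-difference arithmetic in Exercise 8.39 can be checked with a short Python sketch. The function names below are illustrative, not from the text, and the critical value is assumed to be read from Table III rather than computed. The same summary statistics also give the paired t statistic used in Exercises 8.38 and 8.40.

```python
import math

def paired_t_statistic(mean_d, sd_d, n_d, mu_0=0.0):
    """t statistic for a paired-difference test of H0: mu_d = mu_0."""
    return (mean_d - mu_0) / (sd_d / math.sqrt(n_d))

def paired_ci(mean_d, sd_d, n_d, t_crit):
    """Confidence interval for mu_d; t_crit is taken from a t table."""
    margin = t_crit * sd_d / math.sqrt(n_d)
    return mean_d - margin, mean_d + margin

# Exercise 8.39: Diff row of the MINITAB output, t_.025 = 2.776 with df = 4
lower, upper = paired_ci(-283.6, 86.4, 5, t_crit=2.776)
print(round(lower, 2), round(upper, 2))
```

Because both endpoints are negative, the code reproduces the same reading as the solution: the interval excludes 0.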

8.40

Let $\mu_1$ = mean number of crashes caused by red light running per intersection before camera is installed and $\mu_2$ = mean number of crashes caused by red light running per intersection after camera is installed. The data are collected as paired data, so we will analyze the data using a paired t-test. Then, let $\mu_d = \mu_1 - \mu_2$. Using MINITAB to compute the differences (Di) and the summary statistics, the results are:

Descriptive Statistics: Before, After, Di

Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Before    13  2.513  1.976  0.270    0.805  2.400   3.405  7.350
After     13  1.506  1.448  0.000    0.260  1.360   2.380  4.920
Di        13  1.007  1.209  -0.850   0.265  0.560   2.335  2.780

To determine if the mean number of crashes caused by red light running has been reduced since the installation of red light cameras, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

The test statistic is $t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \frac{1.007}{1.209/\sqrt{13}} = 3.003$

The rejection region requires 𝛼 = .05 in the upper tail of the t-distribution with 𝑑𝑓 = 𝑛 − 1 = 13 − 1 = 12. From Table III, Appendix D, $t_{.05} = 1.782$. The rejection region is 𝑡 > 1.782. Since the observed value of the test statistic falls in the rejection region (𝑡 = 3.003 > 1.782), H0 is rejected. There is sufficient evidence to indicate that the photo-red enforcement program is effective in reducing red-light-running crash incidents at intersections at 𝛼 = .05. 8.41

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Male, Female, Diff

Variable  N   Mean   Median  StDev  Minimum  Maximum  Q1      Q3
Male      19  5.895  6.000   2.378  3.000    12.000   4.000   8.000
Female    19  5.526  5.000   2.458  3.000    12.000   4.000   7.000
Diff      19  0.368  1.000   3.515  -5.000   7.000    -3.000  3.000

Let $\mu_1$ = mean number of swims by male rat pups, $\mu_2$ = mean number of swims by female rat pups, and $\mu_d = \mu_1 - \mu_2$. To determine if there is a difference in the mean number of swims required by male and female rat pups, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \neq 0$

The test statistic is $t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \frac{0.368}{3.515/\sqrt{19}} = 0.46$

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with 𝑑𝑓 = 𝑛 − 1 = 19 − 1 = 18. From Table III, Appendix D, $t_{.05} = 1.734$. The rejection region is 𝑡 < −1.734 or 𝑡 > 1.734. Since the observed value of the test statistic does not fall in the rejection region (𝑡 = .46 ≯ 1.734), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the mean number of swims required by male and female rat pups at 𝛼 = .10. (Using MINITAB, the p-value ≈ .653.) Since the sample size is not large, we must assume that the population of differences is normally distributed and that the sample of differences is random. There is no indication that the sample differences are not from a random sample. However, because the number of swims is discrete, the differences are probably not normal. 8.42

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: HMETER, HSTATIC, Diff

Variable  N   Mean       StDev     Minimum    Q1         Median     Q3        Maximum
HMETER    40  1.0405     0.0403    0.9936     1.0047     1.0232     1.0883    1.1026
HSTATIC   40  1.0410     0.0410    0.9930     1.0043     1.0237     1.0908    1.1052
Diff      40  -0.000523  0.001291  -0.004480  -0.001078  -0.000165  0.000317  0.001580

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with 𝑑𝑓 = 𝑛 − 1 = 40 − 1 = 39, $t_{.025} \approx 2.021$. The 95% confidence interval is:

$\bar{x}_d \pm t_{.025}\frac{s_d}{\sqrt{n_d}} \Rightarrow -0.000523 \pm 2.021\frac{0.001291}{\sqrt{40}} \Rightarrow -0.000523 \pm 0.000413 \Rightarrow (-0.000936, -0.000110)$

We are 95% confident that the true difference in mean density measurements between the two methods is between -0.000936 and -0.000110. Since every value in this interval is smaller in absolute value than the desired maximum difference of .002, the winery should choose the alternative method of measuring wine density. 8.43

a.

From the exercise, we know that $x_1$ and $x_2$ are binomial random variables with the number of trials equal to $n_1$ and $n_2$. From Chapter 7, we know that for large n, the distribution of $\hat{p}_1 = x_1/n_1$ is approximately normal. Since $x_1$ is simply $\hat{p}_1$ multiplied by a constant, $x_1$ will also have an approximate normal distribution. Similarly, the distribution of $\hat{p}_2 = x_2/n_2$ is approximately normal, and thus, the distribution of $x_2$ is approximately normal.

b.

The Central Limit Theorem is necessary to find the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$ when n1 and n2 are large. Once we have established that both $\hat{p}_1$ and $\hat{p}_2$ have normal distributions, then the distribution of their difference will also be normal.

8.44

a.

The rejection region requires 𝛼 = .01 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is 𝑧 < −2.33.

b.

The rejection region requires 𝛼 = .025 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is 𝑧 < −1.96.

c.

The rejection region requires 𝛼 = .05 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 < −1.645.

d.

The rejection region requires 𝛼 = .10 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.10} = 1.28$. The rejection region is 𝑧 < −1.28.

8.45

From Section 6.4, it was given that the distribution of $\hat{p}$ is approximately normal if $n\hat{p} \geq 15$ and $n\hat{q} \geq 15$.

a.

$n_1\hat{p}_1 = 12(.42) = 5.04 < 15$ and $n_1\hat{q}_1 = 12(.58) = 6.96 < 15$
$n_2\hat{p}_2 = 14(.57) = 7.98 < 15$ and $n_2\hat{q}_2 = 14(.43) = 6.02 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

b.

$n_1\hat{p}_1 = 12(.92) = 11.04 < 15$ and $n_1\hat{q}_1 = 12(.08) = 0.96 < 15$
$n_2\hat{p}_2 = 14(.86) = 12.04 < 15$ and $n_2\hat{q}_2 = 14(.14) = 1.96 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

c.

$n_1\hat{p}_1 = 30(.70) = 21 > 15$ and $n_1\hat{q}_1 = 30(.30) = 9 < 15$
$n_2\hat{p}_2 = 30(.73) = 21.9 > 15$ and $n_2\hat{q}_2 = 30(.27) = 8.1 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

d.

$n_1\hat{p}_1 = 100(.93) = 93 > 15$ and $n_1\hat{q}_1 = 100(.07) = 7 < 15$
$n_2\hat{p}_2 = 250(.97) = 242.5 > 15$ and $n_2\hat{q}_2 = 250(.03) = 7.5 < 15$
Thus, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

e.

$n_1\hat{p}_1 = 125(.08) = 10 < 15$ and $n_1\hat{q}_1 = 125(.92) = 115 > 15$
$n_2\hat{p}_2 = 200(.12) = 24 > 15$ and $n_2\hat{q}_2 = 200(.88) = 176 > 15$
Since $n_1\hat{p}_1 < 15$, the sample sizes are not large enough to conclude the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal.

8.46
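The large-sample check applied in each part of Exercise 8.45 can be sketched in Python as follows. The helper names are hypothetical, and the threshold of 15 is the rule quoted from Section 6.4.

```python
def normal_approx_ok(n, p_hat, threshold=15):
    """True when both n*p_hat and n*q_hat reach the threshold."""
    q_hat = 1 - p_hat
    return n * p_hat >= threshold and n * q_hat >= threshold

def both_samples_ok(n1, p1_hat, n2, p2_hat):
    """The difference (p1_hat - p2_hat) needs both samples to pass the check."""
    return normal_approx_ok(n1, p1_hat) and normal_approx_ok(n2, p2_hat)

# Exercise 8.45 part e: the second sample passes, the first does not
print(normal_approx_ok(200, .12))
print(both_samples_ok(125, .08, 200, .12))
```

A single failing product is enough to reject the normal approximation for the difference, which is why part e fails even though three of its four products exceed 15.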

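For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for $p_1 - p_2$ is approximately:

a.

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.65 - .58) \pm 1.96\sqrt{\frac{.65(.35)}{n_1} + \frac{.58(.42)}{n_2}} \Rightarrow .07 \pm .067 \Rightarrow (.003, .137)$

b.

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.31 - .25) \pm 1.96\sqrt{\frac{.31(.69)}{n_1} + \frac{.25(.75)}{n_2}} \Rightarrow .06 \pm .086 \Rightarrow (-.026, .146)$

c.

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.46 - .61) \pm 1.96\sqrt{\frac{.46(.54)}{n_1} + \frac{.61(.39)}{n_2}} \Rightarrow -.15 \pm .131 \Rightarrow (-.281, -.019)$

8.47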

a.

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

Will need to calculate the following:

$\hat{p}_1 = \frac{x_1}{n_1} = .40$, $\hat{p}_2 = \frac{x_2}{n_2} = .50$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .45$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.40 - .50}{\sqrt{(.45)(.55)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -4.02$

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −4.02 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate that $p_1 > p_2$ at 𝛼 = .05.

b.

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is 𝑧 = −4.02.

The rejection region requires 𝛼/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, $z_{.005} = 2.58$. The rejection region is 𝑧 < −2.58 or 𝑧 > 2.58. Since the observed value of the test statistic falls in the rejection region (𝑧 = −4.02 < −2.58), H0 is rejected. There is sufficient evidence to indicate that the proportions are unequal at 𝛼 = .01.

c.

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 < 0$

The test statistic is the same as above, 𝑧 = −4.02. The rejection region requires 𝛼 = .01 in the lower tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is 𝑧 < −2.33. Since the observed value of the test statistic falls in the rejection region (𝑧 = −4.02 < −2.33), H0 is rejected. There is sufficient evidence to indicate that $p_1 < p_2$ at 𝛼 = .01.

d.

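For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.4 - .5) \pm 1.645\sqrt{\frac{(.4)(.6)}{n_1} + \frac{(.5)(.5)}{n_2}} \Rightarrow -.10 \pm .04 \Rightarrow (-.14, -.06)$

We are 90% confident that the difference between p1 and p2 is between −.14 and −.06. 8.48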

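$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = .646$

$\hat{q} = 1 - \hat{p} = 1 - .646 = .354$

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 1.14$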

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is 𝑧 > 1.645. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.14 ≯ 1.645), H0 is not rejected. There is insufficient evidence to indicate the proportion from population 1 is greater than that for population 2 at 𝛼 = .05. 8.49

a.

$\hat{p}_1 = \frac{x_1}{n_1} = .153$

b.

$\hat{p}_2 = \frac{x_2}{n_2} = .215$

c.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.153 - .215) \pm 1.645\sqrt{\frac{.153(.847)}{n_1} + \frac{.215(.785)}{n_2}} \Rightarrow -.062 \pm .070 \Rightarrow (-.132, .008)$

d.

We are 90% confident that the difference in the proportion of bidders who fall prey to the winner’s curse between super-experienced bidders and less-experienced bidders is between −.132 and .008. Since this interval contains 0, there is no evidence to indicate that there is a difference in the proportion of bidders who fall prey to the winner’s curse between super-experienced bidders and less-experienced bidders.

8.50

Let $p_1$ = redemption rate for milkshake stores and $p_2$ = redemption rate for donut stores.

a.

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{x_1}{2{,}447} = .032$

b.

$\hat{p}_2 = \frac{x_2}{n_2} = \frac{x_2}{6{,}619} = .011$

c.

$\hat{p}_1 - \hat{p}_2 = .032 - .011 = .021$

d.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is

$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \Rightarrow (.032 - .011) \pm 1.645\sqrt{\frac{.032(1 - .032)}{2{,}447} + \frac{.011(1 - .011)}{6{,}619}} \Rightarrow .021 \pm .006 \Rightarrow (.015, .027)$

We are 90% confident that the difference in the redemption rates between milkshake stores and donut stores is between .015 and .027.
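As a check on part d, the large-sample interval for a difference in proportions can be sketched in Python. The function name is illustrative, and the critical value 1.645 is taken from Table II rather than computed.

```python
import math

def two_proportion_ci(p1_hat, n1, p2_hat, n2, z_crit):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Exercise 8.50 part d: milkshake stores vs. donut stores, 90% confidence
lower, upper = two_proportion_ci(.032, 2447, .011, 6619, z_crit=1.645)
print(round(lower, 3), round(upper, 3))
```

The interval uses the unpooled standard error, as is standard for estimation; the pooled estimate appears only in the hypothesis-test statistic.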

e.

“90% confident” means that in repeated sampling, 90% of all confidence intervals constructed in a similar manner will contain the true difference in redemption rates.

f.

Since 0 is not contained in the interval, there is a statistical difference in the redemption rates.

g.

The redemption rates are “practically different” if the rates differ by more than .01. Since the entire interval lies above .01, the redemption rates are “practically different”.

8.51

a.

Let 𝑝 =proportion of men who provided a “bending” negotiating strategy and 𝑝 = proportion of women who provided a “bending” negotiating strategy The parameter of interest is 𝑝 − 𝑝 .

b.

To determine if the proportion of men who provided a “bending” negotiating strategy differs from the proportion of women, we test: 𝐻 : 𝑝 −𝑝 =0 𝐻 : 𝑝 −𝑝 ≠0

c.

The test statistic is 𝑧 = −2.63.

d.

The rejection region requires 𝛼/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.005 = 2.58. The rejection region is 𝑧 < −2.58 or 𝑧 > 2.58.

e.

The p-value is 𝑝 = .008.

f.

Since the observed value of the test statistic falls in the rejection region (𝑧 = −2.63 < −2.58), H0 is rejected. There is sufficient evidence to indicate that the proportion of men who provided a “bending” negotiating strategy differs from the proportion of women at 𝛼 = .01. Since the p-value is less than 𝛼 (𝑝 = .008 < .01), H0 is rejected. This is the same conclusion as above.
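The p-value comparison in Exercise 8.51 can be reproduced without a table. The sketch below builds the standard normal CDF from `math.erf`; the function names are illustrative. With the rounded statistic 𝑧 = −2.63 it returns a two-tailed p-value of roughly .0085, close to the reported .008 and still below 𝛼 = .01.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p_value(z):
    """p-value for Ha: p1 - p2 != 0."""
    return 2 * normal_cdf(-abs(z))

def upper_tail_p_value(z):
    """p-value for Ha: p1 - p2 > 0."""
    return 1 - normal_cdf(z)

p = two_tailed_p_value(-2.63)
print(p < .01)
```

The same helper covers one-tailed tests, for example `upper_tail_p_value(1.28)` for an upper-tailed alternative.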

8.52

a.

𝑝̂ White =

b.

To determine if the true break-off rate for web users of the red welcome screen will be lower than the corresponding break-off rate for the white welcome screen, we test:

= .258

𝑝̂ Red =

= .258

𝐻 : 𝑝White − 𝑝Red = 0 𝐻 : 𝑝White − 𝑝Red > 0 c.

$\hat{p} = \frac{x_{\text{White}} + x_{\text{Red}}}{n_{\text{White}} + n_{\text{Red}}} = .2306$

$\hat{q} = 1 - \hat{p} = 1 - .2306 = .7694$

$z = \frac{(\hat{p}_{\text{White}} - \hat{p}_{\text{Red}}) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_{\text{White}}} + \frac{1}{n_{\text{Red}}}\right)}} = 1.28$

d.

𝑝 = 𝑃(𝑧 > 1.28) = .5 − .3997 = .1003 (Using Table II, Appendix D)

e.

Since the p-value is greater than 𝛼 (𝑝 = .1003 > .10), H0 is not rejected. There is insufficient evidence to indicate that the true break-off rate for web users of the red welcome screen will be lower than the corresponding break-off rate for the white welcome screen at 𝛼 = .10.

8.53

a.

Let $p_1$ = proportion of sports-playing children in 2019 who earned a college athletic scholarship and $p_2$ = proportion of sports-playing children in 2016 who earned a college athletic scholarship.

b.

Some preliminary calculations:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{x_1}{1{,}001} = .110$, $\hat{q}_1 = 1 - \hat{p}_1 = 1 - .110 = .890$

$\hat{p}_2 = \frac{x_2}{n_2} = \frac{x_2}{1{,}001} = .250$, $\hat{q}_2 = 1 - \hat{p}_2 = 1 - .250 = .750$

The point estimate is $(\hat{p}_1 - \hat{p}_2) = .110 - .250 = -.140$

c.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. A 90% confidence interval for the difference in the proportions of sports-playing children who earned a college athletic scholarship in the two years is

$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.110 - .250) \pm 1.645\sqrt{\frac{.110(.890)}{1{,}001} + \frac{.250(.750)}{1{,}001}} \Rightarrow -.140 \pm .028 \Rightarrow (-.168, -.112)$

d.

Since this interval contains only negative values, we conclude that we are 90% confident that the proportion of sports-playing children in 2019 who earned a college athletic scholarship is less than the proportion of sports-playing children in 2016 who earned a college athletic scholarship.

8.54

Let $p_1$ = proportion of traffic signs that fail the minimum FHWA retroreflectivity requirements that are maintained by the NCDOT and $p_2$ = proportion of traffic signs that fail the minimum FHWA retroreflectivity requirements that are maintained by county owned roads.

Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .512$, $\hat{p}_2 = \frac{x_2}{n_2} = .328$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .420$, $\hat{q} = 1 - \hat{p} = 1 - .420 = .580$

To determine if there is a difference in the failure rates between signs maintained by the NCDOT and county roads, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.512 - .328}{\sqrt{.420(.580)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 8.34$.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96. Since the test statistic falls in the rejection region (𝑧 = 8.34 > 1.96), H0 is rejected. There is sufficient evidence to indicate there is a difference in the failure rates between signs maintained by the NCDOT and county roads at 𝛼 = .05. 8.55

Let $p_1$ = proportion of salmonella in the region’s water and $p_2$ = proportion of salmonella in the region’s wildlife. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .071$, $\hat{p}_2 = \frac{x_2}{n_2} = .042$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .052$

To determine if the prevalence of salmonella in the region’s water differs from the prevalence of salmonella in the region’s wildlife, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.071 - .042}{\sqrt{.052(.948)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 1.68$

The rejection region requires 𝛼/2 = .01/2 = .005 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.005 = 2.58. The rejection region is 𝑧 < −2.58 or 𝑧 > 2.58. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = 1.68 ≯ 2.58), H0 is not rejected. There is insufficient evidence to indicate the prevalence of salmonella in the region’s water differs from the prevalence of salmonella in the region’s wildlife at 𝛼 = .01. 8.56

Let $p_1$ = proportion of patients in the angioplasty group who had subsequent heart attacks and $p_2$ = proportion of patients in the medication only group who had subsequent attacks. Some preliminary calculations:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{x_1}{1{,}145} = .184$, $\hat{q}_1 = 1 - \hat{p}_1 = 1 - .184 = .816$

$\hat{p}_2 = \frac{x_2}{n_2} = \frac{x_2}{1{,}142} = .177$, $\hat{q}_2 = 1 - \hat{p}_2 = 1 - .177 = .823$

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. A 95% confidence interval for the difference in the rate of heart attacks for the two groups is

$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.184 - .177) \pm 1.96\sqrt{\frac{.184(.816)}{1{,}145} + \frac{.177(.823)}{1{,}142}} \Rightarrow .007 \pm .032 \Rightarrow (-.025, .039)$

Since this interval contains 0, there is insufficient evidence to indicate that there is a difference in the rate of heart attacks between the angioplasty group and the medication only group at 𝛼 = .05. Yes, we agree with the study’s conclusion. 8.57

a.

Let $p_1$ = proportion of products purchased that are considered “healthy” during the indulgent scent hour and $p_2$ = proportion of products purchased that are considered “healthy” during the non-indulgent scent hour. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .295$, $\hat{p}_2 = \frac{x_2}{n_2} = .455$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .398$

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.295 - .455) \pm 1.96\sqrt{\frac{.295(.705)}{292} + \frac{.455(.545)}{527}} \Rightarrow -.160 \pm .067 \Rightarrow (-.227, -.093)$

We are 95% confident that the difference in the proportion of products purchased that are considered “healthy” during the indulgent scent hour and the proportion of products purchased that are considered “healthy” during the non-indulgent scent hour is between -.227 and -.093.

b.

To determine if there is a difference in the proportion of products purchased that are considered “healthy” during the indulgent scent hour and the proportion of products purchased that are considered “healthy” during the non-indulgent scent hour, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -4.51$.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96. Since the test statistic falls in the rejection region (𝑧 = −4.51 < −1.96), H0 is rejected. There is sufficient evidence to indicate there is a difference in the proportion of products purchased that are considered “healthy” during the indulgent scent hour and the proportion of products purchased that are considered “healthy” during the non-indulgent scent hour at 𝛼 = .05.

c.

Both the confidence interval and the test of hypothesis lead to the same conclusion: the two proportions differ.
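The pooled test statistic in Exercise 8.57 part b can be sketched in Python. The function name is illustrative, and the inputs are the rounded sample proportions and the sample sizes 292 and 527 that appear in part a. Because the proportions are rounded to three decimals, the sketch gives about −4.48 rather than the −4.51 reported above; the conclusion is unchanged.

```python
import math

def pooled_z(p1_hat, n1, p2_hat, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Exercise 8.57: indulgent vs. non-indulgent scent hours
z = pooled_z(.295, 292, .455, 527)
print(round(z, 2), abs(z) > 1.96)
```

Pooling is appropriate here because the standard error is computed under the null hypothesis that the two proportions are equal.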


8.58

Let $p_1$ = accuracy rate for modules with correct code and $p_2$ = accuracy rate for modules with defective code. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{x_1}{449} = .891$, $\hat{p}_2 = \frac{x_2}{n_2} = \frac{x_2}{49} = .408$

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{.005}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.891 - .408) \pm 2.58\sqrt{\frac{.891(.109)}{449} + \frac{.408(.592)}{49}} \Rightarrow .483 \pm .185 \Rightarrow (.298, .668)$

We are 99% confident that the difference in accuracy rates between modules with correct code and modules with defective code is between .298 and .668. 8.59

Let $p_1$ = proportion of server-flow sites that are vulnerable to impersonation attacks and $p_2$ = proportion of client-flow sites that are vulnerable to impersonation attacks. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .500$, $\hat{p}_2 = \frac{x_2}{n_2} = .759$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .649$, $\hat{q} = 1 - \hat{p} = 1 - .649 = .351$

To determine if a client-flow website is more likely to be vulnerable to an impersonation attack than a server-flow website, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 < 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.500 - .759}{\sqrt{.649(.351)\left(\frac{1}{40} + \frac{1}{54}\right)}} = -2.60$.

The p-value is 𝑝 = 𝑃(𝑧 < −2.60) = .5 − .4953 = .0047 (Using Table II, Appendix D). Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate a client-flow website is more likely to be vulnerable to an impersonation attack than a server-flow website for any value of 𝛼 > .0047.

To determine how much more likely a client-flow website is to be vulnerable to an impersonation attack than a server-flow website, we form a 95% confidence interval. For confidence level .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_2 - \hat{p}_1) \pm z_{.025}\sqrt{\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2} + \frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}} \Rightarrow (.759 - .500) \pm 1.96\sqrt{\frac{.759(1 - .759)}{54} + \frac{.500(1 - .500)}{40}} \Rightarrow .259 \pm .192 \Rightarrow (.067, .451)$

We are 95% confident that a client-flow website is more likely to be vulnerable to an impersonation attack than a server-flow website by anywhere from .067 to .451. 8.60

Let $p_1$ = proportion of TV commercials ten years earlier that used religious symbolism and $p_2$ = proportion of TV commercials in a recent study that used religious symbolism. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .020$, $\hat{p}_2 = \frac{x_2}{n_2} = .034$, $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .029$

To determine if the percentage of TV commercials that use religious symbolism has changed in the previous ten years, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \neq 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.020 - .034}{\sqrt{.029(.971)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = -1.90$

Since no 𝛼 was given, we will use 𝛼 = .05. The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, 𝑧.025 = 1.96. The rejection region is 𝑧 < −1.96 or 𝑧 > 1.96. Since the observed value of the test statistic does not fall in the rejection region (𝑧 = −1.90 ≮ −1.96), H0 is not rejected. There is insufficient evidence to indicate the percentage of TV commercials that use religious symbolism has changed in the previous ten years at 𝛼 = .05. 8.61

a.

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, $z_{.005} = 2.58$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = 29{,}953.8 \approx 29{,}954$

b.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. Since we have no prior information about the proportions, we use $p_1 = p_2 = .5$ to get a conservative estimate. For a width of .05, the margin of error is .025.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = \frac{(1.645)^2(.5(.5) + .5(.5))}{(.025)^2} = 2{,}164.82 \approx 2{,}165$

c.

From part b, $z_{.05} = 1.645$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = 1{,}112.5 \approx 1{,}113$

8.62

a.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = 192.83 \approx 193$

b.

If the range of each population is 60, we would estimate 𝜎 by 𝜎 ≈ 60/4 = 15. For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, $z_{.005} = 2.58$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = \frac{(2.58)^2(15^2 + 15^2)}{(SE)^2} = 46.80 \approx 47$

c.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. For a width of 1, the margin of error is .5.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = \frac{(1.645)^2(\sigma_1^2 + \sigma_2^2)}{(.5)^2} = 143.96 \approx 144$

8.63

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = 33.2 \approx 34$

8.64

First, find the sample sizes needed for width 5, or margin of error 2.5. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = \frac{(1.645)^2(\sigma_1^2 + \sigma_2^2)}{(2.5)^2} = 86.59 \approx 87$

Thus, the necessary sample size from each population is 87. Therefore, sufficient funds have been allocated to meet the specifications since $n_1 = n_2 = 100$ are large enough samples. 8.65

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, 𝑧.025 = 1.96.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = 155.6 \approx 156$

8.66

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. From the data found in Exercise 8.12, 𝑠 = .353752.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(SE)^2} = 30.2 \approx 31$

8.67

a.

The parameter of interest is 𝑝Server − 𝑝Client .

b.

The desired confidence level is .95.

c.

The desired sampling error is .15.

d.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The required sample sizes are

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = \frac{(1.96)^2(p_1q_1 + p_2q_2)}{(.15)^2} = 85.4 \approx 86$.

e.

Assume that one sample is twice as large as the other. Then solving the sampling-error equation for the smaller sample size gives 64, and the larger sample size is 2(64) = 128. 8.68

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. Since no information is given about the values of $p_1$ and $p_2$, we will be conservative and use .5 for both. A width of .04 means the margin of error is .04/2 = .02.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = \frac{(1.645)^2(.5(.5) + .5(.5))}{(.02)^2} = 3{,}382.5 \approx 3{,}383$

8.69
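The sample-size arithmetic used in Exercise 8.68, and in Exercises 8.61 and 8.71, can be sketched in Python. The function name is illustrative; the defaults of .5 reflect the conservative choice described in the solutions, and the result is always rounded up.

```python
import math

def sample_size_two_proportions(z_crit, margin, p1=0.5, p2=0.5):
    """Equal sample sizes n1 = n2 for estimating p1 - p2 to within `margin`."""
    n = z_crit ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin ** 2
    return math.ceil(n)  # always round up so the margin is not exceeded

print(sample_size_two_proportions(1.645, .02))            # Exercise 8.68
print(sample_size_two_proportions(1.645, .025))           # Exercise 8.61 part b
print(sample_size_two_proportions(1.96, .01, .06, .08))   # Exercise 8.71
```

Rounding up rather than to the nearest integer is deliberate: a sample one unit too small would give a margin of error slightly larger than the target.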

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, 𝑧.005 = 2.575. Assume that one sample is twice as large as the other. Then solving the sampling-error equation for the smaller sample size gives 159.14 ≈ 160, and the larger sample size is 2(160) = 320. 8.70

From Exercise 8.39, $s_d = 86.4$. For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, 𝑧.05 = 1.645.

$n_d = \frac{(z_{\alpha/2})^2\sigma_d^2}{(SE)^2} = \frac{(1.645)^2(86.4)^2}{(SE)^2} = 32.3 \approx 33$

Since there were 5 observations in the study, we would need an additional 28 observations to get the required 33 total observations. 8.71

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = \frac{(1.96)^2(.06(.94) + .08(.92))}{(.01)^2} = 4{,}994.1 \approx 4{,}995$

8.72

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. Since no information is given about the values of $p_1$ and $p_2$, we will be conservative and use .5 for both.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(SE)^2} = \frac{(1.645)^2(.5(.5) + .5(.5))}{(SE)^2} = 1{,}503.3 \approx 1{,}504$

8.73

a.

With $\nu_1 = 9$ and $\nu_2 = 6$, $F_{.05} = 4.10$.

b.

With $\nu_1 = 18$ and $\nu_2 = 14$, $F_{.01} \approx 3.57$. (Since $\nu_1 = 18$ is not given, we estimate the value between those for $\nu_1 = 15$ and $\nu_1 = 20$.)

c.

With $\nu_1 = 11$ and $\nu_2 = 4$, $F_{.025} \approx 8.80$. (Since $\nu_1 = 11$ is not given, we estimate the value by averaging those given for $\nu_1 = 10$ and $\nu_1 = 12$.)

d.

With $\nu_1 = 20$ and $\nu_2 = 5$, $F_{.10} = 3.21$.

8.74

a.

With $\nu_1 = 2$ and $\nu_2 = 30$, 𝑃(𝐹 ≥ 5.39) = .01 (Table VIII, Appendix D)

b.

With $\nu_1 = 24$ and $\nu_2 = 10$, 𝑃(𝐹 ≥ 2.74) = .05 (Table VI, Appendix D)
Thus, 𝑃(𝐹 < 2.74) = 1 − 𝑃(𝐹 ≥ 2.74) = 1 − .05 = .95.

c.

With $\nu_1 = 7$ and $\nu_2 = 1$, 𝑃(𝐹 > 236.8) = .05 (Table VI, Appendix D)
Thus, 𝑃(𝐹 ≤ 236.8) = 1 − 𝑃(𝐹 > 236.8) = 1 − .05 = .95.

d.

With $\nu_1 = 40$ and $\nu_2 = 40$, 𝑃(𝐹 > 2.11) = .01 (Table VIII, Appendix D)

8.75

a.

Reject H0 if $F > F_{.10} = 1.74$. (From Table V, Appendix D, with $\nu_1 = 30$ and $\nu_2 = 20$.)

b.

Reject H0 if $F > F_{.05} = 2.04$. (From Table VI, Appendix D, with $\nu_1 = 30$ and $\nu_2 = 20$.)

c.

Reject H0 if $F > F_{.025} = 2.35$. (From Table VII.)

d.

Reject H0 if $F > F_{.01} = 2.78$. (From Table VIII.)

8.76

To test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_a: \sigma_1^2 \neq \sigma_2^2$, the rejection region is $F > F_{\alpha/2}$ with $\nu_1 = 10$ and $\nu_2 = 12$.

a.

𝛼 = .20 and 𝛼/2 = .20/2 = .10: Reject H0 if $F > F_{.10} = 2.19$ (Table V, Appendix D)

b.

𝛼 = .10 and 𝛼/2 = .10/2 = .05: Reject H0 if $F > F_{.05} = 2.75$ (Table VI, Appendix D)

c.

𝛼 = .05 and 𝛼/2 = .05/2 = .025: Reject H0 if $F > F_{.025} = 3.37$ (Table VII, Appendix D)

d.

𝛼 = .02 and 𝛼/2 = .02/2 = .01: Reject H0 if $F > F_{.01} = 4.30$ (Table VIII, Appendix D)

8.77

a.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 25 - 1 = 24$ and $\nu_2 = n_2 - 1 = 20 - 1 = 19$. From Table VI, Appendix D, $F_{.05} = 2.11$. The rejection region is 𝐹 > 2.11 (if $s_1^2 > s_2^2$).

b.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 15 - 1 = 14$ and $\nu_2 = n_2 - 1 = 10 - 1 = 9$. From Table VI, Appendix D, $F_{.05} \approx 3.01$. The rejection region is 𝐹 > 3.01 (if $s_1^2 > s_2^2$).

c.

The rejection region requires 𝛼/2 = .10/2 = .05 in the upper tail of the F-distribution. If $s_1^2 > s_2^2$, $\nu_1 = n_1 - 1 = 21 - 1 = 20$ and $\nu_2 = n_2 - 1 = 31 - 1 = 30$. From Table VI, Appendix D, $F_{.05} = 1.93$. The rejection region is 𝐹 > 1.93. If $s_1^2 < s_2^2$, $\nu_1 = n_2 - 1 = 30$ and $\nu_2 = n_1 - 1 = 20$. From Table VI, $F_{.05} = 2.04$. The rejection region is 𝐹 > 2.04.

d.

The rejection region requires 𝛼 = .01 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 41 - 1 = 40$ and $\nu_2 = n_2 - 1 = 31 - 1 = 30$. From Table VIII, Appendix D, $F_{.01} = 2.30$. The rejection region is 𝐹 > 2.30 (if $s_1^2 > s_2^2$).

e.

For 𝛼 = .05, the rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution. If $s_1^2 > s_2^2$, $\nu_1 = n_1 - 1 = 7 - 1 = 6$ and $\nu_2 = n_2 - 1 = 16 - 1 = 15$. From Table VII, Appendix D, $F_{.025} = 3.41$. The rejection region is 𝐹 > 3.41. If $s_1^2 < s_2^2$, $\nu_1 = n_2 - 1 = 15$ and $\nu_2 = n_1 - 1 = 6$. From Table VII, Appendix D, $F_{.025} = 5.27$. The rejection region is 𝐹 > 5.27.

8.78

a.

To determine if a difference exists between the population variances, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{s_1^2}{s_2^2} = 2.26$

The rejection region requires 𝛼/2 = .10/2 = .05 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 27 - 1 = 26$ and $\nu_2 = n_2 - 1 = 12 - 1 = 11$. From Table VI, Appendix D, $F_{.05} \approx 2.60$. The rejection region is 𝐹 > 2.60. Since the observed value of the test statistic does not fall in the rejection region (𝐹 = 2.26 ≯ 2.60), H0 is not rejected. There is insufficient evidence to indicate a difference between the population variances at 𝛼 = .10.

b.

The p-value is 𝑝 = 2𝑃(𝐹 ≥ 2.26). From Tables V and VI, with $\nu_1 = 26$ and $\nu_2 = 11$, 2(.05) < 2𝑃(𝐹 ≥ 2.26) < 2(.10) ⇒ .10 < 2𝑃(𝐹 ≥ 2.26) < .20. There is no evidence to reject H0 for 𝛼 ≤ .10.

8.79

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Sample 1, Sample 2

Variable   N  Mean   Median  StDev  Minimum  Maximum  Q1     Q3
Sample 1   6  2.417  2.400   1.436  0.700    4.400    1.075  3.650
Sample 2   5  4.36   3.70    2.97   1.40     8.90     1.84   7.20

To determine if the variance for population 2 is greater than that for population 1, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 < \sigma_2^2$

The test statistic is $F = \dfrac{s_2^2}{s_1^2} = \dfrac{2.97^2}{1.436^2} = 4.28$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 5 - 1 = 4$ and $\nu_2 = n_1 - 1 = 6 - 1 = 5$. From Table VI, Appendix D, $F_{.05} = 5.19$. The rejection region is $F > 5.19$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 4.28 \not> 5.19$), $H_0$ is not rejected. There is insufficient evidence to indicate the variance for population 2 is greater than that for population 1 at $\alpha = .05$.

b.

The p-value is $p = P(F \ge 4.28)$. From Tables V and VI, with $\nu_1 = 4$ and $\nu_2 = 5$,

$.05 < p = P(F \ge 4.28) < .10$

There is no evidence to reject $H_0$ for $\alpha = .05$, but there is evidence to reject $H_0$ for $\alpha = .10$.
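The arithmetic behind the one-tailed variance test in this exercise can be checked with a short script. This is only a sketch: the function name `f_statistic` is hypothetical, the summary values are taken from the MINITAB printout, and the critical value 5.19 is typed in from Table VI rather than computed.

```python
def f_statistic(s_num: float, s_den: float) -> float:
    """Variance-ratio statistic F = s_num^2 / s_den^2 from two sample standard deviations."""
    return s_num ** 2 / s_den ** 2


# Summary values from the MINITAB printout (Sample 1 and Sample 2)
s1, n1 = 1.436, 6
s2, n2 = 2.97, 5

# Ha: sigma_1^2 < sigma_2^2, so the sample 2 variance goes in the numerator
F = f_statistic(s2, s1)
df_num, df_den = n2 - 1, n1 - 1

F_CRIT = 5.19  # F_.05 with (4, 5) df, read from Table VI (an input, not computed here)
reject = F > F_CRIT

print(f"F = {F:.2f} with ({df_num}, {df_den}) df; reject H0: {reject}")
```

Because the alternative is one-sided, the numerator is fixed by the hypothesis rather than by which sample variance happens to be larger.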

8.80

a.

Let $\sigma_{\text{low}}^2$ = variance of the capitalization rates for low ratings and $\sigma_{\text{high}}^2$ = variance of the capitalization rates for high ratings. To determine if the variability in the capitalization rates differs for the two rating levels, we test:

$H_0: \sigma_{\text{low}}^2 = \sigma_{\text{high}}^2$
$H_a: \sigma_{\text{low}}^2 \ne \sigma_{\text{high}}^2$

b.

Minitab was used to generate the following printout:

Descriptive Statistics

CRED RATING  N  StDev  Variance  95% CI for σ
High         3  0.722  0.521     (0.106, 14.176)
Low          4  0.492  0.242     (0.139, 3.410)

$s_{\text{low}}^2 = 0.242$, $s_{\text{high}}^2 = 0.521$

c.

The test statistic is $F = \dfrac{s_{\text{high}}^2}{s_{\text{low}}^2} = \dfrac{0.521}{0.242} = 2.15$.

d.

The p-value is $p = 2P(F \ge 2.15)$. From Table V, with $\nu_1 = 2$ and $\nu_2 = 3$,

$2(.10) < 2P(F \ge 2.15) \Rightarrow .20 < 2P(F \ge 2.15)$

There is no evidence to reject $H_0$ for $\alpha = .05$.

e.

There is insufficient evidence to indicate that the variability in the capitalization rates differs for the two rating levels at $\alpha = .05$.

f.

We must assume that the two populations of capitalization rates are normally distributed. We must also assume that we selected two independent random samples.

8.81

a.

Let $\sigma_1^2$ = the variance of the size of normal packets and $\sigma_2^2$ = the variance of the size of attacked data packets. To determine if the size variability of normal packets differs from the size variability of attacked data packets, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

b.

The test statistic is 0.8812

c.

The rejection region requires 𝛼/2 = .01/2 = .005 in the upper tail of the F-distribution. Since 𝑠 > 𝑠 , 𝜈 = 𝑛 − 1 = 160 − 1 = 159 and 𝜈 = 𝑛 − 1 = 130 − 1 = 129. From Table VIII, Appendix D, 𝐹. ≈ 1.53. The rejection region is 𝐹 > 1.53

d.

The p-value p = 0.4561.

e.

Using either the p-value approach (𝛼 = .01 < .4561 = 𝑝) or the rejection region approach (𝐹 = 0.8812 ≯ 1.53), H0 cannot be rejected. There is insufficient evidence to indicate that the size variability of normal packets differs from the size variability of attacked data packets at 𝛼 = .01.

f.

The 99% confidence interval is (0.5733, 1.3661). Since the value 1 is contained in the interval, we are unable to determine that a difference exists in the size variability of normal packets and the size variability of attacked data packets at the 99% confidence level.

8.82

a.

The amount of variability of GHQ scores tells us how similar or different the members of the group are on GHQ scores. The larger the variability, the larger the differences are among the members on the GHQ scores. The smaller the variability, the smaller the differences are among the members on the GHQ scores.

b.

Let $\sigma_1^2$ = variance of the mental health scores of the employed and $\sigma_2^2$ = variance of the mental health scores of the unemployed. To determine if the variability in mental health scores differs for employed and unemployed workers, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

c.

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = 2.45$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with numerator degrees of freedom $\nu_1 = 49 - 1 = 48$ and denominator degrees of freedom $\nu_2 = 142 - 1 = 141$. Using MINITAB,

Inverse Cumulative Distribution Function

F distribution with 48 DF in numerator and 141 DF in denominator

P( X <= x )        x
      0.975  1.55339

The rejection region is $F > 1.55$. Since the observed value of the test statistic falls in the rejection region ($F = 2.45 > 1.55$), $H_0$ is rejected. There is sufficient evidence to indicate that the variability in mental health scores differs for employed and unemployed workers at $\alpha = .05$.

d.

We must assume that the two populations of mental health scores are normally distributed. We must also assume that we selected two independent random samples.

8.83

Let $\sigma_1^2$ = variance at site 1 and $\sigma_2^2$ = variance at site 2. To determine if the variances at the two locations differ, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

From the printout, the test statistic is $F = .84$ and the p-value is $p = .681$. Since the p-value is not less than $\alpha$ ($p = .681 \not< .05$), $H_0$ is not rejected. There is insufficient evidence to indicate the variances at the two locations differ at $\alpha = .05$.

8.84

a.

Using MINITAB, the output for comparing the variances is:

95% Confidence Intervals

Method  CI for StDev Ratio  CI for Variance Ratio
F       (0.971, 2.202)      (0.942, 4.851)

The 95% confidence interval for the ratio of the variances is (.942, 4.851). We are 95% confident that the ratio of the variances for the two groups is between .942 and 4.851.

b.

No. If there were no difference in the variances of the two groups, the ratio would be 1. Since 1 is contained in the interval, there is no evidence that one group has a larger variance in response time than the other.

c.

Since we concluded that there is no evidence of a difference in the variances, the inference from Exercise 8.13 is valid.

8.85

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Novice, Experienced

Variable  N   Mean   Median  StDev  Minimum  Maximum  Q1     Q3
Novice    12  32.83  32.00   8.64   20.00    48.00    26.75  39.00
Experien  12  20.58  19.50   5.74   10.00    31.00    17.25  24.75

a.

Let $\sigma_1^2$ = variance in inspection errors for novice inspectors and $\sigma_2^2$ = variance in inspection errors for experienced inspectors. Since we wish to determine if the data support the belief that the variance is lower for experienced inspectors than for novice inspectors, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 > \sigma_2^2$

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_1^2}{s_2^2} = \dfrac{8.64^2}{5.74^2} = 2.27$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 12 - 1 = 11$ and $\nu_2 = n_2 - 1 = 12 - 1 = 11$. Using MINITAB:

Inverse Cumulative Distribution Function

F distribution with 11 DF in numerator and 11 DF in denominator

P( X <= x )        x
       0.95  2.81793

The rejection region is $F > 2.82$. Since the observed value of the test statistic does not fall in the rejection region ($F = 2.27 \not> 2.82$), $H_0$ is not rejected. The sample data do not support her belief at $\alpha = .05$.

b.

Using MINITAB:

Cumulative Distribution Function

F distribution with 11 DF in numerator and 11 DF in denominator

   x  P( X <= x )
2.27     0.905144

The p-value is $p = P(F \ge 2.27) = 1 - P(F < 2.27) = 1 - .905 = .095$.

8.86

In order to perform the t-test, we must assume that the variances of the two groups are the same. Let $\sigma_1^2$ = variance of internal oil content for sweet potato slices fried at 130° and $\sigma_2^2$ = variance of internal oil content for sweet potato slices subjected to a two-stage frying process. To determine if there is a difference in the variances of the two groups, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = 30.25$.

Since no significance level was given, we will use $\alpha = .05$. The rejection region requires $\alpha/2 = .05/2 = .025$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 6 - 1 = 5$ and $\nu_2 = n_2 - 1 = 6 - 1 = 5$. From Table VII, Appendix D, $F_{.025} = 7.15$. The rejection region is $F > 7.15$.

Since the observed value of the test statistic falls in the rejection region ($F = 30.25 > 7.15$), $H_0$ is rejected. There is sufficient evidence to indicate the variances for the two groups differ at $\alpha = .05$. Since there is evidence that the variances are not equal, we would not recommend that the researchers carry out the analysis.

8.87

For each scenario, we will compute the test statistic that would be used to test whether there is a difference between the two variances.

For the first scenario, with $s_1 = 4$ and $s_2 = 2$, the test statistic is

$F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_1^2}{s_2^2} = \dfrac{4^2}{2^2} = 4$.

For the second scenario, with $s_1 = 10$ and $s_2 = 15$, the test statistic is

$F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_2^2}{s_1^2} = \dfrac{15^2}{10^2} = 2.25$.

In both cases, the degrees of freedom for the tests are the same. Thus, the assumption required for the t-test would most likely be violated in the first scenario, with $s_1 = 4$ and $s_2 = 2$, because the value of the test statistic is larger.

8.88

Let $\sigma_1^2$ = variance of improvement scores in the honey dosage group and $\sigma_2^2$ = variance of improvement scores in the DM dosage group. From Exercise 8.23, $s_1 = 2.855$ and $s_2 = 3.256$. To determine if the variability in coughing improvement scores differs for the two groups, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_2^2}{s_1^2} = \dfrac{3.256^2}{2.855^2} = 1.30$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 33 - 1 = 32$ and $\nu_2 = n_1 - 1 = 35 - 1 = 34$. From Table VI, Appendix D, $F_{.05} \approx 1.84$. The rejection region is $F > 1.84$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.30 \not> 1.84$), $H_0$ is not rejected. There is insufficient evidence to indicate the variability in the coughing improvement scores differs for the two groups at $\alpha = .10$.

8.89

a.

The two samples are randomly selected in an independent manner from the two populations. The sample sizes, $n_1$ and $n_2$, are large enough so that $\bar{x}_1$ and $\bar{x}_2$ each have approximately normal sampling distributions and so that $s_1^2$ and $s_2^2$ provide good approximations to $\sigma_1^2$ and $\sigma_2^2$. This will be true if $n_1 \ge 30$ and $n_2 \ge 30$.

b.

1. Both sampled populations have relative frequency distributions that are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently selected from the populations.

c.

1. The relative frequency distribution of the population of differences is normal.
2. The sample of differences is randomly selected from the population of differences.

d.

The two samples are independent random samples from binomial distributions. Both samples should be large enough so that the normal distribution provides an adequate approximation to the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$.

e.

The two samples are independent random samples from populations which are normally distributed.

8.90

a.

Using MINITAB, some preliminary calculations are:

Test for Two Variances

Method

Null hypothesis         σ(First) / σ(Second) = 1
Alternative hypothesis  σ(First) / σ(Second) ≠ 1
Significance level      α = 0.05

F method was used. This method is accurate for normal data only.

Method  DF1  DF2  Test Statistic  P-Value
F       19   14   0.26            0.007

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The p-value is $p = .007$. Since the p-value is less than $\alpha$ ($p = .007 < .05$), $H_0$ is rejected. There is sufficient evidence to conclude $\sigma_1^2 \ne \sigma_2^2$ at $\alpha = .05$.

b.

No, we should not use a small-sample t-test to test $H_0: (\mu_1 - \mu_2) = 0$ against $H_a: (\mu_1 - \mu_2) \ne 0$, because the assumption of equal variances does not seem to hold: we concluded $\sigma_1^2 \ne \sigma_2^2$ in part a.

8.91

a.

Some preliminary calculations are:

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 66.7792$

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(17.8 - 15.3) - 0}{\sqrt{66.7792\left(\dfrac{1}{12} + \dfrac{1}{14}\right)}} = .78$

The p-value is $p = P(t > .78)$. Using MINITAB, with $df = n_1 + n_2 - 2 = 12 + 14 - 2 = 24$,

Cumulative Distribution Function

Student's t distribution with 24 DF

   x  P( X ≤ x )
0.78    0.778492

The p-value is $p = P(t > .78) = 1 - .778 = .222$. Since the p-value is not less than $\alpha$ ($p = .222 \not< .05$), $H_0$ is not rejected. There is insufficient evidence to indicate that $\mu_1 > \mu_2$ at $\alpha = .05$.

b.

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 12 + 14 - 2 = 24$, $t_{.005} = 2.797$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.005}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} \Rightarrow (17.8 - 15.3) \pm 2.797\sqrt{66.7792\left(\dfrac{1}{12} + \dfrac{1}{14}\right)} \Rightarrow 2.50 \pm 8.99 \Rightarrow (-6.49, 11.49)$

c.

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$.

$n_1 = n_2 = \dfrac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = 224.15 \approx 225$

8.92

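Using MINITAB, some preliminary calculations are:

Test for Two Proportions

Sample  X    N    Sample p
1       110  200  0.550000
2       130  200  0.650000

Difference = p (1) - p (2)
Estimate for difference: -0.1
90% upper bound for difference: -0.0375449
Test for difference = 0 (vs < 0): Z = -2.04  P-Value = 0.021

Fisher’s exact test: P-Value = 0.026

$\hat{p}_1 = \dfrac{x_1}{n_1} = \dfrac{110}{200} = .55 \qquad \hat{p}_2 = \dfrac{x_2}{n_2} = \dfrac{130}{200} = .65 \qquad \hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{110 + 130}{200 + 200} = .6$

a.

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 < 0$

The test statistic is $z = -2.04$ and, from the printout, the p-value for the z-test is $p = .021$. Since the p-value is less than $\alpha$ ($p = .021 < .10$), $H_0$ is rejected. There is sufficient evidence to conclude $p_1 - p_2 < 0$ at $\alpha = .10$.

b.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for $(p_1 - p_2)$ is approximately:

$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.55 - .65) \pm 1.96\sqrt{\dfrac{.55(.45)}{200} + \dfrac{.65(.35)}{200}} \Rightarrow -.10 \pm .096 \Rightarrow (-.196, -.004)$

c.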

From part b, $z_{.025} = 1.96$. Using the information from our samples, we can use $p_1 = .55$ and $p_2 = .65$. For a width of .01, the margin of error is .005.

$n_1 = n_2 = \dfrac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \dfrac{(1.96)^2\left[.55(.45) + .65(.35)\right]}{.005^2} = \dfrac{1.82476}{.000025} = 72{,}990.4 \approx 72{,}991$
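The sample-size calculation in part c is easy to get wrong by hand because of the very small margin of error, so a quick numerical check may be useful. The sketch below assumes equal sample sizes; the function name `sample_size_two_proportions` is illustrative, not something from the text.

```python
import math


def sample_size_two_proportions(p1, p2, z_crit, margin):
    """Equal sample sizes n1 = n2 needed to estimate (p1 - p2) to within `margin`."""
    n = z_crit ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin ** 2
    # Always round up so the margin of error is not exceeded
    return n, math.ceil(n)


# Planning values from the samples, z_.025 from Table II, ME = .005 (half of the width .01)
raw_n, n_required = sample_size_two_proportions(0.55, 0.65, 1.96, 0.005)

print(f"n1 = n2 = {raw_n:.1f}, rounded up to {n_required}")
```

Rounding up rather than to the nearest integer is the usual convention, since rounding down would give a margin of error slightly larger than the one requested.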



8.93

a.

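For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{.05}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (12.2 - 8.3) \pm 1.645\sqrt{\dfrac{1.45^2}{135} + \dfrac{1.73^2}{148}} \Rightarrow 3.90 \pm .31 \Rightarrow (3.59, 4.21)$

b.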

Using MINITAB, some preliminary calculations are:

Two-Sample T-Test and CI

Sample  N    Mean   StDev  SE Mean
1       135  12.20  1.45   0.12
2       148  8.30   1.73   0.14

Difference = μ (1) - μ (2)
Estimate for difference: 3.900
99% CI for difference: (3.409, 4.391)
T-Test of difference = 0 (vs ≠): T-Value = 20.60  P-Value = 0.000  DF = 278

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $z = 20.60$ and $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate that $\mu_1 \ne \mu_2$ at $\alpha = .01$.

c.

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.

$n_1 = n_2 = \dfrac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = 345.02 \approx 346$

8.94

a.

This is a paired difference experiment. Using MINITAB, some preliminary calculations are:

Paired T-Test and CI: Pop1, Pop2

Paired T for Pop1 - Pop2

            N  Mean   StDev  SE Mean
Pop1        5  27.00  3.87   1.73
Pop2        5  23.20  3.56   1.59
Difference  5  3.800  1.483  0.663

95% CI for mean difference: (1.958, 5.642)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = 5.73  P-Value = 0.005

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = 5.73$ and $p = .005$. Since the p-value is less than $\alpha$ ($p = .005 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate that the population means are different at $\alpha = .05$.

b.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. Therefore, we would use the same t-value as above, $t_{.025} = 2.776$. The confidence interval is:

$\bar{x}_d \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow 3.8 \pm 2.776\dfrac{1.483}{\sqrt{5}} \Rightarrow 3.8 \pm 1.84 \Rightarrow (1.96, 5.64)$

c.

The sample of differences must be randomly selected from a population of differences which has a normal distribution.

8.95

a.

The parameter of interest is $p_{\text{Male}} - p_{\text{Female}}$. We must assume that the samples are independent.

b.

The parameter of interest is $\mu_{\text{Crestor}} - \mu_{\text{Mevacor}}$. We must assume that the samples are independent.

c.

The parameter of interest is $\mu_d$. We must assume that the sample size is sufficiently large or that the population of differences is approximately normal.

d.

The parameter of interest is $\sigma_{\text{High}}^2/\sigma_{\text{Low}}^2$. We must assume that the samples are independent and from normal populations.

8.96

a.

The target parameter is $\mu_{BT} - \mu_{PA}$ = difference in mean trap measurements between the Bahia Tortugas fishing cooperative and the Punta Abreojos fishing cooperative.

b.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: BT, PA

Variable  N  Mean   StDev  Variance  Minimum  Q1     Median  Q3      Maximum
BT        7  89.86  11.63  135.14    70.00    82.00  93.00   99.00   105.00
PA        8  99.63  27.38  749.70    66.00    76.50  96.00   115.00  153.00

The point estimate is $\bar{x}_{BT} - \bar{x}_{PA} = 89.86 - 99.63 = -9.77$.

c.

Since the sample sizes for both samples are so small, the Central Limit Theorem does not apply. In addition, the population standard deviations are not known and must be estimated with the sample standard deviations.

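d.

$s_p^2 = \dfrac{(n_{BT} - 1)s_{BT}^2 + (n_{PA} - 1)s_{PA}^2}{n_{BT} + n_{PA} - 2} = \dfrac{(7 - 1)135.14 + (8 - 1)749.70}{7 + 8 - 2} = \dfrac{6{,}058.74}{13} = 466.0569$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_{BT} + n_{PA} - 2 = 7 + 8 - 2 = 13$, $t_{.05} = 1.771$. The 90% confidence interval for $(\mu_{BT} - \mu_{PA})$ is:

$(\bar{x}_{BT} - \bar{x}_{PA}) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_{BT}} + \dfrac{1}{n_{PA}}\right)} \Rightarrow (89.86 - 99.63) \pm 1.771\sqrt{466.0569\left(\dfrac{1}{7} + \dfrac{1}{8}\right)} \Rightarrow -9.77 \pm 19.787 \Rightarrow (-29.557, 10.017)$

e.

Since 0 is in the 90% confidence interval, there is insufficient evidence to indicate a difference in the mean trap measurements between the two fishing cooperatives.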
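The pooled-variance interval from part d can be reproduced from the summary statistics alone. The following is a minimal sketch: the helper names `pooled_variance` and `pooled_t_interval` are hypothetical, and the critical value 1.771 is taken from Table III rather than computed.

```python
import math


def pooled_variance(n1, var1, n2, var2):
    """Pooled estimate of a common variance from two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_interval(mean1, mean2, n1, n2, sp2, t_crit):
    """Small-sample confidence interval for (mu1 - mu2), assuming equal variances."""
    diff = mean1 - mean2
    half_width = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width


# Summary statistics from the MINITAB printout: BT (n = 7) and PA (n = 8)
sp2 = pooled_variance(7, 135.14, 8, 749.70)

T_CRIT = 1.771  # t_.05 with 13 df, read from Table III (an input, not computed here)
low, high = pooled_t_interval(89.86, 99.63, 7, 8, sp2, T_CRIT)

print(f"pooled variance = {sp2:.4f}")
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

The interval is only as trustworthy as the equal-variance assumption behind the pooled estimate, which is the assumption examined in parts f through j.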

f.

To determine if a difference exists between the population variances, we test:

$H_0: \sigma_{BT}^2 = \sigma_{PA}^2$
$H_a: \sigma_{BT}^2 \ne \sigma_{PA}^2$

g.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: BT, PA

Variable  N  Mean   Variance  Minimum  Q1     Median  Q3      Maximum
BT        7  89.86  135.14    70.00    82.00  93.00   99.00   105.00
PA        8  99.63  749.70    66.00    76.50  96.00   115.00  153.00

$s_{BT}^2 = 135.14$ and $s_{PA}^2 = 749.70$

h.

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{749.70}{135.14} = 5.55$.

i.

Using MINITAB, the p-value is:

Cumulative Distribution Function

F distribution with 7 DF in numerator and 6 DF in denominator

   x  P( X <= x )
5.55     0.973421

The p-value is $p = 2(1 - .973421) = 2(.026579) = .053158$.

j.

Since the p-value is not less than $\alpha$ ($p = .053158 \not< .01$), $H_0$ is not rejected. There is insufficient evidence to indicate the variances are different at $\alpha = .01$.

8.97

a.

Let $\mu_1$ = mean score for males and $\mu_2$ = mean score for females. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (39.08 - 38.79) \pm 1.645\sqrt{\dfrac{6.73^2}{127} + \dfrac{6.94^2}{114}} \Rightarrow 0.29 \pm 1.452 \Rightarrow (-1.162, 1.742)$

We are 90% confident that the difference in mean service-rating scores between males and females is between -1.162 and 1.742. b.

To determine if the service-rating score variances differ by gender, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_2^2}{s_1^2} = \dfrac{6.94^2}{6.73^2} = 1.06$.

The rejection region requires $\alpha/2 = .10/2 = .05$ in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 114 - 1 = 113$ and $\nu_2 = n_1 - 1 = 127 - 1 = 126$. Using MINITAB, we get:

Inverse Cumulative Distribution Function

F distribution with 113 DF in numerator and 126 DF in denominator

P( X <= x )        x
       0.95  1.35141

$F_{.05} = 1.35$. The rejection region is $F > 1.35$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.06 \not> 1.35$), $H_0$ is not rejected. There is insufficient evidence to indicate the service-rating score variances differ by gender at $\alpha = .10$.

c.

Since we did not reject $H_0$ in part b, the confidence interval in part a is valid. Because 0 falls in the 90% confidence interval, we are 90% confident that there is no difference in the mean service-rating scores between males and females.

8.98

Using MINITAB, some preliminary calculations are:

Descriptive Statistics: Spillage

Variable  Cause      N   Mean    StDev  Variance  Q1     Median  Q3
Spillage  Collision  10  76.6    70.4   4950.9    35.0   41.5    102.0
          Fire       11  75.0    61.9   3829.6    33.0   50.0    82.0
          Grounding  11  53.73   29.45  867.22    36.00  41.00   62.00
          HullFail   9   63.7    63.1   3984.5    31.0   36.0    73.5
          Unknown    1   25.000  *      *         *      25.000  *

a.

Let $\mu_1$ = mean spillage for accidents caused by collision and $\mu_2$ = mean spillage for accidents caused by fire/explosion.

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(10 - 1)4{,}950.9 + (11 - 1)3{,}829.6}{10 + 11 - 2} = 4{,}360.7421$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 10 + 11 - 2 = 19$, $t_{.05} = 1.729$. The confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.05}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} \Rightarrow (76.6 - 75.0) \pm 1.729\sqrt{4{,}360.7421\left(\dfrac{1}{10} + \dfrac{1}{11}\right)} \Rightarrow 1.6 \pm 49.89 \Rightarrow (-48.29, 51.49)$

b.

Let $\mu_1$ = mean spillage for accidents caused by grounding and $\mu_2$ = mean spillage for accidents caused by hull failure.

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(11 - 1)867.22 + (9 - 1)3{,}984.5}{11 + 9 - 2} = 2{,}252.6778$

To determine if the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(53.73 - 63.7) - 0}{\sqrt{2{,}252.6778\left(\dfrac{1}{11} + \dfrac{1}{9}\right)}} = -.47$

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 11 + 9 - 2 = 18$. From Table III, Appendix D, $t_{.025} = 2.101$. The rejection region is $t < -2.101$ or $t > 2.101$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -.47 \not< -2.101$), $H_0$ is not rejected. There is insufficient evidence to indicate the mean spillage amount for accidents caused by grounding is different from the mean spillage amount caused by hull failure at $\alpha = .05$.

c.

The necessary assumptions are: We must assume that the distributions from which the samples were selected are approximately normal, the samples are independent, and the variances of the two populations are equal. Below are the histograms for each of the samples:

[Figure: Histogram of Spillage with fitted normal curves, paneled by Cause (Collision, Fire, Grounding, HullFail); horizontal axis: Spillage; vertical axis: Frequency. Legend: Collision: Mean 76.6, StDev 70.36, N 10; Fire: Mean 75, StDev 61.88, N 11; Grounding: Mean 53.73, StDev 29.45, N 11; HullFail: Mean 63.67, StDev 63.12, N 9.]

Based on the shapes of the histograms, it does not appear that the data are normally distributed. Also, we know that if the data are normally distributed, then the interquartile range, IQR, divided by the standard deviation should be approximately 1.3. We will compute IQR/s for each of the samples:

Collision: IQR/s = (102.0 − 35.0)/70.4 = .95
Fire: IQR/s = (82.0 − 33.0)/61.9 = .79
Grounding: IQR/s = (62.0 − 36.0)/29.45 = .88
Hull Failure: IQR/s = (73.5 − 31.0)/63.1 = .67
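The four IQR/s ratios can be recomputed directly from the quartiles and standard deviations in the MINITAB printout. This is a small sketch; the dictionary layout and the name `NORMAL_RATIO` are illustrative choices, not part of the original solution.

```python
# (Q1, Q3, s) for each cause, taken from the MINITAB descriptive statistics
samples = {
    "Collision": (35.0, 102.0, 70.4),
    "Fire": (33.0, 82.0, 61.9),
    "Grounding": (36.0, 62.0, 29.45),
    "Hull Failure": (31.0, 73.5, 63.1),
}

NORMAL_RATIO = 1.3  # approximate IQR/s expected for normally distributed data

# IQR/s for each sample
ratios = {cause: (q3 - q1) / s for cause, (q1, q3, s) in samples.items()}

for cause, ratio in ratios.items():
    print(f"{cause}: IQR/s = {ratio:.2f} (normal data would give about {NORMAL_RATIO})")
```

This ratio is only a rough screening device, especially with samples of 9 to 11 observations, so it is best read alongside the histograms rather than in place of them.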

Since all of these ratios are quite a bit smaller than 1.3, none of the samples appears to come from a normal distribution. Thus, it appears that the assumption of normal distributions is violated.

The sample standard deviations are:

Collision: s = 70.4
Fire: s = 61.9
Grounding: s = 29.45
Hull Failure: s = 63.1

Without doing formal tests, it appears that the variances of the groups Collision, Fire, and Hull Failure are probably not significantly different. However, it appears that the variance for the Grounding group is smaller than the others.

d.

Let $\sigma_1^2$ = variance of spillage for accidents caused by collision and $\sigma_2^2$ = variance of spillage for accidents caused by grounding.


To determine if the variances of the amounts of spillage due to collision and grounding differ, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \dfrac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \dfrac{s_1^2}{s_2^2} = \dfrac{4{,}950.9}{867.22} = 5.71$.

The rejection region requires $\alpha/2 = .02/2 = .01$ in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 10 - 1 = 9$ and $\nu_2 = n_2 - 1 = 11 - 1 = 10$. From Table VIII, Appendix D, $F_{.01} = 4.94$. The rejection region is $F > 4.94$.

Since the observed value of the test statistic falls in the rejection region ($F = 5.71 > 4.94$), $H_0$ is rejected. There is sufficient evidence to indicate the variances of the amounts of spillage due to collision and grounding differ at $\alpha = .02$.

8.99

a.

Let $\mu_1$ = mean years of experience for commercial suppliers and $\mu_2$ = mean years of experience for government employees. To determine if the mean years of experience for commercial suppliers of DoD is less than that for government employees, we test:

$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 < \mu_2$

b.

From the printout, the p-value is $p = .046$. Since the p-value is less than $\alpha$ ($p = .046 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the mean years of experience for commercial suppliers of DoD is less than that for government employees at $\alpha = .05$.

c.

We have to assume that both samples are random samples, that the populations from which the samples were drawn are approximately normal, and that the population variances are the same. Using MINITAB, histograms of the data are:

[Figure: Histogram of Commercial, Government with fitted normal curves; vertical axis: Frequency. Legend: Commercial: Mean 12.33, StDev 8.869, N 6; Government: Mean 20.82, StDev 9.368, N 11.]

Neither population appears to be normally distributed. However, the spreads of the distributions appear to be the same. Thus, the assumption of equal variances appears to be met.

d.

Let $\sigma_1^2$ = variance of the years of experience for commercial suppliers and $\sigma_2^2$ = variance of the years of experience for government employees. To determine if the variability in years of experience differs between the two groups, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

e.

From the printout, the test statistic is $F = .8963$.

f.

From the printout, the p-value is $p = .9617$.

g.

Since the p-value is not less than $\alpha$ ($p = .9617 \not< .05$), $H_0$ is not rejected. There is insufficient evidence to indicate the variability in years of experience differs between the two groups at $\alpha = .05$. This agrees with the answer in part c.

8.100

a.

Since there is much variability among cars, by using matched pairs, we can block out the variability among the cars and compare the means of the 2 types of shocks.

b.

Let $\mu_1$ = mean strength of manufacturer’s shock and $\mu_2$ = mean strength of competitor’s shock. Also, let $\mu_d = \mu_1 - \mu_2$. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Manufacturer, Competitor, Di

Variable      N  Mean    StDev   Minimum  Q1      Median  Q3      Maximum
Manufacturer  6  10.717  1.752   8.800    9.400   10.100  12.675  13.200
Competitor    6  10.300  1.818   8.400    8.850   9.700   12.250  13.000
Dff           6  0.4167  0.1329  0.2000   0.3500  0.4000  0.5250  0.6000

To determine if there is a difference in the mean strength of the two types of shocks after 20,000 miles, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n_d}} = \dfrac{.4167}{.1329/\sqrt{6}} = 7.68$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n_d - 1 = 6 - 1 = 5$. From Table III, Appendix D, $t_{.025} = 2.571$. The rejection region is $t < -2.571$ or $t > 2.571$.

Since the observed value of the test statistic falls in the rejection region ($t = 7.68 > 2.571$), $H_0$ is rejected. There is sufficient evidence to indicate a difference in the mean strength of the two types of shocks after 20,000 miles at $\alpha = .05$.

c.

Using MINITAB:

Cumulative Distribution Function

Student's t distribution with 5 DF

   x  P( X <= x )
7.68     0.999702

The observed significance level is $p = P(t \ge 7.68) + P(t \le -7.68) = 2P(t \ge 7.68) = 2(1 - .999702) = .000596$.

d.

We must assume that the population of differences is normally distributed and that the sample is random.

e.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_d - 1 = 6 - 1 = 5$, $t_{.025} = 2.571$. The 95% confidence interval is:

$\bar{d} \pm t_{.025}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow .4167 \pm 2.571\dfrac{.1329}{\sqrt{6}} \Rightarrow .4167 \pm .1395 \Rightarrow (.2772, .5562)$
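The paired-difference test statistic and this interval both depend only on the summary line for the differences, so they can be verified together. The sketch below uses hypothetical helper names (`paired_t_statistic`, `paired_t_interval`) and takes the critical value 2.571 from Table III as given.

```python
import math


def paired_t_statistic(d_bar, s_d, n_d):
    """t statistic for H0: mu_d = 0 from the mean and standard deviation of the differences."""
    return d_bar / (s_d / math.sqrt(n_d))


def paired_t_interval(d_bar, s_d, n_d, t_crit):
    """Confidence interval for mu_d in a paired-difference experiment."""
    half_width = t_crit * s_d / math.sqrt(n_d)
    return d_bar - half_width, d_bar + half_width


# Summary statistics for the differences (manufacturer minus competitor)
D_BAR, S_D, N_D = 0.4167, 0.1329, 6
T_CRIT = 2.571  # t_.025 with 5 df, read from Table III (an input, not computed here)

t_stat = paired_t_statistic(D_BAR, S_D, N_D)
low, high = paired_t_interval(D_BAR, S_D, N_D, T_CRIT)

print(f"t = {t_stat:.2f}")
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

Note how small the standard deviation of the differences is relative to the standard deviations of the individual samples; that is what makes the paired analysis so much more precise here.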

We are 95% confident that the difference in mean strength between the two types of shocks after 20,000 miles is between .2772 and .5562. f.

Some preliminary calculations are:

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(6 - 1)1.752^2 + (6 - 1)1.818^2}{6 + 6 - 2} = 3.1873$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n_1 + n_2 - 2 = 6 + 6 - 2 = 10$, $t_{.025} = 2.228$. The 95% confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} \Rightarrow (10.717 - 10.300) \pm 2.228\sqrt{3.1873\left(\dfrac{1}{6} + \dfrac{1}{6}\right)} \Rightarrow .417 \pm 2.2965 \Rightarrow (-1.8795, 2.7135)$

We are 95% confident that the difference in mean strength between the two types of shocks after 20,000 miles is between −1.8795 and 2.7135.

g.

The interval assuming independent samples in part f is (−1.8795, 2.7135), while the interval assuming paired differences in part e is (.2772, .5562). The interval assuming independent samples is much wider because the paired-difference analysis eliminates the car-to-car differences. The interval from part e gives more information because it is narrower.

h.

No. If the data were collected using a paired experiment, then the data must be analyzed as a paired experiment.

8.101

a.

Let $\mu_1$ = mean driver chest injury rating and $\mu_2$ = mean passenger chest injury rating. Because the data are paired, we are interested in $\mu_1 - \mu_2 = \mu_d$, the difference in mean chest injury ratings between drivers and passengers.

b.

The data were collected as matched pairs and thus, must be analyzed as matched pairs. Two ratings are obtained for each car – the driver’s chest injury rating and the passenger’s chest injury rating.

c.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: DrivChst, PassChst, diff

Variable  N   Mean    Median  StDev  Minimum  Maximum  Q1      Q3
DrivChst  98  49.663  50.000  6.670  34.000   68.000   45.000  54.000
PassChst  98  50.224  50.500  7.107  35.000   69.000   45.000  55.000
diff      98  -0.561  0.000   5.517  -15.000  13.000   -4.000  3.000

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:

$\bar{d} \pm z_{.005}\dfrac{s_d}{\sqrt{n_d}} \Rightarrow -0.561 \pm 2.58\dfrac{5.517}{\sqrt{98}} \Rightarrow -0.561 \pm 1.438 \Rightarrow (-1.999, 0.877)$

d.

We are 99% confident that the difference between the mean chest injury ratings of drivers and front-seat passengers is between −1.999 and 0.877. Since 0 is in the confidence interval, there is no evidence that the true mean driver chest injury rating exceeds the true mean passenger chest injury rating.

e.

Since the sample size is large, the sampling distribution of $\bar{d}$ is approximately normal by the Central Limit Theorem. We must assume that the differences are randomly selected.

8.102

a.

Let $\mu_1$ = mean carat size of diamonds certified by GIA and $\mu_2$ = mean carat size of diamonds certified by HRD. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \Rightarrow (.6723 - .8129) \pm 1.96\sqrt{\dfrac{.2456^2}{151} + \dfrac{.1831^2}{79}} \Rightarrow -.1406 \pm .0563 \Rightarrow (-.1969, -.0843)$

b.

We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by HRD is between −.1969 and −.0843. Since both end points are negative, the mean carat size of diamonds certified by HRD is larger than the mean carat size of diamonds certified by GIA by anywhere from .0843 to .1969 carats.
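The large-sample interval for the GIA and HRD carat sizes can be checked numerically from the sample means, standard deviations, and sample sizes. This is a minimal sketch, with the sample standard deviations standing in for the population values; the function name `large_sample_interval` is hypothetical.

```python
import math


def large_sample_interval(mean1, s1, n1, mean2, s2, n2, z_crit):
    """Large-sample z interval for (mu1 - mu2) using sample standard deviations."""
    diff = mean1 - mean2
    half_width = z_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff, half_width


# GIA: mean .6723, s .2456, n 151; HRD: mean .8129, s .1831, n 79; z_.025 = 1.96
diff, half_width = large_sample_interval(0.6723, 0.2456, 151, 0.8129, 0.1831, 79, 1.96)
low, high = diff - half_width, diff + half_width

print(f"{diff:.4f} +/- {half_width:.4f} -> ({low:.4f}, {high:.4f})")
```

The same function can be reused for the GIA and IGI comparison in part c by swapping in the IGI mean, standard deviation, and sample size.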

c.

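Let $\mu_3$ = mean carat size of diamonds certified by IGI.

$(\bar{x}_1 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_3^2}{n_3}} \Rightarrow (.6723 - .3665) \pm 1.96\sqrt{\dfrac{.2456^2}{151} + \dfrac{.2163^2}{78}} \Rightarrow .3058 \pm .0620 \Rightarrow (.2438, .3678)$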

d.

We are 95% confident that the difference in mean carat size between diamonds certified by GIA and those certified by IGI is between .2438 and .3678. Since both end points are positive, the mean carat size of diamonds certified by GIA is larger than the mean carat size of diamonds certified by IGI by anywhere from .2438 to .3678 carats.

e.

$(\bar{x}_2 - \bar{x}_3) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_2^2}{n_2} + \dfrac{\sigma_3^2}{n_3}} \Rightarrow (7{,}181 - 2{,}267) \pm 1.96\sqrt{\dfrac{s_2^2}{79} + \dfrac{s_3^2}{78}} \Rightarrow 4{,}914 \pm 793.7 \Rightarrow (4{,}120.3, 5{,}707.7)$

f.

We are 95% confident that the difference in mean selling price between diamonds certified by HRD and those certified by IGI is between 4,120.3 and 5,707.7. Since both end points are positive, the mean selling price of diamonds certified by HRD is larger than the mean selling price of diamonds certified by IGI by anywhere from 4,120.3 to 5,707.7.

Let $\sigma_1^2$ = variance of carat size for diamonds certified by GIA, $\sigma_2^2$ = variance of carat size for diamonds certified by HRD, and $\sigma_3^2$ = variance of carat size for diamonds certified by IGI.

g.

To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by HRD, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{.2456^2}{.1831^2} = 1.799$

The rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 151 - 1 = 150$ and $\nu_2 = n_2 - 1 = 79 - 1 = 78$. Using MINITAB, $F_{.025} = 1.494$. The rejection region is $F > 1.494$.

Since the observed value of the test statistic falls in the rejection region ($F = 1.799 > 1.494$), H0 is rejected. There is sufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by HRD at 𝛼 = .05.

h.

To determine if the variation in carat size differs for diamonds certified by GIA and diamonds certified by IGI, we test:

$H_0: \sigma_1^2 = \sigma_3^2$
$H_a: \sigma_1^2 \ne \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_3^2} = \frac{.2456^2}{.2163^2} = 1.289$

The rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 151 - 1 = 150$ and $\nu_2 = n_3 - 1 = 78 - 1 = 77$. Using MINITAB, $F_{.025} = 1.497$. The rejection region is $F > 1.497$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 1.289 \ngtr 1.497$), H0 is not rejected. There is insufficient evidence to indicate the variation in carat size differs for diamonds certified by GIA and those certified by IGI at 𝛼 = .05.

i.

To determine if the variation in selling price differs for diamonds certified by HRD and diamonds certified by IGI, we test:

$H_0: \sigma_2^2 = \sigma_3^2$
$H_a: \sigma_2^2 \ne \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = 1.87$

The rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 79 - 1 = 78$ and $\nu_2 = n_3 - 1 = 78 - 1 = 77$. Using MINITAB, $F_{.025} = 1.567$. The rejection region is $F > 1.567$.

Since the observed value of the test statistic falls in the rejection region ($F = 1.87 > 1.567$), H0 is rejected. There is sufficient evidence to indicate the variation in selling price differs for diamonds certified by HRD and those certified by IGI at 𝛼 = .05.

j.

We will look at the 4 methods for determining if the data are normal. First, we will look at histograms of the data. Using MINITAB, the histograms of the carat sizes for the 3 certification bodies are:

[Figure: MINITAB histograms of CARAT, one panel per certification body (GIA, HRD, IGI). Horizontal axis: CARAT, 0.30 to 1.05. Vertical axis: Frequency. Panel variable: CERT.]

From the histograms, none of the data appear to be mound-shaped. It appears that none of the data sets are normal.

Next, we look at the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$. If the proportions of observations falling in each interval are approximately .68, .95, and 1.00, then the data are approximately normal.

For GIA:

$\bar{x} \pm s \Rightarrow .6723 \pm .2456 \Rightarrow (.4267, .9179)$. 84 of the 151 values fall in this interval. The proportion is .56. This is much smaller than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .6723 \pm 2(.2456) \Rightarrow .6723 \pm .4912 \Rightarrow (.1811, 1.1635)$. 151 of the 151 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .6723 \pm 3(.2456) \Rightarrow .6723 \pm .7368 \Rightarrow (-.0645, 1.4091)$. 151 of the 151 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For IGI:

$\bar{x} \pm s \Rightarrow .3665 \pm .2163 \Rightarrow (.1502, .5828)$. 69 of the 78 values fall in this interval. The proportion is .88. This is much larger than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .3665 \pm 2(.2163) \Rightarrow .3665 \pm .4326 \Rightarrow (-.0661, .7991)$. 74 of the 78 values fall in this interval. The proportion is .95. This is the same as the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .3665 \pm 3(.2163) \Rightarrow .3665 \pm .6489 \Rightarrow (-.2824, 1.0154)$. 78 of the 78 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

For HRD:

$\bar{x} \pm s \Rightarrow .8129 \pm .1831 \Rightarrow (.6298, .9960)$. 30 of the 79 values fall in this interval. The proportion is .38. This is much smaller than the .68 we would expect if the data were normal.

$\bar{x} \pm 2s \Rightarrow .8129 \pm 2(.1831) \Rightarrow .8129 \pm .3662 \Rightarrow (.4467, 1.1791)$. 79 of the 79 values fall in this interval. The proportion is 1.00. This is much larger than the .95 we would expect if the data were normal.

$\bar{x} \pm 3s \Rightarrow .8129 \pm 3(.1831) \Rightarrow .8129 \pm .5493 \Rightarrow (.2636, 1.3622)$. 79 of the 79 values fall in this interval. The proportion is 1.00. This is the same as the 1.00 we would expect if the data were normal.

From this method, it appears that the data are not normal.

Next, we look at the ratio of the IQR to s. Using MINITAB, the quartiles are:

Descriptive Statistics: CARAT

Variable  CERT    N    Mean   StDev      Q1  Median      Q3
CARAT     GIA   151  0.6723  0.2456  0.5000  0.7000  0.9000
          HRD    79  0.8129  0.1831  0.6500  0.8100  1.0000
          IGI    78  0.3665  0.2163  0.2100  0.2900  0.4850

For GIA: $IQR = Q_U - Q_L = .9 - .5 = .4$. $\frac{IQR}{s} = \frac{.4}{.2456} = 1.63$. This is larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

For IGI: $IQR = Q_U - Q_L = .485 - .210 = .275$. $\frac{IQR}{s} = \frac{.275}{.2163} = 1.27$. This is very close to the 1.3 we would expect if the data were normal. This method indicates the data might be normal.

For HRD: $IQR = Q_U - Q_L = 1.00 - .65 = .35$. $\frac{IQR}{s} = \frac{.35}{.1831} = 1.91$. This is much larger than the 1.3 we would expect if the data were normal. This method indicates the data are not normal.

Finally, using MINITAB, the normal probability plots are:

[Figure: MINITAB normal probability plots of CARAT (Normal, 95% CI), one panel per certification body. Horizontal axis: CARAT, −0.5 to 1.5. Vertical axis: Percent. Panel variable: CERT.]

Panel  Mean    StDev   N    AD     P-Value
GIA    0.6723  0.2456  151  3.268  <0.005
HRD    0.8129  0.1831   79  3.405  <0.005
IGI    0.3665  0.2163   78  5.561  <0.005

Since the data do not form a straight line for GIA, the data are not normal. Since the data do not form a straight line for IGI, the data are not normal. Since the data do not form a straight line for HRD, the data are not normal.

From the 4 different methods, all indications are that the carat size data are not normal for any of the certification bodies.

8.103

a.

Let $\mu_1$ = mean number of items recalled by those in the video only group and $\mu_2$ = mean number of items recalled by those in the audio and video group. To determine if the mean number of items recalled by the two groups is the same, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

b.

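$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(20 - 1)s_1^2 + (20 - 1)s_2^2}{20 + 20 - 2} = 4.22865$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{4.22865\left(\frac{1}{20} + \frac{1}{20}\right)}} = 0.62$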

c.

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 20 + 20 - 2 = 38$. From Table III, Appendix D, $t_{.05} \approx 1.684$. The rejection region is $t < -1.684$ or $t > 1.684$.

d.

Since the observed value of the test statistic does not fall in the rejection region ($t = 0.62 \ngtr 1.684$), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean number of items recalled by the two groups at 𝛼 = .10.

e.

The p-value is 𝑝 = .542. This is the probability of observing our test statistic or anything more unusual if H0 is true. Since the p-value is not less than 𝛼 = .10, there is no evidence to reject H0. There is insufficient evidence to indicate a difference in the mean number of items recalled by the two groups at 𝛼 = .10.

f.

We must assume:

1. Both populations are normal
2. Random and independent samples
3. $\sigma_1^2 = \sigma_2^2$

8.104

a.

The point estimate for the proportion of all Democrats who prefer steak as their favorite barbeque food is $\hat{p}_1 = \frac{x_1}{n_1} = \frac{x_1}{1{,}250} = .5296$.

b.

The point estimate for the proportion of all Republicans who prefer steak as their favorite barbeque food is $\hat{p}_2 = \frac{x_2}{n_2} = \frac{x_2}{930} = .6301$.

c.

The point estimate for the difference between proportions of all Democrats and all Republicans who prefer steak as their barbeque food is $\hat{p}_1 - \hat{p}_2 = .5296 - .6301 = -.1005$.

d.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval for the difference between the proportions of all Democrats and all Republicans who prefer steak as their barbeque food is

$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.5296 - .6301) \pm 1.96\sqrt{\frac{.5296(.4704)}{1{,}250} + \frac{.6301(.3699)}{930}} \Rightarrow -.1005 \pm .0416 \Rightarrow (-.1421, -.0589)$
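The same interval can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the manual's own method: the helper name `z_interval_diff_props` is hypothetical, and the inputs are the rounded sample proportions and sample sizes quoted above.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_props(p1, n1, p2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper)."""
    # z_{alpha/2} from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Unpooled standard error: each sample contributes p*q/n
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Democrats (n = 1,250) vs. Republicans (n = 930) preferring steak
low, high = z_interval_diff_props(0.5296, 1250, 0.6301, 930)
print(round(low, 4), round(high, 4))
```

Because the interval lies entirely below 0, the sketch agrees with the conclusion that the two proportions differ.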

e.

We are 95 percent confident that the difference in proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food is between −.1421 and −.0589. Since this interval does not contain 0, there is sufficient evidence to indicate that there is a significant difference between the proportions of all Democrats and all Republicans who prefer steak as their favorite barbeque food.

f.

“95% confident” means that in repeated sampling, 95% of all confidence intervals constructed in the same manner will contain the true population difference in proportions and 5% will not.

8.105

a.

Let $p_1$ = proportion of men who prefer to keep track of appointments in their head and $p_2$ = proportion of women who prefer to keep track of appointments in their head. To determine if the proportion of men who prefer to keep track of appointments in their head is greater than that of women, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

b.

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 3.16$, where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .51$ and $\hat{q} = 1 - \hat{p} = 1 - .51 = .49$.

c.

The rejection region requires 𝛼 = .01 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.01} = 2.33$. The rejection region is $z > 2.33$.


d.

The p-value is $p = P(z \ge 3.16) = .5 - .49921 = .00079$.

e.

Since the observed value of the test statistic falls in the rejection region ($z = 3.16 > 2.33$), H0 is rejected. There is sufficient evidence to indicate the proportion of men who prefer to keep track of appointments in their head is greater than that of women at 𝛼 = .01.

8.106

a.

Let $\mu_1$ = mean annual percentage turnover for U.S. plants and $\mu_2$ = mean annual percentage turnover for Japanese plants. The descriptive statistics are:

Descriptive Statistics: US, Japan

Variable  N   Mean  Median  StDev  Minimum  Maximum     Q1     Q3
US        5  6.562   6.870  1.217    4.770    8.000  5.415  7.555
Japan     5  3.118   3.220  1.227    1.920    4.910  1.970  4.215

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)(1.217)^2 + (5 - 1)(1.227)^2}{5 + 5 - 2} = 1.4933$

To determine if the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(6.562 - 3.118) - 0}{\sqrt{1.4933\left(\frac{1}{5} + \frac{1}{5}\right)}} = 4.456$

The rejection region requires 𝛼 = .05 in the upper tail of the t-distribution with $df = n_1 + n_2 - 2 = 5 + 5 - 2 = 8$. From Table III, Appendix D, $t_{.05} = 1.860$. The rejection region is $t > 1.860$.

Since the observed value of the test statistic falls in the rejection region ($t = 4.456 > 1.860$), H0 is rejected. There is sufficient evidence to indicate the mean annual percentage turnover for U.S. plants exceeds that for Japanese plants at 𝛼 = .05.

b.

The p-value is $p = P(t \ge 4.456)$. Using MINITAB, with $df = n_1 + n_2 - 2 = 5 + 5 - 2 = 8$,

Cumulative Distribution Function

Student's t distribution with 8 DF

    x  P( X <= x )
4.456     0.998939

$p = P(t \ge 4.456) = 1 - .9989 = .0011$. Since the p-value is so small, there is evidence to reject H0 for 𝛼 > .0011.

c.

The necessary assumptions are:

1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.

There is no indication that the populations are not normal. The sample sizes are so small that it is hard to check the assumptions. Both sample variances are similar, so there is no evidence the population variances are unequal. There is no indication the assumptions are not valid.

8.107

a.

The first population of interest is all hospital patients admitted in January. The second population of interest is all hospital patients admitted in May.

b.

$\hat{p}_1 = \frac{x_1}{n_1} = .167$

$\hat{p}_2 = \frac{x_2}{n_2} = .084$

The point estimate for the difference in malaria admission rates in January and May is $\hat{p}_1 - \hat{p}_2 = .167 - .084 = .083$.

c.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.167 - .084) \pm 1.645\sqrt{\frac{.167(.833)}{n_1} + \frac{.084(.916)}{n_2}} \Rightarrow .083 \pm .050 \Rightarrow (.033, .133)$

d.

Since 0 is not contained in the confidence interval, we can conclude that a difference exists in the true malaria admission rates in January and May.

8.108

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = 1{,}728.7 \approx 1{,}729$

8.109

a.

Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Purchasers, Nonpurchasers

Variable   N   Mean  Median  StDev  Minimum  Maximum     Q1     Q3
Purchase  20  39.80   38.00  10.04    23.00    59.00  32.25  48.75
Nonpurch  20  47.20   52.00  13.62    22.00    66.00  33.50  58.75

Let $\mu_1$ = mean age of nonpurchasers and $\mu_2$ = mean age of purchasers.

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(20 - 1)(13.62)^2 + (20 - 1)(10.04)^2}{20 + 20 - 2} = 143.153$

To determine if there is a difference in the mean age of purchasers and nonpurchasers, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(47.20 - 39.80) - 0}{\sqrt{143.153\left(\frac{1}{20} + \frac{1}{20}\right)}} = 1.956$

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 20 + 20 - 2 = 38$. From Table III, Appendix D, $t_{.05} \approx 1.684$. The rejection region is $t < -1.684$ or $t > 1.684$.

Since the observed value of the test statistic falls in the rejection region ($t = 1.956 > 1.684$), H0 is rejected. There is sufficient evidence to indicate the mean ages of purchasers and nonpurchasers differ at 𝛼 = .10.
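The pooled-variance calculation in part a can be sketched in Python as a check on the arithmetic. This is an illustration under assumptions: `pooled_t` is a hypothetical helper name, and the critical value 1.684 is simply the table value quoted above, passed in by hand rather than computed.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled variance, t statistic, and df for an equal-variance t-test."""
    df = n1 + n2 - 2
    # Pooled estimate of the common variance
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Test statistic for H0: mu1 - mu2 = 0
    t = (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df


# Nonpurchasers (sample 1) vs. purchasers (sample 2), from the MINITAB summary
sp2, t, df = pooled_t(47.20, 13.62, 20, 39.80, 10.04, 20)

T_CRITICAL = 1.684  # t_.05 table value used in the solution (df = 38, approximated)
reject_h0 = abs(t) > T_CRITICAL
print(round(sp2, 3), round(t, 3), df, reject_h0)
```

Taking the critical value from the table keeps the sketch within the standard library; a statistics package could supply it directly instead.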

b.

The necessary assumptions are:

1. Both sampled populations are approximately normal.
2. The population variances are equal.
3. The samples are randomly and independently sampled.

c.

The p-value is $p = P(t \le -1.956) + P(t \ge 1.956) = (.5 - .4748) + (.5 - .4748) = .0504$. The probability of observing a test statistic of this value or more unusual if H0 is true is .0504. Since this value is less than 𝛼 = .10, H0 is rejected. There is sufficient evidence to indicate there is a difference in the mean age of purchasers and nonpurchasers.

d.

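For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with $df = 38$, $t_{.05} \approx 1.684$. The confidence interval for the difference in mean ages (purchasers minus nonpurchasers) is:

$(\bar{x}_2 - \bar{x}_1) \pm t_{.05}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \Rightarrow (39.8 - 47.2) \pm 1.684\sqrt{143.153\left(\frac{1}{20} + \frac{1}{20}\right)} \Rightarrow -7.4 \pm 6.37 \Rightarrow (-13.77, -1.03)$

We are 90% confident that the difference in mean ages between purchasers and nonpurchasers is between −13.77 and −1.03.

8.110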

a.

Let $\mu_1$ = mean starting BMI and $\mu_2$ = mean ending BMI. To determine if the mean BMI at the end of the camp is less than the mean BMI at the start of camp, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

where $\mu_d = \mu_1 - \mu_2$

b.

The data should be analyzed as a paired-difference t-test. Each camper had his/her BMI measured at the start of the camp and at the end. Therefore, these two sets of BMI’s are not independent.

c.

The test statistic is $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(34.9 - 31.6) - 0}{\sqrt{\frac{6.9^2}{76} + \frac{6.2^2}{76}}} = 3.10$.
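As a check on the independent-samples statistic in part c, here is a short Python sketch. It is illustrative only: the function name `z_two_sample` is hypothetical, and the sample standard deviations are used in place of the population values, as in the solution.

```python
from math import sqrt


def z_two_sample(mean1, sd1, n1, mean2, sd2, n2, null_diff=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = null_diff."""
    # Standard error treats the two samples as independent
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((mean1 - mean2) - null_diff) / se


# Starting BMI (sample 1) vs. ending BMI (sample 2), 76 campers each
z = z_two_sample(34.9, 6.9, 76, 31.6, 6.2, 76)
print(round(z, 2))
```

This treats the start and end measurements as if they came from different campers, which is exactly why the exercise goes on to contrast it with the paired-difference statistic.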

d.

The test statistic is $z = \frac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \frac{3.3}{1.5/\sqrt{76}} = 19.18$.

e.

The test statistic using the paired-difference formula is much larger than the test statistic using the independent samples formula. The test statistic for the paired-difference provides more evidence to support the alternative hypothesis.

f.

Since the p-value is less than 𝛼 ($p < .0001 < .01$), H0 is rejected. There is sufficient evidence to indicate the mean BMI at the end of camp is less than the mean BMI at the start of camp.

g.

No, the differences in the BMI values do not have to be normally distributed. The sample size is $n_d = 76$. Thus, the Central Limit Theorem applies and says that the sampling distribution of $\bar{x}_d$ will be approximately normally distributed.

h.

For confidence coefficient .99, 𝛼 = .01 and 𝛼/2 = .01/2 = .005. From Table II, Appendix D, $z_{.005} = 2.58$. The 99% confidence interval is:

$\bar{x}_d \pm z_{\alpha/2}\frac{s_d}{\sqrt{n_d}} \Rightarrow 3.3 \pm 2.58\frac{1.5}{\sqrt{76}} \Rightarrow 3.3 \pm .444 \Rightarrow (2.856, 3.744)$

We are 99% confident that the true difference in the mean BMI scores between the start of camp and the end of camp is between 2.856 and 3.744.

8.111

a.

No. Just looking at the sample means, as the students went from no solution to check figures, the sample mean improvement score increased. However, as the students went from check figures to complete solutions, the sample mean improvement score dropped to below the no solution group.

b.

The problem with using only the sample means to make inferences about the population mean knowledge gains for the three groups of students is that we don’t know the variability or the “spread” of the probability distributions of the populations.

c.

Let $\mu_1$ = mean knowledge gain for students in the “no solutions” group and $\mu_2$ = mean knowledge gain for students in the “check figures” group. To determine if the test score improvement decreases as the level of assistance increases, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

d.

Since the observed significance level of the test is not less than 𝛼 ($p = .8248 \nless .05$), H0 is not rejected. There is insufficient evidence to indicate that the mean knowledge gain of students in the “no solutions” group is greater than the mean knowledge gain of students in the “check figures” group at 𝛼 = .05.

e.

Let $\mu_3$ = mean knowledge gain of students in the “completed solutions” group. To determine if the test score improvement decreases as the level of assistance increases, we test:

$H_0: \mu_2 - \mu_3 = 0$
$H_a: \mu_2 - \mu_3 > 0$

f.

Since the observed significance level of the test is not less than 𝛼 ($p = .1849 \nless .05$), do not reject H0. There is insufficient evidence to indicate that the mean knowledge gain of students in the “check figures” group is greater than the mean knowledge gain of students in the “complete solutions” group at 𝛼 = .05.

g.

To determine if the test score improvement decreases as the level of assistance increases, we test:

$H_0: \mu_1 - \mu_3 = 0$
$H_a: \mu_1 - \mu_3 > 0$

h.

Since the observed significance level of the test is not less than 𝛼 ($p = .2726 \nless .05$), do not reject H0. There is insufficient evidence to indicate that the mean knowledge gain of students in the “no solutions” group is greater than the mean knowledge gain of students in the “complete solutions” group at 𝛼 = .05.

8.112

a.

$\mu_d = \frac{\sum d_i}{N} = \frac{\sum d_i}{51} = -4.686$

b.

We do not need to estimate anything; we know the parameter’s value.

c.

Using MINITAB, the descriptive statistics are:

Statistics

Variable   N    Mean  StDev  Minimum  Median  Maximum
MATH2019  51  552.20  51.29   460.00  546.00   648.00
MATH2017  51  556.88  47.12   468.00  548.00   651.00
MathDiff  51   -4.69  22.11   -89.00    1.00    32.00

Let $\mu_1$ = mean Math SAT score in 2019 and $\mu_2$ = mean Math SAT score in 2017. Then $\mu_d = \mu_1 - \mu_2$. To determine if the true mean Math SAT score in 2019 differs from that in 2017, we test:

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

The test statistic is $z = \frac{\bar{x}_d - 0}{s_d/\sqrt{n_d}} = \frac{-4.69}{22.11/\sqrt{51}} = -1.51$

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z < -1.645$ or $z > 1.645$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -1.51 \nless -1.645$), H0 is not rejected. There is insufficient evidence to indicate the true mean Math SAT score in 2019 is different than that in 2017 at 𝛼 = .10.

8.113

For probability .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. Since we have no prior information about the proportions, we use $p_1 = p_2 = .5$ to get a conservative estimate.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1q_1 + p_2q_2)}{(ME)^2} = \frac{(1.96)^2(.5(.5) + .5(.5))}{(.02)^2} = \frac{1.9208}{.0004} = 4{,}802$

8.114

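For confidence level .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The standard deviation can be estimated by dividing the range by 4:

$\sigma \approx \frac{\text{Range}}{4} = \frac{4}{4} = 1$

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = \frac{(1.96)^2(1^2 + 1^2)}{(.2)^2} = \frac{7.6832}{.04} = 192.08 \approx 193$

8.115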

Some preliminary calculations are:

$s_1^2 = \frac{\sum x_1^2 - \frac{(\sum x_1)^2}{n_1}}{n_1 - 1} = 31.5$

$s_2^2 = \frac{\sum x_2^2 - \frac{(\sum x_2)^2}{n_2}}{n_2 - 1} = 11.3$

Let $\sigma_1^2$ = variance for instrument A and $\sigma_2^2$ = variance for instrument B. Since we wish to determine if there is a difference in the precision of the two machines, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_1^2}{s_2^2} = \frac{31.5}{11.3} = 2.79$

The rejection region requires 𝛼/2 = .10/2 = .05 in the upper tail of the F-distribution with $\nu_1 = n_1 - 1 = 5 - 1 = 4$ and $\nu_2 = n_2 - 1 = 5 - 1 = 4$. From Table VI, Appendix D, $F_{.05} = 6.39$. The rejection region is $F > 6.39$.

Since the observed value of the test statistic does not fall in the rejection region ($F = 2.79 \ngtr 6.39$), H0 is not rejected. There is insufficient evidence of a difference in the precision of the two instruments at 𝛼 = .10.

8.116

a.

Let $\mu_1$ = the mean heat rate of traditional augmented gas turbines and $\mu_2$ = the mean heat rate of aeroderivative augmented gas turbines. Some preliminary calculations are:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(39 - 1)(1{,}279)^2 + (7 - 1)(2{,}652)^2}{39 + 7 - 2} = 2{,}371{,}831.409$

To determine if there is a difference in the mean heat rates for traditional augmented gas turbines and the mean heat rates of aeroderivative augmented gas turbines, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(11{,}544 - 12{,}312) - 0}{\sqrt{2{,}371{,}831.409\left(\frac{1}{39} + \frac{1}{7}\right)}} = -1.21$

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 39 + 7 - 2 = 44$. From Table III, Appendix D, $t_{.025} \approx 2.021$. The rejection region is $t < -2.021$ or $t > 2.021$.

Since the observed value of the test statistic does not fall in the rejection region ($t = -1.21 \nless -2.021$), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the mean heat rates for traditional augmented gas turbines and the mean heat rates of aeroderivative augmented gas turbines at 𝛼 = .05.

b.

Let $\mu_1$ = the mean heat rate of advanced augmented gas turbines and $\mu_2$ = the mean heat rate of aeroderivative augmented gas turbines. Some preliminary calculations are:

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(21 - 1)(639)^2 + (7 - 1)(2{,}652)^2}{21 + 7 - 2} = 1{,}937{,}117.077$

To determine if there is a difference in the mean heat rates for advanced augmented gas turbines and the mean heat rates of aeroderivative augmented gas turbines, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(9{,}764 - 12{,}312) - 0}{\sqrt{1{,}937{,}117.077\left(\frac{1}{21} + \frac{1}{7}\right)}} = -4.19$

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the t-distribution with $df = n_1 + n_2 - 2 = 21 + 7 - 2 = 26$. From Table III, Appendix D, $t_{.025} = 2.056$. The rejection region is $t < -2.056$ or $t > 2.056$.

Since the observed value of the test statistic falls in the rejection region ($t = -4.19 < -2.056$), H0 is rejected. There is sufficient evidence to indicate that there is a difference in the mean heat rates for advanced augmented gas turbines and the mean heat rates of aeroderivative augmented gas turbines at 𝛼 = .05.

c.

Let $\sigma_1^2$ = heat rate variance of traditional augmented gas turbines, $\sigma_2^2$ = heat rate variance of aeroderivative augmented gas turbines, and $\sigma_3^2$ = heat rate variance of advanced augmented gas turbines. Using MINITAB, some preliminary calculations are:

Descriptive Statistics: HEATRATE

Variable  ENGINE        N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
HEATRATE  Advanced     21   9764    639     9105   9252    9669  10060    11588
          Aeroderiv     7  12312   2652     8714   9469   12414  14628    16243
          Traditional  39  11544   1279    10086  10592   11183  11964    14796

To determine if the heat rate variances for traditional and aeroderivative augmented gas turbines differ, we test:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_1^2} = \frac{2{,}652^2}{1{,}279^2} = 4.299$

The rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 7 - 1 = 6$ and $\nu_2 = n_1 - 1 = 39 - 1 = 38$. From Table VII, Appendix D, $F_{.025} \approx 2.74$. The rejection region is $F > 2.74$.

Since the observed value of the test statistic falls in the rejection region ($F = 4.299 > 2.74$), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for traditional and aeroderivative augmented gas turbines differ at 𝛼 = .05.

Since the test in part a assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different.

d.

To determine if the heat rate variances for advanced and aeroderivative augmented gas turbines differ, we test:

$H_0: \sigma_2^2 = \sigma_3^2$
$H_a: \sigma_2^2 \ne \sigma_3^2$

The test statistic is $F = \frac{\text{Larger sample variance}}{\text{Smaller sample variance}} = \frac{s_2^2}{s_3^2} = \frac{2{,}652^2}{639^2} = 17.224$

The rejection region requires 𝛼/2 = .05/2 = .025 in the upper tail of the F-distribution with $\nu_1 = n_2 - 1 = 7 - 1 = 6$ and $\nu_2 = n_3 - 1 = 21 - 1 = 20$. From Table VII, Appendix D, $F_{.025} = 3.13$. The rejection region is $F > 3.13$.

Since the observed value of the test statistic falls in the rejection region ($F = 17.224 > 3.13$), H0 is rejected. There is sufficient evidence to indicate the heat rate variances for advanced and aeroderivative augmented gas turbines differ at 𝛼 = .05.

Since the test in part b assumes that the population variances are the same, the validity of the test is suspect since we just found the variances are different.
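The two variance-ratio statistics in parts c and d can be reproduced with a short Python sketch. This is only an illustration: `f_statistic` is a hypothetical helper, and the critical values 2.74 and 3.13 are the table values quoted in the solution, entered by hand rather than computed.

```python
def f_statistic(sd_a, sd_b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)


# Standard deviations from the MINITAB summary of HEATRATE
SD_TRADITIONAL, SD_AERODERIV, SD_ADVANCED = 1279, 2652, 639

f_trad = f_statistic(SD_TRADITIONAL, SD_AERODERIV)
f_adv = f_statistic(SD_ADVANCED, SD_AERODERIV)

# Upper-tail F_.025 table values quoted in parts c and d
reject_trad = f_trad > 2.74
reject_adv = f_adv > 3.13
print(round(f_trad, 3), reject_trad, round(f_adv, 3), reject_adv)
```

Putting the larger variance in the numerator means only the upper-tail critical value is needed for the two-tailed test.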


8.117

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$.

$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2 + \sigma_2^2)}{(ME)^2} = 292.9 \approx 293$

8.118

a.

Let $\mu_1$ = mean scale score for employees who report positive spillover of work skills and $\mu_2$ = mean scale score for employees who did not report positive work spillover. To determine if the mean scale score for employees who report positive spillover of work skills differs from the mean scale score for employees who did not report positive work spillover, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

b.

It is appropriate to apply the large sample z-test because there are 114 workers that have been studied and divided into two groups.

c.

From the printout, the test statistic is $t = 8.847$ (equal variances not assumed) and the p-value is $p < .0001$. Since the p-value is less than 𝛼 ($p < .0001 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean scale score for employees who report positive spillover of work skills is different from the mean scale score for employees who did not report positive work spillover at 𝛼 = .05.

d.

We are 95% confident that the difference between the mean use of creative ideas scale scores for the two groups is between .6287 and .9865. Since the interval does not contain 0, we can say that there is a significant difference in the mean scale scores between the two groups. Yes, the inference derived from the confidence interval agrees with that from the hypothesis test.

e.

Let $p_1$ = proportion of male workers who reported positive work spillover and $p_2$ = proportion of male workers who did not report positive spillover of work skills. To determine if the proportions of male workers in the two groups are significantly different, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 \ne 0$

From the printout, the test statistic is $z = .77$ and the p-value is $p = .442$. Since the p-value is not small, there is no evidence to reject H0. There is insufficient evidence to indicate the proportions of male workers in the two groups are significantly different at any value of 𝛼 < .442.

8.119

Attitude toward the Advertisement: The p-value is $p = .091$. There is no evidence to reject H0 for 𝛼 = .05. There is no evidence to indicate the first ad will be more effective when shown to males for 𝛼 = .05. There is evidence to reject H0 for 𝛼 = .10. There is evidence to indicate the first ad will be more effective when shown to males for 𝛼 = .10.

Attitude toward Brand of Soft Drink: The p-value is $p = .032$. There is evidence to reject H0 for 𝛼 > .032. There is evidence to indicate the first ad will be more effective when shown to males for 𝛼 > .032.

Intention to Purchase the Soft Drink: The p-value is $p = .050$. There is no evidence to reject H0 for 𝛼 = .05. There is no evidence to indicate the first ad will be more effective when shown to males for 𝛼 = .05. There is evidence to reject H0 for 𝛼 > .050. There is evidence to indicate the first ad will be more effective when shown to males for 𝛼 > .050.

No, I do not agree with the author’s hypothesis. The results agree with the author’s hypothesis for only the attitude toward the Brand using 𝛼 = .05. If we want to use 𝛼 = .10, then the author’s hypotheses are all supported.

8.120

Let $\mu_1$ = mean amount of surplus Missouri producers are willing to sell to the biomass market and $\mu_2$ = mean amount of surplus Illinois producers are willing to sell to the biomass market. To determine if there is a difference in the mean amount of surplus producers are willing to sell to the biomass market between Missouri and Illinois producers, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

The test statistic is $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(21.5 - 22.2) - 0}{\sqrt{\frac{33.4^2}{431} + \frac{34.9^2}{508}}} = -.31$.

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$.

Since the observed value of the test statistic does not fall in the rejection region ($z = -.31 \nless -1.96$), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean amount of surplus producers are willing to sell to the biomass market between Missouri and Illinois producers at 𝛼 = .05.

8.121

Let $p_1$ = proportion of African American MBA students who begin their career as entrepreneurs and $p_2$ = proportion of white MBA students who begin their career as entrepreneurs. Some preliminary calculations:

$\hat{p}_1 = \frac{x_1}{n_1} = .1603$, $\hat{q}_1 = 1 - \hat{p}_1 = 1 - .1603 = .8397$

$\hat{p}_2 = \frac{x_2}{n_2} = .05$, $\hat{q}_2 = 1 - \hat{p}_2 = 1 - .05 = .95$

$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = .0671$, $\hat{q} = 1 - \hat{p} = 1 - .0671 = .9329$

To determine if African American MBA students are more likely to begin their careers as entrepreneurs than white MBA students, we test:

$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$

The test statistic is $z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(.1603 - .05) - 0}{\sqrt{.0671(.9329)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = 14.64$

Since no 𝛼 was given, we will use 𝛼 = .05. The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$. The rejection region is $z > 1.645$.

Since the observed value of the test statistic falls in the rejection region ($z = 14.64 > 1.645$), H0 is rejected. There is sufficient evidence to indicate that the proportion of African American MBA students who begin their career as entrepreneurs is significantly greater than the proportion of white MBA students who begin their career as entrepreneurs.
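The pooled two-proportion z statistic used above can be sketched in Python. Note the assumptions: the function name `pooled_prop_z` is hypothetical, and the counts in the usage example (60 of 100 versus 45 of 100) are made-up illustration values, not the data from this exercise.

```python
from math import sqrt


def pooled_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion under H0
    p_hat = (x1 + x2) / (n1 + n2)
    q_hat = 1 - p_hat
    se = sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Hypothetical counts chosen only to exercise the formula
z = pooled_prop_z(60, 100, 45, 100)
Z_CRITICAL = 1.645  # z_.05 for an upper-tailed test at alpha = .05
print(round(z, 3), z > Z_CRITICAL)
```

Pooling is appropriate here because the null hypothesis states the two population proportions are equal, so a single estimate of the common proportion is used in the standard error.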


8.122

a.

We cannot make inferences about the difference between the mean salaries of male and female accounting/finance/banking professionals because no standard deviations are provided.

b.

Let $\mu_1$ = mean salary for males and $\mu_2$ = mean salary for females. To determine if the mean salary for males is significantly greater than that for females, we test:

$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$

The rejection region requires 𝛼 = .05 in the upper tail of the z-distribution. From Table II, Appendix D, $z_{.05} = 1.645$.

To make things easier, we will assume that the standard deviations for the 2 groups are the same. The test statistic is

$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

In order to reject H0 this test statistic must fall in the rejection region, or be greater than 1.645. Solving for $\sigma$ we get:

$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > 1.645 \Rightarrow \sigma < \frac{\bar{x}_1 - \bar{x}_2}{1.645\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 286{,}866.99$

Thus, to reject H0 the average of the two standard deviations has to be less than $286,866.99.

c.

Yes. In fact, reasonable values for the standard deviation will be much smaller than the required $286,866.99.

d.

These data were collected from voluntary subjects who responded to a Web-based survey. Thus, this is not a random sample, but a self-selected sample. Generally, subjects who respond to surveys tend to have very strong opinions, which may not be the same as the population in general. Thus, the results from this self-selected sample may not reflect the results from the population in general.

8.123

Let 𝜇1 = mean output for Design 1, 𝜇2 = mean output for Design 2, and 𝜇d = 𝜇1 − 𝜇2. Some preliminary calculations are:

Working Days    Difference (Design 1 − Design 2)
8/16            −53
8/17            −271
8/18            −206
8/19            −266
8/20            −213
8/23            −183
8/24            −118
8/25            −87

𝑥̄d = Σ𝑑/𝑛 = −1,397/8 = −174.625

𝑠d² = (Σ𝑑² − (Σ𝑑)²/𝑛)/(𝑛 − 1) = (289,793 − (−1,397)²/8)/(8 − 1) = 6,548.839

𝑠d = √6,548.839 = 80.925


To determine if Design 2 is superior to Design 1, we test:

𝐻0: 𝜇d = 0
𝐻a: 𝜇d < 0

The test statistic is 𝑡 = 𝑥̄d/(𝑠d/√𝑛) = −174.625/(80.925/√8) = −6.103

Since no 𝛼 value was given, we will use 𝛼 = .05. The rejection region requires 𝛼 = .05 in the lower tail of the t-distribution with 𝑑𝑓 = 𝑛 − 1 = 8 − 1 = 7. From Table III, Appendix D, 𝑡.05 = 1.895. The rejection region is 𝑡 < −1.895. Since the observed value of the test statistic falls in the rejection region (𝑡 = −6.103 < −1.895), H0 is rejected. There is sufficient evidence to indicate Design 2 is superior to Design 1 at 𝛼 = .05.

For confidence coefficient .95, 𝛼 = .05 and 𝛼/2 = .05/2 = .025. From Table III, Appendix D, with 𝑑𝑓 = 𝑛 − 1 = 8 − 1 = 7, 𝑡.025 = 2.365. The 95% confidence interval for 𝜇d is:

𝑥̄d ± 𝑡.025(𝑠d/√𝑛) ⇒ −174.625 ± 2.365(80.925/√8) ⇒ −174.625 ± 67.666 ⇒ (−242.29, −106.96)

Since this interval does not contain 0, there is evidence to indicate Design 2 is superior to Design 1.
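The paired-difference test and confidence interval in this exercise can be checked numerically. The following is a minimal sketch using only the Python standard library; the critical values are copied from the solution text (Table III, df = 7) rather than computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Paired differences (Design 1 - Design 2) for the eight working days.
differences = [-53, -271, -206, -266, -213, -183, -118, -87]

n = len(differences)
d_bar = mean(differences)        # sample mean of the differences
s_d = stdev(differences)         # sample standard deviation (n - 1 divisor)
std_err = s_d / math.sqrt(n)     # standard error of the mean difference

t_stat = d_bar / std_err

# Critical values taken from the solution text (assumed, not computed here).
T_05 = 1.895    # one-tailed test, alpha = .05, df = 7
T_025 = 2.365   # 95% confidence interval, df = 7

reject_h0 = t_stat < -T_05       # lower-tailed rejection region
margin = T_025 * std_err
ci = (d_bar - margin, d_bar + margin)

print(f"mean = {d_bar:.3f}, s = {s_d:.3f}, t = {t_stat:.3f}")
print(f"reject H0: {reject_h0}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is built from the same standard error as the test statistic, an interval lying entirely below 0 agrees with rejecting H0 in the lower tail.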

Chapter 9 Design of Experiments and Analysis of Variance

9.1

Since only one factor is utilized, the treatments are the four levels (A, B, C, D) of the qualitative factor.

9.2

The treatments are the combinations of levels of each of the two factors. There are 2 × 5 = 10 treatments. They are: (A, 50), (A, 60), (A, 70), (A, 80), (A, 90), (B, 50), (B, 60), (B, 70), (B, 80), (B, 90)

9.3

One has no control over the levels of the factors in an observational experiment. One does have control of the levels of the factors in a designed experiment.

9.4

a.

College GPA's are measured on college students. The experimental units are college students.

b.

Household income is measured on households. The experimental units are households.

c.

Gasoline mileage is measured on automobiles. The experimental units are the automobiles of a particular model.

d.

The experimental units are the sectors on a computer diskette.

e.

The experimental units are the states.

9.5

a.

This is an observational experiment. The economist has no control over the factor levels or unemployment rates.

b.

This is a designed experiment. The manager chooses only three different incentive programs to compare, and randomly assigns an incentive program to each of nine plants.

c.

This is an observational experiment. Even though the marketer chooses the publication, he has no control over who responds to the ads.

d.

This is an observational experiment. The load on the facility's generators is only observed, not controlled.

e.

This is an observational experiment. One has no control over the distance of the haul, the goods hauled, or the price of diesel fuel.

9.6

a.

The response variable is the self-regulation deficiency value of a binge watcher.

b.

There is just a single factor in this problem – the age group of the binge watcher.

c.

The treatments are the three levels of the factor – under 20 years old, 20-25 years old, and over 25 years old.

d.

The experimental unit is a binge watcher.


9.7

a.

The experimental units are the firms with CPAs.

b.

The response variable is the firm’s likelihood of reporting sustainability policies.

c.

There are two factors – firm size and firm type.

d.

There are two levels of firm size – large and small. There are two levels of firm type – public and private.

e.

The treatments are the combinations of the factor levels. There are 2 × 2 = 4 treatments – large/public, large/private, small/public, and small/private.

9.8

a.

The experimental units are consumers.

b.

The response variable is the product quality rating.

c.

There are two factors – product advertised and advertisement message.

d.

There are two levels of product advertised – soda and soap. There are two levels of advertisement message – control and passion.

e.

The treatments are the combinations of the factor levels. There are 2 × 2 = 4 treatments – soda/control, soda/passion, soap/control, and soap/passion.

9.9

a.

The study is designed because the experimental units (study participants) were randomly assigned to the treatments (gift givers and gift receivers).

b.

The experimental units are the study participants. The response variable is the level of appreciation measured on a scale from 1 to 7. There is one factor – role. There are two levels of role and thus, two treatments. The treatments are gift giver or gift receiver.

9.10

a.

The response variable in this problem is the consumer’s opinion on the value of the discount offer.

b.

There are two treatments in this problem: Within-store price promotion and between-store price promotion.

c.

The experimental units are the consumers.

9.11

a.

There are 2 factors in this problem, each with 2 levels. Thus, there are a total of 2 × 2 = 4 treatments.

b.

The 4 treatments are: (Within-store, home), (Within-store, in store), (Between-store, home), and (Between-store, in store).

9.12

a.

The experimental unit is a healthy Boer goat.

b.

The response variable is the weight loss.

c.

The factor is the amount and time of ascorbic acid (AA) administered.

d.

There are 4 levels of the factor: AA administered 30 minutes prior to transportation, AA administered 30 minutes following transportation, no AA administered prior or following transportation, and no AA and no transportation.

9.13

a.

This is a designed experiment because the subjects are randomly assigned to groups and groups are randomly assigned to a decision rule.

b.

The experimental unit is a subject or person. The dependent variable is the number of words spoken by women on a certain topic per 1,000 total words spoken.

c.

There are 2 factors in this experiment – gender composition and decision rule. Gender composition has 6 levels: 0, 1, 2, 3, 4, or 5 women. Decision rule has 2 levels: unanimous or majority rule.

d.

There are a total of 6 × 2 = 12 treatments. They are: (0, U), (1, U), (2, U), (3, U), (4, U), (5, U), (0, M), (1, M), (2, M), (3, M), (4, M), (5, M).

9.14

a.

The dependent variable is the dissolution time.

b.

There are 3 factors in this experiment: Binding agent, binding concentration, and relative density. Binding agent has 2 levels – khaya gum and PVP. Binding concentration has 2 levels – .5% and 4.0%. Relative density has 2 levels – high and low.

c.

There could be a total of 2 × 2 × 2 = 8 treatments for this experiment. They are:

khaya gum, .5%, high
khaya gum, .5%, low
khaya gum, 4.0%, high
khaya gum, 4.0%, low
PVP, .5%, high
PVP, .5%, low
PVP, 4.0%, high
PVP, 4.0%, low

9.15

a.

From Table VI with ν1 = 4 and ν2 = 4, F.05 = 6.39.

b.

From Table VIII with ν1 = 4 and ν2 = 4, F.01 = 15.98.

c.

From Table V with ν1 = 30 and ν2 = 40, F.10 = 1.54.

d.

From Table VII with ν1 = 15 and ν2 = 12, F.025 = 3.18.

9.16

a.

𝑃(𝐹 ≤ 3.48) = 1 − .05 = .95 using Table VI, Appendix D, with 𝜈1 = 5 and 𝜈2 = 9

b.

𝑃(𝐹 > 3.09) = .01 using Table VIII, Appendix D, with 𝜈1 = 15 and 𝜈2 = 20

c.

𝑃(𝐹 > 2.40) = .05 using Table VI, Appendix D, with 𝜈1 = 15 and 𝜈2 = 15

d.

𝑃(𝐹 ≤ 1.83) = 1 − .10 = .90 using Table V, Appendix D, with 𝜈1 = 8 and 𝜈2 = 40

9.17

a.

In dot diagram #2, the difference between the sample means is small relative to the variability within the sample observations. In dot diagram #1, the values in each of the samples are grouped together with a range of 4, while in dot diagram #2, the range of values is 8.

b.

For diagram #1,

x̄1 = Σx1/n1 = (7 + 8 + 9 + 9 + 10 + 11)/6 = 54/6 = 9
x̄2 = Σx2/n2 = (12 + 13 + 14 + 14 + 15 + 16)/6 = 84/6 = 14

For diagram #2,

x̄1 = Σx1/n1 = (5 + 5 + 7 + 11 + 13 + 13)/6 = 54/6 = 9
x̄2 = Σx2/n2 = (10 + 10 + 12 + 16 + 18 + 18)/6 = 84/6 = 14

c.

For diagram #1,

x̄ = Σx/n = (54 + 84)/12 = 11.5

SST = Σ ni(x̄i − x̄)² = 6(9 − 11.5)² + 6(14 − 11.5)² = 75

For diagram #2,

SST = Σ ni(x̄i − x̄)² = 6(9 − 11.5)² + 6(14 − 11.5)² = 75

d.

For diagram #1,

s1² = (Σx1² − (Σx1)²/n1)/(n1 − 1) = (496 − 54²/6)/(6 − 1) = 2

s2² = (Σx2² − (Σx2)²/n2)/(n2 − 1) = (1186 − 84²/6)/(6 − 1) = 2

SSE = (n1 − 1)s1² + (n2 − 1)s2² = (6 − 1)2 + (6 − 1)2 = 20

For diagram #2,

s1² = (Σx1² − (Σx1)²/n1)/(n1 − 1) = (558 − 54²/6)/(6 − 1) = 14.4

s2² = (Σx2² − (Σx2)²/n2)/(n2 − 1) = (1248 − 84²/6)/(6 − 1) = 14.4

SSE = (n1 − 1)s1² + (n2 − 1)s2² = (6 − 1)14.4 + (6 − 1)14.4 = 144
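The sums of squares for the two dot diagrams can be checked numerically. The sketch below uses only the Python standard library; the function name `one_way_anova` is illustrative, and the data are the observations listed in part b.

```python
from statistics import mean, variance

def one_way_anova(samples):
    """Return SST, SSE, MST, MSE, and F for a completely randomized design."""
    k = len(samples)                               # number of treatments
    n = sum(len(s) for s in samples)               # total sample size
    grand_mean = sum(sum(s) for s in samples) / n
    # Treatment sum of squares: variation between the sample means.
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Error sum of squares: pooled variation within the samples.
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}

diagram_1 = [[7, 8, 9, 9, 10, 11], [12, 13, 14, 14, 15, 16]]
diagram_2 = [[5, 5, 7, 11, 13, 13], [10, 10, 12, 16, 18, 18]]

for name, data in (("diagram #1", diagram_1), ("diagram #2", diagram_2)):
    result = one_way_anova(data)
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Both diagrams share the same SST because the sample means are identical; only the within-sample spread, and therefore SSE, changes.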

e.

For diagram #1, SS(Total) = SST + SSE = 75 + 20 = 95

SST is (SST/SS(Total)) × 100% = (75/95) × 100% = 78.95% of SS(Total)

For diagram #2, SS(Total) = SST + SSE = 75 + 144 = 219

SST is (SST/SS(Total)) × 100% = (75/219) × 100% = 34.25% of SS(Total)

f.

For diagram #1, MST = SST/(k − 1) = 75/(2 − 1) = 75, MSE = SSE/(n − k) = 20/(12 − 2) = 2, F = MST/MSE = 75/2 = 37.5

For diagram #2, MST = SST/(k − 1) = 75/(2 − 1) = 75, MSE = SSE/(n − k) = 144/(12 − 2) = 14.4, F = MST/MSE = 75/14.4 = 5.21

g.

The rejection region for both diagrams requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈1 = 𝑘 − 1 = 2 − 1 = 1 and 𝜈2 = 𝑛 − 𝑘 = 12 − 2 = 10. From Table VI, Appendix D, 𝐹.05 = 4.96. The rejection region is 𝐹 > 4.96.

For diagram #1, since the observed value of the test statistic falls in the rejection region (𝐹 = 37.5 > 4.96), H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at 𝛼 = .05.

For diagram #2, since the observed value of the test statistic falls in the rejection region (𝐹 = 5.21 > 4.96), H0 is rejected. There is sufficient evidence to indicate the samples were drawn from populations with different means at 𝛼 = .05.

h.

We must assume both populations are normally distributed with common variances.

9.18

For each dot diagram, we want to test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

From Exercise 9.17,

Diagram #1: x̄1 = 9, x̄2 = 14, s1² = 2, s2² = 2
Diagram #2: x̄1 = 9, x̄2 = 14, s1² = 14.4, s2² = 14.4

a.

Diagram #1: sp² = (s1² + s2²)/2 = (2 + 2)/2 = 2 (since n1 = n2). In Exercise 9.17, MSE = 2.

Diagram #2: sp² = (s1² + s2²)/2 = (14.4 + 14.4)/2 = 14.4 (since n1 = n2). In Exercise 9.17, MSE = 14.4.

The pooled variance for the two-sample t-test is the same as the MSE for the F-test.

b.

Diagram #1: t = (x̄1 − x̄2)/√(sp²(1/n1 + 1/n2)) = (9 − 14)/√(2(1/6 + 1/6)) = −6.12. In Exercise 9.17, F = 37.5.

Diagram #2: t = (x̄1 − x̄2)/√(sp²(1/n1 + 1/n2)) = (9 − 14)/√(14.4(1/6 + 1/6)) = −2.28. In Exercise 9.17, F = 5.21.

The test statistic for the F-test is the square of the test statistic for the t-test.

c.

Diagram #1: For the t-test, the rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n1 + n2 − 2 = 6 + 6 − 2 = 10. From Table III, Appendix D, t.025 = 2.228. The rejection region is 𝑡 < −2.228 or 𝑡 > 2.228.

Diagram #2: For the t-test, the rejection region is the same as Diagram #1 since we are using the same α, n1, and n2 for both tests. The rejection region is 𝑡 < −2.228 or 𝑡 > 2.228.

In Exercise 9.17, the rejection region for both diagrams using the F-test is 𝐹 > 4.96. The tabled F value equals the square of the tabled t value (2.228² ≈ 4.96).

d.

Diagram #1: For the t-test, since the test statistic falls in the rejection region (t = −6.12 < −2.228), we would reject H0. In Exercise 9.17, using the F-test, we rejected H0.

Diagram #2: For the t-test, since the test statistic falls in the rejection region (t = −2.28 < −2.228), we would reject H0. In Exercise 9.17, using the F-test, we rejected H0.

e.

Assumptions for the t-test:

1. Both populations have relative frequency distributions that are approximately normal.
2. The two population variances are equal.
3. Samples are selected randomly and independently from the populations.

Assumptions for the F-test:

1. Both population probability distributions are normal.
2. The two population variances are equal.
3. Samples are selected randomly and independently from the respective populations.

The assumptions are the same for both tests.

9.19

Refer to Exercise 9.17. The ANOVA tables are:

For diagram #1:

Source      df    SS    MS      F
Treatment    1    75    75      37.5
Error       10    20     2
Total       11    95

For diagram #2:

Source      df    SS    MS      F
Treatment    1    75    75      5.21
Error       10   144    14.4
Total       11   219

9.20

a.

SSE = SS(Total) − SST = 46.5 − 17.5 = 29.0

df for Error is 41 − 6 = 35

MST = SST/(k − 1) = 17.5/6 = 2.9167

MSE = SSE/(n − k) = 29.0/35 = .8286

F = MST/MSE = 2.9167/.8286 = 3.52

The ANOVA table is:

Source      df    SS      MS       F
Treatment    6    17.5    2.9167   3.52
Error       35    29.0     .8286
Total       41    46.5
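The missing entries of a partially completed one-way ANOVA table follow mechanically from the given sums of squares and degrees of freedom. The sketch below is one way to check the arithmetic; the function name `complete_anova_table` and its parameter names are illustrative, not part of the exercise.

```python
def complete_anova_table(sst, ss_total, df_treatment, df_total):
    """Fill in the missing entries of a one-way ANOVA table."""
    sse = ss_total - sst                  # SS(Total) = SST + SSE
    df_error = df_total - df_treatment    # df add up in the same way
    mst = sst / df_treatment
    mse = sse / df_error
    return {
        "SSE": sse,
        "df_error": df_error,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
        "k": df_treatment + 1,            # treatments: df_treatment = k - 1
        "n": df_total + 1,                # sample size: df_total = n - 1
    }

table = complete_anova_table(sst=17.5, ss_total=46.5, df_treatment=6, df_total=41)
for key, value in table.items():
    print(key, round(value, 4))
```

The same degrees of freedom also give the number of treatments and the total sample size, since df(Treatment) = k − 1 and df(Total) = n − 1.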


b.

The number of treatments is k. We know k − 1 = 6 ⇒ k = 7.

c.

The total sample size is n = 41 + 1 = 42 , where 41 = df Total.

d.

First, one would number the 42 experimental units from 1 to 42. Then generate over 100 uniform random numbers from 1 to 42 (more than 42 are needed to allow for duplicates). The first 6 different random numbers will correspond to treatment 1. The next 6 different random numbers will correspond to treatment 2. Repeat the process for treatments 3, 4, 5, 6, and 7.
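A random assignment of this kind can be sketched in a few lines. Note the swapped-in technique: instead of drawing random numbers and discarding duplicates, the sketch shuffles the unit labels once and cuts the shuffled list into equal blocks, which yields the same kind of assignment. The function name and the `seed` parameter are illustrative assumptions.

```python
import random

def assign_treatments(n_units, n_treatments, seed=None):
    """Randomly split units 1..n_units into equal-sized treatment groups."""
    if n_units % n_treatments != 0:
        raise ValueError("n_units must be a multiple of n_treatments")
    rng = random.Random(seed)             # seed only for reproducibility
    units = list(range(1, n_units + 1))
    rng.shuffle(units)                    # random order, no duplicates
    size = n_units // n_treatments
    # Treatment t gets the t-th block of the shuffled unit labels.
    return {
        t + 1: sorted(units[t * size:(t + 1) * size])
        for t in range(n_treatments)
    }

assignment = assign_treatments(n_units=42, n_treatments=7, seed=1)
for treatment, members in assignment.items():
    print(treatment, members)
```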

e.

To determine if there is a difference among the population means, we test:

H0: μ1 = μ2 = ⋯ = μ7
Ha: At least one of the population means differs from the rest

The test statistic is 𝐹 = 3.52. The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with 𝜈1 = 𝑘 − 1 = 7 − 1 = 6 and 𝜈2 = 𝑛 − 𝑘 = 42 − 7 = 35. From Table V, Appendix D, 𝐹.10 ≈ 1.98. The rejection region is 𝐹 > 1.98. Since the observed value of the test statistic falls in the rejection region (𝐹 = 3.52 > 1.98), H0 is rejected. There is sufficient evidence to indicate a difference among the population means at 𝛼 = .10.

f.

The observed significance level is 𝑃(𝐹 ≥ 3.52). Using MINITAB,

Cumulative Distribution Function

F distribution with 6 DF in numerator and 35 DF in denominator

   x    P( X <= x )
3.52       0.992128

P(F ≥ 3.52) = 1 − .992128 = .007872.

g.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is t = (x̄1 − x̄2)/√(MSE(1/n1 + 1/n2)) = (3.7 − 4.1)/√(.8286(1/6 + 1/6)) = −.76.

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with df = 𝑛 − 𝑘 = 35. From Table III, Appendix D, 𝑡.05 ≈ 1.697. The rejection region is 𝑡 < −1.697 or 𝑡 > 1.697. Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −.76 ≮ −1.697), H0 is not rejected. There is insufficient evidence to indicate that μ1 and μ2 differ at 𝛼 = .10.

h.

For confidence coefficient .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with 𝑑𝑓 = 35, 𝑡.05 ≈ 1.697. The confidence interval is:

(x̄1 − x̄2) ± t.05 √(MSE(1/n1 + 1/n2)) ⇒ (3.7 − 4.1) ± 1.697√(.8286(1/6 + 1/6)) ⇒ −.4 ± .892 ⇒ (−1.292, .492)

i.

The confidence interval is:

x̄1 ± t.05 √(MSE/n1) ⇒ 3.7 ± 1.697√(.8286/6) ⇒ 3.7 ± .631 ⇒ (3.069, 4.331)

9.21

a.

Using MINITAB, the results are:

One-way ANOVA: T1, T2, T3

Source   DF      SS     MS      F       P
Factor    2   12.30   6.15   2.93   0.105
Error     9   18.89   2.10
Total    11   31.19

S = 1.449   R-Sq = 39.44%   R-Sq(adj) = 25.98%

b.

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

The test statistic is 𝐹 = 2.931 and the p-value is 𝑝 = .105. Since the p-value is not less than 𝛼 (𝑝 = .105 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate a difference in the treatment means at 𝛼 = .01.

9.22

a.

To compare the mean self-regulation deficiency scores of binge watchers in the three age groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

b.

We expect variation to occur in sample means when different samples are collected. Differences in sample means are not enough evidence to reject the null hypothesis. We must conduct a test of hypothesis to determine whether H0 should be rejected.

c.

No value of 𝛼 was specified, so we use .05. Since the p-value is less than 𝛼 (𝑝 = .001 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean self-regulation deficiency scores of binge watchers in the three age groups at 𝛼 = .05.

9.23

a.

The experimental unit is a US adult.

b.

The dependent variable, or response variable, is the perceived harm response of a US adult.

c.

The single factor for this experiment is the advertisement type. The five levels studied were organic, natural, additive-free, light, and regular.

d.

To compare the mean perceived harm responses for the five ad types, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two treatment means differ

e.

No value of 𝛼 was specified, so we use .05. Since the p-value is less than 𝛼 (𝑝 < .01 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean perceived harm responses for the five ad types at 𝛼 = .05.

9.24

a.

The type of design used was a completely randomized design.

b.

The dependent variable is the decrease in the number of promotional cards sold after implementation of the pay cuts.

c.

There is one factor in this example – type of pay cut. The factor levels are: unilateral wage cut, general wage cut, and baseline.

d.

Let μ1 = mean decrease in cards sold for those receiving the “unilateral wage cut”, μ2 = mean decrease in cards sold for those receiving the “general wage cut” and μ3 = mean decrease in cards sold for those receiving the “baseline”. To determine if the average decrease in cards sold differs depending on whether one or more of the workers received a pay cut, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

e.

Since the p-value is less than 𝛼 (𝑝 = .001 < .01), H0 is rejected. There is sufficient evidence to indicate the average decrease in the number of cards sold differs depending on whether one or more of the workers received a pay cut.

9.25

a.

There were 4 ANOVAs conducted. The response variable, treatments and hypotheses for each are:

1. Number of tweets for males; twitter skill level; H0: μ1 = μ2 = μ3 = μ4 = μ5 vs Ha: At least 1 μi differs.
2. Number of tweets for females; twitter skill level; H0: μ1 = μ2 = μ3 = μ4 = μ5 vs Ha: At least 1 μi differs.
3. Continue use for males; twitter skill level; H0: μ1 = μ2 = μ3 = μ4 = μ5 vs Ha: At least 1 μi differs.
4. Continue use for females; twitter skill level; H0: μ1 = μ2 = μ3 = μ4 = μ5 vs Ha: At least 1 μi differs.

b.

For male tweets, the p-value is 𝑝 = .331. Since the p-value is not less than 𝛼 (𝑝 = .331 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean number of tweets among the 5 levels of twitter skill at 𝛼 = .05.

For female tweets, the p-value is 𝑝 = .731. Since the p-value is not less than 𝛼 (𝑝 = .731 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean number of tweets among the 5 levels of twitter skill at 𝛼 = .05.

For male continuing usage, the p-value is 𝑝 = .062. Since the p-value is not less than 𝛼 (𝑝 = .062 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean continuing usage score among the 5 levels of twitter skill at 𝛼 = .05.

For female continuing usage, the p-value is 𝑝 = .006. Since the p-value is less than 𝛼 (𝑝 = .006 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean continuing usage scores among the 5 levels of twitter skill at 𝛼 = .05.

9.26

a.

MST = SST/df = .010/3 = .003333

MSE = SSE/df = .029/140 = .000207

F = MST/MSE = .003333/.000207 = 16.10

The completed table is:

Source      df    SS     MS        F       p-value
Exposure     3    .010   .003333   16.10   <.001
Error      140    .029   .000207
Total      143    .039

b.

To determine if there is evidence to indicate the mean dot area differs depending on the exposure time, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least 1 μi differs

The test statistic is 𝐹 = 16.10 and the p-value is 𝑝 < .001. Since the p-value is less than 𝛼 (𝑝 < .001 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean dot areas among the 4 exposure times at 𝛼 = .05.

9.27

a.

A completely randomized design was used for this study. The experimental units are the bus customers. The dependent variable is the performance score. There is one factor which is bus depot with 3 levels – Depot 1, Depot 2, and Depot 3. These factor levels are the treatments of the experiment.

b.

Yes. The p-value from the ANOVA F-test was 𝑝 = .0001. For a 95% confidence level, 𝛼 = .05. Since the p-value is less than 𝛼 (𝑝 = .0001 < .05), H0 is rejected. There is sufficient evidence to indicate the mean customer performance scores differed across the three bus depots at 𝛼 = .05.

9.28

a.

This is a completely randomized design because the subjects were randomly assigned to one of three groups.

b.

The response variable was the total WTP (willing to pay) value and the treatments were the 3 types of instructions given.

c.

To determine if the mean total WTP values differed among the three groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

d.

One would number the subjects from 1 to 252. Then, use a random number generator to generate 350 to 400 random numbers from 1 to 252. (We need to generate more than 252 random numbers to account for duplicates.) The first 84 different random numbers will be assigned to group 1, the next 84 different random numbers will be assigned to group 2, and the rest will be assigned to group 3.

9.29

a.

This was a completely randomized design.

b.

The experimental units are the college students. The dependent variable is the attitude toward tanning score and the treatments are the 3 conditions (view product advertisement with models with a tan, view product advertisement with models with no tan, and view product advertisement with no model).

c.

Let μ1 = mean attitude score for those viewing product advertisement with models with a tan, μ2 = mean attitude score for those viewing product advertisement with models without a tan, and μ3 = mean attitude score for those viewing product advertisement with no models. To determine if the treatment mean scores differ among the three groups, we test:

H0: μ1 = μ2 = μ3

d.

These are just sample means. To determine if the population means differ, we have to determine how many standard deviations are between these sample means. In addition, the next time an experiment was conducted, the sample means could change.

e.

The hypotheses are:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

The test statistic is 𝐹 = 3.60 and the p-value is 𝑝 = .03. Since the p-value is less than 𝛼 (𝑝 = .03 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean attitude scores among the three groups at 𝛼 = .05.

f.

We must assume that we have random samples from approximately normal populations with equal variances.

9.30

a.

The treatments are the three levels of the factor service condition – human staff only, service robot only, and combined human staff and service robot.

b.

To determine if mean ratings differ for the three service conditions, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

c.

The degrees of freedom for the treatments is 𝑘 − 1 = 3 − 1 = 2. The degrees of freedom for the error is 𝑛 − 𝑘 = 339 − 3 = 336.

d.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈1 = 𝑘 − 1 = 3 − 1 = 2 and 𝜈2 = 𝑛 − 𝑘 = 339 − 3 = 336. From Table VI, Appendix D, 𝐹.05 ≈ 3.00. The rejection region is 𝐹 > 3.00.

e.

Since the p-value is less than 𝛼 (𝑝 < .01 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean interaction quality of the three service conditions at 𝛼 = .05.

f.

Since the p-value is not less than 𝛼 (𝑝 > .10 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean service outcome quality of the three service conditions at 𝛼 = .05.

g.

Since the p-value is less than 𝛼 (𝑝 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the mean physical appeal quality of the three service conditions at 𝛼 = .05.

9.31

To determine if the mean recall percentages differ for student-drivers in the four groups, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least 1 μi differs

The test statistic is 𝐹 = 5.388 and the p-value is 𝑝 = .0036. Since the p-value is less than 𝛼 (𝑝 = .0036 < .01), H0 is rejected. There is sufficient evidence to indicate that the mean recall percentages differ among the 4 groups at 𝛼 = .01.

9.32

a.

I would classify this experiment as designed. Each subject was randomly assigned to receive one of the three dosages (DM, honey, nothing). There are 3 treatments in the study corresponding to the 3 dosages: DM, honey, nothing.

b.

Using MINITAB, the output is:

One-way ANOVA: TotalScore versus Treatment

Source       DF        SS       MS       F       P
Treatment     2    318.51   159.25   17.51   0.000
Error       102    927.72     9.10
Total       104   1246.23

S = 3.016   R-Sq = 25.56%   R-Sq(adj) = 24.10%

To determine if differences exist in the mean improvement scores among the 3 treatment groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

The test statistic is 𝐹 = 17.51 and the p-value is 𝑝 = 0.000. Since the observed p-value (𝑝 = 0.000) is less than any reasonable value of 𝛼, H0 is rejected. There is sufficient evidence to indicate a difference in the mean improvement scores among the three levels of dosage for any reasonable value of 𝛼.

9.33

To determine if the mean THICKNESS differs among the 4 types of housing, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is 𝐹 = 11.74 and the p-value is 𝑝 = 0.000. Since the observed p-value (𝑝 = 0.000) is less than any reasonable value of 𝛼, H0 is rejected. There is sufficient evidence to indicate a difference in the mean thickness among the four levels of housing for any reasonable value of 𝛼.

To determine if the mean WHIPPING CAPACITY differs among the 4 types of housing, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is 𝐹 = 31.36 and the p-value is 𝑝 = 0.000. Since the observed p-value (𝑝 = 0.000) is less than any reasonable value of 𝛼, H0 is rejected. There is sufficient evidence to indicate a difference in the mean whipping capacity among the four levels of housing for any reasonable value of 𝛼.

To determine if the mean STRENGTH differs among the 4 types of housing, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is 𝐹 = 1.70 and the p-value is 𝑝 = 0.193. Since the observed p-value (𝑝 = 0.193) is higher than any reasonable value of 𝛼, H0 is not rejected. There is insufficient evidence to indicate a difference in the mean strength among the four levels of housing for any reasonable value of 𝛼.

Thus, the mean thickness and the mean whipping capacity (percent overrun) differ among the 4 housing systems.

9.34

a.

x̄ = Σ ni x̄i / 177 = [55(8.52) + 39(8.55) + 28(6.63) + 55(6.38)]/177 = 1,338.59/177 = 7.5627

SST = Σ ni(x̄i − x̄)² = 55(8.52 − 7.5627)² + 39(8.55 − 7.5627)² + 28(6.63 − 7.5627)² + 55(6.38 − 7.5627)² = 189.7099

b.

SSE = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3² + (n4 − 1)s4² = (55 − 1)(1.01)² + (39 − 1)(.94)² + (28 − 1)(.31)² + (55 − 1)(.38)² = 99.0545

c.

SS(Total) = SST + SSE = 189.7099 + 99.0545 = 288.7644

MST = SST/(k − 1) = 189.7099/(4 − 1) = 63.2366

MSE = SSE/(n − k) = 99.0545/(177 − 4) = .5726

F = MST/MSE = 63.2366/.5726 = 110.44
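The sums of squares and the F statistic for this exercise depend only on the group sizes, sample means, and sample standard deviations, so they can be recomputed from those summary statistics. The sketch below does this with plain Python; the variable names are illustrative, and small differences in the last decimal place come from rounding the grand mean in the hand calculation.

```python
# (sample size, sample mean, sample standard deviation) for the four club types,
# as used in the hand computations for this exercise.
groups = [(55, 8.52, 1.01), (39, 8.55, 0.94), (28, 6.63, 0.31), (55, 6.38, 0.38)]

n = sum(size for size, _, _ in groups)    # total sample size
k = len(groups)                           # number of treatments

# Grand mean is the size-weighted average of the group means.
grand_mean = sum(size * m for size, m, _ in groups) / n

sst = sum(size * (m - grand_mean) ** 2 for size, m, _ in groups)
sse = sum((size - 1) * s ** 2 for size, _, s in groups)
ss_total = sst + sse

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

print(f"grand mean = {grand_mean:.4f}")
print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SS(Total) = {ss_total:.4f}")
print(f"MST = {mst:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
```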

The ANOVA table is:

Source    df    SS         MS        F-value
Club       3    189.7099   63.2366   110.44
Error    173     99.0545     .5726
Total    176    288.7644

d.

To determine if there are differences in the mean market value losses of clubs in the four types, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least 1 μi differs

The test statistic is 𝐹 = 110.44. The rejection region requires 𝛼 = .01 in the upper tail of the F-distribution with 𝜈1 = 𝑘 − 1 = 4 − 1 = 3 and 𝜈2 = 𝑛 − 𝑘 = 177 − 4 = 173. From Table VIII, Appendix D, 𝐹.01 ≈ 3.78. The rejection region is 𝐹 > 3.78. Since the observed value of the test statistic falls in the rejection region (𝐹 = 110.44 > 3.78), H0 is rejected. There is sufficient evidence to indicate a difference in the mean market value losses of clubs in the four types at 𝛼 = .01.

e.

The assumption of constant variance may not be satisfied since the four sample variances appear to be different (1.01² = 1.0201, .94² = .8836, .31² = .0961, and .38² = .1444).

We are unable to check the normality assumption since we need the individual observations to create a histogram or stem-and-leaf plot.

9.35

The number of pairwise comparisons is equal to 𝑘(𝑘 − 1)/2.

a.

For k = 3 , the number of comparisons is 3(3 − 1)/2 = 3.

b.

For k = 5 , the number of comparisons is 5(5 − 1)/2 = 10.

c.

For k = 4 , the number of comparisons is 4(4 − 1)/2 = 6.

d.

For k = 10 , the number of comparisons is 10(10 − 1)/2 = 45.
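The count 𝑘(𝑘 − 1)/2 is simply the number of ways to choose 2 treatments out of 𝑘. The short sketch below checks the formula against an explicit enumeration of the pairs; the function name `pairwise_comparisons` is illustrative.

```python
from itertools import combinations

def pairwise_comparisons(k):
    """Number of distinct pairs among k treatment means: k(k - 1)/2."""
    return k * (k - 1) // 2

for k in (3, 5, 4, 10):
    # Enumerate every unordered pair of treatment labels 1..k.
    pairs = list(combinations(range(1, k + 1), 2))
    assert len(pairs) == pairwise_comparisons(k)
    print(f"k = {k}: {pairwise_comparisons(k)} comparisons")
```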

9.36

The experimentwise error rate is the probability of making at least one Type I error among all of the comparisons made. If the experimentwise error rate is 𝛼 = .05, then each individual comparison is made at a value of 𝛼 which is less than .05.
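As an illustration of why each comparison uses a smaller 𝛼, the sketch below applies a Bonferroni-style split. This is an assumption for illustration only: the solution does not say which multiple comparison method is used, and other procedures set the per-comparison level differently. Dividing the experimentwise 𝛼 equally among the 𝑐 = 𝑘(𝑘 − 1)/2 comparisons keeps the experimentwise error rate at or below 𝛼.

```python
def bonferroni_alpha(experimentwise_alpha, k):
    """Per-comparison alpha when the experimentwise alpha is split equally
    over all k(k - 1)/2 pairwise comparisons (Bonferroni-style split)."""
    if k < 2:
        raise ValueError("need at least two treatments to compare")
    c = k * (k - 1) // 2                  # number of pairwise comparisons
    return experimentwise_alpha / c

for k in (3, 4, 5):
    per_comparison = bonferroni_alpha(0.05, k)
    print(f"k = {k}: each comparison at alpha = {per_comparison:.4f}")
```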

9.37

A comparisonwise error rate is the error rate (or the probability of declaring the means different when, in fact, they are not different, which is also the probability of a Type I error) for each individual comparison. That is, if each comparison is run using 𝛼 = .05, then the comparisonwise error rate is .05.

9.38

a.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, and E and D. All other pairs of means are not significantly different because they are connected by lines.

b.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and B, A and D, C and B, C and D, E and B, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.

c.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, and A and D. All other pairs of means are not significantly different because they are connected by lines.

d.

From the diagram, the following pairs of treatments are significantly different because they are not connected by a line: A and E, A and B, A and D, C and E, C and B, C and D, E and D, and B and D. All other pairs of means are not significantly different because they are connected by lines.

9.39

( μ1 − μ2 ) : ( 2, 15)

Since all values in the interval are positive, μ1 is significantly greater than μ 2 .

( μ1 − μ3 ) : ( 4, 7 )

Since all values in the interval are positive, μ1 is significantly greater than μ3 .

( μ1 − μ4 ) : ( −10, 3)

Since 0 is in the interval, μ1 is not significantly different from μ4. However, since the center of the interval is less than 0, the sample mean for treatment 4 is larger than that for treatment 1.

( μ2 − μ3 ) : ( −5, 11)

Since 0 is in the interval, μ2 is not significantly different from μ3. However, since the center of the interval is greater than 0, the sample mean for treatment 2 is larger than that for treatment 3.

( μ2 − μ4 ) : ( −12, −6 )

Since all values in the interval are negative, μ4 is significantly greater than μ2.

( μ3 − μ4 ) : ( −8, −5 )

Since all values in the interval are negative, μ4 is significantly greater than μ3.

Thus, the largest mean is μ4, followed by μ1, μ2, and μ3.

9.40

a.

The number of pairwise comparisons is c = k(k − 1)/2 = 5(5 − 1)/2 = 10.

b.

True

c.

False. From the results of the multiple comparison procedure, Organic and Natural are grouped together, and we cannot detect a difference between their population means.

d.

Only ad claims in the “a” response group are perceived as less harmful. These include the Organic, Natural, and Additive-Free ad claims.

9.41

a.

The number of pairwise comparisons is c = k(k − 1)/2 = 3(3 − 1)/2 = 6/2 = 3. These are: μgeneral − μunilateral, μgeneral − μbaseline, and μunilateral − μbaseline.

b.

A multiple comparison procedure is recommended to keep the experimentwise error rate at the selected α level.

c.

Since the confidence interval contains only positive values, there is evidence of a significant difference in the average decrease in promotional cards sold. Since the values are positive, this indicates that the average decrease for the baseline is greater than the average decrease for the general wage cut.

9.42

a.

The test statistic is 𝐹 = 22.68 and the p-value is 𝑝 = 0.001. Since the observed p-value (𝑝 = 0.001) is less than any reasonable α level we select (.01, .05, or .10), we reject H0. There is sufficient evidence to indicate a difference in the mean number of alternatives listed among the three emotional states for any 𝛼 > .001.

b.

The probability of declaring at least one pair of means different when they are not is .05.

c.

The mean number of alternatives listed under the guilty state is significantly higher than the mean number of alternatives listed under the angry and neutral states. There is no significant difference in the mean number of alternatives listed under the angry and neutral states.

9.43

The mean attitude score for those viewing the product advertisement with models with no tan was significantly lower than the mean attitude scores of the other two groups. There is no significant difference in the mean attitude scores between those viewing the product advertisement with models with tans and those viewing the product advertisement with no models. This indicates that the type of product advertisement can influence a consumer’s attitude towards tanning.

9.44

a.

The number of pairwise comparisons is c = k(k − 1)/2 = 3(3 − 1)/2 = 6/2 = 3.

b.

Tukey’s procedure controls the experimentwise error rate. The probability of making at least one Type I error across all three comparisons is 5%.

c.

With 95% confidence, the average perceived quality score of service robots only is significantly lower than the average perceived quality scores of both combined staff and robot and also of human staff only.

9.45

a.

The treatments are the seven cheeses.

b.

The dependent variable is the color change index.


c.

Since the p-value is small (𝑝 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in mean color change index among the seven cheeses for any value of 𝛼 ≥ .05.

d.

There are a total of k(k − 1)/2 = 7(7 − 1)/2 = 21 pairwise comparisons possible.

e.

Based on the results, the comparisons yield: 𝜇 > 𝜇 > (𝜇 , 𝜇 ℎ ) > (𝜇 , 𝜇 ) > 𝜇 .

9.46

Based on the results, there is no significant difference in the mean dot area between exposure times 8 and 14. Thus, exposure times 8 and 14 yield the highest mean dot area. The mean dot area for exposure time 12 is significantly less than that for any other exposure time.

9.47

Since all confidence intervals contain only positive values, this indicates that there is evidence that all population means are different. The largest mean is for Depot 1, then next highest is Depot 2, and the lowest is Depot 3.

9.48

Using MINITAB, the multiple comparisons of the means are shown below:

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons
Individual confidence level = 98.06%

Honey subtracted from:

           Lower   Center    Upper
DM        -4.120   -2.381   -0.642
Control   -5.890   -4.201   -2.511

DM subtracted from:

           Lower   Center    Upper
Control   -3.535   -1.820   -0.104

None of the three confidence intervals contains 0. The confidence interval for the difference in mean improvement scores between DM and Honey is (−4.120, −0.642). Since this confidence interval is strictly below zero, this implies that the improvement scores for Honey are significantly higher than those of DM. The confidence interval for the difference in mean improvement scores between the Control group and Honey is (−5.890, −2.511). Since this confidence interval is strictly below zero, this implies that the improvement scores for Honey are significantly higher than those of the Control group. Compared to the Control group (giving no treatment) and DM, honey is a preferable treatment since it has significantly higher improvement scores. The statement is appropriate.

9.49

a.

The confidence interval for (μcage − μbarn) is (−.1215, −.0325). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between cage and barn egg housing systems. Since this interval is negative, this implies that the thickness is larger for the barn egg housing system.


b.

The confidence interval for (μcage − μfree) is (−.1231, −.0342). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between cage and free range egg housing systems. Since this interval is negative, this implies that the thickness is larger for the free range egg housing system.

c.

The confidence interval for (μcage − μorganic) is (−.1031, −.0142). Since 0 is not contained in this interval, there is sufficient evidence of a difference in the mean shell thickness between cage and organic egg housing systems. Since this interval is negative, this implies that the thickness is larger for the organic egg housing system.

d.

The confidence interval for (μbarn − μfree) is (−.0514, .0480). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between barn and free range egg housing systems. Since the center of the interval, (−.0514 + .0480)/2 = −.0017, is less than 0, the sample mean for barn is less than that for free range.

e.

The confidence interval for (μbarn − μorganic) is (−.0314, .0680). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between barn and organic egg housing systems. Since the center of the interval, (−.0314 + .0680)/2 = .0183, is greater than 0, the sample mean for barn is greater than that for organic.

f.

The confidence interval for (μfree − μorganic) is (−.0297, .0697). Since 0 is contained in this interval, there is insufficient evidence of a difference in the mean shell thickness between free range and organic egg housing systems. Since the center of the interval, (−.0297 + .0697)/2 = .0200, is greater than 0, the sample mean for free range is greater than that for organic.

g.

We rank the housing system means as follows:

Housing System: Cage < Organic < Barn < Free

We are 95% confident that the mean shell thickness for the cage housing system is significantly less than the mean thickness for the other three housing systems. There is no significant difference in the mean shell thicknesses among the barn, free range, and organic housing systems.

9.50

a.

There are 3 blocks used since Block df = b − 1 = 2 and 5 treatments since the treatment df = k − 1 = 4 .

b.

There were 15 observations since the Total df = n − 1 = 14 .

c.

H0: μ1 = μ2 = ⋯ = μ5
Ha: At least two treatment means differ

d.

The test statistic is F = MST/MSE = 9.109.

e.

The rejection region requires α = .01 in the upper tail of the F distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k − b + 1 = 15 − 5 − 3 + 1 = 8. From Table VIII, Appendix D, F.01 = 7.01. The rejection region is F > 7.01.

f.

Since the observed value of the test statistic falls in the rejection region (𝐹 = 9.109 > 7.01), H0 is rejected. There is sufficient evidence to indicate that at least two treatment means differ at 𝛼 = .01.
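The degrees-of-freedom bookkeeping in parts a, b, and e and the decision in part f can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the critical value 7.01 is simply the table value quoted above, passed in by hand rather than computed.

```python
def randomized_block_df(k: int, b: int) -> dict:
    """Degrees of freedom for a randomized block design.

    k treatments and b blocks, with one observation per
    block-treatment combination, so n = k * b.
    """
    n = k * b
    return {
        "treatment": k - 1,
        "block": b - 1,
        "error": n - k - b + 1,
        "total": n - 1,
    }


def reject_h0(f_observed: float, f_critical: float) -> bool:
    """Upper-tail F test: reject H0 when the observed F exceeds the critical value."""
    return f_observed > f_critical


df = randomized_block_df(k=5, b=3)
print(df)                      # treatment 4, block 2, error 8, total 14
print(reject_h0(9.109, 7.01))  # 7.01 is F.01 with (4, 8) df, taken from the table
```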


g.

The assumptions necessary to assure the validity of the test are as follows:

1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.

2. The variances of all the probability distributions are equal.

9.51

a.

SSB = Σ Bi²/k − CM (summed over the b blocks, i = 1, …, b), where CM = (Σx)²/n = 49²/9 = 266.7778

SSB = 17²/3 + 15²/3 + 17²/3 − 266.7778 = .8889

SSE = SS(Total) − SST − SSB = 30.2222 − 21.5555 − .8889 = 7.7778

MST = SST/(k − 1) = 21.5555/2 = 10.7778

MSE = SSE/(n − k − b + 1) = 7.7778/4 = 1.9445

FT = MST/MSE = 10.7778/1.9445 = 5.54

MSB = SSB/(b − 1) = .8889/2 = .4445

FB = MSB/MSE = .4445/1.9445 = .23

The ANOVA table is:

Source      df   SS        MS        F
Treatment    2   21.5555   10.7778   5.54
Block        2     .8889     .4445    .23
Error        4    7.7778    1.9445
Total        8   30.2222

b.

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

c.

The test statistic is F = MST/MSE = 5.54.

d.

A Type I error would be concluding at least two treatment means differ when they do not. A Type II error would be concluding all the treatment means are the same when at least two differ.

e.

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k − b + 1 = 9 − 3 − 3 + 1 = 4. From Table VI, Appendix D, F.05 = 6.94. The rejection region is F > 6.94. Since the observed value of the test statistic does not fall in the rejection region (F = 5.54 ≯ 6.94), H0 is not rejected. There is insufficient evidence to indicate at least two of the treatment means differ at α = .05.
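The arithmetic in part a of this exercise can be reproduced as a check. The sketch below assumes only the quantities quoted in the solution (block totals 17, 15, and 17; a grand total of 49; SS(Total) = 30.2222; SST = 21.5555); the function name is illustrative.

```python
def randomized_block_anova(ss_total, sst, ssb, k, b):
    """Mean squares and F ratios for a randomized block design (n = k * b)."""
    n = k * b
    sse = ss_total - sst - ssb
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return {"SSE": sse, "MST": mst, "MSB": msb, "MSE": mse,
            "F_T": mst / mse, "F_B": msb / mse}


k, b = 3, 3
block_totals = (17, 15, 17)   # block totals quoted in the solution
grand_total = 49              # sum of all n = 9 observations

# Correction for the mean and the block sum of squares.
cm = grand_total ** 2 / (k * b)
ssb = sum(total ** 2 for total in block_totals) / k - cm

result = randomized_block_anova(ss_total=30.2222, sst=21.5555, ssb=ssb, k=k, b=b)
print(round(cm, 4), round(ssb, 4))
for name, value in result.items():
    print(name, round(value, 4))
```

Small differences in the fourth decimal place relative to the table come from the solution rounding intermediate values.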


9.52

a.

The ANOVA Table is as follows:

Source      df   SS       MS       F
Treatment    2   12.032    6.016    50.958
Block        3   71.749   23.916   202.586
Error        6     .708     .118
Total       11   84.489

b.

To determine if the treatment means differ, we test:

H0: μA = μB = μC
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 50.958.

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 3 − 1 = 2 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table VI, Appendix D, F.05 = 5.14. The rejection region is F > 5.14. Since the observed value of the test statistic falls in the rejection region (F = 50.958 > 5.14), H0 is rejected. There is sufficient evidence to indicate that the treatment means differ at α = .05.

c.

To see if the blocking was effective, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two block means differ

The test statistic is F = MSB/MSE = 202.586.

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = b − 1 = 4 − 1 = 3 and ν2 = n − k − b + 1 = 12 − 3 − 4 + 1 = 6. From Table VI, Appendix D, F.05 = 4.76. The rejection region is F > 4.76. Since the observed value of the test statistic falls in the rejection region (F = 202.586 > 4.76), H0 is rejected. There is sufficient evidence to indicate that blocking was effective in reducing the experimental error at α = .05.

d.

From the printouts, we are given the differences in the sample means. The differences between Treatment B and each of Treatments A and C are positive (1.125 and 2.450), so Treatment B has the largest sample mean. The difference between Treatments A and C is positive (1.325), so Treatment A has a larger sample mean than Treatment C. So, Treatment B has the largest sample mean, Treatment A has the next largest sample mean, and Treatment C has the smallest sample mean. From the printout, all the means are significantly different from each other.

e.

The assumptions necessary to assure the validity of the inferences above are:

1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.

2. The variances of all the probability distributions are equal.

9.53

a.

SST = .2(500) = 100

SSB = .3(500) = 150

SSE = SS(Total) − SST − SSB = 500 − 100 − 150 = 250

MST = SST/(k − 1) = 100/(4 − 1) = 33.3333

MSE = SSE/(n − k − b + 1) = 250/(36 − 4 − 9 + 1) = 250/24 = 10.4167

FT = MST/MSE = 33.3333/10.4167 = 3.20

MSB = SSB/(b − 1) = 150/(9 − 1) = 18.75

FB = MSB/MSE = 18.75/10.4167 = 1.80
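All five parts of this exercise use SS(Total) = 500, k = 4, and b = 9, and differ only in the proportions of SS(Total) assigned to treatments and blocks. The sketch below, with illustrative names, repeats the same arithmetic for each part and compares the F ratios with the table values F.05 = 3.01 (3 and 24 df) and F.05 = 2.36 (8 and 24 df) used in this solution. The critical values are entered by hand, not computed.

```python
def f_ratios(prop_treatment, prop_block, ss_total=500.0, k=4, b=9):
    """Return (F for treatments, F for blocks) given the proportions of SS(Total)."""
    n = k * b
    sst = prop_treatment * ss_total
    ssb = prop_block * ss_total
    sse = ss_total - sst - ssb
    mse = sse / (n - k - b + 1)
    return (sst / (k - 1)) / mse, (ssb / (b - 1)) / mse


F_CRIT_TREATMENT = 3.01  # F.05 with (3, 24) df, from the table
F_CRIT_BLOCK = 2.36      # F.05 with (8, 24) df, from the table

# (proportion for treatments, proportion for blocks) in parts a-e
scenarios = {"a": (0.2, 0.3), "b": (0.5, 0.2), "c": (0.2, 0.5),
             "d": (0.4, 0.4), "e": (0.2, 0.2)}

decisions = {}
for part, (p_trt, p_blk) in scenarios.items():
    f_t, f_b = f_ratios(p_trt, p_blk)
    # True means H0 is rejected for that source of variation.
    decisions[part] = (f_t > F_CRIT_TREATMENT, f_b > F_CRIT_BLOCK)
    print(f"{part}: F_T = {f_t:.2f}, F_B = {f_b:.2f}, reject = {decisions[part]}")
```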

To determine if differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 3.20. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k − 1 = 4 − 1 = 3 and ν2 = n − k − b + 1 = 36 − 4 − 9 + 1 = 24. From Table VI, Appendix D, F.05 = 3.01. The rejection region is F > 3.01. Since the observed value of the test statistic falls in the rejection region (F = 3.20 > 3.01), H0 is rejected. There is sufficient evidence to indicate differences among the treatment means at α = .05.

To determine if differences exist among the block means, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two block means differ

The test statistic is F = 1.80. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = b − 1 = 9 − 1 = 8 and ν2 = n − k − b + 1 = 36 − 4 − 9 + 1 = 24. From Table VI, Appendix D, F.05 = 2.36. The rejection region is F > 2.36. Since the observed value of the test statistic does not fall in the rejection region (F = 1.80 ≯ 2.36), H0 is not rejected. There is insufficient evidence to indicate differences among the block means at α = .05.

b.

SST = .5(500) = 250

SSB = .2(500) = 100

SSE = SS(Total) − SST − SSB = 500 − 250 − 100 = 150

MST = SST/(k − 1) = 250/(4 − 1) = 83.3333

MSE = SSE/(n − k − b + 1) = 150/(36 − 4 − 9 + 1) = 6.25

FT = MST/MSE = 83.3333/6.25 = 13.33

MSB = SSB/(b − 1) = 100/(9 − 1) = 12.5

FB = MSB/MSE = 12.5/6.25 = 2.00

To determine if differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 13.33. The rejection region is F > 3.01 (same as above). Since the observed value of the test statistic falls in the rejection region (F = 13.33 > 3.01), H0 is rejected. There is sufficient evidence to indicate differences exist among the treatment means at α = .05.

To determine if differences exist among the block means, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two block means differ

The test statistic is F = 2.00. The rejection region is F > 2.36 (same as above). Since the observed value of the test statistic does not fall in the rejection region (F = 2.00 ≯ 2.36), H0 is not rejected. There is insufficient evidence to indicate differences exist among the block means at α = .05.

c.

SST = .2(500) = 100

SSB = .5(500) = 250

SSE = SS(Total) − SST − SSB = 500 − 100 − 250 = 150

MST = SST/(k − 1) = 100/(4 − 1) = 33.3333

MSE = SSE/(n − k − b + 1) = 150/(36 − 4 − 9 + 1) = 6.25

FT = MST/MSE = 33.3333/6.25 = 5.33

MSB = SSB/(b − 1) = 250/(9 − 1) = 31.25

FB = MSB/MSE = 31.25/6.25 = 5.00

To determine if differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 5.33. The rejection region is F > 3.01 (same as above). Since the observed value of the test statistic falls in the rejection region (F = 5.33 > 3.01), H0 is rejected. There is sufficient evidence to indicate differences exist among the treatment means at α = .05.

To determine if differences exist among the block means, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two block means differ

The test statistic is F = 5.00. The rejection region is F > 2.36 (same as above). Since the observed value of the test statistic falls in the rejection region (F = 5.00 > 2.36), H0 is rejected. There is sufficient evidence to indicate differences exist among the block means at α = .05.

d.

SST = .4(500) = 200

SSB = .4(500) = 200

SSE = SS(Total) − SST − SSB = 500 − 200 − 200 = 100

MST = SST/(k − 1) = 200/(4 − 1) = 66.6667

MSE = SSE/(n − k − b + 1) = 100/(36 − 4 − 9 + 1) = 4.1667

FT = MST/MSE = 66.6667/4.1667 = 16.0

MSB = SSB/(b − 1) = 200/(9 − 1) = 25

FB = MSB/MSE = 25/4.1667 = 6.00

To determine if differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 16.0. The rejection region is F > 3.01 (same as above). Since the observed value of the test statistic falls in the rejection region (F = 16.0 > 3.01), H0 is rejected. There is sufficient evidence to indicate differences among the treatment means at α = .05.

To determine if differences exist among the block means, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two block means differ

The test statistic is F = 6.00. The rejection region is F > 2.36 (same as above). Since the observed value of the test statistic falls in the rejection region (F = 6.00 > 2.36), H0 is rejected. There is sufficient evidence to indicate differences exist among the block means at α = .05.

e.

SST = .2(500) = 100

SSB = .2(500) = 100

SSE = SS(Total) − SST − SSB = 500 − 100 − 100 = 300

MST = SST/(k − 1) = 100/(4 − 1) = 33.3333

MSE = SSE/(n − k − b + 1) = 300/(36 − 4 − 9 + 1) = 12.5

FT = MST/MSE = 33.3333/12.5 = 2.67

MSB = SSB/(b − 1) = 100/(9 − 1) = 12.5

FB = MSB/MSE = 12.5/12.5 = 1.00

To determine if differences exist among the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 2.67. The rejection region is F > 3.01 (same as above). Since the observed value of the test statistic does not fall in the rejection region (F = 2.67 ≯ 3.01), H0 is not rejected. There is insufficient evidence to indicate differences exist among the treatment means at α = .05.

To determine if differences exist among the block means, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two block means differ

The test statistic is F = 1.00. The rejection region is F > 2.36 (same as above). Since the observed value of the test statistic does not fall in the rejection region (F = 1.00 ≯ 2.36), H0 is not rejected. There is insufficient evidence to indicate differences among the block means at α = .05.

9.54

a.

This experimental design is a randomized block design because in part B, the same subjects provided WTP amounts for insuring both a sculpture and a painting. Each subject had 2 responses.

b.

The dependent (response) variable is the WTP amount. The treatments are the two scenarios (sculpture and painting). The blocks are the 84 subjects.

c.

To determine if there is a difference in the mean WTP amounts between sculptures and paintings, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

9.55

a.

A randomized block design should be used to analyze the data because the same employees were measured at all three time periods. Thus, the blocks are the employees and the treatments are the three time periods.

b.

There is still enough information in the table to make a conclusion because the p-values are given.

c.

To determine if there are differences in the mean competence levels among the three time periods, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

d.

The p-value is p = 0.001 . At a significance level > .001, we reject H0. There is sufficient evidence to conclude that there is a difference in the mean competence levels among the three time periods for any value of 𝛼 > .001.

e.

With 90% confidence, the mean competence before the training is significantly less than the mean competence 2 days after and 2 months after. There is no significant difference in the mean competence between 2 days after and 2 months after.

9.56

a.

The dependent variable is the monthly solar energy. The treatments are the 4 solar panel configurations. The blocks are the months.

b.

To compare the mean solar energy values generated by the four panel configurations, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least one μi differs

c.

The test statistic is F = 115.54 and the p-value is p = .000. Since the p-value is so small, H0 is rejected.

d.

There is sufficient evidence to indicate a difference in the mean solar energy values generated by the four panel configurations for any reasonable value of α.

9.57

a.

This is a randomized block design.

b.

The treatments are the four levels of the device type: draw sheet, friction-reducing repositioning sheet, slide board, and air-assisted device.

c.

No value of α was specified, so we use .05. Since the p-value is less than α (p = .001 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in the hand pull force necessary for the four device types at α = .05.

d.

At the 95% confidence level, 𝜇 , 𝜇 > 𝜇 > 𝜇

9.58

a.

Because each taster rated each food/beverage item, the observations are not independent. The treatments are the 5 different food/beverage items. The blocks are the taste testers, and the dependent variable is the rating of the food/beverage item.

b.

The test statistic is F = 434.2146 and the p-value is p < .0001. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate there is a difference in the mean ratings among the 5 different food/beverage items. Comparing the food/beverage means using the Bonferroni multiple comparison procedure yields the following:

μcheesecake > (μpepperoni, μorange juice) > (μblack coffee, μgrapefruit)

Thus, cheesecake had the highest mean rating. There was no difference in the mean ratings for pepperoni and orange juice, but both were higher than the mean ratings for black coffee and grapefruit. There was no difference in the mean ratings for black coffee and grapefruit.

c.

Using SAS, the results are:

The ANOVA Procedure
Dependent Variable: RATING

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             203       744996.342      3669.933      4.20   <.0001
Error             796       696205.014       874.629
Corrected Total   999      1441201.356

R-Square   Coeff Var   Root MSE   RATING Mean
0.516927    408.5954   29.57413      7.238000

Source    DF      Anova SS   Mean Square   F Value   Pr > F
TASTER   199   165962.1560      833.9807      0.95   0.6554
FOOD       4   579034.1860   144758.5465    165.51   <.0001

Bonferroni (Dunn) t Tests for RATING
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                 0.05
Error Degrees of Freedom               796
Error Mean Square                 874.6294
Critical Value of t                2.81488
Minimum Significant Difference      8.3248

Means with the same letter are not significantly different.

Bon Grouping      Mean     N   FOOD
A               34.490   200   OJ
B               24.410   200   CC
B
B               20.705   200   PP
C              -21.170   200   GF
C
C              -22.245   200   BC
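The minimum significant difference in the printout can be re-derived from the other printed values as t × √(2·MSE/n) for equal group sizes. The sketch below takes the critical t (2.81488), the error mean square, and n = 200 directly from the SAS output; the function name is illustrative. It then flags which adjacent means differ by less than that amount, which reproduces the letter grouping.

```python
import math


def minimum_significant_difference(t_crit: float, mse: float, n_per_group: int) -> float:
    """Minimum significant difference for equal group sizes: t * sqrt(2 * MSE / n)."""
    return t_crit * math.sqrt(2 * mse / n_per_group)


# Values copied from the SAS printout above.
msd = minimum_significant_difference(t_crit=2.81488, mse=874.6294, n_per_group=200)
print(round(msd, 4))

# Sample means from the Bonferroni grouping, largest to smallest.
means = {"OJ": 34.490, "CC": 24.410, "PP": 20.705, "GF": -21.170, "BC": -22.245}

# Adjacent pairs whose difference is below the MSD share a grouping letter.
labels = list(means)
not_significant = [
    (labels[i], labels[i + 1])
    for i in range(len(labels) - 1)
    if means[labels[i]] - means[labels[i + 1]] < msd
]
print(not_significant)
```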

The test statistic is F = 165.51 and the p-value is p < .0001. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate there is a difference in the mean ratings among the 5 different food/beverage items. Comparing the food/beverage means using the Bonferroni multiple comparison procedure yields the following:

μOJ > (μCC, μPP) > (μGF, μBC)

Thus, orange juice had the highest mean rating. There was no difference in the mean ratings for pepperoni and cheesecake, but both were higher than the mean ratings for black coffee and grapefruit. There was no difference in the mean ratings for black coffee and grapefruit.

9.59

a.

To compare the mean item scores, we test:

H0: μ1 = μ2 = ⋯ = μ5
Ha: At least two of the treatment means differ

b.

Each of the 11 items were reviewed by each of the 5 systematic reviews. Since all reviews were made on each item, the observations are not independent. Thus, the randomized block ANOVA is appropriate.

c.

The p-value for Review is p = 0.319. Since the p-value is not small, H0 would not be rejected for any reasonable value of α. There is insufficient evidence to indicate a difference in the mean review scores among the 5 systematic reviews. The p-value for Item is p = 0.000. Since the p-value is small, H0 would be rejected for any reasonable value of α. There is sufficient evidence to indicate a difference in the mean scores among the 11 items.

d.

None of the means are significantly different because all means are connected with the letter ‘a’. This agrees with the conclusion drawn in part c about the treatment Review.

e.

The experimentwise error rate is .05. This means that the probability of declaring at least 2 means different when they are not different is .05.

9.60

Using SAS, the ANOVA Table is:

The ANOVA Procedure
Dependent Variable: temp

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             11      18.53700000    1.68518182      0.52   0.8634
Error             18      58.03800000    3.22433333
Corrected Total   29      76.57500000

R-Square   Coeff Var   Root MSE   temp Mean
0.242076    1.885189   1.795643    95.25000

Source     DF      Anova SS   Mean Square   F Value   Pr > F
STUDENT     9   18.41500000    2.04611111      0.63   0.7537
PLANT       2    0.12200000    0.06100000      0.02   0.9813

To determine if there are differences among the mean temperatures among the three treatments, we test:

H0: μ1 = μ2 = μ3
Ha: At least two of the treatment means differ

The test statistic is F = 0.02. The associated p-value is p = .9813. Since the p-value is very large, there is no evidence of a difference in mean temperature among the three treatments for any reasonable value of α. Since there is no difference, we do not need to compare the means. It appears that the presence of plants or pictures of plants does not reduce stress.

9.61

Using SAS, the results are:

Dependent Variable: Y

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             11      5596.900000    508.809091     46.40   <.0001
Error             18       197.400000     10.966667
Corrected Total   29      5794.300000

R-Square   Coeff Var   Root MSE     Y Mean
0.965932    6.331923   3.311596   52.30000

Source     DF      Anova SS   Mean Square   F Value   Pr > F
SUBJECT     9   5486.300000    609.588889     55.59   <.0001
TRT         2    110.600000     55.300000      5.04   0.0182

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for Y
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                     0.05
Error Degrees of Freedom                    18
Error Mean Square                     10.96667
Critical Value of Studentized Range    3.60930
Minimum Significant Difference          3.7797

Means with the same letter are not significantly different.

Tukey Grouping     Mean    N   TRT
    A            54.600   10   High
    A
B   A            52.400   10   Neutral
B
B                49.900   10   Low
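The minimum significant difference reported by Tukey's procedure can be checked as q × √(MSE/n) for equal group sizes. The sketch below takes the critical value of the studentized range (3.60930), the error mean square, and n = 10 from the printout; the function name is illustrative. It then lists the pairs of treatment means whose difference exceeds that amount.

```python
import math
from itertools import combinations


def tukey_msd(q_crit: float, mse: float, n_per_group: int) -> float:
    """Tukey minimum significant difference for equal group sizes: q * sqrt(MSE / n)."""
    return q_crit * math.sqrt(mse / n_per_group)


# Values copied from the SAS printout above.
msd = tukey_msd(q_crit=3.60930, mse=10.96667, n_per_group=10)
print(round(msd, 4))

means = {"High": 54.600, "Neutral": 52.400, "Low": 49.900}

# A pair is declared significantly different when its gap exceeds the MSD.
significant_pairs = [
    (first, second)
    for first, second in combinations(means, 2)
    if abs(means[first] - means[second]) > msd
]
print(significant_pairs)
```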

To determine if there are differences in the mean ratings of the three candidates, we test:

H0: μ1 = μ2 = μ3
Ha: At least one μi differs

The test statistic is F = 5.04 and the p-value is p = .0182. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate there is a difference in the mean ratings among the three candidates. Using Tukey’s multiple comparison procedure, the mean rating for the high-performance morph candidate is significantly higher than the mean rating for the low-performance morph candidate. No other significant differences exist. The high-performance morph candidate and the neutral candidate received the highest mean rating.

9.62

a.

The treatments are the 4 pre-slaughter phases. The blocks are the 8 cows.

b.

Using SPSS, the output is:

Tests of Between-Subjects Effects
Dependent Variable: Rate

Source            Type III Sum of Squares   df   Mean Square          F   Sig.
Corrected Model                 2444.000a   10       244.400      5.108   .001
Intercept                      341551.125    1    341551.125   7137.777   .000
Cow                              1922.875    7       274.696      5.741   .001
Phase                             521.125    3       173.708      3.630   .030
Error                            1004.875   21        47.851
Total                          345000.000   32
Corrected Total                  3448.875   31

a. R Squared = .709 (Adjusted R Squared = .570)

The ANOVA table in simpler form is:

Source   df         SS        MS   F-value   p-value
Phase     3    521.125   173.708     3.630      .030
Cow       7   1922.875   274.696     5.741      .001
Error    21   1004.875    47.851
Total    31   3448.875

c.

To determine if there are differences among the mean heart rates of cows in the four pre-slaughter phases, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the treatment means differ

The test statistic is F = 3.63 and the p-value is p = .030. Since the p-value is less than α (p = .030 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in heart rates of cows among the four pre-slaughter phases at α = .05.

d.

Since we rejected H0 in part c, the multiple comparison procedure is warranted. Using SPSS, the results are:

Homogeneous Subsets

Rate
Tukey HSDa,b

                    Subset
Phase   N          1          2
2.00    8    97.0000
3.00    8   103.1250   103.1250
4.00    8   105.1250   105.1250
1.00    8              108.0000
Sig.            .119       .508

Means for groups in homogeneous subsets are displayed. Based on observed means. The error term is Mean Square(Error) = 47.851.
a. Uses Harmonic Mean Sample Size = 8.000.
b. Alpha = 0.05.

There is a significant difference in the mean heart rates between the first phase and the second phase. The mean heart rate at the first phase is significantly greater than the mean heart rate at the second phase. No other significant differences exist.

9.63

Using MINITAB, the ANOVA table is:

Two-way ANOVA: Corrosion versus Time, System

Source   DF        SS        MS        F       P
Time      2   63.1050   31.5525   337.06   0.000
System    3    9.5833    3.1944    34.12   0.000
Error     6    0.5617    0.0936
Total    11   73.2500

S = 0.3060   R-Sq = 99.23%   R-Sq(adj) = 98.59%

System      Mean
1         9.0667
2         9.7333
3        11.0667
4         8.7333

To determine if there is a difference in mean corrosion rates among the 4 systems, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two of the treatment means differ

The test statistic is 𝐹 = 34.12 and the p-value is p = .000 . Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate a difference in mean corrosion rates among the 4 systems at any reasonable value of 𝛼.


Using SAS, Tukey’s multiple comparison results are:

Tukey's Studentized Range (HSD) Test for CORROSION

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                  0.05
Error Degrees of Freedom               6
Error Mean Square                      0.093611
Critical Value of Studentized Range    4.89559
Minimum Significant Difference         0.8648

Means with the same letter are not significantly different.

Tukey Grouping    Mean       N    SYSTEM
A                 11.0667    3    3
B                  9.7333    3    2
B  C               9.0667    3    1
   C               8.7333    3    4
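The minimum significant difference in the SAS output is the critical value of the studentized range times the square root of MSE divided by the number of observations per mean. The Python sketch below recomputes it from the printed values and lists the pairs of systems whose means differ by more than that amount; the variable names are illustrative.

```python
import math
from itertools import combinations

q_crit = 4.89559      # critical value of the studentized range (from the output)
mse = 0.093611        # error mean square
n_per_mean = 3        # observations behind each system mean
means = {1: 9.0667, 2: 9.7333, 3: 11.0667, 4: 8.7333}

# Minimum significant difference: q * sqrt(MSE / n)
msd = q_crit * math.sqrt(mse / n_per_mean)

# Pairs of systems whose sample means differ by more than the MSD
significant = [
    (i, j) for i, j in combinations(sorted(means), 2)
    if abs(means[i] - means[j]) > msd
]
print(round(msd, 4))
print(significant)
```

The pairs returned are exactly those that do not share a letter in the Tukey grouping.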

The mean corrosion rate for system 3 is significantly larger than all of the other mean corrosion rates. The mean corrosion rate of system 2 is significantly larger than the mean for system 4. If we want the system (epoxy coating) with the lowest corrosion rate, we would pick either system 1 or system 4. There is no significant difference between these two systems, and they are in the lowest corrosion rate group.

9.64

a. There are two factors.

b. No, we cannot tell whether the factors are qualitative or quantitative.

c. Yes. There are four levels of factor A and three levels of factor B.

d. A treatment would consist of a combination of one level of factor A and one level of factor B. There are a total of 4 × 3 = 12 treatments.

e. One problem with only one replicate is that there are no degrees of freedom for error. This is overcome by having at least two replicates.

9.65

a.

The ANOVA table is:

Source    df    SS      MS        F
A          2     .8      .4000     3.69
B          3    5.3     1.7667    16.31
AB         6    9.6     1.6000    14.77
Error     12    1.3      .1083
Total     23   17.0

df for A is a − 1 = 3 − 1 = 2
df for B is b − 1 = 4 − 1 = 3
df for AB is (a − 1)(b − 1) = 2(3) = 6
df for Error is n − ab = 24 − 3(4) = 12
df for Total is n − 1 = 24 − 1 = 23

SSE = SS(Total) − SSA − SSB − SSAB = 17.0 − .8 − 5.3 − 9.6 = 1.3

MSA = SSA/(a − 1) = .8/(3 − 1) = .4000
MSB = SSB/(b − 1) = 5.3/(4 − 1) = 1.7667
MSAB = SSAB/[(a − 1)(b − 1)] = 9.6/[(3 − 1)(4 − 1)] = 1.6000
MSE = SSE/(n − ab) = 1.3/[24 − 3(4)] = .1083

FA = MSA/MSE = .4000/.1083 = 3.69
FB = MSB/MSE = 1.7667/.1083 = 16.31
FAB = MSAB/MSE = 1.6000/.1083 = 14.77

b. Sum of Squares for Treatment is SST = SSA + SSB + SSAB = .8 + 5.3 + 9.6 = 15.7

MST = SST/(ab − 1) = 15.7/[3(4) − 1] = 1.4273

FT = MST/MSE = 1.4273/.1083 = 13.18
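The mean squares and F statistics above follow mechanically from the sums of squares, the numbers of factor levels, and the sample size. The Python sketch below packages that arithmetic for a balanced two-factor factorial; the function name two_factor_anova and its return layout are illustrative choices rather than anything defined in the text.

```python
def two_factor_anova(ssa, ssb, ssab, ss_total, a, b, n):
    """Build the two-factor factorial ANOVA quantities from sums of squares.

    a and b are the numbers of levels of factors A and B; n is the total
    number of observations.
    """
    sse = ss_total - ssa - ssb - ssab
    sst = ssa + ssb + ssab              # treatment sum of squares
    mse = sse / (n - a * b)
    ms = {
        "A": ssa / (a - 1),
        "B": ssb / (b - 1),
        "AB": ssab / ((a - 1) * (b - 1)),
        "Treatments": sst / (a * b - 1),
    }
    f = {source: value / mse for source, value in ms.items()}
    return {"SSE": sse, "SST": sst, "MSE": mse, "MS": ms, "F": f}

# Values from this exercise: a = 3, b = 4, n = 24
table = two_factor_anova(0.8, 5.3, 9.6, 17.0, a=3, b=4, n=24)
for source, f_value in table["F"].items():
    print(f"{source}: MS = {table['MS'][source]:.4f}, F = {f_value:.2f}")
```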

To determine if the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ12
Ha: At least two of the treatment means differ

The test statistic is 𝐹 = 13.18. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(4) − 1 = 11 and ν2 = n − ab = 24 − 3(4) = 12. From Table VI, Appendix D, F.05 ≈ 2.75. The rejection region is 𝐹 > 2.75. Since the observed value of the test statistic falls in the rejection region (𝐹 = 13.18 > 2.75), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05.

c. We need to partition the Treatment Sum of Squares into the Main Effects and Interaction Sums of Squares. Then we test whether factors A and B interact. Depending on the conclusion of the test for interaction, we either test for main effects or compare the treatment means.

d.

Two factors are said to interact if the effect of one factor on the dependent variable is not the same at different levels of the second factor. If the factors interact, then tests for main effects are not necessary. We need to compare the treatment means for one factor at each level of the second.

e.

To determine if the factors interact, we test:
H0: Factors A and B do not interact to affect the response mean
Ha: Factors A and B do interact to affect the response mean

The test statistic is F = MSAB/MSE = 14.77.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(4 − 1) = 6 and ν2 = n − ab = 24 − 3(4) = 12. From Table VI, Appendix D, F.05 = 3.00. The rejection region is 𝐹 > 3.00. Since the observed value of the test statistic falls in the rejection region (𝐹 = 14.77 > 3.00), H0 is rejected. There is sufficient evidence to indicate the two factors interact to affect the response mean at 𝛼 = .05.

f. No. Testing for main effects is not warranted because interaction is present. Instead, we compare the treatment means of one factor at each level of the second factor.

9.66

a. Factor A has 3 + 1 = 4 levels and factor B has 1 + 1 = 2 levels.

b.

There are a total of 23 + 1 = 24 observations and 4 × 2 = 8 treatments. Therefore, there were 24/8 = 3 observations for each treatment.
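The reasoning in parts a and b runs the degrees-of-freedom formulas backwards: levels = df + 1 for each factor and n = Total df + 1. The Python sketch below does the same bookkeeping for any balanced two-factor factorial; the function name design_from_df is an illustrative choice.

```python
def design_from_df(df_a, df_b, df_total):
    """Recover the layout of a balanced two-factor factorial from its df."""
    a = df_a + 1                  # levels of factor A
    b = df_b + 1                  # levels of factor B
    n = df_total + 1              # total observations
    treatments = a * b
    if n % treatments != 0:
        raise ValueError("df values are not consistent with a balanced design")
    return {
        "a": a,
        "b": b,
        "n": n,
        "treatments": treatments,
        "replicates": n // treatments,
        "df_AB": df_a * df_b,
        "df_error": n - treatments,
    }

print(design_from_df(df_a=3, df_b=1, df_total=23))
```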

c. AB df = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3
Error df = n − ab = 24 − 4(2) = 16

MSA = SSA/(a − 1) ⇒ SSA = (a − 1)MSA = (4 − 1)(.75) = 2.25
MSB = SSB/(b − 1) = .95/(2 − 1) = .95
MSAB = SSAB/[(a − 1)(b − 1)] ⇒ SSAB = (a − 1)(b − 1)MSAB = (4 − 1)(2 − 1)(.30) = .9

SSE = SS(Total) − SSA − SSB − SSAB = 6.5 − 2.25 − .95 − .9 = 2.4
MSE = SSE/(n − ab) = 2.4/16 = .15

Treatment df = ab − 1 = 4(2) − 1 = 7
SST = SSA + SSB + SSAB = 2.25 + .95 + .9 = 4.1
MST = SST/(ab − 1) = 4.1/7 = .5857

FT = MST/MSE = .5857/.15 = 3.90
FA = MSA/MSE = .75/.15 = 5.00
FB = MSB/MSE = .95/.15 = 6.33
FAB = MSAB/MSE = .30/.15 = 2.00

The ANOVA table is:

Source        df    SS      MS     F
Treatments     7    4.1     .59    3.90
  A            3    2.25    .75    5.00
  B            1     .95    .95    6.33
  AB           3     .90    .30    2.00
Error         16    2.40    .15
Total         23    6.50

d. To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ8
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 3.90.

The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 24 − 4(2) = 16. From Table V, Appendix D, F.10 = 2.13. The rejection region is 𝐹 > 2.13. Since the observed value of the test statistic falls in the rejection region (𝐹 = 3.90 > 2.13), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .10.

e. To determine if the factors interact, we test:
H0: Factors A and B do not interact to affect the response mean
Ha: Factors A and B do interact to affect the response mean

The test statistic is 𝐹 = 2.00. The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (4 − 1)(2 − 1) = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table V, Appendix D, F.10 = 2.46. The rejection region is 𝐹 > 2.46. Since the observed value of the test statistic does not fall in the rejection region (𝐹 = 2.00 ≯ 2.46), H0 is not rejected. There is insufficient evidence to indicate factors A and B interact at 𝛼 = .10.

To determine if the four means of factor A differ, we test:
H0: There is no difference in the four means of factor A
Ha: At least two of the factor A means differ

The test statistic is 𝐹 = 5.00. The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with ν1 = a − 1 = 4 − 1 = 3 and ν2 = n − ab = 24 − 4(2) = 16. From Table V, Appendix D, F.10 = 2.46. The rejection region is 𝐹 > 2.46. Since the observed value of the test statistic falls in the rejection region (𝐹 = 5.00 > 2.46), H0 is rejected. There is sufficient evidence to indicate at least two of the four means of factor A differ at 𝛼 = .10.

To determine if the 2 means of factor B differ, we test:
H0: There is no difference in the two means of factor B
Ha: At least two of the factor B means differ

The test statistic is 𝐹 = 6.33. The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 24 − 4(2) = 16. From Table V, Appendix D, F.10 = 3.05. The rejection region is 𝐹 > 3.05. Since the observed value of the test statistic falls in the rejection region (𝐹 = 6.33 > 3.05), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at 𝛼 = .10.

All of the tests performed are warranted because interaction was not significant.

9.67

a. The treatments are the combinations of the levels of factor A and the levels of factor B. There are 2 × 3 = 6 treatments. The treatment means are:

x̄11 = Σx11/2 = (3.1 + 4.0)/2 = 3.55      x̄21 = Σx21/2 = (5.9 + 5.3)/2 = 5.6
x̄12 = Σx12/2 = (4.6 + 4.2)/2 = 4.4       x̄22 = Σx22/2 = (2.9 + 2.2)/2 = 2.55
x̄13 = Σx13/2 = (6.4 + 7.1)/2 = 6.75      x̄23 = Σx23/2 = (3.3 + 2.5)/2 = 2.9

Using MINITAB, the graph is:

[Scatterplot of A1, A2 vs B: treatment means (Y-Data) plotted against the three levels of B, with one line for A1 and one for A2]

The treatment means appear to be different because the sample means are quite different. The factors appear to interact because the lines are not parallel.

b.

To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ6
Ha: At least two treatment means differ

The test statistic is 𝐹 = 21.62. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(3) − 1 = 5 and ν2 = n − ab = 12 − 2(3) = 6. From Table VI, Appendix D, F.05 = 4.39. The rejection region is 𝐹 > 4.39. Since the observed value of the test statistic falls in the rejection region (𝐹 = 21.62 > 4.39), H0 is rejected. There is sufficient evidence to indicate that the treatment means differ at 𝛼 = .05. This supports the plot in part a.


c. Yes. Since there are differences among the treatment means, we test for interaction. To determine whether the factors A and B interact, we test:
H0: Factors A and B do not interact to affect the mean response
Ha: Factors A and B do interact to affect the mean response

The test statistic is 𝐹 = 36.62. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(3 − 1) = 2 and ν2 = n − ab = 12 − 2(3) = 6. From Table VI, Appendix D, F.05 = 5.14. The rejection region is 𝐹 > 5.14. Since the observed value of the test statistic falls in the rejection region (𝐹 = 36.62 > 5.14), H0 is rejected. There is sufficient evidence to indicate that factors A and B interact to affect the response mean at 𝛼 = .05.

d. No. Because interaction is present, the tests for main effects are not warranted.

e. The results of the tests in parts b and c support the visual interpretation in part a.

9.68

a. The treatments are the combinations of the levels of factor A and the levels of factor B. There are 2 × 2 = 4 treatments. The treatment means are:

x̄11 = Σx11/2 = (29.6 + 35.2)/2 = 32.4       x̄12 = Σx12/2 = (47.3 + 42.1)/2 = 44.7
x̄21 = Σx21/2 = (12.9 + 17.6)/2 = 15.25      x̄22 = Σx22/2 = (28.4 + 22.7)/2 = 25.55

Using MINITAB, the graph is:

[Scatterplot of A1, A2 vs B: treatment means (Y-Data, about 15 to 45) plotted against the two levels of B, with one line for A1 and one for A2]

The factors do not appear to interact; the lines are almost parallel. The treatment means do appear to differ because the sample means range from 15.25 to 44.7.


b. CM = (Σx)²/n = 235.8²/8 = 6,950.205

SS(Total) = Σx² − CM = 7922.92 − 6950.205 = 972.715

SSA = ΣAi²/(br) − CM = 154.2²/[2(2)] + 81.6²/[2(2)] − 6950.205 = 7609.05 − 6950.205 = 658.845

SSB = ΣBj²/(ar) − CM = 95.3²/[2(2)] + 140.5²/[2(2)] − 6950.205 = 7205.585 − 6950.205 = 255.38

SSAB = ΣΣABij²/r − SSA − SSB − CM
     = 64.8²/2 + 89.4²/2 + 30.5²/2 + 51.1²/2 − 658.845 − 255.38 − 6950.205
     = 7866.43 − 7864.43 = 2

SSE = SS(Total) − SSA − SSB − SSAB = 972.715 − 658.845 − 255.38 − 2 = 56.49

A df = a − 1 = 2 − 1 = 1
B df = b − 1 = 2 − 1 = 1
AB df = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1
Error df = n − ab = 8 − 2(2) = 4
Total df = n − 1 = 8 − 1 = 7

MSA = SSA/(a − 1) = 658.845/1 = 658.845
MSB = SSB/(b − 1) = 255.38/1 = 255.38
MSAB = SSAB/[(a − 1)(b − 1)] = 2/1 = 2
MSE = SSE/(n − ab) = 56.49/4 = 14.1225

FA = MSA/MSE = 658.845/14.1225 = 46.65
FB = MSB/MSE = 255.38/14.1225 = 18.08
FAB = MSAB/MSE = 2/14.1225 = .14

The ANOVA table is:

Source    df    SS         MS         F
A          1    658.845    658.845    46.65
B          1    255.380    255.380    18.08
AB         1      2.000      2.000      .14
Error      4     56.490     14.1225
Total      7    972.715

c. SST = SSA + SSB + SSAB = 658.845 + 255.380 + 2.000 = 916.225

df = ab − 1 = 2(2) − 1 = 3

MST = SST/(ab − 1) = 916.225/3 = 305.408

FT = MST/MSE = 305.408/14.1225 = 21.63
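The calculations in parts b and c can be reproduced directly from the eight observations used for the treatment means in part a. The Python sketch below follows the same shortcut formulas (CM, then SS(Total), SSA, SSB, SSAB, SSE) for a balanced two-factor factorial; the function name factorial_anova and the dictionary layout are illustrative choices.

```python
# Two replicates for each of the 2 x 2 treatments, keyed by (level of A, level of B).
data = {
    (1, 1): [29.6, 35.2],
    (1, 2): [47.3, 42.1],
    (2, 1): [12.9, 17.6],
    (2, 2): [28.4, 22.7],
}

def factorial_anova(data):
    """Sums of squares and F statistics for a balanced two-factor factorial."""
    a_levels = sorted({i for i, _ in data})
    b_levels = sorted({j for _, j in data})
    a, b = len(a_levels), len(b_levels)
    r = len(next(iter(data.values())))          # replicates per treatment
    n = a * b * r
    all_x = [x for cell in data.values() for x in cell]

    cm = sum(all_x) ** 2 / n                    # correction for the mean
    ss_total = sum(x * x for x in all_x) - cm
    a_totals = [sum(sum(data[i, j]) for j in b_levels) for i in a_levels]
    b_totals = [sum(sum(data[i, j]) for i in a_levels) for j in b_levels]
    ssa = sum(t * t for t in a_totals) / (b * r) - cm
    ssb = sum(t * t for t in b_totals) / (a * r) - cm
    ssab = sum(sum(cell) ** 2 for cell in data.values()) / r - ssa - ssb - cm
    sse = ss_total - ssa - ssb - ssab
    mse = sse / (n - a * b)
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
        "FA": ssa / (a - 1) / mse,
        "FB": ssb / (b - 1) / mse,
        "FAB": ssab / ((a - 1) * (b - 1)) / mse,
        "FT": (ssa + ssb + ssab) / (a * b - 1) / mse,
    }

for name, value in factorial_anova(data).items():
    print(f"{name} = {value:.3f}")
```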


To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is 𝐹 = 21.63. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 2(2) − 1 = 3 and ν2 = n − ab = 8 − 2(2) = 4. From Table VI, Appendix D, F.05 = 6.59. The rejection region is 𝐹 > 6.59. Since the observed value of the test statistic falls in the rejection region (𝐹 = 21.63 > 6.59), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05. This agrees with the conclusion in part a.

d. Since there are differences among the treatment means, we test for the presence of interaction:
H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means

The test statistic is 𝐹 = .14. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table VI, Appendix D, F.05 = 7.71. The rejection region is 𝐹 > 7.71. Since the observed value of the test statistic does not fall in the rejection region (𝐹 = .14 ≯ 7.71), H0 is not rejected. There is insufficient evidence to indicate the factors interact at 𝛼 = .05.

e. Since the interaction was not significant, we test for main effects. To determine whether the two means of factor A differ, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is 𝐹 = 46.65. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = a − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table VI, Appendix D, F.05 = 7.71. The rejection region is 𝐹 > 7.71. Since the observed value of the test statistic falls in the rejection region (𝐹 = 46.65 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor A differ at 𝛼 = .05.

To determine whether the two means of factor B differ, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is 𝐹 = 18.08. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = b − 1 = 2 − 1 = 1 and ν2 = n − ab = 8 − 2(2) = 4. From Table VI, Appendix D, F.05 = 7.71. The rejection region is 𝐹 > 7.71. Since the observed value of the test statistic falls in the rejection region (𝐹 = 18.08 > 7.71), H0 is rejected. There is sufficient evidence to indicate the two means of factor B differ at 𝛼 = .05.

f. The results of all the tests agree with those in part a.

g. Since no interaction is present, but the means of both factors A and B differ, we compare the two means of factor A and compare the two means of factor B. Since there are only two means to compare for each factor, the higher population mean corresponds to the higher sample mean.

Factor A:
x̄1 = Σx1/(br) = (29.6 + 35.2 + 47.3 + 42.1)/[2(2)] = 38.55
x̄2 = Σx2/(br) = (12.9 + 17.6 + 28.4 + 22.7)/[2(2)] = 20.4

The mean for level 1 of factor A is significantly higher than the mean for level 2.

Factor B:
x̄1 = Σx1/(ar) = (29.6 + 35.2 + 12.9 + 17.6)/[2(2)] = 23.825
x̄2 = Σx2/(ar) = (47.3 + 42.1 + 28.4 + 22.7)/[2(2)] = 35.125

The mean for level 2 of factor B is significantly higher than the mean for level 1.

9.69

a.

SSA = .2(1000) = 200, SSB = .1(1000) = 100, SSAB = .1(1000) = 100

SSE = SS(Total) − SSA − SSB − SSAB = 1000 − 200 − 100 − 100 = 600

SST = SSA + SSB + SSAB = 200 + 100 + 100 = 400

MSA = SSA/(a − 1) = 200/(3 − 1) = 100
MSB = SSB/(b − 1) = 100/(3 − 1) = 50
MSAB = SSAB/[(a − 1)(b − 1)] = 100/[(3 − 1)(3 − 1)] = 25
MSE = SSE/(n − ab) = 600/[27 − 3(3)] = 33.333
MST = SST/(ab − 1) = 400/[3(3) − 1] = 50

FA = MSA/MSE = 100/33.333 = 3.00
FB = MSB/MSE = 50/33.333 = 1.50
FAB = MSAB/MSE = 25/33.333 = .75
FT = MST/MSE = 50/33.333 = 1.50

Source    df    SS      MS        F
A          2     200    100       3.00
B          2     100     50       1.50
AB         4     100     25        .75
Error     18     600     33.333
Total     26    1000
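Parts a through d of this exercise repeat the same arithmetic with different shares of SS(Total) assigned to A, B, and AB. The Python sketch below runs all four cases at once under the exercise's setup (a = b = 3, n = 27, SS(Total) = 1000); the function name f_statistics and the scenarios dictionary are illustrative.

```python
def f_statistics(share_a, share_b, share_ab, ss_total=1000, a=3, b=3, n=27):
    """F statistics when SSA, SSB and SSAB are given shares of SS(Total)."""
    ssa, ssb, ssab = (share * ss_total for share in (share_a, share_b, share_ab))
    mse = (ss_total - ssa - ssb - ssab) / (n - a * b)
    return {
        "FA": ssa / (a - 1) / mse,
        "FB": ssb / (b - 1) / mse,
        "FAB": ssab / ((a - 1) * (b - 1)) / mse,
        "FT": (ssa + ssb + ssab) / (a * b - 1) / mse,
    }

# Shares of SS(Total) for SSA, SSB, SSAB in parts a through d.
scenarios = {
    "a": (0.2, 0.1, 0.1),
    "b": (0.1, 0.1, 0.5),
    "c": (0.4, 0.1, 0.2),
    "d": (0.4, 0.4, 0.1),
}
for part, shares in scenarios.items():
    rounded = {k: round(v, 2) for k, v in f_statistics(*shares).items()}
    print(part, rounded)
```

Comparing the four rows shows how the same main-effect sums of squares can give very different F values once the error sum of squares shrinks.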

To determine whether the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ9
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 1.50.

Suppose 𝛼 = .05. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.51. The rejection region is 𝐹 > 2.51. Since the observed value of the test statistic does not fall in the rejection region (𝐹 = 1.50 ≯ 2.51), H0 is not rejected. There is insufficient evidence to indicate the treatment means differ at 𝛼 = .05. Since there are no treatment mean differences, we have nothing more to do.

b.

SSA = .1(1000) = 100, SSB = .1(1000) = 100, SSAB = .5(1000) = 500

SSE = SS(Total) − SSA − SSB − SSAB = 1000 − 100 − 100 − 500 = 300

SST = SSA + SSB + SSAB = 100 + 100 + 500 = 700

MSA = SSA/(a − 1) = 100/(3 − 1) = 50
MSB = SSB/(b − 1) = 100/(3 − 1) = 50
MSAB = SSAB/[(a − 1)(b − 1)] = 500/[(3 − 1)(3 − 1)] = 125
MSE = SSE/(n − ab) = 300/[27 − 3(3)] = 16.667
MST = SST/(ab − 1) = 700/(9 − 1) = 87.5

FA = MSA/MSE = 50/16.667 = 3.00
FB = MSB/MSE = 50/16.667 = 3.00
FAB = MSAB/MSE = 125/16.667 = 7.50
FT = MST/MSE = 87.5/16.667 = 5.25

Source    df    SS      MS        F
A          2     100     50       3.00
B          2     100     50       3.00
AB         4     500    125       7.50
Error     18     300     16.667
Total     26    1000

To determine if the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ9
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 5.25.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.51. The rejection region is 𝐹 > 2.51. Since the observed value of the test statistic falls in the rejection region (𝐹 = 5.25 > 2.51), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05.

Since the treatment means differ, we next test for interaction between factors A and B. To determine if factors A and B interact, we test:
H0: Factors A and B do not interact to affect the mean response
Ha: Factors A and B do interact to affect the mean response

The test statistic is F = MSAB/MSE = 7.50.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.93. The rejection region is 𝐹 > 2.93. Since the observed value of the test statistic falls in the rejection region (𝐹 = 7.50 > 2.93), H0 is rejected. There is sufficient evidence to indicate the factors A and B interact at 𝛼 = .05. Since interaction is present, no tests for main effects are necessary.

c.

SSA = .4(1000) = 400, SSB = .1(1000) = 100, SSAB = .2(1000) = 200

SSE = SS(Total) − SSA − SSB − SSAB = 1000 − 400 − 100 − 200 = 300

SST = SSA + SSB + SSAB = 400 + 100 + 200 = 700

MSA = SSA/(a − 1) = 400/(3 − 1) = 200
MSB = SSB/(b − 1) = 100/(3 − 1) = 50
MSAB = SSAB/[(a − 1)(b − 1)] = 200/[(3 − 1)(3 − 1)] = 50
MSE = SSE/(n − ab) = 300/[27 − 3(3)] = 16.667
MST = SST/(ab − 1) = 700/[3(3) − 1] = 87.5

FA = MSA/MSE = 200/16.667 = 12.00
FB = MSB/MSE = 50/16.667 = 3.00
FAB = MSAB/MSE = 50/16.667 = 3.00
FT = MST/MSE = 87.5/16.667 = 5.25

Source    df    SS      MS        F
A          2     400    200       12.00
B          2     100     50        3.00
AB         4     200     50        3.00
Error     18     300     16.667
Total     26    1000

To determine if the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ9
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 5.25.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.51. The rejection region is 𝐹 > 2.51. Since the observed value of the test statistic falls in the rejection region (𝐹 = 5.25 > 2.51), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05.

Since the treatment means differ, we next test for interaction between factors A and B. To determine if factors A and B interact, we test:
H0: Factors A and B do not interact to affect the mean response
Ha: Factors A and B do interact to affect the mean response

The test statistic is F = MSAB/MSE = 3.00.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.93. The rejection region is 𝐹 > 2.93. Since the observed value of the test statistic falls in the rejection region (𝐹 = 3.00 > 2.93), H0 is rejected. There is sufficient evidence to indicate the factors A and B interact at 𝛼 = .05. Since interaction is present, no tests for main effects are necessary.

d.

SSA = .4(1000) = 400, SSB = .4(1000) = 400, SSAB = .1(1000) = 100

SSE = SS(Total) − SSA − SSB − SSAB = 1000 − 400 − 400 − 100 = 100

SST = SSA + SSB + SSAB = 400 + 400 + 100 = 900

MSA = SSA/(a − 1) = 400/(3 − 1) = 200
MSB = SSB/(b − 1) = 400/(3 − 1) = 200
MSAB = SSAB/[(a − 1)(b − 1)] = 100/[(3 − 1)(3 − 1)] = 25
MSE = SSE/(n − ab) = 100/[27 − 3(3)] = 5.556
MST = SST/(ab − 1) = 900/[3(3) − 1] = 112.5

FA = MSA/MSE = 200/5.556 = 36.00
FB = MSB/MSE = 200/5.556 = 36.00
FAB = MSAB/MSE = 25/5.556 = 4.50
FT = MST/MSE = 112.5/5.556 = 20.25

Source    df    SS      MS       F
A          2     400    200      36.00
B          2     400    200      36.00
AB         4     100     25       4.50
Error     18     100      5.556
Total     26    1000

To determine if the treatment means differ, we test:
H0: μ1 = μ2 = ... = μ9
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 20.25.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.51. The rejection region is 𝐹 > 2.51. Since the observed value of the test statistic falls in the rejection region (𝐹 = 20.25 > 2.51), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05.

Since the treatment means differ, we next test for interaction between factors A and B. To determine if factors A and B interact, we test:
H0: Factors A and B do not interact to affect the mean response
Ha: Factors A and B do interact to affect the mean response

The test statistic is F = MSAB/MSE = 4.50.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with ν1 = (a − 1)(b − 1) = (3 − 1)(3 − 1) = 4 and ν2 = n − ab = 27 − 3(3) = 18. From Table VI, Appendix D, F.05 = 2.93. The rejection region is 𝐹 > 2.93. Since the observed value of the test statistic falls in the rejection region (𝐹 = 4.50 > 2.93), H0 is rejected. There is sufficient evidence to indicate the factors A and B interact at 𝛼 = .05. Since interaction is present, no tests for main effects are necessary.

9.70

a.

Using Minitab, the graph is: The product on the left represents the Soft Drink and the product on the right represents the Hand Soap. From the parallel nature of the plot, it does not appear the two factors interact.

b. At 𝛼 = .05 (our choice), H0 cannot be rejected. There is insufficient evidence to indicate the factors Product and Ad Type interact at 𝛼 = .05. Since interaction is not present, tests for main effects are necessary.

c. At 𝛼 = .05 (our choice), H0 cannot be rejected. There is insufficient evidence to indicate that the mean quality rating differed between the two product types at 𝛼 = .05.

d. At 𝛼 = .05 (our choice), H0 is rejected. There is sufficient evidence to indicate that the mean quality rating differed between the two ad types at 𝛼 = .05. We are 95% confident that the mean quality rating for all Passion ad types exceeds the mean quality rating of all Control ad types.

9.71

a. The two factors are Ad Claim (with levels organic, natural, additive-free, light, and regular) and Disclaimer (with levels Yes and No). The treatments are the 5 × 2 = 10 factor level combinations. They are (O,Y), (O,N), (N,Y), (N,N), (A,Y), (A,N), (L,Y), (L,N), (R,Y), and (R,N).

b.

At 𝛼 = .01, H0 cannot be rejected. There is insufficient evidence to indicate the factors Ad Claim and Disclaimer interact at 𝛼 = .01. Since interaction is not present, tests for main effects are necessary. A hypothetical plot of the treatments that shows interaction is not significant is shown below:

[Hypothetical plot: mean response (vertical axis, 0 to 10) for each Ad Claim (Organic, Natural, Additive-free, Light, Regular), with one line for Disclaimer = Yes and one for Disclaimer = No]

The parallel relationships indicate that the interaction is not significant.

c. At 𝛼 = .01, H0 is rejected. There is sufficient evidence to indicate that the mean perceived harm response values differed among the five advertising claim types at 𝛼 = .01.

d. At 𝛼 = .01, H0 is rejected. There is sufficient evidence to indicate that the mean perceived harm response values differed between the two disclaimer types at 𝛼 = .01.

9.72

a. The experimental design used is a complete two-factor factorial design.

b. There are a total of 4 treatments in this experiment: (High Adaption, High knowledge), (High Adaption, Low knowledge), (Low Adaption, High knowledge), (Low Adaption, Low knowledge).

c. If factor interaction is detected, then the effect of adaption on the final sales price depends on the level of knowledge.

d. If no factor interaction exists and the main effect of knowledge exists, then the mean final sales price differs for the two levels of knowledge.

9.73

a. There are 4 treatments for this experiment: (Male, STEM), (Male, non-STEM), (Female, STEM), (Female, non-STEM).

b.

If gender and discipline interact, then the effect of discipline on the mean satisfaction depends on gender.

c.

Yes. Because the lines are not parallel, this indicates that interaction exists.

d. The ANOVA table would be:

Source        df
Gender          1
Discipline      1
GxD             1
Error         211
Total         214

e. The test statistic is 𝐹 = 4.10 and the p-value is 𝑝 = .04. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that gender and discipline interact to affect the mean satisfaction score for 𝛼 > .04.

9.74

a.

If justice reparation potential and producer need interact, then the effect of justice reparation potential on intention depends on the level of producer need.

b.

To determine if interaction exists, we test:
H0: Justice reparation potential and producer need do not interact
Ha: Justice reparation potential and producer need do interact

The test statistic is 𝐹 = 20.55 and the p-value is 𝑝 = 0.000. Since the p-value is less than 𝛼 (𝑝 = 0.000 < .01), H0 is rejected. There is sufficient evidence to indicate justice reparation potential and producer need interact to affect intention at 𝛼 = .01.

c. No. Since the test for interaction was significant, the tests for the main effects are not necessary.

d. This plot indicates that for high justice reparation potential, as producer need changes from High to Moderate, the mean intention decreases. However, for low justice reparation potential, as producer need changes from High to Moderate, the mean intention increases. This indicates that the effect of justice reparation potential on intention depends on the level of producer need.

e. Yes. This is exactly what the graph shows.

9.75

a. To determine if interaction between gender and weight status exists, we test:
H0: Gender and weight status do not interact
Ha: Gender and weight status do interact

The test statistic is 𝐹 = 9.78 and the p-value is 𝑝 < .001. Since the p-value is less than 𝛼 (𝑝 < .001 < .01), H0 is rejected. There is sufficient evidence to indicate that gender and weight status interact at 𝛼 = .01.

b. No. Since the test for interaction was significant, the tests for the main effects are not necessary.

c. For Males, no significant differences can be detected in the mean work memory responses of the three weight status levels. For Females, the mean work memory response of the obese weight status level is significantly less than the mean work memory responses of both the normal and overweight weight status levels.

9.76

a. There are two factors for this experiment, housing system and weight class. There are a total of 2 × 4 = 8 treatments. The treatments are:

Cage, M       Barn, M       Free, M       Organic, M
Cage, L       Barn, L       Free, L       Organic, L

b. Using SAS, the results are:

The GLM Procedure
Dependent Variable: OVERRUN

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               7    11364.52381       1623.50340     14.93      <.0001
Error              20     2175.33333        108.76667
Corrected Total    27    13539.85714

R-Square    Coeff Var    Root MSE    OVERRUN Mean
0.839339    2.061383     10.42913    505.9286

Source             DF    Type I SS      Mean Square    F Value    Pr > F
HOUSING             3    10787.79048    3595.93016     33.06      <.0001
WTCLASS             1      329.14286     329.14286      3.03      0.0973
HOUSING*WTCLASS     3      247.59048      82.53016      0.76      0.5303

Source             DF    Type III SS    Mean Square    F Value    Pr > F
HOUSING             3    10787.79048    3595.93016     33.06      <.0001
WTCLASS             1      320.47407     320.47407      2.95      0.1015
HOUSING*WTCLASS     3      247.59048      82.53016      0.76      0.5303

c. To determine if interaction between housing system and weight class exists, we test:
H0: Housing system and weight class do not interact
Ha: Housing system and weight class do interact

The test statistic is 𝐹 = 0.76 and the p-value is 𝑝 = .5303. Since the p-value is not less than 𝛼 (𝑝 = .5303 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate that housing system and weight class interact at 𝛼 = .05.
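Several entries in the SAS output are simple functions of the others, which gives a quick consistency check: each F value is a mean square divided by MSE, R-Square is SS(Model)/SS(Corrected Total), Root MSE is the square root of MSE, and Coeff Var is 100 × Root MSE divided by the overall mean. The Python sketch below recomputes them from the printed values; the variable names are illustrative.

```python
import math

# Values copied from the SAS output (Type III sums of squares).
ss_model, ss_error, ss_total = 11364.52381, 2175.33333, 13539.85714
df_error = 20
overall_mean = 505.9286
type3 = {  # source: (df, Type III SS)
    "HOUSING": (3, 10787.79048),
    "WTCLASS": (1, 320.47407),
    "HOUSING*WTCLASS": (3, 247.59048),
}

mse = ss_error / df_error
f_values = {src: (ss / df) / mse for src, (df, ss) in type3.items()}
r_square = ss_model / ss_total
root_mse = math.sqrt(mse)
coeff_var = 100 * root_mse / overall_mean

print({src: round(f, 2) for src, f in f_values.items()})
print(round(r_square, 6), round(root_mse, 5), round(coeff_var, 6))
```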

d. To determine if there is a difference in mean whipping capacity among the 4 housing systems, we test:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is 𝐹 = 33.06 and the p-value is 𝑝 < .0001. Since the p-value is less than 𝛼 (𝑝 < .0001 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in mean whipping capacity among the 4 housing systems at 𝛼 = .05.

e. To determine if there is a difference in mean whipping capacity between the 2 weight classes, we test:
H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is 𝐹 = 2.95 and the p-value is 𝑝 = .1015. Since the p-value is not less than 𝛼 (𝑝 = .1015 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in mean whipping capacity between the 2 weight classes at 𝛼 = .05.

9.77

a.

df(Order) = a − 1 = 2 − 1 = 1
df(Menu) = b − 1 = 2 − 1 = 1
df(OxM) = (a − 1)(b − 1) = (2 − 1)(2 − 1) = 1
df(Error) = n − ab = 180 − 2(2) = 176

Source          df     F-value    p-value
Order             1    --         --
Menu              1    --         --
Order x Menu      1    11.25      <.001
Error           176
Total           179

b.

Since the p-value is less than 𝛼 (𝑝 < 0.001 < .05), H0 is rejected. There is sufficient evidence to indicate order and menu interact to affect the amount willing to pay at 𝛼 = .05.

c.

No, these results are not required to complete the analysis. Since the test for interaction was significant, there is no need to run the main effect tests.

d.

Using MINITAB, a graph of the means is:

[Interaction Plot for WillingPay: mean amount willing to pay (vertical axis, about 11 to 17) plotted against Menu (Homogeneous, Mixed), with one line for each Order (Vice, Virtue)]

9.78

Using MINITAB, the results of the ANOVA are:

General Linear Model: NUMBER versus GROUP, SET

Factor  Type   Levels  Values
GROUP   fixed  3       3, 6, 12
SET     fixed  3       FIRST, LAST, MIDDLE

Analysis of Variance for NUMBER, using Adjusted SS for Tests

Source     DF  Seq SS   Adj SS  Adj MS  F      P
GROUP       2   15.267  15.267   7.633   7.59  0.001
SET         2   62.600  62.600  31.300  31.11  0.000
GROUP*SET   4    7.133   7.133   1.783   1.77  0.142
Error      81   81.500  81.500   1.006
Total      89  166.500

S = 1.00308   R-Sq = 51.05%   R-Sq(adj) = 46.22%

Means

SET     N   NUMBER
FIRST   30  3.0000
LAST    30  1.1000
MIDDLE  30  1.4000

GROUP  N   NUMBER
3      30  2.4000
6      30  1.4333
12     30  1.6667

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable NUMBER
All Pairwise Comparisons among Levels of GROUP

GROUP = 3 subtracted from:

GROUP  Lower   Center   Upper
6      -1.586  -0.9667  -0.3477
12     -1.352  -0.7333  -0.1143

GROUP = 6 subtracted from:

GROUP  Lower    Center  Upper
12     -0.3857  0.2333  0.8523

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable NUMBER
All Pairwise Comparisons among Levels of SET

SET = FIRST subtracted from:

SET     Lower   Center  Upper
LAST    -2.519  -1.900  -1.281
MIDDLE  -2.219  -1.600  -0.981

SET = LAST subtracted from:

SET     Lower    Center  Upper
MIDDLE  -0.3190  0.3000  0.9190

To determine if group size and photo set interact to affect the number of selections, we test:

H0: Group size and photo set do not interact to affect the number of selections
Ha: Group size and photo set interact to affect the number of selections

The test statistic is 𝐹 = 1.77 and the p-value is 𝑝 = .142. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate that group size and photo set interact to affect the number of selections for any reasonable level of 𝛼.

Since there is no evidence of an interaction, we will next test for the main effects. To determine if group size had an effect on the mean number of selections, we test:

H0: μ1 = μ2 = μ3
Ha: At least two group size means differ

The test statistic is 𝐹 = 7.59 and the p-value is 𝑝 = .001. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that group size has an effect on the mean number of selections for any level of 𝛼 greater than .001.

To determine if photo set had an effect on the mean number of selections, we test:

H0: μ1 = μ2 = μ3
Ha: At least two photo set means differ

The test statistic is 𝐹 = 31.11 and the p-value is 𝑝 = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that photo set has an effect on the mean number of selections for any level of 𝛼 greater than .000.

Since both main effects are significant, we will run Tukey’s multiple comparison procedure on each main effect to find where the differences exist.

The mean number of selections made for the different group sizes are:

Means:   1.433  1.667  2.400
Groups:  6      12     3
         ____________

Each MINITAB interval estimates the mean of the listed level minus the mean of the level "subtracted from."

The confidence interval comparing size 3 to size 6 is (-1.586, -.3477). Since both endpoints of the interval are negative, the mean number of selections for size 3 is significantly greater than the mean number of selections for size 6.

The confidence interval comparing size 3 to size 12 is (-1.352, -.1143). Since both endpoints of the interval are negative, the mean number of selections for size 3 is significantly greater than the mean number of selections for size 12.

The confidence interval comparing size 6 to size 12 is (-.3857, .8523). Since 0 is contained in the interval, there is no significant difference in the mean number of selections between sizes 6 and 12.

Thus, there are significantly more selections made for group size 3 than for the other two sizes.

The mean number of selections made for the different photo sets are:

Means:   1.10   1.40    3.00
Groups:  Last   Middle  First
         ____________

The confidence interval comparing the first photo set to the last photo set is (-2.519, -1.281). Since both endpoints of the interval are negative, the mean number of selections for the first photo set is significantly greater than the mean number of selections for the last photo set.

The confidence interval comparing the first photo set to the middle photo set is (-2.219, -.981). Since both endpoints of the interval are negative, the mean number of selections for the first photo set is significantly greater than the mean number of selections for the middle photo set.

The confidence interval comparing the middle photo set to the last photo set is (-.3190, .9190). Since 0 is contained in the interval, there is no significant difference in the mean number of selections between the last photo set and the middle photo set.

Thus, there are significantly more selections made for the first photo set than for the other two photo sets.

9.79

a.

The treatments are the 4 combinations of distortion and size: (cut, extra-large), (not cut, extra-large), (cut, half-size), (not cut, half-size).

b.

To determine if paper size and paper distortion interact, we test:

H0: Paper size and paper distortion do not interact to affect attitude
Ha: Paper size and paper distortion interact to affect attitude

The test statistic is 𝐹 = 7.52 and the p-value is 𝑝 < .010. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that paper size and paper distortion interact to affect attitude for 𝛼 > .01.

c.

No. Because the interaction is significant, the test for main effects should not be conducted.

d.

The mean attitude score for (cut, half-size) is significantly greater than the means for any of the other 3 treatments. No other differences exist.

9.80

A statistical software program was used to create the following analysis of variance table:

Factorial AOV Table for RATING

Source       DF   SS       MS       F     P
LOGO          1    50.95   50.9550  7.22  0.0079
BRAND         1    20.22   20.2206  2.86  0.0923
LOGO*BRAND    1     0.76    0.7644  0.11  0.7425
Error       176  1242.44    7.0593
Total       179  1314.38

Grand Mean  5.3613
CV          49.56

To determine if interaction between logo and brand exists, we test:

H0: Logo and brand do not interact
Ha: Logo and brand do interact

The test statistic is 𝐹 = 0.11 and the p-value is 𝑝 = .7425. Since the p-value is not less than 𝛼 (𝑝 = .7425 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate logo and brand interact at 𝛼 = .05.

Since no interaction was detected, it is appropriate to conduct the main effects tests. To determine if there is a difference in mean rating scores between the two Logo designs, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is 𝐹 = 7.22 and the p-value is 𝑝 = .0079. Since the p-value is less than 𝛼 (𝑝 = .0079 < .05), H0 is rejected. There is sufficient evidence to indicate a difference in mean rating scores between the two Logo designs at 𝛼 = .05.

To determine if there is a difference in mean rating scores between the two Brands, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is 𝐹 = 2.86 and the p-value is 𝑝 = .0923. Since the p-value is not less than 𝛼 (𝑝 = .0923 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in mean rating scores between the two Brands at 𝛼 = .05.

9.81

a.

Low Load, Ambiguous: $\text{Total}_1 = n_1\bar{x}_1 = 25(18) = 450$
High Load, Ambiguous: $\text{Total}_2 = n_2\bar{x}_2 = 25(6.1) = 152.5$
Low Load, Common: $\text{Total}_3 = n_3\bar{x}_3 = 25(7.8) = 195$
High Load, Common: $\text{Total}_4 = n_4\bar{x}_4 = 25(6.3) = 157.5$

b.

$$CM = \frac{(\text{sum of all observations})^2}{n} = \frac{(450 + 152.5 + 195 + 157.5)^2}{100} = \frac{955^2}{100} = 9{,}120.25$$

c.

Low Load total is 450 + 195 = 645. High Load total is 152.5 + 157.5 = 310.

$$SS(\text{Load}) = \frac{\sum_{i=1}^{a} A_i^2}{br} - CM = \frac{645^2}{2(25)} + \frac{310^2}{2(25)} - 9{,}120.25 = 10{,}242.5 - 9{,}120.25 = 1{,}122.25$$

Ambiguous total is 450 + 152.5 = 602.5. Common total is 195 + 157.5 = 352.5.

$$SS(\text{Name}) = \frac{\sum_{j=1}^{b} B_j^2}{ar} - CM = \frac{602.5^2}{2(25)} + \frac{352.5^2}{2(25)} - 9{,}120.25 = 9{,}745.25 - 9{,}120.25 = 625$$

d.

$$SS(\text{Load} \times \text{Name}) = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} AB_{ij}^2}{r} - SS(\text{Load}) - SS(\text{Name}) - CM$$

$$= \frac{450^2}{25} + \frac{152.5^2}{25} + \frac{195^2}{25} + \frac{157.5^2}{25} - 1{,}122.25 - 625 - 9{,}120.25 = 11{,}543.5 - 1{,}122.25 - 625 - 9{,}120.25 = 676$$

e.

Low Load, Ambiguous: $s_1^2 = 15^2 = 225$, so $(n_1 - 1)s_1^2 = (25 - 1)225 = 5{,}400$
High Load, Ambiguous: $s_2^2 = 9.5^2 = 90.25$, so $(n_2 - 1)s_2^2 = (25 - 1)90.25 = 2{,}166$
Low Load, Common: $s_3^2 = 9.5^2 = 90.25$, so $(n_3 - 1)s_3^2 = (25 - 1)90.25 = 2{,}166$
High Load, Common: $s_4^2 = 10^2 = 100$, so $(n_4 - 1)s_4^2 = (25 - 1)100 = 2{,}400$

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + (n_4 - 1)s_4^2 = 5{,}400 + 2{,}166 + 2{,}166 + 2{,}400 = 12{,}132$$

f.

SS(Total) = SS(Load) + SS(Name) + SS(Load x Name) + SSE = 1,122.25 + 625 + 676 + 12,132 = 14,555.25

g.

The ANOVA table is:

Source       df  SS         MS        F
Load          1   1,122.25  1,122.25  8.88
Name          1     625.00    625.00  4.95
Load x Name   1     676.00    676.00  5.35
Error        96  12,132.00   126.375
Total        99  14,555.25

h.

Yes. We computed 5.35, which is almost the same as 5.34. The difference could be due to round-off error.
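The hand calculations in parts a through g can be cross-checked with a short script. The following is a minimal Python sketch that assumes only the cell means, standard deviations, and r = 25 observations per cell used above; the function name `two_factor_anova` and the dictionary keys are illustrative choices, not notation from the text.

```python
# Cell summaries from the exercise: (load, name) -> (sample mean, sample std dev)
cells = {
    ("Low", "Ambiguous"): (18.0, 15.0),
    ("High", "Ambiguous"): (6.1, 9.5),
    ("Low", "Common"): (7.8, 9.5),
    ("High", "Common"): (6.3, 10.0),
}
r = 25  # observations per Load-Name combination


def two_factor_anova(cells, r):
    """Sums of squares and F statistics from cell means and standard deviations."""
    loads = sorted({load for load, _ in cells})
    names = sorted({name for _, name in cells})
    a, b = len(loads), len(names)
    n = a * b * r

    totals = {key: r * mean for key, (mean, _) in cells.items()}
    cm = sum(totals.values()) ** 2 / n  # correction for the mean

    ss_load = sum(sum(totals[(ld, nm)] for nm in names) ** 2 for ld in loads) / (b * r) - cm
    ss_name = sum(sum(totals[(ld, nm)] for ld in loads) ** 2 for nm in names) / (a * r) - cm
    ss_inter = sum(t ** 2 for t in totals.values()) / r - ss_load - ss_name - cm
    sse = sum((r - 1) * sd ** 2 for _, sd in cells.values())

    mse = sse / (n - a * b)
    return {
        "CM": cm,
        "SS(Load)": ss_load,
        "SS(Name)": ss_name,
        "SS(Load x Name)": ss_inter,
        "SSE": sse,
        "MSE": mse,
        "F(Load)": ss_load / (a - 1) / mse,
        "F(Name)": ss_name / (b - 1) / mse,
        "F(Load x Name)": ss_inter / ((a - 1) * (b - 1)) / mse,
    }


result = two_factor_anova(cells, r)
for label, value in result.items():
    print(f"{label}: {value:.3f}")
```

Working from the totals rather than the raw data mirrors the computing formulas above, so each printed value can be matched to one line of the hand solution.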

i.

To determine if interaction between Load and Name is present, we test:

H0: Load and Name do not interact
Ha: Load and Name do interact

The test statistic is 𝐹 = 5.35. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = (𝑎 − 1)(𝑏 − 1) = (2 − 1)(2 − 1) = 1 and 𝜈₂ = 𝑛 − 𝑎𝑏 = 100 − 2(2) = 96. From Table VI, Appendix D, 𝐹.₀₅ ≈ 3.96. The rejection region is 𝐹 > 3.96. Since the observed value of the test statistic falls in the rejection region (𝐹 = 5.35 > 3.96), H0 is rejected. There is sufficient evidence to indicate that Load and Name interact at 𝛼 = .05.

Using MINITAB, a graph of the results is:

[Scatterplot of Mean vs Load: Mean (vertical axis) versus Load (Low, High), with separate lines for Name (1, 2).]

From the graph, the interaction is quite apparent. For Low load, the mean number of jelly beans taken for the ambiguous name is much higher than the mean number taken for the common name. However, for High load, there is essentially no difference in the mean number of jelly beans taken between the two names.

j.

We must assume that:

1. The response distributions for each Load-Name combination (treatment) are normal.

2. The response variance is constant for all Load-Name combinations.

3. Random and independent samples of experimental units are associated with each Load-Name combination.

9.82

A one-way ANOVA has only one factor with 2 or more levels. A two-way ANOVA has 2 factors, each at 2 or more levels.

9.83

In a completely randomized design, independent random selection of treatments to be assigned to experimental units is required. In a randomized block design, the experimental units are first grouped into blocks such that within the blocks the experimental units are homogeneous and between the blocks the experimental units are heterogeneous. Once the experimental units are grouped into blocks, the treatments are randomly assigned to the experimental units within each block so that each treatment appears one time in each block.

9.84

There are 3 × 2 = 6 treatments. They are A1B1, A1B2, A2B1, A2B2, A3B1, and A3B2.
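As a quick illustration, the treatments of a complete factorial are every pairing of one level from each factor. The sketch below enumerates them with the standard library; the variable names `levels_a` and `levels_b` are labels chosen here for the example.

```python
from itertools import product

levels_a = ["A1", "A2", "A3"]  # factor A at 3 levels
levels_b = ["B1", "B2"]        # factor B at 2 levels

# Each treatment is one level of A combined with one level of B.
treatments = [a + b for a, b in product(levels_a, levels_b)]
print(len(treatments), treatments)
```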

9.85

When the overall level of significance of a multiple comparison procedure is 𝛼, the level of significance for each comparison is less than 𝛼. This is because the comparisons within the experiment are not independent of each other.

9.86

a.

𝑆𝑆𝐸 = 𝑆𝑆(Total) − 𝑆𝑆𝑇 = 62.55 − 36.95 = 25.60

df Treatment = k − 1 = 4 − 1 = 3
df Error = n − k = 20 − 4 = 16
df Total = n − 1 = 20 − 1 = 19

$$MST = \frac{SST}{df} = \frac{36.95}{3} = 12.32 \qquad MSE = \frac{SSE}{df} = \frac{25.60}{16} = 1.60 \qquad F = \frac{MST}{MSE} = \frac{12.32}{1.60} = 7.70$$

The ANOVA table:

Source     df  SS     MS     F
Treatment   3  36.95  12.32  7.70
Error      16  25.60   1.60
Total      19  62.55

b.

To determine if there is a difference in the treatment means, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ

where μi represents the mean for the ith treatment.

The test statistic is $F = \frac{MST}{MSE} = 7.70$.

The rejection region requires 𝛼 = .10 in the upper tail of the F-distribution with 𝜈₁ = 𝑘 − 1 = 4 − 1 = 3 and 𝜈₂ = 𝑛 − 𝑘 = 20 − 4 = 16. From Table V, Appendix D, 𝐹.₁₀ = 2.46. The rejection region is 𝐹 > 2.46. Since the observed value of the test statistic falls in the rejection region (𝐹 = 7.70 > 2.46), H0 is rejected. There is sufficient evidence to conclude that at least two of the means differ at 𝛼 = .10.
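The table in part a and the decision in part b can be reproduced in a few lines. This is a minimal sketch, assuming the sums of squares given in the exercise and taking the critical value 2.46 from the F table as a constant rather than computing it; the function name `one_way_anova` is hypothetical.

```python
def one_way_anova(sst: float, ss_total: float, k: int, n: int) -> dict:
    """Complete a one-way ANOVA table from SST, SS(Total), k treatments, n observations."""
    sse = ss_total - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {
        "df": (k - 1, n - k, n - 1),
        "SSE": sse,
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }


table = one_way_anova(sst=36.95, ss_total=62.55, k=4, n=20)
F_CRITICAL = 2.46  # tabled upper-tail value for alpha = .10 with 3 and 16 df
reject_h0 = table["F"] > F_CRITICAL
print(table, reject_h0)
```

The script carries full precision for MST, so its F value can differ from the hand result in the third decimal place; the conclusion is unaffected.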

c.

$$\bar{x}_4 = \frac{\sum x_4}{n_4} = \frac{57}{5} = 11.4$$

For confidence level .90, 𝛼 = .10 and 𝛼/2 = .10/2 = .05. From Table III, Appendix D, with 𝑑𝑓 = 16, t.05 = 1.746. The confidence interval is:

$$\bar{x}_4 \pm t_{.05}\sqrt{\frac{MSE}{n_4}} \Rightarrow 11.4 \pm 1.746\sqrt{\frac{1.6}{5}} \Rightarrow 11.4 \pm .99 \Rightarrow (10.41,\ 12.39)$$

9.87

a.

𝑆𝑆𝑇 = 𝑆𝑆(Total) − 𝑆𝑆(Block) − 𝑆𝑆𝐸 = 22.31 − 10.688 − .288 = 11.334

$$MST = \frac{SST}{k - 1} = \frac{11.334}{4 - 1} = 3.778, \qquad df = k - 1 = 4 - 1 = 3$$

$$MS(\text{Block}) = \frac{SS(\text{Block})}{b - 1} = \frac{10.688}{5 - 1} = 2.672, \qquad df = b - 1 = 5 - 1 = 4$$

$$MSE = \frac{SSE}{n - k - b + 1} = \frac{.288}{20 - 4 - 5 + 1} = .024, \qquad df = n - k - b + 1 = 20 - 4 - 5 + 1 = 12$$

$$F_T = \frac{MST}{MSE} = \frac{3.778}{.024} = 157.42 \qquad F_B = \frac{MS(\text{Block})}{MSE} = \frac{2.672}{.024} = 111.33$$

The ANOVA Table is:

Source     df  SS      MS     F
Treatment   3  11.334  3.778  157.42
Block       4  10.688  2.672  111.33
Error      12   0.288  0.024
Total      19  22.310

b.

To determine if there are differences among the treatment means, we test:

H0: μA = μB = μC = μD
Ha: At least two treatment means differ

The test statistic is $F = \frac{MST}{MSE} = 157.42$.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = 𝑘 − 1 = 4 − 1 = 3 and 𝜈₂ = 𝑛 − 𝑘 − 𝑏 + 1 = 20 − 4 − 5 + 1 = 12. From Table VI, Appendix D, 𝐹.₀₅ = 3.49. The rejection region is 𝐹 > 3.49. Since the observed value of the test statistic falls in the rejection region (𝐹 = 157.42 > 3.49), H0 is rejected. There is sufficient evidence to indicate differences among the treatment means at 𝛼 = .05.
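The randomized block calculations in parts a and b follow the same steps for any k treatments and b blocks. The sketch below assumes the sums of squares stated in the exercise; the function name `randomized_block_anova` and its return keys are illustrative, not taken from the text.

```python
def randomized_block_anova(ss_total, ss_block, sse, k, b):
    """Mean squares and F statistics for a randomized block design
    with k treatments and b blocks (one observation per cell)."""
    n = k * b
    sst = ss_total - ss_block - sse
    df_t, df_b, df_e = k - 1, b - 1, n - k - b + 1
    mst, msb, mse = sst / df_t, ss_block / df_b, sse / df_e
    return {
        "SST": sst,
        "df": (df_t, df_b, df_e, n - 1),
        "MST": mst,
        "MS(Block)": msb,
        "MSE": mse,
        "F_T": mst / mse,
        "F_B": msb / mse,
    }


rb = randomized_block_anova(ss_total=22.31, ss_block=10.688, sse=0.288, k=4, b=5)
print(rb)
```

Note that the error degrees of freedom, n − k − b + 1, equal (k − 1)(b − 1), which is a convenient check.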

c.

Since there is evidence of differences among the treatment means, we need to compare the treatment means. The number of pairwise comparisons is $\frac{k(k-1)}{2} = \frac{4(4-1)}{2} = 6$.

d.

To determine if there are differences among the block means, we test:

H0: All block means are the same
Ha: At least two block means differ

The test statistic is $F = \frac{MS(\text{Block})}{MSE} = 111.33$.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = 𝑏 − 1 = 5 − 1 = 4 and 𝜈₂ = 𝑛 − 𝑘 − 𝑏 + 1 = 20 − 4 − 5 + 1 = 12. From Table VI, Appendix D, 𝐹.₀₅ = 3.26. The rejection region is 𝐹 > 3.26. Since the observed value of the test statistic falls in the rejection region (𝐹 = 111.33 > 3.26), H0 is rejected. There is sufficient evidence that the block means differ at 𝛼 = .05.

9.88

a.

df(AB) = (a − 1)(b − 1) = 3(5) = 15
df(Error) = n − ab = 48 − 4(6) = 24

SSAB = MSAB(df) = 3.1(15) = 46.5
SS(Total) = SSA + SSB + SSAB + SSE = 2.6 + 9.2 + 46.5 + 18.7 = 77

$$MSA = \frac{SSA}{a - 1} = \frac{2.6}{3} = .8667 \qquad MSB = \frac{SSB}{b - 1} = \frac{9.2}{5} = 1.84 \qquad MSE = \frac{SSE}{n - ab} = \frac{18.7}{24} = .7792$$

$$F_A = \frac{MSA}{MSE} = \frac{.8667}{.7792} = 1.11 \qquad F_B = \frac{MSB}{MSE} = \frac{1.84}{.7792} = 2.36 \qquad F_{AB} = \frac{MSAB}{MSE} = \frac{3.1}{.7792} = 3.98$$

Source  df  SS    MS     F
A        3   2.6  .8667  1.11
B        5   9.2  1.84   2.36
AB      15  46.5  3.1    3.98
Error   24  18.7  .7792
Total   47  77.0

b.

Factor A has 𝑎 = 3 + 1 = 4 levels and factor B has 𝑏 = 5 + 1 = 6 levels. The number of treatments is 𝑎𝑏 = 4(6) = 24. The total number of observations is 𝑛 = 47 + 1 = 48. Thus, two replicates were performed.
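The reasoning in part b can be written as a small helper that recovers the design from the degrees of freedom in an ANOVA table. This is only a sketch; `design_from_df` is a hypothetical name, and the check for a whole number of replicates is an added assumption that the design is balanced.

```python
def design_from_df(df_a: int, df_b: int, df_total: int) -> dict:
    """Recover factor levels, treatments, and replicates from ANOVA degrees of freedom."""
    a = df_a + 1      # levels of factor A
    b = df_b + 1      # levels of factor B
    n = df_total + 1  # total observations
    treatments = a * b
    replicates, remainder = divmod(n, treatments)
    if remainder:
        raise ValueError("n is not a multiple of ab; the design is not balanced")
    return {"a": a, "b": b, "n": n, "treatments": treatments, "replicates": replicates}


design = design_from_df(df_a=3, df_b=5, df_total=47)
print(design)
```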

c.

𝑆𝑆𝑇 = 𝑆𝑆𝐴 + 𝑆𝑆𝐵 + 𝑆𝑆𝐴𝐵 = 2.6 + 9.2 + 46.5 = 58.3

$$MST = \frac{SST}{ab - 1} = \frac{58.3}{4(6) - 1} = 2.5347 \qquad F = \frac{MST}{MSE} = \frac{2.5347}{.7792} = 3.25$$

To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = ⋯ = μ24
Ha: At least two treatment means differ

The test statistic is $F = \frac{MST}{MSE} = 3.25$.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = 𝑎𝑏 − 1 = 4(6) − 1 = 23 and 𝜈₂ = 𝑛 − 𝑎𝑏 = 48 − 4(6) = 24. From Table VI, Appendix D, 𝐹.₀₅ ≈ 2.03. The rejection region is 𝐹 > 2.03. Since the observed value of the test statistic falls in the rejection region (𝐹 = 3.25 > 2.03), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at 𝛼 = .05.

d.

Since there are differences among the treatment means, we test for the presence of interaction:

H0: Factor A and factor B do not interact to affect the response mean
Ha: Factor A and factor B do interact to affect the response mean

The test statistic is $F = \frac{MSAB}{MSE} = 3.98$.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = (𝑎 − 1)(𝑏 − 1) = (4 − 1)(6 − 1) = 15 and 𝜈₂ = 𝑛 − 𝑎𝑏 = 48 − 4(6) = 24. From Table VI, Appendix D, 𝐹.₀₅ = 2.11. The rejection region is 𝐹 > 2.11. Since the observed value of the test statistic falls in the rejection region (𝐹 = 3.98 > 2.11), H0 is rejected. There is sufficient evidence to indicate factors A and B interact to affect the response means at 𝛼 = .05.

Since the interaction is significant, no further tests are warranted. Multiple comparisons need to be performed.

9.89

a.

The response variable is QB production score.

b.

There is one factor which is draft position.

c.

The treatments are the three levels of draft position – Top 10, between picks 11-50, and after pick 50.

d.

The experimental units are the drafted quarterbacks.

9.90

a.

The experimental units are the accounting alumni.

b.

The response variable is income.

c.

There are 2 factors in the problem: Mach score classification and Gender.

d.

Mach score classification has 3 levels – high, moderate, and low. Gender has 2 levels – male and female.

e.

There are a total of 2 × 3 = 6 treatments in this experiment. The treatments are all of the Mach/Gender combinations.

9.91

a.

The two factors are type of statement and order of information. There are 2 × 2 = 4 treatments: concrete/statement first, concrete/behavior first, abstract/statement first, and abstract/behavior first.

b.

This indicates that the effect of type of statement on the level of hypocrisy depends on the order of the information.

c.

Using MINITAB, a plot of the means is:

[Scatterplot of Hypocrisy vs Order: Hypocrisy (vertical axis) versus Order (Statement, Behavior), with separate lines for Type (Abstract, Concrete).]

d.

Since the interaction between the type of statement and the order of information was significant, then the tests for main effects should not be performed. Multiple comparisons on some or all of the pairs

9.92

a.

To determine if differences exist in the mean rates of return among the three types of fund groups, we test: H 0 : μ1 = μ 2 = μ 3 H a : At least two treatment means differ

b.

The rejection region requires 𝛼 = .01 in the upper tail of the F-distribution with 𝜈₁ = 𝑘 − 1 = 3 − 1 = 2 and 𝜈₂ = 𝑛 − 𝑘 = 90 − 3 = 87. Using MINITAB,

Inverse Cumulative Distribution Function

F distribution with 2 DF in numerator and 87 DF in denominator

P( X <= x )        x
       0.99  4.85777

The rejection region is 𝐹 > 4.86.

c.

Since the observed value of the test statistic falls in the rejection region (𝐹 = 6.965 > 4.86), H0 is rejected. There is sufficient evidence to indicate differences exist in the mean rates of return among the three types of fund groups at 𝛼 = .01.

9.93

a.

Tukey’s multiple comparison method is preferred over other methods because it controls the experimentwise error rate at the chosen 𝛼 level. It is more powerful than the other methods.

b.

From the confidence interval comparing large-cap and medium-cap mutual funds, we find that 0 is in the interval. Thus, 0 is not an unusual value for the difference in the mean rates of return between large-cap and medium-cap mutual funds. This means we would not reject H0. There is insufficient evidence of a difference in mean rates of return between large-cap and medium-cap mutual funds at 𝛼 = .05.

c.

From the confidence interval comparing large-cap and small-cap mutual funds, we find that 0 is not in the interval. Thus, 0 is an unusual value for the difference in the mean rates of return between large-cap and small-cap mutual funds. This means we would reject H0. There is sufficient evidence of a difference in mean rates of return between large-cap and small-cap mutual funds at 𝛼 = .05.

d.

From the confidence interval comparing medium-cap and small-cap mutual funds, we find that 0 is in the interval. Thus, 0 is not an unusual value for the difference in the mean rates of return between medium-cap and small-cap mutual funds. This means we would not reject H0. There is insufficient evidence of a difference in mean rates of return between medium-cap and small-cap mutual funds at 𝛼 = .05.

e.

From the above, the mean rate of return for large-cap mutual funds is the largest, followed by medium-cap, followed by small-cap mutual funds. The mean rate of return for large-cap funds is significantly larger than that for small-cap funds. No other differences exist.

f.

We are 95% confident of this decision.

9.94

a.

The treatments were the 8 different activities.

b.

The blocks were the 15 adults who participated in the study.

c.

Since the p-value is less than 𝛼 (𝑝 = .001 < .01), H0 is rejected. There is sufficient evidence to indicate a difference in mean heart rate among the 8 activities at 𝛼 = .01.

d.

The treadmill jogging had the highest mean heart rate. It was significantly greater than the mean heart rates of all the other activities. Brisk treadmill walking had the second highest mean heart rate. It was significantly less than the mean heart rate of treadmill jogging, but significantly greater than the mean heart rates of the other 6 activities. There was no significant difference in the mean heart rates among the treatments Wii aerobics, Wii muscle conditioning, Wii yoga, and Wii balance. The mean heart rates for these activities were significantly less than the mean heart rates for treadmill jogging and brisk treadmill walking, but greater than the mean heart rates of handheld gaming and rest. There was no significant difference in the mean heart rate between handheld gaming and rest. The mean heart rates for these two activities were significantly less than those for the other 6 activities.

9.95

a.

To determine if the mean LUST discount percentages across the seven states differ, we test:

H0: μ1 = μ2 = ⋯ = μ7
Ha: At least two treatment means differ

b.

From the ANOVA table, the test statistic is 𝐹 = 1.60 and the p-value is 𝑝 = .174. Since the observed p-value is not less than 𝛼 (𝑝 = .174 ≮ .10), H0 is not rejected. There is insufficient evidence to indicate a difference in the mean LUST discount percentages among the seven states at 𝛼 = .10.

9.96

a.

This was a randomized complete block design. The blocks are the months and the treatments were the 3 types of measures of electrical consumption.

b.

df Method = k − 1 = 3 − 1 = 2, df Error = n − k − b + 1 = 12 − 3 − 4 + 1 = 6

SST = (k − 1)MST = (3 − 1)(.195) = .390
SSB = (b − 1)MSB = (4 − 1)(10.780) = 32.340

$$F_{\text{Month}} = \frac{MSB}{MSE} = \frac{10.780}{.069} = 156.23$$

Source           df  SS      MS      F-value  p-value
Forecast Method   2    .390    .195    2.83   .08
Month             3  32.340  10.780  156.23   < .01
Error             6    .414    .069
Total            11  33.144

c.

To determine if there is a difference in the mean electrical consumption values among the three methods, we test:

H0: μ1 = μ2 = μ3
Ha: At least 2 of the treatment means differ

The test statistic is 𝐹 = 2.83 and the p-value is 𝑝 = .08. Since the p-value is not less than 𝛼 (𝑝 = .08 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate a difference in mean electrical consumption values among the three methods at 𝛼 = .05.

9.97

a.

There are a total of 2 × 4 = 8 treatments.

b.

The interaction between temperature and type was significant. This means that the effect of type of yeast on the mean autolysis yield depends on the level of temperature.

c.

To determine if the main effect of type of yeast is significant, we test: H 0 : μ Ba = μ Br H a : μ Ba ≠ μ Br

To determine if the main effect of temperature is significant, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

d.

The tests for the main effects should not be run since the test for interaction was significant. If interaction is significant, then these interaction effects could cover up the main effects. Thus, the main effect tests would not be informative.

e.

Baker’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48° and 51°.

Brewer’s yeast: The mean yield for temperature 54° is significantly lower than the mean yields for the other 3 temperatures. There is no difference in the mean yields for the temperatures 45°, 48° and 51°.

9.98

a.

The response is the weight of a brochure. There is one factor and it is carton. The treatments are the five different cartons, while the experimental units are the brochures.

b.

$$CM = \frac{\left(\sum y\right)^2}{n} = \frac{.75005^2}{40} = .01406437506$$

$$SS(\text{Total}) = \sum y^2 - CM = .014066537 - .01406437506 = .00000216264$$

$$SST = \sum \frac{T_i^2}{n_i} - CM = \frac{.14767^2}{8} + \frac{.15028^2}{8} + \frac{.14962^2}{8} + \frac{.15217^2}{8} + \frac{.15031^2}{8} - .01406437506$$

$$= .01406568209 - .01406437506 = .00000130703$$

$$SSE = SS(\text{Total}) - SST = .00000216264 - .00000130703 = .00000085561$$

$$MST = \frac{SST}{k - 1} = \frac{.00000130703}{5 - 1} = .000000326756 \qquad MSE = \frac{SSE}{n - k} = \frac{.00000085561}{40 - 5} = .000000024446$$

$$F = \frac{MST}{MSE} = \frac{.000000326756}{.000000024446} = 13.37$$

Source      df  SS            MS             F
Treatments   4  .00000130703  .000000326756  13.37
Error       35  .00000085561  .000000024446
Total       39  .00000216264
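Because these sums of squares are tiny differences between nearly equal numbers, it can help to recompute them by machine. The sketch below assumes the five carton totals and the value SS(Total) = .00000216264 shown above; the variable names are chosen here for illustration.

```python
# Carton totals T_i (8 brochures per carton) and SS(Total), as given in the solution.
totals = [0.14767, 0.15028, 0.14962, 0.15217, 0.15031]
n_i = 8
ss_total = 0.00000216264

k = len(totals)
n = k * n_i

cm = sum(totals) ** 2 / n                     # correction for the mean
sst = sum(t ** 2 / n_i for t in totals) - cm  # treatment sum of squares
sse = ss_total - sst

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

print(f"CM  = {cm:.11f}")
print(f"SST = {sst:.11f}")
print(f"SSE = {sse:.11f}")
print(f"F   = {f_stat:.2f}")
```

Double-precision arithmetic is adequate here, but rounding the intermediate totals to fewer digits would change SST noticeably, which is why the solution carries so many decimal places.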

To determine whether there are differences in mean weight per brochure among the five cartons, we test: H 0 : μ1 = μ 2 = μ 3 = μ 4 = μ 5 H a : At least two treatment means differ

The test statistic is 𝐹 = 13.37. The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈₁ = 𝑘 − 1 = 5 − 1 = 4 and 𝜈₂ = 𝑛 − 𝑘 = 40 − 5 = 35. From Table VI, Appendix D, 𝐹.₀₅ ≈ 2.61. The rejection region is 𝐹 > 2.61. Since the observed value of the test statistic falls in the rejection region (𝐹 = 13.37 > 2.61), H0 is rejected. There is sufficient evidence to indicate a difference in mean weight per brochure among the five cartons at 𝛼 = .05.

c.

We must assume that the distributions of weights for the brochures in the five cartons are normal, that the variances of the weights for the brochures in the five cartons are equal, and that random and independent samples were selected from each of the cartons.

d.

Using MINITAB, the results of Tukey’s multiple comparison procedure are:

Individual 95% CIs For Mean Based on Pooled StDev

Level    N  Mean      StDev
Carton1  8  0.018459  0.000105
Carton2  8  0.018785  0.000101
Carton3  8  0.018703  0.000109
Carton4  8  0.019021  0.000232
Carton5  8  0.018789  0.000188

Pooled StDev = 0.000156

Tukey 95% Simultaneous Confidence Intervals - All Pairwise Comparisons
Individual confidence level = 99.32%

Carton1 subtracted from:

         Lower      Center     Upper
Carton2  0.0001013  0.0003262  0.0005512
Carton3  0.0000188  0.0002437  0.0004687
Carton4  0.0003375  0.0005625  0.0007875
Carton5  0.0001050  0.0003300  0.0005550

Carton2 subtracted from:

         Lower       Center      Upper
Carton3  -0.0003075  -0.0000825  0.0001425
Carton4   0.0000113   0.0002363  0.0004612
Carton5  -0.0002212   0.0000037  0.0002287

Carton3 subtracted from:

         Lower       Center     Upper
Carton4   0.0000938  0.0003187  0.0005437
Carton5  -0.0001387  0.0000862  0.0003112

Carton4 subtracted from:

         Lower       Center      Upper
Carton5  -0.0004575  -0.0002325  -0.0000075

The means arranged in order are:

Carton 1  Carton 3  Carton 2  Carton 5  Carton 4
.018459   .018703   .018785   .018789   .019021

The interpretation of the Tukey results is: The mean weight for carton 4 is significantly higher than the mean weights of all the other cartons. The mean weights of cartons 2, 3, and 5 are not significantly different from each other, but they are significantly higher than the mean weight of carton 1.
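The interpretation above rests on one rule: a pair of means differs significantly when its Tukey interval excludes 0. The following sketch applies that rule to the interval endpoints reported in the MINITAB output for this exercise; the helper name `classify` is an assumption made for the example.

```python
# (pair, lower, upper) from the Tukey output; each interval estimates
# (mean of second carton) - (mean of first carton).
intervals = [
    (("Carton1", "Carton2"), 0.0001013, 0.0005512),
    (("Carton1", "Carton3"), 0.0000188, 0.0004687),
    (("Carton1", "Carton4"), 0.0003375, 0.0007875),
    (("Carton1", "Carton5"), 0.0001050, 0.0005550),
    (("Carton2", "Carton3"), -0.0003075, 0.0001425),
    (("Carton2", "Carton4"), 0.0000113, 0.0004612),
    (("Carton2", "Carton5"), -0.0002212, 0.0002287),
    (("Carton3", "Carton4"), 0.0000938, 0.0005437),
    (("Carton3", "Carton5"), -0.0001387, 0.0003112),
    (("Carton4", "Carton5"), -0.0004575, -0.0000075),
]


def classify(intervals):
    """Split pairs into significant (0 outside interval) and not significant."""
    significant, not_significant = [], []
    for pair, lower, upper in intervals:
        if lower < 0 < upper:
            not_significant.append(pair)
        else:
            significant.append(pair)
    return significant, not_significant


significant, not_significant = classify(intervals)
print("Significant:", significant)
print("Not significant:", not_significant)
```

The three pairs flagged as not significant are exactly the pairs among cartons 2, 3, and 5, which agrees with the grouping described above.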

e.

Since there are differences among the cartons, management should sample from many cartons.

9.99

a.

The experimental units are the participants in the study.

b.

The dependent variable is the brand recall score.

c.

There is one factor in this study – TV viewing group. Since there is only one factor, the treatments correspond to the factor levels of this variable. Thus, the treatments are the same as the three levels of TV viewer group. These 3 levels are violent content code, sex content code, and neutral TV.

d.

The means given are only sample means. If new samples were selected and sample means computed, the values and order of the sample means could change. In addition, the variances are not taken into account.

e.

MINITAB was used to create the following printout:

Analysis of Variance

Source   DF  Adj SS  Adj MS  F-Value  P-Value
RATING    2   123.3  61.633    20.45    0.000
Error   321   967.4   3.014
Total   323  1090.6

The test statistic is 𝐹 = 20.45 and the p-value is 𝑝 = 0.000.

f.

Since the p-value is less than 𝛼 (𝑝 = 0.000 < .01), H0 is rejected. There is sufficient evidence to indicate differences in the mean recall scores among the three viewing groups at 𝛼 = .01. The researchers can conclude that the content of the TV show affects the recall of embedded commercials.

g.

Using MINITAB, the histograms of the three viewing groups are:

[Histogram of VIOLENT, SEX, NEUTRAL with fitted normal curves; vertical axis: Frequency. Legend: VIOLENT: Mean 2.083, StDev 1.730, N 108. SEX: Mean 1.713, StDev 1.664, N 108. NEUTRAL: Mean 3.167, StDev 1.811, N 108.]

The assumptions for ANOVA are that the data are approximately normal and the variances of the groups are the same. From the legend above, the standard deviations are 1.730, 1.664, and 1.811. These are all very similar. From the plots, the distributions of the violent group and the neutral group are fairly normal. The distribution of the sex group is skewed to the right and may not be normal.

h.

The total number of pairwise comparisons made in the Tukey analysis is $\frac{k(k-1)}{2} = \frac{3(3-1)}{2} = 3$.

i.

MINITAB was used to conduct the multiple comparisons:

Grouping Information Using the Tukey Method and 95% Confidence

RATING  N    Mean   Grouping
N       108  3.167  A
V       108  2.083    B
S       108  1.713    B

Means that do not share a letter are significantly different.

For comparing the Violence and Sex groups: Since the two groups share a grouping letter in the printout, there is no indication of a difference in mean recall between the V and S groups at α = .05.

For comparing the Violence and Neutral groups: Since the Neutral group is grouped higher than the Violence group, there is evidence to indicate the mean recall for the Neutral group is significantly higher than that of the Violence group.

For comparing the Sex and Neutral groups: Since the Neutral group is grouped higher than the Sex group, there is evidence to indicate the mean recall for the Neutral group is significantly higher than that of the Sex group.

j.

Yes. The mean recalls for the V and S groups are both significantly lower than the mean recall for the Neutral group.
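The count of pairwise comparisons in part h can be checked by listing the pairs directly. The following is a minimal Python sketch; the function name `pairwise_comparisons` and the group labels are illustrative and not part of the MINITAB analysis.

```python
from itertools import combinations


def pairwise_comparisons(groups):
    """Return every unordered pair of treatment groups."""
    return list(combinations(groups, 2))


groups = ["Violent", "Sex", "Neutral"]  # the three viewing groups
pairs = pairwise_comparisons(groups)
k = len(groups)

# The number of pairs should agree with the formula k(k - 1)/2.
assert len(pairs) == k * (k - 1) // 2

for first, second in pairs:
    print(f"{first} vs {second}")
```

Each printed pair corresponds to one of the three comparisons interpreted in part i.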

k.

Using MINITAB, a complete factorial design was fit to the data:

General Linear Model: RECALL versus CONTENT, BEFORE

Factor    Type    Levels   Values
CONTENT   fixed        3   NEUTRAL, SEX, VIOLENT
BEFORE    fixed        2   NO, YES

Analysis of Variance for RECALL, using Adjusted SS for Tests

Source            DF     Seq SS    Adj SS   Adj MS       F       P
CONTENT            2    123.265   120.004   60.002   20.01   0.000
BEFORE             1      6.458     6.393    6.393    2.13   0.145
CONTENT*BEFORE     2      7.472     7.472    3.736    1.25   0.289
Error            318    953.421   953.421    2.998
Total            323   1090.617

S = 1.73153   R-Sq = 12.58%   R-Sq(adj) = 11.21%

Grouping Information Using Tukey Method and 95.0% Confidence

CONTENT     N    Mean   Grouping
NEUTRAL   108   3.167   A
VIOLENT   108   2.090   B
SEX       108   1.731   B

Means that do not share a letter are significantly different.

Grouping Information Using Tukey Method and 95.0% Confidence

BEFORE     N    Mean   Grouping
NO       162   2.470   A
YES      162   2.188   A

Means that do not share a letter are significantly different.

First, we test for the interaction term. To determine if content group and whether one had watched the commercial before interact to affect recall, we test:

H0: Content and whether one watched the commercial before do not interact
Ha: Content and whether one watched the commercial before do interact

The test statistic is F = 1.25 and the p-value is p = .289. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate content and whether the commercial was viewed before interact to affect recall for any reasonable value of α.

Next, we test for the main effects. To determine if the mean recall differs among the content groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two means differ

The test statistic is F = 20.01 and the p-value is p = .000. Since the p-value is very small, H0 is rejected. There is evidence to indicate the mean recall differs among the different content groups for any reasonable value of α. Tukey's multiple comparison on the content means yielded the following: the mean recall for those in the neutral content group was significantly higher than the mean recall of the other 2 groups. No other differences existed.

To determine if the mean recall differs between those who had watched the ad before and those who had not, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is F = 2.13 and the p-value is p = .145. Since the p-value is not small, H0 is not rejected. There is no evidence to indicate the mean recall differs between those who had watched the ad before and those who had not for any reasonable value of α. These results agree with the researchers' conclusions.

9.100

Using MINITAB, the ANOVA table is:

Two-way ANOVA: Rate versus Week, Day

Analysis of Variance for Rate

Source   DF       SS     MS      F       P
Week      8    575.2   71.9   6.10   0.000
Day       4     94.2   23.5   2.00   0.118
Error    32    376.9   11.8
Total    44   1046.4

Day   Mean
1      8.8
2      4.6
3      5.8
4      5.4
5      6.4

[Figure: individual 95% confidence intervals for the five day means, plotted on a scale from 2.5 to 12.5.]

To determine if there is a difference in mean rate of absenteeism among the 5 days of the week, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least 2 of the treatment means differ

The test statistic is F = 2.00 and the p-value is p = .118. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate a difference in mean rate of absenteeism among the 5 days of the week for any value of α < .118.

To test for the effectiveness of blocking, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least 2 of the block means differ

The test statistic is F = 6.10 and the p-value is p = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate blocking was effective at any reasonable value of α.
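The two F statistics above can be recomputed from the sums of squares in the printout. The sketch below assumes a randomized block design with k treatments (days) and b blocks (weeks); the function name `block_anova` and its return keys are illustrative only.

```python
def block_anova(ss_treat, ss_block, ss_error, k, b):
    """F statistics for a randomized block design from its sums of squares.

    k is the number of treatments and b is the number of blocks.
    """
    df_treat = k - 1
    df_block = b - 1
    df_error = (k - 1) * (b - 1)
    mse = ss_error / df_error
    f_treat = (ss_treat / df_treat) / mse  # MST / MSE
    f_block = (ss_block / df_block) / mse  # MSB / MSE
    return {"df_error": df_error, "MSE": mse, "F_treat": f_treat, "F_block": f_block}


# Sums of squares taken from the Rate versus Week, Day table.
result = block_anova(ss_treat=94.2, ss_block=575.2, ss_error=376.9, k=5, b=9)
print(round(result["F_treat"], 2), round(result["F_block"], 2))
```

Rounded to two decimals, the results agree with the F column of the printout; the p-values still have to come from software or an F table.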


9.101

a.

To determine if the mean level of trust differs among the six treatments, we test:

H0: μ1 = μ2 = ⋯ = μ6
Ha: At least two treatment means differ

b.

The test statistic is F = 2.21. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 6 − 1 = 5 and ν2 = n − k = 230 − 6 = 224. Using MINITAB,

Inverse Cumulative Distribution Function

F distribution with 5 DF in numerator and 224 DF in denominator

P( X <= x )         x
       0.95   2.25436

The rejection region is F > 2.25. Since the observed value of the test statistic does not fall in the rejection region (F = 2.21 ≯ 2.25), H0 is not rejected. There is insufficient evidence to indicate that at least two mean trust levels differ at α = .05.

c.

We must assume that all six samples are drawn from normal populations, the six population variances are the same, and that the samples are independent.

d.

I would classify this experiment as designed. Each subject was randomly assigned to receive one of the six scenarios.

e.

The mean level of trust for the "no close" technique is significantly higher than that for the "visual close" and the "thermometer close" techniques. The mean level of trust for the "financial close event" technique is significantly higher than that for the "thermometer close" technique. No other significant differences exist.

9.102

a.

There are a total of 2 × 4 = 8 treatments for this study. They include all combinations of Insomnia status and Education level. The 8 treatments are:

Normal sleeper, College Graduate
Normal sleeper, Some college
Normal sleeper, High School graduate
Normal sleeper, High School Dropout
Chronic insomnia, College Graduate
Chronic insomnia, Some college
Chronic insomnia, High School graduate
Chronic insomnia, High School Dropout

b.

Since Insomnia and Education did not interact, the effect of Insomnia on the Fatigue Severity Scale does not depend on the level of Education. In a graph, the lines will be parallel. A possible graph of this situation is:

[Figure: Scatterplot of FSS vs Insomnia, one line per Education level (1, 2, 3, 4); horizontal axis: Insomnia, vertical axis: FSS.]

c.

This means that the researchers can infer that the population mean FSS for people who had insomnia is higher than the population mean FSS for normal sleepers.

d.

This means that at least one level of education had a mean FSS score that differed from the rest. There may be more than one difference, but there is at least one.

e.

With 95% confidence, we can conclude that the mean FSS value for high school dropouts is significantly higher than the mean FSS values for the 3 other education levels. There is no significant difference in the mean FSS values for college graduates, those with some college, and high school graduates.

9.103

a.

To determine if the mean knowledge gain differs among the three groups, we test:

H0: μ1 = μ2 = μ3
Ha: At least two treatment means differ

b.

Using MINITAB, the results are:

One-way ANOVA: NO, CHECK, FULL

Source   DF       SS     MS      F       P
Factor    2     6.64   3.32   0.45   0.637
Error    72   527.36   7.32
Total    74   534.00

S = 2.706   R-Sq = 1.24%   R-Sq(adj) = 0.00%

The test statistic is F = 0.45 and the p-value is p = 0.637.

c.

Since the p-value (p = 0.637) is larger than any reasonable significance level, H0 is not rejected. There is insufficient evidence to indicate a difference in the mean knowledge gained among the three levels of assistance for any reasonable value of α. Practically speaking, there is not one type of assistance that helps students more than another.

9.104

a.

The experimenters expected there to be much variation in the number of participants from week to week (more participants at the beginning and fewer as time goes on). Thus, by blocking on weeks, this extraneous source of variation can be controlled.

b.

df(Week) = b −1 = 6 −1 = 5

MS(Prompt) = SST/df = 1185.0/4 = 296.25

F(Prompt) = MST/MSE = 296.25/7.43 = 39.87

The ANOVA table is:

Source   df       SS       MS       F        p
Prompt    4   1185.0   296.25   39.87   0.0001
Week      5    386.4    77.28   10.40   0.0001
Error    20    148.6     7.43
Total    29   1720.0

c.

To determine if a difference exists in the mean number of walkers per week among the five walker groups, we test:

H0: μ1 = μ2 = μ3 = μ4 = μ5
Ha: At least two treatment means differ

where μi represents the mean number of walkers in group i.

The test statistic is F = 39.87. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − 1 = 5 − 1 = 4 and ν2 = n − k − b + 1 = 30 − 5 − 6 + 1 = 20. From Table VI, Appendix D, F.05 = 2.87. The rejection region is F > 2.87. Since the observed value of the test statistic falls in the rejection region (F = 39.87 > 2.87), H0 is rejected. There is sufficient evidence to indicate differences exist among the mean number of walkers per week among the 5 walker groups at α = .05.

d.

The following conclusions are drawn: There is no significant difference in the mean number of walkers per week in the "Frequent/High" group and the "Frequent/Low" group. The means for these two groups are significantly higher than the means for the other three groups. There is no significant difference in the mean number of walkers per week in the "Infrequent/Low" group and the "Infrequent/High" group. The means for these two groups are significantly higher than the mean for the "Control" group.

e.

In order for the above inferences to be valid, the following assumptions must hold:

1) The probability distributions of observations corresponding to all block-treatment conditions are normal.

2) The variances of all the probability distributions are equal.

9.105

a.

The experimental design used was a factorial design.

b.

The two factors are diet and age. There are 2 levels of diet – fine limestone (FL) and coarse limestone (CL). There are 2 levels of age – young and old. There are 2 × 2 = 4 treatments: FL/young, FL/old, CL/young, and CL/old.
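In a complete factorial design, the treatments are all combinations of the factor levels, so they can be listed mechanically. The short Python sketch below does this for the two factors above; the variable names are illustrative.

```python
from itertools import product

diets = ["FL", "CL"]     # fine limestone, coarse limestone
ages = ["young", "old"]  # two age levels

# Each treatment is one diet level paired with one age level.
treatments = [f"{diet}/{age}" for diet, age in product(diets, ages)]
print(treatments)  # ['FL/young', 'FL/old', 'CL/young', 'CL/old']
```

The length of the list equals the product of the numbers of levels, here 2 × 2 = 4.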

c.

The experimental units are the hens.

d.

The dependent variable is egg shell thickness.

e.

If diet and age do not interact, then the effect of diet on the egg shell thickness is the same at each level of age.

f.

This indicates that there is no significant difference in mean egg shell thickness between the young and old hens.

g.

This indicates that there is a significant difference in the mean egg shell thickness due to diet. The mean egg shell thickness for eggs produced by hens on the CL diet is greater than the mean egg shell thickness for eggs produced by hens on the FL diet.

9.106

a.

There are a total of a × b = 3 × 3 = 9 treatments in this study.

b.

Using MINITAB, the ANOVA results are:

ANOVA: Y versus Display, Price

Factor    Type    Levels   Values
Display   fixed        3   1, 2, 3
Price     fixed        3   1, 2, 3

Analysis of Variance for Y

Source          DF        SS        MS         F       P
Display          2   1691393    845696   1709.37   0.000
Price            2   3089054   1544527   3121.89   0.000
Display*Price    4    510705    127676    258.07   0.000
Error           18      8905       495
Total           26   5300057

S = 22.2428   R-Sq = 99.83%   R-Sq(adj) = 99.76%

To get the SS for Treatments, we must add the SS for Display, the SS for Price, and the SS for Interaction. Thus, SST = 1,691,393 + 3,089,054 + 510,705 = 5,291,152. The df = 2 + 2 + 4 = 8.

MST = SST/(ab − 1) = 5,291,152/(3(3) − 1) = 661,394

F = MST/MSE = 661,394/495 = 1,336.15

To determine whether the treatment means differ, we test:

H0: μ1 = μ2 = ⋯ = μ9
Ha: At least two treatment means differ

The test statistic is F = MST/MSE = 1,336.15.

The rejection region requires α = .10 in the upper tail of the F-distribution with ν1 = ab − 1 = 3(3) − 1 = 8 and ν2 = n − ab = 27 − 3(3) = 18. From Table V, Appendix D, F.10 = 2.04. The rejection region is F > 2.04. Since the observed value of the test statistic falls in the rejection region (F = 1,336.15 > 2.04), H0 is rejected. There is sufficient evidence to indicate the treatment means differ at α = .10.
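The overall treatment test pools the two main-effect sums of squares and the interaction sum of squares. A minimal Python sketch of that arithmetic follows; the function name `treatment_f` is illustrative, and the MSE is passed in as the rounded printout value (495) so the result matches the hand calculation.

```python
def treatment_f(ss_a, ss_b, ss_ab, a, b, mse):
    """Overall treatment F for an a x b factorial design.

    SST pools the main-effect and interaction sums of squares,
    and MST divides SST by its ab - 1 degrees of freedom.
    """
    sst = ss_a + ss_b + ss_ab
    df_treat = a * b - 1
    mst = sst / df_treat
    return sst, mst, mst / mse


# Values from the Y versus Display, Price printout.
sst, mst, f_stat = treatment_f(1_691_393, 3_089_054, 510_705, a=3, b=3, mse=495)
print(sst, mst, round(f_stat, 2))
```

Using the unrounded MSE (8905/18) instead of 495 would change the F statistic slightly but not the conclusion.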

c.

Since there are differences among the treatment means, we next test for the presence of interaction.

H0: Factors A and B do not interact to affect the response means
Ha: Factors A and B do interact to affect the response means

The test statistic is F = MSAB/MSE = 258.07 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .10), H0 is rejected. There is sufficient evidence to indicate the two factors interact at α = .10.

d.

The main effect tests are not warranted since interaction is present in part c.

e.

The nine treatment means need to be compared.

9.107

Using MINITAB, the ANOVA results are:

General Linear Model: Deviation versus Group, Trail

Factor   Type    Levels   Values
Group    fixed        4   F, G, M, N
Trail    fixed        2   C, E

Analysis of Variance for Deviatio, using Adjusted SS for Tests

Source         DF     Seq SS    Adj SS    Adj MS       F       P
Group           3    16271.2   13000.6    4333.5    5.91   0.001
Trail           1    46445.5   46445.5   46445.5   63.34   0.000
Group*Trail     3     2245.2    2245.2     748.4    1.02   0.386
Error         112    82131.7   82131.7     733.3
Total         119   147093.6

First, we must test for treatment effects.

SST = SS(Group) + SS(Trail) + SS(G×T) = 16,271.2 + 46,445.5 + 2,245.2 = 64,961.9

The df = 3 + 1 + 3 = 7.

MST = SST/(ab − 1) = 64,961.9/(4(2) − 1) = 9,280.2714

F = MST/MSE = 9,280.2714/733.3 = 12.66

To determine if there are differences in mean trail deviations among the 8 treatments, we test:

H0: All treatment means are the same
Ha: At least two treatment means differ

The test statistic is F = 12.66.

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = ab − 1 = 4(2) − 1 = 7 and ν2 = n − ab = 120 − 4(2) = 112. From Table VI, Appendix D, F.05 ≈ 2.09. The rejection region is F > 2.09. Since the observed value of the test statistic falls in the rejection region (F = 12.66 > 2.09), H0 is rejected. There is sufficient evidence that differences exist among the treatment means at α = .05.

Since differences exist, we now test for the interaction effect between Trail and Group.

To determine if Trail and Group interact, we test:

H0: Trail and Group do not interact
Ha: Trail and Group do interact

The test statistic is F = 1.02 and the p-value is p = .386. Since the p-value is greater than α (p = .386 > .05), H0 is not rejected. There is insufficient evidence that Trail and Group interact at α = .05.

Since there is no evidence of interaction, we test for the main effects of Trail and Group.

To determine if there are differences in the mean trail deviations between the two levels of Trail, we test:

H0: μ1 = μ2
Ha: μ1 ≠ μ2

The test statistic is F = 63.34 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ between the fecal extract trail and the control trail at α = .05.

To determine if there are differences in the mean trail deviations among the four levels of Group, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two means differ

The test statistic is F = 5.91 and the p-value is p = .001. Since the p-value is less than α (p = .001 < .05), H0 is rejected. There is sufficient evidence that the mean trail deviations differ among the four groups at α = .05.

9.108

Using MINITAB, the ANOVA table is:

ANOVA: Rating versus Prep, Standing

Factor     Type    Levels   Values
Prep       fixed        2   PRACTICE, REVIEW
Standing   fixed        3   HI, LOW, MED

Analysis of Variance for Rating

Source           DF        SS       MS       F       P
Prep              1    54.735   54.735   14.40   0.000
Standing          2    16.500    8.250    2.17   0.118
Prep*Standing     2    13.470    6.735    1.77   0.174
Error           126   478.955    3.801
Total           131   563.659

S = 1.94967   R-Sq = 15.03%   R-Sq(adj) = 11.66%

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable Rating
All Pairwise Comparisons among Levels of Prep
Prep = PRACTICE subtracted from:

Prep      Lower    Center     Upper
REVIEW   -1.960    -1.288   -0.6162

First, we must test for treatment effects.

SST = SSP + SSS + SSPS = 54.735 + 16.500 + 13.470 = 84.705

The df = 1 + 2 + 2 = 5.

MST = SST/(ab − 1) = 84.705/(2(3) − 1) = 16.941

F = MST/MSE = 16.941/3.801 = 4.46

To determine if there are differences in mean ratings among the 6 treatments, we test:

H0: All treatment means are the same
Ha: At least two treatment means differ

The test statistic is F = 4.46.

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = ab − 1 = 2(3) − 1 = 5 and ν2 = n − ab = 132 − 2(3) = 126. From Table VI, Appendix D, F.05 ≈ 2.29. The rejection region is F > 2.29. Since the observed value of the test statistic falls in the rejection region (F = 4.46 > 2.29), H0 is rejected. There is sufficient evidence that differences exist among the treatment means at α = .05.

Since differences exist, we now test for the interaction effect between Preparation and Class Standing. To determine if Preparation and Class Standing interact, we test:

H0: Preparation and Class Standing do not interact
Ha: Preparation and Class Standing do interact

The test statistic is F = 1.77 and the p-value is p = .174. Since the p-value is greater than α (p = .174 > .05), H0 is not rejected. There is insufficient evidence that Preparation and Class Standing interact at α = .05.

Since there is no evidence of interaction, we test for the main effects of Preparation and Class Standing.

To determine if there are differences in the mean rating among the three levels of Class Standing, we test:

H0: μL = μM = μH
Ha: At least two treatment means differ

The test statistic is F = 2.17 and the p-value is p = .118. Since the p-value is greater than α (p = .118 > .05), H0 is not rejected. There is insufficient evidence that the mean ratings differ among the 3 levels of Class Standing at α = .05.

To determine if there are differences in the mean rating between the two levels of Preparation, we test:

H0: μP = μR
Ha: μP ≠ μR

The test statistic is F = 14.40 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence that the mean ratings differ between the two levels of Preparation at α = .05. There are only 2 levels of Preparation. The mean rating for Practice is higher than the mean rating for Review.

9.109

a.

Using MINITAB, the results are:

Two-Sample T-Test and CI: PAH2, Site

Two-sample T for PAH2

Site          N     Mean    StDev   SE Mean
Development   7   1.0743   0.0565     0.021
IndustryA     8   0.9981   0.0464     0.016

Difference = μ (Development) - μ (IndustryA)
Estimate for difference: 0.0762
95% CI for difference: (0.0169, 0.1355)
T-Test of difference = 0 (vs ≠): T-Value = 2.83   P-Value = 0.016   DF = 11

To determine if the mean PAH2 ratio at the development site is different from the corresponding mean at Industry A, a two-sample t-test yields a test statistic of t = 2.83 and a p-value of p = .016. This indicates that the mean PAH2 ratio at the development site is significantly different from the corresponding mean at Industry A.

Two-Sample T-Test and CI: PAH2, Site

Two-sample T for PAH2

Site          N     Mean    StDev   SE Mean
Development   7   1.0743   0.0565     0.021
IndustryB     5   1.0980   0.0669     0.030

Difference = μ (Development) - μ (IndustryB)
Estimate for difference: -0.0237
95% CI for difference: (-0.1106, 0.0632)
T-Test of difference = 0 (vs ≠): T-Value = -0.65   P-Value = 0.539   DF = 7

To determine if the mean PAH2 ratio at the development site is different from the corresponding mean at Industry B, a two-sample t-test yields a test statistic of t = −.65 and a p-value of p = .539. This indicates that the mean PAH2 ratio at the development site is not significantly different from the corresponding mean at Industry B.

b.

Since so many t-tests were performed that are not independent of each other, the overall significance level is inflated. The probability of declaring two means different when in fact they are not different is much higher than the value of 𝛼 used for each individual test.
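To give a rough sense of how large the inflation can be, the sketch below counts the pairwise tests among the 4 sites and computes two reference values. Both function names are illustrative. The first formula assumes the tests are independent, which, as noted above, is not the case here, so it is only an illustration; the Bonferroni bound holds without that assumption.

```python
from math import comb


def familywise_error_if_independent(alpha, n_tests):
    """P(at least one Type I error) if all tests were independent (an assumption)."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_bound(alpha, n_tests):
    """Upper bound on the familywise error rate; needs no independence."""
    return min(1.0, alpha * n_tests)


n_sites = 4                 # Development, IndustryA, IndustryB, IndustryC
n_tests = comb(n_sites, 2)  # pairwise t-tests for one PAH ratio
alpha = 0.05

fwe = familywise_error_if_independent(alpha, n_tests)
bound = bonferroni_bound(alpha, n_tests)
print(n_tests, round(fwe, 3), round(bound, 2))
```

With 6 tests at α = .05, the chance of at least one false rejection can be several times the per-test level, and repeating the comparisons for both PAH ratios doubles the number of tests.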

c.

A more efficient analysis would be to run an analysis of variance, comparing all treatments at the same time. Using MINITAB for PAH1, the results are:

One-way ANOVA: PAH1 versus Site

Null hypothesis          All means are equal
Alternative hypothesis   At least one mean is different
Significance level       α = 0.05

Equal variances were assumed for the analysis.

Factor Information

Factor   Levels   Values
Site          4   Development, IndustryA, IndustryB, IndustryC

Analysis of Variance

Source   DF    Adj SS     Adj MS   F-Value   P-Value
Site      3   0.02041   0.006802      2.61     0.083
Error    18   0.04697   0.002610
Total    21   0.06738

Model Summary

        S     R-sq   R-sq(adj)   R-sq(pred)
0.0510835   30.29%      18.67%        0.00%

To determine if there is a difference in mean PAH1 among the 4 locations, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 2.61 and the p-value is p = .083. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate there is a difference in mean PAH1 among the 4 locations for any α < .083.

Using MINITAB for PAH2, the results are:

One-way ANOVA: PAH2 versus Site

Null hypothesis          All means are equal
Alternative hypothesis   At least one mean is different
Significance level       α = 0.05

Equal variances were assumed for the analysis.

Factor Information

Factor   Levels   Values
Site          4   Development, IndustryA, IndustryB, IndustryC

Analysis of Variance

Source   DF    Adj SS     Adj MS   F-Value   P-Value
Site      3   0.03977   0.013255      4.45     0.017
Error    18   0.05366   0.002981
Total    21   0.09343

Model Summary

        S     R-sq   R-sq(adj)   R-sq(pred)
0.0546000   42.56%      32.99%       14.59%

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

Site          N     Mean   Grouping
IndustryB     5   1.0980   A
IndustryC     2   1.0875   A B
Development   7   1.0743   A B
IndustryA     8   0.9981     B

Means that do not share a letter are significantly different.

To determine if there is a difference in mean PAH2 among the 4 locations, we test:

H0: μ1 = μ2 = μ3 = μ4
Ha: At least two treatment means differ

The test statistic is F = 4.45 and the p-value is p = .017. Since the p-value is small, H0 is rejected. There is sufficient evidence to indicate there is a difference in mean PAH2 among the 4 locations for any α > .017. The mean PAH2 for Industry B is significantly greater than the mean PAH2 for Industry A. No other significant differences exist.

Chapter 10 Categorical Data Analysis

10.1

a.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

b.

The rejection region requires α = .10 in the upper tail of the χ² distribution with df = k − 1 = 5 − 1 = 4. From Table IV, Appendix D, χ².10 = 7.77944. The rejection region is χ² > 7.77944.

c.

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, χ².01 = 11.3449. The rejection region is χ² > 11.3449.

10.2

The characteristics of the multinomial experiment are:

1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial.
3. The probabilities of the k outcomes, denoted p1, p2, …, pk, remain the same from trial to trial, where p1 + p2 + ⋯ + pk = 1.
4. The trials are independent.
5. The random variables of interest are the counts n1, n2, …, nk in each of the k cells.

The characteristics of the binomial are the same as those for the multinomial with k = 2.

10.3

The sample size n will be large enough so that, for every cell, the expected cell count, Ei, will be equal to 5 or more.
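This sample-size condition is easy to check before running a test. The Python sketch below computes the expected counts Ei = n·pi,0 and verifies that each is at least 5; the function names and the example inputs are illustrative.

```python
def expected_counts(n, probs):
    """Expected cell counts E_i = n * p_i under the hypothesized probabilities."""
    return [n * p for p in probs]


def large_enough(n, probs, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected_counts(n, probs))


# n = 320 trials with hypothesized probabilities .25, .25, .50
print(expected_counts(320, [0.25, 0.25, 0.50]))
print(large_enough(320, [0.25, 0.25, 0.50]))

# A small sample fails the check because two cells expect only 3 counts.
print(large_enough(12, [0.25, 0.25, 0.50]))
```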

10.4

The hypotheses of interest are:

H0: p1 = .25, p2 = .25, p3 = .50
Ha: At least one of the probabilities differs from the hypothesized value

E1 = np1,0 = 320(.25) = 80
E2 = np2,0 = 320(.25) = 80
E3 = np3,0 = 320(.50) = 160

The test statistic is

χ² = Σ [ni − Ei]²/Ei = (78 − 80)²/80 + (60 − 80)²/80 + (182 − 160)²/160 = 8.075

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ².05 = 5.99147. The rejection region is χ² > 5.99147.

Since the observed value of the test statistic falls in the rejection region (χ² = 8.075 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at α = .05.
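The arithmetic behind the test statistic can be reproduced in a few lines. The following Python sketch computes the goodness-of-fit statistic from observed counts and hypothesized probabilities; the function name `chi_square_statistic` is illustrative, and the critical value is copied from the table lookup above rather than computed.

```python
def chi_square_statistic(observed, probs):
    """Goodness-of-fit statistic: sum of (n_i - E_i)^2 / E_i over all cells."""
    n = sum(observed)
    expected = [n * p for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [78, 60, 182]       # cell counts, n = 320
probs = [0.25, 0.25, 0.50]     # hypothesized probabilities under H0
critical_value = 5.99147       # chi-square .05 critical value with df = 2, from the table

chi2 = chi_square_statistic(observed, probs)
print(round(chi2, 3), chi2 > critical_value)
```

The same function applies to the other one-way table exercises in this chapter once their observed counts and hypothesized probabilities are substituted.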

10.5

Some preliminary calculations are:

If the probabilities are the same, p1,0 = p2,0 = p3,0 = p4,0 = .25

E1 = np1,0 = 205(.25) = 51.25 = E2 = E3 = E4

a.

To determine if the multinomial probabilities differ, we test:

H0: p1 = p2 = p3 = p4 = .25
Ha: At least one of the probabilities differs from .25

The test statistic is

χ² = Σ [ni − Ei]²/Ei = (43 − 51.25)²/51.25 + (56 − 51.25)²/51.25 + (59 − 51.25)²/51.25 + (47 − 51.25)²/51.25 = 3.293

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 3.293 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate the multinomial probabilities differ at α = .05.

b.

The Type I error is concluding the multinomial probabilities differ when, in fact, they do not. The Type II error is concluding the multinomial probabilities are equal, when, in fact, they are not.

c.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, z.025 = 1.96.

p̂3 = 59/205 = .288

The confidence interval is:

p̂3 ± z.025 √(p̂3q̂3/n) ⇒ .288 ± 1.96 √(.288(.712)/205) ⇒ .288 ± .062 ⇒ (.226, .350)

10.6

a.

The expected numbers of adults in the subscriber categories are:

E(n1) = .30(800) = 240; E(n2) = .20(800) = 160; E(n3) = .40(800) = 320; E(n4) = .10(800) = 80

b.

To determine if the provider's claim is true, we test:

H0: p1 = .3, p2 = .2, p3 = .4, p4 = .1

c.

The test statistic is χ² = Σ [ni − Ei]²/Ei = 16.

d.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

e.

The p-value is p = P(χ² > 16). Using MINITAB, the results are:

Chi-Square with 3 DF

 x   P( X ≤ x )
16     0.998866

The p-value is p = P(χ² > 16) = 1 − .998866 = .001134.
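For 3 degrees of freedom the upper-tail chi-square probability has a closed form in terms of the complementary error function, so this p-value can be checked without statistical software. The Python sketch below uses only the standard library; the function name `chi2_sf_df3` is illustrative, and the formula is specific to df = 3.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_df3(x):
    """Upper-tail probability P(chi-square > x) for exactly 3 degrees of freedom.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_sf_df3(16)
print(round(p_value, 6))

# Sanity check against the tabled critical value: the tail area beyond
# 7.81473 should be close to .05.
print(round(chi2_sf_df3(7.81473), 4))
```

For other degrees of freedom a statistics package or a chi-square table is still needed.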

f.

Since the p-value is less than α (p = .001134 < .05), H0 is rejected. There is sufficient evidence to indicate the provider's claim is incorrect at α = .05.

10.7

a.

The data are categorical because they are measured using categories, not meaningful numbers. The possible categories are legs only, wheels only, both legs and wheels, and neither legs nor wheels.

b.

Let p1 = proportion of social robots with legs only, p2 = proportion of social robots with wheels only, p3 = proportion of social robots with both legs and wheels, and p4 = proportion of social robots with neither legs nor wheels. To determine if the design engineer’s claim is incorrect, we test:

H0: p1 = .50, p2 = .30, p3 = .10, and p4 = .10
Ha: At least one of the probabilities differs from the hypothesized value

c.

If the claim is true, E1 = np1,0 = 106(.50) = 53, E2 = np2,0 = 106(.30) = 31.8, E3 = np3,0 = 106(.10) = 10.6, and E4 = np4,0 = 106(.10) = 10.6.

d.

The test statistic is

χ² = Σ [ni − Ei]²/Ei = (63 − 53)²/53 + (20 − 31.8)²/31.8 + (8 − 10.6)²/10.6 + (15 − 10.6)²/10.6 = 8.730

e.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, χ².05 = 7.81473. The rejection region is χ² > 7.81473.

Since the observed value of the test statistic falls in the rejection region (χ² = 8.730 > 7.81473), H0 is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at α = .05.

10.8

a.

The variable measured was American Dream responses. The levels were: my family achieved it, it is within reach for me, somewhat optimistic I will reach it, somewhat pessimistic I will reach it, it is out of reach for me, and not sure.

b.

n1 = np̂1 = 1,083(.24) = 259.92 ≈ 260
n2 = np̂2 = 1,083(.24) = 259.92 ≈ 260
n3 = np̂3 = 1,083(.16) = 173.28 ≈ 173
n4 = np̂4 = 1,083(.07) = 75.81 ≈ 76
n5 = np̂5 = 1,083(.11) = 119.13 ≈ 119
n6 = np̂6 = 1,083(.18) = 194.94 ≈ 195

c.

If the true percentages in each category are equal, the expected counts would all be Ei = npi,0 = 1,083(1/6) = 180.5.


d.

To determine if the true percentages in each category are equal, we test:

H0: p1 = p2 = p3 = p4 = p5 = p6 = 1/6
Ha: At least one of the probabilities differs from the hypothesized value

e.

The test statistic is

χ² = Σ [ni − Ei]²/Ei = (260 − 180.5)²/180.5 + (260 − 180.5)²/180.5 + (173 − 180.5)²/180.5 + (76 − 180.5)²/180.5 + (119 − 180.5)²/180.5 + (195 − 180.5)²/180.5 = 152.96

f.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 6 − 1 = 5. From Table IV, Appendix D, χ².05 = 11.0705. The rejection region is χ² > 11.0705.

g.

Since the observed value of the test statistic falls in the rejection region (χ² = 152.96 > 11.0705), H0 is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at α = .05.

h.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, z.025 = 1.96.

p̂1 = .24

The confidence interval is:

p̂1 ± z.025 √(p̂1q̂1/n) ⇒ .24 ± 1.96 √(.24(.76)/1,083) ⇒ .24 ± .0254 ⇒ (.2146, .2654)

We are 95% confident that the proportion of all Americans who feel their family has achieved the American Dream is between .2146 and .2654.

10.9

a.

Let p1 = proportion using total visitors, p2 = proportion using paying visitors, p3 = proportion using big shows, p4 = proportion using funds raised, and p5 = proportion using members. To determine if one performance measure is used more often than any of the others, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = .20$
$H_a$: At least one of the probabilities differs from the hypothesized value

From the printout, the test statistic is $\chi^2 = 1.66667$ and the p-value is p = 0.797. Since the p-value is not less than α (p = .797 ≮ .10), H0 is not rejected. There is insufficient evidence to indicate that one performance measure is used more often than any of the others at α = .10.

b.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. $\hat{p}_1 = 8/30 = .267$

The confidence interval is:

$$\hat{p}_1 \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \Rightarrow .267 \pm 1.645\sqrt{\frac{.267(.733)}{30}} \Rightarrow .267 \pm .133 \Rightarrow (.134, .400)$$

We are 90% confident that the proportion of museums worldwide that use total visitors as their performance measure is between .134 and .400.
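The large-sample interval used in part b can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `proportion_ci` is hypothetical, and the z-value is taken from Python's `statistics.NormalDist` rather than from Table II.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Large-sample confidence interval for a single proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    # z-value that leaves alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Exercise 10.9b: 8 of 30 museums, confidence coefficient .90
lower, upper = proportion_ci(8, 30, 0.90)
print(f"({lower:.3f}, {upper:.3f})")
```

Because the sketch carries the unrounded value of 8/30 instead of .267, the endpoints may differ from the hand calculation in the third decimal place.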

10.10

a.

The qualitative variable is firm position on off-shoring. There are four levels: “currently off-shoring”, “not currently off-shoring, but plan to do so”, “off-shored in the past, but no more”, and “off-shoring is not applicable”.

b.

Let p1 = proportion of firms currently off-shoring, p2 = proportion of firms not currently off-shoring, but plan to do so, p3 = proportion of firms off-shored in the past, but no more, and p4 = proportion of firms where off-shoring is not applicable. Some preliminary calculations are:

$E_1 = E_2 = E_3 = E_4 = np_{i,0} = 600(.25) = 150$

To determine if the proportions of U.S. firms in the four off-shoring position categories are significantly different, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

χ = 2

[ n − E ] = (126 − 150) + ( 72 − 150) + ( 30 − 150) + ( 372 − 150) = 468.96 2

i

2

2

2

2

i

150

Ei

150

150

150

The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 468.96 > 7.81473$), H0 is rejected. There is sufficient evidence to indicate that the proportions of U.S. firms in the four off-shoring position categories are significantly different at α = .05.

c.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. $\hat{p}_1 = 126/600 = .21$

The confidence interval is:

$$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \Rightarrow .21 \pm 1.96\sqrt{\frac{.21(.79)}{600}} \Rightarrow .21 \pm .033 \Rightarrow (.177, .243)$$

We are 95% confident that the proportion of U.S. firms that are currently off-shoring is between .177 and .243.

10.11

Some preliminary calculations are: If the probabilities are the same, $p_{1,0} = p_{2,0} = p_{3,0} = 1/3$ and

$E_1 = E_2 = E_3 = np_{1,0} = 505(1/3) = 168.333$

To determine if in the population of posts to a travel destination website there are no differences in the percentages of posts classified into the three destination image categories, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from 1/3

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(338 - 168.333)^2}{168.333} + \frac{(112 - 168.333)^2}{168.333} + \frac{(55 - 168.333)^2}{168.333} = 266.17$$

The rejection region requires α = .10 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $\chi^2 > 4.60517$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 266.17 > 4.60517$), H0 is rejected. There is sufficient evidence to indicate there are differences in the percentages of posts classified into the three destination image categories at α = .10.

10.12

Some preliminary calculations are:

$E_1 = np_{1,0} = 1,000(.35) = 350$
$E_2 = np_{2,0} = 1,000(.45) = 450$
$E_3 = np_{3,0} = 1,000(.10) = 100$
$E_4 = np_{4,0} = 1,000(.10) = 100$

To determine if the distribution of background colors for all road signs maintained by NCDOT matches the color distribution of signs in the warehouse, we test:

$H_0: p_1 = .35, p_2 = .45, p_3 = .10, p_4 = .10$
$H_a$: At least one of the probabilities differs from the hypothesized values

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(373 - 350)^2}{350} + \frac{(447 - 450)^2}{450} + \frac{(88 - 100)^2}{100} + \frac{(92 - 100)^2}{100} = 3.61$$

The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 3.61 \not> 7.81473$), H0 is not rejected. There is insufficient evidence to indicate the distribution of background colors for all road signs maintained by NCDOT does not match the color distribution of signs in the warehouse at α = .05.

10.13

Let p1 = proportion of users using both hands/both thumbs, p2 = proportion of users using right hand/right thumb, p3 = proportion of users using left hand/left thumb, p4 = proportion of users using both hands/right index finger, p5 = proportion of users using left hand/right index finger, and p6 = proportion of users using other. Some preliminary calculations:

$E_1 = E_2 = E_3 = E_4 = E_5 = E_6 = np_{i,0} = 859(1/6) = 143.167$

To determine if the proportions of mobile device users in the six texting style categories differ, we test:

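$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(396 - 143.167)^2}{143.167} + \frac{(311 - 143.167)^2}{143.167} + \frac{(70 - 143.167)^2}{143.167} + \frac{(39 - 143.167)^2}{143.167} + \frac{(18 - 143.167)^2}{143.167} + \frac{(25 - 143.167)^2}{143.167} = 963.40$$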

The rejection region requires α = .10 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 6 − 1 = 5. From Table IV, Appendix D, $\chi^2_{.10} = 9.23635$. The rejection region is $\chi^2 > 9.23635$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 963.40 > 9.23635$), H0 is rejected. There is sufficient evidence to indicate that the proportions of mobile device users in the six texting style categories differ at α = .10.

10.14

The first step in the analysis is to convert the percentages found in the table to observed counts. Assuming that 5,000 males and 5,000 females were sampled, the observed counts are as follows:

           STEM    English   Foreign Languages/Arts/Humanities
Males      3450    1000      550
Females    2550    1600      850
All        6000    2600      1400

a.

Let p1 = proportion in STEM, p2 = proportion in English, and p3 = proportion in Foreign Languages/Arts/Humanities. To determine if the percentages in the three study area categories differ for all students, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from the hypothesized value

MINITAB was used to create the following printout:

Observed and Expected Counts

Category   Observed   Test Proportion   Expected   Contribution to Chi-Square
1          6000       0.333333          3333.33    2133.33
2          2600       0.333333          3333.33    161.33
3          1400       0.333333          3333.33    1121.33

Chi-Square Test

N       DF   Chi-Sq   P-Value
10000   2    3416     0.000

The test statistic is $\chi^2 = 3416$. The p-value is p = 0.000. Since the p-value is less than α (p = 0.000 < .10), H0 is rejected. There is sufficient evidence to indicate that the percentages in the three study area categories differ for all students at α = .10.

b.

Let p1 = proportion in STEM, p2 = proportion in English, and p3 = proportion in Foreign Languages/Arts/Humanities. To determine if the percentages in the three study area categories differ for all males, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from the hypothesized value

MINITAB was used to create the following printout:

Observed and Expected Counts

Category   Observed   Test Proportion   Expected   Contribution to Chi-Square
1          3450       0.333333          1666.67    1908.17
2          1000       0.333333          1666.67    266.67
3          550        0.333333          1666.67    748.17

Chi-Square Test

N      DF   Chi-Sq   P-Value
5000   2    2923     0.000

The test statistic is $\chi^2 = 2923$. The p-value is p = 0.000. Since the p-value is less than α (p = 0.000 < .10), H0 is rejected. There is sufficient evidence to indicate that the percentages in the three study area categories differ for all male students at α = .10.

c.

Let p1 = proportion in STEM, p2 = proportion in English, and p3 = proportion in Foreign Languages/Arts/Humanities. To determine if the percentages in the three study area categories differ for all females, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from the hypothesized value

MINITAB was used to create the following printout:

Observed and Expected Counts

Category   Observed   Test Proportion   Expected   Contribution to Chi-Square
1          2550       0.333333          1666.67    468.167
2          1600       0.333333          1666.67    2.667
3          850        0.333333          1666.67    400.167

Chi-Square Test

N      DF   Chi-Sq   P-Value
5000   2    871      0.000

The test statistic is $\chi^2 = 871$. The p-value is p = 0.000. Since the p-value is less than α (p = 0.000 < .10), H0 is rejected. There is sufficient evidence to indicate that the percentages in the three study area categories differ for all female students at α = .10.
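The MINITAB printouts above can be reproduced by hand. The sketch below is illustrative only and is not the MINITAB procedure itself; the function names are hypothetical. It computes the expected counts, the per-category contributions, and the test statistic for a one-way table. The p-value helper relies on the fact that, for exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution is exp(−x/2); it does not apply to other degrees of freedom.

```python
from math import exp


def chi_square_gof(observed, null_probs):
    """Return (statistic, expected counts, contributions) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), expected, contributions


def p_value_df2(chi_sq):
    """Upper-tail chi-square area, valid only when df = 2."""
    return exp(-chi_sq / 2)


# Observed counts from Exercise 10.14 (STEM, English, FL/A/H)
groups = {
    "All": [6000, 2600, 1400],
    "Males": [3450, 1000, 550],
    "Females": [2550, 1600, 850],
}

results = {}
for name, counts in groups.items():
    stat, expected, contrib = chi_square_gof(counts, [1 / 3] * 3)
    results[name] = stat
    print(name, round(stat), [round(c, 2) for c in contrib], p_value_df2(stat))
```

The same function accepts unequal hypothesized proportions, for example `chi_square_gof([373, 447, 88, 92], [.35, .45, .10, .10])` for the counts in Exercise 10.12.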

10.15

Let p1 = proportion of mail only users, p2 = proportion of Internet only users, and p3 = proportion of both mail and Internet. Some preliminary calculations: E1 = E2 = E3 = npi,0 = 440 (1 / 3 ) = 146.667

To determine if the professor’s beliefs are correct, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(262 - 146.667)^2}{146.667} + \frac{(43 - 146.667)^2}{146.667} + \frac{(135 - 146.667)^2}{146.667} = 164.895$$

The rejection region requires α = .01 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 164.895 > 9.21034$), H0 is rejected. There is sufficient evidence to indicate that the proportions of mail only, Internet only, and both mail and Internet users differ at α = .01.

10.16

Let p1 = proportion of anchor tenants, p2 = proportion of major space users, p3 = proportion of large standard tenants, p4 = proportion of small standard tenants, and p5 = proportion of small tenants. Some preliminary calculations:

$E_1 = np_{1,0} = 1,821(.01) = 18.21$
$E_2 = np_{2,0} = 1,821(.05) = 91.05$
$E_3 = np_{3,0} = 1,821(.10) = 182.1$
$E_4 = np_{4,0} = 1,821(.40) = 728.4$
$E_5 = np_{5,0} = 1,821(.44) = 801.24$

To determine if the mall developer’s belief is correct, we test:

$H_0: p_1 = .01, p_2 = .05, p_3 = .10, p_4 = .40, p_5 = .44$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(14 - 18.21)^2}{18.21} + \frac{(61 - 91.05)^2}{91.05} + \frac{(216 - 182.1)^2}{182.1} + \frac{(711 - 728.4)^2}{728.4} + \frac{(819 - 801.24)^2}{801.24} = 18.011$$

The rejection region requires α = .01 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 5 − 1 = 4. From Table IV, Appendix D, $\chi^2_{.01} = 13.2767$. The rejection region is $\chi^2 > 13.2767$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 18.011 > 13.2767$), H0 is rejected. There is sufficient evidence to indicate that the proportions of tenants in the five categories differ from the developer’s belief at α = .01.

10.17

To determine if the number of overweight trucks per week is distributed over the 7 days of the week in direct proportion to the volume of truck traffic, we test:

$H_0: p_1 = .191, p_2 = .198, p_3 = .187, p_4 = .180, p_5 = .155, p_6 = .043, p_7 = .046$
$H_a$: At least one of the probabilities differs from the hypothesized value

Some preliminary calculations are:

$E_1 = np_{1,0} = 414(.191) = 79.074$
$E_2 = np_{2,0} = 414(.198) = 81.972$
$E_3 = np_{3,0} = 414(.187) = 77.418$
$E_4 = np_{4,0} = 414(.180) = 74.520$
$E_5 = np_{5,0} = 414(.155) = 64.170$
$E_6 = np_{6,0} = 414(.043) = 17.802$
$E_7 = np_{7,0} = 414(.046) = 19.044$

The test statistic is

χ = 2

[ n − E ] = ( 90 − 79.074) + (82 − 81.972) + ( 72 − 77.418) + ( 70 − 74.520) 2

i

2

2

2

2

i

Ei

79.074

81.972

77.418

+

74.520

( 51 − 64.170) + (18 − 17.802) + ( 31 − 19.044) 2 = 12.374 2

64.170

2

17.802

19.044

The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 7 − 1 = 6. From Table IV, Appendix D, $\chi^2_{.05} = 12.5916$. The rejection region is $\chi^2 > 12.5916$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 12.374 \not> 12.5916$), H0 is not rejected. There is insufficient evidence to indicate that the number of overweight trucks per week is not distributed over the 7 days of the week in direct proportion to the volume of truck traffic at α = .05.

10.18

Some preliminary calculations are:

$E_1 = np_{1,0} = 435(.28) = 121.8$
$E_2 = np_{2,0} = 435(.04) = 17.4$
$E_3 = np_{3,0} = 435(.02) = 8.7$
$E_4 = np_{4,0} = 435(.66) = 287.1$

To determine if the House of Representatives is not statistically representative of the religious affiliations of their constituents, we test:

$H_0: p_1 = .28, p_2 = .04, p_3 = .02, p_4 = .66$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(117 - 121.8)^2}{121.8} + \frac{(61 - 17.4)^2}{17.4} + \frac{(30 - 8.7)^2}{8.7} + \frac{(227 - 287.1)^2}{287.1} = 174.169$$

Since no value of α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the test statistic falls in the rejection region ($\chi^2 = 174.169 > 7.81473$), H0 is rejected. There is sufficient evidence to indicate the House of Representatives is not statistically representative of the religious affiliations of their constituents at α = .05.

10.19

a.

df = (r − 1)(c − 1) = (5 − 1)(5 − 1) = 16. From Table IV, Appendix D, $\chi^2_{.05} = 26.2962$. The rejection region is $\chi^2 > 26.2962$.

b.

df = (r − 1)(c − 1) = (3 − 1)(6 − 1) = 10. From Table IV, Appendix D, $\chi^2_{.10} = 15.9871$. The rejection region is $\chi^2 > 15.9871$.

c.

df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

10.20

a.

H0: The row and column classifications are independent
Ha: The row and column classifications are dependent

b.

The test statistic is $\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}}$.

The rejection region requires α = .01 in the upper tail of the $\chi^2$ distribution with df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

c.

The expected cell counts are:

$\hat{E}_{11} = \frac{R_1C_1}{n} = \frac{96(25)}{167} = 14.37$
$\hat{E}_{12} = \frac{R_1C_2}{n} = \frac{96(64)}{167} = 36.79$
$\hat{E}_{13} = \frac{R_1C_3}{n} = \frac{96(78)}{167} = 44.84$
$\hat{E}_{21} = \frac{R_2C_1}{n} = \frac{71(25)}{167} = 10.63$
$\hat{E}_{22} = \frac{R_2C_2}{n} = \frac{71(64)}{167} = 27.21$
$\hat{E}_{23} = \frac{R_2C_3}{n} = \frac{71(78)}{167} = 33.16$

d.

The test statistic is

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(9 - 14.37)^2}{14.37} + \frac{(34 - 36.79)^2}{36.79} + \frac{(53 - 44.84)^2}{44.84} + \frac{(16 - 10.63)^2}{10.63} + \frac{(30 - 27.21)^2}{27.21} + \frac{(25 - 33.16)^2}{33.16} = 8.71$$

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 8.71 \not> 9.21034$), H0 is not rejected. There is insufficient evidence to indicate the row and column classifications are dependent at α = .01.
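The calculation in parts c and d follows the same pattern for any two-way table, so it can be sketched in a few lines. This is a minimal illustration, assuming the observed counts are supplied as a list of rows; the function name `chi_square_independence` is hypothetical, and the critical value is simply copied from the Table IV figure quoted above.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected counts) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for cell (i, j) is R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Observed counts from Exercise 10.20
observed = [[9, 34, 53], [16, 30, 25]]
stat, df, expected = chi_square_independence(observed)

critical_value = 9.21034  # chi-square .01 with df = 2, from Table IV
print(round(stat, 2), df, "reject H0" if stat > critical_value else "do not reject H0")
```

The sketch keeps the expected counts unrounded, so the statistic may differ slightly in later decimals from a hand calculation that rounds each expected count first.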

10.21

a.

To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100. The column totals are 25, 64, and 78, while the row totals are 96 and 71. The overall sample size is 167. The table of percentages is:

         Column 1              Column 2                Column 3                Totals
Row 1    9/25 × 100 = 36%      34/64 × 100 = 53.1%     53/78 × 100 = 67.9%     96/167 × 100 = 57.5%
Row 2    16/25 × 100 = 64%     30/64 × 100 = 46.9%     25/78 × 100 = 32.1%     71/167 × 100 = 42.5%

b.

Using MINITAB, the graph is:

[MINITAB bar chart: Percent (vertical axis, 0 to 70) by Column (1, 2, 3), with the value 57.5 marked]

c.

If the rows and columns are independent, the row percentages in each column would be close to the row total percentages. This pattern is not evident in the plot, implying the rows and columns are not independent. In Exercise 10.20, we did not have enough evidence to say the rows and columns were not independent. If the sample sizes were bigger, we might have been able to reject H0.

10.22

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1C_1}{n} = \frac{154(134)}{439} = 47.007$

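$\hat{E}_{12} = \frac{154(163)}{439} = 57.180$
$\hat{E}_{13} = \frac{154(142)}{439} = 49.813$
$\hat{E}_{21} = \frac{186(134)}{439} = 56.774$
$\hat{E}_{22} = \frac{186(163)}{439} = 69.062$
$\hat{E}_{23} = \frac{186(142)}{439} = 60.164$
$\hat{E}_{31} = \frac{99(134)}{439} = 30.219$
$\hat{E}_{32} = \frac{99(163)}{439} = 36.759$
$\hat{E}_{33} = \frac{99(142)}{439} = 32.023$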

To determine if the row and column classifications are dependent, we test:

H0: The row and column classifications are independent
Ha: The row and column classifications are dependent

The test statistic is

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(40 - 47.007)^2}{47.007} + \frac{(72 - 57.180)^2}{57.180} + \frac{(42 - 49.813)^2}{49.813} + \frac{(63 - 56.774)^2}{56.774} + \frac{(53 - 69.062)^2}{69.062} + \frac{(70 - 60.164)^2}{60.164} + \frac{(31 - 30.219)^2}{30.219} + \frac{(38 - 36.759)^2}{36.759} + \frac{(30 - 32.023)^2}{32.023} = 12.36$$

The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.36 > 9.48773$), H0 is rejected. There is sufficient evidence to indicate the row and column classifications are dependent at α = .05.

10.23

a-b. To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100.

               B1                        B2                        B3                        Totals
Row   A1       40/134 × 100 = 29.9%      72/163 × 100 = 44.2%      42/142 × 100 = 29.6%      154/439 × 100 = 35.1%
      A2       63/134 × 100 = 47.0%      53/163 × 100 = 32.5%      70/142 × 100 = 49.3%      186/439 × 100 = 42.4%
      A3       31/134 × 100 = 23.1%      38/163 × 100 = 23.3%      30/142 × 100 = 21.1%      99/439 × 100 = 22.6%

c.

Using MINITAB, the graph of A1 is:

[MINITAB bar chart: Percent (vertical axis, 0 to 50) by B (1, 2, 3), with the value 35.1 marked]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.

d.

Using MINITAB, the graph of A2 is:

[MINITAB bar chart: Percent (vertical axis, 0 to 50) by B (1, 2, 3), with the value 42.4 marked]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.

e.

Using MINITAB, the graph of A3 is:

[MINITAB bar chart: Percent (vertical axis, 0 to 25) by B (1, 2, 3), with the value 22.6 marked]

The graph does not support the conclusion that the rows and columns are not independent. All the bars are essentially the same height.

10.24

a.

The two qualitative variables are Course Type (Elective, Required in a Group, or Core Required) and Digital Media Requirement (yes or no).

b.

To determine whether the incorporation of digital media in a course depends on course type, we test:

H0: Digital Media Requirement and Course Type are independent
Ha: Digital Media Requirement and Course Type are dependent

c.

MINITAB was used to find the expected cell counts:

Rows: Digital Requirement   Columns: Course Type

               Core      Elective   Group     All
Digital        461       321        346       1128
               511.8     293.1      323.1
Non-digital    1708      921        1023      3652
               1657.2    948.9      1045.9
All            2169      1242       1369      4780

The expected cell counts are shown in the table below the actual counts.

d.

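The test statistic is

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(461 - 511.8)^2}{511.8} + \frac{(321 - 293.1)^2}{293.1} + \frac{(346 - 323.1)^2}{323.1} + \frac{(1708 - 1657.2)^2}{1657.2} + \frac{(921 - 948.9)^2}{948.9} + \frac{(1023 - 1045.9)^2}{1045.9} = 12.222$$

This agrees with the test statistic found on the XLSTAT printout.

e.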

The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$. This is the same critical value found on the XLSTAT printout.

f.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.222 > 5.99147$), H0 is rejected. There is sufficient evidence to indicate that the digital media requirement depends on the course type at α = .05. The p-value approach gives the same conclusion: since the p-value is less than α (p = .0022 < .05), H0 is rejected.

g.

Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = .2544$
$\hat{p}_2 = \frac{x_2}{n_2} = .2125$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.2544 - .2125) \pm 1.96\sqrt{\frac{.2544(.7456)}{n_1} + \frac{.2125(.7875)}{n_2}} \Rightarrow .0418 \pm .0296 \Rightarrow (.0122, .0714)$$

Since the interval contains only positive numbers, there is evidence that the proportion of elective courses that require digital media exceeds the proportion of core courses that require digital media.

10.25

a.

Yes, it appears that the number of products purchased in the three categories differ in their percentages based on the type of ambient scent that was used. However, the percentages alone are insufficient to draw a conclusion because the sample sizes must be taken into account.

b.

To determine if a consumer’s choice of healthy, unhealthy, or neutral foods depends on the indulgent scent, we test:

H0: Consumer Choice and Indulgent Scent are independent
Ha: Consumer Choice and Indulgent Scent are dependent

c.

From the printout, the test statistic is $\chi^2 = 23.804$ and the p-value is p < .0001.

d.

Since the p-value is less than α (p < .0001 < .01), H0 is rejected. There is sufficient evidence to indicate that a consumer’s choice of healthy, unhealthy, or neutral foods depends on the indulgent scent at α = .01.

10.26

a.

To determine if the type of comment posted to TripAdvisor depends on gender, we test:

H0: Type of comment and gender are independent
Ha: Type of comment and gender are dependent

b.

From the printout, the p-value is p = .073. Since the p-value is less than α (p = .073 < .10), H0 is rejected. There is sufficient evidence to indicate that the type of comment posted to TripAdvisor depends on gender at α = .10.

c.

From the printout, two of the expected cell counts are less than 5. If any of the expected cell counts are less than 5, the test statistic may not have a chi-square distribution. This would make the test invalid.

10.27

a.

To compare the two proportions, we could use either a test of hypothesis or a confidence interval. I will use a 95% confidence interval. Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{30}{77} = .3896$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{79}{179} = .4413$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.3896 - .4413) \pm 1.96\sqrt{\frac{.3896(.6104)}{77} + \frac{.4413(.5587)}{179}} \Rightarrow -.0517 \pm .1310 \Rightarrow (-.1827, .0793)$$

We are 95% confident that the difference in the proportions of male and female professionals who believe their salaries are too low is between −.1827 and .0793. Since 0 is in this interval, there is not enough evidence to indicate that the two proportions are different.

b.

Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{37}{77} = .4805$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{77}{179} = .4302$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.4805 - .4302) \pm 1.96\sqrt{\frac{.4805(.5195)}{77} + \frac{.4302(.5698)}{179}} \Rightarrow .0503 \pm .1331 \Rightarrow (-.0827, .1834)$$

We are 95% confident that the difference in the proportions of male and female professionals who believe their salaries are equitable/fair is between −.0827 and .1834. Since 0 is in this interval, there is not enough evidence to indicate that the two proportions are different.

c.

Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{10}{77} = .1299$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{23}{179} = .1285$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.1299 - .1285) \pm 1.96\sqrt{\frac{.1299(.8701)}{77} + \frac{.1285(.8715)}{179}} \Rightarrow .0014 \pm .0897 \Rightarrow (-.0883, .0911)$$

We are 95% confident that the difference in the proportions of male and female professionals who believe they are well paid is between −.0883 and .0911. Since 0 is in this interval, there is not enough evidence to indicate that the two proportions are different.

d.

Because none of the three intervals indicates a difference, we do not believe the opinion on the fairness of a travel professional’s salary differs for males and females.
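The intervals in parts a through c all use the same large-sample formula for a difference between two proportions. The sketch below is illustrative only; the function name `two_proportion_ci` is hypothetical, and the inputs are the rounded proportions and sample sizes quoted in part c.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 from two independent samples."""
    # z-value that leaves (1 - confidence)/2 in the upper tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * standard_error, diff + z * standard_error


# Part c: proportions who believe they are well paid
lower, upper = two_proportion_ci(0.1299, 77, 0.1285, 179)
contains_zero = lower < 0 < upper
print(f"({lower:.4f}, {upper:.4f})", "contains 0" if contains_zero else "excludes 0")
```

Passing `confidence=0.90` gives the narrower kind of interval used when the confidence coefficient is .90.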

e.

To determine if the opinion on the fairness of a travel professional’s salary differs for males and females, we test:

H0: Opinion and Gender are independent
Ha: Opinion and Gender are dependent

MINITAB was used to create the following printout:

Chi-Square Test

                   Chi-Square   DF   P-Value
Pearson            0.646        2    0.724
Likelihood Ratio   0.647        2    0.724

The test statistic is $\chi^2 = .646$ and the p-value is p = .724. Since the p-value is not less than α (p = .724 ≮ .10), H0 is not rejected. There is insufficient evidence to indicate that the opinions on the fairness of a travel professional’s salary differ for males and females at α = .10.

f.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.3896 - .4413) \pm 1.645\sqrt{\frac{.3896(.6104)}{77} + \frac{.4413(.5587)}{179}} \Rightarrow -.0517 \pm .1099 \Rightarrow (-.1616, .0582)$$

We are 90% confident that the difference in the proportions of male and female professionals who believe their salaries are too low is between −.1616 and .0582. Since 0 is in this interval, there is not enough evidence to indicate that the two proportions are different.

10.28

a.

Let p3 = proportion of the 3-photos per page group who selected the target mugshot, p6 = proportion of the 6-photos per page group who selected the target mugshot, and p12 = proportion of the 12-photos per page group who selected the target mugshot.

$\hat{p}_3 = \frac{19}{32} = .594$, $\hat{p}_6 = \frac{19}{32} = .594$, $\hat{p}_{12} = \frac{15}{32} = .469$

The 12-photos per page group had the lowest proportion.

b.

The contingency table is:

                      Target Mugshot selected   Target Mugshot not selected   Total
3-photos per page     19                        13                            32
6-photos per page     19                        13                            32
12-photos per page    15                        17                            32
Total                 53                        43                            96

c.

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1C_1}{n} = \frac{32(53)}{96} = 17.667$
$\hat{E}_{12} = \frac{R_1C_2}{n} = \frac{32(43)}{96} = 14.333$
$\hat{E}_{21} = \frac{R_2C_1}{n} = \frac{32(53)}{96} = 17.667$
$\hat{E}_{22} = \frac{R_2C_2}{n} = \frac{32(43)}{96} = 14.333$
$\hat{E}_{31} = \frac{R_3C_1}{n} = \frac{32(53)}{96} = 17.667$
$\hat{E}_{32} = \frac{R_3C_2}{n} = \frac{32(43)}{96} = 14.333$

To determine if there are differences in the proportions who selected the target mugshot among the three photo groups, we test:

H0: Photo group and Mugshot selection are independent
Ha: Photo group and Mugshot selection are dependent

The test statistic is:

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(19 - 17.667)^2}{17.667} + \frac{(13 - 14.333)^2}{14.333} + \frac{(19 - 17.667)^2}{17.667} + \frac{(13 - 14.333)^2}{14.333} + \frac{(15 - 17.667)^2}{17.667} + \frac{(17 - 14.333)^2}{14.333} = 1.348$$

The rejection region requires α = .10 in the upper tail of the $\chi^2$ distribution with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $\chi^2 > 4.60517$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 1.348 \not> 4.60517$), H0 is not rejected. There is insufficient evidence to indicate that there are differences in the proportions who selected the target mugshot among the three photo groups at α = .10.

10.29

a.

To determine if the package design and sound pitch combination influences the consumer’s opinion on product taste, we test:

H0: Package design/sound pitch and taste are independent
Ha: Package design/sound pitch and taste are dependent

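b.

The expected cell counts are:

$\hat{E}_{11} = \frac{R_1C_1}{n} = \frac{42(40)}{80} = 21$
$\hat{E}_{12} = \frac{R_1C_2}{n} = \frac{42(40)}{80} = 21$
$\hat{E}_{21} = \frac{R_2C_1}{n} = \frac{38(40)}{80} = 19$
$\hat{E}_{22} = \frac{R_2C_2}{n} = \frac{38(40)}{80} = 19$

c.

The test statistic is

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(35 - 21)^2}{21} + \frac{(7 - 21)^2}{21} + \frac{(5 - 19)^2}{19} + \frac{(33 - 19)^2}{19} = 39.30$$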

d.

Since the p-value is so small (p ≈ 0), H0 is rejected. There is sufficient evidence to indicate that the package design and sound pitch combination influences the consumer’s opinion on product taste for any reasonable value of α.
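For a 2 × 2 table the statistic in part c can be cross-checked with the algebraically equivalent shortcut $\chi^2 = n(ad - bc)^2 / (R_1R_2C_1C_2)$, where a, b, c, d are the four cell counts. The shortcut and the p-value helper below are standard identities rather than part of the original solution, and the function names are hypothetical. The p-value helper uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it is valid only when df = 1.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    r1, r2 = a + b, c + d  # row totals
    c1, c2 = a + c, b + d  # column totals
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def p_value_df1(chi_sq):
    """Upper-tail chi-square area, valid only when df = 1."""
    return erfc(sqrt(chi_sq / 2))


# Observed counts from Exercise 10.29
stat = chi_square_2x2(35, 7, 5, 33)
p = p_value_df1(stat)
print(round(stat, 2), p)
```

The tiny p-value this produces is consistent with the conclusion in part d that H0 is rejected for any reasonable value of α.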

10.30

To determine if the state tax rate affects whether or not a smoker quits, we test:

H0: State tax rate and Smoker Type are independent
Ha: State tax rate and Smoker Type are dependent

MINITAB was used to conduct the analysis and the following printouts were created:

Rows: Smoker   Columns: Tax Rate

           High      Low      All
Current    226       17       243
           228.92    14.08
Former     457       25       482
           454.08    27.92
All        683       42       725

Chi-Square Test

                   Chi-Square   DF   P-Value
Pearson            0.969        1    0.325
Likelihood Ratio   0.942        1    0.332

The test statistic is $\chi^2 = .969$ and the p-value is p = .325. Since the p-value is not less than α (p = .325 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate that the state tax rate affects whether or not a smoker quits at α = .01.

10.31

To determine if the distribution of percentages in the three study area categories differs for males and females, we test:

H0: Area of Study and Gender are independent
Ha: Area of Study and Gender are dependent

MINITAB was used to conduct the analysis and the following printouts were created:

Rows: Study   Columns: Gender

           Female   Male    All
English    1600     1000    2600
           1300     1300
FL/A/H     850      550     1400
           700      700
STEM       2550     3450    6000
           3000     3000
All        5000     5000    10000

Chi-Square Test

                   Chi-Square   DF   P-Value
Pearson            337.747      2    0.000
Likelihood Ratio   340.015      2    0.000

The test statistic is $\chi^2 = 337.747$ and the p-value is p = .000. Since the p-value is less than α (p = .000 < .10), H0 is rejected. There is sufficient evidence to indicate the distribution of percentages in the three study area categories differs for males and females at α = .10.

10.32

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1C_1}{n} = \frac{397(388)}{447} = 344.600$
$\hat{E}_{12} = \frac{R_1C_2}{n} = \frac{397(59)}{447} = 52.400$
$\hat{E}_{21} = \frac{R_2C_1}{n} = \frac{50(388)}{447} = 43.400$
$\hat{E}_{22} = \frac{R_2C_2}{n} = \frac{50(59)}{447} = 6.600$

To determine if an NAWIC member’s satisfaction with life as an employee and their satisfaction with job challenge are related, we test:

H0: Satisfaction with life as an employee and satisfaction with job challenge are independent
Ha: Satisfaction with life as an employee and satisfaction with job challenge are dependent

The test statistic is

$$\chi^2 = \sum\sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(364 - 344.6)^2}{344.6} + \frac{(33 - 52.4)^2}{52.4} + \frac{(24 - 43.4)^2}{43.4} + \frac{(26 - 6.6)^2}{6.6} = 73.98$$

Since no significance level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the $\chi^2$ distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 73.98 > 3.84146$), H0 is rejected. There is sufficient evidence to indicate an NAWIC member’s satisfaction with life as an employee and their satisfaction with job challenge are related at α = .05.
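Solution 10.26c notes that the chi-square approximation may fail when an expected cell count is less than 5. A quick check of that condition for the table in this exercise can be sketched as follows; the function names and the default threshold argument are illustrative assumptions, not part of the original solution.

```python
def expected_counts(table):
    """Expected cell counts R_i * C_j / n for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(table, threshold=5.0):
    """List (row, column, expected count) for cells whose expected count is below threshold."""
    return [
        (i + 1, j + 1, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < threshold
    ]


# Observed counts from Exercise 10.32
observed = [[364, 33], [24, 26]]
expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
flagged = cells_below(observed)
print(round(smallest, 3), flagged)
```

An empty list from `cells_below` means every expected count meets the threshold, so the condition described in 10.26c is not a concern for this table.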

10.33

Using MINITAB, the contingency table analysis is:

Tabulated statistics: Position, Nationality
Using frequencies in Fr

Rows: Position   Columns: Nationality

          1      2      3      4     All
1       126     75     35     93     329
2        72     36     10     27     145
3        30      9      4      6      49
4       372    180     51    174     777
All     600    300    100    300    1300

Cell Contents: Count

Pearson Chi-Square = 21.242, DF = 9, P-Value = 0.012
Likelihood Ratio Chi-Square = 21.327, DF = 9, P-Value = 0.011

To determine if a firm’s position on offshoring depends on the firm’s nationality, we test:

H0: Position and Nationality are independent
Ha: Position and Nationality are dependent

From the printout, the test statistic is $\chi^2 = 21.242$ and the p-value is $p = .012$. Since the p-value is less than $\alpha$ ($p = .012 < .05$), H0 is rejected. There is sufficient evidence to indicate a firm’s position on offshoring depends on the firm’s nationality at $\alpha = .05$.
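The p-value reported on the printout can be cross-checked without statistical software. The sketch below is an illustration only (the function name `chi_square_upper_tail` is hypothetical); it uses the standard closed-form series for the upper-tail area of a chi-square distribution with integer degrees of freedom, and only the Python standard library.

```python
import math


def chi_square_upper_tail(x, df):
    """Return P(chi-square with integer df >= x) using closed-form series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series
    tail = math.erfc(math.sqrt(x / 2))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)  # builds x^k / (1 * 3 * ... * (2k + 1))
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


# Test statistic and degrees of freedom from the printout
p_value = chi_square_upper_tail(21.242, 9)
print(round(p_value, 3), p_value < 0.05)
```

The same function can be reused for any of the chi-square statistics in this chapter by passing the statistic and its degrees of freedom.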

10.34

a. The 2x2 contingency table is shown here:

                        Outcome
Decision     Successful   Unsuccessful    Total
1-Point           3,453            224    3,677
2-Point             141            159      300
Total             3,594            383

b. To determine if the outcome of the extra point try depends on whether a team goes for 1 or 2 points, we test:

H0: Decision and Outcome are independent
Ha: Decision and Outcome are dependent

MINITAB was used to conduct the Fisher’s Exact test:

Fisher’s Exact Test
P-Value
0.0000000

Since the p-value is less than $\alpha$ ($p = .000 < .05$), H0 is rejected. There is sufficient evidence to indicate the outcome of the extra point try depends on whether a team goes for 1 or 2 points at $\alpha = .05$.

10.35

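Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{396(335)}{859} = 154.435$    $\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{396(524)}{859} = 241.565$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{311(335)}{859} = 121.286$    $\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{311(524)}{859} = 189.714$

$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{70(335)}{859} = 27.299$    $\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{70(524)}{859} = 42.701$

$\hat{E}_{41} = \frac{R_4 C_1}{n} = \frac{39(335)}{859} = 15.210$    $\hat{E}_{42} = \frac{R_4 C_2}{n} = \frac{39(524)}{859} = 23.790$

$\hat{E}_{51} = \frac{R_5 C_1}{n} = \frac{18(335)}{859} = 7.020$    $\hat{E}_{52} = \frac{R_5 C_2}{n} = \frac{18(524)}{859} = 10.980$

$\hat{E}_{61} = \frac{R_6 C_1}{n} = \frac{25(335)}{859} = 9.750$    $\hat{E}_{62} = \frac{R_6 C_2}{n} = \frac{25(524)}{859} = 15.250$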

To determine if the proportions of mobile device users in the six texting style categories depend on whether a male or a female is texting, we test:

H0: Texting style and sex are independent
Ha: Texting style and sex are dependent

The test statistic is:

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(161 - 154.435)^2}{154.435} + \frac{(235 - 241.565)^2}{241.565} + \cdots + \frac{(14 - 15.250)^2}{15.250} = 4.209$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (6 - 1)(2 - 1) = 5$. From Table IV, Appendix D, $\chi^2_{.10} = 9.23635$. The rejection region is $\chi^2 > 9.23635$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 4.209 \not> 9.23635$), H0 is not rejected. There is insufficient evidence to indicate the proportions of mobile device users in the six texting style categories depend on whether a male or a female is texting at $\alpha = .10$.
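The rejection-region step follows the same pattern in every test of independence in this chapter: compute the degrees of freedom, look up the critical value, and compare. A minimal Python sketch of that pattern is given below; the lookup table and the function name `reject_h0` are illustrative assumptions, and the table holds only critical values that are quoted from Table IV in these solutions.

```python
# Upper-tail chi-square critical values, keyed by (alpha, df).
# Only values quoted from Table IV in this chapter's solutions are listed.
CRITICAL_VALUES = {
    (0.10, 2): 4.60517,
    (0.10, 3): 6.25139,
    (0.10, 4): 7.77944,
    (0.10, 5): 9.23635,
    (0.05, 1): 3.84146,
    (0.05, 2): 5.99147,
    (0.05, 3): 7.81473,
    (0.05, 4): 9.48773,
    (0.05, 5): 11.0705,
    (0.01, 1): 6.63490,
    (0.01, 2): 9.21034,
}


def reject_h0(chi2, rows, cols, alpha):
    """Return (df, critical value, decision) for a chi-square test of independence."""
    df = (rows - 1) * (cols - 1)
    critical = CRITICAL_VALUES[(alpha, df)]
    return df, critical, chi2 > critical


# Six texting styles by two sexes, tested at alpha = .10
df, critical, reject = reject_h0(4.209, rows=6, cols=2, alpha=0.10)
print(df, critical, reject)
```

A combination of significance level and degrees of freedom that is not in the dictionary raises a `KeyError`, which is deliberate: the sketch does not compute critical values, it only looks them up.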


10.36


Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{547(434)}{586} = 405.116$

$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{547(152)}{586} = 141.884$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{39(434)}{586} = 28.884$

$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{39(152)}{586} = 10.116$

To determine if the proportion of firefighters who wear a poorly fitting glove differs for males and females, we test:

H0: Gender and glove fitting are independent
Ha: Gender and glove fitting are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(415 - 405.116)^2}{405.116} + \frac{(132 - 141.884)^2}{141.884} + \frac{(19 - 28.884)^2}{28.884} + \frac{(20 - 10.116)^2}{10.116} = 13.97$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.01} = 6.63490$. The rejection region is $\chi^2 > 6.63490$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 13.97 > 6.63490$), H0 is rejected. There is sufficient evidence to indicate the proportion of firefighters who wear a poorly fitting glove differs for males and females at $\alpha = .01$.

10.37

Using MINITAB, the results are:

Rows: Instruction   Columns: Strategy

           Guess   Other   TTBC   All
Cue            5       6     13    24
Pattern        9      11      4    24
All           14      17     17    48

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  7.378    2     0.025
Likelihood Ratio         7.668    2     0.022

To determine if the choice of heuristic strategy depends on type of instruction, we test:

H0: Heuristic strategy and type of instruction are independent
Ha: Heuristic strategy and type of instruction are dependent

From the printout, the test statistic is $\chi^2 = 7.378$ and the p-value is $p = .025$. Since the p-value is less than $\alpha$ ($p = .025 < .05$), H0 is rejected. There is sufficient evidence to indicate the choice of heuristic strategy depends on type of instruction at $\alpha = .05$.

Since the p-value is not less than $\alpha$ ($p = .025 \not< .01$), H0 is not rejected. There is insufficient evidence to indicate the choice of heuristic strategy depends on type of instruction at $\alpha = .01$.

10.38

Using MINITAB, the results of the table comparing type of coupon user and gender are:

Rows: USER   Columns: GENDER

        Female   Male   All
both       104     31   135
mail       178     84   262
net         36      7    43
All        318    122   440

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  6.797    2     0.033
Likelihood Ratio         7.105    2     0.029

To determine if type of coupon user depends on gender, we test:

H0: Type of coupon user and gender are independent
Ha: Type of coupon user and gender are dependent

The test statistic is $\chi^2 = 6.797$ and the p-value is $p = .033$. Since the p-value is not less than $\alpha$ ($p = .033 \not< .01$), H0 is not rejected. There is insufficient evidence to indicate type of coupon user depends on gender at $\alpha = .01$.

Using MINITAB, the results of the table comparing type of coupon user and coupon usage satisfaction level are:

Rows: USER   Columns: SATISF

        No   Some   Yes   All
both     3      9   123   135
mail    28     62   172   262
net      4      9    30    43
All     35     80   325   440

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                 30.418    4     0.000
Likelihood Ratio        34.934    4     0.000

To determine if type of coupon user depends on coupon usage satisfaction level, we test:

H0: Type of coupon user and coupon usage satisfaction level are independent
Ha: Type of coupon user and coupon usage satisfaction level are dependent

The test statistic is $\chi^2 = 30.418$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .01$), H0 is rejected. There is sufficient evidence to indicate type of coupon user depends on coupon usage satisfaction level at $\alpha = .01$.

10.39

a. Using MINITAB, the results for the First Trial are:

Rows: CONDITION   Columns: SWITCH

             No   Yes   All
Empty        17    10    27
Steroids     22     5    27
Steroids2    19     8    27
Vanish       24     3    27
All          82    26   108

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  5.876    3     0.118
Likelihood Ratio         6.096    3     0.107

To determine if the likelihood of switching boxes depends on condition for the first trial, we test:

H0: Likelihood of switching boxes and condition are independent
Ha: Likelihood of switching boxes and condition are dependent

From the printout above, the test statistic is $\chi^2 = 5.876$ and the p-value is $p = 0.118$. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate that the likelihood of switching boxes depends on condition for the first trial for any value of $\alpha < .118$.

Using MINITAB, the results for the Last Trial are:

Rows: CONDITION   Columns: SWITCH

             No   Yes   All
Empty         4    23    27
Steroids      6    21    27
Steroids2     8    19    27
Vanish       15    12    27
All          33    75   108

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                 12.000    3     0.007
Likelihood Ratio        11.780    3     0.008

To determine if the likelihood of switching boxes depends on condition for the last trial, we test:

H0: Likelihood of switching boxes and condition are independent
Ha: Likelihood of switching boxes and condition are dependent

From the printout above, the test statistic is $\chi^2 = 12.00$ and the p-value is $p = 0.007$. Since the p-value is small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes depends on condition for the last trial for any value of $\alpha > .007$.

b.

Using MINITAB, the results from the Empty condition are:

Rows: TRIAL   Columns: SWITCH

         No   Yes   All
First    17    10    27
Last      4    23    27
All      21    33    54

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                 13.169    1     0.000
Likelihood Ratio        13.924    1     0.000

To determine if the likelihood of switching boxes depends on trial number for the Empty condition, we test:

H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent

From the printout above, the test statistic is $\chi^2 = 13.169$ and the p-value is $p = 0.000$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes depends on trial number for the Empty condition for any value of $\alpha > .000$.

Using MINITAB, the results from the Vanish condition are:

Rows: TRIAL   Columns: SWITCH

         No   Yes   All
First    24     3    27
Last     15    12    27
All      39    15    54

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  7.477    1     0.006
Likelihood Ratio         7.878    1     0.005

To determine if the likelihood of switching boxes depends on trial number for the Vanish condition, we test:

H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent

From the printout above, the test statistic is $\chi^2 = 7.477$ and the p-value is $p = 0.006$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes depends on trial number for the Vanish condition for any value of $\alpha > .006$.


Using MINITAB, the results from the Steroids condition are:

Rows: TRIAL   Columns: SWITCH

         No   Yes   All
First    22     5    27
Last      6    21    27
All      28    26    54

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                 18.989    1     0.000
Likelihood Ratio        20.307    1     0.000

To determine if the likelihood of switching boxes depends on trial number for the Steroids condition, we test:

H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent

From the printout above, the test statistic is $\chi^2 = 18.989$ and the p-value is $p = .000$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes depends on trial number for the Steroids condition for any value of $\alpha > .000$.

Using MINITAB, the results from the Steroids2 condition are:

Rows: TRIAL   Columns: SWITCH

         No   Yes   All
First    19     8    27
Last      8    19    27
All      27    27    54

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  8.963    1     0.003
Likelihood Ratio         9.229    1     0.002

To determine if the likelihood of switching boxes depends on trial number for the Steroids2 condition, we test:

H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent

From the printout above, the test statistic is $\chi^2 = 8.963$ and the p-value is $p = .003$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes depends on trial number for the Steroids2 condition for any value of $\alpha > .003$.

c. Of all the tests performed, only one was not significant. There was no evidence that the likelihood of switching boxes depended on condition for the first trial. All other tests indicated that the variables were dependent. Thus, both condition and trial number influence a subject to switch.

10.40

a. Some preliminary calculations are:

$\hat{E}_{11} = \frac{50(50)}{250} = 10$    $\hat{E}_{12} = \frac{50(90)}{250} = 18$    $\hat{E}_{13} = \frac{50(110)}{250} = 22$

$\hat{E}_{21} = \frac{100(50)}{250} = 20$    $\hat{E}_{22} = \frac{100(90)}{250} = 36$    $\hat{E}_{23} = \frac{100(110)}{250} = 44$

$\hat{E}_{31} = \frac{100(50)}{250} = 20$    $\hat{E}_{32} = \frac{100(90)}{250} = 36$    $\hat{E}_{33} = \frac{100(110)}{250} = 44$

To determine if the rows and columns are dependent, we test:

H0: Rows and columns are independent
Ha: Rows and columns are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(20 - 10)^2}{10} + \cdots + \frac{(30 - 44)^2}{44} = 54.14$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 54.14 > 9.48773$), H0 is rejected. There is sufficient evidence to indicate the rows and columns are dependent at $\alpha = .05$.

b.

No, the analysis remains identical.

c.

Yes, the assumptions differ. If the row and column totals are not fixed, then we assume that we take a random sample from a multinomial distribution. If the row totals are fixed, then we assume that we are taking k random samples from k multinomial populations.

d.

The percentages are in the table below.

                                        Column
Row    1                       2                        3                         Totals
1      20/50 × 100% = 40%      20/90 × 100% = 22.2%     10/110 × 100% = 9.1%      50/250 × 100% = 20%
2      10/50 × 100% = 20%      20/90 × 100% = 22.2%     70/110 × 100% = 63.6%     100/250 × 100% = 40%
3      20/50 × 100% = 40%      50/90 × 100% = 55.6%     30/110 × 100% = 27.3%     100/250 × 100% = 40%

e. Using MINITAB, the bar graph is:

[Bar graph: Percent (vertical axis, 0 to 40) by Column (1, 2, 3)]

The graph supports the decision in part a. In part a, we rejected the null hypothesis and concluded that the rows and columns were dependent. If they were independent, then we would expect the three bars to be the same height. In this graph, they are not the same height.

10.41

a.

If all the categories are equally likely, then $p_{1,0} = p_{2,0} = p_{3,0} = p_{4,0} = p_{5,0} = .2$.

$E_1 = E_2 = E_3 = E_4 = E_5 = np_{i,0} = 150(.20) = 30$

To determine if the categories are not equally likely, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = .2$
$H_a$: At least one of the probabilities differs from .2

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(28 - 30)^2}{30} + \frac{(35 - 30)^2}{30} + \frac{(33 - 30)^2}{30} + \frac{(25 - 30)^2}{30} + \frac{(29 - 30)^2}{30} = 2.133$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.10} = 7.77944$. The rejection region is $\chi^2 > 7.77944$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 2.133 \not> 7.77944$), H0 is not rejected. There is insufficient evidence to indicate the categories are not equally likely at $\alpha = .10$.

b.

$\hat{p}_2 = \frac{35}{150} = .233$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The confidence interval is:

$\hat{p}_2 \pm z_{.05}\sqrt{\frac{\hat{p}_2 \hat{q}_2}{n}} \Rightarrow .233 \pm 1.645\sqrt{\frac{.233(.767)}{150}} \Rightarrow .233 \pm .057 \Rightarrow (.176, .290)$

10.42

a. The categorical variable is the rating of the student exposure to social and environmental issues. It has 5 levels: 1-star, 2-stars, 3-stars, 4-stars, and 5-stars.



b.

If there were no difference in the category proportions, then each proportion should be $p_i = 1/5 = .20$. There were a total of $n = 30$ business schools sampled. The expected number would be: $E_1 = E_2 = E_3 = E_4 = E_5 = np_{i,0} = 30(.20) = 6$

c.

To determine if there are differences in the star rating category proportions of all MBA programs, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = .20$
$H_a$: At least one of the probabilities differs from the hypothesized value

d.

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(2 - 6)^2}{6} + \frac{(9 - 6)^2}{6} + \frac{(14 - 6)^2}{6} + \frac{(5 - 6)^2}{6} + \frac{(0 - 6)^2}{6} = 21$

e.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

f.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 21 > 9.48773$), H0 is rejected. There is sufficient evidence to indicate differences in the star rating category proportions of all MBA programs at $\alpha = .05$.
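The one-way calculation in parts b through f can be reproduced in a few lines. This is a minimal sketch rather than part of the original solution; the function name `goodness_of_fit_chi_square` is illustrative, and the counts are the five observed values that appear in the test statistic in part d.

```python
def goodness_of_fit_chi_square(counts, null_probs):
    """Return (expected counts, chi-square statistic) for a one-way table."""
    n = sum(counts)
    # Expected count for category i under H0: n * p_{i,0}
    expected = [n * p for p in null_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return expected, chi2


# Observed counts for the 1-star through 5-star categories (n = 30)
counts = [2, 9, 14, 5, 0]
expected, chi2 = goodness_of_fit_chi_square(counts, [0.20] * 5)

# Compare with the Table IV critical value for alpha = .05, df = 4
print(round(chi2, 3), chi2 > 9.48773)
```

Passing unequal null probabilities, such as stated category percentages, uses the same function unchanged.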

g.

Some preliminary calculations are:

$\hat{p}_3 = \frac{x_3}{n} = \frac{14}{30} = .467$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\hat{p}_3 \pm z_{.025}\sqrt{\frac{\hat{p}_3 \hat{q}_3}{n}} \Rightarrow .467 \pm 1.96\sqrt{\frac{.467(.533)}{30}} \Rightarrow .467 \pm .179 \Rightarrow (.288, .646)$
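The interval arithmetic can be checked with a short Python sketch. The helper name `proportion_confidence_interval` is an illustrative assumption; the inputs (14 programs out of 30 and z = 1.96) come from the calculation above.

```python
import math


def proportion_confidence_interval(x, n, z):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# 14 of the 30 sampled programs fall in the 3-star category
p_hat, margin, (lower, upper) = proportion_confidence_interval(14, 30, 1.96)
print(round(p_hat, 3), round(margin, 3), round(lower, 3), round(upper, 3))
```

Because the sketch carries the sample proportion unrounded, its endpoints may differ in the third decimal place from a hand calculation that rounds the proportion to .467 first.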

We are 95% confident that the proportion of all MBA programs that are ranked in the 3-star category is between .288 and .646.

10.43

a.

Since there are 5 groups, we would expect 20% or 2,924(. 20) = 584.8 givers in each of the donation amount categories.

b.

The null hypothesis for testing whether the true proportions of charitable givers in each donation amount group are the same is: $H_0: p_1 = p_2 = \cdots = p_5 = .20$

c.

Some preliminary calculations are: $E_1 = E_2 = \cdots = E_5 = np_{i,0} = 2{,}924(.2) = 584.8$

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(512 - 584.8)^2}{584.8} + \frac{(905 - 584.8)^2}{584.8} + \cdots + \frac{(573 - 584.8)^2}{584.8} = 239.01$

d.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.10} = 7.77944$. The rejection region is $\chi^2 > 7.77944$.

e.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 239.01 > 7.77944$), H0 is rejected. There is sufficient evidence to indicate that the true proportions of charitable givers in each donation amount group are not all the same at $\alpha = .10$.

10.44

a.

The contingency table would be:

                       Research Charity
Donation Change        Yes        No     Total
Yes                    405        63       468
No                   2,110       346     2,456
Total                2,515       409     2,924

b.

$E_{11} = \frac{468(2{,}515)}{2{,}924} = 402.5$    $E_{12} = \frac{468(409)}{2{,}924} = 65.5$

$E_{21} = \frac{2{,}456(2{,}515)}{2{,}924} = 2{,}112.5$    $E_{22} = \frac{2{,}456(409)}{2{,}924} = 343.5$

c.

MINITAB produced the following results:

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                  0.128    1     0.720
Likelihood Ratio         0.129    1     0.719

The Chi-square test statistic is .128.

d.

To determine if change due to taxes and research charity are related for Florida charitable givers, we test: H0: Change due to taxes and research charity are independent Ha: Change due to taxes and research charity are dependent Since the p-value is not smaller than 𝛼 (𝑝 = .720 ≮ . 05), H0 is not rejected. There is insufficient evidence to indicate that change due to taxes and research charity are related for Florida charitable givers at 𝛼 = .05.
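For a 2x2 table, the Pearson statistic reported by MINITAB can also be obtained from the well-known shortcut formula $\chi^2 = n(ad - bc)^2 / (R_1 R_2 C_1 C_2)$, which avoids computing the expected counts one by one. The sketch below is illustrative only (the function name `chi_square_2x2` is hypothetical) and uses the counts from the contingency table in part a.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: change due to taxes (Yes, No); columns: research charity (Yes, No)
chi2 = chi_square_2x2(405, 63, 2110, 346)
print(round(chi2, 3))
```

The shortcut gives the same value as summing the four cell contributions, so it is a convenient check on the printout.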

10.45

a.

The sample proportion of negative tone news stories that are deceptive is 111/170 = .653.

b.

The sample proportion of neutral tone news stories that are deceptive is 61/110 = .555.

c.

The sample proportion of positive tone news stories that are deceptive is 11/31 = .355.

d.

Yes, it appears that the proportion of news stories that are deceptive depends on the story tone. The proportion that is deceptive for negative tone stories is .653, while the proportion that is deceptive for positive tone stories is only .355. These proportions look much different.

e.

To determine if the authenticity of a news story depends on tone, we test: H 0 : Authenticity and tone are independent H a : Authenticity and tone are dependent

f.

MINITAB was used to conduct the desired test:

Chi-Square Test
                    Chi-Square   DF   P-Value
Pearson                 10.427    2     0.005
Likelihood Ratio        10.348    2     0.006

The test statistic is $\chi^2 = 10.427$ and the p-value is $p = .005$. Since the p-value is less than $\alpha$ ($p = .005 < .05$), H0 is rejected. There is sufficient evidence to indicate authenticity of a news story depends on tone at $\alpha = .05$.

10.46

a. Some preliminary calculations are:

$E_1 = np_{1,0} = 400(.30) = 120$    $E_2 = np_{2,0} = 400(.20) = 80$    $E_3 = np_{3,0} = 400(.20) = 80$

$E_4 = np_{4,0} = 400(.10) = 40$    $E_5 = np_{5,0} = 400(.10) = 40$    $E_6 = np_{6,0} = 400(.10) = 40$

b. The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(100 - 120)^2}{120} + \frac{(75 - 80)^2}{80} + \frac{(85 - 80)^2}{80} + \frac{(50 - 40)^2}{40} + \frac{(40 - 40)^2}{40} + \frac{(50 - 40)^2}{40} = 8.958$

c. To determine if the true percentages of the colors produced differ from the manufacturer’s stated percentages, we test:

$H_0: p_1 = .30, p_2 = .20, p_3 = .20, p_4 = .10, p_5 = .10,$ and $p_6 = .10$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is $\chi^2 = 8.958$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 6 - 1 = 5$. From Table IV, Appendix D, $\chi^2_{.05} = 11.0705$. The rejection region is $\chi^2 > 11.0705$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 8.958 \not> 11.0705$), H0 is not rejected. There is insufficient evidence to indicate the true percentages of the colors produced differ from the manufacturer’s stated percentages at $\alpha = .05$.

10.47

a.

To determine if the data disagree with the percentages reported by Smart Insights, we test:

$H_0: p_1 = .73, p_2 = .10, p_3 = .02, p_4 = .12, p_5 = .03$
$H_a$: At least one of the probabilities differs from the hypothesized values

MINITAB conducted the test and found the following results:

Chi-Square Test

   N   DF    Chi-Sq   P-Value
1000    4   6.03699     0.196

The test statistic is $\chi^2 = 6.037$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 6.037 \not> 9.48773$), H0 is not rejected. There is insufficient evidence to indicate the data disagree with the percentages reported by Smart Insights at $\alpha = .05$.

b.

Some preliminary calculations are:

$\hat{p} = \frac{x}{n} = .72$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\hat{p} \pm z_{.025}\sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow .72 \pm 1.96\sqrt{\frac{.72(.28)}{1{,}000}} \Rightarrow .72 \pm .028 \Rightarrow (.692, .748)$

We are 95% confident that the proportion of all internet searches that use the Google search engine is between .692 and .748. Expressed in percentages, the confidence interval is (69.2%, 74.8%).

10.48

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{234(40)}{437} = 21.419$

$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{234(397)}{437} = 212.581$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{203(40)}{437} = 18.581$

$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{203(397)}{437} = 184.419$

To determine if the response rate of air traffic controllers to mid-air collision alarms differs for true and false alerts, we test:

H0: Responses and alerts are independent
Ha: Responses and alerts are dependent

The test statistic is:

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(3 - 21.419)^2}{21.419} + \frac{(231 - 212.581)^2}{212.581} + \frac{(37 - 18.581)^2}{18.581} + \frac{(166 - 184.419)^2}{184.419} = 37.533$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 37.533 > 3.84146$), H0 is rejected. There is sufficient evidence to indicate the response rate of air traffic controllers to mid-air collision alarms differs for true and false alerts at $\alpha = .05$.

10.49

Some preliminary calculations are: $E_1 = E_2 = E_3 = E_4 = np_{i,0} = 83(.25) = 20.75$

To determine if there are differences in the percentages of incidents in the four cause categories, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least one of the probabilities differs from its hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = 8.036$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 8.036 > 7.81473$), H0 is rejected. There is sufficient evidence to indicate there are differences in the percentages of incidents in the four cause categories at $\alpha = .05$.

10.50

a.

Yes, it appears that the male and female tourists differ in their responses to purchasing photographs, postcards, and paintings. The values in the ‘Always’ and ‘Rarely or Never’ categories are quite different. The percentages are insufficient to draw a conclusion because the sample sizes must be taken into account.

b.

The counts are found by changing the percentages to proportions and multiplying the proportions by the sample sizes in each gender. The counts are:

                   Male Tourist   Female Tourist   Total
Always                      240              476     716
Often                       405              527     932
Occasionally                525              493    1018
Rarely or Never             330              204     534
Total                      1500             1700    3200

c.

To determine whether male and female tourists differ in their responses to purchasing photographs, postcards, or paintings, we test:

H0: Gender and purchasing are independent
Ha: Gender and purchasing are dependent

d.

Using MINITAB to conduct the analysis, the test statistic is χ 2 = 112.433 and the p-value is p = .000 . Since the p-value is less than 𝛼 (𝑝 = .000 < .01), H0 is rejected. There is sufficient evidence to indicate male and female tourists differ in their responses to purchasing photographs, postcards, or paintings at 𝛼 = .01.

10.51

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{32(32)}{96} = 10.667$    $\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{32(64)}{96} = 21.333$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{32(32)}{96} = 10.667$    $\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{32(64)}{96} = 21.333$

$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{32(32)}{96} = 10.667$    $\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{32(64)}{96} = 21.333$

To determine if the proportion of subjects who selected menus consistent with the theory depends on goal condition, we test:

H0: Goal condition and Consistent with theory are independent
Ha: Goal condition and Consistent with theory are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(15 - 10.667)^2}{10.667} + \frac{(17 - 21.333)^2}{21.333} + \frac{(14 - 10.667)^2}{10.667} + \frac{(18 - 21.333)^2}{21.333} + \frac{(3 - 10.667)^2}{10.667} + \frac{(29 - 21.333)^2}{21.333} = 12.469$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.469 > 9.21034$), H0 is rejected. There is sufficient evidence to indicate that the proportion of subjects who selected menus consistent with the theory depends on goal condition at $\alpha = .01$.

10.52

Some preliminary calculations are:

$E_1 = np_{1,0} = 943(.51) = 480.93$    $E_2 = np_{2,0} = 943(.37) = 348.91$

$E_3 = np_{3,0} = 943(.09) = 84.87$    $E_4 = np_{4,0} = 943(.03) = 28.29$

To determine if the data from the independent survey contradict the percentages reported by the CPS Cell Phone Supplement, we test:

$H_0: p_1 = .51, p_2 = .37, p_3 = .09,$ and $p_4 = .03$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{(n_i - E_i)^2}{E_i} = \frac{(473 - 480.93)^2}{480.93} + \frac{(334 - 348.91)^2}{348.91} + \frac{(106 - 84.87)^2}{84.87} + \frac{(30 - 28.29)^2}{28.29} = 6.132$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.10} = 6.25139$. The rejection region is $\chi^2 > 6.25139$.

Since the test statistic does not fall in the rejection region ($\chi^2 = 6.132 \not> 6.25139$), H0 is not rejected. There is insufficient evidence to indicate the data from the independent survey contradict the percentages reported by the CPS Cell Phone Supplement at $\alpha = .10$.

10.53

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{57(60)}{171} = 20$    $\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{57(111)}{171} = 37$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{58(60)}{171} = 20.35$    $\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{58(111)}{171} = 37.65$

$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{56(60)}{171} = 19.65$    $\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{56(111)}{171} = 36.35$

To determine if the option choice depends on emotion state, we test:

H0: Option choice and emotion state are independent
Ha: Option choice and emotion state are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(45 - 20)^2}{20} + \frac{(12 - 37)^2}{37} + \frac{(8 - 20.35)^2}{20.35} + \frac{(50 - 37.65)^2}{37.65} + \frac{(7 - 19.65)^2}{19.65} + \frac{(49 - 36.35)^2}{36.35} = 72.234$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $\chi^2 > 4.60517$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 72.234 > 4.60517$), H0 is rejected. There is sufficient evidence to indicate that the option choice depends on emotion state at $\alpha = .10$.

10.54

a. The contingency table is:

Shift     Defectives   Non-Defectives   Total
1                 25              175     200
2                 35              165     200
3                 80              120     200
Total            140              460     600

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{200(140)}{600} = 46.667$

$\hat{E}_{21} = \hat{E}_{31} = \frac{200(140)}{600} = 46.667$

$\hat{E}_{12} = \hat{E}_{22} = \hat{E}_{32} = \frac{200(460)}{600} = 153.333$

To determine if the quality of the filters is related to shift, we test:

H0: Quality of filters and shift are independent
Ha: Quality of filters and shift are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(25 - 46.667)^2}{46.667} + \frac{(35 - 46.667)^2}{46.667} + \frac{(80 - 46.667)^2}{46.667} + \frac{(175 - 153.333)^2}{153.333} + \frac{(165 - 153.333)^2}{153.333} + \frac{(120 - 153.333)^2}{153.333} = 47.98$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 47.98 > 5.99147$), H0 is rejected. There is sufficient evidence to indicate quality of filters and shift are related at $\alpha = .05$.

b. $\hat{p}_1 = \frac{25}{200} = .125$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n}} \Rightarrow .125 \pm 1.96\sqrt{\frac{.125(.875)}{200}} \Rightarrow .125 \pm .046 \Rightarrow (.079, .171)$

10.55

The contingency table for these data is

                   Condition
Recycle      Useful   Control   Totals
Recycled         26        14       40
Garbage          13        25       38
Totals           39        39       78

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{40(39)}{78} = 20$    $\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{40(39)}{78} = 20$

$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{38(39)}{78} = 19$    $\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{38(39)}{78} = 19$

To determine if students in the usefulness is salient condition will recycle at a higher rate than students in the control condition, we test:

H0: Condition and recycling are independent
Ha: Condition and recycling are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(26 - 20)^2}{20} + \frac{(14 - 20)^2}{20} + \frac{(13 - 19)^2}{19} + \frac{(25 - 19)^2}{19} = 7.39$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 7.39 > 3.84146$), H0 is rejected. There is sufficient evidence to indicate students in the usefulness is salient condition will recycle at a higher rate than students in the control condition at $\alpha = .05$.

10.56

Using MINITAB, the results are:

Tabulated statistics: Defect, PredEVG

Using frequencies in Fr

Rows: Defect   Columns: PredEVG

         1     2   All
1      441     8   449
2       47     2    49
All    488    10   498

Cell Contents:   Count

Pearson Chi-Square = 1.188, DF = 1
Likelihood Ratio Chi-Square = 0.948, DF = 1

To determine if Defect and Pred_EVG are dependent, we test:

H0: Defect and Pred_EVG are independent
Ha: Defect and Pred_EVG are dependent

The test statistic is χ² = 1.188. Since no α level was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1. From Table IV, Appendix D, χ².05 = 3.84146. The rejection region is χ² > 3.84146.

Since the observed value of the test statistic does not fall in the rejection region (χ² = 1.188 ≯ 3.84146), H0 is not rejected. There is insufficient evidence to indicate that Defect and Pred_EVG are dependent at α = .05. If Defect and Pred_EVG are independent, then Pred_EVG is no better at predicting defects than just guessing. I would not recommend the essential complexity algorithm be used as a predictor of defective software modules.

10.57

a. χ² = Σ [ni − Ei]²/Ei = (26 − 23)²/23 + (146 − 136)²/136 + (361 − 341)²/341 + (143 − 136)²/136 + (13 − 23)²/23 = 9.647

b. From Table IV, Appendix D, with df = 5, χ².05 = 11.0705

c. No. Since the observed value of the test statistic does not fall in the rejection region (χ² = 9.647 ≯ 11.0705), H0 is not rejected. There is insufficient evidence to indicate the salary distribution is non-normal for α = .05.



d. The p-value is p = P(χ² ≥ 9.647). Using MINITAB,

Cumulative Distribution Function

Chi-Square with 5 DF

x        P( X <= x )
9.647    0.914122
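The upper-tail probability can also be computed without statistical software. The sketch below uses a standard closed form for the chi-square tail that holds only when the degrees of freedom are odd; the function name `chi2_upper_tail_odd_df` is hypothetical, and the result should agree with the MINITAB cumulative probability above up to rounding.

```python
import math


def chi2_upper_tail_odd_df(x: float, df: int) -> float:
    """P(chi-square with `df` degrees of freedom >= x), for odd df only."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form handles odd df only")
    root = math.sqrt(x)
    # Standard normal density evaluated at sqrt(x).
    phi = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    # Two-sided standard normal tail, P(|Z| >= sqrt(x)).
    tail = math.erfc(root / math.sqrt(2.0))
    # Finite series: sum over r of root**(2r - 1) / (1 * 3 * ... * (2r - 1)).
    series, term = 0.0, root
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return tail + 2.0 * phi * series


p_value = chi2_upper_tail_odd_df(9.647, 5)
print(round(p_value, 4))
```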

The p-value is p = P(χ² ≥ 9.647) = 1 − .914122 = .085878.

10.58

Using SAS, the output is:

The FREQ Procedure

Table of CANDIDATE by TIME

CANDIDATE     TIME

Frequency|
Col Pct  |       1|       2|       3|       4|       5|       6|  Total
---------+--------+--------+--------+--------+--------+--------+
SMITH    |    208 |    208 |    451 |    392 |    351 |    410 |   2020
         |  52.53 |  55.32 |  55.34 |  55.92 |  56.16 |  55.33 |
---------+--------+--------+--------+--------+--------+--------+
COPPIN   |     55 |     51 |    109 |     98 |     88 |    104 |    505
         |  13.89 |  13.56 |  13.37 |  13.98 |  14.08 |  14.04 |
---------+--------+--------+--------+--------+--------+--------+
MONTES   |    133 |    117 |    255 |    211 |    186 |    227 |   1129
         |  33.59 |  31.12 |  31.29 |  30.10 |  29.76 |  30.63 |
---------+--------+--------+--------+--------+--------+--------+
Total         396      376      815      701      625      741     3654

Statistics for Table of CANDIDATE by TIME

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                     10     2.2839   0.9937
Likelihood Ratio Chi-Square    10     2.2722   0.9938
Mantel-Haenszel Chi-Square      1     0.9851   0.3209
Phi Coefficient                       0.0250
Contingency Coefficient               0.0250
Cramer's V                            0.0177

Sample Size = 3654

To determine if candidates received votes independent of time period, we test:

H0: Voting and Time period are independent
Ha: Voting and Time period are dependent

The test statistic is χ² = 2.2839. Since no value of α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = (r − 1)(c − 1) = (3 − 1)(6 − 1) = 10. From Table IV, Appendix D, χ².05 = 18.3070. The rejection region is χ² > 18.3070.


Since the observed value of the test statistic does not fall in the rejection region (χ² = 2.2839 ≯ 18.3070), H0 is not rejected. There is insufficient evidence to indicate Voting and Time period are dependent at α = .05. Thus, we can conclude that voting and time period are independent. This means that regardless of time period, the percentage of votes received by each candidate is the same.

In the table created by SAS, the bottom number in each cell is the column percent. This is the percent of votes received by the candidate in each time period. An inspection of these percents indicates that candidate Smith received approximately 55.3% of the votes each time period, candidate Coppin received approximately 13.8% of the vote, and candidate Montes received approximately 30.9% of the vote. All of this indicates that the election was rigged.



Chapter 11 Simple Linear Regression

11.1

a.

b.

c.

d.

11.2

For all problems below, we use: Slope = "rise"/"run" = (y2 − y1)/(x2 − x1)

a. Slope = (5 − 1)/(5 − 1) = 1 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (1, 1) and β1 = 1, the y-intercept is β0 = 1 − 1(1) = 0.

b. Slope = (0 − 3)/(3 − 0) = −1 = β1

If y = β0 + β1x, then β0 = y − β1x. Since (0, 3) is given and β1 = −1, the y-intercept is β0 = 3 − (−1)(0) = 3.

c. Slope = (2 − 1)/(4 − (−1)) = 1/5 = .2 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (−1, 1) and β1 = .2, the y-intercept is β0 = 1 − .2(−1) = 1.2.

d. Slope = (6 − (−3))/(2 − (−6)) = 9/8 = 1.125 = β1

If y = β0 + β1x, then β0 = y − β1x. Since a given point is (−6, −3) and β1 = 1.125, the y-intercept is β0 = −3 − 1.125(−6) = 3.75.

11.3

The two equations are: 4 = β0 + β1(−2) and 6 = β0 + β1(4)

Subtracting the first equation from the second, we get

  6 = β0 + 4β1
−(4 = β0 − 2β1)
  2 = 6β1 ⇒ β1 = 1/3

Substituting β1 = 1/3 into the first equation, we get:

4 = β0 + (1/3)(−2) ⇒ β0 = 4 + 2/3 = 14/3

The equation for the line is y = 14/3 + (1/3)x.

11.4

a.

The equation for a straight line (deterministic) is y = β0 + β1x. If the line passes through (1, 1), then 1 = β0 + β1(1) ⇒ 1 = β0 + β1. Likewise, through (5, 5), then 5 = β0 + β1(5). Solving these two equations:

  1 = β0 + β1
−(5 = β0 + β1(5))
 −4 = −4β1 ⇒ β1 = 1

Substituting β1 = 1 into the first equation, we get 1 = β0 + 1 ⇒ β0 = 0


The equation is y = 0 + 1x or y = x . b.

The equation for a straight line is y = β 0 + β1 x . If the line passes through (0, 3), then 3 = β0 + β1 ( 0) , which implies β 0 = 3 . Likewise, through the point (3, 0), then 0 = β 0 + 3β1  − β 0 = 3β1 . Substituting β 0 = 3 , we get −3 = 3β1  β1 = −1 . Therefore, the line passing through (0, 3) and (3, 0) is y = 3 − x .

c. The equation for a straight line is y = β0 + β1x. If the line passes through (−1, 1), then 1 = β0 + β1(−1). Likewise, through the point (4, 2), 2 = β0 + β1(4). Solving these two equations:

  2 = β0 + β1(4)
−(1 = β0 + β1(−1))
  1 = β1(5) ⇒ β1 = 1/5

Solving for β0, 1 = β0 + (1/5)(−1) ⇒ 1 = β0 − 1/5 ⇒ β0 = 1 + 1/5 = 6/5

The equation, with β0 = 6/5 and β1 = 1/5, is y = 6/5 + (1/5)x.

d. The equation for a straight line is y = β0 + β1x. If the line passes through (−6, −3), then −3 = β0 + β1(−6). Likewise, through the point (2, 6), 6 = β0 + β1(2). Solving these equations simultaneously:

   6 = β0 + β1(2)
−(−3 = β0 − β1(6))
   9 = β1(8) ⇒ β1 = 9/8

Solving for β0, 6 = β0 + 2(9/8) ⇒ 6 − 18/8 = β0 ⇒ β0 = 30/8

Therefore, y = 30/8 + (9/8)x.
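The elimination steps in Exercises 11.3 and 11.4 reduce to two formulas: β1 = (y2 − y1)/(x2 − x1) and β0 = y1 − β1x1. A minimal sketch, with a hypothetical helper `line_through` and exact fractions so the answers can be compared directly with those above:

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (beta0, beta1) for the line y = beta0 + beta1*x through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    beta1 = Fraction(y2 - y1, x2 - x1)  # slope = rise / run
    beta0 = y1 - beta1 * x1             # intercept from beta0 = y - beta1*x
    return beta0, beta1


exercises = {
    "11.3": ((-2, 4), (4, 6)),
    "11.4a": ((1, 1), (5, 5)),
    "11.4b": ((0, 3), (3, 0)),
    "11.4c": ((-1, 1), (4, 2)),
    "11.4d": ((-6, -3), (2, 6)),
}
for label, (p1, p2) in exercises.items():
    beta0, beta1 = line_through(p1, p2)
    print(label, "beta0 =", beta0, "beta1 =", beta1)
```

Fractions are printed in lowest terms, so the intercept in 11.4d appears as 15/4 rather than 30/8.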

11.5

To graph a line, we need two points. Pick two values for x, and find the corresponding y values by substituting the values of x into the equation.

a. Let x = 0 ⇒ y = 4 + (0) = 4 and x = 2 ⇒ y = 4 + (2) = 6

b. Let x = 0 ⇒ y = 5 − 2(0) = 5 and x = 2 ⇒ y = 5 − 2(2) = 1

c. Let x = 0 ⇒ y = −4 + 3(0) = −4 and x = 2 ⇒ y = −4 + 3(2) = 2

d. Let x = 0 ⇒ y = −2(0) = 0 and x = 2 ⇒ y = −2(2) = −4

e. Let x = 0 ⇒ y = 0 and x = 2 ⇒ y = 2

f. Let x = 0 ⇒ y = .5 + 1.5(0) = .5 and x = 2 ⇒ y = .5 + 1.5(2) = 3.5

11.6

a. y = 4 + x. The slope is β1 = 1. The y-intercept is β0 = 4.

b. y = 5 − 2x. The slope is β1 = −2. The y-intercept is β0 = 5.

c. y = −4 + 3x. The slope is β1 = 3. The y-intercept is β0 = −4.

d. y = −2x. The slope is β1 = −2. The y-intercept is β0 = 0.

e. y = x. The slope is β1 = 1. The y-intercept is β0 = 0.

f. y = .5 + 1.5x. The slope is β1 = 1.5. The y-intercept is β0 = .5.

11.7

A deterministic model does not allow for random error or variation, whereas a probabilistic model does.

An example where a deterministic model would be appropriate is: Let y = cost of a 2 × 4 piece of lumber and x = length (in feet). The model would be y = β1x. There should be no variation in price for the same length of wood.

An example where a probabilistic model would be appropriate is: Let y = sales per month of a commodity and x = amount of money spent advertising. The model would be y = β0 + β1x + ε. The sales per month will probably vary even if the amount of money spent on advertising remains the same.

11.8

The "line of means" is the deterministic component in a probabilistic model.

11.9

No. The random error component, ε , allows the values of the variable to fall above or below the line.

11.10

a.

The dependent variable is the legislator’s AAUW score. The independent variable is the number of daughters a legislator has.

b.

A probabilistic model is more appropriate than a deterministic model because not all legislators who have the same number of daughters have the same AAUW score. There is error present.

c. The model is y = β0 + β1x + ε.

11.11

a. The dependent variable is the CEO’s annual salary. The independent variable is the typical worker’s pay.

b. A probabilistic model is more appropriate than a deterministic model because there will not be an exact relationship between a CEO’s annual salary and the typical worker’s pay. There is error present.

c. The model is y = β0 + β1x + ε.

11.12

a. The dependent variable is the ratio of repair to replacement cost of commercial pipe. The independent variable is the pipe diameter.

b. A probabilistic model is more appropriate than a deterministic model because not all pipes of the same diameter have the same ratio of repair to replacement cost. There is error present.

c. The model is y = β0 + β1x + ε.

11.13

a. The dependent variable is the opening weekend box-office revenue. The independent variable is the movie’s tweet rate.

b. A probabilistic model is more appropriate than a deterministic model because not all movies with the same tweet rate have the same opening weekend box-office revenue. There is error present.

c. The model is y = β0 + β1x + ε.

11.14

a.

xi          yi          xi²           xiyi
7           2           7² = 49       7(2) = 14
4           4           4² = 16       4(4) = 16
6           2           6² = 36       6(2) = 12
2           5           2² = 4        2(5) = 10
1           7           1² = 1        1(7) = 7
1           6           1² = 1        1(6) = 6
3           5           3² = 9        3(5) = 15
Σxi = 24    Σyi = 31    Σxi² = 116    Σxiyi = 80

b. SSxy = Σxiyi − (Σxi)(Σyi)/n = 80 − (24)(31)/7 = 80 − 106.2857143 = −26.2857143

c. SSxx = Σxi² − (Σxi)²/n = 116 − (24)²/7 = 116 − 82.28571429 = 33.71428571

d. β̂1 = SSxy/SSxx = −26.2857143/33.71428571 = −.779661017 ≈ −.7797
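The hand computations for SSxy, SSxx, and the slope, together with the intercept formula β̂0 = ȳ − β̂1x̄, can be reproduced from the seven (x, y) pairs of Exercise 11.14. The function name `least_squares` is an assumption for this sketch; only the formulas come from the solution.

```python
def least_squares(x, y):
    """Return (SSxy, SSxx, slope, intercept) for a simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                 # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate: ybar - b1 * xbar
    return ss_xy, ss_xx, b1, b0


# Data from the table in Exercise 11.14.
x = [7, 4, 6, 2, 1, 1, 3]
y = [2, 4, 2, 5, 7, 6, 5]
ss_xy, ss_xx, b1, b0 = least_squares(x, y)
print(round(ss_xy, 4), round(ss_xx, 4), round(b1, 4), round(b0, 3))
```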

e. x̄ = Σxi/n = 24/7 = 3.428571429    ȳ = Σyi/n = 31/7 = 4.428571429

f. β̂0 = ȳ − β̂1x̄ = 4.428571429 − (−.779661017)(3.428571429) = 4.428571429 − (−2.673123487) = 7.101694916 ≈ 7.102

g. The least squares line is ŷ = β̂0 + β̂1x = 7.102 − .7797x.

11.15

From Exercise 11.14, β̂0 = 7.10 and β̂1 = −.78. The fitted line is ŷ = 7.10 − .78x. To obtain values for ŷ, we substitute values of x into the equation and solve for ŷ.

a.

x    y    ŷ = 7.10 − .78x    (y − ŷ)    (y − ŷ)²
7    2    1.64                .36        .1296
4    4    3.98                .02        .0004
6    2    2.42               −.42        .1764
2    5    5.54               −.54        .2916
1    7    6.32                .68        .4624
1    6    6.32               −.32        .1024
3    5    4.76                .24        .0576

Σ(y − ŷ) = 0.02    SSE = Σ(y − ŷ)² = 1.2204

b.

c.

x    y    ŷ = 14 − 2.5x    (y − ŷ)    (y − ŷ)²
7    2    −3.5              5.5        30.25
4    4     4                0           0
6    2    −1                3           9
2    5     9               −4          16
1    7    11.5             −4.5        20.25
1    6    11.5             −5.5        30.25
3    5     6.5             −1.5         2.25

Σ(y − ŷ) = −7    SSE = 108.00

11.16

a.

b. Choose y = 1 + x since it best describes the relation of x and y.

c.

y    x      ŷ = 1 + x    y − ŷ
2    .5     1.5           .5
1    1.0    2.0          −1.0
3    1.5    2.5           .5

Σ(y − ŷ) = 0

y    x      ŷ = 3 − x    y − ŷ
2    .5     2.5          −.5
1    1.0    2.0          −1.0
3    1.5    1.5           1.5

Σ(y − ŷ) = 0

d. SSE = Σ(y − ŷ)²

SSE for 1st model: y = 1 + x, SSE = (.5)² + (−1)² + (.5)² = 1.5

SSE for 2nd model: y = 3 − x, SSE = (−.5)² + (−1)² + (1.5)² = 3.5

The best fitting straight line is the one that has the smallest sum of squared errors (SSE). The model y = 1 + x has a smaller SSE, and therefore it verifies the visual check in part a.

e.

Some preliminary calculations are:

Σx = 3    Σy = 6    Σxy = 6.5    Σx² = 3.5

SSxy = Σxy − (Σx)(Σy)/n = 6.5 − (3)(6)/3 = .5

SSxx = Σx² − (Σx)²/n = 3.5 − (3)²/3 = .5

β̂1 = SSxy/SSxx = .5/.5 = 1;    x̄ = Σx/n = 3/3 = 1;    ȳ = Σy/n = 6/3 = 2

β̂0 = ȳ − β̂1x̄ = 2 − 1(1) = 1 ⇒ ŷ = β̂0 + β̂1x = 1 + x

The least squares line is the same as the second line given.

11.17

a. Using MINITAB, the scattergram of the data is:

[Fitted Line Plot: y = 8.543 − 0.9939x; S = 1.06896, R-Sq = 80.2%, R-Sq(adj) = 76.2%; vertical axis y, horizontal axis x]

b.

Looking at the scattergram, x and y appear to have a negative linear relationship.

c.

Some preliminary calculations are:

Σx = 33    Σy = 27    Σxy = 104    Σx² = 179

SSxy = Σxy − (Σx)(Σy)/n = 104 − (33)(27)/7 = −23.2857143

SSxx = Σx² − (Σx)²/n = 179 − (33)²/7 = 23.4285714

β̂1 = SSxy/SSxx = −23.2857143/23.4285714 = −.99390244

x̄ = Σx/n = 33/7 = 4.714285714    ȳ = Σy/n = 27/7 = 3.857142857

β̂0 = ȳ − β̂1x̄ = 3.857142857 − (−.99390244)(4.714285714) = 8.542682931 ≈ 8.5427

The least squares line is ŷ = 8.5427 − .9939x.
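When only the summary sums are available, as in Exercise 11.17, the same least squares formulas apply directly to the totals. A small sketch under that assumption; `fit_from_sums` and its parameter names are hypothetical:

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least squares intercept and slope computed from summary sums alone."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Sums listed in the preliminary calculations for Exercise 11.17, with n = 7.
intercept, slope = fit_from_sums(n=7, sum_x=33, sum_y=27, sum_xy=104, sum_x2=179)
print(round(intercept, 4), round(slope, 4))
```

The same helper can be pointed at the sums in later exercises that list Σx, Σy, Σxy, and Σx², which gives a quick check on the long decimal arithmetic.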

d. The least squares line is plotted in part a. It appears to fit the data well.

11.18

a. In the scattergram, there does not appear to be much of a linear trend between cooperation use and average payoff.

b. In the scattergram, there does not appear to be much of a linear trend between defection use and average payoff.

c. In the scattergram, there does appear to be a linear trend between punishment use and average payoff. As punishment use increases, the average payoff tends to decrease.

d. The slope of the line is negative.

11.19

a. The straight-line model would be: y = β0 + β1x + ε. From the printout, the least squares line is: ŷ = −5.1557 + 0.98233x.

b. Since the range of observed values for the 2017 Math SAT scores (x) does not include 0, the y-intercept has no meaning.

c. The slope of the least squares line is β̂1 = 0.98233. In terms of this problem, for each additional point increase in the 2017 Math SAT score, the mean 2019 Math SAT score is estimated to increase by 0.98233. This interpretation is meaningful for values of x within the observed range. The observed range of x is 468 to 651.

11.20

a. The straight-line model would be: y = β0 + β1x + ε

b. The least squares line is: ŷ = 44.29 + (−.26)x.

c. β̂0 = 44.29. When the amount of whole banana meal is 0%, we estimate the mean weight gain to be 44.29 grams. β̂1 = −.26. For each additional percentage increase in the amount of whole banana meal, the mean weight gain is estimated to decrease by .26 grams.

11.21

a.

The least squares line is ŷ = 6.678 + .004786x.

b. From the printout, SSE = .5374. No. The least squares line has the minimum squared error.

c. β̂0 = 6.678. Since the range of pipe diameters does not include 0, β̂0 has no practical interpretation. β̂1 = .004786. For every 1 mm increase in pipe diameter, the mean ratio is estimated to increase by .004786.

d. For x = 800, ŷ = 6.678 + .004786(800) = 10.5068

e. This prediction is not reliable because 800 is not in the observed range of pipe diameters. We do not know what the relationship between pipe diameter and the ratio looks like outside the observed range.

11.22

a. Using MINITAB, the results are:

Regression Analysis: Cost versus Year

Analysis of Variance

Source        DF    Adj SS     Adj MS    F-Value   P-Value
Regression     1    6083.84    6083.84   26.35     0.000
Error         10    2309.07     230.91
Total         11    8392.92

Model Summary

S          R-sq      R-sq(adj)   R-sq(pred)
15.1956    72.49%    69.74%      54.61%

Coefficients

Term        Coef     SE Coef   T-Value   P-Value   VIF
Constant    -3675    724       -5.08     0.000
Year        1.870    0.364      5.13     0.000     1.00

Regression Equation

Cost = -3675 + 1.870 Year

The least squares line is ŷ = −3,675 + 1.87x.

b. β̂0 = −3,675. Since the range of years does not include 0, β̂0 has no practical interpretation.

c. β̂1 = 1.87. For every one year increase, the mean cost is estimated to increase by $1.87 million.

11.23

a. Using MINITAB, the results are:

Regression Equation

Distance = 503.9 + 3.56 Exp

The least squares line is ŷ = 503.9 + 3.56x.

b. β̂0 = 503.9. Since the range of experience does not include 0, β̂0 has no practical interpretation. β̂1 = 3.56. For every one year increase in experience, the mean pivot pin-top distance is estimated to increase by 3.56 millimeters.

c. For x = 25, ŷ = 503.9 + 3.56(25) = 592.9

11.24

a. Using MINITAB, the scatterplot is:

[Scatterplot of AACC vs AAFEMA; vertical axis AACC, horizontal axis AAFEMA]

There appears to be a somewhat positive linear relationship between the average annual number of public corruption convictions and the average annual FEMA relief.

b. Using MINITAB, the results are:

Regression Analysis: AACC versus AAFEMA

The regression equation is
AACC = 0.249 + 0.00542 AAFEMA

Predictor    Coef        SE Coef     T       P
Constant     0.24885     0.02922     8.52    0.000
AAFEMA       0.005416    0.003245    1.67    0.102

S = 0.149176    R-Sq = 5.5%    R-Sq(adj) = 3.5%

Analysis of Variance

Source            DF    SS         MS         F       P
Regression         1    0.06200    0.06200    2.79    0.102
Residual Error    48    1.06817    0.02225
Total             49    1.13016

The fitted regression model is ŷ = .249 + .00542x.

c. β̂0 = .249. Since 0 is not in the observed range of the average annual FEMA relief, β̂0 has no meaning. β̂1 = .00542. For each additional dollar in average annual FEMA relief per capita, the mean average annual number of public corruption convictions per 100,000 residents is estimated to increase by .00542.

11.25

a. The straight line model would be: E(y) = β0 + β1x

b. Using MINITAB, the results are:

Regression Equation

ACCURACY = 98.8 - 0.1321 DISTANCE

The least squares line is ŷ = 98.8 − 0.1321x.

c. Since 0 is not in the observed range of x (distance), β̂0 has no meaning.

d. β̂1 = −0.1321. For each additional yard in a golfer’s average driving distance, the mean driving accuracy is estimated to decrease by 0.1321%.

e. The estimate of the slope will help determine if the golfer’s concern is valid since it tells us the change in driving accuracy per unit change in driving distance.

11.26

a.

Some preliminary calculations are:

Σx = 6,167    Σy = 135.8    Σxy = 34,764.5    Σx² = 1,641,115

SSxy = Σxy − (Σx)(Σy)/n = 34,764.5 − (6,167)(135.8)/24 = −130.44167

SSxx = Σx² − (Σx)²/n = 1,641,115 − (6,167)²/24 = 56,452.95833

β̂1 = SSxy/SSxx = −130.44167/56,452.958 = −.002310625 ≈ −.0023

β̂0 = ȳ − β̂1x̄ = 135.8/24 − (−.002310625)(6,167/24) = 6.252067683 ≈ 6.25

The least squares line is ŷ = 6.25 − .0023x.

b. β̂0 = 6.25. Since x = 0 is not in the observed range, β̂0 has no interpretation other than being the y-intercept. β̂1 = −.0023. For each additional increase of 1 part per million of pectin, the mean sweetness index is estimated to decrease by .0023.

c. ŷ = 6.25 − .0023(300) = 5.56

11.27

Some preliminary calculations are:

Σx = 6,980.65    Σy = 576.3    Σxy = 396,603.225    Σx² = 4,933,198.773    Σy² = 35,626.09

x̄ = Σx/n = 6,980.65/23 = 303.5065    ȳ = Σy/n = 576.3/23 = 25.0565

SSxy = Σxy − (Σx)(Σy)/n = 396,603.225 − 6,980.65(576.3)/23 = 221,692.4165

SSxx = Σx² − (Σx)²/n = 4,933,198.773 − (6,980.65)²/23 = 2,814,525.972

β̂1 = SSxy/SSxx = 221,692.4165/2,814,525.972 = 0.07876723 ≈ 0.0788

β̂0 = ȳ − β̂1x̄ = 25.0565 − (.07876723)(303.5065) = 1.1501335 ≈ 1.1501

The fitted regression line is ŷ = 1.1501 + .0788x. We would estimate that the movie’s opening weekend revenue would increase by (.0788)(100) = 7.88 million dollars as the tweet rate increases by 100.

11.28

a. Using MINITAB, the results are:

Regression Analysis: VSHARE versus CDIFF

Analysis of Variance

Source        DF    Adj SS     Adj MS    F-Value   P-Value
Regression     1       9.08     9.080    0.18      0.676
Error         22    1116.78    50.763
Total         23    1125.86

Model Summary

S          R-sq     R-sq(adj)   R-sq(pred)
7.12481    0.81%    0.00%       0.00%

Coefficients

Term        Coef      SE Coef   T-Value   P-Value   VIF
Constant    49.57     1.56      31.76     0.000
CDIFF       0.0275    0.0650     0.42     0.676     1.00

Regression Equation

VSHARE = 49.57 + 0.0275 CDIFF

The least squares line is ŷ = 49.57 + .0275x.

b. Using MINITAB, the plot of the least squares line is:

[Fitted Line Plot: VSHARE = 49.57 + 0.02748 CDIFF; S = 7.12481, R-Sq = 0.8%, R-Sq(adj) = 0.0%; vertical axis VSHARE, horizontal axis CDIFF]

There appears to be a weak, positive linear relationship between the variables.

c. β̂1 = .0275. For every one unit increase in the difference between the Democratic and Republican charisma value, the mean Democratic vote share is estimated to increase by .0275.

11.29

a. Answers will vary. Suppose that the dependent variable is academic reputation score and the independent variable is average financial aid.

b. Using MINITAB, the results are:

Regression Equation

Academic Rep Score = 48.96 + 0.000981 Avg Financial Aid

The least squares line is ŷ = 48.96 + .000981x. β̂0 = 48.96. Since the range of average financial aid does not include 0, β̂0 has no practical interpretation. β̂1 = .000981. For every one dollar increase in average financial aid, the mean academic reputation score is estimated to increase by .000981.

11.30

We will fit the model E(y) = β0 + β1x. Some preliminary calculations are:

Σx = 526    Σy = 60.1    Σxy = 586.95    Σx² = 18,936    Σy² = 262.2708

x̄ = Σx/n = 526/23 = 22.86956522    ȳ = Σy/n = 60.1/23 = 2.613043478

SSxy = Σxy − (Σx)(Σy)/n = 586.95 − 526(60.1)/23 = 586.95 − 1,374.46087 = −787.51087

SSxx = Σx² − (Σx)²/n = 18,936 − (526)²/23 = 18,936 − 12,029.3913 = 6,906.6087

β̂1 = SSxy/SSxx = −787.51087/6,906.6087 = −0.114022801 ≈ −0.114

β̂0 = ȳ − β̂1x̄ = 2.613043478 − (−0.114022801)(22.86956522) = 5.220695365 ≈ 5.221

The fitted regression line is: ŷ = 5.221 − 0.114x

Since the estimate of the coefficient of the time variable is negative, it indicates that the spill tends to diminish as time increases. It is estimated that for each minute of time, the mean mass will diminish by .114 pounds.

Using MINITAB, the fitted regression line is:

[Fitted Line Plot: Mass = 5.221 − 0.1140 Time; S = 0.857257, R-Sq = 85.3%, R-Sq(adj) = 84.6%; vertical axis Mass, horizontal axis Time]

We can see that as time increases, the mass decreases. However, it appears that the mass decreases at a non-constant rate. A curvilinear model might be a better fit.

11.31

The graph in b would have the smallest s² because the width of the data points is the smallest.

11.32

a. SSE = SSyy − β̂1SSxy = 95 − .75(50) = 57.5

s² = SSE/(n − 2) = 57.5/(20 − 2) = 3.19444

b. SSyy = Σy² − (Σy)²/n = 860 − (50)²/40 = 797.5

SSE = SSyy − β̂1SSxy = 797.5 − .2(2700) = 257.5

s² = SSE/(n − 2) = 257.5/(40 − 2) = 6.776315789 ≈ 6.7763

c. SSyy = Σ(yi − ȳ)² = 58

β̂1 = SSxy/SSxx = 91/170 = .535294117

SSE = SSyy − β̂1SSxy = 58 − .535294117(91) = 9.2882353 ≈ 9.288

s² = SSE/(n − 2) = 9.2882353/(10 − 2) = 1.161029413 ≈ 1.1610

11.33

a. s² = SSE/(n − 2) = 8.34/(26 − 2) = .3475

b. We would expect most of the observations to be within 2s = 2√.3475 ≈ 1.179 of the least squares line.

11.34

SSE = SSyy − β̂1SSxy where SSyy = Σyi² − (Σyi)²/n

For Exercise 11.14,

Σyi² = 159    Σyi = 31    SSxy = −26.2857143    β̂1 = −.779661017

SSyy = 159 − (31)²/7 = 159 − 137.2857143 = 21.7142857

Therefore, SSE = 21.7142857 − (−.779661017)(−26.2857143) = 1.22033896 ≈ 1.2203

s² = SSE/(n − 2) = 1.22033896/(7 − 2) = .244067792, s = √.244067792 = .4940
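The shortcut SSE = SSyy − β̂1SSxy and the estimates s² = SSE/(n − 2) and s = √s² are easy to verify numerically. The sketch below uses a hypothetical helper `residual_variance` and plugs in the Exercise 11.14 quantities quoted above.

```python
import math


def residual_variance(ss_yy, b1, ss_xy, n):
    """Return (SSE, s squared, s) using the shortcut SSE = SSyy - b1 * SSxy."""
    sse = ss_yy - b1 * ss_xy
    s2 = sse / (n - 2)  # n - 2 degrees of freedom for error in simple linear regression
    return sse, s2, math.sqrt(s2)


# y values from Exercise 11.14; slope and SSxy as reported in that solution.
y = [2, 4, 2, 5, 7, 6, 5]
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / len(y)
sse, s2, s = residual_variance(ss_yy, b1=-0.779661017, ss_xy=-26.2857143, n=len(y))
print(round(sse, 4), round(s2, 4), round(s, 4))
```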

We would expect about 95% of the observations to fall within 2s or 2(.4940) = .988 units of the least squares prediction line.

11.35

a. s² = SSE/(n − 2) = 1.04/(28 − 2) = .04 and s = √.04 = .2

b. We would expect most of the observations to be within 2s or 2(.2) = .4 units of the fitted regression line.

11.36

a. From the printout, SSE = 24,404.303, s² = MSE = 498.047, and s = 22.317.

b. s = 22.317. We would expect approximately 95% of the observed values of y (2019 Math SAT Score) to fall within 2s or 2(22.317) = 44.634 points of their least squares predicted values.

11.37

a. From the printout, s = .22.

b. We would expect approximately 95% of the observed values of y (ratio of repair to replacement cost of commercial pipe) to fall within 2s or 2(.22) = .44 units of their least squares predicted values.

11.38

a. From Exercise 11.22, s = 15.20.

b. We would expect approximately 95% of the observed values of y (estimated annual cost) to fall within 2s or 2(15.2) = 30.4 units of their least squares predicted values.

11.39

a. The straight line model would be: y = β0 + β1x + ε.

b. From the printout, the least squares prediction equation is: ŷ = 156.0 + 0.2602x.

c. The assumptions are:
i. The mean of the probability distribution of the random component ε is 0.
ii. The variance of the probability distribution of ε is constant for all settings of the independent variable x.
iii. The probability distribution of ε is normal.
iv. The values of ε associated with any two observed values of y are independent.

d. s = 301.78

e. About 95% of the observed values of y (total area) will fall within 2s or 2(301.780) = 603.560 thousands of square meters of their least squares predicted values.

11.40

a. s² = SSE/(n − 2) = 277.75, s = √277.75 = 16.666

b. The standard deviation can be interpreted practically. About 95% of the observed values of y (number of liquor stores selling tobacco) will fall within 2s or 2(16.666) = 33.332 stores of their least squares predicted values.

c. About 95% of the observed values of y (number of pharmacies selling tobacco) will fall within 2s or 2(2.84) = 5.68 stores of their least squares predicted values.

d. The pharmacy model has a much smaller forecast error.

11.41

a.

Using MINITAB, the results of fitting the regression line are:

Regression Analysis: AACC versus AAFEMA

The regression equation is
AACC = 0.249 + 0.00542 AAFEMA

Predictor    Coef        SE Coef     T       P
Constant     0.24885     0.02922     8.52    0.000
AAFEMA       0.005416    0.003245    1.67    0.102

S = 0.149176    R-Sq = 5.5%    R-Sq(adj) = 3.5%

Analysis of Variance

Source            DF    SS         MS         F       P
Regression         1    0.06200    0.06200    2.79    0.102
Residual Error    48    1.06817    0.02225
Total             49    1.13016

The estimate of σ² is MSE = .02225.

b. The estimate of σ is s = .149176.

c. The standard deviation in part b can be interpreted practically. The standard deviation is measured in the same units as the data. The variance is measured in square units and is very difficult to interpret.

d. The range of the values of the average annual number of public corruption convictions is from .06 to .71 or range = .71 − .06 = .65. Two standard deviations is 2(.149) = .298. Adding and subtracting 2 standard deviations from the mean would give a width of 2(.298) = .596. Thus, knowing the state’s average annual FEMA relief does not help very much in the prediction of a state’s average annual number of public corruption convictions compared with prediction without using the state’s average annual FEMA relief.

11.42

a. From Exercise 11.26, SSxy = −130.44167, β̂1 = −.002310625, Σy = 135.8, and Σy² = 769.72.

SSyy = Σy² − (Σy)²/n = 769.72 − (135.8)²/24 = 769.72 − 768.4016667 = 1.3183333

SSE = SSyy − β̂1SSxy = 1.3183333 − (−.002310625)(−130.44167) = 1.016931516 ≈ 1.017

s² = MSE = SSE/(n − 2) = 1.016931516/(24 − 2) = 0.046224159 ≈ .0462 and s = √0.046224159 = 0.215

b. s² is measured in square units. It is very difficult to explain something measured in square units.

c. s = 0.215. We would expect approximately 95% of the observed values of y (sweetness index) to fall within 2s or 2(0.215) = 0.43 units of their least squares predicted values.

11.43

Answers will vary. Suppose the dependent variable is again academic reputation score. Let the independent variable be average net cost. Using MINITAB, the results are:

Regression Equation

Academic Rep Score = 73.55 + 0.000108 Avg Net Cost

Model Summary

S          R-sq     R-sq(adj)   R-sq(pred)
13.4908    0.36%    0.00%       0.00%

From this analysis, the standard deviation is s = 13.4908. From the analysis in Exercise 11.29 with the independent variable as average financial aid, the standard deviation was s = 9.552. Since the standard deviation using average financial aid as the independent variable is smaller than the standard deviation using average net cost as the independent variable, average financial aid is a more accurate predictor for academic reputation score.

11.44

Some preliminary calculations for Brand A are:

Σx = 750    Σy = 44.8    Σxy = 2,022    Σx² = 40,500    Σy² = 168.70

SSxy = Σxy − (Σx)(Σy)/n = 2,022 − 750(44.8)/15 = −218

SSxx = Σx² − (Σx)²/n = 40,500 − (750)²/15 = 3,000

SSyy = Σy² − (Σy)²/n = 168.70 − (44.8)²/15 = 34.89733333

β̂1 = SSxy/SSxx = −218/3,000 = −0.0726666667 ≈ −0.0727

β̂0 = ȳ − β̂1x̄ = 44.8/15 − (−0.0726666667)(750/15) = 6.62

The least squares prediction equation for Brand A is: ŷ = 6.62 − 0.0727x

Some preliminary calculations for Brand B are:

Σx = 750    Σy = 58.9    Σxy = 2,622    Σx² = 40,500    Σy² = 270.89

SSxy = Σxy − (Σx)(Σy)/n = 2,622 − 750(58.9)/15 = −323

SSxx = Σx² − (Σx)²/n = 40,500 − (750)²/15 = 3,000

SSyy = Σy² − (Σy)²/n = 270.89 − (58.9)²/15 = 39.60933333

β̂1 = SSxy/SSxx = −323/3,000 = −0.1076666667 ≈ −0.1077

β̂0 = ȳ − β̂1x̄ = 58.9/15 − (−0.1076666667)(750/15) = 9.31

The least squares prediction equation for Brand B is: ŷ = 9.31 − 0.1077x

For Brand A, SSE = SSyy − β̂1SSxy = 34.89733333 − (−0.072666667)(−218) = 19.0560

s² = MSE = SSE/(n − 2) = 19.0560/(15 − 2) = 1.4658 and s = √1.4658 = 1.211

For Brand B, SSE = SSyy − β̂1SSxy = 39.60933333 − (−0.107666667)(−323) = 4.833

s² = MSE = SSE/(n − 2) = 4.833/(15 − 2) = 0.37177 and s = √0.37177 = .61

For Brand A, ŷ = 6.62 − .0727x. For x = 70, ŷ = 6.62 − .0727(70) = 1.531 and 2s = 2(1.211) = 2.422. Therefore, ŷ ± 2s ⇒ 1.531 ± 2.422 ⇒ (−.891, 3.953).

For Brand B, ŷ = 9.31 − .1077x. For x = 70, ŷ = 9.31 − .1077(70) = 1.771 and 2s = 2(.61) = 1.22. Therefore, ŷ ± 2s ⇒ 1.771 ± 1.22 ⇒ (.551, 2.991).

We are more confident with Brand B since there is less variation (s is smaller).

11.45

a.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 10 − 2 = 8, \( t_{.025} = 2.306 \). The 95% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow \hat{\beta}_1 \pm t_{.025} \frac{s}{\sqrt{SS_{xx}}} \Rightarrow 31 \pm 2.306 \frac{3}{\sqrt{35}} \Rightarrow 31 \pm 1.17 \Rightarrow (29.83, 32.17) \]

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = 8, \( t_{.05} = 1.860 \). The 90% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 31 \pm 1.860 \frac{3}{\sqrt{35}} \Rightarrow 31 \pm .94 \Rightarrow (30.06, 31.94) \]

b.

\[ s^2 = \frac{SSE}{n-2} = \frac{1960}{14-2} = 163.33, \qquad s = \sqrt{s^2} = 12.7802 \]

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 14 − 2 = 12, \( t_{.025} = 2.179 \). The 95% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow \hat{\beta}_1 \pm t_{.025} \frac{s}{\sqrt{SS_{xx}}} \Rightarrow 64 \pm 2.179 \frac{12.7802}{\sqrt{30}} \Rightarrow 64 \pm 5.08 \Rightarrow (58.92, 69.08) \]

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = 12, \( t_{.05} = 1.782 \). The 90% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 64 \pm 1.782 \frac{12.7802}{\sqrt{30}} \Rightarrow 64 \pm 4.16 \Rightarrow (59.84, 68.16) \]


c.

\[ s^2 = \frac{SSE}{n-2} = \frac{146}{20-2} = 8.1111, \qquad s = \sqrt{s^2} = 2.848 \]

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 20 − 2 = 18, \( t_{.025} = 2.101 \). The 95% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow \hat{\beta}_1 \pm t_{.025} \frac{s}{\sqrt{SS_{xx}}} \Rightarrow -8.4 \pm 2.101 \frac{2.848}{\sqrt{64}} \Rightarrow -8.4 \pm .75 \Rightarrow (-9.15, -7.65) \]

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = 18, \( t_{.05} = 1.734 \). The 90% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow -8.4 \pm 1.734 \frac{2.848}{\sqrt{64}} \Rightarrow -8.4 \pm .62 \Rightarrow (-9.02, -7.78) \]

11.46

a.

Using MINITAB, the scatterplot is:

[Fitted line plot of y versus x: y = 0.5357 + 0.8214 x; S = 1.19224, R-Sq = 72.7%, R-Sq(adj) = 67.2%]

b.

Some preliminary calculations are:

\[ \sum x = 21 \qquad \sum y = 21 \qquad \sum xy = 86 \qquad \sum x^2 = 91 \qquad \sum y^2 = 89 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 86 - \frac{21(21)}{7} = 23 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 91 - \frac{21^2}{7} = 28 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 89 - \frac{21^2}{7} = 26 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{23}{28} = .821428571 \approx .821 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{21}{7} - .821428571\left(\frac{21}{7}\right) = 3 - 2.4642857 = .535714285 \approx .536 \]

The fitted line is \( \hat{y} = .536 + .821x \).

c.

See the plot in part a.

d.

To test whether x contributes significant information for predicting y, we test:

H0: β1 = 0
Ha: β1 ≠ 0

e.

The test statistic is \( t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} \), where \( s_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}} \).

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 26 - .821428571(23) = 7.107142857 \]

\[ s^2 = \frac{SSE}{n-2} = \frac{7.107142857}{7-2} = 1.421428571, \qquad s = \sqrt{1.42143} = 1.1922 \]

\[ s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{1.1922}{\sqrt{28}} = .2253 \]

\[ t = \frac{.82143 - 0}{.2253} = 3.646 \]

The degrees of freedom for this t is df = n − 2 = 7 − 2 = 5.

f.

The rejection region requires α / 2 = .05 / 2 = .025 in each tail of the t-distribution. From Table III, Appendix D, t.025 = 2.571 with df = n − 2 = 7 − 2 = 5 . The rejection region is 𝑡 < −2.571 or 𝑡 > 2.571. Since the observed value of the test statistic falls in the rejection region (𝑡 = 3.646 > 2.571), H0 is rejected. There is sufficient evidence to indicate that x contributes information for the prediction of y at 𝛼 = .05.
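Parts b through f chain several small computations together, so a short end-to-end sketch may help when checking the arithmetic. The function name `slope_t_test` is hypothetical, and the critical value 2.571 is taken from the table lookup above rather than computed.

```python
import math


def slope_t_test(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2, t_crit):
    """Two-tailed t test of H0: beta1 = 0 from summary sums (illustrative helper).

    Returns (t statistic, True if |t| exceeds the supplied critical value).
    """
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy
    s = math.sqrt(sse / (n - 2))
    t_stat = b1 / (s / math.sqrt(ss_xx))
    return t_stat, abs(t_stat) > t_crit


# Exercise 11.46 sums with n = 7 and t_.025 = 2.571 (df = 5)
t_stat, reject = slope_t_test(7, 21, 21, 86, 91, 89, 2.571)
print(round(t_stat, 3), reject)
```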

11.47

From Exercise 11.46, \( \hat{\beta}_1 = .82 \), s = 1.1922, \( SS_{xx} = 28 \), and n = 7. For confidence coefficient .80, α = .20 and α/2 = .20/2 = .10. From Table III, Appendix D, with df = n − 2 = 7 − 2 = 5, \( t_{.10} = 1.476 \). The 80% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.10}\, s_{\hat{\beta}_1} \Rightarrow .82 \pm 1.476 \frac{1.1922}{\sqrt{28}} \Rightarrow .82 \pm 1.476(.2253) \Rightarrow .82 \pm .33 \Rightarrow (.49, 1.15) \]

For confidence coefficient .98, α = .02 and α/2 = .02/2 = .01. From Table III, Appendix D, with df = 5, \( t_{.01} = 3.365 \). The 98% confidence interval for \( \beta_1 \) is:

\[ \hat{\beta}_1 \pm t_{.01}\, s_{\hat{\beta}_1} \Rightarrow .82 \pm 3.365 \frac{1.1922}{\sqrt{28}} \Rightarrow .82 \pm 3.365(.2253) \Rightarrow .82 \pm .76 \Rightarrow (.06, 1.58) \]

11.48

Some preliminary calculations are:

\[ \sum x = 19 \qquad \sum y = 21 \qquad \sum xy = 65 \qquad \sum x^2 = 65 \qquad \sum y^2 = 91 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 65 - \frac{19(21)}{6} = -1.5 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 65 - \frac{19^2}{6} = 4.8333333 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 91 - \frac{21^2}{6} = 17.5 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-1.5}{4.8333333} = -.310344829 \approx -.3103 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 17.5 - (-.310344829)(-1.5) = 17.03448276 \]

\[ s^2 = \frac{SSE}{n-2} = \frac{17.03448276}{6-2} = 4.25862069, \qquad s = \sqrt{4.25862069} = 2.0636 \]

To determine whether a straight line is useful for characterizing the relationship between x and y, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-.3103 - 0}{2.0636 / \sqrt{4.8333333}} = -.33 \]

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 6 − 2 = 4. From Table III, Appendix D, \( t_{.025} = 2.776 \). The rejection region is t < −2.776 or t > 2.776. Since the observed value of the test statistic does not fall in the rejection region (t = −.33 ≮ −2.776), H0 is not rejected. There is insufficient evidence to indicate that a straight line is useful for characterizing the relationship between x and y at α = .05.

11.49

a.

To determine if the average state Math SAT score in 2019 has a positive relationship with the average state Math SAT score in 2017, we test:

H0: β1 = 0
Ha: β1 > 0

b.

From the printout in Exercise 11.19, the p-value is p < 0.0001. This is the p-value for a two-tailed test. Because the estimated slope is positive, the p-value for this one-tailed test is half of that, so p < 0.0001/2 = 0.00005. Since the p-value is less than α = .05, H0 is rejected. There is sufficient evidence to indicate the average state Math SAT score in 2019 has a positive relationship with the average state Math SAT score in 2017 at α = .05.
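Halving a two-tailed p-value is only valid when the sign of the test statistic agrees with the direction of Ha; otherwise the one-tailed p-value is 1 − p/2. The sketch below makes that rule explicit. The function name and the `alternative` labels are assumptions for illustration, and the sample inputs (t = 2.34, two-tailed p = .023) are the values quoted for Exercise 11.62 later in this chapter.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed p-value from a printout to a one-tailed p-value.

    alternative is 'greater' (Ha: beta1 > 0) or 'less' (Ha: beta1 < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    # Halve only when the observed sign matches the direction of Ha
    return two_tailed_p / 2 if agrees else 1 - two_tailed_p / 2


p_upper = one_tailed_p(0.023, 2.34, "greater")
p_lower = one_tailed_p(0.023, 2.34, "less")
print(p_upper, p_lower)
```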

c.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 51 − 2 = 49, \( t_{.025} \approx 2.011 \). The 95% confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .9823 \pm 2.011(.0670) \Rightarrow .9823 \pm .1347 \Rightarrow (.8476, 1.1170) \]

We are 95% confident that for each additional point on the 2017 average state Math SAT score, the increase in the mean 2019 average state Math SAT score is between .8476 and 1.1170.

11.50

a.

To determine if the weight gain decreases linearly with percentage amount of whole banana meal, we test:

H0: β1 = 0
Ha: β1 < 0

b.

No value of α was given in the problem, so we will use .05. Since the p-value is less than α (p < .05 = α), H0 is rejected. There is sufficient evidence to indicate weight gain decreases linearly with percentage amount of whole banana meal at α = .05.

11.51

a.

The sign of β1 should be positive. As the number of daughters increases, the AAUW score should be higher.

b.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table II, Appendix D, \( z_{.025} = 1.96 \). The 95% confidence interval is:

\[ \hat{\beta}_1 \pm z_{.025}\, s_{\hat{\beta}_1} \Rightarrow .27 \pm 1.96(.74) \Rightarrow .27 \pm 1.4504 \Rightarrow (-1.1804, 1.7204) \]

c.

Since 0 falls in the confidence interval found in part b, there is no evidence to reject H0. There is insufficient evidence to indicate the number of daughters is linearly related to the AAUW score at α = .05.

11.52

a.

The simple linear regression model is E ( y ) = β 0 + β 1 x .

b.

We would expect β 1 to be positive. As the helicopter parent’s score goes up, we would expect the entitlement score to go up.

c.

Since the p-value is less than α (p = .002 < .01), H0 is rejected. There is sufficient evidence to indicate that helicopter parents lead to an entitlement mentality at α = .01.

11.53

To determine if the cost ratio increases linearly with pipe diameter, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = 14.87 and the p-value is p = .000/2 = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate that the cost ratio increases linearly with pipe diameter at α = .05. From the printout, the 95% confidence interval for the increase in cost ratio for every 1 millimeter increase in pipe diameter is (.004077, .005494).

11.54

From Exercise 11.26, \( SS_{xy} = -130.44167 \), \( \hat{\beta}_1 = -0.002310625 \), and \( SS_{xx} = 56{,}452.95833 \).

\[ \sum y = 135.8 \qquad \sum y^2 = 769.72 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 769.72 - \frac{135.8^2}{24} = 1.3183333 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1.3183333 - (-0.002310625)(-130.44167) = 1.016931516 \]

\[ s^2 = MSE = \frac{SSE}{n-2} = \frac{1.016931516}{24-2} = 0.046224159 \quad \text{and} \quad s = \sqrt{0.046224159} = 0.214998 \]

\[ s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{.21499}{\sqrt{56{,}452.95833}} = .0009049 \]

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 24 − 2 = 22, \( t_{.025} = 2.074 \). The confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow -0.0023 \pm 2.074(0.0009049) \Rightarrow -0.0023 \pm 0.0019 \Rightarrow (-0.0042, -0.0004) \]

We are 95% confident that for each additional unit increase in the amount of soluble pectin, the mean sweetness index will decrease by between .0004 and .0042 points.

11.55

a.

The MINITAB printout is shown here:

Coefficients

Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  503.9  59.6     8.46     0.000
Exp       3.56   2.20     1.62     0.123    1.00

To determine if the slope of the line is positive, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = 1.62 and the p-value is p = .123/2 = .0615.

Since the p-value is not less than α (p = .0615 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate there is a positive linear relationship between farmer’s experience and FROPS pivot pin-top distance at α = .05.

b.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 20 − 2 = 18, \( t_{.025} = 2.101 \). The confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow 3.56 \pm 2.101(2.20) \Rightarrow 3.56 \pm 4.6222 \Rightarrow (-1.0622, 8.1822) \]

Since 0 falls in this confidence interval, there is no evidence to reject H0. There is insufficient evidence to indicate there is a positive linear relationship between farmer’s experience and FROPS pivot pin-top distance at α = .05.

11.56

a.

The simple linear regression model is \( E(y) = \beta_0 + \beta_1 x \).

b.

No, the y-intercept will have no practical meaning. The value of x (beauty index) is measured on a scale from 1 to 5, averaged, and then divided by the standard deviation. These values will always be greater than 0. Thus, 0 will not be in the observed range for x.

c.

\( \hat{\beta}_1 = 22.91 \). For each unit increase in the beauty index, the mean percentage of votes obtained is estimated to increase by 22.91.

d.

To determine if the slope of the line is positive, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is \( t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 6.142 \).

The rejection region requires α = .01 in the upper tail of the t-distribution with df = n − 2 = 641 − 2 = 639. From Table II, Appendix D, \( t_{.01} = 2.326 \). The rejection region is t > 2.326. Since the observed value of the test statistic falls in the rejection region (t = 6.142 > 2.326), H0 is rejected. There is sufficient evidence to indicate that there is a positive relationship between beauty index and relative success of political candidates at α = .01.

11.57

a.

To determine if driving accuracy decreases linearly as driving distance increases, we test:

H0: β1 = 0
Ha: β1 < 0

b.

From MINITAB:

Coefficients

Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  98.8     29.4     3.37     0.002
DISTANCE  -0.1321  0.0977   -1.35    0.184    1.00

The test statistic is t = −1.35. The p-value is p = .184/2 = .092.

c.

Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate driving accuracy decreases linearly as driving distance increases at α = .01.

11.58

To determine if there is a linear relationship between a state’s average annual number of public corruption convictions and a state’s average annual FEMA relief, we test:

H0: β1 = 0
Ha: β1 ≠ 0

From Exercise 11.24, the test statistic is t = 1.67 and the p-value is p = .102. Since the p-value is not less than α (p = .102 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate there is a linear relationship between a state’s average annual number of public corruption convictions and a state’s average annual FEMA relief at α = .01.

11.59

To determine if the simple linear regression model is statistically useful for predicting Democratic vote share, we test:

H0: β1 = 0
Ha: β1 ≠ 0

From Exercise 11.28, the test statistic is t = .42 and the p-value is p = .676. Since the p-value is not less than α (p = .676 ≮ .10), H0 is not rejected. There is insufficient evidence to indicate the simple linear regression model is statistically useful for predicting Democratic vote share at α = .10.

11.60

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = n − 2 = 12 − 2 = 10, \( t_{.05} = 1.812 \). The 90% confidence interval is:

\[ \hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 1.87 \pm 1.812(.364) \Rightarrow 1.87 \pm .660 \Rightarrow (1.21, 2.53) \]

We are 90% confident that the increase in the mean cost of adding military aircraft to the JSF program each year is between $1.21 and $2.53 million.

11.61

a.

For every one-point increase in the socioeconomic deprivation scale, we are 95% confident that the mean number of liquor stores selling tobacco will increase by between 10.48 and 17.86 stores.

b.

In repeated sampling, 95% of the intervals created would contain the true slope of the regression line.

11.62

Using MINITAB, the results are:

Regression Analysis: Academic Rep Score versus Early Career Pay

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1   887.3   887.3     5.49    0.023
Error       48  7752.9   161.5
Total       49  8640.2

Model Summary

S        R-sq    R-sq(adj)
12.7090  10.27%  8.40%

Coefficients

Term              Coef      SE Coef   T-Value  P-Value
Constant          51.9      10.6      4.88     0.000
Early Career Pay  0.000441  0.000188  2.34     0.023

Regression Equation

Academic Rep Score = 51.9 + 0.000441 Early Career Pay

To determine if there is a positive linear relationship between academic reputation score and early career median salary, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = 2.34 and the p-value is p = .023/2 = .0115. Since the p-value is less than α (p = .0115 < .05), H0 is rejected. There is sufficient evidence to indicate there is a positive linear relationship between academic reputation score and early career median salary at α = .05.

If we use α = .01, the conclusion changes. Since the p-value is not less than α (p = .0115 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate there is a positive linear relationship between academic reputation score and early career median salary at α = .01.

11.63

For the conservative politician to be correct, we would expect to see a positive linear relationship between the homicide rate and the immigration rate. The observed slope was -.32. This does not support the politician’s theory at all. In fact, it provides evidence against the theory.

11.64

From Exercise 11.30, \( SS_{xy} = -787.51087 \), \( SS_{xx} = 6{,}906.6087 \), \( \sum y = 60.1 \), \( \sum y^2 = 262.2708 \), and \( \hat{\beta}_1 = -0.114022801 \).

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 262.2708 - \frac{(60.1)^2}{23} = 262.2708 - 157.043913 = 105.226887 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 105.226887 - (-0.114022801)(-787.51087) = 15.43269178 \]

\[ s^2 = MSE = \frac{SSE}{n-2} = \frac{15.43269178}{23-2} = 0.734890084 \quad \text{and} \quad s = \sqrt{0.734890084} = 0.8573 \]

\[ s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{0.8573}{\sqrt{6{,}906.6087}} = 0.010316 \]

To determine if the mass of the spill tends to diminish linearly as time increases, we test:

H0: β1 = 0
Ha: β1 < 0

The test statistic is

\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-0.1140}{0.010316} = -11.05 \]

The rejection region requires α = .05 in the lower tail of the t-distribution with df = n − 2 = 23 − 2 = 21. From Table III, Appendix D, \( t_{.05} = 1.721 \). The rejection region is t < −1.721. Since the observed value of the test statistic falls in the rejection region (t = −11.05 < −1.721), H0 is rejected. There is sufficient evidence to indicate the mass of the spill tends to diminish linearly as time increases at α = .05.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 23 − 2 = 21, \( t_{.025} = 2.080 \). The 95% confidence interval is:

\[ \hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow -0.1140 \pm 2.080(0.010316) \Rightarrow -0.1140 \pm 0.0215 \Rightarrow (-0.1355, -0.0925) \]

We are 95% confident that for each additional minute of elapsed time, the mean spill mass decreases by between 0.0925 and 0.1355.


11.65

a.

Using MINITAB, the results are:

Regression Analysis: SLUGPCT versus ELEVATION

The regression equation is
SLUGPCT = 0.515 + 0.000021 ELEVATION

Predictor  Coef        SE Coef     T      P
Constant   0.515140    0.007954    64.76  0.000
ELEVATION  0.00002074  0.00000719  2.89   0.008

S = 0.0369803   R-Sq = 23.6%   R-Sq(adj) = 20.7%

Analysis of Variance

Source          DF  SS        MS        F     P
Regression       1  0.011390  0.011390  8.33  0.008
Residual Error  27  0.036924  0.001368
Total           28  0.048314

To determine if a positive linear relationship exists between elevation and slugging percentage, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = 2.89 and the p-value is p = .008/2 = .004. Since the p-value is less than α (p = .004 < .01), H0 is rejected. There is sufficient evidence to indicate that a positive linear relationship exists between elevation and slugging percentage at α = .01.

b.

The scatterplot for the data is:

[Fitted line plot of SLUGPCT versus ELEVATION: SLUGPCT = 0.5151 + 0.000021 ELEVATION; S = 0.0369803, R-Sq = 23.6%, R-Sq(adj) = 20.7%]

The data point for Denver is very far from the rest. This point looks to be an outlier. It is much different from all of the other points.

c.

Removing the data point for Denver, the MINITAB output is:

Regression Analysis: SLUGPCT versus ELEVATION

The regression equation is
SLUGPCT = 0.515 + 0.000020 ELEVATION

Predictor  Coef        SE Coef     T      P
Constant   0.51537     0.01066     48.33  0.000
ELEVATION  0.00002012  0.00002034  0.99   0.332

S = 0.0376839   R-Sq = 3.6%   R-Sq(adj) = 0.0%

Analysis of Variance

Source          DF  SS        MS        F     P
Regression       1  0.001389  0.001389  0.98  0.332
Residual Error  26  0.036922  0.001420
Total           27  0.038311

To determine if a positive linear relationship exists between elevation and slugging percentage, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = .99 and the p-value is p = .332/2 = .166. Since the p-value is not less than α (p = .166 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate that a positive linear relationship exists between elevation and slugging percentage when Denver is removed from the data at α = .01. Since a significant linear relationship was found with Denver in the data set but not with Denver removed, the results support the “thin air” theory.

11.66

a.

r = 1 implies x and y have a perfect, positive linear relationship.

b.

r = −1 implies x and y have a perfect, negative linear relationship.

c.

r = 0 implies x and y are not linearly related.

d.

r = −.90 implies x and y have a negative linear relationship. Since r is close to −1, the strength of the relationship is very high.

e.

r = .10 implies x and y have a positive linear relationship. Since r is close to 0, the relationship is fairly weak.

f.

r = −.88 implies x and y have a negative linear relationship. Since r is close to −1, the relationship is fairly strong.

11.67

a.

If r = .7 , there is a positive relationship between x and y. As x increases, y tends to increase. The slope is positive.

b.

If r =−.7 , there is a negative relationship between x and y. As x increases, y tends to decrease. The slope is negative.

c.

If r = 0, the slope is 0. There is no linear relationship between x and y.

d.

If r 2 = .64 , then r is either .8 or −.8. The relationship between x and y could be either positive or negative.
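Part d shows that r² alone does not fix the sign of r; the sign comes from the slope of the fitted line. A minimal sketch of that rule follows, with a hypothetical helper name and placeholder slope values chosen only to show the two signs.

```python
import math


def r_from_r2(r_squared, slope):
    """Recover r from r^2 by giving it the sign of the estimated slope.

    Hypothetical helper; returns 0.0 when the slope is exactly 0.
    """
    if slope == 0:
        return 0.0
    return math.copysign(math.sqrt(r_squared), slope)


# r^2 = .64 with a positive slope, then with a negative slope (placeholder slopes)
print(r_from_r2(0.64, 2.0), r_from_r2(0.64, -2.0))
```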

11.68

a.

Using MINITAB, a scattergram of the data is:

[Scatterplot of y versus x]

Some preliminary calculations are:

\[ \sum x = 0 \qquad \sum y = 12 \qquad \sum xy = 20 \qquad \sum x^2 = 10 \qquad \sum y^2 = 70 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 20 - \frac{0(12)}{5} = 20 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10 - \frac{0^2}{5} = 10 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 70 - \frac{12^2}{5} = 41.2 \]

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{20}{\sqrt{10(41.2)}} = .9853 \]

\[ r^2 = .9853^2 = .9709 \]

Since r = .9853, there is a very strong positive linear relationship between x and y. Since \( r^2 = .9709 \), 97.09% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

b.

Using MINITAB, a scattergram of the data is:

[Scatterplot of y versus x]

Some preliminary calculations are:

\[ \sum x = 0 \qquad \sum y = 16 \qquad \sum xy = -15 \qquad \sum x^2 = 10 \qquad \sum y^2 = 74 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = -15 - \frac{0(16)}{5} = -15 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10 - \frac{0^2}{5} = 10 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 74 - \frac{16^2}{5} = 22.8 \]

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{-15}{\sqrt{10(22.8)}} = -.9934 \]

\[ r^2 = (-.9934)^2 = .9868 \]

Since r = −.9934, there is a very strong negative linear relationship between x and y. Since \( r^2 = .9868 \), 98.68% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

c.

Using MINITAB, a scattergram of the data is:

[Scatterplot of y versus x]

Some preliminary calculations are:

\[ \sum x = 18 \qquad \sum y = 14 \qquad \sum xy = 36 \qquad \sum x^2 = 52 \qquad \sum y^2 = 32 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 36 - \frac{18(14)}{7} = 0 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 52 - \frac{18^2}{7} = 5.71428571 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 32 - \frac{14^2}{7} = 4 \]

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{0}{\sqrt{5.71428571(4)}} = 0 \]

\[ r^2 = 0^2 = 0 \]

Since r = 0, this implies that x and y are not linearly related. Since \( r^2 = 0 \), 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

d.

Using MINITAB, the scattergram of the data is:

[Scatterplot of y versus x]

Some preliminary calculations are:

\[ \sum x = 15 \qquad \sum y = 4 \qquad \sum xy = 12 \qquad \sum x^2 = 71 \qquad \sum y^2 = 6 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 12 - \frac{15(4)}{5} = 0 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 71 - \frac{15^2}{5} = 26 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 6 - \frac{4^2}{5} = 2.8 \]

\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \frac{0}{\sqrt{26(2.8)}} = 0 \]

\[ r^2 = 0^2 = 0 \]

Since r = 0, this implies that x and y are not linearly related. Since \( r^2 = 0 \), 0% of the total sample variability around the sample mean response is explained by the linear relationship between x and y.

11.69

a.

From Exercises 11.14 and 11.34,

\[ r^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{1.22033896}{21.7142857} = 1 - .0562 = .9438 \]

94.38% of the total sample variability around the sample mean response is explained by the linear relationship between y and x.

b.

Some preliminary calculations are:

\[ \sum x = 33 \qquad \sum y = 27 \qquad \sum xy = 104 \qquad \sum x^2 = 179 \qquad \sum y^2 = 133 \]

\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 104 - \frac{33(27)}{7} = -23.2857143 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 179 - \frac{33^2}{7} = 23.4285714 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-23.2857143}{23.4285714} = -.99390244 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 133 - \frac{27^2}{7} = 28.8571429 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 28.8571429 - (-.99390244)(-23.2857143) = 5.71341462 \]

\[ r^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{5.71341462}{28.8571429} = 1 - .1980 = .802 \]

80.2% of the total sample variability around the sample mean response is explained by the linear relationship between y and x.

11.70

a.

The p-value is p = .33. Since the p-value is not small, H0 would not be rejected. There is insufficient evidence to indicate a linear relationship between cooperation use and average payoff for any value of α < .33.

b.

The p-value is p = .66. Since the p-value is not small, H0 would not be rejected. There is insufficient evidence to indicate a linear relationship between defection use and average payoff for any value of α < .66.

c.

The p-value is p = .001. Since the p-value is small, H0 would be rejected. There is sufficient evidence to indicate a linear relationship between punishment use and average payoff for any value of α > .001.

11.71

a.

\( r^2 = .18 \). 18% of the total sample variability around the sample mean number of points scored by a team that has a first-down is explained by the linear relationship between the number of points scored by a team that has a first-down and the number of yards from the opposing goal line.

b.

\( r = -\sqrt{.18} = -.424 \). The value of r will be negative because the sign of the estimate of β1 is negative.

11.72

a.

\( r^2 = .5610 \). 56.1% of the total sample variability around the mean perceived level of traditionalism is explained by the linear relationship with the perceived level of adaptation.

b.

r = −.7490. There is a very strong negative linear relationship between perceived level of traditionalism and perceived level of adaptation.

c.

To determine if there is a linear correlation between perceived level of traditionalism and perceived level of adaptation, we test:

H0: ρ = 0
Ha: ρ ≠ 0

The p-value is p = 0.0202. Since the p-value is smaller than α (p = .0202 < .05), H0 is rejected. There is sufficient evidence to indicate a true linear correlation exists between perceived level of traditionalism and perceived level of adaptation at α = .05.

11.73

a.

The linear model would be: E ( y ) = β 0 + β1 x

b.

r = .68. There is a moderate positive linear relationship between RMP and SET.

c.

Since 𝑟 = .68 is positive, the slope of the line will also be positive.

d.

The p-value is p = .001 . Since this value is so small, we would reject H0. There is sufficient evidence of a linear relationship between RMP and SET for any value of 𝛼 > .001.

e.

\( r^2 = .68^2 = .4624 \). 46.24% of the total sample variability around the sample mean SET values is explained by the linear relationship between SET and RMP.

11.74

\( r^2 = .81 \). 81% of the total sample variability around the sample mean weight gain is explained by the linear relationship between weight gain and the percentage amount of whole banana meal in a fish’s diet.

11.75

a.

r = .983 . There is a strong positive linear relationship between the number of females in managerial positions and the number of females with college degrees.

b.

r = .074 . There is a very weak positive linear relationship between the number of females in managerial positions and the number of female high school graduates with no college degree.

c.

r = .722. There is a moderately strong positive linear relationship between the number of males in managerial positions and the number of males with college degrees.

d.

r = .528. There is a moderately weak positive linear relationship between the number of males in managerial positions and the number of male high school graduates with no college degree.

11.76

a.

r = −.271. There is a weak negative linear relationship between a student’s last name position and response time.

b.

Since the p-value is less than α (p = .018 < .05), H0 is rejected. There is sufficient evidence of a negative linear relationship between a student’s last name position and response time.

c.

Yes. There is a statistically significant negative linear relationship between a student’s last name position and response time, although the relationship is rather weak. As the position of the last name increases, the response time decreases. This supports the researchers’ theory.

11.77

r = .213. There is a weak positive linear relationship between overall satisfaction and destination competitiveness.

Since r = .213 is positive, the slope of the line will also be positive.

The p-value is p = .001. Since this value is so small, we would reject H0. There is sufficient evidence of a linear relationship between overall satisfaction and destination competitiveness for any value of α > .001.

\( r^2 = .045 \). 4.5% of the total sample variability around the sample mean overall satisfaction scores is explained by the linear relationship between overall satisfaction and destination competitiveness.

11.78

a.

For the correlation between perceived hedonic intensity and perceived sensory intensity for favorite foods, r = .401. This implies that perceived hedonic intensity and perceived sensory intensity have a positive linear relationship. Because the value of r is not very large, the relationship is fairly weak. For the correlation between perceived hedonic intensity and perceived sensory intensity for least favorite foods, r = −.375. This implies that perceived hedonic intensity and perceived sensory intensity have a negative linear relationship. Because the value of r is not very large in magnitude, the relationship is fairly weak.

b.

Yes. The correlation between perceived hedonic intensity and perceived sensory intensity for favorite foods is positive, while the correlation between perceived hedonic intensity and perceived sensory intensity for least favorite foods is negative. Thus those with the highest perceived sensory intensity scores tend to have the highest perceived hedonic intensity scores for favorite foods and the lowest perceived hedonic intensity scores for least favorite foods. If we take the difference in the perceived hedonic intensity scores between favorite and least favorite foods, then those with the highest perceived sensory intensity scores should tend to have the greatest difference. Using MINITAB, the plot of the differences against the perceived sensory intensity scores is:

[Scatterplot of Diff versus PSI]

This graph confirms this belief.

11.79

Some preliminary calculations are:

\[ \sum x = 6{,}167 \qquad \sum y = 135.8 \qquad \sum xy = 34{,}764.5 \qquad \sum x^2 = 1{,}641{,}115 \qquad \sum y^2 = 769.72 \]

\[ SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 34{,}764.5 - \frac{6{,}167(135.8)}{24} = -130.44167 \]

\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1{,}641{,}115 - \frac{(6{,}167)^2}{24} = 56{,}452.95833 \]

\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 769.72 - \frac{135.8^2}{24} = 1.3183333 \]

\[ \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-130.44167}{56{,}452.95833} = -0.002310625 \]

\[ SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1.3183333 - (-0.002310625)(-130.44167) = 1.016931516 \]

\[ r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{1.3183333 - 1.016931516}{1.3183333} = .2286 \]

22.86% of the total sample variability around the sample mean sweetness index is explained by the linear relationship between the sweetness index and the amount of water soluble pectin.

\( r = -\sqrt{.2286} = -.478 \) (The value of r is negative because \( \hat{\beta}_1 \) is negative.)

Since the magnitude of this value is not close to 1, there is a rather weak negative linear relationship between the sweetness index and the amount of water soluble pectin.

11.80

Using MINITAB, the correlations are:

Correlations: Wives, Sons, Daughters

           Wives   Sons
Sons       0.011
           0.978
Daughters  -0.010  0.550
           0.980   0.125

Cell Contents: Pearson correlation
               P-Value

a.

The correlation between wives and sons is r = .011. Since this is very close to 0, there is no evidence of a linear correlation between the accompaniment rates of wives and sons.

b.

The correlation between wives and daughters is r = −.010. Since this is very close to 0, there is no evidence of a linear correlation between the accompaniment rates of wives and daughters.

c.

The correlation between sons and daughters is r = .550. There is a moderate positive linear relationship between the accompaniment rates of sons and daughters.

11.81

a.

To determine if the true population correlation coefficient relating NRMSE and bias is positive, we test: H0 : ρ = 0 Ha : ρ > 0

Copyright © 2022 Pearson Education, Inc.


582

Chapter 11

The test statistic is 𝑡 =

=

.

√ ,

.

=

.

= 17.75.

.

The p-value is p = P(t > 17.75). Using MINITAB with df = n − 2 = 3,600 − 2 = 3,598, p = P(t > 17.75) ≈ 0. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate the true population correlation coefficient relating NRMSE and bias is positive.

b.

No. Even though there is a significant positive linear relationship between NRMSE and bias, the relationship is very weak. The relationship is highly significant because the sample size is so large.

11.82

From Exercises 11.30 and 11.62, $SS_{xy} = -787.51087$, $SS_{xx} = 6,906.6087$, and $SS_{yy} = 105.226887$.

$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \dfrac{-787.51087}{\sqrt{6,906.6087(105.226887)}} = -.924$

There is a very strong negative linear relationship between mass of spill and elapsed time of the spill.

$r^2 = (-.924)^2 = .854$. Approximately 85.4% of the variability in the mass of the spill around the sample mean is explained by the linear relationship between mass of the spill and elapsed time of the spill.
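The correlation calculation above can be checked with a short script. This is only a sketch: the function name `correlation_from_sums` is a hypothetical helper, and the three inputs are the sums of squares quoted from Exercises 11.30 and 11.62.

```python
import math


def correlation_from_sums(ss_xy, ss_xx, ss_yy):
    """Return (r, r_squared) computed from the three sums of squares."""
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return r, r * r


# Values quoted in the solution to Exercise 11.82.
r, r_squared = correlation_from_sums(-787.51087, 6906.6087, 105.226887)
print(round(r, 3))
```

Squaring the rounded value −.924 gives the .854 reported above; squaring the unrounded r gives a slightly smaller figure (about .853), which is a rounding effect rather than a different result.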

11.83

To determine whether average earnings and height are positively correlated for those in the different occupations, we test: H0 : ρ = 0 Ha : ρ > 0

The test statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.

We will use α = .01 for all tests. The rejection region requires α = .01 in the upper tail of the t-distribution. Since all of the sample sizes are over 100, the rejection regions will all be approximately the same. Using df = ∞ and Table III, Appendix D, $t_{.01} \approx 2.33$. The rejection region is t > 2.33.

The test statistic is computed separately for Sales, Managers, Blue Collar Workers, Service Workers, Clerical Workers, Crafts/Forepersons, and Professional/Technical Workers. In increasing order, the seven computed values are 3.89, 4.82, 4.87, 5.29, 6.29, 6.68, and 7.95. For example, a sample with r = .24 and n = 250 gives

$t = \dfrac{.24\sqrt{250-2}}{\sqrt{1-.24^2}} = 3.89$
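As a rough check of the test statistic for a correlation coefficient, the sketch below evaluates $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for the one substitution shown (r = .24, n = 250) and compares it with the critical value 2.33. The function name is a hypothetical helper, not something taken from the text.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0 from a sample correlation r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t = correlation_t_statistic(0.24, 250)
critical_value = 2.33          # t_.01 with df = infinity, from Table III
reject_h0 = t > critical_value  # upper-tailed test
print(round(t, 2), reject_h0)
```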



Since the observed value of the test statistic falls in the rejection region for all occupations, H0 is rejected. There is sufficient evidence to indicate the average earnings and height for those in all occupations are positively correlated at α = .01. We cannot conclude that a person taller than oneself will earn a higher salary. These correlation coefficients indicate that although there is a significant correlation, the correlations are all fairly weak. There is a trend, but there is also much variation that is not explained.

11.84

a.

b.

Some preliminary calculations are:

$\sum x = 28$, $\sum x^2 = 224$, $\sum xy = 254$, $\sum y = 37$, $\sum y^2 = 307$

$SS_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 254 - \dfrac{28(37)}{7} = 106$

$SS_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 224 - \dfrac{28^2}{7} = 112$

$SS_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 307 - \dfrac{37^2}{7} = 111.4285714$

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{106}{112} = .946428571$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \dfrac{37}{7} - .946428571\left(\dfrac{28}{7}\right) = 1.5$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 111.4285714 - (.946428571)(106) = 11.1071429$

$s^2 = \dfrac{SSE}{n-2} = \dfrac{11.1071429}{7-2} = 2.22143$

$s = \sqrt{s^2} = \sqrt{2.22143} = 1.4904$

The least squares line is $\hat{y} = 1.5 + .946x$.

c.


d.

For $x_p = 3$, $\hat{y} = 1.5 + .946(3) = 4.338$ and $\bar{x} = 4$.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, $t_{.05} = 2.015$ with df = n − 2 = 7 − 2 = 5. The 90% confidence interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 4.338 \pm 2.015(1.4904)\sqrt{\dfrac{1}{7} + \dfrac{(3-4)^2}{112}} \Rightarrow 4.338 \pm 1.170 \Rightarrow (3.168, 5.508)$

e.

The 90% prediction interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 4.338 \pm 2.015(1.4904)\sqrt{1 + \dfrac{1}{7} + \dfrac{(3-4)^2}{112}} \Rightarrow 4.338 \pm 3.223 \Rightarrow (1.115, 7.561)$

f.

The 90% prediction interval for y is wider than the 90% confidence interval for the mean value of y when $x_p = 3$. The error of predicting a particular value of y will be larger than the error of estimating the mean value of y for a particular value of x. This is true since the error in estimating the mean value of y for a given x value is the distance between the least squares line and the true line of means, while the error in predicting some future value of y is the sum of two errors: the error of estimating the mean of y plus the random error that is a component of the value of y to be predicted.
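The comparison between the two intervals can be reproduced numerically. The sketch below uses the quantities from this exercise (n = 7, s = 1.4904, $t_{.05}$ = 2.015, $\bar{x}$ = 4, $SS_{xx}$ = 112); the function name `interval_half_widths` is a hypothetical helper, and the t value is taken from the table rather than computed.

```python
import math


def interval_half_widths(n, s, t_crit, x_p, x_bar, ss_xx):
    """Return (confidence half-width, prediction half-width) at x_p."""
    core = 1.0 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(core)        # mean value of y at x_p
    pi_half = t_crit * s * math.sqrt(1.0 + core)  # single new value of y at x_p
    return ci_half, pi_half


ci_half, pi_half = interval_half_widths(
    n=7, s=1.4904, t_crit=2.015, x_p=3, x_bar=4, ss_xx=112
)
print(round(ci_half, 3), round(pi_half, 3))
```

The only difference between the two half-widths is the extra 1 under the square root, which accounts for the random error of the individual observation.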

11.85

a.,b. The scattergram is:

c.

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 33.6 - .84318766(32.8) = 5.94344473$

$s^2 = \dfrac{SSE}{n-2} = \dfrac{5.94344473}{10-2} = .742930591$

$s = \sqrt{.742930591} = .8619$

$\bar{x} = \dfrac{31}{10} = 3.1$

The form of the confidence interval is $\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

For $x_p = 6$, $\hat{y} = -.414 + .843(6) = 4.64$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 10 − 2 = 8, $t_{.025} = 2.306$. The confidence interval is:

$4.64 \pm 2.306(.8619)\sqrt{\dfrac{1}{10} + \dfrac{(6-3.1)^2}{38.9}} \Rightarrow 4.64 \pm 1.12 \Rightarrow (3.52, 5.76)$

d.

For $x_p = 3.2$, $\hat{y} = -.414 + .843(3.2) = 2.28$. The confidence interval is:

$2.28 \pm 2.306(.8619)\sqrt{\dfrac{1}{10} + \dfrac{(3.2-3.1)^2}{38.9}} \Rightarrow 2.28 \pm .63 \Rightarrow (1.65, 2.91)$

For $x_p = 0$, $\hat{y} = -.414 + .843(0) = -.41$. The confidence interval is:

$-.41 \pm 2.306(.8619)\sqrt{\dfrac{1}{10} + \dfrac{(0-3.1)^2}{38.9}} \Rightarrow -.41 \pm 1.17 \Rightarrow (-1.58, .76)$

e.

The width of the confidence interval for the mean value of y depends on the distance of $x_p$ from $\bar{x}$. The interval for $x_p = 3.2$ is the narrowest because 3.2 is the closest to $\bar{x} = 3.1$. The interval for $x_p = 0$ is the widest because 0 is the farthest from $\bar{x} = 3.1$.
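The dependence of the interval width on the distance from $\bar{x}$ can be seen by evaluating the half-width at the three values of $x_p$ used in this exercise. This is an illustrative sketch; `ci_half_width` is a hypothetical helper whose defaults are the quantities quoted above (n = 10, s = .8619, $t_{.025}$ = 2.306, $\bar{x}$ = 3.1, $SS_{xx}$ = 38.9).

```python
import math


def ci_half_width(x_p, n=10, s=0.8619, t_crit=2.306, x_bar=3.1, ss_xx=38.9):
    """Half-width of the confidence interval for the mean of y at x_p."""
    return t_crit * s * math.sqrt(1.0 / n + (x_p - x_bar) ** 2 / ss_xx)


half_widths = {x_p: ci_half_width(x_p) for x_p in (6, 3.2, 0)}
for x_p, half in half_widths.items():
    # The further x_p lies from x_bar, the larger the half-width.
    print(x_p, round(abs(x_p - 3.1), 1), round(half, 2))
```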

11.86

a.

Some preliminary calculations are:

$\bar{y} = \dfrac{\sum y}{n} = \dfrac{22}{10} = 2.2$

$s^2 = \dfrac{\sum y^2 - \dfrac{(\sum y)^2}{n}}{n-1} = \dfrac{82 - \dfrac{(22)^2}{10}}{10-1} = 3.7333$ and $s = 1.9322$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, $t_{.025} = 2.262$ with df = n − 1 = 10 − 1 = 9. The 95% confidence interval is:

$\bar{y} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}} \Rightarrow 2.2 \pm 2.262\dfrac{1.9322}{\sqrt{10}} \Rightarrow 2.2 \pm 1.382 \Rightarrow (.818, 3.582)$


b.

c.

The confidence intervals computed in Exercise 11.85 are much narrower than that found in part a. Thus, x appears to contribute information about the mean value of y.

d.

From Exercise 11.85, $\hat{\beta}_1 = .843$, $s = .8619$, $SS_{xx} = 38.9$, and n = 10.

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s/\sqrt{SS_{xx}}} = \dfrac{.843 - 0}{.8619/\sqrt{38.9}} = 6.10$.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 10 − 2 = 8. From Table III, Appendix D, $t_{.025} = 2.306$. The rejection region is t < −2.306 or t > 2.306.

Since the observed value of the test statistic falls in the rejection region (t = 6.10 > 2.306), H0 is rejected. There is sufficient evidence to indicate the straight-line model contributes information for the prediction of y at α = .05.

11.87

a.

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{28}{32} = .875$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 4 - .875(3) = 1.375$

The least squares line is $\hat{y} = 1.375 + .875x$.

b.

The least squares line is:

c.

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 26 - .875(28) = 1.5$

d.

$s^2 = \dfrac{SSE}{n-2} = \dfrac{1.5}{10-2} = .1875$

e.

$s = \sqrt{.1875} = .4330$

For $x_p = 2.5$, $\hat{y} = 1.375 + .875(2.5) = 3.5625$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 10 − 2 = 8, $t_{.025} = 2.306$. The confidence interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 3.5625 \pm 2.306(.4330)\sqrt{\dfrac{1}{10} + \dfrac{(2.5-3)^2}{32}} \Rightarrow 3.5625 \pm .3279 \Rightarrow (3.2346, 3.8904)$

f.

For $x_p = 4$, $\hat{y} = 1.375 + .875(4) = 4.875$

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 10 − 2 = 8, $t_{.025} = 2.306$. The prediction interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 4.875 \pm 2.306(.4330)\sqrt{1 + \dfrac{1}{10} + \dfrac{(4-3)^2}{32}} \Rightarrow 4.875 \pm 1.062 \Rightarrow (3.813, 5.937)$
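The chain of calculations in Exercise 11.87 (slope, intercept, SSE, then s) can be sketched from the summary statistics alone. The function name `fit_from_sums` is a hypothetical helper; the inputs are the values used above ($SS_{xy}$ = 28, $SS_{xx}$ = 32, $SS_{yy}$ = 26, $\bar{x}$ = 3, $\bar{y}$ = 4, n = 10).

```python
import math


def fit_from_sums(ss_xy, ss_xx, ss_yy, x_bar, y_bar, n):
    """Least squares slope, intercept, SSE and s from summary statistics."""
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    sse = ss_yy - slope * ss_xy
    s = math.sqrt(sse / (n - 2))  # estimated standard deviation of the error term
    return slope, intercept, sse, s


slope, intercept, sse, s = fit_from_sums(28, 32, 26, x_bar=3, y_bar=4, n=10)
print(slope, intercept, sse, round(s, 4))
```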

11.88

a.

First, we need to find the least squares prediction line for predicting the average payoff based on the number of times a player used punishment. To predict the average payoff for a single player who used punishment 10 times, one would substitute x = 10 into the prediction equation to find the point estimate. To attach an interval to this estimate, one would use a prediction interval for an individual observation.

b.

One would use the same prediction equation as mentioned in part a. Again, one would substitute x = 10 into the prediction equation to find the point estimate for the average payoff for all players who used punishment 10 times. To attach an interval to this estimate, one would use a confidence interval for the mean.

11.89

a.

To determine if the model is adequate, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

From the printout, the test statistic is t = 6.59 and the p-value is 0.000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the model is adequate for α = .10.

b.

We are 95% confident that the average total area for all states with 2,000 structurally deficient bridges will be between 563.589 and 789.426 thousand square feet. We are 95% confident that the actual total area for a single state with 2,000 structurally deficient bridges will be between 59.9375 and 1,293.08 thousand square feet. To predict the area for a single state, the FHWA should use the prediction interval.

11.90

a.

To determine if the model is adequate, we test:

H0: β1 = 0
Ha: β1 ≠ 0

From the printout, the test statistic is t = 9.92 and the p-value is p < .0001. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the model is adequate for any reasonable value of α. In addition, r² = .8242. 82.42% of the sample variation in the revenue values is explained by the linear relationship between the number of tweets and the opening weekend box-office revenue. Thus, we would recommend that the model be used for prediction purposes.

b.

From the printout, the prediction interval for revenue of a movie with a tweet rate of 250 is (−7.46, 49.14). Since the revenue cannot be negative, the interval is (0, 49.14). We are 95% confident that the actual box-office revenue of a movie with 250 tweets per hour is between 0 and $49.14 million.

11.91

The 90% confidence interval for x = 275 is (5.536, 5.697). We are 90% confident that the mean sweetness index of all orange juice samples will be between 5.536 and 5.697 when the pectin value is 275 parts per million.

11.92

No, we would not recommend using the model to predict the number of public corruption convictions in a state with an annual FEMA relief of 5 thousand dollars. Since the p-value is p = .102 , there is no evidence of a linear relationship between the number of public corruption convictions and the annual FEMA relief.

11.93

a.

From MINITAB:

Settings
Variable  Setting
Exp            25

Prediction
    Fit   SE Fit              99% CI              99% PI
593.037  34.9733  (492.369, 693.706)  (139.088, 1046.99)

We are 99% confident that the pivot pin-top distance set by a farmer with 25 years of experience will fall between 139.088 and 1,046.99 millimeters.

b.

Settings
Variable  Setting
Exp            10

Prediction
    Fit   SE Fit              99% CI              99% PI
539.569  43.4897  (414.386, 664.751)  (79.5621, 999.576)

We are 99% confident that the pivot pin-top distance set by a farmer with 10 years of experience will fall between 79.5621 and 999.576 millimeters.

c.

The interval in part a is narrower because the value of experience used in part a (25 years) is closer to the average experience of all farmers in the sample than the value used in part b (10 years).


d.

We are 99% confident that the average pivot pin-top distance set by all farmers with 10 years of experience will fall between 414.386 and 664.751 millimeters.

e.

The confidence interval for E(y) will always be narrower than the prediction interval for y.

11.94

a.

From MINITAB:

Settings
Variable  Setting
DISTANCE      300

Prediction
    Fit   SE Fit              95% CI              95% PI
59.2186  1.18426  (56.8212, 61.6160)  (43.8727, 74.5645)

We are 95% confident that the actual driving accuracy for a golfer who drives the ball 300 yards is between 43.8727 and 74.5645.

b.

We are 95% confident that the mean driving accuracy for all golfers who drive the ball 300 yards is between 56.8212 and 61.6160.

c.

If we are interested in knowing the average driving accuracy of all PGA golfers who have a driving distance of 300 yards, we would use the confidence interval for E(y), the mean, found in part b. The prediction interval applies to the driving accuracy of one golfer, not the mean of all golfers.

11.95

a.

From Exercises 11.30 and 11.62, $\hat{\beta}_0 = 5.221$, $\hat{\beta}_1 = -.114$, $SS_{xx} = 6,906.6087$, and $s = .8573$.

For $x_p = 15$, $\hat{y} = 5.2207 - .11402(15) = 3.5104$

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table III, Appendix D, with df = n − 2 = 23 − 2 = 21, $t_{.005} = 2.831$. The confidence interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 3.5104 \pm 2.831(.8573)\sqrt{\dfrac{1}{23} + \dfrac{(15 - 22.8696)^2}{6,906.6087}} \Rightarrow 3.5104 \pm .5558 \Rightarrow (2.9546, 4.0662)$

We are 99% confident that the mean mass of all spills will be between 2.9546 and 4.0662 when the elapsed time is 15 minutes.

b.

For $x_p = 15$, $\hat{y} = 5.2207 - .11402(15) = 3.5104$

For confidence coefficient .99, α = .01 and α/2 = .01/2 = .005. From Table III, Appendix D, with df = n − 2 = 23 − 2 = 21, $t_{.005} = 2.831$. The prediction interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 3.5104 \pm 2.831(.8573)\sqrt{1 + \dfrac{1}{23} + \dfrac{(15 - 22.8696)^2}{6,906.6087}} \Rightarrow 3.5104 \pm 2.4898 \Rightarrow (1.0206, 6.0002)$

We are 99% confident that the actual mass of a spill will be between 1.0206 and 6.0002 when the elapsed time is 15 minutes.

c.

The prediction interval for the actual value is wider than the confidence interval for the mean. This will always be true. The prediction interval for the actual value contains two sources of error. First, we must locate the true mean of the distribution. Once this mean is located, the actual values of the variable can still vary around this mean. There is variance in locating the mean and then variance of the actual observations around the mean.

11.96

a.

Using MINITAB, the results are:

Regression Analysis: NITRO versus AMMON

Analysis of Variance
Source       DF  Adj SS   Adj MS  F-Value  P-Value
Regression    1   88104  88104.0  1075.57    0.000
Error       118    9666     81.9
Total       119   97770

Model Summary
      S    R-sq  R-sq(adj)
9.05061  90.11%     90.03%

Coefficients
Term        Coef  SE Coef  T-Value  P-Value
Constant    2.22     2.17     1.03    0.307
AMMON     0.5764   0.0176    32.80    0.000

Regression Equation
NITRO = 2.22 + 0.5764 AMMON

To determine if the model is adequate, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 32.80 and the p-value is p = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the model is adequate for any reasonable value of α. In addition, $r^2 = .9011$. 90.11% of the sample variation in the amounts of nitrogen removed is explained by the linear relationship between the amounts of nitrogen removed and the amounts of ammonium used. Thus, we would recommend that the model be used for the prediction of nitrogen amounts.

b.

Using MINITAB, the results are:

Prediction for NITRO

Regression Equation
NITRO = 2.22 + 0.5764 AMMON

Variable  Setting
AMMON         100

    Fit    SE Fit              95% CI              95% PI
59.8596  0.861905  (58.1528, 61.5664)  (41.8558, 77.8634)

The 95% prediction interval for nitrogen amount when the amount of ammonium used is 100 mg per liter is (41.86, 77.86). We are 95% confident that the amount of nitrogen removed when 100 mg of ammonium is used is between 41.86 and 77.86 mg.

c.

The 95% confidence interval for the mean amount of nitrogen removed when 100 mg of ammonium is used will be narrower. From the output, the 95% confidence interval for the mean is (58.15, 61.57).

d.

Using MINITAB, the results are:

Prediction for NITRO

Regression Equation
NITRO = 2.22 + 0.5764 AMMON

Variable  Setting
AMMON         100

    Fit    SE Fit              90% CI              90% PI
59.8596  0.861905  (58.4307, 61.2885)  (44.7870, 74.9322)

The 90% confidence interval for the mean nitrogen amount when the amount of ammonium used is 100 milligrams per liter will be narrower because the confidence level is lower. The interval is (58.43, 61.29).

11.97

a.

Using MINITAB, the results of the regression analysis are:

Regression Analysis: QuitRate versus AvgWage

The regression equation is
QuitRate = 4.86 - 0.347 AvgWage

Predictor      Coef  SE Coef      T      P
Constant     4.8615   0.5201   9.35  0.000
AvgWage    -0.34655  0.05866  -5.91  0.000

S = 0.4862   R-Sq = 72.9%   R-Sq(adj) = 70.8%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   8.2507  8.2507  34.90  0.000
Residual Error  13   3.0733  0.2364
Total           14  11.3240

To determine if the average hourly wage rate contributes information to predict quit rates, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = (β̂1 − 0)/s_β̂1 = −5.91 and the p-value is p = 0.000.

Since the p-value is less than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate that the average hourly wage rate contributes information to predict quit rates at α = .05. Since the slope is negative (β̂1 = −.34655), the model suggests that x and y have a negative relationship. As the average hourly wage rate increases, the quit rate tends to decrease.


b.

Some preliminary calculations are:

Σx = 129.05, Σx² = 1,178.9601

x̄ = Σx/n = 129.05/15 = 8.6033

SSxx = Σx² − (Σx)²/n = 1,178.9601 − (129.05)²/15 = 68.699933

ŷ = 4.8615 − 0.34655(9) = 1.743

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 15 − 2 = 13, t.025 = 2.160. The 95% prediction interval is:

ŷ ± t(α/2) s √(1 + 1/n + (xp − x̄)²/SSxx) ⇒ 1.743 ± 2.160(.4862) √(1 + 1/15 + (9 − 8.6033)²/68.699933) ⇒ 1.743 ± 1.086 ⇒ (0.657, 2.829)

We are 95% confident that the actual quit rate when the average hourly wage is $9.00 is between 0.657 and 2.829.

c.

The 95% confidence interval is:

ŷ ± t(α/2) s √(1/n + (xp − x̄)²/SSxx) ⇒ 1.743 ± 2.160(.4862) √(1/15 + (9 − 8.6033)²/68.699933) ⇒ 1.743 ± 0.276 ⇒ (1.467, 2.019)

We are 95% confident that the mean quit rate when the average hourly wage is $9.00 is between 1.467 and 2.019.

11.98

a.

From Exercise 11.44, $SS_{xx} = 3000$ and $\bar{x} = 50$. Also, for Brand A, s = 1.211; for Brand B, s = .610.

For Brand A, for $x_p = 45$, $\hat{y} = 6.62 - .0727(45) = 3.349$, while for Brand B, $\hat{y} = 9.31 - .1077(45) = 4.464$.

The degrees of freedom for both brands is df = n − 2 = 15 − 2 = 13.

For confidence coefficient .90 (i.e., for all parts of this question), α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = 13, $t_{.05} = 1.771$.

The form of both confidence intervals is $\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

For Brand A, we obtain:

$3.349 \pm 1.771(1.211)\sqrt{\dfrac{1}{15} + \dfrac{(45-50)^2}{3000}} \Rightarrow 3.349 \pm .587 \Rightarrow (2.762, 3.936)$

For Brand B, we obtain:

$4.464 \pm 1.771(.610)\sqrt{\dfrac{1}{15} + \dfrac{(45-50)^2}{3000}} \Rightarrow 4.464 \pm .296 \Rightarrow (4.168, 4.760)$

The first interval is wider, caused by the larger value of s.

b.

The form of both prediction intervals is $\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

For Brand A, we obtain:

$3.349 \pm 1.771(1.211)\sqrt{1 + \dfrac{1}{15} + \dfrac{(45-50)^2}{3000}} \Rightarrow 3.349 \pm 2.224 \Rightarrow (1.125, 5.573)$

For Brand B, we obtain:

$4.464 \pm 1.771(.610)\sqrt{1 + \dfrac{1}{15} + \dfrac{(45-50)^2}{3000}} \Rightarrow 4.464 \pm 1.120 \Rightarrow (3.344, 5.584)$

Again, the first interval is wider, caused by the larger value of s. Each of these intervals is wider than its counterpart from part a, since, for the same x, a prediction interval for an individual y is always wider than a confidence interval for the mean of y. This is due to an individual observation having a greater variance than the variance of the mean of a set of observations.

c.

To obtain a prediction interval for the life of a brand A cutting tool that is operated at 100 meters per minute, we use:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$

For x = 100, $\hat{y} = 6.62 - .0727(100) = -.65$. The degrees of freedom are n − 2 = 15 − 2 = 13.

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = 13, $t_{.025} = 2.160$. Here, we obtain:

$-.65 \pm 2.160(1.211)\sqrt{1 + \dfrac{1}{15} + \dfrac{(100-50)^2}{3000}} \Rightarrow -.65 \pm 3.606 \Rightarrow (-4.256, 2.956)$

The additional assumption would be that the straight-line model fits the data well not only for the x's actually observed but all the way up to the value under consideration, 100. Clearly from the estimated value of −.65, this is not true (usually, negative "useful lives" are not found).

11.99

Using MINITAB, the results are:

Regression Equation
AngerHos = 0.633 + 0.7626 PER

Coefficients
Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant   0.633    0.171     3.70    0.000
PER       0.7626   0.0520    14.66    0.000  1.00

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.985523  53.87%     53.62%      53.00%

To determine if perceptions of an exploitative relationship lead employees to feel anger and hostility toward their organizations, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$

The test statistic is t = 14.66 and the p-value is p = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that perceptions of an exploitative relationship lead employees to feel anger and hostility toward their organizations at any choice of α.

In addition, $r^2 = .5387$. 53.87% of the sample variation in the anger and hostility levels around their mean can be explained by the linear relationship between the anger and hostility levels and the perceptions of an exploitative relationship values. Thus, we would recommend that the model be used for the prediction of anger and hostility levels.

11.100 a.

Using MINITAB, the results are:

Regression Analysis: Efficiency versus Dust

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1  0.1255  0.1255     1.11    0.323
Error        8  0.9039  0.1130
Total        9  1.0294

Model Summary
       S    R-sq  R-sq(adj)
0.336142  12.19%      1.21%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value
Constant  1.411    0.265     5.32    0.001
Dust        349      331     1.05    0.323

Regression Equation
Efficiency = 1.411 + 349 Dust

The fitted regression line is $\hat{y} = 1.411 + 349x$.

b.

To determine if the model is adequate for predicting efficiency, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 1.05 and the p-value is p = .323. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate the model is adequate for any reasonable value of α. We would not recommend that the model be used to predict the efficiency of the solar panel.

11.101 a.

Using MINITAB, the results are:

Regression Analysis: Corrupt versus GDP

Analysis of Variance
Source      DF  Adj SS   Adj MS  F-Value  P-Value
Regression   1  3345.8  3345.76    45.33    0.000
Error       11   811.9    73.81
Total       12  4157.7

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
8.59141  80.47%     78.70%      70.72%

Coefficients
Term          Coef   SE Coef  T-Value  P-Value   VIF
Constant     25.89      3.09     8.37    0.000
GDP       0.000985  0.000146     6.73    0.000  1.00

Regression Equation
Corrupt = 25.89 + 0.000985 GDP

To determine if the model is adequate, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 6.73 and the p-value is p = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the model is adequate for predicting corruption level for any reasonable value of α. In addition, $r^2 = .8047$. 80.47% of the sample variation in the corruption levels is explained by the linear relationship between the corruption levels and the GDP per capita. Thus, we would recommend that the model be used for the prediction of corruption levels.

b.

Using MINITAB, the results are:

Regression Analysis: Corrupt versus PolR

Analysis of Variance
Source          DF  Adj SS  Adj MS  F-Value  P-Value
Regression       1  2527.6  2527.6    17.06    0.002
  PolR           1  2527.6  2527.6    17.06    0.002
Error           11  1630.0   148.2
  Lack-of-Fit    4   807.3   201.8     1.72    0.250
  Pure Error     7   822.8   117.5
Total           12  4157.7

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
12.1732  60.79%     57.23%      41.39%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  66.06     7.34     9.00    0.000
PolR      -6.25     1.51    -4.13    0.002  1.00

Regression Equation
Corrupt = 66.06 - 6.25 PolR

To determine if the model is adequate, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = −4.13 and the p-value is p = .002. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the model is adequate for predicting corruption level for any reasonable value of α. In addition, $r^2 = .6079$. 60.79% of the sample variation in the corruption levels is explained by the linear relationship between the corruption levels and the degree of freedom in political rights. Thus, we would recommend that the model be used for the prediction of corruption levels.

c.

The relationship between corruption levels and the GDP per capita is positive. The relationship between corruption levels and the degree of freedom in political rights is negative. Because the $r^2$ for the model using GDP per capita to predict corruption levels is greater than the $r^2$ for the model using the degree of freedom in political rights to predict corruption levels, using GDP per capita will give better predictions.

11.102 Using MINITAB, a scattergram of the data is:

[Scatterplot of WLB-SCORE vs HOURS: WLB-SCORE (0 to 80) on the vertical axis, HOURS (0 to 100) on the horizontal axis]

From the plot, it appears that there may be a negative linear relationship between WLB-scores and the average number of hours worked per week.

Using MINITAB, the results are:

Regression Analysis: WLB-SCORE versus HOURS

The regression equation is
WLB-SCORE = 62.5 - 0.347 HOURS

Predictor      Coef  SE Coef       T      P
Constant     62.499    1.414   44.22  0.000
HOURS      -0.34673  0.02761  -12.56  0.000

S = 12.2845   R-Sq = 7.0%   R-Sq(adj) = 7.0%

Analysis of Variance
Source            DF      SS     MS       F      P
Regression         1   23803  23803  157.73  0.000
Residual Error  2085  314647    151
Total           2086  338451

The fitted straight line model is $\hat{y} = 62.499 - .34673x$. For each additional hour worked per week, the mean WLB-score is estimated to decrease by .34673.

To determine if the model is adequate, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

From the printout, the test statistic is t = −12.56 and the p-value is p = 0.000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between the average number of hours worked per week and the WLB-score for any reasonable value of α. Since $\hat{\beta}_1$ is negative, as the average number of hours worked per week increases, the WLB-score decreases.

From the printout, $r^2 = 7\%$ or .07. This means that only 7% of the sample variation of the WLB-scores around their mean is explained by the linear relationship between the average number of hours worked per week and the WLB-scores. Even though the p-value for testing whether the model is adequate is extremely small, this model does not explain much of the variation. There is much variation in the WLB-scores that is not explained by the average number of hours worked per week.

11.103 a.

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{-88}{55} = -1.6$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 35 - (-1.6)(1.3) = 37.08$

The least squares line is $\hat{y} = 37.08 - 1.6x$.

b.

c.

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 198 - (-1.6)(-88) = 57.2$

d.

$s^2 = \dfrac{SSE}{n-2} = \dfrac{57.2}{15-2} = 4.4$

e.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = n − 2 = 15 − 2 = 13, $t_{.05} = 1.771$. The 90% confidence interval for $\beta_1$ is:

$\hat{\beta}_1 \pm t_{\alpha/2}\dfrac{s}{\sqrt{SS_{xx}}} \Rightarrow -1.6 \pm 1.771\dfrac{\sqrt{4.4}}{\sqrt{55}} \Rightarrow -1.6 \pm .50 \Rightarrow (-2.10, -1.10)$

We are 90% confident the change in the mean value of y for each unit change in x is between −2.10 and −1.10.

f.

For $x_p = 15$, $\hat{y} = 37.08 - 1.6(15) = 13.08$

The 90% confidence interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 13.08 \pm 1.771\left(\sqrt{4.4}\right)\sqrt{\dfrac{1}{15} + \dfrac{(15-1.3)^2}{55}} \Rightarrow 13.08 \pm 6.93 \Rightarrow (6.15, 20.01)$

g.

The 90% prediction interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 13.08 \pm 1.771\left(\sqrt{4.4}\right)\sqrt{1 + \dfrac{1}{15} + \dfrac{(15-1.3)^2}{55}} \Rightarrow 13.08 \pm 7.86 \Rightarrow (5.22, 20.94)$
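The confidence interval for the slope in part e of Exercise 11.103 can be sketched as follows. The function name `slope_confidence_interval` is a hypothetical helper, and the inputs are the values used in that part ($\hat{\beta}_1$ = −1.6, $s^2$ = 4.4, $SS_{xx}$ = 55, $t_{.05}$ = 1.771 from the table).

```python
import math


def slope_confidence_interval(slope, s_squared, ss_xx, t_crit):
    """Confidence interval for beta_1: slope +/- t * s / sqrt(SSxx)."""
    half_width = t_crit * math.sqrt(s_squared / ss_xx)
    return slope - half_width, slope + half_width


low, high = slope_confidence_interval(-1.6, 4.4, 55, 1.771)
print(round(low, 2), round(high, 2))
```

Because the whole interval lies below zero, it is consistent with a negative linear relationship between x and y at this confidence level.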



11.104 a.

Using MINITAB, the scattergram is:

[Scatterplot of y vs x]

b.

One possible line is $\hat{y} = x$.

x    y    ŷ    y − ŷ
1    1    1      0
3    3    3      0
5    5    5      0
                 0

For this example, $\sum (y - \hat{y}) = 0$.

A second possible line is $\hat{y} = 3$.

x    y    ŷ    y − ŷ
1    1    3     −2
3    3    3      0
5    5    3      2
                 0

For this example, $\sum (y - \hat{y}) = 0$.

c.

Some preliminary calculations are:

$\sum x = 9$, $\sum x^2 = 35$, $\sum xy = 35$, $\sum y = 9$, $\sum y^2 = 35$

$SS_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 35 - \dfrac{9(9)}{3} = 8$

$SS_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 35 - \dfrac{9^2}{3} = 8$

$SS_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} = 35 - \dfrac{9^2}{3} = 8$

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{8}{8} = 1$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = \dfrac{9}{3} - 1\left(\dfrac{9}{3}\right) = 0$

For $\hat{y} = x$, $SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 8 - 1(8) = 0$

For $\hat{y} = 3$, $SSE = \sum (y_i - \hat{y}_i)^2 = (1-3)^2 + (3-3)^2 + (5-3)^2 = 8$

The least squares line is $\hat{y} = 0 + 1x = x$.

d.

The least squares line has the smallest SSE of all possible lines.
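The comparison of the two candidate lines in Exercise 11.104 can be reproduced from the three data points (1, 1), (3, 3), (5, 5). The sketch below is illustrative only; `least_squares` and `sse_for_line` are hypothetical helper names.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line through the data."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum(ys) / n - slope * sum(xs) / n
    return intercept, slope


def sse_for_line(xs, ys, intercept, slope):
    """Sum of squared errors of the data about the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs, ys = [1, 3, 5], [1, 3, 5]
intercept, slope = least_squares(xs, ys)
sse_line_y_equals_x = sse_for_line(xs, ys, 0, 1)  # the line y-hat = x
sse_line_y_equals_3 = sse_for_line(xs, ys, 3, 0)  # the line y-hat = 3
print(intercept, slope, sse_line_y_equals_x, sse_line_y_equals_3)
```

Both lines have residuals that sum to zero, but only the least squares line minimizes the sum of squared residuals.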



11.105 a.

Using MINITAB, the scattergram of the data is:

[Scatterplot of y vs x]

b.

Some preliminary calculations are:

$\sum x = 50$, $\sum x^2 = 270$, $\sum xy = 143$, $\sum y = 29$, $\sum y^2 = 97$

$SS_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 143 - \dfrac{50(29)}{10} = -2$

$SS_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 270 - \dfrac{50^2}{10} = 20$

$SS_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 97 - \dfrac{29^2}{10} = 12.9$

$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \dfrac{-2}{\sqrt{20(12.9)}} = -.1245$

$r^2 = (-.1245)^2 = .0155$

c.

Some preliminary calculations are:

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{-2}{20} = -.1$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 12.9 - (-.1)(-2) = 12.7$

$s^2 = \dfrac{SSE}{n-2} = \dfrac{12.7}{10-2} = 1.5875$

$s = \sqrt{1.5875} = 1.25996$

To determine if x and y are linearly correlated, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s/\sqrt{SS_{xx}}} = \dfrac{-.1 - 0}{1.25996/\sqrt{20}} = -.35$

The rejection region requires α/2 = .10/2 = .05 in each tail of the t-distribution with df = n − 2 = 10 − 2 = 8. From Table III, Appendix D, $t_{.05} = 1.86$. The rejection region is t < −1.86 or t > 1.86.

Since the observed value of the test statistic does not fall in the rejection region (t = −.35 ≮ −1.86), H0 is not rejected. There is insufficient evidence to indicate that x and y are linearly correlated at α = .10.

11.106 a.

Some preliminary calculations are:

$\sum x = 4.25$, $\sum y = 159$, $\sum xy = 126.05$, $\sum x^2 = 3.5675$, $\sum y^2 = 4,545$

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{4.25}{6} = .708333333$, $\bar{y} = \dfrac{\sum y}{n} = \dfrac{159}{6} = 26.5$

$SS_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 126.05 - \dfrac{4.25(159)}{6} = 126.05 - 112.625 = 13.425$

$SS_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 3.5675 - \dfrac{(4.25)^2}{6} = 3.5675 - 3.010416667 = .5570833333$

$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{13.425}{.5570833333} = 24.09872851 \approx 24.10$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 26.5 - 24.09872851(.70833333) = 9.43006739 \approx 9.43$

The least squares line is $\hat{y} = 9.43 + 24.10x$.

b.

Since 0 is not in the observed range of x (Surface Area to Volume), $\hat{\beta}_0$ has no meaning. $\hat{\beta}_1 = 24.10$. For each unit change in Surface Area to Volume, the mean Drug Release Rate is estimated to increase by 24.10.

c.

$SS_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 4,545 - \dfrac{(159)^2}{6} = 4,545 - 4,213.5 = 331.5$

$SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 331.5 - 24.09872851(13.425) = 7.9745698$

$s^2 = MSE = \dfrac{SSE}{n-2} = \dfrac{7.9745698}{6-2} = 1.99364245$ and $s = \sqrt{1.99364245} = 1.41196 \approx 1.41$

d.

s = 1.41. We would expect approximately 95% of the observed values of y (Drug release rate) to fall within 2s or 2(1.41) = 2.82 units of their least squares predicted values.

e.

$s_{\hat{\beta}_1} = \dfrac{\sqrt{MSE}}{\sqrt{SS_{xx}}} = \dfrac{\sqrt{1.99364245}}{\sqrt{.5570833333}} = 1.8917$

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = n − 2 = 6 − 2 = 4, $t_{.05} = 2.132$. The 90% confidence interval is:

$\hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 24.10 \pm 2.132(1.8917) \Rightarrow 24.10 \pm 4.03 \Rightarrow (20.07, 28.13)$

We are 90% confident that for each additional unit increase in Surface Area to Volume, the increase in the Drug release rate is between 20.07 and 28.13.

f.

For x = .50, $\hat{y} = 9.43 + 24.10(.50) = 21.48$. The 90% prediction interval is:

$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 21.48 \pm 2.132(1.41)\sqrt{1 + \dfrac{1}{6} + \dfrac{(.5 - .7083333)^2}{.557083333}} \Rightarrow 21.48 \pm 3.35 \Rightarrow (18.12, 24.84)$

11.107 a.

The value of r is .70. Since this number is somewhat close to 1, there is a moderate positive linear relationship between self-knowledge skill level and goal-setting ability.

b.

Since the p-value is so small (p = 0.001), there is evidence to reject H0. There is sufficient evidence to indicate a significant linear relationship between self-knowledge skill level and goal-setting ability for any value of α > .001.

c.

\(r^2 = .70^2 = .49\). 49% of the total sample variability around the sample mean goal-setting ability is explained by the linear relationship between self-knowledge skill level and goal-setting ability.

11.108 a.

Using MINITAB, the scattergram of the data is:

[Scatterplot of Index vs Concentration: Index on the vertical axis, Concentration on the horizontal axis]

The variables x and y do appear to be related. It appears when x increases, y tends to increase. b.

\(r = \sqrt{r^2} = \sqrt{.6123} = .7825\)

The correlation between concentration and exhaustion index is .7825. This relationship is positive since r > 0. The relationship is somewhat strong. No, this does not mean that concentration causes emotional exhaustion. They are just related. c.

To determine if the straight-line relationship is useful, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is \(t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = 6.03\) and the p-value is p = .000.

Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting burnout at α = .05.

d.

\(r^2 = .6123\). 61.23% of the sample variation of exhaustion index is explained by the linear relationship between the exhaustion index and concentration.

e.

For confidence level .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 25 − 2 = 23, t.025 = 2.069. The 95% confidence interval is:

\[\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow 8.865 \pm 2.069(1.47) \Rightarrow 8.865 \pm 3.041 \Rightarrow (5.824, 11.905)\]

We are 95% confident that the change in mean exhaustion index for each unit change in concentration is between 5.824 and 11.905. f.

On the printout, we find the confidence interval is (599.700, 759.782) We are 95% confident that the interval from 599.649 to 759.757 encloses the mean exhaustion level for all professionals who have 80% of their social contacts within their work groups.

11.109 a.

Using MINITAB, the scattergram is:

[Fitted Line Plot: Index = 569.6 - 0.001924 Salary; S = 151.592, R-Sq = 4.9%, R-Sq(adj) = 0.0%; Index on the vertical axis, Salary on the horizontal axis]

It appears as salary increases, the retaliation index decreases.

b.

\(\sum x = 544,100\), \(\sum y = 7,497\), \(\sum xy = 263,977,000\), \(\sum x^2 = 23,876,290,000\), \(\sum y^2 = 4,061,063\)

\[\bar{x} = \frac{\sum x}{n} = \frac{544,100}{15} = 36,273.333\]

\[\bar{y} = \frac{\sum y}{n} = \frac{7,497}{15} = 499.8\]

\[SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 263,977,000 - \frac{(544,100)(7,497)}{15} = 263,977,000 - 271,941,180 = -7,964,180\]

\[SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 23,876,290,000 - \frac{(544,100)^2}{15} = 23,876,290,000 - 19,736,320,670 = 4,139,969,330\]

\[\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-7,964,180}{4,139,969,330} = -.001923729 \approx -.00192\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 499.8 - (-.001923729)(36,273.333) = 499.8 + 69.78007144 = 569.5800714 \approx 569.5801\]

The fitted regression line is \(\hat{y} = 569.5801 - .00192x\).

c.

The least squares line supports the answer because the line has a negative slope.

d.

βˆ0 = 569.58. This has no meaning because x = 0 is not in the observed range.

e.

βˆ1 =−.00192. When the salary increases by $1, the mean retaliation index is estimated to decrease by .00192. This is meaningful for the range of x from $16,900 to $70,000.

f.

Some preliminary calculations are:

\[SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 4,061,063 - \frac{(7,497)^2}{15} = 314,062.4\]

\[SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 314,062.4 - (-.001923729)(-7,964,180) = 298,741.476\]

\[s^2 = MSE = \frac{SSE}{n-2} = \frac{298,741.476}{15-2} = 22,980.11354 \qquad s = \sqrt{22,980.11354} = 151.591931\]

\[s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{151.591931}{\sqrt{4,139,969,330}} = .002356\]

To determine if the model is adequate, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is \(t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{-.00192}{.002356} = -.82\).

The rejection region requires 𝛼/2 = .05/2 = .025 in each tail of the t-distribution with df = n − 2 = 15 − 2 = 13 . From Table III, Appendix D, t.025 = 2.160 . The rejection region is t < −2.160 or t > 2.160 . Since the observed value of the test statistic does not fall in the rejection region (𝑡 = −.82 ≮ − 2.160), H0 is not rejected. There is insufficient evidence to indicate that the model is adequate at 𝛼 = .05.
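The chain of calculations in parts b and f (sums of squares, slope, SSE, s, standard error, t) can be sketched in one short function. The name `slope_t_test` and its argument names are assumptions made for illustration; the critical value 2.160 is taken from the table lookup above, not computed.

```python
import math


def slope_t_test(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (slope, s, se_slope, t) for H0: beta1 = 0 from summary sums.

    Hypothetical helper that mirrors the hand calculation step by step.
    """
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    slope = ss_xy / ss_xx
    sse = ss_yy - slope * ss_xy           # error sum of squares
    s = math.sqrt(sse / (n - 2))          # estimated standard deviation of the errors
    se_slope = s / math.sqrt(ss_xx)       # standard error of the slope
    return slope, s, se_slope, slope / se_slope


# Summary sums quoted in part b (n = 15).
slope, s, se_slope, t_stat = slope_t_test(
    n=15,
    sum_x=544_100,
    sum_y=7_497,
    sum_xy=263_977_000,
    sum_x2=23_876_290_000,
    sum_y2=4_061_063,
)

T_CRIT = 2.160  # tabled t.025 with df = 13, copied from the solution text
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Because the function carries full precision through every step, its t value can differ from the hand result in the third decimal place.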


11.110 a.


Using MINITAB, the graph is:

[Scatterplot of Catch vs Search: Catch on the vertical axis, Search on the horizontal axis]

There appears to be a negative linear trend. As search percentage increases, the total catch tends to decrease. b.

From the printout, βˆ0 = 9,877.83 and βˆ1 =−163.6024 .

c.

There is no practical interpretation for the estimate of β 0 because the range of values for search frequency does not include 0.

d.

For each one percent increase in search frequency, the mean total catch is estimated to decrease by 163.6024.

e.

To determine if the total catch is negatively linearly related to search frequency, we test: H 0 : β1 = 0 H a : β1 < 0

f.

From the printout, the p-value is p = .0018 / 2 = .0009 .
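Halving the printed p-value is valid here only because the estimated slope has the sign named in the alternative hypothesis. A minimal sketch of that rule follows; `one_tailed_p` is a hypothetical helper, and it assumes a symmetric sampling distribution such as the t-distribution.

```python
def one_tailed_p(two_tailed_p, estimate, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    Hypothetical helper. `alternative` is "less" (Ha: beta1 < 0) or
    "greater" (Ha: beta1 > 0). If the estimate falls on the side named by
    the alternative, the one-tailed p-value is half the two-tailed value;
    otherwise it is one minus that half.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    agrees = estimate < 0 if alternative == "less" else estimate > 0
    half = two_tailed_p / 2
    return half if agrees else 1 - half


# Printed two-tailed p-value and slope estimate from the printout in part b.
p_value = one_tailed_p(0.0018, estimate=-163.6024, alternative="less")
print(p_value)
```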

g.

Since the p-value is less than α (p = .0009 < .05), H0 is rejected. There is sufficient evidence to indicate that the total catch is negatively linearly related to search frequency at α = .05.

h.

r 2 = .5397 . 53.97% of the total sample variation around the mean total catch is explained by the linear relationship between the total catch and search frequency.

i.

r =−.7347 . The correlation between the total catch and search frequency is -.7347. Since this value is fairly close to -1, there is a moderate negative linear correlation between the total catch and the search frequency.
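In simple linear regression, r is the square root of r² carrying the sign of the estimated slope. The sketch below illustrates that relationship with the rounded values from the printout; `correlation_from_r2` is a hypothetical name, and the result agrees with the printed r only up to rounding in r².

```python
import math


def correlation_from_r2(r_squared, slope):
    """Return r as sqrt(r^2) with the sign of the slope estimate (hypothetical helper)."""
    return math.copysign(math.sqrt(r_squared), slope)


# r^2 from part h and the slope estimate from part b.
r = correlation_from_r2(0.5397, slope=-163.6024)
print(round(r, 3))
```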

j.

We can use the value of r to partially support this inference. The value of r can be used in a test of hypothesis to see if there is a statistically significant linear relationship between total catch and search frequency. In addition, the sign of r tells the direction of the relationship.

k.

The 95% confidence interval for E(y) is (4,783, 6,792). We are 95% confident that the true mean total catch is between 4,783 and 6,792 kilograms when the search frequency is 25.

l.

The 95% prediction interval for y is (2,643, 8,933). We are 95% confident that the actual total catch is between 2,643 and 8,933 kilograms when the search frequency is 25.

11.111 a.

The straight-line model is \(y = \beta_0 + \beta_1 x + \varepsilon\).

b.

The least squares line is \(\hat{y} = -2,298.3676 + 11,598.884x\).

c.

Since 0 is not in the range of observed number of carats, \(\hat{\beta}_0\) has no practical interpretation.

d.

The 95% confidence interval is (11,146.0846, 12,051.6834). We are 95% confident that for each additional carat, the mean asking price is estimated to increase by between $11,146.0846 and $12,051.6834.

e.

The estimated standard deviation is 𝑠 = 𝑅𝑀𝑆𝐸 = 1,117.5642. Most of the observed values of asking price will fall within approximately 2s or 2(1,117.5642) = 2,235.1284 dollars of their respective predicted values.

f.

To determine if a positive linear relationship exists between asking price and size, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 > 0\)

g.

The p-value is p < 0.0001. Since the p-value is less than α (p < .0001 < .05), H0 is rejected. There is sufficient evidence to indicate a positive linear relationship exists between asking price and size at α = .05.

h.

r 2 = .8925 . 89.25% of the total sample variation in the asking prices around their sample mean is explained by the linear relationship between size and asking price.

i.

r = .9447 . Since this value is very close to 1, it indicates that there is a strong, positive linear relationship between size and asking price.

j.

The prediction interval is (1,297.6366,5,704.5322). We are 95% confident that the true asking price will fall between $1,297.6366 and $5,704.5322 when the size is .5 carats.

k.

The confidence interval is (3,362.4670,3,639.7018). We are 95% confident that the true mean asking price will fall between $3,362.4670 and $3,639.7018 when the size is .5 carats.

11.112 a.

Using MINITAB, the results are:

Regression Analysis: MillionaireBirths versus TotalBirths

The regression equation is
MillionaireBirths = - 14.1 + 0.628 TotalBirths

Predictor       Coef  SE Coef      T      P
Constant     -14.138    8.121  -1.74  0.157
TotalBirths   0.6277   0.2435   2.58  0.061

S = 3.32256   R-Sq = 62.4%   R-Sq(adj) = 53.0%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   73.34  73.34  6.64  0.061
Residual Error   4   44.16  11.04
Total            5  117.50

The least squares prediction equation is \(\hat{y} = -14.138 + .6277x\).

b.

\(\hat{\beta}_0 = -14.138\). Since 0 is not in the observed range of total US births, \(\hat{\beta}_0\) has no meaning.

\(\hat{\beta}_1 = .6277\). For each additional one million births, the mean number of software millionaire birthdays is estimated to increase by .6277.

c.

For x = 35, \(\hat{y} = -14.138 + .6277(35) = 7.8315\).

d.

Using MINITAB, the results are:

Regression Analysis: MillionaireBirths versus CEOBirths

The regression equation is
MillionaireBirths = 2.72 + 0.306 CEOBirths

Predictor     Coef  SE Coef     T      P
Constant    2.7227   0.8513  3.20  0.033
CEOBirths  0.30626  0.04592  6.67  0.003

S = 1.55683   R-Sq = 91.7%   R-Sq(adj) = 89.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  107.81  107.81  44.48  0.003
Residual Error   4    9.69    2.42
Total            5  117.50

The least squares prediction equation is \(\hat{y} = 2.7227 + .3063x\).

e.

\(\hat{\beta}_0 = 2.7227\). The estimate of the mean number of software millionaire birthdays is 2.7227 when the number of CEO birthdays is 0.

\(\hat{\beta}_1 = .3063\). For each additional CEO birthday, the mean number of software millionaire birthdays is estimated to increase by .3063.

f.

For x = 10, \(\hat{y} = 2.7227 + .3063(10) = 5.7857\).

g.

Using MINITAB, the results are:

Regression Analysis: MillionaireBirths versus TotalBirths

The regression equation is
MillionaireBirths = - 14.1 + 0.628 TotalBirths

Predictor       Coef  SE Coef      T      P
Constant     -14.138    8.121  -1.74  0.157
TotalBirths   0.6277   0.2435   2.58  0.061

S = 3.32256   R-Sq = 62.4%   R-Sq(adj) = 53.0%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   73.34  73.34  6.64  0.061
Residual Error   4   44.16  11.04
Total            5  117.50

From the printout, SSE = 44.16, \(s^2\) = MSE = 11.04, and s = 3.32256.

h.

Using MINITAB, the results are:

Regression Analysis: MillionaireBirths versus CEOBirths

The regression equation is
MillionaireBirths = 2.72 + 0.306 CEOBirths

Predictor     Coef  SE Coef     T      P
Constant    2.7227   0.8513  3.20  0.033
CEOBirths  0.30626  0.04592  6.67  0.003

S = 1.55683   R-Sq = 91.7%   R-Sq(adj) = 89.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  107.81  107.81  44.48  0.003
Residual Error   4    9.69    2.42
Total            5  117.50

From the printout, SSE = 9.69, \(s^2\) = MSE = 2.42, and s = 1.55683.

i.

The model containing CEO birthdays will have smaller errors of prediction because the value of s for that model (𝑠 = 1.55683) is smaller than the value of s for the other model (𝑠 = 3.32256).
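The comparison in part i rests on s = √(SSE/(n − 2)) for each model. A small sketch using the rounded SSE values from the two printouts is given below; `regression_std_dev` is a hypothetical helper, and because the printed SSE values are rounded, the results match the printed S values only approximately.

```python
import math


def regression_std_dev(sse, n):
    """Estimated standard deviation of the errors, s = sqrt(SSE / (n - 2)) (hypothetical helper)."""
    return math.sqrt(sse / (n - 2))


# SSE values read from the two ANOVA tables (n = 6 decades in each model).
s_total_births = regression_std_dev(44.16, n=6)
s_ceo_births = regression_std_dev(9.69, n=6)

# The model with the smaller s has the smaller errors of prediction.
better_predictor = "CEOBirths" if s_ceo_births < s_total_births else "TotalBirths"
print(round(s_total_births, 3), round(s_ceo_births, 3), better_predictor)
```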

j.

From part a, \(\hat{\beta}_1 = .6277\) and \(s_{\hat{\beta}_1} = .2435\).

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 6 − 2 = 4, t.025 = 2.776. The confidence interval is:

\[\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .6277 \pm 2.776(.2435) \Rightarrow .6277 \pm .6760 \Rightarrow (-.0483, 1.3037)\]

We are 95% confident that for each additional one million US births, the change in the mean number of software millionaire birthdays is between -.0483 and 1.3037. Since 0 is in this interval, there is no evidence that total US births and the number of software millionaire birthdays are linearly related.

k.

From part d, \(\hat{\beta}_1 = .3063\) and \(s_{\hat{\beta}_1} = .0459\).

For confidence coefficient .95, α = .05 and α/2 = .05/2 = .025. From Table III, Appendix D, with df = n − 2 = 6 − 2 = 4, t.025 = 2.776. The confidence interval is:

\[\hat{\beta}_1 \pm t_{.025}\, s_{\hat{\beta}_1} \Rightarrow .3063 \pm 2.776(.0459) \Rightarrow .3063 \pm .1274 \Rightarrow (.1789, .4337)\]

We are 95% confident that for each additional CEO birthday, the mean number of software millionaire birthdays will increase by between .1789 and .4337. Since 0 is not in this interval, there is evidence of a linear relationship between the number of CEO birthdays and the number of software millionaire birthdays.

l.

No, you cannot conclude that the number of software millionaires born in a decade is linearly related to the total number of people born in the U.S. because 0 is contained in the 95% confidence interval for β 1 . Yes, you can conclude that the number of software millionaires born in a decade is linearly related to the number of CEOs born in a decade because 0 is not contained in the 95% confidence interval for β1 .

m.

From the printout in part a, r 2 = 62.4% . 62.4% of the total sample variability around the sample mean number of software millionaire birthdays is explained by the linear relationship between the number of software millionaire birthdays and the total number of U.S. births.

n.

From the printout in part d, r 2 = 91.7% . 91.7% of the total sample variability around the sample mean number of software millionaire birthdays is explained by the linear relationship between the number of software millionaire birthdays and the number of CEO birthdays.

o.

Yes. There is a very strong positive linear relationship between the number of software millionaire birthdays and the number of CEO birthdays. As the number of software millionaire birthdays increases, the number of CEO birthdays also increases.

p.

Using MINITAB, the results are:

Predicted Values for New Observations

New Obs     Fit  SE Fit           95% CI           95% PI
1        10.379   0.862  (7.987, 12.771)  (5.439, 15.320)

Values of Predictors for New Observations

New Obs  CEOBirths
1             25.0

The 95% prediction interval is (5.439, 15.320). We are 95% confident that the actual number of software millionaire birthdays in the decade will be between 5.439 and 15.320 when the number of CEO birthdays is 25.

q.

The sample mean number of CEO birthdays per decade is \(\bar{x} = \frac{\sum x}{n} = 12.333\). The narrowest prediction interval occurs when the value of x used for the prediction is equal to \(\bar{x}\). The further the value of x is from \(\bar{x}\), the wider the prediction interval. Thus, the interval when x = 11 will be narrower than the interval when x = 25.

11.113 Step 1: The hypothesized model is \(y = \beta_0 + \beta_1 x + \varepsilon\).

Step 2: The estimates of the unknown parameters are \(\hat{\beta}_0 = -32.35\) and \(\hat{\beta}_1 = 4.82\).

Since 0 is not in the range of observed values of the monthly price of naphtha, \(\hat{\beta}_0\) has no practical interpretation. For each additional unit increase in the monthly price of naphtha, the mean monthly price of recycled colored plastic bottles is estimated to increase by 4.82.

Step 3: We assume that the error terms are normally and independently distributed with a mean of 0 and constant variance. Not enough information was provided in the exercise to estimate the variance of the error terms.

Step 4: To determine if there is a linear relationship between the monthly price of recycled colored plastic bottles and the monthly price of naphtha, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is t = 16.60.

The p-value is P(t > 16.60) + P(t < −16.60), where the t-distribution has df = n − 2 = 120 − 2 = 118. Using MINITAB, P(t > 16.60) + P(t < −16.60) = .000 + .000 = .000.

Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate a linear relationship exists between the monthly price of recycled colored plastic bottles and the monthly price of naphtha.

\(r^2 = .69\). 69% of the total sample variation of monthly prices of recycled colored plastic bottles around their sample mean is explained by the linear relationship between the monthly price of recycled colored plastic bottles and the monthly price of naphtha.

r = .83. The correlation indicates a fairly strong positive linear relationship. This confirms our conclusion that monthly naphtha prices and monthly prices of recycled colored plastic bottles are positively linearly related. 11.114 a.

Using MINITAB, the results are:

Regression Analysis: YRSPRAC versus EDYRS

81 cases used, 37 cases contain missing values

Predictor       Coef   SE Coef      T      P
Constant      14.269     1.108  12.87  0.000
EDYRS      -0.000422  0.009822  -0.04  0.966

S = 9.74574   R-Sq = 0.0%   R-Sq(adj) = 0.0%

Analysis of Variance

Source          DF       SS     MS     F      P
Regression       1     0.18   0.18  0.00  0.966
Residual Error  79  7503.38  94.98
Total           80  7503.56

The fitted regression line is \(\hat{y} = 14.269 - .000422x\). To determine if there is a linear relationship between length of time in practice and the amount of exposure in medical school, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is t = −.04 and the p-value is p = .966.


Since the p-value is not less than 𝛼 (𝑝 = .966 ≮ . 05), H0 is not rejected. There is insufficient evidence to indicate there is a linear relationship between length of time in practice and the amount of exposure in medical school at 𝛼 = .05. b.

Deleting the outlier, the results are:

Regression Analysis: YRSPRAC2 versus EDYRS2

80 cases used, 38 cases contain missing values

Predictor      Coef  SE Coef      T      P
Constant     16.623    1.289  12.90  0.000
EDYRS2     -0.20279  0.06487  -3.13  0.002

S = 9.23727   R-Sq = 11.1%   R-Sq(adj) = 10.0%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       1   833.87  833.87  9.77  0.002
Residual Error  78  6655.51   85.33
Total           79  7489.39

The fitted regression line is \(\hat{y} = 16.623 - .20279x\). To determine if there is a linear relationship between length of time in practice and the amount of exposure in medical school, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is t = −3.13 and the p-value is p = .002. Since the p-value is less than α (p = .002 < .05), H0 is rejected. There is sufficient evidence to indicate there is a linear relationship between length of time in practice and the amount of exposure in medical school at α = .05. By eliminating the one outlier, the relationship is now statistically significant.

11.115 a.

A proposed model is \(y = \beta_0 + \beta_1 x + \varepsilon\).

b.

Some preliminary calculations are:

\(\sum x = 1,292.7\), \(\sum y = 3,781.1\), \(\sum xy = 218,291.63\), \(\sum x^2 = 88,668.43\), \(\sum y^2 = 651,612.45\)

\[\bar{x} = \frac{\sum x}{n} = \frac{1,292.7}{22} = 58.75909091\]

\[\bar{y} = \frac{\sum y}{n} = \frac{3,781.1}{22} = 171.8681818\]

\[SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 218,291.63 - \frac{1,292.7(3,781.1)}{22} = 218,291.63 - 222,173.9986 = -3,882.3686\]

\[SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 88,668.43 - \frac{(1,292.7)^2}{22} = 88,668.43 - 75,957.87682 = 12,710.55318\]

\[SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 651,612.45 - \frac{(3,781.1)^2}{22} = 651,612.45 - 649,850.7823 = 1,761.6677\]

\[\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-3,882.3686}{12,710.55318} = -0.305444503 \approx -0.305\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 171.8681818 - (-0.305444503)(58.75909091) = 189.8158231 \approx 189.816\]

The fitted regression line is: \(\hat{y} = 189.816 - 0.305x\)

c.

Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: MATH = 189.8 - 0.3054 POVERTY; S = 5.36572, R-Sq = 67.3%, R-Sq(adj) = 65.7%; MATH on the vertical axis, POVERTY on the horizontal axis]

From the fitted regression line, the relationship between the two variables is negative.

d.

\(\hat{\beta}_0 = 189.816\). Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

\(\hat{\beta}_1 = -0.305\). For each one percent increase in the % Below Poverty, the mean value of FCAT-Math is estimated to decrease by 0.305.

e.

Some preliminary calculations are:

\[SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1,761.6677 - (-0.305444503)(-3,882.3686) = 575.8195525\]

\[s^2 = \frac{SSE}{n-2} = \frac{575.8195525}{22-2} = 28.79097763 \qquad s = \sqrt{28.79097763} = 5.3657\]

\[s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{5.3657}{\sqrt{12,710.55318}} = .0476\]

To determine if math score and percentage below the poverty level are linearly related, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is \(t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-.3054 - 0}{.0476} = -6.42\).

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table III, Appendix D, t.025 = 2.086 with df = n − 2 = 22 − 2 = 20. The rejection region is t < −2.086 or t > 2.086. Since the observed value of the test statistic falls in the rejection region (t = −6.42 < −2.086), H0 is rejected. There is sufficient evidence to indicate that math score and percentage below the poverty level are linearly related at α = .05.

f.

A proposed model is \(y = \beta_0 + \beta_1 x + \varepsilon\). Some preliminary calculations are:

\(\sum x = 1,292.7\), \(\sum y = 3,764.2\), \(\sum xy = 217,738.81\), \(\sum x^2 = 88,668.43\), \(\sum y^2 = 645,221.16\)

\[\bar{x} = \frac{\sum x}{n} = \frac{1,292.7}{22} = 58.75909091\]

\[\bar{y} = \frac{\sum y}{n} = \frac{3,764.2}{22} = 171.1\]

\[SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 217,738.81 - \frac{1,292.7(3,764.2)}{22} = 217,738.81 - 221,180.97 = -3,442.16\]

\[SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 88,668.43 - \frac{(1,292.7)^2}{22} = 88,668.43 - 75,957.87682 = 12,710.55318\]

\[SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 645,221.16 - \frac{(3,764.2)^2}{22} = 645,221.16 - 644,054.62 = 1,166.54\]

\[\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-3,442.16}{12,710.55318} = -0.270811187 \approx -0.271\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 171.1 - (-0.270811187)(58.75909091) = 187.0126192 \approx 187.013\]

The fitted regression line is: \(\hat{y} = 187.013 - 0.271x\)

Using MINITAB, a graph of the fitted regression line is:

[Fitted Line Plot: READING = 187.0 - 0.2708 POVERTY; S = 3.42319, R-Sq = 79.9%, R-Sq(adj) = 78.9%; READING on the vertical axis, POVERTY on the horizontal axis]

From the fitted regression line, the relationship between the two variables is negative.

βˆo =187.013.

Since 0 is not in the range of observed values of the variable % Below Poverty, the y-intercept has no meaning.

βˆ1 =−0.271.

For each additional one percent increase in % Below Poverty, the mean value of FCAT-Reading is estimated to decrease by .271.

Some preliminary calculations are:

\[SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 1,166.54 - (-0.270811187)(-3,442.16) = 234.3645646\]

\[s^2 = \frac{SSE}{n-2} = \frac{234.3645646}{22-2} = 11.71822823 \qquad s = \sqrt{11.71822823} = 3.4232\]

\[s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}} = \frac{3.4232}{\sqrt{12,710.55318}} = .0304\]

To determine if reading score and percentage below the poverty level are linearly related, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is \(t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-.2708 - 0}{.0304} = -8.91\).

The rejection region requires α / 2 = .05 / 2 = .025 in each tail of the t-distribution. From Table III, Appendix D, t.025 = 2.086 with df = n − 2 = 22 − 2 = 20 . The rejection region is 𝑡 < −2.086 or 𝑡 > 2.086.

Since the observed value of the test statistic falls in the rejection region (t = −8.91 < −2.086), H0 is rejected. There is sufficient evidence to indicate that reading score and percentage below the poverty level are linearly related at α = .05.

g.

From part e, \(s = \sqrt{28.79097763} = 5.37\). We would expect approximately 95% of the observed values of y (FCAT-Math scores) to fall within 2s or 2(5.37) = 10.74 units of their least squares predicted values.

h.

From part f, \(s = \sqrt{11.71822823} = 3.42\). We would expect approximately 95% of the observed values of y (FCAT-Reading scores) to fall within 2s or 2(3.42) = 6.84 units of their least squares predicted values.

i.

The sample standard deviation for predicting FCAT-Math scores is s = 5.37 . The sample standard deviation for predicting FCAT-Reading scores is s = 3.42 . Since the standard deviation for predicting FCAT-Reading scores is smaller than the standard deviation for predicting FCAT-Math scores, we can more accurately predict the FCAT-Reading scores.

11.116 a.

A straight line model relating an NFL team’s current value to its operating income is y = β 0 + β1 x + ε

b.

From MINITAB:

Coefficients

Term      Coef  SE Coef  T-Value  P-Value   VIF
Constant  2094      161    12.99    0.000
OPINCOME  7.46     1.29     5.77    0.000  1.00

The fitted regression line is: \(\hat{y} = 2094 + 7.46x\)

c.

\(\hat{\beta}_1 = 7.46\). When operating income increases by 1 million dollars, the mean current value is estimated to increase by 7.46 million dollars. This is meaningful for values of operating income between 28 and 420 million dollars. \(\hat{\beta}_0 = 2094\). Since 0 is not in the observed range of income, this has no practical meaning.

d.

To determine if a linear relationship exists between current value and operating income, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is t = 5.77. The p-value is p = 0.000. No α was given so we will use α = .05. Since the p-value is so small (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate a significant linear relationship between current value and operating income at α = .05.

From MINITAB:

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
524.425  52.63%  51.05%     44.92%

\(r^2 = .5263\): 52.63% of the sample variation around the sample mean current value is explained by the linear relationship between current value and operating income.


There is a significant linear relationship between current value and operating income. In addition, the relationship is fairly strong. We would recommend using it to predict an NFL team’s value. 11.117 a.

Using MINITAB, the scattergram of the data is:

[Scatterplot of y1, y2 vs x: two panels, y1 versus x and y2 versus x]

b.

It appears that the weigh-in-motion reading after calibration adjustment is more highly correlated with the static weight of trucks than prior to calibration adjustment. The scattergram is closer to a straight line.

c.

Some preliminary calculations are:

\(\sum x = 312.8\), \(\sum x^2 = 9,911.42\), \(\sum xy_1 = 10,201.41\), \(\sum xy_2 = 9,859.84\)

\(\sum y_1 = 320.2\), \(\sum y_1^2 = 10,543.68\), \(\sum y_2 = 311.2\), \(\sum y_2^2 = 9,809.52\)

\[SS_{xy_1} = \sum xy_1 - \frac{\sum x \sum y_1}{n} = 10,201.41 - \frac{312.8(320.2)}{10} = 185.554\]

\[SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 9,911.42 - \frac{312.8^2}{10} = 127.036\]

\[SS_{y_1y_1} = \sum y_1^2 - \frac{(\sum y_1)^2}{n} = 10,543.68 - \frac{320.2^2}{10} = 290.876\]

\[SS_{xy_2} = \sum xy_2 - \frac{\sum x \sum y_2}{n} = 9,859.84 - \frac{312.8(311.2)}{10} = 125.504\]

\[SS_{y_2y_2} = \sum y_2^2 - \frac{(\sum y_2)^2}{n} = 9,809.52 - \frac{311.2^2}{10} = 124.976\]

\[r_1 = \frac{SS_{xy_1}}{\sqrt{SS_{xx}\, SS_{y_1y_1}}} = \frac{185.554}{\sqrt{127.036(290.876)}} = .965\]

\[r_2 = \frac{SS_{xy_2}}{\sqrt{SS_{xx}\, SS_{y_2y_2}}} = \frac{125.504}{\sqrt{127.036(124.976)}} = .996\]

\(r_1 = .965\) implies the static weight of trucks and weigh-in-motion prior to calibration adjustment have a strong positive linear relationship.

\(r_2 = .996\) implies the static weight of trucks and weigh-in-motion after calibration adjustment have a stronger positive linear relationship.

The closer r is to 1, the more accurate the weigh-in-motion readings are.

d.

Yes. If the weigh-in-motion readings were all exactly the same distance below (or above) the actual readings, r would be 1.

11.118 a.

Using MINITAB, the regression analysis is:

Regression Analysis: BTU versus Area

The regression equation is
BTU = - 99045 + 103 Area

Predictor    Coef  SE Coef      T      P
Constant   -99045   261618  -0.38  0.709
Area       102.81    15.86   6.48  0.000

S = 628185   R-Sq = 67.8%   R-Sq(adj) = 66.1%

Analysis of Variance

Source          DF           SS           MS      F      P
Regression       1  1.65850E+13  1.65850E+13  42.03  0.000
Residual Error  20  7.89232E+12  3.94616E+11
Total           21  2.44773E+13

Predicted Values for New Observations

New Obs     Fit  SE Fit           95.0% CI            95.0% PI
1        723467  165874  (377459, 1069475)  (-631816, 2078750)

Values of Predictors for New Observations

New Obs  Area
1        8000

\(\hat{\beta}_0 = -99,045\), \(\hat{\beta}_1 = 102.81\)

b.

To determine if energy consumption is positively linearly related to the shell area, we test: H 0 : β1 = 0 H a : β1 > 0

The test statistic is 𝑡 = 6.48 and the p-value is p = 0.000 / 2 = 0.000 . Since the p-value is less than 𝛼 (𝑝 = .000 < .10), H0 is rejected. There is sufficient evidence to indicate that energy consumption is positively linearly related to the shell area at 𝛼 = .10.


c.

Since this is a one-tailed test but the output calculates the p-value for a two-tailed test, the observed significance level is:

\[p = \frac{.000}{2} = .000\]

This is the probability of observing our value of t (6.481) or anything larger if \(\beta_1 = 0\). Since this probability is so small, there is strong evidence to reject H0.

d.

\(r^2\) = R-Square = .678

67.8% of the total sample variability in energy consumption around its mean is explained by the linear relationship between energy consumption and shell area.

e.

From the printout, for x p = 8,000 , yˆ = 723, 467 The 95% prediction interval is (−631,816, 2,078,750). This interval is so large and includes negative BTU's; it is not very useful.

f.

We must assume that the probability distribution of ε has a mean of 0, the variance of the probability distribution of ε is constant for all values of x, the distribution of ε is normal, and the ε's are independent.

11.119 a.

Using MINITAB, the results of the regression analysis are:

Regression Analysis: Managers versus UnitsSold

The regression equation is
Managers = 5.33 + 0.586 UnitsSold

Predictor     Coef  SE Coef      T      P
Constant     5.325    1.180   4.51  0.000
UnitsSol   0.58610  0.03818  15.35  0.000

S = 2.566   R-Sq = 92.9%   R-Sq(adj) = 92.5%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  1552.0  1552.0  235.63  0.000
Residual Error  18   118.6     6.6
Total           19  1670.5

To determine the usefulness of the model, we test:

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

The test statistic is \(t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = 15.35\) and the p-value is p = 0.000.

Since the p-value is less than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate the model is useful at α = .05. Therefore, the monthly sales is useful in predicting the number of managers at α = .05.

b.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. From Table III, Appendix D, with df = n − 2 = 20 − 2 = 18, t.05 = 1.734.

For \(x_p = 39\), \(\hat{y} = 5.325 + .5861(39) = 28.1829\). Also, \(\bar{x} = \frac{\sum x}{n} = \frac{540}{20} = 27\) and \(SS_{xx} = 4,518\).

The 90% prediction interval is:

\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 28.183 \pm 1.734(2.566)\sqrt{1 + \frac{1}{20} + \frac{(39 - 27)^2}{4,518}} \Rightarrow 28.183 \pm 4.628 \Rightarrow (23.555, 32.811)\]

c.

We are 90% confident the actual number of managers needed when 39 units are sold is between 23.55 and 32.81.
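The prediction interval in part b can be reproduced with a short function. This is a sketch under the values quoted in the solution; `prediction_interval` and its parameter names are hypothetical, and the critical value 1.734 is copied from the table lookup rather than computed.

```python
import math


def prediction_interval(y_hat, s, n, x_p, x_bar, ss_xx, t_crit):
    """Return (lower, upper) for an individual prediction at x_p.

    Hypothetical helper implementing
    y_hat +/- t_crit * s * sqrt(1 + 1/n + (x_p - x_bar)^2 / SSxx).
    """
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Values from the printout in part a and the quantities quoted in part b.
y_hat = 5.325 + 0.5861 * 39
lower, upper = prediction_interval(
    y_hat=y_hat, s=2.566, n=20, x_p=39, x_bar=27, ss_xx=4518, t_crit=1.734
)
print(f"({lower:.3f}, {upper:.3f})")
```

Dropping the leading 1 under the square root would give the narrower confidence interval for the mean of y at the same x.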

11.120 Some preliminary calculations are:

\(\sum x = 4,305\), \(\sum x^2 = 1,652,025\), \(\sum xy = 76,652,695\), \(\sum y = 201,558\), \(\sum y^2 = 3,571,211,200\)

a.

\[\hat{\beta}_1 = \frac{\sum xy}{\sum x^2} = \frac{76,652,695}{1,652,025} = 46.39923427 \approx 46.3992\]

The least squares line is \(\hat{y} = 46.3992x\).
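For a line forced through the origin, the least squares slope reduces to Σxy/Σx², as used in part a. The sketch below checks that value; `slope_through_origin` is a hypothetical name chosen for illustration.

```python
def slope_through_origin(sum_xy, sum_x2):
    """Least squares slope for the no-intercept model y = beta1 * x + error (hypothetical helper)."""
    return sum_xy / sum_x2


# Summary sums quoted in the preliminary calculations.
slope_origin = slope_through_origin(sum_xy=76_652_695, sum_x2=1_652_025)
print(round(slope_origin, 4))
```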


b.

\[SS_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 76,652,695 - \frac{4,305(201,558)}{15} = 18,805,549\]

\[SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1,652,025 - \frac{4,305^2}{15} = 416,490\]

\[\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{18,805,549}{416,490} = 45.15246224 \approx 45.1525\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{201,558}{15} - 45.15246224\left(\frac{4,305}{15}\right) = 478.443336 \approx 478.4433\]

The least squares line is \(\hat{y} = 478.4433 + 45.1525x\).

c.

Because x = 0 is not in the observed range, we are trying to represent the data on the observed interval with the best fitting line. We are not concerned with whether the line goes through (0, 0) or not.

d.

Some preliminary calculations are:

\[SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 3,571,211,200 - \frac{201,558^2}{15} = 862,836,042\]

\[SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 862,836,042 - 45.15246224(18,805,549) = 13,719,200.88\]

\[s^2 = \frac{SSE}{n-2} = \frac{13,719,200.88}{15-2} = 1,055,323.145 \qquad s = \sqrt{1,055,323.145} = 1,027.2892\]

\(H_0: \beta_0 = 0\)
\(H_a: \beta_0 \neq 0\)

The test statistic is

\[t = \frac{\hat{\beta}_0 - 0}{s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}}} = \frac{478.4433}{1,027.2892\sqrt{\frac{1}{15} + \frac{287^2}{416,490}}} = .906\]

The rejection region requires 𝛼/2 = .10/2 = .05 in each tail of the t-distribution with df = n − 2 = 15 − 2 = 13 . From Table III, Appendix D, t.05 = 1.771 . The rejection region is t < −1.771 or t > 1.771 . Since the observed value of the test statistic does not fall in the rejection region (𝑡 = .906 ≯ 1.771), H0 is not rejected. There is insufficient evidence to indicate β 0 is different from 0 at 𝛼 = .10. Thus, β 0 should not be included in the model.
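As a cross-check on parts b and d, the whole chain from the summary sums to the intercept test can be reproduced in a few lines. This is a sketch under the assumption that only the five sums and $n = 15$ are available; the variable names are illustrative, and the critical value 1.771 is the table value quoted above, not computed.

```python
import math

# Summary sums quoted in Exercise 11.120.
n = 15
sum_x, sum_y = 4305, 201558
sum_x2, sum_y2, sum_xy = 1652025, 3571211200, 76652695

# Corrected sums of squares and cross-products.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n

# Least squares estimates for the model with an intercept.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

# Estimated standard deviation of the errors.
sse = ss_yy - b1 * ss_xy
s = math.sqrt(sse / (n - 2))

# t statistic for H0: beta_0 = 0.
x_bar = sum_x / n
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)
t_stat = b0 / se_b0

t_crit = 1.771  # t_.05 with df = 13, from the table lookup in the text
reject_h0 = abs(t_stat) > t_crit
print(round(b0, 4), round(b1, 4), round(t_stat, 3), reject_h0)
```

Small differences in the last decimal places relative to the hand calculation are expected, because the manual rounds $SS_{yy}$ to a whole number before computing SSE.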

11.121 Using MINITAB, the two regression analyses are:

Regression Analysis: IndCosts versus MachHours

The regression equation is
Ind.Costs = 301 + 10.3 MachHours

Predictor   Coef     StDev   T      P
Constant    301.0    229.8   1.31   0.219
MachHour    10.312   3.124   3.30   0.008

S = 170.5   R-Sq = 52.1%   R-Sq(adj) = 47.4%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   316874   316874   10.90   0.008
Residual Error   10   290824    29082
Total            11   607698

Regression Analysis: IndCosts versus DirectHour

The regression equation is
Ind.Costs = 745 + 7.72 DirectHour

Predictor   Coef    StDev   T      P
Constant    744.7   217.6   3.42   0.007
DirectHo    7.716   5.396   1.43   0.183

S = 224.6   R-Sq = 17.0%   R-Sq(adj) = 8.7%

Analysis of Variance
Source           DF   SS       MS       F      P
Regression        1   103187   103187   2.05   0.183
Residual Error   10   504511    50451
Total            11   607698

From these two cost functions, the model containing Machine-Hours should be used to predict Indirect Manufacturing Labor Costs. There is a significant linear relationship between Indirect Manufacturing Labor Costs and Machine-Hours $(t = 3.30, p = 0.008)$. There is not a significant linear relationship between Indirect Manufacturing Labor Costs and Direct Manufacturing Labor-Hours $(t = 1.43, p = 0.183)$. The $r^2$ for the first model is .521 while the $r^2$ for the second model is .170. In addition, the standard deviation for the first model is 170.5 while the standard deviation for the second model is 224.6. All of these indicate that the model containing Machine-Hours as the independent variable is the better model.

11.122 Answers may vary. Possible answer: The scaffold-drop survey provides the most accurate estimate of spall rate in a given wall segment. However, the drop areas were not selected at random from the entire complex; rather, drops were made at areas with high spall concentrations. Therefore, if the photo spall rates could be shown to be related to drop spall rates, then the 83 photo spall rates could be used to predict what the drop spall rates would be.

Using MINITAB, a scattergram for the data is:

[Scatterplot of DROPRATE vs PHOTORATE: DROPRATE on the vertical axis (0 to 40), PHOTORATE on the horizontal axis (0 to 16)]

The scattergram shows a positive relationship between the photo spall rate (x) and the drop spall rate (y). Find the prediction equation for drop spall rate. The MINITAB output shows the results of the analysis.

Regression Analysis: DROPRATE versus PHOTORATE

The regression equation is
DROPRATE = 2.55 + 2.76 PHOTORATE

Predictor   Coef     SE Coef   T       P
Constant    2.548    1.637     1.56    0.154
PHOTORATE   2.7599   0.2180    12.66   0.000

S = 4.16352   R-Sq = 94.7%   R-Sq(adj) = 94.1%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression        1   2777.5   2777.5   160.23   0.000
Residual Error    9    156.0     17.3
Total            10   2933.5

The fitted regression line is $\hat{y} = 2.55 + 2.76x$. Conduct a formal statistical hypothesis test to determine if the photo spall rates contribute information for the prediction of drop spall rates.

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = 12.66$ and the p-value is $p = 0.000$. Since the p-value is so small, H0 would be rejected for any reasonable level of significance. There is sufficient evidence to indicate that photo spall rates contribute information for the prediction of drop spall rates at $\alpha > 0.000$. In addition, R-Sq = .947. This indicates that 94.7% of the total variation of the drop spall rates about their mean can be explained by the linear relationship between the drop spall rate and the photo spall rate. The prediction line should be very good. One could now use the 83 photo spall rates to predict values for 83 drop spall rates, and then use this information to estimate the true spall rate at a given wall segment and to estimate the total spall damage.
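The summary figures on the printout (S, R-Sq, R-Sq(adj), F) all follow from the Analysis of Variance table. The sketch below recomputes them from the sums of squares shown there; the helper name `anova_summary` is hypothetical, and $n = 11$ is inferred from the Total DF of 10.

```python
import math


def anova_summary(ss_regression, ss_error, n, k):
    """Recompute fit statistics from ANOVA sums of squares for k predictors."""
    df_error = n - (k + 1)
    ss_total = ss_regression + ss_error
    ms_regression = ss_regression / k
    ms_error = ss_error / df_error
    return {
        "s": math.sqrt(ms_error),
        "r2": ss_regression / ss_total,
        "r2_adj": 1 - ms_error / (ss_total / (n - 1)),
        "f": ms_regression / ms_error,
    }


# Sums of squares from the DROPRATE versus PHOTORATE printout.
stats = anova_summary(ss_regression=2777.5, ss_error=156.0, n=11, k=1)
print({name: round(value, 4) for name, value in stats.items()})
```

Because the printout rounds the sums of squares to one decimal place, the recomputed values agree with MINITAB only to about two or three significant digits.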



Chapter 12 Multiple Regression and Model Building

12.1

a. $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

b. $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

c. $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$

12.2

a. $\hat{\beta}_0 = 506.35$, $\hat{\beta}_1 = -941.900$, $\hat{\beta}_2 = -429.060$

b. $\hat{y} = 506.346 - 941.900x_1 - 429.060x_2$

c. $SSE = 151{,}016$, $MSE = 8{,}883$, $s = 94.251$. We expect about 95% of the y-values to fall within $2s = 2(94.251) = 188.502$ units of the fitted regression equation.

d. $H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = -3.42$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - (k + 1) = 20 - (2 + 1) = 17$. From Table III, Appendix D, $t_{.025} = 2.110$. The rejection region is $t < -2.110$ or $t > 2.110$. Since the observed value of the test statistic falls in the rejection region $(t = -3.42 < -2.110)$, H0 is rejected. There is sufficient evidence to indicate $\beta_1 \neq 0$ at $\alpha = .05$.

e. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n - (k + 1) = 20 - (2 + 1) = 17$, $t_{.025} = 2.110$. The 95% confidence interval is:

$\hat{\beta}_2 \pm t_{.025}\, s_{\hat{\beta}_2} \Rightarrow -429.060 \pm 2.110(379.83) \Rightarrow -429.060 \pm 801.441 \Rightarrow (-1{,}230.501, 372.381)$

f. $R^2$ = R-Sq = .459. 45.9% of the total sample variation of the y values is explained by the model containing x1 and x2. $R_a^2$ = R-Sq(adj) = .396. 39.6% of the total sample variation of the y values is explained by the model containing x1 and x2, adjusted for the sample size and the number of parameters in the model.

g. To determine if at least one of the independent variables is significant in predicting y, we test:

$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$

From the printout, the test statistic is $F = 7.22$. Since no α level was given, we will choose $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1) = 20 - (2 + 1) = 17$. From Table VI, Appendix D, $F_{.05} = 3.59$. The rejection region is $F > 3.59$. Since the observed value of the test statistic falls in the rejection region $(F = 7.22 > 3.59)$, H0 is rejected. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at $\alpha = .05$.

h. The observed significance level of the test is $p = 0.005$. Since the p-value is so small, we will reject H0 for most reasonable values of α. There is sufficient evidence to indicate at least one of the variables, x1 or x2, is significant in predicting y at $\alpha > 0.005$.

12.3

a. We are given $\hat{\beta}_2 = 2.7$, $s_{\hat{\beta}_2} = 1.86$, and $n = 30$.

$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = \frac{2.7}{1.86} = 1.45$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - (k + 1) = 30 - (3 + 1) = 26$. From Table III, Appendix D, $t_{.025} = 2.056$. The rejection region is $t < -2.056$ or $t > 2.056$. Since the observed value of the test statistic does not fall in the rejection region $(t = 1.45 \not> 2.056)$, H0 is not rejected. There is insufficient evidence to indicate $\beta_2 \neq 0$ at $\alpha = .05$.

b. We are given $\hat{\beta}_3 = .93$, $s_{\hat{\beta}_3} = .29$, and $n = 30$.

$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_3}{s_{\hat{\beta}_3}} = \frac{.93}{.29} = 3.21$.

The rejection region is the same as in part a, $t < -2.056$ or $t > 2.056$. Since the observed value of the test statistic falls in the rejection region $(t = 3.21 > 2.056)$, H0 is rejected. There is sufficient evidence to indicate $\beta_3 \neq 0$ at $\alpha = .05$.

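c. $\hat{\beta}_3$ (.93, standard error .29) has a smaller estimated standard error than $\hat{\beta}_2$ (2.7, standard error 1.86). Therefore, the test statistic is larger for $\hat{\beta}_3$ even though $\hat{\beta}_3$ is smaller than $\hat{\beta}_2$.

12.4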

a. We are given $\hat{\beta}_1 = 3.1$, $s_{\hat{\beta}_1} = 2.3$, and $n = 25$.

$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$

The test statistic is $t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{3.1}{2.3} = 1.35$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = n - (k + 1) = 25 - (2 + 1) = 22$. From Table III, Appendix D, $t_{.05} = 1.717$. The rejection region is $t > 1.717$. Since the observed value of the test statistic does not fall in the rejection region $(t = 1.35 \not> 1.717)$, H0 is not rejected. There is insufficient evidence to indicate $\beta_1 > 0$ at $\alpha = .05$.

b. We are given $\hat{\beta}_2 = .92$, $s_{\hat{\beta}_2} = .27$, and $n = 25$.

$H_0: \beta_2 = 0$
$H_a: \beta_2 \neq 0$

The test statistic is $t = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = \frac{.92}{.27} = 3.41$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - (k + 1) = 25 - (2 + 1) = 22$. From Table III, Appendix D, $t_{.025} = 2.074$. The rejection region is $t < -2.074$ or $t > 2.074$. Since the observed value of the test statistic falls in the rejection region $(t = 3.41 > 2.074)$, H0 is rejected. There is sufficient evidence to indicate $\beta_2 \neq 0$ at $\alpha = .05$.

c. For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table III, Appendix D, with $df = n - (k + 1) = 25 - (2 + 1) = 22$, $t_{.05} = 1.717$. The confidence interval is:

$\hat{\beta}_1 \pm t_{.05}\, s_{\hat{\beta}_1} \Rightarrow 3.1 \pm 1.717(2.3) \Rightarrow 3.1 \pm 3.949 \Rightarrow (-.849, 7.049)$

We are 90% confident that $\beta_1$ falls between −.849 and 7.049.

d. For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table III, Appendix D, with $df = n - (k + 1) = 25 - (2 + 1) = 22$, $t_{.005} = 2.819$. The confidence interval is:

$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow .92 \pm 2.819(.27) \Rightarrow .92 \pm .761 \Rightarrow (.159, 1.681)$

We are 99% confident that $\beta_2$ falls between .159 and 1.681.

12.5

The number of degrees of freedom available for estimating $\sigma^2$ is $n - (k + 1)$, where k is the number of independent variables in the regression model. Each additional independent variable placed in the model causes a corresponding decrease in the degrees of freedom.
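The degrees-of-freedom rule and the coefficient calculations in Exercise 12.4 can be sketched as follows. The function names are hypothetical, and the critical values 1.717 and 2.819 are the Table III values quoted in that solution rather than computed here.

```python
def error_df(n, k):
    """Degrees of freedom for estimating sigma^2 with k independent variables."""
    return n - (k + 1)


def coefficient_t(beta_hat, std_error):
    """t statistic for H0: beta = 0."""
    return beta_hat / std_error


def coefficient_ci(beta_hat, std_error, t_crit):
    """Confidence interval beta_hat +/- t_crit * std_error."""
    margin = t_crit * std_error
    return beta_hat - margin, beta_hat + margin


# Exercise 12.4: n = 25 observations, k = 2 independent variables.
df = error_df(n=25, k=2)
t_b1 = coefficient_t(3.1, 2.3)
t_b2 = coefficient_t(0.92, 0.27)
ci_b1_90 = coefficient_ci(3.1, 2.3, t_crit=1.717)    # t_.05, df = 22
ci_b2_99 = coefficient_ci(0.92, 0.27, t_crit=2.819)  # t_.005, df = 22
print(df, round(t_b1, 2), round(t_b2, 2), ci_b1_90, ci_b2_99)
```

Adding a third independent variable to the same 25 observations would reduce `error_df` from 22 to 21, which is the decrease described in 12.5.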

12.6

a. For $x_2 = 1$ and $x_3 = 3$, $E(y) = 1 + 2x_1 + 1 - 3(3) = -7 + 2x_1$

The graph is:

[Graph of the line $E(y) = -7 + 2x_1$]

b. For $x_2 = -1$ and $x_3 = 1$, $E(y) = 1 + 2x_1 + (-1) - 3(1) = -3 + 2x_1$

The graph is:

[Graph of the line $E(y) = -3 + 2x_1$]

c. The two lines are parallel, each with a slope of 2. They have different y-intercepts.

d. The relationship will be parallel lines.

12.7

a. Yes. Since $R^2 = .92$ is close to 1, this indicates the model provides a good fit. Without knowledge of the units of the dependent variable, the value of SSE cannot be used to determine how well the model fits.

b. $H_0: \beta_1 = \beta_2 = \cdots = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.92/5}{(1 - .92)/[30 - (5 + 1)]} = 55.2$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 5$ and $\nu_2 = n - (k + 1) = 30 - (5 + 1) = 24$. From Table VI, Appendix D, $F_{.05} = 2.62$. The rejection region is $F > 2.62$. Since the observed value of the test statistic falls in the rejection region $(F = 55.2 > 2.62)$, H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting y at $\alpha = .05$.

12.8

No. There may be other independent variables that are important that have not been included in the model, while there may also be some variables included in the model which are not important. The only conclusion is that at least one of the independent variables is a good predictor of y.
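The global F statistic used in Exercise 12.7 depends only on $R^2$, $n$, and $k$, so it is easy to recompute. The sketch below uses a hypothetical helper name, and the critical value 2.62 is the Table VI value quoted in that solution.

```python
def global_f_from_r2(r2, n, k):
    """Global F statistic for H0: beta_1 = ... = beta_k = 0, computed from R^2."""
    df_error = n - (k + 1)
    return (r2 / k) / ((1 - r2) / df_error)


# Exercise 12.7: R^2 = .92, n = 30 observations, k = 5 independent variables.
f_stat = global_f_from_r2(0.92, n=30, k=5)
f_crit = 2.62  # F_.05 with 5 and 24 degrees of freedom, from the table lookup
reject_h0 = f_stat > f_crit
print(round(f_stat, 1), reject_h0)
```

A significant global F only says that at least one coefficient is nonzero; as the answer to 12.8 notes, it does not show that the model contains the right set of variables.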

12.9

a. $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$

b. $H_0: \beta_1 = \beta_2 = \cdots = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 103.0$. The p-value is $p < .01$. Since the p-value is small $(p < .01 < .05)$, H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting a player's salary at $\alpha = .05$.

c. $R_a^2$ = R-Sq(adj) = .89. 89% of the total sample variation of the players' salaries can be explained by the model containing x1, x2, x3, x4, and x5, adjusted for the sample size and the number of parameters in the model.

d. To determine if points scored in the last 4 minutes is a statistically useful predictor of an NBA player's salary, we test $H_0: \beta_i = 0$ against $H_a: \beta_i \neq 0$, where $\beta_i$ is the coefficient of points scored in the last 4 minutes.

The p-value for the test is $p = .9670$. Since the p-value is large $(p = .9670 > .05)$, H0 is not rejected. There is insufficient evidence to indicate that the points scored in the last 4 minutes is a statistically useful predictor of an NBA player's salary at $\alpha = .05$.

e. For each one-attempt increase in the number of last minute field goal attempts, the mean player's salary is estimated to increase by 470.9 units, holding all other variables constant.

f. To determine if last minute field goal attempts is a statistically useful predictor of an NBA player's salary, we test $H_0: \beta_i = 0$ against $H_a: \beta_i \neq 0$, where $\beta_i$ is the coefficient of last minute field goal attempts.

The p-value for the test is $p = .0236$. Since the p-value is small $(p = .0236 < .05)$, H0 is rejected. There is sufficient evidence to indicate that last minute field goal attempts is a statistically useful predictor of an NBA player's salary at $\alpha = .05$.

12.10

a. The 1st-order model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.

b. $\beta_1$ represents the change in revenue for every 1-tweet increase in tweet rate, holding PN-ratio constant.

c. $\beta_2$ represents the change in revenue for every 1-unit increase in PN-ratio, holding tweet rate constant.

d. $R^2 = .945$. 94.5% of the sample variation in revenue about its mean is explained by the model containing tweet rate and PN-ratio. $R_a^2 = .940$. 94.0% of the sample variation in revenue about its mean is explained by the model containing tweet rate and PN-ratio, adjusting for the sample size and the number of parameters in the model.

e. $H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.945/2}{(1 - .945)/[24 - (2 + 1)]} = 180.41$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1) = 24 - (2 + 1) = 21$. From Table VI, Appendix D, $F_{.05} = 3.47$. The rejection region is $F > 3.47$. Since the observed value of the test statistic falls in the rejection region $(F = 180.41 > 3.47)$, H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at $\alpha = .05$.

f. For testing $H_0: \beta_1 = 0$, the p-value is $p < .0001$. Since the p-value is less than α $(p < .0001 < .01)$, H0 is rejected. There is sufficient evidence of a linear relationship between revenue and tweet rate, adjusted for PN-ratio.

For testing $H_0: \beta_2 = 0$, the p-value is $p < .0001$. Since the p-value is less than α $(p < .0001 < .01)$, H0 is rejected. There is sufficient evidence of a linear relationship between revenue and PN-ratio, adjusted for tweet rate.

12.11

a. The first-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$.

b. $\hat{\beta}_1$: For each unit increase in air quality, the mean satisfaction score is estimated to increase by .122.
$\hat{\beta}_2$: For each unit increase in temperature, the mean satisfaction score is estimated to increase by .018.
$\hat{\beta}_3$: For each unit increase in odor/aroma, the mean satisfaction score is estimated to increase by .124.
$\hat{\beta}_4$: For each unit increase in music, the mean satisfaction score is estimated to increase by .119.
$\hat{\beta}_5$: For each unit increase in noise/sound level, the mean satisfaction score is estimated to increase by .101.
$\hat{\beta}_6$: For each unit increase in overall image, the mean satisfaction score is estimated to increase by .463.

c. We are 99% confident that mean satisfaction will increase between .350 and .576 units for each unit increase in overall image.

d. $R_a^2 = .501$. 50.1% of the sample variation in satisfaction scores is explained by the model including the 6 variables, adjusted for the number of variables and the sample size.

e. Yes. The test statistic is $F = 71.42$. Using MINITAB with $\nu_1 = k = 6$ and $\nu_2 = n - (k + 1) = 422 - (6 + 1) = 415$, the p-value is $p = P(F > 71.42) \approx 0$. Since the p-value is less than α $(p \approx 0 < .01)$, H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting hotel image at $\alpha = .01$.

12.12

a. The two properties are that the sum of the errors of prediction is 0 and the sum of the squares of the errors of prediction is SSE.

b. The estimated coefficient of betweenness centrality is $\hat{\beta} = .42$. For each unit change in the betweenness centrality score, the mean lead-user rating is estimated to increase by .42, holding all other variables constant.

c. Since the p-value is less than α $(p = .002 < .05)$, H0 is rejected. There is sufficient evidence to indicate that there is a significant linear relationship between betweenness centrality and lead-user rating, holding all other variables constant, at $\alpha = .05$.

12.13

a. To determine if the overall model is useful, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

The p-value is $p = .049$. Since the p-value is less than α $(p = .049 < .05)$, H0 is rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$.

$R^2 = .075$. 7.5% of the sample variation of the percentages of silver in the alloy can be explained by the model.

b. To determine if the overall model is useful, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

The p-value is $p < .001$. Since the p-value is less than α $(p < .001 < .05)$, H0 is rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$. $R^2 = .783$. 78.3% of the sample variation of the percentages of iron in the alloy can be explained by the model.

c. Using MINITAB, the relationship between percentage of silver and proportion of aluminum scraps from cans could look something like:

[Scatterplot of y-silver vs x1: y-silver on the vertical axis (0.00050 to 0.00200), x1 on the horizontal axis (0 to 100)]

To determine if the relationship between percentage of silver and the proportion of aluminum scraps from cans is significant, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The p-value is $p = .015$. Since the p-value is less than α $(p = .015 < .05)$, H0 is rejected. There is sufficient evidence to indicate the relationship between percentage of silver and the proportion of aluminum scraps from cans is significant at $\alpha = .05$.

d. Using MINITAB, the relationship between percentage of iron and proportion of aluminum scraps from cans could look something like:

[Scatterplot of y-iron vs x1: y-iron on the vertical axis (0.0 to 2.0), x1 on the horizontal axis (75 to 100)]

To determine if the relationship between percentage of iron and the proportion of aluminum scraps from cans is significant, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The p-value is $p < .001$. Since the p-value is less than α $(p < .001 < .05)$, H0 is rejected. There is sufficient evidence to indicate the relationship between percentage of iron and the proportion of aluminum scraps from cans is significant at $\alpha = .05$.

12.14

a. The least squares prediction equation is $\hat{y} = -295 + .122x_1 + .443x_2 + .793x_3 + .400x_4 + 1.742x_5 + .647x_6 - .754x_7 - .0542x_8 + .0353x_9$

b. $\hat{\beta}_1 = .122$. For each additional walk, the mean total number of runs scored is estimated to increase by .122.
$\hat{\beta}_2 = .443$. For each additional single, the mean total number of runs scored is estimated to increase by .443.
$\hat{\beta}_3 = .793$. For each additional double, the mean total number of runs scored is estimated to increase by .793.
$\hat{\beta}_4 = .400$. For each additional triple, the mean total number of runs scored is estimated to increase by .400.
$\hat{\beta}_5 = 1.742$. For each additional home run, the mean total number of runs scored is estimated to increase by 1.742.
$\hat{\beta}_6 = .647$. For each additional stolen base, the mean total number of runs scored is estimated to increase by .647.
$\hat{\beta}_7 = -.754$. For each additional time caught stealing, the mean total number of runs scored is estimated to decrease by .754.
$\hat{\beta}_8 = -.0542$. For each additional strike out, the mean total number of runs scored is estimated to decrease by .0542.
$\hat{\beta}_9 = .0353$. For each additional ground out, the mean total number of runs scored is estimated to increase by .0353.

c. $H_0: \beta_7 = 0$
$H_a: \beta_7 < 0$

The test statistic is $t = -1.03$ and the p-value is $p = .316/2 = .158$. Since the p-value is not less than α $(p = .158 \not< .05)$, H0 is not rejected. There is insufficient evidence to indicate a negative relationship exists between the total number of runs scored and the number of times caught stealing at $\alpha = .05$.

d. The 95% confidence interval is (1.248, 2.236). We are 95% confident that for each additional home run hit, the mean total number of runs scored will increase between 1.248 and 2.236 runs.

12.15

a. The first-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$.

b. To determine if the overall model is statistically useful for predicting sustainability index, we test:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 9.35$. The p-value is $p < .01$. Since the p-value is small $(p < .01 < .05)$, H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting sustainability index at $\alpha = .05$.

c. $R_a^2$ is the preferred measure of model fit. From the printout, $R_a^2 = .28$. This indicates that 28% of the total sample variation in sustainability index is explained by the model containing self-transcendence, passion, gender, children, firm size, and education level, adjusting for the sample size and the number of variables in the model.

d. $\hat{\beta}_1 = .36$. For each additional one-point increase in self-transcendence score, the mean sustainability index is estimated to increase by .36.
$\hat{\beta}_2 = -.21$. For each additional one-point increase in passion score, the mean sustainability index is estimated to decrease by .21.
$\hat{\beta}_3 = -.14$. We estimate the mean sustainability index for male restaurant owners to be .14 points lower than the mean sustainability index for female restaurant owners.
$\hat{\beta}_4 = -.11$. We estimate the mean sustainability index for restaurant owners with children to be .11 points lower than the mean sustainability index for restaurant owners without children.
$\hat{\beta}_5 = -.03$. For each additional one-person increase in firm size, the mean sustainability index is estimated to decrease by .03.
$\hat{\beta}_6 = .19$. For each additional one-point increase in education level, the mean sustainability index is estimated to increase by .19.

Independent variables will be considered useful predictors of sustainability if the corresponding test for that variable results in a p-value < .05. Looking at the printout, we see that the following variables will be considered useful predictors of sustainability: Self-transcendence, Passion, and Education level.


12.16

a. Let $x_1$ = latitude, $x_2$ = longitude, and $x_3$ = depth. The 1st-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.

b. Using MINITAB, the results are:

Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, DEPTH-FT

The regression equation is
ARSENIC = - 86868 - 2219 LATITUDE + 1542 LONGITUDE - 0.350 DEPTH-FT

327 cases used, 1 cases contain missing values

Predictor   Coef      SE Coef   T       P
Constant    -86868    31224     -2.78   0.006
LATITUDE    -2218.8   526.8     -4.21   0.000
LONGITUDE   1542.2    373.1     4.13    0.000
DEPTH-FT    -0.3496   0.1566    -2.23   0.026

S = 103.301   R-Sq = 12.8%   R-Sq(adj) = 12.0%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression         3   505770    168590   15.80   0.000
Residual Error   323   3446791   10671
Total            326   3952562

Source      DF   Seq SS
LATITUDE     1   132448
LONGITUDE    1   320144
DEPTH-FT     1    53179

The least squares model is: $\hat{y} = -86{,}868 - 2{,}218.8x_1 + 1{,}542.2x_2 - .3496x_3$

c. $\hat{\beta}_1 = -2{,}218.8$. For each unit increase in latitude, the mean arsenic level is estimated to decrease by 2,218.8, holding longitude and depth constant.
$\hat{\beta}_2 = 1{,}542.2$. For each unit increase in longitude, the mean arsenic level is estimated to increase by 1,542.2, holding latitude and depth constant.
$\hat{\beta}_3 = -.3496$. For each unit increase in depth, the mean arsenic level is estimated to decrease by .3496, holding latitude and longitude constant.

d. From the printout, $s = 103.301$. We would expect about 95% of all observations to fall within $2s = 2(103.301) = 206.602$ units of their predicted values.

e. From the printout, $R^2 = .128$. 12.8% of the total sample variation of the arsenic levels is explained by the model containing latitude, longitude, and depth. From the printout, $R_a^2 = .120$. 12.0% of the total sample variation of the arsenic levels is explained by the model containing latitude, longitude, and depth, adjusting for the sample size and number of independent variables in the model.

f. To determine if the model is adequate, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

From the printout, the test statistic is $F = 15.80$ and the p-value is $p = 0.000$. Since the p-value is less than α $(p = 0.000 < .05)$, H0 is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .05$.

g. Although the model was found to be adequate for $\alpha = .05$, it is not a particularly good model. The $R^2$ value is only .128 and $R_a^2 = .120$. Only about 12% of the variation in arsenic values is explained by the model.

12.17

a. Using MINITAB, the results of fitting the first-order model are:

Regression Analysis: DESIRE versus GENDER, SELFESTM, BODYSAT, IMPREAL

The regression equation is
DESIRE = 14.0 - 2.19 GENDER - 0.0479 SELFESTM - 0.322 BODYSAT + 0.493 IMPREAL

Predictor   Coef       SE Coef   T       P
Constant    14.0107    0.7753    18.07   0.000
GENDER      -2.1865    0.6766    -3.23   0.001
SELFESTM    -0.04794   0.03669   -1.31   0.193
BODYSAT     -0.3223    0.1435    -2.25   0.026
IMPREAL     0.4931     0.1274    3.87    0.000

S = 2.25087   R-Sq = 49.8%   R-Sq(adj) = 48.5%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression         4   827.83    206.96   40.85   0.000
Residual Error   165   835.95    5.07
Total            169   1663.79

Source     DF   Seq SS
GENDER      1   674.64
SELFESTM    1    57.66
BODYSAT     1    19.62
IMPREAL     1    75.91

The least squares prediction equation is $\hat{y} = 14.0107 - 2.1865x_1 - .04794x_2 - .3223x_3 + .4931x_4$

b. $\hat{\beta}_0 = 14.0107$. This has no meaning other than the y-intercept.
$\hat{\beta}_1 = -2.1865$. The mean value of desire to have cosmetic surgery is estimated to be 2.1865 units lower for males than females, holding all other variables constant.
$\hat{\beta}_2 = -0.04794$. For each unit increase in self-esteem, the mean value of desire to have cosmetic surgery is estimated to decrease by 0.04794 units, holding all other variables constant.
$\hat{\beta}_3 = -0.3223$. For each unit increase in body satisfaction, the mean value of desire to have cosmetic surgery is estimated to decrease by .3223 units, holding all other variables constant.
$\hat{\beta}_4 = 0.4931$. For each unit increase in impression of reality TV, the mean value of desire to have cosmetic surgery is estimated to increase by 0.4931 units, holding all other variables constant.

c. To determine if the overall model is useful for predicting desire to have cosmetic surgery, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$

From the printout, the test statistic is $F = 40.85$ and the p-value is $p = 0.000$. Since the p-value is less than α $(p = 0.000 < .01)$, H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting desire to have cosmetic surgery at $\alpha = .01$.

d. $R_a^2$ is the preferred measure of model fit. From the printout, $R_a^2 = .485$. This indicates that 48.5% of the total sample variation in desire values is explained by the model containing gender, self-esteem, body satisfaction and impression of reality TV, adjusting for the sample size and the number of variables in the model.

e. To determine if the desire to have cosmetic surgery decreases linearly as level of body satisfaction increases, we test:

$H_0: \beta_3 = 0$
$H_a: \beta_3 < 0$

From the printout, the test statistic is $t = -2.25$ and the p-value is $p = .026/2 = .013$. Since the p-value is less than α $(p = 0.013 < .05)$, H0 is rejected. There is sufficient evidence to indicate the desire to have cosmetic surgery decreases linearly as level of body satisfaction increases, holding all other variables constant, at $\alpha = .05$.

f. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n - (k + 1) = 170 - (4 + 1) = 165$, $t_{.025} \approx 1.98$. The 95% confidence interval is

$\hat{\beta}_4 \pm t_{.025}\, s_{\hat{\beta}_4} \Rightarrow .4931 \pm 1.98(.1274) \Rightarrow .4931 \pm .2523 \Rightarrow (.2408, .7454)$

We are 95% confident that the increase in mean desire for cosmetic surgery is between .2408 and .7454 for each unit increase in impression of reality TV, holding all other variables constant.

12.18

a. From MINITAB, the output is:

Regression Analysis: DDT versus Mile, Length, Weight

The regression equation is
DDT = - 108 + 0.0851 Mile + 3.77 Length - 0.0494 Weight

Predictor   Coef       SE Coef   T       P
Constant    -108.07    62.70     -1.72   0.087
Mile        0.08509    0.08221   1.03    0.302
Length      3.771      1.619     2.33    0.021
Weight      -0.04941   0.02926   -1.69   0.094

S = 97.48   R-Sq = 3.9%   R-Sq(adj) = 1.8%

Analysis of Variance
Source           DF    SS        MS      F      P
Regression         3   53794     17931   1.89   0.135
Residual Error   140   1330210   9501
Total            143   1384003

The least squares prediction equation is: $\hat{y} = -108.07 + 0.08509x_1 + 3.771x_2 - 0.04941x_3$

b. $s = 97.48$. We would expect about 95% of the observed values of DDT level to fall within $2s = 2(97.48) = 194.96$ units of their least squares predicted values.

c. To determine if at least one of the variables is useful in predicting the DDT level, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 1.89$ and the p-value is $p = .135$. Since the p-value is not less than α $(p = .135 \not< .05)$, H0 is not rejected. There is insufficient evidence to indicate at least one of the variables is useful in predicting the DDT level at $\alpha = .05$.

d. To determine if DDT level increases as length increases, we test:

$H_0: \beta_2 = 0$
$H_a: \beta_2 > 0$

The test statistic is $t = 2.33$ and the p-value is $p = .021/2 = .0105$. Since the p-value is less than α $(p = .0105 < .05)$, H0 is rejected. There is sufficient evidence to indicate that DDT level increases as length increases, holding the other variables constant, at $\alpha = .05$. The observed significance level is $p = .0105$.

e. For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table III, Appendix D, with $df = n - (k + 1) = 144 - (3 + 1) = 140$, $t_{.025} = 1.96$. The 95% confidence interval is:

$\hat{\beta}_3 \pm t_{\alpha/2}\, s_{\hat{\beta}_3} \Rightarrow -0.049 \pm 1.96(0.02926) \Rightarrow -0.049 \pm 0.057 \Rightarrow (-0.106, 0.008)$

We are 95% confident that the mean DDT level will change by between −0.106 and 0.008 for each additional unit increase in weight, holding length and mile constant. Since 0 is in the interval, there is no evidence that weight and DDT level are linearly related.

12.19

12.20

a.

If the theory is correct, then 𝛽 will be positive as a positive linear relationship is expected between y and x1.

b.

If the theory is correct, then 𝛽 will be positive as a positive linear relationship is expected between y and x2.

c.

β will be positive (mean for the envied target being a friend is greator than the mean for the envied target not being a friend.

a.

Using MINITAB, the results are: Regression Equation Academic Rep Score

=

-23.6 + 0.000446 Avg Financial Aid - 0.000267 Avg Net Cost + 0.001550 Early Career Pay + 5.1 % High Meaning - 36.0 % STEM

Coefficients Term Constant Avg Financial Aid Avg Net Cost Early Career Pay % High Meaning % STEM

Coef SE Coef -23.6 26.8 0.000446 0.000224 -0.000267 0.000207 0.001550 0.000381 5.1 36.2 -36.0 12.2

T-Value -0.88 1.99 -1.29 4.07 0.14 -2.96

P-Value 0.385 0.053 0.205 0.000 0.888 0.005

VIF 3.65 1.87 4.88 1.57 4.32

Copyright © 2022 Pearson Education, Inc.


636 Chapter 12 Model Summary S R-sq 7.92041 68.52%

R-sq(adj) 64.94%

R-sq(pred) 60.11%

Analysis of Variance Source Regression Avg Financial Aid Avg Net Cost Early Career Pay % High Meaning % STEM Error Total

DF Adj SS 5 6007.43 1 248.98 1 103.89 1 1036.79 1 1.26 1 550.51 44 2760.25 49 8767.68

Adj MS 1201.49 248.98 103.89 1036.79 1.26 550.51 62.73

F-Value 19.15 3.97 1.66 16.53 0.02 8.78

P-Value 0.000 0.053 0.205 0.000 0.888 0.005

The fitted model is 𝑦 = −23.6 + .000446𝑥 − .000267𝑥 + .001550𝑥 + 5.1𝑥 − 36.0𝑥 . b.

The standard deviation is 𝑠 = 7.92041. Approximately 95% of the actual academic reputation scores will fall within 2𝑠 = 2 7.92041 = 15.84082 units of the least squares predition line.

c.

𝑅 = .6852. 68.52% of the sample variation in academic reputation scores is explained by the model containing the 5 variables.

d.

To determine if the overall model is adequate, we test:
H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 19.15 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is adequate at α = .05.

e.

For confidence coefficient .90, α = .10 and α/2 = .10/2 = .05. Using MINITAB, with df = n − (k + 1) = 50 − (5 + 1) = 44, t.05 = 1.680. The confidence interval is:

β̂4 ± t.05 s(β̂4) ⇒ 5.1 ± 1.680(36.2) ⇒ 5.1 ± 60.816 ⇒ (−55.716, 65.916)

Because 0 is contained in the interval, we do not have sufficient evidence to indicate that % High Meaning is a useful linear predictor of Academic Reputation score at the 90% confidence level.

12.21

a.

Using MINITAB, the results are:

Regression Analysis: Diameter versus MassFlux, HeatFlux

Analysis of Variance

Source      DF  Adj SS   Adj MS   F-Value  P-Value
Regression  2   0.11914  0.05957  1.00     0.391
MassFlux    1   0.10832  0.10832  1.82     0.198
HeatFlux    1   0.01082  0.01082  0.18     0.676
Error       15  0.89350  0.05957
Total       17  1.01264

Model Summary

S         R-sq    R-sq(adj)  R-sq(pred)
0.244063  11.77%  0.00%      0.00%

Coefficients

Term      Coef       SE Coef   T-Value  P-Value  VIF
Constant  1.088      0.184     5.92     0.000
MassFlux  -0.000234  0.000174  -1.35    0.198    1.00
HeatFlux  -0.080     0.188     -0.43    0.676    1.00

Regression Equation

Diameter = 1.088 - 0.000234 MassFlux - 0.080 HeatFlux

The fitted model is ŷ = 1.088 − .000234x1 − .080x2. To determine if the overall model is adequate, we test:
H0: β1 = β2 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 1.00 and the p-value is p = .391. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate the overall model is adequate for any α < .391.

b.

Using MINITAB, the results are:

Regression Analysis: Density versus MassFlux, HeatFlux

Analysis of Variance

Source      DF  Adj SS       Adj MS       F-Value  P-Value
Regression  2   1.92985E+11  96492327113  121.31   0.000
MassFlux    1   6614680108   6614680108   8.32     0.011
HeatFlux    1   1.86370E+11  1.86370E+11  234.31   0.000
Error       15  11931156939  795410463
Total       17  2.04916E+11

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
28203.0  94.18%  93.40%     91.27%

Coefficients

Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  -1030   21237    -0.05    0.962
MassFlux  -57.9   20.1     -2.88    0.011    1.00
HeatFlux  332037  21692    15.31    0.000    1.00

Regression Equation

Density = -1030 - 57.9 MassFlux + 332037 HeatFlux

The fitted model is ŷ = −1,030 − 57.9x1 + 332,037x2. To determine if the overall model is adequate, we test:
H0: β1 = β2 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 121.31 and the p-value is p = .000. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the overall model is adequate for any reasonable value of α.

c.

The density is better predicted by mass flux and heat flux. The model predicting diameter is not adequate and should not be used, while the model predicting density is adequate.

12.22

To determine if the model is useful, we test:
H0: β1 = β2 = ⋯ = β18 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = 1.06.

The rejection region requires α = .05 in the upper tail of the F distribution with ν1 = k = 18 and ν2 = n − (k + 1) = 20 − (18 + 1) = 1. From Table VI, Appendix D, F.05 ≈ 245.9. The rejection region is F > 245.9. Since the observed value of the test statistic does not fall in the rejection region (F = 1.06 ≯ 245.9), H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .05. Note: Although R² is large, there are so many variables in the model that ν2 is small.

12.23

a.

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is t = β̂1/s(β̂1) = 5.00.

Since no α was given, we will use α = .05. The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution. From Table III, Appendix D, with df = n − (k + 1) = 29 − (5 + 1) = 23, t.025 = 2.069. The rejection region is t < −2.069 or t > 2.069.

Since the observed value of the test statistic falls in the rejection region (t = 5.00 > 2.069), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between vintage year and the logarithm of price, adjusting for all other variables, at α = .05.

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = β̂2/s(β̂2) = 5.00.

The rejection region is t < −2.069 or t > 2.069. Since the observed value of the test statistic falls in the rejection region (t = 5.00 > 2.069), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between average growing season temperature and the logarithm of price, adjusting for all other variables, at α = .05.

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = β̂3/s(β̂3) = −4.00.

The rejection region is t < −2.069 or t > 2.069. Since the observed value of the test statistic falls in the rejection region (t = −4.00 < −2.069), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between Sept./Aug. rainfall and the logarithm of price, adjusting for all other variables, at α = .05.

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = β̂4/s(β̂4) = 3.00.

The rejection region is t < −2.069 or t > 2.069. Since the observed value of the test statistic falls in the rejection region (t = 3.00 > 2.069), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between rainfall in months preceding vintage and the logarithm of price, adjusting for all other variables, at α = .05.

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = β̂5/s(β̂5) = .015.

The rejection region is t < −2.069 or t > 2.069. Since the observed value of the test statistic does not fall in the rejection region (t = .015 ≯ 2.069), H0 is not rejected. There is insufficient evidence to indicate that there is a linear relationship between average September temperature and the logarithm of price, adjusting for all other variables, at α = .05.

b.

β̂1 = .03, e^.03 − 1 = .030

We estimate that the mean price will increase by 3.0% for each additional year increase in x1, vintage year (with all other variables held constant).

β̂2 = .60, e^.60 − 1 = .822

We estimate that the mean price will increase by 82.2% for each additional degree increase in x2, average growing season temperature in °C (with all other variables held constant).

β̂3 = −.004, e^−.004 − 1 = −.004

We estimate that the mean price will decrease by .4% for each additional centimeter increase in x3, Sept./Aug. rainfall in cm (with all other variables held constant).

β̂4 = .0015, e^.0015 − 1 = .0015

We estimate that the mean price will increase by .15% for each additional centimeter increase in x4, rainfall in months preceding vintage in cm (with all other variables held constant).

β̂5 = .008, e^.008 − 1 = .008

We estimate that the mean price will increase by .8% for each additional degree increase in x5, average Sept. temperature in °C (with all other variables held constant).

c.

R² = .85. 85% of the sample variation in the logarithm of price values is explained by the model containing the 5 variables. s = .30. Approximately 95% of the values of the logarithm of price will fall within 2s = 2(.30) = .60 units of their predicted values. This model appears to be a fairly good model. The standard deviation is fairly small and the R² value is fairly large. Four of the five independent variables are significant. A better model might be one that does not include x5, the average September temperature, because it was not statistically significant.
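The percentage interpretations in part b come from the fact that the model is fit to the logarithm of price: the solution converts each estimate β̂ into a proportional change of e^β̂ − 1. A minimal Python sketch of that arithmetic follows, using the estimates quoted in part b. The function name pct_change and the dictionary labels are illustrative choices, not part of the text.

```python
import math

def pct_change(beta):
    """Proportional change in price for a one-unit increase in x
    when the model is fit to ln(price): e^beta - 1."""
    return math.exp(beta) - 1

# Estimated coefficients as quoted in part b of the solution.
estimates = {
    "x1 vintage year": 0.03,
    "x2 growing season temperature": 0.60,
    "x3 Sept./Aug. rainfall": -0.004,
    "x4 rainfall preceding vintage": 0.0015,
    "x5 September temperature": 0.008,
}

changes = {name: pct_change(b) for name, b in estimates.items()}
for name, change in changes.items():
    print(f"{name}: {change * 100:.2f}%")
```

For small estimates, e^β̂ − 1 is nearly equal to β̂ itself, which is why only the x2 figure (.822 versus .60) differs noticeably from its coefficient.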

12.24

a.

From MINITAB, the output is:

Regression Analysis: Labor versus Pounds, Units, Weight

The regression equation is
Labor = 132 + 2.73 Pounds + 0.0472 Units - 2.59 Weight

Predictor  Coef     SE Coef  T      P
Constant   131.92   25.69    5.13   0.000
Pounds     2.726    2.275    1.20   0.248
Units      0.04722  0.09335  0.51   0.620
Weight     -2.5874  0.6428   -4.03  0.001

S = 9.810   R-Sq = 77.0%   R-Sq(adj) = 72.7%

Analysis of Variance

Source          DF  SS      MS      F      P
Regression      3   5158.3  1719.4  17.87  0.000
Residual Error  16  1539.9  96.2
Total           19  6698.2

Source  DF  Seq SS
Pounds  1   3400.6
Units   1   198.4
Weight  1   1559.3

The least squares equation is ŷ = 131.92 + 2.726x1 + .04722x2 − 2.5874x3.

b.

To test the usefulness of the model, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = MSR/MSE = 1719.4/96.2 = 17.87.

The rejection region requires α = .01 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 20 − (3 + 1) = 16. From Table VIII, Appendix D, F.01 = 5.29. The rejection region is F > 5.29. Since the observed value of the test statistic falls in the rejection region (F = 17.87 > 5.29), H0 is rejected. There is sufficient evidence to indicate a relationship exists between hours of labor and at least one of the independent variables at α = .01.

c.

H0: β2 = 0
Ha: β2 ≠ 0
The test statistic is t = .51 and the p-value is p = .620. Since the p-value is not less than α (p = .620 ≮ .05), do not reject H0. There is insufficient evidence to indicate a relationship exists between hours of labor and percentage of units shipped by truck, all other variables held constant, at α = .05.

d.

R² is printed as R-Sq. R² = .770. We conclude that 77% of the sample variation of the labor hours is explained by the regression model, including the independent variables pounds shipped, percentage of units shipped by truck, and weight.

e.

If the average number of pounds per shipment increases from 20 to 21, the estimated change in mean number of hours of labor is −2.587. Thus, it will cost $7.50(2.587) = $19.4025 less, if the variables x1 and x2 are held constant.

f.

Since s = 9.81, we can predict the hours of labor to within approximately ±2s = ±2(9.81) = ±19.62 hours.

g.

No. Regression analysis only determines if variables are related. It cannot be used to determine cause and effect.
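The T column and the F value on the printout in part a are derived quantities: each t statistic is the coefficient divided by its standard error, and F is the regression mean square divided by the error mean square. The short Python sketch below recomputes them from the printed figures; the variable names are illustrative, and small differences from the printout are expected because the printed inputs are already rounded.

```python
# Coefficients and standard errors copied from the MINITAB printout in part a.
coef = {"Constant": 131.92, "Pounds": 2.726, "Units": 0.04722, "Weight": -2.5874}
se = {"Constant": 25.69, "Pounds": 2.275, "Units": 0.09335, "Weight": 0.6428}

# t statistic for each term: estimate divided by its standard error.
t_values = {term: coef[term] / se[term] for term in coef}

# Global F statistic: MS(Regression) / MS(Error), each mean square being SS / df.
ms_regression = 5158.3 / 3
ms_error = 1539.9 / 16
f_value = ms_regression / ms_error

for term, t in t_values.items():
    print(f"{term}: t = {t:.2f}")
print(f"F = {f_value:.2f}")
```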

12.25

a.

For x1 = 1, x2 = 10, x3 = 5, and x4 = 2, ŷ = 3.58 + .01(1) − .06(10) − .01(5) + .42(2) = 3.78

b.

For x1 = 0, x2 = 8, x3 = 10, and x4 = 4, ŷ = 3.58 + .01(0) − .06(8) − .01(10) + .42(4) = 4.68
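Both predictions substitute the given x values into the same fitted first-order equation. As a quick check, the substitution can be sketched in Python as below; the helper name predict and the list layout are assumptions made for illustration only.

```python
# Fitted coefficients from the solution: intercept first, then x1 through x4.
COEFFICIENTS = [3.58, 0.01, -0.06, -0.01, 0.42]

def predict(x_values):
    """Return y-hat = b0 + b1*x1 + ... + b4*x4 for one set of x values."""
    intercept, *slopes = COEFFICIENTS
    return intercept + sum(b * x for b, x in zip(slopes, x_values))

y_hat_a = predict([1, 10, 5, 2])   # part a
y_hat_b = predict([0, 8, 10, 4])   # part b
print(round(y_hat_a, 2), round(y_hat_b, 2))
```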

12.26

a.

For x1 = 10, x2 = 0, and x3 = 1, ŷ = 52,484 + 2,941(10) + 16,880(0) + 11,108(1) = 93,002.

b.

For x1 = 10, x2 = 1, and x3 = 0, ŷ = 52,484 + 2,941(10) + 16,880(1) + 11,108(0) = 98,774.

c.

A 95% prediction interval is preferred over point estimates because point estimates do not take into account variability. The prediction intervals take into account the variability in the salaries of individuals with the same qualifications.

12.27

a.

The confidence interval is (13.42, 14.31). We are 95% confident that the mean desire to have cosmetic surgery is between 13.42 and 14.31 for females with a self-esteem of 24, body satisfaction of 3, and impression of reality TV of 4.

b.

The confidence interval is (8.79, 10.89). We are 95% confident that the mean desire to have cosmetic surgery is between 8.79 and 10.89 for males with a self-esteem of 22, body satisfaction of 9, and impression of reality TV of 4.

12.28

From the printout, the 90% prediction interval is (−143.2178, 180.9784). We are 90% confident that an actual DDT level for a fish caught 300 miles upstream that is 40 centimeters long and weighs 800 grams will be between −143.2178 and 180.9784. Since the DDT level cannot be negative, the interval would be between 0 and 180.9784.

12.29

a.

To determine if the overall model is useful for predicting y, we test:
H0: β1 = β2 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 5.16 and the p-value is p = .015. Since the p-value is less than α (p = .015 < .10), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y at α = .10.


12.30

b.

The 95% prediction interval is (32.5671, 58.9371). We are 95% confident that the actual Democratic vote share for a single year when the charisma of the Democratic candidate is 75 points and they are running against an incumbent Republican president will be between 32.5671% and 58.9371%.

c.

The 95% confidence interval is (41.4671, 50.0371). We are 95% confident that the mean Democratic vote share for all years when the charisma of the Democratic candidates is 75 points and they are running against an incumbent Republican president will be between 41.4671% and 50.0371%.

d.

Yes. The confidence interval for the mean will always be narrower than the prediction interval for an actual value. Predicting an actual value involves two sources of error, while estimating the mean involves only one. The first is the error in locating the mean of the distribution; this is the only error involved when estimating the mean. The second arises because, once the mean is located, the actual value can still vary around it.
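The point in part d can be checked directly against the two intervals reported in parts b and c: both are centered on the same fitted value, and only their half-widths differ. The Python sketch below does that comparison; the helper names center and half_width are illustrative.

```python
# Intervals as reported in parts b and c of the solution.
prediction_interval = (32.5671, 58.9371)
confidence_interval = (41.4671, 50.0371)

def center(interval):
    """Midpoint of an interval, which is the fitted value y-hat."""
    low, high = interval
    return (low + high) / 2

def half_width(interval):
    """Half the length of an interval (its margin of error)."""
    low, high = interval
    return (high - low) / 2

print(center(prediction_interval), center(confidence_interval))
print(half_width(prediction_interval), half_width(confidence_interval))
```

Both intervals share the same midpoint, while the prediction interval's margin is roughly three times that of the confidence interval, reflecting the extra variation of an individual value around its mean.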

Yes, we agree. The fitted regression model is ŷ = −86,868 − 2,218.8x1 + 1,542.2x2 − .3496x3. Because the estimated coefficients for latitude and depth are negative, arsenic levels are predicted to be highest when latitude and depth are low. Because the estimated coefficient for longitude is positive, arsenic levels are predicted to be highest when longitude is high. The lowest value of latitude is 23.7547, the maximum longitude is 90.6617, and the minimum depth is 25. Using MINITAB, the output is:

Predicted Values for New Observations

New Obs  Fit     SE Fit  95% CI            95% PI
1        232.54  23.27   (186.76, 278.31)  (24.21, 440.86)X

X denotes a point that is an outlier in the predictors.

Values of Predictors for New Observations

New Obs  LATITUDE  LONGITUDE  DEPTH-FT
1        23.8      90.7       25.0

From the printout, the 95% prediction interval is (24.21, 440.86). We are 95% confident that the actual arsenic level will be between 24.21 and 440.86 when the latitude is 23.7547, longitude is 90.6617, and depth is 25.

12.31

You would look up the number of walks (x1), singles (x2), doubles (x3), triples (x4), home runs (x5), stolen bases (x6), caught stealing (x7), strikeouts (x8), and outs (x9) for your favorite team. Then use the following fitted regression line to predict the number of runs scored:

ŷ = −295 + .122x1 + .443x2 + .793x3 + .400x4 + 1.742x5 + .647x6 − .754x7 − .0542x8 + .0353x9

12.32

Using MINITAB, the results are:

Settings

Variable           Setting
Avg Financial Aid  18236
Avg Net Cost       13463
Early Career Pay   64500
% High Meaning     .46
% STEM             .23

Prediction

Fit      SE Fit   95% CI              95% PI
75.0210  2.51812  (69.9461, 80.0960)  (58.2712, 91.7709)

The 95% prediction interval is (58.2712, 91.7709). We are 95% confident that the academic reputation score of a university with the above characteristics is between 58.27 and 91.77. The University of Virginia had these characteristics and had an academic reputation score of 76. This observation is within the 95% prediction interval.

12.33

a.

The first-order model would be E(y) = β0 + β1x1 + β2x2 + β3x3.

b.

Using MINITAB, the results are:

Regression Analysis: Project versus Intrapersonal, Stress, Mood

Analysis of Variance

Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression     3   70.29   23.429  2.66     0.077
Intrapersonal  1   44.25   44.251  5.03     0.037
Stress         1   34.19   34.186  3.89     0.063
Mood           1   10.48   10.482  1.19     0.289
Error          19  167.08  8.794
Total          22  237.37

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
2.96544  29.61%  18.50%     3.79%

Coefficients

Term           Coef     SE Coef  T-Value  P-Value  VIF
Constant       86.90    3.20     27.17    0.000
Intrapersonal  -0.2099  0.0936   -2.24    0.037    1.06
Stress         0.1515   0.0769   1.97     0.063    1.06
Mood           0.0733   0.0671   1.09     0.289    1.09

Regression Equation

Project = 86.90 - 0.2099 Intrapersonal + 0.1515 Stress + 0.0733 Mood

The fitted regression model is ŷ = 86.90 − .2099x1 + .1515x2 + .0733x3.

c.

To determine if the overall model is useful for predicting y, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 2.66 and the p-value is p = .077. Since the p-value is less than α (p = .077 < .10), H0 is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y at α = .10.

d.

R²adj = .185. 18.5% of the sample variation in the project scores is explained by the model containing the 3 variables, adjusted for the sample size and the number of parameters in the model. s = 2.96544 and 2s = 2(2.96544) = 5.93. Approximately 95% of the observed values of Project scores will fall within 5.93 units of their predicted values.

e.

Using MINITAB, the results are:

Prediction for Project

Regression Equation

Project = 86.90 - 0.2099 Intrapersonal + 0.1515 Stress + 0.0733 Mood

Variable       Setting
Intrapersonal  20
Stress         30
Mood           25

Fit      SE Fit    95% CI              95% PI
89.0837  0.892843  (87.2149, 90.9524)  (82.6017, 95.5656)

The 95% prediction interval is (82.60, 95.57). We are 95% confident that the actual Project score will be between 82.60 and 95.57 when the range of intrapersonal scores is 20, the range of stress management scores is 30, and the range of mood scores is 25.

12.34

a.

From MINITAB, the output is:

Regression Analysis: Man-Hours versus Capacity, Pressure, Type, Drum

The regression equation is
Man-Hours = - 3783 + 0.00875 Capacity + 1.93 Pressure + 3444 Type + 2093 Drum

Predictor  Coef       SE Coef    T      P
Constant   -3783      1205       -3.14  0.004
Capacity   0.0087490  0.0009035  9.68   0.000
Pressure   1.9265     0.6489     2.97   0.006
Type       3444.3     911.7      3.78   0.001
Drum       2093.4     305.6      6.85   0.000

S = 894.6   R-Sq = 90.3%   R-Sq(adj) = 89.0%

Analysis of Variance

Source          DF  SS         MS        F      P
Regression      4   230854854  57713714  72.11  0.000
Residual Error  31  24809761   800315
Total           35  255664615

Source    DF  Seq SS
Capacity  1   175007141
Pressure  1   490357
Type      1   17813091
Drum      1   37544266

Predicted Values for New Observations

New Obs  Fit   SE Fit  95.0% CI      95.0% PI
1        1936  239     (1449, 2424)  (48, 3825)

Values of Predictors for New Observations

New Obs  Capacity  Pressure  Type  Drum
1        150000    500       1.00  0.000000

The fitted regression line is ŷ = −3,783 + 0.00875x1 + 1.9265x2 + 3,444.3x3 + 2,093.4x4.

b.

To determine if the model is useful for predicting the number of man-hours needed, we test:
H0: β1 = β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 72.11 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .01), H0 is rejected. There is sufficient evidence that the model is useful for predicting man-hours at α = .01.

c.

The confidence interval is (1,449, 2,424). With 95% confidence, we can conclude that the mean number of man-hours for all boilers with characteristics x1 = 150,000, x2 = 500, x3 = 1, x4 = 0 will fall between 1,449 hours and 2,424 hours.
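The confidence interval in part c, and the wider prediction interval on the same printout, can be approximately reproduced from three printed quantities: Fit, SE Fit, and S. The sketch below uses the usual formulas, ŷ ± t·SE(fit) for the mean and ŷ ± t·√(s² + SE(fit)²) for an individual value. The critical value t.025 = 2.0395 for 31 degrees of freedom is an assumption taken from a t table, not from the printout, and the results match only to within rounding because the printed inputs are rounded.

```python
import math

# Quantities read from the MINITAB printout in part a.
fit = 1936.0      # predicted man-hours
se_fit = 239.0    # standard error of the fitted mean
s = 894.6         # regression standard deviation

# Assumed table value: t.025 with n - (k + 1) = 36 - 5 = 31 degrees of freedom.
T_CRITICAL = 2.0395

# Confidence interval for the mean: only the error in locating the mean.
ci_margin = T_CRITICAL * se_fit
ci = (fit - ci_margin, fit + ci_margin)

# Prediction interval: adds the variation of an individual value around the mean.
pi_margin = T_CRITICAL * math.sqrt(s**2 + se_fit**2)
pi = (fit - pi_margin, fit + pi_margin)

print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"95% PI: ({pi[0]:.0f}, {pi[1]:.0f})")
```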

12.35

a.

The response surface is a twisted surface in three-dimensional space.

b.

For x1 = 0, E(y) = 3 + 0 + 2x2 − 0x2 = 3 + 2x2
For x1 = 1, E(y) = 3 + 1 + 2x2 − 1x2 = 4 + x2
For x1 = 2, E(y) = 3 + 2 + 2x2 − 2x2 = 5

The plot of the lines is: [Figure: E(y) versus x2 for x1 = 0, 1, and 2]

c.

The lines are not parallel because interaction between x1 and x2 is present. Interaction between x1 and x2 means that the effect of x2 on y depends on what level x1 takes on.

d.

For x1 = 0, as x2 increases from 0 to 5, E(y) increases from 3 to 13. For x1 = 1, as x2 increases from 0 to 5, E(y) increases from 4 to 9. For x1 = 2, as x2 increases from 0 to 5, E(y) = 5.

e.

For x1 = 2 and x2 = 4, E(y) = 5. For x1 = 0 and x2 = 5, E(y) = 13. Thus, E(y) changes from 5 to 13.
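The three lines in part b all come from one interaction model, which from the expansions shown there is E(y) = 3 + x1 + 2x2 − x1x2. Collecting the x2 terms gives a slope of 2 − x1, so the slope changes with x1. A small Python sketch of this follows; the function names are illustrative.

```python
def mean_y(x1, x2):
    """E(y) = 3 + x1 + 2*x2 - x1*x2, the interaction model implied by part b."""
    return 3 + x1 + 2 * x2 - x1 * x2

def slope_in_x2(x1):
    """Change in E(y) per one-unit increase in x2 at a fixed x1: 2 - x1."""
    return 2 - x1

for x1 in (0, 1, 2):
    print(f"x1 = {x1}: intercept {mean_y(x1, 0)}, slope {slope_in_x2(x1)}, "
          f"E(y) at x2 = 5 is {mean_y(x1, 5)}")
```

Because the slope depends on x1, the lines cannot be parallel, which is the point made in part c.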



12.36

a.

R² = 1 − SSE/SSyy = 1 − 21/479 = .956

95.6% of the total variability of the y values is explained by this model.

b.

To test the utility of the model, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.956/3) / {(1 − .956)/[32 − (3 + 1)]} = 202.8

The rejection region requires α = .05 in the upper tail of the F distribution, with ν1 = k = 3 and ν2 = n − (k + 1) = 32 − (3 + 1) = 28. From Table VI, Appendix D, F.05 = 2.95. The rejection region is F > 2.95. Since the observed value of the test statistic falls in the rejection region (F = 202.8 > 2.95), H0 is rejected. There is sufficient evidence that the model is adequate for predicting y at α = .05.

c.

Since the interaction term is significant, the relationship between y and 𝑥 depends on the level of 𝑥 . From this graph, when 𝑥 = 0, as 𝑥 increases from 0 to 2, the value of y increases slightly. When 𝑥 = 2.0, as 𝑥 increases from 0 to 2, the value of y decreases a great deal.

d.

To determine if x1 and x2 interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = β̂3/s(β̂3) = 2.5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − (k + 1) = 32 − (3 + 1) = 28. From Table III, Appendix D, t.025 = 2.048. The rejection region is t < −2.048 or t > 2.048. Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 2.048), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .05.
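The calculations in parts a and b chain together: R² is computed from SSE and SSyy, and the global F statistic is then computed from R², n, and k. A minimal Python sketch of both steps is given below, using SSE = 21, SSyy = 479, n = 32, and k = 3 from the solution. The function names are illustrative, and R² is rounded to three decimals before computing F, which is an assumption about how the solution obtained 202.8.

```python
def r_squared(sse, ss_yy):
    """Coefficient of determination: R^2 = 1 - SSE / SSyy."""
    return 1 - sse / ss_yy

def f_from_r2(r2, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# Part a: R^2 from the sums of squares, rounded as reported in the solution.
r2 = round(r_squared(21, 479), 3)

# Part b: F statistic with n = 32 observations and k = 3 model terms.
f_stat = f_from_r2(r2, n=32, k=3)

print(r2, round(f_stat, 1))
```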

12.37

a.

The prediction equation is ŷ = −2.55 + 3.815x1 + 2.63x2 − 1.285x1x2

b.

For x2 = 10, ŷ = −2.55 + 3.815x1 + 2.63(10) − 1.285x1(10) = 23.75 − 9.035x1. The estimate of the slope of the line is −9.035.

c.

For x2 = 1, ŷ = −2.55 + 3.815x1 + 2.63(1) − 1.285x1(1) = .08 + 2.53x1
For x2 = 3, ŷ = −2.55 + 3.82x1 + 2.63(3) − 1.29x1(3) = 5.34 − .05x1
For x2 = 5, ŷ = −2.55 + 3.82x1 + 2.63(5) − 1.29x1(5) = 10.6 − 2.63x1

d.

If x1 and x2 interact, the effect of x1 on ŷ is different at different levels of x2. When x2 = 1, as x1 increases, ŷ also increases. When x2 = 5, as x1 increases, ŷ decreases.

e.

The hypotheses are:

H0: β3 = 0
Ha: β3 ≠ 0

f.

The test statistic is t = β̂3/s(β̂3) = −8.06

The rejection region requires α/2 = .01/2 = .005 in each tail of the t-distribution with df = n − (k + 1) = 15 − (3 + 1) = 11. From Table III, Appendix D, t.005 = 3.106. The rejection region is t < −3.106 or t > 3.106. Since the observed value of the test statistic falls in the rejection region (t = −8.06 < −3.106), H0 is rejected. There is sufficient evidence to indicate that x1 and x2 interact at α = .01.

12.38

a.

The model would be E(y) = β0 + β1x1 + β2x2.

b.

The model would be E(y) = β0 + β1x1 + β2x2 + β3x1x2.

c.

Since the lines on the graph are not parallel, we would expect the model in part b to fit the data better.

12.39

a.

The interaction model is E(y) = β0 + β1x1 + β2x2 + β3x1x2.

b.

For x2 = 2.5, E(y) = β0 + β1x1 + β2(2.5) + β3x1(2.5) = (β0 + 2.5β2) + (β1 + 2.5β3)x1. The change in revenue for every 1-tweet increase in tweet rate is (β1 + 2.5β3).

c.

For x2 = 5, E(y) = β0 + β1x1 + β2(5) + β3x1(5) = (β0 + 5β2) + (β1 + 5β3)x1. The change in revenue for every 1-tweet increase in tweet rate is (β1 + 5β3).

d.

For x1 = 100, E(y) = β0 + β1(100) + β2x2 + β3(100)x2 = (β0 + 100β1) + (β2 + 100β3)x2. The change in revenue for every 1-unit increase in PN-ratio is (β2 + 100β3).

e.

To determine if tweet rate and PN-ratio interact, we test: H0: β3 = 0

12.40

a.

The interaction model is E(y) = β0 + β1x1 + β2x2 + β3x1x2. To determine if the overall model is adequate, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 7.52 and the p-value is p < .01. Since the p-value is less than α (p < .01 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is useful at α = .05.

b.

To determine if time duration and payment plan interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = 2.50 and the p-value is p < .05. Since the p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate time duration and payment plan interact to affect intention to defect at α = .05.

12.41

c.

For every one-month increase in time duration for single payment plans, we estimate the intention to defect to increase by .099.

d.

For every one-month increase in time duration for single payment plans, we estimate the intention to defect to increase by .099 + .140 = .239.

a.

To determine if the overall model is adequate, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 31.98 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .01), H0 is rejected. There is sufficient evidence to indicate the overall model is useful at α = .01.

b.

To determine if the effect of perceived effect of experience on tilting (𝑥 ) on the rate of change of severity of tilting (y) depends on poker experience (𝑥 ), we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = −5.61 and the p-value is p < .001. Since the p-value is less than α (p < .001 < .01), H0 is rejected. There is sufficient evidence to indicate the effect of perceived effect of experience on tilting (𝑥 ) on the rate of change of severity of tilting (y) depends on poker experience (𝑥 ) at α = .01.

12.42

a.

The hypothesized regression model including the interaction between x1 and x2 would be: E(y) = β0 + β1x1 + β2x2 + β3x1x2



12.43


b.

If “x1 and x2 interact to affect y” then the effect of “number ahead in line” on the negative feelings score depends on the level of “number behind in line”. Also, the effect of “number behind in line” on the negative feelings score depends on the level of “number ahead in line”.

c.

Since the p-value is not small (p = .25), H0 is not rejected. There is insufficient evidence to indicate “number ahead in line” and “number behind in line” interact to affect the negative feelings score.

d.

β1 corresponds to x1, the number ahead in line. If the “negative feeling score” gets larger as the number of people ahead increases, then β1 is positive. β2 corresponds to x2, the number behind in line. If the “negative feeling score” gets lower as the number of people behind increases, then β2 is negative.

a.

The least squares prediction equation is ŷ = 11.779 − 1.972x1 + .585x2 − .553x1x2.

b.

For x1 = 1 and x2 = 5, ŷ = 11.779 − 1.972(1) + .585(5) − .553(1)(5) = 9.967.

c.

To determine if the model is adequate, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 45.086 and the p-value is p < .0001. Since the p-value is less than α (p < .0001 < .10), H0 is rejected. There is sufficient evidence to indicate the model is adequate in predicting desire to have cosmetic surgery at α = .10.

d.

R² = .439. 43.9% of the sample variation in the desire to have cosmetic surgery around its mean is explained by the model containing gender, impression of reality TV, and the interaction of the two variables.

e.

𝑠 = 2.350. Most of the observed values of desire will fall within 2𝑠 = 2(2.350) = 4.70 points of their predicted values.

f.

To determine if gender and impression of reality TV interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = −2.004 and the p-value is p = .0467. Since the p-value is less than α (p = .0467 < .10), H0 is rejected. There is sufficient evidence to indicate gender and impression of reality TV interact to affect desire to have cosmetic surgery at α = .10.

12.44

a.

If client credibility and linguistic delivery style interact, then the effect of client credibility on the likelihood value depends on the level of linguistic delivery style.

b.

To determine the overall model adequacy, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

c.

The test statistic is 𝐹 = 55.35 and the p-value is 𝑝 < .0005. Since the p-value is so small (𝑝 < .0005), H0 is rejected for any reasonable value of 𝛼. There is sufficient evidence to indicate that the model is adequate at 𝛼 > 0.0005.

d.

To determine if client credibility and linguistic delivery style interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0

e.

The test statistic is 𝑡 = 4.008 and the p-value is 𝑝 < 0.005. Since the p-value is so small (𝑝 < 0.005), H0 is rejected. There is sufficient evidence to indicate that client credibility and linguistic delivery style interact at 𝛼 > 0.005.

f.

When x1 = 22, the least squares line is: ŷ = 15.865 + 0.037(22) − 0.678x2 + 0.036x2(22) = 16.679 + 0.114x2. The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 22 is 0.114. When client credibility is equal to 22, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.114.

g.

When x1 = 46, the least squares line is: ŷ = 15.865 + 0.037(46) − 0.678x2 + 0.036x2(46) = 17.567 + 0.978x2. The estimated slope of the Likelihood-Linguistic delivery style line when client credibility is 46 is 0.978. When client credibility is equal to 46, for each additional point increase in linguistic delivery style, the mean likelihood is estimated to increase by 0.978.
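Parts f and g both fix client credibility (x1) and collect terms, so the fitted interaction model collapses to a line in linguistic delivery style (x2) with intercept b0 + b1·x1 and slope b2 + b3·x1. The Python sketch below carries out that step for both credibility values; the function name line_at is an illustrative choice.

```python
# Least squares estimates quoted in parts f and g of the solution.
B0, B1, B2, B3 = 15.865, 0.037, -0.678, 0.036

def line_at(x1):
    """Intercept and slope of y-hat as a function of x2 when x1 is held fixed."""
    intercept = B0 + B1 * x1
    slope = B2 + B3 * x1
    return intercept, slope

for credibility in (22, 46):
    intercept, slope = line_at(credibility)
    print(f"x1 = {credibility}: y-hat = {intercept:.3f} + {slope:.3f} x2")
```

The slope rises from 0.114 to 0.978 as credibility goes from 22 to 46, which is what the significant interaction term in part e implies.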

12.45

a.

The first-order model would be E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4.

b.

The β-coefficient that measures the effect of flexibility on relationship quality independently of the other independent variables is β1.

c.

The β-coefficient that measures the effect of reputation on relationship quality independently of the other independent variables is β2. The β-coefficient that measures the effect of empathy on relationship quality independently of the other independent variables is β3. The β-coefficient that measures the effect of task alignment on relationship quality independently of the other independent variables is β4.

d.

The interaction model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x1x4 + β6x2x4 + β7x3x4.

e.

The null hypothesis to determine if the effect of flexibility (x1) on relationship quality (y) depends on task alignment (x4) would be H0: β5 = 0.

f.

The null hypothesis to determine if the effect of reputation (x2) on relationship quality (y) depends on task alignment (x4) would be H0: β6 = 0. The null hypothesis to determine if the effect of empathy (x3) on relationship quality (y) depends on task alignment (x4) would be H0: β7 = 0.



12.46


g.

Yes. Since none of the interaction terms are significant, there is no evidence to indicate the impact of each x (x1, x2, or x3) on y depends on x4.

a.

The interaction model is E(y) = β0 + β1x1 + β2x2 + β3x1x2.

b.

Using MINITAB, the results are:

Regression Analysis: Density versus x1, x2, x1x2

Analysis of Variance

Source      DF  Adj SS       Adj MS       F-Value  P-Value
Regression  3   1.96608E+11  65535959189  110.44   0.000
x1          1   165532568    165532568    0.28     0.606
x2          1   47892440959  47892440959  80.71    0.000
x1x2        1   3623223340   3623223340   6.11     0.027
Error       14  8307933599   593423829
Total       17  2.04916E+11

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
24360.3  95.95%  95.08%     93.76%

Coefficients

Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  -63238  31150    -2.03    0.062
x1        18.8    35.5     0.53     0.606    4.20
x2        445486  49589    8.98     0.000    7.00
x1x2      -139.8  56.6     -2.47    0.027    10.21

Regression Equation

Density = -63238 + 18.8 x1 + 445486 x2 - 139.8 x1x2

The fitted model is ŷ = −63,238 + 18.8x1 + 445,486x2 − 139.8x1x2.

c.

To determine if the overall model is adequate, we test:
H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0
The test statistic is F = 110.44 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate the overall model is adequate at α = .05.

R²adj = .9508. 95.08% of the sample variation in bubble density is explained by the interaction model, adjusted for the sample size and the number of parameters in the model.

s = 24,360.3 and 2s = 2(24,360.3) = 48,720.6. Approximately 95% of the observed values of bubble density will fall within 48,720.6 units of their predicted values.

The model is statistically useful. The model is adequate and it explains most of the variation in the bubble density values.

d.

To determine if mass flux and heat flux interact, we test:
H0: β3 = 0
Ha: β3 ≠ 0
The test statistic is t = −2.47 and the p-value is p = .027. Since the p-value is less than α (p = .027 < .05), H0 is rejected. There is sufficient evidence to indicate mass flux and heat flux interact at α = .05.

12.47

e.

For 𝑥 = .5, 𝑦 = 159,505 − 51.1𝑥 . Thus, we would expect bubble density to decrease by 51.1 liters/m2 for every 1 kg/m2-sec increase in mass flux, when heat flux is set at .5 megawatta/m2.

a.

Let $x_1$ = latitude, $x_2$ = longitude, and $x_3$ = depth. The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3$.

b.

Using MINITAB, the results are:

Regression Analysis: ARSENIC versus LATITUDE, LONGITUDE, ...

The regression equation is
ARSENIC = 10845 - 1280 LATITUDE + 217 LONGITUDE - 1549 DEPTH-FT - 11.0 Lat_D + 20.0 Long_D

327 cases used, 1 cases contain missing values

Predictor     Coef  SE Coef      T      P
Constant     10845    67720   0.16  0.873
LATITUDE     -1280     1053  -1.22  0.225
LONGITUDE    217.4    814.5   0.27  0.790
DEPTH-FT   -1549.2    985.6  -1.57  0.117
Lat_D       -11.00    11.86  -0.93  0.355
Long_D       19.98    11.20   1.78  0.076

S = 103.072   R-Sq = 13.7%   R-Sq(adj) = 12.4%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        5   542303  108461  10.21  0.000
Residual Error  321  3410258   10624
Total           326  3952562

Source     DF  Seq SS
LATITUDE    1  132448
LONGITUDE   1  320144
DEPTH-FT    1   53179
Lat_D       1    2756
Long_D      1   33777

The least squares model is: $\hat{y} = 10,845 - 1,280x_1 + 217.4x_2 - 1,549.2x_3 - 11.00x_1x_3 + 19.98x_2x_3$

c.

To determine if latitude and depth interact to affect arsenic level, we test: $H_0: \beta_4 = 0$ versus $H_a: \beta_4 \neq 0$. From the printout, the test statistic is $t = -.93$ and the p-value is $p = .355$. Since the p-value is not less than $\alpha$ ($p = .355 \nless .05$), H0 is not rejected. There is insufficient evidence to indicate latitude and depth interact to affect arsenic level at $\alpha = .05$.

d.

To determine if longitude and depth interact to affect arsenic level, we test: $H_0: \beta_5 = 0$ versus $H_a: \beta_5 \neq 0$. From the printout, the test statistic is $t = 1.78$ and the p-value is $p = .076$. Since the p-value is not less than $\alpha$ ($p = .076 \nless .05$), H0 is not rejected. There is insufficient evidence to indicate longitude and depth interact to affect arsenic level at $\alpha = .05$.

e.

Because neither interaction is significant, there is no evidence that the effect of latitude on the arsenic level depends on the depth, or that the effect of longitude on the arsenic level depends on the depth.

12.48

a.

Since the researchers theorize the lines will be steeper for workers as CSE increases, we would expect 𝛽 to be a positive value. Since the researchers theorize the line will be steeper if the envied target is a friend, we would expect 𝛽 to be a positive value.

b.

To determine if 𝛽 is a positive value, we test: 𝐻 :𝛽 = 0 𝐻 :𝛽 > 0 .

No value of $\alpha$ is specified, so we choose $\alpha = .05$. The p-value is $p < .005$. Since the p-value is less than $\alpha$ ($p < .005 < .05$), H0 is rejected. There is sufficient evidence to indicate that 𝛽 is a positive value at $\alpha = .05$.

c.

To determine if 𝛽 is a positive value, we test: 𝐻 :𝛽 = 0 𝐻 :𝛽 > 0 .

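No value of $\alpha$ is specified, so we choose $\alpha = .05$. The p-value is $p < .025$. Since the p-value is less than $\alpha$ ($p < .025 < .05$), H0 is rejected. There is sufficient evidence to indicate that 𝛽 is a positive value at $\alpha = .05$.

12.49

a.

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$

b.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

c.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2$

12.50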

a.

$H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. The test statistic is $t = \hat{\beta}_2 / s_{\hat{\beta}_2} = 3.133$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = n - (k + 1) = 25 - (2 + 1) = 22$. From Table III, Appendix D, $t_{.025} = 2.074$. The rejection region is $t < -2.074$ or $t > 2.074$. Since the observed value of the test statistic falls in the rejection region ($t = 3.133 > 2.074$), H0 is rejected. There is sufficient evidence to indicate the quadratic term should be included in the model at $\alpha = .05$.

b.

$H_0: \beta_2 = 0$ versus $H_a: \beta_2 > 0$. The test statistic is the same as in part a, $t = 3.133$. The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution with $df = 22$. From Table III, Appendix D, $t_{.05} = 1.717$. The rejection region is $t > 1.717$. Since the observed value of the test statistic falls in the rejection region ($t = 3.133 > 1.717$), H0 is rejected. There is sufficient evidence to indicate the quadratic curve opens upward at $\alpha = .05$.

12.51

a.

To determine if the model contributes information for predicting y, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is

$F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = 85.94$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution, with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1) = 20 - (2 + 1) = 17$. From Table VI, Appendix D, $F_{.05} = 3.59$. The rejection region is $F > 3.59$. Since the observed value of the test statistic falls in the rejection region ($F = 85.94 > 3.59$), H0 is rejected. There is sufficient evidence that the model contributes information for predicting y at $\alpha = .05$.

b.

To determine if upward curvature exists, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 > 0$.

c.

To determine if downward curvature exists, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 < 0$.
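The global F statistic in part a depends only on the coefficient of determination, the sample size n, and the number of model terms k. The Python sketch below is an illustration, not part of the original solution. The value of R-squared is not legible in this copy, so .91 is an assumption obtained by back-solving from the reported F = 85.94 with n = 20 and k = 2; the function names are hypothetical.

```python
def global_f(r_sq: float, n: int, k: int) -> float:
    """Global F statistic computed from R-squared, n, and k."""
    return (r_sq / k) / ((1.0 - r_sq) / (n - (k + 1)))


def r_sq_from_f(f_stat: float, n: int, k: int) -> float:
    """Invert the formula above to recover R-squared from F."""
    ratio = f_stat * k / (n - (k + 1))   # equals R^2 / (1 - R^2)
    return ratio / (1.0 + ratio)


n, k = 20, 2
implied_r_sq = r_sq_from_f(85.94, n, k)      # assumption: back-solved, not quoted
print(f"implied R-sq = {implied_r_sq:.2f}")
print(f"F from R-sq of .91 = {global_f(0.91, n, k):.2f}")
```

An R-squared of .91 reproduces the reported F to two decimals, which is why it is used as the working assumption here.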

12.52

a.

b.

It moves the graph to the right ($-2x$) or to the left ($+2x$) compared to the graph of $y = 1 + x^2$.

c.

It controls whether the graph opens up ($+x^2$) or down ($-x^2$). It also controls how steep the curvature is, i.e., the larger the absolute value of the coefficient of $x^2$, the narrower the curve is.

12.53

a.

To determine if at least one of the parameters is nonzero, we test: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is $F = 25.93$ and the p-value is $p < .0001$. Since the p-value is less than $\alpha$ ($p < .0001 < .05$), H0 is rejected. There is sufficient evidence to indicate that at least one of the parameters $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ is nonzero at $\alpha = .05$.

b.

$H_0: \beta_4 = 0$ versus $H_a: \beta_4 \neq 0$. The test statistic is $t = -10.74$ and the p-value is $p < .0001$. Since the p-value is less than $\alpha$ ($p < .0001 < .01$), H0 is rejected. There is sufficient evidence to indicate that $\beta_4 \neq 0$ at $\alpha = .01$.

c.

$H_0: \beta_5 = 0$ versus $H_a: \beta_5 \neq 0$. The test statistic is $t = .60$ and the p-value is $p = .5504$. Since the p-value is not less than $\alpha$ ($p = .5504 \nless .01$), H0 is not rejected. There is insufficient evidence to indicate that $\beta_5 \neq 0$ at $\alpha = .01$.

d.

A possible graph may look like:

Notice that there is no curvature in the $x_2$ plane but there is curvature in the $x_1$ plane.

12.54

a.

The correlation coefficient, r, only measures the linear relationship between two variables. If the relationship between two variables is curvilinear, then r should not be used to measure the relationship.

b.

The model is $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$.

c.

A possible graph is: (figure omitted; horizontal axis x, vertical axis y)

d.

If the theory is supported, then the expected sign of $\beta_2$ is negative.

e.

To determine if task performance will increase as the level of conscientiousness increases, but at a decreasing rate, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 < 0$. Since the p-value ($p < .05$) is less than $\alpha = .05$, H0 is rejected. There is sufficient evidence to indicate that task performance will increase as the level of conscientiousness increases, but at a decreasing rate, at $\alpha = .05$.

12.55

a.

$\hat{\beta}_0 = 6.13$. Since 0 is not in the observed range (one cannot have the ball on the goal line), this has no meaning other than the y-intercept.

$\hat{\beta}_1 = .141$. Since the quadratic term is present in the model, this is no longer the slope of the line. It is simply a location parameter.

$\hat{\beta}_2 = -.0009$. Since this estimate is negative, it indicates that the shape of the relationship is mound-shaped, or concave downward. As the distance from the goal line increases, the predicted number of points scored will increase to some point and then start decreasing.
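The turning point described in part a can be located directly from the estimates. The Python sketch below is illustrative only: it uses the rounded coefficients quoted above, so the result is approximate, and the function name is hypothetical. Whether the turning point lies inside the range of the sampled data is not stated in this solution.

```python
# Rounded estimates quoted in part a.
b0, b1, b2 = 6.13, 0.141, -0.0009


def predicted_points(x: float) -> float:
    """Predicted points scored for a given distance x (yards)."""
    return b0 + b1 * x + b2 * x ** 2


# A quadratic b0 + b1*x + b2*x^2 has its vertex at x = -b1 / (2*b2).
turning_point = -b1 / (2 * b2)

print(f"turning point at about {turning_point:.1f} yards")
print(f"predicted points there: {predicted_points(turning_point):.2f}")
```

Because the estimate of the squared term is negative, the vertex is a maximum: predictions on either side of it are lower.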

b.

$R^2 = .226$. 22.6% of the sample variation in the number of points scored around their mean is explained by the quadratic relationship between the number of points scored and the number of yards from the opposing goal line.

c.

No. Even though the value of $R^2$ has increased, we do not know if the increase is statistically significant.

d.

To determine if the quadratic model is a better fit, we would test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$.

12.56

a.

Because the graph is curved, we would hypothesize that the model should be: $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$.

b.

The prediction equation is $\hat{y} = 493.2 + 4.92x - .026x^2$.

c.

Since the value of experience = 0 falls outside the range of the data sampled, the y-intercept has no practical interpretation.

d.

Since a quadratic model was fit to the data, there is no slope in the model. Hence, $\hat{\beta}_1$ has no practical interpretation.

e.

$\hat{\beta}_2 = -.026$. Since the value of $\hat{\beta}_2$ is negative, the curvature is downward.

f.

To determine if the pivot pin-top distance is curvilinearly related to farmer's working experience, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. From the printout, the test statistic is $t = -.18$ and the p-value is $p = .858$. Since the p-value is not less than $\alpha$ ($p = .858 \nless .10$), H0 is not rejected. There is insufficient evidence to indicate that the pivot pin-top distance is curvilinearly related to farmer's working experience at $\alpha = .10$.

12.57

Because the graph is curved, we would hypothesize that the model should be $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$. Since the curve opens down, $\beta_2$ will be negative.

12.58

a.

To determine if the model is adequate, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is

$F = \dfrac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = 26.25$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k + 1) = 388 - (2 + 1) = 385$. From Table VI, Appendix D, $F_{.05} \approx 3.00$. The rejection region is $F > 3.00$. Since the observed value of the test statistic falls in the rejection region ($F = 26.25 > 3.00$), H0 is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .05$.

b.

To determine if leadership ability increases at a decreasing rate with assertiveness, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 < 0$.

c.

From the table, the test statistic is $t = -3.97$ and the p-value is $p < .01/2 = .005$. Since the p-value is less than $\alpha$ ($p < .005 < .05$), H0 is rejected. There is sufficient evidence to indicate leadership ability increases at a decreasing rate with assertiveness at $\alpha = .05$.

12.59

a.

The complete 2nd order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$.

b.

$R^2 = .14$. 14% of the total variation in the efficiency scores is explained by the complete 2nd order model containing level of CEO leadership and level of congruence between the CEO and the VP.

c.

If the $\beta$-coefficient for the $x_2^2$ term is negative, then as the value of the level of congruence increases, the efficiency will increase at a decreasing rate to some point and then the efficiency will decrease at an increasing rate, holding level of CEO leadership constant.

d.

Since the p-value is less than $\alpha$ ($p = .02 < .05$), H0 is rejected. There is sufficient evidence to indicate that the level of CEO leadership and the level of congruence between the CEO and the VP interact to affect efficiency. This means that the effect of CEO leadership on efficiency depends on the level of congruence between the CEO and the VP.

12.60

a.

Using MINITAB, the results are:

Regression Analysis: Years versus Age, Age-sq

The regression equation is
Years = - 8.9 + 0.704 Age - 0.00341 Age-sq

Predictor       Coef   SE Coef      T      P
Constant       -8.93     10.24  -0.87  0.389
Age           0.7039    0.5518   1.28  0.211
Age-sq     -0.003412  0.006725  -0.51  0.615

S = 7.42668   R-Sq = 42.9%   R-Sq(adj) = 39.6%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  1450.95  725.47  13.15  0.000
Residual Error  35  1930.45   55.16
Total           37  3381.39

Source  DF   Seq SS
Age      1  1436.75
Age-sq   1    14.20

The prediction equation is $\hat{y} = -8.93 + .7039x - .0034x^2$.

b.

To determine if the model is adequate, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is $F = 13.15$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .01$), H0 is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .01$.

c.

To determine if the quadratic model is better than the linear model, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. The test statistic is $t = -.51$ and the p-value is $p = .615$. Since the p-value is not less than $\alpha$ ($p = .615 \nless .01$), H0 is not rejected. There is insufficient evidence to indicate the quadratic model is better than the linear model at $\alpha = .01$. Thus, the relationship between age and number of years shopping on Black Friday is best represented by a linear function.
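Each T value on the printout is the coefficient estimate divided by its standard error, which is how the statistic used in part c arises. A minimal Python sketch, with the estimates and standard errors copied from the MINITAB output for this exercise (the dictionary layout and names are illustrative):

```python
# (estimate, standard error) pairs copied from the printout.
coefficients = {
    "Constant": (-8.93, 10.24),
    "Age": (0.7039, 0.5518),
    "Age-sq": (-0.003412, 0.006725),
}

# t statistic for H0: beta = 0 is estimate / standard error.
t_values = {term: est / se for term, (est, se) in coefficients.items()}

for term, t in t_values.items():
    print(f"{term:8s} t = {t:6.2f}")
```

The rounded ratios match the T column of the printout, including the value for the squared term used in part c.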

12.61

a.

A first-order model is $E(y) = \beta_0 + \beta_1 x$.

b.

A second-order model is $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$.

c.

Using MINITAB, a scattergram of these data is:

From the plot, it appears that the first-order model might fit the data better. There does not appear to be a curve to the relationship.

d.

Using MINITAB, the output is:

Regression Equation
INTLGROSS = -140 + 1.92 Domestic - 0.00000 DomSQ

Coefficients
Term          Coef  SE Coef  T-Value  P-Value    VIF
Constant      -140      246    -0.57    0.574
Domestic      1.92     1.18     1.63    0.114  21.00
DomSQ     -0.00000  0.00116    -0.00    0.998  21.00

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
302.447  67.40%     64.99%      47.31%

Analysis of Variance
Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   2  5106634  2553317    27.91    0.000
  Domestic   1   243851   243851     2.67    0.114
  DomSQ      1        0        0     0.00    0.998
Error       27  2469801    91474
Total       29  7576435

To investigate the usefulness of the model, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is $F = 27.91$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting foreign gross revenue at $\alpha = .05$.

To determine if a curvilinear relationship exists between foreign and domestic gross revenues, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. The test statistic is $t = 0.00$ and the p-value is $p = .998$. Since the p-value is not less than $\alpha$ ($p = .998 \nless .05$), H0 is not rejected. There is insufficient evidence to indicate that a curvilinear relationship exists between foreign and domestic gross revenues at $\alpha = .05$.

e.

From the analysis in part d, the first-order model better explains the variation in foreign gross revenues. In part d, we concluded that the second-order term did not improve the model.

12.62

a.

To test the researcher’s theory, we need to test to see if the relationship is curved and opens down. Thus, we would test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 < 0$.

b.

The test statistic is $t = \hat{\beta}_2 / s_{\hat{\beta}_2} = -7.15$.

The rejection region requires $\alpha = .01$ in the lower tail of the t-distribution with $df = n - (k + 1) = 419,225 - (2 + 1) = 419,222$. From Table III, Appendix D, $t_{.01} = 2.326$. The rejection region is $t < -2.326$. Since the observed value of the test statistic falls in the rejection region ($t = -7.15 < -2.326$), H0 is rejected. There is sufficient evidence to support the researchers’ claim at $\alpha = .01$.

12.63

a.

Using MINITAB, the scattergram of the data is:

[Scatterplot of Time vs Temp; horizontal axis: Temp (120 to 170), vertical axis: Time (0 to 10000)]

The relationship appears to be curvilinear. As temperature increases, the value of time tends to decrease but at a decreasing rate.

b.

Using MINITAB, the results are:

Regression Analysis: Time versus Temp, Tempsq

The regression equation is
Time = 154243 - 1909 Temp + 5.93 Tempsq

Predictor     Coef  SE Coef      T      P
Constant    154243    21868   7.05  0.000
Temp       -1908.9    303.7  -6.29  0.000
Tempsq       5.929    1.048   5.66  0.000

S = 688.137   R-Sq = 94.2%   R-Sq(adj) = 93.5%

Analysis of Variance
Source          DF         SS        MS       F      P
Regression       2  144830280  72415140  152.93  0.000
Residual Error  19    8997107    473532
Total           21  153827386

Source  DF     Seq SS
Temp     1  129663987
Tempsq   1   15166293

The fitted regression equation is $\hat{y} = 154,243 - 1,908.9x + 5.929x^2$.

c.

To determine if there is an upward curvature in the relationship between failure time and solder temperature, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 > 0$. From the printout, the test statistic is $t = 5.66$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), H0 is rejected. There is sufficient evidence to indicate an upward curvature in the relationship between failure time and solder temperature at $\alpha = .05$.
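The t test on the squared term can be cross-checked against the sequential sums of squares on the printout. Because Tempsq is the last term entered, its Seq SS divided by the error mean square gives a partial F statistic, which should equal the square of the t statistic apart from rounding. The Python sketch below is illustrative and not part of the original solution; the variable names are assumptions and the inputs are copied from the MINITAB output.

```python
# Values copied from the MINITAB printout for this exercise.
seq_ss_tempsq = 15166293.0    # Seq SS for Tempsq (entered last)
ms_error = 473532.0           # MS, Residual Error
ss_regression = 144830280.0   # SS, Regression
ss_total = 153827386.0        # SS, Total
coef_tempsq, se_tempsq = 5.929, 1.048

partial_f = seq_ss_tempsq / ms_error    # F for adding Tempsq to the model
t_stat = coef_tempsq / se_tempsq        # t for H0: beta2 = 0
r_sq = ss_regression / ss_total         # coefficient of determination

print(f"partial F = {partial_f:.2f}, t^2 = {t_stat ** 2:.2f}")
print(f"R-sq = {r_sq:.3f}")
```

The small gap between the partial F and the squared t comes from the rounded coefficient and standard error on the printout.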

12.64

a.

The least squares prediction equation is $\hat{y} = 6.27 + .00791x - .000004x^2$.

b.

To determine if the overall model is adequate, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is $F = 210.56$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .01$), H0 is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .01$.

c.

$R_a^2 = .9722$. 97.22% of the sample variation in ratio values is explained by the second-order model containing pipe diameter, adjusted for the sample size and the number of parameters in the model.

d.

To determine if the rate of increase of ratio ($y$) with diameter ($x$) is slower for larger pipe sizes, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 < 0$.

e.

The test statistic is $t = -3.23$ and the p-value is $p = .009/2 = .0045$. Since the p-value is less than $\alpha$ ($p = .0045 < .01$), H0 is rejected. There is sufficient evidence to indicate the rate of increase of ratio ($y$) with diameter ($x$) is slower for larger pipe sizes at $\alpha = .01$.

f.

The 95% prediction interval is (7.59, 8.36). We are 95% confident that the actual ratio of repair to replacement cost will be between 7.59 and 8.36 when the pipe diameter is 240 mm.

12.65

a.

A scatterplot of the data is:

[Scatterplot of Demand vs Day; horizontal axis: Day (0 to 40), vertical axis: Demand (2000 to 12000)]

b.

From the plot, it looks like a second-order model would fit the data better than a first-order model. There is little evidence that a third-order model would fit the data better than a second-order model.

c.

Using MINITAB, the output for fitting a first-order model is:

Regression Analysis: Demand versus Day

The regression equation is
Demand = 2802 + 123 Day

Predictor    Coef  SE Coef     T      P
Constant   2802.4    604.7  4.63  0.000
Day        122.95    25.70  4.78  0.000

S = 1876.57   R-Sq = 37.6%   R-Sq(adj) = 35.9%

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       1   80572885  80572885  22.88  0.000
Residual Error  38  133817026   3521501
Total           39  214389911

To see if there is a significant linear relationship between day and demand, we test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. The test statistic is $t = 4.78$ and the p-value is $p = .000$. Since the p-value is less than $\alpha$ ($p = .000 < .05$), H0 is rejected. There is sufficient evidence to indicate that there is a linear relationship between day and demand at $\alpha = .05$.

d.

Using MINITAB, the output for fitting a second-order model is:

Regression Analysis: Demand versus Day, Day-sq

The regression equation is
Demand = 4944 - 183 Day + 7.46 Day-sq

Predictor     Coef  SE Coef      T      P
Constant    4944.2    829.6   5.96  0.000
Day        -183.03    93.32  -1.96  0.057
Day-sq       7.463    2.207   3.38  0.002

S = 1662.23   R-Sq = 52.3%   R-Sq(adj) = 49.7%

Analysis of Variance
Source          DF         SS        MS      F      P
Regression       2  112158325  56079162  20.30  0.000
Residual Error  37  102231587   2763016
Total           39  214389911

Source  DF    Seq SS
Day      1  80572885
Day-sq   1  31585440

To see if there is a significant quadratic relationship between day and demand, we test: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. The test statistic is $t = 3.38$ and the p-value is $p = .002$. Since the p-value is less than $\alpha$ ($p = .002 < .05$), H0 is rejected. There is sufficient evidence to indicate that there is a quadratic relationship between day and demand at $\alpha = .05$.

e.

Since the quadratic term is significant in the second-order model in part d, the second-order model is better.

12.66

Let $x = 1$ if the qualitative variable assumes the 2nd level, 0 otherwise.

The model is $E(y) = \beta_0 + \beta_1 x$.
$\beta_0$ = mean value of y when the qualitative variable assumes the first level
$\beta_1$ = difference in the mean value of y between levels 2 and 1 of the qualitative variable

12.67

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ where

$x_1 = 1$ if the variable is at level 2, 0 otherwise
$x_2 = 1$ if the variable is at level 3, 0 otherwise

$\beta_0$ = mean value of y when the qualitative variable is at level 1.
$\beta_1$ = difference in mean value of y between level 2 and level 1 of the qualitative variable.
$\beta_2$ = difference in mean value of y between level 3 and level 1 of the qualitative variable.

12.68

a.

The least squares prediction equation is $\hat{y} = 80 + 16.8x_1 + 40.4x_2$.

b.

$\hat{\beta}_1 = 16.8$. The difference in the mean value of the dependent variable between level 2 and level 1 of the independent variable is estimated to be 16.8.

$\hat{\beta}_2 = 40.4$. The difference in the mean value of the dependent variable between level 3 and level 1 of the independent variable is estimated to be 40.4.

c.

The hypothesis $H_0: \beta_1 = \beta_2 = 0$ is the same as $H_0: \mu_1 = \mu_2 = \mu_3$. The hypothesis $H_a$: At least one $\beta_i \neq 0$ is the same as $H_a$: At least one $\mu_i$ differs.

d.

The test statistic is $F = 24.72$ and the p-value is $p < .0001$. Since the p-value is so small ($p < .0001$), H0 is rejected. There is sufficient evidence to indicate at least one of the means is different for any reasonable value of $\alpha$.

12.69

a.

Level 1 implies $x_1 = x_2 = x_3 = 0$: $\hat{y} = 10.2 - 4(0) + 12(0) + 2(0) = 10.2$
Level 2 implies $x_1 = 1$ and $x_2 = x_3 = 0$: $\hat{y} = 10.2 - 4(1) + 12(0) + 2(0) = 6.2$
Level 3 implies $x_2 = 1$ and $x_1 = x_3 = 0$: $\hat{y} = 10.2 - 4(0) + 12(1) + 2(0) = 22.2$
Level 4 implies $x_3 = 1$ and $x_1 = x_2 = 0$: $\hat{y} = 10.2 - 4(0) + 12(0) + 2(1) = 12.2$
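The four substitutions above follow one pattern: set the dummy variable for the level of interest to 1, all others to 0, and evaluate the prediction equation. A minimal Python sketch of that pattern, using the coefficients from this exercise; the helper names `dummies` and `predicted_mean` are hypothetical.

```python
# Prediction equation: y-hat = 10.2 - 4*x1 + 12*x2 + 2*x3,
# with level 1 as the base level (all dummies equal to 0).
intercept = 10.2
slopes = [-4.0, 12.0, 2.0]   # coefficients of x1, x2, x3


def dummies(level: int, n_levels: int = 4) -> list:
    """Dummy coding: x_j = 1 if the variable is at level j + 1, else 0."""
    return [1 if level == j else 0 for j in range(2, n_levels + 1)]


def predicted_mean(level: int) -> float:
    """Predicted mean of y at the given level of the qualitative variable."""
    x = dummies(level)
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


for level in range(1, 5):
    print(level, dummies(level), round(predicted_mean(level), 1))
```

Each slope is the estimated difference between that level's mean and the base-level mean, which is why the level 1 prediction equals the intercept.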

b.

The hypotheses are: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_a$: At least one $\beta_i \neq 0$.

12.70

a.

We estimate the mean forecast accuracy when the analyst’s fWTH does not exceed the median fWTH to be .661.

b.

We estimate the mean forecast accuracy when the analyst’s fWTH does exceed the median fWTH to be .032 higher than the mean forecast accuracy when the analyst’s fWTH does not exceed the median fWTH.

c.

To determine if the mean forecast accuracy differs for the two levels, we test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. The test statistic is $t = 2.46$ and the p-value is $p < .01$. No $\alpha$ was specified, so we will use a value of $\alpha = .05$. Since the p-value is less than $\alpha$ ($p < .01 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean forecast accuracy differs for the two levels at $\alpha = .05$.

12.71

a.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ where

$x_1 = 1$ if human staff only, 0 otherwise
$x_2 = 1$ if service robot only, 0 otherwise

b.

$\beta_0$ = mean value of y for combined human staff and service robot, so $\hat{\beta}_0 = 3.74$.
$\beta_1$ = difference in mean value of y between human only and combined, so $\hat{\beta}_1 = 4.27 - 3.74 = .53$.
$\beta_2$ = difference in mean value of y between robot only and combined, so $\hat{\beta}_2 = 3.15 - 3.74 = -.59$.

c.

To determine if the mean interaction quality differs depending on which type of service was delivered, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$. The test statistic is $F = 74.33$ and the p-value is $p < .01$. No $\alpha$ was specified, so we will use a value of $\alpha = .05$. Since the p-value is less than $\alpha$ ($p < .01 < .05$), H0 is rejected. There is sufficient evidence to indicate the mean interaction quality differs depending on which type of service was delivered at $\alpha = .05$.

12.72

a.

The model is $E(y) = \beta_0 + \beta_1 x$ where $x = 1$ if gift giver, 0 otherwise.

b.

$\beta_0$ = mean level of appreciation for birthday gift receivers.
$\beta_1$ = difference in the mean level of appreciation between birthday gift givers and birthday gift receivers.

c.

To determine if the average level of appreciation will be higher for birthday gift givers than birthday gift receivers, we test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 > 0$.

12.73

a.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ where

$x_1 = 1$ if dosage group A, 0 otherwise
$x_2 = 1$ if dosage group B, 0 otherwise
$x_3 = 1$ if dosage group C, 0 otherwise

b.

$\beta_1$ is the difference in the mean weight loss between AA groups A and D.
$\beta_2$ is the difference in the mean weight loss between AA groups B and D.
$\beta_3$ is the difference in the mean weight loss between AA groups C and D.

c.

The mean weight loss is reduced in goats administered AA compared to goats not given any AA. Therefore, both $\beta_1 < 0$ and $\beta_2 < 0$.

12.74

a.

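The model would be: $E(y) = \beta_0 + \beta_1 x$, where $x = 1$ if the analyst worked for a buy-side firm and $x = 0$ if the analyst worked for a sell-side firm.

b.

$\beta_0$ = mean relative optimism for analysts who worked for sell-side firms
$\beta_1$ = difference in mean relative optimism between analysts who worked for buy-side firms and analysts who worked for sell-side firms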

c.

Yes.

d.

Yes. If the buy-side analysts are less optimistic, then their estimates will be smaller than the sell-side estimates. Thus, the estimate of β1 will be negative.

12.75

a.

Let $x_1 = 1$ if blonde Caucasian, 0 otherwise.
Let $x_2 = 1$ if brunette Caucasian, 0 otherwise.

b.

The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

c.

The mean for a blonde Caucasian solicitor would be $E(y) = \beta_0 + \beta_1(1) + \beta_2(0) = \beta_0 + \beta_1$.

d.

The difference in the mean level of contribution between a blonde Caucasian solicitor and a minority female solicitor is $\beta_1$.

e.

If the theory is correct, then $\beta_0$ (mean for minority female solicitors) will be positive, $\beta_1$ will be positive (mean for blonde Caucasian solicitors is greater than the means of the other groups), and $\beta_2$ will be close to 0 (no difference in the means for minority female solicitors and brunette Caucasian solicitors).

f.

Yes. The $\beta$-estimate for the dummy variable for blonde Caucasian solicitors should be positive and significantly different from 0. The $\beta$-estimate for the dummy variable for brunette Caucasian solicitors should be close to 0. In this case, it is not statistically different from 0.

12.76

a.

The model is $E(y) = \beta_0 + \beta_1 x$ where $x = 1$ if the election was affected by a World War, 0 otherwise.

b.

An expression for the mean Democratic vote share during all years when there is no World War is β0 .

c.

An expression for the mean Democratic vote share during all years when there is a World War is β0 + β1 .

d.

Using MINITAB, the results are:

Regression Analysis: VSHARE versus WAR

Analysis of Variance
Source      DF   Adj SS  Adj MS  F-Value  P-Value
Regression   1    12.51   12.51     0.25    0.624
  WAR        1    12.51   12.51     0.25    0.624
Error       22  1113.36   50.61
Total       23  1125.86

Model Summary
      S   R-sq  R-sq(adj)  R-sq(pred)
7.11387  1.11%      0.00%       0.00%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  49.60     1.55    31.95    0.000
WAR       -2.18     4.39    -0.50    0.624  1.00

Regression Equation
VSHARE = 49.60 - 2.18 WAR

To determine if the mean Democratic vote share during all years when there is a World War differs from the mean when there is no World War, we test: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.

The test statistic is $t = -.50$ and the p-value is $p = .624$. Since the p-value is not less than $\alpha$ ($p = .624 \nless .10$), H0 is not rejected. There is insufficient evidence to indicate the mean Democratic vote share during all years when there is a World War differs from the mean when there is no World War at $\alpha = .10$.

e.

The model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ where

$x_1 = 1$ if Democrat incumbent, 0 otherwise
$x_2 = 1$ if Republican incumbent, 0 otherwise

f.

An expression for the mean Democratic vote share during all years when there is no incumbent running is β0 .

g.

An expression for the mean Democratic vote share during all years when a Republican incumbent is running is β0 + β2 .

h.

An expression for the difference between the mean Democratic vote share for all years when a Democratic incumbent is running and when there is no incumbent running is β1 .

i.

Using MINITAB, the results are:

Regression Analysis: VSHARE versus INCUMB1, INCUMB2

Analysis of Variance
Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   2   318.62  159.309     4.14    0.030
  INCUMB1    1   239.09  239.093     6.22    0.021
  INCUMB2    1     1.02    1.017     0.03    0.872
Error       21   807.25   38.440
Total       23  1125.86

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
6.20003  28.30%     21.47%       5.51%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  46.47     2.34    19.83    0.000
INCUMB1    8.00     3.21     2.49    0.021  1.43
INCUMB2    0.51     3.12     0.16    0.872  1.43

Regression Equation
VSHARE = 46.47 + 8.00 INCUMB1 + 0.51 INCUMB2

To determine if the mean Democratic vote share differs depending on the incumbent running, we test: $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: At least one $\beta_i \neq 0$.

The test statistic is $F = 4.14$ and the p-value is $p = .030$. Since the p-value is less than $\alpha$ ($p = .030 < .10$), H0 is rejected. There is sufficient evidence to indicate the mean Democratic vote share differs depending on the incumbent running at $\alpha = .10$.

12.77

a.

Since there are four groups, we need 3 dummy variables.

Let $x_1 = 1$ if large/private, 0 otherwise.
Let $x_2 = 1$ if small/public, 0 otherwise.
Let $x_3 = 1$ if small/private, 0 otherwise.

b.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.

$\beta_0$ = mean likelihood of reporting sustainability policies for large/public firms.
$\beta_1$ = difference in mean likelihood of reporting sustainability policies between large/private firms and large/public firms.
$\beta_2$ = difference in mean likelihood of reporting sustainability policies between small/public firms and large/public firms.
$\beta_3$ = difference in mean likelihood of reporting sustainability policies between small/private firms and large/public firms.

c.

Since the p-value is very small ( p < .001) , H0 would be rejected for any reasonable value of α . There is sufficient evidence to indicate a difference in the mean likelihood of reporting sustainability policies among the 4 groups.

d.

Since there are 2 levels of each of the 2 variables, we need to create 2 dummy variables.

Let $x_1 = 1$ if small, 0 otherwise.
Let $x_2 = 1$ if private, 0 otherwise.

e.

The main effects model would be: E ( y ) = β 0 + β1 x1 + β 2 x2 .

f.

For large/public, E ( y ) = β 0 + β1 ( 0 ) + β 2 ( 0 ) = β 0 . For large/private, E ( y ) = β 0 + β1 ( 0 ) + β 2 (1) = β 0 + β 2 . For small/public, E ( y ) = β 0 + β1 (1) + β 2 ( 0 ) = β 0 + β1 . For small/private, E ( y ) = β 0 + β1 (1) + β 2 (1) = β 0 + β1 + β 2 .

g.

For public firms, the difference between small and large firms is $(\beta_0 + \beta_1) - \beta_0 = \beta_1$. For private firms, the difference between small and large firms is $(\beta_0 + \beta_1 + \beta_2) - (\beta_0 + \beta_2) = \beta_1$.

h.

The model is E ( y ) = β 0 + β1 x1 + β 2 x 2 + β 3 x1 x2 .

i.

For large/public, E ( y ) = β 0 + β1 ( 0 ) + β 2 ( 0 ) + β 3 ( 0 )( 0 ) = β 0 . For large/private, E ( y ) = β 0 + β1 ( 0 ) + β 2 (1) + β 3 ( 0 )(1) = β 0 + β 2 . For small/public, E ( y ) = β 0 + β1 (1) + β 2 ( 0 ) + β 3 (1)( 0 ) = β 0 + β1 . For small/private, E ( y ) = β 0 + β1 (1) + β 2 (1) + β 3 (1)(1) = β 0 + β1 + β 2 + β 3 .

j.

For public firms, the difference between small and large firms is ( β 0 + β1 ) − β 0 = β1 . For private firms, the difference between small and large firms is ( β 0 + β1 + β 2 + β 3 ) − ( β 0 + β 2 ) = β 1 + β 3 .
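The contrast between parts g and j can be checked numerically. The Python sketch below evaluates the four cell means under a main effects model and under the interaction model, then compares the small-versus-large difference for public and for private firms. The coefficient values are hypothetical and chosen only for illustration (no estimates are given in this solution), and the function names are assumptions.

```python
def cell_mean(beta, small: int, private: int) -> float:
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2, x1 = small, x2 = private."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * small + b2 * private + b3 * small * private


def size_effect(beta, private: int) -> float:
    """Difference in E(y) between small and large firms, ownership fixed."""
    return cell_mean(beta, 1, private) - cell_mean(beta, 0, private)


# Hypothetical coefficients (b0, b1, b2, b3), for illustration only.
beta_main = (0.60, -0.20, -0.10, 0.00)          # main effects: b3 = 0
beta_interaction = (0.60, -0.20, -0.10, 0.05)   # interaction: b3 != 0

for name, beta in [("main effects", beta_main), ("interaction", beta_interaction)]:
    print(name, round(size_effect(beta, 0), 2), round(size_effect(beta, 1), 2))
```

Under the main effects model both differences equal the coefficient of the size dummy; with the interaction term the private-firm difference also includes the interaction coefficient.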

12.78

a.

The model is $E(y) = \beta_0 + \beta_1 x$ where $x = 1$ if more descriptive logo, 0 if less descriptive logo.

$\beta_0$ = mean authenticity rating when using a less descriptive logo
$\beta_1$ = difference in mean authenticity rating between using a more descriptive logo and using a less descriptive logo.

b.

MINITAB was used to fit the model:

Regression Equation
RATING = 4.829 + 0.0 LOGO_Less + 1.064 LOGO_More

Coefficients
Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant   4.829    0.281    17.20    0.000
LOGO
  More     1.064    0.397     2.68    0.008  1.00

Analysis of Variance
Source      DF   Adj SS  Adj MS  F-Value  P-Value
Regression   1    50.95  50.955     7.18    0.008
  LOGO       1    50.95  50.955     7.18    0.008
Error      178  1263.42   7.098
Total      179  1314.38

To determine if the mean authenticity rating when using a less descriptive logo differs from the mean authenticity rating when using a more descriptive logo, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is F = 7.18 and the p-value is p = .008.

No α was specified so we will use a value of α = .05. Since the p-value is less than α (p = .008 < .05), H0 is rejected. There is sufficient evidence to indicate the mean authenticity rating when using a less descriptive logo differs from the mean authenticity rating when using a more descriptive logo at α = .05.

c.

The model is E(y) = β0 + β1x

where x = 1 if running shoe brand, 0 if basketball brand

β0 = mean authenticity rating when using a basketball brand
β1 = difference in mean authenticity rating between when using a running shoe brand and when using a basketball brand.

d.

MINITAB was used to fit the model:

Regression Equation
RATING = 5.026 + 0.0 BRAND_BBall + 0.670 BRAND_RShoe

Coefficients
Term          Coef    SE Coef   T-Value   P-Value   VIF
Constant      5.026   0.284     17.68     0.000
BRAND RShoe   0.670   0.402     1.67      0.097     1.00

Analysis of Variance
Source       DF    Adj SS     Adj MS   F-Value   P-Value
Regression   1     20.22      20.221   2.78      0.097
  BRAND      1     20.22      20.221   2.78      0.097
Error        178   1294.16    7.271
Total        179   1314.38

To determine if the mean authenticity rating when using a running shoe brand differs from the mean authenticity rating when using a basketball brand, we test:

H0: β1 = 0
Ha: β1 ≠ 0

The test statistic is F = 2.78 and the p-value is p = .097. No α was specified so we will use a value of α = .05. Since the p-value is not less than α (p = .097 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the mean authenticity rating when using a running shoe brand differs from the mean authenticity rating when using a basketball brand at α = .05.
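With a single dummy predictor, the ANOVA F statistic and the t statistic for β1 test the same hypothesis, and F equals t squared. The short Python sketch below re-derives F two ways from the values in the two printouts of this exercise; the dictionary layout and names are illustrative assumptions, and small discrepancies are expected because the printed t values are rounded.

```python
# Values copied from the two MINITAB printouts in this exercise.
fits = {
    "LOGO":  {"t": 2.68, "ms_reg": 50.955, "ms_err": 7.098, "f_reported": 7.18},
    "BRAND": {"t": 1.67, "ms_reg": 20.221, "ms_err": 7.271, "f_reported": 2.78},
}

results = {}
for name, v in fits.items():
    f_from_ms = v["ms_reg"] / v["ms_err"]  # F = MS(Regression) / MS(Error)
    f_from_t = v["t"] ** 2                 # one predictor: F = t^2
    results[name] = (f_from_ms, f_from_t)
    print(name, round(f_from_ms, 2), round(f_from_t, 2), v["f_reported"])
```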

12.79

a.

The model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x1 = 1 if continuous verbal, 0 otherwise
x2 = 1 if late verbal, 0 otherwise
x3 = 1 if no verbal, 0 otherwise

b.

From the printout, β̂0 = 49.00, β̂1 = 34.30 − 49.00 = −14.7, β̂2 = 63.40 − 49.00 = 14.4, β̂3 = 63.90 − 49.00 = 14.9

c.

Using MINITAB, the results are:

Regression Analysis: y versus x1, x2, x3

Analysis of Variance
Source       DF   Adj SS   Adj MS   F-Value   P-Value
Regression   3    5922     1973.9   5.39      0.004
  x1         1    1080     1080.4   2.95      0.095
  x2         1    1037     1036.8   2.83      0.101
  x3         1    1110     1110.1   3.03      0.090
Error        36   13189    366.4
Total        39   19111

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
19.1409   30.99%   25.23%      14.80%

Coefficients
Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   49.00    6.05      8.10      0.000
x1         -14.70   8.56      -1.72     0.095     1.50
x2         14.40    8.56      1.68      0.101     1.50
x3         14.90    8.56      1.74      0.090     1.50

Regression Equation
y = 49.00 - 14.70 x1 + 14.40 x2 + 14.90 x3

From the output, the β estimates are: β̂0 = 49.00, β̂1 = −14.7, β̂2 = 14.4, β̂3 = 14.9.

d.

To determine if the mean recall percentage differs for student-drivers in the four groups, we test:

H0: β1 = β2 = β3 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 5.39 and the p-value is p = .004 . Since the p-value is less than α ( p = .004 < .01) , H0 is rejected. There is sufficient evidence to indicate the mean recall percentage differs for student-drivers in the four groups at α = .01 .
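Parts b and c rely on the fact that, with dummy coding, each group mean is the intercept plus that group's coefficient. The following Python sketch (variable names are assumptions for illustration) reverses the arithmetic of part b using the estimates in the printout, and re-derives the global F statistic from the mean squares.

```python
# Estimates copied from the MINITAB printout in part c.
coef = {"const": 49.00, "x1": -14.70, "x2": 14.40, "x3": 14.90}

# Each group mean is the intercept plus the dummy coefficient for that group.
group_means = {
    "base group (x1 = x2 = x3 = 0)": coef["const"],
    "continuous verbal": coef["const"] + coef["x1"],
    "late verbal": coef["const"] + coef["x2"],
    "no verbal": coef["const"] + coef["x3"],
}
for group, mean in group_means.items():
    print(group, round(mean, 2))

# Global F statistic: MS(Regression) / MS(Error) from the ANOVA table.
f_stat = 1973.9 / 366.4
print(round(f_stat, 2))
```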

12.80

a.

For no stock split, x1 = 0. For high discretionary accrual, x2 = 1. The mean buy-and-hold return rate is E(y) = β0 + β1x1 + β2x2 + β3x1x2 = β0 + β1(0) + β2(1) + β3(0)(1) = β0 + β2.

b.

For no stock split, x1 = 0 . For low discretionary accrual, x2 = 0 . The mean buy-and-hold return rate is E ( y ) = β 0 + β1 x1 + β 2 x 2 + β 3 x1 x2 = β 0 + β1 ( 0 ) + β 2 ( 0 ) + β 3 ( 0 )( 0 ) = β 0 .


c.

The difference would be β 0 + β 2 − β 0 = β 2 .

d.

For stock split, x1 = 1 . For high discretionary accrual, x2 = 1 . The mean buy-and-hold return rate is E ( y ) = β 0 + β1 x1 + β 2 x 2 + β 3 x1 x 2 = β 0 + β1 (1) + β 2 (1) + β 3 (1)(1) = β 0 + β1 + β 2 + β 3 . For stock split, x1 = 1 . For low discretionary accrual, x2 = 0 . The mean buy-and-hold return rate is E ( y ) = β 0 + β1 x1 + β 2 x 2 + β 3 x1 x 2 = β 0 + β1 (1) + β 2 ( 0 ) + β 3 (1)( 0 ) = β 0 + β1

The difference would be β 0 + β1 + β 2 + β 3 − ( β 0 + β1 ) = β 2 + β 3 . b.

c.

d.

When there is no stock split, the mean buy-and-hold return rate increases by β2 when discretionary accrual goes from low to high. When there is a stock split, the mean buy-and-hold return rate increases by β2 + β3 when discretionary accrual goes from low to high. Thus, the effect of discretionary accrual on the mean buy-and-hold return rate depends on the level of stock split.

Since the p-value is less than α (p = .027 < .05), H0 is rejected. There is sufficient evidence to indicate that interaction between stock split and discretionary accrual exists at α = .05.

Yes. For no stock split, the difference between high discretionary accrual and low discretionary accrual is β2. Since β2 is negative, the performance of the high discretionary accrual acquirers is worse than that of the low discretionary accrual acquirers. For stock split, the difference between high discretionary accrual and low discretionary accrual is β2 + β3. Since both β2 and β3 are negative, the performance of the high discretionary accrual acquirers is worse than that of the low discretionary accrual acquirers, and even worse than for no stock split.

12.81

a.

The model is E(y) = β0 + β1x1 + β2x2 + β3x3

where x1 = 1 if banned/other, 0 otherwise
x2 = 1 if banned/no other, 0 otherwise
x3 = 1 if no ban/other, 0 otherwise

b.

Using MINITAB, the results are:

Regression Analysis: MVL versus X1, X2, X3

Analysis of Variance
Source       DF    Adj SS    Adj MS    F-Value   P-Value
Regression   3     189.391   63.130    105.37    0.000
  X1         1     125.640   125.640   209.70    0.000
  X2         1     107.559   107.559   179.52    0.000
  X3         1     1.184     1.184     1.98      0.162
Error        173   103.651   0.599
Total        176   293.042

Model Summary
S          R-sq     R-sq(adj)   R-sq(pred)
0.774039   64.63%   64.02%      63.09%

Coefficients
Term       Coef    SE Coef   T-Value   P-Value   VIF
Constant   6.376   0.104     61.09     0.000
X1         2.137   0.148     14.48     0.000     1.38
X2         2.171   0.162     13.40     0.000     1.33
X3         0.253   0.180     1.41      0.162     1.27

Regression Equation
MVL = 6.376 + 2.137 X1 + 2.171 X2 + 0.253 X3

The least squares prediction equation is ŷ = 6.38 + 2.14x1 + 2.17x2 + .25x3.

c.

β̂0 = 6.38. The mean MVL for the “no ban/no other” group is estimated to be 6.38.

β̂1 = 2.14. The difference in the mean MVL between the “banned/other” group and the “no ban/no other” group is estimated to be 2.14.

β̂2 = 2.17. The difference in the mean MVL between the “banned/no other” group and the “no ban/no other” group is estimated to be 2.17.

β̂3 = .25. The difference in the mean MVL between the “no ban/other” group and the “no ban/no other” group is estimated to be .25.

d.

To determine if there are differences in mean MVL values among the four groups, we test:

H0: β1 = β2 = β3 = 0
Ha: At least 1 βi ≠ 0

The test statistic is F = 105.37 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate there are differences in mean MVL values among the four groups at α = .05.

e.

The mean MVL value for the “banned/other” group is estimated to be β̂1 + β̂0 = 2.14 + 6.38 = 8.52.

The mean MVL value for the “banned/no other” group is estimated to be β̂2 + β̂0 = 2.17 + 6.38 = 8.55.

The mean MVL value for the “no ban/other” group is estimated to be β̂3 + β̂0 = .25 + 6.38 = 6.63.

The mean MVL value for the “no ban/no other” group is estimated to be β̂0 = 6.38.

12.82

a.

The first-order model is E ( y ) = β 0 + β 1 x1 .

b.

The new model is E(y) = β0 + β1x1 + β2x2 + β3x3,

where x2 = 1 if level 2, 0 otherwise, and x3 = 1 if level 3, 0 otherwise

c.

To allow for interactions, the model is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

d.

The response lines will be parallel if β4 = β5 = 0.

e.

There will be one response line if β2 = β3 = β4 = β5 = 0.

12.83

a.

The complete second-order model is E(y) = β0 + β1x1 + β2x1².

b.

The new model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3,

where x2 = 1 if level 2, 0 otherwise, and x3 = 1 if level 3, 0 otherwise

c.

The model with the interaction terms is: E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x1²x2 + β8x1²x3

d.

The response curves will have the same shape if none of the interaction terms are present or if β5 = β6 = β7 = β8 = 0.

e.

The response curves will be parallel lines if the interaction terms as well as the second-order terms are absent or if β2 = β5 = β6 = β7 = β8 = 0.

f.

The response curves will be identical if no terms involving the qualitative variable are present or β3 = β4 = β5 = β6 = β7 = β8 = 0.

12.84

a.

When x2 = x3 = 0, E(y) = β0 + β1x1.
When x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2.
When x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β3.

b.

For level 1, ŷ = 44.8 + 2.2x1
For level 2, ŷ = 44.8 + 2.2x1 + 9.4 = 54.2 + 2.2x1
For level 3, ŷ = 44.8 + 2.2x1 + 15.6 = 60.4 + 2.2x1

12.85

a.

For x2 = 0 and x3 = 0, E(y) = β0 + β1x1 + β2x1²

For x2 = 1 and x3 = 0, E(y) = β0 + β1x1 + β2x1² + β3 + β5x1 + β7x1² = (β0 + β3) + (β1 + β5)x1 + (β2 + β7)x1²

For x2 = 0 and x3 = 1, E(y) = β0 + β1x1 + β2x1² + β4 + β6x1 + β8x1² = (β0 + β4) + (β1 + β6)x1 + (β2 + β8)x1²

b.

For level 1, ŷ = 48.8 − 3.4x1 + .07x1²
For level 2, ŷ = 48.8 − 3.4x1 + .07x1² − 2.4(1) + 3.7x1(1) − .02x1²(1) = 46.4 + 0.3x1 + .05x1²
For level 3, ŷ = 48.8 − 3.4x1 + .07x1² − 7.5(1) + 2.7x1(1) − .04x1²(1) = 41.3 − 0.7x1 + 0.03x1²

The plots of the lines are:

12.86

The model is E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4, where x1 is the quantitative variable and

x2 = 1 if level 2 of qualitative variable, 0 otherwise
x3 = 1 if level 3 of qualitative variable, 0 otherwise
x4 = 1 if level 4 of qualitative variable, 0 otherwise
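The coding above turns one four-level qualitative variable into three 0/1 dummies, with level 1 as the base level. A minimal Python sketch of building the predictor values for one observation follows; the function name `design_row` is a hypothetical helper introduced only for illustration.

```python
def design_row(x1, level):
    """Return [x1, x1^2, x2, x3, x4] for one observation.

    level is 1, 2, 3, or 4; level 1 is the base level, so all dummies are 0.
    """
    if level not in (1, 2, 3, 4):
        raise ValueError("level must be 1, 2, 3, or 4")
    dummies = [1 if level == k else 0 for k in (2, 3, 4)]
    return [x1, x1 ** 2] + dummies

print(design_row(3, 1))  # base level: all three dummies are 0
print(design_row(2, 4))  # level 4: only x4 equals 1
```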

12.87

a.

For female students, the equation is ŷ = 11.78 − 1.97(0) + .58x4 − .55(0)x4 = 11.78 + .58x4. Thus, for females, for each 1-point increase in impression of reality TV show, the mean desire is estimated to increase by .58.

b.

For male students, the equation is ŷ = 11.78 − 1.97(1) + .58x4 − .55(1)x4 = 9.81 + .03x4. Thus, for males, for each 1-point increase in impression of reality TV show, the mean desire is estimated to increase by .03.
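The two equations above come from fixing the gender dummy at 0 or 1 in the fitted interaction model. A small Python sketch of that substitution is given below, assuming the coding implied by parts a and b (x1 = 0 for females, x1 = 1 for males); `line_for` is a hypothetical helper name.

```python
def line_for(gender):
    """Intercept and x4-slope of y-hat = 11.78 - 1.97*x1 + .58*x4 - .55*x1*x4
    with the gender dummy x1 fixed at 0 (female) or 1 (male)."""
    intercept = 11.78 - 1.97 * gender
    slope = 0.58 - 0.55 * gender
    return round(intercept, 2), round(slope, 2)

print("female:", line_for(0))
print("male:", line_for(1))
```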

12.88

a.

Let x1 = 1 if blonde Caucasian, 0 otherwise

Let x2 = 1 if brunette Caucasian, 0 otherwise

Let x3 = beauty rating

The first order model would be E(y) = β0 + β1x1 + β2x2 + β3x3.

b.

For minority females, the model would be E(y) = β0 + β1(0) + β2(0) + β3x3 = β0 + β3x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3.

For blonde Caucasians, the model would be E(y) = β0 + β1(1) + β2(0) + β3x3 = β0 + β1 + β3x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3.

For brunette Caucasians, the model would be E(y) = β0 + β1(0) + β2(1) + β3x3 = β0 + β2 + β3x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3.

c.

The interaction model would be E ( y ) = β 0 + β 1 x1 + β 2 x 2 + β 3 x 3 + β 4 x1 x 3 + β 5 x 2 x 3 .

d.

For minority females, the model would be E(y) = β0 + β1(0) + β2(0) + β3x3 + β4(0)x3 + β5(0)x3 = β0 + β3x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3.

For blonde Caucasians, the model would be E(y) = β0 + β1(1) + β2(0) + β3x3 + β4(1)x3 + β5(0)x3 = (β0 + β1) + (β3 + β4)x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3 + β4.

For brunette Caucasians, the model would be E(y) = β0 + β1(0) + β2(1) + β3x3 + β4(0)x3 + β5(1)x3 = (β0 + β2) + (β3 + β5)x3. For each 1-point increase in solicitor’s beauty rating, the contribution level changes by β3 + β5.

e.

A possible graph might look like:

[Figure: Contribution versus Beauty rating, with one line for each of the three Hair groups (1, 2, 3)]

12.89

a.

When x2 = 0, E(y) = β0 + β1x1 + β2x1² + β3(0) + β4x1(0) + β5x1²(0) = β0 + β1x1 + β2x1²

b.

When x2 = 1, E(y) = β0 + β1x1 + β2x1² + β3(1) + β4x1(1) + β5x1²(1) = (β0 + β3) + (β1 + β4)x1 + (β2 + β5)x1²

c.

Answers will vary. Using MINITAB, a possible graph is:

[Figure: Scatterplot of y1, y2 vs x1 (vertical axis: Y-Data; horizontal axis: x1)]

The graph of y1 is an example of plotting the line when the team leader is not effective. This has a downward curvature. The graph of y2 is an example of plotting the line when the team leader is effective. This has an upward curvature.

12.90

a.

The model would be 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥

b.

The model would be 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 . This model would allow for two parallel lines. There would be a separate line for each of the two payment plans.

c.

The model would be 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 𝑥 . This model would allow for two non-parallel lines. There would be a separate line for each of the two payment plans.

12.91

a.

Let x2 = 1 if perceived organizational support is low, 0 otherwise
x3 = 1 if perceived organizational support is neutral, 0 otherwise

b.

The model would be E(y) = β0 + β1x1 + β2x2 + β3x3.

c.

The model would be E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3.

d.

If the effect of bullying on intention to leave is greater at the low level of POS than at the high level of POS, this indicates that POS and bullying interact. Thus, the model in part c supports these findings.

12.92

a.

The expected sign of β 1 would be negative.

b.

No. In order to test this theory, the interaction term between gender and agreeableness score would have to be added.

c.

If the theory is correct, the sign of β 1 (slope of the line for females) should be negative. The sign of β 3 (difference in the slopes for males and females) should also be negative.

d.

Using MINITAB, the results are:

Regression Analysis: Income versus Gender, Agree, G_A

The regression equation is
Income = 40977 - 5479 Agree + 54225 Gender - 8708 G_A

Predictor   Coef    SE Coef   T       P
Constant    40977   8789      4.66    0.000
Agree       -5479   2644      -2.07   0.041
Gender      54225   12989     4.17    0.000
G_A         -8708   3935      -2.21   0.029

S = 7775.99   R-Sq = 76.3%   R-Sq(adj) = 75.5%

Analysis of Variance
Source           DF   SS            MS           F        P
Regression       3    18651136903   6217045634   102.82   0.000
Residual Error   96   5804741101    60466053
Total            99   24455878004

Source   DF   Seq SS
Agree    1    1896882849
Gender   1    16458104868
G_A      1    296149185

The signs of β̂1 and β̂3 are both negative as expected.

e.

To determine if the rate of decrease of income with agreeableness is steeper for males than for females, we test:

H0: β3 = 0
Ha: β3 < 0

f.

The test statistic is t = −2.21 and the p-value is p = .029/2 = .0145. Since the p-value is less than α (p = .0145 < .05), H0 is rejected. There is sufficient evidence to indicate the rate of decrease of income with agreeableness is steeper for males than for females at α = .05.

12.93

a.

Let x1 = 1 if channel catfish, 0 otherwise
x2 = 1 if largemouth bass, 0 otherwise

b.

Let x3 = weight. The model would be: E(y) = β0 + β1x1 + β2x2 + β3x3

c.

The model would be: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x2x3

d.

From MINITAB, the output is:

Regression Analysis: DDT versus x1, x2, Weight

The regression equation is
DDT = 3.1 + 26.5 x1 - 4.1 x2 + 0.0037 Weight

Predictor   Coef      SE Coef   T       P
Constant    3.13      38.89     0.08    0.936
x1          26.51     21.52     1.23    0.220
x2          -4.09     37.91     -0.11   0.914
Weight      0.00371   0.02598   0.14    0.887

S = 98.57   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF    SS        MS     F      P
Regression       3     23652     7884   0.81   0.490
Residual Error   140   1360351   9717
Total            143   1384003

Source   DF   Seq SS
x1       1    23041
x2       1    414
Weight   1    198

The least squares prediction equation is: ŷ = 3.13 + 26.51x1 − 4.09x2 + 0.00371x3

e.

β̂3 = 0.00371. For each additional gram of weight, the mean level of DDT is expected to increase by 0.00371 units, holding species constant.

f.

From MINITAB, the output is:

Regression Analysis: DDT versus x1, x2, Weight, x1Weight, x2Weight

The regression equation is
DDT = 3.5 + 25.6 x1 - 3.5 x2 + 0.0034 Weight + 0.0008 x1Weight - 0.0013 x2Weight

Predictor   Coef       SE Coef   T       P
Constant    3.50       54.69     0.06    0.949
x1          25.59      67.52     0.38    0.705
x2          -3.47      84.70     -0.04   0.967
Weight      0.00344    0.03843   0.09    0.929
x1Weight    0.00082    0.05459   0.02    0.988
x2Weight    -0.00129   0.09987   -0.01   0.990

S = 99.29   R-Sq = 1.7%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF    SS        MS     F      P
Regression       5     23657     4731   0.48   0.791
Residual Error   138   1360346   9858
Total            143   1384003

Source     DF   Seq SS
x1         1    23041
x2         1    414
Weight     1    198
x1Weight   1    4
x2Weight   1    2

The least squares prediction equation is: ŷ = 3.50 + 25.59x1 − 3.47x2 + 0.00344x3 + 0.00082x1x3 − .00129x2x3

g.

For channel catfish, x1 = 1 and x2 = 0. The least squares line is ŷ = 3.50 + 25.59(1) − 3.47(0) + 0.00344x3 + 0.00082(1)x3 − .00129(0)x3 = 29.09 + .00426x3

The estimated slope is .00426.

12.94

a.

For jobs that are not highly complex, the equation would be E(y) = β0 + β1x1 + β2x1² + β3(0) + β4x1(0) + β5x1²(0) = β0 + β1x1 + β2x1².

b.

β 0 = mean task performance score when the conscientiousness score is 0 and the job is not highly complex. β1 = location parameter for jobs that are not highly complex since the quadratic term is in the model. β 2 determines the rate of increase or decrease in the mean task performance score as the conscientiousness score increases for jobs that are not highly complex. If the value of β 2 is positive, then the shape of the curve will be concave upward. If the value of β 2 is negative, then the shape of the curve will be concave downward.

c.

For jobs that are highly complex, the equation would be E(y) = β0 + β1x1 + β2x1² + β3(1) + β4x1(1) + β5x1²(1) = (β0 + β3) + (β1 + β4)x1 + (β2 + β5)x1².

d.

β0 = mean task performance score when the conscientiousness score is 0 and the job is not highly complex.

β3 = difference in the mean task performance score when the conscientiousness score is 0 between highly complex jobs and jobs that are not highly complex.

β1 = location parameter for jobs that are not highly complex since the quadratic term is in the model.

β4 = difference in the location parameter between jobs that are highly complex and jobs that are not highly complex since the quadratic term is in the model.

β2 determines the rate of increase or decrease in the mean task performance score as the conscientiousness score increases for jobs that are not highly complex.

β5 = difference in the rate of increase or decrease in the mean task performance score as the conscientiousness score increases between highly complex jobs and jobs that are not highly complex.

e.

Yes. Since the interaction terms are included in the model, the curvilinear relationship for highly complex jobs is different than the curvilinear relationship for jobs that are not highly complex.

12.95

a.

Let x1 = sales volume
x2 = 1 if NW, 0 if not
x3 = 1 if S, 0 if not
x4 = 1 if W, 0 if not

The complete second order model for the sales price of a single-family home is: E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x3 + β5x4 + β6x1x2 + β7x1x3 + β8x1x4 + β9x1²x2 + β10x1²x3 + β11x1²x4

b.

For the West, x2 = 0, x3 = 0, and x4 = 1. The equation would be:
E(y) = β0 + β1x1 + β2x1² + β3(0) + β4(0) + β5(1) + β6x1(0) + β7x1(0) + β8x1(1) + β9x1²(0) + β10x1²(0) + β11x1²(1)
= β0 + β1x1 + β2x1² + β5 + β8x1 + β11x1² = (β0 + β5) + (β1 + β8)x1 + (β2 + β11)x1²

c.

For the Northwest, x2 = 1, x3 = 0, and x4 = 0. The equation would be:
E(y) = β0 + β1x1 + β2x1² + β3(1) + β4(0) + β5(0) + β6x1(1) + β7x1(0) + β8x1(0) + β9x1²(1) + β10x1²(0) + β11x1²(0)
= β0 + β1x1 + β2x1² + β3 + β6x1 + β9x1² = (β0 + β3) + (β1 + β6)x1 + (β2 + β9)x1²

d.

The parameters β3, β4, and β5 allow for the y-intercepts of the 4 regions to be different. The parameters β6, β7, and β8 allow for the peaks of the curves to occur at different values of sales volume (x1) for the four regions. The parameters β9, β10, and β11 allow for the shapes of the curves to be different for the four regions. Thus, all the parameters from β3 through β11 allow for differences in mean sales prices among the four regions.

e.

Using MINITAB, the printout is:

Regression Analysis: Price versus X1, X1SQ, ...

The regression equation is
Price = 1904740 - 70.4 X1 + 0.000721 X1SQ + 159661 X2 + 5291908 X3 + 3663319 X4 + 22.2 X1X2 - 23.9 X1X3 - 37 X1X4 - 0.000421 X1SQX2 - 0.000404 X1SQX3 - 0.000181 X1SQX4

Predictor   Coef         SE Coef     T       P
Constant    1904740      1984278     0.96    0.351
X1          -70.44       72.09       -0.98   0.343
X1SQ        0.0007211    0.0006515   1.11    0.285
X2          159661       2069265     0.08    0.939
X3          5291908      4812586     1.10    0.288
X4          3663319      4478880     0.82    0.425
X1X2        22.25        73.74       0.30    0.767
X1X3        -23.86       92.09       -0.26   0.799
X1X4        -37.2        103.0       -0.36   0.723
X1SQX2      -0.0004210   0.0006589   -0.64   0.532
X1SQX3      -0.0004044   0.0006777   -0.60   0.559
X1SQX4      -0.0001810   0.0007333   -0.25   0.808

S = 24365.8   R-Sq = 85.0%   R-Sq(adj) = 74.6%

Analysis of Variance
Source           DF   SS            MS           F      P
Regression       11   53633628997   4875784454   8.21   0.000
Residual Error   16   9499097458    593693591
Total            27   63132726455

Source   DF   Seq SS
X1       1    3591326
X1SQ     1    64275360
X2       1    11338642654
X3       1    10081000583
X4       1    241539024
X1X2     1    18258475317
X1X3     1    5579187440
X1X4     1    7566169810
X1SQX2   1    138146367
X1SQX3   1    326425228
X1SQX4   1    36175888

To determine if the model is useful for predicting sales price, we test:

H0: β1 = β2 = ... = β11 = 0
Ha: At least one βi ≠ 0

The test statistic is F = MS(Model)/MSE = 8.21 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate the model is useful in predicting sales price at α = .01.
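Parts b and c collapse the complete model into one quadratic curve per region by adding each region's dummy and interaction coefficients to the base terms. The Python sketch below applies the same bookkeeping to the coefficient estimates in the printout, assuming the coding from part a (X2 = NW, X3 = S, X4 = W). The names `base`, `shift`, and `region_curve` are hypothetical, and the resulting numbers are only arithmetic on the printed estimates, not additional results from the manual.

```python
# Coefficient estimates copied from the MINITAB printout in part e.
base = {"const": 1904740, "x1": -70.44, "x1sq": 0.0007211}
shift = {
    "NW": {"const": 159661, "x1": 22.25, "x1sq": -0.0004210},
    "S": {"const": 5291908, "x1": -23.86, "x1sq": -0.0004044},
    "W": {"const": 3663319, "x1": -37.2, "x1sq": -0.0001810},
}

def region_curve(region=None):
    """Return (intercept, linear, quadratic) coefficients of the fitted curve.

    region=None denotes the base region (x2 = x3 = x4 = 0).
    """
    extra = {"const": 0, "x1": 0.0, "x1sq": 0.0} if region is None else shift[region]
    return tuple(base[term] + extra[term] for term in ("const", "x1", "x1sq"))

for name in (None, "NW", "S", "W"):
    print(name, region_curve(name))
```

For the West, for example, the three returned values correspond to (β̂0 + β̂5), (β̂1 + β̂8), and (β̂2 + β̂11) in the notation of part b.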

12.96

a.

The model would be E ( y ) = β 0 + β 1 x1 + β 2 x 2 + β 3 x1 x 2 .

b.

For election years affected by a World War, x2 = 1. The model would be E(y) = β0 + β1x1 + β2(1) + β3x1(1) = (β0 + β2) + (β1 + β3)x1. The slope of the line would be (β1 + β3).

c.

For election years not affected by a World War, x2 = 0. The model would be E(y) = β0 + β1x1 + β2(0) + β3x1(0) = β0 + β1x1. The slope of the line would be β1.

d.

When there is no charisma difference, x1 = 0. The model is E(y) = β0 + β1(0) + β2x2 + β3(0)x2 = β0 + β2x2. Thus, the effect of a World War on the mean Democratic vote share is β2.

e.

When the charisma difference is 50, x1 = 50. The model is E(y) = β0 + β1(50) + β2x2 + β3(50)x2 = (β0 + 50β1) + (β2 + 50β3)x2. Thus, the effect of a World War on the mean Democratic vote share is β2 + 50β3.

f.

Using MINITAB, the results are:

Regression Analysis: VSHARE versus WAR, CDIFF, CDIFF*WAR

Analysis of Variance
Source          DF   Adj SS    Adj MS   F-Value   P-Value
Regression      3    216.33    72.11    1.59      0.224
  WAR           1    62.57     62.57    1.38      0.255
  CDIFF         1    13.18     13.18    0.29      0.596
  CDIFF*WAR     1    199.71    199.71   4.39      0.049
Error           20   909.53    45.48
  Lack-of-Fit   19   818.81    43.10    0.48      0.837
  Pure Error    1    90.72     90.72
Total           23   1125.86

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
6.74364   19.21%   7.10%       0.00%

Coefficients
Term        Coef      SE Coef   T-Value   P-Value   VIF
Constant    49.37     1.53      32.21     0.000
WAR         7.17      6.11      1.17      0.255     2.16
CDIFF       -0.0378   0.0701    -0.54     0.596     1.30
CDIFF*WAR   0.378     0.180     2.10      0.049     2.53

Regression Equation
VSHARE = 49.37 + 7.17 WAR - 0.0378 CDIFF + 0.378 CDIFF*WAR

To determine if the linear effect of charisma difference on mean Democratic vote share depends on World War status, we test: H 0 : β3 = 0 H a : β3 ≠ 0

The test statistic is t = 2.10 and the p-value is p = .049. Since the p-value is less than α (p = .049 < .10), H0 is rejected. There is sufficient evidence to indicate the linear effect of charisma difference on mean Democratic vote share depends on World War status at α = .10.

12.97

The models in parts a and b are nested.
The complete model is E(y) = β0 + β1x1 + β2x2. The reduced model is E(y) = β0 + β1x1.

The models in parts a and d are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2. The reduced model is E(y) = β0 + β1x1 + β2x2.

The models in parts a and e are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2². The reduced model is E(y) = β0 + β1x1 + β2x2.

The models in parts b and c are nested.
The complete model is E(y) = β0 + β1x1 + β2x1². The reduced model is E(y) = β0 + β1x1.

The models in parts b and d are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2. The reduced model is E(y) = β0 + β1x1.

The models in parts b and e are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2². The reduced model is E(y) = β0 + β1x1.

The models in parts c and e are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2². The reduced model is E(y) = β0 + β1x1 + β2x1².

The models in parts d and e are nested.
The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2². The reduced model is E(y) = β0 + β1x1 + β2x2 + β3x1x2.

12.98

a.

H a : At least one β i ≠ 0, i = 3, 4, 5

b.

The reduced model would be E ( y ) = β 0 + β 1 x1 + β 2 x 2 .

c.

ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 30 − (5 + 1) = 24.

d.

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0, i = 3, 4, 5

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(1250.2 − 1125.2)/(5 − 2)] / [1125.2/(30 − (5 + 1))] = 41.6667/46.8833 = .89

The rejection region requires α = .05 in the upper tail of the F-distribution with numerator ν1 = k − g = 5 − 2 = 3 and denominator ν2 = n − (k + 1) = 30 − (5 + 1) = 24. From Table VI, Appendix D, F.05 = 3.01. The rejection region is F > 3.01.

Since the observed value of the test statistic does not fall in the rejection region (F = .89 ≯ 3.01), H0 is not rejected. There is insufficient evidence to indicate the second-order terms are useful at α = .05.

12.99

a.

Including β 0 , there are five 𝛽 parameters in the complete model and three in the reduced model.

b.

The hypotheses are: H 0 : β3 = β4 = 0 H a : At least one β i ≠ 0, i = 3, 4

c.

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(160.44 − 152.66)/(4 − 2)] / [152.66/(20 − (4 + 1))] = 3.89/10.1773 = .38

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 4 − 2 = 2 and ν2 = n − (k + 1) = 20 − (4 + 1) = 15. From Table VI, Appendix D, F.05 = 3.68. The rejection region is F > 3.68.

Since the observed value of the test statistic does not fall in the rejection region (F = .38 ≯ 3.68), H0 is not rejected. There is insufficient evidence to indicate the complete model is better than the reduced model at α = .05.

12.100

a.

The complete 1st-order model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6.

b.

The hypotheses would be
H0: β1 = β2 = 0
Ha: At least 1 βi ≠ 0

The reduced model is E(y) = β0 + β3x3 + β4x4 + β5x5 + β6x6.

c.

Because the R² value for the reduced model is so much smaller than the R² value for the complete model, it appears that at least one of the variables purser and head flight attendant is significant.

d.

The p-value is p < .05. Since the p-value is less than α (p < .05), H0 is rejected. There is sufficient evidence to indicate the leadership score of either the purser or head flight attendant (or both) is statistically useful for predicting team goal attainment at α = .05.

e.

Because the R² value for the reduced model is almost the same as the R² value for the complete model, it appears that neither of the variables purser and head flight attendant is significant.

f.

The p-value is p > .10 . Since the p-value is not less than α ( p > .10 ) , H0 is not rejected. There is insufficient evidence to indicate the leadership score of either the purser or head flight attendant (or both) is statistically useful for predicting team goal attainment at α = .05 .
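The nested-model F statistics worked by hand in Exercises 12.98 and 12.99 follow one formula, which can be sketched in Python as below. The function name `nested_f` and its argument names are assumptions introduced for illustration; the inputs are the SSE values, k, g, and n quoted in those two solutions.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Nested-model F statistic.

    k = number of beta parameters (excluding beta_0) in the complete model,
    g = number in the reduced model, n = sample size.
    Returns (F, numerator df, denominator df).
    """
    df_num = k - g
    df_den = n - (k + 1)
    f_stat = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
    return f_stat, df_num, df_den

f_98, v1_98, v2_98 = nested_f(1250.2, 1125.2, k=5, g=2, n=30)  # Exercise 12.98
f_99, v1_99, v2_99 = nested_f(160.44, 152.66, k=4, g=2, n=20)  # Exercise 12.99
print(round(f_98, 2), v1_98, v2_98)
print(round(f_99, 2), v1_99, v2_99)
```

Each F value is then compared with the tabled critical value for the returned degrees of freedom, as in the rejection regions stated in those solutions.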

12.101 a.

To determine whether the quadratic terms in the model are statistically useful for predicting relative optimism, we test:

H0: β4 = β5 = 0
Ha: At least one βi ≠ 0

b.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is E(y) = β0 + β1x1 + β2x2 + β3x1x2.

c.

To determine whether the interaction terms in the model are statistically useful for predicting relative optimism, we test:

H0: β3 = β5 = 0
Ha: At least one βi ≠ 0

d.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is E(y) = β0 + β1x1 + β2x2 + β4x2².

e.

To determine whether the dummy variable terms in the model are statistically useful for predicting relative optimism, we test:

H0: β1 = β3 = β5 = 0
Ha: At least one βi ≠ 0

f.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + β5x1x2² and the reduced model is E(y) = β0 + β2x2 + β4x2².

12.102 a.

The model from part b of Exercise 12.91 is E(y) = β0 + β1x1 + β2x2 + β3x3. The model from part c of Exercise 12.91 is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3. These two models are nested because all of the terms in the first model are contained in the second model. The first model is the reduced model and the second model is the complete model.

b.

The null hypothesis for comparing the two models is H0: β4 = β5 = 0.

c.

If we reject H0 in part b, we would conclude that at least one of the interaction terms is not 0. Thus, we would prefer the second model.

d.

If we fail to reject H0 in part b, then we would conclude that we have no evidence to indicate that the interaction terms were significant. Thus, we would prefer the first model.

12.103 a.

To determine if any of the extra variables are useful predictors of forecast accuracy, we test the null hypothesis: H0: β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = β10 = β11 = 0

b.

The test statistic is F = 2,296.3

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with 𝜈 = 𝑘 − 𝑔 = 11 − 1 = 10 and 𝜈 = 𝑛 − (𝑘 + 1) = 19,324 − (10 + 1) = 19,313. From Table VI, Appendix D, 𝐹. ≈ 1.91. The rejection region is 𝐹 > 1.91.

Copyright © 2022 Pearson Education, Inc.


686 Chapter 12 Since the observed value of the test statistic does fall in the rejection region ( 𝐹 = 2,296.3 > 1.91), H0 is rejected. There is sufficient evidence to indicate that the enhanced model is a better predictor of forescast accuracy than the simple model at 𝛼 = .05. 12.104 a.

To determine the overall adequacy of the model, the null hypothesis is: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$

b.

To determine if task performance score and conscientiousness score are curvilinearly related, the null hypothesis is: $H_0: \beta_2 = \beta_5 = 0$

c.

To determine if the curvilinear relationship between task performance score and conscientiousness score depends on job complexity, the null hypothesis is: $H_0: \beta_5 = 0$

d.

From Exercise 12.54 we know that $n = 602$. For part a, the reduced model is $E(y) = \beta_0$. The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(SSE_R - SSE_C)/5}{SSE_C/596}$

For part b, the reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_3 x_2 + \beta_4 x_1 x_2$. The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(SSE_R - SSE_C)/2}{SSE_C/596}$

For part c, the reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2$. The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(SSE_R - SSE_C)/1}{SSE_C/596}$

12.105 a.

The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_1 x_4 + \beta_6 x_2 x_4 + \beta_7 x_3 x_4$

b.

Using MINITAB, the results are:

Regression Analysis: DESIRE versus GENDER, SELFESTM, ...

The regression equation is
DESIRE = 13.1 - 1.89 GENDER - 0.091 SELFESTM + 0.135 BODYSAT + 0.746 IMPREAL - 0.065 G_I + 0.0098 SE_I - 0.112 BS_I

Predictor      Coef   SE Coef      T      P
Constant     13.092     2.013   6.50  0.000
GENDER       -1.890     2.074  -0.91  0.363
SELFESTM    -0.0908    0.1176  -0.77  0.441
BODYSAT      0.1350    0.4749   0.28  0.777
IMPREAL      0.7460    0.4918   1.52  0.131
G_I         -0.0647    0.5110  -0.13  0.899
SE_I        0.00977   0.02808   0.35  0.728
BS_I        -0.1121    0.1160  -0.97  0.335

S = 2.23593   R-Sq = 51.3%   R-Sq(adj) = 49.2%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        7   853.89  121.98  24.40  0.000
Residual Error  162   809.90    5.00
Total           169  1663.79

Source    DF  Seq SS
GENDER     1  674.64
SELFESTM   1   57.66
BODYSAT    1   19.62
IMPREAL    1   75.91
G_I        1   20.36
SE_I       1    1.03
BS_I       1    4.67

To determine the overall utility of the model, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 24.40$ and the p-value is $p = .000$. Since the p-value is so small, H0 will be rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate the model is useful for predicting desire to have cosmetic surgery.

c.

To determine if impression of reality TV interacts with each of the other independent variables, the null hypothesis is: $H_0: \beta_5 = \beta_6 = \beta_7 = 0$

d.

The reduced model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$. This model was fit in Exercise 12.17. From this exercise, $SSE_R = 835.95$. The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(835.95 - 809.90)/(7 - 4)}{809.90/[170 - (7 + 1)]} = 1.74$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 7 - 4 = 3$ and $\nu_2 = n - (k + 1) = 170 - (7 + 1) = 162$. From Table VI, Appendix D, $F_{.05} \approx 2.68$. The rejection region is $F > 2.68$. Since the observed value of the test statistic does not fall in the rejection region ($F = 1.74 \ngtr 2.68$), H0 is not rejected. There is insufficient evidence to indicate impression of reality TV interacts with any of the other independent variables at $\alpha = .05$.
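The nested-model F statistic used in Exercise 12.105 d can be checked numerically. The following is a minimal sketch in Python; the function name `nested_f` and its argument names are illustrative choices, not part of the text, and the inputs are the SSE values, k, g, and n quoted in the solution above.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Nested-model F statistic for H0: the k - g extra betas all equal 0.

    k = number of beta terms (excluding beta_0) in the complete model
    g = number of beta terms (excluding beta_0) in the reduced model
    n = sample size
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Values quoted in the solution to Exercise 12.105 d
f_stat = nested_f(sse_reduced=835.95, sse_complete=809.90, k=7, g=4, n=170)
print(round(f_stat, 2))
```

The same function applies to any of the complete-versus-reduced comparisons in this section once the two SSE values are read from the printouts.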

12.106 a.

Model 1: $R^2 = .101$. 10.1% of the total variation in the supervisor-directed aggression score is explained by the terms in Model 1. Model 2: $R^2 = .555$. 55.5% of the total variation in the supervisor-directed aggression score is explained by the terms in Model 2.

b.

To compare the fits of Model 1 and Model 2, we test:
$H_0: \beta_5 = \beta_6 = \beta_7 = \beta_8 = 0$
$H_a$: At least one $\beta_i \neq 0$

c.

Yes. All of the terms in Model 1 are contained in Model 2.

d.

Since the p-value is so small (𝑝 < .001), H0 would be rejected. There is sufficient evidence that at least one of the variables self-esteem, history of aggression, interactional injustice at primary job, and abusive supervisor at primary job is significant in predicting supervisor-directed aggression score.

e.

Model 3:

$E(y) = \beta_0 + \beta_1(\text{Age}) + \beta_2(\text{Gender}) + \beta_3(\text{Interactional injustice at 2nd job}) + \beta_4(\text{Abusive supervisor at 2nd job})$
$\quad + \beta_5(\text{Self-esteem}) + \beta_6(\text{History of aggression})$
$\quad + \beta_7(\text{Interactional injustice at primary job}) + \beta_8(\text{Abusive supervisor at primary job})$
$\quad + \beta_9(\text{Self-esteem})(\text{History of aggression}) + \beta_{10}(\text{Self-esteem})(\text{Interactional injustice at primary job})$
$\quad + \beta_{11}(\text{Self-esteem})(\text{Abusive supervisor at primary job})$
$\quad + \beta_{12}(\text{History of aggression})(\text{Interactional injustice at primary job})$
$\quad + \beta_{13}(\text{History of aggression})(\text{Abusive supervisor at primary job})$
$\quad + \beta_{14}(\text{Interactional injustice at primary job})(\text{Abusive supervisor at primary job})$

f.

To compare Model 2 with Model 3, we test:
$H_0: \beta_9 = \beta_{10} = \cdots = \beta_{14} = 0$
$H_a$: At least one $\beta_i \neq 0$

The p-value for the test is $p > .10$. Since the p-value > .10, H0 is not rejected. There is insufficient evidence to indicate any of the interaction terms are significant in predicting supervisor-directed aggression score for any reasonable value of $\alpha$.

12.107 a. b.

If the theory is correct, then the sign of $\beta_3$ should be positive. A possible graph might look like:

[Figure: y versus x1 (horizontal axis from -7 to 0, vertical axis from 0 to 50), with separate curves for males and females]

c.

The complete 2nd-order model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$.

d.

A possible graph might look like:

[Figure: y versus x1 (horizontal axis from -7 to 0, vertical axis from 0 to 160), with separate curves for males and females]

e.

To compare the two models, we test:
$H_0: \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

f.

Using MINITAB, the results of fitting the model in part a are:

Regression Analysis: Income versus Agree, Agree-sq, Gender

The regression equation is
Income = - 21657 + 37155 Agree - 7056 Agree-sq + 25482 Gender

Predictor    Coef  SE Coef      T      P
Constant   -21657    31780  -0.68  0.497
Agree       37155    19257   1.93  0.057
Agree-sq    -7056     2903  -2.43  0.017
Gender      25482     1552  16.42  0.000

S = 7737.36   R-Sq = 76.5%   R-Sq(adj) = 75.8%

Analysis of Variance
Source          DF           SS          MS       F      P
Regression       3  18708663846  6236221282  104.17  0.000
Residual Error  96   5747214158    59866814
Total           99  24455878004

Source    DF       Seq SS
Agree      1   1896882849
Agree-sq   1    663015651
Gender     1  16148765346

Using MINITAB, the results of fitting the model in part c are:

Regression Analysis: Income versus Agree, Agree-sq, Gender, G_A, G_A-sq

The regression equation is
Income = - 9847 + 27248 Agree - 5169 Agree-sq + 42549 Gender - 4765 G_A - 128 G_A-sq

Predictor   Coef  SE Coef      T      P
Constant   -9847    45303  -0.22  0.828
Agree      27248    28743   0.95  0.346
Agree-sq   -5169     4520  -1.14  0.256
Gender     42549    71654   0.59  0.554
G_A        -4765    43177  -0.11  0.912
G_A-sq      -128     6474  -0.02  0.984

S = 7751.27   R-Sq = 76.9%   R-Sq(adj) = 75.7%

Analysis of Variance
Source          DF           SS          MS      F      P
Regression       5  18808157832  3761631566  62.61  0.000
Residual Error  94   5647720172    60082129
Total           99  24455878004

Source    DF       Seq SS
Agree      1   1896882849
Agree-sq   1    663015651
Gender     1  16148765346
G_A        1     99470471
G_A-sq     1        23515

The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(5{,}747{,}214{,}158 - 5{,}647{,}720{,}172)/(5 - 3)}{5{,}647{,}720{,}172/[100 - (5 + 1)]} = .83$

The rejection region requires $\alpha = .10$ in the upper tail of the F-distribution with $\nu_1 = k - g = 5 - 3 = 2$ and $\nu_2 = n - (k + 1) = 100 - (5 + 1) = 94$. From Table V, Appendix D, $F_{.10} \approx 2.37$. The rejection region is $F > 2.37$. Since the observed value of the test statistic does not fall in the rejection region ($F = .83 \ngtr 2.37$), H0 is not rejected. There is insufficient evidence to indicate the interaction terms improve the model at $\alpha = .10$.

12.108 a.

Using MINITAB, the results for fitting the reduced model are:

Regression Analysis: Price versus X1, X2, X3, X4, X1X2, X1X3, X1X4

The regression equation is
Price = - 286970 + 9.32 X1 + 578133 X2 + 60968 X3 - 575769 X4 - 10.4 X1X2 - 6.52 X1X3 + 1.00 X1X4

Predictor     Coef  SE Coef      T      P
Constant   -286970   161003  -1.78  0.090
X1           9.317    2.900   3.21  0.004
X2          578133   183578   3.15  0.005
X3           60968   292823   0.21  0.837
X4         -575769   325699  -1.77  0.092
X1X2       -10.408    3.060  -3.40  0.003
X1X3        -6.522    3.300  -1.98  0.062
X1X4         1.000    3.903   0.26  0.800

S = 30785.9   R-Sq = 70.0%   R-Sq(adj) = 59.5%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression       7  44177277861  6311039694  6.66  0.000
Residual Error  20  18955448594   947772430
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X2       1   8414868549
X3       1   9294417537
X4       1   1463449502
X1X2     1  17344397940
X1X3     1   7594294303
X1X4     1     62258704

From Exercise 12.95, $SSE_C = 9{,}499{,}097{,}458$, $n = 28$, and $k = 11$. To determine if the quadratic terms are statistically useful for predicting sales price, we test:
$H_0: \beta_2 = \beta_9 = \beta_{10} = \beta_{11} = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(18{,}955{,}448{,}594 - 9{,}499{,}097{,}458)/(11 - 7)}{9{,}499{,}097{,}458/[28 - (11 + 1)]} = 3.98$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 11 - 7 = 4$ and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. From Table VI, Appendix D, $F_{.05} = 3.01$. The rejection region is $F > 3.01$. Since the observed value of the test statistic falls in the rejection region ($F = 3.98 > 3.01$), H0 is rejected. There is sufficient evidence to indicate at least one of the quadratic terms is statistically useful for predicting sales price at $\alpha = .05$.

b.

Since we rejected H0 in part a, the complete model is preferred. At least one of the quadratic terms is significant.

c.

The preferred model from part b is the complete model. Using MINITAB, the results of fitting the model without the interaction terms are:

Regression Analysis: Price versus X1, X1SQ, X2, X3, X4

The regression equation is
Price = 289549 - 2.15 X1 + 0.000019 X1SQ - 57530 X2 - 203755 X3 - 24038 X4

Predictor        Coef     SE Coef      T      P
Constant       289549      138840   2.09  0.049
X1             -2.150       3.325  -0.65  0.524
X1SQ       0.00001888  0.00001621   1.16  0.257
X2             -57530       50653  -1.14  0.268
X3            -203755      113332  -1.80  0.086
X4             -24038       67099  -0.36  0.724

S = 43381.9   R-Sq = 34.4%   R-Sq(adj) = 19.5%

Analysis of Variance
Source          DF           SS          MS     F      P
Regression       5  21729048947  4345809789  2.31  0.079
Residual Error  22  41403677509  1881985341
Total           27  63132726455

Source  DF       Seq SS
X1       1      3591326
X1SQ     1     64275360
X2       1  11338642654
X3       1  10081000583
X4       1    241539024

To determine whether region and sales volume interact to affect sales price, we test:
$H_0: \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \dfrac{(41{,}403{,}677{,}509 - 9{,}499{,}097{,}458)/(11 - 5)}{9{,}499{,}097{,}458/[28 - (11 + 1)]} = 8.96$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k - g = 11 - 5 = 6$ and $\nu_2 = n - (k + 1) = 28 - (11 + 1) = 16$. From Table VI, Appendix D, $F_{.05} = 2.74$. The rejection region is $F > 2.74$. Since the observed value of the test statistic falls in the rejection region ($F = 8.96 > 2.74$), H0 is rejected. There is sufficient evidence to indicate region and sales volume interact to affect sales price at $\alpha = .05$.

d.

Since we rejected H0 in part c, the complete model is preferred. At least one of the interaction terms is significant.

12.109 a.

The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$

b.

For traditional hotels: For every one-point increase in guestroom rating, we estimate the mean guest experience rating to increase by .093, holding all other factors constant.

For lifestyle hotels: For every one-point increase in guestroom rating, we estimate the mean guest experience rating to increase by .055, holding all other factors constant.

c.

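The model would be: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_1 x_6 + \beta_8 x_2 x_6 + \beta_9 x_3 x_6 + \beta_{10} x_4 x_6 + \beta_{11} x_5 x_6$

where $x_6 = 1$ if traditional hotel, 0 if lifestyle hotel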

d.

For traditional hotels, we plug the value of 𝑥 = 1 in the model and find the slope would be 𝛽 + 𝛽 For lifestyle hotels, we plug the value of 𝑥 = 0 in the model and find the slope would be 𝛽

e.

To determine if the two slopes are the same, we would test: 𝐻 :𝛽 = 0

12.110 a.

To determine whether the rate of increase of team performance with time pressure depends on effectiveness of the team leader, we test:
$H_0: \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

b.

For fixed time pressure, to determine whether the mean team performance differs for teams with effective and non-effective team leaders, we test:
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

12.111 a.

The best one-variable predictor of y is the one whose t statistic has the largest absolute value. The t statistics for each of the variables are:

Independent Variable    $t = \hat{\beta}_i / s_{\hat{\beta}_i}$
x1                      t = 1.6/.42 = 3.81
x2                      t = -.9/.01 = -90
x3                      t = 3.4/1.14 = 2.98
x4                      t = 2.5/2.06 = 1.21
x5                      t = -4.4/.73 = -6.03
x6                      t = .3/.35 = .86

The variable $x_2$ is the best one-variable predictor of y. The absolute value of the corresponding t score is 90. This is larger than any of the others.

b.

Yes. In the stepwise procedure, the first variable entered is the one which has the largest absolute value of t, provided the absolute value of the t falls in the rejection region.

c.

Once x 2 is entered, the next variable that is entered is the one that, in conjunction with x2, has the largest absolute t value associated with it.
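The comparison in part a of Exercise 12.111 can be reproduced with a short calculation. This is a sketch only: the dictionary name `estimates` and the variable labels are illustrative, and the coefficient and standard-error pairs are the ones shown in the solution to part a.

```python
# (estimate, standard error) pairs from the solution to Exercise 12.111 a
estimates = {
    "x1": (1.6, 0.42),
    "x2": (-0.9, 0.01),
    "x3": (3.4, 1.14),
    "x4": (2.5, 2.06),
    "x5": (-4.4, 0.73),
    "x6": (0.3, 0.35),
}

# t statistic for each one-variable model: estimate divided by its standard error
t_stats = {name: b / se for name, (b, se) in estimates.items()}

# The first variable entered is the one with the largest |t|
best = max(t_stats, key=lambda name: abs(t_stats[name]))

for name, t in t_stats.items():
    print(name, round(t, 2))
print("largest |t|:", best)
```

Selecting on the absolute value matters here: $x_2$ has a negative t statistic, so ranking on the signed value would miss it.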

12.112 a.

The form of the model in step 1 is 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 . Since there are 8 independent variables, there would be 8 one-variable models fit. The “best” independent variable selected is the variable with the largest absolute value of the t-statistic.

b.

The form of the model in step 2 is 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 . Since there are 7 remaining independent variables, there would be 7 two-variable models fit. The “best” independent variable selected in this step is the variable with the largest absolute value of the t-statistic after the first variable is in the model.

c.

The form of the model in step 3 is 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 . Since there are 6 remaining independent variables, there would be 6 three-variable models fit. The “best” independent variable selected in this step is the variable with the largest absolute value of the t-statistic after the first two variables are in the model.

d.

The next steps should include looking at the possibility of adding 2nd order terms to the model as well as interaction terms among the independent variables selected.

12.113 a.

There would be five 1-variable models fit in step 1.

b.

There would be four 2-variable models fit in step 2.

c.

There would be three 3-variable models fit in step 3.

d.

There would be two 4-variable models fit in step 4.

e.

There would be a total of fourteen t-tests performed. For each individual t-test, $P(\text{Type I error}) = .05$.

$P(\text{at least one Type I error}) = 1 - P(0 \text{ Type I errors}) = 1 - \binom{14}{0}(.05)^0(.95)^{14} = 1 - .49 = .51$
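The arithmetic in part e can be verified with a few lines of Python. This sketch follows the binomial calculation in the solution, which treats the fourteen tests as independent; that independence is an assumption of the calculation, and the variable names are illustrative.

```python
from math import comb

# Number of t-tests across the four steps of Exercise 12.113: 5 + 4 + 3 + 2
n_tests = 5 + 4 + 3 + 2
alpha = 0.05  # P(Type I error) for each individual test

# Binomial probability of zero Type I errors in n_tests independent tests
p_none = comb(n_tests, 0) * alpha**0 * (1 - alpha)**n_tests
p_at_least_one = 1 - p_none

print(n_tests, round(p_none, 2), round(p_at_least_one, 2))
```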

12.114 a.

In step 1, there were five one-variable models fit to the data.

b.

Each of the five models were fit to the data and the one with the best results (lowest p-value) is selected. According to this problem, 𝑥 was selected.

c.

In step 2, there were four two-variable models fit to the data.

d.

Since the p-value for testing the individualized consideration variable is small (p < .01), H0 can be rejected. There is sufficient evidence to indicate that individualized consideration is a positive linear predictor of supply chain management performance. Yes, we agree with their interpretation.

e.

$R^2 = .178$. 17.8% of the total sample variation of supply chain management performance about its mean is explained by the model containing individualized consideration. This interpretation is different than the one made by the researcher.

12.115 a.

In step 1, there were 11 one-variable models fit to the data. Thus, there were 11 t-tests run.

b.

In step 2, there were 10 two-variable models fit to the data. Thus, there were 10 t-tests run.

c.

The Global F p-value = .001. Since this p-value is so small, there is evidence that the final model is useful for predicting TME. $R^2 = .988$. 98.8% of the total sample variation of TME about its mean is explained by the model containing AMAP and NDF.

d.

The stepwise procedure does not guarantee that the “best” model has been determined. It is possible that important variables were not located. In addition, 2nd order terms and interaction terms should be considered.

e.

The complete 2nd order model would be: $E(y) = \beta_0 + \beta_1(AMAP) + \beta_2(NDF) + \beta_3(AMAP)^2 + \beta_4(NDF)^2 + \beta_5(AMAP)(NDF)$

f.

To determine if the terms in the model that allow for curvature are statistically significant, we test:
$H_0: \beta_3 = \beta_4 = \beta_5 = 0$
$H_a$: At least one $\beta_i \neq 0$

We would compare the complete model (from part e) to the reduced model with just the main effects of AMAP and NDF using the test statistic

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$

12.116 a.

In step 1, there will be 4 one-variable models fit to the data. Thus, there will be 4 t-tests run.

b.

In step 2, there will be 3 two-variable models fit to the data. Thus, there will be 3 t-tests run.

c.

Using MINITAB, the stepwise regression is:

Stepwise Regression: DESIRE versus SELFESTM, BODYSAT, IMPREAL, GENDER

Forward selection. Alpha-to-Enter: 0.15
Response is DESIRE on 4 predictors, with N = 170

Step              1       2       3
Constant      15.54   13.62   13.32

BODYSAT      -0.674  -0.708  -0.451
T-Value      -10.90  -11.83   -4.31
P-Value       0.000   0.000   0.000

IMPREAL                0.52    0.48
T-Value                4.00    3.79
P-Value               0.000   0.000

GENDER                        -1.91
T-Value                       -2.97
P-Value                       0.003

S              2.41    2.31    2.26
R-Sq          41.42   46.55   49.24
R-Sq(adj)     41.07   45.91   48.32
Mallows C-p    26.4    11.5     4.7

Using the forward stepwise procedure with $\alpha = .15$, the best subset of variables for predicting one’s desire to have cosmetic surgery is body satisfaction, impression of reality TV, and gender.

d.

I would suggest trying to include some 2nd order terms involving body satisfaction and impression of reality TV and maybe some interaction terms before deciding on the final model.

12.117 a.

In step 1, all 1 variable models are fit. Thus, there are a total of 12 models fit.

b.

In step 2, all two-variable models are fit, where 1 of the variables is the best one selected in step 1. Thus, a total of 11 two-variable models are fit.

c.

In the 12th step, only one model is fit – the model containing all the independent variables.

d.

The model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$.

e.

The estimated coefficient for the level of development of internet finance variable is positive, so we agree that a significant positive correlation exists between this variable and y. The estimated coefficient for the cost-income ratio variable is negative, so we agree that a higher cost-to-income ratio results in lower systemic risk.

f.

Using stepwise regression does not guarantee that the best model will be found. There may be better combinations of the independent variables that are never found, because of the order in which the independent variables are entered into the model. In addition, there are no squared or interaction terms included. There is a high probability of making at least one Type I error.

12.118 a.

From the printout, the three variables that should be included in the model are 𝑥 , 𝑥 , and 𝑥 . They are all entered into the model using stepwise regression and all are retained.

b.

No. It is possible that other combinations of variables would provide a better fit. There may be other independent variables that were not included.

c.

The model is 𝐸(𝑦) = 𝛽 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 + 𝛽 𝑥 𝑥 + 𝛽 𝑥 𝑥 + 𝛽 𝑥 𝑥

d.

The EPA would test
$H_0: \beta_4 = \beta_5 = \beta_6 = 0$
$H_a$: At least one $\beta_i \neq 0$

The EPA would fit the first-order model and record $SSE_R$. The EPA would then fit the model with the interaction terms and record $SSE_C$. The test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$

e.

To improve the model, the EPA could try to find other independent variables that affect y, the log of the number of marine animals present, or higher order terms of the already identified independent variables.

12.119 No, we would not suggest using stepwise regression because there are only two independent variables. In order to find the best model, several models could be tested. A first-order model could be fit with the two independent variables, artist death status and album sales. The interaction term between artist death status and album sales could then be added to the model. A second-order model could be fit with album sales and the square of album sales along with artist death status. Finally, the interaction terms between artist death status and album sales and between artist death status and the square of album sales could be added to the second-order model.

12.120 a.

The plot of the residuals reveals a nonrandom pattern. The residuals exhibit a curved shape. Such a pattern usually indicates that curvature needs to be added to the model.

b.

The plot of the residuals reveals a nonrandom pattern. The plot of the residuals versus the predicted values shows a pattern where the range of values of the residuals increases as $\hat{y}$ increases. This indicates that the variance of the random error, $\varepsilon$, becomes larger as the estimate of E(y) increases in value. Since E(y) depends on the x-values in the model, this implies that the variance of $\varepsilon$ is not constant for all settings of the x's.

c.

This plot reveals an outlier, since all or almost all of the residuals should fall within 3 standard deviations of their mean of 0.

d.

This frequency distribution of the residuals is skewed to the right and is not normal. This may be due to outliers or could indicate the need for a transformation of the dependent variable.

12.121 Yes. 𝑥 and 𝑥 are highly correlated (.93), as well as 𝑥 and 𝑥 (.86). When highly correlated independent variables are present in a regression model, the results can be confusing. The researcher may want to include only one of the variables.

12.122 No signs of multicollinearity are detected. The correlations among the independent variables range from -.198 to .201. These correlations are very small and do not indicate a problem with multicollinearity. In addition, the signs of the parameter estimates are what would be expected.

12.123 a.

For a basic wooden casket and a funeral home in a restricted state (both dummy variables equal to 1), $\hat{y} = 1{,}432 + 793(1) - 252(1) + 261(1)(1) = 2{,}234$.

b.

No, this would not be an outlier. The point $2,200 is less than one standard deviation from its predicted value.

c.

Yes. This data point is $z = 6$ standard deviations from its predicted value.
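The prediction in part a of Exercise 12.123 is a direct substitution into the fitted equation. The sketch below assumes the coefficients appear in the order shown in the solution (intercept, first dummy, second dummy, interaction); the function and argument names are hypothetical, and only the case where both dummies equal 1 is taken from the text.

```python
def predicted_price(d1, d2):
    """Fitted value from the equation quoted in Exercise 12.123 a.

    d1 and d2 are the two 0/1 dummy variables, in the order their
    coefficients appear in the solution (an assumption of this sketch).
    """
    return 1432 + 793 * d1 - 252 * d2 + 261 * d1 * d2


# Basic wooden casket in a restricted state: both dummies equal 1
y_hat = predicted_price(1, 1)
print(y_hat)
```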

12.124 There are six pairwise correlations between .2 and .8 in absolute value. This indicates moderate multicollinearity. Two of the correlations are .62 in absolute value. One should probably use either x1 (conscientiousness) or x3 (emotional stability) in the model but not both. Also, one should probably use either x4 (organizational citizenship) or x5 (counterproductive work) in the model but not both.

12.125 a.

The number of females in managerial positions is the dependent variable. The correlation between it and the independent variables does not imply multicollinearity.

b.

Again, the number of females in managerial positions is the dependent variable. The correlation between it and the independent variables does not imply multicollinearity.

c.

Since the absolute value of the correlation coefficient is .722, this would imply there is a moderate potential for multicollinearity.

d.

Since the absolute value of the correlation coefficient is .528, this would imply there is a moderate potential for multicollinearity.

12.126 Multicollinearity occurs when there is a relationship between the independent variables in a regression model. In this example, GDP is highly correlated with four other independent variables and moderately correlated with a fifth variable. We believe multicollinearity is present. Due to the multicollinearity present, we would recommend not including GDP in the model.

12.127 Using MINITAB, the residual plots are:

[Figure: Residual Plots for ARSENIC. Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]

[Figure: Scatterplot of SRES1 vs LATITUDE, LONGITUDE, DEPTH-FT]

a.

From the histogram of the standardized residuals, it appears that the mean of the residuals is close to 0. Thus, the assumption that the mean error is 0 appears to be met.

b.

From the plot of the standardized residuals versus the fitted values, it appears that the spread of the residuals increases as the fitted values increase. Thus, it appears that the assumption of constant variance is violated.

c.

From the plots of the standardized residuals versus the fitted values, it appears that there are some outliers. There are several observations with standardized residuals of 4 or more.

d.

From the normal probability plot, the data do not form a straight line. Thus, it appears that the assumption of normal error terms is violated.

e.

Using MINITAB, the correlations among the independent variables are:

Correlations: LATITUDE, LONGITUDE, DEPTH-FT

            LATITUDE  LONGITUDE
LONGITUDE      0.311
               0.000

DEPTH-FT       0.151     -0.328
               0.006      0.000

Cell Contents: Pearson correlation
               P-Value

None of the pairwise correlations are large in absolute value, so there is no evidence of multicollinearity. In addition, the global test indicates that at least one of the independent variables is significant and each of the independent variables is statistically significant. This also indicates that multicollinearity does not exist.

12.128 a.

Using MINITAB, the correlations are:

Correlations: DESIRE, SELFESTM, BODYSAT, IMPREAL, GENDER

           DESIRE  SELFESTM  BODYSAT  IMPREAL
SELFESTM   -0.485
            0.000

BODYSAT    -0.644     0.757
            0.000     0.000

IMPREAL     0.132     0.167    0.143
            0.087     0.030    0.062

GENDER     -0.637     0.511    0.828    0.065
            0.000     0.000    0.000    0.398

Cell Contents: Pearson correlation
               P-Value

The pairwise correlation between self-esteem and body satisfaction is .757 and the pairwise correlation between body satisfaction and gender is .828. Both of these are fairly close to 1 and indicate a rather strong possibility of multicollinearity. However, the signs of the beta coefficients correspond to what they should based on the correlation coefficients of the independent variables and the dependent variable. One might not want to include both variables in the pairs where the correlation is above .7.

b.

Using MINITAB, the graphs of the residuals are:

[Figure: Residual Plots for DESIRE. Panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]

From the normal probability plot and the histogram, the error terms appear to be fairly normally distributed. From the plot of the residuals versus the fitted values, there is no evidence of nonconstant variance for the error terms. From the same plot, all of the standardized residuals have values less than 2.5 in absolute value. Thus, there are no outliers. Since the data were not collected sequentially, the plot of the residuals versus time is meaningless and we cannot check for independence of the error terms.
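The multicollinearity screen in part a of Exercise 12.128 amounts to flagging pairs of independent variables whose correlation exceeds a cutoff. The sketch below uses the pairwise correlations from the MINITAB printout and the .7 cutoff mentioned in the solution; the dictionary layout and variable names are illustrative.

```python
# Pairwise correlations among the independent variables (Exercise 12.128 a)
correlations = {
    ("SELFESTM", "BODYSAT"): 0.757,
    ("SELFESTM", "IMPREAL"): 0.167,
    ("SELFESTM", "GENDER"): 0.511,
    ("BODYSAT", "IMPREAL"): 0.143,
    ("BODYSAT", "GENDER"): 0.828,
    ("IMPREAL", "GENDER"): 0.065,
}

threshold = 0.7  # cutoff used in the solution's closing remark

# Keep the pairs whose absolute correlation exceeds the cutoff
flagged = sorted(pair for pair, r in correlations.items() if abs(r) > threshold)

for pair in flagged:
    print(pair, correlations[pair])
```

A pairwise screen like this catches only two-variable relationships; it is a first check rather than a complete diagnosis.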

12.129 a.

Using MINITAB, the results are:

Regression Analysis: Time versus Temp

The regression equation is
Time = 30856 - 192 Temp

Predictor     Coef  SE Coef       T      P
Constant     30856     2713   11.37  0.000
Temp       -191.57    18.49  -10.36  0.000

S = 1099.17   R-Sq = 84.3%   R-Sq(adj) = 83.5%

Analysis of Variance
Source          DF         SS         MS       F      P
Regression       1  129663987  129663987  107.32  0.000
Residual Error  20   24163399    1208170
Total           21  153827386

The fitted regression line is $\hat{y} = 30{,}856 - 191.57x$.

b.

For temperature = 149, $\hat{y} = 30{,}856 - 191.57(149) = 2{,}312.07$. There are 2 observations with a temperature of 149. The residuals for the microchips manufactured at a temperature of 149°C are $\hat{\varepsilon} = y - \hat{y} = 1{,}100 - 2{,}312.07 = -1{,}212.07$ and $\hat{\varepsilon} = y - \hat{y} = 1{,}150 - 2{,}312.07 = -1{,}162.07$.
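The fitted value and residuals in part b of Exercise 12.129 can be checked as follows. This is a minimal sketch using the fitted line and the two observed times at 149 degrees quoted in the solution; the function name `fitted_time` is illustrative.

```python
def fitted_time(temp):
    """Fitted time from the least squares line in Exercise 12.129 a."""
    return 30856 - 191.57 * temp


# Fitted value at a temperature of 149
y_hat = fitted_time(149)

# Residual = observed minus fitted, for the two observations at 149
observed = (1100, 1150)
residuals = [y - y_hat for y in observed]

print(round(y_hat, 2), [round(r, 2) for r in residuals])
```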

c.

Using MINITAB, the plot of the residuals versus temperature is:

[Figure: Scatterplot of RESI1 vs Temp (Temp from 120 to 170, RESI1 from -2000 to 3000)]

There appears to be a U-shaped trend to the data.

d.

Yes. Because there appears to be a U-shaped trend to the data, this indicates that there is a curvilinear relationship between temperature and time.

12.130 Using MINITAB, the residual plots are:

[Figure: Residual Plots for Density. Panels: Normal Probability Plot; Versus Fits; Histogram; Versus Order]

[Figure: Scatterplot of SRES vs MassFlux, HeatFlux]

From the normal probability plot, the points fall close to a straight line, indicating the residuals could be normal. The plot of the standardized residuals versus the fitted values indicates the variance appears to be constant and that there are no outliers. The plot of the standardized residuals versus mass flux indicates that the relationship between bubble density and mass flux is linear because there is no U-shape and that there is constant variance because the standardized residuals are evenly spread. The plot of the standardized residuals versus heat flux indicates that the relationship between bubble density and heat flux is linear because there is no U-shape and that there is constant variance because the standardized residuals are evenly spread. The assumptions for the model appear to be met.

Copyright © 2022 Pearson Education, Inc.


12.131 Using MINITAB, the results are:

[Figure: Residual Plots for MVL, four panels: Normal Probability Plot (Percent vs Standardized Residual), Versus Fits (Standardized Residual vs Fitted Value), Histogram (Frequency vs Standardized Residual), Versus Order (Standardized Residual vs Observation Order)]

From the normal probability plot, the points do not fall on the straight line, indicating the residuals are not normal. The histogram of the standardized residuals is fairly mound-shaped except for the outlier to the right, again indicating that the data are not normal. The plot of the standardized residuals versus the fitted values indicates the variance is not constant. As the fitted value increases, the spread of the residuals increases. The assumptions for the model do not appear to be met.

12.132 Using MINITAB, the residual plots are:

[Figure: Residual Plots for MVL, four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order)]

From the plot of the standardized residuals versus the fitted values, there is a slight increase in the spread of the residuals as the fitted values increase. There is some evidence of non-constant variance. All of the standardized residuals are less than 3 in absolute value, indicating there are no outliers. Looking at the normal probability plot and the histogram of the residuals, there is no evidence that the error terms are not normal. The data were not collected sequentially, so the plot of the residuals versus time is meaningless. To correct the non-constant variance, one might transform the dependent variable.


12.133 a.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$

b.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \beta_{k+1} x_{k+1}$, where $x_{k+1} = 1$ if year 2000-2006 and $x_{k+1} = 0$ if year 2007-2009.

c.

Since the value of the estimated coefficient was positive, we would estimate the mean prices were higher during the 2000-2006 years than during the 2007-2009 years. This agrees with the Court’s ruling.

d.

The residuals are defined to be the observed value of y minus the predicted value of y. If the actual price exceeds the competitive (predicted) price, the result would be a positive residual.

e.

The expectation would be that half the residuals were positive and half would be negative. We would expect the average residual of an unimpacted customer to be close to 0. We know that random variation in the residuals will exist, so it would be expected that some would be positive and some negative, with an even split between the two.

f.

The statistician should require that the average residuals be significantly different from 0, so as to rule out random chance (false positive results).

g.

It is more likely that the non-positive average residuals were the result of random variability.

h.

We would expect half the customers to have positive averages and half to have negative averages.

12.134 The error of prediction is smallest when the values of $x_1$, $x_2$, and $x_3$ are equal to their sample means. The further $x_1$, $x_2$, and $x_3$ are from their means, the larger the error. When $x_1 = 60$, $x_2 = .4$, and $x_3 = 900$, the values fall outside the observed ranges of the x values. When $x_1 = 30$, $x_2 = .6$, and $x_3 = 1{,}300$, the values are within the observed ranges and consequently the x values are closer to their means. Thus, when $x_1 = 30$, $x_2 = .6$, and $x_3 = 1{,}300$, the error of prediction is smaller.

12.135 In multiple regression, as in simple regression, the confidence interval for the mean value of y is narrower than the prediction interval for a particular value of y. This is because the variance when predicting a particular value of y contains both the variance in locating the mean and the variance of the actual values once the mean has been located. The variance when estimating the mean value of y contains only the variance in locating the mean.

12.136 a.

To determine if at least one of the β parameters is not zero, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k+1)]} = 24.41$

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 4$ and $\nu_2 = n - (k+1) = 25 - (4+1) = 20$. From Table VI, Appendix D, $F_{.05} = 2.87$. The rejection region is $F > 2.87$. Since the observed value of the test statistic falls in the rejection region ($F = 24.41 > 2.87$), $H_0$ is rejected. There is sufficient evidence to indicate at least one of the β parameters is nonzero at $\alpha = .05$.

b.

$H_0: \beta_1 = 0$
$H_a: \beta_1 < 0$

The test statistic is $t = \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = -2.01$.

The rejection region requires $\alpha = .05$ in the lower tail of the t-distribution with $df = n - (k+1) = 25 - (4+1) = 20$. From Table III, Appendix D, $t_{.05} = 1.725$. The rejection region is $t < -1.725$. Since the observed value of the test statistic falls in the rejection region ($t = -2.01 < -1.725$), $H_0$ is rejected. There is sufficient evidence to indicate $\beta_1$ is less than 0 at $\alpha = .05$.

c.

$H_0: \beta_2 = 0$
$H_a: \beta_2 > 0$

The test statistic is $t = \dfrac{\hat{\beta}_2}{s_{\hat{\beta}_2}} = .31$.

The rejection region requires $\alpha = .05$ in the upper tail of the t-distribution. From part b above, the rejection region is $t > 1.725$. Since the observed value of the test statistic does not fall in the rejection region ($t = .31 \not> 1.725$), $H_0$ is not rejected. There is insufficient evidence to indicate $\beta_2$ is greater than 0 at $\alpha = .05$.

d.

$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

The test statistic is $t = \dfrac{\hat{\beta}_3}{s_{\hat{\beta}_3}} = 2.38$.

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the t-distribution with $df = 20$. From Table III, Appendix D, $t_{.025} = 2.086$. The rejection region is $t < -2.086$ or $t > 2.086$. Since the observed value of the test statistic falls in the rejection region ($t = 2.38 > 2.086$), $H_0$ is rejected. There is sufficient evidence to indicate $\beta_3$ is different from 0 at $\alpha = .05$.

12.137 a.

The least squares equation is $\hat{y} = 90.1 - 1.836x_1 + .285x_2$.

b.

$R^2 = .916$. About 91.6% of the sample variability in the y's is explained by the model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

c.

To determine if the model is useful for predicting y, we test:
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = \dfrac{MSR}{MSE} = 64.91$ and the p-value is $p < .0001$.

Since the p-value is less than $\alpha$ ($p < .0001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the model is useful for predicting y at $\alpha = .05$.

d.

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = -5.01$ and the p-value is $p < .0001$.

Since the p-value is less than $\alpha$ ($p < .0001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate $\beta_1$ is not 0 at $\alpha = .05$.

e.

The standard deviation is 𝑠 = 10.68. We would expect about 95% of the observations to fall within 2𝑠 = 2(10.68) = 21.36 units of the fitted regression line.

12.138 From the plot of the residuals for the straight line model, there appears to be a mound shape, which implies the quadratic model should be used.

12.139 $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

where $x_1 = 1$ if level 2, 0 otherwise; $x_2 = 1$ if level 3, 0 otherwise; $x_3 = 1$ if level 4, 0 otherwise.

12.140 a.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

where $x_2 = 1$ if level 2, 0 otherwise; $x_3 = 1$ if level 3, 0 otherwise.

b.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 x_1 x_2 + \beta_6 x_1 x_3 + \beta_7 x_1^2 x_2 + \beta_8 x_1^2 x_3$

where $x_1$, $x_2$, and $x_3$ are as in part a.

12.141 The stepwise regression method is used to try to find the best model to describe a process. It is a screening procedure that tries to select a small subset of independent variables from a large set of independent variables that will adequately predict the dependent variable. This method is useful in that it can eliminate some unimportant independent variables from consideration.

12.142 a.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

b.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2$

12.143 Even though SSE = 0, we cannot estimate $\sigma^2$ because there are no degrees of freedom corresponding to error. With three data points, there are only two degrees of freedom available. The degrees of freedom corresponding to the model is $k = 2$ and the degrees of freedom corresponding to error is $df = n - (k+1) = 3 - (2+1) = 0$. Without an estimate for $\sigma^2$, no inferences can be made.

12.144 a.

$H_a$: At least one of $\beta_4$ and $\beta_5 \neq 0$

b.

The complete model, which contains the first-order terms in $x_1$, $x_2$, and $x_3$ plus the two interaction terms with coefficients $\beta_4$ and $\beta_5$, is fit to the 35 data points, yielding a sum of squares for error, denoted $SSE_C$. The reduced model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ is also fit to the data and its sum of squares for error is obtained, denoted $SSE_R$. Then the test statistic is

$F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k+1)]}$, where $k = 5$, $g = 3$, and $n = 35$.

c.

The numerator degrees of freedom is 𝑘 − 𝑔 = 5 − 3 = 2, and the denominator degrees of freedom is 𝑛– (𝑘 + 1) = 35– (5 + 1) = 29.

d.

The rejection region requires 𝛼 = .05 in the upper tail of the F-distribution with numerator df = 2 and denominator df = 29 . From Table VI, Appendix D, F.05 = 3.33 . The rejection region is 𝐹 > 3.33.
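The nested-model F-test in parts b through d can be sketched in Python as follows. The degrees of freedom and the critical value 3.33 come from the solution above; the two SSE values are hypothetical, chosen only to make the arithmetic easy to follow, and the function name `nested_f` is an illustrative choice.

```python
def nested_f(sse_reduced, sse_complete, k, g, n):
    """Nested-model F statistic with its numerator and denominator df.

    k = number of beta parameters (excluding beta_0) in the complete model
    g = number of beta parameters (excluding beta_0) in the reduced model
    n = sample size
    """
    df_num = k - g
    df_den = n - (k + 1)
    f_stat = ((sse_reduced - sse_complete) / df_num) / (sse_complete / df_den)
    return f_stat, df_num, df_den

# Hypothetical SSE values (not from the exercise); k, g, n are from part b.
f_stat, df_num, df_den = nested_f(sse_reduced=160.0, sse_complete=116.0,
                                  k=5, g=3, n=35)

F_CRITICAL = 3.33  # F.05 with 2 and 29 df, quoted in part d
reject_h0 = f_stat > F_CRITICAL
print(f_stat, df_num, df_den, reject_h0)
```

With these made-up sums of squares the statistic is 5.5 on 2 and 29 degrees of freedom, which would fall in the rejection region.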

12.145 a.

A confidence interval for the difference of two population means, $(\mu_1 - \mu_2)$, could be used. Since both sample sizes are over 30, the large sample confidence interval is used (with independent samples).

b.

Let $x_1 = 1$ if public college, 0 otherwise.

The model is $E(y) = \beta_0 + \beta_1 x_1$.

c.

$\beta_1$ is the difference between the two population means. A point estimate for $\beta_1$ is $\hat{\beta}_1$. A confidence interval for $\beta_1$ could be used to estimate the difference in the two population means.

12.146 a.

1. The "quantitative GMAT score" is measured on a numerical scale, so it is a quantitative variable.
2. The "verbal GMAT score" is measured on a numerical scale, so it is a quantitative variable.
3. The "undergraduate GPA" is measured on a numerical scale, so it is a quantitative variable.
4. The "first-year graduate GPA" is measured on a numerical scale, so it is a quantitative variable.
5. The "student cohort" has 3 categories, so it is a qualitative variable. Note that the numerical scale is meaningless in this situation. (It is possible to consider this as a quantitative variable. However, for this problem we will consider it as qualitative.)

b.

The quantitative variables quantitative GMAT score, verbal GMAT score, undergraduate GPA, and first-year graduate GPA should all be positively correlated to final GPA.

c.

$x_5 = 1$ if student entered doctoral program in year 3, 0 otherwise
$x_6 = 1$ if student entered doctoral program in year 5, 0 otherwise

d.

E ( y ) = β 0 + β 1 x1 + β 2 x 2 + β 3 x 3 + β 4 x 4 + β 5 x 5 + β 6 x 6

e.

β 0 = the y-intercept for students entering in year 1.

$\beta_1$: The mean final GPA will increase by $\beta_1$ for each additional point increase in quantitative GMAT score, holding the remaining variables constant.
$\beta_2$: The mean final GPA will increase by $\beta_2$ for each additional point increase in verbal GMAT score, holding the remaining variables constant.
$\beta_3$: The mean final GPA will increase by $\beta_3$ for each additional point increase in undergraduate GPA, holding the remaining variables constant.
$\beta_4$: The mean final GPA will increase by $\beta_4$ for each additional point increase in first-year graduate GPA, holding the remaining variables constant.
$\beta_5$ = difference in mean final GPA between student cohort year 3 and year 1.
$\beta_6$ = difference in mean final GPA between student cohort year 5 and year 1.

f.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_1 x_5 + \beta_8 x_1 x_6 + \beta_9 x_2 x_5 + \beta_{10} x_2 x_6 + \beta_{11} x_3 x_5 + \beta_{12} x_3 x_6 + \beta_{13} x_4 x_5 + \beta_{14} x_4 x_6$

g.

For the year 1 cohort, $x_5 = x_6 = 0$. The model is:

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5(0) + \beta_6(0) + \beta_7 x_1(0) + \beta_8 x_1(0) + \beta_9 x_2(0) + \beta_{10} x_2(0) + \beta_{11} x_3(0) + \beta_{12} x_3(0) + \beta_{13} x_4(0) + \beta_{14} x_4(0)$
$= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

The slopes for the four variables are $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$, respectively.

12.147 a.

The type of juice extractor is qualitative. The size of the orange is quantitative.

b.

The model is $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ where $x_1$ = diameter of orange and $x_2 = 1$ if Brand B, 0 if not.

c.

To allow the lines to differ, the interaction term is added: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

d.

For part b:

For part c:

e.

To determine whether the model in part c provides more information for predicting yield than does the model in part b, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

f.

The test statistic would be $F = \dfrac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k+1)]}$.

To compute $SSE_R$: The model in part b is fit and $SSE_R$ is the sum of squares for error.
To compute $SSE_C$: The model in part c is fit and $SSE_C$ is the sum of squares for error.
$k - g = 3 - 2 = 1$ = number of parameters in $H_0$
$n - (k+1)$ = degrees of freedom for error in the complete model

12.148 a.

The least squares prediction equation is: $\hat{y} = 1.81231 + 0.10875x_1 + 0.00017x_2$

b.

$\hat{\beta}_0 = 1.81231$. Since $x_1 = 0$ and $x_2 = 0$ are not in the observed range, $\hat{\beta}_0$ has no meaning.
$\hat{\beta}_1 = 0.10875$. For each additional mile of roadway length, the mean number of crashes per three years is estimated to increase by .10875 when average annual daily traffic is held constant.
$\hat{\beta}_2 = 0.00017$. For each additional unit increase in average annual daily traffic, the mean number of crashes per three years is estimated to increase by .00017 when miles of roadway length is held constant.

c.

For confidence coefficient .99, $\alpha = .01$ and $\alpha/2 = .01/2 = .005$. From Table III, Appendix D, with $df = n - (k+1) = 100 - (2+1) = 97$, $t_{.005} = 2.63$. The 99% confidence interval is:

$\hat{\beta}_1 \pm t_{.005}\, s_{\hat{\beta}_1} \Rightarrow 0.10875 \pm 2.63(0.03166) \Rightarrow 0.10875 \pm 0.08327 \Rightarrow (0.02548, 0.19202)$

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.02548 and 0.19202 for each additional mile of roadway length, holding average annual daily traffic constant.

d.

The 99% confidence interval is:

$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow 0.00017 \pm 2.63(0.00003) \Rightarrow 0.00017 \pm 0.00008 \Rightarrow (0.00009, 0.00025)$

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.00009 and 0.00025 for each additional unit increase in average annual daily traffic, holding mile of roadway length constant.

e.

The least squares prediction equation is: $\hat{y} = 1.20785 + 0.06343x_1 + 0.00056x_2$

$\hat{\beta}_0 = 1.20785$. Since $x_1 = 0$ and $x_2 = 0$ are not in the observed range, $\hat{\beta}_0$ has no meaning.
$\hat{\beta}_1 = 0.06343$. For each additional mile of roadway length, the mean number of crashes per three years is estimated to increase by 0.06343 when average annual daily traffic is held constant.
$\hat{\beta}_2 = 0.00056$. For each additional unit increase in average annual daily traffic, the mean number of crashes per three years is estimated to increase by 0.00056 when miles of roadway length is held constant.

The 99% confidence interval is:

$\hat{\beta}_1 \pm t_{.005}\, s_{\hat{\beta}_1} \Rightarrow 0.06343 \pm 2.63(0.01809) \Rightarrow 0.06343 \pm 0.04758 \Rightarrow (0.01585, 0.11101)$

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.01585 and 0.11101 for each additional mile of roadway length, holding average annual daily traffic constant.

The 99% confidence interval is:

$\hat{\beta}_2 \pm t_{.005}\, s_{\hat{\beta}_2} \Rightarrow 0.00056 \pm 2.63(0.00012) \Rightarrow 0.00056 \pm 0.00032 \Rightarrow (0.00024, 0.00088)$

We are 99% confident that the increase in the mean number of crashes per three years will be between 0.00024 and 0.00088 for each additional unit increase in average annual daily traffic, holding mile of roadway length constant.

f.

The 1st-order model would be $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3$, where $x_3 = 1$ if segment is Interstate, 0 otherwise.
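The four 99% confidence intervals in parts c through e of Exercise 12.148 all use the same recipe, estimate ± t × standard error. A minimal Python sketch of that recipe, using the estimates, standard errors, and table value $t_{.005} = 2.63$ quoted in the solution (the helper name `coef_ci` and the dictionary labels are illustrative only):

```python
def coef_ci(estimate, std_error, t_crit):
    """Confidence interval for one beta: estimate +/- t * standard error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

T_005 = 2.63  # table value for df = 97, as quoted in part c

# (estimate, standard error) pairs quoted in parts c, d, and e
coefficients = {
    "parts c-d, beta1": (0.10875, 0.03166),
    "parts c-d, beta2": (0.00017, 0.00003),
    "part e, beta1": (0.06343, 0.01809),
    "part e, beta2": (0.00056, 0.00012),
}

intervals = {
    name: tuple(round(bound, 5) for bound in coef_ci(est, se, T_005))
    for name, (est, se) in coefficients.items()
}
for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.5f}, {high:.5f})")
```

None of the intervals contains 0, which is consistent with both roadway length and average annual daily traffic contributing to the prediction of crash frequency in each model.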

12.149 a.

Let $x_1 = 1$ if grape-picking method is manual, 0 otherwise

Let $x_2 = 1$ if soil type is clay, 0 otherwise

Let $x_3 = 1$ if soil type is gravel, 0 otherwise

Let $x_4 = 1$ if slope orientation is East, 0 otherwise

Let $x_5 = 1$ if slope orientation is South, 0 otherwise

Let $x_6 = 1$ if slope orientation is West, 0 otherwise

Let $x_7 = 1$ if slope orientation is Southeast, 0 otherwise

b.

The model is: $E(y) = \beta_0 + \beta_1 x_1$
$\beta_0$ = mean wine quality for grape-picking method automated
$\beta_1$ = difference in mean wine quality between grape-picking methods manual and automated

c.

The model is: $E(y) = \beta_0 + \beta_1 x_2 + \beta_2 x_3$
$\beta_0$ = mean wine quality for soil type sand
$\beta_1$ = difference in mean wine quality between soil types clay and sand
$\beta_2$ = difference in mean wine quality between soil types gravel and sand

d.

The model is: $E(y) = \beta_0 + \beta_1 x_4 + \beta_2 x_5 + \beta_3 x_6 + \beta_4 x_7$
$\beta_0$ = mean wine quality for slope orientation Southwest
$\beta_1$ = difference in mean wine quality between slope orientations East and Southwest
$\beta_2$ = difference in mean wine quality between slope orientations South and Southwest
$\beta_3$ = difference in mean wine quality between slope orientations West and Southwest
$\beta_4$ = difference in mean wine quality between slope orientations Southeast and Southwest

12.150 a.

To determine if the overall model is useful for predicting y, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 226.35$ and the p-value is $p < .001$. Since the p-value is less than $\alpha$ ($p < .001 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the overall model is useful for predicting y, willingness of the consumer to shop at a retailer’s store in the future, at $\alpha = .05$.

b.

To determine if consumer satisfaction and retailer interest interact to affect willingness to shop at a retailer’s shop in the future, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

The test statistic is $t = -3.09$ and the p-value is $p < .01$. Since the p-value is less than $\alpha$ ($p < .01 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate consumer satisfaction and retailer interest interact to affect willingness to shop at a retailer’s shop in the future at $\alpha = .05$.

c.

When $x_2 = 1$,
$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1 x_2 = \hat{\beta}_0 + .426x_1 + .044(1) - .157x_1(1) = \hat{\beta}_0 + .044 + (.426 - .157)x_1 = \hat{\beta}_0 + .044 + .269x_1$

Since no value is given for $\hat{\beta}_0$, we will use $\hat{\beta}_0 = 1$ for graphing purposes. Using MINITAB, a graph might look like:

[Figure: Scatterplot of YHAT vs X1 when X2=1]

d.

When $x_2 = 7$,
$\hat{y} = \hat{\beta}_0 + .426x_1 + .044x_2 - .157x_1 x_2 = \hat{\beta}_0 + .426x_1 + .044(7) - .157x_1(7) = \hat{\beta}_0 + .308 + (.426 - 1.099)x_1 = \hat{\beta}_0 + .308 - .673x_1$

Since no value is given for $\hat{\beta}_0$, we will again use $\hat{\beta}_0 = 1$ for graphing purposes. Using MINITAB, a graph might look like:

[Figure: Scatterplot of YHAT vs X1 when X2=7]

e.

Using MINITAB, both plots on the same graph would be:

[Figure: Scatterplot of YHAT vs X1, with separate lines for X2 = 1 and X2 = 7]

Since the lines are not parallel, it indicates that interaction is present.
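The two lines in Exercise 12.150 parts c and d can be recomputed directly from the estimated coefficients. The sketch below uses the coefficient values quoted in the solution and the same placeholder intercept of 1 that the solution adopts for graphing; the function name `line_in_x1` is an illustrative choice.

```python
# Estimated coefficients quoted in parts c and d
B1, B2, B3 = 0.426, 0.044, -0.157

def line_in_x1(x2, b0=1.0):
    """Intercept and slope of y-hat as a function of x1 for a fixed x2.

    b0 = 1.0 is the placeholder intercept used for graphing in the solution,
    not an estimate reported in the exercise.
    """
    intercept = b0 + B2 * x2
    slope = B1 + B3 * x2
    return round(intercept, 3), round(slope, 3)

line_x2_1 = line_in_x1(1)
line_x2_7 = line_in_x1(7)
print(line_x2_1, line_x2_7)
```

The slope in $x_1$ changes from .269 when $x_2 = 1$ to −.673 when $x_2 = 7$, which is exactly the non-parallel pattern that signals interaction.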

12.151 a.

$\hat{\beta}_0 = -.0304$. Since $x_1 = 0$ and $x_2 = 0$ would not be in the observed range, this is simply the y-intercept.

$\hat{\beta}_1 = 2.006$. For each unit increase in the proportion of block with low-density residential areas, the mean population density is estimated to increase by 2.006, holding proportion of block with high-density residential areas constant. Since $x_1$ is a proportion, it is unlikely that it can increase by one unit. A better interpretation is: For each increase of .1 in the proportion of block with low-density residential areas, the mean population density is estimated to increase by .2006, holding proportion of block with high-density residential areas constant.

$\hat{\beta}_2 = 5.006$. For each unit increase in the proportion of block with high-density residential areas, the mean population density is estimated to increase by 5.006, holding proportion of block with low-density residential areas constant. Since $x_2$ is a proportion, it is unlikely that it can increase by one unit. A better interpretation is: For each increase of .1 in the proportion of block with high-density residential areas, the mean population density is estimated to increase by .5006, holding proportion of block with low-density residential areas constant.

b.

$R^2 = .686$. 68.6% of the total sample variation of the population densities is explained by the linear relationship between population density and the independent variables proportion of block with low-density residential areas and proportion of block with high-density residential areas.

c.

To determine if the overall model is adequate, we test:
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: At least one $\beta_i \neq 0$

d.

The test statistic is $F = \dfrac{R^2/k}{(1 - R^2)/[n - (k+1)]} = \dfrac{.686/2}{(1 - .686)/[125 - (2+1)]} = 133.27$.

e.

The rejection region requires $\alpha = .01$ in the upper tail of the F-distribution with $\nu_1 = k = 2$ and $\nu_2 = n - (k+1) = 125 - (2+1) = 122$. From Table VIII, Appendix D, $F_{.01} \approx 4.79$. The rejection region is $F > 4.79$.

Since the observed value of the test statistic falls in the rejection region ($F = 133.27 > 4.79$), $H_0$ is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .01$.

12.152 a.

Using MINITAB, a scattergram of the data is:

[Figure: Scatterplot of y vs x]

b.

If information were available only for x = 30, 31, 32, and 33, we would suggest a first-order model where $\beta_1 > 0$. If information were available only for x = 33, 34, 35, and 36, we would again suggest a first-order model where $\beta_1 < 0$. If all the information was available, we would suggest a second-order model.

12.153 a.

To determine if the model is adequate, we test:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_{12} = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 26.9$. Using MINITAB with $\nu_1 = k = 12$ and $\nu_2 = n - (k+1) = 148 - (12+1) = 135$,

Cumulative Distribution Function
F distribution with 12 DF in numerator and 135 DF in denominator

   x   P( X <= x )
26.9   1

The p-value associated with $F = 26.9$ is $p = 1 - 1 = 0$. Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate the model is adequate for any reasonable value of $\alpha$.

$R^2 = .705$. 70.5% of the total variation of the natural logarithm of card prices is explained by the model with the 12 variables in the model.

$R_a^2 = .681$. 68.1% of the total variation of the natural logarithm of card prices is explained by the model with the 12 variables in the model, adjusting for the sample size and the number of variables in the model. Since these $R^2$ values are fairly large, it indicates that the model is pretty good.

b.

To determine if race contributes to the price, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = -1.014$ and the p-value is $p = .312$. Since the p-value is so large, $H_0$ is not rejected. There is insufficient evidence to indicate race has an impact on the value of professional football players’ rookie cards for any reasonable value of $\alpha$, holding the other variables constant.

c.

To determine if card vintage contributes to the price, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

The test statistic is $t = -10.92$ and the p-value is $p = .000$. Since the p-value is so small, $H_0$ is rejected. There is sufficient evidence to indicate card vintage has an impact on the value of professional football players’ rookie cards for any reasonable value of $\alpha$, holding the other variables constant.

d.

The first order model is:
$E(y) = \beta_0 + \beta_1 x_3 + \beta_2 x_5 + \beta_3 x_6 + \beta_4 x_7 + \beta_5 x_8 + \beta_6 x_9 + \beta_7 x_{10} + \beta_8 x_{11} + \beta_9 x_{12} + \beta_{10} x_5 x_3 + \beta_{11} x_6 x_3 + \beta_{12} x_7 x_3 + \beta_{13} x_8 x_3 + \beta_{14} x_9 x_3 + \beta_{15} x_{10} x_3 + \beta_{16} x_{11} x_3 + \beta_{17} x_{12} x_3$
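As a cross-check on part a of Exercise 12.153, the global F statistic can be recovered from $R^2$, the number of predictors, and the sample size. The sketch below assumes the usual identity $F = (R^2/k)/\{(1-R^2)/[n-(k+1)]\}$; the solution itself only reports $F = 26.9$, so the agreement is a consistency check and not a statement of how the value was originally computed. The function name `f_from_r2` is illustrative.

```python
def f_from_r2(r_squared, k, n):
    """Global F statistic computed from R^2, k predictors, and n observations."""
    return (r_squared / k) / ((1 - r_squared) / (n - (k + 1)))

# Values quoted in part a: R^2 = .705, k = 12, n = 148
f_stat = f_from_r2(0.705, k=12, n=148)
print(round(f_stat, 1))
```

Rounded to one decimal place this agrees with the reported test statistic, with 12 and 135 degrees of freedom.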

12.154 a.

In the first step, there are 8 one-variable models fit to the data.

b.

The “best” one-variable model is the model that contains the one variable with the largest absolute value of the t-statistic. This would also correspond to the one variable with the smallest p-value.

c.

In step 2, there would be 7 two-variable models fit to the data.

d.

The estimated coefficient for company role of estimator is −.28. The mean relative error for developers is estimated to be .28 lower than the mean relative error for project leaders, holding previous accuracy constant.

The estimated coefficient for previous accuracy is .27. The mean relative error for previous accuracy more than 20% is estimated to be .27 higher than the mean relative error for previous accuracy less than 20%, holding company role of estimator constant.

e.

There are a couple of reasons for being wary of using this model as the final model. First, in stepwise regression, once a variable is in the model, it cannot be dropped. The best one-variable model might contain one variable, say $x_1$, while the best overall model contains two other variables, say $x_2$ and $x_3$. By including $x_1$ in the model, we may never get to the best model. Another reason to be wary is that we have not considered any 2nd-order terms in the model or any interactions. These higher-order terms might be very important in the model.

f.

It is possible that company role of estimator and previous accuracy could be correlated with each other. This could indicate multicollinearity may be present.

12.155 a.

In Step 1, all one-variable models are fit to the data. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_i$
Since there are 7 independent variables, 7 models are fit. (Note: There are actually only 6 independent variables. One of the qualitative variables has three levels and thus two dummy variables. Some statistical packages will allow one to bunch these two variables together so that they are either both in or both out. In this answer, we are assuming that each $x_i$ stands by itself.)

b.

In Step 2, all two-variable models are fit to the data, where the variable selected in Step 1, say $x_1$, is one of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_i$
Since there are 6 independent variables remaining, 6 models are fit.

c.

In Step 3, all three-variable models are fit to the data, where the variables selected in Step 2, say $x_1$ and $x_2$, are two of the variables. These models are of the form:
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_i$
Since there are 5 independent variables remaining, 5 models are fit.

d.

The procedure stops adding independent variables when none of the remaining variables, when added to the model, have a p-value less than some predetermined value. This predetermined value is usually 𝛼 = .05.
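The counting in parts a through c and the stopping rule in part d can be illustrated with a small forward-selection loop. This is a simplified sketch, not MINITAB's stepwise procedure: it only adds variables and never re-tests those already entered. The p-values in `HYPOTHETICAL_P` are made up for illustration and, for simplicity, do not depend on which variables are already in the model.

```python
def forward_selection(candidates, p_value_of, alpha=0.05):
    """Add one variable per step until no candidate has p-value < alpha.

    Returns the selected variables and the number of models fit at each step.
    """
    selected, remaining, fits_per_step = [], list(candidates), []
    while remaining:
        fits_per_step.append(len(remaining))  # one model per remaining variable
        best = min(remaining, key=lambda v: p_value_of(selected, v))
        if p_value_of(selected, best) >= alpha:
            break  # stopping rule from part d
        selected.append(best)
        remaining.remove(best)
    return selected, fits_per_step

# Hypothetical p-values for 7 candidate variables (not from the exercise)
HYPOTHETICAL_P = {"x1": 0.001, "x2": 0.01, "x3": 0.03, "x4": 0.20,
                  "x5": 0.40, "x6": 0.60, "x7": 0.80}

def p_value_of(selected, variable):
    return HYPOTHETICAL_P[variable]

selected, fits_per_step = forward_selection(HYPOTHETICAL_P, p_value_of)
print(selected, fits_per_step)
```

With 7 candidates the loop fits 7, then 6, then 5 models, matching Steps 1 through 3, and stops at the fourth step because no remaining variable has a p-value below .05.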

e.

Two major drawbacks to using the final stepwise model as the "best" model are:

(1) An extremely large number of single β parameter t-tests have been conducted. Thus, the probability is very high that one or more errors have been made in including or excluding variables.

(2) Often the variables selected to be included in a stepwise regression do not include the high-order terms. Consequently, we may have initially omitted several important terms from the model.

12.156 CEO income ($x_1$) and stock percentage ($x_2$) are said to interact if the effect of one variable, say CEO income, on the dependent variable profit (y) depends on the level of the second variable, stock percentage.

12.157 a.

To determine if the model is useful, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$

From the problem, the test statistic is $F = 4.74$ and the p-value is $p < .01$. Since the p-value is less than $\alpha$ ($p < .01 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the model is useful for predicting accountants’ Mach scores at $\alpha = .05$.

b.

$R^2 = .13$. 13% of the total sample variation of the accountants’ Mach scores around their mean is explained by the model containing age, gender, education, and income.

c.

To determine if income is a useful predictor of Mach score, we test:
$H_0: \beta_4 = 0$
$H_a: \beta_4 \neq 0$

From the printout, $t = 0.52$ and the p-value is $p > .10$. Since the p-value is not less than $\alpha$ ($p > .10 \not< .05$), $H_0$ is not rejected. There is insufficient evidence to indicate that income is a useful predictor of Mach score adjusted for age, gender, and education at $\alpha = .05$.

12.158 a.

The MINITAB output is:

Regression Analysis: Sales versus x1, x2, x1x2

The regression equation is
Sales = 1333 - 0.151 x1 - 2.63 x2 + 0.0520 x1x2

Predictor      Coef   SE Coef      T      P
Constant     1333.2     291.0   4.58  0.000
x1          -0.1512    0.3786  -0.40  0.695
x2           -2.625     5.346  -0.49  0.630
x1x2       0.051954  0.006864   7.57  0.000

S = 188.551   R-Sq = 97.8%   R-Sq(adj) = 97.4%

Analysis of Variance

Source          DF        SS       MS       F      P
Regression       3  25784705  8594902  241.76  0.000
Residual Error  16    568826    35552
Total           19  26353531

Source  DF    Seq SS
x1       1  15489879
x2       1   8257939
x1x2     1   2036887

The fitted model is $\hat{y} = 1333.2 - .1512x_1 - 2.625x_2 + .051954x_1 x_2$.

b.

To determine if the overall model is useful, we test:
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = \dfrac{MSR}{MSE} = \dfrac{8{,}594{,}902}{35{,}552} = 241.76$ and the p-value is $p = .000$.

Since the p-value is less than $\alpha$ ($p = .000 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$.

c.

To determine if the interaction is present, we test:
$H_0: \beta_3 = 0$
$H_a: \beta_3 \neq 0$

The test statistic is $t = \dfrac{\hat{\beta}_3}{s_{\hat{\beta}_3}} = 7.57$ and the p-value is $p = .000$.

Since the p-value is less than $\alpha$ ($p = .000 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the interaction between advertising expenditure and shelf space is present at $\alpha = .05$.

d.

Advertising expenditure and shelf space are said to interact if the effect of advertising expenditure on sales is different at different levels of shelf space.
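The summary statistics quoted in parts b and c of Exercise 12.158 can be reproduced from the sums of squares, degrees of freedom, and the interaction coefficient row in the MINITAB printout of part a. The following Python sketch uses only those printed numbers; the variable names are illustrative.

```python
# Values read from the ANOVA table in the MINITAB printout
ss_regression, ss_error, ss_total = 25784705, 568826, 26353531
df_regression, df_error = 3, 16

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_stat = ms_regression / ms_error          # global F test (part b)
r_squared = ss_regression / ss_total       # R-Sq
s = ms_error ** 0.5                        # S, the estimated standard deviation

# Interaction term x1x2: coefficient and its standard error (part c)
t_interaction = 0.051954 / 0.006864

print(round(f_stat, 2), round(100 * r_squared, 1),
      round(s, 3), round(t_interaction, 2))
```

Each rounded value matches the corresponding entry in the printout, which is a quick way to confirm that the flattened table has been read correctly.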

e.

If a first-order model was used, the effect of advertising expenditure on sales would be the same regardless of the amount of shelf space. If interaction really exists, the effect of advertising expenditure on sales would depend on which level of shelf space was present.

f.

Since the data were collected sequentially, it is fairly unlikely that the error terms are independent.

12.159 a.

Not necessarily. If Nickel was highly correlated with several other variables, then it might be better to keep Nickel and drop some of the other highly correlated variables.

b.

Using stepwise regression is a good start for selecting the best set of predictor variables. However, one should use caution when looking at the model selected using stepwise regression. Sometimes important variables are not selected to be entered into the model. Also, many t-tests have been run, thus inflating the Type I and Type II error rates. One must also consider using higher order terms in the model and interaction terms.

c.

No, further exploration should be used. One should consider using higher order terms for the variables (i.e. squared terms) and also interaction terms.

12.160 a.

Using MINITAB, a sketch of the least squares prediction equation is:

[Figure: Scatterplot of yhat vs Dose]

b.

For x = 500, ŷ = 10.25 + .0053(500) − .0000266(500)² = 10.25 + 2.65 − 6.65 = 6.25

c.

For x = 0, ŷ = 10.25 + .0053(0) − .0000266(0)² = 10.25

d.

For x = 100, ŷ = 10.25 + .0053(100) − .0000266(100)² = 10.25 + .53 − .266 = 10.514. This value is slightly larger than that for the control group (10.25).

For x = 200, ŷ = 10.25 + .0053(200) − .0000266(200)² = 10.25 + 1.06 − 1.064 = 10.246. This value is slightly smaller than that for the control group (10.25).

So, the largest value of x which yields an estimated weight change that is closest to, but just less than, the estimated weight change for the control group is x = 200.
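The substitutions in parts b through d can be checked with a few lines of code. This is a minimal sketch using the coefficients quoted in the solution; the function name is hypothetical.

```python
def predicted_weight_change(dose: float) -> float:
    """Second-order prediction equation quoted in the solution:
    yhat = 10.25 + .0053*dose - .0000266*dose^2."""
    return 10.25 + 0.0053 * dose - 0.0000266 * dose ** 2


# Evaluate at the control dose (0) and at the doses used in parts b and d.
for dose in (0, 100, 200, 500):
    print(dose, round(predicted_weight_change(dose), 3))
```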

12.161 a.

Using MINITAB, the scattergram is:

[Figure: scatterplot of PeakHour versus Hour24]

b.

Let x2 = 1 if I-35W, 0 if not.

The complete second-order model would be E(y) = β0 + β1x1 + β2x1² + β3x2 + β4x1x2 + β5x1²x2

c.

Using MINITAB, the printout is:

Regression Analysis: PeakHour versus x1, x1-sq, x2, x1x2, x1-sqx2

The regression equation is
PeakHour = 776 + 0.104 x1 - 0.000002 x1-sq + 232 x2 - 0.0091 x1x2 + 0.000000 x1-sqx2

Predictor   Coef          SE Coef      T       P
Constant    776.4         144.5        5.37    0.000
x1          0.10418       0.01388      7.50    0.000
x1-sq       -0.00000223   0.00000033   -6.73   0.000
x2          232           1094         0.21    0.833
x1x2        -0.00914      0.09829      -0.09   0.926
x1-sqx2     0.00000027    0.00000220   0.12    0.903

S = 15.5829   R-Sq = 97.2%   R-Sq(adj) = 97.0%

Analysis of Variance
Source           DF   SS       MS       F        P
Regression       5    555741   111148   457.73   0.000
Residual Error   66   16027    243
Total            71   571767

Source    DF   Seq SS
x1        1    254676
x1-sq     1    21495
x2        1    279383
x1x2      1    183
x1-sqx2   1    4

The fitted model is ŷ = 776 + .104x1 − .000002x1² + 232x2 − .0091x1x2 + .00000027x1²x2.

To determine if the curvilinear relationship is different at the two locations, we test:

H0: β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

In order to test this hypothesis, we must fit the reduced model E(y) = β0 + β1x1 + β2x1². Using MINITAB, the printout from fitting the reduced model is:

Regression Analysis: PeakHour versus x1, x1-sq

The regression equation is
PeakHour = 197 + 0.149 x1 - 0.000003 x1-sq

Predictor   Coef          SE Coef      T       P
Constant    197.5         578.9        0.34    0.734
x1          0.14921       0.05551      2.69    0.009
x1-sq       -0.00000295   0.00000132   -2.24   0.028

S = 65.4523   R-Sq = 48.3%   R-Sq(adj) = 46.8%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       2    276171   138085   32.23   0.000
Residual Error   69   295597   4284
Total            71   571767

Source   DF   Seq SS
x1       1    254676
x1-sq    1    21495

The fitted regression line is ŷ = 197 + .149x1 − .000003x1².

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(295,597 − 16,027)/(5 − 2)] / [16,027/(72 − (5 + 1))] = 383.76.
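The arithmetic behind this nested-model F statistic can be sketched as follows. The function name is hypothetical; the inputs are the residual sums of squares read from the two MINITAB printouts (295,597 for the reduced model, 16,027 for the complete model).

```python
def nested_f(sse_reduced: float, sse_complete: float, k: int, g: int, n: int) -> float:
    """F statistic for comparing a complete model (k predictors) with a
    reduced model (g predictors) fit to the same n observations."""
    numerator = (sse_reduced - sse_complete) / (k - g)   # drop in SSE per tested term
    denominator = sse_complete / (n - (k + 1))           # MSE of the complete model
    return numerator / denominator


f_stat = nested_f(sse_reduced=295597, sse_complete=16027, k=5, g=2, n=72)
print(round(f_stat, 2))
```

The same function applies to any complete/reduced pair, provided both models are fit to the same data and the reduced model's terms are a subset of the complete model's terms.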

Since no α was given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 72 − (5 + 1) = 66. From Table VI, Appendix D, F.05 ≈ 2.76. The rejection region is F > 2.76.

Since the observed value of the test statistic falls in the rejection region (F = 383.76 > 2.76), H0 is rejected. There is sufficient evidence to indicate the curvilinear relationship is different at the two locations at α = .05.

d.

Using MINITAB, the residual plots are:

[Figure: residual plots for PeakHour: normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, and residuals versus the order of the data]

From the plot of the standardized residuals versus the fitted value, we notice that there is only one point more than 2 standard deviations from the mean and no points that are more than 3 standard deviations from the mean. Thus, there do not appear to be any outliers. There is no curve to the residuals, so we have the appropriate model. In addition, there is no cone shape to the plot, so it appears that there is no problem with constant variance. The normal probability plot looks like a fairly straight line, so it appears that the assumption of normality is valid. Also, the histogram of the residuals is somewhat mound shaped.

12.162 a.

The model would be E(y) = β0 + β1x1 + β2x2 + β3x3.

b.

The model including the interaction terms is: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3

c.

For AL, x2 = x3 = 0. The model would be: E(y) = β0 + β1x1 + β2(0) + β3(0) + β4x1(0) + β5x1(0) = β0 + β1x1. The slope of the line is β1.

For TDS-3A, x2 = 1 and x3 = 0. The model would be: E(y) = β0 + β1x1 + β2(1) + β3(0) + β4x1(1) + β5x1(0) = (β0 + β2) + (β1 + β4)x1. The slope of the line is β1 + β4.

For FE, x2 = 0 and x3 = 1. The model would be: E(y) = β0 + β1x1 + β2(0) + β3(1) + β4x1(0) + β5x1(1) = (β0 + β3) + (β1 + β5)x1. The slope of the line is β1 + β5.

d.

To test for the presence of temperature-waste type interaction, we would fit the complete model listed in part b and the reduced model found in part a. The hypotheses would be:

H0: β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic would be F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))], where k = 5, g = 3, SSE_R is the SSE for the reduced model, and SSE_C is the SSE for the complete model.

12.163 a.

The model is E(y) = β0 + β1x1 + β2x2 + β3x3, where y = market share and

x1 = 1 if M, 0 otherwise
x2 = 1 if H, 0 otherwise
x3 = 1 if VH, 0 otherwise

We assume that the error terms (ε) or y's are normally distributed at each exposure level, with a common variance. Also, we assume the ε's have a mean of 0 and are independent.

b.

No interaction terms were included because we have only one independent variable, exposure level. Even though we have 3 xi's in the model, they are dummy variables and correspond to different levels of the one independent variable.

c.

Using MINITAB, the output is:

Regression Analysis: y versus x1, x2, x3

The regression equation is
y = 10.2 + 0.683 x1 + 2.02 x2 + 0.500 x3

Predictor   Coef      SE Coef   T       P
Constant    10.2333   0.1084    94.41   0.000
x1          0.6833    0.1533    4.46    0.000
x2          2.0167    0.1533    13.16   0.000
x3          0.5000    0.1533    3.26    0.004

S = 0.265518   R-Sq = 90.4%   R-Sq(adj) = 89.0%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       3    13.3433   4.4478   63.09   0.000
Residual Error   20   1.4100    0.0705
Total            23   14.7533

Source   DF   Seq SS
x1       1    0.1089
x2       1    12.4844
x3       1    0.7500

The fitted model is ŷ = 10.2 + .683x1 + 2.02x2 + .5x3.

d.

To determine if the firm's expected market share differs for different levels of advertising exposure, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 63.09 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .05), H0 is rejected. There is sufficient evidence to indicate the firm's expected market share differs for different levels of advertising exposure at α = .05.

12.164 a.

The 1st-order model is E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5.

b.

Using MINITAB, the results are:

Regression Analysis: HEATRATE versus RPM, INLET-TEMP, ...

The regression equation is
HEATRATE = 13614 + 0.0888 RPM - 9.20 INLET-TEMP + 14.4 EXH-TEMP + 0.4 CPRATIO - 0.848 AIRFLOW

Predictor    Coef      SE Coef   T       P
Constant     13614.5   870.0     15.65   0.000
RPM          0.08879   0.01391   6.38    0.000
INLET-TEMP   -9.201    1.499     -6.14   0.000
EXH-TEMP     14.394    3.461     4.16    0.000
CPRATIO      0.35      29.56     0.01    0.991
AIRFLOW      -0.8480   0.4421    -1.92   0.060

S = 458.828   R-Sq = 92.4%   R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS          MS         F        P
Regression       5    155055273   31011055   147.30   0.000
Residual Error   61   12841935    210524
Total            66   167897208

Source       DF   Seq SS
RPM          1    119598530
INLET-TEMP   1    26893467
EXH-TEMP     1    7784225
CPRATIO      1    4623
AIRFLOW      1    774427

The least squares prediction equation is:

ŷ = 13,614.5 + 0.08879x1 − 9.201x2 + 14.394x3 + 0.35x4 − 0.848x5

c.

β̂0 = 13,614.5. Since 0 is not within the range of all the independent variables, this value has no meaning.

β̂1 = 0.08879. For each unit increase in RPM, the mean heat rate is estimated to increase by .08879, holding the other 4 variables constant.

β̂2 = −9.201. For each unit increase in inlet temperature, the mean heat rate is estimated to decrease by 9.201, holding the other 4 variables constant.

β̂3 = 14.394. For each unit increase in exhaust temperature, the mean heat rate is estimated to increase by 14.394, holding the other 4 variables constant.

β̂4 = 0.35. For each unit increase in cycle pressure ratio, the mean heat rate is estimated to increase by 0.35, holding the other 4 variables constant.

β̂5 = −0.8480. For each unit increase in air flow rate, the mean heat rate is estimated to decrease by .848, holding the other 4 variables constant.

d.

From the printout, 𝑠 = 458.828. We would expect to see most of the heat rate values within 2𝑠 = 2(458.828) = 917.656 units of the least squares line.

e.

To determine if at least one of the variables is useful in predicting the heat rate values, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 147.30 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate at least one of the variables is useful in predicting the heat rate values at α = .01.

f.

Adjusted R² = R-Sq(adj) = .917. 91.7% of the total sample variation of the heat rate values is explained by the model containing the 5 independent variables, adjusted for the number of variables and the sample size.

g.

To determine if there is evidence to indicate heat rate is linearly related to inlet temperature, we test:

H0: β2 = 0
Ha: β2 ≠ 0

The test statistic is t = −6.14 and the p-value is p = .000. Since the p-value is less than α (p = .000 < .01), H0 is rejected. There is sufficient evidence to indicate heat rate is linearly related to inlet temperature, adjusted for the other 4 variables, at α = .01.

h.

The MINITAB printout is:

Regression Equation
HEATRATE = 13614 + 0.0888 RPM - 9.20 INLET-TEMP + 14.39 EXH-TEMP + 0.4 CPRATIO - 0.848 AIRFLOW

Settings
Variable     Setting
RPM          7500
INLET-TEMP   1000
EXH-TEMP     525
CPRATIO      13.5
AIRFLOW      10

Prediction
Fit       SE Fit    95% CI                 95% PI
12632.5   237.342   (12157.9, 13107.1)     (11599.6, 13665.5)

The 95% prediction interval is (11,599.6, 13,665.5). We are 95% confident that the actual heat rate will be between 11,599.6 and 13,665.5 when the RPM is 7,500, the inlet temperature is 1,000, the exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.

i.

The 95% confidence interval is (12,157.9, 13,107.1). We are 95% confident that the mean heat rate will be between 12,157.9 and 13,107.1 when the RPM is 7,500, the inlet temperature is 1,000, the exhaust temperature is 525, the cycle pressure ratio is 13.5 and the air flow rate is 10.

j.

Yes. The confidence interval for the mean will always be smaller than the prediction interval for the actual value. This is because there are 2 error terms involved in predicting an actual value and only one error term involved in estimating the mean. First, we have the error in locating the mean of the distribution. Once the mean is located, the actual value can still vary around the mean, thus, the second error. There is only one error term involved when estimating the mean, which is the error in locating the mean.
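The two sources of error described above can be seen numerically in the printout from part h. This is a minimal sketch under two assumptions: the critical t value is recovered from the printed confidence interval rather than looked up in a table, and the standard error of prediction is taken as the usual sqrt(s² + SE_fit²). Variable names are hypothetical.

```python
import math

# Values read from the MINITAB printouts (parts b and h).
fit, se_fit, s = 12632.5, 237.342, 458.828
ci_low, ci_high = 12157.9, 13107.1

# Recover the critical t value from the printed CI half-width (assumption:
# the CI is fit +/- t * SE_fit, so t = half-width / SE_fit).
t_crit = (ci_high - ci_low) / 2 / se_fit

# Predicting an individual value adds the error variance s^2 to the
# variance of the estimated mean.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

pi_low = fit - t_crit * se_pred
pi_high = fit + t_crit * se_pred
print(round(pi_low, 1), round(pi_high, 1))
```

The resulting limits agree with the printed prediction interval (11599.6, 13665.5) to within rounding, and se_pred is necessarily larger than se_fit, which is why the prediction interval is always the wider of the two.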

k.

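The model that incorporates the researchers’ theories is: E(y) = β0 + β1(inlet temperature) + β2(exhaust temperature) + β3(air flow rate) + β4(inlet temperature)(air flow rate) + β5(exhaust temperature)(air flow rate)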

l.

Using MINITAB, the results of fitting the model are:

Regression Analysis: HEATRATE versus INLET-TEMP, EXH-TEMP, ...

The regression equation is
HEATRATE = 13945 - 15.1 INLET-TEMP + 28.8 EXH-TEMP - 0.69 AIRFLOW + 0.0228 IT_AFR - 0.0543 ET_AFR

Predictor    Coef       SE Coef    T        P
Constant     13945      1044       13.35    0.000
INLET-TEMP   -15.1379   0.7775     -19.47   0.000
EXH-TEMP     28.843     2.304      12.52    0.000
AIRFLOW      -0.689     3.628      -0.19    0.850
IT_AFR       0.022770   0.002999   7.59     0.000
ET_AFR       -0.05430   0.01053    -5.16    0.000

S = 425.072   R-Sq = 93.4%   R-Sq(adj) = 92.9%

Analysis of Variance
Source           DF   SS          MS         F        P
Regression       5    156875371   31375074   173.64   0.000
Residual Error   61   11021838    180686
Total            66   167897208

The least squares prediction equation is:

ŷ = 13,945 − 15.1379(INLET-TEMP) + 28.843(EXH-TEMP) − 0.689(AIRFLOW) + 0.02277(INLET-TEMP)(AIRFLOW) − 0.0543(EXH-TEMP)(AIRFLOW)

m.

To determine if inlet temperature and air flow rate interact to affect heat rate, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = 7.59 with a p-value of p = 0.000. Since the p-value is less than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate that inlet temperature and air flow rate interact to affect heat rate at α = .05.

n.

To determine if exhaust temperature and air flow rate interact to affect heat rate, we test:

H0: β5 = 0
Ha: β5 ≠ 0

The test statistic is t = −5.16 with a p-value of p = 0.000. Since the p-value is less than α (p = 0.000 < .05), H0 is rejected. There is sufficient evidence to indicate that exhaust temperature and air flow rate interact to affect heat rate at α = .05.

o.

Since the interaction of inlet temperature and air flow rate is significant, the effect of inlet temperature on the heat rate depends on the level of air flow rate. Likewise, since the interaction of exhaust temperature and air flow rate is significant, the effect of exhaust temperature on the heat rate also depends on the level of air flow rate.
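What "depends on the level of air flow rate" means can be sketched from the fitted interaction model. The coefficients below come from the MINITAB printout in part l; the function names are hypothetical, and the second air flow value (300) is chosen only for illustration and is not taken from the data.

```python
def inlet_temp_slope(airflow: float) -> float:
    """Estimated change in mean heat rate per unit of inlet temperature,
    at a given air flow rate: b1 + b4 * airflow."""
    return -15.1379 + 0.022770 * airflow


def exhaust_temp_slope(airflow: float) -> float:
    """Estimated change in mean heat rate per unit of exhaust temperature,
    at a given air flow rate: b2 + b5 * airflow."""
    return 28.843 - 0.05430 * airflow


# 10 is the air flow setting used in part h; 300 is an illustrative value.
for airflow in (10, 300):
    print(airflow,
          round(inlet_temp_slope(airflow), 4),
          round(exhaust_temp_slope(airflow), 4))
```

In a model without interaction terms both slopes would be constants; here each slope is itself a linear function of air flow rate.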

p.

Let x1 = cycle speed and x2 = cycle pressure ratio. A complete second-order model is: E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2

q.

To determine whether the curvature terms in the complete second-order model are useful for predicting heat rate, we test:

H0: β3 = β4 = 0
Ha: At least one βi ≠ 0

r.

The complete model is E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2

The reduced model is E(y) = β0 + β1x1 + β2x2 + β5x1x2

s.

A printout for the test is shown below (we use the bottom row for the test desired):

Best Subset Regression Models for HEATRATE

Forced Independent Variables: (A)RPM (B)CPRATIO (C)RPMxCPR
Unforced Independent Variables: (D)RPMSQ (E)CPRSQ

P   Cp     Adjusted R Square   AICc     Resid SS    F       P(F)     Model Variables
4   20.7   0.8421              871.40   2.531E+07                    A B C
5   4.0    0.8772              855.90   1.937E+07   19.00   0.0001   A B C E
5   20.7   0.8436              872.11   2.467E+07   1.60    0.2105   A B C D
6   6.0    0.8752              858.39   1.937E+07   9.35    0.0003   A B C D E

Cases Included 67   Missing Cases 0

From the printout, the test statistic is F = 9.35 and the p-value is p = .0003. Since the p-value is less than α (p = .0003 < .10), H0 is rejected. There is sufficient evidence to indicate at least one of the curvature terms is useful for predicting heat rate at α = .10.

t.

Using MINITAB, the results of the regression are:

Regression Analysis: HEATRATE versus RPM, CPRATIO, RPM*CPR

The regression equation is
HEATRATE = 12065 + 0.170 RPM - 146 CPRATIO - 0.00242 RPM*CPR

Predictor   Coef        SE Coef    T       P
Constant    12065.5     418.5      28.83   0.000
RPM         0.16969     0.03467    4.89    0.000
CPRATIO     -146.07     26.66      -5.48   0.000
RPM*CPR     -0.002425   0.003120   -0.78   0.440

S = 633.842   R-Sq = 84.9%   R-Sq(adj) = 84.2%

Analysis of Variance
Source           DF   SS          MS         F        P
Regression       3    142586570   47528857   118.30   0.000
Residual Error   63   25310639    401756
Total            66   167897208

Source    DF   Seq SS
RPM       1    119598530
CPRATIO   1    22745478
RPM*CPR   1    242561

The residual plots are:

[Figure: residual plots for HEATRATE: normal probability plot, residuals versus fits, histogram, and residuals versus order]

[Figure: scatterplots of SRES versus RPM and SRES versus CPRATIO]

From the normal probability plot, the points do not fall on a straight line, indicating the residuals are not normal. The histogram of the residuals indicates the residuals are skewed to the right, which also indicates that the residuals are not normal. The plot of the residuals versus the fitted values indicates that there are potentially 3 outliers with standardized residuals of 3 or more. The variance appears to be constant. On the graph of the residuals versus RPM, the spread of the residuals appears to decrease as the value of RPM increases. This indicates the variance may not be constant for RPMs. Since the assumptions of normality and constant variance appear to be violated, we could consider transforming the data. We should also check the outlying observations to see if there are any errors connected with these observations. 12.165 a.

β̂0 = −105 has no meaning because x3 = 0 is not in the observable range. β̂0 is simply the y-intercept.

β̂1 = 25. The estimated difference in mean attendance between weekends and weekdays is 25, holding temperature and weather constant.

β̂2 = 100. The estimated difference in mean attendance between sunny and overcast days is 100, holding type of day (weekend or weekday) and temperature constant.

β̂3 = 10. The estimated change in mean attendance for each additional degree of temperature is 10, holding type of day (weekend or weekday) and weather (sunny or overcast) constant.

b.

To determine if the model is useful for predicting daily attendance, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / [(1 − R²)/(n − (k + 1))] = 16.10.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 30 − (3 + 1) = 26. From Table VI, Appendix D, F.05 = 2.98. The rejection region is F > 2.98.

Since the observed value of the test statistic falls in the rejection region (F = 16.10 > 2.98), H0 is rejected. There is sufficient evidence to indicate the model is useful for predicting daily attendance at α = .05.

c.

To determine if mean attendance increases on weekends, we test:

H0: β1 = 0
Ha: β1 > 0

The test statistic is t = β̂1/s(β̂1) = 2.5.

The rejection region requires α = .10 in the upper tail of the t-distribution with df = n − (k + 1) = 30 − (3 + 1) = 26. From Table III, Appendix D, t.10 = 1.315. The rejection region is t > 1.315.

Since the observed value of the test statistic falls in the rejection region (t = 2.5 > 1.315), H0 is rejected. There is sufficient evidence to indicate the mean attendance increases on weekends at α = .10.

d.

Sunny ⇒ x2 = 1, Weekday ⇒ x1 = 0, Temperature 95° ⇒ x3 = 95

ŷ = −105 + 25(0) + 100(1) + 10(95) = 945

e.

We are 90% confident that the actual attendance for sunny weekdays with a temperature of 95° is between 645 and 1245.

12.166 a.

For a sunny weekday, x1 = 0 and x2 = 1:

x3 = 70 ⇒ ŷ = 250 − 700(0) + 100(1) + 5(70) + 15(0)(70) = 700
x3 = 80 ⇒ ŷ = 250 − 700(0) + 100(1) + 5(80) + 15(0)(80) = 750
x3 = 90 ⇒ ŷ = 800
x3 = 100 ⇒ ŷ = 850

For a sunny weekend, x1 = 1 and x2 = 1:

x3 = 70 ⇒ ŷ = 250 − 700(1) + 100(1) + 5(70) + 15(1)(70) = 1050
x3 = 80 ⇒ ŷ = 250 − 700(1) + 100(1) + 5(80) + 15(1)(80) = 1250
x3 = 90 ⇒ ŷ = 1450
x3 = 100 ⇒ ŷ = 1650
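The table of predictions above can be regenerated from the interaction model. This is a minimal sketch using the coefficients from the solution; the function name is hypothetical.

```python
def predicted_attendance(weekend: int, sunny: int, temp: float) -> float:
    """Interaction model from the solution:
    yhat = 250 - 700*x1 + 100*x2 + 5*x3 + 15*x1*x3,
    with x1 = weekend indicator, x2 = sunny indicator, x3 = temperature."""
    return 250 - 700 * weekend + 100 * sunny + 5 * temp + 15 * weekend * temp


# Sunny weekdays versus sunny weekends at the four temperatures in part a.
for temp in (70, 80, 90, 100):
    print(temp,
          predicted_attendance(0, 1, temp),
          predicted_attendance(1, 1, temp))
```

Each 10-degree step adds 50 to the weekday prediction but 200 to the weekend prediction, which is the interaction at work: the weekend slope is 5 + 15 = 20 per degree rather than 5.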


For both sunny weekdays and sunny weekend days, as the predicted high temperature increases, so does the predicted day's attendance. However, the predicted day's attendance on sunny weekend days increases at a faster rate than on sunny weekdays. Also, the predicted day's attendance is higher on sunny weekend days than on sunny weekdays. b.

To determine if the interaction term is a useful addition to the model, we test:

H0: β4 = 0
Ha: β4 ≠ 0

The test statistic is t = β̂4/s(β̂4) = 5.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − (k + 1) = 30 − (4 + 1) = 25. From Table III, Appendix D, t.025 = 2.06. The rejection region is t < −2.06 or t > 2.06.

Since the observed value of the test statistic falls in the rejection region (t = 5 > 2.06), H0 is rejected. There is sufficient evidence to indicate the interaction term is a useful addition to the model at α = .05.

c.

For x1 = 0, x2 = 1, and x3 = 95 , 𝑦 = 250 − 700(0) + 100(1) + 5(95) + 15(0)(95) = 825

d.

The width of the interval in Exercise 12.165e is 1245 − 645 = 600, while the width is 850 − 800 = 50 for the model containing the interaction term. The smaller the width of the interval, the smaller the variance. This implies that the interaction term is quite useful in predicting daily attendance. It has reduced the unexplained error.

e.

Because an interaction term including x1 is in the model, the coefficient corresponding to x1 must be interpreted with caution. For all observed values of x3 (temperature), the interaction term 15x3 is greater than 700, so the estimated weekend effect, −700 + 15x3, is positive.

12.167 a.

E(y) = β0 + β1x1 + β2x6 + β3x7, where

x6 = 1 if condition is good, 0 otherwise
x7 = 1 if condition is fair, 0 otherwise

b.

The model specified in part a seems appropriate. The points for E, F, and G cluster around three parallel lines.

[Figure: scatterplot of SalePrice versus Apartments, with points labeled by Condition (E, F, G)]

c.

Using MINITAB, the output is:

Regression Analysis: SalePrice versus X1, X6, X7

The regression equation is
SalePrice = 188875 + 15617 X1 - 103046 X6 - 152487 X7

Predictor   Coef      SE Coef   T       P
Constant    188875    28588     6.61    0.000
X1          15617     1066      14.66   0.000
X6          -103046   31784     -3.24   0.004
X7          -152487   39157     -3.89   0.001

S = 64623.6   R-Sq = 91.8%   R-Sq(adj) = 90.7%

Analysis of Variance
Source           DF   SS            MS            F       P
Regression       3    9.86170E+11   3.28723E+11   78.71   0.000
Residual Error   21   87700442851   4176211564
Total            24   1.07387E+12

Source   DF   Seq SS
X1       1    9.15776E+11
X6       1    7061463149
X7       1    63332198206

The fitted model is ŷ = 188,875 + 15,617x1 − 103,046x6 − 152,487x7

For excellent condition, ŷ = 188,875 + 15,617x1
For good condition, ŷ = 85,829 + 15,617x1
For fair condition, ŷ = 36,388 + 15,617x1

d.

Using MINITAB, the plot is:

[Figure: scatterplot of SalePrice versus Apartments, with points labeled by Condition (E, F, G)]

e.

We must first fit a reduced model with just x1, number of apartments. Using MINITAB, the output is:

Regression Analysis: SalePrice versus X1

The regression equation is
SalePrice = 101786 + 15525 X1

Predictor   Coef     SE Coef   T       P
Constant    101786   23291     4.37    0.000
X1          15525    1345      11.54   0.000

S = 82907.5   R-Sq = 85.3%   R-Sq(adj) = 84.6%

Analysis of Variance
Source           DF   SS            MS            F        P
Regression       1    9.15776E+11   9.15776E+11   133.23   0.000
Residual Error   23   1.58094E+11   6873656705
Total            24   1.07387E+12

The fitted model is ŷ = 101,786 + 15,525x1.

To determine if the relationship between sale price and number of units differs depending on the physical condition of the apartments, we test:

H0: β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = [(1.58094 × 10^11 − 87,700,442,851)/(3 − 1)] / [87,700,442,851/(25 − (3 + 1))] = 8.43

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 3 − 1 = 2 and ν2 = n − (k + 1) = 25 − (3 + 1) = 21. From Table VI, Appendix D, F.05 = 3.47. The rejection region is F > 3.47.

Since the observed value of the test statistic falls in the rejection region (F = 8.43 > 3.47), H0 is rejected. There is evidence to indicate that the relationship between sale price and number of units differs depending on the physical condition of the apartments at α = .05.

f.

Using MINITAB, the pairwise correlations are:

Correlations: x1, x2, x3, x4, x5, x6, x7

(Each cell shows the Pearson correlation with its p-value beneath.)

      x1       x2       x3       x4       x5       x6
x2    -0.014
      0.946
x3    0.800    -0.191
      0.000    0.361
x4    0.224    -0.363   0.167
      0.281    0.075    0.425
x5    0.878    0.027    0.673    0.089
      0.000    0.898    0.000    0.671
x6    0.175    -0.447   0.273    0.112    0.020
      0.403    0.025    0.187    0.594    0.923
x7    -0.128   0.392    -0.123   0.050    -0.238   -0.564
      0.541    0.053    0.557    0.814    0.252    0.003

When highly correlated independent variables are present in a regression model, the results are confusing. The researchers may only want to include one of the variables. This may be the case for the variables: x1 and x3, x1 and x5, x3 and x5.

g.

Using MINITAB, the residual plots are:

[Figure: residual plots for SalePrice: normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, and residuals versus the order of the data]

[Figure: scatterplot of RESI1 versus X1]

From the plots of the residuals versus the fitted values, there do not appear to be any outliers - no standardized residuals are larger than 3 in magnitude. In addition, there is no trend that would indicate non-constant variance (no funnel shape). There is a possible upside-down U shape that would indicate that the relationship between price and number of apartments might be curvilinear. In the histogram of the residuals, the plot is fairly mound-shaped, which would indicate the residuals are approximately normally distributed. Also, the normal probability plot looks to be a fairly straight line, indicating the residuals are approximately normal. In the plot of the residuals versus x1, there is a possible upside-down U shape that would indicate that the variable number of apartments should be squared. Otherwise, all of the assumptions appear to be met. 12.168 a.

Let x1 = 1 if study group is “complete solution”, 0 otherwise.
Let x2 = 1 if study group is “check figures”, 0 otherwise.

A possible model would be: E(y) = β0 + β1x1 + β2x2

b.

The difference between the mean knowledge gains of students in the “completed solution” and “no help” groups would be β1.

c.

Using MINITAB, the results are:

Regression Analysis: IMPROVE versus X1, X2

The regression equation is
IMPROVE = 2.43 - 0.483 X1 + 0.287 X2

Predictor   Coef      SE Coef   T       P
Constant    2.4333    0.4941    4.92    0.000
X1          -0.4833   0.7813    -0.62   0.538
X2          0.2867    0.7329    0.39    0.697

S = 2.70636   R-Sq = 1.2%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF   SS        MS      F      P
Regression       2    6.643     3.322   0.45   0.637
Residual Error   72   527.357   7.324
Total            74   534.000

Source   DF   Seq SS
X1       1    5.523
X2       1    1.121

The least squares prediction equation is: ŷ = 2.4333 − .4833x1 + .2867x2

d.

To determine if the model is useful, we test:

H0: β1 = β2 = 0
Ha: At least one βi ≠ 0

From the printout, the test statistic is F = .45 and the p-value is p = .637. Since the p-value is not less than α (p = .637 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate that the model is useful at α = .05.

e.

From Exercise 9.103, the test statistic was F = .45 and the p-value was p = .637. These are the same as those in part d. Thus, the results agree.

12.169 a.

To determine whether the complete model contributes information for the prediction of y, we test:

H0: β1 = β2 = β3 = β4 = β5 = 0
Ha: At least one βi ≠ 0

b.

MSR = SS(Model)/k = 4,911.5/5 = 982.3

MSE = SSE/(n − (k + 1)) = 1,830.44/(40 − (5 + 1)) = 53.84

The test statistic is F = MSR/MSE = 982.3/53.84 = 18.24.
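The mean squares and the global F statistic above can be recomputed directly from the sums of squares. This is a minimal sketch with a hypothetical function name; because it carries the unrounded MSE through the division, its F value agrees with the reported 18.24 only up to rounding of the intermediate mean squares.

```python
def global_f(ss_model: float, sse: float, k: int, n: int):
    """Return (MSR, MSE, F) for the overall test of model usefulness."""
    msr = ss_model / k               # mean square for the model
    mse = sse / (n - (k + 1))        # mean square error
    return msr, mse, msr / mse


msr, mse, f_stat = global_f(ss_model=4911.5, sse=1830.44, k=5, n=40)
print(round(msr, 1), round(mse, 2), round(f_stat, 2))
```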

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 5 and ν2 = n − (k + 1) = 40 − (5 + 1) = 34. From Table VI, Appendix D, F.05 ≈ 2.53. The rejection region is F > 2.53.

Since the observed value of the test statistic falls in the rejection region (F = 18.24 > 2.53), H0 is rejected. There is sufficient evidence to indicate that the complete model contributes information for the prediction of y at α = .05.

c.

To determine whether a second-order model contributes more information than a first-order model for the prediction of y, we test: H 0 : β3 = β4 = β5 = 0 H a : At least one β i ≠ 0

d.

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / [SSE_C/(n − (k + 1))] = 8.46.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 40 − (5 + 1) = 34. From Table VI, Appendix D, F.05 ≈ 2.92. The rejection region is F > 2.92.

Since the observed value of the test statistic falls in the rejection region (F = 8.46 > 2.92), H0 is rejected. There is sufficient evidence to indicate the second-order model contributes more information than a first-order model for the prediction of y at α = .05.

e.

The second-order model, based on the test result in part d.

12.170 a.

From MINITAB, the output is:

Regression Analysis: Food versus Income, Size

The regression equation is
Food = 2.79 - 0.00016 Income + 0.383 Size

Predictor   Coef        SE Coef    T       P
Constant    2.7944      0.4363     6.40    0.000
Income      -0.000164   0.006564   -0.02   0.980
Size        0.38348     0.07189    5.33    0.000

S = 0.7188   R-Sq = 55.8%   R-Sq(adj) = 52.0%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression       2    15.0027   7.5013   14.52   0.000
Residual Error   23   11.8839   0.5167
Total            25   26.8865

Source   DF   Seq SS
Income   1    0.2989
Size     1    14.7037

Correlations: Income, Size
Pearson correlation of Income and Size = −0.137
P-Value = 0.506

No; income and household size do not seem to be highly correlated. The correlation coefficient between income and household size is −.137.

b.

Using MINITAB, the residual plots are:

[Figure: scatterplots of RESI1 versus Income and RESI1 versus Size]

Yes; the plots of the residuals versus income and the residuals versus household size exhibit somewhat curved shapes. Such a pattern could indicate that a second-order model may be more appropriate.

c.

Using MINITAB, the residual plot versus fitted values is:

[Figure: scatterplot of RESI1 versus FITS1]

No; the plot of the residuals versus the predicted values reveals varying spreads for different values of ŷ. This implies that the variance of ε is not constant for all settings of the x's.

d.

Yes; The outlier shows up in several plots and is the 26th household (Food consumption = $7500, income = $7300 and household size = 5).

e.

Using MINITAB, the histogram of the residuals is:

[Figure: histogram of RESI1]

No; the frequency distribution of the residuals shows that the outlier skews the frequency distribution to the right.

12.171 a.

R² = .78. 78% of the total sample variation in the price of a direct burial about its mean is explained by the model that contains type of state, type of casket, and the interaction of the two.

b.

To determine if the overall model is adequate, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = (R²/k) / {(1 − R²)/[n − (k + 1)]} = (.78/3) / {(1 − .78)/[1,437 − (3 + 1)]} = 1693.55.
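The global F statistic can be recomputed directly from R², n, and k. The following is a minimal sketch; the function name `global_f_from_r2` is hypothetical and is not part of the MINITAB output.

```python
def global_f_from_r2(r2: float, n: int, k: int) -> float:
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    numerator = r2 / k                          # explained variation per model term
    denominator = (1.0 - r2) / (n - (k + 1))    # unexplained variation per error df
    return numerator / denominator


# Values from Exercise 12.171: R^2 = .78, n = 1,437 funeral homes, k = 3 terms.
f_stat = global_f_from_r2(r2=0.78, n=1437, k=3)
print(round(f_stat, 2))
```

The result can then be compared with the tabled critical value for 3 and 1,433 degrees of freedom.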

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k = 3 and ν2 = n − (k + 1) = 1,437 − (3 + 1) = 1,433. From Table VI, Appendix D, F.05 ≈ 2.60. The rejection region is F > 2.60. Since the observed value of the test statistic falls in the rejection region (F = 1693.55 > 2.60), H0 is rejected. There is sufficient evidence to indicate the model is adequate for predicting the price of a direct burial at α = .05.

c.

For x1 = 1 and x2 = 1 (wooden casket in restrictive state), ŷ = 1,432 + 793(1) − 252(1) + 261(1)(1) = 2,234.

d.

For x1 = 1 and x2 = 0 (no casket in restrictive state), ŷ = 1,432 + 793(1) − 252(0) + 261(1)(0) = 2,225. The difference would be $2,234 − $2,225 = $9.

e.

For x1 = 0 and x2 = 1 (wooden casket in non-restrictive state), ŷ = 1,432 + 793(0) − 252(1) + 261(0)(1) = 1,180.

For x1 = 0 and x2 = 0 (no casket in non-restrictive state), ŷ = 1,432 + 793(0) − 252(0) + 261(0)(0) = 1,432.

The difference would be $1,180 − $1,432 = −$252.

f.

To determine if the difference between the mean price of a direct burial with a basic wooden casket and the mean of a burial with no casket depends on whether the funeral home is in a restrictive state, we test: H 0 : β3 = 0 H a : β3 ≠ 0

The test statistic is t = β̂3 / s(β̂3) = 2.39.

The rejection region requires α/2 = .05/2 = .025 in each tail of the t-distribution with df = n − (k + 1) = 1,437 − (3 + 1) = 1,433. From Table III, Appendix D, t.025 = 1.96. The rejection region is t < −1.96 or t > 1.96. Since the observed value of the test statistic falls in the rejection region (t = 2.39 > 1.96), H0 is rejected. There is sufficient evidence to indicate the difference between the mean price of a direct burial with a basic wooden casket and the mean of a burial with no casket depends on whether the funeral home is in a restrictive state at α = .05.

12.172 a.

Using MINITAB, the output from fitting a complete second-order model is:

Regression Analysis: Collision versus X1, X2, X1X2, X1SQ, X2SQ

The regression equation is
Collision = 172788 - 10739 X1 - 499 X2 - 20.2 X1X2 + 198 X1SQ + 14.7 X2SQ

Predictor   Coef      SE Coef   T       P
Constant    172788    97785     1.77    0.084
X1          -10739    2789      -3.85   0.000
X2          -499      1444      -0.35   0.731
X1X2        -20.20    21.36     -0.95   0.350
X1SQ        197.57    22.60     8.74    0.000
X2SQ        14.678    8.819     1.66    0.103

S = 13132.0   R-Sq = 95.9%   R-Sq(adj) = 95.5%

Analysis of Variance
Source           DF   SS            MS            F        P
Regression        5   1.70956E+11   34191134613   198.27   0.000
Residual Error   42   7242911076    172450264
Total            47   1.78199E+11

Source   DF   Seq SS
X1        1   1.56067E+11
X2        1   13214029
X1X2      1   1686340039
X1SQ      1   12711376407
X2SQ      1   477703108

The fitted model is ŷ = 172,788 − 10,739x1 − 499x2 − 20.2x1x2 + 197.57x1² + 14.678x2².

b.

To test the hypothesis H0: β4 = β5 = 0, we must fit the reduced model E(y) = β0 + β1x1 + β2x2 + β3x1x2. Using MINITAB, the output from fitting the reduced model is:

Regression Analysis: Collision versus X1, X2, X1X2

The regression equation is
Collision = - 476768 + 11458 X1 + 3404 X2 - 64.4 X1X2

Predictor   Coef      SE Coef   T       P
Constant    -476768   100852    -4.73   0.000
X1          11458     1874      6.11    0.000
X2          3404      1814      1.88    0.067
X1X2        -64.35    33.77     -1.91   0.063

S = 21549.1   R-Sq = 88.5%   R-Sq(adj) = 87.8%

Analysis of Variance
Source           DF   SS            MS            F        P
Regression        3   1.57767E+11   52588864517   113.25   0.000
Residual Error   44   20431990591   464363423
Total            47   1.78199E+11

Source   DF   Seq SS
X1        1   1.56067E+11
X2        1   13214029
X1X2      1   1686340039

The test is: H 0 : β 4 = β5 = 0 H a : At least one β i ≠ 0

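The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(20,431,990,591 − 7,242,911,076)/(5 − 3)] / {7,242,911,076/[48 − (5 + 1)]} = 38.24, where SSE_R and SSE_C are the error sums of squares for the reduced and complete models.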
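The nested-model F statistic can be checked from the two Residual Error sums of squares in the MINITAB output. This is a minimal sketch; the function name `nested_f` and its parameter names are hypothetical.

```python
def nested_f(sse_reduced: float, sse_complete: float, k: int, g: int, n: int) -> float:
    """Nested-model F: [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))].

    k = number of terms in the complete model,
    g = number of terms in the reduced model,
    n = sample size.
    """
    drop_per_term = (sse_reduced - sse_complete) / (k - g)  # mean drop in SSE per tested term
    mse_complete = sse_complete / (n - (k + 1))             # MSE of the complete model
    return drop_per_term / mse_complete


# Residual Error SS: reduced model = 20,431,990,591; complete model = 7,242,911,076.
f_quadratic = nested_f(20_431_990_591, 7_242_911_076, k=5, g=3, n=48)
print(round(f_quadratic, 2))
```

The numerator degrees of freedom are k − g = 2 and the denominator degrees of freedom are n − (k + 1) = 42.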

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 5 − 3 = 2 and ν2 = n − (k + 1) = 48 − (5 + 1) = 42. From Table VI, Appendix D, F.05 ≈ 3.23. The rejection region is F > 3.23. Since the observed value of the test statistic falls in the rejection region (F = 38.24 > 3.23), H0 is rejected. There is sufficient evidence to indicate that at least one of the quadratic terms contributes to the prediction of monthly collision claims at α = .05.

c.

From part b, we know at least one of the quadratic terms is significant. From part a, it appears that none of the terms involving x2 may be significant. Thus, we will fit the model with just x1 and x1². The MINITAB output is:

Regression Analysis: Collision versus X1, X1SQ

The regression equation is
Collision = 185160 - 11580 X1 + 196 X1SQ

Predictor   Coef     SE Coef   T       P
Constant    185160   54791     3.38    0.002
X1          -11580   2182      -5.31   0.000
X1SQ        195.54   21.64     9.04    0.000

S = 13219.4   R-Sq = 95.6%   R-Sq(adj) = 95.4%

Analysis of Variance
Source           DF   SS            MS            F        P
Regression        2   1.70335E+11   85167360538   487.36   0.000
Residual Error   45   7863863065    174752513
Total            47   1.78199E+11

Source   DF   Seq SS
X1        1   1.56067E+11
X1SQ      1   14267681594

To see if any of the terms involving x2 are significant, we test: H 0 : β 2 = β3 = β5 = 0 H a : At least one β i ≠ 0

The test statistic is F = [(SSE_R − SSE_C)/(k − g)] / {SSE_C/[n − (k + 1)]} = [(7,863,863,065 − 7,242,911,076)/(5 − 2)] / {7,242,911,076/[48 − (5 + 1)]} = 1.20.

The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 5 − 2 = 3 and ν2 = n − (k + 1) = 48 − (5 + 1) = 42. From Table VI, Appendix D, F.05 ≈ 2.84. The rejection region is F > 2.84. Since the observed value of the test statistic does not fall in the rejection region (F = 1.20 ≯ 2.84), H0 is not rejected. There is insufficient evidence to indicate that any of the terms involving x2 contribute to the model at α = .05.

Thus, it appears that the best model is E(y) = β0 + β1x1 + β4x1². The model does not support the analyst's claim. In the model above, the estimate for β4 is positive. This would indicate that the higher claims are for both the young and the old. Also, there is no evidence to support the claim that there are more claims when the temperature goes down.

12.173

First, we will fit the first-order model E(y) = β0 + β1x1 + β2x2. Using MINITAB, the results are:

Regression Analysis: y versus x1, x2

The regression equation is
y = - 1.57 + 0.0257 x1 + 0.0336 x2

Predictor   Coef       SE Coef    T       P
Constant    -1.5705    0.4937     -3.18   0.003
x1          0.025732   0.004024   6.40    0.000
x2          0.033615   0.004928   6.82    0.000

S = 0.4023   R-Sq = 68.1%   R-Sq(adj) = 66.4%

Analysis of Variance
Source           DF   SS        MS       F       P
Regression        2   12.7859   6.3930   39.51   0.000
Residual Error   37    5.9876   0.1618
Total            39   18.7735

Source   DF   Seq SS
x1        1   5.2549
x2        1   7.5311

To determine if the model is useful in the prediction of y (GPA), we test: H 0 : β1 = β 2 = 0 H a : At least one β i ≠ 0

The test statistic is 𝐹 = 39.51 and the p-value is 𝑝 = .000. Since the p-value is so small, H0 is rejected for any reasonable value of 𝛼. There is sufficient evidence to indicate at least one of the variables Verbal score or Mathematics score is useful in predicting GPA. To determine if Verbal score is useful in predicting GPA, controlling for Mathematics score, we test: H 0 : β1 = 0 H a : β1 ≠ 0

The test statistic is 𝑡 = 6.40 and the p-value is 𝑝 = .000. Since the p-value is so small, H0 is rejected for any reasonable value of 𝛼. There is sufficient evidence to indicate Verbal score is useful in predicting GPA, controlling for Mathematics score.

To determine if Mathematics score is useful in predicting GPA, controlling for Verbal score, we test: H0: β2 = 0   Ha: β2 ≠ 0

The test statistic is t = 6.82 and the p-value is p = .000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate Mathematics score is useful in predicting GPA, controlling for Verbal score.

Thus, both terms in the model are significant. The R-squared value is R² = .681. This indicates that 68.1% of the sample variation in GPA is explained by the model.

Now, we need to check the residuals. From MINITAB, the plots are:

[Figure: Residual Plots for y (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; all of the standardized residuals)]

[Figure: Scatterplot of SRES vs x1, x2]

From the normal probability plot, it appears that the assumption of normality is valid. The points are very close to a straight line except for the first 2 points. The histogram of the residuals implies that the residuals are slightly skewed to the left. I would still consider the assumption to be valid.

The plot of the residuals versus the fitted values indicates a random spread of the residuals between the two bands. This indicates that the assumption of equal variances is probably valid.

The plot of the residuals versus x1 indicates that the relationship between GPA and Verbal score may not be linear, but quadratic, because the points form a somewhat upside-down U shape. The plot of the residuals versus x2 indicates that the relationship between GPA and Mathematics score may or may not be quadratic.

Since the plots indicate a possible 2nd-order model and the R² value is not very large, we will fit a complete 2nd-order model: E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2. Using MINITAB, the results are:

Regression Analysis: y versus x1, x2, x1sq, x2sq, x1x2

The regression equation is
y = - 9.92 + 0.167 x1 + 0.138 x2 - 0.00111 x1sq - 0.000843 x2sq + 0.000241 x1x2

Predictor   Coef         SE Coef     T       P
Constant    -9.917       1.354       -7.32   0.000
x1          0.16681      0.02124     7.85    0.000
x2          0.13760      0.02673     5.15    0.000
x1sq        -0.0011082   0.0001173   -9.45   0.000
x2sq        -0.0008433   0.0001594   -5.29   0.000
x1x2        0.0002411    0.0001440   1.67    0.103

S = 0.187142   R-Sq = 93.7%   R-Sq(adj) = 92.7%

Analysis of Variance
Source           DF   SS        MS       F        P
Regression        5   17.5827   3.5165   100.41   0.000
Residual Error   34    1.1908   0.0350
Total            39   18.7735

Source   DF   Seq SS
x1        1   5.2549
x2        1   7.5311
x1sq      1   3.6434
x2sq      1   1.0552
x1x2      1   0.0982

To determine if the interaction between Verbal score and Mathematics score is useful in the prediction of y (GPA), we test: H 0 : β5 = 0 H a : β5 ≠ 0

The test statistic is t = 1.67 and the p-value is p = .103. Since the p-value is not small, H0 is not rejected for any value of α < .10. There is insufficient evidence to indicate the interaction between Verbal score and Mathematics score is useful in predicting GPA.

Now, we will fit a model without the interaction term, but including the squared terms: E(y) = β0 + β1x1 + β2x2 + β3x1² + β4x2². Using MINITAB, the results are:

Regression Analysis: y versus x1, x2, x1sq, x2sq

The regression equation is
y = - 11.5 + 0.189 x1 + 0.159 x2 - 0.00114 x1sq - 0.000871 x2sq

Predictor   Coef         SE Coef     T        P
Constant    -11.458      1.019       -11.24   0.000
x1          0.18887      0.01709     11.05    0.000
x2          0.15874      0.02417     6.57     0.000
x1sq        -0.0011412   0.0001186   -9.62    0.000
x2sq        -0.0008705   0.0001626   -5.35    0.000

S = 0.191905   R-Sq = 93.1%   R-Sq(adj) = 92.3%

Analysis of Variance
Source           DF   SS        MS       F        P
Regression        4   17.4845   4.3711   118.69   0.000
Residual Error   35    1.2890   0.0368
Total            39   18.7735

Source   DF   Seq SS
x1        1   5.2549
x2        1   7.5311
x1sq      1   3.6434
x2sq      1   1.0552

To determine if the relationship between Verbal score and GPA is quadratic, controlling for Mathematics score, we test: H 0 : β3 = 0 H a : β3 ≠ 0

The test statistic is t = −9.62 and the p-value is p = .000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate the relationship between Verbal score and GPA is quadratic, controlling for Mathematics score.

To determine if the relationship between Mathematics score and GPA is quadratic, controlling for Verbal score, we test: H0: β4 = 0   Ha: β4 ≠ 0

The test statistic is t = −5.35 and the p-value is p = .000. Since the p-value is so small, H0 is rejected for any reasonable value of α. There is sufficient evidence to indicate the relationship between Mathematics score and GPA is quadratic, controlling for Verbal score.

Thus, both quadratic terms in the model are significant. The R-squared value is R² = .931. This indicates that 93.1% of the sample variation in GPA is explained by the model.

Now, we need to check the residuals. From MINITAB, the plots are:

[Figure: Residual Plots for y (panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order; all of the standardized residuals)]

[Figure: Scatterplot of SRES_1 vs x1, x2]

From the normal probability plot, it appears that the assumption of normality is valid. The points are very close to a straight line. The histogram of the residuals also implies that the residuals are approximately normal.

The plot of the residuals versus the fitted values indicates a random spread of the residuals between the two bands. This indicates that the assumption of equal variances is probably valid.

The plot of the residuals versus x1 indicates a random spread of the residuals between the two bands. This indicates that the order of x1 (2nd) is appropriate. The plot of the residuals versus x2 indicates a random spread of the residuals between the two bands. This indicates that the order of x2 (2nd) is appropriate.

The model appears to be pretty good. All terms in the model are significant, the residual analysis indicates the assumptions are met, and the R-squared value is fairly close to 1. The fitted model is ŷ = −11.5 + .189x1 + .159x2 − .00114x1² − .00087x2².

12.174 a.

Answers will vary.

Problem 1: The model fit is just a first-order model. The actual relationship between some of the independent variables and the dependent variable could be curvilinear. If curvilinear terms were added, then the relationship between IQ and income could change. In addition, even though IQ is significantly related to income, the overall model may not be adequate for predicting income. More independent variables might be needed or more complex models need to be considered.

Problem 2: Regression analysis looks at the relationship between a dependent variable and one or more independent variables. The data used here is observational. When observational data is used in a regression analysis, no cause-and-effect inferences can be made. Just because 2 variables are related does not imply that one causes the other.

Problem 3: The independent variable does not need to be normal. If a variable that is highly skewed is normalized before being included as an independent variable in a regression model, then the relationship between the normalized variable and the dependent variable will be different than the relationship between the original variable and the dependent variable. If IQ was not normalized, then the relationship between IQ and income might be quite different.

Problem 4: By not including important independent variables in a model, the relationship between the dependent variable and the included independent variables may look different. In this example, the level of education could very easily explain the difference in income rather than IQ.

b.

Answers will vary. The model proposed includes IQ, socioeconomic level, and age as linear independent variables. A more complicated model that we could look at would include squared terms for each of the independent variables to allow for curvilinear relationships between income and the independent variables. We could also include interaction terms between the independent variables. A complete 2nd-order model would be: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1² + β5x2² + β6x3² + β7x1x2 + β8x1x3 + β9x2x3

Before fitting the above model, we could plot the residuals from the first-order model against each of the independent variables to see if the model was misspecified. From these plots, we could get some information to see if adding the squared terms to the model might help. We could look at the model with the first-order terms plus the quadratic terms and compare it to the first-order model to see if adding any of the squared terms would be helpful. If at least one is significant, then we would have to investigate which of these terms would be helpful. Once we determine if any of the squared terms are significant, we could add the interaction terms to the model to see if any of the interaction terms would be helpful in predicting income.

Chapter 13 Methods for Quality Improvement: Statistical Process Control

13.1

A control chart is a time series plot of individual measurements or means of a quality variable to which a centerline and two other horizontal lines called control limits have been added. The center line represents the mean of the process when the process is in a state of statistical control. The upper control limit and the lower control limit are positioned so that when the process is in control the probability of an individual measurement or mean falling outside the limits is very small. A control chart is used to determine if a process is in control (only common causes of variation present) or not (both common and special causes of variation present). This information helps us to determine when to take action to find and remove special causes of variation and when to leave the process alone.

13.2

If rational subgrouping is not used, it is possible that a change in the process mean will go undetected. In rational subgrouping, samples are selected so that a change in the process mean occurs between samples, not within samples.

13.3

When a control chart is first constructed, it is not known whether the process is in control or not. If the process is found not to be in control, then the centerline and control limits should not be used to monitor the process in the future.

13.4

An x̄-chart is used to monitor the process mean.

13.5

Even if all the points of an x̄-chart fall within the control limits, the process may be out of control. Nonrandom patterns may exist among the plotted points that are within the control limits, but are very unlikely if the process is in control. Examples include six points in a row steadily increasing or decreasing and 14 points in a row alternating up and down.

13.6

The variation of a process must be stable. If it is not, the control limits of the x̄-chart would be meaningless since they are a function of the process variation.

13.7

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: Points 18 through 21 are all in Zone B or beyond. This indicates the process is out of control.

Thus, rule 6 indicates this process is out of control.

13.8

a.

According to rule 4 (14 points in a row alternating up and down), the process is out of control. Therefore, it is affected by both common and special causes of variation. An in-control process is affected by only common causes. Rule 4 says that if we observe 14 points in a row alternating up and down, that is an indication of the presence of special causes of variation in addition to common causes. Points 2 through 16 alternate up and down.

b.

The extended x̄-chart is:

[Figure: x̄-chart of Mean vs Sample Number, showing UCL, +AB, +BC, Centerline, −BC, −AB, and LCL]

The additional points suggest that the process is out of control. Rule 1 (One point beyond Zone A), Rule 5 (2 out of 3 points in a row in Zone A or beyond), and Rule 6 (4 out of 5 points in a row in Zone B or beyond) indicate the process is out of control.

13.9

Using Table IX, Appendix D:

a.

With n = 3, A2 = 1.023

b.

With n = 10, A2 = 0.308

c.

With n = 22, A2 = 0.167

13.10

a.

x̿ = (x̄1 + x̄2 + ... + x̄25)/k = 2008.8/25 = 80.352

R̄ = (R1 + R2 + ... + R25)/k = 198.7/25 = 7.948

b.

Centerline = x̿ = 80.352

From Table IX, Appendix D, with n = 5, A2 = .577.

Upper control limit = x̿ + A2R̄ = 80.352 + .577(7.948) = 84.938
Lower control limit = x̿ − A2R̄ = 80.352 − .577(7.948) = 75.766

c – d.

Upper A–B boundary = x̿ + (2/3)(A2R̄) = 80.352 + (2/3)(.577)(7.948) = 83.409
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 80.352 − (2/3)(.577)(7.948) = 77.295
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 80.352 + (1/3)(.577)(7.948) = 81.881
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 80.352 − (1/3)(.577)(7.948) = 78.823
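The control limits and zone boundaries all come from the same three inputs: the grand mean, the mean range, and the tabled constant A2. The following is a minimal sketch of that arithmetic; the function name `xbar_chart_lines` and the dictionary keys are hypothetical labels chosen to match the chart annotations.

```python
def xbar_chart_lines(grand_mean: float, mean_range: float, a2: float) -> dict:
    """Centerline, control limits, and zone boundaries for an x-bar chart."""
    half_width = a2 * mean_range  # distance from the centerline to either control limit
    return {
        "UCL": grand_mean + half_width,
        "+AB": grand_mean + (2 / 3) * half_width,
        "+BC": grand_mean + (1 / 3) * half_width,
        "Centerline": grand_mean,
        "-BC": grand_mean - (1 / 3) * half_width,
        "-AB": grand_mean - (2 / 3) * half_width,
        "LCL": grand_mean - half_width,
    }


# Exercise 13.10 inputs: grand mean 80.352, mean range 7.948, A2 = .577 for n = 5.
lines = xbar_chart_lines(80.352, 7.948, 0.577)
for name, value in lines.items():
    print(f"{name:>10}: {value:.3f}")
```

Computing every line from a single half-width keeps the upper and lower boundaries symmetric about the centerline, which guards against sign slips in the lower boundaries.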

The x̄-chart is:

[Figure: x̄-chart (Xbar vs Sample) with UCL = 84.94, +AB = 83.41, +BC = 81.88, Centerline = 80.35, −BC = 78.82, −AB = 77.30, LCL = 75.77]

Rule 1: One point beyond Zone A: Point 10 is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 1 indicates the process is out of control.

13.11

a.

For each sample, we compute x̄ = Σx/n and R = range = largest measurement − smallest measurement. The results are listed in the table:

Sample No.   x̄        R      Sample No.   x̄        R
 1           20.225   1.8    11           21.225   3.2
 2           19.750   2.8    12           20.475   0.9
 3           20.425   3.8    13           19.650   2.6
 4           19.725   2.5    14           19.075   4.0
 5           20.550   3.7    15           19.400   2.2
 6           19.900   5.0    16           20.700   4.3
 7           21.325   5.5    17           19.850   3.6
 8           19.625   3.5    18           20.200   2.5
 9           19.350   2.5    19           20.425   2.2
10           20.550   4.1    20           19.900   5.5

b.

x̿ = (x̄1 + x̄2 + ... + x̄20)/k = 402.325/20 = 20.11625

R̄ = (R1 + R2 + ... + R20)/k = 66.2/20 = 3.31

c.

Centerline = x̿ = 20.116

From Table IX, Appendix D, with n = 4, A2 = .729.

Upper control limit = x̿ + A2R̄ = 20.116 + .729(3.31) = 22.529
Lower control limit = x̿ − A2R̄ = 20.116 − .729(3.31) = 17.703

d.

Upper A–B boundary = x̿ + (2/3)(A2R̄) = 20.116 + (2/3)(.729)(3.31) = 21.725
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 20.116 − (2/3)(.729)(3.31) = 18.507
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 20.116 + (1/3)(.729)(3.31) = 20.920
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 20.116 − (1/3)(.729)(3.31) = 19.312

e.

The x̄-chart is:

[Figure: x̄-chart (X-bar vs Sample) with UCL = 22.529, +AB = 21.725, +BC = 20.92, Centerline = 20.116, −BC = 19.312, −AB = 18.507, LCL = 17.703]

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control.
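The six pattern-analysis rules can also be checked mechanically. The following is a minimal sketch, not the MINITAB procedure: the function name `check_rules` is hypothetical, each zone is assumed to be one third of the distance from the centerline to the upper control limit, and Rules 2, 5, and 6 are assumed to require the points to lie on the same side of the centerline. The data are the 20 sample means from Exercise 13.11.

```python
def check_rules(points, centerline, ucl):
    """Return the sorted list of pattern-analysis rules (1-6) that are violated."""
    zone = (ucl - centerline) / 3.0                  # width of one zone (C, B, or A)
    z = [(p - centerline) / zone for p in points]    # signed distance in zone widths
    diffs = [b - a for a, b in zip(points, points[1:])]
    n = len(points)
    violated = set()

    # Rule 1: one point beyond Zone A (more than three zone widths out).
    if any(abs(v) > 3 for v in z):
        violated.add(1)
    # Rule 2: nine points in a row on one side of the centerline.
    for i in range(n - 8):
        w = z[i:i + 9]
        if all(v > 0 for v in w) or all(v < 0 for v in w):
            violated.add(2)
    # Rule 3: six points in a row steadily increasing or decreasing (five differences).
    for i in range(n - 5):
        d = diffs[i:i + 5]
        if all(x > 0 for x in d) or all(x < 0 for x in d):
            violated.add(3)
    # Rule 4: fourteen points in a row alternating up and down (thirteen differences).
    for i in range(n - 13):
        d = diffs[i:i + 13]
        if all(d[j] * d[j + 1] < 0 for j in range(12)):
            violated.add(4)
    # Rule 5: two out of three points in a row in Zone A or beyond.
    for i in range(n - 2):
        w = z[i:i + 3]
        if sum(v > 2 for v in w) >= 2 or sum(v < -2 for v in w) >= 2:
            violated.add(5)
    # Rule 6: four out of five points in a row in Zone B or beyond.
    for i in range(n - 4):
        w = z[i:i + 5]
        if sum(v > 1 for v in w) >= 4 or sum(v < -1 for v in w) >= 4:
            violated.add(6)
    return sorted(violated)


means_13_11 = [20.225, 19.750, 20.425, 19.725, 20.550, 19.900, 21.325, 19.625,
               19.350, 20.550, 21.225, 20.475, 19.650, 19.075, 19.400, 20.700,
               19.850, 20.200, 20.425, 19.900]
violations = check_rules(means_13_11, centerline=20.116, ucl=22.529)
print(violations)  # an empty list means no rule is violated
```

An empty result agrees with the conclusion that the process appears to be in control.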

13.12

a.

The process is out of control. There are several points that fall outside the lower control limit.

b.

The September 21, 22, and 30 observations represent special cause variation. There is a specific reason the observations were low on those days. The September 11 observation requires more investigation to determine if it should be considered due to a special cause of variation. The reason for the decline in the air conditioning consumption must be investigated before labeling the reason as a special cause.

13.13

a.

The centerline would be x̿ = 672.8.

b.

The standard deviation of x̄ would be σx̄ = 6.755 (the standard deviation divided by √n).

The upper control limit is UCL = x̿ + 3σx̄ = 672.8 + 3(6.755) = 693.065
The lower control limit is LCL = x̿ − 3σx̄ = 672.8 − 3(6.755) = 652.535

c.

The x̄-value for the future inspection would be x̄ = 626.67. Since this value falls below the lower control limit, we would say that the process is out of control. This would very much be cause for concern.

13.14

Yes, the process appears to be in control. None of the rules are violated.

13.15

a.

From the printout, x̄ = 100.59 and s = .454.

b.

The centerline would be x̄ = 100.59. The upper control limit is UCL = x̄ + 3(s) = 100.59 + 3(.454) = 101.952. The lower control limit is LCL = x̄ − 3(s) = 100.59 − 3(.454) = 99.228.
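The 3-standard-deviation limits for an individuals chart can be verified in a couple of lines. This is a minimal sketch; the function name `three_sigma_limits` is hypothetical.

```python
def three_sigma_limits(center: float, s: float) -> tuple:
    """Return (LCL, UCL) placed three standard deviations from the centerline."""
    return center - 3 * s, center + 3 * s


# Exercise 13.15 inputs from the printout: mean 100.59, standard deviation .454.
lcl, ucl = three_sigma_limits(100.59, 0.454)
print(round(lcl, 3), round(ucl, 3))
```

Any plotted observation below the first value or above the second would signal a Rule 1 violation.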

c.

Using MINITAB, the x-chart is:

[Figure: x-chart (x vs Order) with UCL = 101.952, Centerline = 100.59, LCL = 99.228]

The process is not in control as there is an observation below the lower control limit.

13.16

Result (1) resulted in an observation that fell outside the upper control limit. This would break Rule #1 and we would say that the process is out of control. Result (2) resulted in more than nine points in a row in Zone C or beyond. This would break Rule #2 and we would say that the process is out of control.

13.17

a.

x̿ = (x̄1 + x̄2 + ... + x̄20)/k = 1,400/20 = 70

b.

R̄ = (R1 + R2 + ... + R20)/k = 650/20 = 32.5

c.

From Table IX, Appendix D, with n = 10, A2 = .308.

Upper control limit = x̿ + A2R̄ = 70 + .308(32.5) = 80.01
Lower control limit = x̿ − A2R̄ = 70 − .308(32.5) = 59.99

d.

Upper A–B boundary = x̿ + (2/3)(A2R̄) = 70 + (2/3)(.308)(32.5) = 76.67
Lower A–B boundary = x̿ − (2/3)(A2R̄) = 70 − (2/3)(.308)(32.5) = 63.33
Upper B–C boundary = x̿ + (1/3)(A2R̄) = 70 + (1/3)(.308)(32.5) = 73.34
Lower B–C boundary = x̿ − (1/3)(A2R̄) = 70 − (1/3)(.308)(32.5) = 66.66

The x̄-chart is:

[Figure: x̄-chart (Pain Level vs Week) with UCL = 80.01, +AB = 76.67, +BC = 73.34, Centerline = 70, −BC = 66.66, −AB = 63.33, LCL = 59.99]

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control.

e.

The x̄-chart with the additional points is:

[Figure: x̄-chart (Pain Level vs Week, weeks 1 through 30) with UCL = 80.01, +AB = 76.67, +BC = 73.34, Centerline = 70, −BC = 66.66, −AB = 63.33, LCL = 59.99]

f.

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: There are six points steadily decreasing. This indicates the process is out of control.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 3 indicates the process is out of control. There is a shift in the pain level of the patients following the intervention because the new observations are steadily decreasing.

13.18

a.

The means and ranges are shown in the table below:

Plate   Mean      Range     Plate   Mean      Range
 1      19.1023   0.0020    13      19.1008   0.0040
 2      19.1040   0.0050    14      19.1038   0.0020
 3      19.1048   0.0060    15      19.1015   0.0080
 4      19.1025   0.0060    16      19.1038   0.0090
 5      19.1025   0.0050    17      19.0998   0.0060
 6      19.1030   0.0020    18      19.1025   0.0020
 7      19.1045   0.0060    19      19.1025   0.0030
 8      19.1030   0.0060    20      19.1008   0.0080
 9      19.1030   0.0050    21      19.1025   0.0030
10      19.1005   0.0040    22      19.1025   0.0080
11      19.1035   0.0010    23      19.1028   0.0030
12      19.1025   0.0060    24      19.1035   0.0050

b.

MINITAB was used to create the control chart:

c.

None of the rules are violated, so we would say that the hole drilling process is in control.

d.

Since the process is in control, these limits should be used to monitor future process output.

13.19

a.

x̿ = (x̄1 + x̄2 + ... + x̄10)/k = 151/10 = 15.1

b.

The x̄-chart is:

[Figure: x̄-chart (Time vs Day) with UCL = 31.5, +AB = 26, +BC = 20.5, Centerline = 15.1, −BC = 9.5, −AB = 4, LCL = 0]

c.

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern does not exist.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

There is no evidence of special causes of variation.

d.

x̿ = (x̄1 + x̄2 + ... + x̄14)/k = 152.5/14 = 10.9.

The x̄-chart is:

[Figure: x̄-chart (Time vs Day) with UCL = 21.5, +AB = 18, +BC = 14.5, Centerline = 10.9, −BC = 7.5, −AB = 4, LCL = 0.5]

Rule 1: Rule 2:

One point beyond Zone A: No points are beyond Zone A. Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond. Six points in a row steadily increasing or decreasing: This pattern does not exist. Fourteen points in a row alternating up and down: This pattern does not exist. Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond. Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 3: Rule 4: Rule 5: Rule 6:

There is no evidence of special causes of variation.

e.

The side-by-side $\bar{x}$-charts are:

[Side-by-side $\bar{x}$-charts, panels Time*Day and Time2*Day2, with reference lines at 21.5, 18, 14.5, 10.9, 7.5, 4, and 0.5]

It appears that a process shift has occurred. All of the points after the implementation are below the centerline of the points before the implementation.

13.20

a.

The upper control limit is set at 3σ above the centerline. The probability that a mean bow measurement for a randomly selected hour will fall above the upper control limit is about .0027 / 2 = .00135 if the process is in control.
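The tail probability quoted above can be checked numerically. The following is a minimal Python sketch, assuming the hourly sample means are approximately normal and independent; the helper names `upper_tail_normal` and `binomial_pmf` are illustrative and not part of the manual. The second call evaluates the binomial probability of exactly 3 of 67 hourly means falling above the upper control limit, using the rounded value .00135.

```python
import math


def upper_tail_normal(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that one sample mean falls more than 3 sigma above the centerline
p_above_ucl = upper_tail_normal(3.0)

# Probability that exactly 3 of 67 independent means fall above the UCL
# (assumption: the rounded per-hour probability .00135 is used)
p_three_of_67 = binomial_pmf(3, 67, 0.00135)

print(round(p_above_ucl, 5), round(p_three_of_67, 6))
```

Both probabilities are very small, which is why even a single point beyond the control limits is treated as a signal.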


b.

If the measurements are independent, then the probability that 3 of the 67 measurements will fall above the upper control limit would be

$\dbinom{67}{3}(.00135)^{3}(.99865)^{67-3} = \dfrac{67!}{3!\,64!}(.00135)^{3}(.99865)^{64} = .000108$

13.21

a.

The sample means and ranges are:

Sample   x-bar     Range    Sample   x-bar     Range    Sample   x-bar     Range
1        99.743    0.12     15       100.543   0.24     28       100.597   0.37
2        99.447    1.53     16       100.503   1.19     29       100.180   1.77
3        100.040   0.29     17       100.087   1.14     30       99.940    0.47
4        100.353   1.68     18       99.383    0.20     31       100.653   0.77
5        99.287    0.38     19       100.457   0.86     32       99.473    0.65
6        99.507    0.79     20       100.863   0.97     33       99.877    0.99
7        99.707    0.28     21       99.713    0.65     34       100.503   0.39
8        99.717    0.83     22       100.050   0.69     35       100.053   0.76
9        100.537   1.26     23       100.283   1.24     36       99.783    1.23
10       100.097   0.39     24       99.910    0.75     37       100.367   1.69
11       99.633    0.92     25       100.510   1.62     38       100.503   0.70
12       100.883   1.05     26       99.723    0.79     39       100.270   0.69
13       100.843   1.01     27       99.327    0.23     40       99.377    0.18
14       100.507   0.50

$\bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{40}}{k} = \dfrac{4{,}003.229}{40} = 100.081$

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{40}}{k} = \dfrac{32.26}{40} = .8065$

Centerline = $\bar{\bar{x}}$ = 100.081

From Table IX, Appendix D, with n = 3, $A_2 = 1.023$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 100.081 + 1.023(.8065) = 100.906$
Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 100.081 - 1.023(.8065) = 99.256$
Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}A_2\bar{R} = 100.081 + \frac{2}{3}(1.023)(.8065) = 100.631$
Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}A_2\bar{R} = 100.081 - \frac{2}{3}(1.023)(.8065) = 99.531$
Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}A_2\bar{R} = 100.081 + \frac{1}{3}(1.023)(.8065) = 100.356$
Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}A_2\bar{R} = 100.081 - \frac{1}{3}(1.023)(.8065) = 99.806$
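The limit and zone-boundary arithmetic above follows one pattern: the centerline plus or minus a fraction of $A_2\bar{R}$. A minimal Python sketch of that pattern is given here; the function name `xbar_chart_limits` and the dictionary keys are illustrative assumptions, and the constant $A_2$ must still be looked up in Table IX for the subgroup size.

```python
def xbar_chart_limits(grand_mean, r_bar, a2):
    """Centerline, control limits, and zone boundaries for an x-bar chart.

    grand_mean: average of the sample means (x-double-bar)
    r_bar:      average of the sample ranges
    a2:         control chart constant A2 for the subgroup size n
    """
    half_width = a2 * r_bar  # distance from the centerline to each control limit
    return {
        "UCL": grand_mean + half_width,
        "upper_AB": grand_mean + (2 / 3) * half_width,
        "upper_BC": grand_mean + (1 / 3) * half_width,
        "centerline": grand_mean,
        "lower_BC": grand_mean - (1 / 3) * half_width,
        "lower_AB": grand_mean - (2 / 3) * half_width,
        "LCL": grand_mean - half_width,
    }


# Values from this exercise: n = 3, so A2 = 1.023
limits = xbar_chart_limits(grand_mean=100.081, r_bar=0.8065, a2=1.023)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```

The same function applies to the rounded data by substituting the rounded grand mean and average range.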

The $\bar{x}$-chart is:

[$\bar{x}$-chart of x-bar versus Sample (samples 1 to 40): UCL = 100.906, upper A-B boundary = 100.631, upper B-C boundary = 100.356, Centerline = 100.081, lower B-C boundary = 99.806, lower A-B boundary = 99.531, LCL = 99.256]

Using Rule 1, there are no points beyond Zone A. Therefore, the process appears to be in control.

b.

The sample means and ranges for the rounded data are:

Sample   x-bar     Range    Sample   x-bar     Range    Sample   x-bar     Range
1        100.000   0        15       100.667   1        28       100.333   1
2        99.333    1        16       100.667   1        29       100.000   2
3        100.000   0        17       100.333   1        30       100.000   0
4        100.333   2        18       99.000    0        31       100.667   1
5        99.000    0        19       100.667   1        32       99.667    1
6        99.667    1        20       101.333   1        33       99.667    1
7        100.000   0        21       99.667    1        34       100.333   1
8        99.667    1        22       100.000   0        35       100.000   0
9        100.667   1        23       100.333   1        36       99.667    1
10       100.000   0        24       100.000   0        37       100.333   1
11       99.667    1        25       100.667   1        38       100.333   1
12       101.333   1        26       99.667    1        39       100.333   1
13       101.333   1        27       99.000    0        40       99.000    0
14       100.333   1

$\bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{40}}{k} = \dfrac{4{,}003.667}{40} = 100.092$

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{40}}{k} = \dfrac{30}{40} = .75$

Centerline = $\bar{\bar{x}}$ = 100.092

From Table IX, Appendix D, with n = 3, $A_2 = 1.023$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 100.092 + 1.023(.75) = 100.859$
Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 100.092 - 1.023(.75) = 99.325$
Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}A_2\bar{R} = 100.092 + \frac{2}{3}(1.023)(.75) = 100.604$
Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}A_2\bar{R} = 100.092 - \frac{2}{3}(1.023)(.75) = 99.581$
Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}A_2\bar{R} = 100.092 + \frac{1}{3}(1.023)(.75) = 100.348$
Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}A_2\bar{R} = 100.092 - \frac{1}{3}(1.023)(.75) = 99.836$

The $\bar{x}$-chart is:

[$\bar{x}$-chart of x-bar versus Sample (samples 1 to 40): UCL = 100.859, upper A-B boundary = 100.604, upper B-C boundary = 100.348, Centerline = 100.092, lower B-C boundary = 99.836, lower A-B boundary = 99.581, LCL = 99.325]

Using Rule 1, there are seven points beyond Zone A. This process appears to be out of control. When the data are rounded, the process is obviously out of control.

13.22

a.

MINITAB was used to construct the control chart:

The centerline is $\bar{\bar{x}} = 1.5056$. The upper control limit is UCL = 1.6927. The lower control limit is LCL = 1.3186.

b.

The control chart is shown above.

c.

None of the rules is violated, so the process is in control.

d.

MINITAB was used to construct the control chart:

None of the rules is violated, so the process is still in control.

13.23

The R-chart is designed to monitor the variation of the process.

13.24

The control limits of the R-chart are a function of and reflect the variation in the process. If the variation is unstable (i.e., out of control), the control limits would not be constant. Under these circumstances, the fixed control limits of the $\bar{x}$-chart would have little meaning. We use the R-chart to determine whether the variation of the process is stable. If it is, the $\bar{x}$-chart is meaningful. Thus, we interpret the R-chart prior to the $\bar{x}$-chart.

13.25

Using Table IX, Appendix D:

a.

With n = 4, $D_3 = 0.000$ and $D_4 = 2.282$

b.

With n = 12, $D_3 = 0.283$ and $D_4 = 1.717$

c.

With n = 24, $D_3 = 0.451$ and $D_4 = 1.548$

13.26

a.

From Exercise 13.10, $\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{25}}{k} = \dfrac{198.7}{25} = 7.948$

Centerline = $\bar{R}$ = 7.948

From Table IX, Appendix D, with n = 5, $D_4 = 2.114$ and $D_3 = 0$.

Upper control limit = $\bar{R}D_4 = 7.948(2.114) = 16.802$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

b.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 7.948 + 2(.864)\dfrac{7.948}{2.326} = 13.853$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 7.948 - 2(.864)\dfrac{7.948}{2.326} = 2.043$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 7.948 + (.864)\dfrac{7.948}{2.326} = 10.900$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 7.948 - (.864)\dfrac{7.948}{2.326} = 4.996$

c.

The R-chart is:

[R-chart of R versus Sample (samples 1 to 25): UCL = 16.80, +AB = 13.85, +BC = 10.9, Centerline = 7.95, -BC = 5.00, -AB = 2.04]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

13.27

a.

From Exercise 13.11, the R values are:

Sample No.   R      Sample No.   R
1            1.8    11           3.2
2            2.8    12           0.9
3            3.8    13           2.6
4            2.5    14           4.0
5            3.7    15           2.2
6            5.0    16           4.3
7            5.5    17           3.6
8            3.5    18           2.5
9            2.5    19           2.2
10           4.1    20           5.5

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{20}}{k} = \dfrac{66.2}{20} = 3.31$

Centerline = $\bar{R}$ = 3.31

From Table IX, Appendix D, with n = 4, $D_4 = 2.282$ and $D_3 = 0$.

Upper control limit = $\bar{R}D_4 = 3.31(2.282) = 7.553$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

b.

From Table IX, Appendix D, with n = 4, $d_2 = 2.059$ and $d_3 = .880$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 3.31 + 2(.880)\dfrac{3.31}{2.059} = 6.139$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 3.31 - 2(.880)\dfrac{3.31}{2.059} = 0.481$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 3.31 + (.880)\dfrac{3.31}{2.059} = 4.725$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 3.31 - (.880)\dfrac{3.31}{2.059} = 1.895$

c.

The R-chart is:

[R-chart of R versus Sample (samples 1 to 20): UCL = 7.553, upper A-B boundary = 6.139, upper B-C boundary = 4.725, Centerline = 3.31, lower B-C boundary = 1.895, lower A-B boundary = 0.481]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

13.28

First, we construct an R-chart.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{20}}{k} = \dfrac{80.6}{20} = 4.03$

Centerline = $\bar{R}$ = 4.03

From Table IX, Appendix D, with n = 7, $D_3 = 0.076$ and $D_4 = 1.924$.

Upper control limit = $\bar{R}D_4 = 4.03(1.924) = 7.754$
Lower control limit = $\bar{R}D_3 = 4.03(0.076) = 0.306$

From Table IX, Appendix D, with n = 7, $d_2 = 2.704$ and $d_3 = .833$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 4.03 + 2(.833)\dfrac{4.03}{2.704} = 6.513$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 4.03 - 2(.833)\dfrac{4.03}{2.704} = 1.547$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 4.03 + (.833)\dfrac{4.03}{2.704} = 5.271$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 4.03 - (.833)\dfrac{4.03}{2.704} = 2.789$

The R-chart is:

[R-chart of R versus Sample (samples 1 to 20): UCL = 7.754, +AB = 6.513, +BC = 5.271, Centerline = 4.03, -BC = 2.789, -AB = 1.547, LCL = 0.306]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. Since the process variation is in control, it is appropriate to construct the $\bar{x}$-chart.

To construct an $\bar{x}$-chart, we first calculate the following:

$\bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{20}}{k} = \dfrac{434.56}{20} = 21.728$

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{20}}{k} = \dfrac{80.6}{20} = 4.03$


Centerline = $\bar{\bar{x}}$ = 21.728

From Table IX, Appendix D, with n = 7, $A_2 = .419$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 21.728 + .419(4.03) = 23.417$
Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 21.728 - .419(4.03) = 20.039$
Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}A_2\bar{R} = 21.728 + \frac{2}{3}(.419)(4.03) = 22.854$
Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}A_2\bar{R} = 21.728 - \frac{2}{3}(.419)(4.03) = 20.602$
Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}A_2\bar{R} = 21.728 + \frac{1}{3}(.419)(4.03) = 22.291$
Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}A_2\bar{R} = 21.728 - \frac{1}{3}(.419)(4.03) = 21.165$

The $\bar{x}$-chart is:

[$\bar{x}$-chart of Xbar versus Sample (samples 1 to 20): UCL = 23.42, +AB = 22.85, +BC = 22.29, Centerline = 21.73, -BC = 21.17, -AB = 20.60, LCL = 20.04]

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: There are 12 points beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: Points 6 through 12 steadily increase. This indicates the process is out of control.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are several groups of three consecutive points that have two or more in Zone A or beyond. This indicates the process is out of control.
Rule 6: Four out of five points in a row in Zone B or beyond: Several sequences of five points have four or more in Zone B or beyond. This indicates the process is out of control.

Rules 1, 3, 5, and 6 indicate that the process is out of control.
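Rules 1 through 4 are mechanical enough to check in code. The following is a minimal Python sketch of one way to do so; the function names are illustrative, and the list `points` is hypothetical data invented for this example (it is not the exercise data set). The limits passed in are the ones computed above for this $\bar{x}$-chart.

```python
def rule1_beyond_limits(points, lcl, ucl):
    """1-based positions of points outside the control limits (beyond Zone A)."""
    return [i for i, x in enumerate(points, start=1) if x > ucl or x < lcl]


def rule2_same_side(points, centerline, run=9):
    """True if `run` consecutive points fall on one side of the centerline."""
    streak, last_side = 0, 0
    for x in points:
        side = (x > centerline) - (x < centerline)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= run:
            return True
    return False


def rule3_trend(points, run=6):
    """True if `run` consecutive points steadily increase or steadily decrease."""
    up = down = 1
    for prev, cur in zip(points, points[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run or down >= run:
            return True
    return False


def rule4_alternating(points, run=14):
    """True if `run` consecutive points alternate up and down."""
    streak, last_sign = 1, 0
    for prev, cur in zip(points, points[1:]):
        sign = (cur > prev) - (cur < prev)
        if sign != 0 and sign == -last_sign:
            streak += 1          # direction flipped: the alternating run continues
        elif sign != 0:
            streak = 2           # a new run starts with this pair of points
        else:
            streak = 1           # a tie breaks the pattern
        last_sign = sign
        if streak >= run:
            return True
    return False


# Hypothetical sample means (illustration only), checked against the limits above
points = [21.5, 22.0, 21.2, 20.9, 21.8, 21.0, 21.4, 21.9, 22.3, 22.8, 23.1, 23.6]
print(rule1_beyond_limits(points, lcl=20.039, ucl=23.417))
print(rule2_same_side(points, centerline=21.728))
print(rule3_trend(points))
print(rule4_alternating(points))
```

Rules 5 and 6 follow the same sliding-window idea but also need the A-B and B-C boundaries, so they are left out of this sketch.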

13.29

a.

The rational subgroups used to construct the R-chart are the 40 samples of size n = 5.

b.

No, the process is not in control. There are two observations outside the control limits. Since the process is out of control, the special causes of variation need to be identified and eliminated.

13.30

a.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{20}}{k} = \dfrac{650}{20} = 32.5$

b.

Centerline = $\bar{R}$ = 32.5

From Table IX, Appendix D, with n = 10, $D_4 = 1.777$ and $D_3 = .223$.

Upper control limit = $\bar{R}D_4 = 32.5(1.777) = 57.753$
Lower control limit = $\bar{R}D_3 = 32.5(.223) = 7.248$

c.

From Table IX, Appendix D, with n = 10, $d_2 = 3.078$ and $d_3 = .797$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 32.5 + 2(.797)\dfrac{32.5}{3.078} = 49.331$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 32.5 - 2(.797)\dfrac{32.5}{3.078} = 15.669$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 32.5 + (.797)\dfrac{32.5}{3.078} = 40.915$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 32.5 - (.797)\dfrac{32.5}{3.078} = 24.085$

The R-chart is:

[R-chart of R versus Week (weeks 1 to 20): UCL = 57.75, +AB = 49.33, +BC = 40.92, Centerline = 32.5, -BC = 24.09, -AB = 15.67, LCL = 7.25]

The process appears to be in control. All of the observations are very close to the centerline and none of the patterns appear in the chart.

d.

The R-chart with the additional points is:

[R-chart of R versus Week (weeks 1 to 30): UCL = 57.75, +AB = 49.33, +BC = 40.92, Centerline = 32.5, -BC = 24.09, -AB = 15.67, LCL = 7.25]

e.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are nine points in Zone C (on one side of the centerline) or beyond. The last 9 points fit this pattern.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Because of Rule 2, it appears that the process variation is not in control.

13.31

a.

MINITAB was used to create the control chart:

$\bar{R} = .00479$

b.

LCL = 0, UCL = .01093

c.

The plot appears above.

d.

Since none of the rules is violated, the process is in control.

13.32

a.

The range for each of the 25 days is shown in the table below:

Sample   Range     Sample   Range
1        0.3679    14       0.2422
2        0.2517    15       0.3499
3        0.139     16       0.6823
4        0.3521    17       0.3589
5        0.3706    18       0.3153
6        0.2674    19       0.3062
7        0.4189    20       0.524
8        0.2447    21       0.2185
9        0.3589    22       0.1863
10       0.2658    23       0.2533
11       0.3509    24       0.1156
12       0.4204    25       0.3224
13       0.447

b.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{25}}{25} = .3252$

Centerline = $\bar{R}$ = .3252

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = .3252(2.114) = .6875$

Since $D_3 = 0$, the lower control limit is negative and not included.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = .3252 + 2(.864)\dfrac{.3252}{2.326} = .5668$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = .3252 - 2(.864)\dfrac{.3252}{2.326} = .0836$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = .3252 + (.864)\dfrac{.3252}{2.326} = .4460$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = .3252 - (.864)\dfrac{.3252}{2.326} = .2044$

c.

MINITAB was used to create the R-chart:

Since none of the rules is violated, the process is in control.

d.

The phase 2 data are plotted on the same chart:

Since none of the rules is violated, the process is in control.

13.33

a.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{16}}{k} = \dfrac{.3800}{16} = .0238$

Centerline = $\bar{R}$ = .0238

From Table IX, Appendix D, with n = 2, $D_3 = 0.000$ and $D_4 = 3.267$.

Upper control limit = $\bar{R}D_4 = .0238(3.267) = .0778$

Since $D_3 = 0$, the lower control limit is negative and not included.

From Table IX, Appendix D, with n = 2, $d_2 = 1.128$ and $d_3 = .853$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = .0238 + 2(.853)\dfrac{.0238}{1.128} = .0598$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = .0238 - 2(.853)\dfrac{.0238}{1.128} = -.0122$ or 0 (cannot be negative)
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = .0238 + (.853)\dfrac{.0238}{1.128} = .0418$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = .0238 - (.853)\dfrac{.0238}{1.128} = .0058$
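The R-chart calculations in this chapter all use the same recipe, so a small helper can reproduce them. The following Python sketch is one possible arrangement, not the manual's own method; the function name `r_chart_limits` and the choice to clip negative values at zero (mirroring the "cannot be negative" note above) are assumptions made for illustration. The constants must still be read from Table IX for the subgroup size.

```python
def r_chart_limits(r_bar, D3, D4, d2, d3):
    """Centerline, control limits, and zone boundaries for an R-chart.

    r_bar:  average of the sample ranges
    D3, D4: control limit constants for the subgroup size n
    d2, d3: constants used to estimate the standard deviation of R
    """
    sigma_r = d3 * r_bar / d2  # estimated standard deviation of the range
    raw = {
        "UCL": r_bar * D4,
        "upper_AB": r_bar + 2 * sigma_r,
        "upper_BC": r_bar + sigma_r,
        "centerline": r_bar,
        "lower_BC": r_bar - sigma_r,
        "lower_AB": r_bar - 2 * sigma_r,
        "LCL": r_bar * D3,
    }
    # Assumption: a range cannot be negative, so negative boundaries are set to 0
    return {name: max(value, 0.0) for name, value in raw.items()}


# Values from this exercise: n = 2
limits = r_chart_limits(r_bar=0.0238, D3=0.0, D4=3.267, d2=1.128, d3=0.853)
for name, value in limits.items():
    print(f"{name}: {value:.4f}")
```

For subgroup sizes where $D_3 > 0$ (for example n = 7 or n = 10), the same call returns a positive lower control limit.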

The R-chart is:

[R-chart of Range versus Sample (samples 1 to 16): UCL = 0.0778, upper A-B boundary = 0.0598, upper B-C boundary = 0.0418, Centerline = 0.0238, lower B-C boundary = 0.0058]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: This pattern does not exist.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern does not exist.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

b.

$\bar{\bar{x}} = \dfrac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{16}}{k} = \dfrac{3.5430}{16} = .2214$

Centerline = $\bar{\bar{x}}$ = .2214

From Table IX, Appendix D, with n = 2, $A_2 = 1.880$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = .2214 + 1.880(.0238) = .2661$
Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = .2214 - 1.880(.0238) = .1767$
Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}A_2\bar{R} = .2214 + \frac{2}{3}(1.880)(.0238) = .2512$
Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}A_2\bar{R} = .2214 - \frac{2}{3}(1.880)(.0238) = .1916$
Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}A_2\bar{R} = .2214 + \frac{1}{3}(1.880)(.0238) = .2363$
Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}A_2\bar{R} = .2214 - \frac{1}{3}(1.880)(.0238) = .2065$

The $\bar{x}$-chart is:

[$\bar{x}$-chart of Avg versus Sample (samples 1 to 16): UCL = 0.2661, upper A-B boundary = 0.2515, upper B-C boundary = 0.2363, Centerline = 0.2214, lower B-C boundary = 0.2065, lower A-B boundary = 0.1916, LCL = 0.1767]

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: There are no points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: This pattern does not exist.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern does not exist.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: This pattern does not exist.
Rule 6: Four out of five points in a row in Zone B or beyond: This pattern does not exist.

This process appears to be in control.

e.

Based on the R-chart and the $\bar{x}$-chart, the process appears to be in control. An estimate of the true average thickness of the expensive layer would be $\bar{\bar{x}} = .2214$.

13.34

a.

Yes. Because all five observations in each sample were selected from the same dispenser, the rational subgrouping will enable the company to detect variation in fill caused by differences in the carbon dioxide dispensers.

b.

For each sample, we compute the range, R = largest measurement - smallest measurement. The results are listed in the table:

Sample No.   R      Sample No.   R
1            .05    13           .05
2            .06    14           .04
3            .06    15           .05
4            .05    16           .05
5            .07    17           .06
6            .07    18           .06
7            .09    19           .05
8            .08    20           .08
9            .08    21           .08
10           .11    22           .12
11           .14    23           .12
12           .14    24           .15

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{24}}{k} = \dfrac{1.91}{24} = .0796$

Centerline = $\bar{R}$ = .0796

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = .0796(2.114) = .168$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = .0796 + 2(.864)\dfrac{.0796}{2.326} = .139$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = .0796 - 2(.864)\dfrac{.0796}{2.326} = .020$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = .0796 + (.864)\dfrac{.0796}{2.326} = .109$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = .0796 - (.864)\dfrac{.0796}{2.326} = .050$

The R-chart is:

[R-chart of R versus Sample Number (samples 1 to 24): UCL = 0.168, +AB = 0.139, +BC = 0.109, Centerline = 0.0796, -BC = 0.050, -AB = 0.020]

c.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

d.

Since the process variation is in control, the R-chart should be used to monitor future process output.

e.

The $\bar{x}$-chart should be constructed. The control limits of the $\bar{x}$-chart depend on the variation of the process. (In particular, they are constructed using $\bar{R}$.) If the variation of the process is in control, the control limits of the $\bar{x}$-chart are meaningful.

13.35

a.

The values of R for the samples are:

Sample   R       Sample   R       Sample   R       Sample   R
1        0.12    11       0.92    21       0.65    31       0.77
2        1.53    12       1.05    22       0.69    32       0.65
3        0.29    13       1.01    23       1.24    33       0.99
4        1.68    14       0.50    24       0.75    34       0.39
5        0.38    15       0.24    25       1.62    35       0.76
6        0.79    16       1.19    26       0.79    36       1.23
7        0.28    17       1.14    27       0.23    37       1.69
8        0.83    18       0.20    28       0.37    38       0.70
9        1.26    19       0.86    29       1.77    39       0.69
10       0.39    20       0.97    30       0.47    40       0.18

Centerline = $\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{40}}{k} = \dfrac{32.26}{40} = .8065$

From Table IX, Appendix D, with n = 3, $D_4 = 2.574$ and $D_3 = 0$.

Upper control limit = $\bar{R}D_4 = .8065(2.574) = 2.076$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 3, $d_2 = 1.693$ and $d_3 = .888$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = .8065 + 2(.888)\dfrac{.8065}{1.693} = 1.653$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = .8065 - 2(.888)\dfrac{.8065}{1.693} = -.040$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = .8065 + (.888)\dfrac{.8065}{1.693} = 1.23$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = .8065 - (.888)\dfrac{.8065}{1.693} = .383$

The R-chart is:

[R-chart of R versus Sample (samples 1 to 40): UCL = 2.076, +AB = 1.653, +BC = 1.23, Centerline = 0.807, -BC = 0.383, -AB = -0.04]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process variation is in control.

b.

The values of R for the rounded data are:

Sample   R    Sample   R    Sample   R    Sample   R
1        0    11       1    21       1    31       1
2        2    12       1    22       1    32       1
3        0    13       1    23       1    33       1
4        2    14       1    24       1    34       0
5        0    15       0    25       2    35       1
6        1    16       1    26       1    36       1
7        0    17       1    27       0    37       2
8        1    18       0    28       0    38       1
9        1    19       1    29       2    39       1
10       0    20       1    30       0    40       0

Centerline = $\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{40}}{k} = \dfrac{33}{40} = .825$

From Table IX, Appendix D, with n = 3, $D_4 = 2.574$ and $D_3 = 0$.

Upper control limit = $\bar{R}D_4 = .825(2.574) = 2.124$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 3, $d_2 = 1.693$ and $d_3 = .888$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = .825 + 2(.888)\dfrac{.825}{1.693} = 1.69$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = .825 - 2(.888)\dfrac{.825}{1.693} = -.04$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = .825 + (.888)\dfrac{.825}{1.693} = 1.258$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = .825 - (.888)\dfrac{.825}{1.693} = .392$

The R-chart for the rounded data is:

[R-chart of rounded R versus Sample (samples 1 to 40): UCL = 2.124, +AB = 1.69, +BC = 1.258, Centerline = 0.825, -BC = 0.392]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process variation is in control. There are very few different values for R, once the data have been rounded off.

13.36

a.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{20}}{k} = \dfrac{4 + 6 + \cdots + 15}{20} = \dfrac{176}{20} = 8.8$

Centerline = $\bar{R}$ = 8.8

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = 8.8(2.114) = 18.603$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 8.8 + 2(.864)\dfrac{8.8}{2.326} = 15.338$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 8.8 - 2(.864)\dfrac{8.8}{2.326} = 2.262$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 8.8 + (.864)\dfrac{8.8}{2.326} = 12.069$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 8.8 - (.864)\dfrac{8.8}{2.326} = 5.531$

The R-chart is:

[R-chart of R versus Week (weeks 1 to 20): UCL = 18.60, +AB = 15.34, +BC = 12.07, Centerline = 8.8, -BC = 5.53, -AB = 2.26]

b.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: Points 1 through 13 are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Because of Rule 2, it appears that the process variation is not in control. Special causes of variation appear to be present.

c.

Since the process variation appears to be out of control, the control limits of the R-chart should not be used to monitor future replacement cycle times until the special causes of variation have been identified and eliminated.

d.

It appears that as the week increases, the value of R tends to increase. This indicates that as time goes on, the variation in the time for replacement cards increases. The bank needs to address this issue.

13.37

a.

The process of interest is the production of bolts used in military aircraft.

b.

MINITAB was used to create the R-chart for the process:

c.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. No special causes of variation appear to be present.

d.

An example of a special cause of variation would be if the machine used to produce the bolts slipped out of alignment and started producing bolts of a different length. An example of common cause variation would be the grade of the raw material used to make the bolts.

e.

Since the process appears to be in control, it is appropriate to use these data to create the 𝑥̅ -chart. MINITAB was used to create the control chart:

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control. No special causes of variation appear to be present.

13.38

a.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{25}}{k} = \dfrac{2.5 + 1.5 + \cdots + 2.0}{25} = \dfrac{52}{25} = 2.08$

Centerline = $\bar{R}$ = 2.08

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = 2.08(2.114) = 4.397$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 2.08 + 2(.864)\dfrac{2.08}{2.326} = 3.625$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 2.08 - 2(.864)\dfrac{2.08}{2.326} = .535$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 2.08 + (.864)\dfrac{2.08}{2.326} = 2.853$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 2.08 - (.864)\dfrac{2.08}{2.326} = 1.307$

The R-chart is:

[R-chart of Range versus Sample (samples 1 to 25): UCL = 4.397, +AB = 3.625, +BC = 2.853, Centerline = 2.08, -BC = 1.307, -AB = 0.535]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control since none of the out-of-control signals are observed. No special causes of variation appear to be present.

b.

Since the process appears to be under control, it is appropriate to construct an $\bar{x}$-chart for the data.

c.

$\bar{R} = \dfrac{R_1 + R_2 + \cdots + R_{25}}{k} = \dfrac{2.5 + 0.0 + \cdots + 2.5}{25} = \dfrac{42.5}{25} = 1.7$

Centerline = $\bar{R}$ = 1.7

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = 1.7(2.114) = 3.594$

Since $D_3 = 0$, the lower control limit is negative and is not included on the chart.

From Table IX, Appendix D, with n = 5, $d_2 = 2.326$ and $d_3 = .864$.

Upper A-B boundary = $\bar{R} + 2d_3\dfrac{\bar{R}}{d_2} = 1.7 + 2(.864)\dfrac{1.7}{2.326} = 2.963$
Lower A-B boundary = $\bar{R} - 2d_3\dfrac{\bar{R}}{d_2} = 1.7 - 2(.864)\dfrac{1.7}{2.326} = .437$
Upper B-C boundary = $\bar{R} + d_3\dfrac{\bar{R}}{d_2} = 1.7 + (.864)\dfrac{1.7}{2.326} = 2.331$
Lower B-C boundary = $\bar{R} - d_3\dfrac{\bar{R}}{d_2} = 1.7 - (.864)\dfrac{1.7}{2.326} = 1.069$

The R-chart is:

[R-chart of R versus Sample (samples 1 to 25), with limits as printed on the chart: UCL = 4.228, +AB = 3.486, +BC = 2.743, Centerline = 2, -BC = 1.257, -AB = 0.514]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: Two points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Rule 1 indicates that the process is out of control. Since this process is out of control, it is not appropriate to construct an $\bar{x}$-chart for the data.

d.

We get two different answers as to whether this process is in control, depending on the accuracy of the data. When the data were measured to an accuracy of .5 gram, the process appears to be in control. However, when the data were measured to an accuracy of only 2.5 grams, the process appears to be out of control. The same data were used for each chart, just measured to different accuracies.

13.39

The p-chart is designed to monitor the proportion of defective units produced by a process.

13.40

a.

The sample size is determined by the following:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.01)}{.01} = 891$

The minimum sample size is 892.

b.

The sample size is determined by the following:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.05)}{.05} = 171$

The minimum sample size is 172.

c.

The sample size is determined by the following:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.10)}{.10} = 81$

The minimum sample size is 82.

d.

The sample size is determined by the following:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.20)}{.20} = 36$

The minimum sample size is 37.

13.41

The sample size is determined as follows:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.08)}{.08} = 103.5 \approx 104$

13.42

a.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:

$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p-hat     Sample No.   p-hat
    1        .080         14        .060
    2        .070         15        .070
    3        .045         16        .055
    4        .055         17        .040
    5        .075         18        .035
    6        .040         19        .060
    7        .060         20        .075
    8        .080         21        .045
    9        .085         22        .080
   10        .065         23        .065
   11        .075         24        .055
   12        .050         25        .050
   13        .045

b.

To get the total number of defectives, sum the number of defectives for all 25 samples. The sum is 303. To get the total number of units sampled, multiply the sample size by the number of samples: 200(25) = 5000.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{303}{5000} = .0606$

Centerline = $\bar{p}$ = .0606

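c.

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 + 3\sqrt{\frac{.0606(.9394)}{200}} = .1112$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 - 3\sqrt{\frac{.0606(.9394)}{200}} = .0100$

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 + 2\sqrt{\frac{.0606(.9394)}{200}} = .0943$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 - 2\sqrt{\frac{.0606(.9394)}{200}} = .0269$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 + \sqrt{\frac{.0606(.9394)}{200}} = .0775$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0606 - \sqrt{\frac{.0606(.9394)}{200}} = .0437$

d.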

The p-chart is:

[p-chart: p-hat plotted against Sample (0 to 25), with UCL=0.1112, +AB=0.0943, +BC=0.0775, Centerline=0.0606, -BC=0.0437, -AB=0.0269, LCL=0.0100]

e.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. There do not appear to be any special causes of variation.
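The centerline, control limits, and zone boundaries computed in this exercise all come from one standard error. The sketch below reproduces them in Python; the function name `p_chart_lines` and its parameter names are illustrative choices, not part of the text, and flooring the lower control limit at 0 is an assumption (a proportion cannot be negative).

```python
import math

def p_chart_lines(total_defective, n, k):
    """Centerline, 3-sigma control limits, and zone boundaries for a p-chart.

    total_defective: defectives summed over all samples
    n: units per sample (assumed constant across samples)
    k: number of samples
    """
    p_bar = total_defective / (n * k)
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)
    return {
        "UCL": p_bar + 3 * sigma_p,
        "upper A-B": p_bar + 2 * sigma_p,
        "upper B-C": p_bar + sigma_p,
        "centerline": p_bar,
        "lower B-C": p_bar - sigma_p,
        "lower A-B": p_bar - 2 * sigma_p,
        # Assumption: a negative lower limit is reported as 0.
        "LCL": max(0.0, p_bar - 3 * sigma_p),
    }

# Exercise 13.42: 303 defectives in 25 samples of 200 units each.
lines = p_chart_lines(total_defective=303, n=200, k=25)
for name, value in lines.items():
    print(f"{name:>10}: {value:.4f}")
```

Rounded to four decimals, the values agree with the hand calculations in parts b and c.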

13.43

a.

We must first calculate $\bar{p}$. To do this, it is necessary to find the total number of defectives in all the samples. To find the number of defectives per sample, we multiply the proportion by the sample size, 150. The number of defectives per sample is shown in the table:

Sample No.   p-hat   No. Defectives     Sample No.   p-hat   No. Defectives
    1         .03         4.5               11        .07        10.5
    2         .05         7.5               12        .04         6.0
    3         .10        15.0               13        .06         9.0
    4         .02         3.0               14        .05         7.5
    5         .08        12.0               15        .07        10.5
    6         .09        13.5               16        .06         9.0
    7         .08        12.0               17        .07        10.5
    8         .05         7.5               18        .02         3.0
    9         .07        10.5               19        .05         7.5
   10         .06         9.0               20        .03         4.5

Note: There cannot be a fraction of a defective. The proportions presented in the exercise have been rounded off. I have used the fractions to minimize the roundoff error.

To get the total number of defectives, sum the number of defectives for all 20 samples. The sum is 172.5. To get the total number of units sampled, multiply the sample size by the number of samples: 150(20) = 3000.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{172.5}{3000} = .0575$

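Centerline = $\bar{p}$ = .0575

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 + 3\sqrt{\frac{.0575(.9425)}{150}} = .1145$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 - 3\sqrt{\frac{.0575(.9425)}{150}} = .0005$

b.

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 + 2\sqrt{\frac{.0575(.9425)}{150}} = .0955$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 - 2\sqrt{\frac{.0575(.9425)}{150}} = .0195$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 + \sqrt{\frac{.0575(.9425)}{150}} = .0765$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0575 - \sqrt{\frac{.0575(.9425)}{150}} = .0385$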

c.

The p-chart is:

[p-chart: p-hat plotted against Sample (0 to 20), with UCL=0.1145, +AB=0.0955, +BC=0.0765, Centerline=0.0575, -BC=0.0385, -AB=0.0195, LCL=0.0005]

d.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: Points 7 through 20 alternate up and down. This indicates the process is out of control.

Rule 4 indicates that the process is out of control.

e.

Since the process is out of control, the centerline and control limits should not be used to monitor future process output. The centerline and control limits are intended to represent the behavior of the process when it is under control.
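The Rule 4 finding in part d can be checked mechanically. The following is a minimal sketch, assuming the 20 sample proportions listed in part a; the helper `longest_alternating_run` is a hypothetical name introduced here for illustration and is not a function from any statistics package.

```python
def longest_alternating_run(values):
    """Return (first, last) 1-based positions of the longest run of
    consecutive points that strictly alternate up and down."""
    best = (1, 1)
    start = 0        # 0-based index where the current run begins
    prev_dir = 0     # +1 for an increase, -1 for a decrease, 0 if none yet
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        direction = (diff > 0) - (diff < 0)
        if direction == 0:
            start = i            # a tie breaks the pattern
        elif prev_dir != 0 and direction == prev_dir:
            start = i - 1        # two moves in the same direction
        prev_dir = direction
        if i - start > best[1] - best[0]:
            best = (start + 1, i + 1)
    return best

# Sample proportions from Exercise 13.43, part a.
p_hat = [.03, .05, .10, .02, .08, .09, .08, .05, .07, .06,
         .07, .04, .06, .05, .07, .06, .07, .02, .05, .03]

first, last = longest_alternating_run(p_hat)
print(first, last, last - first + 1)
```

For these data the longest alternating run spans points 7 through 20, which is 14 points and therefore triggers Rule 4.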

13.44

Yes. The process is out of control because there are several points above the upper control limit and one point below the lower control limit.

13.45

a.

The attribute of interest is post-operative complications.

b.

The rational subgroups are the months.

c.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{294}{2939} = .100$

d.

The proportions are found by dividing the number of complications each month by the number of procedures each month. The proportions are:

Month  Complications  Procedures  Prop.     Month  Complications  Procedures  Prop.
  1         14           105      .133        16        13           110      .118
  2         12            97      .124        17         7            97      .072
  3         10           115      .087        18        10           105      .095
  4         12           100      .120        19         8            71      .113
  5          9            95      .095        20         5            48      .104
  6          7           111      .063        21        12            95      .126
  7          9            68      .132        22         9           110      .082
  8         11            47      .234        23         7           103      .068
  9          9            83      .108        24         9            95      .095
 10         12           108      .111        25        15           105      .143
 11         10           115      .087        26        12           100      .120
 12          7            94      .074        27         8           116      .069
 13         12           107      .112        28         2           110      .018
 14          9            99      .091        29         9           105      .086
 15         15           105      .143        30        10           120      .083

e.

Since the sample sizes varied for each sample, we will use the average sample size for n:

$\bar{n} = \frac{2939}{30} \approx 98$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 + 3\sqrt{\frac{.100(.900)}{98}} = .191$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 - 3\sqrt{\frac{.100(.900)}{98}} = .009$

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 + 2\sqrt{\frac{.100(.900)}{98}} = .161$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 - 2\sqrt{\frac{.100(.900)}{98}} = .039$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 + \sqrt{\frac{.100(.900)}{98}} = .130$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .100 - \sqrt{\frac{.100(.900)}{98}} = .070$

f.

The p-chart for the data is:

[p-chart: p plotted against Month (0 to 30), with UCL=0.191, +AB=0.161, +BC=0.13, Centerline=0.1, -BC=0.07, -AB=0.039, LCL=0.009]

g.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: One point is beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Rule 1 is violated. The process appears to be out of control.

13.46

a.

The centerline is $\bar{p} = .107$.

b.

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .107 + 3\sqrt{\frac{.107(.893)}{18}} = .326$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .107 - 3\sqrt{\frac{.107(.893)}{18}} = -.112$

c.

Since none of the observations were outside the UCL and LCL, the process could be in control. However, we would also need to have the zone boundaries to check for other patterns that could indicate the process is out of control.

13.47

a.

The proportion of damaged soybeans for each lot is found by dividing the number of damaged soybeans found in each lot by the sample size of n = 200. The results are:

Lot    1     2      3     4     5      6   7   8   9   10     11
VarA   0.05  0.025  0.02  0.01  0.015  0   0   0   0   0.005  0

b.

The centerline is $\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{25}{2200} = .011364$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0114 + 3\sqrt{\frac{.0114(.9886)}{200}} = .0338$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0114 - 3\sqrt{\frac{.0114(.9886)}{200}} = 0$ (the computed value is negative, and a proportion cannot be negative)

c.

The control chart is shown here:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: We see sample 1 beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process is out of control.

d.

The proportion of damaged soybeans for each lot is found by dividing the number of damaged soybeans found in each lot by the sample size of n = 200. The results are:

Lot    1   2   3     4     5      6     7   8     9     10    11
VarB   0   0   0.01  0.01  0.005  0.02  0   0.02  0.01  0.01  0.005

The centerline is $\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{18}{2200} = .008182$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0082 + 3\sqrt{\frac{.0082(.9918)}{200}} = .0273$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0082 - 3\sqrt{\frac{.0082(.9918)}{200}} = 0$ (the computed value is negative, and a proportion cannot be negative)

The control chart is shown here:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: No points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process is in control.

e.

The proportion of damaged soybeans for each lot is found by dividing the number of damaged soybeans found in each lot by the sample size of n = 200. The results are:

Lot    1     2   3   4      5     6      7     8      9   10     11
VarC   0.01  0   0   0.015  0.01  0.015  0.01  0.005  0   0.005  0

The centerline is $\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{14}{2200} = .00636$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0064 + 3\sqrt{\frac{.0064(.9936)}{200}} = .0232$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0064 - 3\sqrt{\frac{.0064(.9936)}{200}} = 0$ (the computed value is negative, and a proportion cannot be negative)

The control chart is shown here:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: No points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process is in control.

f.

The proportion of damaged soybeans for each lot is found by dividing the number of damaged soybeans found in each lot by the sample size of n = 200. The results are:

Lot    1   2   3   4   5   6   7      8      9   10  11
VarD   0   0   0   0   0   0   0.005  0.005  0   0   0

The centerline is $\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{2}{2200} = .00091$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0009 + 3\sqrt{\frac{.0009(.9991)}{200}} = .0073$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .0009 - 3\sqrt{\frac{.0009(.9991)}{200}} = 0$ (the computed value is negative, and a proportion cannot be negative)

The control chart is shown here:

To determine if the process is in control, we check the four rules.

Rule 1: One point beyond Zone A: No points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

It appears that the process is in control.

13.48

a.

To get the total number of defectives, sum the number of defectives for all 100 samples. The sum is 30. To get the total number of units sampled, multiply the sample size by the number of samples: 100(20) = 2,000.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{30}{2,000} = .015$

Centerline = $\bar{p}$ = .015

b.

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .015 + 3\sqrt{\frac{.015(.985)}{100}} = .051$

c.

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .015 - 3\sqrt{\frac{.015(.985)}{100}} = -.021$

d.

The p-chart is:

[p-chart: p-hat plotted against Sample (0 to 100), with UCL=0.051 and Centerline=0.015]

The process appears to be out of control as several observations are above the upper control limit.

e.

$UCL^* = UCL + \frac{4(1-2\bar{p})}{3n} = .051 + \frac{4(1-2(.015))}{3(20)} = .116$

$LCL^* = LCL + \frac{4(1-2\bar{p})}{3n} = -.021 + \frac{4(1-2(.015))}{3(20)} = .044$

The adjusted p-chart is:

[Adjusted p-chart: p-hat plotted against Sample (0 to 100), with UCL=0.116, Centerline=0.015, LCL=0.044]

The process appears to be out of control as several observations are below the lower control limit.

13.49

a.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 100:

$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p-hat     Sample No.   p-hat
    1         .02         16         .03
    2         .04         17         .02
    3         .13         18         .07
    4         .04         19         .03
    5         .01         20         .02
    6         .01         21         .03
    7         .10         22         .07
    8         .11         23         .04
    9         .09         24         .03
   10         .00         25         .02
   11         .03         26         .02
   12         .04         27         .00
   13         .02         28         .03
   14         .08         29         .01
   15         .02         30         .04

To get the total number of defectives, sum the number of defectives for all 30 samples. The sum is 120. To get the total number of units sampled, multiply the sample size by the number of samples: 100(30) = 3000.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{120}{3000} = .04$

The centerline is $\bar{p}$ = .04

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 + 3\sqrt{\frac{.04(1-.04)}{100}} = .099$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 - 3\sqrt{\frac{.04(1-.04)}{100}} = -.019$ or 0 (cannot be negative)

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 + 2\sqrt{\frac{.04(1-.04)}{100}} = .079$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 - 2\sqrt{\frac{.04(1-.04)}{100}} = .001$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 + \sqrt{\frac{.04(1-.04)}{100}} = .060$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .04 - \sqrt{\frac{.04(1-.04)}{100}} = .020$

The p-chart is:

[p-chart: p-hat plotted against Week (0 to 30), with UCL=0.099, +AB=0.079, +BC=0.06, Centerline=0.04, -BC=0.02, -AB=0.001]

b.

To determine if the process is in or out of control, we check the four rules for the p-chart.

Rule 1: One point beyond Zone A: There are 3 points beyond Zone A: points 3, 7, and 8.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process does not appear to be in control. Rule 1 indicates that the process is out of control.

c.

No. Since the process is not in control, these control limits are meaningless.

13.50

a.

MINITAB was used to construct the p-chart below:

b.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: There are many points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: We see several sequences of nine points in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process does not appear to be in control. Rules 1 and 2 indicate that the process is out of control.

c.

To compute the proportion of leaky pumps in each sample, divide the number of leaky pumps by the number in the sample, 500:

$\hat{p} = \frac{\text{No. leaky pumps}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Week   p-hat     Week   p-hat
  1    .072        8    .056
  2    .056        9    .062
  3    .048       10    .052
  4    .052       11    .068
  5    .040       12    .052
  6    .112       13    .064
  7    .052

To get the total number of leaky pumps, sum the number of leaky pumps for all 13 samples. The sum is 393. To get the total number of pumps sampled, multiply the sample size by the number of samples: 500(13) = 6,500.

$\bar{p} = \frac{\text{Total leaky pumps in all samples}}{\text{Total pumps sampled}} = \frac{393}{6500} = .060$

The centerline is $\bar{p}$ = .060

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 + 3\sqrt{\frac{.06(1-.06)}{500}} = .060 + .032 = .092$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 - 3\sqrt{\frac{.06(1-.06)}{500}} = .060 - .032 = .028$

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 + 2\sqrt{\frac{.06(1-.06)}{500}} = .060 + .021 = .081$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 - 2\sqrt{\frac{.06(1-.06)}{500}} = .060 - .021 = .039$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 + \sqrt{\frac{.06(1-.06)}{500}} = .060 + .011 = .071$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .060 - \sqrt{\frac{.06(1-.06)}{500}} = .060 - .011 = .049$

The p-chart is:

[p-chart for the leaky-pump proportions, with UCL=0.092 and upper zone boundaries at 0.081 (A–B) and 0.071 (B–C)]

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: One point lies beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: This pattern does not exist.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern does not exist.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be out of control because Rule 1 is violated. One observation is beyond Zone A. It appears that the process is not stable.

13.51

Any of the days that fall outside the control limits may involve special causes of variation.

13.52

a.

The sample size is determined by the following:

$n > \frac{9(1-p_0)}{p_0} = \frac{9(1-.07)}{.07} = 119.6 \approx 120$

The minimum sample size is 120.

b.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 120:

$\hat{p} = \frac{\text{No. defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p-hat     Sample No.   p-hat
    1        .092         11        .083
    2        .042         12        .100
    3        .033         13        .067
    4        .067         14        .050
    5        .083         15        .083
    6        .108         16        .042
    7        .075         17        .083
    8        .067         18        .083
    9        .083         19        .025
   10        .092         20        .067

To get the total number of defectives, sum the number of defectives for all 20 samples. The sum is 171. To get the total number of units sampled, multiply the sample size by the number of samples: 120(20) = 2,400.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{171}{2400} = .071$

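Centerline = $\bar{p}$ = .071

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 + 3\sqrt{\frac{.071(.929)}{120}} = .141$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 - 3\sqrt{\frac{.071(.929)}{120}} = .001$

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 + 2\sqrt{\frac{.071(.929)}{120}} = .118$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 - 2\sqrt{\frac{.071(.929)}{120}} = .024$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 + \sqrt{\frac{.071(.929)}{120}} = .094$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .071 - \sqrt{\frac{.071(.929)}{120}} = .048$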

The p-chart is:

[p-chart: p-hat plotted against Sample (0 to 20), with UCL=0.141, +AB=0.118, +BC=0.094, Centerline=0.071, -BC=0.048, -AB=0.024, LCL=0.001]

c.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points are in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increase or decrease.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control.

d.

Since the process is in control, it is appropriate to use the control limits to monitor future process output.

e.

No. The number of defectives recorded was per day, not per hour. Therefore, the p-chart is not capable of signaling hour-to-hour changes in p.
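The sample-size rule used in part a of this exercise, and in Exercises 13.40 and 13.41, is n > 9(1 − p0)/p0. A minimal Python sketch follows; the function name `min_sample_size` is an illustrative choice. It assumes p0 is passed as a decimal string so that exact rational arithmetic can be used, which avoids floating-point trouble when the bound is a whole number such as 891. When the bound is not a whole number, as in Exercise 13.41 and part a here, the result equals the bound rounded up.

```python
from fractions import Fraction
import math

def min_sample_size(p0):
    """Smallest integer n with n > 9(1 - p0)/p0.

    p0 is given as a decimal string (e.g. "0.01") so the bound is
    computed exactly rather than in floating point.
    """
    p = Fraction(p0)
    bound = 9 * (1 - p) / p
    return math.floor(bound) + 1

for p0 in ("0.01", "0.05", "0.10", "0.20", "0.08", "0.07"):
    print(p0, min_sample_size(p0))
```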



13.53

A capability analysis is a methodology used to help determine when common cause variation is unacceptably high. If a process is not in statistical control, then both common causes and special causes of variation exist. It would not be possible to determine if the common cause variation is too high because it could not be separated from special cause variation.

13.54

Specification spread is the difference between the upper specification limit and the lower specification limit. The specification spread is determined by customers, management, and product designers. Process spread is the spread of the actual output and is a function of the standard deviation of the data.

13.55

One way to assess the capability of a process is to construct a frequency distribution or stem-and-leaf display for a large sample of individual measurements from the process. Then, the specification limits and the target value for the output variable are added to the graph. This is called a capability analysis diagram. A second way to assess the capability of a process is to quantify capability. The most direct way to quantify capability is to count the number of items that fall outside the specification limits in the capability analysis diagram and report the percentage of such items in the sample. Alternatively, one can construct a capability index: the ratio of the specification spread to the process spread. This measure is called CP. If CP is less than 1, then the process is not capable.
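As a rough illustration of the capability index, the sketch below computes CP as the specification spread divided by the process spread (6 standard deviations). The second function is an added assumption, not something stated above: it gives the expected fraction of output outside the specification limits when the output is normally distributed and centered midway between the limits. Both function names are hypothetical.

```python
import math

def capability_index(usl, lsl, sigma):
    """CP = (USL - LSL) / (6 * sigma)."""
    return (usl - lsl) / (6 * sigma)

def fraction_outside_spec(cp):
    """Expected fraction outside the specification limits.

    Assumes normal output centered midway between LSL and USL, so each
    limit lies 3*cp standard deviations from the mean:
    P(|Z| > 3*cp) = erfc(3*cp / sqrt(2)).
    """
    return math.erfc(3 * cp / math.sqrt(2))

cp = capability_index(usl=22, lsl=21, sigma=0.2)
print(f"CP = {cp:.4f}")
print(f"Fraction outside spec at CP = 1: {fraction_outside_spec(1.0):.4f}")
```

Under those assumptions a process with CP = 1 leaves about 2.7 units per 1,000 outside the limits, and a larger CP leaves fewer.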

13.56

There are two reasons why CP should not be used in isolation. First, CP is a statistic and is subject to sampling error. The sample standard deviation is used to estimate the population standard deviation which is used to calculate the process spread. Thus, the estimate of the process spread can vary from sample to sample. Second, CP does not reflect the shape of the output distribution. Distributions with different shapes can have the same CP value.

13.57

a.

$C_p = 1.00$. For this value, the specification spread is equal to the process spread. This indicates that the process is capable. Approximately 2.7 units per 1,000 will be unacceptable.

b.

$C_p = 1.33$. For this value, the specification spread is greater than the process spread. This indicates that the process is capable. Approximately 63 units per 1,000,000 will be unacceptable.

c.

$C_p = 0.50$. For this value, the specification spread is less than the process spread. This indicates that the process is not capable.

d.

$C_p = 2.00$. For this value, the specification spread is greater than the process spread. This indicates that the process is capable. Approximately 2 units per billion will be unacceptable.

13.58

The specification spread is the difference between the upper specification limit and the lower specification limit.

a.

Specification spread = USL − LSL =19.65 −12.45 = 7.20

b.

Specification spread = USL − LSL = .0010 − .0008 = .0002

c.

Specification spread = USL − LSL = 1.43 −1.27 = 0.16

d.

Specification spread = USL − LSL = 490 − 486 = 4

13.59

The process spread is $6\sigma$.

a.

For $\sigma = 21$, the process spread is 6(21) = 126.

b.

For $\sigma = 5.2$, the process spread is 6(5.2) = 31.2.

c.

For $s = 110.06$, the process spread is estimated by 6(110.06) = 660.36.

d.

For $s = .0024$, the process spread is estimated by 6(.0024) = .0144.

13.60

$C_p = \frac{\text{Specification spread}}{\text{Process spread}} = \frac{USL - LSL}{6\sigma}$

a.

$C_p \approx \frac{USL - LSL}{6s} = \frac{1.0065 - 1.0035}{6(.0005)} = \frac{.003}{.003} = 1$

b.

$C_p \approx \frac{USL - LSL}{6s} = \frac{22 - 21}{6(.2)} = \frac{1}{1.2} = .8333$

c.

$C_p \approx \frac{USL - LSL}{6s} = \frac{875 - 870}{6(.75)} = \frac{5}{4.5} = 1.111$

13.61

We know that $C_p = \frac{USL - LSL}{6\sigma}$.

Thus, if $C_p = 2$, then $2 = \frac{USL - LSL}{6\sigma} \Rightarrow 12\sigma = USL - LSL$. The process mean is halfway between the USL and the LSL. Since the specification spread covers $12\sigma$, the USL must be $12\sigma/2 = 6\sigma$ from the process mean.

13.62

a.

If the output distribution is normal with a mean of 1.5 and a standard deviation of .002, then the proportion of the output that is unacceptable is:

$P(x < 1.47) + P(x > 1.53) = P\left(z < \frac{1.47 - 1.5}{.002}\right) + P\left(z > \frac{1.53 - 1.5}{.002}\right) = P(z < -15) + P(z > 15) \approx 0$

(using Table II, Appendix D)

The percentage of unacceptable output is 0%.

b.

$C_p = \frac{\text{Specification spread}}{\text{Process spread}} = \frac{USL - LSL}{6\sigma} = \frac{1.53 - 1.47}{6(.002)} = 5$

Since the value of CP is greater than 1, the process is capable.

13.63

The capability index is $C_p = \frac{USL - LSL}{6\sigma} = \frac{.7 - .1}{6(.265)} = \frac{.6}{1.59} = .377$.

Since the capability index is less than 1, the process is not capable. The process spread is wider than the specification spread.

13.64

a.

A capability analysis diagram is:

b.

From the sample, $\bar{x} = 19.1026$ and $s = .00238$.

$C_p \approx \frac{USL - LSL}{6s} = \frac{USL - LSL}{6(.00238)} = 1.12$

Since the CP value is greater than 1, the process is capable.

13.65

a.

A capability diagram is (LSL = 35 is off the chart):

[MINITAB output: Process Capability of Length; histogram axis from 36.825 to 37.200, with Target and USL marked]

Process Data: LSL *, Target 37, USL 37, Sample Mean 37.0066, Sample N 100, StDev (Overall) 0.0834026
Overall Capability: Pp *, PPL *, PPU -0.03, Ppk -0.03, Cpm 0.00
Observed Performance: % < LSL *, % > USL 51.00, % Total 51.00
Exp. Overall Performance: % < LSL *, % > USL 53.15, % Total 53.15

b.

Fifty-one percent of the observations are above the upper specification limit.

c.

From the sample, $\bar{x} = 37.007$ and $s = .0834$.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{37 - 35}{6(.0834)} = \frac{2}{.5004} = 3.9968$

d.

Since the CP value is greater than 1, the process is capable.


13.66

Using MINITAB, the capability analysis diagram is:

[MINITAB output: Process Capability of Run; histogram axis from 5.85 to 6.60, with LSL, Target, and USL marked]

Process Data: LSL 5.9, Target 6.3, USL 6.5, Sample Mean 6.20875, Sample N 80, StDev (Overall) 0.169872
Overall Capability: Pp 0.59, PPL 0.61, PPU 0.57, Ppk 0.57, Cpm 0.35
Observed Performance: % < LSL 2.50, % > USL 6.25, % Total 8.75
Exp. Overall Performance: % < LSL 3.46, % > USL 4.32, % Total 7.78

From the sample, $\bar{x} = 6.20875$ and $s = .169872$.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{6.5 - 5.9}{6(.169872)} = \frac{.6}{1.019232} = .5887$

Since the CP value is less than 1, the process is not capable. From the chart, we know that 8.75% of the 80 weights fall outside the specification limits. This is also an indication that the process is not capable.

13.67

Using MINITAB, the capability analysis diagram is:

[MINITAB output: Process Capability Report for Mean, with Target and USL marked]

Process Data: LSL *, Target 205, USL 205.05, Sample Mean 205.028, Sample N 20, StDev (Overall) 0.0057308
Overall Capability: Pp *, PPL *, PPU 1.28, Ppk 1.28, Cpm 0.57
Performance (Observed, Expected Overall): % < LSL *, *; % > USL 0.00, 0.01; % Total 0.00, 0.01

From the sample, $\bar{x} = 205.028$ and $s = .0057308$.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{205.05 - 204.05}{6(.0057308)} = \frac{1}{.03438} = 29.08$

Since the CP value is greater than 1, the process is capable.

From the chart, we know that none of the 20 observations fall outside the specification limits. This is also an indication that the process is capable.

13.68

a.

Using MINITAB, the simple statistics for the data are:

Descriptive Statistics: Thick

Variable   N    Mean      StDev     Minimum   Q1        Median    Q3        Maximum
Thick      32   0.22144   0.02031   0.16700   0.20850   0.22100   0.23725   0.25800

The standard deviation is $s = .0203$.

b.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{.30 - .10}{6(.0203)} = 1.642$. Since the value of Cp is greater than 1, the process is capable.

c.

From Exercise 13.33, the LCL = .1767 . We found in Exercise 13.33 that the process was under control. Since the process is under control and the LCL > LSL, this implies that the average thickness of the material can be lowered and still meet specifications.

13.69

The quality of a good or service is indicated by the extent to which it satisfies the needs and preferences of its users. Its eight dimensions are: performance, features, reliability, conformance, durability, serviceability, aesthetics, and other perceptions that influence judgments of quality.

13.70

A process is a series of actions or operations that transform inputs to outputs. A process produces output over time. An example of an organizational process is the manufacturing of a product.

13.71

A system is a collection or arrangement of interacting components that has an on-going purpose or mission. A system receives inputs from its environment, transforms those inputs to outputs, and delivers those outputs to its environment.

13.72

The six major sources of process variation are: people, machines, materials, methods, measurements, and environment.

13.73

Yes. Even though the output may all fall within the specification limits, the process may still be out of control.

13.74

Solution will vary. See Section 13.7 for Guided Solutions.

13.75

Solution will vary. See Section 13.7 for Guided Solutions.

13.76

Solution will vary. See Section 13.7 for Guided Solutions.

13.77

If a process is in control and remains in control, its future will be like its past. It is predictable in that its output will stay within certain limits. If a process is out of control, there is no way of knowing what the future pattern of output from the process may look like.

13.78

Common causes of variation are the methods, materials, equipment, personnel, and environment that make up a process and the inputs required by the process. That is, common causes are attributable to the design of the process. Special causes of variation are events or actions that are not part of the process design. Typically, they are transient, fleeting events that affect only local areas or operations within the process for a brief period of time. Occasionally, however, such events may have a persistent or recurrent effect on the process.


13.79

Control limits are a function of the natural variability of the process. The position of the limits is a function of the size of the process standard deviation. Specification limits are boundary points that define the acceptable values for an output variable of a particular product or service. They are determined by customers, management, and/or product designers. Specification limits may be either two-sided, with upper and lower limits, or one-sided with either an upper or lower limit. Specification limits are not dependent on the process in any way. The process may not be able to meet the specification limits even when it is under statistical control.

13.80

If a process is capable, then it is necessarily in control. If a process is in control, then the control chart should be used to monitor the process.

13.81

The CP statistic is used to assess capability if the process is stable (in control) and if the process is centered on the target value.

13.82

The probability of observing a value of x more than 3 standard deviations from its mean is: $P(x > \mu + 3\sigma) + P(x < \mu - 3\sigma) = P(z > 3) + P(z < -3) = (.5000 - .4987) + (.5000 - .4987) = .0026$

To find how many standard deviations from the mean the control limits should be set so that the probability of the chart falsely indicating the presence of a special cause of variation is .10, we must find the z-score $z_0$ such that: $P(z > z_0) + P(z < -z_0) = .1000$, or $P(z > z_0) = .0500$
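The tail-area condition above can also be checked numerically. The following is a minimal sketch using Python's standard library; the function names `two_sided_z` and `false_alarm_probability` are illustrative and do not come from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def two_sided_z(false_alarm_prob):
    """Return z0 such that P(z > z0) + P(z < -z0) equals false_alarm_prob."""
    return std_normal.inv_cdf(1 - false_alarm_prob / 2)

def false_alarm_probability(k):
    """Probability that a point falls more than k standard deviations from the mean."""
    return 2 * (1 - std_normal.cdf(k))

z0 = two_sided_z(0.10)          # limits for a .10 false-alarm probability
p3 = false_alarm_probability(3)  # false-alarm probability for 3-sigma limits

print(round(z0, 3))
print(round(p3, 4))  # about .0027; the table-based .0026 reflects four-decimal rounding
```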

Using Table II, Appendix D, $z_0 = 1.645$. Thus, the control limits should be set 1.645 standard deviations above and below the mean.

13.83

Rule 2: Nine points in a row in Zone C or beyond: A long sequence of points is in Zone C (on the lower side of the centerline) or beyond.

Since the chart is monitoring mortality rate, this is good news. The process is out of control, but only because the mortality rates are experiencing a long run of decreasing values.

13.84

a.

The centerline is $\bar{x} = \frac{\sum x}{n} = \frac{150.58}{20} = 7.529$. The time series plot is:

[Time series plot of Length versus Time (points 1 through 20), with centerline x-bar = 7.529.]

b.

The variation pattern that best describes this time series is the level shift. Points 1 through 10 all have fairly low values, while points 11 through 20 all have fairly high values.

13.85

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: Points 8 through 16 are in Zone C (on one side of the centerline) or beyond. This indicates the process is out of control.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: No group of three consecutive points has two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

Rule 2 indicates that the process is out of control. A special cause of variation appears to be present.

13.86

a.

Yes. The minimum sample size necessary so the lower control limit is not negative is $n > \frac{9(1 - p_0)}{p_0}$.

From the data, $\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{258}{200(21)} = .061$. Thus, $p_0 \approx .06$.

Thus, $n > \frac{9(1 - .06)}{.06} = 141$. Our sample size was 200.

b.

To compute the proportion of defectives in each sample, divide the number of defectives by the number in the sample, 200:

$\hat{p} = \frac{\text{No. of defectives}}{\text{No. in sample}}$

The sample proportions are listed in the table:

Sample No.   p-hat      Sample No.   p-hat
     1       .02            12       .10
     2       .03            13       .10
     3       .055           14       .085
     4       .06            15       .065
     5       .025           16       .05
     6       .05            17       .055
     7       .04            18       .035
     8       .08            19       .03
     9       .085           20       .04
    10       .10            21       .045
    11       .14

To get the total number of defectives, sum the number of defectives for all 21 samples. The sum is 258. To get the total number of units sampled, multiply the sample size by the number of samples: 200(21) = 4200.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{258}{4200} = .0614$

Centerline = $\bar{p} = .0614$

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + 3\sqrt{\frac{.0614(.9386)}{200}} = .1123$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - 3\sqrt{\frac{.0614(.9386)}{200}} = .0105$

Upper A-B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + 2\sqrt{\frac{.0614(.9386)}{200}} = .0953$

Lower A-B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - 2\sqrt{\frac{.0614(.9386)}{200}} = .0275$

Upper B-C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 + \sqrt{\frac{.0614(.9386)}{200}} = .0784$

Lower B-C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .0614 - \sqrt{\frac{.0614(.9386)}{200}} = .0444$
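The p-chart limits and zone boundaries can be reproduced with a short script. This is a sketch only: the function name `p_chart_limits` is hypothetical, and the centerline is rounded to four decimals before the limits are computed, which is an assumption made to match the hand calculation.

```python
from math import sqrt

def p_chart_limits(total_defectives, sample_size, num_samples):
    """Centerline, control limits, and zone boundaries for a p-chart."""
    # Round the centerline to four decimals, as in the hand calculation (assumption).
    p_bar = round(total_defectives / (sample_size * num_samples), 4)
    sigma = sqrt(p_bar * (1 - p_bar) / sample_size)
    return {
        "centerline": p_bar,
        "UCL": p_bar + 3 * sigma,
        "LCL": p_bar - 3 * sigma,
        "upper_AB": p_bar + 2 * sigma,
        "lower_AB": p_bar - 2 * sigma,
        "upper_BC": p_bar + sigma,
        "lower_BC": p_bar - sigma,
    }

# 258 defectives in 21 samples of 200 units each
limits = p_chart_limits(total_defectives=258, sample_size=200, num_samples=21)
for name, value in limits.items():
    print(f"{name}: {value:.4f}")
```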

The p-chart is:

[p-chart of p-hat versus Sample (1 through 21): UCL = 0.1123, +AB = 0.0953, +BC = 0.0784, Centerline = 0.0614, -BC = 0.0444, -AB = 0.0275, LCL = 0.0105.]

c.

To determine if the control limits should be used to monitor future process output, we need to check the four rules.

Rule 1: One point beyond Zone A: The 11th point is beyond Zone A. This indicates the process is out of control.
Rule 2: Nine points in a row in Zone C or beyond: There are not nine points in a row in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

Rule 1 indicates the process is out of control. These control limits should not be used to monitor future process output.
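Rules 1 and 2 are mechanical enough to check in code. The sketch below applies them to the 21 sample proportions from part b; the helper names `points_beyond_limits` and `longest_run_one_side` are illustrative, not part of the text, and Rules 3 and 4 are not implemented here.

```python
def points_beyond_limits(values, lcl, ucl):
    """Rule 1: 1-based positions of points outside the control limits."""
    return [i for i, v in enumerate(values, start=1) if v > ucl or v < lcl]

def longest_run_one_side(values, centerline):
    """Length of the longest run of consecutive points on one side of the centerline."""
    longest = current = 0
    previous_side = 0
    for v in values:
        side = (v > centerline) - (v < centerline)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        previous_side = side
        longest = max(longest, current)
    return longest

# Sample proportions for samples 1 through 21
p_hat = [.02, .03, .055, .06, .025, .05, .04, .08, .085, .10, .14,
         .10, .10, .085, .065, .05, .055, .035, .03, .04, .045]

rule1_points = points_beyond_limits(p_hat, lcl=0.0105, ucl=0.1123)
longest_run = longest_run_one_side(p_hat, centerline=0.0614)
rule2_violated = longest_run >= 9

print(rule1_points, longest_run, rule2_violated)
```

Under these inputs only the 11th point triggers Rule 1, and the longest one-sided run is shorter than nine points, which agrees with the conclusions stated for Rules 1 and 2.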

d.

Answers will vary.

13.87

a.

From the problem, we are given LCL = 12.3 and UCL = 13.8. We are given the sample means for the 30 observations, but not the ranges. Thus, we will have to compute $\bar{R}$ from the UCL and LCL. We can also compute $\bar{\bar{x}}$ from the UCL and LCL. From Table IX, Appendix D, with n = 6, $A_2 = .483$.

$\bar{\bar{x}} = \frac{UCL + LCL}{2} = \frac{13.76 + 12.26}{2} = 13.01$

$UCL = \bar{\bar{x}} + A_2\bar{R} \Rightarrow \bar{R} = \frac{UCL - \bar{\bar{x}}}{A_2} = \frac{13.76 - 13.01}{.483} = 1.55$

Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}A_2\bar{R} = 13.01 + \frac{2}{3}(.483)(1.55) = 13.51$

Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}A_2\bar{R} = 13.01 - \frac{2}{3}(.483)(1.55) = 12.51$

Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}A_2\bar{R} = 13.01 + \frac{1}{3}(.483)(1.55) = 13.26$

Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}A_2\bar{R} = 13.01 - \frac{1}{3}(.483)(1.55) = 12.76$

The $\bar{x}$-chart is:

[x-bar chart of x-bar versus DAY (1 through 30): UCL = 13.8, +AB = 13.51, +BC = 13.26, Centerline = 13.01, -BC = 12.76, -AB = 12.51, LCL = 12.3.]

b.

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: There is one point beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: Points 8 through 18 are all in the lower Zone C or below.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern does not exist.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: This pattern does not exist.
Rule 6: Four out of five points in a row in Zone B or beyond: Points 11 through 156 satisfy this rule.

This process appears to be out of control. Rules 1, 2, and 6 indicate that the process is out of control.

c.

Nine of these ten observations fall below the lower control limit. This would be extremely unusual if there were no under-reporting. We would conclude that there is under-reporting in the emissions data for this 10-day period.

13.88

a.

In order for the $\bar{x}$-chart to be meaningful, we must assume the variation in the process is constant (i.e., stable).

For each sample, we compute $\bar{x} = \frac{\sum x}{n}$ and R = range = largest measurement − smallest measurement. The results are listed in the table:

Sample No.   x-bar      R        Sample No.   x-bar      R
     1       32.325    11.6          13       31.050    13.3
     2       30.825    12.4          14       34.400     9.6
     3       30.450     7.8          15       31.350     7.3
     4       34.525    10.2          16       28.150     8.6
     5       31.725     9.1          17       30.950     7.6
     6       33.850    10.4          18       32.225     5.6
     7       32.100    10.1          19       29.050    10.0
     8       28.250     6.8          20       31.400     8.7
     9       32.375     8.7          21       30.350     8.9
    10       30.125     6.3          22       34.175    10.5
    11       32.200     7.1          23       33.275    13.0
    12       29.150     9.3          24       30.950     8.9

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{24}}{k} = \frac{755.225}{24} = 31.4677$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{24}}{k} = \frac{221.8}{24} = 9.242$

Centerline = $\bar{\bar{x}} = 31.468$

From Table IX, Appendix D, with n = 4, $A_2 = .729$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 31.468 + .729(9.242) = 38.205$

Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 31.468 - .729(9.242) = 24.731$

Upper A-B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R}) = 31.468 + \frac{2}{3}(.729)(9.242) = 35.960$

Lower A-B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R}) = 31.468 - \frac{2}{3}(.729)(9.242) = 26.976$

Upper B-C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R}) = 31.468 + \frac{1}{3}(.729)(9.242) = 33.714$

Lower B-C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R}) = 31.468 - \frac{1}{3}(.729)(9.242) = 29.222$
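As a cross-check on the arithmetic, the x-bar chart limits and zone boundaries follow directly from the grand mean, the average range, and the tabled constant $A_2$. The sketch below is illustrative; `xbar_chart_limits` is a hypothetical helper name, and the inputs are the rounded values used in the hand calculation.

```python
def xbar_chart_limits(grand_mean, r_bar, a2):
    """Control limits and zone boundaries for an x-bar chart based on the average range."""
    half_width = a2 * r_bar  # distance from the centerline to each control limit
    return {
        "centerline": grand_mean,
        "UCL": grand_mean + half_width,
        "LCL": grand_mean - half_width,
        "upper_AB": grand_mean + (2 / 3) * half_width,
        "lower_AB": grand_mean - (2 / 3) * half_width,
        "upper_BC": grand_mean + (1 / 3) * half_width,
        "lower_BC": grand_mean - (1 / 3) * half_width,
    }

# Grand mean 31.468, average range 9.242, A2 = .729 for samples of size n = 4
limits = xbar_chart_limits(grand_mean=31.468, r_bar=9.242, a2=0.729)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```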

The $\bar{x}$-chart is:

[x-bar chart of x-bar versus SAMPLE (1 through 24): UCL = 38.21, +AB = 35.96, +BC = 33.71, Centerline = 31.47, -BC = 29.22, -AB = 26.98, LCL = 24.73.]

b.

To determine if the process is in or out of control, we check the six rules.

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control. There are no indications that special causes of variation are affecting the process.

c.

Since the process appears to be in control, these limits should be used to monitor future process output.

13.89

a.

For each sample, we compute $\bar{x} = \frac{\sum x}{n}$ and R = range = largest measurement − smallest measurement.

The results are listed in the table:

Sample No.   x-bar     R        Sample No.   x-bar     R
     1       4.36     7.1           11       3.32     4.8
     2       5.10     7.7           12       4.02     4.8
     3       4.52     5.0           13       5.24     7.8
     4       3.42     5.8           14       3.58     3.9
     5       2.62     6.2           15       3.48     5.5
     6       3.94     3.9           16       5.00     3.0
     7       2.34     5.3           17       3.68     6.2
     8       3.26     3.2           18       2.68     3.9
     9       4.06     8.0           19       3.66     4.4
    10       4.96     7.1           20       4.10     5.5

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{20}}{k} = \frac{77.34}{20} = 3.867$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{20}}{k} = \frac{109.1}{20} = 5.455$

First, we construct an R-chart.

Centerline = $\bar{R} = 5.455$

From Table IX, Appendix D, with n = 5, $D_3 = 0.000$ and $D_4 = 2.114$.

Upper control limit = $\bar{R}D_4 = 5.455(2.114) = 11.532$

Since $D_3 = 0$, the computed lower control limit would be negative, so it is not included on the chart.

Upper A–B boundary = $\bar{R} + 2d_3\frac{\bar{R}}{d_2} = 5.455 + 2(.864)\frac{5.455}{2.326} = 9.508$

Lower A–B boundary = $\bar{R} - 2d_3\frac{\bar{R}}{d_2} = 5.455 - 2(.864)\frac{5.455}{2.326} = 1.402$

Upper B–C boundary = $\bar{R} + d_3\frac{\bar{R}}{d_2} = 5.455 + (.864)\frac{5.455}{2.326} = 7.481$

Lower B–C boundary = $\bar{R} - d_3\frac{\bar{R}}{d_2} = 5.455 - (.864)\frac{5.455}{2.326} = 3.429$
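The R-chart computations can be sketched in the same way. The constants $d_2$, $d_3$, $D_3$, and $D_4$ are passed in as arguments (values for n = 5 as quoted in the solution); the function name `r_chart_limits` is an illustrative choice, not something defined in the text.

```python
def r_chart_limits(r_bar, d2, d3, D3, D4):
    """Control limits and zone boundaries for an R-chart."""
    sigma_r = d3 * r_bar / d2  # estimated standard deviation of the sample range
    return {
        "centerline": r_bar,
        "UCL": D4 * r_bar,
        "LCL": D3 * r_bar,  # zero when D3 = 0, so no lower limit is drawn
        "upper_AB": r_bar + 2 * sigma_r,
        "lower_AB": r_bar - 2 * sigma_r,
        "upper_BC": r_bar + sigma_r,
        "lower_BC": r_bar - sigma_r,
    }

# Average range 5.455 with the n = 5 constants used in the solution
limits = r_chart_limits(r_bar=5.455, d2=2.326, d3=0.864, D3=0.0, D4=2.114)
for name, value in limits.items():
    print(f"{name}: {value:.3f}")
```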

The R-chart is:

[R-chart of R versus Sample (1 through 20): UCL = 11.53, +AB = 9.51, +BC = 7.48, Centerline = 5.46, -BC = 3.43, -AB = 1.40.]

b.

To determine if the process is in or out of control, we check the four rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. Since the process variation is in control, it is appropriate to construct the $\bar{x}$-chart.

c.

In order for the x -chart to be valid, the process variation must be in control. The R-chart checks to see if the process variation is in control.

d.

To construct an $\bar{x}$-chart, we first calculate the following:

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{20}}{k} = \frac{77.34}{20} = 3.867$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{20}}{k} = \frac{109.1}{20} = 5.455$

Centerline = $\bar{\bar{x}} = 3.867$

From Table IX, Appendix D, with n = 5, $A_2 = .577$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 3.867 + .577(5.455) = 7.015$

Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 3.867 - .577(5.455) = .719$

Upper A–B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R}) = 3.867 + \frac{2}{3}(.577)(5.455) = 5.965$

Lower A–B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R}) = 3.867 - \frac{2}{3}(.577)(5.455) = 1.769$

Upper B–C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R}) = 3.867 + \frac{1}{3}(.577)(5.455) = 4.916$

Lower B–C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R}) = 3.867 - \frac{1}{3}(.577)(5.455) = 2.818$

The $\bar{x}$-chart is:

[x-bar chart of x-bar versus Sample (1 through 20): UCL = 7.015, +AB = 5.965, +BC = 4.916, Centerline = 3.867, -BC = 2.818, -AB = 1.769, LCL = 0.719.]

e.

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: No sequence of six points steadily increases or decreases.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are no groups of three consecutive points that have two or more in Zone A or beyond.
Rule 6: Four out of five points in a row in Zone B or beyond: No sequence of five points has four or more in Zone B or beyond.

The process appears to be in control.

f.

Since both the R-chart and the $\bar{x}$-chart are in control, these control limits should be used to monitor future process output.

g.

A capability analysis diagram is:

[MINITAB capability analysis diagram, "Process Capability of Wait." Process data: USL = 5 (no LSL or target specified), Sample Mean = 3.867, Sample N = 100, StDev (Overall) = 2.18979. Overall capability: PPU = 0.17, Ppk = 0.17. Observed performance: % > USL = 27.00. Expected overall performance: % > USL = 30.24.]

h.

For an upper specification limit of 5, there are 27 observations above this limit. Thus, (27 / 100) × 100% = 27% of the observations are unacceptable. It does not appear that the process is capable.

i.

It is appropriate to estimate $C_p$ because the process is in control. From the sample, $\bar{x} = 3.867$ and $s = 2.190$.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{5 - 0}{6(2.19)} = \frac{5}{13.14} = .381$

Since the $C_p$ value is less than 1, the process is not capable.
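The capability index is a one-line calculation once the specification limits and the standard deviation estimate are known. A minimal sketch follows, using the sample standard deviation s in place of σ as the solution does; `cp_index` is a hypothetical helper name.

```python
def cp_index(usl, lsl, s):
    """Capability index: specification spread divided by the process spread (6 standard deviations)."""
    return (usl - lsl) / (6 * s)

# Waiting-time data: USL = 5, effective LSL = 0, s = 2.19
cp_wait = cp_index(usl=5, lsl=0, s=2.19)
is_capable = cp_wait >= 1  # a value below 1 means the process spread exceeds the specification spread

print(round(cp_wait, 3), is_capable)
```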

j.

There is no lower specification limit because management has no minimum acceptable waiting time. The variable being measured is the time customers wait in line, so the actual lower limit would be 0.

13.90

a.

To get the total number of defectives, sum the number of defectives for all 36 samples. The sum is 279. To get the total number of units sampled, multiply the sample size by the number of samples: 160(36) = 5,760.

$\bar{p} = \frac{\text{Total defective in all samples}}{\text{Total units sampled}} = \frac{279}{5760} = .048$

The centerline is $\bar{p} = .048$.

Upper control limit = $\bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + 3\sqrt{\frac{.048(1 - .048)}{160}} = .099$

Lower control limit = $\bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - 3\sqrt{\frac{.048(1 - .048)}{160}} = -.003$

Upper A–B boundary = $\bar{p} + 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + 2\sqrt{\frac{.048(1 - .048)}{160}} = .082$

Lower A–B boundary = $\bar{p} - 2\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - 2\sqrt{\frac{.048(1 - .048)}{160}} = .014$

Upper B–C boundary = $\bar{p} + \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 + \sqrt{\frac{.048(1 - .048)}{160}} = .065$

Lower B–C boundary = $\bar{p} - \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .048 - \sqrt{\frac{.048(1 - .048)}{160}} = .031$

The p-chart is:

[p-chart of p-hat versus Shift (1 through 36): UCL = 0.099, +AB = 0.082, +BC = 0.065, Centerline = 0.048, -BC = 0.031, -AB = 0.014.]

b.

To determine if the process is in or out of control, we check the four rules that apply to a p-chart:

Rule 1: One point beyond Zone A: No points are beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: This pattern is not present.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.

The process appears to be in control. Thus, there is no indication that special causes of variation are present.

c.

The Pareto diagram is:

[Pareto diagram of Sum of Defects by Type: Microcracks, Broken strands, Gaps, Voids.]

Most of the defects are due to microcracks. Thus, "microcracks" are the "vital few." The other types of defects (broken strands, gaps between layers, and internal voids) are the "trivial many."

13.91

a.

The capability analysis diagram is:

[MINITAB capability analysis diagram, "Process Capability of CARBPCT." Process data: LSL = 3.12, Target = 3.42, USL = 3.72, Sample Mean = 3.43091, Sample N = 33, StDev (Overall) = 0.198153. Overall capability: Pp = 0.50, PPL = 0.52, PPU = 0.49, Ppk = 0.49, Cpm = 0.50. Observed performance: % < LSL = 9.09, % > USL = 6.06, % Total = 15.15. Expected overall performance: % < LSL = 5.83, % > USL = 7.23, % Total = 13.06.]

b.

Two observations are above the upper specification limit and three observations are below the lower specification limit. Thus, the proportion of measurements that fall outside the specifications is 5 / 33 = .1515 .

c.

From the sample, $\bar{x} = 3.43$ and $s = .1982$.

$C_p = \frac{USL - LSL}{6\sigma} \approx \frac{3.72 - 3.12}{6(.1982)} = \frac{.6}{1.1892} = .505$

Since the $C_p$ value is less than 1, the process is not capable.

13.92

a.

The centerline is $\bar{\bar{x}} = 10.16$. From Table IX, Appendix D, with n = 5, $A_2 = .577$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 10.16 + .577(14.87) = 18.74$

Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 10.16 - .577(14.87) = 1.58$

b.

The upper and lower control limits are constructed so that they are 3 standard deviations from the mean or centerline. Thus, if an observation falls above the upper control limit, it is clearly an unusual observation. The probability of observing an observation above the upper control limit is P ( z > 3 ) = .5 − .4987 = .0013 .

13.93

First, we must compute the range for each sample. The range = R = largest measurement − smallest measurement. The results are listed in the table:

Sample No.    R       Sample No.    R       Sample No.    R
     1       2.0          25       4.6          49       4.0
     2       2.1          26       3.0          50       4.9
     3       1.8          27       3.4          51       3.8
     4       1.6          28       2.3          52       4.6
     5       3.1          29       2.2          53       7.1
     6       3.1          30       3.3          54       4.6
     7       4.2          31       3.6          55       2.2
     8       3.6          32       4.2          56       3.6
     9       4.6          33       2.4          57       2.6
    10       2.6          34       4.5          58       2.0
    11       3.5          35       5.6          59       1.5
    12       5.3          36       4.9          60       6.0
    13       5.5          37      10.2          61       5.7
    14       5.6          38       5.5          62       5.6
    15       4.6          39       4.7          63       2.3
    16       3.0          40       4.7          64       2.3
    17       4.6          41       3.6          65       2.6
    18       4.5          42       3.0          66       3.8
    19       4.8          43       2.2          67       2.8
    20       5.4          44       3.3          68       2.2
    21       5.5          45       3.2          69       4.2
    22       3.8          46       0.8          70       2.6
    23       3.6          47       4.2          71       1.0
    24       2.5          48       5.6          72       1.9

$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{72}}{k} = \frac{3537.3}{72} = 49.129$

$\bar{R} = \frac{R_1 + R_2 + \cdots + R_{72}}{k} = \frac{268.8}{72} = 3.733$

Centerline = $\bar{\bar{x}} = 49.129$

From Table IX, Appendix D, with n = 6, $A_2 = .483$.

Upper control limit = $\bar{\bar{x}} + A_2\bar{R} = 49.129 + .483(3.733) = 50.932$

Lower control limit = $\bar{\bar{x}} - A_2\bar{R} = 49.129 - .483(3.733) = 47.326$

Upper A−B boundary = $\bar{\bar{x}} + \frac{2}{3}(A_2\bar{R}) = 49.129 + \frac{2}{3}(.483)(3.733) = 50.331$

Lower A−B boundary = $\bar{\bar{x}} - \frac{2}{3}(A_2\bar{R}) = 49.129 - \frac{2}{3}(.483)(3.733) = 47.927$

Upper B−C boundary = $\bar{\bar{x}} + \frac{1}{3}(A_2\bar{R}) = 49.129 + \frac{1}{3}(.483)(3.733) = 49.730$

Lower B−C boundary = $\bar{\bar{x}} - \frac{1}{3}(A_2\bar{R}) = 49.129 - \frac{1}{3}(.483)(3.733) = 48.528$

The $\bar{x}$-chart is:

[x-bar chart of x-bar versus Sample (1 through 72): UCL = 50.932, +AB = 50.331, +BC = 49.73, Centerline = 49.129, -BC = 48.528, -AB = 47.927, LCL = 47.326.]

To determine if the process is in or out of control, we check the six rules:

Rule 1: One point beyond Zone A: There are a total of 17 points beyond Zone A.
Rule 2: Nine points in a row in Zone C or beyond: No sequence of nine points is in Zone C (on one side of the centerline) or beyond.
Rule 3: Six points in a row steadily increasing or decreasing: There is one sequence of seven points that are steadily increasing: Points 15 through 21.
Rule 4: Fourteen points in a row alternating up and down: This pattern does not exist.
Rule 5: Two out of three points in Zone A or beyond: There are four groups of at least three points in Zone A or beyond: Points 12–16, Points 35–37, Points 39–41, and Points 60–63.
Rule 6: Four out of five points in a row in Zone B or beyond: There are several groups of points that satisfy this rule.

The process appears to be out of control. Rules 1, 3, 5, and 6 indicate that the process is out of control.

No. The problem does not give the times of the shifts. However, suppose we let the first shift be from 6:00 A.M. to 2:00 P.M., the second shift be from 2:00 P.M. to 10:00 P.M., and the third shift be from 10:00 P.M. to 6:00 A.M. If this is the case, the major problems are during the second shift.

Chapter 14 Time Series: Descriptive Analyses, Models, and Forecasting

14.1

To calculate a simple index number, first obtain the prices or quantities over a time period and select a base year. For each time period, the index number is the number at that time period divided by the value at the base period multiplied by 100.

14.2

a.

The simple composite index is calculated as follows: First, sum the observations for all the series of interest at each time period. Select the base time period. Divide each sum by the sum in the base time period and multiply by 100.

b.

To calculate a weighted composite index, we use the following steps: First, multiply the observations in each time series by the appropriate weight. Then sum the weighted observations across all time series for each time period. Select the base time period. Divide each weighted sum by the weighted sum in the base time period and multiply by 100.

c.

The steps necessary to compute a Laspeyres index are:

1. Collect data for each of k price series.
2. Select a base time period and collect purchase quantity information for each of the k series at the base time period.
3. Using the purchase quantity values at the base period as weights, multiply each value in each price series by its corresponding weight.
4. Sum the products for each time period.
5. Divide each sum by the sum corresponding to the base period and multiply by 100.

d.

The steps necessary to compute a Paasche index are:

1. Collect data for each of k price series.
2. Select a base period.
3. Collect purchase quantity information for each series at each time period.
4. For each time period, multiply the value in each price series by its corresponding purchase quantity for that time period. Sum the products for each time period.
5. To find the value of the Paasche index at a particular time period, multiply the purchase quantity values (weights) for that time period by the corresponding price values of the base time period. Sum the results for the base period.

The Paasche index is then found by dividing the sum found in (4) by the sum found in (5) and multiplying by 100.

14.3

A Laspeyres index uses the purchase quantity at the base period as the weights for all other time periods. A Paasche index uses the purchase quantity at each time period as the weight for that time period. The weights at the specified time period are also used with the base period to find the index.

14.4

a.

The simple index for the quarter 4 price of product A, using quarter 1 as the base period is ( 4.25 / 3.25 ) × 100 = 130.77 .

b.

The simple index for the quarter 2 price of product B, using quarter 1 as the base period is (1.25 / 1.75) × 100 = 71.43.

c.

To find the simple composite index, we must first sum the prices for all three products over the base period and the quarter for which we want to compute the simple composite index. The sum for quarter 1 is 3.25 + 1.75 + 8.00 = 13.00. The sum for quarter 4 is 4.25 + 1.00 + 10.50 = 15.75. The simple composite index for quarter 4 using quarter 1 as the base period is (15.75 / 13.00) × 100 = 121.15.

d.

The sum of all the products for quarter 2 is 3.50 + 1.25 + 9.35 = 14.10. The simple composite index for quarter 4 using quarter 2 as the base period is (15.75 / 14.10) × 100 = 111.70.

14.5

a.

To find the Laspeyres index, we use the quantities for the base period as the weights. We multiply the quantity for quarter 1 times the prices for quarters 1 and 4 for each product (A, B, or C). We then sum the products for both time periods. Finally, we divide the sum for quarter 4 by the sum for quarter 1 and multiply by 100. The sum of the products for quarter 1 is 100(3.25) + 20(1.75) + 50(8.00) = 325 + 35 + 400 = 760. The sum of the products for quarter 4 is 100(4.25) + 20(1.00) + 50(10.50) = 425 + 20 + 525 = 970. The Laspeyres index is (970 / 760) × 100 = 127.63.

b.

To find the Paasche index, we use the quantities for all time periods as weights. We multiply the quantity for each quarter and each product by the corresponding price. We then sum these products for the base period (quarter 2) and the quarter for which we want to compute the Paasche index (quarter 4). The sum for quarter 2 is 300(3.50) + 100(1.25) + 20(9.35) = 1050 + 125 + 187 = 1362. The sum of the products for quarter 4 is 300(4.25) + 100(1.00) + 20(10.50) = 1275 + 100 + 210 = 1585. The Paasche index is (1585 / 1362) × 100 = 116.37.
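Both indices are ratios of weighted price sums that differ only in which period's quantities serve as weights. The sketch below reproduces the two calculations with the prices and quantities quoted in the solution; the function name `weighted_price_index` and the variable names are illustrative.

```python
def weighted_price_index(prices, base_prices, quantities):
    """100 * (sum of price * quantity) / (sum of base price * quantity), same weights in both sums."""
    numerator = sum(p * q for p, q in zip(prices, quantities))
    denominator = sum(p * q for p, q in zip(base_prices, quantities))
    return 100 * numerator / denominator

# Prices for products A, B, C as quoted in the solution
q1_prices = [3.25, 1.75, 8.00]
q2_prices = [3.50, 1.25, 9.35]
q4_prices = [4.25, 1.00, 10.50]

# Quantities used as weights
q1_quantities = [100, 20, 50]   # base-period weights (Laspeyres, base quarter 1)
q4_quantities = [300, 100, 20]  # current-period weights (Paasche for quarter 4)

laspeyres = weighted_price_index(q4_prices, q1_prices, q1_quantities)
paasche = weighted_price_index(q4_prices, q2_prices, q4_quantities)

print(round(laspeyres, 2), round(paasche, 2))
```

Passing the weights as an argument keeps one function for both indices: Laspeyres fixes the weights at the base period, while Paasche uses the weights of the period being indexed.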

14.6

a.

To find the simple index, divide each value by the value for the base year and multiply by 100. The index numbers are:

                   Simple Index       Simple Index
YEAR    INCOME    Base Year 1990     Base Year 1994
1990    29943        100.00              92.81
1994    32264        107.75             100.00
1998    38885        129.86             120.52
2002    42409        141.63             131.45
2006    48201        160.97             149.40
2010    49276        164.56             152.73
2014    53657        179.20             166.31
2018    63179        211.00             195.82

b.

The index value for 2002 is 141.63 when the base is 1990. Thus, the median annual family income for 2002 increased by 141.63 − 100 = 41.63% over the median annual family income in 1990. The index value for 2002 is 131.45 when the base is 1994. Thus, the median annual family income for 2002 increased by 131.45 − 100 = 31.45% over the median annual family income in 1994.

14.7

a.

To find the simple index, divide each value by the value for the base year and multiply by 100. The index numbers are:

                       Simple Index       Simple Index
YEAR    PRODUCTION    Base Year 2004     Base Year 2010
2004       5.83          100.00              57.55
2005       6.29          107.89              62.09
2006       7.10          121.78              70.09
2007       7.98          136.88              78.78
2008       8.49          145.63              83.81
2009       9.07          155.57              89.54
2010      10.13          173.76             100.00
2011      11.46          196.57             113.13
2012      13.24          227.10             130.70
2013      15.51          266.04             153.11
2014      22.16          380.10             218.76
2015      24.52          420.58             242.05
2016      24.30          416.81             239.88
2017      25.00          428.82             246.79
2018      25.50          437.39             251.73
2019      26.30          451.11             259.62

The index value for 2019 is 451.11 when the base is 2004. Thus, the craft beer production for 2019 increased by 451.11 − 100 = 351.11% over craft beer production in 2004.

b.

This is a quantity index.

c.

The values of the simple index using 2010 as the base are listed in the table in part a. Using MINITAB, the graph of the indices is:

Both indices increase over time, and both increase at an increasing rate.

14.8

a.

To compute the simple index, divide each housing start value by the 2017, Quarter 1 value, 144, and then multiply by 100.

YEAR    QUARTER    STARTS    Simple Index
2017       1        144        100.00
2017       2        180        125.00
2017       3        168        116.67
2017       4        148        102.78
2018       1        158        109.72
2018       2        194        134.72
2018       3        171        118.75
2018       4        138         95.83
2019       1        153        106.25
2019       2        181        125.69
2019       3        176        122.22
2019       4        163        113.19

b.

The value of the index for Quarter 2, 2018 is 134.72. Thus, the housing starts in Quarter 2, 2018 increased by 134.72 − 100 = 34.72% over the housing starts in the base quarter, Quarter 1, 2017.

c.

The value of the index for Quarter 4, 2019 is 113.19. Thus, the housing starts in Quarter 4, 2019 increased by 113.19 − 100 = 13.19% over the housing starts in the base quarter, Quarter 1, 2017.

d.

The number of housing starts for Quarter 1, 2018 is 158 thousand. The number of housing starts for Quarter 4, 2019 is 163 thousand. Using Quarter 1, 2018 as the base, the index for Quarter 4, 2019 is (163 / 158) × 100 = 103.16. Thus, the number of housing starts in Quarter 4, 2019 increased by 3.16% over the housing starts in Quarter 1, 2018.

14.9

a.

To compute the simple index, divide each spot price value by the 2000 value, 4.31, and then multiply by 100.

YEAR    PRICE    Simple Index
2000    4.31       100.00
2001    3.96        91.88
2002    3.38        78.42
2003    5.47       126.91
2004    5.89       136.66
2005    8.69       201.62
2006    6.73       156.15
2007    6.97       161.72
2008    8.86       205.57
2009    3.94        91.42
2010    4.37       101.39
2011    4.00        92.81
2012    2.75        63.81
2013    3.73        86.54
2014    4.37       101.39
2015    2.62        60.79
2016    2.52        58.47
2017    2.99        69.37
2018    3.15        73.09
2019    2.56        59.40
2020    1.85        42.92

The plot of the simple index is:

b.

The price of natural gas basically increased from 2000 to 2005, remained fairly high from 2005 to 2008, and then basically decreased through 2020.

c.

This is a price index because the values used were the spot prices of natural gas.

14.10

a.

To compute the simple index for the agricultural data, divide each farm value by the 1980 value, 3,364, and then multiply by 100. To compute the simple index for the nonagricultural data, divide each nonfarm value by the 1980 value, 95,938, and then multiply by 100. The two indices are:

YEAR    FARM    NONFARM    Farm Index    Nonfarm Index
1980    3364     95938       100.00         100.00
1985    3179    103971        94.50         108.37
1990    3223    115570        95.81         120.46
1995    3440    121460       102.26         126.60
2000    2464    134427        73.25         140.12
2005    2197    139532        65.31         145.44
2010    2206    136858        65.58         142.65
2015    2422    146411        72.00         152.61
2020    2399    131045        71.31         136.59

b.

The nonfarm segment has shown the greater percentage change in employment over the time period. The nonfarm employment in 2020 was 36.59% greater than in 1980. The farm employment in 2020 was 28.69% lower than in 1980.
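The simple index calculation is the same for every series: divide by the base-period value and multiply by 100. A minimal sketch using the farm and nonfarm employment figures from part a is shown here; `simple_index` is an illustrative helper name, and values are rounded to two decimals as in the table.

```python
def simple_index(values, base_value):
    """Simple index numbers: each value as a percentage of the base-period value."""
    return [round(100 * v / base_value, 2) for v in values]

years = [1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020]
farm = [3364, 3179, 3223, 3440, 2464, 2197, 2206, 2422, 2399]
nonfarm = [95938, 103971, 115570, 121460, 134427, 139532, 136858, 146411, 131045]

farm_index = simple_index(farm, base_value=farm[0])          # base year 1980
nonfarm_index = simple_index(nonfarm, base_value=nonfarm[0])  # base year 1980

# Percentage change from 1980 to 2020 is the final index value minus 100
farm_change = round(farm_index[-1] - 100, 2)
nonfarm_change = round(nonfarm_index[-1] - 100, 2)

for year, f, nf in zip(years, farm_index, nonfarm_index):
    print(year, f, nf)
print(farm_change, nonfarm_change)
```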

c.

To compute the simple composite index, first sum the two values (farm and nonfarm) for every time period. Then divide each sum by the sum in 1980, 99,302, and multiply by 100. The simple composite index is:

YEAR     Sum      Composite Index
1980     99302        100.00
1985    107150        107.90
1990    118793        119.63
1995    124900        125.78
2000    136891        137.85
2005    141729        142.73
2010    139064        140.04
2015    148833        149.88
2020    133444        134.38

d.

The simple composite index value for 2010 is 140.04. The composite employment is 40.04% higher in 2010 than in 1980.

14.11

a.

To compute the simple composite index, first sum the three values (durables, nondurables, and services) for every time period. Then, divide each sum by the sum in 1970, 649, and multiply by 100. The simple composite index with base year 1970 (along with the 1980-based index from part b) is:

                  Simple Composite Index
Year     Sum     Base Year 1970    Base Year 1980
1970      649        100.00            36.98
1975     1025        157.94            58.40
1980     1755        270.42           100.00
1985     2667        410.94           151.97
1990     3835        590.91           218.52
1995     4988        768.57           284.22
2000     6830       1052.39           389.17
2005     8819       1358.86           502.51
2010    10348       1594.45           589.63
2015    12276       1891.53           699.49
2020    13818       2129.12           787.35

b.

To update the 1970 index to the 1980 index, divide the 1970 index values by the 1970 index value for 1980, 270.42, and then multiply by 100. The 1980 simple composite index is also listed in the table in part a.
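Changing the base year can be done either from the raw sums or by rescaling an existing index. The sketch below does both with the sums from part a; the function names `composite_index` and `rebase` are illustrative. Because the 1970-based values are already rounded, the rescaled index may differ from the directly computed one in the second decimal place.

```python
def composite_index(sums, base_year):
    """Simple composite index computed directly from the period sums."""
    return {year: round(100 * total / sums[base_year], 2) for year, total in sums.items()}

def rebase(index, new_base_year):
    """Rescale an existing index so that new_base_year equals 100."""
    return {year: round(100 * value / index[new_base_year], 2) for year, value in index.items()}

sums = {1970: 649, 1975: 1025, 1980: 1755, 1985: 2667, 1990: 3835, 1995: 4988,
        2000: 6830, 2005: 8819, 2010: 10348, 2015: 12276, 2020: 13818}

index_1970 = composite_index(sums, base_year=1970)
index_1980_direct = composite_index(sums, base_year=1980)
index_1980_rebased = rebase(index_1970, new_base_year=1980)

for year in sums:
    print(year, index_1970[year], index_1980_direct[year], index_1980_rebased[year])
```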

c.

The graph of the two indices is:

Changing the base year from 1970 to 1980 flattens out the graph. Also, the spread of the values for the 1980 index is much smaller than the spread of the values in the 1970 index.

14.12

a.

To find the Laspeyres index, we multiply the durable goods by 10.9, the nondurable goods by 14.02, and the services by 42.6. The three products are then summed. The index is found by dividing the weighted sum at each time period by the weighted sum for 1970 (18,249.58) and then multiplying by 100. The Laspeyres index and the simple composite index, both with base year 1970 (the latter computed in Exercise 14.11), are:

Year     Sum   Simple Composite    Weighted Sum   Laspeyres
               Base Year - 1970                   Base Year - 1970
1970     649        100.00           18249.58        100.00
1975    1025        157.94           27527.92        150.84
1980    1755        270.42           51222.46        280.68
1985    2667        410.94           76159.08        417.32
1990    3835        590.91          119207.58        653.21
1995    4988        768.57          158603.20        869.08
2000    6830       1052.39          217821.86       1193.57
2005    8819       1358.86          284383.76       1558.30
2010   10348       1594.45          339540.62       1860.54
2015   12276       1891.53          404976.98       2219.10
2020   13818       2129.12          462371.50       2533.60

b.

The plot of the two indices is:

The two indices are very similar from 1970 to approximately 1985. After 1985, the difference between the two indices becomes larger, with the Laspeyres index increasing faster than the simple composite index.

14.13

a.

To compute the simple index for the men data, divide each men value by the 2010 value, 824, and then multiply by 100. To compute the simple index for the women data, divide each women value by the 2010 value, 670, and then multiply by 100. The two indices are:

Year    Men   Men Index   Women   Women Index
2010    824    100.00      670      100.00
2011    831    100.85      683      101.94
2012    854    103.64      691      103.13
2013    861    104.49      706      105.37
2014    870    105.58      719      107.31
2015    894    108.50      726      108.36
2016    915    111.04      749      111.79
2017    941    114.20      770      114.93
2018    973    118.08      788      117.61
2019   1003    121.72      813      121.34

b.

The graph of the two indices is:

The weekly earnings for men and the weekly earnings for women increased at a very similar rate.

c.

To compute the simple composite index, first sum the two values (men and women) for every time period. Then, divide each sum by the sum in 2010 (1,494) and then multiply by 100. The simple composite index is:

Year    Sum   Simple Composite Index
2010   1494        100.00
2011   1514        101.34
2012   1545        103.41
2013   1567        104.89
2014   1589        106.36
2015   1620        108.43
2016   1664        111.38
2017   1711        114.52
2018   1761        117.87
2019   1816        121.55

d.

The graph of the index is:

From 2010 to 2019, the weekly earnings increased at a fairly steady rate.

14.14

a.

To get the simple composite price index, sum the prices for the three metals for each year, divide by 529 (the sum of the prices for the base period 2014), and multiply by 100. To get the simple composite quantity index, sum the productions for the three metals for each year, divide by 3448 (the sum of the quantities for the base period 2014), and multiply by 100. The indices are:

Year   Price Total   Production Total   Price Index   Production Index
2014       529            3448            100.00          100.00
2015       435            3337             82.23           96.78
2016       399            2594             75.43           75.23
2017       497            2311             93.95           67.02
2018       530            2350            100.19           68.16

b.

To compute the Laspeyres index, multiply the price for each year by the quantity for each of the metals for 2014, sum the products for the three metals, divide by 652,098 (the sum for the base period 2014), and multiply by 100. The Laspeyres index is:

Year    Total    Laspeyres
2014   652098     100.00
2015   526606      80.76
2016   419714      64.36
2017   467058      71.62
2018   492250      75.49

c.

The plots of the simple composite price index, the simple composite quantity index, and Laspeyres index are:

The quantity (production) index and the Laspeyres index both decline over time. The price index declines from 2014 through 2016, but then increases in 2017 and 2018.

d.

The following steps are used to compute the Paasche index:

1. First, multiply the price × production for copper, iron scrap, and aluminum for each year. The numerator of the index is the sum of these three quantities at each year.
2. Next, multiply the production values of copper by 318, the production of iron scrap by 106, and the production of aluminum by 105. The denominator is the sum of these three quantities at each year.
3. The values of the Paasche index are the ratios of these two values at each year times 100.
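The difference between the two weighted indices is only in which quantities serve as weights. The sketch below is a minimal Python illustration of the steps above. The base-period prices 318, 106, and 105 come from the solution; every quantity and the later-period prices are hypothetical placeholders, since the exercise data are not reproduced here, and the function names are illustrative.

```python
def laspeyres(prices_t, prices_base, quantities_base):
    """Laspeyres index: base-period quantities are the weights in every period."""
    numerator = sum(p * q for p, q in zip(prices_t, quantities_base))
    denominator = sum(p * q for p, q in zip(prices_base, quantities_base))
    return 100 * numerator / denominator


def paasche(prices_t, prices_base, quantities_t):
    """Paasche index: current-period quantities are the weights."""
    numerator = sum(p * q for p, q in zip(prices_t, quantities_t))
    denominator = sum(p * q for p, q in zip(prices_base, quantities_t))
    return 100 * numerator / denominator


# Base-period (2014) prices for copper, iron scrap, and aluminum, from the solution.
base_prices = [318, 106, 105]

# Hypothetical values, for illustration only (not the exercise data).
base_quantities = [10, 20, 30]
later_prices = [300, 100, 120]
later_quantities = [12, 15, 25]

print(round(laspeyres(later_prices, base_prices, base_quantities), 2))
print(round(paasche(later_prices, base_prices, later_quantities), 2))
```

Because the Paasche weights change every period, two Paasche values for non-base periods are built on different quantity baskets, which is the issue raised in part f.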

The Paasche index is:

Year   Paasche Numerator   Paasche Denominator   Paasche Index
2014        652098               652098              100.00
2015        526606               644695               81.68
2016        419714               577306               72.70
2017        467058               511345               91.34
2018        492250               502610               97.94

e.

The plot of the Laspeyres index and the Paasche index is:

The two indices are very similar from 2014 – 2016, but the Paasche index is much higher in 2017 and 2018.

f.

The values of the Laspeyres index for 2016 and 2018 are 64.36 and 75.49. The values of the Paasche index for 2016 and 2018 are 72.70 and 97.94. The Laspeyres index would be more appropriate for describing the change in this 2-year period. When comparing two time periods, neither of which is the base period, the Paasche index is not appropriate because the quantities used for the two time periods are not the same.

14.15

The smaller the value of w, the smoother the series. With w = .2, the current value receives a weight of .2 while the previous exponentially smoothed value receives a weight of .8. With w = .8, the current value receives a weight of .8 while the previous exponentially smoothed value receives a weight of .2. The smaller the value of w, the less chance the series can be affected by large jumps.
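One way to see why a small w gives a smoother series is to unroll the recursion Et = wYt + (1 – w)Et–1: the observation k periods back receives weight w(1 – w)^k. The following Python sketch tabulates those implied weights; the function name is illustrative and not from the text.

```python
def implied_weights(w, n_terms):
    """Weight that exponential smoothing places on the observation k periods back.

    Follows from unrolling E_t = w*Y_t + (1 - w)*E_{t-1}; k = 0 is the current value.
    """
    return [w * (1 - w) ** k for k in range(n_terms)]


for w in (0.2, 0.8):
    weights = implied_weights(w, 5)
    print(w, [round(x, 4) for x in weights])
```

With w = .2 the weights decay slowly, so many past values contribute and a single large jump is damped. With w = .8 almost all of the weight sits on the current and previous values, so the smoothed series tracks the original closely.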

14.16

a.

The exponentially smoothed employment for the first period is equal to the employment for that period. For the rest of the time periods, the exponentially smoothed employment values are found by multiplying .5 times the employment value of that time period and adding to that (1 − .5) times the value of the exponentially smoothed employment figure of the previous time period. The exponentially smoothed employment value for time period 2 is .5(281) + (1 − .5)(280) = 280.5. The rest of the values are shown in the table.
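The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the twelve monthly employment values listed in the table for this exercise; the function name `exponential_smoothing` is illustrative.

```python
def exponential_smoothing(series, w):
    """Return the exponentially smoothed series E_1..E_n.

    E_1 = Y_1, and E_t = w*Y_t + (1 - w)*E_{t-1} for t >= 2.
    """
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


# Monthly employment values Y_t (January through December) from the exercise table.
employment = [280, 281, 250, 246, 239, 218, 218, 210, 205, 206, 200, 200]

smoothed = exponential_smoothing(employment, 0.5)
for t, (y, e) in enumerate(zip(employment, smoothed), start=1):
    print(f"{t:2d}  {y:4d}  {e:7.1f}")
```

Each smoothed value uses the unrounded previous value, so results can differ from hand calculations in the last displayed digit if intermediate values are rounded.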

Month    t    Yt    Exponentially Smoothed Series w = .5
Jan.     1   280       280.0
Feb.     2   281       280.5
Mar.     3   250       265.3
Apr.     4   246       255.6
May      5   239       247.3
June     6   218       232.7
July     7   218       225.3
Aug.     8   210       217.7
Sept.    9   205       211.3
Oct.    10   206       208.7
Nov.    11   200       204.3
Dec.    12   200       202.2

b.

The graph of the time series and the exponentially smoothed series is:

[Plot: Employment (Yt) and the exponentially smoothed series (Exp Series) versus Time, t = 1 to 12]

14.17

a.

The exponentially smoothed craft beer production for the first period is equal to the craft beer production for that period. For the rest of the time periods, the exponentially smoothed craft beer production is found by multiplying the craft beer production of that time period by w = .2 and adding to that (1 − .2) times the exponentially smoothed value of the previous time period. The exponentially smoothed value for the second period is .2(6.29) + (1 − .2)(5.83) = 5.922. The rest of the values are shown in the following table.

                       Exponentially Smoothed Production
Year   Production      w = .2      w = .8
2004      5.83          5.830       5.830
2005      6.29          5.922       6.198
2006      7.1           6.158       6.920
2007      7.98          6.522       7.768
2008      8.49          6.916       8.346
2009      9.07          7.347       8.925
2010     10.13          7.903       9.889
2011     11.46          8.615      11.146
2012     13.24          9.540      12.821
2013     15.51         10.734      14.972
2014     22.16         13.019      20.722
2015     24.52         15.319      23.760
2016     24.3          17.115      24.192
2017     25            18.692      24.838
2018     25.5          20.054      25.368
2019     26.3          21.303      26.114

b.

The exponentially smoothed craft beer production for the first period is equal to the craft beer production for that period. For the rest of the time periods, the exponentially smoothed craft beer production is found by multiplying w = .8 times the beer production of that time period and adding to that (1 − .8) times the value of the exponentially smoothed beer production figure of the previous time period. The exponentially smoothed beer production for the second time period is .8(6.29) + (1 − .8)(5.830) = 6.198. The rest of the values are shown in the table in part a.

c.

The plot of the two series is:

Because the craft beer production is steadily increasing, the exponentially smoothed series with w = .8 better represents the actual craft beer production than the series with w = .2. Thus, the series with w = .8 best portrays the long-term trend.

14.18

a.

The exponentially smoothed value for the first period is equal to the production for that period. For the rest of the time periods, the exponentially smoothed fish catch values are found by multiplying w = .5 times the production of that time period and adding to that (1 − .5) times the value of the exponentially smoothed production value of the previous time period.

The exponentially smoothed fish production value for time period 2009 is .5(82,461) + (1 − .5)(73,926) = 78,193.50. The rest of the values are shown in the table. Similarly, the exponentially smoothed shrimp production value for time period 2009 is .5(74,172) + (1 − .5)(86,131) = 80,151.50. The rest of the values are shown in the table.

Year    Fish    Fish (w=.5)    Shrimp    Shrimp (w=.5)
2008   73926     73926.00       86131      86131.00
2009   82461     78193.50       74172      80151.50
2010   79586     78889.75       97124      88637.75
2011   74451     76670.38      116935     102786.38
2012   80034     78352.19       92460      97623.19
2013   58214     68283.09       79740      88681.59
2014   58545     63414.05      109293      98987.30
2015   66289     64851.52      107929     103458.15
2016   55398     60124.76      106003     104730.57
2017   54698     57411.38       84235      94482.79
2018   55087     56249.19       84391      89436.89

b.

The plot of the two time series and the two exponentially smoothed series is:

Both the time series and the exponentially smoothed series for the fish production are fairly stable over time, but are generally decreasing. For the shrimp series, there is an increase, then a decrease, then another increase, and another decrease for both the time series and the exponentially smoothed series.

14.19

a.

The exponentially smoothed gold price for the first period is equal to the gold price for that period. For the rest of the time periods, the exponentially smoothed gold price is found by multiplying the price for the time period by w = .8 and adding to that (1 − .8) times the exponentially smoothed value from the previous time period. The exponentially smoothed value for the second time period is .8(384) + (1 − .8)(384) = 384.0. The rest of the values are shown in the table.

Year   Price   Exponentially Smoothed w = .8
1994    384        384.00
1995    384        384.00
1996    388        387.20
1997    331        342.24
1998    294        303.65
1999    279        283.93
2000    279        279.99
2001    271        272.80
2002    310        302.56
2003    363        350.91
2004    410        398.18
2005    445        435.64
2006    603        569.53
2007    695        669.91
2008    872        831.58
2009    972        943.92
2010   1225       1168.78
2011   1572       1491.36
2012   1669       1633.47
2013   1411       1455.49
2014   1267       1304.70
2015   1160       1188.94
2016   1251       1238.59
2017   1257       1253.32
2018   1268       1265.06
2019   1378       1355.41

b.

The plot of the two series is:

The exponentially smoothed series with w = .8 is almost the same as the original series. Both series are fairly constant from 1994 to 2004. Then both series start an increasing trend until 2012 and then start a decreasing trend until 2015. After that, a slight increasing trend occurs.

14.20

a.

The exponentially smoothed expenditure for the first time period is equal to the expenditure for that period. For the rest of the time periods, the exponentially smoothed expenditures are found by multiplying the expenditures for the time period by w = .2 and adding to that (1 − .2) times the exponentially smoothed value of the previous time period. The exponentially smoothed value for the year 2010 is .2(961.7) + (1 − .2)(879.7) = 896.1. The rest of the values appear in the table. The process is repeated with w = .8.

Year   Expenditures   Exponentially Smoothed   Exponentially Smoothed
                            w = .2                   w = .8
2009      879.7             879.70                   879.70
2010      961.7             896.10                   945.30
2011     1080.3             932.94                  1053.30
2012     1135.5             973.45                  1119.06
2013     1170.5            1012.86                  1160.21
2014     1202.4            1050.77                  1193.96
2015     1162.6            1073.14                  1168.87
2016     1160.4            1090.59                  1162.09
2017     1225.2            1117.51                  1212.58
2018     1286.6            1151.33                  1271.80

b.

The plot of the three series is:

We see personal consumption in transportation had an increasing trend from 2009 through 2014, then a slight decrease until 2016, before beginning another increasing trend.

14.21

a.

The exponentially smoothed imports value for the first period is equal to the imports for that period. For the rest of the time periods, the exponentially smoothed imports value is found by multiplying w = .1 times the imports for that time period and adding to that (1 − .1) times the value of the exponentially smoothed imports figure of the previous time period. The exponentially smoothed imports value for the second time period is .1(1,541) + (1 − .1)(1,544) = 1,543.70. The rest of the values are shown in the table. The same procedure is followed for w = .9. The exponentially smoothed imports value for the second time period is .9(1,541) + (1 − .9)(1,544) = 1,541.30. The rest of the values are shown in the table.

Year    t   Imports   Exponentially Smoothed   Exponentially Smoothed
                            w = .1                   w = .9
1995    1    1544          1544.00                  1544.00
1996    2    1541          1543.70                  1541.30
1997    3    1668          1556.13                  1655.33
1998    4    1790          1579.52                  1776.53
1999    5    1808          1602.37                  1804.85
2000    6    1904          1632.53                  1894.09
2001    7    2018          1671.08                  2005.61
2002    8    1681          1672.07                  1713.46
2003    9    1884          1693.26                  1866.95
2004   10    2086          1732.54                  2064.09
2005   11    2039          1763.18                  2041.51
2006   12    2014          1788.26                  2016.75
2007   13    2183          1827.74                  2166.38
2008   14    2179          1862.86                  2177.74
2009   15    1743          1850.88                  1786.47
2010   16    1791          1844.89                  1790.55
2011   17    1663          1826.70                  1675.75
2012   18    1563          1800.33                  1574.28
2013   19    1358          1756.10                  1379.63
2014   20    1181          1698.59                  1200.86
2015   21    1056          1634.33                  1070.49
2016   22    1261          1597.00                  1241.95
2017   23    1229          1560.20                  1230.29
2018   24    1054          1509.58                  1071.63
2019   25     598          1418.42                   645.36

b.

The plot of the three series is:

The exponentially smoothed series with w = .9 looks more like the original series. The closer w is to 1, the more closely the exponentially smoothed curve resembles the original.

14.22

a.

The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .3 times the stock price for that time period and adding to that (1 − .3) times the value of the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .3(1,349.7) + (1 − .3)(1,348.8) = 1,349.07. The rest of the values are shown in the table.

Year   Quarter   Time Period   S&P500    Smoothed w=.3   Smoothed w=.7
2012      1           1        1348.8       1348.8          1348.8
          2           2        1349.7       1349.07         1349.43
          3           3        1400.9       1364.62         1385.46
          4           4        1418.1       1380.66         1408.31
2013      1           5        1514         1420.66         1482.29
          2           6        1609.5       1477.32         1571.34
          3           7        1674.9       1536.59         1643.83
          4           8        1768.7       1606.22         1731.24
2014      1           9        1834.9       1674.83         1803.80
          2          10        1900.4       1742.50         1871.42
          3          11        1975.9       1812.52         1944.56
          4          12        2009.3       1871.55         1989.88
2015      1          13        2067.89      1930.45         2044.49
          2          14        2063.11      1970.25         2057.52
          3          15        1920.03      1955.18         1961.28
          4          16        2043.94      1981.81         2019.14
2016      1          17        2059.74      2005.19         2047.56
          2          18        2098.86      2033.29         2083.47
          3          19        2168.27      2073.78         2142.83
          4          20        2238.83      2123.30         2210.03
2017      1          21        2362.72      2195.12         2316.91
          2          22        2423.41      2263.61         2391.46
          3          23        2519.36      2340.34         2480.99
          4          24        2673.61      2440.32         2615.82
2018      1          25        2640.87      2500.48         2633.36
          2          26        2718.37      2565.85         2692.87
          3          27        2913.98      2670.29         2847.65
          4          28        2506.85      2621.26         2609.09
2019      1          29        2834.4       2685.20         2766.81
          2          30        2941.76      2762.17         2889.27
          3          31        2976.74      2826.54         2950.50
          4          32        3230.8       2947.82         3146.71

The plot of the original series and the exponentially smoothed series with 𝑤 = .3 is:

b.

The same procedure is followed for w = .7. The exponentially smoothed Stock Index for the first time period is equal to the Stock Index for that time period. For the rest of the time periods, the exponentially smoothed stock price is found by multiplying w = .7 times the stock price for that time period and adding to that (1 − .7) times the value of the exponentially smoothed stock price for the previous time period. The exponentially smoothed stock price for the second time period is .7(1,349.7) + (1 − .7)(1,348.8) = 1,349.43. The rest of the values are shown in the table in part a. The plot of the original series and the exponentially smoothed series with w = .7 is:

c.

The exponentially smoothed series with w = .3 better describes the trends in the series. The exponentially smoothed series with w = .7 is almost exactly like the original series.

14.23

If w is small (near 0), one will obtain a smooth, slowly changing series of forecasts. If w is large (near 1), one will obtain more rapidly changing forecasts that depend mostly on the current values of the series.

14.24

a.

The missing trend value for quarter 3 is: T3 = v(E3 – E2) + (1 – v)T2 = .6(3.78 – 3.50) + (1 − .6)(.25) = .27

b.

The missing smoothed value for quarter 4 is: E4 = wY4 + (1 – w)(E3 + T3) = .2(4.25) + (1 − .2)(3.78 + .27) = 4.09

c.

The forecast for quarter 5 is: FQ5 = Ft+1 = Et + Tt = 4.09 + .29 = 4.38

14.25

a.

We first compute the exponentially smoothed values E1, E2, … , Et for years 2004 – 2017.

E1 = Y1 = 5.83

For w = .3,
E2 = wY2 + (1 – w)E1 = .3(6.29) + (1 − .3)(5.83) = 5.968
E3 = wY3 + (1 – w)E2 = .3(7.10) + (1 − .3)(5.968) = 6.308

The rest of the values appear in the table.


828

Chapter 14 For w = .7 , E2 = wY2 + (1 – w) E1 = .7 ( 6.29) + (1 − .7 )( 5.83) = 6.152

E3 = wY3 + (1 – w) E2 = .7 ( 7.10) + (1 − .7 )( 6.152) = 6.816

The rest of the values appear in the table.

Year 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019

Beer 5.83 6.29 7.1 7.98 8.49 9.07 10.13 11.46 13.24 15.51 22.16 24.52 24.3 25 25.5 26.3

Exponentially Smoothed Production w = .3 w = .7 5.830 5.830 5.968 6.152 6.308 6.816 6.809 7.631 7.314 8.232 7.840 8.819 8.527 9.737 9.407 10.943 10.557 12.551 12.043 14.622 15.078 19.899 17.911 23.134 19.827 23.950 21.379 24.685

To forecast using exponentially smoothed values, we use the following:

For w = .3:
F2018 = Ft+1 = Et = 21.379
F2019 = Ft+2 = Ft+1 = 21.379

For w = .7:
F2018 = Ft+1 = Et = 24.685
F2019 = Ft+2 = Ft+1 = 24.685

b.

We first compute the Holt’s values for the years 2004-2017.

With w = .7 and ν = .3,
E2 = Y2 = 6.29
T2 = Y2 – Y1 = 6.29 – 5.83 = .46
E3 = wY3 + (1 – w)(E2 + T2) = .7(7.10) + (1 − .7)(6.29 + .46) = 6.995
T3 = v(E3 – E2) + (1 – v)T2 = .3(6.995 – 6.290) + (1 − .3)(.46) = .534

The rest of the Et’s and Tt’s appear in the table that follows.

With w = .3 and ν = .7,
E2 = Y2 = 6.29
T2 = Y2 – Y1 = 6.29 – 5.83 = .46
E3 = wY3 + (1 – w)(E2 + T2) = .3(7.10) + (1 − .3)(6.29 + .46) = 6.855
T3 = v(E3 – E2) + (1 – v)T2 = .7(6.855 – 6.290) + (1 − .7)(.46) = .534

The rest of the Et’s and Tt’s appear in the table that follows.

                 Holt’s w=.7 v=.3      Holt’s w=.3 v=.7
Year   Beer        Et        Tt          Et        Tt
2004    5.83
2005    6.29      6.290     0.460       6.290     0.460
2006    7.1       6.995     0.534       6.855     0.534
2007    7.98      7.845     0.628       7.566     0.658
2008    8.49      8.485     0.632       8.304     0.714
2009    9.07      9.084     0.622       9.033     0.725
2010   10.13     10.003     0.711       9.869     0.803
2011   11.46     11.236     0.868      10.909     0.968
2012   13.24     12.899     1.106      12.286     1.255
2013   15.51     15.059     1.422      14.131     1.668
2014   22.16     20.456     2.615      17.708     3.004
2015   24.52     24.085     2.919      21.854     3.804
2016   24.3      25.111     2.351      25.250     3.519
2017   25        25.739     1.834      27.638     2.727
2018   25.5
2019   26.3
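The Holt recursions used to build this table, and the multi-step forecast Ft+n = Et + nTt, can be sketched as follows. This is a minimal Python illustration using the 2004–2017 beer production values from the table with w = .7 and v = .3; the function names `holt` and `holt_forecast` are illustrative and not from the text.

```python
def holt(series, w, v):
    """Holt's method: return lists of E_t and T_t, starting at the second observation.

    E_2 = Y_2 and T_2 = Y_2 - Y_1; for t >= 3,
    E_t = w*Y_t + (1 - w)*(E_{t-1} + T_{t-1})
    T_t = v*(E_t - E_{t-1}) + (1 - v)*T_{t-1}
    """
    smoothed = [series[1]]
    trend = [series[1] - series[0]]
    for y in series[2:]:
        e = w * y + (1 - w) * (smoothed[-1] + trend[-1])
        t = v * (e - smoothed[-1]) + (1 - v) * trend[-1]
        smoothed.append(e)
        trend.append(t)
    return smoothed, trend


def holt_forecast(smoothed, trend, steps_ahead):
    """Forecast n steps past the last smoothed value: E_t + n*T_t."""
    return smoothed[-1] + steps_ahead * trend[-1]


# Craft beer production, 2004 through 2017, from the table above.
beer = [5.83, 6.29, 7.10, 7.98, 8.49, 9.07, 10.13,
        11.46, 13.24, 15.51, 22.16, 24.52, 24.30, 25.00]

E, T = holt(beer, w=0.7, v=0.3)
print(round(holt_forecast(E, T, 1), 3))  # forecast for 2018
print(round(holt_forecast(E, T, 2), 3))  # forecast for 2019
```

Swapping in w = .3 and v = .7 reproduces the second pair of columns in the same way.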

To forecast using the Holt’s Model:

For w = .7 and ν = .3,
F2018 = Ft+1 = Et + Tt = 25.739 + 1.834 = 27.573
F2019 = Ft+2 = Et + 2Tt = 25.739 + 2(1.834) = 29.407

For w = .3 and ν = .7,
F2018 = Ft+1 = Et + Tt = 27.638 + 2.727 = 30.365
F2019 = Ft+2 = Et + 2Tt = 27.638 + 2(2.727) = 33.092

14.26

a.

To compute the exponentially smoothed values, we follow these steps:

E1 = Y1 = 144
E2 = wY2 + (1 – w)E1 = .6(180) + (1 − .6)(144) = 165.60
E3 = wY3 + (1 – w)E2 = .6(168) + (1 − .6)(165.6) = 167.04

The rest of the values are computed in a similar manner and are listed in the table:

Year   Quarter   Housing   Exponentially Smoothed w = .6
2017      1        144         144.000
          2        180         165.600
          3        168         167.040
          4        148         155.616
2018      1        158         157.046
          2        194         179.219
          3        171         174.287
          4        138         152.515
2019      1        153
          2        181
          3        176
          4        163

b.

Using MINITAB, the plot is:

c.

To forecast using exponentially smoothed values, we use the following:

F2019,1 = Ft+1 = Et = 152.515
F2019,2 = Ft+2 = Ft+1 = 152.515
F2019,3 = Ft+3 = Ft+1 = 152.515
F2019,4 = Ft+4 = Ft+1 = 152.515

14.27

a.

Using MINITAB, the time series plot is:

There appears to be an increasing trend in CPI over time.

b.

To compute the exponentially smoothed values, we follow these steps:

E1 = Y1 = 125.8
E2 = wY2 + (1 – w)E1 = .4(129.1) + (1 − .4)(125.8) = 127.12
E3 = wY3 + (1 – w)E2 = .4(132.8) + (1 − .4)(127.12) = 129.39

The rest of the values are computed in a similar manner and are listed in the table:

Year    t    CPI     Exponentially Smoothed w=.4
1990    1   125.8       125.80
1991    2   129.1       127.12
1992    3   132.8       129.39
1993    4   136.8       132.36
1994    5   147.8       138.53
1995    6   152.4       144.08
1996    7   156.9       149.21
1997    8   160.5       153.72
1998    9   163         157.43
1999   10   166.6       161.10
2000   11   171.5       165.26
2001   12   177.1       170.00
2002   13   179.9       173.96
2003   14   184         177.97
2004   15   188.9       182.34
2005   16   195.3       187.53
2006   17   201.6       193.16
2007   18   207.3       198.81
2008   19   215.3       205.41
2009   20   214.5       209.04
2010   21   218.1       212.67
2011   22   224.9       217.56
2012   23   229.6       222.38
2013   24   233         226.63
2014   25   236.7       230.66
2015   26   237         233.19
2016   27   240         235.92
2017   28   245.1       239.59
2018   29   251.1       244.19
2019   30   254.1       248.28

Using MINITAB, the plot is:

To forecast using exponentially smoothed values, we use the following:

F2020 = Ft+1 = Et = 248.28

c.

We first compute the Holt’s values for the years 1990-2019. With w = .4 and ν = .5,

E2 = Y2 = 129.1
T2 = Y2 – Y1 = 129.1 – 125.8 = 3.3
E3 = wY3 + (1 – w)(E2 + T2) = .4(132.8) + (1 − .4)(129.1 + 3.3) = 132.56
T3 = v(E3 – E2) + (1 – v)T2 = .5(132.56 – 129.1) + (1 − .5)(3.3) = 3.38

The rest of the Et’s and Tt’s appear in the table that follows.

                     Holt’s w=.4 v=.5
Year    t    CPI        Et         Tt
1990    1   125.8
1991    2   129.1     129.100     3.300
1992    3   132.8     132.560     3.380
1993    4   136.8     136.284     3.552
1994    5   147.8     143.022     5.145
1995    6   152.4     149.860     5.992
1996    7   156.9     156.271     6.201
1997    8   160.5     161.683     5.807
1998    9   163       165.694     4.909
1999   10   166.6     169.002     4.108
2000   11   171.5     172.466     3.786
2001   12   177.1     176.591     3.956
2002   13   179.9     180.288     3.826
2003   14   184       184.069     3.803
2004   15   188.9     188.283     4.009
2005   16   195.3     193.495     4.611
2006   17   201.6     199.504     5.309
2007   18   207.3     205.808     5.807
2008   19   215.3     213.089     6.544
2009   20   214.5     217.580     5.517
2010   21   218.1     221.098     4.518
2011   22   224.9     225.330     4.375
2012   23   229.6     229.663     4.354
2013   24   233       233.610     4.151
2014   25   236.7     237.336     3.938
2015   26   237       239.565     3.084
2016   27   240       241.589     2.554
2017   28   245.1     244.526     2.745
2018   29   251.1     248.803     3.511
2019   30   254.1     253.148     3.928

To forecast using the Holt’s Model:

For w = .4 and ν = .5,
F2020 = Ft+1 = Et + Tt = 253.148 + 3.928 = 257.076

14.28

a.

Using the information from Exercise 14.21, the forecast imports in 2019 using the exponentially smoothed values up through 2018 with w = .9 is:

F2019 = Ft+1 = Et = 1071.63

b.

We first compute the Holt’s values for years 1995-2018. With w = .3 and ν = .8,

E2 = Y2 = 1,541
T2 = Y2 – Y1 = 1,541 – 1,544 = −3
E3 = wY3 + (1 – w)(E2 + T2) = .3(1,668) + (1 − .3)(1,541 − 3) = 1,577
T3 = v(E3 – E2) + (1 – v)T2 = .8(1,577 – 1,541) + (1 − .8)(−3) = 28.2

The rest of the Et’s and Tt’s appear in the table:

                        Holt’s
Year    t   Imports    Et w=.3     Tt v=.8
1995    1    1544
1996    2    1541     1541.000      -3.000
1997    3    1668     1577.000      28.200
1998    4    1790     1660.640      72.552
1999    5    1808     1755.634      90.506
2000    6    1904     1863.498     104.392
2001    7    2018     1982.923     116.419
2002    8    1681     1973.839      16.016
2003    9    1884     1958.099      -9.389
2004   10    2086     1989.897      23.561
2005   11    2039     2021.120      29.691
2006   12    2014     2039.768      20.856
2007   13    2183     2097.337      50.226
2008   14    2179     2156.994      57.771
2009   15    1743     2073.236     -55.452
2010   16    1791     1949.748    -109.880
2011   17    1663     1786.807    -152.329
2012   18    1563     1613.035    -169.484
2013   19    1358     1417.886    -190.016
2014   20    1181     1213.809    -201.265
2015   21    1056     1025.581    -190.835
2016   22    1261      962.622     -88.534
2017   23    1229      980.561      -3.355
2018   24    1054     1000.244      15.075
2019   25     598

To forecast imports in 2019 using the Holt’s Model up through 2018:

For w = .3 and ν = .8,
F2019 = Ft+1 = Et + Tt = 1,000.244 + 15.075 = 1,015.319

c.

The forecast error for the exponentially smoothed series is Y2019 – F2019 = 598 − 1071.63 = −473.63. The forecast error for the Holt’s series is Y2019 – F2019 = 598 − 1015.319 = −417.319.
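The comparison of the two forecast errors can be sketched as below. This is a minimal Python illustration that uses the 2019 import value and the two 2019 forecasts from parts a and b; the dictionary labels and variable names are illustrative.

```python
# Actual 2019 imports and the two forecasts made with data through 2018.
actual_2019 = 598
forecasts = {
    "exponential smoothing (w = .9)": 1071.63,
    "Holt (w = .3, v = .8)": 1015.319,
}

# Forecast error is defined as actual minus forecast, Y_t - F_t.
errors = {name: actual_2019 - forecast for name, forecast in forecasts.items()}

# The better forecast is the one whose error is smaller in absolute value.
best = min(errors, key=lambda name: abs(errors[name]))

for name, error in errors.items():
    print(f"{name}: error = {error:.3f}")
print("smaller absolute error:", best)
```

Both errors are negative, so both methods overpredicted 2019 imports; the comparison is made on the size of the error, not its sign.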

The error for the Holt’s forecast is smaller in magnitude than the error for the exponentially smoothed forecast.

14.29

a.

From Exercise 14.22, we have the exponentially smoothed series:

Year   Quarter   Time Period   S&P500    Smoothed w=.3   Smoothed w=.7
2012      1           1        1348.8       1348.80         1348.80
          2           2        1349.70      1349.07         1349.43
          3           3        1400.9       1364.62         1385.46
          4           4        1418.1       1380.66         1408.31
2013      1           5        1514         1420.66         1482.29
          2           6        1609.5       1477.32         1571.34
          3           7        1674.9       1536.59         1643.83
          4           8        1768.7       1606.22         1731.24
2014      1           9        1834.9       1674.83         1803.80
          2          10        1900.4       1742.50         1871.42
          3          11        1975.9       1812.52         1944.56
          4          12        2009.3       1871.55         1989.88
2015      1          13        2067.89      1930.45         2044.49
          2          14        2063.11      1970.25         2057.52
          3          15        1920.03      1955.18         1961.28
          4          16        2043.94      1981.81         2019.14
2016      1          17        2059.74      2005.19         2047.56
          2          18        2098.86      2033.29         2083.47
          3          19        2168.27      2073.78         2142.83
          4          20        2238.83      2123.30         2210.03
2017      1          21        2362.72      2195.12         2316.91
          2          22        2423.41      2263.61         2391.46
          3          23        2519.36      2340.34         2480.99
          4          24        2673.61      2440.32         2615.82
2018      1          25        2640.87      2500.48         2633.36
          2          26        2718.37      2565.85         2692.87
          3          27        2913.98      2670.29         2847.65
          4          28        2506.85      2621.26         2609.09

The forecasts using the exponentially smoothed values with w = .7 are:

F2019,1 = Ft+1 = Et = 2,609.09
F2019,2 = Ft+2 = Ft+1 = 2,609.09
F2019,3 = Ft+3 = Ft+1 = 2,609.09
F2019,4 = Ft+4 = Ft+1 = 2,609.09

b.

The forecasts using the exponentially smoothed values with w = .3 are:

F2019,1 = Ft+1 = Et = 2,621.26
F2019,2 = Ft+2 = Ft+1 = 2,621.26
F2019,3 = Ft+3 = Ft+1 = 2,621.26
F2019,4 = Ft+4 = Ft+1 = 2,621.26

14.30

We first compute the Holt’s values for the years 2017-2018. With w = .3 and ν = .5,

E2 = Y2 = 2,423.41
T2 = Y2 – Y1 = 2,423.41 − 2,362.72 = 60.690
E3 = wY3 + (1 – w)(E2 + T2) = .3(2,519.36) + (1 − .3)(2,423.410 + 60.690) = 2,494.678
T3 = v(E3 – E2) + (1 – v)T2 = .5(2,494.678 − 2,423.41) + (1 − .5)(60.69) = 65.979

The rest of the Et’s and Tt’s appear in the table that follows.

                              Holt’s                  Holt’s
Year   Quarter   S&P500    Et w=.3    Tt v=.5     Et w=.7    Tt v=.5
2017      1      2362.72
          2      2423.41   2423.410    60.690     2423.410    60.690
          3      2519.36   2494.678    65.979     2508.782    73.031
          4      2673.61   2594.543    82.922     2646.071   105.160
2018      1      2640.87   2666.486    77.433     2673.978    66.534
          2      2718.37   2736.254    73.600     2725.013    58.784
          3      2913.98   2841.092    89.219     2874.925   104.348
          4      2506.85   2803.273    25.700     2648.577   -61.000

To forecast using the Holt’s Model with w = .3 and ν = .5:

F2019,1 = Ft+1 = Et + Tt = 2,803.273 + 25.700 = 2,828.973
F2019,2 = Ft+2 = Et + 2Tt = 2,803.273 + 2(25.700) = 2,854.673
F2019,3 = Ft+3 = Et + 3Tt = 2,803.273 + 3(25.700) = 2,880.373
F2019,4 = Ft+4 = Et + 4Tt = 2,803.273 + 4(25.700) = 2,906.073

With w = .7 and ν = .5,

E2 = Y2 = 2,423.41
T2 = Y2 – Y1 = 2,423.41 − 2,362.72 = 60.690
E3 = wY3 + (1 – w)(E2 + T2) = .7(2,519.36) + (1 − .7)(2,423.410 + 60.690) = 2,508.782
T3 = v(E3 – E2) + (1 – v)T2 = .5(2,508.782 − 2,423.41) + (1 − .5)(60.69) = 73.031

The rest of the Et’s and Tt’s appear in the table above.

To forecast using the Holt’s Model with w = .7 and ν = .5:

F2019,1 = Ft+1 = Et + Tt = 2,648.577 + (−61.000) = 2,587.577
F2019,2 = Ft+2 = Et + 2Tt = 2,648.577 + 2(−61.000) = 2,526.577
F2019,3 = Ft+3 = Et + 3Tt = 2,648.577 + 3(−61.000) = 2,465.577
F2019,4 = Ft+4 = Et + 4Tt = 2,648.577 + 4(−61.000) = 2,404.577

14.31

a.

a.

We first compute the exponentially smoothed values E1, E2, … , Et for 2015 through 2018. 𝐸 = 𝑌 = 1,250.75 For 𝑤 = .5, 𝐸 = 𝑤𝑌 + (1– 𝑤)𝐸 = .5(1,227.08) + (1 − .5)(1,250.75) = 1,238.915 𝐸 = 𝑤𝑌 + (1– 𝑤)𝐸 = .5(1,178.63) + (1 − .5)(1,238.915) = 1,208.773 Copyright © 2022 Pearson Education, Inc.

835


836

Chapter 14

The rest of the values are found in the table: Exponential Smoothed YEAR

MONTH

PRICE

w = .5

2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 2018 2018 2018 2018 2018 2018 2018 2018

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug

1,250.75 1,227.08 1,178.63 1,198.93 1,198.63 1,181.50 1,128.31 1,117.93 1,124.77 1,159.25 1,086.44 1,075.74 1,097.91 1,199.50 1,245.14 1,242.26 1,260.95 1,276.40 1,336.66 1,340.17 1,326.61 1,266.55 1,238.35 1,157.36 1,192.10 1,234.20 1,231.42 1,266.88 1,246.04 1,260.26 1,236.84 1,283.04 1,314.07 1,279.51 1,281.90 1,264.45 1,331.30 1,330.73 1,324.66 1,334.76 1,303.45 1,281.57 1,237.71 1,201.71

1250.750 1238.915 1208.773 1203.851 1201.241 1191.370 1159.840 1138.885 1131.828 1145.539 1115.989 1095.865 1096.887 1148.194 1196.667 1219.463 1240.207 1258.303 1297.482 1318.826 1322.718 1294.634 1266.492 1211.926 1202.013 1218.106 1224.763 1245.822 1245.931 1253.095 1244.968 1264.004 1289.037 1284.273 1283.087 1273.768 1302.534 1316.632 1320.646 1327.703 1315.577 1298.573 1268.142 1234.926

Holt’s Et w=.5

Tt v=.5

1227.080 1191.020 1180.043 1179.126 1174.978 1147.940 1124.323 1114.337 1129.192 1107.729 1086.325 1084.062 1137.187 1202.148 1243.937 1273.757 1293.190 1328.839 1350.374 1351.810 1316.198 1271.880 1200.844 1171.825 1183.434 1200.540 1234.543 1249.209 1262.860 1257.325 1272.536 1298.283 1297.823 1294.210 1280.601 1303.184 1321.219 1329.580 1337.580 1325.220 1302.658 1264.175 1220.317

-23.670 -29.865 -20.421 -10.669 -7.408 -17.223 -20.420 -15.203 -0.174 -10.819 -16.111 -9.187 21.969 43.465 42.627 36.223 27.828 31.739 26.637 14.037 -10.788 -27.553 -49.295 -39.157 -13.774 1.666 17.835 16.250 14.950 4.708 9.960 17.853 8.697 2.542 -5.534 8.525 13.280 10.820 9.410 -1.475 -12.019 -25.251 -34.554

Year  Month  Price     Exp. Smoothed (w = .5)  Holt's Et (w = .5)  Holt's Tt (v = .5)
2018  Sep    1,198.39  1216.658                1192.076            -31.397
2018  Oct    1,215.39  1216.024                1188.034            -17.720
2018  Nov    1,220.65  1218.337                1195.482             -5.136
2018  Dec    1,250.40  1234.368                1220.373              9.878
2019  Jan    1,291.75  1263.059                1261.000             25.252
2019  Feb    1,320.07  1291.565                1303.161             33.707
2019  Mar    1,300.90  1296.232                1318.884             24.715
2019  Apr    1,285.91  1291.071                1314.754             10.292
2019  May    1,283.70  1287.386                1304.373             -0.044
2019  Jun    1,359.04  1323.213                1331.685             13.633
2019  Jul    1,412.89  1368.051                1379.104             30.526
2019  Aug    1,500.41  1434.231                1455.020             53.221
2019  Sep    1,510.58  1472.405                1509.411             53.806
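The smoothed columns in the table follow two short recursions. The sketch below is only an illustration, not the software used for the manual; the function names are hypothetical, and it uses the first four prices of the series with $w = .5$ and $v = .5$ to reproduce the first few table entries.

```python
def exponential_smoothing(y, w):
    """E_1 = Y_1, then E_t = w*Y_t + (1 - w)*E_{t-1}."""
    smoothed = [y[0]]
    for value in y[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


def holt(y, w, v):
    """Holt's method, starting at t = 2 with E_2 = Y_2 and T_2 = Y_2 - Y_1.

    Returns the level list (E_2, E_3, ...) and the trend list (T_2, T_3, ...).
    """
    level = [y[1]]
    trend = [y[1] - y[0]]
    for value in y[2:]:
        new_level = w * value + (1 - w) * (level[-1] + trend[-1])
        new_trend = v * (new_level - level[-1]) + (1 - v) * trend[-1]
        level.append(new_level)
        trend.append(new_trend)
    return level, trend


# First four prices of the series (top of the table in part a).
prices = [1250.75, 1227.08, 1178.63, 1198.93]

smoothed = exponential_smoothing(prices, w=0.5)
level, trend = holt(prices, w=0.5, v=0.5)

print([round(e, 3) for e in smoothed])
print([round(e, 3) for e in level])
print([round(t, 3) for t in trend])
```

An n-step-ahead Holt forecast is then the last level plus n times the last trend, while the exponential smoothing forecast is simply the last smoothed value for every horizon.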

To forecast the monthly prices for 2019 using the data through December 2018:

$F_{t+1} = E_t$ and $F_{t+i} = F_{t+1} = E_t$ for $i = 2, 3, \ldots$

$F_{\text{Jan},2019} = E_{\text{Dec},2018} = 1{,}234.368$

Year  Month  Forecast
2019  Jan    1,234.368
2019  Feb    1,234.368
2019  Mar    1,234.368
2019  Apr    1,234.368
2019  May    1,234.368
2019  Jun    1,234.368
2019  Jul    1,234.368
2019  Aug    1,234.368
2019  Sep    1,234.368
2019  Oct    1,234.368
2019  Nov    1,234.368
2019  Dec    1,234.368

b.

To compute the one-step-ahead forecasts for 2019, we use $F_{t+1} = E_t$, where $E_t$ is recomputed each time period (month). The forecasts are obtained from the table in part a.

Year  Month  Forecast
2019  Jan    1,234.368
2019  Feb    1,263.059
2019  Mar    1,291.565
2019  Apr    1,296.232
2019  May    1,291.071
2019  Jun    1,287.386
2019  Jul    1,323.213
2019  Aug    1,368.051
2019  Sep    1,434.231

c.

First, we compute the Holt's values for the years 2015-2018. With $w = .5$ and $v = .5$,

$E_2 = Y_2 = 1{,}227.08$
$T_2 = Y_2 - Y_1 = 1{,}227.08 - 1{,}250.75 = -23.67$
$E_3 = wY_3 + (1 - w)(E_2 + T_2) = .5(1{,}178.63) + (1 - .5)(1{,}227.08 - 23.67) = 1{,}191.02$
$T_3 = v(E_3 - E_2) + (1 - v)T_2 = .5(1{,}191.02 - 1{,}227.08) + (1 - .5)(-23.67) = -29.865$

The rest of the $E_t$'s and $T_t$'s appear in the table in part a.

To forecast the monthly prices for 2019 using the data through December 2018, we use $F_{t+n} = E_t + nT_t$:

$F_{\text{Jan},2019} = E_{\text{Dec},2018} + T_{\text{Dec},2018} = 1{,}220.373 + 9.878 = 1{,}230.251$
$F_{\text{Feb},2019} = E_{\text{Dec},2018} + 2T_{\text{Dec},2018} = 1{,}220.373 + 2(9.878) = 1{,}240.129$

The rest of the forecasts appear in the table:

Year  Month  Forecast
2019  Jan    1230.251
2019  Feb    1240.129
2019  Mar    1250.007
2019  Apr    1259.885
2019  May    1269.763
2019  Jun    1279.641
2019  Jul    1289.519
2019  Aug    1299.397
2019  Sep    1309.275
2019  Oct    1319.153
2019  Nov    1329.031
2019  Dec    1338.909

To compute the one-step-ahead forecasts for 2019, we use $F_{t+1} = E_t + T_t$, where $E_t$ and $T_t$ are recomputed each time period. The forecasts are obtained from the table in part a.

$F_{\text{Jan},2019} = E_{\text{Dec},2018} + T_{\text{Dec},2018} = 1{,}220.373 + 9.878 = 1{,}230.251$
$F_{\text{Feb},2019} = E_{\text{Jan},2019} + T_{\text{Jan},2019} = 1{,}261.000 + 25.252 = 1{,}286.252$

The rest of the values appear in the table:

Year  Month  Forecast
2019  Jan    1230.251
2019  Feb    1286.252
2019  Mar    1336.868
2019  Apr    1343.599
2019  May    1325.047
2019  Jun    1304.329
2019  Jul    1345.318
2019  Aug    1409.631
2019  Sep    1508.242
2019  Oct    1563.217

14.32

a.

From Exercise 14.25a, the forecasts for 2018-2019 using $w = .3$ are:

$F_{2018} = F_{t+1} = E_t = 21.379$
$F_{2019} = F_{t+2} = F_{t+1} = 21.379$

The errors are the differences between the actual values and the predicted values. Thus, the errors are:

$Y_{2018} - F_{2018} = 25.5 - 21.379 = 4.121$
$Y_{2019} - F_{2019} = 26.3 - 21.379 = 4.921$

b.

From Exercise 14.25a, the forecasts for 2018-2019 using $w = .7$ are:

$F_{2018} = F_{t+1} = E_t = 24.685$
$F_{2019} = F_{t+2} = F_{t+1} = 24.685$

The errors are:

$Y_{2018} - F_{2018} = 25.5 - 24.685 = 0.815$
$Y_{2019} - F_{2019} = 26.3 - 24.685 = 1.615$

c.

For the exponentially smoothed forecasts with $w = .3$,

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|4.121| + |4.921|}{2} = 4.521 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|4.121|}{25.5} + \frac{|4.921|}{26.3}}{2} \right] 100 = 17.436 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(4.121)^2 + (4.921)^2}{2}} = 4.539 \]

d.

For the exponentially smoothed forecasts with $w = .7$,

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|0.815| + |1.615|}{2} = 1.215 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|0.815|}{25.5} + \frac{|1.615|}{26.3}}{2} \right] 100 = 4.668 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(0.815)^2 + (1.615)^2}{2}} = 1.279 \]

e.

All three measures of forecast accuracy for the exponentially smoothed series using $w = .7$ are smaller than the corresponding values for the exponentially smoothed series using $w = .3$. Thus, we recommend using the exponentially smoothed series with $w = .7$.

14.33

a.

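From Exercise 14.25b, the Holt's forecasts for 2018-2019 using $w = .3$ and $v = .7$ are:

$F_{2018} = F_{t+1} = E_t + T_t = 27.638 + 2.727 = 30.365$
$F_{2019} = F_{t+2} = E_t + 2T_t = 27.638 + 2(2.727) = 33.092$

The errors are the differences between the actual values and the predicted values. Thus, the errors are:

$Y_{2018} - F_{2018} = 25.5 - 30.365 = -4.865$
$Y_{2019} - F_{2019} = 26.3 - 33.092 = -6.792$

b.

From Exercise 14.25b, the Holt's forecasts for 2018-2019 using $w = .7$ and $v = .3$ are:

$F_{2018} = F_{t+1} = E_t + T_t = 25.739 + 1.834 = 27.573$
$F_{2019} = F_{t+2} = E_t + 2T_t = 25.739 + 2(1.834) = 29.407$

The errors are:

$Y_{2018} - F_{2018} = 25.5 - 27.573 = -2.073$
$Y_{2019} - F_{2019} = 26.3 - 29.407 = -3.107$

c.

For the Holt's forecasts with $w = .3$ and $v = .7$,

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|-4.865| + |-6.792|}{2} = 5.829 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|-4.865|}{25.5} + \frac{|-6.792|}{26.3}}{2} \right] 100 = 22.452 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(-4.865)^2 + (-6.792)^2}{2}} = 5.908 \]

d.

For the Holt's forecasts with $w = .7$ and $v = .3$,

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|-2.073| + |-3.107|}{2} = 2.590 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|-2.073|}{25.5} + \frac{|-3.107|}{26.3}}{2} \right] 100 = 9.972 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(-2.073)^2 + (-3.107)^2}{2}} = 2.641 \]

e.

All of the measures of forecast accuracy for the Holt's forecast with $w = .7$ and $v = .3$ are smaller than the corresponding values for the Holt's forecast with $w = .3$ and $v = .7$. We recommend using the Holt's forecast with $w = .7$ and $v = .3$.

14.34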

a.

From Exercise 14.29a, the forecasts for the 4 quarters of 2019 using $w = .7$ are:

$F_{2019,Q1} = F_{t+1} = E_t = 2{,}609.09$
$F_{2019,Q2} = F_{t+2} = F_{t+1} = 2{,}609.09$
$F_{2019,Q3} = F_{t+3} = F_{t+1} = 2{,}609.09$
$F_{2019,Q4} = F_{t+4} = F_{t+1} = 2{,}609.09$

For the exponentially smoothed forecasts with $w = .7$ (sums taken over the $m = 4$ quarters of 2019):

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = 386.835 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = 12.713 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = 413.257 \]

b.

From Exercise 14.29b, the forecasts for the 4 quarters of 2019 using $w = .3$ are:

$F_{2019,Q1} = F_{t+1} = E_t = 2{,}621.26$
$F_{2019,Q2} = F_{t+2} = F_{t+1} = 2{,}621.26$
$F_{2019,Q3} = F_{t+3} = F_{t+1} = 2{,}621.26$
$F_{2019,Q4} = F_{t+4} = F_{t+1} = 2{,}621.26$

For the exponentially smoothed forecasts with $w = .3$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = 374.665 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = 12.306 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = 401.887 \]

c.

All three measures of forecast accuracy are smaller for the exponentially smoothed series with $w = .3$ than for the exponentially smoothed series with $w = .7$. Thus, the more accurate series is the exponentially smoothed series with $w = .3$.

14.35

a.

From Exercise 14.30, the forecasts for the 4 quarters of 2019 using the Holt's forecasts with $w = .3$ and $v = .5$ are:

$F_{2019,Q1} = F_{t+1} = E_t + T_t = 2{,}768.806 + 4.997 = 2{,}773.803$
$F_{2019,Q2} = F_{t+2} = E_t + 2T_t = 2{,}768.806 + 2(4.997) = 2{,}778.800$
$F_{2019,Q3} = F_{t+3} = E_t + 3T_t = 2{,}768.806 + 3(4.997) = 2{,}783.797$
$F_{2019,Q4} = F_{t+4} = E_t + 4T_t = 2{,}768.806 + 4(4.997) = 2{,}788.794$

With sums taken over the $m = 4$ quarters of 2019:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = 214.627 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = 6.960 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = 256.332 \]

b.

From Exercise 14.30, the forecasts for the 4 quarters of 2019 using the Holt's forecasts with $w = .7$ and $v = .5$ are:

$F_{2019,Q1} = F_{t+1} = E_t + T_t = 2{,}648.806 + (-61.000) = 2{,}587.806$
$F_{2019,Q2} = F_{t+2} = E_t + 2T_t = 2{,}648.806 + 2(-61.000) = 2{,}526.806$
$F_{2019,Q3} = F_{t+3} = E_t + 3T_t = 2{,}648.806 + 3(-61.000) = 2{,}465.806$
$F_{2019,Q4} = F_{t+4} = E_t + 4T_t = 2{,}648.806 + 4(-61.000) = 2{,}404.806$

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = 499.619 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = 16.384 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = 542.290 \]

c.

All three measures of error are smaller for the Holt's series with $w = .3$ and $v = .5$ than for the Holt's series with $w = .7$ and $v = .5$. Thus, the more accurate series is the Holt's series with $w = .3$ and $v = .5$.

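14.36

a.

From Exercise 14.31, the actual data and the forecasts using the exponential smoothing and the Holt's forecasts are:

Year  Month  Price     Smoothed Forecast  Holt's Forecast
2019  Jan    1,291.75  1,234.37           1230.251
2019  Feb    1,320.07  1,234.37           1240.129
2019  Mar    1,300.90  1,234.37           1250.007
2019  Apr    1,285.91  1,234.37           1259.885
2019  May    1,283.70  1,234.37           1269.763
2019  Jun    1,359.04  1,234.37           1279.641
2019  Jul    1,412.89  1,234.37           1289.519
2019  Aug    1,500.41  1,234.37           1299.397
2019  Sep    1,510.58  1,234.37           1309.275

For the exponential smoothing forecasts with $w = .5$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|1291.75 - 1234.37| + |1320.07 - 1234.37| + \cdots + |1510.58 - 1234.37|}{9} = \frac{1155.94}{9} = 128.438 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|1291.75 - 1234.37|}{1291.75} + \frac{|1320.07 - 1234.37|}{1320.07} + \cdots + \frac{|1510.58 - 1234.37|}{1510.58}}{9} \right] 100 = 9.081 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1291.75 - 1234.37)^2 + (1320.07 - 1234.37)^2 + \cdots + (1510.58 - 1234.37)^2}{9}} = 154.431 \]

For the Holt's forecasts with $w = .5$ and $v = .5$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|1291.75 - 1230.251| + |1320.07 - 1240.129| + \cdots + |1510.58 - 1309.275|}{9} = \frac{837.38}{9} = 93.043 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|1291.75 - 1230.251|}{1291.75} + \frac{|1320.07 - 1240.129|}{1320.07} + \cdots + \frac{|1510.58 - 1309.275|}{1510.58}}{9} \right] 100 = 6.571 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1291.75 - 1230.251)^2 + (1320.07 - 1240.129)^2 + \cdots + (1510.58 - 1309.275)^2}{9}} = 113.573 \]

For all three measures of forecast errors, the Holt's forecasts had smaller errors than the exponential smoothing forecasts. Thus, the Holt's forecasts are better.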
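The three accuracy measures used in part a can be checked with a few lines of code. This is a minimal sketch, assuming plain Python lists for the actual prices and forecasts; the function name `forecast_accuracy` is hypothetical. The forecasts are rebuilt from the December 2018 values in Exercise 14.31 ($E_t = 1{,}234.368$ for exponential smoothing; $E_t = 1{,}220.373$ and $T_t = 9.878$ for Holt's).

```python
import math


def forecast_accuracy(actual, forecast):
    """Return (MAD, MAPE, RMSE) for paired actual and forecast values."""
    errors = [y - f for y, f in zip(actual, forecast)]
    m = len(errors)
    mad = sum(abs(e) for e in errors) / m
    mape = 100 * sum(abs(e) / y for e, y in zip(errors, actual)) / m
    rmse = math.sqrt(sum(e * e for e in errors) / m)
    return mad, mape, rmse


# Actual 2019 prices, January through September.
prices = [1291.75, 1320.07, 1300.90, 1285.91, 1283.70,
          1359.04, 1412.89, 1500.41, 1510.58]

# Exponential smoothing: every forecast equals the December 2018 smoothed value.
smoothed_forecast = [1234.368] * len(prices)

# Holt's method: F_{t+n} = E_t + n*T_t from the December 2018 level and trend.
holt_forecast = [1220.373 + n * 9.878 for n in range(1, len(prices) + 1)]

es = forecast_accuracy(prices, smoothed_forecast)
holt = forecast_accuracy(prices, holt_forecast)

print("Exponential smoothing:", [round(x, 3) for x in es])
print("Holt's:               ", [round(x, 3) for x in holt])
```

Because every measure averages over the same nine months, the two methods can be compared measure by measure, which is the comparison made in the conclusion above.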

b.

From Exercise 14.31, the actual data and the forecasts using the exponential smoothing one-step-ahead and the Holt's one-step-ahead forecasts are:

Year  Month  Price     Exponential Forecast  Holt's Forecast
2019  Jan    1,291.75  1,234.37              1230.251
2019  Feb    1,320.07  1,263.06              1286.252
2019  Mar    1,300.90  1,291.57              1336.868
2019  Apr    1,285.91  1,296.23              1343.599
2019  May    1,283.70  1,291.07              1325.047
2019  Jun    1,359.04  1,287.39              1304.329
2019  Jul    1,412.89  1,323.21              1345.318
2019  Aug    1,500.41  1,368.05              1409.631
2019  Sep    1,510.58  1,434.23              1508.242

For the exponential smoothing one-step-ahead forecasts with $w = .5$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|1291.75 - 1234.37| + |1320.07 - 1263.06| + \cdots + |1510.58 - 1434.23|}{9} = \frac{511.45}{9} = 56.828 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|1291.75 - 1234.37|}{1291.75} + \frac{|1320.07 - 1263.06|}{1320.07} + \cdots + \frac{|1510.58 - 1434.23|}{1510.58}}{9} \right] 100 = 4.039 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1291.75 - 1234.37)^2 + (1320.07 - 1263.06)^2 + \cdots + (1510.58 - 1434.23)^2}{9}} = 69.374 \]

For the Holt's one-step-ahead forecasts with $w = .5$ and $v = .5$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|1291.75 - 1230.251| + |1320.07 - 1286.252| + \cdots + |1510.58 - 1508.242|}{9} = \frac{445.72}{9} = 49.525 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|1291.75 - 1230.251|}{1291.75} + \frac{|1320.07 - 1286.252|}{1320.07} + \cdots + \frac{|1510.58 - 1508.242|}{1510.58}}{9} \right] 100 = 3.645 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(1291.75 - 1230.251)^2 + (1320.07 - 1286.252)^2 + \cdots + (1510.58 - 1508.242)^2}{9}} = 54.836 \]

For all three measures of forecast errors, the Holt's forecasts have smaller errors than the exponentially smoothed forecasts. Thus, the Holt's forecasts are better.

14.37

a.

To compute the exponentially smoothed values, we follow these steps:

$E_1 = Y_1 = 64{,}764$
$E_2 = wY_2 + (1 - w)E_1 = .8(65{,}743) + (1 - .8)(64{,}764) = 65{,}547.2$
$E_3 = wY_3 + (1 - w)E_2 = .8(66{,}470) + (1 - .8)(65{,}547.2) = 66{,}285.44$

The rest of the values are computed in a similar manner and are listed in the table:

Year  Enroll  Exponentially Smoothed (w = .8)  Holt's Et (w = .8)  Holt's Tt (v = .7)
1995  64,764  64764.00
1996  65,743  65547.20                         65743.00             979.00
1997  66,470  66285.44                         66520.40             837.88
1998  66,983  66843.49                         67058.06             627.72
1999  67,667  67502.30                         67670.76             617.21
2000  68,146  68017.26                         68174.39             537.71
2001  69,936  69552.25                         69691.22            1223.09
2002  71,215  70882.45                         71154.86            1391.48
2003  71,442  71330.09                         71662.87             773.05
2004  71,688  71616.42                         71837.58             354.21
2005  72,075  71983.28                         72098.36             288.81
2006  73,318  73051.06                         73131.83             810.07
2007  73,865  73702.21                         73880.38             767.01
2008  74,079  74003.64                         74192.68             448.71
2009  77,288  76631.13                         76758.68            1930.81
2010  78,519  78141.43                         78553.10            1835.34
2011  79,043  78862.69                         79312.09            1081.89
2012  78,426  78513.34                         78819.60             -20.18
2013  77,772  77920.27                         77977.48            -595.53
2014  77,214  77355.25                         77247.59            -689.58
2015  77,056  77115.85                         76956.40            -410.71
2016  77,232  77208.77                         77094.74             -26.38
2017  76,409
2018  76,346
2019  76,476

The forecasts for 2017-2019 using the exponential smoothing series with $w = .8$ are:

$F_{2017} = F_{t+1} = E_{2016} = 77{,}208.77$
$F_{2018} = F_{t+2} = F_{t+1} = 77{,}208.77$
$F_{2019} = F_{t+3} = F_{t+1} = 77{,}208.77$

b.

To compute the Holt's values with $w = .8$ and $v = .7$:

$E_2 = Y_2 = 65{,}743$
$T_2 = Y_2 - Y_1 = 65{,}743 - 64{,}764 = 979$
$E_3 = wY_3 + (1 - w)(E_2 + T_2) = .8(66{,}470) + (1 - .8)(65{,}743 + 979) = 66{,}520.40$
$T_3 = v(E_3 - E_2) + (1 - v)T_2 = .7(66{,}520.4 - 65{,}743) + (1 - .7)(979) = 837.88$

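The rest of the $E_t$'s and $T_t$'s appear in the table in part a. The forecasts for 2017-2019 using the Holt's series with $w = .8$ and $v = .7$ are:

$F_{2017} = F_{t+1} = E_t + T_t = 77{,}094.74 + (-26.38) = 77{,}068.36$
$F_{2018} = F_{t+2} = E_t + 2T_t = 77{,}094.74 + 2(-26.38) = 77{,}041.98$
$F_{2019} = F_{t+3} = E_t + 3T_t = 77{,}094.74 + 3(-26.38) = 77{,}015.60$

c.

For the exponential smoothing forecasts with $w = .8$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|76{,}409 - 77{,}208.77| + |76{,}346 - 77{,}208.77| + |76{,}476 - 77{,}208.77|}{3} = 798.44 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|76{,}409 - 77{,}208.77|}{76{,}409} + \frac{|76{,}346 - 77{,}208.77|}{76{,}346} + \frac{|76{,}476 - 77{,}208.77|}{76{,}476}}{3} \right] 100 = 1.045 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(76{,}409 - 77{,}208.77)^2 + (76{,}346 - 77{,}208.77)^2 + (76{,}476 - 77{,}208.77)^2}{3}} = 800.199 \]

For the Holt's forecasts with $w = .8$ and $v = .7$:

\[ \text{MAD} = \frac{\sum |Y_t - F_t|}{m} = \frac{|76{,}409 - 77{,}068.36| + |76{,}346 - 77{,}041.98| + |76{,}476 - 77{,}015.60|}{3} = 631.65 \]

\[ \text{MAPE} = \left[ \frac{\sum \left| \frac{Y_t - F_t}{Y_t} \right|}{m} \right] 100 = \left[ \frac{\frac{|76{,}409 - 77{,}068.36|}{76{,}409} + \frac{|76{,}346 - 77{,}041.98|}{76{,}346} + \frac{|76{,}476 - 77{,}015.60|}{76{,}476}}{3} \right] 100 = 0.827 \]

\[ \text{RMSE} = \sqrt{\frac{\sum (Y_t - F_t)^2}{m}} = \sqrt{\frac{(76{,}409 - 77{,}068.36)^2 + (76{,}346 - 77{,}041.98)^2 + (76{,}476 - 77{,}015.60)^2}{3}} = 635.167 \]

For all three measures of forecast errors, the Holt's forecasts have smaller errors than the exponential smoothing forecasts. Thus, the Holt's forecasts are better.

14.38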

a.

Using MINITAB, the output is:

Regression Analysis: Price versus t

The regression equation is
Price = 24.7 + 0.0910 t

Predictor     Coef  SE Coef      T      P
Constant   24.6975   0.7851  31.46  0.000
t          0.09103  0.08119   1.12  0.281

S = 1.497   R-Sq = 8.2%   R-Sq(adj) = 1.7%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   2.817  2.817  1.26  0.281
Residual Error  14  31.379  2.241
Total           15  34.197

The fitted model is $\hat{Y}_t = 24.6975 + .09103t$.

b.

The estimates of the parameters in the model, $E(Y_t) = \beta_0 + \beta_1 t$, are:

$\hat{\beta}_0 = 24.6975$. The price is estimated to be 24.6975 cents/pound for $t = 0$, or for 2004.

$\hat{\beta}_1 = .09103$. The price is estimated to increase by .09103 cents/pound for each additional year.

c.

The forecast for 2021, using $t = 17$, is $\hat{Y}_{2021} = 24.6975 + .09103(17) = 26.2450$.

The forecast for 2022, using $t = 18$, is $\hat{Y}_{2022} = 24.6975 + .09103(18) = 26.3360$.

d.

Using MINITAB,

Predicted Values for New Observations

New Obs     t     Fit  SE Fit  95.0% CI          95.0% PI
1        17.0  26.245   0.785  (24.561, 27.929)  (22.619, 29.871)
2        18.0  26.336   0.857  (24.497, 28.175)  (22.636, 30.036)

From the printout, the 95% forecast intervals are:

2021: (22.619, 29.871)
2022: (22.636, 30.036)

We are 95% confident that the actual price in 2021 will be between 22.619 and 29.871. We are 95% confident that the actual price in 2022 will be between 22.636 and 30.036.

14.39

a.

Let $x_1 = 1$ if quarter 1, 0 otherwise; $x_2 = 1$ if quarter 2, 0 otherwise; $x_3 = 1$ if quarter 3, 0 otherwise; and $t$ = time = 1, 2, ..., 40.

The model is $E(Y_t) = \beta_0 + \beta_1 t + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3$.

b.

Using MINITAB, the output is:

Regression Analysis: Y versus T, X1, X2, X3

The regression equation is
Y = 11.5 + 0.510 T - 3.95 X1 - 2.09 X2 - 4.52 X3

Predictor      Coef   SE Coef       T      P
Constant    11.4933    0.2420   47.49  0.000
T          0.509848  0.007607   67.02  0.000
X1          -3.9505    0.2483  -15.91  0.000
X2          -2.0903    0.2477   -8.44  0.000
X3          -4.5202    0.2473  -18.28  0.000

S = 0.5528   R-Sq = 99.3%   R-Sq(adj) = 99.2%

Analysis of Variance

Source          DF       SS      MS        F      P
Regression       4  1558.79  389.70  1275.44  0.000
Residual Error  35    10.69    0.31
Total           39  1569.48

Source  DF   Seq SS
T        1  1433.96
X1       1    22.56
X2       1     0.21
X3       1   102.06

The fitted model is $\hat{Y}_t = 11.4933 + .5098t - 3.9505x_1 - 2.0903x_2 - 4.5202x_3$.

To determine if the model is adequate, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 1{,}275.44$.

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 4$ and $\nu_2 = n - (k + 1) = 40 - (4 + 1) = 35$. From Table VI, Appendix D, $F_{.05} \approx 2.69$. The rejection region is $F > 2.69$.

Since the observed value of the test statistic falls in the rejection region ($F = 1{,}275.44 > 2.69$), $H_0$ is rejected. There is sufficient evidence to indicate the model is useful at $\alpha = .05$.

c.

From MINITAB, the predicted values and prediction intervals are:

Predicted Values for New Observations

New Obs     T  X1  X2  X3      Fit  SE Fit  95.0% CI            95.0% PI
1        41.0   1   0   0  28.4467  0.2420  (27.9554, 28.9379)  (27.2217, 29.6716)
2        42.0   0   1   0  30.8167  0.2420  (30.3254, 31.3079)  (29.5917, 32.0416)
3        43.0   0   0   1  28.8967  0.2420  (28.4054, 29.3879)  (27.6717, 30.1216)
4        44.0   0   0   0  33.9267  0.2420  (33.4354, 34.4179)  (32.7017, 35.1516)

From the above output, the predicted values and 95% prediction intervals are:

For year = 11, quarter = 1, $\hat{Y}_{41} = 28.4467$ and the 95% PI is (27.22, 29.67).
For year = 11, quarter = 2, $\hat{Y}_{42} = 30.8167$ and the 95% PI is (29.59, 32.04).
For year = 11, quarter = 3, $\hat{Y}_{43} = 28.8967$ and the 95% PI is (27.67, 30.12).
For year = 11, quarter = 4, $\hat{Y}_{44} = 33.9267$ and the 95% PI is (32.70, 35.15).

14.40

The major advantage of regression forecasts over the exponentially smoothed forecasts is that prediction intervals can be formed using the regression forecasts and not using the exponentially smoothed forecasts. Future forecasts using exponential smoothing are all equal to the last exponentially smoothed value. The main problem with regression forecasts is that we have to assume that the trend in the future remains the same. This is not necessarily true. Depending on what happens in the future, either the regression forecasts or the exponential smoothing forecasts could be more accurate.

14.41

a.

Using MINITAB, the results are:

Regression Equation

IntRate = 8.437 - 0.1884 t

Coefficients

Term         Coef  SE Coef  T-Value  P-Value   VIF
Constant    8.437    0.189    44.63    0.000
t         -0.1884   0.0120   -15.68    0.000  1.00

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.513639  90.43%     90.06%      89.00%

Settings

Variable  Setting
t              28

Prediction

    Fit    SE Fit  95% CI              95% PI
3.16246  0.199457  (2.75247, 3.57245)  (2.02985, 4.29507)

The fitted model is: $\hat{Y}_t = 8.437 - .1884t$

b.

For 2020, 𝑡 = 28. The forecast for the average interest rate in 2020 is shown in the printout as 3.162. From the printout, the 95% prediction interval is (2.030, 4.295).
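The prediction interval on the printout can be rebuilt from the other printed quantities. The sketch below is an illustration under two assumptions: the usual simple linear regression formula, fit ± t·sqrt(S² + (SE Fit)²), applies, and the t critical value (not shown on the printout) can be backed out of the printed 95% confidence interval, whose half-width is t·(SE Fit).

```python
import math

# Values read from the MINITAB printout in part a.
fit = 3.16246        # point forecast at t = 28
se_fit = 0.199457    # standard error of the fitted mean
s = 0.513639         # regression standard deviation S
ci_upper = 3.57245   # upper limit of the printed 95% CI

# Back out the t critical value from the CI half-width (assumption: the
# CI is fit +/- t_crit * se_fit).
t_crit = (ci_upper - fit) / se_fit

# A prediction interval adds the error variance S^2 to the variance of the fit.
half_width = t_crit * math.sqrt(s**2 + se_fit**2)
prediction_interval = (fit - half_width, fit + half_width)

print(round(t_crit, 3))
print(tuple(round(limit, 3) for limit in prediction_interval))
```

The interval for an individual year is wider than the confidence interval for the mean because it must also cover the year-to-year random error, not only the uncertainty in the fitted line.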

14.42

a.

Using $t = 1$ for 2000, the results using MINITAB are:

Coefficients

Term         Coef  SE Coef  T-Value  P-Value   VIF
Constant    6.230    0.709     8.79    0.000
t         -0.1796   0.0606    -2.96    0.008  1.00

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
1.68214  31.60%     28.00%      17.17%

Settings

Variable  Setting
t              21

Prediction

    Fit    SE Fit  95% CI               95% PI
2.45824  0.761179  (0.865071, 4.05140)  (-1.40621, 6.32268)

Settings

Variable  Setting
t              22

Prediction

    Fit    SE Fit  95% CI               95% PI
2.27864  0.814809  (0.573225, 3.98406)  (-1.63342, 6.19070)

From the printout:

$\hat{\beta}_0 = 6.230$. The price of gas is estimated to be 6.230 dollars per 1,000 cubic feet in 1999.

$\hat{\beta}_1 = -.1796$. For each additional year, the price of gas is estimated to decrease by .1796 dollars per 1,000 cubic feet.

b.

To determine the model fit, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = -2.96$ and the p-value is $p = .008$. Since the p-value is small, $H_0$ is rejected for any reasonable value of $\alpha$. There is sufficient evidence that the model has an adequate fit.

c.

The 95% prediction interval for 2021 is (−1.406, 6.323). We are 95% confident that the actual annual price of natural gas in 2021 is between $0 (since price cannot be negative) and $6.323 per 1,000 cubic feet.

The 95% prediction interval for 2022 is (−1.633, 6.191). We are 95% confident that the actual annual price of natural gas in 2022 is between $0 (since price cannot be negative) and $6.191 per 1,000 cubic feet.

d.

There are basically two problems with using simple linear regression for predicting time series data. First, we must predict values of the time series for values of time outside the observed range. We observe data for time periods 0, 1, 2, … , t and use the regression model to predict values of the time series for t + 1, t + 2, … . The second problem is that simple linear regression does not allow for any cyclical effects such as seasonal trends.

14.43

a.

The regression model would be: E (Yt ) = β0 + β1 X t

b.

First, create dummy variables:

$m_1 = 1$ if January, 0 if not; $m_2 = 1$ if February, 0 if not; . . . ; $m_{11} = 1$ if November, 0 if not.

The new model is: $E(Y_t) = \beta_0 + \beta_1 X_t + \beta_2 m_1 + \beta_3 m_2 + \cdots + \beta_{12} m_{11}$

c.

To determine if mean gasoline consumption varies from month to month, we test:

$H_0: \beta_2 = \beta_3 = \cdots = \beta_{12} = 0$
$H_a$: At least one $\beta_i \neq 0$, $i = 2, 3, \ldots, 12$

d.

Let $t = 1$ for January, 2009. Then for January, 2024, $t = 181$. The forecast would be $\hat{Y}_{181} = \hat{\beta}_0 + \hat{\beta}_1(181) + \hat{\beta}_2$.

14.44

a.

The model would be: E ( Yt ) = β0 + β1 x1t + β2 x2t + β3 x3t + β4 x4t + β5 x5t

b.

R 2 = .91 . 91% of the sample variability in the percentage of two-party vote won by the incumbent party’s candidate is accounted for by the model that includes the 5 independent variables.

c.

To determine if the model is adequate, we test: H 0 : β1 = β 2 = β 3 = β 4 = β 5 = 0 H a : At least one β i ≠ 0

The test statistic is

\[ F = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]} = \frac{.91/5}{(1 - .91)/[24 - (5 + 1)]} = 36.4 \]

The rejection region requires $\alpha = .05$ in the upper tail of the F-distribution with $\nu_1 = k = 5$ and $\nu_2 = n - (k + 1) = 24 - (5 + 1) = 18$. From Table VI, Appendix D, $F_{.05} = 2.77$. The rejection region is $F > 2.77$.

Since the observed value of the test statistic falls in the rejection region ($F = 36.4 > 2.77$), $H_0$ is rejected. There is sufficient evidence to indicate the model is adequate at $\alpha = .05$.

d.

$\hat{\beta}_1 = -4.08$. The mean percentage of the two-party vote won by the incumbent party’s candidate is estimated to be 4.08 less if the fiscal policy of the incumbent in the election year was expansion rather than not, holding all other variables constant.

e.

βˆ2 = −3.41 . For each additional number of consecutive terms served by the incumbent party, the mean percentage of the two-party vote won by the incumbent party’s candidate is estimated to decrease by 3.41, holding all other variables constant.

f.

βˆ3 = −4.84 .The mean percentage of the two-party vote won by the incumbent party’s candidate is estimated to be 4.84 less if the party of the incumbent in the election year is Democrat rather than Republican, holding all other variables constant.

g.

βˆ4 = .92 . For each additional number of quarters of the previous administration where the GDP > 3.2%, the mean percentage of the two-party vote won by the incumbent party’s candidate is estimated to increase by .92, holding all other variables constant.

h.

βˆ5 = .66 . For each additional unit increase in the growth rate of the GDP in the first 3 quarters of the election year, the mean percentage of the two-party vote won by the incumbent party’s candidate is estimated to increase by .66, holding all other variables constant.

i.

s = 2.36 . We would estimate that most of the observed values of the percentage of the two-party vote won by the incumbent party’s candidate will fall within 2s = 2 ( 2.36) = 4.72 units of their predicted values.

14.45

j.

Yes. The $R^2$ value is close to 1 and 4 terms in the model are statistically significant. However, the standard deviation is rather large. The actual values of the percentage of the two-party vote won by the incumbent party’s candidate can vary 4.72 percentage points above or below the predicted value. This is a range of 9.44 percentage points. One might not be able to actually predict the outcome of the election with a range this wide.
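The global F statistic in part c depends only on $R^2$, the sample size $n$, and the number of predictors $k$. A minimal sketch of that calculation follows; the function name `global_f_from_r2` is hypothetical.

```python
def global_f_from_r2(r_squared, n, k):
    """Global F statistic: (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    numerator = r_squared / k
    denominator = (1 - r_squared) / (n - (k + 1))
    return numerator / denominator


# Exercise 14.44: R^2 = .91, n = 24 elections, k = 5 independent variables.
f_stat = global_f_from_r2(0.91, n=24, k=5)
print(round(f_stat, 1))
```

The statistic is compared with the upper-tail critical value of the F-distribution with $k$ and $n - (k + 1)$ degrees of freedom, as in part c.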

a.

Using $t = 1$ for 1990, the results using MINITAB are:

Coefficients

Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant  400.56     9.56    41.92    0.000
t         -4.166    0.556    -7.49    0.000  1.00

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
25.0658  67.50%     66.30%      63.85%

Settings

Variable  Setting
t              30

Prediction

    Fit   SE Fit  95% CI              95% PI
275.579  9.55530  (255.973, 295.185)  (220.538, 330.620)

Settings

Variable  Setting
t              31

Prediction

    Fit   SE Fit  95% CI              95% PI
271.413  10.0448  (250.803, 292.023)  (216.006, 326.820)

The fitted model is: $\hat{Y}_t = 400.56 - 4.166t$

b.

From the printout, the forecasted values for 2019 and 2020 ($t = 30$ and $t = 31$) are:

2019: 275.579
2020: 271.413

c.

From the printout, the 95% prediction intervals for 2019 and 2020 are:

2019: (220.538, 330.620)
2020: (216.006, 326.820)

14.46

a.

The regression model is: $E(Y_t) = \beta_0 + \beta_1 t + \beta_2 Q_1 + \beta_3 Q_2 + \beta_4 Q_3$

b.

Using MINITAB, the output is:

Regression Analysis: Sales versus t, Q1, Q2, Q3

The regression equation is
Sales = 120 + 16.5 t + 262 Q1 + 223 Q2 + 106 Q3

Predictor    Coef  SE Coef      T      P
Constant   119.85    16.95   7.07  0.000
t          16.512    1.028  16.07  0.000
Q1         262.34    16.73  15.68  0.000
Q2         222.83    16.57  13.45  0.000
Q3         105.51    16.48   6.40  0.000

S = 26.00   R-Sq = 96.9%   R-Sq(adj) = 96.1%

Analysis of Variance

Source          DF      SS     MS       F      P
Regression       4  318560  79640  117.82  0.000
Residual Error  15   10139    676
Total           19  328700

Source  DF  Seq SS
t        1  114343
Q1       1   81883
Q2       1   94610
Q3       1   27724

Predicted Values for New Observations

New Obs     t  Q1  Q2  Q3     Fit  SE Fit  95.0% CI          95.0% PI
1        21.0   1   0   0  728.95   16.95  (692.82, 765.08)  (662.80, 795.10)
2        22.0   0   1   0  705.95   16.95  (669.82, 742.08)  (639.80, 772.10)
3        23.0   0   0   1  605.15   16.95  (569.02, 641.28)  (539.00, 671.30)
4        24.0   0   0   0  516.15   16.95  (480.02, 552.28)  (450.00, 582.30)

The least squares equation is: $\hat{Y}_t = 119.85 + 16.512t + 262.34Q_1 + 222.83Q_2 + 105.51Q_3$

$\hat{\beta}_1 = 16.512$. For every increase in time period (1 quarter), the mean sales index increases by an estimated 16.512.

$\hat{\beta}_2 = 262.34$. The difference in mean sales index between the first and fourth quarters is estimated to be 262.34.

$\hat{\beta}_3 = 222.83$. The difference in the mean sales index between the second and fourth quarters is estimated to be 222.83.

$\hat{\beta}_4 = 105.51$. The difference in the mean sales index between the third and fourth quarters is estimated to be 105.51.

To determine if the model is useful, we test:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: At least one $\beta_i \neq 0$

The test statistic is $F = 117.82$ and the p-value is $p = .000$.

Since the p-value is so small, $H_0$ is rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate the model is useful for any reasonable value of $\alpha$.

c.

The assumption of independent error terms is in doubt.

d.

The forecasts and the 95% prediction intervals are found at the bottom of the printout and are:

Year  Quarter  Forecast  95% Lower Limit  95% Upper Limit
2021  I        728.95    662.8            795.1
2021  II       705.95    639.8            772.1
2021  III      605.15    539.0            671.3
2021  IV       516.15    450.0            582.3

14.47

a.

The regression model for liquidity risk at quarter t as a linear function of DELR in the previous quarter is: E (Yt ) = β0 + β1 X 1,t −1 .

b.

The new model is: E (Yt ) = β0 + β1 X 1,t −1 + β2 X 2,t −1 + β3 X 3,t −1 + β4 X 4,t −1 .

c.

The new model is: $E(Y_t) = \beta_0 + \beta_1 X_{1,t-1} + \beta_2 Q_1 + \beta_3 Q_2 + \beta_4 Q_3$

where $Q_1 = 1$ if Quarter 1, 0 if not; $Q_2 = 1$ if Quarter 2, 0 if not; and $Q_3 = 1$ if Quarter 3, 0 if not.
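Quarterly dummy variables of this kind work the same way in any seasonal model: at most one indicator equals 1, and the fourth quarter is the base level with all indicators 0. As a sketch only, the code below applies that coding to the fitted equation reported in Exercise 14.46 (trend plus quarter dummies), since no fitted coefficients are given for the liquidity risk model; the helper names are hypothetical.

```python
# Fitted coefficients from the MINITAB printout in Exercise 14.46.
COEFFICIENTS = {"intercept": 119.85, "t": 16.512,
                "Q1": 262.34, "Q2": 222.83, "Q3": 105.51}


def quarter_dummies(quarter):
    """Return the 0/1 indicators Q1, Q2, Q3; quarter 4 is the base level."""
    return {f"Q{q}": 1 if quarter == q else 0 for q in (1, 2, 3)}


def forecast(t, quarter):
    """Evaluate the fitted equation at time period t in the given quarter."""
    value = COEFFICIENTS["intercept"] + COEFFICIENTS["t"] * t
    for name, indicator in quarter_dummies(quarter).items():
        value += COEFFICIENTS[name] * indicator
    return value


# Quarters I-IV of 2021 correspond to t = 21, 22, 23, 24.
forecasts = [forecast(20 + q, q) for q in (1, 2, 3, 4)]
print([round(f, 2) for f in forecasts])
```

The results agree with the printed fits (728.95, 705.95, 605.15, 516.15) up to the rounding of the reported coefficients.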

14.48

Autocorrelation is the correlation between time series residuals at different points in time. In the presence of autocorrelated residuals, the regression analysis tends to produce inflated t-statistics. Consequently, an analyst has a greater than α probability of committing a Type I error when testing a model parameter.

14.49

a.

For α = .05 , the rejection region is d < d L,α = d L,.05 = 1.10 . The value of d L ,.05 is found in Table X, Appendix D, with k = 2, n = 20 , and α = .05 . Also, d U, .05 = 1.54 . Since the test statistic falls between d L ,.05 and dU ,.05 (1.10 ≤ 1.10 ≤ 1.54) , no decision can be made.

b.

For α = .01 , the rejection region is d < d L,α = d L,.01 = .86 . The value of d L ,.01 is found in Table XI, Appendix D, with k = 2, n = 20 , and α = .01 . Also, d U, .01 = 1.27 . Since the test statistic falls between d L ,.01 and dU ,.01 (.86 ≤ 1.10 ≤ 1.27 ) , no decision can be made.

c.

For $\alpha = .05$, the rejection region is $d < d_{L,\alpha} = d_{L,.05} = 1.44$. The value of $d_{L,.05}$ is found in Table X, Appendix D, with $k = 5$, $n = 65$, and $\alpha = .05$. Since the test statistic falls in the rejection region ($d = .95 < 1.44$), $H_0$ is rejected. There is sufficient evidence to indicate positive first-order autocorrelation at $\alpha = .05$.

d.

For $\alpha = .01$, the rejection region is $d < d_{L,\alpha} = d_{L,.01} = 1.15$. The value of $d_{L,.01}$ is found in Table XI, Appendix D, with $k = 1$, $n = 31$, and $\alpha = .01$. Also, $d_{U,.01} = 1.27$. Since the test statistic does not fall in the rejection region ($d = 1.35 \nless 1.15$), and the test statistic is above $d_{U,.01}$ ($d = 1.35 > 1.27$), $H_0$ is not rejected. There is insufficient evidence to indicate positive first-order autocorrelation at $\alpha = .01$.
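The decision rule applied in parts a through d has three outcomes: reject $H_0$ when $d < d_L$, do not reject when $d > d_U$, and make no decision in between. A minimal sketch, with a hypothetical function name and the table bounds passed in by hand:

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """One-tailed test for positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "no decision"


# Parts a, b, and d of this exercise (d, d_L, d_U as given in the solutions).
print(durbin_watson_decision(1.10, 1.10, 1.54))  # part a
print(durbin_watson_decision(1.10, 0.86, 1.27))  # part b
print(durbin_watson_decision(1.35, 1.15, 1.27))  # part d
```

Part c is omitted from the usage lines because only $d_L$ is quoted there; since $d = .95$ is below $d_L = 1.44$, the first branch applies regardless of $d_U$.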

14.50

14.51

a.

d = 3.9 indicates the residuals are very strongly negatively autocorrelated.

b.

d = .2 indicates the residuals are very strongly positively autocorrelated.

c.

d = 1.99 indicates the residuals are probably uncorrelated.
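These interpretations follow from how $d$ is built: the sum of squared differences between successive residuals divided by the sum of squared residuals, which ranges from 0 to 4. The sketch below uses two small made-up residual series (hypothetical values chosen only to show the extremes, not data from any exercise).

```python
def durbin_watson(residuals):
    """d = sum (e_t - e_{t-1})^2 / sum e_t^2, for t = 2, ..., n."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals: long runs of the same sign (positive autocorrelation).
runs = [1, 1, 1, -1, -1, -1]
# Hypothetical residuals: sign flips every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1]

d_runs = durbin_watson(runs)
d_alternating = durbin_watson(alternating)
print(round(d_runs, 3), round(d_alternating, 3))
```

Residuals that stay on one side of zero for long stretches push $d$ toward 0, while residuals that alternate in sign push it toward 4; uncorrelated residuals give values near 2.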

a.

To determine if positive first-order autocorrelation exists, we test: H0: No autocorrelation Ha: Positive autocorrelation exists

14.52

b.

The p-value is p < .0001 . Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate that positive autocorrelation exists for any reasonable value of α . The researchers should not proceed.

a.

To determine if the overall model contributes information for the prediction of monthly passenger car and light truck sales, we test: H 0 : β1 = β 2 = β 3 = β 4 = β 5 = 0 H a : At least one β i ≠ 0

The test statistic is F =

R2 / k

(1 − R ) / n − ( k +1) 2

=

.856 / 5 = 164.067 (1 − .856) / 144 − ( 5 +1)

The rejection region requires α = .05 in the upper tail of the F-distribution withν 1 = k = 5 and ν 2 = n – ( k + 1 ) = 144 – ( 5 + 1 ) = 138 . From Table VI, Appendix D, F.05 ≈ 2.29 . The rejection region is F > 2.29 . Since the observed value of the test statistic falls in the rejection region ( F = 1 64.06 7 > 2.29 ) , H0 is rejected. There is sufficient evidence to indicate the overall model contributes information for the prediction of monthly passenger car and light truck sales at α = .05 . b.

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 1.01. For α = .05, the rejection region is d < dL,α = dL,.05 ≈ 1.57. The value dL,.05 is found in Table X, Appendix D, with k = 5, n = 144, and α = .05.

Since the observed value of the test statistic falls in the rejection region ( d = 1.01 < 1.57 ), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

One of the requirements for the validity of the test in part a is that the error terms are independent. Since H0 was rejected in part b, there is evidence that positive autocorrelation exists. Since the error terms are not independent, the test in part a may not be valid.

14.53

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 1.77. For α = .05, the rejection region is d < dL,α = dL,.05 = .93. The value dL,.05 is found in Table X, Appendix D, with k = 5, n = 24, and α = .05.

Since the observed value of the test statistic does not fall in the rejection region ( d = 1.77 ≮ .93 ), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

14.54

a.

Using MINITAB, the plot of the residuals against t is:

Since there appear to be groups of consecutive positive and groups of consecutive negative residuals, the data appear to be autocorrelated. b.

Using MINITAB, the output is:

Regression Equation

IntRate = 8.437 - 0.1884 t

Durbin-Watson Statistic

Durbin-Watson Statistic = 1.10778

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 1.10778. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.33. The value dL,.05 is found in Table X, Appendix D, with k = 1, n = 28, and α = .05.

Since the observed value of the test statistic falls in the rejection region ( d = 1.10778 < 1.33 ), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

Since the error terms are dependent, the validity of the test for the model adequacy appears to be questionable.

14.55

a.

Using MINITAB, the plot of the residuals against t is:

Since there appear to be groups of consecutive positive and groups of consecutive negative residuals, the data appear to be autocorrelated.

b.

Using MINITAB, the output is:

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1   24.84  24.837     8.78    0.008
  t          1   24.84  24.837     8.78    0.008
Error       19   53.76   2.830
Total       20   78.60

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
1.68214  31.60%  28.00%     17.17%

Coefficients

Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant   6.409   0.761       8.42    0.000
t         -0.1796  0.0606     -2.96    0.008  1.00

Regression Equation

PRICE = 6.409 - 0.1796 t

Durbin-Watson Statistic

Durbin-Watson Statistic = 0.961790

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 0.961790. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.22. The value dL,.05 is found in Table X, Appendix D, with k = 1, n = 21, and α = .05.

Since the observed value of the test statistic falls in the rejection region ( d = 0.961790 < 1.22 ), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

Since the error terms are dependent, the validity of the test for the model adequacy appears to be questionable.

14.56

a.

Using MINITAB, the plot of the residuals against t is:

Since there appear to be groups of consecutive positive and groups of consecutive negative residuals, the data appear to be autocorrelated.

b.

Using MINITAB, the output is:

Regression Equation

Policies = 400.56 - 4.166 t

Durbin-Watson Statistic

Durbin-Watson Statistic = 0.271249

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 0.271249. For α = .05, the rejection region is d < dL,α = dL,.05 = 1.34. The value dL,.05 is found in Table X, Appendix D, with k = 1, n = 29, and α = .05.

Since the observed value of the test statistic falls in the rejection region ( d = 0.271249 < 1.34 ), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05.

c.

Since the error terms do not appear to be independent, the validity of the test for model adequacy is in question.

14.57

a.

We can explain 90% of the variation of the ln(FDI) values around their mean using the regression model that includes the variables ln(X1), ln(X2), and ln(X1)ln(X2).

b.

To determine if the model is useful for predicting ln(FDI), we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

The test statistic is F = 315.8 and the p-value is p < .01. Since the p-value is less than α ( p < .01 ), H0 is rejected. There is sufficient evidence that the model is useful for predicting ln(FDI) at α = .01.

c.

To determine if capital stock and level of employment interact, we test:

H0: β3 = 0
Ha: β3 ≠ 0

The test statistic is t = 4.28 and the p-value is p < .01. Since the p-value is less than α ( p < .01 ), H0 is rejected. There is sufficient evidence to indicate capital stock and level of employment interact at α = .01.

d.

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = 1.62. For α = .01, the rejection region is d < dL,α = dL,.01 = .99. The value dL,.01 is found in Table XI, Appendix D, with k = 3, n = 29, and α = .01.

Since the observed value of the test statistic does not fall in the rejection region ( d = 1.62 ≮ .99 ), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

e.

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation
Ha: Positive first-order autocorrelation of residuals

The test statistic is d = .62. For α = .01, the rejection region is d < dL,α = dL,.01 = .99. The value dL,.01 is found in Table XI, Appendix D, with k = 3, n = 29, and α = .01.

Since the observed value of the test statistic does fall in the rejection region ( d = .62 < .99 ), H0 is rejected. There is sufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.
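The Durbin-Watson decisions in parts d and e can be checked with a short Python sketch. The function names are illustrative and not part of the text or of MINITAB; the statistic is the usual ratio of squared successive residual differences to squared residuals, and the decision rule is the one-tailed rule d < dL used above. The optional upper bound dU is an assumption added here to flag the inconclusive region when a table value for it is available.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


def positive_autocorrelation_decision(d, d_lower, d_upper=None):
    """One-tailed decision for Ha: positive first-order autocorrelation.

    d_lower is the tabled dL; d_upper (optional) is the tabled dU.
    """
    if d < d_lower:
        return "reject H0"
    if d_upper is not None and d <= d_upper:
        return "inconclusive"
    return "do not reject H0"


# Hypothetical residuals that alternate in sign (strong negative autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))            # 3.0

# Values from parts d and e: dL,.01 = .99 with k = 3, n = 29
print(positive_autocorrelation_decision(1.62, 0.99))    # do not reject H0
print(positive_autocorrelation_decision(0.62, 0.99))    # reject H0
```

Values of d near 2 suggest uncorrelated residuals, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.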

14.58

a.

The simple composite index is found by summing the three worker quantities, dividing by 323.4, the sum for the base period, 2000, and multiplying by 100. The values appear in the table.

Year  Fully Permanent  Fully Not Permanent  Event of Disability  Total Quantity  Index
2000  140.4            44.9                 138.1                323.4           100.0
2001  142.2            45.3                 140                  327.5           101.3
2002  144.1            45.3                 141.3                330.7           102.3
2003  146              45                   142.4                333.4           103.1
2004  148.1            44.7                 143.8                336.6           104.1
2005  150.2            44.7                 145.5                340.4           105.3
2006  152.4            44.8                 147.3                344.5           106.5
2007  154.6            45                   148.9                348.5           107.8
2008  156.6            45                   149.9                351.5           108.7
2009  158.5            44.6                 149.6                352.7           109.1
2010  160.3            44                   148.9                353.2           109.2
2011  161.9            43.8                 148.9                354.6           109.6
2012  163.4            44.1                 149.5                357             110.4
2013  165.1            44.6                 149.8                359.5           111.2
2014  166.8            45.4                 150.6                362.8           112.2
2015  168.5            46.2                 151.5                366.2           113.2
2016  170.2            47.3                 152.6                370.1           114.4
2017  172.2            47.5                 154.1                373.8           115.6
2018  174.3            47.6                 155.5                377.4           116.7
2019  176.2            47.7                 155.9                379.8           117.4

b.

This is a quantity index because it is based on the numbers of workers rather than prices.

c.

The index value for 2019 is 117.4. This means that the total number of insured workers in 2019 is 117.4 − 100 = 17.4% higher than in 2000.

14.59

a.

Compute the exponentially smoothed values E1, E2, …, Et for years 2000 to 2019:

E1 = Y1 = 140.4

For w = .5,

E2 = wY2 + (1 − w)E1 = .5(142.2) + (1 − .5)(140.4) = 141.3
E3 = wY3 + (1 − w)E2 = .5(144.1) + (1 − .5)(141.3) = 142.7

The rest of the values appear in the table.

Year  Fully Permanent  Exponentially Smoothed w = .5
2000  140.4            140.4
2001  142.2            141.3
2002  144.1            142.7
2003  146              144.4
2004  148.1            146.2
2005  150.2            148.2
2006  152.4            150.3
2007  154.6            152.5
2008  156.6            154.5
2009  158.5            156.5
2010  160.3            158.4
2011  161.9            160.2
2012  163.4            161.8
2013  165.1            163.4
2014  166.8            165.1
2015  168.5            166.8
2016  170.2            168.5
2017  172.2            170.4
2018  174.3            172.3
2019  176.2            174.3
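The recursion used to build the smoothed column can be sketched in a few lines of Python. The function name is illustrative only; the update is E1 = Y1 and Et = wYt + (1 − w)Et−1, as in part a. Because the sketch carries unrounded values forward, later entries may differ from the tabled values in the last digit.

```python
def exponential_smoothing(y, w):
    """Return the exponentially smoothed series E1, ..., En for weight w."""
    if not y:
        raise ValueError("series must not be empty")
    smoothed = [y[0]]                      # E1 = Y1
    for value in y[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


# First three "Fully Permanent" values from the table above
e = exponential_smoothing([140.4, 142.2, 144.1], w=0.5)
print([round(v, 1) for v in e])            # [140.4, 141.3, 142.7]
```

With simple exponential smoothing, every future forecast equals the last smoothed value, which is the limitation noted in part c.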

b.

Using MINITAB, the plot of the workers and the exponentially smoothed values is:

c.

To forecast using the exponentially smoothed values, we use the following:

For w = .5:

F2020 = Ft+1 = Et = 174.3
F2021 = Ft+2 = Ft+1 = 174.3

The drawback to these forecasts is that all the forecasts for future values are the same.


14.60

a.

To compute the Laspeyres index, multiply the price for each year by the quantity for each of the items for 1990, sum the products for the four items, divide by 14.05 (the sum for the base period 1990), and multiply by 100. The Laspeyres index is:

Year  Ground Beef  Spaghetti  Eggs  Potatoes  Total  Laspeyres
1990  1.63         0.85       1.00  0.32      14.05  100.00
1995  1.40         0.88       1.16  0.38      13.72   97.65
2000  1.63         0.88       0.96  0.35      14.37  102.28
2005  2.30         0.87       1.35  0.50      19.59  139.43
2010  2.38         1.19       1.79  0.58      21.87  155.66
2015  4.23         1.34       2.06  0.65      32.39  230.53

b.

From 1990 to 2015, the “basket” of foods increased by 230.53 − 100 = 130.53%.

14.61

a.

Using MINITAB, the output is:

Regression Analysis: Daily Visits versus t

The regression equation is
Daily Visits = 38.2 + 7.32 t

Predictor  Coef    SE Coef  T      P
Constant   38.171  4.420     8.64  0.000
t          7.3192  0.7123   10.27  0.000

S = 6.470   R-Sq = 93.0%   R-Sq(adj) = 92.1%

Analysis of Variance

Source          DF  SS      MS      F       P
Regression       1  4419.5  4419.5  105.57  0.000
Residual Error   8   334.9    41.9
Total            9  4754.4

Predicted Values for New Observations

New Obs  t     Fit     SE Fit  95.0% CI            95.0% PI
1        11.0  118.68  4.42    (108.49, 128.87)    (100.61, 136.75)
2        12.0  126.00  5.06    (114.33, 137.67)    (107.06, 144.94)
3        13.0  133.32  5.72    (120.13, 146.51)    (113.40, 153.24)

The fitted regression line is: Ŷt = 38.171 + 7.319t

The forecasts for the next 3 years are:

Ŷ11 = 38.171 + 7.319(11) = 118.68
Ŷ12 = 38.171 + 7.319(12) = 126.00
Ŷ13 = 38.171 + 7.319(13) = 133.32

b.

From the printout, the 95% prediction intervals for the 3 years are:

Year 11: (100.61, 136.75)
Year 12: (107.06, 144.94)
Year 13: (113.40, 153.24)

c.

There are basically two problems with using simple linear regression for predicting time series data. First, we must predict values of the time series for values of time outside the observed range. We observe data for time periods 1, 2, …, t and use the regression model to predict values of the time series for t + 1, t + 2, … . The second problem is that simple linear regression does not allow for any cyclical effects such as seasonal trends.

d.

We could use an exponentially smoothed series to forecast patient visits or we could use a Holt’s series to forecast patient visits.

14.62

To compute the Holt’s series, we use:

E2 = Y2 = 7.17
E3 = wY3 + (1 − w)(E2 + T2) = .3(8.28) + (1 − .3)(7.17 − 1.10) = 6.733

T2 = Y2 − Y1 = 7.17 − 8.27 = −1.10
T3 = v(E3 − E2) + (1 − v)T2 = .7(6.733 − 7.17) + (1 − .7)(−1.10) = −.636

The rest of the values appear in the table:

                 Holt’s
Year  Interest  Et (w = .3)  Tt (v = .7)
1992  8.27
1993  7.17      7.170        -1.100
1994  8.28      6.733        -0.636
1995  7.86      6.626        -0.266
1996  7.76      6.780         0.028
1997  7.57      7.037         0.188
1998  6.92      7.134         0.124
1999  7.46      7.318         0.167
2000  8.08      7.663         0.292
2001  7.01      7.672         0.093
2002  6.56      7.403        -0.160
2003  5.89      6.837        -0.444
2004  5.86      6.233        -0.556
2005  5.93      5.753        -0.503
2006  6.47      5.616        -0.247
2007  6.4       5.678        -0.030
2008  6.23      5.823         0.092
2009  5.38      5.754        -0.020
2010  4.86      5.472        -0.204
2011  4.45      5.022        -0.376
2012  3.66      4.351        -0.583
2013  3.98      3.832        -0.538
2014  4.17      3.556        -0.354
2015  3.85      3.396        -0.218
2016  3.65      3.320        -0.119
2017  3.99      3.438         0.047
2018  3.54      3.501         0.058
2019  3.78      3.626         0.105

The forecasts for 2020-2021 using the Holt’s series with w = .3 and v = .7 are:

F2020 = Ft+1 = Et + Tt = 3.626 + (.105) = 3.731
F2021 = Ft+2 = Et + 2Tt = 3.626 + 2(.105) = 3.836

From Exercise 14.41, the forecasts for 2020-2021 are:

2020: Ŷt = 8.437 − .1884t = 8.437 − .1884(28) = 3.16
2021: Ŷt = 8.437 − .1884t = 8.437 − .1884(29) = 2.973

The forecasts from the Holt’s series are larger than those of the regression forecasts.
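The Holt updates above lend themselves to a small Python sketch. The function names are illustrative, not taken from the text or from MINITAB. The sketch starts the series at the second observation (E2 = Y2, T2 = Y2 − Y1), as the solution does, and carries unrounded values forward, so results can differ from the table in the last digit.

```python
def holt(y, w, v):
    """Return (E, T): Holt smoothed and trend components for y[1], ..., y[-1]."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level = [y[1]]               # E2 = Y2
    trend = [y[1] - y[0]]        # T2 = Y2 - Y1
    for obs in y[2:]:
        new_level = w * obs + (1 - w) * (level[-1] + trend[-1])
        new_trend = v * (new_level - level[-1]) + (1 - v) * trend[-1]
        level.append(new_level)
        trend.append(new_trend)
    return level, trend


def holt_forecast(last_level, last_trend, steps):
    """k-step-ahead Holt forecast: F(t+k) = Et + k*Tt."""
    return last_level + steps * last_trend


# First four interest-rate values (1992-1995) with w = .3 and v = .7
E, T = holt([8.27, 7.17, 8.28, 7.86], w=0.3, v=0.7)
print([round(e, 3) for e in E])
print([round(t, 3) for t in T])

# Forecasts from the final tabled values E2019 = 3.626 and T2019 = .105
print(round(holt_forecast(3.626, 0.105, 1), 3))   # 3.731
print(round(holt_forecast(3.626, 0.105, 2), 3))   # 3.836
```

Unlike simple exponential smoothing, the trend term lets the forecast change with each additional step ahead.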

14.63

a.

We first calculate the exponentially smoothed values for 1995–2019. E1 = Y1 = 41.05

E2 = .8Y2 + (1 − .8)E1 = .8(50.75) + .2(41.05) = 48.81
E3 = .8Y3 + (1 − .8)E2 = .8(65.50) + .2(48.81) = 62.16

The rest of the values appear in the table.

                                    Holt’s
YEAR  PRICE  Exponentially         Et        Tt
             Smoothed w = .8       w = .8    v = .5
1995  41.05  41.05
1996  50.75  48.81                 50.750     9.700
1997  65.5   62.16                 64.490    11.720
1998  49     51.63                 54.442     0.836
1999  36.31  39.37                 40.104    -6.751
2000  48.44  46.63                 45.422    -0.716
2001  55.75  53.93                 53.541     3.701
2002  40     42.79                 43.449    -3.196
2003  46.6   45.84                 45.331    -0.657
2004  46.65  46.49                 46.255     0.134
2005  39.43  40.84                 40.822    -2.650
2006  48.71  47.14                 46.602     1.566
2007  56.15  54.35                 54.554     4.758
2008  53.37  53.57                 54.558     2.382
2009  53.99  53.91                 54.580     1.202
2010  47.91  49.11                 49.484    -1.947
2011  56.23  54.81                 54.491     1.530
2012  65.5   63.36                 63.604     5.321
2013  38.33  43.34                 44.449    -6.917
2014  45.02  44.68                 43.522    -3.922
2015  44.91  44.86                 43.848    -1.798
2016  38.41  39.70                 39.138    -3.254
2017  57.07  53.60                 52.833     5.220
2018  72.33  68.58                 69.475    10.931
2019  82.66  79.84                 82.209    11.833

The forecasts for 2020 and 2021 are:

F2020 = Ft+1 = Et = 79.84
F2021 = Ft+2 = Ft+1 = 79.84

The expected gain is F2021 − Y2019 = 79.84 − 82.66 = −2.82. Since this number is negative, it is actually a loss.

b.

We first calculate the Holt’s values for 1995-2019. For w = .8 and v = .5,


E2 = Y2 = 50.75
E3 = .8Y3 + (1 − .8)(E2 + T2) = .8(65.50) + .2(50.75 + 9.70) = 64.49

T2 = Y2 − Y1 = 50.75 − 41.05 = 9.70
T3 = .5(E3 − E2) + (1 − .5)(T2) = .5(64.49 − 50.75) + .5(9.70) = 11.72

The rest of the values appear in the table in part a. The forecasts for 2020 and 2021 are:

F2020 = Ft+1 = Et + Tt = 82.209 + (11.833) = 94.042
F2021 = Ft+2 = Et + 2Tt = 82.209 + 2(11.833) = 105.875

The expected gain is F2021 − Y2019 = 105.875 − 82.66 = 23.215.

c.

Generally, we have more confidence in the Holt’s forecasts because the forecasts can change each year. Using the exponentially smoothed forecasts, the forecasts are the same for each additional year.

14.64

a.

Using MINITAB, the printout from fitting the model E(Yt) = β0 + β1t starting with t = 1 is:

Regression Equation

PRICE = 44.19 + 0.539 t

Coefficients

Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  44.19  4.43        9.98    0.000
t         0.539  0.298       1.81    0.083  1.00

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
10.7377  12.48%  8.67%      0.00%

Settings

Variable  Setting
t         26

Prediction

Fit      SE Fit   95% CI              95% PI
58.2129  4.42727  (49.0544, 67.3714)  (34.1862, 82.2396)

Settings

Variable  Setting
t         27

Prediction

Fit      SE Fit   95% CI              95% PI
58.7521  4.68993  (49.0503, 68.4540)  (34.5132, 82.9911)

Durbin-Watson Statistic

Durbin-Watson Statistic = 1.17128

The fitted model is Ŷt = 44.19 + .539t.

b.

The plot of the data is:

c.

From the printout in part a, F2020 = 58.2129 and F2021 = 58.7521.

d.

Also from the printout in part a, the 95% prediction intervals are:

2020: (34.1862, 82.2396)
2021: (34.5132, 82.9911)

We are 95% confident that the actual closing price for 2020 will be between $34.19 and $82.24. We are 95% confident that the actual closing price for 2021 will be between $34.51 and $82.99.

e.

To determine if autocorrelation is present, we test:

H0: Autocorrelation is not present
Ha: Autocorrelation is present

The test statistic is d = 1.17128. Since α is not given, we will use α = .10. The rejection region is d < dL,α/2 = dL,.05 = 1.29 or 4 − d < dL,.05 = 1.29, where dL,.05 is from Table X, Appendix D, for k = 1, n = 25, and α = .10. Also, dU,.05 = 1.45.

Since the observed value of the test statistic does fall in the rejection region ( d = 1.17128 < 1.29 ), H0 is rejected. There is sufficient evidence to indicate that autocorrelation is present at α = .10.

14.65

a.

To find the simple index for “closed-end” fund types, divide each value by 150 (the value for the base year) and then multiply by 100. To find the simple index for “exchange-traded” fund types, divide each value by 66 (the value for the base year) and then multiply by 100. To find the simple index for “UIT” fund types, divide each value by 74 (the value for the base year) and then multiply by 100. The indices are found in the following table.

YEAR  CLOSEEND  EXCHTRADE  UIT
2000  100.0      100.0     100.0
2001   96.7      125.8      66.2
2002  107.3      154.5      48.6
2003  144.0      228.8      48.6
2004  170.0      345.5      50.0
2005  184.0      456.1      55.4
2006  199.3      640.9      67.6
2007  210.7      921.2      71.6
2008  123.3      804.5      39.2
2009  149.3     1177.3      51.4
2010  159.3     1503.0      68.9
2011  162.7     1587.9      81.1
2012  176.7     2025.8      97.3
2013  188.0     2537.9     117.6
2014  194.7     2992.4     136.5
2015  175.3     3183.3     127.0
2016  176.7     3824.2     114.9
2017  184.7     5153.0     114.9
2018  166.7     5107.6      94.6

b.

Using MINITAB, the plot of the indices is:

c.

Based on the annual investment assets in 2000, the annual investment assets for exchange-traded funds increased dramatically more than the annual investment assets for either closed-end funds or UITs. By 2018 the exchange-traded index had reached 5107.6, while the closed-end index was 166.7 and the UIT index was 94.6.

14.66

To compute the Holt’s values for the years 2015-2019, with w = .5 and v = .5:


E2 = Y2 = 18,219.41
E3 = wY3 + (1 − w)(E2 + T2) = .5(18,344.71) + (1 − .5)(18,219.41 + 235.23) = 18,399.67

T2 = Y2 − Y1 = 18,219.41 − 17,984.18 = 235.23
T3 = v(E3 − E2) + (1 − v)T2 = .5(18,399.67 − 18,219.41) + (1 − .5)(235.23) = 207.75

The rest of the values appear in the table:

                             Holt’s
Year  Quarter  GDP        Et (w = .5)  Tt (v = .5)
2015  1        17,984.18
2015  2        18,219.41  18,219.41    235.23
2015  3        18,344.71  18,399.67    207.75
2015  4        18,350.83  18,479.12    143.60
2016  1        18,424.28  18,523.50     93.99
2016  2        18,637.25  18,627.37     98.93
2016  3        18,806.74  18,766.52    119.04
2016  4        18,991.88  18,938.72    145.62
2017  1        19,190.43  19,137.39    172.14
2017  2        19,356.65  19,333.09    183.92
2017  3        19,611.70  19,564.36    207.60
2017  4        19,918.91  19,845.43    244.33
2018  1        20,163.16  20,126.46    262.68
2018  2        20,510.18  20,449.66    292.94
2018  3        20,749.75  20,746.18    294.73
2018  4        20,897.80  20,969.35    258.95
2019  1        21,098.83  21,163.57    226.58
2019  2        21,340.27  21,365.21    214.11
2019  3        21,525.82  21,552.57    200.74
2019  4        21,747.39  21,750.35    199.26

The forecasts for the four quarters of 2020 are:

F2020,1 = Ft+1 = Et + Tt = 21,750.35 + (199.26) = 21,949.61
F2020,2 = Ft+2 = Et + 2Tt = 21,750.35 + 2(199.26) = 22,148.87
F2020,3 = Ft+3 = Et + 3Tt = 21,750.35 + 3(199.26) = 22,348.13
F2020,4 = Ft+4 = Et + 4Tt = 21,750.35 + 4(199.26) = 22,547.39

14.67

a.

Using MINITAB, the results from fitting the model E(Yt) = β0 + β1t starting with t = 0 are:

Coefficients

Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  17730.6  69.7      254.33    0.000
t         206.62   6.27       32.94    0.000  1.00

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
161.775  98.37%  98.28%     97.93%

Analysis of Variance

Source      DF  Adj SS    Adj MS    F-Value  P-Value
Regression   1  28391083  28391083  1084.83    0.000
  t          1  28391083  28391083  1084.83    0.000
Error       18    471079     26171
Total       19  28862161

Prediction

Setting (t)  Fit      SE Fit   95% CI              95% PI
20           21863.1  75.1494  (21705.2, 22020.9)  (21488.3, 22237.8)
21           22069.7  80.7047  (21900.1, 22239.2)  (21689.9, 22449.5)
22           22276.3  86.3583  (22094.9, 22457.7)  (21891.0, 22661.6)
23           22482.9  92.0923  (22289.4, 22676.4)  (22091.8, 22874.0)

The fitted regression line is: Ŷt = 17,730.6 + 206.62t. From the printouts, the 2020 quarterly GDP forecasts are:

Year  Quarter  Forecast  95% Lower Limit  95% Upper Limit
2020  Q1       21,863.1  21,488.3         22,237.8
      Q2       22,069.7  21,689.9         22,449.5
      Q3       22,276.3  21,891.0         22,661.6
      Q4       22,482.9  22,091.8         22,874.0

b.

The following model is fit: E(Yt) = β0 + β1t + β2Q1 + β3Q2 + β4Q3

where Q1 = 1 if quarter 1, 0 otherwise
      Q2 = 1 if quarter 2, 0 otherwise
      Q3 = 1 if quarter 3, 0 otherwise

The MINITAB printout is:

Coefficients

Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  17707   110       161.26    0.000
t         206.79  6.96       29.72    0.000  1.04
Q1        11      113         0.10    0.923  1.55
Q2        45      112         0.40    0.694  1.52
Q3        33      112         0.30    0.770  1.51

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
176.036  98.39%  97.96%     97.10%

Analysis of Variance

Source      DF  Adj SS    Adj MS    F-Value  P-Value
Regression   4  28397329   7099332   229.09    0.000
  t          1  27368895  27368895   883.19    0.000
  Q1         1       303       303     0.01    0.923
  Q2         1      4979      4979     0.16    0.694
  Q3         1      2741      2741     0.09    0.770
Error       15    464832     30989
Total       19  28862161

Durbin-Watson Statistic

Durbin-Watson Statistic = 0.215295

Prediction

Settings (t, Q1, Q2, Q3)  Fit      SE Fit   95% CI              95% PI
20, 1, 0, 0               21853.7  114.762  (21609.1, 22098.3)  (21405.8, 22301.6)
21, 0, 1, 0               22094.3  114.762  (21849.7, 22338.9)  (21646.4, 22542.2)
22, 0, 0, 1               22289.3  114.762  (22044.7, 22533.9)  (21841.4, 22737.2)
23, 0, 0, 0               22462.9  114.762  (22218.3, 22707.5)  (22015.0, 22910.8)

The fitted regression line is: Ŷt = 17,707 + 206.79t + 11Q1 + 45Q2 + 33Q3

To determine whether the data indicate a significant seasonal component, we test:

H0: β2 = β3 = β4 = 0
Ha: At least one βi ≠ 0

The test statistic is

F = [(SSER − SSEC)/(k − g)] / {SSEC/[n − (k + 1)]} = [(471079 − 464832)/(4 − 1)] / {464832/[20 − (4 + 1)]} = 2082.333/30988.8 = .067

Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the F-distribution with ν1 = k − g = 4 − 1 = 3 and ν2 = n − (k + 1) = 20 − (4 + 1) = 15. From Table VI, Appendix D, F.05 = 3.29. The rejection region is F > 3.29.

Since the observed value of the test statistic does not fall in the rejection region ( F = .067 ≯ 3.29 ), H0 is not rejected. There is insufficient evidence to indicate a seasonal component at α = .05. This supports the assertion that the data have been seasonally adjusted.

c.

From the printout, the 2020 quarterly forecasts are:

Year  Quarter  Forecast  95% Lower Limit  95% Upper Limit
2020  Q1       21,853.7  21,405.8         22,301.6
      Q2       22,094.3  21,646.4         22,542.2
      Q3       22,289.3  21,841.4         22,737.2
      Q4       22,462.9  22,015.0         22,910.8
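The nested-model F statistic used in part b to test for a seasonal component can be reproduced with a short Python sketch. The function name and argument names are illustrative; the inputs are the SSE of the reduced (trend-only) model from part a, the SSE of the complete model from part b, the numbers of β parameters excluding β0 in each model (k and g), and the sample size n. The critical value is passed in from the F table rather than computed.

```python
def nested_f_statistic(sse_reduced, sse_complete, k, g, n):
    """F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))]."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Values from the two MINITAB printouts: SSE_R = 471079, SSE_C = 464832
f_stat = nested_f_statistic(471079, 464832, k=4, g=1, n=20)
print(round(f_stat, 3))                 # 0.067

# Tabled critical value F.05 with 3 and 15 degrees of freedom
critical_value = 3.29
print("reject H0" if f_stat > critical_value else "do not reject H0")
```

A small F here means the three quarterly dummy variables reduce SSE very little beyond the linear trend.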

d.

To determine if the time series residuals are autocorrelated, we test:

H0: No first-order autocorrelation of residuals
Ha: Positive or negative first-order autocorrelation of residuals

The test statistic is d = 0.215295.

For α = .10, the rejection region is d < dL,α/2 = dL,.05 = .90 or (4 − d) < dL,.05 = .90. The value of dL,.05 is found in Table X, Appendix D, with k = 4 and n = 20. Also, dU,.05 = 1.83.

Since the observed value of the test statistic does fall in the rejection region ( d = 0.215295 < .90 ), H0 is rejected. There is sufficient evidence to indicate that the time series residuals are autocorrelated at α = .10.

14.68

Once you find the 2020 GDP values, you would use the following formulas to compute the criteria to evaluate the forecasts. In each formula, Y1, Y2, Y3, and Y4 denote the actual GDP values for the four quarters of 2020.

Holt forecasts: The forecasts for the four quarters of 2020 are:

F2020,1 = Ft+1 = Et + Tt = 21,750.35 + (199.26) = 21,949.61
F2020,2 = Ft+2 = Et + 2Tt = 21,750.35 + 2(199.26) = 22,148.87
F2020,3 = Ft+3 = Et + 3Tt = 21,750.35 + 3(199.26) = 22,348.13
F2020,4 = Ft+4 = Et + 4Tt = 21,750.35 + 4(199.26) = 22,547.39

MAD = [ |Y1 − 21,949.61| + |Y2 − 22,148.87| + |Y3 − 22,348.13| + |Y4 − 22,547.39| ] / 4

MAPE = { [ |Y1 − 21,949.61|/Y1 + |Y2 − 22,148.87|/Y2 + |Y3 − 22,348.13|/Y3 + |Y4 − 22,547.39|/Y4 ] / 4 } × 100

RMSE = √{ [ (Y1 − 21,949.61)² + (Y2 − 22,148.87)² + (Y3 − 22,348.13)² + (Y4 − 22,547.39)² ] / 4 }

Simple linear regression forecasts:

Year  Quarter  Forecast  95% Lower Limit  95% Upper Limit
2020  Q1       21,863.1  21,488.3         22,237.8
      Q2       22,069.7  21,689.9         22,449.5
      Q3       22,276.3  21,891.0         22,661.6
      Q4       22,482.9  22,091.8         22,874.0

MAD = [ |Y1 − 21,863.1| + |Y2 − 22,069.7| + |Y3 − 22,276.3| + |Y4 − 22,482.9| ] / 4

MAPE = { [ |Y1 − 21,863.1|/Y1 + |Y2 − 22,069.7|/Y2 + |Y3 − 22,276.3|/Y3 + |Y4 − 22,482.9|/Y4 ] / 4 } × 100

RMSE = √{ [ (Y1 − 21,863.1)² + (Y2 − 22,069.7)² + (Y3 − 22,276.3)² + (Y4 − 22,482.9)² ] / 4 }

Seasonal linear regression forecasts:

Year  Quarter  Forecast  95% Lower Limit  95% Upper Limit
2020  Q1       21,853.7  21,405.8         22,301.6
      Q2       22,094.3  21,646.4         22,542.2
      Q3       22,289.3  21,841.4         22,737.2
      Q4       22,462.9  22,015.0         22,910.8

MAD = [ |Y1 − 21,853.7| + |Y2 − 22,094.3| + |Y3 − 22,289.3| + |Y4 − 22,462.9| ] / 4

MAPE = { [ |Y1 − 21,853.7|/Y1 + |Y2 − 22,094.3|/Y2 + |Y3 − 22,289.3|/Y3 + |Y4 − 22,462.9|/Y4 ] / 4 } × 100

RMSE = √{ [ (Y1 − 21,853.7)² + (Y2 − 22,094.3)² + (Y3 − 22,289.3)² + (Y4 − 22,462.9)² ] / 4 }

To determine which forecasting model performs best, one would select the model with the smallest value for the particular criterion.

14.69

a.

Using MINITAB, the results from fitting the model 𝐸(𝑌 ) = 𝛽 + 𝛽 𝑡 starting with 𝑡 = 0 are:

Coefficients

Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  742.9  41.9       17.71    0.000
t         3.65   7.09        0.52    0.619  1.00

Model Summary

S        R-sq   R-sq(adj)  R-sq(pred)
74.3629  2.87%  0.00%      0.00%

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1    1469    1469     0.27    0.619
  t          1    1469    1469     0.27    0.619
Error        9   49769    5530
Total       10   51238

Prediction

Setting (t)  Fit      SE Fit   95% CI              95% PI
11           783.109  48.0882  (674.326, 891.892)  (582.780, 983.439)
12           786.764  54.4610  (663.564, 909.963)  (578.254, 995.273)

The fitted regression line is: Ŷt = 742.9 + 3.65t

For the years 2019 and 2020, t = 11 and 12. From the printout, the predicted values and 95% prediction intervals for 2019 and 2020 are:

Year  Forecast  95% Lower Limit  95% Upper Limit
2019  783.109   582.780          983.439
2020  786.764   578.254          995.273

b.

To compute the Holt’s values for the years 2008-2018, with w = .7 and v = .7:

E2 = Y2 = 795
E3 = wY3 + (1 − w)(E2 + T2) = .7(730) + (1 − .7)(795 − 71) = 728.20

T2 = Y2 − Y1 = 795 − 866 = −71
T3 = v(E3 − E2) + (1 − v)T2 = .7(728.20 − 795) + (1 − .7)(−71) = −68.06

The rest of the values appear in the table:

              Holt’s
YEAR  DEBT  Et (w = .7)  Tt (v = .7)
2008  866
2009  795   795.00       -71.00
2010  730   728.20       -68.06
2011  704   690.84       -46.57
2012  679   668.58       -29.55
2013  683   669.81        -8.01
2014  700   688.54        10.71
2015  733   722.88        27.25
2016  779   770.34        41.40
2017  834   827.32        52.31
2018  870   872.89        47.59

Using the Holt’s series, the forecasts for 2019 and 2020 are:

F2019 = Ft+1 = Et + Tt = 872.89 + (47.59) = 920.48
F2020 = Ft+2 = Et + 2Tt = 872.89 + 2(47.59) = 968.07

These values are larger than the forecasts using simple linear regression, but both fall inside the simple linear regression prediction intervals.

14.70

a.

Real income 1990 = ($50,000/125.8) × 100 = $39,745.63

Real income 2019 = ($95,000/254.4) × 100 = $37,342.77

The real income for 2019 was slightly lower than that for 1990. Since the real income in 2019 is lower than that in 1990, you would be able to buy more in 1990 than in 2019.

b.

Let x = monetary income in 2019. Then x/254.4 = $20,000/125.8. Solving for x, we get x = $40,445.15.

14.71

a.

To compute the exponentially smoothed values E1, E2, …, Et for months January through September:

E1 = Y1 = 134.42

For w = .5,

E2 = wY2 + (1 − w)E1 = .5(138.13) + (1 − .5)(134.42) = 136.275
E3 = wY3 + (1 − w)E2 = .5(141.10) + (1 − .5)(136.275) = 138.688

The rest of the values appear in the table.

Month  IBM     Exponentially Smoothed w = .5
JAN    134.42  134.42
FEB    138.13  136.28
MAR    141.10  138.69
APR    140.27  139.48
MAY    126.99  133.23
JUN    137.90  135.57
JUL    148.24  141.90
AUG    135.53  138.72
SEP    145.42  142.07
OCT    133.73
NOV    134.45
DEC    134.04

The forecasts for October through December 2019 are:

F2019,Oct = Ft+1 = Et = 142.07
F2019,Nov = Ft+2 = Ft+1 = 142.07
F2019,Dec = Ft+3 = Ft+1 = 142.07

The forecast errors are the differences between the actual values and the forecasted values. The forecast errors are:

Month  Yt+i    Ft+i    Difference
Oct    133.73  142.07  -8.34
Nov    134.45  142.07  -7.62
Dec    134.04  142.07  -8.03

b.

Using MINITAB, the output is:

Coefficients

Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  134.66  4.53       29.72    0.000
TIME      0.802   0.805       1.00    0.352  1.00

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
6.23659  12.41%  0.00%      0.00%

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1   38.58   38.58     0.99    0.352
  TIME       1   38.58   38.58     0.99    0.352
Error        7  272.26   38.89
Total        8  310.84

Durbin-Watson Statistic

Durbin-Watson Statistic = 2.46978

Prediction

Setting (TIME)  Fit      SE Fit   95% CI              95% PI
10              142.676  4.53077  (131.962, 153.389)  (124.448, 160.904)
11              143.478  5.25915  (131.042, 155.914)  (124.187, 162.768)
12              144.280  6.00716  (130.075, 158.484)  (123.804, 164.755)

The least squares fitted model is: Ŷt = 134.66 + .802t

c.

β̂0 = 134.66. The estimated stock price for IBM at t = 0 (December 2018) is $134.66.

β̂1 = 0.802. The estimated increase in the value of stock for IBM for each additional month is $0.802.

We expect most of the monthly stock prices to fall within 2s = 2(6.23659) = $12.47318 of their least squares predicted values.

d.

The forecasts and prediction intervals are found at the bottom of the printout in part b.

Month  IBM     Forecast  95% Lower Limit  95% Upper Limit
Oct    133.73  142.676   124.448          160.904
Nov    134.45  143.478   124.187          162.768
Dec    134.04  144.280   123.804          164.755

The precision for October is approximately (142.676 − 133.73)/2 = 4.473.

The precision for November is approximately (143.478 − 134.45)/2 = 4.514.

The precision for December is approximately (144.280 − 134.04)/2 = 5.12.

These are all well within the 12.47318 from part c.

e.

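The MAD, MAPE, and RMSE for the smoothed series are:

MAD = [ |133.73 − 142.07| + |134.45 − 142.07| + |134.04 − 142.07| ] / 3 = 7.997

MAPE = { [ |133.73 − 142.07|/133.73 + |134.45 − 142.07|/134.45 + |134.04 − 142.07|/134.04 ] / 3 } × 100 = 5.965

RMSE = √{ [ (133.73 − 142.07)² + (134.45 − 142.07)² + (134.04 − 142.07)² ] / 3 } = 8.002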

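The MAD, MAPE, and RMSE for the regression model are:

MAD = Σ|Yt − Ft| / m = (|133.73 − 142.676| + |134.45 − 143.478| + |134.04 − 144.280|) / 3 = 9.405

MAPE = [Σ|(Yt − Ft)/Yt| / m] × 100 = [(|133.73 − 142.676|/133.73 + |134.45 − 143.478|/134.45 + |134.04 − 144.280|/134.04) / 3] × 100 = 7.015

RMSE = √[Σ(Yt − Ft)² / m] = √{[(133.73 − 142.676)² + (134.45 − 143.478)² + (134.04 − 144.280)²] / 3} = 9.423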
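The three accuracy measures for the regression forecasts can be reproduced from the actual and forecast values in part d. The sketch below is illustrative; the function name `forecast_errors` is not from the text.

```python
import math

# Actual IBM prices and regression forecasts for Oct, Nov, Dec (from part d).
actual = [133.73, 134.45, 134.04]
forecast = [142.676, 143.478, 144.280]


def forecast_errors(actual, forecast):
    """Return (MAD, MAPE, RMSE) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    m = len(errors)
    mad = sum(abs(e) for e in errors) / m
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / m
    rmse = math.sqrt(sum(e * e for e in errors) / m)
    return mad, mape, rmse


mad, mape, rmse = forecast_errors(actual, forecast)
print(round(mad, 3), round(mape, 3), round(rmse, 3))
```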

The values of MAD, MAPE, and RMSE for the exponentially smoothed model are all smaller than their corresponding values for the regression model.

f.

We have to assume that the error terms are independent.

g.

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation of residuals
Ha: Positive first-order autocorrelation of residuals

From the printout, the test statistic is d = 2.46978.

The critical value for k = 1 and n = 9 cannot be found in Table X, Appendix D. Using a table online, dL,.05 = .824 and dU,.05 = 1.320. The rejection region is d < .824. Since the observed value of the test statistic does not fall in the rejection region (d = 2.46978 ≮ .824) and is greater than dU,.05 (d = 2.46978 > 1.320), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .05. Since there is no evidence of positive autocorrelation, the validity of the regression model is not in doubt.

14.72

a.

For each bank, R² is the proportion of the sample variation of that bank's deposit shares that is explained by the model containing expenditures on promotion-related activities, expenditures on service-related activities, and expenditures on distribution-related activities.

For Bank 1, R² = .914. The model explains 91.4% of the sample variation.
For Bank 2, R² = .721. The model explains 72.1% of the sample variation.
For Bank 3, R² = .926. The model explains 92.6% of the sample variation.
For Bank 4, R² = .827. The model explains 82.7% of the sample variation.
For Bank 5, R² = .270. The model explains 27.0% of the sample variation.
For Bank 6, R² = .616. The model explains 61.6% of the sample variation.
For Bank 7, R² = .962. The model explains 96.2% of the sample variation.
For Bank 8, R² = .495. The model explains 49.5% of the sample variation.
For Bank 9, R² = .500. The model explains 50.0% of the sample variation.

b.

For all banks, to determine if the model is adequate, we test:

H0: β1 = β2 = β3 = 0
Ha: At least one βi ≠ 0

For Bank 1, the p-value is p = 0.000. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.

For Bank 2, the p-value is p = 0.004. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.

For Bank 3, the p-value is p = 0.000. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.

For Bank 4, the p-value is p = 0.000. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.

For Bank 5, the p-value is p = 0.155. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .01.

For Bank 6, the p-value is p = 0.012. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .01.

For Bank 7, the p-value is p = 0.000. Since the p-value is less than α = .01, H0 is rejected. There is sufficient evidence to indicate the model is adequate at α = .01.

For Bank 8, the p-value is p = 0.014. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .01.

For Bank 9, the p-value is p = 0.011. Since the p-value is not less than α = .01, H0 is not rejected. There is insufficient evidence to indicate the model is adequate at α = .01.
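The nine decisions in part b apply one rule, reject H0 when the p-value is less than α = .01, to each bank's global F-test. A minimal sketch of that rule follows, using the p-values reported above; the variable names are illustrative.

```python
ALPHA = 0.01

# Global F-test p-values for Banks 1-9, as reported in part b.
p_values = {1: 0.000, 2: 0.004, 3: 0.000, 4: 0.000, 5: 0.155,
            6: 0.012, 7: 0.000, 8: 0.014, 9: 0.011}

# H0 (all slope coefficients are zero) is rejected only when p < alpha.
rejected = [bank for bank, p in p_values.items() if p < ALPHA]
not_rejected = [bank for bank, p in p_values.items() if p >= ALPHA]

print("Model adequate at alpha = .01:", rejected)
print("Insufficient evidence:", not_rejected)
```

Banks 6, 8, and 9 have p-values only slightly above .01, so their conclusions would change at α = .05.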

c.

To determine if positive autocorrelation is present, we test:

H0: No first-order autocorrelation of residuals
Ha: Positive first-order autocorrelation of residuals

The test statistic is d. For α = .01, the rejection region is d < dL,α = dL,.01 = .77. The value dL,.01 is found in Table XI, Appendix D, with k = 3, n = 20, and α = .01. Also, dU,.01 = 1.41.

For Bank 1, d = 1.3. Since the observed value of the test statistic does not fall in the rejection region (d = 1.3 ≮ .77) and is not greater than dU,.01 (d = 1.3 ≯ 1.41), no decision can be made at α = .01.

For Bank 2, d = 3.4. Since the observed value of the test statistic does not fall in the rejection region (d = 3.4 ≮ .77) and is greater than dU,.01 (d = 3.4 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 3, d = 2.7. Since the observed value of the test statistic does not fall in the rejection region (d = 2.7 ≮ .77) and is greater than dU,.01 (d = 2.7 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 4, d = 1.9. Since the observed value of the test statistic does not fall in the rejection region (d = 1.9 ≮ .77) and is greater than dU,.01 (d = 1.9 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 5, d = .85. Since the observed value of the test statistic does not fall in the rejection region (d = .85 ≮ .77) and is not greater than dU,.01 (d = .85 ≯ 1.41), no decision can be made at α = .01.

For Bank 6, d = 1.8. Since the observed value of the test statistic does not fall in the rejection region (d = 1.8 ≮ .77) and is greater than dU,.01 (d = 1.8 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 7, d = 2.5. Since the observed value of the test statistic does not fall in the rejection region (d = 2.5 ≮ .77) and is greater than dU,.01 (d = 2.5 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 8, d = 2.3. Since the observed value of the test statistic does not fall in the rejection region (d = 2.3 ≮ .77) and is greater than dU,.01 (d = 2.3 > 1.41), H0 is not rejected. There is insufficient evidence to indicate the time series residuals are positively autocorrelated at α = .01.

For Bank 9, d = 1.1. Since the observed value of the test statistic does not fall in the rejection region (d = 1.1 ≮ .77) and is not greater than dU,.01 (d = 1.1 ≯ 1.41), no decision can be made at α = .01.
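The three-way Durbin-Watson decision rule used in part c (reject, do not reject, or no decision) can be sketched as a small function. The function name and the return strings are illustrative; the bounds .77 and 1.41 and the d values are those quoted above.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """One-tailed test for positive first-order autocorrelation."""
    if d < d_lower:
        return "reject H0"          # evidence of positive autocorrelation
    if d > d_upper:
        return "do not reject H0"   # insufficient evidence
    return "no decision"            # d falls between the two bounds


D_LOWER, D_UPPER = 0.77, 1.41  # k = 3, n = 20, alpha = .01

# Observed Durbin-Watson statistics for Banks 1-9.
d_values = {1: 1.3, 2: 3.4, 3: 2.7, 4: 1.9, 5: 0.85,
            6: 1.8, 7: 2.5, 8: 2.3, 9: 1.1}

decisions = {bank: durbin_watson_decision(d, D_LOWER, D_UPPER)
             for bank, d in d_values.items()}

for bank, decision in decisions.items():
    print(f"Bank {bank}: d = {d_values[bank]} -> {decision}")
```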

Chapter 15 Nonparametric Statistics

15.1

The sign test is preferred to the t-test when the population from which the sample is selected is not normal.

15.2

a.

Since the normal distribution is symmetric, the probability that a randomly selected observation exceeds the mean of a normal distribution is .5.

b.

By the definition of "median," the probability that a randomly selected observation exceeds the median of a normal distribution is .5.

c.

If the distribution is not normal, the probability that a randomly selected observation exceeds the mean depends on the distribution. With the information given, the probability cannot be determined.

d.

By definition of "median," the probability that a randomly selected observation exceeds the median of a non-normal distribution is .5.

15.3

a.

P(x ≥ 7) = 1 − P(x ≤ 6) = 1 − .965 = .035

b.

P(x ≥ 5) = 1 − P(x ≤ 4) = 1 − .637 = .363

c.

P(x ≥ 8) = 1 − P(x ≤ 7) = 1 − .996 = .004

d.

P(x ≥ 10) = 1 − P(x ≤ 9) = 1 − .849 = .151

μ = np = 15(.5) = 7.5 and σ = √(npq) = √(15(.5)(.5)) = 1.9365

P(x ≥ 10) ≈ P(z ≥ [(10 − .5) − 7.5]/1.9365) = P(z ≥ 1.03) = .5 − .3485 = .1515 (Using Table II, Appendix D)

e.

P(x ≥ 15) = 1 − P(x ≤ 14) = 1 − .788 = .212

μ = np = 25(.5) = 12.5 and σ = √(npq) = √(25(.5)(.5)) = 2.5

P(x ≥ 15) ≈ P(z ≥ [(15 − .5) − 12.5]/2.5) = P(z ≥ .80) = .5 − .2881 = .2119 (Using Table II, Appendix D)

15.4

a.

H0: η = 9
Ha: η > 9

The test statistic is S = {Number of observations greater than 9} = 7. The p-value = P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table I,

p-value = P(x ≥ 7) = 1 − P(x ≤ 6) = 1 − .828 = .172

Since the p-value is not less than α (p = .172 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the median is greater than 9 at α = .05.

b.

H0: η = 9
Ha: η ≠ 9

S1 = {Number of observations less than 9} = 3 and S2 = {Number of observations greater than 9} = 7. The test statistic S is the larger of S1 and S2. Thus, S = 7. The p-value = 2P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table I,

p-value = 2P(x ≥ 7) = 2(1 − P(x ≤ 6)) = 2(1 − .828) = .344

Since the p-value is not less than α (p = .344 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the median is different from 9 at α = .05.

c.

H0: η = 20
Ha: η < 20

The test statistic is S = {Number of observations less than 20} = 9. The p-value = P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table I,

p-value = P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .989 = .011

Since the p-value is less than α (p = .011 < .05), H0 is rejected. There is sufficient evidence to indicate the median is less than 20 at α = .05.

d.

H0: η = 20
Ha: η ≠ 20

S1 = {Number of observations less than 20} = 9 and S2 = {Number of observations greater than 20} = 1. The test statistic S is the larger of S1 and S2. Thus, S = 9. The p-value = 2P(x ≥ 9) where x is a binomial random variable with n = 10 and p = .5. From Table I,

p-value = 2P(x ≥ 9) = 2(1 − P(x ≤ 8)) = 2(1 − .989) = .022

Since the p-value is less than α (p = .022 < .05), H0 is rejected. There is sufficient evidence to indicate the median is different from 20 at α = .05.


e.

For all parts, μ = np = 10(.5) = 5 and σ = √(npq) = √(10(.5)(.5)) = 1.581.

For part a, P(x ≥ 7) ≈ P(z ≥ [(7 − .5) − 5]/1.581) = P(z ≥ .95) = .5 − .3289 = .1711. This is close to the probability .172 in part a. The conclusion is the same.

For part b, 2P(x ≥ 7) ≈ 2P(z ≥ [(7 − .5) − 5]/1.581) = 2P(z ≥ .95) = 2(.5 − .3289) = .3422. This is close to the probability .344 in part b. The conclusion is the same.

For part c, P(x ≥ 9) ≈ P(z ≥ [(9 − .5) − 5]/1.581) = P(z ≥ 2.21) = .5 − .4864 = .0136. This is close to the probability .011 in part c. The conclusion is the same.

For part d, 2P(x ≥ 9) ≈ 2P(z ≥ [(9 − .5) − 5]/1.581) = 2P(z ≥ 2.21) = 2(.5 − .4864) = .0272. This is close to the probability .022 in part d. The conclusion is the same.

f.

We must assume only that the sample is selected randomly from a continuous probability distribution.

15.5

To determine if the median is greater than 75, we test:

H0: η = 75
Ha: η > 75

The test statistic is S = number of measurements greater than 75 = 17. The p-value = P(x ≥ 17) where x is a binomial random variable with n = 25 and p = .5. From Table I,

p-value = P(x ≥ 17) = 1 − P(x ≤ 16) = 1 − .946 = .054

Since the p-value is less than α (p = .054 < .10), H0 is rejected. There is sufficient evidence to indicate the median is greater than 75 at α = .10.

We must assume the sample was randomly selected from a continuous probability distribution.

Note: Since n ≥ 10, we could use the large-sample approximation.

15.6

a.

To determine if the median number of occupational accidents at all Turkish sites is less than 70, we test:

H0: η = 70
Ha: η < 70

b.

The test statistic is S = {Number of observations less than 70} = 2.

c.

The p-value = P(x ≥ 2) where x is a binomial random variable with n = 3 and p = .5.

p-value = P(x ≥ 2) = P(x = 2) + P(x = 3) = C(3, 2)(.5)^2(.5)^(3−2) + C(3, 3)(.5)^3(.5)^(3−3) = .375 + .125 = .5

Since the p-value is not small (p = .5), H0 is not rejected. There is insufficient evidence to indicate the median is less than 70 for any reasonable value of α.

15.7

a.

To determine if the median income of graduates of the MBA program is more than $125,000, we test: H 0 : η = 125, 000 H a : η > 125, 000

b.

The test statistic is S = {Number of observations greater than 125,000} = 9. The p-value = P(x ≥ 9) where x is a binomial random variable with n = 15 and p = .5. From Table I,

p-value = P(x ≥ 9) = 1 − P(x ≤ 8) = 1 − .696 = .304

Since the p-value is not less than α (p = .304 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the median income of graduates of the MBA program is more than $125,000 at α = .05.

c.

We must assume only that the sample is selected randomly from a continuous probability distribution.

15.8

a.

To determine if the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams, we test:

H0: η = 300
Ha: η > 300

b.

S = 4

c.

Using Table I, Appendix D, with n = 6 and p = .5, P(x ≥ 4) = 1 − P(x ≤ 3) = 1 − .656 = .344

d.

Since the probability in part c is greater than α = .05, H0 is not rejected. There is insufficient evidence to indicate the median amount of caffeine in Breakfast Blend coffee exceeds 300 milligrams at α = .05.

15.9

a.

We would need to assume that the December 2019 retail whole milk prices in all U.S. cities follow an approximately normal distribution. Using Minitab, we created the following stem-and-leaf display of the sample milk prices:

Stem-and-leaf of Price ($/gallon)  N = 10

1  2  5
2  2  7
2  2
3  3  0
4  3  3
5  3  4
5  3  7
4  3  89
2  4  0
1  4  3

Leaf Unit = 0.1

The assumption of an approximately normal distribution of prices does not appear to hold.

b.

The Sign Test for a population median is the appropriate nonparametric test to conduct.

c.

To determine if the median December 2019 milk price for all U.S. cities exceeded $3.30, we test:

H0: η = 3.30
Ha: η > 3.30

d.

The test statistic is S = number of measurements greater than 3.30 = 7.

e.

The p-value = P(x ≥ 7) where x is a binomial random variable with n = 10 and p = .5. From Table I,

p-value = P(x ≥ 7) = 1 − P(x ≤ 6) = 1 − .828 = .172

f.

Since the p-value is not less than α (p = .172 ≮ .01), H0 is not rejected. There is insufficient evidence to indicate the median December 2019 milk price for all U.S. cities exceeded $3.30 at α = .01.

15.10

a.

One of the assumptions for the test in Exercise 7.54 is that the population being sampled from is normal. If the data are not normal, then the test is not valid.

b.

One alternative for this test would be the sign test.

c.

The test statistic is S = larger of S1 and S2 where S1 = {Number of observations less than 95} = 5 and S2 = {Number of observations greater than 95} = 2. The test statistic is S = 5.

d.

The p-value = 2P(x ≥ 5) where x is a binomial random variable with n = 7 and p = .5. From Table I,

p-value = 2P(x ≥ 5) = 2(1 − P(x ≤ 4)) = 2(1 − .773) = .454

e.

In Exercise 7.54, we used α = .05. Since the p-value is not less than α (p = .454 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the average trap spacing differs from 95.

15.11

To determine if the median ratio of repair to replacement cost differs from 7.0, we test:

H0: η = 7
Ha: η ≠ 7

S1 = {Number of measurements > 7} = 11. S2 = {Number of measurements < 7} = 2. The test statistic S is the larger of S1 and S2. Thus, S = 11. The p-value = 2P(x ≥ 11) where x is a binomial random variable with n = 13 and p = .5.

p-value = 2P(x ≥ 11) = 2(1 − P(x ≤ 10)) = 2(1 − .989) = .022

Since the p-value is less than α (p = .022 < .10), H0 is rejected. There is sufficient evidence to indicate the median ratio of repair to replacement cost differs from 7 at α = .10.

15.12

To determine if the true median rank of China is higher than 7th, we test:

H0: η = 7
Ha: η > 7

The test statistic is S = {Number of observations greater than 7} = 5. The p-value = P(x ≥ 5) where x is a binomial random variable with n = 12 and p = .5. Using MINITAB,

p-value = P(x ≥ 5) = 1 − P(x ≤ 4) = 1 − .194 = .806

Since the p-value is not less than α (p = .806 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate that the true median rank of China is higher than 7th at α = .05.

15.13

To determine if the median radon exposure is less than 6,000, we test: H 0 : η = 6, 000 H a : η < 6, 000

The test statistic is S = number of measurements less than 6,000 = 9. The p-value = P ( x ≥ 9 ) where x is a binomial random variable with 𝑛 = 12 and 𝑝 = .5. Using MINITAB, p − value = P ( x ≥ 9 ) = 1 − P ( x ≤ 8) = 1 − .927 = .073

Since the p-value is less than α ( p = .073 < .10 ) , H0 is rejected. There is sufficient evidence to indicate the median radon exposure is less than 6,000 at 𝛼 = .10. No, the tombs should not be closed.
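The sign-test p-values in Exercises 15.4 through 15.13 are upper-tail binomial probabilities with p = .5, so they can be checked without tables. The sketch below is illustrative only: the function names are not from the text, and the second function is the continuity-corrected normal approximation of the kind used in Exercise 15.4e.

```python
from math import comb, erfc, sqrt


def binom_upper_tail(s, n, p=0.5):
    """Exact P(x >= s) for a binomial random variable with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s, n + 1))


def sign_test_p_value(s, n, two_sided=False):
    """Sign-test p-value; s is the test statistic S, n the sample size."""
    tail = binom_upper_tail(s, n)
    return min(1.0, 2 * tail) if two_sided else tail


def normal_approx_upper_tail(s, n):
    """Continuity-corrected normal approximation to P(x >= s) when p = .5."""
    mu = n * 0.5
    sigma = sqrt(n * 0.25)
    z = ((s - 0.5) - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z


# Exercise 15.13: S = 9 measurements below 6,000 out of n = 12.
print(round(sign_test_p_value(9, 12), 3))

# Exercise 15.4b: two-tailed test with S = 7 and n = 10.
print(round(sign_test_p_value(7, 10, two_sided=True), 3))

# Exercise 15.4e, part a: normal approximation to P(x >= 7) with n = 10.
print(round(normal_approx_upper_tail(7, 10), 4))
```

The approximation differs from the table-based value .1711 only in the fourth decimal place because the text rounds z to .95 before using Table II.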

15.14

To determine if half of all stocks with suspended short-sales have a positive return rate, we test:

H0: η = 0
Ha: η ≠ 0

S1 = {Number of measurements < 0} = 11. S2 = {Number of measurements > 0} = 6. The test statistic S is the larger of S1 and S2. Thus, S = 11. The p-value = 2P(x ≥ 11) where x is a binomial random variable with n = 17 and p = .5. Using MINITAB,

p-value = 2P(x ≥ 11) = 2(1 − P(x ≤ 10)) = 2(1 − .834) = .332

Since the p-value is not less than α (p = .332 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate half of all stocks with suspended short-sales have a positive return rate at α = .05.

15.15

To determine if the distribution of A is shifted to the left of distribution B, we test:

H0: The two sampled populations have identical distributions
Ha: The probability distribution for population A is shifted to the left of population B

The test statistic is

z = [T1 − n1(n1 + n2 + 1)/2] / √[n1n2(n1 + n2 + 1)/12] = [173 − 15(15 + 15 + 1)/2] / √[15(15)(15 + 15 + 1)/12] = −2.47

The rejection region requires α = .05 in the lower tail of the z-distribution. From Table II, Appendix D, z.05 = 1.645. The rejection region is z < −1.645. Since the observed value of the test statistic falls in the rejection region (z = −2.47 < −1.645), H0 is rejected. There is sufficient evidence to indicate the distribution of A is shifted to the left of distribution B at α = .05.

15.16

a.

The test statistic is T2, the rank sum of population 2 (because n2 < n1 ). The rejection region is T2 ≤ 35 or T2 ≥ 67 , from Table XII, Appendix D, with n1 = 10, n2 = 6, and α = .10 .

b.

The test statistic is T1, the rank sum of population 1 (because n1 < n2 ). The rejection region is T1 ≥ 43 , from Table XII, Appendix D, with n1 = 5, n2 = 7, and α = .05 .

c.

The test statistic is T2, the rank sum of population 2 (because n2 < n1 ). The rejection region is T2 ≥ 93 , from Table XII, Appendix D, with n1 = 9, n2 = 8, and α = .025 .

d.

Since n1 = n2 = 15, the test statistic is

z = [T1 − n1(n1 + n2 + 1)/2] / √[n1n2(n1 + n2 + 1)/12]

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z.025 = 1.96. The rejection region is z < −1.96 or z > 1.96.

15.17

a.

The hypotheses are:

H0: The two sampled populations have identical distributions
Ha: The probability distribution for population B is shifted to the right of that for population A

b.

First, we rank all the data:

A                      B
Observation  Rank      Observation  Rank
37           8         65           13
40           9         35           6.5
33           3.5       47           11
29           2         52           12
42           10
33           3.5
35           6.5
28           1
34           5
TA = 48.5              TB = 42.5

The test statistic is TB = 42.5 because nB < nA. The rejection region is TB ≥ 39 from Table XII, Appendix D, with nA = 9, nB = 4, and α = .05. Since the observed value of the test statistic falls in the rejection region (TB = 42.5 ≥ 39), H0 is rejected. There is sufficient evidence to indicate the distribution for population B is shifted to the right of the distribution for population A at α = .05.

15.18

Sample from Population 1      Sample from Population 2
Observation  Rank             Observation  Rank
15           13               5            2.5
10           8.5              12           10.5
12           10.5             9            6.5
16           14               9            6.5
13           12               8            4.5
8            4.5              4            1
                              5            2.5
                              10           8.5
T1 = 62.5                     T2 = 42.5

a.

To determine if there is a shift in the locations of the probability distributions, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the left or to the right of that for population 2

The test statistic is T1 = 62.5 since sample 1 has the smaller number of measurements. The null hypothesis will be rejected if T1 ≤ TL or T1 ≥ TU where TL and TU correspond to α = .05 (two-tailed), n1 = 6 and n2 = 8. From Table XII, Appendix D, TL = 29 and TU = 61. Reject H0 if T1 ≤ 29 or T1 ≥ 61. Since T1 = 62.5 ≥ 61, H0 is rejected. There is sufficient evidence to indicate population 1 is shifted to the left or right of population 2 at α = .05.

b.

To determine if the distribution of population 1 is shifted to the right of that for population 2, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution for population 1 is shifted to the right of population 2 The test statistic remains T1 = 62.5 . The null hypothesis will be rejected if T1 ≥ TU where TU corresponds to α = .05 (one-tailed), n1 = 6 and n2 = 8 . From Table XII, Appendix D, TU = 58 . Reject H0 if T1 ≥ 58 . Since T1 = 62.5 ≥ 58 , H0 is rejected. There is sufficient evidence to indicate population 1 is shifted to the right of population 2 at 𝛼 = .05.
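The rank sums in Exercise 15.18 can be reproduced by pooling both samples, giving tied values the average of the ranks they occupy, and summing the ranks within each sample. This is a minimal sketch; the function name `midranks` is illustrative, and the critical values are those read from Table XII in the solution above.

```python
def midranks(values):
    """Map each distinct value to the average of the 1-based ranks it occupies."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return {value: sum(r) / len(r) for value, r in positions.items()}


sample1 = [15, 10, 12, 16, 13, 8]
sample2 = [5, 12, 9, 9, 8, 4, 5, 10]

# Rank the pooled data, then sum the ranks within each sample.
rank_of = midranks(sample1 + sample2)
t1 = sum(rank_of[v] for v in sample1)
t2 = sum(rank_of[v] for v in sample2)
print(t1, t2)

# Part a: two-tailed test at alpha = .05 with TL = 29 and TU = 61.
reject_two_tailed = t1 <= 29 or t1 >= 61

# Part b: upper-tailed test at alpha = .05 with TU = 58.
reject_upper_tailed = t1 >= 58
print(reject_two_tailed, reject_upper_tailed)
```

As a check, the two rank sums must add to n(n + 1)/2 = 14(15)/2 = 105 for the 14 pooled observations.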

15.19

a.

To determine if the median responses for the two groups of students differ, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution for non-texting group is shifted to the right or left of that for texting group

b.

Since the p-value is so small (p = .004), H0 is rejected. There is sufficient evidence to indicate the median responses for the two groups of students differ for any reasonable value of α. Since the sample median for the students in the texting group is greater than the sample median of the non-texting group, the students in the texting group have more of a preference for face-to-face meetings with their professor.

15.20

a.

To determine if the median responses for the two groups of students differ, we test: H0: The two sampled populations have identical probability distributions Ha: The probability distribution for the low rating group is shifted to the right or left of that for the high rating group

b.

In order to conduct the test, we need to arrange the data set into two groups of ratings, high and low, and then rank the capitalization rates of the two groups.

The ranks of the data are:

5

High Rating 6.75

7.5

4

6.75

2.5

8.5

6.5

5.5

1

8.5

6.5

Low Rating 8.45

Rank

T 1 = 22

Rank 2.5

T2 = 6

The sum of the ranks is T1 = 22.

c.

The null hypothesis will be rejected if T1 ≤ TL or T1 ≥ TU where TL and TU correspond to α = .10 (two-tailed), n1 = 4 and n2 = 3. From Table XII, Appendix D, TL = 7 and TU = 17. Reject H0 if T1 ≤ 7 or T1 ≥ 17.

d.

Since T1 = 22 ≥ 17, H0 is rejected. There is sufficient evidence to indicate that the probability distribution for the low rating group is shifted to the right or left of that for the high rating group at α = .10.

15.21

a.

The ranks of the data are:

Old Design  Rank      New Design  Rank
210         9         216         16.5
212         13.5      217         18.5
211         11        162         4
211         11        137         1
190         7         219         20
213         15        216         16.5
212         13.5      179         6
211         11        153         3
164         5         152         2
209         8         217         18.5
T1 = 104              T2 = 106

b.

The sum of the ranks is T1 = 104 .

c.

The sum of the ranks is T2 = 106 .

d.

Since n1 = n2 = 10 , either T1 or T2 can be used. We will pick T1 = 104 .

e.

To determine if the distributions of bursting strengths differ for the two designs, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of the new design is located to the right or left of that for the old design

The test statistic is T1 = 104.

The null hypothesis will be rejected if T1 ≤ TL or T1 ≥ TU where TL and TU correspond to α = .05 (two-tailed) and n1 = n2 = 10. From Table XII, Appendix D, TL = 79 and TU = 131. Reject H0 if T1 ≤ 79 or T1 ≥ 131. Since T1 = 104 ≰ 79 and T1 = 104 ≱ 131, H0 is not rejected. There is insufficient evidence to indicate the distributions of bursting strengths differ for the two designs at α = .05.

15.22

a.

To determine if men and women differ in their attitude toward public corruption and tax evasion, we test: H0: The attitudes toward public corruption and tax evasion for men and women have identical probability distributions Ha: The probability distribution of the attitudes toward public corruption and tax evasion for men is located to the right or left of that for the women.

b.

Since the p-value is so small, H0 will be rejected for any reasonable value of α. There is sufficient evidence to indicate that the probability distribution of the attitudes toward justifiable corruption for men is located to the right or left of that for the women.

c.

The larger the sum, the more the group thinks the behavior is “never justified”. Thus, women find corruption less justifiable than men do.

d.

Since the p-value is so small, H0 will be rejected for any reasonable value of α. There is sufficient evidence to indicate that the probability distribution of the attitudes toward justifiable tax evasion for men is located to the right or left of that for the women.

e.

The larger the sum, the more the group thinks the behavior is “never justified”. Thus, women find tax evasion less justifiable than men do.

15.23

a.

The Wilcoxon Rank Sum Test would be appropriate for analyzing these data.

b.

To determine if the low-handicapped golfers have a higher X-factor than high-handicapped golfers, we test: H0: The probability distributions of the X-factors for low-handicapped and high handicapped golfers are identical Ha: The probability distribution of the X-factors for low-handicapped golfers is shifted to the right of that for high-handicapped golfers

c.

The rejection region is T2 ≤ 41 , from Table XII, Appendix D, with n1 = 8 , n2 = 7 , and 𝛼 = .05.

d.

Since the p-value is not less than α (p = .487 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate that low-handicapped golfers have a higher X-factor than high-handicapped golfers at α = .05.

15.24

a.

The researchers’ theory is somewhat supported based on the sample medians. The highest sample median gain (3) was with the group that had no solutions. This was greater than the sample median gains for the other 2 groups. This supports the theory. However, the sample median gain for the group with check figures (2) and the sample median gain for the group with completed solutions (2) were the same. This does not support the theory.

b.

These are sample statistics and thus, are variables. If we took different samples, the sample medians would probably change. We also do not know what the shapes of the distributions are.

c.

To determine if the distribution for the group receiving no solutions is shifted to the right of that for the group receiving check figures, we test: H0: The two sampled populations have identical probability distributions Ha: The distribution of the group receiving no solutions is shifted to the right of that for the group receiving check figures

d.

Since the p-value is not less than α (p = .456 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the distribution for the group receiving no solutions is shifted to the right of that for the group receiving check figures at α = .05. Thus, the researchers’ theory is not supported.

15.25

a.

To determine if the distribution for the one-time lump sum payment group is shifted to the right of that for the multiple payment group, we test:

H0: The two sampled populations have identical probability distributions
Ha: The distribution of the one-time lump sum payment group is shifted to the right of that for the multiple payments group

Using MINITAB, the results are:

Mann-Whitney: Lump Sum, Multiple

Method

η₁: median of Lump Sum
η₂: median of Multiple
Difference: η₁ - η₂

Descriptive Statistics

Sample     N   Median
Lump Sum   10  3.5
Multiple   10  3.5

Estimation for Difference

Difference  Lower Bound for Difference  Achieved Confidence
0.5         -1                          95.55%

Test

Null hypothesis         H₀: η₁ - η₂ = 0
Alternative hypothesis  H₁: η₁ - η₂ > 0

Method                 W-Value  P-Value
Not adjusted for ties  112.00   0.312
Adjusted for ties      112.00   0.309

The test statistic is T1 = 112 and the p-value is 0.312. Since the p-value is not less than α (p = .312 ≮ .05), H0 is not rejected. There is insufficient evidence to indicate the distribution for the one-time lump sum payment group is shifted to the right of that for the multiple payments group at α = .05.

b.

Since the data are not likely to be normally distributed, the parametric test from Chapter 8 would not be appropriate. This nonparametric test is the valid test to conduct.

15.26

a.

First, we rank the observations:

Nurses
Resp.  Rank    Resp.  Rank
1      6       3      33
1      6       3      33
1      6       3      33
1      6       3      33
2      18      3      33
2      18      4      43
2      18      5      54.5
2      18      5      54.5
2      18      5      54.5
2      18      5      54.5
2      18      5      54.5
2      18      5      54.5
2      18      5      54.5
2      18      7      74
3      33      7      74
3      33
T1 = 1,007.5

Doctors
Resp.  Rank    Resp.  Rank    Resp.  Rank
1      6       3      33      5      54.5
1      6       3      33      5      54.5
1      6       3      33      5      54.5
1      6       3      33      6      67.5
1      6       4      43      6      67.5
1      6       4      43      6      67.5
1      6       5      54.5    6      67.5
2      18      5      54.5    6      67.5
2      18      5      54.5    6      67.5
2      18      5      54.5    7      74
3      33      5      54.5    7      74
3      33      5      54.5    7      74
3      33      5      54.5    7      74
3      33      5      54.5    7      74
3      33      5      54.5
3      33      5      54.5
T2 = 1,995.5

To determine if the distributions for nurses and doctors differ on Question 4, we test:

H0: The distributions for the responses to Question 4 are identical for nurses and doctors
Ha: The distribution of the responses to Question 4 for nurses is shifted to the right or left of that for doctors

The test statistic is

$$ z = \frac{T_1 - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{1{,}007.5 - \frac{31(31 + 46 + 1)}{2}}{\sqrt{\frac{31(46)(31 + 46 + 1)}{12}}} = -2.09 $$

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z_{.025} = 1.96. The rejection region is z < −1.96 or z > 1.96. Since the observed value of the test statistic falls in the rejection region (z = −2.09 < −1.96), H0 is rejected. There is sufficient evidence to indicate the response distributions for nurses and doctors differ on Question 4 at α = .05. Doctors and nurses do not agree that physicians and nurses cooperate in making decisions.

b.

First, we rank the observations:


Chapter 15

Nurses
Resp.  Rank    Resp.  Rank
1      1.5     4      26.5
1      1.5     4      26.5
2      7       4      26.5
2      7       4      26.5
2      7       5      43.5
2      7       5      43.5
3      17      5      43.5
3      17      5      43.5
3      17      5      43.5
3      17      5      43.5
3      17      5      43.5
3      17      5      43.5
3      17      6      62.5
3      17      6      62.5
4      26.5    7      73
4      26.5
T1 = 872

Doctors
Resp.  Rank    Resp.  Rank    Resp.  Rank
2      7       5      43.5    6      62.5
2      7       5      43.5    6      62.5
2      7       5      43.5    6      62.5
2      7       5      43.5    6      62.5
2      7       5      43.5    6      62.5
3      17      5      43.5    6      62.5
3      17      5      43.5    7      73
3      17      5      43.5    7      73
4      26.5    5      43.5    7      73
4      26.5    5      43.5    7      73
5      43.5    5      43.5    7      73
5      43.5    5      43.5    7      73
5      43.5    6      62.5    7      73
5      43.5    6      62.5    7      73
5      43.5    6      62.5
5      43.5    6      62.5
T2 = 2,131

To determine if the distributions for nurses and doctors differ on Question 5, we test:

H0: The distributions for the responses to Question 5 are identical for nurses and doctors
Ha: The distribution of the responses to Question 5 for nurses is shifted to the right or left of that for doctors

The test statistic is

$$ z = \frac{T_1 - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{872 - \frac{31(31 + 46 + 1)}{2}}{\sqrt{\frac{31(46)(31 + 46 + 1)}{12}}} = -3.50 $$

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z_{.025} = 1.96. The rejection region is z < −1.96 or z > 1.96. Since the observed value of the test statistic falls in the rejection region (z = −3.50 < −1.96), H0 is rejected. There is sufficient evidence to indicate the response distributions for nurses and doctors differ on Question 5 at α = .05. Doctors and nurses do not agree that in making decisions, both nursing and medical concerns about patients' needs are considered.

15.27

a.

Using MINITAB, the histograms of the data are:

[Figure: histograms of the Control and Rudeness scores with fitted normal curves; vertical axis: Frequency. Control: Mean 11.81, StDev 7.383, N 53. Rudeness: Mean 8.511, StDev 3.992, N 45.]


As you can see from the above graphs, the distribution for the control group is skewed to the right while the distribution for the Rudeness group is fairly normal.

b.

We first rank the data:

Control Group
Score  Rank    Score  Rank
1      5.5     9      42
24     96      12     66.5
5      22      18     85.5
16     81.5    5      22
21     93.5    21     93.5
7      30.5    30     98
20     91      15     78
1      5.5     4      17
9      42      2      9
20     91      12     66.5
19     88      11     58.5
10     50      10     50
23     95      13     73
16     81.5    11     58.5
0      2       3      12.5
4      17      6      25.5
9      42      10     50
13     73      13     73
17     84      16     81.5
13     73      12     66.5
0      2       28     97
2      9       19     88
12     66.5    12     66.5
11     58.5    20     91
7      30.5    3      12.5
1      5.5     11     58.5
19     88
T1 = 2,964.5

Rudeness Condition
Score  Rank    Score  Rank
4      17      7      30.5
11     58.5    11     58.5
18     85.5    4      17
11     58.5    13     73
9      42      5      22
6      25.5    4      17
5      22      7      30.5
11     58.5    8      36
9      42      3      12.5
12     66.5    8      36
7      30.5    15     78
5      22      9      42
7      30.5    16     81.5
3      12.5    10     50
11     58.5    0      2
1      5.5     7      30.5
9      42      15     78
11     58.5    13     73
10     50      9      42
7      30.5    2      9
8      36      13     73
9      42      10     50
10     50
T2 = 1,886.5

To determine if the distribution of the rudeness condition is shifted to the left of that for the control group, we test:

H0: The distributions of the two sampled populations are identical
Ha: The distribution of the rudeness group scores is shifted to the left of that for the control group

The test statistic is

$$ z = \frac{T_1 - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{2{,}964.5 - \frac{53(53 + 45 + 1)}{2}}{\sqrt{\frac{53(45)(53 + 45 + 1)}{12}}} = 2.43 $$

The rejection region requires α = .01 in the upper tail of the z-distribution. From Table II, Appendix D, z_{.01} = 2.33. The rejection region is z > 2.33. Since the observed value of the test statistic falls in the rejection region (z = 2.43 > 2.33), H0 is rejected. There is sufficient evidence to indicate the distribution of the rudeness condition scores is shifted to the left of that for the control group at α = .01.

c.

Since the sample sizes for both groups were over 30, the Central Limit Theorem applies. Thus, the parametric 2-sample test in Exercise 8.21 is appropriate.
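The large-sample rank sum statistics in Exercises 15.26 and 15.27 can be checked with a short script. This is a minimal sketch, not part of the original solutions: the function name `rank_sum_z` is an illustrative choice, and the formula is the one written out above, with no continuity or tie correction.

```python
import math

def rank_sum_z(t1, n1, n2):
    """Large-sample z for the Wilcoxon rank sum test.

    t1 is the rank sum of sample 1; n1 and n2 are the two sample sizes.
    No continuity or tie correction is applied (an assumption that
    mirrors the hand calculation in the text).
    """
    mean_t1 = n1 * (n1 + n2 + 1) / 2
    sd_t1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t1 - mean_t1) / sd_t1

# Rank sums and sample sizes taken from the solutions above.
print(round(rank_sum_z(1007.5, 31, 46), 2))  # Exercise 15.26a, Question 4
print(round(rank_sum_z(872, 31, 46), 2))     # Exercise 15.26b, Question 5
print(round(rank_sum_z(2964.5, 53, 45), 2))  # Exercise 15.27b, control group
```

Rounded to two decimals, the three calls should agree with the hand-computed values −2.09, −3.50, and 2.43.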



15.28


We first rank all the data:

Honey Dosage
Score  Rank    Score  Rank
12     52      12     52
11     44      8      21
15     65      12     52
11     44      9      28
10     37      11     44
13     59.5    15     65
10     37      10     37
4      5       15     65
15     65      9      28
16     68      13     59.5
9      28      8      21
14     62      12     52
10     37      10     37
6      11.5    8      21
10     37      9      28
8      21      5      9
11     44      12     52
12     52
T1 = 1,440.5

DM Dosage
Score  Rank    Score  Rank
4      5       6      11.5
6      11.5    8      21
9      28      12     52
4      5       12     52
7      16      4      5
7      16      12     52
7      16      13     59.5
9      28      7      16
12     52      10     37
10     37      13     59.5
11     44      9      28
6      11.5    4      5
3      1       4      5
4      5       10     37
9      28      15     65
12     52      9      28
7      16
T2 = 905.5

To determine if the distribution of the honey scores is shifted to the right of that for the DM scores, we test:

H0: The distributions of the two sampled populations are identical
Ha: The distribution of the honey scores is shifted to the right of that for the DM scores

The test statistic is

$$ z = \frac{T_1 - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{1{,}440.5 - \frac{35(35 + 33 + 1)}{2}}{\sqrt{\frac{35(33)(35 + 33 + 1)}{12}}} = 2.86 $$

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z_{.05} = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 2.86 > 1.645), H0 is rejected. There is sufficient evidence to indicate the distribution of the honey scores is shifted to the right of that for the DM scores at α = .05.

15.29

a.

The test statistic is the smaller of T− or T+. The rejection region is T ≤ 152 , from Table XIII, Appendix D, with n = 30 , α = .10 , and two-tailed.

b.

The test statistic is T−. The rejection region is T− ≤ 60 , from Table XIII, Appendix D, with n = 20 , α = .05 , and one-tailed.

c.

The test statistic is T+. The rejection region is T+ ≤ 0 , from Table XIII, Appendix D, with n = 8 , α = .005 , and one-tailed.



15.30

a.

The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z_{.05} = 1.645. The rejection region is z > 1.645.

b.

The large sample test statistic is

$$ z = \frac{T_+ - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}} = \frac{273 - \frac{25(26)}{4}}{\sqrt{\frac{25(26)(51)}{24}}} = 2.97 $$

Since the observed value of the test statistic falls in the rejection region (z = 2.97 > 1.645), H0 is rejected. There is sufficient evidence to indicate that the responses for A tend to be larger than those for B at α = .05.

c.

p-value = P(z ≥ 2.97) = .5 − .4985 = .0015 (from Table II, Appendix D)

Thus, we can reject H0 for any preselected α greater than .0015.

15.31

a.

The hypotheses are:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population A is shifted to the right of that for population B

b.

Some preliminary calculations are:

Treatment A   Treatment B   Difference A − B   Rank of Absolute Difference
54            45              9                 5
60            45             15                10
98            87             11                 7
43            31             12                 9
82            71             11                 7
77            75              2                 2.5
74            63             11                 7
29            30             −1                 1
63            59              4                 4
80            82             −2                 2.5
T− = 3.5

The test statistic is T− = 3.5. The rejection region is T− ≤ 8, from Table XIII, Appendix D, with n = 10 and α = .025. Since the observed value of the test statistic falls in the rejection region (T− = 3.5 ≤ 8), H0 is rejected. There is sufficient evidence to indicate the responses for A tend to be larger than those for B at α = .025.

15.32

a.

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is located to the right of that for population 2

b.

The test statistic is

$$ z = \frac{T_+ - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}} = \frac{354 - \frac{30(30 + 1)}{4}}{\sqrt{\frac{30(30 + 1)(60 + 1)}{24}}} = \frac{121.5}{48.6184} = 2.50 $$


The rejection region requires α = .05 in the upper tail of the z-distribution. From Table II, Appendix D, z_{.05} = 1.645. The rejection region is z > 1.645. Since the observed value of the test statistic falls in the rejection region (z = 2.50 > 1.645), H0 is rejected. There is sufficient evidence to indicate the probability distribution for population 1 is located to the right of that for population 2 at α = .05.

c.

The p − value = P ( z ≥ 2.50 ) = .5 − .4938 = .0062 (using Table II, Appendix D).
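The table lookup for the upper-tail p-value can be cross-checked numerically. The sketch below is an illustration only: the helper name `upper_tail_p` is hypothetical, and it uses the standard identity P(Z ≥ z) = erfc(z/√2)/2 for a standard normal Z rather than Table II.

```python
import math

def upper_tail_p(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(upper_tail_p(2.50), 4))  # Exercise 15.32c
print(round(upper_tail_p(2.97), 4))  # Exercise 15.30c
```

To four decimals these should match the tabled results .0062 and .0015.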

d.

The necessary assumptions are:

1. The sample of differences is randomly selected from the population of differences.
2. The probability distribution from which the sample of paired differences is drawn is continuous.

15.33

a.

In order for the confidence interval to be valid, the distribution of the differences must be normal. This may not be the case.

b.

For this paired comparison, the appropriate nonparametric test is the Wilcoxon Signed Rank Test. To determine if there is a difference in the true THM means between the original holes and their twin holes, we test:

H0: The distributions of the THM values for the original holes and their twins are identical
Ha: The distribution of the THM values for the original holes is shifted to the right or left of that for the twin holes

c & d. The differences and the ranks are in the following table:

Location   1st Hole   2nd Hole   Diff 1st − 2nd   Rank of Absolute Difference
1           5.5        5.7       -0.2              3.5
2          11.0       11.2       -0.2              3.5
3           5.9        6.0       -0.1              1.5
4           8.2        5.6        2.6             15
5          10.0        9.3        0.7              6
6           7.9        7.0        0.9              7
7          10.1        8.4        1.7             13.5
8           7.4        9.0       -1.6             12
9           7.0        6.0        1.0              8
10          9.2        8.1        1.1              9
11          8.3       10.0       -1.7             13.5
12          8.6        8.1        0.5              5
13         10.5       10.4        0.1              1.5
14          5.5        7.0       -1.5             11
15         10.0       11.2       -1.2             10
T− = 55

e.

T− = 3.5 + 3.5 + 1.5 + 12 + 13.5 + 11 + 10 = 55 and T+ = 15 + 6 + 7 + 13.5 + 8 + 9 + 5 + 1.5 = 65
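The two signed rank sums can be reproduced from the listed differences. This is a sketch under stated assumptions: the function name `signed_rank_sums` is illustrative, zero differences are dropped, tied absolute differences receive the average of the ranks they occupy, and the differences are typed in as printed in the table (recomputing them from the hole readings in floating point could break the ties).

```python
def signed_rank_sums(differences):
    """Return (T_minus, T_plus) for the Wilcoxon signed rank test."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(nonzero, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every difference tied with ordered[i] in absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign their average.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    t_minus = sum(ranks[abs(d)] for d in nonzero if d < 0)
    t_plus = sum(ranks[abs(d)] for d in nonzero if d > 0)
    return t_minus, t_plus

# Differences (1st hole minus 2nd hole) as listed for the 15 locations.
diffs = [-0.2, -0.2, -0.1, 2.6, 0.7, 0.9, 1.7, -1.6,
         1.0, 1.1, -1.7, 0.5, 0.1, -1.5, -1.2]
t_minus, t_plus = signed_rank_sums(diffs)
print(t_minus, t_plus)
```

The two sums should equal T− = 55 and T+ = 65, and together they add to n(n + 1)/2 = 120 for n = 15.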

f.

The test statistic is the smaller of T− and T+ which is T− = 55 .


The rejection region is T ≤ 25, from Table XIII, Appendix D, with n = 15 and α = .05. Since the observed value of the test statistic does not fall in the rejection region (T = 55 ≰ 25), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the true THM means between the original holes and their twin holes at α = .05.

15.34

a.

To determine if the distribution of COP ranges for Conditions A and B differ, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of COP ranges for Condition A is shifted to the right or left of that for Condition B

b and c.

Subject   Condition A   Condition B   Difference   Rank of Absolute Difference
1         15.2          10.5          4.7          23
2         16            14.1          1.9           4
3         16.8          15.7          1.1           2
4         18.2          16.2          2             5.5
5         18.7          17.1          1.6           3
6         19.2          17.2          2             5.5
7         19.2          18.2          1             1
8         21.3          18.5          2.8          12.5
9         21.5          19            2.5           9
10        21.6          19.5          2.1           7
11        21.9          21.9          0             -
12        22.8          20.4          2.4           8
13        23.3          20.6          2.7          11
14        23.5          20.7          2.8          12.5
15        24.8          20.9          3.9          17
16        25.1          21.2          3.9          17
17        25.2          22            3.2          14
18        26.8          22.2          4.6          22
19        27.1          22.9          4.2          19
20        28.3          24            4.3          20
21        28.5          24.1          4.4          21
22        28.5          24.7          3.8          15
23        29.1          25.2          3.9          17
24        30.5          27.9          2.6          10
T+ = 276   T− = 0

d.

The test statistic is T = T− = 0.

e.

The rejection region is T ≤ T0, where T0 corresponds to α = .05 (two-tailed) and n = 23. From Table XIII, Appendix D, T0 = 73. The rejection region is T ≤ 73.

f.

Since the observed value of the test statistic falls in the rejection region (T = 0 ≤ 73), H0 is rejected. There is sufficient evidence to indicate the distribution of COP ranges for Condition A is shifted to the right or left of that for Condition B at α = .05.

g.

To determine if the distribution of COP ranges for Conditions B and C differ, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of COP ranges for Condition B is shifted to the right or left of that for Condition C

The ranks below are for the 23 nonzero differences (subject 22 is eliminated), which is the ranking that produces the reported T+ and T−.

Subject   Condition B   Condition C   Difference   Rank of Absolute Difference
1         10.5          13.9          -3.4         21.5
2         14.1          12.1           2           16.5
3         15.7          14.4           1.3         10.5
4         16.2          15.7           0.5          5
5         17.1          15.8           1.3         10.5
6         17.2          18.7          -1.5         13
7         18.2          14.8           3.4         21.5
8         18.5          17.2           1.3         10.5
9         19            17.4           1.6         14
10        19.5          17.5           2           16.5
11        21.9          17.6           4.3         23
12        20.4          18.2           2.2         18.5
13        20.6          21.1          -0.5          5
14        20.7          19.4           1.3         10.5
15        20.9          21            -0.1          1.5
16        21.2          19             2.2         18.5
17        22            21.5           0.5          5
18        22.2          22.3          -0.1          1.5
19        22.9          23.6          -0.7          8
20        24            24.4          -0.4          3
21        24.1          24.7          -0.6          7
22        24.7          24.7           0            -
23        25.2          28.5          -3.3         20
24        27.9          29.7          -1.8         15
T+ = 180.5   T− = 95.5

The test statistic is T = T− = 95.5. The rejection region is T ≤ T0, where T0 corresponds to α = .05 (two-tailed) and n = 23. From Table XIII, Appendix D, T0 = 73. The rejection region is T ≤ 73. Since the observed value of the test statistic does not fall in the rejection region (T = 95.5 ≰ 73), H0 is not rejected. There is insufficient evidence to indicate the distribution of COP ranges for Condition B is shifted to the right or left of that for Condition C at α = .05.

15.35

a.

The test statistic is

$$ z = \frac{T_- - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}} = \frac{11.50 - \frac{31(31 + 1)}{4}}{\sqrt{\frac{31(31 + 1)(2(31) + 1)}{24}}} = -4.63 $$

b.

To determine if handling a museum object has a positive impact on a sick patient's well-being, we test:

H0: The distributions of patients' health statuses before and after handling museum pieces are identical
Ha: The distribution of patients' health statuses after handling museum pieces is shifted to the right of the distribution before handling museum pieces

The test statistic is z = −4.63. From the printout, the p-value is p < .0001. Since the p-value is less than α (p < .0001 < .01), H0 is rejected. There is sufficient evidence to indicate handling a museum object has a positive impact on a sick patient's well-being at α = .01.
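The large-sample signed rank statistic used in Exercises 15.30, 15.32, and 15.35 follows one formula, so a small helper can confirm the arithmetic. This is a sketch only: `signed_rank_z` is an illustrative name, `t` may be either T+ or T− depending on the exercise, and `n` is the number of nonzero differences.

```python
import math

def signed_rank_z(t, n):
    """Large-sample z for the Wilcoxon signed rank test.

    t is the signed rank sum used as the test statistic (T+ or T-),
    n is the number of nonzero paired differences.
    """
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t

print(round(signed_rank_z(273, 25), 2))   # Exercise 15.30b
print(round(signed_rank_z(354, 30), 2))   # Exercise 15.32b
print(round(signed_rank_z(11.5, 31), 2))  # Exercise 15.35a
```

The results should agree with the values 2.97, 2.50, and −4.63 reported in the solutions.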

15.36

a.

To determine if the chest injury ratings of drivers and front-seat passengers differ, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of drivers is shifted to the right or left of that for front-seat passengers

b.

Using MINITAB, the results are:

Wilcoxon Signed Rank Test: Diff

Test of median = 0.000000 versus median not = 0.000000

        N   N for Test   Wilcoxon Statistic       P   Estimated Median
Diff   18           16                 23.0   0.021             -4.000

From the printout, the test statistic is T+ = 23.

c.

The rejection region is T+ ≤ T0, where T0 corresponds to α = .01 (two-tailed) and n = 16. From Table XIII, Appendix D, T0 = 19. The rejection region is T+ ≤ 19.

d.

Since the observed value of the test statistic does not fall in the rejection region (T+ = 23 ≰ 19), H0 is not rejected. There is insufficient evidence to indicate the chest injury ratings of drivers and front-seat passengers differ at α = .01. From the printout, the p-value is p = .021.

15.37

To determine if the photo-red enforcement program is effective in reducing red-light-running crash incidents at intersections, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution after the camera installation is shifted to the left of that before the camera installation

From the printout, the test statistic is T = 79 and the p-value is p = .011. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the photo-red enforcement program is effective in reducing red-light-running crash incidents at intersections for any value of α greater than .011.

15.38

Some preliminary calculations:

School   Inventory   Checklist   Diff I − C   Rank Abs
A        100.0       100.0         0.0        -
B         95.5        66.7        28.8        30
C         90.6        62.5        28.1        29
D         77.8        71.4         6.4        10
E         66.7        66.7         0.0        -
F         64.5        50.0        14.5        19
G         62.5        50.0        12.5        14
H         55.1        63.6        -8.5        11
I         54.3        58.3        -4.0         5
J         54.3        55.6        -1.3         1
K         53.8        58.3        -4.5         7
L         53.7        58.3        -4.6         8
M         52.9        66.7       -13.8        18
N         52.0        54.5        -2.5         4
O         51.5        50.0         1.5         2
Q         50.0       100.0       -50.0        31
R         50.0        66.7       -16.7        24
S         50.0        62.5       -12.5        14
P         50.0        50.0         0.0        -
T         46.3        60.0       -13.7        17
U         44.2        70.0       -25.8        28
V         43.8        50.0        -6.2         9
W         43.5        60.0       -16.5        21
X         42.2        40.0         2.2         3
Y         41.3        54.5       -13.2        16
Z         40.7        55.6       -14.9        20
AA        39.0        55.6       -16.6        22.5
BB        38.5        42.9        -4.4         6
CC        35.8        44.4        -8.6        12
DD        32.4        50.0       -17.6        25
EE        29.2        50.0       -20.8        26
FF        28.9        45.5       -16.6        22.5
GG        27.8        50.0       -22.2        27
HH        25.0       100.0       -75.0        34
II        25.0        12.5        12.5        14
JJ         7.7        66.7       -59.0        32
KK         6.3        66.7       -60.4        33

T− = 474, T+ = 121

To determine if the distribution of healthy food item percentages using the inventory method is shifted above or below the distribution of healthy food item percentages using the checklist method, we test:

H0: The distributions of healthy food item percentages using the inventory method and the checklist method are identical
Ha: The distribution of healthy food item percentages using the inventory method is shifted to the right or left of the distribution using the checklist method

The test statistic is

$$ z = \frac{T_- - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}} = \frac{474 - \frac{34(34 + 1)}{4}}{\sqrt{\frac{34(34 + 1)(2(34) + 1)}{24}}} = 3.02 $$

The rejection region requires α/2 = .05/2 = .025 in each tail of the z-distribution. From Table II, Appendix D, z_{.025} = 1.96. The rejection region is z < −1.96 or z > 1.96. Since the observed value of the test statistic falls in the rejection region (z = 3.02 > 1.96), H0 is rejected. There is sufficient evidence to indicate the distribution of healthy food item percentages using the inventory method is shifted to the right or left of the distribution using the checklist method at α = .05. Since there is a difference, we would recommend staying with the inventory method.



15.39

Operator   Before Policy   After Policy   Difference   Rank of Absolute Difference
1          10              5               5           5.5
2           3              0               3           4
3          16              7               9           8
4          11              4               7           7
5           8              6               2           2.5
6           2              4              −2           2.5
7           1              2              −1           1
8          14              3              11           9
9           5              5               0           (eliminated)
10          6              1               5           5.5
T− = 3.5   T+ = 41.5

To determine if the distributions of the number of complaints differ for the two time periods, we test:

H0: The distributions of the number of complaints for the two years are the same
Ha: The distribution of the number of complaints after the policy change is shifted to the right or left of the distribution before the policy change

The test statistic is T− = 3.5. Since no α is given, we will use α = .05. The null hypothesis will be rejected if T ≤ T0, where T0 corresponds to α = .05 (two-tailed) and n = 9. From Table XIII, Appendix D, T0 = 6. Reject H0 if T ≤ 6. Since the observed value of the test statistic falls in the rejection region (T = 3.5 ≤ 6), H0 is rejected. There is sufficient evidence to indicate the distributions of the complaints are different for the two years at α = .05.

15.40

Some preliminary calculations:

Day       Field Measurement   3D Model   Difference F − 3D   Rank of Absolute Difference
Oct. 24   -58                 -52         -6                 3
Dec. 3     69                  59         10                 5
Dec. 15    35                  32          3                 2
Feb. 2    -32                 -24         -8                 4
Mar. 25   -40                 -39         -1                 1
May 24    -83                 -71        -12                 6
T− = 14   T+ = 7

To determine if there is a shift in the change in transverse strain distributions between field measurements and the 3D model, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution of the field measurements is shifted to the right or left of that for the 3D model

The test statistic T is the smaller of T− and T+. Thus, T = 7. The rejection region is T ≤ 1, from Table XIII, Appendix D, with n = 6 and α = .05 (two-tailed). Since the observed value of the test statistic does not fall in the rejection region (T = 7 ≰ 1), H0 is not rejected. There is insufficient evidence to indicate that there is a shift in the change in transverse strain distributions between field measurements and the 3D model at α = .05.

15.41

Some preliminary calculations are:

Month       East-West   North-South   Difference   Rank of Absolute Difference
February    8,658       8,921         -263         2
April       7,930       8,317         -387         5
July        5,120       5,274         -154         1
September   6,862       7,148         -286         3
October     8,608       8,936         -328         4
T− = 15   T+ = 0

To determine if the distribution of monthly solar energy levels for north-south oriented highways is shifted above the corresponding distribution for east-west oriented highways, we test:

H0: The distributions of monthly solar energy levels for the two types of highways are the same
Ha: The distribution of monthly solar energy levels for north-south oriented highways is shifted above the corresponding distribution for east-west oriented highways

The test statistic is T+ = 0. The null hypothesis will be rejected if T+ ≤ T0, where T0 corresponds to α = .05 (one-tailed) and n = 5. From Table XIII, Appendix D, T0 = 1. Reject H0 if T+ ≤ 1. Since the observed value of the test statistic falls in the rejection region (T+ = 0 ≤ 1), H0 is rejected. There is sufficient evidence to indicate the distribution of monthly solar energy levels for north-south oriented highways is shifted above the corresponding distribution for east-west oriented highways at α = .05.

15.42

Some preliminary calculations are:

Science   Math   Difference Science − Math   Rank of Absolute Difference
0         2      −2                          5
4         3       1                          2
3         0       3                          7.5
1         1       0                          (eliminated)
3         1       2                          5
2         3      −1                          2
4         0       4                          9
2         1       1                          2
3         1       2                          5
4         1       3                          7.5
T− = 7   T+ = 38

To determine if there are differences in the levels of family involvement between math and science homework, we test:

H0: The distributions of the science and math levels of family involvement are the same
Ha: The distributions of the science and math levels of family involvement differ

The test statistic is T− = 7. The rejection region is T ≤ T0, where T0 corresponds to α = .05 (two-tailed) and n = 9. From Table XIII, Appendix D, T0 = 6. The rejection region is T ≤ 6. Since the observed value of the test statistic does not fall in the rejection region (T = 7 ≰ 6), H0 is not rejected. There is insufficient evidence to indicate there are differences in the levels of family involvement between math and science homework at α = .05.

15.43

The χ² distribution provides an appropriate characterization of the sampling distribution of H if the k sample sizes exceed 5.

15.44

a.

The hypotheses are:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

b.

$$ \bar{R}_A = \frac{R_A}{15} = \frac{230}{15} = 15.333 \qquad \bar{R}_B = \frac{R_B}{15} = \frac{440}{15} = 29.333 \qquad \bar{R}_C = \frac{R_C}{15} = \frac{365}{15} = 24.333 $$

$$ \bar{R} = \frac{n + 1}{2} = \frac{45 + 1}{2} = 23 $$

The test statistic is:

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{45(46)} \left[ 15(15.333 - 23)^2 + 15(29.333 - 23)^2 + 15(24.333 - 23)^2 \right] = 8.754 $$


The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ²_{.05} = 5.99147. The rejection region is H > 5.99147.

Since the observed value of the test statistic falls in the rejection region (H = 8.754 > 5.99147), H0 is rejected. There is sufficient evidence to indicate that the probability distributions of at least two of the populations A, B, and C differ in location at α = .05.
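The Kruskal-Wallis statistic can also be computed directly from the rank sums. The sketch below uses the algebraically equivalent shortcut form H = 12/(n(n + 1)) Σ R_j²/n_j − 3(n + 1); the function name `kruskal_wallis_h` is an illustrative choice, and no correction for ties is applied.

```python
def kruskal_wallis_h(rank_sums, sample_sizes):
    """Kruskal-Wallis H from per-sample rank sums (no tie correction)."""
    n = sum(sample_sizes)
    weighted = sum(r * r / nj for r, nj in zip(rank_sums, sample_sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Rank sums for samples A, B, and C, each of size 15.
h = kruskal_wallis_h([230, 440, 365], [15, 15, 15])
print(round(h, 3))
```

For these rank sums the result should match H = 8.754, the value obtained from the mean-rank form.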

c.

The approximate p-value is P(χ² ≥ 8.754). From Table IV, Appendix D, with df = 2, .01 ≤ P(χ² ≥ 8.754) ≤ .025.

d.

$$ H = \frac{12}{n(n + 1)} \sum \frac{R_j^2}{n_j} - 3(n + 1) = \frac{12}{45(46)} \left[ \frac{230^2}{15} + \frac{440^2}{15} + \frac{365^2}{15} \right] - 3(46) = 146.754 - 138 = 8.754 $$

15.45

a.

A completely randomized design was used.

b.

The hypotheses are:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

c.

The rejection region requires α = .01 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ²_{.01} = 9.21034. The rejection region is H > 9.21034.
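For df = 2 the chi-square upper tail has a closed form, P(χ² ≥ x) = exp(−x/2), so the Table IV values used in Exercises 15.44 and 15.45 can be checked without a table. The helper names below are hypothetical, and the shortcut holds only for 2 degrees of freedom.

```python
import math

def chi2_df2_critical(alpha):
    """Upper-tail critical value of the chi-square distribution with df = 2."""
    return -2 * math.log(alpha)

def chi2_df2_upper_tail(x):
    """P(chi-square >= x) when df = 2."""
    return math.exp(-x / 2)

print(round(chi2_df2_critical(0.01), 5))  # compare with 9.21034
print(chi2_df2_upper_tail(8.754))         # p-value for H = 8.754
```

The computed p-value for H = 8.754 should fall between .01 and .025, consistent with the bracket read from the table.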

d.

Some preliminary calculations are:

I                     II                    III
Observation   Rank    Observation   Rank    Observation   Rank
66            13      19            2       75            14.5
23            3       31            6       96            19
55            10      16            1       102           21
88            18      29            4       75            14.5
58            11      30            5       98            20
62            12      33            7       78            16
79            17      40            8
49            9
R_A = 93              R_B = 33              R_C = 105

$$ \bar{R}_A = \frac{R_A}{8} = \frac{93}{8} = 11.625 \qquad \bar{R}_B = \frac{R_B}{7} = \frac{33}{7} = 4.714 \qquad \bar{R}_C = \frac{R_C}{6} = \frac{105}{6} = 17.5 $$

$$ \bar{R} = \frac{n + 1}{2} = \frac{21 + 1}{2} = 11 $$

The test statistic is:

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{21(22)} \left[ 8(11.625 - 11)^2 + 7(4.714 - 11)^2 + 6(17.5 - 11)^2 \right] = 13.85 $$

Since the observed value of the test statistic falls in the rejection region (H = 13.85 > 9.21034), H0 is rejected. There is sufficient evidence to indicate at least two of the three probability distributions differ in location at α = .01.

15.46

a.

Using MINITAB, histograms of the 3 groups are:

[Figure: Histogram of Ground, Office, Air, with fitted normal curves; vertical axis: Frequency. Ground: Mean 11.73, StDev 5.556, N 6. Office: Mean 4.37, StDev 1.594, N 10. Air: Mean 3, StDev 0.4743, N 5.]

None of the data sets look particularly mound-shaped. One of the assumptions of the ANOVA F test is that the data all come from normal distributions. Since the data do not look normal, the F test would not be appropriate.

b.

The Kruskal-Wallis test would be appropriate for this data. To determine if the estimated containment times differ among the three groups, we test:

H0: The distributions of the containment times for the three groups are identical
Ha: At least two of the three probability distributions differ in location

c.

The ranks appear in the table:

Ground   Rank    Office   Rank    Air   Rank
7.6      16      5.4      13      2.5   1
10.8     19      2.8      4.5     3.4   8
20.9     21      3.9      10      2.7   3
15.5     20      5.9      14.5    2.8   4.5
9.7      18      4.3      11      3.6   9
5.9      14.5    4.6      12
                 2.6      2
                 3.3      7
                 3.2      6
                 7.7      17
R1 = 108.5       R2 = 97          R3 = 25.5
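The rank sums in the table can be reproduced by pooling the three samples and assigning average ranks to tied values. This is a minimal sketch under those assumptions; `average_ranks` is an illustrative helper, not a library routine, and the data are the containment times listed above.

```python
def average_ranks(values):
    """Map each distinct value to its average rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every observation tied with ordered[i].
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign their average.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

groups = {
    "Ground": [7.6, 10.8, 20.9, 15.5, 9.7, 5.9],
    "Office": [5.4, 2.8, 3.9, 5.9, 4.3, 4.6, 2.6, 3.3, 3.2, 7.7],
    "Air": [2.5, 3.4, 2.7, 2.8, 3.6],
}
pooled = [x for xs in groups.values() for x in xs]
ranks = average_ranks(pooled)
rank_sums = {name: sum(ranks[x] for x in xs) for name, xs in groups.items()}
print(rank_sums)
```

The printed rank sums should equal R1 = 108.5, R2 = 97, and R3 = 25.5, which total n(n + 1)/2 = 231 for n = 21.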


d.

$$ \bar{R}_1 = \frac{R_1}{n_1} = \frac{108.5}{6} = 18.0833 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{97}{10} = 9.7 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{25.5}{5} = 5.1 $$

$$ \bar{R} = \frac{n + 1}{2} = \frac{21 + 1}{2} = 11 $$

The test statistic is

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{21(21 + 1)} \left[ 6(18.0833 - 11)^2 + 10(9.7 - 11)^2 + 5(5.1 - 11)^2 \right] = 12.779 $$

e.

The rejection region requires α = .10 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ²_{.10} = 4.60517. The rejection region is H > 4.60517.

f.

Since the observed value of the test statistic falls in the rejection region (H = 12.779 > 4.60517), H0 is rejected. There is sufficient evidence to indicate that at least two of the distributions of the containment times for the three groups differ in location at α = .10.

15.47

a.

To determine if the number of collisions over a 3-year period differs among the 5 road patterns, we test:

H0: The five probability distributions are identical
Ha: At least two of the five probability distributions differ in location

b.

$$ \bar{R}_A = \frac{R_A}{30} = \frac{3{,}398}{30} = 113.267 \qquad \bar{R}_B = \frac{R_B}{30} = \frac{2{,}249.5}{30} = 74.983 \qquad \bar{R}_C = \frac{R_C}{30} = \frac{3{,}144}{30} = 104.8 $$

$$ \bar{R}_D = \frac{R_D}{30} = \frac{1{,}288.5}{30} = 42.95 \qquad \bar{R}_E = \frac{R_E}{30} = \frac{1{,}245}{30} = 41.5 \qquad \bar{R} = \frac{n + 1}{2} = \frac{150 + 1}{2} = 75.5 $$

The test statistic is:

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{150(151)} \left[ 30(113.267 - 75.5)^2 + 30(74.983 - 75.5)^2 + 30(104.8 - 75.5)^2 + 30(42.95 - 75.5)^2 + 30(41.5 - 75.5)^2 \right] = 71.53 $$

c.

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 5 − 1 = 4. From Table IV, Appendix D, χ²_{.05} = 9.48773. The rejection region is H > 9.48773. Since the observed value of the test statistic falls in the rejection region (H = 71.53 > 9.48773), H0 is rejected. There is sufficient evidence to indicate that the probability distributions of the number of collisions for at least two of the road patterns differ in location at α = .05.

15.48

a & b.

The ranked data are:

Cage               Free               Barn               Organic
Strength   Rank    Strength   Rank    Strength   Rank    Strength   Rank
36.9       10      31.5       1       40.0       23      34.5       6
39.2       19      39.7       21      37.6       12      36.8       9
40.2       25      37.8       13.5    39.6       20      32.6       2
33.0       3       33.5       5       40.3       27      38.5       17
39.0       18      39.9       22      38.3       16      40.2       25
36.6       8       40.6       28      40.2       25      33.2       4
37.5       11
38.1       15
37.8       13.5
34.9       7
R1 = 129.5         R2 = 90.5          R3 = 123           R4 = 63

c.

$$ \bar{R}_1 = \frac{R_1}{n_1} = \frac{129.5}{10} = 12.95 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{90.5}{6} = 15.083 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{123}{6} = 20.5 \qquad \bar{R}_4 = \frac{R_4}{n_4} = \frac{63}{6} = 10.5 $$

$$ \bar{R} = \frac{n + 1}{2} = \frac{28 + 1}{2} = 14.5 $$

The test statistic is

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{28(28 + 1)} \left[ 10(12.95 - 14.5)^2 + 6(15.083 - 14.5)^2 + 6(20.5 - 14.5)^2 + 6(10.5 - 14.5)^2 \right] = 4.996 $$

d.

To determine if the distributions of penetration strength differ among the 4 housing systems, we test:

H0: The four probability distributions are identical
Ha: At least two of the four probability distributions differ in location

The test statistic is H = 4.996. Since no α is given, we will use α = .05. The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 4 − 1 = 3. From Table IV, Appendix D, χ²_{.05} = 7.81473. The rejection region is H > 7.81473. Since the observed value of the test statistic does not fall in the rejection region (H = 4.996 ≯ 7.81473), H0 is not rejected. There is insufficient evidence to indicate the distributions of penetration strength differ in location among the 4 housing systems at α = .05.

15.49

To determine if there are differences in the distributions of recall percentages for student-drivers in the four groups, we test:

H0: The four probability distributions are identical
Ha: At least two of the four probability distributions differ in location

The test statistic is H = 12.846 and the p-value is p = .005. Since the p-value is less than α (p = .005 < .01), H0 is rejected. There is sufficient evidence to indicate there are differences in the distributions of recall percentages for student-drivers in the four groups at α = .01.
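The printout's p-value can be approximated by hand if we assume it comes from the chi-square approximation with df = k − 1 = 3. For 3 degrees of freedom the upper tail has the closed form P(χ² ≥ x) = erfc(√(x/2)) + √(2x/π)·exp(−x/2). The helper name below is hypothetical, and the sketch applies only to df = 3.

```python
import math

def chi2_df3_upper_tail(x):
    """P(chi-square >= x) when df = 3, using the closed-form survival function."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

print(round(chi2_df3_upper_tail(12.846), 3))  # p-value for H = 12.846
```

Rounded to three decimals the result should agree with the reported p = .005. As a sanity check, evaluating the same function at the tabled critical value 7.81473 should return approximately .05.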


15.50

a.

Some preliminary calculations are:

VHR    Rank    VRD    Rank    Control   Rank
-20    4       -12    6       51        20
-56    1       63     21      21        17
-34    3       12     15      8         14
0      11      -7     8.5     0         11
16     16      29     18      4         13
0      11
-14    5
-7     8.5
-44    2
43     19
-11    7
R1 = 87.5      R2 = 68.5      R3 = 75

$$ \bar{R}_1 = \frac{R_1}{n_1} = \frac{87.5}{11} = 7.9545 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{68.5}{5} = 13.7 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{75}{5} = 15 $$

$$ \bar{R} = \frac{n + 1}{2} = \frac{21 + 1}{2} = 11 $$

To determine if the distributions of differences in pain intensities differ among the three treatments, we test:

H0: The distributions of differences in pain intensities for the three treatments are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is

$$ H = \frac{12}{n(n + 1)} \sum n_j \left( \bar{R}_j - \bar{R} \right)^2 = \frac{12}{21(21 + 1)} \left[ 11(7.9545 - 11)^2 + 5(13.7 - 11)^2 + 5(15 - 11)^2 \right] = 5.675 $$

The rejection region requires α = .05 in the upper tail of the χ² distribution with df = k − 1 = 3 − 1 = 2. From Table IV, Appendix D, χ²_{.05} = 5.99147. The rejection region is H > 5.99147. Since the observed value of the test statistic does not fall in the rejection region (H = 5.675 ≯ 5.99147), H0 is not rejected. There is insufficient evidence to indicate that the distributions of differences in pain intensities among the three treatments differ in location at α = .05.

b.

Some preliminary calculations are:

VHR    Rank    VRD & Control   Rank
-20    4       -12             6
-56    1       63              21
-34    3       12              15
0      11      -7              8.5
16     16      29              18
0      11      51              20
-14    5       21              17
-7     8.5     8               14
-44    2       0               11
43     19      4               13
-11    7
T1 = 87.5      T2 = 143.5

To determine if the distributions of differences in pain intensity differ between for the VHR group and the VRD and Control groups combined, we test: H0: The distributions of differences in pain intensity for the VHR group and the VRD and Control groups combined are identical Ha: The distribution of differences in pain intensity for the VHR group is shifted to the right or left of that for the VRD and Control groups combined

The test statistic is z =

n1 ( n1 + n2 + 1) 11(11 + 10 + 1) 87.5 − 2 2 = = −2.36 . 11(10)(11 + 10 + 1) n1 n2 ( n1 + n2 + 1) 12 12

T1 −
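As a quick arithmetic check of the large-sample rank sum statistic, the following sketch plugs the rank sum and sample sizes from part b into the formula. The function name `rank_sum_z` is hypothetical, and the two-sided p-value uses the standard normal approximation, which is an addition here rather than something reported in the solution.

```python
import math


def rank_sum_z(t1, n1, n2):
    """Large-sample z statistic for the Wilcoxon rank sum T1 of sample 1."""
    mean_t1 = n1 * (n1 + n2 + 1) / 2
    sd_t1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t1 - mean_t1) / sd_t1


# T1 = 87.5 with n1 = 11 (VHR) and n2 = 10 (VRD and Control combined).
z = rank_sum_z(87.5, 11, 10)

# Two-sided p-value from the standard normal: 2 * P(Z >= |z|) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 2), round(p_two_sided, 4))
```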

The rejection region requires $\alpha/2 = .05/2 = .025$ in each tail of the z-distribution. From Table II, Appendix D, $z_{.025} = 1.96$. The rejection region is $z < -1.96$ or $z > 1.96$. Since the observed value of the test statistic falls in the rejection region ($z = -2.36 < -1.96$), H0 is rejected. There is sufficient evidence to indicate the distributions of differences in pain intensity differ between the VHR group and the VRD and Control groups combined at $\alpha = .05$.

15.51

To determine if the distributions of the self-regulation deficiency scores in the three age groups differ, we test: H0: The three self-regulation deficiency score probability distributions are identical Ha: At least two of the three self-regulation deficiency score distributions differ in location MINITAB was used to create the following output:

Descriptive Statistics

    AGE        N     Median   Mean Rank   Z-Value
    B20-25     117   13       123.8        2.06
    Over25      79   12        91.8       -3.85
    Under20     33   14       139.2        2.27
    Overall    229            115.0

Test

    Null hypothesis          H₀: All medians are equal
    Alternative hypothesis   H₁: At least one median is different

    Method                   DF   H-Value   P-Value
    Not adjusted for ties    2    16.17     0.000
    Adjusted for ties        2    16.28     0.000

The test statistic is $H = 16.17$ and the p-value is $p = 0.000$. Since the p-value is less than $\alpha$ ($p = 0.000 < .01$), H0 is rejected. There is sufficient evidence to indicate that at least two of the three self-regulation deficiency score distributions differ in location at $\alpha = .01$.

15.52

Some preliminary calculations:

Group 1, No Help ($n_1 = 30$)
    Score: 5 3 -3 9 6 0 9 5 -2 0 3 0 0 4 5 1 -1 8 1 1 5 0 4 0 -2 0 -3 4 6 5
    Rank:  63 45 1.5 74.5 69.5 15.5 74.5 63 4 15.5 45 15.5 15.5 55 63 25.5 7.5 72.5 25.5 25.5 63 15.5 55 15.5 4 15.5 1.5 55 69.5 63
    R1 = 1,134

Group 2, Check ($n_2 = 25$)
    Score: 0 -1 5 3 0 3 0 0 2 1 5 4 -1 3 3 2 4 3 5 8 6 4 3 4 2
    Rank:  15.5 7.5 63 45 15.5 45 15.5 15.5 34 25.5 63 55 7.5 45 45 34 55 45 63 72.5 69.5 55 45 55 34
    R2 = 1,025.5

Group 3, Complete ($n_3 = 20$)
    Score: 3 2 2 3 6 2 -1 -2 2 3 2 1 3 5 1 3 2 0 1 1
    Rank:  45 34 34 45 69.5 34 7.5 4 34 45 34 25.5 45 63 25.5 45 34 15.5 25.5 25.5
    R3 = 690.5

$\bar{R}_1 = \dfrac{R_1}{n_1} = \dfrac{1134}{30} = 37.8$

$\bar{R}_2 = \dfrac{R_2}{n_2} = \dfrac{1025.5}{25} = 41.02$

$\bar{R}_3 = \dfrac{R_3}{n_3} = \dfrac{690.5}{20} = 34.525$

$\bar{R} = \dfrac{n + 1}{2} = \dfrac{75 + 1}{2} = 38$

To determine if the levels of assistance are related to improvement scores, we test:

H0: The three probability distributions are identical
Ha: At least two of the three improvement distributions differ in location

The test statistic is

$H = \dfrac{12}{n(n+1)} \sum n_j \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12}{75(75+1)} \left[ 30(37.8 - 38)^2 + 25(41.02 - 38)^2 + 20(34.525 - 38)^2 \right] = 0.99$
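The same value of H can be obtained directly from the rank sums. The sketch below compares the deviation form used above with the algebraically equivalent shortcut $H = \frac{12}{n(n+1)} \sum \frac{R_j^2}{n_j} - 3(n+1)$; the shortcut is a standard identity that this solution does not write out, and the function name is hypothetical.

```python
def kruskal_wallis_from_rank_sums(rank_sums, sizes):
    """Return H computed two ways: deviation form and rank-sum shortcut."""
    n = sum(sizes)
    scale = 12 / (n * (n + 1))
    grand_mean_rank = (n + 1) / 2
    deviation_form = scale * sum(
        nj * (rj / nj - grand_mean_rank) ** 2 for rj, nj in zip(rank_sums, sizes)
    )
    shortcut_form = scale * sum(
        rj ** 2 / nj for rj, nj in zip(rank_sums, sizes)
    ) - 3 * (n + 1)
    return deviation_form, shortcut_form


# Rank sums and sample sizes from Exercise 15.52.
h_dev, h_short = kruskal_wallis_from_rank_sums([1134, 1025.5, 690.5], [30, 25, 20])
print(round(h_dev, 2), round(h_short, 2))
```

Both forms ignore the adjustment for ties, as the hand calculation does.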

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $H > 5.99147$. Since the observed value of the test statistic does not fall in the rejection region ($H = .99 \not> 5.99147$), H0 is not rejected. There is insufficient evidence to indicate the distributions of improvement scores differ among the 3 assistance groups at $\alpha = .05$. This agrees with the conclusion we arrived at in Exercise 9.30.

15.53

a.

MINITAB was used to create the following histograms:

None of the authenticity rating distributions appears to satisfy the normal distribution assumption required for the parametric tests. The nonparametric Kruskal-Wallis H-test is appropriate to use.

b.

To determine if the distributions of authenticity ratings for the four logo types differ, we test: H0: The four authenticity rating probability distributions are identical Ha: At least two of the four authenticity rating probability distributions differ in location MINITAB was used to create the following output:

Descriptive Statistics

    LOGO Group   N     Median   Mean Rank   Z-Value
    1            45    6.00      95.6        0.75
    2            45    7.00     107.0        2.45
    3            45    4.67      71.3       -2.85
    4            45    5.67      88.1       -0.35
    Overall      180             90.5

Test

    Null hypothesis          H₀: All medians are equal
    Alternative hypothesis   H₁: At least one median is different

    Method                   DF   H-Value   P-Value
    Not adjusted for ties    3    11.15     0.011
    Adjusted for ties        3    11.19     0.011

The test statistic is $H = 11.15$ and the p-value is $p = 0.011$. No value of $\alpha$ was specified, so we choose to use $\alpha = .05$. Since the p-value is less than $\alpha$ ($p = 0.011 < .05$), H0 is rejected. There is sufficient evidence to indicate that at least two of the four authenticity rating probability distributions differ in location at $\alpha = .05$.

15.54

a.

The number of blocks, b, is 6.

b.

H0: The probability distributions for the four treatments are identical Ha: At least two of the probability distributions differ in location

c.

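$\bar{R}_A = \dfrac{R_A}{b} = \dfrac{11}{6} = 1.833$

$\bar{R}_B = \dfrac{R_B}{b} = \dfrac{21}{6} = 3.5$

$\bar{R}_C = \dfrac{R_C}{b} = \dfrac{21}{6} = 3.5$

$\bar{R}_D = \dfrac{R_D}{b} = \dfrac{7}{6} = 1.167$

$\bar{R} = \dfrac{1}{2}(k + 1) = \dfrac{1}{2}(4 + 1) = 2.5$

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(6)}{(4)(4+1)} \left[ (1.833 - 2.5)^2 + (3.5 - 2.5)^2 + (3.5 - 2.5)^2 + (1.167 - 2.5)^2 \right] = 15.2$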

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.10} = 6.25139$. The rejection region is $F_r > 6.25139$. Since the observed value of the test statistic falls in the rejection region ($F_r = 15.2 > 6.25139$), H0 is rejected. There is sufficient evidence to indicate a difference in the location of at least two of the four treatments at $\alpha = .10$.

d.

The p-value is $P(F_r \ge 15.2) = P(\chi^2 \ge 15.2)$. Using MINITAB:

Cumulative Distribution Function

Chi-Square with 3 DF

    x       P( X <= x )
    15.2    0.998347

$p = P(F_r \ge 15.2) = 1 - .998347 = .001653$
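The tail probability in part d can also be checked without statistical software. The sketch below is one way to do it with the Python standard library, assuming integer degrees of freedom; it uses the closed forms for 1 and 2 degrees of freedom and the recurrence $Q(x; k+2) = Q(x; k) + \frac{(x/2)^{k/2} e^{-x/2}}{\Gamma(k/2 + 1)}$. The function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x) for x >= 0."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x non-negative")
    half = x / 2
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1  # exact for df = 1
    else:
        tail, k = math.exp(-half), 2  # exact for df = 2
    while k < df:
        # Step from k to k + 2 degrees of freedom.
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


# Observed Friedman statistic from Exercise 15.54 with df = k - 1 = 3.
p = chi2_upper_tail(15.2, 3)
print(round(p, 6))
```

The same function reproduces the Table IV critical values used in this chapter; for example, `chi2_upper_tail(5.99147, 2)` is approximately .05.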

e.

The test statistic is

$F_r = \dfrac{12}{bk(k+1)} \sum R_j^2 - 3b(k+1) = \dfrac{12}{6(4)(4+1)} \left[ 11^2 + 21^2 + 21^2 + 7^2 \right] - 3(6)(4+1) = 105.2 - 90 = 15.2$

15.55

a.

The hypotheses are:

H0: The probability distributions for the three treatments are identical
Ha: At least two of the probability distributions differ in location

b.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $F_r > 4.60517$.

c.

Some preliminary calculations are:

    Block    A     Rank     B     Rank     C     Rank
    1         9    1        11    2        18    3
    2        13    2        13    2        13    2
    3        11    1        12    2.5      12    2.5
    4        10    1        15    2        16    3
    5         9    2         8    1        10    3
    6        14    2        12    1        16    3
    7        10    1        12    2        15    3
             R_A = 10       R_B = 12.5     R_C = 19.5

$\bar{R}_A = \dfrac{R_A}{b} = \dfrac{10}{7} = 1.429$

$\bar{R}_B = \dfrac{R_B}{b} = \dfrac{12.5}{7} = 1.786$

$\bar{R}_C = \dfrac{R_C}{b} = \dfrac{19.5}{7} = 2.786$

$\bar{R} = \dfrac{1}{2}(k + 1) = \dfrac{1}{2}(3 + 1) = 2$

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(7)}{3(3+1)} \left[ (1.429 - 2)^2 + (1.786 - 2)^2 + (2.786 - 2)^2 \right] = 6.93$

Since the observed value of the test statistic falls in the rejection region ($F_r = 6.93 > 4.60517$), H0 is rejected. There is sufficient evidence to indicate that the effectiveness of the three treatments differs at $\alpha = .10$.
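The within-block ranking and the Friedman statistic for this exercise can be sketched as follows. The helper names `midranks` and `friedman_fr` are hypothetical, and ties within a block are assumed to receive the average of the tied ranks, as in the table for part c.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def friedman_fr(blocks):
    """Return (Fr, treatment rank sums) for a list of blocks of k measurements."""
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        # Rank the k treatments separately within each block.
        for j, rank in enumerate(midranks(block)):
            rank_sums[j] += rank
    grand_mean_rank = (k + 1) / 2
    spread = sum((r / b - grand_mean_rank) ** 2 for r in rank_sums)
    return 12 * b / (k * (k + 1)) * spread, rank_sums


# Blocks 1-7 from Exercise 15.55, each as (A, B, C).
blocks = [(9, 11, 18), (13, 13, 13), (11, 12, 12), (10, 15, 16),
          (9, 8, 10), (14, 12, 16), (10, 12, 15)]

fr, rank_sums = friedman_fr(blocks)
print(rank_sums, round(fr, 2))
```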


15.56

$R_1 = 16$, $R_2 = 7$, $R_3 = 23$, $R_4 = 14$

$\bar{R}_1 = \dfrac{R_1}{b} = \dfrac{16}{6} = 2.667$

$\bar{R}_2 = \dfrac{R_2}{b} = \dfrac{7}{6} = 1.167$

$\bar{R}_3 = \dfrac{R_3}{b} = \dfrac{23}{6} = 3.833$

$\bar{R}_4 = \dfrac{R_4}{b} = \dfrac{14}{6} = 2.333$

$\bar{R} = \dfrac{1}{2}(k + 1) = \dfrac{1}{2}(4 + 1) = 2.5$

To determine if at least two of the treatment probability distributions differ in location, we test:

H0: The probability distributions of the four treatments are identical
Ha: At least two of the probability distributions differ in location

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(6)}{4(4+1)} \left[ (2.667 - 2.5)^2 + (1.167 - 2.5)^2 + (3.833 - 2.5)^2 + (2.333 - 2.5)^2 \right] = 12.99$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $F_r > 7.81473$. Since the observed value of the test statistic falls in the rejection region ($F_r = 12.99 > 7.81473$), H0 is rejected. There is sufficient evidence to indicate a difference in the location for at least two of the probability distributions at $\alpha = .05$.

15.57

a.

The data are not independent. Each subject had measurements for each of the 4 time segments. Thus, the data are blocked. It would be realistic to assume that the data might not be normally distributed, so a nonparametric test would be appropriate. Thus, the Friedman test would be appropriate.

b.

The test statistic is $F_r = 14.37$ and the p-value is $p = .002$.

c.

Since the p-value is less than $\alpha$ ($p = .002 < .01$), H0 is rejected. There is sufficient evidence to indicate the distributions of walking times differ for the four time segments at $\alpha = .01$.

15.58

a.

The Friedman test will allow us to compare the distributions of hand pull force for the four patient-transfer devices.

b.

Some preliminary calculations are:

    Caregiver   DRAW   Rank   REPOS   Rank   SLIDE   Rank   AIR   Rank
    1           127    4      126     3      103     2      40    1
    2           130    3      132     4       96     2      47    1
    3           118    4      109     3       86     2      30    1
    4           120    4      111     3       95     2      35    1
    5           133    4      117     3      113     2      34    1
    6           123    4      118     3       92     2      41    1
    7           114    4      110     3      100     2      43    1
    8           125    4      123     3       99     2      44    1
    9           135    3      137     4      117     2      41    1
    10          119    3      115     2      120     4      42    1
    11          135    3      140     4      119     2      46    1
    12          144    4      136     3      113     2      54    1
    13          134    4      118     3      105     2      41    1
    14          123    4      115     3      108     2      40    1
    15          130    4      113     3       99     2      45    1
    16          131    4      130     3      100     2      42    1
    17          141    4      132     3      109     2      54    1
    18          137    4      124     3       96     2      44    1
    19          127    3      128     4      103     2      43    1
    20          128    4      120     3      105     2      40    1
                R_Draw = 75   R_Repos = 63   R_Slide = 42   R_Air = 20

c.

$\bar{R}_{Draw} = \dfrac{R_{Draw}}{b} = \dfrac{75}{20} = 3.75$

$\bar{R}_{Repos} = \dfrac{R_{Repos}}{b} = \dfrac{63}{20} = 3.15$

$\bar{R}_{Slide} = \dfrac{R_{Slide}}{b} = \dfrac{42}{20} = 2.1$

$\bar{R}_{Air} = \dfrac{R_{Air}}{b} = \dfrac{20}{20} = 1.0$

$\bar{R} = \dfrac{1}{2}(k + 1) = \dfrac{1}{2}(4 + 1) = 2.5$

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(20)}{4(4+1)} \left[ (3.75 - 2.5)^2 + (3.15 - 2.5)^2 + (2.1 - 2.5)^2 + (1.0 - 2.5)^2 \right] = 52.74$

d.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $F_r > 7.81473$.

e.

Since the observed value of the test statistic falls in the rejection region ($F_r = 52.74 > 7.81473$), H0 is rejected. There is sufficient evidence to indicate that the distributions of hand pull force for the four patient-transfer devices differ at $\alpha = .05$. This conclusion agrees with the parametric test we conducted back in Chapter 9.

15.59

a.

From the printout, the rank sums are 23 (before), 32 (after 2 months), and 35 (after 2 days).

b.

$\bar{R}_1 = \dfrac{R_1}{b} = \dfrac{23}{15} = 1.533$

$\bar{R}_2 = \dfrac{R_2}{b} = \dfrac{32}{15} = 2.133$

$\bar{R}_3 = \dfrac{R_3}{b} = \dfrac{35}{15} = 2.333$

$\bar{R} = \dfrac{1}{2}(k + 1) = \dfrac{1}{2}(3 + 1) = 2$

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(15)}{3(3+1)} \left[ (1.533 - 2)^2 + (2.133 - 2)^2 + (2.333 - 2)^2 \right] = 5.2$

c.

From the printout, the test statistic is $F_r = 5.20$ and the p-value is $p = .074$.

d.

To determine if the distributions of the competence levels differ in location among the 3 time periods, we test:

H0: The probability distributions of the three sampled populations are the same
Ha: At least two of the distributions of the competence levels differ in location

The test statistic is $F_r = 5.20$ and the p-value is $p = .074$. Since the p-value is not small, we would not reject H0 for any value of $\alpha < .074$. There is insufficient evidence to indicate the distributions of the competence levels differ in location among the 3 time periods for $\alpha < .074$. If we use $\alpha = .10$, then we would reject H0.
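The hand calculation in part b uses the deviation form of $F_r$, while Exercise 15.54e uses the rank-sum shortcut. As a small consistency check, the sketch below evaluates both forms from rank sums alone; the function name is hypothetical and no tie adjustment is made.

```python
def friedman_from_rank_sums(rank_sums, b):
    """Return Fr computed two ways from treatment rank sums over b blocks."""
    k = len(rank_sums)
    grand_mean_rank = (k + 1) / 2
    deviation_form = 12 * b / (k * (k + 1)) * sum(
        (r / b - grand_mean_rank) ** 2 for r in rank_sums
    )
    shortcut_form = 12 / (b * k * (k + 1)) * sum(
        r ** 2 for r in rank_sums
    ) - 3 * b * (k + 1)
    return deviation_form, shortcut_form


# Exercise 15.59: rank sums 23, 32, 35 over b = 15 blocks.
fr_dev, fr_short = friedman_from_rank_sums([23, 32, 35], 15)

# Exercise 15.54: rank sums 11, 21, 21, 7 over b = 6 blocks.
fr54_dev, fr54_short = friedman_from_rank_sums([11, 21, 21, 7], 6)
print(round(fr_dev, 2), round(fr54_dev, 2))
```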

15.60

a.

To determine if the distributions of the taste ratings differ in location among the five food/beverage items, we test: H0: The probability distributions for the five food/beverage items are identical Ha: At least two of the probability distributions differ in location

b.

For the gLMS group, the test statistic is $F_r = 332.5733$ and the p-value is $p < .0001$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the distributions of the taste ratings differ in location among the five food/beverage items for any reasonable value of $\alpha$.

For the Hedonic 9-point scale group, the test statistic is $F_r = 546.7132$ and the p-value is $p < .0001$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the distributions of the taste ratings differ in location among the five food/beverage items for any reasonable value of $\alpha$.

15.61

Using MINITAB, the results of analyzing the data using Friedman’s test are:

Friedman Test: Score versus Item blocked by Review

    S = 29.11   DF = 10   P = 0.001
    S = 31.39   DF = 10   P = 0.001 (adjusted for ties)

    Item   N   Est Median   Sum of Ranks
    1      5   3.500        40.0
    2      5   2.500        28.5
    3      5   3.864        46.5
    4      5   3.591        40.0
    5      5   2.455        23.0
    6      5   3.591        41.0
    7      5   3.500        37.0
    8      5   3.227        32.0
    9      5   2.636        23.5
    10     5   1.091         9.5
    11     5   1.045         9.0

    Grand median = 2.818

To determine if the distributions of the 11 item scores are different, we test:

H0: The distributions of the 11 item scores are identical
Ha: At least two of the distributions of the item scores differ in location

From the printout, the test statistic is $F_r = 29.11$ and the p-value is $p = .001$. Since the p-value is so small, H0 is rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate that the distributions of the 11 item scores differ in location at any reasonable value of $\alpha$.

15.62

Using MINITAB, the results of analyzing the data using Friedman’s test are:

Descriptive Statistics for RangeCOP

    Condition   N    Median    Sum of Ranks
    A           24   23.1500   71.5
    B           24   20.2167   38.0
    C           24   19.5333   34.5
    Overall     72   20.9667

Test

    Null hypothesis          H₀: All treatment effects are zero
    Alternative hypothesis   H₁: Not all treatment effects are zero

    Method                   DF   Chi-Square   P-Value
    Not adjusted for ties    2    34.77        0.000
    Adjusted for ties        2    35.51        0.000

To determine if the distributions of the COP ranges for all three conditions are different, we test:

H0: The distributions of the COP ranges are identical for all three conditions
Ha: At least two of the distributions of the COP ranges differ in location

From the printout, the test statistic is $F_r = 34.77$ and the p-value is $p = .000$. Since the p-value is so small, H0 is rejected for any reasonable value of $\alpha$. There is sufficient evidence to indicate that the distributions of the COP ranges differ in location at any reasonable value of $\alpha$.

15.63

Some preliminary calculations are:

    Student   Rank, Live Plant   Rank, Plant Photo   Rank, No Plant
    1         1                  2                   3
    2         2                  3                   1
    3         3                  2                   1
    4         1                  2                   3
    5         2                  3                   1
    6         3                  2                   1
    7         2                  1                   3
    8         1                  3                   2
    9         2                  1                   3
    10        2                  1                   3
              R1 = 19            R2 = 20             R3 = 21

$\bar{R}_1 = \dfrac{R_1}{n_1} = \dfrac{19}{10} = 1.9$

$\bar{R}_2 = \dfrac{R_2}{n_2} = \dfrac{20}{10} = 2$

$\bar{R}_3 = \dfrac{R_3}{n_3} = \dfrac{21}{10} = 2.1$

$\bar{R} = \dfrac{k + 1}{2} = \dfrac{3 + 1}{2} = 2$

To determine if the students' finger temperatures depend on the experimental conditions, we test:

H0: The probability distributions of finger temperatures are the same for the three conditions
Ha: At least two probability distributions of finger temperatures differ in location

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(10)}{3(3+1)} \left[ (1.9 - 2)^2 + (2 - 2)^2 + (2.1 - 2)^2 \right] = 0.2$

Since no $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $F_r > 5.99147$. Since the observed value of the test statistic does not fall in the rejection region ($F_r = 0.2 \not> 5.99147$), H0 is not rejected. There is insufficient evidence to indicate that the students' finger temperatures depend on the experimental conditions at $\alpha = .05$. Because the value of the test statistic is so small, H0 would not be rejected for any reasonable value of $\alpha$.

15.64

Some preliminary calculations are:

    Row   Rank, Standard   Rank, Supervent   Rank, Ecopack
    1     3                1                 2
    2     3                1                 2
    3     3                2                 1
          R1 = 9           R2 = 4            R3 = 5

$\bar{R}_1 = \dfrac{R_1}{n_1} = \dfrac{9}{3} = 3$

$\bar{R}_2 = \dfrac{R_2}{n_2} = \dfrac{4}{3} = 1.333$

$\bar{R}_3 = \dfrac{R_3}{n_3} = \dfrac{5}{3} = 1.667$

$\bar{R} = \dfrac{k + 1}{2} = \dfrac{3 + 1}{2} = 2$

To determine if the distributions of the half-cooling times differ in location among the three designs, we test:

H0: The probability distributions of the half-cooling times are the same for the three designs
Ha: At least two probability distributions of half-cooling times differ in location

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(3)}{3(3+1)} \left[ (3 - 2)^2 + (1.333 - 2)^2 + (1.667 - 2)^2 \right] = 4.67$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $F_r > 4.60517$. Since the observed value of the test statistic falls in the rejection region ($F_r = 4.67 > 4.60517$), H0 is rejected. There is sufficient evidence to indicate the distributions of the half-cooling times differ in location among the three designs at $\alpha = .10$.

15.65

Some preliminary calculations are:

    Subject   Rank, Standard   Rank, Supervent   Rank, Ecopack
    1         1                2                 3
    2         1                3                 2
    3         1.5              1.5               3
    4         2                1                 3
    5         2                1                 3
    6         1                2                 3
    7         2                3                 1
    8         1                2                 3
    9         1                2                 3
    10        1                3                 2
              R1 = 13.5        R2 = 20.5         R3 = 26

$\bar{R}_1 = \dfrac{R_1}{n_1} = \dfrac{13.5}{10} = 1.35$

$\bar{R}_2 = \dfrac{R_2}{n_2} = \dfrac{20.5}{10} = 2.05$

$\bar{R}_3 = \dfrac{R_3}{n_3} = \dfrac{26}{10} = 2.6$

$\bar{R} = \dfrac{k + 1}{2} = \dfrac{3 + 1}{2} = 2$

To determine if the distributions of job suitability scores differ for the three candidates, we test:

H0: The probability distributions of job suitability scores are the same for the three candidates
Ha: At least two probability distributions of job suitability scores differ in location

The test statistic is

$F_r = \dfrac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \dfrac{12(10)}{3(3+1)} \left[ (1.35 - 2)^2 + (2.05 - 2)^2 + (2.6 - 2)^2 \right] = 7.85$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $F_r > 9.21034$. Since the observed value of the test statistic does not fall in the rejection region ($F_r = 7.85 \not> 9.21034$), H0 is not rejected. There is insufficient evidence to indicate the distributions of job suitability scores differ for the three candidates at $\alpha = .01$.

15.66

a.

For $n = 22$, $P(r_s > .508) = .01$

b.

For $n = 28$, $P(r_s > .448) = .01$

c.

For $n = 10$, $P(r_s \le .648) = 1 - .025 = .975$

d.

For $n = 8$, $P(r_s < -.738 \text{ or } r_s > .738) = 2(.025) = .05$

15.67

a.

From Table XIV with $n = 10$, $r_{s,\alpha/2} = r_{s,.025} = .648$. The rejection region is $r_s < -.648$ or $r_s > .648$.

b.

From Table XIV with $n = 20$, $r_{s,\alpha} = r_{s,.025} = .450$. The rejection region is $r_s > .450$.

c.

From Table XIV with $n = 30$, $r_{s,\alpha} = .432$. The rejection region is $r_s < -.432$.

15.68

a.

$H_0: \rho_s = 0$
$H_a: \rho_s \ne 0$


b.

The test statistic is $r_s = \dfrac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}}$.

    x     Rank, u    y    Rank, v    u²       v²       uv
     0    3          0    1.5         9        2.25     4.5
     3    5.5        2    5          30.25    25       27.5
     0    3          2    5           9       25       15
    -4    1          0    1.5         1        2.25     1.5
     3    5.5        3    7          30.25    49       38.5
     0    3          1    3           9        9        9
     4    7          2    5          49       25       35
          Σu = 28         Σv = 28    Σu² = 137.5   Σv² = 137.5   Σuv = 131

$SS_{uv} = \sum uv - \dfrac{\left(\sum u\right)\left(\sum v\right)}{n} = 131 - \dfrac{28(28)}{7} = 19$

$SS_{uu} = \sum u^2 - \dfrac{\left(\sum u\right)^2}{n} = 137.5 - \dfrac{(28)^2}{7} = 25.5$

$SS_{vv} = \sum v^2 - \dfrac{\left(\sum v\right)^2}{n} = 137.5 - \dfrac{(28)^2}{7} = 25.5$

$r_s = \dfrac{19}{\sqrt{(25.5)(25.5)}} = .745$

Reject H0 if $r_s < -r_{s,\alpha/2}$ or $r_s > r_{s,\alpha/2}$, where $\alpha/2 = .05/2 = .025$ and $n = 7$:

Reject H0 if $r_s < -.786$ or $r_s > .786$ (from Table XIV, Appendix D).

Since the observed value of the test statistic does not fall in the rejection region ($r_s = .745 \not> .786$), H0 is not rejected. There is insufficient evidence to indicate x and y are correlated at $\alpha = .05$.

c.

The p-value is $P(r_s \ge .745) + P(r_s \le -.745)$. For $n = 7$, $r_s = .745$ is above $r_{s,.05}$ (where $\alpha/2 = .05$) and below $r_{s,.025}$ (where $\alpha/2 = .025$). Therefore, $2(.025) = .05 < p\text{-value} < 2(.05) = .10$.

d.

The assumptions of the test are that the samples are randomly selected and the probability distributions of the two variables are continuous.

15.69

Since there are no ties, we will use the shortcut formula.

a.

Some preliminary calculations are:

    x Rank (u_i)   y Rank (v_i)   d_i = u_i - v_i   d_i²
    3              2               1                1
    5              4               1                1
    2              5              -3                9
    1              1               0                0
    4              3               1                1
                                            Total = 12

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(12)}{5(5^2 - 1)} = 1 - .6 = .4$

b.

    x Rank (u_i)   y Rank (v_i)   d_i = u_i - v_i   d_i²
    2              3              -1                 1
    3              4              -1                 1
    4              2               2                 4
    5              1               4                16
    1              5              -4                16
                                            Total = 38

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(38)}{5(5^2 - 1)} = 1 - 1.9 = -.9$

c.

    x Rank (u_i)   y Rank (v_i)   d_i = u_i - v_i   d_i²
    1              2              -1                1
    4              1               3                9
    2              3              -1                1
    3              4              -1                1
                                            Total = 12

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(12)}{4(4^2 - 1)} = 1 - 1.2 = -.2$

d.

    x Rank (u_i)   y Rank (v_i)   d_i = u_i - v_i   d_i²
    2              1               1                1
    5              3               2                4
    4              5              -1                1
    3              2               1                1
    1              4              -3                9
                                            Total = 16

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(16)}{5(5^2 - 1)} = 1 - .8 = .2$
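The four shortcut calculations in this exercise follow the same pattern, so they can be checked together. The sketch below assumes the inputs are already ranks with no ties, which is the condition stated for this exercise; the function name `spearman_shortcut` is hypothetical.

```python
def spearman_shortcut(u, v):
    """Spearman's r_s from untied ranks via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(u)
    sum_d_squared = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Rank pairs (u, v) from parts a-d of Exercise 15.69.
parts = {
    "a": ([3, 5, 2, 1, 4], [2, 4, 5, 1, 3]),
    "b": ([2, 3, 4, 5, 1], [3, 4, 2, 1, 5]),
    "c": ([1, 4, 2, 3], [2, 1, 3, 4]),
    "d": ([2, 5, 4, 3, 1], [1, 3, 5, 2, 4]),
}

results = {label: round(spearman_shortcut(u, v), 3) for label, (u, v) in parts.items()}
print(results)
```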

15.70

a,b&c. The ranks of the pivot pin-top distance and farmer’s work experience, along with the differences in the ranks, are shown in the table.

    Farmer   Exp   Rank   Distance   Rank    d       d²
    1        22    12     527         9       3        9
    2        47    18     553        10       8       64
    3        17     8     485         7       1        1
    4         9     5.5   596        11.5    -6       36
    5         1     2     354         1       1        1
    6        20    11     713        15      -4       16
    7         1     2     736        16     -14      196
    8        18     9.5   762        17      -7.5     56.25
    9        28    15     596        11.5     3.5     12.25
    10        1     2     381         3      -1        1
    11        7     4     374         2       2        4
    12       41    17     518         8       9       81
    13       18     9.5   855        20     -10.5    110.25
    14        9     5.5   404         4       1.5      2.25
    15       26    14     654        13       1        1
    16       16     7     683        14      -7       49
    17       53    20     774        18       2        4
    18       49    19     823        19       0        0
    19       35    16     426         5      11      121
    20       24    13     440         6       7       49
                                             Σd² = 814

Spearman’s rank correlation coefficient is

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(814)}{20(20^2 - 1)} = 1 - .6120 = .3880$

d.

To determine if there is positive rank correlation between the pivot pin-top distance and a farmer’s working experience, we test: H 0 : ρs = 0 H a : ρs > 0

The test statistic is $r_s = .3880$. Reject H0 if $r_s > r_{s,\alpha}$ where $\alpha = .10$ and $n = 20$: Reject H0 if $r_s > .378$ (from an online table). Since the observed value of the test statistic falls in the rejection region ($r_s = .3880 > .378$), H0 is rejected. There is sufficient evidence to indicate that the pivot pin-top distance and a farmer’s working experience are positively rank correlated at $\alpha = .10$.

15.71

a,b&c. The ranks of Vote share and Charisma difference, along with the differences in the ranks, are shown in the table:

    Year   Vote Share   Rank   Charisma   Rank    d       d²
    1916   51.68        15      -6        14.5     0.5      0.25
    1920   36.15         1     -60         1       0        0
    1924   41.74         5      18        22     -17      289
    1928   41.24         4       4        19     -15      225
    1932   59.15        22     -22         6      16      256
    1936   62.23        24     -14.5       9      15      225
    1940   54.98        21      -7.5      13       8       64
    1944   53.78        19     -10        12       7       49
    1948   52.32        16     -10.5      11       5       25
    1952   44.71         7       2        18     -11      121
    1956   42.91         6       0        17     -11      121
    1960   50.09        12     -34.5       3       9       81
    1964   61.2         23     -29         5      18      324
    1968   49.43        11     -20.5       7       4       16
    1972   38.21         2      -6        14.5   -12.5    156.25
    1976   51.05        14      14        21      -7       49
    1980   44.84         8     -12        10      -2        4
    1984   40.88         3     -42         2       1        1
    1988   46.17         9     -34         4       5       25
    1992   53.62        17      18.5      23      -6       36
    1996   54.74        20     -16.5       8      12      144
    2000   50.26        13      -2.5      16      -3        9
    2004   48.77        10      48.5      24     -14      196
    2008   53.69        18      13.5      20      -2        4
                                                 Σd² = 2,420.5

d.

Spearman’s rank correlation coefficient is

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(2{,}420.5)}{24(24^2 - 1)} = 1 - 1.052 = -.052$

15.72

a.

One of the assumptions of the Pearson correlation coefficient is that the sample is drawn from a normal distribution. For both hg-index scores and the number of citations, the data looked to be skewed to the right. There are a few very large numbers in each variable.

b&c. The ranks of the hg-index scores and the number of citations are in the table:


    hg-index   Rank, u   Citations   Rank, v   Difference, d_i   d_i²
    81.47      20        12,583      20         0                 0
    62.10      19         9,420      19         0                 0
    48.64      18         7,139      18         0                 0
    36.96      17         2,900      17         0                 0
    34.63      16         2,866      16         0                 0
    29.65      15         2,213      15         0                 0
    29.23      14         2,051      13         1                 1
    28.02      13         2,092      14        -1                 1
    26.76      12         1,276       8         4                16
    25.76      11         1,327       9         2                 4
    23.52      10         1,520      12        -2                 4
    22.55       9         1,497      11        -2                 4
    22.30       8           951       7         1                 1
    19.42       6           891       6         0                 0
     7.58       1           101       1         0                 0
    17.82       5           642       5         0                 0
    14.75       4           459       4         0                 0
    12.92       3           283       3         0                 0
    11.90       2           255       2         0                 0
    19.85       7         1,439      10        -3                 9
                                                        Σd_i² = 40

d.

Since there are no ties in the ranks, we can use the shortcut formula. The differences between the ranks for each pair appear in the table above, along with the squared differences.

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \dfrac{6(40)}{20(20^2 - 1)} = .970$

e.

To determine if the true rank correlation in the population of all marketing journals is greater than 0, we test: H 0 : ρs = 0 H a : ρs > 0

The test statistic is $r_s = .970$. Reject H0 if $r_s > r_{s,\alpha}$ where $\alpha = .01$ and $n = 20$. Reject H0 if $r_s > .534$ (from Table XIV, Appendix D). Since the observed value of the test statistic falls in the rejection region ($r_s = .970 > .534$), H0 is rejected. There is sufficient evidence to indicate the true rank correlation in the population of all marketing journals is greater than 0 at $\alpha = .01$.

15.73

a.

Navigability: $r_s = .179$. Since this value is close to 0, there is a very weak positive rank correlation between the ranks of organizational internet use and the ranks of navigability.

Transactions: $r_s = .334$. Since this value is relatively close to 0, there is a weak positive rank correlation between the ranks of organizational internet use and the ranks of transactions.

Locatability: $r_s = .590$. Since this value is about halfway between 0 and 1, there is a moderate positive rank correlation between the ranks of organizational internet use and the ranks of locatability.

Information Richness: $r_s = -.115$. Since this value is close to 0, there is a very weak negative rank correlation between the ranks of organizational internet use and the ranks of information richness.

Number of files: $r_s = .114$. Since this value is close to 0, there is a very weak positive rank correlation between the ranks of organizational internet use and the ranks of number of files.

b.

For each indicator, we will test:

$H_0: \rho_s = 0$
$H_a: \rho_s > 0$

Navigability: $p = .148$. Since the p-value is greater than $\alpha = .10$, H0 is not rejected. There is insufficient evidence to indicate a positive rank correlation between organizational internet use and navigability.

Transactions: $p = .023$. Since the p-value is less than $\alpha = .10$, H0 is rejected. There is sufficient evidence to indicate a positive rank correlation between organizational internet use and transactions.

Locatability: $p = .000$. Since the p-value is less than $\alpha = .10$, H0 is rejected. There is sufficient evidence to indicate a positive rank correlation between organizational internet use and locatability.

Information Richness: $p = .252$. Since the p-value is greater than $\alpha = .10$, H0 is not rejected. There is insufficient evidence to indicate a positive rank correlation between organizational internet use and information richness.

Number of files: $p = .255$. Since the p-value is greater than $\alpha = .10$, H0 is not rejected. There is insufficient evidence to indicate a positive rank correlation between organizational internet use and number of files.

15.74

a.

Some preliminary calculations are:

    Obs   PER   Rank, u   AngerHos   Rank, v   u²     v²       uv
    1     5      8        4           7.5       64     56.25    60
    2     4      6        3           4         36     16       24
    3     2      1        3           4          1     16        4
    4     3      3        2           2          9      4        6
    5     6      9        4           7.5       81     56.25    67.5
    6     4      6        4           7.5       36     56.25    45
    7     7     10        6          10        100    100      100
    8     4      6        3           4         36     16       24
    9     3      3        4           7.5        9     56.25    22.5
    10    3      3        1           1          9      1        3
                Σu = 55              Σv = 55   Σu² = 381   Σv² = 378   Σuv = 356

$SS_{uv} = \sum uv - \dfrac{\left(\sum u\right)\left(\sum v\right)}{n} = 356 - \dfrac{55(55)}{10} = 53.5$

$SS_{uu} = \sum u^2 - \dfrac{\left(\sum u\right)^2}{n} = 381 - \dfrac{(55)^2}{10} = 78.5$

$SS_{vv} = \sum v^2 - \dfrac{\left(\sum v\right)^2}{n} = 378 - \dfrac{(55)^2}{10} = 75.5$

$r_s = \dfrac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}} = \dfrac{53.5}{\sqrt{(78.5)(75.5)}} = .6949$

b.

MINITAB was used to calculate Spearman’s rank correlation coefficient and found $r_s = .7343$.

c.

Since the correlation coefficient is a large positive number, we can conclude that there is a strong positive linear relationship between the level of the exploitative relationship and hostility toward an organization. We need to be careful, however, in suggesting that one of the variables causes the other.

15.75

Some preliminary calculations are:

    Punish   Rank, u   Payoff   Rank, v   u²     v²       uv
     0        1         0.50    13          1    169      13
     1        2         0.20     9          4     81      18
     2        3         0.30    11.5        9    132.25   34.5
     3        4         0.25    10         16    100      40
     4        5         0.00     6         25     36      30
     5        6         0.30    11.5       36    132.25   69
     6        7         0.10     7         49     49      49
     8        8        -0.20     3.5       64     12.25   28
    10        9         0.15     8         81     64      72
    12       10        -0.30     1        100      1      10
    14       11        -0.10     5        121     25      55
    16       12        -0.20     3.5      144     12.25   42
    17       13        -0.25     2        169      4      26
             Σu = 91            Σv = 91   Σu² = 819   Σv² = 818   Σuv = 486.5

$SS_{uv} = \sum uv - \dfrac{\left(\sum u\right)\left(\sum v\right)}{n} = 486.5 - \dfrac{91(91)}{13} = -150.5$

$SS_{uu} = \sum u^2 - \dfrac{\left(\sum u\right)^2}{n} = 819 - \dfrac{(91)^2}{13} = 182$

$SS_{vv} = \sum v^2 - \dfrac{\left(\sum v\right)^2}{n} = 818 - \dfrac{(91)^2}{13} = 181$

$r_s = \dfrac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}} = \dfrac{-150.5}{\sqrt{(182)(181)}} = -.829$
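When ranks are tied, as with the payoffs here, the sums-of-squares form above is used instead of the shortcut formula. The sketch below carries out the same steps from the raw data; the helper names `midranks` and `spearman_with_ties` are hypothetical, and tied values are assumed to receive the average of their ranks, as in the table.

```python
import math


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman_with_ties(x, y):
    """Spearman's r_s as SS_uv / sqrt(SS_uu * SS_vv) computed on the ranks."""
    u, v = midranks(x), midranks(y)
    n = len(u)
    ss_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    ss_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    ss_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return ss_uv / math.sqrt(ss_uu * ss_vv)


# Data from Exercise 15.75.
punish = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 17]
payoff = [0.50, 0.20, 0.30, 0.25, 0.00, 0.30, 0.10,
          -0.20, 0.15, -0.30, -0.10, -0.20, -0.25]

r_s = spearman_with_ties(punish, payoff)
print(round(r_s, 3))
```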

To determine if “punishers tend to have lower payoffs”, we test: H 0 : ρs = 0 H a : ρs < 0

The test statistic is $r_s = -.829$.

Since no $\alpha$ was given, we will use $\alpha = .05$. Reject H0 if $r_s < -r_{s,\alpha}$ where $\alpha = .05$ and $n = 13$.

Reject H0 if $r_s < -.475$ (from Table XIV, Appendix D).

Since the observed value of the test statistic falls in the rejection region ($r_s = -.829 < -.475$), H0 is rejected. There is sufficient evidence to indicate “punishers tend to have lower payoffs” at $\alpha = .05$.

15.76

Some preliminary calculations are:

    School   Inventory   Rank, u   Checklist   Rank, v   u²        v²        uv
    A        100.0       37.0      100.0       36        1369      1296      1332
    B         95.5       36.0       66.7       29.5      1296       870.25   1062
    C         90.6       35.0       62.5       24.5      1225       600.25    857.5
    D         77.8       34.0       71.4       34        1156      1156      1156
    E         66.7       33.0       66.7       29.5      1089       870.25    973.5
    F         64.5       32.0       50.0        9.5      1024        90.25    304
    G         62.5       31.0       50.0        9.5       961        90.25    294.5
    H         55.1       30.0       63.6       26         900       676       780
    I         54.3       28.5       58.3       20         812.25    400       570
    J         54.3       28.5       55.6       17         812.25    289       484.5
    K         53.8       27.0       58.3       20         729       400       540
    L         53.7       26.0       58.3       20         676       400       520
    M         52.9       25.0       66.7       29.5       625       870.25    737.5
    N         52.0       24.0       54.5       14.5       576       210.25    348
    O         51.5       23.0       50.0        9.5       529        90.25    218.5
    Q         50.0       20.5      100.0       36         420.25   1296       738
    R         50.0       20.5       66.7       29.5       420.25    870.25    604.75
    S         50.0       20.5       62.5       24.5       420.25    600.25    502.25
    P         50.0       20.5       50.0        9.5       420.25     90.25    194.75
    T         46.3       18.0       60.0       22.5       324       506.25    405
    U         44.2       17.0       70.0       33         289      1089       561
    V         43.8       16.0       50.0        9.5       256        90.25    152
    W         43.5       15.0       60.0       22.5       225       506.25    337.5
    X         42.2       14.0       40.0        2         196         4        28
    Y         41.3       13.0       54.5       14.5       169       210.25    188.5
    Z         40.7       12.0       55.6       17         144       289       204
    AA        39.0       11.0       55.6       17         121       289       187
    BB        38.5       10.0       42.9        3         100         9        30
    CC        35.8        9.0       44.4        4          81        16        36
    DD        32.4        8.0       50.0        9.5        64        90.25     76
    EE        29.2        7.0       50.0        9.5        49        90.25     66.5
    FF        28.9        6.0       45.5        5          36        25        30
    GG        27.8        5.0       50.0        9.5        25        90.25     47.5
    HH        25.0        3.5      100.0       36          12.25   1296       126
    II        25.0        3.5       12.5        1          12.25      1         3.5
    JJ         7.7        2.0       66.7       29.5         4       870.25     59
    KK         6.3        1.0       66.7       29.5         1       870.25     29.5

    Σu = 703   Σv = 703   Σu² = 17,569   Σv² = 17,508   Σuv = 14,784.75

$SS_{uv} = \sum uv - \dfrac{\left(\sum u\right)\left(\sum v\right)}{n} = 14{,}784.75 - \dfrac{703(703)}{37} = 1{,}427.75$

$SS_{uu} = \sum u^2 - \dfrac{\left(\sum u\right)^2}{n} = 17{,}569 - \dfrac{(703)^2}{37} = 4{,}212$

$SS_{vv} = \sum v^2 - \dfrac{\left(\sum v\right)^2}{n} = 17{,}508 - \dfrac{(703)^2}{37} = 4{,}151$

$r_s = \dfrac{SS_{uv}}{\sqrt{SS_{uu} SS_{vv}}} = \dfrac{1{,}427.75}{\sqrt{(4{,}212)(4{,}151)}} = .341$

To determine if there is positive rank correlation between the percentage determined using the inventory method and the percentage found using the checklist method, we test: H 0 : ρs = 0 H a : ρs > 0

The test statistic is $r_s = .341$. Reject H0 if $r_s > r_{s,\alpha}$ where $\alpha = .05$ and $n = 37$. Reject H0 if $r_s > .305$ (from Table XIV, Appendix D). The largest n in the table is $n = 30$. However, as n gets larger, the critical value gets smaller. Thus, the actual critical value is less than .305. If we reject H0 with $n = 30$, we will reject H0 for $n = 37$. Since the observed value of the test statistic falls in the rejection region ($r_s = .341 > .305$), H0 is rejected. There is sufficient evidence to indicate there is positive rank correlation between the percentage determined using the inventory method and the percentage found using the checklist method at $\alpha = .05$.

15.77

Some preliminary calculations are:

Year    R-Year   Cost   R-Cost   d       d²
1974    1        40     9        -8      64
1975    2        28     5        -3      9
1977    3        13     1        2       4
1978    4.5      26     4        0.5     0.25
1978    4.5      30     7        -2.5    6.25
1979    6        16     2        4       16
1981    7        20     3        4       16
1987    8        29     6        2       4
1989    9        31     8        1       1
1998    10       55     10       0       0
2005    11       105    12       -1      1
2012    12       80     11       1       1

\[ \sum d_i^2 = 122.5 \]

\[ r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(122.5)}{12\left(12^2 - 1\right)} = 1 - .428 = .572 \]

There is a positive linear relationship between the ranks of the year and the ranks of the estimated annual cost. As the year increases, the cost tends to increase.

15.78

For the correlation between perceived hedonic intensity and perceived sensory intensity for favorite foods, r = .3736. This implies that perceived hedonic intensity and perceived sensory intensity have a positive linear relationship. Because the value of r is not very large, the relationship is fairly weak.

For the correlation between perceived hedonic intensity and perceived sensory intensity for least favorite foods, r = −.4025. This implies that perceived hedonic intensity and perceived sensory intensity have a negative linear relationship. Because the magnitude of r is not very large, the relationship is fairly weak.

Yes. The correlation between perceived hedonic intensity and perceived sensory intensity for favorite foods is positive, while the correlation for least favorite foods is negative. Thus, those with the highest perceived sensory intensity scores tend to have the highest perceived hedonic intensity scores for favorite foods and the lowest perceived hedonic intensity scores for least favorite foods.

15.79

a.

Some preliminary calculations are:

x     u      y     v      u-sq     v-sq     uv
5.2   1      220   4.5    1        20.25    4.5
5.5   7      227   7.5    49       56.25    52.5
6.0   23.5   259   15.5   552.25   240.25   364.25
5.9   20.5   210   1      420.25   1        20.5
5.8   16     224   6      256      36       96
6.0   23.5   215   3      552.25   9        70.5
5.8   16     231   9      256      81       144
5.6   10     268   19     100      361      190
5.6   10     239   11     100      121      110
5.9   20.5   212   2      420.25   4        41
5.4   5      410   24     25       576      120
5.6   10     256   14     100      196      140
5.8   16     306   22     256      484      352
5.5   7      259   15.5   49       240.25   108.5
5.3   3      284   21     9        441      63
5.3   3      383   23     9        529      69
5.7   12.5   271   20     156.25   400      250
5.5   7      264   18     49       324      126
5.7   12.5   227   7.5    156.25   56.25    93.75
5.3   3      263   17     9        289      51
5.9   20.5   232   10     420.25   100      205
5.8   16     220   4.5    256      20.25    72
5.8   16     246   13     256      169      208
5.9   20.5   241   12     420.25   144      246

\[ \sum u = 300 \qquad \sum v = 300 \qquad \sum u^2 = 4878 \qquad \sum v^2 = 4898.5 \qquad \sum uv = 3197.5 \]

\[ SS_{uv} = \sum uv - \frac{\left(\sum u\right)\left(\sum v\right)}{n} = 3197.5 - \frac{300(300)}{24} = -552.5 \]

\[ SS_{uu} = \sum u^2 - \frac{\left(\sum u\right)^2}{n} = 4878 - \frac{(300)^2}{24} = 1128 \]

\[ SS_{vv} = \sum v^2 - \frac{\left(\sum v\right)^2}{n} = 4898.5 - \frac{(300)^2}{24} = 1148.5 \]

\[ r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}\,SS_{vv}}} = \frac{-552.5}{\sqrt{(1128)(1148.5)}} = -.4854 \]
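As a cross-check on the arithmetic above, the following is a minimal Python sketch that assigns midranks (tied values share the average of their ranks) and then applies the same sums-of-squares formula for r_s. The function names `midrank_map` and `spearman_rs` are illustrative choices, not part of the text or of any library.

```python
from math import sqrt

def midrank_map(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based position of the first copy
        count = ordered.count(value)          # number of tied copies
        ranks[value] = first + (count - 1) / 2
    return ranks

def spearman_rs(x, y):
    """Spearman's r_s computed as SS_uv / sqrt(SS_uu * SS_vv) on the midranks."""
    rank_x, rank_y = midrank_map(x), midrank_map(y)
    u = [rank_x[a] for a in x]
    v = [rank_y[b] for b in y]
    n = len(u)
    ss_uv = sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n
    ss_uu = sum(a * a for a in u) - sum(u) ** 2 / n
    ss_vv = sum(b * b for b in v) - sum(v) ** 2 / n
    return ss_uv / sqrt(ss_uu * ss_vv)

# Sweetness index (x) and pectin (y) values from the table above
x = [5.2, 5.5, 6.0, 5.9, 5.8, 6.0, 5.8, 5.6, 5.6, 5.9, 5.4, 5.6,
     5.8, 5.5, 5.3, 5.3, 5.7, 5.5, 5.7, 5.3, 5.9, 5.8, 5.8, 5.9]
y = [220, 227, 259, 210, 224, 215, 231, 268, 239, 212, 410, 256,
     306, 259, 284, 383, 271, 264, 227, 263, 232, 220, 246, 241]

rs = spearman_rs(x, y)
print(round(rs, 4))
```

Because the ranks are recomputed from the raw data, the sketch also checks the u and v columns of the table, not only the final division.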

Since the magnitude of the correlation coefficient is not particularly large, there is a fairly weak negative rank correlation between sweetness index and pectin.

b.

To determine if there is a negative association between the sweetness index and the amount of pectin, we test:

H0: ρs = 0
Ha: ρs < 0

The test statistic is \(r_s = -.4854\). Reject H0 if \(r_s < -r_{s,\alpha}\), where \(\alpha = .01\) and \(n = 24\). Reject H0 if \(r_s < -.485\) (from Table XIV, Appendix D). Since the observed value of the test statistic falls in the rejection region (\(r_s = -.4854 < -.485\)), H0 is rejected. There is sufficient evidence to indicate there is a negative association between the sweetness index and the amount of pectin at \(\alpha = .01\).

15.80

The appropriate test for this completely randomized design is the Kruskal-Wallis H-test. Some preliminary calculations are:

Sample 1   Rank     Sample 2   Rank     Sample 3   Rank
18         4.5      12         2        87         16
32         6        33         7        53         11
43         9        10         1        65         14
15         3        34         8        50         10
63         12       18         4.5      64         13
                                        77         15
R1 = 34.5           R2 = 22.5           R3 = 79

\[ \bar{R}_1 = \frac{R_1}{n_1} = \frac{34.5}{5} = 6.9 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{22.5}{5} = 4.5 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{79}{6} = 13.167 \]

\[ \bar{R} = \frac{n + 1}{2} = \frac{16 + 1}{2} = 8.5 \]

To determine whether at least two of the populations differ in location, we test:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is

\[ H = \frac{12}{n(n+1)} \sum n_j \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12}{16(16+1)} \left[ 5(6.9 - 8.5)^2 + 5(4.5 - 8.5)^2 + 6(13.167 - 8.5)^2 \right] = 9.86 \]

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 3 - 1 = 2\). From Table IV, Appendix D, \(\chi^2_{.05} = 5.99147\). The rejection region is \(H > 5.99147\). Since the observed value of the test statistic falls in the rejection region (\(H = 9.86 > 5.99147\)), reject H0. There is sufficient evidence to indicate a difference in location for at least two of the three probability distributions at \(\alpha = .05\).

15.81

a.

Some preliminary calculations are:

Pair   x    Rank, u   y    Rank, v   u²      v²      uv
1      19   5         12   5         25      25      25
2      27   7         19   8         49      64      56
3      15   2         7    1         4       1       2
4      35   9         25   9         81      81      81
5      13   1         11   4         1       16      4
6      29   8         10   2.5       64      6.25    20
7      16   3.5       16   6         12.25   36      21
8      22   6         10   2.5       36      6.25    15
9      16   3.5       18   7         12.25   49      24.5

\[ \sum u_i = 45 \qquad \sum v_i = 45 \qquad \sum u_i^2 = 284.5 \qquad \sum v_i^2 = 284.5 \qquad \sum u_i v_i = 248.5 \]

\[ SS_{uv} = \sum u_i v_i - \frac{\left(\sum u_i\right)\left(\sum v_i\right)}{n} = 248.5 - \frac{45(45)}{9} = 23.5 \]

\[ SS_{uu} = \sum u_i^2 - \frac{\left(\sum u_i\right)^2}{n} = 284.5 - \frac{(45)^2}{9} = 59.5 \]

\[ SS_{vv} = \sum v_i^2 - \frac{\left(\sum v_i\right)^2}{n} = 284.5 - \frac{(45)^2}{9} = 59.5 \]

To determine if the Spearman rank correlation differs from 0, we test:

H0: ρs = 0
Ha: ρs ≠ 0

The test statistic is

\[ r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}\,SS_{vv}}} = \frac{23.5}{\sqrt{(59.5)(59.5)}} = .395 \]

Reject H0 if \(r_s < -r_{s,\alpha/2}\) or \(r_s > r_{s,\alpha/2}\), where \(\alpha/2 = .025\) and \(n = 9\):

Reject H0 if \(r_s < -.683\) or \(r_s > .683\) (from Table XIV, Appendix D). Since the observed value of the test statistic does not fall in the rejection region (\(r_s = .395 \not> .683\)), H0 is not rejected. There is insufficient evidence to indicate that Spearman's rank correlation between x and y is significantly different from 0 at \(\alpha = .05\).

b.

Use the Wilcoxon signed rank test. Some preliminary calculations are:

Pair   x    y    Difference   Rank of Absolute Difference
1      19   12   7            3
2      27   19   8            4.5
3      15   7    8            4.5
4      35   25   10           6
5      13   11   2            1.5
6      29   10   19           8
7      16   16   0            (eliminated)
8      22   10   12           7
9      16   18   -2           1.5

T− = 1.5
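The ranking step in the table above can be sketched in Python as follows. This is a minimal illustration, assuming zero differences are dropped and tied absolute differences receive the average of their ranks, as in the table; the names `midrank_map` and `signed_rank_sums` are hypothetical helpers, not library functions.

```python
def midrank_map(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based position of the first copy
        count = ordered.count(value)          # number of tied copies
        ranks[value] = first + (count - 1) / 2
    return ranks

def signed_rank_sums(x, y):
    """Return (T+, T-, n) for paired data, where n counts the nonzero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # eliminate zero differences
    ranks = midrank_map([abs(d) for d in diffs])
    t_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    t_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return t_plus, t_minus, len(diffs)

x = [19, 27, 15, 35, 13, 29, 16, 22, 16]
y = [12, 19, 7, 25, 11, 10, 16, 10, 18]

t_plus, t_minus, n_nonzero = signed_rank_sums(x, y)
print(t_plus, t_minus, n_nonzero)
```

A quick sanity check on any such calculation is that T+ and T− together must equal n(n + 1)/2 for the n nonzero differences.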

To determine if the probability distribution of x is shifted to the right of that for y, we test:

H0: The probability distributions are identical for the two variables
Ha: The probability distribution of x is shifted to the right of the probability distribution of y

The test statistic is \(T = T_- = 1.5\). Reject H0 if \(T \le T_0\), where \(T_0\) is based on \(\alpha = .05\) and \(n = 8\) (one-tailed): Reject H0 if \(T \le 6\) (from Table XIII, Appendix D). Since the observed value of the test statistic falls in the rejection region (\(T = 1.5 \le 6\)), H0 is rejected. There is sufficient evidence to conclude that the probability distribution of x is shifted to the right of that for y at \(\alpha = .05\).

15.82

The appropriate test for two independent samples is the Wilcoxon rank sum test. Some preliminary calculations are:

Sample 1   Rank      Sample 2   Rank
1.2        4         1.5        6
1.9        8.5       1.3        5
.7         1         2.9        12
2.5        10        1.9        8.5
1.0        2         2.7        11
1.8        7         3.5        13
1.1        3
T1 = 35.5            T2 = 55.5

To determine if there is a difference between the locations of the probability distributions, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for population 1 is shifted to the left or right of that for population 2

The test statistic is \(T_2 = 55.5\), the rank sum of the sample with fewer observations. Reject H0 if \(T_2 \le T_L\) or \(T_2 \ge T_U\), where \(\alpha = .05\) (two-tailed), \(n_1 = 7\), and \(n_2 = 6\): Reject H0 if \(T_2 \le 28\) or \(T_2 \ge 56\) (from Table XII, Appendix D). Since \(T_2 = 55.5 \not\le 28\) and \(T_2 = 55.5 \not\ge 56\), do not reject H0. There is insufficient evidence to indicate a difference between the locations of the probability distributions for the sampled populations at \(\alpha = .05\).
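The rank sums above can be reproduced with a short Python sketch. It pools the two samples, assigns midranks to ties, and sums the ranks by sample; it also includes the large-sample normal approximation for the rank sum, evaluated here with the values T1 = 385.5 and n1 = n2 = 20 that appear in Exercise 15.85. The helper names (`midrank_map`, `rank_sums`, `rank_sum_z`) are illustrative assumptions.

```python
from math import sqrt

def midrank_map(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based position of the first copy
        count = ordered.count(value)          # number of tied copies
        ranks[value] = first + (count - 1) / 2
    return ranks

def rank_sums(sample1, sample2):
    """Return (T1, T2), the rank sums of each sample within the pooled ranking."""
    ranks = midrank_map(sample1 + sample2)
    return sum(ranks[v] for v in sample1), sum(ranks[v] for v in sample2)

def rank_sum_z(t1, n1, n2):
    """Large-sample z statistic for the rank sum T1 of sample 1."""
    mean = n1 * (n1 + n2 + 1) / 2
    std = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (t1 - mean) / std

sample1 = [1.2, 1.9, 0.7, 2.5, 1.0, 1.8, 1.1]
sample2 = [1.5, 1.3, 2.9, 1.9, 2.7, 3.5]

t1, t2 = rank_sums(sample1, sample2)
z_large = rank_sum_z(385.5, 20, 20)   # summary values from Exercise 15.85
print(t1, t2, round(z_large, 2))
```

With samples this small the table-based critical values are the appropriate reference; the z approximation is shown only because the same rank sums feed both procedures.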

15.83

Some preliminary calculations are:

          Treatment
Block     1    Rank    2    Rank    3    Rank    4    Rank    5    Rank
1         75   4       65   1       74   3       80   5       69   2
2         77   3       69   1       78   4       80   5       72   2
3         70   4       63   1.5     69   3       75   5       63   1.5
4         80   3.5     69   1       80   3.5     86   5       77   2
          R1 = 14.5    R2 = 4.5     R3 = 13.5    R4 = 20      R5 = 7.5

\[ \bar{R}_1 = \frac{R_1}{b} = \frac{14.5}{4} = 3.625 \qquad \bar{R}_2 = \frac{R_2}{b} = \frac{4.5}{4} = 1.125 \qquad \bar{R}_3 = \frac{R_3}{b} = \frac{13.5}{4} = 3.375 \]

\[ \bar{R}_4 = \frac{R_4}{b} = \frac{20}{4} = 5 \qquad \bar{R}_5 = \frac{R_5}{b} = \frac{7.5}{4} = 1.875 \qquad \bar{R} = \frac{k + 1}{2} = \frac{5 + 1}{2} = 3 \]

To determine whether at least two of the treatment probability distributions differ in location, we use the Friedman Fr-test:

H0: The five treatments have identical probability distributions
Ha: At least two of the treatment probability distributions differ in location

The test statistic is

\[ F_r = \frac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12(4)}{5(5+1)} \left[ (3.625 - 3)^2 + (1.125 - 3)^2 + (3.375 - 3)^2 + (5 - 3)^2 + (1.875 - 3)^2 \right] = 14.9 \]
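The within-block ranking and the Fr computation above can be sketched in Python as follows. This is a minimal illustration under the same conventions as the table (ranks assigned within each block, ties given midranks); `midrank_map` and `friedman_fr` are hypothetical helper names.

```python
def midrank_map(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based position of the first copy
        count = ordered.count(value)          # number of tied copies
        ranks[value] = first + (count - 1) / 2
    return ranks

def friedman_fr(blocks):
    """blocks: one row per block, each row holding the k treatment responses.

    Returns (treatment rank sums, Fr).
    """
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        ranks = midrank_map(row)              # rank the treatments within this block
        for j, value in enumerate(row):
            rank_sums[j] += ranks[value]
    grand_mean = (k + 1) / 2
    fr = 12 * b / (k * (k + 1)) * sum((r / b - grand_mean) ** 2 for r in rank_sums)
    return rank_sums, fr

# Rows are blocks 1-4; columns are treatments 1-5 (from the table above)
blocks = [
    [75, 65, 74, 80, 69],
    [77, 69, 78, 80, 72],
    [70, 63, 69, 75, 63],
    [80, 69, 80, 86, 77],
]

treatment_rank_sums, fr = friedman_fr(blocks)
print(treatment_rank_sums, round(fr, 2))
```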

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 5 - 1 = 4\). From Table IV, Appendix D, \(\chi^2_{.05} = 9.48773\). The rejection region is \(F_r > 9.48773\).

Since the observed value of the test statistic falls in the rejection region (\(F_r = 14.9 > 9.48773\)), H0 is rejected. There is sufficient evidence to indicate that at least two of the treatment probability distributions differ in location at \(\alpha = .05\).

15.84

a.

To determine if cohesiveness will deteriorate after storage, we test: H0 :η = 0 Ha :η > 0

b.

The test statistic is S = {number of measurements greater than 0} = 13. The p-value is \(P(x \ge 13)\), where x is a binomial random variable with \(n = 20\) and \(p = .5\). From Table I, p-value \(= P(x \ge 13) = 1 - P(x \le 12) = 1 - .868 = .132\).

c.

Since the p-value is greater than \(\alpha\) (\(p = .132 > .05\)), H0 is not rejected. There is insufficient evidence to indicate cohesiveness will deteriorate after storage at \(\alpha = .05\).

15.85

a.

To determine if the distribution of the recalls for those receiving audiovisual presentation differs from that of the recalls of those receiving only the visual presentation, we test: H0: The two sampled distributions are identical Ha: The distribution of recalls for those receiving audiovisual presentation is shifted to the right or left of that for those receiving only visual presentation


b.

First, we rank all of the data:

Audiovisual Group
Recall   Rank     Recall   Rank
0        1.5      1        5
4        24       2        12
6        34.5     6        34.5
6        34.5     1        5
1        5        3        19
2        12       0        1.5
2        12       2        12
6        34.5     5        28
6        34.5     4        24
4        24       5        28
                  T1 = 385.5

Video Only Group
Recall   Rank     Recall   Rank
6        34.5     6        34.5
3        19       2        12
6        34.5     3        19
2        12       1        5
2        12       3        19
4        24       2        12
7        40       5        28
6        34.5     2        12
1        5        4        24
3        19       6        34.5
                  T2 = 434.5

The test statistic is

\[ z = \frac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{385.5 - \dfrac{20(20 + 20 + 1)}{2}}{\sqrt{\dfrac{20(20)(20 + 20 + 1)}{12}}} = -.66 \]

c.

The rejection region requires \(\alpha/2 = .10/2 = .05\) in each tail of the z-distribution. From Table II, Appendix D, \(z_{.05} = 1.645\). The rejection region is \(z < -1.645\) or \(z > 1.645\).

d.

Since the observed value of the test statistic does not fall in the rejection region (\(z = -.66 \not< -1.645\)), H0 is not rejected. There is insufficient evidence to indicate the distribution of the recalls for those receiving audiovisual presentation differs from that of the recalls of those receiving only the visual presentation at \(\alpha = .10\). This supports the researchers’ theory.

15.86

a.

The F-test would be appropriate if:

1. All k populations sampled from are normal.
2. The variances of the k populations are equal.
3. The k samples are independent.

b.

The variances for the three populations are probably not the same and the populations are probably not normal.

c.

To determine whether the salary distributions differ among the three cities, we test: H0: The three probability distributions are identical Ha: At least two of the three probability distributions differ in location

d and e.

Some preliminary calculations are:

Atlanta    Rank     Los Angeles   Rank     Washington, D.C.   Rank
39,600     1        47,400        3        48,000             4
89,900     19       140,000       21       86,900             17
66,700     11       68,000        12       58,000             7
43,900     2        48,700        5        82,600             15
82,200     14       74,400        13       83,200             16
88,600     18       102,000       20       61,800             8
64,800     9        54,500        6        65,000             10
           R1 = 74                R2 = 80                     R3 = 77

\[ \bar{R}_1 = \frac{R_1}{n_1} = \frac{74}{7} = 10.571 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{80}{7} = 11.429 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{77}{7} = 11 \]

\[ \bar{R} = \frac{n + 1}{2} = \frac{21 + 1}{2} = 11 \]

The test statistic is

\[ H = \frac{12}{n(n+1)} \sum n_j \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12}{21(21+1)} \left[ 7(10.571 - 11)^2 + 7(11.429 - 11)^2 + 7(11 - 11)^2 \right] = 0.07 \]

f.

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 3 - 1 = 2\). From Table IV, Appendix D, \(\chi^2_{.05} = 5.99147\). The rejection region is \(H > 5.99147\).

g.

Since the observed value of the test statistic does not fall in the rejection region (\(H = .07 \not> 5.99147\)), H0 is not rejected. There is insufficient evidence to indicate the salary distributions differ among the three cities at \(\alpha = .05\).
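The rank sums and the H statistic for this exercise can be checked with a minimal Python sketch. It pools the three samples, assigns midranks, and applies the same formula for H used above; the helper names `midrank_map` and `kruskal_wallis_h` are illustrative, and the critical value 5.99147 is simply the tabled value quoted in part f.

```python
def midrank_map(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based position of the first copy
        count = ordered.count(value)          # number of tied copies
        ranks[value] = first + (count - 1) / 2
    return ranks

def kruskal_wallis_h(samples):
    """Return (rank sums, H) for a list of independent samples."""
    pooled = [value for sample in samples for value in sample]
    ranks = midrank_map(pooled)
    n = len(pooled)
    grand_mean = (n + 1) / 2
    rank_sums = [sum(ranks[v] for v in sample) for sample in samples]
    h = 12 / (n * (n + 1)) * sum(
        len(sample) * (r / len(sample) - grand_mean) ** 2
        for sample, r in zip(samples, rank_sums)
    )
    return rank_sums, h

atlanta = [39600, 89900, 66700, 43900, 82200, 88600, 64800]
los_angeles = [47400, 140000, 68000, 48700, 74400, 102000, 54500]
washington = [48000, 86900, 58000, 82600, 83200, 61800, 65000]

city_rank_sums, h = kruskal_wallis_h([atlanta, los_angeles, washington])
reject_h0 = h > 5.99147   # tabled chi-square value for alpha = .05, df = 2
print(city_rank_sums, round(h, 2), reject_h0)
```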

15.87

a.

To calculate the median, we first arrange the data in order from the smallest to the largest: 22, 28, 32, 33, 39, 41, 43, 43, 45, 47, 50, 54, 54, 59, 62 Since n is odd, the median is the middle number, which is 43.

b.

To determine if the median age of the terminated workers exceeds the entire company's median age, we test: H 0 : η = 37 H a : η > 37

c.

The test statistic is S = number of measurements greater than 37 = 11. The p-value is \(P(x \ge 11)\), where x is a binomial random variable with \(n = 15\) and \(p = .5\). From Table I, Appendix D, p-value \(= P(x \ge 11) = 1 - P(x \le 10) = 1 - .941 = .059\). Since no \(\alpha\) value was given, we will use \(\alpha = .05\). Since the p-value is greater than \(\alpha\) (\(p = .059 > .05\)), H0 is not rejected. There is insufficient evidence to indicate that the median age of the terminated workers exceeds the entire company's median age at \(\alpha = .05\).

(Note: If \(\alpha = .10\) was used, the conclusion would be to reject H0.)
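The binomial tail probability used for the sign test above can be computed directly rather than read from a table. The following minimal Python sketch counts the observations above the hypothesized median and sums the Binomial(n, .5) probabilities; `sign_test_upper_p` is an illustrative name, and the sketch assumes no observation equals the hypothesized median (true for these data).

```python
from math import comb

def sign_test_upper_p(s, n):
    """P(x >= s) for x ~ Binomial(n, 0.5): the one-sided sign test p-value."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n

ages = [22, 28, 32, 33, 39, 41, 43, 43, 45, 47, 50, 54, 54, 59, 62]
hypothesized_median = 37

s = sum(1 for age in ages if age > hypothesized_median)   # observations above 37
p_value = sign_test_upper_p(s, len(ages))
print(s, round(p_value, 3))
```

The exact value makes the borderline nature of the result visible: the p-value falls between .05 and .10, which is why the conclusion changes with the choice of α.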

d.

Since the conclusion using \(\alpha = .10\) is to reject H0 and conclude that there is sufficient evidence to indicate that the median age of the terminated workers exceeds the entire company's median age, we would advise the company to reevaluate its planned RIF. With the proposed sample, there is evidence that the company is discriminating with respect to age.

15.88

a.

We can use the Kruskal-Wallis H-test to compare the distributions of three or more groups.

b.

To determine if the distributions of improvement scores for the three groups differ in location, we test:

H0: The three probability distributions are identical
Ha: At least two of the three improvement distributions differ in location

MINITAB was used to create the following output:

Descriptive Statistics

Treatment    N     Median   Mean Rank   Z-Value
C            37    7        35.7        -4.28
DM           33    9        51.3        -0.38
H            35    11       72.8        4.72
Overall      105            53.0

Test

Null hypothesis          H₀: All medians are equal
Alternative hypothesis   H₁: At least one median is different

Method                   DF   H-Value   P-Value
Not adjusted for ties    2    26.82     0.000
Adjusted for ties        2    27.06     0.000

The test statistic is H = 26.82 and the p-value is p = 0.000. Since the p-value is less than \(\alpha\) (\(p = 0.000 < .01\)), H0 is rejected. There is sufficient evidence to indicate the distributions of improvement scores for the three groups differ in location at \(\alpha = .01\).

15.89

a.

Some preliminary calculations are:

Brand   Expert 1   Expert 2   Difference di   di²
A       6          5          1               1
B       5          6          −1              1
C       1          2          −1              1
D       3          1          2               4
E       2          4          −2              4
F       4          3          1               1

\[ \sum d_i^2 = 12 \]

\[ r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(12)}{6\left(6^2 - 1\right)} = 1 - .343 = .657 \]

b.

To determine if there is a positive correlation in the rankings of the two experts, we test:

H0: ρs = 0
Ha: ρs > 0

The test statistic is \(r_s = .657\). Reject H0 if \(r_s > r_{s,\alpha}\), where \(\alpha = .05\) and \(n = 6\). From Table XIV, Appendix D, \(r_{s,.05} = .829\). Reject H0 if \(r_s > .829\).

Since the observed value of the test statistic does not fall in the rejection region (\(r_s = .657 \not> .829\)), H0 is not rejected. There is insufficient evidence to indicate a positive correlation in the rankings of the two experts at \(\alpha = .05\).

15.90

a.

\[ \bar{R}_1 = \frac{R_1}{b} = \frac{27}{6} = 4.5 \qquad \bar{R}_2 = \frac{R_2}{b} = \frac{25}{6} = 4.167 \qquad \bar{R}_3 = \frac{R_3}{b} = \frac{18}{6} = 3 \]

\[ \bar{R}_4 = \frac{R_4}{b} = \frac{11}{6} = 1.833 \qquad \bar{R}_5 = \frac{R_5}{b} = \frac{9}{6} = 1.5 \qquad \bar{R} = \frac{1}{2}(k + 1) = \frac{1}{2}(5 + 1) = 3 \]

The Friedman test statistic is

\[ F_r = \frac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12(6)}{5(5+1)} \left[ (4.5 - 3)^2 + (4.167 - 3)^2 + (3 - 3)^2 + (1.833 - 3)^2 + (1.5 - 3)^2 \right] = 17.337 \]

b.

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 5 - 1 = 4\). From Table IV, Appendix D, \(\chi^2_{.05} = 9.48773\). The rejection region is \(F_r > 9.48773\).

c.

Since the observed value of the test statistic falls in the rejection region (\(F_r = 17.337 > 9.48773\)), H0 is rejected. There is sufficient evidence to indicate there is a difference in the levels of farm production among the five conditions at \(\alpha = .05\).

15.91

a.

Some preliminary calculations:

x        Rank, u   y    Rank, v   u²   v²   uv
28.582   2         3    2         4    4    4
24.374   1         1    1         1    1    1
31.666   3         10   5         9    25   15
40.530   6         14   6         36   36   36
38.808   5         7    4         25   16   20
33.309   4         4    3         16   9    12

\[ \sum u = 21 \qquad \sum v = 21 \qquad \sum u^2 = 91 \qquad \sum v^2 = 91 \qquad \sum uv = 88 \]

\[ SS_{uv} = \sum uv - \frac{\left(\sum u\right)\left(\sum v\right)}{n} = 88 - \frac{21(21)}{6} = 14.5 \]

\[ SS_{uu} = \sum u^2 - \frac{\left(\sum u\right)^2}{n} = 91 - \frac{(21)^2}{6} = 17.5 \]

\[ SS_{vv} = \sum v^2 - \frac{\left(\sum v\right)^2}{n} = 91 - \frac{(21)^2}{6} = 17.5 \]

\[ r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}\,SS_{vv}}} = \frac{14.5}{\sqrt{(17.5)(17.5)}} = .8286 \]



b.

To determine if there is positive rank correlation between total US births and the number of software millionaire birthdays, we test:

H0: ρs = 0
Ha: ρs > 0

The test statistic is \(r_s = .8286\). Reject H0 if \(r_s > r_{s,\alpha}\), where \(\alpha = .05\) and \(n = 6\): Reject H0 if \(r_s > .829\) (from Table XIV, Appendix D). Since the observed value of the test statistic does not fall in the rejection region (\(r_s = .8286 \not> .829\)), H0 is not rejected. There is insufficient evidence to indicate total US births and the number of software millionaire birthdays are positively rank correlated at \(\alpha = .05\).

c.

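Some preliminary calculations:

x    Rank, u   y    Rank, v   u²     v²   uv
2    2.5       3    2         6.25   4    5
2    2.5       1    1         6.25   1    2.5
23   5         10   5         25     25   25
38   6         14   6         36     36   36
9    4         7    4         16     16   16
0    1         4    3         1      9    3

\[ \sum u = 21 \qquad \sum v = 21 \qquad \sum u^2 = 90.5 \qquad \sum v^2 = 91 \qquad \sum uv = 87.5 \]

\[ SS_{uv} = \sum uv - \frac{\left(\sum u\right)\left(\sum v\right)}{n} = 87.5 - \frac{21(21)}{6} = 14 \]

\[ SS_{uu} = \sum u^2 - \frac{\left(\sum u\right)^2}{n} = 90.5 - \frac{(21)^2}{6} = 17 \]

\[ SS_{vv} = \sum v^2 - \frac{\left(\sum v\right)^2}{n} = 91 - \frac{(21)^2}{6} = 17.5 \]

\[ r_s = \frac{SS_{uv}}{\sqrt{SS_{uu}\,SS_{vv}}} = \frac{14}{\sqrt{(17)(17.5)}} = .8117 \]

d.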

To determine if there is positive rank correlation between the number of software millionaire birthdays and the number of CEO birthdays, we test:

H0: ρs = 0
Ha: ρs > 0

The test statistic is \(r_s = .8117\). Reject H0 if \(r_s > r_{s,\alpha}\), where \(\alpha = .05\) and \(n = 6\). Reject H0 if \(r_s > .829\) (from Table XIV, Appendix D). Since the observed value of the test statistic does not fall in the rejection region (\(r_s = .8117 \not> .829\)), H0 is not rejected. There is insufficient evidence to indicate the number of software millionaire birthdays and the number of CEO birthdays are positively rank correlated at \(\alpha = .05\).

15.92

a.


To determine if the mean scores of the three dimensions differ, we test: H0: The distributions of the 3 dimensions are the same Ha: At least two of the distributions differ in location among the 3 dimensions

b.

One of the assumptions for the randomized block design using parametric statistics is that the data come from normal distributions. Since the values of the scores can only be 1, 2 or 3, it is very unlikely that the data are normally distributed.

c.

The ranked data are:

Paper   What   Who    How
1       3      1.5    1.5
2       2.5    1      2.5
3       3      1.5    1.5
4       2      2      2
5       2.5    2.5    1
6       2.5    1      2.5
7       2      2      2
8       2      1      3
9       1.5    3      1.5
10      2.5    1      2.5
11      2      1      3
12      2.5    2.5    1
13      3      1.5    1.5
        R1 = 31   R2 = 21.5   R3 = 25.5

\[ \bar{R}_1 = \frac{R_1}{b} = \frac{31}{13} = 2.385 \qquad \bar{R}_2 = \frac{R_2}{b} = \frac{21.5}{13} = 1.654 \qquad \bar{R}_3 = \frac{R_3}{b} = \frac{25.5}{13} = 1.962 \]

\[ \bar{R} = \frac{k + 1}{2} = \frac{3 + 1}{2} = 2 \]

The test statistic is

\[ F_r = \frac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12(13)}{3(3+1)} \left[ (2.385 - 2)^2 + (1.654 - 2)^2 + (1.962 - 2)^2 \right] = 3.50 \]

The rejection region requires \(\alpha = .10\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 3 - 1 = 2\). From Table IV, Appendix D, \(\chi^2_{.10} = 4.60517\). The rejection region is \(F_r > 4.60517\). Since the observed value of the test statistic does not fall in the rejection region (\(F_r = 3.50 \not> 4.60517\)), H0 is not rejected. There is insufficient evidence to indicate the mean scores of the three dimensions differ at \(\alpha = .10\).

15.93

a.

Since only 70 of the 80 customers responded to the question, only the 70 will be included. To determine if the median amount spent on hamburgers at lunch at McDonald's is less than $2.25, we test: H 0 : η = 2.25 H a : η < 2.25

S = number of measurements less than 2.25 = 20.

The test statistic is

\[ z = \frac{(S - .5) - .5n}{.5\sqrt{n}} = \frac{(20 - .5) - .5(70)}{.5\sqrt{70}} = -3.71 \]

Since no \(\alpha\) was given in the exercise, we will use \(\alpha = .05\). The rejection region requires \(\alpha = .05\) in the upper tail of the z-distribution. From Table II, Appendix D, \(z_{.05} = 1.645\). The rejection region is \(z > 1.645\). Since the observed value of the test statistic does not fall in the rejection region (\(z = -3.71 \not> 1.645\)), H0 is not rejected. There is insufficient evidence to indicate that the median amount spent on hamburgers at lunch at McDonald's is less than 2.25 dollars at \(\alpha = .05\).
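The large-sample sign test statistic above is a one-line calculation. The following minimal Python sketch evaluates it for S = 20 and n = 70 and compares it with the upper-tail critical value 1.645 quoted from the table; `sign_test_z` is an illustrative function name.

```python
from math import sqrt

def sign_test_z(s, n):
    """Large-sample sign test statistic with the continuity correction (S - .5)."""
    return ((s - 0.5) - 0.5 * n) / (0.5 * sqrt(n))

z = sign_test_z(20, 70)
reject_h0 = z > 1.645      # upper-tail rejection region at alpha = .05
print(round(z, 2), reject_h0)
```

The strongly negative z reflects that only 20 of the 70 responses fall below the hypothesized median, far fewer than the 35 expected under H0, so the data give no support to the alternative.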

b.

No. The survey was done in Boston only. The eating habits of those living in Boston are probably not representative of all Americans.

c.

We must assume that the sample is randomly selected from a continuous probability distribution.

15.94

a.

To determine if the median years of experience for commercial suppliers of the DoD exceeds 5 years, we test: H0 : η = 5 Ha : η > 5

The test statistic is S = {Number of observations greater than 5} = 5. The p-value is \(P(x \ge 5)\), where x is a binomial random variable with \(n = 5\) and \(p = .5\). Using Table I, Appendix D, p-value \(= P(x \ge 5) = 1 - P(x \le 4) = 1 - .969 = .031\).

Since the p-value is less than \(\alpha\) (\(p = .031 < .05\)), H0 is rejected. There is sufficient evidence to indicate the median years of experience for commercial suppliers of the DoD exceeds 5 years at \(\alpha = .05\).

b.

Some preliminary calculations:

Commercial Suppliers   Rank      Government Employees   Rank
30                     15.5      15                     8
10                     6         30                     15.5
9                      4         30                     15.5
10                     6         25                     12
5                      2         6                      3
10                     6         3                      1
                                 20                     9.5
                                 25                     12
                                 30                     15.5
                                 20                     9.5
                                 25                     12
T1 = 39.5                        T2 = 113.5

To determine if commercial suppliers of the DoD have less experience than government employees, we test:

H0: The two sampled populations have identical probability distributions
Ha: The probability distribution for commercial suppliers of the DoD is shifted to the left of that for government employees

The test statistic is \(T_1 = 39.5\), since the sample from the commercial suppliers has the smaller number of observations. Using MINITAB, the results are:

Mann-Whitney Test and CI: Com, Gov

       N    Median
Com    6    10.00
Gov    11   25.00

Point estimate for η1 - η2 is -11.00
96.1 Percent CI for η1 - η2 is (-20.00,4.00)
W = 39.5
Test of η1 = η2 vs η1 < η2 is significant at 0.0797
The test is significant at 0.0773 (adjusted for ties)

The p-value is p = .0797. Since the p-value is not less than \(\alpha\) (\(p = .0797 \not< .05\)), H0 is not rejected. There is insufficient evidence to indicate commercial suppliers of the DoD have less experience than government employees at \(\alpha = .05\). The nonparametric test may be more appropriate because the distributions of the years of experience for the two groups are probably not normal.

15.95

Some preliminary calculations are:

Circuit   Standard Method   Huffman-coding Method   Difference S-H   Rank of Absolute Differences
1         0.80              0.78                    0.02             2
2         0.80              0.80                    0.00             (eliminated)
3         0.83              0.86                    -0.03            3
4         0.53              0.53                    0.00             (eliminated)
5         0.50              0.51                    -0.01            1
6         0.96              0.68                    0.28             8
7         0.99              0.82                    0.17             5
8         0.98              0.72                    0.26             7
9         0.81              0.45                    0.36             9
10        0.95              0.79                    0.16             4
11        0.99              0.77                    0.22             6

T− = 4

To determine if the Huffman-coding method yields a smaller mean compression ratio, we test:

H0: The two sampled populations have identical probability distributions.
Ha: The probability distribution of the Standard Method is shifted to the right of that for the Huffman-coding Method.

The test statistic is \(T = T_- = 4\).

The rejection region is \(T \le 8\), from Table XIII, Appendix D, with \(n = 9\) and \(\alpha = .05\) (one-tailed). Since the observed value of the test statistic falls in the rejection region (\(T = 4 \le 8\)), H0 is rejected. There is sufficient evidence to indicate the Huffman-coding method yields a smaller mean compression ratio at \(\alpha = .05\).

15.96

Some preliminary calculations:

1 Urban   Rank      2 Suburban   Rank      3 Rural   Rank
4.3       4.5       5.9          14        5.1       9
5.2       10.5      6.7          17        4.8       7
6.2       15.5      7.6          19        3.9       2
5.6       12        4.9          8         6.2       15.5
3.8       1         5.2          10.5      4.2       3
5.8       13        6.8          18        4.3       4.5
4.7       6
R1 = 62.5           R2 = 86.5              R3 = 41

\[ \bar{R}_1 = \frac{R_1}{n_1} = \frac{62.5}{7} = 8.9286 \qquad \bar{R}_2 = \frac{R_2}{n_2} = \frac{86.5}{6} = 14.4167 \qquad \bar{R}_3 = \frac{R_3}{n_3} = \frac{41}{6} = 6.8333 \]

\[ \bar{R} = \frac{n + 1}{2} = \frac{19 + 1}{2} = 10 \]

To determine if there is a difference in the level of property taxes among the three types of school districts, we test:

H0: The three probability distributions are identical
Ha: At least two of the three probability distributions differ in location

The test statistic is

\[ H = \frac{12}{n(n+1)} \sum n_j \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12}{19(19+1)} \left[ 7(8.9286 - 10)^2 + 6(14.4167 - 10)^2 + 6(6.8333 - 10)^2 \right] = 5.8499 \]

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 3 - 1 = 2\). From Table IV, Appendix D, \(\chi^2_{.05} = 5.99147\). The rejection region is \(H > 5.99147\). Since the observed value of the test statistic does not fall in the rejection region (\(H = 5.8499 \not> 5.99147\)), H0 is not rejected. There is insufficient evidence to indicate that there is a difference in the level of property taxes among the three types of school districts at \(\alpha = .05\).

15.97

a.

Using MINITAB, histograms of the two data sets are:

[Figure: Histogram of HEATRATE, panel variable ENGINE (panels: Aeroderiv, Traditional); vertical axis: Percent; horizontal axis: HEATRATE, 9000 to 16000]

From the histograms, the data for each group do not look like they are mound-shaped. The variance of the aeroderivative engines is greater than that of the traditional engines. Thus, the assumptions of normal distributions and equal variances necessary for the t-test are probably not met.

b.

Using MINITAB, the results are:

Mann-Whitney Test and CI: Trad, Aero

        N    Median
Trad    39   11183
Aero    7    12414

Point estimate for η1 - η2 is -1125
95.3 Percent CI for η1 - η2 is (-2358,1448)
W = 885.0
Test of η1 = η2 vs η1 ≠ η2 is significant at 0.3431
The test is significant at 0.3431 (adjusted for ties)

To determine if the distributions of the heat rates for traditional and aeroderivative engines differ, we test:

H0: The distributions of the heat rates for the two types of engines are identical
Ha: The distribution of the heat rates for traditional engines is shifted to the right or left of that for aeroderivative engines

The test statistic is T = 885 and the p-value is p = .3431. Since this p-value is not small, H0 is not rejected. There is no evidence to indicate that the heat rate distribution of the traditional turbine engines is shifted to the right or left of that for the aeroderivative turbine engines for any reasonable value of \(\alpha\).

15.98

Some preliminary calculations are:

Employee   Before Flextime   After Flextime   Difference (B − A)   Rank of Absolute Difference
1          54                68               −14                  7
2          25                42               −17                  9
3          80                80               0                    (Eliminated)
4          76                91               −15                  8
5          63                70               −7                   5
6          82                88               −6                   3.5
7          94                90               4                    2
8          72                81               −9                   6
9          33                39               −6                   3.5
10         90                93               −3                   1

T+ = 2

To determine if the pilot flextime program is a success, we test:

H0: The two probability distributions are identical
Ha: The probability distribution before is shifted to the left of that after

The test statistic is \(T = T_+ = 2\). The rejection region is \(T \le 8\), from Table XIII, Appendix D, with \(n = 9\) and \(\alpha = .05\) (one-tailed). Since the observed value of the test statistic falls in the rejection region (\(T = 2 \le 8\)), H0 is rejected. There is sufficient evidence to indicate the pilot flextime program has been a success at \(\alpha = .05\).

15.99

a.

To determine if the median level differs from the target, we test: H 0 : η = .75 H a : η ≠ .75

b.

S1 = number of observations less than .75 and S2 = number of observations greater than .75. The test statistic is S = larger of S1 and S2. The p-value is \(2P(x \ge S)\), where x is a binomial random variable with \(n = 25\) and \(p = .5\). If the p-value is less than \(\alpha = .10\), reject H0.

c.

A Type I error would be concluding the median level is not .75 when it is. If a Type I error were committed, the supervisor would correct the fluoridation process when it was not necessary. A Type II error would be concluding the median level is .75 when it is not. If a Type II error were committed, the supervisor would not correct the fluoridation process when it was necessary.

d.

S1 = number of observations less than .75 = 7 and S2 = number of observations greater than .75 = 18. The test statistic S is the larger of S1 and S2. Thus, S = 18. The p-value is \(2P(x \ge 18)\), where x is a binomial random variable with \(n = 25\) and \(p = .5\). From Table I, p-value \(= 2P(x \ge 18) = 2\left[1 - P(x \le 17)\right] = 2(1 - .978) = 2(.022) = .044\). Since the p-value \(= .044 < \alpha = .10\), H0 is rejected. There is sufficient evidence to indicate the median level of fluoridation differs from the target of .75 at \(\alpha = .10\).

e.

A distribution heavily skewed to the right might look something like the following:

[Figure: sketch of a distribution heavily skewed to the right; axes labeled X and Y]

One assumption necessary for the t-test is that the distribution from which the sample is drawn is normal. A distribution which is heavily skewed in one direction is not normal. Thus, the sign test would be preferred.

15.100

Some preliminary calculations are:

Hours   Rank   Fraction Defective   Rank   di    di²
1       1      .02                  1      0     0
2       2      .05                  3      −1    1
3       3      .03                  2      1     1
4       4      .08                  5      −1    1
5       5      .06                  4      1     1
6       6      .09                  6      0     0
7       7      .11                  8      −1    1
8       8      .10                  7      1     1

\[ \sum d_i^2 = 6 \]

To determine if the fraction defective increases as the day progresses, we test: H 0 : ρs = 0 H a : ρs > 0

The test statistic is

\[ r_s = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(6)}{8\left(8^2 - 1\right)} = 1 - .071 = .929 \]

Reject H0 if \(r_s > r_{s,\alpha}\), where \(\alpha = .05\) and \(n = 8\): Reject H0 if \(r_s > .643\) (from Table XIV, Appendix D). Since the observed value of the test statistic falls in the rejection region (\(r_s = .929 > .643\)), reject H0. There is sufficient evidence to indicate that the fraction defective increases as the day progresses at \(\alpha = .05\).
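When there are no ties, the shortcut formula used above depends only on the rank differences. The following minimal Python sketch applies it to the two rank columns of this exercise and compares the result with the tabled critical value .643; `spearman_shortcut` is an illustrative name, and the sketch assumes the inputs are already ranks.

```python
def spearman_shortcut(rank_u, rank_v):
    """Shortcut r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only when no ranks are tied."""
    n = len(rank_u)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_u, rank_v))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

hour_ranks = [1, 2, 3, 4, 5, 6, 7, 8]
defective_ranks = [1, 3, 2, 5, 4, 6, 8, 7]

rs_shortcut = spearman_shortcut(hour_ranks, defective_ranks)
reject_h0 = rs_shortcut > 0.643   # tabled critical value for alpha = .05, n = 8
print(round(rs_shortcut, 3), reject_h0)
```

When ties are present, the sums-of-squares form of r_s applied to the midranks is the safer choice, since the shortcut is then only an approximation.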


15.101

Since the data are already ranked, it is clear that:

R1 = 19     R2 = 21.5     R3 = 27.5     R4 = 32

\[ \bar{R}_1 = \frac{R_1}{b} = \frac{19}{10} = 1.9 \qquad \bar{R}_2 = \frac{R_2}{b} = \frac{21.5}{10} = 2.15 \qquad \bar{R}_3 = \frac{R_3}{b} = \frac{27.5}{10} = 2.75 \qquad \bar{R}_4 = \frac{R_4}{b} = \frac{32}{10} = 3.2 \]

\[ \bar{R} = \frac{k + 1}{2} = \frac{4 + 1}{2} = 2.5 \]

To determine if the probability distributions of ratings differ for at least two of the items, we test:

H0: The probability distributions of responses are identical for the four aspects
Ha: At least two of the probability distributions differ in location

The test statistic is

\[ F_r = \frac{12b}{k(k+1)} \sum \left(\bar{R}_j - \bar{R}\right)^2 = \frac{12(10)}{4(4+1)} \left[ (1.9 - 2.5)^2 + (2.15 - 2.5)^2 + (2.75 - 2.5)^2 + (3.2 - 2.5)^2 \right] = 6.21 \]

The rejection region requires \(\alpha = .05\) in the upper tail of the \(\chi^2\) distribution with \(df = k - 1 = 4 - 1 = 3\). From Table IV, Appendix D, \(\chi^2_{.05} = 7.81473\). The rejection region is \(F_r > 7.81473\). Since the observed value of the test statistic does not fall in the rejection region (\(F_r = 6.21 \not> 7.81473\)), H0 is not rejected. There is insufficient evidence to conclude that at least two of the probability distributions of ratings differ at \(\alpha = .05\).

15.102

To determine if the median productivity z-score of all such Ph.D. programs differs from 0, we test:

H0: η = 0
Ha: η ≠ 0

S1 = {Number of observations < 0} = 8
S2 = {Number of observations > 0} = 2

The test statistic S is the larger of S1 and S2. Thus, S = 8. Because the test is two-tailed, the p-value is $2P(x \ge 8)$, where x is a binomial random variable with $n = 10$ and $p = .5$. Using Table I, Appendix D,

\[ p\text{-value} = 2P(x \ge 8) = 2\left[ 1 - P(x \le 7) \right] = 2(1 - .945) = .110 \]
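The table lookup can be cross-checked by summing the binomial probabilities directly. This is a sketch using only the standard library; `binom_upper_tail` is a hypothetical helper name. The exact value is slightly smaller than .110 because Table I rounds the cumulative probability to three decimals.

```python
from math import comb


def binom_upper_tail(s: int, n: int, p: float = 0.5) -> float:
    """Return P(x >= s) for a binomial random variable x with parameters n and p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(s, n + 1))


# Sign test from the solution: n = 10 observations, S = 8
n, s = 10, 8
p_value = 2 * binom_upper_tail(s, n)  # two-tailed p-value

print(round(p_value, 4))
```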

Since the p-value is not less than $\alpha$ ($p = .110 \not< .05$), H0 is not rejected. There is insufficient evidence to indicate the median productivity z-score of all such Ph.D. programs differs from 0 at $\alpha = .05$.

15.103 a.

To determine if the distributions of the number of text messages sent and received during peak time differ for those on annual contracts and those with a pay-as-you-go option, we test:

H0: The two sampled populations have identical distributions
Ha: The probability distribution of those on annual contracts is shifted to the right or left of the distribution of those with the pay-as-you-go option

b.

Let T1 = sum of the ranks of the observations for those on annual contracts and T2 = sum of the ranks of the observations for those on the pay-as-you-go option. The test statistic is

\[ z = \frac{T_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{T_1 - \dfrac{25(25 + 40 + 1)}{2}}{\sqrt{\dfrac{25(40)(25 + 40 + 1)}{12}}} = \frac{T_1 - 825}{74.162} \]

c.

A possible graph of the two distributions is:

[Figure: Histogram of Pay-as-go, Contract. Vertical axis: Frequency; horizontal axis: Data.]

15.104 a.

Using MINITAB, the results are:

Sign Test for Median: MTBE

Sign test of median = 0.5000 versus < 0.5000

          N   Below   Equal   Above        P   Median
MTBE    223     180       0      43   0.0000   0.2000

To determine if the median level of MTBE in New Hampshire groundwater wells is less than .5 micrograms per liter, we test:

H0: $\eta = .5$
Ha: $\eta < .5$

The test statistic is $S = 180$ and the p-value is $p = .0000$. Since the p-value is so small, H0 is rejected. There is sufficient evidence to indicate the median level of MTBE in New Hampshire groundwater wells is less than .5 micrograms per liter for any reasonable value of $\alpha$.

b.

Using MINITAB, the results are:

Mann-Whitney Test and CI: Private, Public

           N   Median
Private   22    0.520
Public    48    1.035

Point estimate for η1 - η2 is -0.390
95.1 Percent CI for η1 - η2 is (-1.279,0.041)
W = 654.5
Test of η1 = η2 vs η1 ≠ η2 is significant at 0.1109
The test is significant at 0.1108 (adjusted for ties)

To determine if the distribution of MTBE levels in public wells is shifted above or below the distribution of MTBE levels in private wells, we test:

H0: The distributions of MTBE levels in public wells and private wells are identical
Ha: The distribution of MTBE levels in public wells is shifted above or below the distribution of MTBE levels in private wells

The test statistic is $T = 654.5$ and the p-value is $p = .1109$. Since the p-value is not small, H0 is not rejected. There is insufficient evidence to indicate the distribution of MTBE levels in public wells is shifted above or below the distribution of MTBE levels in private wells for any value of $\alpha < .1109$.

Using MINITAB, the results are:

Mann-Whitney Test and CI: Bedrock, Unconsolidated

                  N   Median
Bedrock          63    0.970
Unconsolidated    7    0.340

Point estimate for η1 - η2 is 0.590
95.2 Percent CI for η1 - η2 is (0.010,1.990)
W = 2345.5
Test of η1 = η2 vs η1 ≠ η2 is significant at 0.0337
The test is significant at 0.0336 (adjusted for ties)

To determine if the distribution of MTBE levels in bedrock aquifers is shifted above or below the distribution of MTBE levels in unconsolidated aquifers, we test:

H0: The distributions of MTBE levels in bedrock aquifers and unconsolidated aquifers are identical
Ha: The distribution of MTBE levels in bedrock aquifers is shifted above or below the distribution of MTBE levels in unconsolidated aquifers

The test statistic is $T = 2{,}345.5$ and the p-value is $p = .0337$. Since the p-value is small, H0 is rejected. There is sufficient evidence to indicate the distribution of MTBE levels in bedrock aquifers is shifted above or below the distribution of MTBE levels in unconsolidated aquifers for any value of $\alpha > .0337$.

c.

Using MINITAB, the results are:

Kruskal-Wallis Test: MTBE versus Trt

Kruskal-Wallis Test on MTBE

Trt        N   Median   Ave Rank       Z
PrBed     22   0.5200       29.8   -1.60
PuBed     41   1.5000       41.2    2.81
PuUnc      7   0.3400       19.9   -2.13
Overall   70                35.5

H = 9.12  DF = 2  P = 0.010
H = 9.12  DF = 2  P = 0.010  (adjusted for ties)
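The p-value on the printout can be checked by hand because, for 2 degrees of freedom, the upper-tail area of the chi-square distribution has the closed form $P(\chi^2 > h) = e^{-h/2}$. The sketch below relies on that special case only (it does not apply to other degrees of freedom); the function name `chi2_upper_tail_df2` is hypothetical.

```python
from math import exp


def chi2_upper_tail_df2(h: float) -> float:
    """Upper-tail probability P(chi-square > h) when df = 2 (closed form exp(-h/2))."""
    return exp(-h / 2)


# Kruskal-Wallis statistic from the printout: H = 9.12 with DF = 2
p_value = chi2_upper_tail_df2(9.12)

print(round(p_value, 3))
```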

To determine if the distributions of MTBE levels differ in location among the three combinations of well class, we test:

H0: The distributions of MTBE levels in the three combinations of well class are identical
Ha: At least two of the three distributions differ in location

The test statistic is $H = 9.12$ and the p-value is $p = .010$. Since the p-value is small, H0 is rejected. There is sufficient evidence to indicate the distributions of MTBE levels differ in location among the three combinations of well class for any value of $\alpha > .010$.

d.

Using MINITAB, the calculations are:

Correlation: Rank-D-Pr, Rank-M-Pr

Pearson correlation of Rank-D-Pr and Rank-M-Pr = -0.410
P-Value = 0.103

From the printout, $r_s = -.410$. For private wells, the rank correlation between depth and MTBE level is -.410. For private wells, there is a negative relationship between depth and MTBE levels. As the depth increases, the level of MTBE tends to decrease.

Note: Although the printout states that the correlation is the Pearson correlation, this correlation was computed on the ranks of depth and MTBE levels, not the actual values. Thus, this Pearson correlation is the same as the Spearman correlation.

Correlation: Rank-D-Pu, Rank-M-Pu

Pearson correlation of Rank-D-Pu and Rank-M-Pu = 0.444
P-Value = 0.002

From the printout, $r_s = .444$. For public wells, the rank correlation between depth and MTBE level is .444. For public wells, there is a positive relationship between depth and MTBE levels. As the depth increases, the level of MTBE tends to increase.

15.105 Using MINITAB, the results of the Wilcoxon Rank Sum Test (Mann-Whitney Test) for each of the variables are:

Mann-Whitney Test and CI: CREATIVE-S, CREATIVE-NS

               N   Median
CREATIVE-S    47   5.0000
CREATIVE-NS   67   4.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.0000)
W = 3734.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)

Mann-Whitney Test and CI: INFO-S, INFO-NS

           N   Median
INFO-S    47    5.000
INFO-NS   67    5.000

Point estimate for ETA1-ETA2 is 0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2888.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.2856
The test is significant at 0.2743 (adjusted for ties)


Mann-Whitney Test and CI: DECPERS-S, DECPERS-NS

              N   Median
DECPERS-S    47    3.000
DECPERS-NS   67    2.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 2963.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1337
The test is significant at 0.1228 (adjusted for ties)

Mann-Whitney Test and CI: SKILLS-S, SKILLS-NS

             N   Median
SKILLS-S    47   6.0000
SKILLS-NS   67   5.0000

Point estimate for ETA1-ETA2 is 1.0000
95.0 Percent CI for ETA1-ETA2 is (0.9999,1.9999)
W = 3498.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0000
The test is significant at 0.0000 (adjusted for ties)

Mann-Whitney Test and CI: TASKID-S, TASKID-NS

             N   Median
TASKID-S    47    5.000
TASKID-NS   67    4.000

Point estimate for ETA1-ETA2 is 1.000
95.0 Percent CI for ETA1-ETA2 is (-0.000,1.000)
W = 3028.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0614
The test is significant at 0.0566 (adjusted for ties)

Mann-Whitney Test and CI: AGE-S, AGE-NS

          N   Median
AGE-S    47   47.000
AGE-NS   67   45.000

Point estimate for ETA1-ETA2 is 1.000
95.0 Percent CI for ETA1-ETA2 is (-1.000,4.001)
W = 2891.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.2779
The test is significant at 0.2771 (adjusted for ties)

Mann-Whitney Test and CI: EDYRS-S, EDYRS-NS

            N   Median
EDYRS-S    47   13.000
EDYRS-NS   67   13.000

Point estimate for ETA1-ETA2 is -0.000
95.0 Percent CI for ETA1-ETA2 is (0.000,-0.000)
W = 2664.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.8268
The test is significant at 0.8191 (adjusted for ties)


A summary of the tests above and the t-tests from Chapter 8 is given in the table:

Variable    Wilcoxon Test Statistic, T2   p-value        t   p-value
CREATIVE                         3734.5     0.000    8.847     0.000
INFO                             2888.5     0.274    1.503     0.136
DECPERS                          2963.5     0.123    1.506     0.135
SKILLS                           3498.5     0.000    4.766     0.000
TASKID                           3028.0     0.057    1.738     0.087
AGE                              2891.5     0.277    0.742     0.460
EDYRS                            2664.0     0.819   -0.623     0.534

The p-values for the Wilcoxon Rank Sum Tests and the t-tests are similar, and the decisions are the same. Since the sample sizes are large ($n_1 = 47$ and $n_2 = 67$), the Central Limit Theorem applies. Thus, the t-tests (or z-tests) are valid. One assumption for the Wilcoxon Rank Sum test is that the distributions are continuous. This assumption is clearly not met here: the data contain many ties, so the Wilcoxon Rank Sum tests may not be valid.
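The unadjusted significance levels on the Mann-Whitney printouts can be approximately reproduced from the rank sums W. The sketch below assumes a large-sample normal approximation with a continuity correction of 0.5; that choice is an assumption made here because it matches the printed values, not something stated in the printouts. The function name `rank_sum_p_value` is hypothetical. The summary table reports the tie-adjusted p-values, which this sketch does not compute.

```python
from math import erf, sqrt


def rank_sum_p_value(w: float, n1: int, n2: int) -> float:
    """Two-tailed p-value for a rank sum w of sample 1 (normal approximation).

    Assumes a continuity correction of 0.5 and makes no adjustment for ties.
    """
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(w - mean) - 0.5, 0.0) / sd
    upper_tail = 0.5 * (1 - erf(z / sqrt(2)))  # P(Z > z) for standard normal Z
    return 2 * upper_tail


n1, n2 = 47, 67  # sample sizes from the printouts
rank_sums = {
    "CREATIVE": 3734.5, "INFO": 2888.5, "DECPERS": 2963.5, "SKILLS": 3498.5,
    "TASKID": 3028.0, "AGE": 2891.5, "EDYRS": 2664.0,
}

for name, w in rank_sums.items():
    print(f"{name:<9}{rank_sum_p_value(w, n1, n2):.4f}")
```

The same function applied to the 15.104b printout (W = 654.5 with sample sizes 22 and 48) gives a value close to the printed .1109.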
