Test Bank for Business Analytics 4th Edition


richard@qwconsultancy.com



Name:

Class:

Date:

Chapter 01 - Introduction Multiple Choice 1. The decisions concerning an organization’s goals and future plans are called _____. a. financial decisions b. tactical decisions c. strategic decisions d. operational decisions ANSWER: c 2. Tactical decisions are concerned with _____. a. the day-to-day activities of the organization b. the goals and plans of the organization c. the domain of operations managers, who are close to the customer d. how the organization should achieve the goals and objectives set by its strategy ANSWER: d 3. Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based strategy. This activity would be categorized as a(n) _____. a. tactical decision b. operational decision c. strategic decision d. financial decision ANSWER: c 4. _____ is the most critical step of the decision-making process. a. Choosing an alternative b. Identifying and defining the problem c. Evaluating the alternatives d. Determining the set of alternatives ANSWER: b 5. Which of the following is not an approach to making decisions? a. Tradition b. Rules of thumb c. Intuition d. Guess and check ANSWER: d 6. Data-driven decision making tends to decrease a firm's _____. a. market value b. productivity c. risk d. profit ANSWER: c 7. Data dashboards are a type of _____analytics. a. predictive b. descriptive c. prescriptive d. decision ANSWER: b 8. The extraction of information on the number of shipments, how much was included in each shipment, the date each shipment was sent, and so on from the manufacturing plant’s database exemplifies _____ a. spreadsheet models b. data dashboards c. data mining d. data queries Copyright Cengage Learning. Powered by Cognero.


Chapter 01 - Introduction ANSWER: d 9. Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen. a. simulations b. crosstabulation c. data dashboards d. tables ANSWER: c 10. A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of _____. a. predictive analytics b. decision analysis c. prescriptive analytics d. descriptive analytics ANSWER: a 11. Which one of the following is used in predictive analytics? a. Data dashboard b. Linear regression c. Data visualization d. Optimization model ANSWER: b 12. A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction. a. Data query b. Simulation c. Data mining d. Data dashboards ANSWER: c 13. _____ are used in the pharmaceutical industry to assess the risk of introducing a new drug. a. Data dashboards b. Charts c. Spreadsheet models d. Simulations ANSWER: d 14. Which of the following analytical techniques helps us arrive at the best decision? a. Predictive analytics b. Data mining c. Prescriptive analytics d. Descriptive analytics ANSWER: c 15. Simulation optimization helps _____. a. in identifying the constraints of the situation b. to find good decisions in highly complex and highly uncertain settings c. in assigning values to outcomes d. to model certainty using optimization techniques ANSWER: b 16. When a decision maker is faced with several alternatives and an uncertain set of future events, s/he uses _____ to develop an optimal strategy. a. utility theory b. predictive analytics Copyright Cengage Learning. Powered by Cognero.


c. data mining d. decision analysis ANSWER: d

17. _____ assigns values to outcomes based on the decision maker’s attitude toward risk, loss, and other factors. a. Simulation optimization b. Utility theory c. Optimization model d. Data dashboard ANSWER: b 18. Which of the following best exemplifies big data? a. Five hundred Facebook users upload one thousand pictures per day. b. Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis. c. A local grocery store collects data from those that scan their loyalty card. d. A pharmacy keeps track of customer purchases to send its customers coupons. ANSWER: b 19. Which of the following sources of big data is not publicly available? a. Twitter b. Weather data c. Medical records d. Sports records ANSWER: c 20. Advanced analytics generally refers to _____. a. descriptive and prescriptive analytics b. simulation c. predictive and prescriptive analytics d. decision analysis ANSWER: c 21. In the financial sector, _____ are used to construct financial instruments such as derivatives. a. descriptive and prescriptive models b. predictive models c. descriptive models d. prescriptive models ANSWER: b 22. Optimization models can be used to _____. a. assess the risk of investment portfolios b. forecast future financial performance c. successfully manage commercial real estate risk d. decide on how to invest cash received from insurance policies ANSWER: d 23. Utility theory is the study of the _____ or relative desirability of a particular outcome that reflects the decision maker’s attitude toward a collection of factors, such as profit, loss, and risk. a. total worth b. total cost c. feasibility Copyright Cengage Learning. Powered by Cognero.


Chapter 01 - Introduction d. financial wellness ANSWER: a 24. _____ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed. a. Internet of Things (IoT) b. MapReduce c. Hadoop d. Advanced analytics ANSWER: a 25. _____ refers to a programming model used within Hadoop that performs the two major steps for which it is named: the map step and the reduce step. a. MapReduce b. Internet of Things (IoT) c. Advanced analytics d. Optimization model ANSWER: a 26. _____ is an open-source programming environment that supports big data processing through distributed storage and distributed processing on clusters of computers. a. Hadoop b. Excel c. Java d. MapReduce ANSWER: a 27. _____ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another. a. Predictive b. Descriptive c. Simulation d. Prescriptive ANSWER: a 28. A _____ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization’s future. a. strategic b. tactical c. intuitive d. operational ANSWER: a 29. A _____ decision is concerned with how the organization should achieve the goals and objectives set by its strategy. a. tactical Copyright Cengage Learning. Powered by Cognero.


Chapter 01 - Introduction b. strategic c. intuitive d. operational ANSWER: a 30. _____ analytics use techniques that take input data and yield a best course of action. a. Prescriptive b. Simulation c. Strategic d. Operational ANSWER: a 31. In the spectrum of business analytics, which is the most complex? a. Descriptive b. Predictive c. Prescriptive d. Operational ANSWER: c 32. In order to manage an organization’s human resource activities, such as hiring employees, tracking, and influencing employee retention, HR personnel use _____. a. descriptive and predictive analytics. b. descriptive and prescriptive analytics. c. predictive and prescriptive analytics. d. predictive analytics. ANSWER: a 33. A better understanding of consumer behavior through analytics directly leads to _____. a. more profits b. better pricing strategies c. reduced advertising costs d. reduced risk ANSWER: b 34. A light bulb manufacturer uses descriptive analytics _____. a. to present supply chain to managers visually. b. to achieve efficiency in delivery of goods. c. to schedule staff and vehicle for delivery. d. to plan capacity utilization by incorporating the inherent uncertainty in commodities pricing. ANSWER: a 35. The U.S. Internal Revenue Service uses _____ to identify patterns that distinguish questionable annual personal income tax filings. a. utility theory b. prescriptive analytics c. data mining d. decision analysis ANSWER: c Subjective Short Answer Copyright Cengage Learning. Powered by Cognero.


36. _____ may be used to develop an optimal strategy when a decision maker is faced with several decision alternatives and an uncertain set of future events. ANSWER: Decision analysis
37. An increase in data _____ would help to protect stored data from destructive forces or unauthorized users. ANSWER: security
38. _____ are analytical tools that describe what has happened. ANSWER: Descriptive analytics
39. The use of analytical techniques for better understanding patterns and relationships that exist in large data sets is _____. ANSWER: data mining
40. A dashboard is a collection of tables, charts, and maps to help management _____ selected aspects of the company’s performance. ANSWER: monitor
41. A decision concerned with how the organization is run from day to day is known as a(n) _____. ANSWER: operational decision
42. A mathematical model that gives the best decision, subject to the situation’s constraints, is a(n) _____. ANSWER: optimization model
43. A data _____ is a request to obtain information with certain characteristics from a database. ANSWER: query
44. Business analytics is the _____ process of transforming data into insight for making better decisions. ANSWER: scientific
45. A data _____ is trained in both computer science and statistics and knows how to effectively process and analyze large amounts of data. ANSWER: scientist
46. The use of probability and statistics to construct a computer model to study the impact of uncertainty on the decision at hand is called _____. ANSWER: simulation
47. Predictive and prescriptive analytics can also be referred to as _____. ANSWER: advanced analytics
48. _____ analytics is the analysis of online activity, such as visits to websites or social media. ANSWER: Web
49. One of the 4 Vs of big data that refers to uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations is _____. ANSWER: veracity


50. Data that are too large or too complex to be handled by standard data-processing techniques and typical desktop software are called _____. ANSWER: big data
51. Veracity has to do with how much _____ is in the data. ANSWER: uncertainty
52. What are the four V’s of big data? ANSWER: Volume, Velocity, Variety, Veracity
Essay
53. With the rise of big data, increased attention is being paid to legal and ethical issues. INFORMS has established certain guidelines. Briefly discuss. ANSWER: Answers may vary by student.


Chapter 02 - Descriptive Statistics Multiple Choice 1. A quantity of interest that can take on different values is known as a(n) _____. a. variable b. parameter c. sample d. observation ANSWER: a 2. A set of values corresponding to a set of variables is defined as a(n) _____. a. quantity b. event c. factor d. observation ANSWER: d 3. The difference in a variable measured over observations (time, customers, items, etc.) is known as _____. a. observed differences b. variation c. variable change d. descriptive analytics ANSWER: b 4. _____ acts as a representative of the population. a. The variable b. The variance c. A sample d. A random variable ANSWER: c 5. The act of collecting data that are representative of the population data is called _____. a. random sampling b. sample data c. population sampling d. sources of data ANSWER: a 6. The letter grades (A, B, C, D, F) of business analysis students are recorded by a professor. This variable’s classification _____. a. is quantitative data b. cannot be determined c. is categorical data d. is time series data ANSWER: c 7. The amount of time taken by each of 10 students in a class to complete an exam is an example of what type of data? a. Cannot be determined b. Categorical data c. Time series data d. Quantitative data ANSWER: d 8. _____ are collected from several entities at the same point in time. a. Time series data b. Categorical and quantitative data c. Cross-sectional data d. Random data ANSWER: c 9. Data collected from several entities over a period of time (minutes, hours, days, etc.) are called _____. a. categorical and quantitative data b. time series data Copyright Cengage Learning. Powered by Cognero.


c. source data d. cross-sectional data ANSWER: b

10. In a(n) _____, one or more variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest identified first. a. experimental study b. observational study c. categorical study d. variable study ANSWER: a
11. The data collected from the customers in restaurants about the quality of food is an example of a(n) _____. a. variable study b. cross-sectional study c. experimental study d. observational study ANSWER: d
12. When working with large spreadsheets with many rows of data, it can be helpful to _____ the data to better find, view, or manage subsets of data. a. split b. sort and filter c. chart d. manipulate ANSWER: b
13. When working with data sets in Excel, _____ can be used to automatically highlight cells that meet specified requirements. a. averaging b. conditional formatting c. summing d. sorting ANSWER: b
14. A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n) _____. a. frequency distribution b. sample summary c. bin distribution d. observed distribution ANSWER: a
15. Which of the following gives the proportion of items in each bin? a. Frequency b. Class size c. Relative frequency d. Bin proportion ANSWER: c
16. Compute the relative frequencies for the data given in the table below:

Grades   Number of students
A        16
B        28
C        33
D        13
Total    90

a. 0.31, 0.14, 0.37, 0.18 b. 0.37, 0.14, 0.31, 0.18 c. 0.16, 0.28, 0.33, 0.13 d. 0.18, 0.31, 0.37, 0.14 ANSWER: d
17. Consider the data below. What percentage of students scored grade C?

Grades   Number of students
A        16
B        28
C        33
D        13
Total    90

a. 33% b. 31% c. 37% d. 28% ANSWER: c
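The relative frequencies asked for in questions 16 and 17 are each bin's count divided by the total count. The following is a minimal Python sketch of that arithmetic using the grade counts from the table; the names `grade_counts` and `relative_frequencies` are illustrative and do not come from the test bank.

```python
# Grade counts copied from the table in questions 16 and 17.
grade_counts = {"A": 16, "B": 28, "C": 33, "D": 13}


def relative_frequencies(counts):
    """Return each bin's share of the total number of observations."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rel_freq = relative_frequencies(grade_counts)

# Show each share as a proportion (question 16) and as a percentage (question 17).
for grade, share in rel_freq.items():
    print(f"{grade}: {share:.2f} ({share:.0%})")
```

Rounded to two decimals, the shares are 0.18, 0.31, 0.37, and 0.14, so grade C accounts for about 37% of the 90 students.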

18. Which of the following are necessary to be determined to define the classes for a frequency distribution with quantitative data? a. Number of nonoverlapping bins, width of each bin, and bin limits b. Width of each bin and bin lower limits c. Number of overlapping bins, width of each bin, and bin upper limits d. Width of each bin and number of bins ANSWER: a 19. The goal regarding using an appropriate number of bins is to show the _____. a. number of observations b. number of variables c. variation in the data d. correlation in the data ANSWER: c 20. A _____ is a graphical summary of data previously summarized in a frequency distribution. a. box plot b. histogram c. line chart d. scatter chart ANSWER: b 21. Identify the shape of the distribution in the figure below.

a. Skewed left b. Symmetric c. Approximately bell shaped d. Skewed right ANSWER: d
22. The _____ shows the number of data items with values less than or equal to the upper class limit of each class. a. cumulative frequency distribution b. frequency distribution c. percent frequency distribution d. relative frequency distribution ANSWER: a
23. The _____ is a point estimate of the population mean for the variable of interest. a. sample mean b. median c. sample d. geometric mean ANSWER: a
24. Compute the mean of the following data. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57 a. 42.8 b. 52.1 c. 40.6 d. 39.4 ANSWER: c

25. Compute the median of the following data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 a. 28 b. 31 c. 40 d. 34 ANSWER: b
26. Compute the mode for the following data. 12, 16, 19, 10, 12, 11, 21, 12, 21, 10 a. 21 b. 11 c. 12 d. 10 ANSWER: c
27. Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22 a. 1.0221 b. 1.0148 c. 1.0363 d. 1.1475 ANSWER: b
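The measures of location in questions 24 through 27 can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Data sets copied from questions 24 to 27.
mean_data = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]
median_data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
mode_data = [12, 16, 19, 10, 12, 11, 21, 12, 21, 10]
growth_factors = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

mean_value = statistics.mean(mean_data)          # sum of the values divided by n
median_value = statistics.median(median_data)    # average of the two middle values (n is even)
mode_value = statistics.mode(mode_data)          # most frequently occurring value
geo_mean = statistics.geometric_mean(growth_factors)  # n-th root of the product

print(mean_value, median_value, mode_value, round(geo_mean, 3))
```

The geometric mean is the appropriate average for growth factors because they compound multiplicatively over the 10 years rather than adding.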

28. The simplest measure of variability is the _____. a. variance b. standard deviation c. coefficient of variation d. range ANSWER: d
29. The variance is based on the a. deviation about the median. b. number of variables. c. deviation about the mean. d. correlation in the data. ANSWER: c
30. Use technology to compute the standard deviation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 a. 5.96 b. 6.41 c. 5.42 d. 6.75 ANSWER: d
31. Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37 a. 18.64% b. 21.36% c. 20.28% d. 21.67% ANSWER: b
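Questions 28 through 31 concern measures of variability. The sketch below computes the range, sample standard deviation, and coefficient of variation for the sample data in questions 30 and 31 with Python's standard library. It assumes the sample (n - 1) formula, which is what `statistics.stdev` uses; the variable names are illustrative.

```python
import statistics

# Sample data copied from questions 30 and 31.
sample = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

data_range = max(sample) - min(sample)      # simplest measure of variability
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)        # square root of sum((x - mean)^2) / (n - 1)
cv_percent = sample_sd / sample_mean * 100  # standard deviation relative to the mean

print(f"range = {data_range}")
print(f"mean = {sample_mean:.1f}, s = {sample_sd:.2f}, CV = {cv_percent:.2f}%")
```

The standard deviation rounds to 6.75. With the unrounded standard deviation the coefficient of variation is about 21.37%; the keyed value of 21.36% results from rounding the standard deviation to 6.75 before dividing by the mean of 31.6.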

32. Compute the 50th percentile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22 a. 18.6 b. 13.3 c. 15.5 d. 17.7 ANSWER: c
33. Compute the third quartile for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22 a. 21.25 b. 15.5 c. 21.5 d. 11.75 ANSWER: a
34. Compute the IQR for the following data. 10, 15, 17, 21, 25, 12, 16, 11, 13, 22 a. 6.25 b. 7.75 c. 5.14 d. 9.50 ANSWER: d
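The test bank does not state which percentile rule it uses. The sketch below assumes the location rule L = (p/100)(n + 1) with linear interpolation between neighboring sorted values, which reproduces the keyed answers for questions 32 through 34. The function name `percentile` is illustrative.

```python
def percentile(data, p):
    """Return the p-th percentile using the location rule L = p/100 * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    location = p / 100 * (n + 1)   # 1-based position in the sorted data
    lower = int(location)          # position of the value at or just below the location
    fraction = location - lower    # how far to move toward the next value
    if lower < 1:                  # location falls before the first value
        return ordered[0]
    if lower >= n:                 # location falls at or beyond the last value
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Data copied from questions 32 to 34.
data = [10, 15, 17, 21, 25, 12, 16, 11, 13, 22]

median = percentile(data, 50)
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print(median, q1, q3, iqr)
```

Other software may use a different interpolation rule and return slightly different quartiles for the same data, so the rule should be confirmed before comparing results.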

35. A _____ determines how far a particular value is from the mean relative to the data set’s standard deviation. a. coefficient of variation b. z-score c. variance d. percentile ANSWER: b 36. For data having a bell-shaped distribution, approximately _____ percent of the data values will be within one standard deviation of the mean. a. 95 b. 66 c. 68 d. 97 ANSWER: c 37. Any data value with a z-score less than –3 or greater than +3 is considered to be a(n) _____. a. outlier b. statistic c. whisker d. z-score value ANSWER: a 38. Which of the following graphs provides information on outliers and IQR of a data set? a. Histogram b. Line chart c. Scatter chart d. Box plot ANSWER: d 39. If the covariance between two variables is near 0, it implies that ______. a. a positive relationship exists between the variables b. the variables are not linearly related c. the variables are negatively related d. the variables are strongly related ANSWER: b 40. The correlation coefficient will always take values _____. a. greater than 0 b. between –1 and 0 c. between –1 and +1 d. less than –1 ANSWER: c Copyright Cengage Learning. Powered by Cognero.
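Questions 39 and 40 rest on two facts: a covariance near 0 rules out only a linear relationship, and the correlation coefficient always lies between -1 and +1. The sketch below illustrates both with small hypothetical data sets that are not taken from the test bank; the function names are also illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Correlation coefficient: covariance divided by the product of standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical data, chosen only for illustration.
x = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x]     # strong relationship, but not a linear one
y_line = [3 * v + 1 for v in x]   # perfectly linear, increasing

cov_curve = sample_covariance(x, y_curve)
r_curve = correlation(x, y_curve)
r_line = correlation(x, y_line)

print(cov_curve, r_curve, r_line)
```

In this example the curved relationship has zero covariance even though y is completely determined by x, which is why a near-zero covariance implies only that the variables are not linearly related.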


41. Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. (Round to two decimal places if necessary.) a. 2 b. 41.64 c. –2 d. 1.33 ANSWER: c
42. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 9. Steve has a score of 52. Convert Steve's score to a z-score. (Round to two decimal places if necessary.) a. 1.33 b. 58.2 c. –2 d. –1.33 ANSWER: d
43. Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score? a. David's standardized score is –1.64 and Steven's standardized score is –2.00. Therefore, David has the higher standardized score. b. David's standardized score is –1.64 and Steven's standardized score is –2.00. Therefore, Steven has the higher standardized score. c. David's standardized score is 1.64 and Steven's standardized score is 2.00. Therefore, Steven has the higher standardized score. d. Cannot be determined with the information provided. ANSWER: a
44. The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored greater than 700. a. 97.5% b. 95% c. 2.5% d. 5% ANSWER: c
45. The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 400. a. 16% b. 68% c. 84% d. 32% ANSWER: a
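The z-score and empirical-rule questions above (41 through 45) reduce to two small calculations. The sketch below works them through in Python. The helper names `z_score` and `one_tail_share` are illustrative, and the 68/95/99.7 percentages are the usual empirical-rule approximations for bell-shaped data.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Questions 41 to 43.
michelle = z_score(48, 70, 11)
steve = z_score(52, 64, 9)
david = z_score(52, 70, 11)
steven = z_score(52, 64, 6)

# Empirical rule: approximate share of data within k standard deviations of the mean.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def one_tail_share(k):
    """Approximate share of data more than k standard deviations away on one side."""
    return (1 - WITHIN[k]) / 2


above_700 = one_tail_share(2)  # 700 is two standard deviations above 500
below_400 = one_tail_share(1)  # 400 is one standard deviation below 500

print(round(michelle, 2), round(steve, 2), round(david, 2), round(steven, 2))
print(f"{above_700:.1%} above 700, {below_400:.0%} below 400")
```

For question 43, David's z-score of about -1.64 is greater than Steven's -2.00, so David's score sits higher relative to his own class.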


46. The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494. a. 97.5% b. 95% c. 2.5% d. 5% ANSWER: c
47. Compute the relative frequency for students who earned an A shown in the table of grades below.

Grades   Number of Students
A        10
B        31
C        36
D        6
Total    83

a. 0.12 b. 0.10 c. 0.83 d. Not enough information ANSWER: a
48. Compute the relative frequency for students who earned a C shown in the table of grades below.

Grades   Number of Students
A        10
B        31
C        36
D        6
Total    83

a. 0.43 b. 0.53 c. 0.83 d. Not enough information ANSWER: a
49. Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 21-24 bin?


a. 0.05 b. 0.14 c. 0.25 d. 2.5 ANSWER: c 50. Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the frequency of the 25-28 bin?


a. 0.05 b. 1 c. 0.5 d. 4 ANSWER: b
51. What is the total relative frequency?

20XX Contest Sales
Salesman         Frequency   Relative Frequency
Frances Clonts   15          0.05
Sarah Leigh      184         0.62
Devon Pride      37
John Townes      62          0.21
Total            298

a. 1 b. 99.12 c. 0.88 d. Not enough information ANSWER: a
52. Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the bin size for the histogram?


a. 3 b. 4 c. 16 d. 5 ANSWER: b 53. Select the histogram that is moderately skewed right.

a. A b. B c. C d. D ANSWER: b
54. Which graph represents a negative linear relationship between x and y?

a. A b. B c. C d. None of the graphs display a negative linear relationship. ANSWER: c 55. Below is the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. 56, 42, 37, 29, 45, 51, 30, 25, 34, 57 What is the median number of days that it took Wyche Accounting to perform audits in the last quarter of last year? a. 41 b. 40.6 c. 39.5 d. 42 ANSWER: c 56. What is the mode of the data set given below? 35, 47, 65, 47, 22 a. 47.5 b. 47 c. 65 d. 22 ANSWER: b 57. A sample of 13 adult males’ heights are listed below. 70, 72, 71, 70, 69, 73, 69, 68, 70, 71, 67, 71, 74 Copyright Cengage Learning. Powered by Cognero.


Find the range of the data. a. 7 b. 6.5 c. 5 d. 4 ANSWER: a
58. James’s manager asked him to sort the last names in the following list in descending order. What does this mean?

Customer ID   First       Last        Sales     Quantity   Discount   Profit
CG-12520      Claire      Gute        $261.96   2          0          $41.91
DV-13045      Darrin      VanHuff     $14.62    2          0          $16.87
SO-20335      Sean        O'Donnell   $957.58   5          0.45       $(383.03)
BH-11710      Brosina     Hoffman     $48.86    7          0          $14.17
AA-10480      Andrew      Allen       $25.55    3          0.2        $5.44
IM-15070      Irene       Maddox      $407.98   3          0.2        $132.59
HP-14815      Harold      Pawlan      $68.81    5          0.8        $(123.86)
PK-19075      Pete        Kriz        $665.88   6          0          $13.32
AG-10270      Alejandro   Grove       $55.50    2          0          $9.99
ZD-21925      Zuschuss    Donatelli   $8.56     2          0          $2.48

a. The last names must be sorted from A to Z. b. The last names must be sorted from Z to A. c. The last names must be sorted from the earliest to the latest that has been added to the list. d. James should use the Sort function to organize the data into order of sales. ANSWER: b
59. You have been asked to reorganize the Excel table below into order of sales using the Sales column. Which option will allow you to do this quickly?

Customer ID   First       Last        Sales     Quantity   Discount   Profit
CG-12520      Claire      Gute        $261.96   2          0          $41.91
DV-13045      Darrin      VanHuff     $14.62    2          0          $16.87
SO-20335      Sean        O'Donnell   $957.58   5          0.45       $(383.03)
BH-11710      Brosina     Hoffman     $48.86    7          0          $14.17
AA-10480      Andrew      Allen       $25.55    3          0.2        $5.44
IM-15070      Irene       Maddox      $407.98   3          0.2        $132.59
HP-14815      Harold      Pawlan      $68.81    5          0.8        $(123.86)
PK-19075      Pete        Kriz        $665.88   6          0          $13.32
AG-10270      Alejandro   Grove       $55.50    2          0          $9.99
ZD-21925      Zuschuss    Donatelli   $8.56     2          0          $2.48


Chapter 02 - Descriptive Statistics a. Use the Cut and Paste function to reorganize the data into order of sales. b. Use the Filter function to organize the data into order of sales. c. Use the Order function to organize the data into order of sales. d. Use the Sort function to organize the data into order of sales. ANSWER: d 60. Which Excel command will return all modes when more than one mode exists? a. MODE.MULT b. MODE.SNGL c. MODE d. MODES ANSWER: a 61. In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. Who makes up the population? a. All patients in a local hospital b. All survey respondents c. Hospital patients d. Cannot be determined from the information given ANSWER: a 62. In a survey of patients in a local hospital, 62.42% of the respondents indicated that the health care providers needed to spend more time with each patient. Who makes up the sample? a. All patients in a local hospital b. All survey respondents c. Hospital patients d. Cannot be determined from the information given ANSWER: b 63. A manager of a fast food restaurant wants the drive-thru employee to ask every fifth customer if he or she is satisfied with the service. Who makes up the population? a. All customers who use the drive-thru window of this fast food restaurant b. All survey respondents c. All customers of this restaurant d. The proportion of customers who say they are satisfied with their service ANSWER: a 64. A manager of a fast food restaurant wants the drive-thru employee to ask every fifth customer if he or she is satisfied with the service. Who makes up the sample? a. All customers who use the drive-thru window of this fast food restaurant b. All survey respondents c. All customers of this restaurant Copyright Cengage Learning. Powered by Cognero.


Chapter 02 - Descriptive Statistics d. The proportion of customers who say they are satisfied with their service ANSWER: b 65. Which of the following relationships would have a negative correlation coefficient? a. Supply and demand b. Amount of a bill at a restaurant and the amount of the tip c. Cost of a car and the amount of tax to be paid d. The square footage of a home and the price of the home ANSWER: a 66. The distribution of hourly sales for a local family owned store is normally distributed with a mean of $225 per hour and a standard deviation of $75 per hour. Which of the following intervals contains the middle 95% of hourly sales? a. $75 to $375 b. $150 to $300 c. $175 to $275 d. $125 to $325 ANSWER: a 67. Data sets commonly include observations with missing values for one or more variables. In some cases missing data naturally occur; these are called _____. a. legitimately missing data b. data cleansing c. illegitimate missing data d. missing random data ANSWER: a 68. _____ is the process of removing variables from the analysis without losing crucial information. a. Data Cleansing b. Dimension reduction c. Legitimate missing data d. Missing random data ANSWER: b Subjective Short Answer 69. A student willing to participate in a debate competition is required to fill out a registration form. State whether each of the following information about the participant provides categorical or quantitative data. a. What is your birth month? b. Have you participated in any debate competition previously? c. If yes, in how many debate competitions have you participated so far? d. Have you won any of the competitions? e. If yes, how many have you won? ANSWER: a. Categorical b. Categorical c. Quantitative d. Categorical Copyright Cengage Learning. Powered by Cognero.


e. Quantitative
70. The following table provides information on the number of billionaires in a country and the continents on which these countries are located.

Nationality      Continent       Number of Billionaires
United States    North America   426
Brazil           South America   38
Russia           Europe          105
Mexico           North America   37
India            Asia            54
Turkey           Europe          40
United Kingdom   Europe          31
Hong Kong        Asia            39
Germany          Europe          57
Canada           North America   28
China            Asia            120

a. Sort the countries from largest to smallest based on the number of billionaires. What are the top five countries according to the number of billionaires? b. Filter the countries to display only the countries located in North America.
ANSWER:
a.
Nationality      Continent       Number of Billionaires
United States    North America   426
China            Asia            120
Russia           Europe          105
Germany          Europe          57
India            Asia            54
Turkey           Europe          40
Hong Kong        Asia            39
Brazil           South America   38
Mexico           North America   37
United Kingdom   Europe          31
Canada           North America   28

The top five countries with the greatest number of billionaires are the United States, China, Russia, Germany, and India.
b.
Nationality      Continent       Number of Billionaires
United States    North America   426
Mexico           North America   37
Canada           North America   28

71. The data on the percentage of visitors in the previous and current years at 12 well-known national parks of the United States are given below.

National Parks         Percentage of visitors previous year   Percentage of visitors current year
The Smokies            78.2%                                  84.2%
The Grand Canyon       83.5%                                  81.6%
Theodore Roosevelt     81.6%                                  84.8%
Yosemite               74.2%                                  78.4%
Yellowstone            77.9%                                  76.2%
Olympic                86.4%                                  88.6%
The Colorado Rockies   84.3%                                  85.4%
Zion                   76.7%                                  78.9%
The Grand Tetons       84.6%                                  87.8%
Cuyahoga Valley        85.1%                                  86.7%
Acadia                 79.2%                                  82.6%
Shenandoah             72.9%                                  79.2%

a. Sort the parks in descending order by their current year’s visitor percentage. Which park has the highest number of visitors in the current year? Which park has the lowest number of visitors in the current year? b. Calculate the change in visitor percentage from the previous to the current year for each park. Use Excel’s conditional formatting to highlight the parks whose visitor percentage decreased from the previous year to the current year. c. Use Excel’s conditional formatting tool to create data bars for the change in visitor percentage from the previous year to the current year for each park calculated in part b.
ANSWER:
a. The sorted list of parks for the current year appears as below:

National Parks         Percentage of visitors previous year   Percentage of visitors current year
Olympic                86.4%                                  88.6%
The Grand Tetons       84.6%                                  87.8%
Cuyahoga Valley        85.1%                                  86.7%
The Colorado Rockies   84.3%                                  85.4%
Theodore Roosevelt     81.6%                                  84.8%
The Smokies            78.2%                                  84.2%
Acadia                 79.2%                                  82.6%
The Grand Canyon       83.5%                                  81.6%
Shenandoah             72.9%                                  79.2%
Zion                   76.7%                                  78.9%
Yosemite               74.2%                                  78.4%
Yellowstone            77.9%                                  76.2%

Olympic has the highest number of visitors in the current year, and Yellowstone has the lowest number of visitors in the current year.
b.
National Parks         Percentage of visitors previous year   Percentage of visitors current year   Change in visitor percentage
The Smokies            78.2%                                  84.2%                                 6.00%
The Grand Canyon       83.5%                                  81.6%                                 -1.90%
Theodore Roosevelt     81.6%                                  84.8%                                 3.20%
Yosemite               74.2%                                  78.4%                                 4.20%
Yellowstone            77.9%                                  76.2%                                 -1.70%
Olympic                86.4%                                  88.6%                                 2.20%
The Colorado Rockies   84.3%                                  85.4%                                 1.10%
Zion                   76.7%                                  78.9%                                 2.20%
The Grand Tetons       84.6%                                  87.8%                                 3.20%
Cuyahoga Valley        85.1%                                  86.7%                                 1.60%
Acadia                 79.2%                                  82.6%                                 3.40%
Shenandoah             72.9%                                  79.2%                                 6.30%


Name:

Class:

Date:

Chapter 02 - Descriptive Statistics c. The output using Excel’s conditional formatting tool that created data bars for the change in visitor percentage from the previous year to the current year for each park appears as below.
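The sort and the change column in question 71 can also be checked outside Excel. The following is a minimal Python sketch, not part of the original answer key; the names PARKS, visitor_changes, and rank_by_current are illustrative assumptions, and the list of parks with a negative change stands in for the cells that conditional formatting would highlight.

```python
# Visitor percentages (previous year, current year) copied from question 71.
PARKS = {
    "The Smokies": (78.2, 84.2),
    "The Grand Canyon": (83.5, 81.6),
    "Theodore Roosevelt": (81.6, 84.8),
    "Yosemite": (74.2, 78.4),
    "Yellowstone": (77.9, 76.2),
    "Olympic": (86.4, 88.6),
    "The Colorado Rockies": (84.3, 85.4),
    "Zion": (76.7, 78.9),
    "The Grand Tetons": (84.6, 87.8),
    "Cuyahoga Valley": (85.1, 86.7),
    "Acadia": (79.2, 82.6),
    "Shenandoah": (72.9, 79.2),
}


def visitor_changes(parks):
    """Return current minus previous percentage (in points) for each park."""
    return {name: round(cur - prev, 1) for name, (prev, cur) in parks.items()}


def rank_by_current(parks):
    """Return park names sorted by current-year percentage, descending."""
    return sorted(parks, key=lambda name: parks[name][1], reverse=True)


changes = visitor_changes(PARKS)
ranking = rank_by_current(PARKS)
# Parks that conditional formatting would highlight (negative change).
decreased = [name for name, delta in changes.items() if delta < 0]

print("Highest:", ranking[0])
print("Lowest:", ranking[-1])
print("Decreased:", decreased)
```

Only the two parks with a negative change are flagged, which agrees with the change column in part b.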

72. The partial relative frequency distribution is given below:

Group    Relative Frequency
1        0.15
2        0.32
3        0.29
4

a. What is the relative frequency of group 4?
b. The total sample size is 400. What is the frequency of group 4?
c. Show the frequency distribution.
d. Show the percent frequency distribution.

ANSWER:
a. The relative frequency of group 4 is obtained as 1.00 – 0.15 – 0.32 – 0.29 = 0.24.
b. If the total sample size is 400, the frequency of group 4 is obtained as 0.24 × 400 = 96.
c.
Group    Relative Frequency    Frequency
1        0.15                  60
2        0.32                  128
3        0.29                  116
4        0.24                  96
Total    1.00                  400

d.
Group    Relative Frequency    % Frequency
1        0.15                  15
2        0.32                  32
3        0.29                  29
4        0.24                  24
Total    1.00                  100

73. A survey on the most preferred newspaper in the USA listed The New York Times (TNYT), Washington Post (WP), Daily News (DN), New York Post (NYP), and Los Angeles Times (LAT) as the top five most preferred newspapers. The table below shows the preferences of 50 citizens.

TNYT    WP      NYP     WP      TNYT
DN      TNYT    LAT     WP      WP
DN      LAT     TNYT    TNYT    NYP
NYP     TNYT    WP      LAT     NYP
LAT     WP      DN      WP      LAT
WP      DN      TNYT    DN      DN
TNYT    TNYT    LAT     TNYT    NYP
LAT     LAT     NYP     WP      DN
WP      WP      TNYT    DN      TNYT
TNYT    DN      NYP     TNYT    WP

a. Are these data categorical or quantitative?
b. Provide frequency and percent frequency distributions.
c. On the basis of the sample, which newspaper is preferred the most?

ANSWER:
a. The given data are categorical.
b.
Newspapers    Frequency    % Frequency
TNYT          14           28
WP            12           24
DN            9            18
NYP           7            14
LAT           8            16
Total         50           100

c. The most preferred newspaper is The New York Times.

74. The mentor of a class researched the number of hours spent on study in a week by each student of the class in order to analyze the correlation between the study hours and the marks obtained by each student. The data on the hours spent per week by 25 students are listed below.

13    14    16    15    12
12    19    21    22    19
13    16    18    25    21
17    18    23    16    12
24    20    14    22    15

a. What is the least amount of time a student spent per week on studying in this sample? The highest?
b. Use a class width of 2 hours to prepare a frequency distribution, a relative frequency distribution, and a percent frequency distribution for the data.
c. Prepare a histogram and comment on the shape of the distribution.

ANSWER:
a. The least amount of time a student spent was 12 hours, and the highest was 25 hours.
b.
Hours in Study per Week    Frequency    Relative Frequency    % Frequency
12–13                      5            0.20                  20
14–15                      4            0.16                  16
16–17                      4            0.16                  16
18–19                      4            0.16                  16
20–21                      3            0.12                  12
22–23                      3            0.12                  12
24–25                      2            0.08                  8
Total                      25           1.00                  100

c.

The distribution is skewed to the right.

75. The manager of an automobile showroom studied the time spent by each salesperson interacting with the customer in a month apart from the other jobs assigned to them. The data in hours are given below.

17 13 18 16 20 24 15 19 19 12 10 16 26 27 13 23 17 15 24 20 14 21 26 24

Using classes 10-13, 14-17, and so on, show:
a. The frequency distribution.
b. The relative frequency distribution.
c. The cumulative frequency distribution.
d. The cumulative relative frequency distribution.
e. The proportion of salespersons who spent 13 hours of time or less with the customers.
f. Prepare a histogram and comment on the shape of the distribution.

ANSWER:
a. – d.
Class    Frequency    Relative Frequency    Cumulative Frequency    Cumulative Relative Frequency
10–13    4            0.17                  4                       0.17
14–17    7            0.29                  11                      0.46
18–21    6            0.25                  17                      0.71
22–25    4            0.17                  21                      0.88
26–29    3            0.13                  24                      1.00 (approx.)
Total    24           ≈ 1

e. From the cumulative relative frequency distribution, 17% of the salespersons spent 13 hours of time or less with the customers.
f. The distribution is skewed to the right.

76. The scores of a sample of students in a Math test are 20, 15, 19, 21, 22, 12, 17, 14, 24, 16 and in a Stat test are 16, 12, 19, 17, 22, 14, 20, 21, 24, 15, 13.
a. Compute the mean and median scores for both the Math and the Stat tests.
b. Compare the mean and median scores computed in part a. Comment.

ANSWER:
a. For Math test: Mean = 18, Median = 18
For Stat test: Mean = 17.5, Median = 17
b. The mean and the median scores for statistics are lower than those for mathematics. These lower values are because of an additional score of 13 for statistics, which is lower than the mean and the median scores for mathematics.

77. Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the mean, median, and mode.

ANSWER: Mean = 18.53, Median = 19, Mode = 15

78. Suppose that you make a fixed deposit of $1,000 in Bank X and $500 in Bank Y. The value of each investment at the end of each subsequent year is provided in the table.

Year    Bank X ($)    Bank Y ($)
1       1,320         560
2       1,510         620
3       1,750         680
4       2,090         740
5       2,240         790
6       2,470         820
7       2,830         870
8       3,220         910
9       3,450         950
10      3,690         990

Which of the two banks provides a better return over this time period?

ANSWER:
Year         Bank X    Growth Factor    Bank Y    Growth Factor
(initial)    1,000                      500
1            1,320     1.32             560       1.12
2            1,510     1.14             620       1.11
3            1,750     1.16             680       1.10
4            2,090     1.19             740       1.09
5            2,240     1.07             790       1.07
6            2,470     1.10             820       1.04
7            2,830     1.15             870       1.06
8            3,220     1.14             910       1.05
9            3,450     1.07             950       1.04
10           3,690     1.07             990       1.04

                  Bank X    Bank Y
Geometric Mean    1.1395    1.0707
% of return       13.95%    7.07%
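The geometric mean growth factors in the answer to question 78 can be reproduced with a short calculation. The sketch below is illustrative only: the function name geometric_mean_growth is an assumption, and it relies on the fact that the product of consecutive growth factors collapses to the ratio of the final value to the initial deposit.

```python
def geometric_mean_growth(values):
    """Geometric mean of the period-to-period growth factors.

    `values` starts with the initial deposit, followed by the end-of-year
    balances. The product of the n growth factors equals values[-1] / values[0],
    so the geometric mean is the n-th root of that ratio.
    """
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods)


# Balances from question 78, with the initial deposit first.
bank_x = [1000, 1320, 1510, 1750, 2090, 2240, 2470, 2830, 3220, 3450, 3690]
bank_y = [500, 560, 620, 680, 740, 790, 820, 870, 910, 950, 990]

gm_x = geometric_mean_growth(bank_x)
gm_y = geometric_mean_growth(bank_y)

print(f"Bank X: {gm_x:.4f} ({(gm_x - 1) * 100:.2f}% per year)")
print(f"Bank Y: {gm_y:.4f} ({(gm_y - 1) * 100:.2f}% per year)")
```

Because only the first and last balances enter the result, the rounded growth factors in the table are not needed to obtain the geometric mean.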

Bank X provides a better return when compared to Bank Y.

79. Consider a sample on the waiting times (in minutes) at the billing counter in a grocery store to be 15, 24, 18, 15, 21, 20, 15, 22, 19, 16, 15, 22, 20, 15, and 21. Compute the 25th, 50th, and 75th percentiles.

ANSWER: 25th percentile = 15, 50th percentile = 19, 75th percentile = 21

80. Suppose that the average time an employee takes to reach the office is 35 minutes. To address the issue of late comers, the mode of transport chosen by the employee is tracked: private transport (two-wheelers and four-wheelers) and public transport. The data on the average time (in minutes) taken using both a private transportation system and a public transportation system for a sample of employees are given below.

Private Transport    Public Transport
27                   30
33                   29
28                   25
32                   20
20                   27
34                   32
30                   37
28                   38
18                   21
29                   35

a. What are the mean and median travel times for employees using a private transport? What are the mean and median travel times for employees using a public transport?
b. What are the variance and standard deviation of travel times for employees using a private transport? What are the variance and standard deviation of travel times for employees using a public transport?
c. Comment on the results.

ANSWER: Travel times (in minutes)
a. Using private transport: Mean = 27.9, Median = 28.5
Using public transport: Mean = 29.4, Median = 29.5
b. Using private transport: Variance = 27.43, Standard deviation = 5.24
Using public transport: Variance = 39.38, Standard deviation = 6.28
c. The travel times of employees using a private transport are less than those using a public transport.

81. The average time a customer service executive takes to resolve an issue on a mobile handset is 26.4 minutes. The average times taken to resolve the issue by a sample of 15 such executives are shown below.

Name        Time (in minutes)
Jack        25.3
Samantha    28.2
Richard     26.8
Steve       29.5
Mary        22.4
Sergio      21.7
John        24.3
Michelle    22.4
Linda       26.8
Mark        29.4
Matt        23.6
Polly       26.4
Sheila      23.5
Jeff        26.8
Gerald      28.1

a. What is the mean resolution time?
b. What is the median resolution time?
c. What is the mode for these 15 executives?
d. What are the variance and standard deviation?
e. What is the third quartile?

ANSWER:
a. Mean = 25.68
b. Median = 26.4
c. Mode = 26.8
d. Variance = 6.67; Standard deviation = 2.58
e. Third quartile = 28.1
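The answers to question 81 can be cross-checked with Python's standard statistics module. This is a hedged sketch rather than the method the answer key uses (which is Excel-based); the variable names are illustrative. The third quartile uses the module's default "exclusive" method, which places the p-th percentile at position p(n + 1) in the sorted data, and that is assumed here to match the textbook convention.

```python
import statistics

# Resolution times (in minutes) for the 15 executives in question 81.
times = [25.3, 28.2, 26.8, 29.5, 22.4, 21.7, 24.3, 22.4,
         26.8, 29.4, 23.6, 26.4, 23.5, 26.8, 28.1]

mean_time = statistics.mean(times)
median_time = statistics.median(times)
mode_time = statistics.mode(times)          # most frequent value
variance = statistics.variance(times)       # sample variance (n - 1 divisor)
std_dev = statistics.stdev(times)           # sample standard deviation

# quantiles(..., n=4) returns [Q1, Q2, Q3]; the default method is "exclusive".
third_quartile = statistics.quantiles(times, n=4)[2]

print(f"Mean = {mean_time:.2f}")
print(f"Median = {median_time}")
print(f"Mode = {mode_time}")
print(f"Variance = {variance:.2f}; Standard deviation = {std_dev:.2f}")
print(f"Third quartile = {third_quartile}")
```

The sample (n – 1) versions of variance and standard deviation are used, consistent with the STDEV.S function referenced elsewhere in this chapter.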

82. Suppose that the average time an employee takes to reach the office is 35 minutes. To address the issue of late comers, the mode of transport chosen by the employee is tracked: private transport (two-wheelers and four-wheelers) and public transport. The data on the average time (in minutes) taken using both a private transportation system and a public transportation system for a sample of employees are given below.

Private Transport    Public Transport
27                   30
33                   29
28                   25
32                   20
20                   27
34                   32
30                   37
28                   38
18                   21
29                   35

a. Considering the travel times (in minutes) of employees using private transport, compute the z-score for the tenth employee with travel time of 29 minutes.
b. Considering the travel times (in minutes) of employees using public transport, compute the z-score for the second employee with travel time of 29 minutes. How does this z-score compare with the z-score you calculated for part a?
c. Based on z-scores, do the data for employees using private transport and public transport contain any outliers?

ANSWER:
a. For tenth employee using private transport: First, calculate the mean (AVERAGE function in Excel) and standard deviation (STDEV.S function in Excel) for private transport.
Mean = 27.9, StDev = 5.24
The z-score is then obtained as z = (29 – 27.9)/5.24 = 0.21.
b. For second employee using public transport: First, calculate the mean (AVERAGE function in Excel) and standard deviation (STDEV.S function in Excel) for public transport.
Mean = 29.4, StDev = 6.28
The z-score is then obtained as z = (29 – 29.4)/6.28 = –0.06.
Even though the employees had the same travel time, the z-score for the tenth employee in the sample who used a private transport is much larger because that employee is part of a sample with a smaller mean and a smaller standard deviation.
c.
Travel Times using    z-score    Travel Times using    z-score
Private Transport                Public Transport
27                    –0.17      30                    0.10
33                    0.97       29                    –0.06
28                    0.02       25                    –0.70
32                    0.78       20                    –1.50
20                    –1.51      27                    –0.38
34                    1.16       32                    0.41
30                    0.40       37                    1.21
28                    0.02       38                    1.37
18                    –1.89      21                    –1.34
29                    0.21       35                    0.89
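The z-scores tabulated above can be recomputed with a few lines of Python. This is a minimal sketch under the same assumptions as the answer key (sample standard deviation, outliers flagged when |z| exceeds 3); the function name z_scores is illustrative and plays the role of Excel's STANDARDIZE applied to each value.

```python
import statistics


def z_scores(sample):
    """Return (x - mean) / s for each x, using the sample standard deviation."""
    mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)
    return [(x - mean) / std_dev for x in sample]


# Travel times (in minutes) from question 82.
private = [27, 33, 28, 32, 20, 34, 30, 28, 18, 29]
public = [30, 29, 25, 20, 27, 32, 37, 38, 21, 35]

z_private = z_scores(private)
z_public = z_scores(public)

# Tenth private-transport employee and second public-transport employee,
# both with a travel time of 29 minutes.
print(f"Private, 29 min: z = {z_private[9]:.2f}")
print(f"Public, 29 min: z = {z_public[1]:.2f}")

# Observations whose z-score lies outside [-3, 3] would be treated as outliers.
outliers = [z for z in z_private + z_public if abs(z) > 3]
print("Outliers found:", len(outliers))
```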

No z-score is less than –3.0 or above +3.0; therefore, the z-scores do not indicate the existence of any outliers in either sample.

83. The results of a survey showed that, on average, children spend 5.6 hours on PlayStation per week. Suppose that the standard deviation is 1.7 hours and that the number of hours on PlayStation follows a bell-shaped distribution.
a. Use the empirical rule to calculate the percentage of children who spend between 2.2 and 9 hours on PlayStation per week.
b. What is the z-value for a child who spends 7.5 hours on PlayStation per week?
c. What is the z-value for a child who spends 4.5 hours on PlayStation per week?

ANSWER:
a. According to the empirical rule, approximately 95% of data values will be within two standard deviations of the mean. 2.2 is two standard deviations less than the mean, and 9 is two standard deviations greater than the mean. Therefore, approximately 95% of children spend between 2.2 and 9 hours on PlayStation per week.
b. z = (7.5 – 5.6)/1.7 = 1.12
c. z = (4.5 – 5.6)/1.7 = –0.65

84. A study finds that students spend an average of 300 minutes on Internet usage, with a standard deviation of 102 minutes. Answer the following questions assuming a bell-shaped distribution and using the empirical rule.
a. What percentage of students use the Internet for more than 402 minutes?
b. What percentage of students use the Internet for more than 504 minutes?
c. What percentage of students use the Internet between 198 minutes and 300 minutes?

ANSWER:
a. 402 is one standard deviation above the mean. The empirical rule states that 68% of data values will be within one standard deviation of the mean. Because a bell-shaped distribution is symmetric, 0.5 × (1 – 68%) = 16% of the data values will be greater than (mean + 1 × standard deviation) 402. 16% of students use the Internet for more than 402 minutes.
b. 504 is two standard deviations above the mean. The empirical rule states that 95% of data values will be within two standard deviations of the mean. Because a bell-shaped distribution is symmetric, 0.5 × (1 – 95%) = 2.5% of the data values will be greater than (mean + 2 × standard deviation) 504. 2.5% of students use the Internet for more than 504 minutes.
c. 198 is one standard deviation below the mean. The empirical rule states that 68% of data values will be within one standard deviation of the mean, and we expect that 0.5 × (1 – 68%) = 16% of data values will be below one standard deviation below the mean. 300 is the mean, so we expect that 50% of the data values will be below the mean. Therefore, we expect 50% – 16% = 34% of the data values will be between the mean 300 and one standard deviation below the mean 198. 34% of students use the Internet between 198 minutes and 300 minutes.

85. Eight observations taken for two variables are as follows:

x     y
11    35
13    32
17    26
18    25
22    20
24    17
26    11
28    10

a. Develop a scatter diagram with x on the horizontal axis.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Compute and interpret the sample covariance.
d. Compute and interpret the sample correlation coefficient.

ANSWER:
a.

b. There appears to be a negative linear relationship between the x and y variables.
c.
x       y       x – x̄      y – ȳ     (x – x̄)(y – ȳ)
11      35      –8.88      13        –115.38
13      32      –6.88      10        –68.75
17      26      –2.88      4         –11.50
18      25      –1.88      3         –5.63
22      20      2.13       –2        –4.25
24      17      4.13       –5        –20.63
26      11      6.13       –11       –67.38
28      10      8.13       –12       –97.50
Total                                –391

x̄ = 19.88, ȳ = 22

The sample covariance is s_xy = Σ(x – x̄)(y – ȳ)/(n – 1) = –391/7 = –55.86. The negative covariance confirms that there is a negative linear relationship between the x and y variables in this data set.
d. With sample standard deviations s_x = 6.13 and s_y = 9.17, the correlation coefficient is calculated as: r_xy = s_xy/(s_x × s_y) = –55.86/(6.13 × 9.17) = –0.99. The correlation coefficient again confirms and indicates a strong negative linear association between the x and y variables in this data set.

86. Consider the following data on income and savings of a sample of residents in a locality:

Income ($ thousands)    Savings ($ thousands)
50                      10
51                      11
52                      13
55                      14
56                      15
58                      15
60                      16
62                      16
65                      17
66                      17

a. Compute the correlation coefficient. Is there a positive correlation between the income and savings? What is your interpretation?
b. Show a scatter diagram of the relationship between the income and savings.

ANSWER:
a.
x       y       x – x̄     y – ȳ     (x – x̄)²    (y – ȳ)²    (x – x̄)(y – ȳ)
50      10      –7.5      –4.4      56.25       19.36       33
51      11      –6.5      –3.4      42.25       11.56       22.1
52      13      –5.5      –1.4      30.25       1.96        7.7
55      14      –2.5      –0.4      6.25        0.16        1
56      15      –1.5      0.6       2.25        0.36        –0.9
58      15      0.5       0.6       0.25        0.36        0.3
60      16      2.5       1.6       6.25        2.56        4
62      16      4.5       1.6       20.25       2.56        7.2
65      17      7.5       2.6       56.25       6.76        19.5
66      17      8.5       2.6       72.25       6.76        22.1
Total                               292.5       52.4        116

Here x is income with x̄ = 57.5, and y is savings with ȳ = 14.4. The correlation coefficient is r_xy = Σ(x – x̄)(y – ȳ)/√[Σ(x – x̄)² × Σ(y – ȳ)²] = 116/√(292.5 × 52.4) = 0.94.

This indicates that there is a strong positive relationship between income and savings.
b.
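The covariance and correlation computations in questions 85 and 86 follow the same recipe, which can be sketched in Python as below. This is an illustrative sketch, not part of the answer key; the function names sample_covariance and sample_correlation are assumptions, and the formulas are the standard sample (n – 1) definitions.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Question 85: eight (x, y) observations.
x85 = [11, 13, 17, 18, 22, 24, 26, 28]
y85 = [35, 32, 26, 25, 20, 17, 11, 10]

# Question 86: income and savings in $ thousands.
income = [50, 51, 52, 55, 56, 58, 60, 62, 65, 66]
savings = [10, 11, 13, 14, 15, 15, 16, 16, 17, 17]

cov85 = sample_covariance(x85, y85)
r85 = sample_correlation(x85, y85)
r86 = sample_correlation(income, savings)

print(f"Q85 covariance = {cov85:.2f}, correlation = {r85:.2f}")
print(f"Q86 correlation = {r86:.2f}")
```

The sign of the covariance gives the direction of the linear relationship, while the correlation rescales it to the range –1 to +1 so that its strength can be judged.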

87. Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 21-24 bin?

ANSWER: 0.15

88. Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency of the 25-28 bin?

ANSWER: 0.10

89. Below are the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year.
56, 42, 37, 29, 45, 51, 30, 25, 34, 57
What is (are) the mode(s) of the number of days that it took Wyche Accounting to perform audits in the last quarter of last year?

ANSWER: None

90. What is (are) the mode(s) of the following data set?
35, 42, 65, 42, 22

ANSWER: 42

91. The difference between the largest and the smallest data values is the __________.

ANSWER: range

92. The Excel function STANDARDIZE can be used to calculate ____________.

ANSWER: z-scores

93. You would __________ a table if you wanted to display only data that match specific criteria.

ANSWER: filter

94. The ___________ measures the variability of the middle 50% of a data set.

ANSWER: IQR

95. Below are the data for African countries. Assess the quality of the data by identifying missing values and sort by GDP.

      A                                    B            C
1     Country                              Continent    GDP (millions of US$)
2     Algeria                              Africa       190,709
3     Angola                               Africa       100,948
4     Bolivia                              Africa       24,604
5     Botswana                             Africa       17,570
6     Cameroon                             Africa
7     Congo, Democratic Republic of the    Africa       15,668
8     Congo, Republic of the               Africa       14,769
9     Côte d'Ivoire                        Africa       24,096
10    Egypt                                Africa       235,719
11    Equatorial Guinea                    Africa
12    Ethiopia                             Africa       31,256
13    Gabon                                Africa       16,176
14    Ghana                                Africa
15    Jordan                               Africa       29,233
16    Kenya                                Africa       34,796
17    Libya                                Africa       36,874
18    Mali                                 Africa
19    Mauritius                            Africa       11,313
20    Morocco                              Africa       99,241
21    Mozambique                           Africa       12,827
22    Namibia                              Africa
23    Nigeria                              Africa       238,920
24    Senegal                              Africa       14,461
25    South Africa                         Africa       408,074
26    Sudan +South Sudan                   Africa       64,750
27    Tanzania                             Africa       23,333
28    Uganda                               Africa       16,810
29    Yemen                                Africa       33,675
30    Zambia                               Africa       19,206

ANSWER:
Country                              Continent    GDP (millions of US$)
Mauritius                            Africa       11,313
Mozambique                           Africa       12,827
Senegal                              Africa       14,461
Congo, Republic of the               Africa       14,769
Congo, Democratic Republic of the    Africa       15,668
Gabon                                Africa       16,176
Uganda                               Africa       16,810
Botswana                             Africa       17,570
Zambia                               Africa       19,206
Tanzania                             Africa       23,333
Côte d'Ivoire                        Africa       24,096
Bolivia                              Africa       24,604
Jordan                               Africa       29,233
Ethiopia                             Africa       31,256
Yemen                                Africa       33,675
Kenya                                Africa       34,796
Libya                                Africa       36,874
Sudan +South Sudan                   Africa       64,750
Morocco                              Africa       99,241
Angola                               Africa       100,948
Algeria                              Africa       190,709
Egypt                                Africa       235,719
Nigeria                              Africa       238,920
South Africa                         Africa       408,074
Cameroon                             Africa
Equatorial Guinea                    Africa
Ghana                                Africa
Mali                                 Africa
Namibia                              Africa
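The two steps in question 95, finding missing values and sorting by GDP, can be sketched in Python as follows. This is an illustrative sketch, not the Excel procedure the question expects; the names GDP_ROWS, missing_countries, and sort_by_gdp are assumptions, None stands for a blank cell, and blanks are placed last to match the answer table.

```python
# (country, GDP in millions of US$) from question 95; None marks a blank cell.
GDP_ROWS = [
    ("Algeria", 190709), ("Angola", 100948), ("Bolivia", 24604),
    ("Botswana", 17570), ("Cameroon", None),
    ("Congo, Democratic Republic of the", 15668),
    ("Congo, Republic of the", 14769), ("Côte d'Ivoire", 24096),
    ("Egypt", 235719), ("Equatorial Guinea", None), ("Ethiopia", 31256),
    ("Gabon", 16176), ("Ghana", None), ("Jordan", 29233), ("Kenya", 34796),
    ("Libya", 36874), ("Mali", None), ("Mauritius", 11313),
    ("Morocco", 99241), ("Mozambique", 12827), ("Namibia", None),
    ("Nigeria", 238920), ("Senegal", 14461), ("South Africa", 408074),
    ("Sudan +South Sudan", 64750), ("Tanzania", 23333), ("Uganda", 16810),
    ("Yemen", 33675), ("Zambia", 19206),
]


def missing_countries(rows):
    """Return the countries whose GDP value is missing."""
    return [country for country, gdp in rows if gdp is None]


def sort_by_gdp(rows):
    """Sort ascending by GDP, keeping rows with missing GDP at the end."""
    # The first key component is False for present values and True for missing
    # ones, so missing rows sort last; the sort is stable, so they keep their
    # original relative order.
    return sorted(rows, key=lambda row: (row[1] is None, row[1] or 0))


missing = missing_countries(GDP_ROWS)
ordered = sort_by_gdp(GDP_ROWS)

print("Missing GDP:", missing)
print("Smallest GDP:", ordered[0])
print("Largest GDP:", ordered[len(ordered) - len(missing) - 1])
```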


Chapter 03 - Data Visualization

Multiple Choice

1. Data-ink is the ink used in a table or chart that _____.
a. does not help in conveying the data to the audience
b. helps in presenting data when the audience need not know exact values
c. is necessary to convey the meaning of the data to the audience
d. increases the non-data-ink ratio
ANSWER: c

2. Deleting the grid lines in a table and the horizontal lines in a chart ______.
a. increases the data-ink ratio
b. decreases the data-ink ratio
c. increases the non-data-ink ratio
d. does not affect the data-ink ratio
ANSWER: a

3. In many cases, white space in a chart can improve _____.
a. complexity
b. readability
c. functionality
d. stability
ANSWER: b

4. Tables should be used instead of charts when _____.
a. the reader needs relative comparisons of data
b. there are more than two columns of data
c. the values being displayed have different units or very different magnitudes
d. the reader need not differentiate the columns and rows
ANSWER: c

5. Which one of the following statements is not true concerning PivotTables in Excel?
a. PivotTables are also known as crosstabulation tables.
b. PivotTables summarize data for two variables.
c. PivotTables are interactive.
d. PivotTables can be built using data arrayed in rows.
ANSWER: d

6. Fields may be chosen to represent all of the following except _____ in the body of a PivotTable.
a. rows
b. columns
c. values
d. filters
ANSWER: d

7. A _____ is a graphical presentation of the relationship between two quantitative variables.
a. histogram
b. bar chart
c. pie chart
d. scatter chart
ANSWER: d

8. _____ are visual methods of displaying data.
a. Tables
b. Charts
c. PivotTables
d. Crosstabs
ANSWER: b

9. The software package most commonly used for creating simple charts is _____.
a. Excel.
b. XLMiner.
c. SAS.
d. R.
ANSWER: a

10. A _____ is a line that provides an approximation of the relationship between the variables.
a. line chart
b. sparkline
c. trendline
d. gridline
ANSWER: c

11. The following image is a _____.
a. sparkline
b. trendline
c. gridline
d. line chart
ANSWER: d

12. DJ needs to display data over time. Which of the following charts should he use?
a. Scatter chart
b. Pie chart
c. Bar chart
d. Line chart
ANSWER: d

13. A time series plot is also known as a _____.
a. boxplot
b. frequency graph
c. dot plot
d. line chart
ANSWER: d

14. A line chart that has no axes but is used to provide information on overall trends for time series data is called a _____.
a. time series plot
b. sparkline
c. trendline
d. bubble chart
ANSWER: b


15. The charts that are helpful in making comparisons between categorical variables are _____.
a. bar charts and scatter charts
b. scatter charts and line charts
c. bar charts and column charts
d. column charts and line charts
ANSWER: c

16. Bar charts use _____.
a. horizontal bars to display the magnitude of the quantitative variable
b. vertical bars to display the magnitude of the quantitative variable
c. horizontal and vertical bars to display the magnitude of the quantitative variable
d. vertical bars to display the magnitude of the categorical variable
ANSWER: a

17. Making visual comparisons between categorical variables may be difficult in a _____.
a. scatter chart
b. pie chart
c. line chart
d. column chart
ANSWER: b

18. Using multiple lines on a line chart or employing multiple charts is an alternative to a _____.
a. column chart
b. line chart
c. two-dimensional graph
d. three-dimensional chart
ANSWER: d

19. A chart that is recommended as an alternative to a pie chart is a _____.
a. bar chart
b. line chart
c. stacked column chart
d. box plot
ANSWER: a

20. In order to visualize three variables in a two-dimensional graph, we use a _____.
a. 2-D chart
b. 3-D chart
c. bubble chart
d. column chart
ANSWER: c

21. A two-dimensional graph representing the data using different shades of color to indicate magnitude is called a _____.
a. heat map
b. bubble chart
c. column chart
d. pie chart
ANSWER: a

22. To avoid problems in interpreting the differences in color in a heat map, _____ can be added.
a. a bubble chart
b. a pie chart
c. a scatter chart
d. sparklines
ANSWER: d

23. An effective display of trend and magnitude is achieved by using a combination of a _____.
a. time series plot and sparklines
b. line chart and trendlines
c. heat map and sparklines
d. bubble chart and trendlines
ANSWER: c

24. A disadvantage of stacked-column charts and stacked-bar charts is that _____.
a. they do not include all the values of the variable
b. they cannot be used to compare relative values of quantitative variables for the same category
c. it can be difficult to perceive small differences in areas
d. they are only used when many quantitative variables need to be displayed
ANSWER: c

25. An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is a _____.
a. stacked bar chart
b. clustered column chart
c. pie chart
d. clustered bar chart
ANSWER: b

26. A useful chart for displaying multiple variables is the _____.
a. stacked column and bar chart
b. scatter chart
c. scatter chart matrix
d. two-dimensional graph
ANSWER: c

27. To generate a scatter chart matrix, we use _____.
a. native Excel functionality
b. Excel Add-In XLMiner
c. Excel Add-In MegaStat
d. all of these
ANSWER: b

28. To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs _____.
a. PivotCharts with PivotTables
b. stacked column charts with PivotTables
c. heat maps with trendlines
d. bubble charts with trendlines
ANSWER: a

29. A PivotChart, in a few instances, is the same as a _____.
a. clustered-column chart
b. bubble chart
c. stacked-column chart
d. bar chart
ANSWER: a

30. The best way to differentiate chart elements is by using _____.
a. colors
b. labels
c. bubbles
d. chart titles
ANSWER: b

31. A _____ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.
a. scatter plot
b. PivotChart
c. column chart
d. parallel-coordinates plot
ANSWER: d

32. A _____ is useful for visualizing hierarchical data along multiple dimensions.
a. heat map
b. hierarchical map
c. treemap
d. map of multiple hierarchy
ANSWER: c

33. _____ merges maps and statistics to present data collected over different geographies.
a. The heat map
b. The geographic information system
c. A geographical map
d. The statistical information system
ANSWER: b

34. A data visualization tool that updates in real time and gives multiple outputs is called _____.
a. a data table
b. a metrics table
c. the GIS
d. a data dashboard
ANSWER: d

35. In a business, the values indicating the business’s current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as _____.
a. company performance indicators
b. performance indicators
c. key performance indicators
d. business performance indicators
ANSWER: c

36. We create multiple dashboards _____.
a. to help the user scroll vertically and horizontally to see the entire dashboard
b. so that each dashboard can be viewed on a single screen
c. to make sure the KPIs are not displayed in the data dashboard
d. so that all dashboards can be viewed on a single screen
ANSWER: b

37. The data dashboard for a marketing manager may have KPIs related to _____.
a. current sales measures and sales by region
b. current financial standing of the company
c. data on the company's call center
d. overall performance of the company’s stock over the previous 52 weeks
ANSWER: a

38. Consider the clustered bar chart of the dashboard developed to monitor the performance of a call center:

This chart allows the IT manager to _____.
a. identify how often a problem is related to hardware
b. identify the frequency of a particular type of problem by location
c. identify which city contains the most customers
d. identify the percent of customers who do not have one of the listed problems
ANSWER: b

39. This Excel bar chart displays the demographics of a Business Analysis class. Approximately how many students are in the class?
a. 175
b. 150
c. 105
d. 130
ANSWER: d

40. This bar chart displays the demographics of a Business Analysis class. How many male students are in the class?
a. 30
b. 50
c. 80
d. 130
ANSWER: b

41. Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?
a. Scatter chart
b. Bubble chart
c. Clustered-column (bar) chart
d. Line chart
ANSWER: c

42. A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a _____.
a. scatter chart
b. bubble chart
c. clustered column chart
d. column chart
ANSWER: d

43. The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?
a. It reduces the data-ink ratio.
b. It increases the data-ink ratio.
c. It doesn't change the data-ink ratio.
d. The data-ink ratio becomes zero.
ANSWER: a

44. Never use a _____ chart when a _____ chart will suffice.
a. 3-D; 2-D


b. 2-D; 3-D
c. color; black and white
d. bar; pie
ANSWER: a

45. Which of the following graphs cannot be used to display categorical data?
a. Stacked-column chart
b. Scatter chart
c. Pie chart
d. Clustered-column chart
ANSWER: b

46. Susan would like to create a graph to display the number of males and females in her class who got an A, B, C, D, and F on the last test. Which of the following graphs could she use?
a. Stacked-column chart
b. Scatter chart
c. Pie chart
d. Heat map
ANSWER: a

Subjective Short Answer

47. The following table is an example of the profit made by Hydro America, a water servicing company, for five different years.

Year      Total revenue ($)    Cost of revenue ($)    Gross profit ($)
Year 1    62723201             26256000               36467201
Year 2    67177612             37026005               30151607
Year 3    72648252             35054123               37594129
Year 4    71225185             35187462               36037723
Year 5    75847373             39298243               36549130

Reformat the table to improve readability and to help the manager identify the year with the highest profit.

ANSWER: To improve the readability of the table, we remove unnecessary gridlines, right align the numerical columns, remove bolded font except for column titles, and add commas to dollar values to ease readability.

Year      Total revenue ($)    Cost of revenue ($)    Gross profit ($)
Year 1           62,723,201             26,256,000          36,467,201
Year 2           67,177,612             37,026,005          30,151,607
Year 3           72,648,252             35,054,123          37,594,129
Year 4           71,225,185             35,187,462          36,037,723
Year 5           75,847,373             39,298,243          36,549,130

It is now easy to identify Year 3 as the year with the highest profit ($37,594,129).

48. Consider the following table and the line chart on the temperatures in 11 different states of the United States.

States            Temperature (degrees F)
Illinois          76
North Carolina    79
Florida           80
Indiana           80
New Jersey        85
Ohio              83
Pennsylvania      86
Texas             90
Virginia          87
Michigan          91
New York          85

a. What are the problems with the layout and display of this line chart?
b. Create a new line chart for the given data. Format the chart to make it easy to read and interpret.

ANSWER:
a. The chart contains unnecessary gridlines, the y-axis label values are spaced close together, and the shading of the chart does not add value.
b.


Small markers are added on the line chart at each data point. 49. The data on the scores obtained by students in five different entrance exams have been collected from 50 colleges and they are provided below. Create a PivotTable in Excel to display the number of students who took each exam and the average score for students in each exam. Exams SAT ACT MCAT GRE GMAT ACT MCAT GRE GMAT SAT GRE GMAT ACT MCAT GRE GMAT SAT GMAT SAT GRE GMAT ACT MCAT GRE ACT

Scores 520 400 580 280 540 356 520 355 480 574 396 450 420 560 297 520 489 500 566 451 460 422 550 310 384

Exams MCAT GRE GMAT SAT GMAT SAT GRE MCAT GRE ACT MCAT GRE GMAT SAT GMAT SAT GMAT ACT MCAT GRE GMAT SAT GMAT SAT GRE

Scores 487 267 455 528 536 469 455 520 489 455 589 500 500 528 480 475 570 480 567 546 544 420 453 510 473

a. Which exam did most students attempt? b. Which exam has the highest average score? c. Use the PivotTable to determine the exam attempted by the student with the highest score. What is the exam attempted by the student with the lowest score? ANSWER:
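The PivotTable summary for question 49 can be cross-checked outside Excel. The following is a minimal sketch in plain Python, not the textbook's method; the variable names are illustrative, and the exam/score pairs are transcribed from the two blocks of the table above in reading order.

```python
from collections import defaultdict

# Exam labels and scores transcribed from question 49
# (left block of 25 rows first, then right block of 25 rows).
exams = (
    "SAT ACT MCAT GRE GMAT ACT MCAT GRE GMAT SAT GRE GMAT ACT MCAT GRE GMAT SAT "
    "GMAT SAT GRE GMAT ACT MCAT GRE ACT "
    "MCAT GRE GMAT SAT GMAT SAT GRE MCAT GRE ACT MCAT GRE GMAT SAT GMAT SAT GMAT "
    "ACT MCAT GRE GMAT SAT GMAT SAT GRE"
).split()
scores = [
    520, 400, 580, 280, 540, 356, 520, 355, 480, 574, 396, 450, 420, 560, 297,
    520, 489, 500, 566, 451, 460, 422, 550, 310, 384,
    487, 267, 455, 528, 536, 469, 455, 520, 489, 455, 589, 500, 500, 528, 480,
    475, 570, 480, 567, 546, 544, 420, 453, 510, 473,
]

# Group scores by exam, which is what the PivotTable row labels do.
by_exam = defaultdict(list)
for exam, score in zip(exams, scores):
    by_exam[exam].append(score)

# One summary row per exam: count, average, max, and min of the scores.
summary = {
    exam: {
        "count": len(vals),
        "average": sum(vals) / len(vals),
        "max": max(vals),
        "min": min(vals),
    }
    for exam, vals in by_exam.items()
}

for exam in sorted(summary):
    row = summary[exam]
    print(f"{exam:5s} count={row['count']:2d} avg={row['average']:6.1f} "
          f"max={row['max']} min={row['min']}")

most_attempted = max(summary, key=lambda e: summary[e]["count"])
highest_average = max(summary, key=lambda e: summary[e]["average"])
exam_of_top_score = max(summary, key=lambda e: summary[e]["max"])
exam_of_low_score = min(summary, key=lambda e: summary[e]["min"])
```

Switching between the "average", "max", and "min" entries mirrors changing the Value Field Settings in the PivotTable.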

a. Most students attempted the GMAT exam. The PivotTable shows that the GMAT exam has the greatest number of students, with 13 students.
b. MCAT has the highest average score of 547 (approx.).
c. By changing the Value Field Settings for Scores from Average to Max, we see that MCAT has the highest score of 589. By changing the Value Field Settings for Scores to Min, we see that GRE has the lowest score of 267.

50. A local search service company surveys the number of service centers available in three major cities for different brands of automobiles, with the objective of improving the services to its customers. The data on the 20 automobile brands and the number of service centers are given below:

Brands          Number of service centers
Audi            38
BMW             42
Mercedes-Benz   49
Rolls-Royce     25
Volkswagen      30
Toyota          30
Jaguar          42
Nissan          35
Ford            41
Fiat            35
Land Rover      34
Chevrolet       29
Ferrari         32
Hyundai         15
Porsche         35
Skoda           42
Tata            20
Honda           23
Renault         40
Subaru          10

a. How many automobile brands have centers between 20 and 29 in these three cities? b. How many automobile brands have more than 40 centers in these cities? ANSWER:

We create a PivotTable to summarize the given data using classes 10-19, 20-29, 30-39, and 40-49.
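The grouping into classes can also be sketched in plain Python as a check on the PivotTable. This is an illustration under the same class width of 10, not the textbook's procedure; the helper name class_label is hypothetical.

```python
from collections import Counter

# Number of service centers per brand, transcribed from question 50.
centers = {
    "Audi": 38, "BMW": 42, "Mercedes-Benz": 49, "Rolls-Royce": 25,
    "Volkswagen": 30, "Toyota": 30, "Jaguar": 42, "Nissan": 35,
    "Ford": 41, "Fiat": 35, "Land Rover": 34, "Chevrolet": 29,
    "Ferrari": 32, "Hyundai": 15, "Porsche": 35, "Skoda": 42,
    "Tata": 20, "Honda": 23, "Renault": 40, "Subaru": 10,
}

def class_label(n, width=10):
    """Return the class a value falls in, e.g. 25 -> '20-29'."""
    low = (n // width) * width
    return f"{low}-{low + width - 1}"

# Frequency count per class, equivalent to grouping in the PivotTable.
freq = Counter(class_label(n) for n in centers.values())
for label in sorted(freq):
    print(label, freq[label])

# The 40-49 class includes the boundary value 40, so it can differ
# from a strict "more than 40" count.
strictly_more_than_40 = sum(1 for n in centers.values() if n > 40)
print("strictly more than 40:", strictly_more_than_40)
```

The class count and the strict count are reported separately because a brand with exactly 40 centers belongs to the 40-49 class but does not have more than 40 centers.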

Use “Number of service centers” as the Columns, and use “Count of Number of service centers” as the Values in the PivotTable. Right-click on the table and use the option Group to obtain the classes. We see that,
a. 4 automobile brands have between 20 and 29 centers in these cities.
b. 6 brands fall in the 40-49 class, that is, they have 40 or more centers. One of these (Renault) has exactly 40, so 5 brands have strictly more than 40 centers.

51. A summary on commodities below lists the change in price on a particular day for each commodity belonging to three categories: Base Metals, Precious Metals, and Agricultural & Cattle Futures.

Commodity Summary
Commodity     Type of Commodity                Price ($)   Change (%)
Aluminum      Base metals                      1700        0.0750
Gold          Precious metals                  1229        –0.2300
Corn          Agricultural & Cattle Futures    400         0.0125
Silver        Precious metals                  1975        –0.1800
Aluminum      Base metals                      1750        –0.1000
Wheat         Agricultural & Cattle Futures    640         –0.0425
Soybeans      Agricultural & Cattle Futures    1300        –0.1250
Copper        Base metals                      7012        –0.1700
Platinum      Precious metals                  1357        –0.1900
Cocoa         Agricultural & Cattle Futures    2.8         0.0000
Coffee        Agricultural & Cattle Futures    109         –0.0085
Lead          Base metals                      2065        –0.1000
White Sugar   Agricultural & Cattle Futures    450         –0.0900
Sugar 11      Agricultural & Cattle Futures    20.19       –0.0087
Nickel        Base metals                      13300       –0.2500
Cotton        Agricultural & Cattle Futures    77.39       –0.0087
Oranges       Agricultural & Cattle Futures    139.62      –0.0040
Tin           Base metals                      22600       0.3000
Palladium     Precious metals                  717         –0.0700
Palm oil      Agricultural & Cattle Futures    930         0.0500
Zinc          Base metals                      1800        –0.0100

a. Prepare a PivotTable that gives the frequency count of the data by Commodity Type (rows) and the Change (columns). Use classes of –0.25 to –0.15, –0.15 to –0.05, –0.05 to 0.05, 0.05 to 0.15, 0.15 to 0.25, and 0.25 to 0.35 for the Change (%). b. What conclusions can you draw about the commodity type and the change (%) in price for that particular day? ANSWER:

a.

b. The Precious metals commodities had the lowest change (%) in price for that particular day, and the Base metals had varied changes between –0.25% and 0.35%. No commodity in the Agricultural & Cattle Futures or Precious metals categories had a change greater than 0.25%.

52. The income levels vary by race and educational attainment. To examine this inequality in income, data have been collected for seven different years on the median income earned by an individual based on his or her race and education.

Year   Racial Demographic   Educational attainment   Median Income
2003   White                High School Graduate     $33,405
2003   White                Some college             $40,325
2003   White                Bachelor's degree        $55,225
2003   White                Master's degree          $67,295
2003   White                Doctorate degree         $77,521
2003   Black                High School Graduate     $25,741
2003   Black                Some college             $30,517
2003   Black                Bachelor's degree        $46,851
2003   Black                Master's degree          $52,106
2003   Black                Doctorate degree         $61,523
2003   Asian                High School Graduate     $33,654
2003   Asian                Some college             $25,749
2003   Asian                Bachelor's degree        $51,752
2003   Asian                Master's degree          $70,519
2003   Asian                Doctorate degree         $81,760
2003   Hispanic             High School Graduate     $22,547
2003   Hispanic             Some college             $25,403
2003   Hispanic             Bachelor's degree        $35,482
2003   Hispanic             Master's degree          $45,207
2003   Hispanic             Doctorate degree         $56,217
2000   White                High School Graduate     $32,451
2000   White                Some college             $39,410
2000   White                Bachelor's degree        $53,178
2000   White                Master's degree          $65,147
2000   White                Doctorate degree         $75,120
2000   Black                High School Graduate     $23,874
2000   Black                Some college             $29,415
2000   Black                Bachelor's degree        $42,013
2000   Black                Master's degree          $50,321
2000   Black                Doctorate degree         $60,741
2000   Asian                High School Graduate     $32,185
2000   Asian                Some college             $24,961
2000   Asian                Bachelor's degree        $50,102
2000   Asian                Master's degree          $75,410
2000   Asian                Doctorate degree         $80,164
2000   Hispanic             High School Graduate     $23,784
2000   Hispanic             Some college             $25,640
2000   Hispanic             Bachelor's degree        $32,654
2000   Hispanic             Master's degree          $44,891
2000   Hispanic             Doctorate degree         $55,617
1997   White                High School Graduate     $31,048
1997   White                Some college             $38,497
1997   White                Bachelor's degree        $52,179
1997   White                Master's degree          $62,498
1997   White                Doctorate degree         $74,614
1997   Black                High School Graduate     $22,981
1997   Black                Some college             $26,479
1997   Black                Bachelor's degree        $43,578
1997   Black                Master's degree          $48,521
1997   Black                Doctorate degree         $58,462
1997   Asian                High School Graduate     $30,148
1997   Asian                Some college             $23,647



1997   Asian                Bachelor's degree        $49,521
1997   Asian                Master's degree          $72,149
1997   Asian                Doctorate degree         $80,149
1997   Hispanic             High School Graduate     $22,156
1997   Hispanic             Some college             $23,641
1997   Hispanic             Bachelor's degree        $31,560
1997   Hispanic             Master's degree          $43,297
1997   Hispanic             Doctorate degree         $53,189
1994   White                High School Graduate     $33,405
1994   White                Some college             $40,325
1994   White                Bachelor's degree        $55,225
1994   White                Master's degree          $67,295
1994   White                Doctorate degree         $77,521
1994   Black                High School Graduate     $25,741
1994   Black                Some college             $30,517
1994   Black                Bachelor's degree        $46,851
1994   Black                Master's degree          $52,106
1994   Black                Doctorate degree         $61,523
1994   Asian                High School Graduate     $33,654
1994   Asian                Some college             $25,749
1994   Asian                Bachelor's degree        $51,752
1994   Asian                Master's degree          $70,519
1994   Asian                Doctorate degree         $81,760
1994   Hispanic             High School Graduate     $22,547
1994   Hispanic             Some college             $25,403
1994   Hispanic             Bachelor's degree        $35,482
1994   Hispanic             Master's degree          $45,207
1994   Hispanic             Doctorate degree         $56,217
1991   White                High School Graduate     $32,451
1991   White                Some college             $39,410
1991   White                Bachelor's degree        $53,178
1991   White                Master's degree          $65,147
1991   White                Doctorate degree         $75,120
1991   Black                High School Graduate     $23,874
1991   Black                Some college             $29,415
1991   Black                Bachelor's degree        $42,013
1991   Black                Master's degree          $50,321
1991   Black                Doctorate degree         $60,741
1991   Asian                High School Graduate     $32,185
1991   Asian                Some college             $24,961
1991   Asian                Bachelor's degree        $50,102
1991   Asian                Master's degree          $75,410
1991   Asian                Doctorate degree         $80,164
1991   Hispanic             High School Graduate     $23,784
1991   Hispanic             Some college             $25,640
1991   Hispanic             Bachelor's degree        $32,654
1991   Hispanic             Master's degree          $44,891
1991   Hispanic             Doctorate degree         $55,617
1988   White                High School Graduate     $31,048



1988   White                Some college             $38,497
1988   White                Bachelor's degree        $52,179
1988   White                Master's degree          $62,498
1988   White                Doctorate degree         $74,614
1988   Black                High School Graduate     $22,981
1988   Black                Some college             $26,479
1988   Black                Bachelor's degree        $43,578
1988   Black                Master's degree          $48,521
1988   Black                Doctorate degree         $58,462
1988   Asian                High School Graduate     $30,148
1988   Asian                Some college             $23,647
1988   Asian                Bachelor's degree        $49,521
1988   Asian                Master's degree          $72,149
1988   Asian                Doctorate degree         $80,149
1988   Hispanic             High School Graduate     $22,156
1988   Hispanic             Some college             $23,641
1988   Hispanic             Bachelor's degree        $31,560
1988   Hispanic             Master's degree          $43,297
1988   Hispanic             Doctorate degree         $53,189
1985   White                High School Graduate     $30,178
1985   White                Some college             $36,479
1985   White                Bachelor's degree        $50,341
1985   White                Master's degree          $60,278
1985   White                Doctorate degree         $72,369
1985   Black                High School Graduate     $20,149
1985   Black                Some college             $25,874
1985   Black                Bachelor's degree        $42,987
1985   Black                Master's degree          $42,687
1985   Black                Doctorate degree         $55,649
1985   Asian                High School Graduate     $29,741
1985   Asian                Some college             $22,648
1985   Asian                Bachelor's degree        $45,321
1985   Asian                Master's degree          $70,561
1985   Asian                Doctorate degree         $75,219
1985   Hispanic             High School Graduate     $20,498
1985   Hispanic             Some college             $22,647
1985   Hispanic             Bachelor's degree        $30,489
1985   Hispanic             Master's degree          $40,089
1985   Hispanic             Doctorate degree         $52,641

a. Sort the PivotTable data to display the years with the smallest sum of median income on top and the largest on the bottom. Which year had the smallest sum of median income? What is the total income in the year with the smallest sum of median income?
b. Add the Racial Demographic to the Row Labels in the PivotTable. Sort the Racial Demographic by Sum of Median Income with the lowest values on top and the highest values on bottom. Filter the Row Labels so that only the year 2003 is displayed. Which racial demographic had the smallest sum of median income in the year 2003? Which racial demographic had the largest sum of median income in the year 2003?
ANSWER: To sort data in a PivotTable in Excel, right-click any cell in the PivotTable that contains the data to be sorted and select Sort.
a.

The year 1985 had the smallest sum of median income with $846,845. b.
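For part (b), the filtered and sorted PivotTable can be cross-checked with a short sketch in plain Python. It uses only the 2003 rows, transcribed from the table in question 52 in the order High School Graduate, Some college, Bachelor's, Master's, Doctorate; the dictionary layout is an illustrative choice, not part of the source.

```python
# Median income for 2003 by racial demographic, one value per
# educational attainment level, transcribed from question 52.
income_2003 = {
    "White":    [33405, 40325, 55225, 67295, 77521],
    "Black":    [25741, 30517, 46851, 52106, 61523],
    "Asian":    [33654, 25749, 51752, 70519, 81760],
    "Hispanic": [22547, 25403, 35482, 45207, 56217],
}

# Sum of Median Income per group, as in the PivotTable values area.
totals = {group: sum(values) for group, values in income_2003.items()}

# Sort ascending so the lowest total is on top and the highest on bottom.
ranked = sorted(totals.items(), key=lambda item: item[1])
for group, total in ranked:
    print(f"{group:9s} ${total:,}")

lowest_group = ranked[0][0]
highest_group = ranked[-1][0]
```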

Hispanics had the lowest sum of median income and Whites had the highest sum of median income in the year 2003.

53. Consider a study on the number of accidents that occurred in ten U.S. states in different cities for three consecutive years. Create a PivotTable in Excel to answer the following questions. The PivotTable should group the number of accidents into yearly bins and display the sum of accidents that occurred each year in columns of Excel. Row labels should include the accident locations and allow for grouping the locations into states or viewing by city. You should also sort the PivotTable so that the states with the greatest number of accidents between 2011 and 2013 appear at the top of the PivotTable.

State   City                 Number of accidents   Year
GA      Rock Spring          52                    2011
GA      Doraville            44                    2011
GA      Ellaville            67                    2011
FL      Jacksonville         53                    2011
GA      Stockbridge          72                    2011
FL      Belleview            63                    2012
AZ      Phoenix              69                    2011
FL      Crestview            51                    2012
IA      Johnston             48                    2012
GA      Rockmart             44                    2012
CO      Greenwood Village    53                    2011
GA      Jonesboro            54                    2011
GA      Decatur              76                    2013
FL      Clearwater           76                    2013
GA      Gray                 57                    2012
CA      Nevada City          76                    2013
FL      Milton               61                    2011
GA      Woodstock            78                    2013
GA      Cumming              70                    2012
GA      Statesboro           47                    2013
FL      Palm Beach           42                    2011
CO      Greeley              60                    2012
FL      Sarasota             40                    2011
FL      Apollo Beach         75                    2011
AZ      Prescott             40                    2012
FL      Port St. Lucie       61                    2012
GA      Stockbridge          78                    2012
GA      Atlanta              60                    2011
CO      Windsor              43                    2013
CO      Castle Rock          55                    2011
GA      Clayton              58                    2011
FL      Tampa                58                    2011
GA      Jackson              63                    2012
GA      Franklin             64                    2012
GA      Macon                71                    2011
FL      Cocoa Beach          76                    2011
GA      Valdosta             76                    2011
GA      Dallas               64                    2012
FL      Brooksville          41                    2013
FL      Winter Park          41                    2011
AL      Birmingham           74                    2013
AL      Birmingham           78                    2011
GA      East Ellijay         80                    2013
GA      Cartersville         66                    2011


CA      San Luis Obispo      57                    2011
CA      Napa                 65                    2012
GA      Springfield          80                    2013
GA      Clarkesville         60                    2012
CA      Palm Springs         69                    2011
FL      Port Orange          77                    2013
GA      Watkinsville         48                    2012
GA      Roswell              65                    2013
CO      Louisville           62                    2012
CO      Denver               80                    2013
GA      McDonough            54                    2012
GA      Brunswick            48                    2013
AZ      Scottsdale           75                    2011
FL      Orlando              41                    2013
AR      Batesville           60                    2011
GA      Atlanta              49                    2012
GA      McCaysville          54                    2011
GA      Dawsonville          62                    2013
FL      Coral Gables         53                    2011
FL      Carrabelle           49                    2012
AZ      Scottsdale           42                    2012
GA      Vidalia              66                    2013
GA      Tifton               73                    2011
CA      Westminster          68                    2011
CA      Woodland Hills       72                    2013
AZ      Scottsdale           79                    2013
GA      Barnesville          54                    2011
GA      Gordon               47                    2013
FL      Tampa                62                    2012
FL      Jacksonville         68                    2013
FL      Crawfordville        56                    2013
FL      Ponte Vedra Beach    50                    2013
GA      Winder               61                    2012
GA      Douglasville         51                    2011
GA      Ellijay              40                    2011
FL      Bradenton            43                    2011
CA      Sonoma               67                    2012
CA      Solvang              65                    2013
CA      Chico                61                    2013
CA      Stockton             61                    2011
FL      Ocala                56                    2011
FL      Bartow               61                    2011
FL      Panama City Beach    45                    2012
FL      Port Saint Joe       68                    2011
GA      Acworth              42                    2013
GA      Jasper               48                    2011
FL      Lantana              72                    2012
FL      Clewiston            61                    2011
FL      Aventura             57                    2012



FL      Miami                67                    2012
GA      Savannah             46                    2012
FL      Englewood            63                    2012
CA      Granite Bay          64                    2012
FL      Tampa                60                    2013
FL      Naples               58                    2011
FL      Fort Lauderdale      57                    2012
GA      Saint Marys          62                    2013
CA      San Diego            61                    2012
AZ      Mesa                 70                    2012
FL      Bonifay              47                    2011
CA      San Rafael           77                    2013
CA      Oakland              57                    2013
FL      Fort Pierce          48                    2011
FL      Clermont             49                    2013
FL      Palatka              70                    2011
AZ      Phoenix              52                    2013
GA      Cartersville         41                    2012
FL      Key West             54                    2013
GA      Carrollton           65                    2012
AL      Fort Deposit         68                    2013
GA      Hiawassee            58                    2013
GA      Ellijay              61                    2012
GA      Duluth               47                    2012
FL      Orlando              78                    2011
FL      Boca Raton           45                    2012
CA      La Jolla             45                    2011
FL      Marco Island         78                    2012
CA      Los Angeles          61                    2011
GA      Cornelia             75                    2011
FL      Immokalee            55                    2011
GA      Carrollton           56                    2013
FL      Miami                51                    2011
CA      Santa Monica         54                    2011
CA      La Jolla             40                    2013
AL      Irondale             80                    2012
FL      Panama City          76                    2012
GA      Atlanta              64                    2013
AZ      Mesa                 41                    2011
FL      Miami                75                    2013
GA      Reidsville           77                    2011
GA      Norcross             52                    2013
GA      Alpharetta           54                    2011
GA      Alpharetta           58                    2012
FL      Boca Raton           78                    2011
CA      Newport Beach        55                    2012
CA      Pasadena             59                    2012
AR      Bentonville          67                    2012
GA      Alpharetta           50                    2013



CA      San Francisco        41                    2013
CA      Los Angeles          80                    2012
CA      San Diego            57                    2013
AZ      Phoenix              67                    2012
FL      Bradenton            77                    2012
FL      Naples               59                    2013
GA      Lawrenceville        58                    2012
FL      Ocala                51                    2013
CA      Bakersfield          68                    2011
CO      Pueblo               47                    2011
AZ      Flagstaff            42                    2013
IA      Sioux City           53                    2013
CA      Ventura              46                    2012
AL      Birmingham           58                    2012
GA      Newnan               45                    2013
AZ      Gilbert              40                    2011
AL      Montgomery           64                    2011
FL      Venice               71                    2013
FL      Sarasota             61                    2013
FL      Jupiter              66                    2012
GA      Gray                 69                    2012
GA      Perry                72                    2012
GA      Macon                40                    2012
GA      Woodstock            78                    2012
GA      Suwanee              58                    2011
CA      Temecula             45                    2012
CA      Rancho Cucamonga     57                    2013
GA      Winder               77                    2013
CA      Los Angeles          66                    2013
CA      Irvine               74                    2013
GA      Newnan               63                    2011
GA      Villa Rica           67                    2012
GA      Fayetteville         55                    2013
FL      Coral Gables         75                    2013
CA      Calabasas            42                    2012
GA      Kennesaw             63                    2012
CO      Greeley              52                    2013
CO      Colorado Springs     79                    2013
GA      Stockbridge          47                    2013
GA      Commerce             65                    2012
FL      Cape Coral           69                    2013
CA      Merced               52                    2011
CA      Culver City          55                    2011
GA      McDonough            63                    2013
CA      Redlands             48                    2012
GA      Duluth               51                    2013
GA      Jackson              68                    2013
CA      Pomona               41                    2011
CA      Newport Beach        61                    2011



GA      Loganville           68                    2012
FL      Bradenton            44                    2013
FL      Tallahassee          60                    2012
CA      Torrance             42                    2013
CT      Stamford             70                    2011
AR      Gravette             44                    2013
HI      Honolulu             70                    2011

a. Which state had the greatest number of accidents between 2011 and 2013? b. How many accidents occurred in the state of Colorado (CO) in 2012? In what cities did these accidents occur? c. Use the PivotTable’s filter capability to view only the accidents in Alabama (AL), Arizona (AZ), and Arkansas (AR) for the years 2011 through 2013. What is the total number of accidents in these states between 2011 and 2013? d. Create a PivotChart to display a column chart that shows the total number of accidents in each year 2011 through 2013 in the state of California. Adjust the formatting of this column chart so that it best conveys the data. What does this column chart suggest about accidents between 2011 and 2013 in California? Discuss. Hint: You may have to switch the row and column labels in the PivotChart to get the best presentation for your PivotChart. ANSWER: a.

Georgia (GA) had the greatest number of accidents between 2011 and 2013.


b. The state of Colorado had 122 accidents in the year 2012, in the cities of Greeley and Louisville. c.
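As a cross-check on parts (b) and (c), the filtering can be sketched in plain Python. The sketch below uses only the rows for Colorado, Alabama, Arizona, and Arkansas, transcribed by hand from the table in question 53; pairing the state, city, count, and year values row by row is an assumption of this sketch, and the helper name total_accidents is hypothetical.

```python
# (state, city, accidents, year) rows for CO, AL, AZ, and AR only,
# transcribed from the question 53 table. Other states are omitted.
ROWS = [
    ("CO", "Greenwood Village", 53, 2011), ("CO", "Greeley", 60, 2012),
    ("CO", "Windsor", 43, 2013), ("CO", "Castle Rock", 55, 2011),
    ("CO", "Louisville", 62, 2012), ("CO", "Denver", 80, 2013),
    ("CO", "Pueblo", 47, 2011), ("CO", "Greeley", 52, 2013),
    ("CO", "Colorado Springs", 79, 2013),
    ("AL", "Birmingham", 74, 2013), ("AL", "Birmingham", 78, 2011),
    ("AL", "Fort Deposit", 68, 2013), ("AL", "Irondale", 80, 2012),
    ("AL", "Birmingham", 58, 2012), ("AL", "Montgomery", 64, 2011),
    ("AZ", "Phoenix", 69, 2011), ("AZ", "Prescott", 40, 2012),
    ("AZ", "Scottsdale", 75, 2011), ("AZ", "Scottsdale", 42, 2012),
    ("AZ", "Scottsdale", 79, 2013), ("AZ", "Mesa", 70, 2012),
    ("AZ", "Phoenix", 52, 2013), ("AZ", "Mesa", 41, 2011),
    ("AZ", "Phoenix", 67, 2012), ("AZ", "Flagstaff", 42, 2013),
    ("AZ", "Gilbert", 40, 2011),
    ("AR", "Batesville", 60, 2011), ("AR", "Bentonville", 67, 2012),
    ("AR", "Gravette", 44, 2013),
]

def total_accidents(rows, states, years=(2011, 2012, 2013)):
    """Sum accidents for the given states and years (a PivotTable filter)."""
    return sum(n for state, _, n, year in rows
               if state in states and year in years)

# Part (b): Colorado in 2012, with the cities involved.
co_2012_total = total_accidents(ROWS, {"CO"}, years=(2012,))
co_2012_cities = sorted(city for state, city, _, year in ROWS
                        if state == "CO" and year == 2012)

# Part (c): Alabama, Arizona, and Arkansas across all three years.
al_az_ar_total = total_accidents(ROWS, {"AL", "AZ", "AR"})

print(co_2012_total, co_2012_cities)
print(al_az_ar_total)
```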

There were 1210 accidents between the years 2011 and 2013 in Alabama (AL), Arizona (AZ), and Arkansas (AR). d.


The number of accidents in California increased in 2013 compared with the previous two years.

54. The data on the distance walked per week by 20 people of different age groups are given in the table below.

Age   Distance walked/week
18    25
20    22
21    20
25    23
26    18
29    15
38    19
34    16
42    14
23    21
32    24
45    13
50    11
53    9
44    10
19    28
28    26
35    17
49    12
27    27

a. Create a scatter chart for these 20 observations.
b. Fit a linear trendline to the 20 observations. What can you say about the relationship between the two quantitative variables?
ANSWER:
a.


b.
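The linear trendline that Excel draws is an ordinary least-squares line. A minimal sketch of that fit in plain Python, using the 20 observations from question 54, is given below; the function name fit_line is illustrative and this is not Excel's own implementation.

```python
# Age and distance walked per week, transcribed from question 54.
ages = [18, 20, 21, 25, 26, 29, 38, 34, 42, 23,
        32, 45, 50, 53, 44, 19, 28, 35, 49, 27]
distances = [25, 22, 20, 23, 18, 15, 19, 16, 14, 21,
             24, 13, 11, 9, 10, 28, 26, 17, 12, 27]

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ages, distances)
print(f"distance = {slope:.3f} * age + {intercept:.3f}")

# A negative slope means distance walked tends to fall as age rises.
relationship = "negative" if slope < 0 else "positive or flat"
print("relationship:", relationship)
```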

There appears to be a negative linear relationship between the age and the distance walked.

55. Consider the following data on 30 different investments and their maturity values after 15 years.

Investment ($)   Future value ($)
1500             3119
2000             4158
2200             4574
2480             5156
2850             5925
3250             6757
3560             7401
3890             8088
4180             8690
4390             9127
4550             9460
4800             9979
5150             10,707
5320             11,060
5510             11,455
5760             11,975
6140             12,765
6300             13,098
6480             13,472
6590             13,701
6712             13,954
6900             14,345
7110             14,782
7480             15,551
7590             15,780
7670             15,946
7700             16,008
7840             16,299
8010             16,653
8500             17,671

a. Prepare a scatter diagram to show the relationship between the variables Investment and Future value. Comment on any relationship between the variables. b. Create a trendline for the relationship between Investment and Future value. What does the trendline indicate about this relationship? ANSWER:

a.

There appears to be a positive linear relationship between Investment and Future value. As Investment increases, Future value also increases. b.
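For part (b), one quick numerical check of why a trendline fits these data so closely is to look at the ratio of Future value to Investment. The sketch below is an illustration, not the textbook's method. The implied annual rate it prints is an inference that assumes annual compounding over the 15 years; the source does not state a rate.

```python
# Investment and future value pairs, transcribed from question 55.
investments = [1500, 2000, 2200, 2480, 2850, 3250, 3560, 3890, 4180, 4390,
               4550, 4800, 5150, 5320, 5510, 5760, 6140, 6300, 6480, 6590,
               6712, 6900, 7110, 7480, 7590, 7670, 7700, 7840, 8010, 8500]
future_values = [3119, 4158, 4574, 5156, 5925, 6757, 7401, 8088, 8690, 9127,
                 9460, 9979, 10707, 11060, 11455, 11975, 12765, 13098, 13472,
                 13701, 13954, 14345, 14782, 15551, 15780, 15946, 16008,
                 16299, 16653, 17671]

# If the ratio is nearly constant, the points lie close to a straight
# line through the origin, so a linear trendline fits almost exactly.
ratios = [fv / inv for inv, fv in zip(investments, future_values)]
print(f"min ratio {min(ratios):.4f}, max ratio {max(ratios):.4f}")

# Assumption: annual compounding over 15 years. The implied rate is an
# inference from the data, not a figure given in the question.
mean_ratio = sum(ratios) / len(ratios)
implied_rate = mean_ratio ** (1 / 15) - 1
print(f"implied annual rate under that assumption: {implied_rate:.4f}")
```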


The trendline confirms that there is a positive linear trend between Investment and Future value.

56. A survey on the average pass percentage achieved by four of the top-ranked colleges of a city for five different years was conducted to rate the quality of teaching in each of these colleges.

Colleges    Year 1   Year 2   Year 3   Year 4   Year 5
College 1   65       67       63       68       70
College 2   70       75       77       82       75
College 3   88       95       90       97       98
College 4   55       57       53       59       55

a. Construct a line chart for the time series data for years 1 through 5 showing the average pass percentage in each college. Show the time series for all four colleges on the same graph.
b. What does the line chart indicate about the average pass percentage of the colleges between years 1 through 5? Discuss.
c. Construct a clustered column chart showing average pass percentage in each college using the years 1 through 5 data. Represent the years along the horizontal axis, and cluster the average pass percentages for the four colleges in each year. Which college is leading in each year?
ANSWER:

a.
b. College 3 has the highest average pass percentage between years 1 through 5 followed by College 2, College 1, and College 4. This performance has been consistent throughout the five years.
c.

We observe that College 3 is leading in each year consistently.

57. Growth is the primary focus for all companies. A key factor in analyzing the growth of a company is the number of resources/employees working for the company over a period of time. One such study about a start-up company’s growth in terms of the increase in the number of employees per month in a span of two years is shown below.

Month   Number of employees
1       40
2       48
3       50
4       52
5       49
6       54
7       57
8       53
9       60
10      64
11      68
12      70
13      73
14      76
15      72
16      75
17      79
18      80
19      84
20      82
21      86
22      89
23      94
24      100

a. Create a line chart for these time series data. What interpretations can you make about the increase in the number of employees over these 24 months?
b. Fit a linear trendline to the data. What does the trendline indicate about the increase in the number of employees over these 24 months?
ANSWER:
a.

The number of employees changed only slightly during the first 9 months. It then increased rapidly through month 14, fell in month 15, and increased again through month 24. Overall, there was an increase in the number of employees over the 24 months. b.

The trendline confirms that there is an overall increasing linear trend in the number of employees over these 24 months.

58. The data on the runs scored in a match by the top five players of a cricket team are given below.

Players    Runs Scored
Player 1   42
Player 2   35
Player 3   53
Player 4   29
Player 5   39

a. Create a column chart to display the information in the table above. Format the column chart to best display the data by adding axes labels, a chart title, etc. b. Sort the values in Excel so that the column chart is ordered from most runs scored to fewest. c. Insert data labels to display the runs scored by each player above the columns in the column chart obtained in part (b). ANSWER: a.

b. Sorting can be done by selecting the data in Excel and then using the Sort function in the Sort & Filter group under the DATA tab.

c. Data labels can be added by right-clicking on one of the columns in the chart and selecting Add Data Labels.
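The sorting and labeling in parts (b) and (c) can be sketched outside Excel as well. The following plain-Python illustration orders the players from most to fewest runs and prints a simple text bar with a data label for each; the layout is an illustrative choice, not a reproduction of the Excel chart.

```python
# Runs scored by each player, transcribed from question 58.
runs = {"Player 1": 42, "Player 2": 35, "Player 3": 53,
        "Player 4": 29, "Player 5": 39}

# Part (b): order from most runs scored to fewest.
ranked = sorted(runs.items(), key=lambda item: item[1], reverse=True)

# Part (c): print one bar per player with the value as a data label.
total_runs = sum(runs.values())
for player, scored in ranked:
    share = 100 * scored / total_runs
    print(f"{player:9s} {'#' * scored} {scored} ({share:.1f}%)")

ranked_players = [player for player, _ in ranked]
```

The percentage shown next to each label is the player's share of the combined runs.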


59. The total number of runs scored by the top five players in a cricket match is 198. The following pie chart shows the percentage of runs scored by each player.

a. What are the problems with using a pie chart to display these data? b. What type of chart would be preferred for displaying the data in this pie chart? c. Use a different type of chart to display the percentage of runs scored by each player that conveys the data better than the pie chart. Format the chart and add data labels to improve the chart’s readability. ANSWER: a. In the pie chart, it is difficult to perceive differences in area. It can also be difficult to distinguish the different colors in the pie chart. Finally, it takes a lot of work for the reader to match the players to the different pieces of the pie chart. b. A sorted column or bar chart would be preferable to display the data in this pie chart. c.


60. A research study was conducted on a sample of 1000 males and 1000 females to study the kind of movie most men and women prefer to watch. The results are shown in the table below.

Movie Type   Male   Female
Action       294    226
Comedy       264    276
Horror       237    200
Romance      205    298

a. Construct a clustered column chart with the type of movie as the horizontal variable.
b. What can we infer from the clustered column chart in part (a)?
ANSWER:
a.

b. From the chart, we observe that action movies are the most popular choice among men and romantic movies are the most popular choice among women. However, the preferences for comedy movies are almost evenly distributed across the genders, and horror movies are preferred by more men than women.

61. Consider the following survey results regarding marital status by age.

Age Category   Never Married (%)   Married (%)   Divorced (%)
18-24          49                  35            16
25-34          44                  35            21
35-44          28                  45            27
45-54          22                  58            20

a. Construct a stacked-column chart to display the survey data on marital status. Use Age Category as the variable on the horizontal axis. b. Construct a clustered-column chart to display the survey data. Use Age Category as the variable on the horizontal axis. c. What can you infer about the relationship between age and marital status from the column charts in parts (a) and (b)? Which column chart (stacked or clustered) is best for interpreting this relationship? Why? ANSWER: a.

b.


c. Younger respondents are more likely to be never married, and older respondents are more likely to be married. The clustered-column chart makes it easier to compare the relative percent values within an age category. The percentage of respondents who are never married is high in the age groups 18-24 and 25-34. The percentage of respondents who are married is high in the age group 45-54 and who are divorced is high in the age group 35-44.

62. The regional manager of a company wishes to determine the time spent at each division in the car production process. A study was undertaken over a month that resulted in the following data related to the percentage of time spent at three divisions (car body construction, paint shop, and assembly) at four locations of production plants.

Production Plants   Car Body Construction (%)   Paint Shop (%)   Assembly (%)
Michigan            35                          45               20
Kentucky            37                          41               22
Illinois            33                          39               28
Ohio                36                          40               24

a. Create a stacked-bar chart with production plants along the vertical axis. Reformat the bar chart to best display these data by adding required labels and chart title.
b. Create a clustered-bar chart with production plants along the vertical axis and clusters of divisions. Reformat the bar chart to best display these data by adding required labels and chart title.
c. Create multiple bar charts where each production plant becomes a single bar chart showing the percentage of time spent at the divisions. Reformat the bar charts to best display these data by adding required labels and chart title.
d. Which form of bar chart (stacked, clustered, or multiple) is preferable for these data? Why?
ANSWER:
a.


b.

c.


d. Neither the stacked nor the clustered-bar chart helps in making relative comparisons when there are many quantitative variables within each category, so the individual bar charts are generally preferred. However, the clustered-bar chart may make comparisons between the production plants easier, so it may be preferred in this case.

63. A consumer electronics company, three months after launching five new products in the market, arrived at the following results.

Products   Profit (%)   Market share (%)   Cost ($)
A          19           18                 4500
B          28           12                 3000
C          15           25                 8750
D          22           35                 6250
E          16           10                 2500

a. Create a bubble chart where the market share is along the horizontal axis, the profit is on the vertical axis, and the size of the bubbles represents the cost. Format this chart for best presentation by adding axes labels and labeling each bubble with the product name.
b. The manager of the company is interested in producing the product that increases the profit for a given level of market share and cost. From the bubble chart in part (a), identify the product which needs to be produced in larger quantity.
c. From the bubble chart in part (a), now identify the product which needs to be produced in larger quantity taking into account its market share, cost, and increase in profit.
ANSWER:
a.


b. Product B makes the highest profit and, hence, it needs to be produced in larger quantity for the given level of market share and cost.
c. Product D can be produced in larger quantity when its market share, the profit, and its cost are taken into consideration.

64. The project lead in an MNC decides to assign every member of his team to a new project and monitors their performance on a customized scale of scores. The data on their performance over a period of six months are shown below.

Performance Scores
Team members   Jan   Feb   Mar   Apr   May   Jun
1              4     5     2     3     –1    3
2              2     2     3     1     4     4
3              –1    4     4     4     5     1
4              5     1     4     2     5     3
5              2     2     1     5     4     4
6              4     1     2     1     –1    4
7              1     5     –1    2     5     1
8              1     2     5     5     4     2
9              4     5     3     4     2     2
10             4     5     2     2     –1    5
11             5     –1    5     1     2     2
12             3     2     –1    –1    1     2
13             –1    1     4     –1    4     5
14             2     3     3     2     –1    4
15             5     –1    5     5     1     1
16             5     2     1     5     2     –1
17             5     –1    4     –1    2     1
18             3     4     –1    5     2     4
19             2     3     5     1     3     1
20             4     4     3     1     –1    5
21             3     1     1     4     –1    4
22             3     1     5     5     4     2
23             2     2     –1    3     2     –1
24             1     2     4     4     4     3
25             3     5     –1    2     5     4

a. Create a heat map in Excel that shades the cells with negative performance scores. Use Excel’s Conditional Formatting function to create this heat map. b. For each month, identify the team members who scored negative. Which month has the most negative performance scores? ANSWER:

a.
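The rule behind this heat map, shading every cell whose score is negative, can be sketched in plain Python. The sketch below is an illustration and not Excel's Conditional Formatting itself; scores are transcribed from question 64 with one row per team member (1 through 25) and one column per month.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Row i holds the six monthly scores of team member i + 1.
SCORES = [
    [4, 5, 2, 3, -1, 3], [2, 2, 3, 1, 4, 4], [-1, 4, 4, 4, 5, 1],
    [5, 1, 4, 2, 5, 3], [2, 2, 1, 5, 4, 4], [4, 1, 2, 1, -1, 4],
    [1, 5, -1, 2, 5, 1], [1, 2, 5, 5, 4, 2], [4, 5, 3, 4, 2, 2],
    [4, 5, 2, 2, -1, 5], [5, -1, 5, 1, 2, 2], [3, 2, -1, -1, 1, 2],
    [-1, 1, 4, -1, 4, 5], [2, 3, 3, 2, -1, 4], [5, -1, 5, 5, 1, 1],
    [5, 2, 1, 5, 2, -1], [5, -1, 4, -1, 2, 1], [3, 4, -1, 5, 2, 4],
    [2, 3, 5, 1, 3, 1], [4, 4, 3, 1, -1, 5], [3, 1, 1, 4, -1, 4],
    [3, 1, 5, 5, 4, 2], [2, 2, -1, 3, 2, -1], [1, 2, 4, 4, 4, 3],
    [3, 5, -1, 2, 5, 4],
]

# Text rendering of the heat map: negative cells are marked with "X".
for member, row in enumerate(SCORES, start=1):
    cells = " ".join("X" if score < 0 else "." for score in row)
    print(f"{member:2d} {cells}")

# For each month, list the team members with a negative score.
negatives = {
    month: [member for member, row in enumerate(SCORES, start=1)
            if row[col] < 0]
    for col, month in enumerate(MONTHS)
}
worst_month = max(negatives, key=lambda m: len(negatives[m]))
print(negatives)
print("month with the most negative scores:", worst_month)
```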

b. January: Team members 3 and 13. February: Team members 11, 15, and 17. March: Team members 7, 12, 18, 23, and 25. April: Team members 12, 13, and 17. May: Team members 1, 6, 10, 14, 20, and 21. June: Team members 16 and 23. We observe that the greatest number of team members scored negative in the month of May.

65. The following table shows the average monthly distance travelled (in billions of miles) by vehicles on urban highways for five different years.

Urban Highways - Average Monthly Distance Travelled by Vehicles (Billion Miles)
Years    Jan    Feb    Mar    Apr    May    Jun    July   Aug    Sep    Oct    Nov    Dec
Year 1   4.22   5.32   5.21   5.12   4.92   4.49   4.55   4.49   4.44   4.39   4.37   4.35
Year 2   4.31   5.44   5.34   5.24   4.98   4.59   4.68   4.65   4.61   4.68   4.74   4.79
Year 3   4.38   5.51   5.41   5.36   4.98   4.63   4.71   4.78   4.82   4.88   4.85   4.89
Year 4   4.45   5.59   5.5    5.41   5.01   4.72   4.78   4.79   4.82   4.92   5.06   5.11
Year 5   4.51   5.65   5.62   5.49   5.12   4.8    4.88   4.82   4.95   5.12   5.22   5.44

a. Use Excel to create sparklines for the average monthly vehicle distance travelled each year.
b. Which year has a decreasing trend in the average distance travelled? Which year has an increasing trend in the average distance travelled?
c. Use Excel to create a heat map for the average distance travelled by vehicles. Do you find the heat map or the sparklines to be better at communicating the trend of the average vehicle distance travelled over different years? Why?
ANSWER:
a.
b. Year 1 has a decreasing trend, and Year 5 has an increasing trend.
c. It is difficult to create a heat map that effectively conveys the overall trend of the average monthly distance travelled for each year. The heat map shows the relative magnitude of the distance travelled by the vehicles, which is absent from the sparklines. However, the trend for each year is less apparent in the heat map.

66. The data on the ranks assigned to a random sample of students in a competitive exam based on scores and three different veteran statuses are given below.

Name     Score   Rank   Status
Steve    80      1      DV
Joshua   75      2      DV
John     95      3      V
Alex     90      4      V
Jeff     85      5      V
Matt     80      6      NV
Chris    75      7      NV

a. Create a parallel-coordinates plot using XLMiner for these data. Include vertical axes for the name, score, and rank. Color the lines by the type of status.
b. According to the parallel-coordinates plot, how are disabled veterans differentiated from veterans?
ANSWER:
a.

b. We observe that the disabled veterans are assigned the highest ranks though their scores are relatively low, and the veterans with high scores are given moderate ranks.

67. The owner of a grocery store is interested in providing better service to his customers with respect to the wait time at the billing counter. The data on 20 waiting customers are given below.

Customer Number   Wait Time (min)   Purchase Amount ($)   Customer Age   Credit Score
1                 2.3               518                   42             694
2                 2.8               592                   33             879
3                 3.2               598                   38             531
4                 3.4               845                   40             509
5                 3.4               648                   29             869
6                 4.2               695                   46             777
7                 3.2               844                   42             470
8                 1.4               470                   40             714
9                 6.4               488                   24             517
10                7.8               527                   37             794
11                6.5               843                   52             551
12                9.8               704                   43             673
13                5                 824                   56             846
14                1.8               570                   35             735
15                6.1               503                   39             816
16                3.4               483                   44             516
17                7.8               707                   33             729
18                2.8               796                   42             591
19                1.2               485                   46             866
20                9.5               727                   50             879

a. Use XLMiner to create a scatter chart matrix for these data. Include the variables wait time, purchase amount, customer age, and credit score.
b. What can you infer about the relationships between these variables from the scatter chart matrix?
ANSWER:
a.

b. The waiting time appears to have a positive relationship with the purchase amount. The customers’ age seems to have a positive relationship with the purchase amount and credit score as well.

68. Sadie is constructing a bar chart to describe the average savings account balances for customers at her bank. If the minimum balance is $5.00 and the maximum balance is $18,700, would the following bar chart be a good representation of the data? If not, what would Sadie need to change?

ANSWER: Sadie should change her bins to something like: $0.00 to $4,999; $5,000 to $9,999; $10,000 to $14,999; $15,000 to $20,000.

69. Data are shown below on the quality rating, volume, average wait time from pull-up to completion, average unit purchase, and revenue tier for franchises of a certain fast food restaurant in Area 6.

Area 6 Franchise Data
Franchise Number   Quality Rating   Volume Category   Wait Time (sec.)   Average Unit Purchase ($)   Revenue Tier
124                Above Average    High              175                13.25                       4
152                Acceptable       Low               181                10.02                       2
452                Above Average    Low               179                13.56                       4
462                Above Average    Medium            175                12.12                       3
485                Superior         High              171                15.11                       4
567                Acceptable       High              178                9.78                        2
568                Above Average    Medium            177                12.54                       4
584                Above Average    Medium            176                11.54                       3
625                Acceptable       Medium            175                11.14                       3
875                Acceptable       Low               180                8.78                        1

Using the table below, complete the crosstabulation chart of quality rating and categorized daily volume.

Crosstabulation of Quality Rating and Daily Units for Area 6 Franchises
                   Daily Units
Quality Rating     Low   Medium   High   Total
Acceptable
Above Average
Superior
Total

ANSWER:
Crosstabulation of Quality Rating and Daily Units for Area 6 Franchises
                   Daily Units
Quality Rating     Low   Medium   High   Total
Acceptable         2     1        1      4
Above Average      1     3        1      5
Superior           0     0        1      1
Total              3     4        3      10
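
The crosstabulation in the answer to Question 69 can be checked by counting franchises in each (quality rating, volume category) cell. The following is a minimal Python sketch under that reading of the table; the names `franchises` and `crosstab` are chosen here for illustration and do not come from the textbook or from Excel.

```python
from collections import Counter

# (franchise number, quality rating, volume category), copied from the Area 6 table
franchises = [
    (124, "Above Average", "High"),
    (152, "Acceptable", "Low"),
    (452, "Above Average", "Low"),
    (462, "Above Average", "Medium"),
    (485, "Superior", "High"),
    (567, "Acceptable", "High"),
    (568, "Above Average", "Medium"),
    (584, "Above Average", "Medium"),
    (625, "Acceptable", "Medium"),
    (875, "Acceptable", "Low"),
]


def crosstab(rows):
    """Count how many franchises fall into each (quality, volume) cell."""
    return Counter((quality, volume) for _, quality, volume in rows)


counts = crosstab(franchises)
volumes = ["Low", "Medium", "High"]

# Print one row per quality rating, followed by the row total
for quality in ["Acceptable", "Above Average", "Superior"]:
    row = [counts[(quality, volume)] for volume in volumes]
    print(f"{quality:<14}", *row, sum(row))
```

A crosstabulation counts observations per cell; it does not sum the values of either variable, which is why each cell here is a simple tally.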

70. Data are shown below on the quality rating, volume, average wait time from pull-up to completion, average unit purchase, and revenue tier for franchises of a certain fast food restaurant in Area 6.

Area 6 Franchise Data
Franchise Number   Quality Rating   Volume Category   Wait Time (sec.)   Average Unit Purchase ($)   Revenue Tier
124                Above Average    High              175                13.25                       4
152                Acceptable       Low               181                10.02                       2
452                Above Average    Low               179                13.56                       4
462                Above Average    Medium            175                12.12                       3
485                Superior         High              171                15.11                       4
567                Acceptable       High              178                9.78                        2
568                Above Average    Medium            177                12.54                       4
584                Above Average    Medium            176                11.54                       3
625                Acceptable       Medium            175                11.14                       3
875                Acceptable       Low               180                8.78                        1

Using the table below, complete the crosstabulation chart of Volume Category and Revenue Tier.

Crosstabulation of Volume Category and Revenue Tier for Area 6 Franchises
                   Revenue Tier
Volume Category    Tier 1   Tier 2   Tier 3   Tier 4   Total
Low
Medium
High
Total

ANSWER:
Crosstabulation of Volume Category and Revenue Tier for Area 6 Franchises
                   Revenue Tier
Volume Category    Tier 1   Tier 2   Tier 3   Tier 4   Total
Low                1        1        0        1        3
Medium             0        0        3        1        4
High               0        1        0        2        3
Total              1        2        3        4        10

71. Danah is responsible for reporting the status of sales for his company. The following pie chart shows the percentages of closed sales in each of the top seven cities. Use a different type of chart to display the percentage of sales sold in each city that conveys the data better than the pie chart. Convert the pie chart to a bar chart in order to improve the chart's readability.

ANSWER:

72. Construct a scatter chart for the following set of data. Describe the relationship between the two variables.

VAR1   VAR2
2      3
5      6
6      8
7      13
10     15

ANSWER:
There is a weak, positive, linear relationship.

73. This pie chart describes the age frequencies of students in a Business Analysis class. What is the relative frequency of students who are younger than 23? (Round to a whole number if necessary.)

ANSWER: 48% (100/210 = 47.6%)

74. The regional manager of a company wishes to determine the time spent at each division in the car production process. A study was undertaken over a month that resulted in the following data related to the percentage of time spent at three divisions (car body construction, paint shop, and assembly) at four locations of production plants.

Production Plants   Car Body Construction (%)   Paint Shop (%)   Assembly (%)
Michigan            35                          45               20
Kentucky            37                          41               22
Illinois            33                          39               28
Ohio                36                          40               24

Would it be appropriate to make a pie chart of the distribution of time spent in the paint shop for these four production plants? Explain.
ANSWER: No. A pie chart can only be used to display a categorical variable whose categories make up a whole.

75. Data are shown below on the quality rating, volume, average wait time from pull-up to completion, average unit purchase, and revenue tier for franchises of a certain fast food restaurant in Area 6.

Area 6 Franchise Data
Franchise Number   Quality Rating   Volume Category   Wait Time (sec.)   Average Unit Purchase ($)   Revenue Tier
124                Above Average    High              175                13.25                       4
152                Acceptable       Low               181                10.02                       2
452                Above Average    Low               179                13.56                       4
462                Above Average    Medium            175                12.12                       3
485                Superior         High              171                15.11                       4
567                Acceptable       High              178                9.78                        2
568                Above Average    Medium            177                12.54                       4
584                Above Average    Medium            176                11.54                       3
625                Acceptable       Medium            175                11.14                       3
875                Acceptable       Low               180                8.78                        1

Is it appropriate to make a scatter chart to display the relationship between the franchise number and the average unit purchase? Explain.
ANSWER: No. A scatter chart is a graphical presentation of the relationship between two quantitative variables. The variable “Franchise Number” is not quantitative.


Chapter 04 - Probability: An Introduction to Modeling Uncertainty

Multiple Choice

1. Probability is the _____.
a. number of successes divided by the number of failures
b. numerical measure of the likelihood that an event will occur
c. chance that an event will not happen
d. number of successes divided by the standard deviation of the distribution
ANSWER: b

2. A _____ describes the range and relative likelihood of all possible values for a random variable.
a. probability distribution for a random variable
b. probability mass function of an event
c. density function
d. probability
ANSWER: a

3. An initial estimate of the probabilities of events is a _____ probability.
a. posterior
b. conditional
c. empirical
d. prior
ANSWER: d

4. Bayes' theorem is a method used to compute _____ probabilities.
a. posterior
b. conditional
c. empirical
d. prior
ANSWER: a

5. All the events in the sample space that are not part of the specified event are called _____.
a. joint events
b. the complement of the event
c. simple events
d. independent events
ANSWER: b

6. Sample space is _____.
a. a process that results in some outcome
b. the collection of all possible outcomes
c. the collection of events
d. a subgroup of a population/the likelihood of an outcome
ANSWER: b

7. The event containing the outcomes belonging to A or B or both is the _____ of A and B.
a. union
b. Venn diagram
c. intersection
d. complement
ANSWER: a

8. Two events are independent if _____.
a. the two events occur at the same time
b. the probability of one or both events is greater than 1
c. P(A | B) = P(A) or P(B | A) = P(B)
d. None of these are correct.
ANSWER: c

9. Which statement is true about mutually exclusive events?
a. If events A and B cannot occur at the same time, they are called mutually exclusive.
b. If either event A or event B must occur, they are called mutually exclusive.
c. P(A) + P(B) = 1 for any events A and B that are mutually exclusive.
d. None of these are correct.
ANSWER: a

10. A joint probability is the _____.
a. sum of the probabilities of two events
b. probability of the intersection of two events
c. probability of the union of two events
d. sum of the probabilities of two independent events
ANSWER: b

11. In the probability table below, which value is a marginal probability?

                        Completed
Obstacle Course Level   No    Yes   Total
Challenging             0.4   0.3   0.7
Easy                    0.1   0.2   0.3
Total                   0.5   0.5   1.0

a. 0.1
b. 1.0
c. 0.5
d. 0.4
ANSWER: c

12. A variable that can only take on specific numeric values is called a _____.
a. categorical variable
b. discrete random variable
c. continuous random variable
d. categorical variable
ANSWER: b

13. An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a _____.
a. discrete random variable
b. continuous random variable
c. complex random variable
d. categorical random variable
ANSWER: b

14. Which of the following statements is correct?
a. The binomial and normal distributions are both discrete probability distributions.
b. The binomial and normal distributions are both continuous probability distributions.
c. The binomial distribution is a continuous probability distribution, and the normal distribution is a discrete probability distribution.
d. The binomial distribution is a discrete probability distribution, and the normal distribution is a continuous probability distribution.
ANSWER: d

15. Which of the following is a discrete random variable?
a. The number of times a student guesses the answers to questions on a certain test
b. The amount of gasoline purchased by a customer
c. The amount of mercury found in fish caught in the Gulf of Mexico
d. The height of water-oak trees
ANSWER: a

16. All of the following are examples of discrete random variables except _____.
a. number of tickets sold
b. marital status
c. time
d. population of a city
ANSWER: c

17. The _____ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour.
a. binomial
b. normal
c. triangular
d. Poisson
ANSWER: d

18. The random variable X is known to be uniformly distributed between 2 and 12. Compute E(X), the expected value of the distribution.
a. 4
b. 5
c. 6
d. 7
ANSWER: d

19. The random variable X is known to be uniformly distributed between 2 and 12. Compute the standard deviation of X.
a. 2.887
b. 3.464
c. 8.333
d. 12
ANSWER: a

20. If a z-score is zero, then the corresponding x-value must be equal to the _____.
a. mean
b. median
c. mode
d. standard deviation
ANSWER: a

21. In a normal distribution, which is greater, the mean or the median?
a. Mean
b. Median
c. Neither the mean nor the median (they are equal)
d. Cannot be determined with the information provided
ANSWER: c

22. The center of a normal curve is _____.
a. always equal to zero
b. the mean of the distribution
c. always a positive number
d. equal to the standard deviation
ANSWER: b

23. Which of the following is not a characteristic of the normal probability distribution?
a. The mean, median, and the mode are equal.
b. The mean of the distribution can be negative, zero, or positive.
c. The distribution is symmetrical.
d. The standard deviation must be 1.
ANSWER: d

24. A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. What percent of the days does he exceed 13,000 steps?
a. 2.28%
b. 5%
c. 95%
d. 97.72%
ANSWER: a

25. The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 100 miles per gallon or better?
a. 0.6%
b. 2.5%
c. 6%
d. 25%
ANSWER: a

26. A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. One day he took 15,000 steps. What was his percentile on that day?
a. 95%
b. 97.7%
c. 99.7%
d. 100%
ANSWER: c

27. The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What value represents the 50th percentile of this distribution?
a. 75
b. 85
c. 95
d. 105
ANSWER: a

28. A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. How many steps would he have to take to make the cut for the top 5% for his distribution?
a. 7,533
b. 8,078
c. 10,000
d. 12,467
ANSWER: d
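
The normal-distribution answers to Questions 24, 25, 27, and 28 can be reproduced numerically. The sketch below is one possible check using `statistics.NormalDist` from the Python standard library; the variable names are illustrative assumptions, and the textbook itself works these problems with tables or Excel.

```python
from statistics import NormalDist

# Daily step counts: mean 10,000, standard deviation 1,500 (Questions 24 and 28)
steps = NormalDist(mu=10_000, sigma=1_500)
# Gas mileage: mean 75 mpg, standard deviation 10 mpg (Questions 25 and 27)
mileage = NormalDist(mu=75, sigma=10)

# Question 24: upper-tail probability of exceeding 13,000 steps (z = 2)
p_over_13000 = 1 - steps.cdf(13_000)

# Question 25: upper-tail probability of 100 mpg or better (z = 2.5)
p_100_mpg = 1 - mileage.cdf(100)

# Question 27: the 50th percentile of a normal distribution is its mean
median_mpg = mileage.inv_cdf(0.50)

# Question 28: cutoff for the top 5% is the 95th percentile
top_5_cutoff = steps.inv_cdf(0.95)

print(f"P(steps > 13,000) = {p_over_13000:.4f}")
print(f"P(mpg >= 100)     = {p_100_mpg:.4f}")
print(f"50th percentile   = {median_mpg:.1f} mpg")
print(f"Top 5% cutoff     = {top_5_cutoff:.0f} steps")
```

Upper-tail questions use `1 - cdf(x)`, while "how many steps to reach the top 5%" is an inverse problem and uses `inv_cdf`.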

29. What is the mean of x, given the exponential probability function
a. 0.05
b. 20
c. 100
d. 2,000
ANSWER: b

30. Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability density function for the time it takes to fill an order?
a.
b.
c.
d. None of these are correct.
ANSWER: c

31. What is the total area under the normal distribution curve?
a. It depends upon the mean and standard deviation
b. It must be calculated
c. 1
d. 100
ANSWER: c

32. The triangular distribution is a good model for _____ distributions.
a. uniform
b. skewed
c. normal
d. normal
ANSWER: b

33. Fast food restaurants pride themselves in being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes less than one minute to fill an order?
a. 0.1813
b. 0.4866
c. 0.6321
d. 0.7769
ANSWER: b
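
The answer to Question 33 follows from the exponential cumulative distribution function, P(X < x) = 1 - e^(-x/mean). A minimal Python sketch of that calculation is shown here; the function name `exponential_cdf` is a hypothetical helper introduced for illustration.

```python
import math


def exponential_cdf(x, mean):
    """P(X < x) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean)


# Question 33: mean fill time is 1.5 minutes; probability of under 1 minute
p_under_one_minute = exponential_cdf(1.0, 1.5)
print(f"P(X < 1) = {p_under_one_minute:.4f}")
```

The same helper applies to any exponential waiting-time question once the mean is expressed in the same units as x.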

34. A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either game?
a. 15
b. 30
c. 10
d. 20
ANSWER: c

35. The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus?
a. 10%
b. 20%
c. 30%
d. 3%
ANSWER: c

Objective Short Answer

36. A nickel and a dime are tossed. How many possible outcomes are in this event?
ANSWER: There are four possible outcomes.

37. A nickel and a dime are tossed. We are interested only in the event that includes at least one head appearing on a single toss of both coins. What are the possible outcomes?
ANSWER: There are three possible outcomes.

38. Consider a random experiment of rolling two dice. The sample space for rolling two dice is shown. Let S be the set of all ordered pairs listed in the figure. What are the possible outcomes for the event of rolling a 7?
ANSWER: {(6, 1), (5, 2), (4, 3), (3, 4), (2, 5), (1, 6)}

39. Consider a random experiment of rolling two dice. The sample space for rolling two dice is shown. Let S be the set of all ordered pairs listed in the figure. What is the probability of rolling a 7?
ANSWER: 0.1667 or 16.7% or 1/6

40. Consider a random experiment of rolling two dice. The sample space for rolling two dice is shown. Let S be the set of all ordered pairs listed in the figure. What is the probability of rolling a sum larger than 10?
ANSWER: 1/12 or 0.0833 or 8.3%

41. James has two fair coins. When he flips them, what is the sample space?
ANSWER: heads-heads, heads-tails, tails-heads, and tails-tails

42. A nickel and a dime are tossed. If an event is defined as a single toss of both coins where at least one head appears, what is the complement of that event?
ANSWER: tails-tails
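
The two-dice answers in Questions 38 to 40 can be verified by enumerating the 36 equally likely ordered pairs. The following Python sketch does this with exact fractions; `sample_space` and `probability` are names introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """Fraction of equally likely outcomes for which event(outcome) is true."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Question 38: outcomes that make up the event "rolling a 7"
sevens = [outcome for outcome in sample_space if sum(outcome) == 7]

# Question 39: probability of rolling a 7
p_seven = probability(lambda outcome: sum(outcome) == 7)

# Question 40: probability of a sum larger than 10
p_over_ten = probability(lambda outcome: sum(outcome) > 10)

print(sevens)
print(p_seven, p_over_ten)
```

Counting favorable outcomes over total outcomes is valid here only because every ordered pair is equally likely for fair dice.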

43. Given that A and B are independent with P(A ∪ B) = 0.8 and P(B^c) = 0.3, find P(A).
ANSWER: 0.33

44. A bucket contains 2 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and then replaced. Another ball is taken from the bucket. Are the events of pulling a red ball first and then a purple one independent or dependent?
ANSWER: Independent

45. A bucket contains 2 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and then replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is purple?
ANSWER: 10/121

46. A bucket contains 3 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and is not replaced. Another ball is taken from the bucket. Are the events of pulling a red ball first and then a purple one independent or dependent?
ANSWER: dependent

47. A bucket contains 3 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and is not replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is purple?
ANSWER: 5/44

48. Given that P(A) = 0.3, P(A | B) = 0.4, and P(B) = 0.5, compute P(A and B).
ANSWER: 0.2

49. The cross tabulation below classifies employees of a communications company by age and field of expertise. Use the given information to create a joint probability table.

               Under 35   35-44    45+      Total
Engineering    8,399      8,663    7,072    24,134
Business       14,515     14,988   26,683   56,186
Education      6,738      5,657    8,669    21,064
Liberal Arts   11,415     11,484   12,111   35,010
Total          41,067     40,792   54,535   136,394

ANSWER:

               Under 35   35-44   45+    Total
Engineering    0.06       0.06    0.05   0.18
Business       0.11       0.11    0.20   0.41
Education      0.05       0.04    0.06   0.15
Liberal Arts   0.08       0.08    0.09   0.26
Total          0.30       0.30    0.40   1.00
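
Each entry of the joint probability table in Question 49 is the cell count divided by the grand total of 136,394. A short Python sketch of that step follows; the dictionary layout and variable names are assumptions made for illustration.

```python
# Counts by field of expertise, in the column order: Under 35, 35-44, 45+
counts = {
    "Engineering": [8399, 8663, 7072],
    "Business": [14515, 14988, 26683],
    "Education": [6738, 5657, 8669],
    "Liberal Arts": [11415, 11484, 12111],
}

# Grand total of all employees in the cross tabulation
grand_total = sum(sum(row) for row in counts.values())

# Joint probability = cell count / grand total, rounded to two decimals
joint = {
    field: [round(cell / grand_total, 2) for cell in row]
    for field, row in counts.items()
}

# Marginal probability for each field = row total / grand total
marginal_by_field = {
    field: round(sum(row) / grand_total, 2) for field, row in counts.items()
}

for field, row in joint.items():
    print(f"{field:<13}", *row, marginal_by_field[field])
```

The marginal probabilities are computed from the unrounded row totals, so they can differ slightly from the sum of the rounded joint probabilities in a row.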

50. The contingency table below represents employees of a communications company classified by age and field of expertise. Fill in the missing entries.

               Under 35   35-44    45+      Total
Engineering    7,635      7,875    6,429    21,939
Business       13,195                       38,814
Education      4,802      5,143
Liberal Arts   10,377     10,440   11,010   31,827
Total          36,009              37,313   110,405

ANSWER:

               Under 35   35-44    45+      Total
Engineering    7,635      7,875    6,429    21,939
Business       13,195     13,625   11,994   38,814
Education      4,802      5,143    7,880    17,825
Liberal Arts   10,377     10,440   11,010   31,827
Total          36,009     37,083   37,313   110,405

51. The contingency table below represents employees of a communications company classified by age and field of expertise. What is the probability that a randomly selected employee aged 35-44 has business expertise?

               Under 35   35-44    45+      Total
Engineering    8,399      8,663    7,072    24,134
Business       14,515     14,988   26,683   56,186
Education      6,738      5,657    8,669    21,064
Liberal Arts   11,415     11,484   12,111   35,010
Total          41,067     40,792   54,535   136,394

ANSWER: 14,988/40,792 = 0.37

52. The cross tabulation shown below classifies employees of a communications company by age and field of expertise. What is the probability that a randomly selected engineer is under the age of 35?

               Under 35   35-44    45+      Total
Engineering    8,399      8,663    7,072    24,134
Business       14,515     14,988   26,683   56,186
Education      6,738      5,657    8,669    21,064
Liberal Arts   11,415     11,484   12,111   35,010
Total          41,067     40,792   54,535   136,394

ANSWER: 8,399 / 24,134 = 0.35

53. The random variable X is known to be uniformly distributed between 2 and 12. Compute P(X = 3).
ANSWER: 0

54. The random variable X is known to be uniformly distributed between 2 and 12. Compute P(X > 10).
ANSWER: 0.2

55. For the standard normal probability distribution, what percent of the curve lies to the left of the mean?
ANSWER: 50%

56. Participants at the state fair were given eight rings to toss. The number x of rings tossed onto a stick can be approximated by the probability distribution in the table. Use the probability distribution to find the mean and variance of the probability distribution.

x   f(x)
0   0.010
1   0.030
2   0.070
3   0.070
4   0.100
5   0.210
6   0.320
7   0.130
8   0.060

ANSWER:

x   f(x)    xf(x)
0   0.010   0
1   0.030   0.03
2   0.070   0.14
3   0.070   0.21
4   0.100   0.4
5   0.210   1.05
6   0.320   1.92
7   0.130   0.91
8   0.060   0.48
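
The answer table for Question 56 lists the products xf(x) but not the resulting mean and variance. A minimal Python sketch of the remaining arithmetic, using the standard definitions E(x) = Σ x f(x) and Var(x) = Σ (x - E(x))² f(x), is given below; the variable names are illustrative, and the printed values are computed here rather than quoted from the answer key.

```python
# Probability distribution from Question 56: x = 0..8 rings on the stick
x_values = list(range(9))
f = [0.010, 0.030, 0.070, 0.070, 0.100, 0.210, 0.320, 0.130, 0.060]

# Mean: sum of the xf(x) column
mean = sum(x * p for x, p in zip(x_values, f))

# Variance: probability-weighted squared deviations from the mean
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, f))

print(f"mean     = {mean:.2f}")
print(f"variance = {variance:.4f}")
```

Summing the xf(x) column gives the mean directly, which is why the answer table builds that column first.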

57. In a binomial experiment, what does it mean to say that each trial is independent of the other trials?
ANSWER: Each trial is independent of the other trials if the outcome of one trial does not affect the outcome of any of the other trials.

58. What type of distribution models the number of occurrences of an event over a specified interval of time or space?
ANSWER: Poisson distribution

59. Let X be a random variable with a uniform distribution between 8 and 20. Find the probability that X is less than 10.
ANSWER: 2/12 = 0.1667

60. Could this curve represent a normal distribution?
ANSWER: No

61. You recently took a standardized test in which scores follow a normal distribution with a mean of 18 and a standard deviation of 3. You were told that your score is at the 75th percentile of this distribution. What is your score?
ANSWER: Your score is 20.

62. The time in seconds that it takes a production worker to inspect an item has an exponential distribution with a mean of 15 seconds. What proportion of inspection times is less than 10 seconds?
ANSWER: 0.4866

63. The random variable X is normally distributed with a mean of 80 and a standard deviation of 10. What is the probability that a value of X chosen at random will be between 70 and 90?
ANSWER: P(70 < x < 90) = 0.683

64. Reviews of call center representatives over the last three years showed that 10% of all call center representatives were rated as outstanding, 75% were rated as excellent/good, 10% were rated as satisfactory, and 5% were considered unsatisfactory. For a sample of 10 reps selected at random, what is the probability that 2 will be rated as unsatisfactory?
ANSWER: 0.0746

65. A game at an arcade is in the form of a large wheel that a player spins. The wheel is programmed to give 2 tickets 50% of the time, 5 tickets 25% of the time, 10 tickets 23% of the time, and 100 tickets 2% of the time. If a player spins the wheel once, what is the expected number of tickets the player will win?
ANSWER: 6.55
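
Questions 64 and 65 rest on the binomial probability function and the expected value of a discrete random variable. The sketch below shows one way to check both in Python, assuming for Question 64 that each of the 10 reps is independently unsatisfactory with probability 0.05; `binomial_pmf` is a hypothetical helper written for this illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Question 64: exactly 2 of 10 reps rated unsatisfactory, p = 0.05
p_two_unsatisfactory = binomial_pmf(2, 10, 0.05)

# Question 65: ticket payouts mapped to their probabilities
payouts = {2: 0.50, 5: 0.25, 10: 0.23, 100: 0.02}
expected_tickets = sum(tickets * prob for tickets, prob in payouts.items())

print(f"P(2 unsatisfactory) = {p_two_unsatisfactory:.4f}")
print(f"Expected tickets    = {expected_tickets:.2f}")
```

Only the 5% unsatisfactory rate matters for Question 64; the other three rating categories collapse into a single "not unsatisfactory" outcome.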


Chapter 05 - Descriptive Data Mining

Multiple Choice

1. The goal of _____ is to use the variable values to identify relationships between observations.
a. unsupervised learning
b. data mining
c. McQuitty’s method
d. Ward's method
ANSWER: a

2. In preparing categorical variables for analysis, it is usually best to _____.
a. convert the categories to numeric representations
b. convert the categories to binary, dummy variables
c. combine as many categories as possible
d. let them remain categorical
ANSWER: b

3. Observation refers to the _____.
a. estimated continuous outcome variable
b. set of recorded values of variables associated with a single entity
c. goal of predicting a categorical outcome based on a set of variables
d. mean of all variable values associated with one particular entity
ANSWER: b

4. _____ approaches are designed to describe patterns and relationships in large data sets with many observations of many variables.
a. Data mining
b. Unsupervised learning
c. Dimension reduction
d. Data sampling
ANSWER: b

5. Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0–1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry “customer service”?
a. 000
b. 100
c. 010
d. 001
ANSWER: d

6. The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called _____.
a. data visualization
b. cluster analysis
c. market analysis
d. supervised learning
ANSWER: b

7. k-means clustering is the process of _____.
a. agglomerating observations into a series of nested groups based on a measure of similarity
b. organizing observations into distinct groups based on a measure of similarity
c. reducing the number of variables to consider in data-mining
d. estimating the value of a continuous outcome variable
ANSWER: b

8. Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
a. 66.21
b. 72.28
c. 75.39
d. 88.57
ANSWER: c

9. Which of the following is true of Euclidean distances?
a. It is used to measure dissimilarity between categorical variable observations.
b. It is not affected by the scale on which variables are measured.
c. It increases with the increase in similarity between variable values.
d. It is commonly used as a method of measuring dissimilarity between quantitative observations.
ANSWER: d

10. Jaccard’s coefficient is different from the matching coefficient in that the former _____.
a. measures overlap while the latter measures dissimilarity
b. does not count matching zero entries while the latter does
c. deals with categorical variables while the latter deals with continuous variables
d. is affected by the scale used to measure variables while the latter is not
ANSWER: b

11. Single linkage is a measure of calculating dissimilarity between clusters by _____.
a. considering only the two most dissimilar observations in the two clusters
b. computing the average dissimilarity between every pair of observations between the two clusters
c. considering only the two most similar observations in the two clusters
d. considering the distance between the cluster centroids
ANSWER: c

12. _____ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.
a. Single linkage
b. Complete linkage
c. Average linkage
d. Average group linkage
ANSWER: b

13. If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations of a cluster?
a. The short leg
b. The long leg
c. The hypotenuse
Page 2


Name:

Class:

Date:

d. Euclidean distance is not related to right triangles.
ANSWER: c

14. When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called _____.
a. the matching coefficient
b. Jaccard's coefficient
c. Euclidean distance
d. the antecedent
ANSWER: a

15. A method for modifying variables that reduces bias prior to cluster analysis is _____.
a. standardization
b. weighting
c. removing outliers
d. randomizing
ANSWER: a

16. Euclidean distance can be used to measure the distance between _____ in cluster analysis.
a. objects
b. clusters
c. observations
d. ward
ANSWER: c

17. Average linkage is a measure of calculating dissimilarity between two clusters by _____.
a. finding the distance between the two most dissimilar observations in the two clusters
b. computing the average distance between every pair of observations between two clusters
c. finding the distance between the two closest observations in the two clusters
d. computing the distance between the cluster centroids
ANSWER: b

18. _____ is a method of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters.
a. Single linkage
b. Complete linkage
c. Average linkage
d. Centroid linkage
ANSWER: d

19. _____ can be used to partition observations in a manner to obtain clusters with the least amount of information loss due to the aggregation.
a. Single linkage
b. Ward’s method
c. Average group linkage
d. Dendrogram
ANSWER: b

20. Suppose the dissimilarity between clusters A and C has the value 24 and the dissimilarity between clusters B and C has the value 12. Use McQuitty’s method to determine the dissimilarity between cluster AB and cluster C.
a. 12
b. 18
c. 24
d. 36
ANSWER: b

21. A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
a. dendrogram
b. scatter chart
c. decile-wise lift chart
d. cumulative lift tree
ANSWER: a

22. The endpoint of a k-means clustering algorithm occurs when _____.
a. Euclidean distance between clusters is minimized
b. Euclidean distance between observations in a cluster is maximized
c. no further changes are observed in cluster structure and number
d. all of the observations are encompassed within a single large cluster with mean k
ANSWER: c

23. A cluster’s _____ can be measured by the difference between the distance value at which a cluster is originally formed and the distance value at which it is merged with another cluster in a dendrogram.
a. dimension
b. affordability
c. durability
d. span
ANSWER: c

24. Complete linkage can be used to measure the distance between _____ in cluster analysis.
a. objects
b. clusters
c. observations
d. wards
ANSWER: b

25. Complete linkage can be used to measure the distance between clusters that are the _____ in cluster analysis.
a. most similar
b. most different
c. farthest apart
d. closest
ANSWER: b

26. Single linkage can be used to measure the distance between clusters that are the _____ in cluster analysis.
a. most similar
b. most different
c. farthest apart
d. closest
ANSWER: a
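The distance and linkage measures named in the questions above can be sketched in a few lines of Python. This is only an illustration: the function names and the two small clusters at the end are hypothetical, while the Euclidean and McQuitty inputs are the numbers from questions 8 and 20.

```python
import math
from itertools import product


def euclidean(u, v):
    """Euclidean distance between two numeric observations of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(cluster):
    """Component-wise mean of the observations in a cluster."""
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))


def single_linkage(c1, c2):
    """Distance between the two most similar (closest) observations."""
    return min(euclidean(u, v) for u, v in product(c1, c2))


def complete_linkage(c1, c2):
    """Distance between the two most dissimilar (farthest) observations."""
    return max(euclidean(u, v) for u, v in product(c1, c2))


def average_linkage(c1, c2):
    """Average distance over every pair with one observation per cluster."""
    distances = [euclidean(u, v) for u, v in product(c1, c2)]
    return sum(distances) / len(distances)


def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return euclidean(centroid(c1), centroid(c2))


def mcquitty(d_a_c, d_b_c):
    """Dissimilarity between cluster AB and cluster C: the average of d(A, C) and d(B, C)."""
    return (d_a_c + d_b_c) / 2


# Question 8: u = (25, 350) and v = (53, 420).
print(round(euclidean((25, 350), (53, 420)), 2))  # 75.39

# Question 20: the two dissimilarities are 24 and 12.
print(mcquitty(24, 12))  # 18.0

# Hypothetical clusters, used only to show how the linkage measures differ.
cluster_1 = [(0, 0), (0, 2)]
cluster_2 = [(3, 0), (5, 0)]
print(single_linkage(cluster_1, cluster_2))  # 3.0
print(round(complete_linkage(cluster_1, cluster_2), 3))
print(round(average_linkage(cluster_1, cluster_2), 3))
print(round(centroid_linkage(cluster_1, cluster_2), 3))
```

Note that these distances are computed on raw values; with variables on very different scales, the observations would normally be standardized first.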

27. _____ is a measure that computes the dissimilarity between a cluster AB and a cluster C by averaging the distance between A and C and the distance between B and C.
a. Ward's method
b. Jaccard's coefficient
c. McQuitty's method
d. None of these are correct.
ANSWER: c

28. Hierarchical clustering using _____ results in a sequence of aggregated clusters that minimizes the loss of information between the individual observation level and the cluster level.
a. McQuitty’s method
b. centroid linkage
c. median linkage
d. Ward’s method
ANSWER: d

29. _____ is the dissimilarity measure that is more robust to outliers than Euclidean distance.
a. Manhattan distance
b. Matching coefficient
c. Matching distance
d. Jaccard distance
ANSWER: a

30. Manhattan distance is the distance traveled as if traveled along rectangular city blocks. The Manhattan distance for the standardized observations of (–1.85, 0.65) and (0.55, –0.75) is _____.
a. 2.40
b. 2.00
c. 1.40
d. 3.80
ANSWER: d

31. In k-means clustering, k represents the _____.
a. number of variables
b. number of clusters
c. number of observations in a cluster
d. mean of the cluster
ANSWER: b

32. The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio for between-cluster distance to within-cluster distance should exceed what value for useful clusters?
a. 0.5
b. 1
c. 1.5
d. 2

ANSWER: b

33. In which of the following scenarios would it be appropriate to use hierarchical clustering?
a. When the number of observations in the dataset is relatively high
b. When it is not necessary to know the nesting of clusters
c. When the number of clusters is known beforehand
d. When binary or ordinal data needs to be clustered
ANSWER: d

34. An analysis of items frequently co-occurring in transactions is known as _____.
a. market segmentation
b. market basket analysis
c. regression analysis
d. cluster analysis
ANSWER: b

35. _____ refers to the number of times a collection of items occurs together in a transaction data set.
a. A consequent
b. Validation count
c. Support count
d. Antecedent
ANSWER: c

36. To identify patterns across transactions, we can use _____.
a. association rules
b. complete linkage
c. centroid linkage
d. k-means
ANSWER: a

37. Which statement is true of an association rule?
a. It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
b. It is a data reduction technique that reduces large information into smaller homogeneous groups.
c. It uses analytic models to describe the relationship between metrics that drive business performance.
d. It seeks to classify a categorical outcome into two or more categories.
ANSWER: a

38. The _____ the lift ratio, the _____ the association rule.
a. higher; stronger
b. higher; weaker
c. lower; stronger
d. lower; weaker
ANSWER: a

39. The strength of the association rule is known as _____ and is calculated as the ratio of the confidence of an association rule to the benchmark confidence.
a. lift
b. antecedent

c. support count
d. consequent
ANSWER: a

40. The process of extracting useful information from text data is known as _____.
a. text mining
b. tokenization
c. stemming
d. corpus
ANSWER: a

41. A collection of text documents to be analyzed is called a _____.
a. book
b. corpus
c. library
d. consequent
ANSWER: b

42. The process of dividing text into separate terms is referred to as _____.
a. data cleaning
b. stemming
c. tokenization
d. stacking
ANSWER: c

43. The process of converting a word to its stem, or root word, is referred to as _____.
a. data cleaning
b. stemming
c. tokenization
d. stacking
ANSWER: b

44. In the text mining process, the text is first preprocessed by deriving a smaller set of _____ from the larger set of words contained in a collection of documents.
a. tokens
b. stems
c. terms
d. stack
ANSWER: a

45. A popular measure for weighting terms based on frequency and uniqueness is _____.
a. cosine distance
b. word cloud
c. corpus

d. term frequency times inverse document frequency
ANSWER: d

46. _____ is used to measure the dissimilarity between text documents.
a. Cosine distance
b. Word cloud
c. Corpus
d. Dendrogram
ANSWER: a

47. A visual representation of a document or set of documents in which the size of the word is proportional to the frequency with which the word appears is called a _____.
a. cosine distance
b. word cloud
c. corpus
d. dendrogram
ANSWER: b

Subjective Short Answer

As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the data shown below on 100 customers who visited the store.

Customer Number  Wait Time (min)  Purchase Amount ($)  Customer Age  Customer Satisfaction Rating
1                2.3              436                  42            7
2                2.8              408                  33            6
3                3.2              432                  38            5
4                3.4              431                  40            5
5                3.4              456                  29            6
6                4.2              537                  46            4
7                3.2              456                  42            5
8                1.4              430                  40            8
9                6.4              663                  24            3
10               7.8              839                  37            4
11               6.5              659                  52            5
12               9.8              836                  43            2
13               5                543                  56            4
14               1.8              419                  35            8
15               6.1              700                  39            6
16               3.4              432                  44            7
17               7.8              845                  33            5
18               2.8              467                  42            6
19               1.2              425                  46            8
20               9.5              848                  50            4
21               8.2              808                  55            3

22               7.6              674                  35            3
23               5.4              547                  52            4
24               6.7              691                  38            5
25               9.6              847                  53            4
26               11.4             826                  48            2
27               2.1              426                  52            7
28               5.6              535                  32            7
29               3.7              521                  43            8
30               4.9              513                  44            6
31               6.4              645                  53            5
32               9.3              846                  52            4
33               10.6             730                  51            3
34               6.5              786                  53            3
35               5.4              523                  46            5
36               7.6              654                  36            6
37               3.2              443                  48            7
38               2.4              409                  54            8
39               1                400                  39            6
40               0.2              418                  51            7
41               2.4              498                  30            6
42               5.7              532                  32            5
43               6.4              663                  44            7
44               6                681                  39            8
45               3.7              543                  54            5
46               8.7              800                  51            5
47               6.9              673                  45            5
48               9.8              856                  43            4
49               10               756                  44            4
50               9.5              854                  43            6
51               6.3              672                  50            6
52               7.4              698                  47            7
53               2.3              434                  43            7
54               4.6              544                  40            4
55               4.9              523                  53            6
56               5.7              546                  55            6
57               7.4              676                  42            8
58               6.8              662                  36            6
59               9.6              1000                 40            5
60               6.4              678                  46            5
61               7.2              655                  32            4
62               5.6              535                  36            5
63               9.7              833                  35            3
64               2.3              498                  30            7
65               4.3              508                  41            6
66               5.7              542                  49            6
67               2.4              435                  39            8
68               6.7              665                  41            5
69               2.4              387                  54            9
70               9.8              845                  34            7
71               4.5              532                  40            6
72               6.7              687                  30            5
73               7.2              643                  33            4

74               3.5              424                  49            7
75               8.9              836                  47            5
76               9.7              876                  31            4
77               3.5              456                  47            7
78               4.7              523                  49            6
79               8.5              818                  35            5
80               9.7              845                  54            4
81               2.7              401                  55            7
82               5.7              554                  43            6
83               7.6              648                  51            7
84               4.4              540                  31            6
85               7.8              839                  45            5
86               9.4              845                  48            4
87               4.9              534                  36            5
88               7.1              693                  44            4
89               5.4              512                  39            3
90               6.7              665                  49            5
91               8.6              825                  36            5
92               4.5              548                  30            7
93               6.1              704                  31            5
94               5.3              509                  31            6
95               6.7              672                  35            5
96               8.1              824                  36            4
97               6.3              632                  30            4
98               7.4              689                  35            2
99               8.8              839                  50            4
100              9.6              847                  35            2

48. Using the data given, apply k-means clustering with k = 5 using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Analyze the resultant clusters. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)? What reasons do you see for low customer satisfaction ratings?
ANSWER: We specify # Iterations = 50 and # Starts = 10. We use the default fixed seed of 12345. The cluster sizes range from 6 to 36 customers. The smallest cluster is Cluster-4, with 6 customers. The least dense cluster is the 36-customer cluster, Cluster-5, which includes customers with waiting time ranging from 6.1 to 11.4, purchase amount ranging from 654 to 1000, age between 31 and 55, and customer satisfaction rating ranging from 2 to 7. From the clustering output, it appears that longer waiting times and high purchase amounts are the reasons for low customer satisfaction ratings. The high purchase amounts can be attributed to high prices of the products in the store.
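For readers without XLMiner, the procedure in question 48 (normalize the inputs, then run k-means from several random starts and keep the best run) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, z-score standardization stands in for the Normalize input data option, and the random number generator differs from XLMiner's, so cluster numbering and membership will generally not match the answer above even with the same seed value.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that no variable dominates the distance.

    Assumes every column has a nonzero standard deviation.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def distance(u, v):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points]


def kmeans_once(points, k, iterations, rng):
    """One k-means run from a random start; returns (labels, centers, sse)."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # no further change: stop early
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(distance(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse


def kmeans(points, k, iterations=50, starts=10, seed=12345):
    """Run k-means from several random starts and keep the lowest-SSE run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, iterations, rng) for _ in range(starts)]
    return min(runs, key=lambda run: run[2])


# First 12 customers from the table above:
# (wait time, purchase amount, age, satisfaction rating).
customers = [
    (2.3, 436, 42, 7), (2.8, 408, 33, 6), (3.2, 432, 38, 5), (3.4, 431, 40, 5),
    (3.4, 456, 29, 6), (4.2, 537, 46, 4), (3.2, 456, 42, 5), (1.4, 430, 40, 8),
    (6.4, 663, 24, 3), (7.8, 839, 37, 4), (6.5, 659, 52, 5), (9.8, 836, 43, 2),
]
labels, centers, sse = kmeans(standardize(customers), k=2)
print([labels.count(j) for j in range(2)])  # cluster sizes
```

Keeping the run with the smallest within-cluster sum of squares is one common way to choose among random starts; it is an assumption here, not a documented description of XLMiner's selection rule.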


49. Using the data given, apply hierarchical clustering with five clusters using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method. a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the five clusters in the hierarchical clustering. b. Identify the cluster with the largest average waiting time. Using all the variables, how would you characterize this cluster? c. Identify the smallest cluster. d. By examining the dendrogram on the HC_Dendrogram worksheet (as well as the sequence of clustering stages in HC_Output1), what number of clusters seems to be the most natural fit based on the distance? ANSWER: a. Below is the PivotTable obtained on the data in the “HC_Clusters1” worksheet.


b. Cluster 5 has the largest average waiting time (approx. 9.35 min). This cluster is a collection of 11 customers characterized by the largest average purchase amount of about $823, the oldest average customer age, and the lowest average customer satisfaction rating of 3.36.
c. We see that the size of the clusters does not vary much. However, Cluster 5 is the smallest cluster with a collection of 11 customers.
d. From the dendrogram, four clusters appear to be a natural fit for this data. When there are more than four clusters, mergers result in a small marginal increase in distance, but when there are fewer than four clusters, mergers lead to a large marginal increase in distance.
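The PivotTable step (cluster sizes and cluster centers) and the dendrogram reasoning in part d can be mirrored in a short Python sketch. The function names, the cluster labels, and the merge distances below are all hypothetical and chosen only for illustration; they are not the XLMiner output for this data set.

```python
from collections import defaultdict


def cluster_summary(rows, labels):
    """Return {label: (size, center)}, the equivalent of a PivotTable of averages."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: (len(members), tuple(sum(col) / len(members) for col in zip(*members)))
        for label, members in groups.items()
    }


def natural_cluster_count(merge_distance):
    """Pick the cluster count just before the largest jump in merge distance.

    merge_distance maps "number of clusters remaining after a merge" to the
    distance at which that merge happened.
    """
    counts = sorted(merge_distance, reverse=True)
    best_count, best_jump = None, float("-inf")
    for before, after in zip(counts, counts[1:]):
        jump = merge_distance[after] - merge_distance[before]
        if jump > best_jump:
            best_count, best_jump = before, jump
    return best_count, best_jump


# First six customers from the table, with hypothetical cluster labels.
rows = [(2.3, 436, 42, 7), (2.8, 408, 33, 6), (3.2, 432, 38, 5),
        (3.4, 431, 40, 5), (3.4, 456, 29, 6), (4.2, 537, 46, 4)]
labels = [1, 1, 1, 2, 2, 2]
for label, (size, center) in sorted(cluster_summary(rows, labels).items()):
    print(label, size, [round(x, 2) for x in center])

# Hypothetical merge distances read off a dendrogram.
heights = {6: 1.0, 5: 1.2, 4: 1.5, 3: 4.0, 2: 4.6, 1: 5.3}
print(natural_cluster_count(heights))  # (4, 2.5)
```

The largest-jump rule is a simple heuristic for reading a dendrogram; in practice the choice of cluster count is still a judgment call.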

50. a. Using the data given, apply hierarchical clustering with five clusters using Wait Time (min) and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure and specify single linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner to compute descriptive measures of the Wait Time and Customer Satisfaction Rating variables in each cluster. You can also visualize the clusters by creating a scatter plot with Wait Time (min) as the x-variable and Customer Satisfaction Rating as the y-variable.
b. Repeat part a using average linkage as the clustering method. Compare the clusters to the previous method.
ANSWER: a. Single linkage results in clusters with extreme sizes. There are three single-customer clusters (customer-40, customer-70, and customer-98). There is one 90-customer cluster with waiting time ranging from 1 min to 11.4 min.

b. Average linkage results in two two-customer clusters. Some of the single linkage clusters are closely related to the average linkage clusters. For example, Cluster 1 from single linkage is the merger of Clusters 1, 2, and 3 from average linkage, and Cluster 4 from single linkage is similar to Cluster 5 from average linkage.
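The tendency of single linkage to produce clusters of extreme sizes can be seen in a small agglomerative clustering sketch. The code below is an illustrative implementation with hypothetical function names and a hypothetical one-variable data set; it is not the XLMiner procedure and skips normalization because there is only one variable.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linkage_distance(c1, c2, method):
    """Dissimilarity between two clusters under the chosen linkage."""
    distances = [euclidean(u, v) for u in c1 for v in c2]
    if method == "single":
        return min(distances)
    if method == "complete":
        return max(distances)
    if method == "average":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, n_clusters, method="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical observations with slowly widening gaps between neighbors.
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]
print(sorted(len(c) for c in agglomerate(points, 2, "single")))   # [1, 5]
print(sorted(len(c) for c in agglomerate(points, 2, "average")))  # [2, 4]
```

Single linkage keeps absorbing the nearest neighbor into one growing chain and leaves a singleton, while average linkage yields more balanced groups on the same points.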


51. Using the data given, apply k-means clustering using Wait Time (min) as the variable with k = 3. Be sure to Normalize input data and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters for waiting time.
a. For the observations composing the cluster which has the low waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.
b. For the observations composing the cluster which has the medium waiting time, apply hierarchical clustering with Ward’s method to form three clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.
c. For the observations composing the cluster which has the high waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.
ANSWER: Below is the PivotTable on the data in KM_Cluster1.

a. The interval with the low waiting time is separated into two clusters with respect to Purchase amount, Age, and Customer satisfaction rating.

b. The interval with the medium waiting time is separated into three clusters. Two of them, with 22 and 19 customers, have similar Customer Age and Customer Satisfaction Rating. The other cluster differs primarily in terms of Customer Age and Customer Satisfaction Rating.

c. The interval with the high waiting time is separated into two clusters of 14 and 9 customers which have similar purchase amount and Customer Satisfaction Rating.

To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below.

LandValue ($)  BuildingValue ($)  Acres  Age    Price ($)
18,100         92,500             0.5    53.9   114,885
23,600         152,700            0.22   19.7   180,895
25,900         134,300            0.3    15.9   162,038
22,100         129,600            0.23   41     154,496
23,900         168,700            0.32   39.9   196,973
22,400         118,300            0.25   41.8   145,075
24,100         123,300            0.26   70.9   151,480
26,300         133,800            0.26   37.8   164,762
24,900         139,400            0.24   33     166,528
13,600         87,200             0.17   34.7   105,762
36,100         210,400            0.6    52.9   250,170
19,500         101,300            0.16   67.8   125,082
38,800         224,700            0.44   21.7   265,066
23,500         139,000            0.22   10.8   166,697
26,300         164,200            0.35   3.9    194,881
21,900         122,400            0.17   15.7   146,818
23,400         149,600            0.22   15.7   176,048
15,000         102,200            0.12   97.8   119,584
15,000         102,200            0.12   97.7   121,759
9,200          22,000             0.17   120.9  34,947
9,200          22,000             0.17   120.9  35,214
5,600          48,000             0.12   103.9  57,142

9,000          58,800             0.24   88     72,192
21,000         109,600            0.21   36.7   133,848
23,500         165,900            0.15   5.7    194,079
36,000         262,500            0.22   2.9    300,407
23,700         114,900            0.22   37.7   141,700
22,000         102,700            0.2    48.9   128,866
19,900         95,800             0.23   78.9   119,189
22,100         116,300            0.18   30.8   141,018
24,600         165,500            0.29   43     193,661
21,500         113,400            0.17   44.9   137,308
15,000         81,100             0.16   62.9   99,817
15,700         129,200            0.23   46.7   148,909
14,200         81,600             0.15   57.9   100,701
10,700         49,700             0.15   99.8   65,082
16,600         72,700             0.18   91.8   92,614
25,500         110,700            0.21   48     137,889
15,100         74,300             0.23   71.8   91,180
7,400          55,500             0.15   96.8   64,119
28,500         129,400            0.25   49.9   160,139
25,100         83,900             0.2    45.8   113,043
50,100         164,600            0.23   44     217,684
83,300         276,000            0.61   47.9   360,936
124,500        552,300            1.05   5.7    679,795
47,000         214,400            0.22   92.9   264,115
64,600         185,000            0.58   91     254,075
33,900         138,800            0.22   97.9   173,987
41,100         156,300            0.18   76     200,251
29,100         96,400             0.28   57.8   130,214
56,400         256,400            0.4    56.8   316,874
45,400         219,200            0.21   79.8   267,672
23,800         92,100             0.15   91.9   119,769
52,800         172,800            0.27   74.8   229,499
25,100         99,200             0.19   36.7   128,456
27,200         152,600            0.18   16.7   181,102
28,100         102,900            0.18   75.8   132,977
28,800         98,800             0.19   53.9   131,411
33,400         103,900            0.45   84.9   139,697
20,700         95,600             0.14   89.8   120,046
25,600         101,900            0.2    57.8   131,026
25,800         110,700            0.18   51.9   141,202
29,300         147,700            0.2    90.9   181,575
26,000         116,000            0.18   44     144,513
25,900         73,500             0.16   81.8   100,953
32,800         125,000            0.35   68.7   160,546
31,100         166,800            0.2    57.7   199,970
25,800         105,300            0.17   58.8   134,647
27,200         94,800             0.17   42.9   124,311
25,000         105,900            0.16   82     133,543
29,200         117,500            0.2    53.8   151,392
30,000         93,300             0.26   55.7   124,476
20,400         112,000            0.13   80.9   136,599
23,600         83,400             0.16   57.7   110,399

16,200         85,800             0.1    67     105,027
29,300         123,900            0.22   44.8   157,819
27,000         97,800             0.18   46.8   129,675
25,600         86,300             0.16   61.7   115,952
46,200         220,500            0.57   50.8   268,552
22,900         160,000            0.15   20.7   187,870
27,100         105,200            0.21   51.8   135,549
30,700         107,100            0.3    70     142,738
29,100         102,400            0.23   58     135,284
34,700         150,400            0.28   68.9   189,790
20,000         80,400             0.24   66.9   105,302
35,700         159,400            0.28   1.7    196,936
35,100         161,500            0.25   8.8    201,349
33,700         162,500            0.21   8.8    198,580
33,700         162,500            0.21   8.8    200,228
36,400         176,100            0.29   8.9    215,634
33,200         122,300            0.2    4.9    157,208
39,200         169,200            0.36   5.9    212,662
33,100         180,100            0.2    5.8    217,543
16,000         98,400             0.19   49.9   118,491
24,900         63,800             0.45   83.9   91,539
22,000         121,300            0.27   34.9   147,802
20,000         107,600            0.23   36.7   131,948
33,900         230,800            0.27   10     268,444
22,100         153,800            0.3    46.8   180,464
22,800         111,100            0.23   52     137,326
24,700         117,800            0.32   48.7   145,115
38,700         118,700            0.81   47.8   159,644
25,800         108,000            0.26   53.3   135,049
31,700         140,500            0.34   40.6   174,475
82,200         171,700            1.23   56.4   257,467
19,500         147,600            0.53   28.2   169,311
24,400         132,000            0.25   14.2   157,570
22,500         119,800            0.18   15.5   143,676
25,900         117,100            0.29   17.7   146,960
22,700         95,000             0.25   55.6   121,175
21,200         56,700             0.23   96.6   81,869
34,000         163,800            0.26   15.2   199,361
18,900         118,000            0.17   45.5   139,981
33,900         151,600            0.26   25.3   186,637
23,800         133,500            0.21   13.6   161,123
23,900         119,000            0.21   14.3   146,054
18,500         110,500            0.19   32.2   130,575
36,300         122,500            0.61   56.2   162,270
47,300         298,800            0.36   31.4   348,138
36,600         238,700            0.28   25.5   278,839

52. Using the data given, apply k-means clustering with k = 10 using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)?
ANSWER: We specify # Iterations = 50 and # Starts = 10. We use the default fixed seed of 12345. We see that the size of the clusters varies widely. There are two single-home clusters, Cluster-3 and Cluster-6. The least dense cluster is the seven-home cluster, Cluster-2, which includes homes with age ranging from 21.7 to 91 and price ranging from $159,644 to $360,936.
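The density measure used in this answer, and the rule of thumb comparing between-cluster distance to within-cluster distance, can be sketched as follows. This assumes that "average distance in the cluster" means the average Euclidean distance from each observation to its cluster centroid; the function names and the two small clusters are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(cluster):
    """Component-wise mean of the observations in a cluster."""
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))


def average_distance(cluster):
    """Average distance from each observation to the cluster centroid.

    A larger value means a less dense cluster; a single-observation
    cluster has an average distance of 0.
    """
    center = centroid(cluster)
    return sum(euclidean(p, center) for p in cluster) / len(cluster)


def separation_ratio(cluster, other):
    """Between-centroid distance divided by the within-cluster average distance."""
    within = average_distance(cluster)
    between = euclidean(centroid(cluster), centroid(other))
    return math.inf if within == 0 else between / within


# Hypothetical clusters of (already standardized) observations.
cluster_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
cluster_b = [(10, 1), (12, 1)]
print(round(average_distance(cluster_a), 3))             # 1.414
print(average_distance(cluster_b))                       # 1.0
print(round(separation_ratio(cluster_a, cluster_b), 3))  # 7.071
print(separation_ratio(cluster_b, cluster_a))            # 10.0
```

Both ratios are well above 1, which under the rule of thumb suggests these two hypothetical clusters are clearly separated.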

53. Using the data given, apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method. a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the clusters in the hierarchical clustering. b. Identify the cluster with the largest average price. Using all the variables, how would you characterize this cluster? c. Identify the smallest cluster. ANSWER: a. Below is the PivotTable obtained on the data in the “HC_Clusters1” worksheet.


b. Cluster 7 has the largest average price (about $296,295). This cluster is a collection of six homes characterized by a cluster center indicating relatively moderate land value of $41,500, the second largest average building value of $251,983, a relatively low average acres value of 0.33, and a relatively low average age of about 25 years.
c. Clusters 9 and 10 are the smallest clusters, each with a single home.

54. a. Using the data given, apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure and specify complete linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner. You can also visualize the clusters by creating a scatter plot with Acres as the x-variable and Price ($) as the y-variable.
b. Repeat part a using average group linkage as the clustering method. Compare the clusters to the previous method.
ANSWER: a. Complete linkage results in clusters with extreme sizes. There are two single-home clusters (home-45 and home-105). There is one 43-home cluster, Cluster 3, which has the average price centered at $124,927.


b. Average group linkage results in three single-home clusters. Only one of the complete linkage clusters is identical to a cluster from average group linkage. Cluster 10 of complete linkage and average group linkage are the same.


55. Using the data given, apply k-means clustering using Price ($) as the variable with k = 3. Be sure to Normalize input data and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters of price. a. For the observations composing the cluster with low home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster. b. For the observations composing the cluster with medium home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster. c. Comment on the cluster with the high home price. ANSWER: Below is the Pivot table on the data in KM_Cluster1.

a. The interval with the low home price is separated into three clusters with respect to Acres and Age. The characteristics of each cluster are as below.


b. The interval with the medium home price is separated into three clusters with respect to Acres and Age. The characteristics of each cluster are as below.

c. The third cluster that has the high home price is a single-home cluster with values for Acres and Age as 1.05 and 5.7, respectively, and price $679,795. 56. A retailer is interested in analyzing the shopping trends of men concerning the items shirts, pants, jeans, t-shirts, shoes, and belts. A sample of 50 male customers is selected and the data are given below. t-shirt Formal Shirt Formal Pants Formal Pants Formal Shirt Formal Pants Formal Shoes Formal Shoes Formal Pants Belt Formal Shirt Belt Formal Shoes Formal Shoes t-shirt Formal Shirt Jeans Formal Shoes Formal Shirt Formal Shoes Belt t-shirt Formal Pants Formal Pants Formal Shirt Belt Formal Shirt Formal Pants

Formal Pants t-shirt Formal Shoes Formal Pants Formal Shirt Formal Shirt Jeans t-shirt Jeans t-shirt Formal Pants Belt Formal Pants Jeans Formal Pants Belt Formal Pants Formal Pants Formal Pants t-shirt Formal Pants Formal Shirt

Belt

Formal Shoes Formal Pants t-shirt Formal Shoes

Belt Jeans t-shirt Formal Pants t-shirt Jeans

t-shirt Formal Shoes

Formal Shoes

Belt

Formal Shirt

Belt

Formal Shoes

Jeans Belt

Belt

Formal Shoes Formal Shirt

Belt Formal Shoes

t-shirt

Formal Shirt

Formal Shoes

Jeans

Formal Shirt t-shirt Formal Shirt Jeans Formal Shirt Formal Shoes

t-shirt Formal Shoes Belt Formal Shoes Formal Shoes

Jeans Formal Shoes

Formal Pants

Formal Shoes Jeans Formal Pants Formal Shoes Formal Pants Formal Shoes Belt Jeans Formal Shirt Formal Pants t-shirt Formal Shoes Belt Formal Pants Formal Shoes Jeans Formal Pants Belt Formal Shirt Jeans Formal Shirt Formal Shirt

Formal Pants Formal Pants Formal Shoes Formal Pants t-shirt Formal Pants Formal Shoes t-shirt Formal Pants Formal Shirt Jeans Formal Pants Formal Shoes Formal Shirt t-shirt Formal Shoes Formal Shoes Formal Pants Jeans Formal Pants Jeans Jeans

Jeans Formal Shoes Belt Formal Shirt

Formal Shirt

Belt Formal shirt Formal Pants Jeans Formal Shoes Formal shirt Belt Formal pant Formal Shoes

t-shirt Formal Pants

Formal Shirt

t-shirt

Formal Shoes

Formal shirt t-shirt

Belt

Belt t-shirt Jeans t-shirt Belt Formal Shoes Formal Pants

Formal Shirt Formal Shirt

Formal Shirt

Belt t-shirt

Formal Shoes

Formal Shoes

Belt

Jeans

Formal Shirt

a. Using a minimum support of 20 transactions and a minimum confidence of 50 percent, use XLMiner to generate a list of association rules. How many rules satisfy this criterion?
b. Using the list of rules from part (a), consider the rule with the largest lift ratio. Interpret what this rule is saying about the relationship between the antecedent item set and consequent item set.
c. Interpret the support count of the item set composed of all the items involved in the rule with the largest lift ratio.
d. Interpret the confidence of the rule with the largest lift ratio.
e. Interpret the lift ratio of the rule with the largest lift ratio.
ANSWER: a. Fourteen rules have a support count of at least 20 and a confidence of at least 50%.
b. Antecedent: Formal Pants, Formal Shoes; Consequent: Formal Shirt. If a customer purchases formal pants and formal shoes, then he also purchases formal shirts.
c. The support count of the item set involved in this rule is 23, meaning that formal pants, a formal shirt, and formal shoes have been purchased 23 times together.
d. The confidence of this rule is 79.31%, which means that of the 29 times formal pants and formal shoes were purchased, 23 times formal shirts were also purchased.
e. The lift ratio of this rule is 1.37, which means that a customer who purchased formal pants and formal shoes is 37% more likely than a randomly selected customer to have purchased a formal shirt.
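The three measures interpreted in this answer can be computed directly from a list of transactions. The sketch below uses hypothetical function names and a small hypothetical transaction list (not the 50 transactions of the question); the confidence calculation is the same ratio that gives 23/29 ≈ 79.31% in part d.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def rule_metrics(transactions, antecedent, consequent):
    """Support count, confidence, and lift ratio for 'if antecedent, then consequent'.

    Assumes the antecedent and the consequent each occur in at least one transaction.
    """
    n = len(transactions)
    both = support_count(transactions, set(antecedent) | set(consequent))
    confidence = both / support_count(transactions, antecedent)
    benchmark = support_count(transactions, consequent) / n  # benchmark confidence
    return {"support": both, "confidence": confidence, "lift": confidence / benchmark}


# Hypothetical transactions, for illustration only.
transactions = [
    {"Formal Pants", "Formal Shoes", "Formal Shirt"},
    {"Formal Pants", "Formal Shoes", "Formal Shirt"},
    {"Formal Pants", "Formal Shoes"},
    {"Formal Shirt", "Belt"},
    {"Jeans", "t-shirt"},
]
metrics = rule_metrics(transactions, {"Formal Pants", "Formal Shoes"}, {"Formal Shirt"})
print(metrics["support"])               # 2
print(round(metrics["confidence"], 4))  # 0.6667
print(round(metrics["lift"], 4))        # 1.1111
```

A lift ratio above 1 means the consequent is more likely among transactions containing the antecedent than among transactions in general.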


57. _____ clustering method defines the similarity between two clusters as the similarity of the pair of observations (one from each cluster) that are the most different.
ANSWER: Complete linkage

58. _____ uses the averaging concept of cluster centroids to define between-cluster similarity.
ANSWER: Centroid linkage

59. Platinum Gym has 10,000 members. Of these, 1,500 memberships included Unlimited Fitness Training and use of the tanning salon, and 750 of those 1,500 also included Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Calculate the confidence of this rule.
ANSWER: 0.5

60. Platinum Gym has 10,000 members. Of these, 1,500 memberships included Unlimited Fitness Training and use of the tanning salon, and 750 of those 1,500 also included Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Given that the total number of transactions that include C is 3,000, calculate the benchmark confidence.
ANSWER: 0.3

61. Platinum Gym has 10,000 members. Of these, 1,500 memberships included Unlimited Fitness Training and use of the tanning salon, and 750 of those 1,500 also included Unlimited Hydromassage. If Fitness Training is considered A, use of the tanning salon is considered B, and Hydromassage is considered C, then the association rule for these sales becomes "If A and B are purchased, then C is also purchased." Given that the total number of transactions that include C is 3,000, calculate the lift for this rule.
ANSWER: 1.67

62. Using the table below, calculate the cosine distance between Document 2 and Document 9.

Document   Term: Excellent   Term: Poor
1          2                 1
2          5                 1
3          5                 2
4          5                 1
5          0                 3
6          1                 5
7          2                 2
8          5                 1
9          1                 3
10         2                 3

ANSWER: 0.495
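
The association-rule measures in questions 59 to 61 are ratios of counts. The sketch below is illustrative only: the function and argument names are assumptions introduced here, and the counts are the ones stated in the questions.

```python
def rule_metrics(n_total, n_antecedent, n_both, n_consequent):
    """Return (confidence, benchmark confidence, lift) for 'if antecedent, then consequent'."""
    # Share of antecedent transactions that also contain the consequent.
    confidence = n_both / n_antecedent
    # Share of all transactions that contain the consequent.
    benchmark = n_consequent / n_total
    lift = confidence / benchmark
    return confidence, benchmark, lift


# Counts from questions 59-61: 10,000 memberships, 1,500 with A and B,
# 750 with A, B, and C, and 3,000 with C.
confidence, benchmark, lift = rule_metrics(10_000, 1_500, 750, 3_000)
print(confidence, benchmark, round(lift, 2))
```

A lift above 1 suggests that buying A and B together makes C more likely than it is among members overall.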

Chapter 06 - Statistical Inference

Multiple Choice

1. The finite population correction factor should be used in the computation of the standard deviation of the sample mean and of the sample proportion when n / N is _____.
a. greater than 0.05
b. greater than 0.5
c. less than 0.05
d. less than 0.5
ANSWER: a

2. The purpose of statistical inference is to make estimates or draw conclusions about a _____.
a. sample based upon information obtained from the population
b. population based upon information obtained from the sample
c. statistic based upon information obtained from the population
d. mean of the sample based upon the mean of the population
ANSWER: b

3. A parameter is a numerical measure from a population, such as _____.
a. μ
b.
c. s
d.
ANSWER: a

4. A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size _____.
a. N and n have the same probability of being selected
b. n has a probability of 0.5 of being selected
c. n has a probability of 0.05 of being selected
d. n has the same probability of being selected
ANSWER: d

5. The random numbers generated using Excel's RAND function follow a _____ probability distribution between 0 and 1.
a. normal
b. uniform
c. binomial
d. random
ANSWER: b

6. A random sample selected from an infinite population is a sample selected such that each element selected comes from the same _____ and each element is selected _____.
a. population; independently
b. population; simultaneously
c. sample; independently
d. sample; simultaneously
ANSWER: a

7. The value of the _____ is used to estimate the value of the population parameter.
a. population statistic
b. sample parameter
c. population estimate
d. sample statistic
ANSWER: d

8. The population parameter value and the point estimate differ because a sample is not a census of the entire population, but it is being used to develop the _____.
a. population parameter
b. point estimate
c. population mean
d. standard error
ANSWER: b

9. The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?
a. 0.25
b. 0.35
c. 0.53
d. 0.65
ANSWER: b

10. A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a _____.
a. population parameter
b. point estimate
c. population mean
d. standard error
ANSWER: b

11. The basis for using a normal probability distribution to approximate the sampling distribution of the sample mean and the sample proportion is _____.
a. Chebyshev’s theorem
b. the empirical rule
c. the central limit theorem
d. Bayes’ theorem
ANSWER: c
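
The ideas in questions 4, 7, and 10 can be illustrated with a small simulation. The sketch below draws a simple random sample from a finite population and uses the sample mean as a point estimate. The population values, sizes, and seed are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical finite population of N = 400 values (not taken from the test bank).
population = [random.randint(40, 80) for _ in range(400)]

# random.sample draws without replacement, so every possible sample
# of size n has the same probability of being selected.
n = 31
sample = random.sample(population, n)

point_estimate = mean(sample)       # sample statistic
population_mean = mean(population)  # parameter, known here only because the population is simulated
print(point_estimate, population_mean)
```

The two printed values generally differ because the sample is not a census of the population.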

12. When the expected value of the point estimator is equal to the population parameter it estimates, it is said to be _____.
a. unbiased
b. precise
c. symmetric
d. predicted
ANSWER: a

13. A statistics teacher started class one day by drawing the names of 10 students out of a hat and asking them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the population distribution of the number of pushups that can be done is approximately normal. What is the standard error of the mean?
a. 0.900
b. 2.876
c. 3.061
d. 4.743
ANSWER: b

14. If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to _____.
a. have low variability
b. be an unbiased estimator of the population parameter
c. have high precision
d. be a random estimator of the population parameter
ANSWER: b

15. For a population with an unknown distribution, the form of the sampling distribution of the sample mean is _____.
a. approximately normal for small sample sizes
b. exactly normal for large sample sizes
c. exactly normal for small sample sizes
d. approximately normal for large sample sizes
ANSWER: d

16. The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?
a. 0.039
b. 0.050
c. 0.350
d. 0.455
ANSWER: a

17. An estimate of a population parameter that provides an interval of values believed to contain the value of the parameter is known as the _____.
a. confidence level
b. interval estimate
c. parameter level
d. population estimate
ANSWER: b

18. In order to determine an interval for the mean of a population with unknown standard deviation, a sample of 24 items is selected. The mean of the sample is determined to be 23. The number of degrees of freedom for reading the t value is _____.
a. 21
b. 22
c. 23
d. 24
ANSWER: c

19. As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution _____.
a. becomes larger
b. becomes smaller
c. stays the same
d. fluctuates
ANSWER: b

20. The t value for a 99% confidence interval estimation based upon a sample of size 10 is _____.
a. 1.645
b. 1.812
c. 2.576
d. 3.249
ANSWER: d

21. In interval estimation, as the sample size becomes larger, the interval estimate _____.
a. becomes narrower
b. becomes wider
c. remains the same, since the mean is not changing
d. gets closer to 1.96
ANSWER: a

22. A statistics teacher started class one day by drawing the names of 10 students out of a hat and asking them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the population distribution of the number of pushups that can be done is approximately normal. The 95% confidence interval for the true mean number of pushups that can be done is _____.
a. 5.75 to 24.25
b. 8.56 to 21.40
c. 11.31 to 18.55
d. 13.02 to 16.98
ANSWER: b
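
A minimal sketch of how an interval such as the one in question 22 is assembled from summary statistics, using point estimate ± margin of error. The function name is an assumption introduced here, and the critical value 2.262 is the tabulated t value for 9 degrees of freedom with 0.025 in the upper tail, supplied by hand rather than computed.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown: xbar +/- t * s / sqrt(n)."""
    standard_error = s / sqrt(n)
    margin = t_crit * standard_error
    return xbar - margin, xbar + margin


# Summary statistics from question 22: n = 10, sample mean 15, sample standard deviation 9.
lower, upper = t_interval(xbar=15, s=9, n=10, t_crit=2.262)
print(round(lower, 2), round(upper, 2))
```

Because the critical value comes from the t distribution with n – 1 degrees of freedom, the interval is wider than one built with 1.96 from the standard normal distribution.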

23. A statistics teacher started class one day by drawing the names of 10 students out of a hat and asking them to do as many pushups as they could. The 10 randomly selected students averaged 15 pushups per person with a standard deviation of 9 pushups. Suppose the population distribution of the number of pushups that can be done is approximately normal. If we would like to capture the population mean with 95% confidence, the margin of error would be _____.
a.
b.
c.
d.
ANSWER: c

24. A sample of 37 AA batteries had a mean lifetime of 584 hours. A 95% confidence interval for the population mean was 579.2 < μ < 588.8. Which statement is the correct interpretation of the results?
a. We are 95% confident that the mean lifetime of all the batteries in the population is between 579.2 hours and 588.8 hours.
b. The probability that the population mean is between 579.2 hours and 588.8 hours is 0.95.
c. 95% of the batteries in the sample had lifetimes between 579.2 hours and 588.8 hours.
d. None of these statements correctly interprets the results.
ANSWER: a

25. In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump for President. Determine a 95% confidence interval for the proportion of all the registered voters who will vote for Trump.
a. (0.25, 0.34)
b. (0.27, 0.32)
c. (0.29, 0.30)
d. Cannot be determined from the information given.
ANSWER: a

26. Using α = 0.04, a confidence interval for a population proportion is determined to be 0.65 to 0.75. If the level of significance is decreased, the interval for the population proportion _____.
a. becomes narrower
b. becomes wider
c. does not change
d. remains the same
ANSWER: b

27. The CEO of a company wants to estimate the percent of employees who use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. Compute the 95% confidence interval for the population proportion.
a.
b.
c.
d.
ANSWER: a

28. Two approaches to drawing a conclusion in a hypothesis test are _____.
a. p-value and critical value
b. one-tailed and two-tailed
c. Type I and Type II
d. null and alternative
ANSWER: a

29. A Type I error is committed when _____.
a. a true alternative hypothesis is not accepted
b. a true null hypothesis is rejected
c. the critical value is greater than the value of the test statistic
d. the validity of a claim was rejected
ANSWER: b

30. What are the two decisions that you can make from performing a hypothesis test?
a. Reject the null hypothesis; Fail to reject the null hypothesis
b. Accept the null hypothesis; Accept the alternative hypothesis
c. Make a Type I error; Make a Type II error
d. Reject the alternative hypothesis; Accept the null hypothesis
ANSWER: a

31. The null and alternative hypotheses for a one-proportion z test are given as H0: p = 0.8, Ha: p < 0.8. This hypothesis test is _____.
a. lower-tailed
b. upper-tailed
c. two-tailed
d. incorrectly stated
ANSWER: a

32. A pizza shop advertises that they deliver in 30 minutes or less or it is free. People who live in homes that are located on the opposite side of town believe it will take the pizza shop longer than 30 minutes to make and deliver
the pizza. Write the null and alternative hypotheses that can be used to conduct a significance test.
a. H0: μ ≤ 30, Ha: μ > 30
b. H0: μ < 30, Ha: μ > 30
c. H0: μ ≥ 30, Ha: μ < 30
d. H0: μ > 30, Ha: μ < 30
ANSWER: a

33. A pizza shop advertises that they deliver in 30 minutes or less or it is free. People who live in homes that are located on the opposite side of town believe it will take the pizza shop longer than 30 minutes to make and deliver the pizza. A random sample of 50 deliveries to homes across town was taken and the mean time was computed to be 32 minutes. What is the appropriate symbol to represent the value, 32?
a.
b.
c. n = 32
d.
ANSWER: b

34. The proportion of dental procedures that are extractions is 0.16. Which of the following exemplifies a Type I error in this situation?
a. We reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually different from 0.16.
b. We fail to reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually 0.16.
c. We reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually 0.16.
d. We fail to reject the claim that the proportion of dental procedures that are extractions is 0.16 when the proportion is actually different from 0.16.
ANSWER: c

35. Larger values of α have the disadvantage of increasing the probability of making a _____.
a. Type I error
b. Type II error
c. random sampling error
d. normal probability error
ANSWER: a

36. The average number of hours for a random sample of mail order pharmacists from company A was 50.1 hours last year. It is believed that changes to medical insurance have led to a reduction in the average work week. To test the validity of this belief, the hypotheses are _____.
a. H0: μ > 50.1, Ha: μ < 50.1
b. H0: μ = 50.1, Ha: μ = 50.1
c. H0: μ ≤ 50.1, Ha: μ > 50.1
d. H0: μ ≥ 50.1, Ha: μ < 50.1
ANSWER: d

37. The owners of a fast food restaurant have automatic drink dispensers to help fill orders more quickly. When the 12-ounce button is pressed, they would like for exactly 12 ounces of beverage to be dispensed. There is, however, some variation in this amount. The company does not want the machine to systematically overfill or underfill the cups. Which of the following gives the correct set of hypotheses?
a. H0: μ > 12, Ha: μ < 12
b. H0: μ = 12, Ha: μ ≠ 12
c. H0: μ ≤ 12, Ha: μ > 12
d. H0: μ ≥ 12, Ha: μ < 12
ANSWER: b

38. A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce the parts will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. Determine the p value of the test statistic if the sample mean amount of time is x̄ = 43.118 hours with the sample standard deviation s = 5.5 hours.
a. 0.04973
b. 0.04999
c. 0.95818
d. 0.04354
ANSWER: b

39. A large manufacturing plant has analyzed the amount of time required to produce an electrical part and determined that the times follow a normal distribution with mean time μ = 45 hours. The production manager has developed a new procedure for producing the part. He believes that the new procedure will decrease the population mean amount of time required to produce the part. After training a group of production line workers, a random sample of 25 parts will be selected and the average amount of time required to produce them will be determined. If the switch is made to the new procedure, the cost to implement the new procedure will be more than offset by the savings in manpower required to produce the parts. Use the hypotheses: H0: μ ≥ 45 hours and Ha: μ < 45 hours. If the sample mean amount of time is x̄ = 43.118 hours with the sample standard deviation s = 5.5 hours, give the appropriate conclusion, for α = 0.025.
a. Do not reject H0, do not switch to the new procedure.
b. Reject H0, switch to the new procedure.
c. Reject H0, do not switch to the new procedure.
d. Do not reject H0, switch to the new procedure.
ANSWER: a

40. A one-tailed test is a hypothesis test in which the rejection region is _____.
a. in both tails of the sampling distribution
b. in one tail of the sampling distribution
c. only in the lower tail of the sampling distribution
d. only in the upper tail of the sampling distribution
ANSWER: b

41. Determine whether the alternative hypothesis is left-tailed, right-tailed, or two-tailed: H0: μ = 11, Ha: μ > 11.
a. Left-tailed
b. Right-tailed
c. Two-tailed
d. There is not enough information to make a determination.
ANSWER: b

42. Which statement is NOT true?
a. The greater the level of confidence, the more likely it is that the confidence interval actually includes the true population mean.
b. The greater the level of confidence, the larger the z-score.
c. The greater the level of confidence, the wider the confidence interval.
d. Rejecting the null hypothesis when it is true is a Type II error.
ANSWER: d

43. Statistical significance at the 0.01 level is _____ than significance at the 0.05 level.
a. more difficult to achieve
b. easier to achieve
c. less costly
d. less informative
ANSWER: a

44. You are _____ to commit a Type I error using the 0.05 level of significance than using the 0.01 level of significance.
a. more likely
b. less likely
c. equally likely
d. twice as likely
ANSWER: a

45. Which statement is NOT true?
a. Rejecting the null hypothesis when it is true is a Type I error.
b. The probability of making a Type I error is symbolized by α.
c. Failing to reject the null hypothesis when it is false is a Type I error.
d. Type II error can occur for both one and two-tailed tests.
ANSWER: c

46. A student wants to determine if pennies are really fair when flipped, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. If p denotes the true probability of a penny landing heads up when flipped, what are the appropriate null and alternative hypotheses?
a. H0: p ≥ 0.5, Ha: p < 0.5.
b. H0: p ≤ 0.5, Ha: p ≠ 0.5.
c. H0: p = 0.5, Ha: p ≠ 0.5.
d. H0: p ≥ 28, Ha: p < 28.
ANSWER: c

47. One reason a sample may fail to represent the population of interest is _____.
a. measurement error
b. sampling error
c. statistical inference
d. population proportion
ANSWER: b

48. The processes that generate big data can be described by the following four attributes or dimensions:
a. volume, variety, veracity, and velocity
b. volume, variability, veracity, and velocity
c. variety, vectors, veracity, and velocity
d. tall data, wide data, narrow data, and big data
ANSWER: a

Subjective Short Answer

49. A simple random sample of 11 observations from a population containing 400 female soccer players was taken, and the following values were obtained.

48  53  72  56  63  64  56  76  50  46  73

What is the value of the point estimate of the population mean?
ANSWER: 59.7

50. What is the general form of an interval estimate?
ANSWER: point estimate ± margin of error

51. It is impossible to construct a sampling frame for an _____ population.
ANSWER: infinite

52. Numerical characteristics of the population are called _____.
ANSWER: parameters

53. The medical director of a company looks at the medical records of all 50 employees and finds that the mean systolic blood pressure for these employees is 126.07. The value of 126.07 is symbolized by _____.
ANSWER: μ

54. The sample mean is the point estimator of what population parameter?
ANSWER: μ

55. Sample statistics, such as x̄, s, or p̄, that provide an estimate of the population parameter are also known as _____.
ANSWER: point estimates

56. As the sample size increases, the standard error of the mean _____.
ANSWER: decreases

57. A simple random sample of 100 observations was taken from a large population. The sample mean and the standard deviation were determined to be 80 and 12, respectively. Calculate the standard error of the mean.
ANSWER: 1.20

58. A sample of 92 observations is taken from an infinite population. The sampling distribution of x̄ is approximately normal because of what theorem?
ANSWER: Central limit theorem

59. A random sample of 150 people was taken from a very large population. Ninety of the people in the sample were female. What is the standard error of the proportion?
ANSWER: 0.04

60. Random samples of size 100 are taken from an infinite population whose population proportion is 0.2. What are the mean and standard deviation of the sample proportion?
ANSWER: 0.2 and 0.04

61. As a rule of thumb, the sampling distribution of the sample proportion can be approximated by a normal probability distribution when _____.
ANSWER: n(1 – p) ≥ 5 and np ≥ 5.

62. A cellular phone company claims that the mean amount spent per month is more than $75. A test is made of H0: μ = 75 versus Ha: μ > 75. The null hypothesis is rejected. State the appropriate conclusion.
ANSWER: There is sufficient evidence to support the claim that the mean amount spent per month is greater than $75.

63. In a survey of 3539 female university students ages 18 to 22, 401 say they live in off-campus housing. If you constructed 90% and 95% confidence intervals for the population proportion, how would they differ? As the level of confidence _____, the confidence interval gets _____.
ANSWER: increases; wider. Also acceptable are the phrases decreases; narrower.

64. What is the difference between the standardized normal distribution (used for tests with z) and the t distribution?
ANSWER: The t distribution is used when the population standard deviation is unknown and the sample size is small. The t-test statistic is calculated using n – 1 degrees of freedom.

65. A car manufacturer states that a new make and model of car has a mean gas mileage of at least 35 miles per gallon. A
consumer group thinks that this is overstated. A test of H0: μ ≥ 35 versus Ha: μ < 35 fails to reject the null hypothesis. State the appropriate conclusion.
ANSWER: There is insufficient evidence to support the suspicion that the true mean gas mileage is less than 35 miles per gallon.

66. A simple random sample of 11 observations from a population containing 400 female soccer players was taken, and the following values were obtained.

48  53  72  56  63  64  56  76  50  46  73

Calculate a 95% confidence interval for the population mean.
ANSWER: (52.64, 66.82)

67. A simple random sample of 100 students was asked, “Have you eaten pizza within the past week?” Of the 100 students, 82 said “yes.” Calculate a 95% confidence interval for the true population proportion.
ANSWER: (0.7447, 0.8953)

68. Consumer data collected from shopper cards show that, nationwide, 97% of U.S. shoppers buy toilet paper. A grocery store manager thinks that more than 97% of shoppers at his store buy toilet paper. He takes a random sample of 500 shoppers and finds that 490 purchased toilet paper. He wants to test H0: p ≤ 0.97 versus Ha: p > 0.97. What is the value of the test statistic and the associated p value?
ANSWER: z = 1.31, p value = 0.0950.

69. A cell phone user wants to determine what data plan she should get on her new contract. She selects a random sample of 15 months and finds her average usage to be 11.25 GB with a standard deviation of 2.5 GB. She wants to test H0: μ ≥ 12 GB versus Ha: μ < 12 GB. What is the value of the test statistic and the associated p value?
ANSWER: t = –1.16, p value = 0.1324.
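
The proportion calculations in questions 67 and 68 can be sketched with the Python standard library. The function names below are assumptions introduced for illustration. Both functions use the normal approximation, with the standard error based on the sample proportion for the interval and on the hypothesized proportion for the test.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_bar = successes / n
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


def upper_tail_proportion_test(successes, n, p0):
    """z statistic and p value for H0: p <= p0 versus Ha: p > p0."""
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses the hypothesized p0
    return z, 1 - Z.cdf(z)


print(proportion_interval(82, 100))                # inputs from question 67
print(upper_tail_proportion_test(490, 500, 0.97))  # inputs from question 68
```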

Chapter 07 - Linear Regression

Multiple Choice

1. _____ is a statistical procedure used to develop an equation showing how two variables are related.
a. Regression analysis
b. Data mining
c. Time series analysis
d. Factor analysis
ANSWER: a

2. A regression analysis involving one independent variable and one dependent variable is referred to as a _____.
a. factor analysis
b. time series analysis
c. simple linear regression
d. data mining
ANSWER: c

3. The population parameters that describe the y-intercept and slope of the line relating y and x, respectively, are _____.
a. β0 and β1
b. y and x
c. a and b
d. α and β
ANSWER: a

4. In a simple linear regression model, y = β0 + β1x + ε, the parameter β1 represents the _____.
a. intercept
b. slope of the true regression line
c. mean value of x
d. error term
ANSWER: b

5. In the simple linear regression model, the _____ accounts for the variability in the dependent variable that cannot be explained by the linear relationship between the variables.
a. constant term
b. error term
c. model parameter
d. residual
ANSWER: b

6. In a linear regression model, the variable that is being predicted or explained is known as the _____. It is denoted by y and is often referred to as the response variable.
a. dependent variable
b. independent variable
c. residual variable
d. regression variable
ANSWER: a

7. The graph of the simple linear regression equation is a(n) _____.
a. ellipse
b. hyperbola
c. parabola
d. straight line
ANSWER: d

8. In the graph of the simple linear regression equation, the parameter β0 represents the _____ of the true regression line.
a. slope
b. x-intercept
c. y-intercept
d. end-point
ANSWER: c

9. In the graph of the simple linear regression equation, the parameter β1 is the _____ of the true regression line.
a. slope
b. x-intercept
c. y-intercept
d. end-point
ANSWER: a

10. In a linear regression model, the variable (or variables) used for predicting or explaining values of the response variable are known as the _____. It (they) is (are) denoted by x.
a. dependent variable
b. independent variable
c. residual variable
d. regression variable
ANSWER: b

11. In a simple linear regression analysis, the quantity that gives the amount by which the dependent variable changes for a unit change in the independent variable is called the _____.
a. coefficient of determination
b. slope of the regression line
c. correlation coefficient
d. standard error
ANSWER: b

12. A _____ is used to visualize sample data graphically and to draw preliminary conclusions about the possible relationship between the variables.
a. contingency table
b. scatter chart
c. Gantt chart
d. pie chart
ANSWER: b

13. A procedure for using sample data to find the estimated regression equation is _____.
a. point estimation
b. interval estimation
c. the least squares method
d. extrapolation
ANSWER: c

14. When the mean value of the dependent variable is independent of variation in the independent variable, the slope of the regression line is _____.
a. positive
b. zero
c. negative
d. infinite
ANSWER: b

15. The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation is known as the _____.
a. constant term
b. error term
c. residual
d. model parameter
ANSWER: c

16. The _____ is the range of values of the independent variables in the data used to estimate the regression model.
a. confidence interval
b. codomain
c. experimental region
d. validation set
ANSWER: c
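
The least squares method, residuals, and the experimental region (questions 13, 15, and 16) can be seen together in a few lines. This is a sketch on hypothetical data that does not come from the test bank, and the variable and function names are assumptions introduced here.

```python
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least squares estimates of the slope (b1) and y-intercept (b0).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residual = observed y minus the y predicted by the estimated regression equation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Experimental region = range of x values in the data used to fit the model.
experimental_region = (min(x), max(x))


def is_outside_experimental_region(x_new):
    """True when x_new lies outside the range of x used to estimate the model."""
    return not (experimental_region[0] <= x_new <= experimental_region[1])


print(b0, b1, residuals, is_outside_experimental_region(7))
```

With an intercept in the model, the least squares residuals sum to zero up to rounding, which is a quick check on the fit.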

17. Prediction of the value of the dependent variable outside the experimental region is called _____.
a. interpolation
b. forecasting
c. averaging
d. extrapolation
ANSWER: d

18. Prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, . . . , xq that are outside the experimental range is called _____.
a. dummy variable
b. overfitting
c. extrapolation
d. interaction
ANSWER: c

19. The _____ is a measure of the error that results from using the estimated regression equation to predict the values of the dependent variable in the sample.
a. sum of squares due to regression (SSR)
b. error term
c. sum of squares due to error (SSE)
d. residual
ANSWER: c

20. The least squares regression line minimizes the sum of the _____.
a. differences between actual and predicted y values
b. absolute deviations between actual and predicted y values
c. absolute deviations between actual and predicted x values
d. squared differences between actual and predicted y values
ANSWER: d

21. What would be the value of the sum of squares due to regression (SSR) if the total sum of squares (SST) is 25.32 and the sum of squares due to error (SSE) is 6.89?
a. 31.89
b. 19.32
c. 18.43
d. 15.32
ANSWER: c

22. The _____ is a measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
a. residual
b. coefficient of determination
c. dummy variable
d. interaction variable
ANSWER: b

23. The coefficient of determination _____.
a. takes values between –1 and +1
b. is equal to zero for a perfect fit
c. is equal to negative one for the poorest fit
d. is used to evaluate the goodness of fit
ANSWER: d
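
Questions 19 to 23 rest on the identity SST = SSR + SSE and on the coefficient of determination, SSR / SST. A minimal sketch using the numbers from question 21 follows. The function name is an assumption introduced here, and the coefficient of determination it prints is an extra illustration that the question does not ask for.

```python
def sum_of_squares_summary(sst, sse):
    """Return (SSR, coefficient of determination) from SST and SSE."""
    ssr = sst - sse        # rearranged from SST = SSR + SSE
    r_squared = ssr / sst  # proportion of the variability in y explained by the regression
    return ssr, r_squared


# Values from question 21: SST = 25.32, SSE = 6.89.
ssr, r_squared = sum_of_squares_summary(sst=25.32, sse=6.89)
print(round(ssr, 2), round(r_squared, 3))
```

Because SSR can be no smaller than 0 and no larger than SST, the coefficient of determination always lies between 0 and 1.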

24. What would be the coefficient of determination if the total sum of squares (SST) is 23.29 and the sum of squares due to regression (SSR) is 10.03?
a. 2.32
b. 0.43
c. 0.19
d. 0.89
ANSWER: b

25. Regression analysis involving one dependent variable and more than one independent variable is known as _____.
a. simple regression
b. linear regression
c. multiple regression
d. None of these are correct.
ANSWER: c

26. The process of making estimates and drawing conclusions about one or more characteristics of a population through analysis of sample data drawn from the population is known as _____.
a. inductive inference
b. deductive inference
c. statistical inference
d. Bayesian inference
ANSWER: c

27. The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture is known as _____.
a. postulation
b. hypothesis testing
c. statistical inference
d. empirical research
ANSWER: b

28. _____ refers to the use of sample data to calculate a range of values that is believed to include the value of the population parameter.
a. Interval estimation
b. Hypothesis testing
c. Statistical inference
d. Point estimation
ANSWER: a

29. A normally distributed error term with a mean of zero would _____.
a. have values that are symmetric about the variance
b. allow more accurate modeling
c. yield biased regression estimates
d. be a hyperbolic curve
ANSWER: b

30. The scatter chart below displays the residuals versus the dependent variable, x. Which of the following conclusions can be drawn from the scatter chart given below?
a. The residuals have an increasing variance as the dependent variable increases.
b. The model captures the relationship between the variables accurately.
c. The regression model follows the standard normal probability distribution.
d. The residual distribution is consistently scattered about zero.
ANSWER: a

31. The scatter chart below displays the residuals versus the dependent variable, t. Which of the following conclusions can be drawn based upon this scatter chart?
a. Model is time-invariant.
b. Model captures the relationship between the variables accurately.
c. Residuals are not independent.
d. Residuals are normally distributed.
ANSWER: c

32. The scatter chart below displays the residuals versus the dependent variable, x. Which of the following conclusions can be drawn based upon this scatter chart?
a. The residuals have a constant variance.
b. The model fails to capture the relationship between the variables accurately.
c. The model overpredicts the value of the dependent variable for small values and large values of the independent variable.
d. The residuals are normally distributed.
ANSWER: b

33. The scatter chart below displays the residuals versus the dependent variable, x. Which of the following conclusions can be drawn based upon this scatter chart?
a. The residuals have a constant variance.
b. The model captures the relationship between the variables accurately.
c. The model underpredicts the value of the dependent variable for intermediate values of the independent variable.
d. The residual distribution is not normally distributed.
ANSWER: d

34. The _____ is an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.
a. residual
b. tolerance factor
Page 6


Name:

Class:

Date:

Chapter 07 - Linear Regression c. confidence level ANSWER: c

d. accuracy level

35. _____ refers to the degree of correlation among independent variables in a regression model. a. Multicollinearity b. Tolerance c. Rank d. Confidence level ANSWER: a
36. The degree of correlation among independent variables in a regression model is called _____. a. multicollinearity b. interaction c. the coefficient of determination d. the sum of squared errors (SSE) ANSWER: a
37. _____ is used to test the hypothesis that the values of the regression parameters β1, β2, ..., βq are all zero. a. An F test b. A t test c. The least squares method d. Extrapolation ANSWER: a
38. A variable used to model the effect of categorical independent variables in a regression model, which generally takes only the value zero or one, is called _____. a. a residual b. the coefficient of determination c. a dummy variable d. interaction ANSWER: c
39. A variable used to model the effect of categorical independent variables in a regression model is known as a _____. a. dependent variable b. response c. dummy variable d. predictor variable ANSWER: c
40. Which of the following regression models is used to model a nonlinear relationship between the independent and dependent variables by including the independent variable and the square of the independent variable in the model? a. Multiple regression model b. Quadratic regression model c. Simple regression model d. Least squares regression model ANSWER: b
41. The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model is referred to as the _____. a. milestone b. knot c. tipping point d. watchpoint ANSWER: b
42. _____ refers to the scenario in which the relationship between the dependent variable and one independent variable is different at different values of a second independent variable. a. Interaction b. Multicollinearity c. Autocorrelation d. Covariance

ANSWER: a
43. Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population, is termed _____. a. approximation b. hypothesizing c. overfitting d. postulating ANSWER: c
44. Assessing the regression model on data other than the sample data that was used to generate the model is known as _____. a. approximation b. cross-validation c. graphical validation d. postulation ANSWER: b
45. _____ is the data set used to build the candidate models. a. Range b. Codomain c. Validation set d. Training set ANSWER: d
46. _____ refers to the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable. a. Codomain b. Training set c. Validation set d. Range ANSWER: c
47. A _____ is an interval estimate of an individual y value, given values of the independent variables. a. prediction interval b. confidence interval c. interval estimation d. regression ANSWER: a

Subjective Short Answer

48. Listed below are data on profit and market capitalization for a sample of 15 different firms in the United States.

Profits ($ millions) y    Market Capitalization ($ millions) x
296.2                     1,936.9
–25                       1,171.8
4,085                     55,135.8
6,558                     97,417.2
12,525                    95,198.9
3,394                     53,579.7
442.8                     12,466.3
633.1                     8,894.3
3,528                     65,872.4
2,698                     25,661.3
1,200.65                  19,854.7
11,987                    195,643.8
641.8                     10,447.8
5,043                     66,695.5
5,206                     53,558.4

a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between market capitalization and profit? b. Use the data to develop an estimated regression equation that could be used to estimate a firm’s profit based on its market capitalization. What is the estimated regression model? c. What is the predicted profit for the market capitalization of 70,721.3 (million)? ANSWER: a. The scatter chart with Market Capitalization as the independent variable follows.

This scatter chart indicates there may be a positive linear relationship between market capitalization and profit. b. The following Excel output provides the estimated regression equation that could be used to estimate the firm’s profit (y) based on its market capitalization (x).


The estimated simple linear regression equation is [equation not recoverable from source].

The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart. c. The predicted profit for a market capitalization of 70,721.3 will be approximately 3,537 ($ millions).
49. A research center is interested in investigating the height and age of children who are between 5 and 9 years old. To do this, a sample of 15 children is selected and the data are given below.

Age (in years)    Height (inches)
7                 47.3
8                 48.8
5                 41.3
8                 50.4
8                 51
7                 47.1
7                 46.9
7                 48
9                 51.2
8                 51.2
5                 40.3
8                 48.9
6                 45.2
5                 41.9
8                 49.6

a. Develop a scatter chart with age as the independent variable. What does the scatter chart indicate about the relationship between the height and age of children? b. Use the data to develop an estimated regression equation that could be used to estimate the height based on the age. What is the estimated regression model? c. How much of the variation in the sample values of height does the model estimated in part (b) explain?
ANSWER:

a. The scatter chart with age as the independent variable follows.

This scatter chart indicates there may be a positive linear relationship between height and age of children who are 5 to 9 years old. b. The following Excel output provides the estimated regression equation that could be used to estimate a child’s height (y) based on age (x).

The estimated simple linear regression equation is [equation not recoverable from source].
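As a cross-check on parts (b) and (c), the least squares line and the coefficient of determination can be recomputed directly from the 15 observations. The following is a minimal Python sketch rather than the Excel procedure used in this answer key; the function name `fit_simple_ols` and the variable names are illustrative assumptions.

```python
# Ages (years) and heights (inches) copied from the table in question 49.
ages = [7, 8, 5, 8, 8, 7, 7, 7, 9, 8, 5, 8, 6, 5, 8]
heights = [47.3, 48.8, 41.3, 50.4, 51, 47.1, 46.9, 48, 51.2, 51.2,
           40.3, 48.9, 45.2, 41.9, 49.6]


def fit_simple_ols(x, y):
    """Return (b0, b1, r_squared) for the least squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # In simple linear regression, r^2 is the squared sample correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


b0, b1, r_squared = fit_simple_ols(ages, heights)
print(f"height = {b0:.3f} + {b1:.3f} * age, r^2 = {r_squared:.4f}")
```

The printed r² can be compared with the value quoted in part (c).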

The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart. c. The coefficient of determination r² is 0.9402, so the regression model estimated in part (b) explains approximately 94% of the variation in the height in the sample.
50. Listed below are a company’s sales in Year 1 through Year 12 along with the national income of the country where the business is set up.

Year       National Income (in millions of dollars) x    Company's sales (in thousands of dollars) y
Year 1     305                                           470
Year 2     316                                           485
Year 3     358                                           499
Year 4     350                                           515
Year 5     375                                           532
Year 6     392                                           532
Year 7     400                                           556
Year 8     398                                           576
Year 9     430                                           583
Year 10    456                                           587
Year 11    578                                           601
Year 12    498                                           605

a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between the national income and the company's sales in Year 1 through Year 12? b. Use the data to develop an estimated regression equation that could be used to estimate the company’s sales based on the national income. What is the estimated regression model? ANSWER: a. The scatter chart with national income as the independent variable follows.

This scatter chart indicates there may be a positive linear relationship between national income and company’s sales. b. The following Excel output provides the estimated regression equation that could be used to estimate the company’s sales (y) based on the national income (x).


The estimated simple linear regression equation is ŷ = 328.981 + 0.53402x.

The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart.
51. Listed below are a company’s sales in Year 1 through Year 12 along with the national income of the country where the business is set up.

Year       National Income (in millions of dollars) x    Company's sales (in thousands of dollars) y
Year 1     305                                           470
Year 2     316                                           485
Year 3     358                                           499
Year 4     350                                           515
Year 5     375                                           532
Year 6     392                                           532
Year 7     400                                           556
Year 8     398                                           576
Year 9     430                                           583
Year 10    456                                           587
Year 11    578                                           601
Year 12    498                                           605

Test whether each of the regression parameters β0 and β1 is equal to zero at a 0.05 level of significance. What are the correct interpretations of the estimated regression parameters?
ANSWER: First, we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and national income follows.


Because we are working with only 12 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel output:
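The t tests discussed in this answer can be reproduced without Excel. The sketch below is a minimal Python illustration under stated assumptions: the critical value 2.228 is taken from a standard t table for a two-tailed test at the 0.05 level with n − 2 = 10 degrees of freedom, and the names `coefficient_t_stats` and `T_CRIT` are hypothetical. It compares each t statistic with the critical value instead of computing p values.

```python
import math

# National income (millions of dollars) and sales (thousands of dollars), question 51.
income = [305, 316, 358, 350, 375, 392, 400, 398, 430, 456, 578, 498]
sales = [470, 485, 499, 515, 532, 532, 556, 576, 583, 587, 601, 605]

T_CRIT = 2.228  # assumed t-table value: alpha/2 = 0.025, 10 degrees of freedom


def coefficient_t_stats(x, y):
    """Return (b0, b1, t_b0, t_b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Standard error of the estimate from the residual sum of squares.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return b0, b1, b0 / se_b0, b1 / se_b1


b0, b1, t_b0, t_b1 = coefficient_t_stats(income, sales)
for name, t_stat in (("beta0", t_b0), ("beta1", t_b1)):
    decision = "reject" if abs(t_stat) > T_CRIT else "do not reject"
    print(f"{name}: t = {t_stat:.3f}, {decision} the hypothesis that {name} = 0")
```

Comparing |t| with the critical value leads to the same reject or do-not-reject decision as comparing the p value with 0.05.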

The p value associated with the estimated regression parameter b1 is 8.9E-05, which is 0.000089, or approximately equal to 0. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. Hence, we conclude that there is a relationship between national income and the company’s sales, and, on average, a $1 million increase in national income corresponds to an increase of $534.02 in the company's sales. The company's sales are expected to increase as national income increases, so this result is consistent with what is expected. The p value associated with the estimated regression parameter b0 is 2.7E-06, which is 0.0000027, or approximately equal to 0. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β0 = 0. The estimated regression parameter b0 suggests that when national income is zero, the company's sales are $328,981, which is not a realistic estimate.
52. The data listed below are the average personal income and personal consumption expenditures based on a survey conducted in the United States over 15 consecutive years.

Personal income ($)    Personal consumption expenditures ($)
23,310                 18,714
24,444                 19,569
25,657                 20,414
27,260                 21,434
28,336                 22,738
30,317                 24,227
31,162                 25,074
31,448                 25,865
32,282                 26,848
33,872                 28,228
35,423                 29,818
37,723                 31,210
39,418                 32,551
40,156                 33,273
39,113                 32,853

a. Develop a scatter chart for the above data. What does this chart indicate about the relationship between average personal income and personal consumption expenditure? b. Develop an estimated regression equation showing how personal consumption expenditure is related to personal income. c. What proportion of variation in the sample values of personal consumption expenditure does this model explain? ANSWER: a. The scatter chart with personal income as the independent variable follows.

This scatter chart indicates there is a positive linear relationship between personal income and personal consumption expenditure. b. The following Excel output provides the estimated regression equation that could be used to estimate the personal consumption expenditure (y) based on the personal income (x).


The estimated simple linear regression equation is [equation not recoverable from source]. The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart. c. The coefficient of determination r² is 0.9954, so the regression model estimated in part (b) explains approximately 99.5% of the variation in the personal consumption expenditure in the sample.
53. The data listed below are the average personal income and personal consumption expenditures based on a survey conducted in the United States over 15 consecutive years.

Personal income ($)    Personal consumption expenditures ($)
23,310                 18,714
24,444                 19,569
25,657                 20,414
27,260                 21,434
28,336                 22,738
30,317                 24,227
31,162                 25,074
31,448                 25,865
32,282                 26,848
33,872                 28,228
35,423                 29,818
37,723                 31,210
39,418                 32,551
40,156                 33,273
39,113                 32,853

a. What is the 95 percent confidence interval for the regression parameter β1? Based on this interval, what conclusion can you make about the hypothesis that the regression parameter β1 is equal to zero? b. What is the 95 percent confidence interval for the regression parameter β0? Based on this interval, what conclusion can you make about the hypothesis that the regression parameter β0 is equal to zero?
ANSWER: a. First we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and personal income follows.

The scatter chart suggests that the regression model might underpredict the value of the dependent variable. However, because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. So, we will proceed with our inference. Excel Output:
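The confidence intervals read from the Excel output can also be computed by hand as estimate ± t × standard error. The following Python sketch illustrates that calculation under stated assumptions: the multiplier 2.160 is the standard t-table value for a 95% interval with n − 2 = 13 degrees of freedom, and the names `coefficient_intervals` and `T_CRIT` are hypothetical.

```python
import math

# Personal income ($) and personal consumption expenditures ($), question 53.
income = [23310, 24444, 25657, 27260, 28336, 30317, 31162, 31448,
          32282, 33872, 35423, 37723, 39418, 40156, 39113]
consumption = [18714, 19569, 20414, 21434, 22738, 24227, 25074, 25865,
               26848, 28228, 29818, 31210, 32551, 33273, 32853]

T_CRIT = 2.160  # assumed t-table value: 95% interval, 13 degrees of freedom


def coefficient_intervals(x, y, t_crit):
    """Return ((b0_low, b0_high), (b1_low, b1_high)) for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))


b0_interval, b1_interval = coefficient_intervals(income, consumption, T_CRIT)
print("95% interval for beta1:", tuple(round(v, 4) for v in b1_interval))
print("95% interval for beta0:", tuple(round(v, 2) for v in b0_interval))
```

Because the table multiplier is rounded to three decimals, the endpoints may differ slightly in the last digit from the Excel output.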

The 95% confidence interval for the regression parameter β1 provided in the Excel output is (0.8616, 0.9349). Because this interval does not include zero, we reject the hypothesis that β1 = 0. Hence, we conclude that there is a relationship between personal income and personal consumption expenditure. Our best estimate is that a one-dollar increase in personal income corresponds to an average increase of $0.8983 in personal consumption expenditure. b. The 95% confidence interval for the regression parameter β0 provided in the Excel output is (–3740.5039, –1363.1151). This interval does not include zero, so we reject the hypothesis that β0 = 0.
54. A survey is conducted to determine whether the age of a car influences the annual maintenance cost. A sample of 10 cars is selected and the data are shown below.

Age of car (months) x    Annual Maintenance Cost ($) y
3                        120
5                        115
6                        135
7                        290
9                        275
10                       300
11                       350
13                       475
14                       500
15                       550

a. Develop a scatter chart for these data with age of cars as the independent variable. What does the scatter chart indicate about the relationship between age of a car and the annual maintenance cost? b. Use the data to develop an estimated regression equation that could be used to predict the annual maintenance cost given the age of the car. What is the estimated regression model? ANSWER: a. The scatter chart with age of cars as the independent variable follows.

This scatter chart indicates there is a positive linear relationship between age of a car and the annual maintenance cost. b. The following Excel output provides the estimated regression equation that could be used to estimate the annual maintenance cost based on the age of a car.


The estimated simple linear regression equation is ŷ = –45.60 + 38.34x.

The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart.
55. A survey is conducted to determine whether the age of a car influences the annual maintenance cost. A sample of 10 cars is selected and the data are shown below.

Age of car (months) x    Annual Maintenance Cost ($) y
3                        120
5                        115
6                        135
7                        290
9                        275
10                       300
11                       350
13                       475
14                       500
15                       550

a. Test whether each of the regression parameters β0 and β1 is equal to zero at a 0.05 level of significance. b. Interpret the estimated regression parameters. Are these interpretations reasonable?
ANSWER: a. First we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and age of cars follows.


Because we are working with only 10 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel output:

The p value associated with the estimated regression parameter b1 is 4.14E-06, which is 0.00000414 or essentially zero. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a linear relationship between the age of a car and the annual maintenance cost. Based upon the regression model, we estimate that as the age of the car increases by one month, the maintenance cost increases by $38.34, on average. The maintenance cost of a car is expected to increase as the age increases. So, this result is consistent with what is expected. The p value associated with the estimated regression parameter b0 is 0.229. Because this p value is greater than the 0.05 level of significance, we cannot reject the hypothesis that β0 = 0. The estimated regression parameter b0 suggests that when the age of the car is 0, the maintenance cost is –$45.60. This result is obviously not realistic, and the parameter estimate and the test of the hypothesis that β0 = 0 are meaningless because the y-intercept has been estimated through extrapolation (there is no car in the sample data with age zero).
56. A research team at Gonzaga University is interested in predicting a student's overall university GPA if his/her high school GPA is known. A random sample of 20 students is selected and the data are listed below.

High School GPA    University GPA
3.65               3.72
3.08               3.21
3.12               3
2.87               2.67
3.34               3.57
2.7                2.97
3.25               2.83
3.01               2.89
3.2                3.8
2.62               2.43
3.71               3.82
2.26               2.66
3.02               3.62
3.58               3.83
3.67               3.74
2.92               2.64
3.71               3.53
3.22               2.99
2.27               2.37
2.8                2.69

a. Develop a scatter chart for these data with High School GPA as the independent variable. What does the scatter chart indicate about the relationship between high school GPAs and overall university GPA? b. Develop an estimated regression equation showing how high school GPA is related to overall university GPA. What is the estimated regression model? c. What is the predicted overall university GPA of Sophia, a student who has been admitted to Gonzaga University, with 3.40 high school GPA? ANSWER: a. The scatter chart with high school GPA as the independent variable follows.

This scatter chart indicates there may be a positive linear relationship between high school GPAs and overall university GPA. b. The following Excel output provides the estimated regression equation that could be used to estimate the overall university GPA (y) based on the student’s high school GPA (x).

The estimated simple linear regression equation is [equation not recoverable from source]. The estimated simple linear regression equation can also be found by adding a trendline to the scatter chart. c. The predicted overall university GPA of Sophia, whose high school GPA is 3.40, will be [value not recoverable from source].
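The prediction in part (c) can be recomputed from the sample by fitting the least squares line and evaluating it at a high school GPA of 3.40. The Python sketch below is illustrative only. It assumes that the two columns of the data table pair up row by row in the order listed, and the function name `predict_from_line` is hypothetical.

```python
# High school GPA (x) and university GPA (y) for the 20 students in question 56.
# Assumption: the values pair up row by row in the order they are listed.
hs_gpa = [3.65, 3.08, 3.12, 2.87, 3.34, 2.7, 3.25, 3.01, 3.2, 2.62,
          3.71, 2.26, 3.02, 3.58, 3.67, 2.92, 3.71, 3.22, 2.27, 2.8]
univ_gpa = [3.72, 3.21, 3, 2.67, 3.57, 2.97, 2.83, 2.89, 3.8, 2.43,
            3.82, 2.66, 3.62, 3.83, 3.74, 2.64, 3.53, 2.99, 2.37, 2.69]


def predict_from_line(x, y, x_new):
    """Fit y = b0 + b1*x by least squares and return (b0, b1, prediction at x_new)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, b0 + b1 * x_new


b0, b1, sophia = predict_from_line(hs_gpa, univ_gpa, 3.40)
print(f"university GPA = {b0:.4f} + {b1:.4f} * high school GPA")
print(f"predicted university GPA at 3.40: {sophia:.2f}")
```

A high school GPA of 3.40 lies inside the range of the sample (2.26 to 3.71), so this prediction does not involve extrapolation.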

57. A researcher wanted to study the effect of two factors, x1 and x2, on yield (y). The observations are given below.

Observation    x1      x2      y
1              42.5    30.1    260.4
2              43.5    29.3    261.7
3              43.9    31.1    273.6
4              44.8    29.6    278.6
5              46.8    29.7    281.5
6              47.5    29.9    294.6
7              50.1    30.1    301.2
8              51.9    30.4    314.6
9              54.7    30.5    320.5
10             54.8    31      324.7
11             57.1    31.8    356.7
12             57.8    31.4    370.3
13             62.3    31.5    378
14             66.7    32.1    384.8
15             71.2    32.5    396.9

a. Develop an estimated linear regression equation with the factor x1 as the independent variable. Test for a significant relationship between factor x1 and yield at the 0.05 level of significance. b. How much of the variation in the sample values of yield does the model in part (a) explain? ANSWER: a. The following Excel output provides the estimated linear regression equation that can be used to predict yield (y) given the factor x1.


The estimated linear regression equation is [equation not recoverable from source]. Before testing the hypothesis β1 = 0 for this regression model, we check the conditions necessary for valid inference in regression. The Excel plot of the residuals and factor x1 follows.

Because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions. The p value for the test of the hypothesis that β1 = 0 is 9.64E-10 = 0.000000000964, which is essentially zero. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0 and conclude that there is a relationship between yield and factor x1 at the 0.05 level of significance. b. The coefficient of determination r² is 0.9483, so the regression model estimated in part (a) explains approximately 94.8% of the variation in the yield in the sample.
58. A researcher wanted to study the effect of two factors, x1 and x2, on yield (y). The observations are given below.

Observation    x1      x2      y
1              42.5    30.1    260.4
2              43.5    29.3    261.7
3              43.9    31.1    273.6
4              44.8    29.6    278.6
5              46.8    29.7    281.5
6              47.5    29.9    294.6
7              50.1    30.1    301.2
8              51.9    30.4    314.6
9              54.7    30.5    320.5
10             54.8    31      324.7
11             57.1    31.8    356.7
12             57.8    31.4    370.3
13             62.3    31.5    378
14             66.7    32.1    384.8
15             71.2    32.5    396.9

a. Develop an estimated regression equation with both factors x1 and x2 as the independent variables. Is the overall regression statistically significant at the 0.05 level of significance? If so, then test whether each of the regression parameters β0, β1, and β2 is equal to zero at a 0.01 level of significance. What are the correct interpretations of the estimated regression parameters? b. How much of the variation in the sample values of y does the model in part (a) explain? ANSWER: The following Excel output provides the estimated multiple linear regression equation that could be used to predict the yield (y) given the two factors x1 and x2.

The estimated multiple linear regression equation is [equation not recoverable from source].

Before testing for a significant overall regression relationship (that is, testing the hypothesis that β1 = β2 = 0), we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and each of the two independent variables follow.


Because we are working with only 15 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. This scatter chart of the residuals versus factor x1 does not provide strong evidence of a violation of the conditions.

Similarly, this scatter chart of the residuals versus factor x2 does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. The p value associated with the F test for an overall regression relationship is 9.87E-09 = 0.00000000987, which is essentially zero. Because this p value is less than the 0.01 level of significance, we reject the hypothesis that β1 = β2 = 0. We conclude that there is an overall regression relationship at the 0.01 level of significance. The p value associated with the estimated regression parameter b1 is 2.41E-05 = 0.0000241, which is essentially zero. Because this p value is less than the 0.01 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a relationship between yield and factor x1 at the 0.01 level of significance. The best estimate is that if we hold factor x2 constant, a 1 unit increase in factor x1 corresponds to an average increase of 4.49 units in yield. The p value associated with the estimated regression parameter b2 is 0.2607. Because this p value is greater than the 0.01 level of significance, we do not reject the hypothesis that β2 = 0. We fail to conclude that there is a relationship between factor x2 and yield at the 0.01 level of significance. The estimated regression parameter b0 suggests that when factors x1 and x2 are both zero, the yield is –138.97 units. This result is obviously not realistic as yield cannot be negative.
b. The coefficient of determination r² is 0.9537, so this regression model explains approximately 95% of the variation in the sample values of yield.
59. A student is interested in studying the impact of the number of books students referred to in a statistics course and the number of lectures they attended on the final grade in the course. A sample of 20 students is selected and the data are given below.

BOOKS    ATTEND    GRADE
2        17        60
3        18        54
1        17        62
5        20        59
2        12        44
1        13        40
3        17        96
4        19        90
5        22        97
2        22        54
5        22        91
1        12        48
3        18        91
1        12        65
4        21        82
5        14        61
3        18        54
3        13        46
1        8         64
5        22        90

a. Develop an estimated regression equation using the number of books referred to and the number of lectures attended to predict the final grade on the course. b. Joseph referred to 4 books and attended 19 lectures. What is his predicted final score on the course? ANSWER:

a. The following Excel output provides the estimated multiple linear regression equation with number of books referred to by students in a statistics course and the number of lectures they attended as the independent variables.


The estimated multiple linear regression equation is [equation not recoverable from source].
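The multiple regression fit and Joseph's predicted score in part (b) can be recomputed from the 20 observations by solving the normal equations (X'X)b = X'y. The following Python sketch is one way to do this without Excel; the helper names `solve_linear_system` and `fit_multiple_ols` are illustrative assumptions, and Gaussian elimination is used only to keep the example free of external libraries.

```python
# Data for the 20 students in question 59.
books = [2, 3, 1, 5, 2, 1, 3, 4, 5, 2, 5, 1, 3, 1, 4, 5, 3, 3, 1, 5]
attend = [17, 18, 17, 20, 12, 13, 17, 19, 22, 22,
          22, 12, 18, 12, 21, 14, 18, 13, 8, 22]
grade = [60, 54, 62, 59, 44, 40, 96, 90, 97, 54,
         91, 48, 91, 65, 82, 61, 54, 46, 64, 90]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_ols(columns, y):
    """Return [b0, b1, ..., bq] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_multiple_ols([books, attend], grade)
joseph = b0 + b1 * 4 + b2 * 19  # 4 books referred to, 19 lectures attended
print(f"grade = {b0:.3f} + {b1:.3f} * books + {b2:.3f} * attend")
print(f"predicted final score for Joseph: {joseph:.1f}")
```

The printed prediction can be compared with the rounded value given in part (b).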

b. If Joseph refers to 4 books and attends 19 lectures, his predicted final score will be approximately 75.
60. A student is interested in studying the impact of the number of books students referred to in a statistics course and the number of lectures they attended on the final grade in the course. A sample of 20 students is selected and the data are given below.

BOOKS    ATTEND    GRADE
2        17        60
3        18        54
1        17        62
5        20        59
2        12        44
1        13        40
3        17        96
4        19        90
5        22        97
2        22        54
5        22        91
1        12        48
3        18        91
1        12        65
4        21        82
5        14        61
3        18        54
3        13        46
1        8         64
5        22        90

a. Use the F test to determine the overall significance of the relationship. What is your conclusion at the 0.05 level of significance? Use the t test to determine the significance of each independent variable. What are your conclusions at the 0.05 level of significance? b. How much of the variation in the final grade does the model in part (a) explain?
ANSWER: a. Before testing any hypotheses, we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and each of the two independent variables follow.

Because we are working with only 20 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, this scatter chart does not provide strong evidence of a violation of the conditions.

Similarly, this scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. Excel Output:
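The F statistic behind the overall test can be computed from the sums of squares as F = (SSR/q) / (SSE/(n − q − 1)). The sketch below does this in Python for the two-predictor case, solving the centered normal equations by Cramer's rule. It is an illustration under stated assumptions: 3.59 is the standard F-table critical value for the 0.05 level with (2, 17) degrees of freedom, and the names `overall_f_two_predictors`, `centered_cross`, and `F_CRIT` are hypothetical.

```python
# Data for the 20 students in question 60.
books = [2, 3, 1, 5, 2, 1, 3, 4, 5, 2, 5, 1, 3, 1, 4, 5, 3, 3, 1, 5]
attend = [17, 18, 17, 20, 12, 13, 17, 19, 22, 22,
          22, 12, 18, 12, 21, 14, 18, 13, 8, 22]
grade = [60, 54, 62, 59, 44, 40, 96, 90, 97, 54,
         91, 48, 91, 65, 82, 61, 54, 46, 64, 90]

F_CRIT = 3.59  # assumed F-table value: alpha = 0.05, (2, 17) degrees of freedom


def centered_cross(u, v):
    """Sum of cross-products of u and v about their means."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def overall_f_two_predictors(x1, x2, y):
    """Return (F statistic, R^2) for the regression of y on x1 and x2."""
    n, q = len(y), 2
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y, syy = centered_cross(x1, y), centered_cross(x2, y), centered_cross(y, y)
    # Slopes from the centered normal equations (Cramer's rule).
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ssr = b1 * s1y + b2 * s2y  # sum of squares due to regression
    sse = syy - ssr            # sum of squares due to error
    f_stat = (ssr / q) / (sse / (n - q - 1))
    return f_stat, ssr / syy


f_stat, r_squared = overall_f_two_predictors(books, attend, grade)
decision = "reject" if f_stat > F_CRIT else "do not reject"
print(f"F = {f_stat:.3f}, R^2 = {r_squared:.4f}: {decision} beta1 = beta2 = 0")
```

The same sums of squares give the proportion of variation explained, SSR/SST, which is the quantity asked for in part (b).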

The p value associated with the F test for an overall regression relationship is 0.0151. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0. We conclude that there is an overall regression relationship at the 0.05 level of significance. The p value associated with the estimated regression parameter b1 is 0.1945. Because this p value is greater than the 0.05 level of significance, we do not reject the hypothesis that β1 = 0. We cannot conclude that there is a relationship between number of books referred to and the final score at the 0.05 level of significance. The p value associated with the estimated regression parameter b2 is 0.2081. Because the p value is greater than the 0.05 level of significance, we do not reject the hypothesis that β2 = 0. We cannot conclude that there is a relationship between the number of lectures attended and the final score of students.
61. A research team conducted a survey to investigate how education level, tenure in current employment, and age are related to annual income. A sample of 20 employees is selected and the data are given below.

Education (No. of years)    Length of tenure in current employment (No. of years)    Age (No. of years)    Annual income ($)
17                          8                                                        40                    124,000
12                          12                                                       41                    30,000
20                          9                                                        44                    193,000
14                          4                                                        42                    88,000
12                          1                                                        19                    27,000
14                          9                                                        28                    43,000
12                          8                                                        43                    96,000
18                          10                                                       37                    110,000
16                          12                                                       36                    88,000
11                          7                                                        39                    36,000
16                          14                                                       36                    81,000
12                          4                                                        22                    38,000
16                          17                                                       45                    140,000
13                          7                                                        42                    11,000
11                          6                                                        18                    21,000
20                          4                                                        40                    151,000
19                          7                                                        35                    124,000
16                          12                                                       38                    48,000
12                          2                                                        19                    26,000
10                          6                                                        44                    124,000

a. Determine the estimated multiple linear regression equation that can be used to predict the annual income given number of years of school completed (Education), length of tenure in current employment, and age. b. Use the F test to determine the overall significance of the regression relationship. What is the conclusion at the 0.05 level of significance?
ANSWER: a. The following Excel output provides the estimated multiple linear regression equation with education (x1), length of tenure in current employment (x2), and age (x3) as the independent variables.


The estimated multiple linear regression equation is [equation not recoverable from source]. b. Before performing any hypothesis tests on the results, we check the conditions necessary for valid inference in regression. The Excel plots of the residuals and education, length of tenure in current employment, and age follow.


Because we are working with only 20 observations, assessing the conditions necessary for inference to be valid in regression is extremely difficult. However, none of these scatter charts provide strong evidence of a violation of the conditions necessary for valid inference in regression, so we will proceed with our inference. The p value associated with the F test for an overall regression relationship is 0.00039. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = β3 = 0. We conclude that there is an overall regression relationship at the 0.05 level of significance.
62. A research team conducted a survey to investigate how education level, tenure in current employment, and age are related to annual income. A sample of 20 employees is selected and the data are given below.

Education (No. of years)    Length of tenure in current employment (No. of years)    Age (No. of years)    Annual income ($)
17                          8                                                        40                    124,000
12                          12                                                       41                    30,000
20                          9                                                        44                    193,000
14                          4                                                        42                    88,000
12                          1                                                        19                    27,000
14                          9                                                        28                    43,000
12                          8                                                        43                    96,000
18                          10                                                       37                    110,000
16                          12                                                       36                    88,000
11                          7                                                        39                    36,000
16                          14                                                       36                    81,000
12                          4                                                        22                    38,000
16                          17                                                       45                    140,000
13                          7                                                        42                    11,000
11                          6                                                        18                    21,000
20                          4                                                        40                    151,000
19                          7                                                        35                    124,000
16                          12                                                       38                    48,000
12                          2                                                        19                    26,000
10                          6                                                        44                    124,000

a. Check whether the F test leads to the conclusion that an overall regression relationship exists. If so, use the t test to determine the significance of each independent variable. What is the conclusion for each test at the 0.05 level of significance? b. Remove all independent variables that are not significant at the 0.05 level of significance from the estimated regression equation. What is your estimated regression equation in this case?
ANSWER:

a. Using the following Excel output, we conclude that there is an overall regression relationship at the 0.05 level of significance.

The p value associated with the estimated regression parameter b1 is 0.0013. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. We conclude that there is a relationship between the annual income and the education level at the 0.05 level of significance. Our best estimate is that if we hold the length of tenure in current employment and the age constant, a one-year increase in the education level corresponds to an increase of $10,011.92 in annual income, on average. The p value associated with the estimated regression parameter b2 is 0.3246. Because this p value is greater than the 0.05 level of significance, we do not reject the hypothesis that β2 = 0. We cannot conclude that there is a relationship between the length of tenure in current employment and the annual income at the 0.05 level of significance when controlling for education level and age. The p value associated with the estimated regression parameter b3 is 0.0149. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β3 = 0. We conclude that there is a relationship between the age and the annual income at the 0.05 level of significance. Our best estimate is that if we hold the education level and length of tenure in current employment constant, an increase in age by a year corresponds to an increase of $2,689.24 in annual income, on average. b. The following Excel output is obtained by removing the independent variable, length of tenure in current employment (x2), which is not significant at the 0.05 significance level. Hence, this output provides the estimated multiple linear regression equation with education (x1) and age (x3) as the independent variables.


The estimated multiple linear regression equation is

.
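The screening step in part (b) can be sketched in a few lines of Python. This is a minimal illustration that assumes only the three t test p values quoted in part (a); the dictionary labels and the helper name `significant_variables` are hypothetical and are not part of the Excel output.

```python
# p values for the t tests quoted in part (a); the labels are illustrative only.
p_values = {
    "education (x1)": 0.0013,
    "tenure (x2)": 0.3246,
    "age (x3)": 0.0149,
}


def significant_variables(p_values, alpha=0.05):
    """Return the variables whose p value is below the significance level."""
    return [name for name, p in p_values.items() if p < alpha]


kept = significant_variables(p_values)
dropped = [name for name in p_values if name not in kept]
print("keep:", kept)
print("drop:", dropped)
```

Only the variables in `kept` would remain in the re-estimated equation; the coefficients themselves still have to come from refitting the regression.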

63. Consider the following data with the dependent variable y, independent variable x, and the dummy variable d.

y     d   x
100   0   9.3
100   0   6.5
50    0   4.2
75    1   7.4
65    0   6.0
90    0   7.6
85    1   9.6
65    1   6.5
50    1   6.0
75    1   9.9
75    1   8.6
65    0   4.9
95    0   7.2
95    0   9.9
90    1   7.8
80    1   7.2
75    0   7.0
95    0   8.7
100   1   9.9
90    1   9.3
70    1   9.1
100   0   8.9
40    0   5.4
90    0   8.8
75    1   8.8
40    1   5.0
45    1   6.5
55    0   5.2
100   0   10.0
90    0   8.4
80    1   6.4
80    0   6.7
90    1   8.5
60    0   5.9
95    1   8.9
45    0   6.3
65    0   7.5
85    1   6.8
60    1   5.7
75    1   4.5

a. Develop the estimated regression equation using all of the independent variables included in the data.
b. Test for an overall regression relationship at the 0.05 level of significance. Is there a significant regression relationship?
ANSWER:
a. The following Excel output provides the estimated multiple linear regression equation that could be used to predict y given the dummy variable, d, and x.

The estimated multiple linear regression equation is

.

b. Before testing any hypotheses for this regression model, we check the conditions necessary for valid inference in regression. Excel plots of the residuals and each independent variable follow.


The residuals appear to have a mean of zero and do not appear to be badly skewed at any value of any independent variable. We therefore will proceed with our inference. The p value associated with the F test for an overall regression relationship is 7.813E-07 = 0.0000007813, which is essentially zero. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0 and conclude that there is an overall regression relationship at the 0.05 level of significance.

64. Consider the following data with the dependent variable y, independent variable x, and the dummy variable d.

y     d   x
100   0   9.3
100   0   6.5
50    0   4.2
75    1   7.4
65    0   6.0
90    0   7.6
85    1   9.6
65    1   6.5
50    1   6.0
75    1   9.9
75    1   8.6
65    0   4.9
95    0   7.2
95    0   9.9
90    1   7.8
80    1   7.2
75    0   7.0
95    0   8.7
100   1   9.9
90    1   9.3
70    1   9.1
100   0   8.9
40    0   5.4
90    0   8.8
75    1   8.8
40    1   5.0
45    1   6.5
55    0   5.2
100   0   10.0
90    0   8.4
80    1   6.4
80    0   6.7
90    1   8.5
60    0   5.9
95    1   8.9
45    0   6.3
65    0   7.5
85    1   6.8
60    1   5.7
75    1   4.5

a. Check whether an overall regression relationship exists for the above data. If yes, test the relationship between each independent variable and the dependent variable at the 0.05 level of significance, and interpret the relationship between each of the independent variables and the dependent variable.
b. How much of the variation in the sample values of y does this estimated regression equation explain?
ANSWER:
a. Using the following Excel output, we conclude that there is an overall regression relationship at the 0.05 level of significance.

We checked the conditions necessary for valid inference in regression in the previous question. The p value for the test of the hypothesis that β1 = 0 is 0.1540. Because this p value is greater than the 0.05 level of significance, we do not reject the hypothesis that β1 = 0, and we cannot conclude there is a relationship between the variables y and d while controlling for the variable x. The p value for the test of the hypothesis that β2 = 0 is 1.52E-07 = 0.000000152. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β2 = 0 and conclude that there is a relationship between variable y and the independent variable x while controlling for variable d at the 0.05 level of significance. We estimate that, holding the variable d constant, for every 1 unit increase in x, there is an 8.010 unit increase in y, on average.
b. The coefficient of determination is r² = 0.5324, so the regression model explains approximately 53% of the variation in the y values in the sample.

65. A production company is studying the relationship between the average cost/unit and number of units produced in a batch. A sample of 10 batches is selected and the data is given below.

No. of units produced   Cost/unit
20                      37.7158
35                      35.0158
50                      30.8158
65                      25.8158
80                      20.0364
95                      16.9064
110                     16.3766
120                     13.8564
135                     13.9696
150                     13.847

a. Develop a scatter chart for these data. What does the scatter chart indicate about the relationship between average cost/unit and number of units produced?
b. Regardless of your answer to part (a), develop an estimated simple linear regression equation for the data. How much variation in the sample values of cost/unit is explained by this regression model?
ANSWER:
a. The scatter chart with number of units produced as the independent variable follows.

A simple linear regression model does not appear to be appropriate; there appears to be a nonlinear relationship between cost/unit and number of units produced in a batch. b. The following Excel output provides the estimated simple linear regression equation that could be used to predict cost/unit (y) given the number of units produced in a batch (x).


The estimated simple linear regression equation is , and the coefficient of determination for this model is r² = 0.9177, so the regression model explains approximately 92% of the variation in the sample values of cost/unit.

66. A production company is studying the relationship between the average cost/unit and number of units produced in a batch. A sample of 10 batches is selected and the data is given below.

No. of units produced   Cost/unit
20                      37.7158
35                      35.0158
50                      30.8158
65                      25.8158
80                      20.0364
95                      16.9064
110                     16.3766
120                     13.8564
135                     13.9696
150                     13.847

a. Develop an estimated quadratic regression equation for the data. How much variation in the sample values of cost/unit does this regression model explain?
b. Is the overall regression relationship significant at a 0.05 level of significance? If so, then test the relationship between the independent variable and the dependent variable at a 0.05 level of significance.
ANSWER:
a. The following Excel output provides the estimated second-order (quadratic) regression equation that could be used to predict cost/unit (y) given the number of units produced (x).


The estimated second-order (quadratic) regression equation is , and the coefficient of determination for this model is r² = 0.9824, so the quadratic regression model explains approximately 98% of the variation in the sample values of cost/unit.
b. Before testing any hypotheses about this regression model, we again check the conditions necessary for valid inference in regression. The Excel plot of the residuals and number of units produced follows.

This scatter chart does not provide strong evidence of a violation of the conditions, so we will proceed with our inference. The p value associated with the F test for an overall regression relationship is 7.24E-07 = 0.000000724. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = β2 = 0 and conclude that there is an overall regression relationship at the 0.05 level of significance.
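The coefficients of determination quoted for the simple linear and quadratic models can be compared by fitting both to the cost/unit data. The following is a minimal Python sketch, assuming ordinary least squares solved through the normal equations; the helper names `solve` and `polynomial_fit` are illustrative and this is not the Excel procedure used in the answers.

```python
# Cost/unit data from questions 65 and 66.
units = [20, 35, 50, 65, 80, 95, 110, 120, 135, 150]
cost = [37.7158, 35.0158, 30.8158, 25.8158, 20.0364,
        16.9064, 16.3766, 13.8564, 13.9696, 13.847]


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def polynomial_fit(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients b0..b_degree, r squared)."""
    rows = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - sse / sst


linear_coef, r2_linear = polynomial_fit(units, cost, 1)
quad_coef, r2_quadratic = polynomial_fit(units, cost, 2)
print(f"linear r^2    = {r2_linear:.4f}")
print(f"quadratic r^2 = {r2_quadratic:.4f}")
```

Because the quadratic model contains the linear model as a special case, its r² can never be lower on the same data; the question is whether the improvement is large enough to matter, which is what the F and t tests address.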


The p value for the test of the hypothesis that β1 = 0 is 4.32E-05. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β1 = 0. Similarly, the p value for the test of the hypothesis that β2 = 0 is 0.0014. Because this p value is less than the 0.05 level of significance, we reject the hypothesis that β2 = 0. We therefore conclude that there is a nonlinear relationship between cost/unit and number of units produced in the batch.

67. Consider the data below, which are based on a company’s sales in Year 1 through Year 12 along with the national income of the country where the business is set up.

Year      National income                Company's sales
          (in millions of dollars) x     (in thousands of dollars) y
Year 1    305                            470
Year 2    316                            485
Year 3    358                            499
Year 4    350                            515
Year 5    375                            532
Year 6    392                            532
Year 7    400                            556
Year 8    398                            576
Year 9    430                            583
Year 10   456                            587
Year 11   578                            601
Year 12   498                            605

a. Develop a scatter chart for these data, treating the national income as the independent variable. Does a simple linear regression model appear to be appropriate?
b. Develop an appropriate estimated regression equation to predict the company's sales, given the national income. How much variation in the sample values of company’s sales is explained by this regression model?
ANSWER:

a. The scatter chart with national income as the independent variable follows.

A simple linear regression model does not appear to be appropriate. There appears to be a nonlinear relationship between national income and company’s sales.
b. Based on the scatter chart, we estimate a quadratic regression equation that could be used to predict company’s sales (y) given national income (x).

The estimated quadratic regression equation is , and the coefficient of determination for this model is r² = 0.9380, so the regression model explains approximately 94% of the variation in the sample values of company’s sales.

68. Given an estimated simple linear regression equation of ŷ = 46.2 + 589.2x with a coefficient of determination r² of 0.7523, interpret the coefficient of determination for this equation.
ANSWER: The estimated regression model explains approximately 75.2% of the variation in the y values.

69. The multiple regression model represents pricing for residential housing in a certain market. Predicted Price = 19,856.56 + 6,985.25 bedrooms + 87.53 square feet. A house in the local market has 4 bedrooms and 2,500 square feet of living area. Use the multiple regression model to determine the predicted price.
ANSWER: $266,623

70. The multiple regression model represents pricing for residential housing in a certain market. Predicted Price = 19,856.56 + 6,985.25 bedrooms + 87.53 square feet. A house in the local market has 5 bedrooms and 3,500 square feet of living area. Use the multiple regression model to determine the predicted price and the residual if the house sells for $360,200.
ANSWER: Predicted Price = $361,138, Residual = –$937.81

71. A multiple regression model has the form . As x1 increases by 1 unit (holding x2 constant), ŷ is expected to increase by how many units?
ANSWER: 2

72. A multiple regression model for predicted heart rate is as follows: heart rate = 10 – 0.4 run speed + 12 body weight. As the run speed increases by 1 unit (holding body weight constant), heart rate is expected to increase by how much?
ANSWER: –0.4 (that is, the predicted heart rate decreases by 0.4)

73. Given the partial Excel output from a multiple regression, formulate the regression model.

            Coefficients   Standard Error
Intercept   37,375.357     3,721.625
x1          55.655         9.370
x2          –5.750         3.575
x3          0.213          0.373

ANSWER: ŷ = 37,375.36 + 55.655(x1) – 5.75(x2) + 0.213(x3)

74. The multiple regression model represents pricing for residential housing in a certain market. Predicted Price = 19,856.56 + 6,985.25 bedrooms + 87.53 square feet. A house in the local market has 4 bedrooms and 2,200 square feet of living area. Use the multiple regression model to determine the predicted price.
ANSWER: $240,364

75. The multiple regression model represents pricing for residential housing in a certain market. Predicted Price = 19,856.56 + 6,985.25 bedrooms + 87.53 square feet. A house in the local market has 5 bedrooms and 3,200 square feet of living area. Use the multiple regression model to determine the predicted price and the residual if the house sells for $352,200.
ANSWER: Predicted Price = $334,879, Residual = $17,321.19 (the observed price minus the predicted price)

76. There is a linear relationship between the distance that a car drives and the amount of gas left in the tank of the car. The value of r² for this linear relationship is 98.01%. What is the value of r?
ANSWER: r = –0.99 (the square root of 0.9801 is 0.99, and r is negative because the amount of gas left decreases as the distance increases)

77. What is the range of the possible values of r²?
ANSWER: The values of r² fall between 0 and 1.

78. A runner keeps track of the distance she runs (in miles) and the number of calories her watch says that she burns. The least-squares regression line for her data is ŷ = 10 + 125x. Identify and interpret the slope of the linear relationship between distance and number of calories burned.
ANSWER: The slope is 125. For each additional mile she runs, the predicted additional number of calories she burns is 125.

79. A runner keeps track of the distance she runs (in miles) and the number of calories her watch says that she burns. The least-squares regression line for her data is ŷ = 10 + 125x. Predict the number of calories she can expect to burn if she runs 10 miles.
ANSWER: 1260 calories

80. If a residual plot of the explanatory variable versus the residuals shows a curved pattern, what can we conclude is the shape of the scatterplot of the explanatory variable versus the response variable?
ANSWER: The scatterplot is curved.
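The arithmetic behind the housing price questions can be checked with a short script. The following is a minimal Python sketch, assuming the coefficients quoted in the questions and the convention that a residual is the observed value minus the predicted value; the function names `predicted_price` and `residual` are illustrative and do not come from the test bank.

```python
def predicted_price(bedrooms, square_feet):
    """Predicted price from the estimated model quoted in the questions."""
    return 19_856.56 + 6_985.25 * bedrooms + 87.53 * square_feet


def residual(actual_price, bedrooms, square_feet):
    """Residual = observed price minus predicted price."""
    return actual_price - predicted_price(bedrooms, square_feet)


# Question 74: 4 bedrooms, 2,200 square feet.
print(round(predicted_price(4, 2_200)))
# Question 70: 5 bedrooms, 3,500 square feet, sold for $360,200.
print(round(predicted_price(5, 3_500)))
print(round(residual(360_200, 5, 3_500), 2))
```

A negative residual means the house sold for less than the model predicted; a positive residual means it sold for more.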


Chapter 08 - Time Series Analysis and Forecasting

Multiple Choice

1. A forecast is defined as a(n) ______.
a. prediction of future values of a time series
b. quantitative method used when historical data on the variable of interest are either unavailable or not applicable
c. set of observations on a variable measured at successive points in time
d. outcome of a random experiment
ANSWER: a

2. A set of observations on a variable measured at successive points in time or over successive periods of time constitutes a _____.
a. geometric series
b. time invariant set
c. time series
d. logarithmic series
ANSWER: c

3. Which of the following states the objective of time series analysis?
a. To predict the values of a time series based on one or more other variables
b. To analyze the cause-and-effect relationship of a dependent variable with a time series and one or more other variables
c. To use present variable values to study what should have been the ideal past values
d. To uncover a pattern in a time series and then extrapolate the pattern into the future
ANSWER: d

4. Which of the following is not true of a stationary time series?
a. The process generating the data has a constant mean.
b. The time series plot is a straight line.
c. The statistical properties are independent of time.
d. The variability is constant over time.
ANSWER: b

5. If a time series plot exhibits a horizontal pattern, then _____.
a. it is evident that the time series is stationary
b. the data fluctuates around the variable mean
c. there is no relationship between time and the time series variable
d. there is still not enough evidence to conclude that the time series is stationary
ANSWER: d

6. Which of the following is not present in a time series?
a. Seasonality
b. Operational variations
c. Trend
d. Cycles
ANSWER: b

7. Trend refers to _____.
a. the long-run shift or movement in the time series observable over several periods of time
b. the outcome of a random experiment
c. the recurring patterns observed over successive periods of time
d. the short-run shift or movement in the time series observable for some specific period of time
ANSWER: a

8. Which is not true regarding trend patterns?
a. Can result when business conditions shift to a new level at some point in time
b. Exist when there are gradual shifts of values over long periods of time
c. Can result from factors such as improving technology or changes in consumer preferences
d. Can represent nonlinear relationships
ANSWER: a

9. A time series plot of a period of time (in weeks) versus sales (in 1,000’s of gallons) is shown below. Which of the following data patterns best describes the scenario shown?
a. Time series with a linear trend pattern
b. Time series with a nonlinear trend pattern
c. Time series with no pattern
d. Time series with a horizontal pattern
ANSWER: d

10. A time series plot of a period of time (in years) versus sales (in thousands of dollars) is shown below. Which of the following data patterns best describes the scenario shown?
a. Linear trend pattern
b. Nonlinear trend pattern
c. Seasonal pattern
d. Cyclical pattern
ANSWER: a

11. A time series plot of a period of time (in years) versus revenue (in millions of dollars) is shown below. Which of the following data patterns best describes the scenario shown?
a. Linear trend pattern
b. Nonlinear trend pattern
c. Seasonal pattern
d. Cyclical pattern
ANSWER: b

12. A time series plot of a period of time (in months) versus sales (in number of units) is shown below. Which of the following data patterns best describes the scenario shown?
a. Linear trend pattern
b. Logarithmic trend
c. Exponential trend
d. Seasonal pattern
ANSWER: d

13. A time series plot of a period of time (quarterly) versus quarterly sales (in $1,000s) is shown below. Which of the following data patterns best describes the scenario shown?
a. Linear trend and cyclical pattern
b. Linear trend and horizontal pattern
c. Seasonal and cyclical patterns
d. Seasonal pattern and linear trend
ANSWER: d

14. An exponential trend pattern occurs when ______.
a. the amount of increase between periods in the value of the variable is constant
b. the percentage change between periods in the value of the variable is relatively constant
c. there is no relationship between the time series variable and time
d. there are random fluctuations in the variable value with time
ANSWER: b

15. A time series that shows a recurring pattern over one year or less is said to follow a _____.
a. horizontal pattern
b. stationary pattern
c. cyclical pattern
d. seasonal pattern
ANSWER: d

16. With reference to time series data patterns, a cyclical pattern is the component of the time series that _____.
a. shows a periodic pattern lasting one year or less
b. does not vary with respect to time
c. shows a periodic pattern lasting more than one year
d. is characterized by a linear variation of the dependent variable with respect to time
ANSWER: c

17. _____ is the amount by which the predicted value differs from the observed value of the time series variable.
a. Mean forecast error
b. Mean absolute error
c. Smoothing constant
d. Forecast error
ANSWER: d

18. If the forecasted value of the time series variable for period 2 is 22.5 and the actual value observed for period 2 is 25, what is the forecast error in period 2?
a. 3
b. 2
c. 2.5
d. –2.5
ANSWER: c

19. The mean absolute error, mean squared error, and mean absolute percentage error are all methods to measure the accuracy of a forecast. These methods measure forecast accuracy by _____.
a. determining how well a particular forecasting method is able to reproduce the time series data that are already available
b. using the current value to estimate how well the model generates previous values correctly
c. predicting the future values and waiting for a predefined time period to examine how accurate the predictions were
d. adjusting the scale of the data
ANSWER: a

20. Forecast error _____.
a. takes a positive value when the forecast is too high
b. cannot be negative
c. cannot be zero
d. is associated with measuring forecast accuracy
ANSWER: d

21. A positive forecast error indicates that the forecasting method _____ the dependent variable.
a. overestimated
b. underestimated
c. accurately estimated
d. closely approximated
ANSWER: b

22. Which of the following measures of forecast accuracy is susceptible to the problem of positive and negative forecast errors offsetting one another?
a. Mean absolute error
b. Mean forecast error
c. Mean squared error
d. Mean absolute percentage error
ANSWER: b

23. Demand for a product and the forecasting department’s forecast (naïve model) for a product are shown below. Compute the mean absolute error.

Period   Actual Demand   Forecasted Demand
1        12              –
2        15              12
3        14              15
4        18              16

a. 1
b. 1.5
c. 2
d. 2.5
ANSWER: c

24. Demand for a product and the forecasting department’s forecast (naïve model) for a product are shown below. Compute the mean squared error.

Period   Actual Demand   Forecasted Demand
1        12              –
2        15              12
3        14              15
4        18              16

a. 3.33
b. 4.67
c. 5.33
d. 6.67
ANSWER: b
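The two accuracy measures in questions 23 and 24 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the forecast column is read as no forecast for period 1 followed by 12, 15, and 16, and that the forecast error is the actual value minus the forecast; the variable names are illustrative.

```python
# Periods 2-4 only: period 1 has no forecast, so it is left out of the averages.
actual = [15, 14, 18]
forecast = [12, 15, 16]

# Forecast error = actual value minus forecast.
errors = [a - f for a, f in zip(actual, forecast)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)   # mean squared error

print("errors:", errors)
print("MAE:", mae)
print("MSE:", round(mse, 2))
```

Both measures average over the three periods that have a forecast, not over all four periods.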

25. Suppose for a particular week, the forecasted sales were $4,000. The actual sales were $3,000. What is the value of the mean absolute percentage error?
a. –33.3%
b. –25%
c. 25%
d. 33.3%
ANSWER: d

26. The moving averages method refers to a forecasting method that _____.
a. is used when considerable trend, cyclical, or seasonal effects are present
b. uses regression relationship based on past time series values to predict the future time series values
c. relates a time series to other variables that are believed to explain or cause its behavior
d. uses the average of the most recent data values in the time series as the forecast for the next period
ANSWER: d

27. The moving averages and exponential smoothing methods are appropriate for a time series exhibiting _____.
a. a horizontal pattern
b. a cyclical pattern
c. trends
d. seasonal effects
ANSWER: a

28. Which of the following statements is the objective of the moving averages and exponential smoothing methods?
a. To maximize forecast accuracy measures
b. To smooth out random fluctuations in the time series
c. To characterize the variable fluctuations by an exponential equation
d. To transform a nonstationary time series into a stationary series
ANSWER: b

29. In the moving averages method, the order k determines the _____.
a. error tolerance
b. compensation for forecasting error
c. number of time series values under consideration
d. number of samples in each unit time period
ANSWER: c

30. Using a large value for order k in the moving averages method is effective in _____.
a. tracking changes in a time series more quickly
b. smoothing out random fluctuations
c. providing a forecast when only the most recent time series are relevant
d. eliminating the effect of seasonal variations in the time series
ANSWER: b

31. _____ uses a weighted average of past time series values as the forecast.
a. The qualitative method
b. Exponential smoothing
c. Correlation analysis
d. The causal model
ANSWER: b

32. With reference to exponential forecasting models, a parameter that provides the weight given to the most recent time series value in the calculation of the forecast value is known as the _____.
a. moving average
b. regression coefficient
c. smoothing constant
d. mean forecast error
ANSWER: c

33. The exponential smoothing forecast for period t + 1 is a weighted average of the _____.
a. forecast value in period t with weight α and the actual value for period t with weight 1 – α
b. actual value in period t + 1 with weight α and the forecast for period t with weight 1 – α
c. forecast value in period t – 1 with weight α and the forecast for period t with weight 1 – α
d. actual value in period t with weight α and the forecast for period t with weight 1 – α
ANSWER: d

34. Which of the following is true of the exponential smoothing coefficient?
a. It is a randomly generated value between –1 and +1.
b. It is small for a time series that has relatively little random variability.
c. It is chosen as the value that minimizes a selected measure of forecast accuracy such as the mean squared error.
d. It is computed in relation with the order value, k, for the moving averages.
ANSWER: c

35. The process of _____ might be used to determine the value of the smoothing constant that minimizes the mean squared error.
a. quantization
b. nonlinear optimization
c. clustering
d. curve fitting
ANSWER: b

36. Autoregressive models _____.
a. use the average of the most recent data values in the time series as the forecast for the next period
b. are used to smooth out random fluctuations in time series
c. relate a time series to other variables that are believed to explain or cause its behavior
d. occur whenever all the independent variables are previous values of the time series
ANSWER: d

37. A time series with a seasonal pattern can be modeled by treating the season as a _____.
a. predictor variable
b. dependent variable
c. dummy variable
d. quantitative variable
ANSWER: c

38. Causal models _____.
a. provide evidence of a causal relationship between an independent variable and the variable to be forecast
b. use the average of the most recent data values in the time series as the forecast for the next period
c. occur whenever all the independent variables are previous values of the same time series
d. relate a time series to other variables that are believed to explain or cause its behavior
ANSWER: d

39. A causal model provides evidence of _____ between an independent variable and the variable to be forecast.
a. a causal relationship
b. an association
c. no relationship
d. a seasonal relationship
ANSWER: b

40. The value of an independent variable from the prior period is referred to as a _____.
a. lagged variable
b. dummy variable
c. predictor variable
d. categorical variable
ANSWER: a

41. For causal modeling, _____ are used to detect linear or nonlinear relationships between the independent and dependent variables.
a. descriptive statistics on the data
b. scatter charts
c. contingency tables
d. pie charts
ANSWER: b

Subjective Short Answer

42. What is the difference between a stationary time series and a time series that shows a trend pattern?
ANSWER: A stationary time series is a time series whose statistical properties are independent of time. In particular this means that: (1) the process generating the data has a constant mean and (2) the variability of the time series is constant over time. A time series plot for a stationary time series will always exhibit a horizontal pattern with random fluctuations. A trend pattern exists when a time series shows gradual shifts or movements to relatively higher or lower values over a longer period of time.

43. What is the difference between qualitative and quantitative forecasting methods? Under what circumstances is each method preferred?
ANSWER: Qualitative methods generally involve the use of expert judgment to develop forecasts. Such methods are appropriate when historical data on the variable being forecast are either unavailable or not applicable. Quantitative forecasting methods can be used when (1) past information about the variable being forecast is available, (2) the information can be quantified, and (3) it is reasonable to assume that past is prologue (i.e., that the pattern of the past will continue into the future).

44. What is the difference between moving averages and exponential smoothing?
ANSWER: A moving average uses the average of the most recent k data values in the time series as the forecast for the next period. A moving average gives equal weight to each of the most recent k data values in the time series. Exponential smoothing uses a weighted average of past time series values as a forecast. The weight given to the actual value in period t is the smoothing constant and is denoted by α.
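The contrast between the two methods can be made concrete with a short Python sketch. It assumes the exponential smoothing forecast for period t + 1 is α times the actual value in period t plus (1 – α) times the forecast for period t, with the first forecast set equal to the first observation; that starting rule is an assumption made here, and the function names and the sample series are invented for illustration.

```python
def moving_average_forecast(values, k):
    """Forecast for the next period: the mean of the most recent k values."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of observations")
    return sum(values[-k:]) / k


def exponential_smoothing_forecast(values, alpha):
    """Forecast for the next period using F(t+1) = alpha * y(t) + (1 - alpha) * F(t).

    The forecast for period 2 is initialised to the first observation,
    which is an assumed starting convention.
    """
    forecast = values[0]
    for y in values[1:]:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


series = [10, 12, 11, 13]  # made-up values, for illustration only
print(moving_average_forecast(series, k=3))
print(exponential_smoothing_forecast(series, alpha=0.5))
```

The moving average ignores everything older than the last k values, whereas exponential smoothing keeps every past value in the forecast with a weight that shrinks the further back it lies.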


45. A linear trendline used to forecast sales for a given time period takes the form ŷt = b0 + b1t. Interpret the variables.
ANSWER: The variable ŷt gives the forecast of sales in period t. The variable t represents the time period of interest. The variable b0 gives the value of the y-intercept of the linear trendline, which is the predicted forecast of sales in period 0. The variable b1 gives the slope of the linear trendline, which gives the amount we predict the sales to increase (or decrease) by as the time period increases by 1.
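The intercept b0 and slope b1 of a linear trendline can be estimated by least squares with time as the independent variable. The following is a minimal Python sketch under that assumption, with periods numbered 1 through n; the function name `fit_linear_trend` and the sales figures are invented for illustration.

```python
def fit_linear_trend(values):
    """Least-squares estimates (b0, b1) for a trendline b0 + b1 * t, with t = 1..n."""
    n = len(values)
    periods = range(1, n + 1)
    t_mean = sum(periods) / n
    y_mean = sum(values) / n
    b1 = (sum((t - t_mean) * (y - y_mean) for t, y in zip(periods, values))
          / sum((t - t_mean) ** 2 for t in periods))
    b0 = y_mean - b1 * t_mean
    return b0, b1


sales = [5, 8, 11, 14]  # made-up series that lies exactly on 2 + 3t
b0, b1 = fit_linear_trend(sales)
next_period_forecast = b0 + b1 * (len(sales) + 1)
print(b0, b1, next_period_forecast)
```

Here b1 is the predicted change in sales per period and b0 is the trendline value at period 0, matching the interpretation given in the answer.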

46. Consider the following time series data.

Year    1    2    3    4    5    6    7    8    9    10
Value   234  287  255  310  298  250  456  412  525  436

Using the naïve method (most recent value) as the forecast for the next year, compute the following measures of forecast accuracy.
a. Mean absolute error
b. Mean squared error
c. Mean absolute percentage error
d. What is the forecast for year 11?
ANSWER: The following table shows the calculations for parts (a), (b), and (c).

                                 Absolute                          Absolute
                                 Value of  Squared                 Value of
                       Forecast  Forecast  Forecast  Percentage    Percentage
Year  Value  Forecast  Error     Error     Error     Error         Error
1     234
2     287    234       53        53        2,809     18.4669       18.4669
3     255    287       –32       32        1,024     –12.5490      12.5490
4     310    255       55        55        3,025     17.7419       17.7419
5     298    310       –12       12        144       –4.0268       4.0268
6     250    298       –48       48        2,304     –19.2000      19.2000
7     456    250       206       206       42,436    45.1754       45.1754
8     412    456       –44       44        1,936     –10.6796      10.6796
9     525    412       113       113       12,769    21.5238       21.5238
10    436    525       –89       89        7,921     –20.4128      20.4128
Total                            652       74,368                  169.7764

a. MAE = 652/9 = 72.44
b. MSE = 74,368/9 = 8,263.11
c. MAPE = 169.7764/9 = 18.86%
d. The forecast for year 11 is ŷ11 = 436.
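
The accuracy measures in this answer can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `naive_accuracy` is an illustrative choice, not something defined in the text.

```python
# Time series from problem 46.
values = [234, 287, 255, 310, 298, 250, 456, 412, 525, 436]

def naive_accuracy(series):
    """Return (MAE, MSE, MAPE) for the naive method.

    The forecast for each period is the previous period's actual value,
    so the first period has no forecast and is excluded.
    """
    actuals = series[1:]
    errors = [actual - previous for previous, actual in zip(series, actuals)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e) / actual * 100 for e, actual in zip(errors, actuals)) / n
    return mae, mse, mape

mae, mse, mape = naive_accuracy(values)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}, MAPE = {mape:.2f}%")
print("Forecast for year 11:", values[-1])
```

Note that the percentage errors divide by the actual value, not the forecast, which matches the Percentage Error column in the table.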

47. Consider the following time series data.

Year    1    2    3    4    5    6    7    8    9    10
Value   234  287  255  310  298  250  456  412  525  436

Using the average of all the historical data as a forecast for the next year, compute the following measures of forecast accuracy.
a. Mean absolute error
b. Mean squared error
c. Mean absolute percentage error
d. What is the forecast for year 11?
ANSWER: The following table shows the calculations for parts (a), (b), and (c).

                                 Absolute                            Absolute
                                 Value of  Squared                   Value of
                       Forecast  Forecast  Forecast    Percentage    Percentage
Year  Value  Forecast  Error     Error     Error       Error         Error
1     234
2     287    234.00    53.00     53.00     2,809.00    18.4669       18.4669
3     255    260.50    –5.50     5.50      30.25       –2.1569       2.1569
4     310    258.67    51.33     51.33     2,635.11    16.5591       16.5591
5     298    271.50    26.50     26.50     702.25      8.8926        8.8926
6     250    276.80    –26.80    26.80     718.24      –10.7200      10.7200
7     456    272.33    183.67    183.67    33,733.44   40.2778       40.2778
8     412    298.57    113.43    113.43    12,866.04   27.5312       27.5312
9     525    312.75    212.25    212.25    45,050.06   40.4286       40.4286
10    436    336.33    99.67     99.67     9,933.44    22.8593       22.8593
Total                            772.15    108,477.84                187.8924

a. MAE = 772.15/9 = 85.79
b. MSE = 108,477.84/9 = 12,053.09
c. MAPE = 187.8924/9 = 20.88%
d. The forecast for year 11 is ŷ11 = (y1 + y2 + y3 + y4 + y5 + y6 + y7 + y8 + y9 + y10) / 10 = (234 + 287 + 255 + 310 + 298 + 250 + 456 + 412 + 525 + 436) / 10 = 346.3.

48. The monthly sales revenue (in hundreds of dollars) of a company for one year is listed below.

Month       Sales ($100s)
January     12,354
February    13,657
March       14,536
April       13,478
May         16,590
June        19,790
July        17,987
August      18,657
September   19,765
October     18,678
November    20,678
December    23,675

a. Compute MSE using the most recent value as the forecast for the next period. What is the forecast for the next month?
b. Compute MSE using the average of all the data available as the forecast for the next period. What is the forecast for the next month?
c. Which method appears to provide the better forecast?
ANSWER:
a.
                                        Forecast  Squared
Month       Sales ($100s)  Forecast     Error     Forecast Error
January     12,354
February    13,657         12,354       1,303     1,697,809
March       14,536         13,657       879       772,641
April       13,478         14,536       –1,058    1,119,364
May         16,590         13,478       3,112     9,684,544
June        19,790         16,590       3,200     10,240,000
July        17,987         19,790       –1,803    3,250,809
August      18,657         17,987       670       448,900
September   19,765         18,657       1,108     1,227,664
October     18,678         19,765       –1,087    1,181,569
November    20,678         18,678       2,000     4,000,000
December    23,675         20,678       2,997     8,982,009
Total                                             42,605,309

MSE = 42,605,309/11 = 3,873,209.91 ≈ 3,873,210
The forecast (in $100s) for the next month is ŷ13 = ydec = 23,675.

b.
                                        Forecast   Squared
Month       Sales ($100s)  Forecast     Error      Forecast Error
January     12,354
February    13,657         12,354.00    1,303.00   1,697,809.00
March       14,536         13,005.50    1,530.50   2,342,430.25
April       13,478         13,515.67    –37.67     1,418.78
May         16,590         13,506.25    3,083.75   9,509,514.06
June        19,790         14,123.00    5,667.00   32,114,889.00
July        17,987         15,067.50    2,919.50   8,523,480.25
August      18,657         15,484.57    3,172.43   10,064,303.04
September   19,765         15,881.13    3,883.88   15,084,485.02
October     18,678         16,312.67    2,365.33   5,594,801.78
November    20,678         16,549.20    4,128.80   17,046,989.44
December    23,675         16,924.55    6,750.45   45,568,636.57
Total                                              147,548,757.1847

MSE = 147,548,757.1847/11 = 13,413,523.38
The forecast (in $100s) for the next month is ŷ13 = (y1 + y2 + … + y11 + y12) / 12 = (12,354 + 13,657 + … + 20,678 + 23,675) / 12 = 17,487.08.

c. The most recent value method in part (a) is better because its MSE is smaller.

49. Consider the following time series data.

Year    1    2    3    4    5    6    7    8    9    10
Value   234  287  255  310  298  250  302  267  225  336

a. Construct a time series plot. What type of pattern exists in the data?
b. Develop a three-year moving average for this time series. Compute MSE and a forecast for year 11.
ANSWER: a.

The time series data appear to follow a horizontal pattern.
b.
                       Forecast  Squared
Year  Value  Forecast  Error     Forecast Error
1     234
2     287
3     255
4     310    258.67    51.33     2,635.11
5     298    284.00    14.00     196.00
6     250    287.67    –37.67    1,418.78
7     302    286.00    16.00     256.00
8     267    283.33    –16.33    266.78
9     225    273.00    –48.00    2,304.00
10    336    264.67    71.33     5,088.44
Total                            12,165.11

MSE = 12,165.11/7 = 1,737.87
The forecast for year 11 is ŷ11 = (y8 + y9 + y10) / 3 = (267 + 225 + 336) / 3 = 276.00.

50. Consider the following time series data.

Year    1    2    3    4    5    6    7    8    9    10
Value   234  287  255  310  298  250  302  267  225  336

a. Use α = 0.2 to compute the exponential smoothing values for the time series. Compute MSE and a forecast for year 11.
b. Use trial and error to find a value of the exponential smoothing coefficient α (rounded to 2 decimal places) that results in a smaller MSE than what you calculated for α = 0.2.
c. Compute the forecast for year 11 using the smoothing coefficient α selected using trial and error.
ANSWER:
a. Smoothing constant α = 0.2

                       Forecast  Squared
Year  Value  Forecast  Error     Forecast Error
1     234
2     287    234.00    53.00     2,809.00
3     255    244.60    10.40     108.16
4     310    246.68    63.32     4,009.42
5     298    259.34    38.66     1,494.29
6     250    267.08    –17.08    291.56
7     302    263.66    38.34     1,469.94
8     267    271.33    –4.33     18.73
9     225    270.46    –45.46    2,066.84
10    336    261.37    74.63     5,569.64
Total                            17,837.58

MSE = 17,837.58/9 = 1,981.95
The forecast for year 11 is ŷ11 = αy10 + (1 – α)ŷ10 = (0.2)(336) + (1 – 0.2)(261.37) = 276.30.

b. Several values of α will yield an MSE smaller than the MSE associated with α = 0.2. The table below shows the resulting MSE for several different values of α.

α     MSE
0.1   2,285.29
0.2   1,981.95
0.3   1,928.71
0.4   1,978.37
0.5   2,081.20
0.6   2,219.57
0.7   2,387.01
0.8   2,580.70

The value of α that yields the minimum MSE is α = 0.29, which yields an MSE of 1,928.08.

                       Forecast  Squared
Year  Value  Forecast  Error     Forecast Error
1     234
2     287    234.00    53.00     2,809.00
3     255    249.37    5.63      31.70
4     310    251.00    59.00     3,480.68
5     298    268.11    29.89     893.30
6     250    276.78    –26.78    717.14
7     302    269.01    32.99     1,088.11
8     267    278.58    –11.58    134.09
9     225    275.22    –50.22    2,522.20
10    336    260.66    75.34     5,676.53
Total                            17,352.74

MSE = 17,352.74/9 = 1,928.08

c. The forecast for year 11 is ŷ11 = αy10 + (1 – α)ŷ10 = (0.29)(336) + (1 – 0.29)(260.66) = 282.51.
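
The trial-and-error search in part (b) can be automated. The sketch below is a minimal standard-library Python version, assuming (as in the answer tables) that the forecast for period 2 is set to the period 1 value; the function names and the 0.01 search grid are illustrative choices rather than anything prescribed by the text.

```python
# Time series from problem 50.
values = [234, 287, 255, 310, 298, 250, 302, 267, 225, 336]

def exp_smooth(series, alpha):
    """Return forecasts for periods 2 through n+1.

    The period-2 forecast is the period-1 actual value; each later forecast is
    alpha * (previous actual) + (1 - alpha) * (previous forecast).
    """
    forecasts = [series[0]]
    for actual in series[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

def smoothing_mse(series, alpha):
    """Mean squared error over periods 2 through n."""
    forecasts = exp_smooth(series, alpha)
    errors = [actual - f for actual, f in zip(series[1:], forecasts)]
    return sum(e * e for e in errors) / len(errors)

print(f"MSE at alpha = 0.2: {smoothing_mse(values, 0.2):.2f}")

# Search alpha = 0.01, 0.02, ..., 0.99 for the smallest MSE.
alphas = [i / 100 for i in range(1, 100)]
best_alpha = min(alphas, key=lambda a: smoothing_mse(values, a))
best_mse = smoothing_mse(values, best_alpha)
next_forecast = exp_smooth(values, best_alpha)[-1]
print(f"Best alpha = {best_alpha:.2f}, MSE = {best_mse:.2f}, year 11 forecast = {next_forecast:.2f}")
```

A grid search like this only finds the best value on the grid; a finer grid or a numerical optimizer (such as Excel Solver) would be needed for more decimal places.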

51. The following time series shows the sales of a particular commodity over the past 15 weeks.

Week   Sales
1      1,123
2      1,157
3      1,138
4      1,120
5      1,130
6      1,132
7      1,188
8      1,151
9      1,129
10     1,118
11     1,125
12     1,147
13     1,162
14     1,190
15     1,137

a. Construct a time series plot. What type of pattern exists in the data?
b. Use α = 0.3 to develop the exponential smoothing values for the time series and compute the forecast of demand for the next week.
c. Use trial and error to find a value of the exponential smoothing coefficient α that results in a relatively small MSE.
ANSWER:
a. The time series plot shows a horizontal pattern.
b. α = 0.3

                         Forecast  Squared
Week  Sales   Forecast   Error     Forecast Error
1     1,123
2     1,157   1,123.00   34.00     1,156.00
3     1,138   1,133.20   4.80      23.04
4     1,120   1,134.64   –14.64    214.33
5     1,130   1,130.25   –0.25     0.06
6     1,132   1,130.17   1.83      3.34
7     1,188   1,130.72   57.28     3,280.82
8     1,151   1,147.91   3.09      9.58
9     1,129   1,148.83   –19.83    393.37
10    1,118   1,142.88   –24.88    619.19
11    1,125   1,135.42   –10.42    108.54
12    1,147   1,132.29   14.71     216.30
13    1,162   1,136.71   25.29     639.84
14    1,190   1,144.29   45.71     2,089.08
15    1,137   1,158.01   –21.01    441.23
Total                              9,194.72

MSE = 9,194.72/14 = 656.77
The forecast for week 16 is ŷ16 = αy15 + (1 – α)ŷ15 = 0.3(1,137) + 0.7(1,158.01) = 1,151.70.

c. MSE values for exponential smoothing forecasts with several different values of α appear below.

α      MSE
0.05   767.11
0.1    686.51
0.2    646.06
0.21   645.92
0.3    656.77
0.4    679.49
0.5    703.75

The value of α that yields the smallest possible MSE is α = 0.21, which yields an MSE of 645.92.

                         Forecast  Squared
Week  Sales   Forecast   Error     Forecast Error
1     1,123
2     1,157   1,123.00   34.00     1,156.00
3     1,138   1,130.14   7.86      61.78
4     1,120   1,131.79   –11.79    139.02
5     1,130   1,129.31   0.69      0.47
6     1,132   1,129.46   2.54      6.46
7     1,188   1,129.99   58.01     3,364.90
8     1,151   1,142.17   8.83      77.90
9     1,129   1,144.03   –15.03    225.82
10    1,118   1,140.87   –22.87    523.11
11    1,125   1,136.07   –11.07    122.51
12    1,147   1,133.74   13.26     175.72
13    1,162   1,136.53   25.47     648.83
14    1,190   1,141.88   48.12     2,315.82
15    1,137   1,151.98   –14.98    224.49
Total                              9,042.83

MSE = 9,042.83/14 = 645.92

52. The following time series shows the demand for a particular product over the past 10 months.

Month    1    2    3    4    5    6    7    8    9    10
Demand   324  311  303  314  323  313  302  315  312  326

a. Construct a time series plot. What type of pattern exists in the data?
b. Develop a three-month moving average for this time series. Compute MSE and a forecast for month 11.
ANSWER:
a. The data appear to follow a horizontal pattern.
b.
                          Forecast  Squared
Month  Demand  Forecast   Error     Forecast Error
1      324
2      311
3      303
4      314     312.67     1.33      1.78
5      323     309.33     13.67     186.78
6      313     313.33     –0.33     0.11
7      302     316.67     –14.67    215.11
8      315     312.67     2.33      5.44
9      312     310.00     2.00      4.00
10     326     309.67     16.33     266.78
Total                               680.00

MSE = 680.00/7 = 97.14
The forecast for month 11 is ŷ11 = (y8 + y9 + y10) / 3 = (315 + 312 + 326) / 3 = 317.67.
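
The moving average calculation above can be checked with a short Python sketch. It uses only the standard library; the function name `moving_average_forecasts` is an illustrative choice, and k is the number of periods averaged.

```python
# Monthly demand from problem 52.
demand = [324, 311, 303, 314, 323, 313, 302, 315, 312, 326]

def moving_average_forecasts(series, k):
    """Return forecasts for periods k+1 through n+1.

    Each forecast is the mean of the k actual values immediately before it.
    """
    return [sum(series[i - k:i]) / k for i in range(k, len(series) + 1)]

k = 3
forecasts = moving_average_forecasts(demand, k)

# Compare forecasts with actuals for periods k+1..n; the final forecast has no actual yet.
errors = [actual - f for actual, f in zip(demand[k:], forecasts)]
mse = sum(e * e for e in errors) / len(errors)

print(f"MSE = {mse:.2f}")
print(f"Forecast for month 11 = {forecasts[-1]:.2f}")
```

Changing `k` to another value gives the corresponding k-period moving average, which makes it easy to compare MSE across window lengths.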

53. The following time series shows the demand for a particular product over the past 10 months.

Month   1    2    3    4    5    6    7    8    9    10
Value   324  311  303  314  323  313  302  315  312  326

a. Use α = 0.2 to compute the exponential smoothing values for the time series. Compute MSE and a forecast for month 11.
b. Compare the three-month moving average forecast with the exponential smoothing forecast using α = 0.2. Which appears to provide the better forecast based on MSE?
ANSWER:
a. Smoothing constant α = 0.2

                        Forecast  Squared
Month  Value  Forecast  Error     Forecast Error
1      324
2      311    324.00    –13.00    169.00
3      303    321.40    –18.40    338.56
4      314    317.72    –3.72     13.84
5      323    316.98    6.02      36.29
6      313    318.18    –5.18     26.84
7      302    317.14    –15.14    229.36
8      315    314.12    0.88      0.78
9      312    314.29    –2.29     5.26
10     326    313.83    12.17     148.01
Total                             967.94

MSE = 967.94/9 = 107.55
The forecast for month 11 is ŷ11 = αy10 + (1 – α)ŷ10 = 0.2(326) + (1 – 0.2)(313.83) = 316.27.

b. Comparing the MSE for the three-month moving average (calculated in the previous problem) with the MSE for exponential smoothing, the three-month moving average provides the better forecast because it has the smaller MSE.

54. The following data show the quarterly profit (in thousands of dollars) made by a particular company in the past 3 years.

Year  Quarter  Profit ($1000s)
1     1        45
1     2        51
1     3        72
1     4        50
2     1        49
2     2        45
2     3        79
2     4        54
3     1        42
3     2        58
3     3        70
3     4        56

a. Construct a time series plot. What type of pattern exists in the data?
b. Develop a three-period moving average for this time series. Compute MSE and a forecast of profit (in $1000s) for the next quarter.
ANSWER: Rewrite the data as below:

t    Profit ($1000s)
1    45
2    51
3    72
4    50
5    49
6    45
7    79
8    54
9    42
10   58
11   70
12   56

a. The data appear to follow a horizontal pattern.
b.
                   Profit               Forecast  Squared
Year  Quarter  t   ($1000s)  Forecast   Error     Forecast Error
1     1        1   45
1     2        2   51
1     3        3   72
1     4        4   50        56.00      –6.00     36.00
2     1        5   49        57.67      –8.67     75.11
2     2        6   45        57.00      –12.00    144.00
2     3        7   79        48.00      31.00     961.00
2     4        8   54        57.67      –3.67     13.44
3     1        9   42        59.33      –17.33    300.44
3     2        10  58        58.33      –0.33     0.11
3     3        11  70        51.33      18.67     348.44
3     4        12  56        56.67      –0.67     0.44
Total                                             1,879.00

MSE = 1,879/9 = 208.78
The forecast of profit (in $1000s) for the next quarter is ŷ13 = (y10 + y11 + y12) / 3 = (58 + 70 + 56) / 3 = 61.33.

55. The following data show the quarterly profit (in thousands of dollars) made by a particular company in the past 3 years.

Year  Quarter  Profit ($1000s)
1     1        45
1     2        51
1     3        72
1     4        50
2     1        49
2     2        45
2     3        79
2     4        54
3     1        42
3     2        58
3     3        70
3     4        56

a. Use α = 0.3 to compute the exponential smoothing values for the time series. Compute MSE and the forecast of profit (in $1000s) for the next quarter.
b. Compare the three-period moving average forecast with the exponential smoothing forecast using α = 0.3. Which appears to provide the better forecast based on MSE?
ANSWER:
a. Smoothing constant α = 0.3

                   Profit               Forecast  Squared
Year  Quarter  t   ($1000s)  Forecast   Error     Forecast Error
1     1        1   45
1     2        2   51        45.000     6.000     36.000
1     3        3   72        46.800     25.200    635.040
1     4        4   50        54.360     –4.360    19.010
2     1        5   49        53.052     –4.052    16.419
2     2        6   45        51.836     –6.836    46.736
2     3        7   79        49.785     29.215    853.488
2     4        8   54        58.550     –4.550    20.701
3     1        9   42        57.185     –15.185   230.581
3     2        10  58        52.629     5.371     28.843
3     3        11  70        54.241     15.759    248.359
3     4        12  56        58.968     –2.968    8.811
Total                                             2,143.988

MSE = 2,143.988/11 = 194.91
The forecast of profit (in $1000s) for quarter 13 is ŷ13 = αy12 + (1 – α)ŷ12 = 0.3(56) + (1 – 0.3)(58.968) = 58.08.

b. Compared to the three-period moving average forecast (calculated in the previous problem), the exponential smoothing forecast provides the better forecast because it has the smaller MSE.

56. The time series below gives the indices of industrial production (IP) in the United States for 10 consecutive years.

Year  IP
1     79.62
2     86.54
3     88.14
4     89.23
5     93.45
6     97.4
7     99.34
8     96.98
9     100.22
10    103.56

a. Construct a time series plot. What type of pattern exists in the data?
b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series.
c. What is the forecast for t = 11?
ANSWER:
a. The time series plot shows a linear trend.
b. From the Excel regression output, the regression estimates for the y-intercept and slope that minimize MSE for this time series are b0 = 80.458 and b1 = 2.36, which result in the following forecasts, errors, and MSE:

                            Forecast  Squared
Year  IP      Forecast      Error     Forecast Error
1     79.62   82.81981818   –3.200    10.239
2     86.54   85.18163636   1.358     1.845
3     88.14   87.54345455   0.597     0.356
4     89.23   89.90527273   –0.675    0.456
5     93.45   92.26709091   1.183     1.399
6     97.4    94.62890909   2.771     7.679
7     99.34   96.99072727   2.349     5.519
8     96.98   99.35254545   –2.373    5.629
9     100.22  101.71436364  –1.494    2.233
10    103.56  104.07618182  –0.516    0.266
Total                                 35.622

MSE = 35.622/10 = 3.56.
c. ŷ11 = b0 + b1t = 80.458 + 2.36(11) = 106.438 (computed with the unrounded slope; the rounded values shown give 106.418).

57. The monthly sales (in hundreds of dollars) of a company are listed below.

Month       Sales ($100s)
January     12,354
February    13,657
March       14,536
April       13,478
May         16,590
June        19,790
July        17,987
August      18,657
September   19,765
October     18,678
November    20,678
December    23,675

a. Construct a time series plot. What type of pattern exists in the data?
b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series.
c. What is the sales forecast (in hundreds of dollars) for next month?
ANSWER: a.

The time series plot shows a linear trend.
b. From the Excel regression output, the regression estimates for the y-intercept and slope that minimize MSE for this time series are b0 = 11,747.38 and b1 = 883.03, which result in the following forecasts, errors, and MSE:

                Sales                   Forecast     Squared
Month       t   ($100s)  Forecast       Error        Forecast Error
January     1   12,354   12,630.41026   –276.410     76,402.630
February    2   13,657   13,513.44172   143.558      20,608.978
March       3   14,536   14,396.47319   139.527      19,467.730
April       4   13,478   15,279.50466   –1,801.505   3,245,419.047
May         5   16,590   16,162.53613   427.464      182,725.360
June        6   19,790   17,045.56760   2,744.432    7,531,909.203
July        7   17,987   17,928.59907   58.401       3,410.669
August      8   18,657   18,811.63054   –154.631     23,910.603
September   9   19,765   19,694.66200   70.338       4,947.434
October     10  18,678   20,577.69347   –1,899.693   3,608,835.292
November    11  20,678   21,460.72494   –782.725     612,658.334
December    12  23,675   22,343.75641   1,331.244    1,772,209.495
Total                                                17,102,504.775

MSE = 17,102,504.775/12 = 1,425,208.73.
c. The forecast (in $100s) is ŷ13 = b0 + b1t = 11,747.38 + 883.03(13) = 23,226.79 (computed with the unrounded coefficients; the rounded values shown give 23,226.77).

58. Consider the following time series.

t    yt
1    1,234
2    1,201
3    1,103
4    987
5    945
6    891
7    817
8    734

a. Construct a time series plot. What type of pattern exists in the data?
b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series.
c. What is the forecast for t = 9?
ANSWER: a.

The data appear to show a linear trend with a decreasing pattern.
b. From the Excel regression output, the regression estimates for the y-intercept and slope that minimize MSE for the given time series are b0 = 1,315.68 and b1 = –72.60, which result in the following forecasts, errors, and MSE:

                          Forecast  Squared
t    yt     Forecast      Error     Forecast Error
1    1,234  1,243.083333  –9.083    82.507
2    1,201  1,170.488095  30.512    930.976
3    1,103  1,097.892857  5.107     26.083
4    987    1,025.297619  –38.298   1,466.708
5    945    952.702381    –7.702    59.327
6    891    880.1071429   10.893    118.654
7    817    807.5119048   9.488     90.024
8    734    734.9166667   –0.917    0.840
Total                               2,775.119

MSE = 2,775.119/8 = 346.89.
c. ŷ9 = b0 + b1t = 1,315.68 – 72.60(9) = 662.32 (computed with the unrounded coefficients; the rounded values shown give 662.28).

59. The yearly sales (in millions of dollars) of an automobile manufacturing company during the period 2000 to 2011 are given below.

Year   Sales ($millions) y
2000   470
2001   485
2002   499
2003   515
2004   532
2005   532
2006   556
2007   576
2008   583
2009   587
2010   601
2011   605

a. Construct a time series plot. What type of pattern exists in the data?
b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series.
c. What is the sales forecast (in millions of dollars) for the year 2012?
ANSWER:
a. The time series plot shows a linear trend.
b. From the Excel regression output, the regression estimates for the y-intercept and slope that minimize MSE for the given time series are b0 = –24,986.47 and b1 = 12.73, which result in the following forecasts, errors, and MSE.

      Sales                         Forecast  Squared
Year  ($millions) y  Forecast       Error     Forecast Error
2000  470            475.0641026    –5.064    25.645
2001  485            487.7948718    –2.795    7.811
2002  499            500.525641     –1.526    2.328
2003  515            513.2564103    1.744     3.040
2004  532            525.9871795    6.013     36.154
2005  532            538.7179487    –6.718    45.131
2006  556            551.4487179    4.551     20.714
2007  576            564.1794872    11.821    139.725
2008  583            576.9102564    6.090     37.085
2009  587            589.6410256    –2.641    6.975
2010  601            602.3717949    –1.372    1.882
2011  605            615.1025641    –10.103   102.062
Total                                         428.551

MSE = 428.551/12 = 35.713.
c. The sales forecast (in millions of dollars) for the year 2012 is ŷ2012 = b0 + b1t = –24,986.47 + 12.73(2012) = 627.83 (computed with the unrounded coefficients; the rounded values shown give 626.29).

60. Consider the following time series data:

t    yt
1    0.345
2    0.366
3    0.398
4    0.356
5    0.456
6    0.478
7    0.543
8    0.596
9    0.634
10   0.698

a. Construct a time series plot. What type of pattern exists in the data?

b. Use simple linear regression analysis to find the parameters for the line that minimizes MSE for this time series.
c. What is the forecast for t = 11?
ANSWER:
a. This time series plot shows an upward linear trend.
b. From the Excel regression output, the regression estimates that minimize MSE for this time series are b0 = 0.2661 and b1 = 0.0402, which result in the following forecasts, errors, and MSE:

                         Forecast  Squared
t    yt     Forecast     Error     Forecast Error
1    0.345  0.306290909  0.039     0.001
2    0.366  0.346448485  0.020     0.000
3    0.398  0.386606061  0.011     0.000
4    0.356  0.426763636  –0.071    0.005
5    0.456  0.466921212  –0.011    0.000
6    0.478  0.507078788  –0.029    0.001
7    0.543  0.547236364  –0.004    0.000
8    0.596  0.587393939  0.009     0.000
9    0.634  0.627551515  0.006     0.000
10   0.698  0.667709091  0.030     0.001
Total                              0.009

MSE = 0.009/10 = 0.0009.
c. The forecast for t = 11 is ŷ11 = b0 + b1t = 0.2661 + 0.0402(11) = 0.708.
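
The regression estimates reported from Excel can be cross-checked with the closed-form least-squares formulas for a simple linear trend. The following is a minimal standard-library Python sketch; the function name `fit_trend` is an illustrative choice.

```python
# Time series from problem 60, with t = 1, 2, ..., 10.
y = [0.345, 0.366, 0.398, 0.356, 0.456, 0.478, 0.543, 0.596, 0.634, 0.698]
t = list(range(1, len(y) + 1))

def fit_trend(t, y):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b1 = s_ty / s_tt
    b0 = y_bar - b1 * t_bar
    return b0, b1

b0, b1 = fit_trend(t, y)
forecast_11 = b0 + b1 * 11
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, forecast for t = 11: {forecast_11:.3f}")
```

Because the code keeps full precision in b0 and b1, forecasts computed this way can differ in the last digit from hand calculations that use rounded coefficients.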

61. Consider the following quarterly time series.

Quarter  Year 1  Year 2  Year 3
1        923     1,112   1,243
2        1,056   1,156   1,301
3        1,124   1,124   1,254
4        992     1,078   1,198

a. Construct a time series plot. What type of pattern exists in the data?
b. Use a multiple regression model with dummy variables as follows to develop an equation to account for seasonal effects in the data. Qtr1 = 1 if quarter 1, 0 otherwise; Qtr2 = 1 if quarter 2, 0 otherwise; Qtr3 = 1 if quarter 3, 0 otherwise.
c. Compute the quarterly forecasts for next year based on the model developed in part (b).
ANSWER: a.

The time series plot reveals a trend with a seasonal pattern in the data. For instance, in each year the value increases from quarter 1 to quarter 2 and drops from quarter 3 to quarter 4.
b. Rewrite the data using the dummy variables in the following format:

                                    Time Series
Year  Quarter  Qtr1  Qtr2  Qtr3     Value, yt
1     1        1     0     0        923
1     2        0     1     0        1,056
1     3        0     0     1        1,124
1     4        0     0     0        992
2     1        1     0     0        1,112
2     2        0     1     0        1,156
2     3        0     0     1        1,124
2     4        0     0     0        1,078
3     1        1     0     0        1,243
3     2        0     1     0        1,301
3     3        0     0     1        1,254
3     4        0     0     0        1,198
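
The dummy-variable layout above can also be fit outside a spreadsheet. The following is a minimal sketch in standard-library Python that solves the least-squares normal equations directly; the helper name `least_squares` and the Gauss-Jordan approach are illustrative choices, not the method used in the text, which relies on Excel.

```python
# Quarterly values for years 1-3, in time order (quarter 1 of year 1 first).
values = [923, 1056, 1124, 992, 1112, 1156, 1124, 1078, 1243, 1301, 1254, 1198]

# Design matrix rows: [intercept, Qtr1, Qtr2, Qtr3]; quarter 4 is the baseline.
rows = []
for i in range(len(values)):
    quarter = i % 4 + 1
    rows.append([1.0] + [1.0 if quarter == q else 0.0 for q in (1, 2, 3)])

def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry in this column to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

b0, b1, b2, b3 = least_squares(rows, values)
print(f"intercept = {b0:.2f}, Qtr1 = {b1:.2f}, Qtr2 = {b2:.2f}, Qtr3 = {b3:.2f}")

# Forecast for each quarter of next year: intercept plus that quarter's coefficient.
for quarter, coef in ((1, b1), (2, b2), (3, b3), (4, 0.0)):
    print(f"Quarter {quarter} forecast = {b0 + coef:.2f}")
```

With only seasonal dummies and no trend term, the intercept equals the mean of the baseline quarter and each coefficient is the difference between that quarter's mean and the baseline mean.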

We can use Excel’s Regression tool to find the regression model that accounts for the seasonal effects in the data. From the Excel output, the regression model that minimizes MSE for the given time series is:
ŷt = 1,089.33 + 3.33Qtr1 + 81.67Qtr2 + 78Qtr3
c. Based on the model in part (b), the quarterly forecasts for next year are as follows:
Quarter 1 forecast = 1,089.33 + 3.33(1) + 81.67(0) + 78(0) = 1,092.67
Quarter 2 forecast = 1,089.33 + 3.33(0) + 81.67(1) + 78(0) = 1,171.00
Quarter 3 forecast = 1,089.33 + 3.33(0) + 81.67(0) + 78(1) = 1,167.33
Quarter 4 forecast = 1,089.33 + 3.33(0) + 81.67(0) + 78(0) = 1,089.33

62. The following table shows the average monthly distance traveled (in billion miles) by vehicles on urban highways for five different years.

Urban Highways - Average Monthly Distance Traveled by Vehicles (Billion Miles)

Years   Jan   Feb   Mar   Apr   May   Jun   July  Aug   Sep   Oct   Nov   Dec
Year 1  4.22  5.32  5.21  5.12  4.92  4.49  4.55  4.49  4.44  4.39  4.37  4.35
Year 2  4.31  5.44  5.34  5.24  4.98  4.59  4.68  4.65  4.61  4.68  4.74  4.79
Year 3  4.38  5.51  5.41  5.36  4.98  4.63  4.71  4.78  4.82  4.88  4.85  4.89
Year 4  4.45  5.59  5.5   5.41  5.01  4.72  4.78  4.79  4.82  4.92  5.06  5.11
Year 5  4.51  5.65  5.62  5.49  5.12  4.8   4.88  4.82  4.95  5.12  5.22  5.44

a. Construct a time series plot. What type of pattern exists in the data?
b. Use a multiple regression model to develop an equation to account for seasonal effects and any linear trend in the data. To capture seasonal effects, use the dummy variables Jan = 1 if month is January, 0 otherwise; Feb = 1 if month is February, 0 otherwise; …; Nov = 1 if month is November, 0 otherwise; and create a variable t such that t = 1 for January of year 1, t = 2 for February of year 1, …, t = 60 for December of year 5.
c. Compute the forecast (in billion miles) for the next three months based on the model developed in part (b).
ANSWER: a.


This time series plot shows a trend pattern in the data; however, there is a seasonal pattern as well. For instance, the lowest value occurs in January and the highest in February.
b. Rewrite the data using the dummy variables and the variable t in the following format.

yt    Jan  Feb  Mar  Apr  May  June July Aug  Sep  Oct  Nov  t
4.22  1    0    0    0    0    0    0    0    0    0    0    1
5.32  0    1    0    0    0    0    0    0    0    0    0    2
5.21  0    0    1    0    0    0    0    0    0    0    0    3
5.12  0    0    0    1    0    0    0    0    0    0    0    4
4.92  0    0    0    0    1    0    0    0    0    0    0    5
4.49  0    0    0    0    0    1    0    0    0    0    0    6
4.55  0    0    0    0    0    0    1    0    0    0    0    7
4.49  0    0    0    0    0    0    0    1    0    0    0    8
4.44  0    0    0    0    0    0    0    0    1    0    0    9
4.39  0    0    0    0    0    0    0    0    0    1    0    10
4.37  0    0    0    0    0    0    0    0    0    0    1    11
4.35  0    0    0    0    0    0    0    0    0    0    0    12
4.31  1    0    0    0    0    0    0    0    0    0    0    13
5.44  0    1    0    0    0    0    0    0    0    0    0    14
5.34  0    0    1    0    0    0    0    0    0    0    0    15
5.24  0    0    0    1    0    0    0    0    0    0    0    16
4.98  0    0    0    0    1    0    0    0    0    0    0    17
4.59  0    0    0    0    0    1    0    0    0    0    0    18
4.68  0    0    0    0    0    0    1    0    0    0    0    19
4.65  0    0    0    0    0    0    0    1    0    0    0    20
4.61  0    0    0    0    0    0    0    0    1    0    0    21
4.68  0    0    0    0    0    0    0    0    0    1    0    22
4.74  0    0    0    0    0    0    0    0    0    0    1    23
4.79  0    0    0    0    0    0    0    0    0    0    0    24
4.38  1    0    0    0    0    0    0    0    0    0    0    25
5.51  0    1    0    0    0    0    0    0    0    0    0    26
5.41  0    0    1    0    0    0    0    0    0    0    0    27
5.36  0    0    0    1    0    0    0    0    0    0    0    28
4.98  0    0    0    0    1    0    0    0    0    0    0    29
4.63  0    0    0    0    0    1    0    0    0    0    0    30
4.71  0    0    0    0    0    0    1    0    0    0    0    31
4.78  0    0    0    0    0    0    0    1    0    0    0    32
4.82  0    0    0    0    0    0    0    0    1    0    0    33
4.88  0    0    0    0    0    0    0    0    0    1    0    34
4.85  0    0    0    0    0    0    0    0    0    0    1    35
4.89  0    0    0    0    0    0    0    0    0    0    0    36
4.45  1    0    0    0    0    0    0    0    0    0    0    37
5.59  0    1    0    0    0    0    0    0    0    0    0    38
5.5   0    0    1    0    0    0    0    0    0    0    0    39
5.41  0    0    0    1    0    0    0    0    0    0    0    40
5.01  0    0    0    0    1    0    0    0    0    0    0    41
4.72  0    0    0    0    0    1    0    0    0    0    0    42
4.78  0    0    0    0    0    0    1    0    0    0    0    43
4.79  0    0    0    0    0    0    0    1    0    0    0    44
4.82  0    0    0    0    0    0    0    0    1    0    0    45
4.92  0    0    0    0    0    0    0    0    0    1    0    46
5.06  0    0    0    0    0    0    0    0    0    0    1    47
5.11  0    0    0    0    0    0    0    0    0    0    0    48
4.51  1    0    0    0    0    0    0    0    0    0    0    49
5.65  0    1    0    0    0    0    0    0    0    0    0    50
5.62  0    0    1    0    0    0    0    0    0    0    0    51
5.49  0    0    0    1    0    0    0    0    0    0    0    52
5.12  0    0    0    0    1    0    0    0    0    0    0    53
4.8   0    0    0    0    0    1    0    0    0    0    0    54
4.88  0    0    0    0    0    0    1    0    0    0    0    55
4.82  0    0    0    0    0    0    0    1    0    0    0    56
4.95  0    0    0    0    0    0    0    0    1    0    0    57
5.12  0    0    0    0    0    0    0    0    0    1    0    58
5.22  0    0    0    0    0    0    0    0    0    0    1    59
5.44  0    0    0    0    0    0    0    0    0    0    0    60

We can use Excel’s Regression tool to find the regression model that accounts for both the trend and seasonal effects in the data. From the Excel output, the regression model that minimizes MSE for this time series is:
ŷt = 4.576 – 0.438Jan + 0.681Feb + 0.585Mar + 0.484Apr + 0.152May – 0.213June – 0.149July – 0.172Aug – 0.1608Sep – 0.099Oct – 0.059Nov + 0.0095t
c. Based on the model in part (b), the forecasts for the next three months are as follows:
Year 6, January forecast (in billion miles) = 4.576 – 0.438(1) + 0.681(0) + 0.585(0) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(61) = 4.714.
Year 6, February forecast (in billion miles) = 4.576 – 0.438(0) + 0.681(1) + 0.585(0) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(62) = 5.842.
Year 6, March forecast (in billion miles) = 4.576 – 0.438(0) + 0.681(0) + 0.585(1) + 0.484(0) + 0.152(0) – 0.213(0) – 0.149(0) – 0.172(0) – 0.1608(0) – 0.099(0) – 0.059(0) + 0.0095(63) = 5.756.

63. The monthly market shares of General Electric Company for 12 consecutive months follow. Construct a time series plot. What type of pattern exists in the data?

Month   Market Shares
1       23.39
2       23.56
3       23.02
4       23.03
5       23.60
6       23.37
7       23.21
8       23.40
9       23.31
10      23.94
11      23.39
12      23.50

ANSWER: The data appear to follow a horizontal pattern.

64. The monthly market shares of General Electric Company for 12 consecutive months follow. Develop three-month and four-month moving averages for this time series. Does the three-month or the four-month moving average provide the better forecasts based on MSE? Explain your reasoning.

Month   Market Shares
1       23.39
2       23.56
3       23.02
4       23.03
5       23.60
6       23.37
7       23.21
8       23.40
9       23.31
10      23.94
11      23.39
12      23.50

ANSWER:

                3-Month Moving                      4-Month Moving
       Market   Average                             Average
Month  Shares   Forecast        Error   (Error)²    Forecast        Error   (Error)²
1      23.39
2      23.56
3      23.02
4      23.03    22.32           0.71    0.5
5      23.60    22.83           –0.73   0.53        22.5            –0.4    0.16
6      23.37    22.72           0.65    0.43        22.65           0.72    0.53
7      23.21    22.83           0.38    0.14        22.88           0.33    0.11
8      23.40    22.89           1.71    2.91        22.93           1.67    2.8
9      23.89    23.73           –0.42   0.17        23.32           –0.01   0
10     23.94    23.71           0.23    0.05        23.62           0.32    0.1
11     23.39    23.95           2.1     4.41        23.77           2.29    5.22
12     23.50    24.43           2.22    4.91        24.48           2.18    4.73
Total                                   14.05                               13.65

MSE (3-Month) = 14.05/9 = 1.56
MSE (4-Month) = 13.65/8 = 1.71
The 3-month moving average provides the better forecasts because the MSE for the 3-month moving average is smaller.


Chapter 09 - Predictive Data Mining Multiple Choice 1. The set of recorded values of variables associated with a single entity is a(n) _____. a. observation b. data point c. classification d. location ANSWER: a 2. A(n) _____ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables. a. record b. data point c. classification d. location ANSWER: a 3. A characteristic or quantity of interest that can take on different values is a(n) _____. a. variable b. observation c. record d. quality ANSWER: a 4. Estimation methods are also referred to as _____. a. prediction methods b. clustering methods c. association methods d. supervised methods ANSWER: a 5. Data mining methods for classifying or estimating an outcome based on a set of input variables is referred to as _____. a. supervised learning b. unsupervised learning c. dimension reduction d. data sampling ANSWER: a 6. _____ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest. a. Supervised learning b. Unsupervised learning c. Dimension reduction d. Data sampling Copyright Cengage Learning. Powered by Cognero.

Page 1


Name:

Class:

Date:

  ANSWER: a

7. _____ is NOT a step of the data mining process.
  a. Data sampling
  b. Data partitioning
  c. Model construction
  d. Supervised learning
  ANSWER: d

8. _____ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.
  a. Data sampling
  b. Data partitioning
  c. Model construction
  d. Model assessment
  ANSWER: a

9. _____ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.
  a. Data sampling
  b. Data partitioning
  c. Data preparation
  d. Model assessment
  ANSWER: c

10. _____ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.
  a. Data sampling
  b. Data partitioning
  c. Data preparation
  d. Model assessment
  ANSWER: c

11. _____ involves descriptive statistics, data visualization, and clustering.
  a. Data exploration
  b. Data partitioning
  c. Data preparation
  d. Model assessment
  ANSWER: a

12. _____ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.
  a. Data sampling
  b. Data partitioning
  c. Data preparation
  d. Model assessment
  ANSWER: b

13. Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of _____.
  a. data exploration
  b. data partitioning
  c. data preparation
  d. model assessment
  ANSWER: a

14. Data used to build a data mining model is called _____.
  a. validation data
  b. training data
  c. test data
  d. exploration data
  ANSWER: b

15. Determine a freshman’s likely first-year grade point average from the student’s Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of _____.
  a. classification of a categorical outcome
  b. estimation of a continuous outcome
  c. prediction of a categorical outcome
  d. unsupervised learning
  ANSWER: b

16. The percent of misclassified records out of the total records in the validation data is known as the _____.
  a. overall error rate
  b. error
  c. accuracy
  d. class
  ANSWER: a

17. Classifying a record as belonging to one class when it belongs to another class is referred to as a(n) _____.
  a. overall error rate
  b. error
  c. accuracy
  d. class
  ANSWER: b

18. As we increase the cutoff value, _____ error will decrease and _____ error will rise.
  a. Class 0, Class 1
  b. Class 1, Class 0
  c. false, true
  d. None of these are correct.
  ANSWER: a
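
The trade-off in question 18 can be checked numerically. The sketch below is illustrative only: the actual classes and estimated probabilities are made up (hypothetical), and it assumes an observation is classified as Class 1 when its estimated probability is at least the cutoff.

```python
def class_error_rates(actual, prob_class1, cutoff):
    """Return (Class 1 error rate, Class 0 error rate) for a given cutoff.

    Assumption: an observation is classified as Class 1 when its
    estimated probability of Class 1 is >= cutoff.
    """
    predicted = [1 if p >= cutoff else 0 for p in prob_class1]
    n_class1 = sum(1 for a in actual if a == 1)
    n_class0 = sum(1 for a in actual if a == 0)
    # Class 1 error: actual Class 1 observations predicted as Class 0.
    false_neg = sum(1 for a, y in zip(actual, predicted) if a == 1 and y == 0)
    # Class 0 error: actual Class 0 observations predicted as Class 1.
    false_pos = sum(1 for a, y in zip(actual, predicted) if a == 0 and y == 1)
    return false_neg / n_class1, false_pos / n_class0


# Hypothetical actual classes and estimated probabilities (not from the text).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
prob = [0.9, 0.7, 0.45, 0.25, 0.8, 0.55, 0.35, 0.3, 0.15, 0.05]

for cutoff in (0.2, 0.4, 0.6):
    c1, c0 = class_error_rates(actual, prob, cutoff)
    print(f"cutoff={cutoff:.1f}  Class 1 error={c1:.2f}  Class 0 error={c0:.2f}")
```

On this made-up data, raising the cutoff lowers the Class 0 error rate and raises the Class 1 error rate, which matches answer a.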


19. _____ is one minus the Class 0 error rate.
  a. Sensitivity
  b. Specificity
  c. Accuracy
  d. Cutoff value
  ANSWER: b

20. Misclassifying an actual _____ observation as a(n) _____ observation is known as a false positive.
  a. Class 0, Class 1
  b. Class 1, Class 0
  c. error, accuracy
  d. false, true
  ANSWER: a

21. A(n) _____ matrix displays a model’s correct and incorrect classification.
  a. cumulative lift
  b. confusion
  c. decile-wise lift chart
  d. ROC curve
  ANSWER: b

22. _____ attempts to classify a categorical outcome as a linear function of explanatory variables.
  a. Linear regression
  b. Logistic regression
  c. Classification model
  d. Supervised learning
  ANSWER: b

23. How many Class 1's are correctly classified as Class 1 in the Table below?

  Confusion Matrix
                  Predicted Class
  Actual Class    1        0
  1               221      100
  0               30       3,000

  a. 221
  b. 100
  c. 30
  d. 3,000
  ANSWER: a

24. How many Class 1's are incorrectly classified as Class 0?

  Confusion Matrix
                  Predicted Class
  Actual Class    1        0
  1               221      100
  0               30       3,000

  a. 221
  b. 100
  c. 30
  d. 3,000
  ANSWER: b

25. _____ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability to the number of actual Class 1 observations identified if randomly classified.
  a. Cumulative lift
  b. Confusion
  c. Decile-wise lift chart
  d. ROC curve
  ANSWER: a

26. The y-axis of a decile chart shows _____.
  a. number of important class records identified
  b. ratio of decile mean to overall mean
  c. the number of actual Class 1 records
  d. the ratio of the overall mean to the decile mean
  ANSWER: b

27. The x-axis of a lift chart shows _____.
  a. the number of actual Class 1 records identified
  b. the ratio of decile mean to overall mean
  c. the number of actual Class 1 records
  d. the ratio of the overall mean to the decile mean
  ANSWER: a

28. Which of the following is a commonly used supervised learning method?
  a. k-means clustering
  b. k-nearest neighbors
  c. hierarchical clustering
  d. association rule development
  ANSWER: b

29. _____ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.
  a. Underfitting
  b. Overfitting
  c. Oversampling
  d. Undersampling
  ANSWER: b

30. A test set is the data set used to ______.
  a. build the data mining model
  b. estimate performance of candidate models on unseen data
  c. estimate performance of the final model on unseen data
  d. show counts of actual versus predicted class values
  ANSWER: c

31. One minus the overall error rate is often referred to as the _____ of the model.
  a. sensitivity
  b. accuracy
  c. specificity
  d. cutoff value
  ANSWER: b

32. An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a(n) _____.
  a. false negative
  b. false positive
  c. residual
  d. outlier
  ANSWER: b

33. Separate error rates with respect to the false negative and false positive cases are computed to take into account the _____.
  a. asymmetric costs in misclassification
  b. symmetric weights of these two cases
  c. distortions due to outliers
  d. effect of sampling error
  ANSWER: a

34. _____ is a generalization of linear regression for predicting a categorical outcome variable.
  a. Multiple linear regression
  b. Logistic regression
  c. Discriminant analysis
  d. Cluster analysis
  ANSWER: b

35. In the k-nearest neighbors method, when the value of k is set to 1 _____.
  a. the classification or prediction of a new observation is based solely on the single most similar observation from the training set
  b. the new observation’s class is naïvely assigned to the most common class in the training set
  c. the new observation’s prediction is used to estimate the anticipated error rate on future data over the entire training set
  d. the classification or prediction of a new observation is subject to the smallest possible classification error
  ANSWER: a

36. _____ is a measure of the heterogeneity of observations in a classification tree.
  a. Sensitivity
  b. Specificity
  c. Accuracy
  d. Impurity
  ANSWER: d

37. A _____ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.
  a. regression tree
  b. scatter chart
  c. classification tree
  d. confusion matrix
  ANSWER: c

38. The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for _____.
  a. regression trees
  b. time-series plots
  c. classification trees
  d. cumulative lift charts
  ANSWER: a

Subjective Short Answer

39. Given the following confusion matrix, what is the overall error rate?

  Confusion Matrix
                  Predicted Class
  Actual Class    1        0
  1               224      85
  0               28       3,258

  ANSWER: 0.03

40. Given the following confusion matrix, what is the accuracy?

  Confusion Matrix
                  Predicted Class
  Actual Class    1        0
  1               224      85
  0               28       3,258

  ANSWER: 0.9686

41. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets, setting the seed as 12345. Use the appropriate software to classify the data, setting k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. Using the Table below, for the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?

Age 47 26 38 37 44 55 44 30 63 34 52 55 52 63 51 41 37 46 30 48 50 56 35 39 48 51 27 57 33 58 46 32 56 35 47 50 57 38 52 56 47

Gender 0 1 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1

Work experience, Income (in $1000s), Family size, Personal loan (one customer per group of four values) 22 53 3 1 3 22 1 1 16 29 4 1 12 32 6 1 22 32 3 0 30 45 7 0 23 50 2 0 5 22 2 0 35 56 2 0 8 23 4 0 26 29 1 1 25 34 2 1 28 45 3 1 29 23 3 1 30 32 4 0 18 21 5 1 14 43 4 1 23 23 3 1 6 18 3 1 25 34 2 0 22 21 1 1 31 24 4 0 9 23 3 1 13 29 5 1 22 34 6 0 21 39 2 1 3 26 1 1 32 49 2 1 12 39 3 1 33 32 2 0 21 45 3 1 6 23 5 0 28 45 3 1 12 28 4 1 23 38 1 1 23 32 3 0 25 32 4 1 15 25 5 1 24 22 2 1 31 19 3 1 24 34 4 0

54 25 40 61 29 52 56 61 26 60 37 39 46 59 54 27 54 42 64 33 65 38 48 31 39 58 51 41 36 39 58 42 44 65 40 33 45 38 32 51 32 46 44

0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1

31 2 16 30 6 25 31 33 4 30 12 14 21 30 31 4 30 18 35 8 34 13 23 7 15 28 26 18 12 12 29 15 21 35 13 11 24 12 10 30 11 22 21

45 21 34 49 34 39 54 43 23 56 23 39 34 39 28 22 45 36 46 32 36 32 26 32 35 45 23 34 32 22 34 45 32 54 23 41 29 24 28 43 34 39 25

2 1 6 2 1 3 2 2 2 3 5 4 5 2 1 1 2 4 2 6 1 4 5 3 5 2 3 3 2 4 2 1 3 2 5 4 5 4 3 2 1 4 3

1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0

41 54 26 33 45 63 55 49 64 26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39 50 27 29 61 57 56 35 25 57 47 60 32 27 39 26

1 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0

15 29 3 8 20 30 31 25 35 3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16 25 4 6 30 27 34 6 3 27 25 35 8 4 18 1

28 54 24 26 34 54 49 34 54 19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43 56 32 23 45 34 54 45 32 36 54 33 45 32 39 22

2 4 2 3 4 2 3 5 2 1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6 3 1 2 1 2 4 5 2 2 4 3 2 1 4 1

1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 1 0 1 0 1 0

46 28 64 65 47 27 25 25 64 44 65 54 51 51 28 56 57 35 47 54 28 45 43

1 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1

25 4 37 33 25 5 3 2 30 21 25 31 26 21 4 32 26 9 21 27 5 22 18

34 23 43 49 37 28 34 24 53 48 47 55 43 46 25 55 49 27 54 45 29 45 43

5 2 4 2 4 2 1 1 2 5 2 1 3 1 2 3 2 6 4 2 3 5 2

0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1

  ANSWER: The overall error rate is minimized at k = 2.

42. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. In the data table for the dummy variable Gender, 0 represents Male and 1 represents Female. And for the dummy variable Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan.

Age   Gender   Work experience   Income (in $1000s)   Family size   Personal loan
47    0        22                53                   3             1
26    1        3                 22                   1             1
38    0        16                29                   4             1
37    0        12                32                   6             1
44    0        22                32                   3             0
55    1        30                45                   7             0
44    1        23                50                   2             0
30    1        5                 22                   2             0
63    0        35                56                   2             0
34    1        8                 23                   4             0
52    0        26                29                   1             1
55    0        25                34                   2             1
52    0        28                45                   3             1

63 51 41 37 46 30 48 50 56 35 39 48 51 27 57 33 58 46 32 56 35 47 50 57 38 52 56 47 54 25 40 61 29 52 56 61 26 60 37 39 46 59 54 27 54 42 64 33 65 38 48

1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1

29 30 18 14 23 6 25 22 31 9 13 22 21 3 32 12 33 21 6 28 12 23 23 25 15 24 31 24 31 2 16 30 6 25 31 33 4 30 12 14 21 30 31 4 30 18 35 8 34 13 23

23 32 21 43 23 18 34 21 24 23 29 34 39 26 49 39 32 45 23 45 28 38 32 32 25 22 19 34 45 21 34 49 34 39 54 43 23 56 23 39 34 39 28 22 45 36 46 32 36 32 26

3 4 5 4 3 3 2 1 4 3 5 6 2 1 2 3 2 3 5 3 4 1 3 4 5 2 3 4 2 1 6 2 1 3 2 2 2 3 5 4 5 2 1 1 2 4 2 6 1 4 5

1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0

31 39 58 51 41 36 39 58 42 44 65 40 33 45 38 32 51 32 46 44 41 54 26 33 45 63 55 49 64 26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39 50 27 29

1 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0

7 15 28 26 18 12 12 29 15 21 35 13 11 24 12 10 30 11 22 21 15 29 3 8 20 30 31 25 35 3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16 25 4 6

32 35 45 23 34 32 22 34 45 32 54 23 41 29 24 28 43 34 39 25 28 54 24 26 34 54 49 34 54 19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43 56 32 23

3 5 2 3 3 2 4 2 1 3 2 5 4 5 4 3 2 1 4 3 2 4 2 3 4 2 3 5 2 1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6 3 1 2

0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1

61 57 56 35 25 57 47 60 32 27 39 26 46 28 64 65 47 27 25 25 64 44 65 54 51 51 28 56 57 35 47 54 28 45 43

0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1

30 27 34 6 3 27 25 35 8 4 18 1 25 4 37 33 25 5 3 2 30 21 25 31 26 21 4 32 26 9 21 27 5 22 18

45 34 54 45 32 36 54 33 45 32 39 22 34 23 43 49 37 28 34 24 53 48 47 55 43 46 25 55 49 27 54 45 29 45 43

1 2 4 5 2 2 4 3 2 1 4 1 5 2 4 2 4 2 1 1 2 5 2 1 3 1 2 3 2 6 4 2 3 5 2

0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1

ANSWER:
a. The rules of the best pruned tree can be distilled to characterize a customer who has taken the personal loan as:
  i. Age < 39.5 years, Female, and Income > $24,000, OR
  ii. Age between 46.5 and 57.5 years and Family size < 3, OR
  iii. Age between 46.5 and 57.5 years, Family size > 2, and Income > $49,000
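
The three rules in this answer can be written as a single yes/no check. The sketch below is only an illustration of how the rules combine: the function name and argument order are hypothetical, and treating "between 46.5 and 57.5" as excluding the endpoints is an assumption (it makes no difference for whole-number ages).

```python
def predicts_personal_loan(age, gender, income_thousands, family_size):
    """Apply the three distilled rules from the answer above.

    gender: 0 = Male, 1 = Female; income_thousands: income in $1000s.
    Assumption: "between 46.5 and 57.5" excludes both endpoints.
    """
    mid_age = 46.5 < age < 57.5
    rule_i = age < 39.5 and gender == 1 and income_thousands > 24
    rule_ii = mid_age and family_size < 3
    rule_iii = mid_age and family_size > 2 and income_thousands > 49
    return rule_i or rule_ii or rule_iii


# First two customers in the data table (Age, Gender, Income, Family size).
print(predicts_personal_loan(47, 0, 53, 3))  # rule iii applies
print(predicts_personal_loan(26, 1, 22, 1))  # no rule applies (income too low)
```

The second customer actually took a personal loan, so the distilled rules do not classify every observation correctly.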

43. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. In the data table for the dummy variable Gender, 0 represents Male and 1 represents Female. And for the dummy variable Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan.

Age   Gender   Work experience   Income (in $1000s)   Family size   Personal loan
47    0        22                53                   3             1
26    1        3                 22                   1             1
38    0        16                29                   4             1
37    0        12                32                   6             1
44    0        22                32                   3             0
55    1        30                45                   7             0
44    1        23                50                   2             0
30    1        5                 22                   2             0
63    0        35                56                   2             0
34    1        8                 23                   4             0
52    0        26                29                   1             1
55    0        25                34                   2             1

52 63 51 41 37 46 30 48 50 56 35 39 48 51 27 57 33 58 46 32 56 35 47 50 57 38 52 56 47 54 25 40 61 29 52 56 61 26 60 37 39 46 59 54 27 54 42 64 33 65 38

0 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 1 0 0

28 29 30 18 14 23 6 25 22 31 9 13 22 21 3 32 12 33 21 6 28 12 23 23 25 15 24 31 24 31 2 16 30 6 25 31 33 4 30 12 14 21 30 31 4 30 18 35 8 34 13

45 23 32 21 43 23 18 34 21 24 23 29 34 39 26 49 39 32 45 23 45 28 38 32 32 25 22 19 34 45 21 34 49 34 39 54 43 23 56 23 39 34 39 28 22 45 36 46 32 36 32

3 3 4 5 4 3 3 2 1 4 3 5 6 2 1 2 3 2 3 5 3 4 1 3 4 5 2 3 4 2 1 6 2 1 3 2 2 2 3 5 4 5 2 1 1 2 4 2 6 1 4

1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0

48 31 39 58 51 41 36 39 58 42 44 65 40 33 45 38 32 51 32 46 44 41 54 26 33 45 63 55 49 64 26 42 48 64 52 41 40 40 55 49 46 59 51 45 39 55 63 53 39 50 27

1 1 0 1 0 0 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0

23 7 15 28 26 18 12 12 29 15 21 35 13 11 24 12 10 30 11 22 21 15 29 3 8 20 30 31 25 35 3 19 22 34 28 16 14 16 32 27 21 32 26 23 17 32 35 27 16 25 4

26 32 35 45 23 34 32 22 34 45 32 54 23 41 29 24 28 43 34 39 25 28 54 24 26 34 54 49 34 54 19 23 43 45 32 34 34 26 37 39 29 56 23 43 26 37 43 24 43 56 32

5 3 5 2 3 3 2 4 2 1 3 2 5 4 5 4 3 2 1 4 3 2 4 2 3 4 2 3 5 2 1 1 3 2 4 3 5 4 2 3 1 2 3 4 4 5 2 1 6 3 1

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 1

29 61 57 56 35 25 57 47 60 32 27 39 26 46 28 64 65 47 27 25 25 64 44 65 54 51 51 28 56 57 35 47 54 28 45 43

0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1

6 30 27 34 6 3 27 25 35 8 4 18 1 25 4 37 33 25 5 3 2 30 21 25 31 26 21 4 32 26 9 21 27 5 22 18

23 45 34 54 45 32 36 54 33 45 32 39 22 34 23 43 49 37 28 34 24 53 48 47 55 43 46 25 55 49 27 54 45 29 45 43

2 1 2 4 5 2 2 4 3 2 1 4 1 5 2 4 2 4 2 1 1 2 5 2 1 3 1 2 3 2 6 4 2 3 5 2

1 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Classify the data using k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data.
a. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data? Explain the difference in the overall error rate on the training, validation, and test data.
b. Examine the decile-wise lift chart on the test data. Identify and interpret the first decile lift.
c. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, what are the corresponding Class 1 error rates and Class 0 error rates on the validation data?
ANSWER:
a. The overall error rate is minimized at k = 2. The overall error rate for the training, validation, and test sets is 26.67%, 42.22%, and 43.33%, respectively. The overall error rate is the lowest on the training data since a training set observation’s set of k nearest neighbors will always include itself, artificially lowering the error rate. For k = 2, the overall error rate on the validation data is biased downward since it is the lowest error rate over all values of k. Thus, applying k = 2 on the test data will typically result in a larger (and more representative) overall error rate because we are not using the test data to find the best value of k.
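
The point that a training observation counts itself among its own nearest neighbors can be seen in a small sketch. Everything below is hypothetical: the two-variable data are made up and are not the bank data, the helper names are illustrative, and k = 1 is used (rather than the k = 2 of the answer) because it makes the effect exact.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority class among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def overall_error_rate(train_X, train_y, X, y, k):
    """Share of observations in (X, y) misclassified by k-NN built on the training set."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != label
                for x, label in zip(X, y))
    return wrong / len(X)


# Hypothetical data: seven training points and three held-out points.
train_X = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6), (3, 3)]
train_y = [0, 0, 0, 1, 1, 1, 1]
holdout_X = [(2, 2), (4, 4), (3, 2.5)]
holdout_y = [0, 1, 0]

training_error = overall_error_rate(train_X, train_y, train_X, train_y, k=1)
holdout_error = overall_error_rate(train_X, train_y, holdout_X, holdout_y, k=1)
print(training_error, holdout_error)
```

With k = 1 every training point is its own nearest neighbor, so the training error is zero regardless of how well the method generalizes, while the held-out points can still be misclassified.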

b. The first decile lift is 1.14. For this test data set of 30 observations and 21 actual customers who have taken the personal loan, if we randomly select 3 customers, on average 2.1 of them would have taken the personal loan. However, if we use k-NN with k = 2 to identify the top 3 observations most likely to have a personal loan, then (2.1)(1.14) ≈ 2.4 of them would have taken the personal loan. This can be confirmed from the Detailed Scoring report on the test data by observing that there are 5 observations with a predicted probability of 100% of taking a personal loan, but only 4 of these actually took a loan. Thus, of the top 3 observations recommended by k-NN with k = 2, there would be on average (3)(4/5) = 2.4 that actually took the loan.
c. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, the corresponding Class 1 error rates and Class 0 error rates on the validation data are as below:

Cutoff Value   Class 1 Error Rate   Class 0 Error Rate
0.5            37.04%               50.22%
0.4            37.04%               50.22%
0.3            37.04%               50.22%
0.2            37.04%               50.22%
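
The first-decile arithmetic in part b can be reproduced in a few lines. This is a sketch under one stated assumption: the group of observations tied at the highest predicted probability is at least as large as the decile, so the top decile is treated as a random draw from that tied group. The function name and arguments are hypothetical.

```python
def first_decile_lift(n_obs, n_class1, n_tied_at_top, n_tied_class1):
    """Return (lift, expected Class 1 count at random, expected Class 1 count with the model).

    Assumption: n_tied_at_top >= n_obs / 10, so the first decile is a
    random draw from the observations tied at the top probability.
    """
    decile_size = n_obs / 10
    expected_random = decile_size * n_class1 / n_obs
    expected_model = decile_size * n_tied_class1 / n_tied_at_top
    return expected_model / expected_random, expected_random, expected_model


# Values from part b: 30 test observations, 21 actual Class 1,
# 5 observations tied at 100% predicted probability, 4 of them Class 1.
lift, expected_random, expected_model = first_decile_lift(30, 21, 5, 4)
print(round(lift, 2), round(expected_random, 1), round(expected_model, 1))
```

The ratio 2.4 / 2.1 rounds to the reported lift of 1.14. The same calculation with 40 observations, 6 actual Class 1, and 13 tied observations of which 3 are Class 1 gives about 1.54.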

44. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below: Average Balance 1,222.3 6,291.0 1,051.0 1,118.3 1,176.8 1,052.0 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729.0 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6

Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59

Gender Married Divorced 1 0 0 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 1 1 0 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 1 0 1 1 0 1 1 1 1 0 0 1 1 0 1 1 0

Family Size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4

Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0

1,310.7 1,144.0 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647.0 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1 1,023.0 1,083.2 1,158.6 1,052.0 592.2 6,834.4 1,505.7 1,170.0 1,509.6 1,061.0 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060.0 1,119.6 1,135.0 2,777.1 1,535.6 352.0 1,605.8 5,737.2 3,354.3 10,096.1 9,164.0 6,796.7 2,108.9 265.2 1,097.0 1,041.0 1,224.9

34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51

1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0

1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1

0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1

0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0

1,557.7 3,202.2 1,173.0 1,794.3 2,423.5 171.8 12,157.9 4,107.0 887.9 1,165.1 643.5 1,529.1 2,142.7 1,035.0 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663.0 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130.0 1,040.3 1,595.4 1,144.0 1,582.4 1,049.0 1,577.2 561.0 3,349.1 1,704.6 1,245.7 16,191.8

44 57 39 41 47 52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54

1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0

1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1

1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0

3 2 3 1 2 3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3

0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0

2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152.0 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351.0 1,507.0 1,050.7 1,657.8 1,115.0 245.9 1,058.5 1,377.0 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083.0 1,556.4 1,080.6 1,457.6 1,478.4 1,690.3 1,458.9 1,465.4 1,002.6 1,728.0 1,015.6 1,163.8 1,299.0 1,400.4

37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46 51 34 31 48 59 38

0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1

1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1

3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3 1 1 2 1 2 1

0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182.0 1,133.0 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195.0

27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

Using appropriate software, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan default as the output variable and all the other variables as input variables. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data.
a. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?
b. What is the overall error rate on the test data? Interpret this measure.
c. What are the Class 1 error rate and the Class 0 error rate on the test data?
d. Compute and interpret the sensitivity and specificity for the test data.
e. Examine the decile-wise lift chart on the test data. What is the first decile lift on the test data? Interpret this value.
ANSWER:
a. A value of k = 1 minimizes the overall error rate on the validation set.

b. The overall error rate on the test data is 32.50%. For a randomly selected observation from the test data, k-NN with k = 1 will classify it incorrectly 32.50% of the time.

c. The Class 1 error rate is 50.00% and the Class 0 error rate is 29.41% for the test data.
d. Sensitivity = 1 – Class 1 error rate = 50.00%. This means that the model can correctly identify 50% of the customers who had defaulted on their loan in the test data. Specificity = 1 – Class 0 error rate = 70.59%. This means that the model can correctly identify 70.59% of the customers who had not defaulted on their loan in the test data.
e. The first decile lift is 1.54. For this test data set of 40 customers and 6 actual customers who have defaulted on their loan, if we randomly selected 4 customers, on average 0.6 of the customers would have defaulted on their loan. However, if we use k-NN with k = 1 to identify the top 4 customers, then (0.6)(1.54) ≈ 0.92 customers would have defaulted on their loan. This can be confirmed from the Detailed Scoring report on the test data by observing that there are 13 observations with a predicted probability of 100% of defaulting on their loan, but only 3 of these actually defaulted. Thus, of the top 4 observations recommended by k-NN with k = 1, there would be on average (4)(3/13) ≈ 0.92 that actually defaulted.

45. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below:

Average Balance 1,222.3 6,291.0 1,051.0 1,118.3 1,176.8 1,052.0 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729.0 1,397.8 1,464.1 40.3 1,296.4 2,142.7

Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58

Gender Married Divorced 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0

0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1

0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0

Family Size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4

Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0

2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144.0 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647.0 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1 1,023.0 1,083.2 1,158.6 1,052.0 592.2 6,834.4 1,505.7 1,170.0 1,509.6 1,061.0 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060.0 1,119.6 1,135.0 2,777.1 1,535.6

32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34

1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0

1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1

0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0

2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0

352.0 1,605.8 5,737.2 3,354.3 10,096.1 9,164.0 6,796.7 2,108.9 265.2 1,097.0 1,041.0 1,224.9 1,557.7 3,202.2 1,173.0 1,794.3 2,423.5 171.8 12,157.9 4,107.0 887.9 1,165.1 643.5 1,529.1 2,142.7 1,035.0 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663.0 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6

59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51

0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0

1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0

5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2

1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0

1,130.0 1,040.3 1,595.4 1,144.0 1,582.4 1,049.0 1,577.2 561.0 3,349.1 1,704.6 1,245.7 16,191.8 2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152.0 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351.0 1,507.0 1,050.7 1,657.8 1,115.0 245.9 1,058.5 1,377.0 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083.0 1,556.4

41 47 50 34 52 53 39 40 46 45 35 54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49

0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1

1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0

1 2 3 1 3 1 2 3 2 3 2 3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3

0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

1,080.6 1,457.6 1,478.4 1,690.3 1,458.9 1,465.4 1,002.6 1,728.0 1,015.6 1,163.8 1,299.0 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182.0 1,133.0 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195.0

41 33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

3 2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

Using appropriate software, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Fit a classification tree using Loan Default as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Classification Tree procedure, be sure to Normalize input data, and set the Minimum #records in a terminal node to 1. In Step 3 of XLMiner’s Classification Tree procedure, set the maximum number of levels to 7. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and test data. a. Why is partitioning with oversampling advised in this case? b. Interpret the set of rules implied by the best pruned tree that characterize loan defaulters. c. For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best pruned tree on the test data? d. Examine the decile-wise lift chart for the best pruned tree on the test data. What is the first decile lift? Interpret this value. ANSWER:

a. Loan default observations make up only 15.5% of the data set. By oversampling the Loan default observations in the training set, a data mining algorithm can better learn how to classify them.

b. Customers who default on their loan are characterized by either of the following rules:
i. The average balance is less than $951.75, OR
ii. The average balance is less than $1,216.30 and the family size is greater than 2.

c. The overall error rate is 10%. The class 1 error rate is 50% and the class 0 error rate is 2.94%.
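The three rates in part c follow from the confusion counts on the test data. A minimal Python sketch, assuming the 40-customer test set with 6 actual defaulters described in part d, and assuming (inferred from the reported percentages, not read from XLMiner output) that 3 defaulters and 1 non-defaulter were misclassified:

```python
# Counts are inferred from the reported rates, not taken from XLMiner output.
n_class1 = 6          # actual defaulters in the test set (from part d)
n_class0 = 40 - 6     # actual non-defaulters in the test set
class1_errors = 3     # assumed: defaulters classified as non-defaulters
class0_errors = 1     # assumed: non-defaulters classified as defaulters

class1_error_rate = class1_errors / n_class1
class0_error_rate = class0_errors / n_class0
overall_error_rate = (class1_errors + class0_errors) / (n_class1 + n_class0)

print(f"Class 1 error rate: {class1_error_rate:.2%}")
print(f"Class 0 error rate: {class0_error_rate:.2%}")
print(f"Overall error rate: {overall_error_rate:.2%}")
```

The overall rate is low even though half of the defaulters are missed, because Class 0 observations dominate the test set.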

d. The first decile lift is 5. For this test data set of 40 customers and 6 actual customers who have defaulted on their loan, if we randomly selected 4 customers, on average 0.6 customers would have defaulted on their loan. However, if we use the classification tree to identify the top 4 customers, then (0.6)(5) = 3 of the customers would have defaulted on their loan. This can be confirmed from the Detailed Scoring report by observing that among the top 4 observations in the test set rated by the best pruned tree to be most likely to default, 3 actually defaulted.

46. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below:

Average Balance 1,222.3 6,291.0 1,051.0 1,118.3 1,176.8 1,052.0 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729.0 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144.0 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647.0

Age 36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54

Gender 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1

Married 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1

Divorced 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1

Family Size 1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3

Loan Default 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1


Chapter 09 - Predictive Data Mining 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1 1,023.0 1,083.2 1,158.6 1,052.0 592.2 6,834.4 1,505.7 1,170.0 1,509.6 1,061.0 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060.0 1,119.6 1,135.0 2,777.1 1,535.6 352.0 1,605.8 5,737.2 3,354.3 10,096.1 9,164.0 6,796.7 2,108.9 265.2 1097.0 1,041.0 1,224.9 1,557.7 3,202.2 1,173.0 1,794.3 2,423.5 171.8 1,2157.9 4,107.0 887.9 1,165.1 643.5 1,529.1

58 38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44 50 45

1 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1

1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1

0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0

4 2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3 1 1

0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 Page 33


Chapter 09 - Predictive Data Mining 2,142.7 1,035.0 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663.0 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130.0 1,040.3 1,595.4 1,144.0 1,582.4 1,049.0 1,577.2 561.0 3,349.1 1,704.6 1,245.7 16,191.8 2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152.0 1,219.7 1,235.3 1,811.2

59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37 40 34 50 49 53 49 40 34 49 50 30

0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 1 1

1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3 1 2 1 1 3 3 1 1 2 3 1

0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 34


Chapter 09 - Predictive Data Mining 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351.0 1,507.0 1,050.7 1,657.8 1,115.0 245.9 1,058.5 1,377.0 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083.0 1,556.4 1,080.6 1,457.6 1,478.4 1,690.3 1,458.9 1,465.4 1,002.6 1,728.0 1,015.6 1,163.8 1,299.0 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182.0 1,133.0 1,629.2 1,830.7

28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36

0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0

0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1

0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0

2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3

1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 35


1,137.8 2,011.4 170.3 1,135.2 195.0

36 56 36 39 29

1 0 0 1 1

1 1 1 1 1

0 1 1 0 0

3 3 1 3 1

0 0 1 1 1

In XLMiner’s Partition with Oversampling procedure, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Construct a logistic regression model using Loan default as the output variable and all the other variables as input variables. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2. Generate lift charts for both the validation data and test data. a. From the generated set of logistic regression models, select one that is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. Do the relationships suggested by the model make sense? Try to explain them. b. Using the default cutoff value of 0.5 for your logistic regression model, what is the overall error rate on the test data? c. Examine the decile-wise lift chart for your model on the test data. What is the first decile lift? Interpret this value. ANSWER:

a. Using Mallows’ Cp statistic to guide the selection, we see that the model with 2 independent variables is a viable candidate. We select this model with 2 variables (3 coefficients including the intercept).

Resulting model: log odds of event (Loan default) = 1.556 – 0.004(Average Balance) + 1.569(Family size)

As the average balance in the account increases, the chances of a customer defaulting on the loan decrease, and as the family size increases, the chances of a customer defaulting on the loan increase.

b. The overall error rate is 22.50%.
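To see how the fitted equation turns into a predicted probability and a classification at the 0.5 cutoff, here is a minimal Python sketch. The coefficients come from the resulting model above; the function name and the two example customers are hypothetical and chosen only for illustration.

```python
import math

def default_probability(average_balance, family_size):
    """Convert the fitted log odds into a probability of loan default."""
    log_odds = 1.556 - 0.004 * average_balance + 1.569 * family_size
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical customers (not rows from the data set).
for balance, family in [(500.0, 3), (3000.0, 1)]:
    p = default_probability(balance, family)
    label = 1 if p >= 0.5 else 0  # cutoff value of 0.5
    print(f"balance={balance:.0f}, family size={family}: p={p:.3f}, class={label}")
```

The negative coefficient on Average Balance lowers the probability as the balance rises, and the positive coefficient on Family size raises it, which matches the interpretation given in the answer.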

c. The first decile lift is 5. For this test data set of 40 customers and 6 actual customers who have defaulted on their loan, if we randomly selected 4 customers, on average 0.6 customers would have defaulted on their loan. However, if we use the logistic regression model to identify the top 4 customers, then (0.6)(5) = 3 of the customers would have defaulted on their loan.

47. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below:

Land Value ($)  Building Value ($)  Acres  Baths  Toilets  Fireplaces  Bedrooms  Age  Sale Price ($)
18,100  92,500  0.5  1  1  1  4  53.9  114,885
23,600  152,700  0.22  2  1  1  3  19.7  180,895
25,900  134,300  0.3  2  1  1  3  15.9  162,038
22,100  129,600  0.23  2  2  1  4  41  154,496
23,900  168,700  0.32  2  1  1  4  39.9  196,973
22,400  118,300  0.25  2  1  1  3  41.8  145,075
24,100  123,300  0.26  1  1  1  4  70.9  151,480
26,300  133,800  0.26  2  1  1  3  37.8  164,762
24,900  139,400  0.24  2  1  1  4  33  166,528
13,600  87,200  0.17  1  1  0  3  34.7  105,762
36,100  210,400  0.6  2  1  2  2  52.9  250,170
19,500  101,300  0.16  1  1  1  2  67.8  125,082
38,800  224,700  0.44  2  1  1  4  21.7  265,066
23,500  139,000  0.22  1  1  0  3  10.8  166,697
26,300  164,200  0.35  2  1  0  3  3.9  194,881
21,900  122,400  0.17  2  1  1  3  15.7  146,818
23,400  149,600  0.22  2  1  1  3  15.7  176,048
15,000  102,200  0.12  1  1  0  3  97.8  119,584
15,000  102,200  0.12  1  1  0  3  97.7  121,759
9,200  22,000  0.17  1  1  0  4  120.9  34,947
9,200  22,000  0.17  1  1  0  4  120.9  35,214
5,600  48,000  0.12  1  1  0  3  103.9  57,142
9,000  58,800  0.24  1  1  0  3  88  72,192


Chapter 09 - Predictive Data Mining 21,000 23,500 36,000 23,700 22,000 19,900 22,100 24,600 21,500 15,000 15,700 14,200 10,700 16,600 25,500 15,100 7,400 28,500 25,100 50,100 83,300 124,500 47,000 64,600 33,900 41,100 29,100 56,400 45,400 23,800 52,800 25,100 27,200 28,100 28,800 33,400 20,700 25,600 25,800 29,300 26,000 25,900 32,800 31,100 25,800 27,200 25,000 29,200

109,600 165,900 262,500 114,900 102,700 95,800 116,300 165,500 113,400 81,100 129,200 81,600 49,700 72,700 110,700 74,300 55,500 129,400 83,900 164,600 276,000 552,300 214,400 185,000 138,800 156,300 96,400 256,400 219,200 92,100 172,800 99,200 152,600 102,900 98,800 103,900 95,600 101,900 110,700 147,700 116,000 73,500 125,000 166,800 105,300 94,800 105,900 117,500

0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23 0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2

2 2 3 1 1 2 1 1 1 1 2 1 1 1 1 1 1 1 1 2 3 4 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1

0 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 0 0 0 1 2 2 1 1 1 0 1 1 1 2 0 1 1 0 1 1 0 0 1 1 0 1 2 0 0 1 1

3 4 4 4 3 2 3 4 3 2 3 3 2 4 3 3 2 4 3 3 2 4 4 4 4 3 1 3 3 4 2 3 3 3 3 4 3 2 3 4 3 2 3 2 3 3 3 3

36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44 47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8

133,848 194,079 300,407 141,700 128,866 119,189 141,018 193,661 137,308 99,817 148,909 100,701 65,082 92,614 137,889 91,180 64,119 160,139 113,043 217,684 360,936 679,795 264,115 254,075 173,987 200,251 130,214 316,874 267,672 119,769 229,499 128,456 181,102 132,977 131,411 139,697 120,046 131,026 141,202 181,575 144,513 100,953 160,546 199,970 134,647 124,311 133,543 151,392 Page 38


Chapter 09 - Predictive Data Mining 30,000 20,400 23,600 16,200 29,300 27,000 25,600 46,200 22,900 27,100 30,700 29,100 34,700 20,000 35,700 35,100 33,700 33,700 36,400 33,200 39,200 33,100 16,000 24,900 22,000 20,000 33,900 22,100 22,800 24,700 38,700 25,800 31,700 82,200 19,500 24,400 22,500 25,900 22,700 21,200 34,000 18,900 33,900 23,800 23,900 18,500 36,300 47,300

93,300 112,000 83,400 85,800 123,900 97,800 86,300 220,500 160,000 105,200 107,100 102,400 150,400 80,400 159,400 161,500 162,500 162,500 176,100 122,300 169,200 180,100 98,400 63,800 121,300 107,600 230,800 153,800 111,100 117,800 118,700 108,000 140,500 171,700 147,600 132,000 119,800 117,100 95,000 56,700 163,800 118,000 151,600 133,500 119,000 110,500 122,500 298,800

0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2 0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36

1 2 1 1 1 1 1 2 3 1 1 2 1 1 2 2 2 2 2 2 2 2 1 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 1 2 2 2 2 1 3

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

0 1 0 1 1 0 0 1 1 0 0 0 2 0 1 1 1 1 1 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 2 2 1

2 3 2 2 4 3 3 4 3 3 3 2 3 3 4 3 4 4 4 3 3 4 4 2 4 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 4 3 3 3 3 4 3 4

55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9 5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4

124,476 136,599 110,399 105,027 157,819 129,675 115,952 268,552 187,870 135,549 142,738 135,284 189,790 105,302 196,936 201,349 198,580 200,228 215,634 157,208 212,662 217,543 118,491 91,539 147,802 131,948 268,444 180,464 137,326 145,115 159,644 135,049 174,475 257,467 169,311 157,570 143,676 146,960 121,175 81,869 199,361 139,981 186,637 161,123 146,054 130,575 162,270 348,138 Page 39


Land Value ($)  Building Value ($)  Acres  Baths  Toilets  Fireplaces  Bedrooms  Age  Sale Price ($)
36,600  238,700  0.28  2  1  2  3  25.5  278,839

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using k-nearest neighbors with up to k = 10. Use Sale Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-Nearest Neighbors Prediction procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate a Detailed Scoring report for all three sets of data. a. What value of k minimizes the root mean squared error (RMSE) on the validation data? b. What is the RMSE on the validation data and test data? c. What is the average error on the validation data and test data? What does this suggest? ANSWER: a. A value of k = 2 minimizes the RMSE on validation data.

b. The RMSE on the validation set is $22,873.11, and the RMSE on the test data is $27,987.05.

c. The average error of –778.37 on the validation data suggests a slight tendency to overestimate the output variable in the validation data. The average error of 8,324.75 on the test data suggests a tendency to underestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions.

48. To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below:

Land Value ($) 18,100 23,600 25,900 22,100 23,900 22,400 24,100 26,300 24,900 13,600 36,100 19,500 38,800 23,500 26,300 21,900 23,400 15,000 15,000 9,200 9,200 5,600 9,000 21,000 23,500 36,000 23,700 22,000 19,900 22,100 24,600 21,500 15,000 15,700 14,200 10,700 16,600 25,500 15,100 7,400 28,500 25,100 50,100

Building Value ($) 92,500 152,700 134,300 129,600 168,700 118,300 123,300 133,800 139,400 87,200 210,400 101,300 224,700 139,000 164,200 122,400 149,600 102,200 102,200 22,000 22,000 48,000 58,800 109,600 165,900 262,500 114,900 102,700 95,800 116,300 165,500 113,400 81,100 129,200 81,600 49,700 72,700 110,700 74,300 55,500 129,400 83,900 164,600

Acres 0.5 0.22 0.3 0.23 0.32 0.25 0.26 0.26 0.24 0.17 0.6 0.16 0.44 0.22 0.35 0.17 0.22 0.12 0.12 0.17 0.17 0.12 0.24 0.21 0.15 0.22 0.22 0.2 0.23 0.18 0.29 0.17 0.16 0.23 0.15 0.15 0.18 0.21 0.23 0.15 0.25 0.2 0.23

Baths Toilets Fireplaces Bedrooms 1 1 1 4 2 1 1 3 2 1 1 3 2 2 1 4 2 1 1 4 2 1 1 3 1 1 1 4 2 1 1 3 2 1 1 4 1 1 0 3 2 1 2 2 1 1 1 2 2 1 1 4 1 1 0 3 2 1 0 3 2 1 1 3 2 1 1 3 1 1 0 3 1 1 0 3 1 1 0 4 1 1 0 4 1 1 0 3 1 1 0 3 2 1 0 3 2 1 1 4 3 1 1 4 1 1 0 4 1 1 0 3 2 1 1 2 1 1 0 3 1 1 1 4 1 1 1 3 1 1 0 2 2 1 0 3 1 1 0 3 1 1 0 2 1 1 1 4 1 1 1 3 1 1 1 3 1 1 0 2 1 1 0 4 1 1 0 3 2 1 0 3

Age 53.9 19.7 15.9 41 39.9 41.8 70.9 37.8 33 34.7 52.9 67.8 21.7 10.8 3.9 15.7 15.7 97.8 97.7 120.9 120.9 103.9 88 36.7 5.7 2.9 37.7 48.9 78.9 30.8 43 44.9 62.9 46.7 57.9 99.8 91.8 48 71.8 96.8 49.9 45.8 44

Sale Price ($) 114,885 180,895 162,038 154,496 196,973 145,075 151,480 164,762 166,528 105,762 250,170 125,082 265,066 166,697 194,881 146,818 176,048 119,584 121,759 34,947 35,214 57,142 72,192 133,848 194,079 300,407 141,700 128,866 119,189 141,018 193,661 137,308 99,817 148,909 100,701 65,082 92,614 137,889 91,180 64,119 160,139 113,043 217,684 Page 41


Chapter 09 - Predictive Data Mining 83,300 124,500 47,000 64,600 33,900 41,100 29,100 56,400 45,400 23,800 52,800 25,100 27,200 28,100 28,800 33,400 20,700 25,600 25,800 29,300 26,000 25,900 32,800 31,100 25,800 27,200 25,000 29,200 30,000 20,400 23,600 16,200 29,300 27,000 25,600 46,200 22,900 27,100 30,700 29,100 34,700 20,000 35,700 35,100 33,700 33,700 36,400 33,200

276,000 552,300 214,400 185,000 138,800 156,300 96,400 256,400 219,200 92,100 172,800 99,200 152,600 102,900 98,800 103,900 95,600 101,900 110,700 147,700 116,000 73,500 125,000 166,800 105,300 94,800 105,900 117,500 93,300 112,000 83,400 85,800 123,900 97,800 86,300 220,500 160,000 105,200 107,100 102,400 150,400 80,400 159,400 161,500 162,500 162,500 176,100 122,300

0.61 1.05 0.22 0.58 0.22 0.18 0.28 0.4 0.21 0.15 0.27 0.19 0.18 0.18 0.19 0.45 0.14 0.2 0.18 0.2 0.18 0.16 0.35 0.2 0.17 0.17 0.16 0.2 0.26 0.13 0.16 0.1 0.22 0.18 0.16 0.57 0.15 0.21 0.3 0.23 0.28 0.24 0.28 0.25 0.21 0.21 0.29 0.2

3 4 2 2 1 1 1 1 1 1 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 2 1 1 2 2 2 2 2 2

1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 2 2 1 1 1 0 1 1 1 2 0 1 1 0 1 1 0 0 1 1 0 1 2 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 2 0 1 1 1 1 1 0

2 4 4 4 4 3 1 3 3 4 2 3 3 3 3 4 3 2 3 4 3 2 3 2 3 3 3 3 2 3 2 2 4 3 3 4 3 3 3 2 3 3 4 3 4 4 4 3

47.9 5.7 92.9 91 97.9 76 57.8 56.8 79.8 91.9 74.8 36.7 16.7 75.8 53.9 84.9 89.8 57.8 51.9 90.9 44 81.8 68.7 57.7 58.8 42.9 82 53.8 55.7 80.9 57.7 67 44.8 46.8 61.7 50.8 20.7 51.8 70 58 68.9 66.9 1.7 8.8 8.8 8.8 8.9 4.9

360,936 679,795 264,115 254,075 173,987 200,251 130,214 316,874 267,672 119,769 229,499 128,456 181,102 132,977 131,411 139,697 120,046 131,026 141,202 181,575 144,513 100,953 160,546 199,970 134,647 124,311 133,543 151,392 124,476 136,599 110,399 105,027 157,819 129,675 115,952 268,552 187,870 135,549 142,738 135,284 189,790 105,302 196,936 201,349 198,580 200,228 215,634 157,208 Page 42


Chapter 09 - Predictive Data Mining 39,200 33,100 16,000 24,900 22,000 20,000 33,900 22,100 22,800 24,700 38,700 25,800 31,700 82,200 19,500 24,400 22,500 25,900 22,700 21,200 34,000 18,900 33,900 23,800 23,900 18,500 36,300 47,300 36,600

169,200 180,100 98,400 63,800 121,300 107,600 230,800 153,800 111,100 117,800 118,700 108,000 140,500 171,700 147,600 132,000 119,800 117,100 95,000 56,700 163,800 118,000 151,600 133,500 119,000 110,500 122,500 298,800 238,700

0.36 0.2 0.19 0.45 0.27 0.23 0.27 0.3 0.23 0.32 0.81 0.26 0.34 1.23 0.53 0.25 0.18 0.29 0.25 0.23 0.26 0.17 0.26 0.21 0.21 0.19 0.61 0.36 0.28

2 2 1 2 1 1 2 1 1 1 1 2 1 2 2 2 2 2 1 1 2 1 2 2 2 2 1 3 2

1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 2 2 1 2

3 4 4 2 4 3 3 3 3 3 3 2 3 3 2 3 3 3 3 2 4 3 3 3 3 4 3 4 3

5.9 5.8 49.9 83.9 34.9 36.7 10 46.8 52 48.7 47.8 53.3 40.6 56.4 28.2 14.2 15.5 17.7 55.6 96.6 15.2 45.5 25.3 13.6 14.3 32.2 56.2 31.4 25.5

212,662 217,543 118,491 91,539 147,802 131,948 268,444 180,464 137,326 145,115 159,644 135,049 174,475 257,467 169,311 157,570 143,676 146,960 121,175 81,869 199,361 139,981 186,637 161,123 146,054 130,575 162,270 348,138 278,839

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the sale price using a regression tree. Use Sale Price as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Regression Tree procedure, be sure to Normalize input data, set the Maximum #splits for input variables to 59, set the Minimum #records in a terminal node to 1, and specify Using Best pruned tree as the scoring option. In Step 3 of XLMiner’s Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree and Best pruned tree. a. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree. b. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data? c. What is the average error on the validation data and test data? What does this suggest? d. By examining the best pruned tree, what are the critical variables in predicting the sale price of a home? ANSWER:

a. There are 59 decision nodes in the full tree and 41 decision nodes in the best pruned tree.

b. The RMSE on the validation set is $13,757.85, and the RMSE on the test data is $16,054.64.

c. The average error on the validation set is $5,464.69, and the average error on the test data is $5,294.11. There is slight evidence of systematic underestimation of home sale price.

d. The best pruned tree for the pre-crisis data contains decision nodes on BuildingValue, LandValue, Acres, Fireplaces, and Age.
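Parts b and c rest on two summary statistics of the prediction errors. A minimal Python sketch, assuming the error is defined as actual minus predicted (which is consistent with a positive average error indicating underestimation, as in the answer); the function names and the sale prices below are hypothetical and are not rows from the data set:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def average_error(actual, predicted):
    """Mean of (actual - predicted); positive means underestimation on average."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Hypothetical sale prices, for illustration only.
actual = [150_000, 200_000, 120_000, 180_000]
predicted = [140_000, 195_000, 125_000, 170_000]

print(f"RMSE: {rmse(actual, predicted):,.2f}")
print(f"Average error: {average_error(actual, predicted):,.2f}")
```

RMSE measures the typical size of the errors regardless of direction, while the average error keeps the sign, so an average error that is positive on both the validation and test data points to predictions that run low.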

49. A research team wanted to assess the relationship between age, systolic blood pressure, smoking, and risk of stroke. A sample of 150 patients who had a stroke is selected and the data collected are given below. Here, for the variable Smoker, 1 represents smokers and 0 represents nonsmokers. Age 86 76 56 78 67 77 60 66 80 62 59 72 70 73

Blood Pressure 177 189 155 98 145 209 199 166 125 117 83 134 145 188

Smoker 1 1 0 1 0 1 1 1 1 1 0 1 1 0

Risk 45 65 16 45 7 34 67 54 67 56 12 32 45 26


Chapter 09 - Predictive Data Mining 67 64 67 64 62 75 59 64 71 78 65 67 69 67 75 72 71 70 68 63 70 75 66 64 71 58 70 63 65 63 66 65 80 67 60 60 62 66 57 68 66 72 70 70 61 60 82 70 63 75 60

163 87 123 204 145 213 196 124 145 120 156 167 143 187 193 85 152 89 132 165 221 173 145 132 167 155 134 92 143 143 87 154 135 156 187 125 176 187 152 154 134 165 173 132 167 165 119 184 167 132 176

1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1

67 87 23 23 34 56 76 34 54 43 56 13 76 54 34 6 23 45 34 68 23 87 67 56 45 26 65 76 28 45 34 39 72 24 22 34 59 67 54 26 34 56 27 76 45 34 76 12 56 89 76 Page 45


Chapter 09 - Predictive Data Mining 72 72 65 72 60 64 65 70 75 75 62 67 69 73 68 67 61 61 71 60 67 67 73 67 71 67 63 73 72 70 67 69 69 62 74 71 75 63 71 66 66 71 71 70 66 68 66 63 75 61 70

78 93 154 77 134 165 187 234 123 99 103 114 156 160 107 142 165 141 128 138 117 147 135 154 174 126 142 167 159 133 147 157 175 125 150 124 176 173 172 112 130 125 104 125 102 176 167 156 187 113 142

1 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1

56 12 54 56 78 38 59 6 47 21 28 39 13 47 56 43 26 45 87 45 34 8 56 76 43 56 52 23 45 52 45 32 67 65 49 58 12 52 49 64 69 61 34 58 64 58 54 47 34 56 37 Page 46


72 105 1 57 63 140 1 12 61 137 1 34 71 142 1 67 66 105 1 54 68 149 1 36 60 134 0 5 64 128 1 43 65 111 1 47 70 106 1 26 67 101 1 37 74 170 1 48 61 130 1 59 72 164 1 62 65 123 1 68 62 211 0 9 75 98 1 56 65 67 1 53 62 145 1 45 67 132 1 49 67 145 1 39 62 132 1 46 67 154 1 56 74 167 1 63 61 156 0 12 75 187 1 59 75 193 1 39 61 132 1 47 65 156 1 52 74 123 1 57 75 156 0 34 70 167 0 12 64 165 1 48 76 123 1 41

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the Risk of stroke using k-nearest neighbors with up to k = 20. Use Risk as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s k-Nearest Neighbors Prediction procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate a Detailed Scoring report for all three sets of data. a. What value of k minimizes the root mean squared error (RMSE) on the validation data? b. What is the RMSE on the validation data and test data? c. What is the average error on the validation data and test data? What does this suggest? ANSWER:

a. A value of k = 10 minimizes the RMSE on validation data.

b. The RMSE on the validation set is 15.31, and the RMSE on the test data is 14.65.

c. The average error of 4.84 on the validation data suggests a slight tendency to underestimate the output variable in the validation data. The average error of –1.32 on the test data suggests a slight tendency to overestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions.

50. A research team wanted to assess the relationship between age, systolic blood pressure, smoking, and risk of stroke. A sample of 150 patients who had a stroke is selected and the data collected are given below. Here, for the variable Smoker, 1 represents smokers and 0 represents nonsmokers.

Age  Blood Pressure  Smoker  Risk
86  177  1  45


Chapter 09 - Predictive Data Mining 76 56 78 67 77 60 66 80 62 59 72 70 73 67 64 67 64 62 75 59 64 71 78 65 67 69 67 75 72 71 70 68 63 70 75 66 64 71 58 70 63 65 63 66 65 80 67 60 60 62 66

189 155 98 145 209 199 166 125 117 83 134 145 188 163 87 123 204 145 213 196 124 145 120 156 167 143 187 193 85 152 89 132 165 221 173 145 132 167 155 134 92 143 143 87 154 135 156 187 125 176 187

1 0 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 0 1 1 1 1 0 0 1 1 1

65 16 45 7 34 67 54 67 56 12 32 45 26 67 87 23 23 34 56 76 34 54 43 56 13 76 54 34 6 23 45 34 68 23 87 67 56 45 26 65 76 28 45 34 39 72 24 22 34 59 67 Page 49


Chapter 09 - Predictive Data Mining 57 68 66 72 70 70 61 60 82 70 63 75 60 72 72 65 72 60 64 65 70 75 75 62 67 69 73 68 67 61 61 71 60 67 67 73 67 71 67 63 73 72 70 67 69 69 62 74 71 75 63

152 154 134 165 173 132 167 165 119 184 167 132 176 78 93 154 77 134 165 187 234 123 99 103 114 156 160 107 142 165 141 128 138 117 147 135 154 174 126 142 167 159 133 147 157 175 125 150 124 176 173

1 0 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 0 1

54 26 34 56 27 76 45 34 76 12 56 89 76 56 12 54 56 78 38 59 6 47 21 28 39 13 47 56 43 26 45 87 45 34 8 56 76 43 56 52 23 45 52 45 32 67 65 49 58 12 52 Page 50


Chapter 09 - Predictive Data Mining 71 66 66 71 71 70 66 68 66 63 75 61 70 72 63 61 71 66 68 60 64 65 70 67 74 61 72 65 62 75 65 62 67 67 62 67 74 61 75 75 61 65 74 75 70 64 76

172 112 130 125 104 125 102 176 167 156 187 113 142 105 140 137 142 105 149 134 128 111 106 101 170 130 164 123 211 98 67 145 132 145 132 154 167 156 187 193 132 156 123 156 167 165 123

1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1

49 64 69 61 34 58 64 58 54 47 34 56 37 57 12 34 67 54 36 5 43 47 26 37 48 59 62 68 9 56 53 45 49 39 46 56 63 12 59 39 47 52 57 34 12 48 41

Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Predict the Risk of stroke using a regression tree. Use Risk as the output variable and all the other variables as input variables. In Step 2 of XLMiner’s Regression Tree procedure, be sure to Normalize input data, set the Maximum #splits for input variables to 74, set the Minimum #records in a terminal node to 1, and specify Using Best pruned tree as the scoring option. In Step 3 of XLMiner’s Regression Tree procedure, set the maximum number of levels to 7. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate a Detailed Scoring report for all three sets of data. a. In terms of number of decision nodes, compare the size of the full tree to the size of the best pruned tree. b. What is the root mean squared error (RMSE) of the best pruned tree on the validation data and on the test data? c. What is the average error on the validation data and test data? What does this suggest? d. By examining the best pruned tree, what are the critical variables in predicting the risk? ANSWER:

a. There are 67 decision nodes in the full tree and 1 decision node in the best pruned tree.

b. The RMSE on the validation set is 13.66, and the RMSE on the test data is 14.78.

c. The average error of 3.54 on the validation data suggests a slight tendency to underestimate the output variable in the validation data. The average error of –0.91 on the test data suggests a slight tendency to overestimate the output variable in the test data. The difference in sign of these two average error estimates suggests that there is no systematic bias in the predictions.

d. The best pruned tree only contains a decision node on Smoker.
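A best pruned tree with a single decision node on Smoker predicts one average Risk for smokers and another for nonsmokers. A minimal Python sketch of that idea, using only the first 14 observations listed for this data set (an illustrative subset; the resulting means are not the XLMiner output, which is fit on the training partition, and the helper names are hypothetical):

```python
# First 14 observations of the stroke data (Smoker, Risk), as listed in the question.
smoker = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0]
risk = [45, 65, 16, 45, 7, 34, 67, 54, 67, 56, 12, 32, 45, 26]

def leaf_mean(values):
    """Prediction at a terminal node: the mean of the output variable."""
    return sum(values) / len(values)

# Single split on Smoker: each side of the split becomes a terminal node.
smoker_risk = [r for s, r in zip(smoker, risk) if s == 1]
nonsmoker_risk = [r for s, r in zip(smoker, risk) if s == 0]

def predict(is_smoker):
    """Return the terminal-node mean for the side of the split the patient falls on."""
    return leaf_mean(smoker_risk) if is_smoker == 1 else leaf_mean(nonsmoker_risk)

print(predict(1))  # mean Risk among the smokers in this subset
print(predict(0))  # mean Risk among the nonsmokers in this subset
```

With only one decision node, every patient receives one of two predicted values, which is why the tree is so much smaller than the 67-node full tree.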

51. A bank is interested in identifying different attributes of its customers and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets, setting the seed as 12345. Use k-Nearest Neighbors in XLMiner to classify the data, setting k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. Explain the differences in the overall error rate on the training, validation, and test data. Copyright Cengage Learning. Powered by Cognero.


Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1

ANSWER: The overall error rates for the training, validation, and test sets are 26.67%, 42.22%, and 43.33%, respectively. The overall error rate is lowest on the training data because a training set observation’s set of k nearest neighbors always includes the observation itself, which artificially lowers the error rate. For k = 2, the overall error rate on the validation data is biased because it is the lowest error rate over all values of k. Thus, applying k = 2 to the test data will typically result in a larger (and more representative) overall error rate because the test data are not used to find the best value of k.

52. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets, setting the seed as 12345. Use appropriate software to classify the data, setting k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. Examine the decile-wise lift chart on the test data. Identify and interpret the first decile lift.

Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1
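As an illustration of how a first decile lift is obtained, the sketch below ranks observations by predicted probability and compares the hit rate in the top 10 percent with the overall hit rate. It is a minimal sketch, not the software's own procedure: the function name, the tie-handling rule (observations with tied scores share their hits equally), and the individual scores are assumptions. Only the counts (30 test observations, 21 actual loans, and 5 observations scored at 100% of which 4 took a loan) mirror the figures quoted in the answer to this question.

```python
from itertools import groupby


def first_decile_lift(actual, scores):
    """Lift of the top 10% of observations ranked by score (ties share hits equally)."""
    n = len(actual)
    top = max(1, n // 10)  # number of observations in the first decile
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)

    expected_hits = 0.0
    remaining = top
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        labels = [label for _, label in group]
        take = min(remaining, len(labels))
        # Within a block of tied scores, every observation is equally likely to be picked.
        expected_hits += take * sum(labels) / len(labels)
        remaining -= take
        if remaining == 0:
            break

    top_rate = expected_hits / top
    base_rate = sum(actual) / n
    return top_rate / base_rate


# Invented scores; only the counts follow the answer text.
scores = [1.0] * 5 + [0.5] * 25
actual = [1, 1, 1, 1, 0] + [1] * 17 + [0] * 8

print(round(first_decile_lift(actual, scores), 2))  # 1.14
```

Here the top decile holds 3 observations, the expected number of loans among them is (3)(4/5) = 2.4, and random selection would yield (3)(21/30) = 2.1, so the lift is 2.4/2.1 ≈ 1.14.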

ANSWER: The first decile lift is 1.14. For this test data set of 30 observations, 21 of which are customers who actually took a personal loan, if we randomly select 3 customers, on average 2.1 of them would have taken the personal loan. However, if we use k-NN with k = 2 to identify the top 3 observations most likely to have a personal loan, then (2.1)(1.14) ≈ 2.4 of them would have taken the personal loan. This can be confirmed from the Detailed Scoring report on the test data by observing that there are 5 observations with a predicted probability of 100% of taking a personal loan, but only 4 of these actually took a loan. Thus, of the top 3 observations recommended by k-NN with k = 2, there would be on average (3)(4/5) = 2.4 that actually took the loan.

53. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets, setting the seed as 12345. Use appropriate software to classify the data, setting k-nearest neighbors with up to k = 10. Use Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, what are the corresponding Class 1 error rates and Class 0 error rates on the validation data?

Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1
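The effect of the cutoff probability on the two class error rates can be seen in a small sketch. It assumes an observation is classified as Class 1 when its predicted probability is at least the cutoff (the exact tie rule used by the software is not stated here), and the labels and probabilities are hypothetical, not output from the bank data.

```python
def class_error_rates(actual, probabilities, cutoff):
    """Return (Class 1 error rate, Class 0 error rate) for a given cutoff."""
    # Assumed rule: classify as Class 1 when the probability is >= cutoff.
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    class1 = [pred for act, pred in zip(actual, predicted) if act == 1]
    class0 = [pred for act, pred in zip(actual, predicted) if act == 0]
    class1_error = class1.count(0) / len(class1)  # actual 1s predicted as 0
    class0_error = class0.count(1) / len(class0)  # actual 0s predicted as 1
    return class1_error, class0_error


# Hypothetical labels and predicted probabilities for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.6, 0.35, 0.45, 0.25, 0.1, 0.15, 0.55]

for cutoff in (0.5, 0.4, 0.3, 0.2):
    c1, c0 = class_error_rates(actual, probabilities, cutoff)
    print(f"cutoff={cutoff}: Class 1 error={c1:.2f}, Class 0 error={c0:.2f}")
```

Lowering the cutoff labels more observations as Class 1, so the Class 1 error rate can only fall or stay the same while the Class 0 error rate can only rise or stay the same.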

ANSWER: For cutoff probability values of 0.5, 0.4, 0.3, and 0.2, the corresponding Class 1 error rates and Class 0 error rates on the validation data are as below: (Sheet Bank1 - Image3)

54. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Using appropriate software, fit a classification tree using Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to set the Minimum #records in a terminal node to 1. Set the maximum number of levels to seven. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and the test data. Using the table below, interpret the set of rules implied by the best pruned tree that characterize the customers who have taken a personal loan.

Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1

ANSWER: The rules of the best pruned tree can be distilled to characterize a customer who has taken the personal loan as:
1. Age < 39.5 years, Female, and Income > $24,000, or
2. Age between 46.5 and 57.5 years and Family size < 3, or
3. Age between 46.5 and 57.5 years, Family size > 2, and Income > $49,000.

55. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Using appropriate software, fit a classification tree using Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to set the Minimum #records in a terminal node to 1. Set the maximum number of levels to seven. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and the test data. For the default cutoff value of 0.5, what are the overall error rate, Class 1 error rate, and Class 0 error rate of the best pruned tree on the test data? Interpret these respective measures.

Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1
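The three error rates asked for in this question all come from the same confusion-matrix counts. The sketch below is a worked arithmetic check rather than software output: the counts (21 Class 1 and 9 Class 0 observations in a 30-observation test set, with 13 and 1 of them misclassified) are inferred from the percentages and test-set sizes quoted in the answers and are an assumption, and the function name is hypothetical.

```python
def error_rates(n_class1, n_class1_wrong, n_class0, n_class0_wrong):
    """Return (overall, Class 1, Class 0) error rates from misclassification counts."""
    overall = (n_class1_wrong + n_class0_wrong) / (n_class1 + n_class0)
    class1 = n_class1_wrong / n_class1  # share of actual 1s predicted as 0
    class0 = n_class0_wrong / n_class0  # share of actual 0s predicted as 1
    return overall, class1, class0


# Counts inferred from the quoted percentages (an assumption, see lead-in).
overall, class1, class0 = error_rates(21, 13, 9, 1)
print(f"{overall:.2%} {class1:.2%} {class0:.2%}")  # 46.67% 61.90% 11.11%
```

Each figure is an error rate, so it measures how often observations are misclassified, not how often they are classified correctly.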

ANSWER: For the default cutoff value of 0.5 on the best pruned tree on the test data, the overall error rate is 46.67%, the Class 1 error rate is 61.90%, and the Class 0 error rate is 11.11%. That is, the best pruned tree misclassifies a randomly selected observation in the test data 46.67% of the time. For a randomly selected customer who has taken a personal loan, the best pruned tree will misclassify the customer 61.90% of the time. For a randomly selected customer who has not taken a personal loan, the best pruned tree will misclassify the customer only 11.11% of the time.

56. A bank is interested in identifying different attributes of its customers, and below is the sample data of 150 customers. For Gender, 0 represents Male and 1 represents Female. For Personal loan, 0 represents a customer who has not taken a personal loan and 1 represents a customer who has taken a personal loan. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. In XLMiner, fit a classification tree using Age, Gender, Work experience, Income (in $1000s), and Family size as input variables and Personal loan as the output variable. Be sure to Normalize input data and to set the Minimum #records in a terminal node to 1. Set the maximum number of levels to seven. Generate the Full tree, Best pruned tree, and Minimum error tree. Generate lift charts for both the validation data and the test data. Examine the decile-wise lift chart for the best pruned tree on the test data. What is the first decile lift? Interpret this value.

Age  Gender  Work experience  Income (in $1000s)  Family size  Personal loan
47   0       22               53                  3            1
26   1       3                22                  1            1
38   0       16               29                  4            1
37   0       12               32                  6            1
44   0       22               32                  3            0
55   1       30               45                  7            0
44   1       23               50                  2            0
30   1       5                22                  2            0
63   0       35               56                  2            0
34   1       8                23                  4            0
52   0       26               29                  1            1
55   0       25               34                  2            1
52   0       28               45                  3            1
63   1       29               23                  3            1
51   1       30               32                  4            0
41   0       18               21                  5            1
37   0       14               43                  4            1
46   0       23               23                  3            1
30   1       6                18                  3            1
48   1       25               34                  2            0
50   1       22               21                  1            1
56   0       31               24                  4            0
35   1       9                23                  3            1
39   0       13               29                  5            1
48   0       22               34                  6            0
51   1       21               39                  2            1
27   1       3                26                  1            1
57   1       32               49                  2            1
33   1       12               39                  3            1
58   0       33               32                  2            0
46   0       21               45                  3            1
32   0       6                23                  5            0
56   1       28               45                  3            1
35   1       12               28                  4            1
47   1       23               38                  1            1
50   0       23               32                  3            0
57   1       25               32                  4            1
38   0       15               25                  5            1
52   1       24               22                  2            1
56   0       31               19                  3            1
47   1       24               34                  4            0
54   0       31               45                  2            1
25   0       2                21                  1            1
40   1       16               34                  6            0
61   1       30               49                  2            1
29   1       6                34                  1            1
52   0       25               39                  3            1
56   0       31               54                  2            1
61   0       33               43                  2            0
26   0       4                23                  2            1
60   1       30               56                  3            0
37   1       12               23                  5            0
39   1       14               39                  4            0
46   1       21               34                  5            0
59   1       30               39                  2            0
54   0       31               28                  1            1
27   0       4                22                  1            1
54   0       30               45                  2            0
42   1       18               36                  4            0
64   0       35               46                  2            0
33   1       8                32                  6            1
65   0       34               36                  1            0
38   0       13               32                  4            0
48   1       23               26                  5            0
31   1       7                32                  3            0
39   0       15               35                  5            1
58   1       28               45                  2            0
51   0       26               23                  3            0
41   0       18               34                  3            0
36   1       12               32                  2            0
39   0       12               22                  4            0
58   0       29               34                  2            0
42   1       15               45                  1            0
44   1       21               32                  3            0
65   1       35               54                  2            0
40   0       13               23                  5            0
33   1       11               41                  4            0
45   1       24               29                  5            0
38   0       12               24                  4            1
32   1       10               28                  3            1
51   0       30               43                  2            1
32   0       11               34                  1            1
46   0       22               39                  4            1
44   1       21               25                  3            0
41   1       15               28                  2            1
54   1       29               54                  4            1
26   1       3                24                  2            0
33   0       8                26                  3            1
45   1       20               34                  4            0
63   1       30               54                  2            0
55   0       31               49                  3            1
49   1       25               34                  5            1
64   1       35               54                  2            0
26   0       3                19                  1            0
42   1       19               23                  1            1
48   1       22               43                  3            1
64   1       34               45                  2            0
52   0       28               32                  4            0
41   0       16               34                  3            1
40   0       14               34                  5            0
40   0       16               26                  4            0
55   1       32               37                  2            1
49   0       27               39                  3            0
46   0       21               29                  1            0
59   1       32               56                  2            0
51   0       26               23                  3            1
45   0       23               43                  4            1
39   1       17               26                  4            1
55   1       32               37                  5            1
63   0       35               43                  2            0
53   1       27               24                  1            1
39   0       16               43                  6            1
50   1       25               56                  3            0
27   0       4                32                  1            1
29   0       6                23                  2            1
61   0       30               45                  1            0
57   1       27               34                  2            1
56   1       34               54                  4            0
35   1       6                45                  5            0
25   0       3                32                  2            1
57   0       27               36                  2            0
47   0       25               54                  4            1
60   1       35               33                  3            0
32   0       8                45                  2            1
27   1       4                32                  1            0
39   0       18               39                  4            1
26   0       1                22                  1            0
46   1       25               34                  5            0
28   0       4                23                  2            0
64   0       37               43                  4            0
65   1       33               49                  2            0
47   1       25               37                  4            0
27   1       5                28                  2            1
25   0       3                34                  1            1
25   0       2                24                  1            1
64   0       30               53                  2            0
44   1       21               48                  5            1
65   0       25               47                  2            0
54   0       31               55                  1            1
51   0       26               43                  3            0
51   0       21               46                  1            1
28   1       4                25                  2            1
56   1       32               55                  3            1
57   1       26               49                  2            1
35   1       9                27                  6            1
47   1       21               54                  4            1
54   1       27               45                  2            0
28   1       5                29                  3            1
45   0       22               45                  5            0
43   1       18               43                  2            1

ANSWER: The first decile lift of the best pruned tree on the test data is 1.22. For this test data set of 30 observations and 21 customers who have actually taken a personal loan, if we randomly select 3 customers, on average 2.1 of them would have taken a personal loan. However, if we use the best pruned tree to identify the top 3 observations most likely to have a personal loan, then (1.22)(2.1) ≈ 2.57 of them would have taken a personal loan. This can be confirmed from the Detailed Scoring report by observing that the best pruned tree rates 7 observations as having a 100% probability of taking a loan, but only 6 of these actually took a loan. Therefore, out of the top 3 recommendations from the best pruned tree, (3)(6/7) ≈ 2.57 observations on average will have taken a loan.

57. A bank is interested in identifying different attributes of its customers who default on their loans, and below is the sample data of 5,000 customers. In the data table, for the dummy variable loan default status, 0 represents not in default and 1 represents in default. Partition the data into training (50 percent), validation (30 percent), and test (20 percent) sets. Use logistic regression to classify observations as loan defaults (yes or no) using annual income ($1000s), household size, years of post-high school education, and work experience as input variables and Default status as the output variable. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2. From the generated set of logistic regression models, select one that you believe is a good fit. Express the model as a mathematical equation relating the output variable to the input variables.

Annual Income ($1000)  Household Size  Years of Post-High School Education  Work Experience  Loan Default
21.8                   4               5                                    29               1
65.5                   7               3                                    46               0
54.2                   3               2                                    18               1
73.7                   6               0                                    44               0
110.4                  7               5                                    39               0
22.1                   8               3                                    39               1
39.6                   5               4                                    40               1
90.6                   8               5                                    27               0
38.7                   1               4                                    15               0
60.5                   3               1                                    3                0
104.3                  4               5                                    58               0
44.5                   3               5                                    19               0
67.1                   6               1                                    33               0
72.3                   3               3                                    27               0
114.8                  5               3                                    23               0
96.0                   6               2                                    36               0
90.2                   4               0                                    59               0


Chapter 09 - Predictive Data Mining 54.6 97.4 45.9 61.8 38.6 68.9 73.4 22.6 50.5 81.6 37.3 66.1 24.3 32.5 65.7 51.4 101.8 31.8 120.9 102.8 70.4 43.2 33.6 32.9 97.9 101.7 77.2 86.2 111.9 59.8 74.5 56.8 39.9 96.6 53.5 126.9 49.9 52.0 114.8 126.9 60.3

4 4 7 4 2 3 7 6 7 5 1 4 5 8 8 1 8 4 8 7 5 6 6 3 5 3 5 5 6 6 6 1 6 4 1 5 7 6 1 6 5

0 4 3 4 3 3 1 0 5 1 4 3 3 4 1 3 5 4 1 3 2 0 2 2 3 2 4 4 0 3 2 3 2 2 0 0 4 0 1 2 4


6 30 46 53 57 27 9 17 20 32 18 38 46 14 16 18 22 17 16 53 20 11 21 56 21 15 59 54 56 23 38 46 42 7 14 43 23 38 16 5 3

0 0 1 0 1 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 Page 77



Chapter 09 - Predictive Data Mining 42.3 99.5 34.9 71.5 51.6 90.7 16.4 47.0 58.7 81.4 77.8 99.2 27.6 82.3 118.3 67.5 78.6 36.9 30.4 24.1 49.2 76.4 15.9 59.6 105.7 56.3 74.7 30.7 103.8 60.1 23.3 77.3 69.7 106.8 104.1 80.3 89.9 82.6 45.5 45.7 76.1

4 2 4 3 8 4 5 7 4 6 6 1 3 5 6 7 4 6 1 3 2 5 3 1 3 7 7 1 5 7 6 3 2 6 2 2 2 2 2 7 5

3 4 3 0 2 0 2 4 1 4 5 2 1 5 4 2 5 2 3 2 0 1 4 2 4 0 4 3 1 3 4 5 2 3 4 4 3 3 4 1 5


10 20 54 34 21 60 27 8 46 49 39 4 16 34 15 10 50 17 13 41 29 12 49 26 56 18 40 36 10 30 34 50 9 59 27 17 43 6 11 2 47

0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 Page 78



Chapter 09 - Predictive Data Mining 91.3 105.8 43.0 66.7 48.1 29.7 98.7 119.5 111.9 30.5 49.9 54.7 116.4 68.4 28.3 71.7 69.8 78.3 85.2 22.6 43.1 109.6 56.7 125.5 52.0 40.0 46.0 101.3 101.1 83.1 69.2 47.4 82.0 76.1 63.9 87.8 67.1 30.5 69.6 31.9 36.2

4 4 8 5 2 1 2 5 3 6 6 2 8 5 5 7 6 8 7 3 7 4 5 1 3 7 5 3 4 7 1 8 7 8 5 3 2 4 4 6 3

2 5 3 0 3 4 3 3 5 2 0 3 2 1 4 1 1 1 1 4 5 1 4 0 1 2 2 4 3 0 1 4 2 4 5 2 1 2 3 3 5


14 25 59 36 22 41 57 24 51 49 36 40 59 55 46 57 45 5 32 35 4 1 20 21 54 39 3 38 11 6 5 33 57 1 57 58 19 53 46 53 21

0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 Page 79



Chapter 09 - Predictive Data Mining 96.3 50.3 97.7 70.6 71.9 22.9 68.9 30.0 116.9 82.0 29.9 117.5 66.5 60.0 50.8 77.8 100.2 63.8 46.4 36.5 74.9 80.1 111.1 70.6 42.9 102.5 46.3 50.5 31.4 64.7 38.4 100.0 44.3 47.8 15.0 83.5 75.4 95.3 104.9 92.2 54.0

4 2 7 5 3 7 2 8 1 2 7 3 3 5 4 6 2 7 7 2 8 5 2 5 2 2 1 4 5 5 1 6 5 2 8 6 1 1 6 2 6

4 5 5 2 4 0 5 4 5 4 5 3 0 2 3 4 0 4 4 1 5 4 3 5 1 4 1 0 4 4 3 5 2 1 1 4 4 5 3 5 2


33 44 16 51 46 43 50 47 7 33 5 32 37 7 23 39 15 42 20 1 15 35 17 23 45 52 57 49 4 59 1 57 40 40 27 18 2 18 46 5 50

0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 Page 80



Chapter 09 - Predictive Data Mining 87.7 34.6 41.2 60.0 102.8 41.3 45.8 45.1 91.2 81.8 38.1 62.4 36.1 43.8 67.2 47.6 50.1 116.9 74.4 109.1 108.1 94.5 65.0 70.1 99.5 46.4 53.6 93.5 112.5 96.5 98.8 45.6 104.5 33.2 36.8 32.0 87.2 63.5 60.2 76.6 20.8

6 4 8 6 6 2 3 2 3 6 1 6 5 7 4 6 2 4 5 2 7 7 3 5 4 3 2 7 5 5 1 4 2 4 3 6 5 5 7 4 8

0 4 2 2 5 4 0 4 4 4 2 1 3 3 4 5 0 1 4 4 0 2 4 1 2 2 4 1 4 2 5 4 4 3 2 5 2 2 2 1 1


53 51 54 52 47 47 48 58 26 35 8 48 31 40 55 26 51 39 28 48 49 39 47 7 16 18 18 6 48 35 51 24 28 9 60 14 2 13 59 12 22

0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 Page 81



Chapter 09 - Predictive Data Mining 92.1 73.6 28.4 78.7 35.7 87.8 106.3 77.3 39.9 93.6 75.6 105.3 104.2 87.4 54.8 89.5 71.5 107.1 60.0 37.9 59.7 84.2 104.2 54.3 116.4 54.9 41.5 88.0 67.7 22.8 61.1 85.9 80.5 42.8 53.0 67.8 46.6 28.6 85.4 63.8 84.5

1 7 4 5 7 4 3 7 7 4 5 4 3 7 3 3 8 7 6 6 1 7 3 2 5 4 6 3 1 5 6 6 2 7 5 7 6 5 6 8 3

3 3 4 3 2 3 1 2 5 4 5 4 4 4 5 2 3 2 1 5 3 2 4 3 3 2 4 4 1 5 5 1 5 4 4 5 4 3 4 1 3


9 50 28 25 9 8 7 29 52 30 31 42 10 7 1 41 19 31 10 6 53 38 29 3 37 13 36 30 54 15 9 48 56 56 37 12 36 17 44 23 24

0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 Page 82



Chapter 09 - Predictive Data Mining 53.4 14.3 85.7 51.7 47.5 97.2 48.4 92.7 57.3 74.7 96.0 55.0 131.7 42.0 25.2 47.3 51.9 41.1 26.9 116.8 66.2 127.1 42.9 40.3 105.5 44.2 83.5 45.8 100.4 106.0 62.3 114.3 63.9 62.7 84.3 48.2 72.5 38.9 52.2 42.9 104.5

6 5 3 1 4 1 4 5 5 8 1 5 4 7 4 5 8 3 6 3 4 3 6 4 3 6 1 2 7 8 5 6 7 3 2 4 2 5 6 3 3

3 4 1 1 4 0 4 1 2 4 4 1 3 4 0 3 1 3 0 5 1 2 5 2 2 4 2 2 3 2 0 4 2 4 2 5 1 0 4 4 1


31 28 31 51 30 3 33 4 28 57 54 51 45 11 38 48 0 14 20 54 9 56 41 16 29 15 29 54 43 19 48 20 11 35 55 1 53 10 17 18 51

1 1 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 Page 83



Chapter 09 - Predictive Data Mining 85.5 98.2 64.5 107.8 85.7 73.8 90.0 75.5 51.4 107.7 18.3 59.4 30.4 54.4 103.2 34.1 88.8 53.5 22.4 71.0 55.3 68.5 99.7 48.9 107.3 42.1 42.0 81.6 64.0 45.9 79.6 100.2 105.9 69.3 68.5 51.1 55.8 66.2 103.1 55.3 85.0

6 7 2 2 1 8 6 4 3 7 4 2 2 4 5 3 4 5 5 7 5 7 5 7 6 7 2 8 2 7 8 1 6 4 5 6 7 6 7 4 2

1 2 4 5 5 1 0 2 4 0 3 4 0 3 1 4 5 4 2 1 3 2 4 1 0 3 4 4 2 2 3 4 2 2 3 1 1 4 3 2 1


17 23 41 46 13 54 10 9 13 32 56 14 20 22 1 23 10 21 60 9 36 28 49 25 37 19 57 59 25 43 3 20 59 3 56 51 29 52 53 12 5

0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 Page 84



Chapter 09 - Predictive Data Mining 53.7 90.1 29.2 92.3 72.1 73.8 48.4 45.0 77.2 55.2 117.2 46.6 8.0 108.0 56.1 95.3 53.2 38.9 78.4 41.0 99.9 90.0 79.3 24.6 75.1 61.6 92.2 120.7 48.7 97.4 44.5 94.6 9.9 79.3 98.1 51.9 89.2 70.2 27.5 106.7 98.4

6 3 6 3 4 7 4 5 3 5 7 7 8 4 7 5 2 8 7 3 5 5 3 4 2 7 6 5 3 7 8 6 3 7 6 7 8 3 3 1 5

0 2 2 1 1 5 4 1 1 3 3 3 4 1 2 4 1 2 2 4 3 5 1 2 0 4 0 3 1 1 3 1 3 3 2 2 1 1 4 2 1


40 52 53 8 14 41 38 52 58 14 36 56 43 29 28 36 18 2 22 25 40 10 40 51 22 0 12 42 51 7 34 7 41 38 9 45 57 39 8 51 25

0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 Page 85



Chapter 09 - Predictive Data Mining 104.1 45.2 82.7 46.8 27.0 83.1 17.5 42.9 71.8 80.1 58.8 84.3 71.8 50.4 59.6 47.6 66.3 98.0 115.8 65.2 22.4 71.5 82.5 93.0 25.0 85.1 18.2 25.6 91.5 90.9 53.0 20.3 60.0 89.3 40.1 64.0 62.5 42.3 85.9 54.9 43.1

3 2 4 7 7 7 4 3 7 7 5 6 4 4 8 8 7 4 2 4 3 6 4 6 3 5 3 6 6 3 6 3 5 7 7 4 6 2 6 8 5

4 2 3 4 2 0 4 2 0 3 4 3 3 2 1 2 3 4 2 1 4 0 3 0 1 2 4 2 3 1 4 1 5 4 3 3 0 0 1 4 5


52 12 41 54 42 41 33 37 51 23 41 47 7 14 6 32 6 53 9 35 49 33 40 26 14 45 13 49 15 54 25 29 15 8 18 35 35 54 30 34 59

0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 Page 86



Chapter 09 - Predictive Data Mining 36.1 34.7 44.1 94.4 33.0 55.9 86.5 55.1 39.5 86.6 42.6 69.3 31.8 72.5 52.4 15.0 52.0 34.4 117.3 71.5 106.1 114.4 29.8 47.2 36.9 80.2 87.8 83.8 75.3 42.7 25.8 71.7 91.5 73.1 31.6 55.7 102.7 39.3 72.0 53.8 46.4

7 7 7 8 8 4 6 7 4 3 5 5 6 1 6 5 3 7 4 4 1 3 3 5 2 4 1 2 2 3 2 4 3 6 3 8 8 7 3 4 5

3 3 4 1 4 1 4 4 3 2 5 0 1 0 1 2 2 2 3 3 2 0 1 3 1 1 3 1 4 1 3 5 4 2 3 3 2 3 1 4 0


2 13 14 57 26 7 59 14 44 24 24 19 42 32 8 52 45 8 35 2 35 0 31 23 22 30 28 48 18 45 35 4 13 22 4 33 41 59 35 44 47

0 0 0 0 1 0 0 0 1 0 1 1 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 Page 87



Chapter 09 - Predictive Data Mining 50.5 93.0 50.1 44.6 54.7 113.4 41.6 55.4 29.5 104.7 93.5 31.5 107.1 88.6 45.2 72.7 40.0 80.0 126.8 78.8 67.9 47.1 43.1 76.1 40.7 50.3 45.3 71.4 53.9 45.2 71.3 85.9 101.8 73.6 88.1 20.8 23.9 92.5 52.3 35.0 72.3

4 8 4 4 6 8 2 7 2 3 6 8 3 3 2 7 3 1 8 4 2 7 3 5 7 7 3 3 5 3 5 6 2 6 7 7 3 7 7 2 7

1 5 1 2 0 1 1 4 2 0 1 4 3 5 5 4 1 2 0 2 4 2 3 4 0 4 4 2 5 1 3 3 1 1 2 1 1 5 2 1 1


44 8 23 8 4 7 54 26 5 44 3 11 0 3 38 13 58 15 45 12 9 12 34 40 9 11 3 53 16 44 23 2 32 18 38 27 7 20 27 21 25

0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 Page 88



Chapter 09 - Predictive Data Mining 44.9 81.2 97.4 78.2 113.4 117.3 39.2 55.5 92.6 39.4 87.7 49.1 39.0 69.4 103.2 90.5 100.2 46.9 122.9 75.4 101.0 75.6 120.4 114.1 34.6 73.3 57.4 87.1 90.5 110.4 81.1 74.6 25.2 15.2 39.9 44.7 82.4 65.2 101.4 54.3 51.6

1 7 1 4 3 3 7 3 3 6 2 7 4 4 3 4 6 1 7 2 7 8 5 5 2 2 2 8 6 3 6 6 7 7 5 1 7 6 5 5 5

2 1 1 1 1 5 0 3 2 0 5 0 0 1 3 1 3 2 2 4 1 5 4 4 4 4 2 3 3 2 1 4 3 4 3 3 3 2 1 4 1


18 53 54 45 41 58 27 52 17 55 12 15 60 0 14 35 48 24 1 23 50 1 53 7 53 46 1 32 55 1 27 39 13 52 20 29 11 6 13 24 17

1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 Page 89



Chapter 09 - Predictive Data Mining 68.2 49.9 75.7 43.3 109.6 5.8 39.7 52.5 30.7 25.0 114.5 73.6 75.0 65.8 76.0 78.9 45.9 112.7 96.4 73.8 127.9 92.2 90.9 83.6 95.1 78.7 94.0 45.1 116.0 77.1 34.7 67.9 20.2 50.2 36.2 62.1 68.9 24.5 55.7 79.2 54.5

5 2 5 2 1 4 2 8 4 7 8 5 2 3 5 8 5 1 1 6 8 3 3 7 2 7 7 5 3 8 4 5 1 3 5 3 4 2 3 3 3

1 3 1 1 2 2 1 2 3 4 3 1 2 1 0 2 0 3 4 2 1 2 2 1 5 1 3 1 4 2 1 1 1 2 2 1 4 0 4 1 5


41 48 30 6 33 17 21 3 52 32 21 22 10 14 6 14 41 34 23 52 11 16 53 8 47 29 16 46 37 49 54 20 31 57 42 4 14 51 37 23 55

0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 Page 90



Chapter 09 - Predictive Data Mining 100.2 49.9 31.8 72.1 110.5 58.8 80.2 85.3 50.2 42.5 98.8 47.1 24.8 91.5 54.9 50.0 80.8 59.1 86.6 51.4 16.0 108.8 76.0 32.4 37.6 54.3 58.3 29.6 54.3 52.1 36.9 117.8 68.2 99.0 89.2 71.2 93.1 44.3 112.3 7.7 50.5

6 3 6 7 5 2 7 6 3 4 2 6 4 6 3 8 2 2 8 7 3 8 4 4 2 6 5 3 4 6 5 6 3 4 5 5 4 2 3 1 6

2 4 2 5 5 0 3 5 4 4 3 1 4 2 1 2 3 4 0 3 5 0 2 1 3 3 2 4 1 0 3 1 3 4 1 1 3 3 0 4 1


10 1 35 52 1 51 4 29 39 2 42 14 48 22 40 35 5 17 19 44 41 53 59 52 25 45 22 51 45 21 5 39 58 47 12 12 42 19 13 43 35

0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 Page 91



Chapter 09 - Predictive Data Mining 50.8 21.4 51.2 57.3 32.2 87.3 63.1 69.7 53.5 36.2 36.3 41.4 41.7 85.1 49.3 74.3 41.7 108.8 42.6 45.2 104.0 31.9 87.3 44.7 68.4 60.8 114.8 95.5 67.3 115.2 31.7 125.1 36.9 93.0 46.1 75.0 108.6 115.7 87.6 96.3 66.1

3 3 1 4 2 2 5 2 5 3 5 5 7 8 5 6 7 2 6 2 4 1 4 7 7 8 6 7 4 4 4 4 3 6 6 5 8 7 2 6 3

2 2 5 4 5 4 5 5 4 4 5 4 2 1 3 4 3 4 2 3 2 2 3 5 5 1 3 1 4 0 2 1 1 3 2 0 3 0 1 1 0


35 18 3 50 48 48 55 15 44 4 21 40 42 17 0 0 35 8 4 54 22 10 53 51 58 29 55 12 38 43 25 28 15 11 42 38 48 43 20 3 24

1 1 0 1 1 0 0 0 1 0 1 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 Page 92



Chapter 09 - Predictive Data Mining 99.1 27.7 92.9 78.9 55.9 25.9 67.6 44.7 44.2 50.0 88.2 126.2 82.7 52.5 66.6 72.6 87.8 90.7 56.4 118.3 95.6 42.0 57.8 34.1 68.4 57.1 48.6 44.7 48.8 101.9 102.5 110.9 32.4 20.3 20.9 17.4 45.5 42.8 24.4 32.3 41.4

3 2 2 7 3 4 3 4 2 4 2 5 1 4 4 3 6 3 8 6 6 3 6 7 5 6 4 5 7 5 6 3 5 4 5 2 1 7 8 3 5

1 3 3 1 0 3 3 2 3 2 2 1 5 0 2 3 1 2 2 0 4 5 2 1 3 5 2 1 5 1 0 2 3 5 4 3 4 3 2 0 1


31 44 29 11 4 7 42 10 47 30 19 1 25 18 36 23 18 26 12 49 25 60 7 6 41 25 47 28 18 50 16 11 19 20 27 19 39 25 19 36 56

0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 0 0 Page 93



Chapter 09 - Predictive Data Mining 28.6 37.4 18.4 80.4 33.3 113.1 85.8 62.3 96.0 44.1 74.4 49.8 85.9 107.3 57.4 101.5 49.9 107.8 92.7 37.7 66.5 55.6 100.9 95.7 22.0 73.3 111.7 39.3 74.9 82.3 72.3 49.7 29.5 57.4 53.7 97.5 65.4 60.8 84.0 87.8 89.2

3 8 5 3 8 2 6 3 7 3 7 5 7 3 4 7 5 5 2 3 5 3 6 5 5 5 6 5 8 2 6 6 2 3 7 7 4 2 6 4 5

4 1 3 4 4 1 2 1 2 1 4 4 1 4 2 5 3 2 2 0 2 1 4 3 2 1 2 1 2 4 1 3 4 1 1 4 2 5 3 1 2


27 19 38 47 38 24 53 22 23 36 20 30 1 52 28 21 35 40 44 57 37 49 5 6 41 24 26 52 43 41 13 15 35 44 17 46 27 22 44 21 41

1 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 Page 94



Chapter 09 - Predictive Data Mining 123.4 46.4 105.7 31.6 109.2 100.8 99.0 78.8 65.5 107.1 108.9 50.9 77.4 93.6 24.6 83.5 42.9 124.1 70.8 70.4 101.0 71.6 74.2 29.4 50.6 37.1 23.0 64.2 54.4 20.0 8.4 39.1 47.2 115.9 20.5 56.4 57.3 90.8 85.2 78.6 92.6

3 4 2 7 4 2 6 3 4 5 2 6 6 2 1 4 3 7 4 3 8 3 4 3 4 3 6 6 5 6 4 8 2 4 5 6 4 8 7 2 7

1 2 1 4 2 4 1 3 1 2 5 1 2 1 0 3 2 3 5 5 2 1 2 0 4 1 1 5 4 2 1 5 3 4 3 5 1 2 4 1 4


43 40 36 31 49 48 3 37 45 20 31 1 7 20 58 11 24 22 50 50 19 44 58 2 51 5 24 34 37 52 14 11 0 7 27 7 46 51 29 37 48

0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 Page 95



Chapter 09 - Predictive Data Mining 103.5 79.5 91.9 112.5 10.7 46.2 113.0 54.7 28.5 103.3 101.4 18.4 26.1 31.7 44.8 105.8 67.8 96.1 62.2 82.7 97.2 42.7 124.2 65.2 17.6 93.2 33.7 90.3 67.0 106.8 93.3 101.0 28.8 39.8 36.9 78.3 97.5 55.0 44.8 54.6 62.8

3 3 4 7 1 3 3 7 4 3 5 3 3 1 8 7 3 3 2 8 5 7 6 3 8 5 6 6 7 7 2 2 8 8 6 1 8 8 7 6 4

1 3 4 0 4 2 0 0 4 5 2 2 2 2 2 0 0 3 0 1 1 4 2 4 3 4 2 0 1 2 2 0 4 2 1 3 5 3 0 3 4


51 29 2 20 51 41 20 16 37 46 59 58 58 2 35 9 5 57 25 12 22 28 2 32 13 8 15 41 48 53 13 49 29 18 6 8 54 50 32 38 30

0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 Page 96



Chapter 09 - Predictive Data Mining 90.2 51.4 94.3 51.5 68.0 47.6 98.6 39.6 58.4 78.9 54.6 91.1 70.1 48.3 47.1 55.1 24.1 58.0 92.6 53.3 21.3 102.0 41.9 45.6 124.4 40.6 76.1 119.5 72.7 43.3 60.3 44.3 68.3 91.7 88.5 102.3 115.7 29.5 109.9 74.6 53.5

7 7 1 8 7 3 8 7 1 1 7 4 7 5 3 6 3 1 2 5 2 8 7 8 3 2 8 7 2 7 5 6 6 4 2 1 5 6 4 2 4

2 5 5 4 1 2 4 0 4 2 0 0 1 1 4 4 2 1 1 4 4 2 1 1 3 2 4 4 1 3 1 1 1 1 2 2 4 4 4 0 3


20 22 39 23 22 40 11 6 43 22 20 2 26 24 41 11 50 27 9 43 43 13 28 13 34 44 5 35 46 1 28 16 54 46 22 9 42 25 23 47 38

0 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 Page 97



Chapter 09 - Predictive Data Mining 31.6 80.8 103.0 94.0 99.0 119.3 112.1 38.0 34.9 69.8 123.5 91.7 79.8 28.8 48.8 35.5 99.5 87.9 76.4 44.7 92.4 42.9 33.0 57.1 99.6 58.8 54.0 105.8 35.3 46.5 52.6 23.6 33.5 64.2 40.6 97.3 110.2 26.7 48.0 71.1 53.1

7 2 2 4 8 6 1 7 5 3 5 7 8 7 6 3 2 5 4 7 2 4 7 3 7 1 8 6 4 7 8 5 6 4 4 8 8 4 7 6 7

2 3 1 1 0 2 3 1 5 4 3 1 3 4 5 2 2 1 3 3 1 0 2 3 2 4 4 0 3 3 2 4 4 2 2 4 0 0 3 4 4


51 9 37 23 57 57 18 51 15 21 7 57 13 57 36 47 42 38 2 4 45 32 9 43 53 30 12 27 7 55 6 19 60 21 39 4 3 13 1 55 36

1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 0 1 0 0 0 0 0 1 Page 98



Chapter 09 - Predictive Data Mining 53.8 12.5 29.2 41.7 33.5 63.6 119.2 47.1 30.6 108.2 74.4 74.9 29.5 39.3 68.8 83.4 30.1 105.3 37.3 44.8 53.1 61.9 79.5 95.4 88.3 48.4 46.2 56.9 101.2 39.9 33.2 83.5 77.7 41.2 101.4 29.9 99.4 52.6 59.9 107.7 14.6

5 7 4 3 2 8 4 5 4 4 4 1 3 7 4 7 2 4 8 7 5 6 1 3 5 6 4 7 3 4 7 8 7 2 7 7 1 2 7 5 6

2 2 4 1 4 3 2 5 1 4 2 2 4 1 4 4 4 3 2 5 3 4 1 3 4 5 4 1 1 4 5 0 1 4 2 5 4 2 1 4 0


12 49 15 9 18 36 50 27 7 18 28 11 35 12 45 38 32 36 5 49 22 4 5 53 7 50 20 34 13 23 41 27 42 54 23 40 34 13 20 38 58

0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 0 Page 99



Chapter 09 - Predictive Data Mining 64.8 115.5 49.2 45.0 76.0 91.4 92.6 108.1 86.2 87.6 90.7 57.5 79.9 59.1 62.9 113.2 92.3 59.7 93.3 46.2 49.4 57.5 73.3 51.7 59.0 84.7 18.3 14.5 102.3 63.8 33.7 35.4 101.3 81.6 115.5 95.0 62.5 102.4 93.3 30.5 47.7

7 5 7 6 2 3 7 2 7 3 7 5 2 7 4 4 6 4 8 1 2 1 6 2 3 4 7 3 6 2 8 7 3 8 3 7 5 4 2 6 8

4 1 0 1 5 4 5 4 0 4 4 1 1 0 4 3 4 2 5 1 3 3 2 5 4 4 2 1 3 2 3 1 5 4 1 5 4 1 1 1 2


13 28 37 17 38 14 59 9 10 2 22 11 35 11 3 25 4 1 13 13 55 2 40 18 18 40 9 44 35 10 23 26 21 2 10 38 31 28 48 20 46

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 Page 100



Chapter 09 - Predictive Data Mining 84.4 47.6 61.7 35.4 10.0 121.8 97.1 80.7 31.0 31.6 35.9 42.7 85.6 51.9 53.6 48.1 27.2 54.5 81.3 33.5 60.9 91.9 46.7 59.5 78.6 42.9 53.5 72.7 109.2 86.1 17.9 81.1 51.7 73.3 39.6 72.0 75.2 47.0 46.1 77.7 98.2

4 2 8 7 3 4 3 2 4 4 6 2 5 8 5 2 7 6 2 2 3 3 3 5 4 7 1 3 7 4 2 4 6 2 7 4 7 3 4 2 1

3 2 1 3 2 2 3 5 1 2 2 2 0 4 1 5 3 2 4 2 3 5 2 4 4 3 0 3 4 3 3 0 2 0 1 1 1 5 0 1 4


45 56 21 14 15 19 16 16 30 30 3 14 34 56 48 11 28 36 2 8 9 38 24 6 48 31 28 3 2 34 50 41 46 43 53 30 55 25 45 10 49

0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 Page 101



Chapter 09 - Predictive Data Mining 89.8 71.4 64.5 111.9 82.8 92.9 43.0 58.8 99.8 46.2 66.7 40.6 100.3 46.7 124.1 41.1 48.9 45.1 35.2 91.6 57.0 81.9 72.4 51.2 27.9 98.6 89.6 42.3 67.9 99.3 58.8 92.5 48.6 54.0 100.2 84.9 93.4 53.1 60.2 29.5 110.4

4 6 4 4 6 4 8 4 6 8 7 6 2 6 3 5 3 2 1 1 6 3 7 2 3 4 7 2 2 2 2 7 7 2 8 5 4 2 4 2 4

0 3 1 2 3 1 4 1 1 0 4 3 2 2 5 1 3 5 0 1 1 3 1 1 0 2 3 0 1 4 4 3 5 0 5 1 2 2 3 3 1


33 59 5 20 24 24 57 19 11 50 24 46 55 27 22 27 53 11 57 10 11 14 41 28 53 7 20 48 57 29 2 4 17 5 34 60 5 11 23 23 40

0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 Page 102



Chapter 09 - Predictive Data Mining 97.3 61.6 16.0 107.8 67.5 26.8 96.5 65.3 11.8 88.7 87.1 63.2 51.5 39.7 83.5 35.3 110.5 20.3 124.6 41.6 27.6 76.1 43.6 44.7 108.8 46.2 94.9 89.2 52.4 30.2 53.1 51.1 97.9 35.4 49.1 49.0 114.6 33.3 107.5 116.1 44.9

7 5 5 4 4 6 3 5 4 3 6 5 2 2 3 4 6 3 5 5 6 6 1 4 7 5 6 2 7 4 2 6 3 4 6 4 4 6 1 4 2

2 4 2 1 0 4 2 1 1 3 4 4 2 2 1 1 4 4 2 0 1 5 4 5 2 1 4 3 5 5 2 1 3 4 1 1 4 4 4 5 1


43 21 35 18 53 44 52 13 43 3 46 30 4 16 19 11 44 9 17 28 59 30 55 59 49 37 28 31 39 8 30 18 47 44 45 38 35 14 12 7 51

0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 Page 103



Chapter 09 - Predictive Data Mining 103.7 77.7 39.4 109.2 106.4 54.8 84.6 85.5 15.3 31.6 79.9 29.5 54.7 84.3 99.8 31.6 67.1 36.0 46.7 78.3 27.4 64.5 91.9 44.4 109.2 83.4 110.4 108.8 80.2 61.9 123.1 112.7 75.4 75.6 88.4 48.9 48.5 50.1 55.8 66.6 104.8

5 3 5 2 1 3 3 4 6 5 3 5 5 5 7 4 7 8 7 8 2 7 7 1 3 6 4 5 7 2 8 7 2 8 4 4 2 7 6 2 6

2 2 4 1 3 4 1 4 4 4 1 0 0 3 4 1 3 1 2 4 0 2 4 0 1 2 3 4 2 2 3 5 3 0 2 2 1 5 4 1 0


57 19 8 22 1 16 47 12 35 5 17 21 0 22 50 40 24 51 34 6 13 33 20 23 17 52 5 40 52 19 36 43 10 52 39 40 49 38 60 19 34

0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 Page 104



Chapter 09 - Predictive Data Mining 47.0 72.0 79.3 51.4 129.3 49.4 82.3 63.6 66.4 86.6 49.0 40.4 51.9 62.7 52.9 85.7 28.8 32.5 44.9 43.1 61.6 97.1 120.9 76.7 90.6 24.2 53.2 84.0 72.4 82.4 95.0 43.1 52.6 20.3 17.1 102.9 122.2 98.4 62.0 47.2 34.0

5 7 8 4 5 4 4 5 8 4 6 6 2 6 2 3 5 4 3 3 2 5 2 7 8 4 2 5 6 1 1 4 5 8 5 4 5 5 5 6 2

4 2 5 1 0 4 1 0 1 0 5 5 3 0 3 1 0 5 2 4 4 2 3 0 5 1 3 1 0 2 5 4 2 1 1 4 1 2 5 0 0


26 48 28 15 49 17 5 5 39 30 37 2 8 38 37 38 7 56 58 31 17 48 36 6 48 32 40 12 2 3 40 58 56 35 8 56 13 51 50 46 49

1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 Page 105



Chapter 09 - Predictive Data Mining 22.7 78.2 91.9 39.0 102.7 47.5 37.8 98.9 85.5 97.8 109.3 41.6 98.9 125.4 59.4 90.2 19.6 115.1 28.7 64.2 96.3 19.3 96.0 26.5 65.5 44.6 37.6 102.2 59.9 101.0 91.6 50.7 49.4 44.8 73.9 70.3 113.3 69.2 54.1 58.1 110.9

1 2 6 2 4 5 7 3 2 3 2 7 7 3 1 6 4 3 4 7 5 2 6 7 5 8 8 8 3 8 4 4 7 7 5 4 6 5 7 2 2

4 0 1 1 2 2 2 2 0 3 3 1 3 2 4 1 5 1 4 2 3 3 4 4 0 2 4 5 2 2 3 2 1 0 0 3 4 2 5 2 0


42 7 31 15 19 39 48 28 39 12 11 31 31 52 17 6 46 48 45 17 8 18 40 35 45 38 28 39 32 6 9 19 36 44 56 0 56 44 31 24 47

1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 0 1 1 0 Page 106



Chapter 09 - Predictive Data Mining 52.8 57.6 81.8 111.2 31.2 22.4 42.3 41.5 50.1 47.3 38.7 42.7 32.8 45.3 129.2 72.5 35.0 74.4 28.7 122.3 45.1 79.7 117.7 93.9 32.0 56.1 80.7 78.3 75.3 50.6 55.3 67.1 55.8 72.2 81.8 52.7 122.5 51.5 9.9 76.9 54.7

6 2 4 6 3 6 6 2 3 2 3 6 8 7 2 6 2 8 1 4 4 4 7 6 2 5 6 8 8 6 5 3 4 7 5 2 7 3 7 6 7

4 4 1 1 0 5 3 4 1 3 0 0 5 1 4 2 0 1 1 3 2 2 4 3 1 3 4 5 5 1 2 1 3 3 4 2 5 1 5 4 3


32 13 18 3 56 53 13 37 17 41 11 57 42 15 39 47 55 33 31 27 28 15 29 3 32 57 16 29 22 15 37 35 45 25 50 54 56 18 0 15 19

1 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 Page 107



Chapter 09 - Predictive Data Mining 93.9 99.1 100.7 38.8 28.4 103.7 54.1 84.8 36.2 86.9 109.8 49.6 66.0 45.3 69.3 73.8 64.1 101.0 86.4 52.1 48.3 69.8 31.6 95.6 58.4 80.8 91.5 107.2 45.2 100.4 104.6 59.2 86.2 51.6 81.0 61.6 35.0 44.7 65.9 114.9 38.4

2 8 1 1 7 5 6 3 7 6 5 8 2 6 3 5 2 6 7 6 1 1 2 6 3 5 3 3 1 5 7 6 8 8 7 5 7 3 4 2 4

3 1 0 4 0 4 1 3 5 4 5 2 5 3 5 0 0 5 4 5 1 5 1 4 0 4 3 1 4 2 2 1 3 1 1 5 2 3 4 3 0


37 20 15 1 36 11 25 4 13 34 6 32 23 45 35 56 58 5 31 57 23 15 35 22 13 18 28 32 19 12 28 1 22 13 47 52 49 48 21 0 56

0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 Page 108



Chapter 09 - Predictive Data Mining 106.5 65.8 50.5 113.5 108.1 38.9 26.5 40.2 113.3 56.7 51.5 68.2 49.0 69.0 46.5 32.0 71.5 58.0 123.8 36.3 107.0 79.3 80.6 73.8 75.0 125.5 75.4 97.7 84.7 56.5 44.4 61.4 84.8 111.2 44.0 80.5 78.3 78.9 75.3 84.5 46.4

7 4 4 2 4 2 6 3 1 5 7 6 7 1 8 3 1 8 1 4 4 2 5 2 5 2 7 5 2 7 8 5 8 6 5 5 2 4 2 8 2

1 3 3 1 0 1 1 1 4 4 3 1 2 4 3 2 1 1 2 4 3 3 0 1 3 0 2 1 5 4 2 4 1 5 5 0 2 2 4 0 1


36 9 39 34 49 10 35 21 59 37 14 33 23 4 1 48 11 12 52 58 27 10 57 44 56 43 11 38 40 4 22 4 17 31 37 57 11 42 19 16 3

0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 Page 109



Chapter 09 - Predictive Data Mining 36.8 123.9 9.3 27.4 73.8 69.5 80.0 51.7 106.4 102.5 13.3 58.4 33.9 101.7 111.0 91.6 65.9 68.1 28.4 61.1 91.0 49.4 53.0 84.3 93.6 29.0 49.4 32.6 43.6 69.4 100.0 121.5 82.1 109.5 128.5 74.7 71.4 83.6 79.5 67.0 46.9

2 2 2 1 4 2 2 8 6 7 3 7 6 5 4 5 1 4 3 2 7 3 4 7 7 4 5 8 5 5 3 1 5 6 1 4 6 5 6 5 1

3 4 4 1 4 5 3 2 2 2 2 2 1 3 2 3 4 0 5 4 1 4 3 3 1 1 3 2 2 1 0 2 3 3 5 4 1 1 3 4 4


3 29 53 29 18 44 34 33 19 40 55 41 29 38 59 31 16 9 44 10 10 12 8 50 52 6 20 54 49 34 55 25 45 38 13 2 9 29 56 19 22

0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 Page 110



Chapter 09 - Predictive Data Mining 64.6 111.4 116.9 57.3 46.5 89.5 93.5 52.2 41.1 64.5 88.7 46.3 112.9 30.0 41.4 77.1 45.7 35.4 88.6 72.3 89.5 57.6 90.4 111.6 35.0 117.1 72.2 38.5 105.9 49.7 112.8 100.4 96.0 39.2 15.6 84.4 85.5 25.7 52.0 47.8 24.1

3 3 8 1 5 5 3 6 7 1 6 5 5 6 6 4 3 2 3 3 6 1 7 3 7 8 3 4 5 4 5 6 2 2 1 2 1 7 2 6 6

3 2 1 4 4 3 3 3 3 3 0 0 3 4 1 5 2 5 3 2 5 4 1 3 4 2 3 4 1 2 4 5 5 3 5 3 3 1 1 0 2


25 37 16 43 42 44 33 16 60 10 18 44 59 56 56 53 55 57 57 47 40 42 18 59 15 1 24 9 22 49 24 34 2 14 43 23 43 3 38 18 1

0 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 Page 111



Chapter 09 - Predictive Data Mining 48.6 32.1 38.5 52.3 63.5 79.6 43.6 85.1 62.6 53.5 87.2 58.0 47.9 50.1 23.4 117.6 61.9 27.7 74.9 74.9 112.3 80.4 87.3 50.5 81.5 89.6 44.3 42.7 76.0 133.1 24.9 24.6 91.2 62.6 53.7 31.6 88.6 69.4 76.8 54.1 105.5

2 3 1 2 5 1 7 8 5 2 2 6 2 6 8 7 3 7 5 6 5 3 6 6 3 8 2 6 3 5 7 7 7 2 6 7 8 5 3 4 3

1 3 4 1 5 0 3 3 3 1 4 3 0 2 4 4 3 1 0 0 4 1 4 3 3 5 1 1 5 3 0 3 3 5 4 2 3 5 4 1 2


51 45 16 48 41 18 33 28 49 39 13 55 0 36 27 51 28 30 45 36 51 40 3 46 23 53 45 37 60 8 46 52 31 55 45 25 47 30 27 5 29

0 1 1 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 Page 112



Chapter 09 - Predictive Data Mining 88.8 85.4 23.4 88.7 108.4 57.6 42.5 53.1 29.7 56.8 113.4 80.0 100.6 67.0 117.7 89.0 68.0 76.6 94.6 76.7 95.7 72.6 31.1 59.9 51.8 54.4 82.8 61.7 79.2 79.4 28.8 80.7 39.2 23.9 61.9 122.5 61.0 49.7 81.9 17.3 28.6

2 5 6 3 7 5 3 8 1 3 7 7 7 4 1 5 2 2 4 4 7 3 8 1 3 3 3 2 5 7 3 2 4 4 2 6 6 2 7 1 7

5 4 3 3 2 4 2 3 2 5 3 3 4 3 1 2 4 2 4 5 5 4 4 0 4 4 4 0 1 3 2 1 3 4 5 4 4 2 2 2 5


3 17 21 10 14 58 12 53 2 53 20 22 52 29 0 12 18 42 4 26 5 8 37 30 52 37 16 8 49 41 23 8 46 48 58 24 53 18 39 37 47

0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1 0 0 0 1 0 1 1 Page 113



Chapter 09 - Predictive Data Mining 73.9 93.5 49.7 96.1 65.3 30.3 70.9 58.0 27.2 44.6 79.6 41.1 79.3 47.9 31.8 30.4 43.3 29.3 54.2 111.1 41.3 80.3 59.2 26.5 48.3 70.3 116.7 72.9 68.5 62.6 67.7 90.3 51.9 69.8 40.5 83.1 65.5 23.0 51.3 39.8 66.2

4 4 8 3 8 4 3 2 4 2 8 6 6 2 4 5 1 1 8 3 8 4 7 7 8 6 1 5 6 1 1 5 5 8 6 8 2 3 8 3 7

2 3 2 5 3 1 0 4 2 4 4 1 2 3 5 5 4 4 2 4 0 4 5 4 0 2 4 1 2 3 4 4 5 2 0 2 3 2 2 3 2


50 17 28 16 54 16 28 46 34 32 23 6 18 37 30 54 14 60 50 8 27 6 56 21 32 54 36 3 15 43 11 23 38 44 34 37 44 34 10 58 2

0 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 Page 114



Chapter 09 - Predictive Data Mining 65.7 38.2 45.3 98.9 88.9 15.5 46.6 97.2 94.3 62.0 35.3 75.4 38.1 99.9 128.6 22.1 59.3 84.7 78.7 105.5 64.6 128.8 86.4 83.8 51.5 72.6 66.8 106.2 73.3 40.3 58.4 78.5 44.5 39.0 20.0 129.5 113.8 57.2 35.4 46.4 76.7

2 6 7 1 7 4 4 5 4 1 2 5 6 8 8 5 1 5 7 6 6 4 2 1 7 6 5 6 2 4 5 3 6 6 2 4 4 3 7 5 3

3 1 4 2 0 4 2 2 3 4 2 2 5 5 2 1 0 3 4 1 0 2 3 2 4 4 3 2 0 2 3 4 5 0 2 4 2 3 4 1 4


21 35 0 38 55 48 11 17 18 59 32 31 51 24 21 27 55 20 40 52 40 41 57 41 54 28 38 54 13 35 4 3 11 37 15 29 55 32 55 57 51

0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 Page 115



Chapter 09 - Predictive Data Mining 39.5 51.5 61.4 39.4 42.2 17.6 91.2 134.2 73.7 100.8 23.9 100.8 116.4 36.6 73.7 53.2 53.4 56.6 86.2 108.2 31.0 63.3 109.8 71.8 118.6 40.5 46.3 58.1 89.6 63.8 111.8 119.4 71.2 73.2 39.5 87.4 41.0 67.2 95.7 94.5 42.4

3 6 3 5 2 7 3 8 8 5 4 2 1 8 5 2 2 1 2 4 4 6 8 1 3 6 5 7 6 1 5 5 4 7 7 3 3 3 1 4 2

2 5 2 0 2 0 0 5 1 1 3 3 1 2 4 3 2 1 4 4 5 2 1 3 2 3 3 4 3 1 4 5 1 0 4 2 3 4 3 3 3


59 26 40 9 7 23 40 27 3 22 8 11 21 8 26 1 58 11 30 58 30 37 43 59 46 20 59 8 54 35 25 28 28 20 10 19 16 4 39 59 27

1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 Page 116



Chapter 09 - Predictive Data Mining 79.4 34.8 53.9 103.6 99.5 74.2 32.1 32.6 112.0 117.7 67.4 44.0 71.4 63.9 80.9 17.4 25.6 43.9 106.1 110.9 111.7 63.5 46.6 125.3 45.7 40.4 77.2 55.1 79.6 77.3 119.6 92.9 35.5 59.6 68.1 90.0 54.8 32.6 39.0 106.3 14.8

4 2 5 5 8 3 1 3 2 4 3 6 6 4 5 2 5 6 3 4 7 4 5 3 3 5 2 8 5 2 3 5 2 7 3 3 3 2 7 7 6

2 2 1 2 1 2 2 2 1 4 2 1 4 2 4 4 3 2 4 4 4 3 4 2 2 0 4 1 0 5 1 4 4 4 3 0 1 4 2 3 1


38 24 38 26 23 51 40 41 60 58 3 9 25 30 34 43 9 45 24 55 56 41 32 8 22 24 15 16 33 42 55 8 45 23 41 55 25 46 55 8 53

0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 Page 117



Chapter 09 - Predictive Data Mining 102.8 52.8 46.7 54.5 111.2 92.8 92.9 47.4 92.6 114.1 78.3 55.7 25.1 76.1 43.6 94.7 112.0 46.8 77.8 27.0 55.0 77.1 62.5 49.0 106.4 13.5 52.7 42.0 66.1 79.7 80.1 93.7 55.6 64.5 64.5 123.3 40.5 80.4 83.0 75.8 40.3

3 3 4 2 4 3 3 2 2 6 2 8 4 2 1 5 4 6 4 7 1 5 2 2 7 7 2 7 4 6 5 7 6 3 3 7 3 8 1 3 2

1 2 5 1 5 3 3 2 2 3 3 2 4 1 1 2 2 4 2 0 3 3 2 3 2 1 2 3 2 1 3 4 2 3 1 5 2 4 0 1 1


12 44 48 48 60 45 32 47 44 44 20 25 40 11 15 15 37 7 43 21 35 36 21 51 41 58 9 43 4 40 23 35 57 39 33 29 31 3 11 34 50

0 1 1 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 Page 118



Chapter 09 - Predictive Data Mining 87.3 53.9 68.0 54.7 96.9 86.1 104.1 36.7 73.9 91.2 102.2 27.2 49.3 68.0 59.2 10.8 43.4 51.5 43.0 92.4 84.8 35.8 56.4 83.4 53.3 27.0 92.3 76.6 83.4 88.5 79.8 47.5 61.7 87.9 36.2 79.3 41.7 77.0 58.7 71.7 31.0

2 4 5 8 8 2 1 6 1 7 1 1 3 3 2 5 2 2 3 3 7 3 7 4 4 3 1 3 7 5 1 6 1 8 3 4 5 4 2 1 8

2 2 1 2 1 1 4 4 4 4 3 1 1 3 4 1 0 2 5 5 3 1 0 4 4 3 2 2 4 3 1 2 2 4 1 3 0 2 2 3 4


33 10 3 13 45 33 48 58 21 26 50 3 4 55 3 31 1 2 24 53 53 6 56 40 4 3 53 50 41 49 21 54 34 3 28 51 37 17 33 30 57

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 Page 119



Chapter 09 - Predictive Data Mining 84.8 31.1 24.9 88.4 112.9 62.1 81.7 42.9 120.9 36.6 13.7 89.0 92.1 96.4 110.1 98.1 74.0 54.5 51.6 50.6 46.1 89.7 105.0 53.5 52.9 36.5 76.2 43.8 34.2 89.1 106.4 21.4 51.8 46.9 51.2 56.0 43.3 44.6 36.1 50.9 56.2

8 3 4 2 7 3 2 8 4 5 5 1 8 2 5 2 1 8 4 2 6 5 1 1 3 7 2 3 4 5 3 7 4 2 7 7 7 1 2 2 2

2 0 2 5 3 2 4 0 4 1 2 1 3 2 3 1 3 2 0 2 2 3 1 2 3 5 4 5 1 0 0 2 3 2 0 5 3 1 3 3 3


38 26 33 38 36 47 18 18 43 56 37 18 1 42 48 34 31 60 5 10 28 54 41 17 28 14 58 47 11 51 41 11 25 50 36 38 14 13 16 17 50

0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 Page 120



Chapter 09 - Predictive Data Mining 53.5 99.7 68.9 54.8 46.1 55.0 55.1 25.0 82.9 86.6 119.8 74.0 106.9 18.9 103.2 114.1 83.1 39.0 37.8 63.9 35.8 61.4 38.7 49.0 26.1 78.2 38.5 101.0 38.3 81.3 32.6 67.2 45.7 36.0 62.1 79.8 50.1 34.3 27.8 47.7 68.7

6 5 7 4 8 6 3 5 6 1 4 1 2 2 3 6 3 7 7 6 3 2 2 3 2 4 4 6 3 5 2 5 4 8 8 5 5 2 1 6 4

4 1 3 0 5 2 2 2 3 3 0 5 3 3 2 0 2 0 4 4 4 2 5 2 4 2 1 4 2 4 5 1 1 0 3 4 2 2 2 2 4


30 52 36 15 13 23 53 23 2 24 55 20 24 53 32 54 50 45 53 43 7 56 38 12 54 44 48 41 40 18 8 38 47 10 39 57 31 54 45 5 49

1 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 Page 121



Chapter 09 - Predictive Data Mining 25.9 56.0 14.5 25.0 56.6 103.8 47.7 109.5 120.7 73.3 74.3 114.8 53.0 100.8 100.7 74.0 37.2 39.0 80.2 25.0 100.8 93.7 65.3 30.4 122.5 54.4 28.0 98.5 99.7 85.7 63.3 91.1 66.6 46.9 21.2 100.0 86.9 47.0 47.0 57.2 90.1

3 2 5 7 4 3 6 5 6 4 6 6 8 3 1 4 3 7 6 7 2 5 7 8 2 2 6 6 6 6 7 8 2 5 6 3 7 7 4 7 6

2 3 0 2 4 5 4 2 1 4 3 5 5 4 0 4 4 2 0 3 4 4 4 2 2 1 5 4 1 1 2 2 2 3 5 4 2 4 3 5 1


59 27 0 49 41 51 45 25 14 14 41 52 23 26 48 12 44 5 33 34 35 1 10 43 30 7 38 35 14 23 45 19 14 32 1 58 5 26 32 41 1

1 1 0 1 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 Page 122



Chapter 09 - Predictive Data Mining 88.8 36.5 66.2 83.3 51.8 48.4 66.5 53.5 52.5 103.3 38.8 54.9 59.5 79.6 42.1 24.6 49.9 53.6 73.5 84.4 101.1 80.2 106.5 36.8 63.8 76.5 37.5 46.7 32.1 36.8 130.5 82.6 84.6 59.3 27.7 48.7 96.4 65.4 42.4 65.0 12.6

1 5 3 8 3 7 8 3 3 2 6 3 4 3 5 4 8 7 2 7 2 4 5 2 2 5 6 4 5 8 7 5 8 8 7 4 2 5 3 3 7

0 3 3 3 1 2 5 5 4 3 2 0 2 2 1 2 1 5 2 2 0 3 0 4 4 4 5 5 3 1 0 1 0 1 1 1 0 2 3 4 2


37 42 11 29 54 15 44 4 53 37 44 7 26 32 13 41 7 10 11 37 15 20 19 23 3 50 37 4 10 28 55 13 25 30 52 41 38 55 45 27 13

0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 Page 123



Chapter 09 - Predictive Data Mining 79.1 52.1 41.8 79.2 86.3 113.7 69.5 116.6 44.1 22.9 49.7 63.5 103.7 121.7 122.6 49.6 101.0 85.7 52.0 41.6 68.2 58.0 102.0 30.5 39.8 44.1 41.7 33.0 54.3 97.6 69.6 112.2 25.3 65.0 31.5 53.0 56.0 114.3 47.0 58.0 45.7

8 3 5 5 4 1 6 7 3 4 5 6 5 8 6 5 5 2 4 2 8 6 4 5 5 3 5 6 5 7 6 5 2 4 4 8 1 3 5 4 3

4 3 2 1 4 3 4 3 4 0 2 5 5 3 1 1 1 4 3 3 0 4 4 4 1 2 3 3 4 5 0 0 3 4 1 2 3 2 0 3 5


43 3 40 28 52 44 60 20 14 32 21 56 28 45 30 24 39 3 27 28 59 11 18 57 8 13 45 26 20 26 33 60 20 10 9 33 44 52 29 11 10

0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 Page 124



Chapter 09 - Predictive Data Mining 127.7 44.7 90.5 57.1 101.1 47.0 68.0 120.4 116.3 17.3 100.6 51.7 80.1 92.0 107.9 121.3 77.7 47.7 73.6 95.9 92.5 55.6 60.6 29.8 21.0 85.0 45.9 66.8 45.0 88.1 97.0 120.9 69.0 17.8 36.0 118.4 101.7 85.5 49.5 113.5 34.2

5 6 7 6 5 7 5 7 4 5 3 6 1 1 6 3 5 8 2 6 7 3 3 7 5 7 5 6 5 6 4 5 6 4 3 7 4 5 3 8 3

2 3 4 3 1 2 5 2 0 3 2 0 4 5 2 3 3 4 3 4 3 1 1 2 2 5 1 3 4 2 1 3 2 0 0 1 2 3 4 4 4


30 12 49 12 6 11 14 27 46 17 14 2 28 44 24 56 24 12 42 44 34 30 59 2 15 10 51 42 47 12 53 36 6 23 51 15 8 7 35 59 25

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 Page 125



Chapter 09 - Predictive Data Mining 25.9 68.5 38.9 45.2 85.1 68.9 27.3 45.1 37.2 28.7 54.3 96.9 38.6 58.3 67.3 65.9 42.1 97.9 42.4 61.2 55.3 54.9 84.6 29.3 76.0 75.5 59.1 36.6 82.9 75.6 14.2 94.3 81.3 36.7 79.8 56.5 82.0 48.7 81.1 40.4 54.1

8 7 5 1 7 7 2 3 5 8 7 5 4 7 4 5 5 5 4 2 4 7 1 2 8 6 3 3 4 3 3 5 7 2 6 2 6 7 4 3 7

3 4 3 2 3 4 2 3 1 1 5 1 2 3 3 5 2 4 4 5 2 2 2 3 2 2 1 0 2 3 4 0 3 1 1 1 2 4 4 2 5


23 55 16 18 25 20 7 20 14 16 44 26 51 19 14 53 35 32 33 55 14 4 4 24 10 19 22 22 50 34 25 58 1 40 24 1 22 35 22 51 21

1 0 1 1 0 0 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 Page 126



Chapter 09 - Predictive Data Mining 14.9 22.9 19.7 45.9 65.3 79.1 42.0 96.1 92.4 46.2 49.8 93.1 49.3 109.0 98.4 44.5 21.4 55.3 54.7 20.5 50.7 46.8 106.5 51.9 100.8 43.4 26.0 29.9 87.0 48.9 90.0 47.1 89.9 50.3 69.6 28.6 49.5 31.3 39.8 58.8 103.4

5 3 5 3 1 2 3 4 3 5 3 5 5 6 5 3 1 5 1 5 8 7 7 4 3 4 5 6 7 4 4 1 6 7 4 7 4 2 5 7 3

1 1 5 4 1 0 4 1 1 4 0 4 2 2 2 3 2 4 4 4 4 1 2 3 4 3 3 1 0 3 3 2 3 3 4 4 4 1 1 4 3


43 4 14 20 47 48 49 29 31 31 21 30 15 59 47 12 5 4 11 14 29 47 5 5 50 43 15 28 14 31 9 4 16 51 18 1 14 44 39 8 20

0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 Page 127



Chapter 09 - Predictive Data Mining 110.6 53.2 66.7 63.6 105.7 47.1 82.9 58.2 48.6 90.1 42.1 23.3 25.5 65.1 46.1 61.6 90.3 62.8 121.3 46.0 25.1 80.5 25.2 55.7 60.5 112.0 41.6 77.7 72.4 52.6 91.2 38.7 115.1 124.9 53.2 96.0 81.4 95.2 119.1 45.3 111.4

6 5 7 6 3 6 6 7 5 8 1 1 3 4 2 3 2 2 3 2 7 4 1 4 1 1 4 4 3 2 2 4 1 6 4 5 1 3 4 3 1

5 4 3 2 4 2 0 5 0 3 4 2 0 4 4 1 3 2 1 4 2 2 5 4 4 1 1 5 4 4 2 4 0 4 2 1 0 5 3 4 1


29 47 39 29 20 31 48 41 49 18 31 1 41 29 27 1 30 51 21 39 6 56 22 59 37 18 31 11 20 16 33 58 31 10 33 10 5 9 19 12 36

0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 Page 128



Chapter 09 - Predictive Data Mining 79.7 46.5 23.5 19.7 30.6 79.0 40.2 90.0 102.8 28.6 73.5 72.2 41.1 36.0 25.1 42.7 27.0 60.7 95.0 46.5 77.7 24.1 85.2 96.4 35.0 114.6 55.5 77.3 54.1 81.6 109.9 46.9 70.9 36.2 70.3 45.0 35.3 29.9 53.7 100.7 57.1

5 5 2 3 8 6 2 4 2 6 4 8 1 6 5 7 6 7 6 1 6 6 3 3 4 6 7 2 3 5 7 4 3 2 3 1 6 3 2 3 7

4 5 3 4 3 1 1 4 2 3 4 5 0 2 3 2 3 2 1 1 4 1 1 2 2 4 0 3 2 4 2 4 3 4 1 2 4 1 4 4 2


10 51 14 45 58 0 38 55 29 48 5 57 54 41 12 9 22 54 27 20 22 32 57 39 45 17 46 57 2 4 37 10 18 46 34 21 50 33 13 57 48

0 1 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 Page 129



Chapter 09 - Predictive Data Mining 57.4 71.8 52.6 49.9 102.5 65.7 75.5 84.7 53.1 40.1 124.7 31.9 65.0 106.3 21.9 102.4 93.6 62.4 46.6 33.6 15.3 62.3 86.0 111.7 71.7 98.8 94.7 32.5 88.5 78.1 24.4 60.2 106.4 45.1 89.7 111.8 52.1 51.4 59.3 12.9 49.1

8 2 3 6 4 5 3 3 8 2 4 5 2 6 7 4 4 6 1 6 7 5 6 3 4 3 1 5 8 7 5 5 3 2 3 2 7 4 6 8 3

0 3 3 0 1 4 4 3 1 4 3 1 4 4 1 1 3 3 1 2 4 1 5 1 2 3 2 4 2 1 4 2 2 4 1 0 2 3 5 0 3


47 53 27 7 59 54 56 2 4 46 34 41 4 37 9 4 41 16 27 49 26 3 40 18 42 17 13 14 19 9 52 18 31 4 56 30 12 23 20 20 41

0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 Page 130



Chapter 09 - Predictive Data Mining 43.1 31.6 103.5 103.7 114.8 73.2 114.4 61.1 53.0 63.9 86.8 57.2 48.7 39.0 22.3 55.6 99.5 104.8 53.9 77.4 31.5 96.3 40.1 54.7 86.9 29.6 47.2 86.4 57.2 34.2 74.3 50.0 45.2 40.9 69.1 41.5 44.2 98.7 46.7 84.2 78.5

4 6 3 3 5 6 5 3 8 5 6 5 4 2 6 4 3 7 4 7 7 2 4 4 4 3 2 3 5 6 7 3 4 5 5 3 5 5 2 6 7

2 4 4 3 4 0 2 5 1 4 5 4 3 3 5 3 1 4 3 3 2 1 1 0 1 0 1 1 4 4 3 3 4 2 1 1 2 2 3 3 3


50 39 30 45 22 57 40 37 33 59 51 0 39 49 41 27 53 7 0 1 39 0 52 34 8 48 40 29 55 51 48 7 41 41 45 44 11 32 15 48 7

1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 Page 131



Chapter 09 - Predictive Data Mining 40.5 80.8 75.0 47.6 81.7 39.8 54.1 73.6 65.3 28.0 104.6 86.3 111.0 41.9 103.2 56.6 71.6 39.1 95.6 75.3 55.6 31.8 23.7 39.9 113.7 49.5 103.9 34.2 49.1 50.0 74.5 16.6 87.7 97.0 59.5 87.5 74.2 58.5 62.6 72.3 96.7

6 8 7 4 4 3 6 1 8 7 2 2 3 6 5 6 3 3 7 4 3 7 4 3 8 4 7 8 1 7 5 1 5 4 7 3 6 1 8 2 5

4 4 3 3 1 1 1 1 2 2 1 1 2 4 1 4 4 3 4 2 5 0 3 0 4 3 4 5 3 4 5 1 5 4 0 4 3 0 3 2 2


56 34 46 53 18 34 16 22 35 14 49 42 59 11 6 36 14 42 33 15 4 56 32 41 46 4 19 54 40 29 43 56 40 56 46 15 9 35 24 4 35

1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 Page 132



Chapter 09 - Predictive Data Mining 90.8 43.5 63.4 116.1 101.8 83.2 99.8 107.0 28.1 46.6 44.7 54.5 120.9 103.7 82.4 52.9 52.8 97.7 46.3 104.7 28.7 105.1 93.0 41.5 62.8 105.3 112.4 45.8 118.4 104.0 16.9 58.2 111.9 94.8 101.0 49.7 43.5 72.7 60.7 92.8 48.0

3 3 4 6 5 3 6 7 4 6 6 5 3 1 5 7 4 4 7 6 2 5 3 5 4 2 7 6 3 4 7 2 7 8 6 7 3 4 5 3 4

1 1 4 4 4 4 3 1 2 2 1 2 3 1 3 5 5 2 2 3 2 1 4 0 2 5 1 5 1 3 0 1 2 2 2 4 4 5 3 3 3


15 4 22 16 20 54 33 3 23 12 33 56 4 30 47 19 39 21 6 6 34 47 21 28 4 36 59 20 18 24 27 37 60 47 2 52 29 17 49 19 7

0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 Page 133



Chapter 09 - Predictive Data Mining 69.1 75.1 96.9 30.6 34.9 94.9 55.2 31.9 30.5 34.8 53.5 36.9 50.8 76.3 40.4 74.3 20.5 99.3 86.5 36.9 123.4 81.6 110.5 82.3 107.4 29.3 49.7 64.3 107.6 74.5 89.8 106.9 37.4 91.9 74.4 73.0 45.1 60.9 71.0 94.6 29.8

5 6 6 5 8 4 4 5 3 4 5 6 2 2 5 5 4 3 6 3 5 6 7 3 5 4 2 7 5 5 2 7 4 7 3 2 5 7 6 6 1

2 4 0 2 5 1 0 2 2 3 1 3 3 1 3 1 1 1 4 0 4 4 1 0 4 3 0 5 0 1 3 2 0 2 2 1 2 2 3 1 3


31 20 22 60 27 55 45 36 33 58 32 27 11 25 6 50 10 59 52 53 28 7 16 10 41 49 23 37 44 20 31 13 33 50 28 34 10 25 47 32 40

0 0 0 1 1 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Page 134



Chapter 09 - Predictive Data Mining 31.3 57.6 25.7 34.7 124.3 50.3 93.1 61.4 31.4 70.2 57.7 70.0 89.8 51.1 91.9 56.5 72.0 73.2 30.2 79.6 76.3 115.4 55.5 123.1 87.2 50.6 101.1 44.1 48.0 36.1 83.1 82.5 90.1 59.1 77.0 33.3 38.9 45.0 56.0 22.6 92.6

1 4 6 3 1 6 2 2 7 4 2 2 6 1 4 6 6 8 7 7 4 7 7 2 3 5 8 3 7 3 1 4 6 5 2 6 7 6 7 4 7

4 2 2 5 5 3 5 1 3 5 3 1 5 0 5 1 0 5 4 3 2 2 3 3 4 4 4 3 0 3 1 2 0 2 4 2 1 4 4 3 3


55 16 58 46 30 50 58 46 10 42 32 58 23 8 3 27 53 28 51 42 29 5 11 20 46 53 5 44 26 57 33 29 57 37 38 35 2 17 37 4 29

1 1 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 0 Page 135



Chapter 09 - Predictive Data Mining 100.9 67.4 94.6 52.1 47.0 87.9 17.5 59.3 65.2 52.5 32.9 76.4 44.2 95.4 84.9 114.4 84.5 31.7 100.0 126.1 23.3 52.8 34.2 95.7 54.2 89.0 53.6 66.8 30.5 38.2 101.7 121.1 46.5 72.3 99.8 85.4 77.9 56.3 76.6 73.8 63.5

7 4 4 5 3 5 1 2 3 1 3 4 4 1 6 3 4 5 4 5 2 7 5 5 3 2 2 3 7 5 2 4 5 2 1 8 4 3 2 7 8

2 2 1 3 4 0 3 4 0 2 2 1 4 1 2 3 5 5 5 1 4 0 4 3 1 1 2 2 0 2 4 2 4 1 2 2 2 3 1 4 3


19 54 46 4 34 45 57 52 24 35 3 25 49 33 31 24 32 29 51 31 43 34 29 3 2 11 29 45 28 36 1 12 25 8 54 54 2 28 41 26 15

0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 Page 136



Chapter 09 - Predictive Data Mining 115.4 102.4 59.5 109.7 64.6 63.2 42.6 99.1 123.9 98.0 81.8 82.6 82.4 93.5 102.1 91.3 37.7 12.2 87.2 53.7 115.1 70.9 44.1 118.3 91.4 46.9 60.0 48.4 109.7 83.5 53.6 61.3 66.3 49.4 96.9 65.5 84.9 84.8 84.6 82.0 31.7

6 5 4 3 5 5 7 6 2 5 1 7 7 1 6 3 6 1 2 8 4 8 5 7 3 8 5 2 5 6 5 6 4 5 6 6 6 7 5 7 6

3 2 5 1 0 1 5 2 4 3 3 1 4 0 2 4 5 2 1 1 5 4 4 4 5 3 1 4 3 3 2 1 2 2 0 3 5 3 5 2 3


16 30 53 15 17 28 11 2 44 7 35 14 10 11 57 42 35 35 26 35 10 13 35 24 37 9 50 32 0 21 6 22 34 40 39 5 13 53 2 21 4

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 Page 137



Chapter 09 - Predictive Data Mining 91.8 93.1 100.2 89.7 91.7 119.1 87.0 102.2 54.9 68.8 38.5 46.6 47.7 37.2 79.5 86.0 64.4 77.4 108.0 87.7 84.0 51.5 68.5 13.2 129.2 78.0 27.2 39.4 27.3 23.8 68.3 90.4 83.4 76.1 53.9 64.7 119.6 44.3 53.4 54.8 120.8

7 4 6 5 6 2 5 4 6 5 3 3 7 1 4 2 5 5 7 7 2 5 1 4 2 4 7 6 3 5 3 2 4 7 1 5 5 2 4 3 2

4 3 1 1 3 0 1 2 1 2 5 2 1 2 4 2 1 0 4 2 4 1 2 3 0 3 3 3 3 2 1 4 2 4 4 5 3 4 4 5 3


40 46 42 15 41 41 58 18 29 1 12 50 11 23 37 17 35 6 29 37 52 53 13 29 53 30 0 10 54 14 2 48 36 17 24 47 21 54 11 53 7

0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 0 Page 138



Chapter 09 - Predictive Data Mining 119.4 52.5 69.0 82.9 83.6 42.9 50.1 107.3 79.8 97.0 102.6 66.6 58.6 45.0 110.9 38.6 117.5 22.8 101.1 118.3 24.6 70.7 87.5 41.6 91.5 48.9 84.0 68.1 73.2 65.5 57.3 48.8 73.9 111.5 51.0 36.3 21.6 51.0 90.7 38.0 47.3

7 6 7 5 7 6 4 1 7 1 5 7 7 2 5 4 2 1 6 5 1 4 8 3 7 2 5 5 8 7 8 6 5 6 7 6 8 2 7 8 3

1 5 3 0 3 5 1 4 1 4 2 1 3 1 4 4 1 0 5 4 4 1 3 4 2 3 2 0 0 5 1 4 1 0 1 2 4 2 1 3 4


51 45 54 26 20 6 24 53 3 59 5 10 56 10 5 8 4 50 40 52 53 52 38 40 5 1 45 38 20 39 25 18 23 31 43 24 34 22 21 60 51

0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 Page 139



Chapter 09 - Predictive Data Mining 78.9 62.7 45.1 27.6 89.4 83.0 96.5 43.5 112.7 12.4 52.4 72.9 33.2 46.5 87.7 84.8 25.1 86.6 75.8 107.3 61.6 20.1 55.9 31.0 28.2 101.2 58.3 89.8 130.9 67.2 113.1 22.6 70.4 113.4 66.1 56.2 107.9 46.6 47.2 87.9 106.6

5 6 2 2 7 4 5 2 8 2 2 2 4 4 1 6 5 7 4 4 1 4 8 5 7 6 5 6 7 5 2 3 6 2 7 6 3 8 4 6 6

2 0 4 0 3 1 3 1 4 1 1 2 2 4 5 2 4 5 4 2 2 0 2 5 3 1 3 2 0 4 3 3 2 2 4 2 5 2 2 5 5


53 5 32 29 20 23 60 3 3 5 5 3 42 30 35 27 42 43 22 15 54 2 35 50 10 2 51 48 21 46 50 59 50 56 41 54 53 44 55 50 46

0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 Page 140



Chapter 09 - Predictive Data Mining 33.5 39.2 89.2 88.5 94.2 35.7 9.3 73.0 40.1 54.2 76.5 122.3 120.4 53.8 97.2 108.1 101.6 111.4 93.1 97.4 27.7 26.7 94.4 36.8 78.7 86.5 84.8 108.7 52.7 87.8 21.9 82.8 45.3 53.9 43.2 98.1 85.0 20.4 50.8 99.8 46.4

3 5 3 1 2 4 7 7 6 8 2 6 5 5 8 5 4 2 6 4 4 4 6 7 3 6 4 4 3 5 6 3 2 4 2 4 2 7 3 5 4

3 2 2 2 1 1 3 1 1 1 2 4 4 3 0 5 4 3 5 3 1 1 0 5 1 1 4 2 1 3 1 3 4 4 2 1 2 3 2 2 3


29 51 11 1 60 9 54 6 42 60 2 8 55 26 24 48 59 18 15 29 47 22 0 60 58 50 33 50 22 26 4 43 24 50 42 16 50 14 57 2 25

1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 Page 141



Chapter 09 - Predictive Data Mining 61.3 117.3 96.9 111.1 47.1 32.6 87.7 113.1 53.2 46.3 24.2 112.4 59.1 66.0 76.2 70.3 48.0 71.3 51.8 44.0 86.0 47.3 33.8 48.1 32.2 73.6 58.9 36.4 69.6 79.6 76.2 94.5 74.0 50.0 30.3 56.6 82.4 63.5 91.8 97.8 42.1

1 4 5 7 7 4 1 8 4 7 7 7 1 6 5 4 2 2 5 3 8 4 2 4 8 6 3 7 2 7 1 6 7 6 8 5 7 5 4 5 2

4 1 1 1 4 1 3 3 3 3 3 4 3 5 4 2 4 2 3 2 4 1 2 4 4 1 1 1 3 1 1 2 2 1 4 5 4 2 2 3 0


46 47 0 42 33 34 30 54 22 12 6 14 7 37 12 44 19 60 14 16 7 22 8 37 13 42 35 59 53 20 33 59 10 20 6 7 5 35 55 37 12

0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 142



Chapter 09 - Predictive Data Mining 60.7 68.5 91.2 22.0 44.3 48.1 50.8 63.5 42.0 58.9 50.4 89.2 91.3 92.2 86.8 59.9 63.2 81.2 31.6 35.6 68.1 60.1 86.2 92.1 76.2 105.5 64.9 26.5 124.5 57.0 38.8 90.7 41.3 41.1 57.5 58.2 60.4 110.1 90.0 85.6 36.2

8 6 4 4 2 5 6 2 1 2 7 7 3 5 7 6 7 4 4 2 2 2 4 4 4 6 7 6 4 2 2 3 7 4 6 5 5 7 5 2 3

1 4 4 3 1 5 5 4 3 3 2 2 1 2 3 3 1 1 1 5 2 0 3 0 3 1 2 1 3 4 4 2 1 4 5 5 4 2 2 2 4


12 60 5 60 16 44 53 47 2 2 26 39 10 57 38 2 43 0 29 28 32 54 15 34 11 54 10 21 36 25 24 1 51 31 4 53 2 55 50 33 28

0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1 Page 143



Chapter 09 - Predictive Data Mining 55.6 41.3 56.2 58.6 77.9 33.0 99.9 47.5 41.5 41.6 52.8 105.8 124.2 70.4 53.3 33.5 115.0 102.6 85.2 74.3 51.9 44.1 79.8 90.9 44.2 82.6 59.1 27.8 123.1 111.2 108.8 43.4 77.6 91.0 54.5 105.6 48.5 76.7 18.7 16.5 83.1

5 7 3 5 5 8 7 5 7 2 6 6 2 5 6 7 2 2 5 5 7 3 4 7 8 4 7 2 5 2 1 4 1 6 3 5 6 5 1 1 6

4 5 3 3 5 1 2 5 4 0 4 1 4 4 2 1 0 1 3 2 4 4 2 1 4 1 4 4 3 4 1 1 0 3 3 5 4 3 1 1 4


6 23 27 10 6 18 22 15 54 25 19 8 25 46 28 59 14 3 45 41 5 26 19 3 35 15 38 31 8 20 31 31 55 25 51 40 40 18 7 7 9

0 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 Page 144



Chapter 09 - Predictive Data Mining 78.0 48.0 112.4 58.1 103.7 41.5 100.4 65.1 57.6 26.9 63.4 64.5 124.0 109.3 58.3 66.8 24.6 98.6 96.1 49.3 13.5 49.1 91.9 78.4 58.9 88.5 55.2 98.6 116.2 48.0 101.1 84.9 64.9 35.6 51.3 33.5 52.7 124.5 15.2 15.0 44.7

3 8 1 5 7 4 7 2 3 5 3 7 7 7 3 6 7 5 6 2 5 7 6 7 4 6 3 8 8 3 5 7 7 4 8 5 1 3 4 6 3

4 4 2 3 4 1 2 1 3 1 3 4 0 1 1 1 4 3 4 1 1 1 4 5 5 1 1 0 5 0 4 2 2 1 0 2 4 0 4 3 4


30 26 43 31 34 51 14 19 51 37 48 21 21 10 33 4 6 17 24 39 51 32 13 39 44 7 50 6 52 42 8 58 6 44 29 55 19 5 21 8 60

0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 Page 145



Chapter 09 - Predictive Data Mining 13.3 97.8 84.8 50.8 28.2 92.6 27.4 59.0 52.0 40.9 125.4 129.8 71.4 104.4 28.3 79.4 32.7 45.3 22.0 91.2 52.7 61.3 98.9 44.3 38.7 92.9 74.2 100.4 122.3 69.2 101.9 56.2 104.7 110.0 115.5 104.1 53.5 70.6 52.6 49.7 97.3

2 5 4 3 8 6 2 5 1 7 2 6 6 7 6 5 8 7 5 5 3 4 7 4 3 5 8 5 2 7 4 8 1 2 6 5 1 7 4 6 6

2 0 5 1 1 3 3 4 1 1 1 1 2 0 4 5 5 2 3 1 3 1 5 3 2 5 3 5 4 1 2 5 0 2 4 2 4 4 4 2 4


2 44 17 26 44 46 32 53 28 59 36 3 6 36 42 27 12 37 25 1 16 25 1 46 22 19 12 57 52 57 15 38 59 3 32 53 20 8 4 50 54

0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 Page 146



Chapter 09 - Predictive Data Mining 96.1 74.9 79.9 92.4 103.2 79.7 28.0 10.1 29.5 58.4 86.4 78.4 114.4 72.7 46.8 40.9 75.2 73.2 101.0 60.5 64.5 93.7 56.3 46.3 100.2 84.2 101.6 62.6 42.9 67.9 56.8 70.4 61.2 57.8 63.5 45.9 71.5 51.7 44.3 49.7 64.6

5 2 5 6 3 3 5 5 2 5 8 7 7 5 3 5 6 6 5 7 6 2 8 4 7 1 6 7 4 7 7 4 3 1 3 2 3 5 3 7 3

4 2 0 1 1 1 1 5 3 0 3 5 3 3 3 0 1 2 2 1 2 3 3 2 4 2 3 4 0 2 2 2 5 0 4 1 1 3 2 4 1


17 39 1 11 58 9 25 29 31 60 12 47 21 52 21 21 41 33 20 3 31 6 4 29 12 2 12 22 54 49 19 49 5 0 19 55 34 42 35 34 35

0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 Page 147



Chapter 09 - Predictive Data Mining 98.2 73.0 54.9 44.3 45.3 43.7 46.3 94.8 94.5 49.9 46.0 56.0 91.2 102.4 28.0 51.8 79.3 94.2 24.6 44.8 79.8 71.1 95.3 34.1 96.9 50.0 49.9 31.6 81.7 59.0 84.4 69.2 35.9 103.7 68.4 59.8 44.5 96.1 61.9 37.9 36.3

4 6 6 6 7 2 4 8 5 7 4 2 2 7 4 3 5 5 4 4 4 6 7 8 2 6 1 5 2 8 2 4 2 3 7 8 4 2 5 1 8

2 2 1 5 4 2 1 4 5 2 2 4 4 5 4 1 2 5 4 4 2 4 3 4 1 5 1 1 4 4 1 2 4 4 3 3 1 0 3 3 5


56 56 23 38 47 47 46 33 47 12 2 54 39 25 52 13 37 44 34 18 26 27 5 48 42 23 55 24 9 37 27 46 15 41 2 18 14 14 39 44 42

0 0 0 1 1 1 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 Page 148



Chapter 09 - Predictive Data Mining 60.4 26.7 28.4 72.7 101.8 53.0 116.0 92.2 55.7 126.5 108.3 55.6 130.7 75.2 58.3 55.6 44.4 71.8 116.7 41.5 61.7 56.5 87.5 92.1 70.7 93.9 80.1 60.5 66.8 63.5 121.1 50.2 39.6 88.8 101.2 52.0 100.8 70.6 88.9 89.4 82.7

2 5 1 4 8 5 7 2 1 1 6 5 5 5 6 5 2 4 5 4 2 2 1 1 4 3 4 5 5 6 6 5 2 4 6 5 5 3 8 5 6

0 4 3 4 1 1 1 0 0 4 2 1 4 5 5 4 1 4 3 4 5 0 1 2 3 1 5 1 5 0 3 1 4 1 2 2 4 0 2 2 2


59 60 30 59 18 51 6 32 12 21 14 47 54 11 19 50 45 12 4 5 35 3 35 56 8 15 27 1 28 31 41 2 13 19 54 19 50 47 41 29 41

0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 Page 149



Chapter 09 - Predictive Data Mining 41.0 68.5 109.2 98.5 28.6 61.5 41.3 116.6 110.2 44.2 79.0 49.5 120.5 45.6 60.6 69.9 48.1 44.4 46.2 53.7 79.0 48.9 45.2 113.2 64.0 45.5 103.5 129.4 50.5 61.1 111.9 53.3 37.5 95.7 104.4 95.2 17.9 67.0 45.0 17.0 56.0

6 2 7 6 7 2 5 8 4 5 6 5 7 4 4 8 5 7 6 2 1 4 1 7 1 7 6 5 6 5 8 6 3 7 7 8 4 1 4 5 8

3 2 2 1 0 5 1 2 4 2 1 1 2 2 4 2 1 3 4 5 2 2 5 0 4 3 4 5 1 0 3 5 5 5 1 1 1 1 1 4 2


39 25 29 11 30 22 14 47 19 12 38 51 54 8 51 28 36 12 19 16 55 14 21 36 57 8 4 24 5 8 59 58 55 28 28 57 40 16 7 3 48

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 Page 150



Chapter 09 - Predictive Data Mining 83.9 44.8 36.0 45.6 60.4 9.1 96.2 107.1 104.0 108.2 84.5 117.9 62.4 40.1 58.9 33.6 38.4 24.8 72.4 109.7 74.0 63.0 80.7 38.2 80.7 102.0 108.6 98.0 79.4 97.0 118.3 74.7 88.7 83.2 51.3 44.8 81.7 47.6 101.5 77.5 93.1

4 2 3 3 3 7 4 7 2 3 6 4 3 1 2 6 5 4 5 4 6 2 3 7 5 1 1 5 6 6 6 7 3 2 7 5 8 7 5 6 2

0 3 0 5 2 2 1 3 2 1 1 4 0 3 4 4 2 3 2 2 4 3 2 4 1 4 5 4 4 2 3 0 5 1 2 2 2 4 2 0 5


34 12 40 53 59 16 31 2 4 57 46 6 33 49 18 42 12 15 33 55 24 46 28 3 59 6 16 17 9 24 15 26 40 58 31 15 17 9 21 55 21

0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 Page 151



Chapter 09 - Predictive Data Mining 39.0 105.3 38.1 109.2 113.0 21.6 72.9 41.9 103.3 108.0 52.1 80.1 98.8 21.7 119.9 85.4 24.0 96.8 45.4 72.7 21.5 41.5 57.7 20.3 88.7 107.1 94.6 79.9 61.5 91.0 54.5 33.0 55.5 95.6 33.6 101.7 23.3 28.0 50.3 77.5 44.7

8 6 2 2 2 2 4 3 6 7 3 4 6 5 1 6 2 3 6 3 5 8 3 5 4 4 8 3 3 2 4 5 3 3 2 3 4 6 5 4 5

4 5 1 1 1 1 3 3 2 0 0 2 4 3 2 3 2 4 4 0 0 4 4 1 1 5 5 3 0 0 4 2 4 1 1 0 3 0 1 0 4


30 33 58 21 45 39 60 8 57 5 42 33 14 44 49 25 21 24 15 37 8 44 44 4 9 10 55 58 52 23 45 27 46 49 0 49 36 15 39 18 8

1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 Page 152



Chapter 09 - Predictive Data Mining 74.2 53.4 79.3 91.2 98.9 23.4 43.0 54.6 94.1 46.7 85.2 98.8 80.3 52.5 93.9 102.2 52.0 69.7 37.4 51.9 14.6 37.1 35.4 70.9 58.9 77.0 44.7 78.5 48.8 78.3 50.6 104.0 76.7 66.6 104.8 85.9 97.7 63.6 57.2 43.7 46.3

7 6 3 4 2 8 7 4 5 6 3 2 6 6 8 2 6 5 3 4 7 7 7 7 5 8 2 6 5 6 4 7 5 8 3 1 8 8 3 1 2

5 4 2 4 2 3 4 3 2 4 2 0 4 3 1 2 3 2 2 2 4 5 0 5 4 4 3 2 2 2 2 4 5 3 4 2 4 0 2 2 2


42 26 16 40 30 10 12 11 42 57 1 10 39 20 2 39 53 23 36 51 51 12 44 41 28 31 17 38 47 39 41 32 36 35 37 37 49 24 45 57 8

0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 1 0 Page 153



Chapter 09 - Predictive Data Mining 85.9 92.4 67.9 119.3 50.3 81.8 40.5 88.2 126.8 8.9 105.5 93.6 109.6 97.8 69.9 113.9 80.4 61.1 81.0 113.1 56.4 40.5 112.8 43.4 95.5 30.1 49.3 66.2 42.9 78.6 52.6 73.1 62.0 71.6 31.8 86.1 59.6 33.7 66.6 49.4 20.9

1 1 6 7 7 4 6 1 3 2 6 3 6 3 4 4 3 1 7 5 7 6 6 6 2 5 6 3 7 4 1 5 5 7 2 7 5 7 8 5 6

4 3 4 3 3 0 4 4 3 5 1 0 4 4 5 0 1 0 2 1 3 2 5 0 5 0 4 3 4 4 3 2 4 4 2 1 2 1 2 4 3


52 27 19 10 12 25 53 53 57 40 28 7 19 54 36 23 18 0 22 24 59 44 52 44 31 2 38 59 21 48 21 51 49 37 34 10 51 47 43 44 56

0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 0 0 1 1 Page 154



Chapter 09 - Predictive Data Mining 75.2 85.5 82.0 98.4 52.5 38.5 45.3 76.4 68.1 76.8 27.8 70.0 90.5 57.7 24.4 96.4 107.3 87.8 92.4 54.4 40.0 67.9 67.1 112.1 47.4 66.5 41.6 115.1 63.4 105.4 27.0 43.5 52.7 71.5 55.6 106.1 85.7 42.2 32.4 29.6 107.8

5 2 5 5 6 2 6 3 5 2 1 5 5 5 8 1 5 4 7 4 8 3 4 5 6 2 8 8 4 7 2 4 5 5 5 2 5 5 3 6 4

3 3 1 0 1 2 3 0 2 1 0 1 4 2 1 0 4 5 1 3 3 0 3 5 1 1 1 2 1 3 1 1 4 0 3 1 2 3 1 2 4


28 8 16 1 24 33 13 38 1 7 31 58 16 59 16 43 48 27 42 15 30 33 48 47 3 25 10 20 3 5 30 15 6 41 12 47 9 50 34 57 37

0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 Page 155



Chapter 09 - Predictive Data Mining 45.8 22.1 59.2 66.3 53.7 51.2 84.8 55.3 63.6 73.5 70.7 33.2 35.4 39.3 73.1 108.6 71.7 66.8 96.7 115.0 67.6 98.6 41.6 41.6 38.4 41.0 30.2 39.3 110.2 122.1 54.3 50.4 51.1 33.7 43.0 103.0 35.5 86.0 74.2 114.2 94.4

8 3 7 2 4 5 7 4 8 3 6 7 8 7 3 3 5 6 3 7 3 2 1 3 2 6 5 5 5 6 4 1 3 7 5 5 5 3 2 3 6

4 5 1 0 0 4 4 3 0 2 4 3 4 3 3 5 0 1 5 4 0 0 2 2 1 4 2 2 2 4 4 1 0 2 4 2 1 5 2 1 4


24 47 33 33 24 49 49 49 1 12 36 18 5 40 26 29 52 16 45 5 24 54 33 13 55 12 22 22 60 51 44 42 31 56 30 39 11 56 40 46 16

1 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0 Page 156



Chapter 09 - Predictive Data Mining 45.6 20.0 64.4 85.4 17.6 38.6 9.8 38.2 21.8 88.2 75.9 108.7 68.2 47.0 40.9 37.5 51.9 30.9 32.7 50.7 71.2 54.9 31.3 44.2 76.9 29.0 48.9 49.4 116.7 92.1 90.5 38.1 80.9 46.7 43.0 71.8 54.3 52.5 51.7 94.1 93.1

5 3 7 6 7 6 2 5 4 4 3 8 3 6 1 4 2 2 3 7 7 8 3 7 6 1 6 8 7 5 7 5 7 1 6 4 3 5 3 7 1

5 5 0 2 3 4 1 1 4 4 3 1 4 1 5 4 2 1 5 3 1 4 0 2 2 1 3 3 4 3 2 1 2 0 3 5 2 1 4 0 1


25 60 17 58 56 59 12 41 16 17 13 31 27 10 16 10 21 18 32 16 37 16 57 48 17 15 39 3 48 44 35 49 11 37 30 34 43 3 25 29 13

1 1 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 Page 157



Chapter 09 - Predictive Data Mining 53.6 26.8 74.3 91.2 108.8 99.4 25.0 65.4 32.5 57.2 100.2 52.1 66.4 73.4 35.0 75.8 83.6 86.1 64.5 43.6 49.0 113.1 105.8 50.9 81.5 91.5 54.3 68.9 21.9 51.0 72.1 44.8 18.4 37.1 41.1 59.7 92.2 83.4 108.8 54.5 46.3

2 5 6 3 7 1 8 7 5 4 2 2 8 7 4 3 7 4 5 7 2 7 4 2 8 1 6 4 7 5 4 1 2 3 3 7 5 5 4 7 8

4 2 1 3 4 2 0 1 4 0 1 1 1 3 5 4 4 5 2 4 2 5 1 3 3 4 4 1 3 2 3 3 3 3 4 3 3 2 4 2 3


14 46 0 1 19 30 58 57 40 21 33 55 2 59 31 45 52 60 24 45 26 55 50 32 33 40 26 19 46 8 33 40 20 37 14 38 52 34 14 30 12

0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 1 0 Page 158



Chapter 09 - Predictive Data Mining 89.1 76.9 48.3 29.9 46.2 76.9 46.8 100.5 60.1 42.0 58.4 46.1 18.1 118.3 64.3 73.9 52.0 99.7 54.7 31.2 56.0 31.5 99.4 98.4 98.7 56.1 71.9 67.1 55.4 33.0 131.5 120.4 69.9 75.8 74.7 31.5 54.3 68.1 49.1 72.8 34.1

5 2 5 4 1 1 6 7 1 7 5 7 5 1 8 3 5 5 5 7 6 4 6 3 6 3 8 4 2 7 4 8 6 2 5 4 2 3 2 3 5

3 2 4 4 4 2 4 4 0 4 5 4 3 1 4 2 1 1 1 1 3 5 4 4 2 5 2 0 2 4 4 0 3 4 2 0 5 3 1 4 1


12 51 29 4 49 17 31 24 58 9 51 26 27 19 9 41 22 38 43 26 12 9 13 7 37 26 4 58 17 28 27 8 4 50 37 55 58 28 49 33 13

0 0 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 Page 159



Chapter 09 - Predictive Data Mining 88.3 80.8 47.7 15.9 79.2 27.9 92.5 29.8 48.1 97.1 45.6 83.0 34.4 79.4 34.6 44.5 77.4 87.9 48.3 45.4 56.9 35.6 52.5 56.1 20.6 76.6 82.1 79.2 131.7 107.3 92.0 43.0 79.5 52.4 63.6 35.8 103.5 57.8 118.1 70.4 37.6

3 5 7 1 6 6 1 4 7 3 7 2 2 1 2 1 1 6 7 4 4 7 4 5 6 8 3 7 3 3 5 4 6 5 6 6 4 6 2 1 3

3 2 0 4 3 4 4 1 2 3 2 1 3 3 5 0 2 4 1 3 5 0 4 4 1 4 4 1 3 3 2 3 3 2 4 5 4 0 5 3 1


15 40 13 9 0 29 47 9 48 34 29 22 21 32 16 20 23 32 53 13 5 19 8 16 36 18 15 4 28 1 51 13 34 32 9 48 60 35 41 1 44

0 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 Page 160



Chapter 09 - Predictive Data Mining 107.1 119.3 73.6 48.6 89.8 66.5 52.6 108.1 58.9 48.7 76.2 101.3 50.2 57.3 110.9 75.9 56.4 93.7 75.9 73.4 44.5 86.6 42.3 47.8 84.1 115.4 75.0 119.1 28.9 110.5 55.9 118.4 79.1 125.4 73.0 96.4 115.6 36.4 48.6 83.5 33.6

6 6 2 5 6 5 3 5 5 4 1 5 3 3 6 4 6 3 2 3 3 3 5 3 4 5 2 4 1 3 1 2 1 6 7 2 2 7 2 4 5

1 3 2 3 1 4 2 3 2 1 3 2 1 5 5 3 1 1 1 4 3 0 2 3 4 1 2 5 3 2 1 5 3 2 4 2 4 2 2 1 1


11 29 30 28 28 12 36 39 20 27 55 42 50 32 20 47 48 58 30 18 26 15 48 4 30 43 12 35 5 59 30 51 17 1 46 9 57 35 39 8 55

0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 Page 161



Chapter 09 - Predictive Data Mining 91.1 70.7 93.6 107.0 76.7 50.0 47.1 51.5 115.6 82.0 107.7 124.8 100.1 36.1 49.6 58.9 89.2 27.3 120.1 127.9 113.8 20.9 44.7 107.8 55.4 59.8 54.4 70.3 109.5 74.2 102.0 53.0 62.4 108.2 72.1 52.4 81.0 59.9 91.2 40.3 34.4

7 5 3 2 3 5 7 6 8 1 7 7 5 4 6 5 5 6 3 6 5 7 3 4 6 7 4 7 7 3 7 5 3 3 7 6 4 3 3 7 2

5 3 4 5 4 3 1 4 3 5 0 5 1 3 3 1 4 2 4 4 3 2 1 4 3 4 2 3 1 5 3 2 3 2 2 4 2 1 5 3 4


14 32 3 43 5 59 18 12 36 44 38 20 31 20 38 5 57 48 7 48 58 32 10 8 29 42 23 38 8 32 7 30 48 25 55 31 38 27 22 44 17

0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 1 Page 162



Chapter 09 - Predictive Data Mining 26.7 96.1 92.1 50.0 91.2 97.0 60.9 42.9 123.5 93.5 72.6 23.5 42.7 35.2 48.9 42.9 86.8 66.9 96.5 125.0 37.7 81.0 36.4 105.9 81.4 95.2 40.6 117.9 72.4 82.6 47.4 24.3 75.4 40.2 68.3 19.3 8.3 27.9 90.4 49.0 104.4

2 4 3 7 7 2 7 6 4 6 5 8 3 8 5 3 2 3 4 2 7 2 7 3 3 7 2 4 2 8 7 2 1 5 4 3 8 2 2 2 7

3 1 3 2 2 1 2 1 5 3 3 4 4 1 3 4 0 2 1 2 3 1 4 0 1 3 0 2 0 1 1 5 3 3 5 0 2 4 4 4 2


53 36 60 33 44 51 15 38 0 46 1 53 31 16 24 15 22 0 33 11 10 17 12 52 14 35 41 57 54 14 1 9 36 23 26 8 43 39 44 43 1

1 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 Page 163



Chapter 09 - Predictive Data Mining 43.9 28.8 38.8 56.2 55.2 98.3 38.7 89.3 77.4 68.2 51.8 46.5 46.6 67.5 49.2 78.3 67.0 88.7 93.0 41.1 83.6 98.8 100.7 48.5 54.9 53.9 44.9 44.5 44.0 72.4 97.5 112.7 32.6 106.8 61.2 115.0 85.1 53.7 120.4 46.1 114.8

1 5 8 2 7 7 2 3 5 4 6 1 6 1 7 4 7 2 4 2 7 8 3 8 8 7 5 2 6 2 8 4 8 4 5 4 7 3 7 1 6

1 5 0 3 1 5 0 5 5 3 4 2 1 0 5 3 2 0 5 1 0 1 0 4 0 2 2 2 2 0 1 3 3 3 4 4 1 3 3 0 0


26 47 17 53 0 2 3 30 53 7 57 41 17 10 33 51 29 4 34 18 51 7 45 1 52 10 32 4 28 29 33 49 15 52 55 14 47 46 38 3 52

0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 Page 164



Chapter 09 - Predictive Data Mining 97.5 93.5 41.4 51.2 63.8 89.5 110.1 85.2 73.0 75.9 86.5 41.1 72.7 107.5 59.2 91.1 34.2 89.4 43.0 35.3 99.9 49.4 105.7 50.2 20.2 84.7 95.7 95.0 40.2 82.0 43.3 22.2 72.7 69.8 89.7 82.7 27.9 95.0 56.3 17.4 71.2

4 5 7 3 5 6 4 4 6 5 2 3 4 6 3 4 4 3 1 1 6 2 8 2 4 2 3 1 6 7 4 4 2 6 1 5 7 6 1 4 8

0 5 3 2 1 4 5 0 5 1 1 5 0 3 4 4 3 1 1 2 0 1 3 0 3 2 4 2 3 5 2 4 3 2 2 4 4 3 5 2 0


25 4 59 17 28 57 51 56 50 27 57 31 1 15 42 19 44 31 21 10 11 21 9 41 40 28 40 26 40 53 4 33 11 42 2 18 29 48 30 33 34

0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0 Page 165



Chapter 09 - Predictive Data Mining 19.2 104.5 78.8 61.3 22.9 97.1 43.7 97.6 77.8 62.5 50.4 42.0 112.9 35.9 50.8 109.3 83.6 119.7 23.4 83.7 73.8 52.1 30.9 86.1 123.6 98.9 60.0 114.8 95.0 78.7 47.5 74.6 70.3 112.8 73.8 93.4 48.4 83.9 40.7 76.8 100.4

4 3 3 7 6 4 3 6 4 3 3 3 4 7 1 5 4 6 4 6 3 3 8 2 5 3 5 7 1 4 5 5 5 1 7 8 5 7 7 5 1

4 0 2 4 4 2 2 3 4 1 5 1 1 2 5 0 4 3 4 2 2 3 2 2 3 4 3 5 1 3 2 4 4 2 4 3 0 1 1 3 0


20 5 51 20 58 42 14 10 54 17 22 55 47 55 35 8 1 51 10 47 44 15 31 26 3 21 17 46 28 16 47 59 57 56 5 12 58 3 2 46 52

1 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 Page 166



Chapter 09 - Predictive Data Mining 27.0 94.6 85.5 127.0 51.9 53.8 30.7 44.2 52.3 62.4 18.1 69.5 63.2 35.6 93.3 99.9 114.8 78.0 25.6 76.0 44.9 39.3 98.5 78.0 49.7 33.5 56.6 85.0 25.2 107.0 41.5 37.7 117.2 13.9 53.8 50.8 28.4 36.0 52.7 74.7 69.9

2 6 7 7 5 8 7 2 6 6 5 2 1 6 3 7 1 1 3 7 1 2 8 8 5 4 6 8 4 7 2 5 5 6 8 7 3 8 3 1 5

3 2 4 4 3 3 3 4 1 4 1 4 1 2 3 0 3 1 4 4 1 2 2 4 5 0 3 2 2 3 4 3 3 2 1 2 3 2 5 0 4


45 29 4 43 4 1 16 28 24 31 35 23 39 58 47 2 43 40 30 32 6 54 60 1 34 54 26 60 32 1 10 50 19 42 12 23 25 28 58 10 35

1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1 1 0 0 Page 167



Chapter 09 - Predictive Data Mining 43.6 91.0 120.6 81.6 16.9 75.3 83.6 103.1 44.0 45.7 72.3 53.3 89.2 27.7 80.4 47.9 74.8 86.7 62.8 39.7 50.7 53.6 33.7 51.2 35.1 61.6 91.0 76.3 67.0 84.8 103.3 20.4 118.5 30.9 20.7 95.2 67.3 81.7 60.9 85.5 98.0

6 4 7 5 4 5 2 8 3 5 6 4 6 6 3 3 7 6 6 6 6 6 3 1 2 5 1 5 5 7 6 1 6 4 5 1 3 1 4 8 4

1 1 2 2 0 5 2 3 5 3 4 2 5 2 2 1 1 3 1 4 1 1 3 0 1 1 2 3 2 5 4 3 1 5 1 3 3 3 3 3 4


53 37 26 56 23 28 36 18 12 54 38 42 46 32 23 48 37 3 23 34 10 16 19 15 60 45 57 43 7 27 45 55 46 15 10 49 25 33 49 24 46

0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 Page 168



Chapter 09 - Predictive Data Mining 99.2 62.5 112.7 94.9 48.2 16.8 31.6 69.2 85.7 59.6 100.4 103.6 29.5 77.4 60.7 75.3 48.5 87.1 78.4 50.4 68.2 90.1 116.6 40.6 51.5 78.4 109.3 93.4 23.8 73.2 44.8 74.9 35.9 119.1 47.5 122.1 106.8 71.9 77.1 40.9 79.4

7 3 3 4 7 4 5 3 6 5 5 5 2 7 5 2 8 4 8 5 5 1 4 2 3 2 3 2 6 5 7 3 6 7 2 1 5 5 3 3 7

1 3 1 5 1 4 5 1 0 5 3 1 0 2 0 0 3 5 4 3 0 1 2 2 1 0 0 1 5 5 4 4 1 1 1 4 5 3 1 2 4


57 20 23 34 1 28 47 55 53 53 46 56 31 45 8 28 57 17 55 32 1 26 32 17 44 52 59 38 3 15 0 8 30 54 14 52 44 50 14 47 42

0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 Page 169



Chapter 09 - Predictive Data Mining 114.3 62.0 19.9 58.0 49.6 67.5 103.1 54.3 31.8 115.2 53.4 39.2 40.8 38.6 5.6 70.4 74.0 113.0 115.8 70.6 50.8 38.4 121.0 60.0 119.1 37.3 54.7 58.8 87.0 130.4 53.5 24.1 94.7 89.0 78.1 89.0 70.7 94.8 104.6 73.2 93.1

4 7 2 8 6 4 5 3 3 3 5 1 7 5 4 3 2 2 8 7 6 2 2 8 7 5 2 2 7 7 1 7 4 1 4 2 8 4 5 7 1

1 4 4 2 2 4 1 5 4 0 4 2 2 5 1 3 3 0 0 3 3 2 3 1 4 2 3 4 5 0 2 2 0 3 2 0 2 2 5 4 5


31 17 13 13 37 16 18 27 44 37 38 40 24 13 35 42 36 54 36 17 56 33 6 35 25 31 42 13 13 29 48 32 58 13 18 41 14 48 43 41 38

0 0 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 Page 170



Chapter 09 - Predictive Data Mining 45.2 33.5 100.9 34.2 71.4 55.0 97.2 20.5 73.5 47.2 44.7 25.5 53.7 82.7 22.5 76.7 47.2 125.0 93.0 112.0 27.1 45.6 38.2 54.3 71.4 95.3 9.8 74.4 119.2 40.1 44.6 37.9 36.5 106.3 97.7 65.4 21.9 66.0 36.1 43.0 90.6

5 3 3 3 5 4 5 5 3 5 6 5 3 7 6 8 6 6 3 3 2 7 5 8 4 3 6 8 1 3 4 2 6 2 7 6 7 2 2 4 6

2 3 5 3 5 5 1 4 2 1 3 4 2 3 1 5 5 1 5 0 1 2 2 2 1 4 4 2 1 2 3 2 5 5 0 0 2 4 1 2 1


31 6 47 29 18 24 14 22 5 59 48 7 43 50 18 57 40 32 3 44 26 29 51 56 49 28 30 24 34 24 20 33 55 14 43 23 52 50 60 49 5

1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1 0 0 0 1 0 0 1 0 Page 171



Chapter 09 - Predictive Data Mining 94.9 108.8 56.2 54.5 63.7 123.0 63.7 107.3 81.4 98.0 106.7 81.2 30.2 83.5 47.2 37.1 100.7 44.4 122.0 77.1 123.5 84.3 96.8 32.8 35.1 21.8 41.6 98.6 103.1 53.0 51.6 16.6 51.1 69.8 87.6 104.5 37.5 26.1 77.2 40.1 22.5

7 3 5 2 5 3 8 7 8 3 8 4 6 3 4 8 5 2 6 3 4 7 2 4 5 1 6 1 7 6 4 5 1 4 3 8 5 6 6 4 1

3 1 0 5 1 4 4 0 4 3 0 2 1 3 2 2 0 2 3 4 1 4 1 2 3 0 3 1 4 4 1 5 4 2 0 1 3 4 3 3 1


29 53 32 49 5 50 49 41 59 4 21 32 30 35 12 0 48 42 4 29 39 4 52 37 27 47 17 58 36 18 55 56 42 30 2 4 37 7 54 28 48

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 Page 172



Chapter 09 - Predictive Data Mining 65.3 26.4 58.1 58.8 77.1 91.7 83.0 113.2 96.1 40.7 46.8 71.7 26.7 30.2 70.9 92.6 68.1 76.1 38.6 34.6 55.3 72.5 59.5 64.7 54.9 42.3 88.1 73.3 105.2 55.2 108.8 91.5 48.9 5.4 89.9 59.5 21.6 111.8 33.6 101.4 44.2

7 5 4 3 5 5 7 5 1 5 7 4 7 5 3 1 2 7 6 6 6 4 2 7 1 2 2 4 4 6 7 3 4 6 4 5 5 2 2 5 8

0 4 3 2 0 2 1 3 3 5 2 3 5 5 1 2 3 1 4 4 3 0 1 1 4 4 0 4 4 4 4 0 1 1 1 3 4 3 1 2 0


20 26 24 40 21 1 31 30 56 51 44 28 53 30 52 23 39 20 23 17 33 56 56 20 24 54 6 22 33 33 20 42 44 23 15 48 50 57 20 14 1

0 1 1 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 Page 173



Chapter 09 - Predictive Data Mining 67.4 20.4 50.1 89.1 44.1 36.8 47.8 77.1 45.5 25.1 60.7 68.8 79.9 102.8 45.3 93.9 76.6 92.3 63.4 51.9 49.3 51.6 83.3 80.4 105.8 69.7 35.6 100.8 79.7 101.4 42.8 59.3 29.4 110.4 73.6 15.0 73.5 38.0 86.9 58.7 63.2

2 2 3 2 2 3 3 4 2 6 2 5 8 2 2 6 2 4 2 2 5 2 1 8 7 8 6 2 2 2 8 4 7 6 1 2 2 2 3 7 1

2 3 3 1 3 4 2 2 4 1 0 4 2 4 0 3 3 5 3 2 3 5 1 2 4 4 2 3 1 2 1 5 4 5 4 2 2 3 2 1 1


54 17 49 38 30 14 17 30 10 15 35 25 7 25 45 4 50 42 31 53 56 58 48 31 12 37 10 42 43 3 27 53 42 5 15 34 35 15 14 33 26

0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 Page 174



Chapter 09 - Predictive Data Mining 93.7 35.0 41.8 103.0 64.6 42.0 53.0 110.8 38.9 126.8 37.5 54.7 39.5 42.4 109.4 44.2 89.4 83.5 103.1 34.2 49.6 73.6 50.7 40.2 40.3 84.2 103.6 41.5 72.4 47.3 94.3 95.8 56.5 39.0 109.0 46.3 10.4 42.2 90.4 97.4 44.2

6 6 6 7 2 3 3 4 2 7 7 7 5 3 8 1 6 4 7 4 2 1 2 6 1 2 6 7 6 8 4 3 8 6 6 7 6 8 5 4 5

3 1 3 4 2 0 2 2 5 3 4 1 1 3 2 1 3 0 0 3 2 4 4 2 1 5 1 1 4 4 1 4 3 3 4 1 1 2 1 4 1


59 7 3 34 46 47 42 22 28 33 9 36 28 51 55 58 31 22 41 8 12 51 39 2 6 9 51 48 50 51 19 11 5 23 37 6 2 48 45 19 26

0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 Page 175



Chapter 09 - Predictive Data Mining 19.3 49.8 72.3 41.2 28.5 29.3 73.6 84.8 56.2 76.4 48.7 78.4 22.9 34.0 58.2 89.4 117.1 45.0 112.8 107.0 69.0 51.0 93.5 31.5 116.2 40.6 67.2 83.7 79.5 87.4 39.6 78.3 13.4 51.2 88.5 29.5 117.9 98.4 81.4 28.8 86.4

6 3 3 2 2 1 7 2 3 7 3 2 8 2 6 6 5 3 4 3 4 6 4 4 7 1 4 3 4 3 7 4 7 3 1 2 1 4 7 7 6

1 5 1 5 1 2 4 3 3 3 1 4 4 0 4 3 3 2 1 4 4 2 2 5 2 1 2 0 1 5 5 1 1 4 4 3 4 3 5 0 0


15 55 28 59 14 3 30 18 9 3 8 7 41 50 49 12 53 55 57 11 54 47 8 37 60 24 41 19 59 42 30 5 26 58 54 57 25 20 8 40 17

0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 Page 176



Chapter 09 - Predictive Data Mining 118.3 92.0 83.1 30.7 57.7 59.5 13.7 54.4 124.8 84.6 117.3 28.6 67.4 120.2 39.6 19.7 83.8 66.9 88.7 61.1 42.5 13.1 40.9 113.0 99.4 104.5 80.2 48.1 83.8 49.5 61.2 39.4 40.5 76.4 51.0 49.2 37.4 57.1 69.0 78.4 97.1

7 6 3 3 8 1 7 1 4 1 4 5 7 4 2 8 2 6 2 6 4 7 4 4 8 7 2 3 4 4 1 5 7 2 3 6 4 2 8 5 2

2 4 4 0 4 5 1 2 2 3 3 1 2 2 2 5 3 2 2 4 5 4 1 2 4 5 4 0 3 4 1 0 0 4 4 2 1 5 5 2 5


11 29 40 47 1 8 11 37 34 58 43 53 9 11 5 47 23 12 7 32 38 35 15 12 31 22 8 32 2 18 47 37 52 44 51 38 21 16 20 58 21

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 Page 177



Chapter 09 - Predictive Data Mining 30.1 87.8 81.8 80.5 40.0 113.4 79.8 26.3 113.1 67.7 82.0 52.3 53.0 16.7 65.6 49.2 54.8 71.8 50.5 22.9 102.5 25.4 32.7 92.3 52.2 89.4 49.6 61.3 20.7 65.6 98.2 58.9 91.1 88.7 63.1 119.2 82.5 118.8 73.7 26.9 76.6

4 6 2 5 4 8 5 2 6 2 1 3 4 3 3 1 5 6 5 5 3 1 5 1 5 7 4 1 3 1 8 4 5 3 7 4 5 3 2 2 2

4 4 4 1 4 0 5 4 4 2 3 1 3 0 2 0 3 0 0 1 3 3 2 2 0 1 2 1 2 3 2 1 3 4 1 2 3 3 2 2 3


59 22 26 52 41 4 41 6 27 42 36 12 2 25 16 45 57 34 6 9 45 48 25 59 24 33 55 44 58 3 37 13 12 36 39 1 3 11 9 2 36

1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 Page 178



Chapter 09 - Predictive Data Mining 111.5 27.0 101.6 72.4 63.1 48.2 117.2 90.9 114.8 92.7 35.2 40.9 114.7 101.6 62.2 55.1 52.6 107.7 76.5 95.2 29.1 33.7 72.1 80.3 50.6 63.4 20.9 92.8 73.7 71.9 50.5 121.3 51.9 33.6 90.3 29.6 104.0 102.3 51.4 89.0 66.4

1 3 6 5 6 5 6 3 4 5 7 4 8 4 7 7 4 7 6 2 3 6 7 2 6 5 3 2 8 2 7 5 3 8 4 4 5 4 1 3 2

0 5 3 3 1 2 0 4 1 3 5 1 2 1 2 4 5 3 1 4 3 3 5 3 1 4 4 1 3 3 3 0 1 3 4 4 2 5 0 2 3


9 9 7 57 14 7 48 33 18 20 26 0 46 6 56 29 35 52 7 9 24 50 6 57 47 42 13 59 3 50 18 35 14 41 11 16 20 4 10 31 42

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 Page 179



Chapter 09 - Predictive Data Mining 104.1 63.9 27.5 69.9 58.6 56.0 67.5 85.1 40.1 82.8 63.6 106.9 42.0 63.0 76.7 81.4 37.0 51.1 76.5 81.6 103.0 127.0 31.6 55.4 26.4 25.7 13.1 90.2 29.5 81.6 108.8 33.2 69.4 54.2 59.3 88.2 117.8 58.7 87.1 51.9 76.3

6 3 2 6 6 8 8 3 5 7 7 3 7 3 7 2 3 2 6 2 4 2 2 8 7 8 2 7 5 3 5 3 2 2 2 7 6 1 5 3 8

4 3 5 1 5 1 0 3 3 1 1 2 1 1 2 3 4 4 2 2 0 1 4 5 4 1 3 0 1 4 5 5 4 1 2 4 0 3 3 3 2


56 43 34 4 36 45 10 8 33 18 14 50 5 16 43 59 58 58 17 24 15 19 21 22 44 36 15 5 56 20 40 48 35 41 19 41 38 17 46 51 34

0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 Page 180



Chapter 09 - Predictive Data Mining 43.1 68.0 41.5 77.6 114.4 67.3 45.3 87.1 90.5 22.4 99.4 79.2 110.7 31.7 96.0 10.3 103.4 79.9 63.0 40.2 54.3 46.2 51.4 113.1 94.6 50.0 65.0 42.4 56.1 84.5 52.0 68.4 57.6 26.7 61.1 32.8 93.8 81.4 22.4 102.0 117.9

7 1 7 7 2 4 3 5 8 2 1 2 8 1 4 5 5 2 7 2 3 4 4 7 7 2 3 6 4 1 5 3 7 7 5 7 5 8 3 7 4

3 4 5 4 1 5 4 4 4 1 4 5 2 5 4 2 3 4 4 1 1 3 3 4 1 3 3 4 3 1 4 5 3 1 4 4 3 5 5 0 5


39 20 47 13 60 37 33 12 60 44 49 39 49 6 26 14 49 50 19 48 51 15 28 21 23 25 7 38 14 33 36 40 3 24 8 26 2 50 35 5 10

1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 Page 181



Chapter 09 - Predictive Data Mining 79.8 81.7 101.4 107.1 34.1 79.4 43.0 43.7 92.5 71.7 58.6 34.3 22.0 107.2 99.6 99.0 44.8 31.8 93.2 96.6 83.4 24.2 45.2 95.7 29.6 29.9 102.4 36.8 42.1 78.1 98.3 35.1 30.7 122.3 61.1 42.4 109.4 81.9 39.4 44.2 112.7

2 7 7 2 8 7 5 6 7 3 3 7 1 5 4 4 7 1 6 7 2 5 7 8 6 8 5 2 3 2 3 7 3 1 4 3 6 1 3 7 1

0 4 3 2 2 4 0 3 1 3 0 2 3 0 5 4 2 4 4 5 2 1 4 1 3 4 4 3 4 3 3 0 2 2 3 3 1 5 3 1 2


29 34 23 46 56 6 44 46 24 33 24 26 23 40 33 48 1 23 26 57 56 25 53 57 31 4 42 24 51 50 57 28 7 44 5 15 6 54 38 25 35

0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 Page 182



Chapter 09 - Predictive Data Mining 63.0 62.2 80.3 107.5 71.6 74.7 47.3 120.3 80.9 37.5 96.7 54.2 104.9 28.5 57.5 50.5 96.3 46.6 85.0 110.3 39.0 96.8 37.9 101.5 50.2 83.0 52.1 108.0 112.5 111.4 84.9 94.7 63.1 96.2 76.7 132.4 110.2 80.3 83.8 93.9 119.8

5 2 1 4 1 5 3 3 3 1 6 7 2 5 3 4 7 1 4 7 7 2 5 2 5 4 6 7 3 7 6 1 2 4 4 4 5 3 4 5 4

5 4 3 2 5 3 4 2 4 5 5 4 4 5 0 0 2 5 4 4 2 3 5 2 5 5 3 3 2 4 3 3 3 2 4 1 1 2 1 2 0


50 59 56 58 25 59 19 15 25 38 28 33 57 59 5 46 30 55 30 51 32 55 5 55 35 1 19 49 30 10 44 25 37 7 50 25 26 16 36 14 23

0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 183



Chapter 09 - Predictive Data Mining 29.2 56.3 42.6 104.7 83.6 40.8 50.1 41.5 98.5 36.8 105.9 108.3 73.0 50.3 95.3 75.7 85.3 104.4 101.9 72.9 46.0 37.0 55.6 50.8 74.8 34.0 46.1 38.6 22.6 123.4 96.1 78.3 90.2 41.1 75.5 104.8 51.2 48.3 57.8 119.5 78.3

4 4 2 4 3 7 5 6 7 4 2 5 5 8 3 5 7 3 3 5 4 4 4 6 6 6 3 1 3 8 3 4 7 7 4 7 4 7 1 4 7

2 2 3 4 4 3 1 3 4 4 1 1 2 3 1 1 3 1 1 4 4 4 5 4 1 0 2 4 4 3 4 2 2 1 0 1 0 4 4 1 1


31 50 7 30 39 59 19 20 19 3 58 55 45 17 2 42 31 53 55 18 25 48 40 0 21 19 19 14 22 31 17 24 23 50 20 8 13 31 49 19 36

1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 Page 184



Chapter 09 - Predictive Data Mining 74.3 32.8 89.6 98.3 91.3 86.2 26.7 51.5 41.5 51.5 94.5 68.2 42.9 101.7 27.3 38.3 41.2 54.2 27.4 95.9 72.0 31.3 29.4 45.8 121.9 45.1 87.9 33.1 49.5 26.5 91.3 67.8 72.8 67.7 79.6 92.6 39.0 38.8 82.2 107.6 36.7

4 8 5 3 8 4 8 1 1 4 3 1 1 7 4 4 4 3 8 7 2 2 3 7 6 2 2 1 5 3 7 3 3 2 2 6 3 5 4 6 6

2 0 2 1 2 4 2 1 3 1 4 3 5 2 2 2 3 1 3 4 4 1 3 3 4 3 1 2 3 1 2 5 4 4 4 3 1 4 4 3 2


47 42 52 54 49 59 55 47 17 1 35 54 58 13 35 2 11 32 25 46 10 48 43 35 6 32 41 34 24 27 46 12 28 60 32 58 51 25 24 30 23

0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 Page 185



Chapter 09 - Predictive Data Mining 34.6 49.4 109.0 57.6 48.7 101.8 107.6 47.3 96.7 120.0 52.3 39.1 48.0 31.2 18.5 30.7 108.3 36.3 98.8 94.1 31.7 101.7 39.1 44.0 49.0 87.9 33.9 91.4 51.7 65.9 47.9 53.8 77.7 72.9 7.4 52.4 34.4 97.8 16.0 63.5 84.2

7 6 6 2 2 2 6 2 7 6 7 3 2 7 8 6 3 1 7 6 4 4 7 6 6 2 5 8 4 3 3 6 5 6 6 4 6 6 2 4 2

1 0 4 3 5 3 1 2 4 5 3 4 2 1 1 1 2 3 0 0 2 1 4 5 3 2 5 2 0 3 4 3 5 2 3 1 3 4 3 2 1


33 41 25 13 14 48 29 0 50 27 45 34 54 58 40 39 10 0 37 19 21 39 11 4 37 7 2 45 59 59 56 52 58 42 20 49 36 5 39 21 15

0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 0 0 Page 186



Chapter 09 - Predictive Data Mining 116.5 60.8 52.4 32.7 47.9 100.8 35.1 127.4 47.0 58.3 81.8 9.5 14.0 92.5 103.1 29.6 45.1 56.7 31.9 85.5 66.1 74.9 46.3 97.8 44.3 48.2 35.0 71.4 45.5 90.9 49.5 110.4 33.5 49.5 51.4 46.7 49.3 97.2 44.8 33.6 105.7

7 2 6 5 6 2 8 4 7 7 5 1 2 5 7 5 3 4 1 7 6 2 2 8 2 8 4 4 3 4 7 7 6 3 4 4 7 6 7 7 3

4 5 5 1 1 0 3 4 1 5 1 4 3 0 3 4 3 2 3 0 1 1 3 0 5 4 2 2 4 2 3 3 4 4 5 2 4 1 3 3 1


15 4 55 2 37 29 56 51 37 48 8 57 55 50 2 30 9 22 40 7 56 8 1 13 25 53 50 24 49 5 5 18 13 51 51 12 17 60 13 59 6

0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 1 0 1 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 Page 187



Chapter 09 - Predictive Data Mining 40.5 70.4 83.3 47.2 28.8 34.5 74.7 49.2 96.5 35.6 46.3 69.2 95.0 124.7 77.6 104.8 70.6 19.3 31.2 77.6 99.3 38.6 55.3 82.9 79.2 73.0 50.0 95.4 118.7 43.0 76.9 57.6 113.8 84.5 60.8 95.7 20.3 49.8 125.5 104.2 16.0

2 4 4 3 7 4 2 8 4 5 5 5 5 5 4 6 5 6 3 8 6 6 2 6 3 5 1 3 6 7 5 2 6 8 4 5 8 7 6 4 1

3 3 0 4 4 4 3 4 4 4 3 4 1 4 4 4 2 3 5 5 3 1 1 2 2 1 2 5 1 3 3 5 1 2 4 2 0 3 0 4 3


8 46 44 2 54 38 25 3 8 39 11 1 59 43 8 26 57 34 10 8 30 52 5 8 57 15 49 20 9 40 21 21 12 36 30 5 18 32 53 51 4

0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 Page 188



Chapter 09 - Predictive Data Mining 55.5 102.9 52.5 83.9 50.5 96.7 90.4 123.8 20.5 79.5 126.1 85.4 40.5 100.1 33.3 43.4 43.3 53.6 37.6 65.3 57.3 38.1 101.7 70.5 51.8 96.5 81.3 74.4 89.1 78.9 89.6 113.5 69.1 46.1 86.7 68.4 118.1 93.9 51.5 51.3 28.4

6 7 7 7 2 5 7 3 1 3 4 4 7 2 2 6 2 2 4 1 2 2 2 4 3 7 2 6 4 4 7 1 7 6 3 6 6 2 6 3 7

3 2 2 4 2 1 4 0 1 1 3 4 4 3 3 2 2 3 3 4 0 3 2 3 4 2 5 4 2 3 4 4 2 0 4 4 1 2 4 3 1


55 21 47 8 26 26 9 54 25 37 40 52 17 16 30 59 14 27 23 12 2 35 9 4 30 26 20 20 56 10 25 47 31 40 42 20 58 6 16 49 30

1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 Page 189



Chapter 09 - Predictive Data Mining 60.3 16.6 55.4 102.6 53.9 99.8 29.2 50.3 81.1 95.9 105.9 35.1 27.9 45.6 13.5 61.6 96.0 115.0 86.0 62.9 66.8 76.4 48.2 52.3 106.3 96.5 81.2 96.2 14.3 125.1 109.5 44.4 75.1 35.6 50.7 49.9 55.5 70.8 77.7 114.7 54.8

2 2 5 8 5 5 1 4 6 5 3 6 1 6 1 8 7 1 7 2 2 7 5 4 8 3 2 1 2 2 7 7 7 6 2 2 7 8 2 7 6

1 1 2 1 2 3 2 3 1 1 4 3 3 5 4 4 0 3 1 2 3 5 1 5 4 4 2 2 5 3 4 2 5 2 4 5 2 1 3 1 5


59 20 56 42 52 6 56 10 56 51 27 4 13 20 34 18 48 3 54 3 59 40 47 55 45 42 17 16 37 41 23 15 53 4 46 31 12 37 2 31 48

0 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 Page 190



Chapter 09 - Predictive Data Mining 95.1 53.6 45.8 95.5 95.1 91.5 94.1 43.2 80.0 39.4 100.3 47.9 88.2 42.3 80.7 76.6 71.5 11.5 96.8 44.1 47.0 87.7 53.8 114.3 105.5 25.1 84.1 90.7 99.9 63.3 102.9 83.8 38.0 41.6 128.8 36.7 100.4 47.8 114.8 91.9 114.2

4 1 7 2 3 2 8 7 2 6 3 3 4 6 4 3 7 2 2 6 6 4 4 6 2 7 3 3 5 5 4 1 3 7 7 3 8 2 3 7 1

0 0 1 1 1 3 1 4 3 0 3 4 2 1 4 5 2 4 2 3 4 0 2 0 1 1 3 0 1 2 1 3 3 0 3 3 1 2 0 4 3


44 6 4 3 49 47 0 50 35 34 6 29 12 8 14 9 7 4 51 36 42 47 23 42 34 59 13 22 21 47 51 33 20 35 5 16 33 11 60 24 17

0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 Page 191



Chapter 09 - Predictive Data Mining 24.3 50.3 18.8 73.1 74.6 97.7 48.5 64.2 88.1 69.9 79.9 23.8 55.5 55.7 86.4 63.2 53.1 29.6 76.9 98.4 91.9 75.4 45.2 80.9 103.7 46.8 26.8 64.8 37.3 80.1 99.9 85.2 48.1 12.7 8.3 60.9 122.6 55.6 93.0 50.6 69.3

3 3 1 6 4 5 7 5 3 7 5 6 4 5 6 2 1 1 3 5 8 3 7 4 6 6 6 7 7 3 7 3 6 2 3 6 7 8 3 4 7

3 2 2 1 2 2 1 1 1 5 4 2 4 3 3 2 4 1 4 2 0 3 3 4 3 1 2 3 0 0 5 1 1 3 3 0 2 3 4 1 1


51 32 4 20 10 58 15 23 38 19 7 3 26 3 29 24 18 10 43 2 51 50 38 32 22 53 52 2 56 8 35 10 15 35 55 58 33 39 41 28 32

1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 Page 192



Chapter 09 - Predictive Data Mining 43.9 93.1 49.0 57.3 67.8 14.8 130.2 80.8 112.7 40.6 115.0 32.9 112.5 132.9 58.4 105.9 57.1 43.2 60.0 107.8 124.5 128.9 92.1 50.7 28.4 13.6 74.9 101.2 124.7 41.8 49.3 46.0 54.0 104.9 94.8 100.6 39.6 90.8 95.2 97.4 105.0

2 8 5 7 4 5 5 8 5 7 4 7 3 2 2 6 3 3 3 7 8 2 5 2 2 8 2 2 6 8 3 3 4 7 5 6 4 6 6 5 5

2 4 4 1 3 1 4 2 1 0 1 3 2 5 2 2 2 4 4 2 1 3 4 2 3 1 3 4 3 4 2 2 5 1 3 3 1 3 5 2 2


34 14 12 22 35 25 43 39 30 12 22 44 35 45 8 36 27 35 2 11 1 53 8 12 10 14 56 13 44 58 45 46 12 33 34 44 24 36 4 18 4

1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 Page 193



Chapter 09 - Predictive Data Mining 92.6 115.8 65.2 79.4 39.7 51.7 56.1 94.3 33.8 22.4 38.8 62.6 66.7 58.1 53.5 100.0 89.8 59.5 52.7 65.7 89.6 121.8 42.0 91.1 82.4 60.0 44.6 46.9 34.6 124.4 79.1 79.1 63.9 67.7 101.0 51.9 17.8 94.7 80.6 28.6 84.6

3 8 7 7 4 3 4 3 4 8 6 3 7 7 7 2 2 3 5 7 2 3 3 5 7 4 6 8 3 2 7 1 4 7 8 5 1 6 3 8 3

2 1 0 2 3 0 4 5 4 0 3 1 1 4 4 0 2 5 2 3 0 4 4 3 1 0 4 2 1 3 3 1 5 4 2 3 5 3 4 3 4


4 19 19 26 57 40 39 26 16 29 11 10 0 29 18 21 13 24 7 39 52 22 3 38 57 8 22 26 20 52 3 10 28 20 31 54 55 60 4 52 13

0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 1 0 Page 194



Chapter 09 - Predictive Data Mining 114.1 52.3 49.3 117.6 52.5 93.8 74.9 35.1 38.1 113.7 41.5 71.8 86.0 81.8 28.7 37.3 45.8 27.8 97.2 29.6 83.0 84.0 51.4 61.5 110.2 75.6 41.2 112.5 62.3 90.3 130.9 74.3 38.6 56.4 65.8 102.5 114.3 43.1 78.5 102.3 55.4

4 5 5 2 3 2 4 5 3 2 5 7 1 7 2 6 3 5 6 1 8 2 3 7 4 6 4 3 2 4 3 8 2 6 7 7 6 7 6 3 4

3 3 4 4 2 4 2 2 2 2 5 4 2 4 0 3 1 3 3 2 1 1 2 0 2 4 4 3 2 1 1 3 2 4 2 2 5 4 2 5 2


7 25 56 8 2 45 24 31 26 39 17 15 51 1 39 17 40 17 37 19 20 54 13 38 53 9 32 23 11 31 52 29 7 25 50 17 36 58 51 14 33

0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 Page 195



Chapter 09 - Predictive Data Mining 53.3 101.4 88.7 57.1 71.6 53.7 78.2 55.3 83.7 28.0 49.5 46.6 78.1 65.9 54.0 35.8 31.5 106.9 48.4 107.2 34.5 93.0 101.9 68.9 52.4 63.2 102.2 64.7 15.4 63.5 84.0 120.4 112.5 92.8 60.3 113.6 54.7 117.7 59.1 20.2 68.0

2 5 7 5 6 8 5 7 6 2 1 5 4 4 4 5 4 6 5 6 2 4 3 2 5 5 2 6 7 3 6 8 3 8 5 6 2 2 4 7 7

3 4 4 4 0 1 2 3 3 4 1 0 1 3 3 0 1 1 0 2 3 4 1 1 4 4 3 0 1 5 5 1 4 4 3 1 0 4 4 0 2


40 18 29 30 18 29 17 53 46 11 2 53 51 53 48 24 19 58 50 58 28 16 15 55 31 43 11 29 36 7 14 37 36 30 7 37 13 42 60 24 36

1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 Page 196



Chapter 09 - Predictive Data Mining 78.9 95.5 79.8 127.8 49.4 57.8 93.6 82.7 39.4 50.4 87.7 33.8 74.1 98.0 79.0 54.8 59.3 124.2 87.5 88.6 75.1 41.7 98.4 43.4 71.4 56.4 94.7 50.6 93.4 88.0 27.9 36.9 23.7 89.0 100.3 103.0 81.5 49.2 44.9 45.8 117.5

4 4 4 4 7 8 3 1 1 6 1 4 6 3 4 6 8 5 2 2 3 2 7 3 6 6 7 2 6 1 6 8 8 7 2 4 7 7 3 5 3

4 4 4 1 1 3 2 0 1 2 2 1 0 5 2 4 1 2 3 0 2 3 5 4 2 3 4 4 4 3 2 3 2 2 2 0 5 5 1 1 2


53 40 13 33 57 30 24 45 19 57 22 30 8 51 30 52 32 59 28 43 3 19 39 44 47 40 8 32 47 1 37 58 22 14 49 43 33 7 7 15 52

0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 Page 197



Chapter 09 - Predictive Data Mining 63.1 49.4 78.6 82.3 56.6 31.3 91.5 77.4 38.7 82.0 87.6 47.3 82.4 43.2 49.9 36.7 54.9 84.4 41.6 51.0 39.0 39.0

6 7 7 2 7 1 3 6 4 2 3 8 4 5 5 3 1 4 2 7 7 5

2 4 1 1 4 2 3 1 1 3 2 1 1 1 0 2 5 5 4 5 1 2

51 49 37 21 36 7 56 23 22 38 59 10 46 14 48 19 25 23 12 2 3 17

0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1

ANSWER: The resulting model is: log odds of the event (Default) = –1.022 – 0.097 × Annual Income + 0.210 × Years of Post-High School Education – 0.282 × Work Experience.

58. A bank wants to better understand the details of customers who are likely to default on their loans. To analyze this, data from a random sample of 200 customers are given below. Using the Partition with Oversampling procedure in appropriate software, partition the data so that there are 50 percent successes (Loan Default) in the training set and 40 percent of the validation data are taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan Default as the output variable and all the other variables as input variables. In the k-nearest neighbors Classification procedure, be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. For the cutoff probability value 0.5, what value of k minimizes the overall error rate on the validation data?

Average Balance   Age   Gender   Married   Divorced   Family Size   Loan Default
1,222.3           36    1        0         0          1             0
6,291             41    0        1         1          3             0
1,051             52    1        1         0          4             1
1,118.3           36    1        0         0          2             0


Chapter 09 - Predictive Data Mining 1,176.8 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1

35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57

0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1

0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1


2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3

0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 Page 199



1,023 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5 171.8 12,157.9 4,107 887.9

40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47

1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0

1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0


2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2

0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 Page 200



Chapter 09 - Predictive Data Mining 1,165.1 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1 1,704.6 1,245.7 16,191.8 2,185.6

44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37

0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0

1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0


3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3

0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 Page 201



Chapter 09 - Predictive Data Mining 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6 1,457.6 1,478.4 1,690.3 1,458.9

40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56

1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1

1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0


1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 202



Chapter 09 - Predictive Data Mining 1,465.4 1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 1,70.3 1,135.2 195

46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
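The k-nearest neighbors procedure described in question 58 (normalize the inputs, score each k from 1 to 10, keep the k with the lowest validation error at a 0.5 cutoff) can be sketched in plain Python. This is only an illustrative sketch, not the software procedure the question refers to: the function names, the tiny two-column data set (balance, family size), and the rule that a neighbor share exactly equal to the cutoff counts as class 1 are all assumptions, and the oversampling partition step is omitted.

```python
import math

def zscore_params(rows):
    """Column means and standard deviations, computed from the training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def normalize(row, means, stds):
    """Standardize one row with the training means and standard deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def knn_predict(train_x, train_y, x, k, cutoff=0.5):
    """Predict 1 when the share of 1s among the k nearest neighbors reaches the cutoff."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    share = sum(train_y[i] for i in nearest) / k
    return 1 if share >= cutoff else 0  # assumption: a tie at the cutoff goes to class 1

def error_rate(train_x, train_y, data_x, data_y, k):
    """Overall error rate: misclassified records divided by all records."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(data_x, data_y))
    return wrong / len(data_y)

def best_k(train_x, train_y, valid_x, valid_y, max_k=10):
    """Score every k from 1 to max_k on the validation data and return the best one."""
    max_k = min(max_k, len(train_x))
    errors = {k: error_rate(train_x, train_y, valid_x, valid_y, k)
              for k in range(1, max_k + 1)}
    return min(errors, key=errors.get), errors  # smallest k wins a tie

# Hypothetical records (average balance, family size); not taken from the bank data.
train_rows = [[500.0, 4], [800.0, 3], [900.0, 5], [2000.0, 1], [3000.0, 2], [2500.0, 2]]
train_y = [1, 1, 1, 0, 0, 0]
valid_rows = [[700.0, 4], [2800.0, 1]]
valid_y = [1, 0]

means, stds = zscore_params(train_rows)
train_x = [normalize(r, means, stds) for r in train_rows]
valid_x = [normalize(r, means, stds) for r in valid_rows]

chosen_k, errors = best_k(train_x, train_y, valid_x, valid_y)
print(chosen_k, errors)
```

Normalizing with the training set's means and standard deviations keeps Average Balance, which is measured in dollars, from dominating the distance over small-valued inputs such as Family Size.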

ANSWER: Loan default observations make up only 15.5% of the data set. By oversampling the Loan default observations in the training set, a data mining algorithm can better learn how to classify them. Customers who had defaulted on their loan are classified by:
1. The average balance is less than $951.75, or
2. The average balance is less than $1,216.30 and the family size is greater than 2.

59. A bank wants to better understand the details of customers who are likely to default on their loans. To analyze this, data from a random sample of 200 customers are given below. Using the Partition with Oversampling procedure in appropriate software, partition the data so that there are 50 percent successes (Loan Default) in the training set and 40 percent of the validation data are taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan Default as the output variable and all the other variables as input variables. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. What is the overall error rate on the test data? Interpret this measure.

Average Balance   Age   Gender   Married   Divorced   Family Size   Loan Default
1,222.3           36    1        0         0          1             0
6,291             41    0        1         1          3             0


Chapter 09 - Predictive Data Mining 1,051 1,118.3 1,176.8 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6 1,603.2 1,308.1

52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39

1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0

1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1

0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0


4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2

1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 Page 204



Chapter 09 - Predictive Data Mining 4,061.5 2,283.1 1,023 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5 171.8 12,157.9

35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50

0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1

0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1


1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3

0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 Page 205



Chapter 09 - Predictive Data Mining 4,107 887.9 1,165.1 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1 1,704.6 1,245.7

52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35

1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0

1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0


2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2

0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 Page 206



Chapter 09 - Predictive Data Mining 16,191.8 2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6 1,457.6 1,478.4

54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47

0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1

1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0


3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3

0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 207



Chapter 09 - Predictive Data Mining 1,690.3 1,458.9 1,465.4 1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 1,70.3 1,135.2 195

51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
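The overall, class 1, and class 0 error rates that question 59 asks about all come from the test-set confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical (a test set of 40 records with 6 actual defaults), chosen only because they are consistent with the rates quoted in the answer to this question. The source does not give the actual confusion matrix, and the function name is an assumption.

```python
def error_rates(tp, fn, fp, tn):
    """Return (overall, class 1, class 0) error rates from confusion-matrix counts.

    tp: actual 1 predicted 1    fn: actual 1 predicted 0
    fp: actual 0 predicted 1    tn: actual 0 predicted 0
    """
    class1 = fn / (tp + fn)                      # share of actual defaults that were missed
    class0 = fp / (fp + tn)                      # share of non-defaults flagged as defaults
    overall = (fn + fp) / (tp + fn + fp + tn)    # share of all records misclassified
    return overall, class1, class0

# Hypothetical counts: 6 actual defaults (3 missed), 34 non-defaults (1 misclassified).
overall, class1, class0 = error_rates(tp=3, fn=3, fp=1, tn=33)
print(f"overall {overall:.2%}, class 1 {class1:.2%}, class 0 {class0:.2%}")
# prints: overall 10.00%, class 1 50.00%, class 0 2.94%
```

With these counts, a low overall error rate hides the fact that half of the actual defaulters are missed, which is why the class 1 error rate is reported separately.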

ANSWER: The overall error rate is 10%. The class 1 error rate is 50%, and the class 0 error rate is 2.94%.

60. A bank wants to better understand the details of customers who are likely to default on their loans. To analyze this, data from a random sample of 200 customers are given below. Using appropriate software, partition the data so that there are 50 percent successes (Loan Default) in the training set and 40 percent of the validation data are taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan Default as the output variable and all the other variables as input variables. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. What are the Class 1 error rate and the Class 0 error rate on the test data?

Average Balance   Age   Gender   Married   Divorced   Family Size   Loan Default
1,222.3           36    1        0         0          1             0
6,291             41    0        1         1          3             0
1,051             52    1        1         0          4             1
1,118.3           36    1        0         0          2             0
1,176.8           35    0        1         0          2             0


Chapter 09 - Predictive Data Mining 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1 1,023

46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57 40

1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1

0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0


3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3 2

0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 Page 209



Chapter 09 - Predictive Data Mining 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5 171.8 12,157.9 4,107 887.9 1,165.1

52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47 44

0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0

1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1

0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0


4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2 3

1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 Page 210



Chapter 09 - Predictive Data Mining 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1 1,704.6 1,245.7 16,191.8 2,185.6 1,167.7

50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37 40

0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1

1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0


1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3 1

1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 Page 211



Chapter 09 - Predictive Data Mining 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6 1,457.6 1,478.4 1,690.3 1,458.9 1,465.4

34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56 46

1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 0

1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0


2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3 3

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 212



1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195

51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

ANSWER: Customers are classified as having defaulted on their loan if the average balance is less than $951.75, or if the average balance is less than $1,216.30 and the family size is greater than 2.

61. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below. Using appropriate software, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Classify the data using k-nearest neighbors with up to k = 10. Use Loan default as the output variable and all the other variables as input variables. Be sure to Normalize input data and to Score on best k between 1 and specified value. Generate lift charts for both the validation data and test data. Examine the decile-wise lift chart on the test data. What is the first decile lift on the test data? Interpret this value.

Average Balance Age Gender Married Divorced Family Size Loan Default
1,222.3 36 1 0 0 1 0
6,291 41 0 1 1 3 0
1,051 52 1 1 0 4 1
1,118.3 36 1 0 0 2 0


Chapter 09 - Predictive Data Mining 1,176.8 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6 1,603.2 1,308.1 4,061.5 2,283.1

35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58 38 39 35 57

0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1

0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1


2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4 2 2 1 3

0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 Page 214



Chapter 09 - Predictive Data Mining 1,023 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5 171.8 12,157.9 4,107 887.9

40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47 52 50 52 47

1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0

1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0


2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2 3 3 2 2

0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 Page 215



Chapter 09 - Predictive Data Mining 1,165.1 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1 1,704.6 1,245.7 16,191.8 2,185.6

44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46 45 35 54 37

0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 0

1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0


3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2 3 2 3 3

0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 Page 216



Chapter 09 - Predictive Data Mining 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6 1,457.6 1,478.4 1,690.3 1,458.9

40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41 33 47 51 56

1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1

1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0


1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3 2 3 1 3

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 217



Chapter 09 - Predictive Data Mining 1,465.4 1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195

46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

ANSWER: The first decile lift is 5. The test data set contains 40 customers, 6 of whom actually defaulted on their loan, so if we randomly selected 4 customers, on average 0.6 of them would have defaulted. However, if we use the classification tree to identify the top 4 customers, then (0.6)(5) = 3 of them would have defaulted on their loan. This can be confirmed from the Detailed Scoring report by observing that among the top 4 observations in the test set rated by the best pruned tree to be most likely to default, 3 actually defaulted.

62. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below. Using appropriate software, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Construct a logistic regression model using Loan default as the output variable and all the other variables as input variables. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2. Generate lift charts for both the validation data and test data. From the generated set of logistic regression models, select one that is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. Do the relationships suggested by the model make sense? Try to explain them.

Average Balance Age Gender Married Divorced Family Size Loan Default


Chapter 09 - Predictive Data Mining 1,222.3 6,291 1,051 1,118.3 1,176.8 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6

36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58

1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1

0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1

0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0


1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4

0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 Page 219



Chapter 09 - Predictive Data Mining 1,603.2 1,308.1 4,061.5 2,283.1 1,023 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5

38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47

1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0

1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1


2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2

0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 Page 220



Chapter 09 - Predictive Data Mining 171.8 12,157.9 4,107 887.9 1,165.1 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1

52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46

0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0

1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1

0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1


3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2

1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 Page 221



Chapter 09 - Predictive Data Mining 1,704.6 1,245.7 16,191.8 2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6

45 35 54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41

1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0

1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0


3 2 3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 222



1,457.6 1,478.4 1,690.3 1,458.9 1,465.4 1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195

33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

ANSWER: Using Mallows' Cp statistic to guide the selection, we see that the models using 2 independent variables seem to be viable candidates. We will select the model with 2 variables (3 coefficients including the intercept).

63. A bank wants to understand better the details of customers who are likely to default on their loan. In order to analyze this, the data from a random sample of 200 customers are given below. Using appropriate software, partition the data so there are 50 percent successes (Loan default) in the training set and 40 percent of the validation data are taken away as test data. Construct a logistic regression model using Loan default as the output variable and all the other variables as input variables. Perform an exhaustive-search best subset selection with the number of best subsets equal to 2. Generate lift charts for both the validation data and test data. From the generated set of logistic regression models, select one that is a good fit. Express the model as a mathematical equation relating the output variable to the input variables. Do the relationships suggested by the model make sense? Try to explain them.

Average Balance Age Gender Married Divorced Family Size Loan Default


Chapter 09 - Predictive Data Mining 1,222.3 6,291 1,051 1,118.3 1,176.8 1,052 1,314.6 439.7 1,232.7 1,855.4 322.4 1,570.7 2,729 1,397.8 1,464.1 40.3 1,296.4 2,142.7 2,756.3 1,451.1 1,003.9 1,245.7 3,011.1 1,222.3 2,225.9 2,708.2 2,341.6 1,817.7 1,417.3 4,291.6 1,310.7 1,144 1,088.4 1,341.9 1,269.1 1,435.5 113.7 4,646.5 1,003.9 1,773.5 3,349.1 647 3,901.6

36 41 52 36 35 46 37 34 56 51 44 39 51 41 47 46 32 58 32 49 48 40 42 30 51 40 30 42 52 59 34 38 47 31 56 46 58 39 38 46 45 54 58

1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1

0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1

0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0


1 3 4 2 2 3 1 5 1 2 5 3 1 2 2 3 2 4 2 3 2 6 1 2 5 2 1 2 3 4 2 3 2 2 1 2 3 3 2 2 3 3 4

0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 Page 224



Chapter 09 - Predictive Data Mining 1,603.2 1,308.1 4,061.5 2,283.1 1,023 1,083.2 1,158.6 1,052 592.2 6,834.4 1,505.7 1,170 1,509.6 1,061 517.2 1,661.7 1,279.5 1,656.5 1,319.8 1,227.5 1,748.8 1,060 1,119.6 1,135 2,777.1 1,535.6 352 1,605.8 5,737.2 3,354.3 10,096.1 9,164 6,796.7 2,108.9 265.2 1,097 1,041 1,224.9 1,557.7 3,202.2 1,173 1,794.3 2,423.5

38 39 35 57 40 52 26 30 35 41 57 56 43 52 60 46 57 46 49 38 51 38 44 59 50 34 59 42 48 45 57 53 48 57 52 55 58 51 44 57 39 41 47

1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0

1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1

0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1


2 2 1 3 2 4 1 2 3 3 3 6 4 3 5 2 1 2 2 1 3 4 2 3 2 1 5 2 2 1 2 3 4 5 3 3 2 1 3 2 3 1 2

0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 Page 225



Chapter 09 - Predictive Data Mining 171.8 12,157.9 4,107 887.9 1,165.1 643.5 1,529.1 2,142.7 1,035 1,003.9 1,509.6 1,118.3 1,124.8 1,891.8 6,796.7 1,709.8 1,011.7 1,270.4 1,663 1,648.7 1,887.9 1,244.4 2,465.1 6,086.9 1,262.6 1,513.5 1,170.3 1,557.7 2,454.7 710.8 3,711.8 1,748.8 1,248.3 1,002.6 1,130 1,040.3 1,595.4 1,144 1,582.4 1,049 1,577.2 561 3,349.1

52 50 52 47 44 50 45 59 54 48 43 37 46 49 48 54 30 52 39 33 47 37 36 38 57 27 29 45 57 41 56 52 59 51 41 47 50 34 52 53 39 40 46

0 1 1 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 0 1 0 0

1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1

0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1


3 3 2 2 3 1 1 1 2 3 1 2 2 1 1 2 3 2 3 2 2 3 3 3 3 1 3 2 1 1 2 2 2 2 1 2 3 1 3 1 2 3 2

1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 Page 226



Chapter 09 - Predictive Data Mining 1,704.6 1,245.7 16,191.8 2,185.6 1,167.7 1,535.6 1,319.8 1,145.6 1,304.2 1,851.5 2,099.8 1,152 1,219.7 1,235.3 1,811.2 732.5 5,630.6 2,420.9 2,454.7 1,557.7 4,017.3 4,017.3 1,351 1,507 1,050.7 1,657.8 1,115 245.9 1,058.5 1,377 1,079.3 1,456.3 2,063.4 1,106.6 1,119.6 2,496.3 1,578.5 1,284.7 1,409.5 1,085.8 1,083 1,556.4 1,080.6

45 35 54 37 40 34 50 49 53 49 40 34 49 50 30 28 56 67 58 46 72 72 41 29 60 36 37 59 56 60 34 41 29 26 60 53 42 47 53 44 38 49 41

1 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0

1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0


3 2 3 3 1 2 1 1 3 3 1 1 2 3 1 2 3 3 3 2 2 1 2 3 1 1 1 2 3 1 1 3 2 2 1 2 1 3 3 2 3 3 3

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Page 227



1,457.6 1,478.4 1,690.3 1,458.9 1,465.4 1,002.6 1,728 1,015.6 1,163.8 1,299 1,400.4 1,005.2 1,341.9 1,032.5 1,236.6 1,087.1 1,170.3 1,237.9 1,296.4 1,182 1,133 1,629.2 1,830.7 1,137.8 2,011.4 170.3 1,135.2 195

33 47 51 56 46 51 34 31 48 59 38 27 40 39 48 34 55 55 51 37 39 52 36 36 56 36 39 29

0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1

1 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0

2 3 1 3 3 1 1 2 1 2 1 3 2 1 1 1 1 1 2 1 1 3 3 3 3 1 3 1

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

ANSWER: Using Mallows' Cp statistic to guide the selection, we see that the models using 2 independent variables seem to be viable candidates. We will select the model with 2 variables (3 coefficients including the intercept).


Chapter 10 - Spreadsheet Models
Multiple Choice
1. Which of the following is true of spreadsheet packages used in business analytics? a. They are more expensive than specialized packages. b. They require substantial user training. c. They come preloaded on computers. d. They do not have specialized functions to perform detailed analyses. ANSWER: c
2. Spreadsheet models are referred to as what-if models because they _____. a. are mathematical and logic-based models b. allow easy instantaneous recalculation for a change in model inputs c. come preloaded on computers d. have specialized functions to perform detailed analysis ANSWER: b
3. A _____ decision is one in which companies have to decide whether they should manufacture a product or outsource production to another firm. a. goal seek b. two-way c. voting-based d. make-versus-buy ANSWER: d
4. The modeling process begins with the framing of a _____ model that shows the relationships between the various parts of the problem being modeled. a. mathematical b. conceptual c. circular d. correlation ANSWER: b
5. The conceptual model _____. a. helps in organizing the data requirements b. controls the model inputs c. has tools defined to identify the optimal solution d. explores the effects of changing model parameters ANSWER: a
6. A(n) _____ is a visual representation that shows which entities affect others in a model. a. decision tree diagram b. influence diagram c. entity chart d. time series plot ANSWER: b
7. What do nodes in an influence diagram represent? a. Parts of the model b. Influence levels c. Road maps d. Environmental factors ANSWER: a


8. The influence in an influence diagram is visually depicted by _____. a. a circular symbol. b. an arrow. c. a straight line. d. the height of the influence diagram. ANSWER: b
9. Which of the following approaches is a good way to proceed with the influence diagram building for a problem? a. The influence diagram for the entire problem is built first and then separate portions are clustered to form separate models. b. The influence diagram for all the model parts at the same level are built in parallel to reduce the likelihood of error. c. The influence diagram is reverse engineered: the diagram is developed in the opposite direction, starting with the model output. d. The influence diagram for a portion of the problem is built first and then expanded until the total problem is conceptually modeled. ANSWER: d
10. Using the diagram below, which of the following would be a likely mathematical expression for Total Cost?

a. Total Cost = Total Variable Cost × Fixed Cost b. Total Cost = Fixed Cost + Total Variable Cost c. Total Cost = Total Variable Cost + Total Revenue × Production Volume d. Total Cost = Fixed Cost + Total Variable Cost + Production Volume ANSWER: b
11. Which of the following would be a likely mathematical expression for Total Variable Cost?


a. Total Variable Cost = Production Volume × Revenue per Unit b. Total Variable Cost = Material Cost per Unit × Labor Cost per Unit c. Total Variable Cost = Total Cost – (Material Cost per Unit + Labor Cost per Unit) d. Total Variable Cost = (Material Cost per Unit + Labor Cost per Unit) × Production Volume ANSWER: d
12. Which of the following would be a likely mathematical expression for Total Revenue?

a. Total Revenue = Production Volume + Revenue per Unit b. Total Revenue = Profit – Production Volume × Revenue per Unit c. Total Revenue = Production Volume × Revenue per Unit d. Total Revenue = Total Variable Cost + Production Volume + Revenue per Unit ANSWER: c
13. With reference to a spreadsheet model, an uncontrollable model input is known as a(n) _____. a. decision variable b. dummy variable c. parameter d. statistic ANSWER: c
14. A(n) _____ refers to a model input that can be controlled in a spreadsheet model. a. decision variable b. outlier c. parameter d. dummy variable ANSWER: a
15. Which of the following design guidelines, if followed, enables the user to update the model parameters without the risk of mistakenly creating an error in a formula? a. Separating the parameters from the spreadsheet model b. Documenting the spreadsheet model c. Using numbers in the spreadsheet formula d. Using simple formulas ANSWER: a
16. Navigation in a spreadsheet model can be facilitated by _____. a. using different spreadsheets for each formula in the model b. using long calculations in the cells c. using clear labels and proper formatting and alignment d. referencing data by using hyperlinks to the problem statement ANSWER: c
17. An Excel _____ quantifies the impact of changing the value of a specific input on an output of interest. a. Watch Window b. Data Table c. Goal Seek d. Chart ANSWER: b
18. A one-way data table summarizes _____. a. a single input’s impact on the output of interest b. multiple inputs' impact on a single output of interest c. values of the input cells that will cause the single output value to equal zero d. values of cells when not all of the model is observable on the screen ANSWER: a
19. The impact of two inputs on the output of interest is summarized by a _____. a. Goal Seek b. Watch Window c. multiple-way data table d. two-way data table ANSWER: d

20. Excel’s _____ tool allows the user to determine the value of an input cell that will cause the value of a related output cell to equal some specified value. a. Goal Seek b. Watch Window c. Data Validation d. XLMiner ANSWER: a
21. The SUM function in Excel _____. a. adds up all the numbers in the cells diagonally b. adds up only positive numbers in a range of cells c. adds up all the numbers in a range of cells d. adds up the cells specified by a given condition or criteria ANSWER: c
22. The _____ function pairs each element of the first array with its counterpart in the second array, multiplies the elements of the pairs together, and adds the results. a. SUM b. SUMPRODUCT c. SUMIF d. VLOOKUP ANSWER: b
23. With reference to the SUMPRODUCT function, which of the following statements is true? a. The range of cells for each array must contain only nonzero values. b. Any cell that does not satisfy the specified given condition or criteria will not be considered. c. The array appearing as the first argument must be sorted in ascending order. d. The arrays that appear as arguments must be of the same dimension. ANSWER: d
24. The _____ function is used for the conditional computation of expressions in Excel. a. EFFECT b. IF c. FALSE d. NOT ANSWER: b
25. The arguments supplied to the IF function, in order, are the condition for execution, _____. a. the result if condition is true, and the result if condition is false b. and the range of cells to test c. the array1 of data cells to test, and the array2 of data cells to output d. the result if condition is false, and the result if condition is true ANSWER: a
26. Within a given range of cells, the number of times a particular condition is satisfied is computed by using the _____ function. a. SUMIF b. IF c. VLOOKUP d. COUNTIF ANSWER: d
27. The _____ function allows the user to pull a subset of data from a larger table of data based on some criterion. a. VLOOKUP b. IF c. SUMIF d. COUNTIF ANSWER: a
28. The condition that VLOOKUP assumes is that _____. a. there are no nonzero values in the range. b. all the arguments are of the same dimension. c. the first column of the table is sorted in ascending order. d. the columns with empty cells are to be neglected. ANSWER: c
29. The VLOOKUP with range set to _____ takes the first argument and searches the first column of the table for the last row that is less than or equal to the first argument. a. FALSE b. TRUE c. LESS d. NULL ANSWER: b
30. Excel searches for an exact match of the first argument in the first column of the data when the range in the VLOOKUP function is _____. a. TRUE b. NULL c. FALSE d. EXACT ANSWER: c
31. The _____ button, located in the Formula Auditing group, creates arrows pointing to the selected cell from cells that are part of the formula in that cell. a. Trace Dependents b. Trace Precedents c. Error Checking d. Watch Window ANSWER: b
32. Arrows pointing from the selected cell to cells that depend on the selected cell are generated by using the _____ button of the Formula Auditing group. a. Error Checking b. Trace Precedents c. Trace Dependents d. Watch Window ANSWER: c
33. The following table is used to look up information on a specific product. The Product ID is entered into cell B2 and the information is returned in the shaded box. There are several formulas in the table. How could I look at the cells that use the value in cell B2?


a. Select "Trace Dependents" in the Formula Auditing Group on the Formulas tab. b. Select "Trace Precedents" in the Formula Auditing Group on the Formulas tab. c. Select "Show Formulas" in the Formula Auditing Group on the Formulas tab. d. Select "Evaluate Formulas" in the Formula Auditing Group on the Formulas tab. ANSWER: a
34. The function of Trace Precedents and Trace Dependents is to _____. a. highlight errors in copying and formula construction b. ascertain how model parts are segregated c. trace the range of cells included in the Watch Window box list d. investigate the cell calculations in great detail ANSWER: a
35. The _____ button in the Formula Auditing group allows the user to inspect each formula in detail in its cell location. a. Evaluate Formula b. Error Checking c. Watch Window d. Show Formulas ANSWER: d
36. The calculations of a cell can be investigated in great detail by using the _____ button. a. Calculation Options b. Evaluate Formula c. Error Checking d. Show Formulas ANSWER: b
37. Which of the following tools provides an excellent means of identifying the exact location of an error in a formula? a. Error Checking b. Function Library c. Show Formulas d. Evaluate Formula ANSWER: d

38. The _____ button provides an automatic means of checking for mathematical errors within formulas of a worksheet. a. Error Checking b. Trace Precedents c. Watch Window d. Math & Trig ANSWER: a
39. The user can monitor how listed cells change with a change in the model without searching through the worksheet or changing from one worksheet to another by using the _____ functionality. a. Goal Seek b. Evaluate Formula c. Watch Window d. Trial-and-Error ANSWER: c
40. The Watch Window is observable _____. a. only when the complete model is observable on the screen b. only in the same worksheet of a workbook c. across different worksheets of a workbook d. across different workbooks in the same folder ANSWER: c
Subjective Short Answer
41. When should you use a one-way data table in Excel instead of a two-way data table?
ANSWER: A one-way data table should be used, rather than a two-way data table, when you wish to summarize a single input’s impact on the output rather than the impact of two inputs on the output.
42. A company receives a discount of 10% per unit for all units ordered over a quantity of 50. Suppose cell B1 of an Excel sheet contains the quantity of units ordered and that cell B2 contains the unit price. Write an Excel formula that can be used to calculate the cost of the order.
ANSWER: =IF(B1>50, 50*B2 + (B1-50)*(0.90*B2), B1*B2)
43. Suppose that cells B1 through B100 of an Excel spreadsheet contain the quantity of units ordered on each of 100 different days. Write an Excel function that would count the number of cells that contain quantities of at least 50.
ANSWER: =COUNTIF(B1:B100, ">49") or =COUNTIF(B1:B100, ">=50")
44. Suppose that cells B1 through B100 of an Excel spreadsheet contain the quantity of units ordered on each of 100 different days. Write an Excel function that would add the values in these cells.
ANSWER: =SUM(B1:B100)
45. Suppose that cells B1 through B100 of an Excel spreadsheet contain the quantity of units ordered on each of 100 different days. Write an Excel function that would add the values of all cells that contain a quantity of at least 50.
ANSWER: =SUMIF(B1:B100, ">49") or =SUMIF(B1:B100, ">=50")
46. The Gatson manufacturing company has estimated the following components for a new product.
Fixed cost = $50,000
Material cost per unit = $2.15
Labor cost per unit = $2.00
Revenue per unit = $7.50
a. Build an influence diagram that illustrates how to calculate profit.
b. Using mathematical notation, construct a mathematical model for calculating profit.
c. Implement your model from part (b) in an Excel spreadsheet model using the principles of good spreadsheet design.
d. Using the spreadsheet model, what will be the resulting profit if the company decides to make 70,000 units of the new product?
ANSWER:

a.

b. Let
q = production volume
R = revenue per unit
FC = the fixed costs of production
MC = material cost per unit
LC = labor cost per unit
P(q) = total profit for producing (and selling) q units
P(q) = (R) × (q) – FC – (MC) × (q) – (LC) × (q)
c.
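As a rough stand-in for the spreadsheet requested in part (c), the model from part (b) can be sketched in Python. The constant and function names are illustrative assumptions and are not part of the original answer key; the parameter values come from the problem statement.

```python
# Illustrative sketch of the profit model P(q) = R*q - FC - MC*q - LC*q.
# All names are hypothetical; values are taken from the problem statement.
FIXED_COST = 50_000.00     # FC
MATERIAL_COST = 2.15       # MC, per unit
LABOR_COST = 2.00          # LC, per unit
REVENUE_PER_UNIT = 7.50    # R


def profit(q):
    """Total profit for producing (and selling) q units."""
    return REVENUE_PER_UNIT * q - FIXED_COST - MATERIAL_COST * q - LABOR_COST * q


if __name__ == "__main__":
    print(f"Profit at 70,000 units: ${profit(70_000):,.2f}")
```

Keeping the four parameters as named constants mirrors the spreadsheet design principle of separating inputs from formulas.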

d. Profit of $184,500 will be earned from a production volume of 70,000.

47. The Gatson manufacturing company has estimated the following components for a new product.
Fixed cost = $50,000
Material cost per unit = $2.15
Labor cost per unit = $2.00
Revenue per unit = $7.50
a. Construct a spreadsheet model and then construct a one-way data table with production volume as the column input and profit as the output. Breakeven occurs when profit is zero. Vary production volume from 0 to 100,000 in increments of 10,000. In which interval of production volume does breakeven occur?
b. Using the appropriate Excel tool, find the exact breakeven point.
ANSWER:

a.

Breakeven appears in the production volume interval of 10,000 to 20,000 units.
b.
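The one-way data table and the breakeven search can be approximated outside Excel. The sketch below is only an analogue: it uses bisection as a substitute for Goal Seek (it is not Excel's algorithm), and every name in it is a hypothetical choice made for illustration.

```python
# Hypothetical analogue of a one-way data table plus a Goal Seek style search.
def profit(q, fixed=50_000.0, material=2.15, labor=2.00, revenue=7.50):
    """Profit for q units using the Gatson parameters from the problem."""
    return (revenue - material - labor) * q - fixed


def find_zero(f, lo, hi, tol=1e-6):
    """Bisection: locate x in [lo, hi] with f(x) = 0, given a sign change."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# One-way table: production volume from 0 to 100,000 in steps of 10,000.
table = [(q, profit(q)) for q in range(0, 100_001, 10_000)]

# Interval in which profit changes from negative to non-negative.
interval = next(
    (a[0], b[0]) for a, b in zip(table, table[1:]) if a[1] < 0 <= b[1]
)

breakeven = find_zero(profit, *interval)
closed_form = 50_000 / (7.50 - 2.15 - 2.00)  # FC / (R - MC - LC)

if __name__ == "__main__":
    for q, p in table:
        print(f"{q:>7,d}  {p:>12,.2f}")
    print("Breakeven interval:", interval)
    print(f"Breakeven volume: {breakeven:,.2f}")
```

Because profit is linear in volume, the closed-form ratio of fixed cost to unit margin gives the same point as the search and serves as a check.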

The breakeven point is 14,925 units.

48. The Gatson manufacturing company has estimated the following components for a new product.
Fixed cost = $50,000
Material cost per unit = $2.15
Labor cost per unit = $2.00
Revenue per unit = $7.50
Construct a spreadsheet model and then use a two-way data table to show how the profit changes as a function of different production volumes and different values of material cost per unit. Vary the production volume from 0 to 100,000 in increments of 10,000. The five different material costs are $1.50, $1.95, $2.15, $2.85, and $3.25.
ANSWER:

49. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below.
Fixed Cost, FC = $25,000
Material Cost per Unit, MC = $2.15
Labor Cost per Unit, LC = $2.00
Outsourcing Cost per Unit, O = $4.50
a. Build an influence diagram that illustrates how to calculate the difference in cost of manufacturing and outsourcing.
b. Using mathematical notation, construct a mathematical model for calculating the difference in cost of manufacturing and outsourcing.
c. Implement your model from part (b) in an Excel spreadsheet model using the principles of good spreadsheet design.
d. Using the spreadsheet model, what will be the resulting savings due to outsourcing if the company wants to make 30,000 units of a particular product?
ANSWER:

a.

b. Savings by Outsourcing (SBO) is the difference between the Total Manufacturing Cost (TMC) and the Total Outsource Cost (TOC), so that a positive value means outsourcing is cheaper:
SBO = TMC – TOC
The cost-volume model for manufacturing q units is
TMC = FC + ((MC + LC) × q)
And, a mathematical model for outsourcing q units is
TOC = O × q
c.
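A minimal Python sketch of the cost comparison follows, assuming savings are measured as manufacturing cost minus outsourcing cost, which is the sign convention that yields the positive $14,500 reported in part (d). The names are illustrative assumptions rather than cells from the original spreadsheet.

```python
# Hypothetical sketch of the manufacture-versus-outsource comparison.
FIXED_COST = 25_000.00      # FC
MATERIAL_COST = 2.15        # MC, per unit
LABOR_COST = 2.00           # LC, per unit
OUTSOURCE_COST = 4.50       # O, per unit


def manufacturing_cost(q):
    """TMC = FC + (MC + LC) * q."""
    return FIXED_COST + (MATERIAL_COST + LABOR_COST) * q


def outsourcing_cost(q):
    """TOC = O * q."""
    return OUTSOURCE_COST * q


def savings_from_outsourcing(q):
    """Positive when outsourcing is cheaper than manufacturing (assumed sign)."""
    return manufacturing_cost(q) - outsourcing_cost(q)


# Volume at which the two options cost the same: FC / (O - MC - LC).
breakeven_volume = FIXED_COST / (OUTSOURCE_COST - MATERIAL_COST - LABOR_COST)

if __name__ == "__main__":
    q = 30_000
    print(f"TMC: ${manufacturing_cost(q):,.2f}")
    print(f"TOC: ${outsourcing_cost(q):,.2f}")
    print(f"Savings from outsourcing: ${savings_from_outsourcing(q):,.2f}")
    print(f"Breakeven volume: {breakeven_volume:,.2f}")
```

Below the breakeven volume the fixed cost dominates and outsourcing is cheaper; above it, manufacturing wins.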

d. The model shows that the cost of manufacturing 30,000 units is $149,500 and the cost of outsourcing the same 30,000 units is $135,000. The savings from outsourcing is $14,500.

50. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below.
Fixed Cost, FC = $25,000
Material Cost per Unit, MC = $2.15
Labor Cost per Unit, LC = $2.00
Outsourcing Cost per Unit, O = $4.50

a. Build a spreadsheet model and then construct a one-way data table with production volume as the column input and savings due to outsourcing as the output. Breakeven occurs when savings equal zero. Vary production volume from 0 to 100,000 in increments of 10,000. In which interval of production volume does breakeven occur?
b. Using the appropriate Excel tool, find the exact breakeven point.
ANSWER:
a.

Breakeven appears in the production volume interval of 70,000 to 80,000 units.
b.

The breakeven point is approximately 71,429 units.

51. A company asked one of its analysis teams to analyze and create models that help decide whether it should manufacture a particular product or outsource its production. The different components are given below.
Fixed Cost, FC = $25,000
Material Cost per Unit, MC = $2.15
Labor Cost per Unit, LC = $2.00
Outsourcing Cost per Unit, O = $4.50
Build a spreadsheet model and then use a two-way data table to show how the savings due to outsourcing changes as a function of different production volume and different bids on per-unit cost for outsourcing. Vary the production volume from 0 to 100,000 in increments of 10,000. The six bids are $3.11, $3.49, $4.50, $4.98, $5.12, and $5.45.
ANSWER:

52. A clothing retail store offers a discount at the rate of 10% on the customer bill if the purchase exceeds $100. The owner of the store wishes to know the total amount that has been discounted on the customers’ purchases on a particular day. The purchase amount for each of the 12 customers who visited the store on that day is given below. Display your use of the IF and SUM functions and calculate the total amount discounted on this single day's purchases.

Customer  Purchase Amount ($)
1         123
2         114
3         43
4         123
5         32
6         72
7         119
8         52
9         89
10        116
11        176
12        9

ANSWER:
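The IF and SUM logic can be checked with a short Python sketch. The variable names are illustrative assumptions; the conditional expression plays the role of an IF formula in each row and `sum` plays the role of SUM over the discount column.

```python
# Hypothetical analogue of a per-row IF formula followed by SUM.
purchases = [123, 114, 43, 123, 32, 72, 119, 52, 89, 116, 176, 9]
DISCOUNT_RATE = 0.10
THRESHOLD = 100  # discount applies only when the purchase exceeds $100

# Per-customer discount, like =IF(amount > 100, 0.10 * amount, 0) in each row.
discounts = [DISCOUNT_RATE * p if p > THRESHOLD else 0.0 for p in purchases]
total_discount = sum(discounts)

if __name__ == "__main__":
    for customer, (amount, discount) in enumerate(zip(purchases, discounts), 1):
        print(f"{customer:>2}  {amount:>4}  {discount:>6.2f}")
    print(f"Total discounted: ${total_discount:.2f}")
```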

The total amount discounted on a particular day is $77.10.

53. Given below is a sample list of 20 products in a grocery store with the product code, the price, and the associated discount rates.

Product code  Price ($)  Discount (%)
A003          4.00       5
A345          2.70       0
B985          4.50       5
C765          1.50       0
F302          3.00       5
B453          6.80       10
A109          9.50       10
F432          4.80       5
D234          5.40       10
B432          2.60       0
D765          6.90       10
A406          2.60       5
D203          5.40       10
F405          3.60       0
C432          5.20       5
C106          3.20       5
D324          1.30       0
F456          5.20       10
A156          2.50       5
B654          1.10       0

a. Display your use of the VLOOKUP function and find the price of the products A109, F432, B985, D203, C432, B654, and A345.
b. Display your use of the COUNTIF function and determine the number of products associated with each discount rate: 0%, 5%, and 10%, from the provided list.
ANSWER:

a.

b.
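For readers without the spreadsheet screenshots, the two lookups can be approximated in Python. This is a sketch under the assumption that an exact-match VLOOKUP behaves like a dictionary lookup and that COUNTIF behaves like a tally; the variable names are hypothetical.

```python
from collections import Counter

# Product code -> (price in $, discount in %), copied from the problem's table.
products = {
    "A003": (4.00, 5), "A345": (2.70, 0), "B985": (4.50, 5), "C765": (1.50, 0),
    "F302": (3.00, 5), "B453": (6.80, 10), "A109": (9.50, 10), "F432": (4.80, 5),
    "D234": (5.40, 10), "B432": (2.60, 0), "D765": (6.90, 10), "A406": (2.60, 5),
    "D203": (5.40, 10), "F405": (3.60, 0), "C432": (5.20, 5), "C106": (3.20, 5),
    "D324": (1.30, 0), "F456": (5.20, 10), "A156": (2.50, 5), "B654": (1.10, 0),
}

# Part (a): exact-match lookup of price by product code (VLOOKUP analogue).
lookup_codes = ["A109", "F432", "B985", "D203", "C432", "B654", "A345"]
prices = {code: products[code][0] for code in lookup_codes}

# Part (b): number of products at each discount rate (COUNTIF analogue).
discount_counts = Counter(discount for _, discount in products.values())

if __name__ == "__main__":
    for code, price in prices.items():
        print(f"{code}: ${price:.2f}")
    for rate in (0, 5, 10):
        print(f"{rate}%: {discount_counts[rate]} products")
```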

54. The average cost/unit for the production of a particular component at a manufacturing plant varies with the number of units produced in each batch. The data are given below.

Number of Units Produced  Cost/Unit
0–49                      $37.72
50–100                    $25.02

Suppose the selling price of each unit is $35.
a. Build a model to calculate the profit of the manufacturing plant if the demand is 20.
b. Construct a data table that shows the profit per unit as a function of demand if the demand ranges from 20 units to 80 units in increments of 10 units.
ANSWER:
a.

b.
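The tiered cost structure can be expressed as a small lookup function. The sketch below assumes that the batch size equals the demand and that demand stays within the 0 to 100 range covered by the cost table; all names are illustrative.

```python
# Hypothetical sketch of the tiered-cost profit model.
SELLING_PRICE = 35.00


def unit_cost(units):
    """Average cost per unit for a batch of the given size (from the table)."""
    if 0 <= units <= 49:
        return 37.72
    if 50 <= units <= 100:
        return 25.02
    raise ValueError("cost table covers batches of 0 to 100 units only")


def profit_per_unit(demand, price=SELLING_PRICE):
    return price - unit_cost(demand)


def total_profit(demand, price=SELLING_PRICE):
    return demand * profit_per_unit(demand, price)


# One-way table: demand from 20 to 80 in steps of 10.
one_way = {d: profit_per_unit(d) for d in range(20, 81, 10)}

if __name__ == "__main__":
    print(f"Total profit at demand 20: ${total_profit(20):.2f}")
    for demand, ppu in one_way.items():
        print(f"{demand:>3}  {ppu:>6.2f}")
```

The profit per unit is negative for batches below 50 units because the $37.72 unit cost exceeds the $35 selling price.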

55. The average cost/unit for the production of a particular component at a manufacturing plant varies with the number of units produced in each batch. The data are given below.

Number of units produced  Cost/unit
0–49                      $37.72
50–100                    $25.02

Suppose the selling price of each unit is $35. Use a two-way data table to show how the profit changes as a function of demand and the selling price of the product. Vary the demand from 20 units to 80 units in increments of 10 units and selling price from $30 to $40 in increments of $2.
ANSWER:

56. An electronics store sells two models of television. The sales of these two models, X and Y, are dependent, that is, if the price of one increases, the demand for the other increases. A study is made to find the relationship between the demand (D) and the price (P) in order to maximize the revenue from these products. The result of the study is shown below.
D_X = 476 – 0.54 P_X + 0.22 P_Y
D_Y = 601 + 0.12 P_X – 0.54 P_Y
a. Construct a model for the total revenue and implement it on a spreadsheet.
b. Develop a two-way data table to estimate the optimal prices of each of the two products in order to maximize the total revenue. Vary price of each product from $600 to $900 in increments of $50.
ANSWER:

a.

b. From the table shown below, the maximum revenue occurs at prices $700 and $800 for TV models X and Y, respectively.
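The two-way data table amounts to evaluating total revenue on a price grid. The following Python sketch reproduces that grid search; the function and variable names are assumptions made for illustration, and total revenue is taken to be price times demand summed over both models.

```python
# Hypothetical grid search standing in for the two-way data table.
def demand_x(px, py):
    return 476 - 0.54 * px + 0.22 * py


def demand_y(px, py):
    return 601 + 0.12 * px - 0.54 * py


def total_revenue(px, py):
    """Revenue from both TV models at prices px (model X) and py (model Y)."""
    return px * demand_x(px, py) + py * demand_y(px, py)


prices = range(600, 901, 50)  # $600 to $900 in $50 steps
grid = {(px, py): total_revenue(px, py) for px in prices for py in prices}
best_px, best_py = max(grid, key=grid.get)

if __name__ == "__main__":
    print("      " + "".join(f"{py:>12}" for py in prices))
    for px in prices:
        print(f"{px:>6}" + "".join(f"{grid[(px, py)]:>12,.0f}" for py in prices))
    print(f"Best grid point: P_X = {best_px}, P_Y = {best_py}")
```

A grid search only finds the best of the tabulated price pairs; a finer grid or an optimizer would be needed for prices between the $50 steps.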

57. The selling price of each product sold in a furnishing showroom and the number of units of each of these products sold during a period of one month are given below. The rental cost of the showroom is $225, and the other costs incurred are included in the cost/unit.

Product code  Price/Unit ($)  Cost/Unit ($)  Units
AD12          232             162.40         12
FD23          334             233.80         24
BD34          342             239.40         5
AG56          267             186.90         11
ET76          345             241.50         15
FA56          235             164.50         23
DE78          546             382.20         34
BF32          245             171.50         22

Display your use of the SUMPRODUCT function and find the profit earned by the showroom in a month.
ANSWER:
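The SUMPRODUCT calculation can be mirrored in Python as a sum of pairwise products. The helper name `sumproduct` and the variable names are hypothetical; the model assumed here is profit = revenue – variable cost – rent.

```python
# Hypothetical SUMPRODUCT analogue for the showroom profit.
prices = [232, 334, 342, 267, 345, 235, 546, 245]
costs = [162.40, 233.80, 239.40, 186.90, 241.50, 164.50, 382.20, 171.50]
units = [12, 24, 5, 11, 15, 23, 34, 22]
RENT = 225.00


def sumproduct(xs, ys):
    """Sum of element-wise products of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


revenue = sumproduct(prices, units)
variable_cost = sumproduct(costs, units)
profit = revenue - variable_cost - RENT

if __name__ == "__main__":
    print(f"Revenue:       ${revenue:,.2f}")
    print(f"Variable cost: ${variable_cost:,.2f}")
    print(f"Profit:        ${profit:,.2f}")
```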

The profit earned is $14,769.30.

58. Suppose a company supplies four of its products, A, B, C, and D, to five different regions. The management wants to know the total number of all products supplied to each region as well as the total number of units of each product supplied. The data collected over a period of one month are given below.

Region    Model  Number of units
Region 1  A      1,784
Region 2  A      2,170
Region 3  A      415
Region 4  A      2,040
Region 5  A      2,991
Region 1  B      947
Region 2  B      2,111
Region 3  B      1,234
Region 4  B      2,061
Region 5  B      607
Region 1  C      2,907
Region 2  C      4,790
Region 3  C      2,191
Region 4  C      1,942
Region 5  C      220
Region 1  D      2,557
Region 2  D      2,980
Region 3  D      1,518
Region 4  D      2,957
Region 5  D      4,462

Display your use of the SUMIF function and find the total volume by each region and total volume by each product.
ANSWER:
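A conditional sum over the records gives the same totals that SUMIF would produce. The sketch below is an illustrative analogue with hypothetical names; `sumif` here is a local helper, not a library function.

```python
# Units supplied per product; position i corresponds to Region i + 1.
units = {
    "A": [1784, 2170, 415, 2040, 2991],
    "B": [947, 2111, 1234, 2061, 607],
    "C": [2907, 4790, 2191, 1942, 220],
    "D": [2557, 2980, 1518, 2957, 4462],
}

# Flatten into (region, product, units) records, like rows of the worksheet.
records = [
    (f"Region {i + 1}", product, n)
    for product, row in units.items()
    for i, n in enumerate(row)
]


def sumif(rows, key_index, key):
    """Add the units of every row whose key column equals `key`."""
    return sum(row[2] for row in rows if row[key_index] == key)


by_region = {f"Region {i}": sumif(records, 0, f"Region {i}") for i in range(1, 6)}
by_product = {p: sumif(records, 1, p) for p in "ABCD"}

if __name__ == "__main__":
    for region, total in by_region.items():
        print(f"{region}: {total:,}")
    for product, total in by_product.items():
        print(f"Product {product}: {total:,}")
```

Both groupings add up to the same grand total, which is a quick consistency check on the two sets of SUMIF formulas.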

59. John would like to establish a retirement plan that returns an amount of $100,000 in 20 years. Build a spreadsheet model to calculate the amount John must contribute at the end of each year towards his retirement fund, assuming an annual interest rate of 6%. Use the Excel function
=PMT(rate, nper, pv, fv, type)
The arguments of this function are
rate = the interest rate for the loan
nper = the total number of payments
pv = present value (the amount borrowed which is 0 in this case)
fv = future value (in the formula, indicate this value as negative as the future value command assumes a stream of payments not deposits)
type = payment type (0 = end of period, 1 = beginning of the period)
Also, construct a one-way table with interest rate as the column variable and the amount contributed at the end of each year as the output. Vary the interest rate from 4% to 7% in increments of 0.5%.
ANSWER:

Key cell formula:
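The contribution can be cross-checked with the standard future-value-of-an-annuity formula. This sketch is not a reimplementation of Excel's PMT (sign conventions and the pv and type arguments are not modeled); it assumes end-of-year deposits, a starting balance of zero, and a hypothetical function name.

```python
# Hypothetical check of the retirement contribution, assuming end-of-year
# deposits and no starting balance.
def annual_contribution(rate, nper, fv):
    """Level end-of-period deposit that accumulates to fv after nper periods."""
    growth = (1 + rate) ** nper
    return fv * rate / (growth - 1)


payment = annual_contribution(0.06, 20, 100_000)

# One-way table: interest rate from 4% to 7% in steps of 0.5%.
rates = [r / 1000 for r in range(40, 71, 5)]
one_way = [(r, annual_contribution(r, 20, 100_000)) for r in rates]

if __name__ == "__main__":
    print(f"Contribution at 6%: ${payment:,.2f}")
    for r, p in one_way:
        print(f"{r:.1%}  ${p:,.2f}")
```

Higher interest rates reduce the required contribution, so the values in the table decrease from the 4% row to the 7% row.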

60. Starsystems is a small information systems company that employs 50 workers. The employee details for a particular month are given below.

Name         Age (in years)  Gender  Work experience (in years)  Income (in 1000 $)  Number of leaves taken in a month
John         47              M       22                          53                  0
Olivia       26              F       3                           22                  3
Gabriel      38              M       16                          29                  1
Logan        37              M       12                          32                  4
James        44              M       22                          32                  0
Ava          55              F       30                          45                  5
Isabella     44              F       23                          50                  0
Sophia       30              F       5                           22                  4
Joshua       63              M       35                          56                  2
Abigail      34              F       8                           23                  3
Anthony      52              M       26                          29                  2
Matthew      55              M       25                          34                  0
Jayden       52              M       28                          45                  1
Emily        63              F       29                          23                  1
Alexis       51              F       30                          32                  0
Angel        41              M       18                          21                  4
Ryan         37              M       14                          43                  0
Michael      46              M       23                          23                  3
Grace        30              F       6                           18                  0
Julia        48              F       25                          34                  5
Ella         50              F       22                          21                  0
Noah         56              M       31                          24                  1
Madison      35              F       9                           23                  1
Tyler        39              M       13                          29                  2
Jose         48              M       22                          34                  2
Samantha     51              F       21                          39                  3
Lily         27              F       3                           26                  1
Elizabeth    57              F       32                          49                  0
Anna         33              F       12                          39                  4
Luis         58              M       33                          32                  3
Jackson      46              M       21                          45                  1
Aiden        32              M       6                           23                  4
Madison      56              F       28                          45                  0
Lillian      35              F       12                          28                  0
Natalie      47              F       23                          38                  1
Christopher  50              M       23                          32                  3
Taylor       57              F       25                          32                  1
Wyatt        38              M       15                          25                  2
Chloe        52              F       24                          22                  3
Jack         56              M       31                          19                  3
Sarah        47              F       24                          34                  2
Mason        54              M       31                          45                  1
Mason        25              M       2                           21                  4
Alanis       40              F       16                          34                  5
Brooklyn     61              F       30                          49                  0
Jessica      29              F       6                           34                  4
Chase        52              M       25                          39                  0
Aiden        56              M       31                          54                  1
David        61              M       33                          43                  0
Andrew       26              M       4                           23                  2

a. The administrative manager of the company wants to know the total number of employees who were on leave for 4 days and 5 days in this month. Display your use of the COUNTIF function and provide the desired information.
b. Now, the manager wants the details of the employees Ava, Julia, and Alanis, who are working in the company. Display your use of the VLOOKUP function and provide the employees’ details.

ANSWER:

a. A part of the spreadsheet model is given below:

Key cell formulas:
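Since the key cell formulas are not reproduced here, the following Python sketch shows one way to obtain the same information. It is an analogue only: `countif` is a local helper, the dictionary lookup stands in for an exact-match VLOOKUP on the name column, and all names are assumptions.

```python
# Leaves taken by the 50 employees, in table order (John ... Andrew).
leaves = [
    0,
    3, 1, 4, 0, 5, 0, 4, 2, 3, 2,
    0, 1, 1, 0, 4, 0, 3, 0, 5, 0,
    1, 1, 2, 2, 3, 1, 0, 4, 3, 1,
    4, 0, 0, 1, 3, 1, 2, 3, 3, 2,
    1, 4, 5, 0, 4, 0, 1, 0,
    2,
]


def countif(values, criterion):
    """Number of entries equal to the criterion."""
    return sum(1 for v in values if v == criterion)


on_leave_4_days = countif(leaves, 4)
on_leave_5_days = countif(leaves, 5)

# Rows for the three requested employees, copied from the table in the problem.
employees = {
    "Ava": {"age": 55, "gender": "F", "experience": 30, "income": 45, "leaves": 5},
    "Julia": {"age": 48, "gender": "F", "experience": 25, "income": 34, "leaves": 5},
    "Alanis": {"age": 40, "gender": "F", "experience": 16, "income": 34, "leaves": 5},
}

if __name__ == "__main__":
    print("Employees with 4 days of leave:", on_leave_4_days)
    print("Employees with 5 days of leave:", on_leave_5_days)
    for name in ("Ava", "Julia", "Alanis"):
        print(name, employees[name])
```

Note that a lookup keyed on the name would be ambiguous for the repeated names in the table (Madison, Mason, Aiden); the three requested names are unique.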

b. A part of the spreadsheet model is given below:

Key cell formulas:

61. Starsystems is a small information systems company which employs 50 workers. The employee details are given below.

Name         Age (in years)  Gender  Work experience (in years)  Income (in 1000 $)  Number of leaves taken in a month
John         47              M       22                          53                  0
Olivia       26              F       3                           22                  3
Gabriel      38              M       16                          29                  1
Logan        37              M       12                          32                  4
James        44              M       22                          32                  0
Ava          55              F       30                          45                  5
Isabella     44              F       23                          50                  0
Sophia       30              F       5                           22                  4
Joshua       63              M       35                          56                  2
Abigail      34              F       8                           23                  3
Anthony      52              M       26                          29                  2
Matthew      55              M       25                          34                  0
Jayden       52              M       28                          45                  1
Emily        63              F       29                          23                  1
Alexis       51              F       30                          32                  0
Angel        41              M       18                          21                  4
Ryan         37              M       14                          43                  0
Michael      46              M       23                          23                  3
Grace        30              F       6                           18                  0
Julia        48              F       25                          34                  5
Ella         50              F       22                          21                  0
Noah         56              M       31                          24                  1
Madison      35              F       9                           23                  1
Tyler        39              M       13                          29                  2
Jose         48              M       22                          34                  2
Samantha     51              F       21                          39                  3
Lily         27              F       3                           26                  1
Elizabeth    57              F       32                          49                  0
Anna         33              F       12                          39                  4
Luis         58              M       33                          32                  3
Jackson      46              M       21                          45                  1
Aiden        32              M       6                           23                  4
Madison      56              F       28                          45                  0
Lillian      35              F       12                          28                  0
Natalie      47              F       23                          38                  1
Christopher  50              M       23                          32                  3
Taylor       57              F       25                          32                  1
Wyatt        38              M       15                          25                  2
Chloe        52              F       24                          22                  3
Jack         56              M       31                          19                  3
Sarah        47              F       24                          34                  2
Mason        54              M       31                          45                  1
Mason        25              M       2                           21                  4
Alanis       40              F       16                          34                  5
Brooklyn     61              F       30                          49                  0
Jessica      29              F       6                           34                  4
Chase        52              M       25                          39                  0
Aiden        56              M       31                          54                  1
David        61              M       33                          43                  0
Andrew       26              M       4                           23                  2

a. Find the total number of male and female employees who are working in this company. Display your use of the COUNTIF function.
b. Display your use of the SUMIF function and find the average incomes of both male and female employees who are working in this company.
ANSWER:

a. A part of the spreadsheet model is given below:

Key cell formulas:

b. A part of the spreadsheet model is given below:

Key cell formulas:
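The average income by gender is a SUMIF total divided by a COUNTIF count. The Python sketch below illustrates that combination; the helper functions are local stand-ins for the Excel functions, and the data are transcribed from the employee table in table order.

```python
# Gender and income (in $1,000) for the 50 employees, John through Andrew.
genders = list(
    "M" "FMMMFFFMFM" "MMFFMMMFFF" "MFMMFFFFMM" "MFFFMFMFMF" "MMFFFMMM" "M"
)
incomes = [
    53,
    22, 29, 32, 32, 45, 50, 22, 56, 23, 29,
    34, 45, 23, 32, 21, 43, 23, 18, 34, 21,
    24, 23, 29, 34, 39, 26, 49, 39, 32, 45,
    23, 45, 28, 38, 32, 32, 25, 22, 19, 34,
    45, 21, 34, 49, 34, 39, 54, 43,
    23,
]


def countif(values, criterion):
    """Number of entries equal to the criterion."""
    return sum(1 for v in values if v == criterion)


def sumif(criteria_range, criterion, sum_range):
    """Sum of sum_range entries whose matching criteria entry equals criterion."""
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criterion)


counts = {g: countif(genders, g) for g in ("M", "F")}
totals = {g: sumif(genders, g, incomes) for g in ("M", "F")}
averages = {g: totals[g] / counts[g] for g in ("M", "F")}

if __name__ == "__main__":
    for g in ("M", "F"):
        print(f"{g}: count = {counts[g]}, average income = {averages[g]:.2f} (in $1,000)")
```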

62. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1,000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below:
Rental cost: $750
Utilities: $150
Advertising: $35
Insurance: $100
Labor cost: $4,000
a. Build an influence diagram that illustrates how to calculate profit.
b. Using mathematical notation, give a mathematical model for calculating profit.
c. Implement your model from part (b) in Excel using the principles of good spreadsheet design.
ANSWER:
a.

b. Let
R = Rental cost
U = Utilities
A = Advertising
I = Insurance
L = Labor cost
M = Material cost per consignment
C = Consignments sold per month
S = Selling price
Profit = (C × S) – ((M × C) + (R + U + A + I + L))
c.

63. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1,000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below:
Rental cost: $750
Utilities: $150
Advertising: $35
Insurance: $100
Labor cost: $4,000
a. Using a spreadsheet model, construct a one-way data table with number of consignments sold per month as the column input and profit as the output. Breakeven occurs when profit goes from a negative to a positive value. Vary the number of consignments sold per month from 400 to 1,200 in increments of 100. In which interval does breakeven occur?
b. Use the appropriate Excel tool to find the exact breakeven point.
ANSWER:

a.

Breakeven point occurs in the interval of 400 to 500 consignments sold per month.
b.
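The consignment shop model and its breakeven point can be sketched in Python as follows. The names are illustrative assumptions, and the breakeven is computed in closed form (fixed costs divided by the unit margin) rather than with Goal Seek.

```python
# Hypothetical sketch of the consignment shop profit model.
FIXED_COSTS = {
    "rent": 750,
    "utilities": 150,
    "advertising": 35,
    "insurance": 100,
    "labor": 4_000,
}
MATERIAL_COST = 8.00    # per consignment
SELLING_PRICE = 20.00   # per consignment


def profit(consignments, material_cost=MATERIAL_COST):
    """Monthly profit = (C x S) - ((M x C) + total fixed costs)."""
    fixed = sum(FIXED_COSTS.values())
    return consignments * SELLING_PRICE - (material_cost * consignments + fixed)


# One-way table: consignments from 400 to 1,200 in steps of 100.
one_way = {c: profit(c) for c in range(400, 1_201, 100)}

# Closed-form breakeven: fixed costs divided by the margin per consignment.
breakeven = sum(FIXED_COSTS.values()) / (SELLING_PRICE - MATERIAL_COST)

if __name__ == "__main__":
    for c, p in one_way.items():
        print(f"{c:>5}  {p:>10,.2f}")
    print(f"Breakeven: {breakeven:.2f} consignments")
```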

The breakeven point is approximately 420 consignments.

64. Anna operates a consignment shop where she sells clothes for women and children. The average number of consignments sold per month is 1,000. The average material cost and the selling price of each consignment are $8 and $20, respectively. The monthly fixed costs to run this business are given below:
Rental cost: $750
Utilities: $150
Advertising: $35
Insurance: $100
Labor cost: $4,000
Use a two-way data table to show profit changes as a function of different number of consignments sold per month and different material costs. Vary the number of consignments from 400 to 1,200 in increments of 100. The eight different material costs are $5.45, $6.23, $6.95, $7.54, $8.23, $8.88, $9, and $9.45.
ANSWER:
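A two-way data table evaluates one output formula over every combination of two inputs. The sketch below builds the same grid in Python; it is an illustration with hypothetical names, not the original spreadsheet.

```python
# Hypothetical two-way table: profit by consignments sold and material cost.
FIXED_TOTAL = 750 + 150 + 35 + 100 + 4_000   # monthly fixed costs
SELLING_PRICE = 20.00

volumes = list(range(400, 1_201, 100))
material_costs = [5.45, 6.23, 6.95, 7.54, 8.23, 8.88, 9.00, 9.45]


def profit(consignments, material_cost):
    return consignments * (SELLING_PRICE - material_cost) - FIXED_TOTAL


# Rows are volumes and columns are material costs, as in a two-way data table.
table = {(c, m): profit(c, m) for c in volumes for m in material_costs}

if __name__ == "__main__":
    print("       " + "".join(f"{m:>10.2f}" for m in material_costs))
    for c in volumes:
        print(f"{c:>7}" + "".join(f"{table[(c, m)]:>10,.0f}" for m in material_costs))
```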

65. Suppose you have $1,100 and decide to purchase a new model of television that costs you $1,100. You find an electronics store where a gift voucher, worth $50, is offered for this TV model if payment is made in full at the time of purchase. Alternatively, it can be financed at zero-percent (0%) interest for 5 months with a monthly payment of $220. You now have two options: either opt for the zero-percent financing option for the full amount and invest your money at an annual interest rate of 10%; or choose the full-payment option with the $50 discount. Develop a spreadsheet model to find the better option that results in the most savings. Also, find the discount rate for the zero-percent financing option. Hint: Use Goal Seek to find the discount rate that makes the net present value of the payments = $1,050. ANSWER:

Key cell formulas:
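The key cell formulas are not reproduced here, so the sketch below is a reconstruction under explicit assumptions rather than the original model. It assumes (1) the interest on the financing option is simple interest on the full $1,100 for 5 months at 10% per year, and (2) the five $220 payments are made at the start of each month, with the first due at purchase, and the monthly rate is annualized by multiplying by 12. Bisection stands in for Goal Seek. These assumptions were chosen because they reproduce the $45.83 and 28.58% figures quoted in the answer; other timing conventions give different rates.

```python
# Assumption-laden reconstruction of the financing comparison.
PRICE = 1_100.00
VOUCHER = 50.00
PAYMENT = 220.00
N_PAYMENTS = 5
ANNUAL_RATE = 0.10

# Assumed: simple interest on the full amount held for five months.
interest_earned = PRICE * ANNUAL_RATE * N_PAYMENTS / 12


def present_value(monthly_rate):
    """PV of the payments, first one due at purchase (assumed timing)."""
    return sum(PAYMENT / (1 + monthly_rate) ** k for k in range(N_PAYMENTS))


def solve_rate(target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection stand-in for Goal Seek; PV falls as the rate rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if present_value(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


monthly_rate = solve_rate(PRICE - VOUCHER)   # PV of payments = $1,050
annual_discount_rate = 12 * monthly_rate     # nominal annual rate (assumed)

if __name__ == "__main__":
    print(f"Interest earned with financing: ${interest_earned:.2f}")
    print(f"Voucher with full payment:      ${VOUCHER:.2f}")
    print(f"Implied annual discount rate:   {annual_discount_rate:.2%}")
```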

The zero-percent interest option saves $45.83 whereas the full-payment option saves $50. Hence, it would be better if the full payment is made at the time of purchase. The discount rate for the zero-percent financing option is 28.58%.

66. The following table is used to look up information on a specific product. I chose to show the dependents of cell B2. How can I remove the arrows that are indicating the dependents?

ANSWER: Select "Remove Arrows" in the Formula Auditing Group on the Formulas tab.

67. Using only the "VLOOKUP" function, transfer all of the data from the table below into the "Individual Sales" table.

Individual Sales
Rank  Name     Sales  Team
6     Susan    5      red
8     Thomas   7      red
9     Lynn     2      blue
5     Sean     6      blue
1     Jerry    20     yellow
2     George   12     red
10    Elaine   1      red
7     Delores  4      blue
4     Tony     9      yellow
3     Art      12     blue

Individual Sales
Rank  Name  Sales  Team
1
2
3
4
5
6
7
8
9
10

ANSWER:
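The transfer works because Rank is the leftmost column of the source table, so each rank can serve as the lookup key. A Python sketch of the same idea, with hypothetical names and a dictionary standing in for an exact-match VLOOKUP:

```python
# Source table rows: (rank, name, sales, team), in the order given.
individual_sales = [
    (6, "Susan", 5, "red"),
    (8, "Thomas", 7, "red"),
    (9, "Lynn", 2, "blue"),
    (5, "Sean", 6, "blue"),
    (1, "Jerry", 20, "yellow"),
    (2, "George", 12, "red"),
    (10, "Elaine", 1, "red"),
    (7, "Delores", 4, "blue"),
    (4, "Tony", 9, "yellow"),
    (3, "Art", 12, "blue"),
]

# Index the source rows by rank, the lookup column.
by_rank = {rank: (name, sales, team) for rank, name, sales, team in individual_sales}

# Fill the destination table for ranks 1 to 10, one lookup per row.
ranked = [(rank, *by_rank[rank]) for rank in range(1, 11)]

if __name__ == "__main__":
    for row in ranked:
        print(*row)
```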

68. Using only the "SUMIF" function, complete the "Team Sales."

Individual Sales
Rank  Name     Sales  Team
6     Susan    5      red
8     Thomas   7      red
9     Lynn     2      blue
5     Sean     6      blue
1     Jerry    20     yellow
2     George   12     red
10    Elaine   1      red
7     Delores  4      blue
4     Tony     9      yellow
3     Art      12     blue

Individual Sales
Rank  Name  Sales  Team
1
2
3
4
5
6
7
8
9
10

ANSWER:
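Team totals follow from adding the sales of every row whose team matches. The sketch below is a Python analogue of that SUMIF pattern; the helper and variable names are assumptions made for illustration.

```python
# (name, sales, team) rows from the Individual Sales table.
rows = [
    ("Susan", 5, "red"), ("Thomas", 7, "red"), ("Lynn", 2, "blue"),
    ("Sean", 6, "blue"), ("Jerry", 20, "yellow"), ("George", 12, "red"),
    ("Elaine", 1, "red"), ("Delores", 4, "blue"), ("Tony", 9, "yellow"),
    ("Art", 12, "blue"),
]


def sumif_team(data, team):
    """Total sales of all rows belonging to the given team."""
    return sum(sales for _, sales, row_team in data if row_team == team)


team_sales = {team: sumif_team(rows, team) for team in ("red", "blue", "yellow")}

if __name__ == "__main__":
    for team, total in team_sales.items():
        print(f"{team}: {total}")
```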

69. The following table is used to look up information on a specific product. The Product ID is entered into the box and the information is returned in the green box. What would the lookup formula need to be in C5 to return the information for the Product named in B2?

ANSWER: =VLOOKUP($B$2,$A$14:$F$44,2,FALSE)

70. The following table is used to look up information on a specific product. The Product ID is entered into the box and the information is returned in the green box. What would the lookup formula need to be in C8 to return the information for the Product named in B2?

ANSWER: =VLOOKUP($B$2,$A$14:$F$44,6,FALSE)

71. The following table is used to look up information on a specific product. There are several formulas in the table. How can I look at all the formulas that reside in this worksheet?

ANSWER: Select "Show Formulas" in the Formula Auditing Group on the Formulas tab.

Chapter 11 - Monte Carlo Simulation

Multiple Choice

1. A _____ uses repeated random sampling to represent uncertainty in a model representing a real system and computes the values of model outputs.
a. Monte Carlo simulation
b. what-if analysis
c. deterministic model
d. discrete event simulation
ANSWER: a

2. A simulation model extends spreadsheet modeling by _____.
a. extending the range of parameters for which solutions are computed
b. using real-time values for parameters from the application to formulate solutions
c. replacing the use of single values for parameters with a range of possible values
d. using historical data to make predictions about future values and expected trends
ANSWER: c

3. A description of the range and relative likelihood of possible values of an uncertain variable is known as a _____.
a. risk analysis
b. probability distribution
c. base-case scenario
d. simulation optimization
ANSWER: b

4. A(n) _____ is an input to a simulation model whose value is uncertain and described by a probability distribution.
a. identifier
b. constraint
c. random variable
d. decision variable
ANSWER: c

5. The outcome of a simulation experiment is a(n) _____.
a. objective function
b. probability distribution for one or more output measures
c. single number
d. what-if scenario
ANSWER: b

6. An input to a simulation model that is selected by the decision maker is known as a ______.
a. random variable
b. nonnegativity constraint
c. probable input
d. controllable input
ANSWER: d

7. The process of evaluating a decision in the face of uncertainty by quantifying the likelihood and magnitude of an undesirable outcome is known as _____.
a. risk analysis
b. regression analysis
c. data mining
d. decision tree analysis
ANSWER: a

8. In a base-case scenario, the output is determined by assuming _____.
a. worst values that can be expected for the random variables of a model
b. the mean trial values for the random variables of a model
c. best values that can be expected for the random variables of a model
d. the most likely values for the random variables of a model
ANSWER: d

9. A _____ analysis involves considering alternative values for the random variables and computing the resulting value for the output.
a. random
b. what-if
c. risk
d. cluster
ANSWER: b

10. A disadvantage of the simple what-if analyses is that _____.
a. there are errors induced as a result of rounding
b. the optimal solutions are not guaranteed
c. there is no indication of the likelihood of various output values
d. it cannot compute alternate optimal solutions
ANSWER: c

11. The values for random variables in a Monte Carlo simulation are ______.
a. selected manually
b. generated randomly from probability distributions
c. taken from forecasting analysis
d. derived secondarily using formulas
ANSWER: b

12. The choice of the probability distribution for a random variable can be guided by _____.
a. an objective function
b. likelihood factors
c. forecasting
d. historical data
ANSWER: d

13. Which of the following inferences about a variable of interest can be drawn from the graph given below?

a. The variable is equally likely to take any value between 20 and 40.
b. The variable is more likely to take the value 20 than 40.
c. The variable is more likely to take any value outside the range of 20 and 40.
d. The variable can only take the value 30.
ANSWER: a

14. The type of distribution shown in the graph below is a(n) _____ distribution.

a. uniform
b. normal
c. exponential
d. beta
ANSWER: b

15. In reviewing the graph below, which of the following inferences can be drawn about the monthly salary?

a. The average monthly salary is $3,000.
b. The monthly salary is always less than $3,000.
c. The monthly salary is always greater than $3,000.
d. The range of the monthly salary distribution is $3,000 to $5,000.
ANSWER: a

16. In a _____ distribution, a random variable can take any value in a specified range.
a. discrete probability
b. cumulative
c. relative frequency
d. continuous probability
ANSWER: d

17. A distribution of a random variable for which values extremely larger or smaller than the mean are increasingly unlikely can possibly be modeled as a(n) _____ probability distribution.
a. binomial
b. normal
c. exponential
d. gamma
ANSWER: b

18. In simulation analysis, the _____ of random variables can be adjusted to determine the impact of the assumptions about the shape of the uncertainty on the results.
a. probability distributions
b. ranges
c. relative frequencies
d. manual generations
ANSWER: a

19. A set of values for the random variables is called a(n) _____.
a. event
b. permutation
c. trial
d. combination
ANSWER: c

20. The range of computer-generated random numbers is _____.
a. [–∞, ∞]
b. [–∞, 0)
c. [1, ∞]
d. [0, 1)
ANSWER: d

21. All the values of computer-generated random numbers are _____.
a. Poisson distributed
b. lognormally distributed
c. uniformly distributed
d. normally distributed
ANSWER: c

22. The _____ function is used to generate a pseudorandom number in Excel.
a. FREQUENCY()
b. RAND()
c. NORM.INV()
d. ROUND()
ANSWER: b

23. Which of the following parameters is required to convert a computer-generated random variable into a uniform random variable?
a. Range of the distribution
b. Mean of the distribution
c. Variance of the distribution
d. Moments of the distribution
ANSWER: a

24. The weekly demand for an item in a retail store follows a uniform distribution over the range 70 to 83. What would be the weekly demand if its corresponding computer-generated value is 0.5?
a. 90.1
b. 83
c. 76.5
d. 50.85
ANSWER: c

25. For a given mean and standard deviation, the _____ function in Excel is used to generate a value for the random variable characterized by a normal distribution.
a. NORM.INV
b. RAND
c. VLOOKUP
d. FREQUENCY
ANSWER: a

26. The _____ function in Excel is used to compute the statistics required to create a histogram.
a. NORM.INV
b. RAND
c. FREQUENCY
d. STDEV.S
ANSWER: c

27. The random variables corresponding to the interarrival times of customers and the service times of the servers are commonly part of a(n) _____ simulation.
a. Monte Carlo
b. what-if
c. risk analysis
d. discrete-event
ANSWER: d

28. In Excel, the expression LN(RAND())*(-m) would generate a(n) _____ random variable with mean m.
a. lognormal
b. logarithmic
c. normal
d. exponential
ANSWER: d

29. The Excel function _____ generates integer values between lower and upper bounds.
a. RAND
b. RANDBETWEEN
c. LOWER
d. UPPER
ANSWER: b

30. The process of determining that a computer program implements a simulation model as it is intended is known as _____.
a. validation
b. verification
c. correlation
d. optimization
ANSWER: b

31. Which of the following is true of verification?
a. It is largely a debugging task.
b. It requires an agreement among analysts and managers.
c. It deals with the accurate modeling of real system operations.
d. It is performed prior to the development of the computer procedure for simulation.
ANSWER: a

32. _____ is the process of determining that a simulation model provides an accurate representation of a real system.
Chapter 11 - Monte Carlo Simulation a. Regression b. Verification c. Consideration d. Validation ANSWER: d 33. Which of the following is a disadvantage of using simulation? a. Experimenting directly with a simulation model is often not feasible. b. Each simulation run provides only a sample of how the real system will operate. c. The simulation models are used to describe systems without requiring the assumptions that are required by mathematical models. d. Simulation models warn against poor decision strategies by projecting disastrous outcomes such as system failures, large financial losses, and so on. ANSWER: b 34. Which of the following cannot be described by a discrete probability distribution? a. Sales of two medical devices in which Device A generates $35 per unit sold and will likely constitute 30% of the sales and Device B generates $50 per unit sold and will likely constitute 70% of the sales. b. The labor cost for manufacturing goods, where one-third of the units cost $10 in labor, one-third cost $15 in labor, and one-third cost $50 in labor. c. The cost of parts for manufacturing an item, where the parts can take on any value between $80 and $100. d. The number of units produced in a given day, where 20% of the time 99 units are produced and 80% of the time 100 units are produced. ANSWER: c 35. Which of the following cannot be modeled by a continuous distribution? a. Length of time it takes to manufacture a product b. Height of the finished manufactured product c. Weight of a finished manufactured product d. Number of products produced in an hour ANSWER: d 36. The time it takes to manufacture a product is modeled by a continuous distribution. The time to manufacture one unit can take anywhere from 5 to 6 minutes with equal probability. What distribution can be used to model the random variable, production time? a. Normal distribution b. Uniform distribution c. Discrete probability distribution d. Binomial distribution ANSWER: b 37. 
The profit realized by the sales of a particular item follows a normal distribution with a mean of $0.5 million per quarter and a standard deviation of $0.1 million per quarter. What percent of the quarters can be expected to see a profit of at least $0.5 million?
a. 50%
b. 40%
c. 60%
d. 10%
ANSWER: a

38. Which of the following Excel functions would generate random integers from 0 to 100?
a. =RAND( )
b. =RANDBETWEEN(0, 100)
c. =SUMIF(A1:A100, 100)
d. =100*RAND( )
ANSWER: b

39. Which of the following numbers cannot result from the Excel function =NORM.INV(RAND( ), 100, 10)?
a. 99
b. 115
c. 121
d. All of these numbers can result from this Excel function.
ANSWER: d

40. Which of the following functions computes a value such that 2.5% of the area under the standard normal distribution lies in the upper tail defined by this value?
a. =NORM.S.INV(0.975)
b. =NORM.S.INV(0.025)
c. =NORM.S.INV(0.05)
d. =NORM.S.INV(0.95)
ANSWER: a

Subjective Short Answer

41. Sunseel Industries produces different types of raw materials, and it is interested in using simulation to estimate the profit per unit for its new product X. The selling price for the product will be $40 per unit. Probability distributions for the raw material cost, the production cost, and the marketing cost are estimated as follows.

Raw Material Cost ($) 16 18 20 22

Probability 0.20 0.30 0.35 0.15

Production Cost ($) 10 11 12

Probability 0.25 0.45 0.30

Marketing Cost ($) 5 6

Probability 0.40 0.60

a. Compute profit per unit for the base case, worst case, and best case.
b. Construct a simulation model to estimate the mean profit per unit.
c. Management believes the project may not be sustainable if the profit per unit is less than $2. Use simulation to estimate the probability the profit per unit will be less than $2.

ANSWER:
a. Profit = Selling Price – Raw Material Cost – Production Cost – Marketing Cost
Base case (using the most likely costs): Profit = 40 – 20 – 11 – 6 = $3/unit
Worst case: Profit = 40 – 22 – 12 – 6 = $0/unit
Best case: Profit = 40 – 16 – 10 – 5 = $9/unit
b. The average profit from the simulation model (see below) should be approximately $4.45.

c. As seen in the chart below, there is approximately a 0.09 probability that profit per unit will be less than $2.
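
The simulation behind parts (b) and (c) can also be sketched outside a spreadsheet. The Python sketch below is one possible implementation under stated assumptions, not the textbook's model: the function names, the trial count, and the seed are illustrative choices, and only the cost distributions and the $40 price come from the problem.

```python
import random

# Cost distributions from the problem statement: (value, probability) pairs.
RAW = [(16, 0.20), (18, 0.30), (20, 0.35), (22, 0.15)]
PRODUCTION = [(10, 0.25), (11, 0.45), (12, 0.30)]
MARKETING = [(5, 0.40), (6, 0.60)]
PRICE = 40


def draw(dist, rng):
    """Draw one value from a discrete distribution given as (value, prob) pairs."""
    values, weights = zip(*dist)
    return rng.choices(values, weights=weights, k=1)[0]


def simulate_profit(trials=100_000, seed=1):
    """Return (mean profit per unit, estimated P(profit < $2)).

    trials and seed are arbitrary illustrative settings.
    """
    rng = random.Random(seed)
    profits = [
        PRICE - draw(RAW, rng) - draw(PRODUCTION, rng) - draw(MARKETING, rng)
        for _ in range(trials)
    ]
    mean = sum(profits) / trials
    p_below_2 = sum(p < 2 for p in profits) / trials
    return mean, p_below_2


mean_profit, p_below_2 = simulate_profit()
print(f"mean profit per unit: {mean_profit:.2f}")
print(f"P(profit < 2): {p_below_2:.3f}")
```

Because each cost takes only a few values, the estimates can be checked by hand: the expected costs are 18.90, 11.05, and 5.60, so the mean profit is 40 – 35.55 = 4.45, and the three cost combinations that leave less than $2 of profit have a combined probability of 0.0855.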


42. Salemach Corporation is a start-up company that manufactures simple machines. It is interested in analyzing the profit from a new machine. It estimates that the selling price will be $150 per unit and the setup and advertising costs will total $250,000. The company estimates that the per-unit raw material cost is uniformly distributed between $50 and $80. The demand is normally distributed with a mean of 12,000 units and a standard deviation of 3,000 units. The probability distribution for the labor cost per unit is given below.

Labor Cost $52 $53 $54 $55 $56

Probability 0.05 0.25 0.40 0.25 0.05

a. Obtain estimates for the mean profit, maximum profit, minimum profit, and standard deviation of profit.
b. What is your estimate of the probability of a loss?

ANSWER:
a. Estimated mean profit = $121,702; maximum profit = $725,148; minimum profit = –$232,635; profit standard deviation = $140,713.
b. Estimated probability of a loss = 0.2106
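
A minimal Python sketch of such a model follows. It assumes profit = (price – raw material cost – labor cost) × demand – fixed cost, and it truncates negative demand draws at zero; both are modeling assumptions made for illustration, as are the names, trial count, and seed. The maximum and minimum vary considerably from run to run, so only the mean, standard deviation, and loss probability are reasonably stable.

```python
import random
import statistics

PRICE = 150.0
FIXED_COST = 250_000.0
LABOR_COSTS = [52, 53, 54, 55, 56]
LABOR_PROBS = [0.05, 0.25, 0.40, 0.25, 0.05]


def simulate_profits(trials=100_000, seed=1):
    """Simulate total profit; trials and seed are illustrative settings."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        raw = rng.uniform(50, 80)  # raw material cost per unit
        labor = rng.choices(LABOR_COSTS, weights=LABOR_PROBS, k=1)[0]
        # Assumption: a negative normal draw is treated as zero demand.
        demand = max(rng.gauss(12_000, 3_000), 0.0)
        profits.append((PRICE - raw - labor) * demand - FIXED_COST)
    return profits


profits = simulate_profits()
mean_profit = statistics.fmean(profits)
sd_profit = statistics.stdev(profits)
p_loss = sum(p < 0 for p in profits) / len(profits)

print(f"mean: {mean_profit:,.0f}  sd: {sd_profit:,.0f}")
print(f"max: {max(profits):,.0f}  min: {min(profits):,.0f}")
print(f"P(loss): {p_loss:.4f}")
```

As a sanity check, the mean unit margin is 150 – 65 – 54 = $31, so the expected profit is 31 × 12,000 – 250,000 = $122,000, which is consistent with the estimate reported above.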


43. A company has produced a new battery with an estimated mean lifetime of 60 hours. Management also believes that the standard deviation is 4.5 hours and that battery lifetimes are normally distributed. To promote the new battery, management has offered to refund some money if the battery fails to reach 50 hours before it needs to be recharged. Specifically, for batteries with a lifetime below 50 hours, management will refund a customer $50 per hour short of 50 hours.
a. For each battery sold, what is the expected cost of the promotion?
b. At how many hours should the company set the promotion claim if it wants the expected cost to be $0.50?

ANSWER:
a. The average cost of the promotion per battery is approximately $1.03.

b. The solution is obtained using a simulation optimization model with an objective of setting the expected value of cost (in cell B12) to $0.50 and setting cell B4 to be the decision variable. The solution obtained will vary slightly across optimization runs, but when rounded, a promotion claim of 49 hours will result in an average/expected cost of $0.50.
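
The expected refund can be estimated by simulation and, because the lifetime is normal, checked against a closed form: for a claim of c hours, E[max(c – X, 0)] = σ(φ(z) + zΦ(z)) with z = (c – μ)/σ. The Python sketch below is an illustration under those assumptions; the function names, trial count, and seed are hypothetical, and it does not reproduce the spreadsheet cells referenced above.

```python
import math
import random

MEAN_LIFE, SD_LIFE = 60.0, 4.5  # hours, from the problem statement
REFUND_PER_HOUR = 50.0          # dollars per hour short of the claim


def expected_refund_exact(claim):
    """Closed-form expected refund per battery for a normal lifetime."""
    z = (claim - MEAN_LIFE) / SD_LIFE
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return REFUND_PER_HOUR * SD_LIFE * (pdf + z * cdf)


def expected_refund_sim(claim, trials=200_000, seed=1):
    """Monte Carlo estimate of the same quantity (illustrative settings)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        life = rng.gauss(MEAN_LIFE, SD_LIFE)
        total += REFUND_PER_HOUR * max(claim - life, 0.0)
    return total / trials


exact_50 = expected_refund_exact(50)
sim_50 = expected_refund_sim(50)
exact_49 = expected_refund_exact(49)
print(f"claim 50 h: exact {exact_50:.2f}, simulated {sim_50:.2f}")
print(f"claim 49 h: exact {exact_49:.2f}")
```

Refunds are rare but large, so the simulated value is noisy even with many trials. Evaluating the closed form over a range of claims is a quick way to see which claim brings the expected cost near the $0.50 target.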


44. An investment firm offers free financial planning seminars at major hotels for groups of 30 individuals. Each seminar costs the firm $4,000, and the average first-year commission for each new enrollment is $6,000. The firm estimates that for each individual attending the seminar, there is a 0.05 probability that he/she will enroll.
a. Determine the equation for computing the profit per seminar, given values of the relevant parameters.
b. Construct a spreadsheet simulation model to analyze the profitability of the seminars. Would you recommend that the investment firm continue running the seminars?

ANSWER:
a. Profit = (New Enrollment × 6,000) – 4,000.
b. The expected profit from a seminar is $5,000. Hence, the investment firm can continue conducting seminars.
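
The following Python sketch shows one way to simulate the seminar, treating each of the 30 attendees as an independent enrollment trial. The names, trial count, and seed are illustrative assumptions rather than part of the original solution.

```python
import random

ATTENDEES = 30
P_ENROLL = 0.05
COMMISSION = 6_000
SEMINAR_COST = 4_000


def seminar_profit(rng):
    """Profit of one seminar: each attendee enrolls independently."""
    enrollments = sum(rng.random() < P_ENROLL for _ in range(ATTENDEES))
    return enrollments * COMMISSION - SEMINAR_COST


def simulate(trials=50_000, seed=1):
    """Return (mean profit, probability of a loss); settings are illustrative."""
    rng = random.Random(seed)
    profits = [seminar_profit(rng) for _ in range(trials)]
    mean_profit = sum(profits) / trials
    p_loss = sum(p < 0 for p in profits) / trials
    return mean_profit, p_loss


mean_profit, p_loss = simulate()
print(f"mean profit: {mean_profit:,.0f}")
print(f"P(loss): {p_loss:.3f}")
```

The mean agrees with 30 × 0.05 × 6,000 – 4,000 = $5,000. The simulation also shows something the mean hides: a seminar loses money whenever nobody enrolls, which happens with probability 0.95^30, or about 0.21.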


45. The stock price of Robin Tires, Inc., listed on the Stock Exchange, is currently $20. The following probability distribution shows how the price per share is expected to change over a three-month period:

Stock Price Change ($)   Probability
–3                       0.25
–2                       0.20
–1                       0.05
0                        0.15
+1                       0.10
+2                       0.15
+3                       0.10

a. Construct a spreadsheet simulation model that computes the value of the stock price in 3 months, 6 months, 9 months, and 12 months under the assumption that the change in price over any 3-month period is independent of the change in price over any other 3-month period.
b. With the current price of $20 per share, simulate the price per share for the next four 3-month periods. What is the average stock price in 12 months? What is the standard deviation of the stock price in 12 months?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.

b. The mean stock price after 12 months is $18 and the standard deviation is $4.20.

46. A football tournament is conducted between Team A and Team B of a college, with the winner being the first team to win four games out of seven. The probability that Team A wins each game is as follows:

Game                 1      2      3      4      5      6      7
Probability of Win   0.48   0.50   0.45   0.60   0.55   0.40   0.55

a. Set up a spreadsheet simulation model in which whether Team A wins each game is a random variable.
b. What is the probability that Team A wins the tournament?
c. What is the average number of games played regardless of winner?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.


b. Team A has approximately 0.507 probability of winning the Football Tournament.

c. The average length of the tournament is approximately 5.757 games.
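
A Python sketch of the tournament simulation follows, together with an exact check of the win probability. The check uses the fact that winning a first-to-four series is equivalent to winning at least four games if all seven were played. All names, the trial count, and the seed are illustrative assumptions.

```python
import random

# Probability that Team A wins games 1..7, from the problem statement.
P_A_WINS = [0.48, 0.50, 0.45, 0.60, 0.55, 0.40, 0.55]


def play_series(rng):
    """Play until one team has four wins; return (A won?, games played)."""
    wins_a = wins_b = 0
    for p in P_A_WINS:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
        if wins_a == 4 or wins_b == 4:
            break
    return wins_a == 4, wins_a + wins_b


def exact_win_probability():
    """P(A wins at least 4 of 7), which equals P(A wins the series)."""
    dist = [1.0]  # dist[k] = P(A has won k games so far)
    for p in P_A_WINS:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)
            new[k + 1] += prob * p
        dist = new
    return sum(dist[4:])


def simulate(trials=100_000, seed=1):
    """Return (estimated P(A wins), average games played)."""
    rng = random.Random(seed)
    a_wins = games = 0
    for _ in range(trials):
        won, played = play_series(rng)
        a_wins += won
        games += played
    return a_wins / trials, games / trials


p_a, avg_games = simulate()
p_exact = exact_win_probability()
print(f"P(A wins): simulated {p_a:.3f}, exact {p_exact:.3f}")
print(f"average games played: {avg_games:.2f}")
```

The exact win probability is about 0.51, in line with the estimate above. Estimates from a spreadsheet run with fewer trials can differ from these values in the second decimal place.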


47. The quality of a device should be examined in the inspection department sequentially in three steps before it is sent to the packaging department. The probability distributions for the time required to complete each of the activities are as follows:

Step   Time (minutes)   Probability
1      3                0.15
       5                0.25
       7                0.35
       9                0.25
2      11               0.25
       13               0.30
       15               0.45
3      8                0.35
       10               0.20
       12               0.45

a. Construct a spreadsheet simulation model to estimate the average time spent in the inspection department and the standard deviation of the time spent in the inspection department.
b. What is the estimated probability that the inspection will be completed in 32 minutes or less?

ANSWER:
a. Expected duration is 30 minutes with a standard deviation of 3.19 minutes.
b. Probability of completing the inspection in 32 minutes or less is 0.8171.
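
Because the three steps have only 4 × 3 × 3 = 36 possible time combinations, the simulation estimates can be checked by exact enumeration. The Python sketch below does this instead of sampling; it is a verification aid with illustrative names and assumes the three step times are independent. It is not the spreadsheet model itself.

```python
from itertools import product
from math import sqrt

# Time (minutes) -> probability for each inspection step.
STEPS = [
    {3: 0.15, 5: 0.25, 7: 0.35, 9: 0.25},
    {11: 0.25, 13: 0.30, 15: 0.45},
    {8: 0.35, 10: 0.20, 12: 0.45},
]


def total_time_distribution():
    """Exact distribution of total time, assuming independent steps."""
    dist = {}
    for combo in product(*(step.items() for step in STEPS)):
        total = sum(time for time, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        dist[total] = dist.get(total, 0.0) + prob
    return dist


dist = total_time_distribution()
mean = sum(t * p for t, p in dist.items())
sd = sqrt(sum((t - mean) ** 2 * p for t, p in dist.items()))
p_within_32 = sum(p for t, p in dist.items() if t <= 32)

print(f"mean: {mean:.2f}  sd: {sd:.2f}  P(total <= 32): {p_within_32:.4f}")
```

The exact mean is 30 minutes, the exact standard deviation is about 3.14 minutes, and the exact probability of finishing within 32 minutes is about 0.822. A simulation with a finite number of trials, such as the one reported above, should land near these values.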


48. Team X is scheduled to play against Team Y in an upcoming game in Baseball’s World Series. Assume that each player’s point production can be represented as an integer uniform variable with the ranges provided in the following table:

Player   Team X    Team Y
1        [4,10]    [2,4]
2        [2,6]     [14,30]
3        [7,20]    [2,20]
4        [3,5]     [1,10]
5        [6,20]    [8,20]
6        [5,10]    [7,12]
7        [7,10]    [14,20]
8        [12,40]   [3,5]
9        [9,20]    [14,25]

a. Develop a spreadsheet model that simulates the points scored by each team.
b. What are the average and standard deviation of points scored by Team X? What is the shape of the distribution of points scored by Team X?
c. What are the average and standard deviation of points scored by Team Y? What is the shape of the distribution of points scored by Team Y?
d. Let Point Differential = Team X points – Team Y points. What is the average point differential between the two teams? What is the standard deviation in the point differential? What is the shape of the point differential distribution?
e. What is the probability that Team X scores more points than Team Y?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.


b. Team X scores an average of approximately 98 points with a standard deviation of 11.25 points. The distribution of points is bell-shaped (approximately normal).

c. Team Y scores an average of approximately 105.50 points with a standard deviation of 9.87 points. The distribution of points is bell-shaped (approximately normal).


d. The average point differential is –7.5 points with a standard deviation of 14.95 points. The distribution of point differential is bell-shaped (approximately normal). These observations provide empirical proof that the difference of two independent normal random variables is normally distributed with a mean equal to the difference in the means of the two underlying random variables and a variance equal to the sum of the variances of the two underlying random variables.

e. Team X has approximately a 0.2991 probability of scoring more points than Team Y.
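
The team totals are sums of independent integer uniform variables, so they are straightforward to simulate. The Python sketch below is one illustrative implementation; the names, trial count, and seed are assumptions, and the player ranges are those listed in the problem.

```python
import random
import statistics

# (low, high) integer uniform ranges for players 1..9.
TEAM_X = [(4, 10), (2, 6), (7, 20), (3, 5), (6, 20), (5, 10), (7, 10), (12, 40), (9, 20)]
TEAM_Y = [(2, 4), (14, 30), (2, 20), (1, 10), (8, 20), (7, 12), (14, 20), (3, 5), (14, 25)]


def team_points(ranges, rng):
    """Total points: one integer uniform draw per player (both ends inclusive)."""
    return sum(rng.randint(lo, hi) for lo, hi in ranges)


def simulate(trials=50_000, seed=1):
    """Return lists of Team X points, Team Y points, and point differentials."""
    rng = random.Random(seed)
    x = [team_points(TEAM_X, rng) for _ in range(trials)]
    y = [team_points(TEAM_Y, rng) for _ in range(trials)]
    diff = [a - b for a, b in zip(x, y)]
    return x, y, diff


x, y, diff = simulate()
mean_x, sd_x = statistics.fmean(x), statistics.stdev(x)
mean_y, sd_y = statistics.fmean(y), statistics.stdev(y)
mean_diff, sd_diff = statistics.fmean(diff), statistics.stdev(diff)
p_x_wins = sum(d > 0 for d in diff) / len(diff)

print(f"Team X: mean {mean_x:.2f}, sd {sd_x:.2f}")
print(f"Team Y: mean {mean_y:.2f}, sd {sd_y:.2f}")
print(f"Differential: mean {mean_diff:.2f}, sd {sd_diff:.2f}")
print(f"P(X scores more than Y): {p_x_wins:.4f}")
```

The means can be confirmed by summing the midpoints of the ranges: 98 for Team X and 105.5 for Team Y.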


49. A toy company designs a new toy car this season. The fixed cost to produce the car is $120,000. The variable cost, which includes raw materials, production, and shipping costs, is $40 per car. The company will sell the car for $48 each. A distributor has agreed to pay the toy company $12 for each car remaining after the retail selling season. Forecasts are for expected sales of 55,000 toy cars with a standard deviation of 9,000. The normal probability distribution is assumed to be a good description of the demand. The management has tentatively decided to produce 55,000 units (the same as average demand), but it wants to conduct an analysis regarding this production quantity before finalizing the decision.
a. Create a what-if spreadsheet model using formulas that relate the values of production quantity, demand, sales, revenue from sales, amount of surplus, revenue from sales of surplus, total cost, and net profit. What is the profit corresponding to average demand (55,000 units)?
b. Modeling demand as a normal random variable with a mean of 55,000 and a standard deviation of 9,000, simulate the sales of the toy car using a production quantity of 55,000 units. What is the estimate of the average profit associated with the production quantity of 55,000 cars? How does this compare to the profit corresponding to the average demand (as computed in part a)?
c. Before making a final decision on the production quantity, management wants an analysis of a more aggressive 65,000-unit production quantity and a more conservative 45,000-unit production quantity. Run your simulation with these two production quantities. What is the mean profit associated with each?

ANSWER:
a. Profit equals $320,000 when demand is equal to its average of 55,000 units.


b. Average profit is approximately $190,742. Average profit is less than the profit corresponding to average demand. This phenomenon is often called the Flaw of Averages.
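
The gap between the profit at average demand and the average profit can be reproduced with a short simulation. The Python sketch below assumes sales = min(production, demand), with surplus sold to the distributor at $12. It also includes a closed-form check that uses the expected surplus for normal demand, σ(φ(z) + zΦ(z)) with z = (Q – μ)/σ. The names, trial count, seed, and the truncation of demand at zero are illustrative assumptions.

```python
import math
import random

FIXED_COST = 120_000
UNIT_COST = 40
PRICE = 48
SALVAGE = 12
MEAN_DEMAND, SD_DEMAND = 55_000, 9_000


def profit(quantity, demand):
    """Net profit for a given production quantity and realized demand."""
    sales = min(quantity, demand)
    surplus = quantity - sales
    return PRICE * sales + SALVAGE * surplus - FIXED_COST - UNIT_COST * quantity


def simulated_mean_profit(quantity, trials=200_000, seed=1):
    """Monte Carlo estimate of mean profit (illustrative settings)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: negative normal draws are treated as zero demand.
        demand = max(rng.gauss(MEAN_DEMAND, SD_DEMAND), 0.0)
        total += profit(quantity, demand)
    return total / trials


def exact_mean_profit(quantity):
    """Closed-form mean profit for normal demand (ignores truncation at zero)."""
    z = (quantity - MEAN_DEMAND) / SD_DEMAND
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    expected_surplus = SD_DEMAND * (pdf + z * cdf)
    expected_sales = quantity - expected_surplus
    return (PRICE * expected_sales + SALVAGE * expected_surplus
            - FIXED_COST - UNIT_COST * quantity)


profit_at_mean_demand = profit(55_000, 55_000)
sim_55 = simulated_mean_profit(55_000)
exact_55 = exact_mean_profit(55_000)
print(f"profit at average demand: {profit_at_mean_demand:,.0f}")
print(f"mean profit at Q = 55,000: simulated {sim_55:,.0f}, exact {exact_55:,.0f}")
```

Profit is capped at $320,000 when demand exceeds production but falls by $36 for every unsold car, so averaging over demand can only pull the result below the profit at average demand. Calling either function with a different production quantity evaluates other production plans in the same way.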


c. When producing 45,000 units, the average profit is approximately $218,251. When producing 65,000 units, the average profit is approximately $18,252.

50. The manager of a company decides to arrange a party to celebrate a promotion and has invited 50 guests for dinner. The following table contains information on the number of RSVP’ed guests. He assumes that the 12 people who responded with 0 guests will not turn up. He also estimates that each of the 13 guests planning to come solo has a 65 percent chance of attending alone, a 30 percent chance of not attending, and a 5 percent chance of bringing a companion. For each of the 20 guests who plan to bring a companion, there is a 75 percent chance that she or he will attend with a companion, a 10 percent chance of attending solo, and a 15 percent chance of not attending at all. For the 5 people who have not responded, the manager assumes that there is an 85 percent chance that each will not attend, a 10 percent chance they will attend alone, and a 5 percent chance they will attend with a companion.

Guests                  0    1    2    No response
Number of Invitations   12   13   20   5

a. Assist the manager by constructing a spreadsheet simulation model to determine the expected number of guests who will attend the party.
b. Use the Monte Carlo simulation model to determine X, the minimum number of guests for whom the dinner needs to be ordered, so that there is at least a 95 percent chance that the actual attendance is less than or equal to X. What is the best estimate for the value of X?

ANSWER:
a. The expected number of attendees is 42.75 ≈ 43.
b. P(actual attendance ≤ 49) ≈ 96%. So, 49 guests would be a relatively safe number on which to base dinner preparations.
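
A possible Python sketch of the attendance simulation is shown below. Each invitation is simulated independently using the probabilities in the problem; the 12 invitations answered with 0 guests contribute nothing and are omitted. The names, trial count, and seed are illustrative assumptions.

```python
import random

# (number of invitations, {party size: probability}) for each RSVP group.
GROUPS = [
    (13, {1: 0.65, 0: 0.30, 2: 0.05}),  # planned to come solo
    (20, {2: 0.75, 1: 0.10, 0: 0.15}),  # planned to bring a companion
    (5, {0: 0.85, 1: 0.10, 2: 0.05}),   # no response
]


def attendance(rng):
    """Total number of people attending in one simulated party."""
    total = 0
    for invitations, sizes in GROUPS:
        outcomes = list(sizes)
        weights = list(sizes.values())
        total += sum(rng.choices(outcomes, weights=weights, k=invitations))
    return total


def smallest_order(results, service=0.95):
    """Smallest X with an estimated P(attendance <= X) of at least `service`."""
    n = len(results)
    x = min(results)
    while sum(r <= x for r in results) / n < service:
        x += 1
    return x


rng = random.Random(1)  # arbitrary seed
results = [attendance(rng) for _ in range(50_000)]
mean_attendance = sum(results) / len(results)
x95 = smallest_order(results)
print(f"expected attendance: {mean_attendance:.2f}")
print(f"smallest X with P(attendance <= X) >= 0.95: {x95}")
```

The expected value can be confirmed by hand: 13 × 0.75 + 20 × 1.6 + 5 × 0.2 = 42.75.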


51. A distributor has generated a rough estimate of aftershave demand at its retail store. The distributor is confident that demand will range from 100 to 650. The following table lists weights for demand values within this range.

Demand   230    330    430    530
Weight   0.15   0.25   0.35   0.25

The distributor pays a wholesale price of $21 per aftershave and then sells at a retail price of $31.
a. Construct a spreadsheet model that computes net profit corresponding to a given level of demand and specified order quantity. Model demand as a random variable with ASP’s custom general distribution.
b. Using simulation optimization, determine the order quantity that maximizes expected profit. What is the probability of running out of aftershave at this order quantity?
c. How many aftershaves does the distributor need to order so that the probability of running out of aftershaves is only 25 percent? How much expected profit will the distributor lose if it orders this amount rather than the amount from part (b)?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Model.


b. An order quantity of about 345 maximizes the expected profit at a value of approximately $2,638. There is a 68% chance of running out of aftershaves at this order quantity.
c. As shown in the figure below, the 75th percentile of demand is 489. Therefore, an order quantity of 489 ensures only a 25% chance of running out of aftershaves. Re-running the simulation with this order quantity results in an expected profit of $1,653. Thus, the distributor will be losing $985 in expected profit if it decides to achieve a 75% service level rather than maximizing expected profit.

52. Consider the table below with information regarding each activity, immediate predecessors, and duration estimates (in minutes) for each activity.

Activity   Immediate Predecessors   Minimum Time   Likely Time   Maximum Time
A          —                        6              7             9
B          —                        4              9             11
C          A                        3              9             12
D          B                        2              6             12
E          B, C                     3              5             10
F          D                        4              7             9
G          E                        4              7             15

a. Using the PERT distribution in ASP to represent the duration of each activity, construct a simulation model to compute the total time to complete the task.
b. What is the expected duration of the entire project? What is the standard deviation of the project duration?
c. What is the likelihood that the project will be complete within 26 minutes?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.

b. The expected project duration is 29.03 minutes. The standard deviation of the project duration is 2.9 minutes.
c. The project has an estimated 0.1528 probability of being completed within 26 minutes.

53. A specialty hedge fund is considering the purchase of a Jackson Pollock painting. It estimates the value of the painting to be $185 million. In an auction, both the number of competing bids and the amount of the competing bids are uncertain. The hedge fund has maintained a file summarizing 10 recent art auctions that it believes are similar to the upcoming auction. It is considering a bid of $163 million and would like to evaluate its chances of winning the upcoming auction with this bid.

          Bid Amount (As Fraction of Estimated Share Value)
Company   1       2       3       4       5       6       7
1         0.817   0.884   0.756
2         0.771   0.863   0.825   0.819   0.851   0.786
3         0.804   0.851   0.786   0.851
4         0.880   0.756   0.874   0.877   0.910
5         0.890   0.804   0.819   0.860   0.880   0.880
6         0.851   0.786   0.896   0.784   0.792   0.792
7         0.881   0.786   0.804   0.819
8         0.804   0.819   0.860   0.880   0.773   0.824
9         0.819   0.896   0.877   0.860   0.784   0.819   0.880
10        0.756   0.804   0.786   0.786   0.819   0.885

a. Construct a spreadsheet simulation model to determine the likelihood of the hedge fund winning the auction. Use a discrete uniform distribution between the minimum and maximum number of bidders in the 10 observed auctions to model the number of bidders in the Jackson Pollock auction. Fit a realistic distribution to the bid data to generate values of competing bid amounts.
b. For a bid amount of $163 million, estimate the probability of the hedge fund winning the auction.

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.


b. The probability of the hedge fund winning the auction at a bid amount of $163 million is 0.34.

54. A specialty hedge fund is considering the purchase of a Jackson Pollock painting. It estimates the value of the painting to be $185 million. In an auction, both the number of competing bids and the amount of the competing bids are uncertain. The hedge fund has maintained a file summarizing 10 recent art auctions that it believes are similar to the upcoming auction. It is considering a bid of $163 million and would like to evaluate its chances of winning the upcoming auction with this bid.

          Bid Amount (As Fraction of Estimated Share Value)
Company   1       2       3       4       5       6       7
1         0.817   0.884   0.756
2         0.771   0.863   0.825   0.819   0.851   0.786
3         0.804   0.851   0.786   0.851
4         0.880   0.756   0.874   0.877   0.910
5         0.890   0.804   0.819   0.860   0.880   0.880
6         0.851   0.786   0.896   0.784   0.792   0.792
7         0.881   0.786   0.804   0.819
8         0.804   0.819   0.860   0.880   0.773   0.824
9         0.819   0.896   0.877   0.860   0.784   0.819   0.880
10        0.756   0.804   0.786   0.786   0.819   0.885

a. Construct a spreadsheet simulation model for this auction. Use a discrete uniform distribution between the minimum and maximum number of bidders in the 10 observed auctions to model the number of bidders in the Jackson Pollock auction. Fit a realistic distribution to the bid data to generate values of competing bid amounts. Use ASP to apply simulation optimization to determine the hedge fund’s bid amount that maximizes the expected return = P(winning auction)*(185 – bid amount). (Hint: Placing reasonable bounds on the highest and lowest possible bid amount will greatly assist the optimization algorithm.)
b. What is the probability that the hedge fund wins the auction if it bids the amount that maximizes its expected return?

ANSWER:
a. Refer to the screenshot below for the Spreadsheet Simulation Model.


A bid amount of about $168.35 million maximizes the expected return on the auction at a value of approximately $16.65 million.
b. A bid amount of $168.35 million has nearly a 100% chance of winning the auction. This is because the maximum competitor bid is 0.91 × 185 = $168.35 million.

55. A store is offering a discount on 800 pairs of basketball shoes. The amount of the discount varies and is not revealed to the customer until checkout. The distribution of discounts is given in the table below.

Discount Rate (%)   5     20    35    50   65   90
Number of Tags      250   220   120   80   70   60

How many pairs of shoes does a customer have to buy so that, on average, he has purchased five carrying a 65% or 90% discount? (Hint: Use the hypergeometric distribution in ASP to answer this question.)

ANSWER: A customer needs to buy 31 pairs of shoes in order to have purchased an average of at least five pairs carrying a 65% or 90% discount.
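
The answer follows from the mean of the hypergeometric distribution, n × K / N, where K = 70 + 60 = 130 of the N = 800 tags carry a 65% or 90% discount. The Python sketch below computes the smallest n whose mean reaches five and checks it by sampling without replacement. The names, trial count, and seed are illustrative assumptions.

```python
import math
import random

TAGS = {5: 250, 20: 220, 35: 120, 50: 80, 65: 70, 90: 60}
TOTAL = sum(TAGS.values())    # 800 pairs in all
TARGET = TAGS[65] + TAGS[90]  # 130 pairs with a 65% or 90% discount


def expected_hits(pairs_bought):
    """Hypergeometric mean: n * K / N."""
    return pairs_bought * TARGET / TOTAL


def pairs_needed(target_average):
    """Smallest n whose expected number of deep-discount pairs reaches the target."""
    return math.ceil(target_average * TOTAL / TARGET)


def simulated_hits(pairs_bought, trials=20_000, seed=1):
    """Average deep-discount pairs when sampling tags without replacement."""
    rng = random.Random(seed)
    population = [1] * TARGET + [0] * (TOTAL - TARGET)
    total = 0
    for _ in range(trials):
        total += sum(rng.sample(population, pairs_bought))
    return total / trials


n = pairs_needed(5)
print(f"pairs needed: {n}")
print(f"expected deep-discount pairs at n = {n}: {expected_hits(n):.3f}")
print(f"simulated average at n = {n}: {simulated_hits(n):.3f}")
```

Thirty pairs give an expected 4.875 deep-discount pairs, just short of five, while 31 pairs give about 5.04.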


56. A store is offering a discount on 800 pairs of basketball shoes. The amount of the discount varies and is not revealed to the customer until checkout. The distribution of discounts is given in the below table. Discount Rate (%) 5 20 35 50 65 90

Number of Tags 250 220 120 80 70 60

Use the negative binomial distribution to approximate the average number of pairs of shoes that a customer has to buy before purchasing two pairs with a discount of at least 50%. ANSWER: On average, a customer will buy 5.62 pairs of shoes with less than a 50% discount before having purchased two pairs of shoes with a 50% or 65% or 90% discount. Thus, on average, a customer purchases 5.62 + 2 = 7.62 pairs of shoes to obtain two pairs with a discount of at least 50%.
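
The figure of 5.62 is the mean of a negative binomial distribution: with success probability p = (80 + 70 + 60)/800 = 0.2625 and r = 2 required successes, the expected number of failures before the r-th success is r(1 – p)/p. The Python sketch below computes this and checks it by simulation. As in the question, it treats each purchase as an independent draw, which is an approximation because tags are actually drawn without replacement. The names, trial count, and seed are illustrative assumptions.

```python
import random

P_SUCCESS = (80 + 70 + 60) / 800  # tag shows a discount of at least 50%
SUCCESSES_NEEDED = 2


def expected_failures(r=SUCCESSES_NEEDED, p=P_SUCCESS):
    """Negative binomial mean: failures before the r-th success."""
    return r * (1 - p) / p


def simulated_total_pairs(r=SUCCESSES_NEEDED, p=P_SUCCESS, trials=50_000, seed=1):
    """Average number of pairs bought until r pairs have a 50%+ discount."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bought = successes = 0
        while successes < r:
            bought += 1
            if rng.random() < p:
                successes += 1
        total += bought
    return total / trials


failures = expected_failures()
expected_total = failures + SUCCESSES_NEEDED
sim_total = simulated_total_pairs()
print(f"expected pairs below 50% discount: {failures:.2f}")
print(f"expected total pairs: {expected_total:.2f} (simulated {sim_total:.2f})")
```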


57. An entrepreneur who operates a cellular phone store orders inventory of cell phones based on four internal memory specifications – 8 GB, 16 GB, 32 GB, and 64 GB. She wants to evaluate her inventory ordering policy for the phones with four different amounts of internal memory. Any phone unsold at the end of a period is kept in inventory for the next sales period and incurs a holding cost expressed as 10% of the cost per unit per period. If demand during a period exceeds supply for a phone, then the sale is lost. The data on the cost and selling prices of the cell phones categorized by these memory specifications are known, and representative data on the past sales are also available.

Internal Memory     8 GB   16 GB   32 GB   64 GB
Cost ($)            420    500     590     670
Selling Price ($)   430    520     620     700

Past Sales:

Period   8 GB   16 GB   32 GB   64 GB
1        276    274     327     332
2        302    211     264     329
3        299    276     306     239
4        261    258     232     277
5        260    244     278     220
6        306    252     262     252
7        227    213     290     253
8        228    217     308     250
9        205    215     246     277
10       236    237     272     284
11       279    251     301     282
12       303    271     271     282
13       266    252     276     317
14       263    306     285     310
15       210    250     328     253
16       247    230     317     340
17       314    285     289     332
18       283    304     312     256


19 20

283 252

245 237

294 262

272 323

a. Construct a spreadsheet simulation model to estimate the total profit the entrepreneur earns in a period when ordering 300 units of each cell phone. To model the respective cell phone demands, fit a realistic distribution to the sales data. What is the average total profit? What is the estimated likelihood that the entrepreneur makes less than $10,000 next period?
b. Using Spearman rank correlation, compute the correlations between the demands for the cell phones based on the four memory specifications. Incorporate a correlation matrix to capture the interrelationships between the demands for each cell phone type. What is the average total profit? What is the estimated likelihood that the entrepreneur makes less than $10,000 next period? Comparing these answers to part (a), conclude how the correlated demand affects the model’s implications.

ANSWER:
a. The spreadsheet simulation model is shown below. The average total profit earned by the entrepreneur is about $17,515. Also, the estimated probability that the entrepreneur makes less than $10,000 next period is 0.0278.


b. The calculation of ranks, using sales data, for computing the correlations between the demands for the cell phones based on the four memory specifications is given below:

The average total profit earned by the entrepreneur is about $17,515. Also, there is a 0.0435 probability that the total profit made by the entrepreneur is less than $10,000 next period. Comparing these results with the answers in part (a), we see that considering the correlation between demands does not have a large impact on the profit estimation, but correlated demand does suggest a slightly higher chance of more extreme outcomes.

58. A branded store has outlets around the world that generate profit in the British pound, the New Zealand kiwi, and the Japanese yen. At the end of each quarter, the store converts the revenue from these three international outlets back into U.S. dollars, exposing itself to exchange rate risk. The current exchange rates are US$1.56 per £1, US$0.85 per NZD$1, and US$0.02 per ¥1. The management of the store wants to construct a simulation model to assess its vulnerability to uncertain exchange rate fluctuations. The quarterly profits earned in British pounds, New Zealand kiwis, and Japanese yen are £150,000, NZD$200,000, and ¥9,000,000, respectively. The data are given below.

USD/GBP    USD/NZD    USD/JPY
6.85%      4.25%      8.84%
2.74%      6.96%      –1.83%
4.60%      13.54%     6.38%
4.73%      6.42%      3.59%
2.99%      3.27%      2.63%
6.62%      7.41%      9.23%
8.78%      10.64%     2.88%
–0.84%     –5.30%     –1.84%
3.91%      7.04%      2.25%
6.38%      8.44%      5.22%
1.86%      5.01%      2.35%
–0.36%     0.22%      –0.68%
–3.73%     0.41%      –2.14%
–0.87%     3.73%      –2.97%
2.67%      –0.31%     2.64%
4.51%      –4.64%     4.86%
6.57%      3.28%      1.49%
2.30%      9.32%      1.66%
2.90%      4.39%      2.34%
3.71%      7.45%      –0.63%
2.91%      1.16%      6.10%
3.91%      5.73%      4.66%
–1.47%     5.18%      8.43%
–0.01%     0.47%      3.82%
–0.37%     –7.40%     0.39%
–17.09%    –17.84%    12.47%
–8.32%     –5.49%     6.72%
8.58%      15.48%     –0.85%
8.02%      14.63%     5.80%
3.22%      11.91%     7.38%
–3.20%     –2.46%     0.31%
2.96%      4.21%      0.67%
3.34%      4.85%      6.23%
4.32%      3.97%      7.53%
5.92%      4.99%      0.28%
2.41%      6.19%      6.75%
2.44%      7.99%      15.23%
2.09%      2.77%      –9.44%
3.04%      0.27%      –3.46%
–0.11%     5.53%      –2.33%
1.62%      3.78%      –1.30%
3.39%      4.44%      1.73%

8.36% 1.93% 0.23% 0.17% 2.03% 2.66% 3.71% 0.13% 2.02% 1.34% 1.05% 2.33% 2.99% 4.20% 1.81% –1.13% –1.79% 3.78%

2.71% –0.92% 2.67% –4.61% –0.93% –3.25% –4.95% –2.08% 3.74% 5.73% –5.03% 7.73% 7.79% 5.74% 1.54% –1.63% 1.50% –0.48%

–0.52% –3.51% 1.62% 3.39% –2.95% 1.19% –4.55% –0.15% 18.55% 4.33% –0.45% 3.39% 7.23% 4.89% –2.70% –0.68% –1.37% 3.48%

a. If exchange rates stay at their current values, what is the total quarterly profit in U.S. dollars?
b. Model the uncertainty in the quarterly changes of the exchange rates between U.S. dollars and British pounds, New Zealand kiwis, and Japanese yen using a SLURP. Use your simulation model to estimate the average total quarterly profit in U.S. dollars. What is the probability that the total quarterly profit will be lower than the answer in part a?

ANSWER:
a. If exchange rates do not change, total revenue in U.S. dollars will be $584,000.
b. Expected total revenue in U.S. dollars is about $599,000. The probability that the total revenue in U.S. dollars will be less than $584,000 is 0.28. Thus, the store faces significant downside risk due to exchange rate fluctuation.

59. A company has improved its anti-virus program and has released a new version. Assume there is a 0.6 probability that the users of this anti-virus will upgrade the version in any particular year. That is, the upgrade year of the user is a geometric random variable. The revenue generated from the upgrade (when it occurs) follows a normal distribution with a mean of $80,000 and a standard deviation of $22,000.
a. Complete a simulation model that analyzes the net present value of the revenue from the user upgrade. Use an annual discount rate of 8 percent.
b. What is the average net present value earned by the company?
c. What is the standard deviation of net present value?

ANSWER:
a.

b. Average NPV is about $76,253.
c. Standard deviation of NPV is about $21,816.

60. A tourist bus can accommodate 80 people and currently books up to 80 reservations. Past data shows that the tourist bus always accommodates all 80 reservations but that, on average, two people do not show up. To capture additional profit, the travel agent is considering an overbooking strategy in which he would accept 82 reservations even though the tourist bus can accommodate only 80 people. The travel agent believes that he will be able to always book all 82 reservations. The probability distribution for the number of people showing up when 82 reservations are accepted is estimated as follows:

People Showing Up   78     79     80     81     82
Probability         0.05   0.30   0.50   0.10   0.05

The travel agent receives a marginal profit of $110 for each passenger who books a reservation (regardless of whether they show up). The travel agent will also incur a cost for any passenger denied seating on the bus. This cost covers added expenses of rescheduling the passenger as well as loss of goodwill, estimated to be $160 per passenger. Develop a spreadsheet simulation model for this overbooking system. Simulate the number of passengers showing up.
a. What is the average net profit for each tourist bus with the overbooking strategy?
b. What is the probability that the net profit with the overbooking strategy will be less than the net profit without overbooking (80 × $110 = $8,800)?
c. Explain how your simulation model could be used to evaluate other overbooking levels such as 81, 83, and 84 and for recommending a best overbooking strategy.

ANSWER:
a. The average net profit for each tourist bus with the overbooking strategy is $8,988.
b. There is a 0.05 probability that the overbooking strategy will result in less than $8,800 net profit (the net profit resulting from no overbooking).

c. The same spreadsheet design can be used to simulate other overbooking strategies including accepting 81, 83 and 84 passenger reservations. In each case, the travel agent needs to estimate the distribution of the number of passengers showing up and rerun the simulation model. This would enable the agency to evaluate the other overbooking alternatives and determine the most beneficial overbooking policy. Alternatively, the distribution of passengers showing up could be modeled as a binomial random variable in which n = the reservation limit and p = probability of an individual passenger boarding the bus.
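
The overbooking model in question 60 can also be sketched outside a spreadsheet. The following Python is only an illustration, not the test bank's spreadsheet: the function names, the trial count, and the random seed are assumptions chosen for the example. It computes the exact expected profit from the stated distribution and then estimates the same quantities by simulation.

```python
import random

# Distribution of passengers showing up when 82 reservations are accepted
# (taken from the question).
SHOW_UP_DIST = {78: 0.05, 79: 0.30, 80: 0.50, 81: 0.10, 82: 0.05}
CAPACITY = 80
PROFIT_PER_RESERVATION = 110   # earned whether or not the passenger shows up
COST_PER_DENIED = 160          # rescheduling plus loss of goodwill


def net_profit(reservations, show_ups):
    """Net profit for one bus given the number of passengers who show up."""
    denied = max(0, show_ups - CAPACITY)
    return reservations * PROFIT_PER_RESERVATION - denied * COST_PER_DENIED


def exact_summary(reservations=82):
    """Expected profit and P(profit < no-overbooking profit), computed exactly."""
    baseline = CAPACITY * PROFIT_PER_RESERVATION
    mean = sum(p * net_profit(reservations, k) for k, p in SHOW_UP_DIST.items())
    p_worse = sum(p for k, p in SHOW_UP_DIST.items()
                  if net_profit(reservations, k) < baseline)
    return mean, p_worse


def simulate(trials=100_000, reservations=82, seed=1):
    """Monte Carlo estimate of the same two quantities (seed is hypothetical)."""
    rng = random.Random(seed)
    outcomes = list(SHOW_UP_DIST)
    weights = list(SHOW_UP_DIST.values())
    draws = rng.choices(outcomes, weights=weights, k=trials)
    profits = [net_profit(reservations, k) for k in draws]
    baseline = CAPACITY * PROFIT_PER_RESERVATION
    mean = sum(profits) / trials
    p_worse = sum(p < baseline for p in profits) / trials
    return mean, p_worse


if __name__ == "__main__":
    print("exact:", exact_summary())
    print("simulated:", simulate())
```

Evaluating another overbooking level would require a new show-up distribution for that level, as part c of the answer notes.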


Chapter 12 - Linear Optimization Models Multiple Choice 1. The term _____ refers to the expression that defines the quantity to be maximized or minimized in a linear programming model. a. objective function b. problem formulation c. decision variable d. association rule ANSWER: a 2. Constraints are _____. a. quantities to be maximized in a linear programming model b. quantities to be minimized in a linear programming model c. restrictions that limit the settings of the decision variables d. input variables that can be controlled during optimization ANSWER: c 3. _____, or modeling, is the process of translating a verbal statement of a problem into a mathematical statement. a. Problem-solving approach b. Data preparation c. Data structuring d. Problem formulation ANSWER: d 4. A controllable input for a linear programming model is known as a _____. a. parameter. b. decision variable. c. dummy variable. d. constraint. ANSWER: b 5. In problem formulation, the _____. a. objective is expressed in terms of the decision variables. b. constraints are expressed in terms of the obtained objective function coefficients. c. nonnegativity constraints are always ignored. d. optimal solution is decided upon. ANSWER: a 6. When formulating a constraint, care must be taken to ensure that _____. a. all the objective function coefficients are included b. there are no inequalities in the mathematical expression c. the decision variables are set at either maximum or minimum values d. the units of measurement on both sides of the constraint match ANSWER: d 7. Nonnegativity constraints ensure that _____. a. the problem modeling includes only nonnegative values in the constraints b. the solution to the problem will contain only nonnegative values for the decision variables c. the objective function of the problem always returns maximum quantities d. there are no inequalities in the constraints Copyright Cengage Learning. Powered by Cognero.


Chapter 12 - Linear Optimization Models ANSWER: b 8. A mathematical function in which each variable appears in a separate term and is raised to the first power is known as a _____. a. power function b. linear function c. what-if function d. nonlinear function ANSWER: b 9. The _____ assumption necessary for a linear programming model to be appropriate means that the contribution to the objective function and the amount of resources used in each constraint are in accordance to the value of each decision variable. a. proportionality b. divisibility c. additivity d. negativity ANSWER: a 10. The assumption that is necessary for a linear programming model to be appropriate and that ensures that the value of the objective function and the total resources used can be found by summing the objective function contribution and the resources used for all decision variables is known as _____. a. proportionality b. negativity c. additivity d. divisibility ANSWER: c 11. In a linear programming model, the _____ assumption plus the nonnegativity constraints mean that decision variables can take on any value greater than or equal to zero. a. proportionality b. divisibility c. additivity d. negativity ANSWER: b 12. A(n) _____ solution satisfies all the constraint expressions simultaneously. a. feasible b. objective c. infeasible d. extreme ANSWER: a 13. A(n) _____ refers to a set of points that yield a fixed value of the objective function. a. objective function coefficient b. infeasible solution c. objective function contour d. feasible region ANSWER: c 14. The points where constraints intersect on the boundary of the feasible region are termed as the _____. a. feasible points b. objective function contour c. extreme points d. feasible edges ANSWER: c


15. Which algorithm, developed by George Dantzig and utilized by Excel Solver, is effective at investigating extreme points in an intelligent way to find the optimal solution to even very large linear programs? a. Ellipsoidal algorithm b. Complex algorithm c. Trial-and-error algorithm d. Simplex algorithm ANSWER: d 16. Suppose that profit for a particular product is calculated using the linear equation: Profit = 20S + 3D. Which of the following combinations of S and D would yield a maximum profit? a. S = 0, D = 0 b. S = 405, D = 0 c. S = 0, D = 299 d. S = 182, D = 145 ANSWER: b 17. A _____ refers to a constraint that can be expressed as an equality at the optimal solution. a. nonnegativity constraint b. first class constraint c. slack variable d. binding constraint ANSWER: d 18. Geometrically, binding constraints intersect to form the _____. a. subspace b. optimal point c. decision cell d. zero slack ANSWER: b 19. The _____ value for each less-than-or-equal-to constraint indicates the difference between the left-hand and right-hand values for a constraint. a. objective function coefficient b. slack c. unbounded d. surplus ANSWER: b 20. The slack value for binding constraints is _____. a. always a positive integer b. zero c. a negative integer d. equal to the sum of the optimal points in the solution ANSWER: b 21. A variable subtracted from the left-hand side of a greater-than-or-equal-to constraint to convert the constraint into an equality is known as a(n) _____. a. surplus variable b. slack variable c. unbounded variable d. binding constraint ANSWER: a 22. A scenario in which the optimal objective function contour line coincides with one of the binding constraint lines on the boundary of the feasible region leads to _____ solutions. a. infeasible b. alternative optimal c. binding d. unique optimal ANSWER: b 23. _____ is the situation in which no solution to the linear programming problem satisfies all the constraints. a. Unboundedness b. Divisibility c. Infeasibility d. Optimality ANSWER: c 24. Problems with infeasible solutions arise in practice because _____. a. management doesn’t specify enough restrictions b. too many restrictions have been placed on the problem c. of errors in objective function formulation d. there are too few decision variables ANSWER: b 25. The situation in which the value of the solution may be made infinitely large in a maximization linear programming problem or infinitely small in a minimization problem without violating any of the constraints is known as _____. a. infeasibility b. unbounded c. infiniteness d. semi-optimality ANSWER: b 26. Which of the following error messages is displayed in Excel Solver when attempting to solve an unbounded problem? a. Solver could not find a feasible solution. b. Solver cannot improve the current solution. All constraints are satisfied. c. Solver could not find a bounded solution. d. Objective Cell values do not converge. ANSWER: d 27. In linear programming models of real problems, the occurrence of an unbounded solution means that the _____. a. resultant values of the decision variables have no bounds b. mathematical models sufficiently represent the real-world problems c. problem formulation is improper d. constraints have been excessively used in modeling ANSWER: c


Chapter 12 - Linear Optimization Models 28. The study of how changes in the input parameters of a linear programming problem affect the optimal solution is known as _____. a. regression analysis b. model analysis c. optimality analysis d. sensitivity analysis ANSWER: d 29. The change in the optimal objective function value per unit increase in the right-hand side of a constraint is given by the _____. a. objective function coefficient b. shadow price c. restrictive cost d. allowable increase ANSWER: b 30. The shadow price of nonbinding constraints _____. a. will always be zero b. will always be a positive value c. can never be equal to zero d. is no longer valid if the right-hand side of the constraint remains the same ANSWER: a 31. The reduced cost for a decision variable that appears in a Sensitivity Report refers to the _____ of the nonnegativity constraint for that variable. a. range of optimality b. slack value c. shadow price d. range of feasibility ANSWER: c 32. The reduced cost for a decision variable that appears in a Sensitivity Report indicates the change in the optimal objective function value that results from changing the right-hand side of the nonnegativity constraint from _____. a. 1 to 0 b. 0 to 1 c. –1 to 0 d. 0 to –1 ANSWER: b 33. Rob is a financial manager with Sharez, an investment advisory company. He must select specific investments—for example, stocks and bonds—from a variety of investment alternatives. Which of the following statements is most likely to be the objective function in this scenario? a. Minimization of the number of stocks held b. Maximization of expected return c. Minimization of tax dues Copyright Cengage Learning. Powered by Cognero.


Chapter 12 - Linear Optimization Models d. Maximization of investment risk ANSWER: b 34. Rob is a financial manager with Sharez, an investment advisory company. He must select specific investments—for example, stocks and bonds—from a variety of investment alternatives. Restrictions on the type of permissible investments would be a _____ in this case. a. feasible solution b. surplus variable c. slack variable d. constraint ANSWER: d 35. A canned food manufacturer has its manufacturing plants in three locations across a state. Their product has to be transported to 3 central distribution centers, which in turn disperse the goods to 72 stores across the state. Which of the following is most likely to be the objective function in this scenario? a. Increasing the number of goods manufactured at the plant b. Decreasing the cost of their raw material sourcing c. Minimizing the cost of shipping goods from the plant to the store d. Minimizing the quantity of goods distributed across the stores ANSWER: c 36. A canned food manufacturer has its manufacturing plants in three locations across a state. Their product has to be transported to 3 central distribution centers, which in turn disperse the goods to 72 stores across the state. Which of the following visualization tools could help understand this problem better? a. Time-series plot b. Scatter chart c. Network graph d. Contour plot ANSWER: c Subjective Short Answer 37. The set of all points that satisfies the constraints of a given linear programming problem is the_____. ANSWER: feasible region 38. To find the optimal solution to a linear optimization problem, do you have to examine all of the points in the feasible region? Explain. ANSWER: No. To solve a linear optimization problem, you only have to examine the extreme points of the feasible region to find an optimal solution. The extreme points are found where constraints intersect on the boundary of the feasible region. 39. 
Can a linear programming problem have no solution? More than one solution? Explain. ANSWER: A linear programming problem may have no solution, one solution, or more than one solution (alternative optimal solutions). Infeasibility means that no solution to the linear programming problem satisfies all the constraints, including the nonnegativity constraints. Graphically, infeasibility means that a feasible region does not exist; that is, no points satisfy all the constraints and the nonnegativity conditions simultaneously. The objective function may have the same (optimal) value at more than one of the extreme points. When this occurs, any point on the line connecting the two optimal extreme points also provides an optimal solution. 40. What does it mean for a linear programming problem to be unbounded? ANSWER: The solution to a maximization linear programming problem is unbounded if the value of the solution may be made infinitely large without violating any of the constraints. In other words, no matter which solution you pick, you will always be able to reach another feasible solution with a larger value. For a minimization problem, the solution is unbounded if the value may be made infinitely small without violating any of the constraints. In other words, no matter which solution you pick, you will always be able to reach another feasible solution with a smaller value. 41. If a linear program has more than one optimal solution, does this mean that it doesn’t matter which solution is selected? Explain. ANSWER: No. If a linear program has more than one optimal solution, it would be good for management to know this. There might be factors external to the model that make one optimal solution preferable to another. For example, in a portfolio optimization problem, perhaps more than one strategy yields the maximum expected return. However, those strategies might be quite different in terms of their risk to the investor. By knowing the optimal alternatives and then assessing the risk of each, the investor could pick the least risky alternative from the optimal solutions. 42. Gatson manufacturing company produces two types of tires: Economy tires and Premium tires. The manufacturing time and the profit contribution per tire are given in the following table.

Manufacturing Time (Hours)
Operation             Economy tires  Premium tires  Time Available (Hours)
Material Preparation  4/3            1/2            600
Tire Building         4/5            1              650
Curing                1/2            2/4            580
Final Inspection      1/5            1/3            120
Profit/Tire           $12            $10

Answer the following assuming that the company is interested in maximizing the total profit contribution. a. What is the linear programming model for this problem? b. Develop a spreadsheet model and find the optimal solution using Excel Solver. How many tires of each model should Gatson manufacture? c. What is the total profit contribution Gatson can earn with the optimal production quantities? ANSWER: a. Let
E = number of economy tires manufactured
P = number of premium tires manufactured
Max 12E + 10P
s.t.
(4/3)E + (1/2)P ≤ 600 (Material Preparation)
(4/5)E + P ≤ 650 (Tire Building)
(1/2)E + (2/4)P ≤ 580 (Curing)
(1/5)E + (1/3)P ≤ 120 (Final Inspection)
E, P ≥ 0
b.


Gatson should manufacture about 406 Economy tires and about 116 Premium tires. c. With the optimal production quantities, the profit Gatson can earn is approximately $6,039. 43. Hire-a-Car System rents three types of cars at two different locations. The profit contribution made per day for each car type at each location is listed below.

          Car Type
Location  Economy  Mid-size  Luxury
A         $25      $40       $10
B         $30      $35       $45

The management forecasts the demand per day by car type as follows: 125 rentals for Economy cars, 55 rentals for Mid-size cars, and 40 rentals for Luxury cars. The vehicle capacity of each location is 100 cars in location A and 120 cars in location B. Develop a linear programming model to maximize profit and determine how many reservations each location should accept for each type of car. Is the demand for any car type not satisfied? Explain. ANSWER: Let
A1 = number of reservations accepted for Economy cars at Location A
A2 = number of reservations accepted for Mid-size cars at Location A
A3 = number of reservations accepted for Luxury cars at Location A
B1 = number of reservations accepted for Economy cars at Location B
B2 = number of reservations accepted for Mid-size cars at Location B
B3 = number of reservations accepted for Luxury cars at Location B
Max 25A1 + 40A2 + 10A3 + 30B1 + 35B2 + 45B3
s.t.
A1 + A2 + A3 ≤ 100
B1 + B2 + B3 ≤ 120
A1 + B1 ≤ 125
A2 + B2 ≤ 55
A3 + B3 ≤ 40
A1, A2, …, B3 ≥ 0


The optimal solution obtained using Excel Solver shows the reservations to accept for each car type and their allocations to the different locations. Also, the demands for all car types are satisfied. 44. The supervisor of a manufacturing plant is trying to determine how many of two parts, Part X and Part Y, are to be produced per day. Each part must be processed in three sections of the plant. The time required for the production along with the profit contribution for each part are given in the following table.

Time required (Minutes/Unit)
                          Section 1  Section 2  Section 3  Profit/Unit
Part X                    50         30         18         $2
Part Y                    80         45         22         $3
Available time (minutes)  3,600      2,500      1,200

No more than 60 units of Part X and up to 70 units of Part Y can be produced per day. The company already has orders for 30 units of Part Y that must be satisfied. a. Develop a linear programming model and solve the model to determine the optimal production quantities of Parts X and Y. b. If more time could be made available in Section 2, how much would the profit increase? (Hint: Generate Answer Report).


ANSWER: a. Let
P1 = number of units of Part X produced
P2 = number of units of Part Y produced
Max 2P1 + 3P2
s.t.
50P1 + 80P2 ≤ 3600
30P1 + 45P2 ≤ 2500
18P1 + 22P2 ≤ 1200
P1 ≤ 60
P2 ≤ 70
P2 ≥ 30
P1, P2 ≥ 0


b. There would be no increase in profit because there are 430 minutes of slack time for the Section 2 at the optimal solution.
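
The slack figure for question 44 can be cross-checked without Solver. The sketch below is an illustration only: it enumerates the extreme points of the two-variable model (the approach described in question 38) rather than running the simplex method, and it assumes the 30 committed units of Part Y act as a lower bound (P2 ≥ 30). All function and constant names are hypothetical.

```python
from itertools import combinations

# Each row is (a1, a2, rhs), meaning a1*P1 + a2*P2 <= rhs.
# ">=" conditions are multiplied by -1 to fit this form.
CONSTRAINTS = [
    (50, 80, 3600),   # Section 1 minutes
    (30, 45, 2500),   # Section 2 minutes
    (18, 22, 1200),   # Section 3 minutes
    (1, 0, 60),       # P1 <= 60
    (0, 1, 70),       # P2 <= 70
    (0, -1, -30),     # P2 >= 30 (existing orders, assumed lower bound)
    (-1, 0, 0),       # P1 >= 0
]
PROFIT = (2, 3)       # profit per unit of Part X and Part Y


def intersection(c1, c2):
    """Point where two constraint lines meet, or None if they are parallel."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def is_feasible(point, tol=1e-9):
    """True if the point satisfies every constraint within a small tolerance."""
    p1, p2 = point
    return all(a * p1 + b * p2 <= rhs + tol for a, b, rhs in CONSTRAINTS)


def profit(point):
    return PROFIT[0] * point[0] + PROFIT[1] * point[1]


def best_extreme_point():
    """Evaluate the objective at every feasible extreme point and keep the best."""
    candidates = []
    for c1, c2 in combinations(CONSTRAINTS, 2):
        point = intersection(c1, c2)
        if point is not None and is_feasible(point):
            candidates.append(point)
    return max(candidates, key=profit)


def section2_slack(point):
    """Unused Section 2 minutes at the given production plan."""
    return 2500 - (30 * point[0] + 45 * point[1])


if __name__ == "__main__":
    best = best_extreme_point()
    print(best, profit(best), section2_slack(best))
```

Under these assumptions the best extreme point is P1 = 24, P2 = 30, which leaves 430 unused minutes in Section 2, in line with the answer to part b.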


45. A beverage can manufacturer makes three sizes of soft drink cans—Small, Medium and Large. Production is limited by machine availability, with a combined maximum of 90 production hours per day, and the daily supply of metal, no more than 120 kg per day. The following table provides the details of the input needed to manufacture one batch of 100 cans for each size.

                           Cans
                           Large  Medium  Small  Maximum
Metal (kg)/batch           9      6       5      120
Machines’ Time (hr)/batch  4.4    4.2     4      90
Profit/batch               $50    $45     $42

Develop a linear programming model to maximize profit and determine how many batches of each can size should be produced. ANSWER: Let
L = number of batches of Large cans produced
M = number of batches of Medium cans produced
S = number of batches of Small cans produced
Max 50L + 45M + 42S
s.t.
9L + 6M + 5S ≤ 120
4.4L + 4.2M + 4S ≤ 90
L, M, S ≥ 0


46. Robin Tires, Inc. makes two types of tires, one for SUVs and the other for Hatchbacks. The firm has the following limits: 500 hours for production, 250 hours for packaging, and 150 hours for shipping. The time required per tire type is given in the following table.

Type             Production Hours  Packaging  Shipping  Profit/Tire
SUV tires        2                 1.5        1         $22
Hatchback tires  1.5               1          0.5       $12

Assuming that the company is interested in maximizing the total profit contribution, answer the following: a. What is the linear programming model for this problem? b. Develop a spreadsheet model and find the optimal solution using Excel Solver. How many tires of each model should Robin manufacture? c. What is the total profit contribution Robin can earn with the optimal production quantities? ANSWER: a. Let S = number of SUV tires manufactured H = number of Hatchback tires manufactured Max 22S + 12H s.t. 2S + 1.5H ≤ 500 – Production 1.5S + H ≤ 250 – Packaging S + 0.5H ≤ 150 – Shipping S, H ≥ 0 b.


Robin should manufacture about 100 SUV tires and about 100 Hatchback tires. c. With the optimal production quantities, the total profit Robin can earn is 100(22) + 100(12) = $3,400. 47. Robin Tires, Inc. makes two types of tires, one for SUVs and the other for Hatchbacks. The firm has the following limits: 500 hours for production, 250 hours for packaging, and 150 hours for shipping. The time required per tire type is given in the following table.

Type             Production Hours  Packaging  Shipping  Profit/Tire
SUV tires        2                 1.5        1         $22
Hatchback tires  1.5               1          0.5       $12

Assuming that the company is interested in maximizing the total profit contribution, find the optimal solution using Excel Solver and answer the following: a. How many hours of production time will be scheduled in each department? b. What is the slack time in each department? c. If one more hour is available for packaging, what is the change in profit? d. What is the change in profit if one more hour is available for shipping? ANSWER:

Let
S = number of SUV tires manufactured
H = number of Hatchback tires manufactured
Max 22S + 12H
s.t.
2S + 1.5H ≤ 500 (Production)
1.5S + H ≤ 250 (Packaging)
S + 0.5H ≤ 150 (Shipping)
S, H ≥ 0

a.
Production: 100(2) + 100(1.5) = 350
Packaging: 100(1.5) + 100(1) = 250
Shipping: 100(1) + 100(0.5) = 150
b.
Hours       Time used  Available time  Unused (Slack) time
Production  350        500             150
Packaging   250        250             0
Shipping    150        150             0
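
The usage and slack figures in parts a and b follow directly from the constraint coefficients. A minimal Python sketch of that arithmetic is shown here as an illustration; the function name and dictionary layout are assumptions, and the production plan of 100 SUV tires and 100 Hatchback tires is taken from the answer to question 46.

```python
# Hours needed per tire in each department: (SUV, Hatchback).
HOURS_PER_TIRE = {
    "Production": (2, 1.5),
    "Packaging": (1.5, 1),
    "Shipping": (1, 0.5),
}
# Hours available in each department.
AVAILABLE = {"Production": 500, "Packaging": 250, "Shipping": 150}


def usage_and_slack(suv, hatchback):
    """Return {department: (hours used, unused hours)} for a production plan."""
    report = {}
    for dept, (suv_hours, hatch_hours) in HOURS_PER_TIRE.items():
        used = suv_hours * suv + hatch_hours * hatchback
        report[dept] = (used, AVAILABLE[dept] - used)
    return report


if __name__ == "__main__":
    for dept, (used, slack) in usage_and_slack(100, 100).items():
        print(f"{dept}: used {used}, slack {slack}")
```

A department with zero slack corresponds to a binding constraint, which is why only Packaging and Shipping can carry nonzero shadow prices.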

c. The Sensitivity Report:


The constraint is: 1.5S + H ≤ 250. This constraint is binding and its shadow price is 4. If an additional hour is available for packaging, that is, if the constraint is changed from 1.5S + H ≤ 250 to 1.5S + H ≤ 251, the optimal objective function value will increase by $4; that is, the new optimal solution will have an objective function value (profit) equal to $3,400 + $4 = $3,404. d. The constraint is: S + 0.5H ≤ 150. This constraint is binding and its shadow price is 16. If an additional hour is available for shipping, that is, if the constraint is changed from S + 0.5H ≤ 150 to S + 0.5H ≤ 151, the optimal objective function value will increase by $16; that is, the new optimal solution will have an objective function value equal to $3,400 + $16 = $3,416. 48. Clever Sporting Equipment, Inc. makes two types of balls: Soccer balls and Cork balls. The making of each soccer ball and cork ball requires 3 hours and 4 hours of production time, respectively. A total of 500 production hours are available during the next month. At least 150 balls, combined, must be produced. The production cost for each Soccer ball is $9 and each Cork ball is $7. Develop a linear programming model to minimize production costs and determine how many of each type of ball should be produced to meet the required demand. ANSWER: Let
S = number of Soccer balls manufactured
C = number of Cork balls manufactured
Min 9S + 7C
s.t.
S + C ≥ 150
3S + 4C ≤ 500
S, C ≥ 0


49. The Pat-A-Cake Pastry Shop makes chocolate cake in three sizes: Small, Medium, and Large. The shop has the following amounts of the three main ingredients on hand: 400 ounces of cake flour, 550 ounces of caster sugar, and 150 ounces of cocoa powder. The table below provides details on the amount of each ingredient required for each cake size as well as the profit contributions.

Cake       Plain flour (Ounce)  Caster sugar (Ounce)  Cocoa powder (Ounce)  Profit/Unit
Small      8                    18                    3                     $18
Medium     16                   22                    5                     $25
Large      21                   25                    11                    $32
Available  400                  550                   150

Develop and solve a linear programming model to maximize the profit. What is the optimal solution for this problem? ANSWER: Let
S = Number of small cakes made
M = Number of medium cakes made
L = Number of large cakes made
Max 18S + 25M + 32L
s.t.
8S + 16M + 21L ≤ 400
18S + 22M + 25L ≤ 550
3S + 5M + 11L ≤ 150
S, M, L ≥ 0


50. Two mining fields, Field A and Field B, of a coal mining company produce Lignite and Bituminous coal. The operating cost per day for Field A and Field B are $55,000 and $45,000, respectively. The recent records at the company indicate that Field A can produce 250 tons of Lignite along with 300 tons of Bituminous coal per day, whereas Field B can produce 200 tons of Lignite along with 450 tons of Bituminous coal per day. The expected demands to be met are 120,000 tons of Lignite and 170,000 tons of Bituminous coal. To minimize the operating costs of the mining fields, how many days does the company need to operate each of these fields? ANSWER: Let A = number of days Field A operates B = number of days Field B operates Min 55,000A + 45,000B s.t. 250A + 200B = 120000 300A + 450B = 170000 A, B ≥ 0


51. Sunseel Industries produces two types of raw materials, A and B, with a production cost of $4 and $8 per unit, respectively. The combined production of A and B must be at least 700 units per month. The factory is expected to produce at least 400 units of B and not more than 1200 units of A each month. The processing times for A and B are observed to be 5 hours and 4 hours, respectively. A total of 3000 production hours are available per month. Develop a linear program that Sunseel Industries can use to determine the number of units of each raw material to produce that will meet the demand and minimize the total cost. ANSWER: Let A = number of units of Raw material-A produced per month B = number of units of Raw material-B produced per month Min 4A + 8B s.t. 5A + 4B ≤ 3000 A ≤ 1200 B ≥ 400 A + B ≥ 700 A, B ≥ 0


52. Michael has decided to invest $40,000 in three types of funds. Fund A has projected an annual return of 8 percent, Fund B has projected an annual return of 10 percent, and Fund C has projected an annual return of 9 percent. He has decided to invest no more than 30 percent of the total amount in Fund B and no more than 40 percent of the total amount in Fund C. a. Formulate a linear programming model that can be used to determine the amount of investments Michael should allocate to each type of fund to maximize the total annual return. b. How much should be allocated to each type of fund? What is the total annual return? ANSWER: a. Let
A = amount invested in Fund A
B = amount invested in Fund B
C = amount invested in Fund C
Max 0.08A + 0.10B + 0.09C
s.t.
A + B + C = 40000
B ≤ 0.3(A + B + C) → B ≤ 12000
C ≤ 0.4(A + B + C) → C ≤ 16000
A, B, C ≥ 0
b.


Cell 18 =SUM(B15:B17) Cell 20 =SUMPRODUCT(B4:B6,B15:B17)
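
The two spreadsheet formulas above appear to compute the total amount invested (SUM) and the total annual return (SUMPRODUCT of returns and allocations). The Python sketch below mirrors those two calculations for the model in question 52. It is an illustration, not the Solver output: the cell-to-quantity mapping is an assumption because the spreadsheet image is not reproduced here, and the allocation rule (fill the highest-return fund first, up to its cap) is a swapped-in shortcut that is valid for this model only because the sole linking constraint is the fixed budget.

```python
# Projected annual returns and upper limits from the model in question 52.
RETURNS = {"A": 0.08, "B": 0.10, "C": 0.09}
CAPS = {"A": 40_000, "B": 12_000, "C": 16_000}   # Fund A has no cap below the budget
BUDGET = 40_000


def greedy_allocation():
    """Fill funds in order of decreasing return, respecting each cap."""
    remaining = BUDGET
    allocation = {fund: 0 for fund in RETURNS}
    for fund in sorted(RETURNS, key=RETURNS.get, reverse=True):
        amount = min(CAPS[fund], remaining)
        allocation[fund] = amount
        remaining -= amount
    return allocation


def total_invested(allocation):
    """Counterpart of the SUM cell: total dollars allocated."""
    return sum(allocation.values())


def total_return(allocation):
    """Counterpart of the SUMPRODUCT cell: return rates times allocations."""
    return sum(RETURNS[fund] * amount for fund, amount in allocation.items())


if __name__ == "__main__":
    plan = greedy_allocation()
    print(plan, total_invested(plan), round(total_return(plan), 2))
```

Under these assumptions the plan puts $12,000 in Fund A, $12,000 in Fund B, and $16,000 in Fund C, for a total annual return of about $3,600.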


53. Northwest California Ventures Ltd. has decided to provide capital in five market areas for the start-ups. The investment consultant for the venture capital company has projected an annual rate of return based on the market risk, the product, and the size of the market.

Market Area  Annual Rate of Return on Capital (%)
Electronics  12
Software     18
Logistics    15
Education    12
Retail       17

The maximum capital provided will be $5 million. The consultant has imposed conditions on allotment of capital based on the risk involved in the market.
• The capital provided to retail should be at most 40 percent of the total capital.
• The capital for education should be 26 percent of the total of other four markets (Electronics, Software, Logistics, and Retail).
• Logistics should be at least 15 percent of the total capital.
• The capital allocated for Software plus Logistics should be no more than the capital allotted for Electronics.
• The capital allocated for Logistics plus Education should not be greater than that allocated to Retail.
Calculate the expected annual rate of return based on the allocation of capital to each market area to maximize the return on capital provided. Also, show the allocation of capital for each market area. ANSWER: Let
x1 = investment on Electronics
x2 = investment on Software
x3 = investment on Logistics
x4 = investment on Education
x5 = investment on Retail
Max 0.12x1 + 0.18x2 + 0.15x3 + 0.12x4 + 0.17x5
s.t.
x5 ≤ 0.4(5,000,000)
x4 = 0.26(x1 + x2 + x3 + x5)
x3 ≥ 0.15(5,000,000)
x2 + x3 ≤ x1
x3 + x4 ≤ x5
x1 + x2 + x3 + x4 + x5 = 5,000,000
x1, x2, x3, x4, x5 ≥ 0


The expected annual return based on the allocation of capital to each of the five market areas, as shown below, is approximately $736,547.62.

Market Area       Allocation (approx.)
Electronics (x1)  $984,127
Software (x2)     $234,127
Logistics (x3)    $750,000
Education (x4)    $1,031,746
Retail (x5)       $2,000,000
Total             $5,000,000
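
The reported allocation can be checked against the model for question 53 with a few lines of arithmetic. The Python below is a sketch for verification only; it uses the rounded dollar amounts from the table, so the checks allow a tolerance of one dollar, and the function and key names are hypothetical.

```python
# Rounded allocation from the answer table and the projected rates of return.
ALLOCATION = {
    "Electronics": 984_127,
    "Software": 234_127,
    "Logistics": 750_000,
    "Education": 1_031_746,
    "Retail": 2_000_000,
}
RATES = {
    "Electronics": 0.12,
    "Software": 0.18,
    "Logistics": 0.15,
    "Education": 0.12,
    "Retail": 0.17,
}
TOTAL_CAPITAL = 5_000_000


def expected_return(allocation):
    """Sum of rate times amount over the five market areas."""
    return sum(RATES[area] * amount for area, amount in allocation.items())


def constraint_report(allocation, tol=1.0):
    """Check each model constraint; tol absorbs rounding in the table values."""
    a = allocation
    others = a["Electronics"] + a["Software"] + a["Logistics"] + a["Retail"]
    return {
        "retail_cap": a["Retail"] <= 0.4 * TOTAL_CAPITAL + tol,
        "education_share": abs(a["Education"] - 0.26 * others) <= tol,
        "logistics_floor": a["Logistics"] >= 0.15 * TOTAL_CAPITAL - tol,
        "software_plus_logistics": a["Software"] + a["Logistics"] <= a["Electronics"] + tol,
        "logistics_plus_education": a["Logistics"] + a["Education"] <= a["Retail"] + tol,
        "budget": abs(sum(a.values()) - TOTAL_CAPITAL) <= tol,
    }


if __name__ == "__main__":
    print(round(expected_return(ALLOCATION), 2))
    print(constraint_report(ALLOCATION))
```

With the table values, every constraint check passes and the expected return comes to about $736,547.62.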

54. Jackson just obtained $240,000 by selling mutual funds and is now looking for other investment opportunities for these funds. His financial consultant recommends that all new investments be made in the stocks of industries such as Agriculture, Healthcare, Banking, Manufacturing, and Real Estate. The projected annual rates of returns for the investments are as follows:

Expected Annual Returns of the Stocks
Stocks         Return (%)
Agriculture    11
Health Care    6.50
Banking        9
Manufacturing  12
Real Estate    8.50

His consultant has set constraints on the investments based on the calculated risks involved with the industries: 1) Neither Agriculture nor Manufacturing should receive more than $100,000. 2) Neither Healthcare nor Banking should receive more than $50,000. 3) The amount invested in Manufacturing should not be more than 45 percent of the sum of the investment in Banking and Healthcare sectors. 4) The amount invested in Real Estate should be at least 20 percent of the sum of the investment in Banking and Healthcare sectors. Develop portfolio recommendations—investments and amounts—for investing the available $240,000. ANSWER: Let X1 = amount invested in Agriculture X2 = amount invested in Health Care X3 = amount invested in Banking X4 = amount invested in Manufacturing X5 = amount invested in Real Estate Max: 0.11X1 + 0.065X2 + 0.09X3 + 0.12X4 + 0.085X5 s.t. X1 + X2 + X3 + X4 + X5 = 240,000 X1 ≤ 100000 X4 ≤ 100000 X2 ≤ 50000 X3 ≤ 50000 X4 ≤ 0.45(X2 + X3) X5 ≥ 0.20(X2 + X3) X1, X2, X3, X4, X5 ≥ 0
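One rough way to explore this model without a solver is a coarse grid search. The sketch below is not the Solver-based method the question intends: it only tries allocations in multiples of `STEP` dollars (an assumption of this sketch), so in general it gives an approximation that an LP solver would refine. Returns are stored in tenths of a percent so that every comparison is done in exact integer arithmetic.

```python
from itertools import product

BUDGET = 240_000
STEP = 2_500  # assumed grid resolution in dollars; finer grids cost more time

def grid(cap):
    """All multiples of STEP from 0 up to cap."""
    return range(0, cap + 1, STEP)

best_value = -1   # objective in (tenths of a percent) x dollars; divide by 1000 for dollars
best_plan = None  # (X1, X2, X3, X4, X5)

# X1 = Agriculture, X2 = Health Care, X3 = Banking, X4 = Manufacturing, X5 = Real Estate
for x1, x2, x3 in product(grid(100_000), grid(50_000), grid(50_000)):
    for x4 in grid(100_000):
        if 100 * x4 > 45 * (x2 + x3):   # X4 <= 0.45(X2 + X3); larger X4 fails too
            break
        x5 = BUDGET - (x1 + x2 + x3 + x4)  # budget equality fixes X5
        if x5 < 0:
            break
        if 5 * x5 < x2 + x3:            # X5 >= 0.20(X2 + X3)
            continue
        value = 110 * x1 + 65 * x2 + 90 * x3 + 120 * x4 + 85 * x5
        if value > best_value:
            best_value, best_plan = value, (x1, x2, x3, x4, x5)

print("best grid plan (X1..X5):", best_plan)
print(f"expected annual return: ${best_value / 1000:,.2f}")
```

The plan printed here is the best one on this particular grid under the constraints as listed; it should be treated as a cross-check on a Solver run rather than a substitute for it.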


55. A soft drink manufacturing company has three factories (one in Orlando, one in Tampa, and one in Port St. Lucie) that supply soft drink bottles to three warehouses located in the city of Miami. The associated per-unit transportation cost between the factories and the warehouses is provided in the table below.

Transportation Costs ($)
Factories/Warehouse (W)    W1    W2    W3
Orlando                    7     4     5
Tampa                      7     6     4
Port St. Lucie             5     5     6

The factory in Orlando has a capacity of 14,000 units. The factory in Tampa has a capacity of 25,000 units. The factory in Port St. Lucie has a capacity of 23,000 units. The requirements of the warehouses are:

Warehouse    Requirement (Bottles)
W1           18,000
W2           19,000
W3           22,000

Determine how much of the company’s production should be shipped from each factory to each warehouse in order to minimize the total transportation cost.

ANSWER:
Let
x11: number of bottles shipped from Orlando to W1
x12: number of bottles shipped from Orlando to W2
. . .
x33: number of bottles shipped from Port St. Lucie to W3
Min 7x11 + 4x12 + 5x13 + 7x21 + 6x22 + 4x23 + 5x31 + 5x32 + 6x33
s.t.
x11 + x12 + x13 ≤ 14,000
x21 + x22 + x23 ≤ 25,000
x31 + x32 + x33 ≤ 23,000
x11 + x21 + x31 = 18,000
x12 + x22 + x32 = 19,000
x13 + x23 + x33 = 22,000
x11, x12, x13, …, x33 ≥ 0
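A transportation model like this one can also be solved as a minimum-cost flow problem. The sketch below swaps in a successive-shortest-path algorithm (with Bellman-Ford path searches) in place of Excel Solver; the function name `solve_transport` and the node numbering are choices made for this illustration, and the routine assumes total capacity is enough to cover total demand.

```python
INF = float("inf")

def solve_transport(cost, supply, demand):
    """Return (total_cost, plan) for a transportation problem via successive shortest paths."""
    m, n = len(supply), len(demand)
    src, snk = 0, m + n + 1            # nodes: source, factories 1..m, warehouses m+1..m+n, sink
    adj = [[] for _ in range(m + n + 2)]
    edges = []                          # [head, residual capacity, unit cost]; edge e ^ 1 reverses e
    route = {}                          # (i, j) -> index of the forward edge factory i -> warehouse j

    def add_edge(u, v, cap, c):
        adj[u].append(len(edges)); edges.append([v, cap, c])
        adj[v].append(len(edges)); edges.append([u, 0, -c])

    for i in range(m):
        add_edge(src, 1 + i, supply[i], 0)          # factory capacity
    for j in range(n):
        add_edge(1 + m + j, snk, demand[j], 0)      # warehouse requirement
    for i in range(m):
        for j in range(n):
            route[i, j] = len(edges)
            add_edge(1 + i, 1 + m + j, min(supply[i], demand[j]), cost[i][j])

    total = 0
    while True:
        dist = [INF] * len(adj)
        via = [-1] * len(adj)           # edge used to reach each node on the cheapest path
        dist[src] = 0
        for _ in range(len(adj) - 1):   # Bellman-Ford passes over the residual graph
            changed = False
            for u in range(len(adj)):
                if dist[u] == INF:
                    continue
                for e in adj[u]:
                    v, cap, c = edges[e]
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], via[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if dist[snk] == INF:
            break                       # no augmenting path left: flow is maximal
        push, v = INF, snk              # bottleneck capacity along the path
        while v != src:
            push = min(push, edges[via[v]][1])
            v = edges[via[v] ^ 1][0]
        v = snk
        while v != src:                 # send the flow and update residual capacities
            e = via[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        total += push * dist[snk]

    plan = [[edges[route[i, j] ^ 1][1] for j in range(n)] for i in range(m)]
    return total, plan

cost = [[7, 4, 5], [7, 6, 4], [5, 5, 6]]      # rows: Orlando, Tampa, Port St. Lucie
supply = [14_000, 25_000, 23_000]
demand = [18_000, 19_000, 22_000]             # W1, W2, W3
total, plan = solve_transport(cost, supply, demand)
print("minimum total cost:", total)
for name, row in zip(("Orlando", "Tampa", "Port St. Lucie"), plan):
    print(f"{name:15s}", row)
```

The algorithm repeatedly ships as much as possible along the currently cheapest route from any factory with spare capacity to any warehouse with unmet demand, allowing earlier shipments to be rerouted through the reverse edges.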


56. A soft drink manufacturing company has three factories (one in Orlando, one in Tampa, and one in Port St. Lucie) that supply soft drink bottles to three warehouses located in the city of Miami. The associated per-unit transportation cost table is provided below.

Transportation Costs ($)
Factories/Warehouse (W)    W1    W2    W3
Orlando                    4     3     7
Tampa                      7     6     4
Port St. Lucie             3     6     6

The factory in Orlando has a capacity of 15,000 units. The factory in Tampa has a capacity of 18,000 units. The factory in Port St. Lucie has a capacity of 8,000 units. The requirements of the warehouses are:

Warehouse    Requirement (Bottles)
W1           18,000
W2           12,000
W3           5,000

a. Determine how much of the company’s production should be shipped from each factory to each warehouse in order to minimize the total transportation cost.
b. Find an alternative optimal solution for this transportation problem.

ANSWER:
a. Let
x11: number of bottles shipped from Orlando to W1
x12: number of bottles shipped from Orlando to W2
. . .
x33: number of bottles shipped from Port St. Lucie to W3
Min 4x11 + 3x12 + 7x13 + 7x21 + 6x22 + 4x23 + 3x31 + 6x32 + 6x33
s.t.
x11 + x12 + x13 ≤ 15,000
x21 + x22 + x23 ≤ 18,000
x31 + x32 + x33 ≤ 8,000
x11 + x21 + x31 = 18,000
x12 + x22 + x32 = 12,000
x13 + x23 + x33 = 5,000
x11, x12, x13, …, x33 ≥ 0

b. To find an alternative optimal solution, solve a second problem that maximizes the sum of the shipments that are zero in the solution to part (a), subject to the original constraints and the requirement that the total cost stays at its optimal value.
Max x13 + x22 + x32 + x33
s.t.
4x11 + 3x12 + 7x13 + 7x21 + 6x22 + 4x23 + 3x31 + 6x32 + 6x33 = minimum total cost from part (a)
x11 + x12 + x13 ≤ 15,000
x21 + x22 + x23 ≤ 18,000
x31 + x32 + x33 ≤ 8,000
x11 + x21 + x31 = 18,000
x12 + x22 + x32 = 12,000
x13 + x23 + x33 = 5,000
x11, x12, x13, …, x33 ≥ 0
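The idea of an alternative optimum can be illustrated by comparing two feasible shipping plans that cost the same. In the sketch below, `plan_a` and `plan_b` were worked out by hand for this illustration and are not copied from the answer key; `plan_a` matches the zero pattern named in part (b), with x13 = x22 = x32 = x33 = 0.

```python
cost = [[4, 3, 7], [7, 6, 4], [3, 6, 6]]   # rows: Orlando, Tampa, Port St. Lucie
supply = [15_000, 18_000, 8_000]
demand = [18_000, 12_000, 5_000]           # W1, W2, W3

def total_cost(plan):
    """Total transportation cost of a 3x3 shipping plan."""
    return sum(cost[i][j] * plan[i][j] for i in range(3) for j in range(3))

def is_feasible(plan):
    """Capacity rows, demand columns, and nonnegativity all hold."""
    rows_ok = all(sum(plan[i]) <= supply[i] for i in range(3))
    cols_ok = all(sum(plan[i][j] for i in range(3)) == demand[j] for j in range(3))
    nonneg = all(v >= 0 for row in plan for v in row)
    return rows_ok and cols_ok and nonneg

def secondary_objective(plan):
    """x13 + x22 + x32 + x33, the quantity maximized in part (b)."""
    return plan[0][2] + plan[1][1] + plan[2][1] + plan[2][2]

# Hand-built plans (assumptions of this sketch).
plan_a = [[3_000, 12_000, 0], [7_000, 0, 5_000], [8_000, 0, 0]]
plan_b = [[10_000, 5_000, 0], [0, 7_000, 5_000], [8_000, 0, 0]]

for name, plan in (("plan_a", plan_a), ("plan_b", plan_b)):
    print(name, "feasible:", is_feasible(plan),
          "cost:", total_cost(plan),
          "x13+x22+x32+x33:", secondary_objective(plan))
```

Both plans are feasible and have the same total cost, while the secondary objective differs between them. Holding cost fixed and maximizing the previously unused routes therefore moves the solver to a different plan of equal cost.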


57. Ethan Steel, Inc. has two factories that manufacture steel components for four different rail projects. The demand for the steel components for the four projects (Project A, Project B, Project C, and Project D) is 3,220, 3,675, 4,125, and 2,975, respectively. The production and shipping details are given below.

Production Details:
Factory    Maximum Capacity
1          6,500
2          8,500

Shipping Details (with per-unit shipping cost):
           Project
Factory    A     B     C     D
1          $7    $7    $8    $4
2          $6    $5    $7    $3

What is the optimal (cost-minimizing) distribution plan for this transportation problem?

ANSWER:
Let
x11: number of steel components shipped from Factory 1 to Project A
x12: number of steel components shipped from Factory 1 to Project B
. . .
x24: number of steel components shipped from Factory 2 to Project D
Min 7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24
s.t.
x11 + x21 = 3,220
x12 + x22 = 3,675
x13 + x23 = 4,125
x14 + x24 = 2,975
x11 + x12 + x13 + x14 ≤ 6,500
x21 + x22 + x23 + x24 ≤ 8,500
x11, x12, x13, …, x24 ≥ 0


58. Ethan Steel, Inc. has two factories that manufacture steel components for four different rail projects. The demand for the steel components for the four projects (Project A, Project B, Project C, and Project D) is 3,220, 3,675, 4,125, and 2,975, respectively. The production and shipping details are given below.

Production Details:
Factory    Maximum Capacity
1          6,500
2          8,500

Shipping Details (with per-unit shipping cost):
           Project
Factory    A     B     C     D
1          $7    $7    $8    $4
2          $6    $5    $7    $3

Find an alternative optimal solution for this transportation problem.

ANSWER:
First, solve the problem by minimizing the total shipping cost.
Let
x11: number of steel components shipped from Factory 1 to Project A
x12: number of steel components shipped from Factory 1 to Project B
. . .
x24: number of steel components shipped from Factory 2 to Project D
Min 7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24
s.t.
x11 + x21 = 3,220
x12 + x22 = 3,675
x13 + x23 = 4,125
x14 + x24 = 2,975
x11 + x12 + x13 + x14 ≤ 6,500
x21 + x22 + x23 + x24 ≤ 8,500
x11, x12, x13, …, x24 ≥ 0

To find an alternative optimal solution, solve a second problem that maximizes the sum of the shipments that are zero in the above solution, subject to the original constraints and the requirement that the total cost stays at its optimal value.
Max x21 + x12 + x13
s.t.
7x11 + 7x12 + 8x13 + 4x14 + 6x21 + 5x22 + 7x23 + 3x24 = minimum total cost from the first problem
x11 + x21 = 3,220
x12 + x22 = 3,675
x13 + x23 = 4,125
x14 + x24 = 2,975
x11 + x12 + x13 + x14 ≤ 6,500
x21 + x22 + x23 + x24 ≤ 8,500
x11, x12, x13, …, x24 ≥ 0


59. Three plants (P1, P2, and P3) of a gas corporation supply gasoline to three of their distributors located in the city at three different locations: A, B, and C. The plants’ daily capacities are 4,500, 3,000, and 5,000 gallons, respectively, while the distributors’ daily requirements are 5,500, 2,500, and 4,200 gallons. The per-gallon transportation costs (in $) are provided in the table below.

         Distributor
Plant    A      B       C
P1       0.8    0.5     1
P2       0.7    0.65    0.8
P3       0.5    0.45    0.7

Because of a failure of expected supply earlier, the distributors A, B, and C have decided to charge a penalty this time of $0.45, $0.55, and $0.5 per gallon, respectively, to avoid any further delays. Now, determine the optimum supply of gasoline to the distributors in order to minimize the total transportation cost as well as the charges payable as penalty.

ANSWER:
Let
x11: number of gallons of gasoline supplied from Plant 1 to Distributor A
x12: number of gallons of gasoline supplied from Plant 1 to Distributor B
. . .
x33: number of gallons of gasoline supplied from Plant 3 to Distributor C
Min 0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33
s.t.
x11 + x12 + x13 ≤ 4,500
x21 + x22 + x23 ≤ 3,000
x31 + x32 + x33 ≤ 5,000
x11 + x21 + x31 = 5,500
x12 + x22 + x32 = 2,500
x13 + x23 + x33 = 4,200
x11, x12, …, x33 ≥ 0


60. Three plants (P1, P2, and P3) of a gas corporation supply gasoline to three of their distributors in the city, located at A, B, and C. The plants’ daily capacities are 4,500, 3,000, and 5,000 gallons, respectively, while the distributors’ daily requirements are 5,500, 2,500, and 4,200 gallons. The per-gallon transportation costs (in $) are provided in the table below.

         Distributor
Plant    A      B       C
P1       0.8    0.5     1
P2       0.7    0.65    0.8
P3       0.5    0.45    0.7

Because of a failure of expected supply earlier, the distributors have decided to charge a penalty this time of $0.45, $0.55, and $0.5 per gallon, respectively, for the locations A, B, and C to avoid any further delays. Find an alternative optimal solution for this transportation problem.

ANSWER:
First, solve the problem by minimizing the total transportation cost.
Let
x11: number of gallons of gasoline supplied from Plant 1 to Distributor A
x12: number of gallons of gasoline supplied from Plant 1 to Distributor B
. . .
x33: number of gallons of gasoline supplied from Plant 3 to Distributor C
Min 0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33
s.t.
x11 + x12 + x13 ≤ 4,500
x21 + x22 + x23 ≤ 3,000
x31 + x32 + x33 ≤ 5,000
x11 + x21 + x31 = 5,500
x12 + x22 + x32 = 2,500
x13 + x23 + x33 = 4,200
x11, x12, …, x33 ≥ 0

To find the alternative optimal solution, solve a second problem that maximizes the sum of the supplies that are zero in the above solution, subject to the original constraints and the requirement that the total cost stays at its optimal value.
Max x13 + x21 + x22 + x32
s.t.
0.8x11 + 0.5x12 + 1x13 + 0.7x21 + 0.65x22 + 0.8x23 + 0.5x31 + 0.45x32 + 0.7x33 = minimum total cost from the first problem
x11 + x12 + x13 ≤ 4,500
x21 + x22 + x23 ≤ 3,000
x31 + x32 + x33 ≤ 5,000
x11 + x21 + x31 = 5,500
x12 + x22 + x32 = 2,500
x13 + x23 + x33 = 4,200
x11, x12, …, x33 ≥ 0


61. Zen, Inc. manufactures two types of products, the G.1 and the T.1 models. The manufacturing process consists of two principal departments: production and assembly. The production department has 58 skilled workers, each of whom works 7 hours per day. The assembly department has 25 workers, who also work a 7-hour shift. On average, to produce a G.1 model, Zen, Inc. requires 3.5 labor hours for production and 2 labor hours for assembly. The T.1 model requires 4 labor hours for production and 1.5 labor hours in assembly. The company anticipates selling at least 1.5 times as many T.1 models as G.1 models. The company operates five days per week and makes a net profit of $130 on the G.1 model and $150 on the T.1 model. Zen, Inc. wants to determine how many of each model should be produced on a weekly basis to maximize net profit. Formulate the problem.
Let G = the number of G.1 products produced each week.
Let T = the number of T.1 products produced each week.
Maximize 130G + 150T
s.t.
production’s labor constraint: 3.5G + 4T ≤ 2,030 (58 workers × 7 hours × 5 days)
assembly’s labor constraint: 2G + 1.5T ≤ 875 (25 workers × 7 hours × 5 days)

ANSWER:
                    G.1     T.1
UNITS               206     309
PROFIT              130     150     Total Profit 73,088
Constraints
Production Labor    3.5     4       1955.88235    ≤    2,030
Assembly Labor      2       1.5     875           ≤    875
Quantity            -1.5    1       0             ≥    0

The company anticipates selling at least 1.5 times as many T.1 models as G.1 models.
T ≥ 1.5G
T - 1.5G ≥ 0
The G.1 model requires 3.5 labor hours for production and 2 labor hours for assembly. The T.1 model requires 4 labor hours for production and 1.5 labor hours in assembly.

62. Zen, Inc. manufactures two types of products, the G.1 and the T.1 models. The manufacturing process consists of two principal departments: production and assembly. The production department has 58 skilled workers, each of whom works 7 hours per day. The assembly department has 25 workers, who also work a 7-hour shift. On average, to produce a G.1 model, Zen, Inc. requires 3.5 labor hours for production and 2 labor hours for assembly. The T.1 model requires 4 labor hours for production and 1.5 labor hours in assembly. The company anticipates selling at least 1.5 times as many T.1 models as G.1 models. The company operates five days per week and makes a net profit of $130 on the G.1 model and $150 on the T.1 model. Zen, Inc. wants to determine how many of each model should be produced on a weekly basis to maximize net profit. Solve using the Excel Solver tool.

ANSWER:
                    G.1     T.1
UNITS               206     309
PROFIT              130     150     Total Profit 73,088
Constraints
Production Labor    3.5     4       1955.88235    ≤    2,030
Assembly Labor      2       1.5     875           ≤    875
Quantity            -1.5    1       0             ≥    0

The company anticipates selling at least 1.5 times as many T.1 models as G.1 models.
T ≥ 1.5G
T - 1.5G ≥ 0
The G.1 model requires 3.5 labor hours for production and 2 labor hours for assembly. The T.1 model requires 4 labor hours for production and 1.5 labor hours in assembly. The company should manufacture 206 G.1 products and 309 T.1 products to maximize profits.

63. Zen, Inc. manufactures two types of products, the G.1 and the T.1 models. The manufacturing process consists of two principal departments: production and assembly. The production department has 58 skilled workers, each of whom works 7 hours per day. The assembly department has 25 workers, who also work a 7-hour shift. On average, to produce a G.1 model, Zen, Inc. requires 3.5 labor hours for production and 2 labor hours for assembly. The T.1 model requires 4 labor hours for production and 1.5 labor hours in assembly. The company anticipates selling at least 1.5 times as many T.1 models as G.1 models. The company operates five days per week and makes a net profit of $130 on the G.1 model and $150 on the T.1 model. Zen, Inc. wants to determine how many of each model should be produced on a weekly basis to maximize net profit. What is the projected profit at the maximized number of units produced?
Let G = the number of G.1 products produced each week.
Let T = the number of T.1 products produced each week.
Maximize 130G + 150T
s.t.
production’s labor constraint: 3.5G + 4T ≤ 2,030
assembly’s labor constraint: 2G + 1.5T ≤ 875

ANSWER: 73,088
                    G.1     T.1
UNITS               206     309
PROFIT              130     150     Total Profit 73,088
Constraints
Production Labor    3.5     4       1955.88235    ≤    2,030
Assembly Labor      2       1.5     875           ≤    875
Quantity            -1.5    1       0             ≥    0

The company anticipates selling at least 1.5 times as many T.1 models as G.1 models.
T ≥ 1.5G
T - 1.5G ≥ 0
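Because the Zen, Inc. model has only two decision variables, its corner points can be listed directly as a cross-check on a Solver run. The sketch below is an illustration rather than the Solver procedure the questions call for: it intersects every pair of constraint boundaries, keeps the feasible intersections, and evaluates 130G + 150T at each. Exact fractions are used to avoid rounding noise, and every function and variable name is an assumption of this sketch.

```python
from fractions import Fraction as F
from itertools import combinations

# Each constraint is (a, b, c), meaning a*G + b*T <= c.
constraints = [
    (F(7, 2), F(4), F(2030)),    # production labor: 3.5G + 4T <= 2030
    (F(2), F(3, 2), F(875)),     # assembly labor:   2G + 1.5T <= 875
    (F(3, 2), F(-1), F(0)),      # sales mix:        1.5G - T <= 0, i.e. T >= 1.5G
    (F(-1), F(0), F(0)),         # G >= 0
    (F(0), F(-1), F(0)),         # T >= 0
]

def profit(point):
    g, t = point
    return 130 * g + 150 * t

def intersect(c1, c2):
    """Point where two constraint boundaries meet (Cramer's rule), or None if parallel."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point):
    g, t = point
    return all(a * g + b * t <= c for a, b, c in constraints)

vertices = set()
for c1, c2 in combinations(constraints, 2):
    p = intersect(c1, c2)
    if p is not None and feasible(p):
        vertices.add(p)

best = max(vertices, key=profit)
for g, t in sorted(vertices):
    print(f"G = {float(g):8.2f}  T = {float(t):8.2f}  profit = {float(profit((g, t))):10.2f}")
print("best corner:", tuple(float(v) for v in best), "profit:", float(profit(best)))
print("rounded plan (206, 309) feasible:", feasible((F(206), F(309))))
```

With the three constraints exactly as written above, the enumeration includes the reported corner (about 205.88 G.1 and 308.82 T.1 units, profit about 73,088) but ranks the corner G = 0, T = 507.5 higher, at 76,125. The rounded plan of 206 and 309 units also needs 875.5 assembly hours, slightly more than the 875 available. The reported plan is therefore worth re-checking against the original Solver model, which may include constraints not shown here.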

64. A company has three treatments that it can apply to three different types of clothes (namely denims, linens, and suiting), yielding a profit of $4, $5, and $8 per bolt, respectively. One bolt of denims requires 2 hours in Treatment 1, 3 hours in Treatment 2, and 4 hours in Treatment 3. Similarly, one bolt of linens requires 3 hours in Treatment 1, 2 hours in Treatment 2, and 4 hours in Treatment 3, while one bolt of suiting requires 2 hours in Treatment 1, 3 hours in Treatment 2, and 4 hours in Treatment 3. In a week, the total run time of each department is 80 hours, 90 hours, and 65 hours for Treatment 1, Treatment 2, and Treatment 3, respectively.

Formulation
Let
D1 = the number of bolts of denims with Treatment 1
D2 = the number of bolts of denims with Treatment 2
D3 = the number of bolts of denims with Treatment 3
. . .
S3 = the number of bolts of suiting with Treatment 3
Max 4(D1 + D2 + D3) + 5(L1 + L2 + L3) + 8(S1 + S2 + S3)
s.t.
2D1 + 3L1 + 3S1 ≤ 80
3D2 + 2L2 + 3S2 ≤ 90
4D3 + 4L3 + 3S3 ≤ 65
D1, D2, D3, …, S3 ≥ 0

ANSWER: The optimal solution is 40 bolts of denims with Treatment 1, 30 bolts of suiting with Treatment 2, 32.5 bolts of linens with Treatment 3 yielding a profit of $530 and utilizing all of the available resources.

Model
                Denims    Linens    Suiting    Profit
Coefficients    $4        $5        $8
Treatment 1     40        0         0          160
Treatment 2     0         0         30         230
Treatment 3     32.5      0         0          130
                76.5      5         38         $530

Constraints
Resources       Denims    Linens    Suiting    Used         Available
Treatment 1     2         3         2          80      ≤    80
Treatment 2     3         2         3          90      ≤    90
Treatment 3     4         3         4          65      ≤    65

Chapter 13 - Integer Linear Optimization Models

Multiple Choice

1. The imposition of an integer restriction is necessary for models where _____.
a. nonnegativity constraints are needed
b. variables can take negative values
c. the decision variables cannot take fractional values
d. possible values of variables are restricted to particular intervals
ANSWER: c

2. The objective function for a linear optimization problem is: Max 3x + 5y, with constraints x ≥ 0, y ≥ 0, and x and y are both integers. x and y are the only decision variables. This is an example of a(n) _____.
a. all-integer linear program
b. mixed-integer linear program
c. nonlinear program
d. binary integer linear program
ANSWER: a

3. The linear program that results from dropping the integer requirements for the variables in an integer linear program is known as _____.
a. convex hull
b. a mixed-integer linear program
c. LP relaxation
d. a binary integer linear program
ANSWER: c

4. The objective function for an optimization problem is: Min 3x – 2y, with constraints x ≥ 0, y ≥ 0. x and y must be integers. Suppose that the integer restriction on the variables is removed. If so, this would be a familiar two-variable linear program; however, it would also be an example of _____.
a. the convex hull of the linear program
b. a mixed-integer linear program
c. an LP relaxation of the integer linear program
d. a binary integer linear program
ANSWER: c

5. The objective function for an optimization problem is: Max 5x – 3y, with constraints x ≥ 0, y ≥ 0, and y must be an integer. x and y are the only decision variables. This is an example of a(n) _____.
a. all-integer linear program
b. mixed-integer linear program
c. LP relaxation of the integer linear program
d. binary integer linear program
ANSWER: b

6. In a binary integer linear program, the integer variables take only the values _____.
a. 0 or 1
b. 0 or 8
c. 1 or 8
d. 1 or –1
ANSWER: a

7. Which of the following is true of rounding the optimized solution of a linear program to an integer?
a. It always produces the optimal integer solution.
b. It always produces a feasible solution.
c. It does not affect the value of the objective function.
d. It may or may not be feasible.
ANSWER: d

8. The _____ of a set of points is the smallest intersection of linear inequalities that contain the set of points.
a. concave hull
b. slope
c. convex hull
d. geometry
ANSWER: c

9. The optimal solution to the integer linear program will be an extreme point of the _____.
a. convex hull
b. objective contour
c. cutting plane
d. slope
ANSWER: a

10. Which of the following is true of the relationship between the value of the optimal integer solution and the value of the optimal solution to the LP Relaxation?
a. For integer linear programs involving minimization, the value of the optimal solution to the LP Relaxation provides an upper bound on the value of the optimal integer solution.
b. For integer linear programs involving maximization, the value of the optimal solution to the LP Relaxation provides a lower bound on the value of the optimal integer solution.
c. For integer linear programs involving minimization, the value of the optimal solution to the LP Relaxation provides a lower bound on the value of the optimal integer solution.
d. For any linear program involving either minimization or maximization, the value of the optimal solution to the LP Relaxation provides an infeasible value for the optimal integer solution.
ANSWER: c

11. The _____ approach to solving integer linear optimization problems breaks the feasible region of the LP Relaxation into subregions until the subregions have integer solutions or it is determined that the solution cannot be in the subregion.
a. cutting plane
b. trial-and-error
c. breaking region
d. branch-and-bound
ANSWER: d

12. Which of the following approaches to solving integer linear optimization problems tries to identify the convex hull by adding a series of new constraints that do not exclude any feasible integer points?
a. Branch-and-bound approach
b. Cutting plane approach
c. Trial-and-error approach
d. Convex hull approach
ANSWER: b

13. The worksheet formulation for integer linear programs and linear programming problems is exactly the same except that the _____ for integer linear programs.
a. objective function using Set Objective in the Solver Parameters dialog box is set to Value Of option
b. decision variables need not be added in By Changing Variable Cells in the Solver Parameters dialog box
c. decision variables must be added in By Changing Variable Cells in the Solver Parameters dialog box along with selecting the Ignore Integer Constraints in the Integer Options dialog box
d. constraints must be added in the Solver Parameters dialog box to identify the integer variables and the value for Tolerance in the Integer Options dialog box may need to be adjusted
ANSWER: d

14. Binary variables are identified with the _____ designation in the Solver Parameters dialog box.
a. bin
b. 0 and 1
c. int
d. dif
ANSWER: a

15. The importance of _____ for integer linear programming problems is often intensified by the fact that a small change in one of the coefficients in the constraints can cause a relatively large change in the value of the optimal solution.
a. objective function
b. decision variables
c. sensitivity analysis
d. optimization analysis
ANSWER: c

16. Which of the following is true about the sensitivity analysis for integer optimization problems?
a. Sensitivity reports are readily available for integer optimization problems similar to the linear programming problems.
b. Because of the discrete nature of the integer optimization, Excel Solver takes much more time to calculate objective function coefficient ranges, shadow prices, and right-hand-side ranges.
c. The sensitivity analysis is not important for integer problems.
d. To determine the sensitivity of the solution to changes in model inputs for integer optimization problems, the data must be changed and the problem must be re-solved.
ANSWER: d

17. In order to choose the best solution for implementation, practitioners usually recommend re-solving the integer linear program several times with variations in the _____.
a. objective function
b. decision variables
c. constraint coefficients
d. integer constraints
ANSWER: c
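The points made in questions 7 and 10 about rounding and the LP Relaxation bound can be seen on a tiny example. The integer program below is hypothetical (its coefficients are invented for this illustration and do not come from the test bank). The sketch solves the LP Relaxation at the corner where both constraints bind, tries rounding that point, and finds the true integer optimum by enumeration.

```python
from fractions import Fraction
from math import ceil, floor

# Hypothetical integer program (illustrative data only):
#   Max 5x + 4y   s.t.   6x + 4y <= 24,   x + 2y <= 6,   x, y >= 0 and integer
def feasible(x, y):
    return x >= 0 and y >= 0 and 6 * x + 4 * y <= 24 and x + 2 * y <= 6

def value(x, y):
    return 5 * x + 4 * y

# LP Relaxation: drop the integer requirement. For this instance the best corner is
# where both constraints bind, so solve 6x + 4y = 24 and x + 2y = 6 by Cramer's rule.
det = 6 * 2 - 4 * 1
lp_x = Fraction(24 * 2 - 4 * 6, det)
lp_y = Fraction(6 * 6 - 24 * 1, det)
lp_value = value(lp_x, lp_y)
# Sanity check against the other corners of the relaxed region: (0, 0), (4, 0), (0, 3).
assert lp_value >= max(value(0, 0), value(4, 0), value(0, 3))

# Rounding the LP solution up or down.
rounded_up = (ceil(lp_x), ceil(lp_y))
rounded_down = (floor(lp_x), floor(lp_y))

# True integer optimum by enumerating every integer point in the bounding box.
candidates = [(x, y) for x in range(5) for y in range(4) if feasible(x, y)]
ip_best = max(candidates, key=lambda p: value(*p))

print("LP relaxation:", (float(lp_x), float(lp_y)), "value", float(lp_value))
print("rounded up   :", rounded_up, "feasible:", feasible(*rounded_up))
print("rounded down :", rounded_down, "value", value(*rounded_down))
print("integer best :", ip_best, "value", value(*ip_best))
```

In this instance rounding up gives an infeasible point, rounding down gives a feasible but suboptimal one, and the LP Relaxation value is an upper bound on the integer optimum because the problem is a maximization.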

18. In cases where Excel Solver experiences excessive run times when solving integer linear problems, the Integer Optimality is set to _____.
a. 5%
b. 0%
c. infinity
d. a value equal to the number of integer constraints
ANSWER: a

19. The objective function for a linear optimization problem is: Max 3x + 2y, with one of the constraints being that x and y both take only the values 0 and 1. Also, x and y are the only decision variables. This is an example of a _____.
a. nonlinear program
b. mixed-integer linear program
c. LP relaxation of the integer linear program
d. binary integer linear program
ANSWER: d

20. A _____ problem is a binary integer programming problem that involves choosing which possible projects or activities provide the best investment return.
a. capital budgeting
b. fixed-cost
c. market share optimization
d. location
ANSWER: a

21. In a production application involving a fixed setup cost and a variable cost, the use of _____ makes including the setup cost possible in a production model.
a. location variables
b. noninteger constraints
c. objective function coefficients
d. binary variables
ANSWER: d

22. A binary mixed-integer programming problem in which the binary variables represent whether an activity, such as a production run, is undertaken or not is known as the _____.
a. capital budgeting problem
b. share of choice problem
c. fixed-cost problem
d. covering problem
ANSWER: c

23. In a fixed-cost model, each fixed cost is associated with a binary variable and a specification of the _____.
a. upper bound for the corresponding production variable
b. upper bound for each of the binary variables
c. integer constraints involving the corresponding production variables
d. objective function involving these binary variables only
ANSWER: a
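The fixed-cost idea in questions 21 to 23 can be sketched with a small brute-force model. All numbers below are hypothetical and chosen only for illustration: each product has a binary setup variable y, a production quantity x limited by an upper bound M through x ≤ M·y, and a setup cost that is paid only when y = 1.

```python
from itertools import product

# Hypothetical data (illustrative only, not from the test bank).
margin = [6, 4]          # profit per unit for product 0 and product 1
setup = [150, 60]        # fixed setup cost, charged only if the product is run
hours = [2, 1]           # labor hours needed per unit
HOURS_AVAILABLE = 100
# Upper bound M for each production variable: the most that labor alone would allow.
M = [HOURS_AVAILABLE // h for h in hours]

best = None  # (profit, setup decisions y, production quantities x)
for y in product((0, 1), repeat=2):
    for x0 in range(M[0] * y[0] + 1):          # enforces x0 <= M0 * y0
        for x1 in range(M[1] * y[1] + 1):      # enforces x1 <= M1 * y1
            if hours[0] * x0 + hours[1] * x1 > HOURS_AVAILABLE:
                continue
            profit = (margin[0] * x0 + margin[1] * x1
                      - setup[0] * y[0] - setup[1] * y[1])
            if best is None or profit > best[0]:
                best = (profit, y, (x0, x1))

print("upper bounds M:", M)
print("best profit:", best[0], "setups y:", best[1], "quantities x:", best[2])
```

When y = 0 the constraint x ≤ M·y forces production to zero and no setup cost is charged; when y = 1 production may go up to M. Enumeration is practical only for toy instances like this one, and a solver with binary constraints would be used for anything larger.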

24. Which of the following is a likely constraint on the production quantity x associated with a maximum value and a setup variable y in a fixed-cost problem?
a. x ≥ My
b. x ≤ My
c. Mx ≤ y
d. xy ≥ M
ANSWER: b

25. In a fixed-cost problem, choosing excessively large values for the maximum production quantity will result in _____.
a. all reasonable levels of production
b. no production
c. no solution at all
d. possibly a slow solution procedure
ANSWER: d

26. For a location problem, if the variables are defined as xi = 1 if an outlet store is established in region i and 0 otherwise, the objective function is best defined by _____ for i = 1, 2, …, n number of outlet stores included in the problem.
a. Min(Σxi)
b. Max(Σxi)
c. Min(Πxi)
d. Max(Πxi)
ANSWER: a

27. _____ analysis is a market research technique that can be used to learn how prospective buyers of a product value the product’s attributes.
a. Part-worth
b. Conjoint
c. Regression
d. Sensitivity
ANSWER: b

28. The _____ is the utility value that a consumer attaches to each level of each attribute in a conjoint analysis model.
a. weightage
b. share of choice
c. part-worth
d. share of market
ANSWER: c

29. The part-worth for each of the attribute levels in a conjoint analysis is determined by _____.
a. regression analysis
b. sensitivity analysis
c. online surveys
d. word-of-mouth
ANSWER: a

30. Coming up with a product design that will have the highest utility for a sufficient number of people to ensure sufficient sales to justify making the product is known as the _____ problem in marketing literature.
a. capital budgeting
b. share of choice
c. fixed-cost
d. traveling-salesman
ANSWER: b

31. The results of _____ can be used in an integer programming model of a product design and market share optimization problem.
a. conjoint analysis
b. product design
c. part-worth
d. variations analysis
ANSWER: a

32. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. What category does the above objective fall under?
a. Capital budgeting problem
b. Covering problem
c. Fixed-cost problem
d. Product design and market share optimization problem
ANSWER: d

33. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. In an integer programming model for this problem, the available sizes of the trousers will be represented as _____.
a. binary variables
b. constraints
c. attributes
d. regression constants
ANSWER: c

34. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. Pink, green, and black will be _____ of the color attribute.
a. levels
b. constraints
c. regression constants
d. utility values
ANSWER: a

35. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. The levels (small, medium, and large) of the size attribute are modeled using _____.
a. objective function coefficients
b. slack variables
c. binary variables
d. nonlinear coefficients
ANSWER: c

36. An apparel designing company is planning to enter the women’s trousers market. They are in the process of developing a product that will appeal most to customers. The part-worth for each of the attribute levels obtained from an initial customer survey and the subsequent regression analysis can be used to determine the _____.
a. customer utility value
b. optimal solution for the regression analysis
c. overall profit for the company
d. overall sales achieved by the company
ANSWER: a

37. The sum of two or more binary variables must be less than or equal to one in a _____ constraint.
a. corequisite
b. conditional
c. multiple-choice
d. mutually exclusive
ANSWER: d

38. A constraint involving binary variables that does not allow certain variables to equal one unless certain other variables are equal to one is known as a _____.
a. conditional constraint
b. corequisite constraint
c. k out of n alternatives constraint
d. mutually exclusive constraint
ANSWER: a

39. _____ constraint is a constraint requiring that two binary variables be equal and that thus are both either in or out of the solution together.
a. Conditional
b. Corequisite
c. k out of n alternatives
d. Mutually exclusive
ANSWER: b

40. Which of the following is true about generating alternatives in binary optimization?
a. If the second-best solution is very close to optimal, it is always preferred over the true optimal solution because of factors outside the model.
b. If alternative solutions exist, it would not help management because some factors that make one alternative are not preferred over the factors that make another alternative.
c. If the solution is a unique optimal solution, it would be good for management to know how much worse the second-best solution is than the unique optimal solution.
d. If any alternative solution exists, it would only be a second-best next to the optimal solution because there is no third-best or an alternative second-best solution to any binary integer programming problem.
ANSWER: c


Subjective Short Answer

41. What is the difference between an all-integer linear program, LP relaxation, a mixed-integer linear program, and a binary integer linear program?
ANSWER: An all-integer linear program is one in which all the variables are required to be integers. The linear program that results from dropping the integer requirements is the linear programming relaxation of the integer linear program. A mixed-integer linear program is one in which some, but not all, of the variables are required to be integers. In a binary integer linear program, the integer variables may only take on the values 0 or 1.

42. Is sensitivity analysis possible for integer linear programming problems? Is it needed? Explain.
ANSWER: Classical sensitivity analysis for linear programs is not available for integer programs. Because of the discrete nature of integer optimization, it is not possible to easily calculate objective function coefficient ranges, shadow prices, and right-hand-side ranges. However, this does not mean that sensitivity analysis is unimportant for integer programs. Sensitivity analysis is often more crucial for integer linear programming problems than for linear programming problems. A small change in one of the coefficients in the constraints can cause a relatively large change in the value of the optimal solution.

43. What is a binary variable?
ANSWER: A binary variable is a categorical (qualitative) variable that can only take two outcomes, which are labeled 0 and 1.

44. What is the convex hull?
ANSWER: The convex hull is the smallest intersection of linear inequalities that contain a certain set of points.

45. What is the utility value that a consumer attaches to each level of each attribute in a conjoint analysis model?
ANSWER: Part-worth

46. A manufacturer makes two types of rubber, Butadiene and Polyisoprene. The plant has two machines, Machine-1 and Machine-2, which are used to make the rubber strips. Manufacturing one strip of Butadiene requires 2.75 hours on Machine-1 and 3 hours on Machine-2. Processing one strip of Polyisoprene takes 3.5 hours on Machine-1 and 4 hours on Machine-2. Machine-1 is available 180 hours per month, and Machine-2 is available 200 hours per month. Formulate an all-integer mathematical model that will determine how many units of each type of rubber should be produced to maximize profits if the profit contributions of Butadiene and Polyisoprene are $20 and $26, respectively.
ANSWER:
Let B = Number of units of Butadiene produced
P = Number of units of Polyisoprene produced
Max 20B + 26P
s.t.
2.75B + 3.5P ≤ 180
3B + 4P ≤ 200
B, P ≥ 0 and integer

47. A manufacturer makes two types of rubber, Butadiene and Polyisoprene. The plant has two machines, Machine-1 and Machine-2, which are used to make the rubber strips. Manufacturing one strip of Butadiene requires 2.75 hours on Machine-1 and 3 hours on Machine-2. Processing one strip of Polyisoprene takes 3.5 hours on Machine-1 and 4 hours on Machine-2. Machine-1 is available 180 hours per month, and Machine-2 is available 200 hours per month. Formulate an all-integer spreadsheet model that will determine how many units of each type of rubber should be produced to maximize profits if the profit contributions of Butadiene and Polyisoprene are $20 and $26, respectively. How many units of each type of rubber will maximize profits?
ANSWER:
Let B = Number of units of Butadiene produced
P = Number of units of Polyisoprene produced
Max 20B + 26P
s.t.
2.75B + 3.5P ≤ 180
3B + 4P ≤ 200
B, P ≥ 0 and integer
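Because the rubber model has only two integer variables, its solution can be checked by plain enumeration. The following is a minimal Python sketch, not the spreadsheet model the question asks for; the function name solve_rubber and the loop bounds are assumptions introduced for illustration.

```python
# Brute-force check of the all-integer rubber model (questions 46 and 47).
# Assumption: exhaustive enumeration stands in for a spreadsheet solver.

def solve_rubber():
    best = (0, 0, 0)  # (profit, B, P)
    # Machine-1 alone caps B at 180 / 2.75 (about 65) and P at 180 / 3.5 (about 51).
    for b in range(0, 66):
        for p in range(0, 52):
            machine1_ok = 2.75 * b + 3.5 * p <= 180
            machine2_ok = 3 * b + 4 * p <= 200
            if machine1_ok and machine2_ok:
                profit = 20 * b + 26 * p
                if profit > best[0]:
                    best = (profit, b, p)
    return best

profit, butadiene, polyisoprene = solve_rubber()
print(butadiene, polyisoprene, profit)
```

Under these assumptions the search returns B = 40 and P = 20 for a profit of $1,320, a plan that uses all 180 Machine-1 hours and all 200 Machine-2 hours.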


48. A coffee manufacturing company has two processing plants (P1 and P2) that roast imported coffee beans. After roasting, the plants produce three types of coffee beans, A, B, and C. The company has contracted with a chain of cafes to provide coffee beans each week in the following quantities: 20 tons of type A, 11 tons of type B, and 18 tons of type C. The two plants have the same capacity, but their diverse operational procedures affect costs per ton as below.

Manufacturing Cost per Ton ($)
Plant     A      B      C      Capacity
P1        900    1125   875    25
P2        850    1200   950    25
Demand    20     11     18

Formulate and solve the all-integer model that will determine how many tons of each type of coffee beans are produced in each plant while minimizing the total cost.
ANSWER:
Let Xij = number of tons of coffee beans of type j produced in plant i; i = 1, 2, and j = 1, 2, 3
Min 900X11 + 1125X12 + 875X13 + 850X21 + 1200X22 + 950X23
s.t.
X11 + X12 + X13 ≤ 25
X21 + X22 + X23 ≤ 25
X11 + X21 = 20
X12 + X22 = 11
X13 + X23 = 18
Xij ≥ 0 and integer; i = 1, 2, and j = 1, 2, 3
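The coffee model is small enough to solve by enumeration: once plant P1's tonnage of each bean type is fixed, the demand equalities determine plant P2's. The sketch below is a hedged illustration in Python; the names COST, DEMAND, CAPACITY, and solve_coffee are assumptions, and the data are read from the cost table in question 48.

```python
# Costs per ton by plant for bean types (A, B, C), from the table in question 48.
COST = {1: (900, 1125, 875), 2: (850, 1200, 950)}
DEMAND = (20, 11, 18)   # tons of A, B, C required each week
CAPACITY = 25           # tons per plant

def solve_coffee():
    best_cost, best_plan = None, None
    # Choose plant P1's tonnage for each type; P2 must supply the remainder.
    for a in range(DEMAND[0] + 1):
        for b in range(DEMAND[1] + 1):
            for c in range(DEMAND[2] + 1):
                p1 = (a, b, c)
                p2 = tuple(d - x for d, x in zip(DEMAND, p1))
                if sum(p1) > CAPACITY or sum(p2) > CAPACITY:
                    continue  # violates a capacity constraint
                cost = sum(k * x for k, x in zip(COST[1], p1))
                cost += sum(k * x for k, x in zip(COST[2], p2))
                if best_cost is None or cost < best_cost:
                    best_cost, best_plan = cost, (p1, p2)
    return best_cost, best_plan

best_cost, best_plan = solve_coffee()
print(best_cost, best_plan)
```

The enumeration reports a minimum cost of $45,425. Several plans reach it: P2 roasts all 20 tons of type A, and 4 tons of type B or C (in any split) move from P1 to P2 so that P1 stays within its 25-ton capacity.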


49. FinFone Paper Mill is a small-scale paper-making company which produces four different types of paper. Each type of paper must go through processing on four different machines. The manufacturing time (in minutes) per unit of paper produced is listed in the following table.

Time required (in minutes)
                Paper Type
Machine Type    A      B      C      D
1               2.4    1.2    2.7    3.2
2               2.1    2.4    3.2    3.3
3               1.6    0.9    2.6    5.1
4               2.5    2.5    3.2    6.5

The maximum time allotted for each machine is 30 hours per week and at least 100 units of each type of paper should be made during the week. Profit per unit is:

Paper Type    A       B       C       D
Profit ($)    0.25    0.32    0.44    0.5

Develop and solve an all-integer model that will determine, using the available machine time, the number of units of each paper type to be produced in order to meet the weekly demand and to maximize the profit.
ANSWER:
Let A = Number of units of Type A paper made
B = Number of units of Type B paper made
C = Number of units of Type C paper made
D = Number of units of Type D paper made
Max 0.25A + 0.32B + 0.44C + 0.5D
s.t.
2.4A + 1.2B + 2.7C + 3.2D ≤ 1800
2.1A + 2.4B + 3.2C + 3.3D ≤ 1800
1.6A + 0.9B + 2.6C + 5.1D ≤ 1800
2.5A + 2.5B + 3.2C + 6.5D ≤ 1800
A, B, C, D ≥ 100
A, B, C, D ≥ 0 and integer


50. The following questions refer to an advertisement budgeting problem involving printing of five magazines represented by binary variables M1, M2, M3, M4, and M5.
a. Write a constraint modeling a situation in which two of the magazines M1, M4, and M5 must be printed.
b. Write a constraint modeling a situation in which, if M2 or M3 is printed, they must both be printed.
c. Write a constraint modeling a situation in which magazine M1 or M3 must be printed, but not both.
d. Write a constraint modeling a situation where M2 cannot be printed unless both magazines M3 and M5 are also printed.
e. Write a constraint in which not more than 4 of all the five magazines have to be printed.
f. Write a constraint in which exactly five of the magazines are printed.
ANSWER:
a. M1 + M4 + M5 = 2
b. M2 – M3 = 0
c. M1 + M3 = 1
d. M2 ≤ (M3 + M5) / 2
e. M1 + M2 + M3 + M4 + M5 ≤ 4
f. M1 + M2 + M3 + M4 + M5 = 5

51. Delisshious Toasty Chocolates Company primarily produces two types of chocolate bars, Almond Tasty and Cashew Crunchy. The personnel costs incurred per day to produce the two types of chocolate bars are $100 and $120, respectively. The production cost per chocolate bar is $2 for Almond Tasty and $2.5 for Cashew Crunchy. The daily production capacities of Almond Tasty and Cashew Crunchy chocolate bars are 1100 and 1250, respectively. Only one type of bar can be produced on a given day since only one machine is available.
Let C1 = the number of Almond Tasty chocolate bars produced
C2 = the number of Cashew Crunchy chocolate bars produced
Y1 = 1 if the machine produces Almond Tasty; 0, otherwise
Y2 = 1 if the machine produces Cashew Crunchy; 0, otherwise
a. Write a constraint that sets the next day’s maximum production of Almond Tasty to 1100.
b. Write a constraint that sets the next day’s maximum production of Cashew Crunchy to 1250.
c. Write a constraint that requires that production be set up for exactly one of the two chocolate bars.
d. Write the cost function to be minimized.
ANSWER:
a. C1 ≤ 1100Y1
b. C2 ≤ 1250Y2
c. Y1 + Y2 = 1
d. Min 2C1 + 2.5C2 + 100Y1 + 120Y2

52. A chocolate making company largely produces one particular type of crunchy chocolate bar. Only one of two machines, Machine-1 or Machine-2, can be used to produce this chocolate bar on any given day. The maintenance costs incurred on these two machines per day are $100 and $120, respectively. The manufacturing cost per chocolate bar is $2.5 for Machine-1 and $2 for Machine-2. The maximum daily production capacities for Machine-1 and Machine-2 are 1100 and 1250, respectively. Demand requires that at least 1000 chocolate bars be produced per day. Develop and solve a binary integer programming model for minimizing the total cost.
ANSWER:
Let C1 = the number of chocolate bars produced by Machine-1
C2 = the number of chocolate bars produced by Machine-2
Y1 = 1 if Machine-1 produces chocolate bar; 0, otherwise
Y2 = 1 if Machine-2 produces chocolate bar; 0, otherwise
Min 2.5C1 + 2C2 + 100Y1 + 120Y2
s.t.
C1 ≤ 1100Y1
C2 ≤ 1250Y2
Y1 + Y2 = 1
C1 + C2 ≥ 1000
C1, C2 ≥ 0 and integer
Y1, Y2 = 0, 1
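Since Y1 + Y2 = 1 forces exactly one machine to run, the model in question 52 reduces to comparing two cases. The Python sketch below illustrates that comparison; the names MACHINES, DEMAND, and solve_chocolate are assumptions made for this example, and the shortcut of producing exactly the demanded quantity relies on every unit cost being positive.

```python
# Data for question 52: (unit cost, daily fixed cost, daily capacity) per machine.
MACHINES = {1: (2.5, 100, 1100), 2: (2.0, 120, 1250)}
DEMAND = 1000

def solve_chocolate():
    options = []
    for machine, (unit_cost, fixed_cost, capacity) in MACHINES.items():
        # Y1 + Y2 = 1: exactly one machine runs, so it must meet demand alone.
        if capacity < DEMAND:
            continue
        # Each bar beyond the demand only adds cost, so produce exactly DEMAND.
        options.append((unit_cost * DEMAND + fixed_cost, machine, DEMAND))
    return min(options)  # (total cost, machine used, bars produced)

total_cost, machine_used, bars = solve_chocolate()
print(total_cost, machine_used, bars)
```

With these data, Machine-2 producing 1,000 bars costs $2,120, against $2,600 for Machine-1, so Y2 = 1 and C2 = 1000 in the cheaper plan.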


53. A shipping freighter has space for two more shipping containers, but the combined weight cannot go over 20 tons. Four shipping containers are being considered. The following table provides details on the weight (in tons) and value of the contents of each container.

Container    Weight of container (tons)    Value / Container
1            5                             $6,000
2            6                             $5,500
3            9                             $7,500
4            7                             $6,000

Develop a binary integer model that will determine the two containers that will maximize the value of the shipment.
ANSWER:
Let C1 = 1 if Container 1 is selected for shipment; 0 otherwise
C2 = 1 if Container 2 is selected for shipment; 0 otherwise
C3 = 1 if Container 3 is selected for shipment; 0 otherwise
C4 = 1 if Container 4 is selected for shipment; 0 otherwise
Max 6000C1 + 5500C2 + 7500C3 + 6000C4
s.t.
5C1 + 6C2 + 9C3 + 7C4 ≤ 20
C1 + C2 + C3 + C4 = 2
Ci = 0, 1 (for i = 1, 2, 3, 4)
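With four containers there are only six possible pairs, so the container model can be checked by listing them. The following Python sketch is one way to do that; the names WEIGHT, VALUE, and best_pairs are hypothetical and the data come from the table in question 53.

```python
from itertools import combinations

WEIGHT = {1: 5, 2: 6, 3: 9, 4: 7}               # tons
VALUE = {1: 6000, 2: 5500, 3: 7500, 4: 6000}    # dollars

def best_pairs(max_weight=20, count=2):
    # Keep every selection of `count` containers that respects the weight limit.
    feasible = [
        (sum(VALUE[c] for c in combo), combo)
        for combo in combinations(sorted(WEIGHT), count)
        if sum(WEIGHT[c] for c in combo) <= max_weight
    ]
    top = max(value for value, _ in feasible)
    return top, [combo for value, combo in feasible if value == top]

top_value, top_pairs = best_pairs()
print(top_value, top_pairs)
```

Containers {1, 3} and {3, 4} both reach $13,500, so the model has alternative optima. The heaviest pair weighs 16 tons, so the 20-ton limit is not binding for any pair.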


54. Best Ink Printing Co. received an order to print a minimum of 50,000 tickets for a concert. They have three printing machines available to meet the order they received. The set-up cost of these machines and the unit cost/ticket printed using each machine along with their maximum production are provided in the table below:

Machine    Set-up cost    Cost per unit    Maximum Production
A          $7,000         $18              30,000
B          $4,000         $21              25,000
C          $5,400         $24              30,000

a. Formulate a binary integer linear programming model to find which machines should be used to print the required number of tickets in order to minimize the cost.
b. Solve the problem in part a.
ANSWER:
a. Let P1 = Number of tickets printed by Machine A
P2 = Number of tickets printed by Machine B
P3 = Number of tickets printed by Machine C
A = 1 if Machine A is used to print; 0 otherwise
B = 1 if Machine B is used to print; 0 otherwise
C = 1 if Machine C is used to print; 0 otherwise
Min 18P1 + 21P2 + 24P3 + 7000A + 4000B + 5400C
s.t.
P1 + P2 + P3 ≥ 50,000
P1 ≤ 30,000A
P2 ≤ 25,000B
P3 ≤ 30,000C
P1, P2, P3 ≥ 0 and integer
A, B, C = 0, 1
b.
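The fixed-cost ticket model can be checked without a solver by trying every set of machines and, for each set, filling the order on the cheapest machines first. This is a minimal Python sketch under that assumption (the greedy fill is valid here because, once the set of machines is fixed, only unit costs and capacities matter); the names MACHINES, ORDER, cost_of, and solve_tickets are illustrative.

```python
from itertools import combinations

# (set-up cost, cost per ticket, maximum production) from the table in question 54.
MACHINES = {"A": (7000, 18, 30000), "B": (4000, 21, 25000), "C": (5400, 24, 30000)}
ORDER = 50000

def cost_of(open_machines):
    """Cheapest way to print ORDER tickets on the given machines, or None if impossible."""
    if sum(MACHINES[m][2] for m in open_machines) < ORDER:
        return None
    total = sum(MACHINES[m][0] for m in open_machines)  # set-up costs
    remaining, plan = ORDER, {}
    # Fill the order on the cheapest machine first.
    for m in sorted(open_machines, key=lambda name: MACHINES[name][1]):
        qty = min(remaining, MACHINES[m][2])
        plan[m] = qty
        total += qty * MACHINES[m][1]
        remaining -= qty
    return total, plan

def solve_tickets():
    results = []
    for r in range(1, len(MACHINES) + 1):
        for subset in combinations(MACHINES, r):
            outcome = cost_of(subset)
            if outcome is not None:
                results.append(outcome)
    return min(results, key=lambda item: item[0])

best_cost, best_plan = solve_tickets()
print(best_cost, best_plan)
```

Under these assumptions the cheapest plan uses Machines A and B, printing 30,000 and 20,000 tickets respectively, for a total cost of $971,000.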


55. Light-Twilight, Inc. currently has four factories where light bulbs are manufactured, but budget cuts are forcing the company to limit production to only two of the factories. Once produced, the light bulbs are shipped to five distribution centers. The cost, demand, and maximum volume details are given as below.

Shipping cost/1000 bulbs ($)
                      Warehouse                       Maximum Volume    Fixed Cost
Factory               1     2     3     4     5       (in 1000’s)       ($)
A                     78    25    39    77    48      90                32,700
B                     49    72    27    17    29      85                35,000
C                     31    90    31    25    42      80                40,000
D                     28    20    18    27    21      86                20,000
Demand (in 1000’s)    40    36    16    23    29

a. Formulate a mixed-integer programming model to identify which two factories the management should retain in order to fulfill the estimated demand while minimizing the cost.
b. Solve the model you formulated in part (a). What is the optimum cost?
ANSWER:
a. Let Xij = Number of light bulbs (in 1000’s) shipped from factory i to distribution center j
A = 1 if factory A is retained; 0 if otherwise
B = 1 if factory B is retained; 0 if otherwise
C = 1 if factory C is retained; 0 if otherwise
D = 1 if factory D is retained; 0 if otherwise
Min 78X11 + 25X12 + 39X13 + 77X14 + 48X15
+ 49X21 + 72X22 + 27X23 + 17X24 + 29X25
+ 31X31 + 90X32 + 31X33 + 25X34 + 42X35
+ 28X41 + 20X42 + 18X43 + 27X44 + 21X45
+ 32700A + 35000B + 40000C + 20000D
s.t.
X11 + X12 + X13 + X14 + X15 ≤ 90A
X21 + X22 + X23 + X24 + X25 ≤ 85B
X31 + X32 + X33 + X34 + X35 ≤ 80C
X41 + X42 + X43 + X44 + X45 ≤ 86D
X11 + X21 + X31 + X41 = 40
X12 + X22 + X32 + X42 = 36
X13 + X23 + X33 + X43 = 16
X14 + X24 + X34 + X44 = 23
X15 + X25 + X35 + X45 = 29
A + B + C + D = 2
Xij ≥ 0, where i = 1, 2, 3, 4; j = 1, 2, 3, 4, 5
A, B, C, D = 0, 1
b.


The optimal cost is $56,736.

56. Sansuit Investments is deciding on future investments for the coming two years and is considering four bonds. The investment details for the next two years are given in the table below.

Investment Requirements ($)
          Year 1    Year 2
Bond A    25,000    30,000
Bond B    15,000    21,000
Bond C    8,000     9,500
Bond D    10,000    7,000

The net worth of these four bonds at maturity is $60,000, $40,000, $25,500, and $18,000, respectively. The firm plans to invest $35,000 and $62,000 in Year 1 and Year 2, respectively. Develop and solve a binary integer programming model for maximizing the net worth.
ANSWER:
Let X1 = 1 if Bond A is selected for investment; 0 if it is not
X2 = 1 if Bond B is selected for investment; 0 if it is not
X3 = 1 if Bond C is selected for investment; 0 if it is not
X4 = 1 if Bond D is selected for investment; 0 if it is not
Max 60000X1 + 40000X2 + 25500X3 + 18000X4
s.t.
25000X1 + 15000X2 + 8000X3 + 10000X4 ≤ 35000
30000X1 + 21000X2 + 9500X3 + 7000X4 ≤ 62000
X1, X2, X3, X4 = 0, 1
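Four binary variables give only 16 possible selections, so the capital budgeting model in question 56 can be solved by checking each one. The Python sketch below does this; the names YEAR1, YEAR2, NET_WORTH, BUDGET, and solve_bonds are assumptions for illustration.

```python
from itertools import product

# Data for Bonds A, B, C, D from question 56.
YEAR1 = (25000, 15000, 8000, 10000)
YEAR2 = (30000, 21000, 9500, 7000)
NET_WORTH = (60000, 40000, 25500, 18000)
BUDGET = (35000, 62000)  # Year 1, Year 2

def solve_bonds():
    best = (-1, None)  # (net worth, selection vector)
    for x in product((0, 1), repeat=4):
        spend1 = sum(c * s for c, s in zip(YEAR1, x))
        spend2 = sum(c * s for c, s in zip(YEAR2, x))
        if spend1 <= BUDGET[0] and spend2 <= BUDGET[1]:
            worth = sum(w * s for w, s in zip(NET_WORTH, x))
            if worth > best[0]:
                best = (worth, x)
    return best

net_worth, selection = solve_bonds()
print(net_worth, selection)
```

The enumeration selects Bonds A and C (X1 = X3 = 1) for a net worth of $85,500, using $33,000 of the Year 1 budget and $39,500 of the Year 2 budget.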


57. Sansuit Investments is deciding on future investment for the coming two years and is considering four bonds. The investment details for the next two years are given in the table below.

Investment Requirements ($)
          Year 1    Year 2
Bond A    25,000    30,000
Bond B    15,000    21,000
Bond C    8,000     9,500
Bond D    10,000    7,000

The net worth of these four bonds at maturity is $60,000, $40,000, $25,500, and $18,000, respectively. The firm plans to invest $35,000 and $62,000 in Year 1 and Year 2, respectively.
a. Develop and solve a binary integer programming model for maximizing the return on investment (in dollars) assuming that only one of the bonds can be considered. How much money is invested? What is the return on investment (in dollars)?
b. Suppose the investment has to be made on Bond B, and only two of the four bonds can be considered for investment. Modify your formulation from Part (a) to reflect this new situation. How much money is invested? What is the return on investment (in dollars)? Based on the ratio of return vs. investment, which of these two options would you recommend?
ANSWER:
a. Let X1 = 1 if Bond A is selected for investment; 0 if it is not
X2 = 1 if Bond B is selected for investment; 0 if it is not
X3 = 1 if Bond C is selected for investment; 0 if it is not
X4 = 1 if Bond D is selected for investment; 0 if it is not
Max 5,000X1 + 4,000X2 + 8,000X3 + 1,000X4
s.t.
25,000X1 + 15,000X2 + 8,000X3 + 10,000X4 ≤ 35,000
30,000X1 + 21,000X2 + 9,500X3 + 7,000X4 ≤ 62,000
X1 + X2 + X3 + X4 = 1
X1, X2, X3, X4 = 0, 1


The optimal return on investment is $8,000 with an investment of $17,500.

b. Let X1 = 1 if Bond A is selected for investment; 0 if it is not
X2 = 1 if Bond B is selected for investment; 0 if it is not
X3 = 1 if Bond C is selected for investment; 0 if it is not
X4 = 1 if Bond D is selected for investment; 0 if it is not
Max 5,000X1 + 4,000X2 + 8,000X3 + 1,000X4
s.t.
25,000X1 + 15,000X2 + 8,000X3 + 10,000X4 ≤ 35,000
30,000X1 + 21,000X2 + 9,500X3 + 7,000X4 ≤ 62,000
X2 = 1
X1 + X3 + X4 = 1
X1, X2, X3, X4 = 0, 1
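Both scenarios in question 57 can be checked by enumerating the 16 possible bond selections. The Python sketch below assumes that the dollar return of a bond is its net worth at maturity minus its Year 1 and Year 2 investment requirements; the names RETURN and best_selection are introduced here for illustration only.

```python
from itertools import product

# Data for Bonds A, B, C, D from question 57.
YEAR1 = (25000, 15000, 8000, 10000)
YEAR2 = (30000, 21000, 9500, 7000)
NET_WORTH = (60000, 40000, 25500, 18000)
BUDGET1, BUDGET2 = 35000, 62000
# Assumed definition: return = net worth at maturity minus total investment.
RETURN = tuple(w - y1 - y2 for w, y1, y2 in zip(NET_WORTH, YEAR1, YEAR2))

def best_selection(allowed):
    """Return (return, investment, selection) maximizing return among selections passing `allowed`."""
    best = None
    for x in product((0, 1), repeat=4):
        spend1 = sum(c * s for c, s in zip(YEAR1, x))
        spend2 = sum(c * s for c, s in zip(YEAR2, x))
        if not allowed(x) or spend1 > BUDGET1 or spend2 > BUDGET2:
            continue
        gain = sum(r * s for r, s in zip(RETURN, x))
        if best is None or gain > best[0]:
            best = (gain, spend1 + spend2, x)
    return best

option_a = best_selection(lambda x: sum(x) == 1)                 # exactly one bond
option_b = best_selection(lambda x: x[1] == 1 and sum(x) == 2)   # Bond B plus one other
for label, (gain, invested, x) in (("a", option_a), ("b", option_b)):
    print(label, x, gain, invested, round(gain / invested, 3))
```

Under this definition the per-bond returns are $5,000, $4,000, $8,000, and $1,000. Option (a) picks Bond C alone ($8,000 return on $17,500, a ratio of about 0.457), and option (b) picks Bonds B and C ($12,000 return on $53,500, a ratio of about 0.224).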


The optimal return on investment is $12,000 with an investment of $53,500. The “rate of return,” or ratio return/investment, for Option A (8000/17500 = 0.457) is greater than the rate for Option B (12000/53500 = 0.224). Therefore, Option A is better.

58. A manufacturer wants to construct warehouses in six different locations of the city to supply dry cells to his customers on time. The manufacturer wants to construct the minimum number of warehouses such that each location is within 40 miles of at least one warehouse. The following table provides the distance (in miles) between the locations.

                                      To
From          Location A  Location B  Location C  Location D  Location E  Location F
Location A    0           35          40          45          60          70
Location B                0           35          40          70          75
Location C                            0           45          50          50
Location D                                        0           40          50
Location E                                                    0           30
Location F                                                                0

Formulate and solve an integer linear program that can be used to determine the minimum number of warehouses needed to be constructed. What are their locations?
ANSWER:
Let X1, X2, X3, X4, X5, and X6 be the variables indicating the locations A, B, C, D, E, and F, respectively, where the warehouses are to be constructed.
Xi = 1 if warehouse is constructed in Location i; 0, if otherwise, where i = 1, 2, …, 6
Min X1 + X2 + X3 + X4 + X5 + X6
s.t.
X1 + X2 + X3 ≥ 1            Location A constraint
X1 + X2 + X3 + X4 ≥ 1       Location B constraint
X1 + X2 + X3 ≥ 1            Location C constraint
X2 + X4 + X5 ≥ 1            Location D constraint
X4 + X5 + X6 ≥ 1            Location E constraint
X5 + X6 ≥ 1                 Location F constraint
Xi = 0, 1 (i = 1, 2, 3, 4, 5, 6)
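The covering constraints in question 58 follow directly from the distance table, so the model can be checked by searching warehouse sets of increasing size. The Python sketch below assumes a location is covered when a warehouse stands within 40 miles of it (a warehouse covers its own location); the names DIST, within, and minimum_covers are hypothetical.

```python
from itertools import combinations

SITES = "ABCDEF"
# Upper-triangular distances (miles) from the table in question 58.
DIST = {
    ("A", "B"): 35, ("A", "C"): 40, ("A", "D"): 45, ("A", "E"): 60, ("A", "F"): 70,
    ("B", "C"): 35, ("B", "D"): 40, ("B", "E"): 70, ("B", "F"): 75,
    ("C", "D"): 45, ("C", "E"): 50, ("C", "F"): 50,
    ("D", "E"): 40, ("D", "F"): 50,
    ("E", "F"): 30,
}

def within(a, b, limit=40):
    # A warehouse always covers the location it is built in.
    if a == b:
        return True
    return DIST[tuple(sorted((a, b)))] <= limit

def minimum_covers():
    # Try 1 warehouse, then 2, and so on; stop at the first size that covers every site.
    for size in range(1, len(SITES) + 1):
        covers = [
            combo for combo in combinations(SITES, size)
            if all(any(within(site, w) for w in combo) for site in SITES)
        ]
        if covers:
            return size, covers

size, covers = minimum_covers()
print(size, covers)
```

The search finds that two warehouses suffice and that four pairs work: (A, E), (B, E), (B, F), and (C, E). The model therefore has alternative optima.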


The minimum number of warehouses to be constructed is two at the locations B and E.

59. To meet excess demand for the pizza home delivery services, ROFiL Pizza is planning to open new stores in various regions. The store locations that are under consideration and their coverage areas are given in the following table.

Store Location    Potential Areas Covered
L1                3, 4, 6, 8
L2                1, 5, 9
L3                1, 4, 7
L4                2, 6, 7
L5                3, 4, 9
L6                2, 7, 9
L7                4, 8, 9
L8                1, 2, 5, 6
L9                3, 6, 8

Formulate an integer programming model that could be used to find the minimum number of stores to open in order to cover customers of all areas for the home delivery service.
ANSWER:
Let L1, L2, L3, L4, L5, L6, L7, L8, and L9 be the variables indicating the locations L1, L2, L3, L4, L5, L6, L7, L8, and L9, respectively.
Li = 1 if the location i is selected for the store; 0 otherwise where i = 1, 2, …, 9.
Min L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9
s.t.
L2 + L3 + L8 ≥ 1            Potential Area 1
L4 + L6 + L8 ≥ 1            Potential Area 2
L1 + L5 + L9 ≥ 1            Potential Area 3
L1 + L3 + L5 + L7 ≥ 1       Potential Area 4
L2 + L8 ≥ 1                 Potential Area 5
L1 + L4 + L8 + L9 ≥ 1       Potential Area 6
L3 + L4 + L6 ≥ 1            Potential Area 7
L1 + L7 + L9 ≥ 1            Potential Area 8
L2 + L5 + L6 + L7 ≥ 1       Potential Area 9
Li = 0, 1 where i = 1, 2, …, 9

60. To meet excess demand for the pizza home delivery services, ROFiL Pizza is planning to open new stores in various regions. The store locations that are under consideration and their coverage areas are given in the following table.

Store Location    Potential Areas Covered
L1                3, 4, 6, 8
L2                1, 5, 9
L3                1, 4, 7
L4                2, 6, 7
L5                3, 4, 9
L6                2, 7, 9
L7                4, 8, 9
L8                1, 2, 5, 6
L9                3, 6, 8

Develop an integer optimization model that determines the minimum number of stores to open in order to meet the coverage demand.
ANSWER: Note: There are alternative optima for this model.
Let L1, L2, L3, L4, L5, L6, L7, L8, and L9 be the variables indicating the locations L1, L2, L3, L4, L5, L6, L7, L8, and L9, respectively.
Li = 1 if the location i is selected for the store; 0 otherwise where i = 1, 2, …, 9.
Min L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9
s.t.
L2 + L3 + L8 ≥ 1            Potential Area 1
L4 + L6 + L8 ≥ 1            Potential Area 2
L1 + L5 + L9 ≥ 1            Potential Area 3
L1 + L3 + L5 + L7 ≥ 1       Potential Area 4
L2 + L8 ≥ 1                 Potential Area 5
L1 + L4 + L8 + L9 ≥ 1       Potential Area 6
L3 + L4 + L6 ≥ 1            Potential Area 7
L1 + L7 + L9 ≥ 1            Potential Area 8
L2 + L5 + L6 + L7 ≥ 1       Potential Area 9
Li = 0, 1 where i = 1, 2, …, 9
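The store-location model is a set covering problem with nine binary variables, small enough to check by trying store sets of increasing size. The Python sketch below does so; the names COVERAGE, AREAS, and minimum_store_sets are assumptions for this example, and the coverage data are read from the store location table.

```python
from itertools import combinations

# Areas covered by each candidate store location L1..L9.
COVERAGE = {
    1: {3, 4, 6, 8}, 2: {1, 5, 9}, 3: {1, 4, 7}, 4: {2, 6, 7}, 5: {3, 4, 9},
    6: {2, 7, 9}, 7: {4, 8, 9}, 8: {1, 2, 5, 6}, 9: {3, 6, 8},
}
AREAS = set(range(1, 10))

def minimum_store_sets():
    # Smallest number of stores whose combined coverage includes every area.
    for size in range(1, len(COVERAGE) + 1):
        covers = [
            combo for combo in combinations(sorted(COVERAGE), size)
            if set().union(*(COVERAGE[s] for s in combo)) == AREAS
        ]
        if covers:
            return size, covers

store_count, store_sets = minimum_store_sets()
print(store_count, store_sets)
```

No two stores can cover all nine areas, since any two locations reach at most eight. The search therefore stops at three stores, and the set (L1, L6, L8) appears among the three-store covers it lists.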


The minimum number of stores needed to open is 3 and the locations are L1, L6, and L8.

61. Greenbell Software, Inc. conducted a study on its smartphone products in the market to determine which phone has the best features in terms of three prominent attributes: operating system of the phone (A or B), RAM (512MB or 1GB), and the rear camera specifications (3MP, 5MP, or 7MP). A sample of eight customers participated in the study and provided the following part-worths for each of the above attributes.

            Operating System    RAM                Camera
Consumer    A       B           512 MB    1 GB     3MP    5MP    7MP
1           25      29          20        45       18     18     15
2           35      32          30        35       27     19     13
3           45      35          30        30       15     25     30
4           55      65          25        45       25     20     35
5           40      40          35        40       20     25     30
6           30      47          40        35       15     20     15
7           35      45          30        30       20     20     20
8           25      25          25        30       25     30     30

Suppose the overall utility (sum of part-worths) of the current favorite Greenbell smartphone is 100 for each consumer. What new product design will maximize the share of choice for the eight consumers in the sample?
ANSWER: Note: There are alternative optima for this model.
Lij = 1 if Greenbell chooses level i for attribute j; 0 otherwise
Yk = 1 if consumer k prefers the new Greenbell product; 0 otherwise
Max Y1 + Y2 + . . . + Y8
s.t.
25L11 + 29L21 + 20L12 + 45L22 + 18L13 + 18L23 + 15L33 ≥ 1 + 100Y1
35L11 + 32L21 + 30L12 + 35L22 + 27L13 + 19L23 + 13L33 ≥ 1 + 100Y2
45L11 + 35L21 + 30L12 + 30L22 + 15L13 + 25L23 + 30L33 ≥ 1 + 100Y3
55L11 + 65L21 + 25L12 + 45L22 + 25L13 + 20L23 + 35L33 ≥ 1 + 100Y4
40L11 + 40L21 + 35L12 + 40L22 + 20L13 + 25L23 + 30L33 ≥ 1 + 100Y5
30L11 + 47L21 + 40L12 + 35L22 + 15L13 + 20L23 + 15L33 ≥ 1 + 100Y6
35L11 + 45L21 + 30L12 + 30L22 + 20L13 + 20L23 + 20L33 ≥ 1 + 100Y7
25L11 + 25L21 + 25L12 + 30L22 + 25L13 + 30L23 + 30L33 ≥ 1 + 100Y8
L11 + L21 = 1
L12 + L22 = 1
L13 + L23 + L33 = 1
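With two operating systems, two RAM sizes, and three cameras there are only 12 candidate designs, so the share-of-choice model can be checked by scoring each design against every consumer. The Python sketch below counts a consumer as won when the design's total part-worth is at least one more than the utility of the current favorite, which mirrors the "≥ 1 + 100Yk" constraints; the names LEVELS, PART_WORTHS, and share_of_choice are assumptions.

```python
from itertools import product

LEVELS = (("A", "B"), ("512 MB", "1 GB"), ("3MP", "5MP", "7MP"))
# Part-worths per consumer: (OS A, OS B), (512 MB, 1 GB), (3MP, 5MP, 7MP).
PART_WORTHS = [
    ((25, 29), (20, 45), (18, 18, 15)),
    ((35, 32), (30, 35), (27, 19, 13)),
    ((45, 35), (30, 30), (15, 25, 30)),
    ((55, 65), (25, 45), (25, 20, 35)),
    ((40, 40), (35, 40), (20, 25, 30)),
    ((30, 47), (40, 35), (15, 20, 15)),
    ((35, 45), (30, 30), (20, 20, 20)),
    ((25, 25), (25, 30), (25, 30, 30)),
]

def share_of_choice(current_utility):
    """current_utility[k] is the utility of consumer k+1's current favorite phone."""
    best_share, best_designs = -1, []
    for design in product(*(range(len(levels)) for levels in LEVELS)):
        # Consumers whose utility for this design beats their current favorite by at least 1.
        winners = [
            k + 1
            for k, worths in enumerate(PART_WORTHS)
            if sum(worths[j][i] for j, i in enumerate(design)) >= current_utility[k] + 1
        ]
        names = tuple(LEVELS[j][i] for j, i in enumerate(design))
        if len(winners) > best_share:
            best_share, best_designs = len(winners), [(names, winners)]
        elif len(winners) == best_share:
            best_designs.append((names, winners))
    return best_share, best_designs

share, designs = share_of_choice([100] * 8)
print(share)
for names, winners in designs:
    print(names, winners)
```

For a current utility of 100 per consumer, the best attainable share is three consumers. Four designs reach it, including operating system B with 1 GB RAM and a 5MP camera (consumers 4, 5, and 6) and operating system A with 512 MB RAM and a 7MP camera (consumers 3, 4, and 5).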


The optimal solution obtained using Excel Solver shows L21 = L22 = L23 = 1. This indicates that a smartphone with an operating system B, RAM 1GB, and 5MP camera will maximize the share of choices. The optimal solution also has Y4 = Y5 = Y6 = 1, which indicates that customers 4, 5, and 6 will prefer this new smartphone. Note: An alternative optimal solution is L11 = L12 = L33 = 1.

62. Greenbell Software, Inc. conducted a study on its smartphone products in the market to determine which phone has the best features in terms of three prominent attributes: operating system of the phone (A or B), RAM (512MB or 1GB), and the rear camera specifications (3MP, 5MP, or 7MP). A sample of eight customers participated in the study and provided the following part-worths for each of the above attributes.

            Operating System    RAM                Camera
Consumer    A       B           512 MB    1 GB     3MP    5MP    7MP
1           25      29          20        45       18     18     15
2           35      32          30        35       27     19     13
3           45      35          30        30       15     25     30
4           55      65          25        45       25     20     35
5           40      40          35        40       20     25     30
6           30      47          40        35       15     20     15
7           35      45          30        30       20     20     20
8           25      25          25        30       25     30     30

Assume the overall utility (sum of part-worths) of the current favorite Greenbell smartphone for customers 1 to 5 is 105 and customers 6 to 8 is 90. What new product design will maximize the share of choice for the eight consumers in the sample?
ANSWER: Note: There are alternative optima for this model.
Lij = 1 if Greenbell chooses level i for attribute j; 0 otherwise
Yk = 1 if consumer k prefers the new Greenbell product; 0 otherwise
Max Y1 + Y2 + . . . + Y8
s.t.
25L11 + 29L21 + 20L12 + 45L22 + 18L13 + 18L23 + 15L33 ≥ 1 + 105Y1
35L11 + 32L21 + 30L12 + 35L22 + 27L13 + 19L23 + 13L33 ≥ 1 + 105Y2
45L11 + 35L21 + 30L12 + 30L22 + 15L13 + 25L23 + 30L33 ≥ 1 + 105Y3
55L11 + 65L21 + 25L12 + 45L22 + 25L13 + 20L23 + 35L33 ≥ 1 + 105Y4
40L11 + 40L21 + 35L12 + 40L22 + 20L13 + 25L23 + 30L33 ≥ 1 + 105Y5
30L11 + 47L21 + 40L12 + 35L22 + 15L13 + 20L23 + 15L33 ≥ 1 + 90Y6
35L11 + 45L21 + 30L12 + 30L22 + 20L13 + 20L23 + 20L33 ≥ 1 + 90Y7
25L11 + 25L21 + 25L12 + 30L22 + 25L13 + 30L23 + 30L33 ≥ 1 + 90Y8
L11 + L21 = 1
L12 + L22 = 1
L13 + L23 + L33 = 1


The optimal solution obtained using Excel Solver shows L21 = L22 = L33 = 1. This indicates that a smartphone with an operating system B, RAM 1GB, and 7MP camera will maximize the share of choices. The optimal solution also has Y4 = Y5 = Y6 = Y7 = 1, which indicates that customers 4, 5, 6, and 7 will prefer this new smartphone.

63. Andrew is ready to invest $200,000 in stocks and he has been provided nine different alternatives by his financial consultant. The following stocks belong to three different industrial sectors and each sector has three varieties of stocks, each with a different expected rate of return. The average rate of return taken for the past ten years is provided with each of the nine stocks.

Stock    Industry       Annual Return
1        Airlines       18.24%
2        Airlines       28.75%
3        Airlines       11.08%
4        Banking        20.12%
5        Banking        14.00%
6        Banking        26.17%
7        Agriculture    23.67%
8        Agriculture    18.25%
9        Agriculture    16.50%

The decision will be based on the constraints provided below:
• Exactly five alternatives should be chosen.
• Any stock chosen can have a maximum investment of $55,000.
• Any stock chosen must have a minimum investment of at least $25,000.
• For the Airlines sector, the maximum number of stocks chosen should be two.
• The total amount invested in Banking must be at least as much as the amount invested in Agriculture.

Formulate a model that will decide Andrew’s investment strategy to maximize his expected annual return.
ANSWER:
Let X1, X2, X3, X4, X5, X6, X7, X8, and X9 be the amount (in dollars) invested in Stocks 1, 2, 3, …, 9, respectively.
Let Yi = 1 if Andrew invests in Stock i; 0, otherwise; where i = 1, 2, …, 9.
Max 0.1824X1 + 0.2875X2 + 0.1108X3 + 0.2012X4 + 0.1400X5 + 0.2617X6 + 0.2367X7 + 0.1825X8 + 0.1650X9
s.t.
Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7 + Y8 + Y9 = 5
Xi ≤ 55,000Yi; where i = 1, 2, …, 9
Xi ≥ 25,000Yi; where i = 1, 2, …, 9
Y1 + Y2 + Y3 ≤ 2
X4 + X5 + X6 ≥ X7 + X8 + X9
X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 = 200,000
Xi ≥ 0; where i = 1, 2, …, 9

64. Andrew is ready to invest $200,000 in stocks and he has been provided nine different alternatives by his financial consultant. The following stocks belong to three different industrial sectors and each sector has three varieties of stocks, each with a different expected rate of return. The average rate of return taken for the past ten years is provided with each of the nine stocks.

Stock    Industry       Annual Return
1        Airlines       18.24%
2        Airlines       28.75%
3        Airlines       11.08%
4        Banking        20.12%
5        Banking        14.00%
6        Banking        26.17%
7        Agriculture    23.67%
8        Agriculture    18.25%
9        Agriculture    16.50%

The decision will be based on the constraints provided below:
• Exactly five alternatives should be chosen.
• Any stock chosen can have a maximum investment of $55,000.
• Any stock chosen must have a minimum investment of at least $25,000.
• For the Airlines sector, the maximum number of stocks that can be chosen is two.
• The total amount invested in Banking must be at least as much as the amount invested in Agriculture.

Formulate and solve a model that will decide Andrew’s investment strategy to maximize his expected annual return.
ANSWER:
Let X1, X2, X3, X4, X5, X6, X7, X8, and X9 be the amount (in dollars) invested in Stocks 1, 2, 3, …, 9, respectively.
Let Yi = 1 if Andrew invests in Stock i; 0, otherwise where i = 1, 2, …, 9.
Max 0.1824X1 + 0.2875X2 + 0.1108X3 + 0.2012X4 + 0.1400X5 + 0.2617X6 + 0.2367X7 + 0.1825X8 + 0.1650X9
s.t.
Y1 + Y2 + Y3 + Y4 + Y5 + Y6 + Y7 + Y8 + Y9 = 5
Xi ≤ 55,000Yi; where i = 1, 2, …, 9
Xi ≥ 25,000Yi; where i = 1, 2, …, 9
Y1 + Y2 + Y3 ≤ 2
X4 + X5 + X6 ≥ X7 + X8 + X9
X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 = 200,000
Xi ≥ 0; where i = 1, 2, …, 9


65. Denver Transportation Company is considering investing in several projects that have varying capital requirements over the next four years. Faced with limited capital each year, management would like to select the most profitable projects that it can afford. The estimated net present value for each project, the capital requirements, and the available capital over the four-year period are shown in the following binary integer linear programming model, where
S = 1 if the store expansion project is accepted; 0 if rejected
F = 1 if the fleet expansion project is accepted; 0 if rejected
E = 1 if the equipment upgrade project is accepted; 0 if rejected
R = 1 if the market research project is accepted; 0 if rejected
and currency is in $1,000s.
Maximize 75S + 50F + 16E + 12R
s.t.
30S + 30F + 10E + 20R ≤ 85
30S + 35F + 20E + 10R ≤ 60
40S + 50F + 20E + 10R ≤ 100
65S + 55F + 20E + 10R ≤ 125
S, F, E, R = 0, 1 (binary)
Questions
1. What is the optimal solution?
2. What is the total estimated net present value at the optimal solution?
3. Based upon the optimal solution and slack variables, what recommendations could you make to management?
ANSWER:
Model

                                   S           F           E           R          Max Net
                                   Store       Fleet       Equipment   Market     Present
                                   Expansion   Expansion   Upgrade     Research   Value
Decision variables                 1           0           1           1
Objective function coefficients    $75         $50         $16         $12        $103

Max Net Present Value: =SUMPRODUCT(F7:I7, $F$6:$I$6)

Constraints       S      F      E      R      Used        Available
Year 1 Capital    $30    $30    $10    $20    $60    ≤    $85
Year 2 Capital    $30    $35    $20    $10    $60    ≤    $60
Year 3 Capital    $40    $50    $20    $10    $70    ≤    $100
Year 4 Capital    $65    $55    $20    $10    $95    ≤    $125

Used, Year 1: $60 = =SUMPRODUCT(F11:I11,$F$6:$I$6)
Used, Year 2: $60 = =SUMPRODUCT(F12:I12,$F$6:$I$6)
Used, Year 3: $70 = =SUMPRODUCT(F13:I13,$F$6:$I$6)
Used, Year 4: $95 = =SUMPRODUCT(F14:I14,$F$6:$I$6)

S = 1, F = 0, E = 1, and R = 1. So, management should invest in the store expansion, equipment upgrade, and market research projects, which will generate a maximum net present value of $103,000. There will be slack of $25,000 in Year 1, $30,000 in Year 3, and $30,000 in Year 4 (Year 2 capital is fully used), for a four-year total slack of $85,000. If this slack could be accumulated and supplemented to fund Fleet Expansion, that project would add $50,000, raising the total net present value to $153,000.
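Because the model in Question 65 has only four binary variables, all 16 possible project selections can be checked directly. The following Python sketch is one way to do that without Solver; the names `npv`, `required`, and `available` are illustrative choices and do not come from the test bank.

```python
from itertools import product

# Net present value of each project in $1,000s, in the order S, F, E, R.
npv = [75, 50, 16, 12]
# Capital required in each year (rows) by each project (columns), in $1,000s.
required = [
    [30, 30, 10, 20],  # Year 1
    [30, 35, 20, 10],  # Year 2
    [40, 50, 20, 10],  # Year 3
    [65, 55, 20, 10],  # Year 4
]
available = [85, 60, 100, 125]


def used(choice):
    """Capital used in each year by a 0/1 selection of projects."""
    return [sum(r * x for r, x in zip(row, choice)) for row in required]


def value(choice):
    """Total net present value of a 0/1 selection of projects."""
    return sum(v * x for v, x in zip(npv, choice))


# Keep only the selections that respect every yearly capital limit.
feasible = [c for c in product((0, 1), repeat=4)
            if all(u <= a for u, a in zip(used(c), available))]
best = max(feasible, key=value)
best_npv = value(best)
slack = [a - u for a, u in zip(available, used(best))]
print(best, best_npv, slack)
```

Enumeration is practical here only because the problem is tiny; larger binary models need a solver such as the one in Excel.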


Chapter 14 - Nonlinear Optimization Models
Multiple Choice

1. In a nonlinear optimization problem, _____
a. the objective function is a nonlinear function of the constraints.
b. all the constraints are nonlinear only when the objective is to maximize the function of the decision variables.
c. at least one term in the objective function or a constraint is nonlinear.
d. both the objective function and the constraints must have all nonlinear terms.
ANSWER: c

2. A nonlinear function with at least one term raised to the power of two is known as a _____.
a. hyperbolic function
b. quadratic function
c. logarithmic function
d. cubic function
ANSWER: b

3. A _____ is the shadow price of a binding simple lower or upper bound on the decision variable.
a. reduced gradient
b. binding constraint
c. binary variable
d. local optimum
ANSWER: a

4. The reduced gradient is analogous to the _____ for linear models.
a. binary variable
b. binding constraint
c. reduced cost
d. objective coefficient
ANSWER: c

5. The Lagrangian multiplier is the _____ for a constraint in a nonlinear problem.
a. shadow price
b. payoff value
c. reducing gradient
d. reduced cost
ANSWER: a

6. In a nonlinear problem, the rate of change of the objective function with respect to the right-hand side of a constraint is given by the _____.
a. slope of the contour line
b. local optimum
c. reduced gradient
d. Lagrangian multiplier
ANSWER: d

7. The _____ of a solution is a mathematical concept that refers to the set of points within a relatively close proximity of the solution.
a. objective function contour
b. neighborhood
c. regression equation
d. Lagrangian multiplier
ANSWER: b

8. A feasible solution is a(n) _____ if there are no other feasible solutions with a better objective function value in the immediate neighborhood.
a. efficient frontier
b. local optimum
c. global maximum
d. diverging function
ANSWER: b

9. If there are no other feasible solutions with a larger objective function value in the immediate neighborhood, then the feasible solution is known as _____.
a. a global maximum
b. infeasible
c. a nonlinear solution
d. a local maximum
ANSWER: d

10. A feasible solution is a local minimum if there are no other feasible solutions with a _____.
a. smaller objective function value in the immediate neighborhood
b. same objective function value in the immediate neighborhood
c. set of points defining the minimum possible risk in the entire feasible region
d. same objective function value in the entire feasible region
ANSWER: a

11. A feasible solution is _____ if there are no other feasible points with a better objective function value in the entire feasible region.
a. infeasible
b. unbounded
c. nonlinear
d. a global optimum
ANSWER: d

12. If there are no other feasible points with a larger objective function value in the entire feasible region, a feasible solution is _____.
a. an efficient frontier
b. a global maximum
c. not a local maximum
d. a global minimum


ANSWER: b

13. A feasible solution is _____ if there are no other feasible points with a smaller objective function value in the entire feasible region.
a. a global minimum
b. not a local maximum
c. not a local minimum
d. bowl-shaped
ANSWER: a

14. A global minimum _____.
a. is also a local maximum
b. need not be a local maximum, but vice versa is true
c. is also a local minimum
d. need not be a local minimum, but vice versa is true
ANSWER: c

15. A function that is bowl-shaped down is called a _____ function.
a. concave
b. convex
c. conic
d. linear
ANSWER: a

16. In reviewing the image below, which of the following functions is most likely to yield the shape shown?
a. $f(X, Y) = X^2 + Y^2$
b. $f(X, Y) = -X - Y$
c. $f(X, Y) = -X^2 - Y^2$
d. $f(X, Y) = X\sin(5\pi X) + Y\sin(5\pi Y)$
ANSWER: c

17. In reviewing the image below, the point (0, 0, 0) is a(n) _____ for the given concave function.


a. local maximum
b. local minimum
c. convergence point
d. endpoint
ANSWER: a

18. If all the squared terms in a quadratic function have a negative coefficient and there are no cross-product terms, then the function is a _____ function.
a. convex quadratic
b. nonlinear objective
c. concave quadratic
d. negative elliptical
ANSWER: c

19. A function that is bowl-shaped up is called a(n) _____ function.
a. concave
b. optimal
c. convex
d. elliptical
ANSWER: c

20. Which of the following functions yields the shape shown below?
a. $f(X, Y) = X^2 + Y^2$
b. $f(X, Y) = X\sin(2\pi Y) + Y\sin(2\pi X)$
c. $f(X, Y) = -X^2 - Y^2$
d. $f(X, Y) = X\sin(5\pi X) + Y\sin(5\pi Y)$
ANSWER: a

21. In reviewing the image below, what is the minimum value for this function?


a. –8
b. 0
c. –1
d. 1
ANSWER: b

22. Using the graph given below, which of the following equations represents the function shown in the graph?
a. $f(X, Y) = X\log(2\pi Y) + Y\log(2\pi X)$
b. $f(X, Y) = X - Y$
c. $f(X, Y) = -X^2 - Y^2$
d. $f(X, Y) = X\sin(5\pi X) + Y\sin(5\pi Y)$
ANSWER: d

23. Using the graph below, the feasible region for the function represented in the graph is _____.


a. –1 ≤ X ≤ 1, –1 ≤ Y ≤ 1
b. –1.5 ≤ X ≤ 1, 0 ≤ Y ≤ 8
c. –1.5 ≤ X ≤ 2.0, –1.5 ≤ Y ≤ 2.0
d. 0 ≤ X ≤ 1, 0 ≤ Y ≤ 1
ANSWER: d

24. Using the graph below, which of the following is true of its function?
a. It has a single local minimum.
b. It has multiple local optima.
c. It has a single local maximum.
d. It has no maxima and minima.
ANSWER: b

25. The _____ option in Excel Solver is helpful when the solution to a problem appears to depend on the starting values for the decision variables.
a. restart
b. convergence
c. derivatives


d. multistart
ANSWER: d

26. Solving nonlinear problems with local optimal solutions is performed using _____ in Excel Solver, which is based on more classical optimization techniques.
a. Goal Seeker
b. Linear Regression
c. GRG Nonlinear
d. Simplex LP
ANSWER: c

27. Excel Solver’s _____ is based on a method that searches for an optimal solution by iteratively adjusting a population of candidate solutions.
a. Evolutionary Solver
b. Goal Seeker
c. Simplex LP
d. GRG Nonlinear
ANSWER: a

28. A portfolio optimization model used to construct a portfolio that minimizes risk subject to a constraint requiring a minimum level of return is known as _____.
a. a capital budgeting pricing model
b. a market share optimization model
c. the Hauck maximum variance portfolio model
d. the Markowitz mean-variance portfolio model
ANSWER: d

29. The measure of risk most often associated with the Markowitz portfolio model is the _____.
a. expected return of the portfolio
b. annual interest on the portfolio
c. variance of the portfolio’s return
d. number of investments listed in the portfolio
ANSWER: c

30. The portfolio variance is the _____.
a. sum of the squares of the deviations from the mean value under each scenario
b. average of the sum of the squares of the deviations from the mean value under each investment scenario
c. average of the product of the squares of the deviations from the mean value under each scenario
d. average of the sum of the deviations from the mean value under each investment scenario
ANSWER: b

31. If the portfolio variance were equal to zero, the amount of risk would be _____.
a. unity
b. a positive number greater than 1
c. negative always
d. zero


ANSWER: d

32. One of the ways to formulate the Markowitz model is to _____.
a. maximize the variance of the portfolio subject to a constraint on the expected return of the portfolio
b. minimize the expected return of the portfolio subject to a constraint on variance
c. minimize the variance of the portfolio subject to a constraint on the expected return of the portfolio
d. minimize the expected return of the portfolio with no constraint on variance
ANSWER: c

33. Which of the following is a second way of formulating the Markowitz model?
a. Maximize the expected return of the portfolio subject to a constraint on variance.
b. Minimize the expected return of the portfolio subject to a constraint on variance.
c. Maximize the variance of the portfolio subject to a constraint on the expected return of the portfolio.
d. Maximize the variance of the portfolio with no constraint needed for the expected return of the portfolio.
ANSWER: a

34. A(n) _____ is a set of points defining the minimum possible risk for a set of return values.
a. contour
b. efficient frontier
c. unity constraint
d. reduced gradient
ANSWER: b

35. The _____ forecasting model uses nonlinear optimization to forecast the adoption of innovative and new technologies in the marketplace.
a. Hauck
b. LMS
c. Markowitz
d. Bass
ANSWER: d

36. In the Bass forecasting model, parameter m _____.
a. measures the likelihood of adoption due to a potential adopter being influenced by someone who has already adopted the product
b. measures the likelihood of adoption, assuming no influence from someone who has already adopted the product
c. refers to the number of people estimated to eventually adopt the new product
d. refers to the number of people who have already adopted the new product
ANSWER: c

37. In the Bass forecasting model, the _____ measures the likelihood of adoption due to a potential adopter being influenced by someone who has already adopted the product.
a. coefficient of innovation
b. coefficient of imitation


c. coefficient of regression
d. coefficient of the objective function
ANSWER: b

38. In the Bass forecasting model, the _____ measures the likelihood of adoption, assuming no influence from someone who has already purchased (adopted) the product.
a. coefficient of correlation
b. coefficient of imitation
c. coefficient of independence
d. coefficient of innovation
ANSWER: d

39. Which of the following conclusions can be drawn from the figure below using the Bass forecasting model? (Note: The Bass forecasting model is given by $F_t = (p + q[C_{t-1}/m])(m - C_{t-1})$, where m = the number of people estimated to eventually adopt the new product, $C_{t-1}$ = the number of people who have adopted the product through time t – 1, q = the coefficient of imitation, and p = the coefficient of innovation.)
a. q < p
b. q > p
c. m < q
d. p > m
ANSWER: a

40. One of the ways to use the Bass forecasting model is to wait until several periods of data for the problem under consideration are available. This is known as the _____ approach.
a. branch-and-bound
b. cutting plane
c. rolling-horizon
d. sensible-period
ANSWER: c


Subjective Short Answer

41. If a maximization problem has a single global optimum, will it have a local maximum? If yes, can it have more than one local maximum? Explain.
ANSWER: If a maximization problem has a single global optimum, then that optimum must be a global maximum. All global maximum solutions are also local maximum solutions, so the maximization problem definitely will have at least one local maximum. It may have more than one local maximum.

42. If a minimization problem has a single global optimum, will it have a local minimum? If yes, can it have more than one local minimum? Explain.
ANSWER: If a minimization problem has a single global optimum, then that optimum must be a global minimum. All global minimum solutions are also local minimum solutions, so the minimization problem definitely will have at least one local minimum. It may have more than one local minimum.

43. If an optimization objective function produces a graph that is concave, will the global optimum be a maximum or minimum value? Explain.
ANSWER: If an optimization objective function produces a graph that is concave, the global optimum will be a maximum. A function that is concave (bowl-shaped down) has a single local maximum value, which is also a global maximum.

44. If an optimization objective function produces a graph that is convex, how many local minimum solutions are possible? Explain.
ANSWER: If an optimization objective function produces a graph that is convex, only one global minimum solution is possible. For a convex function, we know that if our computer software finds a local minimum, it has found a global minimum.

45. Is a location optimization problem an example of a maximization or a minimization problem? Explain.
ANSWER: In a location problem, the goal is to minimize the sum of the distances from the desired central location to n other locations. The sum of the Euclidean (straight-line) distances is to be minimized.

46. Jeff is willing to invest $5,000 in buying shares and bonds of a company to gain maximum returns. From his past experience, he estimates the relationship between returns and investments made in this company to be:
$R = -2S^2 - 9B^2 - 4SB + 20S + 30B$
where
R = total returns in thousands of dollars
S = thousands of dollars spent on shares
B = thousands of dollars spent on bonds
Jeff would like to develop a strategy that will lead to maximum return subject to the restriction provided on the amount available for investment.
a. What is the value of the return if $3,000 is invested in shares and $2,000 is invested in bonds of the company?
b. Formulate an optimization problem that can be solved to maximize the returns subject to investing no more than $5,000 on both shares and bonds.
c. Determine the optimal amount to invest in shares and bonds of the company. How much return will Jeff gain? Round all your answers to two decimal places.
ANSWER:
a. With $3,000 being invested in shares and $2,000 being invested in bonds, we can simply substitute these values into the returns function (remembering that the variables are defined as thousands of dollars).
$R = -2S^2 - 9B^2 - 4SB + 20S + 30B$
$R = -2(3^2) - 9(2^2) - 4(3)(2) + 20(3) + 30(2)$
$R = 42$
A return of $42,000 will be realized with this allocation of the investment.
b. and c. We simply add an investment constraint and non-negativity constraints to the return function that is to be maximized.
Max $-2S^2 - 9B^2 - 4SB + 20S + 30B$
s.t.
S + B ≤ 5
S, B ≥ 0
The solution is S = $4,290 and B = $710 with a return of $53,570. The spreadsheet model is: [figure not reproduced]
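The figures in Question 46 can be checked without Solver. The sketch below is a hand-derived check rather than the answer key's spreadsheet method, and the function name `total_return` is an illustrative choice. It relies on the observation that the stationary point of R, found by setting both partial derivatives to zero, happens to satisfy S + B = 5 exactly, so it is also the constrained optimum.

```python
def total_return(s, b):
    """Return R in $1,000s for s and b (in $1,000s) invested in shares and bonds."""
    return -2 * s**2 - 9 * b**2 - 4 * s * b + 20 * s + 30 * b


# Part (a): $3,000 in shares and $2,000 in bonds.
r_a = total_return(3, 2)

# Part (c): first-order conditions of the concave quadratic.
#   dR/dS = -4S -  4B + 20 = 0   ->  S + B = 5
#   dR/dB = -4S - 18B + 30 = 0
# Subtracting the second from the first gives 14B - 10 = 0.
b_opt = 10 / 14
s_opt = 5 - b_opt
r_opt = total_return(s_opt, b_opt)

print(r_a, round(s_opt, 2), round(b_opt, 2), round(r_opt, 2))
```

Because the budget constraint is met with equality at the unconstrained stationary point, no multiplier calculation is needed in this particular problem.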


47. Consider the objective function $Y = A L^{\alpha} K^{\beta}$, where
Y = total output
A = total-factor productivity
L = labor input
K = capital input
α = share of contribution of the labor input L
β = share of contribution of the capital input K
a. Assume α = 0.33, β = 0.67, A = 10, each unit of labor costs $45, and each unit of capital costs $55. With $50,000 available in the budget, develop an optimization model to determine the number of units of capital and labor required in order to maximize output.
b. Find the optimal solution to the model you formulated in part (a). Round all your answers to two decimal places. (Hint: When using Excel Solver, use the Multistart option with bounds 0 ≤ L ≤ 700 and 0 ≤ K ≤ 1000.)
ANSWER:
a. The optimization model is
Max $10 L^{0.33} K^{0.67}$
s.t.
45L + 55K ≤ 50,000
L, K ≥ 0
b. The optimal solution is L = 366.67, K = 609.09 with a total output of 5,151.65. The spreadsheet model is: [figure not reproduced]
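The reported solution to Question 47 is consistent with a Cobb-Douglas output function of the form Y = A·L^α·K^β. That functional form is an assumption here, since the equation itself is not legible in this copy. Under that assumption, and with the whole budget spent, each input receives a share of the budget proportional to its exponent, which the following Python sketch uses in place of Solver's Multistart search.

```python
# Assumed Cobb-Douglas form: Y = A * L**alpha * K**beta (not legible in the source).
alpha, beta, A = 0.33, 0.67, 10
labor_cost, capital_cost, budget = 45, 55, 50_000

# With the budget constraint binding, spending on each input is proportional
# to its exponent: labor gets alpha/(alpha+beta) of the budget, capital the rest.
L = alpha / (alpha + beta) * budget / labor_cost
K = beta / (alpha + beta) * budget / capital_cost
output = A * L**alpha * K**beta

print(round(L, 2), round(K, 2), round(output, 2))
```

The resulting L and K fall inside the bounds given in the hint (0 ≤ L ≤ 700, 0 ≤ K ≤ 1000), so the bounds do not affect the answer.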


48. The profit function for two types of iPod is:
Profit = [equation not reproduced]
where x1 and x2 represent the number of units of production of basic and advanced iPods, respectively. Production time required for the basic iPod is 6 hours per unit, and production time required for the advanced iPod is 8 hours per unit. Currently, 50 hours are available. The cost of hours is already factored into the profit function.
a. Formulate an optimization problem that can be used to find the optimal production quantity of basic and advanced iPods.
b. Solve the optimization model you formulated in part (a). How much should be produced?
ANSWER:
a. The optimization model is
Max [objective not reproduced]
s.t. [constraints not reproduced]
b. The optimal solution is x1 = 3.32 and x2 = 3.76 with a profit of $685.80. The spreadsheet model is: [figure not reproduced]


49. Roger is willing to promote and sell two types of smart watches, X and Y, at his outlet. The demands for these two watches are as follows.
DX = –0.45PX + 0.34PY + 242
DY = 0.2PX – 0.58PY + 282
where DX is the demand for watch X, PX is the selling price of watch X, DY is the demand for watch Y, and PY is the selling price of watch Y. Roger wishes to determine the selling prices that maximize revenue for these two products. Develop the revenue function for these two models, and find the revenue-maximizing prices.
ANSWER: The revenue function is:
PXDX + PYDY = PX(–0.45PX + 0.34PY + 242) + PY(0.2PX – 0.58PY + 282).
This is an example of an unconstrained optimization problem because no constraints are required here. The optimal solution is PX = $575.49 and PY = $511 with an optimal revenue of $141,686.18. The spreadsheet model is: [figure not reproduced]
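Because the revenue function in Question 49 is a concave quadratic in the two prices, its maximum is where both partial derivatives vanish, which gives a pair of linear equations. The Python sketch below solves them with Cramer's rule as a cross-check on the Solver result; the coefficient names `a11`, `a12`, and so on are illustrative.

```python
def revenue(px, py):
    """Total revenue PX*DX + PY*DY for the two demand functions."""
    dx = -0.45 * px + 0.34 * py + 242
    dy = 0.2 * px - 0.58 * py + 282
    return px * dx + py * dy


# First-order conditions (the cross terms 0.34 and 0.2 combine to 0.54):
#   dRev/dPX = -0.90*PX + 0.54*PY + 242 = 0
#   dRev/dPY =  0.54*PX - 1.16*PY + 282 = 0
a11, a12, b1 = -0.90, 0.54, -242.0
a21, a22, b2 = 0.54, -1.16, -282.0

# Solve the 2x2 linear system by Cramer's rule.
det = a11 * a22 - a12 * a21
px = (b1 * a22 - a12 * b2) / det
py = (a11 * b2 - b1 * a21) / det
best_revenue = revenue(px, py)

print(round(px, 2), round(py, 2), round(best_revenue, 2))
```

The determinant is positive and both diagonal coefficients are negative, so the stationary point is a maximum rather than a saddle point.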


50. A steel manufacturing company has two production facilities that manufacture dishwashers. Production costs at the two facilities differ because of varying labor costs, local property taxes, type of material used, volume, and so on. For Plant A, the weekly cost of producing a number of units of dishwashers is expressed as the function
$TC_A(X) = X^2 - 2X + 12000$
where X is the weekly production volume and $TC_A(X)$ is the weekly cost for Plant A. Plant B’s weekly production costs are given by
$TC_B(Y) = Y^2 + 8Y + 10000$
where Y is the weekly production volume and $TC_B(Y)$ is the weekly cost for Plant B. The manufacturer would like to produce 50 dishwashers per week at the lowest possible cost.
a. Formulate a mathematical model that can be used to determine the optimal number of dishwashers to produce each week at each facility.
b. Solve the optimization model to determine the optimal number of dishwashers to produce at each facility.
ANSWER:
a. If X is the weekly production volume at Plant A and Y is the weekly production volume at Plant B, then the optimization model is
Min $X^2 - 2X + 12000 + Y^2 + 8Y + 10000$
s.t.
X + Y = 50
X, Y ≥ 0
b. The optimal solution is X = 27.5 and Y = 22.5 for an optimal cost of $23,388. These are all in thousands. The spreadsheet model is: [figure not reproduced]
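The equality constraint in Question 50 allows Y to be eliminated by substitution, which leaves a one-variable quadratic. The following Python sketch works through that substitution as a check on the Solver answer; the function names are illustrative.

```python
def cost_a(x):
    """Weekly cost at Plant A for x dishwashers."""
    return x**2 - 2 * x + 12000


def cost_b(y):
    """Weekly cost at Plant B for y dishwashers."""
    return y**2 + 8 * y + 10000


TOTAL = 50  # dishwashers required per week

# Substitute Y = 50 - X into the total cost and differentiate with respect to X:
#   d/dX [cost_a(X) + cost_b(50 - X)] = (2X - 2) - (2(50 - X) + 8) = 4X - 110
x_opt = 110 / 4
y_opt = TOTAL - x_opt
min_cost = cost_a(x_opt) + cost_b(y_opt)

print(x_opt, y_opt, min_cost)
```

At the optimum the marginal costs of the two plants are equal (2X – 2 = 2Y + 8 = 53), which is the usual condition for splitting production between facilities.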


51. An electrical company has two manufacturing plants. The cost in dollars of producing amplifiers at each of the two plants is given below.
The cost of producing Q1 amplifiers at the first plant is $65Q_1 + 4Q_1^2 + 90$,
and the cost of producing Q2 amplifiers at the second plant is $20Q_2 + 2Q_2^2 + 120$.
The company needs to manufacture at least 60 amplifiers to meet the received orders. How many amplifiers should be produced at each of the plants to minimize the total production cost? Round the answers to two decimal places and the total cost to the nearest dollar value.
ANSWER: If Q1 and Q2 are the number of amplifiers manufactured in the first and second plants, respectively, then the optimization model is
Min $(65Q_1 + 4Q_1^2 + 90) + (20Q_2 + 2Q_2^2 + 120)$
s.t.
Q1 + Q2 ≥ 60
Q1, Q2 ≥ 0
The optimal solution to this model is to produce 16.25 amplifiers at plant 1 for a production cost of $2,202.50 and 43.75 amplifiers at plant 2 for a production cost of $4,823.13. The total cost is $7,026. The spreadsheet model follows: [figure not reproduced]
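In Question 51 both cost functions increase with output, so the requirement of at least 60 amplifiers is met with exactly 60, and the split follows from equating marginal costs. The Python sketch below is a hand-derived check, not the answer key's spreadsheet; the helper `split_production` and its parameter names are hypothetical.

```python
def split_production(a1, b1, a2, b2, demand):
    """Split `demand` between two plants with costs a*q**2 + b*q + c.

    Equating marginal costs 2*a1*q1 + b1 = 2*a2*q2 + b2 with q1 + q2 = demand
    gives q1 in closed form. Assumes both quantities come out non-negative.
    """
    q1 = (2 * a2 * demand + b2 - b1) / (2 * (a1 + a2))
    return q1, demand - q1


def cost_1(q):
    """Cost at the first plant."""
    return 65 * q + 4 * q**2 + 90


def cost_2(q):
    """Cost at the second plant."""
    return 20 * q + 2 * q**2 + 120


q1, q2 = split_production(a1=4, b1=65, a2=2, b2=20, demand=60)
total_cost = cost_1(q1) + cost_2(q2)

print(q1, q2, cost_1(q1), cost_2(q2), round(total_cost))
```

The fixed costs (90 and 120) do not affect the split; they only shift the total.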


52. The exponential smoothing model is given by
$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t$
where
$\hat{y}_{t+1}$ = forecast for period t + 1
$y_t$ = observed value in period t
$\hat{y}_t$ = forecast for period t
α = smoothing constant, 0 ≤ α ≤ 1
This model is used to predict the future based on the past data values.
a. The observed values with the smoothing constant α = 0.45 are given in the table below. The third column of the table displays the forecast values obtained using the above model. The forecast error is calculated in the fourth column, and the square of the forecast error and the sum of squared forecast errors are given in the fifth column. Construct this table in your spreadsheet model using the formula above. (Hint: The first forecast value is the same as the observed value.)

Alpha = 0.45

Day   Observed Value   Forecast   Forecast Error   Squared Forecast Error
1     15               15.00       0.0000           0.0000
2     12               15.00      –3.0000           9.0000
3     14               13.65       0.3500           0.1225
4     18               13.81       4.1925          17.5771
5     15               15.69      –0.6941           0.4818
6     16               15.38       0.6182           0.3822
7     21               15.66       5.3400          28.5159
8     18               18.06      –0.0630           0.0040
9     12               18.03      –6.0346          36.4169
10    19               15.32       3.6809          13.5494
11    22               16.98       5.0245          25.2458
12    14               19.24      –5.2365          27.4211
13    21               16.88       4.1199          16.9737
14    20               18.73       1.2660           1.6026
15    19               19.30      –0.3037           0.0922
Sum of the squared forecast error                 177.39

b. The value of α is often chosen by minimizing the sum of squared forecast errors. Use Excel Solver to find the value of α that minimizes the sum of squared forecast errors.
ANSWER:
a. [spreadsheet model not reproduced]
b. Min $\sum_{t=2}^{15} (y_t - \hat{y}_t)^2$
s.t.
$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\hat{y}_t$, t = 1, 2, …, 14
$\hat{y}_1 = y_1$
y1 = 15, y2 = 12, y3 = 14, y4 = 18, y5 = 15, y6 = 16, y7 = 21, y8 = 18, y9 = 12, y10 = 19, y11 = 22, y12 = 14, y13 = 21, y14 = 20, y15 = 19
0 ≤ α ≤ 1
The optimal solution is α = 0.2221. The spreadsheet model follows: [figure not reproduced]
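The table in Question 52 can be rebuilt in a few lines of code. The Python sketch below assumes the standard recursion in which each forecast is α times the previous observation plus (1 – α) times the previous forecast, which reproduces the tabulated forecasts; the function names and the grid search are illustrative stand-ins for the spreadsheet and Solver.

```python
observed = [15, 12, 14, 18, 15, 16, 21, 18, 12, 19, 22, 14, 21, 20, 19]


def forecasts(alpha, y):
    """Exponential smoothing forecasts; the first forecast equals y[0]."""
    f = [float(y[0])]
    for t in range(1, len(y)):
        f.append(alpha * y[t - 1] + (1 - alpha) * f[t - 1])
    return f


def sse(alpha, y):
    """Sum of squared forecast errors for a given smoothing constant."""
    return sum((obs - fc) ** 2 for obs, fc in zip(y, forecasts(alpha, y)))


# Part (a): reproduce the table total for alpha = 0.45.
sse_045 = sse(0.45, observed)

# Part (b): a coarse grid search over 0 <= alpha <= 1 in steps of 0.0001,
# used here in place of Solver's nonlinear search.
best_alpha = min((i / 10000 for i in range(10001)),
                 key=lambda a: sse(a, observed))

print(round(sse_045, 2), best_alpha, round(sse(best_alpha, observed), 2))
```

A grid search is slow compared with a gradient-based method but makes no assumption about the shape of the error curve, which is convenient for a one-parameter check.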


53. Consider the economic order quantity (EOQ) model for multiple products that are independent except for a budget restriction. The following model describes this situation. Let
Dk = annual demand for product k
Ck = unit cost of product k
Sk = cost per order placed for product k
i = inventory carrying charge as a percentage of the cost per unit
B = the maximum amount of investment in goods
N = number of products
The decision variables are Qk, the amount of product k to order. The model is:
[objective not reproduced]
s.t. [constraints not reproduced]
a. Set up a spreadsheet model for the following data:

                 Product 1   Product 2   Product 3
Annual Demand    1,250       1,550       1,450
Product Cost     $120        $90         $105
Order Cost       $110        $175        $140
B                $30,000
i                0.3

b. Solve the problem using Excel Solver. (Hint: For Solver to find a solution, you need to start with decision variable values that are greater than 0.)
ANSWER:
a. [spreadsheet model not reproduced]
b. The optimal solution is Q1 = 31.565, Q2 = 51.194, and Q3 = 41.002 with a total minimum cost of $29,211.


54. Gatson manufacturing company is willing to promote two types of tires: the Economy tire and the Premium tire. These two tires are independent of each other in terms of demand, cost, price, etc. An analytics team of this company has estimated the profit functions for both tires as
Monthly profit for Economy tire = 49.2415 LN(XA) + 180.414
Monthly profit for Premium tire = 84.344 LN(XB) – 150.112
where XA and XB are the advertising amounts allocated to the Economy tire and Premium tire, respectively, and LN is the natural logarithm function. The advertising budget is $200,000, and management has dictated that at least $20,000 must be allocated to each of the two tires. (Hint: To compute a natural logarithm for the value X in Excel, use the formula =LN(X). For Solver to find an answer, you also need to start with decision variable values greater than 0 in this problem.) Develop and solve an optimization model that will prescribe how the company should allocate its marketing budget to maximize profit.
ANSWER:
Max 49.2415 LN(XA) + 180.414 + 84.344 LN(XB) – 150.112
s.t.
XA + XB ≤ 200,000
XA ≥ 20,000
XB ≥ 20,000
The optimal solution is XA = $73,722.82 and XB = $126,277.18 with a profit of $1,572.93. The spreadsheet model follows: [figure not reproduced]
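For the logarithmic profit functions in Question 54, the optimal split has a simple closed form: when the full budget is spent, marginal profits are equal only if each tire's share of the budget is proportional to its logarithm coefficient. The Python sketch below applies that rule and then checks the minimum-allocation requirement; it is a hand-derived check, and the variable names are illustrative.

```python
import math

coef_a, const_a = 49.2415, 180.414    # Economy tire profit function
coef_b, const_b = 84.344, -150.112    # Premium tire profit function
budget, minimum = 200_000, 20_000

# Equal marginal profit: coef_a / XA = coef_b / XB with XA + XB = budget.
x_a = budget * coef_a / (coef_a + coef_b)
x_b = budget - x_a

# The closed form is valid only if both minimum allocations are respected.
minimums_ok = x_a >= minimum and x_b >= minimum

profit = coef_a * math.log(x_a) + const_a + coef_b * math.log(x_b) + const_b

print(round(x_a, 2), round(x_b, 2), round(profit, 2), minimums_ok)
```

Both profit functions increase with advertising, so spending less than the full budget can never be optimal, which is why the budget constraint is treated as an equality.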


55. Jim must solve a nonlinear optimization problem where point A should be within a radius of 15 centimeters from each of the points B, C, D, E, and F. The decision variables are defined as below.
X = horizontal coordinate of point A
Y = vertical coordinate of point A
The data on the distances are given below:

Point   Horizontal Coordinate   Vertical Coordinate
B       9                       11
C       14                      18
D       18                      22
E       13                      16
F       17                      21

Formulate a model to find the optimal location of point A.
ANSWER: Let
X = horizontal coordinate of point A
Y = vertical coordinate of point A
and let $(x_j, y_j)$ be the coordinates of point j, for j = B, C, D, E, F.
Min $\sum_{j} \sqrt{(X - x_j)^2 + (Y - y_j)^2}$
s.t.
$\sqrt{(X - x_j)^2 + (Y - y_j)^2} \le 15$ for j = B, C, D, E, F
The optimal solution is Horizontal Coordinate (X) = 14, Vertical Coordinate (Y) = 18, with an objective function value of 20.738. The spreadsheet model follows: [figure not reproduced]
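The objective value reported for Question 55 matches the sum of straight-line distances from (14, 18) to the five points. The Python sketch below evaluates that sum and the 15-centimeter radius condition; treating the objective as a plain sum of Euclidean distances is an assumption inferred from the reported value, and the names used are illustrative.

```python
import math

points = {"B": (9, 11), "C": (14, 18), "D": (18, 22), "E": (13, 16), "F": (17, 21)}


def distances(x, y):
    """Euclidean distance from (x, y) to each fixed point."""
    return {name: math.hypot(x - px, y - py) for name, (px, py) in points.items()}


d = distances(14, 18)
total = sum(d.values())
within_radius = all(dist <= 15 for dist in d.values())

print(round(total, 3), within_radius)
```

The reported optimum coincides with point C, so one of the five distances is zero.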


56. Jim must solve a nonlinear optimization problem where point A should be within a radius of 15 centimeters from each of the points B, C, D, E, and F. The decision variables are defined as below.
X = horizontal coordinate of point A
Y = vertical coordinate of point A
The data on the distances are given below:

Point   Horizontal Coordinate   Vertical Coordinate
B       9                       11
C       14                      18
D       18                      22
E       13                      16
F       17                      21

Formulate and solve a model that minimizes the maximum distance from point A to each of the points B, C, D, E, and F. Round all your answers to three decimal places.
ANSWER: Let
X = horizontal coordinate of point A
Y = vertical coordinate of point A
d = the maximum distance from point A to points B, C, D, E, and F
and let $(x_j, y_j)$ be the coordinates of point j, for j = B, C, D, E, F.
Min d
s.t.
$\sqrt{(X - x_j)^2 + (Y - y_j)^2} \le d$ for j = B, C, D, E, F
The optimal solution is Horizontal Coordinate (X) = 13.498 and Vertical Coordinate (Y) = 16.502 with a minimum of the maximum distance of 7.106. The spreadsheet model follows: [figure not reproduced]
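The minimax location in Question 56 can be cross-checked geometrically. Points B and D are the farthest apart, and every other point lies inside the circle that has segment BD as its diameter, so the midpoint of B and D, (13.5, 16.5), is a natural comparison point for the Solver output. The Python sketch below evaluates the maximum distance at both locations; the function name is illustrative.

```python
import math

points = [(9, 11), (14, 18), (18, 22), (13, 16), (17, 21)]  # B, C, D, E, F


def max_distance(x, y):
    """Largest Euclidean distance from (x, y) to any of the fixed points."""
    return max(math.hypot(x - px, y - py) for px, py in points)


reported = max_distance(13.498, 16.502)   # location reported in the answer
midpoint = max_distance(13.5, 16.5)       # midpoint of B (9, 11) and D (18, 22)

print(round(reported, 3), round(midpoint, 3))
```

The two locations differ only in the third decimal place, which is the kind of gap a numerical solver's convergence tolerance typically leaves.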


57. The manager of a supermarket estimates the average number of trips made to the warehouse from each of the five outlets. Some outlets have a higher number of average daily trips, and his goal is to relocate the warehouse closer to these outlets. The available data on the distance between the warehouse and the outlets are provided in the table below.

Outlet     Horizontal Coordinate (X)   Vertical Coordinate (Y)   Demand
Outlet 1   2.5                         3.2                       10
Outlet 2   3                           3.8                       16
Outlet 3   2                           2.8                       13
Outlet 4   3.5                         4.2                       17
Outlet 5   2.8                         3.6                       19

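a. Develop a new unconstrained model that minimizes the sum of the demand-weighted distance defined as the product of the demand (measured in number of trips) and the distance to the warehouse.
b. Solve the model you developed in part a.
ANSWER:
a. The demand-weighted objective is:
Min 10[(X – 2.5)^2 + (Y – 3.2)^2]^(1/2) + 16[(X – 3)^2 + (Y – 3.8)^2]^(1/2) + 13[(X – 2)^2 + (Y – 2.8)^2]^(1/2) + 17[(X – 3.5)^2 + (Y – 4.2)^2]^(1/2) + 19[(X – 2.8)^2 + (Y – 3.6)^2]^(1/2)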

b. The optimal solution is Horizontal Coordinate (X) = 2.8, Vertical Coordinate (Y) = 3.6, with an objective function value of 39.907. The spreadsheet model follows:
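The demand-weighted objective is easy to evaluate outside a spreadsheet. The following is a minimal sketch, assuming the outlet table is read row by row as (X, Y, demand); the function names are hypothetical, and a plain grid search stands in for Solver's nonlinear method, so it only confirms the answer to the grid's resolution.

```python
import math

# (x, y, demand) for Outlets 1-5, read row by row from the problem's table.
OUTLETS = [(2.5, 3.2, 10), (3.0, 3.8, 16), (2.0, 2.8, 13), (3.5, 4.2, 17), (2.8, 3.6, 19)]

def weighted_distance(x, y, outlets=OUTLETS):
    """Sum over outlets of demand times Euclidean distance to (x, y)."""
    return sum(w * math.hypot(x - ox, y - oy) for ox, oy, w in outlets)

def best_on_grid(outlets=OUTLETS):
    """Search a 0.01-spaced grid covering the outlets' bounding box."""
    best_point, best_value = None, float("inf")
    for i in range(200, 351):          # x from 2.00 to 3.50
        for j in range(280, 421):      # y from 2.80 to 4.20
            x, y = i / 100, j / 100
            value = weighted_distance(x, y, outlets)
            if value < best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value

point, value = best_on_grid()
print(point, round(value, 3))
```

The grid minimum falls on Outlet 5 itself, which has the largest demand; the warehouse ends up co-located with that outlet.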


58. Mark and his friends are planning for a holiday party. Data on longitude, latitude, and number of friends at each of the 10 locations are given below. Mark would like to identify the location for the holiday party such that it minimizes the demand-weighted distance, where demand is the number of friends at each location. Find the optimal location for the party. The distance between two cities can be approximated by the following formula.

where lat1 and long1 are the latitude and longitude of city 1, and lat2 and long2 are the latitude and longitude of city 2. (Hint: Notice that all longitude values given for this problem are negative. Make sure that you do not check the option for Make Unconstrained Variables Nonnegative in Solver.)

Location   Latitude   Longitude   Friends
Ohio       26.782     –77.639     12
TN         38.952     –76.164     2
Mass       36.961     –85.921     1
Iowa       33.216     –81.753     8
New York   36.499     –84.575     10
Virginia   44.934     –72.798     7
NJ         40.850     –76.657     5
Wyoming    42.901     –75.514     6
Maryland   41.019     –120.491    1
CA         43.623     –119.626    9

ANSWER: Let
X = the latitude of the optimal location to visit for a holiday trip
Y = the longitude of the optimal location to visit for a holiday trip
Ri = number of friends in ith location (demand)

The optimal solution is X = 35.772, Y = –81.999, with an objective function value of 37,747.156. The spreadsheet model follows:


59. Consider the return scenario for three types of mutual funds shown in the following table:

Scenario   Mutual Fund X   Mutual Fund Y   Mutual Fund Z
1          –33.8           –45.5           –11.8
2          12.1            15.8            128.8
3          129.9           58.9            164.4
4          137.7           34.6            17.8
5          –55.5           –39.8           –43.4
6          16.3            –64.5           –32.3

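a. Develop the Markowitz portfolio model for these data with a required expected return of at least 20 percent. Assume that the six scenarios are equally likely to occur.
b. Solve the model developed in part (a).
ANSWER:
a. Let
X = the fraction of the portfolio to invest in Mutual Fund X
Y = the fraction of the portfolio to invest in Mutual Fund Y
Z = the fraction of the portfolio to invest in Mutual Fund Z
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in scenario s
Min (1/6) Σ (Rs – R̄)^2, summed over s = 1, 2, ..., 6
s.t.
–33.8X – 45.5Y – 11.8Z = R1
12.1X + 15.8Y + 128.8Z = R2
129.9X + 58.9Y + 164.4Z = R3
137.7X + 34.6Y + 17.8Z = R4
–55.5X – 39.8Y – 43.4Z = R5
16.3X – 64.5Y – 32.3Z = R6
X + Y + Z = 1
(1/6)(R1 + R2 + R3 + R4 + R5 + R6) = R̄
R̄ ≥ 20
X, Y, Z ≥ 0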


b.


60. Consider the stock return data given below.

Month   Stock A   Stock B   Stock C   Stock D
1       12.07     15.95     30.52     32.42
2       10.12     4.16      16.51     21.36
3       14.54     6.31      34.25     13.84
4       46.58     –2.74     45.62     8.12
5       –19.34    6.54      –27.21    –6.84

a. Construct the Markowitz model that maximizes expected return subject to a maximum variance of 35.
b. Solve the model developed in part a. Round all your answers to three decimal places.
ANSWER:
a. Let
A = proportion of portfolio invested in Stock A
B = proportion of portfolio invested in Stock B
C = proportion of portfolio invested in Stock C
D = proportion of portfolio invested in Stock D
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in month s
Max R̄
s.t.
12.07A + 15.95B + 30.52C + 32.42D = R1
10.12A + 4.16B + 16.51C + 21.36D = R2
14.54A + 6.31B + 34.25C + 13.84D = R3
46.58A – 2.74B + 45.62C + 8.12D = R4
–19.34A + 6.54B – 27.21C – 6.84D = R5
A + B + C + D = 1
(1/5)(R1 + R2 + R3 + R4 + R5) = R̄
(1/5) Σ (Rs – R̄)^2 ≤ 35, summed over s = 1, 2, ..., 5
A, B, C, D ≥ 0
b.


61. Consider the following data on the returns from bonds.

Year   Bond 1   Bond 2   Bond 3
1      0.20     0.128    0.167
2      0.126    0.21     0.27
3      0.321    0.325    0.426
4      –0.39    –0.243   –0.84
5      –0.67    0.169    0.143
6      0.135    0.125    –0.46
7      0.52     0.304    0.147
8      0.75     0.286    0.704

a. Construct the Markowitz portfolio model using a required expected return of at least 15 percent. Assume that the eight scenarios are equally likely to occur.
b. Solve the model using Excel Solver.
ANSWER:
a. Let
X = the fraction of the portfolio to invest in Bond 1
Y = the fraction of the portfolio to invest in Bond 2
Z = the fraction of the portfolio to invest in Bond 3
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in year s
Min (1/8) Σ (Rs – R̄)^2, summed over s = 1, 2, ..., 8
s.t.
0.20X + 0.128Y + 0.167Z = R1
0.126X + 0.21Y + 0.27Z = R2
0.321X + 0.325Y + 0.426Z = R3
–0.39X – 0.243Y – 0.84Z = R4
–0.67X + 0.169Y + 0.143Z = R5
0.135X + 0.125Y – 0.46Z = R6
0.52X + 0.304Y + 0.147Z = R7
0.75X + 0.286Y + 0.704Z = R8
X + Y + Z = 1
(1/8)(R1 + R2 + R3 + R4 + R5 + R6 + R7 + R8) = R̄
R̄ ≥ 0.15
X, Y, Z ≥ 0
b.


62. Consider the following data on the returns from bonds.

Year   Bond 1   Bond 2   Bond 3
1      0.200    0.128    0.067
2      0.026    0.100    0.700
3      0.121    0.125    0.226
4      –0.139   –0.243   –0.184
5      –0.167   0.269    0.234
6      0.135    0.225    –0.146
7      0.152    0.204    0.047
8      0.175    0.186    0.604

Develop and solve the Markowitz portfolio model using a required expected return of at least 15 percent. Assume that the eight scenarios are equally likely to occur. Use this model to construct an efficient frontier by varying the expected return from 2 to 18 percent in increments of 2 percent and solving for the variance. Round all your answers to three decimal places.
ANSWER: Let
X = the fraction of the portfolio to invest in Bond 1
Y = the fraction of the portfolio to invest in Bond 2
Z = the fraction of the portfolio to invest in Bond 3
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in year s
Min (1/8) Σ (Rs – R̄)^2, summed over s = 1, 2, ..., 8
s.t.
0.20X + 0.128Y + 0.067Z = R1
0.026X + 0.1Y + 0.7Z = R2
0.121X + 0.125Y + 0.226Z = R3
–0.139X – 0.243Y – 0.184Z = R4
–0.167X + 0.269Y + 0.234Z = R5
0.135X + 0.225Y – 0.146Z = R6
0.152X + 0.204Y + 0.047Z = R7
0.175X + 0.186Y + 0.604Z = R8
X + Y + Z = 1
(1/8)(R1 + R2 + R3 + R4 + R5 + R6 + R7 + R8) = R̄
R̄ ≥ 0.15
X, Y, Z ≥ 0


The efficient frontier is shown below. We see that as the expected return increases, the minimum variance (possible risk) also increases, and this increase is evident for expected returns of more than 8 percent.
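The frontier described above can be approximated without Solver. The sketch below is an illustration under stated assumptions, not the spreadsheet model from the answer key: it uses the problem 62 bond returns read year by year, hypothetical function names, and a brute-force search over portfolio weights in steps of 0.01 instead of a nonlinear solver, so the variances are only accurate to the grid's resolution.

```python
# Yearly returns (Bond 1, Bond 2, Bond 3) for years 1-8, from the table in problem 62.
RETURNS = [
    (0.200, 0.128, 0.067),
    (0.026, 0.100, 0.700),
    (0.121, 0.125, 0.226),
    (-0.139, -0.243, -0.184),
    (-0.167, 0.269, 0.234),
    (0.135, 0.225, -0.146),
    (0.152, 0.204, 0.047),
    (0.175, 0.186, 0.604),
]

def mean_and_variance(weights, returns=RETURNS):
    """Expected return and variance of a portfolio over equally likely scenarios."""
    scenario = [sum(w * r for w, r in zip(weights, row)) for row in returns]
    mean = sum(scenario) / len(scenario)
    variance = sum((r - mean) ** 2 for r in scenario) / len(scenario)
    return mean, variance

def min_variance_portfolio(target, steps=100, returns=RETURNS):
    """Lowest-variance grid portfolio whose expected return is at least target."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            mean, variance = mean_and_variance(weights, returns)
            if mean >= target - 1e-12 and (best is None or variance < best[1]):
                best = (weights, variance, mean)
    return best  # None if no grid portfolio reaches the target

# Required returns of 2%, 4%, ..., 18%, as in the problem statement.
frontier = [(t / 100, min_variance_portfolio(t / 100)) for t in range(2, 20, 2)]
for target, (weights, variance, mean) in frontier:
    print(f"target {target:.2f}: weights {weights}, variance {variance:.4f}")
```

Because each higher target shrinks the set of acceptable portfolios, the minimum variance can never fall as the required return rises, which is the shape the frontier shows.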


63. Consider the data on investment made in four types of funds and returns from S&P 500.

Mutual Fund        Year 1   Year 2   Year 3   Year 4
Large-Cap Growth   41.54    36.18    32.76    –20.63
Large-Cap Value    32.45    44.78    28.61    38.49
Small-Cap Growth   26.13    7.04     –23.97   45.67
Small-Cap Value    37.56    18.53    27.53    –5.48
S&P 500 Return     33.15    27.62    15.84    30.42

a. Develop an optimization model that will give the fraction of the portfolio to invest in each of the funds so that the return of the resulting portfolio matches as closely as possible the return of the S&P 500 Index. (Hint: Minimize the sum of the squared deviations between the portfolio’s return and the S&P 500 Index return for each year in the data set.)
b. Solve the model developed in part (a).
ANSWER:
a. Let
LG = proportion of portfolio invested in large-cap growth fund
LV = proportion of portfolio invested in large-cap value fund
SG = proportion of portfolio invested in small-cap growth fund
SV = proportion of portfolio invested in small-cap value fund
Ds = the difference between the portfolio return and the S&P 500 return, year s
Min D1^2 + D2^2 + D3^2 + D4^2
s.t.
41.54LG + 32.45LV + 26.13SG + 37.56SV – 33.15 = D1
36.18LG + 44.78LV + 7.04SG + 18.53SV – 27.62 = D2
32.76LG + 28.61LV – 23.97SG + 27.53SV – 15.84 = D3
–20.63LG + 38.49LV + 45.67SG – 5.48SV – 30.42 = D4
LG + LV + SG + SV = 1
LG, LV, SG, SV ≥ 0
b.
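The tracking objective from the hint can be evaluated for any candidate allocation. This is a rough sketch with hypothetical function names: it computes the sum of squared deviations and runs a coarse search over weights in steps of 0.05, which is an assumption made to keep the example dependency-free and is far less precise than Solver.

```python
# Yearly returns (LG, LV, SG, SV) and the S&P 500 return, from the table in problem 63.
FUND_RETURNS = [
    (41.54, 32.45, 26.13, 37.56),
    (36.18, 44.78, 7.04, 18.53),
    (32.76, 28.61, -23.97, 27.53),
    (-20.63, 38.49, 45.67, -5.48),
]
SP500 = [33.15, 27.62, 15.84, 30.42]

def squared_tracking_error(weights):
    """Sum over years of (portfolio return - index return) squared."""
    total = 0.0
    for row, index_return in zip(FUND_RETURNS, SP500):
        deviation = sum(w * r for w, r in zip(weights, row)) - index_return
        total += deviation ** 2
    return total

def coarse_search(steps=20):
    """Best allocation on a grid of nonnegative weights that sum to 1."""
    best_weights, best_error = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            for c in range(steps + 1 - a - b):
                weights = (a / steps, b / steps, c / steps, (steps - a - b - c) / steps)
                error = squared_tracking_error(weights)
                if error < best_error:
                    best_weights, best_error = weights, error
    return best_weights, best_error

equal_weight_error = squared_tracking_error((0.25, 0.25, 0.25, 0.25))
best_weights, best_error = coarse_search()
print(equal_weight_error, best_weights, best_error)
```

Comparing the grid result with an equal-weight portfolio gives a quick sanity check on whatever allocation Solver reports.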


64. Develop a model that minimizes semivariance for the data given below with a required return of 15 percent. Define a variable ds for each scenario and let ds ≥ R̄ – Rs with ds ≥ 0. Then make the objective function: Min (1/4)(d1^2 + d2^2 + d3^2 + d4^2).

                   Scenario
Mutual Fund        Year 1   Year 2   Year 3   Year 4
Large-Cap Growth   41.54    36.18    32.76    –20.63
Large-Cap Value    32.45    44.78    28.61    38.49
Small-Cap Growth   26.13    7.04     –23.97   45.67
Small-Cap Value    37.56    18.53    27.53    –5.48

Solve the model you developed with a required expected return of at least 15 percent.
ANSWER: Let:
LG = proportion of portfolio invested in large-cap growth fund
LV = proportion of portfolio invested in large-cap value fund
SG = proportion of portfolio invested in small-cap growth fund
SV = proportion of portfolio invested in small-cap value fund
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in year s
ds = the difference between the expected portfolio return and the return for year s
Min (1/4)(d1^2 + d2^2 + d3^2 + d4^2)
s.t.
41.54LG + 32.45LV + 26.13SG + 37.56SV = R1
36.18LG + 44.78LV + 7.04SG + 18.53SV = R2
32.76LG + 28.61LV – 23.97SG + 27.53SV = R3
–20.63LG + 38.49LV + 45.67SG – 5.48SV = R4
LG + LV + SG + SV = 1
(1/4)(R1 + R2 + R3 + R4) = R̄
R̄ ≥ 15
ds ≥ R̄ – Rs; s = 1, 2, 3, 4
ds ≥ 0; s = 1, 2, 3, 4
LG, LV, SG, SV ≥ 0
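Semivariance counts only the scenarios in which the portfolio falls short of its expected return. A minimal sketch, assuming equally likely years and using hypothetical function names, evaluates it for a given allocation alongside the ordinary variance:

```python
# Yearly returns (LG, LV, SG, SV) for years 1-4, from the table in problem 64.
FUND_RETURNS = [
    (41.54, 32.45, 26.13, 37.56),
    (36.18, 44.78, 7.04, 18.53),
    (32.76, 28.61, -23.97, 27.53),
    (-20.63, 38.49, 45.67, -5.48),
]

def portfolio_returns(weights):
    """Portfolio return in each year for the given fund weights."""
    return [sum(w * r for w, r in zip(weights, row)) for row in FUND_RETURNS]

def risk_measures(weights):
    """Return (expected return, variance, semivariance) over equally likely years."""
    returns = portfolio_returns(weights)
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / n
    # d_s = max(0, mean - R_s): only below-average years contribute.
    semivariance = sum(max(0.0, mean - r) ** 2 for r in returns) / n
    return mean, variance, semivariance

mean, variance, semivariance = risk_measures((0.25, 0.25, 0.25, 0.25))
print(mean, variance, semivariance)
```

The equal-weight portfolio here is only an example input. Semivariance can never exceed variance, because it drops the squared deviations of the above-average years.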


65. Consider the stock return data given below.

Month   Stock A   Stock B   Stock C   Stock D
1       12.07     15.95     30.52     32.42
2       10.12     4.16      16.51     21.36
3       14.54     6.31      34.25     13.84
4       46.58     –2.74     45.62     8.12
5       –19.34    6.54      –27.21    –6.84

Develop and solve the Markowitz model that maximizes expected return subject to a maximum variance of 35. Use this model to construct an efficient frontier by varying the maximum allowable variance from 25 to 55 in increments of five and solving for the maximum return for each.
ANSWER: Let
A = proportion of portfolio invested in Stock A
B = proportion of portfolio invested in Stock B
C = proportion of portfolio invested in Stock C
D = proportion of portfolio invested in Stock D
R̄ = the expected return of the portfolio
Rs = the return of the portfolio in month s
Max R̄
s.t.
12.07A + 15.95B + 30.52C + 32.42D = R1
10.12A + 4.16B + 16.51C + 21.36D = R2
14.54A + 6.31B + 34.25C + 13.84D = R3
46.58A – 2.74B + 45.62C + 8.12D = R4
–19.34A + 6.54B – 27.21C – 6.84D = R5
A + B + C + D = 1
(1/5)(R1 + R2 + R3 + R4 + R5) = R̄
(1/5) Σ (Rs – R̄)^2 ≤ 35, summed over s = 1, 2, ..., 5
A, B, C, D ≥ 0


The output is shown below. As the maximum variance increases, the expected return increases but at a decreasing rate. This curve is known as the efficient frontier.
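The same curve can be traced approximately with a small search. The sketch below is a hedged illustration, not the Solver output referred to above: it uses the problem 65 stock returns read month by month, hypothetical function names, and a coarse grid of weights in steps of 0.05, so the maximum returns it finds are lower bounds on the true optimum.

```python
# Monthly returns (Stock A, B, C, D) for months 1-5, from the table in problem 65.
STOCK_RETURNS = [
    (12.07, 15.95, 30.52, 32.42),
    (10.12, 4.16, 16.51, 21.36),
    (14.54, 6.31, 34.25, 13.84),
    (46.58, -2.74, 45.62, 8.12),
    (-19.34, 6.54, -27.21, -6.84),
]

def mean_and_variance(weights):
    """Expected return and variance over equally likely months."""
    monthly = [sum(w * r for w, r in zip(weights, row)) for row in STOCK_RETURNS]
    mean = sum(monthly) / len(monthly)
    variance = sum((r - mean) ** 2 for r in monthly) / len(monthly)
    return mean, variance

def max_return_portfolio(max_variance, steps=20):
    """Highest-return grid portfolio whose variance does not exceed max_variance."""
    best = None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            for c in range(steps + 1 - a - b):
                weights = (a / steps, b / steps, c / steps, (steps - a - b - c) / steps)
                mean, variance = mean_and_variance(weights)
                if variance <= max_variance and (best is None or mean > best[1]):
                    best = (weights, mean, variance)
    return best  # None if no grid portfolio satisfies the variance cap

# Variance caps of 25, 30, ..., 55, as in the problem statement.
frontier = {cap: max_return_portfolio(cap) for cap in range(25, 60, 5)}
for cap, (weights, mean, variance) in frontier.items():
    print(f"max variance {cap}: weights {weights}, expected return {mean:.3f}")
```

Raising the cap only adds feasible portfolios, so the best attainable return cannot decrease from one cap to the next.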


Chapter 15 - Decision Analysis
Multiple Choice
1. An uncertain future event affecting the consequence associated with a decision is known as a _____.
a. chance event
b. decision alternative
c. decision node
d. payoff
ANSWER: a

2. A(n) _____ refers to the result obtained when a decision alternative is chosen and a chance event occurs.
a. payoff table
b. outcome
c. state of nature
d. node
ANSWER: b

3. _____ are possible outcomes for chance events that affect the consequences associated with a decision alternative.
a. Payoffs
b. Forecasts
c. Decision trees
d. States of nature
ANSWER: d

4. No more than one state of nature can occur at a given time for a chance event. This indicates that the states of nature are defined such that they are _____.
a. collectively exhaustive
b. mutually exclusive
c. independent outcomes
d. conservative events
ANSWER: b

5. The states of nature are defined so that they are _____. This means that at least one state of nature must occur at a given time for a chance event.
a. collectively exhaustive
b. mutually exclusive
c. certain events
d. optimistic outcomes
ANSWER: a

6. A measure of the outcome of a decision such as profit, cost, or time is known as a _____.
a. branch
b. payoff
c. regret
d. forecasting index

ANSWER: b

7. _____ are graphical representations of the decision problems that show the sequential nature of the decision-making process.
a. Influence diagrams
b. Utility functions
c. Decision trees
d. Payoff tables
ANSWER: c

8. Which of the following is true of decision trees when used to solve a complex problem?
a. They provide a useful way to decompose the problem.
b. They are used to compute a decision maker’s risk tolerance.
c. They can be converted into truth tables.
d. They can be used only when the decision maker is risk neutral.
ANSWER: a

9. Brett wants to sell throw blankets for the holiday season at a local flea market. Brett purchases the throws for $15 and sells them to his customers for $35. The rental space is a fixed fee of $1,500 for the season. Assume there is no leftover value for unsold units. If he orders 200 and demand is 150, what is the payoff?
a. $800
b. $50
c. $750
d. $2,800
ANSWER: c

10. An intersection or junction point of a decision tree is called a(n) _____.
a. node
b. stem
c. intercept
d. branch
ANSWER: a

11. Chance nodes are _____.
a. nodes provided at the end of the states-of-nature branches
b. nodes indicating points where an uncertain event will occur
c. nodes provided at the end of the decision alternative branches where a payoff is shown
d. nodes indicating points where a decision is made
ANSWER: b

12. Lines showing the alternatives from decision nodes and the outcomes from chance nodes are called _____.
a. weights
b. payoffs
c. diagonals

d. branches
ANSWER: d

13. The _____ approach evaluates each decision alternative in terms of the best payoff that can occur.
a. conservative
b. optimistic
c. maximin regret
d. expected value
ANSWER: b

14. For a maximization problem, the optimistic approach often is referred to as the _____ approach.
a. minimin
b. maximin
c. minimax
d. maximax
ANSWER: d

15. For a minimization problem, the optimistic approach often is referred to as the _____ approach.
a. maximin
b. minimax
c. minimin
d. maximax
ANSWER: c

16. Choosing a decision alternative that maximizes the minimum profit is a feature of the _____ approach.
a. expected value
b. optimistic
c. conservative
d. maximin regret
ANSWER: c

17. Using the Table below, which is the recommended decision alternative using the optimistic approach?

Payoff Table
Decision Alternative   State of Nature 1   State of Nature 2
d1                     5                   7
d2                     –4                  1
d3                     1                   –3
d4                     10                  2
d5                     6                   4

a. d1
b. d4
c. d2
d. d5
ANSWER: b

18. Using the Table below, which is the recommended decision alternative using the conservative approach?

Payoff Table
Decision Alternative   State of Nature 1   State of Nature 2
d1                     5                   7
d2                     –4                  1
d3                     1                   –3
d4                     10                  2
d5                     6                   4

a. d1
b. d5
c. d2
d. d3
ANSWER: a

19. The amount of loss (lower profit or higher cost) from not making the best decision for each state of nature is known as _____.
a. best payoff
b. opportunity loss
c. risk profile
d. utility
ANSWER: b

20. The minimax regret approach is _____.
a. purely optimistic
b. purely conservative
c. both purely optimistic and purely conservative
d. neither purely optimistic nor purely conservative
ANSWER: d

21. For a particular maximization problem, the payoff for the best decision alternative is $15.7 million while the payoff for one of the other alternatives is $12.9 million. The regret associated with the alternate decision would be _____.
a. $28.6 million
b. $15.7 million
c. $0.129 million
d. $2.8 million
ANSWER: d

22. The weighted average of the payoffs for a chance node is known as the _____.
a. median value

b. variance of the node
c. risk measure
d. expected value
ANSWER: d

23. _____ is the study of the possible payoffs and probabilities associated with a decision alternative or a decision strategy in the face of uncertainty.
a. Risk analysis
b. Cost analysis
c. Certainty analysis
d. Optimization
ANSWER: a

24. The study of how changes in the probability assessments for the states of nature or changes in the payoffs affect the recommended decision alternative is known as _____.
a. uncertainty analysis
b. cost analysis
c. sensitivity analysis
d. probability analysis
ANSWER: c

25. New information obtained through research or experimentation that enables an updating or revision of the state-of-nature probabilities is known as _____.
a. joint probability
b. sample information
c. conditional probability
d. expected utility
ANSWER: b

26. _____ refer to the probabilities of the states of nature after revising the prior probabilities based on sample information.
a. Preliminary probabilities
b. Perfect probabilities
c. Joint probabilities
d. Posterior probabilities
ANSWER: d

27. What would be the value added by a market analysis undertaken if the expected value with sample information is $8.56 million and the expected value without sample information is $6.39 million?
a. $8.56 million
b. $6.39 million
c. $2.17 million
d. $14.95 million
ANSWER: c

28. A special case of sample information where the information tells the decision maker exactly which state of nature is going to occur is known as _____ information.
a. perfect
b. mutual
c. conditional
d. prior

ANSWER: a

29. _____ refers to the probability of one event, given the known outcome of a (possibly) related event.
a. Joint probability
b. Priori probability
c. Decisive probability
d. Conditional probability
ANSWER: d

30. Bayes’ theorem _____.
a. can be used only for cases where conditional probabilities are unknown
b. cannot be used to calculate posterior probabilities
c. enables the use of sample information to revise prior probabilities
d. is useful for determining optimal decisions without requiring knowledge of probabilities of the states of nature
ANSWER: c

31. Using the data below, what would be the joint probabilities, P(U ∩ Sj)?

States of Nature (sj)   Prior Probabilities P(sj)   Conditional Probabilities P(U | sj)
s1                      0.65                        0.75
s2                      0.20                        0.35
s3                      0.15                        0.20
Total                   1.00

a. 0.83, 0.12, and 0.05
b. 0.49, 0.07, and 0.03
c. 0.47, 0.49, and 0.04
d. 1.00, 0.59, and 1.00
ANSWER: b
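The joint probabilities in problem 31 come from multiplying each prior by its conditional probability, and dividing each joint probability by their sum gives the posterior probabilities. A short sketch of that calculation, with a hypothetical function name:

```python
def bayes_update(priors, conditionals):
    """Return (joint probabilities, P(U), posterior probabilities)."""
    joint = [p * c for p, c in zip(priors, conditionals)]
    total = sum(joint)                       # P(U), the probability of the sample result
    posterior = [j / total for j in joint]   # P(sj | U)
    return joint, total, posterior

priors = [0.65, 0.20, 0.15]         # P(sj) from the table
conditionals = [0.75, 0.35, 0.20]   # P(U | sj) from the table

joint, p_u, posterior = bayes_update(priors, conditionals)
print([round(j, 2) for j in joint])
print([round(p, 2) for p in posterior])
```

Rounded to two decimals, the joint probabilities are 0.49, 0.07, and 0.03, and the posteriors are 0.83, 0.12, and 0.05.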

32. Using the data below, which of the following would be the posterior probabilities, P(Sj | U)?

States of Nature (sj)   Prior Probabilities P(sj)   Conditional Probabilities P(U | sj)
s1                      0.65                        0.75
s2                      0.20                        0.35
s3                      0.15                        0.20
Total                   1.00

a. 0.83, 0.12, and 0.05
b. 0.49, 0.07, and 0.03
c. 0.47, 0.49, and 0.04
d. 1.00, 0.59, and 1.00

ANSWER: a

33. _____ is a measure of the total worth of a consequence reflecting a decision maker’s attitude toward considerations such as profit, loss, and risk.
a. Cost-to-company
b. Utility
c. Decision value
d. Regret
ANSWER: b

34. A _____ is a decision maker who would choose a guaranteed payoff over a lottery with a better expected payoff.
a. risk taker
b. risk-neutral
c. risk avoider
d. risk-creator
ANSWER: c

35. The utility function for money is a curve that depicts the relationship between _____.
a. decision alternative and utility
b. branch probabilities and utility
c. regret and utility
d. monetary value and utility
ANSWER: d

36. Exponential utility functions assume that the decision maker is _____.
a. a risk monitor
b. risk averse
c. risk neutral
d. a risk taker
ANSWER: b

37. The parameter R in an exponential utility function represents the _____.
a. decision maker’s risk tolerance
b. utility function’s error tolerance
c. posterior probability
d. likely profit/loss from the investment
ANSWER: a

38. Which of the following is the value of e, the mathematical constant used in the exponential utility function, U(x) = 1 – e^(–x/R)?
a. 3.14159
b. 2.71828
c. 1
d. The value of e depends on the risk tolerance value.
ANSWER: b

39. The parameter R in the exponential utility function U(x) = 1 – e^(–x/R) represents the decision maker’s risk tolerance. Larger values of R indicate that the decision maker _____.
a. is less risk averse (closer to neutral)
b. is more risk averse (has less risk tolerance)

c. is not concerned with risk
d. will accept the gamble
ANSWER: a

Subjective Short Answer
40. Jase Hansen is interested in leasing a sports-utility vehicle and has contacted three automobile dealers for pricing information. Each dealer offered Jase a 24-month lease with no down payment due at the time of signing. Each lease includes a monthly cost, mileage allowances, and the cost for additional miles. The details are given in the below table.

Dealer         Monthly Cost ($)   Mileage Allowances   Cost per Additional Mile ($)
True Vehicle   300                40,000               0.30
FCO            360                46,000               0.35
Jack’s Auto    410                50,000               0.15

Jase decided to choose the lease option that will minimize his total 24-month cost. He is not sure how many miles he will drive in the next two years. Hence, for the purpose of decision, assume that Jase wants to evaluate options of driving 20,000 miles per year, 23,000 miles per year, and 25,000 miles per year.
a. What is the decision, and what is the chance event?
b. Construct a payoff table for Jase’s problem.
ANSWER:
a. The decision faced by Jase is to select the best lease option from three alternatives (True Vehicle, FCO, and Jack’s Auto). The chance event is the number of miles Jase will drive.
b. The payoff for any combination of alternative and chance event is the sum of the total monthly charges and total additional mileage cost, i.e.,
for the True Vehicle lease option:
40,000 miles (20,000 miles for 2 years): 24($300) + $0.30(40000 – 40000) = $7,200
46,000 miles (23,000 miles for 2 years): 24($300) + $0.30(46000 – 40000) = $9,000
50,000 miles (25,000 miles for 2 years): 24($300) + $0.30(50000 – 40000) = $10,200
for the FCO lease option:
40,000 miles (20,000 miles for 2 years): 24($360) + $0.35*Max((40000 – 46000),0) = $8,640
46,000 miles (23,000 miles for 2 years): 24($360) + $0.35*Max((46000 – 46000),0) = $8,640
50,000 miles (25,000 miles for 2 years): 24($360) + $0.35*Max((50000 – 46000),0) = $10,040
for the Jack’s Auto lease option:
40,000 miles (20,000 miles for 2 years): 24($410) + $0.15*Max((40000 – 50000),0) = $9,840
46,000 miles (23,000 miles for 2 years): 24($410) + $0.15*Max((46000 – 50000),0) = $9,840
50,000 miles (25,000 miles for 2 years): 24($410) + $0.15*Max((50000 – 50000),0) = $9,840
Below is the payoff table for Jase’s problem:

               Actual Miles Driven Annually
Dealer         20,000    23,000    25,000
True Vehicle   $7,200    $9,000    $10,200
FCO            $8,640    $8,640    $10,040
Jack’s Auto    $9,840    $9,840    $9,840

41. Jase Hansen is interested in leasing a sports-utility vehicle and has contacted three automobile dealers for pricing information.
Each dealer offered Jase a 24-month lease with no down payment due at the time of signing. Each lease includes a monthly cost, mileage allowances, and the cost for additional miles. The details are given in the below table.

Dealer         Monthly Cost ($)   Mileage Allowances   Cost per Additional Mile ($)
True Vehicle   300                40,000               0.30
FCO            360                46,000               0.35
Jack’s Auto    410                50,000               0.15

Jase decided to choose the lease option that will minimize his total 24-month cost. He is not sure how many miles he will drive in the next two years. Hence, for the purpose of decision, assume that Jase wants to evaluate options of driving 20,000 miles per year, 23,000 miles per year, and 25,000 miles per year.
a. Construct a decision tree based on the payoff table constructed in the previous problem.
b. Recommend a decision based on the use of optimistic, conservative, and minimax regret approaches.
ANSWER:
a. The payoff table for the cost is:

               Actual Miles Driven Annually
Dealer         20,000   23,000   25,000
True Vehicle   7,200    9,000    10,200
FCO            8,640    8,640    10,040
Jack’s Auto    9,840    9,840    9,840

b.
Decision Alternatives   Maximum Cost   Minimum Cost
True Vehicle            $10,200        $7,200
FCO                     $10,040        $8,640
Jack’s Auto             $9,840         $9,840

Optimistic approach: Select True Vehicle lease option which has the smallest minimum cost.
Conservative approach: Select Jack’s Auto lease option which has the smallest maximum cost.
Regret or opportunity loss table:

               Actual Miles Driven Annually
Dealer         20,000   23,000   25,000   Maximum regret
True Vehicle   $0       $360     $360     $360
FCO            $1,440   $0       $200     $1,440
Jack’s Auto    $2,640   $1,200   $0       $2,640

Minimax Regret results in the selection of the True Vehicle lease option (which has the smallest maximum regret of the three alternatives: $360).

42. Greentrop Pharmaceutical Products is the world leader in the area of sleep aids. Its major product is “Dozealot.” The Research-and-Development Division has defined two alternatives to improve the quality of the product. These alternatives are simple reformulations of the product to minimize the side effects and to improve the product efficacy. To conduct an analysis, management has decided to consider the possible demands for the drug under each alternative. The following payoff table shows the projected profit in millions of dollars.

                        Demand
Decision Alternatives   Low    Medium   High
d1                      $500   $350     $525
d2                      $875   $300     $765

a. Construct a decision tree for this problem.
b. If the decision maker knows nothing about the probabilities of the three states of nature, what is the recommended decision using the optimistic, conservative, and minimax regret approaches?
ANSWER:
a.

b.
Decision Alternatives   Maximum Profit   Minimum Profit
d1                      $525             $350
d2                      $875             $300

Optimistic approach: select d2 which has the largest maximum profit.
Conservative approach: select d1 which has the largest minimum profit.
Regret or opportunity loss table:

                        Demand
Decision Alternatives   Low    Medium   High   Maximum Regret
d1                      $375   0        $240   $375
d2                      0      $50      0      $50

Minimax Regret: The decision alternative is d2 with the minimum of the maximum regret values of $50.

43. Meega airlines decided to offer direct service from Akron to Clearwater Beach, Florida. Management must decide between full-price service using a company’s new fleet of jet aircraft and a discount-service using smaller capacity commuter planes. Management developed estimates of the contribution to profit for each type of service based upon three possible levels of demand for service to Clearwater Beach: high, medium, and low. The following table shows the estimated quarterly profits (in thousands of dollars):

             Demand for service
Service      High   Medium   Low
Full Price   900    760      –430
Discount     710    650      350

a. If the demand probabilities are 0.3, 0.5, and 0.2, what is the best decision using the expected value approach?
b. Construct a risk profile for the optimal decision in part a. What is the probability of the profit exceeding $700,000?
ANSWER:
a. EV(Full) = 0.3(900) + 0.5(760) + 0.2(–430) = 564 or $564,000
EV(Discount) = 0.3(710) + 0.5(650) + 0.2(350) = 608 or $608,000
Optimal Decision: Discount service
b. The risk profile is shown in tabular form. The probability that the profit exceeds $700,000 is 0.3.

Profit ($ thousands)   Probability
$350                   0.2
$650                   0.5
$710                   0.3
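The expected value comparison and the risk profile question can be reproduced in a few lines. This is an illustrative sketch with hypothetical variable and function names, using the payoffs (in thousands of dollars) and probabilities from problem 43:

```python
probabilities = {"High": 0.3, "Medium": 0.5, "Low": 0.2}
payoffs = {  # quarterly profit in thousands of dollars
    "Full Price": {"High": 900, "Medium": 760, "Low": -430},
    "Discount": {"High": 710, "Medium": 650, "Low": 350},
}

def expected_value(payoff_row, probs):
    """Probability-weighted average payoff for one decision alternative."""
    return sum(probs[state] * payoff for state, payoff in payoff_row.items())

def probability_exceeding(payoff_row, probs, threshold):
    """Probability that the payoff is strictly above the threshold."""
    return sum(probs[state] for state, payoff in payoff_row.items() if payoff > threshold)

ev = {name: expected_value(row, probabilities) for name, row in payoffs.items()}
best = max(ev, key=ev.get)
p_over_700 = probability_exceeding(payoffs[best], probabilities, 700)
print(ev, best, p_over_700)
```

Only the high-demand outcome of the discount service (710) exceeds 700, so the probability asked for in part b equals the probability of high demand.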


44. The following payoff table shows the profit for a decision problem with three states of nature and three decision alternatives:

                       State of Nature
Decision Alternative   s1   s2   s3
d1                     7    3    4
d2                     2    4    5
d3                     8    2    3

a. Suppose P(s1) = 0.1, P(s2) = 0.3, and P(s3) = 0.6. What is the best decision using the expected value approach?
b. Suppose that the probability of state of nature, s1, s2, and s3 changes to 0.4, 0.2, and 0.4, respectively. What is the best decision using the expected value approach in this case?
ANSWER:
a. EV(d1) = 0.1(7) + 0.3(3) + 0.6(4) = 4
EV(d2) = 0.1(2) + 0.3(4) + 0.6(5) = 4.4
EV(d3) = 0.1(8) + 0.3(2) + 0.6(3) = 3.2
Therefore, the best decision alternative is d2.
b. EV(d1) = 0.4(7) + 0.2(3) + 0.4(4) = 5
EV(d2) = 0.4(2) + 0.2(4) + 0.4(5) = 3.6
EV(d3) = 0.4(8) + 0.2(2) + 0.4(3) = 4.8
Therefore, the best decision alternative is d1.

45. Visual Park is considering marketing one of its two television models for the coming Christmas season: Model A or Model B. Model A is a unique featured television and appears to have no competition. Estimated profits (in thousands of dollars) under high, medium, and low demand are given below:

Model A
Demand        High    Medium   Low
Profit        1,200   900      500
Probability   0.2     0.6      0.2

Visual Park is optimistic about the TV Model B. However, the concern is that profitability will be affected if a competitor launches a TV model which has similar features as Model B. Estimated profits (in thousands of dollars) with and without competition are as follows:

Model B With Competition
Demand        High    Medium   Low
Profit        1,200   900      500
Probability   0.2     0.3      0.5

Model B Without Competition
Demand        High    Medium   Low
Profit        1,600   1,100    700
Probability   0.6     0.2      0.2

a. Develop a decision tree for the Visual Park problem.
b. For planning purposes, Visual Park believes there is a 0.7 probability that its competitor will launch a TV model similar to Model B. Given this probability of competition, the director of planning recommends marketing Model A. Using expected value, what is your recommended decision?
c. Show a risk profile for your recommended decision.
d. Use sensitivity analysis to determine what the probability of competition for Model B would have to be for you to change your recommended decision alternative.
ANSWER:
a.


b. EV(node 2) = 0.2(1,200) + 0.6(900) + 0.2(500) = 880
EV(node 4) = 0.2(1,200) + 0.3(900) + 0.5(500) = 760
EV(node 5) = 0.6(1,600) + 0.2(1,100) + 0.2(700) = 1,320
EV(node 3) = 0.7EV(node 4) + 0.3EV(node 5) = 0.7(760) + 0.3(1,320) = 928
Model B is recommended as the expected value of $928,000 is $48,000 better than Model A.
c. Risk profile:

Profit    Calculation    Probability
1,600     0.3 × 0.6      0.18
1,200     0.7 × 0.2      0.14
1,100     0.3 × 0.2      0.06
900       0.7 × 0.3      0.21
700       0.3 × 0.2      0.06
500       0.7 × 0.5      0.35
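
The rollback in part b and the risk profile in part c follow the same pattern: weight each Model B demand outcome by the probability of the competition branch it sits on. The sketch below reproduces both under that reading. The profits and probabilities come from question 45; the constant and function names are hypothetical.

```python
# Visual Park data from question 45 (profits in thousands of dollars).
# Each dictionary maps a profit to its probability; names are illustrative only.
MODEL_A = {1200: 0.2, 900: 0.6, 500: 0.2}
MODEL_B_COMPETITION = {1200: 0.2, 900: 0.3, 500: 0.5}
MODEL_B_NO_COMPETITION = {1600: 0.6, 1100: 0.2, 700: 0.2}


def expected_value(outcomes):
    """Expected value of a {payoff: probability} mapping."""
    return sum(p * x for x, p in outcomes.items())


def model_b_risk_profile(p_competition):
    """Combine both competition branches into one {profit: probability} mapping."""
    profile = {}
    branches = (
        (p_competition, MODEL_B_COMPETITION),
        (1 - p_competition, MODEL_B_NO_COMPETITION),
    )
    for branch_probability, outcomes in branches:
        for profit, p in outcomes.items():
            profile[profit] = profile.get(profit, 0.0) + branch_probability * p
    return profile


ev_a = expected_value(MODEL_A)
profile_b = model_b_risk_profile(0.7)
ev_b = expected_value(profile_b)

print(f"EV(Model A) = {ev_a:.0f}, EV(Model B) = {ev_b:.0f}")
for profit in sorted(profile_b, reverse=True):
    print(f"{profit:>5}  {profile_b[profit]:.2f}")
```

Taking the expected value of the combined risk profile gives the same result as rolling back node 4 and node 5 first, because both are the same weighted sum grouped differently.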


d. Let p = probability of competition.
p = 1 → EV(Model B) = EV(node 4) = 760
p = 0 → EV(Model B) = EV(node 5) = 1,320
Setting the expected values of both decisions equal to each other gives:
EV(Model B) = EV(Model A)
1,320 – p(1,320 – 760) = 880
560p = 440
p = 440/560 = 0.7857
For p > 0.7857, the EV of Model A is greater; for p < 0.7857, the EV of Model B is greater. Therefore, the probability of competition would have to be greater than 0.7857 before we would change to Model A.

46. The Golden Jill Mining Company is interested in procuring 10,000 acres of coal mines in the Powder River Basin. The mining company is considering two payment-plan options to buy the mines:
I. 100% Payment
II. Installment-Payment
The payoff received will be based on the quality of coal obtained from the mines, which has been categorized as High, Normal, and Poor Quality, as well as the payment plan. The profit payoffs in millions of dollars resulting from the various combinations of options and quality are provided below:

                                Quality
Payment-Plan Options    High    Normal    Poor
100% Payment            450     320       –250
Installment-Payment     350     300       –110

a. What is the decision to be made, what is the chance event, and what is the consequence for this problem? How many decision alternatives are there? How many outcomes are there for the chance event?
b. If nothing is known about the probabilities of the chance outcomes, what is the recommended decision using the optimistic, conservative, and minimax regret approaches?

ANSWER:
a. The decision to be made is to choose the payment-plan option to buy the mines. The chance event is the quality of the coal obtained from the mines, which has been categorized as High, Normal, and Poor. The consequence is the amount of profit. There are two decision alternatives (100% Payment and Installment-Payment). There are three outcomes for the chance event (High, Normal, and Poor).
b.
Payment-Plan Options    Maximum Profit    Minimum Profit
100% Payment            450               –250
Installment-Payment     350               –110

Optimistic Approach: 100% Payment
Conservative Approach: Installment-Payment

Opportunity Loss or Regret Table
                                Quality
Payment-Plan Options    High    Normal    Poor    Maximum Regret
100% Payment            0       0         140     140
Installment-Payment     100     20        0       100

Minimax Regret Approach: Installment-Payment

47. The Golden Jill Mining Company is interested in procuring 10,000 acres of coal mines in the Powder River Basin. The mining company is considering two payment-plan options to buy the mines:
I. 100% Payment
II. Installment-Payment
The payoff received will be based on the quality of coal obtained from the mines, which has been categorized as High, Normal, and Poor Quality, as well as the payment plan. The profit payoffs in millions of dollars resulting from the various combinations of options and quality are provided below:

                                Quality
Payment-Plan Options    High    Normal    Poor
100% Payment            450     320       –250
Installment-Payment     350     300       –110

a. Suppose that management believes that the probability of obtaining High Quality coal is 0.55, the probability of Normal Quality coal is 0.35, and the probability of Poor Quality coal is 0.1. Use the expected value approach to determine an optimal decision.
b. Suppose that management believes that the probability of High Quality coal is 0.25, the probability of Normal Quality coal is 0.4, and the probability of Poor Quality coal is 0.35. What is the optimal decision using the expected value approach?

ANSWER:
a. EV(100% Payment) = 0.55(450) + 0.35(320) + 0.10(–250) = 334.5
EV(Installment-Payment) = 0.55(350) + 0.35(300) + 0.10(–110) = 286.5
Optimal Decision: 100% Payment
b. EV(100% Payment) = 0.25(450) + 0.40(320) + 0.35(–250) = 153
EV(Installment-Payment) = 0.25(350) + 0.40(300) + 0.35(–110) = 169

Optimal Decision: Installment-Payment

48. Meega Airlines decided to offer direct service from Akron to Clearwater Beach, Florida. Management must decide between full-price service using the company’s new fleet of jet aircraft and a discount service using smaller-capacity commuter planes. Management developed estimates of the contribution to profit for each type of service based upon three possible levels of demand for service to Clearwater Beach: high, medium, and low. The following table shows the estimated quarterly profits (in thousands of dollars).

                Demand for Service
Service         High    Medium    Low
Full price      900     760       –430
Discount        710     650       350

The probabilities for the demand are P(High) = 0.3, P(Medium) = 0.5, and P(Low) = 0.2, respectively.
a. What is the optimal decision strategy if perfect information were available?
b. What is the expected value for the decision strategy developed in part a?
c. Using the expected value approach, what is the recommended decision without perfect information? What is its expected value?
d. What is the expected value of perfect information?

ANSWER:
a. If demand is High or Medium, select the decision alternative Full price; if demand is Low, select the decision alternative Discount.
b. EVwPI = 0.3(900) + 0.5(760) + 0.2(350) = 720 or $720,000
c. EV(Full) = 0.3(900) + 0.5(760) + 0.2(–430) = 564 or $564,000
EV(Discount) = 0.3(710) + 0.5(650) + 0.2(350) = 608 or $608,000
Thus, the recommended decision is Discount. Hence, EVwoPI = $608,000.
d. EVPI = EVwPI – EVwoPI = $720,000 – $608,000 = $112,000

49. The following table provides information about the profit payoff of an investment strategy.

                        State of Nature
Decision Alternative    s1     s2     s3     s4
d1                      52     44     44     36
d2                      52     68     60     40
d3                      36     36     36     44
Probability             0.3    0.2    0.4    0.1

a. What is the optimal decision strategy if perfect information were available?
b. What is the expected value for the decision strategy developed in part (a)?
c. Using the expected value approach, what is the recommended decision without perfect information? What is its expected value?
d. What is the expected value of perfect information?

ANSWER:
a. If s1, select d1 or d2 and receive a payoff of 52.
If s2, select d2 and receive a payoff of 68.
If s3, select d2 and receive a payoff of 60.
If s4, select d3 and receive a payoff of 44.
b. EVwPI = 0.3(52) + 0.2(68) + 0.4(60) + 0.1(44) = 57.6
c. EV(d1) = 0.3(52) + 0.2(44) + 0.4(44) + 0.1(36) = 45.6
EV(d2) = 0.3(52) + 0.2(68) + 0.4(60) + 0.1(40) = 57.2
EV(d3) = 0.3(36) + 0.2(36) + 0.4(36) + 0.1(44) = 36.8
Thus, the recommended decision is d2. Hence, EVwoPI = 57.2.
d. EVPI = EVwPI – EVwoPI = 57.6 – 57.2 = 0.4.

50. Consider a decision situation with four possible states of nature: s1, s2, s3, and s4. The prior probabilities are P(s1) = 0.35, P(s2) = 0.15, P(s3) = 0.20, P(s4) = 0.30. The conditional probabilities are P(C | s1) = 0.20, P(C | s2) = 0.09, P(C | s3) = 0.15, and P(C | s4) = 0.20. Find the revised (posterior) probabilities P(s1 | C), P(s2 | C), P(s3 | C), and P(s4 | C).

ANSWER:
State of Nature    P(sj)    P(C | sj)    P(C ∩ sj)    P(sj | C)
s1                 0.35     0.20         0.0700       0.4035
s2                 0.15     0.09         0.0135       0.0778
s3                 0.20     0.15         0.0300       0.1729
s4                 0.30     0.20         0.0600       0.3458
                            P(C) =       0.1735       1.0000
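
The tabular Bayes calculation above has three steps: multiply each prior by its conditional probability to get the joint probability, sum the joints to get P(C), and divide each joint by P(C). A minimal sketch of those steps follows. The priors and conditional probabilities come from question 50; the function and variable names are illustrative.

```python
def revise_probabilities(priors, likelihoods):
    """Return (joint probabilities, evidence probability, posteriors).

    priors[j]      = P(sj)
    likelihoods[j] = P(C | sj)
    """
    joints = [prior * likelihood for prior, likelihood in zip(priors, likelihoods)]
    evidence = sum(joints)                      # P(C)
    posteriors = [joint / evidence for joint in joints]
    return joints, evidence, posteriors


# Data from question 50.
priors = [0.35, 0.15, 0.20, 0.30]
likelihoods = [0.20, 0.09, 0.15, 0.20]

joints, p_c, posteriors = revise_probabilities(priors, likelihoods)

for j, (joint, posterior) in enumerate(zip(joints, posteriors), start=1):
    print(f"s{j}: joint = {joint:.4f}, posterior = {posterior:.4f}")
print(f"P(C) = {p_c:.4f}")
```

The same function applies to the favorable and unfavorable market research reports in question 52 by passing P(F | sj) or P(U | sj) as the second argument.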

51. A construction company must decide on the size of the shopping mall (Large, Medium, or Small) to be constructed on its acquired plot in a suburban area of Seattle. Depending on market conditions, the number of visitors to the mall will be High, Moderate, or Low. The level of response and the size of the mall will determine the return on investment from the mall. The profit payoff table for management (in millions of dollars) after 5 years is provided below.

                        Number of Visitors
Size of the Mall    High    Moderate    Low
Large               25      15          –20
Medium              20      12          –10
Small               15      13          5

The probabilities are P(High) = 0.35, P(Moderate) = 0.40, and P(Low) = 0.25.
a. Use a decision tree to recommend a decision.
b. Use EVPI to determine whether the construction company should attempt to obtain a better estimate of the response.

ANSWER:
a. Let
d1 = Size of the shopping mall is large
d2 = Size of the shopping mall is medium
d3 = Size of the shopping mall is small
s1 = High demand
s2 = Moderate demand
s3 = Low demand

EV(node 2) = (0.35)(25) + (0.40)(15) + (0.25)(–20) = 9.75
EV(node 3) = (0.35)(20) + (0.40)(12) + (0.25)(–10) = 9.30
EV(node 4) = (0.35)(15) + (0.40)(13) + (0.25)(5) = 11.7
Recommended decision: d3 (Small shopping mall).
b. Optimal decision strategy with perfect information:
If s1, select d1.
If s2, select d1.
If s3, select d3.
Expected value of this strategy is (0.35)(25) + (0.40)(15) + (0.25)(5) = 16
EVPI = EVwPI – EVwoPI = 16 – 11.7 = 4.3 or $4.3 million.
In other words, $4.3 million represents the additional expected value that can be obtained if perfect information were available about the states of nature.

52. A construction company must decide on the size of the shopping mall (Large, Medium, or Small) to be constructed on its acquired plot in a suburban area of Seattle. Depending on market conditions, the number of visitors to the mall will be High, Moderate, or Low. The level of response and the size of the mall will determine the return on investment from the mall. The profit payoff table for management (in millions of dollars) after 5 years is provided below.

                        Number of Visitors
Size of the Mall    High    Moderate    Low
Large               25      15          –20
Medium              20      12          –10
Small               15      13          5

The probabilities for the states of nature are P(High) = 0.35, P(Moderate) = 0.40, and P(Low) = 0.25.
a. A test market study of the potential response for the mall in that area is expected to report either a favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as follows:
P(F | High) = 0.35; P(U | High) = 0.65
P(F | Moderate) = 0.45; P(U | Moderate) = 0.55
P(F | Low) = 0.20; P(U | Low) = 0.80
What is the probability that the market research report will be favorable?
b. Show the decision tree for this problem.

ANSWER:
a. Let
d1 = Size of the shopping mall is large
d2 = Size of the shopping mall is medium
d3 = Size of the shopping mall is small
s1 = High demand
s2 = Moderate demand
s3 = Low demand

If F – Favorable
State of Nature    P(sj)    P(F | sj)    P(F ∩ sj)    P(sj | F)
s1 (High)          0.35     0.35         0.1225       0.3475
s2 (Moderate)      0.40     0.45         0.1800       0.5106
s3 (Low)           0.25     0.20         0.0500       0.1418
                            P(F) =       0.3525       1.0000

If U – Unfavorable
State of Nature    P(sj)    P(U | sj)    P(U ∩ sj)    P(sj | U)
s1 (High)          0.35     0.65         0.2275       0.3514
s2 (Moderate)      0.40     0.55         0.2200       0.3398
s3 (Low)           0.25     0.80         0.2000       0.3089
                            P(U) =       0.6475       1.0000

The probability the report will be favorable is P(F) = 0.3525.
b. Assuming the test market study is used, a portion of the decision tree is shown below.


53. Three decision makers have assessed payoffs for the following decision problem (payoff in dollars).

                        State of Nature
Decision Alternative    s1    s2    s3
d1                      15    40    –20
d2                      60    80    –80

The indifference probabilities are as follows:

                        Indifference Probability (p)
Payoff    Decision Maker A    Decision Maker B    Decision Maker C
80        Does not apply      Does not apply      Does not apply
60        0.70                0.95                0.87
40        0.50                0.90                0.74
15        0.30                0.80                0.59
–20       0.15                0.60                0.37
–80       Does not apply      Does not apply      Does not apply

a. Plot the utility function for money for each decision maker.
b. Classify each decision maker as a risk avoider, a risk taker, or risk neutral.
c. For the payoff of 40, what is the premium that the risk avoider will pay to avoid risk? What is the premium that the risk taker will pay to have the opportunity of the high payoff?

ANSWER:
a.

b. A - Risk taker
B - Risk avoider
C - Risk neutral
c. For risk avoider B: at $40, indifference probability p = 0.9. Thus, EV(Lottery) = 0.9(80) + 0.1(–80) = $64. Therefore, B will pay $64 – $40 = $24.
For risk taker A: at $40, indifference probability p = 0.5. Thus, EV(Lottery) = 0.5(80) + 0.5(–80) = $0. Therefore, A will pay $40 – $0 = $40.

54. Three decision makers have assessed payoffs for the following decision problem (payoff in dollars).

                        State of Nature
Decision Alternative    s1    s2    s3
d1                      15    40    –20
d2                      60    80    –80

The indifference probabilities are as follows:

                        Indifference Probability (p)
Payoff    Decision Maker A    Decision Maker B    Decision Maker C
80        Does not apply      Does not apply      Does not apply
60        0.70                0.95                0.85
40        0.50                0.90                0.70
15        0.30                0.80                0.55
–20       0.15                0.60                0.35
–80       Does not apply      Does not apply      Does not apply

If P(s1) = 0.30, P(s2) = 0.55, and P(s3) = 0.15, find a recommended decision for each of the three decision makers.

ANSWER: For each of the cases, assume that the utilities for the best and worst payoffs are 10 and 0, respectively.

For Decision Maker A.
Utility Table
                        State of Nature
Decision Alternative    s1     s2    s3
d1                      3      5     1.5
d2                      7      10    0

EU(d1) = 0.30(3) + 0.55(5) + 0.15(1.5) = 3.875
EU(d2) = 0.30(7) + 0.55(10) + 0.15(0) = 7.60
The recommended decision is d2.

For Decision Maker B.
Utility Table
                        State of Nature
Decision Alternative    s1     s2    s3
d1                      8      9     6
d2                      9.5    10    0

EU(d1) = 0.30(8) + 0.55(9) + 0.15(6) = 8.25
EU(d2) = 0.30(9.5) + 0.55(10) + 0.15(0) = 8.35
The recommended decision is d2.

For Decision Maker C.
Utility Table
                        State of Nature
Decision Alternative    s1     s2    s3
d1                      5.5    7     3.5
d2                      8.5    10    0

EU(d1) = 0.30(5.5) + 0.55(7) + 0.15(3.5) = 6.025
EU(d2) = 0.30(8.5) + 0.55(10) + 0.15(0) = 8.05
The recommended decision is d2.

55. A manufacturing company introduces two product alternatives. The table below provides profit payoffs in thousands of dollars.

             State of Nature (Demand)
Bet on       Up    Stable    Down
Product A    11    8         8
Product B    8     10        12

The probabilities for the states of nature are P(Up) = 0.35, P(Stable) = 0.35, and P(Down) = 0.30.
a. Use a decision tree to recommend a decision.
b. Use EVPI to determine whether the manufacturing company should attempt to obtain a better estimate of the response.

ANSWER:
a.

EV(node 2) = (0.35)(11) + (0.35)(8) + (0.30)(8) = 9.05 or $9,050
EV(node 3) = (0.35)(8) + (0.35)(10) + (0.30)(12) = 9.9 or $9,900
Recommended decision: d2 (Product B).
b. Optimal decision strategy with perfect information:
If demand is up, select Product A.
If demand is stable, select Product B.
If demand is down, select Product B.
Expected value of this strategy is (0.35)(11) + (0.35)(10) + (0.30)(12) = 10.95 or $10,950.
EVPI = EVwPI – EVwoPI = 10.95 – 9.9 = 1.05 or $1,050.
In other words, $1,050 represents the additional expected value that can be obtained if perfect information were available about the states of nature.

56. A manufacturing company introduces two product alternatives. The table below provides profit payoffs in thousands of dollars.

             State of Nature (Demand)
Bet on       Up    Stable    Down
Product A    11    8         8
Product B    8     10        12

The probabilities for the states of nature are P(Up) = 0.35, P(Stable) = 0.35, and P(Down) = 0.30. A test market study of the potential demand for the product is expected to report either a favorable (F) or unfavorable (U) condition. The relevant conditional probabilities are as follows:
P(F | Up) = 0.5; P(F | Stable) = 0.3; P(F | Down) = 0.2
P(U | Up) = 0.2; P(U | Stable) = 0.3; P(U | Down) = 0.5
Use Bayes’ theorem to compute the conditional probability of the demand being up, stable, or down, given each market research outcome.

ANSWER: Let
s1 = Demand is up
s2 = Demand is stable
s3 = Demand is down

57. Harold has visited a casino where the entry fee to play a game of cards is $20,000. Below is the payoff table in terms of the decision to play or not to play the game. (Note: Harold will not pay the entry fee if he does not want to play, and the payoff table below includes the entry fee.)

                            State of Nature
Decision                    Win ($)    Lose ($)
Play the game, d1           50,000     –20,000
Do not play the game, d2    0          0

a. In his previous visits, Harold has won 1 out of every 5 games that he has played. Use the expected value approach to recommend a decision.
b. Assume that the utilities for 50,000 and –20,000 are 10 and 0, respectively. If a particular decision maker assigns an indifference probability of 0.0001 to the $0 payoff, would Harold play the game? Use expected utility to justify your answer.

ANSWER:
a. P(Win) = 1/5 = 0.2; P(Lose) = 4/5 = 0.8
EV(d1) = (1/5)(50,000) + (4/5)(–20,000) = –6,000
EV(d2) = 0
Therefore, the best decision under the EV approach is d2 – Do not play the game.
b. The utility of the $0 payoff is 0.0001(10) + 0.9999(0) = 0.001.
Utility Table
                            State of Nature
Decision                    Win      Lose
Play the game, d1           10       0
Do not play the game, d2    0.001    0.001

EU(d1) = (1/5)(10) + (4/5)(0) = 2
EU(d2) = 0.001
Therefore, the best decision under the EU approach is d1 – Play the game.

58. Translate the following monetary payoffs into utilities for a decision maker whose utility function is described by an exponential function with R = 6,450: –$3,000, –$1,500, $0, $1,500, $3,000, $4,500, $6,000, $7,500, $9,000.

ANSWER:
Monetary Payoff, x    Utility, U(x)
–3,000                –0.592
–1,500                –0.262
0                     0.000
1,500                 0.207
3,000                 0.372
4,500                 0.502
6,000                 0.606
7,500                 0.687
9,000                 0.752

59. Consider an advertising company that has to decide whether to invest in its current team, which has a 50 percent chance of earning a net profit of $35,000 and a 50 percent chance of losing the $17,500 invested.
a. Write the equation for the exponential function that approximates the advertising company’s utility function.
b. Plot the exponential utility function for this advertising company for x values between –30,000 and 45,000. Is the management of the advertising company risk seeking, risk neutral, or risk averse?
c. Suppose the management would like to invest more in marketing and would actually be willing to make an investment that has a 50 percent chance of earning $50,000 and a 50 percent chance of losing $25,000. Plot the exponential function that approximates this utility function and compare it to the utility function from part (b). Is the management becoming more risk seeking or more risk averse?

ANSWER:
a. The exponential utility function for the advertising company is U(x) = 1 – e^(–x/35,000).
b. The utility function values and plot of U(x) = 1 – e^(–x/35,000) are shown below.

x          Utility, U(x)
–30,000    –1.356
–25,000    –1.043
–20,000    –0.771
–15,000    –0.535
–10,000    –0.331
–5,000     –0.154
0          0.000
5,000      0.133
10,000     0.249
15,000     0.349
20,000     0.435
25,000     0.510
30,000     0.576
35,000     0.632
40,000     0.681
45,000     0.724

The decision maker is risk averse.
c.

The plots of two exponential utility functions are shown here. We observe that the new utility function is “flatter” than the utility function from part (b). Therefore, the management here is being more risk seeking (less risk averse) than in part (b). While the management is still, in general, risk averse, it is willing to accept more risk in part (c).

60. The graph shown displays the utility function for risk-avoiding, risk-taking, and risk-neutral decision makers. Explain how this graph demonstrates these three risk positions.

ANSWER: The utility function for a risk avoider shows a diminishing marginal return for money. However, the utility function for a risk taker shows an increasing marginal return for money. When the marginal return for money is neither decreasing nor increasing but remains constant, the corresponding utility function describes the behavior of a decision maker who is neutral to risk.

61. Explain how the parameter R represents the decision maker’s risk tolerance. Do larger values of R indicate that a decision maker is more or less risk averse? Explain.

ANSWER: The R parameter in the equation U(x) = 1 – e^(–x/R) represents the decision maker’s risk tolerance; it controls the shape of the exponential utility function. Larger R values create flatter exponential functions, indicating that the decision maker is less risk averse (closer to risk neutral). Smaller R values indicate that the decision maker has less risk tolerance (is more risk averse).

62. Sensitivity analysis can be used to determine how changes in the probabilities for the states of nature or changes in the payoffs affect the recommended decision alternative. Explain how sensitivity analysis helps the decision maker understand which inputs are critical to the choice of the best decision alternative.

ANSWER: Sensitivity analysis helps the decision maker understand which inputs are critical to the choice of the best decision alternative. If a small change in the value of one of the inputs causes a change in the recommended decision alternative, the solution to the decision analysis problem is sensitive to that particular input. Extra effort and care should be taken to make sure the input value is as accurate as possible. On the other hand, if a modest-to-large change in the value of one of the inputs does not cause a change in the recommended decision alternative, the solution to the decision analysis problem is not sensitive to that particular input. No extra time or effort would be needed to refine the estimated input value.
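
The role of R described in question 61 can be made concrete with a short calculation. The sketch below evaluates U(x) = 1 – e^(–x/R) for the payoffs in question 58, then compares the certainty equivalent of one 50/50 gamble under the two risk tolerances that appear in question 59. The certainty-equivalent helper is an illustrative addition, not something the test bank asks for, and all function and variable names are assumptions.

```python
import math


def exponential_utility(x, risk_tolerance):
    """U(x) = 1 - exp(-x / R), the form given in question 61."""
    return 1.0 - math.exp(-x / risk_tolerance)


def certainty_equivalent(outcomes, probabilities, risk_tolerance):
    """Sure amount whose utility equals the gamble's expected utility."""
    expected_utility = sum(
        p * exponential_utility(x, risk_tolerance)
        for x, p in zip(outcomes, probabilities)
    )
    # Invert U: x = -R * ln(1 - U).
    return -risk_tolerance * math.log(1.0 - expected_utility)


# Question 58: utilities for R = 6,450.
payoffs = [-3000, -1500, 0, 1500, 3000, 4500, 6000, 7500, 9000]
utilities = {x: exponential_utility(x, 6450) for x in payoffs}
for x, u in utilities.items():
    print(f"{x:>7,d}  {u:7.3f}")

# One gamble (win 35,000 or lose 17,500 with equal chance) valued under the two
# risk tolerances from question 59. Its expected monetary value is 8,750.
gamble = [35000, -17500]
ce_small_r = certainty_equivalent(gamble, [0.5, 0.5], 35000)
ce_large_r = certainty_equivalent(gamble, [0.5, 0.5], 50000)
print(f"CE with R = 35,000: {ce_small_r:,.0f}")
print(f"CE with R = 50,000: {ce_large_r:,.0f}")
```

Both certainty equivalents fall below the gamble's expected monetary value, which is the signature of risk aversion, and the larger R gives the higher certainty equivalent, consistent with the statement that larger R values mean less risk aversion.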
